# businessoptics client
Easy access to the BusinessOptics API, based on the Python requests library. For example:
```python
from businessoptics import Client

print(Client(auth=('user@example.com', 'apikey')).workspace(123).query('ideaname').tuples())
```
## Installation

```
pip install businessoptics
```
## Authentication

Construct a new client. Authentication details can be passed in directly:

```python
client = Client(auth=('user@example.com', 'apikey'))
```
or extracted from the environment variables:

- `BUSINESSOPTICS_EMAIL`
- `BUSINESSOPTICS_APIKEY`
so you can simply write:

```python
client = Client()
```
or from a JSON file at `~/.businessoptics_client.config` with the format:

```json
{"user@example.com": "<apikey>", "other@example.com": "<apikey>"}
```

so you can easily switch between several users and create a client like this:

```python
client = Client(auth="user@example.com")
```
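The lookup behind this shorthand is straightforward; here is a hypothetical sketch of how the client could resolve the full auth tuple from the config-file format above (`resolve_auth` is an illustrative helper, not part of the library, and the real lookup logic may differ):

```python
import json

# The same mapping format as ~/.businessoptics_client.config
config = json.loads('{"user@example.com": "<apikey>", "other@example.com": "<apikey2>"}')

def resolve_auth(email):
    # Pair the given email with its API key from the config mapping
    return (email, config[email])

print(resolve_auth('user@example.com'))  # ('user@example.com', '<apikey>')
```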
## Usage

The client uses logging to show what it is doing, so make sure you have logging configured. A quick way to do this:

```python
from businessoptics import setup_quick_console_logging

setup_quick_console_logging()
```
### Run a query and download tuples

```python
client = Client()
workspace = client.workspace(123)

# For a single idea:
tuples = workspace.query('idea_name1').tuples()
# tuples is now a list of dictionaries

# For multiple ideas:
query = workspace.query(['idea_name1', 'idea_name2'])
tuples1 = query.tuples('idea_name1')
tuples2 = query.tuples('idea_name2')

# For large numbers of tuples:
for tup in query.tuple_generator('idea_name1'):
    process(tup)

# Get a (possibly cached) pandas dataframe of tuples:
df = query.to_df('idea_name1')

# Quick queries have a similar API, e.g.
tuples = workspace.quick('idea_name1').tuples()
```
### Upload tuples to a dataset

```python
dataset = client.dataset(456)
dataset.upload_tuples(
    [{'id': 1, 'name': 'alice'}],
    [{'id': 2, 'name': 'bob'}],
)

# Or use a generator for large amounts of tuples, e.g.
def tuples():
    for i, tup in enumerate(large_database_query()):
        tup['id'] = i
        yield tup

dataset.upload_tuples(tuples())

# Or upload a pandas DataFrame
df = pd.read_csv(...)
dataset.upload_df(df)                # Uploads all the columns, but not the indexes
dataset.upload_df(df.reset_index())  # Uploads all the indexes as well
```
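The generator pattern above streams tuples one at a time instead of building a full list in memory. A self-contained sketch of just that pattern, with `large_database_query()` simulated so the behaviour can be seen without a live API:

```python
def large_database_query():
    # Stand-in for a query that lazily yields rows from a big database
    for name in ['alice', 'bob']:
        yield {'name': name}

def tuples():
    # Tag each row with a sequential id, one at a time, without building a list
    for i, tup in enumerate(large_database_query()):
        tup['id'] = i
        yield tup

print(list(tuples()))  # [{'name': 'alice', 'id': 0}, {'name': 'bob', 'id': 1}]
```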
### Download files from Google Drive

```python
from businessoptics import gdrive_file

# Get the URL by clicking on a file and then 'Get shareable link' in Google Drive
gfile = gdrive_file('https://drive.google.com/open?id=ABCDEF123')
gfile.path()   # The path to the downloaded file
gfile.open()   # Open the file
gfile.unzip()  # Unzip the file, returning a similar object. The zip must only contain one file
gfile.untar()  # Untar the file. Similar to the above, but use for '.tar.gz'
gfile.unzip('the_only_file_you_need.csv')  # Extract a specific file when there are many

# Read a zipped CSV into pandas
df = pd.read_csv(gfile.unzip().path())
```
### Upload files to Google Drive

```python
from businessoptics import upload_to_google_drive

# File will be called 'a_local_file.csv' on Google Drive
upload_to_google_drive('path/to/a_local_file.csv')

# File will be called 'name_on_drive.csv' on Google Drive
upload_to_google_drive('path/to/a_local_file.csv', 'name_on_drive.csv')

# File will be zipped before upload and will be called 'a_local_file.csv.zip' on Google Drive
upload_to_google_drive('path/to/a_local_file.csv', zipit=True)
```
### Controlling the download cache

If the folder `/global_cache` exists (e.g. it does on jupyter.businessoptics.net), then by default cached files (from `gdrive_file()` or `.to_df()`) are stored there and shared by everyone. This can lead to strange behaviour if several people download the same files. You can avoid this and only use a cache in your home folder, like this:

```python
from businessoptics import isolate_cache

isolate_cache()
```
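The decision being toggled can be sketched as follows. This is an illustrative assumption about the logic, not the library's actual code, and the per-user cache path here (`~/.businessoptics_cache`) is an invented placeholder:

```python
import os

def cache_dir(isolated=False):
    if not isolated and os.path.isdir('/global_cache'):
        # Shared cache, e.g. on jupyter.businessoptics.net
        return '/global_cache'
    # Per-user cache in the home folder (path name assumed for illustration)
    return os.path.expanduser('~/.businessoptics_cache')

print(cache_dir(isolated=True))
```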
## Generic API usage

Every client instance has a base URL. All requests made from that instance start from this base, and you can optionally append more to the URL of a request. For example:

```python
client = Client()
# client.base_url is ''

workspace = client.workspace(123)
# workspace.base_url is '/api/v2/workspace/123'

# Sends a GET request to /api/v2/workspace/123, returning metadata about the workspace
workspace.get()

# Sends a GET request to /api/v2/workspace/123/query, returning the query history
# of the workspace
workspace.get('query')
```
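The URL composition above can be sketched with a small helper. The joining rule here (`build_url`) is an assumption for illustration; the real client may handle trailing slashes and edge cases differently:

```python
def build_url(base_url, *parts):
    # Join a base URL with extra path segments, normalising stray slashes
    pieces = [base_url.rstrip('/')] + [p.strip('/') for p in parts]
    return '/'.join(pieces)

print(build_url('/api/v2/workspace/123', 'query'))  # /api/v2/workspace/123/query
```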
The API responds with JSON, which is automatically parsed into Python data structures, with a dictionary at the top level.

To send a POST, PUT, or DELETE request, use the `post`/`put`/`delete` methods. You will typically want to specify JSON for the request body by passing a dict in the `json` keyword argument.
If an error occurs, an `APIError` exception will be raised.
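A stand-alone sketch of that error-handling pattern, with the request simulated. The `APIError` definition and its attributes here are illustrative assumptions, not the library's actual class:

```python
class APIError(Exception):
    # Hypothetical stand-in carrying the failed response's details
    def __init__(self, status, message):
        super().__init__('%s: %s' % (status, message))
        self.status = status
        self.message = message

def get(path):
    # Simulate a request that fails
    raise APIError(404, path + ' not found')

try:
    get('/api/v2/workspace/999')
except APIError as e:
    print(e.status)  # 404
```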
## Resource classes

`Client` has several subclasses, each representing a different resource in the application and having different methods. Below is a quick overview of how to create instances of these classes and how they can be used. See the source code and docstrings for more details.

All these classes have base URLs, and they all accept a plain `get()` to fetch metadata about the resource.
```python
from businessoptics import Client, Workspace, DataCollection, Dataset, Query, IdeaResult, Dashboard

client = Client()

# Workspace
workspace = client.workspace(123)
workspace = client.workspace('workspace name')
workspace = Workspace.at(client, '/api/v2/workspace/123/')

## Initialise a training run
training_run = workspace.train(['idea1', 'idea2'])
## Wait for it to complete
training_run.await()

# Query
## Get an existing, previously initiated query:
query = client.query(456)
query = Query.at(client, '/api/v2/query/456')

## Run a new query:
query = workspace.query(['idea_name1', 'idea_name2'])
### Pass knowledge parameters:
query = workspace.query('idea_name', parameters={'param1': 1, 'param2': 2})
### Run using hadoop:
query = workspace.query('idea_name', execution_mode='hadoop')

## To get tuples, use the tuples(), tuple_generator(), or to_df() methods that
## exist in IdeaResult. You don't have to separately get the result, just pass
## the idea name as the first argument, e.g. you can do:
tuples = query.tuples('idea_name')
## which is equivalent to:
tuples = query.result('idea_name').tuples()
## You can also run quick queries by replacing workspace.query with workspace.quick

# IdeaResult
result = query.result('idea_name1')
result = IdeaResult.at(client, '/api/v2/query/456/result/idea_name1')
tuples = result.tuples()
## For large numbers of tuples:
for tup in result.tuple_generator():
    process(tup)
## Get a dataframe:
df = result.to_df()
## Reingest into a dataset
data_update = result.reingest_into_existing_dataset(456)
## Wait for the reingestion to finish
data_update.await()

# DataCollection
collection = client.datacollection(123)
collection = client.datacollection('collection name')
collection = DataCollection.at(client, '/api/v2/datacollection/123')
collection = client.dataset(456).collection  # NOT the datacollection method

# Dataset
dataset = client.dataset(456)
dataset = collection.dataset('dataset name')
dataset = Dataset.at(client, '/api/v2/dataset/456')

## For uploading tuples, see the section above
## Downloading tuples is similar to IdeaResult:
## use the methods tuples(), tuple_generator(), and to_df()
## You can also specify filters for the first two methods - see the docstring for tuple_generator

## Create a new dataset:
dataset = collection.create_tablestore_dataset(
    name='test',
    dimensions=[
        dict(name='col1', type='integer', default='-1', key=False),
        dict(name='col2', type='integer', default='-1', key=False),
    ])
## or from a dataframe (see docstring):
dataset = collection.create_tablestore_dataset_from_df('df test', df)

## Duplicate a dataset
dataset_name = dataset.get()['name']
new_dataset_name = 'new.' + dataset_name
new_dataset = dataset.duplicate(new_dataset_name)  # see docstring for more parameters

## Rename a dataset:
new_dataset.rename(dataset_name)

## Delete a dataset:
dataset.delete()

## Delete tuples:
dataset.delete_tuples()  # see docstring for how to specify a filter

# Dashboard
dashboard = client.dashboard(456)
dashboard = workspace.dashboard('dashboard name')
dashboard = Dashboard.at(client, '/api/v2/dashboard/456')
```