ODPS Python SDK and data analysis framework
Detailed description of the pyodps Python project
An elegant way to access the ODPS API.
Installation
Quick install:
pip install 'pyodps[full]'
If you do not need Jupyter, just type
pip install pyodps
The dependencies will be installed automatically.
Or install from source:
$ virtualenv pyodps_env
$ source pyodps_env/bin/activate
$ git clone <git clone URL> pyodps
$ cd pyodps
$ python setup.py install
Dependencies
- Python (>=2.6), including Python 3+ and PyPy; Python 2.7 is recommended
- setuptools (>=3.0)
- requests (>=2.4.0)
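The minimum versions listed above can be checked by hand before installing from source. The sketch below is only an illustration in plain Python: the names MINIMUMS, parse_version and meets_minimum are hypothetical helpers and not part of pyodps, and the parser only understands simple dotted numeric versions.

```python
import itertools
import sys

# Minimum versions copied from the dependency list above.
MINIMUMS = {"python": (2, 6), "setuptools": (3, 0), "requests": (2, 4, 0)}


def parse_version(text):
    """Turn '2.4.0' into (2, 4, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(itertools.takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(name, installed):
    """Return True if the installed version string satisfies the minimum."""
    have = parse_version(installed)
    need = MINIMUMS[name]
    # Pad with zeros so that '2.4' compares equal to '2.4.0'.
    width = max(len(have), len(need))
    have = have + (0,) * (width - len(have))
    need = need + (0,) * (width - len(need))
    return have >= need


# Check the running interpreter against the Python minimum.
python_ok = meets_minimum("python", ".".join(str(n) for n in sys.version_info[:3]))
```

For packages, pass the version string reported by your installer, for example meets_minimum("requests", "2.4.3").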
Running unit tests
- Copy conf/test.conf.template to odps/tests/test.conf and fill it in with your account details
- Run python -m unittest discover
Usage
>>> from odps import ODPS
>>> o = ODPS('**your-access-id**', '**your-secret-access-key**',
...          project='**your-project**', endpoint='**your-end-point**')
>>> dual = o.get_table('dual')
>>> dual.name
'dual'
>>> dual.schema
odps.Schema {
  c_int_a                 bigint
  c_int_b                 bigint
  c_double_a              double
  c_double_b              double
  c_string_a              string
  c_string_b              string
  c_bool_a                boolean
  c_bool_b                boolean
  c_datetime_a            datetime
  c_datetime_b            datetime
}
>>> dual.creation_time
datetime.datetime(2014, 6, 6, 13, 28, 24)
>>> dual.is_virtual_view
False
>>> dual.size
448
>>> dual.schema.columns
[<column c_int_a, type bigint>,
 <column c_int_b, type bigint>,
 <column c_double_a, type double>,
 <column c_double_b, type double>,
 <column c_string_a, type string>,
 <column c_string_b, type string>,
 <column c_bool_a, type boolean>,
 <column c_bool_b, type boolean>,
 <column c_datetime_a, type datetime>,
 <column c_datetime_b, type datetime>]
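The session above passes the access id and secret key as literal strings. One way to keep them out of source files is to read them from environment variables, as in this minimal sketch. The variable names (ODPS_ACCESS_ID and so on) and the helpers odps_kwargs_from_env and connect are assumptions made for illustration and are not defined by pyodps; only the ODPS(...) call shape is taken from the example above.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_NAMES = {
    "access_id": "ODPS_ACCESS_ID",
    "secret_access_key": "ODPS_SECRET_ACCESS_KEY",
    "project": "ODPS_PROJECT",
    "endpoint": "ODPS_ENDPOINT",
}


def odps_kwargs_from_env(env=None):
    """Collect the four values that the ODPS(...) call above needs."""
    env = os.environ if env is None else env
    missing = sorted(var for var in ENV_NAMES.values() if var not in env)
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return dict((key, env[var]) for key, var in ENV_NAMES.items())


def connect(env=None):
    """Build an ODPS entry object with the same call shape as shown above."""
    # Imported lazily so odps_kwargs_from_env works without pyodps installed.
    from odps import ODPS

    cfg = odps_kwargs_from_env(env)
    return ODPS(cfg["access_id"], cfg["secret_access_key"],
                project=cfg["project"], endpoint=cfg["endpoint"])
```

With the four variables exported in your shell, o = connect() would replace the explicit ODPS(...) call.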
DataFrame API
>>> from odps.df import DataFrame
>>> df = DataFrame(o.get_table('pyodps_iris'))
>>> df.dtypes
odps.Schema {
  sepallength           float64
  sepalwidth            float64
  petallength           float64
  petalwidth            float64
  name                  string
}
>>> df.head(5)
|==========================================|   1 /  1  (100.00%)         0s
   sepallength  sepalwidth  petallength  petalwidth         name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
>>> df[df.sepalwidth > 3]['name', 'sepalwidth'].head(5)
|==========================================|   1 /  1  (100.00%)        12s
          name  sepalwidth
0  Iris-setosa         3.5
1  Iris-setosa         3.2
2  Iris-setosa         3.1
3  Iris-setosa         3.6
4  Iris-setosa         3.9
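The expression df[df.sepalwidth > 3]['name', 'sepalwidth'] combines a row filter with a column selection. The plain-Python sketch below mirrors that logic on the five rows printed by df.head(5) above. It does not use the PyODPS DataFrame API, and filter_and_project is a hypothetical helper introduced only for illustration.

```python
# The five rows shown by df.head(5) above, as plain dictionaries.
ROWS = [
    {"sepallength": 5.1, "sepalwidth": 3.5, "petallength": 1.4, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 4.9, "sepalwidth": 3.0, "petallength": 1.4, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 4.7, "sepalwidth": 3.2, "petallength": 1.3, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 4.6, "sepalwidth": 3.1, "petallength": 1.5, "petalwidth": 0.2, "name": "Iris-setosa"},
    {"sepallength": 5.0, "sepalwidth": 3.6, "petallength": 1.4, "petalwidth": 0.2, "name": "Iris-setosa"},
]


def filter_and_project(rows, predicate, columns):
    """Keep rows where predicate(row) is true, then keep only the named columns."""
    return [dict((col, row[col]) for col in columns)
            for row in rows if predicate(row)]


# Same shape as df[df.sepalwidth > 3]['name', 'sepalwidth'].
wide = filter_and_project(ROWS, lambda row: row["sepalwidth"] > 3,
                          ["name", "sepalwidth"])
widths = [row["sepalwidth"] for row in wide]  # [3.5, 3.2, 3.1, 3.6]
```

Only four rows survive here because the row with sepalwidth 3.0 fails the strict comparison; the fifth row in the output above (3.9) comes from further down the full table.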
Command-line and IPython enhancement
In [1]: %load_ext odps

In [2]: %enter
Out[2]: <odps.inter.Room at 0x10fe0e450>

In [3]: %sql select * from pyodps_iris limit 5
|==========================================|   1 /  1  (100.00%)         2s
Out[3]:
   sepallength  sepalwidth  petallength  petalwidth         name
0          5.1         3.5          1.4         0.2  Iris-setosa
1          4.9         3.0          1.4         0.2  Iris-setosa
2          4.7         3.2          1.3         0.2  Iris-setosa
3          4.6         3.1          1.5         0.2  Iris-setosa
4          5.0         3.6          1.4         0.2  Iris-setosa
Python UDF debugging tool
# file: plus.py
from odps.udf import annotate

@annotate('bigint,bigint->bigint')
class Plus(object):
    def evaluate(self, a, b):
        return a + b

$ cat plus.input
1,1
3,2
$ pyou plus.Plus < plus.input
2
5
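The run above feeds each comma-separated input line to evaluate and prints one result per line. The sketch below imitates that behaviour in plain Python so the data flow is easy to follow. It is not how pyou is implemented: run_udf is a hypothetical helper, the @annotate decorator is left out so the code runs without pyodps, and every field is assumed to be an integer.

```python
class Plus(object):
    """Stand-in for the UDF in plus.py, without the @annotate decorator."""

    def evaluate(self, a, b):
        return a + b


def run_udf(udf_class, lines, parse=int):
    """Call evaluate once per non-empty line of comma-separated fields."""
    udf = udf_class()
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        args = [parse(field) for field in line.split(",")]
        results.append(udf.evaluate(*args))
    return results


# Same input as plus.input above.
outputs = run_udf(Plus, ["1,1", "3,2"])  # [2, 5]
for value in outputs:
    print(value)
```

A harness like this can be handy in an ordinary unit test, while pyou remains the tool to use when the declared signature in @annotate should be honoured.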
Contributing
For a development install, clone the repository and then install from source:
git clone https://github.com/aliyun/aliyun-odps-python-sdk pyodps
cd pyodps
pip install -r requirements.txt -e .
If you need to modify the frontend code, install nodejs/npm first. To build and install the frontend code, use
python setup.py build_js
python setup.py install_js