Python DB API 2.0 (PEP 249) compliant wrapper for the Amazon Athena JDBC driver




PyAthenaJDBC

PyAthenaJDBC is a Python DB API 2.0 (PEP 249) compliant wrapper for the Amazon Athena JDBC driver.

Requirements

  • Python
    • CPython 2.7, 3.4, 3.5, 3.6
  • Java
    • Java >= 8

Installation

    $ pip install PyAthenaJDBC
    

Extra packages:

    Package     Install command                           Version
    Pandas      pip install PyAthenaJDBC[Pandas]          >=0.19.0
    SQLAlchemy  pip install PyAthenaJDBC[SQLAlchemy]      >=1.0.0

Usage

Basic usage

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM one_row
            """)
            print(cursor.description)
            print(cursor.fetchall())
    finally:
        conn.close()

Cursor iteration

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows LIMIT 10
            """)
            for row in cursor:
                print(row)
    finally:
        conn.close()

Query with parameters

The supported DB API paramstyle is pyformat only. It supports named placeholders in the old % operator style, and parameters are specified in dictionary format.

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT col_string FROM one_row_complex
            WHERE col_string = %(param)s
            """, {'param': 'a string'})
            print(cursor.fetchall())
    finally:
        conn.close()

If the query contains a literal % character, it must be escaped as %% like the following:

    SELECT col_string FROM one_row_complex
    WHERE col_string = %(param)s OR col_string LIKE 'a%%'
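The escaping rule can be seen without an Athena connection: pyformat binding behaves like Python's own % formatting with a dict, so a literal % (here the LIKE wildcard) must be doubled. A minimal local illustration — the real driver additionally quotes and escapes bound values for you:

```python
# Pyformat placeholders work like Python's ``%`` dict formatting, so a
# literal ``%`` has to be written as ``%%`` in the query text.
query = ("SELECT col_string FROM one_row_complex "
         "WHERE col_string = %(param)s OR col_string LIKE 'a%%'")

# Simulate parameter binding locally (value already quoted for illustration).
rendered = query % {'param': "'a string'"}
print(rendered)
```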

JVM options

In the connect method or connection object, you can specify JVM options with a string array.

You can increase the JVM heap size like the following:

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2',
                   jvm_options=['-Xms1024m', '-Xmx4096m'])
    try:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows
            """)
            print(cursor.fetchall())
    finally:
        conn.close()

SQLAlchemy

Install SQLAlchemy with pip install "SQLAlchemy>=1.0.0" or pip install PyAthenaJDBC[SQLAlchemy]. Supported SQLAlchemy is 1.0.0 or higher.

    import contextlib
    from urllib.parse import quote_plus  # PY2: from urllib import quote_plus
    from sqlalchemy.engine import create_engine
    from sqlalchemy.sql.expression import select
    from sqlalchemy.sql.functions import func
    from sqlalchemy.sql.schema import Table, MetaData

    conn_str = 'awsathena+jdbc://{access_key}:{secret_key}@athena.{region_name}.amazonaws.com:443/'\
               '{schema_name}?s3_staging_dir={s3_staging_dir}'
    engine = create_engine(conn_str.format(
        access_key=quote_plus('YOUR_ACCESS_KEY'),
        secret_key=quote_plus('YOUR_SECRET_ACCESS_KEY'),
        region_name='us-west-2',
        schema_name='default',
        s3_staging_dir=quote_plus('s3://YOUR_S3_BUCKET/path/to/')))
    try:
        with contextlib.closing(engine.connect()) as conn:
            many_rows = Table('many_rows', MetaData(bind=engine), autoload=True)
            print(select([func.count('*')], from_obj=many_rows).scalar())
    finally:
        engine.dispose()

The connection string has the following format:

    awsathena+jdbc://{access_key}:{secret_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}&driver_path={driver_path}&...

NOTE: s3_staging_dir requires quoting. If access_key, secret_key and other parameter values contain special characters, they also need to be quoted.
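quote_plus from the standard library (the same helper used in the SQLAlchemy example above) percent-encodes the characters that would otherwise break the URL; the secret key below is a made-up value for illustration:

```python
from urllib.parse import quote_plus  # PY2: from urllib import quote_plus

# The staging dir contains ':' and '/', which must not appear raw in the URL.
s3_staging_dir = quote_plus('s3://YOUR_S3_BUCKET/path/to/')
print(s3_staging_dir)  # s3%3A%2F%2FYOUR_S3_BUCKET%2Fpath%2Fto%2F

# A hypothetical secret key containing '/' and '+' also needs quoting.
secret_key = quote_plus('AbC/dEf+GhI')
print(secret_key)  # AbC%2FdEf%2BGhI
```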

Pandas

Minimal example to Pandas DataFrame:

    from pyathenajdbc import connect
    import pandas as pd

    conn = connect(access_key='YOUR_ACCESS_KEY_ID',
                   secret_key='YOUR_SECRET_ACCESS_KEY',
                   s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2',
                   jvm_path='/path/to/jvm')  # optional, as used by JPype
    df = pd.read_sql("SELECT * FROM many_rows LIMIT 10", conn)

As Pandas DataFrame:

    import contextlib
    from pyathenajdbc import connect
    from pyathenajdbc.util import as_pandas

    with contextlib.closing(
            connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                    region_name='us-west-2')) as conn:
        with conn.cursor() as cursor:
            cursor.execute("""
            SELECT * FROM many_rows
            """)
            df = as_pandas(cursor)
    print(df.describe())

Examples

Redash query runner example

    examples/redash/athena.py

Credentials

Support AWS CLI credentials, Properties file credentials and AWS credentials provider chain.

Credential files

~/.aws/credentials

    [default]
    aws_access_key_id=YOUR_ACCESS_KEY_ID
    aws_secret_access_key=YOUR_SECRET_ACCESS_KEY

~/.aws/config

    [default]
    region=us-west-2
    output=json
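Both files above are plain INI; as a quick sanity check of what a parser sees, the credentials format can be read with the standard library configparser (shown on an inline copy rather than the real file, purely for illustration):

```python
import configparser  # PY3; on PY2 the module is named ConfigParser

# Inline copy of the ~/.aws/credentials format shown above.
sample = """\
[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
"""
parser = configparser.ConfigParser()
parser.read_string(sample)
print(parser['default']['aws_access_key_id'])  # YOUR_ACCESS_KEY_ID
```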

Environment variables

    $ export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
    $ export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    $ export AWS_DEFAULT_REGION=us-west-2
    

Additional environment variable:

    $ export AWS_ATHENA_S3_STAGING_DIR=s3://YOUR_S3_BUCKET/path/to/
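The driver picks these variables up by itself, so nothing special is needed in Python. A hypothetical helper like the following (not part of PyAthenaJDBC) just makes the fallback explicit when assembling connect() arguments:

```python
import os

# Hypothetical helper: build connect() keyword arguments from the standard
# environment variables, with an explicit default region as fallback.
def connect_kwargs_from_env(default_region='us-west-2'):
    kwargs = {'region_name': os.environ.get('AWS_DEFAULT_REGION', default_region)}
    staging_dir = os.environ.get('AWS_ATHENA_S3_STAGING_DIR')
    if staging_dir:
        kwargs['s3_staging_dir'] = staging_dir
    return kwargs

print(connect_kwargs_from_env())
```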
    

Properties file credentials

Create a property file of the following format.

/path/to/AWSCredentials.properties

    accessKeyId: YOUR_ACCESS_KEY_ID
    secretKey: YOUR_SECRET_ACCESS_KEY

Specify the property file path with credential_file of the connect method or connection object.

    from pyathenajdbc import connect

    conn = connect(credential_file='/path/to/AWSCredentials.properties',
                   s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')

PyAthenaJDBC uses the property file to authenticate to Amazon Athena.

AWS credentials provider chain

    AWS credentials provider chain that looks for credentials in this order:

    • Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (RECOMMENDED since they are recognized by all the AWS SDKs and CLI except for .NET), or AWS_ACCESS_KEY and AWS_SECRET_KEY (only recognized by Java SDK)
    • Java System Properties - aws.accessKeyId and aws.secretKey
    • Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
    • Credentials delivered through the Amazon EC2 container service if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access the variable
    • Instance profile credentials delivered through the Amazon EC2 metadata service
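The lookup order above lives in the Java SDK's DefaultAWSCredentialsProviderChain, not in Python. Purely as a conceptual sketch, the rule is "first provider that returns credentials wins" (all names below are hypothetical, and the profile provider is stubbed out):

```python
import os

# Conceptual sketch only: the real chain is implemented in the Java SDK.
def env_provider():
    """Credentials from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, if set."""
    key = os.environ.get('AWS_ACCESS_KEY_ID')
    secret = os.environ.get('AWS_SECRET_ACCESS_KEY')
    return (key, secret) if key and secret else None

def profile_provider():
    """Would read ~/.aws/credentials here; stubbed out in this sketch."""
    return None

def first_available(providers):
    """Return credentials from the first provider that yields any."""
    for provider in providers:
        creds = provider()
        if creds:
            return creds
    raise RuntimeError('Unable to load AWS credentials from any provider')
```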

In the connect method or connection object, you can connect by specifying at least s3_staging_dir and region_name. It is not necessary to specify access_key and secret_key.

    from pyathenajdbc import connect

    conn = connect(s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
                   region_name='us-west-2')

Terraform instance profile example:

    examples/terraform/

Testing

Depends on the following environment variables:

    $ export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
    $ export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    $ export AWS_DEFAULT_REGION=us-west-2
    $ export AWS_ATHENA_S3_STAGING_DIR=s3://YOUR_S3_BUCKET/path/to/

Run test
    $ pip install pipenv
    $ pipenv install --dev
    $ pipenv run scripts/test_data/upload_test_data.sh
    $ pipenv run pytest
    $ pipenv run scripts/test_data/delete_test_data.sh
    

Run test multiple Python versions
    $ pip install pipenv
    $ pipenv install --dev
    $ pipenv run scripts/test_data/upload_test_data.sh
    $ pyenv local 3.6.5 3.5.5 3.4.8 2.7.14
    $ pipenv run tox
    $ pipenv run scripts/test_data/delete_test_data.sh
    
