Python cannot read a large file

Posted 2024-03-29 08:13:12


How can I read a large table from HDFS into a DataFrame in a Jupyter notebook? The script is launched from a Docker image.

Libraries:

  • sasl==0.2.1
  • thrift==0.11.0
  • thrift_sasl==0.4a1
  • impyla==0.16.2

from impala.dbapi import connect
from impala.util import as_pandas

# Connect to Impala with Kerberos (GSSAPI) authentication over SSL
impala_conn = connect(host='hostname', port=21050,
                      auth_mechanism='GSSAPI',
                      timeout=100000, use_ssl=True, ca_cert=None,
                      ldap_user=None, ldap_password=None,
                      kerberos_service_name='impala')

This works:


import pandas as pd
df = pd.read_sql("select id, crt_mnemo from demo_db.stg_deals_opn LIMIT 100", impala_conn)
print(df)

This does not work: the call hangs and never raises an error.


import pandas as pd
df = pd.read_sql("select id, crt_mnemo from demo_db.stg_deals_opn LIMIT 1000", impala_conn)
print(df)
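One way to narrow down where the 1000-row query stalls is to bypass `pd.read_sql` and pull rows in small batches through the DB-API cursor, so each batch is visible as it arrives. The sketch below is a diagnostic aid, not a confirmed fix. `read_in_batches` is a hypothetical helper, and it assumes the connection's cursor implements the standard DB-API 2.0 `execute`, `description` and `fetchmany` methods. An in-memory `sqlite3` table with the same column names stands in for `impala_conn` so the example runs without a cluster.

```python
import sqlite3
from typing import Iterator

import pandas as pd


def read_in_batches(conn, query: str, batch_size: int = 100) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: yield the result of `query` as DataFrames of at
    most `batch_size` rows, using only DB-API 2.0 cursor methods."""
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        # Each entry of cursor.description describes one column; item 0 is its name
        columns = [col[0] for col in cursor.description]
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:  # an empty sequence means the result set is exhausted
                break
            yield pd.DataFrame(rows, columns=columns)
    finally:
        cursor.close()


if __name__ == "__main__":
    # Stand-in for impala_conn: a local SQLite table (assumption, not the real Impala table)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stg_deals_opn (id INTEGER, crt_mnemo TEXT)")
    conn.executemany(
        "INSERT INTO stg_deals_opn VALUES (?, ?)",
        [(i, f"M{i}") for i in range(1000)],
    )

    parts = []
    query = "SELECT id, crt_mnemo FROM stg_deals_opn ORDER BY id"
    for i, part in enumerate(read_in_batches(conn, query, batch_size=100)):
        # Printing per batch shows how far the fetch gets before any stall
        print(f"batch {i}: {len(part)} rows")
        parts.append(part)

    df = pd.concat(parts, ignore_index=True)
    print(df.shape)
```

Against the real cluster the call would be `read_in_batches(impala_conn, "select id, crt_mnemo from demo_db.stg_deals_opn LIMIT 1000")`. If the first batch already stalls, the problem is more likely on the connection or server side than in pandas. `pd.read_sql` also accepts a `chunksize` argument, which returns an iterator of DataFrames in a similar way.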

