Error fetching data from HBase 1.1 using Phoenix 4.x and Python 2.x

Posted 2024-05-14 16:57:04


I am a beginner with Phoenix, HBase, and Python. I am writing a small POC in PySpark that uses Phoenix to retrieve some basic information from an HBase database.

Here is my code snippet:

query = 'select count(PK) from A_Model.TableA'
jdbc_url = 'jdbc:phoenix:..xxx/hbase-secure'
df_records = sparkConfig.getSqlContext().read.format('jdbc')\
                  .options(driver='org.apache.phoenix.jdbc.PhoenixDriver', url=jdbc_url, dbtable=query).load()

When I try to run this with spark-submit, I get the following error:

[error stack trace not preserved in the original post]

1 Answer

Answered 2024-05-14 16:57:04

As described in Spark SQL - load data with JDBC using SQL statement, not table name, you should use a subquery:

query = '(select count(PK) from A_Model.TableA) AS some_name'
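As a minimal sketch, the corrected read call might look like the following. This assumes the same driver and JDBC URL as in the question; `some_name` is an arbitrary alias, which Spark requires because it substitutes the `dbtable` value into a `SELECT ... FROM <dbtable>` query.

```python
def jdbc_options(query, url):
    """Build the options dict for a Spark JDBC read over a SQL statement.

    Spark treats dbtable as a table identifier, so a raw SELECT must be
    wrapped in parentheses and given an alias.
    """
    return {
        'driver': 'org.apache.phoenix.jdbc.PhoenixDriver',
        'url': url,
        'dbtable': '({}) AS some_name'.format(query),
    }

opts = jdbc_options('select count(PK) from A_Model.TableA',
                    'jdbc:phoenix:..xxx/hbase-secure')
# Needs a live cluster and the Phoenix JDBC driver on the classpath:
# df_records = sparkConfig.getSqlContext().read.format('jdbc').options(**opts).load()
```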

But in practice, it is recommended to use the phoenix-spark connector, not JDBC:

Although Spark supports connecting directly to JDBC databases, it’s only able to parallelize queries by partitioning on a numeric column. It also requires a known lower bound, upper bound and partition count in order to create split queries.

In contrast, the phoenix-spark integration is able to leverage the underlying splits provided by Phoenix in order to retrieve and save data across multiple workers. All that’s required is a database URL and a table name. Optional SELECT columns can be given, as well as pushdown predicates for efficient filtering.
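A sketch of the connector path described above: it needs only a table name and a ZooKeeper URL. The table name `A_MODEL.TABLEA` and the quorum `zk-host:2181` are placeholder assumptions, not values from the question.

```python
def phoenix_options(table, zk_url):
    """Build the options for a read via the phoenix-spark DataSource,
    which parallelizes over Phoenix's own region splits instead of a
    numeric partition column."""
    return {'table': table, 'zkUrl': zk_url}

opts = phoenix_options('A_MODEL.TABLEA', 'zk-host:2181')
# Needs a live cluster with the phoenix-spark jar available:
# df = sqlContext.read.format('org.apache.phoenix.spark').options(**opts).load()
# df.count()
```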
