Error fetching data from HBase 1.1 using Phoenix 4.x and Python 2.x

Posted 2024-05-14 16:57:04


I am a beginner with Phoenix, HBase, and Python. I am writing a small POC in PySpark that uses Phoenix to retrieve some basic information from an HBase database.

Here is my code snippet:

query = 'select count(PK) from A_Model.TableA'
jdbc_url = 'jdbc:phoenix:..xxx/hbase-secure'
df_records = sparkConfig.getSqlContext().read.format('jdbc')\
                  .options(driver='org.apache.phoenix.jdbc.PhoenixDriver', url=jdbc_url, dbtable=query).load()

When I try to run this with spark-submit, I get the following error:

[error stack trace not preserved in the original post]

1 Answer

Answered 2024-05-14 16:57:04

As described in Spark SQL - load data with JDBC using SQL statement, not table name, you should use a subquery:

query = '(select count(PK) from A_Model.TableA) AS some_name'
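As a minimal sketch, the corrected read call might look like the following. This assumes the same driver and JDBC URL as in the question; `some_name` is an arbitrary alias, which Spark requires because it substitutes the `dbtable` value into a `SELECT ... FROM <dbtable>` query.

```python
def jdbc_options(query, url):
    """Build the options dict for a Spark JDBC read over a SQL statement.

    Spark treats dbtable as a table identifier, so a raw SELECT must be
    wrapped in parentheses and given an alias.
    """
    return {
        'driver': 'org.apache.phoenix.jdbc.PhoenixDriver',
        'url': url,
        'dbtable': '({}) AS some_name'.format(query),
    }

opts = jdbc_options('select count(PK) from A_Model.TableA',
                    'jdbc:phoenix:..xxx/hbase-secure')
# Needs a live cluster and the Phoenix JDBC driver on the classpath:
# df_records = sparkConfig.getSqlContext().read.format('jdbc').options(**opts).load()
```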

But in practice, it is recommended to use the phoenix-spark connector, not JDBC:

Although Spark supports connecting directly to JDBC databases, it’s only able to parallelize queries by partitioning on a numeric column. It also requires a known lower bound, upper bound and partition count in order to create split queries.

In contrast, the phoenix-spark integration is able to leverage the underlying splits provided by Phoenix in order to retrieve and save data across multiple workers. All that’s required is a database URL and a table name. Optional SELECT columns can be given, as well as pushdown predicates for efficient filtering.
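A sketch of the connector path described above: it needs only a table name and a ZooKeeper URL. The table name `A_MODEL.TABLEA` and the quorum `zk-host:2181` are placeholder assumptions, not values from the question.

```python
def phoenix_options(table, zk_url):
    """Build the options for a read via the phoenix-spark DataSource,
    which parallelizes over Phoenix's own region splits instead of a
    numeric partition column."""
    return {'table': table, 'zkUrl': zk_url}

opts = phoenix_options('A_MODEL.TABLEA', 'zk-host:2181')
# Needs a live cluster with the phoenix-spark jar available:
# df = sqlContext.read.format('org.apache.phoenix.spark').options(**opts).load()
# df.count()
```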
