I want to get the table names from an SQL query such as:
select *
from table1 as t1
full outer join table2 as t2
on t1.id = t2.id
I found a solution in Scala (How to get table names from SQL query?). When I iterate over the returned sequence with getTables(query).foreach(println), it prints the correct table names:
table1
table2
What is the equivalent syntax in PySpark? The closest I have come across is How to extract column name and column type from SQL in pyspark:
plan = spark_session._jsparkSession.sessionState().sqlParser().parsePlan(query)
print(f"table: {plan.tableDesc().identifier().table()}")
It fails with the following traceback:
Py4JError: An error occurred while calling o78.tableDesc. Trace:
py4j.Py4JException: Method tableDesc([]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:835)
As I understand it, the problem stems from the fact that I need to filter all plan nodes of type UnresolvedRelation, but I cannot find an equivalent notation in Python/PySpark.
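A minimal sketch of filtering the parsed plan for UnresolvedRelation nodes from Python by walking the tree through Py4J. It assumes that the plan returned by parsePlan is a Py4J proxy, that its children() Scala Seq can be read with size() and apply(i), and that UnresolvedRelation exposes tableName() (present in the Spark 2.4 and 3.x sources as far as I know, but check your version). The function names are hypothetical. Only children() is walked, so plan parts held elsewhere, such as CTE definitions, are not visited.

```python
from typing import Any, Iterator, List


def iter_plan(node: Any) -> Iterator[Any]:
    """Pre-order walk over a Catalyst plan proxied through Py4J (assumed API)."""
    yield node
    kids = node.children()  # Scala Seq[LogicalPlan] seen as a Py4J proxy
    for i in range(kids.size()):  # assumption: Seq exposes size() and apply(i)
        yield from iter_plan(kids.apply(i))


def unresolved_table_names(plan: Any) -> List[str]:
    """Hypothetical helper: names of all UnresolvedRelation nodes, in walk order."""
    return [
        node.tableName()
        for node in iter_plan(plan)
        if node.getClass().getSimpleName() == "UnresolvedRelation"
    ]
```

Usage, with plan obtained from parsePlan(query) as in the snippet above: unresolved_table_names(plan) should give the names of table1 and table2 for the join query at the top.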
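I have an approach, but it is rather convoluted. It dumps the Java object to JSON (a poor man's serialization), deserializes that into Python objects, and then filters and parses out the table names.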
When I iterate over the function's output with
list(get_tables(query))
it yields ['fast_track_gv_nexus', 'buybox_gv_nexus'].
Note: unfortunately, this approach breaks for queries that use CTEs. To work around this, I had to hack it with a regular expression.
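A hypothetical sketch of such a regex hack (not the one from the answer): references to a CTE are parsed as UnresolvedRelation as well, so one option is to collect the names declared as WITH name AS ( or , name AS ( and drop them from the extracted list. It assumes simple CTE heads, does not handle column lists such as name(a, b) AS (...), and does nothing for tables that appear only inside CTE bodies if your extraction skips those.

```python
import re
from typing import Iterable, List, Set

# Hypothetical pattern: matches "WITH name AS (" and ", name AS (" heads.
_CTE_HEAD = re.compile(r"(?:\bwith\b|,)\s*(\w+)\s+as\s*\(", re.IGNORECASE)


def cte_names(query: str) -> Set[str]:
    """Lower-cased names declared in the query's WITH clause."""
    return {m.group(1).lower() for m in _CTE_HEAD.finditer(query)}


def drop_cte_names(tables: Iterable[str], query: str) -> List[str]:
    """Keep only extracted names that are not CTE names."""
    ctes = cte_names(query)
    return [t for t in tables if t.lower() not in ctes]
```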