Trying to create multiple Spark DataFrames, one for each level of a nested XML file

Posted 2024-04-26 21:56:13


I am using PySpark on Databricks Community Edition to parse an XML file into a Spark DataFrame, and I want to create multiple DataFrames, one for each level of the XML.

I have written some code that reads the XML and flattens the top-level nodes into a single DataFrame:

from pyspark.sql.functions import col

# Read the XML; each <Transaction> element becomes one row.
df_xml = (spark.read.format('com.databricks.spark.xml')
          .options(rootTag='POSLog', rowTag='Transaction')
          .load(file_location))

def flatten_df(nested_df):
    # Keep expanding struct columns until none remain.
    while True:
        flat_cols = [name for name, dtype in nested_df.dtypes if not dtype.startswith('struct')]
        nested_cols = [name for name, dtype in nested_df.dtypes if dtype.startswith('struct')]
        if not nested_cols:
            # Also covers an input with no struct columns at all
            # (the original returned an unassigned flat_df in that case).
            return nested_df
        print(nested_cols)
        nested_df = nested_df.select(
            flat_cols +
            [col("`" + nc + "`.`" + c + "`").alias((nc + '_' + c).replace(".", "_"))
             for nc in nested_cols
             for c in nested_df.select("`" + nc + "`.*").columns])

df1 = flatten_df(df_xml)
display(df1)

However, I can't figure out how to access the columns of the DataFrame in question so that I can loop over them and derive further DataFrames.
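One way to approach this, offered as a rough sketch rather than a tested solution for this particular file: walk the DataFrame's schema instead of its `dtypes` strings, and build one DataFrame per nested field. The helper names `nested_fields` and `frames_per_level` are hypothetical, and the sketch assumes that repeated XML elements arrive as arrays of structs and single child elements as structs.

```python
def nested_fields(schema_json):
    """Map each nested top-level field to 'struct' or 'array'.

    schema_json is the dict returned by df.schema.jsonValue().
    Simple types appear there as plain strings, nested types as dicts.
    """
    found = {}
    for field in schema_json["fields"]:
        ftype = field["type"]
        if not isinstance(ftype, dict):
            continue  # simple column such as "string" or "long"
        if ftype["type"] == "struct":
            found[field["name"]] = "struct"
        elif ftype["type"] == "array":
            element = ftype["elementType"]
            if isinstance(element, dict) and element["type"] == "struct":
                found[field["name"]] = "array"
    return found


def frames_per_level(df, name="root", out=None):
    """Recursively collect one DataFrame per nested level into a dict.

    Hypothetical helper: keys are path-like names such as 'root_<child>'.
    """
    if out is None:
        out = {}
    nested = nested_fields(df.schema.jsonValue())
    # This level keeps only its simple (non-nested) columns.
    flat = [c for c in df.columns if c not in nested]
    out[name] = df.select([f"`{c}`" for c in flat])
    for child, kind in nested.items():
        if kind == "array":
            # One row per array element, then expand the element's fields.
            # explode drops rows whose array is null or empty.
            child_df = (df.selectExpr(f"explode(`{child}`) AS `{child}`")
                          .select(f"`{child}`.*"))
        else:
            child_df = df.select(f"`{child}`.*")
        frames_per_level(child_df, f"{name}_{child}", out)
    return out
```

With the `df_xml` from the question, `frames = frames_per_level(df_xml)` should give a dict that can be looped over, for example `for key, frame in frames.items(): display(frame)`. The child DataFrames in this sketch do not carry any parent key, so if the levels need to be joined back together, an identifying column would have to be added to each select.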
