How do I merge all the files from multiple paths stored in a variable in Python?

Posted 2024-06-10 22:26:57

Below, I am trying to collect all the file paths in a variable:

import re

SUR_INVOICE_FILES = []
listoffolders = []
dir = path_in
inbound = dbutils.fs.ls(dir)
for folder in inbound:
    subfolderlist = dbutils.fs.ls(folder.path)
    for listoffolders in subfolderlist:
        list_of_sources = listoffolders.path
        SR = dbutils.fs.ls(list_of_sources)
        SUR_INVOICE_FILES.append(listoffolders.path)
        # Keep the path from its first "/word" segment onward
        root = re.search(r'(/\w+)+.+', list_of_sources).group()
        print(root)

"root" gives me the paths of all the files:

/mnt/datalake/**/SurInvoice/2020-08-31_093551/SurInvoice.parquet
/mnt/datalake/**/SurInvoice/2020-08-31_103115/SurInvoice.parquet
/mnt/datalake/**/SurInvoice/2020-09-01_075931/SurInvoice.parquet
/mnt/datalake/**/SurInvoice/2020-09-17_080933/SurInvoice.parquet

Now I want to merge the contents of all the parquet files into a single file and store it in JSON format in a different network directory. How can I do that?


1 answer
User
Reply 1 · posted 2024-06-10 22:26:57

Read the files with pandas, concatenate the DataFrames, then write the result to a JSON file. Something like this:

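import pandas as pd

# file_paths: the list of parquet paths collected earlier
dfs = []
for file_path in file_paths:
    dfs.append(pd.read_parquet(file_path))
# Stack all frames into one and renumber the rows
df = pd.concat(dfs, ignore_index=True)
# path: where the JSON file should be written
df.to_json(path)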
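
The read, concatenate, write steps can also be wrapped in a small function, which makes the empty-input case explicit. This is only a sketch: the name merge_to_json and its reader parameter are hypothetical, and orient="records" is one possible choice of JSON layout (an array with one object per row), not something the question asks for.

```python
import pandas as pd


def merge_to_json(file_paths, out_path, reader=pd.read_parquet):
    """Read every path with `reader`, stack the frames, write one JSON file.

    `merge_to_json` and `reader` are hypothetical names for illustration.
    """
    frames = [reader(p) for p in file_paths]
    if not frames:
        # pd.concat raises ValueError on an empty list; give a clearer message
        raise ValueError("no input files to merge")
    # ignore_index=True renumbers the rows 0..n-1 across all inputs
    merged = pd.concat(frames, ignore_index=True)
    # orient="records" writes a JSON array with one object per row
    merged.to_json(out_path, orient="records")
    return merged
```

With the list from the question, a call might look like merge_to_json(SUR_INVOICE_FILES, save_path), assuming the paths in that list are readable by pandas as they stand.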

Add the same steps to your own code and see whether this works:

import re
import pandas as pd

SUR_INVOICE_FILES = []
listoffolders = []
dfs = []
dir = path_in
inbound = dbutils.fs.ls(dir)
for folder in inbound:
    subfolderlist = dbutils.fs.ls(folder.path)
    for listoffolders in subfolderlist:
        list_of_sources = listoffolders.path
        SR = dbutils.fs.ls(list_of_sources)
        SUR_INVOICE_FILES.append(listoffolders.path)
        # Keep the path from its first "/word" segment onward
        root = re.search(r'(/\w+)+.+', list_of_sources).group()
        # Read each parquet file as soon as its path is known
        dfs.append(pd.read_parquet(root))
        print(root)
# Combine all frames and write the result as JSON
df = pd.concat(dfs, ignore_index=True)
df.to_json(save_path)
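
If pandas cannot find the files at the printed paths, the path handling is worth checking in isolation. The sketch below reuses the regular expression from the loop and shows what it keeps. The sample URI, the helper names strip_scheme and to_local_path, and the "/dbfs" prefix are all assumptions: the question only shows the printed output, and whether a local prefix is needed depends on how the storage is mounted in your environment.

```python
import re

# Same pattern as in the loop above, written as a raw string
PATH_PATTERN = re.compile(r'(/\w+)+.+')


def strip_scheme(uri):
    """Return `uri` from its first "/word" segment onward (hypothetical helper)."""
    match = PATH_PATTERN.search(uri)
    if match is None:
        raise ValueError(f"no path found in {uri!r}")
    return match.group()


def to_local_path(uri, fuse_root="/dbfs"):
    """Prefix the stripped path with an assumed local mount point."""
    return fuse_root + strip_scheme(uri)


# Hypothetical input; the real listing may use a different prefix
sample = "dbfs:/mnt/data/2020-08-31_093551/SurInvoice.parquet"
print(strip_scheme(sample))
print(to_local_path(sample))
```

Keeping the prefix as a parameter means the same helper works whether or not the environment needs one: passing fuse_root="" returns the stripped path unchanged.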
