Import multiple CSV files into pandas and concatenate them into one DataFrame

Published 2024-04-26 02:17:37


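I would like to read several CSV files from a directory into pandas and concatenate them into one big DataFrame. I have not been able to figure it out, though. Here is what I have so far: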

import glob
import pandas as pd

# get data file names
path =r'C:\DRO\DCL_rawdata_files'
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))

# Concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)

I guess I need some help within the for loop?


Tags: file, csv, data, path, import, directory, pandas, data
3 Answers

An alternative to darindaCoder's answer:

import glob
import os
import pandas as pd

path = r'C:\DRO\DCL_rawdata_files'                     # use your path
all_files = glob.glob(os.path.join(path, "*.csv"))     # os.path.join keeps the path concatenation OS independent

# a generator expression: doesn't create a list, nor does it append to one
df_from_each_file = (pd.read_csv(f) for f in all_files)
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)

The same thing as a one-liner using map:

import glob, os
import pandas as pd
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join('', "my_files*.csv"))))
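
Two things tend to trip people up with the glob-and-concat approach: glob.glob returns files in no guaranteed order, and pd.concat raises a ValueError when the pattern matches nothing. The sketch below is one possible way to handle both, and it also records which file each row came from. The helper name read_csv_dir, the source_file column, and the demo data are all made up for illustration; they are not part of pandas or of the answers above.

```python
import glob
import os
import tempfile

import pandas as pd


def read_csv_dir(directory, pattern="*.csv"):
    """Read every CSV matching `pattern` in `directory` into one DataFrame.

    Hypothetical helper, not a pandas function.
    """
    # glob gives no ordering guarantee, so sort for reproducible row order
    files = sorted(glob.glob(os.path.join(directory, pattern)))
    if not files:
        # pd.concat would raise ValueError on an empty sequence;
        # fail earlier with a message that names the pattern instead
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory!r}")
    # generator expression: each frame is read lazily and tagged with its origin
    frames = (
        pd.read_csv(f).assign(source_file=os.path.basename(f))
        for f in files
    )
    return pd.concat(frames, ignore_index=True)


# Demo on throwaway files with made-up data
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_csv(
        os.path.join(tmp, "part1.csv"), index=False
    )
    pd.DataFrame({"a": [5], "b": [6]}).to_csv(
        os.path.join(tmp, "part2.csv"), index=False
    )
    combined = read_csv_dir(tmp)

print(combined)
```

If the origin of each row does not matter, the .assign(...) call can simply be dropped.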

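If all the CSV files have the same columns, you can try the code below. I added header=0 so that, after reading each CSV, its first row is assigned as the column names.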

import pandas as pd
import glob

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(path + "/*.csv")

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
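
The "same columns" condition matters because of how pd.concat aligns frames. As a rough illustration, the sketch below reads two in-memory CSVs (made-up data, using io.StringIO instead of real files) whose columns differ. With the default join="outer", the result holds the union of the columns and the missing cells become NaN; with join="inner", only the shared columns are kept.

```python
import io

import pandas as pd

# Two small in-memory "files" with made-up data; the second has an extra column
csv_a = io.StringIO("id,value\n1,10\n2,20\n")
csv_b = io.StringIO("id,value,extra\n3,30,x\n")

# header=0 takes the first row of each file as the column names
frames = [pd.read_csv(buf, index_col=None, header=0) for buf in (csv_a, csv_b)]

# Default join="outer": union of columns, missing cells filled with NaN
outer = pd.concat(frames, axis=0, ignore_index=True)

# join="inner": keep only the columns present in every frame
inner = pd.concat(frames, axis=0, ignore_index=True, join="inner")

print(outer)
print(inner)
```

If the files are supposed to share a schema, comparing each frame's columns before concatenating is a cheap way to catch a stray file early.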
