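I am reading multiple CSV files and reformatting them. I developed the code below, which reads a single file. However, I would like to know whether I can loop over multiple files, read each one into its own dataframe, and then process those dataframes to format and rewrite the CSV files.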
import pandas as pd
station_id = 'id.csv'
input_file = 'filename.txt'
unformatted = 'C:/Users/....../Unformatted/'
formatted = 'C:/....../Formatted/'
print(f'\nReading data file: {input_file}.')
fields = {
'Timestamp': 'timestamp',
# 'Sample Point Name': 'station_name',
# 'Sample Point Name Description': 'station_description',
# 'Start Date':'state_date',
'PM10 (1h) Validated': 'PM_1h_10_ug_m3',
'PM10 Validated' :'PM_10_ug_m3',
# 'PM2.5 (1h) Final': 'pm_25',
# 'PM2.5 Final': 'pm2.5_ug_m3'
}
df = pd.read_table(unformatted+input_file, usecols=fields.keys(), sep='\t', encoding = 'utf-16')
df.rename(columns=fields, inplace=True)
df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)
df['date'] = df['timestamp']
df['time'] = df['timestamp']
df['date'] = df['date'].dt.strftime('%d/%m/%Y')
df['time'] = df['time'].apply(lambda z: z.strftime('%H%M'))
df['Date_Time'] = df['date'] +' '+ df['time']
df.drop(['timestamp', 'date', 'time'], axis=1, inplace=True)
df = df[['Date_Time', 'PM_1h_10_ug_m3', 'PM_10_ug_m3']]
availability_PM_1h = df['PM_1h_10_ug_m3'].count()/df['Date_Time'].count()*100
availability_PM_10_min = df['PM_10_ug_m3'].count()/df['Date_Time'].count()*100
#Check for nan values
PM10_nan = df['PM_10_ug_m3'].isnull().sum()
PM10_1h_nan = df['PM_1h_10_ug_m3'].isnull().sum()
print('Count of PM10 NaN: ' + str(PM10_nan))
print('Count of PM10_1h NaN: ' + str(PM10_1h_nan))
df.to_csv(formatted+station_id, index=False)
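Suppose you wrap your whole single-file code in a function, read_single_df(filepath). For multiple files, you then call that function once per file and collect the results in a list, dfs. After that you can refer to each dataframe in the list as dfs[0], dfs[1], and so on, and apply further processing downstream.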
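The answer's original snippet for the multi-file case did not survive on this page, so here is a minimal sketch of what such a loop could look like. The helper names `read_all` and `write_all`, the `*.txt` pattern, and naming each output after its input file are assumptions for illustration; `read_single_df` is only a stand-in for the question's single-file code.

```python
from pathlib import Path

import pandas as pd


def read_single_df(filepath):
    """Stand-in for the single-file code from the question.

    Assumption: every input is a tab-separated, UTF-16 encoded export,
    as in the question. The renaming and Date_Time steps would go here.
    """
    return pd.read_csv(filepath, sep='\t', encoding='utf-16')


def read_all(unformatted_dir, pattern='*.txt'):
    """Read every matching file into its own dataframe.

    Returns the sorted list of paths and the list of dataframes,
    both in the same order.
    """
    paths = sorted(Path(unformatted_dir).glob(pattern))
    dfs = [read_single_df(path) for path in paths]
    return paths, dfs


def write_all(paths, dfs, formatted_dir):
    """Write each dataframe to <formatted_dir>/<input file stem>.csv."""
    out_dir = Path(formatted_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for path, df in zip(paths, dfs):
        df.to_csv(out_dir / f'{path.stem}.csv', index=False)
```

With the folder variables from the question, `paths, dfs = read_all(unformatted)` followed by `write_all(paths, dfs, formatted)` would process the whole folder. Keeping `paths` alongside `dfs` makes it easy to tell which file each dataframe came from.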
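One suggestion for improving the code: a single line is all you need in place of those six lines (presumably the ones that split timestamp into date and time, join them into Date_Time, and drop the helper columns).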
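The one-liner itself is missing from this page. A likely reconstruction, assuming the six lines meant are the ones that build `Date_Time`, is a single `strftime` call with a combined format string. The two sample timestamps below are made up for illustration.

```python
import pandas as pd

# Tiny made-up sample standing in for the renamed dataframe from the question.
df = pd.DataFrame({'timestamp': ['01/02/2020 09:05', '15/03/2020 14:30']})
df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)

# One combined format string yields the same 'dd/mm/YYYY HHMM' text that
# the separate date and time columns were concatenated into.
df['Date_Time'] = df['timestamp'].dt.strftime('%d/%m/%Y %H%M')
```

The `timestamp` column does not need to be dropped separately, because the question's next line already selects only `Date_Time` and the two PM10 columns.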