I am doing data cleaning in Python. I have the following workflow that calls all of my functions:
if __name__ == "__main__":
    data_file, hash_file, cols = read_file()
    survey_data, cleaned_hash_file = format_files(data_file, hash_file, cols)
    survey_data, cleaned_hash_file = rename_columns(survey_data, cleaned_hash_file)
    survey_data, cleaned_hash_file = data_transformation_stage_1(survey_data, cleaned_hash_file)
    observation, survey_data, cleaned_hash_file = data_transformation_stage_2(survey_data, cleaned_hash_file)
    observation, survey_data, cleaned_hash_file = data_transformation_stage_3(observation, survey_data, cleaned_hash_file)
    observation, survey_data, cleaned_hash_file = observation_date_fill(observation, survey_data, cleaned_hash_file)
    write_file(observation, survey_data, cleaned_hash_file)
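So the output of each function (the variables in its return statement) is used as the input to the functions that follow. All of the functions return DataFrames, so observation, survey_data, cleaned_hash_file, data_file, hash_file and cols are all DataFrames passed between these functions.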
Is there a better, more elegant way to write this?
Create a class like this:
Then use it like this:
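As a rough sketch of what a class-based version could look like: the shared tables become attributes, and each step returns self so the calls can be chained. The class name SurveyCleaner, the method build_observation and the placeholder logic are assumptions made for illustration, and plain dicts stand in for the DataFrames.

```python
class SurveyCleaner:
    """Hypothetical class: keeps the shared tables as attributes so that
    each step no longer needs a long argument list and return tuple."""

    def __init__(self, survey_data, hash_file):
        # Plain dicts are used here as stand-ins for DataFrames.
        self.survey_data = survey_data
        self.hash_file = hash_file
        self.observation = None

    def rename_columns(self):
        # Placeholder logic: lower-case every column name.
        self.survey_data = {k.lower(): v for k, v in self.survey_data.items()}
        return self  # returning self allows the steps to be chained

    def build_observation(self):
        # Placeholder logic: derive a small summary table from survey_data.
        self.observation = {"n_columns": len(self.survey_data)}
        return self


cleaner = (
    SurveyCleaner({"Age": [31, 45], "City": ["Oslo", "Lima"]}, {"Age": "a1"})
    .rename_columns()
    .build_observation()
)
print(cleaner.survey_data)
print(cleaner.observation)
```

With this layout the __main__ block shrinks to one chained expression, at the cost of every step mutating shared state instead of returning fresh values.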
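You can extend Python's map to accept a mapping over multiple functions, like this:
Try iterating over the functions. This assumes that the inputs of the current iteration are in the same order as the outputs of the previous iteration: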
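A minimal sketch of the iteration idea, assuming each function returns either a tuple or a single value: the return value of one function is unpacked as the positional arguments of the next. The helper run_pipeline and the three stand-in stages (load, scale, summarise) are hypothetical names, and lists and dicts stand in for the DataFrames.

```python
def run_pipeline(functions, *initial_args):
    """Call each function in order, passing its return value(s) to the next."""
    args = initial_args
    for func in functions:
        result = func(*args)
        # Normalise the result to a tuple so it can be unpacked into the next call.
        if result is None:
            args = ()
        elif isinstance(result, tuple):
            args = result
        else:
            args = (result,)
    return args


# Hypothetical stand-in stages; the real ones would take and return DataFrames.
def load():
    return [1, 2, 3], {"scale": 10}


def scale(values, config):
    return [v * config["scale"] for v in values], config


def summarise(values, config):
    # Returns three values from two inputs, like data_transformation_stage_2.
    return sum(values), values, config


final = run_pipeline([load, scale, summarise])
print(final)
```

Because the loop only relies on positional order, a stage that returns its values in a different order than the next stage expects will fail silently or with a confusing error, so the order of each return statement matters.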