How do I massage data from multiple columns and multiple files into a single DataFrame?

    sp_id           sp_dt        v1   v1   v3
    x1|x2|x30|x40   2018-10-07  100  200  300
    x1|x2|x30|x40   2018-10-14   80   80   90
    x1|x2|x30|x40   2018-10-21   34   35   36
    x1|x2|x31|x41   2018-10-07  100  200  300
    x1|x2|x31|x41   2018-10-14   80   80   90
    x1|x2|x31|x41   2018-10-21   34   35   36
    ....
    x1|x2|x39|x49   2018-10-21  340  350   36

    Variable  sp_partid1  sp_partid2  2018-10-07  ...  2018-10-21
    v4        x30         x40         160         ...  154
    v4        x31         x41         59          ...  75
    ....
    v4        x39         x49         75          ...  44
    v5        x30         x40         16          ...  24
    v5        x31         x41         59          ...  79
    ....
    v5        x39         x49         75          ...  34

    sp_id           sp_dt        v1   v1   v3   v4   v5
    x1|x2|x30|x40   2018-10-07  100  200  300  160   16
    x1|x2|x30|x40   2018-10-14   80   80   90  ...  ...
    x1|x2|x30|x40   2018-10-21   34   35   36  154   24
    x1|x2|x31|x41   2018-10-07  100  200  300   59   59
    x1|x2|x31|x41   2018-10-14   80   80   90  ...  ...
    x1|x2|x31|x41   2018-10-21   34   35   36   75   79
    ....
    x1|x2|x39|x49   2018-10-21  340  350   36   44   34

    get a list of variables
    check if the variable (say v4 in this case) exists in any sheet
    if it does:
        does it have any "part of sp_id"
        # In the example shown, sp_partid1 and sp_partid2 of the excel sheets
        # are part of sp_id of the dataframe.
        if yes:
            # it means the part of sp_id is common for all values. (x1|x2) in this case.
            add a new column to dataframe, v4, which has sp_id, sp_dt and the value of that date
        if no:
            # it means the whole sp_id is common for all values. (x1|x2|x3|x4) in this case and not shown in example.
            add a new column to dataframe, v4, and copy the value under the appropriate dates in excel sheet into corresponding v4 values and sp_dt

    import pandas as pd

    # df is the top data frame, which I have not gotten around to using yet
    # var_value gets values in a loop like 'v4', 'v5', ...
    # sheets is the list of sheet names (defined elsewhere)
    sheets_dict = {name: pd.read_excel('excel_file.xlsx', sheet_name=name, parse_dates=True)
                   for name in sheets}

    for key, value in sheets_dict.items():
        if 'Variable' in value.columns:
            # 'Variable' column exists in this sheet
            if var_value in value['Variable'].values:
                # var_value exists in 'Variable' column (say, v4)
                for column in value.columns:
                    if column.startswith('sp_'):
                        # Do something with column values, then map the values etc
                        pass

2 answers

User

Answer 1 · edited 2024-04-20 08:40:33

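What you are trying to do makes sense, but it is a fairly long sequence of operations, so it is normal to run into some difficulty implementing it. I think you should step back to the higher level of abstraction used in relational databases and rely on the high-level dataframe operations that pandas provides.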

Let's summarize what you want to do in terms of high-level operations:

Change the format of the sheets_dict dataframes so that they hold the same data, presented differently:

   id3           id4        date            v4         v5       
   x30           x40        2018-10-07      160        154
   x31           x41        2018-10-08      30         10

Split the ID of the original dataframe into several columns.
Join the resulting dataframe with the original dataframe on the id columns and the date.
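
The three steps above could be sketched roughly as follows. This is only a minimal illustration, not an exact implementation: the small in-memory frames stand in for the real data, the helper names (id_cols, long_df, wide, keyed, merged) are made up for the example, and it assumes that the last two parts of sp_id correspond to sp_partid1 and sp_partid2.

```python
import pandas as pd

# Small stand-in for the original long-format dataframe (sample values only).
df = pd.DataFrame({
    "sp_id": ["x1|x2|x30|x40", "x1|x2|x30|x40", "x1|x2|x31|x41", "x1|x2|x31|x41"],
    "sp_dt": pd.to_datetime(["2018-10-07", "2018-10-21", "2018-10-07", "2018-10-21"]),
    "v1": [100, 34, 100, 34],
})

# Stand-in for one wide Excel sheet, as pd.read_excel might return it.
sheet = pd.DataFrame({
    "Variable": ["v4", "v4", "v5", "v5"],
    "sp_partid1": ["x30", "x31", "x30", "x31"],
    "sp_partid2": ["x40", "x41", "x40", "x41"],
    "2018-10-07": [160, 59, 16, 59],
    "2018-10-21": [154, 75, 24, 79],
})

id_cols = ["sp_partid1", "sp_partid2"]

# Step 1: reshape the sheet to one row per (id parts, date), one column per variable.
long_df = sheet.melt(id_vars=["Variable"] + id_cols, var_name="sp_dt", value_name="value")
long_df["sp_dt"] = pd.to_datetime(long_df["sp_dt"])
wide = long_df.pivot_table(index=id_cols + ["sp_dt"], columns="Variable", values="value").reset_index()
wide.columns.name = None

# Step 2: split sp_id into its parts (assumption: parts 3 and 4 are the sheet's id columns).
parts = df["sp_id"].str.split("|", expand=True)
keyed = df.assign(sp_partid1=parts[2], sp_partid2=parts[3])

# Step 3: left-join on the id parts and the date, then drop the helper columns.
merged = keyed.merge(wide, on=id_cols + ["sp_dt"], how="left").drop(columns=id_cols)
print(merged)
```

A left join keeps every row of the original dataframe, so dates that are missing from the sheet simply end up as NaN in the new columns.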

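I can't give you an exact implementation, because your specification is still fairly vague even though the overall goal is clear. I also don't have a reference to point you to on relational databases, but I strongly recommend reading up on them. It will save you a lot of time, especially if you need to do this kind of task often.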

User
Answer 2 · edited 2024-04-20 08:40:33

Suppose one sheet of your Excel file contains the following data:

      Variable sp_partid1 sp_partid2  2018-10-07  2018-10-08  2018-10-21
    0       v4        x30        x40         160        10.0         154
    1       v4        x31        x41          59         NaN          75
    2       v4        x32        x42          75        10.0          44
    3       v5        x30        x40          16        10.0          24
    4       v5        x31        x41          59        10.0          79
    5       v5        x32        x42          75        10.0          34
You can use a combination of the pandas melt and pivot_table functions to get the desired result.
    import pandas as pd

    book = pd.read_excel('del.xlsx', sheet_name=None)
    for df in book.values():
        # wide to long: one row per (Variable, id parts, Date)
        df = df.melt(id_vars=['Variable', 'sp_partid1', 'sp_partid2'],
                     var_name="Date", value_name="Value")
        # concatenate strings of two columns separated by a '|'
        df['sp_id'] = df['sp_partid1'] + '|' + df['sp_partid2']
        df = df.loc[:, ['Variable', 'sp_id', 'Date', 'Value']]
        # long to wide: one column per Variable
        df = df.pivot_table('Value', ['sp_id', 'Date'], 'Variable').reset_index(drop=False)
        print(df)

Output:

    Variable    sp_id        Date     v4    v5
    0         x30|x40  2018-10-07  160.0  16.0
    1         x30|x40  2018-10-08   10.0  10.0
    2         x30|x40  2018-10-21  154.0  24.0
    3         x31|x41  2018-10-07   59.0  59.0
    4         x31|x41  2018-10-08    NaN  10.0
    5         x31|x41  2018-10-21   75.0  79.0
    6         x32|x42  2018-10-07   75.0  75.0
    7         x32|x42  2018-10-08   10.0  10.0
    8         x32|x42  2018-10-21   44.0  34.0
Reading the Excel workbook with sheet_name=None returns a dictionary with worksheet names as keys and data frames as values.
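
If the variables are spread over several sheets, one possible variation is to melt each sheet, stack the long-format pieces, and pivot once at the end. The sketch below is only an illustration under assumptions: the book dictionary is built in memory as a stand-in for what pd.read_excel(..., sheet_name=None) returns, and the sheet names and the "Notes" sheet are hypothetical.

```python
import pandas as pd

# In-memory stand-in for pd.read_excel(path, sheet_name=None):
# a dict mapping sheet name -> DataFrame. Sheet names are hypothetical.
book = {
    "Sheet1": pd.DataFrame({
        "Variable": ["v4", "v4"],
        "sp_partid1": ["x30", "x31"],
        "sp_partid2": ["x40", "x41"],
        "2018-10-07": [160, 59],
        "2018-10-21": [154, 75],
    }),
    "Sheet2": pd.DataFrame({
        "Variable": ["v5", "v5"],
        "sp_partid1": ["x30", "x31"],
        "sp_partid2": ["x40", "x41"],
        "2018-10-07": [16, 59],
        "2018-10-21": [24, 79],
    }),
    # A sheet without the expected columns, which should be skipped.
    "Notes": pd.DataFrame({"comment": ["no Variable column here"]}),
}

id_cols = ["Variable", "sp_partid1", "sp_partid2"]
frames = []
for name, sheet in book.items():
    # Skip sheets that do not have the identifying columns.
    if not set(id_cols).issubset(sheet.columns):
        continue
    # Wide to long: every date column becomes a (Date, Value) row.
    frames.append(sheet.melt(id_vars=id_cols, var_name="Date", value_name="Value"))

# Stack all sheets, build the combined id, and pivot once.
long_df = pd.concat(frames, ignore_index=True)
long_df["sp_id"] = long_df["sp_partid1"] + "|" + long_df["sp_partid2"]
result = long_df.pivot_table(values="Value", index=["sp_id", "Date"], columns="Variable").reset_index()
result.columns.name = None
print(result)
```

Pivoting after the concatenation means v4 and v5 land in the same rows even when they come from different sheets.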
