I am trying to populate a DataFrame that looks like this:
Name Origin Date Open High Low Close Date+1 Open+1 High+1 Low+1 Close+1
0 Bananas Bali 20200108 NaN NaN NaN NaN 20200109 NaN NaN NaN NaN
1 Coconut Bahamas 20200110 NaN NaN NaN NaN 20200111 NaN NaN NaN NaN
The data to fill it with is in a second DataFrame that looks like this:
Name Origin Date Time Open High Low Close
0 Bananas Bali 20200108 15:30:00 1.58 1.85 1.4 1.50
1 Bananas Bali 20200108 22:00:00 1.68 1.78 1.5 1.60
2 Bananas Bali 20200109 15:30:00 1.88 1.95 1.7 1.86
3 Bananas Bali 20200109 22:00:00 1.78 1.88 1.6 1.65
4 Coconut Bahamas 20200110 15:30:00 2.58 2.85 2.4 2.50
5 Coconut Bahamas 20200110 22:00:00 2.68 2.78 2.5 2.60
6 Coconut Bahamas 20200111 15:30:00 2.88 2.95 2.7 2.86
7 Coconut Bahamas 20200111 22:00:00 2.78 2.88 2.6 2.65
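Because the columns in the first DataFrame have different names (for example "Open" and "Open+1"), I can't think of a simple way to do the index match without duplicating the code and renaming the columns in the second DataFrame. I therefore thought it would be easier to index-match by column number, but I am having trouble working out how to do that. The match criteria are "Name", "Origin" and "Date" (with "Date+1" used for "Open+1", and so on).
I tried the following code: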
ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))
This was meant to get the correct values for the columns, but it raises a "KeyError: 0" that refers to the column number.
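A likely cause of the KeyError, offered as a sketch rather than a definitive diagnosis: iloc selects columns by position but leaves their string labels in place, so groupby([0,1,2]) and the (4,'first') tuples look for columns literally named 0, 1, 2, 4 and so on. One way to keep working with column numbers is to translate the positions into labels through df2.columns. The small frame below is a stand-in with the same column layout as the question's second DataFrame, and the names cols, keys and daily are illustrative only.

```python
import pandas as pd

# Stand-in for the intraday frame (same column layout as the question's df2).
df2 = pd.DataFrame(
    [
        ["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.4, 1.50],
        ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.5, 1.60],
        ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.7, 1.86],
        ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.6, 1.65],
    ],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

# Translate positions into labels once, then use the labels everywhere.
cols = df2.columns
keys = list(cols[[0, 1, 2]])  # the labels at positions 0, 1 and 2

# Named aggregation takes (column label, function) pairs, so cols[4] etc.
# supplies the label that lives at that position.
daily = (
    df2.groupby(keys, sort=False)
    .agg(
        Open=(cols[4], "first"),
        High=(cols[5], "max"),
        Low=(cols[6], "min"),
        Close=(cols[7], "last"),
    )
    .reset_index()
)
print(daily)
```

The positions are only used to look up labels, so the rest of the pipeline stays label-based and does not depend on the iloc slice.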
I have put together sample code below that reproduces the same DataFrames:
import pandas as pd
#Creating first sample dataframe
lst1 = [['Bananas', 'Bali', '20200108', 'NaN', 'NaN', 'NaN', 'NaN', '20200109', 'NaN', 'NaN', 'NaN', 'NaN'],
['Coconut', 'Bahamas', '20200110', 'NaN', 'NaN', 'NaN', 'NaN', '20200111', 'NaN', 'NaN', 'NaN', 'NaN']]
df1 = pd.DataFrame(lst1, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('First Dataframe')
print(df1)
#Creating second sample dataframe
lst2 = [['Bananas', 'Bali', '20200108', '15:30:00', 1.58, 1.85, 1.50, 1.50],
['Bananas', 'Bali', '20200108', '22:00:00', 1.68, 1.78, 1.40, 1.60],
['Bananas', 'Bali', '20200109', '15:30:00', 1.88, 1.95, 1.70, 1.86],
['Bananas', 'Bali', '20200109', '22:00:00', 1.78, 1.88, 1.60, 1.65],
['Coconut', 'Bahamas', '20200110', '15:30:00', 2.58, 2.85, 2.50, 2.50],
['Coconut', 'Bahamas', '20200110', '22:00:00', 2.68, 2.78, 2.40, 2.60],
['Coconut', 'Bahamas', '20200111', '15:30:00', 2.88, 2.95, 2.70, 2.86],
['Coconut', 'Bahamas', '20200111', '22:00:00', 2.78, 2.88, 2.60, 2.65]]
df2 = pd.DataFrame(lst2, columns =['Name', 'Origin', 'Date', 'Time', 'Open', 'High', 'Low', 'Close'])
print('Second Dataframe')
print(df2)
#Index Match
ColOpen = df2.iloc[:, [0,1,2,4,5,6,7]].groupby([0,1,2]).agg(Open=(4,'first'),High=(5,'max'),Low=(6,'min'), Close=(7,'last'))
print("Printing first index")
print(ColOpen)
#Desired Output
lst3 = [['Bananas', 'Bali', '20200108', 1.58, 1.85, 1.4, 1.6, '20200109', 1.88, 1.95, 1.6, 1.65],
['Coconut', 'Bahamas', '20200110', 2.58, 2.85, 2.4, 2.6, '20200111', 2.88, 2.95, 2.6, 2.65]]
df3 = pd.DataFrame(lst3, columns =['Name', 'Origin', 'Date', 'Open', 'High', 'Low', 'Close', 'Date+1', 'Open+1', 'High+1', 'Low+1', 'Close+1'])
print('Desired Output')
print(df3)
Can anyone help me figure out how to do this?
Edit: added the desired output and updated the code a bit.
Name Origin Date Open ... Open+1 High+1 Low+1 Close+1
0 Bananas Bali 20200108 1.58 ... 1.88 1.95 1.6 1.65
1 Coconut Bahamas 20200110 2.58 ... 2.88 2.95 2.6 2.65
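For illustration, here is one way the desired output above could be produced. It is a sketch under assumptions, not necessarily the approach the post settled on: aggregate the intraday rows to one row per Name/Origin/Date, then merge that daily table onto the key columns twice, once on "Date" and once on "Date+1" after renaming the price columns. The names skeleton, daily, next_day and out are hypothetical, and skeleton carries only the key columns of the first DataFrame.

```python
import pandas as pd

# Key columns of the frame to fill (the price columns get merged in below).
skeleton = pd.DataFrame(
    [["Bananas", "Bali", "20200108", "20200109"],
     ["Coconut", "Bahamas", "20200110", "20200111"]],
    columns=["Name", "Origin", "Date", "Date+1"],
)

# Intraday data, copied from the sample code in the question.
df2 = pd.DataFrame(
    [["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.50, 1.50],
     ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.40, 1.60],
     ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.70, 1.86],
     ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.60, 1.65],
     ["Coconut", "Bahamas", "20200110", "15:30:00", 2.58, 2.85, 2.50, 2.50],
     ["Coconut", "Bahamas", "20200110", "22:00:00", 2.68, 2.78, 2.40, 2.60],
     ["Coconut", "Bahamas", "20200111", "15:30:00", 2.88, 2.95, 2.70, 2.86],
     ["Coconut", "Bahamas", "20200111", "22:00:00", 2.78, 2.88, 2.60, 2.65]],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

price_cols = ["Open", "High", "Low", "Close"]

# One row per Name/Origin/Date: first open, highest high, lowest low, last close.
daily = (
    df2.groupby(["Name", "Origin", "Date"], sort=False)
    .agg(Open=("Open", "first"), High=("High", "max"),
         Low=("Low", "min"), Close=("Close", "last"))
    .reset_index()
)

# Same daily table, relabelled so it can be matched against "Date+1".
next_day = daily.rename(
    columns={"Date": "Date+1", **{c: f"{c}+1" for c in price_cols}}
)

out = (
    skeleton
    .merge(daily, on=["Name", "Origin", "Date"], how="left")
    .merge(next_day, on=["Name", "Origin", "Date+1"], how="left")
)

# Restore the column order used in the desired output.
out = out[["Name", "Origin", "Date", *price_cols,
           "Date+1", *[f"{c}+1" for c in price_cols]]]
print(out)
```

The rename happens on a throwaway copy of the daily table, so the aggregation is written only once even though the target columns have two sets of names.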
Edit: found a simpler solution using groupby.
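Basically, you pd.concat your data with a copy of itself that has been shifted back by one row, then do a bit of cleanup. The resulting df4 is what you are looking for. This is not the most efficient answer, but it produces the desired result. The approach mainly uses plain Python functions and pandas DataFrames; the data can be loaded by copying the table with Ctrl+C or by entering it manually.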
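The original code for this approach is not included here, so the following is only one reading of the description, with assumed names (daily, shifted) apart from df4: aggregate to daily rows, shift a copy up by one row within each Name/Origin group so every row sits next to the following day, concatenate the two side by side, and drop the rows that have no following day. It assumes the rows are already in date order within each group.

```python
import pandas as pd

# Intraday data, copied from the sample code in the question.
df2 = pd.DataFrame(
    [["Bananas", "Bali", "20200108", "15:30:00", 1.58, 1.85, 1.50, 1.50],
     ["Bananas", "Bali", "20200108", "22:00:00", 1.68, 1.78, 1.40, 1.60],
     ["Bananas", "Bali", "20200109", "15:30:00", 1.88, 1.95, 1.70, 1.86],
     ["Bananas", "Bali", "20200109", "22:00:00", 1.78, 1.88, 1.60, 1.65],
     ["Coconut", "Bahamas", "20200110", "15:30:00", 2.58, 2.85, 2.50, 2.50],
     ["Coconut", "Bahamas", "20200110", "22:00:00", 2.68, 2.78, 2.40, 2.60],
     ["Coconut", "Bahamas", "20200111", "15:30:00", 2.88, 2.95, 2.70, 2.86],
     ["Coconut", "Bahamas", "20200111", "22:00:00", 2.78, 2.88, 2.60, 2.65]],
    columns=["Name", "Origin", "Date", "Time", "Open", "High", "Low", "Close"],
)

price_cols = ["Open", "High", "Low", "Close"]

# One row per Name/Origin/Date.
daily = (
    df2.groupby(["Name", "Origin", "Date"], sort=False)
    .agg(Open=("Open", "first"), High=("High", "max"),
         Low=("Low", "min"), Close=("Close", "last"))
    .reset_index()
)

# Shift each product's rows up by one so row i holds the values of day i+1,
# and relabel the shifted columns as Date+1, Open+1, ...
shifted = (
    daily.groupby(["Name", "Origin"], sort=False)[["Date", *price_cols]]
    .shift(-1)
    .add_suffix("+1")
)

# Side-by-side concat aligns on the row index; the last day of each product
# has no following day, so those rows are dropped.
df4 = (
    pd.concat([daily, shifted], axis=1)
    .dropna(subset=["Date+1"])
    .reset_index(drop=True)
)
print(df4)
```

With more than two days per product this pairs every day with its successor, so an extra filter on the wanted start dates would be needed to match a specific first DataFrame.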