Adding a new column to a pandas DataFrame and doing column arithmetic
In [1]: from datetime import datetime
In [2]: import os
In [3]: import pandas as pd
In [4]: file_path = os.path.normpath('F:/EUR/data.csv')
In [5]: parse = lambda x: datetime.strptime(x, '%d.%m.%Y %H:%M:%S')
In [6]: df = pd.read_csv(file_path, parse_dates=[[0, 1]], date_parser=parse, index_col=[0], header=None)
In [7]: keys = ['Open', 'High', 'Low', 'Close']
In [8]: df.columns = [x for x in keys]
In [9]: grouped = df.groupby([df.index.year, df.index.day])
In [10]: df[:5]
Out[10]:
                       Open    High     Low   Close
0_1
2007-01-02 23:30:00  1.3198  1.3205  1.3197  1.3203
2007-01-02 00:00:00  1.3203  1.3206  1.3200  1.3205
2007-01-02 00:30:00  1.3205  1.3213  1.3205  1.3212
2007-01-02 01:00:00  1.3212  1.3217  1.3211  1.3214
2007-01-02 01:30:00  1.3214  1.3226  1.3213  1.3225
1. I want to do a simple calculation on a grouped object and put the result in a new column, for example:
if df['Close'] > df['Open']:
    df['sum'] = df['Close'] - df['Open']
2. Also, why can't I group like this: grouped = df.groupby([df.index.year, df.index.day, df['Close'] > df['Open']])
I don't fully understand how groupby works.
3. How do I put the result in a new column, for example:
for (k1, k2), group in grouped:
    df['new_col'] = group[group['Close'] > group['Open']]['Close'] - group[group['Close'] > group['Open']]['Open']
Or maybe there is a better way.
1 Answer
Have you tried this?
grouped = df.groupby([df.index.year,df.index.day])
df['sum'] = grouped.apply(lambda x: x.Open + x.Close)
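If the goal is the conditional difference from point 1 (Close - Open only where Close > Open), a vectorized version may be easier than looping over groups. The following is only a sketch under assumptions: the small DataFrame is made-up sample data standing in for the CSV, and the column name group_sum is hypothetical. It also shows that, at least in recent pandas versions, a boolean Series aligned with the frame can be passed as an extra grouping key, as in point 2.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
index = pd.to_datetime([
    "2007-01-02 00:00:00",
    "2007-01-02 00:30:00",
    "2007-01-03 00:00:00",
    "2007-01-03 00:30:00",
])
df = pd.DataFrame(
    {
        "Open": [1.3203, 1.3205, 1.3212, 1.3225],
        "High": [1.3206, 1.3213, 1.3217, 1.3226],
        "Low": [1.3200, 1.3205, 1.3209, 1.3213],
        "Close": [1.3205, 1.3212, 1.3210, 1.3225],
    },
    index=index,
)

# Boolean mask: True on rows where the bar closed above its open.
up = df["Close"] > df["Open"]

# Row-wise difference, kept only where the mask is True; other rows become NaN.
df["sum"] = (df["Close"] - df["Open"]).where(up)

# Per-(year, day) total of those differences, broadcast back onto every row.
# transform returns a result aligned with the original index, so it can be
# assigned directly as a new column. "group_sum" is a hypothetical name.
keys = [df.index.year, df.index.day]
df["group_sum"] = df["sum"].groupby(keys).transform("sum")

# The boolean mask can also act as a third grouping key.
sizes = df.groupby([df.index.year, df.index.day, up]).size()

print(df)
print(sizes)
```

The design choice here is to compute the mask once and reuse it. where leaves NaN on rows that do not satisfy the condition, and transform keeps the original index, which avoids the index mismatch that can occur when assigning the output of a grouped apply back to the frame.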