Conditionally performing calculations on a pandas DataFrame

Posted 2024-04-18 22:37:03


time_period    total_cost    total_revenue
7days          150           250
14days         350           600
30days         900           750
7days          180           400
14days         430           620

Given this data, I want to convert the total_cost and total_revenue columns into per-day averages over each row's time period. I thought this would work:

^{pr2}$

But it returns the DataFrame unchanged.
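In other words, each value should be divided by the number of days in its own row's period (for example 150 / 7 for the first row). A minimal sketch of that conditional lookup, assuming the three labels shown are the only periods; the PERIOD_DAYS dictionary is a hypothetical name, not part of the question:

```python
import pandas as pd

df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})

# Hypothetical lookup table: period label -> length in days
PERIOD_DAYS = {"7days": 7, "14days": 14, "30days": 30}

# Series.map picks the divisor for each row (labels missing from the dict become NaN)
days = df["time_period"].map(PERIOD_DAYS)

for col in ["total_cost", "total_revenue"]:
    # Assigning a whole new column updates df itself rather than a temporary copy
    df[col] = df[col] / days

print(df)
```

Because the division is done column-wise, there is no per-row loop whose changes could be lost.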


2 answers

I believe you are operating on a copy of the DataFrame. I think you should use apply instead:

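from io import StringIO  # Python 3 location of StringIO
import pandas

datastring = StringIO("""\
time_period    total_cost    total_revenue
7days          150           250
14days         350           600
30days         900           750
7days          180           400
14days         430           620
""")

# Columns are separated by two or more spaces; regex separators need the python engine
data = pandas.read_table(datastring, sep=r'\s\s+', engine='python')

# Drop the trailing "days" (4 characters) to get the period length, then divide row by row
data['total_cost_avg'] = data.apply(
    lambda row: row['total_cost'] / float(row['time_period'][:-4]),
    axis=1
)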

That gives me:

^{pr2}$
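To illustrate the "copy" explanation, here is a minimal sketch assuming the question's loop iterated with DataFrame.iterrows and assigned into each row; the loop itself is hypothetical. The pandas documentation warns that iterrows may return a copy rather than a view, in which case writes to the row have no effect on the frame:

```python
import pandas as pd

df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})
original = df.copy()

# Hypothetical loop: with mixed column dtypes, each `row` is a new Series
# built from the frame's values, so assigning into it never reaches df.
for _, row in df.iterrows():
    days = int(row["time_period"][:-4])  # "14days" -> 14
    row["total_cost"] = row["total_cost"] / days

print(df.equals(original))  # True: df is unchanged
```

Computing whole columns at once, as both answers do, sidesteps the problem.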

Paul's answer is good. Adding my approach here:

import pandas as pd

test_df = pd.read_csv("file1.csv")
test_df

   time_period      total_cost   total_revenue
0    7days          150        250
1    14days         350        600
2    30days         900        750
3    7days          180        400
4    14days         430        620

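# Pull the number out of e.g. "14days"; expand=False returns a Series that astype(int) can convert
test_df['days'] = test_df.time_period.str.extract(r'(\d+)days', expand=False).astype(int)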
test_df['total_cost'] = test_df.total_cost / test_df.days
test_df['total_revenue'] = test_df.total_revenue / test_df.days
del test_df['days']
test_df


   time_period   total_cost       total_revenue
0    7days       21.428571          35.714286
1    14days      25.000000          42.857143
2    30days      30.000000          25.000000
3    7days       25.714286          57.142857
4    14days      30.714286          44.285714
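The two per-column divisions can also be done in a single step. A sketch assuming the same column names, using DataFrame.div with axis=0 so that each row is divided by its own day count; it should produce the same values as the output above:

```python
import pandas as pd

df = pd.DataFrame({
    "time_period": ["7days", "14days", "30days", "7days", "14days"],
    "total_cost": [150, 350, 900, 180, 430],
    "total_revenue": [250, 600, 750, 400, 620],
})

# Period length in days: "14days" -> 14 (expand=False returns a Series)
days = df["time_period"].str.extract(r"(\d+)days", expand=False).astype(int)

cols = ["total_cost", "total_revenue"]
# axis=0 aligns `days` with the row index, so each row is divided by its own period
averages = df[cols].div(days, axis=0)

# Rebuild the frame instead of mutating df, keeping the label column first
result = df[["time_period"]].join(averages)
print(result)
```

Building a new frame avoids the temporary days column and the del step; if you prefer to update in place, assigning averages back to df[cols] works too.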
