我有以下三个数据帧:
我想要的实际列是标题代码、类型、第1周附加率、第4周附加率、有限附加率和总附加单位。你知道吗
Week1A=第1周销售的附加产品。
Week1U=附件产品发布第一周之前(含第一周)售出的图书。
Week4A=截至第4周(含第4周)销售的附加产品。
Week4U=附件产品发布第4周之前(含第4周)售出的图书。
LTDA=截至最近日期销售的附加产品。
LTDU=截至最近日期的图书销售情况。
例如,['Week 1 Attach Rate'] = ['Week1A']/['Week1U'
],['Week 4 Attach Rate'] = ['Week4A']/['Week4U']
,['LTD Attach Rate'] = ['LTDA']/['LTD1U']
数据帧
import pandas as pd
from io import StringIO
attachproduct = StringIO("""TitleCode,Type,WeekEnding,TotalUnits
A,1,12/16/2017 0:00,548
A,1,12/23/2017 0:00,74
A,1,12/30/2017 0:00,87
A,1,1/6/2018 0:00,4
A,1,1/13/2018 0:00,878
A,2,12/16/2017 0:00,49
A,2,12/23/2017 0:00,8498
A,2,12/30/2017 0:00,84
A,2,1/6/2018 0:00,74
A,2,1/13/2018 0:00,453
B,1,12/23/2017 0:00,68
B,1,12/30/2017 0:00,573
B,1,1/6/2018 0:00,75
B,1,1/13/2018 0:00,752
B,1,1/20/2018 0:00,75
B,2,12/23/2017 0:00,17
B,2,12/30/2017 0:00,98
B,2,1/6/2018 0:00,7875
B,2,1/13/2018 0:00,73
B,2,1/20/2018 0:00,75
C,1,12/23/2017 0:00,79
C,1,12/30/2017 0:00,75727
C,1,1/6/2018 0:00,77
C,1,1/13/2018 0:00,3727
C,1,1/20/2018 0:00,72
C,1,1/27/2018 0:00,7275
C,1,2/3/2018 0:00,27
""")
primaryproduct = StringIO("""TitleCode,Type,WeekEnding,TotalUnits
A,1,11/11/2017 0:00,830
A,1,11/18/2017 0:00,830
A,1,11/25/2017 0:00,830
A,1,12/2/2017 0:00,132
A,1,12/9/2017 0:00,161
A,1,12/16/2017 0:00,6
A,1,12/23/2017 0:00,1701
A,1,12/30/2017 0:00,1240
A,2,11/11/2017 0:00,141
A,2,11/18/2017 0:00,141
A,2,11/25/2017 0:00,141
A,2,12/2/2017 0:00,22388
A,2,12/9/2017 0:00,255
A,2,12/16/2017 0:00,90
A,2,12/23/2017 0:00,1471
A,2,12/30/2017 0:00,1010
A,2,1/6/2018 0:00,8
A,2,1/13/2018 0:00,9
B,1,12/2/2017 0:00,254
B,1,12/9/2017 0:00,1022
B,1,12/16/2017 0:00,241
B,1,12/23/2017 0:00,1532
B,1,12/30/2017 0:00,122
B,1,1/6/2018 0:00,442
B,1,1/13/2018 0:00,761
B,1,1/20/2018 0:00,1081
B,2,12/2/2017 0:00,49
B,2,12/9/2017 0:00,351
B,2,12/16/2017 0:00,19951
B,2,12/23/2017 0:00,253
B,2,12/30/2017 0:00,282
B,2,1/6/2018 0:00,601
B,2,1/13/2018 0:00,921
B,2,1/20/2018 0:00,1241
C,1,11/25/2017 0:00,273
C,1,12/2/2017 0:00,151944
C,1,12/9/2017 0:00,95
C,1,12/16/2017 0:00,8736
C,1,12/23/2017 0:00,172
C,1,12/30/2017 0:00,15005
C,1,1/6/2018 0:00,51
C,1,1/13/2018 0:00,52
C,1,1/20/2018 0:00,45
C,1,1/27/2018 0:00,6
C,1,2/3/2018 0:00,55
""")
expected = StringIO("""TitleCode,Type,Week1A,Week4A,LTDA,Week1M,Week4M,LTDM,Week 1 Attach Rate,4 Week Attach Rate,LTD Attach Rate,Total Attached Units
A,1,548,713,1591,2789,5731,5731,19.6%,12.4%,27.8%,1591
A,2,49,8705,9158,23155,25644,25653,0.2%,33.9%,35.7%,9158
B,1,68,1468,1543,3049,301644,5455,2.2%,0.5%,28.3%,1543
B,2,17,8063,8138,20604,22408,23648,0.1%,36.0%,34.4%,8138
C,1,79,79610,86984,161220,176327,176433,0.0%,45.1%,49.3%,86984
""")
attachdf = pd.read_csv(attachproduct, parse_dates=True)
primarydf = pd.read_csv(primaryproduct, parse_dates=True)
expecteddf = pd.read_csv(expected, parse_dates=True)
attachdf['WeekEnding']=pd.to_datetime(attachdf['WeekEnding'])
primarydf['WeekEnding']=pd.to_datetime(primarydf['WeekEnding'])
获取最新生活价值是直截了当的,但我不知道我了解什么是最好的方法,为约束的第1周和第4周利率。你知道吗
ltdattach=attachdf.groupby(['TitleCode','Type']).sum())/(primarydf.groupby(['TitleCode','Type']).sum())
考虑使用
groupby
和transform
计算列,并连接或合并helper dataframes、week1df和week4df:输出
相关问题 更多 >
编程相关推荐