I have a pandas DataFrame that can be simplified as follows:
import numpy as np
import pandas as pd

# Data1..Data3 start out empty; Count numbers the rows within each (Month, Day) group.
df = pd.DataFrame(
    [['January', 'Monday', np.nan, np.nan, np.nan, 1, 20],
     ['January', 'Monday', np.nan, np.nan, np.nan, 2, 25],
     ['February', 'Monday', np.nan, np.nan, np.nan, 1, 15],
     ['February', 'Monday', np.nan, np.nan, np.nan, 2, 20],
     ['February', 'Monday', np.nan, np.nan, np.nan, 3, 25],
     ['March', 'Tuesday', np.nan, np.nan, np.nan, 1, 50],
     ['March', 'Wednesday', np.nan, np.nan, np.nan, 1, 75]],
    columns=['Month', 'Day', 'Data1', 'Data2', 'Data3', 'Count', 'Initial_Data'])
      Month        Day  Data1  Data2  Data3  Count  Initial_Data
0   January     Monday    NaN    NaN    NaN      1            20
1   January     Monday    NaN    NaN    NaN      2            25
2  February     Monday    NaN    NaN    NaN      1            15
3  February     Monday    NaN    NaN    NaN      2            20
4  February     Monday    NaN    NaN    NaN      3            25
5     March    Tuesday    NaN    NaN    NaN      1            50
6     March  Wednesday    NaN    NaN    NaN      1            75
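Goal for the new DataFrame: I want to group the data by Month and Day and fill the columns Data1, Data2 and Data3 with the numbers from Initial_Data. For example, for January and Monday I want Data1 = 20 and Data2 = 25, with Data3 left as NaN, because the highest Count for January and Monday is 2. For February and Monday I want Data1 = 15, Data2 = 20 and Data3 = 25, because the highest Count for February and Monday is 3. For March and Tuesday I want Data1 = 50 with Data2 and Data3 as NaN, and for March and Wednesday I want Data1 = 75 with Data2 and Data3 as NaN, because their highest Count is 1. The final data should look like this: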
      Month        Day  Data1  Data2  Data3
0   January     Monday     20   25.0    NaN
1   January     Monday     20   25.0    NaN
2  February     Monday     15   20.0   25.0
3  February     Monday     15   20.0   25.0
4  February     Monday     15   20.0   25.0
5     March    Tuesday     50    NaN    NaN
6     March  Wednesday     75    NaN    NaN
I tried using if statements, but that did not work because I could not find a way to fill all three columns (Data1, Data2 and Data3). Many thanks.
You can try this:
Output:
Details:
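First, use set_index and unstack to move the innermost index level, 'Count', into the columns, which reshapes the DataFrame from long to wide. Then add the prefix 'Data' to the column headers. Next, merge (or join) the two DataFrames on the Month and Day columns.
This is my answer, but Scott beat me to it with a better one.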
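The answer's original code did not survive in this copy of the thread, so what follows is only a minimal sketch of the steps the first answer describes (set_index, unstack, a 'Data' prefix, then a merge on Month and Day). The variable names `wide` and `result` are illustrative, and dropping the empty Data1..Data3 columns before merging is an assumption rather than something the answer confirms.

```python
import numpy as np
import pandas as pd

# Same data as in the question, built column by column.
df = pd.DataFrame({
    "Month": ["January", "January", "February", "February", "February", "March", "March"],
    "Day": ["Monday"] * 5 + ["Tuesday", "Wednesday"],
    "Data1": np.nan,
    "Data2": np.nan,
    "Data3": np.nan,
    "Count": [1, 2, 1, 2, 3, 1, 1],
    "Initial_Data": [20, 25, 15, 20, 25, 50, 75],
})

# Reshape long -> wide: one row per (Month, Day), one column per Count value.
# 'wide' is an illustrative name, not taken from the original answer.
wide = (
    df.set_index(["Month", "Day", "Count"])["Initial_Data"]
      .unstack("Count")      # Count values 1, 2, 3 become column labels
      .add_prefix("Data")    # 1 -> 'Data1', 2 -> 'Data2', 3 -> 'Data3'
      .reset_index()
)

# Assumption: discard the empty Data1..Data3 columns, then attach the wide
# values to every original row. how="left" keeps the original row order.
result = df[["Month", "Day"]].merge(wide, on=["Month", "Day"], how="left")
print(result)
```

This relies on each (Month, Day, Count) combination being unique; with duplicates, unstack raises a ValueError. Because unstack introduces NaN for the missing combinations, the filled columns come back as floats.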