Tables and columns

Posted 2024-04-25 03:51:40


I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame()
df['SubjectArea'] = ["a","b","a","c","a","s","d","b","s","a","s","c","s","z","a"]
df['Articles'] = [10, 20,5,58,98,15,35,89,47,15,25,145,89,689,25]
df['NoOfReading'] = [30, 40,45,25,35,88,68,98,45,125,255,np.nan,75,125,265]
df
    SubjectArea Articles    NoOfReading
0       a         10            30.0
1       b         20            40.0
2       a         5             45.0
3       c         58            25.0
4       a         98            35.0
5       s         15            88.0
6       d         35            68.0
7       b         89            98.0
8       s         47            45.0
9       a         15            125.0
10      s         25            255.0
11      c         145           NaN
12      s         89            75.0
13      z         689           125.0
14      a         25            265.0

For each subject area, I want to build a DataFrame like the one below, ranked by weighted average:

df = df.fillna(0)  # treat missing reading counts as 0
df["weightedAverage"] = df["Articles"]*0.35 + df["NoOfReading"]*0.65
df2 = df[df["SubjectArea"]=="a"].copy()  # copy so later assignments do not raise SettingWithCopyWarning
df2 = df2.sort_values(by="weightedAverage", ascending=False)
df2['Rank'] = df2['weightedAverage'].rank(method='dense', ascending=False)
df2 = df2.reset_index(drop=True)
df2
    SubjectArea Articles    NoOfReading weightedAverage Rank
0       a       25          265.0           181.00      1.0
1       a       15          125.0           86.50       2.0
2       a       98          35.0            57.05       3.0
3       a       5           45.0            31.00       4.0
4       a       10          30.0            23.00       5.0

So I want to produce the same result for every SubjectArea, with the rank and the weighted average, like this:

   SubjectArea Articles    NoOfReading weightedAverage Rank
0       a       25          265.0           181.00      1.0
1       a       15          125.0           86.50       2.0
2       a       98          35.0            57.05       3.0
3       a       5           45.0            31.00       4.0
4       a       10          30.0            23.00       5.0
  SubjectArea  Articles    NoOfReading weightedAverage Rank
0       b       89          98.0            94.85       1.0
1       b       20          40.0            33.00       2.0
...

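Is it possible to build something like this with a pivot table that includes the rank, or is there another way?

Any help would be much appreciated. Thanks in advance.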


1 answer
User
#1 · Posted 2024-04-25 03:51:40

You can assign the ranks with groupby, without splitting the data by subject:

df["weightedAverage"] = df["Articles"]*0.35 + df["NoOfReading"]*0.65

df['Rank'] = df.groupby('SubjectArea')['weightedAverage'].rank()

df = df.sort_values(['SubjectArea', 'Rank'])

Output:

   SubjectArea  Articles  NoOfReading  weightedAverage  Rank
0            a        10         30.0            23.00   1.0
2            a         5         45.0            31.00   2.0
4            a        98         35.0            57.05   3.0
9            a        15        125.0            86.50   4.0
14           a        25        265.0           181.00   5.0
1            b        20         40.0            33.00   1.0
7            b        89         98.0            94.85   2.0
3            c        58         25.0            36.55   1.0
11           c       145          NaN              NaN   NaN
6            d        35         68.0            56.45   1.0
8            s        47         45.0            45.70   1.0
5            s        15         88.0            62.45   2.0
12           s        89         75.0            79.90   3.0
10           s        25        255.0           174.50   4.0
13           z       689        125.0           322.40   1.0
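
The ranks above run in ascending order, and row 11 stays NaN because the missing NoOfReading was not filled. To match the layout in the question (rank 1 for the highest weighted average, missing values treated as 0), one option is to pass the question's own `method='dense', ascending=False` arguments to the grouped rank. This is only a sketch built on the question's data; the name `ranked` is illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SubjectArea": ["a", "b", "a", "c", "a", "s", "d", "b", "s", "a", "s", "c", "s", "z", "a"],
    "Articles": [10, 20, 5, 58, 98, 15, 35, 89, 47, 15, 25, 145, 89, 689, 25],
    "NoOfReading": [30, 40, 45, 25, 35, 88, 68, 98, 45, 125, 255, np.nan, 75, 125, 265],
})

# Treat a missing reading count as 0, as the question does.
df = df.fillna(0)
df["weightedAverage"] = df["Articles"] * 0.35 + df["NoOfReading"] * 0.65

# Dense rank within each subject; rank 1 goes to the highest weighted average.
df["Rank"] = df.groupby("SubjectArea")["weightedAverage"].rank(
    method="dense", ascending=False
)

# `ranked` is an illustrative name: one frame, grouped by subject, best rank first.
ranked = df.sort_values(["SubjectArea", "Rank"]).reset_index(drop=True)
print(ranked)
```

Keeping everything in one frame avoids a separate filter and sort per subject; a single subject can still be pulled out afterwards with `ranked[ranked["SubjectArea"] == "a"]`.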

Note: in general, if you want to access the sub-DataFrames defined by a column, looping over groupby is faster:

for subject, data in df.groupby('SubjectArea'):
    # do something with `data`, e.g. print the size of each group
    print(subject, len(data))
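
If a separate DataFrame per subject really is needed, as in the question, the loop above could collect them in a dictionary. The following is a sketch under that assumption, reusing the question's data and its dense, descending ranking; `frames` is a hypothetical name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SubjectArea": ["a", "b", "a", "c", "a", "s", "d", "b", "s", "a", "s", "c", "s", "z", "a"],
    "Articles": [10, 20, 5, 58, 98, 15, 35, 89, 47, 15, 25, 145, 89, 689, 25],
    "NoOfReading": [30, 40, 45, 25, 35, 88, 68, 98, 45, 125, 255, np.nan, 75, 125, 265],
})
df = df.fillna(0)
df["weightedAverage"] = df["Articles"] * 0.35 + df["NoOfReading"] * 0.65

# `frames` (hypothetical name) maps each subject to its own ranked DataFrame.
frames = {}
for subject, data in df.groupby("SubjectArea"):
    # Sort best-first and renumber the index from 0, as in the question.
    data = data.sort_values("weightedAverage", ascending=False).reset_index(drop=True)
    data["Rank"] = data["weightedAverage"].rank(method="dense", ascending=False)
    frames[subject] = data

print(frames["b"])
```

Each entry, for example `frames["a"]`, then has the shape of the per-subject table the question asks for.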
