大Pandas亲缘关系价值的累积排序

2024-03-28 14:48:52 发布

您现在位置:Python中文网/ 问答频道 /正文

我正试图找到一种方法来计算出熊猫之间的联系。在

让我们从田径比赛中获取假设数据,在那里我有人、比赛、热身和时间。在

每个人的位置如下:

对于给定的比赛/热身组合:

  • 把时间最短的人放在第一位
  • 时间倒数第二的人排在第二位

等等。。。在

这将是相当简单的代码,但首先。。在

如果两个人有相同的时间,他们都得到相同的地方,然后下一个时间大于他们的时间将有这个值+1作为位置。在

在下表中,对于100码短跑,热火1,RUNNER1第一名,RUNNER2/RUNNER3获得第二名,RUNNER3获得第三名(下一次是在RUNNER2/RUNNER3之后)

基本上,逻辑如下:

如果比赛<>;比赛。转移()或加热>;热转换()则位置=1

如果种族=比赛。转移()和热量=热转换()和时间>;时间。转移然后放置=地点.班次()+1

如果种族=比赛。转移()和热量=热转换()和时间>;时间。转移然后放置=地点.班次()

让我困惑的是如何处理关系。否则我可以做些

df['Place']=np.where(
              (df['race']==df['race'].shift())
              &
              (df['heat']==df['heat'].shift()),
              df['Place'].shift()+1,
              1)

谢谢你!在

示例数据如下:

^{pr2}$

最后我想要的是

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,1,9.96,3
RUNNER5,100 Yard Dash,1,9.97,4
RUNNER6,100 Yard Dash,1,10.01,5
RUNNER7,100 Yard Dash,2,9.88,1
RUNNER8,100 Yard Dash,2,9.93,2
RUNNER9,100 Yard Dash,2,9.93,2
RUNNER10,100 Yard Dash,2,10.03,3
RUNNER11,100 Yard Dash,2,10.26,4
RUNNER7,200 Yard Dash,1,19.63,1
RUNNER8,200 Yard Dash,1,19.67,2
RUNNER9,200 Yard Dash,1,19.72,3
RUNNER10,200 Yard Dash,1,19.72,3
RUNNER11,200 Yard Dash,1,19.86,4
RUNNER12,200 Yard Dash,1,19.92,4

[编辑]现在,再往前走一步。。

假设一旦我离开一组唯一值,下一次该组出现时,这些值将重置为1。。在

所以,举例来说,-注意到它是“热度1”,然后是“热度2”,又回到了“热度1”-我不希望排名从之前的“热度1”继续下去,而是希望它们重新设置。在

Person,Race,Heat,Time,Place
RUNNER1,100 Yard Dash,1,9.87,1
RUNNER2,100 Yard Dash,1,9.92,2
RUNNER3,100 Yard Dash,1,9.92,2
RUNNER4,100 Yard Dash,2,9.96,1
RUNNER5,100 Yard Dash,2,9.97,2
RUNNER6,100 Yard Dash,2,10.01,3
RUNNER7,100 Yard Dash,1,9.88,1
RUNNER8,100 Yard Dash,1,9.93,2
RUNNER9,100 Yard Dash,1,9.93,2

Tags: 数据gtdfshift时间placedash热度
1条回答
网友
1楼 · 发布于 2024-03-28 14:48:52

您可以使用:

grouped =  df.groupby(['Race','Heat'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)

^{pr2}$

收益率

    Heat    Person           Race   Time  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26    4.0   5.0
11     1   RUNNER7  200 Yard Dash  19.63    1.0   1.0
12     1   RUNNER8  200 Yard Dash  19.67    2.0   2.0
13     1   RUNNER9  200 Yard Dash  19.72    3.0   3.0
14     1  RUNNER10  200 Yard Dash  19.72    3.0   3.0
15     1  RUNNER11  200 Yard Dash  19.86    4.0   5.0
16     1  RUNNER12  200 Yard Dash  19.92    5.0   6.0

请注意,Pandas有一个^{}方法,它可以计算许多常见的等级形式,但不是您描述的那种。请注意,例如在第3行,第二个和第三个跑步者平局后,Rank是4,而Place是3。在


关于编辑:使用

(df['Heat'] != df['Heat'].shift()).cumsum()

要消除热量的歧义:

import pandas as pd
df = pd.DataFrame({'Heat': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1], 'Person': ['RUNNER1', 'RUNNER2', 'RUNNER3', 'RUNNER4', 'RUNNER5', 'RUNNER6', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER7', 'RUNNER8', 'RUNNER9', 'RUNNER10', 'RUNNER11', 'RUNNER12'], 'Race': ['100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash', '100 Yard Dash'], 'Time': [9.8699999999999992, 9.9199999999999999, 9.9199999999999999, 9.9600000000000009, 9.9700000000000006, 10.01, 9.8800000000000008, 9.9299999999999997, 9.9299999999999997, 10.029999999999999, 10.26, 19.629999999999999, 19.670000000000002, 19.719999999999999, 19.719999999999999, 19.859999999999999, 19.920000000000002]})

df['HeatGroup'] = (df['Heat'] != df['Heat'].shift()).cumsum()
grouped =  df.groupby(['Race','HeatGroup'])
df['Place'] = grouped['Time'].transform(lambda x: pd.factorize(x, sort=True)[0]+1)
df['Rank'] = grouped['Time'].rank(method='min')
print(df)

收益率

    Heat    Person           Race   Time  HeatGroup  Place  Rank
0      1   RUNNER1  100 Yard Dash   9.87          1    1.0   1.0
1      1   RUNNER2  100 Yard Dash   9.92          1    2.0   2.0
2      1   RUNNER3  100 Yard Dash   9.92          1    2.0   2.0
3      1   RUNNER4  100 Yard Dash   9.96          1    3.0   4.0
4      1   RUNNER5  100 Yard Dash   9.97          1    4.0   5.0
5      1   RUNNER6  100 Yard Dash  10.01          1    5.0   6.0
6      2   RUNNER7  100 Yard Dash   9.88          2    1.0   1.0
7      2   RUNNER8  100 Yard Dash   9.93          2    2.0   2.0
8      2   RUNNER9  100 Yard Dash   9.93          2    2.0   2.0
9      2  RUNNER10  100 Yard Dash  10.03          2    3.0   4.0
10     2  RUNNER11  100 Yard Dash  10.26          2    4.0   5.0
11     1   RUNNER7  100 Yard Dash  19.63          3    1.0   1.0
12     1   RUNNER8  100 Yard Dash  19.67          3    2.0   2.0
13     1   RUNNER9  100 Yard Dash  19.72          3    3.0   3.0
14     1  RUNNER10  100 Yard Dash  19.72          3    3.0   3.0
15     1  RUNNER11  100 Yard Dash  19.86          3    4.0   5.0
16     1  RUNNER12  100 Yard Dash  19.92          3    5.0   6.0

相关问题 更多 >