In Pandas, how do I get the fraction of occurrences within a MultiIndex level?

import pandas as pd
import dateutil.parser

df = pd.DataFrame({
    'Type': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'foo'],
    'Time': ['9:11', '9:54', '15:12', '11:39', '21:50', '15:40', '1:23', '1:48', '9:13', '9:48'],
})

            Time  Fraction
_hour Type
1     foo      2      1.0
9     bar      1      0.25
      foo      3      0.75
11    bar      1      1.0
15    bar      1      0.5
      foo      1      0.5
21    foo      1      1.0

2 Answers

User

Answer 1 · edited 2024-04-24 15:07:13

You can group by the _hour index level and use transform (or apply) to compute the fraction of each count within its hour:

grouped_count['Fraction'] = grouped_count.groupby(level='_hour').Time.transform(lambda x: x/x.sum())

grouped_count
#            Time  Fraction
#_hour Type                
#1     foo      2      1.00
#9     bar      1      0.25
#      foo      3      0.75
#11    bar      1      1.00
#15    bar      1      0.50
#      foo      1      0.50
#21    foo      1      1.00
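
The snippet above relies on grouped_count and the _hour level, neither of which is constructed anywhere in this thread. Here is a minimal end-to-end sketch, assuming _hour is simply the integer hour taken from the Time string and grouped_count is the row count per (_hour, Type). Both are assumptions, since the asker's own derivation is not shown:

```python
import pandas as pd

df = pd.DataFrame({
    'Type': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'foo'],
    'Time': ['9:11', '9:54', '15:12', '11:39', '21:50', '15:40', '1:23', '1:48', '9:13', '9:48'],
})

# Assumption: _hour is the integer part before the colon in Time.
df['_hour'] = df['Time'].str.split(':').str[0].astype(int)

# Assumption: grouped_count holds the number of rows per (_hour, Type),
# kept as a one-column DataFrame with a MultiIndex.
grouped_count = df.groupby(['_hour', 'Type'])[['Time']].count()

# Divide each count by the total count of its _hour group.
grouped_count['Fraction'] = (
    grouped_count.groupby(level='_hour')['Time'].transform(lambda x: x / x.sum())
)

print(grouped_count)
```

Because transform returns a result aligned to the original index, the new column can be assigned directly without any merge.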

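If you don't need the Time count column, you can also call .value_counts(normalize=True) on the grouped Type column, which returns the fractions directly: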

df.groupby('_hour').Type.value_counts(normalize=True)
#_hour  Type
#1      foo     1.00
#9      foo     0.75
#       bar     0.25
#11     bar     1.00
#15     bar     0.50
#       foo     0.50
#21     foo     1.00
#Name: Type, dtype: float64

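Since Time is a standard h:m string, you can also extract the hour inline with a regular expression instead of relying on a separate _hour column: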

df.groupby(df.Time.str.extract(r'^(\d+)', expand=False)).Type.value_counts(normalize=True)
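
One detail worth noting: str.extract returns strings, so the groups would be ordered as text ('1', '11', '15', '21', '9') rather than numerically. A small self-contained sketch that casts the extracted hour to int first; the names hour and fractions are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Type': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'foo', 'foo'],
    'Time': ['9:11', '9:54', '15:12', '11:39', '21:50', '15:40', '1:23', '1:48', '9:13', '9:48'],
})

# Leading digits of each h:m string, cast to int so groups sort numerically.
hour = df['Time'].str.extract(r'^(\d+)', expand=False).astype(int).rename('hour')

# Share of each Type within every hour.
fractions = df.groupby(hour)['Type'].value_counts(normalize=True)

print(fractions)
```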

User

Answer 2 · edited 2024-04-24 15:07:13

Use:

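# extract the hour as an integer Series named 'hour'
h = df['Time'].str.split(':').str[0].astype(int).rename('hour')
# group by the Series h (instead of a column) together with 'Type'
grouped_count = df.groupby([h, 'Type'])['Time'].count().to_frame()
# divide each count by the total of its first-level group (hour);
# sum(level=0) was removed in pandas 2.0, so use groupby(level=0).sum()
totals = grouped_count['Time'].groupby(level=0).sum()
grouped_count['Fraction'] = grouped_count['Time'].div(totals, level=0)
print(grouped_count)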
           Time  Fraction
hour Type                
1    foo      2      1.00
9    bar      1      0.25
     foo      3      0.75
11   bar      1      1.00
15   bar      1      0.50
     foo      1      0.50
21   foo      1      1.00
