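<p>Here is one approach that relies on vectorized operations built around <code>groupby.transform</code>. I define the <code>Breakdown</code> series as lists of tuples, since that is the most flexible format. You can apply a specific string format instead if you prefer.</p>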
<pre><code>import pandas as pd

df = pd.DataFrame({'Date': ['2018-04-22', '2018-04-22', '2018-04-22', '2018-04-24'],
                   'Category': ['Outage', 'Outage', 'Outage', 'Transport'],
                   'Location': ['MT', 'ND', 'SD', 'TX'],
                   'ImpactRate': [0.05194, 0.02552, 0.09962, 0.03111]})

# total ImpactRate per (Date, Category), broadcast back onto every row
df['Total'] = df.groupby(['Date', 'Category'])['ImpactRate'].transform('sum')

# turn each row's ImpactRate into its share of the group total
df['ImpactRate'] /= df['Total']

# pair each location with its share as a (Location, share) tuple
df['Breakdown'] = list(zip(df.Location, df.ImpactRate))

# collect the tuples of each group into a single list
df = df.groupby(['Category', 'Date', 'Total'])['Breakdown'].apply(list).reset_index()
</code></pre>
<p>Result:</p>
<pre><code>print(df)
Category Date Total \
0 Outage 2018-04-22 0.17708
1 Transport 2018-04-24 0.03111
Breakdown
0 [(MT, 0.293313756494), (ND, 0.144115653942), (...
1 [(TX, 1.0)]
</code></pre>
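<p>If a formatted string is preferred over tuples, the same steps can be adapted. The following is a minimal sketch, not the only way to do it: the <code>Share</code> column name and the <code>'MT: 29.3%'</code> style layout are illustrative assumptions, not part of the original question.</p>

```python
import pandas as pd

df = pd.DataFrame({'Date': ['2018-04-22', '2018-04-22', '2018-04-22', '2018-04-24'],
                   'Category': ['Outage', 'Outage', 'Outage', 'Transport'],
                   'Location': ['MT', 'ND', 'SD', 'TX'],
                   'ImpactRate': [0.05194, 0.02552, 0.09962, 0.03111]})

# group total broadcast to every row, as in the answer above
df['Total'] = df.groupby(['Date', 'Category'])['ImpactRate'].transform('sum')

# 'Share' is a hypothetical helper column holding each row's fraction of the total
df['Share'] = df['ImpactRate'] / df['Total']

# format each row as 'Location: percent' (assumed layout, one decimal place)
df['Breakdown'] = [f'{loc}: {share:.1%}'
                   for loc, share in zip(df['Location'], df['Share'])]

# join the formatted strings of each group into one comma-separated string
out = (df.groupby(['Category', 'Date', 'Total'])['Breakdown']
         .apply(', '.join)
         .reset_index())

print(out)
```

<p>Building the strings in a list comprehension keeps the formatting in one place, so changing the precision or separator only touches a single line.</p>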