<p>My data is qualitatively similar to this dummy table:</p>
<pre><code>speed_observation, car_brand, traction_force
10, ford, 2
20, ford, 4
35, seat, 8
50, ford, 16
10, audi, 2
20, audi, 5
43, audi, 2
12, seat, 2.5
10, ford, 0.5
30, audi, 6
23, ford, 4
17, seat, 5.5
10, seat, 10
38, audi, 2
40, ford, 9
19, ford, 6.6
49, seat, 18
18, ford, 4
</code></pre>
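<p>I want to group the dataframe by car brand, bin the speed observations of each brand into ranges (for example [0,25] and [25,50]), and then compute the mean measured traction force for each brand and bin, giving a result like this:</p>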
<pre><code>speed_bin_upper_lim, car_brand, avrg_traction_force_in_speed_bin
25, audi, X1
50, audi, X2
25, ford, X3
50, ford, X4
25, seat, X5
50, seat, X6
</code></pre>
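<p>How can I do this? It should work for any number of unique <code>car_brand</code> classes, and the user should only have to provide either the number of speed bins or the bin edges (for example <code>n=3</code> or <code>[0,25,50]</code>). I suspect <code>groupby</code> and <code>pd.cut</code> will do it, but I have not found the exact way.</p>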
<hr/>
<p>Quang Hoang's answer works very well. Suppose you want to extend it by grouping on one more column, say <code>wheel_kind</code>, so that your dataframe looks like this:</p>
<pre><code>speed_observation,car_brand,wheel_kind,traction_force
10, ford, winter, 2
20, ford, summer, 4
35, seat, summer, 8
50, ford, winter, 16
10, audi, summer, 2
20, audi, summer, 5
43, audi, summer, 2
12, seat, summer, 2.5
10, ford, summer, 0.5
30, audi, summer, 6
23, ford, summer, 4
17, seat, summer, 5.5
10, seat, summer, 10
38, audi, summer, 2
40, ford, summer, 9
19, ford, summer, 6.6
49, seat, summer, 18
18, ford, summer, 4
</code></pre>
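<p>Then simply add the <code>wheel_kind</code> column to the previous solution, more precisely:</p>
<pre><code># cuts: the binned speeds from the previous solution,
# e.g. cuts = pd.cut(df['speed_observation'], [0, 25, 50])
df_grp = (df.groupby(['car_brand', 'wheel_kind', cuts])
            .traction_force.mean()
            .reset_index(name='avg_traction_force')
)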
</code></pre>
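<p>For reference, here is a minimal self-contained sketch of the same grouping. The helper name <code>mean_force_by_bin</code> is my own (hypothetical), the data is a small subset of the table above, and <code>observed=False</code> is passed explicitly as an assumption, because the default for categorical groupers differs between pandas versions. <code>bins</code> can be either an integer or a list of edges, since <code>pd.cut</code> accepts both.</p>

```python
import io
import pandas as pd

# A few rows copied from the table above, just to keep the example short.
CSV = """speed_observation,car_brand,wheel_kind,traction_force
10, ford, winter, 2
20, ford, summer, 4
35, seat, summer, 8
50, ford, winter, 16
10, audi, summer, 2
43, audi, summer, 2
"""

def mean_force_by_bin(df, group_cols, bins):
    """Hypothetical helper: mean traction force per group and speed bin.

    bins is passed straight to pd.cut, so it may be an int (number of
    equal-width bins) or a list of bin edges such as [0, 25, 50].
    """
    cuts = pd.cut(df["speed_observation"], bins)
    # observed=False keeps unobserved combinations, which show up as NaN.
    return (df.groupby([*group_cols, cuts], observed=False)
              .traction_force.mean()
              .reset_index(name="avg_traction_force"))

# skipinitialspace strips the blank after each comma in the raw table.
df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)
df_grp = mean_force_by_bin(df, ["car_brand", "wheel_kind"], [0, 25, 50])
df_grp_n = mean_force_by_bin(df, ["car_brand"], 2)  # two equal-width bins
```

<p>The binned column keeps the name <code>speed_observation</code> and holds the intervals produced by <code>pd.cut</code>.</p>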
<p>Don't forget to drop the NaNs afterwards, since <code>seat</code> and <code>audi</code> have no winter wheels in this data:</p>
<pre><code>df_grp.dropna(inplace=True)
df_grp.reset_index(drop=True, inplace=True) #just to reset the index
</code></pre>
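<p>A possible alternative, sketched here under my own assumptions: passing <code>observed=True</code> to <code>groupby</code> keeps only the brand, wheel and bin combinations that actually occur, so there are no NaN rows to drop. The <code>labels</code> argument of <code>pd.cut</code> is used to label each bin by its upper edge, and the column name <code>speed_bin_upper_lim</code> is taken from the desired output in the question. The data is again a small subset of the table above.</p>

```python
import io
import pandas as pd

# A few rows copied from the table above.
CSV = """speed_observation,car_brand,wheel_kind,traction_force
10, ford, winter, 2
20, ford, summer, 4
35, seat, summer, 8
50, ford, winter, 16
10, audi, summer, 2
43, audi, summer, 2
"""

df = pd.read_csv(io.StringIO(CSV), skipinitialspace=True)

edges = [0, 25, 50]
# Label every bin with its upper edge (25, 50) and name the resulting column.
cuts = (pd.cut(df["speed_observation"], edges, labels=edges[1:])
          .rename("speed_bin_upper_lim"))

# observed=True: only combinations present in the data are returned.
df_grp = (df.groupby(["car_brand", "wheel_kind", cuts], observed=True)
            .traction_force.mean()
            .reset_index(name="avg_traction_force"))
```

<p>Whether this is preferable to <code>dropna</code> depends on whether you want the empty combinations to be visible at some point before discarding them.</p>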