Pandas equivalent of resample for an integer index


1 answer

Anonymous, 1 day ago


<h3>Setup</h3>
<pre><code>import pandas as pd
import numpy as np

np.random.seed([3,1415])
df = pd.DataFrame(np.random.rand(20, 2), columns=['A', 'B'])
</code></pre>
<p>You need to create the grouping labels yourself. I'd use:</p>
<pre><code>(df.index.to_series() / 5).astype(int)
</code></pre>
<p>This gives a series of values like <code>[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, ...]</code>, which you then pass to <code>groupby</code>. It works here because the index holds the integers 0 through 19.</p>
<p>You'll also need to specify the index for the new dataframe. I'd use:</p>
<pre><code>df.index[4::5]
</code></pre>
<p>This takes the current index starting at the 5th position (hence the <code>4</code>) and every 5th position after that. It looks like <code>[4, 9, 14, 19]</code>. I could have used <code>df.index[::5]</code> to get the starting position of each group, but I went with the ending positions.</p>
<h3>Solution</h3>
<pre><code># assign as variable because I'm going to use it more than once.
s = (df.index.to_series() / 5).astype(int)

df.groupby(s).std().set_index(s.index[4::5])
</code></pre>
<p>Which looks like:</p>
<pre><code>           A         B
4   0.198019  0.320451
9   0.329750  0.408232
14  0.293297  0.223991
19  0.095633  0.376390
</code></pre>
<h3>Other considerations</h3>
<p>This is equivalent to down-sampling. We haven't addressed up-sampling yet.</p>
<p>To go from the down-sampled index back to the original, more frequent dataframe index, you can use <code>reindex</code> like this:</p>
<pre><code># assign what we've done above to df_down
df_down = df.groupby(s).std().set_index(s.index[4::5])

df_up = df_down.reindex(range(20)).bfill()
</code></pre>
<p>Which looks like:</p>
<pre><code>           A         B
0   0.198019  0.320451
1   0.198019  0.320451
2   0.198019  0.320451
3   0.198019  0.320451
4   0.198019  0.320451
5   0.329750  0.408232
6   0.329750  0.408232
7   0.329750  0.408232
8   0.329750  0.408232
9   0.329750  0.408232
10  0.293297  0.223991
11  0.293297  0.223991
12  0.293297  0.223991
13  0.293297  0.223991
14  0.293297  0.223991
15  0.095633  0.376390
16  0.095633  0.376390
17  0.095633  0.376390
18  0.095633  0.376390
19  0.095633  0.376390
</code></pre>
<p>You can also pass something else to <code>reindex</code>, such as <code>range(0, 20, 2)</code>, to up-sample to the even integer index values only.</p>
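The same idea can be wrapped in two small helpers. This is only a sketch, not part of the original answer: the names `downsample_by_position` and `upsample_backfill` are made up for illustration, and the example uses a deterministic frame with `mean` instead of random data with `std` so the results are easy to check by hand. It bins by row position rather than by index value, which is an assumption that differs slightly from the answer and lets it cope with a final group shorter than the step.

```python
import numpy as np
import pandas as pd


def downsample_by_position(df, step, agg="std"):
    """Aggregate every `step` consecutive rows (hypothetical helper).

    Each bin is labelled with the index value of its last row, which
    mirrors the `df.index[4::5]` choice for step=5.
    """
    # Positional bin labels: 0, 0, ..., 1, 1, ... one label per row.
    bins = np.arange(len(df)) // step
    out = df.groupby(bins).agg(agg)
    # Last index value in each bin; also covers a short final bin.
    labels = df.index.to_series().groupby(bins).last()
    out.index = pd.Index(labels.to_numpy())
    return out


def upsample_backfill(down, new_index):
    """Reindex to a denser index and back-fill (hypothetical helper)."""
    return down.reindex(new_index).bfill()


# Deterministic example data so the numbers can be verified by hand.
df = pd.DataFrame({"A": np.arange(20.0), "B": 2 * np.arange(20.0)})

df_down = downsample_by_position(df, step=5, agg="mean")
df_up = upsample_backfill(df_down, range(len(df)))
```

Here `df_down` is indexed by `[4, 9, 14, 19]`, and `df_up` repeats each group's aggregate across the five rows that produced it. Labelling bins by their last row keeps back-filling consistent: every original row picks up the value of the bin it belongs to.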
