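<p>Your computation lends itself to NumPy more than to pandas. By that I mean it can be expressed concisely if you treat your DataFrame as just one big array, whereas the solution (at least the one I came up with) gets more complicated once you try to wrangle the DataFrame with melt, groupby and so on.</p>
<p>The whole computation can essentially be expressed in one line:</p>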
<pre><code>np.sqrt((arr**2).reshape(arr.shape[0],-1,3).sum(axis=-1))/times[:,None]
</code></pre>
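<p>To see what each part of that line does, here is a minimal sketch that splits it into steps. The tiny <code>arr</code> and <code>times</code> values are made up for illustration (two rows, two ids with x, y, z components each) and are not taken from the question.</p>
<pre><code>import numpy as np

# Hypothetical toy data: 2 rows, 2 ids (A, B), each with x, y, z components.
arr = np.array([[3.0, 4.0, 0.0, 1.0, 2.0, 2.0],
                [0.0, 0.0, 5.0, 6.0, 8.0, 0.0]])
times = np.array([1.0, 2.0])

squared = arr**2                                # shape (2, 6)
grouped = squared.reshape(arr.shape[0], -1, 3)  # shape (2, 2, 3): one xyz triple per id
norms = np.sqrt(grouped.sum(axis=-1))           # shape (2, 2): Euclidean norm per id
result = norms / times[:, None]                 # times[:, None] has shape (2, 1) and broadcasts across ids

print(result)
</code></pre>
<p>The reshape relies on the x, y, z columns of each id sitting next to each other, which is the case for the column layout in the question.</p>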
<p>So here is a NumPy way:</p>
<pre><code>import numpy as np
import pandas as pd
import io
content = '''
Time A_x A_y A_z B_x B_y B_z
-0.075509 -0.123527 -0.547239 -0.453707 -0.969796 0.248761 1.369613
-0.206369 -0.112098 -1.122609 0.218538 -0.878985 0.566872 -1.048862
-0.194552 0.818276 -1.563931 0.097377 1.641384 -0.766217 -1.482096
0.502731 0.766515 -0.650482 -0.087203 -0.089075 0.443969 0.354747
1.411380 -2.419204 -0.882383 0.005204 -0.204358 -0.999242 -0.395236
1.036695 1.115630 0.081825 -1.038442 0.515798 -0.060016 2.669702
0.392943 0.226386 0.039879 0.732611 -0.073447 1.164285 1.034357
-1.253264 0.389148 0.158289 0.440282 -1.195860 0.872064 0.906377
-0.133580 -0.308314 -0.839347 -0.517989 0.652120 0.477232 -0.391767
0.623841 0.473552 0.059428 0.726088 -0.593291 -3.186297 -0.846863'''
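# Parse the whitespace-delimited text; the first row holds the column names.
df = pd.read_csv(io.StringIO(content), sep=r'\s+')
arr = df.values
times = arr[:,0]   # the Time column
arr = arr[:,1:]    # the x, y, z components for every id
# Reshape to (rows, ids, 3), take the Euclidean norm of each triple, divide by Time.
result = np.sqrt((arr**2).reshape(arr.shape[0],-1,3).sum(axis=-1))/times[:,None]
result = pd.DataFrame(result, columns=['Velocity_%s'%(x,) for x in list('AB')])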
print(result)
</code></pre>
<p>yields</p>
<pre><code> Velocity_A Velocity_B
0 -9.555311 -22.467965
1 -5.568487 -7.177625
2 -9.086257 -12.030091
3 2.007230 1.144208
4 1.824531 0.775006
5 1.472305 2.623467
6 1.954044 3.967796
7 -0.485576 -1.384815
8 -7.736036 -6.722931
9 1.392823 5.369757
</code></pre>
<hr/>
<p>Since your actual DataFrame has shape (50000, 36), choosing a fast method may matter. Here is a benchmark:</p>
<pre><code>import numpy as np
import pandas as pd
import string
N = 12
col_ids = string.ascii_letters[:N]  # string.letters in Python 2
df = pd.DataFrame(
    np.random.randn(50000, 3*N+1),
    columns=['Time']+['{}_{}'.format(letter, coord) for letter in col_ids
                      for coord in list('xyz')])
def using_numpy(df):
    arr = df.values
    times = arr[:,0]
    arr = arr[:,1:]
    result = np.sqrt((arr**2).reshape(arr.shape[0],-1,3).sum(axis=-1))/times[:,None]
    result = pd.DataFrame(result, columns=['Velocity_%s'%(x,) for x in col_ids])
    return result
def using_loop(df):
    results = pd.DataFrame(index=df.index) # the result container
    for col_id in col_ids:
        # select this id's x, y, z columns by name prefix
        results['Velocity_'+col_id] = np.sqrt((df.filter(regex=col_id+'_')**2).sum(axis=1))/df.Time
    return results
</code></pre>
<p>Using <a href="http://ipython.org" rel="nofollow">IPython</a>:</p>
<pre><code>In [43]: %timeit using_numpy(df)
10 loops, best of 3: 34.7 ms per loop
In [44]: %timeit using_loop(df)
10 loops, best of 3: 82 ms per loop
</code></pre>
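<p>If you want to repeat the comparison outside IPython, the following is a rough sketch using the standard-library <code>timeit</code> module. It also checks that both functions return the same values. The smaller row count (5000), the fixed seed and the <code>^</code> anchor in the regex are my own choices for this sketch, and the timings will differ from the ones above depending on your machine and library versions.</p>
<pre><code>import string
import timeit

import numpy as np
import pandas as pd

N = 12
col_ids = string.ascii_letters[:N]
rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
df = pd.DataFrame(
    rng.standard_normal((5000, 3*N + 1)),
    columns=['Time'] + ['{}_{}'.format(letter, coord)
                        for letter in col_ids for coord in 'xyz'])

def using_numpy(df):
    arr = df.values
    times = arr[:, 0]
    arr = arr[:, 1:]
    result = np.sqrt((arr**2).reshape(arr.shape[0], -1, 3).sum(axis=-1)) / times[:, None]
    return pd.DataFrame(result, columns=['Velocity_' + x for x in col_ids])

def using_loop(df):
    results = pd.DataFrame(index=df.index)
    for col_id in col_ids:
        # '^' anchors the match at the start of the column name
        cols = df.filter(regex='^' + col_id + '_')
        results['Velocity_' + col_id] = np.sqrt((cols**2).sum(axis=1)) / df['Time']
    return results

a = using_numpy(df)
b = using_loop(df)
same = np.allclose(a.values, b.values)  # both approaches should agree
print(same)

for func in (using_numpy, using_loop):
    # best of 3 repeats, 5 calls each, reported per call
    best = min(timeit.repeat(lambda: func(df), number=5, repeat=3)) / 5
    print('{}: {:.1f} ms per call'.format(func.__name__, best * 1000))
</code></pre>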