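I've gotten used to the power of numpy.apply_along_axis, and I'm wondering whether I can take vectorization to the next level, mainly for speed. Specifically, I want to get more out of the function by eliminating the for loop in the code below.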
from pandas import DataFrame
import numpy as np
from time import time

list_away = []
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() returns the same ndarray
new_data_morphed = DataFrame(np.random.rand(1000, 5)).transpose().to_numpy()
group_by_strat_full = DataFrame([
    [0, 1, 4],
    [1, 2, 3],
    [2, 3, 4],
    [0, 1, 4]], columns=['add_mask_1', 'add_mask_2', 'add_mask_3', ])
all_the_core_strats = [lambda x: x.prod(), lambda x: sum(x), ]

def run_strat_mod(df_vals, core_strat, dict_mask_1, dict_mask_2, dict_mask_3, list_away):
    # Pick the three rows named by this mask group: shape (3, 1000)
    slice_df = df_vals[[dict_mask_1, dict_mask_2, dict_mask_3]]
    # TODO: this list comprehension should be vectorized
    to_append = [np.apply_along_axis(lambda x: u(x), 0, slice_df) for u in core_strat]
    list_away.append(to_append)

t1 = time()
results_3 = group_by_strat_full.apply(lambda x: run_strat_mod(
    new_data_morphed,
    all_the_core_strats,
    x['add_mask_1'],
    x['add_mask_2'],
    x['add_mask_3'],
    list_away), axis=1)
t2 = time()
print(abs(t1 - t2))
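For comparison, here is a minimal sketch of the same computation with neither the per-row DataFrame.apply nor the list comprehension. It assumes every core strategy can be written as a NumPy reduction that takes an axis argument, which holds for prod and sum here. The helper name run_all_vectorized is hypothetical. Fancy indexing with the whole (4, 3) mask table selects every group's rows at once, and each strategy then becomes a single reduction over the "selected rows" axis.

```python
import numpy as np
import pandas as pd

# Same shapes as the question: 5 rows of 1000 random values
new_data_morphed = pd.DataFrame(np.random.rand(1000, 5)).transpose().to_numpy()
group_by_strat_full = pd.DataFrame(
    [[0, 1, 4], [1, 2, 3], [2, 3, 4], [0, 1, 4]],
    columns=['add_mask_1', 'add_mask_2', 'add_mask_3'],
)

def run_all_vectorized(data, masks):
    """Hypothetical helper: prod and sum for every mask group in one pass."""
    # masks has shape (n_groups, 3); fancy indexing gives (n_groups, 3, n_cols)
    slices = data[masks]
    # Reduce over the 3 selected rows; each result has shape (n_groups, n_cols)
    prods = slices.prod(axis=1)
    sums = slices.sum(axis=1)
    # Stack to (n_groups, n_strategies, n_cols), the layout of np.array(list_away)
    return np.stack([prods, sums], axis=1)

result = run_all_vectorized(new_data_morphed, group_by_strat_full.to_numpy())
print(result.shape)  # (4, 2, 1000)
```

A strategy with no axis-aware NumPy equivalent would still need a Python-level call per slice, so this approach only pays off when each strategy maps onto a built-in reduction.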
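To vectorize that list comprehension, I thought about repeating the initial array, slice_df, once per strategy, so that I could apply numpy.apply_along_axis to a new function, all_the_core_strats_mod. That changes the output from this: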
print(slice_df)
[[ 0.91302268 0.6172959 0.05478723 ..., 0.37028638 0.52116891
0.14158221]
[ 0.72579223 0.78732047 0.61335979 ..., 0.46359203 0.27593171
0.73700975]
[ 0.21706977 0.87639447 0.44936619 ..., 0.44319643 0.53712003
0.8071096 ]]
to this:
slice_df = np.array([df_vals[[dict_mask_1, dict_mask_2, dict_mask_3]]] * len(core_strat))
print(slice_df)
[[[ 0.91302268 0.6172959 0.05478723 ..., 0.37028638 0.52116891
0.14158221]
[ 0.72579223 0.78732047 0.61335979 ..., 0.46359203 0.27593171
0.73700975]
[ 0.21706977 0.87639447 0.44936619 ..., 0.44319643 0.53712003
0.8071096 ]]
[[ 0.91302268 0.6172959 0.05478723 ..., 0.37028638 0.52116891
0.14158221]
[ 0.72579223 0.78732047 0.61335979 ..., 0.46359203 0.27593171
0.73700975]
[ 0.21706977 0.87639447 0.44936619 ..., 0.44319643 0.53712003
0.8071096 ]]]
and then:
def all_the_core_strats_mod(x):
    return [x[0].prod(), sum(x[1])]

to_append = np.apply_along_axis(all_the_core_strats_mod, 0, slice_df)
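But it doesn't work the way I expected, which was to apply each function separately to its own replicated block.
Any ideas are welcome (the faster the better!).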
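The sketch below shows what apply_along_axis actually passes to the function when it runs along axis 0 of the replicated array. It uses a small hypothetical (3, 4) block in place of slice_df, and inspect is a made-up helper. Each call receives one length-2 vector stacked[:, i, j], not one whole block. So inside all_the_core_strats_mod, x[1] is a NumPy scalar, and Python's built-in sum cannot iterate over it.

```python
import numpy as np

# Hypothetical small stand-in for slice_df: 3 rows, 4 columns
block = np.arange(1.0, 13.0).reshape(3, 4)
# Replicate it once per strategy, as in the question: shape (2, 3, 4)
stacked = np.array([block] * 2)

seen_shapes = []

def inspect(x):
    # Record the shape of what apply_along_axis hands to the function
    seen_shapes.append(x.shape)
    return x[0] * x[1]

out = np.apply_along_axis(inspect, 0, stacked)
print(out.shape)         # (3, 4): one call per (row, column) position
print(set(seen_shapes))  # {(2,)}: every call sees stacked[:, i, j]

# x[1] is a scalar here, so the built-in sum() raises TypeError
try:
    sum(stacked[:, 0, 0][1])
except TypeError as exc:
    print("TypeError:", exc)
```

apply_along_axis always applies the same function to every 1D slice. Running a different function on each replicated block is not something it does.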
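It is being applied to a 3D array. For simplicity, I'll use:
So apply_along_axis passes foo x[:,0,0], x[:,0,1], x[:,1,0], and so on. Taking the prod and sum of those two numbers isn't very exciting. apply_along_axis is just a convenience method: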
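As a rough illustration of "convenience method", assuming a 2D input, apply_along_axis(func, 0, arr) amounts to a Python-level loop over the columns. It therefore carries the same per-call overhead as writing that loop yourself. The helper apply_along_axis0 below is hypothetical and 2D-only. A built-in reduction with an axis argument does the same work in one call, with no Python loop.

```python
import numpy as np

def apply_along_axis0(func, arr):
    """Hypothetical 2D-only stand-in for np.apply_along_axis(func, 0, arr)."""
    # One Python-level call per column
    return np.array([func(arr[:, j]) for j in range(arr.shape[1])])

rng = np.random.default_rng(0)
slice_df = rng.random((3, 1000))  # same shape as slice_df in the question

looped = apply_along_axis0(np.prod, slice_df)
builtin = np.apply_along_axis(np.prod, 0, slice_df)
vectorized = slice_df.prod(axis=0)  # single reduction, no Python loop

print(np.allclose(looped, builtin), np.allclose(builtin, vectorized))  # True True
```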