Suggestions for speeding up this calculation

Posted 2024-05-15 01:35:57


I am running the calculation below and it is very slow (mainly because the dataframe I am looping over, PM_mix, is very large). I know that looping over a dataframe should be avoided where possible, but I am not sure of the best way to do that here. I suspect the answer is to do the calculation in numpy and then convert the output array to a dataframe, but I don't know the best way to go about it. Since I am essentially trying to multiply each dataframe column by an array (F_range), is it worth trying to compute a multidimensional array and then flattening it? Any suggestions would be much appreciated - thanks.

import numpy as np
import pandas as pd

# OIB_D and PM_mix are defined elsewhere (not shown in the question)

# initial modal abundances
ol_abund = 0.66
opx_abund = 0.17
cpx_abund = 0.12
gar_abund = 0.06

# melting modes
ol_meltmode = 0.0833
opx_meltmode = -0.190
cpx_meltmode = 0.8095
gar_meltmode = 0.298

# calculate bulk D
bulk_D = ol_abund*OIB_D['olivine'] + opx_abund*OIB_D['opx'] + cpx_abund*OIB_D['cpx'] + gar_abund*OIB_D['garnet']
# calculate bulk P
bulk_P = ol_meltmode*OIB_D['olivine'] + opx_meltmode*OIB_D['opx'] + cpx_meltmode*OIB_D['cpx'] + gar_meltmode*OIB_D['garnet']

# F range 0.5 - 4% (0.1% increments)
F_range = np.linspace(0.005,0.04,36)

# loop through and calculate new mixtures
df = pd.DataFrame()
melt_list = []

for col in PM_mix:
    # reset dataframe
    df = pd.DataFrame()
    for F in F_range:
        # calculate melt concentration using D and P values for each F
        melt = PM_mix[col][:13]/(bulk_D + F*(1 - bulk_P))
        # append modeling parameters for each source composition
        melt = melt.append(PM_mix[col][13:20])
        df[F] = melt
    # append percent melt for each iteration
    df = df.append(pd.Series(F_range,index=df.columns,name='F'))
    melt_list.append(df)

# concatenate list of dataframes into single dataframe
all_melts = pd.concat(melt_list,axis=1)

# renumber columns of dataframe
all_melts.columns = range(np.shape(all_melts)[1])

bulk_D and bulk_P can be treated as the same 1D array in order to reproduce the problem:

bulk_D = array([1.78800e-04, 4.91500e-04, 2.28550e-03, 1.13606e-03, 5.21800e-03,
       1.17696e-02, 1.37100e-02, 1.85100e-02, 2.95700e-02, 4.00100e-02,
       4.25960e-02, 7.73200e-02, 3.44720e-01])
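
One way to picture the "multidimensional array, then flatten" idea raised in the question is the following minimal sketch (assuming PM_mix, bulk_D, bulk_P and F_range are as defined above, with the first 13 rows being concentrations and bulk_D/bulk_P being length-13 arrays): numpy broadcasting builds the whole (13, ncols, 36) block in one step and a reshape then produces the flat column layout.

import numpy as np

vals = PM_mix.values[:13, :]                               # (13, ncols) concentration rows
denom = bulk_D[:, None] + np.outer(1 - bulk_P, F_range)    # (13, 36) denominator for every F
melt_block = vals[:, :, None] / denom[:, None, :]          # (13, ncols, 36) by broadcasting
melt_flat = melt_block.reshape(13, -1)                     # (13, ncols*36): 36 F columns per source column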

2 answers

This might be faster:

import numpy as np
import pandas as pd

# melt concentrations for a single F value, returned as a (13, ncols, 1) array
# (assumes bulk_D and bulk_P are scalars; a length-13 array would not broadcast
# row-wise against the dataframe here)
def mult(F):
    y = (PM_mix.iloc[:13] / (bulk_D + F * (1 - bulk_P))).to_numpy()
    return y[:, :, np.newaxis]

# stack the results for every F along a new third axis -> (13, ncols, 36)
x = list(map(mult, F_range))
w = np.concatenate(x, axis=2)

# flatten to (13, ncols*36): the 36 F columns for each source column sit next to each other
ncol = len(F_range) * PM_mix.shape[1]
w = w.reshape((13, ncol))

# modelling parameters (rows 13:20), repeated once per F value
v = PM_mix.iloc[13:20].to_numpy()

def repe(_):
    return v[:, :, np.newaxis]

u = list(map(repe, range(len(F_range))))
u = np.concatenate(u, axis=2)
u = u.reshape((7, ncol))

# row of F values, tiled across all source columns
F_range.shape = (1, len(F_range))
f = np.hstack([F_range] * PM_mix.shape[1])

# stack concentrations, parameters and the F row, then build a single dataframe
t = np.concatenate([w, u, f], axis=0)
s = pd.DataFrame(t)

print(s)
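
One small follow-up, not part of the answer above and assuming PM_mix has exactly 20 labelled rows as in the question: pd.DataFrame(t) comes back with a default 0-20 integer index, so the original row labels can be restored afterwards.

s.index = list(PM_mix.index[:20]) + ['F']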

I am assuming that PM_mix[col][:13]/(bulk_D + F*(1 - bulk_P)) produces a pd.Series of shape (13,), whether bulk_D and bulk_P are arrays or constants. In my implementation I have kept them as constants.

I ran your code on a sample dataframe of size (20, 1000) and got a runtime of 24 s ± 437 ms per loop (mean ± std. dev. of 7 runs, 1 loop each). My fast implementation looks like this:

import numpy as np
import pandas as pd

PM_mix = pd.DataFrame(data=np.random.randn(20, 1000))

bulk_D = np.random.rand()
bulk_P = np.random.rand()

F_range = np.linspace(0.005,0.04,36)

# precalculate division term
weight = 1/(bulk_D + F_range*(1 - bulk_P))

# mask to exclude indices 13:20
mask_mul = np.array([1. if i < 13 else 0. for i in range(20)])

# mask to only include indices 13:20 i.e. modeling parameters
mask_add = np.array([0. if i < 13 else 1. for i in range(20)])

# values in a column of the data frame -> array of shape (21, 36)...
# (values and melt parameters x F) with additional row for F values
def col2arr(col_vals):
    return np.concatenate(
        [np.dot((col_vals*mask_mul).reshape(-1,1), weight.reshape(1,-1))
            + (col_vals*mask_add).reshape(-1,1),
        F_range.reshape(1,-1)], axis=0)

# concatenate the results of this operation for each column in PM_mix
data = np.concatenate(np.array(list(map(col2arr, PM_mix.values.T))), axis=-1)

# create new df
df_new = pd.DataFrame(data=data)

# set index as your desired index
df_new.index = list(df_new.index[:-1])+['F']

This runs in 29 ms ± 454 µs per loop (mean ± std. dev. of 7 runs, 10 loops each), about 827 times faster.

I computed the dataframes with both approaches and verified that they are equal, as follows:

>>> np.allclose(df_new.values, all_melts.values)
True

In general, creating extra dataframes and concatenating them slows your code down. Where you can, stick to lighter-weight data structures.
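
To illustrate that last point (a minimal sketch with made-up sizes, not taken from either answer): collecting plain numpy columns in a list and building one dataframe at the end is typically much cheaper than creating and concatenating a dataframe per iteration.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
cols = [rng.standard_normal(21) for _ in range(1000)]   # 1000 result columns

# heavier pattern: one dataframe per column, concatenated at the end
heavy = pd.concat([pd.DataFrame(c) for c in cols], axis=1)

# lighter pattern: stack the raw arrays once, then build a single dataframe
light = pd.DataFrame(np.column_stack(cols))

assert np.allclose(heavy.to_numpy(), light.to_numpy())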
