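I have a data frame and a large function like the ones below. I want to apply the norm_group function to the data frame's columns, but using apply takes too much time. Is there any way to reduce the run time of this code? It currently takes 24.4 seconds per loop.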
import pandas as pd
import numpy as np
np.random.seed(1234)
n = 1500000
df = pd.DataFrame()
df['group'] = np.random.randint(1700, size=n)
df['ID'] = np.random.randint(5, size=n)
df['s_count'] = np.random.randint(5, size=n)
df['p_count'] = np.random.randint(5, size=n)
df['d_count'] = np.random.randint(5, size=n)
df['Total'] = np.random.randint(400, size=n)
df['Normalized_total'] = df.groupby('group')['Total'].transform(lambda x: (x - x.min()) / (x.max() - x.min()))
df['Normalized_total'] = df['Normalized_total'].apply(lambda x:round(x,2))
def norm_group(a,b,c,d,e):
    if a >= 0.7 and b >= 1000 and c > 2:
        return "Both High "
    elif a >= 0.7 and b >= 1000 and c < 2:
        return "High and C Low"
    elif a >= 0.4 and b >= 500 and d > 2:
        return "Medium and D High"
    elif a >= 0.4 and b >= 500 and d < 2:
        return "Medium and D Low"
    elif a >= 0.4 and b >= 500 and e > 2:
        return "Medium and E High"
    elif a >= 0.4 and b >= 500 and e < 2:
        return "Medium and E Low"
    else:
        return "Low"
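# Assumption: c, d, e map to the s_count, p_count, d_count columns
%timeit df['Categery'] = df.apply(lambda x: norm_group(a=x['Normalized_total'], b=x['group'], c=x['s_count'], d=x['p_count'], e=x['d_count']), axis=1)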
24.4 s ± 551 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
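My original data frame has multiple text columns, and I want to apply similar functions to them. Those functions take even more time than this one.
Thanks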
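You can vectorize this with np.select, which avoids calling the function row by row through apply and should run much faster.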
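Here is a minimal sketch of what the np.select approach could look like. It assumes that the arguments c, d, and e of norm_group correspond to the s_count, p_count, and d_count columns, which the question does not state explicitly. The helper name norm_group_vectorized is made up for illustration. np.select returns the choice for the first condition that matches in each row, so listing the conditions in the same order as the if/elif chain keeps the original fall-through behavior.

```python
import numpy as np
import pandas as pd


def norm_group_vectorized(df):
    """Vectorized counterpart of norm_group, built on np.select.

    Assumption: c, d, e map to the s_count, p_count, d_count columns.
    """
    a = df['Normalized_total']
    b = df['group']
    c, d, e = df['s_count'], df['p_count'], df['d_count']

    # Shared parts of the conditions, computed once for the whole column.
    high = (a >= 0.7) & (b >= 1000)
    medium = (a >= 0.4) & (b >= 500)

    # Order matters: np.select uses the first matching condition per row,
    # mirroring the if/elif order of the scalar function.
    conditions = [
        high & (c > 2),
        high & (c < 2),
        medium & (d > 2),
        medium & (d < 2),
        medium & (e > 2),
        medium & (e < 2),
    ]
    # Labels copied from norm_group, including the trailing space
    # in "Both High ".
    choices = [
        "Both High ",
        "High and C Low",
        "Medium and D High",
        "Medium and D Low",
        "Medium and E High",
        "Medium and E Low",
    ]
    return np.select(conditions, choices, default="Low")


# Small hand-made frame to show the call; the real df is used the same way.
demo = pd.DataFrame({
    'Normalized_total': [0.9, 0.5, 0.1],
    'group': [1200, 600, 1500],
    's_count': [3, 0, 4],
    'p_count': [0, 1, 4],
    'd_count': [0, 0, 4],
})
demo['Categery'] = norm_group_vectorized(demo)
print(demo)
```

On the full data frame the call would be df['Categery'] = norm_group_vectorized(df). The same pattern could carry over to text columns by building the boolean masks with the vectorized .str methods instead of per-row Python code.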