How can I speed up nested loops over a pandas DataFrame?

Posted 2024-05-15 18:31:11

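I have a pandas.DataFrame containing information about different securities. It has the columns "date", "security_id", "country", "factor name", and "factor value", where the factor name says whether the factor value is debt (TD) or equity (TE). I have been asked to compute the debt-to-equity ratio of every security, in every country, on every date. The only approach I can think of is nested loops over the unique values of each column, but that takes a very long time to run. Is there a way to speed up my code?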

# Defined before the loops so the name exists when it is first called.
def get_DEratio(date, sec, country):
    # Each lookup scans the whole DataFrame, which is what makes the loops slow.
    TE_lst = data[(data["date"] == date) & (data["security_id"] == sec)
              & (data["country"] == country) & (data["factor"] == "TE")]["factor_value"].tolist()
    TD_lst = data[(data["date"] == date) & (data["security_id"] == sec)
              & (data["country"] == country) & (data["factor"] == "TD")]["factor_value"].tolist()

    # A missing debt or equity value gives a ratio of 0.
    if not TD_lst or not TE_lst:
        return 0

    TD, TE = TD_lst[0], TE_lst[0]
    # A zero debt or equity value also gives a ratio of 0.
    if TD == 0 or TE == 0:
        return 0
    return TD / TE

dates = data["date"].unique()
securities = data["security_id"].unique()
countries = data["country"].unique()
for date in dates:
    for sec in securities:
        for country in countries:
            ratio = get_DEratio(date, sec, country)

1 Answer
User
#1 · Posted 2024-05-15 18:31:11

Assume the source DataFrame, df, contains:

        date security_id country factor_name  factor_value
0 2020-06-01          S1      C1          TE          10.0
1 2020-06-01          S1      C1          TD          20.0
2 2020-06-01          S2      C1          TE          12.0
3 2020-06-01          S2      C1          TD          20.0
4 2020-06-01          S1      C2          TE          12.0
5 2020-06-01          S1      C2          TD          20.0
6 2020-06-01          S2      C2          TE          14.0
7 2020-06-01          S2      C2          TD          20.0
8 2020-06-01          S3      C2          TE          14.0
9 2020-06-01          S4      C2          TD          20.0

First, build an auxiliary DataFrame by moving the factor_name values (TD and TE) into columns:

wrk = df.set_index(['date', 'security_id', 'country', 'factor_name'])\
    .factor_value.unstack()

The result is:

factor_name                       TD    TE
date       security_id country            
2020-06-01 S1          C1       20.0  10.0
                       C2       20.0  12.0
           S2          C1       20.0  12.0
                       C2       20.0  14.0
           S3          C2        NaN  14.0
           S4          C2       20.0   NaN
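One caveat: unstack raises a ValueError if the same (date, security_id, country, factor_name) combination appears more than once. The sketch below is one possible workaround, not part of the original answer. It uses pivot_table with aggfunc="first", which mirrors the question's choice of taking the first matching value (TD_lst[0], TE_lst[0]). The sample rows are hypothetical and exist only to show a duplicated key.

```python
import pandas as pd

# Hypothetical data: the TE row for (2020-06-01, S1, C1) appears twice.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-01"] * 3),
    "security_id": ["S1", "S1", "S1"],
    "country": ["C1", "C1", "C1"],
    "factor_name": ["TE", "TE", "TD"],
    "factor_value": [10.0, 11.0, 20.0],
})

# pivot_table aggregates duplicates instead of raising;
# "first" keeps the first value seen for each key.
wrk = df.pivot_table(
    index=["date", "security_id", "country"],
    columns="factor_name",
    values="factor_value",
    aggfunc="first",
)
print(wrk)
```

If duplicates should never occur in your data, the plain set_index/unstack version is arguably safer, because it fails loudly instead of silently picking one value.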

Then, to get the final result, divide the TD column by the TE column and replace missing ratios with 0:

result = wrk.TD.div(wrk.TE).fillna(0)

You will get:

date        security_id  country
2020-06-01  S1           C1         2.000000
                         C2         1.666667
            S2           C1         1.666667
                         C2         1.428571
            S3           C2         0.000000
            S4           C2         0.000000
dtype: float64
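The question's get_DEratio also returns 0 when equity (TE) is exactly 0. Dividing by zero in pandas gives inf rather than NaN, so fillna(0) alone would leave those rows as inf. The following is a minimal end-to-end sketch of one way to cover that case. The S5 rows with zero equity are hypothetical, added only to exercise the division by zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-06-01"] * 6),
    "security_id": ["S1", "S1", "S3", "S4", "S5", "S5"],
    "country": ["C1", "C1", "C2", "C2", "C2", "C2"],
    "factor_name": ["TE", "TD", "TE", "TD", "TE", "TD"],
    # S5 is a hypothetical security with zero equity.
    "factor_value": [10.0, 20.0, 14.0, 20.0, 0.0, 20.0],
})

# Move TD and TE into columns, one row per (date, security_id, country).
wrk = (
    df.set_index(["date", "security_id", "country", "factor_name"])["factor_value"]
    .unstack()
)

ratio = wrk["TD"].div(wrk["TE"])

# Missing TD or TE gives NaN, TE == 0 gives inf; map both to 0.
result = ratio.replace([np.inf, -np.inf], np.nan).fillna(0)
print(result)
```

Calling result.reset_index(name="de_ratio") turns the Series back into a flat DataFrame if that is easier to work with downstream; the column name de_ratio is just an illustrative choice.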
