pandas: efficiently applying a function that takes the whole DataFrame as input

Posted 2024-04-27 02:33:28


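I have a pandas DataFrame that models product purchases by date. I want to add features such as how many purchases happened yesterday, over the last week, and so on. Is there an elegant and efficient way to do this? Right now I am using a loop, which takes a lot of time.

Given the data: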

import pandas as pd, numpy as np
dico = {"dates":["2017-11-20"]*3+["2017-11-21"]*3+ ["2017-11-22"]*3, "product":["A", "B", "C"]*3, "sales": np.arange(1,10)}
df = pd.DataFrame.from_dict(dico)
df["dates"] = pd.to_datetime(df.dates)

To get the previous day's sales and the sum of sales over the previous two days, I loop:

one_day = pd.to_timedelta(1, unit='d')
two_days = pd.to_timedelta(2, unit='d')

yesterday_sales, last_two_days_sales = [], []
for _, row in df.iterrows():
    yesterday_performance = df.loc[(df["product"] == row["product"]) & (df.dates == (row["dates"]-one_day)) ]
    if yesterday_performance.shape[0] == 1:
        yesterday_sales.append(yesterday_performance.sales.values[0])
    else:
        yesterday_sales.append(-1)

    two_days_sales = df.loc[(df["product"] == row["product"]) & (df["dates"] >= (row["dates"]-two_days)) & (df["dates"] < (row["dates"]))]
    if two_days_sales.shape[0] >= 1:
        last_two_days_sales.append(two_days_sales.sales.sum())
    else:
        last_two_days_sales.append(-1)

df["yesterday_sales"] = yesterday_sales
df["last_two_days_sales"] = last_two_days_sales

Everything inside the loop is slow, but I can't think of a better way.


1 answer
User
Answer 1 · Posted 2024-04-27 02:33:28

I simplified your code a bit. It is still not vectorized, but if performance is not a concern, it should be easier to maintain:

def one_day(row):
    # Sales of the same product on the previous calendar day (-1 if there is no such row)
    yday_perf = df.loc[(df['product'] == row['product']) & (df['dates'] == (row['dates'] + pd.Timedelta(days=-1))), 'sales']
    return yday_perf.values[0] if not yday_perf.empty else -1

def two_day(row):
    # Sum of the same product's sales over the two days before this row's date (-1 if there are none)
    twoday_perf = df.loc[(df['product'] == row['product']) & (df['dates'] >= (row['dates'] + pd.Timedelta(days=-2))) & (df['dates'] < row['dates']), 'sales']
    return twoday_perf.sum() if len(twoday_perf) >= 1 else -1

df['yesterday_sales'] = df.apply(one_day, axis=1)
df['last_two_days_sales'] = df.apply(two_day, axis=1)

#        dates product  sales  yesterday_sales  last_two_days_sales
# 0 2017-11-20       A      1               -1                   -1
# 1 2017-11-20       B      2               -1                   -1
# 2 2017-11-20       C      3               -1                   -1
# 3 2017-11-21       A      4                1                    1
# 4 2017-11-21       B      5                2                    2
# 5 2017-11-21       C      6                3                    3
# 6 2017-11-22       A      7                4                    5
# 7 2017-11-22       B      8                5                    7
# 8 2017-11-22       C      9                6                    9
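
If performance does become a concern, one possible vectorized alternative is to shift a copy of the dates forward and merge it back onto the original frame, instead of filtering the whole DataFrame once per row. The sketch below is not part of the answer above; the helper name `lagged_sales` and the temporary column `lag` are hypothetical, and it assumes there is at most one row per (product, date) pair, as in the sample data.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    "dates": pd.to_datetime(["2017-11-20"] * 3 + ["2017-11-21"] * 3 + ["2017-11-22"] * 3),
    "product": ["A", "B", "C"] * 3,
    "sales": np.arange(1, 10),
})

def lagged_sales(frame, days):
    """Sales of the same product exactly `days` days earlier (NaN if missing).

    Hypothetical helper; assumes at most one row per (product, date).
    """
    lagged = frame[["dates", "product", "sales"]].rename(columns={"sales": "lag"})
    # Moving the dates forward makes each row line up with the row `days` later
    lagged["dates"] = lagged["dates"] + pd.Timedelta(days=days)
    merged = frame[["dates", "product"]].merge(lagged, on=["dates", "product"], how="left")
    # A left merge keeps the row order of `frame`, so a plain array can be assigned back
    return merged["lag"].to_numpy(dtype=float)

lag1 = lagged_sales(df, 1)
lag2 = lagged_sales(df, 2)

# -1 marks "no data", matching the convention used in the question
df["yesterday_sales"] = np.where(np.isnan(lag1), -1, lag1).astype(int)

has_any = ~(np.isnan(lag1) & np.isnan(lag2))
two_day_sum = np.nan_to_num(lag1) + np.nan_to_num(lag2)
df["last_two_days_sales"] = np.where(has_any, two_day_sum, -1).astype(int)
```

For longer windows such as the last week, the same idea would need one merge per lag, so a time-based rolling sum per product may be a better fit there. If a (product, date) pair can appear more than once, the merge would duplicate rows and the data would need to be aggregated first.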
