Merging DataFrame values where the other columns match

Posted 2024-06-07 18:01:12


I have a data frame that stores a date, car brand, color, and city:

 date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london

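Now I want to build a data frame from it as follows: rows whose car brand, color, and city match on consecutive days should be merged together. So for this example, I want to end up with this data frame: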

 date                             car_brand    color     city
 ["2020-01-01","2020-01-02"]      porsche      red       paris
 ["2020-01-03"]                   porsche      red       london
 ["2020-01-04"]                   porsche      red       paris
 ["2020-01-05"]                   porsche      red       london
 ["2020-01-01"]                   audi         blue      munich
 ["2020-01-02"]                   audi         red       munich
 ["2020-01-03","2020-01-05"]      audi         red       london

How can I do this? I tried pd.concat and pd.merge, but nothing has worked so far. Thanks.


1 answer
User
#1 · Posted 2024-06-07 18:01:12

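If consecutiveness matters, you can check for it inside a list comprehension. This extends the technique of getting a list from a lambda function applied to each group.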
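
For reference, the base technique mentioned here (collecting each group's values into a list) might look roughly like the sketch below. The small frame and the name `lists` are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Tiny hypothetical frame, only to illustrate the base technique
df = pd.DataFrame({
    "car_brand": ["audi", "audi", "porsche"],
    "city": ["london", "london", "paris"],
    "date": ["2020-01-03", "2020-01-04", "2020-01-01"],
})

# Group on the non-date columns and collect each group's dates into a list
lists = (
    df.groupby(["car_brand", "city"])["date"]
    .agg(lambda x: list(x))
    .reset_index()
)
print(lists)
```

The answer's code adds the consecutiveness check inside that lambda: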

import io

import pandas as pd

# raw string for the regex separator avoids an invalid-escape warning
df = pd.read_csv(io.StringIO(""" date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london"""), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])
df = (
    df
    .groupby([c for c in df.columns if c!="date"])["date"]
    # keep a date only if it is the first in its group or exactly one day
    # after the previous date; any other date is dropped from the list
    .agg(lambda x: [xx for i,xx in enumerate(x) if i==0 or xx==(list(x)[i-1]+pd.DateOffset(1))])
    .reset_index()
)

Output

car_brand color   city                                                            date
     audi  blue munich                                           [2020-01-01 00:00:00]
     audi   red london [2020-01-03 00:00:00, 2020-01-04 00:00:00, 2020-01-05 00:00:00]
     audi   red munich                                           [2020-01-02 00:00:00]
  porsche   red london                                           [2020-01-03 00:00:00]
  porsche   red  paris                      [2020-01-01 00:00:00, 2020-01-02 00:00:00]
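
Note that this output has five rows, not the seven asked for: dates that do not directly follow the previous date in their group (for example porsche / red / paris on 2020-01-04) are dropped instead of starting a new row. If every run of consecutive days should become its own row, one possible approach is to label the runs first and then group on that label. The following is only a sketch under that assumption; the helper column `run` and the names `keys`, `gap`, and `result` are introduced here for illustration.

```python
import pandas as pd

# Same data as in the question, built inline so the sketch is self-contained
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"] * 2
    ),
    "car_brand": ["porsche"] * 5 + ["audi"] * 5,
    "color": ["red"] * 5 + ["blue"] + ["red"] * 4,
    "city": ["paris", "paris", "london", "paris", "london",
             "munich", "munich", "london", "london", "london"],
})

keys = ["car_brand", "color", "city"]  # every column except "date"

df = df.sort_values(keys + ["date"])

# Gap to the previous date within the same brand/color/city group
# (NaT for the first row of each group)
gap = df.groupby(keys)["date"].diff()

# A new run starts whenever the gap is not exactly one day;
# the cumulative sum turns those start flags into run labels
df["run"] = (gap != pd.Timedelta(days=1)).cumsum()

result = (
    df.groupby(keys + ["run"], sort=False)["date"]
    .agg(lambda s: s.dt.strftime("%Y-%m-%d").tolist())
    .reset_index()
    .drop(columns="run")
)
print(result)
```

This sketch lists every date in a run. If only the first and last day of each run are wanted, the lambda could return `[s.min(), s.max()]` instead.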
