把四分之一的硬币卷成一个硬币时的日期差距

2024-06-07 19:03:04 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图将具有类似度量的数据集中的行汇总到合并行中。上卷必须满足两个条件:

  • 这些措施(范围为1-5)应在全球范围内保持不变 行,以便将它们卷成一行。你知道吗
  • 日期应该是连续的(没有日期间隔)。你知道吗

如果不满足这些条件,代码应该生成一个单独的行。你知道吗

这是我使用的示例数据:

id,measure1,measure2,measure3,measure4,measure5,begin_date,end_date
ABC123XYZ789,1,1,1,1,1,1/1/2019,3/31/2019
ABC123XYZ789,1,1,1,1,1,4/23/2019,6/30/2019
ABC123XYZ789,1,1,1,1,1,7/1/2019,9/30/2019
ABC123XYZ789,1,1,1,1,1,10/12/2019,12/31/2019
FGH589J6U88SW,1,1,1,1,1,1/1/2019,3/31/2019
FGH589J6U88SW,1,1,1,1,1,4/1/2019,6/30/2019
FGH589J6U88SW,1,1,1,2,1,7/1/2019,9/30/2019
FGH589J6U88SW,1,1,1,2,1,10/1/2019,12/31/2019
253DRWQ85AT2F334B,1,2,1,3,1,1/1/2019,3/31/2019
253DRWQ85AT2F334B,1,2,1,3,1,4/1/2019,6/30/2019
253DRWQ85AT2F334B,1,2,1,3,1,7/1/2019,9/30/2019
253DRWQ85AT2F334B,1,2,1,3,1,10/1/2019,12/31/2019

预期结果应为:

id,measure1,measure2,measure3,measure4,measure5,begin_date,end_date
ABC123XYZ789,1,1,1,1,1,1/1/2019,3/31/2019
ABC123XYZ789,1,1,1,1,1,4/23/2019,9/30/2019
ABC123XYZ789,1,1,1,1,1,10/12/2019,12/31/2019
FGH589J6U88SW,1,1,1,1,1,1/1/2019,6/30/2019
FGH589J6U88SW,1,1,1,2,1,7/1/2019,12/31/2019
253DRWQ85AT2F334B,1,2,1,3,1,1/1/2019,12/31/2019

我已经实现了下面的代码,它似乎解决了条件#1,但我正在寻找如何将条件#2合并到解决方案中的想法。你知道吗

import pandas as pd
import time

startTime=time.time()
data=pd.read_csv('C:\\Users\\usertemp\\Data\\Rollup2.csv')

data['end_date']= pd.to_datetime(data['end_date']) 
data['begin_date']= pd.to_datetime(data['begin_date']) 

data = data.groupby(['id','measure1','measure2', 'measure3', 'measure4', 'measure5']) \
    ['begin_date', 'end_date'].agg({'begin_date': ['min'], 'end_date': ['max']}).reset_index()
print(data)
print("It took %s seconds for the collapse process" % (time.time() - startTime))

感谢您的帮助。你知道吗


Tags: iddatadatetime条件endpdbegin
1条回答
网友
1楼 · 发布于 2024-06-07 19:03:04

您可以执行以下操作。你知道吗

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Convert begin_date and end_time to datetime
df['begin_date'] = pd.to_datetime(df['begin_date'], format='%m/%d/%Y')
df['end_date']= pd.to_datetime(df['end_date'], format='%m/%d/%Y')

# We create a new column which contains the end_date+1 from the previous row
df['end_date_prev'] = df['end_date'].iloc[:-1] + timedelta(days=1)
df['end_date_prev'] = np.roll(df['end_date_prev'], 1)

# Create a cumsum that resets when begin_date and end_date_prev doesn't match
df['cont'] = (~(df['begin_date'] == df['end_date_prev'])).astype(int).cumsum()

# Since we need all measures to match we create a string column containing all measurements
df['comb_measure'] = df['measure1'].astype(str).str.cat(df[['measure{}'.format(i) for i in range(2,6)]].astype(str))

# Get the final df
new_df = df.groupby(['id', 'comb_measure', 'cont']).agg(
    {'measure1':'first', 'measure2':'first', 'measure3':'first', 'measure4':'first', 'measure5':'first', 
     'begin_date':'first', 'end_date':'last'})

相关问题 更多 >

    热门问题