Searching and looking up values between DataFrames

Posted 2024-06-17 12:01:03


I have a DataFrame like this (not necessarily with these dates, this length, or this order):

     date1       date2 dummy
2015-10-01  2015-09-02     1
2015-10-01  2015-09-02     1
2015-10-03  2015-09-02     0
2015-10-04  2015-09-05     0
..........  ..........     .
..........  ..........     .
..........  ..........     .
2015-10-20  2015-11-04     1
2015-10-20  2015-11-05     1

I am creating a new DataFrame that runs from the earliest date in 'date2' to the latest date in 'date1', with every date in between filled in.

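from datetime import timedelta

# Earliest date in 'date2' and latest date in 'date1'
startdate = df['date2'].min(axis=0)
enddate = df['date1'].max(axis=0)

def perdelta(start, end, delta):
    # Yield start, start + delta, ... up to and including end
    curr = start
    while curr <= end:
        yield curr
        curr += delta

data2 = []
for result in perdelta(startdate, enddate, timedelta(days=1)):
    data2.append(result)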
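
As a side note, the same daily range can probably be built without a hand-written generator. The sketch below assumes both columns already hold real datetimes; the small frame and the name `all_days` are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical sample frame mirroring the question's columns.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2015-10-01", "2015-10-03", "2015-10-05"]),
    "date2": pd.to_datetime(["2015-09-02", "2015-09-02", "2015-09-05"]),
    "dummy": [1, 0, 0],
})

startdate = df["date2"].min()
enddate = df["date1"].max()

# One entry per calendar day, both endpoints included.
all_days = pd.date_range(start=startdate, end=enddate, freq="D")
print(all_days[0], all_days[-1], len(all_days))
```

`pd.date_range` returns a `DatetimeIndex`, so it can be passed to `reindex` directly instead of first collecting the dates in a list.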

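For each date in the new DataFrame, I want to match it against 'date1' and count how many rows with that date have 'dummy' equal to zero. I can find all the zeros and count them per date with pandas groupby: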

g = df.groupby(['date1'])
df3 = pd.DataFrame(g.apply(lambda x: x[x['dummy'] == 0]['dummy'].count()), columns=['all_zeros'])

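But this only finds the dates that occur in 'date1' and counts their zeros. It does not start from my startdate, and it also skips dates that only have 1s instead of giving them a zero (a date with no zeros should get a count of 0).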

The result I want is:

 date_newdf  count
'startdate'      0 (cuz it does not exist in date1)
 2015-09-05      0 (cuz it does not exist in date1)
 ..........      .
 ..........      .
 ..........      .
 2015-10-01      3 (found 3 zeros with this date)
 ..........      .
  'enddate'      2

and so on.

To reproduce:

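import pandas as pd
data = {'date1': ['15-10-01', '15-10-01', '15-10-03', '15-10-04', '15-10-05', '15-10-05'],
    'date2': ['15-09-02', '15-09-02', '15-09-02', '15-09-05', '15-09-05', '15-09-05'],
    'dummy': [1,1,0,0,0,1]}
df = pd.DataFrame(data, columns=['date1', 'date2' , 'dummy'])
# Parse the two-digit-year strings so that min()/max() and adding a timedelta work
df['date1'] = pd.to_datetime(df['date1'], format='%y-%m-%d')
df['date2'] = pd.to_datetime(df['date2'], format='%y-%m-%d')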

1 answer
User
#1 · Posted 2024-06-17 12:01:03

I think you need to add a reindex call with the list data2 at the end of your script, and then fill the missing values (NaN) with 0.

A better test input:

       date1      date2  dummy
0 2015-10-01 2015-09-02      1
1 2015-10-01 2015-09-02      1
2 2015-10-03 2015-09-02      0
3 2015-10-04 2015-09-05      0
4 2015-10-05 2015-11-05      0
5 2015-10-05 2015-11-05      0
6 2015-10-05 2015-11-05      0
7 2015-10-05 2015-11-05      1
8 2015-10-05 2015-11-05      1
print(df3)
            all_zeros
date1                
2015-10-01          0
2015-10-03          1
2015-10-04          1
2015-10-05          3
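
The per-date zero count shown above can also be written without `apply`. This is only a sketch: it rebuilds the test input by hand, and the variable name `all_zeros` is chosen here to match the column in the printed output.

```python
import pandas as pd

# Test input from above, 'date1' and 'dummy' columns only.
df = pd.DataFrame({
    "date1": pd.to_datetime(
        ["2015-10-01", "2015-10-01", "2015-10-03", "2015-10-04"]
        + ["2015-10-05"] * 5
    ),
    "dummy": [1, 1, 0, 0, 0, 0, 0, 1, 1],
})

# Mark rows where dummy is zero, then add up the marks per date1.
all_zeros = (
    df["dummy"].eq(0)
    .groupby(df["date1"])
    .sum()
    .rename("all_zeros")
)
print(all_zeros)
```

Because the boolean marks are summed rather than filtered, a date that only has 1s (2015-10-01 here) still appears in the result with a count of 0.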

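# Align the counts to every day in data2; days absent from 'date1' become NaN
df3 = df3.reindex(pd.DatetimeIndex(data2))
# Replace NaN with 0 and restore the integer dtype
df3 = df3.fillna(0).astype(int)
print(df3)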
            all_zeros
2015-09-02          0
2015-09-03          0
2015-09-04          0
2015-09-05          0
2015-09-06          0
2015-09-07          0
2015-09-08          0
2015-09-09          0
2015-09-10          0
2015-09-11          0
2015-09-12          0
2015-09-13          0
2015-09-14          0
2015-09-15          0
2015-09-16          0
2015-09-17          0
2015-09-18          0
2015-09-19          0
2015-09-20          0
2015-09-21          0
2015-09-22          0
2015-09-23          0
2015-09-24          0
2015-09-25          0
2015-09-26          0
2015-09-27          0
2015-09-28          0
2015-09-29          0
2015-09-30          0
2015-10-01          0
2015-10-02          0
2015-10-03          1
2015-10-04          1
2015-10-05          3
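
Putting the pieces together, a compact variant of the whole computation might look like the sketch below. It is not the code from the answer: it swaps in `value_counts` for the groupby step and uses the `fill_value` argument of `reindex` instead of a separate `fillna`. The column names `date_newdf` and `count` are taken from the desired output in the question.

```python
import pandas as pd

# Test input from the answer, rebuilt by hand.
df = pd.DataFrame({
    "date1": pd.to_datetime(
        ["2015-10-01", "2015-10-01", "2015-10-03", "2015-10-04"]
        + ["2015-10-05"] * 5
    ),
    "date2": pd.to_datetime(
        ["2015-09-02", "2015-09-02", "2015-09-02", "2015-09-05"]
        + ["2015-11-05"] * 5
    ),
    "dummy": [1, 1, 0, 0, 0, 0, 0, 1, 1],
})

# Every calendar day from the earliest date2 to the latest date1.
full_range = pd.date_range(df["date2"].min(), df["date1"].max(), freq="D")

# Count rows per date1 among the rows where dummy is zero.
zero_counts = df.loc[df["dummy"].eq(0), "date1"].value_counts()

# Align to the full range; days with no zero rows get 0 and the dtype stays integer.
result = (
    zero_counts.reindex(full_range, fill_value=0)
    .rename_axis("date_newdf")
    .reset_index(name="count")
)
print(result.tail())
```

Passing `fill_value=0` avoids the intermediate NaN values, so no conversion back from float is needed.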
