Select rows of a pandas DataFrame based on a dict

Posted 2024-04-26 07:38:29


I have a pandas DataFrame with only two columns, like this:

          Timestamp       X
0   2017-01-01 00:00:00 18450
1   2017-01-01 00:10:00 13787
2   2017-01-01 00:20:00 3249
3   2017-01-01 00:30:00 44354
4   2017-01-01 00:40:00 50750

The Timestamp column runs from the start of the month to the end of the month at 10-minute intervals.
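A frame of this shape can be sketched as follows, assuming the month is January 2017 (as in the rows above) and that X holds random integers; the seed and the 0 to 99,999 range are illustrative assumptions, not taken from the question:

```python
import numpy as np
import pandas as pd

# Assumed month: January 2017, one row every 10 minutes.
timestamps = pd.date_range('2017-01-01 00:00:00', '2017-01-31 23:50:00', freq='10min')

# Hypothetical X values: random integers with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
l_data = pd.DataFrame({
    'Timestamp': timestamps,
    'X': rng.integers(0, 100_000, size=len(timestamps)),
})
print(l_data.head())
```

`'10min'` is the same frequency as the older `'10T'` alias, which recent pandas versions deprecate.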

I have a dictionary like this:

  {Timestamp('2017-01-18 01:37:19.160000'): Timestamp('2017-01-18 01:37:29.520000'),
 Timestamp('2017-01-18 01:41:04.880000'): Timestamp('2017-01-18 01:41:10.280000'),
 Timestamp('2017-01-18 21:33:52.800000'): Timestamp('2017-01-18 21:40:00.040000'),
 Timestamp('2017-01-18 21:40:02.120000'): Timestamp('2017-01-18 21:50:00.040000'),
 Timestamp('2017-01-18 21:50:02.120000'): Timestamp('2017-01-18 22:00:00.040000'),
 Timestamp('2017-01-18 22:00:02.120000'): Timestamp('2017-01-18 22:01:50.760000'),
 Timestamp('2017-01-18 22:20:22.760000'): Timestamp('2017-01-18 22:25:20.760000'),
 Timestamp('2017-01-18 22:35:52.800000'): Timestamp('2017-01-18 22:40:00.040000')}

The keys of the dictionary are start times and the values are end times. Based on this dict, I want to create a column named L in l_data.

If the time between a key and its value is longer than 5 minutes, the corresponding timestamps in l_data must be marked with 1.

How can I do this directly in pandas instead of using multiple loops?

The expected output looks like this:

    Timestamp       X       L
126 1/18/2017 21:00 43401   0
127 1/18/2017 21:10 290     0
128 1/18/2017 21:20 92509   0
129 1/18/2017 21:30 64545   0
130 1/18/2017 21:40 47780   1
131 1/18/2017 21:50 53293   1
132 1/18/2017 22:00 45634   0
133 1/18/2017 22:10 51462   0
134 1/18/2017 22:20 44736   0
135 1/18/2017 22:30 11697   1
136 1/18/2017 22:40 82587   1
137 1/18/2017 22:50 76250   0
138 1/18/2017 23:00 33307   0
139 1/18/2017 23:10 25851   0
140 1/18/2017 23:20 71131   0
141 1/18/2017 23:30 88015   0
142 1/18/2017 23:40 45577   0
143 1/18/2017 23:50 76761   0
144 1/19/2017 0:00  45363   0

(Only the relevant rows are shown.)
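Since the question asks for a loop-free approach, one option is NumPy broadcasting: compare every row's timestamp against every interval's bounds at once and take `any` across intervals. A minimal sketch under stated assumptions: a timestamp `t` is flagged when an interval longer than 5 minutes overlaps the 10-minute bin `(t - 10 min, t]`, and the grid (the hypothetical `l_data` here) only covers the evening of 2017-01-18. The bin convention is an assumption, so the result will not match every row of the expected output above.

```python
import numpy as np
import pandas as pd

# Intervals from the question's dictionary: start -> end.
intervals = {
    pd.Timestamp('2017-01-18 01:37:19.160000'): pd.Timestamp('2017-01-18 01:37:29.520000'),
    pd.Timestamp('2017-01-18 01:41:04.880000'): pd.Timestamp('2017-01-18 01:41:10.280000'),
    pd.Timestamp('2017-01-18 21:33:52.800000'): pd.Timestamp('2017-01-18 21:40:00.040000'),
    pd.Timestamp('2017-01-18 21:40:02.120000'): pd.Timestamp('2017-01-18 21:50:00.040000'),
    pd.Timestamp('2017-01-18 21:50:02.120000'): pd.Timestamp('2017-01-18 22:00:00.040000'),
    pd.Timestamp('2017-01-18 22:00:02.120000'): pd.Timestamp('2017-01-18 22:01:50.760000'),
    pd.Timestamp('2017-01-18 22:20:22.760000'): pd.Timestamp('2017-01-18 22:25:20.760000'),
    pd.Timestamp('2017-01-18 22:35:52.800000'): pd.Timestamp('2017-01-18 22:40:00.040000'),
}

# Hypothetical 10-minute grid for the evening of 2017-01-18.
l_data = pd.DataFrame({
    'Timestamp': pd.date_range('2017-01-18 21:00', '2017-01-19 00:00', freq='10min')
})

starts = pd.DatetimeIndex(list(intervals.keys()))
ends = pd.DatetimeIndex(list(intervals.values()))

# Keep only intervals longer than 5 minutes.
keep = (ends - starts) > pd.Timedelta(minutes=5)
starts, ends = starts[keep], ends[keep]

# Assumed rule: t is flagged if a kept interval overlaps the bin (t - 10 min, t].
# Broadcasting builds a (rows x intervals) boolean matrix in one step.
t = l_data['Timestamp'].to_numpy()[:, None]
bin_start = t - np.timedelta64(10, 'm')
overlap = (starts.to_numpy()[None, :] <= t) & (ends.to_numpy()[None, :] > bin_start)
l_data['L'] = overlap.any(axis=1).astype(int)
print(l_data)
```

With the question's dictionary, only the intervals starting at 21:33:52.8, 21:40:02.12 and 21:50:02.12 pass the 5-minute filter, and the sketch flags 21:40, 21:50, 22:00 and 22:10. The comparison matrix has one cell per (row, interval) pair, which is cheap for a month of 10-minute rows and a handful of intervals but grows with both.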


1 answer
User
#1 · Posted 2024-04-26 07:38:29

I believe you need:

import pandas as pd

d = { pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000')}

l_data = pd.DataFrame()
l_data['Timestamp'] = pd.date_range(start=pd.Timestamp('2017-01-18 20:00:00'),
                                    end=pd.Timestamp('2017-01-19 04:00:00'), freq='10min')
l_data['expected'] = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# v: the Timestamp values to flag with 1 (built from d; that step is not shown here)
l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print(l_data.head(20))
             Timestamp  expected  L
0  2017-01-18 20:00:00         0  0
1  2017-01-18 20:10:00         0  0
2  2017-01-18 20:20:00         0  0
3  2017-01-18 20:30:00         0  0
4  2017-01-18 20:40:00         0  0
5  2017-01-18 20:50:00         0  0
6  2017-01-18 21:00:00         0  0
7  2017-01-18 21:10:00         0  0
8  2017-01-18 21:20:00         0  0
9  2017-01-18 21:30:00         0  0
10 2017-01-18 21:40:00         0  0
11 2017-01-18 21:50:00         1  1
12 2017-01-18 22:00:00         1  1
13 2017-01-18 22:10:00         1  1
14 2017-01-18 22:20:00         1  1
15 2017-01-18 22:30:00         1  1
16 2017-01-18 22:40:00         1  1
17 2017-01-18 22:50:00         1  1
18 2017-01-18 23:00:00         1  1
19 2017-01-18 23:10:00         0  0
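The answer's code uses `v` without showing how it is built. A minimal sketch of one possible construction, assuming an interval longer than 5 minutes flags every 10-minute grid point from `start.ceil('10min')` through `end.ceil('10min')`, i.e. every bin `(t - 10 min, t]` it touches. This rule is an assumption, not necessarily the answerer's. The loop runs over the dictionary's intervals, not over the DataFrame rows, and `isin` does the row-wise matching:

```python
import pandas as pd

# Intervals from the answer above: start -> end.
d = {
    pd.Timestamp('2017-01-18 21:45:02.120000'): pd.Timestamp('2017-01-18 21:50:29.040000'),
    pd.Timestamp('2017-01-18 21:51:02.120000'): pd.Timestamp('2017-01-18 22:52:00.040000'),
    pd.Timestamp('2017-01-18 22:52:02.120000'): pd.Timestamp('2017-01-18 22:57:59.760000'),
    pd.Timestamp('2017-01-18 23:41:52.800000'): pd.Timestamp('2017-01-18 23:43:00.040000'),
    pd.Timestamp('2017-01-18 23:44:52.800000'): pd.Timestamp('2017-01-18 23:50:30.040000'),
    pd.Timestamp('2017-01-19 01:10:32.800000'): pd.Timestamp('2017-01-19 01:11:30.040000'),
    pd.Timestamp('2017-01-19 01:40:32.800000'): pd.Timestamp('2017-01-19 01:55:30.040000'),
    pd.Timestamp('2017-01-19 01:57:32.800000'): pd.Timestamp('2017-01-19 02:04:30.040000'),
}

l_data = pd.DataFrame({
    'Timestamp': pd.date_range('2017-01-18 20:00:00', '2017-01-19 04:00:00', freq='10min')
})

# Assumed rule: each interval longer than 5 minutes contributes every grid point
# from start.ceil('10min') through end.ceil('10min').
min_len = pd.Timedelta(minutes=5)
v = set()
for start, end in d.items():
    if end - start > min_len:
        v.update(pd.date_range(start.ceil('10min'), end.ceil('10min'), freq='10min'))

# Vectorised membership test over all rows at once.
l_data['L'] = l_data['Timestamp'].isin(v).astype(int)
print(l_data.head(20))
```

On this data the first 20 rows reproduce the `L` column printed above. Further down, the rule also flags 2017-01-19 00:00, because the 23:44:52.8 to 23:50:30.04 interval runs 30 seconds into that bin, whereas `expected` has 0 there; a stricter overlap rule would be needed if that row must stay 0.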
