Operating on tuples held in a Pandas DataFrame column

Published 2024-04-26 18:47:04


I have the following DataFrame:

   start      end         days
0  2015-07-01 2015-07-07         (1, 2, 3, 4, 5, 6, 7)
1  2015-07-08 2015-07-14    (8, 9, 10, 11, 12, 13, 14)
2  2015-07-15 2015-07-21  (15, 16, 17, 18, 19, 20, 21)
3  2015-07-22 2015-07-28  (22, 23, 24, 25, 26, 27, 28)
4  2015-07-29 2015-08-04      (29, 30, 31, 1, 2, 3, 4)
5  2015-08-05 2015-08-11       (5, 6, 7, 8, 9, 10, 11)
6  2015-08-12 2015-08-18  (12, 13, 14, 15, 16, 17, 18)
7  2015-08-19 2015-08-25  (19, 20, 21, 22, 23, 24, 25)
8  2015-08-26 2015-09-01   (26, 27, 28, 29, 30, 31, 1)
9  2015-09-02 2015-09-08         (2, 3, 4, 5, 6, 7, 8)
10 2015-09-09 2015-09-15   (9, 10, 11, 12, 13, 14, 15)
11 2015-09-16 2015-09-22  (16, 17, 18, 19, 20, 21, 22)
12 2015-09-23 2015-09-29  (23, 24, 25, 26, 27, 28, 29)

I would like to filter on the days column (which holds tuples), but basic Pandas filtering syntax does not seem to work:

^{pr2}$

I expected the code above to filter the DataFrame and return the following rows, i.e. the ones whose tuple contains 4:

       start      end             days
    0  2015-07-01 2015-07-07         (1, 2, 3, 4, 5, 6, 7)
    4  2015-07-29 2015-08-04      (29, 30, 31, 1, 2, 3, 4)
    9  2015-09-02 2015-09-08         (2, 3, 4, 5, 6, 7, 8)

Instead, it returns an empty DataFrame.

I also tried creating a new column to hold True/False values, computed from an expression like this:

df['daysTF'] = 4 in df['days']

This returns a DataFrame with the 'daysTF' column set to True for every row, rather than True only where the tuple contains 4.
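A note on why every row comes out True: the `in` operator applied to a Series tests the index labels, not the stored values. The frame above has an index label 4, so `4 in df['days']` is a single True, which is then broadcast to the whole column. A minimal sketch of that behaviour, using a made-up three-row frame rather than the data above:

```python
import pandas as pd

# Hypothetical three-row frame (index labels 0, 1, 2), not the question's data.
df = pd.DataFrame({
    "days": [
        (1, 2, 3, 4, 5, 6, 7),
        (8, 9, 10, 11, 12, 13, 14),
        (29, 30, 31, 1, 2, 3, 4),
    ],
})

# `in` on a Series looks at the index labels, not at the tuples.
label_check = 1 in df["days"]  # 1 is an index label
value_check = 4 in df["days"]  # no row is labelled 4 in this small frame

print(label_check, value_check)
```

Here the second check is False even though two of the tuples contain 4, because no index label equals 4.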


2 Answers

Another approach:

df[[4 in daystuple for daystuple in df['days']]]
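As a self-contained sketch of the list-comprehension approach, assuming a shortened, made-up version of the frame (the `mask` and `result` names are only for illustration):

```python
import pandas as pd

# Hypothetical shortened frame; only three rows are used here.
df = pd.DataFrame({
    "start": pd.to_datetime(["2015-07-01", "2015-07-08", "2015-07-29"]),
    "days": [
        (1, 2, 3, 4, 5, 6, 7),
        (8, 9, 10, 11, 12, 13, 14),
        (29, 30, 31, 1, 2, 3, 4),
    ],
})

# Build a plain Python list of booleans, one per row,
# testing membership inside each tuple.
mask = [4 in daystuple for daystuple in df["days"]]

# Indexing with a boolean list of matching length keeps the True rows.
result = df[mask]
print(result)
```

The membership test runs on each tuple individually, which is what the plain `4 in df['days']` expression does not do.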

One approach is to use the Series.apply method, though it may not be very fast. Example:

df[df['days'].apply(lambda x: 4 in x)]

Demo:

^{pr2}$
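A minimal runnable sketch of the apply approach, again on a made-up three-row frame. It also fills the 'daysTF' column from the question with per-row values:

```python
import pandas as pd

# Hypothetical three-row frame, not the full data from the question.
df = pd.DataFrame({
    "days": [
        (1, 2, 3, 4, 5, 6, 7),
        (8, 9, 10, 11, 12, 13, 14),
        (2, 3, 4, 5, 6, 7, 8),
    ],
})

# apply calls the lambda once per tuple and returns a boolean Series.
df["daysTF"] = df["days"].apply(lambda x: 4 in x)

# The boolean Series can be used directly as a row filter.
filtered = df[df["daysTF"]]
print(filtered)
```

Because the lambda runs in Python for every row, this is likely to be slow on large frames, as noted above.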
