Generating time-series sequences from a DataFrame without a unique index

Posted 2024-04-25 20:57:03


I have a DataFrame loaded from a CSV with the following columns: user_id, path, timestamp, gender

| user_id   | path  | timestamp             | gender    |
|:-------:  |------ |---------------------  |--------   |
| 0         | 1     | 2017-01-01 01:08:56   | f         |
| 0         | 2     | 2017-01-01 01:07:56   | f         |
| 0         | 3     | 2017-01-01 01:08:40   | f         |
| 0         | 4     | 2017-01-01 01:04:36   | f         |
| 0         | 5     | 2017-01-01 01:09:53   | f         |
| 0         | 6     | 2017-01-01 01:12:33   | f         |
| 0         | 7     | 2017-01-01 01:14:12   | f         |
| 0         | 8     | 2017-01-01 01:16:25   | f         |
| 0         | 9     | 2017-01-01 01:16:56   | f         |
| 1         | 1     | 2017-01-01 01:08:56   | m         |
| 1         | 2     | 2017-01-01 01:08:06   | m         |
| 1         | 3     | 2017-01-01 01:10:51   | m         |
| 1         | 4     | 2017-01-01 01:13:53   | m         |
| 2         | 1     | 2017-01-01 01:08:56   | f         |
| 3         | 2     | 2017-01-01 01:34:56   | m         |

The output should look like this, with each row holding a sequence of elements:

| paths                 | timestamps                                                                                                                                                                                    | gender    |
|-------------------    |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  |--------   |
| 1,2,3,4,5,6,7,8,9     | 2017-01-01 01:08:56, 2017-01-01 01:07:56, 2017-01-01 01:08:40, 2017-01-01 01:04:36, 2017-01-01 01:09:53, 2017-01-01 01:12:33, 2017-01-01 01:14:12, 2017-01-01 01:16:25, 2017-01-01 01:16:56   | f         |

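The problem is that the same user_id has multiple rows with different timestamps, and I need a single sequence per user for time series classification (predicting gender from the paths). Also, timestamps are not unique across the whole DataFrame, but they are unique within each user.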

I first tried the pandas groupby function with the following code:

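dictionary = {}

# Assumes `grouped` came from a groupby on several keys,
# so `name` is a tuple and `name[0]` is its first key.
for name, group in grouped:
    index = name[0]
    if dictionary.get(index, -1) == -1:
        dictionary[index] = {"sequence": group.path.values, "timestamps": group.timestamp.values, "gender": group.gender.values[0]}
    else:
        # This nests the old and new arrays in a list instead of concatenating them.
        dictionary[index]["sequence"] = [dictionary[index]["sequence"], group.path.values]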

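This did not really work: I could not get the values out (the result stays in a MultiIndex), so I was unable to extract the values for each group.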

I also tried the following snippet:

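dictionary = {}

# Assumes `grouped` came from a groupby on several keys,
# so `name` is a tuple and `name[0]` is its first key.
for name, group in grouped:
    index = name[0]
    if dictionary.get(index, -1) == -1:
        dictionary[index] = {"sequence": group.path.values, "timestamps": group.timestamp.values, "gender": group.gender.values[0]}
    else:
        # This nests the old and new arrays in a list instead of concatenating them.
        dictionary[index]["sequence"] = [dictionary[index]["sequence"], group.path.values]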

[Image: result after trying to generate a dictionary]

Thanks for your help.
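For reference, one possible way to get one row per user is a single groupby on user_id with named aggregation, collecting path and timestamp into lists. This is only a minimal sketch under assumptions not stated in the question: the inlined `csv_text` sample stands in for the real CSV file, gender is assumed constant per user (so taking the first value is enough), and the output names `paths` and `timestamps` are chosen to match the desired table above.

```python
import io

import pandas as pd

# A few rows from the table above, inlined so the sketch runs on its own.
# In practice this would be pd.read_csv on the actual file.
csv_text = """user_id,path,timestamp,gender
0,1,2017-01-01 01:08:56,f
0,2,2017-01-01 01:07:56,f
0,3,2017-01-01 01:08:40,f
1,1,2017-01-01 01:08:56,m
1,2,2017-01-01 01:08:06,m
2,1,2017-01-01 01:08:56,f
3,2,2017-01-01 01:34:56,m
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Group on a single key and keep it as a regular column (as_index=False),
# so the result is a flat DataFrame rather than one with a MultiIndex.
sequences = df.groupby("user_id", as_index=False).agg(
    paths=("path", list),            # all paths of the user, in row order
    timestamps=("timestamp", list),  # matching timestamps, in row order
    gender=("gender", "first"),      # assumption: gender is constant per user
)

print(sequences)
```

Rows keep their original order within each user. If the sequences should be chronological instead, calling `df.sort_values(["user_id", "timestamp"])` before the groupby would be one way to do it.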

