Generating time series sequences from a DataFrame without a unique index

Posted 2024-04-25 20:57:03

Male | A programmer who enjoys writing Python code.

I have a DataFrame loaded from a CSV with the following columns: user_id, path, timestamp, gender

| user_id   | path  | timestamp             | gender    |
|:-------:  |------ |---------------------  |--------   |
| 0         | 1     | 2017-01-01 01:08:56   | f         |
| 0         | 2     | 2017-01-01 01:07:56   | f         |
| 0         | 3     | 2017-01-01 01:08:40   | f         |
| 0         | 4     | 2017-01-01 01:04:36   | f         |
| 0         | 5     | 2017-01-01 01:09:53   | f         |
| 0         | 6     | 2017-01-01 01:12:33   | f         |
| 0         | 7     | 2017-01-01 01:14:12   | f         |
| 0         | 8     | 2017-01-01 01:16:25   | f         |
| 0         | 9     | 2017-01-01 01:16:56   | f         |
| 1         | 1     | 2017-01-01 01:08:56   | m         |
| 1         | 2     | 2017-01-01 01:08:06   | m         |
| 1         | 3     | 2017-01-01 01:10:51   | m         |
| 1         | 4     | 2017-01-01 01:13:53   | m         |
| 2         | 1     | 2017-01-01 01:08:56   | f         |
| 3         | 2     | 2017-01-01 01:34:56   | m         |

The output should look like this, with one row of sequences per user (shown here for user 0):

| paths                 | timestamps                                                                                                                                                                                    | gender    |
|-------------------    |---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  |--------   |
| 1,2,3,4,5,6,7,8,9     | 2017-01-01 01:08:56, 2017-01-01 01:07:56, 2017-01-01 01:08:40, 2017-01-01 01:04:36, 2017-01-01 01:09:53, 2017-01-01 01:12:33, 2017-01-01 01:14:12, 2017-01-01 01:16:25, 2017-01-01 01:16:56   | f         |

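The problem is that the same user_id has multiple rows with different timestamps, and I need a single sequence per user for time series classification (predicting gender from the paths). Also, the timestamps are not unique across the whole DataFrame, but they are unique within each user.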
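One way to get one row per user is to group on `user_id` alone and collect each column into a list. The following is only a minimal sketch under stated assumptions: the sample frame is a shortened copy of the table above, and the output column names `paths` and `timestamps` are taken from the desired output rather than from any existing code.

```python
import pandas as pd

# Shortened copy of the table above, for illustration only.
df = pd.DataFrame(
    {
        "user_id": [0, 0, 0, 1, 1, 2, 3],
        "path": [1, 2, 3, 1, 2, 1, 2],
        "timestamp": pd.to_datetime(
            [
                "2017-01-01 01:08:56",
                "2017-01-01 01:07:56",
                "2017-01-01 01:08:40",
                "2017-01-01 01:08:56",
                "2017-01-01 01:08:06",
                "2017-01-01 01:08:56",
                "2017-01-01 01:34:56",
            ]
        ),
        "gender": ["f", "f", "f", "m", "m", "f", "m"],
    }
)

# Group by user only. Rows inside each group keep their original order,
# and sort=False keeps users in order of first appearance.
sequences = (
    df.groupby("user_id", sort=False)
    .agg(
        paths=("path", list),            # all paths of the user as one list
        timestamps=("timestamp", list),  # matching timestamps, same order
        gender=("gender", "first"),      # assumes gender is constant per user
    )
    .reset_index()                       # user_id back as an ordinary column
)

print(sequences)
```

If the sequences should follow time order rather than file order, calling `df.sort_values(["user_id", "timestamp"])` before the `groupby` would be one option; the desired output above keeps the original row order, so the sketch does not sort.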

I first tried the pandas groupby function with the code below:

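# `grouped` is a pandas groupby object; its definition was not included in the post.
dictionary = {}

for name, group in grouped:
    index = name[0]  # first element of the group key
    if dictionary.get(index, -1) == -1:
        # first time this key is seen: store its paths, timestamps, and gender
        dictionary[index] = {"sequence": group.path.values, "timestamps": group.timestamp.values, "gender": group.gender.values[0]}
    else:
        # key seen before: wraps the old and new path arrays in a list (nested, not flattened)
        dictionary[index]["sequence"] = [dictionary[index]["sequence"], group.path.values]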

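This doesn't really work because I can't get the values out (the result stays in a MultiIndex), and I can't extract the values for each group.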
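The `name[0]` lookup suggests the grouping used more than one key, in which case every group name is a tuple. The sketch below contrasts that with a single-key grouping, where the name is the plain `user_id` and each group holds all rows of one user. The two-key grouping on `["user_id", "timestamp"]` is an assumption made for illustration, since the original `grouped` definition is not shown, and `records` is a hypothetical name.

```python
import pandas as pd

# Shortened copy of the table above, for illustration only.
df = pd.DataFrame(
    {
        "user_id": [0, 0, 0, 1, 1],
        "path": [1, 2, 3, 1, 2],
        "timestamp": pd.to_datetime(
            [
                "2017-01-01 01:08:56",
                "2017-01-01 01:07:56",
                "2017-01-01 01:08:40",
                "2017-01-01 01:08:56",
                "2017-01-01 01:08:06",
            ]
        ),
        "gender": ["f", "f", "f", "m", "m"],
    }
)

# Assumed multi-key grouping: each name is a (user_id, timestamp) tuple,
# so with per-user unique timestamps every group contains a single row.
multi = df.groupby(["user_id", "timestamp"])
first_name, first_group = next(iter(multi))
print(type(first_name), len(first_group))

# Single-key grouping: the name is the user_id itself and the group
# contains every row of that user, so no appending across groups is needed.
records = {}
for user_id, group in df.groupby("user_id"):
    records[user_id] = {
        "sequence": group["path"].tolist(),
        "timestamps": group["timestamp"].tolist(),
        "gender": group["gender"].iloc[0],
    }

print(records[0]["sequence"])
```

With one group per user, the `else` branch of the loop above is never needed, and the values come out as plain Python lists rather than index-bound objects.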

I also tried the following snippet:

dictionary = {}

for name, group in grouped:
    index = name[0]
    if dictionary.get(index, -1) == -1:
        dictionary[index] = {"sequence": group.path.values, "timestamps": group.timestamp.values, "gender": group.gender.values[0]}
    else:
        dictionary[index]["sequence"] = [dictionary[index]["sequence"], group.path.values]

Thanks for your help.

Tags: data, path, user, name, id, index, dictionary

0 answers

No answers yet
