I have a DataFrame loaded from a CSV file with the following columns: user_id, path, timestamp, gender.
| user_id | path | timestamp | gender |
|:-------: |------ |--------------------- |-------- |
| 0 | 1 | 2017-01-01 01:08:56 | f |
| 0 | 2 | 2017-01-01 01:07:56 | f |
| 0 | 3 | 2017-01-01 01:08:40 | f |
| 0 | 4 | 2017-01-01 01:04:36 | f |
| 0 | 5 | 2017-01-01 01:09:53 | f |
| 0 | 6 | 2017-01-01 01:12:33 | f |
| 0 | 7 | 2017-01-01 01:14:12 | f |
| 0 | 8 | 2017-01-01 01:16:25 | f |
| 0 | 9 | 2017-01-01 01:16:56 | f |
| 1 | 1 | 2017-01-01 01:08:56 | m |
| 1 | 2 | 2017-01-01 01:08:06 | m |
| 1 | 3 | 2017-01-01 01:10:51 | m |
| 1 | 4 | 2017-01-01 01:13:53 | m |
| 2 | 1 | 2017-01-01 01:08:56 | f |
| 3 | 2 | 2017-01-01 01:34:56 | m |
The output should look like this, a sequence of elements (shown here for user 0):
| paths | timestamps | gender |
|------------------- |--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |-------- |
| 1,2,3,4,5,6,7,8,9 | 2017-01-01 01:08:56, 2017-01-01 01:07:56, 2017-01-01 01:08:40, 2017-01-01 01:04:36, 2017-01-01 01:09:53, 2017-01-01 01:12:33, 2017-01-01 01:14:12, 2017-01-01 01:16:25, 2017-01-01 01:16:56 | f |
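The problem is that the same user_id has multiple rows with different timestamps, and I need one sequence per user for time-series classification (predicting gender from the paths). Also, timestamps are not unique across the whole DataFrame, but they are unique within each user.
I first tried the pandas groupby function with the following code: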
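For reference, one way to get one row per user is a single-key `groupby` with named aggregation. This is only a minimal sketch: the inline CSV is a cut-down stand-in for the real file, and the output column names `paths` and `timestamps` are taken from the desired table above. It keeps the rows in file order, as in the expected output.

```python
import io
import pandas as pd

# Small stand-in for the CSV in the question (users 1 to 3 only).
csv_text = """user_id,path,timestamp,gender
1,1,2017-01-01 01:08:56,m
1,2,2017-01-01 01:08:06,m
1,3,2017-01-01 01:10:51,m
1,4,2017-01-01 01:13:53,m
2,1,2017-01-01 01:08:56,f
3,2,2017-01-01 01:34:56,m
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# One row per user: paths and timestamps are collected into lists
# (row order within each user is kept), gender comes from the first row.
sequences = df.groupby("user_id").agg(
    paths=("path", list),
    timestamps=("timestamp", list),
    gender=("gender", "first"),
)
print(sequences)
```

If the sequences should be in time order instead, sorting with `df.sort_values(["user_id", "timestamp"])` before the `groupby` would probably be enough.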
    # Assumption: `grouped` comes from a groupby on more than one key
    # (name[0] indexes a tuple); its definition is not shown in the post.
    dictionary = {}
    for name, group in grouped:
        index = name[0]
        if dictionary.get(index, -1) == -1:
            dictionary[index] = {"sequence": group.path.values, "timestamps": group.timestamp.values, "gender": group.gender.values[0]}
        else:
            dictionary[index]["sequence"] = [dictionary[index]["sequence"], group.path.values]
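This doesn't really work: I can't get the values out (the result stays a MultiIndex), and I can't extract the values for each group.
I also tried the following snippet: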
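The MultiIndex most likely comes from grouping by more than one key. A small sketch of the difference, using a made-up three-row frame rather than the real data: grouping by two keys gives tuple group names and a two-level index, `reset_index()` flattens it back into columns, and grouping by `user_id` alone gives scalar names and plain per-user frames.

```python
import pandas as pd

# Illustrative data only, shaped like the table in the question.
df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "path": [1, 2, 1],
    "timestamp": pd.to_datetime([
        "2017-01-01 01:08:56",
        "2017-01-01 01:08:06",
        "2017-01-01 01:08:56",
    ]),
    "gender": ["m", "m", "f"],
})

# Grouping by two keys: the aggregated result carries a two-level MultiIndex.
by_two = df.groupby(["user_id", "timestamp"])["path"].first()
print(by_two.index.nlevels)  # 2

# reset_index() turns the index levels back into ordinary columns.
flat = by_two.reset_index()

# Grouping by user_id alone: each name is a scalar and each group is a
# regular DataFrame whose columns can be read directly.
per_user = {user: group["path"].tolist() for user, group in df.groupby("user_id")}
print(per_user)
```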
    dictionary = {}
    for name, group in grouped:
        index = name[0]
        if dictionary.get(index, -1) == -1:
            dictionary[index] = {"sequence": group.path.values, "timestamps": group.timestamp.values, "gender": group.gender.values[0]}
        else:
            dictionary[index]["sequence"] = [dictionary[index]["sequence"], group.path.values]
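Note that the `else` branch above nests the old and new arrays in a list (`[old, new]`) instead of joining them. If the loop runs over one group per user, that branch is not needed at all. A rough sketch of the same dictionary built that way follows; `build_sequences` and its `sort_by_time` flag are hypothetical names introduced here, not part of the original code.

```python
import pandas as pd

def build_sequences(df, sort_by_time=False):
    """Return {user_id: {"sequence", "timestamps", "gender"}}, one entry per user."""
    if sort_by_time:
        # Optional: put each user's rows in time order before grouping.
        df = df.sort_values(["user_id", "timestamp"])
    result = {}
    # A single-key groupby yields exactly one group per user.
    for user_id, group in df.groupby("user_id"):
        result[user_id] = {
            "sequence": group["path"].to_numpy(),
            "timestamps": group["timestamp"].to_numpy(),
            "gender": group["gender"].iloc[0],
        }
    return result

# Illustrative data copied from the rows for users 1 and 2 in the question.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2],
    "path": [1, 2, 3, 4, 1],
    "timestamp": pd.to_datetime([
        "2017-01-01 01:08:56",
        "2017-01-01 01:08:06",
        "2017-01-01 01:10:51",
        "2017-01-01 01:13:53",
        "2017-01-01 01:08:56",
    ]),
    "gender": ["m", "m", "m", "m", "f"],
})

sequences = build_sequences(df)
print(sequences[1]["sequence"])
```

Whether the sequences should follow file order or time order depends on the classifier; the flag is there only to make that choice explicit.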
Thanks for your help.