Splitting time-series data into groups based on state changes in a column of a Python DataFrame

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']

name  loc  time
john  abc     1
john  abc     2
john  abc     3
john  xyz     4
john  xyz     5
john  abc     6
john  abc     7
matt  abc     8

2 Answers

User

#1 · Edited 2024-04-19 11:04:52

You can pass a function to groupby:

import pandas as pd

x = pd.DataFrame([['john','abc',1],['john','abc',2],['john','abc',3],['john','xyz',4],['john','xyz',5],['john','abc',6],['john','abc',7],['matt','abc',8]])
x.columns = ['name','loc','time']

last_group = None
c = 0
def f(y):
    # groupby calls f with each index label; this relies on the labels being visited in row order
    global c, last_group
    g = x.loc[y, 'name'], x.loc[y, 'loc']
    if last_group != g:
        c += 1  # a new group starts whenever (name, loc) changes
        last_group = g
    return c

print(x.groupby(f).head())
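
The counter in f labels each run of consecutive rows that share the same (name, loc) pair. As a rough sketch of the same idea without global state, itertools.groupby from the standard library groups only adjacent equal keys, which is the behaviour wanted here. The helper name run_ids is made up for illustration and is not part of pandas.

```python
from itertools import groupby

import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)


def run_ids(frame, cols):
    """Hypothetical helper: one integer label per row, increasing at every change in cols."""
    keys = zip(*(frame[c] for c in cols))  # one tuple of values per row
    ids = []
    # itertools.groupby starts a new group each time the key differs from the previous one
    for run_number, (_, rows) in enumerate(groupby(keys), start=1):
        ids.extend(run_number for _ in rows)
    return pd.Series(ids, index=frame.index, name='run')


runs = run_ids(x, ['name', 'loc'])

# Group by the precomputed labels instead of a stateful function
for run_number, part in x.groupby(runs):
    print(run_number, part['time'].tolist())
```

Because the labels are computed up front, the result does not depend on the order in which groupby happens to call a key function.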

User

#2 · Edited 2024-04-19 11:04:52

This isn't really a job for groupby, because the order of the rows matters. Instead, use shift to compare consecutive rows.

In [37]: cols = ['name', 'loc']

In [38]: change = (x[cols] != x[cols].shift(-1)).any(axis=1).shift(1, fill_value=True)

In [39]: groups = x[change].copy()

In [40]: groups.columns = ['name', 'loc', 'first']

In [41]: groups['last'] = (groups['first'].shift(-1) - 1).fillna(len(x)).astype(int)

In [42]: groups
Out[42]:
   name  loc  first  last
0  john  abc      1     3
3  john  xyz      4     5
5  john  abc      6     7
7  matt  abc      8     8
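
The transcript above derives each group's last value from the next group's first value minus one, which works because time increases by exactly 1 per row in this data. As a hedged alternative sketch for data where that may not hold, the same shift comparison can be turned into a run id with cumsum, and the first and last time values read directly from each run. The names starts, run_id and summary are chosen here for illustration.

```python
import pandas as pd

x = pd.DataFrame(
    [['john', 'abc', 1], ['john', 'abc', 2], ['john', 'abc', 3],
     ['john', 'xyz', 4], ['john', 'xyz', 5], ['john', 'abc', 6],
     ['john', 'abc', 7], ['matt', 'abc', 8]],
    columns=['name', 'loc', 'time'],
)
cols = ['name', 'loc']

# True wherever a row differs from the row before it.
# The first row is compared against NaN, so it always starts a run.
starts = (x[cols] != x[cols].shift()).any(axis=1)

# A running count of the starts gives every run its own id: 1, 1, 1, 2, 2, 3, 3, 4
run_id = starts.cumsum()

# Read the boundaries of each run straight from the data
summary = x.groupby(run_id).agg(
    name=('name', 'first'),
    loc=('loc', 'first'),
    first=('time', 'first'),
    last=('time', 'last'),
)
print(summary)
```

Here groupby is used only after the row order has been encoded in run_id, so the two separate john/abc runs stay separate.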
