Unstacking a column into a DataFrame

Posted 2024-04-23 14:14:04

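I have some messy sensor readings like the ones below. Each record (records vary in length) is separated by "----", and the records are stacked on top of each other in a single column. Is there a way to flatten this into a DataFrame in which each row is one record?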

test = pd.DataFrame({"Messy":["21/12/2017 11:12:48","Port:4","Reading 1: 1","----","21/12/2017 11:13:48","Port:4","Reading 1: 2","Reading 2: 2.5","----"]})
test

    Messy
0   21/12/2017 11:12:48
1   Port:4
2   Reading 1: 1
3   ----
4   21/12/2017 11:13:48
5   Port:4
6   Reading 1: 2
7   Reading 2: 2.5
8   ----

What I want is something like this:

target = pd.DataFrame({"Time":["21/12/2017 11:12:48","21/12/2017 11:13:48"],"Port":["Port:4","Port:4"],"Field1":['Reading 1: 1','Reading 1: 2'],"Field2":['','Reading 2: 2.5']})
target

   Field1         Field2           Port      Time
0  Reading 1: 1                    Port:4    21/12/2017 11:12:48
1  Reading 1: 2   Reading 2: 2.5   Port:4    21/12/2017 11:13:48

3 Answers

Here is one solution. Your data is messy, so this approach assumes that it can be cut into consecutive groups of 4 rows, one group per record.

import numpy as np, pandas as pd

test = pd.DataFrame({"Messy":["21/12/2017 11:12:48","Port:4","Reading 1: 1","  ","21/12/2017 11:13:48","Port:4","Reading 1: 2","Reading 2: 2.5","  "]})

# Cut the column into consecutive chunks of 4 rows; each chunk becomes one record
lst = [test['Messy'].iloc[4*i:4*i+4].to_numpy()
       for i in range(len(test.index) // 4)]

df = pd.DataFrame(lst, columns=['Date', 'Port', 'Field1', 'Field2']).replace({'  ': ''})

#                   Date    Port        Field1          Field2
# 0  21/12/2017 11:12:48  Port:4  Reading 1: 1                
# 1  21/12/2017 11:13:48  Port:4  Reading 1: 2  Reading 2: 2.5
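
The fixed 4-row grouping above stops working as soon as a record has a different number of fields. As a rough sketch of an alternative, the column can be split on the separator rows instead. This assumes the separator is exactly "----" as in the question; the names `records`, `padded`, and `flat`, and the generated `Field1`, `Field2`, ... column names, are illustrative choices and not part of the original answer.

```python
import pandas as pd

test = pd.DataFrame({"Messy": ["21/12/2017 11:12:48", "Port:4", "Reading 1: 1", "----",
                               "21/12/2017 11:13:48", "Port:4", "Reading 1: 2",
                               "Reading 2: 2.5", "----"]})

# Collect values into one list per record, starting a new list at each separator
records, current = [], []
for value in test["Messy"]:
    if value == "----":
        if current:
            records.append(current)
        current = []
    else:
        current.append(value)
if current:  # keep a trailing record that has no closing separator
    records.append(current)

# Pad shorter records with empty strings so every row has the same width
width = max(len(r) for r in records)
padded = [r + [""] * (width - len(r)) for r in records]

# Assumed layout: first value is the time, second the port, the rest are readings
names = ["Time", "Port"] + [f"Field{i}" for i in range(1, width - 1)]
flat = pd.DataFrame(padded, columns=names)
print(flat)
```

Because the width is taken from the longest record, a record with more readings simply adds more `Field` columns.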

Assuming there are at most 4 columns and every record lists its fields in the same order, here is another solution using re, io, and pandas:

import pandas as pd
import io
import re
d = {"Messy":["21/12/2017 11:12:48","Port:4","Reading 1: 1","  ",
            "21/12/2017 11:13:48","Port:4","Reading 1: 2","Reading 2: 2.5",
            "  "]}

test = pd.read_csv(io.StringIO(re.sub(r',  ,?','\n', ','.join(d['Messy']))),
                   names=['Time','Port','Field1','Field2'])


In [13]: 
print(test)

Out[13]:
    Time                Port    Field1          Field2
0   21/12/2017 11:12:48 Port:4  Reading 1: 1    NaN
1   21/12/2017 11:13:48 Port:4  Reading 1: 2    Reading 2: 2.5

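You can extend this solution by adding more column names to the names list passed to pd.read_csv(). For example, if a record in your data can have up to 10 columns, just map them to 10 column names.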
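
As a small sketch of that extension, the example below uses made-up data in which the second record carries a third reading, so five column names are supplied. The extra "Reading 3: 7" value and the `Field3` name are hypothetical and only serve to illustrate the idea; the separator is the two-space string used in this answer.

```python
import io
import re

import pandas as pd

# Hypothetical data: the second record has three readings (up to 5 fields per record)
messy = ["21/12/2017 11:12:48", "Port:4", "Reading 1: 1", "  ",
         "21/12/2017 11:13:48", "Port:4", "Reading 1: 2", "Reading 2: 2.5",
         "Reading 3: 7", "  "]

# Join everything with commas, then turn each two-space separator into a line break
csv_text = re.sub(r",  ,?", "\n", ",".join(messy))

# One name per possible field; shorter records are padded with NaN
names = ["Time", "Port", "Field1", "Field2", "Field3"]
wide = pd.read_csv(io.StringIO(csv_text), names=names)
print(wide)
```

Note that this relies on no value containing a comma, since the values are joined into CSV text before parsing.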

Obviously it depends on the data, but you can try:

#check separator
m = test['Messy'].str.startswith('  ')
#create groups
test['g'] = m.cumsum()
#filter separator rows
df = test[~m].copy()
#count groups
df['c'] = df.groupby('g').cumcount()
print (df)
                 Messy  g  c
0  21/12/2017 11:12:48  0  0
1               Port:4  0  1
2         Reading 1: 1  0  2
4  21/12/2017 11:13:48  1  0
5               Port:4  1  1
6         Reading 1: 2  1  2
7       Reading 2: 2.5  1  3

#pivoting
df = df.pivot(index='g', columns='c', values='Messy')
print (df)
c                    0       1             2               3
g                                                           
0  21/12/2017 11:12:48  Port:4  Reading 1: 1            None
1  21/12/2017 11:13:48  Port:4  Reading 1: 2  Reading 2: 2.5
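
To get from the pivoted frame to the layout asked for in the question, the numbered columns still need names and the missing cells need filling. The following is a minimal end-to-end sketch, assuming the "----" separator from the question rather than the two-space string used above; the `Time`/`Port`/`FieldN` naming is an assumption about the record layout, not something the original answer specifies.

```python
import pandas as pd

test = pd.DataFrame({"Messy": ["21/12/2017 11:12:48", "Port:4", "Reading 1: 1", "----",
                               "21/12/2017 11:13:48", "Port:4", "Reading 1: 2",
                               "Reading 2: 2.5", "----"]})

# Mark separator rows and number the records by counting separators seen so far
is_sep = test["Messy"].eq("----")
test["g"] = is_sep.cumsum()

# Drop the separators, then number the values within each record
df = test[~is_sep].copy()
df["c"] = df.groupby("g").cumcount()

# One row per record, one column per position within the record
wide = df.pivot(index="g", columns="c", values="Messy")

# Assumed layout: position 0 is the time, position 1 the port, the rest are readings
wide.columns = ["Time", "Port"] + [f"Field{i}" for i in range(1, wide.shape[1] - 1)]
wide = wide.fillna("").reset_index(drop=True)
print(wide)
```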
