Merging multiple rows of data into a single row in a DataFrame

1 vote
1 answer
46 views
Asked 2025-04-14 18:34

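I'm extracting data points from a multi-line text file and trying to group them so that each group becomes a single row in a DataFrame. Right now every data point ends up on its own row; I'd like to consolidate them into two rows, one for Group1 and one for Group2. I'm fairly new to Python, so if there is a more efficient way to do this, even better. I tried groupby(), but it didn't seem to have any effect. Thanks in advance.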

import pandas as pd

data = """
Jan 2024
Group1 02/02/2024
dog 10 20
cat 21 32
Group2 05/02/2024
dog 23 45
cat 45 65
owl 24 12
monthly
Admin 02 22
clean 05 32
"""

extract = []
dog, cat, owl = [], [], []
for line in data.splitlines():
    a = c = e = ''
    # print(line)
    if 'Group' in line:
        group = line.rsplit()[0]
    
    if 'dog' in line or 'cat' in line or 'owl' in line:
        if line.startswith("dog"):
            dog, a, b = line.split()
        elif line.startswith("cat"):
            cat, c, d = line.split()
        elif line.startswith("owl"):
            owl, e, f = line.split()
        
        extract.append({
            'group': group,
            'dog': a,
            'cat': c,
            'owl': e
        })

df = pd.DataFrame(extract)
df = df[['group', 'dog', 'cat', 'owl']]
print(df)

What I currently get:

    group dog cat owl
0  Group1  10
1  Group1      21
2  Group2  23
3  Group2      45
4  Group2          24

What I want:

   group dog cat owl
0  Group1  10 21
1  Group2  23 45  24 

1 Answer

0

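You can merge the rows before creating the DataFrame. Keep one dict per group, mapping column name -> value. When a new group starts, append the dict for the previous group to the list as a row, then start a fresh dict for the new group. Don't forget to append the last row after the loop ends.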

# Reuses `data` and `pd` from the question's code.
extract = []
row = None

for line in data.splitlines():
    if 'Group' in line:
        if row is not None: # we have something to add
            extract.append(row)
        group = line.rsplit()[0]
        row = {'group': group} # new group starts - refreshing our dict
    
    if 'dog' in line or 'cat' in line or 'owl' in line:
        animal, val1, val2 = line.split()
        row[animal] = val1
        
if row is not None: # a final group
    extract.append(row)

df = pd.DataFrame(extract)
df = df[['group', 'dog', 'cat', 'owl']]
print(df)

Output:

    group dog cat  owl
0  Group1  10  21  NaN
1  Group2  23  45   24
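
Since the question mentions that groupby() had no visible effect, here is a minimal sketch of one way it could be made to work on the question's intermediate rows. This is not part of the approach above, just an alternative under stated assumptions: the `rows` list is typed out by hand to mirror the one-animal-per-row result shown in the question, and the names `rows` and `merged` are illustrative. The likely catch is that the unset cells are empty strings, which pandas does not treat as missing, so they need to be turned into missing values before `first()` can skip them.

```python
import pandas as pd

# Hand-written stand-in for the question's intermediate result:
# one animal per row, empty strings where a column was left unset.
rows = [
    {"group": "Group1", "dog": "10", "cat": "", "owl": ""},
    {"group": "Group1", "dog": "", "cat": "21", "owl": ""},
    {"group": "Group2", "dog": "23", "cat": "", "owl": ""},
    {"group": "Group2", "dog": "", "cat": "45", "owl": ""},
    {"group": "Group2", "dog": "", "cat": "", "owl": "24"},
]
df = pd.DataFrame(rows)

# Empty strings are ordinary values, not missing ones, so mask them out
# (masked cells become missing) before aggregating.
df = df.mask(df == "")

# first() takes the first non-missing value per column within each group.
merged = df.groupby("group", as_index=False).first()
print(merged)
```

Building the merged rows up front, as in the loop above, avoids the extra pass; the groupby route may be handy when the one-value-per-row frame already exists.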
