Pandas: read a file into a DataFrame, converting repeated section headers into a column

Posted 2024-04-29 03:55:21

I have the following data in a .txt file:

LC xx1   
Name y1 y2 y3
A 10 12 13
B 9 11 15
C 7 15 16

LC xy2   
Name y1 y2 y3
A 11 12 19
B 20 37 20
C 40 15 1

I want to read it into a pandas DataFrame with the following structure:

LC Name y1 y2 y3
xx1 A 10 12 13
xx1 B 9 11 15
xx1 C 7 15 16
xy2 A 11 12 19
xy2 B 20 37 20
xy2 C 40 15 1

Does anyone know how to do this programmatically? I need to do it for a large file of about 10 MB.

Thanks in advance.

J.A


2 Answers

You can use:

import io
import pandas as pd

temp = u""" LC xx1   
    Name y1 y2 y3
    A 10 12 13
    B 9 11 15
    C 7 15 16

    LC xy2   
    Name y1 y2 y3
    A 11 12 19
    B 20 37 20
    C 40 15 1"""
# after testing, replace 'io.StringIO(temp)' with 'filename.txt'

# set the names parameter to the number of columns
df = pd.read_csv(io.StringIO(temp), sep=r"\s+", names=range(4))
print (df)
     0    1    2    3
0    LC  xx1  NaN  NaN
1  Name   y1   y2   y3
2     A   10   12   13
3     B    9   11   15
4     C    7   15   16
5    LC  xy2  NaN  NaN
6  Name   y1   y2   y3
7     A   11   12   19
8     B   20   37   20
9     C   40   15    1

# use the second row as the column names
df.columns = df.iloc[1]
# remove the leftover name (1) of the columns index
df.columns.name = None
# build a mask by comparing with 'LC'
mask = df['Name'] == 'LC'
# create a new column from the masked values, forward filling the gaps
df.insert(0, 'LC', df['y1'].where(mask).ffill())
# remove the 'LC' rows and the repeated header rows
df = df[~mask & (df['Name'] != 'Name')].reset_index(drop=True)
print(df)
    LC Name  y1  y2  y3
0  xx1    A  10  12  13
1  xx1    B   9  11  15
2  xx1    C   7  15  16
3  xy2    A  11  12  19
4  xy2    B  20  37  20
5  xy2    C  40  15   1
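
One thing to watch with this approach: because the header text and the numbers are read into the same columns, the y1..y3 values most likely stay as strings after filtering. A minimal sketch of converting them, using a small hand-built stand-in for the frame above (the stand-in data and the `value_cols` name are assumptions for illustration):

```python
import pandas as pd

# Small stand-in for the filtered frame, where the values are still strings
df = pd.DataFrame(
    {
        "LC": ["xx1", "xx1", "xy2"],
        "Name": ["A", "B", "A"],
        "y1": ["10", "9", "11"],
        "y2": ["12", "11", "12"],
        "y3": ["13", "15", "19"],
    }
)

value_cols = ["y1", "y2", "y3"]  # hypothetical name for the numeric columns
# Convert each value column from text to a numeric dtype
df[value_cols] = df[value_cols].apply(pd.to_numeric)
print(df.dtypes)
```

After this step the value columns can be summed or compared as numbers rather than as text.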

Another solution, in plain Python:

import pandas as pd

items = []
cols = []
with open('file.txt') as f:
    LC = ''
    # loop over each line
    for i, line in enumerate(f):
        # remove the trailing newline and split on whitespace
        parts = line.rstrip('\n').split()
        # store the column names (second line of the file)
        if i == 1:
            cols = parts
        # store the value that follows LC
        if len(parts) == 2 and parts[0] == 'LC':
            LC = parts[1]
        # store each data line; skip empty lines and repeated headers
        elif len(parts) > 2 and parts[0] != 'Name':
            items.append([LC] + parts)
# create the DataFrame
df = pd.DataFrame(items, columns=['LC'] + cols)
# if necessary, convert the value columns to integers
df[cols[1:]] = df[cols[1:]].astype(int)
print(df)
    LC Name  y1  y2  y3
0  xx1    A  10  12  13
1  xx1    B   9  11  15
2  xx1    C   7  15  16
3  xy2    A  11  12  19
4  xy2    B  20  37  20
5  xy2    C  40  15   1
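
The loop above assumes the header sits on the second line of the file. As a rough sketch, the same idea can be wrapped in a function that finds the header by its first field instead. The function name `parse_blocks` and its `marker` and `header_first` parameters are made up for illustration, and the defaults are taken from the sample data:

```python
def parse_blocks(lines, marker="LC", header_first="Name"):
    """Return (columns, rows) from an iterable of text lines.

    `marker` and `header_first` are assumptions based on the sample data.
    """
    columns = None
    rows = []
    current = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == marker and len(parts) == 2:
            current = parts[1]  # remember the label of this block
        elif parts[0] == header_first:
            if columns is None:
                columns = [marker] + parts  # take the header from the first block
        else:
            # data row: keep the name, convert the remaining fields to int
            rows.append([current, parts[0]] + [int(v) for v in parts[1:]])
    return columns, rows


sample = """LC xx1
Name y1 y2 y3
A 10 12 13

LC xy2
Name y1 y2 y3
A 11 12 19
""".splitlines()

columns, rows = parse_blocks(sample)
print(columns)
print(rows)
```

An open file object can be passed in place of `sample`, and the result can then go to `pd.DataFrame(rows, columns=columns)`.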

This should work well.

import pandas as pd
from io import StringIO

frames = []  # one DataFrame per LC block

with open("a.txt", "r") as file:  # open the file with a context manager
    stream_string = ""
    lc = None
    for line in file:  # loop over each line
        if line.startswith("LC "):  # detect an LC line
            # store the LC value for later retrieval
            lc = line.replace("LC ", "", 1).strip()
        elif line.strip():
            # append non-blank lines to the current block
            stream_string += line
        elif stream_string:
            # blank line: parse the finished block and add it to the list
            data = pd.read_csv(StringIO(stream_string), sep=r"\s+")
            data.insert(0, "LC", lc)
            frames.append(data)
            stream_string = ""
    if stream_string:
        # end of file: parse the last block if no blank line followed it
        data = pd.read_csv(StringIO(stream_string), sep=r"\s+")
        data.insert(0, "LC", lc)
        frames.append(data)
# build the final DataFrame by concatenating the list
df = pd.concat(frames, ignore_index=True)
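
As a possible variation on the last step, `pd.concat` can attach the label itself through its `keys` and `names` arguments, so the LC column does not have to be added to each block by hand. This is only a sketch: the `labels` and `chunks` lists below are hypothetical stand-ins for blocks that have already been split out of the file.

```python
import io
import pandas as pd

# Hypothetical, already-split blocks: one label and one text chunk per LC section
labels = ["xx1", "xy2"]
chunks = [
    "Name y1 y2 y3\nA 10 12 13\nB 9 11 15\nC 7 15 16\n",
    "Name y1 y2 y3\nA 11 12 19\nB 20 37 20\nC 40 15 1\n",
]

frames = [pd.read_csv(io.StringIO(chunk), sep=r"\s+") for chunk in chunks]

# keys= labels each frame; names= names the new outer index level "LC"
combined = pd.concat(frames, keys=labels, names=["LC"])

# Move the "LC" level into a regular column and renumber the rows
df = combined.reset_index(level="LC").reset_index(drop=True)
print(df)
```

Passing the labels as a list rather than building a dict keeps blocks apart even if the same LC value shows up more than once in the file.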
