Selectively transposing and transforming an unstructured CSV into Python dataframes



I have a headless & unstructured CSV file in which the first two records follow a fixed layout, but from the third row onwards the number of columns varies:

100,NEM12,202006011242,TCAUSTM,PROGREEN 
200,VCCCCB0100,E1K1Q1,001,E1,N1,006668,KWH,15,
300,20200406,122.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,120.000,120.000,122.000,120.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,F16,28,,20200601113727,
300,20200407,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,4.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,4.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,F16,28,,20200601113727,
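
For orientation, here is a minimal sketch (assuming the file has been saved as sample.csv, a name used only for this example) of separating the three record types by their leading type code before any reshaping; the varying column count of the 300 rows is what makes a straight pandas read of the whole file awkward:

import csv

# group the jagged rows by their leading record-type code (100 / 200 / 300)
records = {"100": [], "200": [], "300": []}
with open("sample.csv", newline="") as f:
    for row in csv.reader(f):
        if row and row[0] in records:
            records[row[0]].append(row[1:])  # drop the type code itself

print(len(records["300"]), "interval (300) records found")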

I have tried converting the rows to columns as suggested here, which produces output like the following:

[screenshot: all rows transposed into columns]

Ideally, though, I would handle the first two records as two column dataframes, and transpose the 300 records as above, except that I also want to append VCCCCB0100, E1, 006668 & KWH from the 200 record's dataframe, so that the transposed dataframe for each 300 record looks like the one below. In other words, the 300 dataframe borrows quite a few columns from the 200 record.

[screenshot: desired transposed dataframe for each 300 record, with columns borrowed from the 200 record]
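
To make that target shape concrete, here is a rough sketch (not the approach in the answer below, and using illustrative column names of my own) of attaching the borrowed 200 fields to one transposed 300 record; the field positions come from the sample rows above, and the trailing quality flags of the 300 record are left out for brevity:

import pandas as pd

# fixed-layout 200 record and a shortened 300 record from the sample above
row200 = "200,VCCCCB0100,E1K1Q1,001,E1,N1,006668,KWH,15,".split(",")
row300 = "300,20200406,122.000,120.000,120.000".split(",")

readings = pd.DataFrame({
    "Interval_No": range(1, len(row300[2:]) + 1),
    "Reading": [float(v) for v in row300[2:]],
})
readings["Interval_date"] = row300[1]
# constant columns borrowed from the 200 record
readings["NMI"] = row200[1]           # VCCCCB0100
readings["NMI_Suffix"] = row200[4]    # E1
readings["Meter_Serial"] = row200[6]  # 006668
readings["UOM"] = row200[7]           # KWH

print(readings)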


Tags: file, csv, data, records, layout, structured, nem12
1 Answer

I think you have to build up the structure you want systematically. I can't see any way that from_csv() could work here.

  1. Load the CSV data straight from the file and build a 2D list with a list comprehension
  2. concat() the columns from the first & second rows, the date from the remaining rows, plus an array of readings and an array of interval numbers
  3. pd.Series.explode the arrays from step 2 (explode multiple lists)
  4. Finally clean up: name the columns and drop the ones that are not needed
import pandas as pd

data = '''100,NEM12,202006011242,TCAUSTM,PROGREEN 
200,VCCCCB0100,E1K1Q1,001,E1,N1,006668,KWH,15,
300,20200406,122.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,120.000,122.000,120.000,120.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,122.000,120.000,120.000,120.000,120.000,122.000,120.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,F16,28,,20200601113727,
300,20200407,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,4.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,4.000,0.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,2.000,0.000,F16,28,,20200601113727,'''

with open("sample.csv", "w") as f:
    f.write(data)

with open("sample.csv") as f:
    data = f.read()

arr = [l.split(",") for l in data.split("\n")]    
df = pd.concat([
    pd.DataFrame(arr[0][1:]).T, # pick columns from first row - exclude 100
    pd.DataFrame(arr[1][1:]).T, # pick columns from second row - exclude 200
    # pick the interval_date (exclude 300) and then turn all other columns into an array
    pd.DataFrame(arr[2:]).iloc[:,1:2]\
        .assign(Reading=pd.DataFrame(arr[2:]).iloc[:,2:].values.tolist(),
               Int_No=[[i+1 for i,e in enumerate(l)] for l in pd.DataFrame(arr[2:]).iloc[:,2:].values.tolist()])
    ], axis=1).ffill()  # the row-1/row-2 frames have a single row; forward-fill their values onto every 300 row
df.columns = [i for i,c in enumerate(df.columns)] # clean up the column numbers
# explode the reading, then name columns and drop redundant ones
# https://stackoverflow.com/questions/45846765/efficient-way-to-unnest-explode-multiple-list-columns-in-a-pandas-dataframe
df = df.set_index(list(df.columns[:-2])).apply(pd.Series.explode).reset_index()\
    .rename({14:"Reading",13:"Interval_date",11:"Interval_length",4:"NEMI",7:"NMI_Suffix",
            10:"UOM",15:"Interval_No"}, axis=1)\
    .drop([0,1,2,3,5,6,8,9,12], axis=1)

print(df[:10].to_string(index=False))

Output

       NEMI NMI_Suffix  UOM Interval_length Interval_date  Reading Interval_No
 VCCCCB0100         E1  KWH              15      20200406  122.000           1
 VCCCCB0100         E1  KWH              15      20200406  120.000           2
 VCCCCB0100         E1  KWH              15      20200406  120.000           3
 VCCCCB0100         E1  KWH              15      20200406  122.000           4
 VCCCCB0100         E1  KWH              15      20200406  120.000           5
 VCCCCB0100         E1  KWH              15      20200406  120.000           6
 VCCCCB0100         E1  KWH              15      20200406  120.000           7
 VCCCCB0100         E1  KWH              15      20200406  122.000           8
 VCCCCB0100         E1  KWH              15      20200406  120.000           9
 VCCCCB0100         E1  KWH              15      20200406  120.000          10
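
As a side note, pandas 1.3 and later can explode several equally-long list columns in a single DataFrame.explode call, which could replace the set_index / apply(pd.Series.explode) step above; a small self-contained sketch:

import pandas as pd

# pandas >= 1.3: explode multiple list-valued columns at once,
# provided the lists in each row have matching lengths
wide = pd.DataFrame({
    "Interval_date": ["20200406"],
    "Reading": [[122.0, 120.0, 120.0]],
    "Int_No": [[1, 2, 3]],
})
tidy = wide.explode(["Reading", "Int_No"], ignore_index=True)
print(tidy)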
