Creating new columns from existing column values using the split function in Python

Posted 2024-04-19 13:34:13


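When I run my code on the data below, I get the error: SyntaxError: unexpected EOF while parsing

I have a folder containing multiple CSV files. I need to process each file and split the values of one column (Column2) on the ";" delimiter. Once a value has been split, each key should become a column name and each key's value should become the value in that column.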

Column1   Column2 

Item1    Material, Teflon ; MODEL: 28' Inches ; MAKE : SAMSUNG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
Item1    Material, PLASTIC ; MODEL: 55' Inches ; MAKE : SONY ; SUPPLIER/PO DETAILS: DK MART ; POWER INPUT :55W @240 VOLTS ; NO OF INPUTS : 5 ; METHOD : NEO AIR COOLED ; TYPE : SMART LED
Item1    Material, Teflon ; MODEL: 42' Inches ; MAKE : LG ; SUPPLIER/PO DETAILS: AW Tech ; POWER INPUT :65W @240 VOLTS ; NO OF INPUTS : 4 ; METHOD : AIR COOLED ; TYPE : LED
NaN
NaN
Item1     MATERIAL, PLASTIC ; MAKE        : VIDEOCON ; POWER INPUT        : 22V /240 VOLT ; COMPLETED UNIT : SPARES
Item1    MATERIAL ; MAKE : SONY ; SUPPLIER/PO DETAILS: AW Tech; ; COMPLETED UNIT : UNIT PARTS

Expected output

Item                MODEL       Make    Supplier/PO Details  Power Input    No Of Inputs  Method      Type      Completed Units

Material, Teflon    28' Inches  SAMSUNG       AW Tech        65W @240 VOLTS     4        Air Cooled    LED        
Material, PLASTIC   55' Inches  SONY          DK Material    55W @240 VOLTS     5      NEO AIR COOLED Smart LED    
Material, Teflon    42' Inches  LG            AW Tech        65W @240 VOLTS     4        Air Cooled    LED  
MATERIAL, PLASTIC               VIDEOCON                     22V /240 VOLTS                                         SPARES
Material                        SONY          AW Tech                                                               UNIT PARTS

The code I have been trying:

import glob
import pandas as pd
from ast import literal_eval
path = r"C:\Users\Input\Tests\*.csv"
for fname in glob.glob(path):
  df=pd.read_csv(fname)
  my_list=list(df.columns)
  print(len(my_list),my_list)
  out = df['Column2'].str.title().str.split(' ; ',n=1,expand=True)
  newout=('{"'+out[1].replace({':':'":"',' ; ':'","'},regex=True)+'"}')
  newout=newout.str.rsplit(',',n=1,expand=True)
  m=~(newout[1].str.contains(':').fillna(True))
  newout.loc[m,0]=newout.loc[m,0]+':'+newout.loc[m,1]
  newout.loc[~m,0]=newout.loc[~m,0]+','+newout.loc[~m,1]
  newout=pd.DataFrame(newout[0].dropna().map(literal_eval).tolist())
  newout.insert(0,'Item',out[0])
  newout.columns=newout.columns.str.strip()

2 Answers

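The end-of-file error is actually fairly easy to track down: put print statements inside the for loop to see where the code blows up. Take a deep breath and work through it one step at a time.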

import glob
import pandas as pd
from ast import literal_eval
path = r"C:\Users\Input\Tests\*.csv"
for fname in glob.glob(path):
  df=pd.read_csv(fname)
  # print the df you are looking at so you can see what data is not
  # being processed in your for loop
  print(df)

I would comment out the rest of the loop until you have confirmed that df is correct, then add the next lines of code:

  my_list=list(df.columns)
  print(len(my_list),my_list)
  out = df['Column2'].str.title().str.split(' ; ',n=1,expand=True)
  # this is where you parse, so I would place a print statement here
  print(out)

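Once you are past those steps, these are the next lines to add. Remember to comment out the earlier print statements as you add more code, so that you can check the data is being processed correctly while you debug. Learning to debug matters more than getting the answer.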

  newout=('{"'+out[1].replace({':':'":"',' ; ':'","'},regex=True)+'"}')
  newout=newout.str.rsplit(',',n=1,expand=True)
  # you parse on this line
  print(newout)
  m=~(newout[1].str.contains(':').fillna(True))
  newout.loc[m,0]=newout.loc[m,0]+':'+newout.loc[m,1]
  newout.loc[~m,0]=newout.loc[~m,0]+','+newout.loc[~m,1]
  newout=pd.DataFrame(newout[0].dropna().map(literal_eval).tolist())
  # this would need a print statement
  print(newout)
  newout.insert(0,'Item',out[0])
  newout.columns=newout.columns.str.strip()

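That would be my first suggestion for finding out where the end-of-file error occurs. I would also break this up into cells in Google Colab so that you can follow the logic behind your code. I would add an if (end of file) ... break check to stop the for loop while you debug the split.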
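To make the print-and-check approach more targeted, a small helper can report exactly which strings literal_eval rejects. This is a minimal sketch using only the standard library; find_unparsable and the sample strings are hypothetical names and data, not part of the original code.

```python
from ast import literal_eval


def find_unparsable(strings):
    """Return (position, text, error name) for every string literal_eval rejects."""
    failures = []
    for position, text in enumerate(strings):
        try:
            literal_eval(text)
        except (SyntaxError, ValueError) as exc:
            # SyntaxError covers truncated literals, e.g. a missing closing brace.
            failures.append((position, text, type(exc).__name__))
    return failures


# Hypothetical sample data: one well-formed dict literal and one truncated one.
samples = [
    '{"Make": "Sony", "Type": "Led"}',
    '{"Make": "Sony", "Type": "Led"',
]
bad_rows = find_unparsable(samples)
for position, text, error in bad_rows:
    print(position, error, text)
```

Inside the loop, calling find_unparsable(newout[0].dropna().tolist()) just before the map(literal_eval) line should list the rows that trigger the SyntaxError, assuming newout[0] holds the assembled strings at that point.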

In case your error is still not fixed, you can try the following code.

Here is how I approached it.

Solution:

import pandas as pd

# Assuming you can Loop on csv folder, then:
df = pd.read_csv('data_.csv')
df.dropna(subset = ["Column2"], inplace=True)

new_data = {'Item' : {}}
for index, row in enumerate(df['Column2'].to_list()):
    row_values = row.split(';') 
    new_data["Item"][index] = (row_values[0].strip())
    for kv in row_values[1:]:
        key_value = kv.split(':')
        if len(key_value) != 2:
            continue

        key = key_value[0].strip()
        value = key_value[1].strip()

        if key in new_data:
            new_data[key][index] = value
        else:
            new_data[key] = {index : value}
            
new_df  = pd.DataFrame(new_data)
print(new_df)

Output:

(screenshot of the resulting DataFrame; image not available)

Note: this assumes you can loop over the CSV files yourself.
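If looping over the folder is the missing piece, the same idea (split on ";" and then on ":") can be sketched as follows. This is a hedged variant rather than the answer's exact code: parse_cell and parse_folder are hypothetical helper names, the files are assumed to have a header row containing Column2, and csv.DictReader stands in for pandas so that the example depends only on the standard library. Unlike the solution above, it splits each chunk on the first colon only.

```python
import csv
import glob


def parse_cell(text):
    """Turn one Column2 value into a dict: first chunk is the item, the rest are key : value pairs."""
    chunks = text.split(';')
    record = {'Item': chunks[0].strip()}
    for chunk in chunks[1:]:
        key, sep, value = chunk.partition(':')
        if not sep or not key.strip():
            continue  # skip empty chunks, e.g. the one left behind by "; ;"
        record[key.strip()] = value.strip()
    return record


def parse_folder(pattern, column='Column2'):
    """Parse the given column of every CSV file matching pattern into a list of dicts."""
    rows = []
    for fname in sorted(glob.glob(pattern)):
        with open(fname, newline='', encoding='utf-8') as handle:
            for line in csv.DictReader(handle):
                cell = (line.get(column) or '').strip()
                if cell:  # ignore rows where the column is empty or missing
                    rows.append(parse_cell(cell))
    return rows


# One row taken from the question's sample data.
sample = "MATERIAL ; MAKE : SONY ; SUPPLIER/PO DETAILS: AW Tech; ; COMPLETED UNIT : UNIT PARTS"
print(parse_cell(sample))
```

The list of dicts returned by parse_folder can be passed to pd.DataFrame(rows); keys missing from a given row become NaN in the resulting frame.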
