读取不一致的文本数据并写入csv

name-> Alice name-> Sam sibling-> Kate, sibling-> Luke, hobby_1-> football hobby_2-> games name-> Alex name-> Ramsay hobby_1-> dance hobby_2-> swimming hobby_3-> jogging

Name Sibling Hobbies name-> Alice N/A N/A name-> Sam sibling-> Kate hobby_1-> football sibling-> Luke hobby_2-> games name-> Alex N/A N/A name-> Ramsay N/A hobby_1-> dance hobby_2-> swimming hobby_3-> jogging

file = open('student.txt' , 'r') with open('student.csv' , 'w') as writer: writer.write ('Name,Sibling,Hobbies\n') for eachline in file: if 'name' in eachline: writer.write (eachline) if 'sibling' in eachline: writer.write (eachline) if 'hobby' in eachline: writer.write (eachline)

1条回答

网友

1楼 · 发布于 2024-04-26 08:01:09

您可以按如下方式处理此问题：

将文本解析成字典
为csv中的每个所需列生成列表
将列表作为列写入csv文件中

示例实现：

file = open('student.txt' , 'r')

### parse the text into a dictionary

# remove whitespaces and commas from each line
lines = [x.strip().replace(',', '') for x in file]
# initialize dictionary
data = dict()

index = 0
while index < len(lines):
    line = lines[index]
    # for each name, generate a dictionary containing two lists
    if 'name' in line:
        # one list for siblings, one list for hobbies
        data[line] = {'siblings': [], 'hobbies': []}
        name_index = index + 1
        # populate the lists with the values listed under the name
        while (name_index < len(lines)) and ('name' not in lines[name_index]):
            if 'sibling' in lines[name_index]:
                data[line]['siblings'].append(lines[name_index])
            elif 'hobby' in lines[name_index]:
                data[line]['hobbies'].append(lines[name_index])
            name_index += 1
            index += 1
    index += 1
    
# data looks like:
'''
{'name-> Alice': {'siblings': [], 'hobbies': []},
 'name-> Sam': {'siblings': ['sibling-> Kate', 'sibling-> Luke'],
  'hobbies': ['hobby_1-> football', 'hobby_2-> games']},
 'name-> Ramsay': {'siblings': [],
  'hobbies': ['hobby_1-> dance', 'hobby_2-> swimming', 'hobby_3-> jogging']}}
'''

### generate a list for each column in the csv

names = []
siblings = []
hobbies = []

null_str = 'N/A'

for key in data:
    # add name to names
    names.append(key)
    
    # get rows for this name
    len_sibs = len(data[key]['siblings'])
    len_hobs = len(data[key]['hobbies'])
    num_rows = max([len_sibs, len_hobs])
    
    if num_rows == 0:
        siblings.append(null_str)
        hobbies.append(null_str)
    else:
        # add (num_rows - 1) empty strings to names
        names.extend([''] * (num_rows - 1))
        siblings_na_added = False
        hobbies_na_added = False
        for i in range(num_rows):
            # add siblings values with conditions for N/A and ''
            if i > (len(data[key]['siblings']) - 1):
                if siblings_na_added == False:
                    siblings.append(null_str)
                    siblings_na_added = True
                else:
                    siblings.append('')
            else:
                siblings.append(data[key]['siblings'][i])
            # add hobbies values with conditions for N/A and ''
            if i > (len(data[key]['hobbies']) - 1):
                if hobbies_na_added == False:
                    hobbies.append(null_str)
                    hobbies_na_added = True
                else:
                    hobbies.append('')
            else:
                hobbies.append(data[key]['hobbies'][i])
    
### write the lists as columns in a csv file

with open('student.csv' , 'w') as writer:
    writer.write ('Name,Sibling,Hobbies\n')
    for i in range(len(names)):
        row = names[i] + ',' + siblings[i] + ',' + hobbies[i] + '\n'
        writer.write(row)

输出读取为文本：

Name,Sibling,Hobbies
name-> Alice,N/A,N/A
name-> Sam,sibling-> Kate,hobby_1-> football
,sibling-> Luke,hobby_2-> games
name-> Ramsay,N/A,hobby_1-> dance
,,hobby_2-> swimming
,,hobby_3-> jogging

输出在excel中读取为csv:

问题比我想象的要复杂：）

相关问题更多 >

编程相关推荐

热门问题

热门文章