Reading inconsistent text data and writing it to a CSV

Posted 2024-04-26 08:01:09


I'm a beginner, so thank you for your patience. I have a text file, "student.txt", and I want to write its contents to a CSV file, "student.csv".

student.txt

    name-> Alice
    name-> Sam
    sibling-> Kate,
    sibling-> Luke,
    hobby_1-> football
    hobby_2-> games
    name-> Alex
    name-> Ramsay
    hobby_1-> dance
    hobby_2-> swimming
    hobby_3-> jogging

Expected output in the CSV:

Name            Sibling               Hobbies
name-> Alice     N/A                  N/A
name-> Sam       sibling-> Kate       hobby_1-> football
                 sibling-> Luke       hobby_2-> games
name-> Alex      N/A                  N/A
name-> Ramsay    N/A                  hobby_1-> dance
                                      hobby_2-> swimming
                                      hobby_3-> jogging

The code I have so far:

file = open('student.txt', 'r')
with open('student.csv', 'w') as writer:
    writer.write('Name,Sibling,Hobbies\n')

    for eachline in file:
        if 'name' in eachline:
            writer.write(eachline)

        if 'sibling' in eachline:
            writer.write(eachline)

        if 'hobby' in eachline:
            writer.write(eachline)

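Basically, any data that follows a name belongs to that name, up to the next name. But I don't know how to lay it out in order in the CSV with N/A filled in, especially when some names have no siblings or hobbies.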


1 answer
User
#1 · Posted 2024-04-26 08:01:09

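You can approach the problem as follows:

  • Parse the text into a dictionary
  • Generate a list for each required column of the CSV
  • Write the lists to the CSV file as columns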
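
As a rough sketch of the first step, assuming every record starts with a line beginning with `name` and that the lines after it belong to the most recent name, the grouping can be done in a single pass. The helper name `group_records` and the in-memory `sample` list are illustrative only and do not come from the question.

```python
def group_records(lines):
    """Group sibling/hobby lines under the most recent name line.

    Hypothetical helper: assumes each record starts with a line that
    begins with 'name', as in the sample student.txt.
    """
    records = {}
    current = None
    for raw in lines:
        # drop surrounding whitespace and any trailing comma
        line = raw.strip().rstrip(',')
        if line.startswith('name'):
            current = line
            records[current] = {'siblings': [], 'hobbies': []}
        elif current is None:
            continue  # ignore anything that appears before the first name
        elif line.startswith('sibling'):
            records[current]['siblings'].append(line)
        elif line.startswith('hobby'):
            records[current]['hobbies'].append(line)
    return records


# a shortened, made-up excerpt in the same format as student.txt
sample = [
    '    name-> Alice',
    '    name-> Sam',
    '    sibling-> Kate,',
    '    hobby_1-> football',
    '    name-> Alex',
]
records = group_records(sample)
print(records['name-> Sam'])
# {'siblings': ['sibling-> Kate'], 'hobbies': ['hobby_1-> football']}
```

Because the name line is used as the dictionary key, two students with the same name would overwrite each other; a list of records would avoid that if it matters for your data.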

Example implementation:

### parse the text into a dictionary

# read every line, removing surrounding whitespace and commas;
# the with-block closes the file once the lines have been read
with open('student.txt', 'r') as file:
    lines = [x.strip().replace(',', '') for x in file]

# initialize dictionary
data = dict()

index = 0
while index < len(lines):
    line = lines[index]
    # for each name, generate a dictionary containing two lists
    # (startswith avoids matching 'name' inside a sibling or hobby value)
    if line.startswith('name'):
        # one list for siblings, one list for hobbies
        data[line] = {'siblings': [], 'hobbies': []}
        name_index = index + 1
        # populate the lists with the values listed under the name
        while (name_index < len(lines)) and not lines[name_index].startswith('name'):
            if lines[name_index].startswith('sibling'):
                data[line]['siblings'].append(lines[name_index])
            elif lines[name_index].startswith('hobby'):
                data[line]['hobbies'].append(lines[name_index])
            name_index += 1
            index += 1
    index += 1
    
# data looks like:
'''
{'name-> Alice': {'siblings': [], 'hobbies': []},
 'name-> Sam': {'siblings': ['sibling-> Kate', 'sibling-> Luke'],
  'hobbies': ['hobby_1-> football', 'hobby_2-> games']},
 'name-> Alex': {'siblings': [], 'hobbies': []},
 'name-> Ramsay': {'siblings': [],
  'hobbies': ['hobby_1-> dance', 'hobby_2-> swimming', 'hobby_3-> jogging']}}
'''

### generate a list for each column in the csv

names = []
siblings = []
hobbies = []

null_str = 'N/A'

for key in data:
    sibs = data[key]['siblings']
    hobs = data[key]['hobbies']

    # every name takes at least one row, even with no siblings or hobbies
    num_rows = max(len(sibs), len(hobs), 1)

    # the name goes on the first row; its remaining rows get empty strings
    names.append(key)
    names.extend([''] * (num_rows - 1))

    for i in range(num_rows):
        # siblings: the value if present, N/A on the first row past the
        # end of the list, '' on any rows after that
        if i < len(sibs):
            siblings.append(sibs[i])
        elif i == len(sibs):
            siblings.append(null_str)
        else:
            siblings.append('')
        # hobbies: same rule as siblings
        if i < len(hobs):
            hobbies.append(hobs[i])
        elif i == len(hobs):
            hobbies.append(null_str)
        else:
            hobbies.append('')
    
### write the lists as columns in a csv file

with open('student.csv', 'w') as writer:
    writer.write('Name,Sibling,Hobbies\n')
    # the three lists have the same length, so zip walks them row by row
    for name, sibling, hobby in zip(names, siblings, hobbies):
        writer.write(name + ',' + sibling + ',' + hobby + '\n')

The output, read as text:

Name,Sibling,Hobbies
name-> Alice,N/A,N/A
name-> Sam,sibling-> Kate,hobby_1-> football
,sibling-> Luke,hobby_2-> games
name-> Alex,N/A,N/A
name-> Ramsay,N/A,hobby_1-> dance
,,hobby_2-> swimming
,,hobby_3-> jogging

The output, opened as a CSV in Excel:

[Image: the CSV opened in Excel]
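
A possible shorter way to build and write the rows, sketched under a few assumptions: `build_rows` is a hypothetical helper, the small `data` dictionary is made up in the same shape as the parsed dictionary, and `io.StringIO` stands in for the real output file so the snippet runs on its own. It uses `itertools.zip_longest` to pad the shorter columns and `csv.writer` to handle separators. Unlike the lists built by hand, this version writes N/A only when a list is completely empty.

```python
import csv
import io
from itertools import zip_longest


def build_rows(name, siblings, hobbies, null_str='N/A'):
    """Return the CSV rows for one name (hypothetical helper)."""
    # an empty list becomes a single N/A cell
    sib_col = siblings or [null_str]
    hob_col = hobbies or [null_str]
    # the name appears on the first row only; zip_longest pads the
    # shorter columns with empty strings
    return [list(row)
            for row in zip_longest([name], sib_col, hob_col, fillvalue='')]


# made-up sample in the same shape as the parsed dictionary
data = {
    'name-> Alice': {'siblings': [], 'hobbies': []},
    'name-> Ramsay': {'siblings': [],
                      'hobbies': ['hobby_1-> dance', 'hobby_2-> swimming']},
}

buffer = io.StringIO()  # stands in for open('student.csv', 'w', newline='')
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['Name', 'Sibling', 'Hobbies'])
for name, info in data.items():
    writer.writerows(build_rows(name, info['siblings'], info['hobbies']))

print(buffer.getvalue())
# Name,Sibling,Hobbies
# name-> Alice,N/A,N/A
# name-> Ramsay,N/A,hobby_1-> dance
# ,,hobby_2-> swimming
```

Since `csv.writer` quotes any field that contains a comma, the commas would not need to be stripped from the input beforehand with this approach.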

The problem turned out to be more complicated than I expected :)
