Using re.findall while reading a CSV

Posted 2024-04-28 04:43:03


I'm trying to read a CSV file and use re.findall to pull out a specific part of each row.

Here is a sample of the first few lines of my CSV file:

School: Johnson County Elementary School | Student First Name: John | Student Last Name: Doe, 1, Please leave yearbook with sister in office
School: Kirkwood Elementary School | Student First Name: Karen | Student Last Name: Rodgers, 3, Null
School: 2nd Street Elementary School | Student First Name: Joe | Student Last Name: Greene, 12, Give to mom at pickup

This is the code I'm using:

import csv
import re

def fileReader():
    while True:
        input_file = input('What file would you like to read from? (or stop) ')
        if input_file.upper() == 'STOP':
            break
        schools = input('What school would you like to generate reports for? ')
        file_contents = open(input_file, newline='', encoding='utf-8')
        for row in csv.reader(file_contents):
            schoolName = re.findall('(?<=Student First Name: ).+?(?= |)',row[0], re.DOTALL)
            print(schoolName)


fileReader()

When I run this code, the output is only the first character of each school name:

['J']
['K']
['2']

Instead, I want the whole school name, like this:

['Johnson County Elementary School']
['Kirkwood Elementary School']
['2nd Street Elementary School']

I'm really confused about why findall returns only the first letter rather than the full school name.


1 answer
Anonymous user
Answer 1 · posted 2024-04-28 04:43:03

First, look for School rather than Student First Name 😀

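Second, | is special in a regular expression: it is the OR (alternation) operator, so it has to be escaped as \| to match a literal pipe. Left unescaped, (?= |) means "followed by a space or by nothing", which always succeeds, so the lazy .+? stops after a single character: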

schoolName = re.findall(r'(?<=School: ).+?(?= \|)', row[0], re.DOTALL)
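
To see the difference between the two lookaheads, here is a small sketch that runs both patterns against one sample line from the question. The variable names (line, unescaped, escaped) are made up for illustration, and the sample is only the first CSV field of the first row.

```python
import re

# First CSV field of the first sample row from the question.
line = ('School: Johnson County Elementary School | '
        'Student First Name: John | Student Last Name: Doe')

# (?= |) reads as "followed by a space OR by nothing". The empty
# alternative always succeeds, so the lazy .+? stops after one character.
unescaped = re.findall(r'(?<=School: ).+?(?= |)', line)

# (?= \|) reads as "followed by a space and a literal pipe".
escaped = re.findall(r'(?<=School: ).+?(?= \|)', line)

print(unescaped)  # ['J']
print(escaped)    # ['Johnson County Elementary School']
```

The raw-string prefix (r'...') keeps Python from treating \| as a string escape before the pattern reaches the re module.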

You also don't need the csv module or lookahead/lookbehind to find the school; a plain capture group is enough:

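import re

with open('input.csv', encoding='utf-8') as file:
    for row in file:
        # The capture group holds the text between 'School: ' and ' |'.
        match = re.search(r'School: (.+?) \|', row)
        if match:  # skip lines that have no school field
            schoolName = match.group(1)
            print(schoolName)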
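
Since the question's code also asks which school to report on, the same capture group could be used to keep only that school's lines. This is a rough sketch, not part of the original answer: rows_for_school and SCHOOL_RE are hypothetical names, and the sample list stands in for lines read from a file.

```python
import re

# Compiled once; the capture group holds the text between 'School: ' and ' |'.
SCHOOL_RE = re.compile(r'School: (.+?) \|')

def rows_for_school(lines, school):
    """Return the lines whose school name equals `school` (hypothetical helper)."""
    selected = []
    for line in lines:
        match = SCHOOL_RE.search(line)
        # Lines without a 'School: ... |' prefix are skipped instead of raising.
        if match and match.group(1) == school:
            selected.append(line.rstrip('\n'))
    return selected

# Two sample rows from the question plus a blank line.
sample = [
    'School: Johnson County Elementary School | Student First Name: John'
    ' | Student Last Name: Doe, 1, Please leave yearbook with sister in office\n',
    'School: Kirkwood Elementary School | Student First Name: Karen'
    ' | Student Last Name: Rodgers, 3, Null\n',
    '\n',
]

print(rows_for_school(sample, 'Kirkwood Elementary School'))
```

The comparison here is an exact, case-sensitive match on the school name; a real report would probably want to normalize case and surrounding whitespace first.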
