Extracting lines from text that is split across multiple lines, using Python

f = open('./dat.txt', 'r')
array = []
for line in f:
    # if "1\t\"Overall evaluation" in line:
    #     words = line.split("1\t\"Overall evaluation")
    #     print words[0]
    number = int(line.split(':')[1].strip('"\n'))
    print number

299 1 "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 4
Strength or novelty of the idea (2): 3
Strength or novelty of the idea (3): 3
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 2
""Open by default"" (2): 3
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 2
Market opportunity and timing (1): 4
Market opportunity and timing (2): 4
Triple bottom line impact (1): 4
Triple bottom line impact (2): 2
Triple bottom line impact (3): 2
Knowledge and skills of the team (1): 3
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 3
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 3"
299 2 "Overall evaluation: 3
Invite to interview: 3
Strength or novelty of the idea (1): 3
Strength or novelty of the idea (2): 2
Strength or novelty of the idea (3): 4
Use or provision of open data (1): 4
Use or provision of open data (2): 3
""Open by default"" (1): 3
""Open by default"" (2): 2
Value proposition and potential scale (1): 4
Value proposition and potential scale (2): 3
Market opportunity and timing (1): 4
Market opportunity and timing (2): 3
Triple bottom line impact (1): 3
Triple bottom line impact (2): 2
Triple bottom line impact (3): 1
Knowledge and skills of the team (1): 4
Knowledge and skills of the team (2): 4
Capacity to realise the idea (1): 4
Capacity to realise the idea (2): 4
Capacity to realise the idea (3): 4
Appropriateness of the budget to realise the idea: 2"
364 1 "Overall evaluation: 3
Invite to interview: 3
...

f = open('./dat.txt', 'r')
array = []
for line in f:
    if "1\t\"Overall evaluation" in line:
        words = line.split("1\t\"Overall evaluation")
        print words[0]
    # number = int(line.split(':')[1].strip('"\n'))
    # print number

1 Answer

User

#1 · Posted on 2024-05-14 08:33:42

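Regular expressions are the ticket. You can do this with two patterns: one that recognizes the first line of a record and one that picks the score off the end of each line. Like this: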

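import re

with open('./dat.txt') as fin:
    for line in fin:
        # Pattern 1: a record header such as: 299 1 "Overall evaluation: 3
        # \s+ accepts either a tab or a space after the record identifier.
        ma = re.match(r'^(\d+)\s+\d.+Overall evaluation', line)
        if ma:
            print("record identifier %r" % ma.group(1))
            continue
        # Pattern 2: a score at the end of the line. The last line of a
        # record also carries the closing quote, hence the optional ".
        ma = re.search(r': (\d+)"?$', line)
        if ma:
            print(ma.group(1))
            continue
        # Debugging catch-all; remove once the patterns are settled.
        print("unrecognized line: %s" % line.rstrip('\n'))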

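Note: the last print statement is not part of your requirements, but whenever I debug a regex I always add a catch-all to help spot bad patterns. Once the patterns are sorted out, I remove the catch-all.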
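If the scores need to be kept together with the record they belong to, the same two-pattern idea can be extended to build one dictionary per record. The following is only a sketch under assumptions: the function name `parse_records`, the key names `id`, `sub_id` and `scores`, and both regular expressions are illustrative and not part of the answer above, and the meaning of the second number on a header line is a guess. Unlike the loop above, it also stores the score that sits on the header line itself.

```python
import re

# Assumed patterns: fields may be separated by tabs or spaces, and the
# quoted block may open on the header line and close on the last line.
RECORD_START = re.compile(r'^(\d+)\s+(\d+)\s+"?(.*)$')
SCORE = re.compile(r'^(.*): (\d+)"?\s*$')


def parse_records(lines):
    """Group 'label: score' lines under the record header preceding them."""
    records = []
    current = None
    for raw in lines:
        line = raw.rstrip('\n')
        start = RECORD_START.match(line)
        if start:
            # New record: remember its two leading numbers.
            current = {'id': int(start.group(1)),
                       'sub_id': int(start.group(2)),  # hypothetical name
                       'scores': {}}
            records.append(current)
            # The rest of the header line holds the first score.
            line = start.group(3)
        score = SCORE.match(line)
        if score and current is not None:
            # Doubled quotes inside the quoted block stand for one quote.
            label = score.group(1).replace('""', '"')
            current['scores'][label] = int(score.group(2))
    return records
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example `parse_records(fin)` inside the `with open('./dat.txt') as fin:` block.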
