Extracting text between numbers in Python

import codecs
import re

regex = r'\D(?!\d)'

# read a contract in
with codecs.open("/Users/someuser/x/y/blah.txt", "r","utf-8") as ins:
    text = ins.read()

# perform magics
output = re.findall(regex, text)

output

2 Answers

User

Answer 1 · edited 2024-04-19 07:58:36

Doesn't this work?

import codecs
import re

# find anything that matches the header number pattern
regex = r'\d\.\d\.\d\.\d\.\s'

# read a contract in
with codecs.open("/Users/someuser/x/y/blah.txt", "r","utf-8") as ins:
    text = ins.read()

# perform magics, replace with empty string
output = re.sub(regex, '', text)

# output
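A minimal, self-contained sketch of the same substitution, run on a made-up sample string instead of the file. The sample text and the names `sample`, `header` and `stripped` are assumptions for illustration, not the asker's actual contract:

```python
import re

# Hypothetical sample text; the real input would be read from the contract file.
sample = "1.1.1.1. A section\nSome text.\n1.1.1.2. Another section\nMore text.\n"

# Same header pattern as above: four single digits, each followed by a period,
# then one whitespace character.
header = r'\d\.\d\.\d\.\d\.\s'

# Replace every header with the empty string, leaving only the section text.
stripped = re.sub(header, '', sample)
print(stripped)
```

Note that this removes the headers but returns one string, so it does not by itself separate the sections from each other.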

User

Answer 2 · edited 2024-04-19 07:58:36

OK, if I understand correctly, you need to capture everything between the section numbers.

Here is the regex string I came up with: regex = r'(?:\d\.){4}.(.+?)(?:\d\.){4}'

Let's break it down:

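(?:\d\.){4} This matches a digit followed by a period, four times in a row. The (?:) makes it a non-capturing group, so we can require the pattern four times without adding it to the results. The single . that follows it consumes the one character (the space) after the section number.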

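(.+?) This is the part we want to capture. Parentheses without ?: form a capturing group, and that group is what we get back. .+? means one or more characters, matched non-greedily. The question mark is the non-greedy part: instead of consuming as many characters as possible, the match stops as soon as the next part of the expression can match.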

(?:\d\.){4} We end with the section-number pattern again, because we want to capture what lies between two section numbers.
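To see what the non-capturing groups change, here is a small sketch comparing the pattern above with a variant whose section-number groups do capture. The sample string and the variable names are made up for illustration:

```python
import re

# Hypothetical one-line sample with a section number on each side of the text.
sample = "1.2.3.4. Title 5.6.7.8."

# Section numbers in non-capturing groups: findall returns only the (.+?) group.
non_capturing = re.findall(r'(?:\d\.){4}.(.+?)(?:\d\.){4}', sample)

# Section numbers in capturing groups: findall returns one tuple per match,
# with one entry per capturing group.
capturing = re.findall(r'((?:\d\.){4}).(.+?)((?:\d\.){4})', sample)

print(non_capturing)
print(capturing)
```

With the non-capturing version the result is a flat list of section texts, which is why the answer uses (?:) for the section numbers.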

Here is the code we use to get what we want:

p = re.compile(regex, flags=re.DOTALL)

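The DOTALL flag lets the match span line breaks: normally . matches any character except a newline, and with DOTALL it matches newlines too.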
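A short sketch of the difference the flag makes, on a made-up two-section sample (the sample text and variable names are assumptions):

```python
import re

# Hypothetical sample: the section body sits on its own line.
sample = "1.1.1.1. Title\nBody text\n1.1.1.2. Next"
regex = r'(?:\d\.){4}.(.+?)(?:\d\.){4}'

# Without DOTALL, . cannot cross the newline, so the closing section number
# is never reached and nothing matches.
without_flag = re.findall(regex, sample)

# With DOTALL, . also matches '\n', so the capture can span several lines.
with_flag = re.findall(regex, sample, flags=re.DOTALL)

print(without_flag)
print(with_flag)
```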

sections = p.findall(text), where text is the string to search.

The findall method returns a list of the capture groups we matched.

['A section\n\nSome text. Some other text, too. And stuff. And even more text on the next line.\n\n', "Some sections are really great\n\nWelcome to this section. Which is probably better than others. And I can't even begin to explain how great it is.\n\n"]
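One caveat with this pattern: each match consumes the closing section number, so findall resumes searching after it, and the section that begins at that number cannot be matched. A possible variation, which is not part of the original answer, puts the closing section number in a lookahead so it is checked but not consumed, and also accepts the end of the string as a terminator. The sample text and names below are assumptions for illustration:

```python
import re

# Hypothetical sample with three consecutive sections.
sample = (
    "1.1.1.1. First\nBody one.\n"
    "1.1.1.2. Second\nBody two.\n"
    "1.1.1.3. Third\nBody three.\n"
)

# Pattern from the answer: the closing section number is consumed by the match.
consuming = re.compile(r'(?:\d\.){4}.(.+?)(?:\d\.){4}', flags=re.DOTALL)

# Variation: (?=...) only peeks at the next section number (or at the end of
# the string, \Z), so the next match can start at that same section number.
lookahead = re.compile(r'(?:\d\.){4}.(.+?)(?=(?:\d\.){4}|\Z)', flags=re.DOTALL)

consumed_sections = consuming.findall(sample)
all_sections = lookahead.findall(sample)

print(consumed_sections)
print(all_sections)
```

On this sample the original pattern returns only the first section, while the lookahead version returns all three. Whether that matters depends on how the real contract text is laid out.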
