Find and replace a list of words in a file using regex in Python

Posted 2024-05-29 11:02:48


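I want to print the contents of a file to the terminal, highlighting any words from a list along the way, without modifying the original file. Here is an example of the code, which doesn't work yet: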

    def highlight_story(self):
        """Print a line from a file and highlight words in a list."""

        the_file = open(self.filename, 'r')
        file_contents = the_file.read()

        for word in highlight_terms:
            regex = re.compile(
                  r'\b'      # Word boundary.
                + word       # Each item in the list.
                + r's{0,1}', # One optional 's' at the end.
                flags=re.IGNORECASE | re.VERBOSE)
            subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
            result = re.sub(regex, subst, file_contents)

        print(result)
        the_file.close()

highlight_terms = [
    'dog',
    'hedgehog',
    'grue'
]
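For readers unfamiliar with the replacement string above: in re.sub(), \g<0> stands for the entire matched text, and '\033[1;41m' / '\033[0m' are ANSI SGR escape sequences (1 = bold, 41 = red background, 0 = reset). A small illustration, where the HIGHLIGHT_ON/HIGHLIGHT_OFF names and the sample sentence are made up for this sketch:

```python
import re

# ANSI SGR escape sequences: 1 = bold, 41 = red background, 0 = reset.
HIGHLIGHT_ON = '\033[1;41m'
HIGHLIGHT_OFF = '\033[0m'

pattern = re.compile(r'\bdogs?', flags=re.IGNORECASE)
subst = HIGHLIGHT_ON + r'\g<0>' + HIGHLIGHT_OFF

# \g<0> is the whole match, so "Dogs" keeps its original case;
# it is only wrapped in the escape sequences.
marked = pattern.sub(subst, 'Dogs bark at a dog.')
print(repr(marked))
```

Because \g<0> reuses the matched text, the IGNORECASE match never rewrites "Dogs" as "dogs".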

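In practice, only the last item in the list gets highlighted, no matter what it is or how long the list is. I assume each substitution is performed and then "forgotten" when the next iteration starts. The output looks like this (only the grue matches are highlighted):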

Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to be delicacies. Dogs can frighten away a grue, however, by barking in a musical scale. A hedgehog, on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.

But it should look like this (with every listed word highlighted):

Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to be delicacies. Dogs can frighten away a grue, however, by barking in a musical scale. A hedgehog, on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.

How can I stop the other substitutions from being lost?


3 Answers

The regex you provided is correct, but the for loop is wrong.

    result = re.sub(regex, subst, file_contents)

This line replaces every match of regex in file_contents with subst.

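On the second iteration it performs the substitution on the original file_contents again, throwing away the previous result, when what you intended was to perform it on result.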
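To see the problem in isolation, here is a small sketch of the same loop shape. It uses square brackets instead of the ANSI codes so the output is readable, and the sample sentence is made up. Every call starts again from text, and re.sub() returns a new string without touching its input, so only the last iteration's result survives:

```python
import re

text = 'A dog met a hedgehog and a grue.'
terms = ['dog', 'hedgehog', 'grue']

# Same shape as the question's loop: each re.sub() reads `text`,
# so `result` is overwritten with a fresh substitution every time.
for word in terms:
    result = re.sub(r'\b' + word + r's?', r'[\g<0>]', text, flags=re.IGNORECASE)

print(result)  # only "grue" is marked
print(text)    # re.sub() never modifies the string it is given
```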

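How to fix it: assign the file contents to result once, before the loop,

    result = file_contents

and then run every substitution on result instead of on file_contents.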
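A minimal sketch of the corrected loop, written as a plain function so it can run on its own. The name highlight and the bracket markers are illustrative assumptions, and re.escape() is an addition so that terms containing regex metacharacters are matched literally:

```python
import re

def highlight(text, terms):
    """Mark every term (plus one optional trailing 's') with square brackets."""
    result = text  # start from the original text once, before the loop
    for word in terms:
        pattern = re.compile(r'\b' + re.escape(word) + r's?', flags=re.IGNORECASE)
        # Feed the previous result back in so earlier highlights survive.
        result = pattern.sub(r'[\g<0>]', result)
    return result

print(highlight('Dogs and hedgehogs fear a grue.', ['dog', 'hedgehog', 'grue']))
# -> [Dogs] and [hedgehogs] fear a [grue].
```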

Each pass through the loop needs to reassign file_contents to the substituted string. Reassigning file_contents does not change what is in the file:

    import re

    def highlight_story(self):
        """Print the file's contents and highlight words in a list."""

        the_file = open(self.filename, 'r')
        file_contents = the_file.read()
        for word in highlight_terms:
            regex = re.compile(
                  r'\b'      # Word boundary.
                + word       # Each item in the list.
                + r's{0,1}', # One optional 's' at the end.
                flags=re.IGNORECASE | re.VERBOSE)
            subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
            # Reassign to the updated value so the next iteration builds on it.
            file_contents = re.sub(regex, subst, file_contents)
        print(file_contents)
        the_file.close()
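To check the claim that reassigning file_contents leaves the file untouched, here is a self-contained sketch that writes a throwaway file with the standard tempfile module, rebinds the variable, and reads the file back (the file contents are made up for illustration):

```python
import os
import re
import tempfile

# Write a throwaway file so the example needs nothing on disk beforehand.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('A grue ate a dog.')
    path = tmp.name

with open(path) as f:
    file_contents = f.read()

# Rebinding the name only changes the in-memory string.
file_contents = re.sub(r'\bdogs?', r'[\g<0>]', file_contents)

with open(path) as f:
    on_disk = f.read()

os.remove(path)
print(file_contents)  # A grue ate a [dog].
print(on_disk)        # A grue ate a dog.
```

Python strings are immutable, so nothing done to file_contents can propagate back to the file; only an explicit write would.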

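Also, opening the file with a with statement is the better approach. You can copy the string once outside the loop and update the copy inside it.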

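A sketch of that with-statement variant, assuming it is rewritten as a standalone function: highlight_file is a hypothetical name, filename stands in for self.filename, and re.escape() is an addition. The file is only read, the string is copied to output once outside the loop, and output is updated inside it:

```python
import os
import re
import tempfile

HIGHLIGHT_ON = '\033[1;41m'   # bold, red background
HIGHLIGHT_OFF = '\033[0m'     # reset attributes

def highlight_file(filename, terms):
    """Return the file's text with each term highlighted; the file is only read."""
    with open(filename, 'r') as the_file:   # closed automatically
        file_contents = the_file.read()

    output = file_contents                  # copy made once, outside the loop
    for word in terms:
        regex = re.compile(r'\b' + re.escape(word) + r's?', flags=re.IGNORECASE)
        output = regex.sub(HIGHLIGHT_ON + r'\g<0>' + HIGHLIGHT_OFF, output)
    return output

# Usage with a throwaway file standing in for the real story file.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('A hedgehog is no dog.')
demo = highlight_file(tmp.name, ['dog', 'hedgehog', 'grue'])
os.remove(tmp.name)
print(demo)
```

Returning the string instead of printing inside the function keeps it easy to test; the caller can print() it to get the highlighted terminal output.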

You can change the regex to the following:

    # Note the ? instead of {0,1}; it has the same effect.
    regex = re.compile(r'\b(' + '|'.join(highlight_terms) + r')s?', flags=re.IGNORECASE | re.VERBOSE)

Then the for loop is no longer needed.

This code takes the list of words and joins them with |. So if your list is:

    ['cat', 'dog', 'mouse']

the regular expression will be:

    \b(cat|dog|mouse)s?
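A runnable sketch of the single-pass approach. The helper names build_pattern and highlight_all are hypothetical; re.escape() is an addition so each term is matched literally, and re.VERBOSE is dropped because the pattern has no whitespace or comments to ignore. A side benefit of one re.sub() call is that the escape codes inserted for one term are never rescanned by another term's pattern:

```python
import re

HIGHLIGHT_ON = '\033[1;41m'   # bold, red background
HIGHLIGHT_OFF = '\033[0m'     # reset attributes

def build_pattern(terms):
    """Join all terms into one alternation group followed by an optional 's'."""
    alternatives = '|'.join(re.escape(t) for t in terms)
    return re.compile(r'\b(' + alternatives + r')s?', flags=re.IGNORECASE)

def highlight_all(text, terms):
    """Highlight every term with a single re.sub() call, so no loop is needed."""
    return build_pattern(terms).sub(HIGHLIGHT_ON + r'\g<0>' + HIGHLIGHT_OFF, text)

print(build_pattern(['cat', 'dog', 'mouse']).pattern)  # \b(cat|dog|mouse)s?
print(highlight_all('Dogs and hedgehogs fear a grue.', ['dog', 'hedgehog', 'grue']))
```

Like the original pattern, this has no trailing \b, so a term can still match the start of a longer word (for example "dog" inside "doghouse").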
