Text replacement does not work in special cases

Posted 2024-04-16 15:12:38


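I have a word list file named Words.txt that contains hundreds of words, and some subtitle files (.srt). I want to go through all the subtitle files and search them for every word in the word list. If a word is found, I want to change its color to green. Here is the code: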

import fileinput
import os
import re

wordsPath = 'C:/Users/John/Desktop/Subs/Words.txt'
subsPath = 'C:/Users/John/Desktop/Subs/Season1'
wordList = []

wordFile = open(wordsPath, 'r')
for line in wordFile:
    line = line.strip()
    wordList.append(line)

for word in wordList:
    for root, dirs, files in os.walk(subsPath, topdown=False):
        for fileName in files:
            if fileName.endswith(".srt"):
                with open(fileName, 'r') as file :
                    filedata = file.read()
                    filedata = filedata.replace(' '  +word+  ' ', ' ' + '<font color="Green">' +word+'</font>' + ' ')
                with open(fileName, 'w') as file:
                    file.write(filedata)

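Suppose the word "book" is in the list and can be found in one of the subtitle files. As long as the word appears in a sentence like "This book is amazing", my code works fine. However, when the word is written like "BOOK" or "Book", or when it appears at the beginning or end of a sentence, the code fails. How can I fix this?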


1 Answer
User
#1 · Posted 2024-04-16 15:12:38

You are using str.replace. From the documentation:

Return a copy of the string with all occurrences of substring old replaced by new

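Here, an occurrence means an exact match of the string old, so the function only replaces the word when it is surrounded by spaces and has the same case, for example ' book ', which is different from ' BOOK ', ' Book ' and ' book'. Let's look at some cases that do not match: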

" book " == " BOOK "  # False
" book " == " book"  # False
" book " == " Book "  # False
" book " == " bOok " # False
" book " == "   book " # False

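To see the effect on whole sentences, here is a small sketch that applies the same space-padded replacement from the question to a few lines. The helper name `highlight_with_replace` and the sample sentences are made up for illustration; only the first sample contains the exact substring ' book '.

```python
def highlight_with_replace(text, word):
    """Mimic the question's approach: replace only ' word ' (space-padded, exact case)."""
    tagged = '<font color="Green">' + word + '</font>'
    return text.replace(' ' + word + ' ', ' ' + tagged + ' ')


samples = [
    "This book is amazing",   # word surrounded by spaces: replaced
    "The not so good book",   # end of line, no trailing space: unchanged
    "Book one starts here",   # start of line and capitalized: unchanged
    "Just book.",             # followed by punctuation: unchanged
]

for s in samples:
    print(highlight_with_replace(s, "book"))
```

Only the first line comes back with the tag; the other three are returned unchanged because the padded, lowercase substring never occurs in them.
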
An alternative is to use a regular expression, like this:

import re

words = ["book", "rule"]
sentences = ["This book is amazing", "The not so good book", "OMG what a great BOOK", "One Book to rule them all",
             "Just book."]

# \b matches at word boundaries, so line edges and punctuation are handled;
# re.escape guards against words that contain regex metacharacters.
patterns = [re.compile(r"\b({})\b".format(re.escape(word)), re.IGNORECASE | re.UNICODE) for word in words]

for sentence in sentences:
    result = sentence
    for pattern in patterns:
        # \1 re-inserts the matched text, so its original case is preserved
        result = pattern.sub(r'<font color="Green">\1</font>', result)
    print(result)

Output

This <font color="Green">book</font> is amazing
The not so good <font color="Green">book</font>
OMG what a great <font color="Green">BOOK</font>
One <font color="Green">Book</font> to <font color="Green">rule</font> them all
Just <font color="Green">book</font>.
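The same idea can be carried over to the subtitle files from the question. The following is a minimal sketch, not a definitive implementation: the function names `build_pattern`, `highlight` and `highlight_subtitles` are hypothetical, and UTF-8 is only an assumed encoding for the word list and the .srt files. It combines all words into one alternation so that each file is read and written once, and it joins each file name with its directory because os.walk yields bare file names.

```python
import os
import re


def build_pattern(words):
    """Compile one case-insensitive pattern matching any listed word as a whole word."""
    escaped = [re.escape(w) for w in words if w]
    return re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)


def highlight(text, pattern):
    """Wrap every match in a green <font> tag, keeping the matched text's case."""
    return pattern.sub(r'<font color="Green">\1</font>', text)


def highlight_subtitles(words_path, subs_path, encoding="utf-8"):
    """Rewrite every .srt file under subs_path in place (encoding is an assumption)."""
    with open(words_path, "r", encoding=encoding) as word_file:
        words = [line.strip() for line in word_file if line.strip()]
    if not words:
        return  # nothing to highlight; an empty alternation would match everywhere
    pattern = build_pattern(words)
    for root, _dirs, files in os.walk(subs_path):
        for file_name in files:
            if file_name.endswith(".srt"):
                # os.walk yields bare file names, so join them with their directory
                path = os.path.join(root, file_name)
                with open(path, "r", encoding=encoding) as f:
                    data = f.read()
                with open(path, "w", encoding=encoding) as f:
                    f.write(highlight(data, pattern))
```

With the paths from the question, the call would be `highlight_subtitles('C:/Users/John/Desktop/Subs/Words.txt', 'C:/Users/John/Desktop/Subs/Season1')`. Because the substitution is a single pass, text inserted by the replacement is not scanned again within that pass, but running the function a second time on the same files would wrap the already highlighted words in another tag.
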
