Replacing four-letter words in a string

Asked 2025-04-17 17:50

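I'm trying to write code that reads a file, replaces every four-letter word with 'xxxx', and writes the result to another file. I know this question has been asked before, and I searched for related ones, but they all amount to the same thing. I've also tried modifying my code, but I still haven't found a solution.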

def censor(filename):
    'string ==> None, creates file censored.txt in current folder with all 4 letter words replaces with string xxxx'
    import string
    infile = open(filename,'r')
    infile2 = open('censored.txt','w')
    for word in infile:
        words = word.split()
        for i, word in enumerate(words):
            words.strip(string.punctuation)
            if len(word) == 4:
                words[i] == 'xxxx'
                infile2.write(words[i])

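I know this code is a mess and doesn't work, but I figured posting it might help. My idea is to first strip the punctuation from the text so that a four-letter word isn't counted as five letters. Then I'd split the words into a list, replace the four-letter ones, and join them back together in the original order, just with the words replaced. So "I like to work." would become "I xxxx to xxxx."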
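One way the plan described above could look is sketched here. It is only an illustration, not the asker's code: the helper names `censor_token` and `censor_line` are made up for this example, and it assumes punctuation only needs to be ignored at the start and end of each whitespace-separated token.

```python
import string

def censor_token(token, length=4):
    """Replace the word inside `token` with x's if it has `length` characters."""
    # The word without its leading/trailing punctuation, e.g. 'work.' -> 'work'.
    core = token.strip(string.punctuation)
    if len(core) != length:
        return token
    # Position of the word inside the token, so surrounding punctuation is kept.
    start = token.find(core)
    return token[:start] + "x" * length + token[start + length:]

def censor_line(line, length=4):
    # split() discards the original spacing, so tokens are rejoined with single spaces.
    return " ".join(censor_token(t, length) for t in line.split())

print(censor_line("I like to work."))        # I xxxx to xxxx.
print(censor_line("Frog, boot, cat, dog."))  # xxxx, xxxx, cat, dog.
```

Measuring the stripped word while editing the original token keeps the punctuation in place, which is what the "I xxxx to xxxx." example asks for.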

I also looked at a similar post on this site and found a solution that works, but it doesn't deal with the punctuation problem.

def maybe_replace(word, length=4):
    if len(word) == length:
        return 'xxxx'
    else:
        return word

def replacement(filename):
    infile = open(filename,'r')
    outfile = open('censored.txt','w')
    for line in infile:
        words = line.split()
        newWords = [maybe_replace(word) for word in words]
        newLine = ' '.join(newWords)
        outfile.write(newLine + '\n')
    outfile.close()
    infile.close()

So in this case, if I have a list of words such as "Frog, boot, cat, dog.", it returns "Frog, boot, xxxx xxxx".
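A quick way to see why that happens is to print each token next to its length. This is just a diagnostic sketch; the variable names are arbitrary.

```python
line = "Frog, boot, cat, dog."
tokens = line.split()  # splits on whitespace only, punctuation stays attached

# Each token paired with the length that the len(word) == 4 check actually sees.
print([(t, len(t)) for t in tokens])
# [('Frog,', 5), ('boot,', 5), ('cat,', 4), ('dog.', 4)]
```

The attached comma pushes 'Frog,' and 'boot,' to five characters, while 'cat,' and 'dog.' land on exactly four and get replaced, punctuation included.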

I also found a solution that uses regular expressions, but I'm still a beginner and don't really understand it. Any help would be appreciated.


3 Answers

Here's my answer! :)

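import string as s

# Characters that count as part of a word.
alfanum = s.ascii_letters + s.digits

def maybe_replace(arg, length=4):
    # Drop everything that is not a letter or digit before measuring.
    word = "".join(t for t in arg if t in alfanum)

    if len(word) == length:
        # Keep whatever follows the first `length` characters. This assumes
        # any punctuation comes after the word, as in 'boot,'.
        return 'xxxx' + arg[length:]
    else:
        return arg

text = "Frog! boot, cat, dog. bye, bye!"
words = text.split()
print(words)
print([maybe_replace(word) for word in words])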

>>> ['Frog!', 'boot,', 'cat,', 'dog.', 'bye,', 'bye!']
>>> ['xxxx!', 'xxxx,', 'cat,', 'dog.', 'bye,', 'bye!']

Answered 2025-04-17 by Python大师


The problem in the second part of your code is the line words = line.split(). By default it splits on whitespace, so symbols like ',' end up counted as part of your words.

If you really don't want to touch regular expressions, here's a suggestion (although it still involves a little bit of regex):

import re
# Raw string, so the backslash reaches the regex engine unchanged.
words = re.split(r'\W+', line)

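This tells Python to split the line on runs of non-word characters, that is, anything other than letters, digits, and the underscore.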
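Splitting this way throws the punctuation and spacing away, so the line can't be rebuilt exactly. A possible variation, sketched here under the assumption that the original separators should be kept, wraps the pattern in a capturing group so that `re.split` returns the separators as well. The function name `censor_with_split` is invented for this example.

```python
import re

def censor_with_split(line, length=4):
    # With a capturing group, re.split also returns the separators:
    # words land at even indexes, separators at odd indexes.
    parts = re.split(r"(\W+)", line)
    for i in range(0, len(parts), 2):
        if len(parts[i]) == length:
            parts[i] = "x" * length
    # Joining with the empty string restores the original spacing and punctuation.
    return "".join(parts)

print(censor_with_split("Frog, boot, cat, dog."))  # xxxx, xxxx, cat, dog.
```

Because the separators are preserved verbatim, this also keeps leading whitespace and the trailing newline of each line read from a file.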

Answered 2025-04-17 by Python大师


The regex solution is really simple:

import re

text = """
    I also found another solution using 
    regex, but I'm still a novice and 
    really can't understand that solution. 
    Any help would be appreciated.
"""

print(re.sub(r'\b\w{4}\b', 'xxxx', text))

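The regex matches:

\b, which is a word boundary. It matches the start or the end of a word.
\w{4} matches four word characters (that is, letters, digits <0-9>, or the underscore _).
\b is another word boundary.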

The output is:

I xxxx found another solution using 
regex, but I'm still a novice and 
really can't understand xxxx solution. 
Any xxxx would be appreciated.
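To tie this back to the file-to-file task in the question, the same pattern could be applied line by line. The following is a minimal sketch, not part of the original answer: the function name `censor_file` and the default output name are assumptions taken from the question's description.

```python
import re

# Four word characters with a word boundary on each side.
FOUR_LETTER_WORD = re.compile(r"\b\w{4}\b")

def censor_file(in_path, out_path="censored.txt"):
    """Copy in_path to out_path with every four-character word replaced by 'xxxx'."""
    # The with statement closes both files even if an error occurs.
    with open(in_path, "r") as infile, open(out_path, "w") as outfile:
        for line in infile:
            outfile.write(FOUR_LETTER_WORD.sub("xxxx", line))
```

Note that \w also matches digits and the underscore, so a token like 2025 would be replaced too, and an apostrophe counts as a boundary, so "they'll" becomes "xxxx'll". Whether that is acceptable depends on the text being censored.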

Answered 2025-04-17 by Python大师

