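Creating a filter function from a list of bad words
I'm trying to write a function that censors certain words in a string. It partly works, but it has a few problems.
Here is my code: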
def censor(sentence):
    badwords = 'apple orange banana'.split()
    sentence = sentence.split()
    for i in badwords:
        for words in sentence:
            if i in words:
                pos = sentence.index(words)
                sentence.remove(words)
                sentence.insert(pos, '*' * len(i))
    print(" ".join(sentence))

sentence = "you are an appletini and apple. new sentence: an orange is a banana. orange test."
censor(sentence)
The output is:
you are an ***** and ***** new sentence: an ****** is a ****** ****** test.
Some of the punctuation disappears, and the word "appletini" is wrongly replaced.
How can I fix this?
Also, is there a simpler way to do something like this?
2 Answers
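Try this:
# bad_word_list is your list of words to censor, e.g. ['apple', 'orange', 'banana']
for i in bad_word_list:
    # note: this also replaces substrings, so 'appletini' becomes '*****tini'
    sentence = sentence.replace(i, '*' * len(i))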
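If you want something as short as str.replace but without also hitting "appletini", one alternative (a swapped-in technique, not part of this answer) is a regular expression with \b word boundaries. A minimal sketch; the function name censor_words is hypothetical:

```python
import re

def censor_words(sentence, badwords):
    """Mask whole-word occurrences of each bad word, keeping punctuation."""
    # re.escape guards against regex metacharacters in the word list;
    # \b only matches at a word boundary, so 'apple' won't match inside 'appletini'
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, badwords)) + r")\b")
    # Replace each match with as many asterisks as the matched text is long
    return pattern.sub(lambda m: "*" * len(m.group(0)), sentence)

print(censor_words(
    "you are an appletini and apple. new sentence: an orange is a banana. orange test.",
    ["apple", "orange", "banana"],
))
# you are an appletini and *****. new sentence: an ****** is a ******. ****** test.
```

The trade-off is that whole-word matching also skips inflected forms such as "apples"; whether that is acceptable depends on your word list.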
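The specific problems are:
- You don't take punctuation into account at all;
- When inserting the '*' characters, you use the length of the bad word rather than the length of the word itself.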
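To make both points concrete, here is a small check on tokens from the question's sentence (the variable names are just for illustration):

```python
sentence = "you are an appletini and apple. new sentence: an orange is a banana. orange test."
tokens = sentence.split()  # splits on whitespace only, so 'apple.' keeps its period

# 'in' is a substring test, so 'apple' also matches inside 'appletini'
matches = [t for t in tokens if "apple" in t]
print(matches)  # ['appletini', 'apple.']

# The mask is built from the bad word, not the token, so the '.' disappears
mask = "*" * len("apple")
print(len(mask), len("apple."))  # 5 6
```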
I suggest reversing the order of the loops, so that you only process the sentence once, and using enumerate rather than remove and insert:
def censor(sentence):
    badwords = ("test", "word")  # consider making this an argument too
    sentence = sentence.split()
    for index, word in enumerate(sentence):
        if any(badword in word for badword in badwords):
            sentence[index] = "".join(['*' if c.isalpha() else c for c in word])
    return " ".join(sentence)  # return rather than print
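Because of the str.isalpha test, only letters (upper- and lowercase) are replaced with asterisks; punctuation is left in place. Demo: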
>>> censor("Censor these testing words, will you? Here's a test-case!")
"Censor these ******* *****, will you? Here's a ****-****!"
# ^ note length ^ note punctuation
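The code comment above suggests making badwords an argument. A minimal sketch of that change (the second parameter is an addition, not part of the original answer), applied to the sentence from the question:

```python
def censor(sentence, badwords):
    """The answer's function with the bad-word list passed in as an argument."""
    words = sentence.split()
    for index, word in enumerate(words):
        # substring match, as in the answer: 'apple' also matches 'appletini'
        if any(badword in word for badword in badwords):
            # mask letters only, so punctuation and token length are preserved
            words[index] = "".join("*" if c.isalpha() else c for c in word)
    return " ".join(words)

print(censor(
    "you are an appletini and apple. new sentence: an orange is a banana. orange test.",
    ("apple", "orange", "banana"),
))
# you are an ********* and *****. new sentence: an ****** is a ******. ****** test.
```

The periods now survive, but because matching is still by substring, "appletini" is masked in full; whether that is desirable depends on your use case.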