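How do I remove words from a string when they meet a condition?
I need to do some "cleaning" on a series of strings.
1. Special characters (e.g. !@#$%^ and so on) should be removed.
2. Every word in the string should be converted to lowercase.
3. Words that are 2 characters long or shorter should be removed (e.g. "a, it, me, us" and so on).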
trainset = [('It is too bad that our jane is just a pigeon. It would be great if it could speak. It would be able to prove my innocence.'), ('I have no other choice. Is death the only way to prove it? Loving you is really hard!'), ('These are my last words.')]
def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        for word in line.split():
            lowword = word.lower()
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            if len(lowword) >= 3:
                cleanedtrain.append(lowword)
    return cleanedtrain
The function above doesn't seem to work properly. Can you help me? Also, I would like the final result to be a string rather than a list.
1 Answer
Check your indentation and syntax. The logic itself is fine.
trainset = [('It is too bad that our jane is just a pigeon. It would be great if it could speak. It would be able to prove my innocence.'), ('I have no other choice. Is death the only way to prove it? Loving you is really hard!'), ('These are my last words.')]
def cleanedthings(trainset):
    cleanedtrain = []
    specialch = "!@#$%^&*-=_+:;\".,/?`~][}{|)("
    for line in trainset:
        for word in line.split():
            lowword = word.lower()
            # Strip every special character from the lowercased word
            for ch in specialch:
                if ch in lowword:
                    lowword = lowword.replace(ch,"")
            # Keep only words of 3 or more characters
            if len(lowword) >= 3:
                cleanedtrain.append(lowword)
    return cleanedtrain

# Join the cleaned words into a single string instead of a list
print(" ".join(cleanedthings(trainset)))
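If you would rather get a string back directly, here is a minimal sketch of the same three steps using `str.translate`. The names `SPECIAL_CHARS`, `clean_line`, `clean_all` and the `min_len` parameter are my own and not part of the code above, and I am assuming Python 3 and that one cleaned string per input line is acceptable.

```python
# Same character set as `specialch` in the code above.
SPECIAL_CHARS = "!@#$%^&*-=_+:;\".,/?`~][}{|)("

# Translation table that deletes every character in SPECIAL_CHARS.
_STRIP_TABLE = str.maketrans("", "", SPECIAL_CHARS)


def clean_line(line, min_len=3):
    """Lowercase each word, strip special characters, drop words shorter than min_len."""
    words = (word.lower().translate(_STRIP_TABLE) for word in line.split())
    return " ".join(word for word in words if len(word) >= min_len)


def clean_all(lines, min_len=3):
    """Return one cleaned string per input line (hypothetical helper)."""
    return [clean_line(line, min_len) for line in lines]


print(clean_line('These are my last words.'))  # these are last words
```

To get one single string for the whole list, as in the `print` call above, you can use `" ".join(clean_all(trainset))`.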