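Finding words that start and end with a consonant
I'm trying to find the words that begin and end with a consonant. Below is the code I tried, but the result is not what I want. I'm stuck and would appreciate your help and suggestions.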
import re
a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War."
b = re.findall(" ([b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.'].+?[b, c, d, f, g, h, j, k, l, m, n, p, q, r, s, t, v, w, x, y, z, ',', '.']) ", a.lower())
print(b)
The output is:
['the conflicting', 'further', 'to worsen', 'the ukraine crisis,', 'has', 'drastically', 'the past', 'weeks', 'new', 'between', 'the west', 'low', 'the cold']
But the output is incorrect. I have to use regex; without it, I suspect this would be too difficult.
Many thanks!
4 Answers
1
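First, use the split() method to break a into individual words. Then check whether the first and last characters of each word are in the consonants list. If they are, append the word to all. Finally, print the contents of all.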
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
a = "Still, the conflicting reports only further served to worsen tensions in the Ukraine crisis, which has grown drastically \
in the past few weeks to a new confrontation between Russia and the West reminiscent of low points in the Cold War."
all = []  # note: this name shadows the built-in all()
# Lowercase first so capitalised words such as "West" are checked too.
for word in a.lower().split():
    # Trailing punctuation stays attached, so "crisis," is not matched.
    if word[0] in consonants and word[-1] in consonants:
        all.append(word)
print(all)
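If the trailing commas and periods get in the way, one possible variation is to strip punctuation from each token before checking it. This is only a sketch: the function name `consonant_words` and the shortened sample sentence are made up for illustration, and `string.punctuation` is used on the assumption that only ASCII punctuation needs removing.

```python
import string

CONSONANTS = set("bcdfghjklmnpqrstvwxyz")

def consonant_words(text):
    """Return lowercased words whose first and last letters are consonants."""
    found = []
    for token in text.split():
        # Drop leading/trailing punctuation, then normalise case.
        word = token.strip(string.punctuation).lower()
        if word and word[0] in CONSONANTS and word[-1] in CONSONANTS:
            found.append(word)
    return found

sample = "Still, the Ukraine crisis has grown."
found_words = consonant_words(sample)
print(found_words)
```

A set is used for the consonants so that each membership test is a constant-time lookup, although for a sentence this short a list would do just as well.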
1
If you want to drop the punctuation, this regex works:
>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z]\b', a.lower())
['still', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war']
However, your original attempt looks like it was meant to keep commas and periods, so if that is the goal, use this instead:
>>> re.findall(r'\b[bcdfghj-np-tv-z][a-z]*[bcdfghj-np-tv-z][,.]?(?![a-z])', a.lower())
['still,', 'conflicting', 'reports', 'further', 'served', 'worsen', 'tensions', 'crisis,', 'which', 'has', 'grown', 'drastically', 'past', 'few', 'weeks', 'new', 'confrontation', 'between', 'west', 'reminiscent', 'low', 'points', 'cold', 'war.']
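The \b in my first example does not pick up the trailing punctuation because it is a zero-width assertion: it marks the boundary between a word character and a non-word character without consuming either. Either way, both patterns work.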
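To see what `\b` does with trailing punctuation, here is a minimal check. The sample string is made up for illustration. As far as I can tell, the comma only ends up in the match when the pattern consumes it explicitly:

```python
import re

sample = "crisis, which"

# \b only asserts a position, so the match stops before the comma.
plain = re.search(r"\bcrisis\b", sample)
print(plain.group(), plain.span())

# An explicit optional comma is needed to include the punctuation.
with_comma = re.search(r"\bcrisis,?", sample)
print(with_comma.group())
```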
If you want to allow for contractions, the expression can simply be changed to:
r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])"
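A quick way to try the contraction-aware pattern is shown below. The sample sentence is invented for this sketch and is not part of the original text:

```python
import re

pattern = r"\b[bcdfghj-np-tv-z][a-z']*[bcdfghj-np-tv-z][,.]?(?![a-z])"
sample = "They don't want more, but we can't stop."

# Lowercase first, because the character classes only list lowercase letters.
matches = re.findall(pattern, sample.lower())
print(matches)
```

Here "more," and "we" are skipped because they end in a vowel, while "don't" and "can't" are kept whole thanks to the apostrophe inside the middle character class.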
2
Try this:
vowels = ['a', 'e', 'i', 'o', 'u']
words = [w for w in a.split() if w[0] not in vowels and w[-1] not in vowels]
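However, this does not handle words that end with . or , because the punctuation mark is then treated as the last character of the word.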
Addendum: if you need to use regex to find the pattern:
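# \w* (rather than \w+) lets one-letter words such as "a" match as well.
# The trailing ? also produces empty matches, which are filtered out below.
ending_in_vowel = r'(\b\w*[AaEeIiOoUu]\b)?'  # matches all words ending with a vowel
begin_in_vowel = r'(\b[AaEeIiOoUu]\w*\b)?'  # matches all words beginning with a vowel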
We need to find all words that neither begin nor end with a vowel.
ignore = [b for b in re.findall(begin_in_vowel, a) if b]
ignore.extend([b for b in re.findall(ending_in_vowel, a) if b])
Then your result is:
result = [word for word in a.split() if word not in ignore]
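Put together, the exclusion approach could look roughly like the sketch below. It departs from the snippets above in a few places, all of them my own assumptions: the optional `?` group is dropped so no empty matches need filtering, `\w*` is used so one-letter words are caught, the words are tokenised with `\w+` instead of `split()` so punctuation is not attached, and the sample sentence is shortened.

```python
import re

text = "Still, the Ukraine crisis has grown in a new confrontation."

# Words that begin with a vowel, and words that end with one.
begin_in_vowel = re.findall(r"\b[AaEeIiOoUu]\w*\b", text)
end_in_vowel = re.findall(r"\b\w*[AaEeIiOoUu]\b", text)
ignore = set(begin_in_vowel) | set(end_in_vowel)

# Tokenise with \w+ so commas and periods are not part of the words.
result = [word for word in re.findall(r"\w+", text) if word not in ignore]
print(result)
```

Note that `\w` also matches digits and underscores, so text containing those would need a stricter character class such as `[A-Za-z]`.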
4
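Here is a very simple and straightforward solution that uses the startswith() and endswith() methods. To get there, you need to strip the special characters yourself and convert your string into a list of words (called s in the code):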
vowels = ('a', 'e', 'i', 'o', 'u')
[w for w in s if not w.lower().startswith(vowels) and not w.lower().endswith(vowels)]
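One way to build the list `s` is sketched here. Using `re.findall` with `[A-Za-z]+` as the tokeniser is my own assumption (it drops everything that is not an ASCII letter, apostrophes included), and the sample sentence is shortened for illustration:

```python
import re

text = "Still, the Ukraine crisis has grown."

# Keep only runs of letters, so commas and periods are dropped.
s = re.findall(r"[A-Za-z]+", text)

vowels = ('a', 'e', 'i', 'o', 'u')
# str.startswith / str.endswith accept a tuple of alternatives.
result = [w for w in s
          if not w.lower().startswith(vowels) and not w.lower().endswith(vowels)]
print(result)
```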