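I am using Python to extract runs of consecutive capitalized words (at least two in a row) from text.
For example, given the sentence
Hollywood is a neighborhood in the central region of Los Angeles.
the expected output is
Los Angeles
I tried to do this in a functional programming style: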
import itertools
import string
import operator

text = "Take any tram, U-bahn or bus which stops at Düsseldorf Hauptbahnhof (HBF). Leave the station via the main exit Konrad Adenauer Platz, you will see trams and buses in front of the station. Walk up Friedrich Ebert Straße turning right into the third street which is the Oststraße."

def fold(it):
    # AND each flag with its right-hand neighbour (result is one element shorter)
    def fold_impl(x, y):
        return itertools.starmap(operator.and_, zip(x, itertools.islice(y, 1, None)))
    return fold_impl(*itertools.tee(it))

def unfold(it):
    # OR each flag with its left-hand neighbour (result is one element longer)
    def unfold_impl(x, y):
        return itertools.starmap(operator.or_, zip(itertools.chain(x, [False]), itertools.chain([False], y)))
    return unfold_impl(*itertools.tee(it))

def ngrams(it, n):
    # Flag every position that belongs to a run of at least n True values
    return it if n <= 1 else unfold(ngrams(fold(it), n - 1))

def ngrams_idx(it, n):
    # Yield the sorted word indices of each flagged run
    return (sorted(x[0] for x in g) for k, g in itertools.groupby(enumerate(ngrams(it, n)), key=lambda x: x[1]) if k)

def booleanize(text_vec):
    # True for words whose first character is an ASCII uppercase letter
    return map(lambda x: x[0] in string.ascii_uppercase, text_vec)

def ngrams_phrase(text_vec, n):
    def word(text_vec, idx):
        return ' '.join(map(lambda i: text_vec[i], idx))
    return [word(text_vec, idx) for idx in ngrams_idx(booleanize(text_vec), n)]
But I feel I have made this more complicated than it needs to be. Is there a simpler way to solve this with functional programming?
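I think the entry call should be
ngrams_phrase(text.split(), 2)
and that the OP is looking for every phrase made of at least 2 consecutive words with a capitalized initial letter. For example, running the snippet on text
yields ["Düsseldorf Hauptbahnhof", "Konrad Adenauer Platz", "Friedrich Ebert Straße"] (note that text.split() leaves the trailing comma attached to "Platz,").
Take a look at this: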
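One simpler possibility is to let itertools.groupby collect the runs directly. This is only a minimal sketch: the function name capitalized_phrases and the min_len parameter are made up for illustration, str.isupper() is used in place of the ASCII-only check from the question (so a word starting with "Ö" also counts as capitalized), and trailing punctuation is stripped from each phrase.

```python
from itertools import groupby
import string


def capitalized_phrases(text, min_len=2):
    """Return phrases of at least min_len consecutive capitalized words."""
    words = text.split()
    # groupby yields maximal runs of adjacent words sharing the same key value
    groups = groupby(words, key=lambda w: w[:1].isupper())
    runs = (list(g) for is_cap, g in groups if is_cap)
    # Strip punctuation that split() left on the last word, e.g. "Angeles."
    return [' '.join(r).rstrip(string.punctuation) for r in runs if len(r) >= min_len]


sentence = "Hollywood is a neighborhood in the central region of Los Angeles."
print(capitalized_phrases(sentence))  # ['Los Angeles']
```

Punctuation inside a run is left alone, so "Paris, London" would still be reported as one phrase; splitting on punctuation first would be needed to avoid that.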
This is not really good practice in Python, but the shortest way is to reduce over the split text:
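A reduce over the split text might look roughly like the following. It is a sketch under assumptions, not a definitive implementation: capitalized_runs, step and min_len are illustrative names, capitalization is tested with str.isupper(), and trailing punctuation is stripped from each phrase.

```python
from functools import reduce
import string


def capitalized_runs(words, min_len=2):
    """Fold the word list into runs of consecutive capitalized words."""
    def step(runs, word):
        if word[:1].isupper():
            # Capitalized word: extend the current (last) run
            return runs[:-1] + [runs[-1] + [word]]
        # Any other word closes the current run and opens an empty one
        return runs + [[]]

    runs = reduce(step, words, [[]])
    return [' '.join(r).rstrip(string.punctuation) for r in runs if len(r) >= min_len]


text = ("Take any tram, U-bahn or bus which stops at Düsseldorf Hauptbahnhof (HBF). "
        "Leave the station via the main exit Konrad Adenauer Platz, you will see trams.")
print(capitalized_runs(text.split()))
# ['Düsseldorf Hauptbahnhof', 'Konrad Adenauer Platz']
```

The accumulator is rebuilt on every step, which keeps step free of side effects but makes the fold quadratic in the worst case; that is fine for a sentence, less so for a large corpus.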