在Python中，如果字符串与单词、短语、布尔值和列表中的任何项匹配，那么最快的方法是什么？

2024-05-12 20:38:30 发布

您现在位置：Python中文网/ 问答频道 /正文

911

网友

男 | 程序猿一只，喜欢编程写python代码。

我试图在Python中找到一种快速的方法来检查一个术语列表是否可以与50到50000个字符的字符串匹配。

一个术语可以是：

一个字，如“苹果”
一个短语，如“樱桃派”
单词和短语的布尔运算，如“甜馅饼、咸馅饼和蛋白糖”

匹配是指单词或短语在单词边界附近的位置，因此：

match(term='apple', string='An apple a day.') # True
match(term='berry pie', string='A delicious berry pie.') # True
match(term='berry pie', string='A delicious blueberry pie.') # False

我目前有40个左右的术语，大部分都是简单的词。条款数量会随着时间的推移而增加，但我预计不会超过400条。

我不感兴趣的是字符串匹配的项，或者它匹配的字符串中的哪个位置，我只需要一个true/false值来匹配每个字符串-很可能没有任何项匹配字符串，所以对于每500个匹配的项中的1个，我可以存储字符串以供进一步处理。

速度是最重要的标准，我想利用那些比我聪明的人的现有代码，而不是试图实现一份白皮书。：）

到目前为止，我想出的最快的解决方案是：

def data():
    return [
        "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family (Rosaceae).",
        "This resulted in early armies adopting the style of hunter-foraging.",
        "Beef pie fillings are popular in Australia. Chicken pie fillings are too."
    ]

def boolean_and(terms):
    return '(%s)' % (''.join(['(?=.*\\b%s\\b)' % (term) for term in terms]))

def run():
    words_and_phrases = ['apple', 'cherry pie']
    booleans = [boolean_and(terms) for terms in [['sweet pie', 'savoury pie', 'meringue'], ['chicken pie', 'beef pie']]]
    regex = re.compile(r'(?i)(\b(%s)\b|%s)' % ('|'.join(words_and_phrases), '|'.join(booleans)))
    matched_data = list()
    for d in data():
        if regex.search(d):
            matched_data.append(d)

regex的结尾是：

(?i)(\b(apple|cherry pie)\b|((?=.*\bsweet pie\b)(?=.*\bsavoury pie\b)(?=.*\bmeringue\b))|((?=.*\bchicken pie\b)(?=.*\bbeef pie\b)))

因此，所有的词都被“或”放在一起，忽略大小写，单词/短语用“\b”表示单词边界，布尔型和使用lookaheads使所有的词都匹配，但它们不必按特定顺序匹配。

时间结果：

 print timeit.Timer('run()', 'from __main__ import run').timeit(number=10000)
 1.41534304619

如果没有lookaheads（即boolean and），这确实很快，但一旦添加，速度就会大大降低。

有人知道如何改进吗？有没有一种方法可以优化前景，或者是一种完全不同的方法？我不认为词干会起作用，因为它往往有点贪婪的匹配。

Tags： and the 方法字符串 in apple data string

0条回答

目前没有回答

在Python中，如果字符串与单词、短语、布尔值和列表中的任何项匹配，那么最快的方法是什么？

相关问题更多 >

编程相关推荐

热门问题

热门文章

在Python中，如果字符串与单词、短语、布尔值和列表中的任何项匹配，那么最快的方法是什么？

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >