An efficient way to check whether processed strings occur in other strings

1 answer

User

#1 · Posted 2024-06-07 06:24:43

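To reduce the time complexity, we can spend more space. Loop through keywords and hash them into a set(), assuming each keyword is unique (if not, the duplicates are simply dropped).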

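Then you can loop through the paragraph and build one-, two-, and three-word phrases, checking each against hashedKeywords and incrementing its count whenever it is present. Taking m as the number of keywords and n as the number of words in the paragraph, the time complexity is O(m + n) =~ O(n), because the phrase length is capped at a small constant (maxGram) and each set lookup is O(1) on average. The space complexity goes from O(1) to O(n).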

import string # for removing punctuation

# Sample input with bigrams and trigrams in keywords
paragraphs = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
keywords = ['magna', 'lorem ipsum', 'sed do eiusmod', 'aliqua']

# Hash keywords into set for faster look up
hashedKeywords = set(keywords)

# Strip punctuation from paragraph phrases using translate() and make it case insensitive using lower()
table = str.maketrans({key: None for key in string.punctuation})
wordsInParagraphs = [w.translate(table).lower() for w in paragraphs.split()]

# Initialize for loop
maxGram = 3
wordFrequency = {}

# Loop through words in paragraphs but also create a small list of one, two, or three word phrases. 

for i in range(len(wordsInParagraphs)):
    # List slicing ensures the last word and second to last word will produce a one and two string list, respectively (since slicing past the length of the list will simply return a list up to the last element in Python)
    phrases = wordsInParagraphs[i:i+maxGram] # e.g. ['lorem', 'ipsum', 'dolor']

    # Loop through the one, two, and three word phrases and check if phrase is in keywords
    for j in range(len(phrases)):
        phrase = ' '.join(phrases[0:j+1]) # Join list of strings into a complete string e.g. 'lorem', 'lorem ipsum', and 'lorem ipsum dolor'
        if phrase in hashedKeywords:
            wordFrequency.setdefault(phrase , 0)
            wordFrequency[phrase] += 1
print(wordFrequency)

Output:

{'lorem ipsum': 1, 'sed do eiusmod': 1, 'magna': 1, 'aliqua': 1}
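The same approach can be wrapped in a reusable function. The following is only a minimal sketch, not part of the original answer: the names count_keywords, normalize, and keyword_set are hypothetical, and it assumes keywords should be normalized the same way as the text and that the window size can be derived from the longest keyword instead of being hard-coded as maxGram.

```python
import string
from collections import Counter


def count_keywords(text, keywords):
    """Count how often each keyword (one or more words long) occurs in text."""
    # Translation table that deletes every ASCII punctuation character
    table = str.maketrans('', '', string.punctuation)

    def normalize(s):
        # Strip punctuation, lowercase, and drop tokens that end up empty
        tokens = (t.translate(table).lower() for t in s.split())
        return [w for w in tokens if w]

    # Assumption: keywords are normalized the same way as the text
    keyword_set = {' '.join(normalize(k)) for k in keywords}
    keyword_set.discard('')

    # The longest keyword (in words) bounds the phrase window
    max_gram = max((len(k.split()) for k in keyword_set), default=0)

    words = normalize(text)
    counts = Counter()
    for i in range(len(words)):
        for n in range(1, max_gram + 1):
            if i + n > len(words):
                break  # window would run past the end of the text
            phrase = ' '.join(words[i:i + n])
            if phrase in keyword_set:
                counts[phrase] += 1
    return counts


paragraphs = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."
keywords = ['magna', 'lorem ipsum', 'sed do eiusmod', 'aliqua']
result = count_keywords(paragraphs, keywords)
print(dict(result))
```

Deriving the window size from the keywords avoids building phrases longer than any keyword, at the cost of one extra pass over the keyword list.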

Note: this is Python 3. If you want to strip punctuation in Python 2, see this answer.

Some of the keywords are n-grams
