Python: Match phrases in dictionary values against a sentence (the dictionary key) and produce output based on the match

* it 0
  is 0
  lovely weather 1 (combined because it's a phrase)
  and 0
  it is kind of warm 1 (combined because it's a phrase)
* and 0
  the 0
  weather 0
  is 0
  rainy and cold 1 (combined because it's a phrase)
  ...(and so on)...

for k, v in dict1.items():
    words_in_val = v.split()
    if len(words_in_val) == 1:
        words = k.split()
        for each_word in words:
            if v == each_word:
                print(each_word + '\t' + '1')
            else:
                print(each_word + '\t' + '0')
    if len(words_in_val) == 2:
        words = k.split()
        for index, item in enumerate(words[:-1]):
            if words[index] == words_in_val[0]:
                if words[index+1] == words_in_val[1]:
                    words[index] = ' '.join(words_in_val)
                    words.remove(words[index+1])
    # ....something like this...

1 Answer

User

#1 · Posted on 2024-05-29 10:51:02

So I would do it like this:

dict1 = {
    'it is lovely weather and it is kind of warm': ['it is kind of', 'it is kind'],
    'and the weather is rainy and cold': ['rainy and cold'],
    'the temperature is ok': ['temperature'],
}

def tag_sentences(sentences):
    group_id = 1  # next id for a matched phrase; 0 means "not in a group"
    tagged_results = []
    for sentence, phrases in sentences.items():
        words = sentence.split()
        phrases_split = [phrase.split() for phrase in phrases]
        positions_keeper = {}  # phrase index -> number of its words matched so far
        sentence_results = [(word, 0) for word in words]
        for word_index, word in enumerate(words):
            for index, phrase in enumerate(phrases_split):
                position = positions_keeper.get(index, 0)
                if phrase[position] != word:
                    # Mismatch: drop the partial match, but let this word restart the phrase.
                    position = 0
                    positions_keeper[index] = 0
                    if phrase[0] != word:
                        continue
                if len(phrase) > position + 1:
                    # Still inside the phrase: remember how far we got.
                    positions_keeper[index] = position + 1
                else:
                    # End of the phrase: tag all of its words with the same id.
                    for i in range(len(phrase)):
                        sentence_results[word_index - i] = (sentence_results[word_index - i][0], group_id)
                    group_id += 1
                    positions_keeper[index] = 0  # start fresh for the next occurrence
        tagged_results.append(sentence_results)
    return tagged_results

def print_tagged_results(tagged_results):
    for tagged_result in tagged_results:
        memory = 0            # id of the previous word
        memory_sentence = ""  # words of the phrase collected so far
        for result, group_id in tagged_result:
            if memory != 0 and memory != group_id:
                # The previous phrase just ended: print it with flag 1.
                print(memory_sentence + "1")
                memory_sentence = ""
            if group_id == 0:
                print(result, 0)
            else:
                memory_sentence += result + " "
            memory = group_id
        if memory != 0:
            # The sentence ended inside a phrase.
            print(memory_sentence + "1")

tagged_results = tag_sentences(dict1)
print_tagged_results(tagged_results)

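Basically, this is what it does:

First, I create a tagged list of the form: [(it, 0), (is, 0), (lovely, 0) ...]
In the tagged list, 0 means "not in a group" and any other integer marks words that are grouped together (words tagged 1 go together, words tagged 2 go together)
I iterate over each word and tag it if it matches the start of a phrase, or if I am already partway through a phrase and it matches the current position in that phrase
If it is the end of the phrase, I tag this word and all the previous words that matched the phrase with the same id
If it is not the end, I keep the position and move on to the next iteration
Finally, I end up with a tagged list of the form [(it, 0), (is, 0), (lovely, 1) ... (kind,2), (of, 2), ...]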

It does not work reliably if one phrase is a sub-phrase of another, but your example never said how that case should be handled.
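
If sub-phrases do need handling, one possible policy is "the longest phrase wins". The sketch below assumes that policy, which the question does not specify, and uses a hypothetical function `tag_longest_first` that scans the sentence left to right and tries longer phrases before shorter ones.

```python
def tag_longest_first(sentence, phrases):
    """Return [(text, flag), ...], preferring the longest phrase at each position.

    Assumption: when phrases overlap, the longest one should win.
    """
    words = sentence.split()
    # Try longer phrases first so a sub-phrase cannot pre-empt its parent.
    candidates = sorted((p.split() for p in phrases), key=len, reverse=True)
    result = []
    i = 0
    while i < len(words):
        for cand in candidates:
            # Compare the slice of the sentence starting at i with the phrase.
            if cand and words[i:i + len(cand)] == cand:
                result.append((" ".join(cand), 1))
                i += len(cand)  # skip past the whole phrase
                break
        else:
            # No phrase starts here: emit the single word with flag 0.
            result.append((words[i], 0))
            i += 1
    return result

for text, flag in tag_longest_first(
        "it is lovely weather and it is kind of warm",
        ["it is kind of", "it is kind"]):
    print(text, flag)
```

With the two overlapping phrases above, "it is kind of" is emitted as a single group and the shorter "it is kind" is never matched on its own at that position.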
