Python: match phrases in dictionary values against a sentence (the dictionary key) and print labels based on the match

Posted 2024-05-29 10:51:02

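I have a dictionary in which each key is a sentence and each value is a list of specific words or phrases taken from that sentence.

For example: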

dict1 = {'it is lovely weather and it is kind of warm':['lovely weather', 'it is kind of warm'],'and the weather is rainy and cold':['rainy and cold'],'the temperature is ok':['temperature']}

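I want the output to label every word of each sentence according to whether it is part of one of the phrases in that sentence's dictionary value.

For this example the output would be (0 = not in the values, 1 = in the values):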

*
it 0
is 0
lovely weather 1 (combined because it's a phrase)
and 0
it is kind of warm 1 (combined because it's a phrase)
*
and 0
the 0
weather 0
is 0
rainy and cold 1 (combined because it's a phrase)
...(and so on)...

I can do something like this, but only by hard-coding the number of words in the phrase:

for k, v in dict1.items():
    words = k.split()
    for phrase in v:  # each value is a list of phrases
        words_in_val = phrase.split()
        # Case 1: the phrase is a single word.
        if len(words_in_val) == 1:
            for each_word in words:
                if phrase == each_word:
                    print(each_word + '\t' + '1')
                else:
                    print(each_word + '\t' + '0')

        # Case 2: the phrase has exactly two words.
        if len(words_in_val) == 2:
            for index, item in enumerate(words[:-1]):
                if words[index] == words_in_val[0]:
                    if words[index + 1] == words_in_val[1]:
                        # Merge the two words into one entry.
                        words[index] = ' '.join(words_in_val)
                        del words[index + 1]
                        # ....something like this...

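My problem is that I can see this getting messy, and in theory the phrases I want to match can contain any number of words, although it is usually fewer than 10.

Does anyone know how to do this?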


1 answer

Answer 1 · posted 2024-05-29 10:51:02

Here is how I would do it:

dict1 = {
    'it is lovely weather and it is kind of warm': ['it is kind of', 'it is kind'],
    'and the weather is rainy and cold': ['rainy and cold'],
    'the temperature is ok': ['temperature'],
}

def tag_sentences(sentence_phrases):
    phrase_id = 1  # 0 means "not in a phrase"; every matched phrase gets its own id
    tagged_results = []
    for sentence, phrases in sentence_phrases.items():
        words = sentence.split()
        phrases_split = [phrase.split() for phrase in phrases]
        positions_keeper = {}  # phrase index -> number of phrase words matched so far
        sentence_results = [(word, 0) for word in words]
        for word_index, word in enumerate(words):
            for index, phrase in enumerate(phrases_split):
                position = positions_keeper.get(index, 0)
                if phrase[position] == word:
                    if len(phrase) > position + 1:
                        # Partial match: remember how far we got.
                        positions_keeper[index] = position + 1
                    else:
                        # Full match: tag every word of the phrase with the same id.
                        for i in range(len(phrase)):
                            sentence_results[word_index - i] = (sentence_results[word_index - i][0], phrase_id)
                        phrase_id = phrase_id + 1
                        # Start over so a later occurrence is matched from the beginning.
                        positions_keeper[index] = 0
                else:
                    positions_keeper[index] = 0
        tagged_results.append(sentence_results)
    return tagged_results

def print_tagged_results(tagged_results):
    for tagged_result in tagged_results:
        memory = 0            # id of the previous word
        memory_sentence = ""  # words of the phrase collected so far
        for result, tag in tagged_result:
            if memory != 0 and memory != tag:
                # The previous phrase just ended: print it as one row.
                print(memory_sentence + "1")
                memory_sentence = ""
            if tag == 0:
                print(result, 0)
            else:
                memory_sentence += result + " "
            memory = tag
        if memory != 0:
            print(memory_sentence + "1")

tagged_results = tag_sentences(dict1)
print_tagged_results(tagged_results)

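Basically, it works like this:

  1. First, I create a tagged list of the form [(it, 0), (is, 0), (lovely, 0) ...]
  2. In the tagged list, 0 means "not in a group" and any other integer marks words that are grouped together (words tagged 1 belong together, words tagged 2 belong together).
  3. I iterate over every word; if it matches the start of a phrase, or the word expected at the current position of a phrase I am already partway through, I record the match.
  4. If it is the end of the phrase, I tag this word and all previous words that matched this phrase with the same id.
  5. If it is not the end, I keep the position and move on to the next iteration.
  6. In the end I get a tagged list of the form [(it, 0), (is, 0), (lovely, 1) ... (kind, 2), (of, 2), ...]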
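The grouping idea in steps 2 and 6 can also be sketched on its own. The snippet below is only an illustration, not part of the answer's code: `group_tagged` is a hypothetical helper name, and it assumes the tagged list is already available as `(word, id)` pairs. It uses `itertools.groupby` to merge consecutive words that share a non-zero id into a single row.

```python
from itertools import groupby

def group_tagged(tagged):
    """Collapse consecutive words that share a non-zero id into one phrase row."""
    rows = []
    # groupby yields runs of consecutive pairs that have the same id.
    for tag, group in groupby(tagged, key=lambda pair: pair[1]):
        words = [word for word, _ in group]
        if tag == 0:
            # Untagged words stay on their own rows with label 0.
            rows.extend((word, 0) for word in words)
        else:
            # A run of one id is one phrase, printed with label 1.
            rows.append((" ".join(words), 1))
    return rows

# Hand-written tagged list for the second example sentence (assumed input).
tagged = [("and", 0), ("the", 0), ("weather", 0), ("is", 0),
          ("rainy", 1), ("and", 1), ("cold", 1)]
for text, label in group_tagged(tagged):
    print(text, label)
```

Because each matched phrase has its own id, two phrases that sit next to each other still end up on separate rows.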

It does not work if one phrase is a sub-phrase of another, but your example never says how that case should be handled.
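One possible way to cope with sub-phrases is a greedy longest-match scan. This is a different technique from the code above and only a sketch: `label_sentence` is a hypothetical name, and it assumes that when two phrases start at the same word the longer one should win, which the question does not specify. Phrases may have any number of words.

```python
def label_sentence(sentence, phrases):
    """Label words 0/1, merging each matched phrase into a single row.

    At every position the longest matching phrase is taken (an assumption
    about how overlapping phrases should be resolved).
    """
    words = sentence.split()
    # Try longer phrases first so they take priority over their sub-phrases.
    candidates = sorted((p.split() for p in phrases), key=len, reverse=True)
    rows = []
    i = 0
    while i < len(words):
        for cand in candidates:
            # Skip empty phrases; compare a slice of the sentence to the phrase.
            if cand and words[i:i + len(cand)] == cand:
                rows.append((" ".join(cand), 1))
                i += len(cand)
                break
        else:
            # No phrase starts here: emit the single word with label 0.
            rows.append((words[i], 0))
            i += 1
    return rows

dict1 = {
    'it is lovely weather and it is kind of warm': ['lovely weather', 'it is kind of warm'],
    'and the weather is rainy and cold': ['rainy and cold'],
    'the temperature is ok': ['temperature'],
}
for sentence, phrases in dict1.items():
    print("*")
    for text, label in label_sentence(sentence, phrases):
        print(text, label)
```

With the dictionary from the question this prints one row per unmatched word and one row per matched phrase, separated by `*` lines as in the desired output.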
