Python: iterating over a list of strings and grouping partially matching strings

Published 2024-04-28 16:37:55


I have a list of strings like this:

list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]

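How can I iterate over the list and group the strings that partially match, without being given a keyword in advance? The result should look like this: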

list 1 = [["I love cat","I love dog","I love fish"],["I hate banana","I hate apple","I hate orange"]]

Thank you very much.


Tags: string, apple, list, keyword, cat, banana, dog
3 answers

Try building an inverted index; then you can pick whichever keywords you like. This approach ignores word order:

index = {}
for sentence in sentence_list:
    for word in set(sentence.split()):
        index.setdefault(word, set()).add(sentence)

Or this approach, which keys the index by every possible whole-word phrase prefix:

index = {}
for sentence in sentence_list:
    number_of_words = len(sentence.split())
    for i in range(1, number_of_words):
        key_phrase = sentence.rsplit(maxsplit=i)[0]
        index.setdefault(key_phrase, set()).add(sentence)

If you want to find all sentences that contain a keyword (or that start with a phrase, if that is how you keyed the index):

match_sentences = index[key_term]

Or for a given set of keywords:

from functools import reduce
matching_sentences = reduce(lambda x, y: x & index[y], list_of_keywords[1:], index[list_of_keywords[0]])
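
Putting the word index and the multi-keyword lookup together, here is a minimal self-contained sketch. The sample sentences come from the question; the helper name `sentences_with_all` and the keyword lists are made up for illustration, and using `index.get` to tolerate unknown words is my own assumption rather than part of the answer.

```python
from functools import reduce

sentence_list = ["I love cat", "I love dog", "I love fish",
                 "I hate banana", "I hate apple", "I hate orange"]

# Inverted index: each word maps to the set of sentences containing it.
index = {}
for sentence in sentence_list:
    for word in set(sentence.split()):
        index.setdefault(word, set()).add(sentence)


def sentences_with_all(keywords):
    """Return the sentences that contain every keyword (hypothetical helper)."""
    # index.get avoids a KeyError for words that never occur in any sentence.
    sets = [index.get(word, set()) for word in keywords]
    if not sets:
        return set()
    # Intersect the per-keyword sets one after another.
    return reduce(lambda x, y: x & y, sets)


print(sorted(sentences_with_all(["I", "love"])))
# ['I love cat', 'I love dog', 'I love fish']
```

Because the lookup is a set intersection, the order of the keywords does not matter, and asking for two words that never share a sentence simply returns an empty set.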

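Now you can build a list comprehension that pulls sentences out of these indexes, producing a list grouped by almost any combination of terms or phrases. For example, if you built the phrase-prefix index and want everything grouped by its first two words: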

groups = [list(index[k]) for k in index if len(k.split()) == 2]
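
For the data in the question, the prefix-index idea can be sketched end to end as follows. This is only one way to write it: it stores lists instead of sets so that the sentences keep their input order (an assumption on my part), and it relies on dictionaries preserving insertion order, which holds in Python 3.7 and later.

```python
sentence_list = ["I love cat", "I love dog", "I love fish",
                 "I hate banana", "I hate apple", "I hate orange"]

index = {}
for sentence in sentence_list:
    words = sentence.split()
    # Register every proper whole-word prefix: "I", "I love", "I hate", ...
    for n in range(1, len(words)):
        index.setdefault(" ".join(words[:n]), []).append(sentence)

# Keep only the groups that are keyed by a two-word prefix.
groups = [index[k] for k in index if len(k.split()) == 2]
print(groups)
```

With the sample list this yields one group for the "I love" sentences followed by one for the "I hate" sentences, which is the shape asked for in the question.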

SequenceMatcher will do the job for you. Adjust the score threshold to get better results.

Try this:

from difflib import SequenceMatcher

sentence_list = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]
result = []
for sentence in sentence_list:
    for group in result:
        # Compare against the first sentence of each existing group.
        score = SequenceMatcher(None, sentence, group[0]).ratio()
        if score >= 0.5:
            # Similar enough: join this group and stop looking.
            group.append(sentence)
            break
    else:
        # No existing group was similar enough, so start a new one.
        result.append([sentence])

print(result)

Output:

[['I love cat', 'I love dog', 'I love fish'], ['I hate banana', 'I hate apple', 'I hate orange']]
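
To choose a sensible threshold, it can help to look at the raw ratios first. The sketch below compares every sentence with "I love cat" (an arbitrary reference picked for illustration) and prints the scores; the variable names are my own.

```python
from difflib import SequenceMatcher

sentence_list = ["I love cat", "I love dog", "I love fish",
                 "I hate banana", "I hate apple", "I hate orange"]

reference = sentence_list[0]

# Similarity of each remaining sentence to the reference, in the range 0..1.
ratios = {
    sentence: SequenceMatcher(None, sentence, reference).ratio()
    for sentence in sentence_list[1:]
}

for sentence, ratio in ratios.items():
    print(f"{ratio:.2f}  {sentence}")
```

On this data the "I love" sentences score above 0.5 and the "I hate" sentences below it, which is why a cutoff of 0.5 separates the two groups. Other data will likely need a different cutoff.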

Avoid names such as list for variables, since that shadows the built-in type. Also, list 1 is not a valid Python variable name.

Try this:

from itertools import groupby

# Assuming you group by the first two words in each string, e.g. 'I love', 'I hate'.

L = ["I love cat", "I love dog", "I love fish", "I hate banana", "I hate apple", "I hate orange"]

# groupby only merges adjacent items, so sort first.
L = sorted(L)

result = []

for key, group in groupby(L, lambda x: ' '.join(x.split(' ')[:2])):
    result.append(list(group))

print(result)
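
Note that sorting changes the order of both the groups and their members. If the original order matters, a plain dictionary keyed by the first two words is an alternative. This is a sketch under the same "group by the first two words" assumption, not something the answer itself proposes, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
sentences = ["I love cat", "I love dog", "I love fish",
             "I hate banana", "I hate apple", "I hate orange"]

groups = {}
for sentence in sentences:
    # Key each sentence by its first two words, e.g. "I love".
    key = " ".join(sentence.split()[:2])
    groups.setdefault(key, []).append(sentence)

result = list(groups.values())
print(result)
# [['I love cat', 'I love dog', 'I love fish'], ['I hate banana', 'I hate apple', 'I hate orange']]
```

No sorting is needed here, because the dictionary collects matching sentences wherever they appear in the input.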
