使用准则删除列表中的重复项

2024-04-25 20:07:02 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个由两个元素组成的集合,第一个元素仍然是单词,第二个元素是单词来源的文件,现在如果单词相同,我需要将文件名附加到单词上 E、 G.input([['word1', 'F1.txt'], ['word1', 'F2.txt'], ['word2', 'F1.txt'], ['word2', 'F2.txt'], ['word3', 'F1.txt'], ['word3', 'F2.txt'], ['word4', 'F2.txt']]) 应该输出[['word1', 'F1.txt', 'F2.txt'], ['word2', 'F1.txt', 'F2.txt'], ['word3', 'F1.txt', 'F2.txt'], ['word4', 'F2.txt']] 你能告诉我怎么做吗?你知道吗


Tags: 文件txt元素input文件名来源单词f2
3条回答

另外,如果您不想使用defaultdict,可以按以下步骤操作:

inner=[[]]
count = 0
def loockup(data,i, count):
    for j in range(i+1, len(data)):
        if data[i][0] == data[j][0] and data[j][1] not in inner[count]:
            inner[count].append(data[j][1])
    return inner

for i in range(len(data)):
    if data[i][0] in inner[count]:
        inner=loockup(data,i,count)
    else:
        if i!=0:
            count +=1
            inner.append([])
        inner[count].append(data[i][0])
        inner[count].append(data[i][1])
        loockup(data,i, count)
print (inner)

使用一组可见项保持插入顺序:

from collections import defaultdict

def remove_dups_pairs_ordered(lst):
    d = defaultdict(list)

    # stores word,file pairs we already seen
    seen = set()
    for item in lst:
        word, file = item
        key = (word, file)

        # skip adding word,file we already seen before
        if key in seen:
            continue
        seen.add(key)
        d[word].append(file)

    # convert the dict word -> [f1, f2..] into 
    # a list of lists [[word1, f1,f2, ...], [word2, f1, f2...], ...]
    return [[word] + files for word, files in d.items()]

print(remove_dups_pairs_ordered(lst))

输出:

[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]


不使用defaultdict&set保留订单:

from collections import defaultdict

def remove_dups_pairs(lst):
    d = defaultdict(set)

    for item in lst:
        d[item[0]].add(item[1])
    return [[word] + list(files) for word, files in d.items()]

lst = [
    ["fire","elem.txt"], ["fire","things.txt"],
    ["water","elem.txt"], ["water","elem.txt"],
    ["water","nature.txt"]
]

print(remove_dups_pairs(lst))

输出:

   [['fire', 'things.txt', 'elem.txt'], ['water', 'nature.txt', 'elem.txt']]

可以使用setdefaultdict

from collections import defaultdict


def remove_dups_pairs(lst):
    s = set(map(tuple, lst))
    d = defaultdict(list)
    for word, file in s:
        d[word].append(file)
    return [[key] + values for key, values in d.items()]


print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"], ["water", "elem.txt"], ["water", "elem.txt"], ["water", "nature.txt"]]))

输出

[['fire', 'elem.txt', 'things.txt'], ['water', 'elem.txt', 'nature.txt']]

正如@ShmulikA提到的,set不保留顺序,如果需要保留顺序,可以这样做:

def remove_dups_pairs(lst):
    d = defaultdict(list)
    seen = set()
    for word, file in lst:
        if (word, file) not in seen:
            d[word].append(file)
            seen.add((word, file))

    return [[key] + values for key, values in d.items()]


print(remove_dups_pairs([["fire", "elem.txt"], ["fire", "things.txt"], ["water", "elem.txt"], ["water", "elem.txt"],
                         ["water", "nature.txt"]]))

输出

[['water', 'elem.txt', 'nature.txt'], ['fire', 'elem.txt', 'things.txt']]

相关问题 更多 >