How to find duplicate entries and merge them into a single entry

Posted 2024-06-10 02:58:54

My script looks like this:

import csv
with open('lees.csv','rU') as naver:

    reader = csv.DictReader (naver)
    for alist in reader:
        name = alist["naam"]
        polisnumber = alist["polisnr"]
        riskadr = alist["risico adr"]
        insurencecode = alist["branchecode"]
        relationnumber = alist["rel"]
        header = alist["aanhef"]
        tav = alist["tav"]
        thelist = [name,riskadr,polisnumber,
                  relationnumber,insurencecode,header,tav]

The output of the script is:

['Cautus  B.V.', 'plein 92', '1129008', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['Cautus  B.V.', 'Wei 9-11', '1019123', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['Cautus  B.V.', 'plein 92', '1129008', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['Cautus  B.V.', 'Wei 9-11', '1019123', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa']
['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch']
['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch']
['Slever ', 'klopt 42', 'KD2220115', '17', 'AVB', 'Geachte heer Slever', 'De heer T.Slever']

As you can see, I build a list from each row of the .csv file.

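My problem is that I need a script that finds the entries with a duplicate riskadr (Wei 9-11 / plein 92 / tiellaan 42) and adds the insurencecode (AVB / BEDR / DAS) of the second duplicate to the first one, next to the other items, in a new list.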

So right now we have two entries with the same risk address, like this:

['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch']
['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch']

But I want a script that turns the two entries into one and adds the insurance type to the first one, like this (AVB/DAS):

['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB','DAS', 'Geachte heer Tigch', 'De heer I. Tigch']

3 Answers
>>> a = [
... ('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'),
... ('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'),
... ]
>>> 
>>> set(a)
set([('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch')])
>>>

Keep them as tuples instead of lists and add them to a set, if that is what you need.
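Note that a set only collapses rows that are identical in every field. A small sketch (written for Python 3, with row values taken from the question) suggests that two rows differing only in the insurance code would both survive, so a set alone does not merge the codes:

```python
rows = [
    ('De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch'),
    ('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'),
    ('De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'),
]

# Only the exact duplicate (the second DAS row) is dropped
unique_rows = set(rows)

# The AVB row and the DAS row are still separate entries
codes = sorted(row[4] for row in unique_rows)
print(len(unique_rows), codes)
```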

You should be able to achieve your goal with itertools.groupby:

from itertools import groupby

# define input
l = [['Cautus  B.V.', 'plein 92', '1129008', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
     ['Cautus  B.V.', 'Wei 9-11', '1019123', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
     ['Cautus  B.V.', 'plein 92', '1129008', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
     ['Cautus  B.V.', 'Wei 9-11', '1019123', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
     ['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch'],
     ['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'],
     ['Slever ', 'klopt 42', 'KD2220115', '17', 'AVB', 'Geachte heer Slever', 'De heer T.Slever']]

# remove clutter
l_clean = [(x[1], x[4]) for x in l]

# sort (groupby requires input to be sorted)
l_sorted = sorted(l_clean)

# group by first column
l_final = [(k, zip(*v)[1]) for k,v in groupby(l_sorted, key=lambda x:x[0])]

# print output
for k,v in l_final: 
    print k, list(v)

The output is:

Wei 9-11 ['AVB', 'BEDR']
klopt 42 ['AVB']
plein 92 ['AVB', 'BEDR']
tiellaan 42 ['AVB', 'DAS']

Note that you will need to adjust the key function used for sorting and grouping so that it works on input shaped differently from l_clean.
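The snippet above is Python 2 (print statement, indexing the result of zip). As a rough sketch of the same groupby idea in Python 3 that keeps the whole row and produces the merged shape asked for in the question, something like the following might work. The helper name merge_by_address is hypothetical, and the column positions (address at index 1, insurance code at index 4) are assumed from the question's output:

```python
from itertools import groupby
from operator import itemgetter

rows = [
    ['Cautus  B.V.', 'plein 92', '1129008', '10', 'AVB', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
    ['Cautus  B.V.', 'plein 92', '1129008', '10', 'BEDR', 'Geachte mevrouw Daa', 'Mevrouw C.P. Daa'],
    ['De company', 'tiellaan 42', 'KD0022232', '13', 'AVB', 'Geachte heer Tigch', 'De heer I. Tigch'],
    ['De company', 'tiellaan 42', 'KD0022232', '13', 'DAS', 'Geachte heer Tigch', 'De heer I. Tigch'],
    ['Slever ', 'klopt 42', 'KD2220115', '17', 'AVB', 'Geachte heer Slever', 'De heer T.Slever'],
]

def merge_by_address(rows):
    """Merge rows sharing a risk address (index 1), collecting their codes (index 4)."""
    by_address = itemgetter(1)
    merged = []
    # groupby only groups adjacent items, so sort by the same key first;
    # sorted() is stable, so codes keep their original order within a group
    for _, group in groupby(sorted(rows, key=by_address), key=by_address):
        group = list(group)
        first = group[0]
        codes = [row[4] for row in group]
        # first four fields, then all codes, then the two trailing fields
        merged.append(first[:4] + codes + first[5:])
    return merged

merged = merge_by_address(rows)
for row in merged:
    print(row)
```

Because the rows are sorted by address, the result comes out in address order rather than in the order of the input file.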

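You probably need something like this. Keep an in-memory array (the final list) in which you check whether a similar list already exists. If one is found, add the insurance code to it.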

def search(item, array):
    for i in range(len(array)):
        # if first four elements and last two elements are identical
        if array[i][:4] == item[0:4] and array[i][-2:] == item[-2:]:
            return i
    return -1

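# ultimatelist must be created once, before the csv loop: ultimatelist = []
index = search(thelist, ultimatelist)
if index >= 0:
    # similar entry found: put the new code after the codes already stored
    ultimatelist[index].insert(-2, thelist[4])
else:
    # first time this entry is seen: keep it
    ultimatelist.append(thelist)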
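The same idea (remember what has been seen and add the code when an entry repeats) can also be sketched with a dict keyed on the risk address instead of a linear search, which avoids rescanning the list for every row. This is a swapped-in technique, not the code from the answer. It is written for Python 3.7+, where dicts keep insertion order. The column names come from the question; the sample CSV text, its column order, and the helper name merge_rows are assumptions made for illustration:

```python
import csv
import io

# Stand-in for lees.csv; the real file layout is not shown in the question
SAMPLE = """naam,polisnr,risico adr,branchecode,rel,aanhef,tav
De company,KD0022232,tiellaan 42,AVB,13,Geachte heer Tigch,De heer I. Tigch
De company,KD0022232,tiellaan 42,DAS,13,Geachte heer Tigch,De heer I. Tigch
Slever ,KD2220115,klopt 42,AVB,17,Geachte heer Slever,De heer T.Slever
"""

def merge_rows(csv_file):
    """Return one list per risk address, with all insurance codes collected."""
    merged = {}  # risk address -> entry, kept in first-seen order
    for row in csv.DictReader(csv_file):
        address = row["risico adr"]
        entry = merged.get(address)
        if entry is None:
            # first row for this address: remember all of its fields
            merged[address] = {
                "head": [row["naam"], address, row["polisnr"], row["rel"]],
                "codes": [row["branchecode"]],
                "tail": [row["aanhef"], row["tav"]],
            }
        elif row["branchecode"] not in entry["codes"]:
            # repeated address: only the new insurance code is added
            entry["codes"].append(row["branchecode"])
    return [e["head"] + e["codes"] + e["tail"] for e in merged.values()]

result = merge_rows(io.StringIO(SAMPLE))
for entry in result:
    print(entry)
```

With a real file, the io.StringIO object would be replaced by the open file handle. Keying on the address alone assumes that rows with the same address always belong to the same policy holder; if that does not hold, a tuple of several fields could serve as the key.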
