Merging a list of objects into a consolidated list based on a shared matching attribute in Python

Posted 2024-06-16 10:31:11


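I have a list of objects that I want to "compress" into a smaller list of objects, based on a matching attribute (id) and optional class parameters.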

class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id  # the constructor parameter is named id, not case_id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]

How can I get a new list like the following? When both forms exist for an id, it should keep formtype="B" rather than formtype="A".

[The expected-output listing is missing from the source.]

I tried compressing it with a dict, but it did not work:

# Runs, but only copies the A forms; it never merges in the B forms.
compressed = [Case(id=case.case_id, formtype=None, age=case.age) for case in caselist if case.form_type == 'A']


3 answers

Sometimes the trivial, explicit approach is also the best one. I would simply go with the following:

compressed_cases_dict = {}
for case in caselist:
    if case.case_id not in compressed_cases_dict:
        compressed_cases_dict[case.case_id] = case
    else:
        if case.form_type == 'B':
            compressed_cases_dict[case.case_id].form_type = 'B'
            compressed_cases_dict[case.case_id].cough = case.cough
        else:
            compressed_cases_dict[case.case_id].fever = case.fever
            compressed_cases_dict[case.case_id].gender = case.gender

# if we really want just a list (values() alone returns a view in Python 3)
cases = list(compressed_cases_dict.values())

With the input above it produces the following output (after defining a __str__ method for the Case class):

[The output listing is missing from the source.]

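Note that for id 75321 the cough parameter is None rather than 0. I think that is better, because you have no information about cough for that id. Likewise, for id 12345 the correct cough value is 0, not 1; I assume that is a mistake in your example output.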
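The difference between None ("unknown") and 0 ("recorded as absent") matters as soon as one form is used to fill gaps in another. The following is a minimal sketch of that point; the helper name fill_missing is hypothetical and does not appear in the question or the answers.

```python
def fill_missing(primary, fallback):
    """Return primary unless it is None (unknown); then use fallback."""
    return fallback if primary is None else primary


# Form B recorded cough=0; form A has no cough information at all.
cough_b, cough_a = 0, None

# A truthiness test treats 0 like a missing value and discards it.
wrong = cough_b or cough_a

# An explicit None check keeps the recorded 0.
right = fill_missing(cough_b, cough_a)

print(wrong, right)
```

Here wrong ends up as None, because 0 is falsy, while right keeps the recorded 0.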

It also iterates over the original caselist only once and uses a dictionary for O(1) lookups by id.
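As a self-contained illustration of the single-pass dictionary approach, here is a sketch that can be run as is. The function name compress_cases and the __repr__ format are assumptions made for this example only; they are not the display format the answer used.

```python
class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __repr__(self):
        # Hypothetical display format, chosen only for this illustration.
        return "Case({}, {}, age={}, fever={}, cough={}, gender={})".format(
            self.case_id, self.form_type, self.age,
            self.fever, self.cough, self.gender)


def compress_cases(cases):
    """Merge cases that share a case_id in one pass; form B wins over form A."""
    merged = {}
    for case in cases:
        kept = merged.get(case.case_id)
        if kept is None:
            # First time this id is seen: keep the object as is.
            merged[case.case_id] = case
        elif case.form_type == "B":
            # Form B carries the cough field and decides the form type.
            kept.form_type = "B"
            kept.cough = case.cough
        else:
            # Form A carries the fever and gender fields.
            kept.fever = case.fever
            kept.gender = case.gender
    return list(merged.values())


caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male"),
]

result = compress_cases(caselist)
for case in result:
    print(case)
# Case(12345, B, age=12, fever=1, cough=0, gender=female)
# Case(67890, B, age=34, fever=0, cough=1, gender=male)
# Case(75321, A, age=2, fever=0, cough=None, gender=male)
```

Note that this sketch, like the answer's loop, modifies the first object seen for each id in place rather than creating new Case objects.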

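This is quite a bit longer than what you were aiming for, but it works. It builds separate lists of A forms and B forms, then loops over the B forms and looks for a matching A form. When it finds a match, it copies the A values into the B form for every field the B form has not set.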

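def merge(acases, bcases):
    # Work on a copy so the caller's list of A forms is not mutated.
    remaining = list(acases)
    newlist = []
    for b in bcases:
        for a in remaining[:]:
            if b.case_id == a.case_id:
                # Compare against None so that a recorded 0 is not overwritten.
                if b.cough is None:
                    b.cough = a.cough
                if b.fever is None:
                    b.fever = a.fever
                if b.gender is None:
                    b.gender = a.gender
                remaining.remove(a)
        # Keep every B form, even one without a matching A form.
        newlist.append(b)
    # A forms with no matching B form are kept unchanged.
    newlist += remaining
    return newlist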


caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]

acases = [case for case in caselist if case.form_type == 'A']
bcases = [case for case in caselist if case.form_type == 'B']

caselist = merge(acases, bcases)

for i in caselist:
    print('{0} {1} {2} {3} {4} {5}'.format(i.case_id, i.form_type, i.age, i.cough, i.fever, i.gender))

12345 B 12 0 1 female
67890 B 34 1 0 male
75321 A 2 None 0 male

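Here is another approach. It is more efficient than my previous answer, but not as efficient as @LeartS's answer. Both of my answers can handle different form layouts.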

[The code listing for this approach is missing from the source.]

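Group by id. For ids that appear more than once, keep the object whose form_type is "B"; otherwise keep the object as it is. If you want to fill in attributes that are not set on the "B" object, you can iterate over the attribute names with getattr and setattr and copy over anything that B has not set. Unless you know in advance which attributes are set in A and which in B, you cannot hard-code what to copy and what to leave alone: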

class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __iter__(self):
        for ele in ["case_id", "form_type", "age",
                    "fever", "cough", "gender"]:
            yield ele


caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]

d = {}

for c in caselist:
    if c.case_id not in d:
        d[c.case_id] = c
    elif d[c.case_id].form_type != "B" and c.form_type == "B":
        tmp = d[c.case_id]
        for attr in c:
            if getattr(c, attr) is None:
                setattr(c, attr, getattr(tmp, attr))
        d[c.case_id] = c

caselist[:] = d.values()
print(caselist)
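The same getattr/setattr idea can be written without a hand-maintained list of attribute names. The sketch below is one possible variant, not the answer's code: it uses the built-in vars() to discover instance attributes, and the function name merge_preferring_b is hypothetical. Unlike the loop above, it also fills gaps when the B form appears before the A form.

```python
class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender


def merge_preferring_b(cases):
    """Group by case_id, keep the B form when present, fill its None fields."""
    merged = {}
    for case in cases:
        kept = merged.get(case.case_id)
        if kept is None:
            merged[case.case_id] = case
            continue
        # Decide which object survives: form B is preferred over form A.
        if kept.form_type != "B" and case.form_type == "B":
            winner, other = case, kept
        else:
            winner, other = kept, case
        # Copy every attribute the winner has left unset (None).
        for name, value in vars(other).items():
            if getattr(winner, name, None) is None:
                setattr(winner, name, value)
        merged[case.case_id] = winner
    return list(merged.values())


# Usage: the B form comes first here, and still wins after merging.
sample = [
    Case(id="1", formtype="B", age=5, cough=0),
    Case(id="1", formtype="A", age=5, fever=1, gender="female"),
    Case(id="2", formtype="A", age=3, fever=0),
]
merged_sample = merge_preferring_b(sample)
```

Because the check is "is None" rather than a truthiness test, a recorded value of 0 on the B form is never overwritten by the A form.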
