How best to represent objects with nested sub-objects?

Posted 2024-04-20 15:02:45


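I have a somewhat complicated dataset. It is a collection of objects. Each object contains a collection of sub-objects, and the number of sub-objects varies from object to object. Every sub-object follows a fixed schema.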

I have not included a real sample, because it is very verbose and tedious to parse by eye.

Instead, I wrote a script that reproduces the essence of the structure:

from random import randint

# fixed (but high) number of subobject types
random_subobject_type = lambda: randint(0,100)
make_subobject = lambda: dict(
    type=random_subobject_type(),
    feature1=randint(0,5),
    feature2=randint(0,5),
    feature3=randint(0,5)
)

# number of subobjects per object varies and can be 0
random_number_of_subobjects_in_object = lambda: randint(0,5)
make_set_of_subobjects = lambda: \
    [make_subobject() for _ in range(random_number_of_subobjects_in_object())]

make_object = lambda: dict(
    some_feature=randint(0,10),
    some_other_feature=randint(0,10),
    subobjs=make_set_of_subobjects())

number_of_objects_in_dataset = 4
[make_object() for _ in range(number_of_objects_in_dataset)]

Here is some sample output. The script uses randomness to show how the contents vary, so every run produces a different result:

[{'some_feature': 10,
  'some_other_feature': 5,
  'subobjs': [{'type': 95, 'feature1': 1, 'feature2': 2, 'feature3': 5},
   {'type': 85, 'feature1': 3, 'feature2': 3, 'feature3': 0},
   {'type': 46, 'feature1': 0, 'feature2': 0, 'feature3': 4},
   {'type': 58, 'feature1': 4, 'feature2': 4, 'feature3': 4},
   {'type': 51, 'feature1': 1, 'feature2': 1, 'feature3': 0}]},
 {'some_feature': 0,
  'some_other_feature': 7,
  'subobjs': [{'type': 8, 'feature1': 0, 'feature2': 0, 'feature3': 3},
   {'type': 68, 'feature1': 1, 'feature2': 0, 'feature3': 2},
   {'type': 98, 'feature1': 3, 'feature2': 4, 'feature3': 5},
   {'type': 33, 'feature1': 3, 'feature2': 1, 'feature3': 1},
   {'type': 7, 'feature1': 0, 'feature2': 2, 'feature3': 0}]},
 {'some_feature': 7,
  'some_other_feature': 8,
  'subobjs': [{'type': 51, 'feature1': 3, 'feature2': 2, 'feature3': 5},
   {'type': 34, 'feature1': 3, 'feature2': 5, 'feature3': 4}]},
 {'some_feature': 4,
  'some_other_feature': 10,
  'subobjs': [{'type': 41, 'feature1': 4, 'feature2': 2, 'feature3': 0},
   {'type': 61, 'feature1': 1, 'feature2': 1, 'feature3': 1}]}]

All arrays represent mathematical sets; i.e., the order of their elements has no meaning.

I want to manipulate and analyze this dataset with Pandas, but I don't know how to handle this structure.

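One obvious solution is a sparse DataFrame with many columns: one column for each key of each unique sub-object type. However, that looks messy, and it could become inefficient if the number of columns grows very large. Is there a better way?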
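
To make the trade-off concrete, here is a minimal sketch that builds both layouts from two of the objects in the sample output: the wide table described above, and, as one possible alternative, a pair of normalized tables linked by an object id. The names `to_wide_row`, `object_id`, `objects`, and `subobjs_long` are hypothetical and not part of the original script, and the wide version assumes each type appears at most once per object.

```python
import pandas as pd

# Two objects copied from the sample output above.
data = [
    {"some_feature": 7, "some_other_feature": 8,
     "subobjs": [{"type": 51, "feature1": 3, "feature2": 2, "feature3": 5},
                 {"type": 34, "feature1": 3, "feature2": 5, "feature3": 4}]},
    {"some_feature": 4, "some_other_feature": 10,
     "subobjs": [{"type": 41, "feature1": 4, "feature2": 2, "feature3": 0},
                 {"type": 61, "feature1": 1, "feature2": 1, "feature3": 1}]},
]


def to_wide_row(obj):
    """Flatten one object into a single row (hypothetical helper).

    Assumption: a type occurs at most once per object; a repeated type
    would silently overwrite the earlier subobject's columns.
    """
    row = {k: v for k, v in obj.items() if k != "subobjs"}
    for sub in obj["subobjs"]:
        for key, value in sub.items():
            if key != "type":
                row[f"type{sub['type']}_{key}"] = value
    return row


# Wide layout: one row per object, one column per (type, feature) pair.
# Cells for types an object does not contain are NaN.
wide = pd.DataFrame([to_wide_row(obj) for obj in data])

# Normalized layout, table 1: one row per object, indexed by object_id.
objects = pd.DataFrame(
    [{k: v for k, v in obj.items() if k != "subobjs"} for obj in data]
).rename_axis("object_id")

# Normalized layout, table 2: one row per subobject, tagged with the
# id of the object it belongs to. Row order carries no meaning.
subobjs_long = pd.DataFrame(
    [{"object_id": i, **sub}
     for i, obj in enumerate(data)
     for sub in obj["subobjs"]]
)

# Example query on the long table: mean feature1 per object.
mean_f1 = subobjs_long.groupby("object_id")["feature1"].mean()
```

With the script's 101 possible types and 3 features each, the wide table could need up to 303 feature columns, most of them empty in any given row, while the long table keeps the same five columns however many types exist. Objects with zero sub-objects simply contribute no rows to `subobjs_long` but still appear in `objects`. Whether this layout is actually better depends on the queries you need to run.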

