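I have a somewhat complex dataset. It is a list of objects. Each object contains a list of subobjects, and the length of that list varies from object to object. Every subobject follows a fixed schema.
I have not included a real sample because it is very verbose and tedious to parse by eye.
Instead, I wrote a script that reproduces the essence of the structure: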
from random import randint

# fixed (but high) number of subobject types
random_subobject_type = lambda: randint(0, 100)

# every subobject has the same keys: a type plus three features
make_subobject = lambda: dict(
    type=random_subobject_type(),
    feature1=randint(0, 5),
    feature2=randint(0, 5),
    feature3=randint(0, 5)
)

# number of subobjects per object varies and can be 0
random_number_of_subobjects_in_object = lambda: randint(0, 5)
make_set_of_subobjects = lambda: \
    [make_subobject() for _ in range(random_number_of_subobjects_in_object())]

# each object has two scalar features plus its list of subobjects
make_object = lambda: dict(
    some_feature=randint(0, 10),
    some_other_feature=randint(0, 10),
    subobjs=make_set_of_subobjects())

number_of_objects_in_dataset = 4
[make_object() for _ in range(number_of_objects_in_dataset)]
Below is sample output. The script uses randomness to show how the content varies, so every run is different:
[{'some_feature': 10,
  'some_other_feature': 5,
  'subobjs': [{'type': 95, 'feature1': 1, 'feature2': 2, 'feature3': 5},
              {'type': 85, 'feature1': 3, 'feature2': 3, 'feature3': 0},
              {'type': 46, 'feature1': 0, 'feature2': 0, 'feature3': 4},
              {'type': 58, 'feature1': 4, 'feature2': 4, 'feature3': 4},
              {'type': 51, 'feature1': 1, 'feature2': 1, 'feature3': 0}]},
 {'some_feature': 0,
  'some_other_feature': 7,
  'subobjs': [{'type': 8, 'feature1': 0, 'feature2': 0, 'feature3': 3},
              {'type': 68, 'feature1': 1, 'feature2': 0, 'feature3': 2},
              {'type': 98, 'feature1': 3, 'feature2': 4, 'feature3': 5},
              {'type': 33, 'feature1': 3, 'feature2': 1, 'feature3': 1},
              {'type': 7, 'feature1': 0, 'feature2': 2, 'feature3': 0}]},
 {'some_feature': 7,
  'some_other_feature': 8,
  'subobjs': [{'type': 51, 'feature1': 3, 'feature2': 2, 'feature3': 5},
              {'type': 34, 'feature1': 3, 'feature2': 5, 'feature3': 4}]},
 {'some_feature': 4,
  'some_other_feature': 10,
  'subobjs': [{'type': 41, 'feature1': 4, 'feature2': 2, 'feature3': 0},
              {'type': 61, 'feature1': 1, 'feature2': 1, 'feature3': 1}]}]
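All arrays represent mathematical sets; i.e., order carries no meaning.
I would like to manipulate and analyze this dataset with Pandas, but I am not sure how to handle this structure.
One obvious solution is a sparse DataFrame with many columns: one column for each key of each unique subobject type. However, that looks messy and could become inefficient if the number of columns grows very large. Is there a better way?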
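For comparison with the wide, sparse layout described above, here is a minimal sketch of one possible alternative: a "long" layout with two tables, one row per object and one row per subobject. This is only an illustration under assumptions, not an established answer. The `object_id` column and the small hand-written `dataset` are hypothetical and are not part of the original script.

```python
import pandas as pd

# Hand-written sample in the same shape as the generated data (hypothetical values).
dataset = [
    {"some_feature": 7, "some_other_feature": 8,
     "subobjs": [
         {"type": 51, "feature1": 3, "feature2": 2, "feature3": 5},
         {"type": 34, "feature1": 3, "feature2": 5, "feature3": 4},
     ]},
    {"some_feature": 4, "some_other_feature": 10, "subobjs": []},
]

# One row per object; the list position serves as an assumed identifier.
objects = pd.DataFrame(
    [{"object_id": i, **{k: v for k, v in obj.items() if k != "subobjs"}}
     for i, obj in enumerate(dataset)]
).set_index("object_id")

# One row per subobject, linked back to its parent through object_id.
# Passing columns explicitly keeps the schema even if no subobjects exist.
subobjects = pd.DataFrame(
    [{"object_id": i, **sub}
     for i, obj in enumerate(dataset)
     for sub in obj["subobjs"]],
    columns=["object_id", "type", "feature1", "feature2", "feature3"],
)

# Example analysis: subobjects per object, including objects that have none.
counts = (
    subobjects.groupby("object_id").size()
    .reindex(objects.index, fill_value=0)
)
print(counts)
```

In this layout the column count stays fixed no matter how many subobject types exist, and row order within `subobjects` carries no meaning, which matches the set semantics. Object-level features can be joined back onto subobject rows when needed, for example with `subobjects.join(objects, on="object_id")`.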