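Lately I've had to do a lot of this (assume the input is CSV; it's shown as a table because that's easier to read): converting CSV files to JSON with different schemas.
I'd never call myself a "Python programmer", but I work with data a lot, and it's my go-to language for data wrangling.
I'd like some help generalizing this. I see it as a good learning exercise, so there's no need to "write the code for me"; a gentle nudge in the right direction is enough. If you've read this far... thank you.
What would be an appropriate structure for the metadata input? Any suggestions or ideas?
Below are a sample input, the expected result, the current state, and the target state.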
Toplvl TopLvlAttr1 TopLvlAttr2 SecondLvl SecondLvlAttr1 ThirdLvl ThirdLLvlAttr1
T1 TA1_1 TA2_1 S1 SA1_1 TR1 TRA_1
T1 TA1_1 TA2_1 S1 SA1_1 TR2 TRA_2
T1 TA1_1 TA2_1 S1 SA1_1 TR3 TRA_3
T1 TA1_1 TA2_1 S2 SA1_2 TR4 TRA_1
T1 TA1_1 TA2_1 S2 SA1_2 TR5 TRA_2
T1 TA1_1 TA2_1 S2 SA1_2 TR6 TRA_3
T2 TA2_1 TA2_2 S3 SA1_3 TR7 TRA_1
T2 TA2_1 TA2_2 S3 SA1_3 TR8 TRA_2
T2 TA2_1 TA2_2 S3 SA1_3 TR9 TRA_3
The expected JSON looks like this:
{
    "T1": {
        "TopLevelAttribute1": "TA1_1",
        "TopLevelAttribute2": "TA1_2",
        "SecondLevels": {
            "S1": {
                "SecondLevelAttribute1": "SA1_1",
                "ThirdLevels": {
                    "TR1": {
                        "ThirdLevelAttribute1": "TRA_1"
                    },
                    "TR2": {
                        "ThirdLevelAttribute1": "TRA_2"
                    },
                    "TR3": {
                        "ThirdLevelAttribute1": "TRA_3"
                    }
                }
            },
            "S2": {
                "SecondLevelAttribute1": "SA2_1",
                "ThirdLevels": {
                    "TR4": {
                        "ThirdLevelAttribute1": "TRA_5"
                    },
                    "TR5": {
                        "ThirdLevelAttribute1": "TRA_5"
                    },
                    "TR6": {
                        "ThirdLevelAttribute1": "TRA_6"
                    }
                }
            }
        }
    },
    "T2": {
        "TopLevelAttribute1": "TA2_1",
        "TopLevelAttribute2": "TA2_2",
        "SecondLevels": {
            "S3": {
                "SecondLevelAttribute1": "SA1_3",
                "ThirdLevels": {
                    "TR7": {
                        "ThirdLevelAttribute1": "TRA_7"
                    },
                    "TR8": {
                        "ThirdLevelAttribute1": "TRA_8"
                    },
                    "TR9": {
                        "ThirdLevelAttribute1": "TRA_9"
                    }
                }
            }
        }
    }
}
The pattern I usually fall back on looks like this:
import json

with open('sample_csv_input.csv', 'r') as sd:
    # Skip the header row; splitlines() also drops the trailing newlines
    data_lines = sd.read().splitlines()[1:]

sample_output = {}
for line in data_lines:
    data_attributes = line.split(',')
    toplvl, toplvl_attr1, toplvl_attr2, secondlvl, secondlvl_attr1, thirdlvl, thirdlvl_attr1 = data_attributes
    third_level_element = {
        thirdlvl: {
            "ThirdLevelAttribute1": thirdlvl_attr1
        }
    }
    if toplvl in sample_output:
        if secondlvl in sample_output[toplvl]["SecondLevels"]:
            # Known top and second level: just add the third-level element
            sample_output[toplvl]["SecondLevels"][secondlvl]["ThirdLevels"].update(third_level_element)
        else:
            # Known top level, new second level
            sample_output[toplvl]["SecondLevels"][secondlvl] = {
                "SecondLevelAttribute1": secondlvl_attr1,
                "ThirdLevels": third_level_element}
    else:
        # New top level: create the whole branch
        sample_output[toplvl] = {
            "TopLevelAttribute1": toplvl_attr1,
            "TopLevelAttribute2": toplvl_attr2,
            "SecondLevels": {
                secondlvl: {
                    "SecondLevelAttribute1": secondlvl_attr1,
                    "ThirdLevels": third_level_element
                }
            }
        }

with open('sample_json.output.json', 'w') as so:
    json.dump(sample_output, so, indent=4)
I know this is ugly.
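One small step that might tidy up the loop above without changing the approach is `dict.setdefault`, which returns the existing entry for a key or inserts a default and returns that. The sketch below is only an illustration: the function name `nest_rows` and the inline sample data are made up for this example, and the column names are taken from the header of the sample input.

```python
import csv
import io


def nest_rows(rows):
    """Group flat rows into Top -> Second -> Third levels using dict.setdefault."""
    output = {}
    for row in rows:
        # setdefault returns the existing top-level node, or inserts this new one
        top = output.setdefault(row["Toplvl"], {
            "TopLevelAttribute1": row["TopLvlAttr1"],
            "TopLevelAttribute2": row["TopLvlAttr2"],
            "SecondLevels": {},
        })
        second = top["SecondLevels"].setdefault(row["SecondLvl"], {
            "SecondLevelAttribute1": row["SecondLvlAttr1"],
            "ThirdLevels": {},
        })
        second["ThirdLevels"][row["ThirdLvl"]] = {
            "ThirdLevelAttribute1": row["ThirdLLvlAttr1"],
        }
    return output


# Tiny inline sample standing in for sample_csv_input.csv (assumed layout)
sample = io.StringIO(
    "Toplvl,TopLvlAttr1,TopLvlAttr2,SecondLvl,SecondLvlAttr1,ThirdLvl,ThirdLLvlAttr1\n"
    "T1,TA1_1,TA2_1,S1,SA1_1,TR1,TRA_1\n"
    "T1,TA1_1,TA2_1,S1,SA1_1,TR2,TRA_2\n"
    "T1,TA1_1,TA2_1,S2,SA1_2,TR4,TRA_1\n"
)
result = nest_rows(csv.DictReader(sample))
```

Using `csv.DictReader` here also takes care of the header row and the trailing newline, so the manual `split(',')` is no longer needed.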
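I tried to do something more general: a piece of metadata that defines the structure, plus a generic function that builds the output from it. But I got stuck.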
# Hierarchy Attribute Map
{
    "TopLvl": {
        "Attributes": [{"TopLevelAttribute1": "TopLvlAttr1", "TopLevelAttribute2": "TopLvlAttr2"}],
        "Children": [{"SecondLevels": "SecondLvl"}]
    },
    "SecondLvl": {
        "Attributes": [{"SecondLevelAttribute1": "SecondLvlAttr1"}],
        "Children": [{"ThirdLevels": "ThirdLvl"}]
    },
    "ThirdLvl": {
        "Attributes": [{"ThirdLevelAttribute1": "ThirdLLvlAttr1"}]
    }
}
import csv
import json


class csv2json:
    def __init__(self, input_csv_file, hierarchy_map):
        self.input_csv_file = input_csv_file
        self.hierarchy_map = hierarchy_map
        self.labeled_rows = self.read_with_labels()
        self.converted_json = self.convert2json()

    def get_children(self):
        pass

    def get_attributes(self):
        pass

    def insert_element(self):
        pass

    def read_with_labels(self):
        # DictReader yields one dict per row, keyed by the header names
        with open(self.input_csv_file, 'r', newline='') as csvfile:
            return [dict(row) for row in csv.DictReader(csvfile)]

    def convert2json(self):
        pass
Any ideas would be much appreciated.
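As one possible nudge on the metadata question: since each level has exactly one kind of child here, the structure could perhaps be an ordered list of levels (outermost first) rather than a dict keyed by column name. The sketch below assumes that layout. `HIERARCHY`, its keys (`key`, `attributes`, `children`) and `rows_to_nested` are hypothetical names invented for this example, and the column names come from the header of the sample input.

```python
import csv
import io
import json

# Hypothetical metadata layout: one entry per level, outermost first.
# "key" is the column holding the node name, "attributes" maps output
# names to columns, "children" is the label under which the next level nests.
HIERARCHY = [
    {"key": "Toplvl",
     "attributes": {"TopLevelAttribute1": "TopLvlAttr1",
                    "TopLevelAttribute2": "TopLvlAttr2"},
     "children": "SecondLevels"},
    {"key": "SecondLvl",
     "attributes": {"SecondLevelAttribute1": "SecondLvlAttr1"},
     "children": "ThirdLevels"},
    {"key": "ThirdLvl",
     "attributes": {"ThirdLevelAttribute1": "ThirdLLvlAttr1"}},
]


def rows_to_nested(rows, hierarchy):
    """Fold flat rows into nested dicts, one nesting step per hierarchy level."""
    root = {}
    for row in rows:
        container = root  # dict that holds the nodes of the current level
        for level in hierarchy:
            node = container.setdefault(row[level["key"]], {})
            # Copy this level's attributes; a later row overwrites an earlier one
            for out_name, column in level["attributes"].items():
                node[out_name] = row[column]
            if "children" in level:
                container = node.setdefault(level["children"], {})
    return root


# Inline sample standing in for the real CSV file (assumed layout)
sample = io.StringIO(
    "Toplvl,TopLvlAttr1,TopLvlAttr2,SecondLvl,SecondLvlAttr1,ThirdLvl,ThirdLLvlAttr1\n"
    "T1,TA1_1,TA2_1,S1,SA1_1,TR1,TRA_1\n"
    "T1,TA1_1,TA2_1,S1,SA1_1,TR2,TRA_2\n"
    "T2,TA2_1,TA2_2,S3,SA1_3,TR7,TRA_1\n"
)
nested = rows_to_nested(csv.DictReader(sample), HIERARCHY)
print(json.dumps(nested, indent=4))
```

With this shape, adding a fourth level or more attributes would only mean editing `HIERARCHY`; the function itself stays the same. It does assume a strictly linear hierarchy, so a level with several kinds of children would need a richer structure.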
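You can use the jinja2 package and set up a template that generates the data for one row. Run the template against each row, then merge the generated JSON using the approach from "Update value of a nested dictionary of varying depth".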
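The merge step can be sketched roughly as follows. This is only an illustration of the recursive-update idea, not the code from the linked question. To keep it free of third-party dependencies, a plain function (`row_fragment`, a hypothetical name) stands in for the jinja2 template and builds the nested fragment for a single row; `deep_update` is likewise a name made up for this example.

```python
from collections.abc import Mapping


def deep_update(target, source):
    """Recursively merge source into target, descending into nested mappings."""
    for key, value in source.items():
        if isinstance(value, Mapping) and isinstance(target.get(key), Mapping):
            # Both sides are dict-like: merge instead of replacing
            deep_update(target[key], value)
        else:
            target[key] = value
    return target


def row_fragment(values):
    """Stand-in for the per-row template: nested fragment for one row."""
    top, ta1, ta2, second, sa1, third, tra1 = values
    return {
        top: {
            "TopLevelAttribute1": ta1,
            "TopLevelAttribute2": ta2,
            "SecondLevels": {
                second: {
                    "SecondLevelAttribute1": sa1,
                    "ThirdLevels": {third: {"ThirdLevelAttribute1": tra1}},
                }
            },
        }
    }


# A few rows from the sample input, already split into fields
rows = [
    ["T1", "TA1_1", "TA2_1", "S1", "SA1_1", "TR1", "TRA_1"],
    ["T1", "TA1_1", "TA2_1", "S1", "SA1_1", "TR2", "TRA_2"],
    ["T1", "TA1_1", "TA2_1", "S2", "SA1_2", "TR4", "TRA_1"],
]

merged = {}
for values in rows:
    deep_update(merged, row_fragment(values))
```

A plain `dict.update` would not work for this step, because it replaces the whole `"SecondLevels"` branch on every row instead of adding to it.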