如何在Python中将数据存储到数据字典,标题顺序混乱时该怎么办
我想把以下数据存储在一个数据字典里,这样我就可以方便地导出到CSV文件中。问题是,每个学校的ID对应的列顺序并不总是一样的:
text = """
school id= 28392
name|year|degree|age|race
Susan A. Smith|2007|PhD|27|white
Fred Collins|2006|PhD|26|hispanic
Amber Real|2007|MBA|28|white
Mike Lee|2003|PhD|27|white
school id= 273533123
name|year|age|race|degree
John B. Black|2003|27|hispanic|MBA
Steven Smith|2005|28|black|PhD
Jacob Waters|2003|25|hispanic|MBA
school id = 3452332
name|year|race|age|degree
Peter Hintze|2002|white|27|Bachelors
Ann Graden|2004|black|25|MBA
Bryan Stewart|2004|white|28|PhD
"""
我希望最后能把所有数据输出到一个CSV文件,文件的标题应该是:
school id|year|name|age|race|degree
我可以用Python做到这一点吗?
1 个回答
6
这其实看起来很简单。先把文件处理成一种数据结构,然后再把它导出为csv格式。
school = None
headers = None
data = {}
for line in text.splitlines():
if line.startswith("school id"):
school = line.split('=')[1].strip()
headers = None
continue
if school is not None and headers is None:
headers = line.split('|')
continue
if school is not None and headers is not None and line:
if not school in data:
data[school] = []
datum = dict(zip(headers, line.split('|')))
data[school].append(datum)
In [29]: data
Out[29]:
{'273533123': [{'age': '27',
'degree': 'MBA',
'name': 'John B. Black',
'race': 'hispanic',
'year': '2003'},
{'age': '28',
'degree': 'PhD',
'name': 'Steven Smith',
'race': 'black',
'year': '2005'},
{'age': '25',
'degree': 'MBA',
'name': 'Jacob Waters',
'race': 'hispanic',
'year': '2003'}],
'28392': [{'age': '27',
'degree': 'PhD',
'name': 'Susan A. Smith',
'race': 'white',
'year': '2007'},
{'age': '26',
'degree': 'PhD',
'name': 'Fred Collins',
'race': 'hispanic',
'year': '2006'},
{'age': '28',
'degree': 'MBA',
'name': 'Amber Real',
'race': 'white',
'year': '2007'},
{'age': '27',
'degree': 'PhD',
'name': 'Mike Lee',
'race': 'white',
'year': '2003'}],
'3452332': [{'age': '27',
'degree': 'Bachelors',
'name': 'Peter Hintze',
'race': 'white',
'year': '2002'},
{'age': '25',
'degree': 'MBA',
'name': 'Ann Graden',
'race': 'black',
'year': '2004'},
{'age': '28',
'degree': 'PhD',
'name': 'Bryan Stewart',
'race': 'white',
'year': '2004'}]}