Organizing XML data into a dictionary

Question

I'm trying to organize XML data into a dictionary so that I can use it for a Monte Carlo simulation.

Here is an example of a few entries from the XML:

<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>

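My current dataset has only two variables, and the type can be one of three or four discrete types. Hard-coding two variables is fine, but I want to start working with data that has many more variables, and I'd like to automate the process. My goal is to import the XML data into a dictionary automatically, so that I can process it further later without writing out the array titles and variables by hand.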
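One way to avoid hard-coding tag names is to build each record from whatever child tags an <item> happens to have. The sketch below is an illustration, not the asker's code. It rests on two assumptions: every <item> has a <variable> child to group by, and any field whose text parses as a number should become a float. The helper names convert and group_items are hypothetical. The sketch parses the sample XML above from a string so it runs on its own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data copied from the question; in practice use ET.parse(path).getroot()
XML_TEXT = """<retirement>
    <item>
        <low>-0.34</low>
        <high>-0.32</high>
        <freq>0.0294117647058824</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
    <item>
        <low>-0.32</low>
        <high>-0.29</high>
        <freq>0</freq>
        <variable>stock</variable>
        <type>historic</type>
    </item>
</retirement>"""


def convert(text):
    """Hypothetical helper: return text as a float if it parses as one, else the stripped string."""
    text = (text or "").strip()
    try:
        return float(text)
    except ValueError:
        return text


def group_items(root, key_tag="variable"):
    """Hypothetical helper: group every <item> into lists keyed by its <key_tag> text.

    Each item becomes a plain dict built from whatever child tags it has,
    so new fields or new variables need no code changes.
    """
    grouped = defaultdict(list)
    for item in root.findall("item"):
        record = {child.tag: convert(child.text) for child in item}
        grouped[record[key_tag]].append(record)
    return dict(grouped)


data = group_items(ET.fromstring(XML_TEXT))
print(sorted(data))             # ['stock']
print(data["stock"][0]["low"])  # -0.34
```

Because the records are plain dicts, adding a new tag or a new variable to the XML needs no code change. The trade-off is that a field is read as data["stock"][0]["low"] rather than as an attribute.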

Here is my current code:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
Items = tree.findall('item')

# Create Master Dictionary
masterDictionary = {}

# Assign variables to dictionary
for Item in Items:
    thisKey = Item.find('variable').text
    if thisKey in masterDictionary == False:
        masterDictionary[thisKey] = []
    else:
        pass

    thisList = masterDictionary[thisKey]
    newDataPoint = DataPoint(float(Item.find('low').text), float(Item.find('high').text), float(Item.find('freq').text))
    thisList.append(newDataPoint)

I get a KeyError on this line: thisList = masterDictionary[thisKey]
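The KeyError comes from the if test rather than from the lookup itself. Python chains comparison operators, and in counts as one. As a result, thisKey in masterDictionary == False is read as (thisKey in masterDictionary) and (masterDictionary == False). A dict never equals False, so the test is always false and the empty list is never created. The later lookup then fails. Here is a minimal demonstration, using a hypothetical key and value:

```python
master = {}
key = "stock"

# Python chains comparison operators, so this parses as
# (key in master) and (master == False), which is never True for a dict.
buggy = key in master == False

# The intended test
intended = key not in master

print(buggy, intended)  # False True

# With the intended test the key is created before it is read,
# so the lookup that raised KeyError now succeeds.
if key not in master:
    master[key] = []
master[key].append(-0.34)
print(master)  # {'stock': [-0.34]}
```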

I'm also trying to create a class to handle the other elements in the XML:

# Define a class for each data point that contains low, high and freq attributes
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

so that I can check a value with something like:

masterDictionary['stock'][0].freq
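If attribute access like the line above is the goal, you could also write the DataPoint class with the standard-library dataclasses module. That module generates __init__, a readable repr and equality automatically. This is an alternative to the class in the question, not a required change. The from_element classmethod is a hypothetical helper that assumes each <item> has numeric <low>, <high> and <freq> children.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class DataPoint:
    """One data point: a value range from low to high and its frequency."""
    low: float
    high: float
    freq: float

    @classmethod
    def from_element(cls, item):
        # Hypothetical helper: reads the three numeric child tags of an <item>
        return cls(*(float(item.find(tag).text) for tag in ("low", "high", "freq")))


item = ET.fromstring(
    "<item><low>-0.34</low><high>-0.32</high>"
    "<freq>0.0294117647058824</freq><variable>stock</variable></item>"
)
masterDictionary = {"stock": [DataPoint.from_element(item)]}

print(masterDictionary["stock"][0].low)  # -0.34
print(masterDictionary["stock"][0])      # repr generated by @dataclass
```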

Any help is much appreciated.

Update

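Thanks for your help, John. The indentation problems were my own oversight. This is my first time posting on Stack Overflow, and I didn't copy and paste carefully. The lines after else: are actually indented inside the for loop, and the class is indented with four spaces in my code; it only went wrong when I posted. I'll keep the capitalization conventions in mind. Your suggestion does work, and with these commands: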

print masterDictionary.keys()
print masterDictionary['stock'][0].low

I get:

['inflation', 'stock']
-0.34

These are indeed my two variables, and the values match the XML listed at the top.

Update 2

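OK, I thought I had solved this, but I was careless again and it turns out I hadn't fully fixed it. The previous solution wrote all of the data to both of my dictionary keys. I ended up with two identical data lists assigned to two different keys. What I want is to assign each separate data set from the XML to its matching dictionary key. Here is the current code: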

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}
thisList = []

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text 
    masterDictionary[thisKey] = thisList
    if thisKey not in masterDictionary:
        masterDictionary[thisKey] = []
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    thisList.append(newDataPoint)

When I type:

print masterDictionary['stock'][5].low
print masterDictionary['inflation'][5].low
print len(masterDictionary['stock'])
print len(masterDictionary['inflation'])

the results are identical for both keys ('stock' and 'inflation'):

-.22
-.22
56
56

The XML file has 27 items tagged stock and 29 items tagged inflation. How do I make each list assigned to a dictionary key pick up only its own data from the loop?
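Both lengths being 56 is a clue: 27 + 29 = 56, so each key holds every item. In the Update 2 loop, masterDictionary[thisKey] = thisList binds each key to the one list created before the loop. Every append then goes into that same list. The if test that follows can never fire, because the key was assigned just above it. The sketch below reproduces this behavior with a few hypothetical (variable, low) pairs in place of the XML. It then shows the fix: test for the key first, and append to the list stored under that key.

```python
# Stand-in for (variable, low) pairs read from the XML (hypothetical values)
pairs = [("stock", -0.34), ("inflation", 0.01), ("stock", -0.32)]

# Pattern from Update 2: one list shared by every key
thisList = []
shared = {}
for thisKey, low in pairs:
    shared[thisKey] = thisList      # binds the key to the same list object each time
    if thisKey not in shared:       # never true: the key was just assigned above
        shared[thisKey] = []
    thisList.append(low)

print(shared["stock"] is shared["inflation"])          # True
print(len(shared["stock"]), len(shared["inflation"]))  # 3 3

# Fix: check first, then append to the list stored under this key
separate = {}
for thisKey, low in pairs:
    if thisKey not in separate:
        separate[thisKey] = []
    separate[thisKey].append(low)

print(separate)  # {'stock': [-0.34, -0.32], 'inflation': [0.01]}
```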

Update 3

It seems to work with two loops, but I don't understand why a single loop doesn't. I stumbled on this by accident:

# Import XML Parser
import xml.etree.ElementTree as ET

# Parse XML directly from the file path
tree = ET.parse('xmlfile')

# Create iterable item list
items = tree.findall('item')

# Create class for historic variables
class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq

# Create Master Dictionary and variable list for historic variables
masterDictionary = {}

# Loop to assign variables as dictionary keys and associate their values with them
for item in items:
    thisKey = item.find('variable').text
    thisList = []
    masterDictionary[thisKey] = thisList

for item in items:
    thisKey = item.find('variable').text
    newDataPoint = DataPoint(float(item.find('low').text), float(item.find('high').text), float(item.find('freq').text))
    masterDictionary[thisKey].append(newDataPoint)

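I've tried many combinations to make this work in a single loop, without success. I can either put all the data under both keys, giving two identical arrays (which doesn't help). Or I can sort the data correctly into two separate arrays but keep only the last entry: each iteration overwrites the previous one, leaving a single entry in each array.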
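A single loop works as long as each key gets its own list exactly once and every append goes through the dictionary. The overwrite symptom described above happens when the loop assigns a new empty list to the key on every iteration. That discards whatever was collected earlier. The sketch below uses collections.defaultdict(list) from the standard library to handle the "create if missing" step. The three-item XML string is made-up sample data.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class DataPoint:
    def __init__(self, low, high, freq):
        self.low = low
        self.high = high
        self.freq = freq


# Hypothetical sample with two variables; in practice use ET.parse(path)
XML_TEXT = """<retirement>
  <item><low>-0.34</low><high>-0.32</high><freq>0.03</freq><variable>stock</variable></item>
  <item><low>0.01</low><high>0.02</high><freq>0.10</freq><variable>inflation</variable></item>
  <item><low>-0.32</low><high>-0.29</high><freq>0</freq><variable>stock</variable></item>
</retirement>"""

items = ET.fromstring(XML_TEXT).findall("item")

# defaultdict(list) creates a fresh, separate list the first time a key is used
# and returns that same list on every later access, so nothing is overwritten.
masterDictionary = defaultdict(list)
for item in items:
    thisKey = item.find("variable").text
    newDataPoint = DataPoint(float(item.find("low").text),
                             float(item.find("high").text),
                             float(item.find("freq").text))
    masterDictionary[thisKey].append(newDataPoint)

print(len(masterDictionary["stock"]), len(masterDictionary["inflation"]))  # 2 1
print(masterDictionary["stock"][1].low)  # -0.32
```

One caveat with defaultdict: merely reading a missing key, such as masterDictionary['bonds'], silently creates an empty entry. Converting with dict(masterDictionary) after the loop restores the usual KeyError behavior.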
