Parsing an XML document with Python and BeautifulSoup

Posted 2024-05-29 04:43:33


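I have an XML document that is fairly complex, at least for me, and it contains some information I need to extract. I tried the lxml library for this task but ran into difficulties. When a measInfo contains 2 measValue elements, each with its own measObjLdn, how can I get these values back?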

KPI                    GRUPO      VALOR
avgMemoryUtilization1M  CP-ISA      72 
avgMemoryUtilization1M  CP-ISA2     86
avgPDPUtilization       1           29
avgPDPUtilization       2           32

The XML document I have is very similar to the one below:

            <measInfo measInfoId="KPISystemCP-ISA">
        <granPeriod duration="PT300S" endTime="2019-05-14T12:05:01-03:00" />
        <measType p="1">VS.avgCpuUtilization</measType>
        <measType p="2">VS.avgMemoryUtilization</measType>
        <measType p="3">VS.avgMemoryUtilization1M</measType>
        <measType p="4">VS.SDFsFpUtilization</measType>
        <measType p="5">VS.SDFsLcpUtilization</measType>
        <measType p="6">VS.avgVmFpCpuNicUsage</measType>
        <measType p="7">VS.avgVmFpCpuWorkerUsage</measType>
        <measType p="8">VS.avgVmFpCpuSchedulerUsage</measType>
        <measType p="9">VS.avgVmFpCpuCollapsedUsage</measType>
        <measType p="10">VS.avgVmFpCpuCombinedUsage</measType>
        <measType p="11">VS.hwCfgBitsInfo</measType>
        <measValue measObjLdn="KPI=System,GroupName=CP-ISA,group=1,slot=3,mda=1">
            <r p="1">1</r>
            <r p="2">72</r>
            <r p="3">72</r>
            <r p="4">0.00</r>
            <r p="5">0.00</r>
            <r p="6">0.00</r>
            <r p="7">0.05</r>
            <r p="8">0.00</r>
            <r p="9">0.00</r>
            <r p="10">0.00</r>
            <r p="11">4</r>
        </measValue>
        <measValue measObjLdn="KPI=System,GroupName=CP-ISA2,group=2,slot=4,mda=1">
            <r p="1">1</r>
            <r p="2">86</r>
            <r p="3">86</r>
            <r p="4">0.00</r>
            <r p="5">0.00</r>
            <r p="6">0.00</r>
            <r p="7">0.05</r>
            <r p="8">0.00</r>
            <r p="9">0.00</r>
            <r p="10">0.00</r>
            <r p="11">7</r>
        </measValue>
    </measInfo>
    <measInfo>
        <granPeriod duration="PT300S" endTime="2019-05-14T12:05:01-03:00" />
        <measType p="1">VS.avgUtilization</measType>
        <measType p="2">VS.avgPDPUtilization</measType>
        <measType p="3">VS.avgPDPUtilization1M</measType>
        <measValue measObjLdn="KPI=System2,GroupName=1,group=1,slot=3,mda=1">
            <r p="1">1</r>
            <r p="2">29</r>
            <r p="3">99</r>
        </measValue>
        <measValue measObjLdn="KPI=System2,GroupName=2,group=2,slot=4,mda=1">
            <r p="1">1</r>
            <r p="2">32</r>
            <r p="3">16</r>
        </measValue>
    </measInfo>

1 answer
User
#1 · Posted 2024-05-29 04:43:33

You can use the find_all() method in BeautifulSoup.

To break the problem down, first get each measInfo element:

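from bs4 import BeautifulSoup

soup = BeautifulSoup(xml, 'html.parser')  # xml holds the document as a string; html.parser lowercases tag names
measinfos = soup.find_all('measinfo')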

This returns a ResultSet containing the 2 measInfo elements, which we can loop over.
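
As a quick check of why the tag names above are written in lowercase: `html.parser` treats the input as HTML and lowercases tag and attribute names, so `measInfo` has to be searched for as `measinfo`. The shortened `xml` string below is only a stand-in for the real document, assumed here for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the real document (assumption for illustration).
xml = """
<measInfo measInfoId="KPISystemCP-ISA">
    <measType p="1">VS.avgCpuUtilization</measType>
</measInfo>
<measInfo>
    <measType p="1">VS.avgUtilization</measType>
</measInfo>
"""

soup = BeautifulSoup(xml, "html.parser")

# html.parser lowercases tag names, so the lowercase spelling matches.
measinfos = soup.find_all("measinfo")
count = len(measinfos)

# The original mixed-case spelling finds nothing with this parser.
mixed_case_count = len(soup.find_all("measInfo"))

# Attribute names are lowercased too; attribute values keep their case.
first_id = measinfos[0].get("measinfoid")
second_id = measinfos[1].get("measinfoid")  # attribute is absent here

print(count, mixed_case_count, first_id, second_id)
```

Note that the second measInfo in the question has no measInfoId, so `.get()` returns `None` for it rather than raising an error.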

Taking the first element as an example, we can parse it to get some useful information.

measinfo = measinfos[0]  # First item in ResultSet
measinfoid = measinfo.get('measinfoid')  # get the measInfoId (such as KPISystemCP-ISA)
meastypes = measinfo.find_all('meastype')  # get all the measType tags to be able to map the correct values
measvalues = measinfo.find_all('measvalue')  # get all the `measValue` elements

We can put these labels into a dict so that they are easier to map to values later:

meastypes_dict = {}  # maps each p attribute (e.g. '3') to its label (e.g. 'VS.avgMemoryUtilization1M')
for meastype in meastypes:
    meastypes_dict[meastype.attrs['p']] = meastype.text

meastype.attrs['p'] looks up the p attribute and returns its value.
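
For reference, here is a small sketch of the different ways to read that attribute from a Tag. The single-element input is made up for illustration; the dict comprehension at the end is just a more compact way of building the same mapping.

```python
from bs4 import BeautifulSoup

# One measType element, copied from the question, as a tiny test input.
soup = BeautifulSoup(
    '<measType p="3">VS.avgMemoryUtilization1M</measType>', "html.parser"
)
meastype = soup.find("meastype")

attrs = meastype.attrs            # plain dict of the tag's attributes
by_attrs = meastype.attrs["p"]    # raises KeyError if 'p' is missing
by_index = meastype["p"]          # shorthand for the line above
by_get = meastype.get("p")        # returns None if 'p' is missing
missing = meastype.get("q")       # no such attribute on this tag

# Same p -> label mapping as the loop, written as a comprehension.
meastypes_dict = {m["p"]: m.text for m in soup.find_all("meastype")}

print(attrs, by_attrs, by_index, by_get, missing, meastypes_dict)
```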

The labels are now ready. Next, let's look at the first measValue element. We loop over its values and assign a label to each one.

measvalue = measvalues[0]  # First item in ResultSet
measobjldn = measvalue.get('measobjldn')  # get the measObjLdn (such as KPI=System,GroupName=CP-ISA,...)
for result in measvalue.find_all('r'):  # loop through values
    label = meastypes_dict[result.attrs['p']]  # Using the `p` attribute from the value element, we can find which label this corresponds to
    value = result.text  # The value of the element
    print(measinfoid, measobjldn, label, value)
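
The GRUPO column in the question appears to correspond to the GroupName part of measObjLdn. A minimal sketch for pulling it out with plain string handling follows; `parse_ldn` is a hypothetical helper name, and it assumes the values never contain a comma or an extra `=` structure beyond `key=value`.

```python
def parse_ldn(measobjldn):
    """Split 'k1=v1,k2=v2,...' into a dict.

    Assumes every comma-separated part has the form key=value
    and that values contain no commas.
    """
    return dict(pair.split("=", 1) for pair in measobjldn.split(","))


ldn = parse_ldn("KPI=System,GroupName=CP-ISA2,group=2,slot=4,mda=1")
group_name = ldn["GroupName"]

print(group_name, ldn["slot"])
```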

Final code:

from bs4 import BeautifulSoup

soup = BeautifulSoup(xml, 'html.parser')
measinfos = soup.find_all('measinfo')
for measinfo in measinfos:
    measinfoid = measinfo.get('measinfoid')
    meastypes = measinfo.find_all('meastype')
    measvalues = measinfo.find_all('measvalue')
    meastypes_dict = {}
    for meastype in meastypes:
        meastypes_dict[meastype.attrs['p']] = meastype.text

    for measvalue in measvalues:
        measobjldn = measvalue.get('measobjldn')
        for result in measvalue.find_all('r'):
            label = meastypes_dict[result.attrs['p']]
            value = result.text
            print(measinfoid, measobjldn, label, value)
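
To get closer to the KPI / GRUPO / VALOR table from the question, the final code can be combined with a filter on the KPI names. This is only a sketch: `extract_rows` and `WANTED` are names introduced here for illustration, the XML is a trimmed copy of the question's data, and stripping the `VS.` prefix is an assumption based on the expected output shown in the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's XML (only a few counters kept).
xml = """
<measInfo measInfoId="KPISystemCP-ISA">
    <measType p="2">VS.avgMemoryUtilization</measType>
    <measType p="3">VS.avgMemoryUtilization1M</measType>
    <measValue measObjLdn="KPI=System,GroupName=CP-ISA,group=1,slot=3,mda=1">
        <r p="2">72</r>
        <r p="3">72</r>
    </measValue>
    <measValue measObjLdn="KPI=System,GroupName=CP-ISA2,group=2,slot=4,mda=1">
        <r p="2">86</r>
        <r p="3">86</r>
    </measValue>
</measInfo>
<measInfo>
    <measType p="2">VS.avgPDPUtilization</measType>
    <measType p="3">VS.avgPDPUtilization1M</measType>
    <measValue measObjLdn="KPI=System2,GroupName=1,group=1,slot=3,mda=1">
        <r p="2">29</r>
        <r p="3">99</r>
    </measValue>
    <measValue measObjLdn="KPI=System2,GroupName=2,group=2,slot=4,mda=1">
        <r p="2">32</r>
        <r p="3">16</r>
    </measValue>
</measInfo>
"""

# KPI names to keep, without the "VS." prefix (taken from the question's table).
WANTED = {"avgMemoryUtilization1M", "avgPDPUtilization"}


def extract_rows(xml_text, wanted):
    """Return (kpi, group_name, value) tuples for the wanted KPIs."""
    soup = BeautifulSoup(xml_text, "html.parser")
    rows = []
    for measinfo in soup.find_all("measinfo"):
        # p -> label mapping for this measInfo only
        labels = {m["p"]: m.text.strip() for m in measinfo.find_all("meastype")}
        for measvalue in measinfo.find_all("measvalue"):
            # "KPI=System,GroupName=CP-ISA,..." -> {"KPI": "System", ...}
            ldn = dict(
                pair.split("=", 1) for pair in measvalue["measobjldn"].split(",")
            )
            for r in measvalue.find_all("r"):
                kpi = labels[r["p"]]
                if kpi.startswith("VS."):
                    kpi = kpi[len("VS."):]
                if kpi in wanted:
                    rows.append((kpi, ldn["GroupName"], r.text.strip()))
    return rows


rows = extract_rows(xml, WANTED)
for kpi, group_name, value in rows:
    print(f"{kpi:<24}{group_name:<12}{value}")
```

The values are kept as strings here; convert them with `int()` or `float()` if you need numbers.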
