将dataframe导出到attributesliststructure xml

2024-06-16 09:45:20 发布

您现在位置:Python中文网/ 问答频道 /正文

我叫巴勃罗,这是我在这个小组里的第一个问题。 在查看了其他相关帖子后,我决定提出一个请求, 我想知道是否有办法执行以下操作

假设我有以下数据帧结构:

+----+---------+------------+------------+----------+
|    |   MRBTS | dest       | gw         |   length |
|----+---------+------------+------------+----------|
|  0 |   13004 | 10.104.0.0 | 10.48.0.0  |       16 |
|  1 |   13004 | 10.107.0.0 | 10.45.0.0  |       16 |
|  2 |   13005 | 10.104.0.0 | 10.130.0.0 |        8 |
|  3 |   13005 | 10.102.0.0 | 10.130.0.0 |        8 |
|  4 |   13005 | 0.0.0.0    | 10.110.0.0 |       16 |
+----+---------+------------+------------+----------+

测试DF:

我想通过MRBT导出到XML列表groupping,如下所示:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="plan" scope="all" name="iprt" id="PlanConfiguration( 7152069 )">
    <header>
      <log dateTime="2020-06-19T07:38:16.000-03:00" action="created" appInfo="PlanExporter">InternalValues are used</log>
    </header>
    <managedObject distName="MRBTS-13004">
      <list >
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.38.0.0</p>
        </item>
        <item>
          <p name="dest">10.107.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.45.0.0</p>
        </item>
      </list>
    </managedObject>
    <managedObject  distName="MRBTS-13005">
      <list >
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="length">8</p>
          <p name="gw">10.130.8.0</p>
        </item>
        <item>
          <p name="dest">10.102.0.0</p>
          <p name="length">8</p>
          <p name="gw">10.130.8.0</p>
        </item>
        <item>
          <p name="dest">0.0.0.0</p>
          <p name="length">16</p>
          <p name="gw">10.110.0.0</p>
        </item>
      </list>
    </managedObject>
  </cmData>
</raml>

我从另一篇文章(How do convert a pandas/dataframe to XML?)中获得了这段代码,但在尝试按MRBT分组时,我被绊倒了:

import pandas as pd
df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                   'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                   'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                   'length':['16','16','8','8','16']})

def func(row):
    xml = ['<list >']
    for field in row.index:
        xml.append('  <field name="{0}">{1}</field>'.format(field, row[field]))
    xml.append('</list>')
    return '\n'.join(xml)



print ('\n'.join(df.apply(func, axis=1)))

这个结果是:

<list >
  <field name="MRBTS">13004</field>
  <field name="dest">10.104.0.0</field>
  <field name="gw">10.48.0.0</field>
  <field name="length">16</field>
</list>
<list >
  <field name="MRBTS">13004</field>
  <field name="dest">10.107.0.0</field>
  <field name="gw">10.45.0.0</field>
  <field name="length">16</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">10.104.0.0</field>
  <field name="gw">10.130.0.0</field>
  <field name="length">8</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">10.102.0.0</field>
  <field name="gw">10.130.0.0</field>
  <field name="length">8</field>
</list>
<list >
  <field name="MRBTS">13005</field>
  <field name="dest">0.0.0.0</field>
  <field name="gw">10.110.0.0</field>
  <field name="length">16</field>
</list>

你能帮我解决这个问题吗


Tags: namefieldversionxmlitemlengthramllist
2条回答

因为XML文档不是文本文档,所以避免使用字符串连接构建XML。相反,考虑使用DOM方法使用第三方^ {CD1> }建立树,甚至内置模块,^ {CD2}}(稍加修改)。对于数据,按MRBTS字段遍历数据帧的子集:

import lxml.etree as et
import pandas as pd

### STATIC PART OF XML
root = et.Element('raml', {"version": "2.0", "xmlns": "raml20.xsd"})

cmData = et.SubElement(root, "cmData",
                       {"type":"plan", "scope":"all", "name":"iprt", "id":"PlanConfiguration( 7152069 )"})

header = et.SubElement(cmData, "header")
log = et.SubElement(header, "log",
                    {"dateTime":"2020-06-19T07:38:16.000-03:00", "action":"created", "appInfo":"PlanExporter"})
log.text = "InternalValues are used"

### DYNAMIC PART OF XML
df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                   'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                   'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                   'length':['16','16','8','8','16']})
# SUBSET ITERATION                 
for i, g in df.groupby("MRBTS"):
    managedObject = et.SubElement(cmData, "managedObject", {"distName":"MRBTS-"+i})
    list = et.SubElement(managedObject, "list")
    
    # BUILD DICTIONARY OUT OF EACH ROW
    d = g.drop('MRBTS', axis='columns').to_dict('index')
    
    for ik, iv in d.items():
        item = et.SubElement(list, 'item')
        for k, v in iv.items():
            p = et.SubElement(item, 'p', {"name":k})
            p.text = v

# OUTPUT TREE
tree = et.ElementTree(root)
tree_out = tree.write("Output.xml",
                      xml_declaration=True, 
                      encoding="UTF-8",
                      pretty_print=True,
                      doctype="<!DOCTYPE raml SYSTEM 'raml20.dtd'>")

输出XML

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData id="PlanConfiguration( 7152069 )" name="iprt" scope="all" type="plan">
    <header>
      <log action="created" appInfo="PlanExporter" dateTime="2020-06-19T07:38:16.000-03:00">InternalValues are used</log>
    </header>
    <managedObject distName="MRBTS-13004">
      <list>
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="gw">10.48.0.0</p>
          <p name="length">16</p>
        </item>
        <item>
          <p name="dest">10.107.0.0</p>
          <p name="gw">10.45.0.0</p>
          <p name="length">16</p>
        </item>
      </list>
    </managedObject>
    <managedObject distName="MRBTS-13005">
      <list>
        <item>
          <p name="dest">10.104.0.0</p>
          <p name="gw">10.130.0.0</p>
          <p name="length">8</p>
        </item>
        <item>
          <p name="dest">10.102.0.0</p>
          <p name="gw">10.130.0.0</p>
          <p name="length">8</p>
        </item>
        <item>
          <p name="dest">0.0.0.0</p>
          <p name="gw">10.110.0.0</p>
          <p name="length">16</p>
        </item>
      </list>
    </managedObject>
  </cmData>
</raml>

我认为关键在于首先为目标xml表示更好地构建数据

  1. groupby MRBTS
  2. 自定义聚合以返回属性list aggregation的项列表。我使用了一些速记列表理解来准备去agg()的Kwarg
  3. 您现在从这个数据框架获得的JSON/dict的结构与您的目标需求相当
  4. 我对XML很生疏,已经15年没有做过任何事情了。可能有更好的库可以将JSON转换为XML。这表明结构基本上在那里。一点XSLT就可以很容易地实现这一点
  5. 在通过json2xml发送dict之前,我已经从pandas.to_dict()对dict进行了一些实验
from json2xml import json2xml
df = pd.DataFrame({'MRBTS':['13004','13004','13005','13005','13005'],
                   'dest':['10.104.0.0','10.107.0.0','10.104.0.0','10.102.0.0','0.0.0.0'],
                   'gw':['10.48.0.0','10.45.0.0','10.130.0.0','10.130.0.0','10.110.0.0'],
                   'length':['16','16','8','8','16']})

a = {c:lambda x: list(x) for c in df.columns if c!="MRBTS"}
df2 = df.groupby("MRBTS").agg(a).reset_index()


js = [{"distName":mo["MRBTS"], "item":[{"dest":mi, "length":mo["length"][i], "gw":mo["gw"][i]} 
                                       for i, mi in enumerate(mo["dest"])]} 
      for mo in df2.to_dict(orient="records")]
print(json2xml.Json2xml(js, attr_type=False).to_xml())
print(json2xml.Json2xml(df2.to_dict(orient="records"), attr_type=False).to_xml())


输出(仅第一条记录)

<?xml version="1.0" ?>
<all>
    <item>
        <distName>13004</distName>
        <item>
            <item>
                <dest>10.104.0.0</dest>
                <length>16</length>
                <gw>10.48.0.0</gw>
            </item>
            <item>
                <dest>10.107.0.0</dest>
                <length>16</length>
                <gw>10.45.0.0</gw>
            </item>
        </item>
    </item>
</all>

相关问题 更多 >