Merging elements with a common tag in an XML file

Posted 2024-05-16 10:05:11


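I created an XML file using ElementTree in Python. I am very new to Python, so please excuse any mistakes in terminology. I want to merge the contents of elements that have the same value for the "file" attribute.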

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
        </FileName>
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
        </FileName>
    </Files>
</DefaultLines>

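For example, the first and second FileName elements have the same attribute value, "emem_fifo_1c.vhd". If "file" is the same, I want the contents of those FileName elements merged into a single FileName element.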

My output XML should look like this:

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
                <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
        </FileName>
    </Files>
</DefaultLines>

I really don't know how to do this with ElementTree in Python.

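Update: I am close to solving the problem with the help of 大兵寿. However, I now face another problem: the content inside the node is duplicated. I tried to remove the duplicates while adding them to the XML, but it does not work.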

<?xml version="1.0" ?>
<DefaultLines>
    <Files Date="2020-10-31" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd ">
            <DefLines>
                <Message>'108'<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>108<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>109<Child>Expression</Child>Item    1  ((R_EN and not(fifo_empty)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
            <DefLines>
                <Message>108<Child>Expression</Child>Item    1  ((W_EN and not(fifo_full)) and not(SRESET))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   4:<Child>Expression</Child>fifo_full_1 not SRESET &amp;&amp; W_EN</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   6:<Child>Expression</Child>SRESET_1 (W_EN and not(fifo_full))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   4:<Child>Expression</Child>fifo_full_1           not SRESET &amp;&amp; W_EN</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>
            <DefLines>
                <Message>Row   6:<Child>Expression</Child>SRESET_1              (W_EN and not(fifo_full))</Message>
                <Justification />
                <Comment />
                <Status />
            </DefLines>

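The "108", "109", "Row 4" and "Row 6" entries are appended multiple times. Is it possible to keep only the first occurrence of each and remove the rest?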

Update: After removing the duplicates with that method, some of the XML nodes I get are incomplete:

<?xml version="1.0" ?>
<DefaultLines>
<Files Date="2020-11-01" Name="D:\report_byfile_detailed.txt">
<FileName file="emem_fifo_1c.vhd ">
<DefLines>
<Message>
'108'
<Child>Expression</Child>
Item    1  ((W_EN and not(fifo_full)) and not(SRESET))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'119'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'120'
<Child>Statement</Child>
w_addr &lt;= (others =&gt; '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'135'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>

<DefLines>
<Message>
'136'
<Child>Statement</Child>
r_addr &lt;= (others =&gt; '0');
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'157'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'158'
<Child>Statement</Child>
fifo_empty &lt;= '1';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'180'
<Child>Branch</Child>
if (SRESET = '1') then
</Message>
<Justification />
<Comment />
<Status />
</DefLines>

<DefLines>
<Message>
'181', '182'
<Child>Statement</Child>
fifo_used     &lt;= (others =&gt; '0');
fifo_used_one &lt;= '0';
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
'568', '569', '570', '571'
<Child>Statement</Child>
config_rd_fsm                     

    &lt;= '0';
axi4_lite_slave_rdata_ch_out &lt;= AXI4LITE_RDATA32_S2M_DEF;
config_rd_fsm                &lt;= IDLE;
</Message>
<Justification />
<Comment />
<Status />
**<

<DefLines>**
<Message>
161
<Child>Condition</Child>
Item    1  (((r_en_valid = '1') and (fifo_used_one = '1')) and (w_en_valid = '0'))
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
DefLines>

<DefLines>
<Message>
367
<Child>Branch</Child>
when others =&gt;
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
**<Child>Bran**

<DefLines>
<Message>
Row   5:    
<Child>Condition</Child>
(w_en_valid = '0')_0     ((r_en_valid = '1') and (fifo_used_one = '1'))
</Message>
<Justification />
<Comment />
<Status />
**</DefLines>sh</Child>**
All False Count
</Message>
<Justification />
<Comment />
<Status />
</DefLines>
<DefLines>
<Message>
587
<Child>Branch</Child>
when others =&gt;
</Message>
**<Justi>**


</FileName>

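I have tried to mark in bold the places where the tree is incomplete; because of them I get errors when generating and parsing the XML tree.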
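
One way to pinpoint where output like this stops being well-formed is to parse it with the standard library and read the position from the resulting ParseError. This is only a minimal sketch: find_xml_error is a hypothetical helper name, and the broken fragment is a cut-down stand-in for the output above, not the real file.

```python
import xml.etree.ElementTree as ET


def find_xml_error(text):
    """Return None if `text` is well-formed, else (line, column, message).

    Hypothetical helper, not part of ElementTree itself.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        line, column = exc.position  # position of the first problem the parser hit
        return line, column, str(exc)
    return None


# Cut-down stand-in for the damaged output: <Justi> is never closed.
broken = "<DefLines>\n<Message>587</Message>\n<Justi>\n</DefLines>"
print(find_xml_error(broken))
print(find_xml_error("<DefLines><Message>587</Message></DefLines>"))
```

The parser stops at the first error, so the check has to be repeated after each repair.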


2 answers

Here is how to do it with lxml; I'll try to explain as I go.

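The basic idea is that we arbitrarily pick the first FileName as the repository for the target information, move that target information into it, and then remove the target's original parent.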

    from lxml import etree
    # Raw string, so that "\r" in the Name attribute is not turned into a carriage return
    deflines = r"""<?xml version="1.0" ?>
    <DefaultLines>
        <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
            <FileName file="emem_fifo_1c.vhd  ">
                <DefLines>
                    <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
                </DefLines>
                <DefLines>
                    <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
                </DefLines>
            </FileName>
          <FileName file="some_other_name.text">
                <DefLines>
                    <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
                </DefLines>
            </FileName>
            <FileName file="emem_fifo_1c.vhd  " id="mushi">
                <DefLines>
                    <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
                </DefLines>
            </FileName>       
        </Files>
    </DefaultLines>
    """
    # I added another FileName which doesn't meet the requirements, just to demonstrate how it works
    
    doc = etree.XML(deflines)
    # The first FileName with this "file" value is the destination for the merged DefLines
    destination = doc.xpath('//Files/FileName[@file="emem_fifo_1c.vhd  "][1]')[0]
    # position() has to be >1 to make sure we skip the destination element itself
    for duplicate in doc.xpath('//Files/FileName[@file="emem_fifo_1c.vhd  "][position()>1]'):
        for dl in duplicate.xpath('./DefLines'):
            destination.append(dl)  # append() moves the element out of its old parent
        duplicate.getparent().remove(duplicate)  # remove the now-empty duplicate FileName
    print(etree.tostring(doc, xml_declaration=True).decode())
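
The question asks for ElementTree specifically, so here is a minimal sketch of the same merge using only the standard library's xml.etree.ElementTree. It assumes duplicates are matched on the exact "file" value (trailing spaces included) within each Files element. merge_filenames is a hypothetical helper name, and the sample messages are shortened from the question's input.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's input; the raw string keeps "\r" in the path literal.
SAMPLE = r"""<DefaultLines>
    <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines><Message>'120'<Child>Statement</Child>w_addr</Message></DefLines>
            <DefLines><Message>'136'<Child>Statement</Child>r_addr</Message></DefLines>
        </FileName>
        <FileName file="some_other_name.text">
            <DefLines><Message>'xxxx'<Child>Branch</Child>if</Message></DefLines>
        </FileName>
        <FileName file="emem_fifo_1c.vhd  ">
            <DefLines><Message>'119'<Child>Branch</Child>if</Message></DefLines>
        </FileName>
    </Files>
</DefaultLines>"""


def merge_filenames(root):
    """Merge FileName elements that share a "file" attribute (hypothetical helper)."""
    for files in root.iter("Files"):
        first_seen = {}  # "file" value -> first FileName element carrying it
        for fname in files.findall("FileName"):  # findall returns a list, safe to mutate `files`
            keeper = first_seen.setdefault(fname.get("file"), fname)
            if keeper is not fname:
                keeper.extend(list(fname))  # copy the children into the first FileName
                files.remove(fname)         # drop the duplicate FileName
    return root


root = merge_filenames(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

The moved elements keep their original whitespace tails, so the serialized indentation may look uneven; on Python 3.9 or later, ET.indent(root) can tidy it before writing.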

Another approach, for your reference:

from simplified_scrapy import SimplifiedDoc, utils
xml = '''
<?xml version="1.0" ?>
<DefaultLines>
   <Files Date="2020-10-23" Name="D:\report_byfile_detailed.txt">
      <FileName file="emem_fifo_1c.vhd  ">
            <DefLines>
               <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
            <DefLines>
               <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      </FileName>
      <FileName file="some_other_name.text">
            <DefLines>
               <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
            </DefLines>
      </FileName>
      <FileName file="emem_fifo_1c.vhd  " id="mushi">
            <DefLines>
               <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
      </FileName>       
   </Files>
</DefaultLines>
'''

dic = {}
doc = SimplifiedDoc(xml)
nodes = doc.selects('Files>FileName')
# Merge FileName nodes that share the same "file" attribute
for node in nodes:
    last = dic.get(node['file'])
    if last:
        last.appendChild(node.html)
        node.remove()
    else:
        dic[node['file']] = node

# print (doc.html)
# remove the duplicate items
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        key = n.select('Message').firstText()
        exist = dic.get(key)
        if exist:
            n.remove()
        else:
            dic[key] = True
# Sort
nodes = doc.selects('Files>FileName')
for node in nodes:
    dic.clear()
    lst = node.selects('DefLines')
    if len(lst) <= 1:
        continue
    for n in lst:
        dic[n.select('Message').firstText()] = n.outerHtml # Cache, replace it below.

    i = 0
    for key in sorted(dic):
        lst[i].replaceSelf(dic[key]) # Replace after sorting
        i = i + 1
# Save
utils.saveFile('test.xml', doc.html)

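Result (the Name attribute reads "D: eport..." because "\r" in the non-raw string literal above was interpreted as a carriage return; a raw string avoids this):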

<?xml version="1.0" ?>
<DefaultLines>
   <Files Date="2020-10-23" Name="D: eport_byfile_detailed.txt">
      <FileName file="emem_fifo_1c.vhd ">
            <DefLines>
               <Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message>
            </DefLines>
            <DefLines>
               <Message>'120'<Child>Statement</Child>w_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      
            <DefLines>
               <Message>'136'<Child>Statement</Child>r_addr &lt;= (others =&gt; '0');</Message>
            </DefLines>
      </FileName>
      <FileName file="some_other_name.text">
            <DefLines>
               <Message>'xxxx'<Child>Branch</Child>if (yyyyy= '1000') then</Message>
            </DefLines>
      </FileName>       
   </Files>
</DefaultLines>
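
For the duplicates in the question's first update, note that the repeated entries are not byte-identical: 108 appears with and without quotes, and the "Row   4:" text differs only in spacing. Below is a possible ElementTree sketch that keeps the first occurrence. It assumes two DefLines count as duplicates when their Message matches after quotes are stripped from the label and runs of whitespace are collapsed. message_key and drop_duplicates are hypothetical names, and the sample is shortened from the question.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same kinds of near-duplicates as in the question.
SAMPLE = """<FileName file="emem_fifo_1c.vhd ">
    <DefLines><Message>'108'<Child>Expression</Child>Item    1  (W_EN)</Message></DefLines>
    <DefLines><Message>'119'<Child>Branch</Child>if (SRESET = '1') then</Message></DefLines>
    <DefLines><Message>108<Child>Expression</Child>Item    1  (W_EN)</Message></DefLines>
    <DefLines><Message>Row   4:<Child>Expression</Child>fifo_full_1 not SRESET</Message></DefLines>
    <DefLines><Message>Row   4:<Child>Expression</Child>fifo_full_1      not SRESET</Message></DefLines>
</FileName>"""


def squeeze(text):
    """Collapse runs of whitespace; None becomes an empty string."""
    return " ".join((text or "").split())


def message_key(deflines):
    """Normalised (label, kind, body) key for a DefLines element, or None without a Message."""
    message = deflines.find("Message")
    if message is None:
        return None
    child = message.find("Child")
    label = squeeze(message.text).replace("'", "")  # "'108'" and "108" compare equal
    kind = squeeze(child.text) if child is not None else ""
    body = squeeze(child.tail) if child is not None else ""
    return (label, kind, body)


def drop_duplicates(file_name):
    """Remove every DefLines whose key was already seen, keeping the first occurrence."""
    seen = set()
    for deflines in file_name.findall("DefLines"):  # findall returns a list, safe to mutate
        key = message_key(deflines)
        if key is None:
            continue  # leave entries without a Message untouched
        if key in seen:
            file_name.remove(deflines)
        else:
            seen.add(key)


root = ET.fromstring(SAMPLE)
drop_duplicates(root)
print([d.find("Message").text for d in root.findall("DefLines")])
```

Working on the parsed tree rather than on the serialized text means elements are removed whole, which avoids the half-deleted tags shown in the question's second update.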
