How to merge two XML files that use a namespace

Posted 2024-05-15 02:38:52


I am trying to merge two XML files using the ElementTree module. Here are the XML files:

a.xml

<?xml version="1.0"?>
<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
  <ListOrdersResult>
    <NextToken>token</NextToken>
    <CreatedBefore>2014-10-07T08:13:11Z</CreatedBefore>
    <Orders>
      <Order>
        <AmazonOrderId>12345</AmazonOrderId>
        <SellerOrderId>R12345</SellerOrderId>
        <PurchaseDate>2014-10-02T14:40:37Z</PurchaseDate>
        <LastUpdateDate>2014-10-03T09:47:02Z</LastUpdateDate>
        <OrderStatus>Shipped</OrderStatus>
        <FulfillmentChannel>MFN</FulfillmentChannel>
        <SalesChannel>Amazon.in</SalesChannel>
        <ShipServiceLevel>IN Exp Dom 2</ShipServiceLevel>
        <ShippingAddress>
          <Name>name</Name>
          <AddressLine1>line1</AddressLine1>
          <AddressLine2>line2</AddressLine2>
          <City>Pune</City>
          <StateOrRegion>Maharashtra</StateOrRegion>
          <PostalCode>411027</PostalCode>
          <CountryCode>IN</CountryCode>
          <Phone>123456789</Phone>
        </ShippingAddress>
        <OrderTotal>
          <CurrencyCode>INR</CurrencyCode>
          <Amount>520.00</Amount>
        </OrderTotal>
        <NumberOfItemsShipped>1</NumberOfItemsShipped>
        <NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
        <PaymentExecutionDetail/>
        <PaymentMethod>Other</PaymentMethod>
        <MarketplaceId>mid</MarketplaceId>
        <BuyerEmail>email@buyer.com</BuyerEmail>
        <BuyerName>name</BuyerName>
        <ShipmentServiceLevelCategory>Expedited</ShipmentServiceLevelCategory>
        <ShippedByAmazonTFM>false</ShippedByAmazonTFM>
        <TFMShipmentStatus>Delivered</TFMShipmentStatus>
        <OrderType>StandardOrder</OrderType>
        <EarliestShipDate>2014-10-05T18:30:00Z</EarliestShipDate>
        <LatestShipDate>2014-10-07T18:29:59Z</LatestShipDate>
        <EarliestDeliveryDate>2014-10-07T18:30:00Z</EarliestDeliveryDate>
        <LatestDeliveryDate>2014-10-11T18:29:59Z</LatestDeliveryDate>
      </Order>
    </Orders>
  </ListOrdersResult>
</ListOrdersResponse>

b.xml

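[b.xml listing not preserved; per the code below, it contains a ListOrdersByNextTokenResult element whose Orders child holds the second Order shown in the expected output]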

I want to add the elements inside the Orders element of b.xml to the Orders element of a.xml. So the expected output is:

<?xml version="1.0"?>
<ListOrdersResponse xmlns="https://mws.amazonservices.com/Orders/2013-09-01">
  <ListOrdersResult>
    <NextToken>token</NextToken>
    <CreatedBefore>2014-10-07T08:13:11Z</CreatedBefore>
    <Orders>
      <Order>
        <AmazonOrderId>12345</AmazonOrderId>
        <SellerOrderId>R12345</SellerOrderId>
        <PurchaseDate>2014-10-02T14:40:37Z</PurchaseDate>
        <LastUpdateDate>2014-10-03T09:47:02Z</LastUpdateDate>
        <OrderStatus>Shipped</OrderStatus>
        <FulfillmentChannel>MFN</FulfillmentChannel>
        <SalesChannel>Amazon.in</SalesChannel>
        <ShipServiceLevel>IN Exp Dom 2</ShipServiceLevel>
        <ShippingAddress>
          <Name>name</Name>
          <AddressLine1>line1</AddressLine1>
          <AddressLine2>line2</AddressLine2>
          <City>Pune</City>
          <StateOrRegion>Maharashtra</StateOrRegion>
          <PostalCode>411027</PostalCode>
          <CountryCode>IN</CountryCode>
          <Phone>123456789</Phone>
        </ShippingAddress>
        <OrderTotal>
          <CurrencyCode>INR</CurrencyCode>
          <Amount>520.00</Amount>
        </OrderTotal>
        <NumberOfItemsShipped>1</NumberOfItemsShipped>
        <NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
        <PaymentExecutionDetail/>
        <PaymentMethod>Other</PaymentMethod>
        <MarketplaceId>mid</MarketplaceId>
        <BuyerEmail>email@buyer.com</BuyerEmail>
        <BuyerName>name</BuyerName>
        <ShipmentServiceLevelCategory>Expedited</ShipmentServiceLevelCategory>
        <ShippedByAmazonTFM>false</ShippedByAmazonTFM>
        <TFMShipmentStatus>Delivered</TFMShipmentStatus>
        <OrderType>StandardOrder</OrderType>
        <EarliestShipDate>2014-10-05T18:30:00Z</EarliestShipDate>
        <LatestShipDate>2014-10-07T18:29:59Z</LatestShipDate>
        <EarliestDeliveryDate>2014-10-07T18:30:00Z</EarliestDeliveryDate>
        <LatestDeliveryDate>2014-10-11T18:29:59Z</LatestDeliveryDate>
      </Order>
      <Order>
        <AmazonOrderId>oid1</AmazonOrderId>
        <PurchaseDate>2014-10-04T13:37:41Z</PurchaseDate>
        <LastUpdateDate>2014-10-06T09:52:21Z</LastUpdateDate>
        <OrderStatus>Shipped</OrderStatus>
        <FulfillmentChannel>MFN</FulfillmentChannel>
        <SalesChannel>Amazon.in</SalesChannel>
        <ShipServiceLevel>IN Std Dom 2_50k_cod</ShipServiceLevel>
        <ShippingAddress>
          <Name>name1</Name>
          <AddressLine1>line1-1</AddressLine1>
          <AddressLine2>line2-1</AddressLine2>
          <City>WADHVANCITY,SURENDRANAGAR</City>
          <StateOrRegion>Gujarat</StateOrRegion>
          <PostalCode>363035</PostalCode>
          <CountryCode>IN</CountryCode>
          <Phone>987654321</Phone>
        </ShippingAddress>
        <OrderTotal>
          <CurrencyCode>INR</CurrencyCode>
          <Amount>242.00</Amount>
        </OrderTotal>
        <NumberOfItemsShipped>1</NumberOfItemsShipped>
        <NumberOfItemsUnshipped>0</NumberOfItemsUnshipped>
        <PaymentExecutionDetail/>
        <PaymentMethod>Other</PaymentMethod>
        <MarketplaceId>mid1</MarketplaceId>
        <BuyerEmail>email1@buyer.com</BuyerEmail>
        <BuyerName>name1</BuyerName>
        <ShipmentServiceLevelCategory>Standard</ShipmentServiceLevelCategory>
        <ShippedByAmazonTFM>false</ShippedByAmazonTFM>
        <TFMShipmentStatus>PendingPickUp</TFMShipmentStatus>
        <OrderType>StandardOrder</OrderType>
        <EarliestShipDate>2014-10-05T18:30:00Z</EarliestShipDate>
        <LatestShipDate>2014-10-07T18:29:59Z</LatestShipDate>
        <EarliestDeliveryDate>2014-10-09T18:30:00Z</EarliestDeliveryDate>
        <LatestDeliveryDate>2014-10-15T18:29:59Z</LatestDeliveryDate>
      </Order>
    </Orders>
  </ListOrdersResult>
</ListOrdersResponse>

Here is what I tried:

import xml.etree.ElementTree as ET
import os
import shlex
import subprocess

tree = ET.parse("a.xml")
root = tree.getroot()

combined_xml = root

namespaces = {'resp': 'https://mws.amazonservices.com/Orders/2013-09-01'}
results = combined_xml.find("resp:ListOrdersResult", namespaces=namespaces)
insertion_point = results.find("resp:Orders", namespaces=namespaces)


tree1 = ET.parse("b.xml")
root1 = tree1.getroot()

results1 = root1.find("resp:ListOrdersByNextTokenResult", namespaces=namespaces)
order_array1 = results1.find("resp:Orders", namespaces=namespaces)

for order in order_array1:
    insertion_point.extend(order)

print ET.tostring(combined_xml)

But I got the following output:

<ns0:ListOrdersResponse xmlns:ns0="https://mws.amazonservices.com/Orders/2013-09-01">
  <ns0:ListOrdersResult>
    <ns0:NextToken>token</ns0:NextToken>
    <ns0:CreatedBefore>2014-10-07T08:13:11Z</ns0:CreatedBefore>
    <ns0:Orders>
      <ns0:Order>
        <ns0:AmazonOrderId>12345</ns0:AmazonOrderId>
        <ns0:SellerOrderId>R12345</ns0:SellerOrderId>
        <ns0:PurchaseDate>2014-10-02T14:40:37Z</ns0:PurchaseDate>
        <ns0:LastUpdateDate>2014-10-03T09:47:02Z</ns0:LastUpdateDate>
        <ns0:OrderStatus>Shipped</ns0:OrderStatus>
        <ns0:FulfillmentChannel>MFN</ns0:FulfillmentChannel>
        <ns0:SalesChannel>Amazon.in</ns0:SalesChannel>
        <ns0:ShipServiceLevel>IN Exp Dom 2</ns0:ShipServiceLevel>
        <ns0:ShippingAddress>
          <ns0:Name>name</ns0:Name>
          <ns0:AddressLine1>line1</ns0:AddressLine1>
          <ns0:AddressLine2>line2</ns0:AddressLine2>
          <ns0:City>Pune</ns0:City>
          <ns0:StateOrRegion>Maharashtra</ns0:StateOrRegion>
          <ns0:PostalCode>411027</ns0:PostalCode>
          <ns0:CountryCode>IN</ns0:CountryCode>
          <ns0:Phone>123456789</ns0:Phone>
        </ns0:ShippingAddress>
        <ns0:OrderTotal>
          <ns0:CurrencyCode>INR</ns0:CurrencyCode>
          <ns0:Amount>520.00</ns0:Amount>
        </ns0:OrderTotal>
        <ns0:NumberOfItemsShipped>1</ns0:NumberOfItemsShipped>
        <ns0:NumberOfItemsUnshipped>0</ns0:NumberOfItemsUnshipped>
        <ns0:PaymentExecutionDetail />
        <ns0:PaymentMethod>Other</ns0:PaymentMethod>
        <ns0:MarketplaceId>mid</ns0:MarketplaceId>
        <ns0:BuyerEmail>email@buyer.com</ns0:BuyerEmail>
        <ns0:BuyerName>name</ns0:BuyerName>
        <ns0:ShipmentServiceLevelCategory>Expedited</ns0:ShipmentServiceLevelCategory>
        <ns0:ShippedByAmazonTFM>false</ns0:ShippedByAmazonTFM>
        <ns0:TFMShipmentStatus>Delivered</ns0:TFMShipmentStatus>
        <ns0:OrderType>StandardOrder</ns0:OrderType>
        <ns0:EarliestShipDate>2014-10-05T18:30:00Z</ns0:EarliestShipDate>
        <ns0:LatestShipDate>2014-10-07T18:29:59Z</ns0:LatestShipDate>
        <ns0:EarliestDeliveryDate>2014-10-07T18:30:00Z</ns0:EarliestDeliveryDate>
        <ns0:LatestDeliveryDate>2014-10-11T18:29:59Z</ns0:LatestDeliveryDate>
      </ns0:Order>
    <ns0:AmazonOrderId>oid1</ns0:AmazonOrderId>
        <ns0:PurchaseDate>2014-10-04T13:37:41Z</ns0:PurchaseDate>
        <ns0:LastUpdateDate>2014-10-06T09:52:21Z</ns0:LastUpdateDate>
        <ns0:OrderStatus>Shipped</ns0:OrderStatus>
        <ns0:FulfillmentChannel>MFN</ns0:FulfillmentChannel>
        <ns0:SalesChannel>Amazon.in</ns0:SalesChannel>
        <ns0:ShipServiceLevel>IN Std Dom 2_50k_cod</ns0:ShipServiceLevel>
        <ns0:ShippingAddress>
          <ns0:Name>name1</ns0:Name>
          <ns0:AddressLine1>line1-1</ns0:AddressLine1>
          <ns0:AddressLine2>line2-1</ns0:AddressLine2>
          <ns0:City>WADHVANCITY,SURENDRANAGAR</ns0:City>
          <ns0:StateOrRegion>Gujarat</ns0:StateOrRegion>
          <ns0:PostalCode>363035</ns0:PostalCode>
          <ns0:CountryCode>IN</ns0:CountryCode>
          <ns0:Phone>987654321</ns0:Phone>
        </ns0:ShippingAddress>
        <ns0:OrderTotal>
          <ns0:CurrencyCode>INR</ns0:CurrencyCode>
          <ns0:Amount>242.00</ns0:Amount>
        </ns0:OrderTotal>
        <ns0:NumberOfItemsShipped>1</ns0:NumberOfItemsShipped>
        <ns0:NumberOfItemsUnshipped>0</ns0:NumberOfItemsUnshipped>
        <ns0:PaymentExecutionDetail />
        <ns0:PaymentMethod>Other</ns0:PaymentMethod>
        <ns0:MarketplaceId>mid1</ns0:MarketplaceId>
        <ns0:BuyerEmail>email1@byer.com</ns0:BuyerEmail>
        <ns0:BuyerName>name1</ns0:BuyerName>
        <ns0:ShipmentServiceLevelCategory>Standard</ns0:ShipmentServiceLevelCategory>
        <ns0:ShippedByAmazonTFM>false</ns0:ShippedByAmazonTFM>
        <ns0:TFMShipmentStatus>PendingPickUp</ns0:TFMShipmentStatus>
        <ns0:OrderType>StandardOrder</ns0:OrderType>
        <ns0:EarliestShipDate>2014-10-05T18:30:00Z</ns0:EarliestShipDate>
        <ns0:LatestShipDate>2014-10-07T18:29:59Z</ns0:LatestShipDate>
        <ns0:EarliestDeliveryDate>2014-10-09T18:30:00Z</ns0:EarliestDeliveryDate>
        <ns0:LatestDeliveryDate>2014-10-15T18:29:59Z</ns0:LatestDeliveryDate>
      </ns0:Orders>
  </ns0:ListOrdersResult>
</ns0:ListOrdersResponse>

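Why am I getting ns0? Also, the <Order> tag is missing for the second order. How can I get the desired output without ns0? If another module would make life easier, I am open to suggestions. :) Thanks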


2 Answers
import xml.etree.ElementTree as ET

namespaces = {'resp': 'https://mws.amazonservices.com/Orders/2013-09-01'}

tree = ET.parse("a.xml")
root = tree.getroot()

# <Orders> element of a.xml: this is the insertion point
results = root.find("resp:ListOrdersResult", namespaces=namespaces)
orders = results.find("resp:Orders", namespaces=namespaces)

tree1 = ET.parse("b.xml")
root1 = tree1.getroot()

results1 = root1.find("resp:ListOrdersByNextTokenResult", namespaces=namespaces)
orders1 = results1.find("resp:Orders", namespaces=namespaces)

# Append each whole <Order> element (getchildren() was removed in Python 3.9)
for order in list(orders1):
    orders.append(order)

tree.write("temp.xml")

# Strip the generated ns0 prefix from the serialized text
with open("temp.xml") as f:
    correct_data = f.read().replace('ns0:', '').replace(':ns0', '')
with open("combined.xml", 'w') as f:
    f.write(correct_data)
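
The missing <Order> tag in the question's output can be traced to `extend` versus `append`. The sketch below uses tiny made-up elements (`Id` and `Status` are placeholder tag names, not part of the Amazon schema) to show the difference: `extend(order)` iterates over the order and adds its children, while `append(order)` adds the order element itself.

```python
import xml.etree.ElementTree as ET


def build_orders():
    """Return a fresh (target, source) pair of <Orders> elements (made-up sample data)."""
    target = ET.fromstring("<Orders><Order><Id>1</Id></Order></Orders>")
    source = ET.fromstring(
        "<Orders><Order><Id>2</Id><Status>Shipped</Status></Order></Orders>"
    )
    return target, source


# extend(order) treats the order as an iterable and adds its children one by one,
# so the <Order> wrapper is lost.
target, source = build_orders()
for order in source:
    target.extend(order)
extended_tags = [child.tag for child in target]

# append(order) adds the <Order> element itself, keeping the wrapper.
target, source = build_orders()
for order in source:
    target.append(order)
appended_tags = [child.tag for child in target]

print(extended_tags)  # ['Order', 'Id', 'Status']
print(appended_tags)  # ['Order', 'Order']
```

An alternative that keeps `extend` is to call it once on the whole source container, `target.extend(source)`, since iterating the container yields the `<Order>` elements.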

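ns0 means "namespace 0". It is a prefix that ElementTree generates when serializing, because the namespace URI has no registered prefix; it is not caused by the namespaces dict or the "resp:tagname" queries.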
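
As an alternative to stripping the ns0 prefix from the text afterwards, the standard library's `ET.register_namespace` can map the URI to the empty prefix so that it is written as a default namespace. The following is a minimal sketch, not the original answer's method. It uses trimmed-down inline stand-ins for a.xml and b.xml, and the root element name `ListOrdersByNextTokenResponse` is an assumption, since the real b.xml is not shown.

```python
import xml.etree.ElementTree as ET

NS = "https://mws.amazonservices.com/Orders/2013-09-01"

# Trimmed-down stand-ins for a.xml and b.xml (sample data for illustration only).
A_XML = f"""<ListOrdersResponse xmlns="{NS}">
  <ListOrdersResult>
    <Orders>
      <Order><AmazonOrderId>12345</AmazonOrderId></Order>
    </Orders>
  </ListOrdersResult>
</ListOrdersResponse>"""

# The root name ListOrdersByNextTokenResponse is an assumption.
B_XML = f"""<ListOrdersByNextTokenResponse xmlns="{NS}">
  <ListOrdersByNextTokenResult>
    <Orders>
      <Order><AmazonOrderId>oid1</AmazonOrderId></Order>
    </Orders>
  </ListOrdersByNextTokenResult>
</ListOrdersByNextTokenResponse>"""


def merge_orders(a_text, b_text):
    """Append every <Order> from b_text to the <Orders> element of a_text."""
    # Serialize NS as the default namespace instead of a generated ns0 prefix.
    ET.register_namespace("", NS)
    ns = {"resp": NS}

    root_a = ET.fromstring(a_text)
    root_b = ET.fromstring(b_text)

    target = root_a.find("resp:ListOrdersResult/resp:Orders", ns)
    source = root_b.find("resp:ListOrdersByNextTokenResult/resp:Orders", ns)

    for order in list(source):
        target.append(order)  # append keeps the <Order> wrapper

    return ET.tostring(root_a, encoding="unicode")


merged = merge_orders(A_XML, B_XML)
print(merged)
```

Note that `register_namespace` changes a module-wide registry, so it affects every later serialization in the same process. The appended orders also keep their original whitespace, so the indentation of the merged output may look uneven.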

That said, I would still suggest using beautifulsoup4 for this; it is nicer for working with XML:

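from bs4 import BeautifulSoup

# The "xml" parser (requires lxml) keeps tag names case-sensitive;
# the default HTML parser would lowercase them in the output.
with open('a.xml') as f:
    soup = BeautifulSoup(f, 'xml')
insertion_point = soup.ListOrdersResult.Orders

with open('b.xml') as f:
    orders_b = BeautifulSoup(f, 'xml').ListOrdersByNextTokenResult.Orders
# could probably just be orders_b = BeautifulSoup(f, 'xml')

# find_all returns a list, so moving the orders while looping is safe
orders_to_insert = orders_b.find_all('Order')
for order in orders_to_insert:
    insertion_point.append(order)
print(soup)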
