XML to CSV in Python: extracting a series of child nodes for each node

Posted 2024-04-25 23:59:12


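My goal is to convert an .XML file into a .CSV file. That part of the code already works.

However, I also want to extract the child nodes of one of the "parent" nodes.

An example probably illustrates the problem better.

Here is the structure of my XML: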

```xml
<nedisCatalogue>
  <headerInfo>
    <feedVersion>1-0</feedVersion>
    <dateCreated>2018-01-22T23:37:01+0100</dateCreated>
    <supplier>Nedis_BENED</supplier>
    <locale>nl_BE</locale>
  </headerInfo>
  <productList>
    <product>
      <nedisPartnr><![CDATA[VS-150/63BA]]></nedisPartnr>
      <nedisArtlid>17005</nedisArtlid>
      <vendorPartnr><![CDATA[TONFREQ-ELKOS / BIPOL 150, 5390]]></vendorPartnr>
      <brand><![CDATA[Visaton]]></brand>
      <EAN>4007540053905</EAN>
      <intrastatCode>8532220000</intrastatCode>
      <UNSPSC>52161514</UNSPSC>
      <headerText><![CDATA[Crossover Foil capacitor]]></headerText>
      <internetText><![CDATA[Bipolaire elco met een ruwe folie en een zeer goede prijs/kwaliteits-verhouding voor de bouw van cross-overs. 63 Vdc, 10% tolerantie.]]></internetText>
      <generalText><![CDATA[Dimensions 16 x 35 mm    
    ]]></generalText>
      <images>
        <image type="2" order="15">767736.JPG</image>
      </images>
      <attachments>
      </attachments>
      <categories>
        <tree name="Internet_Tree_ISHP">
          <entry depth="001" id="1067858"><![CDATA[Audio]]></entry>
          <entry depth="002" id="1067945"><![CDATA[Speakers]]></entry>
          <entry depth="003" id="1068470"><![CDATA[Accessoires]]></entry>
        </tree>
      </categories>
      <properties>
        <property id="360" multiplierID="" unitID="" valueID=""><![CDATA[...]]></property>
      </properties>
      <status>
        <code status="NORMAL"></code>
      </status>
      <packaging quantity="1" weight="8"></packaging>
      <introductionDate>2015-10-26</introductionDate>
      <serialnumberKeeping>N</serialnumberKeeping>
      <priceLevels>
        <normalPricing from="2017-02-13" to="2018-01-23">
          <price level="1" moq="1" currency="EUR">2.48</price>
        </normalPricing>
        <specialOfferPricing></specialOfferPricing>
        <goingPriceInclVAT currency="EUR" quantity="1">3.99</goingPriceInclVAT>
      </priceLevels>
      <tax>
      </tax>
      <stock>
        <inStockLocal>25</inStockLocal>
        <inStockCentral>25</inStockCentral>
        <ATP>
          <nextExpectedStockDateLocal></nextExpectedStockDateLocal>
          <nextExpectedStockDateCentral></nextExpectedStockDateCentral>
        </ATP>
      </stock>
    </product>
  ....
</nedisCatalogue>
```
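For reference, here is a minimal sketch of walking this structure with the standard library's `xml.etree.ElementTree`. It assumes a trimmed copy of the sample (one product, three fields) held in a string; the name `SAMPLE` is hypothetical. ElementTree returns the contents of `<![CDATA[...]]>` sections as ordinary text, so they need no special handling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample catalogue (assumption: one product, a few fields)
SAMPLE = """<nedisCatalogue>
  <headerInfo><locale>nl_BE</locale></headerInfo>
  <productList>
    <product>
      <nedisPartnr><![CDATA[VS-150/63BA]]></nedisPartnr>
      <brand><![CDATA[Visaton]]></brand>
      <EAN>4007540053905</EAN>
    </product>
  </productList>
</nedisCatalogue>"""

root = ET.fromstring(SAMPLE)

# Look up <productList> by tag name rather than by its position under the root
products = root.find('productList').findall('product')

for product in products:
    # CDATA content comes back as plain text
    print(product.findtext('nedisPartnr'), product.findtext('brand'), product.findtext('EAN'))
```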

Here is the code I have so far:


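If you run the code, you will see that I only retrieve the first "entry" of categories/tree; that is expected. However, I don't know how to write a loop that creates a new column for each "entry" under "categories", for example Category1, Category2 and Category3, whose values are the texts of those "entry" elements.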
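Since the original code is not shown, this is only a guess at the cause: getting just the first entry is exactly what `Element.find()` does, because it stops at the first match, while `Element.findall()` returns every match. The sketch below, assuming a product trimmed down to its `<categories>` element, contrasts the two and derives one `CategoryN` header per entry.

```python
import xml.etree.ElementTree as ET

# Assumption: a single <product> trimmed to its <categories> element
product = ET.fromstring("""<product>
  <categories>
    <tree name="Internet_Tree_ISHP">
      <entry depth="001" id="1067858"><![CDATA[Audio]]></entry>
      <entry depth="002" id="1067945"><![CDATA[Speakers]]></entry>
      <entry depth="003" id="1068470"><![CDATA[Accessoires]]></entry>
    </tree>
  </categories>
</product>""")

# find() stops at the first match, so this yields only one category
first = product.find('categories/tree/entry').text

# findall() returns every match: one value per future CSV column
categories = [entry.text for entry in product.findall('categories/tree/entry')]
headers = ['Category' + str(i) for i in range(1, len(categories) + 1)]

print(first)       # Audio
print(headers)     # ['Category1', 'Category2', 'Category3']
print(categories)  # ['Audio', 'Speakers', 'Accessoires']
```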

My result should look like this:

| Nedis Part Number | Nedis Article ID | Vendor Part Number | Brand | EAN | Header text | Internet Text | General Text | Category1 | Category2 | Category3 |
|---|---|---|---|---|---|---|---|---|---|---|
| VS-150/63BA | 17005 | TONFREQ-ELKOS / BIPOL 150, 5390 | Visaton | 4,00754E+12 | Crossover Foil capacitor | Bipolaire elco … | Dimensions 16 x 35 mm | Audio | Speakers | Accessoires |
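One detail worth checking in this target row: the vendor part number contains a comma. The sketch below, using a subset of the columns with values copied from the sample product, writes the header and row with `csv.writer` into an in-memory buffer. The default dialect quotes that field automatically, so the comma does not split the column. The CSV itself keeps all 13 digits of the EAN; `4,00754E+12` above is most likely how a spreadsheet program displays such a long number.

```python
import csv
import io

# Subset of the target columns; values copied from the sample product
header = ['Nedis Part Number', 'Nedis Article ID', 'Vendor Part Number', 'Brand', 'EAN',
          'Category1', 'Category2', 'Category3']
row = ['VS-150/63BA', '17005', 'TONFREQ-ELKOS / BIPOL 150, 5390', 'Visaton',
       '4007540053905', 'Audio', 'Speakers', 'Accessoires']

buf = io.StringIO()
writer = csv.writer(buf)  # default 'excel' dialect: comma-separated, minimal quoting
writer.writerow(header)
writer.writerow(row)
print(buf.getvalue())
```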

I have tried my best but couldn't find a solution.

Any help would be greatly appreciated! :)

Many thanks,

艾伦


1 answer
Anonymous user
Answer #1 · Posted 2024-04-25 23:59:12

I think this is what you want:

```python
# time is the current <product> element; row is the CSV row being built
for child in time.find('categories').find('tree'):
    categ = child.text
    row.append(categ)
```
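One caveat with chaining `find()` calls: if a product has no `<categories>` element, `time.find('categories')` returns `None` and the next `.find()` raises `AttributeError`. A minimal sketch of the same loop using a `findall()` path instead, which simply yields nothing in that case; `append_categories` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def append_categories(product, row):
    """Append each category entry's text to row (hypothetical helper)."""
    # product.find('categories') would be None when the element is missing,
    # and calling .find() on None raises AttributeError; findall() returns []
    for entry in product.findall('categories/tree/entry'):
        row.append(entry.text)
    return row

with_cats = ET.fromstring(
    '<product><categories><tree><entry>Audio</entry><entry>Speakers</entry></tree></categories></product>')
without_cats = ET.fromstring('<product><images/></product>')

print(append_categories(with_cats, ['VS-150/63BA']))  # ['VS-150/63BA', 'Audio', 'Speakers']
print(append_categories(without_cats, ['X']))         # ['X']
```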

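Below is a solution that first loops through the XML once to determine how many header columns to add, adds those headers, and then loops over each product's category list: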

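**Update:** the solution now iterates over the images as well as the categories. The main difference is a second counting pass over each product's `<images>` element to size the Image columns, plus the matching padding step, both shown in the complete solution below.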

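It works out the maximum number of categories on any single record and then creates that many columns. If a particular record has fewer categories, the code inserts blank values as placeholders so that the column headers always stay aligned with the data.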
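The padding step on its own can be sketched as a small function: find the widest record, then extend every shorter record with empty strings up to that width. `pad` is a hypothetical helper, not part of the answer's code.

```python
def pad(values, width, filler=''):
    """Return values extended with filler up to width (hypothetical helper)."""
    # If values is already at least width long, [filler] * negative is [] and nothing is added
    return list(values) + [filler] * (width - len(values))

rows = [['A', 'B', 'C'], ['D']]
width = max(len(r) for r in rows)  # maximum number of categories on any record
padded = [pad(r, width) for r in rows]
print(padded)  # [['A', 'B', 'C'], ['D', '', '']]
```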

For example:

| Cat1 | Cat2 | Cat3 | Img1 | Img2 | Img3 |
|---|---|---|---|---|---|
| A | B | C | 1 | 2 | 3 |
| D | E | (blank) | 4 | (blank) | (blank) |
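An alternative worth knowing about, not the approach used in this answer: `csv.DictWriter` fills any column a record does not supply with its `restval` value, so rows can be built as dictionaries and the padding happens on write. The sketch uses the sample data from the table above; in the real script the `fieldnames` list would be built from the counted maximums.

```python
import csv
import io

# Each record only has the columns it actually uses (data from the table above)
records = [
    {'Cat1': 'A', 'Cat2': 'B', 'Cat3': 'C', 'Img1': '1', 'Img2': '2', 'Img3': '3'},
    {'Cat1': 'D', 'Cat2': 'E', 'Img1': '4'},
]
fieldnames = ['Cat1', 'Cat2', 'Cat3', 'Img1', 'Img2', 'Img3']

buf = io.StringIO()
# restval='' writes an empty cell for every field missing from a record
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```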

Here is the complete solution:

```python
import csv
import xml.etree.ElementTree as ET

tree = ET.parse("c:\\python\\xml.xml")
root = tree.getroot()

# Fixed columns: (CSV header, XML tag read from each <product>)
columns = [
    ('Nedis Part Number', 'nedisPartnr'),
    ('Nedis Article ID', 'nedisArtlid'),
    ('Vendor Part Number', 'vendorPartnr'),
    ('Brand', 'brand'),
    ('EAN', 'EAN'),
    ('Header text', 'headerText'),
    ('Internet Text', 'internetText'),
    ('General Text', 'generalText'),
]
head = [title for title, _ in columns]

# Find <productList> by name rather than by position (root[1])
prdlist = root.find('productList')
products = prdlist.findall('product')

# First pass: the largest number of categories / images on any single product.
# findall() returns an empty list when a product has no such children.
maxcat = max((len(p.findall('categories/tree/entry')) for p in products), default=0)
maximg = max((len(p.findall('images/image')) for p in products), default=0)

head += ['Category ' + str(n + 1) for n in range(maxcat)]
head += ['Image ' + str(n + 1) for n in range(maximg)]

# newline='' is what the csv module expects; without it Windows gets blank lines between rows
with open('c:\\python\\xml.csv', 'w', newline='', encoding='utf-8') as f:
    csvwriter = csv.writer(f, delimiter=',')
    csvwriter.writerow(head)

    for product in products:
        # findtext() returns the given default ('') when a tag is missing
        row = [product.findtext(tag, default='') for _, tag in columns]

        # Pad with blanks so every row has exactly maxcat category cells ...
        cats = [entry.text for entry in product.findall('categories/tree/entry')]
        row += cats + [''] * (maxcat - len(cats))

        # ... and maximg image cells, keeping the data aligned with the header
        imgs = [img.text for img in product.findall('images/image')]
        row += imgs + [''] * (maximg - len(imgs))

        csvwriter.writerow(row)
```
