Finding and replacing tags in XML with Python

Asked 2025-04-16 22:45

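I asked a similar question before, but this one is a bit different. I want to use Python to find and replace XML tags. I use XML to upload the metadata for some GIS (geographic information system) shapefiles. In the metadata editor I can choose how the dates of data collection are recorded: "Single Date", "Multiple Dates", or "Range of Dates". The first XML below uses the date-range tags: you will see a "rngdates" tag with child elements such as "begdate", "begtime", "enddate", and so on. I want to edit those tags out so the file looks like the second XML, which contains several individual dates. The new tags are "mdattim", "sngdate", and "caldate". I hope this explains it; please ask if you need more information. XML is fairly complex and I don't fully understand it yet.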

Thanks,
Mike

First XML:

<idinfo>
  <citation>
    <citeinfo>
      <origin>My Company Name</origin>
      <pubdate>05/04/2009</pubdate>
      <title>Feature Class Name</title>
      <edition>0</edition>
      <geoform>vector digital data</geoform>
      <onlink>.</onlink>
    </citeinfo>
  </citation>
  <descript>
    <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
    <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
  </descript>
  <timeperd>
    <timeinfo>
      <rngdates>
        <begdate>7/13/2010</begdate>
        <begtime>unknown</begtime>
        <enddate>7/15/2010</enddate>
        <endtime>unknown</endtime>
      </rngdates>
    </timeinfo>
    <current>ground condition</current>
  </timeperd>

Second XML:

<idinfo>
  <citation>
    <citeinfo>
      <origin>My Company Name</origin>
      <pubdate>03/07/2011</pubdate>
      <title>Feature Class Name</title>
      <edition>0</edition>
      <geoform>vector digital data</geoform>
      <onlink>.</onlink>
    </citeinfo>
  </citation>
  <descript>
    <abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
    <purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
  </descript>
  <timeperd>
    <timeinfo>
      <mdattim>
        <sngdate>
          <caldate>08-24-2009</caldate>
          <time>unknown</time>
        </sngdate>
        <sngdate>
          <caldate>08-26-2009</caldate>
        </sngdate>
        <sngdate>
          <caldate>08-26-2009</caldate>
        </sngdate>
        <sngdate>
          <caldate>07-07-2010</caldate>
        </sngdate>
      </mdattim>
    </timeinfo>

Here is my Python code so far:

import glob
import os
import xml.etree.ElementTree as ET

# Raw string so the backslashes in the Windows path are kept literally
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"

# glob already returns full paths, so each file can be parsed directly
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):

    if os.path.isfile(fullpath):
        tree = ET.parse(fullpath)

        # Walk every element (getiterator() was removed in Python 3.9)
        for element in tree.iter():
            print(element.tag)

            if element.tag == "begdate":
                # Strings are immutable: element.tag.replace(...) returns a new
                # string and leaves the tag unchanged, so assign to .tag instead
                element.tag = "sngdate"
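A common pitfall with the `element.tag.replace("begdate", "sngdate")` pattern: Python strings are immutable, so `str.replace()` returns a new string that is simply discarded, and the element keeps its old tag. Assigning to `element.tag` is what actually renames an element. A small self-contained demonstration with the standard-library `xml.etree.ElementTree` on an in-memory snippet (the snippet is illustrative, trimmed from the first XML):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<rngdates><begdate>7/13/2010</begdate></rngdates>")
element = root.find("begdate")

# str.replace() builds a new string; the result is thrown away, tag unchanged
element.tag.replace("begdate", "sngdate")
print(element.tag)  # begdate

# Assigning to .tag is what actually renames the element
element.tag = "sngdate"
print(ET.tostring(root, encoding="unicode"))
# <rngdates><sngdate>7/13/2010</sngdate></rngdates>
```

Renaming alone still does not produce the second XML, because `mdattim`/`sngdate`/`caldate` is a different nesting than `rngdates`/`begdate`; the old subtree has to be removed and the new elements built.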

1 Answer

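I think I managed to get the code working. It lets you edit certain tags when you need to change them in an existing XML file. I did this to create metadata for some GIS shapefiles in a batch script, so that certain date values can be changed depending on whether the data uses a single date, multiple dates, or a date range.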
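Switching between the three date options can be sketched as one helper. This is a hypothetical function (`set_time_period` does not appear in the original post), and it assumes the FGDC CSDGM layout in which `timeinfo` holds exactly one of `sngdate` (single date), `mdattim` (multiple dates), or `rngdates` (range of dates). It also assumes every `sngdate` should get a `time` child, whereas the second XML only has one on the first entry:

```python
import xml.etree.ElementTree as ET

def set_time_period(timeinfo, dates=None, date_range=None, time="unknown"):
    """Rewrite <timeinfo> as a single date, multiple dates, or a date range.

    Hypothetical helper: pass `dates` (a list of date strings) for one or
    more individual dates, or `date_range` as a (begin, end) tuple.
    """
    timeinfo.clear()  # drop whatever date structure was there before
    if date_range is not None:
        begin, end = date_range
        rng = ET.SubElement(timeinfo, "rngdates")
        ET.SubElement(rng, "begdate").text = begin
        ET.SubElement(rng, "begtime").text = time
        ET.SubElement(rng, "enddate").text = end
        ET.SubElement(rng, "endtime").text = time
    elif dates and len(dates) == 1:
        # Assumed FGDC layout: a single date sits directly under timeinfo
        sngdate = ET.SubElement(timeinfo, "sngdate")
        ET.SubElement(sngdate, "caldate").text = dates[0]
        ET.SubElement(sngdate, "time").text = time
    elif dates:
        # Multiple dates: one sngdate per date, wrapped in mdattim
        mdattim = ET.SubElement(timeinfo, "mdattim")
        for date in dates:
            sngdate = ET.SubElement(mdattim, "sngdate")
            ET.SubElement(sngdate, "caldate").text = date
            ET.SubElement(sngdate, "time").text = time
    else:
        raise ValueError("pass either dates or date_range")

# Usage: rebuild a timeinfo element as multiple dates
timeinfo = ET.Element("timeinfo")
set_time_period(timeinfo, dates=["08-24-2009", "08-26-2009", "07-07-2010"])
print([c.text for c in timeinfo.iter("caldate")])
# ['08-24-2009', '08-26-2009', '07-07-2010']
```

In a batch script, the same call would be made on each `timeinfo` element found with `tree.iter("timeinfo")` before writing the file back.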

This page helped me a lot: http://lxml.de/tutorial.html

I still have some work to do, but this answers my original question :) I'm sure this code could be useful in many other applications.

import glob
import os
import xml.etree.ElementTree as ET

# Set workspace location for XML files (raw string keeps the backslashes literal)
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"

# Loop through each file with an .xml extension (glob returns full paths)
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):
    if not os.path.isfile(fullpath):
        continue

    # Parse the XML file into an ElementTree
    tree = ET.parse(fullpath)

    # Find every "timeinfo" element; list() avoids modifying the tree mid-iteration
    for timeinfo in list(tree.iter("timeinfo")):
        # Clear all tags below the "timeinfo" tag (removes rngdates and its children)
        timeinfo.clear()
        # Append the new "mdattim" parent element
        mdattim = ET.SubElement(timeinfo, "mdattim")
        # Create the sngdate/caldate/time sub-elements under mdattim
        sngdate = ET.SubElement(mdattim, "sngdate")
        caldate = ET.SubElement(sngdate, "caldate")
        time_el = ET.SubElement(sngdate, "time")
        # Set text values for tags
        caldate.text = "08-24-2009"
        time_el.text = "unknown"

    # Write the modified tree back to the same file
    tree.write(fullpath)
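The code above hard-codes the new date. A variation that reuses the dates already stored in `rngdates` could look like the following minimal sketch. The function name `range_to_multiple_dates` is hypothetical, and it assumes `begdate` and `enddate` should each become one `sngdate`, with the date strings copied unchanged (so `7/13/2010` is not converted to the `08-24-2009` style). Because newly created elements carry no indentation whitespace, `ET.indent()` (Python 3.9+) is used to re-indent the tree before output:

```python
import xml.etree.ElementTree as ET

def range_to_multiple_dates(timeinfo):
    """Replace <rngdates> under <timeinfo> with an equivalent <mdattim>.

    Hypothetical helper. Assumption: begdate and enddate each become one
    <sngdate><caldate>; the date text is copied without reformatting.
    Returns True if a range was converted.
    """
    rng = timeinfo.find("rngdates")
    if rng is None:
        return False  # nothing to convert in this timeinfo
    dates = [rng.findtext("begdate"), rng.findtext("enddate")]
    timeinfo.remove(rng)
    mdattim = ET.SubElement(timeinfo, "mdattim")
    for date in dates:
        if date:  # skip missing or empty date elements
            sngdate = ET.SubElement(mdattim, "sngdate")
            ET.SubElement(sngdate, "caldate").text = date
    return True

# Trimmed copy of the first XML's time-period section
root = ET.fromstring(
    "<timeperd><timeinfo><rngdates>"
    "<begdate>7/13/2010</begdate><begtime>unknown</begtime>"
    "<enddate>7/15/2010</enddate><endtime>unknown</endtime>"
    "</rngdates></timeinfo><current>ground condition</current></timeperd>"
)

# list() so the tree is not modified while it is being iterated
for timeinfo in list(root.iter("timeinfo")):
    range_to_multiple_dates(timeinfo)

# Re-indent the whole tree in place, two spaces per level (Python 3.9+)
ET.indent(root, space="  ")
print(ET.tostring(root, encoding="unicode"))
```

For real files, the same steps would run on `tree = ET.parse(fullpath)`, followed by `ET.indent(tree)` and `tree.write(fullpath)`.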
