使用Python在XML/文本文件中搜索替换多行

Question

---更新 3：

我已经完成了一个脚本，可以把需要的数据更新到xml文件里，但写入的文件中有一段代码却没有被包含进去。这是为什么呢？我该怎么替换它呢？

<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>

目前的代码可以正常工作（除了上面提到的问题）。

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

path=os.getcwd()
arcpy.env.workspace = path

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)
zone="_Zone"

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_BaseMetadata.xml"

    check_meta=os.listdir(path)
    if FileNm+'.xml' in check_meta:
        shutil.copy2(FileNm+'.xml', newMetaFile)
    else:
        shutil.copy2('L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    print "Processing: "+str(File)

    for node in tree.findall('.//title'):
        node.text = str(FileNm)
    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = str(os.getcwd()+"\\"+File)
    for node in tree.findall('.//native/digform/formname'):
        node.text = str(FileDesc_obj.featureType)
    for node in tree.findall('.//avlform/nondig/formname'):
        node.text = str(FileDesc_obj.extension)
    for node in tree.findall('.//avlform/digform/formname'):
        node.text = str(float(os.path.getsize(File))/int(1024))+" KB"
    for node in tree.findall('.//theme'):
        node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
    print node.text
    projection_info=[]
    Zone=FileDesc_obj.spatialReference.name

    if "GCS" in str(FileDesc_obj.spatialReference.name):
        projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
        print "Geographic Coordinate system"
    else:
        projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
        print "Projected Coordinate system"
    x=0
    for node in tree.findall('.//spdom'):
        for node2 in node.findall('.//keyword'):
            print node2.text
            node2.text = str(projection_info[x])
            print node2.text
            x=x+1


    tree.write(newMetaFile)

---更新 1和2：

多亏了Aleyna，我得到了以下基本代码，它可以正常运行。

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

CodeString=['northbc','southbc', '<nondig><formname>']

nondig='nondigital'
path=os.getcwd()
arcpy.env.workspace = path
xmlfile = path+"\\test.xml"

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_Metadata.xml"
    shutil.copy2('L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = nondig

    tree.write(newMetaFile)

问题出在处理像这样的xml代码上：

- <spdom>
  <keyword thesaurus="">GDA94</keyword> 
  <keyword thesaurus="">GRS80</keyword> 
  <keyword thesaurus="">Transverse Mercator</keyword> 
  <keyword thesaurus="">Zone 55 (144E - 150E)</keyword> 
  </spdom>

因为关键字“thes...在<spdom>中并不是唯一的，我们能否按照来自以下值的顺序更新这些内容：

FileDesc_obj.spatialReference.name

u'GCS_GDA_1994'

---原始帖子---

我正在建立一个程序，从我们库里的空间文件生成xml元数据文件。我已经创建了脚本，从文件中提取所需的空间和属性数据，并创建了一个shp和文本文件的索引，但现在我想把这些信息写入一个基础的元数据xml文件，这个文件是按照anzlic标准写的，通过替换一些公共或静态元素的值……

举个例子，我想替换以下的xml代码：

<northbc>8097970</northbc>
<southbc>8078568</southbc>

为：

<northbc> GeneratedValue_[desc.extent.XMax] /<northbc>
<southbc> GeneratedValue_[desc.extent.XMax] </southbc>

问题是，显然<tag>和</tag>之间的数字或值不会是相同的。

同样，对于像<title>、<nondig>、<formname>这样的xml标签……在后面的例子中，这两个标签必须一起搜索，因为formname出现了多次（并不是唯一的）。

我正在使用Python正则表达式手册[在这里][1]。

正则表达式 xml处理文件操作数据更新标签搜索多行替换元数据生成 anzlic标准

使用Python在XML/文本文件中搜索替换多行

3 个回答

撰写回答