Overwriting an XML file

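Asked 2025-04-16 19:33

I'm trying to parse an XML file with elementtree. The file, however, is exported from MySQL. When the file is created, a database entry such as c:cygwin\bin has its '\b' treated as a backspace character. So I want to remove every '\b' from the XML file so that I can pass it to elementtree.parse(). But for some reason, after removing all the '\b' characters, the whole file is not written back out.

Here is what I am doing: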

def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = map(lambda line: line.replace("\b", " "), file)

    #go to the beginning of the file
    file.seek(0);

    #overwrite with correct data
    file.writelines(lines)
    sys.exit()


'''Entry into the program'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\\b") #search for '\b'
    if(p.match(line)):
        processing = True
        break #only one match needed

if processing:
    preprocess(xml_file)

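As a result, the XML file ends up with its header truncated, so the parser fails when the file is passed to it.

This is the part that gets cut off from the XML file: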

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ROOT SYSTEM "diskreport.dtd">
<ROOT>
    <row>
      <field name="buildid">26960</field>
      <field name="cast(status as char)">Filesystem           1K-blocks      Used Available Use% Mounted on
C:cygwinin        285217976  88055920 197162056  31% /usr/bin

Any help or ideas would be greatly appreciated.

Thanks

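1 Answer

I found the problem: I was using p.match to look for '\b', but I should have been using p.search. p.match only matches at the beginning of the line, whereas p.search scans the whole line and returns the first match wherever it occurs.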
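To make the difference concrete, here is a minimal sketch in Python 3. The sample line is made up for illustration. It also shows a related caveat: the pattern string "\\b" is the regex word-boundary assertion, not a literal backspace, so a check built on it can succeed on lines that contain no backspace at all. The backspace character itself is "\x08".

```python
import re

# Hypothetical sample line with a literal backspace (\x08) in the middle.
line = "C:cygwin\x08in  285217976"

backspace = re.compile("\x08")           # matches the literal backspace character
print(backspace.match(line))             # None: match() only tries position 0
print(backspace.search(line).start())    # 8: search() scans the whole line

# Caveat: "\\b" (the regex \b) is a word-boundary assertion, not a backspace.
word_boundary = re.compile("\\b")
print(word_boundary.search("<?xml") is not None)  # True, although no backspace is present
```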

Solution:

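import re
import sys

def preprocess(file):
    # Exporting from MySQL Query Browser adds a stray backspace
    # character to the result set; remove it so the XML parser
    # can read the data.
    print("in preprocess")

    # Rewind first: the detection loop has already consumed part of the file
    file.seek(0)
    lines = [line.replace("\b", "") for line in file]

    # Go back to the beginning of the file
    file.seek(0)

    # Overwrite with the corrected data and drop any leftover bytes
    file.writelines(lines)
    file.truncate()
    sys.exit()


'''Entry into the program'''
# xml_file is assumed to be opened elsewhere in read/write mode ("r+")
# Test the file to see if processing is needed before parsing
processing = False
p = re.compile("\b")  # literal backspace; "\\b" would be a regex word boundary
for line in xml_file:
    if p.search(line):  # changed from p.match to p.search here
        processing = True
        break  # only one match needed

if processing:
    preprocess(xml_file)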
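As an alternative to rewriting through an already-open handle, here is a minimal sketch (Python 3) that reads the whole file, strips the backspaces, rewrites it, and only then parses it. The function names `strip_backspaces` and `parse_cleaned` are hypothetical, and UTF-8 is an assumption taken from the XML declaration shown in the question. Reopening the file in "w" mode truncates it, so no stale bytes are left behind when the cleaned text is shorter than the original.

```python
import xml.etree.ElementTree as ET

BACKSPACE = "\x08"  # the literal backspace character

def strip_backspaces(path):
    """Remove backspace characters from the file at `path`, in place.

    Returns True if the file was rewritten, False if it was already clean.
    """
    # newline="" keeps the original line endings untouched.
    with open(path, "r", encoding="utf-8", newline="") as f:
        text = f.read()
    if BACKSPACE not in text:
        return False
    # Mode "w" truncates the file before the cleaned text is written.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text.replace(BACKSPACE, ""))
    return True

def parse_cleaned(path):
    """Clean the file if needed, then parse it with ElementTree."""
    strip_backspaces(path)
    return ET.parse(path)
```

Calling `parse_cleaned(path)` with the path of the exported file would return an `ElementTree` object. Reading the whole file into memory is a simplification that suits exports of modest size; a very large export might be better processed line by line into a temporary file.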

Answered 2025-04-16 by Python大师
