Overwriting an XML file
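I'm trying to parse an XML file with elementtree. The XML file is exported from MySQL, though. When the file is created, if the database contains an entry like c:cygwin\bin, the '\b' is treated as a backspace character. So I want to remove every '\b' from the XML file so that I can pass it to elementtree.parse(). For some reason, after removing all the '\b' characters, the whole file is not written back out.
Here is what I'm doing: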
def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = map(lambda line: line.replace("\b", " "), file)
    #go to the beginning of the file
    file.seek(0)
    #overwrite with correct data
    file.writelines(lines)
    sys.exit()

'''Entry into the program'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\\b") #search for '\b'
    if(p.match(line)):
        processing = True
        break #only one match needed
if processing:
    preprocess(xml_file)
The result is an XML file with its header chopped off, so the parser fails when I hand the file to it.
This is the part that gets cut from the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ROOT SYSTEM "diskreport.dtd">
<ROOT>
<row>
<field name="buildid">26960</field>
<field name="cast(status as char)">Filesystem 1K-blocks Used Available Use% Mounted on
C:cygwinin 285217976 88055920 197162056 31% /usr/bin
Any help or ideas would be much appreciated,
Thanks
1 Answer
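I found the problem: I was using p.match to look for '\b', when I should have been using p.search. p.match only matches at the beginning of the line, whereas p.search scans the line and returns the first match anywhere in it.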
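To make the match/search difference concrete, here is a minimal sketch. The sample line is made up, and the pattern is written as "\x08" (a literal backspace character), which is an assumption for illustration rather than the pattern used in this post.

```python
import re

# Hypothetical sample line; "\x08" is the backspace character that "\b"
# denotes in an ordinary Python string literal.
line = "C:cygwin\x08in 285217976"

# A pattern consisting of the literal backspace character.
backspace = re.compile("\x08")

# match() only tries at position 0, so it finds nothing in this line.
at_start = backspace.match(line)

# search() scans forward and returns the first occurrence anywhere.
anywhere = backspace.search(line)

print(at_start is None)
print(anywhere.start())
```

For a plain "is it in there" test, `"\x08" in line` would do the same job without a regex.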
Solution:
def preprocess(file):
    #exporting from MySQL query browser adds a weird
    #character to the result set, remove it
    #so the XML parser can read the data
    print "in preprocess"
    lines = map(lambda line: line.replace("\b", ""), file)
    #go to the beginning of the file
    file.seek(0)
    #overwrite with correct data
    file.writelines(lines)
    sys.exit()

'''Entry into the program'''
#test the file to see if processing is needed before parsing
for line in xml_file:
    p = re.compile("\\b")
    if(p.search(line)): ####Changed to p.search here
        processing = True
        break #only one match needed
if processing:
    preprocess(xml_file)
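A few caveats that may be worth checking against the code above:

- In the re module, the pattern built from "\\b" is a word-boundary assertion, not a backspace, so the test can fire on almost any line.
- The detection loop has already consumed lines from the file object before preprocess() reads from it, so those lines are not part of what gets written back.
- Writing shorter data over an existing file without truncating can leave old bytes at the end.

The following is only a sketch of one way around these points, assuming the file fits in memory. `strip_backspaces` and the demo file contents are hypothetical, not part of the original code.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def strip_backspaces(path):
    """Remove backspace bytes (0x08) from the file at `path`, in place.

    Hypothetical helper for illustration. Returns True if the file changed.
    """
    # Binary mode, so nothing but the backspace bytes is altered.
    with open(path, "r+b") as f:
        data = f.read()  # read everything first, starting at offset 0
        cleaned = data.replace(b"\x08", b"")
        if cleaned == data:
            return False
        f.seek(0)
        f.write(cleaned)
        # The cleaned data is shorter, so drop the leftover tail.
        f.truncate()
    return True


# Demo on a throwaway file (contents made up for illustration).
fd, demo_path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n'
            b'<ROOT>C:cygwin\x08in</ROOT>\n')

changed = strip_backspaces(demo_path)
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)

# With the backspace gone, the parser accepts the document.
root = ET.fromstring(result)
print(changed, root.tag, root.text)
```

Reading the whole file before seeking back means no separate detection pass is needed, and the header is part of what gets rewritten.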