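Accessing files inside folders in a zip file
I want to access the XML files inside an archive (a zip file) so that I can filter them. However, I don't know how to reach files that sit inside folders within the archive. My problem is that when the files are in folders, I cannot access them through zip_file.namelist. Here is my code: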
import sys, getopt
from lxml import etree
from io import StringIO
import zipfile

def main(argv):
    inputfile = ''
    outputfile = ''
    try:
        opts, args = getopt.getopt(argv,"hi:o:",["ifile=","ofile="])
    except getopt.GetoptError:
        print 'test.py -i <inputfile> -o <outputfile>'
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print 'test.py -i <inputfile> -o <outputfile>'
            sys.exit()
        elif opt in ("-i", "--ifile"):
            inputfile = arg
        elif opt in ("-o", "--ofile"):
            outputfile = arg
    archive = zipfile.ZipFile(inputfile, 'r')
    with archive as zip_file:
        for file in zip_file.namelist():
            if file.endswith(".amd"):
                try:
                    print("Process the file")
                    xslt_root = etree.XML('''\
                    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
                    <xsl:template match="node() | @*">
                    <xsl:copy>
                    <xsl:apply-templates select="node() | @*"/>
                    </xsl:copy>
                    </xsl:template>
                    <xsl:template match="TimeStamp"/>
                    <xsl:template match="@timeStamp"/>
                    <xsl:template match="TimeStamps"/>
                    <xsl:template match="Signature"/>
                    </xsl:stylesheet>
                    ''')
                    transform = etree.XSLT(xslt_root)
                    doc = etree.parse(zip_file.open(file))
                    result_tree = transform(doc)
                    resultfile = unicode(str(result_tree))
                    zip_file.write(resultfile)
                finally:
                    zip_file.close()

if __name__=='__main__':
    main(sys.argv[1:])
Error 1: it cannot read "ex4_linktime/" because that is a folder, not a file!
File "parser.pxi", line 1110, in lxml.etree._BaseParser._parseDocFromFile (src\lxml\lxml.etree.c:96832)
File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:91290)
File "parser.pxi", line 683, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:92476)
File "parser.pxi", line 620, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:91737)
IOError: Error reading file 'ex4_linktime/': failed to load external entity "ex4_linktime/"
Error 2: the modified file is not written back!
File "C:\Python27\lib\zipfile.py", line 1033, in write
st = os.stat(filename)
WindowsErrorProcess the file
: [Error 3] The system cannot find the path specified: u'<?xml version="1.0"? >\n<ComponentData toolVersion="V6.1.4" schemaVersion="6.1.0.0">\n\t<DataSet name="Bank1">...
1 Answer
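When you call
etree.parse(file)
file is just a string. etree has no way of knowing that the name refers to an entry inside the zip file, so it treats it as a path on disk, resolved against the current directory. Pass it an open file object instead: doc = etree.parse(zip_file.open(file))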
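To make the difference concrete, here is a minimal sketch that parses a member by handing the parser a file object from zip_file.open(). It assumes an in-memory archive with a made-up member name (ex4_linktime/demo.amd), and it uses the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs without extra packages; lxml's etree.parse accepts a file object in the same way.

```python
import io
import zipfile
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree in this sketch


def build_demo_archive():
    """Return an in-memory zip with one XML file inside a folder (demo data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr(
            "ex4_linktime/demo.amd",  # hypothetical member name
            "<ComponentData><TimeStamp/></ComponentData>",
        )
    buffer.seek(0)
    return buffer


def parse_member(zip_file, name):
    """Parse one archive member by giving the parser a file object, not a name."""
    with zip_file.open(name) as member:
        return ET.parse(member)


with zipfile.ZipFile(build_demo_archive()) as zip_file:
    tree = parse_member(zip_file, "ex4_linktime/demo.amd")
    print(tree.getroot().tag)
```

The member name passed to open() is the full path as reported by namelist(), folder prefix included.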
You also need to skip directory names, which end with a slash:
for filename in zip_file.namelist():
    if filename.endswith('/'):  # skip directory names
        continue
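As a rough illustration of that filter, the sketch below builds a small archive in memory and selects only the regular files with a given suffix. The helper name xml_members and the member names are made up for the example. Note that namelist() already returns the full path of every entry, however deeply nested, so no separate "descend into folders" step is needed.

```python
import io
import zipfile


def xml_members(zip_file, suffix=".amd"):
    """Return names of regular files with the given suffix, at any folder depth."""
    return [
        name
        for name in zip_file.namelist()
        if not name.endswith("/") and name.endswith(suffix)
    ]


# Demo archive built in memory; the member names are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ex4_linktime/", "")               # explicit directory entry
    zf.writestr("ex4_linktime/a.amd", "<a/>")
    zf.writestr("ex4_linktime/sub/b.amd", "<b/>")  # nested one level deeper
    zf.writestr("readme.txt", "not xml")

with zipfile.ZipFile(buffer) as zip_file:
    selected = xml_members(zip_file)
    print(selected)
```

On Python 3.6 and later, ZipInfo.is_dir() on the entries from infolist() is an alternative to the trailing-slash check.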
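To update the zip file, use writestr, which takes the member name and the data itself (write expects the path of a file on disk, which is why it tried to os.stat your XML string):
zip_file.writestr(filename, resultfile)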
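Two caveats apply: writestr needs an archive opened in a writable mode ('w' or 'a'), whereas the question opens it with 'r'; and in 'a' mode writing an existing name adds a second entry instead of replacing the first. One common workaround is to write a new archive, copying every member and transforming the matching ones. The following is only a sketch under those assumptions: rewrite_archive and drop_timestamps are hypothetical helpers, and drop_timestamps stands in for the XSLT step using the standard library instead of lxml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET  # stdlib stand-in for the lxml/XSLT step


def drop_timestamps(xml_bytes):
    """Hypothetical stand-in for the XSLT: remove <TimeStamp> children of the root."""
    root = ET.fromstring(xml_bytes)
    for node in root.findall("TimeStamp"):
        root.remove(node)
    return ET.tostring(root)


def rewrite_archive(source, target, transform, suffix=".amd"):
    """Copy every member from source to target, transforming the matching files."""
    with zipfile.ZipFile(source, "r") as zin, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zout:
        for name in zin.namelist():
            data = zin.read(name)
            # Directory entries end with "/" and are copied through unchanged.
            if not name.endswith("/") and name.endswith(suffix):
                data = transform(data)
            zout.writestr(name, data)


# Demo input built in memory; the member names are illustrative only.
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as zf:
    zf.writestr("ex4_linktime/", "")
    zf.writestr(
        "ex4_linktime/demo.amd",
        '<ComponentData><TimeStamp/><DataSet name="Bank1"/></ComponentData>',
    )
    zf.writestr("notes.txt", "untouched")

target = io.BytesIO()
rewrite_archive(source, target, drop_timestamps)

with zipfile.ZipFile(target) as zf:
    names = zf.namelist()
    cleaned = zf.read("ex4_linktime/demo.amd")
    notes = zf.read("notes.txt")
```

With real files, source and target would be two different paths, and the new archive could replace the old one once it has been written successfully. Copying by name, as here, does not preserve the original timestamps or permissions of the members.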