Unicode: Python/lxml file output not as expected (print vs. write)

from lxml import etree

file_name = input('Enter the file name, including .xml extension: ')
print('Parsing ' + file_name)

parser = etree.XMLParser()
tree = etree.parse(file_name, parser)
root = tree.getroot()
nsmap = {'xmlns': 'urn:tva:metadata:2010'}

# Write one "crid|title|synopsis" line per ProgramInformation element
with open(file_name + '.log', 'w', encoding='utf-8') as f:
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
        crid = info.get('programId')
        titlex = info.find('.//xmlns:Title', namespaces=nsmap)
        title = titlex.text if titlex is not None else 'Missing'
        synopsis1x = info.find('.//xmlns:Synopsis[1]', namespaces=nsmap)
        synopsis1 = synopsis1x.text if synopsis1x is not None else 'Missing'
        synopsis1 = synopsis1.replace('\r', '').replace('\n', '')
        f.write('{}|{}|{}\n'.format(crid, title, synopsis1))

2 answers

User

Answer 1 · edited 2024-04-25 13:07:56

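Your code looks fine, so I think your input is invalid. Assuming you are viewing the output file with a UTF-8 viewer or shell, I suspect the encoding named in the <?xml ... ?> declaration does not match the file's actual encoding.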
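One way to test that suspicion is to read the raw bytes and check whether they are valid in the declared encoding. The sketch below is a rough check, not a full XML encoding sniffer: the helper names declared_encoding and decodes_as and the sample bytes are hypothetical, and it only handles an optional UTF-8 byte-order mark. A failed decode proves a mismatch, but a successful one proves little, because single-byte encodings such as ISO-8859-1 accept any byte. (With lxml, tree.docinfo.encoding also reports the declared encoding.)

```python
import re

# Encoding pseudo-attribute of an XML declaration at the very start of the data
_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def declared_encoding(data):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    if data.startswith(b"\xef\xbb\xbf"):  # skip a UTF-8 byte-order mark
        data = data[3:]
    match = _DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None


def decodes_as(data, encoding):
    """True if the bytes are valid in `encoding` (Latin-1 always is)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical input: declared UTF-8, but the text bytes are ISO-8859-2
sample = b'<?xml version="1.0" encoding="UTF-8"?><Title>\xa3\xf3d\xbc</Title>'
enc = declared_encoding(sample) or "utf-8"  # XML defaults to UTF-8
print(enc, decodes_as(sample, enc))  # utf-8 False
```

For a real file, pass open(file_name, 'rb').read() instead of sample; a False result means the declaration cannot be right.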

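A mismatched declaration would also explain why printing works while writing to the file does not. If your shell/IDE is set to ISO-8859-2 and your input XML really is ISO-8859-2 as well, print ends up emitting the original bytes, so the terminal displays the text correctly; the file, written as UTF-8, contains the mis-decoded characters instead.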
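The effect can be reproduced in a few lines. This is a sketch under stated assumptions: the sample text and bytes are made up, the standard library's xml.etree.ElementTree stands in for lxml so it runs without extra packages (lxml should honour the declaration the same way), and "print emits the original bytes" is modelled by re-encoding with Latin-1. In Python 3, print() actually encodes with sys.stdout.encoding, so what you see depends on how the console is configured.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the text is stored as ISO-8859-2 bytes, but the
# declaration claims ISO-8859-1, i.e. the suspected mismatch.
original = "\u0141\u00f3d\u017a"  # "Łódź"
xml_bytes = (b'<?xml version="1.0" encoding="ISO-8859-1"?><Title>'
             + original.encode("iso-8859-2") + b"</Title>")

# The parser trusts the declaration and decodes every byte as Latin-1.
title = ET.fromstring(xml_bytes).text
print(ascii(title))  # '\xa3\xf3d\xbc', i.e. "£ód¼" rather than "Łódź"

# Re-encoding with the codec that was wrongly used gives back the original
# bytes, which an ISO-8859-2 terminal would render as "Łódź".
restored = title.encode("latin-1")
print(restored == original.encode("iso-8859-2"))  # True

# Writing with encoding='utf-8' instead stores the mis-decoded characters.
utf8_bytes = title.encode("utf-8")
print(utf8_bytes.decode("utf-8") == original)  # False
```

The fix in such a case is on the input side: correct the encoding declaration (or re-save the XML in the declared encoding) rather than changing how the output file is written.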

User

Answer 2 · edited 2024-04-25 13:07:56

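I've had plenty of trouble with this before, but the solution is fairly simple. The documentation has a chapter on reading and writing files with Unicode, and this Python talk is also very helpful for understanding the problem. Unicode can be painful, but it gets much easier once you move to Python 3.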

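import codecs

# codecs.open encodes on write and decodes on read (Python 2 and 3);
# in Python 3 the built-in open(..., encoding='utf-8') does the same job.
f = codecs.open('test', encoding='utf-8', mode='w+')
f.write(u'\u4500 blah blah blah\n')
f.seek(0)
print(repr(f.readline()[:1]))
f.close()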
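For comparison, a Python 3 sketch of the same round trip using only the built-in open(), which is also what the question's code already does. The temporary directory is an assumption added so the example does not leave a stray file named test behind; reading the file back in binary mode shows the UTF-8 bytes that were actually written.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test")  # same file name as the answer's example

    # Text mode with an explicit encoding: str in, UTF-8 bytes on disk.
    with open(path, mode="w+", encoding="utf-8") as f:
        f.write("\u4500 blah blah blah\n")
        f.seek(0)  # rewind and read the text back, decoded from UTF-8
        first_char = f.readline()[:1]

    # Binary mode shows the bytes that were actually written.
    with open(path, "rb") as f:
        raw = f.read()

print(ascii(first_char))  # '\u4500'
print(raw[:3])            # b'\xe4\x94\x80'
```

If the log file written this way still looks wrong in a UTF-8 viewer, the characters were most likely already wrong when the XML was parsed, which points back to the input encoding.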
