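Encoding error when writing XML to CSV in Python
I'm trying to convert an XML file to CSV, but the XML file is encoded as "ISO-8859-1" and apparently contains characters that are outside the ascii codec Python uses when writing the rows.
I get this error: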
Traceback (most recent call last):
  File "convert_folder_to_csv_PLAYER.py", line 139, in <module>
    xml2csv_PLAYER(filename)
  File "convert_folder_to_csv_PLAYER.py", line 121, in xml2csv_PLAYER
    fout.writerow(row)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 4: ordinal not in range(128)
I tried opening the file like this:
dom1 = parse(input_filename.encode( "utf-8" ) )
I also tried replacing the \xE1 character before writing each row. Any suggestions?
1 Answer
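The XML parser returns unicode objects. That is actually a good thing. The csv module (in Python 2), however, cannot handle them.
You could encode every unicode string returned by the XML parser before handing it to the csv writer, but a better approach is to use the UnicodeWriter example from the csv module's official documentation: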
import csv, codecs, cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([s.encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
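
For anyone reading this on Python 3: the wrapper above should not be needed there, because the csv module works with str directly and the output encoding is chosen when the file is opened. Below is a minimal sketch under that assumption. The sample XML, the player/name/team names and the output path are made up for illustration and do not come from the question.

```python
import csv
import os
import tempfile
from xml.dom.minidom import parseString

# Hypothetical sample input: ISO-8859-1 bytes containing \xe1, the character from the traceback.
XML_BYTES = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<players><player name="Juli\xe1n" team="Le\xf3n"/></players>'
).encode("iso-8859-1")


def xml_to_rows(xml_bytes):
    """Return one [name, team] row per <player> element (hypothetical schema)."""
    # The parser reads the encoding from the XML declaration and returns str values.
    dom = parseString(xml_bytes)
    return [
        [player.getAttribute("name"), player.getAttribute("team")]
        for player in dom.getElementsByTagName("player")
    ]


rows = xml_to_rows(XML_BYTES)

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "players.csv")

# newline="" lets the csv module control line endings; encoding sets the bytes on disk.
with open(out_path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)
```

Passing a different value for encoding (for example "iso-8859-1") would keep the output in the same encoding as the input file, if that is what the consumer of the CSV expects.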