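Python - SQLite to CSV export error - ASCII values not parsed
Good afternoon,
I'm having some trouble converting data from a SQLite database to a CSV file with Python. I've looked everywhere for an answer, but nothing I found solved my problem, or else my own code is at fault.
I want to replace the characters in the SQLite database that fall outside the ASCII table (that is, characters with values of 128 and above).
This is the code I've been using: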
#!/opt/local/bin/python
import sqlite3
import csv, codecs, cStringIO

class UnicodeWriter:
    """
    A CSV writer which will write rows to CSV file "f",
    which is encoded in the given encoding.
    """
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow([unicode(s).encode("utf-8") for s in row])
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

conn = sqlite3.connect('test.db')
c = conn.cursor()
# Select whichever rows you want in whatever order you like
c.execute('select ROWID, Name, Type, PID from PID')
writer = UnicodeWriter(open("ProductListing.csv", "wb"))
# Make sure the list of column headers you pass in are in the same order as your SELECT
writer.writerow(["ROWID", "Product Name", "Product Type", "PID", ])
writer.writerows(c)
I tried adding 'replace' as described here, but I still get the same error: Python: Convert Unicode to ASCII without errors for CSV file
The error is a UnicodeDecodeError.
Traceback (most recent call last):
  File "SQLite2CSV1.py", line 53, in <module>
    writer.writerows(c)
  File "SQLite2CSV1.py", line 32, in writerows
    self.writerow(row)
  File "SQLite2CSV1.py", line 19, in writerow
    self.writer.writerow([unicode(s).encode("utf-8") for s in row])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 65: ordinal not in range(128)
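Obviously, I'd like the code to be robust enough that, when it runs into these out-of-range characters, it replaces them with something like '?' (\x3f).
Is there a way to do this within the UnicodeWriter class? And how else can I make the code more robust so that it doesn't raise these errors?
Thank you very much for your help.
2 Answers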
0
When working in a Unix environment, this approach worked for me:
sqlite3.exe a.db .dump > a.sql;
tr -d "[\\200-\\377]" < a.sql > clean.sql;
sqlite3.exe clean.db < clean.sql;
(This is not a Python solution, but since it's concise it may help others. It simply strips all non-ASCII characters rather than trying to replace them.)
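For anyone who wants the same strip-everything behaviour from within Python, here is a minimal sketch. It assumes Python 3 (the question's code is Python 2), and the helper name strip_non_ascii is made up for illustration. It uses the 'ignore' error handler, which drops characters that cannot be encoded instead of substituting them.

```python
def strip_non_ascii(text):
    """Return text with every character outside the 7-bit ASCII range removed."""
    # 'ignore' silently drops anything that cannot be encoded as ASCII.
    return text.encode("ascii", errors="ignore").decode("ascii")


print(strip_non_ascii("caf\u00e9 \u20ac5"))  # prints: caf 5
```

As with the tr command, the offending characters disappear without a trace, so use this only if losing them is acceptable.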
1
If you just want to write an ASCII CSV file, you can use the built-in csv.writer() directly. To make sure every value you pass in is ASCII, use encode('ascii', errors='replace').
For example:
import csv

rows = [
    [u'some', u'other', u'more'],
    [u'umlaut:\u00fd', u'euro sign:\u20ac', '']
]
with open('/tmp/test.csv', 'wb') as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        asciifiedRow = [item.encode('ascii', errors='replace') for item in row]
        print '%r --> %r' % (row, asciifiedRow)
        writer.writerow(asciifiedRow)
Running this code prints the following to the console:
[u'some', u'other', u'more'] --> ['some', 'other', 'more']
[u'umlaut:\xfd', u'euro sign:\u20ac', ''] --> ['umlaut:?', 'euro sign:?', '']
The resulting CSV file contains:
some,other,more
umlaut:?,euro sign:?,
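To tie this back to the SQLite export in the question, here is a rough sketch of the same idea applied to a cursor. It assumes Python 3 rather than the Python 2 used above. The helper names to_ascii and export_ascii_csv, the in-memory database, and the sample rows are all made up for illustration; only the table and column names come from the question. The traceback there suggests that at least one value arrives as a raw byte string, so the sketch handles bytes as well as text.

```python
import csv
import io
import sqlite3


def to_ascii(value):
    """Turn one database value into ASCII-only text, using '?' for anything else."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        # Raw bytes: anything >= 0x80 becomes U+FFFD here and '?' in the next step.
        value = value.decode("ascii", errors="replace")
    return str(value).encode("ascii", errors="replace").decode("ascii")


def export_ascii_csv(cursor, out, header):
    """Write the header and every row from the cursor to 'out' as ASCII-only CSV."""
    writer = csv.writer(out)
    writer.writerow(header)
    for row in cursor:
        writer.writerow([to_ascii(v) for v in row])


# Hypothetical demo data standing in for test.db.
conn = sqlite3.connect(":memory:")
conn.execute("create table PID (Name text, Type text, PID integer)")
conn.execute("insert into PID values (?, ?, ?)", ("Caf\u00e9 table", "furniture\u00a0", 1))
conn.execute("insert into PID values (?, ?, ?)", (b"raw\xa0bytes", None, 2))

cur = conn.execute("select ROWID, Name, Type, PID from PID")
buf = io.StringIO()
export_ascii_csv(cur, buf, ["ROWID", "Product Name", "Product Type", "PID"])
print(buf.getvalue())
```

To write to a real file instead of the in-memory buffer, something like open("ProductListing.csv", "w", newline="", encoding="ascii") should work as the out argument.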