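<p>The example code for reading Unicode given at <a href="http://docs.python.org/library/csv.html#examples" rel="noreferrer">http://docs.python.org/library/csv.html#examples</a> appears to be out of date, because it does not work with Python 2.6 and 2.7.</p>
<p>Below is a <code>UnicodeDictReader</code> that works with utf-8 and may work with other encodings, though I have only tested it on utf-8 input.</p>
<p>In short, the idea is to decode to Unicode only after <code>csv.reader</code> has split the csv row into fields.</p>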
<pre><code>import csv


class UnicodeCsvReader(object):
    def __init__(self, f, encoding="utf-8", **kwargs):
        self.csv_reader = csv.reader(f, **kwargs)
        self.encoding = encoding

    def __iter__(self):
        return self

    def next(self):
        # read and split the csv row into fields
        row = self.csv_reader.next()
        # now decode each field from bytes to unicode
        return [unicode(cell, self.encoding) for cell in row]

    @property
    def line_num(self):
        return self.csv_reader.line_num


class UnicodeDictReader(csv.DictReader):
    def __init__(self, f, encoding="utf-8", fieldnames=None, **kwds):
        csv.DictReader.__init__(self, f, fieldnames=fieldnames, **kwds)
        # swap the underlying reader for the decoding one
        self.reader = UnicodeCsvReader(f, encoding=encoding, **kwds)
</code></pre>
<p>Usage (the source file is encoded as utf-8):</p>
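<pre><code># -*- coding: utf-8 -*-
# on Python 2, makes print(a, b) print two values rather than a tuple
from __future__ import print_function

csv_lines = (
    "абв,123",
    "где,456",
)

for row in UnicodeCsvReader(csv_lines):
    for col in row:
        print(type(col), col)
</code></pre>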
<p>Output:</p>
<pre><code>$ python test.py
&lt;type 'unicode'&gt; абв
&lt;type 'unicode'&gt; 123
&lt;type 'unicode'&gt; где
&lt;type 'unicode'&gt; 456
</code></pre>
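<p>The classes above are for Python 2 only. On Python 3 the <code>csv</code> module works on text rather than bytes, so the decoding can happen before the row is split and no wrapper class is needed. The following is a minimal sketch under that assumption, not a port of the code above; the helper name <code>read_unicode_dicts</code> and the <code>name,count</code> header row are made up for illustration.</p>

```python
import csv
import io


def read_unicode_dicts(raw_bytes, encoding="utf-8", **kwargs):
    """Return CSV rows as dicts of text (hypothetical helper, Python 3)."""
    # Decode the byte stream up front; newline="" leaves line-ending
    # handling to the csv module, as its documentation recommends.
    text = io.TextIOWrapper(io.BytesIO(raw_bytes), encoding=encoding, newline="")
    return list(csv.DictReader(text, **kwargs))


# Sample input with an invented header row, encoded as utf-8 bytes.
data = "name,count\nабв,123\nгде,456\n".encode("utf-8")
rows = read_unicode_dicts(data)
```

<p>Each entry in <code>rows</code> maps a column name to a <code>str</code> value. Extra keyword arguments such as <code>delimiter</code> are passed through to <code>csv.DictReader</code>.</p>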