<p><code>read_csv</code> takes an <code>encoding</code> option to deal with files in different encodings. I mostly use <code>read_csv('file', encoding = "ISO-8859-1")</code>, or alternatively <code>encoding = "utf-8"</code>, for reading, and generally <code>utf-8</code> for <code>to_csv</code>.</p>
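<p>A minimal sketch of that round trip, assuming pandas is installed. The file names and sample rows are made up for illustration:</p>

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: a small CSV saved as ISO-8859-1 (not UTF-8).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample.csv")
with open(path, "w", encoding="ISO-8859-1", newline="") as f:
    f.write("name,city\nJosé,Málaga\n")

# Passing the matching encoding lets pandas decode the accented characters.
df = pd.read_csv(path, encoding="ISO-8859-1")

# Write the data back out as UTF-8, then read it again to confirm the round trip.
out_path = os.path.join(tmp_dir, "sample_utf8.csv")
df.to_csv(out_path, index=False, encoding="utf-8")
df_utf8 = pd.read_csv(out_path, encoding="utf-8")
```

<p>Reading the first file with <code>encoding = "utf-8"</code> instead would raise a <code>UnicodeDecodeError</code>, because the accented bytes are not valid UTF-8.</p>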
<p>You can also use one of several <code>alias</code> options, such as <code>'latin'</code>, instead of <code>'ISO-8859-1'</code> (see the <a href="https://docs.python.org/3/library/codecs.html#standard-encodings" rel="noreferrer">python docs</a>, which also list the many other encodings you may encounter).</p>
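<p>If you are unsure whether two names refer to the same codec, one way to check is the standard library's <code>codecs.lookup</code>. This is only a quick sanity check, not something pandas requires:</p>

```python
import codecs

# Each alias resolves to a CodecInfo object; its .name is the canonical codec name.
names = {alias: codecs.lookup(alias).name
         for alias in ("ISO-8859-1", "latin", "latin1", "L1")}

# All of these aliases should resolve to the same underlying codec.
same_codec = len(set(names.values())) == 1
print(names, same_codec)
```

<p>An unknown name makes <code>codecs.lookup</code> raise <code>LookupError</code>, which is a cheap way to catch a typo in an encoding name before reading a large file.</p>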
<p>See the <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html" rel="noreferrer">relevant Pandas documentation</a>,
the <a href="http://docs.python.org/3/library/csv.html#examples" rel="noreferrer">python docs examples on csv files</a>, and the many related questions. A good background resource is <a href="https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/" rel="noreferrer">What every developer should know about unicode and character sets</a>.</p>
<p>To detect the encoding (assuming the file contains non-ascii characters), you can use <code>enca</code> (see the <a href="https://linux.die.net/man/1/enconv" rel="noreferrer">man page</a>) or <code>file -i</code> (linux) or <code>file -I</code> (osx) (see the <a href="https://linux.die.net/man/1/file" rel="noreferrer">man page</a>).</p>
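<p>If neither tool is available, a rough fallback in Python is to try a short list of candidate encodings in order. This is a sketch rather than real detection: the helper name <code>guess_encoding</code> and the candidate list are my own assumptions, and <code>ISO-8859-1</code> has to come last because it decodes any byte sequence without error.</p>

```python
def guess_encoding(raw, candidates=("utf-8", "ISO-8859-1")):
    """Return the first candidate that decodes `raw` without error, else None.

    Hypothetical helper: a successful decode does not prove the guess is right,
    only that the bytes are valid in that encoding.
    """
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return enc
    return None


# Example with in-memory bytes; for a real file, read it with open(path, "rb").
utf8_bytes = "café".encode("utf-8")
latin_bytes = "café".encode("ISO-8859-1")
print(guess_encoding(utf8_bytes), guess_encoding(latin_bytes))
```

<p>The result can then be passed straight to <code>read_csv</code> as the <code>encoding</code> argument.</p>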