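<h2>tl;dr</h2>
<p>The answer is <strong>never</strong>! <sub>(unless you really know what you're doing)</sub></p>
<p>Nine times out of ten, the problem can be solved with a proper understanding of encoding and decoding.</p>
<p>One person in ten has an incorrectly defined locale or environment and needs to set:</p>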
<pre><code>PYTHONIOENCODING="UTF-8"
</code></pre>
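<p>in their environment to fix console printing problems.</p>
<h2>What does it do?</h2>
<p><strike><code>sys.setdefaultencoding("utf-8")</code></strike> (struck through to avoid re-use) changes the default encoding/decoding used whenever Python 2.x needs to convert a <code>unicode</code> to a <code>str</code> (or vice versa) and no encoding is given. For example:</p>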
<pre><code>str(u"\u20AC")
unicode("€")
"{}".format(u"\u20AC")
</code></pre>
<p>In Python 2.x the default encoding is ASCII, so the examples above fail with either:</p>
<pre><code>UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
</code></pre>
<p>(My console is configured as UTF-8, so <code>"€" = '\xe2\x82\xac'</code>, hence the exception on <code>\xe2</code>)</p>
<p>or</p>
<pre><code>UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
</code></pre>
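<p><strike><code>sys.setdefaultencoding("utf-8")</code></strike> makes these work for me, but not necessarily for people who don't use UTF-8. <strong>The ASCII default ensures that assumptions about encoding are not baked into code.</strong></p>
<h3>Console</h3>
<p><strike><code>sys.setdefaultencoding("utf-8")</code></strike> also has the side effect of appearing to fix <code>sys.stdout.encoding</code>, which is used when printing characters to the console. Python uses the user's locale (Linux/OS X/Un*x) or code page (Windows) to set it. Occasionally a user's locale is broken, and all that is needed to fix the console encoding is <code>PYTHONIOENCODING</code>.</p>
<p>Example:</p>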
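<p>A minimal sketch of the explicit alternative, assuming the data really is UTF-8 (the variable names are illustrative and not part of the original answer). Naming the codec at each conversion should give the same result whatever the default encoding is, and the snippet is written to run on both Python 2 and 3:</p>

```python
# Name the codec explicitly instead of relying on the interpreter default.
euro_text = u"\u20AC"                      # text (unicode)
euro_bytes = euro_text.encode("utf-8")     # text -> bytes
round_trip = euro_bytes.decode("utf-8")    # bytes -> text

# The ASCII codec cannot handle the euro sign in either direction.
try:
    euro_text.encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

try:
    euro_bytes.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(repr(euro_bytes), encode_failed, decode_failed)
```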
<pre class="lang-none prettyprint-override"><code>$ export LANG=en_GB.gibberish
$ python
>>> import sys
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
>>> print u"\u20AC"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
>>> exit()
$ PYTHONIOENCODING=UTF-8 python
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> print u"\u20AC"
€
</code></pre>
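<p>The same check can be scripted. The sketch below assumes a hypothetical helper, <code>child_stdout_encoding</code>, that starts a child interpreter with <code>PYTHONIOENCODING</code> set and reports what that child sees as <code>sys.stdout.encoding</code>. The exact spelling of the result (for example <code>UTF-8</code> versus <code>utf-8</code>) may differ between Python versions:</p>

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Return sys.stdout.encoding as reported by a child interpreter
    started with PYTHONIOENCODING set to io_encoding."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    # Encoding names are plain ASCII, so this decode is safe.
    return out.decode("ascii").strip()


print(child_stdout_encoding("UTF-8"))
```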
<h3>What's so bad about <strike>sys.setdefaultencoding("utf-8")</strike>?</h3>
<p>People have been developing against Python 2.x for 16 years on the understanding that the default encoding is ASCII. <code>UnicodeError</code> exception handlers have been written to deal with string-to-Unicode conversions on strings that turn out to contain non-ASCII data.</p>
<p>From <a href="https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/" rel="noreferrer">https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/</a></p>
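<pre><code>def welcome_message(byte_string):
    try:
        # Implicit decode: uses the default encoding (ASCII unless changed)
        return u"%s runs your business" % byte_string
    except UnicodeError:
        # detect_encoding() is a helper that the quoted article leaves undefined
        return u"%s runs your business" % unicode(byte_string,
            encoding=detect_encoding(byte_string))

print(welcome_message(u"Angstrom (Å®)".encode("latin-1")))
</code></pre>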
<blockquote>
<p>Previous to setting defaultencoding this code would be unable to decode the “Å” in the ascii encoding and then would enter the exception handler to guess the encoding and properly turn it into unicode. Printing: Angstrom (Å®) runs your business. Once you’ve set the defaultencoding to utf-8 the code will find that the byte_string can be interpreted as utf-8 and so it will mangle the data and return this instead: Angstrom (Ů) runs your business.</p>
</blockquote>
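<p>The mangling described in the quote can be reproduced with explicit decodes, without touching the default encoding. This is a sketch of the byte-level mechanics only, not the article's code, and it is written to run on both Python 2 and 3. The Latin-1 bytes <code>\xc5\xae</code> happen to form a valid two-byte UTF-8 sequence, so a UTF-8 decode succeeds and silently produces the wrong character:</p>

```python
# u"Angstrom (Å®)" encoded as Latin-1 is b"Angstrom (\xc5\xae)".
raw = u"Angstrom (\u00c5\u00ae)".encode("latin-1")

# With ASCII the decode fails loudly, which is what triggers the
# exception handler in the function above.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# With UTF-8 the decode succeeds, but 0xC5 0xAE is read as U+016E.
mangled = raw.decode("utf-8")
intended = raw.decode("latin-1")

print(ascii_failed, mangled == intended)
```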
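<p><strong>Changing what should be a constant will have dramatic effects on the modules you depend on. It's better to just fix the data coming in and out of your code.</strong></p>
<h3>Example problem</h3>
<p>While setting defaultencoding to UTF-8 isn't the root cause in the following example, it shows how problems are masked and how, when the input encoding changes, the code breaks in a non-obvious way: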
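<p>One way to fix the data at the boundaries, sketched under the assumption that the input and output are files whose encoding is known, is to decode once on the way in and encode once on the way out. The helper names and the temporary file are hypothetical. <code>io.open</code> accepts an <code>encoding</code> argument on both Python 2 and 3:</p>

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Decode bytes to text once, at the input boundary."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def write_text(path, text, encoding):
    """Encode text to bytes once, at the output boundary."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Hypothetical round trip through a temporary file.
path = os.path.join(tempfile.mkdtemp(), "price.txt")
write_text(path, u"Price: \u20AC5", "utf-8")
price = read_text(path, "utf-8")

# Inspect the raw bytes that actually landed on disk.
with io.open(path, "rb") as handle:
    on_disk = handle.read()
```

<p>Inside the program everything stays as Unicode text, so no implicit conversion, and therefore no default encoding, is involved.</p>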
<a href="https://stackoverflow.com/questions/38518023/unicodedecodeerror-utf8-codec-cant-decode-byte-0x80-in-position-3131-invali/38534992#38534992">UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte</a></p>