<p>In Python 2.x:</p>
<pre><code>f = open('data.txt', 'rb')
</code></pre>
<p>As <a href="http://docs.python.org/2/library/functions.html#open">the docs</a> say:</p>
<blockquote>
<p>The default is to use text mode, which may convert '\n' characters to a platform-specific representation on writing and back on reading. Thus, when opening a binary file, you should append <code>'b'</code> to the mode value to open the file in binary mode, which will improve portability. (Appending <code>'b'</code> is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.)</p>
</blockquote>
<p>In Python 3.x, there are three alternatives:</p>
<pre><code>f1 = open('data.txt', 'rb')
</code></pre>
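<p>This will leave newlines untransformed, but it will also return <code>bytes</code> instead of <code>str</code>, which you will have to explicitly <code>decode</code> to Unicode yourself. (Of course, the 2.x version also returned bytes that had to be decoded manually if you wanted Unicode, but in 2.x that is what a <code>str</code> object is; in 3.x, <code>str</code> is Unicode.)</p>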
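<p>As a minimal sketch of the binary-mode route in 3.x: the snippet below writes a small sample file (the temporary path and its contents are made up for illustration), reads it back as <code>bytes</code>, and decodes it by hand. The <code>'utf-8'</code> encoding is an assumption about this sample file, not something <code>open</code> tells you.</p>

```python
import os
import tempfile

# Create a throwaway sample file containing a Windows-style line ending.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'wb') as f:
    f.write(b'caf\xc3\xa9\r\nsecond line\n')

# Binary mode: no newline translation, and read() returns bytes.
with open(path, 'rb') as f1:
    raw = f1.read()

# Decode explicitly; 'utf-8' is an assumption about this sample file.
text = raw.decode('utf-8')
```

<p>Here <code>raw</code> still contains the original <code>b'\r\n'</code>, and so does the decoded <code>text</code>.</p>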
<pre><code>f2 = open('data.txt', 'r', newline='')
</code></pre>
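<p>This will return <code>str</code> and leave newlines untranslated. Unlike the 2.x equivalent, however, <code>readline</code> and friends will treat <code>'\r\n'</code> as a newline, instead of a regular character followed by a newline. Usually this won't matter, but if it does, keep it in mind.</p>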
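<p>To make the <code>readline</code> difference concrete, here is a small sketch that reads the same sample file (a made-up temporary file) once in text mode with <code>newline=''</code> and once in binary mode. It assumes the default text encoding can decode plain ASCII, which holds on common platforms.</p>

```python
import os
import tempfile

# Sample file mixing '\r\n', a lone '\r', and '\n'.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'wb') as f:
    f.write(b'one\r\ntwo\rthree\n')

# Text mode with newline='': line endings are kept as-is, but
# '\r', '\n' and '\r\n' are all recognized as line terminators.
with open(path, 'r', newline='') as f2:
    text_lines = f2.readlines()

# Binary mode: only b'\n' ends a line.
with open(path, 'rb') as f1:
    byte_lines = f1.readlines()
```

<p>In 3.x, <code>text_lines</code> has three entries while <code>byte_lines</code> has two, because the lone <code>'\r'</code> ends a line only in text mode.</p>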
<pre><code>f3 = open('data.txt', 'rb', encoding=locale.getpreferredencoding(False))
</code></pre>
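<p>This treats newlines exactly the same way as the 2.x code, and returns <code>str</code> using the same encoding you would get if you used all of the defaults, but it is no longer valid in current 3.x.</p>
<p>For reference, this is how the docs describe the <code>newline</code> argument when reading:</p>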
<blockquote>
<p>When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated.</p>
</blockquote>
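<p>The reason you need to specify an explicit encoding for <code>f3</code> is that opening a file in binary mode changes the default from "decode with <code>locale.getpreferredencoding(False)</code>" to "don't decode, and return raw <code>bytes</code> instead of <code>str</code>". Again, from <a href="http://docs.python.org/3/library/functions.html#open">the docs</a>:</p>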
<blockquote>
<p>In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave encoding unspecified.)</p>
</blockquote>
<p>However:</p>
<blockquote>
<p>'encoding' … should only be used in text mode.</p>
</blockquote>
<p>And, at least as of 3.3, this is enforced; if you try it with binary mode, you get <code>ValueError: binary mode doesn't take an encoding argument</code>.</p>
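<p>A quick way to see this for yourself is to attempt the <code>f3</code> call and catch the exception. The sketch below uses a made-up temporary file and stores the error text in a hypothetical <code>message</code> variable.</p>

```python
import locale
import os
import tempfile

# Throwaway sample file for the experiment.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'wb') as f:
    f.write(b'hello\r\n')

try:
    # Binary mode combined with an encoding argument is rejected in 3.x.
    open(path, 'rb', encoding=locale.getpreferredencoding(False))
except ValueError as exc:
    message = str(exc)
else:
    message = None

print(message)
```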
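<p>So, if you want to write code that works on both 2.x and 3.x, what do you use? If you want to deal in <code>bytes</code>, obviously <code>f</code> and <code>f1</code> are the same. But if you want to deal in <code>str</code>, as appropriate for each version, the simplest answer is to write different code for each, probably <code>f</code> and <code>f2</code>, respectively. If this comes up a lot, consider writing a wrapper function such as:</p>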
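<pre><code>import sys

if sys.version_info >= (3, 0):
    def crlf_open(path, mode):
        # 3.x: text mode, newlines left untranslated
        return open(path, mode, newline='')
else:
    def crlf_open(path, mode):
        # 2.x: binary mode, newlines left untranslated
        return open(path, mode + 'b')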
</code></pre>
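<p>For illustration, here is how the wrapper might be used. The definition is repeated so the snippet runs on its own, and the sample file is a made-up temporary file; treat this as a sketch rather than a tested cross-version recipe.</p>

```python
import os
import sys
import tempfile

# Same wrapper as above, repeated so this snippet is self-contained.
if sys.version_info >= (3, 0):
    def crlf_open(path, mode):
        return open(path, mode, newline='')
else:
    def crlf_open(path, mode):
        return open(path, mode + 'b')

# Sample file with Windows-style line endings.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'wb') as f:
    f.write(b'first\r\nsecond\r\n')

# Returns the native str type on either version, with '\r\n' preserved.
with crlf_open(path, 'r') as f:
    content = f.read()
```

<p>Note that the wrapper only accepts a text-style mode such as <code>'r'</code>: passing <code>'rb'</code> would produce <code>'rbb'</code> on 2.x and a binary-mode <code>newline</code> error on 3.x.</p>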
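<p>One other thing to watch out for when writing multi-version code is that, if you're not writing locale-aware code, <code>locale.getpreferredencoding(False)</code> almost always returns something reasonable in 3.x, but it will usually just return <code>'US-ASCII'</code> in 2.x. Using <code>locale.getpreferredencoding(True)</code> is technically incorrect, but it may be more likely to be what you actually want if you don't want to think about encodings. (Try calling it both ways in your 2.x and 3.x interpreters to see why, or read the docs.)</p>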
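<p>The comparison suggested above can be scripted as follows. The variable names are arbitrary, and the values printed depend entirely on your platform, locale settings, and Python version, so no particular output is implied.</p>

```python
import locale

# do_setlocale=False: query without touching the process locale.
quiet = locale.getpreferredencoding(False)

# do_setlocale=True: may temporarily call setlocale() to pick up
# the user's preferences before answering.
active = locale.getpreferredencoding(True)

print('do_setlocale=False:', quiet)
print('do_setlocale=True: ', active)
```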
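<p>Of course, if you actually know the file's encoding, that is always better than guessing.</p>
<p>In either case, <code>'r'</code> means "read-only". If you don't specify a mode, the default is <code>'r'</code>, so the binary-mode equivalent of the default is <code>'rb'</code>.</p>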