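<p><code>r.content</code> returns <code>bytes</code>. (In contrast, <a href="http://www.python-requests.org/en/latest/user/quickstart/#response-content" rel="nofollow noreferrer"><code>r.text</code> returns a <code>str</code></a>. The <code>requests</code> module tries to guess the correct encoding from the HTTP headers and uses it to decode the bytes for you. In the future, that is probably what you will want to use.)</p>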
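<p>To make the bytes-versus-str distinction concrete, here is a minimal sketch of header-based decoding that runs without a network call. It is a simplification and not how <code>requests</code> is actually implemented: the helper <code>charset_from_content_type</code>, the sample header, and the UTF-8 fallback are all assumptions made for illustration.</p>

```python
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type header value.

    Hypothetical helper; the UTF-8 fallback is an assumption of this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


raw = b"command\xc5\xabcor"            # stand-in for r.content (bytes)
header = "text/html; charset=UTF-8"    # hypothetical Content-Type header

# Decoding with the declared charset is roughly what r.text does for you.
text = raw.decode(charset_from_content_type(header))

print(type(raw).__name__, len(raw))    # bytes, counted in bytes
print(type(text).__name__, len(text))  # str, counted in characters
```

<p>The two lengths differ because the two bytes <code>\xc5\xab</code> decode to a single character.</p>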
<p>If <code>r.content</code> contains <code>bytes</code> such as <code>b'command\xc5\xabcor'</code>, then
<code>str(r.content)</code> returns a <code>str</code> that begins with the characters <code>b'</code> and ends with a literal <code>'</code>:</p>
<pre><code>In [45]: str(b'command\xc5\xabcor')
Out[45]: "b'command\\xc5\\xabcor'"
</code></pre>
<p>You can recover the bytes with <a href="https://docs.python.org/3/library/ast.html" rel="nofollow noreferrer"><code>ast.literal_eval</code></a>:</p>
<pre><code>In [46]: ast.literal_eval(str(b'command\xc5\xabcor'))
Out[46]: b'command\xc5\xabcor'
</code></pre>
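<p>If you need this recovery step in more than one place, it could be wrapped in a small helper that also checks the result really is <code>bytes</code>. This is only a sketch; the name <code>recover_bytes</code> is made up for illustration.</p>

```python
import ast


def recover_bytes(stringified):
    """Turn text like "b'...'" (the result of str(some_bytes)) back into bytes.

    Hypothetical helper: raises ValueError if the text is a valid literal
    but not a bytes literal.
    """
    value = ast.literal_eval(stringified.strip())
    if not isinstance(value, bytes):
        raise ValueError("input was not the repr of a bytes object")
    return value


original = b'command\xc5\xabcor'
recovered = recover_bytes(str(original))
print(recovered == original)
```

<p><code>ast.literal_eval</code> only evaluates Python literals, so it is a safer choice than <code>eval</code> for text read back from a file.</p>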
<p>You can then decode these <code>bytes</code> to a <code>str</code>. The page at the URL you posted declares that its content is UTF-8 encoded:</p>
<pre><code>&lt;meta http-equiv="Content-Type" content="text/html; charset=UTF-8"&gt;
</code></pre>
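<p>If you would rather read that declaration programmatically than by eye, a rough sketch follows. The regular expression, the helper name <code>sniff_meta_charset</code>, the 2048-byte search window, and the UTF-8 fallback are all assumptions of this example; a real HTML parser would be more robust.</p>

```python
import re

# Rough pattern for a charset declared inside a <meta> tag (an assumption,
# not a full implementation of the HTML encoding-sniffing rules).
META_CHARSET = re.compile(
    rb'<meta[^>]*charset=["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def sniff_meta_charset(html_bytes, default="utf-8"):
    """Return the charset named in a <meta> tag, or `default` if none is found."""
    match = META_CHARSET.search(html_bytes[:2048])
    return match.group(1).decode("ascii") if match else default


# Sample page made up for illustration, using the meta tag quoted above.
page = (
    b'<html><head><meta http-equiv="Content-Type" '
    b'content="text/html; charset=UTF-8"></head>'
    b'<body>command\xc5\xabcor</body></html>'
)

encoding = sniff_meta_charset(page)
text = page.decode(encoding)
print(encoding)
```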
<p>Assuming that all the data you downloaded uses the same encoding, you can recover the content as a <code>str</code> by calling <code>bytes.decode('utf-8')</code>:</p>
<pre><code>In [47]: ast.literal_eval(str(b'command\xc5\xabcor')).decode('utf-8')
Out[47]: 'commandūcor'
</code></pre>
<hr/>
<pre><code>import ast
import requests

url = "https://www.dizionario-latino.com/dizionario-latino-flessione.php?lemma=COMMANDUCOR100"
# verify=False skips TLS certificate verification
r = requests.get(url, verify=False)

# Reproduce the problem: str() on bytes writes the literal text "b'...'"
out = str(r.content)
with open("test.html", 'w') as file:
    file.write(out)

# Repair: parse the "b'...'" text back into bytes, then decode it as UTF-8
with open("test.html", 'r') as f_in, open("test-fixed.html", 'w', encoding='utf-8') as f_out:
    broken_text = f_in.read()
    content = ast.literal_eval(broken_text)
    assert content == r.content
    text = content.decode('utf-8')
    f_out.write(text)
</code></pre>
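<p>For future downloads you can avoid the repair step altogether by never calling <code>str()</code> on the bytes. The sketch below shows two ways to do that, using a hard-coded stand-in for <code>r.content</code> and temporary files so that it runs offline; the file names are arbitrary.</p>

```python
import os
import tempfile

content = b'command\xc5\xabcor'   # stand-in for r.content

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.html")
    text_path = os.path.join(tmp, "text.html")

    # Option 1: write the bytes untouched, in binary mode.
    with open(raw_path, "wb") as f:
        f.write(content)

    # Option 2: decode once, then write text with an explicit encoding.
    with open(text_path, "w", encoding="utf-8") as f:
        f.write(content.decode("utf-8"))

    # Both files end up holding the same bytes as the original content.
    with open(raw_path, "rb") as f1, open(text_path, "rb") as f2:
        same = f1.read() == f2.read() == content

print(same)
```

<p>With a real response, option 1 corresponds to writing <code>r.content</code> and option 2 to writing <code>r.text</code>, provided <code>requests</code> detected the encoding correctly.</p>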