Replacing special characters (\n, \r, etc.)

2 answers

User

Answer 1 · edited 2024-04-26 03:29:10

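Calling read() on an http.client.HTTPResponse (the object returned by urllib.request.urlopen) gives you a bytes object. You cannot simply convert it with str(your_bytes_object), because str() returns the object's repr: the \r\n bytes become the literal characters \\r\\n, which print as the text \r\n rather than as a line break: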

>>> a_bytes_object = b'This is a test\r\nMore test'
>>> str(a_bytes_object)
"b'This is a test\\r\\nMore test'"
>>> print(str(a_bytes_object))
b'This is a test\r\nMore test'

Instead, decode the bytes object with bytes.decode(your_encoding). latin-1 is a common choice when all you need is a string to write to a file, since it can decode any byte sequence.
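A minimal sketch of the decoding step, reusing a_bytes_object from the example above as a stand-in for the result of read(). Decoding with latin-1 never raises, but whether it yields the right characters depends on what the server actually sent.

```python
# Sample payload standing in for the result of urlopen(...).read().
a_bytes_object = b'This is a test\r\nMore test'

# decode() interprets the bytes as text, so \r and \n become real control characters.
text = a_bytes_object.decode("latin-1")

# Stripping carriage returns now works, because '\r' is an actual character in text.
without_cr = text.replace("\r", "")

print(repr(text))
print(without_cr)
```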


You can also pass the encoding as the second argument to str instead of calling decode: str(a_bytes_object, "latin-1") is equivalent to a_bytes_object.decode("latin-1").

Alternatively, you can simply open the file in binary mode (open('/file/path', 'wb')) and write the bytes object to it unchanged.
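The binary-mode approach might look like the sketch below. The payload and the temporary output path are placeholders chosen for illustration; in real use the bytes would come from urlopen(...).read() and the path would be your own.

```python
import os
import tempfile

# Sample payload standing in for the result of urlopen(...).read().
payload = b'This is a test\r\nMore test'

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "page.html")

# 'wb' writes the bytes verbatim: no decoding and no newline translation.
with open(out_path, "wb") as f:
    f.write(payload)

# Reading back in binary mode returns exactly what was written.
with open(out_path, "rb") as f:
    round_trip = f.read()
```

Because nothing is decoded, this avoids having to guess the encoding at all, at the cost of not being able to use str methods such as replace on the content.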


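You could also read the Content-Type header (something like text/html; charset=ISO-8859-1), extract the charset, and decode with it to get a correct string. This is risky, though, because it does not always work: not every server sends the header, not every header includes an encoding, not every encoding is supported by Python, and so on.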
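One way to handle those failure cases is to fall back to latin-1 whenever the header is missing, names no charset, names an encoding Python does not know, or does not match the bytes. The helper below is a hypothetical sketch, not part of the original answer; decode_body and its parameters are names invented for illustration. It parses the header value with the standard library's email.message.Message.

```python
import codecs
from email.message import Message


def decode_body(body: bytes, content_type: str, fallback: str = "latin-1") -> str:
    """Decode body using the charset named in a Content-Type value, else fallback."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # None when the header is absent or carries no charset parameter.
    charset = msg.get_content_charset()
    if charset is not None:
        try:
            codecs.lookup(charset)  # raises LookupError for encodings Python lacks
        except LookupError:
            charset = None
    try:
        return body.decode(charset or fallback)
    except UnicodeDecodeError:
        # The declared charset does not match the bytes; the latin-1 default never fails.
        return body.decode(fallback)


print(decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1"))
```

With a live response from urlopen, response.headers is itself a Message subclass, so response.headers.get_content_charset() should give the same charset value without building a Message by hand.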

User

Answer 2 · edited 2024-04-26 03:29:10

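I think you are treating replace as if it modified the string in place, when it actually returns a new string that you need to assign to a variable.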

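from urllib.request import urlopen

url = 'http://www.google.com'
# Decode the bytes rather than calling str() on them, so '\r' is a real character.
html = urlopen(url).read().decode('latin-1')

# replace() returns a new string, so keep the result.
html_2 = html.replace('\r', '')

# Writing with the same encoding keeps the remaining bytes unchanged.
with open('/file/path/filename.txt', 'w', encoding='latin-1') as f:
    f.write(html_2)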
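The point about replace can be checked without any network access. In this small sketch the sample string is made up for illustration; it shows that calling replace without using the return value leaves the original string untouched.

```python
# Sample text standing in for a decoded page.
html = "line one\r\nline two\r\n"

# replace() returns a new string; here the result is discarded.
html.replace("\r", "")
unchanged = html

# Assigning the return value keeps the cleaned text.
cleaned = html.replace("\r", "")

print(repr(unchanged))
print(repr(cleaned))
```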
