Beautiful Soup 4 fails to print web page text

0 votes

1 answer

856 views

Asked 2025-04-18 04:33

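I am using Python 3.4 with Beautiful Soup 4 and the requests library.
I want to fetch a web page and print its text with Beautiful Soup. It fetches the page and prints the title, and if I supply an encoding (utf-8) it can even prettify the output, but when I try to print the page's text I get an encoding error.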

from bs4 import BeautifulSoup
import requests

sparknotesSearch = requests.get("http://www.sparknotes.com/search?q=Sonnet")
soup = BeautifulSoup(sparknotesSearch.text, "html.parser")

print(soup.title)
# Can't print this?
print(soup.get_text())

The error/output I get is:

<title>SparkNotes Search Results: sONNET</title>
Traceback (most recent call last):
  File "C:\Users\Cayle J. Elsey\Dropbox\Programming\Salient_Point\networking.py", line 10, in <module>
    print(soup.get_text())
  File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 6238: character maps to <undefined>
[Finished in 0.5s]
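
The last line of the traceback can be checked without fetching the page. The following is a minimal sketch, assuming the console encoding is cp1252 as the traceback indicates; `can_encode` is a hypothetical helper name, not part of Beautiful Soup or requests.

```python
# Hypothetical helper: checks whether a codec can represent a string.
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Plain ASCII text is representable in cp1252.
print(can_encode("Sonnet 18", "cp1252"))
# U+2192 (rightwards arrow) has no cp1252 mapping, which is what print() trips on.
print(can_encode("Next \u2192", "cp1252"))
# UTF-8 can represent the same character.
print(can_encode("Next \u2192", "utf-8"))
```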

Tags: data extraction, web scraping, text parsing, Beautiful Soup, encoding errors, requests library

1 answer

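Encoding the string as UTF-8 before printing avoids the error. The traceback shows that print() is encoding the text with the console's cp1252 codec, which has no mapping for '\u2192'; UTF-8 can represent every Unicode character.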

html = soup.prettify()
html = html.encode('UTF-8')  # now a bytes object, so print(html) shows an ASCII-safe b'...' literal
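
Printing a bytes object in Python 3 shows a `b'...'` literal with escape sequences rather than readable text. As a rough sketch of two alternatives that should also work on Python 3.4, the helpers below either write the UTF-8 bytes straight to the binary stream behind stdout or replace characters the console cannot show. The names `print_utf8` and `to_console_safe`, and the default of cp1252, are assumptions made for illustration; the sample string stands in for `soup.get_text()`.

```python
import sys

def print_utf8(text, stream=None):
    """Write text as UTF-8 bytes to a binary stream (default: sys.stdout.buffer)."""
    # Writing bytes directly bypasses the console's text codec entirely.
    stream = stream if stream is not None else sys.stdout.buffer
    stream.write(text.encode("utf-8") + b"\n")
    stream.flush()

def to_console_safe(text, encoding="cp1252"):
    """Replace characters the given codec cannot represent with '?'."""
    # Assumption: cp1252 is the console encoding, as in the traceback above.
    return text.encode(encoding, errors="replace").decode(encoding)

sample = "Next \u2192 page"  # stand-in for soup.get_text()
print(to_console_safe(sample))
```

Whether the UTF-8 bytes then display correctly depends on the terminal or editor that receives them, so the replacement approach is the safer choice when the output only needs to be readable.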

Answered 2025-04-18 by Python大师
