Python Beautiful Soup: 'ascii' codec can't encode character u'\xa5'

Posted 2024-04-25 05:14:21

Male | A programmer who enjoys writing Python code.

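While scraping certain elements from a web page, I ran into some strange characters. The characters that trigger the error look like this: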

? ????Á¢¢Á? /?? />? /??? ?/¢¥Á ??%% ?Á ?????Á? ?> /???¥??> ¥? ¥©Á ?>¢¥/%%/¥??> ?Â >Á? Â?Á ©???¢ ñ%Á?¥???/% Á%Á?¥??>?? />? Â??Á? ??¥?? ??¢¥????¥??> ¢`¢¥Á¢ ??%% ?Á ??À?/?Á? ¥? _ÁÁ¥ ?>??Á/¢?>À Á????Á>¥ ????¥Á? />? ??__?>??/¥??>¢ ?Á

My code is as follows:

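import time
import requests
from bs4 import BeautifulSoup

# Fragment, presumably from inside a loop; flagcnt is initialised elsewhere.
url = "http://www.nsf.gov#######@#@#@##"
# webbrowser.open(url, new=new)
flagcnt += 1
if flagcnt % 20 == 0:  # sleep periodically to avoid being shut out
    print "flagcount: "
    print flagcnt
    time.sleep(5)
# Program code extraction
r = requests.get(url)
sp = BeautifulSoup(r.content)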

Page: http://www.nsf.gov/awardsearch

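I have read every page I could find about this error. Some of them suggest decoding and encoding, but that did not seem to help. I don't know which encoding is used here. I also tried downgrading BS, which did not help either. Any help would be appreciated. Python 2.7, BS 4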
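For context, this error usually means that a unicode string containing a non-ASCII character (here U+00A5, the yen sign) is being encoded with the ASCII codec, which Python 2 does implicitly in calls such as `str()`. The following is a minimal sketch, not the asker's code: `raw` is a hypothetical stand-in for `r.content`, and `decode_page` with its candidate list is an assumed helper for trying encodings when the real one is unknown.

```python
# -*- coding: utf-8 -*-
# Hypothetical stand-in for r.content: UTF-8 bytes containing U+00A5 (yen sign).
raw = b'Award total: \xc2\xa5100'


def decode_page(raw_bytes, candidates=('utf-8', 'cp1252')):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for name in candidates:
        try:
            return raw_bytes.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this fallback never raises.
    return raw_bytes.decode('latin-1'), 'latin-1'


text, used = decode_page(raw)

# Encoding to ASCII is the step that raises the error from the title.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# An explicit UTF-8 encode succeeds and keeps the character.
utf8_out = text.encode('utf-8')
```

If this reflects the situation, encoding explicitly (for example `text.encode('utf-8')`) before printing or writing would be one way to avoid the implicit ASCII step.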

Tags: http url element webpage encoding new bs www

1 Answer

User

#1 · Posted 2024-04-25 05:14:21

This worked for me:

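# Drop every non-ASCII character so that ASCII-only output cannot fail.
page_text = r.text.encode('utf-8').decode('ascii', 'ignore')
# BeautifulSoup 3 style (import BeautifulSoup); with bs4, call BeautifulSoup(page_text).
page_soupy = BeautifulSoup.BeautifulSoup(page_text)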
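One trade-off of this approach is worth illustrating: the `'ignore'` error handler silently discards every non-ASCII character, so a yen sign on the page would vanish from the result. Below is a minimal sketch written to run on both Python 2.7 and Python 3. `strip_non_ascii` and the sample string are hypothetical names for illustration, not part of the answer above.

```python
# -*- coding: utf-8 -*-
def strip_non_ascii(text):
    """Same transformation as the answer: keep only ASCII characters."""
    return text.encode('utf-8').decode('ascii', 'ignore')


sample = u'Price: \xa5100'  # contains U+00A5 (yen sign)

# The yen sign is dropped, leaving plain ASCII text.
stripped = strip_non_ascii(sample)

# Alternative that keeps the character: encode explicitly as UTF-8 bytes.
preserved = sample.encode('utf-8')
```

If the non-ASCII characters carry meaning (currency symbols, accented names), the UTF-8 variant may be preferable. If they are only noise, stripping them as in the answer is simpler.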
