"属性错误：'HTTPResponse'对象没有'split'属性"

import urllib.request
import urllib
from bs4 import BeautifulSoup

symbolsfile = open("Stocklist.txt")
symbolslist = symbolsfile.read()
thesymbolslist = symbolslist.split("\n")

i = 0
while i < len(thesymbolslist):
    theurl = "http://www.google.com/finance/getprices?q=" + thesymbolslist[i] + "&i=10&p=25m&f=c"
    thepage = urllib.request.urlopen(theurl)
    print(thesymbolslist[i] + " price is " + thepage.split()[len(thepage.split())-1])
    i = i + 1

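2 Answers

User

Answer 1 · edited 2024-04-19 18:08:59

Cause of the problem

urllib.request.urlopen(theurl) returns an object that represents the connection (an HTTPResponse), not a string, so it has no split() method.

Solution

To read the data from that connection and get its contents, call read() on it: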

thepage = urllib.request.urlopen(theurl).read()

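After that, split() can be called on the result and the AttributeError goes away. In Python 3 the result is a bytes object, so it still has to be decoded before it is concatenated with a str, as the addendum below explains.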
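The difference between the response object, the bytes it yields, and the decoded string can be checked offline. The following is only a sketch: io.BytesIO stands in for the object urlopen() returns, and the payload and the symbol GOOG are invented samples, not real output from the service.

```python
import io

# Stand-in for the object urlopen() returns: it has read(), but no split().
# The payload below is an invented sample, not real service output.
fake_response = io.BytesIO(b"EXCHANGE=NASDAQ\n101.5\n102.25\n")

raw = fake_response.read()       # bytes in Python 3
text = raw.decode("utf-8")       # str, safe to concatenate with other str
last_price = text.split()[-1]    # last whitespace-separated token

print("GOOG price is " + last_price)
```

Using the index [-1] gives the last element directly, which avoids calling split() twice as the original len(...)-1 expression does.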

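Addendum to the solution

In Python 3, read() returns a bytes object rather than a str, so the result cannot be concatenated with ordinary strings until it has been decoded.

The right way to handle this is to find the correct character encoding and use it to decode the bytes into a regular string, as in the following snippet (adapted from a related question):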

thepage = urllib.request.urlopen(theurl)
# read the character encoding from the `Content-Type` response header;
# get_content_charset() returns None if the server does not declare one
charset_encoding = thepage.info().get_content_charset() or 'utf-8'
# decode the bytes using that encoding
thepage = thepage.read().decode(charset_encoding)
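The charset lookup can be wrapped in a small helper and tried without a network connection. This is a sketch under assumptions: decode_body and its default parameter are hypothetical names, and the headers are built by hand with email.message.Message, which is the base class of the headers object that an HTTPResponse exposes.

```python
from email.message import Message


def decode_body(raw: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode raw bytes with the charset declared in headers, else default."""
    # get_content_charset() returns None when no charset is declared.
    charset = headers.get_content_charset() or default
    # errors="replace" substitutes undecodable bytes instead of raising.
    return raw.decode(charset, errors="replace")


# Offline demonstration with hand-built headers (invented sample data).
headers = Message()
headers["Content-Type"] = "text/plain; charset=ISO-8859-1"
decoded = decode_body("café".encode("latin-1"), headers)

# No Content-Type header at all: the helper falls back to UTF-8.
plain = decode_body(b"42.0", Message())
```

With a live response, the same helper could be called as decode_body(thepage.read(), thepage.headers). A charset name that Python does not recognize would still raise LookupError, which this sketch does not handle.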

Sometimes it is safe to assume that the character encoding is UTF-8, in which case

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')

usually works. UTF-8 is a statistically good guess.

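User

Answer 2 · edited 2024-04-19 18:08:59

Checking the documentation may save you time in the future. It says that urlopen() returns an HTTPResponse object, which has a read() method. In Python 3 you need to decode the bytes that read() returns; in this case the encoding is UTF-8. So just write: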

thepage = urllib.request.urlopen(theurl).read().decode('utf-8')
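Putting the fix back into the original loop might look like the sketch below. The URL is taken verbatim from the question, and whether that endpoint still responds is not verified here. The function names last_price, read_symbols and price_lines, as well as the opener parameter (which lets a stand-in replace urlopen for offline testing), are hypothetical and introduced only for illustration.

```python
import urllib.request
from urllib.parse import quote

# URL pattern copied from the question; availability is not verified.
BASE_URL = "http://www.google.com/finance/getprices?q={symbol}&i=10&p=25m&f=c"


def last_price(symbol, opener=urllib.request.urlopen):
    """Return the last whitespace-separated field of the response, or None."""
    url = BASE_URL.format(symbol=quote(symbol))
    with opener(url) as response:
        # read() returns bytes; decode before using str methods on the text.
        text = response.read().decode("utf-8")
    fields = text.split()
    return fields[-1] if fields else None


def read_symbols(path):
    """Read one symbol per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def price_lines(symbols, opener=urllib.request.urlopen):
    """Build one report line per symbol, in the question's output format."""
    return [f"{s} price is {last_price(s, opener)}" for s in symbols]
```

With a real Stocklist.txt in the working directory, print("\n".join(price_lines(read_symbols("Stocklist.txt")))) would reproduce the output the question was after. Skipping blank lines also avoids the empty symbol that split("\n") produces when the file ends with a newline.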
