How to use find() on list elements in Python/BeautifulSoup - I get a NoneType error
OK, so this code works:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
title = soup.find('p', {'class': 'deal-title should-truncate'}).getText()
print "Title: " + str(title)
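But the code above only returns the first result. I want to go through the whole page and find every occurrence. To do that, I tried using a list comprehension to collect every figure tag (the paragraph tag always sits inside a figure tag), so that I can focus on the contents of each figure. However, when I try the code below: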
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY WEBSITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = [figure for figure in soup.findAll('figure')]
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
    print "Title: " + str(title)
I get this error:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 11, in <module>
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
AttributeError: 'NoneType' object has no attribute 'getText'
Now I'm trying:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = soup.findAll('figure')
for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if (title == None):
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + str(title)
And the error has changed to:
Traceback (most recent call last):
  File "C:\Python27\blah.py", line 16, in <module>
    print "Title: " + str(title)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 12: ordinal not in range(128)
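For context, the character in the message, u'\u2013', is an en dash, and in Python 2 calling str() on a unicode string implicitly encodes it with the default ASCII codec. Below is a minimal sketch of that failure and of an explicit UTF-8 encode as one possible way around it. The title text is made up for illustration (the real page text is not shown here), and the snippet should run under both Python 2 and Python 3.

```python
# Hypothetical title containing an en dash (U+2013); not taken from the real page.
title = u"Deal \u2013 50% off"

# Encoding to ASCII fails because U+2013 is outside the 0-127 range.
try:
    title.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 succeeds and yields a byte string.
utf8_bytes = title.encode("utf-8")
```

Whether the encoded bytes display correctly still depends on the console's own encoding, so treat this as an illustration of the error rather than a guaranteed fix.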
1 Answer
Final answer, with special thanks to BlackJack for the help:
from bs4 import BeautifulSoup
import urllib
import re
htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())
deals = soup.findAll('figure')
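for i in deals:
    # find() returns None when this figure has no matching <p>
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = "NONE"
    else:
        title = title.getText()
    # No str() here: in Python 2, str() on a unicode string encodes it as ASCII
    print "Title: " + title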
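The same None check can be tried without hitting the network. The following is a minimal sketch, assuming a made-up HTML snippet in place of the real page and the standard-library "html.parser" backend. The markup, the variable names, and the "NONE" placeholder are illustrative only. It uses find_all() and get_text(), the bs4 spellings of findAll() and getText(), and print() with parentheses so that it should run on Python 2 and Python 3 alike.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: the middle
# <figure> has no matching <p>, which is what made find() return None.
html = """
<figure><p class="deal-title should-truncate">Deal one</p></figure>
<figure><img src="banner.png"></figure>
<figure><p class="deal-title should-truncate">Deal two</p></figure>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
for figure in soup.find_all("figure"):
    p = figure.find("p", {"class": "deal-title should-truncate"})
    # Only call get_text() when a tag was actually found.
    titles.append(p.get_text() if p is not None else "NONE")

for t in titles:
    print("Title: " + t)
```

If figures without a title should simply be skipped rather than reported, the loop could use `continue` when `p` is None instead of appending a placeholder.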