How to use find() on list elements in Python/BeautifulSoup - I'm getting a NoneType error

Asked 2025-04-18 13:54

OK, this code works:

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY SITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())

title = soup.find('p', {'class': 'deal-title should-truncate'}).getText()  
print "Title: " + str(title)

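But the code above only returns the first result. I want to go through the whole page and find every occurrence. To do that, I tried a list comprehension that collects every figure tag (the paragraph tag I want always sits inside a figure tag), so that I can then look only at the content inside each figure. However, when I try the following code: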

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY WEBSITE URL SITS HERE)
soup = BeautifulSoup(htmlfile.read())

deals = [figure for figure in soup.findAll('figure')]

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()  
    print "Title: " + str(title)

I get this error:

Traceback (most recent call last):
  File "C:\Python27\blah.py", line 11, in <module>
    title = i.find('p', {'class': 'deal-title should-truncate'}).getText()
AttributeError: 'NoneType' object has no attribute 'getText'

Now I'm trying:

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())

deals = soup.findAll('figure')

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if (title == None):
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + str(title)

The error then changed to:

Traceback (most recent call last):
  File "C:\Python27\blah.py", line 16, in <module>
    print "Title: " + str(title)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 12: ordinal not in range(128)
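
For reference, the UnicodeEncodeError above is what happens when text containing a non-ASCII character is encoded with the ASCII codec, which is what str() does implicitly to a unicode object in Python 2. The snippet below is only a sketch of that mechanism, written for Python 3 (an assumption; the code in the question is Python 2), and the sample title is made up so that the en dash lands at position 12 as in the traceback.

```python
# Hypothetical title text; the en dash (U+2013) sits at index 12.
title = "Pizza for 2 \u2013 half price"


def ascii_error(text):
    """Return the UnicodeEncodeError raised by ASCII-encoding text, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


error = ascii_error(title)
if error is not None:
    # error.start is the index of the first character that could not be encoded.
    print("Cannot encode character at position", error.start)

# One possible workaround when ASCII output is really required:
# replace unencodable characters with '?' instead of raising.
safe = title.encode("ascii", errors="replace")
print(safe)
```

Keeping the value as text and never forcing it through ASCII avoids the problem entirely; the errors="replace" call is only shown as a fallback.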

1 Answer


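Final answer, with special thanks to BlackJack for the help. The only change from the last attempt in the question is that the print line no longer wraps title in str():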

from bs4 import BeautifulSoup
import urllib
import re

htmlfile = urllib.urlopen(MY SITE SITS HERE)
soup = BeautifulSoup(htmlfile.read())

deals = soup.findAll('figure')

for i in deals:
    title = i.find('p', {'class': 'deal-title should-truncate'})
    if title is None:
        title = "NONE"
    else:
        title = title.getText()
    print "Title: " + title
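
For anyone on Python 3, the same idea might look roughly like the sketch below. It is not the original poster's code: the inline HTML is invented to stand in for the real page (which is not shown in the question), deal_titles is a hypothetical helper name, and the built-in "html.parser" backend is assumed. The point it illustrates is that find() returns None for any figure that has no matching paragraph, so the result has to be checked before calling get_text().

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page: the second <figure>
# has no title paragraph, which is the case that produced the NoneType error.
HTML = """
<figure><p class="deal-title should-truncate">Pizza for 2 \u2013 half price</p></figure>
<figure><img src="banner.png"></figure>
<figure><p class="deal-title should-truncate">Spa day</p></figure>
"""


def deal_titles(html):
    """Return one title per <figure>, with "NONE" where no title paragraph exists."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for figure in soup.find_all("figure"):
        # find() returns None when nothing matches, so guard before get_text().
        p = figure.find("p", {"class": "deal-title should-truncate"})
        titles.append("NONE" if p is None else p.get_text())
    return titles


for title in deal_titles(HTML):
    print("Title: " + title)
```

In Python 3 every str is Unicode, so concatenating and printing the title needs no str() call, and the en dash is printed as long as the console encoding supports it.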
