BeautifulSoup: "NoneType" error that is raised intermittently

import requests
from bs4 import BeautifulSoup as bs

tagsnumber = 0
even = 0
descdict_counter = -1
linkdict = {}

w3schools = requests.get('http://www.w3schools.com/tags/default.asp').text
table = bs(w3schools, "lxml").tbody
tdlist = table('td')  # to find the descriptions
alist = table('a')  # to get all the links

for link in alist:
    descdict_counter += 2  # to extract all the even td for descriptions
    fulllink = str('http://www.w3schools.com/tags/' + link.get('href'))
    shortdesc = str(tdlist[descdict_counter].string)
    key_iter = {str(link.string): fulllink}
    linkdict.update(key_iter)
    tagsnumber += 1

print('Total tags imported: ' + str(tagsnumber))
print(linkdict)

1 answer

User

Answer #1 · posted 2024-06-16 09:34:16

I agree with what @DYZ said. Beyond that, you might be interested in an alternative to BeautifulSoup that sometimes makes simpler solutions possible through XPath expressions: lxml.

>>> import requests
>>> w3schools = requests.get('http://www.w3schools.com/tags/default.asp').text
>>> from lxml import html
>>> tree = html.fromstring(w3schools)
>>> links = tree.xpath('//table[@class="w3-table-all notranslate"]//a')
>>> len(links)
119
>>> descrips = tree.xpath('//table[@class="w3-table-all notranslate"]//td[2]')
>>> len(descrips)
119
>>> links[0].attrib
{'href': 'tag_comment.asp'}
>>> descrips[0].text
'Defines a comment'
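The session above fetches the links and the descriptions as two separate lists. As a minimal offline sketch of how they might be paired into the kind of dict the question builds, the code below walks the table row by row with lxml instead of zipping the two lists, so a row missing a link or a description cannot shift the pairing. SAMPLE_HTML is a hypothetical two-row stand-in for the w3schools page (only the table class is taken from the XPath above), and tag_table and BASE_URL are illustrative names; the live page may be structured differently.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical stand-in for the w3schools page: a header row plus two tag rows, no <tbody>.
SAMPLE_HTML = """
<table class="w3-table-all notranslate">
  <tr><th>Tag</th><th>Description</th></tr>
  <tr><td><a href="tag_comment.asp">&lt;!--...--&gt;</a></td><td>Defines a comment</td></tr>
  <tr><td><a href="tag_doctype.asp">&lt;!DOCTYPE&gt;</a></td><td>Defines the document type</td></tr>
</table>
"""

# Base URL taken from the question's code.
BASE_URL = "http://www.w3schools.com/tags/"


def tag_table(page_source):
    """Map each tag name to (absolute link, description), one table row at a time."""
    tree = html.fromstring(page_source)
    result = {}
    for row in tree.xpath('//table[@class="w3-table-all notranslate"]//tr'):
        links = row.xpath("./td[1]/a")  # link in the first cell
        descs = row.xpath("./td[2]")    # description in the second cell
        if not links or not descs:
            continue  # header row (<th> cells) or a malformed row
        link = links[0]
        result[link.text_content()] = (
            urljoin(BASE_URL, link.get("href")),
            descs[0].text_content().strip(),
        )
    return result


for name, (url, desc) in tag_table(SAMPLE_HTML).items():
    print(name, url, desc)
```

Against the real page you would pass the downloaded text (w3schools in the session) instead of SAMPLE_HTML.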

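Edit: I almost forgot: your code depends on the presence of a tbody tag. Browsers will happily display pages whose tables lack this tag, so it is often omitted, even today when there is little excuse for it. If I'm not mistaken, its absence is what breaks your code: when the parsed page contains no tbody, bs(w3schools, "lxml").tbody returns None, and the next line, table('td'), then raises TypeError: 'NoneType' object is not callable.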
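A minimal sketch of a version that does not depend on tbody, assuming the table can be located by its w3-table-all class (taken from the XPath expressions above) and using Python's built-in "html.parser" instead of lxml. SAMPLE_HTML and parse_tags are hypothetical; the sample deliberately omits tbody, so soup.tbody is None just as in the failing case. Unlike the question's code, it stores (link, description) pairs rather than the link alone.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Base URL taken from the question's code.
BASE_URL = "http://www.w3schools.com/tags/"

# Hypothetical sample: the same kind of table, served without <tbody>.
SAMPLE_HTML = """
<table class="w3-table-all notranslate">
  <tr><th>Tag</th><th>Description</th></tr>
  <tr><td><a href="tag_comment.asp">&lt;!--...--&gt;</a></td><td>Defines a comment</td></tr>
</table>
"""


def parse_tags(page_source):
    soup = BeautifulSoup(page_source, "html.parser")
    # Look the table up by class instead of relying on soup.tbody,
    # which is None for this markup.
    table = soup.find("table", class_="w3-table-all")
    if table is None:
        raise ValueError("tag table not found; the page layout may have changed")
    linkdict = {}
    for row in table.find_all("tr"):  # matches rows with or without a <tbody> wrapper
        cells = row.find_all("td")
        if len(cells) < 2:
            continue  # header row uses <th>, not <td>
        link = cells[0].find("a")
        if link is None or link.get("href") is None:
            continue  # no usable link in this row
        linkdict[link.get_text()] = (
            urljoin(BASE_URL, link["href"]),
            cells[1].get_text(strip=True),
        )
    return linkdict


print(parse_tags(SAMPLE_HTML))
```

find_all("tr") searches all descendants of the table, so it finds the rows whether or not the server wraps them in tbody, and the explicit None checks turn a missing table or link into a clear error or a skipped row instead of a TypeError.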
