Scraping a Wikipedia table with Beautiful Soup returns 'None'

2 votes

1 answer

32 views

Asked 2025-04-14 17:49

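I'm new to web scraping and to programming. This may be a simple problem for someone more experienced, or maybe not. Here it is:

I want to scrape a table from Wikipedia. I located the table in the page's HTML and put the relevant information into my code. But when I run the code, it prints 'None' instead of the table, so the table is not being found.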

from bs4 import BeautifulSoup
from urllib.request import urlopen


url = 'https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles'
html = urlopen(url) 
soup = BeautifulSoup(html, 'html.parser')            

table = soup.find('table',{'class':'wikitable sortable plainrowheaders jquery-tablesorter'})
print(table)

Returns: None

Tags: data parsing, web scraping, beautiful soup, Wikipedia, HTML, beginner programming

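1 Answer

Remove the jquery-tablesorter class from the "class" string. That class is added by JavaScript, so Beautiful Soup never sees it (note: always look at the real HTML document the server sends you, because that is what Beautiful Soup sees; press Ctrl-U in your browser to view it):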

from urllib.request import urlopen

from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", {"class": "wikitable sortable plainrowheaders"})
print(table)

Prints:

<table class="wikitable sortable plainrowheaders" style="text-align:center">
<caption>Name of song, core catalogue release, songwriter, lead vocalist and year of original release
</caption>
<tbody><tr>
<th scope="col">Song
</th>
<th scope="col">Core catalogue release(s)
</th>
<th scope="col">Songwriter(s)
</th>

...
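
As a follow-up to the fix above, here is a minimal offline sketch of two related habits: listing the class values that are actually present in the static HTML, and matching on individual classes rather than on the exact "class" string. The names STATIC_HTML, table_classes and find_song_table are hypothetical, and the inline snippet only stands in for the page the server sends; it is not the real Wikipedia markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server-sent HTML (no JavaScript has run on it).
STATIC_HTML = """
<table class="wikitable sortable plainrowheaders" style="text-align:center">
<caption>Songs</caption>
<tr><th scope="col">Song</th><th scope="col">Songwriter(s)</th></tr>
</table>
"""


def table_classes(html):
    """Return the list of class names carried by each <table> in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [t.get("class", []) for t in soup.find_all("table")]


def find_song_table(html):
    """Find the table by individual classes using a CSS selector.

    The selector requires each listed class to be present, but it does not
    depend on their order or on extra classes being absent.
    """
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one("table.wikitable.sortable.plainrowheaders")


if __name__ == "__main__":
    # Inspect what is really in the document before writing a selector.
    print(table_classes(STATIC_HTML))

    soup = BeautifulSoup(STATIC_HTML, "html.parser")

    # A multi-class string is compared against the whole attribute value,
    # so the class that JavaScript would add makes the lookup fail.
    missing = soup.find(
        "table",
        {"class": "wikitable sortable plainrowheaders jquery-tablesorter"},
    )
    print(missing is None)

    # Matching on a single class, or using a CSS selector, is less brittle.
    print(soup.find("table", class_="wikitable") is not None)
    print(find_song_table(STATIC_HTML) is not None)
```

Matching on one class with class_="wikitable" may also pick up other tables on a page that has several, so the CSS selector with all three classes is probably the closer equivalent of the original lookup.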

Answered 2025-04-14 by Python大师
