Scraping data from a simple table

from bs4 import BeautifulSoup
import urllib2  # Python 2 only; on Python 3 use urllib.request instead

url = "http://www.gks.ru/bgd/free/B00_25/IssWWW.exe/Stg/d000/000715.HTM"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), 'html.parser')
table = soup.findAll('p', text=True)
print(table)

1 Answer

User

#1 · Posted on 2024-04-25 00:39:18

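Assuming you want the monthly price data, find all the tr elements inside the table and skip the first 3 rows (the header rows). Note that html.parser did not work for me, but lxml did (see "Differences between parsers" in the Beautiful Soup documentation):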

soup = BeautifulSoup(page, 'lxml')  # requires 'lxml' to be installed

table = soup.find("center").find("table")
for row in table.find_all("tr")[3:]:
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    print(cells)

Prints:

['January', '469,4', '15,0', '3,9']
['February', '479,8', '16,7', '2,2']
['March', '485,6', '16,9', '1,2']
['April', '487,8', '16,4', '0,5']
['May', '489,5', '15,8', '0,4']
['June', '490,5', '15,3', '0,2']
['July', '494,4', '15,6', '0,8']
['August', '496,1', '15,8', '0,4']
['September', '499,0', '15,7', '0,6']
['October', '502,7', '15,6', '0,7']
['November', '506,4', '15,0', '0,8']
['December', '', '', '']
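
The cells come back as strings that use a decimal comma, and the last row is empty, so some conversion is usually needed before doing arithmetic on them. Below is a minimal sketch of one way to do that. The helper names `to_float` and `parse_row` are hypothetical, the column meanings are left unnamed because the output above does not state them, and the code assumes the numbers contain no thousands separators.

```python
from typing import List, Optional


def to_float(text: str) -> Optional[float]:
    """Convert a decimal-comma string such as '469,4' to a float.

    An empty cell (as in the December row) becomes None.
    """
    text = text.strip()
    if not text:
        return None
    return float(text.replace(",", "."))


def parse_row(cells: List[str]) -> dict:
    """Split a row into its month label and a list of numeric values."""
    month, *values = cells
    return {"month": month, "values": [to_float(v) for v in values]}


# Two rows copied from the printed output above.
rows = [
    ["January", "469,4", "15,0", "3,9"],
    ["December", "", "", ""],
]
parsed = [parse_row(r) for r in rows]
print(parsed)
```

In the scraping loop, `parse_row(cells)` could be called in place of `print(cells)` to collect the converted rows into a list.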
