Get a specific table from a web page with BeautifulSoup

import requests
from bs4 import BeautifulSoup

url = "http://www.dividend.com/dividend-stocks/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")

# Skip first two tables
tables = soup.find("table")
tables = tables.find_next("table")
tables = tables.find_next("table")

row = ''
for td in tables.find_all("td"):
    if len(td.text.strip()) > 0:
        row = row + td.text.strip().replace('\n', ' ') + ','
    # Handle last column in a row, remove extra comma and add new line
    if td.get('data-th') == 'Pay Date':
        row = row[:-1] + '\n'
print(row)

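3 answers

User

#1 · Edited 2024-04-25 01:25:14

Although @Barmar's approach looks cleaner, here is an alternative that uses soup.find_all and saves the result to JSON (even though the question did not ask for that).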

import json

import requests
from bs4 import BeautifulSoup

url = 'http://www.dividend.com/dividend-stocks/'
r = requests.get(url)
r.raise_for_status()
soup = BeautifulSoup(r.content, 'lxml')
stocks = {}

# Skip first two tables and header row of target table
for tr in soup.find_all('table')[2].find_all('tr')[1:]:
    (stock_symbol, company_name, _, dividend_yield, current_price,
     annual_dividend, ex_dividend_date, pay_date) = [
        td.text.strip() for td in tr.find_all('td')]
    stocks[stock_symbol] = {
        'company_name': company_name,
        'dividend_yield': float(dividend_yield.rstrip('%')),
        'current_price': float(current_price.lstrip('$')),
        'annual_dividend': float(annual_dividend.lstrip('$')),
        'ex_dividend_date': ex_dividend_date,
        'pay_date': pay_date
    }

with open('stocks.json', 'w') as f:
    json.dump(stocks, f, indent=2)
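
The float() calls above would raise ValueError on a cell such as '$1,234.50' or on an empty cell. A minimal sketch of a more tolerant conversion follows; parse_number is a hypothetical helper name, not part of the answer, and returning None for unparseable cells is an assumption about what the caller wants.

```python
def parse_number(text):
    """Convert strings such as '$20.80', '21.12%' or '$1,234.50' to float.

    Returns None for empty or non-numeric cells instead of raising.
    """
    # Drop the currency sign, the percent sign and thousands separators.
    cleaned = text.strip().lstrip('$').rstrip('%').replace(',', '')
    try:
        return float(cleaned)
    except ValueError:
        return None
```

It could stand in for the three float(...) expressions in the loop, for example 'current_price': parse_number(current_price).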

User

#2 · Edited 2024-04-25 01:25:14

You can use a selector to find a specific table:

tables = soup.select("table:nth-of-type(3)")

I don't know why your results come out in a different order from the one shown on the web page.
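
Two details are worth keeping in mind with the selector: soup.select returns a list (soup.select_one returns the first match), and :nth-of-type(3) counts tables among siblings that share a parent, not across the whole document. A small sketch on made-up HTML (the markup and ids below are assumptions for illustration, not the real page) shows the case where the selector and find_all('table')[2] agree:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three sibling tables, the third holds the data.
HTML = """
<html><body>
  <table id="first"><tr><th>a</th></tr></table>
  <table id="second"><tr><th>b</th></tr></table>
  <table id="third">
    <tr><th>Symbol</th><th>Pay Date</th></tr>
    <tr><td>BPT</td><td>4/20</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Third <table> among its siblings, as a single Tag rather than a list.
by_selector = soup.select_one("table:nth-of-type(3)")
# Third <table> in document order.
by_index = soup.find_all("table")[2]

cells = [td.get_text(strip=True) for td in by_selector.find_all("td")]
print(by_selector["id"], cells)
```

If the tables on the real page are nested in different containers, the two lookups can pick different elements, so it is worth checking which one matches the page structure.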

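User

#3 · Edited 2024-04-25 01:25:14

Thanks to @Barmar and @delicious lettuce for posting solutions and code. Regarding the order of the output: I realized that each time the page refreshes, I briefly see the data in the same order as my output, and only then do I see the sorted data. After trying a few different approaches, I was able to use Selenium WebDriver to extract the data as the browser renders it. Thank you all.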

BPT,BP Prudhoe Bay Royalty Trust,21.12%,$20.80,$4.39,4/11,4/20
PER,Sandridge Permian Trust,18.06%,$2.88,$0.52,5/10,5/26
CHKR,Chesapeake Granite Wash Trust,16.75%,$2.40,$0.40,5/18,6/1
NAT,Nordic American Tankers,13.33%,$6.00,$0.80,5/18,6/8
WIN,Windstream Corp,13.22%,$4.54,$0.60,6/28,7/17
NYMT,New York Mortgage Trust Inc,12.14%,$6.59,$0.80,6/22,7/25
IEP,Icahn Enterprises L.P.,11.65%,$51.50,$6.00,5/11,6/14
FTR,Frontier Communications,11.51%,$1.39,$0.16,6/13,6/30
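
The poster did not share the Selenium code, so the following is only a sketch of one way it might look, not the actual solution. It assumes Chrome with a matching driver is installed, that the target is still the third table, and that the page has finished sorting by the time page_source is read (an explicit wait may be needed in practice). The function names fetch_rendered_html and table_rows are hypothetical.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url):
    """Load the page in a real browser so its JavaScript can run first."""
    # Imported lazily so table_rows() below works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def table_rows(html, index=2):
    """Return the non-empty cell texts of each data row in the table at `index`."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[index]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        cells = [c for c in cells if c]  # skip empty cells, as in the question
        if cells:  # header rows contain only <th>, so they are dropped here
            rows.append(cells)
    return rows
```

Calling table_rows(fetch_rendered_html(url)) and printing ",".join(row) for each row would give comma-separated lines in the same shape as the output above.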
