Extracting colored text from a table using BeautifulSoup

<html>
<body>
  <div class="alert alert-warning alert-dismissable" role="alert">
    <div class="table-responsive">
      <table class="table table-sm" align="center" cellpadding="0" cellspacing="0">
        <tbody>
          <tr>
            <td width="24%">
              <strong> <font color="red">Bakers Basin</font> </strong>
            </td>
            <td width="24%">
              <strong>Oakland</strong>
            </td>
            ... ... ...
          </tr>
        </tbody>
      </table>
    </div>
  </div>
</body>
</html>

import urllib.request
from bs4 import BeautifulSoup


class Scraper:
    def __init__(self, site):
        self.site = site

    def scrape(self):
        r = urllib.request.urlopen(self.site)
        html = r.read()
        parser = "html.parser"
        soup = BeautifulSoup(html, parser)
        tabledmv = soup.find_all("font color=\"red\"")
        for tag in tabledmv:
            print("\n" + tabledmv.get_text())


website = "https://www.state.nj.us/mvc/"
Scraper(website).scrape()

2 Answers

User
Answer 1 · edited 2024-05-15 04:02:05

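The table is actually loaded from a separate page, the one requested in the code below.
To get only the red text, you can use the CSS selector soup.select('font[color="red"]'), as @Mr.Polywhill mentioned: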
import urllib.request
from bs4 import BeautifulSoup


class Scraper:
    def __init__(self, site):
        self.site = site

    def scrape(self):
        r = urllib.request.urlopen(self.site)
        html = r.read()
        parser = "html.parser"
        soup = BeautifulSoup(html, parser)
        tabledmv = soup.select('font[color="red"]')[1:]
        for tag in tabledmv:
            print(tag.get_text())


website = "https://www.state.nj.us/mvc/locations/agency.htm"
Scraper(website).scrape()
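For comparison, here is a minimal offline sketch of the same selector, run against a trimmed copy of the markup from the question instead of the live site. The names SAMPLE_HTML, red_text and red_text_find_all are made up for illustration. It also shows a find_all equivalent: the question's call find_all("font color=\"red\"") searches for a tag literally named `font color="red"`, so it matches nothing, whereas passing the attribute as a keyword filter works.

```python
from bs4 import BeautifulSoup

# Trimmed sample based on the markup in the question (hypothetical two-cell row).
SAMPLE_HTML = """
<table class="table table-sm">
  <tbody>
    <tr>
      <td width="24%"><strong><font color="red">Bakers Basin</font></strong></td>
      <td width="24%"><strong>Oakland</strong></td>
    </tr>
  </tbody>
</table>
"""


def red_text(html):
    """Return the text of every <font color="red"> element, via a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # Attribute selector: tag name followed by [attribute="value"].
    return [tag.get_text(strip=True) for tag in soup.select('font[color="red"]')]


def red_text_find_all(html):
    """Same lookup with find_all: tag name first, attribute as a keyword filter."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.find_all("font", color="red")]


if __name__ == "__main__":
    print(red_text(SAMPLE_HTML))
    print(red_text_find_all(SAMPLE_HTML))
```

Both helpers should return only the text of the red cell and skip the plain "Oakland" cell.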

User
Answer 2 · edited 2024-05-15 04:02:05

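The data is loaded from a different location, in this case 'https://www.state.nj.us/mvc/locations/agency.htm'. To get each town together with the heading of the column it appears under, you can use this example: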
import requests
from bs4 import BeautifulSoup

url = 'https://www.state.nj.us/mvc/locations/agency.htm'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for t in soup.select('td:has(font)'):
    i = t.find_previous('tr').select('td').index(t)
    if i < 2:
        print('{:<20} {}'.format(' '.join(t.text.split()), 'Licensing Centers'))
    else:
        print('{:<20} {}'.format(' '.join(t.text.split()), 'Vehicle Centers'))
Prints:
Bakers Basin         Licensing Centers
Cherry Hill          Vehicle Centers
Springfield          Vehicle Centers
Bayonne              Licensing Centers
Paterson             Licensing Centers
East Orange          Vehicle Centers
Trenton              Vehicle Centers
Rahway               Licensing Centers
Hazlet               Vehicle Centers
Turnersville         Vehicle Centers
Jersey City          Vehicle Centers
Wallington           Vehicle Centers
Delanco              Licensing Centers
Lakewood             Vehicle Centers
Washington           Vehicle Centers
Eatontown            Licensing Centers
Edison               Licensing Centers
Toms River           Licensing Centers
Newton               Vehicle Centers
Freehold             Licensing Centers
Runnemede            Vehicle Centers
Newark               Licensing Centers
S. Brunswick         Vehicle Centers
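The column-index idea can be tried without network access. The sketch below is only an illustration: SAMPLE_HTML is an invented four-column row (not the real page layout), label_towns is a hypothetical helper, and it assumes, as the code above does, that the first two columns are licensing centers and the remaining ones are vehicle centers.

```python
from bs4 import BeautifulSoup

# Invented row: columns 0-1 stand for licensing centers, columns 2-3 for vehicle centers.
SAMPLE_HTML = """
<table>
  <tr>
    <td><font color="red">Bakers Basin</font></td>
    <td>Oakland</td>
    <td><font color="red">Cherry Hill</font></td>
    <td>Flemington</td>
  </tr>
</table>
"""


def label_towns(html):
    """Pair each highlighted town with a heading chosen by its column position."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    # td:has(font) keeps only the cells that contain a <font> element.
    for td in soup.select("td:has(font)"):
        # Position of this cell among the cells of its own row.
        index = td.find_parent("tr").find_all("td").index(td)
        heading = "Licensing Centers" if index < 2 else "Vehicle Centers"
        # Collapse runs of whitespace inside the cell text.
        results.append((" ".join(td.text.split()), heading))
    return results


if __name__ == "__main__":
    for town, heading in label_towns(SAMPLE_HTML):
        print("{:<20} {}".format(town, heading))
```

This version uses find_parent("tr") rather than find_previous("tr"); for a cell sitting directly inside its row, both should resolve to the same element.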
