Extract specific links with Beautiful Soup

Posted 2024-06-16 10:55:43


I'm trying to extract specific links from the HTML code below:

 <div class="RadAjaxPanel" id="LiveBoard1_LiveBoard1_litGamesPanel">
<br /><b><a href="winss.aspx?team=White Sox&pos=all&stats=bat&qual=0&type=8&season=2018&month=0&season1=2018">White Sox</a></b> @ <b><a href="winss.aspx?team=Athletics&pos=all&stats=bat&qual=0&type=8&season=2018&month=0&season1=2018">Athletics</a></b>&nbsp;&nbsp;15:35 ET<br /><center><table style="width:360px;"><tr><td align="center" width="120.07295665741px" style="border:1px solid black;">33.4 %</td><td align="center" width="239.92704334259px" style="border:1px solid black;">66.6 %</td></tr><table></center><br /><center><table style="width:360px;" class="lineup"><tr><td align="left">SP: <a href="statss.aspx?playerid=18311&position=P">Carson Fulmer</a></td><td align="left">SP: <a href="statss.aspx?playerid=13533&position=P">Andrew Triggs</a></td></tr><tr><td align="left">1. <a href="statss.aspx?playerid=17232&position=2B">Yoan Moncada</a> (2B)<br />2. <a href="statss.aspx?playerid=11602&position=2B">Yolmer Sanchez</a> (3B)<br />3. <a href="statss.aspx?playerid=15676&position=1B">Jose Abreu</a> (DH)<br />4. <a href="statss.aspx?playerid=13157&position=OF">Nick Delmonico</a> (LF)<br />5. <a href="statss.aspx?playerid=7226&position=3B/DH">Matt Davidson</a> (1B)<br />6. <a href="statss.aspx?playerid=5913&position=OF">Leury Garcia</a> (RF)<br />7. <a href="statss.aspx?playerid=3256&position=C">Welington Castillo</a> (C)<br />8. <a href="statss.aspx?playerid=15172&position=SS">Tim Anderson</a> (SS)<br />9. <a href="statss.aspx?playerid=15082&position=OF">Adam Engel</a> (CF)<br /></td>

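I want the final extraction to include the team names (in this case, the Athletics and the White Sox) and the corresponding win probabilities (33.4% and 66.6%). I can extract all of these links with Beautiful Soup, but I can't get rid of the lineup links. I noticed that all of the lineup link URLs start with "statss". When extracting all the links on the page, is there a way to tell Beautiful Soup to decompose the "statss" links? My current code is below. As you can see, I tried using the decompose function on class=lineup, but the output still returns the entire lineup. Thanks in advance for your help.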

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.fangraphs.com/livescoreboard.aspx?date=2018-04-18')
soup = BeautifulSoup(page.text, 'html.parser')

# Remove lineup links
lineup_links = soup.find(class_='lineup')
lineup_links.decompose()

# Team-name links inside the games panel
team_name_list = soup.find(class_='RadAjaxPanel')
team_name_list_items = team_name_list.find_all('a')

for team_name in team_name_list_items:
    print(team_name.prettify())

# Win-probability cells (the bordered <td> elements)
odds_list = soup.find(class_='RadAjaxPanel')
odds_list_items = odds_list.find_all('td', attrs={'style': 'border:1px solid black;'})

for odds in odds_list_items:
    print(odds.prettify())
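One way to express the "statss" idea directly is to skip those links by their `href` prefix instead of removing anything from the tree. This is a minimal sketch: the HTML string is a trimmed, hypothetical copy of the markup in the question, and it assumes the live page keeps the same structure.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the question's markup (assumption: the
# live page keeps this structure).
html = """
<div class="RadAjaxPanel" id="LiveBoard1_LiveBoard1_litGamesPanel">
<b><a href="winss.aspx?team=White Sox&pos=all">White Sox</a></b> @
<b><a href="winss.aspx?team=Athletics&pos=all">Athletics</a></b>
<table class="lineup"><tr>
<td>SP: <a href="statss.aspx?playerid=18311&position=P">Carson Fulmer</a></td>
<td>SP: <a href="statss.aspx?playerid=13533&position=P">Andrew Triggs</a></td>
</tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
panel = soup.find(class_="RadAjaxPanel")

# Keep only links that have an href and whose href does not start with
# "statss" (the player/lineup links); the tree itself is left untouched.
team_links = [
    a for a in panel.find_all("a", href=True)
    if not a["href"].startswith("statss")
]
team_names = [a.get_text(strip=True) for a in team_links]
print(team_names)
```

Filtering leaves the document intact, which can be handy if the lineup data is needed later in the same script.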

1 Answer
User
#1 · Posted 2024-06-16 10:55:43

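It looks like you're only removing the first instance, not every one: `find()` returns just the first match. Try looping over all the matches and decomposing them one by one, for example: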

# Remove lineup links
for lineup in soup.find_all(class_='lineup'):
    lineup.decompose()
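A self-contained sketch of the whole flow: decompose every lineup table, then pair each team with its odds cell. It runs on a trimmed, hypothetical copy of the question's markup and assumes each game is laid out as two team links, then a two-cell odds table, then a `lineup` table.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the question's markup (assumption: team
# links, then a two-cell odds table, then a "lineup" table per game).
html = """
<div class="RadAjaxPanel" id="LiveBoard1_LiveBoard1_litGamesPanel">
<b><a href="winss.aspx?team=White Sox">White Sox</a></b> @
<b><a href="winss.aspx?team=Athletics">Athletics</a></b>
<table><tr>
<td style="border:1px solid black;">33.4 %</td>
<td style="border:1px solid black;">66.6 %</td>
</tr></table>
<table class="lineup"><tr>
<td>SP: <a href="statss.aspx?playerid=18311&position=P">Carson Fulmer</a></td>
<td>SP: <a href="statss.aspx?playerid=13533&position=P">Andrew Triggs</a></td>
</tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every lineup table; each one is removed from the tree.
for lineup in soup.find_all(class_="lineup"):
    lineup.decompose()

panel = soup.find(class_="RadAjaxPanel")

# With the lineups gone, the only remaining links are the team links.
teams = [a.get_text(strip=True) for a in panel.find_all("a")]
odds = [
    td.get_text(strip=True)
    for td in panel.find_all("td", attrs={"style": "border:1px solid black;"})
]

# Pair each team with its win probability, in page order.
results = list(zip(teams, odds))
print(results)
```

Pairing by position with `zip` assumes the odds cells appear in the same order as the team links; on a page with several games this holds only if every game follows the same layout.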
