BeautifulSoup: find all titles

Posted 2024-04-30 02:10:44


The HTML is:

<div class="trn-defstat__value">
    <img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-ash.16913d82e3.png" title="ASH" style="height:    35px; padding-right: 8px;"> 
    <img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-jager.600b2773be.png" title="JÄGER"   style="height: 35px; padding-right: 8px;">
    <img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-bandit.385144d970.png" title="BANDIT"     style="height: 35px; padding-right: 8px;">
</div>

I want to get the value of each title attribute.

So far, this is what I have written:

from bs4 import BeautifulSoup as bs
import requests


bsURL = "https://r6.tracker.network/profile/pc/Spoit.GODSENT"
response = requests.get(bsURL)
html = bs(response.text, 'html.parser')


title = html.find_all(class_='trn-defstat__value')[4]

print(title)

Result:

<div class="trn-defstat__value">
<img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-ash.16913d82e3.png" style="height: 35px; padding-right: 8px;" title="ASH"/>
<img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-jager.600b2773be.png" style="height: 35px; padding-right: 8px;" title="JÄGER"/>
<img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-bandit.385144d970.png" style="height: 35px; padding-right: 8px;" title="BANDIT"/>
</div>

What should I do to get just the titles?


3 Answers

This should help:

from bs4 import BeautifulSoup

html = """
<div class="trn-defstat__value">
    <img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-ash.16913d82e3.png" title="ASH" style="height:    35px; padding-right: 8px;"> 
    <img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-jager.600b2773be.png" title="JÄGER"   style="height: 35px; padding-right: 8px;">
    <img src="https://trackercdn.com/rainbow6-ubi/assets/images/badge-bandit.385144d970.png" title="BANDIT"     style="height: 35px; padding-right: 8px;">
</div>
"""
soup = BeautifulSoup(html,'html.parser')

imgs = soup.find_all('img')

for img in imgs:
    print(img['title'])

Output:

ASH
JÄGER
BANDIT

Here is the complete code:

from bs4 import BeautifulSoup as bs
import requests

bsURL = "https://r6.tracker.network/profile/pc/Spoit.GODSENT"
response = requests.get(bsURL)
html = bs(response.text, 'html.parser')

# every div that can hold a stat value; only some of them contain images
divs = html.find_all('div', class_="trn-defstat__value")

# find_all() returns an empty list when a div has no <img>, so no try/except is needed
imgs = [img for div in divs for img in div.find_all('img')]

for img in imgs:
    print(img['title'])

Output:

ASH
JÄGER
BANDIT
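
The loop and list flattening above can arguably be collapsed into a single CSS selector. A minimal sketch, assuming the same markup as in the question and run against an inline snippet rather than the live page. The shortened `src` values, the extra `<img>` without a title, and the names `snippet` and `titles` are illustrative assumptions, not part of the original page:

```python
from bs4 import BeautifulSoup

# Inline stand-in for the markup in the question, so the example runs offline.
# The last <img> deliberately has no title attribute (an assumption for illustration).
snippet = """
<div class="trn-defstat__value">
    <img src="badge-ash.png" title="ASH">
    <img src="badge-jager.png" title="JÄGER">
    <img src="badge-bandit.png" title="BANDIT">
    <img src="no-title.png">
</div>
"""

soup = BeautifulSoup(snippet, "html.parser")

# "img[title]" matches only <img> tags that carry a title attribute,
# so img["title"] cannot raise KeyError here.
titles = [img["title"] for img in soup.select("div.trn-defstat__value img[title]")]
print(titles)  # ['ASH', 'JÄGER', 'BANDIT']
```

Filtering in the selector keeps the attribute lookup safe without a try/except around it.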

This script prints the titles of all <img> tags in the Top Operators section:

from bs4 import BeautifulSoup as bs
import requests


bsURL = "https://r6.tracker.network/profile/pc/Spoit.GODSENT"
response = requests.get(bsURL)
html = bs(response.text, 'html.parser')

# find the Top Operators label (string= replaces the deprecated text= argument)
operators = html.find(class_='trn-defstat__name', string='Top Operators')

for img in operators.find_next('div').find_all('img'):
    print(img['title'])

Prints:

ASH
JÄGER
BANDIT

Or using a CSS selector:

# :-soup-contains() replaces the deprecated :contains() (requires soupsieve 2.1+)
for img in html.select('.trn-defstat__name:-soup-contains("Top Operators") + * img'):
    print(img['title'])
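
Both variants rely on the label element being followed directly by the element that holds the images. A small offline sketch of that sibling lookup, assuming a simplified version of the page markup. The wrapper structure, the "Kills" row and its value are hypothetical, inferred from the class names used in this answer rather than copied from the live site:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: each stat is a label div followed by a value div.
snippet = """
<div class="trn-defstat">
    <div class="trn-defstat__name">Kills</div>
    <div class="trn-defstat__value">1,234</div>
</div>
<div class="trn-defstat">
    <div class="trn-defstat__name">Top Operators</div>
    <div class="trn-defstat__value">
        <img src="badge-ash.png" title="ASH">
        <img src="badge-jager.png" title="JÄGER">
    </div>
</div>
"""

soup = BeautifulSoup(snippet, "html.parser")

# Locate the label by class and exact text, then step to the next <div> in the document.
label = soup.find(class_="trn-defstat__name", string="Top Operators")
operator_titles = [img["title"] for img in label.find_next("div").find_all("img")]
print(operator_titles)  # ['ASH', 'JÄGER']
```

Anchoring on the label text instead of a positional index such as `[4]` should keep working if the site adds or reorders stat blocks, as long as the label itself stays the same.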

Simply use the .get() method to read an attribute, passing in the attribute name.

pip install html5lib

I recommend using html5lib; I believe it is a better parser.

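from bs4 import BeautifulSoup as bs
import requests

bsURL = "https://r6.tracker.network/profile/pc/Spoit.GODSENT"
response = requests.get(bsURL)

# parse with html5lib instead of the built-in html.parser
html = bs(response.content, 'html5lib')

# narrow the search to the Top Operators block first
container = html.find("div", class_="trn-defstat mb0 top-operators")
imgs = container.find_all("img")

for img in imgs:
    # .get() returns None instead of raising if the attribute is missing
    print(img.get("title"))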

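I'm not sure which part of the site you want to scrape, but keep in mind that it often helps to first grab the HTML of the block that contains the details you want :)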
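
The two ideas in this answer, scoping the search to one block and reading attributes with .get(), can be sketched together on an inline snippet. This is only an illustration under assumptions: the markup is a hypothetical reduction of the page, the second `<img>` without a title is added to show the difference, and `snippet` and `titles` are made-up names:

```python
from bs4 import BeautifulSoup

# Hypothetical reduction of the Top Operators block; class names are taken
# from the answer's code, the rest is assumed for illustration.
snippet = """
<div class="trn-defstat mb0 top-operators">
    <img src="badge-ash.png" title="ASH">
    <img src="badge-unknown.png">
</div>
"""

soup = BeautifulSoup(snippet, "html.parser")

# Matching on one of the element's classes is enough to find the block.
container = soup.find("div", class_="top-operators")

titles = []
if container is not None:  # find() returns None when nothing matches
    for img in container.find_all("img"):
        # .get() returns None for a missing attribute instead of raising KeyError
        titles.append(img.get("title"))
print(titles)  # ['ASH', None]
```

Checking the container for None before calling find_all() avoids an AttributeError when the page layout changes or the request returns a different page than expected.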
