Python bs4: getting stray text

Posted 2021-09-16 23:02:19


    <li><a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>

                                    <span class="lists-rundown-no">(16)</span>
                                </a>
    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>




<span class="lists-rundown-no">(16)</span>
<a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>
                                    HERE!!
                                    <span class="lists-rundown-no">(16)</span>
                                </a></li>

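I want to get the text marked HERE!! using Beautiful Soup in Python, but it is a stray text node, so there is no selector or anything else that targets it directly. Is it possible to get it?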

What I tried:

import requests
from bs4 import BeautifulSoup

r = requests.get('anywebsite')
source = BeautifulSoup(r.content,"lxml")

for child in source.select("#atc-wrapper > ul"):
    for child2 in child.findChildren():
        print(child2)

2 answers

Answer 1

Based on the HTML shown, you can use next_sibling and change the CSS selector:

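from bs4 import BeautifulSoup as bs

soup = bs(html, 'lxml')  # html holds the markup shown in the question
soup.select_one('.lists-rundown-no + a > i').next_sibling.strip()  # with the question's variable: source.select_one('.lists-rundown-no + a > i').next_sibling.strip()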
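
For a self-contained check, the sketch below runs that selector against a trimmed copy of the question's markup. The `html` string and the choice of the built-in `html.parser` (instead of `lxml`) are assumptions made here so the snippet needs nothing beyond bs4 itself.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (an assumption for illustration).
html = '''<li><a class="atc-group" href="" data-url="/atc-kodlari/1">
<i class="fa atc-group-loading"></i>
<span class="lists-rundown-no">(16)</span>
</a>
<span class="lists-rundown-no">(16)</span>
<a class="atc-group" href="" data-url="/atc-kodlari/1">
<i class="fa atc-group-loading"></i>
HERE!!
<span class="lists-rundown-no">(16)</span>
</a></li>'''

soup = BeautifulSoup(html, "html.parser")

# The <span> that is a direct child of <li> is immediately followed by the
# last <a>; the <i> inside that <a> is followed by the stray text node.
icon = soup.select_one(".lists-rundown-no + a > i")
stray_text = icon.next_sibling.strip()
print(stray_text)
```

Note that `next_sibling` returns whatever node comes next, so this relies on the text sitting directly after the `<i>` tag, as it does in the markup shown.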

Answer 2

You can use the CSS selector a:last-of-type i to select the <i> element inside the last <a> element. Then use find_next() with the argument text=True:

data = '''    <li><a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>
                                    A - Gastrointestinal kanal ve metabolizma
                                    <span class="lists-rundown-no">(16)</span>
                                </a>
    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>


                                    A - Gastrointestinal kanal ve metabolizma

<span class="lists-rundown-no">(16)</span>
<a class="atc-group atc-group-active" href="" data-url="/atc-kodlari/1">
                                    <i class="fa fa-lg fa-pulse fa-spinner atc-group-loading" style="margin-right: 5px; display: none;"></i>
                                    HERE!!
                                    <span class="lists-rundown-no">(16)</span>
                                </a></li>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'lxml')

# select last i
i = soup.select_one('a:last-of-type i')

# select next text
print(i.find_next(text=True).strip())

Prints:

HERE!!
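
Beautiful Soup releases from 4.4.0 onward name this argument `string` rather than `text`. As a hedged alternative sketch, the direct text nodes of the last `<a>` can also be collected with `find_all(string=True, recursive=False)`, which does not depend on where the `<i>` sits. The `markup` string, the `direct_text` helper, and the use of `html.parser` are assumptions for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Simplified markup modelled on the answer's data (an assumption).
markup = '''<li><a class="atc-group" href="">
<i class="fa"></i>
A - Gastrointestinal kanal ve metabolizma
<span class="lists-rundown-no">(16)</span>
</a>
<a class="atc-group" href="">
<i class="fa"></i>
HERE!!
<span class="lists-rundown-no">(16)</span>
</a></li>'''


def direct_text(tag):
    """Join the non-empty text nodes that are direct children of `tag`."""
    pieces = tag.find_all(string=True, recursive=False)
    return " ".join(p.strip() for p in pieces if p.strip())


soup = BeautifulSoup(markup, "html.parser")
links = soup.select("a.atc-group")

# Text inside the nested <span> is skipped because recursive=False
# restricts the search to direct children of the <a>.
last_text = direct_text(links[-1])
print(last_text)
```

The same helper works for every link in the list, so `direct_text(links[0])` would give the label of the first group.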

Further reading:

CSS Selectors Reference
