When scraping a web page, what do you do if the target tag is hidden behind an ellipsis?

import requests
from bs4 import BeautifulSoup

url = 'https://www.jisilu.cn/data/cbnew/cb_index/'
txt = requests.get(url)
txt.raise_for_status()
txt.encoding = 'utf-8'
soup = BeautifulSoup(txt.text, "html.parser")
body = soup.find('body')
div1 = body.find('div', attrs = {'class': 'grid data_content'})
div2 = div1.find_all('div', attrs = {'class': 'grid-row'})[1]
td = div2.find('td', attrs = {'valign': 'top'})
div3 = td.find('div', attrs = {'id': 'cb_index'})
div3

2 Answers

User

#1 · edited 2024-04-25 19:53:50

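Iterate over the <a> element to collect all the data from its child spans.

Alternatively, use element.encode_contents() to get the element's inner markup.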

<a title="点击查看明细" href="/data/cbnew/cb_index/" target="_blank">
转债等权指数：
<span style="color:red;">1265.389↑&nbsp;&nbsp;&nbsp;&nbsp;<span title="涨跌">+7.360</span>&nbsp;&nbsp;&nbsp;&nbsp;
<span title="涨幅">+0.590%</span></span>&nbsp;&nbsp;&nbsp;&nbsp;
平均价格 <span title="平均价格">120.295</span>&nbsp;&nbsp;&nbsp;&nbsp;转股溢价率 <span title="平均转股溢价率">21.20%</span>&nbsp;&nbsp;&nbsp;&nbsp;到期收益率 
<span title="平均到期收益率">-0.91%</span></a>
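Both suggestions can be tried against the markup above without touching the live site. The following is a minimal sketch, assuming the `<a>` element looks exactly like the sample; the names `HTML`, `link`, `values`, and `inner_html` are illustrative and not from the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the answer above.
HTML = """<a title="点击查看明细" href="/data/cbnew/cb_index/" target="_blank">
转债等权指数：
<span style="color:red;">1265.389↑&nbsp;&nbsp;&nbsp;&nbsp;<span title="涨跌">+7.360</span>&nbsp;&nbsp;&nbsp;&nbsp;
<span title="涨幅">+0.590%</span></span>&nbsp;&nbsp;&nbsp;&nbsp;
平均价格 <span title="平均价格">120.295</span>&nbsp;&nbsp;&nbsp;&nbsp;转股溢价率 <span title="平均转股溢价率">21.20%</span>&nbsp;&nbsp;&nbsp;&nbsp;到期收益率
<span title="平均到期收益率">-0.91%</span></a>"""

soup = BeautifulSoup(HTML, "html.parser")
link = soup.find("a")

# Option 1: iterate over every descendant <span> that has a title attribute
# and map the title to the span's text.
values = {
    span["title"]: span.get_text(strip=True)
    for span in link.find_all("span", title=True)
}
for title, text in values.items():
    print(title, text)

# Option 2: encode_contents() returns the inner markup of the tag as bytes;
# decode it to get a string that can be inspected or parsed again.
inner_html = link.encode_contents().decode("utf-8")
print(inner_html)
```

The outer red `<span>` has no `title`, so it is skipped by the first option; its leading text (the index value) is still present in `inner_html`.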

User

#2 · edited 2024-04-25 19:53:50

When I run into this problem, I usually use select() in place of find_all().

import requests
from bs4 import BeautifulSoup

url = 'https://www.jisilu.cn/data/cbnew/cb_index/'
txt = requests.get(url)
txt.raise_for_status()
txt.encoding = 'utf-8'
soup = BeautifulSoup(txt.text, "html.parser")
body = soup.find('body')
div1 = body.find('div', attrs = {'class': 'grid data_content'})
div2 = div1.find_all('div', attrs = {'class': 'grid-row'})[1]
# select() returns every <td> nested under div2
for td in div2.select('td'):
    # check each <div> inside the cell for the target id
    for div3 in td.select('div'):
        if div3.get('id') == 'cb_index':
            # some more code
            print(div3)

It is a little longer, but I find it usually works for me, and I have not noticed any difference in run time.
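Because select() accepts CSS selectors, the nested lookups can also be collapsed into one selector. This is only a sketch: the markup below is hypothetical, written to mimic the nesting described in the question, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the nesting described in the question.
HTML = """
<body>
  <div class="grid data_content">
    <div class="grid-row"><p>first row</p></div>
    <div class="grid-row">
      <table><tr><td valign="top">
        <div id="cb_index"><a href="/data/cbnew/cb_index/">index</a></div>
      </td></tr></table>
    </div>
  </div>
</body>
"""

soup = BeautifulSoup(HTML, "html.parser")

# One CSS selector walks the same path as the chained find() calls:
# container div -> top-aligned cell -> div with id "cb_index".
target = soup.select_one("div.grid.data_content td[valign=top] div#cb_index")

# select_one() returns None when nothing matches, so check before using it.
if target is not None:
    print(target.a["href"])
```

The explicit `None` check matters here: if the element is absent from the downloaded HTML, chained find() calls raise an AttributeError, while select_one() simply returns `None`.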

If you suspect the cause is that some parts of the HTML element are missing, consider adapting the code below to pick up any hidden elements:

div2.select("input[type=hidden]")
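To show what that selector returns, here is a small sketch on made-up markup. The field names and values are hypothetical, and `div2` stands in for the row element from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one visible field and two hidden ones.
HTML = """
<div class="grid-row">
  <input type="text" name="keyword" value="bond">
  <input type="hidden" name="token" value="abc123">
  <input type="hidden" name="page" value="1">
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
div2 = soup.find("div", attrs={"class": "grid-row"})

# The attribute selector matches only <input> elements whose type is "hidden".
hidden = {tag["name"]: tag["value"] for tag in div2.select("input[type=hidden]")}
print(hidden)
```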

Hope this helps.
