Parsing an element's href attribute value with BeautifulSoup and Mechanize

<div class="rc" data-hveid="53"> <h3 class="r"> <a href="https://billing.anapp.com/" onmousedown="return rwt(this,'','','','2','AFQjCNGqpb38ftdxRdYvKwOsUv5EOJAlpQ','m3fly0i1VLOK9NJkV55hAQ','0CDYQFjAB','','',event)">Billing: Portal Home</a> </h3>

1 answer

User

#1 · Posted 2024-05-16 15:31:42

from bs4 import BeautifulSoup

html = """
<div class="rc" data-hveid="53">
<h3 class="r">
<a href="https://billing.anapp.com/" onmousedown="return rwt(this,'','','','2','AFQjCNGqpb38ftdxRdYvKwOsUv5EOJAlpQ','m3fly0i1VLOK9NJkV55hAQ','0CDYQFjAB','','',event)">Billing: Portal Home</a>
</h3>
"""

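# Name the parser explicitly; omitting it makes bs4 pick one and emit a warning.
bs = BeautifulSoup(html, "html.parser")
# Select every <a> nested inside an <h3 class="r">.
elms = bs.select("h3.r a")
for elm in elms:
    print(elm.attrs["href"])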

Prints:

https://billing.anapp.com/

h3.r a is a CSS selector.

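You can use CSS selectors (which I prefer), XPath (through a library such as lxml, since BeautifulSoup itself does not evaluate XPath), or the find methods on elements. The selector h3.r a looks for every h3 with class r and returns the a elements inside them. A selector can be more involved, for example #an_id table tr.the_tr_class td.the_td_class: this starts at the element whose id is an_id, descends into a table inside it, and returns the td cells with class the_td_class that sit in a tr with class the_tr_class.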
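To make the longer selector concrete, here is a minimal sketch. The markup is invented for this illustration (only the id and class names come from the selector above), and it assumes the built-in html.parser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, written only to exercise the selector from the text.
html = """
<div id="an_id">
  <table>
    <tr class="the_tr_class">
      <td class="the_td_class">match</td>
      <td class="other">skipped: wrong td class</td>
    </tr>
    <tr class="other_row">
      <td class="the_td_class">skipped: wrong tr class</td>
    </tr>
  </table>
</div>
<table>
  <tr class="the_tr_class"><td class="the_td_class">skipped: outside #an_id</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Each space in the selector is a descendant combinator, so every step
# narrows the search to elements nested inside the previous match.
cells = soup.select("#an_id table tr.the_tr_class td.the_td_class")
texts = [td.get_text(strip=True) for td in cells]
print(texts)
```

Only the cell that satisfies every step of the selector is returned; the other three cells each fail one condition.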

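The find_all version below gives the same result. find_all returns a list-like ResultSet of bs4.element.Tag objects and accepts a recursive parameter. I am not sure whether this can be done in one line; personally I prefer CSS selectors because they are simple and clean.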

# Outer loop: every <h3> with class "r"; inner loop: every <a> below it.
for elm in bs.find_all("h3", attrs={"class": "r"}):
    for a_elm in elm.find_all("a"):
        print(a_elm.attrs["href"])
