How to get the href from a given HTML class using Python BeautifulSoup

Posted 2024-06-17 13:19:49


Here is my HTML, stored in soup. It is already a BeautifulSoup object.

<center>

<!--[if lt IE 7]>
 <style type="text/css">
 div, img { behavior: url(http://www.addic7ed.com/js/iepngfix.htc) }
 </style>
<![endif]-->
<br /><center>
<!--Iframe Tag  -->

<!-- begin ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->

<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>

<!-- end ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center><br /><div id="container"> 
        <table class="tabel70" border="0"><tr><!-- table header --><td class="tablecorner"><img src="http://www.addic7ed.com/images/tl.gif" /></td>
                <td></td>
                <td class="tablecorner"><img src="http://www.addic7ed.com/images/tr.gif" /></td>
            </tr><tr><td></td>
                <td>
<form action="/search.php" method="get">
<div align="center">
<input name="search" type="text" id="search" size="50" value="nikita 03x02" class="inputCool" />&#160;
 <input name="Submit" type="submit" class="coolBoton" value="Search" /><br /><b>1 results found</b> </div><br /><center><br /><form action="https://www.paypal.com/cgi-bin/webscr" method="post">
    <input type="hidden" name="cmd" value="_s-xclick" /><input type="hidden" name="hosted_button_id" value="EC7EPAVR5MXV6" /><input type="image" src="https://www.paypal.com/en_US/i/btn/btn_donateCC_LG.gif" border="0" name="submit" alt="PayPal - The safer, easier way to pay online!" /><img alt="" border="0" src="https://www.paypal.com/en_US/i/scr/pixel.gif" width="1" height="1" /></form> <br /></center>
<br /><center>
<!--Iframe Tag  -->

<!-- begin ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->

<iframe src="http://d2.zedo.com/jsc/d2/ff2.html?n=2051;c=59;s=22;d=14;w=728;h=90" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" allowtransparency="true" width="728" height="90"></iframe>

<!-- end ZEDO for channel:  Addic7ed 728x90 , publisher: Addic7ed , Ad Dimension: Super Banner - 728 x 90 -->
</center>
<br /><table class="tabel" align="center" width="80%" border="0"><tr><td><img src="images/television.png" /></td><td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td></tr><tr><p>
</p><p>
</p></tr></table></form></td>
                <td></td>
            </tr><tr><!-- table footer --><td class="tablecorner"><img src="http://www.addic7ed.com/images/bl.gif" /></td>
                <td></td>
                <td class="tablecorner"><img src="http://www.addic7ed.com/images/br.gif" /></td>
            </tr></table></div>

I want to use BeautifulSoup and Python to get the href (i.e. "serie/Nikita/3/2/Innocence") from the table with class=tabel.

At the moment I can do it with

^{pr2}$

but this seems rather complicated. Is there a simpler (more Pythonic) way to get this URL?

Cheers


1 answer
Anonymous user
#1 · Posted 2024-06-17 13:19:49

Try this:

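from urllib.request import urlopen
from bs4 import BeautifulSoup, SoupStrainer

# url is the address of the search page; it must be defined beforehand
page = urlopen(url).read()
link_pat = SoupStrainer('a')  # parse only the <a> tags
links = BeautifulSoup(page, 'html.parser', parse_only=link_pat)
for link in links.find_all('a', href=True):  # skip anchors without an href
    href = link['href'].strip('/')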
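
The loop above visits every link on the page. If only the link inside the table with class "tabel" is wanted, a CSS selector is one possible shortcut. The following is a minimal sketch, assuming the bs4 package (BeautifulSoup 4) is installed; the html string is a trimmed-down copy of the markup from the question, and first_href is a hypothetical helper name used only for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the relevant markup from the question (an assumption:
# the real page has more surrounding elements, which the selector ignores).
html = """
<table class="tabel70" border="0"><tr><td>
<table class="tabel" align="center" width="80%" border="0"><tr>
<td><img src="images/television.png" /></td>
<td><a href="serie/Nikita/3/2/Innocence" debug="68217">Nikita - 03x02 - Innocence</a></td>
</tr></table>
</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")


def first_href(soup, css_class="tabel"):
    """Return the href of the first link inside a table carrying css_class, or None."""
    # "table.tabel" matches the class token exactly, so class="tabel70" is not matched.
    link = soup.select_one(f"table.{css_class} a[href]")
    return link["href"] if link is not None else None


href = first_href(soup)
print(href)
```

The same lookup can probably be written without CSS selectors as soup.find("table", class_="tabel").find("a", href=True)["href"], but that form raises an error when the table or the link is missing, whereas the helper above returns None.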
