Parsing data with BeautifulSoup in Python

<html>
<body>
<div class="list-authors">
  <span class="descriptor">Authors:</span>
  <a href="/find/astro-ph/1/au:+Lin_D/0/1/0/all/0/1">Dacheng Lin</a>,
  <a href="/find/astro-ph/1/au:+Remillard_R/0/1/0/all/0/1">Ronald A. Remillard</a>,
  <a href="/find/astro-ph/1/au:+Homan_J/0/1/0/all/0/1">Jeroen Homan</a>
</div>
<div class="list-authors">
  <span class="descriptor">Authors:</span>
  <a href="/find/astro-ph/1/au:+Kosovichev_A/0/1/0/all/0/1">A.G. Kosovichev</a>
</div>
</body>
</html>

import re
import urllib2,sys
from BeautifulSoup import BeautifulSoup, NavigableString

html = urllib2.urlopen(address).read()
soup = BeautifulSoup(html)

try:
    authordiv = soup.find('div', attrs={'class': 'list-authors'})
    links=tds.findAll('a')
    for link in links:
        print ''.join(link[0].contents)
    #Iterate through entire page and print authors
except IOError:
    print 'IO error'

2 answers

User

Answer 1 · edited 2024-05-23 16:39:21

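Because link already comes from the iterable, you don't need to index into it. Just use link.contents[0].

print link.contents[0], run against your new example with two separate <div class="list-authors"> blocks, yields: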

Dacheng Lin
Ronald A. Remillard
Jeroen Homan
A.G. Kosovichev

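So I'm not sure I understand the comment about searching other divs. If they have different classes, you will either need to do a separate soup.find and soup.findAll for each, or just modify your first soup.find.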
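As a rough sketch of the point above: each item yielded by the loop is already a single tag, so its text is read through contents[0], while square brackets on a tag look up attributes such as href. This assumes Python 3 and the bs4 package (the successor to the BeautifulSoup 3 module imported in the question), where findAll is spelled find_all. The names HTML, link_names and link_targets are made up for illustration, and the sample markup is inlined instead of being fetched with urllib2.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (inlined instead of downloaded).
HTML = """
<html> <body>
<div class="list-authors">
  <span class="descriptor">Authors:</span>
  <a href="/find/astro-ph/1/au:+Lin_D/0/1/0/all/0/1">Dacheng Lin</a>,
  <a href="/find/astro-ph/1/au:+Remillard_R/0/1/0/all/0/1">Ronald A. Remillard</a>,
  <a href="/find/astro-ph/1/au:+Homan_J/0/1/0/all/0/1">Jeroen Homan</a>
</div>
<div class="list-authors">
  <span class="descriptor">Authors:</span>
  <a href="/find/astro-ph/1/au:+Kosovichev_A/0/1/0/all/0/1">A.G. Kosovichev</a>
</div>
</body> </html>
"""


def link_names(html):
    """Return the text of every <a> tag (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for link in soup.find_all("a"):
        # link is one Tag, not a list: its first child is the name text.
        names.append(str(link.contents[0]))
    return names


def link_targets(html):
    """Return the href of every <a> tag (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # Square brackets on a Tag read an attribute, not a child element.
    return [link["href"] for link in soup.find_all("a")]


if __name__ == "__main__":
    for name in link_names(HTML):
        print(name)
```

Run as a script, this prints the same four names listed in the answer, one per line.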

User

Answer 2 · edited 2024-05-23 16:39:21

Just use findAll for the divs as well:

for authordiv in soup.findAll('div', attrs={'class': 'list-authors'}):
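One way that loop might be filled in, as a sketch only: visit every list-authors div and collect the link text found inside each one, so the authors stay grouped per div. As an assumption, this uses Python 3 with the bs4 package instead of the BeautifulSoup 3 import from the question, so findAll becomes find_all. The helper name authors_per_div and the trimmed sample markup (shortened href values) are hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed version of the question's markup; the href values are shortened
# placeholders, not the real paths.
HTML = """
<div class="list-authors">
  <span class="descriptor">Authors:</span>
  <a href="#lin">Dacheng Lin</a>,
  <a href="#remillard">Ronald A. Remillard</a>,
  <a href="#homan">Jeroen Homan</a>
</div>
<div class="list-authors">
  <span class="descriptor">Authors:</span>
  <a href="#kosovichev">A.G. Kosovichev</a>
</div>
"""


def authors_per_div(html):
    """Return one list of author names per list-authors div."""
    soup = BeautifulSoup(html, "html.parser")
    groups = []
    # Outer loop: every matching div, not just the first one.
    for authordiv in soup.find_all("div", attrs={"class": "list-authors"}):
        # Inner search is scoped to the current div only.
        names = [a.get_text(strip=True) for a in authordiv.find_all("a")]
        groups.append(names)
    return groups


if __name__ == "__main__":
    for names in authors_per_div(HTML):
        print(", ".join(names))
```

Searching from authordiv instead of from soup is what keeps each paper's authors separate. Calling soup.find_all("a") directly would return all the names as one flat list.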
