Webscraper will…

from urllib import urlopen
import re

f = urlopen("http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html").read()
b = re.compile('<span class="listlink-police"><a href="(.*)">')
a = re.findall(b, f)
listiterator = []
listiterator[:] = range(0,16)
for i in listiterator:
    print a
    print "\n"
f.close()

3 answers

User

Answer 1 · edited 2024-04-26 00:18:04

Use BeautifulSoup:

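from bs4 import BeautifulSoup
from urllib2 import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen

f = urlopen("http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html").read()

# Name the parser explicitly so bs4 does not have to guess one (and warn about it)
bs = BeautifulSoup(f, "html.parser")

# Each police-station link is an <a> inside <span class="listlink-police">
for tag in bs.find_all('span', {'class': 'listlink-police'}):
    print(tag.a['href'])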
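If installing bs4 is not an option, the same extraction can be sketched with Python 3's standard-library `html.parser`. This is a minimal sketch, not the answerer's code: `PoliceLinkParser`, `extract_police_links` and `fetch_html` are hypothetical names, and it assumes the markup shape implied by the question's regex (an `<a href>` inside `<span class="listlink-police">`) and a UTF-8 page.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PoliceLinkParser(HTMLParser):
    """Collect href values of <a> tags nested in <span class="listlink-police">."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._span_depth = 0  # > 0 while inside a matching span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            classes = (attrs.get("class") or "").split()
            # Track nested spans too, so the matching </span> is found correctly
            if self._span_depth or "listlink-police" in classes:
                self._span_depth += 1
        elif tag == "a" and self._span_depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "span" and self._span_depth:
            self._span_depth -= 1


def extract_police_links(html):
    """Return the hrefs of all links inside listlink-police spans."""
    parser = PoliceLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_html(url):
    """Download a page; assumes UTF-8. The with-block closes the response."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage: `extract_police_links(fetch_html("http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html"))` would return the hrefs as a list. Because `fetch_html` closes the response object inside the `with` block, it also sidesteps the question's `f.close()`, which is called on the string returned by `.read()` rather than on the response.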

User

Answer 2 · edited 2024-04-26 00:18:04

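There are over 1.6k links with that class on the page.

I think it works fine... Why do you think it doesn't work?

You should definitely use Beautiful Soup; it's stupid simple and extremely useful.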
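One thing worth checking in the question's loop: it prints the whole list `a` on every pass instead of the current element, so the output is the full list repeated 16 times. A minimal sketch of the presumably intended loop, in Python 3, assuming `links` is the list that `re.findall` returned (`print_first_links` is a hypothetical helper name):

```python
def print_first_links(links, n=16):
    """Print up to the first n links, one per line, and return them."""
    shown = links[:n]  # slicing replaces the hand-built range(0, 16) list
    for link in shown:
        print(link)  # print the current element, not the whole list
    return shown


print_first_links(["/a.html", "/b.html", "/c.html"], n=2)
```

Slicing also means a page with fewer than 16 matches simply prints what it has, instead of raising `IndexError` as indexing `a[i]` over a fixed range would.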

User

Answer 3 · edited 2024-04-26 00:18:04

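You are parsing HTML with regex. You shouldn't, because you will keep running into exactly this kind of problem. First, the `.*` wildcard matches as much text as it possibly can. But once you fix that, you'll just pick another fruit from the tree of frustration. Use a proper HTML parser instead.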
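To illustrate the greedy-wildcard point, here is a small Python 3 sketch on a made-up one-line fragment (`SAMPLE` is invented data, not taken from the real page). The question's `(.*)` runs to the last `">` on the line, while the non-greedy `(.*?)` stops at the first one. On a page where each link sits on its own line the greedy version may happen to work, since `.` does not match newlines by default. The lazy pattern patches this one symptom, but it is still regex over HTML, so the parser advice stands.

```python
import re

# A one-line fragment shaped like the question's target markup (invented sample data)
SAMPLE = ('<span class="listlink-police"><a href="/a.html">A</a></span>'
          '<span class="listlink-police"><a href="/b.html">B</a></span>')

GREEDY = re.compile(r'<span class="listlink-police"><a href="(.*)">')
LAZY = re.compile(r'<span class="listlink-police"><a href="(.*?)">')


def compare_patterns(html):
    """Return (greedy_matches, lazy_matches) for the same input."""
    return GREEDY.findall(html), LAZY.findall(html)


greedy, lazy = compare_patterns(SAMPLE)
print(greedy)  # a single match running from the first href to the last '">'
print(lazy)    # ['/a.html', '/b.html']
```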
