Python + Mechanize: asynchronous tasks

Asked 2025-04-16 08:43

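I have some Python code that opens a web page and extracts some links from it. The extraction method uses a few tricks to pull out the content I need. However, requesting the pages one after another is slow. Is there a way to use asynchronous processing in Python so that I can issue several requests at once and process multiple pages quickly?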

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()

url = "http://www.delicious.com/search?p=varun"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html, "html.parser")
extract(soup)  # extract() is my own link-extraction function, defined elsewhere

count = 1
# Follow the "Next" link onto consecutive pages
while soup.find('a', attrs={'class': 'pn next'}):
    print("yay")
    print(count)
    try:
        page = br.follow_link(text_regex="Next")
    except mechanize.LinkNotFoundError:
        # No "Next" link left, so this was the last page
        print("End of Pages")
        break
    # Re-parse the new page so the loop condition checks the current page
    soup = BeautifulSoup(page.read(), "html.parser")
    extract(soup)
    count += 1

Tags: web scraping, web crawler, link extraction, asynchronous processing, concurrent requests

1 Answer

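Beautiful Soup is fairly slow; if you want better performance, try the lxml library instead. Also, if your machine has multiple CPU cores, you could consider using multiprocessing with queues to improve throughput.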
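As a rough illustration of issuing several requests at once, here is a minimal sketch. It uses a thread pool rather than the multiprocessing-plus-queues setup mentioned above, since downloading pages is mostly waiting on the network. It assumes the page URLs can be listed in advance, which is not the case when each page is only reachable through the previous page's "Next" link. The names `LinkCollector`, `extract_links`, `fetch`, and `crawl` are hypothetical, and the standard-library HTML parser stands in for the question's `extract()` and for lxml.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (a stand-in for the question's extract())."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


def fetch(url):
    """Download one page with plain urllib; each call is independent of the others."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(urls, fetch=fetch, max_workers=8):
    """Fetch all URLs concurrently and return a dict of {url: [links]}."""
    urls = list(urls)  # the list is iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(fetch, urls)  # results come back in input order
        return {url: extract_links(html) for url, html in zip(urls, pages)}
```

Calling `crawl(list_of_urls)` downloads up to `max_workers` pages at a time. If parsing rather than downloading turns out to be the bottleneck, `concurrent.futures.ProcessPoolExecutor` offers the same interface and spreads the work across CPU cores, provided the functions are defined at module level and the entry point is guarded by `if __name__ == "__main__":`.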

Answered 2025-04-16 by Python大师
