What does the following Python code do? It looks like a list comprehension, but with parentheses.

import sys
import re
import urllib2
import urlparse

tocrawl = [sys.argv[1]]
crawled = []
keywordregex = re.compile('<meta\sname=["\']keywords["\']\scontent=["\'](.*?)["\']\s/>')
linkregex = re.compile('<a\s(?:.*?\s)*?href=[\'"](.*?)[\'"].*?>')

while 1:
    crawling = tocrawl.pop(0)
    response = urllib2.urlopen(crawling)
    msg = response.read()
    keywordlist = keywordregex.findall(msg)
    crawled.append(crawling)
    links = linkregex.findall(msg)
    url = urlparse.urlparse(crawling)

    a = (links.pop(0) for _ in range(len(links)))  # What does this do?

    for link in a:
        if link.startswith('/'):
            link = 'http://' + url[1] + link
        elif link.startswith('#'):
            link = 'http://' + url[1] + url[2] + link
        elif not link.startswith('http'):
            link = 'http://' + url[1] + '/' + link

        if link not in crawled:
            tocrawl.append(link)

3 Answers

User

#1 · Edited 2024-05-16 22:27:16

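It creates a generator that takes objects off the links list.

Explanation:

range(len(links)) returns a list of numbers from 0 up to, but not including, the length of the links list. So if links contains [ "www.yahoo.com", "www.google.com", "www.python.org" ], it will generate the list [0, 1, 2].

for _ in blah simply loops over that list, discarding each value.

links.pop(0) removes the first item from links and returns it.

The whole expression yields a generator that pops items off the head of the links list, one per request.

Finally, a demonstration in the Python console: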

>>> links = [ "www.yahoo.com", "www.google.com", "www.python.org" ]
>>> a = (links.pop(0) for _ in range(len(links)))
>>> a.next()
'www.yahoo.com'
>>> links
['www.google.com', 'www.python.org']
>>> a.next()
'www.google.com'
>>> links
['www.python.org']
>>> a.next()
'www.python.org'
>>> links
[]
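
The session above uses the Python 2 method a.next(). Here is a minimal sketch of the same demonstration, assuming Python 3, where the built-in next() replaces that method. The hostnames are only the sample values from the answer.

```python
links = ["www.yahoo.com", "www.google.com", "www.python.org"]

# range(len(links)) is evaluated once, when the generator is created,
# so the generator will yield exactly three items.
a = (links.pop(0) for _ in range(len(links)))

# Nothing has been popped yet: the pop calls run lazily.
untouched = list(links)

first = next(a)   # pops the head of the list
after_first = list(links)

rest = list(a)    # drains the generator, which empties links
```

After this runs, first holds the original head of the list and links is empty.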

User

#2 · Edited 2024-05-16 22:27:16

It is a generator expression that empties the list links as you iterate over it.

They could have replaced this part

a = (links.pop(0) for _ in range(len(links)))  # What does this do?

for link in a:

with this:

^{pr2}$

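and it would work just the same. However, since popping from the end of a list is more efficient than popping from the front, this would be better than either: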

links.reverse()
while links:
    link = links.pop()

Then again, if the links do not need to be followed in order, why process them in order at all?
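
The ways of draining a list discussed in this answer can be put side by side. This is a rough sketch in Python 3 syntax, assuming a small made-up sample list. The function names drain_front, drain_reversed and drain_deque are hypothetical, and collections.deque is an extra option that the answer does not mention.

```python
from collections import deque


def drain_front(links):
    """Pop from the head; each pop(0) shifts the remaining items (O(n))."""
    seen = []
    while links:
        seen.append(links.pop(0))
    return seen


def drain_reversed(links):
    """Reverse once, then pop from the tail (O(1) per pop); same order."""
    seen = []
    links.reverse()
    while links:
        seen.append(links.pop())
    return seen


def drain_deque(links):
    """Copy into a deque, which supports O(1) pops from the left."""
    queue = deque(links)
    links.clear()  # mirror the other variants, which empty the list
    seen = []
    while queue:
        seen.append(queue.popleft())
    return seen


sample = ["/a", "/b", "/c"]
results = [f(list(sample)) for f in (drain_front, drain_reversed, drain_deque)]
```

All three variants visit the links in their original order. They differ only in how much work each pop costs on a long list.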

User

#3 · Edited 2024-05-16 22:27:16

a = (links.pop(0) for _ in range(len(links)))

could also be written as:

^{pr2}$

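Edit:

The only difference is that with a generator the work is done lazily, so items are popped from links only as they are requested through a, rather than all at once. When processing large amounts of data this is much more efficient, and it cannot be done without using advanced Python features.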
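
Here is a small sketch of the lazy versus eager difference described above, in Python 3 syntax. The eager loop is only an assumed equivalent written for illustration, and the sample values are made up.

```python
# Eager: every item is popped up front, before any processing happens.
links = ["/a", "/b", "/c"]
eager = []
for _ in range(len(links)):
    eager.append(links.pop(0))
links_after_eager = list(links)

# Lazy: items leave the list only when the generator is asked for them.
links = ["/a", "/b", "/c"]
lazy = (links.pop(0) for _ in range(len(links)))
links_before_request = list(links)   # nothing popped yet

first = next(lazy)                   # pops exactly one item
links_after_first = list(links)

remaining = list(lazy)               # pops the rest on demand
```

With the generator, stopping the loop early leaves the unprocessed items in links. The eager version has already moved them all into the new list.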
