Basic Python question: how do I download from multiple URLs with urllib.request.urlretrieve?
I have the following code, which works perfectly fine:
import urllib.request
import zipfile

url = "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop"
filename = "C:/test/archive.zip"
destinationPath = "C:/test"

urllib.request.urlretrieve(url,filename)
sourceZip = zipfile.ZipFile(filename, 'r')
for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()
It has run fine several times, but the server I fetch the file from imposes some limits, so once I hit the daily download cap I get this error:
Traceback (most recent call last):
  File "script.py", line 11, in <module>
    urllib.request.urlretrieve(url,filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1591, in retrieve
    block = fp.read(bs)
ValueError: read of closed file
How can I change this script so that it contains several URLs instead of a single one, and keeps trying to download from that list until one attempt succeeds, then carries on and extracts the archive? I only need one successful download.
Sorry, I'm still pretty new to Python and I can't work out how to do this. I think I need to change the variable to something like this:
url = {
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
}
and then turn this line into some kind of loop:
urllib.request.urlretrieve(url,filename)
4 Answers
0
If you need to handle complex distributed tasks, take a look at Celery, which has a built-in retry mechanism; details are in the Celery-retry docs.
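A rough sketch of what a retrying Celery download task might look like (my own example, not from the Celery docs; it assumes a local Redis broker, and the app, task, and argument names are made up):

from celery import Celery
import urllib.request

# Assumption: a Redis broker is running locally; adjust the broker URL for your setup.
app = Celery('downloader', broker='redis://localhost:6379/0')

@app.task(bind=True, max_retries=5, default_retry_delay=60)
def fetch(self, url, filename):
    """Download url to filename, re-queuing the task if the read fails."""
    try:
        urllib.request.urlretrieve(url, filename)
    except ValueError as exc:
        # Celery waits default_retry_delay seconds and runs the task again,
        # up to max_retries times.
        raise self.retry(exc=exc)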
Alternatively, have a look at the Retry-decorator. Here's an example:
import math
import time

# Retry decorator with exponential backoff
def retry(tries, delay=3, backoff=2):
    """Retries a function or method until it returns True.

    delay sets the initial delay, and backoff sets how much the delay should
    lengthen after each failure. backoff must be greater than 1, or else it
    isn't really a backoff. tries must be at least 0, and delay greater than
    0."""
    if backoff <= 1:
        raise ValueError("backoff must be greater than 1")

    tries = math.floor(tries)
    if tries < 0:
        raise ValueError("tries must be 0 or greater")

    if delay <= 0:
        raise ValueError("delay must be greater than 0")

    def deco_retry(f):
        def f_retry(*args, **kwargs):
            mtries, mdelay = tries, delay  # make mutable

            rv = f(*args, **kwargs)  # first attempt
            while mtries > 0:
                if rv is True:  # Done on success
                    return True

                mtries -= 1         # consume an attempt
                time.sleep(mdelay)  # wait...
                mdelay *= backoff   # make future wait longer

                rv = f(*args, **kwargs)  # Try again

            return False  # Ran out of tries :-(

        return f_retry  # true decorator -> decorated function

    return deco_retry  # @retry(arg[, ...]) -> true decorator
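To apply it to your case, you could wrap the download in a small function that returns True on success (a sketch of my own, not part of the decorator recipe; download is a made-up name):

import urllib.request

@retry(5, delay=3, backoff=2)
def download(url, filename):
    """Return True on success so the decorator stops retrying."""
    try:
        urllib.request.urlretrieve(url, filename)
        return True
    except ValueError:
        return False

if download("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
            "C:/test/archive.zip"):
    print("download succeeded")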
2
import urllib.request
import zipfile

urllist = ("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
           "another",
           "yet another",
           "etc")

filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urllist:
    try:
        urllib.request.urlretrieve(url,filename)
    except ValueError:
        continue

    sourceZip = zipfile.ZipFile(filename, 'r')
    for name in sourceZip.namelist():
        sourceZip.extract(name, destinationPath)
    sourceZip.close()
    break
That is, this works if you just want to try each option until one of them succeeds, and then stop.
3
You want to put your URLs in a list, then work through that list one at a time. Catch the errors as they occur but otherwise ignore them, and stop as soon as one URL succeeds. Try something like this:
import urllib.request
import zipfile

urls = ["http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop", "other url", "another url"]
filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urls:
    try:
        urllib.request.urlretrieve(url,filename)
        sourceZip = zipfile.ZipFile(filename, 'r')
        break
    except ValueError:
        pass

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()
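One caveat (my addition, not part of the answer above): if every URL raises ValueError, sourceZip is never assigned and the extraction loop fails with a NameError. A minimal guarded variant could look like this:

import urllib.request
import zipfile

urls = ["http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop", "other url", "another url"]
filename = "C:/test/test.zip"
destinationPath = "C:/test"

sourceZip = None                      # sentinel: stays None if every download fails
for url in urls:
    try:
        urllib.request.urlretrieve(url,filename)
        sourceZip = zipfile.ZipFile(filename, 'r')
        break
    except ValueError:
        pass

if sourceZip is None:
    raise SystemExit("none of the URLs could be downloaded")

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()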