Basic Python question: how do I download multiple URLs with urllib.request.urlretrieve?

3 votes
4 answers
7680 views
Asked 2025-04-16 23:19

I have the following code, which works perfectly well:

import urllib.request
import zipfile

url = "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop"
filename = "C:/test/archive.zip"
destinationPath = "C:/test"

urllib.request.urlretrieve(url, filename)
sourceZip = zipfile.ZipFile(filename, 'r')

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()

It ran fine several times, but the server I am fetching the file from imposes limits, and once I hit the daily download quota I get this error:

Traceback (most recent call last):
  File "script.py", line 11, in <module>
    urllib.request.urlretrieve(url,filename)
  File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python32\lib\urllib\request.py", line 1591, in retrieve
    block = fp.read(bs)
ValueError: read of closed file

How can I modify the script to take a list of URLs instead of a single one, trying each download in turn until one succeeds, and then carry on with the unzipping? I only need one successful download.

Sorry, I'm still new to Python and can't work out how to do this. I'm guessing I need to change the variable to something like this:

url = [
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soe",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sod",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soc",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sob",
    "http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2soa",
]

and then turn this line into some kind of loop:

urllib.request.urlretrieve(url, filename)

4 Answers

0

If you need to handle complex distributed tasks, take a look at Celery, which has a built-in retry mechanism; details are in Celery-retry.
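
For a rough idea of what that looks like, here is a minimal sketch of a retrying download task in Celery. The broker URL, app name, and task name are assumptions for illustration, not anything from the question:

import urllib.request

from celery import Celery

# Hypothetical broker URL; point this at your own broker.
app = Celery('downloads', broker='redis://localhost:6379/0')

@app.task(bind=True, max_retries=5, default_retry_delay=60)
def fetch_archive(self, url, filename):
    """Download url to filename, retrying when the quota error appears."""
    try:
        urllib.request.urlretrieve(url, filename)
    except ValueError as exc:
        # Re-queue this task; Celery waits default_retry_delay seconds
        # between attempts and gives up after max_retries retries.
        raise self.retry(exc=exc)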

Alternatively, you can look at the retry-decorator pattern; here is an example:

import math
import time

# Retry decorator with exponential backoff
def retry(tries, delay=3, backoff=2):
  """Retries a function or method until it returns True.

  delay sets the initial delay, and backoff sets how much the delay should
  lengthen after each failure. backoff must be greater than 1, or else it
  isn't really a backoff. tries must be at least 0, and delay greater than
  0."""

  if backoff <= 1:
    raise ValueError("backoff must be greater than 1")

  tries = math.floor(tries)
  if tries < 0:
    raise ValueError("tries must be 0 or greater")

  if delay <= 0:
    raise ValueError("delay must be greater than 0")

  def deco_retry(f):
    def f_retry(*args, **kwargs):
      mtries, mdelay = tries, delay # make mutable

      rv = f(*args, **kwargs) # first attempt
      while mtries > 0:
        if rv == True: # Done on success
          return True

        mtries -= 1      # consume an attempt
        time.sleep(mdelay) # wait...
        mdelay *= backoff  # make future wait longer

        rv = f(*args, **kwargs) # Try again

      return False # Ran out of tries :-(

    return f_retry # true decorator -> decorated function
  return deco_retry  # @retry(arg[, ...]) -> true decorator
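
To apply this to the question's code, wrap the download in a small function that returns True on success and False on failure, since the decorator keys off that boolean. A sketch, assuming the retry decorator above is in scope; download is a name introduced here for illustration:

import urllib.request

@retry(tries=4, delay=3, backoff=2)
def download(url, filename):
    """Return True if the download succeeded, False otherwise."""
    try:
        urllib.request.urlretrieve(url, filename)
        return True
    except ValueError:
        return False

# Up to five attempts in total, sleeping 3s, 6s, 12s, then 24s between them.
download("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
         "C:/test/archive.zip")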

2

import urllib.request
import zipfile

urllist = ("http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
           "another",
           "yet another",
           "etc")

filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urllist:
    try:
        urllib.request.urlretrieve(url, filename)
    except ValueError:
        continue  # this URL failed; move on to the next one
    sourceZip = zipfile.ZipFile(filename, 'r')

    for name in sourceZip.namelist():
        sourceZip.extract(name, destinationPath)
    sourceZip.close()
    break  # one successful download is enough

That is, if you just want to try each option until one works and then stop, this will do the job.
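
One caveat, added here rather than part of the original answer: in current Python 3, urlretrieve reports HTTP and network failures as urllib.error.HTTPError / URLError rather than ValueError, so if the server starts answering with an error status instead of cutting the stream mid-read, you may want to widen the except clause. A sketch along the same lines:

import urllib.error
import urllib.request

urllist = ["http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop"]  # plus any backup URLs
filename = "C:/test/test.zip"

for url in urllist:
    try:
        urllib.request.urlretrieve(url, filename)
        break  # first successful download wins
    except (ValueError, urllib.error.URLError):
        # HTTPError is a subclass of URLError, so both are caught here.
        continue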

3

You want to put your URLs in a list, try them one at a time, catching and ignoring the errors as you go, and stop as soon as one succeeds. Try something like this:

import urllib.request
import zipfile

urls = ["http://url.com/archive.zip?key=7UCxcuCzFpYeu7tz18JgGZFAAgXQ2sop",
        "other url",
        "another url"]
filename = "C:/test/test.zip"
destinationPath = "C:/test"

for url in urls:
    try:
        urllib.request.urlretrieve(url, filename)
        sourceZip = zipfile.ZipFile(filename, 'r')
        break  # stop at the first URL that works
    except ValueError:
        pass
else:
    # No break happened: every URL failed, so sourceZip was never created.
    raise RuntimeError("all downloads failed")

for name in sourceZip.namelist():
    sourceZip.extract(name, destinationPath)
sourceZip.close()
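
The else clause on the for loop is a detail worth knowing: it runs only when the loop finishes without hitting break, which here means every URL failed. Raising at that point avoids a confusing NameError from sourceZip further down.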
