Downloading multiple files in a loop with Python

Published 2024-05-15 13:20:34


There is a problem with my code.

#!/usr/bin/env python3.1

import urllib.request

# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

URL = "http://www.example.com/img"
req = urllib.request.Request(URL, headers={'User-Agent': userAgent})

# Counter for the filename.
i = 0

while True:
    fname = str(i).zfill(3) + '.png'
    req.full_url = URL + fname

    f = open(fname, 'wb')

    try:
        response = urllib.request.urlopen(req)
    except:
        break
    else:
        f.write(response.read())
        i += 1
        response.close()
    finally:
        f.close()

The problem seems to arise when I create the urllib.request.Request object (called req). I create it with a nonexistent URL, but later change the URL to what it should be. I do this so that I can reuse the same urllib.request.Request object instead of creating a new one on every iteration. There may be a mechanism in Python for doing this, but I'm not sure what it is.
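For what it's worth, on newer Pythons (3.4 and later) Request.full_url is a property whose setter re-parses the URL, so reusing one Request object this way does keep the host and path consistent. A minimal sketch, using the placeholder URL from the question:

```python
from urllib.request import Request

# In Python 3.4+ full_url is a property; assigning to it re-parses
# the URL, so host and selector are updated to match.
req = Request('http://www.example.com/placeholder',
              headers={'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'})
req.full_url = 'http://www.example.com/img000.png'

print(req.host)      # unchanged host
print(req.selector)  # path portion after re-parsing
```

On Python 3.1, where this question was written, full_url was a plain attribute, so creating a fresh Request per iteration (as in Edit 2 below) is the safer approach.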

Edit: The error message is:

>>> response = urllib.request.urlopen(req);
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.1/urllib/request.py", line 121, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python3.1/urllib/request.py", line 356, in open
    response = meth(req, response)
  File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.1/urllib/request.py", line 394, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.1/urllib/request.py", line 328, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.1/urllib/request.py", line 476, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
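The 403 above is raised as urllib.error.HTTPError, which carries the HTTP status code. Catching it specifically, instead of a bare except, lets a download loop tell "no more files" (404) apart from other failures such as this 403. A sketch, under the assumption that a 404 marks the end of the sequence:

```python
import urllib.error

def should_stop(err):
    """Stop the download loop only when the server reports the file
    as missing; other codes (like 403) signal a different problem."""
    return err.code == 404

# HTTPError instances expose the status code directly; constructing
# one by hand here just to demonstrate the attribute:
err = urllib.error.HTTPError('http://www.example.com/img000.png',
                             404, 'Not Found', None, None)
print(should_stop(err))
```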

Edit 2: My solution is below. I probably should have done it this way from the start, since I knew it would work:

import urllib.request

# Disguise as a Mozilla browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'

# Counter for the filename.
i = 0

while True:
    fname = str(i).zfill(3) + '.png'
    URL = "http://www.example.com/img" + fname

    f = open(fname, 'wb')

    try:
        req = urllib.request.Request(URL, headers={'User-Agent': userAgent})
        response = urllib.request.urlopen(req)
    except:
        break
    else:
        f.write(response.read())
        i += 1
        response.close()
    finally:
        f.close()

3 Answers

urllib is fine for small scripts that only need to make one or two network interactions, but if you are doing more work than that, you will probably find that a third-party HTTP library (the example below uses requests) better suits your needs. Your particular example might look like:

from itertools import count
import requests

HEADERS = {'user-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
URL = "http://www.example.com/img%03d.png"

# With a session we get keep-alive connections
session = requests.Session()

for n in count():
    full_url = URL % n
    ignored, filename = full_url.rsplit('/', 1)

    with open(filename, 'wb') as outfile:
        response = session.get(full_url, headers=HEADERS)
        if not response.ok:
            break
        outfile.write(response.content)

Edit: If regular HTTP authentication is available (which the 403 Forbidden response strongly suggests), you can add it to the get call with the auth parameter, like this:

response = session.get(full_url, headers=HEADERS, auth=('username', 'password'))

Don't break when you get an exception. Change

except:
    break

to:

except:
    # Probably should log some debug information here.
    pass

This skips any problematic request, so that a single failure does not abort the whole run.
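Note that with a bare pass, the while True loop never terminates. One middle ground between "break on the first error" and "pass forever" is to stop only after several consecutive failures; a sketch, where fetch is a hypothetical stand-in for the real urlopen/requests call:

```python
MAX_MISSES = 3  # assumed tolerance; tune to taste

def download_all(fetch):
    saved = []
    misses = 0
    i = 0
    while misses < MAX_MISSES:
        name = str(i).zfill(3) + '.png'
        try:
            data = fetch(name)
        except Exception:
            misses += 1         # tolerate an isolated failure...
        else:
            # real code would write `data` to disk here
            saved.append(name)
            misses = 0          # ...but reset the counter on success
        i += 1
    return saved

# Stub that succeeds for the first five names, then always fails:
def stub(name):
    if int(name[:3]) >= 5:
        raise IOError(name)
    return b'data'

print(download_all(stub))
```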

If you want to use a custom user agent for every request, you can subclass FancyURLopener.

Here is an example: http://wolfprojects.altervista.org/changeua.php
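Be aware that FancyURLopener has been deprecated for a long time in Python 3. A rough modern equivalent, setting a default User-Agent for every request via an installed opener (using the spoofed agent string from the question), might look like:

```python
import urllib.request

# Build an opener whose default headers apply to every subsequent
# urllib.request.urlopen() call in this process.
opener = urllib.request.build_opener()
opener.addheaders = [('User-Agent',
                      'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)')]
urllib.request.install_opener(opener)

# From here on, urllib.request.urlopen(url) sends the custom agent.
```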
