How do I open a website with urllib through a proxy in Python?

43 votes
4 answers
120634 views
Asked 2025-04-16 00:44

I have a program that checks a website. How can I make it check the site through a proxy in Python?

Here is the code, as an example:

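import time
import urllib  # Python 2 module; urllib.urlopen does not exist in Python 3

# website is the URL to check, defined elsewhere
while True:
    try:
        h = urllib.urlopen(website)
        break
    except IOError:  # a bare except would also swallow Ctrl-C
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)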

4 Answers

6

Here is some example code showing how to connect through a proxy with urllib:

import urllib.request

authinfo = urllib.request.HTTPBasicAuthHandler()

proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)

# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.google.com/')
59

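Python 3 is a little different here. It tries to detect proxy settings automatically, but if you need specific or manual proxy settings, consider code like this: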

#!/usr/bin/env python3
import urllib.request

proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port', 
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

# url is the address to open, defined elsewhere
with urllib.request.urlopen(url) as response:
    html = response.read()  # read the body; further processing goes here

See also the relevant section of the Python 3 documentation.
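If you would rather not change the behavior of every urlopen call in the program, one option is to skip install_opener and keep the opener as a local object. The following is only a minimal sketch: build_proxy_opener is a hypothetical helper name, and the proxy address is the placeholder from the other answer, not a real server.

```python
import urllib.request


def build_proxy_opener(http_proxy, https_proxy=None):
    """Return an opener that routes requests through the given proxies."""
    proxies = {"http": http_proxy}
    if https_proxy is not None:
        proxies["https"] = https_proxy
    # Passing our own ProxyHandler replaces the default, auto-detecting one.
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Placeholder proxy address (assumption); nothing is contacted at this point.
opener = build_proxy_opener("http://myproxy.example.com:1234")
```

Calling opener.open(url) then sends that one request through the proxy, while plain urllib.request.urlopen calls elsewhere keep their default behavior.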

56

By default, urlopen uses the http_proxy environment variable to decide which HTTP proxy to use:

$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py  # Using http://myproxy.example.com:1234 as a proxy
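To check what Python actually picks up from the environment, you can ask urllib directly. This is a small sketch for Python 3: environment_proxy is a hypothetical helper name, and setting os.environ here only stands in for the export line above. It looks at environment variables only, not at any system-level proxy settings.

```python
import os
import urllib.request


def environment_proxy(scheme="http"):
    """Return the proxy URL found in <scheme>_proxy, or None if unset."""
    return urllib.request.getproxies_environment().get(scheme)


# Same effect as `export http_proxy=...`, but for this process only.
os.environ["http_proxy"] = "http://myproxy.example.com:1234"
print(environment_proxy("http"))
```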

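If you want to specify a proxy inside your application instead, you can pass a proxies argument to urlopen (this is the Python 2 urllib.urlopen; urllib.request.urlopen in Python 3 does not accept proxies):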

import urllib  # Python 2

proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)

Edit: If I understand your comment correctly, you want to try several proxies and print each one as you try it. How about something like this?

import time
import urllib  # Python 2

candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except IOError:  # a bare except would also swallow Ctrl-C
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
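On Python 3 the proxies argument is gone, so the same idea needs a ProxyHandler per candidate. Below is a rough sketch, not a definitive implementation: open_via_proxy and first_working_proxy are hypothetical helper names, the timeout and delay values are arbitrary, and it assumes the same proxy can also carry HTTPS traffic.

```python
import time
import urllib.request


def open_via_proxy(url, proxy, timeout=10):
    """Open url through a single proxy using a private opener."""
    # Assumption: the same proxy is used for both http and https URLs.
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    return opener.open(url, timeout=timeout)


def first_working_proxy(url, candidate_proxies, fetch=open_via_proxy, delay=5):
    """Try each proxy in order; return (proxy, response) or (None, None)."""
    for proxy in candidate_proxies:
        print("Trying HTTP proxy %s" % proxy)
        try:
            response = fetch(url, proxy)
        except OSError as exc:  # URLError and socket timeouts are OSError subclasses
            print("Proxy %s failed (%s); next attempt in %s seconds" % (proxy, exc, delay))
            time.sleep(delay)
        else:
            print("Got URL using proxy %s" % proxy)
            return proxy, response
    return None, None
```

You would call it as first_working_proxy("http://www.google.com", candidate_proxies) and check whether the returned proxy is None. The fetch parameter is there only so the loop can be exercised without a real proxy.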
