How can I efficiently test HTTP proxies for access to a specific domain?
I need an efficient way to test a number of free online HTTP proxies and find out which of them can reach a specific website.
Because testing the proxies one by one takes a very long time, I decided to redesign my code to test them asynchronously, and I looked into the httpx and aiohttp packages. However, I ran into some unexpected problems that make me doubt whether my current code really fits my needs.
Below is the output from the three approaches I used:
- one synchronous test using the requests package,
- two asynchronous tests.
Looking at the results, there are several errors, and the time each request takes varies widely. Interestingly, the requests approach returned HTTP 200 for four of the proxies, the httpx approach for five, and the aiohttp approach for none at all, which surprised me, since all three are supposed to perform the same task. This makes me question my implementation.
In addition, in the httpx run one proxy took an abnormally long time to respond even though I set a 60-second timeout: it ended up taking 13,480.64 seconds. (I should mention that during the test, when I noticed it was taking too long, I put the computer into sleep mode; when I came back, the process was still running and had not stopped.)
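As a side note, one way to put a hard ceiling on each check, independent of the library-level timeouts, would be to wrap the per-proxy coroutine in asyncio.wait_for. This is only a rough, untested sketch (it assumes the is_alive_httpx coroutine and TIMEOUT constant shown further down, and the extra 10 seconds of headroom is an arbitrary choice); note that no in-process timer can fire while the machine is suspended, so it would not cover the sleep-mode case:
async def check_with_ceiling(index: int, proxy: str) -> None:
    # Hard wall-clock cap per proxy check, layered on top of the library timeout.
    try:
        await asyncio.wait_for(is_alive_httpx(index, proxy), timeout=TIMEOUT + 10)
    except asyncio.TimeoutError:
        print(f"{index:>2}) --> {proxy:30} --> HardTimeout")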
Can anyone tell me what I am doing wrong and how I can improve this?
1) --> 185.XXX.XX.XX:80 --> ProxyError (4.96s)
2) --> 38.XX.XXX.XXX:443 --> HTTP (200) (2.50s)
3) --> 162.XXX.XX.XXX:80 --> HTTP (200) (20.92s)
4) --> 18.XXX.XXX.XXX:8080 --> HTTP (200) (0.61s)
5) --> 31.XX.XX.XX:50687 --> ConnectionError (7.88s)
6) --> 177.XX.XXX.XXX:80 --> ProxyError (21.07s)
7) --> 8.XXX.XXX.X:4153 --> HTTP (200) (4.96s)
8) --> 146.XX.XXX.XXX:12334 --> ProxyError (21.05s)
9) --> 67.XX.XXX.XXX:33081 --> ProxyError (3.03s)
10) --> 37.XXX.XX.XX:80 --> ReadTimeout (60.16s)
Testing 10 proxies with "requests" took 147.16 seconds.
4) --> 18.XXX.XXX.XXX:8080 --> HTTP (200) (16.09s)
2) --> 38.XX.XXX.XXX:443 --> HTTP (200) (22.11s)
7) --> 8.XXX.XXX.X:4153 --> HTTP (200) (12.96s)
1) --> 185.XXX.XX.XX:80 --> RemoteProtocolError (24.83s)
9) --> 67.XX.XXX.XXX:33081 --> ConnectError (6.02s)
3) --> 162.XXX.XX.XXX:80 --> HTTP (200) (22.48s)
6) --> 177.XX.XXX.XXX:80 --> HTTP (200) (26.96s)
5) --> 31.XX.XX.XX:50687 --> ConnectError (34.50s)
8) --> 146.XX.XXX.XXX:12334 --> ConnectError (27.01s)
10) --> 37.XXX.XX.XX:80 --> ReadError (13480.64s)
Testing 10 proxies with "httpx" took 13507.80 seconds.
1) --> 185.XXX.XX.XX:80 --> ClientProxyConnectionError (1.30s)
2) --> 38.XX.XXX.XXX:443 --> ClientProxyConnectionError (0.67s)
3) --> 162.XXX.XX.XXX:80 --> ClientProxyConnectionError (0.77s)
4) --> 18.XXX.XXX.XXX:8080 --> ClientProxyConnectionError (0.83s)
5) --> 31.XX.XX.XX:50687 --> ClientProxyConnectionError (0.85s)
6) --> 177.XX.XXX.XXX:80 --> ClientProxyConnectionError (0.91s)
7) --> 8.XXX.XXX.X:4153 --> ClientProxyConnectionError (0.94s)
8) --> 146.XX.XXX.XXX:12334 --> ClientProxyConnectionError (1.03s)
9) --> 67.XX.XXX.XXX:33081 --> ClientProxyConnectionError (1.05s)
10) --> 37.XXX.XX.XX:80 --> ClientProxyConnectionError (0.62s)
Testing 10 proxies with "aiohttp" took 2.42 seconds.
Here is the code I am using.
I first download the proxies from this GitHub repository:
import random
import tempfile
import os
import requests
import time
import asyncio
import httpx
import aiohttp
TIMEOUT: int = 60
DEFAULT_DOMAIN: str = r"www.desired.domain.com"
PROXIES_URL: str = "https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt"
PROXIES_PATH: str = os.path.join(tempfile.gettempdir(), "httpProxies.txt")
HEADERS: dict = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"accept-language": "en,ar;q=0.9,fr;q=0.8",
"Accept-Encoding": "gzip, deflate",
"dnt": "1",
"referer": "https://www.google.com/",
"sec-ch-ua": '"Microsoft Edge";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
"sec-ch-ua-mobile": "?0",
"sec-ch-ua-platform": '"Windows"',
"sec-fetch-dest": "document",
"sec-fetch-mode": "navigate",
"sec-fetch-site": "cross-site",
"sec-fetch-user": "?1",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0",
"Connection": "close", # "keep-alive",
}
def get_proxies() -> list[str]:
    proxies: list[str] = []
    if os.path.exists(PROXIES_PATH):
        with open(file=PROXIES_PATH, mode="r") as file:
            proxies = file.read().splitlines()
            file.close()
    else:
        response = requests.request(method="GET", url=PROXIES_URL)
        if response.status_code == 200:
            proxies = response.text
            with open(file=PROXIES_PATH, mode="w") as file:
                file.write(proxies)
                file.close()
            proxies = proxies.split("\n")
    return proxies
Here is the function I use to test the proxies sequentially:
def sequential_test(proxies_list: list[str]):
    if proxies_list:
        with requests.Session() as session:
            session.headers = HEADERS
            for i, proxy in enumerate(proxies_list, 1):
                session.proxies = {"http": f"http://{proxy}"}
                try:
                    color = "\033[91m"
                    start = time.perf_counter()
                    response = session.get(url=f"http://{DEFAULT_DOMAIN}", timeout=TIMEOUT)
                    status = f"HTTP ({response.status_code})"
                    if response.status_code == 200:
                        color = "\033[92m"
                except Exception as exception:  # requests.RequestException
                    status = type(exception).__name__
                print(f"{i:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")
And here is the code I use to check whether a proxy works against the target site, once with httpx and once with aiohttp:
async def is_alive_httpx(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    proxy_mounts = {"http://": httpx.AsyncHTTPTransport(proxy=f"http://{proxy}"),}
    async with httpx.AsyncClient(
        mounts=proxy_mounts,
        timeout=TIMEOUT,
        headers=HEADERS,
        follow_redirects=True
    ) as session:
        try:
            color = "\033[91m"
            start = time.perf_counter()
            response = await session.send(httpx.Request(method="GET", url=f"http://{domain}"))
            status = f"HTTP ({response.status_code})"
            if response.status_code == 200:
                color = "\033[92m"
        except Exception as exception:  # httpx.HTTPError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")
async def is_alive_aiohttp(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    try:
        async with aiohttp.ClientSession(timeout=aiohttp.ClientTimeout(total=TIMEOUT), headers=HEADERS, trust_env=True,
                                         connector=aiohttp.TCPConnector(ssl_context=None, force_close=True, limit_per_host=5)) as client:
            color = "\033[91m"
            start = time.perf_counter()
            response = await client.get(url=f"http://{domain}", proxy=f"http://{proxy}")
            status = f"HTTP ({response.status})"
            if response.status == 200:
                color = "\033[92m"
    except Exception as exception:  # aiohttp.ClientError
        status = type(exception).__name__
        print(status + ":", exception)
    finally:
        await client.close()
    print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:26}\t({time.perf_counter()-start:.2f}s)")
    await asyncio.sleep(0.3)
Here is the rest of the code. You can copy it straight into your environment and run it (just make sure the required packages are installed):
async def test_proxies(proxies_list: list[str], func):
    if proxies_list:
        await asyncio.gather(*[func(ip[0], ip[1]) for ip in enumerate(proxies_list, 1)])

def main():
    proxies = random.sample(get_proxies(), 10)  # get_proxies()[:10]
    start = time.perf_counter()
    sequential_test(proxies)
    print(f'\nTesting {len(proxies)} proxies with "requests" took {time.perf_counter()-start:.2f} seconds.\n')
    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_httpx))
    print(f'\nTesting {len(proxies)} proxies with "httpx" took {time.perf_counter()-start:.2f} seconds.\n')
    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_aiohttp))
    print(f'\nTesting {len(proxies)} proxies with "aiohttp" took {time.perf_counter()-start:.2f} seconds.\n')

if __name__ == "__main__":
    main()
Here are some of the errors I frequently get when using aiohttp, for example:
- ClientProxyConnectionError: Cannot connect to host ssl:default
  - [The semaphore timeout period has expired]
  - [The remote computer refused the network connection]
- ClientResponse:
  - [409 Conflict]
  - [407 Proxy Authentication Required]
- ClientOSError:
  - [WinError 64] The specified network name is no longer available
  - [WinError 1236] The network connection was aborted by the local system
- ServerDisconnectedError: Server disconnected.
1 Answer
To find out why your aiohttp code is failing, it is important to print the full error message, not just the exception's name:
print(exception)
This will print the details of what actually went wrong.
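For instance, here is a minimal sketch of a helper your except block in is_alive_aiohttp could call instead of the bare print (the name describe_error and the output format are hypothetical, not part of your code): repr() shows both the exception class and its full message, and aiohttp's connection errors often chain the underlying OSError as __cause__, which is worth printing as well.
def describe_error(index: int, proxy: str, exception: BaseException) -> None:
    # repr() includes the exception class as well as its complete message.
    print(f"{index:>2}) {proxy} failed: {exception!r}")
    # aiohttp client errors often wrap a lower-level OSError; show it if present.
    if exception.__cause__ is not None:
        print(f"    caused by: {exception.__cause__!r}")
Once you can see the full messages (for example, which WinError or HTTP status each proxy actually produced), it becomes much easier to tell whether the problem is the proxy itself or your aiohttp configuration.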