How can I efficiently test HTTP proxies for access to a specific domain?

Asked 2025-04-12 03:40

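I need an efficient way to test a number of free online HTTP proxies and find out which of them can reach a specific website.

Because testing proxies involves a lot of waiting, I decided to redesign my code around asynchronous testing, and I looked into the httpx and aiohttp packages. However, I ran into some unexpected behavior, and I now doubt whether my current code really fits my needs.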

Below is the output of the three methods I used:

  • one tests synchronously with the requests package,
  • the other two test asynchronously.

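Judging from the results, there are several errors, and the time each request takes varies widely. Interestingly, the requests method returned HTTP 200 for four of the proxies, the httpx method for five, and the aiohttp method for none at all. That surprised me, because all three are supposed to perform the same task, and it makes me doubt my implementation.

Also, in the httpx run, one proxy took abnormally long even though I set a 60-second timeout: 13,480.64 seconds. (I should mention that during the test, when I noticed it was taking too long, I put my computer to sleep. When I came back, the process was still running and had not stopped.)

Can anyone tell me what I am doing wrong and how I can improve this?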
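
A note on the timeout, with a sketch. In httpx, a single `timeout=60` is applied to each phase separately (connect, read, write, pool), so a proxy that keeps sending small chunks may never trip the read timeout and the request as a whole has no upper bound. One possible way to enforce a total deadline is `asyncio.wait_for`. The names `with_deadline` and `probe` below are hypothetical stand-ins, and no network is involved:

```python
import asyncio


async def with_deadline(coro, seconds: float) -> str:
    """Await `coro`, but give up once `seconds` of total time have elapsed."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for cancels the pending coroutine before raising.
        return "Timeout"


async def probe(delay: float) -> str:
    """Stand-in for a proxy check; it only sleeps (assumption, no real request)."""
    await asyncio.sleep(delay)
    return "HTTP (200)"
```

In the checkers, the same idea would mean wrapping the awaited request call in `asyncio.wait_for(..., timeout=TIMEOUT)` inside the existing try block, so that the resulting `TimeoutError` is reported like any other failure.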

 1) --> 185.XXX.XX.XX:80     --> ProxyError      (4.96s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)      (2.50s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)      (20.92s)
 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)      (0.61s)
 5) --> 31.XX.XX.XX:50687    --> ConnectionError (7.88s)
 6) --> 177.XX.XXX.XXX:80    --> ProxyError      (21.07s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)      (4.96s)
 8) --> 146.XX.XXX.XXX:12334 --> ProxyError      (21.05s)
 9) --> 67.XX.XXX.XXX:33081  --> ProxyError      (3.03s)
10) --> 37.XXX.XX.XX:80      --> ReadTimeout     (60.16s)
Testing 10 proxies with "requests" took 147.16 seconds.


 4) --> 18.XXX.XXX.XXX:8080  --> HTTP (200)          (16.09s)
 2) --> 38.XX.XXX.XXX:443    --> HTTP (200)          (22.11s)
 7) --> 8.XXX.XXX.X:4153     --> HTTP (200)          (12.96s)
 1) --> 185.XXX.XX.XX:80     --> RemoteProtocolError (24.83s)
 9) --> 67.XX.XXX.XXX:33081  --> ConnectError        (6.02s)
 3) --> 162.XXX.XX.XXX:80    --> HTTP (200)          (22.48s)
 6) --> 177.XX.XXX.XXX:80    --> HTTP (200)          (26.96s)
 5) --> 31.XX.XX.XX:50687    --> ConnectError        (34.50s)
 8) --> 146.XX.XXX.XXX:12334 --> ConnectError        (27.01s)
10) --> 37.XXX.XX.XX:80      --> ReadError           (13480.64s)
Testing 10 proxies with "httpx" took 13507.80 seconds.


 1) --> 185.XXX.XX.XX:80     --> ClientProxyConnectionError  (1.30s)
 2) --> 38.XX.XXX.XXX:443    --> ClientProxyConnectionError  (0.67s)
 3) --> 162.XXX.XX.XXX:80    --> ClientProxyConnectionError  (0.77s)
 4) --> 18.XXX.XXX.XXX:8080  --> ClientProxyConnectionError  (0.83s)
 5) --> 31.XX.XX.XX:50687    --> ClientProxyConnectionError  (0.85s)
 6) --> 177.XX.XXX.XXX:80    --> ClientProxyConnectionError  (0.91s)
 7) --> 8.XXX.XXX.X:4153     --> ClientProxyConnectionError  (0.94s)
 8) --> 146.XX.XXX.XXX:12334 --> ClientProxyConnectionError  (1.03s)
 9) --> 67.XX.XXX.XXX:33081  --> ClientProxyConnectionError  (1.05s)
10) --> 37.XXX.XX.XX:80      --> ClientProxyConnectionError  (0.62s)
Testing 10 proxies with "aiohttp" took 2.42 seconds.

This is the code I used:

I first downloaded the proxies from this GitHub repository:

import random
import tempfile
import os
import requests
import time
import asyncio
import httpx
import aiohttp

TIMEOUT: int = 60
DEFAULT_DOMAIN: str = r"www.desired.domain.com"
PROXIES_URL: str = "https://raw.githubusercontent.com/TheSpeedX/SOCKS-List/master/http.txt"
PROXIES_PATH: str = os.path.join(tempfile.gettempdir(), "httpProxies.txt")
HEADERS: dict = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "en,ar;q=0.9,fr;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "dnt": "1",
    "referer": "https://www.google.com/",
    "sec-ch-ua": '"Microsoft Edge";v="123", "Not:A-Brand";v="8", "Chromium";v="123"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "cross-site",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36 Edg/123.0.0.0",
    "Connection": "close",  # "keep-alive",
}

def get_proxies() -> list[str]:
    """Return proxies as "host:port" strings, caching the downloaded list on disk."""
    if os.path.exists(PROXIES_PATH):
        # The with-statement closes the file; no explicit close() is needed.
        with open(file=PROXIES_PATH, mode="r") as file:
            text = file.read()
    else:
        response = requests.get(url=PROXIES_URL, timeout=TIMEOUT)
        if response.status_code != 200:
            return []
        text = response.text
        with open(file=PROXIES_PATH, mode="w") as file:
            file.write(text)
    # Drop blank lines so an empty string is never treated as a proxy.
    return [line.strip() for line in text.splitlines() if line.strip()]
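
A downloaded list may contain blank or malformed lines, and every such entry costs a wasted connection attempt later. A small sketch of validating the entries up front, assuming each line should look like `host:port`; `parse_proxy_lines` is a hypothetical helper, not part of the code above:

```python
def parse_proxy_lines(text: str) -> list[str]:
    """Keep only lines of the form "host:port" with a port in 1..65535."""
    proxies = []
    for line in text.splitlines():
        candidate = line.strip()
        # rpartition splits on the last colon; sep is "" when no colon exists.
        host, sep, port = candidate.rpartition(":")
        if sep and host and port.isdecimal() and 0 < int(port) < 65536:
            proxies.append(candidate)
    return proxies
```

This only checks the shape of each entry; whether the proxy actually works is still decided by the tests that follow.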

Here is the method I use to test the proxies sequentially:

def sequential_test(proxies_list: list[str]):
    if proxies_list:
        with requests.Session() as session:
            session.headers = HEADERS
            for i, proxy in enumerate(proxies_list, 1):
                session.proxies = {"http": f"http://{proxy}"}
                try:
                    color = "\033[91m"
                    start = time.perf_counter()
                    response = session.get(url=f"http://{DEFAULT_DOMAIN}", timeout=TIMEOUT)
                    status = f"HTTP ({response.status_code})"
                    if response.status_code == 200:
                        color = "\033[92m"
                except Exception as exception:  # requests.RequestException
                    status = type(exception).__name__
                print(f"{i:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")

Below is the code I use to test whether a proxy works with the target site, once with httpx and once with aiohttp:

async def is_alive_httpx(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    proxy_mounts = {"http://": httpx.AsyncHTTPTransport(proxy=f"http://{proxy}")}
    async with httpx.AsyncClient(
        mounts=proxy_mounts,
        timeout=TIMEOUT,  # applies to connect/read/write/pool separately, not to the whole request
        headers=HEADERS,
        follow_redirects=True,
    ) as session:
        color = "\033[91m"
        start = time.perf_counter()
        try:
            # session.get() merges the client-level headers into the request;
            # a hand-built httpx.Request passed to send() is sent as-is.
            response = await session.get(f"http://{domain}")
            status = f"HTTP ({response.status_code})"
            if response.status_code == 200:
                color = "\033[92m"
        except Exception as exception:  # httpx.HTTPError
            status = type(exception).__name__
        print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:20}\t({time.perf_counter()-start:.2f}s)")


async def is_alive_aiohttp(index: int, proxy: str, domain: str = DEFAULT_DOMAIN) -> None:
    # Set these before the try block so the final print never hits an unbound name.
    color = "\033[91m"
    start = time.perf_counter()
    try:
        # ssl_context=None is dropped from TCPConnector: None is its default value.
        async with aiohttp.ClientSession(
            timeout=aiohttp.ClientTimeout(total=TIMEOUT),
            headers=HEADERS,
            trust_env=True,
            connector=aiohttp.TCPConnector(force_close=True, limit_per_host=5),
        ) as client:
            # The inner context manager releases the response when the block exits.
            async with client.get(url=f"http://{domain}", proxy=f"http://{proxy}") as response:
                status = f"HTTP ({response.status})"
                if response.status == 200:
                    color = "\033[92m"
    except Exception as exception:  # aiohttp.ClientError
        status = type(exception).__name__
        print(status + ":", exception)
    # No finally/close() is needed: "async with" already closes the session.
    print(f"{index:>2}) --> {color+proxy:30}\033[0m --> {status:26}\t({time.perf_counter()-start:.2f}s)")
    await asyncio.sleep(0.3)
    await asyncio.sleep(0.3)

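Here is the rest of the code. You can copy it straight into your environment and run it (just make sure the required packages are installed):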

async def test_proxies(proxies_list: list[str], func):
    if proxies_list:
        await asyncio.gather(*[func(ip[0], ip[1]) for ip in enumerate(proxies_list, 1)])


def main():
    proxies = random.sample(get_proxies(), 10)  # get_proxies()[:10]

    start = time.perf_counter()
    sequential_test(proxies)
    print(f'\nTesting {len(proxies)} proxies with "requests" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_httpx))
    print(f'\nTesting {len(proxies)} proxies with "httpx" took {time.perf_counter()-start:.2f} seconds.\n')

    start = time.perf_counter()
    asyncio.run(test_proxies(proxies, is_alive_aiohttp))
    print(f'\nTesting {len(proxies)} proxies with "aiohttp" took {time.perf_counter()-start:.2f} seconds.\n')


if __name__ == "__main__":
    main()

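Here are some of the errors I frequently run into when using aiohttp, for example:

  • ClientProxyConnectionError: Cannot connect to host ssl:default
    • [The semaphore timeout period has expired]
    • [The remote computer refused the network connection]
  • ClientResponse:
    • [409 Conflict]
    • [407 Proxy Authentication Required]
  • ClientOSError:
    • [WinError 64] The specified network name is no longer available
    • [WinError 1236] The network connection was aborted by the local system
  • ServerDisconnectedError: Server disconnected.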

1 Answer


To find out why your aiohttp code fails, it is important to print the full error message, not just the name of the error.

print(exception)

This prints the detailed information about what went wrong.
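
If the message alone is not enough, the exception's cause chain often carries the underlying OS-level error as well. A minimal sketch that walks the chain using only standard exception attributes; `describe_exception` is a hypothetical helper name:

```python
def describe_exception(exc: BaseException) -> str:
    """Return "Type: message" for the exception and each exception that caused it."""
    parts = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cyclic chains
        parts.append(f"{type(current).__name__}: {current}")
        # __cause__ is set by "raise ... from ..."; __context__ by implicit chaining.
        current = current.__cause__ or current.__context__
    return " <- ".join(parts)
```

Calling `print(describe_exception(exception))` inside the `except` block would then show the wrapped error next to the outer one.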
