URL打开过程中的多线程

0 投票
1 回答
505 浏览
提问于 2025-04-17 21:24

我完成了一个脚本,这个脚本可以检查网址是否需要使用WWW的基本认证,并把结果打印出来,像这样:

#!/usr/bin/python

# Importing libraries
from urllib2 import urlopen, HTTPError
import socket
import urllib2
import threading
import time

# Setting up variables
url = open("oo.txt",'r')
response = None
start = time.time()

# Excuting Coommands
start = time.time()
for line in url:
    try:
        response = urlopen(line, timeout=1)
    except HTTPError as exc:
        # A 401 unauthorized will raise an exception
        response = exc
    except socket.timeout:
        print ("{0} | Request timed out !!".format(line))
    except urllib2.URLError:
        print ("{0} | Access error !!".format(line))

    auth = response and response.info().getheader('WWW-Authenticate')
    if auth and auth.lower().startswith('basic'):
        print "requires basic authentication"
    elif socket.timeout or urllib2.URLError:
        print "Yay"
    else:
        print "Not requires basic authentication"

print "Elapsed Time: %s" % (time.time() - start)

我有一些小问题需要你们帮忙修改这个脚本……我想让这个脚本一次检查10个网址,并把所有的网址结果一起写入一个文本文件。我读过关于多线程和处理的内容,但没有找到适合我这个情况的简单代码。

另外,当出现超时或网址错误时,结果会显示成两行,像这样:

http://www.test.test
 | Access error !!

我想要它显示在一行上,为什么会显示成两行呢?

有没有人能帮我解决这些问题?

提前谢谢你们!

1 个回答

1

这个包叫做 concurrent.futures,它让我们在Python中使用并发变得非常简单。你可以定义一个函数 check_url,这个函数会被用来处理每一个网址。然后,你可以使用 map 函数来同时对每个网址调用这个函数,并且可以遍历返回的结果。

#! /usr/bin/env python3

import concurrent.futures
import urllib.error
import urllib.request
import socket

def load_urls(pathname):
    with open(pathname, 'r') as f:
        return [ line.rstrip('\n') for line in f ]

class BasicAuth(Exception): pass

class CheckBasicAuthHandler(urllib.request.BaseHandler):
    def http_error_401(self, req, fp, code, msg, hdrs):
        if hdrs.get('WWW-Authenticate', '').lower().startswith('basic'):
            raise BasicAuth()
        return None

def check_url(url):
    try:
        opener = urllib.request.build_opener(CheckBasicAuthHandler())
        with opener.open(url, timeout=1) as u:
            return 'requires no authentication'
    except BasicAuth:
        return 'requires basic authentication'
    except socket.timeout:
        return 'request timed out'
    except urllib.error.URLError as e:
        return 'access error ({!r})'.format(e.reason)

if __name__ == '__main__':
    urls = load_urls('/tmp/urls.txt')
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        for url, result in zip(urls, executor.map(check_url, urls)):
            print('{}: {}'.format(url, result))

撰写回答