Terminate multiple threads when any one thread finishes its task

33 votes

5 answers

44685 views

Asked 2025-04-16 19:13

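I'm new to both Python and threads. I've written Python code that acts as a web crawler and searches sites for a specific keyword. My question is: how can I use threads to run three different instances of my class at the same time? When one of the instances finds the keyword, all three must close and stop crawling the web. Here is some code.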

class Crawler:
    def __init__(self):
        # the actual code for finding the keyword
        pass

def main():
    crawl = Crawler()

if __name__ == "__main__":
    main()

How can I use threads to have Crawler do three different crawls at the same time?

Tags: multithreading, web crawler, keyword search, instance management, threading

5 Answers

First off, if you're new to Python, I wouldn't recommend tackling threads just yet. Get used to the language first, then move on to multithreading.

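That said, if your goal is to run things at the same time (you said "simultaneously"), you should know that in Python (or at least in the default implementation, CPython) multiple threads do not actually execute Python code in parallel, even when several processor cores are available. Read up on the GIL (Global Interpreter Lock) for details.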
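For a crawler specifically, the GIL matters less than it sounds: it is released while a thread blocks on I/O, so threads that mostly wait on the network still overlap. A minimal sketch of that effect, using `time.sleep` as a stand-in for a network request (the names `fake_fetch` and `run_concurrently` are made up for this illustration):

```python
import threading
import time

def fake_fetch(delay):
    # Stand-in for a blocking network call; sleeping releases the GIL.
    time.sleep(delay)

def run_concurrently(count, delay):
    """Start `count` threads that each block for `delay` seconds.

    Returns the total wall-clock time in seconds.
    """
    threads = [threading.Thread(target=fake_fetch, args=(delay,))
               for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_concurrently(3, 0.3)
    # The waits overlap, so this should be much closer to 0.3 s than to 0.9 s.
    print(f"3 threads finished in {elapsed:.2f}s")
```

CPU-bound work (parsing huge pages in pure Python, say) would not overlap this way, which is where the GIL caveat above applies.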

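Finally, if you still want to go ahead, have a look at the threading module in the Python documentation. I find Python's docs very good, with plenty of examples and explanations, so they work well as a reference.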

Answered 2025-04-16 by Python大师


Starting a thread is easy:

import threading

thread = threading.Thread(target=function_to_call_inside_thread)
thread.start()

Create an event object to notify you when a crawler is done:

event = threading.Event()
event.wait() # call this in the main thread to wait for the event
event.set() # call this in a thread when you are ready to stop

Once the event fires, stop every crawler. You will need to add a stop method to your crawler class for this.

for crawler in crawlers:
    crawler.stop()

Then call join on each thread.

thread.join() # waits for the thread to finish
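Putting the pieces above together, here is a rough sketch of how they might fit. Everything about the `Crawler` internals is an assumption for illustration: the constructor arguments, the `run` method, the in-memory "pages" and the `crawl_until_found` helper are all made up, and `time.sleep` stands in for the real page fetch. Note that `stop()` only sets a flag; each crawler has to check it between pages.

```python
import threading
import time

class Crawler:
    """Hypothetical crawler that scans a list of page texts for a keyword."""

    def __init__(self, pages, keyword, found_event):
        self.pages = pages
        self.keyword = keyword
        self.found_event = found_event          # shared by all crawlers
        self._stop_event = threading.Event()    # private stop flag
        self.result = None

    def stop(self):
        # Cooperative stop: the thread exits at its next check, it is not killed.
        self._stop_event.set()

    def run(self):
        for page in self.pages:
            if self._stop_event.is_set():
                return
            time.sleep(0.01)  # stand-in for downloading the page
            if self.keyword in page:
                self.result = page
                self.found_event.set()  # wake up the main thread
                return

def crawl_until_found(page_lists, keyword, timeout=5.0):
    found = threading.Event()
    crawlers = [Crawler(pages, keyword, found) for pages in page_lists]
    threads = [threading.Thread(target=c.run) for c in crawlers]
    for t in threads:
        t.start()
    # Block until some crawler reports a hit, or give up after `timeout` seconds.
    found.wait(timeout)
    for c in crawlers:
        c.stop()
    for t in threads:
        t.join()
    return [c.result for c in crawlers if c.result is not None]

if __name__ == "__main__":
    pages = [["nothing here"] * 50,
             ["intro", "all about python threads"],
             ["still nothing"] * 50]
    print(crawl_until_found(pages, "python"))
```

One limitation of this sketch: if no crawler ever finds the keyword, the main thread waits for the full timeout even after all crawlers have run out of pages.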

If you do this kind of programming a lot, take a look at the eventlet module. It lets you write "threaded" code without many of the drawbacks of threading.

Answered 2025-04-16 by Python大师


In Python there doesn't seem to be a (simple) way to kill a thread.

Here is a simple example that runs multiple HTTP requests concurrently:

import threading
import urllib.request

def crawl():
    # Fetch the page; the body is not used in this example.
    data = urllib.request.urlopen("http://www.google.com/").read()
    print("Read google.com")

threads = []

for n in range(10):
    thread = threading.Thread(target=crawl)
    thread.start()
    threads.append(thread)

# wait until all threads have finished
print("Waiting...")

for thread in threads:
    thread.join()

print("Complete.")
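Since threads can't be killed from outside, staying with threads means asking them to stop. One way to sketch "wait for whichever finishes first, then tell the rest to quit" is with the standard `concurrent.futures` module plus a shared `threading.Event`. The `search` and `first_match` functions and the in-memory pages are hypothetical, and `time.sleep` stands in for the HTTP request:

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def search(pages, keyword, stop):
    """Return the first page containing keyword, or None if stopped or exhausted."""
    for page in pages:
        if stop.is_set():
            return None
        time.sleep(0.01)  # stand-in for a network fetch
        if keyword in page:
            return page
    return None

def first_match(page_lists, keyword):
    stop = threading.Event()
    workers = max(1, len(page_lists))  # ThreadPoolExecutor rejects max_workers=0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(search, pages, keyword, stop)
                   for pages in page_lists}
        while pending:
            # Wake up as soon as at least one worker has returned.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                result = future.result()
                if result is not None:
                    stop.set()  # remaining workers exit at their next check
                    return result
    return None

if __name__ == "__main__":
    pages = [["x"] * 30, ["y", "find me: needle"], ["z"] * 30]
    print(first_match(pages, "needle"))
```

Leaving the `with` block waits for the remaining workers, which return quickly once `stop` is set.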

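If you're willing to add some complexity, you can use a more powerful approach, multiprocessing, which lets you terminate thread-like processes.

I've extended the example to use it. I hope this helps: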

import multiprocessing
import urllib.request

def crawl(result_queue):
    data = urllib.request.urlopen("http://news.ycombinator.com/").read()

    print("Requested...")

    # Placeholder condition (always true): replace with a real check on `data`.
    if "result found (for example)":
        result_queue.put("result!")

    print("Read site.")

if __name__ == "__main__":  # required where child processes are spawned
    processes = []
    result_queue = multiprocessing.Queue()

    for n in range(4):  # start 4 processes crawling for the result
        process = multiprocessing.Process(target=crawl, args=(result_queue,))
        process.start()
        processes.append(process)

    print("Waiting for result...")

    result = result_queue.get()  # blocks until any process has `.put()` a result

    for process in processes:  # then kill them all off
        process.terminate()

    print("Got result:", result)

Answered 2025-04-16 by Python大师

