How do I use threads in Python?

Posted 2024-04-23 09:19:03

Tags: python
3 answers

Here's a simple example: you need to try several alternative URLs and return the contents of the first one to respond.

import queue
import threading
import urllib.request

# Called by each thread: fetch the URL and put its body on the queue
def get_url(q, url):
    q.put(urllib.request.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args=(q, u))
    t.daemon = True
    t.start()

s = q.get()  # blocks until the first thread puts a result
print(s)

In this case, threading is used as a simple optimization: each subthread is waiting for a URL to resolve and respond, in order to put its contents on the queue; each thread is a daemon (it won't keep the process up if the main thread ends, which is more common than not); the main thread starts all subthreads, does a get on the queue to wait until one of them has done a put, then emits the result and terminates (which takes down any subthreads that might still be running, since they are daemon threads).

Proper use of threads in Python is invariably connected to I/O operations (since CPython doesn't use multiple cores to run CPU-bound tasks anyway, the only reason for threading is not blocking the process while waiting for some I/O). Queues are almost invariably the best way to farm work out to threads and/or collect the results, and, incidentally, they are intrinsically thread-safe, so they save you from worrying about locks, conditions, events, semaphores, and other inter-thread coordination/communication concepts.
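As a minimal sketch of that queue-centred pattern (the worker function and the squaring task here are hypothetical illustrations, not from the answer above), one queue can feed work to the threads and another can collect their results, with no explicit locks anywhere:

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    # Pull tasks until the queue is drained; Queue does its own locking.
    while True:
        try:
            n = tasks.get_nowait()
        except queue.Empty:
            return
        results.put(n * n)

# All tasks are enqueued before any thread starts, so an empty
# queue reliably means "no work left".
for n in range(10):
    tasks.put(n)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(10))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Results arrive in whatever order the threads finish, which is why they are sorted before printing.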

Note: for actual parallelization in Python, you should use the multiprocessing module to fork multiple processes that execute in parallel (due to the global interpreter lock, Python threads provide interleaving, but they are in fact executed serially, not in parallel, and are only useful when interleaving I/O operations).

However, if you are merely looking for interleaving (or are doing I/O operations that can be parallelized despite the global interpreter lock), then the threading module is the place to start. As a really simple example, let's consider the problem of summing a big range by summing subranges in parallel:

import threading

class SummingThread(threading.Thread):
    def __init__(self, low, high):
        super(SummingThread, self).__init__()
        self.low = low
        self.high = high
        self.total = 0

    def run(self):
        for i in range(self.low, self.high):
            self.total += i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print(result)

Note that the above example is pretty silly, as it does absolutely no I/O and, due to the global interpreter lock, will execute serially in CPython anyway (with the added overhead of context switching), despite being interleaved.

Since this question was asked in 2010, there has been real simplification in how to do simple multithreading in Python using map and pool.

The code below comes from an article/blog post that you should definitely check out (no affiliation) - Parallelism in one line: A Better Model for Day to Day Threading Tasks. I'll summarize it below - it ends up being just a few lines of code:

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

which is the multithreaded version of:

results = []
for item in my_array:
    results.append(my_function(item))

Explanation

Map is a cool little function, and the key to easily injecting parallelism into your Python code. For those unfamiliar, map is something lifted from functional languages like Lisp. It is a function which maps another function over a sequence.

Map handles the iteration over the sequence for us, applies the function, and stores all of the results in a handy list at the end.
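For example, with Python's built-in map (the shout function here is just an illustration):

```python
def shout(word):
    # Trivial function to map over a sequence
    return word.upper() + "!"

words = ["hello", "threads", "python"]
shouted = list(map(shout, words))
print(shouted)  # ['HELLO!', 'THREADS!', 'PYTHON!']
```

The parallel versions described below have the same shape: a function, a sequence, and a list of results.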



Implementation

Parallel versions of the map function are provided by two libraries: multiprocessing, and also its little known, but equally fantastic stepchild: multiprocessing.dummy.

multiprocessing.dummy is exactly the same as the multiprocessing module, but uses threads instead (an important distinction - use multiple processes for CPU-intensive tasks; threads for (and during) I/O):

multiprocessing.dummy replicates the API of multiprocessing, but is no more than a wrapper around the threading module.

import urllib.request
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
]

# Make the Pool of workers
pool = ThreadPool(4)

# Open the URLs in their own threads
# and return the results
results = pool.map(urllib.request.urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

And the timing results:

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

Passing multiple arguments (works like this only in Python 3.3 and later):

To pass multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))

Or to pass a constant and an array:

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))
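Putting both patterns together in a runnable sketch (the add function, list_a, and list_b are hypothetical placeholders):

```python
import itertools
from multiprocessing.dummy import Pool as ThreadPool

def add(a, b):
    return a + b

list_a = [1, 2, 3]
list_b = [10, 20, 30]

with ThreadPool(4) as pool:
    # Pair up elements of the two lists positionally
    pairwise = pool.starmap(add, zip(list_a, list_b))
    # Pair a constant with every element of one list
    plus_const = pool.starmap(add, zip(itertools.repeat(100), list_a))

print(pairwise)    # [11, 22, 33]
print(plus_const)  # [101, 102, 103]
```

starmap unpacks each tuple from the zipped sequence into the function's arguments, which is exactly what plain map cannot do.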

If you are using an earlier version of Python, you can pass multiple arguments via this workaround.

(Thanks to user136036 for the helpful comment.)
