Increasing the number of requests per second

Posted 2024-06-16 09:37:26


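I am trying to increase the number of requests I send per second. I am currently running Python 2.7 and get roughly 1 request per second. Do I need to multithread or multiprocess the function, or run multiple instances of it asynchronously? I don't know how to go about it. Please help :-)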

import requests

while True:
    r = requests.post(url, allow_redirects=False, data={
        str(formDataNameLogin): username,
        str(formDataNamePass): password,
    })

    print 'Sending username: %s with password %s' % (username, password)

2 Answers

You can use multithreading to run several similar requests in parallel:

import Queue      # Python 2; this module is named "queue" in Python 3
import threading
import time
import requests

exit_flag = 0

class RequestThread(threading.Thread):
    def __init__(self, thread_id, name, q):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.name = name
        self.q = q

    def run(self):
        print("Starting {0:s}".format(self.name))
        process_data(self.name, self.q)
        print("Exiting {0:s}".format(self.name))

def process_data(thread_name, q):
    # Poll the queue until the main thread sets exit_flag.
    while not exit_flag:
        data = None
        # Hold the lock so the empty() check and the get() happen together.
        with queue_lock:
            if not q.empty():
                data = q.get()
        if data is not None:
            print("{0:s} processing {1:s}".format(thread_name, data))
            response = requests.get(data)
            print(response)
        time.sleep(1)

thread_list = ["Thread-1", "Thread-2", "Thread-3"]
request_list = [
    "https://api.github.com/events",
    "http://api.plos.org/search?q=title:THREAD",
    "http://api.plos.org/search?q=title:DNA",
    "http://api.plos.org/search?q=title:PYTHON",
    "http://api.plos.org/search?q=title:JAVA"
]
queue_lock = threading.Lock()
work_queue = Queue.Queue(10)
threads = []
thread_id = 1

# Create new threads
for t_name in thread_list:
    thread = RequestThread(thread_id, t_name, work_queue)
    thread.start()
    threads.append(thread)
    thread_id += 1

# Fill the queue
with queue_lock:
    for word in request_list:
        work_queue.put(word)

# Wait for queue to empty (sleep briefly instead of spinning at full CPU)
while not work_queue.empty():
    time.sleep(0.1)

# Notify threads it's time to exit
exit_flag = 1

# Wait for all threads to complete
for t in threads:
    t.join()

print("Exiting Main Thread")


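I am not a multithreading expert, but here is a short explanation:

1. Queue

The Queue module lets you create a new queue object that can hold a specific number of items. The following methods control the queue: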

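  • get(): removes and returns an item from the queue.
  • put(): adds an item to the queue.
  • qsize(): returns the number of items currently in the queue.
  • empty(): returns True if the queue is empty; otherwise False.
  • full(): returns True if the queue is full; otherwise False.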

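In my limited experience with multithreading, this is very useful for keeping track of the data that still has to be processed. I have had situations where several threads did the same work, or where all threads but one had exited. A queue helped me control the shared data that was waiting to be processed.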
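The queue methods listed above can be tried out in a few lines. This is a minimal sketch, not part of the original answer: the import shim is an assumption that lets the same code run on Python 2 (module `Queue`) and Python 3 (module `queue`), and the item names are made up.

```python
try:
    import queue            # Python 3 name
except ImportError:
    import Queue as queue   # Python 2 name

q = queue.Queue(3)          # the queue holds at most 3 items

q.put("job-1")
q.put("job-2")
size_after_two = q.qsize()  # number of items currently queued
q.put("job-3")
is_full = q.full()          # True once the maximum size is reached

first = q.get()             # removes and returns the oldest item (FIFO)
drained = [q.get(), q.get()]
is_empty = q.empty()        # True once every item has been taken out

print(size_after_two, is_full, first, drained, is_empty)
```

Note that `qsize()`, `empty()` and `full()` only describe the queue at the moment of the call. With several threads running, another thread may change the queue right afterwards, which is why the answer's code wraps the `empty()` check and the `get()` in a lock.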

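2. Lock

The threading module that ships with Python includes an easy-to-use locking mechanism that lets you synchronize threads. A new lock is created by calling Lock(), which returns the new lock.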

A primitive lock is in one of two states, “locked” or “unlocked”. It is created in the unlocked state. It has two basic methods, acquire() and release(). When the state is unlocked, acquire() changes the state to locked and returns immediately. When the state is locked, acquire() blocks until a call to release() in another thread changes it to unlocked, then the acquire() call resets it to locked and returns. The release() method should only be called in the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a ThreadError will be raised.

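In more human terms: locks are the most basic synchronization mechanism provided by the threading module. At any time, a lock is held either by a single thread or by no thread at all. If a thread tries to acquire a lock that another thread already holds, the thread making the attempt is suspended until the lock is released.

Locks are typically used to synchronize access to a shared resource. For each shared resource, create a Lock object. When you need to access the resource, call acquire to take the lock (this waits for the lock to be released if necessary), and then call release to let go of it.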
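As an illustration of that acquire/release pattern, here is a small sketch in which four threads update one shared counter. The names `counter`, `counter_lock` and `add_many` are hypothetical and do not appear in the answer's code. The `with` statement is used as a shorthand: it calls `acquire()` on entry and `release()` on exit, even if an exception is raised in between.

```python
import threading

counter = 0                      # the shared resource
counter_lock = threading.Lock()  # one lock guarding that resource

def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:       # acquire() on entry, release() on exit
            counter += 1

workers = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

print(counter)
```

Without the lock, the read-modify-write in `counter += 1` could interleave between threads and some increments might be lost.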

3. Thread

To implement a new thread with the threading module, you have to do the following:

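  • Define a new subclass of the Thread class.
  • Override the __init__(self [, args]) method to add additional arguments.
  • Then override the run(self [, args]) method to implement what the thread should do when it is started.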

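Once you have created the new Thread subclass, you can create an instance of it and start a new thread by calling start(), which in turn calls the run() method. Methods: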

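  • run(): the entry point of the thread.
  • start(): starts the thread by calling the run method.
  • join([time]): waits for the thread to terminate.
  • isAlive(): checks whether the thread is still executing (spelled is_alive() in Python 3; isAlive() was removed in Python 3.9).
  • getName(): returns the name of the thread.
  • setName(): sets the name of the thread.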
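The three steps above can be sketched as follows. `SquareThread` and its `result` attribute are hypothetical names chosen for this illustration. The work done in `run()` is deliberately trivial so that the example runs without network access.

```python
import threading

class SquareThread(threading.Thread):
    """Thread subclass that squares one number in the background."""

    def __init__(self, number):
        threading.Thread.__init__(self)   # always initialise the base class first
        self.number = number
        self.result = None

    def run(self):
        # Entry point of the thread; start() arranges for this to be called.
        self.result = self.number * self.number

threads = [SquareThread(n) for n in (2, 3, 4)]
for t in threads:
    t.start()          # launches the thread, which then executes run()
for t in threads:
    t.join()           # block until the thread has finished

results = [t.result for t in threads]
print(results)
```

Calling `run()` directly would execute the method in the calling thread. Only `start()` actually creates a new thread.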

Is it really faster?

With a single thread:

$ time python single.py 
Processing request url: https://api.github.com/events
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:THREAD
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:DNA
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:PYTHON
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:JAVA
<Response [200]>
Exiting Main Thread

real    0m22.310s
user    0m0.096s
sys 0m0.022s

With 3 threads:

Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-3 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-1 processing http://api.plos.org/search?q=title:PYTHON
Thread-2 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-3
<Response [200]>
<Response [200]>
Exiting Thread-1
 Exiting Thread-2
Exiting Main Thread

real    0m11.726s
user    0m6.692s
sys 0m0.028s

With 5 threads:

time python multi.py 
Starting Thread-1
Starting Thread-2
Starting Thread-3
 Starting Thread-4
Starting Thread-5
Thread-5 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
Thread-3 processing http://api.plos.org/search?q=title:PYTHONThread-4 processing http://api.plos.org/search?q=title:JAVA

<Response [200]>
<Response [200]>
 <Response [200]>
<Response [200]>
<Response [200]>
Exiting Thread-5
Exiting Thread-4
Exiting Thread-2
Exiting Thread-3
Exiting Thread-1
Exiting Main Thread

real    0m6.446s
user    0m1.104s
sys 0m0.029s

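With 5 threads the run is roughly 3.5 times faster than with a single thread (about 6.4 s versus 22.3 s). And these are only 5 dummy requests. Imagine larger batches of data.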
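The effect can be reproduced without any network access. The sketch below is not from the original answer: `fake_request`, `DELAY` and the URL strings are hypothetical stand-ins, with `time.sleep` playing the role of waiting for a server. It only illustrates why I/O-bound work benefits from threads, and its timings say nothing about real endpoints.

```python
import threading
import time

DELAY = 0.1   # assumed per-request latency in seconds (made up for the demo)

def fake_request(url, results):
    time.sleep(DELAY)        # stands in for waiting on the network
    results.append(url)      # list.append is safe to call from several threads in CPython

urls = ["url-%d" % i for i in range(5)]

def run_sequential():
    results = []
    start = time.time()
    for u in urls:
        fake_request(u, results)
    return time.time() - start, results

def run_threaded():
    results = []
    start = time.time()
    workers = [threading.Thread(target=fake_request, args=(u, results)) for u in urls]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return time.time() - start, results

sequential_time, sequential_results = run_sequential()
threaded_time, threaded_results = run_threaded()
print("sequential: %.2fs, threaded: %.2fs" % (sequential_time, threaded_time))
```

The sequential run waits for each simulated request in turn, while the threaded run overlaps the waits, so its total time stays close to a single delay.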

Note: I only tested this under Python 2.7. Python 3.x may need a few small adjustments.

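Just use any asynchronous library. I think asynchronous variants of requests, such as grequests, txrequests, requests-futures and requests-threads, would suit you best. Below is the code sample from the grequests README: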

import grequests

urls = [
    'http://www.heroku.com',
    'http://python-tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://fakedomain/',
    'http://kennethreitz.com'
]

Create a set of unsent Requests:

rs = (grequests.get(u) for u in urls)

Send them all at the same time:

grequests.map(rs)

Using or learning the other modules mentioned, such as requests-threads, may be a bit more involved, especially in Python 2:

from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react
from requests_threads import AsyncSession

session = AsyncSession(n=100)

@inlineCallbacks
def main(reactor):
    responses = []
    for i in range(100):
        responses.append(session.get('http://httpbin.org/get'))

    for response in responses:
        r = yield response
        print(r)

if __name__ == '__main__':
    react(main)

asyncio and {a3} may be even more noteworthy, but I guess it is easier to learn a variant of a module you already know.
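For comparison, this is roughly what the asyncio style looks like. It is a sketch under clear assumptions: it needs Python 3.7 or later (so it does not apply to the Python 2.7 setup in the question), and `fake_request` is a hypothetical coroutine that uses `asyncio.sleep` in place of a real HTTP call.

```python
import asyncio

async def fake_request(i):
    await asyncio.sleep(0.1)          # stands in for waiting on the network
    return "response-%d" % i

async def main():
    # gather() runs the coroutines concurrently and returns their
    # results as a list, in the order the coroutines were passed in.
    return await asyncio.gather(*(fake_request(i) for i in range(5)))

responses = asyncio.run(main())
print(responses)
```

All five simulated requests wait at the same time on a single thread, so the whole batch takes about as long as one request.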

Multithreading is not strictly necessary, but you can try multithreading or, perhaps better, multiprocessing, and see which performs best.
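One low-effort way to try both is the standard library's Pool interface: `multiprocessing.dummy.Pool` is backed by threads, while `multiprocessing.Pool` offers the same methods backed by processes. The sketch below uses the thread-backed version. The `send` function and the item names are hypothetical, and `time.sleep` stands in for the actual `requests.post` call.

```python
from multiprocessing.dummy import Pool   # thread-backed Pool with the multiprocessing API
import time

def send(item):
    time.sleep(0.05)                     # placeholder for the real HTTP request
    return "sent %s" % item

items = ["item-%d" % i for i in range(8)]

pool = Pool(4)                           # 4 worker threads
try:
    messages = pool.map(send, items)     # results come back in input order
finally:
    pool.close()
    pool.join()

print(messages)
```

To compare against processes, the import can be changed to `from multiprocessing import Pool`. In that case the pool setup should sit under an `if __name__ == '__main__':` guard, which is required on platforms that start worker processes by spawning a fresh interpreter.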
