Should I expect out-of-order data when my HTTP server reads HTTP data from a single client socket?

-1 votes
1 answer
38 views
Asked 2025-04-14 17:51

I am implementing my own HTTP server:

import socket
import threading
import queue
import ssl
from manipulator.parser import LineBuffer, HttpParser


class SocketServer:
    """
        Basic Socket Server in python
    """

    def __init__(self, host, port, max_threads, ssl_context: ssl.SSLContext = None, db=None):
        print("Create Server For Http")

        self.host = host
        self.port = port
        # Handed to HttpParser below; the handler read self.db without it ever being set
        self.db = db
        self.server_socket = self.initSocket()
        self.max_threads = max_threads
        self.request_queue = queue.Queue()

        self.ssl_context = None
        if ssl_context is not None:
            print("Initialise SSL context")
            self.ssl_context = ssl_context

    def initSocket(self):
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    def __accept(self):
        self.server_socket.listen(5)
        while True:
            try:
                client_socket, client_address = self.server_socket.accept()

                if self.ssl_context is not None:
                    print(self.ssl_context)
                    # The TLS handshake runs here; ssl.SSLError is a subclass of OSError
                    client_socket = self.ssl_context.wrap_socket(client_socket, server_side=True)

                self.request_queue.put((client_socket, client_address))
            except OSError as exc:
                print("Error occurred:", exc)

    def __handle(self):
        while True:
            client_socket, address = self.request_queue.get()
            print("Address", address)

            try:
                # Read HTTP Request
                # Log Http Request
                # Manipulate Http Request
                # Forward or respond

                buffer = LineBuffer()
                parser = HttpParser(self.db)

                # recv() returns bytes and may deliver only part of the request
                buffer.pushData(client_socket.recv(2048))
                # Parse every complete line received so far
                line = buffer.getLine()
                while line is not None:
                    parser.parse(line)
                    line = buffer.getLine()

                content = '<html><body>Hello World</body></html>\r\n'.encode()
                headers = f'HTTP/1.1 200 OK\r\nContent-Length: {len(content)}\r\nContent-Type: text/html\r\n\r\n'.encode()
                client_socket.sendall(headers + content)

            finally:
                try:
                    client_socket.shutdown(socket.SHUT_RDWR)
                except OSError:
                    pass  # the peer may already have closed the connection
                client_socket.close()
                self.request_queue.task_done()

    def __initThreads(self):
        for _ in range(self.max_threads):
            threading.Thread(target=self.__handle, daemon=True).start()

    def start(self):
        self.server_socket.bind((self.host, self.port))
        self.__initThreads()
        self.__accept()

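I am doing this because I want to log and analyze incoming HTTP requests as quickly as possible. Also, many third-party libraries require C bindings, which I want to avoid.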

So far I have written a line splitter that splits the request on \r\n:

class LineBuffer:

    def __init__(self):
        self.buffer = b''
    
    def pushData(self, data):
        # socket.recv() already returns bytes, so append without encoding
        self.buffer += data
    
    def getLine(self):
        if  b'\r\n' in self.buffer:
            line,sep,self.buffer = self.buffer.partition(b'\r\n')
            return line+sep
        return None
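
To illustrate why a buffer like this is needed at all: recv() returns whatever bytes have arrived, so chunk boundaries rarely line up with line boundaries. The sketch below is only an illustration under that assumption. `ChunkedLineBuffer` and `split_into_chunks` are hypothetical names, not part of the server above, and the chunk size of 5 is arbitrary.

```python
class ChunkedLineBuffer:
    """Hypothetical stand-in for a CRLF line splitter fed with raw bytes."""

    def __init__(self):
        self.buffer = b""

    def push(self, data: bytes):
        # Append raw bytes exactly as a socket would deliver them
        self.buffer += data

    def lines(self):
        # Yield every complete CRLF-terminated line; keep any partial tail buffered
        while b"\r\n" in self.buffer:
            line, sep, self.buffer = self.buffer.partition(b"\r\n")
            yield line + sep


def split_into_chunks(data: bytes, size: int):
    """Cut data into fixed-size pieces to imitate arbitrary recv() boundaries."""
    return [data[i:i + size] for i in range(0, len(data), size)]


raw = b"GET / HTTP/1.1\r\nHost: lala1.com\r\n\r\n"
buf = ChunkedLineBuffer()
received = []
for chunk in split_into_chunks(raw, 5):
    buf.push(chunk)
    received.extend(buf.lines())
```

With a chunk size of 5, the first \r\n is split across two chunks, yet the lines still come out whole and in the order they were sent.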

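Next, I want to parse each line and turn the lines into an object that represents the HTTP request, so that I can process requests further in a streaming fashion: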

class HttpRequest:

    def __init__(self, db):
        self.headers = {}    # parsed headers
        self.body = ""       # HTTP body
        self.version = None
        self.method = None
        self.id = None
        self.raw = ""

class HttpParser:

    def __init__(self, db):
        self.db = db
        self.currentRequest = None

    def parse(self, line):
        # do parsing here
        return
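
One way the line-by-line parsing could look is a small state machine: the first line of a request is the request line, the following lines are headers, and an empty line ends the header section. This is a minimal sketch, not the intended implementation. `ParsedRequest` and `RequestLineParser` are hypothetical names, and body handling is left out on purpose, because a body has to be read as raw bytes (for example, by Content-Length) rather than line by line.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedRequest:
    """Hypothetical stand-in for the HttpRequest object."""
    method: str = ""
    target: str = ""
    version: str = ""
    headers: dict = field(default_factory=dict)


class RequestLineParser:
    """Hypothetical sketch: consumes CRLF-terminated lines, collects finished requests."""

    def __init__(self):
        self.current = None   # request being assembled, or None between requests
        self.completed = []   # finished requests, in arrival order

    def parse(self, line: bytes):
        text = line.decode("iso-8859-1").rstrip("\r\n")
        if self.current is None:
            if not text:
                return  # tolerate a stray empty line before a request line
            # Request line: METHOD SP TARGET SP VERSION
            method, target, version = text.split(" ", 2)
            self.current = ParsedRequest(method, target, version)
        elif text == "":
            # An empty line ends the header section of the current request
            self.completed.append(self.current)
            self.current = None
        else:
            # Header field: "Name: value"; names are case-insensitive
            name, _, value = text.partition(":")
            self.current.headers[name.strip().lower()] = value.strip()


parser = RequestLineParser()
for raw_line in (b"GET / HTTP/1.1\r\n", b"Host: lala1.com\r\n", b"\r\n"):
    parser.parse(raw_line)
```

Because the lines of one connection arrive in order, the parser only ever has one request in progress at a time.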

The scenario I fear most is that a client sends two requests:

Request 1:

GET / HTTP/1.1\r\n
HOST lala1.com \r\n

Request 2:

POST /file HTTP/1.1\r\n
HOST lala2.com \r\n
\r\n
Qm9QUVM5NDMuLnEvXVN7O2E=
fDMpQjcpOlFodClgOGUzYQ==
NVgvNipmU1d3YFgtLFUhQiM=
MiZwSk0zKno9TkVxNyZFL3s=
NEhGJXZ7OGciOE8mYF5JNA==
dVlJLzpdKlUjXl4tcEpufQ==
XVgiXCdjQyckMjY/Ikt6Rw==
alksJlZ+XHFzQSYqaHlHIztt
YiRnPjdye0gvanV3ZGxaZkI=
MjgwTX0uYHw6M295RS52UDM=
YU0yQ2dQLmJUQVpCNS89PWJB
Ti10MHJBTjAqUFUlIU0sMyRN

But my server receives them in this order:

GET / HTTP/1.1\r\n
POST /file HTTP/1.1\r\n
HOST lala1.com \r\n
\r\n\r\nQm9QUVM5ND
HOST lala2.com \r\n
MuLnEvXVN7O2E=
fDMpQjcpOlFodClgOGUzYQ==
NVgvNipmU1d3YFgtLFUhQiM=
MiZwSk0zKno9TkVxNyZFL3s=
NEhGJXZ7OGciOE8mYF5JNA==
dVlJLzpdKlUjXl4tcEpufQ==
XVgiXCdjQyckMjY/Ikt6Rw==
alksJlZ+XHFzQSYqaHlHIztt
YiRnPjdye0gvanV3ZGxaZkI=
MjgwTX0uYHw6M295RS52UDM=
YU0yQ2dQLmJUQVpCNS89PWJB
Ti10MHJBTjAqUFUlIU0sMyRN
\r\n

Is this scenario possible in my case? Or does the TCP socket take care of the ordering of the data itself?

1 Answer

1

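In HTTP/1, requests and responses are serialized: multiple requests or responses are never interleaved within a single TCP connection, and responses must be returned in the order of the requests, on that same TCP connection. TCP itself hands bytes to the application in the order they were sent, so the lines of one connection cannot be shuffled as in your example; a single recv() may, however, return any portion of the stream, including part of a line or more than one request.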
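
The ordering guarantee can be checked locally. The following is a rough sketch, assuming the loopback interface is available: it opens a real TCP connection on 127.0.0.1, sends two requests in several pieces, and reads them back in deliberately small chunks. `tcp_loopback_pair` and `send_and_receive` are hypothetical helper names, and the payloads are made up for the demonstration.

```python
import socket


def tcp_loopback_pair():
    """Return (client, server_side) sockets joined by a TCP connection on localhost."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    listener.close()
    return client, server_side


def send_and_receive(pieces, recv_size=7):
    """Send each piece separately, then read everything back in small chunks."""
    client, server_side = tcp_loopback_pair()
    try:
        for piece in pieces:
            client.sendall(piece)
        client.shutdown(socket.SHUT_WR)  # signal end of stream to the reader
        received = b""
        while True:
            chunk = server_side.recv(recv_size)
            if not chunk:  # empty bytes means the peer finished sending
                break
            received += chunk
        return received
    finally:
        client.close()
        server_side.close()


pieces = [
    b"GET / HTTP/1.1\r\n",
    b"Host: lala1.com\r\n\r\n",
    b"POST /file HTTP/1.1\r\nHost: lala2.com\r\nContent-Length: 4\r\n\r\n",
    b"abcd",
]
stream = send_and_receive(pieces)
```

The chunk boundaries on the receiving side bear no relation to the pieces that were sent, but the concatenated bytes match the sent bytes exactly, in the same order.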

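HTTP/2 is different: requests and responses are split into frames, and those frames can be interleaved on the same TCP connection. Several requests and responses can therefore be in flight at once, and responses do not have to come back in request order. Your current code only supports HTTP/1, though: it makes no attempt to parse the entirely different HTTP/2 format and can only handle HTTP/1 requests.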
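
To make the difference concrete: an HTTP/2 client opens the connection with a fixed 24-byte preface, and every frame starts with a 9-byte header that carries the stream it belongs to, which is what allows interleaving. The sketch below only recognizes the preface and decodes a frame header. `looks_like_http2` and `parse_frame_header` are hypothetical helper names, and nothing here implements HTTP/2 itself.

```python
# Client connection preface that every HTTP/2 connection starts with
HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def looks_like_http2(first_bytes: bytes) -> bool:
    """True if the connection starts with the HTTP/2 client preface."""
    return first_bytes.startswith(HTTP2_PREFACE)


def parse_frame_header(header: bytes):
    """Decode a 9-byte HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")   # 24-bit payload length
    frame_type = header[3]                        # e.g. 0 = DATA, 1 = HEADERS
    flags = header[4]
    # The top bit of the last four bytes is reserved; the rest is the stream id
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, frame_type, flags, stream_id


# A DATA frame header: 5-byte payload, END_STREAM flag set, stream 3
example = parse_frame_header(bytes([0, 0, 5, 0, 1, 0, 0, 0, 3]))
```

A server that only speaks HTTP/1 could use a check like `looks_like_http2` to reject such connections early instead of feeding the preface to a line parser.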

For the details of the protocols, see the relevant standards (RFC 9112 for HTTP/1.1 and RFC 9113 for HTTP/2).
