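How can I detect a socket hangup without sending or receiving?
I'm writing a TCP server where generating the response to some requests can take 15 seconds or more. Some clients close the connection on their end if the response takes more than a few seconds.
Because generating the response is very CPU-intensive, I'd like to stop the task the instant the client closes the connection. At the moment I only find out that the client has closed the connection after I send the first packet, at which point I get various errors.
How can I detect that the peer has closed the connection without sending or receiving any data? That means that for recv, all data stays in the kernel, and for send, no data is actually transmitted.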
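7 Answers
The socket KEEPALIVE option helps detect this kind of "the connection was dropped without telling the other end" situation.
You should set the SO_KEEPALIVE option at the SOL_SOCKET level. On Linux, you can tune the timeouts per socket using TCP_KEEPIDLE (idle time in seconds before keepalive probes are sent), TCP_KEEPCNT (number of failed keepalive probes allowed before the other end is declared dead) and TCP_KEEPINTVL (interval in seconds between keepalive probes).
In Python: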
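import socket
...
# Enable keepalive probes on socket s.
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
# Start probing after 1 second of idle time.
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPIDLE, 1)
# Send a probe every second.
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPINTVL, 1)
# Declare the peer dead after 5 failed probes.
s.setsockopt(socket.SOL_TCP, socket.TCP_KEEPCNT, 5)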
netstat -tanop shows whether the socket is in keepalive mode:
tcp 0 0 127.0.0.1:6666 127.0.0.1:43746 ESTABLISHED 15242/python2.6 keepalive (0.76/0/0)
while tcpdump shows the keepalive probes:
01:07:08.143052 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683438 848683188>
01:07:08.143084 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683438 848682438>
01:07:09.143050 IP localhost.6666 > localhost.43746: . ack 1 win 2048 <nop,nop,timestamp 848683688 848683438>
01:07:09.143083 IP localhost.43746 > localhost.6666: . ack 1 win 2050 <nop,nop,timestamp 848683688 848682438>
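The options above could be wrapped in a small helper along these lines. This is only a sketch: the name enable_keepalive and its default values are assumptions for illustration, and the getattr guard is there because TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not exposed on every platform. Keepalive only makes the kernel notice a dead peer; the server still has to check the socket (for example with select) while it is busy computing.

```python
import socket


def enable_keepalive(sock, idle=1, interval=1, count=5):
    """Turn on TCP keepalive and, where supported, tune its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are platform-specific, so only set
    # the ones this build of Python actually exposes.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
```

With these defaults a silent peer would be declared dead after roughly idle + interval * count seconds, so the values are worth tuning to how quickly the work should be abandoned.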
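The select module has what you need. If you only need Linux support and have a sufficiently recent kernel, select.epoll() should give you the information you need. Most Unix systems support select.poll().
If you need cross-platform support, the standard way is to use select.select() to check whether the socket is marked as having data available to read. If it is, but recv() returns zero bytes, the other end has hung up.
I've always found Beej's Guide to Network Programming good (it is written for C, but most of it applies to standard socket operations), while the Socket Programming HOWTO has a decent Python overview.
Edit: The following is an example of a simple server that queues incoming commands but stops processing as soon as it finds that the connection has been closed at the remote end.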
import select
import socket
import time

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), 7557))
serversocket.listen(1)

# Wait for an incoming connection.
clientsocket, address = serversocket.accept()
print('Connection from', address[0])

# Control variables.
queue = []
cancelled = False

while True:
    # If nothing queued, wait for incoming request.
    if not queue:
        queue.append(clientsocket.recv(1024))

    # Receive data of length zero ==> connection closed.
    if len(queue[0]) == 0:
        break

    # Get the next request, decode it and strip the trailing newline.
    request = queue.pop(0).decode().rstrip()
    print('Starting request', request)

    # Main processing loop.
    for i in range(15):
        # Do some of the processing.
        time.sleep(1.0)

        # See if the socket is marked as having data ready.
        r, w, e = select.select((clientsocket,), (), (), 0)
        if r:
            data = clientsocket.recv(1024)

            # Length of zero ==> connection closed.
            if len(data) == 0:
                cancelled = True
                break

            # Add this request to the queue.
            queue.append(data)
            print('Queueing request', data.decode().rstrip())

    # Request was cancelled.
    if cancelled:
        print('Request cancelled.')
        break

    # Done with this request.
    print('Request finished.')

# If we got here, the connection was closed.
print('Connection closed.')
serversocket.close()
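To use it, run the script and, in another terminal, telnet to localhost, port 7557. Here is the output from an example run I did, queueing three requests but closing the connection while the third one was being processed: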
Connection from 127.0.0.1
Starting request 1
Queueing request 2
Queueing request 3
Request finished.
Starting request 2
Request finished.
Starting request 3
Request cancelled.
Connection closed.
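The check inside the processing loop above consumes whatever it reads. If pending data should stay in the kernel, as the question asks, one possible variation is to peek at the receive buffer instead of reading it. This is a sketch rather than part of the original answer, and the helper name peer_has_closed is hypothetical. One limitation to keep in mind: if the client sent more data before closing, the peek sees that data rather than the end of the stream, so the close stays hidden until the data has been drained.

```python
import select
import socket


def peer_has_closed(sock):
    """Return True if the peer has closed and no unread data is pending."""
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing to read, so no close notification yet
    try:
        # MSG_PEEK inspects the receive buffer without consuming it.
        data = sock.recv(1, socket.MSG_PEEK)
    except OSError:
        return True  # connection reset, timed out or otherwise failed
    return len(data) == 0  # zero bytes ==> orderly close by the peer
```

Called once per iteration of a long-running loop, this costs one non-blocking select and at most one one-byte peek.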
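epoll alternative
Another edit: I've worked up another example using select.epoll to monitor events. I don't think it offers much over the original example, as I cannot see a way to receive an event when the remote end hangs up. You still have to monitor the data-received event and check for zero-length messages (I'd love to be proved wrong on this).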
import select
import socket
import time

port = 7557

# Create the server.
serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serversocket.bind((socket.gethostname(), port))
serversocket.listen(1)
serverfd = serversocket.fileno()
print("Listening on", socket.gethostname(), "port", port)

# Make the socket non-blocking.
serversocket.setblocking(0)

# Initialise the list of clients.
clients = {}

# Create an epoll object and register our interest in read events on the server
# socket.
ep = select.epoll()
ep.register(serverfd, select.EPOLLIN)

while True:
    # Check for events.
    events = ep.poll(0)
    for fd, event in events:
        # New connection to server.
        if fd == serverfd and event & select.EPOLLIN:
            # Accept the connection.
            connection, address = serversocket.accept()
            connection.setblocking(0)

            # We want input notifications.
            ep.register(connection.fileno(), select.EPOLLIN)

            # Store some information about this client.
            clients[connection.fileno()] = {
                'delay': 0.0,
                'input': b"",
                'response': b"",
                'connection': connection,
                'address': address,
            }

            # Done.
            print("Accepted connection from", address)

        # A socket was closed on our end.
        elif event & select.EPOLLHUP:
            print("Closed connection to", clients[fd]['address'])
            ep.unregister(fd)
            del clients[fd]

        # Error on a connection.
        elif event & select.EPOLLERR:
            print("Error on connection to", clients[fd]['address'])
            ep.modify(fd, 0)
            clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

        # Incoming data.
        elif event & select.EPOLLIN:
            print("Incoming data from", clients[fd]['address'])
            data = clients[fd]['connection'].recv(1024)

            # Zero length = remote closure.
            if not data:
                print("Remote close on", clients[fd]['address'])
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

            # Store the input.
            else:
                print(data)
                clients[fd]['input'] += data

        # Run when the client is ready to accept some output. The processing
        # loop registers for this event when the response is complete.
        elif event & select.EPOLLOUT:
            print("Sending output to", clients[fd]['address'])

            # Write as much as we can.
            written = clients[fd]['connection'].send(clients[fd]['response'])

            # Delete what we have already written from the complete response.
            clients[fd]['response'] = clients[fd]['response'][written:]

            # When all of the response is written, shut the connection.
            if not clients[fd]['response']:
                ep.modify(fd, 0)
                clients[fd]['connection'].shutdown(socket.SHUT_RDWR)

    # Processing loop.
    for client in clients:
        clients[client]['delay'] += 0.1

        # When the 'processing' has finished.
        if clients[client]['delay'] >= 15.0:
            # Reverse the input to form the response.
            clients[client]['response'] = clients[client]['input'][::-1]

            # Register for the ready-to-send event. The network loop uses this
            # as the signal to send the response.
            ep.modify(client, select.EPOLLOUT)

    # Processing delay.
    time.sleep(0.1)
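Note: This only detects proper shutdowns. If the remote end just stops listening without sending the proper messages, you won't know until you try to write and get an error. Checking for that is left as an exercise for the reader. Also, you probably want to add some error checking around the overall loop so that the server itself shuts down gracefully if something breaks inside it.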
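On the question of a dedicated hang-up event: Linux epoll defines EPOLLRDHUP, exposed in Python as select.EPOLLRDHUP, which is reported when the peer closes the connection or shuts down its writing half. The following is a minimal sketch under that assumption and is not part of the examples above; the function name wait_for_peer_close is hypothetical, and the code is Linux-only. Because EPOLLIN is not registered, queued request data stays in the kernel and does not trigger the event. Like the examples above, it still only notices a close that actually reaches the server.

```python
import select
import socket


def wait_for_peer_close(sock, timeout):
    """Return True if the peer closed or half-closed within `timeout` seconds.

    Linux-only: relies on select.epoll and the EPOLLRDHUP event.
    """
    ep = select.epoll()
    try:
        # Only ask for the hang-up event, not for readable data.
        ep.register(sock.fileno(), select.EPOLLRDHUP)
        for _fd, event in ep.poll(timeout):
            # EPOLLHUP and EPOLLERR are always reported, even if not requested.
            if event & (select.EPOLLRDHUP | select.EPOLLHUP | select.EPOLLERR):
                return True
        return False
    finally:
        ep.close()
```

With a timeout of 0 this becomes a non-blocking check that a CPU-bound loop could call between work steps.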
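I've had a recurring problem communicating with equipment that used separate TCP connections for send and receive. The basic problem is that the TCP stack doesn't generally tell you a socket is closed when you're just trying to read; you have to try to write to find out that the other end of the link was dropped. Partly, that is just how TCP was designed (reading is passive).
I'm guessing Blair's answer works in the cases where the socket has been shut down nicely at the other end (i.e. they have sent the proper disconnection messages), but not in the case where the other end has impolitely just stopped listening.
Is there a fairly fixed-format header at the start of your message that you can send before the whole response is ready, e.g. an XML doctype? Also, are you able to get away with sending some extra spaces at some points in the message, just some null data that you can output to be sure the socket is still open?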
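The padding idea might look something like the sketch below. It assumes the protocol tolerates filler bytes (a single space here) before or between parts of the response; send_padding, generate_response and the work_steps list of callables are hypothetical names, not anything from the question. Note that on a real TCP connection the first write after the peer disappears may still succeed, with the error only showing up on a later write.

```python
import socket


def send_padding(sock, padding=b" "):
    """Send filler bytes; return False if the connection is known to be dead."""
    try:
        sock.sendall(padding)
    except OSError:
        return False
    return True


def generate_response(sock, work_steps):
    """Run hypothetical CPU-bound steps, probing the socket before each one."""
    parts = []
    for step in work_steps:
        if not send_padding(sock):
            return None  # client is gone: abandon the remaining work
        parts.append(step())
    return b"".join(parts)
```

The caller would send the returned bytes as the real response, or drop the request when None comes back.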