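Yet another "Connection reset by peer" error
I'm building a server/client application in Python with the socket module, and for some reason my server keeps dropping the connection. The odd part is that it works fine on Windows but not on Linux. I've looked everywhere for a solution, and none of them worked. Below is a stripped-down version of the code that shows the problem more clearly, although most of the time it does not work. Hopefully this is enough information. Thanks!
Server: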
import logging
import socket
import threading
import time

def getData():
    HOST = "localhost"
    PORT = 5454
    while True:
        s = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
        s.setsockopt( socket.SOL_SOCKET, socket.SO_REUSEADDR, 1 ) #because linux doesn't like reusing addresses by default
        s.bind( ( HOST, PORT ) )
        logging.debug( "Server listens" )
        s.listen( 5 )
        conn, addr = s.accept()
        logging.debug( "Client connects" )
        print "Connected by,", addr
        dataRequest = conn.recv( 1024 )
        logging.debug( "Server received message" )
        time.sleep( .01 ) #usually won't have to sample this fast
        data = """Here is some data that is approximately the length
of the data that I am sending in my real server. It is a string that
doesn't contain any unordinary characters except for maybe a tab."""
        if not timeThread.isAlive(): #lets client know test is over
            data = "\t".join( [ data, "Terminate" ] )
            conn.send( data )
            s.close()
            print "Finished"
            print "Press Ctrl-C to quit"
            break
        else:
            logging.debug( "Server sends data back to client" )
            conn.send( data )
        logging.debug( "Server closes socket" )
        s.close()

def timer( t ):
    start = time.time()
    while ( time.time() - start ) < t:
        time.sleep( .4 )
        #sets flag for another thread not here

def main():
    global timeThread
    logging.basicConfig( filename="test.log", level=logging.DEBUG )
    #time script runs for
    t = 10 #usually much longer (hours)
    timeThread = threading.Thread( target=timer, args=( t, ) )
    dataThread = threading.Thread( target=getData, args=() )
    timeThread.start()
    dataThread.start()
    #just for testing so I can quit threads when sockets break
    while True:
        time.sleep( .1 )
    timeThread.join()
    dataThread.join()

if __name__ == "__main__":
    main()
Client:
import logging
import socket

def getData():
    dataList = []
    termStr = "Terminate"
    data = sendDataRequest()
    while termStr not in data:
        dataList.append( data )
        data = sendDataRequest()
    dataList.append( data[ :-len( termStr )-1 ] )

def sendDataRequest():
    HOST = "localhost"
    PORT = 5454
    s = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    while True:
        try:
            s.connect( ( HOST, PORT ) )
            break
        except socket.error:
            print "Connecting to server..."
    logging.debug( "Client sending message" )
    s.send( "Hey buddy, I need some data" ) #approximate length
    try:
        logging.debug( "Client starts reading from socket" )
        data = s.recv( 1024 )
        logging.debug( "Client done reading" )
    except socket.error, e:
        logging.debug( "Client throws error: %s", e )
    print data
    logging.debug( "Client closes socket" )
    s.close()
    return data

def main():
    logging.basicConfig( filename="test.log", level=logging.DEBUG )
    getData()

if __name__ == "__main__":
    main()
Edit: added the traceback
Traceback (most recent call last):
  File "client.py", line 39, in <module>
    main()
  File "client.py", line 36, in main
    getData()
  File "client.py", line 10, in getData
    data = sendDataRequest()
  File "client.py", line 28, in sendDataRequest
    data = s.recv( 1024 )
socket.error: [Errno 104] Connection reset by peer
Edit: added debug output
DEBUG:root:Server listens
DEBUG:root:Client sending message
DEBUG:root:Client connects
DEBUG:root:Client starts reading from socket
DEBUG:root:Server received message
DEBUG:root:Server sends data back to client
DEBUG:root:Server closes socket
DEBUG:root:Client done reading
DEBUG:root:Server listens
DEBUG:root:Client sending message
DEBUG:root:Client connects
DEBUG:root:Client starts reading from socket
DEBUG:root:Server received message
DEBUG:root:Server sends data back to client
DEBUG:root:Client done reading
DEBUG:root:Client sending message
DEBUG:root:Client starts reading from socket
DEBUG:root:Server closes socket
DEBUG:root:Client throws error: [Errno 104] Connection reset by peer
DEBUG:root:Server listens
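Tom's theory looks correct. I'll try to work out a better way to close the connection.
This isn't solved yet, but the accepted answer seems to point at the problem.
Edit: I tried Tom's getData() function, but it looks like the server is still closing the connection too early. It should be reproducible, since I couldn't get it to work on Windows either.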
Server output/traceback:
Connected by, ('127.0.0.1', 51953)
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "server.py", line 15, in getData
    s.bind( ( HOST, PORT ) )
  File "<string>", line 1, in bind
error: [Errno 22] Invalid argument
Client output/traceback:
Here is some data that is approximately the length
of the data that I am sending in my real server. It is a string that
doesn't contain any unordinary characters except for maybe a tab.
Traceback (most recent call last):
  File "client.py", line 49, in <module>
    main()
  File "client.py", line 46, in main
    getData()
  File "client.py", line 11, in getData
    data = sendDataRequest()
  File "client.py", line 37, in sendDataRequest
    print data
UnboundLocalError: local variable 'data' referenced before assignment
Log:
DEBUG:root:Server listens
DEBUG:root:Client sending message
DEBUG:root:Client connects
DEBUG:root:Client starts reading from socket
DEBUG:root:Server received message
DEBUG:root:Server sends data back to client
DEBUG:root:Server closes connection
DEBUG:root:Client done reading
DEBUG:root:Client closes socket
DEBUG:root:Client sending message
DEBUG:root:Client starts reading from socket
DEBUG:root:Client throws error: [Errno 104] Connection reset by peer
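Update: I used Tom's getData() function but moved s.bind() to before the loop, and that got it working. Honestly, I don't know why this works, so it would be great if somebody could explain why it is safe for the server to close the client's socket but not the server's own socket. Thanks!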
3 Answers
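I ran into a similar problem before: the connection was reset by the peer while I was sending data. It turned out that an exception was being raised somewhere on the receiving side. When a script ends unexpectedly, the operating system resets its open connections. This is a fairly old thread, but if you hit a similar problem involving threads, my advice is to make sure everything works single-threaded before making things more complicated.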
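I don't know this problem well, but while researching a possibly related issue (intermittent "Connection reset by peer" errors on Linux, while everything runs fine on Windows), I found this link: http://scie.nti.st/2008/3/14/amazon-s3-and-connection-reset-by-peer/. There, our debugging hero Garry Dolley concluded in 2008:
"Linux kernels 2.6.17 and later increased the maximum size of the TCP window/buffers, and this causes problems for some devices: if they cannot handle sufficiently large TCP windows, they reset the connection, and what we see is the 'Connection reset by peer' message."
He offers a fix that involves the /etc/sysctl.conf file. I haven't tried it myself, but it may be worth a look?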
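I can't reproduce this on Windows 7 64-bit with Python 2.7, but my guess is that the following is happening:
- The server listens for a connection
- The client connects
- The client sends "Hey buddy, I need some data"
- The server receives the request
- The server sends the data back to the client
- The server closes the connection
- The client tries to read from the connection and finds that it has been closed
- The client raises the "Connection reset by peer" error.
The client traceback you provided seems to support this guess. Is it possible to show, with some extra logging, that this is not what is happening?
A few other things to note:
If your client doesn't find the termination string in the first chunk of data it receives, it opens a new connection to the server. I don't think that is right: you should keep reading from the same connection until you have read all of the data.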
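To make the "keep reading from the same connection" point concrete, here is a minimal sketch. It is written in Python 3 (the code in this thread is Python 2), and the names recv_until, TERMINATOR and demo are made up for illustration; nothing here is taken from the asker's real application. It assumes the server marks the end of a reply with the same "Terminate" string used in the question.

```python
import socket

TERMINATOR = b"Terminate"  # assumed end-of-reply marker, as in the question


def recv_until(conn, terminator=TERMINATOR, bufsize=1024):
    """Read from ONE connection until the terminator arrives or the peer closes."""
    received = bytearray()
    while terminator not in received:
        chunk = conn.recv(bufsize)
        if not chunk:  # empty bytes: the peer closed the connection
            break
        received.extend(chunk)
    return bytes(received)


def demo():
    # socketpair() returns two already-connected sockets, so no port is needed.
    server_side, client_side = socket.socketpair()
    with server_side, client_side:
        # The reply is written in two pieces; TCP may deliver it in several reads.
        server_side.sendall(b"first half of the data, ")
        server_side.sendall(b"second half\tTerminate")
        return recv_until(client_side)
```

Calling demo() returns the whole reply regardless of how many recv() calls it took, because the loop stops on the marker rather than after the first read.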
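Addendum: a few more points:
You didn't update the code along with your sample log output, so I can't tell which line each log entry comes from. However, it looks as though you may have two clients running at the same time (perhaps in different processes or threads?), which leads to: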
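One last point I just noticed. In the example at https://docs.python.org/2/library/socket.html#example, the server does not close the listening socket; it closes the connection returned by accept(). Could it be that two clients are connected to the same server socket instance, so that closing the server socket actually disconnects both connected clients rather than just the first one? If you are running multiple clients, logging some identifying information, such as DEBUG:root:Client(6) done reading, may help.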
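One possible way to get an identity into every log line is to give each client its own named logger. This is only a sketch in Python 3; make_client_logger and demo are hypothetical names, and the asker's real code may be organised differently. If the clients are separate processes or threads, the standard LogRecord attributes %(process)d and %(threadName)s in the format string are another option.

```python
import io
import logging


def make_client_logger(client_id, stream):
    """Return a logger whose name carries a client identifier."""
    logger = logging.getLogger("client.%s" % client_id)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    logger.addHandler(handler)
    return logger


def demo():
    # An in-memory stream stands in for the shared test.log file.
    stream = io.StringIO()
    make_client_logger(6, stream).debug("done reading")
    make_client_logger(7, stream).debug("done reading")
    return stream.getvalue()
```

With this, two clients writing to the same log can be told apart line by line.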
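Can you try the following code for the main loop of the server's data thread? It will show whether the problem is related to closing the listening socket rather than the connected socket: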
def getData():
    HOST = "localhost"
    PORT = 5454
    s = socket.socket( socket.AF_INET, socket.SOCK_STREAM )
    # s.setsockopt( socket.SOL_SOCKET, socket.SO_REUSEADDR, 1 ) #because linux doesn't like reusing addresses by default
    s.bind( ( HOST, PORT ) )
    logging.debug( "Server listens" )
    s.listen( 5 )
    while True:
        conn, addr = s.accept()
        logging.debug( "Client connects" )
        print "Connected by,", addr
        dataRequest = conn.recv( 1024 )
        logging.debug( "Server received message" )
        time.sleep( .01 ) #usually won't have to sample this fast
        data = """Here is some data that is approximately the length
of the data that I am sending in my real server. It is a string that
doesn't contain any unordinary characters except for maybe a tab."""
        if not timeThread.isAlive(): #lets client know test is over
            data = "\t".join( [ data, "Terminate" ] )
            conn.send( data )
            conn.close()
            print "Finished"
            print "Press Ctrl-C to quit"
            break
        else:
            logging.debug( "Server sends data back to client" )
            conn.send( data )
        logging.debug( "Server closes connection" )
        conn.close()
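
For anyone wanting to try the same pattern end to end, below is a small self-contained sketch in Python 3 (the code above is Python 2). The names serve, request, demo and REPLY are invented for the example, and port 0 is used so the OS picks a free port instead of 5454. A plausible explanation of why this shape helps, which nobody in this thread has confirmed: with listen( 5 ) the kernel can complete and queue the client's next connection before accept() is called, and closing the listening socket throws away anything still queued, which the client may then see as a reset. Closing only conn leaves that queue alone.

```python
import socket
import threading

REPLY = b"some data"  # stand-in for the real payload


def serve(listener, max_requests):
    """Answer max_requests clients, closing only the per-client socket."""
    for _ in range(max_requests):
        conn, _addr = listener.accept()
        with conn:  # closes conn on exit; the listening socket stays open
            conn.recv(1024)
            conn.sendall(REPLY)


def request(port):
    """Open a fresh connection, send one request, return the reply."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(b"Hey buddy, I need some data")
        return s.recv(1024)


def demo(n=5):
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # bind and listen once, before the loop
    listener.listen(5)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve, args=(listener, n), daemon=True)
    worker.start()
    try:
        return [request(port) for _ in range(n)]
    finally:
        worker.join(timeout=5)
        listener.close()  # closed only after every client has been served
```

Here demo() makes several back-to-back requests against one listening socket, which is the situation where the original server failed on its second request.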