When you make a simple TCP connection to a remote server (e.g. with telnet), your client won't normally notice when the connection is unexpectedly severed on the remote side. For example, if someone unplugged the network cable from the server you're connected to, the client would not notice. It would simply look like nothing is being sent.
You can detect remote connection loss by configuring your client socket to send TCP keepalive signals after some period of inactivity. If those signals are not acknowledged by the other side, your client will terminate the connection.
TCP keepalives must be supported by both the client OS and the server OS for this to work (Linux supports them out of the box). They are off by default for each socket, so you also need to enable them explicitly in the userland application that holds the socket.
Understanding keepalive settings
To look at the system-wide keepalive defaults of your Linux machine, run the following three commands:
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
9
This roughly translates to: After 7200 seconds (2 hours) of inactivity, up to 9 keepalive signals ("probes") are sent, 75 seconds apart. If none of these 9 keepalives are acknowledged by the other side, a connection loss is assumed and the connection is terminated.
These settings mean you would notice a dead TCP connection only after 7200 + 9 * 75 = 7875 seconds (a little over 2 hours). You probably want to reduce these numbers significantly in order to detect dead connections in a timely fashion. You can do this per socket in your application code, so there is no need to reconfigure your Linux system.
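The arithmetic above can be written as a small Ruby sketch. The method name `keepalive_detection_time` is made up for illustration. It assumes the simple model described here: one idle period followed by a fixed number of unanswered probes.

```ruby
# Worst-case number of seconds until an idle, dead connection is detected:
# the idle time, plus one interval for every unanswered probe.
# (Hypothetical helper name, not part of any library.)
def keepalive_detection_time(idle, interval, probes)
  idle + probes * interval
end

# Linux defaults shown above
puts keepalive_detection_time(7200, 75, 9)

# Values set via TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT in the Ruby examples
puts keepalive_detection_time(50, 10, 5)
```

With the defaults this gives 7875 seconds. With an idle time of 50 seconds, an interval of 10 seconds and 5 probes it gives 100 seconds.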
Example: Simple Ruby TCP socket with keepalives
require 'socket'
s = TCPSocket.new('10.11.12.13', 1234)
s.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
s.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPIDLE, 50)
s.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPINTVL, 10)
s.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPCNT, 5)
# Print out whatever is sent from the other side
while line = s.gets
  puts line
end
This will start sending keepalive probes after 50 seconds of inactivity: up to 5 probes, 10 seconds apart. If none of these probes are answered, s.gets will raise Errno::ETIMEDOUT.
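To check that the options were actually applied, you can read them back with getsockopt. The following is a minimal sketch, assuming Linux (where `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are defined). The helper name `enable_keepalive` is hypothetical, and the throwaway local server exists only so the example has something to connect to.

```ruby
require 'socket'

# Hypothetical helper: enables keepalives on a socket and returns the
# settings the kernel reports back. Assumes Linux socket options.
def enable_keepalive(socket, idle:, interval:, count:)
  socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
  socket.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPIDLE, idle)
  socket.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPINTVL, interval)
  socket.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPCNT, count)

  {
    enabled:  socket.getsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE).bool,
    idle:     socket.getsockopt(Socket::SOL_TCP, Socket::TCP_KEEPIDLE).int,
    interval: socket.getsockopt(Socket::SOL_TCP, Socket::TCP_KEEPINTVL).int,
    count:    socket.getsockopt(Socket::SOL_TCP, Socket::TCP_KEEPCNT).int
  }
end

# Throwaway local server on a random free port, for demonstration only
server = TCPServer.new('127.0.0.1', 0)
client = TCPSocket.new('127.0.0.1', server.addr[1])

settings = enable_keepalive(client, idle: 50, interval: 10, count: 5)
p settings

client.close
server.close
```

In a real client you would call the helper on the socket connected to the remote server and rescue Errno::ETIMEDOUT around the read loop.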
Example: EventMachine TCP client with keepalives
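require 'socket'
require 'eventmachine'

class Echo < EventMachine::Connection
  # Called once the connection object is set up: enable keepalives here
  def post_init
    set_sock_opt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
    set_sock_opt(Socket::SOL_TCP, Socket::TCP_KEEPIDLE, 50)
    set_sock_opt(Socket::SOL_TCP, Socket::TCP_KEEPINTVL, 10)
    set_sock_opt(Socket::SOL_TCP, Socket::TCP_KEEPCNT, 5)
  end

  # Print out whatever is sent from the other side
  def receive_data(data)
    puts data
  end

  # Called when the connection is closed, including on keepalive timeout
  def unbind
    puts "Connection terminated"
  end
end

EventMachine.run {
  EventMachine::connect('10.11.12.13', 1234, Echo)
}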
This will start sending keepalive probes after 50 seconds of inactivity: up to 5 probes, 10 seconds apart. If none of these probes are answered, #unbind will be called.
Note that there is a bug in the pure Ruby implementation of EventMachine (EM.library_type = :pure_ruby) where Errno::ETIMEDOUT crashes the reactor instead of calling #unbind.