Sending TCP keepalives in Ruby

Henning Koch, Software engineer at makandra GmbH
October 20, 2015

When you make a simple TCP connection to a remote server (like telnet), your client won't normally notice when the connection is unexpectedly severed on the remote side. For example, if someone disconnected the network cable from the server you're connected to, no client would notice. It would simply look like nothing is being sent.

You can detect remote connection loss by configuring your client socket to send TCP keepalive signals after some period of inactivity. If those signals are not acknowledged by the other side, your client will terminate the connection.

TCP keepalives must be supported by the client OS and the server OS for this to work (keepalives are usually enabled for Linux boxes). You will also need to manually enable keepalives in the userland application that is holding the socket.

Understanding keepalive settings

To look at the keepalive settings of your Linux system, you can run the following three commands:

$ cat /proc/sys/net/ipv4/tcp_keepalive_time
7200

$ cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75

$ cat /proc/sys/net/ipv4/tcp_keepalive_probes
9

This roughly translates to: After 7200 seconds (2 hours) of inactivity, up to 9 keepalive signals ("probes") are sent, 75 seconds apart. If none of these 9 keepalives are acknowledged by the other side, a connection loss is assumed and the connection is terminated.

These settings mean you would notice a dead TCP connection only after 7200 + 9 * 75 = 7875 seconds (a little over 2 hours and 11 minutes). You probably want to reduce these numbers significantly in order to detect dead connections in a timely fashion. You can do this per socket in your application code, so there is no need to reconfigure your Linux system.
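The arithmetic above can be sketched in Ruby. The helper names `keepalive_detection_time` and `read_kernel_keepalive_settings` are made up for this illustration. The /proc paths are the ones shown above and only exist on Linux, so the sketch falls back to the values quoted in this card when they are missing.

```ruby
# Hypothetical helper: worst-case number of seconds until a dead
# connection is detected (idle time plus all unanswered probes).
def keepalive_detection_time(idle:, interval:, probes:)
  idle + probes * interval
end

# Hypothetical helper: reads the three kernel defaults shown above.
# Returns nil when the files are not readable (e.g. not on Linux).
def read_kernel_keepalive_settings(dir = '/proc/sys/net/ipv4')
  names = {
    idle: 'tcp_keepalive_time',
    interval: 'tcp_keepalive_intvl',
    probes: 'tcp_keepalive_probes'
  }
  paths = names.transform_values { |name| File.join(dir, name) }
  return nil unless paths.values.all? { |path| File.readable?(path) }

  paths.transform_values { |path| Integer(File.read(path).strip) }
end

# Assumption: fall back to the defaults quoted in this card.
settings = read_kernel_keepalive_settings || { idle: 7200, interval: 75, probes: 9 }
puts "Dead connections are detected after about #{keepalive_detection_time(**settings)} seconds"
```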

Example: Simple Ruby TCP socket with keepalives

require 'socket'

s = TCPSocket.new('10.11.12.13', 1234)

s.setsockopt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
s.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPIDLE, 50)
s.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPINTVL, 10)
s.setsockopt(Socket::SOL_TCP, Socket::TCP_KEEPCNT, 5)

# Print out whatever is sent from the other side
while line = s.gets
  puts line
end

This will start sending up to 5 keepalive probes after 50 seconds of inactivity, 10 seconds apart. If none of these probes are answered, s.gets will raise Errno::ETIMEDOUT.
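Since a dead connection surfaces as an exception, a client will usually want to rescue it. Here is a minimal sketch, assuming you only need to tell a clean close from a keepalive timeout. `read_lines` is a hypothetical helper name, not part of Ruby's socket API:

```ruby
require 'socket'

# Hypothetical helper: yields each line read from an IO-like object.
# Returns :closed when the other side closes the connection (gets returns nil)
# and :timed_out when gets raises Errno::ETIMEDOUT.
def read_lines(io)
  while (line = io.gets)
    yield line
  end
  :closed
rescue Errno::ETIMEDOUT
  :timed_out
end
```

With a keepalive-enabled socket `s`, calling `read_lines(s) { |line| puts line }` would return :timed_out once the probes go unanswered, at which point the caller can reconnect or report the failure.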

Example: EventMachine TCP client with keepalives

require 'socket'
require 'eventmachine'

class Echo < EventMachine::Connection

  def post_init
    set_sock_opt(Socket::SOL_SOCKET, Socket::SO_KEEPALIVE, true)
    set_sock_opt(Socket::SOL_TCP, Socket::TCP_KEEPIDLE, 50)
    set_sock_opt(Socket::SOL_TCP, Socket::TCP_KEEPINTVL, 10)
    set_sock_opt(Socket::SOL_TCP, Socket::TCP_KEEPCNT, 5)
  end

  def receive_data(data)
    puts data
  end
  
  def unbind
    puts "Connection terminated"
  end
  
end

EventMachine.run {
  EventMachine::connect('10.11.12.13', 1234, Echo)
}

This will start sending up to 5 keepalive probes after 50 seconds of inactivity, 10 seconds apart. If none of these probes are answered, #unbind will be called.

Note that there is a bug in the pure Ruby implementation of EventMachine (EM.library_type = :pure_ruby) where Errno::ETIMEDOUT crashes the reactor instead of calling #unbind.

Posted by Henning Koch to makandra dev (2015-10-20 10:06)