[Opendnssec-develop] Sessions with network HSM:s

Tue Nov 16 08:07:10 UTC 2010

Hello,

> SIDN is having some problems with their HSM, because it closes a session if it has been idle for too long time. E.g. key generation every third month.

Linux can send keepalives on TCP connections, tuned through sysctl, but
even in a world dominated by NAT it may not be wise to assume in general
that this is available.

net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75

There is also an SO_KEEPALIVE socket option, according to
http://www.mkssoftware.com/docs/man3/setsockopt.3.asp
and/or an option in <netinet/tcp.h> looking like
#define TCP_KEEPIDLE     4      /* Start keeplives after this period */
but I don't know how general these are.  It might be improper to
dictate the user's choice of OS/HSM combination.
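
For illustration, a minimal sketch of turning keepalives on for a single
socket rather than system-wide. It assumes the code actually holds the
socket descriptor, which may not be the case when a vendor PKCS #11
library opens the connection itself. The function name enable_keepalive
and its parameters are made up for this example; the TCP_KEEP* options
are guarded with #ifdef because they are not available on every OS.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enable TCP keepalive on one socket.
 * idle_secs  - idle time before the first probe (like tcp_keepalive_time)
 * intvl_secs - time between probes              (like tcp_keepalive_intvl)
 * probes     - unanswered probes before giving up (like tcp_keepalive_probes)
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
    int on = 1;

    /* SO_KEEPALIVE is the portable switch. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;

    /* Per-socket tuning is OS specific, so only use what the headers offer. */
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof idle_secs) < 0)
        return -1;
#else
    (void)idle_secs;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
                   &intvl_secs, sizeof intvl_secs) < 0)
        return -1;
#else
    (void)intvl_secs;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
                   &probes, sizeof probes) < 0)
        return -1;
#else
    (void)probes;
#endif
    return 0;
}
```

Something like enable_keepalive(fd, 600, 60, 5) would start probing after
ten idle minutes, well below the 7200 second sysctl default shown above.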

What is the problem in reopening the connection if it died?  There
is an explicit PIN in the configuration, and if access to that has
been given up by lowering rights (I don't know if that is done yet)
then we could have the PIN agent that we discussed long ago and that
somehow has disappeared since then.  That is a general solution, and
I still think it is a proper design.  I sent example code that can
even validate that only the right programs can access the PIN, which
is possible with SysV mechanisms.

Detection of closed sockets is trivial from the error return values,
like ECONNRESET on Linux (and probably on other OSes too).
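
As a rough sketch of what I mean by detection, assuming plain recv() on a
TCP socket: the enum and the function name classify_recv are made up for
this example, and the set of errno values treated as "connection lost" is
my assumption, not an exhaustive list.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical classification of a recv() result. */
enum conn_state {
    CONN_OK,     /* data was received */
    CONN_RETRY,  /* transient condition, call recv() again */
    CONN_LOST,   /* peer is gone, reconnect */
    CONN_ERROR   /* some other failure */
};

/* n is the return value of recv(), err is errno saved right after it. */
enum conn_state classify_recv(ssize_t n, int err)
{
    if (n > 0)
        return CONN_OK;
    if (n == 0)
        return CONN_LOST;              /* orderly shutdown by the peer */

    /* n < 0: look at errno. EAGAIN and EWOULDBLOCK may be the same value,
     * so compare with if rather than switch/case. */
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK)
        return CONN_RETRY;
    if (err == ECONNRESET || err == EPIPE ||
        err == ETIMEDOUT || err == ENOTCONN)
        return CONN_LOST;
    return CONN_ERROR;
}
```

A caller would save errno immediately after recv() and hand both values
to classify_recv(), then reconnect on CONN_LOST.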

> Utimaco recommended us having a heartbeat mechanism for keeping the session alive.
> 
> Is this the correct way to go? Or should the HSM vendor make sure to implement a heartbeat mechanism in their own library?

Why not just reconnect a lost connection?  That solves such HSM problems and,
at the same time, network disruptions.  We have a redundancy layer underneath
PKCS #11 doing this for us, so I hadn't noticed this problem on our SafeNet
HSMs.
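
To make the reconnect idea concrete, here is a minimal sketch of a retry
wrapper. It is not our redundancy layer and not OpenDNSSEC code; all names
(with_reconnect, OP_SESSION_LOST, the fake_* stand-ins) are hypothetical.
In a PKCS #11 setting I would expect the operation callback to wrap a call
such as C_Sign and map return values like CKR_SESSION_CLOSED or
CKR_SESSION_HANDLE_INVALID to OP_SESSION_LOST, and the reopen callback to
call C_OpenSession and C_Login with the configured PIN, but which codes a
given vendor library returns is an assumption to check per HSM.

```c
#include <stddef.h>

#define OP_OK            0
#define OP_SESSION_LOST  1   /* hypothetical "session is gone" result */

typedef int (*op_fn)(void *ctx);      /* the real work, e.g. a signing call */
typedef int (*reopen_fn)(void *ctx);  /* re-establish session, 0 on success */

/* Run op; if it reports a lost session, reopen and try again,
 * at most max_retries times. Returns the last result of op. */
int with_reconnect(op_fn op, reopen_fn reopen, void *ctx, int max_retries)
{
    int rc = op(ctx);
    for (int attempt = 0;
         rc == OP_SESSION_LOST && attempt < max_retries;
         attempt++) {
        if (reopen(ctx) != 0)
            return rc;        /* could not reopen, report the lost session */
        rc = op(ctx);
    }
    return rc;
}

/* Stand-ins for demonstration only: a "session" that starts out dead. */
struct fake_session {
    int alive;
    int reopens;
};

int fake_op(void *ctx)
{
    struct fake_session *s = ctx;
    return s->alive ? OP_OK : OP_SESSION_LOST;
}

int fake_reopen(void *ctx)
{
    struct fake_session *s = ctx;
    s->alive = 1;
    s->reopens++;
    return 0;
}
```

With a fake_session whose alive field is 0, with_reconnect(fake_op,
fake_reopen, &s, 2) reopens once and then succeeds. The same wrapper covers
an idle timeout on the HSM and a network disruption alike.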

Cheers,
 -Rick