[Opendnssec-develop] Sessions with network HSM:s

Roland van Rijswijk Roland.vanRijswijk at surfnet.nl
Tue Nov 16 09:52:09 UTC 2010


Short answer to a long and detailed story (thanks for the info): I agree with Antoin that OpenDNSSEC should handle this error gracefully rather than just terminating the Enforcer. A back-off and retry is warranted here, also keeping in mind that some users want offline repositories (which will also lead to CKR_TOKEN_NOT_PRESENT).
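A back-off and retry around an Enforcer HSM operation could look like the minimal Python sketch below. It assumes a hypothetical `HsmError` exception that carries the PKCS#11 return value; `with_backoff` is likewise a hypothetical name, and the attempt count and delays are illustrative, not OpenDNSSEC settings. The two return-value constants are the standard PKCS#11 values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Standard PKCS#11 return values (CK_RV).
CKR_DEVICE_ERROR = 0x30
CKR_TOKEN_NOT_PRESENT = 0xE0

# Assumption: these errors are transient, i.e. the HSM may come back.
TRANSIENT = {CKR_DEVICE_ERROR, CKR_TOKEN_NOT_PRESENT}


class HsmError(Exception):
    """Hypothetical exception wrapping a PKCS#11 return value."""

    def __init__(self, rv: int) -> None:
        super().__init__(f"PKCS#11 error 0x{rv:08X}")
        self.rv = rv


def with_backoff(op: Callable[[], T], attempts: int = 5,
                 base_delay: float = 1.0, max_delay: float = 60.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run op(), retrying transient HSM errors with exponential back-off.

    Non-transient errors are re-raised at once; a transient error is
    re-raised only after the last attempt, so the caller decides what
    to do instead of the process simply dying.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except HsmError as err:
            if err.rv not in TRANSIENT or attempt == attempts:
                raise
            sleep(delay)                        # wait before retrying
            delay = min(delay * 2, max_delay)   # double the wait, capped
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

The `sleep` parameter is injectable only so the waiting can be observed in a test; in production the default `time.sleep` applies.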

Just my 2 cents ;-)

Cheers,

Roland

On 16 nov 2010, at 10:41, Antoin Verschuren wrote:

> On 16-11-10 09:07, Rick van Rein wrote:
>> 
>> What is the problem in reopening the connection if it died?  
> 
> According to the SafeNet documentation, I think this is the way to go.
> It not only solves the issue when the connection simply timed out, but
> can also recover from network glitches.
> The network glitches don't need to be between the Enforcer and the HSM.
> We use HSMs in a pool, and when one of the connections between the
> HSMs is lost, the pool will also return an error to the Enforcer,
> refusing the command, and then restore the connection between the
> HSMs using autorecovery. Reissuing the command from the Enforcer a
> little later is better than simply dying:
> 
> --- ORIGINAL DOCUMENTATION: ---
> Ch11 - HA
> 
> Implementing HA
> 
> If you use the Luna SA HA feature then the calls to the Luna SAs are
> load-balanced. The session handle that the application receives when it
> opens a session is a virtual one and is managed by the HA code in the
> library. The actual sessions with the HSM are established by the HA code
> in the library and hidden from the application and will come and go as
> necessary to fulfill application level requests.
> 
> Before the introduction of HA AutoRecovery, bringing a failed/lost group
> member back into the group (recovery) was a manual procedure.
> 
> The Administration & Maintenance section contains a general description
> of how the HA AutoRecovery function works in practice.
> 
> For every PKCS#11 call, the HA recovery logic will check to see if we need
> to perform auto recovery of a disconnected appliance. If there is a
> disconnected appliance, it will try to reconnect to that appliance
> before it proceeds with the current PKCS#11 call.
> 
> The HA recovery logic is designed in such a way that it will only try to
> reconnect to an appliance every X seconds, up to N times, where X
> and N are configurable via the "VTL" utility. The following is the
> pseudocode of the HA logic:
> 
> if (disconnected_member > 0 and recover_attempt_count < N and
>     time_now - last_recover_attempt > X) then
>   perform auto recovery
>   set last_recover_attempt equal to time_now
>   if (recovery failed) then
>      increment recover_attempt_count by 1
>   else
>      decrement disconnected_member by 1
>      reset recover_attempt_count to 0
>   end if
> end if
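The pseudocode above can be turned into a runnable Python sketch. `HaRecovery`, `before_call` and the `reconnect` callback are hypothetical names, not part of the Luna client library, and the starting value of `last_recover_attempt` (minus infinity, so the first check may run at once) is an assumption the vendor text does not specify.

```python
import time
from typing import Callable


class HaRecovery:
    """Minimal sketch of the throttled HA auto-recovery check.

    interval:  X, minimum seconds between recovery attempts
    max_tries: N, stop trying after this many consecutive failures
    reconnect: hypothetical callback that tries to bring one
               disconnected member back and returns True on success
    clock:     time source, injectable so the logic can be tested
    """

    def __init__(self, interval: float, max_tries: int,
                 reconnect: Callable[[], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval
        self.max_tries = max_tries
        self.reconnect = reconnect
        self.clock = clock
        self.disconnected_members = 0
        self.recover_attempt_count = 0
        # Assumption: no earlier attempt, so the first check may run at once.
        self.last_recover_attempt = float("-inf")

    def before_call(self) -> None:
        """Hook run at the start of every PKCS#11 call."""
        now = self.clock()
        if (self.disconnected_members > 0
                and self.recover_attempt_count < self.max_tries
                and now - self.last_recover_attempt > self.interval):
            self.last_recover_attempt = now
            if self.reconnect():
                self.disconnected_members -= 1
                self.recover_attempt_count = 0
            else:
                self.recover_attempt_count += 1
```

Note that nothing happens between calls: `before_call()` runs only when the application issues a PKCS#11 call, so a member that could be recovered after X seconds stays disconnected until the next call, which is the point the documentation makes next.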
> 
> The HA auto recovery design runs within a PKCS#11 call. The
> responsiveness of recovering a disconnected member is therefore greatly
> influenced by the frequency of PKCS#11 calls from the user application.
> Although the logic shows that it will attempt to recover a disconnected
> member every X seconds, in reality it will not run until the user
> application makes the next PKCS#11 call.
> 
> How Does Your Software Know That a Member Has Failed?
> 
> When an HA Group member first fails, the HA status for the group shows
> "device error" for the failed member. All subsequent calls return "token
> not present", until the member (HSM Partition or PKI token) is returned
> to service.
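For the application, both codes signal a failed HA member that may return to service. A minimal Python sketch of how a caller such as the Enforcer might sort PKCS#11 return values; the three-way `Outcome` grouping and the `classify` name are assumptions for illustration, not documented SafeNet or OpenDNSSEC behaviour. The constants are the standard PKCS#11 values.

```python
from enum import Enum

# Standard PKCS#11 return values (CK_RV).
CKR_OK = 0x00
CKR_DEVICE_ERROR = 0x30
CKR_TOKEN_NOT_PRESENT = 0xE0

# Per the quoted text: the first failure shows CKR_DEVICE_ERROR, later
# calls return CKR_TOKEN_NOT_PRESENT until the member is back in service.
MEMBER_FAILED = {CKR_DEVICE_ERROR, CKR_TOKEN_NOT_PRESENT}


class Outcome(Enum):
    OK = "ok"
    RETRY_LATER = "retry later"  # member failed; it may come back
    FATAL = "fatal"              # anything else: do not retry blindly


def classify(rv: int) -> Outcome:
    """Map a PKCS#11 return value to what the caller should do next."""
    if rv == CKR_OK:
        return Outcome.OK
    if rv in MEMBER_FAILED:
        return Outcome.RETRY_LATER
    return Outcome.FATAL
```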
> 
> Here is an example of two such calls using CKDemo:
> 
> Enter your choice : 52
> 
> Slots available:
>   slot#1 - LunaNet Slot
>   slot#2 - LunaNet Slot
>   slot#3 - HA Virtual Card Slot
> 
> Select a slot: 3
> 
> HA group 1599447001 status:
> 
>   HSM 599447001      - CKR_DEVICE_ERROR
>   HSM 78665001       - CKR_OK
> Status: Doing great, no errors (CKR_OK)
> 
> <SNAP>
> 
> Enter your choice : 52
> 
> Slots available:
>   slot#1 - LunaNet Slot
>   slot#2 - LunaNet Slot
>   slot#3 - HA Virtual Card Slot
> 
> Select a slot: 3
> 
> HA group 1599447001 status:
> 
>   HSM 599447001      - CKR_TOKEN_NOT_PRESENT
>   HSM 78665001       - CKR_OK
> Status: Doing great, no errors (CKR_OK)
> --- End of ORIGINAL DOCUMENTATION ---
> 
> 
>> Why not just reconnect a lost connection?  That solves such HSM problems and,
>> at the same time, network disruptions.  We have a redundancy layer underneath
>> PKCS #11 doing this for us, so I hadn't noticed this problem on our SafeNet
>> HSMs.
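Rick's suggestion, reopening a dead connection and reissuing the command, can be sketched in Python as below. `Connection.reopen()`, `HsmError` and `call_with_reconnect` are hypothetical names; in PKCS#11 terms, reopening would typically mean discarding the stale session and calling `C_OpenSession` and `C_Login` again. Which return values count as a lost connection is an assumption; the constants themselves are standard PKCS#11 values.

```python
from typing import Callable, Protocol, TypeVar

T = TypeVar("T")

# Standard PKCS#11 return values (CK_RV).
CKR_DEVICE_REMOVED = 0x32
CKR_SESSION_CLOSED = 0xB0
CKR_SESSION_HANDLE_INVALID = 0xB3
CKR_TOKEN_NOT_PRESENT = 0xE0

# Assumption: these mean the session or connection is gone, not that the
# request itself was bad, so reconnecting and reissuing is worth a try.
LOST = {CKR_DEVICE_REMOVED, CKR_SESSION_CLOSED,
        CKR_SESSION_HANDLE_INVALID, CKR_TOKEN_NOT_PRESENT}


class HsmError(Exception):
    """Hypothetical exception wrapping a PKCS#11 return value."""

    def __init__(self, rv: int) -> None:
        super().__init__(f"PKCS#11 error 0x{rv:08X}")
        self.rv = rv


class Connection(Protocol):
    """Hypothetical HSM connection that can re-establish its session."""

    def reopen(self) -> None: ...


def call_with_reconnect(conn: Connection, op: Callable[[Connection], T]) -> T:
    """Run op; if the connection died, reopen it once and reissue op."""
    try:
        return op(conn)
    except HsmError as err:
        if err.rv not in LOST:
            raise          # a genuine error: reconnecting will not help
    conn.reopen()          # re-establish the session (may itself raise)
    return op(conn)        # reissue once; a second failure propagates
```

Retrying only once keeps the wrapper simple; combining it with a back-off loop would cover HSMs that take longer to come back.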
> 
> You probably don't see this often because you maintain more
> signatures. We only have one KSK that changes once every 5 years
> (apart from unscheduled rollovers) and one ZSK that changes every 3 months.
> Roland told me he had seen the CKR_TOKEN_NOT_PRESENT error once;
> that was probably also due to a lost connection caused by a network
> glitch between HSMs in a pool.
> 
> -- 
> Antoin Verschuren
> 
> Technical Policy Advisor SIDN
> Utrechtseweg 310, PO Box 5022, 6802 EA Arnhem, The Netherlands
> 
> P: +31 26 3525500  F: +31 26 3525505  M: +31 6 23368970
> mailto:antoin.verschuren at sidn.nl  xmpp:antoin at jabber.sidn.nl
> http://www.sidn.nl/


-- Roland M. van Rijswijk
-- SURFnet Middleware Services
-- t: +31-30-2305388
-- e: roland.vanrijswijk at surfnet.nl



