[Opendnssec-user] Error converting from 1.4.14 to 2.1.8

Havard Eidnes he at uninett.no
Mon Mar 8 18:40:35 UTC 2021


> 2) We seem to have issues related to SoftHSM2, which we're
> converting to at the same time.  The problem we're having is that
> the level of error messages related to SoftHSM2 is nearly binary:
> either it works or it doesn't, and not a word about "why" when it
> doesn't.  As an operator this leaves me stumped about what's
> really going on and what possible correction can be made.
>
> You might find this set of log messages in the log:
>
> Mar  4 22:28:50 signer ods-signerd: [zone] unable to publish keys for zone 0.0.1.0.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa: error creating libhsm context
> Mar  4 22:28:50 signer ods-signerd: [tools] unable to read zone 0.0.1.0.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa: failed to publish dnskeys (HSM error)
> Mar  4 22:28:50 signer ods-signerd: CRITICAL: failed to sign zone 0.0.1.0.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa: HSM error
> Mar  4 22:28:50 signer ods-signerd: back-off task [read] for zone 0.0.1.0.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa with 60 seconds

...or...

Mar  7 16:20:11 test-signer ods-signerd: [zone] publish 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa DNSKEY locator ac24cfd4f814e7e7b1ba7480cfb13633
Mar  7 16:20:11 test-signer ods-signerd: [hsm] unable to get key: key 80bb90c066608c6bc11e77cf59d89c49 not found
Mar  7 16:20:11 test-signer ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Mar  7 16:20:11 test-signer ods-signerd: [hsm] unable to get key: hsm failed to create dnskey
Mar  7 16:20:11 test-signer ods-signerd: [zone] unable to publish dnskeys for zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa: error creating dnskey
Mar  7 16:20:11 test-signer ods-signerd: [tools] unable to read zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa: failed to publish dnskeys (General error)
Mar  7 16:20:11 test-signer ods-signerd: CRITICAL: failed to sign zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa: General error
Mar  7 16:20:11 test-signer ods-signerd: [worker[3]] finished working

Looking at the output of "ods-hsmutil list" reveals that the
key-id is readily available in the SoftHSM:

ods @ test-signer: {47} ods-hsmutil list > /tmp/hsm-keys
ods @ test-signer: {48} grep 80bb90c066608c6bc11e77cf59d89c49 /tmp/hsm-keys 
SoftHSM               80bb90c066608c6bc11e77cf59d89c49  RSA/1280  
ods @ test-signer: {49} 

and there are no file permission problems with the SoftHSM database.

However...  I did find a point of correlation: if I try to run
with the Signer's WorkerThreads set to 4, it fails as above.  If
I set it to 1, lo and behold, it works just fine:

Mar  7 17:10:34 test-signer ods-signerd: [zone] publish 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa DNSKEY locator ac24cfd4f814e7e7b1ba7480cfb13633
Mar  7 17:10:40 test-signer ods-signerd: [zone] publish 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa DNSKEY locator 80bb90c066608c6bc11e77cf59d89c49
Mar  7 17:10:45 test-signer ods-signerd: [zone] publish 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa DNSKEY locator 0adb7b0d24c25febbfac03e07522d286
Mar  7 17:10:45 test-signer ods-signerd: [zone] NSEC3PARAM in zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa not found: skipping delete
Mar  7 17:10:45 test-signer ods-signerd: [adapter] read zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa from dns input adapter /usr/pkg/etc/opendnssec/addns.xml
Mar  7 17:10:45 test-signer ods-signerd: [file] openfile 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa.xfrd.tmp count 1
Mar  7 17:10:45 test-signer ods-signerd: [adapter] detected axfr serial=2021021604 for zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa
Mar  7 17:10:46 test-signer ods-signerd: [adapter] zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa set soa ttl to 3600
Mar  7 17:10:46 test-signer ods-signerd: [adapter] zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa set soa minimum to 900

...

Mar  7 18:51:11 test-signer ods-signerd: [tools] log stats for zone 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa serial 2021030700
Mar  7 18:51:11 test-signer ods-signerd: [STATS] 4.0.0.0.0.0.0.7.0.1.0.0.2.ip6.arpa 2021030700 RR[count=258 time=3(sec)] NSEC3[count=0 time=0(sec)] RRSIG[new=1438 reused=0 time=14(sec) avg=102(sig/sec)] TOTAL[time=6026(sec)] 

etc. etc.

Do note, we are currently using sqlite3 for the SoftHSM2
back-end, and I've also upgraded SoftHSM2 to 2.6.1.

Is this a case of

"Doctor, doctor, it hurts when i press my finger here!"
"Don't do that, then!"

??

Does SoftHSM2 with an sqlite3 backend allow for simultaneous
access from multiple worker thereads in the signer?  If "no",
which other back-ends for SoftHSM2 allows this parallelism?  Or
must this be a bug which is peculiar to the OS we're using?  In
either case, improved error reporting on the HSM errors above
would probably help in chasing down the actual reason for this
issue.


Best regards,

- Håvard


More information about the Opendnssec-user mailing list