[Opendnssec-user] OpenDNSSEC losing its marbles on restart?

Havard Eidnes he at uninett.no
Sat Jun 11 12:05:32 UTC 2022


Hi,

I had occasion to do a base OS upgrade on my long-running
OpenDNSSEC signer host.  This was a minor kernel version upgrade,
so should be uneventful.  For the OS itself, that turned out to
be true, but for OpenDNSSEC, "not so much":

Jun 11 12:24:34 signer-host ods-signerd: [zone] zone 0.39.128.in-addr.arpa set soa serial to 2022061106
Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: key e9128a1b42447b79b3ea79883fd14e71 not found
Jun 11 12:24:35 signer-host ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: hsm failed to create dnskey
Jun 11 12:24:35 signer-host ods-signerd: [zone] unable to prepare signing keys for zone 0.37.158.in-addr.arpa: error getting dnskey
Jun 11 12:24:35 signer-host ods-signerd: [worker[3]] CRITICAL: failed to sign zone 0.37.158.in-addr.arpa: General error

What?

This appears to be repeated for a lot of my other zones.

What exactly might be the cause of this error?

My /var/db/softhsm/ae4e912f-a382-8ebc-9d53-7710d137beac appears
to be intact and contains some 3531 files. (Yes, using the
"files" backend.)  About half of them are .lock files (is that
normal?)

However, there doesn't appear to exist any CLI tools in
OpenDNSSEC or SoftHSM2 which might assist me in closing in on why
exactly ods-signerd can't find this key anymore.  I've found
softhsm2-dump-file, but that utility doesn't make it possible to
identify the Key ID and compare it to what OpenDNSSEC logs.

I've done a system call trace of ods-signerd, and it appears to
open each and every .object file in the storage directory, as
well as the "generation" file, but other than that I wasn't able
to narrow in on eaxtly what the error might be.  Sigh!

The generation file contains

ods @ signer-host: {17} od -c generation 
0000000   \0  \0  \0  \0  \0  \0  \0 001                                
0000010
ods @ signer-host: {18} 


Another example:

Jun 11 13:10:55 signer-host ods-signerd: [hsm] unable to get key: key 9215be91f53b9559ba585e59c28f89a2 not found
Jun 11 13:10:55 signer-host ods-signerd: [hsm] unable to get key: hsm failed to create dnskey
Jun 11 13:10:55 signer-host ods-signerd: [worker[4]] CRITICAL: failed to sign zone 5.39.158.in-addr.arpa: General error

However, that key "CKA_ID" isn't visible in the output for
"ods-enforcer key list -v" for that zone (or in general):

ods @ signer-host: {17} ods-enforcer key list -v -z 5.39.158.in-addr.arpa
Keys:
Zone:                           Keytype: State:    Date of next transition: Size: Algorithm: CKA_ID:                          Repository: KeyTag:
5.39.158.in-addr.arpa           KSK      active    2022-06-11 15:00:02      2048  13         8e803f96e9eb78352ae987216b00edb6 SoftHSM     53679
5.39.158.in-addr.arpa           ZSK      active    2022-06-11 15:00:02      1536  13         84830d6a2a949f7e633924375b612d7d SoftHSM     61387
ods @ signer-host: {18} 
ods @ signer-host: {18} ods-enforcer key list -v | grep 9215be91f53b9559ba585e59c28f89a2
ods @ signer-host: {19} 

This zone earlier used RSA keys, but has been rolled over to use
ECDSA keys.

However, the file in
/var/opendnssec/tmp/5.39.158.in-addr.arpa.backup2 contains this
ID:

5.39.158.in-addr.arpa.backup2:;;Key: locator 9215be91f53b9559ba585e59c28f89a2 algorithm 13 flags 256 publish 0 ksk 0 zsk 0 keytag 53312

While these 'tmp' files are temporary, I suspect the "locator"
actually comes from somewhere else.  signconf, perhaps?  Yep, the
signconf/5.39.158.in-addr.arpa.xml file contains

    <Keys>
      <TTL>PT1H</TTL>
      <Key>
        <Flags>256</Flags>
        <Algorithm>13</Algorithm>
        <Locator>9215be91f53b9559ba585e59c28f89a2</Locator>
      </Key>
      <Key>
        <Flags>257</Flags>
        <Algorithm>13</Algorithm>
        <Locator>8e803f96e9eb78352ae987216b00edb6</Locator>
        <KSK/>
        <Publish/>
      </Key>
      <Key>
        <Flags>256</Flags>
        <Algorithm>13</Algorithm>
        <Locator>84830d6a2a949f7e633924375b612d7d</Locator>
        <ZSK/>
        <Publish/>
      </Key>
    </Keys>

but that doesn't jive with what "ods-enforcer key list" above
gave me(!)

So ... what might the actual root cause of this error be?

Hm, I found "ods-enforcer signconf", and did that and restarted
OpenDNSSEC, and the xml file above now contains

    <Keys>
      <TTL>PT1H</TTL>
      <Key>
        <Flags>257</Flags>
        <Algorithm>13</Algorithm>
        <Locator>8e803f96e9eb78352ae987216b00edb6</Locator>
        <KSK/>
        <Publish/>
      </Key>
      <Key>
        <Flags>256</Flags>
        <Algorithm>13</Algorithm>
        <Locator>84830d6a2a949f7e633924375b612d7d</Locator>
        <ZSK/>
        <Publish/>
      </Key>
    </Keys>

and there are no new CRITICAL log messages.

(Sigh of releif!)


So ... why does it look like OpenDNSSEC doesn't keep its key list
(via its signconf) up to date?  I didn't think "ods-enforcer
signconf" was among those things the operator needs to ensure is
performed periodically?


So ... the only new log messages on the console are

Jun 11 13:34:59 signer-host ods-signerd: [backup] bad ixfr journal: trailing RRs after final SOA
Jun 11 13:35:32 signer-host syslogd[197]: last message repeated 180 times

I'll just note that the only one who writes that ixfr journal is
OpenDNSSEC itself, so this is obviously a bug in OpenDNSSEC, in
that it either can't properly write the log or read it in again.

Additionally, the message doesn't mention which zone this error
is logged for, so it is practically useless for the operator.

...and the occasional

Jun 11 13:47:05 signer-host ods-enforcerd: [hsm_key_factory_delete_key] looking for keys to purge from HSM

Why is that logged with so high severity?  Only "err" or higher
is logged to the console (via my syslog.conf).

This is with OpenDNSSEC 2.1.8 and SoftHSM2 2.6.1.  (I see I am
a couple of versions behind the curent OpenDNSSEC.)

Regards,

- Håvard


More information about the Opendnssec-user mailing list