[Opendnssec-user] OpenDNSSEC losing its marbles on restart?
    Berry van Halderen 
    berry at nlnetlabs.nl
       
    Sun Jun 12 10:11:38 UTC 2022
    
    
  
On 2022-06-11 14:05, Havard Eidnes via Opendnssec-user wrote:
> I had occasion to do a base OS upgrade on my long-running
> OpenDNSSEC signer host.  This was a minor kernel version upgrade,
> so should be uneventful.  For the OS itself, that turned out to
> be true, but for OpenDNSSEC, "not so much":
> 
> Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: key
> e9128a1b42447b79b3ea79883fd14e71 not found
> Jun 11 12:24:35 signer-host ods-signerd: [hsm] hsm_get_dnskey(): Got 
> NULL key
> Jun 11 12:24:35 signer-host ods-signerd: [hsm] unable to get key: hsm
> failed to create dnskey
> Jun 11 12:24:35 signer-host ods-signerd: [zone] unable to prepare
> signing keys for zone 0.37.158.in-addr.arpa: error getting dnskey
> Jun 11 12:24:35 signer-host ods-signerd: [worker[3]] CRITICAL: failed
> to sign zone 0.37.158.in-addr.arpa: General error
> 
> What exactly might be the cause of this error?
These errors are generic to not being able to load the keys from the 
HSM,
so we only need to look at the OpenDNSSEC connection to SoftHSM and
downwards.
> My /var/db/softhsm/ae4e912f-a382-8ebc-9d53-7710d137beac appears
> to be intact and contains some 3531 files. (Yes, using the
> "files" backend.)  About half of them are .lock files (is that
> normal?)
Files backend is fine, and having that many files and lock files
is a pretty sane setup.  The lock files keep being present even
if there is no lock and the keys are distributed over many files.
> However, there doesn't appear to exist any CLI tools in
> OpenDNSSEC or SoftHSM2 which might assist me in closing in on why
> exactly ods-signerd can't find this key anymore.  I've found
> softhsm2-dump-file, but that utility doesn't make it possible to
> identify the Key ID and compare it to what OpenDNSSEC logs.
There isn't a specific tool for SoftHSM, though other PKCS#11
tools will work.  But there is ods-hsmutil which does not rely
on the inner workings and only used the conf.xml file and is
able to list all keys (regardless of status and whether for OpenDNSSEC
or not).  Try:
     ods-hsmutil list
Or better something like:
     ods-hsmutil list | grep 84830d6a2a949f7e633924375b612d7d
It should output a line with the key.  If it outputs a line with the
key there's something wrong with OpenDNSSEC.  If it outputs
nothing there's something wrong with either the setup/reference
to SoftHSM, a permissions problem or a deep problem in SoftHSM.
But there could also be a line stating:
   hsm_session_init(): PKCS#11 module load failed:
Indicating that the SoftHSMv2 isn't loading properly anymore.
 From the next message I don't think that is the case anymore.
> This zone earlier used RSA keys, but has been rolled over to use
> ECDSA keys.
> 
> However, the file in
> /var/opendnssec/tmp/5.39.158.in-addr.arpa.backup2 contains this
> ID:
> 
> 5.39.158.in-addr.arpa.backup2:;;Key: locator
> 9215be91f53b9559ba585e59c28f89a2 algorithm 13 flags 256 publish 0 ksk
> 0 zsk 0 keytag 53312
> 
> While these 'tmp' files are temporary, I suspect the "locator"
> actually comes from somewhere else.  signconf, perhaps?  Yep, the
> signconf/5.39.158.in-addr.arpa.xml file contains
Correct there tmp reference are based on the signconfs.
> but that doesn't jive with what "ods-enforcer key list" above
> gave me(!)
So there has been a change over time in how long which keys
are left behind as a reference in the zone.  This could lead
to issues as also tackled by issue OPENDNSSEC-956. This probably
isn't the case here (as per your next item), but ods-enforcer key
list isn't necessarily 100% in sync with the signconf.
> So ... what might the actual root cause of this error be?
> Hm, I found "ods-enforcer signconf", and did that and restarted
> OpenDNSSEC, and the xml file above now contains
> 
> and there are no new CRITICAL log messages.
> 
> (Sigh of releif!)
> 
> So ... why does it look like OpenDNSSEC doesn't keep its key list
> (via its signconf) up to date?  I didn't think "ods-enforcer
> signconf" was among those things the operator needs to ensure is
> performed periodically?
After writing my initial response I saw at the bottom that your
running OpenDNSSEC 2.1.8, whilst in version 2.1.10 there has been
a fix to an issus that suspiciously looks like the thing your
running into.  https://issues.opendnssec.org/browse/SUPPORT-265
That issue also can lead to signconf not being updated.
If the issue re-appears after upgrading, please let me know but
for now I'm assuming you have run into that issue.  It occurs
on some edge (but existing) cases where keys are being removed
while also the signconf needed updating.  This issue did not
present itself on much earlier issues due to some other changes.
> So ... the only new log messages on the console are
> 
> Jun 11 13:34:59 signer-host ods-signerd: [backup] bad ixfr journal:
> trailing RRs after final SOA
> Jun 11 13:35:32 signer-host syslogd[197]: last message repeated 180 
> times
> 
> I'll just note that the only one who writes that ixfr journal is
> OpenDNSSEC itself, so this is obviously a bug in OpenDNSSEC, in
> that it either can't properly write the log or read it in again.
> 
> Additionally, the message doesn't mention which zone this error
> is logged for, so it is practically useless for the operator.
So that will take me a bit more time to look into you, and I'd agree
with you that this isn't useful not to know which zone this references.
At least that should change, not the bad journal could be a result
from the earlier problem (still no good, but not that bad..).
> Jun 11 13:47:05 signer-host ods-enforcerd:
> [hsm_key_factory_delete_key] looking for keys to purge from HSM
> 
> Why is that logged with so high severity?  Only "err" or higher
> is logged to the console (via my syslog.conf).
I'll change that.  Probably someone did that because it is such
a irreversible thing to do, removing keys from an HSM.
> This is with OpenDNSSEC 2.1.8 and SoftHSM2 2.6.1.  (I see I am
> a couple of versions behind the curent OpenDNSSEC.)
Which distribution are you on?  I am moving to add binary packaged
to some distributions.
Best regards,
Berry van Halderen
    
    
More information about the Opendnssec-user
mailing list