[Opendnssec-user] Key and DS states

Havard Eidnes he at uninett.no
Mon Feb 17 16:50:14 UTC 2020


Hm,

my questions went unanswered?  That's slightly disappointing...

Here's a new and different formulation.

I apologize for the length and the inclusion of multiple issues in the
same e-mail message -- I don't appear to have become good friends with
OpenDNSSEC 2.x yet...


The scripts I maintain which surrounds our OpenDNSSEC installation
are primarily concerned with automating the required follow-up of KSK
rotations, which in our configuration happens once per year per zone.

I have tentatively modified the scripts I use with OpenDNSSEC 1.x to
work with OpenDNSSEC 2.x, but not put them into production yet. Here
you'll find an overall description of what they are currently set up
to do.  The question I have is basically "are some of the actions not
needed, or the conditions for the actions needs to be changed?"

1) Treat all the KSKs in "ods-enforcer key list" output which are
   in state "ready":

   a) extract the DS, query all the publishing NSes for the zone if
      the new DS is being published.  Signal "ds-seen" for those that
      do.
   b) otherwise, extract the DS, signal "ds-submit", and take actions
      to have the DS added to the parent zone.

2) Treat all the KSKs in "ods-enforcer key list" output which are
   in state "retire":

   a) extract the DS, and check whether all the publishing NSes for
      the zone have stopped publishing the DS record.  If not, signal
      "ds-retract" for the corresponding key, and initiate action to
      remove the DS record from the parent zone.
   b) If, on the other hand, the DS is no longer being published by
      the publishing NSes, signal "ds-gone" for the corresponding key.

Cursory testing seems to indicate, however, that signaling
"ds-retract" in 2a) "immediately" doesn't work the way I expect it to;
with this set of states I get:

% ods-enforcer key list --zone eduvpn.no -v
Keys:
Zone:                         Keytype: State:    Date of next transition: Size: Algorithm: CKA_ID:                          Repository: KeyTag:
eduvpn.no                     KSK      retire    waiting for ds-gone      2048  8          b368e2f95a4196769fcd53bfa8cbc6fc SoftHSM     14192
eduvpn.no                     ZSK      retire    ds-unsubmitted           1280  8          4aa28c4aa9e49e92479c5842f7c17522 SoftHSM     38604
eduvpn.no                     ZSK      active    ds-unsubmitted           1280  8          711bfccf43ee87ba89b1fb6165f8c9ab SoftHSM     19269
eduvpn.no                     KSK      ready     waiting for ds-seen      2048  8          51d709d2d5cec4e99e208ef423c4e141 SoftHSM     40393
% ods-enforcer key ds-seen --zone eduvpn.no -x 40393
1 KSK matches found.
1 KSKs changed.
% ods-enforcer key list -d --zone eduvpn.no
Keys:
Zone:                         Key role:     DS:          DNSKEY:      RRSIGDNSKEY: RRSIG:       Pub: Act: Id:
eduvpn.no                     KSK           unretentive  omnipresent  omnipresent  NA           1    1    b368e2f95a4196769fcd53bfa8cbc6fc
eduvpn.no                     ZSK           NA           hidden       NA           hidden       0    0    4aa28c4aa9e49e92479c5842f7c17522
eduvpn.no                     ZSK           NA           omnipresent  NA           omnipresent  1    1    711bfccf43ee87ba89b1fb6165f8c9ab
eduvpn.no                     KSK           rumoured     omnipresent  omnipresent  NA           1    1    51d709d2d5cec4e99e208ef423c4e141
% ods-enforcer key ds-retract --zone eduvpn.no -x 14192
0 KSK matches found.
0 KSKs changed.
% ods-enforcer key ds-retract --zone eduvpn.no -k b368e2f95a4196769fcd53bfa8cbc6fc
0 KSK matches found.
0 KSKs changed.
% echo $status
11
%

What is going on here?  Do I have to wait with "ds-retract" till Pub
and Act are both 0 for the KSK (that would sort of make sense), and
will OpenDNSSEC 2.x transition to that state all by itself based on
the configured timers?

The old "key list -v" output says "waiting for ds-gone"; is
"ds-retract" not a required or desireable action?  Ref.

% ods-enforcer key list -v --zone eduvpn.no
Keys:
Zone:                         Keytype: State:    Date of next transition: Size: Algorithm: CKA_ID:                          Repository: KeyTag:
eduvpn.no                     KSK      retire    waiting for ds-gone      2048  8          b368e2f95a4196769fcd53bfa8cbc6fc SoftHSM     14192
eduvpn.no                     ZSK      retire    ds-unsubmitted           1280  8          4aa28c4aa9e49e92479c5842f7c17522 SoftHSM     38604
eduvpn.no                     ZSK      active    ds-unsubmitted           1280  8          711bfccf43ee87ba89b1fb6165f8c9ab SoftHSM     19269
eduvpn.no                     KSK      active    ds-seen                  2048  8          51d709d2d5cec4e99e208ef423c4e141 SoftHSM     40393
%


One oddity I noticed while looking at this: sometimes "ods-enforcer
key list -d -z eduvpn.no" will hang after e.g. having printed
information for the first of 4 keys for the zone.  Meanwhile, there is
a steady stream of messages in the log of this form:

Feb 17 11:46:24 nnn ods-signerd: [rrset] unable to sign RRset[47]: lhsm_sign() failed
Feb 17 11:46:24 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 11:46:24 nnn ods-signerd: [rrset] unable to sign RRset[12]: lhsm_sign() failed
Feb 17 11:46:24 nnn ods-signerd: [hsm] sign final: CKR_GENERAL_ERROR
Feb 17 11:46:24 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 11:46:25 nnn ods-signerd: [hsm] sign final: CKR_GENERAL_ERROR
Feb 17 11:46:25 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 11:46:25 nnn ods-signerd: [rrset] unable to sign RRset[47]: lhsm_sign() failed

Then, after minutes, the three remaining lines corresponding to the
three remaining keys of eduvpn.no are printed.

I initially wrote that I hoped this error is benign, and that
ods-signerd will re-try the signing at a later stage, but given the
continued use of a now-unpublished KSK (see below) I'm not so sure...
(It may be a separate issue, though.)


Anyway, after a while (a few hours) the "key list -d" display goes to:

% ods-enforcer key list -d --zone eduvpn.no
Keys:
Zone:                         Key role:     DS:          DNSKEY:      RRSIGDNSKEY: RRSIG:       Pub: Act: Id:
eduvpn.no                     KSK           unretentive  unretentive  unretentive  NA           0    0    b368e2f95a4196769fcd53bfa8cbc6fc
eduvpn.no                     ZSK           NA           hidden       NA           hidden       0    0    4aa28c4aa9e49e92479c5842f7c17522
eduvpn.no                     ZSK           NA           omnipresent  NA           omnipresent  1    1    711bfccf43ee87ba89b1fb6165f8c9ab
eduvpn.no                     KSK           omnipresent  omnipresent  omnipresent  NA           1    1    51d709d2d5cec4e99e208ef423c4e141
%

I thought that when the "Pub" and "Act" flags are both zero, all use
of this key has ceased.  However, if I do a zone transfer on the
"downstream" name server from OpenDNSSEC and look for the signatures
which are not made by the active ZSK, it looks like the key with
keytag 14192 is *still* being used to sign the DNSKEY RRset:

:~> dig @... -y xxx eduvpn.no. axfr | grep RRSIG | grep -v 19269 | grep -v NSEC
eduvpn.no.              3600    IN      RRSIG   DNSKEY 8 2 3600 20200306152032 20200214042507 14192 eduvpn.no. BFwyM2T1HTLhgMuQNXwQ6lwRZiN9B9ZTM8uKUEy25kGNzNLw/R9R6Mi+ mVKzKpm0k8x3zoSIvVziEE1c4dGSjvZU9AIv6h+kB1pa3EkQnho5vUGV mjrrb+h+B94A3wb187DNUzqwJ8/Co5g7oECHngnepkmeMigzAJav7qCp MfQMnInPjZxjCf0rzuJi87Fo8pqy5zWsIXnS2GmcGXIVq9rZECRar4is ttUtYqml3jzycGs9xB29KR28rPGnpcNoUQUaxB2pYVY5ErSmAek5LWCi 6s44pdUzJ54L/h+7in5EkIFGFZZkgN3VK5QojFV7aHSl3xtZGqpG0fS5 vEYKrA==
:~> 

If that's the case, it's definately not safe to remove the DS record
from the parent zone until that signature has been replaced by one
using the key with keytag 40393.  Is there a bug in the handling of
the "Act" flag for the key?  Or ... is the criterion for when it's
safe to signal "ds-gone" even more complicated?  Or ... hm, I have
once again on the previous re-sign attempt one of these no-info
"CKR_GENERAL_ERROR" errors logged, though it's unclear which zone that
relates to:

Feb 17 15:11:59 nnn ods-signerd: [scheduler] pop task for zone eduvpn.no
Feb 17 15:11:59 nnn ods-signerd: [scheduler] unschedule task [sign] for zone eduvpn.no
Feb 17 15:11:59 nnn ods-signerd: [worker[2]] start working
Feb 17 15:11:59 nnn ods-signerd: [scheduler] pop task for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [scheduler] unschedule task [sign] for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone eduvpn.no update serial: format=datecounter in=2020011900 internal=2020021700 out=2020021401 now=1581948719
Feb 17 15:11:59 nnn ods-signerd: [worker[3]] start working
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone eduvpn.no update serial: 2020021401 + 299 = 2020021700
Feb 17 15:11:59 nnn ods-signerd: [zone] zone eduvpn.no set soa serial to 2020021700
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone 0.9.6.6.9.4.6.7.4.nrenum.net update serial: format=datecounter in=2019030100 internal=2020021304 out=2020021304 now=1581948719
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone 0.9.6.6.9.4.6.7.4.nrenum.net update serial: 2020021304 + 396 = 2020021700
Feb 17 15:11:59 nnn ods-signerd: [zone] zone 0.9.6.6.9.4.6.7.4.nrenum.net set soa serial to 2020021700
Feb 17 15:11:59 nnn ods-signerd: [hsm] sign final: CKR_GENERAL_ERROR
Feb 17 15:11:59 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 15:11:59 nnn ods-signerd: [rrset] unable to sign RRset[48]: lhsm_sign() failed
Feb 17 15:11:59 nnn ods-signerd: [scheduler] schedule task [write] for 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [worker[3]] finished working
Feb 17 15:11:59 nnn ods-signerd: [worker[3]]: report for duty
Feb 17 15:11:59 nnn ods-signerd: [scheduler] pop task for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [scheduler] unschedule task [write] for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [worker[3]] start working
Feb 17 15:11:59 nnn ods-signerd: [worker[1]]: report for duty
Feb 17 15:11:59 nnn ods-signerd: [tools] skip write zone 0.9.6.6.9.4.6.7.4.nrenum.net serial 2020021700 (zone not changed)
Feb 17 15:11:59 nnn ods-signerd: [file] openfile 0.9.6.6.9.4.6.7.4.nrenum.net.backup2.tmp count 4294967281
Feb 17 15:11:59 nnn ods-signerd: [worker[2]] sign zone eduvpn.no failed: 1 RRsets failed
Feb 17 15:11:59 nnn ods-signerd: [worker[2]] CRITICAL: failed to sign zone eduvpn.no: General error
Feb 17 15:11:59 nnn ods-signerd: back-off task [sign] for zone eduvpn.no with 1920 seconds
Feb 17 15:11:59 nnn ods-signerd: [scheduler] schedule task [sign] for eduvpn.no
Feb 17 15:11:59 nnn ods-signerd: [task] On Mon Feb 17 15:43:59 2020 I will [sign] zone eduvpn.no

Doing a syscall trace while doing "ods-signer sign eduvpn.no" doesn't
reveal much more than an additional debug message which is normally
suppressed:

  3212     12 ods-signerd GIO   fd 11 wrote 146 bytes
       "<11>1 2020-02-17T15:52:46.939477+01:00 nnn.uninett.no ods-signerd - \
        - - OSSLRSA.cpp(115): An error occurred while performing a PKCS #1 sig\
        nature"

But ... *what* error occurred, and *why*?  I'd claim that the
diagnostics could do with some improvement, because it doesn't say,
and nothing in the preceding of the syscall trace indicates a syscall
gone wrong; it has read

  3212     12 ods-signerd NAMI  "/var/db/softhsm/f3ea35bc-ea0c-1b3e-c8a2-9106c88f4491/generation"

and

  3212     12 ods-signerd NAMI  "/var/db/softhsm/f3ea35bc-ea0c-1b3e-c8a2-9106c88f4491/48ecd976-da51-e0e5-baf7-ef9c9b01afc1.object"

successfully just prior to spewing that error message.

...

It seems something like this could possibly be added to SoftHSM 2.x:

--- src/lib/crypto/OSSLRSA.cpp.orig     2018-09-10 12:00:25.000000000 +0000
+++ src/lib/crypto/OSSLRSA.cpp
@@ -112,7 +112,15 @@ bool OSSLRSA::sign(PrivateKey* privateKe
 
                if (sigLen == -1)
                {
+                       unsigned long err;
+                       char *errstr;
+
                        ERROR_MSG("An error occurred while performing a PKCS #1 signature");
+                       for (err = ERR_get_error(); err != 0; 
+                            err = ERR_get_error()) {
+                               errstr = ERR_error_string(err, NULL);
+                               ERROR_MSG(errstr);
+                       }
 
                        return false;
                }

(I'm sure there are other instances like this in SoftHSM 2.x...)

And ... re-starting OpenDNSSEC after adding the above and doing
"ods-signer sign eduvpn.no" does basically nothing, because OpenDNSSEC
cleverly claims the input hasn't changed:

Feb 17 16:43:35 hugin ods-signerd: [scheduler] pop task for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: [scheduler] unschedule task [forceread] for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: [worker[3]] start working
Feb 17 16:43:35 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator b368e2f95a4196769fcd53bfa8cbc6fc
Feb 17 16:43:35 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 711bfccf43ee87ba89b1fb6165f8c9ab
Feb 17 16:43:35 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 51d709d2d5cec4e99e208ef423c4e141
Feb 17 16:43:35 hugin ods-signerd: [adapter] read zone eduvpn.no from dns input adapter /usr/pkg/etc/opendnssec/addns.xml
Feb 17 16:43:35 hugin ods-signerd: [adapter] no new xfr ready for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: zone eduvpn.no unsigned data not changed, continue
Feb 17 16:43:35 hugin ods-signerd: [scheduler] unschedule task [sign] for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: [scheduler] schedule task [sign] for eduvpn.no

And...  Bumping the SOA on the hidden master doesn't cause the
signature over the DNSKEY records to be done with a different KSK
either:

:~> dig @nnn -y xxx eduvpn.no. axfr | grep RRSIG | grep -v 19269 | grep -v NSEC
eduvpn.no.              3600    IN      RRSIG   DNSKEY 8 2 3600 20200306152032 20200214042507 14192 eduvpn.no. BFwyM2T1HTLhgMuQNXwQ6lwRZiN9B9ZTM8uKUEy25kGNzNLw/R9R6Mi+ mVKzKpm0k8x3zoSIvVziEE1c4dGSjvZU9AIv6h+kB1pa3EkQnho5vUGV mjrrb+h+B94A3wb187DNUzqwJ8/Co5g7oECHngnepkmeMigzAJav7qCp MfQMnInPjZxjCf0rzuJi87Fo8pqy5zWsIXnS2GmcGXIVq9rZECRar4is ttUtYqml3jzycGs9xB29KR28rPGnpcNoUQUaxB2pYVY5ErSmAek5LWCi 6s44pdUzJ54L/h+7in5EkIFGFZZkgN3VK5QojFV7aHSl3xtZGqpG0fS5 vEYKrA==
:~> 

So now I'm sort of lost: how does OpenDNSSEC decide to take the new
KSK into use, and why does it still both publish and use the KSK which
has both "Act" and "Pub" set to zero?  In the log it says:

Feb 17 16:39:22 hugin ods-signerd: [file] openfile eduvpn.no.backup2 count 1
Feb 17 16:39:22 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator b368e2f95a4196769fcd53bfa8cbc6fc
Feb 17 16:39:22 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 711bfccf43ee87ba89b1fb6165f8c9ab
Feb 17 16:39:22 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 51d709d2d5cec4e99e208ef423c4e141
Feb 17 16:39:22 hugin ods-signerd: [backup] read RRs eduvpn.no

and the DNSKEY starting with b368 has both "Pub" and "Act" set to
zero, ref. above.  But ... in the eduvpn.no.backup2 file, the key with
keytag 14192 is listed with "publish 1".  Why does that information
seem to persist?

The signconf xml file for this zone appears to have only two keys
tagged with <Publish/>, one ZSK, one KSK, and the KSK marked with
<Publish/> is the one starting with 51d7, so the "correct" one.


Also, missing from the list of actions my scripts currently handle is
"clean-up of old KSKs".  Apparently OpenDNSSEC 2.x needs a further
nudge to remove KSKs which have entirely gone out of use via
"ods-enforcer key purge".  I think I asked about what the condition
should be to initiate that action, I suspect I'll have to inspect the
"ods-enforcer key list -d" output for the zone, and do this for keys
where a specific match condition for several of the fields holds (all
either "unretentive" or "NA", and both Pub and Act 0?)  But is that
any different from the state required for "ds-retire" (and then
subsequently "ds-gone")?


Best regards,

- Håvard


More information about the Opendnssec-user mailing list