[Opendnssec-user] Key and DS states
Havard Eidnes
he at uninett.no
Mon Feb 17 16:50:14 UTC 2020
Hm,
my questions went unanswered? That's slightly disappointing...
Here's a new and different formulation.
I apologize for the length and the inclusion of multiple issues in the
same e-mail message -- I don't appear to have become good friends with
OpenDNSSEC 2.x yet...
The scripts I maintain which surrounds our OpenDNSSEC installation
are primarily concerned with automating the required follow-up of KSK
rotations, which in our configuration happens once per year per zone.
I have tentatively modified the scripts I use with OpenDNSSEC 1.x to
work with OpenDNSSEC 2.x, but not put them into production yet. Here
you'll find an overall description of what they are currently set up
to do. The question I have is basically "are some of the actions not
needed, or the conditions for the actions needs to be changed?"
1) Treat all the KSKs in "ods-enforcer key list" output which are
in state "ready":
a) extract the DS, query all the publishing NSes for the zone if
the new DS is being published. Signal "ds-seen" for those that
do.
b) otherwise, extract the DS, signal "ds-submit", and take actions
to have the DS added to the parent zone.
2) Treat all the KSKs in "ods-enforcer key list" output which are
in state "retire":
a) extract the DS, and check whether all the publishing NSes for
the zone have stopped publishing the DS record. If not, signal
"ds-retract" for the corresponding key, and initiate action to
remove the DS record from the parent zone.
b) If, on the other hand, the DS is no longer being published by
the publishing NSes, signal "ds-gone" for the corresponding key.
Cursory testing seems to indicate, however, that signaling
"ds-retract" in 2a) "immediately" doesn't work the way I expect it to;
with this set of states I get:
% ods-enforcer key list --zone eduvpn.no -v
Keys:
Zone: Keytype: State: Date of next transition: Size: Algorithm: CKA_ID: Repository: KeyTag:
eduvpn.no KSK retire waiting for ds-gone 2048 8 b368e2f95a4196769fcd53bfa8cbc6fc SoftHSM 14192
eduvpn.no ZSK retire ds-unsubmitted 1280 8 4aa28c4aa9e49e92479c5842f7c17522 SoftHSM 38604
eduvpn.no ZSK active ds-unsubmitted 1280 8 711bfccf43ee87ba89b1fb6165f8c9ab SoftHSM 19269
eduvpn.no KSK ready waiting for ds-seen 2048 8 51d709d2d5cec4e99e208ef423c4e141 SoftHSM 40393
% ods-enforcer key ds-seen --zone eduvpn.no -x 40393
1 KSK matches found.
1 KSKs changed.
% ods-enforcer key list -d --zone eduvpn.no
Keys:
Zone: Key role: DS: DNSKEY: RRSIGDNSKEY: RRSIG: Pub: Act: Id:
eduvpn.no KSK unretentive omnipresent omnipresent NA 1 1 b368e2f95a4196769fcd53bfa8cbc6fc
eduvpn.no ZSK NA hidden NA hidden 0 0 4aa28c4aa9e49e92479c5842f7c17522
eduvpn.no ZSK NA omnipresent NA omnipresent 1 1 711bfccf43ee87ba89b1fb6165f8c9ab
eduvpn.no KSK rumoured omnipresent omnipresent NA 1 1 51d709d2d5cec4e99e208ef423c4e141
% ods-enforcer key ds-retract --zone eduvpn.no -x 14192
0 KSK matches found.
0 KSKs changed.
% ods-enforcer key ds-retract --zone eduvpn.no -k b368e2f95a4196769fcd53bfa8cbc6fc
0 KSK matches found.
0 KSKs changed.
% echo $status
11
%
What is going on here? Do I have to wait with "ds-retract" till Pub
and Act are both 0 for the KSK (that would sort of make sense), and
will OpenDNSSEC 2.x transition to that state all by itself based on
the configured timers?
The old "key list -v" output says "waiting for ds-gone"; is
"ds-retract" not a required or desireable action? Ref.
% ods-enforcer key list -v --zone eduvpn.no
Keys:
Zone: Keytype: State: Date of next transition: Size: Algorithm: CKA_ID: Repository: KeyTag:
eduvpn.no KSK retire waiting for ds-gone 2048 8 b368e2f95a4196769fcd53bfa8cbc6fc SoftHSM 14192
eduvpn.no ZSK retire ds-unsubmitted 1280 8 4aa28c4aa9e49e92479c5842f7c17522 SoftHSM 38604
eduvpn.no ZSK active ds-unsubmitted 1280 8 711bfccf43ee87ba89b1fb6165f8c9ab SoftHSM 19269
eduvpn.no KSK active ds-seen 2048 8 51d709d2d5cec4e99e208ef423c4e141 SoftHSM 40393
%
One oddity I noticed while looking at this: sometimes "ods-enforcer
key list -d -z eduvpn.no" will hang after e.g. having printed
information for the first of 4 keys for the zone. Meanwhile, there is
a steady stream of messages in the log of this form:
Feb 17 11:46:24 nnn ods-signerd: [rrset] unable to sign RRset[47]: lhsm_sign() failed
Feb 17 11:46:24 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 11:46:24 nnn ods-signerd: [rrset] unable to sign RRset[12]: lhsm_sign() failed
Feb 17 11:46:24 nnn ods-signerd: [hsm] sign final: CKR_GENERAL_ERROR
Feb 17 11:46:24 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 11:46:25 nnn ods-signerd: [hsm] sign final: CKR_GENERAL_ERROR
Feb 17 11:46:25 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 11:46:25 nnn ods-signerd: [rrset] unable to sign RRset[47]: lhsm_sign() failed
Then, after minutes, the three remaining lines corresponding to the
three remaining keys of eduvpn.no are printed.
I initially wrote that I hoped this error is benign, and that
ods-signerd will re-try the signing at a later stage, but given the
continued use of a now-unpublished KSK (see below) I'm not so sure...
(It may be a separate issue, though.)
Anyway, after a while (a few hours) the "key list -d" display goes to:
% ods-enforcer key list -d --zone eduvpn.no
Keys:
Zone: Key role: DS: DNSKEY: RRSIGDNSKEY: RRSIG: Pub: Act: Id:
eduvpn.no KSK unretentive unretentive unretentive NA 0 0 b368e2f95a4196769fcd53bfa8cbc6fc
eduvpn.no ZSK NA hidden NA hidden 0 0 4aa28c4aa9e49e92479c5842f7c17522
eduvpn.no ZSK NA omnipresent NA omnipresent 1 1 711bfccf43ee87ba89b1fb6165f8c9ab
eduvpn.no KSK omnipresent omnipresent omnipresent NA 1 1 51d709d2d5cec4e99e208ef423c4e141
%
I thought that when the "Pub" and "Act" flags are both zero, all use
of this key has ceased. However, if I do a zone transfer on the
"downstream" name server from OpenDNSSEC and look for the signatures
which are not made by the active ZSK, it looks like the key with
keytag 14192 is *still* being used to sign the DNSKEY RRset:
:~> dig @... -y xxx eduvpn.no. axfr | grep RRSIG | grep -v 19269 | grep -v NSEC
eduvpn.no. 3600 IN RRSIG DNSKEY 8 2 3600 20200306152032 20200214042507 14192 eduvpn.no. BFwyM2T1HTLhgMuQNXwQ6lwRZiN9B9ZTM8uKUEy25kGNzNLw/R9R6Mi+ mVKzKpm0k8x3zoSIvVziEE1c4dGSjvZU9AIv6h+kB1pa3EkQnho5vUGV mjrrb+h+B94A3wb187DNUzqwJ8/Co5g7oECHngnepkmeMigzAJav7qCp MfQMnInPjZxjCf0rzuJi87Fo8pqy5zWsIXnS2GmcGXIVq9rZECRar4is ttUtYqml3jzycGs9xB29KR28rPGnpcNoUQUaxB2pYVY5ErSmAek5LWCi 6s44pdUzJ54L/h+7in5EkIFGFZZkgN3VK5QojFV7aHSl3xtZGqpG0fS5 vEYKrA==
:~>
If that's the case, it's definately not safe to remove the DS record
from the parent zone until that signature has been replaced by one
using the key with keytag 40393. Is there a bug in the handling of
the "Act" flag for the key? Or ... is the criterion for when it's
safe to signal "ds-gone" even more complicated? Or ... hm, I have
once again on the previous re-sign attempt one of these no-info
"CKR_GENERAL_ERROR" errors logged, though it's unclear which zone that
relates to:
Feb 17 15:11:59 nnn ods-signerd: [scheduler] pop task for zone eduvpn.no
Feb 17 15:11:59 nnn ods-signerd: [scheduler] unschedule task [sign] for zone eduvpn.no
Feb 17 15:11:59 nnn ods-signerd: [worker[2]] start working
Feb 17 15:11:59 nnn ods-signerd: [scheduler] pop task for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [scheduler] unschedule task [sign] for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone eduvpn.no update serial: format=datecounter in=2020011900 internal=2020021700 out=2020021401 now=1581948719
Feb 17 15:11:59 nnn ods-signerd: [worker[3]] start working
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone eduvpn.no update serial: 2020021401 + 299 = 2020021700
Feb 17 15:11:59 nnn ods-signerd: [zone] zone eduvpn.no set soa serial to 2020021700
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone 0.9.6.6.9.4.6.7.4.nrenum.net update serial: format=datecounter in=2019030100 internal=2020021304 out=2020021304 now=1581948719
Feb 17 15:11:59 nnn ods-signerd: [namedb] zone 0.9.6.6.9.4.6.7.4.nrenum.net update serial: 2020021304 + 396 = 2020021700
Feb 17 15:11:59 nnn ods-signerd: [zone] zone 0.9.6.6.9.4.6.7.4.nrenum.net set soa serial to 2020021700
Feb 17 15:11:59 nnn ods-signerd: [hsm] sign final: CKR_GENERAL_ERROR
Feb 17 15:11:59 nnn ods-signerd: [hsm] error signing rrset with libhsm
Feb 17 15:11:59 nnn ods-signerd: [rrset] unable to sign RRset[48]: lhsm_sign() failed
Feb 17 15:11:59 nnn ods-signerd: [scheduler] schedule task [write] for 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [worker[3]] finished working
Feb 17 15:11:59 nnn ods-signerd: [worker[3]]: report for duty
Feb 17 15:11:59 nnn ods-signerd: [scheduler] pop task for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [scheduler] unschedule task [write] for zone 0.9.6.6.9.4.6.7.4.nrenum.net
Feb 17 15:11:59 nnn ods-signerd: [worker[3]] start working
Feb 17 15:11:59 nnn ods-signerd: [worker[1]]: report for duty
Feb 17 15:11:59 nnn ods-signerd: [tools] skip write zone 0.9.6.6.9.4.6.7.4.nrenum.net serial 2020021700 (zone not changed)
Feb 17 15:11:59 nnn ods-signerd: [file] openfile 0.9.6.6.9.4.6.7.4.nrenum.net.backup2.tmp count 4294967281
Feb 17 15:11:59 nnn ods-signerd: [worker[2]] sign zone eduvpn.no failed: 1 RRsets failed
Feb 17 15:11:59 nnn ods-signerd: [worker[2]] CRITICAL: failed to sign zone eduvpn.no: General error
Feb 17 15:11:59 nnn ods-signerd: back-off task [sign] for zone eduvpn.no with 1920 seconds
Feb 17 15:11:59 nnn ods-signerd: [scheduler] schedule task [sign] for eduvpn.no
Feb 17 15:11:59 nnn ods-signerd: [task] On Mon Feb 17 15:43:59 2020 I will [sign] zone eduvpn.no
Doing a syscall trace while doing "ods-signer sign eduvpn.no" doesn't
reveal much more than an additional debug message which is normally
suppressed:
3212 12 ods-signerd GIO fd 11 wrote 146 bytes
"<11>1 2020-02-17T15:52:46.939477+01:00 nnn.uninett.no ods-signerd - \
- - OSSLRSA.cpp(115): An error occurred while performing a PKCS #1 sig\
nature"
But ... *what* error occurred, and *why*? I'd claim that the
diagnostics could do with some improvement, because it doesn't say,
and nothing in the preceding of the syscall trace indicates a syscall
gone wrong; it has read
3212 12 ods-signerd NAMI "/var/db/softhsm/f3ea35bc-ea0c-1b3e-c8a2-9106c88f4491/generation"
and
3212 12 ods-signerd NAMI "/var/db/softhsm/f3ea35bc-ea0c-1b3e-c8a2-9106c88f4491/48ecd976-da51-e0e5-baf7-ef9c9b01afc1.object"
successfully just prior to spewing that error message.
...
It seems something like this could possibly be added to SoftHSM 2.x:
--- src/lib/crypto/OSSLRSA.cpp.orig 2018-09-10 12:00:25.000000000 +0000
+++ src/lib/crypto/OSSLRSA.cpp
@@ -112,7 +112,15 @@ bool OSSLRSA::sign(PrivateKey* privateKe
if (sigLen == -1)
{
+ unsigned long err;
+ char *errstr;
+
ERROR_MSG("An error occurred while performing a PKCS #1 signature");
+ for (err = ERR_get_error(); err != 0;
+ err = ERR_get_error()) {
+ errstr = ERR_error_string(err, NULL);
+ ERROR_MSG(errstr);
+ }
return false;
}
(I'm sure there are other instances like this in SoftHSM 2.x...)
And ... re-starting OpenDNSSEC after adding the above and doing
"ods-signer sign eduvpn.no" does basically nothing, because OpenDNSSEC
cleverly claims the input hasn't changed:
Feb 17 16:43:35 hugin ods-signerd: [scheduler] pop task for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: [scheduler] unschedule task [forceread] for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: [worker[3]] start working
Feb 17 16:43:35 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator b368e2f95a4196769fcd53bfa8cbc6fc
Feb 17 16:43:35 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 711bfccf43ee87ba89b1fb6165f8c9ab
Feb 17 16:43:35 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 51d709d2d5cec4e99e208ef423c4e141
Feb 17 16:43:35 hugin ods-signerd: [adapter] read zone eduvpn.no from dns input adapter /usr/pkg/etc/opendnssec/addns.xml
Feb 17 16:43:35 hugin ods-signerd: [adapter] no new xfr ready for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: zone eduvpn.no unsigned data not changed, continue
Feb 17 16:43:35 hugin ods-signerd: [scheduler] unschedule task [sign] for zone eduvpn.no
Feb 17 16:43:35 hugin ods-signerd: [scheduler] schedule task [sign] for eduvpn.no
And... Bumping the SOA on the hidden master doesn't cause the
signature over the DNSKEY records to be done with a different KSK
either:
:~> dig @nnn -y xxx eduvpn.no. axfr | grep RRSIG | grep -v 19269 | grep -v NSEC
eduvpn.no. 3600 IN RRSIG DNSKEY 8 2 3600 20200306152032 20200214042507 14192 eduvpn.no. BFwyM2T1HTLhgMuQNXwQ6lwRZiN9B9ZTM8uKUEy25kGNzNLw/R9R6Mi+ mVKzKpm0k8x3zoSIvVziEE1c4dGSjvZU9AIv6h+kB1pa3EkQnho5vUGV mjrrb+h+B94A3wb187DNUzqwJ8/Co5g7oECHngnepkmeMigzAJav7qCp MfQMnInPjZxjCf0rzuJi87Fo8pqy5zWsIXnS2GmcGXIVq9rZECRar4is ttUtYqml3jzycGs9xB29KR28rPGnpcNoUQUaxB2pYVY5ErSmAek5LWCi 6s44pdUzJ54L/h+7in5EkIFGFZZkgN3VK5QojFV7aHSl3xtZGqpG0fS5 vEYKrA==
:~>
So now I'm sort of lost: how does OpenDNSSEC decide to take the new
KSK into use, and why does it still both publish and use the KSK which
has both "Act" and "Pub" set to zero? In the log it says:
Feb 17 16:39:22 hugin ods-signerd: [file] openfile eduvpn.no.backup2 count 1
Feb 17 16:39:22 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator b368e2f95a4196769fcd53bfa8cbc6fc
Feb 17 16:39:22 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 711bfccf43ee87ba89b1fb6165f8c9ab
Feb 17 16:39:22 hugin ods-signerd: [zone] publish eduvpn.no DNSKEY locator 51d709d2d5cec4e99e208ef423c4e141
Feb 17 16:39:22 hugin ods-signerd: [backup] read RRs eduvpn.no
and the DNSKEY starting with b368 has both "Pub" and "Act" set to
zero, ref. above. But ... in the eduvpn.no.backup2 file, the key with
keytag 14192 is listed with "publish 1". Why does that information
seem to persist?
The signconf xml file for this zone appears to have only two keys
tagged with <Publish/>, one ZSK, one KSK, and the KSK marked with
<Publish/> is the one starting with 51d7, so the "correct" one.
Also, missing from the list of actions my scripts currently handle is
"clean-up of old KSKs". Apparently OpenDNSSEC 2.x needs a further
nudge to remove KSKs which have entirely gone out of use via
"ods-enforcer key purge". I think I asked about what the condition
should be to initiate that action, I suspect I'll have to inspect the
"ods-enforcer key list -d" output for the zone, and do this for keys
where a specific match condition for several of the fields holds (all
either "unretentive" or "NA", and both Pub and Act 0?) But is that
any different from the state required for "ds-retire" (and then
subsequently "ds-gone")?
Best regards,
- Håvard
More information about the Opendnssec-user
mailing list