[Opendnssec-user] Missing keys and various other problems on 2.0

Casper Gielen C.Gielen at uvt.nl
Fri Jun 22 11:44:34 UTC 2018


Hi,
I recently upgrade from ODS 1.0 to 2.0. For a while everything was fine
but now I'm running into problems. I have a lot of symptoms and issues
that may or may not be related. Sorry for the long mail, but I'm afraid
to leave out important information if I would split it smaller problems.


My main problem is that zones lose DNSKEYs and get stuk with unverifiable signatures.
Here is an example: 

  http://dnsviz.net/d/wiskundeoptiu.nl/Wytklw/dnssec/

# ods-enforcer key list --zone wiskundeoptiu.nl -v
Keys:
Zone:                 Keytype: State:    Date of next transition: Size: Algorithm: CKA_ID:                          Repository: KeyTag:
wiskundeoptiu.nl      KSK      retire    2018-06-20 15:14:02      2048  8          489db07082a644fcfa67f077627b7c7c LocalHSM    39466
wiskundeoptiu.nl      ZSK      retire    2018-06-20 15:14:02      1024  8          2f3c7829c40248b5537b3cd09266678c LocalHSM    50226
wiskundeoptiu.nl      KSK      active    2018-06-20 15:14:02      2048  8          758cc85fc16528184f32dbfab70663f6 LocalHSM    62161
wiskundeoptiu.nl      ZSK      active    2018-06-20 15:14:02      1024  8          8472c2bac0dbfc86d3a687644a3ef4f5 LocalHSM    59790
wiskundeoptiu.nl      ZSK      ready     2018-06-20 15:14:02      1024  8          3e97dcd131d9264cad2fb84676ade00e LocalHSM    28818

# ods-enforcer key list --zone wiskundeoptiu.nl -d
Keys:
Zone:                 Key role:   DS:          DNSKEY:      RRSIGDNSKEY: RRSIG:       Pub: Act: Id:
wiskundeoptiu.nl      KSK         hidden       hidden       hidden       NA           0    0    489db07082a644fcfa67f077627b7c7c
wiskundeoptiu.nl      ZSK         NA           hidden       NA           hidden       0    0    2f3c7829c40248b5537b3cd09266678c
wiskundeoptiu.nl      KSK         omnipresent  omnipresent  omnipresent  NA           1    1    758cc85fc16528184f32dbfab70663f6
wiskundeoptiu.nl      ZSK         NA           omnipresent  NA           omnipresent  1    1    8472c2bac0dbfc86d3a687644a3ef4f5
wiskundeoptiu.nl      ZSK         NA           omnipresent  NA           rumoured     1    1    3e97dcd131d9264cad2fb84676ade00e

ZSK 50226 is still being used but it is not published in the zone.
It used to be signed with KSK 39466 which is also missing.

The only reasonable root cause that I can think of is that for a while
the enforcer was running as root instead of the 'opendnssec' user. I've changed
that and made sure that the opendnssec-user is allowed to access the softhms2 files.


On to the other symptoms:
Sometimes my logs contain entries like:
 ods-enforcerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002

I think that was related to the wrong permissions. I think it means the key could not be found in the HSM.
Here is a little more context.

Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [scheduler] pop task for zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [scheduler] unschedule task [sign] for zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] start working on zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] perform task [sign] for zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [worker[1]] sign zone wiskundeoptiu.nl
Jun 18 06:51:54 ramachandra ods-signerd: [namedb] zone wiskundeoptiu.nl update serial: format=datecounter in=2018061502 internal=2018061800 out=0 now=1529297514
Jun 18 06:51:54 ramachandra ods-signerd: [namedb] zone wiskundeoptiu.nl update serial: 2018061502 + 298 = 2018061800
Jun 18 06:51:54 ramachandra ods-signerd: [zone] zone wiskundeoptiu.nl set soa serial to 2018061800
Jun 18 06:51:54 ramachandra ods-signerd: [worker[4]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [worker[4]] nothing to do
Jun 18 06:51:54 ramachandra ods-signerd: [worker[2]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [worker[2]] nothing to do
Jun 18 06:51:54 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:54 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:54 ramachandra ods-signerd: [worker[3]] report for duty
Jun 18 06:51:54 ramachandra ods-signerd: [worker[3]] nothing to do
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: ObjectFile.cpp(122): The attribute does not exist: 0x00000002
Jun 18 06:51:55 ramachandra ods-signerd: [hsm] unable to get key: key 489db07082a644fcfa67f077627b7c7c not found
Jun 18 06:51:55 ramachandra ods-signerd: [hsm] hsm_get_dnskey(): Got NULL key
Jun 18 06:51:55 ramachandra ods-signerd: [hsm] unable to get key: hsm failed to create dnskey
Jun 18 06:51:55 ramachandra ods-signerd: [zone] unable to prepare signing keys for zone wiskundeoptiu.nl: error getting dnskey
Jun 18 06:51:55 ramachandra ods-signerd: [worker[1]] CRITICAL: failed to sign zone wiskundeoptiu.nl: General error
Jun 18 06:51:55 ramachandra ods-signerd: [worker[1]] backoff task [sign] for zone wiskundeoptiu.nl with 3600 seconds
Jun 18 06:51:55 ramachandra ods-signerd: [worker[1]] finished working on zone wiskundeoptiu.nl
Jun 18 06:51:55 ramachandra ods-signerd: [scheduler] schedule task [sign] for zone wiskundeoptiu.nl
Jun 18 06:51:55 ramachandra ods-signerd: [task] On Mon Jun 18 07:51:55 2018 I will [sign] zone wiskundeoptiu.nl

These logs suggest that key 489db07082a644fcfa67f077627b7c7c (keytag 39466) is not in the HSM.
However, it is:

  # ods-hsmutil list LocalHSM  | grep 489db07082a644fcfa67f077627b7c7c
  LocalHSM              489db07082a644fcfa67f077627b7c7c  RSA/2048  

And it is in the SQL-database as well:
MariaDB [opendnssec]> SELECT * FROM hsmKey WHERE locator='489db07082a644fcfa67f077627b7c7c';
+-------+-----+----------+----------------------------------+-------+------+-----------+------+------------+-----------+---------+------------+--------+
| id    | rev | policyId | locator                          | state | bits | algorithm | role | inception  | isRevoked | keyType | repository | backup |
+-------+-----+----------+----------------------------------+-------+------+-----------+------+------------+-----------+---------+------------+--------+
| 36722 |   3 |        7 | 489db07082a644fcfa67f077627b7c7c |     2 | 2048 |         8 |    1 | 1528956951 |         0 |       1 | LocalHSM   |      3 |
+-------+-----+----------+----------------------------------+-------+------+-----------+------+------------+-----------+---------+------------+--------+


Here is a little more from the database related to that key:

MariaDB [opendnssec]> SELECT id,rev,policyId,signconfNeedsWriting,nextChange,ttlEndDs,ttlEndDk,ttlEndRs,nextKskRoll,nextZskRoll,nextCskRoll FROM zone WHERE name='wiskundeoptiu.nl';
+------+-----+----------+----------------------+------------+------------+------------+------------+-------------+-------------+-------------+
| id   | rev | policyId | signconfNeedsWriting | nextChange | ttlEndDs   | ttlEndDk   | ttlEndRs   | nextKskRoll | nextZskRoll | nextCskRoll |
+------+-----+----------+----------------------+------------+------------+------------+------------+-------------+-------------+-------------+
| 1068 |  35 |        7 |                    0 | 1530221972 | 1529575591 | 1529575591 | 1529583242 |  1560858829 |  1532014562 |           0 |
+------+-----+----------+----------------------+------------+------------+------------+------------+-------------+-------------+-------------+


MariaDB [opendnssec]> SELECT * FROM keyData WHERE hsmKeyId='36722';
+------+-----+--------+----------+-----------+------------+------+-------------+--------------+---------+-----------+---------+-----------+------------+--------+----------+
| id   | rev | zoneId | hsmKeyId | algorithm | inception  | role | introducing | shouldRevoke | standby | activeZsk | publish | activeKsk | dsAtParent | keytag | minimize |
+------+-----+--------+----------+-----------+------------+------+-------------+--------------+---------+-----------+---------+-----------+------------+--------+----------+
| 2773 |  22 |   1068 |    36722 |         8 | 1529044974 |    1 |           0 |            0 |       0 |         0 |       0 |         0 |          0 |  39466 |        4 |
+------+-----+--------+----------+-----------+------------+------+-------------+--------------+---------+-----------+---------+-----------+------------+--------+----------+

MariaDB [opendnssec]> SELECT  * FROM keyState WHERE keyDataId=2773;
+-------+-----+-----------+------+-------+------------+----------+-------+
| id    | rev | keyDataId | type | state | lastChange | minimize | ttl   |
+-------+-----+-----------+------+-------+------------+----------+-------+
| 11089 |   0 |      2773 |    0 |     0 | 1529044974 |        1 |  3600 |
| 11090 |   3 |      2773 |    2 |     0 | 1529571216 |        0 |  3600 |
| 11091 |   3 |      2773 |    3 |     0 | 1529412859 |        0 |  3600 |
| 11092 |   0 |      2773 |    1 |     4 | 1529044974 |        0 | 86400 |
+-------+-----+-----------+------+-------+------------+----------+-------+



This leads to the next symptom or problem: "DB prepare Err 2006: MySQL server has gone away"
The enforcer seems to lose it's connection to MariaDB:

Jun 21 10:51:42 ramachandra ods-enforcerd: DB prepare SQL SELECT policy.id, policy.rev, policy.name, policy.description, policy.signaturesResign, policy.signaturesRefresh, policy.signaturesJitter, policy.signaturesInceptionOffset, policy.signaturesValidityDefault, policy.signaturesValidityDenial, policy.signaturesValidityKeyset, policy.signaturesMaxZoneTtl, policy.denialType, policy.denialOptout, policy.denialTtl, policy.denialResalt, policy.denialAlgorithm, policy.denialIterations, policy.denialSaltLength, policy.denialSalt, policy.denialSaltLastChange, policy.keysTtl, policy.keysRetireSafety, policy.keysPublishSafety, policy.keysShared, policy.keysPurgeAfter, policy.zonePropagationDelay, policy.zoneSoaTtl, policy.zoneSoaMinimum, policy.zoneSoaSerial, policy.parentRegistrationDelay, policy.parentPropagationDelay, policy.parentDsTtl, policy.parentSoaTtl, policy.parentSoaMinimum, policy.passthrough FROM policy WHERE policy.denialType = ?
Jun 21 10:51:42 ramachandra ods-enforcerd: DB prepare Err 2006: MySQL server has gone away
Jun 21 10:51:42 ramachandra ods-enforcerd: [policy_resalt_task] retrying in 60 seconds

The are no errors in the sql-logs and restarting mariadb doesn't change anything.
However after restarting the enforcer the problem goes away for some time but seems to
return after a while. Despite that I've tuned various MySQL-parameters asscociated
with this error-message but with no noticeable change.


It's not that ods-enforcer is entirley stuck, I does respond toe queries like 'key list'.
In fact, my logs show thousands of mysql-statements are logged for each such command.
Listing the keys for 1 zone causes about 3.5k(!) SQL-calls.
Each call is nicely prepared, and then closed immediatly without ever being executed.
I'm not a MySQL/MariaDB-expert so I might be reading the logs wrong, but as far as I
understand SQL each query should only be prepared once and then executed repeatedly.
Immediatly after restaring the enforcer I can see some Executes in the logs but after a little
while they stop appearing while the prepare-statements keep on coming:

# grep Prepare mysql.log|wc -l
1277451

# grep Execute mysql.log|wc -l
9072

<note>
I run 'ods-enforcer key list' from a script every three minutes, that inflates
the total numbers. This is from about 6 hours of logging, which means about
120 calls to 'ods-enforce key list' from just that one script only.
I've proably done as many manual calls while debugging this.
</note>


Good:
    3152 Prepare   SELECT zone.id, zone.rev, zone.policyId, zone.name, zone.signconfNeedsWriting, zone.signconfPath, zone.nextChange, zone.ttlEndDs, zone.ttlEndDk, zone.ttlEndRs, zone.rollKskNow, zone.rollZskNow, zone.rollCskNow, zone.i
nputAdapterType, zone.inputAdapterUri, zone.outputAdapterType, zone.outputAdapterUri, zone.nextKskRoll, zone.nextZskRoll, zone.nextCskRoll FROM zone WHERE zone.id = ?
    3152 Execute   SELECT zone.id, zone.rev, zone.policyId, zone.name, zone.signconfNeedsWriting, zone.signconfPath, zone.nextChange, zone.ttlEndDs, zone.ttlEndDk, zone.ttlEndRs, zone.rollKskNow, zone.rollZskNow, zone.rollCskNow, zone.i
nputAdapterType, zone.inputAdapterUri, zone.outputAdapterType, zone.outputAdapterUri, zone.nextKskRoll, zone.nextZskRoll, zone.nextCskRoll FROM zone WHERE zone.id = 685
    3152 Close stmt     

Bad:
    3121 Prepare   SELECT zone.id, zone.rev, zone.policyId, zone.name, zone.signconfNeedsWriting, zone.signconfPath, zone.nextChange, zone.ttlEndDs, zone.ttlEndDk, zone.ttlEndRs, zone.rollKskNow, zone.rollZskNow, zone.rollCskNow, zone.inputAdapterType, zone.inputAdapterUri, zone.outputAdapterType, zone.outputAdapterUri, zone.nextKskRoll, zone.nextZskRoll, zone.nextCskRoll FROM zone WHERE zone.id = ?
    3121 Close stmt


Even though I call the first one 'good', it's not really ideal because the prepared statement
is closed after use instead of reused.


Finally, my HSM contains about 15k of unused, old keys (for a few hundred zones).
I don't know when, why or how exactly this occured (most of them were migrated
from softhsm1) but I do know my HSM is huge. I don't think it's the cause of my
problems but have included it anyway for the sake of completeness.

  # ods-enforcer zone list | wc -l
  369
  # ods-enforcer key list 2>& /dev/null | wc -l
  1177

About 1175 kesy (minus two lines for headers) for 369 zones seems right, 
and does agree with the keyData table:

  MariaDB [opendnssec]> select count(*) from keyData;
  +----------+
  | count(*) |
  +----------+
  |     1175 |
  +----------+

However hsmKey has another idea:
  MariaDB [opendnssec]> select count(*) from hsmKey;
  +----------+
  | count(*) |
  +----------+
  |     2031 |
  +----------+


Comparing the two tables reveals 856 keys that seem lost.
  MariaDB [opendnssec]> SELECT count(*) FROM hsmKey WHERE id NOT in (SELECT hsmKeyId FROM keyData);
  +----------+
  | count(*) |
  +----------+
  |      856 |
  +----------+

However ods-hsmutil sees even more keys:
 
  # ods-hsmutil list LocalHSM | wc -l
  7283 

and the number of files in the HSM is even larger:

  # ls   /var/lib/softhsm/tokens/1fee18db-b8ab-460a-072c-19d8e0f2cce9 | wc -l
  29113

29113 objects for 7283 keys seems to much.


However the HSM contains far more keys and even more objects that seem not listed at all.
At some point I'll need to clean this out but I would like to solve other problems first
before I start throwing out keys.


PS. The information in this mail was collected over several days. I've tried to present
a complete and consistent view but you might notice that timestamps jump around.

PS2. While debugging I was confronted with the lack of documentation for ODS2.
A lot of documentation refers to ODS1 and RFCS and the reader will have to decide what
parts are (still) relevent and which are not. On top of that the RFCs leave a lot of
freedom, for example with various key-replacement schemes, which gives the reader vary
little guidance on how ODS actually works.
Especially the relations between the various state-machines, the various types of data
(KSK, ZSK, DS, RRSIG), the timings and TTLs from the KASP, and ds-{submit,seen,...}
commands seem impossible to grasp without a lot of prior knowledge and experience.
I'm not entirely sure I get it everything myself after working with it for years.
While I understand that ODS can't be a complete introduction to DNSSEC for newbies, I
think the treshold has become unreasonable high.

PS3. At times I've longed for a '--force' options to commands like 'key rollover' or
'--use-key' for 'sign'.
I know ODS tries to protect the user from mistakes, but there have been times where I
know that DNSSEC validation is not working and ODS prevents a quick recovery. The
fastest way to fix such problems is to delete the zone from ODS and recreate it which
I consider a poor solution.

PS4. The zone 'wiskundeoptiu.nl' that is used as an example in this mail is new and
DNSSEC has never worked properly. The same problem occurs with other, existing zones, but
I can't experiment with those as easily.


Sorry for the long mail, and thank you for your time if you've made it until here.
I hope someone has any suggestions on how to improve the situation.





Addendum - KASP policy for wiskundeoptiu.nl
        <Policy name="sidn">
                <Signatures>
                        <Resign>PT30M</Resign>
                        <Refresh>P7D</Refresh>
                        <Validity>
                                <Default>P14D</Default>
                                <Denial>P14D</Denial>
                        </Validity>
                        <Jitter>PT12H</Jitter>
                        <InceptionOffset>PT600S</InceptionOffset>
                </Signatures>
                <Denial>
                        <NSEC3>
                                <Resalt>P100D</Resalt>
                                <Hash>
                                        <Algorithm>1</Algorithm>
                                        <Iterations>5</Iterations>
                                        <Salt length="8"/>
                                </Hash>
                        </NSEC3>
                </Denial>
                <Keys>
                        <TTL>PT3600S</TTL>
                        <RetireSafety>P1D</RetireSafety>
                        <PublishSafety>PT30M</PublishSafety>
                        <Purge>P30D</Purge>

                        <KSK>
                                <Algorithm length="2048">8</Algorithm>
                                <Lifetime>P1Y</Lifetime>
                                <Repository>LocalHSM</Repository>
                        </KSK>
                        <ZSK>
                                <Algorithm length="1024">8</Algorithm>
                                <Lifetime>P30D</Lifetime>
                                <Repository>LocalHSM</Repository>
                        </ZSK>
                </Keys>
                <Zone>
                        <PropagationDelay>PT30S</PropagationDelay>
                        <SOA>
                                <TTL>PT3600S</TTL>
                                <Minimum>PT300S</Minimum>
                                <Serial>datecounter</Serial>
                        </SOA>
                </Zone>
                <Parent>
                        <PropagationDelay>PT7200S</PropagationDelay>
                        <DS>
                                <TTL>PT3600S</TTL>
                        </DS>
                        <SOA>
                                <TTL>PT73600S</TTL>
                                <Minimum>PT600S</Minimum>
                        </SOA>
                </Parent>
        </Policy>


-- 
Casper Gielen <cgielen at uvt.nl> | LIS UNIX
PGP fingerprint = 16BD 2C9F 8156 C242 F981  63B8 2214 083C F80E 4AF7

Universiteit van Tilburg | Postbus 90153, 5000 LE
Warandelaan 2 | Telefoon 013 466 4100 | G 236 | http://www.uvt.nl





More information about the Opendnssec-user mailing list