[Opendnssec-user] Enforcer and signer starting and stopping order

Erik P. Ostlyngen erik.ostlyngen at norid.no
Tue Apr 18 07:11:05 UTC 2023


The background for my question about starting/stopping is that we had
an incident recently with zsk rollover on our lab installation. It
made me suspect that it may be critically important that the signer is
always running when the enforcer is. Even in shorter periods like
during startup and stopping.

This is what happened:

On april 14, opendnssec enforcer and signer logs messages like these:

2023-04-14T09:02:09+02:00 server097 ods-enforcerd[468050]: [enforcer]
update zone: hole.no
2023-04-14T09:02:09+02:00 server097 ods-enforcerd[468050]: [enforcer]
removeDeadKeys deleting key: 011c2c814fd5065b25da262ac05d103d
2023-04-14T09:02:09+02:00 server097 ods-enforcerd[468050]:
[hsm_key_factory_delete_key] looking for keys to purge from HSM
2023-04-14T09:02:09+02:00 server097 ods-signerd[467707]: File.cpp(94):
Could not open the file (No such file or directory):
/srv/data/softhsm2/tokens//771068f4-4858-0356-2340-10aa373f6cc5/a7529575-8bbb-3a2b-d80f-fc6be9069136.object
2023-04-14T09:02:09+02:00 server097 ods-signerd[467707]: File.cpp(94):
Could not open the file (No such file or directory):
/srv/data/softhsm2/tokens//771068f4-4858-0356-2340-10aa373f6cc5/a7529575-8bbb-3a2b-d80f-fc6be9069136.object
2023-04-14T09:02:09+02:00 server097 ods-enforcerd[468050]:
[hsm_key_factory_delete_key] removing key
011c2c814fd5065b25da262ac05d103d from HSM
2023-04-14T09:02:09+02:00 server097 ods-enforcerd[468050]: [enforcer]
removeDeadKeys: keys deleted from HSM: 1
2023-04-14T09:02:09+02:00 server097 ods-enforcerd[468050]: [enforcer]
reschedule enforcing policy due to deleting keys
2023-04-14T09:02:09+02:00 server097 ods-enforcerd[468050]:
[enforce_task] No changes to signconf file required for zone hole.no

This was repeated for 29 zones (our system runs 800 zones in total).
Some debugging showed that the affected zones had a discrepancy
between their signconf xml file and their 'backup2' temporary
zonefile. The signconf has one of the zsks marked with 'publish' and
the other not. In the 'backup2' file both keys are marked to be
published. Also, the key which signconf doesn't want to be published
is in backup2 marked as the active zsk.

Signconf for the example zone hole.no:
[...]
     <Keys>
       <TTL>PT1H</TTL>
       <Key>
         <Flags>257</Flags>
         <Algorithm>13</Algorithm>
         <Locator>788d3307150d0d3a4310f25740d7450c</Locator>
         <KSK/>
         <Publish/>
       </Key>
       <Key>
         <Flags>256</Flags>
         <Algorithm>13</Algorithm>
         <Locator>011c2c814fd5065b25da262ac05d103d</Locator>
       </Key>
       <Key>
         <Flags>256</Flags>
         <Algorithm>13</Algorithm>
         <Locator>89cdbd63f4c56ff9c12ea15d4ab26e04</Locator>
         <ZSK/>
         <Publish/>
       </Key>
     </Keys>

And in the backup2-file:
[...]
;;Key: locator 788d3307150d0d3a4310f25740d7450c algorithm 13 flags 257
publish 1 ksk 1 zsk 0 keytag 48376
;;Key: locator 011c2c814fd5065b25da262ac05d103d algorithm 13 flags 256
publish 1 ksk 0 zsk 1 keytag 5692
;;Key: locator 89cdbd63f4c56ff9c12ea15d4ab26e04 algorithm 13 flags 256
publish 1 ksk 0 zsk 0 keytag 49850

None of the signconf files had been updated since 31. march, i.e. two
weeks earlier. It seems clear that the key 49850 is the new zsk which
is about to be rolled in. Key 5692 is about to be rolled out. But the
signer is apparently not aware of this (all signatures are still being
made with 5692). Enforcer, however, thinks the roll has finished well,
so it eventually removes 5692 from HSM. Now, the signer cannot read
the old key and panics.

We then discovered that on march 31, the enforcer logged the following:

Mar 31, 2023 @ 09:00:50.000    server097    [signconf_cmd] unable to
notify signer of signconf changes for zone hole.no!

This is repeated for exactly the same 29 zones which had problems on
april 14. Also, this happened during startup of enforcer and signer
(in this order, with a 60 seconds interval between). I.e a 60 seconds
period of time when the enforcer was running but not the signer (which
probably explains the error message). It seems that after the signer
was not notified, it could not by itself discover that signconf had
changed, not even after a restart (it has been restarted many times in
the period from march 31. to april 14).

So, I would like to know:
* Are my analyses and conclusions correct?

* What are the correct actions to be done after we see the 'unable to
notify signer' messages in the logs (this might happen in other
circumstances, like an unintentional stop of the signer)? Apparently,
the signer will not afterwards by itself update the key status from
signconf.

Our system is running opendnssec 2.1.12 (since november 2022) on an
ubuntu 20.04 server. For the developers: I have backup data to
reproduce the situation.

Kind regards,
Erik Østlyngen



On 17/04/2023 13.53, Erik P. Ostlyngen via Opendnssec-user wrote:
> Hi,
> 
> I'm wondering what is the correct way to stop and start the 
> opendnssec enforcer and signer daemons. More specific, what is the
>  correct order to do this?
> 
> Since the enforcer wants to notify the signer from time to time 
> that signconf has changed, I would guess that the stopping order 
> would be enforcer first, then signer. And the starting order the 
> opposite way.
> 
> However, the software package includes a controller script, 
> ods-control. It seems to start the enforcer first, then the
> signer. And stops them in the same order. So, I'm confused. Can
> anyone enlighten?
> 
> Kind regards, Erik Østlyngen 
> _______________________________________________ Opendnssec-user 
> mailing list Opendnssec-user at lists.opendnssec.org 
> https://lists.opendnssec.org/mailman/listinfo/opendnssec-user


More information about the Opendnssec-user mailing list