[Opendnssec-user] Problems adding largish # of zones

Havard Eidnes he at uninett.no
Thu Dec 3 17:43:21 UTC 2015


Hi,

I recently had occasion to bump the number of zones in our OpenDNSSEC
installation by a significant amount -- around 320 zones were added in
"one go", first by a sequence of "ods-ksmutil zone add" commands, and
then with "ods-ksmutil update zonelist", followed by backing up the
SoftHSM and KASP databases, and then notifying the enforcer.

This reveals once again that I'm not able to operationally figure out
how to handle OpenDNSSEC, and I find it to be a quite frustrating
experience, probably because of bugs.

As usual, I'm running OpenDNSSEC 1.4.7 in "DNS in, DNS out" mode.

The problem I appear to have is that a largish number of the newly
added zones have not been transferred from the hidden master to
OpenDNSSEC.  What's more, OpenDNSSEC's signer doesn't seem to take
any new initiative to actually re-try the zone transfers.  Instead,
it has gotten it into its head that it's already doing the zone
transfers, but this is untrue, and no coaxing of the running signer
appears to be able to persuade it otherwise.

Instead, the signer keeps logging

Dec  3 17:37:09 hugin ods-signerd: [tools] unable to read zone <zonename>: adapter failed (Incoming zone transfer not ready)
Dec  3 17:37:09 hugin ods-signerd: [worker[4]] backoff task [read] for zone <zonename> with 3600 seconds

However, a packet trace of the traffic towards the hidden master,
taken on the signer machine, reveals that ods-signerd did *NOT* try
to initiate a zone transfer at that instant.  It's as if ods-signerd
sits there idle in a loop expecting the zone files to somehow
magically appear in the file system, when I've given clear
instructions that it needs to use zone transfers to fetch the
data.

This, I suspect, goes back to the old complaint I've raised
before that there seems to be insufficient synchronization
between the different internal tasks in ods-signerd.  This can,
among other things, lead to alarming log messages which are
actually (I hope!) benign:

Dec  3 16:28:26 hugin ods-signerd: [worker[4]] CRITICAL: failed to sign zone <zone>: General error

(because that zone has yet to be transferred from the hidden master,
and is therefore not available) and this makes it quite difficult as
an operator to relate to *anything* OpenDNSSEC logs -- it all too
frequently cries "Wolf! Wolf!".  I believe this is a problem which
needs to be solved.  I do realize that's no small task...

...and in the sequence where the zones were added, ods-enforcerd
complained:

Dec  3 16:12:58 hugin ods-enforcerd: INFO: Promoting ZSK from publish to active as this is the first pass for the zone
Dec  3 16:12:58 hugin ods-enforcerd: ERROR: Trying to make non-backed up ZSK active when RequireBackup flag is set

Yes, I've set RequireBackup, but surely that doesn't mean I've
committed an operational error?  Seen from an operator's point of
view, this is "Wolf! Wolf!" once again.


Meanwhile, I've run "ods-signer" and listed the work queue.  It
remains more or less steady at 366 tasks scheduled, one per zone, many
of them of this type:

On Thu Dec  3 18:33:43 2015 I will [read] zone <zone>

Typically, it's "working" on 4 of them:

cmd> queue
It is now Thu Dec  3 17:40:04 2015
Working with task [read] on zone <zone1>
Working with task [read] on zone <zone2>
Working with task [read] on zone <zone3>
Working with task [read] on zone <zone4>

I have 362 tasks scheduled.
...
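
For reference, a rough sketch of how a saved copy of the "queue"
listing could be tallied by task type.  It assumes the output looks
exactly like the lines quoted above; the function name tally_queue is
made up for illustration and is not part of OpenDNSSEC.

```python
import re
from collections import Counter

# Both patterns assume the exact "queue" output format quoted above.
SCHEDULED = re.compile(r"^On .+ I will \[(\w+)\] zone (\S+)$")
WORKING = re.compile(r"^Working with task \[(\w+)\] on zone (\S+)$")


def tally_queue(text):
    """Return (scheduled, working) Counters keyed by task type.

    Hypothetical helper: 'text' is the captured output of the
    "queue" command, one task per line.
    """
    scheduled, working = Counter(), Counter()
    for line in text.splitlines():
        line = line.strip()
        m = SCHEDULED.match(line)
        if m:
            scheduled[m.group(1)] += 1
            continue
        m = WORKING.match(line)
        if m:
            working[m.group(1)] += 1
    return scheduled, working
```

Comparing the [read] counts from two listings taken some time apart
would show whether the backlog of unread zones is shrinking at all.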

However, again, when ods-signerd says "working with task [read]", it
appears it's always talking about "reading from the file system".
While it's doing this, *NO* activity is seen with my packet sniffer
related to these zones towards the hidden master.

I can give "ods-signer" the "flush" command, and while it re-schedules
the various tasks it has queued, it is not making any progress AT ALL
on transferring ANY of the newly added 320 (minus 51) zones which
remain.  According to the log (and the packet sniffer), ods-signerd's
xfrd task is periodically probing some of the old already-established
zones, but out of the 320 zones added, only 51 have made it, and by the
looks of it, none of the remaining newly added zones will ever
automatically be transferred from the hidden master.

The last entry I have in the log from the xfrd sub-task of ods-signerd
related to the newly added zones is:

Dec  3 16:28:26 hugin ods-signerd: [xfrd] zone <zone> transfer done [notify acquired 0, serial on disk 2015120315, notify serial 0]

and the local time is now well past 18:30.
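
A minimal sketch of how the zones that never completed a transfer
could be picked out of the syslog, assuming every successful transfer
produces an "[xfrd] ... transfer done" line in the format quoted
above.  The function name and the idea of feeding it a list of the
newly added zone names are illustrative assumptions.

```python
import re

# Assumes the syslog format of the "[xfrd] zone ... transfer done"
# line quoted above.
XFR_DONE = re.compile(r"ods-signerd: \[xfrd\] zone (\S+) transfer done")


def zones_never_transferred(log_lines, added_zones):
    """Return the added zones with no 'transfer done' line in the log.

    Hypothetical helper: 'log_lines' is an iterable of syslog lines and
    'added_zones' an iterable of zone names as they appear in the log.
    """
    done = set()
    for line in log_lines:
        m = XFR_DONE.search(line)
        if m:
            done.add(m.group(1))
    return sorted(set(added_zones) - done)
```

Run against the log since the zones were added, this would list the
zones still waiting for their first transfer.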

Bumping the serial number on the hidden master for some of the new
not-yet-transferred zones and sending a notify just produces this
message in the OpenDNSSEC log:

Dec  3 18:21:30 hugin ods-signerd: [query] ignore notify from <hidden-master>: zone <zone> transfer in progress

to which I can only say "rubbish!", as a zone transfer is most
definitely *NOT* in progress -- both the packet sniffer and the
display of the open FDs of ods-signerd disagree.


Is it any wonder I'm frustrated with what appears to be an utter lack
of robustness in this area of functionality?

It looks like the only recourse I have is to restart OpenDNSSEC, and
then I'll once again get the problem that it falls over due to the
contents of the tmp/ files it has created itself.  Double sigh!


Regards,

- Håvard


