[Opendnssec-user] Zone stuck, not updating

Havard Eidnes he at uninett.no
Mon Oct 27 09:14:45 UTC 2014


>> I'm using DNS zone transfers in and out of OpenDNSSEC with OpenDNSSEC
>> version 1.4.6.  It looks like one of the zones have become wedged, and
>> OpenDNSSEC refuses to transfer a new copy, despite a new SOA being
>> announced via DNS notify.  ods-signerd logs:
>>
>> <timestamp+host> ods-signerd: [query] ignore notify from a.b.c.d: zone xxx.yyy.no transfer in progress
>>
>> What makes it think it's currently transferring the zone, and is there
>> something I can do to clear that state?  I've done a full restart of
>> OpenDNSSEC via "ods-control stop" and "ods-control start", to no
>> avail.
>
> Did you check to make sure that the ods-signerd process had actually
> exited after invoking "ods-control stop"? I've noticed in our setup that
> if the signer is in a hanging state it may not always respond
> appropriately to the "stop" command and I've had to manually kill the
> process.

Well, I must admit that I didn't there and then; I thought that when
"ods-control stop" said "Stopping signer engine..." and then "Engine
shut down." that it didn't blatantly lie in that last statement...

Anyway, I managed to find what was causing a restarted OpenDNSSEC to
still refuse to act on the notify message, even though no actual zone
transfer is ongoing.  It seems that the zone's internal variable
"serial_notify_acquired" determines if it should act on the received
notify message, ref. query.c's query_process_notify().  Furthermore,
this variable is apparently being read from disk if there exists a
<zone>.xfrd-state file on startup, ref. xfrd.c's xfrd_recover(), with
no further "reality check" whether the transfer is indeed ongoing.
Thus, this state is going to be persistent unless you as an operator
intervene by removing the offending <zone>.xfrd-state file before
starting OpenDNSSEC.

If I run this shell one-liner, I find that I still have two zones in
this state:

  awk '/^;;Serial/ { if ($7 != 0) { print FILENAME " " $7  }}' *.xfrd-state
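
Building on that one-liner, a cautious cleanup could be sketched roughly
as below.  This is only a sketch under assumptions: the function name is
made up, the directory argument is whatever directory holds your
*.xfrd-state files, and it assumes (as the awk test above does) that a
nonzero seventh field on the ";;Serial" line marks the stuck state.  It
refuses to act while an ods-signerd process still exists, and it renames
the files instead of deleting them.

```shell
#!/bin/sh
# Sketch: move aside xfrd-state files that would make a restarted signer
# believe a transfer is still in progress.
# Usage: move_stuck_state DIR   (DIR = where your *.xfrd-state files live)
move_stuck_state() {
    dir=$1
    # Do not rely on "Engine shut down."; look at the process table.
    if pgrep -x ods-signerd >/dev/null 2>&1; then
        echo "ods-signerd is still running; stop it first" >&2
        return 1
    fi
    for f in "$dir"/*.xfrd-state; do
        [ -f "$f" ] || continue
        # Same test as the one-liner: field 7 of the ";;Serial" line.
        if awk '/^;;Serial/ && $7 != 0 { found = 1 } END { exit !found }' "$f"; then
            # Rename rather than delete, so the old state can be inspected.
            mv "$f" "$f.stuck" && echo "moved aside: $f"
        fi
    done
}
```

This would be run between "ods-control stop" and "ods-control start".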

Hmm, it is probably going to be difficult without further
customization of the zone content to write an external tester for
whether this state has been entered, since the SOA serial number
regimes are distinct between the hidden master and OpenDNSSEC --
OpenDNSSEC keeps publishing its own new SOA serials on re-signing.
One possible way is to introduce a TXT record with the original SOA
number from the hidden master to be able to detect this problem
externally...
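
A rough sketch of what such an external check might look like, assuming
the hidden master publishes its own SOA serial in a TXT record at a
hypothetical "_origserial" label under the zone apex (that label, the
function name and the server arguments are all made up for
illustration):

```shell
#!/bin/sh
# Sketch: compare the hidden master's SOA serial with the copy of it that
# the hidden master publishes in a TXT record (hypothetical label
# "_origserial"), as seen in the zone served after signing.
# Usage: check_orig_serial ZONE HIDDEN_MASTER SIGNED_SERVER
# Returns 0 if they agree, 1 if they differ, 2 if a lookup gave nothing.
check_orig_serial() {
    zone=$1 master=$2 signed=$3
    # The serial is the third field of "dig +short SOA" output.
    want=$(dig +short SOA "$zone" "@$master" | awk 'NR == 1 { print $3 }')
    # TXT rdata is printed with quotes; strip them.
    have=$(dig +short TXT "_origserial.$zone" "@$signed" | head -n 1 | tr -d '"')
    if [ -z "$want" ] || [ -z "$have" ]; then
        echo "lookup failed for $zone" >&2
        return 2
    fi
    if [ "$want" != "$have" ]; then
        echo "$zone: hidden master at $want, signed zone still carries $have"
        return 1
    fi
    return 0
}
```

A mismatch is normal for a short while after each update on the hidden
master, so a monitor would presumably only alert if it persists.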

I've also submitted most of these findings in a bug report, ref.

  https://issues.opendnssec.org/browse/SUPPORT-147

I'll concede that this *may* have happened due to a system shutdown,
and processes being stopped only with a TERM signal.  However, this
indicates that the "read state of ongoing zone transfers from disk
file with no further checking" approach is not robust.

Regards,

- Håvard


