[Opendnssec-user] Zone stuck, not updating

Havard Eidnes he at uninett.no
Thu Oct 30 14:25:22 UTC 2014


>>> I'm using DNS zone transfers in and out of OpenDNSSEC with OpenDNSSEC
>>> version 1.4.6.  It looks like one of the zones have become wedged, and
>>> OpenDNSSEC refuses to transfer a new copy, despite a new SOA being
>>> announced via DNS notify.  ods-signerd logs:
>>>
>>> <timestamp+host> ods-signerd: [query] ignore notify from a.b.c.d: zone
>>> xxx.yyy.no transfer in progress
>>
>> This may be a bit misleading log message: The query code checks
>> whether there is already a notify acquired. If so, there is a check to
>> see if the incoming notify has a serial newer than OpenDNSSEC knows
>> of. If not, it will log this message.
>>
>> "Look, I got a notify already and need to transfer anyway" was perhaps
>> a better log message. Or perhaps "updated notify serial to
>> <new_serial>".
>
> Hmm...  That doesn't match with the observed behaviour.  What I saw
> was that I did an update of the zone on the hidden master, but the new
> zone with the updated SOA version number (for the SOA versioning
> regime between the hidden master and OpenDNSSEC) was not being
> transferred to the OpenDNSSEC host.  This state persisted for at least
> a day, until the user who requested the additions complained that they
> were still not visible in the public DNS, and an investigation
> confirmed this -- the distribution master which is at the exit portion
> of OpenDNSSEC didn't have the newly added records.
> 
> Hm, I may have read the code in query_process_notify() wrong, and my
> initial explanation of the bug was possibly wrong.  But at least I'm
> pretty certain of my observed behaviour: changes from the hidden
> master did not make it through OpenDNSSEC and out to the publication
> master, and stopping OpenDNSSEC, removing the xfrd.state file and
> restarting OpenDNSSEC fixed the logjam.

OpenDNSSEC has now decided that the zone has "expired".  The
Serial line from the .xfrd-state file looks like this:

;;Serial: xfr 2014091709 1414052224 notify 2014102302 1414059442 disk 2014091709 1414052224

Decoded:

serial_xfr = 2014091709
serial_xfr_acquired = Thu Oct 23 10:17:04 CEST 2014
serial_notify = 2014102302
serial_notify_acquired = Thu Oct 23 12:17:22 CEST 2014
serial_disk = 2014091709
serial_disk_acquired = Thu Oct 23 10:17:04 CEST 2014

So the notify serial# received is much newer than the one on disk
or from the last zone transfer.

However, it looks like serial_notify_acquired is not updated; in
my log I have a log message saying:

Oct 28 11:39:49 ods-host ods-signerd: [query] ignore notify from x.x.x.x: zone x.y.z.in-addr.arpa transfer in progress

Further, it seems that the signer is making repeated attempts at
doing IXFR zone transfer of the zone.  OpenDNSSEC is logging

Oct 28 20:07:37 ods-host ods-enforcerd: Zone x.y.z.in-addr.arpa found.
Oct 28 20:07:37 ods-host ods-enforcerd: Policy for x.y.z.in-addr.arpa set to de
fault.
Oct 28 20:07:37 ods-host ods-enforcerd: Config will be output to /var/opendnssec/si
gnconf/x.y.z.in-addr.arpa.xml.
Oct 28 20:07:37 ods-host ods-enforcerd: No change to: /var/opendnssec/signconf/46.3
9.128.in-addr.arpa.xml
Oct 28 20:13:01 ods-host ods-signerd: [STATS] x.y.z.in-addr.arpa 2014102801 RR[
count=0 time=0(sec)] NSEC3[count=0 time=0(sec)] RRSIG[new=2 reused=251 time=0(se
c) avg=0(sig/sec)] TOTAL[time=0(sec)] 
Oct 28 20:58:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request ud
p/ixfr=3180924024 to a.b.c.d
Oct 28 20:58:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received t
oo short udp reply from a.b.c.d, retry tcp
Oct 28 20:58:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tc
p/ixfr=3180924024 to a.b.c.d
Oct 28 20:58:40 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request ud
p/ixfr=3180924024 to a.b.c.d
Oct 28 20:58:40 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received t
oo short udp reply from a.b.c.d, retry tcp
Oct 28 20:58:40 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tc
p/ixfr=3180924024 to a.b.c.d
Oct 28 20:59:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request ud
p/ixfr=3180924024 to a.b.c.d
Oct 28 20:59:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received t
oo short udp reply from a.b.c.d, retry tcp
Oct 28 20:59:10 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tc
p/ixfr=3180924024 to a.b.c.d

However, I do *not* see a log message fitting with this line in
the code for this particular zone:

    ods_log_info("[%s] zone %s transfer done [notify acquired %u, serial on "
        "disk %u, notify serial %u]", xfrd_str, zone->name,
        xfrd->serial_notify_acquired, xfrd->serial_disk,
        xfrd->serial_notify);

On the BIND side (which is the hidden master), I see these lines
corresponding to this:

Oct 28 20:58:10 hidden-master named[4795]: client a.b.e.f#61709/key tsig-key-name (x.y.z.in-addr.arpa): transfer of 'x.y.z.in-addr.arpa/IN': IXFR ended
Oct 28 20:58:40 hidden-master named[4795]: client a.b.e.f#61707/key tsig-key-name (x.y.z.in-addr.arpa): transfer of 'x.y.z.in-addr.arpa/IN': IXFR ended
Oct 28 20:59:10 hidden-master named[4795]: client a.b.e.f#61706/key tsig-key-name (x.y.z.in-addr.arpa): transfer of 'x.y.z.in-addr.arpa/IN': IXFR ended

So...  Can it be that there's an interoperability problem between
OpenDNSSEC and BIND when it comes to doing IXFR?


When zone transfers work, I see

Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa request udp/ixfr=2014102701 to a.b.c.d
Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp
Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa request tcp/ixfr=2014102701 to a.b.c.d
Oct 28 11:39:50 ods-host ods-signerd: [xfrd] zone ok.x.y.in-addr.arpa transfer done [notify acquired 1414492790, serial on disk 2014102804, notify serial 2014102804]


Hmm....  Upping the logging level for the signer should reveal
some more details.  OK, here we go:

Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request [udp round 0 master a.b.c.d:0]
Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name.
Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256.
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=53126
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request udp/ixfr=3180924024 to a.b.c.d
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] send 215 bytes over udp to a.b.c.d
Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event udp read
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa read data from udp
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 3)
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] truncation from a.b.c.d
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa open tcp connection to a.b.c.d
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tcp/ixfr=3180924024 to a.b.c.d
Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name.
Oct 30 13:53:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256.
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=40799
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sending tcp query id=40799
Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp write
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa done writing, get ready for reading
Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read
Oct 30 13:53:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 1)
Oct 30 13:53:08 ods-host ods-signerd: [file] open file  file=x.y.z.in-addr.arpa.xfrd mode=appending
Oct 30 13:53:08 ods-host ods-signerd: [file] openfile x.y.z.in-addr.arpa.xfrd count 1
Oct 30 13:53:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa release tcp connection to a.b.c.d
Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:38 ods-host ods-signerd: [netio] dispatch timeout event without checking for other events
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request [udp round 1 master a.b.c.d:0]
Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name.
Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256.
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] tsig append rr to request id=28997
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request udp/ixfr=3180924024 to a.b.c.d
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] send 215 bytes over udp to a.b.c.d
Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event udp read
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa read data from udp
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 3)
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] truncation from a.b.c.d
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa open tcp connection to a.b.c.d
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tcp/ixfr=3180924024 to a.b.c.d
Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name.
Oct 30 13:53:38 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256.
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] tsig append rr to request id=50256
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sending tcp query id=50256
Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp write
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa done writing, get ready for reading
Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read
Oct 30 13:53:38 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 1)
Oct 30 13:53:38 ods-host ods-signerd: [file] open file  file=x.y.z.in-addr.arpa.xfrd mode=appending
Oct 30 13:53:38 ods-host ods-signerd: [file] openfile x.y.z.in-addr.arpa.xfrd count 1
Oct 30 13:53:38 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa release tcp connection to a.b.c.d
Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:54:08 ods-host ods-signerd: [netio] dispatch timeout event without checking for other events
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request [udp round 2 master a.b.c.d:0]
Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name.
Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256.
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=40835
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request udp/ixfr=3180924024 to a.b.c.d
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] send 215 bytes over udp to a.b.c.d
Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event udp read
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa read data from udp
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa received too short udp reply from a.b.c.d, retry tcp
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 3)
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] truncation from a.b.c.d
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa open tcp connection to a.b.c.d
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa request tcp/ixfr=3180924024 to a.b.c.d
Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with key: tsig-key-name.
Oct 30 13:54:08 ods-host ods-signerd: [domain] tsig sign query with algorithm: hmac-sha256.
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] tsig append rr to request id=51054
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sending tcp query id=51054
Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp write
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa done writing, get ready for reading
Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read
Oct 30 13:54:08 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa xfr packet parsed (res 1)
Oct 30 13:54:08 ods-host ods-signerd: [file] open file  file=x.y.z.in-addr.arpa.xfrd mode=appending
Oct 30 13:54:08 ods-host ods-signerd: [file] openfile x.y.z.in-addr.arpa.xfrd count 1
Oct 30 13:54:08 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa event tcp read
Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout now
Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa release tcp connection to a.b.c.d
Oct 30 13:54:39 ods-host ods-signerd: [xfrhandler] netio dispatch
Oct 30 13:54:39 ods-host ods-signerd: [netio] dispatch timeout event without checking for other events
Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa timeout
Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa sets timer timeout retry 3600
Oct 30 13:54:39 ods-host ods-signerd: [xfrd] zone x.y.z.in-addr.arpa make request wait retry
Oct 30 13:54:39 ods-host ods-signerd: [xfrhandler] netio dispatch

If I manually try to do the IXFR with dig from the OpenDNSSEC
host, it gives me the entire zone, and ends with SOA and TSIG,
and the dig output ends with

% dig @a.b.c.d -y hmac-sha256:<keyname>:<keymaterial> x.y.z.in-addr.arpa. ixfr=2014091709
...
;; Query time: 5 msec
;; SERVER: a.b.c.d#53(a.b.c.d)
;; WHEN: Thu Oct 30 14:08:13 2014
;; XFR size: 130 records (messages 1, bytes 3762)

%

So ... my current best bet is that the problem is a lack of
interoperation with BIND when it in essence sends the whole zone
in response to an IXFR query.

Hmm, looking at tcpdump -v output of a failed transfer attempt by
OpenDNSSEC and a successful one with dig reveals some differences;
I'll send you the details separately, but the start looks ...
"mis-encoded":

    ods-host.uninett.no.56056 > hidden-master.uninett.no.domain: Flags [P.], cksum 0x9bcc (correct), seq 3:218, ack 1, win 4197, options [nop,nop,TS val 1 ecr 2], length 2150 [b2&3=0x1] [1a] [0q] [1n] [564au] ^Bbb^Caaa^Gin-addr^Darpa^@^@M...

vs.

    ods-host.uninett.no.55751 > hidden-master.uninett.no.domain: Flags [P.], cksum 0x858c (correct), seq 1:171, ack 1, win 4197, options [nop,nop,TS val 1 ecr 1], length 17062286 [1n] [2au] IXFR? cc.bb.aaa.in-addr.arpa. ns: cc.bb.aaa.in-addr.arpa. SOA . . 2014091709 0 0 0 0 ar: . ....

and a failed OpenDNSSEC transaction transpires like this:

14:50:45.574240 IP hidden-master.uninett.no.domain > ods-host.uninett.no.56056: Flags [P.], seq 1:186, ack 218, win 4197, options [nop,nop,TS val 2 ecr 1], length 18550176*- 1/0/1 SOA (183)
14:50:45.773002 IP ods-host.uninett.no.56056 > hidden-master.uninett.no.domain: Flags [.], ack 186, win 4197, options [nop,nop,TS val 2 ecr 2], length 0
14:51:15.648025 IP hidden-master.uninett.no.domain > ods-host.uninett.no.56056: Flags [F.], seq 186, ack 218, win 4197, options [nop,nop,TS val 62 ecr 2], length 0

i.e. the BIND instance closes the tcp session after 30 seconds of
inactivity.


So...  Failure to do correct IXFR over TCP, perhaps?

Regards,

- Håvard



More information about the Opendnssec-user mailing list