[Opendnssec-user] Problems adding largish # of zones

Yuri Schaeffer yuri at nlnetlabs.nl
Thu Dec 17 13:12:46 UTC 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Håvard,

> Thanks for enduring my rant. :) Looking forward to see what you
> find.

Thanks again for bringing this to our attention, and for your analysis.
My grip on the problem has grown over the last few days. Although I do
not have a proper fix, I do know how to alleviate the pain in the short
term and can comment on how to recover from a situation where zones get
stuck.

First of all, as you have found out, there is a list of current TCP
connections. When the number of concurrent connections stays below the
size of this list everything is fine and works as it should. When it
doesn't, the signerd executes a different code path, holding off those
connections for later. This code path doesn't work as it should.
Judging from your analysis you are well aware of this.

So, to cut to the chase: based on my testing, in the short term your
troubles should go away if you increase the value of this define in
tcpset.h:

#define TCPSET_MAX 50

Make this something on the order of the number of zones you are adding
at once. I'd stay a bit below 1024 so as to leave the signerd some room
for other file descriptors. So I'd advise maybe 500 to 900.
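
As a rough illustration of the sizing advice above, here is a minimal
sketch in C. The helper suggest_tcpset_max() and its parameters are
hypothetical and not part of OpenDNSSEC; it assumes the 1024 mentioned
above is the per-process file descriptor limit, and that you never want
to go below the shipped default of 50.

```c
#include <assert.h>
#include <stddef.h>

/* Shipped default from tcpset.h, as quoted in this message. */
#define TCPSET_DEFAULT 50

/*
 * Hypothetical helper (not an OpenDNSSEC function): suggest a value for
 * TCPSET_MAX given the number of zones added at once, the descriptor
 * limit of the process (assumed to be the 1024 mentioned above), and
 * the number of descriptors to keep free for everything else.
 */
static size_t suggest_tcpset_max(size_t zones, size_t fd_limit, size_t reserve)
{
    assert(fd_limit > reserve);

    /* Highest value that still leaves `reserve` descriptors free. */
    size_t ceiling = fd_limit - reserve;

    /* Aim for the batch size, but stay within [default, ceiling]. */
    size_t value = zones < ceiling ? zones : ceiling;
    return value < TCPSET_DEFAULT ? TCPSET_DEFAULT : value;
}
```

With a limit of 1024 and 124 descriptors held back, a batch of 320 zones
gives 320, while a batch of 5000 zones is capped at 900, the upper end of
the range suggested above.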

Then there is also some SOA handling that behaves a bit inconsistently
depending on the number of notifies coming in. I recommend applying
the attached patch: serial_handling.diff

We'd appreciate you testing these changes. We have not yet decided
whether we'll release this or wait until we have found a proper fix.
Some test feedback would be awesome!

Lastly, I'd like to address 'getting unstuck'. You mentioned that
restarting the signer and removing temp files helps partially. I can
clarify this a bit. In my tests I've seen two scenarios: 1) N zones were
added to ODS, where N>TCPSET_MAX. 2) N zones received a notify, where
N>TCPSET_MAX.

1) For me, in the first scenario stopping and starting the signer
helps, though the signerd will get stuck again after the following
TCPSET_MAX connections. So when adding 320 zones you'd have to go
through 7 stop/start iterations. Of course, these zones will likely
update at the same time later on, in which case you end up in
situation 2).

2) This time restarting alone does not work. The stuckness is
persistent. What helps is stopping the signer, removing the tmp files
like you did, starting the signer, and applying the stop/start strategy
from scenario 1).
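
The figure of 7 iterations for 320 zones appears to follow from rounding
320/50 up, with 50 being the default TCPSET_MAX. A minimal sketch of that
arithmetic in C; the helper name batches_needed() is hypothetical and
only illustrates the calculation:

```c
#include <assert.h>

/*
 * Hypothetical helper (not an OpenDNSSEC function): number of batches
 * of at most `tcpset_max` connections needed to get through `zones`
 * zones, i.e. zones / tcpset_max rounded up.
 */
static unsigned batches_needed(unsigned zones, unsigned tcpset_max)
{
    assert(tcpset_max > 0);
    return (zones + tcpset_max - 1) / tcpset_max;
}
```

For 320 zones this gives 7 with TCPSET_MAX at 50, and a single batch
(so nothing gets held off) with TCPSET_MAX at 500.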

These two 'fixes' work better for higher values of TCPSET_MAX, and I
don't really see a disadvantage to raising it. You'll use a couple of
kilobytes more memory on the heap; you won't see it in top. ;)

In the meantime we'll keep looking into an actual fix. I hope, though,
that these suggestions will relieve some of your pain.

Regards,
Yuri
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iEYEARECAAYFAlZytM4ACgkQI3PTR4mhavi6zwCfddnqXoq7JKtr3HgcBgsy7XiZ
z9YAoJmpZgQnuNxMJ+ZpnxDrYj9T8cwI
=QqUh
-----END PGP SIGNATURE-----
-------------- next part --------------
A non-text attachment was scrubbed...
Name: serial_handling.diff
Type: text/x-patch
Size: 4907 bytes
Desc: not available
URL: <http://lists.opendnssec.org/pipermail/opendnssec-user/attachments/20151217/a10a8966/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: serial_handling.diff.sig
Type: application/pgp-signature
Size: 72 bytes
Desc: not available
URL: <http://lists.opendnssec.org/pipermail/opendnssec-user/attachments/20151217/a10a8966/attachment-0005.bin>