[Opendnssec-user] ods-signer taking a very long time to complete

Matthijs Mekking matthijs at nlnetlabs.nl
Mon Jul 23 12:16:24 UTC 2012


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Paul,

When you issue a "sign zone" command, the signer takes a couple of
locks in turn:

- zonelist lock (zl_lock): to look up the zone. zonelist unlock.

- zone lock (zone_lock), schedule lock (schedule_lock): to reschedule
  the zone task. schedule unlock, zone unlock.

The zl_lock and schedule_lock should only be held briefly.
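
To make the lock order above concrete, here is a minimal sketch using
plain pthread mutexes. The struct layouts and the names zonelist_lookup
and cmd_sign_zone are made up for illustration; they are not the actual
signer code, only the same sequence of lock and unlock steps.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical, simplified types; not the real OpenDNSSEC structures. */
struct zone {
    const char *name;
    pthread_mutex_t zone_lock;   /* also held by a worker while it signs */
    time_t task_when;            /* when the zone task should run next */
};

struct zonelist {
    pthread_mutex_t zl_lock;
    struct zone *zones;
    size_t count;
};

struct schedule {
    pthread_mutex_t schedule_lock;
    size_t rescheduled;          /* stand-in for the real task queue */
};

/* Step 1: zonelist lock, look up the zone, zonelist unlock. */
struct zone *zonelist_lookup(struct zonelist *zl, const char *name)
{
    struct zone *found = NULL;
    size_t i;

    assert(zl != NULL && name != NULL);
    pthread_mutex_lock(&zl->zl_lock);
    for (i = 0; i < zl->count; i++) {
        if (strcmp(zl->zones[i].name, name) == 0) {
            found = &zl->zones[i];
            break;
        }
    }
    pthread_mutex_unlock(&zl->zl_lock);
    return found;
}

/* Step 2: zone lock, schedule lock, reschedule, then unlock in reverse.
 * Returns 0 on success, -1 if the zone is unknown. */
int cmd_sign_zone(struct zonelist *zl, struct schedule *sched,
                  const char *name, time_t now)
{
    struct zone *z = zonelist_lookup(zl, name);

    if (z == NULL) {
        return -1;
    }
    /* This is the call that can block for as long as signing takes. */
    pthread_mutex_lock(&z->zone_lock);
    pthread_mutex_lock(&sched->schedule_lock);
    z->task_when = now;
    sched->rescheduled++;
    pthread_mutex_unlock(&sched->schedule_lock);
    pthread_mutex_unlock(&z->zone_lock);
    return 0;
}
```

In this sketch the only place the command can stall is the
pthread_mutex_lock on zone_lock.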

The zone_lock can be held for a considerable amount of time: It is
held while signing the zone (read in file, update nsec records, sign
zone, write out file). Could it be that you issue the command while
the signer is busy with these tasks? How long does it normally
take to sign the zone (from reading in to writing out)?

If this sounds plausible, I think we can fix this by splitting the
lock: One for updating the task and zone, one for fiddling with the
zone contents.
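
A rough sketch of what such a split could look like, again with plain
pthread mutexes. The names split_zone, task_lock and contents_lock are
hypothetical and only illustrate the idea; this is not a patch against
the signer.

```c
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical zone with the single zone_lock split in two. */
struct split_zone {
    pthread_mutex_t task_lock;     /* short: task and zone bookkeeping */
    pthread_mutex_t contents_lock; /* long: held while signing */
    time_t task_when;
};

/* Command path ("sign zone"): touches only task_lock, so it does not
 * have to wait for a signing run that holds contents_lock. */
int split_zone_reschedule(struct split_zone *z, time_t when)
{
    assert(z != NULL);
    pthread_mutex_lock(&z->task_lock);
    z->task_when = when;
    pthread_mutex_unlock(&z->task_lock);
    return 0;
}

/* Worker path: hold contents_lock for the whole read/sign/write run. */
void split_zone_begin_sign(struct split_zone *z)
{
    assert(z != NULL);
    pthread_mutex_lock(&z->contents_lock);
}

void split_zone_end_sign(struct split_zone *z)
{
    assert(z != NULL);
    pthread_mutex_unlock(&z->contents_lock);
}
```

With this layout, a reschedule issued between split_zone_begin_sign and
split_zone_end_sign returns right away, because the two paths no longer
share a mutex.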

Best regards,
  Matthijs


On 07/17/2012 05:10 PM, Paul Wouters wrote:
> 
> I've been trying to figure out why at times, sending an
> "ods-signer sign zonename" command seems to just hang there for
> extremely long times. I can see why the ods-signerd takes some
> time, but just sending the command over the socket should not stall
> for like 20+ minutes, it should take at most a few seconds.
> 
> Attaching gdb, I see:
> 
> (gdb) bt
> #0  0x00000030dd40b75b in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x0000000000417d56 in ods_thread_wait (cond=0x146e508, lock=0x146e538, wait=3600) at shared/locks.c:135
> #2  0x000000000040da85 in engine_run (engine=0x146e480, single_run=0) at daemon/engine.c:654
> #3  0x000000000040f0fa in engine_start (cfgfile=<value optimized out>, cmdline_verbosity=<value optimized out>, daemonize=<value optimized out>, info=0, single_run=0) at daemon/engine.c:1022
> #4  0x0000000000405f97 in main (argc=<value optimized out>, argv=<value optimized out>) at ods-signerd.c:165
> 
> Which leads to:
> 
> lock_basic_lock(&engine->signal_lock);
> if (engine->signal == SIGNAL_RUN && !single_run) {
>     ods_log_debug("[%s] taking a break", engine_str);
>     lock_basic_sleep(&engine->signal_cond, &engine->signal_lock, 3600);
> }
> lock_basic_unlock(&engine->signal_lock);
> 
> I'm a little worried here about locks that just last 1h. If
> things are handled "every hour since startup", then timings between
> two different signers can be off by quite a bit. Would it perhaps
> make sense to make this 3600 a configurable item so we can have
> ods-signerd check more frequently to see if there is work to be
> done?
> 
> Additionally, I would expect that ods-signer would wake up
> ods-signerd out of whatever lock it has to do "immediate work", but
> looking at our ods-signer command taking longer than 20 minutes,
> I'm not sure if this is guaranteed to happen, and it might be that
> ods-signer is waiting for "time since started modulo 3600" before
> it can deliver its command to the daemon.
> 
> Paul
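
On the 3600 second wait quoted above: a timed wait on a condition
variable does not have to run to its timeout. pthread_cond_timedwait
returns as soon as another thread signals the condition, so the timeout
is only an upper bound on the sleep. Below is a minimal sketch of that
pattern. The names sleeper, sleeper_wait and sleeper_wake are
hypothetical, and whether the signer's command handler signals
engine->signal_cond in this way is an assumption, not something the
backtrace confirms.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for the engine's signal lock and condition. */
struct sleeper {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int work_pending;
};

/* Sleep for at most timeout_secs, or until work is flagged.
 * Returns 1 if there was work, 0 if the wait timed out. */
int sleeper_wait(struct sleeper *s, int timeout_secs)
{
    struct timespec deadline;
    int rc = 0;
    int got;

    assert(s != NULL);
    /* The default condition clock is CLOCK_REALTIME, which time() follows. */
    deadline.tv_sec = time(NULL) + timeout_secs;
    deadline.tv_nsec = 0;

    pthread_mutex_lock(&s->lock);
    /* Loop to cope with spurious wakeups; stop on work or on timeout. */
    while (!s->work_pending && rc != ETIMEDOUT) {
        rc = pthread_cond_timedwait(&s->cond, &s->lock, &deadline);
    }
    got = s->work_pending;
    s->work_pending = 0;
    pthread_mutex_unlock(&s->lock);
    return got;
}

/* Called by whoever has immediate work, e.g. a command handler. */
void sleeper_wake(struct sleeper *s)
{
    assert(s != NULL);
    pthread_mutex_lock(&s->lock);
    s->work_pending = 1;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}
```

If the daemon is woken like this, the length of the timeout does not by
itself delay a command; a long stall would then point at a lock that is
held elsewhere.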

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJQDUCYAAoJEA8yVCPsQCW5Mr4IAMyuLilgOvHjMeIKRq5M36Ya
Qip4+80o95NUcTJSHV9vYamtK8xq8Q9A8fCcKI5JW8O8dUXK0D+aMD6qzTAYWgIf
7zqkd79Z5sigLeRVkQMHOJD/XP/8/ZJqwNCLzVD6KIc7/9J/a57vuQ9SgOrD88Dh
N4gGk9Te2/gnBMyEs8MoIg1bs/VMJwLFGYepnOSv3SBlzzr1nci6zeN/Yangd8T4
dh/xKKlrznieebcqnfgVf2t/ucYazlLIcgfn7QEqNYHsXFY4IElxMymFRao3mnWP
aYiq2D+wUBmI6ocZr1UOJG4jzMhjpHa4NHKmw/SNuI4VszaZdW2NR+OS2u4Y0jc=
=jRLf
-----END PGP SIGNATURE-----


