[Opendnssec-user] Help with random signer crash

Matthijs Mekking matthijs at NLnetLabs.nl
Mon Feb 14 12:18:00 UTC 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Sebastian,

This sounds like a race condition to me. Although I have not been able
to reproduce it, it looks to me like the command handler is handling
the sign commands before the workers have been created.

In the OpenDNSSEC-1.2 branch, I made a fix so that the tasklist, zonelist
and workers are created before the command handler is started (see
below). I would guess that this prevents the race condition from
happening (as well as the one you posted later).

Best regards,

Matthijs


Modified: branches/OpenDNSSEC-1.2/signer/src/daemon/engine.c
===================================================================
--- branches/OpenDNSSEC-1.2/signer/src/daemon/engine.c	2011-02-11 13:11:34 UTC (rev 4436)
+++ branches/OpenDNSSEC-1.2/signer/src/daemon/engine.c	2011-02-14 10:34:57 UTC (rev 4437)
@@ -552,6 +552,11 @@
     se_log_assert(engine->config);
     se_log_debug("perform setup");

+    /* set up the work floor */
+    engine->tasklist = tasklist_create(); /* tasks */
+    engine->zonelist = zonelist_create(); /* zones */
+    engine_create_workers(engine); /* workers */
+
     /* create command handler (before chowning socket file) */
     engine->cmdhandler = cmdhandler_create(engine->config->clisock_filename);
     if (!engine->cmdhandler) {
@@ -662,11 +667,6 @@
         return 1;
     }

-    /* set up the work floor */
-    engine->tasklist = tasklist_create(); /* tasks */
-    engine->zonelist = zonelist_create(); /* zones */
-    engine_create_workers(engine); /* workers */
-
     return 0;
 }
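
The reordering in the diff above can be illustrated with a small
stand-alone sketch. It is not the OpenDNSSEC code: the types and
functions below (engine_t, tasklist_t, cmdhandler_run, engine_setup,
engine_teardown) are hypothetical stand-ins, and it assumes the command
handler runs in its own thread and may receive a sign command as soon as
it is started. The point it shows is only that the tasklist is allocated
before the handler thread exists, so the handler's assertion cannot fire.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real OpenDNSSEC structures. */
typedef struct {
    int pending;            /* number of queued sign tasks */
} tasklist_t;

typedef struct {
    tasklist_t *tasklist;   /* must exist before any command is handled */
    pthread_t   cmdhandler; /* thread that services commands */
} engine_t;

/* Command handler thread: simulates a "sign" command arriving at once. */
void *cmdhandler_run(void *arg)
{
    engine_t *engine = arg;
    /* Plays the role of the assertion that failed in the report. */
    assert(engine->tasklist != NULL);
    engine->tasklist->pending++;
    return NULL;
}

/* Fixed ordering: set up the work floor first, then start the handler.
 * Returns 0 on success, 1 on failure. */
int engine_setup(engine_t *engine)
{
    engine->tasklist = calloc(1, sizeof(*engine->tasklist));
    if (engine->tasklist == NULL) {
        return 1;
    }
    if (pthread_create(&engine->cmdhandler, NULL, cmdhandler_run,
                       engine) != 0) {
        free(engine->tasklist);
        engine->tasklist = NULL;
        return 1;
    }
    return 0;
}

/* Waits for the handler, frees the tasklist, returns the task count. */
int engine_teardown(engine_t *engine)
{
    int pending;
    pthread_join(engine->cmdhandler, NULL);
    pending = engine->tasklist->pending;
    free(engine->tasklist);
    engine->tasklist = NULL;
    return pending;
}
```

Calling engine_setup() followed by engine_teardown() on an engine_t
should report one queued task. If the calloc() were moved after
pthread_create(), the handler could run while engine->tasklist is still
unset, which is the kind of ordering problem the fix is meant to rule
out.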

On 02/11/2011 05:38 AM, Sebastian Castro wrote:
> My apologies in advance for this message, which is more venting than bug
> report.
> 
> In our testing environment we are hitting unexpected crashes from the
> signer, and it refuses to shed any light on where they come from.
> 
> The setup is a server running CentOS 5.5, equipped with an SCA6000,
> running openCryptoki and OpenDNSSEC 1.2.0.
> 
> During startup, ods-signerd spits out the message below and reaches a
> state where none of the threads can progress because there is no
> "listener" behind the control socket (engine.sock):
> 
> sign2 openCryptokiModule[28609]: daemon/cmdhandler.c:209:
> cmdhandler_handle_cmd_sign: assertion cmdc->engine->tasklist failed
> 
> Note: for some reason we haven't investigated, the messages are logged
> under openCryptokiModule and not ods-signerd.
> 
> We've tried repeatedly to run ods-signerd using valgrind and gdb, and in
> both cases the signer DOESN'T CRASH!
> 
> So, in my search for wisdom I must ask: has anyone seen this error (I
> doubt it)? If not, would anyone recommend a strategy to collect more
> useful information towards a diagnosis?
> 
> Cheers,
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJNWR13AAoJEA8yVCPsQCW5ISEH/31ItYK5ivOvc0t/SgZV1MoQ
1uSYYEztMduMyP/BZAk1C03O/fLnMb5l57i8ShnbcnUdgknOpito9vUctPusnjc7
bzcQ8WeagswzitW7llsqRNLeIKiPH37lJYnfxK25538xXuaYlcjrZrjJZENVzHab
+9ffEOqxffMbfSCCw9uwPZ2CQ7an01sqaR0fMAjwaWudMFw5w+Uo51I/1J1CRGVi
z3JrmIPTDcWLBkL0bebL/VvIPJtCKYwnuGxQy6V/0QpvQkxaKg9ylXjMEXwrbgNN
KreC/D1AUHEtgoLc8W2cZCwASKcpENFstWQ1fMuGju4aEobaiQSiGfTn5SZOzQg=
=s8f9
-----END PGP SIGNATURE-----


