[Opendnssec-develop] Re: [Opendnssec-otr] Locking mechanism problems

Matthijs Mekking matthijs at NLnetLabs.nl
Thu Sep 29 08:00:34 UTC 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 09/28/2011 04:59 PM, Jerry Lundström wrote:
> Hi,
> 
> I got this assertion error from Patrik Wallström yesterday and been
> looking into it some:
> 
> ods-signerd: ../../../signer/src/daemon/cmdhandler.c:322:
> cmdhandler_handle_cmd_sign: assertion zone->task failed
> 
> From what I can see the problem is that zonelist_update() may leave zone
> objects with a zone->task set to NULL before engine_update_zones() can
> assign a task to them. This can happen inside
> cmdhandler_handle_cmd_update() because it releases the lock on the
> zonelist before running engine_update_zones():
> 
> 1. cmdhandler_handle_cmd_update : lock zonelist
> 2. cmdhandler_handle_cmd_update : zonelist_update()
> 3. cmdhandler_handle_cmd_update : unlock zonelist
> 4. cmdhandler_handle_cmd_update : engine_update_zones()
> 5. engine_update_zones : lock zonelist
> 6. engine_update_zones : update zones
> 7. engine_update_zones : unlock zonelist
> 
> Some other thread might try and lock the zonelist at step 2 and then it
> would get that lock before engine_update_zones() can do its part.

I have tried to encounter this by checking zone->just_added.

cmdhandler.c:307:        if (zone && zone->just_added) {
cmdhandler.c:308:            zone = NULL;
cmdhandler.c:309:        }

But apparently there is another way to fail the assertion.

> A quick fix to this is to make engine_update_zones() aware of the lock
> on zonelist so it doesn't try and lock it and release it after,
> suggested patch below.
> 
> Altho I am seeing this almost everywhere and I would like to discuss our
> approach. I would really like to see read and write locks implemented to
> better support thread interoperability. Using
> pthread_mutex_trylock()/pthread_mutex_timedlock() where it suites to
> counter hangs. Using PTHREAD_MUTEX_RECURSIVE so different segments can
> lock the same locked object without letting each other know providing
> better OO. Of course as complexity in the locking mechanism increases so
> is the chance of dead locks.

The last sentence is exactly my reason to keep the locking mechanism
simple, so only lock/unlock.

> Why this hasn't turned up more might be that each zone is handled by one
> thread, only time two threads might be working on the same zone is if
> you do manually commands, and that the locking right now is very loose.

Each zone is indeed handled by one *worker* thread, with a manually
update, the cmdhandler/engine may have to update the zone.

> Thoughts?

I am not sure if this patch fixes the problem, When I look at the
assertion error, it looks to me that there is a call that invalidly sets
the zone->task to NULL.

http://trac.opendnssec.org/browser/branches/OpenDNSSEC-1.3/signer/src/daemon/engine.c#L897

Here task may be set to NULL, if the task is being worked on (not
scheduled).

http://trac.opendnssec.org/browser/branches/OpenDNSSEC-1.3/signer/src/daemon/engine.c#L923

Here the zone->task is set to task, which might be NULL if the task was
being worked on. This should not happen.

The same in cmdhandler.c when a update call is received.

Best regards,
  Matthijs


> 
> /Jerry
> 
> Index: signer/src/daemon/cmdhandler.c
> ===================================================================
> --- signer/src/daemon/cmdhandler.c(revision 5654)
> +++ signer/src/daemon/cmdhandler.c(working copy)
> @@ -190,11 +190,11 @@
>              cmdc->engine->zonelist->just_removed = 0;
>              cmdc->engine->zonelist->just_added = 0;
>              cmdc->engine->zonelist->just_updated = 0;
> -            lock_basic_unlock(&cmdc->engine->zonelist->zl_lock);
>  
>              ods_log_debug("[%s] commit zone list changes", cmdh_str);
> -            engine_update_zones(cmdc->engine);
> +            engine_update_zones(cmdc->engine, 1);
>              ods_log_debug("[%s] signer configurations updated", cmdh_str);
> +            lock_basic_unlock(&cmdc->engine->zonelist->zl_lock);
>          } else {
>              lock_basic_unlock(&cmdc->engine->zonelist->zl_lock);
>              (void)snprintf(buf, ODS_SE_MAXLINE, "Zone list has errors.\n");
> Index: signer/src/daemon/engine.c
> ===================================================================
> --- signer/src/daemon/engine.c(revision 5654)
> +++ signer/src/daemon/engine.c(working copy)
> @@ -811,7 +811,7 @@
>   *
>   */
>  void
> -engine_update_zones(engine_type* engine)
> +engine_update_zones(engine_type* engine, int zonelist_is_locked)
>  {
>      ldns_rbnode_t* node = LDNS_RBTREE_NULL;
>      zone_type* zone = NULL;
> @@ -833,8 +833,10 @@
>      now = time_now();
>      reload_zonefetcher(engine);
>  
> -    lock_basic_lock(&engine->zonelist->zl_lock);
> -    /* [LOCK] zonelist */
> +    if (!zonelist_is_locked) {
> +    lock_basic_lock(&engine->zonelist->zl_lock);
> +        /* [LOCK] zonelist */
> +    }
>      node = ldns_rbtree_first(engine->zonelist->zones);
>      while (node && node != LDNS_RBTREE_NULL) {
>          zone = (zone_type*) node->data;
> @@ -929,8 +931,10 @@
>          }
>          node = ldns_rbtree_next(node);
>      }
> -    /* [UNLOCK] zonelist */
> -    lock_basic_unlock(&engine->zonelist->zl_lock);
> +    if (!zonelist_is_locked) {
> +/* [UNLOCK] zonelist */
> +lock_basic_unlock(&engine->zonelist->zl_lock);
> +    }
>      if (wake_up) {
>          engine_wakeup_workers(engine);
>      }
> @@ -1097,7 +1101,7 @@
>          /* update zones */
>          if (zl_changed == ODS_STATUS_OK) {
>              ods_log_debug("[%s] commit zone list changes", engine_str);
> -            engine_update_zones(engine);
> +            engine_update_zones(engine, 0);
>              ods_log_debug("[%s] signer configurations updated",
> engine_str);
>              zl_changed = ODS_STATUS_UNCHANGED;
>          }
> Index: signer/src/daemon/engine.h
> ===================================================================
> --- signer/src/daemon/engine.h(revision 5654)
> +++ signer/src/daemon/engine.h(working copy)
> @@ -101,7 +101,7 @@
>   * \param[in] engine engine
>   *
>   */
> -void engine_update_zones(engine_type* engine);
> +void engine_update_zones(engine_type* engine, int zonelist_is_locked);
>  
>  /**
>   * Clean up engine.
> 
> 
> 
> _______________________________________________
> Opendnssec-otr mailing list
> Opendnssec-otr at lists.opendnssec.org
> https://lists.opendnssec.org/mailman/listinfo/opendnssec-otr

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOhCWiAAoJEA8yVCPsQCW5dJgH/0KfTB/P/xP5BNhqbs0JszrH
kbmqx/Qm58ZDoNDPx1SGh7OP35Y4iV/nYwvxsahOwZAEDXYhDOJ3o8mhJ+mVsFYf
Q5PlteRnv4TYjxY0GOpG3s2wZ5jmo7V5ZSe9HRfoIOQNYdS5lhkBF7NQo0oss/ss
pY3bno83h5HtS69u7Fdv0ezbWVCkHzkf79RAHPSfEC0PAiqjxYqLykhnfA/K8xUN
8cw9fnKGOTF4LuG5W5DpDU3gntRmfKcbiwQHJ/61yJbPR41ksJ6fn9gYZHfaEuph
UqGMh/92YXxFLXocIQ2OaJqW69b+p3P6iYwQYkFALSKIAg3ZRHce96y9XDNDZd4=
=waww
-----END PGP SIGNATURE-----



More information about the Opendnssec-develop mailing list