[Opendnssec-develop] Locking mechanism problems

Matthijs Mekking matthijs at NLnetLabs.nl
Thu Sep 29 08:31:06 UTC 2011

On 09/29/2011 10:23 AM, Jerry Lundström wrote:
> On 2011-09-29 10.00, Matthijs Mekking <matthijs at NLnetLabs.nl> wrote:
>>> A quick fix for this is to make engine_update_zones() aware of the lock
>>> on zonelist so it doesn't try to lock it and release it afterwards;
>>> suggested patch below.
>>> However, I am seeing this almost everywhere and would like to discuss our
>>> approach. I would really like to see read and write locks implemented to
>>> better support thread interoperability, using
>>> pthread_mutex_trylock()/pthread_mutex_timedlock() where it suits to
>>> counter hangs, and using PTHREAD_MUTEX_RECURSIVE so different segments can
>>> lock the same locked object without letting each other know, providing
>>> better OO. Of course, as complexity in the locking mechanism increases, so
>>> does the chance of deadlocks.
>> The last sentence is exactly my reason to keep the locking mechanism
>> simple, so only lock/unlock.
> But the cost might be a crash or, worse, corrupted data.

I am not sure which failure is easier to monitor. If the signer
deadlocks, you run the risk of your signatures expiring, even though it
looks like the signer is still doing its job (it is up and running).

If the signer would crash, of course you run this risk too, but you can
immediately notice that the signer is not running.
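
One way to make a stuck lock as visible as a crash is to bound the wait, using the pthread_mutex_timedlock() call mentioned earlier in the thread. The following is a minimal sketch only: lock_with_timeout() is a hypothetical helper, not an existing OpenDNSSEC function, and what the caller does on ETIMEDOUT (log, abort, retry) is left open.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical helper: try to take `lock`, waiting at most `seconds`.
 * Returns 0 on success, ETIMEDOUT if the lock was not obtained in time,
 * or another error number on failure. */
static int lock_with_timeout(pthread_mutex_t *lock, time_t seconds)
{
    struct timespec deadline;

    /* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME time. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) {
        return errno;
    }
    deadline.tv_sec += seconds;
    return pthread_mutex_timedlock(lock, &deadline);
}
```

A caller that gets ETIMEDOUT back knows the lock has been held for longer than expected and can report it instead of hanging silently.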

> Just using PTHREAD_MUTEX_RECURSIVE would be a good start so that each
> separate segment in the code does not have to know about the locks. I
> don't know how supported PTHREAD_MUTEX_RECURSIVE is across the OS targets
> we have.
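
For reference, PTHREAD_MUTEX_RECURSIVE is a mutex type defined by POSIX and is selected through a mutex attribute object. The sketch below shows the usual initialisation sequence; recursive_mutex_init() is a hypothetical helper name, and whether every OS target of the signer supports the type would still need to be checked.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>

/* Hypothetical helper: initialise `lock` as a recursive mutex.
 * Returns 0 on success or an error number on failure. */
static int recursive_mutex_init(pthread_mutex_t *lock)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0) {
        return rc;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) {
        rc = pthread_mutex_init(lock, &attr);
    }
    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

With such a mutex the owning thread may lock it again without blocking, as long as every lock is matched by an unlock. Other threads still block until the last unlock.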
>>> Thoughts?
>> I am not sure if this patch fixes the problem. When I look at the
>> assertion error, it looks to me as if there is a call that invalidly sets
>> zone->task to NULL.
>> http://trac.opendnssec.org/browser/branches/OpenDNSSEC-1.3/signer/src/daemon/engine.c#L897
>> Here task may be set to NULL, if the task is being worked on (not
>> scheduled).
>> http://trac.opendnssec.org/browser/branches/OpenDNSSEC-1.3/signer/src/daemon/engine.c#L923
>> Here the zone->task is set to task, which might be NULL if the task was
>> being worked on. This should not happen.
>> The same happens in cmdhandler.c when an update call is received.
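
The pattern described above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: struct task, struct zone, unschedule() and zone_reschedule() are simplified stand-ins, not the actual code in engine.c. It only shows the idea of checking the unschedule result before assigning it back to zone->task.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the signer's structures. */
struct task { int scheduled; };
struct zone { struct task *task; };

/* Stand-in for unscheduling: returns the task if it was queued,
 * or NULL if it is currently being worked on (not scheduled). */
static struct task *unschedule(struct task *t)
{
    if (t == NULL || !t->scheduled) {
        return NULL;
    }
    t->scheduled = 0;
    return t;
}

/* Reschedule a zone's task without ever storing NULL in zone->task.
 * Returns 1 if the task was rescheduled, 0 if it was left alone. */
static int zone_reschedule(struct zone *z)
{
    struct task *t = unschedule(z->task);

    if (t == NULL) {
        /* Task is in flight: keep zone->task as it is. */
        return 0;
    }
    z->task = t;
    t->scheduled = 1;
    return 1;
}
```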
> I know that this fix does not fix everything, there are more places this
> can happen and there can be more data structures that might be affected by
> not locking correctly.
> Do we want to do an overhaul of the locks or is it in the plans to redo
> the signer?
> I could start looking at it in a week or so.

I currently have no plans to redo the locking mechanisms. Too busy with
the network support for adapters.

> /Jerry
