[Opendnssec-develop] Re: [OpenDNSSEC] #262: Possible race condition causing CPU-bound loop in signerd?

OpenDNSSEC owner-dnssec-trac at kirei.se
Tue Oct 25 13:27:31 UTC 2011


#262: Possible race condition causing CPU-bound loop in signerd?
-------------------------------+--------------------------------------------
Reporter:  goeran@…            |        Owner:  matthijs
    Type:  defect              |       Status:  accepted
Priority:  major               |    Component:  Signer  
 Version:  1.3.0               |   Resolution:          
Keywords:  CPU-bound loop      |  
-------------------------------+--------------------------------------------

Comment (by matthijs):

 Hi,

 I had the time to take a look at the strace. If I am correct, you have
 configured 8 worker threads and 2 signer threads.

 I see that all 8 threads are putting RRsets in the sign queue, so there
 are 8 zones in parallel being signed at the moment. The strace shows that
 7 of them are waiting for q_lock and one of them is releasing q_lock, so
 that part is probably working as intended.

 All 8 workers have a lock on zone_lock (zone_lock is different in all of
 these cases, because each zone has its own zone_lock). The command
 handler has received an update command for the zone "chalmers.eu", so it
 requires the zone_lock for that zone. Probably, one of the workers
 currently has that zone_lock.

 The two drudger threads are sleeping. That is kind of strange. They should
 have received a broadcast signal as soon as the threshold of 1 queued RRset
 was reached:

     if (count == 0 && q->count == 1) {
         lock_basic_broadcast(&q->q_threshold);
         ods_log_deeebug("[%s] threshold %u reached, notify drudgers",
             fifoq_str, q->count);
     }
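
 To make the threshold check easier to reason about, here is a minimal
 single-threaded model of it. This is only a sketch: demo_fifoq, demo_push,
 demo_pop and DEMO_FIFOQ_MAX are hypothetical names and values, not the
 signer's own, and the counter "notified" merely stands in for the call to
 lock_basic_broadcast().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_FIFOQ_MAX 4  /* hypothetical capacity, not the signer's value */

/* Hypothetical stand-in for the signer's FIFO queue. */
typedef struct {
    size_t count;       /* items currently queued */
    unsigned notified;  /* times the "broadcast" would have fired */
} demo_fifoq;

/* Push one item, as a worker would. Returns false if the queue is full. */
static bool demo_push(demo_fifoq *q)
{
    size_t before;

    assert(q != NULL);
    before = q->count;
    if (before >= DEMO_FIFOQ_MAX) {
        return false;   /* full: the caller has to retry later */
    }
    q->count = before + 1;
    if (before == 0 && q->count == 1) {
        q->notified++;  /* stands in for lock_basic_broadcast() */
    }
    return true;
}

/* Pop one item, as a drudger would. Returns false if the queue is empty. */
static bool demo_pop(demo_fifoq *q)
{
    assert(q != NULL);
    if (q->count == 0) {
        return false;
    }
    q->count--;
    return true;
}
```

 In this model only a push onto an empty queue notifies. Pushes onto a queue
 that already holds items never do, so a drudger that is asleep while the
 queue is non-empty would have to be woken by some other means.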

 Given this reasoning, my analysis is that the RRset queue is full, so the
 workers cannot queue the whole zone for signing. Because of that, they
 cannot finish their job and release their zone_lock. You notice this
 because an ods-signer update command requires the zone_lock of a zone that
 is being re-signed at the moment.
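
 The suspected dependency can be sketched in a few lines of C. This is an
 illustration of the hypothesis, not the signer's code: demo_zone,
 demo_worker_queue, demo_update_try and DEMO_QUEUE_MAX are hypothetical, and
 the sketch uses pthread_mutex_trylock() so that it reports the blocked
 update instead of hanging the way a blocking lock would.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_QUEUE_MAX 2  /* hypothetical queue capacity */

/* Hypothetical stand-in for a zone with its own zone_lock. */
typedef struct {
    pthread_mutex_t zone_lock;
    size_t rrsets_left;  /* RRsets the worker still has to queue */
} demo_zone;

/* Worker step, called with zone_lock held: queue RRsets until the zone is
 * done or the queue is full. Returns true only when the whole zone has
 * been queued; on false the caller still holds zone_lock. */
static bool demo_worker_queue(demo_zone *z, size_t *queued)
{
    assert(z != NULL && queued != NULL);
    while (z->rrsets_left > 0) {
        if (*queued >= DEMO_QUEUE_MAX) {
            return false;  /* queue full, nothing drains it */
        }
        (*queued)++;
        z->rrsets_left--;
    }
    return true;
}

/* Command handler step: try to take zone_lock for an update.
 * Returns 0 on success, or EBUSY while the lock is still held. */
static int demo_update_try(demo_zone *z)
{
    int rc;

    assert(z != NULL);
    rc = pthread_mutex_trylock(&z->zone_lock);
    if (rc == 0) {
        pthread_mutex_unlock(&z->zone_lock);
    }
    return rc;
}
```

 As long as nothing drains the queue, demo_worker_queue() keeps returning
 false, the worker keeps zone_lock, and demo_update_try() keeps returning
 EBUSY. That is the stall the update command would run into under this
 hypothesis.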

 That is my best guess. If someone has any other insights, please provide
 them. In the meantime, I will investigate if and how this scenario is
 possible.

-- 
Ticket URL: <http://trac.opendnssec.org/ticket/262#comment:5>
OpenDNSSEC <http://www.opendnssec.org/>