[Opendnssec-develop] Re: [Opendnssec-user] How to speed up signing performance

Matthijs Mekking matthijs at nlnetlabs.nl
Tue Dec 11 15:26:13 UTC 2012


On 12/10/2012 09:10 AM, Jerry Lundström wrote:
> Hi,
> 
> On Wed, Nov 21, 2012 at 4:34 PM, Matthijs Mekking <matthijs at nlnetlabs.nl> wrote:
>> The cpu patch was incorporated and is already included in the 1.4 beta.
>>
>> The aggressive retry patch is also applied, with small adjustments:
>>
>> -        if (status == ODS_STATUS_UNCHANGED) {
>> -            worker_wait_timeout_locked(&q->q_lock, &q->q_nonfull, 60);
>> +        if (!tries && status == ODS_STATUS_UNCHANGED) {
>> +            worker_wait_timeout_locked(&q->q_lock, &q->q_nonfull, 5);
>>          }
>>
>> The waiting should only occur when the number of max tries has been
>> reached (tries has been reset to 0). Also, 60 seconds seems indeed a
>> long maximum wait, 5 will do too (If the queue is not full, the timeout
>> should of course be shorter). This change is for now only in trunk, it
>> will be in the 1.4 rc.
> 
> I feel that the assumption that 60 seconds is a long time to wait is false.
> 
> If the locking/condition/signal code for the workers was working then
> you could actually wait forever because it would be signaled when
> there is more work.

True, but it should not take 60 seconds for the queue to become
non-full (non-full means less than 10 percent full). By your argument,
we could also set it to 0: stop waiting only once the queue is empty
enough again.
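
The wait being discussed is a timed condition wait: it ends when the
condition is signalled or when the timeout expires, whichever comes
first. Below is a minimal sketch of such a helper built on POSIX
pthread_cond_timedwait(). The name demo_wait_timeout_locked() and its
signature are hypothetical; this is not the signer's own
worker_wait_timeout_locked(), and whether that helper is built the same
way is an assumption.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Hypothetical stand-in for a "wait with timeout while locked" helper.
 * The caller must already hold `lock`.
 * Returns 0 if the wait ended before the deadline (signal or spurious
 * wakeup), ETIMEDOUT if `seconds` elapsed first.
 */
static int demo_wait_timeout_locked(pthread_mutex_t *lock,
                                    pthread_cond_t *cond,
                                    time_t seconds)
{
    struct timespec deadline;

    assert(lock != NULL && cond != NULL);

    /* pthread_cond_timedwait() takes an absolute deadline; for a
     * default-initialised condition variable the clock is
     * CLOCK_REALTIME. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) {
        return errno;
    }
    deadline.tv_sec += seconds;

    /* POSIX releases the mutex while the thread is blocked and
     * re-acquires it before this call returns. */
    return pthread_cond_timedwait(cond, lock, &deadline);
}
```

With nobody signalling the condition, a call with a timeout of 1 returns
ETIMEDOUT after about one second; a signal or broadcast on the
condition ends the wait earlier.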

> Seeing that this patch has been introduced then it means that that
> code does not work as it should and it could lead to more problems
> like dead locks / slow downs and CPU hogging.

Perhaps the code does not work well; all the more reason not to let the
worker wait that long. The 60 seconds is apparently the bottleneck for
.ca at the moment, so decreasing it to 5 seconds doesn't seem like a
slowdown to me.

Also note that we wait less often: we only wait if tries has been reset.
In other words, we will try queuing the RRset 10 times before going into
wait mode. And we will start trying again after 5 seconds (or when the
queue becomes non-full, whichever comes first).
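
To make the difference between the two wait rules concrete, here is a
small single-threaded simulation that only counts waits. It is a sketch,
not the signer code: the demo_* names are hypothetical, and it assumes
that each failed push increments tries and that tries is reset to 0 on
reaching the maximum of 10.

```c
#include <assert.h>

#define DEMO_MAX_TRIES 10  /* "try 10 times" from the mail */

/* Which wait rule to simulate. */
enum demo_policy {
    DEMO_WAIT_EVERY_FAILURE,   /* old: if (status == UNCHANGED), 60 s   */
    DEMO_WAIT_AFTER_MAX_TRIES  /* new: if (!tries && UNCHANGED), 5 s    */
};

/*
 * Count how many waits occur during `failed_pushes` consecutive failed
 * push attempts (queue full each time).
 * Assumption: tries is incremented per failure and reset to 0 once it
 * reaches DEMO_MAX_TRIES.
 */
static int demo_count_waits(enum demo_policy policy, int failed_pushes)
{
    int tries = 0;
    int waits = 0;
    int i;

    assert(failed_pushes >= 0);

    for (i = 0; i < failed_pushes; i++) {
        tries++;
        if (tries >= DEMO_MAX_TRIES) {
            tries = 0;  /* maximum reached: reset the counter */
        }
        /* Old rule waits on every failure; new rule only after a reset. */
        if (policy == DEMO_WAIT_EVERY_FAILURE || tries == 0) {
            waits++;
        }
    }
    return waits;
}

/* Worst-case seconds spent waiting if every wait runs to its timeout. */
static int demo_worst_case_seconds(enum demo_policy policy, int failed_pushes)
{
    int timeout = (policy == DEMO_WAIT_EVERY_FAILURE) ? 60 : 5;

    return demo_count_waits(policy, failed_pushes) * timeout;
}
```

Under these assumptions, 25 consecutive failed pushes give 25 waits (up
to 1500 seconds) with the old rule, and 2 waits (up to 10 seconds) with
the new one.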

Writing this all down, I think I see why it is waiting for 60 seconds
(and now 5 seconds):

in worker_queue_rrset():
- lock sign queue for queuing another RRset
- sign queue is full, let's take a small break (5 or 60 seconds)

in worker_drudge():
- try to get the sign queue lock, for popping an RRset, but sign queue is full

in worker_queue_rrset():
- stop waiting, unlock sign queue
- continue while loop.

Now there are some iterations for popping and pushing, and perhaps we
hit another 'small break'.

So, tuning down the 60 second wait to 5 seconds is a quick hack. I
think the real fix is waiting in worker_queue_rrset() without holding
the q_lock. Perhaps we need a push lock...


Best regards,
  Matthijs

