[Opendnssec-user] Some issues with OpenDNSSEC 1.3.0 trunk

Sebastian Castro sebastian at nzrs.net.nz
Thu Mar 3 02:53:45 UTC 2011


On 02/23/2011 11:57 PM, Matthijs Mekking wrote:
> Hi Sebastian,

Hi Matthijs,

> 
> On 02/22/2011 05:47 AM, Sebastian Castro wrote:
>> Hi,
> 
>> I'm aware it is a little bit too soon to expect a functional version, but
>> based on the current trunk version I'd like to report:
> 
>> - It seems this version is a memory hog. Running on a system with a 1-GB
>> memory limit per process, ods-signerd reaches the limit quite fast
>> (~30min) when signing a set of zones that includes two relatively large
>> ones [*].
> 
> I did some analysis and found one small leak (RRs we filter out were not
> freed), but that surely can't be the hog. I'll look into it.
> 

We increased the limit to 1.5 GB and the memory consumption of the
ods-signerd process peaked at 1.2 GB, so we no longer get crashes due to
memory allocation. However, we are using just a handful of zones,
excluding our largest one.

>> - Also when a big zone is being signed, we get messages
> 
>> [fifo] unable to push item: max cap reached
> 
> The fifo queue, the queue of RRsets that need to be signed, has a
> capacity. The worker will keep trying to push the RRset until it
> succeeds. This might happen a lot with a big zone. Perhaps the log
> message should not be LOG_WARN, because we know that this can occur.
> 
>> by thousands... then syslog starts complaining afterwards.
> 
>> ods-signerd: last message repeated 1850109 times
> 
>> When it reaches this point, the signerd doesn't make any progress and has
>> to be killed. We are currently testing with a FIFOQ_MAX_COUNT = 50000
> 
> Are you sure it makes no progress? I was able to sign a tld while making
> use of the fifo queue.

I'm afraid I'm hitting a race condition here. When the signer starts
fresh, it can sign all the zones in about 40 minutes (which I think is
too long, but that's material for a different thread). When it has to
re-sign, some zones work and others don't.

After waiting for two hours for any output from the signer in the logs,
I asked it to stop via 'ods-signer stop'. The command returns, but the
process is still there. An inspection using gdb shows four threads running:

Thread 1: engine_run
Thread 2: worker insistently trying to call fifoq_push with no success
(the queue is full)
Thread 3: same as Thread 2, but for a different zone
Thread 4: running cmdhandler_start (probably because of the stop command I
sent).

Unless I have the wrong picture, I see two producers, but no consumers.
engine->config shows num_worker_threads = 2, num_signer_threads = 2.
In conf.xml I have two worker threads but no entry for the signer
threads. What number of threads are you using in your system?
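
To make the picture concrete, here is a minimal Python sketch of what I
think is happening: workers pushing into a bounded queue that nobody is
draining. It is not the signer code. The names producer, consumer and run
and the tiny capacity are made up for illustration, and unlike the real
worker, the producer here gives up after a bounded number of retries so
that the demo terminates.

```python
import queue
import threading
import time

FIFOQ_MAX_COUNT = 5  # tiny capacity for the demo; we are testing with 50000


def producer(fifo, items, max_attempts, stats):
    """Push items onto the bounded queue, retrying while it is full."""
    for item in items:
        for _ in range(max_attempts):
            try:
                fifo.put_nowait(item)
                stats["pushed"] += 1
                break
            except queue.Full:
                # This is where the "unable to push item" message would appear.
                stats["failed"] += 1
                time.sleep(0.001)
        else:
            # Assumption for the demo only: give up instead of retrying forever.
            return


def consumer(fifo, stats):
    """Drain the queue until a None sentinel arrives."""
    while True:
        item = fifo.get()
        if item is None:
            return
        stats["consumed"] += 1


def run(num_producers, num_consumers, items_each=20, max_attempts=200):
    fifo = queue.Queue(maxsize=FIFOQ_MAX_COUNT)
    # One stats dict per thread, so no counter is shared between threads.
    p_stats = [{"pushed": 0, "failed": 0} for _ in range(num_producers)]
    c_stats = [{"consumed": 0} for _ in range(num_consumers)]
    producers = [
        threading.Thread(target=producer,
                         args=(fifo, range(items_each), max_attempts, s))
        for s in p_stats
    ]
    consumers = [
        threading.Thread(target=consumer, args=(fifo, s)) for s in c_stats
    ]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for _ in consumers:
        fifo.put(None)  # one sentinel per consumer
    for t in consumers:
        t.join()
    return {
        "pushed": sum(s["pushed"] for s in p_stats),
        "failed": sum(s["failed"] for s in p_stats),
        "consumed": sum(s["consumed"] for s in c_stats),
    }


if __name__ == "__main__":
    print(run(num_producers=2, num_consumers=0))  # what gdb seems to show
    print(run(num_producers=2, num_consumers=2))  # what I would expect
```

With no consumers, the pushed count stops at the queue capacity and every
further attempt fails, which matches the flood of "max cap reached"
messages. With two consumers, all items go through.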


> 
>> - On a few occasions, we have ended up with empty signed zones. A
>> 'ods-signer clear', 'ods-signer sign' has helped.
> 
> Without the auditor complaining?

The auditor is not activated. I will probably enable it, given the issues
I'm having.



> 
> Thanks for the report.
> 
> Best regards,
> Matthijs

cheers,
-- 
Sebastian Castro
DNS Specialist
.nz Registry Services (New Zealand Domain Name Registry Limited)
desk: +64 4 495 2337
mobile: +64 21 400535


