[Opendnssec-user] Some issues with OpenDNSSEC 1.3.0 trunk
Sebastian Castro
sebastian at nzrs.net.nz
Thu Mar 3 02:53:45 UTC 2011
On 02/23/2011 11:57 PM, Matthijs Mekking wrote:
> Hi Sebastian,
Hi Matthijs,
>
> On 02/22/2011 05:47 AM, Sebastian Castro wrote:
>> Hi,
>
>> I'm aware it is a little bit too soon to expect a functional version, but
>> based on the current trunk version I'd like to report:
>
>> - It seems this version is a memory hog. Running on a system with a 1-GB
>> memory limit per process, ods-signerd reaches the limit quite fast
>> (~30min) when signing a set of zones that includes two relatively large
>> ones [*].
>
> I did some analysis and found one small leak (RRs we filter out were not
> freed), but that surely can't be the hog. I'll look into it.
>
We increased the limit to 1.5GB and the memory consumption peaked at 1.2GB
for the ods-signerd process, so we don't have any more crashes due to memory
allocation. However, we are using just a handful of zones, excluding our
largest one.
>> - Also when a big zone is being signed, we get messages
>
>> [fifo] unable to push item: max cap reached
>
> The fifo queue, the queue of RRsets that need to be signed, has a
> capacity. The worker will keep trying to push the RRset until it
> succeeds. This might happen a lot with a big zone. Perhaps the log
> message should not be LOG_WARN, because we know that this can occur.
>
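If I read that description correctly, the push path behaves roughly like the
sketch below. This is Python for brevity, not the signer's C code, and every
name in it (BoundedFifo, push_with_retry, max_tries, between_tries) is made up
for illustration. The real worker retries indefinitely; the sketch caps the
number of tries only so that it terminates, and it logs the "max cap reached"
condition once at debug level instead of on every failed attempt.

```python
import logging
from collections import deque

log = logging.getLogger("fifo-sketch")


class BoundedFifo:
    """Hypothetical stand-in for a capacity-limited queue (not thread-safe)."""

    def __init__(self, max_count):
        self.max_count = max_count
        self.items = deque()

    def push(self, item):
        """Return True if the item was queued, False if the queue is full."""
        if len(self.items) >= self.max_count:
            return False
        self.items.append(item)
        return True

    def pop(self):
        """Return the oldest item, or None if the queue is empty."""
        return self.items.popleft() if self.items else None


def push_with_retry(fifo, item, max_tries, between_tries=lambda: None):
    """Retry push up to max_tries times; return (succeeded, failed_attempts)."""
    failures = 0
    for _ in range(max_tries):
        if fifo.push(item):
            return True, failures
        if failures == 0:
            # Log only the first failure, and below warning level.
            log.debug("unable to push item: max cap reached")
        failures += 1
        between_tries()  # hook where a consumer could drain the queue
    return False, failures
```

With a full queue and nothing draining it, push_with_retry never succeeds no
matter how many tries it gets; as soon as something pops an item between
tries, the next attempt goes through.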
>> by thousands... then syslog starts complaining afterwards.
>
>> ods-signerd: last message repeated 1850109 times
>
>> When it reaches this point, the signerd doesn't make any progress and has
>> to be killed. We are currently testing with a FIFOQ_MAX_COUNT = 50000
>
> Are you sure it makes no progress? I was able to sign a tld while making
> use of the fifo queue.
I'm afraid I'm hitting a race condition here. When the signer starts
fresh, it can sign all the zones in about 40 minutes (which I think is
too much, but that's material for a different thread). When it has to
re-sign, some zones work and some others don't.
After waiting for two hours for any output from the signer in the logs,
I asked it to stop via 'ods-signer stop'. The command returns, but the
process is still there. An inspection using gdb shows four threads running:
Thread 1: engine_run
Thread 2: worker insistently trying to call fifoq_push with no success
(the queue is full)
Thread 3: same as Thread 2, but for a different zone
Thread 4: running cmdhandler_start (probably because of the stop command I
sent).
Unless I have the wrong picture, I see two producers, but no consumers.
engine->config shows num_worker_threads = 2, num_signer_threads = 2.
In conf.xml I have two worker threads but no entry for the signer
threads. What number of threads are you using in your system?
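To make the picture I have in mind concrete, here is a minimal sketch of what
I think is happening. It is Python using the standard queue and threading
modules, not the signer's code; the function run and its parameters are
hypothetical, and the timeout merely stands in for "retrying forever" so that
the example terminates. With producers and no consumers, the queue fills to
capacity and every producer stalls; with consumers present, everything drains.

```python
import queue
import threading


def run(num_producers, num_consumers, items_each, capacity, timeout=0.5):
    """Return (pushed, stalled): items accepted and producers that gave up."""
    fifo = queue.Queue(maxsize=capacity)
    lock = threading.Lock()
    stats = {"pushed": 0, "stalled": 0}
    done = threading.Event()

    def producer():
        for i in range(items_each):
            try:
                # Blocks while the queue is full; gives up after `timeout`.
                fifo.put(i, timeout=timeout)
            except queue.Full:
                with lock:
                    stats["stalled"] += 1
                return
            with lock:
                stats["pushed"] += 1

    def consumer():
        # Keep draining until the producers are done and the queue is empty.
        while not (done.is_set() and fifo.empty()):
            try:
                fifo.get(timeout=0.05)
            except queue.Empty:
                continue

    consumers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    producers = [threading.Thread(target=producer) for _ in range(num_producers)]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    done.set()
    for t in consumers:
        t.join()
    return stats["pushed"], stats["stalled"]
```

Calling run(2, 0, 20, 5) models what I see in gdb: two producers and no
consumer, so only the first five items fit and both producers are left stuck.
Calling run(2, 2, 20, 5) models what I would expect with the signer threads
actually running.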
>
>> - On a few occasions, we have ended up with empty signed zones. A
>> 'ods-signer clear', 'ods-signer sign' has helped.
>
> Without the auditor complaining?
The auditor is not activated; I will probably enable it, given the issues I have.
>
> Thanks for the report.
>
> Best regards,
> Matthijs
cheers,
--
Sebastian Castro
DNS Specialist
.nz Registry Services (New Zealand Domain Name Registry Limited)
desk: +64 4 495 2337
mobile: +64 21 400535