[Opendnssec-user] signer: "assertion part->soamin failed"

Havard Eidnes he at uninett.no
Mon Dec 14 10:33:07 UTC 2015


>> Can someone familiar with this piece of code please explain what
>> problem this assertion might indicate?  The text in the log
>> message itself isn't very descriptive.  Looking at the code this
>> appears to be related to use of IXFR, but I can't figure out the
>> sequence of events which might trigger this problem, or where the
>> data parts would come from (DNS neighbor? Already-existing IXFR
>> log?)
>
> An IXFR message is a list of differences between two versions of a zone.
> In OpenDNSSEC these differences are called parts.
>
> A part is a list of deleted records and a list of added records. The SOA
> records in these lists are kind of special and are stored separately. The
> soamin is the SOA record that is deleted, the soaplus is the SOA record
> that is added.
>
> If one of those SOA records is missing in the part, then there is
> something broken, and we fail on the assertion.
>
> Where the data part comes from: Every time a change is made we keep a
> journal in the working directory: <zone>.ixfr. That is read out when the
> secondary requests an IXFR from OpenDNSSEC.
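
For illustration, the part structure described above can be modelled
roughly as follows.  This is a minimal sketch in Python, not the
OpenDNSSEC code (which is C); the names Part, deleted, added and
check_part are hypothetical, and only soamin and soaplus come from the
description above.  It reports a missing SOA as a problem instead of
asserting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    """One IXFR difference (hypothetical model, not OpenDNSSEC's struct)."""
    soamin: Optional[str] = None   # SOA record of the old version (deleted)
    soaplus: Optional[str] = None  # SOA record of the new version (added)
    deleted: List[str] = field(default_factory=list)  # other deleted RRs
    added: List[str] = field(default_factory=list)    # other added RRs


def check_part(part: Part) -> List[str]:
    """Return a list of problems found in the part; empty means it is usable."""
    problems = []
    if part.soamin is None:
        problems.append("soamin missing")
    if part.soaplus is None:
        problems.append("soaplus missing")
    return problems
```

A loader written along these lines could skip or discard an incomplete
part and keep running, rather than abort the daemon.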

Hm, ok.  From my recollection the <zone>.ixfr files are also read
when OpenDNSSEC is started, possibly due to "pull" from the slave
which does zone transfers(?)  We've also observed problems in this
area; this has given rise to

  https://issues.opendnssec.org/browse/SUPPORT-181

where the crux is

  ods-signerd: [backup] bad ixfr journal: trailing RRs after final SOA

Normally this would cause the .ixfr file to be deleted; I'm running
with a local patch which simply renames the file to <zone>.ixfr-bad
for later debugging.  I've given berry at nlnetlabs.nl copies of older
/var/opendnssec/tmp/ directories; I now have two more stashed which
I've not yet sent.  Looking at the files in one of them, both the
*.ixfr-bad and the *.ixfr files have an even number of SOA records in
them...
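
The SOA counting above can be sketched as a small script.  It assumes
one RR per line in presentation format and the RFC 1995 ordering (a
leading SOA, then per difference an old SOA, deleted RRs, a new SOA and
added RRs, then a final SOA), under which the SOA count is always even.
Whether the on-disk <zone>.ixfr journal follows exactly this layout is
an assumption, and the function names are hypothetical.

```python
from typing import Dict, List


def is_soa(line: str) -> bool:
    """True if a one-line RR in presentation format has type SOA."""
    # Layout is "owner [ttl] [class] type rdata...", so the type is
    # always among the first four whitespace-separated fields.
    return "SOA" in [f.upper() for f in line.split()[:4]]


def summarize(lines: List[str]) -> Dict[str, object]:
    """Count RRs and SOAs, ignoring blank lines and ';' comment lines."""
    rrs = [l for l in lines if l.strip() and not l.lstrip().startswith(";")]
    soa_count = sum(1 for rr in rrs if is_soa(rr))
    return {
        "rr_count": len(rrs),
        "soa_count": soa_count,
        "soa_count_even": soa_count % 2 == 0,
        "starts_with_soa": bool(rrs) and is_soa(rrs[0]),
        "ends_with_soa": bool(rrs) and is_soa(rrs[-1]),
    }
```

Running summarize() over the lines of both a *.ixfr-bad and a *.ixfr
file would show whether they differ in anything beyond the SOA count,
for example in whether the last record is an SOA.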

>> Having assertions fire in long-running daemons in a normal
>> operational environment is a bug, plain and simple.
>
> I agree, however for a developer the assertion helps to find
> the actual bug.

Mm...  I'm wondering if the assertion about "soamin not set" and the
problem reported above are related.  They at least concern the same
piece of functionality -- both are related to IXFR handling.

>> When this happens, if we try to restart the signer, it will
>> shortly thereafter exit again with the same message; I have then
>> to remove the temporary files and push new zone content via notify
>> messages from the hidden master to set things up again.  So, this
>> time I have two sets of such files which I can supply to a
>> developer who would be willing to take a closer look (if indeed
>> the source of the data can be found in the files on disk).
>
> The contents of the backup files may help, because as you say,
> restarting will give you the same error. That tells me that reading the
> soamin from backup fails, so that file should be able to be used to
> trigger the bug.

As I said, if any of you are interested in copies of the "bad" .ixfr
files (and the presumed-good ones as well), I can send them privately.

Curing this bug would, I believe, considerably improve the resiliency
of OpenDNSSEC in our deployment.

> There aren't many lines that alter soamin.

Right, that agrees with my grep'ing of the source.

Regards,

- Håvard


