[Opendnssec-user] segfault after system upgrade.

Yuri Schaeffer yuri at nlnetlabs.nl
Mon Jan 9 16:41:51 UTC 2017


Hi Fred,

This indeed looks very much like the segfault reported earlier, and at
the moment we don't have enough leads. It seems to fail somewhere in the
interaction between different libraries. We would like to know which
dependencies were updated and which versions you are running.

I can think of:
- SoftHSM
- Botan (maybe used by SoftHSM?)
- MySQL / SQLite
- OpenSSL

(https://www.suse.com/releasenotes/x86_64/SUSE-SLES/12-SP2/#Packages.Update
tells me SP2 updates OpenSSL to 1.0.2 and libc to 2.22)
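
For collecting those versions, something along these lines might do. It is
only a sketch: the path in BIN is taken from the valgrind output in the quoted
report, the rpm package names are guesses for SLES, and `filter_deps` is a
helper name made up here. Components built from source under /usr/local will
not show up in rpm at all.

```shell
#!/bin/sh
# Sketch: report versions of the libraries the enforcer may depend on.
# BIN and the package names below are assumptions; adjust for your system.
BIN="${BIN:-/usr/local/sbin/ods-enforcerd}"

# Keep only the ldd lines for the libraries of interest (reads stdin).
filter_deps() {
    grep -E 'libcrypto|libssl|libbotan|libsqlite3|libmysqlclient|libsofthsm|libc\.so'
}

# Package versions as rpm knows them; names rpm does not know are not fatal.
if command -v rpm >/dev/null 2>&1; then
    for pkg in openssl glibc sqlite3 softhsm; do
        rpm -q "$pkg" || true
    done
fi

# Version of the openssl command-line tool, if one is installed.
if command -v openssl >/dev/null 2>&1; then
    openssl version
fi

# Libraries the binary is actually linked against.
if [ -x "$BIN" ]; then
    ldd "$BIN" | filter_deps || true
fi
```

Running it once on a system that still works and once on the upgraded one
would make the differences easy to spot.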

We've been observing some problems in combination with SoftHSMv2 and are
looking into that. But so far we have only seen problems on our 'old'
test systems, not on our relatively new development machines. This
_could_ be related, but it is too early to tell.
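
One thing that can already be read from the kernel log line in the quoted
report: the faulting address lies just below the stack pointer, and the error
code describes a user-mode write to a page that is not mapped. The arithmetic
below only restates those numbers. Reading it as a thread running off the end
of its stack is an assumption, not something the log proves.

```shell
#!/bin/sh
# Values copied from the "segfault at ... ip ... sp ... error 6" log line.
fault_addr=0x7efc1b23aff8
stack_ptr=0x00007efc1b23b000
err=6

# Distance in bytes between the stack pointer and the faulting address.
gap=$(( $stack_ptr - $fault_addr ))

# x86 page-fault error code bits.
present=$(( $err & 1 ))         # bit 0: 0 = page not present
write=$(( ($err >> 1) & 1 ))    # bit 1: 1 = the access was a write
user=$(( ($err >> 2) & 1 ))     # bit 2: 1 = fault happened in user mode

echo "gap=$gap present=$present write=$write user=$user"
```

This prints `gap=8 present=0 write=1 user=1`.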

I'm unsure why valgrind gives you a hard time. Something seriously
corrupts the enforcer's memory. Can you send me the executable you made
the stacktrace with?
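
On the unresolved `?? ()` frames in the backtrace: as far as I know,
`gdb -e FILE` only sets the file to execute and does not read its symbol
table (`-s` does that, `-se` does both), which would explain why only the libc
frames are resolved. Passing the binary as a plain argument should load both.
A minimal sketch, where `crash_backtrace` is just a helper name made up for
this mail:

```shell
#!/bin/sh
# Run a program under gdb non-interactively and, once it stops (for example
# on SIGSEGV), print the backtrace of the current thread and of all threads.
crash_backtrace() {
    bin="$1"
    shift
    [ -x "$bin" ] || { echo "not executable: $bin" >&2; return 1; }
    # --args must come last: everything after it is the program and its
    # arguments. The binary is used both as executable and as symbol file.
    gdb -batch -ex run -ex bt -ex 'thread apply all bt' --args "$bin" "$@"
}

# Example (from the enforcer/src directory of a -g -O0 build):
#   crash_backtrace ./ods-enforcerd -d
```

In an interactive session the equivalent is `gdb --args ./ods-enforcerd -d`,
then `run`, and after the SIGSEGV `thread apply all bt`.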

If it is indeed related to the HSM handling code, you could try to
install OpenDNSSEC from our development branch (the 'develop' branch in
git, basically the to-be-released 2.1). The issue would not be fixed
per se, but a lot of code around the HSM handling has changed, so you
might just not hit the bug.

Regards,
Yuri

On 09-01-17 16:18, Fred.Zwarts wrote:
> On our test system we have been running ods 2.0.3 with softhsm 2.2.0 for
> a few weeks without problems.
> Last week we upgraded the system from
> SUSE Linux Enterprise Server 12 (x86_64) SP1
> to SP2.
> After this upgrade the enforcer exits with a segfault a short time after
> startup.
> In the system log we see:
> 
> 2017-01-09T15:19:37.958829+01:00 kvivs20 ods-enforcerd: [engine] running
> as pid 17890
> 2017-01-09T15:19:37.959069+01:00 kvivs20 ods-enforcerd: [engine]
> enforcer started
> 2017-01-09T15:19:37.970328+01:00 kvivs20 ods-enforcerd: [enforcer]
> update zone: 15.125.129.in-addr.arpa
> 2017-01-09T15:19:37.978189+01:00 kvivs20 ods-enforcerd: [enforcer]
> update zone: 27.125.129.in-addr.arpa
> 2017-01-09T15:19:37.983407+01:00 kvivs20 ods-enforcerd: [enforcer]
> update zone: 37.125.129.in-addr.arpa
> 2017-01-09T15:19:37.988586+01:00 kvivs20 ods-enforcerd: [enforcer]
> update zone: 40.125.129.in-addr.arpa
> 2017-01-09T15:19:38.173046+01:00 kvivs20 kernel: [432557.821200]
> ods-enforcerd[17892]: segfault at 7efc1b23aff8 ip 00007efc1cd1d6bc sp
> 00007efc1b23b000 error 6 in libc-2.22.so[7efc1cca4000+19a000]
> 2017-01-09T15:19:47.908556+01:00 kvivs20 systemd-coredump[17896]:
> Process 17890 (ods-enforcerd) of user 0 dumped core.
> 
> It looks as if there is a problem with zone 40.125.129.in-addr.arpa or
> 56.125.129.in-addr.arpa, because somewhere in the processing of these
> zones the error occurs each time. 40 is the last one mentioned, 56 is
> the first one not mentioned.
> 
> I have tried to get a trace-back with valgrind, but that fails with an
> internal error in valgrind:
> # valgrind ods-enforcerd -d
> 
> ==16788== Memcheck, a memory error detector
> ==16788== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==16788== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright
> info
> ==16788== Command: /usr/local/sbin/ods-enforcerd -d
> ==16788==
> 
> vex: the `impossible' happened:
>   isZeroU
> vex storage: T total 535943568 bytes allocated
> vex storage: P total 640 bytes allocated
> 
> valgrind: the 'impossible' happened:
>   LibVEX called failure_exit().
> 
> host stacktrace:
> ==16788==    at 0x3803D1C8: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3803D2F4: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3803D531: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3803D55A: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x38057F02: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x380FF028: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3810BF2D: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3810F9E1: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x38110A5E: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x38112345: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x381133F3: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x380FC885: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3805A3D3: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3808AD1A: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3808C9DF: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> ==16788==    by 0x3809BA7A: ??? (in
> /usr/lib64/valgrind/memcheck-amd64-linux)
> 
> sched status:
>  running_tid=1
> 
> Thread 1: status = VgTs_Runnable (lwpid 16788)
> ==16788==    at 0x6101260: ??? (in /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x60E4011: EC_POINT_mul (in /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x60EBC97: EC_KEY_check_key (in /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x60EC06D: EC_KEY_set_public_key_affine_coordinates (in
> /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x61A0542: FIPS_selftest_ecdsa (in
> /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x619BEE9: FIPS_selftest (in /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x619ABF4: FIPS_module_mode_set (in
> /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x607616B: FIPS_mode_set (in /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x6072B5F: OPENSSL_init_library (in
> /lib64/libcrypto.so.1.0.0)
> ==16788==    by 0x400EC09: call_init.part.0 (in /lib64/ld-2.22.so)
> ==16788==    by 0x400ECF2: _dl_init (in /lib64/ld-2.22.so)
> ==16788==    by 0x4001189: ??? (in /lib64/ld-2.22.so)
> ==16788==    by 0x1: ???
> ==16788==    by 0xFFF00078E: ???
> ==16788==    by 0xFFF0007AC: ???
> 
> 
> Note: see also the FAQ in the source distribution.
> It contains workarounds to several common problems.
> In particular, if Valgrind aborted or crashed after
> identifying problems in your program, there's a good chance
> that fixing those problems will prevent Valgrind aborting or
> crashing, especially if it happened in m_mallocfree.c.
> 
> If that doesn't help, please report this bug to: www.valgrind.org
> 
> In the bug report, send all the above text, the valgrind
> version, and what OS and version you are using.  Thanks.
> 
> Then I found the other thread in this mailing list about segfaults. I am
> not familiar with the debugger. I tried the following:
> I rebuilt ods with make clean; make CFLAGS="-g -O0"
> Then I entered the /enforcer/src directory and tried:
> 
> # gdb -e ./ods-enforcerd
> GNU gdb (GDB; SUSE Linux Enterprise 12) 7.11.1
> Copyright (C) 2016 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-suse-linux".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> <http://bugs.opensuse.org/>.
> Find the GDB manual and other documentation resources online at:
> <http://www.gnu.org/software/gdb/documentation/>.
> For help, type "help".
> Type "apropos word" to search for commands related to "word".
> (gdb) run -d
> Starting program: /downloads/opendnssec-2.0.3/enforcer/src/ods-enforcerd -d
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> OpenDNSSEC key and signing policy enforcer version 2.0.3
> [New Thread 0x7ffff5584700 (LWP 18228)]
> [New Thread 0x7ffff4d83700 (LWP 18229)]
> [New Thread 0x7fffeffff700 (LWP 18230)]
> [New Thread 0x7fffef7fe700 (LWP 18231)]
> [New Thread 0x7fffeeffd700 (LWP 18232)]
> 
> Thread 3 "ods-enforcerd" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7ffff4d83700 (LWP 18229)]
> 0x00007ffff60666bc in _int_malloc (av=av@entry=0x7ffff0000020,
> bytes=bytes@entry=16) at malloc.c:3320
> 3320    malloc.c: No such file or directory.
> (gdb) bt
> #0  0x00007ffff60666bc in _int_malloc (av=av@entry=0x7ffff0000020,
> bytes=bytes@entry=16) at malloc.c:3320
> #1  0x00007ffff6069024 in __libc_calloc (n=<optimized out>,
> elem_size=<optimized out>) at malloc.c:3237
> #2  0x0000000000430824 in ?? ()
> #3  0x0000000000001f00 in ?? ()
> #4  0x0000000000000020 in ?? ()
> #5  0x00007ffff45840a0 in ?? ()
> #6  0x000000000043084d in ?? ()
> #7  0x0000000000001f00 in ?? ()
> #8  0x00007ffff0027cb0 in ?? ()
> #9  0x00007ffff0034100 in ?? ()
> #10 0x000000000047b249 in ?? ()
> #11 0x00007ffff45840d0 in ?? ()
> #12 0x0000000000430bff in ?? ()
> #13 0x00007ffff001f0f0 in ?? ()
> #14 0x00007ffff00315f0 in ?? ()
> #15 0x00007ffff4584100 in ?? ()
> #16 0x00007ffff0027cb0 in ?? ()
> #17 0x00007ffff4584100 in ?? ()
> #18 0x0000000000430a61 in ?? ()
> #19 0x00007ffff4d83700 in ?? ()
> #20 0x00007ffff001f0f0 in ?? ()
> #21 0x00007ffff0034100 in ?? ()
> #22 0x00007ffff00315f0 in ?? ()
> #23 0x00007ffff4584140 in ?? ()
> #24 0x0000000000444db9 in ?? ()
> #25 0x00007ffff0028080 in ?? ()
> #26 0x00007ffff0031410 in ?? ()
> #27 0x00007ffff002a3b0 in ?? ()
> #28 0x00007ffff00008c0 in ?? ()
> #29 0x00007ffff4584150 in ?? ()
> #30 0x0000000000000003 in ?? ()
> #31 0x00007ffff4584170 in ?? ()
> #32 0x0000000000444b93 in ?? ()
> #33 0x00000001f4584170 in ?? ()
> #34 0x00007ffff0028080 in ?? ()
> #35 0x00007ffff002bd50 in ?? ()
> #36 0x00007ffff0031410 in ?? ()
> #37 0x00007ffff4584200 in ?? ()
> #38 0x000000000042437f in ?? ()
> #39 0x00007ffff002b180 in ?? ()
> #40 0x00007ffff002b0c0 in ?? ()
> #41 0x00000001f45841b0 in ?? ()
> #42 0x00007ffff4d82b60 in ?? ()
> #43 0x00007ffff002b0c0 in ?? ()
> #44 0x00007ffff002d780 in ?? ()
> #45 0x0000000000000005 in ?? ()
> #46 0x00007ffff0022160 in ?? ()
> #47 0x00007ffff4584200 in ?? ()
> #48 0x00007ffff4d82b60 in ?? ()
> #49 0x00000001f4d83700 in ?? ()
> #50 0x00007ffff002b0c0 in ?? ()
> #51 0x00007ffff002b0c0 in ?? ()
> #52 0xfffffffff002d780 in ?? ()
> #53 0x00007ffff002b0c8 in ?? ()
> #54 0x0000000000000003 in ?? ()
> #55 0x00007ffff4584290 in ?? ()
> #56 0x0000000000424902 in ?? ()
> #57 0x00007ffff0028080 in ?? ()
> #58 0x00007ffff0022f80 in ?? ()
> #59 0x00000001f4584240 in ?? ()
> #60 0x00007ffff4d82b60 in ?? ()
> #61 0x00007ffff0022f80 in ?? ()
> #62 0x00007ffff002d780 in ?? ()
> #63 0x0000000000000005 in ?? ()
> #64 0x00007ffff0022160 in ?? ()
> ---Type <return> to continue, or q <return> to quit---
> #65 0x00007ffff4584290 in ?? ()
> #66 0xfffffffff4d82b60 in ?? ()
> #67 0x00000001f4d83700 in ?? ()
> #68 0x00007ffff0031410 in ?? ()
> #69 0x0000000000000000 in ?? ()
> (gdb)
> 
> I don't know how to get more symbolic information.
> Any suggestion?
> 
