[Opendnssec-user] RE :About Broken pipe (version 1.4.7 & 1.4.9)

yaohongyuan yaohongyuan at 163.com
Mon Apr 18 06:35:10 UTC 2016


>Hi all,
>        Last week I reported an issue about "ods-signerd thread abnormal running". After Yuri's reply I upgraded OpenDNSSEC in my test environment to 1.4.9, but after 3 days of testing it still does not work.
>        The signerd thread still disappears, so I tend to think this is a major issue.
>        Some parameters of my test environment:
>            CPU : 14
>            Mem : 128G
>            General load average: 5.50, 4.43, 4.04
>            Zones : 20
>            Per zone RR count : 660,000
>            Total zone RR count : 13,200,000
>            Per zone RRset increasing speed : 1000/1h/zone
>            opendnssec version : 1.4.9 (1.4.7 last week)
>        This machine runs only 2 BIND instances and OpenDNSSEC. Total memory usage is less than 30G.
>        I don't know why I always get the error "wire/notify.c:477: notify_handle_zone: assertion notify->handler.fd == -1 failed".
>        Has anybody else run into this? How did you solve it?
>        I started OpenDNSSEC at about 1 PM. Some of the system log is below:
>Mar 30 13:51:39 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:53:23 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:53:40 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:54:41 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone9 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone8 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone6 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone2 cannot tcp write to 192.168.1.110: Broken pipe
>... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... 
>Mar 30 19:03:14 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone7 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 19:03:14 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone6 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [STATS] testzone20 2015126051 RR[count=44 time=0(sec)] NSEC3[count=6 time=0(sec)] RRSIG[new=10 reused=172846 time=2(sec) avg=5(sig/sec)] TOTAL[time=8(sec)]
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [worker[4]] read zone testzone8
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone8 transfer done [notify acquired 1459337138, serial on disk 2015112767, notify serial 2015112767]
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone8 reset notify acquired
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [xfrd] tcp read xfr: release connection
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: wire/notify.c:477: notify_handle_zone: assertion notify->handler.fd == -1 failed
>
>        From the messages above, the signerd thread ran for only about 6.5 hours.
>        Could anybody please help me fix this issue?
>
>With kind regards.
>Dean
Hi all,
        Last week we changed the source at wire/notify.c:477, which solved the problem above. The change is as follows:
            Base source version : 1.4.8
            Before :
                if (notify->is_waiting) {
                    ods_log_debug("[%s] already waiting, skipping notify for zone %s", notify_str, zone->name);
                    ods_log_assert(notify->handler.fd == -1);
                    return;
                }
            After :
                if (notify->is_waiting) {
                    ods_log_debug("[%s] already waiting, skipping notify for zone %s", notify_str, zone->name);
                    if (notify->handler.fd > 0) {
                        close(notify->handler.fd);
                        notify->handler.fd = -1;
                    }
                    return;
                }
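
        The close-and-reset step in the "After" version can be tried in isolation with the minimal sketch below. The struct and function names are hypothetical stand-ins, not OpenDNSSEC's real types; only the fields used in the patch are modelled. Note that the sketch tests `fd >= 0` rather than `fd > 0`, on the assumption that descriptor 0 should be released as well, since 0 is a valid file descriptor.

```c
#include <unistd.h>

/* Hypothetical stand-ins for the structures touched by the patch;
 * only the fields that appear in the change are modelled. */
struct handler_sketch { int fd; };
struct notify_sketch {
    int is_waiting;
    struct handler_sketch handler;
};

/* Close a descriptor that is still open and mark it as unused.
 * Returns 1 if a descriptor was closed, 0 if there was nothing to close. */
static int notify_sketch_release_fd(struct notify_sketch *notify)
{
    if (notify->handler.fd >= 0) {   /* 0 is a valid descriptor too */
        close(notify->handler.fd);
        notify->handler.fd = -1;     /* restore the "no descriptor" state */
        return 1;
    }
    return 0;
}
```

        Calling the helper a second time is harmless, because the descriptor has already been reset to -1.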


        I monitored the handle count of ods-signerd for a week and did not see anything abnormal.
        The total handle count stayed at around 1500.
        I would appreciate any suggestions about this change.
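
        For anyone who wants to repeat this kind of monitoring, one possible way to count the open descriptors of a process is sketched below. It assumes a Linux system with /proc mounted, which is an assumption and not something stated above; the function name is hypothetical and this is not necessarily how the count above was obtained.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>

/* Count the entries in /proc/<pid>/fd (Linux-specific assumption).
 * Returns the number of open descriptors, or -1 if the directory
 * cannot be opened (no such process, or insufficient permissions).
 * When pid is the calling process, the count includes the descriptor
 * that opendir() itself holds while scanning. */
static long count_open_fds(pid_t pid)
{
    char path[64];
    struct dirent *entry;
    long count = 0;
    DIR *dir;

    snprintf(path, sizeof path, "/proc/%ld/fd", (long)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')   /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count;
}
```

        Sampling this value periodically for the ods-signerd pid and checking that it stays flat is one way to confirm that descriptors are not leaking.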


With kind regards.
Dean