[Opendnssec-user] RE :About Broken pipe (version 1.4.7 & 1.4.9)
yaohongyuan
yaohongyuan at 163.com
Mon Apr 18 06:35:10 UTC 2016
>Hi all,
> Last week I reported an issue about "ods-signerd thread abnormal running". After Yuri's reply I upgraded OpenDNSSEC in my test environment to 1.4.9, but after 3 days of testing it still does not work.
> The signerd thread still disappears, so I tend to think this is a major issue.
> Some parameters of my test environment:
> CPU : 14
> Mem : 128G
> General load average: 5.50, 4.43, 4.04
> Zones : 20
> Per zone RR count : 660,000
> Total zone RR count : 13,200,000
> Per zone RRset growth rate : 1000 per hour per zone
> opendnssec version : 1.4.9 (1.4.7 last week)
> This machine runs only 2 BIND instances and OpenDNSSEC. Total memory use is less than 30G.
> I don't know why I always get the error "wire/notify.c:477: notify_handle_zone: assertion notify->handler.fd == -1 failed".
> Has anybody else run into this? How did you solve it?
> I started OpenDNSSEC at about 1 PM. Some system log entries I grepped are below:
>Mar 30 13:51:39 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:53:23 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:53:40 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:54:41 p01-test-devops-9-81 ods-signerd: [socket] unable to handle outgoing tcp response: write() failed (Broken pipe)
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone9 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone8 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone6 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 13:54:54 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone2 cannot tcp write to 192.168.1.110: Broken pipe
>... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
>Mar 30 19:03:14 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone7 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 19:03:14 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone6 cannot tcp write to 192.168.1.110: Broken pipe
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [STATS] testzone20 2015126051 RR[count=44 time=0(sec)] NSEC3[count=6 time=0(sec)] RRSIG[new=10 reused=172846 time=2(sec) avg=5(sig/sec)] TOTAL[time=8(sec)]
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [worker[4]] read zone testzone8
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone8 transfer done [notify acquired 1459337138, serial on disk 2015112767, notify serial 2015112767]
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [xfrd] zone testzone8 reset notify acquired
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: [xfrd] tcp read xfr: release connection
>Mar 30 19:25:55 p01-test-devops-9-81 ods-signerd: wire/notify.c:477: notify_handle_zone: assertion notify->handler.fd == -1 failed
>
> From the messages above we can see that the signerd thread ran for only about 6.5 hours.
> Could anybody please help me fix this issue?
>
>With kind regards.
>Dean
Hi all,
Last week we changed the source at wire/notify.c:477, which solved the problem above. The change is as follows:
Base source version : 1.4.8
Before :
    if (notify->is_waiting) {
        ods_log_debug("[%s] already waiting, skipping notify for zone %s", notify_str, zone->name);
        ods_log_assert(notify->handler.fd == -1);
        return;
    }
After :
    if (notify->is_waiting) {
        ods_log_debug("[%s] already waiting, skipping notify for zone %s", notify_str, zone->name);
        if (notify->handler.fd > 0) {
            close(notify->handler.fd);
            notify->handler.fd = -1;
        }
        return;
    }
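The same close-and-reset idea can be sketched outside the OpenDNSSEC tree as follows. This is only an illustration: notify_sketch, handler_sketch and both functions are hypothetical stand-ins, not the real OpenDNSSEC types, and only the two fields mentioned above (handler.fd and is_waiting) are modelled. One deliberate difference from the patch: the sketch tests fd != -1 instead of fd > 0, because 0 is also a valid descriptor number.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-ins for the real structures; only the fields
 * named in the patch are modelled here. */
struct handler_sketch {
    int fd;                 /* -1 means "no descriptor open" */
};

struct notify_sketch {
    struct handler_sketch handler;
    int is_waiting;         /* non-zero while a notify is pending */
};

/* Close the descriptor if one is open and mark the slot as empty.
 * Calling it twice in a row is harmless: the second call is a no-op. */
static void notify_sketch_release_fd(struct notify_sketch *notify)
{
    assert(notify != NULL);
    if (notify->handler.fd != -1) {
        close(notify->handler.fd);
        notify->handler.fd = -1;
    }
}

/* Returns 1 when the notify is skipped because one is already pending,
 * 0 otherwise. Instead of asserting that no descriptor is open, the
 * skip path releases a leftover descriptor. */
static int notify_sketch_handle_zone(struct notify_sketch *notify)
{
    assert(notify != NULL);
    if (notify->is_waiting) {
        notify_sketch_release_fd(notify);
        return 1;
    }
    return 0;
}
```

Resetting the field to -1 right after close() is what keeps a later call from closing an unrelated descriptor that has since reused the same number.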
I monitored the number of handles held by the ods-signerd process for a week and did not see anything abnormal.
The total handle count stayed at around 1500.
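For anyone who wants to repeat this kind of check, here is a rough sketch of one way to count a process's open descriptors. It assumes that "handle count" means open file descriptors and that the machine is Linux with /proc mounted; count_open_fds and count_own_open_fds are hypothetical helper names, and this is not necessarily how the numbers above were collected.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Count the entries in /proc/<pid>/fd (Linux only, assumption).
 * Returns -1 if the directory cannot be opened, for example when the
 * process does not exist or permission is denied. */
static long count_open_fds(pid_t pid)
{
    char path[64];
    DIR *dir;
    struct dirent *entry;
    long count = 0;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/fd", (long)pid);

    dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }
    /* Every open descriptor appears as a numeric name; "." and ".."
     * are skipped by the digit check. */
    while ((entry = readdir(dir)) != NULL) {
        if (isdigit((unsigned char)entry->d_name[0])) {
            count++;
        }
    }
    closedir(dir);
    return count;
}

/* Convenience wrapper for the calling process. The result includes
 * the descriptor that opendir() itself holds while counting. */
static long count_own_open_fds(void)
{
    return count_open_fds(getpid());
}
```

Sampling count_open_fds() with the ods-signerd PID at a fixed interval and watching for steady growth is the point of the exercise; a flat number over several days suggests descriptors are not leaking.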
I would appreciate any suggestions about this change.
With kind regards.
Dean