Hi all
A server running version 4.4.4+0~20161223011227 has segfaulted for the second time this month. I'm attaching a bt in case someone can explain it to me.
cheers,
Jon
Hello,
Some of the parameters in the bt stack are invalid. Are you using stock Kamailio 4.4? Any backports or private modules?
Also, are you using the acc module?
Cheers, Daniel
On 31/01/2017 19:06, Jon Bonilla (Manwe) wrote:
> A server running version 4.4.4+0~20161223011227 has segfaulted for the second time this month. I'm attaching a bt in case someone can explain it to me.
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
On Wed, 1 Feb 2017 09:04:41 +0100, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
> Some of the parameters in the bt stack are invalid. Are you using stock Kamailio 4.4? Any backports or private modules?
> Also, are you using the acc module?
Hi Daniel
No custom modules, and no acc module.
loadmodule "db_mysql.so"
loadmodule "mi_fifo.so"
loadmodule "kex.so"
loadmodule "corex.so"
loadmodule "tm.so"
loadmodule "tmx.so"
loadmodule "sl.so"
loadmodule "rr.so"
loadmodule "pv.so"
loadmodule "maxfwd.so"
loadmodule "usrloc.so"
loadmodule "textops.so"
loadmodule "siputils.so"
loadmodule "xlog.so"
loadmodule "sanity.so"
loadmodule "ctl.so"
loadmodule "cfg_rpc.so"
loadmodule "mi_rpc.so"
loadmodule "dispatcher.so"
loadmodule "tls.so"
loadmodule "xhttp.so"
loadmodule "auth.so"
loadmodule "htable.so"
loadmodule "pike.so"
loadmodule "nathelper.so"
loadmodule "websocket.so"
loadmodule "http_client.so"
loadmodule "exec.so"
loadmodule "uac_redirect.so"
loadmodule "uac.so"
The only "special" configuration here is that auth is done against a shell script executed via:
exec_avp("/usr/local/bin/decrypt.sh -k /etc/kamailio/keys/keys.csv -t $hdr(X-MyAuth) -u $fU 2>&1", "$avp(s:decrypt)");
On Wed, 1 Feb 2017 10:21:35 +0100, "Jon Bonilla (Manwe)" <manwe@aholab.ehu.es> wrote:
Update:
Installed version 4.4.5+0~20170202010534.75+jessie amd64 from nightly sources.
The coredump is 100% reproducible: an outgoing call to a FreeSWITCH server which is disabled (fsctl pause inbound) and responds with 503. Kamailio dies after a few seconds. The message in the log is:
CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 10
Attaching latest coredump.
cheers,
Jon
On Fri, 3 Feb 2017 00:50:58 +0100, "Jon Bonilla (Manwe)" <manwe@aholab.ehu.es> wrote:
> The coredump is 100% reproducible: an outgoing call to a FreeSWITCH server which is disabled (fsctl pause inbound) and responds with 503. Kamailio dies after a few seconds. The message in the log is:
> CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 10
More tests:
This is the log I get when the error causes the coredump:
NOTICE: <script>: Reply - M=INVITE S=503 Maximum Calls In Progress ID=KvcoZ5xMc5
Feb 2 15:35:32 14cn5 sbc[16550]: WARNING: <script>: [DISPATCHER] - Destination down: INVITE sip:dest@domain:443;transport=tls (<null>)
Feb 2 15:35:32 14cn5 sbc[16550]: NOTICE: <script>: Node unavailable, trying next sip:198.1.54.105:5060 from group <null>
Feb 2 15:35:48 14cn5 sbc[16572]: CRITICAL: <core> [pass_fd.c:277]: receive_fd(): EOF on 10
And this is what I get on 4.2.x, where Kamailio does not crash with the same config and scenarios:
NOTICE: <script>: Reply - M=INVITE S=503 Maximum Calls In Progress ID=kN4dar1Jce
WARNING: <script>: [DISPATCHER] - Destination down: INVITE sip:dest@domain:443;transport=tls (<null>)
NOTICE: <script>: Node unavailable, trying next sip:198.1.54.105:5060 from group FREESWITCH
Note that in Kamailio 4.2 the dispatcher selects another node and shows the group, whereas version 4.4 shows group <null> and crashes.
cheers,
Jon
Thanks for troubleshooting further. I will try to analyze it as soon as I get a bit of time while traveling to FOSDEM.
Cheers, Daniel
On 03/02/2017 01:23, Jon Bonilla (Manwe) wrote:
> Note that in Kamailio 4.2 the dispatcher selects another node and shows the group, whereas version 4.4 shows group <null> and crashes.
One question regarding your description. Are you using string values for setid of dispatcher groups? I saw that for 4.2 it prints FREESWITCH, not a number -- just wanted to clarify and be sure you didn't use an alias for the logs.
Cheers, Daniel
On 03/02/2017 09:17, Daniel-Constantin Mierla wrote:
> Thanks for troubleshooting further. I will try to analyze it as soon as I get a bit of time while traveling to FOSDEM.
On Fri, 3 Feb 2017 09:34:04 +0100, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
> One question regarding your description. Are you using string values for setid of dispatcher groups? I saw that for 4.2 it prints FREESWITCH, not a number -- just wanted to clarify and be sure you didn't use an alias for the logs.
Hi Daniel
The setid groups are int(11) and numeric. The FREESWITCH name is taken from the attrs (VARCHAR 128) field.
I didn't populate the db and didn't realize until now that the attrs field was used that way (!).
I will remove those strings from the attrs field and try again.
On 03/02/2017 09:57, Jon Bonilla (Manwe) wrote:
> The setid groups are int(11) and numeric. The FREESWITCH name is taken from the attrs (VARCHAR 128) field.
> I didn't populate the db and didn't realize until now that the attrs field was used that way (!).
> I will remove those strings from the attrs field and try again.
It would be good to see the difference, whether it still crashes or not. But this needs to be fixed: no crash should happen at runtime, no matter what the input from the network or the db is.
Looking quickly over the commits in the dispatcher module: are you using the socket attribute? It was one of the major additions lately; otherwise the dispatcher has been quite steady for 4.4.
Cheers, Daniel
On Fri, 3 Feb 2017 10:06:21 +0100, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
> It would be good to see the difference, whether it still crashes or not. But this needs to be fixed: no crash should happen at runtime, no matter what the input from the network or the db is.
> Looking quickly over the commits in the dispatcher module: are you using the socket attribute? It was one of the major additions lately; otherwise the dispatcher has been quite steady for 4.4.
Hi Daniel
Testing with the latest nightly, we can reproduce the issue only if attrs is populated with arbitrary strings AND the modparam sock_avp is set. The other three combinations do not crash.
cheers,
Jon
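Editor's note: the finding above suggests checking the dispatcher table for attrs values that are free-form labels. The following is a minimal Python sketch, assuming attrs is meant to hold semicolon-separated name=value pairs (for example weight=50;socket=udp:192.0.2.10:5060) as described in the dispatcher module documentation. The function names parse_ds_attrs and is_well_formed are hypothetical, this is not Kamailio's own parser, and it says nothing about the root cause of the crash.

```python
def parse_ds_attrs(attrs):
    """Split a dispatcher attrs string into a dict.

    Assumes the 'name=value;name=value' layout. Raises ValueError for any
    token that is not a name=value pair, such as a bare label.
    """
    result = {}
    for token in attrs.split(";"):
        token = token.strip()
        if not token:
            # Tolerate empty strings and trailing semicolons.
            continue
        name, sep, value = token.partition("=")
        if not sep or not name.strip():
            raise ValueError("not a name=value pair: %r" % token)
        result[name.strip()] = value.strip()
    return result


def is_well_formed(attrs):
    """Return True if every token in attrs is a name=value pair."""
    try:
        parse_ds_attrs(attrs)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    # Hypothetical sample values; the second mimics the label from this thread.
    for sample in ("weight=50;socket=udp:192.0.2.10:5060", "FREESWITCH", ""):
        print(repr(sample), is_well_formed(sample))
```

Running such a check over the attrs column before enabling sock_avp would flag rows like the FREESWITCH label for cleanup.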
On 06/02/2017 23:38, Jon Bonilla (Manwe) wrote:
> Testing with the latest nightly, we can reproduce the issue only if attrs is populated with arbitrary strings AND the modparam sock_avp is set. The other three combinations do not crash.
Thanks Jon! Very useful to know that it is related to that. I am still traveling for a few days, but as soon as I get the chance, I will dig into it properly.
One more question: is it an HA node with an active-standby shared IP?
Cheers, Daniel
On Tue, 7 Feb 2017 07:34:09 +0100, Daniel-Constantin Mierla <miconda@gmail.com> wrote:
> Thanks Jon! Very useful to know that it is related to that. I am still traveling for a few days, but as soon as I get the chance, I will dig into it properly.
> One more question: is it an HA node with an active-standby shared IP?
No, it's not. I think it's a physical system, but there is no HA on it as it's a QA system. It uses bonding.
On 07/02/2017 09:23, Jon Bonilla (Manwe) wrote:
> No, it's not. I think it's a physical system, but there is no HA on it as it's a QA system. It uses bonding.
OK, thanks for all the details. I need to dig into the code.
Cheers, Daniel