[SR-Users] DMQ broadcasting crashes kamailio
Daniel-Constantin Mierla
miconda at gmail.com
Fri Apr 24 21:21:32 CEST 2020
Hello,
I pushed two patches to prevent the crash, even though the module is not
used as expected in the config.
Charles: can you check and see if both make sense? The one in the
worker_loop() function is to prevent the crash:
* https://github.com/kamailio/kamailio/commit/a675ab88fefac75145a7d563fee0431458630529
This should be backported if all goes fine with it.
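For reference, the backtrace quoted below shows send_reply() reading
reason->s[reason->len-1] with s = 0x0 and len = 0. A minimal sketch of the
kind of guard that avoids such a call, under stated assumptions: it is not
the code from the commit, and str_t, peer_response_t and
response_is_sendable() are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a counted string (pointer plus length). */
typedef struct {
    char *s;
    int len;
} str_t;

/* Hypothetical stand-in for the response a peer callback fills in. */
typedef struct {
    int resp_code;
    str_t reason;
} peer_response_t;

/*
 * Returns 1 only when the response can be handed to a reply function
 * that reads reason.s[reason.len - 1]; returns 0 otherwise.
 */
static int response_is_sendable(const peer_response_t *r)
{
    return r != NULL
        && r->resp_code > 0
        && r->reason.s != NULL
        && r->reason.len > 0;
}
```

With the values from the core dump (code=0 and an empty reason),
response_is_sendable() returns 0, so a caller would skip the reply instead
of indexing into a NULL pointer.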
The second one, in empty_peer_callback(), is to generate a 202 Accepted
response; otherwise, in such cases, the sender will do retransmissions:
* https://github.com/kamailio/kamailio/commit/7f618c2d855ac268df905eb3d6e18733c8773047
But maybe it was on purpose not to send a response (i.e., to allow
sending the response from config), in which case it can be reverted.
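To illustrate the idea, here is a rough sketch of a no-op callback that
always fills in a 202 Accepted response, so that the worker has a valid
code and reason to send back. It is an illustration under assumptions, not
the code from the commit: reason_t, callback_result_t and
noop_peer_callback() are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical counted string (pointer plus length). */
typedef struct {
    char *s;
    int len;
} reason_t;

/* Hypothetical container for what a peer callback reports back. */
typedef struct {
    int resp_code;
    reason_t reason;
} callback_result_t;

/* Static storage, so the pointer stays valid after the callback returns. */
static char accepted_reason[] = "Accepted";

/*
 * No-op callback: does nothing with the message, but always reports
 * 202 Accepted. Returns 0 on success, -1 if no result slot was given.
 */
static int noop_peer_callback(callback_result_t *res)
{
    if (res == NULL) {
        return -1;
    }
    res->resp_code = 202;
    res->reason.s = accepted_reason;
    res->reason.len = (int)(sizeof(accepted_reason) - 1);
    return 0;
}
```

The sender then gets a final response and has no reason to retransmit,
even when nothing meaningful was done with the message.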
Cheers,
Daniel
On 24.04.20 20:57, Charles Chance wrote:
> Hi,
>
> Did you try the config snippet I provided?
>
> Basically dmq_handle_message() must be called if the message is not
> your own, otherwise the node discovery/health check will not work and
> you will see nodes disappearing as you described.
>
> Here it is again:
>
> if(is_method("KDMQ")){
>     if($rU =~ "userOnline"){
>         // user came online in cluster, resume transactions if any suspended
>         $avp(remoteUser) = $rb;
>     } else {
>         dmq_handle_message();
>     }
> }
>
> Notice that we check for your own/custom message first, then call
> handle message if not matched.
>
> Let me know if it works.
>
> Cheers,
>
> Charles
>
>
> On Fri, 24 Apr 2020 at 19:52, SamyGo <govoiper at gmail.com> wrote:
>
> Yes,
> I did read all of his replies (past 3+ years) specific to DMQ and DMQ
> USRLOC; only one matched the exact description, and it has no
> resolution.
> GitHub open and closed issues for DMQ didn't have anything similar
> either. Could it be something I'm doing wrong?
>
> Additional info: one of the servers is directly on a public IP and
> the other one is behind NAT. Another test setup where it is
> consistently reproducible is two servers behind NAT (AWS).
> Here are the mod params. Only usrloc sync is done via DMQ; no
> other module is using DMQ.
>
> listen=udp:LocalIP:5060 advertise PublicIP:5060
>
> modparam("dmq","server_address", DMQ_LOCAL_SERVER)
> modparam("dmq", "notification_address", DMQ_REMOTE_SERVER)
> modparam("dmq", "multi_notify", 0) //1 for DNS SRV
> modparam("dmq", "num_workers", 10)
> modparam("dmq", "ping_interval", 60)
>
> modparam("dmq_usrloc", "enable", 1)
> modparam("dmq_usrloc", "sync", 1)
> modparam("dmq_usrloc", "batch_size", 4000)
> modparam("dmq_usrloc", "batch_usleep", 1000)
> modparam("dmq_usrloc", "usrloc_domain", "location")
>
> Where: DMQ_REMOTE_SERVER = sip:PublicIP2:5060
>
> GDB info as requested:
>
> Core was generated by `/usr/local/sbin/kamailio -w /tmp/kamailio -P /var/run/kamailio/kamailio.pid -f'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
> 276 if(reason->s[reason->len-1]=='\0') {
> (gdb)
> (gdb)
> (gdb) frame 0
> #0 0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
> 276 if(reason->s[reason->len-1]=='\0') {
> (gdb) p *reason
> $1 = {s = 0x0, len = 0}
> (gdb)
> (gdb) frame 1
> #1 0x00007f24656c6549 in worker_loop (id=2) at worker.c:129
> 129 if(slb.freply(current_job->msg, peer_response.resp_code,
> (gdb) p *worker
> $3 = {queue = 0x7f2469f240a8, jobs_processed = 5, lock = {val = 2}, pid = 935}
> (gdb)
> (gdb)
> (gdb) p *current_job
> $6 = {f = 0x7f24656d6d8d <empty_peer_callback>, msg = 0x7f2469f88d40, orig_peer = 0x7f2469f6ed50, next = 0x0, prev = 0x0}
> (gdb)
>
>
> On Fri, Apr 24, 2020 at 1:30 PM Daniel-Constantin Mierla
> <miconda at gmail.com> wrote:
>
> Hello,
>
> have you tried the suggestion from Charles in the other
> response? It can help figuring out where the problem resides.
>
> Now, from C point of view, I would need the following output
> from gdb of the core file:
>
> frame 0
> p *reason
>
> frame 1
> p *worker
> p *current_job
>
> I would also need to know the modparams for dmq and the other
> dmq_* modules, plus the list of modules for which you enabled
> dmq (e.g., htable, dialog, presence, ...).
>
> Cheers,
> Daniel
>
> On 24.04.20 18:10, SamyGo wrote:
>> Oops, apologies, I missed that:
>>
>> version: kamailio 5.3.3 (x86_64/linux) 44ccb9-dirty
>> flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS,
>> DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC,
>> Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX,
>> FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER,
>> USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
>> ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144,
>> MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
>> poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
>> id: 44ccb9 -dirty
>> compiled on 17:04:55 Apr 17 2020 with gcc 4.9.2
>>
>> Tried this with versions 5.0, 5.2, and now 5.3: same situation.
>>
>> Thank you for looking into this,
>> Sammy
>>
>> On Fri, Apr 24, 2020 at 2:33 AM Daniel-Constantin Mierla
>> <miconda at gmail.com> wrote:
>>
>> Hello,
>>
>> you have to provide the version of kamailio for each
>> reported kamailio issue, otherwise it is hard to match it with
>> the source code. Use 'kamailio -v' to get version details.
>>
>> Cheers,
>> Daniel
>>
>> On 23.04.20 23:36, SamyGo wrote:
>>> Hi,
>>>
>>> Is there a way to broadcast KDMQ to the cluster without
>>> expecting a reply back? As far as I've read the source
>>> code, dmq_bcast_message is exactly like dmq_send_message
>>> in that it expects a callback to be executed on
>>> response, i.e., it expects a reply.
>>>
>>> So, the situation I'm facing is that I'm broadcasting a message
>>> to the cluster and I do not want a reply back. The following
>>> two options result in a crash and core dump.
>>>
>>> 1 - If my script doesn't respond back by use of
>>> dmq_handle_message, it marks the destination servers as
>>> "inactive" and stops the usrloc sync process, which
>>> isn't desirable.
>>> 2 - If I respond back with dmq_handle_message, it
>>> crashes the Kamailio instance that just received the
>>> broadcast message.
>>>
>>> Here is how it's done in the script:
>>>
>>> *broadcasting message to cluster:*
>>> dmq_bcast_message("userOnline", "$fu", "text/plain");
>>>
>>> *Receiving and handling a broadcast message:*
>>> route[DMQ_HANDLE] {
>>>     if(!(is_method("KDMQ") || $rm == "KDMQ")) return;
>>>
>>>     if(is_method("KDMQ") || $rm == "KDMQ"){
>>>         if($rU =~ "userOnline"){
>>>             // user came online in cluster, resume transactions if any suspended
>>>             $avp(remoteUser) = $rb;
>>>         }
>>>         dmq_handle_message();
>>>         exit;
>>>     }
>>> }
>>>
>>> *Related log lines:*
>>> Apr 23 21:15:48 kamailio[916]: ALERT: <script>: [da2c1-2f499] ------ DMQ_HANDLE: UserOnline Event Received ------
>>> Apr 23 21:15:48 kamailio[916]: DEBUG: dmq [message.c:53]: ki_dmq_handle_message_rc(): dmq_handle_message [KDMQ sip:userOnline@9.8.7.123:5060]
>>> Apr 23 21:15:48 kamailio[916]: DEBUG: dmq [message.c:66]: ki_dmq_handle_message_rc(): dmq_handle_message peer found: userOnline
>>> Apr 23 21:15:48 kamailio[916]: DEBUG: <core> [core/receive.c:437]: receive_msg(): request-route executed in: 401461 usec
>>> Apr 23 21:15:48 kamailio[935]: DEBUG: dmq [worker.c:87]: worker_loop(): dmq_worker [2 935] lock acquired
>>> and crash/segfault..
>>>
>>> Core dump: https://pastebin.com/S7ekCPfF
>>>
>>> Any help or pointers to solve this would be really
>>> appreciated.
>>>
>>> Best Regards,
>>> Sammy
>>>
>>> _______________________________________________
>>> Kamailio (SER) - Users Mailing List
>>> sr-users at lists.kamailio.org
>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>
>> --
>> Daniel-Constantin Mierla -- www.asipto.com
>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>
> --
> *Charles Chance*
> Managing Director
>
> t. 0330 120 1200 m. 07932 063 891
>
> Sipcentric Ltd. Company registered in England & Wales no.
> 7365592. Registered office: Faraday Wharf, Innovation Birmingham
> Campus, Holt Street, Birmingham Science Park, Birmingham B7 4BB.
--
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda