[SR-Users] DMQ broadcasting crashes kamailio

Daniel-Constantin Mierla miconda at gmail.com
Fri Apr 24 21:21:32 CEST 2020


Hello,

I pushed two patches to prevent the crash, even when the module is not used
as expected in the config.

Charles: can you check and see if both make sense? The one in the
worker_loop() function is to prevent the crash:

  *
https://github.com/kamailio/kamailio/commit/a675ab88fefac75145a7d563fee0431458630529

This should be backported if all goes fine with it.
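
To make the failure mode concrete: in the backtrace quoted below,
send_reply() is called with code=0 and a reason of {s = 0x0, len = 0}, and
then indexes reason->s[reason->len-1]. Here is a minimal standalone sketch of
the kind of check that avoids this. It is not the code from the commit above,
and the names str_t, peer_resp_t and reply_is_sendable() are hypothetical
stand-ins, not Kamailio's own types or functions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a counted string (pointer + length). */
typedef struct {
    char *s;
    int len;
} str_t;

/* Hypothetical, reduced version of the response a peer callback fills in. */
typedef struct {
    int resp_code;
    str_t reason;
} peer_resp_t;

/*
 * Returns 1 only if the response carries a positive status code and a
 * non-empty reason phrase, so that code indexing reason.s[reason.len - 1]
 * would not dereference a NULL pointer or read before the buffer.
 */
int reply_is_sendable(const peer_resp_t *resp)
{
    if (resp == NULL)
        return 0;
    if (resp->resp_code <= 0)
        return 0;
    if (resp->reason.s == NULL || resp->reason.len <= 0)
        return 0;
    return 1;
}
```

A caller would then skip sending the reply whenever reply_is_sendable()
returns 0, instead of passing an empty reason further down.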

The second one, in empty_peer_callback(), is to generate a 202 Accepted
response; otherwise, in such cases, the sender will keep retransmitting:

  *
https://github.com/kamailio/kamailio/commit/7f618c2d855ac268df905eb3d6e18733c8773047

But maybe it was on purpose not to send a response (i.e., to allow
sending the response from config); in that case it can be reverted.
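
For illustration, a callback that always answers with 202 Accepted could look
roughly like the sketch below. It is a simplified standalone example, not the
code from the commit: the types and the function name
empty_callback_sketch() are hypothetical, and the real callback in the dmq
module has its own signature and types.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a counted string (pointer + length). */
typedef struct {
    char *s;
    int len;
} str_t;

/* Hypothetical, reduced version of the response a peer callback fills in. */
typedef struct {
    int resp_code;
    str_t reason;
} peer_resp_t;

/* Static storage, so the pointer stays valid after the callback returns. */
static char accepted_reason[] = "Accepted";

/*
 * Fills in a complete "202 Accepted" response, so the worker never sees
 * code 0 with an empty reason. Returns 0 on success, -1 on a NULL argument.
 */
int empty_callback_sketch(peer_resp_t *resp)
{
    if (resp == NULL)
        return -1;
    resp->resp_code = 202;
    resp->reason.s = accepted_reason;
    resp->reason.len = (int)strlen(accepted_reason);
    return 0;
}
```

With a response like this the sender gets a final reply and has no reason to
retransmit the KDMQ request.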

Cheers,
Daniel

On 24.04.20 20:57, Charles Chance wrote:
> Hi,
>
> Did you try the config snippet I provided?
>
> Basically dmq_handle_message() must be called if the message is not
> your own, otherwise the node discovery/health check will not work and
> you will see nodes disappearing as you described.
>
> Here it is again:
>
>     if(is_method("KDMQ")){
>
>         if($rU =~ "userOnline"){
>             //user came online in cluster, resume transactions if-any suspended
>             $avp(remoteUser) = $rb;    
>         } else {
>             dmq_handle_message();
>         }
>     }
>
> Notice that we check for your own/custom message first, then call
> handle message if not matched.
>
> Let me know if it works.
>
> Cheers,
>
> Charles
>
>
> On Fri, 24 Apr 2020 at 19:52, SamyGo <govoiper at gmail.com> wrote:
>
>     Yes,
>     I did read all of his replies (from the past 3+ years) specific to DMQ
>     and DMQ USRLOC; only one matched the exact description, and it has no
>     resolution.
>     The open and closed GitHub issues for DMQ didn't have anything similar
>     either. Could it be something I'm doing wrong?
>
>     Additional info: one of the servers is directly on a public IP and the
>     other one is behind NAT. Another test setup where it is consistently
>     reproducible has two servers behind NAT (AWS).
>     Here are the mod params. Only usrloc sync is done via DMQ; no
>     other module is using DMQ.
>
>     listen=udp:LocalIP:5060 advertise PublicIP:5060
>
>     modparam("dmq","server_address", DMQ_LOCAL_SERVER)
>     modparam("dmq", "notification_address", DMQ_REMOTE_SERVER)
>     modparam("dmq", "multi_notify", 0) //1 for DNS SRV
>     modparam("dmq", "num_workers", 10)
>     modparam("dmq", "ping_interval", 60)
>
>     modparam("dmq_usrloc", "enable", 1)
>     modparam("dmq_usrloc", "sync", 1)
>     modparam("dmq_usrloc", "batch_size", 4000)
>     modparam("dmq_usrloc", "batch_usleep", 1000)
>     modparam("dmq_usrloc", "usrloc_domain", "location")
>
>     Where:  DMQ_REMOTE_SERVER  = sip:PublicIP2:5060 
>
>     GDB info as requested:
>
>     Core was generated by `/usr/local/sbin/kamailio -w /tmp/kamailio -P /var/run/kamailio/kamailio.pid -f'.
>     Program terminated with signal SIGSEGV, Segmentation fault.
>     #0  0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
>     276             if(reason->s[reason->len-1]=='\0') {
>     (gdb)
>     (gdb)
>     (gdb) frame 0
>     #0  0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
>     276             if(reason->s[reason->len-1]=='\0') {
>     (gdb) p *reason
>     $1 = {s = 0x0, len = 0}
>     (gdb)
>     (gdb) frame 1
>     #1  0x00007f24656c6549 in worker_loop (id=2) at worker.c:129
>     129             if(slb.freply(current_job->msg, peer_response.resp_code,
>     (gdb) p *worker
>     $3 = {queue = 0x7f2469f240a8, jobs_processed = 5, lock = {val = 2}, pid = 935}
>     (gdb)
>     (gdb)
>     (gdb) p *current_job
>     $6 = {f = 0x7f24656d6d8d <empty_peer_callback>, msg = 0x7f2469f88d40, orig_peer = 0x7f2469f6ed50, next = 0x0, prev = 0x0}
>     (gdb)
>
>
>     On Fri, Apr 24, 2020 at 1:30 PM Daniel-Constantin Mierla
>     <miconda at gmail.com> wrote:
>
>         Hello,
>
>         have you tried the suggestion from Charles in the other
>         response? It can help figuring out where the problem resides.
>
>         Now, from the C point of view, I would need the following output
>         from gdb for the core file:
>
>         frame 0
>         p *reason
>
>         frame 1
>         p *worker
>         p *current_job
>
>         I would also need to know the modparams for dmq and any other
>         dmq_* modules, plus the list of modules for which you enabled
>         dmq (e.g., htable, dialog, presence, ...).
>
>         Cheers,
>         Daniel
>
>         On 24.04.20 18:10, SamyGo wrote:
>>         Oops, apologies, I missed that:
>>
>>         version: kamailio 5.3.3 (x86_64/linux) 44ccb9-dirty
>>         flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS,
>>         DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC,
>>         Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX,
>>         FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER,
>>         USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
>>         ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144,
>>         MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
>>         poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
>>         id: 44ccb9 -dirty
>>         compiled on 17:04:55 Apr 17 2020 with gcc 4.9.2
>>
>>         Tried this with versions 5.0, 5.2, and now 5.3: same situation.
>>
>>         Thank you for looking into this,
>>         Sammy
>>
>>         On Fri, Apr 24, 2020 at 2:33 AM Daniel-Constantin Mierla
>>         <miconda at gmail.com> wrote:
>>
>>             Hello,
>>
>>             you have to provide the version of kamailio for each
>>             reported kamailio issue, otherwise it is hard to match with
>>             the source code. Use 'kamailio -v' to get version details.
>>
>>             Cheers,
>>             Daniel
>>
>>             On 23.04.20 23:36, SamyGo wrote:
>>>             Hi,
>>>
>>>             Is there a way to broadcast KDMQ to the cluster but not
>>>             expect a reply back? As far as I can tell from the source
>>>             code, dmq_bcast_message is exactly like dmq_send_message
>>>             in that it expects a callback to be executed on
>>>             response, i.e. it expects a reply.
>>>
>>>             So, the situation I'm facing is that I'm broadcasting a message
>>>             to the cluster and I do not want a reply back. The following
>>>             two options result in crash & core dump.
>>>
>>>             1 - If my script doesn't respond back, by use of
>>>             dmq_handle_message, the destination servers are marked as
>>>             "inactive" and the usrloc sync process stops, which
>>>             isn't desirable.
>>>             2 - If I respond back with dmq_handle_message, it
>>>             crashes the Kamailio instance which just received this
>>>             broadcast message.
>>>
>>>             Here is how it's done in the script:
>>>
>>>             *broadcasting message to cluster:*
>>>                     dmq_bcast_message("userOnline", "$fu", "text/plain");
>>>
>>>             *Receiving and handling a broadcast message:*
>>>             route[DMQ_HANDLE] {
>>>                 if(!(is_method("KDMQ") || $rm == "KDMQ")) return;
>>>                
>>>                 if(is_method("KDMQ") || $rm == "KDMQ"){
>>>                         if($rU =~ "userOnline"){
>>>                                 //user came online in cluster, resume transactions if-any suspended
>>>                                 $avp(remoteUser) = $rb;    
>>>                         }
>>>                         dmq_handle_message();
>>>                         exit;
>>>                 }
>>>             }
>>>
>>>             *Related log lines:*
>>>             Apr 23 21:15:48  kamailio[916]: ALERT: <script>: [da2c1-2f499] ------ DMQ_HANDLE: UserOnline Event Received ------
>>>             Apr 23 21:15:48  kamailio[916]: DEBUG: dmq [message.c:53]: ki_dmq_handle_message_rc(): dmq_handle_message [KDMQ sip:userOnline@9.8.7.123:5060]
>>>             Apr 23 21:15:48  kamailio[916]: DEBUG: dmq [message.c:66]: ki_dmq_handle_message_rc(): dmq_handle_message peer found: userOnline
>>>             Apr 23 21:15:48  kamailio[916]: DEBUG: <core> [core/receive.c:437]: receive_msg(): request-route executed in: 401461 usec
>>>             Apr 23 21:15:48  kamailio[935]: DEBUG: dmq [worker.c:87]: worker_loop(): dmq_worker [2 935] lock acquired
>>>             and crash/segfault..
>>>
>>>             Core dump: https://pastebin.com/S7ekCPfF
>>>
>>>             Any help or pointers to solve this would be really
>>>             appreciated.
>>>
>>>             Best Regards,
>>>             Sammy
>>>
>>>             _______________________________________________
>>>             Kamailio (SER) - Users Mailing List
>>>             sr-users at lists.kamailio.org
>>>             https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>
>>             -- 
>>             Daniel-Constantin Mierla -- www.asipto.com
>>             www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>
>
> -- 
> *Charles Chance*
> Managing Director
>
> t. 0330 120 1200    m. 07932 063 891
>
> Sipcentric Ltd. Company registered in England & Wales no.
> 7365592. Registered office: Faraday Wharf, Innovation Birmingham
> Campus, Holt Street, Birmingham Science Park, Birmingham B7 4BB.

-- 
Daniel-Constantin Mierla -- www.asipto.com
www.twitter.com/miconda -- www.linkedin.com/in/miconda
