[SR-Users] DMQ broadcasting crashes kamailio

SamyGo govoiper at gmail.com
Sat Apr 25 00:10:56 CEST 2020


Hey,
Charles, I'm so sorry. Only after Daniel's reply did I see yours, and I
later found that your replies had been marked as spam! I probably didn't
need to read all your years-old replies to other posts :)

Yeah, that's how I understood it as well, that I didn't need to handle it,
and when I first completed this code some 6 months ago it looked pretty
much like what you showed.
Here is the exact route from the config:

# DMQ  processing
route[DMQ_HANDLE] {
    if(!(is_method("KDMQ") || $rm == "KDMQ"))
            return;

    // I had v5.0.8 and read that sometimes is_method() isn't enough,
    // so just in case add extra protection
    if(is_method("KDMQ") || $rm == "KDMQ"){
            if($rU =~ "userOnline"){
                    $avp(remoteUser) = $rb;
                    // removing this route doesn't help to resolve the crash
                    route(CHECK_WAITING_TRANSACTIONS);
                    // big NO NO, never do this. Crash imminent
                    #sl_send_reply("200","OK")
                    // IDK why, but I got it from an older mailing list
                    // snippet... removing it still doesn't help
                    t_release();

                    // exit or not, crash is still here.
                    exit;
            }
            dmq_handle_message();
    }
    t_release();
    exit;
}

Btw, I had a version where I broadcast using existing peer name like this:

    dmq_bcast_message("peer_name", "$fu", "usrloc/online");

and in the handler I checked $cT instead of $rU.

Is that a better way, so that I don't create a new peer/handler?
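
For illustration, here is a minimal sketch of what that variant could look
like. It is an outline under assumptions, not a tested config: the route
name NOTIFY_CLUSTER is made up, "peer_name" is a placeholder for whichever
existing peer is reused, and whether the receiving side also has to send a
reply for this custom message is left open here.

```
# Sender side (hypothetical route name): broadcast the user's From URI
# to the cluster via an existing peer, tagged with a custom Content-Type.
route[NOTIFY_CLUSTER] {
    dmq_bcast_message("peer_name", "$fu", "usrloc/online");
}

# Receiver side: pick out the custom message by Content-Type ($cT)
# first, and pass every other KDMQ request to the dmq module.
route[DMQ_HANDLE] {
    if(!is_method("KDMQ"))
            return;

    if($cT == "usrloc/online"){
            # custom message: the body carries the URI that was broadcast
            $avp(remoteUser) = $rb;
            route(CHECK_WAITING_TRANSACTIONS);
            exit;
    }

    # regular DMQ traffic (node discovery/health check, usrloc sync)
    dmq_handle_message();
    exit;
}
```

The idea behind matching on $cT first is that the custom message is picked
off before dmq_handle_message() runs, so the reused peer's own handler
never sees it.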

So it had been working, but randomly the cluster pair went into a restart
cycle, and ever since then they just crash each other. As in, if A is dead
and I start it up, it sends this custom KDMQ to B and B dies.
Over the past few days I've tried a few different variations of the config
script and Kamailio versions, and all of them lead to pretty much the same
crash unless I stop sending the broadcast message to the cluster.

@Daniel, I'm about to test your patches to see if they make a difference.
I'll get back with results soon.

Thank you so much.
Best Regards,
Sammy



On Fri, Apr 24, 2020 at 3:24 PM Daniel-Constantin Mierla <miconda at gmail.com>
wrote:

> Hello,
>
> I pushed two patches to prevent the crash, even if the module is not used
> as expected in the config.
>
> Charles: can you check and see if both make sense? The one in the
> worker_loop() function is to prevent the crash:
>
>   * https://github.com/kamailio/kamailio/commit/a675ab88fefac75145a7d563fee0431458630529
>
> This should be backported if all goes fine with it.
>
> The second one in empty_peer_callback() is to generate a 202-Accepted
> response, otherwise in such cases the sender will do retransmissions:
>
>   * https://github.com/kamailio/kamailio/commit/7f618c2d855ac268df905eb3d6e18733c8773047
>
> But maybe it was on purpose not to send a response (i.e., to allow sending
> the response from config), in such case it can be reverted.
>
> Cheers,
> Daniel
> On 24.04.20 20:57, Charles Chance wrote:
>
> Hi,
>
> Did you try the config snippet I provided?
>
> Basically dmq_handle_message() must be called if the message is not your
> own, otherwise the node discovery/health check will not work and you will
> see nodes disappearing as you described.
>
> Here it is again:
>
>     if(is_method("KDMQ")){
>
>         if($rU =~ "userOnline"){
>             // user came online in cluster, resume transactions
>             // if any suspended
>             $avp(remoteUser) = $rb;
>         } else {
>             dmq_handle_message();
>         }
>     }
>
> Notice that we check for your own/custom message first, then call handle
> message if not matched.
>
> Let me know if it works.
>
> Cheers,
>
> Charles
>
>
> On Fri, 24 Apr 2020 at 19:52, SamyGo <govoiper at gmail.com> wrote:
>
>> Yes,
>> I did read all his replies (past 3+ years) specific to DMQ and DMQ USRLOC,
>> and only one matched the exact description, and it has no resolution.
>> GitHub open+closed issues for DMQ didn't have anything similar either.
>> Could it be something I'm doing wrong!?
>>
>> Additional info: one of the servers is directly on a public IP and the
>> other one is behind NAT. Another test setup where it is consistently
>> reproducible is two servers behind NAT (AWS).
>> Here are the mod params.  Only usrloc sync is done via DMQ and no other
>> module is using DMQ.
>>
>> listen=udp:LocalIP:5060 advertise PublicIP:5060
>>
>> modparam("dmq","server_address", DMQ_LOCAL_SERVER)
>> modparam("dmq", "notification_address", DMQ_REMOTE_SERVER)
>> modparam("dmq", "multi_notify", 0) //1 for DNS SRV
>> modparam("dmq", "num_workers", 10)
>> modparam("dmq", "ping_interval", 60)
>>
>> modparam("dmq_usrloc", "enable", 1)
>> modparam("dmq_usrloc", "sync", 1)
>> modparam("dmq_usrloc", "batch_size", 4000)
>> modparam("dmq_usrloc", "batch_usleep", 1000)
>> modparam("dmq_usrloc", "usrloc_domain", "location")
>>
>> Where:  DMQ_REMOTE_SERVER  = sip:PublicIP2:5060
>>
>> GDB info as requested:
>>
>> Core was generated by `/usr/local/sbin/kamailio -w /tmp/kamailio -P /var/run/kamailio/kamailio.pid -f'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
>> 276             if(reason->s[reason->len-1]=='\0') {
>> (gdb) frame 0
>> #0  0x00007f248c4cef15 in send_reply (msg=0x7f2469f88d40, code=0, reason=0x7ffd775e3ab8) at sl.c:276
>> 276             if(reason->s[reason->len-1]=='\0') {
>> (gdb) p *reason
>> $1 = {s = 0x0, len = 0}
>> (gdb) frame 1
>> #1  0x00007f24656c6549 in worker_loop (id=2) at worker.c:129
>> 129                                     if(slb.freply(current_job->msg, peer_response.resp_code,
>> (gdb) p *worker
>> $3 = {queue = 0x7f2469f240a8, jobs_processed = 5, lock = {val = 2}, pid = 935}
>> (gdb) p *current_job
>> $6 = {f = 0x7f24656d6d8d <empty_peer_callback>, msg = 0x7f2469f88d40, orig_peer = 0x7f2469f6ed50, next = 0x0, prev = 0x0}
>> (gdb)
>>
>>
>> On Fri, Apr 24, 2020 at 1:30 PM Daniel-Constantin Mierla <
>> miconda at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> have you tried the suggestion from Charles in the other response? It can
>>> help figuring out where the problem resides.
>>>
>>> Now, from C point of view, I would need the following output from gdb of
>>> the core file:
>>>
>>> frame 0
>>> p *reason
>>>
>>> frame 1
>>> p *worker
>>> p *current_job
>>>
>>> I would also need to know the modparams for dmq and other dmq_* module,
>>> plus the list if modules for which you enabled dmq (eg, htable, dialog,
>>> presence, ...).
>>>
>>> Cheers,
>>> Daniel
>>> On 24.04.20 18:10, SamyGo wrote:
>>>
>>> Oops, apologies, I missed that:
>>>
>>> version: kamailio 5.3.3 (x86_64/linux) 44ccb9-dirty
>>> flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS,
>>> DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC,
>>> F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT,
>>> USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST,
>>> HAVE_RESOLV_RES
>>> ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE
>>> 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
>>> poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
>>> id: 44ccb9 -dirty
>>> compiled on 17:04:55 Apr 17 2020 with gcc 4.9.2
>>>
>>> Tried this with versions 5.0, 5.2, and now 5.3: same situation.
>>>
>>> Thank you for looking into this,
>>> Sammy
>>>
>>> On Fri, Apr 24, 2020 at 2:33 AM Daniel-Constantin Mierla <
>>> miconda at gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> you have to provide the version of kamailio for each reported kamailio
>>>> issue, otherwise it is hard to match with the source code. Use
>>>> 'kamailio -v' to get version details.
>>>>
>>>> Cheers,
>>>> Daniel
>>>> On 23.04.20 23:36, SamyGo wrote:
>>>>
>>>> Hi,
>>>>
>>>> Is there a way to broadcast KDMQ to the cluster but not expect a reply
>>>> back!? As far as I've read the source code, dmq_bcast_message is exactly
>>>> like dmq_send_message in that it expects a callback to be executed on
>>>> response, i.e. it expects a reply.
>>>>
>>>> So, the situation I'm facing is that I'm broadcasting a message to the
>>>> cluster and I do not want a reply back. The following two options result
>>>> in crash & core dump.
>>>>
>>>> 1 - If my script doesn't respond back by use of dmq_handle_message, it
>>>> marks the destination servers as "inactive" and stops the usrloc sync
>>>> process, which isn't desirable.
>>>> 2 - If I respond back with dmq_handle_message, it crashes the
>>>> Kamailio which just received this broadcast message.
>>>>
>>>> Here is how it's done in the script:
>>>>
>>>> *broadcasting message to cluster:*
>>>>         dmq_bcast_message("userOnline", "$fu", "text/plain");
>>>>
>>>> *Receiving and handling a broadcast message:*
>>>> route[DMQ_HANDLE] {
>>>>     if(!(is_method("KDMQ") || $rm == "KDMQ")) return;
>>>>
>>>>     if(is_method("KDMQ") || $rm == "KDMQ"){
>>>>             if($rU =~ "userOnline"){
>>>>                     // user came online in cluster, resume
>>>>                     // transactions if any suspended
>>>>                     $avp(remoteUser) = $rb;
>>>>             }
>>>>             dmq_handle_message();
>>>>             exit;
>>>>     }
>>>> }
>>>>
>>>> *Related log lines:*
>>>> Apr 23 21:15:48  kamailio[916]: ALERT: <script>: [da2c1-2f499] ------ DMQ_HANDLE: UserOnline Event Received ------
>>>> Apr 23 21:15:48  kamailio[916]: DEBUG: dmq [message.c:53]: ki_dmq_handle_message_rc(): dmq_handle_message [KDMQ sip:userOnline at 9.8.7.123:5060]
>>>> Apr 23 21:15:48  kamailio[916]: DEBUG: dmq [message.c:66]: ki_dmq_handle_message_rc(): dmq_handle_message peer found: userOnline
>>>> Apr 23 21:15:48  kamailio[916]: DEBUG: <core> [core/receive.c:437]: receive_msg(): request-route executed in: 401461 usec
>>>> Apr 23 21:15:48  kamailio[935]: DEBUG: dmq [worker.c:87]: worker_loop(): dmq_worker [2 935] lock acquired
>>>> and crash/segfault..
>>>>
>>>> Core dump: https://pastebin.com/S7ekCPfF
>>>>
>>>> Any help or pointers to solve this would be really appreciated.
>>>>
>>>> Best Regards,
>>>> Sammy
>>>>
>>>> _______________________________________________
>>>> Kamailio (SER) - Users Mailing List
>>>> sr-users at lists.kamailio.org
>>>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-users
>>>>
>>>> --
>>>> Daniel-Constantin Mierla -- www.asipto.com
>>>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>>>>
> --
> *Charles Chance*
> Managing Director
>
> t. 0330 120 1200    m. 07932 063 891
>
> Sipcentric Ltd. Company registered in England & Wales no. 7365592. Registered
> office: Faraday Wharf, Innovation Birmingham Campus, Holt Street,
> Birmingham Science Park, Birmingham B7 4BB.
>