Hello,

So it is not a crash, right? No core dump or segfault report, it just doesn't start. Did I understand that correctly?

Given that you run a lot of instances, maybe you are running out of file descriptors. Can you check the OS limits for them?

Also, running out of memory might result in such behaviour.
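
To compare the descriptor usage of each instance against its limit, something along these lines might help. It is only a rough sketch, assuming a Linux system with /proc mounted; the function names are made up for illustration and are not part of Kamailio or any of its tools.

```python
import os


def parse_max_open_files(limits_text):
    """Return the (soft, hard) 'Max open files' values as strings.

    limits_text is the content of /proc/<pid>/limits. Values are kept as
    strings because the kernel may report 'unlimited'.
    Returns None if the line is not present.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return fields[3], fields[4]
    return None


def count_open_fds(pid):
    """Count the descriptors currently open in the process (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def report(pid):
    """Print open descriptor count and limits for one process."""
    with open(f"/proc/{pid}/limits") as fh:
        limits = parse_max_open_files(fh.read())
    print(f"pid {pid}: open fds={count_open_fds(pid)}, "
          f"max open files (soft, hard)={limits}")
```

Calling report() with the PID of a Kamailio process (run as root or as the same user) shows how close that process is to its limit. Running it for an instance that starts and for one that is still hanging in startup should make a descriptor shortage visible, if that is the cause.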

Cheers,
Daniel


On 25/05/16 15:03, Sebastian Damm wrote:
Hi,

we have a machine running 16 Kamailio instances, and while upgrading
to 4.4.1 (from 4.3.5), 8 of them wouldn't start. When downgrading to
4.3.5, they all start again.

All of them have nearly identical configuration files, except for IPs,
ports and some code enabled or disabled via defines. After comparing a
working and a non-working configuration and adjusting setting by
setting, we finally ended up with a working configuration. The
difference is that it won't start when a part of the code DOES NOT
get included. If it gets included, it will start.

This is the mentioned part of our main route:

#!ifdef ENABLE_INV_RATELIMIT
                # Check for INVITE limit
                if (is_method("INVITE") && $au == $null && !($ua =~ "sipgate") ) {
                        $var(invcount) = $shtcn(invcount=>%~$fU);
                        xlog("L_INFO", "INVITE Requests from $fU in last 30 seconds: $var(invcount)\n");

                        if ($var(invcount) < 12) {
                                $var(uniqcid) = $ci + $Ts + $ft;
                                $var(tkey) = $fU + '-' + $(var(uniqcid){s.md5}{s.substr,0,10});
                                $sht(invcount=>$(var(tkey))) = 1;
                                $var(uniqcid) = $null;
                                $var(tkey) = $null;
                        }

                        if ($var(invcount) > 10) {
                                if ($var(invcount) == 11 ) {
                                        xlog("L_NOTICE", "User $fU ($var(domain2use)) over ratelimit for new calls, rejecting.\n");
                                }
                                # Enable this only after evaluating the impact!
                                append_to_reply("Retry-After: 30\r\n");
                                sl_send_reply("503", "Call Rate Limit Exceeded");
                                exit;
                        }
                }
#!endif


If we put this line at the top of the configuration file, everything works:

#!define ENABLE_INV_RATELIMIT

If we delete this line, startup does not work. It just sits in ps for
one minute without forking, and then gets terminated.

We enabled a bit of debugging, and this is apparently the error
causing Kamailio to shut down:

May 25 14:50:15 kammel /usr/sbin/kamailio[24989]: DEBUG: <core> [sr_module.c:920]: init_mod_child(): rank 53: nathelper
May 25 14:50:15 kammel /usr/sbin/kamailio[24987]: DEBUG: <core> [local_timer.c:61]: init_local_timer(): timer_list between 0x9f0428 and 0xa34428
May 25 14:50:15 kammel /usr/sbin/kamailio[24987]: DEBUG: <core> [io_wait.h:376]: io_watch_add(): DBG: io_watch_add(0x9f0240, 82, 1, (nil)), fd_no=0
May 25 14:50:15 kammel /usr/sbin/kamailio[24987]: ERROR: <core> [io_wait.h:459]: io_watch_add(): epoll_ctl failed: Bad file descriptor [9]
May 25 14:50:15 kammel /usr/sbin/kamailio[24987]: CRITICAL: <core> [tcp_read.c:1747]: tcp_receive_loop(): failed to add tcp main socket to the fd list
May 25 14:50:15 kammel /usr/sbin/kamailio[24987]: CRITICAL: <core> [tcp_read.c:1815]: tcp_receive_loop(): exiting...


I have no idea how this part of the code could lead to this error,
but it is reproducible: at least on this system, setting or removing
this define fixes or breaks the startup.

Does anybody have an idea what's happening there?

Best Regards,
Sebastian

_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla
http://www.asipto.com - http://www.kamailio.org
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda