once in a while kamailio 4.0 presence server becomes unresponsive, i.e., does not process any requests. below is bt full of a process that at that time takes most of the cpu time. rls_notifier_processes is not set, i.e., it defaults to 1.
does the bt give any clue why kamailio is unresponsive?
-- juha
(gdb) bt full
#0  0xb7703424 in __kernel_vsyscall ()
No symbol table info available.
#1  0xb765b32d in select () from /lib/i686/cmov/libc.so.6
No symbol table info available.
#2  0x0812ad00 in sleep_us (child_id=-1, desc=0xbfab3310 "RLS NOTIFIER 0", make_sock=1, f=0xb6b22dc0 <timer_send_notify>, param=0xb4db0720, uinterval=100000) at ut.h:520
        tval = {tv_sec = 0, tv_usec = 64460}
#3  fork_basic_utimer (child_id=-1, desc=0xbfab3310 "RLS NOTIFIER 0", make_sock=1, f=0xb6b22dc0 <timer_send_notify>, param=0xb4db0720, uinterval=100000) at timer_proc.c:127
        pid = <value optimized out>
        ts = 4294966782
#4  0xb6b23b90 in child_init (rank=0) at rls.c:704
        tmp = "RLS NOTIFIER 0\000\277"
        i = 0
#5  0x080f5197 in init_mod_child (m=0xb71a3330, rank=0) at sr_module.c:893
No locals.
#6  0x080f5110 in init_mod_child (m=0xb71a3500, rank=0) at sr_module.c:890
No locals.
#7  0x080f5110 in init_mod_child (m=0xb71a37b0, rank=0) at sr_module.c:890
No locals.
#8  0x080f5110 in init_mod_child (m=0xb71a3c20, rank=0) at sr_module.c:890
No locals.
#9  0x080f5110 in init_mod_child (m=0xb71a4258, rank=0) at sr_module.c:890
No locals.
#10 0x080f5110 in init_mod_child (m=0xb71a4420, rank=0) at sr_module.c:890
No locals.
#11 0x08094dcd in main_loop () at main.c:1710
        i = 0
        pid = -514
        si = 0x0
        si_desc = "\001\000\000\000\220\065\253\277\000\000\000\000\220D\032\267\006\000\000\000\020\317\060\000\000\000\000\000\220D\032\267\001\000\000\000\330\067\005\b\320i\036\b\000\000\000\000\030\036Z\267\030\000\000\000\v\b\000\000\b\017\333\264\350\065\253\277\250\205\031\267\004\000\000\000\002\000\000\000\300\201\252\264\001\000\000\000\000\000\000\000\002\000\000\000|\353&\b\b\000\000\000\330\065\253\277\002\000\000\000h\353&\b\b\000\000\000\350\065\253\277\071\026\f\b"
        nrprocs = 134560983
#12 0x08096f66 in main (argc=16, argv=0xbfab3724) at main.c:2546
        cfg_stream = 0x8
        c = <value optimized out>
        r = -514
        tmp = 0xbfab3f7d ""
        tmp_len = 135830000
        port = <value optimized out>
        proto = <value optimized out>
        ret = <value optimized out>
        seed = 779380118
        rfd = <value optimized out>
        debug_save = 0
        debug_flag = <value optimized out>
        dont_fork_cnt = 8
        n_lst = <value optimized out>
        p = <value optimized out>
Hello,
the bt is from custom timer process, which doesn't handle sip requests from the network.
Do a 'kamctl ps' and then select one of the sip workers to grab the backtrace with gdb.
Cheers, Daniel
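The suggestion above (list the processes, pick the SIP workers, attach gdb) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names select_pids and dump_bt are hypothetical, the ps output is assumed to be one "PID description" line per process, and the "receiver" pattern is an assumption about how worker processes are labelled on a given installation.

```shell
#!/bin/sh
# Hypothetical helper: read "PID description" lines on stdin and print the
# PIDs of the lines whose text matches the given pattern.
select_pids() {
    awk -v pat="$1" '$0 ~ pat { print $1 }'
}

# Hypothetical helper: attach gdb to each PID given as an argument and print
# a full backtrace. Needs gdb and permission to attach to the processes.
dump_bt() {
    for pid in "$@"; do
        echo "--- backtrace of pid $pid ---"
        gdb -p "$pid" -batch -ex "bt full" 2>&1
    done
}

# Example (assumption: 'kamctl ps' prints "PID description" lines):
#   dump_bt $(kamctl ps | select_pids "receiver") > /tmp/kamailio_workers_bt.txt
```

Filtering on the description rather than on CPU usage avoids picking a timer process again, which is what happened with the first backtrace.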
On 20/12/13 15:22, Juha Heinanen wrote: [...]
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
Daniel-Constantin Mierla writes:
the bt is from custom timer process, which doesn't handle sip requests from the network.
ok, that just happened to be the one that according to 'top' used most cpu time.
Do a 'kamctl ps' and then select one of the sip workers to grab the backtrace with gdb.
will do. how does kamailio choose which worker gets to serve next request?
on my test laptop, i get:
# pres-serv_ctl ps
5432 attendant
5434 slow timer
5435 timer
5436 ctl handler
5437 RLS NOTIFIER 0
5438 tcp receiver (generic) child=0
5439 tcp receiver (generic) child=1
5440 tcp receiver (generic) child=2
5450 tcp receiver (generic) child=3
5452 tcp main process
is 'tcp main process' the dispatcher? if yes, can that get stuck and then prevent 'tcp receiver' processes from getting any work?
-- juha
FYI: In master there is a nice way to get all the BTs:
utils/kamctl: new command 'trap'
- useful to get a full bt dump of all kamailio processes
- handy in dead-lock investigations
regards Klaus
On 20.12.2013 19:06, Daniel-Constantin Mierla wrote: [...]
Klaus Darilion writes:
utils/kamctl: new command 'trap'
- useful to get a full bt dump of all kamailio processes
- handy in dead-lock investigations
klaus,
thanks for the pointer. i pulled the shell script out of it (below).
it would still be nice to know how kamailio dispatches requests to the worker processes.
-- juha
#!/bin/bash
BINARY=kamailio
GDB=gdb
DATE=`/bin/date +%Y%m%d_%H%M%S`
LOG_FILE=/tmp/gdb_kamailio_$DATE
echo "Trap file: $LOG_FILE"
# start the log with the process list; its leading numbers are the pids
pres-serv_ctl ps > $LOG_FILE
echo -n "Trapping Kamailio with gdb: "
# keep only the leading pid of each line (the group needs \( \) in basic sed regex)
PID_TIMESTAMP_VECTOR=`sed -e 's/\([0-9]*\).*/\1/' $LOG_FILE`
for pid in $PID_TIMESTAMP_VECTOR
do
    echo -n "."
    PID=`echo $pid | cut -d '-' -f 1`
    echo "" >> $LOG_FILE
    echo "---start $PID -----------------------------------------------------" >> $LOG_FILE
    # attach to the process, dump a full backtrace, detach
    $GDB $BINARY $PID -batch --eval-command="bt full" &>> $LOG_FILE
    echo "---end $PID -------------------------------------------------------" >> $LOG_FILE
done
echo "."
Hello,
On 23/12/13 12:12, Juha Heinanen wrote:
it would still be nice to know how kamailio dispatches requests to the worker processes.
For udp, it is the kernel that decides which process reads the datagram. For tcp, iirc, the main tcp process dispatches connections to the tcp workers, choosing the least loaded one (by number of connections).
Cheers, Daniel
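To illustrate the "least loaded by number of connections" idea described above, here is a toy model in shell. It is not Kamailio's actual code (the reply itself says "iirc"); the function name least_loaded and the tie-breaking rule (lowest index wins) are assumptions made only for the sketch.

```shell
#!/bin/sh
# Toy model: the arguments are the current connection counts of the tcp
# workers, in child order. Prints the index of the worker with the fewest
# connections; on a tie the lowest index wins (an assumption of this sketch).
# The body runs in a subshell so its variables do not leak to the caller.
least_loaded() (
    best=0
    i=0
    min=$1
    for n in "$@"; do
        if [ "$n" -lt "$min" ]; then
            min=$n
            best=$i
        fi
        i=$((i + 1))
    done
    echo "$best"
)

# Example: four workers currently holding 2, 0, 1 and 0 connections.
echo "next connection goes to child=$(least_loaded 2 0 1 0)"
```

In this model a worker that is stuck keeps the connections it already holds, and if those connections stay open its count stays high, so new connections would tend to go to the other workers.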