[OpenSER-Devel] XMLRPC issues

Jason Penton jason.penton at smilecoms.com
Tue Apr 22 18:40:39 CEST 2008


Hi Dan

I noticed exactly the same symptoms. It seems to be a problem with the
number of threads being allowed in the abyss web server. When your program
hangs and you do a 'netstat' you will notice 16 connections locked in the
FIN_WAIT_2 stage. There is a hard limit setting in abyss.h to increase this
number but this doesn’t fix the root cause (merely prolonging it :D ).

Having a quick look through the code revealed some insight:

It seems that to cater for asynchronous requests the server is setup to
create connections and leave them open until a background thread (syn.
Garbage collector) comes around and cleans up the completed transaction
connections. I didn't have too much time to poke around but it seems that
this bg thread either takes a long time or doesn’t do its job at all. The
server then blocks until it has available connections to dispatch again.  I
will try and check in detail tomorrow if I have time.

In the meantime I was able to stabilise my server by changing the
transaction model to be foreground (sync) processes (good enough for my
purposes). You can change this in xmlrpc_server.c (I think its in the
RunServer2 fn). After this change - I was able to process millions of
transactions without any lockups / errors.

Hope this helps / adds insight into a bug fix.

Cheers
Jason

Whenever I run a lot of xmlrpc requests, after a while the xmlrpc process 
becomes unresponsive. It is still there, it doesn't crash, the rest of 
openser works as expected, but the xmlrpc process simply doesn't answer 
any request anymore until restarted. The number of requests I have to 
issue to make this happen is variable. Sometimes is takes only 100 
requests, other times is need up to 20000, but the end result is always 
the same.

The xmlrpc request I ran is refreshWatchers(account, 'presence', 0) called 
repeatedly until it no longer receives a reply. The account is a 
subscriber that is present in the system. But I also tried pwd and ps and 
got the same result.

Below is the strace right before and while it got stuck. As it can be seen 
it got stuck in futex() which seems to indicate some sort of locking 
mechanism that has entered a deadlock. The futex is not called before 
that point in the whole strace output, and when it is called it locks.


--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4353
close(9)                                = 0
waitpid(-1, 0xbfdcca6c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
getpeername(8, {sa_family=AF_INET, sin_port=htons(38926), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
SIGCHLD, child_tidptr=0xb7c69bc8) = 4354
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38928), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 9
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4354
close(8)                                = 0
waitpid(-1, 0xbfdcca6c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
getpeername(9, {sa_family=AF_INET, sin_port=htons(38928), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
SIGCHLD, child_tidptr=0xb7c69bc8) = 4355
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38930), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 8
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4355
close(9)                                = 0
waitpid(-1, 0xbfdcca6c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
getpeername(8, {sa_family=AF_INET, sin_port=htons(38930), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
SIGCHLD, child_tidptr=0xb7c69bc8) = 4356
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38932), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 9
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4356
close(8)                                = 0
waitpid(-1, 0xbfdcca6c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
getpeername(9, {sa_family=AF_INET, sin_port=htons(38932), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
SIGCHLD, child_tidptr=0xb7c69bc8) = 4357
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38934), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 8
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4357
close(9)                                = 0
waitpid(-1, 0xbfdcca6c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
getpeername(8, {sa_family=AF_INET, sin_port=htons(38934), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
SIGCHLD, child_tidptr=0xb7c69bc8) = 4358
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38936), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 9
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4358
close(8)                                = 0
waitpid(-1, 0xbfdcca6c, WNOHANG)        = -1 ECHILD (No child processes)
sigreturn()                             = ? (mask now [])
getpeername(9, {sa_family=AF_INET, sin_port=htons(38936), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|
SIGCHLD, child_tidptr=0xb7c69bc8) = 4359
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
accept(6, {sa_family=AF_INET, sin_port=htons(38938), 
sin_addr=inet_addr("10.0.0.146")}, [16]) = 8
kill(4359, SIG_0)                       = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 4359
close(9)                                = 0
futex(0xb7dc9140, FUTEX_WAIT, 2, NULL


-- 
Dan


No virus found in this outgoing message.
Checked by AVG. 
Version: 7.5.524 / Virus Database: 269.23.1/1384 - Release Date: 2008/04/17
03:47 PM
 




More information about the Devel mailing list