Hi all,
My OpenSER system seems to run out of TCP handlers all the time. It looks to me as if I need two child processes for each combination of local IP and peer, because the two directions of communication do not use the same TCP connection.
I have a lot of TCP clients which would lead to more than 100 child processes. Does it make sense to reduce tcp_connection_lifetime? This would increase TCP overhead, but would allow me to have fewer child processes, right?
Any help is welcome. Thanks guys!
Regards
Gawith wrote:
Hi all,
My OpenSER system seems to run out of TCP handlers all the time. It looks to me as if I need two child processes for each combination of local IP and peer, because the two directions of communication do not use the same TCP connection.
Using different TCP connections for sending and receiving is not elegant, but it is possible. Nevertheless, responses should always be sent over the same TCP connection as the request.
I have a lot of TCP clients which would lead to more than 100 child processes.
To handle 1000 TCP connections you do not need 1000 processes. In Kamailio there is one process which handles all the incoming TCP messages. As soon as this "TCP receiver" has received enough TCP fragments to build the whole SIP message, the SIP message is handed over to one of the "TCP worker" processes. You can identify the processes using "kamctl ps", e.g.:
# kamctl ps
Process:: ID=0 PID=9280 Type=attendant
Process:: ID=1 PID=9281 Type=SIP receiver udp:3.13.32.14:5060
Process:: ID=2 PID=9282 Type=SIP receiver udp:3.13.32.14:5060
Process:: ID=3 PID=9283 Type=SIP receiver udp:3.13.32.14:5060
Process:: ID=4 PID=9284 Type=SIP receiver udp:3.13.32.14:5060
Process:: ID=5 PID=9285 Type=SIP receiver udp:3.13.32.15:5060
Process:: ID=6 PID=9286 Type=SIP receiver udp:3.13.32.15:5060
Process:: ID=7 PID=9287 Type=SIP receiver udp:3.13.32.15:5060
Process:: ID=8 PID=9288 Type=SIP receiver udp:3.13.32.15:5060
Process:: ID=9 PID=9289 Type=timer
Process:: ID=10 PID=9290 Type=MI FIFO
Process:: ID=11 PID=9291 Type=TCP receiver
Process:: ID=12 PID=9292 Type=TCP receiver
Process:: ID=13 PID=9293 Type=TCP receiver
Process:: ID=14 PID=9294 Type=TCP receiver
Process:: ID=15 PID=9295 Type=TCP main
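If you want to check quickly how many processes of each type are running, a small script can tally the "Type=" field of the "kamctl ps" output. This is only a sketch: it assumes the line format shown in the listing above, the function name count_process_types is made up for illustration, and SAMPLE is a shortened excerpt of that listing.

```python
import re
from collections import Counter

# Shortened excerpt of the "kamctl ps" listing above (assumed line format).
SAMPLE = """\
Process:: ID=0 PID=9280 Type=attendant
Process:: ID=1 PID=9281 Type=SIP receiver udp:3.13.32.14:5060
Process:: ID=5 PID=9285 Type=SIP receiver udp:3.13.32.15:5060
Process:: ID=9 PID=9289 Type=timer
Process:: ID=11 PID=9291 Type=TCP receiver
Process:: ID=12 PID=9292 Type=TCP receiver
Process:: ID=15 PID=9295 Type=TCP main
"""

LINE_RE = re.compile(r"\s*Process::\s+ID=\d+\s+PID=\d+\s+Type=(.+?)\s*$")


def count_process_types(ps_output: str) -> Counter:
    """Count processes per type; lines that do not match are ignored."""
    counts = Counter()
    for line in ps_output.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        ptype = match.group(1)
        # Drop the socket suffix so all UDP receivers are counted together.
        if ptype.startswith("SIP receiver"):
            ptype = "SIP receiver"
        counts[ptype] += 1
    return counts


if __name__ == "__main__":
    for ptype, n in sorted(count_process_types(SAMPLE).items()):
        print(f"{ptype}: {n}")
```

In real use you would feed it the actual command output instead of SAMPLE.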
When a SIP message has to be sent over TCP, a worker process looks up an existing TCP connection, takes that connection, sends the SIP message on it, and then gives the connection back to the "pool".
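The take/send/give-back cycle can be pictured with a toy connection pool. This is a conceptual sketch only and not how Kamailio implements it internally; ConnectionPool, FakeConn and send_message are hypothetical names, and the "connection" is a stand-in object rather than a real socket.

```python
class FakeConn:
    """Stand-in for a TCP connection (hypothetical, no real socket)."""

    def __init__(self, dest):
        self.dest = dest
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)


class ConnectionPool:
    """Toy pool holding at most one idle connection per destination."""

    def __init__(self, connect):
        self._connect = connect  # callable(dest) -> connection object
        self._idle = {}          # dest -> idle connection
        self.opened = 0          # how many connections were newly opened

    def acquire(self, dest):
        # Reuse an idle connection if there is one, otherwise open a new one.
        conn = self._idle.pop(dest, None)
        if conn is None:
            conn = self._connect(dest)
            self.opened += 1
        return conn

    def release(self, dest, conn):
        # Hand the connection back so the next sender can reuse it.
        self._idle[dest] = conn


def send_message(pool, dest, msg):
    """Take a connection, send on it, and always give it back."""
    conn = pool.acquire(dest)
    try:
        conn.send(msg)
    finally:
        pool.release(dest, conn)
    return conn


if __name__ == "__main__":
    pool = ConnectionPool(FakeConn)
    send_message(pool, "1.1.1.1:5060", "INVITE ...")
    send_message(pool, "1.1.1.1:5060", "BYE ...")
    print("connections opened:", pool.opened)
```

The point of the sketch is that the number of open connections is independent of the number of workers: a worker only holds a connection while it is actually sending.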
Does it make sense to reduce tcp_connection_lifetime?
No. Especially not if TCP is used between the clients and the server. In that case the proxy should set the lifetime of the TCP connection to the lifetime of the registration. http://www.kamailio.org/docs/modules/devel/registrar.html#id2452958
This would increase TCP overhead, but would allow me to have fewer child processes, right?
No. As said above, even with a single TCP child you can handle lots of TCP connections.
Any help is welcome. Thanks guys!
btw: do you have any problems at all?
regards klaus
Klaus Darilion wrote:
Using different TCP connections for sending and receiving is not elegant, but it is possible. Nevertheless, responses should always be sent over the same TCP connection as the request.
I agree, but unfortunately I have no influence on the PBX behaviour...
btw: do you have any problems at all?
Yes, I see the "no free tcp receiver, connection passed to the leastbusy one" message several hundred times in the logs, and now and then incoming requests are not answered at all (or maybe only after a VERY long time, I've not checked that).
Regards
Gawith wrote:
Yes, I see the "no free tcp receiver, connection passed to the leastbusy one" message several hundred times in the logs, and now and then incoming requests are not answered at all (or maybe only after a VERY long time, I've not checked that).
How many TCP worker processes do you have (tcp_children)?
How busy is the server (requests/transactions per second)?
regards klaus
Gawith wrote:
Klaus Darilion wrote:
How many TCP worker processes do you have (tcp_children)?
How busy is the server (requests/transactions per second)?
That's the point. The server does nearly nothing: I have 10 TCP children and only about 2 INVITEs/sec.
Then there is probably something going wrong.
I would:
- stop the proxy and all clients
- start tcpdump or wireshark (without capture filter)
- start the clients (if you have clients)
- make a single call, hang up
- repeat the single call, hang up.
Then analyze the capture file: Are there "strange" SYN packets trying to establish a connection which does not work? Are there other TCP errors (RST, FIN ...)?
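If the capture is large, the search for connection attempts that never complete can be scripted. The following is a rough sketch under several assumptions: the packets have already been exported from the capture into simple records (the Pkt type and its one-letter flag strings are made up for this example, they are not a tcpdump or wireshark format), and the 10.0.0.x addresses are placeholders.

```python
from typing import NamedTuple


class Pkt(NamedTuple):
    """Hypothetical pre-parsed TCP packet record."""
    ts: float    # capture timestamp in seconds
    src: str     # "ip:port" of the sender
    dst: str     # "ip:port" of the receiver
    flags: str   # "S" = SYN, "SA" = SYN+ACK, "A" = ACK, "R" = RST, "F" = FIN


def unanswered_syns(packets):
    """Return ((src, dst), first_syn_ts) for every SYN never answered by a SYN+ACK."""
    pending = {}
    for p in packets:
        if p.flags == "S":
            # Remember only the first SYN; later ones are retransmissions.
            pending.setdefault((p.src, p.dst), p.ts)
        elif p.flags == "SA":
            # The SYN+ACK travels in the opposite direction of the SYN.
            pending.pop((p.dst, p.src), None)
    return sorted(pending.items(), key=lambda item: item[1])


def resets(packets):
    """Return all packets that carry the RST flag."""
    return [p for p in packets if "R" in p.flags]


SAMPLE = [
    Pkt(0.0, "10.0.0.1:40001", "10.0.0.2:5060", "S"),
    Pkt(0.1, "10.0.0.2:5060", "10.0.0.1:40001", "SA"),
    Pkt(0.2, "10.0.0.1:40001", "10.0.0.2:5060", "A"),
    Pkt(1.0, "10.0.0.1:40002", "1.1.1.1:5060", "S"),
    Pkt(4.0, "10.0.0.1:40002", "1.1.1.1:5060", "S"),  # retransmitted SYN
]

if __name__ == "__main__":
    for (src, dst), ts in unanswered_syns(SAMPLE):
        print(f"SYN from {src} to {dst} at t={ts} was never answered")
```

In the sample data the first handshake completes, while the SYN to 1.1.1.1:5060 is only retransmitted, which is the pattern to look for.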
I suspect that workers are trying to establish new TCP connections that run into a timeout. As TCP handling in Kamailio is blocking, the TCP workers are then still busy when new requests are received.
Note: By "TCP receiver" I mean the single process which handles incoming TCP SIP messages and forwards the messages to the TCP worker processes (the number of TCP workers is configured with tcp_children). Thus, IMO the warning message in the log file is wrong, because it talks about TCP receivers although the TCP workers are meant.
You could also adjust the various TCP timeout settings to shorter values, but I recommend using the default values (large timeouts), as this will help you to identify the problem.
regards klaus
Regards
Users mailing list Users@lists.kamailio.org http://lists.kamailio.org/cgi-bin/mailman/listinfo/users
Klaus Darilion wrote:
Then analyze the capture file: Are there "strange" SYN packets trying to establish a connection which does not work? Are there other TCP errors (RST, FIN ...)?
I suspect that workers are trying to establish new TCP connections that run into a timeout. As TCP handling in Kamailio is blocking, the TCP workers are then still busy when new requests are received.
I've checked the traces and cannot see any suspicious SYNs etc. Everything looks fine. But I noticed that the "no free tcp receiver" message comes along with "CRITICAL:core:handle_io: empty fd map" messages. Nevertheless, I have no clue at all...
Just for clarification: if I have 5 TCP children, then only 4 children are worker processes. If a SIP peering partner with 4 different SIP servers has a network split, all my worker children are blocked and wait for TCP timeouts. In the meantime, no calls to anywhere else are possible. Right?
Thanks.
Gawith wrote:
Just for clarification: if I have 5 TCP children, then only 4 children are worker processes. If a SIP peering partner with 4 different SIP servers has a network split, all my worker children are blocked and wait for TCP timeouts. In the meantime, no calls to anywhere else are possible. Right?
It depends. For example:
- Kamailio receives a request which should be sent to sip:user@1.1.1.1;transport=tcp: Kamailio will open a TCP connection and write to the socket in blocking mode. That means that one single worker process (either a UDP worker if the request was received via UDP, or a TCP worker if the request was received via TCP) is blocked until the TCP ACK is received from the destination and the write to the socket returns. Then this TCP connection is moved to the connection pool and the worker process is free for other tasks.
If the server on 1.1.1.1 does not answer TCP packets, this single process is blocked. For example, if you send 4 INVITE requests to the proxy addressing 4 servers which do not exist, then 4 workers will block until the TCP handshake timeout triggers.
But if everything is fine, you can handle hundreds of TCP connections with a few worker processes: the requests are processed one after the other, and the TCP connections which are currently not needed are kept in the TCP connection pool in the meantime.
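To get a feeling for the effect, here is a small back-of-the-envelope simulation. It is a simplified model, not Kamailio code: every request occupies one worker for a fixed time, requests are served in arrival order by whichever worker becomes free first, and the 30 second blocking time for an unreachable server is an assumed value, not a Kamailio default.

```python
import heapq


def simulate(num_workers, jobs):
    """Return how long each job waits for a free worker.

    jobs is a list of (arrival_time, busy_time) tuples sorted by arrival_time.
    """
    # Min-heap with the time at which each worker becomes free again.
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)
    waits = []
    for arrival, busy in jobs:
        earliest_free = heapq.heappop(free_at)
        start = max(arrival, earliest_free)
        waits.append(start - arrival)
        heapq.heappush(free_at, start + busy)
    return waits


# Assumed numbers: 4 INVITEs to unreachable servers block for 30 s each,
# then one INVITE to a healthy server arrives one second later.
BLOCKED = [(0.0, 30.0)] * 4
HEALTHY = [(1.0, 0.01)]

if __name__ == "__main__":
    for workers in (4, 5):
        waits = simulate(workers, BLOCKED + HEALTHY)
        print(f"{workers} workers: healthy request waits {waits[-1]} s")
```

With 4 workers the healthy request has to wait until the first blocked worker times out; with a fifth worker it is served immediately. More workers only raise the number of dead destinations you can tolerate at once; they do not remove the blocking itself.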
regards klaus
Maybe you can send me a "tcpdump -s 0 -w tcpproblem.pcap -i any" dump of the problem.
regards klaus