Thanks very much for the quick replies, Alex and Brandon.
The main reason I'm hitting a bottleneck is that my architecture is not optimal.
I have a number of edge proxies which communicate with all my clients.
The clients are usually distributed pretty evenly across all the proxies.
On those proxies, client TCP connections are distributed pretty evenly across the TCP
workers, so that's all fine.
The problem occurs when the edge proxies communicate with the central registrar.
When that happens, the SIP messages from a very large number of clients are all
multiplexed onto a single TCP socket connection between the proxy and the registrar.
That narrowing results in the registrar worker processes not being utilized efficiently.
For example, say I have 3 edge proxies, and my registrar has 8 cores and 8 worker
processes.
I want to spread the message processing across all 8 workers, but I'm only able to
utilize 3, because all the messages from a given edge proxy are being processed by a
single TCP worker on the registrar.
The long-term solution is to change my architecture.
I should not be doing all the expensive work on a single central registrar.
I'm planning to move to a mesh architecture, where all servers are the same, and the
registration processing is divided amongst all the servers.
That design makes sense, but it's more complex and will require more information
sharing amongst the servers.
I'm looking for a short-term improvement that will give me some breathing room, which
led me to look at the ASYNC module.
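For reference, what I had in mind with ASYNC is roughly the following sketch on the
registrar side (untested; the route name and worker count are my own guesses, not a
recommendation):

```
# kamailio.cfg on the registrar -- sketch only
loadmodule "async.so"

# dedicated pool of async workers, separate from the TCP receive workers
async_workers=8

request_route {
    if (is_method("REGISTER")) {
        # hand the message off to an async worker, so the single TCP
        # worker reading the proxy's socket is freed up immediately
        async_task_route("REG_PROCESS");
        exit;
    }
    # ... other routing logic ...
}

route[REG_PROCESS] {
    # the expensive registration processing runs here, in an async worker
    save("location");
}
```

The idea being that the TCP worker only parses and enqueues, and the real work is
spread across the async worker pool rather than serialized on one process.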
> ...one alternate suggestion that could help spread load
> on actual Kamailio TCP workers is by firing up additional workers on alternate ports.
That makes sense. I could have my central registrar listen on several different ports,
and perhaps use the dispatcher module on my edge proxies to divide the traffic evenly
across those ports. I will look into that.
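Sketching it out for my own notes (hostnames and port numbers here are made up, and I
haven't tested any of this), the registrar would just add extra listen sockets:

```
# kamailio.cfg on the registrar -- sketch only
listen=tcp:0.0.0.0:5070
listen=tcp:0.0.0.0:5071
listen=tcp:0.0.0.0:5072
```

and each edge proxy would round-robin across them with dispatcher:

```
# /etc/kamailio/dispatcher.list on each edge proxy
# setid  destination
1 sip:registrar.example.com:5070;transport=tcp
1 sip:registrar.example.com:5071;transport=tcp
1 sip:registrar.example.com:5072;transport=tcp
```

```
# kamailio.cfg on each edge proxy -- sketch only
loadmodule "dispatcher.so"
modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")

request_route {
    # ... existing routing logic ...
    # algorithm "4" = round-robin over set 1 (the three registrar ports)
    if (!ds_select_dst("1", "4")) {
        send_reply("503", "Service Unavailable");
        exit;
    }
    t_relay();
}
```

Since each proxy-to-registrar TCP connection pins to one TCP worker, three proxies
times three ports should give nine connections to spread across the eight workers,
which is at least better than three.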
Thanks again.
-Cody