We're seeing an issue where some WebRTC clients are not receiving inbound calls. Kamailio logs show the following errors:
WARNING: {1 609669922 INVITE BW152025538120824-895739618@10.103.43.11}: <core> [core/msg_translator.c:3007]: via_builder(): TCP/TLS connection (id: 0) for WebSocket could not be found
ERROR: {1 609669922 INVITE BW152025538120824-895739618@10.103.43.11}: <core> [core/msg_translator.c:2086]: build_req_buf_from_sip_req(): could not create Via header
ERROR: {1 609669922 INVITE BW152025538120824-895739618@10.103.43.11}: tm [t_fwd.c:484]: prepare_new_uac(): could not build request
ERROR: {1 609669922 INVITE BW152025538120824-895739618@10.103.43.11}: tm [t_fwd.c:1764]: t_forward_nonack(): failure to add branches
DEBUG: {1 609669922 INVITE BW152025538120824-895739618@10.103.43.11}: tm [t_funcs.c:358]: t_relay_to(): t_forward_nonack returned error -2 (-2)
DEBUG: {1 609669922 INVITE BW152025538120824-895739618@10.103.43.11}: tm [t_funcs.c:376]: t_relay_to(): -2 error reply generation delayed
It seems to happen after the client has been connected to Kamailio for more than 24 hours. The socket connection details look correct in the external DB and in the output of ws.dump, core.tcp_list, and ul.dump. Restarting Kamailio, or reloading the WebRTC client so that it opens a new WebSocket connection, clears the issue. However, we haven't been able to determine exactly when or why a client ends up in this state. While a client is in this state, its REGISTER requests are still handled successfully, it can still make outbound calls, and its TCP keepalives are still working.
I'm still trying to debug this with GDB so I can see the value of send_info in https://github.com/kamailio/kamailio/blob/master/src/core/msg_translator.c#L..., but I didn't have any luck with multiple child processes running, and in the meantime my client reloaded and is working again. I've restarted Kamailio with a single child process, so when it starts failing again I may have better luck with GDB.
Any thoughts on what might be causing clients to get into this state? I'm happy to share any additional information that might help. Thanks!