My websocket TLS server is full of these kinds of messages:

Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19701]: NOTICE: <script>:
  http:217.120.x.x:55386: WS connection closed
...
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: WARNING: <core>
  [msg_translator.c:2506]: via_builder(): TCP/TLS connection (id: 0) for WebSocket could not be found
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: <core>
  [msg_translator.c:1722]: build_req_buf_from_sip_req(): could not create Via header
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: tm
  [t_fwd.c:527]: prepare_new_uac(): could not build request
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: tm
  [t_fwd.c:1773]: t_forward_nonack(): ERROR: t_forward_nonack: failure to add branches
Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: sl
  [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: No error (2/SL)
(repeat these last errors for a bunch of attempted NOTIFY forwards)

The route block does basically something like this:

    # add_contact_alias(); # only for requests from the outside
    loose_route();
    if (!t_relay()) {
            sl_reply_error();
    }

The problem arises here:

}else if (send_info->proto==PROTO_WS){
...
    con = tcpconn_get(send_info->id, &ip, port, from, 0)
...
    if (con == NULL) {
        LM_WARN("TCP/TLS connection (id: %d) for WebSocket could not be found\n", send_info->id);

The NULL failure status gets returned up to prepare_new_uac in t_fwd.c:

    shbuf=build_req_buf_from_sip_req( i_req, &len, dst, BUILD_IN_SHM);
    if (!shbuf) {
            LM_ERR("could not build request\n");
            ret=E_OUT_OF_MEM;
            goto error01;
    }

At this point, ser_error will become E_OUT_OF_MEM while it should be something like E_SEND.

And E_OUT_OF_MEM gets translated to 500 No Error because we're not running in DEBUG mode.

What causes the connection to drop in the first place, you ask?

18:10:18.690738 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [S], seq 1323983240, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
18:10:18.690863 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [S.], seq 4077761781, ack 1323983241, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 3], length 0
18:10:18.710846 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 1, win 256, length 0
18:10:18.808751 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [P.], seq 1:246, ack 1, win 256, length 245
...
18:10:19.233415 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], ack 31348, win 5126, length 0
18:10:26.489764 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [P.], seq 31348:32473, ack 34578, win 255, length 1125
...
18:10:26.501409 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [P.], seq 42255:42916, ack 46010, win 5046, length 661
18:10:26.527755 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 36993, win 252, length 0
18:10:26.527860 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], seq 42916:47296, ack 46010, win 5278, length 4380
18:10:26.527888 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [FP.], seq 47296:48663, ack 46010, win 5278, length 1367
18:10:26.529179 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 40501, win 254, length 0
18:10:26.529200 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 42916, win 251, length 0
18:10:26.547276 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 48664, win 251, length 0
18:10:26.549712 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [F.], seq 46010, ack 48664, win 251, length 0
18:10:26.549750 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], ack 46011, win 5278, length 0

Where you see that the FIN is initiated by 195.35.x.x which is the Kamailio websocket server.

The cause (probably) is the WS client closing the connection. In this case after re-subscribing with Expires:0. The presence server attempts to reply with a bunch of NOTIFYs with Subscription-State: terminated;reason=timeout but they bounce on the broken connection. If Kamailio would return a nice "477 Unfortunately error on sending to next hop occurred" it'd be prettier.

Getting less "error" messages (a total of 6 per expired/unsubscribed subscription) after this error --which is apparently very common -- would be beneficial too.

As for fixing:

We could change the via_builder to set ser_error (and check that in build_req_buf_from_sip_req), or
add error-code-out-parameters to all calls from build_req_buf_from_sip_req and down. I'm not sure if either is the best way.

As for the excessive error reporting, would looking at ser_error before printing (another) error be an acceptable fix?

Cheers,
Walter Doekes
OSSO B.V.

—
Reply to this email directly or view it on GitHub.