[sr-dev] [kamailio] websocket: `tcpconn_get()` error gets turned into `E_OUT_OF_MEM` which becomes "500 No Error" (#47)

Walter Doekes notifications at github.com
Mon Jan 19 14:35:18 CET 2015


My websocket TLS server is full of these kinds of messages:

    Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19701]: NOTICE: <script>:
      http:217.120.x.x:55386: WS connection closed
    ...
    Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: WARNING: <core>
      [msg_translator.c:2506]: via_builder(): TCP/TLS connection (id: 0) for WebSocket could not be found
    Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: <core>
      [msg_translator.c:1722]: build_req_buf_from_sip_req(): could not create Via header
    Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: tm
      [t_fwd.c:527]: prepare_new_uac(): could not build request
    Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: tm
      [t_fwd.c:1773]: t_forward_nonack(): ERROR: t_forward_nonack: failure to add branches
    Jan 18 18:10:26 ws0 /usr/sbin/kamailio[19689]: ERROR: sl
      [sl_funcs.c:387]: sl_reply_error(): ERROR: sl_reply_error used: No error (2/SL)
    (repeat these last errors for a bunch of attempted NOTIFY forwards)

The route block does basically something like this:

        # add_contact_alias(); # only for requests from the outside
        loose_route();
        if (!t_relay()) {
                sl_reply_error();
        }

The problem arises here:

    }else if (send_info->proto==PROTO_WS){
    ...
        con = tcpconn_get(send_info->id, &ip, port, from, 0)
    ...
        if (con == NULL) {
            LM_WARN("TCP/TLS connection (id: %d) for WebSocket could not be found\n", send_info->id);

The NULL failure status gets returned up to `prepare_new_uac` in `t_fwd.c`:

        shbuf=build_req_buf_from_sip_req( i_req, &len, dst, BUILD_IN_SHM);
        if (!shbuf) {
                LM_ERR("could not build request\n");
                ret=E_OUT_OF_MEM;
                goto error01;
        }

At this point, ser_error will become `E_OUT_OF_MEM` while it should be something like `E_SEND`.

And `E_OUT_OF_MEM` gets translated to `500 No Error` because we're not running in DEBUG mode.

What causes the connection to drop in the first place, you ask?

    18:10:18.690738 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [S], seq 1323983240, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
    18:10:18.690863 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [S.], seq 4077761781, ack 1323983241, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 3], length 0
    18:10:18.710846 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 1, win 256, length 0
    18:10:18.808751 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [P.], seq 1:246, ack 1, win 256, length 245
    ...
    18:10:19.233415 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], ack 31348, win 5126, length 0
    18:10:26.489764 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [P.], seq 31348:32473, ack 34578, win 255, length 1125
    ...
    18:10:26.501409 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [P.], seq 42255:42916, ack 46010, win 5046, length 661
    18:10:26.527755 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 36993, win 252, length 0
    18:10:26.527860 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], seq 42916:47296, ack 46010, win 5278, length 4380
    18:10:26.527888 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [FP.], seq 47296:48663, ack 46010, win 5278, length 1367
    18:10:26.529179 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 40501, win 254, length 0
    18:10:26.529200 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 42916, win 251, length 0
    18:10:26.547276 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [.], ack 48664, win 251, length 0
    18:10:26.549712 IP 217.120.x.x.55386 > 195.35.x.x.443: Flags [F.], seq 46010, ack 48664, win 251, length 0
    18:10:26.549750 IP 195.35.x.x.443 > 217.120.x.x.55386: Flags [.], ack 46011, win 5278, length 0

Where you see that the FIN is initiated by `195.35.x.x` which is the Kamailio websocket server.

The cause (probably) is the WS client closing the connection. In this case after re-subscribing with Expires:0. The presence server attempts to reply with a bunch of NOTIFYs with `Subscription-State: terminated;reason=timeout` but they bounce on the broken connection. If Kamailio would return a nice "477 Unfortunately error on sending to next hop occurred" it'd be prettier.

Getting less "error" messages (a total of 6 *per* expired/unsubscribed subscription) after this error --which is apparently very common -- would be beneficial too.

As for fixing:
- We could change the `via_builder` to set `ser_error` (and check that in `build_req_buf_from_sip_req`), or
- add error-code-out-parameters to all calls from `build_req_buf_from_sip_req` and down.
I'm not sure if either is the best way.

As for the excessive error reporting, would looking at `ser_error` before printing (another) error be an acceptable fix?

Cheers,
Walter Doekes
OSSO B.V.

---
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/47
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20150119/712122a1/attachment.html>


More information about the sr-dev mailing list