### Description
Thank you for implementing #2413, I'm looking forward to using it.
I was trying it on a dev system. It works fine when the same TLS client needs to be selected for ALL connections.
I'm having an issue when connections are expected to alternate between multiple configured TLS clients. When configured as below, and the event_route alternates between server ids (as in the logs below), the connection always uses the TLS client (TLSc) with one of the server ids, e.g. "domain-02".
It looks like a race between setting the server id in event_route and a "thread" that starts the TLS client. In my observations only one TLS client is used.
Expected: each outbound connection uses the TLS client that was set by tls_set_connect_server_id().
```
event_route[tm:local-request] {
    if (is_method("OPTIONS")) {
        $var(contact) = "Contact: <sip:" + $fd + ":5061;transport=tls>\r\n";
        append_hf("$var(contact)");

        if ($fd == "domain-01") {
            tls_set_connect_server_id("domain-01");
            xlog("L_INFO", "ID=$ci|tls_set_connect_server_id(domain-01)\n");
        } else if ($fd == "domain-02") {
            tls_set_connect_server_id("domain-02");
            xlog("L_INFO", "ID=$ci|tls_set_connect_server_id(domain-02)\n");
        }
    }
}
```
Dispatcher configured as:
```
loadmodule "dispatcher.so"
modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")
modparam("dispatcher", "ds_probing_mode", 1)
modparam("dispatcher", "ds_ping_interval", 60)
```
With records like:
```
1 sip:sip1.host.com;transport=tls 0 1 socket=tls:111.222.233.11:5061;ping_from=sip:my-domain-01.com
1 sip:sip2.host.com;transport=tls 0 2 socket=tls:111.222.233.12:5061;ping_from=sip:my-domain-01.com
1 sip:sip3.host.com;transport=tls 0 3 socket=tls:111.222.233.13:5061;ping_from=sip:my-domain-01.com
2 sip:sip1.host.com;transport=tls 0 1 socket=tls:111.222.233.21:5061;ping_from=sip:my-domain-02.com
2 sip:sip2.host.com;transport=tls 0 2 socket=tls:111.222.233.22:5061;ping_from=sip:my-domain-02.com
2 sip:sip3.host.com;transport=tls 0 3 socket=tls:111.222.233.23:5061;ping_from=sip:my-domain-02.com
```
#### Log Messages
```
Jun 3 11:57:44 INFO: <script>: ID=4eadda397f10fcb1-948@1.2.3.4|tls_set_connect_server_id(domain-02)
Jun 3 11:57:44 INFO: <script>: ID=4eadda397f10fcb2-948@1.2.3.4|tls_set_connect_server_id(domain-01)
Jun 3 11:57:44 INFO: <script>: ID=4eadda397f10fcb3-948@1.2.3.4|tls_set_connect_server_id(domain-02)
Jun 3 11:57:44 INFO: <script>: ID=4eadda397f10fcb4-948@1.2.3.4|tls_set_connect_server_id(domain-01)
Jun 3 11:57:44 INFO: <script>: ID=4eadda397f10fcb5-948@1.2.3.4|tls_set_connect_server_id(domain-02)
Jun 3 11:57:44 INFO: <script>: ID=4eadda397f10fcb6-948@1.2.3.4|tls_set_connect_server_id(domain-01)
```
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.5.0 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 7.5.0
```
* **Operating System**:
```
Linux dev03 4.15.0-143-generic #147-Ubuntu SMP Wed Apr 14 16:10:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 18.04.5 LTS
```
I was halfway through writing a less detailed version of a very similar issue I am having too: I need to send OPTIONS presenting a specific server name and cert.
Same environment as well (Ubuntu 18.04, Kamailio 5.5.0).
I have noticed that the xavp SNI settings seem to only work on one thread as well if you loop the message back and set them via request_route.
@arkadiam Pretty sure we are both trying to do the same thing judging by your dispatcher and routing config: multi-domain telephony integration with a cloud comms platform?
It's frustrating that SNI is being flaky, as it works very nicely if you have a single cert with multiple SANs defined, but that is less than ideal from a management point of view (though it doesn't seem to have the performance hit that loading multiple certs has, see #2312).
For the sake of completeness, my virtually identical config is as follows:
```
event_route[tm:local-request] {
    if(is_method("OPTIONS") && $ru =~ "vendor.com") {
        xlogl("L_INFO", "Dispatcher Pinging ruri: $ru turi: $tu furi: $fu");
        append_hf("Contact: sip:$fd:5061;transport=tls\r\n");
        tls_set_connect_server_id($fd);
        xlogl("L_INFO", "SNI ID: $fd");
    }
}
```
Can you set the `debug=3` global parameter, reproduce the scenario and attach here all log messages printed by kamailio to syslog (there should be a lot of `DEBUG` messages)? It will help to troubleshoot and see what happens.
Certainly. I spent yesterday debugging and going through the logs, and I'm thinking the issue may be the (ab)use of dispatcher to ping the same destination presenting as different domains. The destinations are intentionally the same in the sample dispatcher.list in the first post:
```
1 sip:sip1.host.com;transport=tls 0 1 socket=tls:111.222.233.11:5061;ping_from=sip:my-domain-01.com
1 sip:sip2.host.com;transport=tls 0 2 socket=tls:111.222.233.12:5061;ping_from=sip:my-domain-01.com
1 sip:sip3.host.com;transport=tls 0 3 socket=tls:111.222.233.13:5061;ping_from=sip:my-domain-01.com
2 sip:sip1.host.com;transport=tls 0 1 socket=tls:111.222.233.21:5061;ping_from=sip:my-domain-02.com
2 sip:sip2.host.com;transport=tls 0 2 socket=tls:111.222.233.22:5061;ping_from=sip:my-domain-02.com
2 sip:sip3.host.com;transport=tls 0 3 socket=tls:111.222.233.23:5061;ping_from=sip:my-domain-02.com
```
What we are both trying to achieve, I believe, is integration with Teams using SNI to provide connectivity to multiple domain names. I have been running a solution in production for a couple of years now based on a single cert with lots of SANs, but that is becoming a management nightmare, and using a cert per domain served is far preferable. DR (Direct Routing) has a primary signalling domain and two failover domains (hence dispatcher is ideal for this). Connectivity is monitored by mutually verifying the TLS connection: the server (kamailio) must present a valid cert whose SN or SAN matches the SNI server name expected by the client (DR SBC). For inbound traffic this works fine, but on outbound traffic the same rules apply, meaning that each domain we wish to serve has to have its own connection, initialised to use the corresponding [client:any] entry with the appropriate server name configured. DR's second-level connection state involves each end pinging the other, but DR only sends pings on receipt of a valid ping from kamailio, which doesn't happen because my-domain-02.com presents as my-domain-01.com.
I added some additional LM_DBG statements to the TLS module and compiled with the TLS_WR_DEBUG and TLS_RD_DEBUG defines enabled to help track the flow through the code. Initially I noticed that calls to `ksr_tls_set_connect_server_id` were not always setting `_ksr_tls_connect_server_id`, so I commented out the `if(_ksr_tls_connect_server_id.len>=srvid->len)` block and the value was then set consistently. However, I was seeing the TLS connection initialise only 3 times, using the server ID set for the first dispatcher set and never the one for the second. Even though I could clearly see the server id being set, the logs jump straight to writing over the existing connection. This leads me to believe that, as the destination (and probably the sending port) is the same for each dispatcher address set, the connection is being reused (#1107 seems to be an accurate description of what I have observed). That makes me wonder whether an existing matching connection needs to be closed when the server id is set by `ksr_tls_set_connect_server_id`, or whether an additional connection matching mode could be implemented which takes the From address into consideration while matching.
I will revert my changes and post the log output here
Debug output attached; in the full version the dispatcher activity starts at about line 8050. [kamailio-debug-dispatcher-sni-full.log](https://github.com/kamailio/kamailio/files/6623779/kamailio-debug-dispatcher...) [kamailio-debug-dispatcher-sni-dispatcher-activity.log](https://github.com/kamailio/kamailio/files/6623780/kamailio-debug-dispatcher...)
As I understood from the previous comment, it is the same target address, but you want different TLS connections with different certs.
SIP specifications decouple the transport layer from SIP traffic: there is no relation between a SIP request/transaction/dialog and the transport layer (in this case the TLS connection). Moreover, the specs recommend connection reuse, which is what is done here.
Practically, if the target is the same, then kamailio creates a single connection to it and uses it for all traffic sent to that target, irrespective of the SIP From headers, different dispatcher groups, and so on.
Looking at the dispatcher records provided by @arkadiam, it seems to be the same case for him.
You can try to have different listen sockets for the domains; if you have only a few of them, it should be OK.
Otherwise, the tcp (tls) connection management code has to be changed, which may impact several components. It has to be coded in C.
Overall, it does not seem to be an issue related to `ksr_tls_set_connect_server_id()`. If no new information shows up soon to indicate a different conclusion, this issue can be closed.
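To make the connection reuse described above concrete, here is a minimal toy model in Python. It is not Kamailio code: the class `DestOnlyPool`, the `Conn` record and the placeholder address are all hypothetical, and the real lookup in `tcp_main.c` considers more than this sketch does. It only assumes that an existing connection is found by destination, so the server id matters solely when a new connection is opened.

```python
# Toy model, not Kamailio code: a connection table keyed only by destination.
from dataclasses import dataclass


@dataclass
class Conn:
    dest: tuple       # (ip, port) of the remote peer
    server_id: str    # TLS client profile chosen when the connection was opened


class DestOnlyPool:
    """Hypothetical pool that reuses any connection to the same destination."""

    def __init__(self):
        self._by_dest = {}

    def get(self, dest, server_id):
        conn = self._by_dest.get(dest)
        if conn is None:
            # The server id is only consulted here, when a connection is created.
            conn = Conn(dest, server_id)
            self._by_dest[dest] = conn
        return conn


pool = DestOnlyPool()
target = ("198.51.100.10", 5061)   # placeholder address for illustration

first = pool.get(target, "domain-01")
second = pool.get(target, "domain-02")   # same destination, different server id

print(second.server_id)   # prints "domain-01": the existing connection is reused
```

Under this assumption, whichever server id happens to open the connection first wins, which would match the observation that only one TLS client profile is ever used.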
I would agree it's a connection reuse issue; #1107 is probably a more relevant discussion to have.
As an experiment I tried a modified version of tcp_main.c with the connection lookup modified to not return an existing connection and the expected behaviour was observed.
Thinking about it further, the ideal solution from my point of view would be a pair of config values: one to enable an enhanced matching mechanism, and a complementary list of IPs which need to be outbound-SNI aware, so that the TLS profile is taken into account when a configured IP is matched as the destination. Unfortunately this would be non-trivial: correct me if I'm wrong, but the TLS profile is attached after the connection is created, so is it a chicken-and-egg problem? I agree this isn't majority use case territory, but it would be invaluable for offering hosted services and would allow me to keep the rest of the architecture and network config the same. Surely outbound connections with different server names specified should be treated as distinct connections even if the destination is the same?
The issue I have with aliases is that I would swap the admin burden of adding SANs to the cert and doing a tls.reload for having to restart the service, and I would increase the number of open ports at the edge. Basically it's not scalable enough. I can just about cope with the slow TLS reload when more than one cert is loaded (a trivial load test with 30 certs took over 60 secs to reload). The whole impetus for trying to use SNI with kamailio is to offer scalable, automatable provisioning for Direct Routing customers which isn't restricted by the number of SANs defined on the cert (think of the simplicity of adding a vhost in nginx). Currently provisioning can be done live by expanding a cert, updating the dispatcher with new entries and reloading tls and dispatcher with kamcmd.
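As a rough illustration of the "pair of config values" idea, the Python sketch below models a pool where the server id becomes part of the lookup key only for destinations on a configured list. Everything here is hypothetical: `enhanced_matching` and `sni_aware_ips` are invented names rather than Kamailio parameters, and the sketch assumes the server id is already known at lookup time, which is exactly the chicken-and-egg point raised above.

```python
# Toy model, not Kamailio code: the TLS profile is part of the match key
# only for destinations on a configured list.
from dataclasses import dataclass, field


@dataclass
class Pool:
    enhanced_matching: bool = False                   # hypothetical switch
    sni_aware_ips: set = field(default_factory=set)   # hypothetical IP list
    conns: dict = field(default_factory=dict)

    def _key(self, dest, server_id):
        ip, _port = dest
        if self.enhanced_matching and ip in self.sni_aware_ips:
            return (dest, server_id)   # one connection per server id
        return (dest, None)            # default: match on destination only

    def get(self, dest, server_id=None):
        key = self._key(dest, server_id)
        if key not in self.conns:
            self.conns[key] = {"dest": dest, "server_id": server_id}
        return self.conns[key]


pool = Pool(enhanced_matching=True, sni_aware_ips={"198.51.100.10"})
listed = ("198.51.100.10", 5061)   # placeholder addresses for illustration
other = ("203.0.113.5", 5061)

a = pool.get(listed, "domain-01")
b = pool.get(listed, "domain-02")   # distinct connection: listed IP, new server id
c = pool.get(listed, "domain-01")   # reused: same destination and server id
d = pool.get(other, "domain-01")
e = pool.get(other, "domain-02")    # reused: IP not listed, server id ignored
```

In this model, traffic to unlisted destinations keeps the current reuse behaviour, while listed destinations get one connection per server id.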
With your changes, does it create a new connection every time? Or are connections reused in some cases?
You can make a pull request with your code and we can analyze/review/comment there to see whether it can be integrated. The new behaviour should be guarded by a core parameter, because the current one follows the specs. Adding a new core parameter requires changing the lex/yacc files; I can do it if you are not familiar with them, just introduce a global variable and use it in the IF conditions.
@miconda it was a very naïve modification, so it just created a new connection every time and never reused one, which was a horrible solution.
Re-reading through tcp_main.c I had a spark of inspiration when I noticed that `tcpconn_rm` checked if `c->extra_data` had a value other than 0 to determine if `tls_tcpconn_clean` needed calling.
I have currently hacked an extra property to indicate "SNI force new connection" into the `tls_extra_data` struct, which gets set by calling `ksr_tls_set_connect_server_id`. An additional check is added to `_tcpconn_find` which checks if the connection is TLS or WSS and whether that `extra_data` property is 1. Now only requests which make a call to `tls_set_connect_server_id` force a new connection; otherwise connections are reused (in my limited testing so far). I haven't investigated whether the flags property might be more appropriate. One thing that occurs to me with this solution is that, other than the changes to the `tls_extra_data` struct, the behaviour could be explicitly triggered with an additional argument, as in `tls_set_connect_server_id(str serverId, bool forceNew)`, and thus wouldn't affect any existing behaviour?
I shall dig out the PR guidelines and get one raised ASAP. I suspect my fix is a bit too hairy a solution to make it in, but I'm happy to contribute to the discussion.
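The observable effect of the proposed `forceNew` argument can be sketched in Python as follows. This models only the behaviour described above, not the actual mechanism in `_tcpconn_find` or `tls_extra_data`; the class `ConnTable` and its `force_new` parameter are hypothetical names chosen for illustration.

```python
# Toy model, not Kamailio code: a per-request flag that bypasses connection lookup.
import itertools


class ConnTable:
    def __init__(self):
        self._ids = itertools.count(1)
        self.conns = []   # each entry: {"id", "dest", "server_id"}

    def _open(self, dest, server_id):
        conn = {"id": next(self._ids), "dest": dest, "server_id": server_id}
        self.conns.append(conn)
        return conn

    def get(self, dest, server_id=None, force_new=False):
        if not force_new:
            # Default behaviour: reuse the first connection to the same destination.
            for conn in self.conns:
                if conn["dest"] == dest:
                    return conn
        # No match, or the caller asked for a fresh connection.
        return self._open(dest, server_id)


table = ConnTable()
dest = ("198.51.100.10", 5061)   # placeholder address for illustration

plain1 = table.get(dest)                                  # opens connection 1
forced1 = table.get(dest, "domain-02", force_new=True)    # opens connection 2
plain2 = table.get(dest)                                  # reuses connection 1
forced2 = table.get(dest, "domain-02", force_new=True)    # opens connection 3
```

Note that in this simple form every forced request opens another connection, even when an earlier one was created with the same server id.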
You can add a new parameter to tls_set_connect_server_id(), but if the same `serverId` is provided, then I expect the same certs to be used (same client profile). If you want to have many connections with the same profile, then the 2nd parameter makes sense. Otherwise, there probably has to be some sort of global/module parameter to specify that, for TLS connections, matching has to be done on the TLS profile as well and not only on the destination address. Having mixed matching (with or without serverid) at the same time might bring in more complexity. Anyhow, it is up to you how you implement it; make a PR and we can discuss further there.
Closed #2760.