### Description
<!-- Explain what you did, what you expected to happen, and what actually happened. -->
When handling a `PUBLISH`, we call [handle_publish()](https://kamailio.org/docs/modules/5.5.x/modules/presence.html#presence.f.han...) and `NOTIFY`s are sent to all of the corresponding `active_watchers` (as expected). However, when a `NOTIFY` times out (`408`), the corresponding entry in the `active_watchers` table is _not_ deleted, even though we have `timeout_rm_subs` set to `1`. Furthermore, `NOTIFY`s are still being sent to `active_watchers` entries that have already expired (i.e. `expires < UNIX_TIMESTAMP()`), and when we run `kamcmd presence.cleanup`, no expired entries are removed from the `active_watchers` table.
We suspect that these symptoms are related: in each case, records aren't deleted when expected.
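To gauge how many stale rows are involved, a check along the following lines could be run against the shared database. This is only a sketch: it uses Python's built-in `sqlite3` with an in-memory table as a stand-in for MySQL, and the reduced column set (`callid`, `to_tag`, `from_tag`, `expires`) as well as the function names are assumptions for illustration, not the full `active_watchers` schema.

```python
import sqlite3
import time


def find_expired_watchers(conn, now=None):
    """Return (callid, to_tag, from_tag, expires) for rows past their expiry.

    Mirrors the `expires < UNIX_TIMESTAMP()` condition from the description.
    """
    now = int(time.time()) if now is None else now
    cur = conn.execute(
        "SELECT callid, to_tag, from_tag, expires "
        "FROM active_watchers WHERE expires < ? ORDER BY expires",
        (now,),
    )
    return cur.fetchall()


def make_demo_db():
    """Build an in-memory stand-in table (assumed, reduced column set)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE active_watchers ("
        "callid TEXT, to_tag TEXT, from_tag TEXT, expires INTEGER)"
    )
    conn.executemany(
        "INSERT INTO active_watchers VALUES (?, ?, ?, ?)",
        [
            ("call-a", "to-a", "from-a", 1000),  # expired when now=2000
            ("call-b", "to-b", "from-b", 3000),  # still valid when now=2000
        ],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    for row in find_expired_watchers(demo, now=2000):
        print(row)
```

Against the real MySQL database, the same `WHERE` clause would apply, with the driver's own placeholder style in place of `?`.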
### Troubleshooting
#### Reproduction
<!-- If the issue can be reproduced, describe how it can be done. -->
In our setup, we're using Kamailio as a "presence server" (via the [presence](https://kamailio.org/docs/modules/5.5.x/modules/presence.html), [presence_dialoginfo](https://kamailio.org/docs/modules/5.5.x/modules/presence_dialoginfo.html), and [presence_xml](https://kamailio.org/docs/modules/5.5.x/modules/presence_xml.html) modules). We're using [`subs_db_mode`](https://kamailio.org/docs/modules/5.5.x/modules/presence.html#presence.p.sub...) `3` (DB-only scheme) and we have multiple Kamailio instances connected to a shared database (MySQL `8.0.27`).
Everything else seems to be working as expected. However, as stale entries accumulate in the `active_watchers` table, we waste more and more time sending `NOTIFY`s to black holes. We generate a _lot_ of traffic, and waiting for the timeouts to hit is causing memory issues and backups.
Here are the relevant portions of our `kamailio.cfg` file:
```
# ----- presence params -----
modparam("presence", "db_table_lock_type", 0) # Disable locking; MySQL has issues when this is enabled.
modparam("presence", "db_update_period", -1) # Disable synchronization.
modparam("presence", "db_url", PRESENCE_DB_URL)
modparam("presence", "expires_offset", 60) # Force the client to send an UPDATE before the old PUBLISH expires.
modparam("presence", "max_expires", 1800)
modparam("presence", "min_expires", 1700)
modparam("presence", "publ_cache", 0) # Disable the PUBLISH cache since the database is shared.
modparam("presence", "server_address", "sip:$CLUSTER_DOMAIN_NAME:5060") # This becomes the value of the Contact header.
modparam("presence", "sip_uri_match", 1) # Use case insensitive URI matching.
modparam("presence", "subs_db_mode", 3) # Database-only scheme; everything is stored in the database.
modparam("presence", "notifier_processes", 0) # Caution! Under load a race condition can cause CSeq's to be reused.
modparam("presence", "timeout_rm_subs", 1)

# ----- presence_dialoginfo params -----
modparam("presence_dialoginfo", "force_single_dialog", 1) # Maybe not all phones support multiple "dialog" elements?
modparam("presence_dialoginfo", "force_dummy_dialog", 1) # Maybe not all phones support a null body?

# ----- presence_xml params -----
modparam("presence_xml", "db_url", PRESENCE_DB_URL)
modparam("presence_xml", "force_active", 1) # Skip permission/XCAP checks.
modparam("presence_xml", "force_dummy_presence", 1) # Default to a simple "open" status when presentity info is unavailable.

# ...

route[PRESENCE] {
    if (!is_method("PUBLISH|SUBSCRIBE")) {
        return;
    }

    if (!t_newtran()) {
        sl_reply_error();
        exit;
    }

    if (is_method("PUBLISH")) {
        handle_publish();
        t_release();
    } else if (is_method("SUBSCRIBE")) {
        handle_subscribe();
        t_release();
    }
    exit;
}
```
#### SIP Traffic
<!-- If the issue is exposed by processing specific SIP messages, grab them with ngrep or save in a pcap file, then add them next, or attach to issue, or provide a link to download them (e.g., to a pastebin site). -->
Here's a somewhat sanitized example (the message seems OK to us; however, the `Subscription-State: terminated; reason=timeout` does make us wonder--do we as the sender _know_ that the client is terminated/timed-out?):
```
2022/04/05 21:09:55.209846 10.21.3.12:5060 -> 10.31.0.226:6060
NOTIFY sip:SomeUser@192.168.86.24:54639;alias=123.21.125.232~54639~1 SIP/2.0
Via: SIP/2.0/UDP presence-w.staging.internal:5060;branch=z9hG4bK43ea.648a1952000000000000000000000000.0
To: sip:SomeOtherUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946;tag=2607596073
From: sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946;tag=69309ea83adcd977af8788878e9f31b3-42e32342
CSeq: 66 NOTIFY
Call-ID: 0_2607659559@192.168.86.24
Route: sip:10.31.0.226:6060;r2=on;lr;ftag=2607596073, sip:55.8.122.110;r2=on;lr;ftag=2607596073
Content-Length: 710
Max-Forwards: 70
Event: dialog
Contact: sip:presence-w.staging.internal:5060
Subscription-State: terminated;reason=timeout
Content-Type: application/dialog-info+xml

<?xml version="1.0"?>
<dialog-info xmlns="urn:ietf:params:xml:ns:dialog-info" version="66" state="full" entity="sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946">
  <dialog id="0_1364146118@192.168.1.244" call-id="0_1364146118@192.168.1.244" direction="initiator">
    <state>confirmed</state>
    <remote>
      <identity>sip:4355558565@9bfadf66-a77b-6a69-25f3-02d96d4aa945:5060</identity>
      <target uri="sip:4355558565@9bfadf66-a77b-6a69-25f3-02d96d4aa946:5060"/>
    </remote>
    <local>
      <identity>sip:SomeUser@9bfadf66-a77b-6a69-25f3-02d96d4aa946:5060</identity>
      <target uri="sip:SomeUser@123.130.50.202:58872"/>
    </local>
  </dialog>
</dialog-info>
```
### Possible Solutions
<!-- If you found a solution or workaround for the issue, describe it. Ideally, provide a pull request with a fix. -->
We didn't see any functions in the `presence` module that we could call directly to clean things up. One thought we had was to manually run some database commands from `event_route[presence:notify-reply]` (or in a `reply_route`). We've noticed that once the problematic entries are manually removed from the database, we no longer attempt to send `NOTIFY`s to the defunct destinations.
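The manual cleanup described above could look roughly like the sketch below. It is not a fix for the module: it uses Python's built-in `sqlite3` with an in-memory table in place of the shared MySQL database, and the reduced column set plus the helper names `delete_watcher` and `purge_expired` are assumptions made for illustration. Note also that the tags in a `NOTIFY` appear swapped relative to the original `SUBSCRIBE`, so the mapping to the stored `to_tag`/`from_tag` values should be checked against an actual row before deleting anything.

```python
import sqlite3


def delete_watcher(conn, callid, to_tag, from_tag):
    """Delete one subscription dialog, e.g. after its NOTIFY got a 408.

    Returns the number of rows removed.
    """
    cur = conn.execute(
        "DELETE FROM active_watchers "
        "WHERE callid = ? AND to_tag = ? AND from_tag = ?",
        (callid, to_tag, from_tag),
    )
    conn.commit()
    return cur.rowcount


def purge_expired(conn, now):
    """Delete every row whose `expires` timestamp is earlier than `now`."""
    cur = conn.execute("DELETE FROM active_watchers WHERE expires < ?", (now,))
    conn.commit()
    return cur.rowcount


def make_demo_db():
    """Build an in-memory stand-in table (assumed, reduced column set)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE active_watchers ("
        "callid TEXT, to_tag TEXT, from_tag TEXT, expires INTEGER)"
    )
    conn.executemany(
        "INSERT INTO active_watchers VALUES (?, ?, ?, ?)",
        [
            ("call-a", "to-a", "from-a", 3000),  # not expired, but unreachable
            ("call-b", "to-b", "from-b", 1000),  # expired when now=2000
            ("call-c", "to-c", "from-c", 3000),  # healthy subscription
        ],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print("removed after 408:", delete_watcher(demo, "call-a", "to-a", "from-a"))
    print("removed as expired:", purge_expired(demo, now=2000))
```

Deleting by `callid`, `to_tag`, and `from_tag` targets a single subscription dialog, while the expiry-based purge approximates what we expected `kamcmd presence.cleanup` to do.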
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.5.4 (x86_64/linux)
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLOCKLIST, HAVE_RESOLV_RES, TLS_PTHREAD_MUTEX_SHARED
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: unknown
compiled with gcc 10.2.1
```
* **Operating System**:
<!-- Details about the operating system, the type: Linux (e.g.,: Debian 8.4, Ubuntu 16.04, CentOS 7.1, ...), MacOS, xBSD, Solaris, ...; Kernel details (output of `lsb_release -a` and `uname -a`) -->
```
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 11 (bullseye)
Release:	11
Codename:	bullseye

$ uname -a
Linux ip-10-21-3-12 5.10.0-13-cloud-amd64 #1 SMP Debian 5.10.106-1 (2022-03-17) x86_64 GNU/Linux
```