Fixes #4503.
### Problem
Transactions leak when `drop;` executes in a `branch_route` and modules like `pua_dialoginfo` are loaded. The leaked transactions persist indefinitely in shared memory — no final reply is sent, no timer fires, and `wait_handler()` never frees the cell.
### Cause
`set_kr()` accumulates flags with `|=`, but the first cleanup check in `t_unref()` uses exact equality:
```c if(unlikely(kr == REQ_ERR_DELAYED)) { // fails when other flags are set ```
When `pua_dialoginfo` processes an INVITE, it calls `dialog_publish_multi()` inline during `DLGCB_CREATED`, which goes through `pua_send_publish()` → `tmb.t_request()` → `t_uac_prepare()` → `set_kr(REQ_FWDED)`. If the INVITE's `branch_route` then calls `drop;`, `t_relay_to()` ORs in `REQ_ERR_DELAYED`, making `_tm_kr = REQ_FWDED | REQ_ERR_DELAYED` (17).
In `t_unref()`: - Check 1: `17 == 16` → false - Check 2: `17 == 0` → false - Check 3: `(17 & 16)` true, but `(17 & ~(4|2|16|1))` = 0 → false
All three checks fail. The transaction is orphaned.
This affects any module that calls `t_request()` or `t_uac_prepare()` inline during INVITE processing — `pua_dialoginfo` is one common example.
### Fix
Change the check from exact equality to a bitwise test:
```c - if(unlikely(kr == REQ_ERR_DELAYED)) { + if(unlikely(kr & REQ_ERR_DELAYED)) { ```
The delayed-reply handler now fires whenever `REQ_ERR_DELAYED` is present, regardless of other flags accumulated by `set_kr()`.
This is safe because: - `REQ_ERR_DELAYED` is only set in `t_relay_to()` when `t_forward_nonack()` fails and the delayed-reply path is chosen — it is never set spuriously. - `kill_transaction()` already handles the case where a final reply was previously sent (`FL_FINAL_REPLY` check → `t_release_transaction()`).
The existing check-3 fallback (lines 2077-2084) becomes unreachable after this change but is harmless to leave in place.
### Workaround
On 6.1+ where `delayed_reply` is a runtime modparam, setting `modparam("tm", "delayed_reply", 0)` avoids the vulnerable code path. On ≤ 6.0.x, `TM_DELAYED_REPLY` is a compile-time `#define` (always on) and there is no workaround — only this patch fixes the leak.
### Test results
Ten-scenario matrix across 6.0.3, 6.1.0, and 6.2.0-dev0 master. 1000 INVITEs per run, `fr_timer=5000`, `branch_route` with unconditional `drop;`. Contamination triggered naturally by loading `dialog` + `pua` + `pua_dialoginfo` — no source modification beyond this patch.
| # | Version | Patch | `delayed_reply` | current | 5xx | SHM delta | Verdict | |---|---------|-------|:---------------:|--------:|----:|----------:|---------| | 1 | 6.0.3 | no | x | 431 | 0 | +7,150,272 | **LEAK** | | 2 | 6.0.3 | yes | x | 0 | 353 | 0 | OK | | 3 | 6.1.0 | no | 1 | 431 | 0 | +7,287,200 | **LEAK** | | 4 | 6.1.0 | no | 0 | 0 | 375 | 0 | OK | | 5 | 6.1.0 | yes | 1 | 0 | 407 | 0 | OK | | 6 | 6.1.0 | yes | 0 | 0 | 336 | 0 | OK | | 7 | 6.2.0-dev0 | no | 1 | 446 | 0 | +7,770,288 | **LEAK** | | 8 | 6.2.0-dev0 | no | 0 | 0 | 461 | 0 | OK | | 9 | 6.2.0-dev0 | yes | 1 | 0 | 448 | 0 | OK | | 10 | 6.2.0-dev0 | yes | 0 | 0 | 448 | 0 | OK |
`delayed_reply` = x: not available as a modparam (compile-time, always on). `current`: transactions remaining in hash table after test. `5xx`: 500 replies sent by the delayed-reply handler (nonzero = cleanup ran).
---
*Note: AI tooling was used to assist with the code analysis, reproduction, and test automation for this issue. All diagnosis, testing, and review was supervised by @NormB — the AI did not operate autonomously. The root cause identification, fix selection, and validation were guided and verified at each step.* You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/4644
-- Commit Summary --
* tm: use bitwise check for REQ_ERR_DELAYED in t_unref()
-- File Changes --
M src/modules/tm/t_lookup.c (2)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/4644.patch https://github.com/kamailio/kamailio/pull/4644.diff
gaaf left a comment (kamailio/kamailio#4644)
Please don't keep unreachable code, tm's code is already hard enough to grasp.
Is it really safe/wanted to omit **all** the extra checks from path 3 and now call `kill_transaction()` instead of just `t_release_transaction()` in those cases? If so, please explain why in the commit message and remove the dead code. Note that this code is 16+ years old, maybe the real cause is somewhere else?
Maybe 2812bd9c426d was wrong and the extra condition had to be handled in path 1? Like so: `if (unlikely(kr == REQ_ERR_DELAYED || (kr == (REQ_ERR_DELAYED | REQ_FWDED)))`
@NormB pushed 1 commit.
55ae00455a8772aa3997809b02f491315f860a97 tm: use bitwise check for REQ_ERR_DELAYED in t_unref()
NormB left a comment (kamailio/kamailio#4644)
Good points, thanks for the review.
**Dead code removal** -- agreed, updated patch removes the path 3 block entirely.
**kill_transaction() vs t_release_transaction()** -- safe. `kill_transaction()` checks `has_reply_route()` and `FL_FINAL_REPLY` first (t_funcs.c), and if a final reply was already sent, it degrades to `t_release_transaction()` -- identical to what path 3 did. When no final reply was sent (our case: all branches dropped), the UAC is waiting with no response, so generating a 500 is the correct behavior.
**On the explicit `||` alternative** -- the problem is wider than just `REQ_FWDED`. Because `set_kr()` uses `|=` on process-local state, any combination of flags can accumulate. The path 3 mask itself proves this:
```c (kr & ~(REQ_RLSD | REQ_RPLD | REQ_ERR_DELAYED | REQ_FWDED)) ```
This explicitly *excludes* those flags from triggering cleanup, so path 3 never fires for any of these combinations either:
- `ERR_DELAYED | FWDED` (17): `17 & ~23 = 0` → path 3 skips → **leak** - `ERR_DELAYED | RPLD` (18): `18 & ~23 = 0` → path 3 skips → **leak** - `ERR_DELAYED | RLSD` (20): `20 & ~23 = 0` → path 3 skips → **leak**
Path 3 only ever caught `REQ_EXIST` contamination (`kr & ~23 != 0`), which is unlikely in practice. The `||` approach would fix one known case; the `&` test handles all of them and is robust against future flag additions.
Tested the updated patch (bitwise `&` + path 3 removed) on 6.2.0-dev0 master (4767914):
| Test | Patch | current | 5xx | freed | Verdict | |------|-------|---------|-----|-------|---------| | Baseline (no patch) | no | 500 | 0 | 500/1000 | **LEAK** | | Updated patch | yes | **0** | 500 | 2000/2000 | **OK** | | Updated patch + `delayed_reply=0` | yes | **0** | 500 | 2000/2000 | **OK** |
Same results as the original 10-test matrix across 6.0.3/6.1.0/master -- removing path 3 has no effect since it was already unreachable after the `&` change.
Commit updated with path 3 removed and expanded commit message.
NormB left a comment (kamailio/kamailio#4644)
One more note on gaaf's point about whether the real cause is somewhere else -- he's right that the deeper architectural issue is `set_kr()` using `|=` to accumulate flags across unrelated TM operations within the same process. Fixing that properly would mean reworking every `set_kr()` caller (or scoping `_tm_kr` per-transaction instead of per-process), which is a much larger change.
The `&` check is the minimal defensive fix at the decision point -- it tolerates flag contamination from inline TM operations rather than trying to prevent it.
henningw left a comment (kamailio/kamailio#4644)
These tool generated text blocks are really hard to read. Just compare your last comment to your previous one, @NormB. It would be better if it would be actually you, who is commenting.
kill_transaction() vs t_release_transaction() -- safe. kill_transaction() checks has_reply_route() and FL_FINAL_REPLY first (t_funcs.c), and if a final reply was already sent,[..]
Also there is no has_reply_route() function in the whole tm module:
``` henning@app01.dev:~/repositories/kamailio/src/modules/tm$ ack has_reply_route henning@app01.dev:~/repositories/kamailio/src/modules/tm$ ```
NormB left a comment (kamailio/kamailio#4644)
I thought the change was originally a 1 character conditional update and the second just removed dead code.
Will go back and look at this again.
NormB left a comment (kamailio/kamailio#4644)
Sorry about that, sloppy on my part. I'll keep it in my own words going forward. Correction on the earlier comment: there's no `has_reply_route()` — I meant the `FL_FINAL_REPLY` check at the top of `kill_transaction()` (t_funcs.c:166).
gaaf left a comment (kamailio/kamailio#4644)
I'm not familiar with that part of tm. I'm just worried about possible side-effects of changing long-standing code for multiple cases while fixing only a single failure case. Given the path 3 conditions, I do think the original `==` in path 1 was a deliberate choice and it should be checked very carefully if all path 3 conditions can be merged in path 1. That's why I suggested changing it only for the case at hand (`REQ_FWDED`) and evaluating against 2812bd9c426. Unfortunately, 2812bd9c426 does not provide a explanation on why it was added to path 3 instead of 1.
NormB left a comment (kamailio/kamailio#4644)
I'm not familiar with that part of tm. I'm just worried about possible side-effects of changing long-standing code for multiple cases while fixing only a single failure case. Given the path 3 conditions, I do think the original `==` in path 1 was a deliberate choice and it should be checked very carefully if all path 3 conditions can be merged in path 1. That's why I suggested changing it only for the case at hand (`REQ_FWDED`) and evaluating against [2812bd9](https://github.com/kamailio/kamailio/commit/2812bd9c426d7a186bdc9b1d5fc47663...). Unfortunately, [2812bd9](https://github.com/kamailio/kamailio/commit/2812bd9c426d7a186bdc9b1d5fc47663...) does not provide a explanation on why it was added to path 3 instead of 1.
I've got the entire history of t_unref() path 3 from day 1 in 2003. After reading the entire document, it does make sense and details each commit, when it was made along with the author and commit message. If you like, I can append the markdown of the entire document here.
This is the Summary:
This document traces the git history of the three cleanup paths in t_unref() to answer the question: was the == in path 1 a deliberate choice, and is it safe to replace it with &?
The answer: path 1's == was correct when written because REQ_ERR_DELAYED couldn't coexist with other flags at that time. Path 3 was added two days later as a diagnostic BUG catcher, not as an alternative cleanup path. Two subsequent commits narrowed path 3's trigger to suppress false positive BUG messages. The == in path 1 was never revisited during those changes — understandably, since the focus each time was on the noisy BUG log rather than the broader cleanup logic.
henningw left a comment (kamailio/kamailio#4644)
It seems we are still discussing with AI generated summaries. I have spend now several hours going through this PR, validating the assumptions and call paths to make sure there is not another hallucination. This approach is not helpful for me. I am setting this PR to draft, as it could not be verified yet and also did not solve the issue reported in #4503, it seems.
Here some short page I came around yesterday, which I found interesting regarding the usage of AI tools: [ai-manifesto](https://ai-manifesto.dev/) . It was created because some issues other open source projects had in this situation.
Ozzyboshi left a comment (kamailio/kamailio#4644)
One side note, what I tested is https://github.com/kamailio/kamailio/issues/4503#issuecomment-4125345286
This is a sum of https://github.com/kamailio/kamailio/pull/4644/changes/55ae00455a8772aa39978... plus what you mentioned as "Change 2" at message https://github.com/kamailio/kamailio/pull/4644#issue-4073052522
Please update the commit before merging it.
miconda left a comment (kamailio/kamailio#4644)
@NormB: does the commits of PR need to be updated according to previous comment? If yes, can you do it so the PR can be reviewed with latest changes.
@NormB pushed 1 commit.
6cd97c1512de24015ec2455cebd84fbdf9ceca33 tm: fix transaction leak on drop; in branch_route
@NormB pushed 1 commit.
30147c6e70d877ed66688eaebe8c5be21da41671 tm: fix transaction leak on drop; in branch_route
henningw left a comment (kamailio/kamailio#4644)
Thanks for the update, format checks are failing on an unrelated earlier commit.
miconda left a comment (kamailio/kamailio#4644)
Thanks, merging it!
Not yet sure if it should be immediately backported, willing to watch it for a while...
Merged #4644 into master.