### Description
The dialog profiles functionality in Kamailio 5.4.x can lead to a deadlock in certain high-load scenarios.
Kamailio dialog profiles are used to track parallel channels for about 200 outgoing PSTN carrier interconnections. During high-traffic periods (several thousand parallel calls), the Kamailio server frequently (e.g. hourly) enters an endless loop while executing get_profile_size() from the configuration script. As a result, the dialog profiles lock is never released and Kamailio stops serving traffic. Internal monitoring tools and RPC commands keep working as long as they do not touch the dialog functionality.
A similar, dedicated Kamailio setup is used to track parallel channels for customers. There the deadlock is observed less frequently, but apparently similar crashes also occur, at much longer intervals.
### Troubleshooting
After analysing the back-traces with GDB, the get_profile_size() call was removed from the configuration script. Since this change, the crash has not occurred for several days.
#### Reproduction
The issue could not be reproduced so far.
#### Debugging Data
##### bt 1 (some data removed)
```
(gdb) bt
#0  0x00007f57cf3b00da in get_profile_size (profile=0x7f50ccbc7e80, value=0x7ffd9928f300) at dlg_profile.c:859
        n = 364
        i = 12
        ph = 0x7f50d3e4b7d0
#1  0x00007f57cf419c67 in w_get_profile_size_helper (msg=0x7f57d699d418, profile=0x7f50ccbc7e80, value=0x7ffd9928f300, spd=0x7f57d6916960) at dialog.c:941
#2  0x00007f57cf41a459 in w_get_profile_size3 (msg=0x7f57d699d418, profile=0x7f50ccbc7e80, value=0x7f57d6935118, result=0x7f57d6916960) at dialog.c:982
#3  0x0000000000463fea in do_action (h=0x7ffd99293610, a=0x7f57d6936488, msg=0x7f57d699d418) at core/action.c:1094
#4  0x00000000004711ee in run_actions (h=0x7ffd99293610, a=0x7f57d6936488, msg=0x7f57d699d418) at core/action.c:1581
#5  0x000000000046058b in do_action (h=0x7ffd99293610, a=0x7f57d690fda8, msg=0x7f57d699d418) at core/action.c:700
```
The first back-trace was taken from a running process with gdb. The loop counter in frame 0 (f0) did not increase much during this time, probably due to an overflow of the loop counter.
##### bt 2 (analysis of the data structures with gdb scripts)
Here the loop counter in f0 showed a very high value. Expected number of entries in the dialog profiles hash table buckets:
```
(gdb) p profile->entries[3]
$4 = {first = 0x7f9bfd4aad98, content = 2068}
(gdb) p profile->entries[7]
$3 = {first = 0x7f9c12079f70, content = 784}
(gdb) p profile->entries[12]
$6 = {first = 0x7f9c02be5d50, content = 7600}
(gdb) p profile->entries[14]
$2 = {first = 0x7f9bff636de8, content = 6764}
```
Hash table bucket 14 shows a lot of corruption and the loop never ends (carrier names and IPs have been replaced). The list for hash bucket 7 got linked into the list for hash bucket 14:
```
counter 6755: prev 0x7f9c0b9dcde0 - current 0x7f9c02e5b378 - next 0x7f9c0a5f9ba0 - value carrier1-XX.XX - hash 14
counter 6756: prev 0x7f9c02e5b378 - current 0x7f9c0a5f9ba0 - next 0x7f9c0860b968 - value carrier1-XX.XX▒▒▒▒ - hash 14
counter 6757: prev 0x7f9c0a5f9ba0 - current 0x7f9c0860b968 - next 0x7f9bfe3f3a78 - value carrier1-XX.XX▒▒▒▒ - hash 14
counter 6758: prev 0x7f9c0860b968 - current 0x7f9bfe3f3a78 - next 0x7f9c10d977f0 - value carrier1-XX.XX - hash 14
counter 6759: prev 0x7f9bfe3f3a78 - current 0x7f9c10d977f0 - next 0x7f9c0ae198b0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 6760: prev 0x7f9c10d977f0 - current 0x7f9c0ae198b0 - next 0x7f9c12079f70 - value carrier3-XX.XX - hash 7
counter 6761: prev 0x7f9c0ae198b0 - current 0x7f9c12079f70 - next 0x7f9c011f2540 - value-carrier2-XX.XX▒▒▒▒ - hash 7
counter 6762: prev 0x7f9c12079f70 - current 0x7f9c011f2540 - next 0x7f9bfff886f0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 6763: prev 0x7f9c011f2540 - current 0x7f9bfff886f0 - next 0x7f9c05db00a8 - value carrier3-XX.XX= - hash 7
[...]
counter 28270: prev 0x7f9c019d06e8 - current 0x7f9bfaf18290 - next 0x7f9c12c90680 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 28271: prev 0x7f9bfaf18290 - current 0x7f9c12c90680 - next 0x7f9c086a2b58 - value-carrier2-XX.XX▒▒▒▒ - hash 7
counter 28272: prev 0x7f9c12c90680 - current 0x7f9c086a2b58 - next 0x7f9c0b4f09e8 - value carrier2-XX.XX▒▒▒▒ - hash 7
[...]
```
Hash table bucket 7 is still consistent with regard to the loop (the walk terminates), but it already shows initial signs of corruption: one item from the list for hash bucket 14 is visible:
```
counter 780: prev 0x7f9c0db57ac8 - current 0x7f9c02225700 - next 0x7f9bfbf7db08 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 781: prev 0x7f9c02225700 - current 0x7f9bfbf7db08 - next 0x7f9c10d977f0 - value carrier1-XX.XX- hash 14
counter 782: prev 0x7f9bfe3f3a78 - current 0x7f9c10d977f0 - next 0x7f9c0ae198b0 - value carrier2-XX.XX▒▒▒▒ - hash 7
counter 783: prev 0x7f9c10d977f0 - current 0x7f9c0ae198b0 - next 0x7f9c12079f70 - value carrier3-XX.XX - hash 7
total size of hash table is 784
```
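The dumps above already contain a detectable inconsistency: two different nodes tagged hash 14 (0x7f9bfbf7db08 in bucket 7's walk and 0x7f9bfe3f3a78 in bucket 14's walk) both have `next` pointing at 0x7f9c10d977f0, but that node's `prev` can only point back to one of them. The bucket 14 walk also reaches 0x7f9c12079f70, which is `entries[7].first`, which suggests (assuming the bucket lists are circular, as the loop behaviour indicates) that it has entered bucket 7's ring and can never get back to `entries[14].first`. A gdb script or debug build could flag this automatically. The C sketch below checks each node for a matching hash and an intact `next->prev` back link, with the walk capped by `content`. `struct node`, `struct bucket` and `check_bucket()` are hypothetical simplifications modelled on the `{first, content}` pair printed above, not the real Kamailio definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified list node (not Kamailio's struct dlg_profile_hash). */
struct node {
    struct node *prev;
    struct node *next;
    unsigned int hash;    /* bucket index the node claims to belong to */
};

/* Hypothetical bucket head, mirroring the {first, content} pair shown by gdb. */
struct bucket {
    struct node *first;
    unsigned int content; /* number of nodes the bucket is supposed to hold */
};

/*
 * Walk the circular list of bucket `idx`.
 * Returns -1 if the bucket looks consistent; otherwise returns the 0-based
 * position of the first suspicious node (wrong hash, broken back link, or a
 * ring longer than `content`), or the number of nodes walked if the ring
 * closes before `content` nodes were seen.
 */
long check_bucket(const struct bucket *b, unsigned int idx)
{
    const struct node *cur = b->first;
    unsigned long steps = 0;

    if (cur == NULL)
        return b->content == 0 ? -1 : 0;

    do {
        if (cur->hash != idx)                             /* node from another bucket */
            return (long)steps;
        if (cur->next == NULL || cur->next->prev != cur)  /* back link does not match */
            return (long)steps;
        cur = cur->next;
        steps++;
        if (cur != b->first && steps >= b->content)       /* ring longer than recorded */
            return (long)steps;
    } while (cur != b->first);

    return steps == b->content ? -1 : (long)steps;
}
```

Applied to bucket 7 as dumped above, such a check would stop at counter 781 (the node tagged hash 14) instead of silently counting it.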
#### Log Messages
No unusual log messages were observed.
#### SIP Traffic
SIP traffic looked OK during the analysis of the core dumps.
### Possible Solutions
* adding safeguards to get_profile_size() so that it does not access data from other hash buckets
* stopping the loop after a certain threshold
* finding and fixing the source of the internal data corruption (obviously)
* refactoring the dialog module to use another approach for storing the dialog profile information
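The first two options could look roughly like the following C sketch: the walk over one bucket aborts with an error as soon as it meets a node tagged with another bucket's hash, or after a step limit (for example the bucket's `content` plus some slack). `struct pnode`, `struct pbucket` and `safe_profile_size()` are hypothetical simplifications, not the actual get_profile_size() code in dlg_profile.c; values are plain NUL-terminated C strings here instead of Kamailio's `str`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified profile entry. */
struct pnode {
    struct pnode *next;
    unsigned int hash;   /* bucket index the entry claims to belong to */
    const char *value;   /* profile value, e.g. a carrier name */
};

/* Hypothetical bucket head. */
struct pbucket {
    struct pnode *first;
    unsigned int content;
};

/*
 * Count entries in bucket `idx` whose value equals `value`.
 * Returns -1 if the bucket looks corrupted: an entry carries another bucket's
 * hash, the list is broken (NULL next), or more than `max_steps` entries are
 * visited without getting back to `first`.
 */
int safe_profile_size(const struct pbucket *b, unsigned int idx,
                      const char *value, unsigned int max_steps)
{
    const struct pnode *ph = b->first;
    unsigned int steps = 0;
    int n = 0;

    if (ph == NULL)
        return 0;

    do {
        if (ph->hash != idx || ++steps > max_steps)
            return -1;                 /* foreign entry or runaway walk */
        if (strcmp(ph->value, value) == 0)
            n++;
        ph = ph->next;
    } while (ph != NULL && ph != b->first);

    return ph == NULL ? -1 : n;
}
```

Returning -1 only keeps the worker from spinning forever while holding the profile lock; the corrupted list is still there, so finding the source of the corruption remains the real fix.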
### Additional Information
* **Kamailio version**:
Kamailio 5.4.7, compiled from git repository
* **Operating System**:
CentOS 7.9
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/2923
THIS IS AN AUTOMATED MESSAGE, DO NOT REPLY.
A user has added themself to the list of users assigned to this task.
FS#100 - Assignment operators don't work
User who did this - Alex Hermann (axlh)
http://sip-router.org/tracker/index.php?do=details&task_id=100
You are receiving this message because you have requested it from the Flyspray bugtracking system. If you did not expect this message or don't want to receive mails in future, you can change your notification settings at the URL shown above.
Changes to example files for PCSCF in misc/examples/ims to make them functional with the current stable version.
<!--
IMPORTANT:
- for detailed contributing guidelines, read:
https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md
- pull requests must be done to master branch, unless they are backports
of fixes from master branch to a stable branch
- backports to stable branches must be done with 'git cherry-pick -x ...'
- code is contributed under BSD for core and main components (tm, sl, auth, tls)
- code is contributed GPLv2 or a compatible license for the other components
- GPL code is contributed with OpenSSL licensing exception
-->
#### Pre-Submission Checklist
<!-- Go over all points below, and after creating the PR, tick all the checkboxes that apply -->
<!-- All points should be verified, otherwise, read the CONTRIBUTING guidelines from above-->
<!-- If you're unsure about any of these, don't hesitate to ask on sr-dev mailing list -->
- [x] Commit message has the format required by CONTRIBUTING guide
- [x] Commits are split per component (core, individual modules, libs, utils, ...)
- [x] Each component has a single commit (if not, squash them into one commit)
- [x] No commits to README files for modules (changes must be done to docbook files
in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change
- [x] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist:
<!-- Go over all points below, and after creating the PR, tick the checkboxes that apply -->
- [ ] PR should be backported to stable branches
- [x] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)
#### Description
Changes to example files for PCSCF in misc/examples/ims to make them functional with the current stable version.
Changes:
- Load the IPsec module before IMS Usrloc PCSCF (now required)
- Removed modparam("ims_usrloc_pcscf", "hashing_type", 2) from the example (this parameter was removed some time ago)
- Fixed the formatting of the single MySQL connection so that it works with the current version
- Bind to any IP by default
- Dispatcher parameters are only loaded if required
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/2203
-- Commit Summary --
* misc: examples: IMS PCSCF kamailio.cfg update
* misc: examples: IMS PCSCF pcscf.cfg update
-- File Changes --
M misc/examples/ims/pcscf/kamailio.cfg (6)
M misc/examples/ims/pcscf/pcscf.cfg.sample (12)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/2203.patch
https://github.com/kamailio/kamailio/pull/2203.diff
Dear Gang
Possibly @oej could provide more in-depth information as he has witnessed this issue.
Usually the user part of the From URI is the phone number displayed at the destination. There are situations where this phone number is translated.
For example, in Switzerland users are used to seeing numbers in a local format: national numbers start with 0 and international numbers with 00. On interconnections between telcos, however, E.164 is used.
So, when a call is sent to a customer, '+41' is replaced by '0', and for any other country code '+' is replaced by '00'.
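For illustration, the translation rule can be expressed as two small C functions. `e164_to_local()` and `local_to_e164()` are hypothetical helpers, not part of Kamailio; in the setup described here the translation is done on `$fU` in the routing script. The reverse mapping is only unambiguous for numbers that were produced by the forward mapping, and the `00` prefix must be tested before the single `0`.

```c
#include <stdio.h>
#include <string.h>

/*
 * E.164 -> Swiss local format: "+41..." -> "0...", any other "+CC..." -> "00CC...".
 * Returns 0 on success, -1 if the input is not E.164 or does not fit in `out`.
 */
int e164_to_local(const char *in, char *out, size_t outlen)
{
    int r;

    if (strncmp(in, "+41", 3) == 0)
        r = snprintf(out, outlen, "0%s", in + 3);    /* national number */
    else if (in[0] == '+')
        r = snprintf(out, outlen, "00%s", in + 1);   /* international number */
    else
        return -1;
    return (r < 0 || (size_t)r >= outlen) ? -1 : 0;
}

/*
 * Swiss local format -> E.164, e.g. before sending a forwarded call back out
 * to the interconnection. "00" is checked first so "00CC..." is not treated
 * as a national number.
 */
int local_to_e164(const char *in, char *out, size_t outlen)
{
    int r;

    if (strncmp(in, "00", 2) == 0)
        r = snprintf(out, outlen, "+%s", in + 2);    /* "00CC..." -> "+CC..." */
    else if (in[0] == '0')
        r = snprintf(out, outlen, "+41%s", in + 1);  /* "0..." -> "+41..." */
    else
        return -1;
    return (r < 0 || (size_t)r >= outlen) ? -1 : 0;
}
```

Each call yields exactly one value; the problem reported below is that writing such a value to `$fU` a second time does not replace the first one.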
Let's start with an example From: header:
`From: "Maurice Moss" <sip:+41991234567@example.com>;user=phone`
Shortly before the call is sent out to the location of the registered CPE, the following is done:
```
if ($fU =~ "^\+41") {
    # national number: +41XXXXXXXXX -> 0XXXXXXXXX
    $fU = "0" + $(fU{s.substr,3,0});
} else if ($fU =~ "^\+") {
    # any other international number: +CC... -> 00CC...
    $fU = "00" + $(fU{s.substr,1,0});
}
```
What is sent to the CPE now looks like this:
`From: "Maurice Moss" <sip:0991234567@example.com>;user=phone`
Now suppose we receive an error such as 486 Busy Here, and the destination has call forwarding active to a mobile phone on another TSP. We then have to send the call back out over the interconnection (IC), and the numbers need to be translated back to E.164.
We handle this in a failure route, which in turn could trigger a branch route.
So we revert the number to E.164:
`$fU = "+41" + $(fU{s.substr,1,0});`
Expected outcome:
`From: "Maurice Moss" <sip:+41991234567@example.com>;user=phone`
Observed outcome:
`From: "Maurice Moss" <sip:0991234567+41991234567@example.com>;user=phone`
So setting $fU more than once appends to the user part of the From header URI instead of replacing it.
I have not found this behavior described in any documentation.
I have been working around most of these issues by making sure I change $fU (and $tU) as late as possible and only once. In the case described above, however, I have not been able to come up with a workaround yet.
I also cannot think of any benefit of the way these PVs are currently handled, nor of any harm in handling them differently so that the last write overwrites any previous value instead of appending to it.
Thank you for looking into this.
-Benoît-
--
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/3165