Hi,
we're trying to run scalable Kamailio nodes that are stateless. That means, for example, that we can load dispatchers and reload them from an API endpoint.
Now we're stuck on the dialog part. All our nodes load the dialogs from a shared db. When a node goes down, a packet reaches another node, which does not know how to handle it because the dialog record does not carry that node's IP address. There is also no way to reload the dialogs from the db, so we cannot update the records in the db and have the dialogs' IP changed to an active node's IP.
Is there a way to have a function that reloads the dialogs from the db on a running Kamailio?
An example of how we could implement it:
```
if(!is_known_dlg()) {
    load_dialog_vars_from_db();
}
```
or, eventually:
```
if(!is_known_dlg()) {
    load_dialog_vars_from_db($dlg(callid));
}
```
Question: how would the module handle this? If there are active dialogs and you run load_dialog_vars_from_db(), should we delete all dialogs in memory first? Should only the missing dialogs be loaded, or just the active dialog?
In my opinion, if you specify the parameter (`load_dialog_vars_from_db($ci)`), the module should "append" the specified dialog to the dialog memory and handle it. If you don't specify anything, `load_dialog_vars_from_db();` should:
- delete all dialogs in memory first
- restore dialogs from db to memory
There could be a third option, if needed by someone: load all missing dialogs, or load dialogs matching a particular regex rule.
Actually we're dealing with sql_query and $dlg_vars for "not mine" dialogs in a distributed containerized environment (orchestrated by Kubernetes), but yes, it is a workaround.
This feature is really interesting, any news about this?
Sorry, re-reading this issue I saw I made a mistake; the correct usage should be:
```
if(!is_known_dlg()) {
    load_dialog_vars_from_db($ci);
}
```
as `$dlg(callid)` is not set at that point.
I just committed the config (and KEMI) function `dlg_db_load_callid(val)`, which should load the dialog record from the database by matching on the callid (it also loads the associated variables).
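For anyone wanting to try it, a minimal usage sketch in the routing script, following the `is_known_dlg()` guard proposed earlier in this thread (the `has_totag()` check and the xlog line are just illustrative additions, not part of the commit):

```
# within-dialog request arriving on a node that did not create the dialog
if (has_totag() && !is_known_dlg()) {
    # try to restore the dialog (and its variables) from the shared db
    if (dlg_db_load_callid("$ci")) {
        xlog("L_INFO", "dialog restored from db for Call-ID: $ci\n");
    }
}
```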
While not tested yet, given the lack of a proper environment to simulate such a scenario (hopefully you can do it), there is still an issue that needs to be sorted out: a high potential for conflicts on the internal dialog id. The internal id is made of two integers, hash entry and hash id (h_entry, h_id). h_entry is the index of the slot in the hash table used to store the dialog structure, computed by hashing over the call id. h_id is an incremented integer specific to each hash table slot.
So, when loading a new record from the database, if its h_id does not conflict with an existing dialog in the same hash table slot, all is ok; otherwise the module is not going to work properly with two dialogs having the same h_id.
Among the solutions I thought of:
1) Have the servers generate non-conflicting h_id values, by using a start value that differs per server and an increment step larger than the number of servers (see the config sketch after this list). IIRC, there was a similar attempt at some point, but somehow it didn't make it in. Say one has two servers: the first server starts allocating h_id from 1 and increments by 2 (e.g.: 1, 3, 5, ...) and the second starts from 2 and increments by 2 (2, 4, 6, ...). Those values can be set via mod params, eventually with an option to rely on server_id.
This should be the least intrusive for the other modules built on top of dialog. But it is rather rigid: with the example above, if one adds an extra server, the old ones need to be reconfigured, so that each server starts from either 1, 2 or 3 and increments by 3. Of course, one can set the increment step to a larger value, like 100, and then has the flexibility to deploy up to one hundred servers before having to reconfigure in case more servers are needed.
2) Add server_id as a third field in the dialog id. This will require a review of the other modules using dialog and eventual code updates in those modules. One column to store the server_id needs to be added to the dialog db tables.
3) Switch from this conflict-prone id system to something like string unique ids, similar to what we have in usrloc records. This will require coding in other modules, changes to the database schema, etc., so more work compared with the above two, but it could be the best in the long term.
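As a config sketch for option 1), the two-server example could look like the snippet below; note that the parameter names here are my assumption for illustration, not a committed API:

```
# server 1 allocates h_id 1, 3, 5, ... (hypothetical parameter names)
modparam("dialog", "h_id_start", 1)
modparam("dialog", "h_id_step", 2)

# server 2 would instead use:
#   modparam("dialog", "h_id_start", 2)
#   modparam("dialog", "h_id_step", 2)
```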
Most likely I am going to push code for option 1) for now, but it would be good to see opinions from other devs, whether they would prefer another option or propose other variants.
Hello Daniel,
just a quick comment about option 1), maybe I did not understand it correctly. The scheme would only work for two servers; for e.g. three servers it would sometimes overlap?
```
Server 1: 1, 3, 5
Server 2: 2, 4, 6
Server 3: 3, 6, 9
```
If option 1) will not work, I think option 2) would be the best trade-off, given the rather large implementation and testing effort of 3).
Best regards,
Henning
If there are 3 servers, then the step should be at least 3, resulting in:
```
Server 1: 1, 4, 7, 10, ...
Server 2: 2, 5, 8, 11, ...
Server 3: 3, 6, 9, 12, ...
```
So the start value should be the `index` of the server and the increment value has to be at least the number of servers. There is no conflict in such a case.
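To spell out why: with start value i and step s, server i only ever produces h_id values i, i+s, i+2s, ..., which are all congruent to i modulo s. As long as each server gets a distinct start value between 1 and s, two servers can never generate the same h_id.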
In our case we will have an autoscaling system for Kamailio based on geographic requests and load, so we will not be able to know in advance how many instances of Kamailio will be provisioned. As you suggested, for this temporary workaround we should have:
```
Server 1: 1, 101, 201, 301, 401, ...
Server 2: 2, 102, 202, 302, 402, ...
Server n: n, n+100, n+200, n+300, ...
```
where n < 100.
I tried the install this morning and got:
```
kamailio is already the newest version (5.2.0~dev4+0~20180310010216.1064+xenial).
```
In my sources list:
```
deb http://deb.kamailio.org/kamailiodev-nightly xenial main
deb-src http://deb.kamailio.org/kamailiodev-nightly xenial main
```
What am I doing wrong?
I don't know; maybe you are not doing `$ apt-get update`?
I checked right now and the latest available version is kamailio_5.2.0~dev4+0~20180405010344.1088+xenial:
```
$ wget http://deb.kamailio.org/kamailiodev-nightly/dists/xenial/main/binary-amd64/P...
$ head Packages
Package: kamailio
Version: 5.2.0~dev4+0~20180405010344.1088+xenial
Architecture: amd64
Maintainer: Debian VoIP Team pkg-voip-maintainers@lists.alioth.debian.org
Installed-Size: 26018
Depends: adduser, lsb-base, python, init-system-helpers (>= 1.18~), libc6 (>= 2.14), libpcre3, libreadline6 (>= 6.0), libstdc++6 (>= 4.1.1)
Suggests: kamailio-berkeley-modules, kamailio-carrierroute-modules, kamailio-cpl-modules, kamailio-dbg, kamailio-ldap-modules, kamailio-lua-modules, kamailio-mysql-modules, kamailio-perl-modules, kamailio-postgres-modules, kamailio-presence-modules, kamailio-python-modules, kamailio-radius-modules, kamailio-snmpstats-modules, kamailio-tls-modules, kamailio-unixodbc-modules, kamailio-xml-modules, kamailio-xmpp-modules
Multi-Arch: foreign
Homepage: http://www.kamailio.org/
Priority: optional
```
https://kamailio.sipwise.com/job/kamailiodev-nightly-binaries/architecture=a...
@alexsosic, nothing to do with this issue. Please use the mailing list for this.
Hi Daniel,
I just tested this version and what I could notice is:
with the first Kamailio node coming up and doing some test calls, the dialog records in the DB are not deleted after the call ends.
On Kamailio node failover (call in progress, and the new node up with hangup on the client side) I've got these logs:
```
4(85) NOTICE: <script>: [f:0409828030-t:0039040123123 id:NBr3x4WAap6jOlpzbz5KKIUkQfG8fGq- | BYE] Foreign dialog detected for Call-ID: NBr3x4WAap6jOlpzbz5KKIUkQfG8fGq-
4(85) WARNING: dialog [dlg_db_handler.c:246]: create_socket_info(): non-local socket udp:172.16.234.95:5060...ignoring
4(85) WARNING: dialog [dlg_db_handler.c:246]: create_socket_info(): non-local socket udp:172.16.234.95:5060...ignoring
```
Also, I can confirm that in this case the dialog is still present in the msgdb after the call ends!
All the dialog values are correctly retrieved by the function, and upon call end we can correctly bill with cgrates.
What I expect is that the dialogs (both native ones and those restored with the dlg_db_load_callid() function) are deleted once the call has ended.
The warnings are due to different local socket addresses; probably they should be changed to debug level.
Are the dialogs that were not deleted from the database still in memory as well (can you see them in the output of the dialog list RPC command)? If yes, paste here the output for such a dialog from the RPC command.
Two test calls, each of them ended on the client side. Both are still in the dialog list:
```
kamcmd> dlg.list
{
    h_entry: 1685
    h_id: 9485
    call-id: pd32UDkxLpckfIImaMSLBwxDuFEtHz5V
    from_uri: sip:0409828030@proxy.mydomain.it
    to_uri: sip:0039040123123@proxy.mydomain.it
    state: 5
    start_ts: 1523022068
    init_ts: 1523022067
    timeout: 0
    lifetime: 10800
    dflags: 516
    sflags: 0
    iflags: 1
    caller: {
        tag: JqqPPpLqYIBL4tvfcQb1cmBvg97JlC06
        contact: sip:0409828030@172.16.21.154:64261;ob
        cseq: 16023
        route_set: sip:172.16.234.68;lr=on;ftag=JqqPPpLqYIBL4tvfcQb1cmBvg97JlC06;did=596.a43
        socket: udp:172.16.234.69:5060
    }
    callee: {
        tag: as791c8975
        contact: sip:39040123123@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.16.234.68;lr=on;ftag=JqqPPpLqYIBL4tvfcQb1cmBvg97JlC06;did=596.b43
        socket: udp:172.16.234.69:5060
    }
    profiles: { }
    variables: { }
}
{
    h_entry: 1703
    h_id: 7436
    call-id: TOdkWS1Q3cLb1WScROKcBM59sMLtN4JX
    from_uri: sip:0409828030@proxy.mydomain.it
    to_uri: sip:0039040123123@proxy.mydomain.it
    state: 5
    start_ts: 1523022028
    init_ts: 1523022027
    timeout: 0
    lifetime: 10800
    dflags: 516
    sflags: 0
    iflags: 1
    caller: {
        tag: uAxTuXyak45PSRKXaOC15MKf987ef4-y
        contact: sip:0409828030@172.16.21.154:64261;ob
        cseq: 22572
        route_set: sip:172.16.234.68;lr=on;ftag=uAxTuXyak45PSRKXaOC15MKf987ef4-y;did=7a6.9532
        socket: udp:172.16.234.69:5060
    }
    callee: {
        tag: as1ecd985e
        contact: sip:39040123123@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.16.234.68;lr=on;ftag=uAxTuXyak45PSRKXaOC15MKf987ef4-y;did=7a6.a532
        socket: udp:172.16.234.69:5060
    }
    profiles: { }
    variables: { }
}
```
Can you try with the latest version and paste the dialog list RPC output again? Over the weekend I pushed a patch to print the reference counter value for the dialog.
Do you use other modules that need dialog?
Hi,
I'm using 5.2.0~dev4+0~20180408010333.1091+xenial now. Without the failover, the dialog is now correctly deleted:
```
kamcmd> dlg.list
{
    h_entry: 93
    h_id: 9163
    ref: 2
    call-id: w6HWT0d6EZaVY3Lt1dIi0auvX1sq7Ctl
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 4
    start_ts: 1523271041
    init_ts: 1523271041
    end_ts: 0
    timeout: 1523271228
    lifetime: 187
    dflags: 512
    sflags: 0
    iflags: 1
    caller: {
        tag: 3OHIzRviWKofH8StpnwRPFZhLB8zNs4o
        contact: sip:0409828030@172.16.21.154:55236;ob
        cseq: 31843
        route_set: sip:172.16.234.85;lr=on;ftag=3OHIzRviWKofH8StpnwRPFZhLB8zNs4o;did=d5.39a1
        socket: udp:172.16.234.89:5060
    }
    callee: {
        tag: as497536f9
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.16.234.85;lr=on;ftag=3OHIzRviWKofH8StpnwRPFZhLB8zNs4o;did=d5.49a1
        socket: udp:172.16.234.89:5060
    }
    profiles: { }
    variables: {
        { ru: sip:39390409828030@carrier1.cloud.evox.it }
        { du: sip:172.16.234.85:5060 }
        { cgrSupplier: carrier1.cloud.evox.it }
        { cgrSuppliers: carrier1.cloud.evox.it,carrier2.cloud.evox.it,carrier3.cloud.evox.it }
        { cgrCallip: 172.16.21.154 }
        { cgrDestination: 39390409828030 }
        { cgrAccount: 0409828030 }
        { cgrTenant: evox.it }
        { cgrReqType: *prepaid }
        { calleeNumber: 39390409828030 }
        { au: 0409828030 }
        { authType: subscriber }
        { cgrSubsystems: "cgr_subsystems":"*resources;*suppliers;*accounts" }
        { originalSourceIP: 172.16.21.154 }
    }
}
```
During failover, the second Kamailio instance loads the dialog, but then it is not deleted and remains in Kamailio even after the call ends via the client.
The first instance has this dialog during the call:
```
kamcmd> dlg.list
{
    h_entry: 3812
    h_id: 3301
    ref: 3
    call-id: 0GqZ1kdvtCKeDrh2l5.ctzGd5tc0KX6G
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 4
    start_ts: 1523271151
    init_ts: 1523271150
    end_ts: 0
    timeout: 1523271338
    lifetime: 187
    dflags: 512
    sflags: 0
    iflags: 1
    caller: {
        tag: FMn4o4AlYFAxYTkK89SYVksQbaYTfjvY
        contact: sip:0409828030@172.16.21.154:55236;ob
        cseq: 16653
        route_set: sip:172.16.234.85;lr=on;ftag=FMn4o4AlYFAxYTkK89SYVksQbaYTfjvY;did=4ee.fca
        socket: udp:172.16.234.89:5060
    }
    callee: {
        tag: as0ef32d6d
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.16.234.85;lr=on;ftag=FMn4o4AlYFAxYTkK89SYVksQbaYTfjvY;did=4ee.0da
        socket: udp:172.16.234.89:5060
    }
    profiles: { }
    variables: {
        { ru: sip:39390409828030@carrier1.cloud.evox.it }
        { du: sip:172.16.234.85:5060 }
        { cgrSupplier: carrier1.cloud.evox.it }
        { cgrSuppliers: carrier1.cloud.evox.it,carrier2.cloud.evox.it,carrier3.cloud.evox.it }
        { cgrCallip: 172.16.21.154 }
        { cgrDestination: 39390409828030 }
        { cgrAccount: 0409828030 }
        { cgrTenant: evox.it }
        { cgrReqType: *prepaid }
        { calleeNumber: 39390409828030 }
        { au: 0409828030 }
        { authType: subscriber }
        { cgrSubsystems: "cgr_subsystems":"*resources;*suppliers;*accounts" }
        { originalSourceIP: 172.16.21.154 }
    }
}
```
After that, I kill that instance (call still ongoing); the second one obviously has no dialogs before I call the function dlg_db_load_callid($ci). After loading the dialog from the db (with the dlg_db_load_callid function), the dialog remains on Kamailio even after the call has ended:
```
kamcmd> dlg.list
{
    h_entry: 3812
    h_id: 3301
    ref: 2
    call-id: 0GqZ1kdvtCKeDrh2l5.ctzGd5tc0KX6G
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 4
    start_ts: 1523271151
    init_ts: 1523271225
    end_ts: 0
    timeout: 1523271337
    lifetime: 187
    dflags: 0
    sflags: 0
    iflags: 1
    caller: {
        tag: FMn4o4AlYFAxYTkK89SYVksQbaYTfjvY
        contact: sip:0409828030@172.16.21.154:55236;ob
        cseq: 16653
        route_set: sip:172.16.234.85;lr=on;ftag=FMn4o4AlYFAxYTkK89SYVksQbaYTfjvY;did=4ee.fca
        socket:
    }
    callee: {
        tag: as0ef32d6d
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.16.234.85;lr=on;ftag=FMn4o4AlYFAxYTkK89SYVksQbaYTfjvY;did=4ee.0da
        socket:
    }
    profiles: { }
    variables: {
        { originalSourceIP: 172.16.21.154 }
        { cgrSubsystems: "cgr_subsystems":"*resources;*suppliers;*accounts" }
        { authType: subscriber }
        { au: 0409828030 }
        { calleeNumber: 39390409828030 }
        { cgrReqType: *prepaid }
        { cgrTenant: evox.it }
        { cgrAccount: 0409828030 }
        { cgrDestination: 39390409828030 }
        { cgrCallip: 172.16.21.154 }
        { cgrSuppliers: carrier1.cloud.evox.it,carrier2.cloud.evox.it,carrier3.cloud.evox.it }
        { cgrSupplier: carrier1.cloud.evox.it }
        { du: sip:172.16.234.85:5060 }
        { ru: sip:39390409828030@carrier1.cloud.evox.it }
    }
}
```
I can notice that after some minutes the dialog is deleted. Is there a timeout?
Is it staying in state 4? Previously you added an example with state 5. State 4 means that the call has not ended yet... If you didn't grab the RPC output for the ended dialog, do it again; I need the ref count for state 5.
There is a cleanup timeout for the cases when something on top of dialog does not dereference it after the dialog ends; see the end_timeout parameter.
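For reference, the cleanup timeout can be set as a dialog module parameter; the value below (in seconds) is just an example, not a recommendation:

```
# remove ended dialogs that are still referenced after 5 minutes
modparam("dialog", "end_timeout", 300)
```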
Hi Daniel,
so with no node failover, during the ongoing call I've got this:
```
kamcmd> dlg.list
{
    h_entry: 3696
    h_id: 3033
    ref: 3
    call-id: x64Qy6smfdCH8keXe7rx1gfzr5-NCKH.
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 4
    start_ts: 1523355019
    init_ts: 1523355019
    end_ts: 0
    timeout: 1523355206
    lifetime: 187
    dflags: 643
    sflags: 0
    iflags: 1
    caller: {
        tag: rDGJFCeM3IW7z3P3nR74R28Zk2QwddX-
        contact: sip:0409828030@172.16.21.38:55236;ob
        cseq: 10941
        route_set: sip:172.22.2.37;lr=on;ftag=rDGJFCeM3IW7z3P3nR74R28Zk2QwddX-;did=07e.9b8
        socket: udp:172.22.2.36:5060
    }
    callee: {
        tag: as1cb15e8b
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.22.2.37;lr=on;ftag=rDGJFCeM3IW7z3P3nR74R28Zk2QwddX-;did=07e.ab8
        socket: udp:172.22.2.36:5060
    }
    profiles: { }
    variables: {
        { ru: sip:39390409828030@carrier1.cloud.evox.it }
        { du: sip:172.22.2.37:5060 }
        { cgrSupplier: carrier1.cloud.evox.it }
        { cgrSuppliers: carrier1.cloud.evox.it,carrier2.cloud.evox.it,carrier3.cloud.evox.it }
        { cgrCallip: 172.16.21.38 }
        { cgrDestination: 39390409828030 }
        { cgrAccount: 0409828030 }
        { cgrTenant: evox.it }
        { cgrReqType: *prepaid }
        { calleeNumber: 39390409828030 }
        { au: 0409828030 }
        { authType: subscriber }
        { cgrSubsystems: "cgr_subsystems":"*resources;*suppliers;*accounts" }
        { originalSourceIP: 172.16.21.38 }
    }
}
```
Then, after the call has ended, this:
```
kamcmd> dlg.list
{
    h_entry: 3696
    h_id: 3033
    ref: 1
    call-id: x64Qy6smfdCH8keXe7rx1gfzr5-NCKH.
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 5
    start_ts: 1523355019
    init_ts: 1523355019
    end_ts: 1523355025
    timeout: 0
    lifetime: 187
    dflags: 647
    sflags: 0
    iflags: 1
    caller: {
        tag: rDGJFCeM3IW7z3P3nR74R28Zk2QwddX-
        contact: sip:0409828030@172.16.21.38:55236;ob
        cseq: 10941
        route_set: sip:172.22.2.37;lr=on;ftag=rDGJFCeM3IW7z3P3nR74R28Zk2QwddX-;did=07e.9b8
        socket: udp:172.22.2.36:5060
    }
    callee: {
        tag: as1cb15e8b
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.22.2.37;lr=on;ftag=rDGJFCeM3IW7z3P3nR74R28Zk2QwddX-;did=07e.ab8
        socket: udp:172.22.2.36:5060
    }
    profiles: { }
    variables: { }
}
```
The Kamailio version is still 5.2.0~dev4+0~20180408010333.1091+xenial.
I need it for the failover case, from the second node that loads the dialog from db -- that's what we need to debug.
Yes, sure, but in the case I mentioned above there's a problem: some dialogs are cleared on call end, and some remain in state 5 and are not cleared from the db.
Regarding the failover, I can observe this after another Kamailio node loads the dialog from db and the call ends client side:
```
kamcmd> dlg.list
{
    h_entry: 3025
    h_id: 1845
    ref: 2
    call-id: jAw54-ddy-jNknXLSjB.Eo06RWfJ6yuR
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 4
    start_ts: 1523358496
    init_ts: 1523358607
    end_ts: 0
    timeout: 1523358683
    lifetime: 187
    dflags: 0
    sflags: 0
    iflags: 1
    caller: {
        tag: OX6QxdbUGdlOOkksck2HafvRa-UqKbmf
        contact: sip:0409828030@172.16.21.38:55236;ob
        cseq: 17987
        route_set: sip:172.22.2.49;lr=on;ftag=OX6QxdbUGdlOOkksck2HafvRa-UqKbmf;did=1db.601
        socket:
    }
    callee: {
        tag: as2595835f
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.22.2.49;lr=on;ftag=OX6QxdbUGdlOOkksck2HafvRa-UqKbmf;did=1db.701
        socket:
    }
    profiles: { }
    variables: {
        { originalSourceIP: 172.16.21.38 }
        { cgrSubsystems: "cgr_subsystems":"*resources;*suppliers;*accounts" }
        { authType: subscriber }
        { au: 0409828030 }
        { calleeNumber: 39390409828030 }
        { cgrReqType: *prepaid }
        { cgrTenant: evox.it }
        { cgrAccount: 0409828030 }
        { cgrDestination: 39390409828030 }
        { cgrCallip: 172.16.21.38 }
        { cgrSuppliers: carrier1.cloud.evox.it,carrier2.cloud.evox.it,carrier3.cloud.evox.it }
        { cgrSupplier: carrier1.cloud.evox.it }
        { du: sip:172.22.2.49:5060 }
        { ru: sip:39390409828030@carrier1.cloud.evox.it }
    }
}
```
The state remains at 4 and then, after 2 minutes, it is cleared from dlg.list and the db!
Maybe you should summarise the cases that work and the cases that do not work, because you didn't follow what I asked and now it is hard to sort out what needs to be troubleshooted and fixed. Then we can start looking at the cases that do not work as expected, one by one.
Also, it would be good if you used the GitHub tracker directly when pasting output from RPC commands; replies via email are not properly formatted and it is not that easy to read your replies.
Ok @miconda,
So let's start with the failover case first. I start a call on **kamailio1**:
```
kamcmd> dlg.list
{
    h_entry: 2294
    h_id: 5833
    ref: 2
    call-id: lbiUaRwZR8x3mRiBjxWlJDpqkO9JHTdD
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 4
    start_ts: 1523361570
    init_ts: 1523361570
    end_ts: 0
    timeout: 1523361757
    lifetime: 187
    dflags: 643
    sflags: 0
    iflags: 1
    caller: {
        tag: 2Qp.E4ft3KpSHWbjfHVMI2Q3Jz0wEyxW
        contact: sip:0409828030@172.16.21.38:55236;ob
        cseq: 1646
        route_set: sip:172.22.2.58;lr=on;ftag=2Qp.E4ft3KpSHWbjfHVMI2Q3Jz0wEyxW;did=6f8.6b92
        socket: udp:172.22.2.57:5060
    }
    callee: {
        tag: as47e134c0
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.22.2.58;lr=on;ftag=2Qp.E4ft3KpSHWbjfHVMI2Q3Jz0wEyxW;did=6f8.7b92
        socket: udp:172.22.2.57:5060
    }
    profiles: { }
    variables: {
        { ru: sip:39390409828030@carrier1.cloud.evox.it }
        { du: sip:172.22.2.58:5060 }
        { cgrSupplier: carrier1.cloud.evox.it }
        { cgrSuppliers: carrier1.cloud.evox.it,carrier2.cloud.evox.it,carrier3.cloud.evox.it }
        { cgrCallip: 172.16.21.38 }
        { cgrDestination: 39390409828030 }
        { cgrAccount: 0409828030 }
        { cgrTenant: evox.it }
        { cgrReqType: *prepaid }
        { calleeNumber: 39390409828030 }
        { au: 0409828030 }
        { authType: subscriber }
        { cgrSubsystems: "cgr_subsystems":"*resources;*suppliers;*accounts" }
        { originalSourceIP: 172.16.21.38 }
    }
}
```
I then kill **kamailio1** and start a new node, **kamailio2**, with an empty dispatcher list. The call is still going on... Now, on the client side, I end the call, which sends a BYE to **kamailio2**, so the new Kamailio node does this:
```
# -----------------------------------------------------------------------------
# route FOREIGN_DIALOG
# short description
# -----------------------------------------------------------------------------
route[FOREIGN_DIALOG] {
    $var(breadcrumbs) = "FOREIGN_DIALOG"; BREADCRUMBS

    xlog("L_NOTICE","[f:$fU-t:$tU id:$ci | $rm] Foreign dialog detected for Call-ID: $ci\n");

    dlg_db_load_callid($ci);
    setflag(FLG_AUTH_PASSED);

    if (is_method("BYE")) {
        route(CGR_CALL_END);
        sl_send_reply("200","OK");
    }
    exit;
} # end route FOREIGN_DIALOG
```
After the call has ended, I have this dialog on the kamailio2 node:
```
kamcmd> dlg.list
{
    h_entry: 2294
    h_id: 5833
    ref: 2
    call-id: lbiUaRwZR8x3mRiBjxWlJDpqkO9JHTdD
    from_uri: sip:0409828030@proxy.alex.cloud.evox.it
    to_uri: sip:390409828030@proxy.alex.cloud.evox.it
    state: 4
    start_ts: 1523361570
    init_ts: 1523361632
    end_ts: 0
    timeout: 1523361758
    lifetime: 188
    dflags: 0
    sflags: 0
    iflags: 1
    caller: {
        tag: 2Qp.E4ft3KpSHWbjfHVMI2Q3Jz0wEyxW
        contact: sip:0409828030@172.16.21.38:55236;ob
        cseq: 1646
        route_set: sip:172.22.2.58;lr=on;ftag=2Qp.E4ft3KpSHWbjfHVMI2Q3Jz0wEyxW;did=6f8.6b92
        socket:
    }
    callee: {
        tag: as47e134c0
        contact: sip:39390409828030@172.16.201.101:5060
        cseq: 0
        route_set: sip:172.22.2.58;lr=on;ftag=2Qp.E4ft3KpSHWbjfHVMI2Q3Jz0wEyxW;did=6f8.7b92
        socket:
    }
    profiles: { }
    variables: {
        { originalSourceIP: 172.16.21.38 }
        { cgrSubsystems: "cgr_subsystems":"*resources;*suppliers;*accounts" }
        { authType: subscriber }
        { au: 0409828030 }
        { calleeNumber: 39390409828030 }
        { cgrReqType: *prepaid }
        { cgrTenant: evox.it }
        { cgrAccount: 0409828030 }
        { cgrDestination: 39390409828030 }
        { cgrCallip: 172.16.21.38 }
        { cgrSuppliers: carrier1.cloud.evox.it,carrier2.cloud.evox.it,carrier3.cloud.evox.it }
        { cgrSupplier: carrier1.cloud.evox.it }
        { du: sip:172.22.2.58:5060 }
        { ru: sip:39390409828030@carrier1.cloud.evox.it }
    }
}
```
This stays in memory on the node and in the DB for 2 minutes after the call has ended, and then it is cleared. Is this behaviour correct? Am I clear enough, or do you want me to make a video of what's going on? Kind regards, Alex
So, on kamailio2, you get the BYE and because the dialog is not found in memory, you do the dlg_db_load_callid() via route[FOREIGN_DIALOG]? Or is another logic calling route[FOREIGN_DIALOG]?
Why do you send 200ok for BYE via sl_send_reply("200","OK")? Shouldn't it be relayed and the reply be sent by caller or callee?
Do you call dlg_manage() after the call is loaded from db?
@miconda We've modified the route like this now:
```
route[FOREIGN_DIALOG] {
    $var(breadcrumbs) = "FOREIGN_DIALOG"; BREADCRUMBS

    xlog("L_NOTICE","[f:$fU-t:$tU id:$ci | $rm] Foreign dialog detected for Call-ID: $ci\n");

    dlg_db_load_callid($ci);
    setflag(FLG_AUTH_PASSED);
    dlg_manage();
} # end route FOREIGN_DIALOG
```
And everything seems to be working quite well now with version 5.2.0~dev4+0~20180408010333.1091+xenial. The old route was a quick & dirty fix for ending foreign dialogs before the recovery from db.
I will also test the latest nightly build tomorrow.
Thanks!
Closed #1274.
I am closing this one; in case of misbehaviour, open a new issue. The feature was added; if it is not working, that will be a new bug.