I reproduced this in a lab with 3 Kamailio servers:
```
kamailio_first 127.0.0.101
modparam("dmq", "server_address", "sip:127.0.0.101:5060")
modparam("dmq", "notification_address", "sip:127.0.0.103:5060")
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 2)
modparam("dmq", "ping_interval", 15)

kamailio_second 127.0.0.102 notification_peer: 127.0.0.103
modparam("dmq", "server_address", "sip:127.0.0.102:5060")
modparam("dmq", "notification_address", "sip:127.0.0.103:5060")
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 2)
modparam("dmq", "ping_interval", 15)

kamailio_third 127.0.0.103 notification_peer: 127.0.0.101
modparam("dmq", "server_address", "sip:127.0.0.103:5060")
modparam("dmq", "notification_address", "sip:127.0.0.101:5060")
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 2)
modparam("dmq", "ping_interval", 15)
```
` kamcmd -s /var/run/kamailio_$1/kamailio_ctl dmq.list_nodes | grep -e host -e status -e last -e local | tr '\n' ' ' | sed -e 's/host/\n&/g'`
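The per-instance views below were gathered by running that one-liner against each instance's control socket. A minimal wrapper sketch (instance names and socket paths follow the layout used above; adjust to your setup):

```
#!/bin/sh
# Sketch only: dump the DMQ node list of each lab instance.
# Assumes per-instance control sockets /var/run/kamailio_<name>/kamailio_ctl.
for name in first second third; do
  echo "show dmq bus: $name"
  kamcmd -s /var/run/kamailio_${name}/kamailio_ctl dmq.list_nodes \
    | grep -e host -e status -e last -e local \
    | tr '\n' ' ' \
    | sed -e 's/host/\n&/g'
  echo ""
done
```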
```
show dmq bus: first
host: 127.0.0.102 status: 2 last_notification: 0 local: 0
host: 127.0.0.103 status: 2 last_notification: 0 local: 0
host: 127.0.0.101 status: 2 last_notification: 0 local: 1

show dmq bus: second
host: 127.0.0.101 status: 2 last_notification: 0 local: 0
host: 127.0.0.103 status: 2 last_notification: 0 local: 0
host: 127.0.0.102 status: 2 last_notification: 0 local: 1

show dmq bus: third
host: 127.0.0.102 status: 2 last_notification: 0 local: 0
host: 127.0.0.101 status: 2 last_notification: 0 local: 0
host: 127.0.0.103 status: 2 last_notification: 0 local: 1
```

```
/etc/init.d/kamailio_first stop
/etc/init.d/kamailio_third stop
/etc/init.d/kamailio_third start
/etc/init.d/kamailio_first start
```
` kamcmd -s /var/run/kamailio_$1/kamailio_ctl dmq.list_nodes | grep -e host -e status -e last -e local | tr '\n' ' ' | sed -e 's/host/\n&/g'`

```
show dmq bus: first
host: 127.0.0.103 status: 2 last_notification: 0 local: 0
host: 127.0.0.101 status: 2 last_notification: 0 local: 1

show dmq bus: second
host: 127.0.0.101 status: 8 last_notification: 0 local: 0
host: 127.0.0.103 status: 8 last_notification: 0 local: 0
host: 127.0.0.102 status: 2 last_notification: 0 local: 1

show dmq bus: third
host: 127.0.0.101 status: 2 last_notification: 0 local: 0
host: 127.0.0.103 status: 2 last_notification: 0 local: 1
```

The bus will remain broken. I think there is a second scenario that can break it as well.
These are the notes I took when I found this (I consider it a limitation rather than a bug).

Why: when kamailio_third is shutting down, it sends a message to every peer telling them it is now inactive, so they will not try to contact it again. Once restarted, no node will contact it, because they all hold an inactive state for it; at this point the only way it can learn about the other nodes is by contacting kamailio_first. However, kamailio_first does not know about the other nodes either.
Hi Charles, I guess we can fix this. I think there is a second case where things can break; I will try to find it and document it as well. Or maybe I should simply use a different config ...
I also tried the ring strategy, but I did not like the fact that if you lose one server the ring may get broken.
Maybe this is still better in the end, but when you have a lot of servers it is more likely that one of them will be removed from service.
This is why I am testing with a "master" node. Not sure what your opinion is.
Example of broken state with a ring config and 4 servers:

```
/etc/init.d/kamailio_second stop
/etc/init.d/kamailio_third stop
/etc/init.d/kamailio_second start
```
You end up with:

```
show dmq bus: first
host: 127.0.0.103 status: 8 last_notification: 0 local: 0
host: 127.0.0.104 status: 2 last_notification: 0 local: 0
host: 127.0.0.102 status: 8 last_notification: 0 local: 0
host: 127.0.0.101 status: 2 last_notification: 0 local: 1

show dmq bus: second
host: 127.0.0.103 status: 2 last_notification: 0 local: 0
host: 127.0.0.102 status: 2 last_notification: 0 local: 1

show dmq bus: third
not running !

show dmq bus: fourth
host: 127.0.0.102 status: 8 last_notification: 0 local: 0
host: 127.0.0.103 status: 8 last_notification: 0 local: 0
host: 127.0.0.101 status: 2 last_notification: 0 local: 0
host: 127.0.0.104 status: 2 last_notification: 0 local: 1
```
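The ring configs for that test are not included above; the wiring assumed here is each node pointing its notification_address at the next node in the ring (a sketch, addresses are the lab loopbacks, other parameters as in the 3-node configs above):

```
# kamailio_first 127.0.0.101
modparam("dmq", "server_address", "sip:127.0.0.101:5060")
modparam("dmq", "notification_address", "sip:127.0.0.102:5060")

# kamailio_second 127.0.0.102
modparam("dmq", "server_address", "sip:127.0.0.102:5060")
modparam("dmq", "notification_address", "sip:127.0.0.103:5060")

# kamailio_third 127.0.0.103
modparam("dmq", "server_address", "sip:127.0.0.103:5060")
modparam("dmq", "notification_address", "sip:127.0.0.104:5060")

# kamailio_fourth 127.0.0.104
modparam("dmq", "server_address", "sip:127.0.0.104:5060")
modparam("dmq", "notification_address", "sip:127.0.0.101:5060")
```

The weakness is the same as in the 3-node case: if a node's notification peer happens to be down when it (re)starts, it has no way to learn about the rest of the bus.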
Thanks for reporting and for taking the time to test different scenarios.
Using your examples I will try to reproduce locally, but out of interest, do you observe the same behaviour if the notification address is an FQDN resolving to multiple IPs?
I am guessing an FQDN resolving to multiple IPs will greatly improve things! I will test it, and maybe we could add a wiki page documenting how to configure the DMQ bus.
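A minimal sketch of such a setup, assuming dmq.example.com is a hypothetical DNS name resolving to the addresses of all DMQ nodes:

```
# Per-node address (127.0.0.101 here, different on each node).
modparam("dmq", "server_address", "sip:127.0.0.101:5060")
# Same notification_address on every node; a hypothetical name
# resolving to the IPs of all DMQ nodes.
modparam("dmq", "notification_address", "sip:dmq.example.com:5060")
# multi_notify makes DMQ resolve and notify all addresses behind the name.
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 2)
modparam("dmq", "ping_interval", 15)
```

That way a (re)starting node notifies every resolved peer instead of a single designated one, so a single unreachable peer should no longer leave it isolated.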
@jchavanton - would you mind running some tests again with 851e610 to see if there is an improvement?
Hi, I am still thinking about finding some time to test this in a few days / weeks ...
Closing this one, following the patch referenced above. If it is still an issue, reopen or create a new one, as the module has received new code since.
Closed #1349.