Hello José,

The issue may have been introduced by the recent multi-notify option.

To test the theory, could you try setting multi_notify to 0 and the notification address to either the Server A or the Server B IP address?

e.g.

modparam("dmq", "notification_address", "sip:172.112.10.206:5060")
modparam("dmq", "multi_notify", 0)


If the issue does not occur in this case, I will look at fixing it for the multi-notify scenario.

Either way, please include the server C log for comparison.

Cheers,
Charles 

On 7 Feb 2017 07:06, "José Seabra" <joseseabra4@gmail.com> wrote:
Hello Charles,
Two of them were active during the network failure and at the time of reconnection.

As the issue was noticed in the production environment, I don't have enough logs to send you, but I have reproduced it in my lab environment, which has exactly the same DMQ configuration except that it has 3 DMQ hosts instead of 4.

Steps to reproduce the issue:

Start all 3 Kamailio nodes.
At this stage, all of them are active in dmq.list_nodes:
Server A
{
    host: 172.112.10.243
    port: 5060
    resolved_ip: 172.112.10.243
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.246
    port: 5060
    resolved_ip: 172.112.10.246
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.207
    port: 5060
    resolved_ip: 172.112.10.207
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.206
    port: 5060
    resolved_ip: 172.112.10.206
    status: 2
    last_notification: 0
    local: 1
}

Server B
{
    host: 172.112.10.243
    port: 5060
    resolved_ip: 172.112.10.243
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.206
    port: 5060
    resolved_ip: 172.112.10.206
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.207
    port: 5060
    resolved_ip: 172.112.10.207
    status: 2
    last_notification: 0
    local: 1
}


Server C

{
    host: 172.112.10.246
    port: 5060
    resolved_ip: 172.112.10.246
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.207
    port: 5060
    resolved_ip: 172.112.10.207
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.206
    port: 5060
    resolved_ip: 172.112.10.206
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.243
    port: 5060
    resolved_ip: 172.112.10.243
    status: 2
    last_notification: 0
    local: 1
}


Then, after a few minutes, I inserted an iptables rule on Server C to drop all packets to port 5060.

After that, Server A and Server B can see each other (list from Server B):

{
    host: 172.112.10.206
    port: 5060
    resolved_ip: 172.112.10.206
    status: 2
    last_notification: 0
    local: 0
}
{
    host: 172.112.10.207
    port: 5060
    resolved_ip: 172.112.10.207
    status: 2
    last_notification: 0
    local: 1
}

Server C can only see itself:

{
    host: 172.112.10.243
    port: 5060
    resolved_ip: 172.112.10.243
    status: 2
    last_notification: 0
    local: 1
}

This behavior persists after network connectivity is restored.


Please find attached the log files for each server.

Server A - 172.112.10.206
Server B - 172.112.10.207
Server C - 172.112.10.243

Let me know if you need further information.

Regards
José





2017-02-06 14:04 GMT+00:00 Charles Chance <charles.chance@sipcentric.com>:
Hello,

DMQ will remove nodes from its internal list if they fail to respond to its pings - with the exception of the original notification peer specified in config. This way, if the network connection is lost, DMQ will continue to try the original peer indefinitely until connectivity is restored, and rebuild its list of other nodes from there.
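In config terms, the "original notification peer" is whatever `notification_address` points to. A minimal sketch for the multiple A/SRV record case, assuming a hypothetical hostname `dmq.example.local` that resolves to the cluster nodes; as the dmq module documentation describes it, `multi_notify` makes DMQ resolve the name to all of its addresses and notify each of them rather than only one:

```
# Hypothetical hostname: replace with a name whose A/SRV records
# list the DMQ nodes. Each resolved address acts as a notification peer.
modparam("dmq", "notification_address", "sip:dmq.example.local:5060")
# Resolve and notify every address behind the name, not just one.
modparam("dmq", "multi_notify", 1)
```

Pointing `notification_address` at a single node's IP with `multi_notify` set to 0 gives one fixed peer instead.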

Was the original peer (or one of them, if multiple A/SRV records are defined) still active at the time of reconnection?

It would help the diagnosis if you could send your logs from around the time of disconnection and also from the time of reconnection.

Regards,

Charles


On 6 February 2017 at 10:03, José Seabra <joseseabra4@gmail.com> wrote:
Hello Daniel,

The parameters I have configured on my Kamailio server are:

#!ifdef ENABLE_REG_SYNC
modparam("registrar", "sock_flag", 18)
modparam("registrar", "sock_hdr_name", "Sock-Info")
####### SIP registrar replication to other nodes ######
loadmodule "dmq.so"

####### distributed message queue module parameters #####
modparam("dmq", "server_address", "sip:MY_IP_ADDRESS:MY_PORT_ADDRESS")
modparam("dmq", "notification_address", "DMQ_HOSTS")
modparam("dmq", "multi_notify", 1)
modparam("dmq", "num_workers", 4)

loadmodule "dmq_usrloc.so"
modparam("dmq_usrloc", "enable", 1)
modparam("dmq_usrloc", "sync", 1)
modparam("dmq_usrloc", "batch_size", DMQ_BATCH_SIZE)
modparam("dmq_usrloc", "batch_usleep", DMQ_BATCH_USLEEP)
#!endif
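The `MY_IP_ADDRESS`, `MY_PORT_ADDRESS` and `DMQ_HOSTS` tokens above look like placeholders. They may well be filled in by an external template tool; assuming instead that they are meant to be resolved by Kamailio's own preprocessor, note that `#!define` only replaces standalone tokens, not text inside a quoted string. A sketch that defines each value as a complete quoted string, with hypothetical values (the IP is Server A's from this thread, the hostname is made up):

```
# Hypothetical values for illustration only.
# #!define replaces standalone tokens, so each ID holds the full quoted URI.
#!define DMQ_SERVER_ADDRESS "sip:172.112.10.206:5060"
#!define DMQ_HOSTS "sip:dmq.example.local:5060"

# These would replace the two corresponding modparam lines above.
modparam("dmq", "server_address", DMQ_SERVER_ADDRESS)
modparam("dmq", "notification_address", DMQ_HOSTS)
```

The same pattern is used for `DBURL` in the default kamailio.cfg.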


Let me know if you need further information.

Thank you
Regards
José Seabra

2017-02-06 7:01 GMT+00:00 Daniel-Constantin Mierla <miconda@gmail.com>:

Hello,

What are the parameters for the dmq module?

Cheers,
Daniel


On 01/02/2017 11:32, José Seabra wrote:
Hello there,
My DMQ cluster has 4 nodes, and for some reason 2 of them lost network connectivity for a long time (~4 hours). After the network of these 2 nodes came back, they did not rejoin the DMQ cluster automatically; I had to restart Kamailio to get them back into the DMQ list.
My questions are:
  • Is this expected behavior of the DMQ module?
  • Is there any way to put them back on the DMQ bus without restarting Kamailio?

Thank you
Best Regards
José Seabra


_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users

-- 
Daniel-Constantin Mierla
www.twitter.com/miconda -- www.linkedin.com/in/miconda
Kamailio Advanced Training - Mar 6-8 (Europe) and Mar 20-22 (USA) - www.asipto.com
Kamailio World Conference - May 8-10, 2017 - www.kamailioworld.com





--
Regards
José Seabra





Sipcentric Ltd. Company registered in England & Wales no. 7365592. Registered office: Faraday Wharf, Innovation Birmingham Campus, Holt Street, Birmingham Science Park, Birmingham B7 4BB.





--
Regards
José Seabra


