[sr-dev] [kamailio/kamailio] dmq: wait for a 2nd failed ping before deleting a node (#1840)

Federico Cabiddu notifications at github.com
Thu Feb 7 11:41:32 CET 2019


#### Pre-Submission Checklist
- [X] Commit message has the format required by CONTRIBUTING guide
- [X] Commits are split per component (core, individual modules, libs, utils, ...)
- [X] Each component has a single commit (if not, squash them into one commit)
- [X] No commits to README files for modules (changes must be done to docbook files
in `doc/` subfolder, the README file is autogenerated)

#### Type Of Change
- [X] Small bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds new functionality)
- [ ] Breaking change (fix or feature that would change existing functionality)

#### Checklist:
- [X] PR should be backported to stable branches
- [X] Tested changes locally
- [ ] Related to issue #XXXX (replace XXXX with an open issue number)

#### Description
I've recently been experiencing a loop of node removals and additions that leads to "ghost nodes".
Suppose there are three servers: A, B and C.
Server C goes down uncleanly, so DMQ doesn't notify the other nodes. Server A is the first to send its ping, with a nodelist that includes node C. After fr_timer, the transaction for the message to node C times out and node C is removed from node A's nodelist.
Then node B sends its ping with a nodelist that still includes node C (still alive from B's point of view), so node A sees node C as a new node and adds it back to its nodelist. Now node B reaches the fr_timer timeout and removes node C, which comes back with node A's next ping, and so on. This does not occur if the delta between node A's and node B's pings is less than fr_timer.
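To make the timing argument concrete, here is a small, hypothetical C simulation of the old single-strike behaviour. It is not code from the dmq module: two live peers A and B each track whether the dead node C is in their nodelist, a ping to C expires `fr_timer` ticks later and removes C, and a peer that has already dropped C re-adds it when the other peer's nodelist still contains it. The period, ping offsets and tick units are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one live peer's view of the dead node C. */
typedef struct {
    char name;
    int ping_offset;  /* tick within each period at which this peer pings */
    bool knows_c;     /* is C currently in this peer's nodelist? */
    int timeout_at;   /* tick when the pending ping to C expires, -1 if none */
} peer_t;

/* Runs the old policy (remove C on the first failed ping) and returns how
   many times a peer re-added C after having removed it. */
int simulate_old_policy(int period, int fr_timer, int delta, int ticks)
{
    /* Assumed preconditions: distinct offsets inside one period, and the
       ping transaction expires before the same peer pings again. */
    assert(delta > 0 && delta < period && fr_timer < period);

    peer_t p[2] = { { 'A', 0, true, -1 }, { 'B', delta, true, -1 } };
    int readds = 0;

    for (int t = 0; t < ticks; t++) {
        /* 1. The transaction to C times out: C leaves this nodelist. */
        for (int i = 0; i < 2; i++) {
            if (p[i].timeout_at == t) {
                p[i].knows_c = false;
                p[i].timeout_at = -1;
            }
        }
        /* 2. A peer that still lists C pings it (the ping will fail) and
              sends its nodelist to the other peer, which re-adds C as a
              "new" node if it had already removed it. */
        for (int i = 0; i < 2; i++) {
            if (t % period != p[i].ping_offset || !p[i].knows_c)
                continue;
            p[i].timeout_at = t + fr_timer;
            peer_t *other = &p[1 - i];
            if (!other->knows_c) {
                other->knows_c = true;
                readds++;
            }
        }
    }
    return readds;
}
```

With a period of 30 ticks and `fr_timer` of 5, an offset of 3 between A's and B's pings produces no re-additions (both peers drop C for good), whereas an offset of 10 keeps resurrecting C, 6 times in the first 100 ticks.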
What I propose here is that, upon a failed ping, the failing node is put in the disabled state and we wait for a second failed ping before removing it from the nodelist. This should prevent dead nodes from coming back.
You can view, comment on, or merge this pull request online at:

  https://github.com/kamailio/kamailio/pull/1840

-- Commit Summary --

  * dmq: wait for a 2nd failed ping before deleting a node

-- File Changes --

    M src/modules/dmq/notification_peer.c (23)

-- Patch Links --

https://github.com/kamailio/kamailio/pull/1840.patch
https://github.com/kamailio/kamailio/pull/1840.diff
