Hi,
Currently there is a ping mechanism in DMQ for detecting inactive nodes.
However, it only runs once every "ping_interval", so a node that fails
between two pings is not noticed until the next one.
DMQ exposes API functions to other modules for sending DMQ messages or
broadcasting them. These API functions require a "dmq_resp_cback_t" to be
passed as a parameter. Most of the other modules just log a DBG message in it.
There are two improvement ideas:
1. Let other modules that use DMQ detect node failures too, as they are
sending/broadcasting messages. This would be in addition to the DMQ module's
"ping_interval" mechanism.
2. Don't set a node inactive on the first failure, but keep a per-node
counter of failures, and add a modparam to tune this check.
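
To make idea 2 more concrete, here is a rough sketch of the per-node counter.
Everything in it is made up for illustration: "node_health_t" is not the real
DMQ node structure, "fail_threshold" stands in for the proposed modparam, and
treating only 408 as a failure is an assumption.

```c
#include <stdbool.h>

/* Hypothetical per-node health record; NOT the real DMQ node structure. */
typedef struct node_health {
    int fail_count;   /* consecutive failed requests to this node */
    bool active;      /* whether the node is currently considered active */
} node_health_t;

/* Hypothetical modparam: failures tolerated before a node goes inactive. */
static int fail_threshold = 3;

/* Update the node state from the response code of one request.
 * Assumption: 408 (request timeout) counts as a failure, 2xx as success;
 * any other code leaves the state untouched. */
void node_report_result(node_health_t *n, int code)
{
    if (code == 408) {
        n->fail_count++;
        if (n->fail_count >= fail_threshold)
            n->active = false;
    } else if (code >= 200 && code < 300) {
        n->fail_count = 0;
        n->active = true;
    }
}
```

With a threshold of 3, a single timeout would leave the node active, and only
the third consecutive one would mark it inactive. Any 2xx reply resets the
counter.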
To do this, either:
A. Implement the response callback in each module that uses DMQ, to check the
response code and set the inactive state of the DMQ node => duplicates the
same code across different modules.
B. Pass NULL for that "dmq_resp_cback_t" from the other modules that use DMQ.
In the DMQ module itself, check for NULL and, if so, use a default callback
that checks the response code and sets the inactive state of DMQ nodes =>
cleaner approach.
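
Approach B could look roughly like the sketch below. All the "sketch_" types
and functions are simplified stand-ins I made up for illustration, not the
real DMQ API or the real "dmq_resp_cback_t" layout. Treating 408 as the
failure code is an assumption as well.

```c
#include <stddef.h>
#include <stdbool.h>

/* Simplified stand-ins; NOT the real DMQ types. */
typedef struct sketch_node {
    bool active;
} sketch_node_t;

typedef int (*sketch_resp_f)(int code, sketch_node_t *node);

typedef struct sketch_resp_cback {
    sketch_resp_f f;
} sketch_resp_cback_t;

/* Default callback, owned by the DMQ module in this sketch.
 * Assumption: 408 (request timeout) means the node did not answer. */
int sketch_default_resp_f(int code, sketch_node_t *node)
{
    if (code == 408)
        node->active = false;
    return 0;
}

static const sketch_resp_cback_t sketch_default_cback = { sketch_default_resp_f };

/* Callers that do not care about the reply pass NULL and get the default. */
const sketch_resp_cback_t *sketch_pick_cback(const sketch_resp_cback_t *cb)
{
    return cb != NULL ? cb : &sketch_default_cback;
}

/* Stand-in for the point where a reply arrives for a sent message. */
int sketch_on_reply(const sketch_resp_cback_t *cb, int code, sketch_node_t *node)
{
    return sketch_pick_cback(cb)->f(code, node);
}
```

This way the failure handling lives in one place, and a module that really
needs its own handling can still pass a non-NULL callback.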
Any opinions on this? Do you see any possible problems with this?
Thanks,
Stefan