Description

I have 2 jsonrpc servers configured with different prio's. For testing, I have the servers configured to always delay the response to any request by more than the module's timout setting.

The (initial) request is sent to the first server. As this one times out, I would expect a retry to go to the second servers, but instead, all retries are sent to the same server. The backup server is never contacted. This makes the whole "prio" system seem a bit useless.

Troubleshooting

Reproduction

modparam("janssonrpcc", "server", "conn=test;addr=pc1;port=8081;priority=5;weight=10")
modparam("janssonrpcc", "server", "conn=test;addr=pc1;port=8082;priority=5;weight=10")

janssonrpc_request("test", "Test.Timeout", '[  { "Timout": 1000} ]', "route=JSONRPC_RESPONSE;retry=10;timeout=1000");

Log Messages

No useful logs are produced. I verified the described behavior on the jsonrpc server.

2023-02-23T16:59:34.585346+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:361]: bev_connect(): Connecting to server pc1:8081 for conn rating.
2023-02-23T16:59:34.585420+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:361]: bev_connect(): Connecting to server pc1:8082 for conn rating.
2023-02-23T16:59:34.585446+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:290]: bev_connect_cb(): Connected to host pc1:8081
2023-02-23T16:59:34.585462+01:00 pc1 proxy1[340870]: INFO: janssonrpcc [janssonrpc_connect.c:290]: bev_connect_cb(): Connected to host pc1:8082
2023-02-23T17:05:10.903398+01:00 pc1 proxy1[340870]: WARNING: janssonrpcc [janssonrpc_request.c:247]: schedule_retry(): Number of retries exceeded. Failing request.

Possible Solutions

Retry in combination with a timeout and prio's is a bit tricky. When do what? Just retrying on the first prio makes the lower prio servers completely useless, while going to the next prio on every retry skips possibly useful high-prio servers and may exhaust the number of candidate servers very fast.

Best solution IMHO would be to first try every server in the highest prio, before going to the next prio. Do not do (exponential) backoff between these steps.

If there are still retries remaining after that, wrap around to the highest prio with the exponentional backoff delay.

With the above, failover considers all servers and failover between servers is fast while not overloading a single server.

BTW. If I configure multiple servers per prio, it seems to randomly select one of them for every (re)try. It never selects one form the next prio.

Additional Information

5.6.1
(paste your output here)


Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.Message ID: <kamailio/kamailio/issues/3378@github.com>