[sr-dev] db_redis shared tcp connection issue
Andrew Pogrebennyk
andrew.nau.ua at gmail.com
Mon May 2 20:58:29 CEST 2022
Henning,
yes, will do. For me it seems to solve the problem,
but I have doubts about this code in ims_usrloc_[sp]cscf, which originates
in usrloc:
    case DB_ONLY:
    case WRITE_THROUGH:
        /* connect to db only from SIP workers, TIMER and
         * MAIN processes, and RPC processes */
        if (_rank<=0 && _rank!=PROC_TIMER && _rank!=PROC_MAIN
                && _rank!=PROC_RPC)
            return 0;
        break;
Connection creation is skipped only when _rank is less than -2; for higher
rank values we connect, including from the main process.
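
To make the difference concrete, here is a minimal sketch (not the module
code itself) that puts the current check and the one I proposed earlier in
this thread side by side as plain predicates. The RANK_* values are
assumptions meant to mirror the PROC_* constants in Kamailio's sr_module.h,
so please verify them against your tree; the function names are made up for
illustration.

```c
#include <stdbool.h>

/* Assumed rank values, intended to mirror PROC_* in Kamailio's
 * sr_module.h; check your source tree before relying on them. */
enum {
    RANK_MAIN     = 0,    /* PROC_MAIN */
    RANK_TIMER    = -1,   /* PROC_TIMER */
    RANK_RPC      = -2,   /* PROC_RPC */
    RANK_TCP_MAIN = -4,   /* PROC_TCP_MAIN */
    RANK_INIT     = -127  /* PROC_INIT */
};

/* Current usrloc-style check: true when child_init would go on to
 * open the DB connection for this rank. */
static bool connects_current(int rank)
{
    if (rank <= 0 && rank != RANK_TIMER && rank != RANK_MAIN
            && rank != RANK_RPC)
        return false;
    return true;
}

/* Check proposed earlier in this thread: skip INIT, MAIN and
 * TCP_MAIN, connect for every other rank. */
static bool connects_proposed(int rank)
{
    if (rank == RANK_INIT || rank == RANK_MAIN || rank == RANK_TCP_MAIN)
        return false;
    return true;
}
```

With these assumed values, connects_current(RANK_MAIN) is true while
connects_proposed(RANK_MAIN) is false, and both return true for SIP worker
ranks (1 and above).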
Based on Daniel's suggestion I also checked whether the main process closes
the connection once it is done with it, but no: the main process does not
close the connection AFAICS, so it stays available in the forked tcp worker
processes.
For IMS I found that it works well when PROC_MAIN does not make a
connection.
If I look at the sockets opened by kamailio 5.4 running plain usrloc, it
looks better to me with db_mode 0:
- with db_mode 0 it does not have the same redis tcp socket open in
multiple children in parallel
- with db_mode 1 the main process has a connection open to redis and the
tcp workers inherit the socket inode from the main process.
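
The inheritance itself is ordinary fork() behaviour and can be reproduced
outside of kamailio. The sketch below is only an illustration under stated
assumptions: a socketpair stands in for the redis TCP connection, and the
function name is hypothetical. The parent opens the socket before forking,
and the child compares the inode it sees on that descriptor with the
parent's.

```c
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child sees the same socket inode as the parent
 * on a descriptor opened before fork(), 0 if not, -1 on error. */
static int child_shares_socket_inode(void)
{
    int sv[2];
    int status;
    struct stat parent_st;
    pid_t pid;

    /* Stand-in for a DB connection opened by the main process. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (fstat(sv[0], &parent_st) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the descriptor was inherited, not reopened. */
        struct stat child_st;
        if (fstat(sv[0], &child_st) < 0)
            _exit(2);
        _exit(child_st.st_ino == parent_st.st_ino ? 0 : 1);
    }

    close(sv[0]);
    close(sv[1]);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:
        return 1;
    case 1:
        return 0;
    default:
        return -1;
    }
}
```

A child forked this way shares the open socket with its parent, which is the
same pattern lsof shows for the tcp workers: one inode, one source port,
several processes.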
I did not test the normal usrloc yet, so I don't know whether there is any
regression or whether it works well if I implement the changes there.
This is the main thing holding me back from making a PR to usrloc,
ims_usrloc_pcscf and ims_usrloc_scscf.
So to me it looks like connecting from the main process doesn't serve any
purpose, and other users could hit the bug; it happens when two tcp children
receive registrations at nearly the same time. Maybe not many users are
running usrloc with db_redis?
Regards,
Andrew
On Mon, May 2, 2022, 16:52 Henning Westerholt <hw at gilawa.com> wrote:
> Hello,
>
>
>
> thanks for the confirmation. Please create a pull request on our tracker
> with the fix if your tests were successful.
>
>
>
> Cheers,
>
>
>
> Henning
>
>
>
> --
>
> Henning Westerholt – https://skalatan.de/blog/
>
> Kamailio services – https://gilawa.com
>
>
>
> *From:* sr-dev <sr-dev-bounces at lists.kamailio.org> *On Behalf Of *Andrew
> Pogrebennyk
> *Sent:* Friday, April 29, 2022 6:27 PM
> *To:* Daniel-Constantin Mierla <miconda at gmail.com>
> *Cc:* Kamailio (SER) - Development Mailing List <sr-dev at lists.kamailio.org
> >
> *Subject:* Re: [sr-dev] db_redis shared tcp connection issue
>
>
>
> Daniel,
>
> I think I found it. For a long time the ims_usrloc_scscf and
> usrloc_pcscf have had a connection opened for the main process in child init.
>
> I changed the child init from:
>
> case WRITE_THROUGH:
>     /* connect to db only from SIP workers, TIMER and MAIN processes */
>     if (_rank<=0 && _rank!=PROC_TIMER && _rank!=PROC_MAIN)
>         return 0;
>
> to
>
> case WRITE_THROUGH:
>     /* skip child init for non-worker process ranks */
>     if (_rank==PROC_INIT || _rank==PROC_MAIN || _rank==PROC_TCP_MAIN)
>         return 0;
>
> Testing it.
>
>
>
> On Fri, Apr 29, 2022 at 4:18 PM Daniel-Constantin Mierla <
> miconda at gmail.com> wrote:
>
> No.
>
> Connections opened in mod init or child init for rank proc main/init must
> be closed again there.
>
> If a component wants to keep the connection open, has to be done in child
> init for ranks corresponding to sip workers, rpcs, timers, ...
>
> On 29.04.22 15:25, Andrew Pogrebennyk wrote:
>
> Hi Daniel,
>
> I am not sure if I understood you correctly. Do you mean that child_init
> should open the connection only when the rank is proc main or proc init?
>
>
>
> For example, in pua module we have
>
>
>
> static int child_init(int rank)
> {
>     if (rank==PROC_INIT || rank==PROC_MAIN || rank==PROC_TCP_MAIN)
>         return 0; /* do nothing for the main process */
>
>     if (pua_dbf.init==0)
>     {
>         LM_CRIT("database not bound\n");
>
>
>
> Is that correct? If I have a module which does not connect in child_init
> for rank PROC_RPC, while the module it originates from (ims_dialog vs
> dialog) does establish a connection in the RPC rank, would that be a
> problem? No, right? :)
>
>
>
> Thanks for the pointer, checking it.
>
> Andrew
>
>
>
> On Fri, Apr 29, 2022 at 1:17 PM Daniel-Constantin Mierla <
> miconda at gmail.com> wrote:
>
> Hello,
>
> this sounds like a module does a db operation in mod init, opening the
> connection, but does not close it afterwards there. It should then re-open
> it in child init.
>
> It can be also in child_init(), but when the rank is proc main or proc
> init. In child init db connection has to be left opened only for the other
> ranks.
>
> Try to identify which component makes the first operation.
>
> Cheers,
> Daniel
>
> On 29.04.22 12:39, Andrew Pogrebennyk wrote:
>
> Dear community,
> I've been looking at some weirdness in db_redis behavior when it returns
> the responses to the queries made by tcp processes in mixed order.
> Tested this on various kamailio 5.3 and 5.4 (sipwise spce) and they are
> showing interesting pattern.
> After restart of kamailio I ran lsof to enumerate all the sockets open in
> kamailio children.
> There is a connection to db port 6379 which is held by multiple processes
> at the same time.
>
> for i in $(ps auxww | grep kamailio.proxy | grep -v grep | awk '{print $2}'); do echo "print file descriptors of $i" && sudo lsof -p $i | grep 6379; done > redis_conn.txt
>
> ...I see that lsof lists a tcp client socket to the redis server with the
> same source TCP port and the same inode number in several processes:
>
> 14199, "TIMER NH",
> 14200, "ctl handler",
> 14205, "Dialog Clean Timer",
> 14206, "JSONRPCS FIFO",
> 14210, "JSONRPCS DATAGRAM",
> 14213, "tcp receiver (generic) child=0",
> 14214, "tcp receiver (generic) child=1",
> 14215, "tcp receiver (generic) child=2",
> 14220, "tcp receiver (generic) child=3",
> 14224, "tcp receiver (generic) child=4",
> 14225, "tcp main process"
>
>
>
> The UDP processes are safe (and some timer ones too), because in that lsof
> they have unique TCP client port.
>
>
>
> That's giving me a lot of headache because UA registrations received by
> any of the TCP workers (or IPSec ones for that matter) are
> randomly failing: if two processes make the same query to the DB in
> parallel, the queries appear on the wire with the same TCP source port and
> the replies can be mixed up.
>
>
>
> This can be some bug in usage of hiredis, impacting all users of db_redis
> module. Is there any relation to the way kamailio is working its TCP
> workers, where maybe tcp workers are forked from the main attendant
> processes after having opened the DB connection?
>
>
>
> P.S. Why I have the above hypothesis: when I log redis queries with
> redis-cli monitor at startup of kamailio, I see only that srem_key_lua is
> executed against redis in runtime only once from that source port, but then
> this connection is shared across multiple processes.
>
>
>
> Regards,
>
> Andrew
>
>
>
> --
>
> Daniel-Constantin Mierla -- www.asipto.com
>
> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>
> Kamailio Advanced Training - Online
>
> * https://www.asipto.com/sw/kamailio-advanced-training-online/