[sr-dev] db_redis shared tcp connection issue

Andrew Pogrebennyk andrew.nau.ua at gmail.com
Mon May 2 20:58:29 CEST 2022


Henning,
yes, will do. For me it seems to solve the problem,
but I have doubts about this code in ims_usrloc_[sp]cscf, which originates
from usrloc:

                case DB_ONLY:
                case WRITE_THROUGH:
                        /* connect to db only from SIP workers, TIMER and MAIN processes,
                         * and RPC processes */
                        if (_rank<=0 && _rank!=PROC_TIMER && _rank!=PROC_MAIN
                                         && _rank!=PROC_RPC)
                                return 0;
                        break;


The connection creation is skipped only when _rank is less than -2; for all
higher rank numbers we connect, including from the main process.
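
To make the effect of that condition concrete, here is a minimal sketch that
mirrors it as a standalone predicate. The numeric rank values are written
from memory of the Kamailio core headers and are an assumption; the function
name usrloc_connects is made up for this illustration.

```c
/* Rank values as I recall them from the Kamailio core headers.
 * Assumption: verify them against your own source tree. */
#define PROC_MAIN       0
#define PROC_TIMER     -1
#define PROC_RPC       -2
#define PROC_TCP_MAIN  -4
#define PROC_INIT    -127

/* Hypothetical helper mirroring the quoted condition: returns 1 if
 * child_init() would go on to open the DB connection for this rank,
 * 0 if it would return early without connecting. */
int usrloc_connects(int rank)
{
    if (rank <= 0 && rank != PROC_TIMER && rank != PROC_MAIN
            && rank != PROC_RPC)
        return 0;
    return 1;
}
```

Under these assumed values, PROC_MAIN, PROC_TIMER, PROC_RPC and every
positive (SIP worker) rank go on to connect, while PROC_TCP_MAIN and
PROC_INIT return early.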

Following Daniel's suggestion I also checked whether the main process closes
the connection after using it. It does not, AFAICS, so the connection is then
available in the forked tcp worker processes.

For IMS I found that it works well when PROC_MAIN does not make a
connection.

If I look at the sockets opened by kamailio 5.4 running plain usrloc, it
looks better to me with db_mode 0:

- with db_mode 0 it does not have multiple tcp sockets opened for redis in
parallel children
- with db_mode 1 the main process has a connection open for redis and the tcp
workers inherit the socket inode from the main process.
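
The inheritance seen with db_mode 1 is plain fork() behavior and can be
reproduced outside kamailio. The sketch below is only an illustration: it
uses an AF_UNIX socketpair as a stand-in for the redis TCP connection, and
the function name child_inherits_socket is invented for this example.

```c
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a forked child sees the parent's socket under the same
 * device/inode pair, 0 if it does not, -1 on error. */
int child_inherits_socket(void)
{
    int sv[2];
    int status;
    struct stat before, after;
    pid_t pid;

    /* Stand-in for the connection opened by the main process. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (fstat(sv[0], &before) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the descriptor is simply there, no new connection. */
        if (fstat(sv[0], &after) < 0)
            _exit(2);
        _exit((after.st_ino == before.st_ino
                && after.st_dev == before.st_dev) ? 0 : 1);
    }

    close(sv[0]);
    close(sv[1]);
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

Every process forked after the connect refers to the same socket, which is
what lsof reports as an identical inode number in several processes.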

I have not tested the normal usrloc yet, so I do not know whether there is
any regression or whether it works well if I implement the changes there.
This is the main thing holding me back from making a PR to usrloc,
ims_usrloc_pcscf and ims_usrloc_scscf.

So to me it looks like the connection made by the main process doesn't serve
any purpose, and other users could hit the bug; it happens when two tcp
children receive two registrations at close to the same time. Maybe not many
users are running usrloc with db_redis?
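
Why two near-simultaneous registrations break on a shared connection can be
shown with a small, deterministic simulation. This is a sketch under
assumptions, not db_redis code: a socketpair stands in for the redis
connection, both "workers" run in one process for repeatability, messages
are fixed-size, and all names (MSG_LEN, send_msg, replies_get_mixed) are
invented for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define MSG_LEN 16

/* Read exactly len bytes from a stream socket. */
static int read_full(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return 0;
}

/* Send text as one zero-padded, fixed-size message. */
static int send_msg(int fd, const char *text)
{
    char msg[MSG_LEN] = {0};
    strncpy(msg, text, MSG_LEN - 1);
    return write(fd, msg, MSG_LEN) == MSG_LEN ? 0 : -1;
}

/* Returns 1 if worker B ends up with the reply meant for worker A,
 * 0 if not, -1 on error. */
int replies_get_mixed(void)
{
    int sv[2], i, mixed = -1;
    char query[MSG_LEN], answer[MSG_LEN + 8];
    char reply_a[MSG_LEN], reply_b[MSG_LEN];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* Workers A and B both send on the same inherited descriptor. */
    if (send_msg(sv[0], "alice") < 0 || send_msg(sv[0], "bob") < 0)
        goto out;

    /* The server answers strictly in arrival order. */
    for (i = 0; i < 2; i++) {
        if (read_full(sv[1], query, MSG_LEN) < 0)
            goto out;
        snprintf(answer, sizeof(answer), "re:%s", query);
        if (send_msg(sv[1], answer) < 0)
            goto out;
    }

    /* Worker B happens to read first and takes the first reply. */
    if (read_full(sv[0], reply_b, MSG_LEN) < 0
            || read_full(sv[0], reply_a, MSG_LEN) < 0)
        goto out;
    mixed = strcmp(reply_b, "re:alice") == 0
            && strcmp(reply_a, "re:bob") == 0;
out:
    close(sv[0]);
    close(sv[1]);
    return mixed;
}
```

The server has no way to tell the two senders apart on one connection, so
whichever process reads first gets the oldest pending reply. With one
connection per process the mix-up cannot occur.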

Regards,
Andrew

On Mon, May 2, 2022, 16:52 Henning Westerholt <hw at gilawa.com> wrote:

> Hello,
>
> thanks for the confirmation. Please create a pull request on our tracker
> with the fix if your tests were successful.
>
> Cheers,
>
> Henning
>
> --
>
> Henning Westerholt – https://skalatan.de/blog/
>
> Kamailio services – https://gilawa.com
>
>
>
> *From:* sr-dev <sr-dev-bounces at lists.kamailio.org> *On Behalf Of *Andrew
> Pogrebennyk
> *Sent:* Friday, April 29, 2022 6:27 PM
> *To:* Daniel-Constantin Mierla <miconda at gmail.com>
> *Cc:* Kamailio (SER) - Development Mailing List <sr-dev at lists.kamailio.org>
> *Subject:* Re: [sr-dev] db_redis shared tcp connection issue
>
>
>
> Daniel,
>
> I think I found it. For a long time, ims_usrloc_scscf and ims_usrloc_pcscf
> have opened a connection for the main process in child init.
>
> I changed the child init from:
>
> case WRITE_THROUGH:
> /* connect to db only from SIP workers, TIMER and MAIN processes */
> if (_rank<=0 && _rank!=PROC_TIMER && _rank!=PROC_MAIN)
>     return 0;
>
>
> to
> case WRITE_THROUGH:
> /* skip child init for non-worker process ranks */
> if (_rank==PROC_INIT || _rank==PROC_MAIN || _rank==PROC_TCP_MAIN)
>    return 0;
>
> Testing it.
>
>
>
> On Fri, Apr 29, 2022 at 4:18 PM Daniel-Constantin Mierla <
> miconda at gmail.com> wrote:
>
> No.
>
> Connections opened in mod init or child init for rank proc main/init must
> be closed again there.
>
> If a component wants to keep the connection open, it has to be done in child
> init for ranks corresponding to sip workers, rpcs, timers, ...
>
> On 29.04.22 15:25, Andrew Pogrebennyk wrote:
>
> Hi Daniel,
>
> I am not sure if I understood you correctly. Do you mean that child_init
> should open the connection only when the rank is proc main or proc init?
>
>
>
> For example, in pua module we have
>
>
>
> static int child_init(int rank)
> {
>         if (rank==PROC_INIT || rank==PROC_MAIN || rank==PROC_TCP_MAIN)
>                 return 0; /* do nothing for the main process */
>
>         if (pua_dbf.init==0)
>         {
>                 LM_CRIT("database not bound\n");
>
>
>
> Is that correct? If I have a module which does not connect in child_init
> for rank PROC_RPC, but the module it derives from (ims_dialog vs dialog)
> does also establish a connection in the RPC rank, would that be a problem?
> No, right? :)
>
>
>
> Thanks for the pointer, checking it.
>
> Andrew
>
>
>
> On Fri, Apr 29, 2022 at 1:17 PM Daniel-Constantin Mierla <
> miconda at gmail.com> wrote:
>
> Hello,
>
> this sounds like a module does a db operation in mod init, opening the
> connection, but does not close it afterwards there. It should then re-open
> in child init.
>
> It can also happen in child_init(), when the rank is proc main or proc
> init. In child init, the db connection has to be left open only for the
> other ranks.
>
> Try to identify which component makes the first operation.
>
> Cheers,
> Daniel
>
> On 29.04.22 12:39, Andrew Pogrebennyk wrote:
>
> Dear community,
> I've been looking at some odd db_redis behavior, where it returns the
> responses to queries made by tcp processes in mixed order.
> I tested this on various kamailio 5.3 and 5.4 versions (sipwise spce), and
> they show an interesting pattern.
> After a restart of kamailio I ran lsof to enumerate all the sockets open in
> the kamailio children.
> There is a connection to db port 6379 which is held by multiple processes
> at the same time.
>
> for i in $(ps auxww | grep kamailio.proxy | grep -v grep | awk '{print $2}'); do
>   echo "print file descriptors of $i" && sudo lsof -p $i | grep 6379
> done > redis_conn.txt
>
> ...I see that lsof lists a tcp client socket to the redis server with the
> same source TCP port and the same inode number in several processes:
>
>   14199,   "TIMER NH",
>   14200,  "ctl handler",
>   14205,  "Dialog Clean Timer",
>   14206,  "JSONRPCS FIFO",
>   14210,  "JSONRPCS DATAGRAM",
>   14213,  "tcp receiver (generic) child=0",
>   14214,  "tcp receiver (generic) child=1",
>   14215,  "tcp receiver (generic) child=2",
>   14220,  "tcp receiver (generic) child=3",
>   14224,  "tcp receiver (generic) child=4",
>   14225,  "tcp main process"
>
>
>
> The UDP processes are safe (and some timer ones too), because in that lsof
> output each of them has a unique TCP client port.
>
>
>
> That's giving me a lot of headaches, because UA registrations received by
> any of the TCP workers (or IPSec ones for that matter) are randomly
> failing: if two processes query the DB in parallel, both queries appear on
> the wire with the same TCP source port and the replies can be mixed up.
>
>
>
> This can be some bug in usage of hiredis, impacting all users of db_redis
> module. Is there any relation to the way kamailio is working its TCP
> workers, where maybe tcp workers are forked from the main attendant
> processes after having opened the DB connection?
>
>
>
> P.S. Why I have the above hypothesis: when I log redis queries with
> redis-cli monitor at startup of kamailio, I see that srem_key_lua is
> executed against redis only once from that source port, but then this
> connection is shared across multiple processes.
>
>
>
> Regards,
>
> Andrew
>
>
>
> _______________________________________________
>
> Kamailio (SER) - Development Mailing List
>
> sr-dev at lists.kamailio.org
>
> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
>
> --
>
> Daniel-Constantin Mierla -- www.asipto.com
>
> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>
> Kamailio Advanced Training - Online
>
>   * https://www.asipto.com/sw/kamailio-advanced-training-online/
>
>
>