[sr-dev] db_redis shared tcp connection issue

Andrew Pogrebennyk andrew.nau.ua at gmail.com
Fri Apr 29 18:27:14 CEST 2022


Daniel,
I think I found it. Since historic times, ims_usrloc_scscf and
ims_usrloc_pcscf have opened a DB connection for the main process in their
child init. I changed the child init from:
case WRITE_THROUGH:
    /* connect to db only from SIP workers, TIMER and MAIN processes */
    if (_rank<=0 && _rank!=PROC_TIMER && _rank!=PROC_MAIN)
        return 0;

to
case WRITE_THROUGH:
    /* skip child init for non-worker process ranks */
    if (_rank==PROC_INIT || _rank==PROC_MAIN || _rank==PROC_TCP_MAIN)
        return 0;
Testing it.
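
For reference, the full shape of the fixed child init would then be roughly
as follows; a minimal sketch assuming the standard db_func_t binding from
lib/srdb1 (the ul_dbf/ul_dbh handle names and the db_mode switch are
illustrative, not the exact module code):

static int child_init(int _rank)
{
    switch (db_mode) {
    case WRITE_THROUGH:
        /* skip child init for non-worker process ranks */
        if (_rank==PROC_INIT || _rank==PROC_MAIN || _rank==PROC_TCP_MAIN)
            return 0;

        /* SIP workers, timer, RPC, ... each open their own connection */
        ul_dbh = ul_dbf.init(&db_url);
        if (ul_dbh==NULL) {
            LM_ERR("child(%d): failed to connect to database\n", _rank);
            return -1;
        }
        break;
    }
    return 0;
}

This way each process rank that can issue queries gets a private connection
instead of inheriting the socket opened in the attendant process.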

On Fri, Apr 29, 2022 at 4:18 PM Daniel-Constantin Mierla <miconda at gmail.com>
wrote:

> No.
>
> Connections opened in mod init, or in child init for the proc main/init
> ranks, must be closed again there.
>
> If a component wants to keep the connection open, that has to be done in
> child init for the ranks corresponding to SIP workers, RPC processes,
> timers, ...
> On 29.04.22 15:25, Andrew Pogrebennyk wrote:
>
> Hi Daniel,
> I am not sure if I understood you correctly. Do you mean that child_init
> should open the connection only when the rank is proc main or proc init?
>
> For example, in pua module we have
>
> static int child_init(int rank)
> {
>         if (rank==PROC_INIT || rank==PROC_MAIN || rank==PROC_TCP_MAIN)
>                 return 0; /* do nothing for the main process */
>
>         if (pua_dbf.init==0)
>         {
>                 LM_CRIT("database not bound\n");
>                 return -1;
>         }
>         /* ... then pua_db = pua_dbf.init(&db_url), etc. ... */
>
> Is that correct? If I have a module which does not connect in child_init
> for rank PROC_RPC, while the module it originates from (ims_dialog vs.
> dialog) does establish a connection in the RPC rank, would that be a
> problem? No, right? :)
>
> Thanks for the pointer, checking it.
> Andrew
>
> On Fri, Apr 29, 2022 at 1:17 PM Daniel-Constantin Mierla <
> miconda at gmail.com> wrote:
>
>> Hello,
>>
>> this sounds like a module doing a db operation in mod init that opens the
>> connection but does not close it afterwards there. It should then re-open
>> it in child init.
>>
>> The same can also be done in child_init(), but when the rank is proc main
>> or proc init. In child init, the db connection has to be left open only
>> for the other ranks.
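>>
>> (A minimal sketch of that mod_init pattern, assuming the standard
>> db_func_t API from lib/srdb1; the dbf/con handle names are illustrative:)
>>
>> static int mod_init(void)
>> {
>>     db1_con_t *con = NULL;
>>
>>     if (db_bind_mod(&db_url, &dbf) < 0) {
>>         LM_ERR("cannot bind to a database module\n");
>>         return -1;
>>     }
>>     /* open only for startup checks, e.g. table version */
>>     con = dbf.init(&db_url);
>>     if (con == NULL) {
>>         LM_ERR("cannot connect to database\n");
>>         return -1;
>>     }
>>     /* ... startup db operations ... */
>>     dbf.close(con); /* close here; workers re-open in child_init() */
>>     return 0;
>> }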
>>
>> Try to identify which component makes the first operation.
>>
>> Cheers,
>> Daniel
>> On 29.04.22 12:39, Andrew Pogrebennyk wrote:
>>
>> Dear community,
>> I've been looking at some weirdness in db_redis behavior where it returns
>> responses to the queries made by TCP processes in mixed order. I tested
>> this on various kamailio 5.3 and 5.4 setups (sipwise spce) and they all
>> show an interesting pattern.
>> After a restart of kamailio I ran lsof to enumerate all the sockets open
>> in the kamailio children. There is a connection to DB port 6379 which is
>> held by multiple processes at the same time.
>>
>>> for i in $(ps auxww | grep kamailio.proxy | grep -v grep | awk '{print $2}'); do
>>>   echo "print file descriptors of $i" && sudo lsof -p $i | grep 6379
>>> done > redis_conn.txt
>>
>> ...I see that lsof lists a TCP client socket to the redis server with the
>> same source TCP port and the same inode number in several processes:
>>
>>   14199,   "TIMER NH",
>>   14200,  "ctl handler",
>>   14205,  "Dialog Clean Timer",
>>   14206,  "JSONRPCS FIFO",
>>   14210,  "JSONRPCS DATAGRAM",
>>   14213,  "tcp receiver (generic) child=0",
>>   14214,  "tcp receiver (generic) child=1",
>>   14215,  "tcp receiver (generic) child=2",
>>   14220,  "tcp receiver (generic) child=3",
>>   14224,  "tcp receiver (generic) child=4",
>>   14225,  "tcp main process"
>>
>> The UDP processes (and some timer ones too) are safe, because in that
>> lsof output they have unique TCP client ports.
>>
>> That's giving me a lot of headache, because UA registrations received by
>> any of the TCP workers (or the IPsec ones, for that matter) are randomly
>> failing: if two processes make the same query to the DB in parallel, both
>> requests appear on the wire with the same TCP source port and the replies
>> can get mixed up.
>>
>> This could be a bug in the usage of hiredis, impacting all users of the
>> db_redis module. Is there any relation to the way kamailio handles its
>> TCP workers, where maybe the TCP workers are forked from the main
>> attendant process after it has already opened the DB connection?
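>>
>> (If that is what happens, a minimal guard at the hiredis level, sketched
>> under that assumption and not taken from the actual db_redis code, would
>> be to remember which PID opened the context and reconnect after fork:)
>>
>> #include <unistd.h>
>> #include <hiredis/hiredis.h>
>>
>> static redisContext *ctx = NULL;
>> static pid_t ctx_pid = 0;
>>
>> /* return a context owned by the calling process; if the stored one was
>>  * inherited from the parent across fork(), drop it and reconnect */
>> static redisContext *get_ctx(const char *host, int port)
>> {
>>     if (ctx != NULL && ctx_pid != getpid()) {
>>         redisFree(ctx); /* closes the shared fd in this process only */
>>         ctx = NULL;
>>     }
>>     if (ctx == NULL) {
>>         ctx = redisConnect(host, port);
>>         if (ctx == NULL || ctx->err)
>>             return NULL;
>>         ctx_pid = getpid();
>>     }
>>     return ctx;
>> }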
>>
>> P.S. Why I have the above hypothesis: when I log redis queries with
>> redis-cli monitor at kamailio startup, I see srem_key_lua executed against
>> redis only once from that source port at runtime, but the connection is
>> then shared across multiple processes.
>>
>> Regards,
>> Andrew
>>
>> _______________________________________________
>> Kamailio (SER) - Development Mailing List
>> sr-dev at lists.kamailio.org
>> https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
>>
>> --
>> Daniel-Constantin Mierla -- www.asipto.com
>> www.twitter.com/miconda -- www.linkedin.com/in/miconda
>> Kamailio Advanced Training - Online
>>   * https://www.asipto.com/sw/kamailio-advanced-training-online/
>>
> --
> Daniel-Constantin Mierla -- www.asipto.com
> www.twitter.com/miconda -- www.linkedin.com/in/miconda
> Kamailio Advanced Training - Online
>   * https://www.asipto.com/sw/kamailio-advanced-training-online/
>
>

