<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p>I don't remember by heart, but I think the child_init for
      PROC_MAIN is indeed called before forking TCP worker processes,
      which in this case results in propagation of the db connection.</p>
    <p>Then the db open operation has to be moved from the child init for
      rank PROC_MAIN to the one for rank PROC_POSTCHILDINIT, if the
      connection is needed by the main process.</p>
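    <p>A minimal sketch of that rule in C. The rank values are written
      out locally for illustration (a real module takes them from the
      Kamailio core headers), keeps_db_connection() is a hypothetical
      helper, and it assumes that the PROC_POSTCHILDINIT callback runs in
      the main process after the workers have been forked:</p>
    <pre>
```c
/* Rank values assumed for this sketch; a real module uses the
 * definitions from the Kamailio core headers instead. */
enum {
    PROC_MAIN = 0,
    PROC_TIMER = -1,
    PROC_RPC = -2,
    PROC_TCP_MAIN = -4,
    PROC_POSTCHILDINIT = -126,
    PROC_INIT = -127
};

/* Hypothetical helper: returns 1 when child_init should open a DB
 * connection that stays open for the lifetime of the process. */
int keeps_db_connection(int rank)
{
    switch (rank) {
    case PROC_INIT:
    case PROC_MAIN:
    case PROC_TCP_MAIN:
        /* Skipped, as in the pua check quoted in this thread: a
         * connection left open by PROC_MAIN reaches the TCP workers. */
        return 0;
    case PROC_POSTCHILDINIT:
        /* Assumed to run in the main process once forking is done. */
        return 1;
    case PROC_TIMER:
    case PROC_RPC:
        return 1;
    default:
        /* Workers have positive ranks; other special ranks are skipped
         * here, which is a choice made for this sketch only. */
        return rank > 0;
    }
}
```
</pre>
    <p>A child_init built on such a helper would return early when it
      yields 0 and open the connection otherwise.</p>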
    <p>Cheers,<br>
      Daniel<br>
    </p>
    <div class="moz-cite-prefix">On 02.05.22 20:58, Andrew Pogrebennyk
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CALPNc+5BB8wkzt9BB48LdQJxcRn0ciGt9asFhnkcEKsSBc_xDA@mail.gmail.com">
      <div dir="auto">Henning,
        <div dir="auto">yes, will do. For me it seems to solve the
          problem,</div>
        <div dir="auto">but I have doubts about this code in
          ims_usrloc_[sp]cscf, which originates from usrloc:</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">                case DB_ONLY:</div>
        <div dir="auto">                case WRITE_THROUGH:</div>
        <div dir="auto">                        /* connect to db only
          from SIP workers, TIMER and MAIN processes,</div>
        <div dir="auto">                         * and RPC processes */</div>
        <div dir="auto">                        if (_rank<=0
          && _rank!=PROC_TIMER && _rank!=PROC_MAIN</div>
        <div dir="auto">                                       
           && _rank!=PROC_RPC)</div>
        <div dir="auto">                                return 0;</div>
        <div dir="auto">                        break;</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Connection creation is skipped only when _rank is
          less than -2; for all higher ranks we connect, including
          from the main process. </div>
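        <div dir="auto"><br>
        </div>
        <div dir="auto">To check that reading, the condition can be
          isolated in a small function. This is only a sketch: the rank
          values 0, -1 and -2 are assumptions written out locally, and
          skips_connection() is a hypothetical name, not part of usrloc.</div>
        <pre>
```c
/* Rank values assumed for this sketch; the real ones come from the
 * Kamailio core headers. */
enum { PROC_MAIN = 0, PROC_TIMER = -1, PROC_RPC = -2 };

/* Mirrors the quoted check: returns 1 when child_init would return
 * early without connecting, 0 when it goes on to open the connection. */
int skips_connection(int rank)
{
    if (rank <= 0 && rank != PROC_TIMER && rank != PROC_MAIN
            && rank != PROC_RPC)
        return 1;
    return 0;
}
```
</pre>
        <div dir="auto">With these values the function returns 0 for
          every positive rank and for 0, -1 and -2, and 1 for -3 and
          below.</div>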
        <div dir="auto"><br>
        </div>
        <div dir="auto">Based on Daniel's suggestion I also checked whether
          the main process closes the connection after using it,
          but no: AFAICS the main process does not close the connection,
          so it remains available in the forked TCP worker processes.<br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">As I found, for IMS it works well when
          PROC_MAIN does not open a connection.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">If I look at the sockets opened by kamailio 5.4
          running plain usrloc, it looks better to me with db_mode 0:</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">- with db_mode 0 it does not have multiple tcp
          sockets opened for redis in parallel children</div>
        <div dir="auto">- with db_mode 1 the main process has a connection
          open to redis and the tcp workers inherit the socket inode from
          the main process. </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">I have not tested the normal usrloc yet, to see
          whether there is any regression or whether it works well if I
          implement the changes there.</div>
        <div dir="auto">This is the main thing holding me back
          from making a PR to usrloc, ims_usrloc_pcscf, ims_usrloc_scscf.</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">So to me it looks like the connection in the main
          process doesn't serve any purpose and other users could hit the
          bug; the condition under which it happens is two tcp children
          receiving two registrations at close to the same time. Maybe not
          many users are running usrloc with db_redis?</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">Regards,</div>
        <div dir="auto">Andrew</div>
      </div>
      <br>
      <div class="gmail_quote">
        <div dir="ltr" class="gmail_attr">On Mon, May 2, 2022, 16:52
          Henning Westerholt <<a href="mailto:hw@gilawa.com"
            moz-do-not-send="true" class="moz-txt-link-freetext">hw@gilawa.com</a>>
          wrote:<br>
        </div>
        <blockquote class="gmail_quote" style="margin:0 0 0
          .8ex;border-left:1px #ccc solid;padding-left:1ex">
          <div link="blue" vlink="purple" style="word-wrap:break-word"
            lang="DE">
            <div class="m_-8102745605837182697WordSection1">
              <p class="MsoNormal"><span>Hello,</span></p>
              <p class="MsoNormal"><span> </span></p>
              <p class="MsoNormal"><span lang="EN-GB">thanks for the
                  confirmation. Please create a pull request on our
                  tracker with the fix if your tests were successful.</span></p>
              <p class="MsoNormal"><span lang="EN-GB"> </span></p>
              <p class="MsoNormal"><span lang="EN-GB">Cheers,</span></p>
              <p class="MsoNormal"><span lang="EN-GB"> </span></p>
              <p class="MsoNormal"><span lang="EN-GB">Henning</span></p>
              <p class="MsoNormal"><span lang="EN-GB"> </span></p>
              <p class="MsoNormal"><span lang="EN-GB">-- </span></p>
              <p class="MsoNormal"><span lang="EN-GB">Henning Westerholt
                  –
                </span><span><a href="https://skalatan.de/blog/"
                    target="_blank" rel="noreferrer"
                    moz-do-not-send="true"><span style="color:#0563c1"
                      lang="EN-GB">https://skalatan.de/blog/</span></a></span><span
                  lang="EN-GB"></span></p>
              <p class="MsoNormal"><span lang="EN-GB">Kamailio services
                  –
                </span><span><a href="https://gilawa.com/"
                    target="_blank" rel="noreferrer"
                    moz-do-not-send="true"><span style="color:#0563c1"
                      lang="EN-GB">https://gilawa.com</span></a></span><span
                  lang="EN-GB"></span></p>
              <p class="MsoNormal"><span lang="EN-GB"> </span></p>
              <div style="border:none;border-top:solid #e1e1e1
                1.0pt;padding:3.0pt 0cm 0cm 0cm">
                <p class="MsoNormal" style="margin-left:35.4pt"><b>From:</b>
                  sr-dev <<a
                    href="mailto:sr-dev-bounces@lists.kamailio.org"
                    target="_blank" rel="noreferrer"
                    moz-do-not-send="true" class="moz-txt-link-freetext">sr-dev-bounces@lists.kamailio.org</a>>
                  <b>On Behalf Of </b>Andrew Pogrebennyk<br>
                  <b>Sent:</b> Friday, April 29, 2022 6:27 PM<br>
                  <b>To:</b> Daniel-Constantin Mierla <<a
                    href="mailto:miconda@gmail.com" target="_blank"
                    rel="noreferrer" moz-do-not-send="true"
                    class="moz-txt-link-freetext">miconda@gmail.com</a>><br>
                  <b>Cc:</b> Kamailio (SER) - Development Mailing List
                  <<a href="mailto:sr-dev@lists.kamailio.org"
                    target="_blank" rel="noreferrer"
                    moz-do-not-send="true" class="moz-txt-link-freetext">sr-dev@lists.kamailio.org</a>><br>
                  <b>Subject:</b> Re: [sr-dev] db_redis shared tcp
                  connection issue</p>
              </div>
              <p class="MsoNormal" style="margin-left:35.4pt"> </p>
              <div>
                <p class="MsoNormal" style="margin-left:35.4pt">Daniel,</p>
                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt">I
                    think I found it. For a long time, ims_usrloc_scscf
                    and ims_usrloc_pcscf have opened a connection for
                    the main process in child init.</p>
                </div>
                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt">I
                    changed the child init from:</p>
                </div>
                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt">case
                    WRITE_THROUGH:<br>
                    /* connect to db only from SIP workers, TIMER and
                    MAIN processes */<br>
                    if (_rank<=0 && _rank!=PROC_TIMER
                    && _rank!=PROC_MAIN)<br>
                        return 0;</p>
                </div>
                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt"><br>
                    to<br>
                    case WRITE_THROUGH:<br>
                    /* skip child init for non-worker process ranks */<br>
                    if (_rank==PROC_INIT || _rank==PROC_MAIN ||
                    _rank==PROC_TCP_MAIN)<br>
                       return 0;</p>
                </div>
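                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt">To see
                    which ranks the change affects, the two checks can be
                    compared side by side. A sketch only: the rank values
                    are written out locally as assumptions and the
                    function names are hypothetical.</p>
                </div>
                <pre>
```c
/* Rank values assumed for this sketch; a real module uses the
 * definitions from the Kamailio core headers. */
enum {
    PROC_MAIN = 0,
    PROC_TIMER = -1,
    PROC_RPC = -2,
    PROC_TCP_MAIN = -4,
    PROC_INIT = -127
};

/* Old check: 1 when child_init returns before connecting. */
int old_check_skips(int rank)
{
    return rank <= 0 && rank != PROC_TIMER && rank != PROC_MAIN;
}

/* New check: 1 when child_init returns before connecting. */
int new_check_skips(int rank)
{
    return rank == PROC_INIT || rank == PROC_MAIN || rank == PROC_TCP_MAIN;
}

/* 1 when the two checks disagree for the given rank. */
int check_differs(int rank)
{
    return old_check_skips(rank) != new_check_skips(rank);
}
```
</pre>
                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt">Under
                    these assumptions the checks disagree for PROC_MAIN,
                    which no longer connects, and for special ranks such
                    as PROC_RPC, which the old check skipped and the new
                    one lets through.</p>
                </div>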
                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt">Testing
                    it.</p>
                </div>
              </div>
              <p class="MsoNormal" style="margin-left:35.4pt"> </p>
              <div>
                <div>
                  <p class="MsoNormal" style="margin-left:35.4pt">On
                    Fri, Apr 29, 2022 at 4:18 PM Daniel-Constantin
                    Mierla <<a href="mailto:miconda@gmail.com"
                      target="_blank" rel="noreferrer"
                      moz-do-not-send="true"
                      class="moz-txt-link-freetext">miconda@gmail.com</a>>
                    wrote:</p>
                </div>
                <blockquote style="border:none;border-left:solid #cccccc
                  1.0pt;padding:0cm 0cm 0cm
                  6.0pt;margin-left:4.8pt;margin-right:0cm">
                  <div>
                    <p style="margin-left:35.4pt">No.</p>
                    <p style="margin-left:35.4pt">Connections opened in
                      mod init, or in child init for rank proc main/init,
                      must be closed again there.</p>
                    <p style="margin-left:35.4pt">If a component wants
                      to keep the connection open, it has to be opened in
                      child init for the ranks corresponding to sip workers,
                      rpcs, timers, ...</p>
                    <div>
                      <p class="MsoNormal" style="margin-left:35.4pt">On
                        29.04.22 15:25, Andrew Pogrebennyk wrote:</p>
                    </div>
                    <blockquote
                      style="margin-top:5.0pt;margin-bottom:5.0pt">
                      <div>
                        <p class="MsoNormal" style="margin-left:35.4pt">Hi
                          Daniel, </p>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt">I am not sure if
                            I understood you correctly. Do you mean that
                            child_init should open the connection only
                            when the rank is proc main or proc init?</p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt"> </p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt">For example, in
                            pua module we have </p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt"> </p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt">static int
                            child_init(int rank)<br>
                            {<br>
                                    if (rank==PROC_INIT ||
                            rank==PROC_MAIN || rank==PROC_TCP_MAIN)<br>
                                            return 0; /* do nothing for
                            the main process */<br>
                            <br>
                                    if (pua_dbf.init==0)<br>
                                    {<br>
                                            LM_CRIT("database not
                            bound\n");</p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt"> </p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt">Is that correct?
                            If I have a module which does not connect in
                            child_init for rank PROC_RPC, while the module
                            it originates from (ims_dialog vs dialog) does
                            also establish a connection for the RPC rank,
                            would that be a problem? No, right? :)</p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt"> </p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt">Thanks for the
                            pointer, checking it.</p>
                        </div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt">Andrew</p>
                        </div>
                      </div>
                      <p class="MsoNormal" style="margin-left:35.4pt"> </p>
                      <div>
                        <div>
                          <p class="MsoNormal"
                            style="margin-left:35.4pt">On Fri, Apr 29,
                            2022 at 1:17 PM Daniel-Constantin Mierla
                            <<a href="mailto:miconda@gmail.com"
                              target="_blank" rel="noreferrer"
                              moz-do-not-send="true"
                              class="moz-txt-link-freetext">miconda@gmail.com</a>>
                            wrote:</p>
                        </div>
                        <blockquote style="border:none;border-left:solid
                          #cccccc 1.0pt;padding:0cm 0cm 0cm
                          6.0pt;margin-left:4.8pt;margin-right:0cm">
                          <div>
                            <p style="margin-left:35.4pt">Hello,</p>
                            <p style="margin-left:35.4pt">this sounds
                              like a module does a db operation in mod
                              init, opening the connection, but does not
                              close it afterwards there. It should close it
                              and then re-open it in child init.</p>
                            <p style="margin-left:35.4pt">It can also be
                              in child_init(), when the rank is proc
                              main or proc init. In child init, the db
                              connection has to be left open only for
                              the other ranks.</p>
                            <p style="margin-left:35.4pt">Try to
                              identify which component makes the first
                              operation.</p>
                            <p style="margin-left:35.4pt">Cheers,<br>
                              Daniel</p>
                            <div>
                              <p class="MsoNormal"
                                style="margin-left:35.4pt">On 29.04.22
                                12:39, Andrew Pogrebennyk wrote:</p>
                            </div>
                            <blockquote
                              style="margin-top:5.0pt;margin-bottom:5.0pt">
                              <div>
                                <p class="MsoNormal"
                                  style="margin-left:35.4pt">Dear
                                  community,<br>
                                  I've been looking at some weirdness in
                                  db_redis behavior, where it returns the
                                  responses to queries made by tcp
                                  processes in mixed order.<br>
                                  I tested this on various kamailio 5.3
                                  and 5.4 versions (sipwise spce) and they
                                  show an interesting pattern.<br>
                                  After a restart of kamailio I ran lsof
                                  to enumerate all the sockets open in
                                  the kamailio children.<br>
                                  There is a connection to db port 6379
                                  which is held by multiple processes at
                                  the same time.</p>
                                <blockquote
                                  style="border:none;border-left:solid
                                  #cccccc 1.0pt;padding:0cm 0cm 0cm
                                  6.0pt;margin-left:4.8pt;margin-right:0cm">
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">for i in
                                    $(ps auxww | grep kamailio.proxy |
                                    grep -v grep | awk '{print $2}'); do
                                    echo "print file descriptors of $i"
                                    && sudo lsof -p $i | grep
                                    6379; done > redis_conn.txt</p>
                                </blockquote>
                                <p class="MsoNormal"
                                  style="margin-left:35.4pt">...I see
                                  that lsof lists the tcp client socket to
                                  the redis server with the same source TCP
                                  port and the same inode number in several
                                  processes:<br>
                                  <br>
                                    14199,   "TIMER NH",<br>
                                    14200,  "ctl handler",<br>
                                    14205,  "Dialog Clean Timer",<br>
                                    14206,  "JSONRPCS FIFO",<br>
                                    14210,  "JSONRPCS DATAGRAM",<br>
                                    14213,  "tcp receiver (generic)
                                  child=0",<br>
                                    14214,  "tcp receiver (generic)
                                  child=1",<br>
                                    14215,  "tcp receiver (generic)
                                  child=2",<br>
                                    14220,  "tcp receiver (generic)
                                  child=3",<br>
                                    14224,  "tcp receiver (generic)
                                  child=4",<br>
                                    14225,  "tcp main process" </p>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt"> </p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">The UDP
                                    processes are safe (and some timer
                                    ones too), because in that lsof output
                                    they each have a unique TCP client port.</p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt"> </p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">That's
                                    giving me a lot of headaches because
                                    UA registrations received by any of
                                    the TCP workers (or IPSec ones for
                                    that matter) are randomly failing:
                                    if two processes make the same query
                                    to the DB in parallel, both appear on
                                    the wire with the same TCP source port
                                    and the replies can be mixed up.</p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt"> </p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">This could
                                    be a bug in the usage of hiredis,
                                    impacting all users of the db_redis
                                    module. Is there any relation to the
                                    way kamailio handles its TCP
                                    workers, where maybe the tcp workers
                                    are forked from the main attendant
                                    process after it has opened the DB
                                    connection?</p>
                                </div>
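                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">The
                                    suspected mechanism can be reproduced
                                    outside kamailio. The sketch below is
                                    not kamailio or hiredis code: it uses
                                    a local socketpair in place of the
                                    redis connection, and
                                    shared_socket_bytes() is a
                                    hypothetical name. It only shows that
                                    a socket opened before fork() is one
                                    and the same connection in parent and
                                    child.</p>
                                </div>
                                <pre>
```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Opens a connection, forks, and lets parent and child both write on
 * the inherited descriptor. Returns the number of bytes the peer
 * received on its single stream, or -1 on error. */
int shared_socket_bytes(void)
{
    int sv[2];
    char buf[8];
    int total = 0;
    pid_t pid;

    /* sv[0] plays the client socket, sv[1] the server side. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: writes on the descriptor inherited from the parent. */
        ssize_t w = write(sv[0], "C1", 2);
        _exit(w == 2 ? 0 : 1);
    }

    /* Parent: writes on the very same connection. */
    if (write(sv[0], "P1", 2) != 2)
        total = -1;
    waitpid(pid, NULL, 0);

    /* The peer sees one byte stream and cannot tell the writers apart. */
    while (total >= 0 && total < 4) {
        ssize_t n = read(sv[1], buf, sizeof(buf));
        if (n <= 0) {
            total = -1;
            break;
        }
        total += (int)n;
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```
</pre>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">Both
                                    writes arrive on the same stream, so
                                    replies sent back on it could be read
                                    by either process.</p>
                                </div>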
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt"> </p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">P.S. Why
                                    I have the above hypothesis: when I
                                    log redis queries with redis-cli
                                    monitor at startup of kamailio, I
                                    see that srem_key_lua is executed
                                    against redis only once from that
                                    source port, but then this connection
                                    is shared across multiple processes.</p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt"> </p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">Regards,</p>
                                </div>
                                <div>
                                  <p class="MsoNormal"
                                    style="margin-left:35.4pt">Andrew</p>
                                </div>
                              </div>
                              <p class="MsoNormal"
                                style="margin-left:35.4pt"> </p>
                              <pre style="margin-left:35.4pt">_______________________________________________</pre>
                              <pre style="margin-left:35.4pt">Kamailio (SER) - Development Mailing List</pre>
                              <pre style="margin-left:35.4pt"><a href="mailto:sr-dev@lists.kamailio.org" target="_blank" rel="noreferrer" moz-do-not-send="true" class="moz-txt-link-freetext">sr-dev@lists.kamailio.org</a></pre>
                              <pre style="margin-left:35.4pt"><a href="https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev" target="_blank" rel="noreferrer" moz-do-not-send="true" class="moz-txt-link-freetext">https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev</a></pre>
                            </blockquote>
                            <pre style="margin-left:35.4pt">-- </pre>
                            <pre style="margin-left:35.4pt">Daniel-Constantin Mierla -- <a href="http://www.asipto.com" target="_blank" rel="noreferrer" moz-do-not-send="true">www.asipto.com</a></pre>
                            <pre style="margin-left:35.4pt"><a href="http://www.twitter.com/miconda" target="_blank" rel="noreferrer" moz-do-not-send="true">www.twitter.com/miconda</a> -- <a href="http://www.linkedin.com/in/miconda" target="_blank" rel="noreferrer" moz-do-not-send="true">www.linkedin.com/in/miconda</a></pre>
                            <pre style="margin-left:35.4pt">Kamailio Advanced Training - Online</pre>
                            <pre style="margin-left:35.4pt">  * <a href="https://www.asipto.com/sw/kamailio-advanced-training-online/" target="_blank" rel="noreferrer" moz-do-not-send="true" class="moz-txt-link-freetext">https://www.asipto.com/sw/kamailio-advanced-training-online/</a></pre>
                          </div>
                        </blockquote>
                      </div>
                    </blockquote>
                  </div>
                </blockquote>
              </div>
            </div>
          </div>
        </blockquote>
      </div>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
Daniel-Constantin Mierla -- <a class="moz-txt-link-abbreviated" href="http://www.asipto.com">www.asipto.com</a>
<a class="moz-txt-link-abbreviated" href="http://www.twitter.com/miconda">www.twitter.com/miconda</a> -- <a class="moz-txt-link-abbreviated" href="http://www.linkedin.com/in/miconda">www.linkedin.com/in/miconda</a>
Kamailio Advanced Training - Online
  * <a class="moz-txt-link-freetext" href="https://www.asipto.com/sw/kamailio-advanced-training-online/">https://www.asipto.com/sw/kamailio-advanced-training-online/</a></pre>
  </body>
</html>