[sr-dev] git:master: modules_k/pua: Fixed race hazard on pua table

Anca Vamanu anca.vamanu at 1and1.ro
Tue Mar 13 11:00:01 CET 2012


Hi Peter,

Thanks for explaining, I understand now.
Just one more curiosity if you allow me :) . Suppose the case when 
Notify comes and doesn't find any dialog. What if no reply is sent in 
this case? It will be retransmitted and even the first retransmission 
has big chances of coming after the 200OK and finding the dialog in 
database. Have you thought at this solution?

Regards,
Anca


On 03/13/2012 11:31 AM, Peter Dunkley wrote:
> Hi,
>
> This is what the PUA module used to do.  Unfortunately, there is a 
> race hazard (which happens very frequently in certain scenarios) where 
> the NOTIFY that follows a SUBSCRIBE can be received and processed 
> faster than the 200 OK to the SUBSCRIBE.  The temporary dialog stuff 
> is in there so that this NOTIFY can be matched and processed.
>
> So, while putting the PUA behaviour back to this will fix the latest 
> race hazard I found it will re-introduce a much more likely one.
>
> Regards,
>
> Peter
>
> On Tue, 2012-03-13 at 10:55 +0200, Anca Vamanu wrote:
>
>> Hi Peter,
>>
>>
>> Wouldn't it had been easier if you waited until 200OK to do the insert?
>> Kamailio is does not store transactions persistently, so there is no
>> point to store the subscribe dialog information until it is not
>> established. The reason is that if kamailio is restarted in the mean
>> time before receiving the 200OK, it won't match the 200OK to the
>> transaction and won't complete the dialog anyways.
>> This is actually a solution I adopted in the b2bua modules which had the
>> same race problems.
>>
>> Regards,
>> Anca
>>
>>
>>
>> On 03/13/2012 12:01 AM, Peter Dunkley wrote:
>> >  Module: sip-router
>> >  Branch: master
>> >  Commit: 287ee15ffa985cb6d07f192f1d1cbfadb31c0fd8
>> >  URL:http://git.sip-router.org/cgi-bin/gitweb.cgi/sip-router/?a=commit;h=287ee15ffa985cb6d07f192f1d1cbfadb31c0fd8
>> >
>> >  Author: Peter Dunkley<peter.dunkley at crocodile-rcs.com  <mailto:peter.dunkley at crocodile-rcs.com>>
>> >  Committer: Peter Dunkley<peter.dunkley at crocodile-rcs.com  <mailto:peter.dunkley at crocodile-rcs.com>>
>> >  Date:   Mon Mar 12 21:52:39 2012 +0000
>> >
>> >  modules_k/pua: Fixed race hazard on pua table
>> >
>> >  - During testing a race hazard where the 2XX to a back-end SUBSCRIBE can be
>> >     received and processed (and the DB UPDATE to convert a temporary dialog to a
>> >     full dialog) before the DB INSERT to create a temporary dialog is run.  There
>> >     is an incredibly small window for this, but it was happening consistently on
>> >     one system.
>> >  - The easiest way to fix this is to use the replace() DB API to convert the
>> >     dialog and live with the initial INSERT failing (this does not actually
>> >     return an error from the SRDB1 interface so the rest of the code continues
>> >     OK).  Unfortunately, the replace() API is not available for some databases
>> >     (for example, PostgreSQL).
>> >  - I have updated the code to use replace() when it is available and to do an
>> >     update() then check affected_rows() (and if 0 then do an insert()) when
>> >     replace() is not available.
>> >  - The update() and then insert() process makes the window for the race much
>> >     smaller, but doesn't get rid of it completely.  However, with PostgreSQL a
>> >     DB rule can be used to fix it completely:
>> >  - PostgreSQL DB rule:
>> >  CREATE RULE "pua_insert_race1" AS ON INSERT TO "pua"
>> >    WHERE EXISTS(
>> >     SELECT 1 FROM pua WHERE call_id=NEW.call_id AND from_tag=NEW.from_tag
>> >           AND pres_id=NEW.pres_id AND to_tag=''
>> >    ) DO INSTEAD (
>> >     UPDATE pua
>> >       SET expires=NEW.expires,
>> >           desired_expires=NEW.desired_expires,
>> >           flag=NEW.flag,
>> >           etag=NEW.etag,
>> >           tuple_id=NEW.tuple_id,
>> >           to_tag=NEW.to_tag,
>> >           cseq=NEW.cseq,
>> >           record_route=NEW.record_route,
>> >           contact=NEW.contact,
>> >           remote_contact=NEW.remote_contact,
>> >           version=NEW.version,
>> >           extra_headers=NEW.extra_headers
>> >       WHERE call_id=NEW.call_id AND from_tag=NEW.from_tag
>> >            AND pres_id=NEW.pres_id
>> >  );
>> >  - You can also add another PostgreSQL rule to make the failing INSERT
>> >     (described above) do so quietly.  This does not affect the function of the
>> >     code, but it will make the logs quieter (which is nice):
>> >  CREATE RULE "pua_insert_race2" AS ON INSERT TO "pua"
>> >    WHERE EXISTS(
>> >     SELECT 1 FROM pua WHERE call_id=NEW.call_id AND from_tag=NEW.from_tag
>> >           AND to_tag<>''
>> >    ) DO INSTEAD NOTHING;
>> >
>> >  ---
>> >
>> >    modules_k/pua/pua_db.c         |  187 +++++++++++++++++++++++++---------------
>> >    modules_k/pua/send_subscribe.c |   54 ------------
>> >    2 files changed, 118 insertions(+), 123 deletions(-)
>> >
>> >  diff --git a/modules_k/pua/pua_db.c b/modules_k/pua/pua_db.c
>> >  index b83e3ed..b652cee 100644
>> >  --- a/modules_k/pua/pua_db.c
>> >  +++ b/modules_k/pua/pua_db.c
>> >  @@ -883,9 +883,9 @@ int get_record_id_puadb(ua_pres_t *pres, str **rec_id )
>> >    /******************************************************************************/
>> >    int convert_temporary_dialog_puadb(ua_pres_t *pres)
>> >    {
>> >  -       db_key_t query_cols[4], data_cols[10];
>> >  -       db_val_t query_vals[4], data_vals[10];
>> >  -       int n_query_cols = 0, n_data_cols = 0;
>> >  +       db_key_t query_cols[18];
>> >  +       db_val_t query_vals[18];
>> >  +       int n_query_cols = 0;
>> >
>> >           if (pres==NULL)
>> >           {
>> >  @@ -920,82 +920,131 @@ int convert_temporary_dialog_puadb(ua_pres_t *pres)
>> >           n_query_cols++;
>> >
>> >           /* The columns I need to fill in to convert a temporary dialog to a dialog */
>> >  -       data_cols[n_data_cols] =&str_expires_col;
>> >  -       data_vals[n_data_cols].type = DB1_INT;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.int_val = pres->expires;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_desired_expires_col;
>> >  -       data_vals[n_data_cols].type = DB1_INT;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.int_val = pres->desired_expires;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_flag_col;
>> >  -       data_vals[n_data_cols].type = DB1_INT;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.int_val = pres->flag;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_to_tag_col;
>> >  -       data_vals[n_data_cols].type = DB1_STR;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.str_val = pres->to_tag;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_cseq_col;
>> >  -       data_vals[n_data_cols].type = DB1_INT;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.int_val = pres->cseq;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_record_route_col;
>> >  -       data_vals[n_data_cols].type = DB1_STR;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.str_val = pres->record_route;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_contact_col;
>> >  -       data_vals[n_data_cols].type = DB1_STR;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.str_val = pres->contact;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_remote_contact_col;
>> >  -       data_vals[n_data_cols].type = DB1_STR;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.str_val = pres->remote_contact;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_version_col;
>> >  -       data_vals[n_data_cols].type = DB1_INT;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  -       data_vals[n_data_cols].val.int_val = pres->version;
>> >  -       n_data_cols++;
>> >  -
>> >  -       data_cols[n_data_cols] =&str_extra_headers_col;
>> >  -       data_vals[n_data_cols].type = DB1_STR;
>> >  -       data_vals[n_data_cols].nul = 0;
>> >  +       query_cols[n_query_cols] =&str_expires_col;
>> >  +       query_vals[n_query_cols].type = DB1_INT;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.int_val = pres->expires;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_desired_expires_col;
>> >  +       query_vals[n_query_cols].type = DB1_INT;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.int_val = pres->desired_expires;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_flag_col;
>> >  +       query_vals[n_query_cols].type = DB1_INT;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.int_val = pres->flag;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_to_tag_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.str_val = pres->to_tag;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_cseq_col;
>> >  +       query_vals[n_query_cols].type = DB1_INT;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.int_val = pres->cseq;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_record_route_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.str_val = pres->record_route;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_contact_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.str_val = pres->contact;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_remote_contact_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.str_val = pres->remote_contact;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_version_col;
>> >  +       query_vals[n_query_cols].type = DB1_INT;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.int_val = pres->version;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_extra_headers_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >           if (pres->extra_headers)
>> >           {
>> >  -               data_vals[n_data_cols].val.str_val.s = pres->extra_headers->s;
>> >  -               data_vals[n_data_cols].val.str_val.len = pres->extra_headers->len;
>> >  +               query_vals[n_query_cols].val.str_val.s = pres->extra_headers->s;
>> >  +               query_vals[n_query_cols].val.str_val.len = pres->extra_headers->len;
>> >           }
>> >           else
>> >           {
>> >  -               data_vals[n_data_cols].val.str_val.s = "";
>> >  -               data_vals[n_data_cols].val.str_val.len = 0;
>> >  +               query_vals[n_query_cols].val.str_val.s = "";
>> >  +               query_vals[n_query_cols].val.str_val.len = 0;
>> >           }
>> >  -       n_data_cols++;
>> >  +       n_query_cols++;
>> >
>> >  -       if (pua_dbf.update(pua_db, query_cols, 0, query_vals,
>> >  -                       data_cols, data_vals, n_query_cols, n_data_cols)<   0)
>> >  +       query_cols[n_query_cols] =&str_event_col;
>> >  +       query_vals[n_query_cols].type = DB1_INT;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.int_val = pres->event;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_watcher_uri_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.str_val.s = pres->watcher_uri->s;
>> >  +       query_vals[n_query_cols].val.str_val.len = pres->watcher_uri->len;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_etag_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.str_val.s = 0;
>> >  +       query_vals[n_query_cols].val.str_val.len = 0;
>> >  +       n_query_cols++;
>> >  +
>> >  +       query_cols[n_query_cols] =&str_tuple_id_col;
>> >  +       query_vals[n_query_cols].type = DB1_STR;
>> >  +       query_vals[n_query_cols].nul = 0;
>> >  +       query_vals[n_query_cols].val.str_val.s = 0;
>> >  +       query_vals[n_query_cols].val.str_val.len = 0;
>> >  +       n_query_cols++;
>> >  +
>> >  +       if (pua_dbf.replace != NULL)
>> >           {
>> >  -               LM_ERR("Failed update db\n");
>> >  -               return -1;
>> >  +               if (pua_dbf.replace(pua_db, query_cols, query_vals, n_query_cols)<   0)
>> >  +               {
>> >  +                       LM_ERR("Failed replace db\n");
>> >  +                       return -1;
>> >  +               }
>> >  +       }
>> >  +       else
>> >  +       {
>> >  +               if (pua_dbf.update(pua_db, query_cols, 0, query_vals,
>> >  +                               query_cols + 4, query_vals + 4, 4, n_query_cols - 4)<   0)
>> >  +               {
>> >  +                       LM_ERR("Failed update db\n");
>> >  +                       return -1;
>> >  +               }
>> >  +
>> >  +               LM_DBG("affected_rows: %d\n", pua_dbf.affected_rows(pua_db));
>> >  +               if (pua_dbf.affected_rows(pua_db) == 0)
>> >  +               {
>> >  +                       if (pua_dbf.insert(pua_db, query_cols, query_vals, n_query_cols)<   0)
>> >  +                       {
>> >  +                               LM_ERR("Failed insert db\n");
>> >  +                               return -1;
>> >  +                       }
>> >  +               }
>> >           }
>> >
>> >  +
>> >           shm_free(pres->remote_contact.s);
>> >           shm_free(pres);
>> >
>> >  diff --git a/modules_k/pua/send_subscribe.c b/modules_k/pua/send_subscribe.c
>> >  index 15a1a02..b135ac3 100644
>> >  --- a/modules_k/pua/send_subscribe.c
>> >  +++ b/modules_k/pua/send_subscribe.c
>> >  @@ -1050,18 +1050,9 @@ insert:
>> >
>> >                   if(subs->flag&   UPDATE_TYPE)
>> >                   {
>> >  -                       /*
>> >  -                       LM_ERR("request for a subscription"
>> >  -                                       " with update type and no record found\n");
>> >  -                       ret= -1;
>> >  -                       goto done;
>> >  -                       commented this because of the strange type parameter in usrloc callback functions
>> >  -                       */
>> >  -
>> >                           LM_DBG("request for a subscription with update type"
>> >                                           " and no record found\n");
>> >                           subs->flag= INSERT_TYPE;
>> >  -
>> >                   }
>> >                   hentity= subscribe_cbparam(subs, REQ_OTHER);
>> >                   if(hentity== NULL)
>> >  @@ -1168,50 +1159,6 @@ insert:
>> >           }
>> >           else
>> >           {
>> >  -        /*
>> >  -               if(presentity->desired_expires== 0)
>> >  -               {
>> >  -
>> >  -                       if(subs->expires<   0)
>> >  -                       {
>> >  -                           LM_DBG("Found previous request for unlimited subscribe-"
>> >  -                                               " do not send subscribe\n");
>> >  -
>> >  -                               if (subs->event&   PWINFO_EVENT)
>> >  -                               {
>> >  -                                       presentity->watcher_count++;
>> >  -                               }
>> >  -                               lock_release(&HashT->p_records[hash_code].lock);
>> >  -                           goto done;
>> >  -
>> >  -                       }
>> >  -
>> >  -
>> >  -                       if(subs->event&   PWINFO_EVENT)
>> >  -                       {
>> >  -                               if(subs->expires== 0)
>> >  -                               {
>> >  -                                       presentity->watcher_count--;
>> >  -                                       if(     presentity->watcher_count>   0)
>> >  -                                       {
>> >  -                                               lock_release(&HashT->p_records[hash_code].lock);
>> >  -                                               goto done;
>> >  -                                       }
>> >  -                               }
>> >  -                               else
>> >  -                               {
>> >  -                                       presentity->watcher_count++;
>> >  -                                       if(presentity->watcher_count>   1)
>> >  -                                       {
>> >  -                                               lock_release(&HashT->p_records[hash_code].lock);
>> >  -                                               goto done;
>> >  -                                       }
>> >  -                               }
>> >  -                       }
>> >  -
>> >  -               }
>> >  -        */
>> >  -
>> >                   if (subs->internal_update_flag == INTERNAL_UPDATE_TRUE)
>> >                   {
>> >                           LM_INFO("attempting to re-SUBSCRIBE on internal (rls_update_subs()) update - skipping\n");
>> >  @@ -1265,7 +1212,6 @@ insert:
>> >                   if (dbmode!=PUA_DB_ONLY)
>> >                           lock_release(&HashT->p_records[hash_code].lock);
>> >
>> >  -       //      hentity->flag= flag;
>> >                   LM_DBG("event parameter: %d\n", hentity->event);
>> >
>> >                   set_uac_req(&uac_r,&met, str_hdr, 0, td, TMCB_LOCAL_COMPLETED,
>> >
>> >
>> >  _______________________________________________
>> >  sr-dev mailing list
>> >  sr-dev at lists.sip-router.org  <mailto:sr-dev at lists.sip-router.org>
>> >  http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>> >
>>
>
> -- 
> Peter Dunkley
> Technical Director
> Crocodile RCS Ltd
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.sip-router.org/pipermail/sr-dev/attachments/20120313/95d0c571/attachment-0001.htm>


More information about the sr-dev mailing list