[sr-dev] Crash in s-cscf registrar module

Hugh Waite hugh.waite at crocodile-rcs.com
Wed Dec 18 19:08:37 CET 2013


Hi Carsten,

The patch that Carlos posted works fine - please commit.
I think a similar fix is needed in ims_icscf/cxdx_lir.c. I patched it as 
well, but haven't been able to reproduce the timeouts that I was having 
before.

I haven't investigated the suggested change to the AVP name type.

Regards,
Hugh


On 18/12/2013 17:10, Carsten Bock wrote:
> Hi Hugh,
>
> did the patch work for you? If yes, we would commit the changes to GIT.
>
> Thanks for testing,
> Carsten
>
> 2013/12/17 Carlos Ruiz Díaz <carlos.ruizdiaz at gmail.com>:
>> Hi Hugh,
>>
>> that is indeed the problem, and I noticed this pattern in similar modules.
>>
>> I remember having this issue with the ims_charging module which I patched
>> for enhancements and that works in a similar fashion regarding the
>> suspension and resuming of transactions.
>>
>> For error conditions, we might have the return code unset, causing the crash
>> you described. To avoid that, I imported the transaction AVP list right
>> after the transaction lookup and before checking for any kind of errors.
>> Also, I always set the AVP to a known value whether there was an error or
>> not. Attached is a patch (which I haven't tested) that will hopefully
>> cover all possible scenarios. Could you please try it?
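To make the ordering described above concrete, here is a minimal, self-contained sketch. It is not the attached patch and does not use Kamailio's API: toy_avp, toy_txn, toy_set_avp, toy_get_avp and toy_callback are hypothetical stand-ins, and the success value 1 is an assumption (only the -2 appears in the logs in this thread). It shows the two points only: re-point the current AVP list at the one saved with the suspended transaction before any error check, and write the return code on every path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not Kamailio's real types. */
typedef struct toy_avp {
    const char *name;
    int value;
    struct toy_avp *next;
} toy_avp;

typedef struct toy_txn {
    toy_avp *avps;  /* AVP list saved when the transaction was suspended */
} toy_txn;

/* The worker's "current" AVP list head; stale until a callback re-points it. */
static toy_avp **current_avps = NULL;

/* Overwrite the named AVP in the current list; 0 on success, -1 if absent. */
int toy_set_avp(const char *name, int value)
{
    assert(current_avps != NULL);
    for (toy_avp *a = *current_avps; a != NULL; a = a->next) {
        if (strcmp(a->name, name) == 0) {
            a->value = value;
            return 0;
        }
    }
    return -1;
}

/* Read the named AVP from the current list; 0 on success, -1 if absent. */
int toy_get_avp(const char *name, int *out)
{
    assert(current_avps != NULL);
    for (toy_avp *a = *current_avps; a != NULL; a = a->next) {
        if (strcmp(a->name, name) == 0) {
            *out = a->value;
            return 0;
        }
    }
    return -1;
}

/* Callback sketch: restore the list first, then decide the result,
 * and set the return code whether or not there was an error. */
void toy_callback(toy_txn *t, int is_timeout, int diameter_ok)
{
    current_avps = &t->avps;        /* import the list before any error check */

    int rc = 1;                     /* assumed "success" value */
    if (is_timeout || !diameter_ok)
        rc = -2;                    /* known value on the error path as well */
    toy_set_avp("saa_return_code", rc);
}
```

In this toy model, the route that runs after the callback can always read saa_return_code, because the list was restored before the first possible jump to the error path.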
>>
>> Regarding the null string when printing the AVP from configuration file, I
>> think this is an interpolation problem due to the fact that the AVP is
>> actually numeric and not alphanumeric. If you check the function that
>> creates the AVP, you will notice AVP_NAME_STR as the type. I also changed
>> this in the ims_charging module and here's the function that I used instead
>> [1].
>>
>> [1] https://gist.github.com/caruizdiaz/8008968
>>
>> Regards,
>>
>>
>> On Tue, Dec 17, 2013 at 2:12 PM, Hugh Waite <hugh.waite at crocodile-rcs.com>
>> wrote:
>>> Hi Carlos, Carsten,
>>> From a bit of code inspection, it looks like this affects the error paths
>>> for the diameter responses.
>>> I've seen these warnings printed from both the s-cscf, and the i-cscf when
>>> there were diameter timeouts (although it didn't cause a crash every time).
>>>
>>> Dec 15 12:13:23 kamailio kam-scscf[22542]: ERROR: <script>: We need to do
>>> an UNREG server SAR assignemnt
>>> Dec 15 12:13:23 kamailio kam-scscf[22542]: INFO: ims_registrar_scscf
>>> [cxdx_sar.c:79]: create_return_code(): created AVP successfully :
>>> [saa_return_code] - [-2]
>>> Dec 15 12:13:23 kamailio kam-scscf[22553]: INFO: ims_registrar_scscf
>>> [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_experimental_result_code: Failed
>>> finding avp
>>> Dec 15 12:13:23 kamailio kam-scscf[22553]: INFO: ims_registrar_scscf
>>> [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_charging_info: Failed finding avp
>>> Dec 15 12:13:23 kamailio kam-scscf[22553]: ERROR: <script>: Unknown return
>>> code from SAR, value is [<null>]
>>> ...
>>> Dec 16 17:53:51 kamailio kam-icscf[23653]: INFO: ims_icscf
>>> [cxdx_uar.c:71]: create_uaa_return_code(): created AVP successfully :
>>> [uaa_return_code]
>>> Dec 16 17:53:57 kamailio kam-icscf[23666]: ERROR: ims_icscf
>>> [cxdx_uar.c:107]: async_cdp_uar_callback(): Error timeout when  sending
>>> message via CDP
>>> Dec 16 17:53:57 kamailio kam-icscf[23666]: ERROR: <script>: Unknown return
>>> code from UAR, value is [<null>]
>>>
>>> I think there are two issues:
>>> 1) The return_code avp does not work causing a NULL value or crash. I
>>> experimented by restoring the avp lists from the suspended transaction in
>>> the 'error:' section and this seems to work (attached patch) - I can now see
>>> the "-2" return code that was set up before the suspend. I'll leave it to
>>> you or others to decide if the error handling is being done properly in this
>>> function and if my patch is useful.
>>>
>>> Dec 17 16:41:07 kamailio kam-scscf[25089]: ERROR: <script>: We need to do
>>> an UNREG server SAR assignemnt
>>> Dec 17 16:41:07 kamailio kam-scscf[25089]: INFO: ims_registrar_scscf
>>> [cxdx_sar.c:79]: create_return_code(): created AVP successfully :
>>> [saa_return_code] - [-2]
>>> Dec 17 16:41:07 kamailio kam-scscf[25099]: INFO: ims_registrar_scscf
>>> [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_experimental_result_code: Failed
>>> finding avp
>>> Dec 17 16:41:07 kamailio kam-scscf[25099]: INFO: ims_registrar_scscf
>>> [cxdx_avp.c:138]: cxdx_get_avp(): cxdx_get_charging_info: Failed finding avp
>>> Dec 17 16:41:07 kamailio kam-scscf[25099]: ERROR: <script>: SAR error -
>>> error response sent from module
>>>
>>> 2) In these error cases, the original transaction is not responded to.
>>> This leaves hanging calls and other requests. Perhaps the example cfgs could
>>> be updated with default replies in the appropriate places.
>>>
>>> Let me know if there are patches you want me to try.
>>>
>>> Hugh
>>>
>>>
>>> On 15/12/2013 21:17, Hugh Waite wrote:
>>>
>>> Hello,
>>> I am seeing a crash within the latest ims modules using the example cfg
>>> scripts. It also happened in 4.1.
>>>
>>> 1) The s-cscf receives a request from an application server and runs
>>> 'assign_server_unreg' (cfg line 368) because the intended destination is not
>>> registered.
>>> 2) The HSS returns an error '5012: Unable to comply' and the suspended
>>> transaction is resumed into the UNREG_SAR_REPLY route (cxdx_sar.c:290)
>>> 3) The coredump shows that the AVP lists are nonsensical, so the action to
>>> get $avp(s:saa_return_code) causes a crash.
>>>
>>> Do the avp lists need to be re-initialised from the suspended transaction,
>>> like in the 'success/done' section (cxdx_sar.c:252)?
>>> Maybe someone who is more familiar with this code can shed some light on
>>> this?
>>>
>>> Also in this scenario I can't see a code path that will send a response
>>> back to the application server e.g. '480 Temporarily Unavailable' - Should
>>> this be done in the cfg before calling assign_server_unreg?
>>>
>>> Regards,
>>> Hugh
>>>
>>> Backtrace:
>>>
>>> (gdb) bt
>>> #0  0x000000000053dc89 in match_by_name (avp=0x303630363a6d6f63, id=116,
>>> name=0x7ffff29895f8) at usr_avp.c:391
>>> #1  0x000000000053e411 in search_next_avp (s=0x7ffff29895f0,
>>> val=0x7ffff2989630) at usr_avp.c:507
>>> #2  0x000000000053e120 in search_avp (ident=..., val=0x7ffff2989630,
>>> state=0x7ffff29895f0) at usr_avp.c:475
>>> #3  0x000000000053de09 in search_first_avp (flags=1, name=...,
>>> val=0x7ffff2989630, s=0x7ffff29895f0) at usr_avp.c:427
>>> #4  0x00007fa8de2f5626 in pv_get_avp (msg=0x7ffff298a030,
>>> param=0x7fa8de86b898, res=0x7ffff2989760) at pv_core.c:1475
>>> #5  0x0000000000499270 in pv_get_spec_value (msg=0x7ffff298a030,
>>> sp=0x7fa8de86b880, value=0x7ffff2989760) at pvapi.c:1266
>>> #6  0x00000000004c5f03 in rval_get_int (h=0x7ffff2989ef0,
>>> msg=0x7ffff298a030, i=0x7ffff2989d58, rv=0x7fa8de86b878, cache=0x0) at
>>> rvalue.c:978
>>> #7  0x00000000004c89f5 in rval_expr_eval_int (h=0x7ffff2989ef0,
>>> msg=0x7ffff298a030, res=0x7ffff2989d58, rve=0x7fa8de86b870) at rvalue.c:1918
>>> #8  0x0000000000420648 in do_action (h=0x7ffff2989ef0, a=0x7fa8de86eaa8,
>>> msg=0x7ffff298a030) at action.c:1219
>>> #9  0x0000000000422878 in run_actions (h=0x7ffff2989ef0, a=0x7fa8de86aa30,
>>> msg=0x7ffff298a030) at action.c:1599
>>> #10 0x0000000000423017 in run_top_route (a=0x7fa8de86aa30,
>>> msg=0x7ffff298a030, c=0x0) at action.c:1685
>>> #11 0x00007fa8de59eae3 in t_continue (hash_index=15710, label=170389234,
>>> route=0x7fa8de86aa30) at t_suspend.c:245
>>> #12 0x00007fa8da1ebc98 in async_cdp_callback (is_timeout=0,
>>> param=0x7fa8d5c68f40, saa=0x0, elapsed_msecs=1) at cxdx_sar.c:290
>>> #13 0x00007fa8db23cacb in api_callback (p=0x7fa8d5c24d40,
>>> msg=0x7fa8d5c5aca8, ptr=0x0) at api_process.c:115
>>> #14 0x00007fa8db27ad87 in worker_process (id=2) at worker.c:330
>>> #15 0x00007fa8db257aea in diameter_peer_start (blocking=0) at
>>> diameter_peer.c:309
>>> #16 0x00007fa8db25a02b in cdp_child_init (rank=0) at mod.c:237
>>> #17 0x00000000004f7ec2 in init_mod_child (m=0x7fa8de841158, rank=0) at
>>> sr_module.c:924
>>> #18 0x00000000004f7d65 in init_mod_child (m=0x7fa8de841d00, rank=0) at
>>> sr_module.c:921
>>> #19 0x00000000004f7d65 in init_mod_child (m=0x7fa8de8420a8, rank=0) at
>>> sr_module.c:921
>>> #20 0x00000000004f7d65 in init_mod_child (m=0x7fa8de842458, rank=0) at
>>> sr_module.c:921
>>> #21 0x00000000004f7d65 in init_mod_child (m=0x7fa8de842ae8, rank=0) at
>>> sr_module.c:921
>>> #22 0x00000000004f7d65 in init_mod_child (m=0x7fa8de842f60, rank=0) at
>>> sr_module.c:921
>>> #23 0x00000000004f8048 in init_child (rank=0) at sr_module.c:948
>>> #24 0x000000000046d57c in main_loop () at main.c:1694
>>> #25 0x000000000047030b in main (argc=13, argv=0x7ffff298af78) at
>>> main.c:2533
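A side note on frame #0: the avp pointer passed to match_by_name, 0x303630363a6d6f63, consists entirely of printable ASCII bytes. The small helper below (word_to_ascii_le is a hypothetical name, and little-endian byte order is assumed, as on x86-64) unpacks such a 64-bit value into its bytes in memory order so it can be read as text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpack a 64-bit word into its 8 bytes in little-endian (memory) order
 * and NUL-terminate, so a suspicious pointer value can be read as text. */
void word_to_ascii_le(uint64_t word, char out[9])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (char)((word >> (8 * i)) & 0xff);  /* lowest byte first */
    out[8] = '\0';
}
```

Called with 0x303630363a6d6f63, it yields the string "com:6060", which looks like the tail of a host:port. That suggests the AVP list head was pointing at string data rather than at a real AVP, consistent with the lists not having been restored from the suspended transaction.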
>>>
>>>
>>>
>>> --
>>> Hugh Waite
>>> Principal Design Engineer
>>> Crocodile RCS Ltd.
>>>
>>>
>>>
>>> _______________________________________________
>>> sr-dev mailing list
>>> sr-dev at lists.sip-router.org
>>> http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>
>>
>>
>> --
>> Carlos
>> http://caruizdiaz.com
>> +595981146623
>>
>>
>
>


-- 
Hugh Waite
Principal Design Engineer
Crocodile RCS Ltd.



