[SR-Users] [sr-dev] kamailio core at qm_status ( patch required important)

Daniel-Constantin Mierla miconda at gmail.com
Fri Sep 21 09:46:20 CEST 2012


Hello,

yes, I'll apply it. I was mostly offline these days and didn't have a
proper chance to look at it.

Thanks for troubleshooting and patching,
Daniel

On 9/20/12 2:36 PM, Jijo wrote:
> Hi Daniel,
>
> This patch needs to be applied to avoid the core dump.
>
> Thanks
> Jijo
>
> On Wed, Sep 19, 2012 at 10:54 AM, Jijo <realjijo at gmail.com> wrote:
>
>
>     Hi All,
>
>     I finally found the issue.
>
>     Here is one of the bad traces, for a SUBSCRIBE (722 bytes) and a
>     NOTIFY (1282 bytes), which corrupted the memory. The messages came
>     in the order NOTIFY, then SUBSCRIBE. The core is generated in a
>     different place, but I believe this could be the reason for the
>     memory corruption.
>
>     In the trace, UDP process 27294 is processing the NOTIFY and
>     process 27303 is processing the SUBSCRIBE.
>
>     The explanation and implementation are below.
>
>     2012-09-19T02:06:17+01:00 [info] sipserver: [27303] INFO:
>     <script>: CI=1-3292 at 10.233.20.152
>     -R39 - Entry R-URI=1234 FD=10.233.20.152 SI=10.233.20.152
>
>     2012-09-19T02:06:17+01:00 [info] sipserver: [27294] INFO:
>     <script>: CI=1-3292 at 10.233.20.152
>     -R1 - Force LAN socket: tcp:10.233.20.151:5060 <null>
>
>     2012-09-19T02:06:17+01:00 [info] sipserver: [27303] INFO:
>     <script>: CI=1-3292 at 10.233.20.152
>     -R1 - Force LAN socket: tcp:10.233.20.151:5060 <null>
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
>     [tcp_main.c:2357]: tcp_conn_send_put : calling wbufq_add
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
>     [tcp_main.c:730]: ERROR: wbufq_add(722 bytes): buf:SUBSCRIBE
>     sip:1234 at 10.233.20.141:5063;transport=tcp SIP/2.0
>
>     Record-Route: <sip:10.233.20.151;tran
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
>     [tcp_main.c:747]: ERROR: wbufq_add(722 bytes): first:b00519f4
>     last:b00519f4
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27303] ERROR: <core>
>     [tcp_main.c:774]: ERROR: wbufq_add(2 last free crt_size:722):
>     first:b00519f4 last:b00519f4
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core>
>     [tcp_main.c:796]: ERROR: wbufq_insert(1282 bytes): buf:NOTIFY
>     sip:1234 at 10.233.20.141:5063;transport=tcp SIP/2.0
>
>     Record-Route: <sip:10.233.20.151;transpo
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core>
>     [tcp_main.c:801]: ERROR: wbufq_insert(2 last free ):
>     first:b00519f4 last:b00519f4
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27294] ERROR: <core>
>     [tcp_main.c:820]: ERROR: wbufq_insert(22 last free ):
>     first:b00519f4 last:b00519f4
>
>     2012-09-19T02:06:17+01:00 [err] sipserver: [27359] ERROR: <core>
>     [tcp_main.c:887]: ERROR: wbufq_run(3 last free ): first:b00519f4
>     last:b00519f4
>
>     2012-09-19T02:06:17+01:00 [crit] sipserver: [27359] : <core>
>     [mem/q_malloc.c:157]: BUG: qm_*: fragm. 0xb00519dc (address
>     0xb00519f4) end overwritten(0, 0)!
>
>     2012-09-19T02:06:18+01:00 [alert] sipserver: [27265] ALERT: <core>
>     [main.c:755]: child process 27359 exited by a signal 11
>
>     2012-09-19T02:06:18+01:00 [alert] sipserver: [27265] ALERT: <core>
>     [main.c:758]: core was generated
>
>     2012-09-19T02:06:18+01:00 [info] sipserver: [27265] INFO: <core>
>     [main.c:770]: INFO: terminating due to SIGCHLD
>
>     Process 27294 (NOTIFY) created the TCP connection structure for
>     the destination IP, and just before it called wbufq_insert() a
>     context switch happened and process 27303 (SUBSCRIBE) got the CPU.
>     Since the connection structure was already available, process
>     27303 added the SUBSCRIBE message (722 bytes) to the wbufq.
>     Afterwards process 27294 got the CPU back and invoked
>     wbufq_insert(), which corrupted the memory due to an overflow in
>     the existing implementation.
>
>     inline static int _wbufq_insert(struct tcp_connection* c,
>                                     const char* data, unsigned int size)
>     {
>         struct tcp_wbuffer_queue* q;
>         struct tcp_wbuffer* wb;
>
>         q=&c->wbuf_q;
>         if (likely(q->first==0)) /* if empty, use wbufq_add */
>             return _wbufq_add(c, data, size);
>         :
>         :
>         :
>         :
>         if ((q->first==q->last) && ((q->last->b_size-q->last_used)>=size)){
>             /* one block with enough space in it for size bytes */
>             memmove(q->first->buf+size, q->first->buf, size);
>             memcpy(q->first->buf, data, size);
>             q->last_used+=size;
>         }
>
>     The above condition is true in this case, and the memmove writes
>     past the end of the block, which causes the overflow.
>
>                          memmove(void *dest, const void *src, size_t n);
>
>     As per the memmove man page, copying takes place as though the n
>     bytes in src are first copied to a temporary buffer and then from
>     the temporary buffer to dest.
>
>     dest is q->first->buf+size, which is basically (q->first->buf +
>     NOTIFY MSG SIZE). So dest starts 1282 bytes into the block, and
>     only 818 bytes of space remain after it.
>
>     src is q->first->buf, from which NOTIFY SIZE (1282 bytes) is
>     copied to the temporary buffer.
>
>     Finally the 1282 bytes from the temporary buffer are written to
>     the region that has only 818 bytes left. This corrupts the memory,
>     and on wbufq_run we see the memory corruption:
>
>     2012-09-19T02:06:17+01:00 [crit] sipserver: [27359] : <core>
>     [mem/q_malloc.c:157]: BUG: qm_*: fragm. 0xb00519dc (address
>     0xb00519f4) end overwritten(0, 0)!
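A minimal sketch of the arithmetic described above, not taken from the Kamailio sources. The helper name overrun_bytes is hypothetical, and the 2100-byte block size is an assumption inferred from the figures in the mail (1282 + 818). It only computes how far a memmove(buf + dst_off, buf, n) would run past the end of the block; it does not perform the copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Kamailio function): number of bytes that
 * memmove(buf + dst_off, buf, n) would write past the end of a block
 * whose buffer holds b_size bytes. Returns 0 when the write fits. */
static size_t overrun_bytes(size_t b_size, size_t dst_off, size_t n)
{
    size_t end;

    assert(dst_off <= (size_t)-1 - n); /* dst_off + n must not wrap */
    end = dst_off + n;                 /* one past the last byte written */
    return end > b_size ? end - b_size : 0;
}

/* Hypothetical helper mirroring the quoted condition
 * (q->last->b_size - q->last_used) >= size. */
static int single_block_branch_taken(size_t b_size, size_t last_used,
                                     size_t size)
{
    return (b_size - last_used) >= size;
}
```

With the assumed values (b_size 2100, last_used 722 for the queued SUBSCRIBE, size 1282 for the NOTIFY), single_block_branch_taken() is true because 1378 >= 1282, and overrun_bytes(2100, 1282, 1282) gives 464: the copy ends at offset 2564 in a 2100-byte block.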
>
>     I think we don't need the memmove, so we can change the code as
>     below:
>
>     if ((q->first==q->last) && ((q->last->b_size-q->last_used)>=size)){
>         /* one block with enough space in it for size bytes */
>         //memmove(q->first->buf+size, q->first->buf, size);
>         memcpy(q->first->buf+q->last_used, data, size);
>         q->last_used+=size;
>     }
>
>
>     OR
>     we need only the else part, as it always adds the block at the front.
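A small sketch of what the proposed change does to the byte order inside one block, next to a front-inserting variant for comparison. Everything here is hypothetical: struct demo_block and both functions are stand-ins written for illustration, not Kamailio code, and the second variant (shifting only the bytes already queued) is an assumption about how front insertion could stay within bounds, not the fix that was applied.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for one write-buffer block (not Kamailio's struct). */
struct demo_block {
    char buf[16];
    unsigned int b_size;    /* capacity of buf */
    unsigned int last_used; /* bytes currently queued in buf */
};

/* Variant 1, as in the proposed change: copy the new data after the
 * bytes already queued. Returns 0 on success, -1 if it does not fit. */
static int demo_append(struct demo_block *b, const char *data,
                       unsigned int size)
{
    assert(b->last_used <= b->b_size);
    if (b->b_size - b->last_used < size)
        return -1;
    memcpy(b->buf + b->last_used, data, size);
    b->last_used += size;
    return 0;
}

/* Variant 2 (assumption): keep inserting at the front, but shift only
 * the last_used bytes that are queued, so the shifted data ends at
 * size + last_used, which the check above keeps within b_size. */
static int demo_insert_front(struct demo_block *b, const char *data,
                             unsigned int size)
{
    assert(b->last_used <= b->b_size);
    if (b->b_size - b->last_used < size)
        return -1;
    memmove(b->buf + size, b->buf, b->last_used);
    memcpy(b->buf, data, size);
    b->last_used += size;
    return 0;
}
```

Starting from a block holding "SUB", demo_append() with "NOTIFY" leaves "SUBNOTIFY" in the buffer, while demo_insert_front() leaves "NOTIFYSUB". Both stay inside the block; they differ only in which message ends up first in the queue.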
>
>     Thanks
>     Jijo
>
>     On Wed, Jul 18, 2012 at 12:42 PM, Daniel-Constantin Mierla
>     <miconda at gmail.com> wrote:
>
>         Hello,
>
>         sorry, I just read the last messages in the thread and I
>         didn't notice it is about a crash; I thought it was about an
>         annoying log message.
>
>         The backtrace no longer matches the latest sources of the
>         3.1 branch, but I expect it is due to a double free issue, so
>         you have to update to the latest 3.1.6, as suggested by other
>         users here. There were many fixes from 3.1.0 to 3.1.6.
>
>         Cheers,
>         Daniel
>
>
>         On 7/17/12 6:03 PM, Jijo wrote:
>>         Hi,
>>
>>         This is not happening at shutdown or status check. It's
>>         aborting when the system is active.
>>
>>         Thanks
>>         Jijo
>>         On Tue, Jul 17, 2012 at 3:06 AM, Daniel-Constantin Mierla
>>         <miconda at gmail.com <mailto:miconda at gmail.com>> wrote:
>>
>>             Hello,
>>
>>             does it keep happening, or was it one time and that's it?
>>
>>             That is printed at shutdown, or if pkg_status() or
>>             shm_status() is executed from some part of the code or in
>>             the config via cfgutils module functions.
>>
>>             You can get rid of them by setting memdbg and memlog to a
>>             value higher than the debug global parameter.
>>
>>             Cheers,
>>             Daniel
>>
>>
>>             On 7/16/12 5:28 PM, Jijo wrote:
>>>             Thanks. It is not easy to upgrade, as it is happening on a
>>>             customer system.
>>>             Was any change made for this issue? I looked, but didn't
>>>             see anything in q_malloc.c/qm_status().
>>>
>>>
>>>             On Mon, Jul 16, 2012 at 11:12 AM, Jon Bonilla
>>>             <manwe at aholab.ehu.es <mailto:manwe at aholab.ehu.es>> wrote:
>>>
>>>                 On Mon, 16 Jul 2012 10:27:42 -0400,
>>>                 Jijo <realjijo at gmail.com> wrote:
>>>
>>>                 > Hi All,
>>>                 >
>>>                 > I'm observing a core intermittently at "qm_status
>>>                 (qm=0x786cd000) at
>>>                 > mem/q_malloc.c:763" for kamailio version 3.1.0
>>>                 >
>>>
>>>                 I'd say that you're using a very old version. You
>>>                 should update your branch to
>>>                 3.1.6 or upgrade to a newer branch.
>>>
>>>
>>>                 _______________________________________________
>>>                 sr-dev mailing list
>>>                 sr-dev at lists.sip-router.org
>>>                 <mailto:sr-dev at lists.sip-router.org>
>>>                 http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-dev
>>>
>>>
>>>
>>>
>>
>>             -- 
>>             Daniel-Constantin Mierla - http://www.asipto.com
>>             http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
>>             Kamailio Advanced Training, Seattle, USA, Sep 23-26, 2012 - http://asipto.com/u/katu
>>             Kamailio Practical Workshop, Netherlands, Sep 10-12, 2012 - http://asipto.com/u/kpw
>>
>>
>
>
>
>

-- 
Daniel-Constantin Mierla - http://www.asipto.com
http://twitter.com/#!/miconda - http://www.linkedin.com/in/miconda
Kamailio Advanced Training, Berlin, Nov 5-8, 2012 - http://asipto.com/u/kat
Kamailio Advanced Training, Miami, USA, Nov 12-14, 2012 - http://asipto.com/u/katu
