[Kamailio-Devel] OpenSer 1.3.0 crash

Robin Vleij robin at swip.net
Wed Sep 17 14:34:19 CEST 2008


Henning Westerholt wrote:

Henning,

This turned into a rather long mail, sorry about that. I'm trying to
find out whether there's a bug (maybe one already fixed since our 1.3.0)
or whether I have something set up wrong. The first core I'm looking at
has to do with forwarding an ACK, and seems related to finding a free
proxy (I assume via the lcr code) to send it to.

>> What do you need more from me and how do I get that with gdb and my cores?
> 
> The h_addr_list should normally contain a list of hosts, it seems that this 
> variable gets somehow corrupt. You could investigate in previous frames why 
> this happens, just change with e.g. "f 1" [1] to the first frame, and examine 
> the variables there, and so on. You can also print the actual source code of 
> the function with "list" [2].

Right. Interesting: I've been reading the gdb pages you linked and have
been throwing commands at the prompt. I see lots of things, but since
I'm not a developer they don't make me much wiser.
I think I follow so far. The problem lies in the following function:

192     void free_hostent(struct hostent *dst)
193     {
194             int r;
195             if (dst->h_name) pkg_free(dst->h_name);
196             if (dst->h_aliases){
197                     for(r=0; dst->h_aliases[r]; r++) {
198                             pkg_free(dst->h_aliases[r]);
199                     }
200                     pkg_free(dst->h_aliases);
201             }
202             if (dst->h_addr_list){
203                     for (r=0; dst->h_addr_list[r];r++) {
204                             pkg_free(dst->h_addr_list[r]);
205                     }
206                     pkg_free(dst->h_addr_list);
207             }
208     }

Somewhere something went wrong with dst->h_addr_list, since it doesn't
seem to be readable (i.e. it is an invalid pointer). But where does
this dst->h_addr_list come from? I guess from the caller in frame 1
(f 1), where I find the following function:

309     void free_proxy(struct proxy_l* p)
310     {
311             if (p) {
312                     free_hostent(&p->host);
313                     free_dns_res( p );
314             }
315     }

and print p->host gives:

$25 = {h_name = 0x6e05e8 "sip-corporate2.tele2.se", h_aliases =
0x6aee60, h_addrtype = 2, h_length = 4, h_addr_list = 0x38}

There is the h_addr_list, holding the invalid pointer 0x38.
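
For reference, here is a minimal standalone sketch of the walk that
free_hostent performs over h_addr_list. It is not OpenSER code:
struct addr_list_demo, demo_build and demo_free are hypothetical names,
and plain malloc/free stand in for the pkg allocator. The point is that
the loop reads h_addr_list[0] before doing anything else, so a garbage
value such as 0x38 in that field would presumably fault on the very
first iteration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the h_addr_list member of struct hostent:
 * a NULL-terminated array of heap-allocated entries. */
struct addr_list_demo {
    char **h_addr_list;
};

/* Build a list of n zeroed entries of len bytes each, plus the NULL
 * terminator. Returns 0 on success, -1 on allocation failure. */
static int demo_build(struct addr_list_demo *d, size_t n, size_t len)
{
    size_t i;

    assert(d != NULL);
    d->h_addr_list = malloc((n + 1) * sizeof(char *));
    if (!d->h_addr_list)
        return -1;
    for (i = 0; i < n; i++) {
        d->h_addr_list[i] = calloc(1, len);
        if (!d->h_addr_list[i]) {
            /* undo the entries allocated so far */
            while (i > 0)
                free(d->h_addr_list[--i]);
            free(d->h_addr_list);
            d->h_addr_list = NULL;
            return -1;
        }
    }
    d->h_addr_list[n] = NULL;
    return 0;
}

/* Same walk as in free_hostent: free each entry up to the NULL
 * terminator, then the array itself. Returns the number of entries
 * freed. Resetting the pointer is an addition of this sketch; it makes
 * a second call harmless. */
static int demo_free(struct addr_list_demo *d)
{
    int r = 0;

    assert(d != NULL);
    if (d->h_addr_list) {
        for (r = 0; d->h_addr_list[r]; r++)
            free(d->h_addr_list[r]);
        free(d->h_addr_list);
        d->h_addr_list = NULL;
    }
    return r;
}
```

The walk is only safe if h_addr_list is either NULL or points to a
properly terminated array; nothing in it can detect a pointer that has
been overwritten elsewhere.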

One frame further up I find the following code:

217             /* ACKs do not establish a transaction and are fwd-ed statelessly */
218             if ( p_msg->REQ_METHOD==METHOD_ACK) {
219                     LM_DBG("forwarding ACK\n");
220                     /* send it out */
221                     if (proxy==0) {
222                             uri = GET_RURI(p_msg);
223                             proxy=uri2proxy(GET_NEXT_HOP(p_msg), PROTO_NONE);
224                             if (proxy==0) {
225                                             ret=E_BAD_ADDRESS;
226                                             goto done;
227                             }
228                             ret=forward_request( p_msg , proxy);
229                             if (ret>=0) ret=1;
230                             free_proxy( proxy );
231                             pkg_free( proxy );
232                     } else {
233                             ret=forward_request( p_msg , proxy);
234                             if (ret>=0) ret=1;
235                     }
236                     goto done;

Here I ran:

(gdb) print proxy
$26 = (struct proxy_l *) 0x6e3fc0
(gdb) x 0x6e3fc0
0x6e3fc0 <mem_pool+709280>:     0

I'm not sure how this is supposed to work, or whether I'm on the right
track at all. I'll stop here; I'm lost. :)
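
As an aid to reading the ACK branch quoted above, here is a small
sketch of the ownership rule it appears to follow: a proxy created
inside the function is freed there, while one passed in by the caller
is left alone. All names (demo_proxy, demo_forward_ack, demo_live) are
hypothetical, and malloc/free stand in for the pkg allocator; this is
an illustration of the pattern, not the tm module's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, much reduced stand-in for struct proxy_l. */
struct demo_proxy {
    int id;
};

/* Number of demo_proxy objects currently allocated. */
static int demo_live = 0;

static struct demo_proxy *demo_proxy_new(void)
{
    struct demo_proxy *p = calloc(1, sizeof(*p));
    if (p)
        demo_live++;
    return p;
}

static void demo_proxy_free(struct demo_proxy *p)
{
    if (p) {
        /* dropping below zero would indicate a double free */
        assert(demo_live > 0);
        free(p);
        demo_live--;
    }
}

/* Placeholder for the actual send; succeeds for any non-NULL proxy. */
static int demo_send(const struct demo_proxy *p)
{
    return p ? 0 : -1;
}

/* Forward via the caller's proxy if one is given; otherwise create a
 * temporary one and free it before returning. */
static int demo_forward_ack(struct demo_proxy *proxy)
{
    int ret;

    if (proxy == NULL) {
        struct demo_proxy *tmp = demo_proxy_new();
        if (tmp == NULL)
            return -1;
        ret = demo_send(tmp);
        demo_proxy_free(tmp);   /* created here, so freed here */
    } else {
        ret = demo_send(proxy); /* owned by the caller, not freed */
    }
    return ret >= 0 ? 1 : ret;
}
```

If anything called during the send were to free or overwrite the
temporary proxy, the later free would operate on bad data, which is one
way a crash could end up inside the cleanup code.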

Now on to core2, which we got after a normal INVITE to a test system.

> Just as you did in the first place, just print them in gdb, and try to 
> investigate the source of the problem. :-)

This core was related to the following bt:

#0  free_lump_list (l=0x636d20) at data_lump.c:412
#1  0x000000000048ed02 in free_sip_msg (msg=0x6df7b8) at parser/msg_parser.c:661
#2  0x000000000044b4b9 in receive_msg (
    buf=0x625ca0 "INVITE sip:blah@domain;transport=UDP SIP/2.0\r\nFrom: <sip:blah2@test_server:5060>;tag=pstn20080913184322\r\nTo: <sip:blah2@domain:5060>\r\nCall-ID: pstn20080913184322-1-1"..., len=728, rcv_info=0x7fffee85b310) at receive.c:206
#3  0x0000000000488154 in udp_rcv_loop () at udp_server.c:438
#4  0x0000000000425081 in main (argc=9, argv=0x7fffee85b518) at main.c:834

Here I go to f 1, and the code there is:

655     void free_sip_msg(struct sip_msg* msg)
656     {
657             if (msg->new_uri.s) { pkg_free(msg->new_uri.s); msg->new_uri.len=0; }
658             if (msg->dst_uri.s) { pkg_free(msg->dst_uri.s); msg->dst_uri.len=0; }
659             if (msg->path_vec.s) { pkg_free(msg->path_vec.s); msg->path_vec.len=0; }
660             if (msg->headers)     free_hdr_field_lst(msg->headers);
661             if (msg->add_rm)      free_lump_list(msg->add_rm);
662             if (msg->body_lumps)  free_lump_list(msg->body_lumps);
663             if (msg->reply_lump)   free_reply_lump(msg->reply_lump);
664             /* don't free anymore -- now a pointer to a static buffer */
665     #       ifdef DYN_BUF
666             pkg_free(msg->buf);
667     #       endif
668     }

print msg->add_rm gives $3 = (struct lump *) 0x707d98.

It all looks OK to me, but something goes terribly wrong in this
free_lump_list. It sounds like some kind of memory problem; why on earth
would such a basic thing go wrong?
The only difference in the config at that point was that I was fiddling
with the avp_fr_timer_avp values, which I reverted later on. It gave
an "invalid type assignment" error, but that seems unrelated to this.
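
To illustrate what a function like free_lump_list has to do, here is a
simplified sketch of freeing a list whose nodes each carry their own
"before" and "after" chains. struct demo_lump and both function names
are hypothetical, and malloc/free replace the pkg allocator, so this is
not the code from data_lump.c. Every pointer in the structure gets
dereferenced during the walk, so if any of them was damaged earlier
(for example by a double free or an overrun elsewhere), the crash would
likely surface here rather than where the damage was done.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-in for struct lump: a main list linked
 * by next, with optional before/after side chains on each node. */
struct demo_lump {
    struct demo_lump *next;
    struct demo_lump *before;
    struct demo_lump *after;
};

static struct demo_lump *demo_lump_new(void)
{
    return calloc(1, sizeof(struct demo_lump));
}

/* Free the main list and all side chains. Returns the number of nodes
 * freed. */
static int demo_free_lump_list(struct demo_lump *l)
{
    struct demo_lump *crt, *r, *tmp;
    int freed = 0;

    while (l) {
        crt = l;
        /* a node pointing at itself would mean a corrupted list */
        assert(crt->next != crt);
        l = l->next;            /* read next before crt is freed */

        for (r = crt->before; r; r = tmp) {
            tmp = r->before;    /* save the link, then free the node */
            free(r);
            freed++;
        }
        for (r = crt->after; r; r = tmp) {
            tmp = r->after;
            free(r);
            freed++;
        }
        free(crt);
        freed++;
    }
    return freed;
}
```

A valid-looking head pointer, such as the msg->add_rm value printed
above, says nothing about the nodes further down the chains.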


Any ideas on where I should look next? I'd like to upgrade to the
latest 1.3, but I can't do that right now.

/robin

-- 
Robin Vleij
robin at swip.net


