[Kamailio-Devel] OpenSer 1.3.0 crash
Robin Vleij
robin at swip.net
Wed Sep 17 14:34:19 CEST 2008
Henning Westerholt wrote:
Henning,
This became a bit of a long mail, sorry for that. I'm trying to find out
whether there's a bug (maybe one already fixed since our 1.3.0) or whether
I have something set up wrong. The core I'm looking at first has to do with
forwarding an ACK and seems related to finding a free proxy (I assume using
the lcr code) to send it on to.
>> What do you need more from me and how do I get that with gdb and my cores?
>
> The h_addr_list should normally contain a list of hosts, it seems that this
> variable gets somehow corrupt. You could investigate in previous frames why
> this happens, just change with e.g. "f 1" [1] to the first frame, and examine
> the variables there, and so on. You can also print the actual source code of
> the function with "list" [2].
Right. Interesting, I've been reading the gdb pages you linked and have
been throwing commands at the prompt. I see lots of things, but since I'm
not a developer I'm not much the wiser.
I think I understand this much: the crash happens in the following function:
192  void free_hostent(struct hostent *dst)
193  {
194      int r;
195      if (dst->h_name) pkg_free(dst->h_name);
196      if (dst->h_aliases){
197          for(r=0; dst->h_aliases[r]; r++) {
198              pkg_free(dst->h_aliases[r]);
199          }
200          pkg_free(dst->h_aliases);
201      }
202      if (dst->h_addr_list){
203          for (r=0; dst->h_addr_list[r];r++) {
204              pkg_free(dst->h_addr_list[r]);
205          }
206          pkg_free(dst->h_addr_list);
207      }
208  }
Somewhere something went wrong with dst->h_addr_list, since it doesn't
seem to be readable (i.e., it holds an invalid pointer). But where does
this dst->h_addr_list come from? I guess from the caller, which I reach
with "f 1" and where I find the following function:
309  void free_proxy(struct proxy_l* p)
310  {
311      if (p) {
312          free_hostent(&p->host);
313          free_dns_res( p );
314      }
315  }
and print p->host gives:
$25 = {h_name = 0x6e05e8 "sip-corporate2.tele2.se", h_aliases =
0x6aee60, h_addrtype = 2, h_length = 4, h_addr_list = 0x38}
There is the h_addr_list again, holding the invalid pointer 0x38.
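
To make clear for myself why a value like 0x38 is fatal here, a minimal
standalone sketch of the same walk that free_hostent does. The names
(demo_hostent, demo_free and so on) are made up for illustration, and it
uses plain calloc/free instead of pkg_malloc/pkg_free, so it is not the
OpenSER code. The point is only that the function trusts h_addr_list to be
either NULL or a valid NULL-terminated array: 0x38 is not NULL, so the "if"
passes and the loop then reads h_addr_list[0] from address 0x38.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct hostent, reduced to two fields. */
struct demo_hostent {
    char *h_name;
    char **h_addr_list;   /* NULL-terminated array of heap buffers */
};

int demo_free_calls = 0;  /* counts frees so the walk can be checked */

void demo_free(void *p)
{
    demo_free_calls++;
    free(p);
}

/* Build a NULL-terminated list of n zeroed 4-byte addresses.
 * Returns NULL if an allocation fails. */
char **demo_make_addr_list(int n)
{
    char **list = calloc((size_t)n + 1, sizeof(*list));
    if (!list) return NULL;
    for (int i = 0; i < n; i++) {
        list[i] = calloc(1, 4);
        if (!list[i]) {
            for (int j = 0; j < i; j++) free(list[j]);
            free(list);
            return NULL;
        }
    }
    return list;  /* list[n] is already NULL thanks to calloc */
}

/* Same shape as the walk in free_hostent: free each entry, then the array.
 * A non-NULL garbage pointer passes the check and is dereferenced. */
void demo_free_hostent(struct demo_hostent *dst)
{
    assert(dst != NULL);
    if (dst->h_name) demo_free(dst->h_name);
    if (dst->h_addr_list) {
        for (int r = 0; dst->h_addr_list[r]; r++)
            demo_free(dst->h_addr_list[r]);
        demo_free(dst->h_addr_list);
    }
    /* Clear the fields so a second call becomes a harmless no-op. */
    dst->h_name = NULL;
    dst->h_addr_list = NULL;
}
```

With a valid list of two addresses this does three frees (two entries plus
the array itself). Nothing in the walk can detect a corrupted pointer, so
the corruption must have happened before free_hostent was called.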
One frame further up I find the following code:
217  /* ACKs do not establish a transaction and are fwd-ed statelessly */
218  if ( p_msg->REQ_METHOD==METHOD_ACK) {
219      LM_DBG("forwarding ACK\n");
220      /* send it out */
221      if (proxy==0) {
222          uri = GET_RURI(p_msg);
223          proxy=uri2proxy(GET_NEXT_HOP(p_msg), PROTO_NONE);
224          if (proxy==0) {
225              ret=E_BAD_ADDRESS;
226              goto done;
227          }
228          ret=forward_request( p_msg , proxy);
229          if (ret>=0) ret=1;
230          free_proxy( proxy );
231          pkg_free( proxy );
232      } else {
233          ret=forward_request( p_msg , proxy);
234          if (ret>=0) ret=1;
235      }
236      goto done;
Here I run:
(gdb) print proxy
$26 = (struct proxy_l *) 0x6e3fc0
(gdb) x 0x6e3fc0
0x6e3fc0 <mem_pool+709280>: 0
I'm not sure how this is supposed to work, or whether I'm on the right
track at all. I'll stop here, I'm lost. :)
Now on to core2, which we got after a normal INVITE to a test system.
> Just as you did in the first place, just print them in gdb, and try to
> investigate the source of the problem. :-)
This core has the following backtrace:
#0 free_lump_list (l=0x636d20) at data_lump.c:412
#1 0x000000000048ed02 in free_sip_msg (msg=0x6df7b8) at parser/msg_parser.c:661
#2 0x000000000044b4b9 in receive_msg (
    buf=0x625ca0 "INVITE sip:blah@domain;transport=UDP SIP/2.0\r\nFrom: <sip:blah2@test_server:5060>;tag=pstn20080913184322\r\nTo: <sip:blah2@domain:5060>\r\nCall-ID: pstn20080913184322-1-1"..., len=728,
    rcv_info=0x7fffee85b310) at receive.c:206
#3 0x0000000000488154 in udp_rcv_loop () at udp_server.c:438
#4 0x0000000000425081 in main (argc=9, argv=0x7fffee85b518) at main.c:834
Here I go to f 1 (free_sip_msg) and the code there is:
655  void free_sip_msg(struct sip_msg* msg)
656  {
657      if (msg->new_uri.s) { pkg_free(msg->new_uri.s); msg->new_uri.len=0; }
658      if (msg->dst_uri.s) { pkg_free(msg->dst_uri.s); msg->dst_uri.len=0; }
659      if (msg->path_vec.s) { pkg_free(msg->path_vec.s); msg->path_vec.len=0; }
660      if (msg->headers) free_hdr_field_lst(msg->headers);
661      if (msg->add_rm) free_lump_list(msg->add_rm);
662      if (msg->body_lumps) free_lump_list(msg->body_lumps);
663      if (msg->reply_lump) free_reply_lump(msg->reply_lump);
664      /* don't free anymore -- now a pointer to a static buffer */
665  #ifdef DYN_BUF
666      pkg_free(msg->buf);
667  #endif
668  }
"print msg->add_rm" gives $3 = (struct lump *) 0x707d98.
That all looks OK to me, but something goes terribly wrong inside
free_lump_list. It sounds like some kind of memory problem; why on earth
would such a basic thing go wrong otherwise?
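
For reference, a minimal sketch of how I picture such a list free working.
This is an assumption on my side, not the real data_lump.c code: I assume
a main chain linked through "next", with optional side chains hanging off
each element through "before" and "after". All demo_* names are invented
and plain calloc/free stand in for the pkg allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified lump: main chain via next, plus side chains. */
struct demo_lump {
    struct demo_lump *next;
    struct demo_lump *before;
    struct demo_lump *after;
};

int demo_lumps_freed = 0;  /* counts freed nodes so the walk can be checked */

/* Free one side chain, linked through `before` or `after`. */
void demo_free_side(struct demo_lump *r, int use_before)
{
    while (r) {
        struct demo_lump *victim = r;
        r = use_before ? r->before : r->after;  /* read the link first */
        free(victim);
        demo_lumps_freed++;
    }
}

/* Walk the main chain; for each node free both side chains, then the node.
 * Every pointer followed here is trusted to be NULL or a live allocation. */
void demo_free_lump_list(struct demo_lump *l)
{
    while (l) {
        struct demo_lump *crt = l;
        l = l->next;
        demo_free_side(crt->before, 1);
        demo_free_side(crt->after, 0);
        free(crt);
        demo_lumps_freed++;
    }
}

/* Build: head -> second, with one `after` lump hanging off head. */
struct demo_lump *demo_build(void)
{
    struct demo_lump *head = calloc(1, sizeof(*head));
    struct demo_lump *second = calloc(1, sizeof(*second));
    struct demo_lump *side = calloc(1, sizeof(*side));
    assert(head && second && side);
    head->next = second;
    head->after = side;
    return head;
}
```

If the real function has this shape, a single overwritten next, before or
after pointer anywhere in the list is enough to make the walk crash, even
when the head pointer itself looks fine in gdb.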
The only thing different in the config there was that I was fiddling
with the avp_fr_timer_avp values, which I reverted later on. It gave
some "invalid type assignment" error, but that seems unrelated to this.
Any ideas in which direction I should search next? I'd like to
upgrade to the latest 1.3, but I can't do that right now.
/robin
--
Robin Vleij
robin at swip.net