Hi all
I have a setup of several proxies behind a load balancer, and all of them produce several coredumps every day. I've tried versions 5.3, 5.5 and 5.8, and all of them crash.
I didn't have the dbgsym packages installed, but now I've built a 5.8.4 version on one of them and can run gdb on the coredumps. I can't see anything in particular, and I'm wondering if I'm doing something wrong.
I'm decompressing the coredumps with lz4cat. Sometimes the system generates 3 coredumps at the same time. I've tried running gdb on one of them and executing "bt full".
I usually see "Cannot access memory" in some of them, but in others I don't see anything relevant.
I'm attaching one coredump. I don't even know if I'm doing it properly. Could you please guide me on how to debug what's going on?
thanks
Hello,
is the attached backtrace from 5.8.4?
If you get many core dump files at the same time, attach the full backtrace for each of them: usually one reveals the reason for the crash and the others are just side effects, but all of them need to be looked at to see which one matters for troubleshooting.
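For reference, a typical way to pull a full backtrace out of each compressed core is something like this (the file names and binary path are just examples, adjust them to your setup):

    lz4cat /var/crash/core.kamailio.1234.lz4 > /tmp/core.1234
    gdb -batch -ex 'bt full' /usr/sbin/kamailio /tmp/core.1234 > bt.1234.txt 2>&1

gdb must be given the exact kamailio binary that produced the core, built with debug symbols, otherwise many frames show up as '??' or 'No symbol table info available'.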
Some other details would be useful:
- what is the operating system you run?
- is it a dedicated server, or some virtualization system (docker/kubernetes, virtual machine, ...)?
- is it under high load when it happens, or some resources not available (e.g., database backend)?
- can you list the modules that are loaded in kamailio config? Any with custom code, or all from stock kamailio repo?
Cheers, Daniel
On Tue, 28 Jan 2025 11:33:19 +0100, Daniel-Constantin Mierla miconda@gmail.com wrote:
> Hello,
Hi Daniel
> is the attached backtrace from 5.8.4?
Yes, it's 5.8.4 with some Sipwise patches; I built the current master of their repo.
version: kamailio 5.8.4 (x86_64/linux) 23a581-dirty
> If you get many core dump files at the same time, attach the full backtrace for each of them: usually one reveals the reason for the crash and the others are just side effects, but all of them need to be looked at to see which one matters for troubleshooting.
Ok. I'm attaching the traces from two crashes, three coredumps each.
> Some other details would be useful:
> - what is the operating system you run?
It's Debian 10.
> - is it a dedicated server, or some virtualization system (docker/kubernetes, virtual machine, ...)?
All servers are bare metal
> - is it under high load when it happens, or some resources not available (e.g., database backend)?
No, it happens under both high and low load (night and day). The ones attached happened during the night, under low load.
> - can you list the modules that are loaded in kamailio config? Any with custom code, or all from stock kamailio repo?
I tested with three versions of Kamailio, but they are the Sipwise versions. I know they push changes upstream, but it won't be 100% stock Kamailio.
There are quite a lot of modules, really:
loadmodule "db_mysql.so" loadmodule "db_redis.so" loadmodule "auth.so" loadmodule "auth_db.so" loadmodule "tm.so" loadmodule "tmx.so" loadmodule "sl.so" loadmodule "rr.so" loadmodule "pv.so" loadmodule "maxfwd.so" loadmodule "usrloc.so" loadmodule "registrar.so" loadmodule "textops.so" loadmodule "uri_db.so" loadmodule "siputils.so" loadmodule "utils.so" loadmodule "xlog.so" loadmodule "sanity.so" loadmodule "acc.so" loadmodule "nathelper.so" loadmodule "rtpengine.so" loadmodule "domain.so" loadmodule "ctl.so" loadmodule "xmlrpc.so" loadmodule "cfg_rpc.so" loadmodule "cfgutils.so" loadmodule "avpops.so" loadmodule "sqlops.so" loadmodule "uac.so" loadmodule "kex.so" loadmodule "lcr.so" loadmodule "dispatcher.so" loadmodule "permissions.so" loadmodule "uac_redirect.so" loadmodule "dialplan.so" loadmodule "speeddial.so" loadmodule "dialog.so" loadmodule "tmrec.so" loadmodule "diversion.so" loadmodule "corex.so" loadmodule "textopsx.so" loadmodule "sdpops.so" loadmodule "htable.so" loadmodule "jansson.so" loadmodule "pv_headers.so" loadmodule "secsipid.so" loadmodule "jsonrpcs.so" loadmodule "app_lua.so"
The cause of the crash is revealed by the backtrace:
#0  0x00007fbd7e07b8eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007fbd7e066535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x000055fff797d98f in qm_debug_check_frag (qm=qm@entry=0x7fbb6e9b9000,
    f=f@entry=0x7fbb6ee08940, file=file@entry=0x7fbd6eeb34f5 "permissions: hash.c",
    line=line@entry=402, eline=eline@entry=603,
    efile=0x55fff7ac8d48 "core/mem/q_malloc.c") at core/mem/q_malloc.c:132
        p = <optimized out>
        __func__ = "qm_debug_check_frag"
That points to the abort at src/core/mem/q_malloc.c:132, namely:
if(f->check != ST_CHECK_PATTERN) {
    LM_CRIT("BUG: qm: fragm. %p (address %p) "
            "beginning overwritten (%lx)! Memory allocator was called "
            "from %s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n",
            f, (char *)f + sizeof(struct qm_frag), f->check, file, line,
            f->file, f->line, efile, eline);
    qm_status(qm);
    abort();
};
That means there is a buffer overflow, or something is writing to a wrong address.
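As a minimal standalone sketch of that failure mode (this is illustrative C, not Kamailio code; the struct layout and sizes are invented for the example): each fragment carries a canary word in its header, and a write past the end of the previous allocation tramples it, so the next consistency check aborts, just like qm_debug_check_frag does above.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define ST_CHECK_PATTERN 0xf0f0f0f0UL   /* same idea as q_malloc's pattern */

    struct frag {                  /* toy fragment: header canary + payload */
        unsigned long check;       /* verified on every alloc/free */
        char data[16];
    };

    int main(void) {
        struct frag *f = calloc(2, sizeof(struct frag));
        if (!f)
            return 1;
        struct frag *next = f + 1;          /* the adjacent fragment in memory */
        f->check = next->check = ST_CHECK_PATTERN;

        /* the bug: 20 bytes written into a 16-byte buffer,
         * the last 4 bytes land on next->check */
        memset(f->data, 'X', 20);

        if (next->check != ST_CHECK_PATTERN) {
            fprintf(stderr, "beginning overwritten (%lx)!\n", next->check);
            abort();                        /* what qm_debug_check_frag does */
        }
        free(f);
        return 0;
    }

One thing to keep in mind when reading the real backtrace: "permissions: hash.c:402" in frame #2 is where the allocator was called when the corruption was detected; the code that actually did the stray write can be anywhere that shares the memory pool.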
I would suggest that you review the additional patches you apply on top of stock Kamailio. No similar crash has been reported to the project, and such a problem would not take long to show up in deployments, so there is a high chance the fault comes from those additional patches.
Cheers, Daniel
On Wed, 29 Jan 2025 21:27:13 +0100, Daniel-Constantin Mierla miconda@gmail.com wrote:
> That means there is a buffer overflow, or something is writing to a wrong address.
> I would suggest that you review the additional patches you apply on top of stock Kamailio. No similar crash has been reported to the project, and such a problem would not take long to show up in deployments, so there is a high chance the fault comes from those additional patches.
Hi Daniel
Sorry for replying so late. I've been working on this, and testing stability takes some days for every change.
First, I realized that there was heavy memory pressure on the systems from other processes, and I took care of that. It helped a bit, but still not enough.
I also had a bottleneck in a shared redis-server where the dialog module was storing dialogs and dialog profiles. Removing that shared redis-server and dropping the dialog profiles has helped a lot with overall system stability: from several crashes per day to almost zero.
I'm not sure the additional patches are the cause here. Anyway, I think that any new coredumps, if they show up, will be more reliable, because there is less external noise and the systems run more smoothly.
I'll continue testing and will let you know.
thanks,
Jon