The cause of the crash is revealed by the backtrace:
#0  0x00007fbd7e07b8eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007fbd7e066535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x000055fff797d98f in qm_debug_check_frag (qm=qm@entry=0x7fbb6e9b9000,
    f=f@entry=0x7fbb6ee08940,
    file=file@entry=0x7fbd6eeb34f5 "permissions: hash.c",
    line=line@entry=402, eline=eline@entry=603,
    efile=0x55fff7ac8d48 "core/mem/q_malloc.c") at core/mem/q_malloc.c:132
        p = <optimized out>
        __func__ = "qm_debug_check_frag"
It points to the abort() call at src/core/mem/q_malloc.c:132, respectively:
if(f->check != ST_CHECK_PATTERN) {
    LM_CRIT("BUG: qm: fragm. %p (address %p) "
            "beginning overwritten (%lx)! Memory allocator was called "
            "from %s:%u. Fragment marked by %s:%lu. Exec from %s:%u.\n",
            f, (char *)f + sizeof(struct qm_frag), f->check,
            file, line, f->file, f->line, efile, eline);
    qm_status(qm);
    abort();
};
That means there is a buffer overflow or a write to a wrong address.
I would suggest reviewing the additional patches you apply on top of stock Kamailio: no similar crash has been reported to the project, and such a bug would not take long to show up in deployments, so there is a high chance the fault comes from those additional patches.
Cheers, Daniel
On 28.01.25 12:29, Jon Bonilla (Manwe) wrote:
On Tue, 28 Jan 2025 11:33:19 +0100, Daniel-Constantin Mierla miconda@gmail.com wrote:
Hello,
Hi Daniel
is the attached backtrace from 5.8.4?
Yes, it's a 5.8.4 with some sipwise patches. I built the upstream master of their repo.
version: kamailio 5.8.4 (x86_64/linux) 23a581-dirty
If you get many core dump files at the same time, attach the full backtrace for each of them: usually one reveals the reason of the crash and the others are just side effects, but all need to be investigated to see which one matters for troubleshooting.
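For reference, a full backtrace can be extracted from each core file with gdb in batch mode (the binary and core paths below are examples only; adjust them to your installation):

```shell
# Example paths: the kamailio binary location and core file name
# depend on your deployment and core_pattern settings.
gdb /usr/sbin/kamailio /path/to/core -batch -ex 'bt full' > backtrace.txt
```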
Ok. I'm attaching the trace of 2 crashes, 3 coredumps each.
Some other details would be useful:
- what is the operating system you run?
It's Debian 10.
- is it a dedicated server, or some virtualization system (docker/kubernetes, virtual machine, ...)?
All servers are bare metal
- is it under high load when it happens, or some resources not available (e.g., database backend)?
No, it happens under both high load and low load (night and day). The ones attached happened during the night, at low load.
- can you list the modules that are loaded in kamailio config? Any with custom code, or all from stock kamailio repo?
Tested with 3 versions of Kamailio, but they are Sipwise versions. I know they push upstream, but it won't be 100% stock Kamailio.
There are quite a bunch of modules, really:
loadmodule "db_mysql.so" loadmodule "db_redis.so" loadmodule "auth.so" loadmodule "auth_db.so" loadmodule "tm.so" loadmodule "tmx.so" loadmodule "sl.so" loadmodule "rr.so" loadmodule "pv.so" loadmodule "maxfwd.so" loadmodule "usrloc.so" loadmodule "registrar.so" loadmodule "textops.so" loadmodule "uri_db.so" loadmodule "siputils.so" loadmodule "utils.so" loadmodule "xlog.so" loadmodule "sanity.so" loadmodule "acc.so" loadmodule "nathelper.so" loadmodule "rtpengine.so" loadmodule "domain.so" loadmodule "ctl.so" loadmodule "xmlrpc.so" loadmodule "cfg_rpc.so" loadmodule "cfgutils.so" loadmodule "avpops.so" loadmodule "sqlops.so" loadmodule "uac.so" loadmodule "kex.so" loadmodule "lcr.so" loadmodule "dispatcher.so" loadmodule "permissions.so" loadmodule "uac_redirect.so" loadmodule "dialplan.so" loadmodule "speeddial.so" loadmodule "dialog.so" loadmodule "tmrec.so" loadmodule "diversion.so" loadmodule "corex.so" loadmodule "textopsx.so" loadmodule "sdpops.so" loadmodule "htable.so" loadmodule "jansson.so" loadmodule "pv_headers.so" loadmodule "secsipid.so" loadmodule "jsonrpcs.so" loadmodule "app_lua.so"