Description

After upgrading from Kamailio 5.2.x, a high-volume Kamailio 5.4.4 instance randomly crashes with either a general protection fault or a segfault in siptrace.so while one of its child processes is executing the sip_trace function (the child crash then cascades to the parent). This occurs about once every 36 hours on average and has not yet been correlated with any particular event.

We are still collecting debug information and will update this ticket as more becomes available. In the meantime, this is what has been observed so far.

The sip_trace function is applied as in this example snippet:

# ------- siptrace --------
modparam("siptrace", "hep_mode_on", 1)
modparam("siptrace", "hep_version", 3)
modparam("siptrace", "trace_to_database", 0)
modparam("siptrace", "trace_flag", 22)
modparam("siptrace", "trace_on", 1)

request_route {
    #....
    if ( is_method("INVITE") && !has_totag() ) {
        # Only start sip_trace on initial INVITE
        sip_trace("HEP_URL","$ci-MY_IP","d");
    }
    setflag(22);
    #...
}

Troubleshooting

We attempted packet collection with Homer v5 and Homer v7 and changed between HEP protocol v2 and v3.

Reproduction

We have not found a way to reproduce this issue other than letting the server run until a crash occurs. Four nearly identical servers all experience the same random crashes, but not at the same time.

Debugging Data

Our next troubleshooting step will be to simply comment out the sip_trace function, but this effectively disables the siptrace module completely rather than addressing an underlying problem.

Core dump retrieval is still in progress. Debug logs should also become available soon. Expect delays, since these are high-volume production servers.

Log Messages

All four servers have crashed at random with log entries like the following example, regardless of the troubleshooting tactics tried to date:

kernel: traps: kamailio[7579] general protection ip:7fb1a64e2dbf sp:7ffc60f04180 error:0 in siptrace.so[7fb1a64b8000+4e000]
systemd: kamailio.service: main process exited, code=exited, status=1/FAILURE
systemd: Unit kamailio.service entered failed state.
systemd: kamailio.service failed.
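
Until the core dumps are available, the kernel trap line above already allows a rough localization of the fault: subtracting the module's mapping base from the instruction pointer gives an offset into siptrace.so. The following is a minimal Python sketch of that arithmetic, not part of Kamailio; the helper name module_offset and the regular expression are illustrative assumptions, and the result is only meaningful if the reported mapping starts at file offset 0 of the shared object.

```python
import re

# Matches the "ip:... sp:... error:... in module[base+size]" part of a
# kernel "traps:" line. All numeric fields are hexadecimal without "0x".
TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+)\s+sp:(?P<sp>[0-9a-f]+)\s+error:(?P<err>[0-9a-f]+)"
    r"\s+in\s+(?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def module_offset(line):
    """Return (module, offset) for a kernel trap line, or None if the line
    does not match or the instruction pointer lies outside the mapping."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    offset = ip - base
    if not 0 <= offset < size:
        return None
    return m.group("module"), offset


line = (
    "kernel: traps: kamailio[7579] general protection ip:7fb1a64e2dbf "
    "sp:7ffc60f04180 error:0 in siptrace.so[7fb1a64b8000+4e000]"
)
module, offset = module_offset(line)
print(f"{module}+{offset:#x}")  # prints "siptrace.so+0x2adbf"
```

If the module was built with debug symbols, this offset could then be passed to a tool such as addr2line (with -e pointing at the installed siptrace.so, whose path depends on the installation) to get a candidate source line. Comparing the offsets from crashes on the different servers would also show whether they all fault at the same instruction.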

SIP Traffic

To date, we have not identified any SIP traffic that corresponds with the crash.

Possible Solutions

To date, disabling the siptrace module appears to be the only workaround.

Additional Information

version: kamailio 5.4.4 (x86_64/linux) e16352
flags: USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS 1024, MAX_RECV_BUFFER_SIZE 262144, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
id: e16352 
compiled on 15:56:46 Feb 15 2021 with gcc 4.8.5
Linux <hostname> 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

