Hi there,
We are seeing consistent segfaults when restarting our Kamailio instance while it is
receiving traffic, specifically on Kamailio 5.7.4. We believe the issue did not
occur with 5.7.2, so it appears to have been introduced in either 5.7.3 or 5.7.4.
Because of limited team bandwidth and the potential impact on production traffic, we
do not want to spend time trying to reproduce the issue. We have therefore
downgraded to 5.6.4, which we confirmed to be stable. (5.7.2 would probably be stable
too, but we did not try it.)
Unfortunately, our logging was set to WARNING level only, and we did not capture a core
dump, so we cannot provide any details beyond the following logs.
This was with tcp_reuse_port=yes:
2024-05-17T15:42:55.582475541Z Listening on
2024-05-17T15:42:55.582512370Z [redacted]
2024-05-17T15:42:55.582538161Z tls: 10.X.X.X:5061 advertise Y.Y.Y:5061
2024-05-17T15:42:55.582543750Z Aliases:
2024-05-17T15:42:55.582549081Z tls: [redacted]:5061
2024-05-17T15:42:55.582574890Z
2024-05-17T15:42:55.587876630Z 0(1) WARNING: tls [tls_init.c:978]: tls_h_mod_init_f(): openssl bug #1491 (crash/mem leaks on low memory) workaround enabled (on low memory tls operations will fail preemptively) with free memory thresholds 18874368 and 9437184 bytes
2024-05-17T15:42:55.703927049Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 23
2024-05-17T15:42:55.703972029Z 0(1) ALERT: <core> [main.c:791]: handle_sigs(): child process 15 exited by a signal 11
2024-05-17T15:42:55.703978409Z 0(1) ALERT: <core> [main.c:795]: handle_sigs(): core was generated
2024-05-17T15:42:55.705049839Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 17
2024-05-17T15:42:55.705074209Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 21
2024-05-17T15:42:55.705081209Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 22
2024-05-17T15:42:55.705085879Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 20
2024-05-17T15:42:55.705090319Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 18
2024-05-17T15:42:55.705094649Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 19
2024-05-17T15:42:55.705098879Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 16
2024-05-17T15:42:55.705207399Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 15
2024-05-17T15:42:55.705459439Z 35(41) CRITICAL: <core> [core/pass_fd.c:281]: receive_fd(): EOF on 27
Without tcp_reuse_port=yes, the segfault was always preceded by the following warning
whenever existing TLS connections were still in TIME_WAIT:
2024-05-16T19:18:51.654447639Z 9(14) WARNING: {1 1 INVITE XXX@0.0.0.0} <core> [core/tcp_main.c:1301]: find_listening_sock_info(): binding to source address 10.X.X.X:5061 failed: Address already in use [98]
2024-05-16T19:18:51.746994728Z 0(1) ALERT: <core> [main.c:791]: handle_sigs(): child process 14 exited by a signal 11
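For context on the warning above: errno 98 on Linux is EADDRINUSE, which bind() returns when the requested address and port are already taken. The sketch below is a generic Python illustration of that error and of how the SO_REUSEPORT socket option changes the outcome. It is not Kamailio's code, the helper name try_bind is hypothetical, and it assumes a Linux host where SO_REUSEPORT is available. It does not reproduce the TIME_WAIT case or the segfault itself.

```python
import errno
import socket


def try_bind(port, reuse_port, host="127.0.0.1"):
    """Try to bind a TCP socket to host:port.

    Returns (socket, 0) on success or (None, errno) on failure.
    try_bind is a hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse_port:
        # Every socket sharing the port must set SO_REUSEPORT before bind().
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, 0


if __name__ == "__main__":
    # Port 0 lets the kernel pick a free port for the first socket.
    first, _ = try_bind(0, reuse_port=False)
    port = first.getsockname()[1]
    second, err = try_bind(port, reuse_port=False)
    print("second bind without SO_REUSEPORT:", errno.errorcode.get(err, err))
    first.close()
```

Binding twice to the same port without the option fails on the second call with EADDRINUSE, while two sockets that both set SO_REUSEPORT can share the port.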
When the server was not handling any traffic, the issue did not occur, even on 5.7.4.
Does anyone have any insights or suggestions on how to address this issue?
Kind regards
Stefan