### Description

I've been running kamailio 4.2.3 (i386/freebsd), which has worked perfectly at 200-300 cps for years.
Now I need to migrate to FreeBSD amd64. I installed kamailio 4.4.5 (amd64/freebsd), copied kamailio.cfg, and made a test call, which went fine.
But when I started the work load (200-300 cps), kamailio dumped core and exited.

### Troubleshooting

#### Reproduction

Compile kamailio on FreeBSD amd64 and load it with ~200-300 cps.

#### Debugging Data

gdb's bt is here: http://tmp.lehis.ru/kam_gdb.txt

I can't enable debug symbols for gdb (I tried `gmake mode=debug`, which added EXTRA_DEBUG to the flags). How can I do it?

#### Log Messages

syslog (lines containing "<script>:" skipped): http://tmp.lehis.ru/kam_log.txt

### Additional Information

version: kamailio 4.4.5 (amd64/freebsd) f98162
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_PTHREAD_MUTEX, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, select, kqueue.
id: f98162
compiled on 12:25:48 Mar 1 2017 with clang 3.4

OS: FreeBSD sip7 10.3-STABLE FreeBSD 10.3-STABLE #0: Wed Feb 15 12:13:46 MSK 2017 root@sip7:/usr/obj/usr/src/sys/SIP7 amd64 (r313760)
The core is generated by the shutdown procedure, so it is not a runtime event but a side effect of running out of memory when building the transaction -- some of the fields might be inconsistent. It needs to be fixed, but it is a bit hard to track down without a proper core file and debugging symbols.
What are the values for shared memory and private memory (the -m and -M command line parameters)?
Did you kill kamailio, or did it stop by itself?
If the latter, can you make sure that each process generates its own core file? The shutdown procedure can overwrite the core file generated by the original crash. You should have at least two core files in such a case.
I don't use any shared memory parameters when starting kamailio. kamcmd shows:
```
kamcmd> core.shmmem
{
	total: 67108864
	free: 61856936
	used: 5164808
	real_used: 5251928
	max_used: 5258528
	fragments: 10
}
```
Yes, kamailio stops by itself. I'm sure that I have only one core file. I've noticed one more thing: the core dump happens when kamailio receives multiple BYE requests.
But your words about multiple core files led me to an interesting finding: with children=1 in kamailio.cfg there is no core dump at all! When it is set to more than 1, the core dump occurs.
I'm ready to assist in any way.
I don't know about *BSD, but in Linux there is an option to enable corefiles per process (per pid). It is something like:
```
echo "1" > /proc/sys/kernel/core_uses_pid
```
Can you see if there is something similar for *BSD, enable it and test again?
I also know little of BSD, but this is what was required on Linux to get a dump into /tmp:
```
phil@ua-proxy-01:/etc/sysctl.d$ cat 61-core-pattern.conf
fs.suid_dumpable=2
kernel.core_uses_pid=1
kernel.core_pattern=/tmp/core.%e.%p.%h.%t
```
Yes, I've got it. On FreeBSD, per-PID core files can be set via sysctl: `sysctl kern.corefile=/mnt/coredumps/%N.%P.core`, where %N is the process name and %P is the PID.
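To put the Linux and FreeBSD settings from this thread side by side, here is a minimal sketch of a helper that only prints the command for a given OS and never runs it. The function name `core_pattern_cmd` and its arguments are hypothetical, and the patterns are simplified versions of the ones quoted above.

```shell
# Hypothetical helper: print (do not run) the sysctl command that
# enables per-PID core files in directory $2 for the OS named in $1.
core_pattern_cmd() {
    os=$1
    dir=$2
    case "$os" in
        # FreeBSD: %N = process name, %P = PID (as described above)
        FreeBSD) printf '%s\n' "sysctl kern.corefile=${dir}/%N.%P.core" ;;
        # Linux: %e = executable name, %p = PID
        Linux)   printf '%s\n' "sysctl -w kernel.core_pattern=${dir}/core.%e.%p" ;;
        *)       echo "unsupported OS: $os" >&2; return 1 ;;
    esac
}

# Example: show the command for the current system
# core_pattern_cmd "$(uname -s)" /mnt/coredumps
```

The printed command still has to be run as root, and the target directory must be writable by the kamailio processes.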
I ran kamailio again and got only one core file:
```
# ls -1 /mnt/coredumps/
kamailio.84671.core
#
```
And the logs say that only one core was generated (for PID 84671):
Mar 9 15:27:38 sip7 /usr/local/sbin/kamailio[84666]: ALERT: <core> [main.c:740]: handle_sigs(): child process 84671 exited by a signal 11
Mar 9 15:27:38 sip7 /usr/local/sbin/kamailio[84666]: ALERT: <core> [main.c:743]: handle_sigs(): core was generated
Mar 9 15:27:38 sip7 /usr/local/sbin/kamailio[84666]: INFO: <core> [main.c:755]: handle_sigs(): terminating due to SIGCHLD
Here is the log for this PID: http://tmp.lehis.ru/tmp/kam.log.txt
How can I turn on debug symbols?
What is the gdb backtrace for the new core?
It's here: http://tmp.lehis.ru/tmp/kam_bt.txt
Proper gdb backtrace with debugging symbols is needed here.
Do you compile directly from sources, or via ports or something else? Can you try with gcc instead of clang? Debugging symbols should be compiled in by default, at least with gcc. As I didn't contribute the clang part, I can't say for sure for it.
I've compiled kamailio via ports. Well, I'll try to build it directly from sources with gcc. I'll be back :)
I've built kamailio with gcc:
version: kamailio 4.4.5 (x86_64/freebsd) f98162
flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB
poll method support: poll, select, kqueue.
id: f98162
compiled on 18:20:17 Mar 9 2017 with gcc 4.9.4
Now I can't reproduce the core dump. I'll dig into it and post the results later.
Daniel, I can't reproduce the core dump with gcc - it works perfectly. I've turned on debug symbols with clang: after the last core dump I have five core files (one per child?).
Here are the latest logs: http://tmp.lehis.ru/tmp/kam_log_2.txt
Here is the bt of PID 82699: http://tmp.lehis.ru/tmp/kam_bt_2.txt
The crash happens due to `signal 10` -- can you get the output of `kill -l` to see what that is on your system? It might be specific to the OS, like SIGUSR1 or SIGBUS...
FreeBSD 11.0-RELEASE-p2 FreeBSD 11.0-RELEASE-p2 #0: Mon Oct 24 06:55:27 UTC 2016
$ kill -l
1) SIGHUP 2) SIGINT 3) SIGQUIT 4) SIGILL
5) SIGTRAP 6) SIGABRT 7) SIGEMT 8) SIGFPE
9) SIGKILL 10) SIGBUS 11) SIGSEGV 12) SIGSYS
13) SIGPIPE 14) SIGALRM 15) SIGTERM 16) SIGURG
17) SIGSTOP 18) SIGTSTP 19) SIGCONT 20) SIGCHLD
21) SIGTTIN 22) SIGTTOU 23) SIGIO 24) SIGXCPU
25) SIGXFSZ 26) SIGVTALRM 27) SIGPROF 28) SIGWINCH
29) SIGINFO 30) SIGUSR1 31) SIGUSR2
In the bt file, is `list` output taken for the same corefile in the frame 0?
So 10 is `SIGBUS`, which typically appears due to unaligned memory access, but there can be other reasons:
* https://en.wikipedia.org/wiki/Bus_error
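Since signal numbers differ between systems, a single number can also be looked up directly instead of reading the whole table. A minimal sketch, assuming a POSIX shell whose `kill` accepts `-l` with a number; the wrapper name `signal_name` is made up for illustration.

```shell
# Print the name of signal number $1 as known to the current shell/OS.
# The result depends on the system: per the table above, 10 is BUS on
# FreeBSD, while on Linux/x86 the same number is USR1.
signal_name() {
    kill -l "$1"
}

# Example: signal_name 10
```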
Can you check all the core files you got after enabling per-PID cores to see if they crash at the same point in the code (is the frame 0 line the same)?
If you compile with clang directly in the source tree, without using the ports, is it still crashing?
Looking at the backtraces, the SIGBUS can happen only if there is a strict alignment requirement of 8 bytes, which is probably not ensured for all members of structures. There is typically an alignment to 4 bytes, which is a more common requirement, rather than 8 bytes.
If 8-byte alignment is not a strict requirement, then there can be some paging error, as suggested in the Wikipedia article linked above. That is more likely an OS configuration or restriction on handling out-of-memory cases.
Yes, it is. But now it seems that they crash at the same point in code. Here are the backtraces (only 4 cores, not 5):
* http://tmp.lehis.ru/tmp/kam_bt_27333.txt
* http://tmp.lehis.ru/tmp/kam_bt_27334.txt
* http://tmp.lehis.ru/tmp/kam_bt_27336.txt
* http://tmp.lehis.ru/tmp/kam_bt_27337.txt
Can you try running kamailio with the `-x qm` command line parameter to see if it is specific to the fm memory manager?
Done:
```
ps ax | grep kama
27526 - S 0:00,05 /usr/local/sbin/kamailio -x qm
27527 - S 0:00,19 /usr/local/sbin/kamailio -x qm
...
```
And now I've got five core files without any load...
... and the backtraces?
Sorry, I've updated my previous post.
I can create a VirtualBox image for you, Daniel, and provide ssh access to it, if that helps.
Having access to a VM where I can reproduce it myself is probably the best for analysis. Let me know when it is available; I can provide an ssh key for access (I will also need sudo privileges). As I am not that familiar with FreeBSD, please make sure that vim and gdb are installed.
Ok, I'll write when it is ready.
It's ready for you. Send me your public key to ports@subnets.ru, please.
I connected to the box and started kamailio with `-x qm` -- it works, although you said that it crashes immediately:
```
[VM-TEST@sip7:/home/daniel] ps auxw | grep kamai
root 19951 0.0 1.8 222420 37196 - S 4:24PM 0:00.05 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19952 0.0 1.8 222420 37200 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19953 0.0 1.8 222420 37188 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19954 0.0 1.8 222420 37188 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19955 0.0 1.8 222420 37188 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19956 0.0 1.8 222420 37188 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19957 0.0 1.8 222420 37188 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19958 0.0 1.8 222420 37188 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19959 0.0 1.8 222420 37188 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19960 0.0 1.8 222420 37236 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19961 0.0 1.8 222420 37208 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19962 0.0 1.8 222420 37204 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
root 19963 0.0 1.8 222420 37204 - S 4:24PM 0:00.00 /usr/local/sbin/kamailio -x qm -a no -E -e -ddd
```
Is there something different in the way you start it?
Also, can you reproduce the crash with load testing so I can get some core files? Let me know where they are written so I can troubleshoot.
No, but sometimes it is hard to reproduce the core dump... Yes, I can. I'll send calls through now.
I've got a core dump: it is located at /mnt/hdd3/coredumps/kamailio.20041.core
How did you compile kamailio? It uses pthread mutexes, which should not happen: there is no proper synchronization done there, so a crash can happen at any time.
I cloned kamailio and compiled it as usual and it uses the default FAST_LOCK for sync, not PTHREAD_MUTEX.
I compiled kamailio using gmake, without any special options. I'll try to find out where the use of PTHREAD_MUTEX was enabled.
I cloned the git repository inside `/usr/local/src`, then compiled it with `gmake`. You can run `kamailio -I` to see some of the compile time flags. There is a difference between the flags printed by `/usr/local/sbin/kamailio` and those of the build I cloned in `/usr/local/src`.
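The comparison of the two binaries can be reduced to a quick check of which locking flag appears in the compile flags. This is a rough sketch that only matches the two flag names seen in this thread (`FAST_LOCK` and `USE_PTHREAD_MUTEX`); the function name `lock_impl` is hypothetical, and other lock implementations are reported as `unknown`.

```shell
# Hypothetical helper: read kamailio compile flags on stdin and report
# which locking implementation they mention.
lock_impl() {
    flags=$(cat)
    case "$flags" in
        *FAST_LOCK*)         echo fast_lock ;;
        *USE_PTHREAD_MUTEX*) echo pthread_mutex ;;
        *)                   echo unknown ;;
    esac
}

# Example, comparing an installed binary with a build in the source tree:
# /usr/local/sbin/kamailio -I | lock_impl
# ./kamailio -I | lock_impl
```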
I can confirm that kamailio works perfectly without PTHREAD_MUTEX:
```
Version: kamailio 4.4.5 (x86_64/freebsd) f98162
Default config: /usr/local/etc/kamailio/kamailio.cfg
Default paths to modules: /usr/local/lib/kamailio/modules
Compile flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
MAX_RECV_BUFFER_SIZE=262144
MAX_LISTEN=16
MAX_URI_SIZE=1024
BUF_SIZE=65535
DEFAULT PKG_SIZE=8MB
DEFAULT SHM_SIZE=64MB
ADAPTIVE_WAIT_LOOPS=1024
TCP poll methods: poll, select, kqueue
Source code revision ID: f98162
Compiled with: clang 3.4
Compiled on: 12:45:18 Mar 15 2017
```
I've found the difference between building from source and building via the ports framework: it's the target arch. When built from source, **make cfg** shows: `target architecture <x86_64>, host architecture <x86_64>`
When built from ports, it shows: `target architecture <amd64>, host architecture <x86_64>`
In the latter case Makefile.defs knows nothing about the arch amd64, which is why use_fast_lock was not set to yes and PTHREAD_MUTEX was used.
But that is another story, where **opsec** and I will try to do something about it.
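One way to avoid this kind of mismatch is to map architecture aliases to a single canonical name before any per-arch decision is made. The sketch below only illustrates the idea in shell; it is not how Makefile.defs handles it, and the function name `normalize_arch` and the alias list are assumptions.

```shell
# Hypothetical helper: map common architecture aliases to one canonical name,
# so that "amd64" (FreeBSD naming) and "x86_64" are treated the same.
normalize_arch() {
    case "$1" in
        amd64|x86_64)        echo x86_64 ;;
        i386|i486|i586|i686) echo i386 ;;
        *)                   echo "$1" ;;  # pass unknown names through unchanged
    esac
}

# Example: normalize_arch "$(uname -m)"
```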
Daniel, thanks for your help and patience!
Closed #1021.
I see that PTHREAD_MUTEX is set as the default instead of POSIX if fast locks are not used. Maybe someone tried to see whether they work a very long time ago and it was eventually left like this (I tried to track down the change quickly, but couldn't find it easily; there are too many changes in the file). Pthread mutexes don't work across processes in most cases.
I am closing this one and will open a new item to review the default locking option.