### Description
Kamailio runs out of memory and stops processing requests.
``` ERROR: <core> [core/tcp_main.c:973]: tcpconn_new(): mem. allocation failure ERROR: <core> [core/tcp_main.c:4042]: handle_new_connect(): tcpconn_new failed, closing socket ERROR: uac [uac_reg.c:533]: reg_ht_add(): no more shm ERROR: uac [uac_reg.c:1347]: uac_reg_load_db(): Error adding reg to htable ```
### Troubleshooting
Restarting the Kamailio process resolves the issue
#### Reproduction
The issue occurs every 48 hours or so on the Kamailio instance
#### Debugging Data
``` #kamcmd mod.stats all pkg Module: core { init_io_wait(475): 6480 init_dst_set(83): 32208 init_io_wait(524): 3576 yyparse(1782): 8 mk_new_net(52): 48 fix_param(1196): 128 route_new_list(200): 352 fix_expr(541): 616 exp_optimize_right(411): 40 fixup_regexp_null(213): 216 fix_param(1257): 352 yyparse(2729): 4872 mk_rval_expr1(2616): 68256 mk_elem(90): 976 mk_rval_expr2(2674): 321160 mk_rval_expr_v(2548): 768 mk_rval_expr_v(2537): 808136 mk_action(118): 196384 parse_params2(585): 560 set_mod_param_regex(121): 848 set_mod_param_regex(110): 264 tr_table_add(1875): 400 sr_cmd_exports_convert(235): 34160 register_module(280): 6192 load_module(496): 2072 new_sock_info(235): 176 new_sock_info(230): 3640 yyparse(642): 352 yyparse(1732): 288 add_alias(93): 80 add_alias(91): 128 yyparse(1446): 8 yyparse(2262): 296 yyparse(661): 416 subst_str(514): 416 subst_parser(301): 264 subst_parser(291): 2384 subst_parser(274): 960 pp_subst_add(57): 312 pp_define(1766): 1096 addstr(1399): 261168 str_hash_alloc(59): 512 pv_table_add(236): 22616 parse_pvname_list(1515): 2496 tr_new(1646): 1512 fix_match_rve(3028): 2152 pv_cache_add(347): 20664 fix_param(1162): 18936 init_modules(1026): 16 cnt_hash_add(383): 2048 db_do_init2(299): 336 dupl_string_name(72): 32 dupl_string(48): 144 new_db_id(272): 288 get_abs_pathname(262): 80 rpc_hash_add(146): 2048 pv_parse_format(1150): 40592 get_hdr_field(116): 232 parse_headers(320): 64 register_select_table(458): 104 rpc_hash_add(100): 26264 add_callback(59): 288 sr_wtimer_init(308): 136 cfg_new_group(79): 1160 cfg_declare(50): 4120 fix_socket_list(1509): 96 fix_sock_str(420): 912 fix_hostname(1301): 160 grp_hash_add(234): 952 pv_init_buffer(2055): 163840 pv_init_buffer(2045): 320 init_nonsip_hooks(43): 24 init_rlist(146): 40 route_add(124): 1784 str_hash_alloc(59): 768 rval_get_str(1256): 33648 init_counters(122): 128 cnt_hash_add(332): 26352 str_hash_alloc(59): 1280 Total: 2138200 }
Module: ndb_redis { redis_parse_index(627): 88 pv_parse_redisc_name(704): 16496 redisc_add_server(257): 24064 Total: 40648 }
Module: pv { tr_parse_string(2340): 64 tr_parse_string(2310): 192 tr_parse_string(2300): 192 tr_parse_string(2360): 32 tr_parse_string(2264): 544 add_var(65): 512 add_var(58): 2080 tr_parse_string(2275): 496 tr_parse_paramlist(2668): 32 Total: 4144 }
Module: sqlops { sql_get_result(200): 64 pv_parse_dbr_name(134): 864 sql_init_con(83): 240 sqlops_tr_buffer_init(46): 2048 Total: 3216 }
Module: textops { tr_txt_parse_re(212): 96 hname_fixup(2819): 616 fixup_method(2889): 1256 Total: 1968 }
Module: xhttp_rpc { child_init(789): 832 Total: 832 }
Module: db_mysql { db_mysql_new_connection(75): 4088 db_mysql_new_connection(65): 96 Total: 4184 }
Module: auth { generate_random_secret(239): 32 generate_random_secret(238): 32 Total: 64 }
Module: htable { pv_parse_ht_name(158): 160 fixup_ht_key(339): 88 Total: 248 }
Module: xlog { xlog_fixup_helper(535): 17472 mod_init(214): 4104 xdbg_fixup_helper(500): 3352 Total: 24928 }
Module: cnxcc { __alloc_redis(149): 24 __alloc_redis(146): 40 Total: 64 }
Module: rtpengine { build_rtpp_socks(1611): 8 child_init(1768): 240 Total: 248 }
Module: mangler { compile_expresions(557): 64 compile_expresions(541): 64 Total: 128 }
Module: acc { parse_acc_extra(116): 672 Total: 672 }
Module: rr { register_rrcb(61): 32 Total: 32 }
Module: dialog { dlg_bridge_init_hdrs(66): 80 Total: 80 }
Module: tm { Total: 0 }
Module: uac { Total: 0 }
Module: tls { Total: 0 }
Module: sl { Total: 0 }
Module: tmx { Total: 0 }
Module: kex { Total: 0 }
Module: usrloc { Total: 0 }
Module: pike { Total: 0 }
Module: websocket { Total: 0 }
Module: cfgutils { Total: 0 }
Module: nathelper { Total: 0 }
Module: dispatcher { Total: 0 }
```
#### Log Messages
``` kamailio[6722]: ERROR: <core> [core/tcp_main.c:973]: tcpconn_new(): mem. allocation failure kamailio[6722]: ERROR: <core> [core/tcp_main.c:4042]: handle_new_connect(): tcpconn_new failed, closing socket kamailio[6722]: ERROR: tm [t_msgbuilder.c:1594]: build_uac_req(): no more shared memory (544)
```
### Possible Solutions
Restarting Kamailio
### Additional Information
* **Kamailio Version** - output of `kamailio -v`
```
version: kamailio 5.1.7 (x86_64/linux) 567df3 flags: STATS: Off, USE_TCP, USE_TLS, USE_SCTP, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, TLSF_MALLOC, DBG_SR_MEMORY, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144 MAX_URI_SIZE 1024, BUF_SIZE 65535, DEFAULT PKG_SIZE 8MB poll method support: poll, epoll_lt, epoll_et, sigio_rt, select. id: 567df3 compiled on 04:30:41 Mar 14 2019 with gcc 7.3.0
```
* **Operating System**: ``` cat /etc/lsb-release DISTRIB_ID=Ubuntu DISTRIB_RELEASE=18.04 DISTRIB_CODENAME=bionic DISTRIB_DESCRIPTION="Ubuntu 18.04.2 LTS" ```
``` 4.15.0-1034-aws #36-Ubuntu SMP Tue Mar 5 23:17:16 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux ```
``` cat /proc/meminfo MemTotal: 4038176 kB MemFree: 130020 kB MemAvailable: 747324 kB Buffers: 15136 kB Cached: 2851248 kB SwapCached: 0 kB Active: 930936 kB Inactive: 2672836 kB Active(anon): 737672 kB Inactive(anon): 2097272 kB Active(file): 193264 kB Inactive(file): 575564 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 120 kB Writeback: 0 kB AnonPages: 734064 kB Mapped: 2180672 kB Shmem: 2097544 kB Slab: 154288 kB SReclaimable: 101108 kB SUnreclaim: 53180 kB KernelStack: 4176 kB PageTables: 81940 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 2019088 kB Committed_AS: 4529756 kB VmallocTotal: 34359738367 kB VmallocUsed: 0 kB VmallocChunk: 0 kB HardwareCorrupted: 0 kB AnonHugePages: 0 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 221184 kB DirectMap2M: 3973120 kB ```
Also, here is the output of "grep alloc qm_status.txt | awk '{ print substr( $0, 16, length($0) ) }' | sort | uniq -c | sort -k1n" (as per https://www.kamailio.org/wiki/tutorials/troubleshooting/memory):
https://www.dropbox.com/s/act3whe6gumyzvj/alloc.txt?dl=1
It is the shared memory that runs out of space. As I can see it, you provided the stats for pkg. You have to take the ones for shm.
What are the values you give for `-m` and `-M` command line parameters for kamailio?
@miconda -m 2048 -M 16
Another occurence:
``` 11(8342) ERROR: usrloc [ucontact.c:100]: new_ucontact(): no more shm memory 11(8342) ERROR: usrloc [urecord.c:149]: mem_insert_ucontact(): failed to create new contact 11(8342) ERROR: usrloc [urecord.c:573]: insert_ucontact(): failed to insert contact 11(8342) ERROR: registrar [save.c:522]: insert_contacts(): failed to insert contact 11(8342) ERROR: sl [sl_funcs.c:362]: sl_reply_error(): stateless error reply used: I'm terribly sorry, server error occurred (1/SL) 31(8362) ERROR: <core> [core/tcp_main.c:973]: tcpconn_new(): mem. allocation failure 31(8362) ERROR: <core> [core/tcp_main.c:4042]: handle_new_connect(): tcpconn_new failed, closing socket 31(8362) ERROR: <core> [core/tcp_main.c:973]: tcpconn_new(): mem. allocation failure 31(8362) ERROR: <core> [core/tcp_main.c:4042]: handle_new_connect(): tcpconn_new failed, closing socket 31(8362) ERROR: <core> [core/tcp_main.c:973]: tcpconn_new(): mem. allocation failure ```
``` kamcmd mod.stats all shm Module: core { dns_cache_mk_bad_entry(881): 7864 dns_cache_mk_rd_entry(1230): 13800 msg_lump_cloner(986): 501728 sip_msg_shm_clone(496): 2862448 tcpconn_new(971): 36272 build_req_buf_from_sip_req(2182): 814696 counters_prefork_init(207): 39168 create_avp(175): 57912 cfg_clone_str(130): 256 cfg_shmize(217): 1088 init_pt(113): 8 init_pt(108): 8 init_pt(107): 4760 cfg_parse_str(906): 192 register_timer(1012): 560 cfg_register_ctx(47): 96 init_tcp(4700): 8192 init_tcp(4694): 32768 init_tcp(4686): 8 init_tcp(4679): 8 init_tcp(4672): 8 init_tcp(4666): 8 init_tcp(4654): 8 init_avps(90): 8 init_avps(89): 8 init_dst_blacklist(437): 16384 init_dst_blacklist(430): 8 timer_alloc(515): 96 init_dns_cache(369): 8 init_dns_cache(361): 16384 init_dns_cache(354): 16 init_dns_cache(345): 8 init_timer(284): 8 init_timer(283): 16384 init_timer(282): 8 init_timer(281): 8 init_timer(270): 8 init_timer(238): 8 init_timer(221): 278544 init_timer(220): 8 init_timer(207): 8 cfg_child_cb_new(828): 64 sr_cfg_init(360): 8 sr_cfg_init(353): 8 sr_cfg_init(346): 8 sr_cfg_init(334): 8 sr_cfg_init(322): 8 qm_shm_lock_init(1195): 8 Total: 4709872 }
Module: ndb_redis { Total: 0 }
Module: pv { Total: 0 }
Module: sqlops { Total: 0 }
Module: textops { Total: 0 }
Module: xhttp_rpc { Total: 0 }
Module: db_mysql { Total: 0 }
Module: auth { Total: 0 }
Module: htable { ht_init_tables(386): 24576 ht_add_table(281): 936 Total: 25512 }
Module: xlog { Total: 0 }
Module: cnxcc { __alloc_new_credit_data(1217): 1080 __alloc_new_credit_data(1207): 480 __get_or_create_credit_data_entry(1177): 288 __add_call_by_cid(1534): 4464 __alloc_new_call_by_money(1278): 16512 __alloc_new_credit_data(1233): 48 shm_str_dup(723): 10728 __shm_str_hash_alloc(1090): 21984 __mod_init(286): 16 __mod_init(276): 16 __mod_init(275): 16 __mod_init(273): 16 __mod_init(272): 16 __mod_init(270): 16 __mod_init(269): 16 Total: 55696 }
Module: rtpengine { shm_str_dup(723): 3336 rtpp_function_call(2281): 3592 rtpengine_hash_table_init(85): 1024 rtpengine_hash_table_init(76): 2048 rtpengine_hash_table_init(67): 2048 get_rtpp_set(746): 200 add_rtpengine_socks(841): 88 get_rtpp_set(755): 8 rtpengine_hash_table_init(58): 72 add_rtpengine_socks(860): 24 mod_init(1448): 72 mod_init(1441): 24 mod_init(1428): 88 mod_init(1421): 8 rtpengine_hash_table_init(111): 14400 rtpengine_hash_table_init(96): 2048 Total: 29080 }
Module: mangler { Total: 0 }
Module: acc { Total: 0 }
Module: rr { Total: 0 }
Module: dialog { build_new_dlg(456): 151088 dlg_set_leg_info(541): 24712 dlg_get_iuid_shm_clone(287): 8328 dlg_set_toroute(1157): 8128 dlg_set_leg_info(532): 8136 dlg_set_leg_info(524): 36240 init_dlg_callback(55): 16 register_dlgcb(124): 40 init_dlg_table(295): 131088 init_dlg_table(287): 8 init_dlg_table(280): 8 init_dlg_table(275): 8 init_dlg_timer(57): 8 init_dlg_timer(48): 32 new_dlg_profile(188): 312 Total: 368152 }
Module: uac { reg_ht_add(530): 41784 reg_ht_add_byuser(479): 1760 reg_ht_add_byuuid(448): 2016 uac_reg_update(1093): 11472688 uac_reg_init_ht(237): 512 uac_reg_init_ht(223): 24 uac_reg_init_ht(211): 512 uac_reg_init_ht(199): 24 uac_reg_init_ht(187): 8 Total: 11519328 }
Module: tm { str_duplicate(196): 312 new_dlg_uac(324): 248 _reply_light(490): 72496 cancel_reason_pack(1201): 5416 update_totag_set(311): 4904 update_totag_set(310): 4136 build_local_reparse(345): 253264 relay_reply(1920): 323544 build_cell(339): 1800657088 build_uac_req(1592): 190580248 init_tm_stats_child(58): 8704 insert_tmcb(137): 15281328 init_tmcb_lists(74): 16 init_tmcb_lists(72): 16 init_hash_table(481): 2097152 Total: 2009288872 }
Module: tls { fix_domain(987): 816 tls_new_domain(157): 576 tls_new_cfg(1323): 48 tls_ct_wq_init(49): 8 mod_init(322): 8 mod_init(306): 8 ser_realloc(299): 194536 ser_malloc(293): 1404184 lock_set_alloc(70): 24 Total: 1600208 }
Module: sl { init_sl_stats_child(125): 6528 sl_startup(83): 8 init_sl_stats(110): 8 Total: 6544 }
Module: tmx { tmx_init_pretran_table(90): 512 Total: 512 }
Module: kex { pkg_proc_stats_init(79): 1632 Total: 1632 }
Module: usrloc { build_stat_name(52): 64 new_udomain(94): 49152 new_udomain(87): 48 new_dlist(563): 16 new_dlist(555): 32 init_ulcb_list(45): 16 Total: 49328 }
Module: pike { pike_init(125): 16 lock_set_alloc(70): 1040 init_ip_tree(115): 4112 pike_init(107): 8 Total: 5176 }
Module: websocket { wsconn_init(109): 16 wsconn_init(100): 8192 wsconn_init(89): 8 wsconn_init(79): 8 Total: 8224 }
Module: cfgutils { mod_init(863): 8 mod_init(857): 8 mod_init(849): 8 Total: 24 }
Module: nathelper { mod_init(483): 8 Total: 8 }
Module: dispatcher { reindex_dests(629): 1600 ds_avl_insert(3064): 2568 ds_set_attrs(257): 48 pack_dest(355): 216 init_data(226): 16 init_data(218): 16 ds_parse_reply_codes(987): 816 mod_init(308): 8 mod_init(306): 8 ds_ping_active_init(118): 8 Total: 5304 } ```
@miconda any other info I can gather that would help locate the issue here? The issue is recurring.
Thank you
We also have a test system with the issue reproduced if a closer look is needed @miconda
Posting a $100 bounty on this issue if anyone wants to take a look.
With busy traveling during March, this issue got out of my sight. It looks like a problem with TM UAC, the allocated size there is big. Could be related to uac_reg (uac remote registrations), I will dig into it a bit.
Allocated size for tm is indeed very high and it also looks like being induced from uac module, because the uac_reg_update() allocates size of UUID (less than 100) for each registration update, overall being 11472688, means over 100k remote registrations. Do you have so many records in you uacreg table?
After starting kamailio, but before the first error log message containing `no more shm memory`, do you have other error messages in the logs (especially from tm or uac modules)?
Any reason you are running kamailio printing logs to stderror (terminal) and not to syslog?
@miconda the number of records in uacreg is significantly less than that, that makes me wonder if the hash table cleanup is to blame
``` Apr 16 00:05:37 ip-10-0-0-0 kamailio[12057]: ERROR: uac [uac_reg.c:913]: uac_reg_tm_callback(): authentication failed for <user1234> Apr 16 00:05:37 ip-10-0-0-0 kamailio[12040]: ERROR: uac [uac_reg.c:1032]: uac_reg_tm_callback(): got sip response 500 while registering [test] Apr 16 01:39:03 ip-10-0-0-0 kamailio[12054]: ERROR: tls [tls_server.c:192]: tls_complete_init(): tls: ssl bug #1491 workaround: not enough memory for safe operation: shm=17097536 threshold1=17301504 Apr 16 01:39:03 ip-10-0-0-0 kamailio[12054]: ERROR: <core> [core/tcp_read.c:1485]: tcp_read_req(): ERROR: tcp_read_req: error reading - c: 0x7f7ace43eb68 r: 0x7f7ace43ebe8 ```
In regards to the logs, that's something that we use for tests in the lab
Which hash table cleanup?
Are these all the error messages you get from starting kamailio till the out of memory errors?
@miconda what I meant is the non-expiring registration records stored in memory
As to the error messages, those are about it, everything else is repetitive and the remaining logs consist of registrations, test calls, etc.
Perhaps there's a good way to export all shm data related to uac?
also doubling the bounty for this, hoping the community would help us get a stable 5.1.7+
Have you tried moving to db and see what happens?
On Thu, 18 Apr 2019 at 18:26, Jay J notifications@github.com wrote:
also doubling the bounty for this, hoping the community would help us get a stable 5.1.7+
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kamailio/kamailio/issues/1895#issuecomment-484601643, or mute the thread https://github.com/notifications/unsubscribe-auth/ABO7UZPN4ZU3SEW34RA6TMDPRCVLZANCNFSM4G7MZHHQ . _______________________________________________ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
@kamailio-sync could you please clarify? Not sure I understand
@Jaykah - if they repeat, is important to see how many times -- can you count how many error messages that contain the next texts are from start to out of memory logs?
``` authentication failed for got sip response 500 while registering ```
Something like:
``` grep "authentication failed for" /path/to/logfile | wc -l ``` Just be sure that the logfile has only the logs from startup to out of memory, not older logs.
@miconda ~2000 matches for "authentication failed for" ~2800 matches for "got sip response 500 while registering"
@miconda We got rid of the trunks/accounts causing 500 errors and auth failure, yet are still seeing the leak in the kamailio process.
Any other way to troubleshoot this?
Upping bounty to $250
@Jaykah - bug bounties are not really common in the Kamailio project so far. Commercial support options can be found on e.g. in the directory on kamailio.org. I will write you also an email with some more information.
@henningw will do that instead, thanks!
@miconda what would be the best way to dump the kamailio uac memory segments to see those records and compare with actual registrations?
@miconda just wanted to follow up on this - dumping those records should show us what the issue might be, how can that be inspected?
Is there a way to get access to the system exposing the issue and do some troubleshooting there (it may require sudo privileges)? The leak is identified by the number of allocated memory chunks by uac, with side effects to transactions, but the reason for it is yet unknown. With Kamailio World 2019 being finished, I expect to have more time for troubleshooting here.
@miconda yes, I do have a machine where the issue is reproduced, where can I email the information? If you could reply to that email with your public key I'll add that right away as well.
You can write to miconda [at] gmail.com
Update after I got access to a system exposing the issue.
The timer process of kamailio was using 100% CPU, being blocked, and by that not releasing/removing active/terminated transactions, leading to filling the memory - top output:
``` PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 13582 root 20 0 2274068 18012 12856 R 99.7 0.4 6179:53 kamailio ```
The kamclt ps:
``` { IDX: 14 PID: 13582 DSC: timer } ```
Attaching with gdb revealed the following backtrace:
``` #0 0x00007fec0b6ea7d8 in yaSSL::States::What (this=this@entry=0x55b99e2fbf70) at ./extra/yassl/src/yassl_int.cpp:208 #1 0x00007fec0b6eab31 in yaSSL::SSL::GetError (this=this@entry=0x55b99e2fb600) at ./extra/yassl/src/yassl_int.cpp:1520 #2 0x00007fec0b6f89fb in yaSSL::receiveData (ssl=..., data=..., peek=peek@entry=false) at ./extra/yassl/src/handshake.cpp:1047 #3 0x00007fec0b6de65d in yaSSL::yaSSL_read (ssl=ssl@entry=0x55b99e2fb600, buffer=buffer@entry=0x55b99e2f91a0, sz=sz@entry=4) at ./extra/yassl/src/ssl.cpp:363 #4 0x00007fec0b6c6797 in vio_ssl_read (vio=0x55b99e2d83b0, buf=0x55b99e2f91a0 "@", size=<optimized out>) at ./vio/viossl.c:186 #5 0x00007fec0b69fbae in net_read_raw_loop (count=4, net=0x7fec0c0378f0) at ./sql/net_serv.cc:672 #6 net_read_packet_header (net=0x7fec0c0378f0) at ./sql/net_serv.cc:762 #7 net_read_packet (net=0x7fec0c0378f0, complen=0x7ffe188c6ff0) at ./sql/net_serv.cc:822 #8 0x00007fec0b6a0c2c in my_net_read (net=net@entry=0x7fec0c0378f0) at ./sql/net_serv.cc:899 #9 0x00007fec0b694edc in cli_safe_read_with_ok (mysql=mysql@entry=0x7fec0c0378f0, parse_ok=parse_ok@entry=0 '\000', is_data_packet=is_data_packet@entry=0x0) at ./sql-common/client.c:1061 #10 0x00007fec0b69518f in cli_safe_read (mysql=mysql@entry=0x7fec0c0378f0, is_data_packet=is_data_packet@entry=0x0) at ./sql-common/client.c:1194 #11 0x00007fec0b695fa8 in cli_read_query_result (mysql=0x7fec0c0378f0) at ./sql-common/client.c:5245 #12 0x00007fec0b697d61 in mysql_real_query (mysql=0x7fec0c0378f0, query=<optimized out>, length=<optimized out>) at ./sql-common/client.c:5341 #13 0x00007fec0bc9a2ae in db_mysql_submit_query (_h=0x7fec0c1a4808, _s=0x7fec0b45a2d0 <sql_str>) at km_dbase.c:116 #14 0x00007fec0b249f6d in db_do_submit_query (_h=0x7fec0c1a4808, _query=0x7fec0b45a2d0 <sql_str>, submit_query=0x7fec0bc99b36 <db_mysql_submit_query>) at db_query.c:53 #15 0x00007fec0b24d10d in db_do_delete (_h=0x7fec0c1a4808, _k=0x7ffe188c7310, _o=0x0, _v=0x7ffe188c7330, _n=2, val2str=0x7fec0bc98136 <db_mysql_val2str>, submit_query=0x7fec0bc99b36 <db_mysql_submit_query>) at db_query.c:291 #16 0x00007fec0bc9ed64 in db_mysql_delete (_h=0x7fec0c1a4808, _k=0x7ffe188c7310, _o=0x0, _v=0x7ffe188c7330, _n=2) at km_dbase.c:515 #17 0x00007fec09669714 in remove_dialog_from_db (cell=0x7feb84c98a98) at dlg_db_handler.c:714 #18 0x00007fec09645375 in destroy_dlg (dlg=0x7feb84c98a98) at dlg_hash.c:371 #19 0x00007fec0964cca0 in dlg_unref_helper (dlg=0x7feb84c98a98, cnt=2, fname=0x7fec096dd53f "dlg_handlers.c", fline=410) at dlg_hash.c:961 #20 0x00007fec09671fb5 in dlg_ontdestroy (t=0x7feb84dbf640, type=131072, param=0x7ffe188c7620) at dlg_handlers.c:410 #21 0x00007fec099dfb05 in run_trans_callbacks_internal (cb_lst=0x7feb84dbf6b8, type=131072, trans=0x7feb84dbf640, params=0x7ffe188c7620) at t_hooks.c:260 #22 0x00007fec099dfc2c in run_trans_callbacks (type=131072, trans=0x7feb84dbf640, req=0x0, rpl=0x0, code=0) at t_hooks.c:287 ---Type <return> to continue, or q <return> to quit--- #23 0x00007fec09915f45 in free_cell_helper (dead_cell=0x7feb84dbf640, silent=0, fname=0x7fec09a28882 "timer.c", fline=689) at h_table.c:166 #24 0x00007fec099cdbf6 in wait_handler (ti=1667374146, wait_tl=0x7feb84dbf6c8, data=0x7feb84dbf640) at timer.c:689 #25 0x000055b99c9f02c5 in timer_list_expire (t=1667374146, h=0x7feb8461a050, slow_l=0x7feb8461d988, slow_mark=1904) at core/timer.c:874 #26 0x000055b99c9f076d in timer_handler () at core/timer.c:939 #27 0x000055b99c9f0c36 in timer_main () at core/timer.c:978 #28 0x000055b99c942322 in main_loop () at main.c:1727 #29 0x000055b99c949a1b in main (argc=7, argv=0x7ffe188c7ca8) at main.c:2696 ```
Showing that the processes executing code from inside libmysqlclient, related to tcp/tls connection. I noticed in config that tls module is used and the libssl version is 1.1, so this can be a side effect of the issue related to the new locking API in libssl/libcrypto.
The solutions to try would be either to compile against libssl 1.0.x or use the pre-loaded library workaround:
* https://github.com/kamailio/kamailio/tree/master/src/modules/tls/utils/opens...
Run it with one of these two and let's see if it works fine or the issue appears again.
Adding also that the mysql timeout might not work always, as presented in manual:
* https://dev.mysql.com/doc/refman/5.5/en/mysql-options.html
Citing:
``` MYSQL_OPT_READ_TIMEOUT (argument type: unsigned int *)
The timeout in seconds for each attempt to read from the server. There are retries if necessary, so the total effective timeout value is three times the option value. You can set the value so that a lost connection can be detected earlier than the TCP/IP Close_Wait_Timeout value of 10 minutes.
Implementation of this timeout uses mechanisms that may not be available on all platforms. On such a platform, a client that issues a read call might under certain circumstances wait without timing out. For example, a client might not time out if the server is not responding because it is waiting for a “disk full” condition to clear. ``` Therefore you should also check if there are problems with mysql server.
FYI- we have to build our own debian9 pkgs due to the openssl mem leak issues not sure why this isn't in the kamailio's debian build system yet because im pretty sure there is an issue on GH related to it this is from our `build.sh` container:
# fix for libssl bug apt-get -y purge libssl-dev && apt-get -y -qq install libssl1.0-dev
...
sed -ie 's/libssl-dev/libssl1.0-dev/g' pkg/kamailio/deb/debian/control
On Wed, Jun 12, 2019 at 2:33 AM Daniel-Constantin Mierla < notifications@github.com> wrote:
Adding also that the mysql timeout might not work always, as presented in manual:
Citing:
MYSQL_OPT_READ_TIMEOUT (argument type: unsigned int *)
The timeout in seconds for each attempt to read from the server. There are retries if necessary, so the total effective timeout value is three times the option value. You can set the value so that a lost connection can be detected earlier than the TCP/IP Close_Wait_Timeout value of 10 minutes.
Implementation of this timeout uses mechanisms that may not be available on all platforms. On such a platform, a client that issues a read call might under certain circumstances wait without timing out. For example, a client might not time out if the server is not responding because it is waiting for a “disk full” condition to clear.
Therefore you should also check if there are problems with mysql server.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kamailio/kamailio/issues/1895?email_source=notifications&email_token=ABO7UZNZ7RC26L4NVJPT4GDP2CKBZA5CNFSM4G7MZHH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODXPMR3A#issuecomment-501139692, or mute the thread https://github.com/notifications/unsubscribe-auth/ABO7UZPCOHEPLSKYC7JKXLTP2CKBZANCNFSM4G7MZHHQ . _______________________________________________ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
strings /usr/lib/x86_64-linux-gnu/libcrypto.so | grep "^OpenSSL" OpenSSLDie OpenSSL_add_all_ciphers OpenSSL_add_all_digests OpenSSL 1.0.2n 7 Dec 2017 OpenSSL HMAC method OpenSSL EC algorithm OpenSSL RSA method OpenSSL DSA method OpenSSL ECDSA method OpenSSL DH Method OpenSSL X9.42 DH method OpenSSL PKCS#3 DH method OpenSSL ECDH method OpenSSL 'dlfcn' shared library method OpenSSL default OpenSSL default user interface OpenSSL CMAC method
Yet the issue is still reproducible
(gdb) bt n #0 0x00007fab34880860 in __errno_location@plt () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #1 0x00007fab348bb589 in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #2 0x00007fab348bb849 in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #3 0x00007fab34895bc0 in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #4 0x00007fab34896c2c in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #5 0x00007fab3488aedc in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #6 0x00007fab3488bfa8 in ?? () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #7 0x00007fab3488dd61 in mysql_real_query () from /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 #8 0x00007fab34e902ae in db_mysql_submit_query (_h=0x7fab35237ee0, _s=0x7fab346502d0 <sql_str>) at km_dbase.c:116 #9 0x00007fab3443ff6d in db_do_submit_query (_h=0x7fab35237ee0, _query=0x7fab346502d0 <sql_str>, submit_query=0x7fab34e8fb36 <db_mysql_submit_query>) at db_query.c:53 #10 0x00007fab3444310d in db_do_delete (_h=0x7fab35237ee0, _k=0x7ffdad2ea400, _o=0x0, _v=0x7ffdad2ea420, _n=2, val2str=0x7fab34e8e136 <db_mysql_val2str>, submit_query=0x7fab34e8fb36 <db_mysql_submit_query>) at db_query.c:291 #11 0x00007fab34e94d64 in db_mysql_delete (_h=0x7fab35237ee0, _k=0x7ffdad2ea400, _o=0x0, _v=0x7ffdad2ea420, _n=2) at km_dbase.c:515 #12 0x00007fab3285f714 in remove_dialog_from_db (cell=0x7faaae3e6fd0) at dlg_db_handler.c:714 #13 0x00007fab3283b375 in destroy_dlg (dlg=0x7faaae3e6fd0) at dlg_hash.c:371 #14 0x00007fab32842ca0 in dlg_unref_helper (dlg=0x7faaae3e6fd0, cnt=2, fname=0x7fab328d353f "dlg_handlers.c", fline=410) at dlg_hash.c:961 #15 0x00007fab32867fb5 in dlg_ontdestroy (t=0x7faaae56ea00, type=131072, param=0x7ffdad2ea710) at dlg_handlers.c:410 #16 0x00007fab32bd5b05 in run_trans_callbacks_internal (cb_lst=0x7faaae56ea78, type=131072, trans=0x7faaae56ea00, params=0x7ffdad2ea710) at t_hooks.c:260 #17 0x00007fab32bd5c2c in run_trans_callbacks (type=131072, trans=0x7faaae56ea00, req=0x0, rpl=0x0, code=0) at t_hooks.c:287 #18 0x00007fab32b0bf45 in free_cell_helper (dead_cell=0x7faaae56ea00, silent=0, fname=0x7fab32c1e882 "timer.c", fline=689) at h_table.c:166 #19 0x00007fab32bc3bf6 in wait_handler (ti=1826413493, wait_tl=0x7faaae56ea88, data=0x7faaae56ea00) at timer.c:689 #20 0x0000556a3d1b32c5 in timer_list_expire (t=1826413493, h=0x7faaad841200, slow_l=0x7faaad8418b8, slow_mark=72) at core/timer.c:874 #21 0x0000556a3d1b376d in timer_handler () at core/timer.c:939 #22 0x0000556a3d1b3c36 in timer_main () at core/timer.c:978 #23 0x0000556a3d105322 in main_loop () at main.c:1727 #24 0x0000556a3d10ca1b in main (argc=7, argv=0x7ffdad2ead98) at main.c:2696
@Jaykah - do you still see this error, maybe after updating to a new kamailio version?
Hi @henningw - yes, I do. I saw a similar problem on the mailing list and we are also using RDS - perhaps that's the issue (deadlock?). It looks like this happens on a record delete for the dialog - looks like db_mode 0 resolves this - so far instances with that option enabled have not experienced the issue. However, we are looking for a more permanent fix that would allow us to use db_mode 1
As I looked again over the issue, I see in the backtrace (I attached) that the mysql client library is using yassl not libssl (which I expected).
@Jaykah - do you know if the library is installed from the default ubuntu repository or is something customized for RDS? Also, for you backtrace, install debugging symbols to see where the code get stuck in libmysqlclient library. There is another report on sr-dev with a blocking happening in libmysqlclient, maybe the backtrace with debug symbols will show the same like there.
@Jaykah - any update on this one?
@henningw we have stuck to db_mode 0, so the issue no longer resurfaces. It is still present, however, with db_mode 1.
@henningw we have stuck to db_mode 0, so the issue no longer resurfaces. It is still present, however, with db_mode 1.
Are there any other versions affected anyone knows of ?
On Fri, Nov 15, 2019 at 5:10 AM Jay J notifications@github.com wrote:
@henningw https://github.com/henningw we have stuck to db_mode 0, so the issue no longer resurfaces. It is still present, however, with db_mode
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/kamailio/kamailio/issues/1895?email_source=notifications&email_token=ABO7UZMYCAMX7QQA4EVJICLQT2NRPA5CNFSM4G7MZHH2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEEFMAXY#issuecomment-554352735, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABO7UZILG3EC53NEAKRRC3DQT2NRPANCNFSM4G7MZHHQ . _______________________________________________ Kamailio (SER) - Development Mailing List sr-dev@lists.kamailio.org https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
It seems that kamailio in Debian stable (5.2.1-1) is affected by this bug too.
Closed #1895.
This is likely related to libmysqlclient changes for ubuntu, like reported by #2394. On Debian stable I was not able to reproduce on a couple of heavy duty deployments, so might be something different.
I am closing this one, if you get it on Debian stable again, open a new issue and attach the the backtrace, ideally for a kamailio 5.3 or 5.3
On Debian stable I'm periodically getting following error: ``` /usr/sbin/kamailio[13996]: ERROR: tls [tls_server.c:192]: tls_complete_init(): tls: ssl bug #1491 workaround: not enough memory for safe operation: shm=7494944 threshold1=7864320 ```