Hello,
We have Kamailio 1.4 (r. 5728) running in production. We compiled
Kamailio with four times the default pkg memory pool size (#define
PKG_MEM_POOL_SIZE 4*1024*1024).
We use snmpstats to monitor registrations and the number of calls
with Nagios and PRTG.
A few days ago, after almost 30 days of continuous operation without
problems, we ran "snmpwalk -v2c -c public 192.168.88.22
.1.3.6.1.4.1.27483" from the shell, as we routinely do for a quick
command-line check of SNMP, and it failed. We then checked the kamailio
logs and found this:
May 21 16:25:35 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:handle_openserSIPServiceStartTime: failed to read
sysUpTime file at /tmp/openSER_SNMPAgent.txt
I don't know what caused the above (maybe someone removed the file by
mistake; I did not check at the time), but it was followed by this:
May 21 16:25:35 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:executeInterprocessBufferCmd: Received a request for
contact: [protected]@[protected] for user:
sip:[protected]@[protected] who doesn't exists
May 21 16:25:35 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:executeInterprocessBufferCmd: Received a request to
delete contact: [protected]@[protected] for user:
sip:[protected]@[protected] who doesn't exist
... then the same "contact XXX who doesn't exist" message was logged
several times per second until:
May 21 16:36:30 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:createRegUserRow: failed to create a row for
openserSIPRegUserTable
May 21 16:36:30 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:updateUser: openserSIPRegUserTable ran out of memory.
Not able to add user: [protected]@[protected]
May 21 16:36:30 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:executeInterprocessBufferCmd: Received a request for
contact: [protected]@[protected] for user:
sip:[protected]@[protected] who doesn't exists
May 21 16:36:30 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:insertContactRecord: no more pkg memory
May 21 16:36:30 ipx022 /usr/local/sbin/kamailio[8781]:
ERROR:snmpstats:executeInterprocessBufferCmd: openserSIPRegUserTable
was unable to allocate memory for adding contact:
[protected]@[protected] to user
sip:[protected]@[protected].
During this time, snmpget queries issued by Nagios and PRTG were
failing and the process spawned by module snmpstats was making heavy
use of CPU.
No other parts of kamailio were affected (calls and registrations were
OK), but snmpstats did not recover by itself, so we decided to restart
kamailio.
I'm trying to reproduce this on a lab machine, but no luck so far.
I am keeping the lab server busy with registrations and calls while
running snmpwalk in a loop. I have also run "rm
/tmp/openSER_SNMPAgent.txt" to check whether that would trigger the
problem, but it did not.
If I push the lab server hard enough, I see similar log messages:
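For anyone who wants to attempt the same reproduction, the polling side
can be sketched roughly as below. This is an illustrative sketch, not
the exact script used in the lab: the function name poll_snmp and the
HOST, COMMUNITY, OID and WALK_CMD variables are made up for the example,
with defaults taken from the snmpwalk command quoted earlier in this
message.

```shell
#!/bin/sh
# Hypothetical helper: walk the snmpstats subtree repeatedly and count
# how many walks fail. All variable names here are assumptions; the
# defaults mirror the snmpwalk command quoted in this message.
HOST="${HOST:-192.168.88.22}"
COMMUNITY="${COMMUNITY:-public}"
OID="${OID:-.1.3.6.1.4.1.27483}"
# WALK_CMD can be overridden to exercise the loop without a live agent.
WALK_CMD="${WALK_CMD:-snmpwalk}"

poll_snmp() {
    # $1 = number of walks to run; prints the number of failed walks
    iterations="$1"
    failures=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # A non-zero exit status from the walk counts as one failure.
        if ! "$WALK_CMD" -v2c -c "$COMMUNITY" "$HOST" "$OID" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

Calling "poll_snmp 1000" while the registration and call load is running
prints how many of the 1000 walks failed.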
May 27 20:16:54 ipx029 /usr/local/sbin/kamailio[14487]:
ERROR:snmpstats:updateUser: openserSIPRegUserTable ran out of memory.
Not able to add user: [protected]@[protected]
May 27 20:16:54 ipx029 /usr/local/sbin/kamailio[14487]:
ERROR:snmpstats:executeInterprocessBufferCmd: Received a request for
contact: [protected]@[protected] for user: sip:[protected]@[protected]
who doesn't exists
May 27 20:17:05 ipx029 /usr/local/sbin/kamailio[14487]:
ERROR:snmpstats:get_socket_list_from_proto: no more pkg memory
May 27 20:17:05 ipx029 last message repeated 9 times
May 27 20:17:59 ipx029 /usr/local/sbin/kamailio[14487]:
ERROR:snmpstats:handle_openserSIPServiceStartTime: failed to read
sysUpTime file at /tmp/openSER_SNMPAgent.txt
May 27 20:40:25 ipx029 /usr/local/sbin/kamailio[14487]:
ERROR:snmpstats:handle_openserSIPServiceStartTime: failed to read
sysUpTime file at /tmp/openSER_SNMPAgent.txt
But even with those messages, snmpget/snmpwalk keep returning responses.
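To compare the lab and production logs, a per-function tally of the
snmpstats errors can help. A minimal sketch, assuming the syslog file is
passed as the first argument; the function name count_snmpstats_errors
is hypothetical, and syslog's "last message repeated N times" lines are
not counted:

```shell
#!/bin/sh
# Hypothetical helper: count ERROR:snmpstats:<function> log entries per
# function name, most frequent first.
count_snmpstats_errors() {
    # $1 = path to the syslog file to scan
    grep -o 'ERROR:snmpstats:[A-Za-z_]*' "$1" \
        | sed 's/^ERROR:snmpstats://' \
        | sort | uniq -c \
        | awk '{ print $1, $2 }' \
        | sort -k1,1nr -k2,2
}
```

Each output line has the form "<count> <function>", so a runaway
function such as executeInterprocessBufferCmd should stand out at the
top.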
I was hoping someone could take a look at the snmpstats code; the log
statements in snmpstats:executeInterprocessBufferCmd are accompanied by
comments like:
/* This should never happen. This is more of a sanity check. */
so my server probably hit a bug.
Regards,
takeshi