Hi all,
we are running Kamailio 3.1.2 in a production environment, using the dialog module, and it crashed two hours ago.
Here are the logs we got (additional log fragments with the acc records involved in this call are appended at the end of the mail):

Mar 2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28927]: CRITICAL: dialog [dlg_hash.c:599]: bogus ref -1 with cnt 1 for dlg 0x7f23f472db30 [2490:1070436595] with clid 'e0a20cb844d211e0acd8001422093865@<CLIENT IP>' and tags '1577886432-3759264324-335599788-1698171170' ''
Mar 2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28927]: : <core> [mem/q_malloc.c:446]: BUG: qm_free: freeing already freed pointer, first free: dialog: dlg_cb.c: destroy_dlg_callbacks_list(80) - aborting
Mar 2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: ALERT: <core> [main.c:741]: child process 28927 exited by a signal 6
Mar 2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: ALERT: <core> [main.c:744]: core was not generated
Mar 2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28896]: INFO: <core> [main.c:756]: INFO: terminating due to SIGCHLD
Mar 2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28948]: INFO: <core> [main.c:807]: INFO: signal 15 received
Mar 2 14:43:05 kamailio2 /usr/local/sbin/kamailio[28942]: INFO: <core> [main.c:807]: INFO: signal 15 received

We got the Kamailio code from git last week:

sercmd> core.info
{
    version: kamailio 3.1.2
    id: 4ace86
    compiler: gcc 4.3.2
    compiled: 09:12:36 Feb 23 2011
    flags: STATS: Off, USE_IPV6, USE_TCP, USE_TLS, TLS_HOOKS, USE_RAW_SOCKS, DISABLE_NAGLE, USE_MCAST, DNS_IP_HACK, SHM_MEM, SHM_MMAP, PKG_MALLOC, DBG_QM_MALLOC, USE_FUTEX, FAST_LOCK-ADAPTIVE_WAIT, USE_DNS_CACHE, USE_DNS_FAILOVER, USE_NAPTR, USE_DST_BLACKLIST, HAVE_RESOLV_RES
}
The problem looks like this other one already fixed: http://lists.sip-router.org/pipermail/sr-users/2009-November/027351.html
We have set Kamailio to debug level in case it happens again.
On the other hand, I need to know why the core is not being generated. I have already checked the points mentioned in http://www.kamailio.org/dokuwiki/doku.php/troubleshooting:corefiles
1. disable_core_dump is not set in the config file.
2. From /etc/default/kamailio:

    ...
    DUMP_CORE=yes
    ...

3. From /etc/init.d/kamailio:

    ...
    if test "$DUMP_CORE" = "yes" ; then
        # set proper ulimit
        ulimit -c unlimited

        # directory for the core dump files
        COREDIR=/home/corefiles
        [ -d $COREDIR ] || mkdir $COREDIR
        chmod 777 $COREDIR
        echo "$COREDIR/core.%e.sig%s.%p" > /proc/sys/kernel/core_pattern
    fi
    ...

4. Write permissions of $COREDIR:

    ls -hall /home
    ...
    drwxrwxrwx 2 root root 4.0K 2010-12-21 09:15 corefiles
    ...
What else should I check?
Thanks in advance, regards
Antón
*Acc records related to the dialog whose destruction causes the problem:*
Mar 2 14:42:44 kamailio2 /usr/local/sbin/kamailio[28902]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1299073364;method=INVITE;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@<client IP>;code=200;reason=OK;src_user=<caller number>;src_domain=<client IP>;dst_ouser=<called number>;dst_user=<called number>;dst_domain=10.90.1.251;src_ip=<client IP>
...
Mar 2 14:42:44 kamailio2 /usr/local/sbin/kamailio[28920]: NOTICE: acc [acc.c:275]: ACC: request acknowledged: timestamp=1299073364;method=ACK;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@<client IP>;code=200;reason=OK;src_user=<caller number>;src_domain=<client IP>;dst_ouser=<called number>;dst_user=<called number>;dst_domain=10.90.1.251;src_ip=<client IP>
...
Mar 2 14:43:00 kamailio2 /usr/local/sbin/kamailio[28903]: ERROR: <script>: ACK WITHOUT MATCHING TRANSACTION in e0a20cb844d211e0acd8001422093865@<client IP> call... ignore and discard.
...
Mar 2 14:43:00 kamailio2 /usr/local/sbin/kamailio[28904]: NOTICE: acc [acc.c:275]: ACC: transaction answered: timestamp=1299073380;method=BYE;from_tag=1577886432-3759264324-335599788-1698171170;to_tag=5FFAEA34-6A;call_id=e0a20cb844d211e0acd8001422093865@<client IP>;code=200;reason=OK;src_user=<caller number>;src_domain=<client IP>;dst_ouser=<called number>;dst_user=<called number>;dst_domain=10.90.1.251;src_ip=<client IP>
Hello,
it looks related to the callbacks of the dialog module. Are you loading other modules that require the dialog module?
Checking the timestamps in the acc records and the crash log, the BYE for the dialog arrived before the crash, but the To-tag is not printed by dlg_hash.c, although it is present in the acc records for INVITE and BYE. Do you have parallel forking in front of this SIP server? I mean, is there another proxy that can do parallel forking and then send two or more branches to this instance?
I will dig in more to see what went wrong there.
Thanks, Daniel
_______________________________________________
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
Hello,
thanks for your quick reply, my answer is inline.
2011/3/2 Daniel-Constantin Mierla miconda@gmail.com
Hello,
it looks related to the callbacks of the dialog module. Are you loading other modules that require the dialog module?
We are using some features of the dialog module, such as ending dialogs after a timeout period, and we are using the engage_mediaproxy() function as well. It's an old configuration that we had to put into production without enough time to test it. Do you recommend not using the dialog module unless strictly required?
Checking the timestamps in the acc records and the crash log, the BYE for the dialog arrived before the crash, but the To-tag is not printed by dlg_hash.c, although it is present in the acc records for INVITE and BYE. Do you have parallel forking in front of this SIP server? I mean, is there another proxy that can do parallel forking and then send two or more branches to this instance?
AFAIK the client sending those calls is not doing parallel forking; they are sending calls over a SIP trunk to our Kamailio. They are calling PSTN numbers and we are sending those calls to a gateway, so they shouldn't be doing parallel forking. I'll get some traces to check it.
I will dig in more to see what went wrong there.
I got the attached debug level logs for the reported problem, hope it helps. I'm still trying to find out why the core is not being generated.
Thanks, Daniel
Thanks a lot, Anton
Hey,
On 03.03.2011 10:19, Anton Roman wrote:
Checking the timestamps in the acc records and the crash log, the BYE for the dialog arrived before the crash, but the To-tag is not printed by dlg_hash.c, although it is present in the acc records for INVITE and BYE. Do you have parallel forking in front of this SIP server? I mean, is there another proxy that can do parallel forking and then send two or more branches to this instance?
AFAIK the client sending those calls is not doing parallel forking; they are sending calls over a SIP trunk to our Kamailio. They are calling PSTN numbers and we are sending those calls to a gateway, so they shouldn't be doing parallel forking. I'll get some traces to check it.
Your trace shows that there are two worker processes dealing with the segfault-triggering dialog, process ID 32155 and 32158. I cannot see from your trace what module caused the latter process to execute unref_dlg() in dlg_hash.c, however.
What I can tell though is that the crash happens because too much dialog reference counter decrementing takes place. Although I have no clue why, I believe the implementation of unref_dlg_unsafe() (a macro) could be somewhat more robust by not unlinking and destroying a dialog when the counter drops below zero. That is, instead of running the following block
    if ((_dlg)->ref<=0) { \
        unlink_unsafe_dlg( _d_entry, _dlg);\
        LM_DBG("ref <=0 for dialog %p\n",_dlg);\
        destroy_dlg(_dlg);\
    }\
for _dlg->ref <= 0, I see no reason to change the compare operator to ==.
Of course, that just cures the symptoms. A coredump would be really helpful in identifying the root of the crash problem but I don't know why it wasn't generated in your case. Your configuration looks good to me.
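One more thing that may be worth ruling out, as a general Linux check and not a confirmed cause in this case: on Linux, a process whose effective user or group ID changes has its "dumpable" flag reset to the value of /proc/sys/fs/suid_dumpable, and a non-dumpable process produces no core file regardless of ulimit and core_pattern. The sketch below is a hypothetical stand-alone helper (the names read_first_line and report_core_settings are made up for illustration, they are not part of Kamailio) that prints the settings involved for the process that calls it. It assumes Linux with /proc mounted.

```c
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/resource.h>

/* Read the first line of a file into buf; returns 0 on success, -1 on error. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}

/* Print the settings that decide whether the calling process may dump core.
 * Returns 0 if the core size limit could be queried, -1 otherwise. */
static int report_core_settings(void)
{
    struct rlimit rl;
    char line[256];

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_CORE (soft): unlimited\n");
    else
        printf("RLIMIT_CORE (soft): %llu\n", (unsigned long long)rl.rlim_cur);

    /* 0 means the kernel will not write a core file for this process. */
    printf("dumpable flag: %d\n", prctl(PR_GET_DUMPABLE, 0, 0, 0, 0));

    if (read_first_line("/proc/sys/kernel/core_pattern", line, sizeof line) == 0)
        printf("core_pattern: %s\n", line);
    else
        printf("core_pattern: (unreadable)\n");

    if (read_first_line("/proc/sys/fs/suid_dumpable", line, sizeof line) == 0)
        printf("suid_dumpable: %s\n", line);
    else
        printf("suid_dumpable: (unreadable)\n");

    return 0;
}
```

Calling report_core_settings() from a small test program started the same way as the daemon (same init script, same user switch) shows what that environment really ends up with. A dumpable flag of 0 together with suid_dumpable 0 would point in this direction.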
Cheers,
--Timo
Argh:
On 03.03.2011 11:11, Timo Reimann wrote:
What I can tell though is that the crash happens because too much dialog reference counter decrementing takes place. Although I have no clue why,
^^^^^^^^^^^^^^^^^
...the crash happens,
I believe the implementation of unref_dlg_unsafe() (a macro) could be somewhat more robust by not unlinking and destroying a dialog when the counter drops below zero. That is, instead of running the following block
    if ((_dlg)->ref<=0) { \
        unlink_unsafe_dlg( _d_entry, _dlg);\
        LM_DBG("ref <=0 for dialog %p\n",_dlg);\
        destroy_dlg(_dlg);\
    }\
for _dlg->ref <= 0, I see no reason to change the compare operator to ==.
I see no reason *not* to change compare operator to ==. That is, I want the block to execute iff the reference counter is found to be zero.
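To make the idea concrete, here is a minimal stand-alone sketch of it. It is not the dialog module's code: struct toy_dlg and toy_unref() are hypothetical names invented for this example, and the "destroy" step is only a counter. It shows that with == a counter that has already reached zero cannot trigger a second destruction; the extra unref is only reported.

```c
#include <stdio.h>

/* Hypothetical stand-in for a dialog; NOT Kamailio's struct dlg_cell. */
struct toy_dlg {
    int ref;        /* reference counter */
    int destroyed;  /* number of times the destroy step has run */
};

/* Drop cnt references.
 * Returns  1 if this call destroyed the dialog (counter reached exactly 0),
 *          0 if the dialog is still referenced,
 *         -1 if the counter went below zero (bogus unref; nothing destroyed). */
static int toy_unref(struct toy_dlg *d, int cnt)
{
    d->ref -= cnt;
    if (d->ref < 0) {
        /* Too many unrefs: report it, but do not unlink or destroy again. */
        fprintf(stderr, "bogus ref %d with cnt %d\n", d->ref, cnt);
        return -1;
    }
    if (d->ref == 0) {
        /* Exactly zero: the last reference is gone, destroy once. */
        d->destroyed++;
        return 1;
    }
    return 0;
}
```

With ref starting at 1, the first toy_unref(&d, 1) destroys the dialog and a second, spurious call only logs "bogus ref -1 with cnt 1". With <= in the second test, that spurious call would run the destroy step again, which matches the double free seen in the log.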
--Timo
Hello,
I just committed a safety check for this case. If anyone can test it, we can then backport it.
I will analyze why it got into this state, but in any case it is better and safer to detect bogus dereferences of dialogs than to crash.
Thanks, Daniel
Hi,
more than 3 million calls have been processed and no problem (crash, growth in memory allocation...) has been noticed since the update, so this check works for us.
Thanks a lot, regards
Hello,
On 3/3/11 10:19 AM, Anton Roman wrote:
Hello,
thanks for your quick reply, my answer is inline.
2011/3/2 Daniel-Constantin Mierla <miconda@gmail.com>
Hello,
it looks related to the callbacks of the dialog module. Are you loading other modules that require the dialog module?
We are using some features of the dialog module, such as ending dialogs after a timeout period, and we are using the engage_mediaproxy() function as well. It's an old configuration that we had to put into production without enough time to test it. Do you recommend not using the dialog module unless strictly required?
usage of the dialog module has always been safe and has worked great for me. But I use it mostly alone, never with the mediaproxy module, just with the pua_dialoginfo module in some cases. From the logs, the crash was related to the callback system that the dialog module exports for other modules willing to hook into dialogs, which is why I asked about the other modules, to be sure there is at least one binding to dialog.
So, as with other modules, if a problem is discovered there, it is important that we fix it; this module is used a lot by many. Therefore usage is encouraged when needed :-)
Cheers, Daniel
Ok,
I updated the code on the server. I will test the changes on Tuesday and send feedback to the list.
We find the dialog module very useful because of the information and functionality it provides. For example, we are using its exported function dlg_end_dlg to cleanly end all the active calls when Kamailio has to be stopped for maintenance. We are also using the dlg_bridge function to implement click-to-dial applications and it works fine.
On the other hand, in the logs of the server where we detected the unreference problem, we quite often get the log lines shown below. I don't know whether they are related to the unreference problem. Since they have CRITICAL log level, I'm not sure whether they indicate a real problem or whether Kamailio can safely deal with this:
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: CRITICAL: dialog [dlg_hash.c:615]: bogus event 5 in state 4 for dlg 0x7f2d0a3d30e0 [306:1818049706] with clid ' 92515995-3508071667-342415@usmiap1etx02.mydomain.com' and tags '3508071667-342428' '7A242CC-0'
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: dialog [dlg_hash.c:770]: dialog 0x7f2d0a3d30e0 changed from state 4 to state 4, due event 5
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm [t_lookup.c:1379]: DEBUG: t_newtran: msg id=4077 , global msg id=4076 , T on entrance=(nil)
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm [t_lookup.c:528]: t_lookup_request: start searching: hash=356, isACK=0
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm [t_lookup.c:470]: DEBUG: RFC3261 transaction matched, tid=3178c7ec929daf0e4ade2b303de82a20
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm [t_lookup.c:728]: DEBUG: t_lookup_request: transaction found (T=0x7f2d0a82bca0)
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: tm [t_reply.c:1430]: DEBUG: reply retransmitted. buf=0x7f2d2eff4160: SIP/2.0 5..., shmem=0x7f2d0a72cb90: SIP/2.0 5
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: dialog [dlg_hash.c:599]: unref dlg 0x7f2d0a3d30e0 with 1 -> 3
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: <core> [usr_avp.c:646]: DEBUG:destroy_avp_list: destroying list (nil)
Mar 2 17:21:30 kamailio2 /usr/local/sbin/kamailio[32153]: DEBUG: <core> [usr_avp.c:646]:

From dlg_hash.h:

    ...
    DLG_STATE_CONFIRMED 4 /*!< confirmed dialog */
    ...
    DLG_EVENT_REQPRACK 5 /*!< PRACK request */
    ...
I understand this means we are receiving a PRACK in a confirmed dialog (ACK already received), doesn't it? I guess it is due either to an error in the SIP stack on the caller side or to the PRACK being a retransmission caused by networking issues (not likely, I think).
Thanks a lot, regards
Antón