it turned out that the problem below was caused by a firewall that
blocked a tcp session if it had been idle for a few minutes. the problem
went away when i reduced tcp_connection_lifetime from 3610 to 120 sec.
i don't know if it is possible to configure tcp_connection_lifetime on a
per-connection basis. for example, a tcp connection to a UA could have
tcp_connection_lifetime=3610, since the tcp session is kept active by the
UA sending crlfs, whereas a tcp connection to another proxy could have a
shorter tcp_connection_lifetime.
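
for reference, the change was a single core setting in the config file.
a minimal sketch, with the rest of the config omitted (the value is in
seconds):

    # close tcp connections that have been idle for more than 120 sec,
    # i.e. before the firewall silently drops the session
    tcp_connection_lifetime=120
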
-- juha
-------------------------------------------------------------
i did some more debugging and wireshark shows that the 3.0 sr does not
even try to send anything to the 3.1 sr over the tcp connection, although
netstat now shows on both hosts that the connection is established.
instead, sr 3.0 replies immediately after receiving the invite from the ua:
SIP/2.0 477 Unfortunately error on sending to next hop occurred (477/TM)
there are no related messages in syslog. perhaps the tcp stack on the 3.0
host has not got acks for earlier packets and just waits there.
The most likely candidates are:
- blacklisted destination (due to some previous error).
  You could check it with sercmd dst_blacklist.view or
  dst_blacklist.debug.
- some local firewall rules on the OUTPUT chain
- running out of memory (but it's strange that you don't get any log
  messages)
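
The first two checks could be run on the 3.0 host roughly as follows.
This is only a sketch: the sercmd commands are the ones named above, while
the iptables invocation is an assumption (a Linux host using iptables,
usually run as root).

    # list the currently blacklisted destinations
    sercmd dst_blacklist.view
    # debug variant of the same listing
    sercmd dst_blacklist.debug
    # show the rules and packet counters on the OUTPUT chain
    iptables -L OUTPUT -n -v
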