Hello,
which version are you running? I'd like to look at the right branch first when troubleshooting, then check the other branches that might be affected.
Cheers, Daniel
On 4/30/13 5:07 PM, Andreas Granig wrote:
Hi,
We're hitting an issue in a deployment where all UDP receivers end up sitting in FUTEX_WAIT, caused by save() -> lock_udomain(), and seem to deadlock every couple of days.
Looking at the code, enable_gruu in registrar is active by default, and in lookup there is a code path
    /* temp-gruu lookup */
    res = ul.get_urecord_by_ruid(_d, ahash, &inst, &r, &ptr);
but no lock_udomain is obtained. However, when the execution falls through to the "done:" marker, it does
    ul.unlock_udomain(_d, &aor);
without having called ul.lock_udomain first.
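To make the concern concrete, here is a minimal sketch of the pattern described above. It is not the registrar code: mock_lock_t, mock_lock(), mock_unlock(), lookup_unbalanced() and lookup_balanced() are hypothetical names invented for illustration, and the mock lock only counts mismatched unlocks instead of doing real locking. Whether ul.get_urecord_by_ruid acquires the lock internally is not checked here, so this only shows the shape of the problem if it does not.

```c
/* Hypothetical stand-in for the usrloc domain lock. It does no real
 * locking; it only records unlock calls made while the lock is not held. */
typedef struct {
    int held;        /* 1 while locked, 0 otherwise */
    int bad_unlocks; /* number of unlock calls without a matching lock */
} mock_lock_t;

static void mock_lock(mock_lock_t *l)
{
    l->held = 1;
}

static void mock_unlock(mock_lock_t *l)
{
    if (!l->held) {
        l->bad_unlocks++; /* unbalanced unlock detected */
        return;
    }
    l->held = 0;
}

/* Shape of the suspicious path: the temp-gruu branch skips the lock but
 * still reaches the shared "done:" label, which unlocks unconditionally. */
static int lookup_unbalanced(mock_lock_t *l, int is_temp_gruu)
{
    if (is_temp_gruu) {
        /* lookup by ruid would happen here, no lock taken */
        goto done;
    }
    mock_lock(l);
    /* regular AoR lookup would happen here, under the lock */
done:
    mock_unlock(l);
    return l->bad_unlocks;
}

/* One possible way to keep the calls paired (an assumption, not the
 * project's fix): remember whether this call actually took the lock. */
static int lookup_balanced(mock_lock_t *l, int is_temp_gruu)
{
    int locked = 0;

    if (!is_temp_gruu) {
        mock_lock(l);
        locked = 1;
    }
    /* lookup work would happen here */
    if (locked)
        mock_unlock(l);
    return l->bad_unlocks;
}
```

Calling lookup_unbalanced() with is_temp_gruu set to 1 on a fresh mock lock records one unbalanced unlock, while lookup_balanced() records none on either path.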
1.) Could someone please review this part? It looks a bit suspicious, although I don't know what implicitly happens in this case. If it were a semaphore and you decreased it to -1 by decrementing it without a prior increment, that would essentially cause a deadlock, but the current locking implementation might work completely differently.
2.) Since I have no clue how gruu is supposed to work in detail, and since our config doesn't explicitly handle gruu (no lookup in loose-route, but gruu is enabled by default in registrar and we didn't explicitly turn it off), I'm not even sure we ever hit this code path. I only see that the ruid column in the location table is filled, but to reach this part, the ";gr" flag needs to be set in the R-URI for a lookup(), and I don't know whether that happened in some call flows (we only log $ru, which I don't think logs these parameters, right?).
Any input is highly appreciated!
Andreas
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list
sr-users@lists.sip-router.org
http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users