[SR-Users] gruu and dead-lock in registrar module

30 Apr 2013


Hi,
We're hitting an issue in a deployment where all UDP receivers end up 
sitting in FUTEX_WAIT, caused by save() -> lock_udomain(); they seem to 
deadlock themselves every couple of days.
Looking at the code, enable_gruu in registrar is active by default, and 
in lookup there is a code path

    /* temp-gruu lookup */
    res = ul.get_urecord_by_ruid(_d, ahash, &inst, &r, &ptr);

where no lock_udomain is obtained. However, when execution falls 
through to the "done:" marker, it calls

    ul.unlock_udomain(_d, &aor);

without having called ul.lock_udomain first.
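
To make the suspected pattern concrete, here is a minimal sketch of the control flow described above. It is a toy model, not the registrar or usrloc code: toy_lock_t, toy_lock(), toy_unlock(), toy_lookup_suspect() and toy_lookup_guarded() are hypothetical names, and the "lock" only counts calls so that an unmatched unlock becomes visible. What the real lock implementation does on an unmatched unlock is exactly the open question, so nothing here should be read as a statement about it.

```c
#include <stdbool.h>

/* Toy stand-in for a domain lock: it only counts calls (hypothetical type). */
typedef struct {
    int depth;       /* lock calls minus matching unlock calls */
    int unbalanced;  /* unlock calls made while nothing was locked */
} toy_lock_t;

static void toy_lock(toy_lock_t *l)
{
    l->depth++;
}

static void toy_unlock(toy_lock_t *l)
{
    if (l->depth == 0) {
        l->unbalanced++;  /* unlock without a prior lock */
        return;
    }
    l->depth--;
}

/* Pattern as described in the post: the temp-gruu branch skips the lock,
 * but the shared exit label always unlocks. */
static void toy_lookup_suspect(toy_lock_t *l, bool is_temp_gruu)
{
    if (is_temp_gruu) {
        /* the record would be fetched by ruid here, without locking */
        goto done;
    }
    toy_lock(l);
    /* the normal AoR lookup would happen here */
done:
    toy_unlock(l);
}

/* One possible guard (an assumption, not a proposed patch):
 * remember whether the lock was taken and only then release it. */
static void toy_lookup_guarded(toy_lock_t *l, bool is_temp_gruu)
{
    bool locked = false;

    if (!is_temp_gruu) {
        toy_lock(l);
        locked = true;
    }
    /* the lookup work would happen here */
    if (locked)
        toy_unlock(l);
}
```

In this model, calling toy_lookup_suspect() with is_temp_gruu set records one unbalanced unlock, while toy_lookup_guarded() never does. Whether the real code path needs a lock around the ruid lookup, rather than just a guarded unlock, is something only a review of the module can settle.
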
1.) Could someone please review this part? It looks a bit suspicious, 
although I don't know what implicitly happens in this case. If it were a 
semaphore and you decremented it to -1 without a prior increment, that 
would essentially cause a deadlock, but the current locking 
implementation might work completely differently.
2.) Since I have no clue how gruu is supposed to work in detail, and 
since our config doesn't explicitly handle gruu (no lookup in 
loose-route, but gruu is enabled by default in registrar and we didn't 
explicitly turn it off), I'm not even sure we ever hit this code 
path. I only see that the ruid column in the location table is filled. 
To get to this part, though, the ";gr" flag needs to be set in the 
R-URI for a lookup(), and I don't know whether that happened in some 
call flows (we only log $ru, which I don't think logs these 
parameters, right?).
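
If the full R-URI string does turn up in a log or trace, a check for the ";gr" parameter could look like the sketch below. It is deliberately simplistic and not taken from any SIP stack: uri_has_gr_param() is a hypothetical helper, it compares the parameter name case-sensitively, and it treats every ';' in the string as a parameter separator, which a real URI parser would not do.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the URI string carries a "gr" parameter,
 * either bare (";gr") or with a value (";gr=..."). Hypothetical helper. */
static bool uri_has_gr_param(const char *uri)
{
    const char *p = uri;

    while ((p = strchr(p, ';')) != NULL) {
        p++;  /* step past the ';' to the parameter name */
        if (strncmp(p, "gr", 2) == 0 &&
            (p[2] == '\0' || p[2] == '=' || p[2] == ';' || p[2] == '?'))
            return true;
    }
    return false;
}
```

The character after "gr" is checked so that a parameter such as ";group=1" is not mistaken for a gruu marker.
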
Any input is highly appreciated!
Andreas
