Hi all,
I'm running 3.1-branch git head currently under valgrind. And I've seen several invalid reads and writes (apparently most are off by one).
==22274== Invalid read of size 1 ==22274== at 0x6DA2C0A: pv_set_ruri_user (pv_core.c:1755) ==22274== by 0x13BA15: lval_pvar_assign (lvalue.c:357) ==22274== by 0x13BF0F: lval_assign (lvalue.c:405) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x13A86F: run_actions_safe (action.c:1607) ==22274== by 0x1E2666: rval_get_int (rvalue.c:904) ==22274== by 0x1E5252: rval_expr_eval_int (rvalue.c:1866) ==22274== by 0x131B8A: do_action (action.c:1071) ==22274== by 0x13A6F1: run_actions (action.c:1555)
Is fairly obvious. modules_k/pv/pv_core.c has several places which take backup of "val->rs.s[val->rs.len]" and replaces it with zero for string termination. It's than later on set back to the original value. However, it would appear that the value passed does not always point to a memory area which is large enough. This results in multiple invalid reads and writes of one.
The remaining reads are not so clear to me.
==22274== Invalid read of size 1 ==22274== at 0x4811A69: strncpy (mc_replace_strmem.c:339) ==22274== by 0x6DA77FF: set_var_value (pv_svar.c:122) ==22274== by 0x6DA1EDB: pv_set_scriptvar (pv_core.c:1636) ==22274== by 0x13BA15: lval_pvar_assign (lvalue.c:357) ==22274== by 0x13BF0F: lval_assign (lvalue.c:405) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x131E05: do_action (action.c:1086) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x131E05: do_action (action.c:1086)
==22274== Invalid read of size 2 ==22274== at 0x4811E1F: memcpy (mc_replace_strmem.c:523) ==22274== by 0x1E134D: rval_new_str (rvalue.c:269) ==22274== by 0x1E394E: rval_convert (rvalue.c:1301) ==22274== by 0x1E4780: rval_str_add2 (rvalue.c:1610) ==22274== by 0x1E8681: rval_expr_eval (rvalue.c:2399) ==22274== by 0x13BBD9: lval_assign (lvalue.c:389) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x139229: do_action (action.c:1342) ==22274== by 0x13A6F1: run_actions (action.c:1555)
Any ideas how to fix these? Or any instructions what to provide more in the bug report?
Cheers, Timo
On 03/31/2011 07:52 PM, Timo Teräs wrote:
Hi all,
I'm running 3.1-branch git head currently under valgrind. And I've seen several invalid reads and writes (apparently most are off by one).
Hello,
I haven't used Valgrind memcheck tool for Kamailio , but from what I know it might not be that useful. What Valgrind does it hooks into the memory allocation functions like malloc/calloc/realloc/alloca and keeps track of said allocation. Because, by default, Kamailio uses it's own memory manager (both for private memory or shared), this messes up on how Valgrind can keep track of its allocation/deallocation.
I don't know if there is a way to make Valgrind to work reliable with K memory managers, but I will take a look on the cases you provide just to be sure.
As a side note, I had uses static analysis tools quite succesfully on 3.1 . I had use clang analyzer from the llvm suite. There are still some outstanding bugs which I didn't had time to look at.
Marius
==22274== Invalid read of size 1 ==22274== at 0x6DA2C0A: pv_set_ruri_user (pv_core.c:1755) ==22274== by 0x13BA15: lval_pvar_assign (lvalue.c:357) ==22274== by 0x13BF0F: lval_assign (lvalue.c:405) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x13A86F: run_actions_safe (action.c:1607) ==22274== by 0x1E2666: rval_get_int (rvalue.c:904) ==22274== by 0x1E5252: rval_expr_eval_int (rvalue.c:1866) ==22274== by 0x131B8A: do_action (action.c:1071) ==22274== by 0x13A6F1: run_actions (action.c:1555)
Is fairly obvious. modules_k/pv/pv_core.c has several places which take backup of "val->rs.s[val->rs.len]" and replaces it with zero for string termination. It's than later on set back to the original value. However, it would appear that the value passed does not always point to a memory area which is large enough. This results in multiple invalid reads and writes of one.
The remaining reads are not so clear to me.
==22274== Invalid read of size 1 ==22274== at 0x4811A69: strncpy (mc_replace_strmem.c:339) ==22274== by 0x6DA77FF: set_var_value (pv_svar.c:122) ==22274== by 0x6DA1EDB: pv_set_scriptvar (pv_core.c:1636) ==22274== by 0x13BA15: lval_pvar_assign (lvalue.c:357) ==22274== by 0x13BF0F: lval_assign (lvalue.c:405) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x131E05: do_action (action.c:1086) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x131E05: do_action (action.c:1086)
==22274== Invalid read of size 2 ==22274== at 0x4811E1F: memcpy (mc_replace_strmem.c:523) ==22274== by 0x1E134D: rval_new_str (rvalue.c:269) ==22274== by 0x1E394E: rval_convert (rvalue.c:1301) ==22274== by 0x1E4780: rval_str_add2 (rvalue.c:1610) ==22274== by 0x1E8681: rval_expr_eval (rvalue.c:2399) ==22274== by 0x13BBD9: lval_assign (lvalue.c:389) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x139229: do_action (action.c:1342) ==22274== by 0x13A6F1: run_actions (action.c:1555)
Any ideas how to fix these? Or any instructions what to provide more in the bug report?
Cheers, Timo
SIP Express Router (SER) and Kamailio (OpenSER) - sr-users mailing list sr-users@lists.sip-router.org http://lists.sip-router.org/cgi-bin/mailman/listinfo/sr-users
On 04/01/2011 10:29 AM, marius zbihlei wrote:
On 03/31/2011 07:52 PM, Timo Teräs wrote:
Hi all,
I'm running 3.1-branch git head currently under valgrind. And I've seen several invalid reads and writes (apparently most are off by one).
Hello,
I haven't used Valgrind memcheck tool for Kamailio , but from what I know it might not be that useful. What Valgrind does it hooks into the memory allocation functions like malloc/calloc/realloc/alloca and keeps track of said allocation. Because, by default, Kamailio uses it's own memory manager (both for private memory or shared), this messes up on how Valgrind can keep track of its allocation/deallocation.
It does not make valgrind useless. Basically, it just makes valgrind not detect all errors. Any errors it does detect, should be valid.
I don't know if there is a way to make Valgrind to work reliable with K memory managers, but I will take a look on the cases you provide just to be sure.
There are valgrind hooks for memory managers. Should probably add them, so it becomes aware of the things. However, those should be only needed for memory pools. Ultimately kamailio uses malloc or mmap to allocate the memory and valgrind knows both.
As a side note, I had uses static analysis tools quite succesfully on 3.1 . I had use clang analyzer from the llvm suite. There are still some outstanding bugs which I didn't had time to look at.
Looks like all my bugs are related to pvars and kamailio modules.
The first really evil programming mistake I found is at: modules_k/textops/txtvar.c:tr_txt_eval_re()
In line 86, it will fills the transformed variable with tr_txt_buf and returns that. However, that is a stack variable, which will get corrupted pretty soon after function return. I'm not sure how to fix this properly.
Marius
==22274== Invalid read of size 1 ==22274== at 0x6DA2C0A: pv_set_ruri_user (pv_core.c:1755) ==22274== by 0x13BA15: lval_pvar_assign (lvalue.c:357) ==22274== by 0x13BF0F: lval_assign (lvalue.c:405) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x13A86F: run_actions_safe (action.c:1607) ==22274== by 0x1E2666: rval_get_int (rvalue.c:904) ==22274== by 0x1E5252: rval_expr_eval_int (rvalue.c:1866) ==22274== by 0x131B8A: do_action (action.c:1071) ==22274== by 0x13A6F1: run_actions (action.c:1555)
Is fairly obvious. modules_k/pv/pv_core.c has several places which take backup of "val->rs.s[val->rs.len]" and replaces it with zero for string termination. It's than later on set back to the original value. However, it would appear that the value passed does not always point to a memory area which is large enough. This results in multiple invalid reads and writes of one.
I fixed this by making a temporary copy of the string data to local stack variable and null terminating it there. That made valgrind happy. So obviously pv_set_ruri_user and others, are called with strings that are allocated for the exact length, and the hacky zero-termination cannot be used in this module.
The remaining reads are not so clear to me.
I'm analysing the rest problems. Seems that they are almost always related to using re.subst, which has the problem mentioned in the beginning of the mail.
- Timo
On 04/01/2011 10:38 AM, Timo Teräs wrote:
Hello,
Comments inline.
On 04/01/2011 10:29 AM, marius zbihlei wrote:
On 03/31/2011 07:52 PM, Timo Teräs wrote:
Hi all,
I'm running 3.1-branch git head currently under valgrind. And I've seen several invalid reads and writes (apparently most are off by one).
Hello,
I haven't used Valgrind memcheck tool for Kamailio , but from what I know it might not be that useful. What Valgrind does it hooks into the memory allocation functions like malloc/calloc/realloc/alloca and keeps track of said allocation. Because, by default, Kamailio uses it's own memory manager (both for private memory or shared), this messes up on how Valgrind can keep track of its allocation/deallocation.
It does not make valgrind useless. Basically, it just makes valgrind not detect all errors. Any errors it does detect, should be valid.
That is good to know.
I don't know if there is a way to make Valgrind to work reliable with K memory managers, but I will take a look on the cases you provide just to be sure.
There are valgrind hooks for memory managers. Should probably add them, so it becomes aware of the things. However, those should be only needed for memory pools. Ultimately kamailio uses malloc or mmap to allocate the memory and valgrind knows both.
The allocation inside the memory pools, that is what valgrind must be aware of.
The first really evil programming mistake I found is at: modules_k/textops/txtvar.c:tr_txt_eval_re()
In line 86, it will fills the transformed variable with tr_txt_buf and returns that. However, that is a stack variable, which will get corrupted pretty soon after function return. I'm not sure how to fix this properly.
Good call. I truly belive that should a "static" keyword in front of the buffer declaration. Can you test it. (because in the cfg you usually assign that transformation result to another pvar, the buffer will be deep copied soon enough).
diff --git a/modules_k/textops/txt_var.c b/modules_k/textops/txt_var.c index ae1fce6..a13e5e1 100644 --- a/modules_k/textops/txt_var.c +++ b/modules_k/textops/txt_var.c @@ -41,7 +41,7 @@ int tr_txt_eval_re(struct sip_msg *msg, tr_param_t *tp, int subtype, int nmatches; str* result; #define TR_TXT_BUF_SIZE 2048 - char tr_txt_buf[TR_TXT_BUF_SIZE]; + static char tr_txt_buf[TR_TXT_BUF_SIZE];
if(val==NULL || (!(val->flags&PV_VAL_STR)) || val->rs.len<=0) return -1;
Marius
==22274== Invalid read of size 1 ==22274== at 0x6DA2C0A: pv_set_ruri_user (pv_core.c:1755) ==22274== by 0x13BA15: lval_pvar_assign (lvalue.c:357) ==22274== by 0x13BF0F: lval_assign (lvalue.c:405) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x13A86F: run_actions_safe (action.c:1607) ==22274== by 0x1E2666: rval_get_int (rvalue.c:904) ==22274== by 0x1E5252: rval_expr_eval_int (rvalue.c:1866) ==22274== by 0x131B8A: do_action (action.c:1071) ==22274== by 0x13A6F1: run_actions (action.c:1555)
Is fairly obvious. modules_k/pv/pv_core.c has several places which take backup of "val->rs.s[val->rs.len]" and replaces it with zero for string termination. It's than later on set back to the original value. However, it would appear that the value passed does not always point to a memory area which is large enough. This results in multiple invalid reads and writes of one.
I fixed this by making a temporary copy of the string data to local stack variable and null terminating it there. That made valgrind happy. So obviously pv_set_ruri_user and others, are called with strings that are allocated for the exact length, and the hacky zero-termination cannot be used in this module.
Well, indeed that's a ugly hack. As str's are not 0 terminated, you either copy to a buffer length+1 or you overwrite the memory location next to the end and revert the change afterwards. Not the cleanest way to do it, but from a performance point of view, it make sense. If should be taken extra care to ensure that on every code path the backup is restored.
Marius
The remaining reads are not so clear to me.
I'm analysing the rest problems. Seems that they are almost always related to using re.subst, which has the problem mentioned in the beginning of the mail.
- Timo
On 04/01/2011 11:17 AM, marius zbihlei wrote:
On 04/01/2011 10:38 AM, Timo Teräs wrote:
On 04/01/2011 10:29 AM, marius zbihlei wrote:
On 03/31/2011 07:52 PM, Timo Teräs wrote:
The first really evil programming mistake I found is at: modules_k/textops/txtvar.c:tr_txt_eval_re()
In line 86, it will fills the transformed variable with tr_txt_buf and returns that. However, that is a stack variable, which will get corrupted pretty soon after function return. I'm not sure how to fix this properly.
Good call. I truly belive that should a "static" keyword in front of the buffer declaration. Can you test it. (because in the cfg you usually assign that transformation result to another pvar, the buffer will be deep copied soon enough).
static buffers are not nice solution and can break depending on the call chain. However, since if it's going to be deep copied anyway, it'll probably work here. I was not sure about the deep copying part, so I hesitated to suggest it as possible fix.
diff --git a/modules_k/textops/txt_var.c b/modules_k/textops/txt_var.c index ae1fce6..a13e5e1 100644 --- a/modules_k/textops/txt_var.c +++ b/modules_k/textops/txt_var.c @@ -41,7 +41,7 @@ int tr_txt_eval_re(struct sip_msg *msg, tr_param_t *tp, int subtype, int nmatches; str* result; #define TR_TXT_BUF_SIZE 2048
char tr_txt_buf[TR_TXT_BUF_SIZE];
static char tr_txt_buf[TR_TXT_BUF_SIZE]; if(val==NULL || (!(val->flags&PV_VAL_STR)) || val->rs.len<=0) return -1;
Yes, I tried this actually earlier right after sending my previous mail. It seems that this cured all the remaining valgrind errors I was seeing.
So yes, this should definitely pushed to all applicable git repos/branches.
==22274== Invalid read of size 1 ==22274== at 0x6DA2C0A: pv_set_ruri_user (pv_core.c:1755) ==22274== by 0x13BA15: lval_pvar_assign (lvalue.c:357) ==22274== by 0x13BF0F: lval_assign (lvalue.c:405) ==22274== by 0x139E4D: do_action (action.c:1472) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x12FDEB: do_action (action.c:711) ==22274== by 0x13A6F1: run_actions (action.c:1555) ==22274== by 0x13A86F: run_actions_safe (action.c:1607) ==22274== by 0x1E2666: rval_get_int (rvalue.c:904) ==22274== by 0x1E5252: rval_expr_eval_int (rvalue.c:1866) ==22274== by 0x131B8A: do_action (action.c:1071) ==22274== by 0x13A6F1: run_actions (action.c:1555)
Is fairly obvious. modules_k/pv/pv_core.c has several places which take backup of "val->rs.s[val->rs.len]" and replaces it with zero for string termination. It's than later on set back to the original value. However, it would appear that the value passed does not always point to a memory area which is large enough. This results in multiple invalid reads and writes of one.
I fixed this by making a temporary copy of the string data to local stack variable and null terminating it there. That made valgrind happy. So obviously pv_set_ruri_user and others, are called with strings that are allocated for the exact length, and the hacky zero-termination cannot be used in this module.
Well, indeed that's a ugly hack. As str's are not 0 terminated, you either copy to a buffer length+1 or you overwrite the memory location next to the end and revert the change afterwards. Not the cleanest way to do it, but from a performance point of view, it make sense. If should be taken extra care to ensure that on every code path the backup is restored.
Yes. My argument was that it absolutely does not work in two cases: - the memory allocated for the string is exactly the strings length (you'd be overwriting random places with the string terminator) - the same memory area is used for two overlapping strings: changing one byte would change all string references using that area
Though, in this instance the second item does not seem possible. I'm not sure if the first item can happen or not.
The remaining reads are not so clear to me.
I'm analysing the rest problems. Seems that they are almost always related to using re.subst, which has the problem mentioned in the beginning of the mail.
Yes. Looks like all the nags were related to usage re.subst results.
Thanks, Timo
On 04/01/2011 12:53 PM, Timo Teräs wrote:
On 04/01/2011 11:17 AM, marius zbihlei wrote:
static buffers are not nice solution and can break depending on the call chain. However, since if it's going to be deep copied anyway, it'll probably work here. I was not sure about the deep copying part, so I hesitated to suggest it as possible fix.
Hello,
I pushed the patch to master and 3.1.
Yes, I tried this actually earlier right after sending my previous mail. It seems that this cured all the remaining valgrind errors I was seeing.
So yes, this should definitely pushed to all applicable git repos/branches. Yes. My argument was that it absolutely does not work in two cases:
- the memory allocated for the string is exactly the strings length (you'd be overwriting random places with the string terminator)
- the same memory area is used for two overlapping strings: changing one byte would change all string references using that area
Though, in this instance the second item does not seem possible. I'm not sure if the first item can happen or not.
Well, I think the first case is what happens. it will not overwrite random places, it will place a \0 at the end of the allocated buffer(even the memory location is not part of that buffer). Because buffer[length] points to some memory in pkg memory pool, accessing it will not cause an access violation (it's not an address in .txt or .ro segments) . The only problem I forsee is that during the set of \0 and the revert to the original value, the address location would be accessed in some other way, but I doubt it (keep in mind that locations have to be properly aligned - so i presume in most cases a padding byte will be overwritten). Not a good practice, but if no bugs are present I don't recommend changing it.
Cheers, Marius
On 04/01/2011 01:33 PM, marius zbihlei wrote:
On 04/01/2011 12:53 PM, Timo Teräs wrote:
Yes. My argument was that it absolutely does not work in two cases:
- the memory allocated for the string is exactly the strings length (you'd be overwriting random places with the string terminator)
- the same memory area is used for two overlapping strings: changing one byte would change all string references using that area
Though, in this instance the second item does not seem possible. I'm not sure if the first item can happen or not.
Well, I think the first case is what happens. it will not overwrite random places, it will place a \0 at the end of the allocated buffer(even the memory location is not part of that buffer). Because buffer[length] points to some memory in pkg memory pool, accessing it will not cause an access violation (it's not an address in .txt or .ro segments) . The only problem I forsee is that during the set of \0 and the revert to the original value, the address location would be accessed in some other way, but I doubt it (keep in mind that locations have to be properly aligned - so i presume in most cases a padding byte will be overwritten). Not a good practice, but if no bugs are present I don't recommend changing it.
What if that byte is actually pkg memory pool management data? That breaks memory allocation which is done when doing deep copy?
But yes, the re.subst fix alone seems to have removed all the valgrind reported issues.
- Timo
On Friday 01 April 2011, Timo Teräs wrote:
[...] There are valgrind hooks for memory managers. Should probably add them, so it becomes aware of the things. However, those should be only needed for memory pools. Ultimately kamailio uses malloc or mmap to allocate the memory and valgrind knows both.
Hi Timo,
as a small side note: i think you could switch the PGK internal memory manager to use system malloc instead of its pool - maybe this makes it easier for valgrind. At least in 1.5 this was present, e.g.: https://openser.svn.sourceforge.net/svnroot/openser/branches/1.5/mem/mem.h (SYSTEM_MALLOC)
Cheers,
Henning