<!-- Kamailio Pull Request Template -->
<!-- IMPORTANT: - for detailed contributing guidelines, read: https://github.com/kamailio/kamailio/blob/master/.github/CONTRIBUTING.md - pull requests must be done to master branch, unless they are backports of fixes from master branch to a stable branch - backports to stable branches must be done with 'git cherry-pick -x ...' - code is contributed under BSD for core and main components (tm, sl, auth, tls) - code is contributed GPLv2 or a compatible license for the other components - GPL code is contributed with OpenSSL licensing exception -->
#### Pre-Submission Checklist <!-- Go over all points below, and after creating the PR, tick all the checkboxes that apply --> <!-- All points should be verified, otherwise, read the CONTRIBUTING guidelines from above--> <!-- If you're unsure about any of these, don't hesitate to ask on sr-dev mailing list --> - [x] Commit message has the format required by CONTRIBUTING guide - [x] Commits are split per component (core, individual modules, libs, utils, ...) - [x] Each component has a single commit (if not, squash them into one commit) - [x] No commits to README files for modules (changes must be done to docbook files in `doc/` subfolder, the README file is autogenerated)
#### Type Of Change - [x] Small bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds new functionality) - [ ] Breaking change (fix or feature that would change existing functionality)
#### Checklist: <!-- Go over all points below, and after creating the PR, tick the checkboxes that apply --> - [ ] PR should be backported to stable branches - [x] Tested changes locally - [x] Related to issue #4019
#### Description I assume that the crash #4019 happens, because of an issue in libcurl, namely in the `event_cb` function ([in this line](https://github.com/kamailio/kamailio/blob/master/src/modules/http_async_clie...)) kamailio removes the pointer to the `cell` object (which was attached to this `g->multi` curl descriptor earlier), and the `cell` object memory is released [at this line](https://github.com/kamailio/kamailio/blob/master/src/modules/http_async_clie...). However, the crash happens when `event_cb` is called when `curl_multi_remove_handle` is called [at this line](https://github.com/kamailio/kamailio/blob/master/src/modules/http_async_clie...). In this fix `cell` object is retrieved from the corresponding CURL descriptor and then it's compared with the `cell` object passed to this callback. With this change when this issue happens the `cell` object, which is retrieved from the corresponding CURL descriptor, is NULL pointer, because it was removed earlier [in this line](https://github.com/kamailio/kamailio/blob/master/src/modules/http_async_clie...), so the crash should not appear. However it might be fixed in other way, namely we may not to pass `cell` to the `event_cb` callback and always retrieve it from CURL descriptor. I would appreciate if developers comment that.
You can view, comment on, or merge this pull request online at:
https://github.com/kamailio/kamailio/pull/4020
-- Commit Summary --
* http_async_client: fixed crash on curl callback
-- File Changes --
M src/modules/http_async_client/http_multi.c (18)
-- Patch Links --
https://github.com/kamailio/kamailio/pull/4020.patch https://github.com/kamailio/kamailio/pull/4020.diff
@ivanuschak pushed 1 commit.
1021c39bbcf65b248cc8782881e09946e2549635 fixed clang format
Thanks for the report and for the fix proposal. However it is not clear to me how the crash is happening. In case of timeout (like it looks in your case) [this line](https://github.com/kamailio/kamailio/blob/5801ff9612d9eef9a7b06f91ff533a5902...) set sockptr to NULL so that, when sock_cb is called, "cell" iS ULL and [thsi condition](https://github.com/kamailio/kamailio/blob/5801ff9612d9eef9a7b06f91ff533a5902...) is false. From the backtrace we see that this is not the case in your crash because sockptr still holds the pointer to the cell that has already be unlinked and freed. I'm not sure what's happening in your case but: - which version of curl are you using? - could you please clarify/describe the scenario?
On the patch side: you are right that we could, in sock_cb, lookup for the cell by the easy handler, there would be no need of a cell_tmp to compare with in this case. Anyway this scenario should never occur this way so I would like to understand more.
> which version of curl are you using?
The libcurl version is 7.61.1-34
``` [centos@esrp-1b ~]$ rpm -qa | grep curl libcurl-minimal-7.61.1-34.el8.aarch64 curl-7.61.1-34.el8.aarch64 libcurl-devel-7.61.1-34.el8.aarch64 ```
> could you please clarify/describe the scenario?
The crash happens occasionally, I have not yet identified the moment and the context of this crash. The pcap file does not contain HTTP/HTTPs at the moment when it's crashing. I will observe and provide with more details.
> On the patch side: you are right that we could, in sock_cb, lookup for the cell by the easy handler, there would be no need of a cell_tmp to compare with in this case. Anyway this scenario should never occur this way so I would like to understand more.
I agree that it should be investigated more, and I don't consider this proposed fix as a long term solution. It can be considered as some kind of a temporary workaround until the root cause is identified. At first glance it looks like an issue in libcurl, but more details are needed to understand the context of this crash and to find the root cause. I will gather more details and provide them.
@ivanuschak: I've not been able to reproduce the crash, but could you try the changes in this branch grumvalski/http_async_cb_fix? It follows your suggestion of not using the *sockp passed to the sock_cb function.
@grumvalski yes, I will deploy and test the changes you suggested. It looks more correct than the current pull-request, because it just retrieves the `cell` object from the corresponding CURL descriptor.
@ivanuschak : have you been able to test the patch?
@grumvalski, I tested it on the environment where this issue does not occur and the fix works good there. However I'd like to see how it works on the env, where we experience this crash.
The deployment on the environment, where this issue is observed, is scheduled for tomorrow 05/12/2024. So it's not yet deployed. On average the crash occurs once every two days. After the fix is deployed I will be tracking the env within a week and will let you know how it works.
@ivanuschak : thank you, I'll wait for your feedback bedore merging.
@ivanuschak pushed 1 commit.
647a4c8117a959761608b2a979bdc3799c014ea3 http_async_client: fixed crash on curl callback1
@grumvalski last week the fix was installed on the environment where the crash occured. After installation the crash no longer occurs. `http_async_client` module works good with this fix.
Thanks for reporting back. So just to conclude this PR, the fix from @grumvalski branches was tested from you? This particular patch should be merged to master and stable branches, and this PR with a different approach is then not needed anymore?
@ivanuschak : thanks for reporting @henningw : the patch has been integrated in this PR, I'm going to merge this one and delete my temp branch.
@grumvalski approved this pull request.
Merged #4020 into master.