Not just in IMS but in general, media proxies usually expect traffic from both ends before they start relaying it from one endpoint to the other. This is how a media proxy learns where the RTP will come from and where it is supposed to go.
During call setup, the media proxy advertises its own address to both the caller (by changing the SDP in the 200 OK) and the callee (by changing the SDP in the initial INVITE), so that both know where to send RTP. The media proxy, on the other hand, is NOT aware of the actual address from which it will receive media: the address in the original SDP it gets from the caller or the callee may be a private address, or there may be some other NAT issue involved (e.g. symmetric NAT). Therefore, as soon as the call is established, the media proxy waits for incoming RTP from both the caller side and the callee side. When the caller sends RTP, the media proxy learns the caller's actual address from the source of those packets; when the callee sends RTP, it learns the callee's actual address the same way. Once it knows both, it can forward the RTP it receives from the caller to the callee's learned address, and vice versa.
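To make the learning step concrete, here is a minimal sketch in Python of the behaviour described above. It is only an illustration under simplifying assumptions, not how any particular media proxy is implemented: the class name `LatchingRelay`, the `on_packet` method and the example addresses are all made up, no real sockets are used, and packets that arrive before the other side's address is known are simply dropped (a real proxy may handle that case differently).

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (ip, port) as the proxy sees it on the wire


class LatchingRelay:
    """Toy model of one call inside a media proxy (hypothetical names)."""

    def __init__(self) -> None:
        # Learned source address per side; None until that side sends RTP.
        self.learned: Dict[str, Optional[Address]] = {"caller": None, "callee": None}

    @staticmethod
    def _other(side: str) -> str:
        return "callee" if side == "caller" else "caller"

    def on_packet(
        self, side: str, source: Address, payload: bytes
    ) -> Optional[Tuple[Address, bytes]]:
        """Handle a packet received on the proxy port allocated for `side`.

        Returns (destination, payload) if the packet can be relayed, else None.
        """
        if self.learned[side] is None:
            # Learn the address from the first packet seen on the wire,
            # rather than trusting the address written in the SDP.
            self.learned[side] = source
        destination = self.learned[self._other(side)]
        if destination is None:
            # The other side has not sent anything yet: nowhere to relay to.
            return None
        return destination, payload


relay = LatchingRelay()
first = relay.on_packet("caller", ("203.0.113.10", 40000), b"rtp-1")
second = relay.on_packet("callee", ("198.51.100.7", 50000), b"rtp-2")
third = relay.on_packet("caller", ("203.0.113.10", 40000), b"rtp-3")
```

In this sketch the caller's first packet cannot be relayed because the callee has not sent anything yet, which is why the proxy needs traffic from both ends before audio flows in both directions.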
Thank you.