Iñaki Baz Castillo wrote:
2010/4/19 marius zbihlei marius.zbihlei@1and1.ro:
Hello Iñaki,
I had a look over the patches and they look fine. Of course, I think one of the core developers should also have a look.
I suggest one thing: instead of a plain read() on the read end of the pipe, can we use select()/poll() so that we have a timeout and do not block forever? For example, does it make sense to say that if the child process doesn't write anything to the pipe within, let's say, 1 minute, then it is blocked somewhere and the main process should exit with an error (so the init.d script returns != 0)?
Hello Iñaki,
Hi, in the proposed code, if the child process (the main process) exits due to an error, it writes nothing to the pipe and the parent process reads 0 bytes from it. That means an error has occurred, and the parent exits with -1. If the main process starts properly, it writes something to the pipe ("go"), so the master process reads 2 bytes (>0) and exits with 0.
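The handshake described above could look roughly like the sketch below. This is not the code from the patches; the function name start_and_wait() and its child_ok parameter are made up for illustration, and the child here only simulates a successful or failed startup.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that reports its startup status over a
 * pipe. child_ok != 0 simulates a successful startup (the child writes "go"),
 * child_ok == 0 simulates a failed startup (the child exits without writing).
 * Returns 0 if the parent read something from the pipe, -1 otherwise. */
int start_and_wait(int child_ok)
{
    int fds[2];
    char buf[2];
    ssize_t n;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* child ("main process"): keep only the write end */
        close(fds[0]);
        if (child_ok) {
            ssize_t w = write(fds[1], "go", 2);
            (void)w;
        }
        _exit(child_ok ? 0 : 1);
    }

    /* parent: close our own write end, otherwise read() can never see EOF */
    close(fds[1]);
    n = read(fds[0], buf, sizeof(buf));   /* blocks until data or EOF */
    close(fds[0]);
    waitpid(pid, NULL, 0);                /* reap the child; demo only */

    assert(n <= (ssize_t)sizeof(buf));
    return n > 0 ? 0 : -1;
}
```

Note that read() only returns 0 once every copy of the write end is closed, which is why the parent closes its own copy right after fork().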
Indeed, the main process is the process after the fork, and this is the process that writes to the pipe to signal the parent. I see two possible pitfalls:
1. If the main process blocks, the parent process blocks as well.
2. If the main process returns without writing the bytes while other child processes are still alive (TCP or UDP worker children, etc.), those children still hold the write end of the pipe open (inherited from the main process through fork), so the parent (master) process keeps blocking. I have not found a case where this actually happens, though.
In the case you suggest, if the main process gets blocked for some reason (it neither exits nor writes into the pipe), then, as you say, the parent process would get blocked too. Not good. Is it possible to do a blocking read of the pipe with a timeout of 4-5 seconds? Or is the select()/poll() stuff required for that?
With select() it is possible to block on the read for a limited time. I strongly suggest more than 4-5 seconds; I think 30s should be the minimum.
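A minimal sketch of such a timed read with select() follows. The helper name read_with_timeout() and its return codes are assumptions made for this example, not something taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_sec seconds for fd to become
 * readable, then read from it.
 * Returns >0 (bytes read), 0 on EOF, -1 on error, -2 on timeout. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_sec)
{
    fd_set rfds;
    struct timeval tv;
    int ret;

    assert(timeout_sec >= 0);

    do {
        /* select() may modify both arguments, so set them up on each try;
         * this simple version restarts the full timeout after EINTR */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;
        ret = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;      /* select() failed */
    if (ret == 0)
        return -2;      /* nothing arrived within timeout_sec */

    /* fd is readable: either data is available or all writers closed it */
    return read(fd, buf, len);
}
```

With this, the parent would call something like read_with_timeout(fd, buf, sizeof(buf), 30) and treat both 0 (EOF) and -2 (timeout) as a failed startup.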
Anyhow, I wonder if it would be enough. Note that in case the main process gets blocked and the parent process exits with -1 (due to the suggested timeout) the main process still remains running (even if blocked). Perhaps the parent process should kill it and ensure it's dead in case such timeout occurs?
Thanks a lot.
Good question. We can kill all the children from the main process, but I am not sure we can do this from the master process.
Marius
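On the question of killing the blocked main process after a timeout, here is a rough sketch of what the parent could do, under the assumption that the parent is the direct parent of the main process and still knows its pid. The helper name kill_and_reap() and the grace period are hypothetical. It does not reach worker children forked by the main process; that would need something extra, such as signalling a whole process group, which is not shown here.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: terminate a direct child and reap it.
 * Sends SIGTERM, polls for up to grace_sec seconds, then falls back to
 * SIGKILL. Returns 0 once the child has been reaped, -1 on error. */
int kill_and_reap(pid_t pid, int grace_sec)
{
    int i;
    pid_t r;

    assert(pid > 0);

    if (kill(pid, SIGTERM) < 0)
        return -1;

    for (i = 0; i < grace_sec; i++) {
        r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 0;       /* child exited after SIGTERM */
        if (r < 0)
            return -1;      /* not our child, or already reaped */
        sleep(1);
    }

    /* still alive after the grace period: SIGKILL cannot be ignored */
    if (kill(pid, SIGKILL) < 0)
        return -1;
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}
```

The parent would call this only on the timeout path, before exiting with -1, so that a blocked main process is not left behind.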