If a primary process goes down, an "FT_CHANGE" happens and the mirror becomes the primary.
I came across an instance where the mirror process was made to "Fail over" just as it became "Ready" (a process is "Ready" once it finishes initialization and sends a message to the controlling system). This resulted in some bizarre behaviour.
After looking at the logs, we concluded that the process had sent the "Ready" signal before some of the threads created with "pthread_create" were actually up. The code does check the return values of those calls, but a successful return only means the thread was created; it does not guarantee that the thread is up and running at that moment.
The lesson: wait until the child thread comes up and sends the main thread an acknowledgement before making any calls to that thread (as we already do in most of the other processes).
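A minimal sketch of that acknowledgement, assuming a mutex plus condition variable is used for the handshake (the post does not say how the other processes implement it). All names here (startup_sync, worker_main, start_workers_then_report_ready, send_ready_to_controller) are hypothetical, and the printf merely stands in for the real "Ready" message to the controlling system.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define MAX_WORKERS 8

/* Hypothetical start-up handshake state shared by main and worker threads. */
struct startup_sync {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             started; /* number of workers that have acknowledged */
};

static struct startup_sync sync_state = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

static void *worker_main(void *arg)
{
    (void)arg;
    /* ... thread-specific initialization would go here ... */

    /* Acknowledge: tell the main thread this worker is now running. */
    pthread_mutex_lock(&sync_state.lock);
    sync_state.started++;
    pthread_cond_signal(&sync_state.cond);
    pthread_mutex_unlock(&sync_state.lock);

    /* ... the worker's main loop would go here; this sketch just returns ... */
    return NULL;
}

/* Hypothetical stand-in for the message sent to the controlling system. */
static void send_ready_to_controller(void)
{
    printf("Ready\n");
}

/* Starts n workers, waits until every one has acknowledged, and only then
   reports "Ready". Returns the number of acknowledged workers, or -1 if a
   thread could not be created (cleanup of already-created threads is
   omitted in this sketch). */
static int start_workers_then_report_ready(pthread_t *threads, int n)
{
    assert(n >= 0 && n <= MAX_WORKERS);

    pthread_mutex_lock(&sync_state.lock);
    sync_state.started = 0;
    pthread_mutex_unlock(&sync_state.lock);

    for (int i = 0; i < n; i++) {
        /* A zero return means the thread exists, not that worker_main has run. */
        if (pthread_create(&threads[i], NULL, worker_main, NULL) != 0)
            return -1;
    }

    pthread_mutex_lock(&sync_state.lock);
    while (sync_state.started < n) /* loop, not if: guards against spurious wakeups */
        pthread_cond_wait(&sync_state.cond, &sync_state.lock);
    int started = sync_state.started;
    pthread_mutex_unlock(&sync_state.lock);

    send_ready_to_controller(); /* safe: every worker has checked in */
    return started;
}
```

The key design choice is that "Ready" is sent only after the wait loop exits, so a failover that arrives right after "Ready" can no longer find a worker that has not started yet. The caller would still join (or otherwise manage) the worker threads at shutdown.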
1 comment:
Hi Gayan,
Where are you working? :)
I mean, nowadays it is really hard to find a company that achieves software fault tolerance by running multiple concurrent copies of the same software. So I think what you are doing and implementing is great.
Executing several copies of the software concurrently brings up problems like state synchronization, heartbeat checking, etc.
But anyway, these topics are great, and I would like to work on them. It would be great if you could share the name of the company you work for. That would be a good starting point for me.
Regards.