Tuesday, February 22, 2011

Why Oracle Clusterware Needs an Odd Number of Voting Disks – A Statistical Explanation


During Hayasaka-san and Okuma-san's business trip to OARDC, I was asked this question:

    Why do we need an odd number of voting disks in the cluster?



I didn't answer the question directly; instead I talked about the rule:

    Each node in the cluster can survive only if it can access more than half of the voting disks.

    In other words, the cluster can function normally only as long as fewer than half of the voting disks have failed.

And then I talked about various failure scenarios for the network heartbeat and the disk heartbeat.



Actually, I was not satisfied with my answer, so I looked for another way to convince people that we should use an odd number of voting disks. I think it can be explained statistically, from an overall system availability point of view.

As you know, many hardware manufacturers rate their products' availability as 0.99… (many 9s). We can get the same kind of figure for the devices used as voting disks.



Let's start from a well-known mathematics equation:

Suppose:

    Disk availability A=0.9

    Disk un-availability U=0.1

    N hard disks used for voting disks

Then the probability of each voting disk failure scenario can be expressed by the binomial distribution:

    P(exactly k of N disks failed) = C(N, k) × U^k × A^(N−k)
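As a quick sanity check, the binomial term above can be computed directly. Below is a minimal Python sketch (the function name `p_exactly_k_failed` is mine, and it defaults to the U = 0.1 assumption from this post):

```python
from math import comb

def p_exactly_k_failed(n, k, u=0.1):
    """Probability that exactly k of n voting disks have failed,
    given per-disk un-availability u (availability a = 1 - u)."""
    a = 1.0 - u
    return comb(n, k) * (u ** k) * (a ** (n - k))

# Example: with N = 3 disks, exactly 2 down: C(3,2) * 0.1^2 * 0.9 = 0.027
print(p_exactly_k_failed(3, 2))
```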



Under the rule from my previous answer, let's suppose: M = floor(N/2)

    That is: M is the biggest integer not exceeding N/2, so each node must be able to access at least M + 1 voting disks to survive.

The overall system un-availability due to voting disk failure is then the probability that N − M or more disks fail:

    Un-availability(N) = Σ[k = N−M .. N] C(N, k) × U^k × A^(N−k)
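This un-availability formula can be turned into a short script that reproduces the table that follows, and also fills in the values elided with "…". A sketch under the same assumptions (A = 0.9, M = floor(N/2)); the function name is mine:

```python
from math import comb, floor

def unavailability(n, u=0.1):
    """Probability that the cluster loses its voting disk majority:
    with m = floor(n / 2), a node needs access to at least m + 1 disks,
    so the cluster is down once n - m or more of the n disks have failed."""
    a = 1.0 - u
    m = floor(n / 2)
    return sum(comb(n, k) * (u ** k) * (a ** (n - k))
               for k in range(n - m, n + 1))

# Print the N / M / un-availability table for N = 1 .. 15.
for n in range(1, 16):
    print(f"{n:2d}  {floor(n / 2)}  {unavailability(n):.6f}")
```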

Oracle can support up to 15 voting disks; let's take a look at the final result:

    Voting Disks (N)   M   Un-availability
     1                 0   0.1
     2                 1   0.19
     3                 1   0.028
     4                 2   0.0523
     5                 2   0.00856
     6                 3   0.01585
     7                 3   …
     8                 4   …
     9                 4   …
    10                 5   …
    11                 5   …
    12                 6   …
    13                 6   …
    14                 7   …
    15                 7   …


Conclusion:

    We can see that the overall system availability is reduced when going from an odd number of voting disks to the next even number. Adding a voting disk does not always provide higher availability; in some cases it is actually harmful to availability.
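The conclusion is easy to verify numerically with the same binomial sum (a Python sketch; `unavailability` is my own helper implementing the formula derived earlier in this post):

```python
from math import comb, floor

def unavailability(n, u=0.1):
    """Probability that fewer than a majority of the n voting disks survive."""
    a = 1.0 - u
    m = floor(n / 2)  # a node must access at least m + 1 disks
    return sum(comb(n, k) * (u ** k) * (a ** (n - k))
               for k in range(n - m, n + 1))

# Odd -> next even: un-availability RISES (availability gets worse) ...
assert unavailability(4) > unavailability(3)   # 0.0523  > 0.028
assert unavailability(6) > unavailability(5)   # 0.01585 > 0.00856
# ... while odd -> next odd improves availability.
assert unavailability(5) < unavailability(3)   # 0.00856 < 0.028
```

Intuitively, going from 2k+1 to 2k+2 disks raises the majority threshold by one but adds only one disk, so the number of tolerable failures stays at k while there are more disks that can fail.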
