The probability of an outcome of a particular event happening is the chance or likelihood of it happening. We all have an intuitive understanding of whether something is likely to happen, but mathematically we can determine how likely something is exactly.
An event has outcomes. These are possible things that could arise from the event. For instance, buying a raffle ticket is an event. It has two basic outcomes: Either you win or you don’t!

Probability of X happening = (Number of outcomes in which X happens) / (Total number of possible outcomes)
| Coin \ Die | 1 | 2 | 3 | 4 | 5 | 6 |
| Head | H1 | H2 | H3 | H4 | H5 | H6 |
| Tail | T1 | T2 | T3 | T4 | T5 | T6 |
The probability of any one of a set of equiprobable outcomes is equal to 1 divided by the number of outcomes. For instance, in the National Lottery, there are 49 numbered balls. They are all equally likely to appear, so the probability of any given ball appearing is 1 / 49.
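The coin-and-die outcomes and the "1 divided by the number of outcomes" rule can be sketched in a few lines of Python. This is only an illustration: the names `coin`, `die`, `sample_space` and `p_each` are made up for the example, and it assumes a fair coin and a fair die.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every combined outcome of one coin toss and one die throw, e.g. "H1" or "T6".
sample_space = [f"{c}{d}" for c, d in product(coin, die)]

# With equiprobable outcomes, each one has probability 1 / (number of outcomes).
p_each = Fraction(1, len(sample_space))

print(sample_space)
print(p_each)
```

Using `Fraction` rather than floating-point numbers keeps the probabilities exact, which makes them easier to compare with hand calculations.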
Suppose I had three red beads and three yellow beads in a bag. I reach in and pull out a bead at random and note its colour. Then I put the bead back in the bag and pull out a second bead at random. Because there are still three beads of each colour in the bag when I pull out the second bead, the probability of getting a red bead on the second draw is the same whatever colour came out of the bag on the first draw. This shows that the first and second draws are independent - the probabilities of the different outcomes on the second draw remain exactly the same regardless of what colour came out of the bag on the first draw.
On the other hand, suppose I didn't replace the first bead before pulling out the second bead. In this case, the odds of each colour on the second draw will be affected by the colour that came out on the first draw. For instance, if I pull out a red bead on the first draw, then the chances of a red bead on the second draw have been reduced slightly (0.4 instead of 0.5). On the other hand, if I pull out a yellow bead the first time round, then the probability of a red bead on the second draw increases slightly (0.6 instead of 0.5). In this case, the probabilities are changed for the second draw depending on the outcome of the first draw, so the two events are not independent.
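The two bead experiments can be compared with a small sketch. The function `p_red_second` and its parameters are hypothetical names chosen for this example, and it assumes every bead left in the bag is equally likely to be drawn.

```python
from fractions import Fraction

def p_red_second(red, yellow, first, replace):
    """Probability that the second bead is red, given the colour of the first."""
    if not replace:
        # The first bead stays out of the bag, so one colour is short by one.
        if first == "red":
            red -= 1
        else:
            yellow -= 1
    return Fraction(red, red + yellow)

# With replacement: the first draw makes no difference (independent draws).
with_replacement = [p_red_second(3, 3, c, replace=True) for c in ("red", "yellow")]

# Without replacement: the result depends on the first draw (not independent).
without_replacement = [p_red_second(3, 3, c, replace=False) for c in ("red", "yellow")]

print(with_replacement)
print(without_replacement)
```

The first list holds the same value twice (1/2), while the second holds 2/5 and 3/5, matching the 0.4 and 0.6 quoted above.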
Outcomes are said to be mutually exclusive if any of those outcomes prevents the others from happening. For instance, if I toss a die, then the outcomes - 1, 2, 3, 4, 5 or 6 - are mutually exclusive. We can "recast" these outcomes as "odd number" (1, 3, 5) and "even number" (2, 4, 6) - and these are still mutually exclusive as the die score can't be both odd and even at the same time. However, the possible outcomes "even number" and "number bigger than 3" are not mutually exclusive as there are two possible numbers (4, 6) which are both even and bigger than 3 - the outcomes can both happen at the same time.
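One way to check the die examples is to treat each outcome as a set of scores: two outcomes are mutually exclusive exactly when the sets share no score. The sketch below is illustrative only, and the helper name `mutually_exclusive` is an assumption of this example.

```python
scores = set(range(1, 7))  # the six faces of the die

odd = {s for s in scores if s % 2 == 1}
even = {s for s in scores if s % 2 == 0}
bigger_than_3 = {s for s in scores if s > 3}

def mutually_exclusive(a, b):
    """Two outcomes are mutually exclusive if no score belongs to both."""
    return a.isdisjoint(b)

print(mutually_exclusive(odd, even))            # no score is both odd and even
print(mutually_exclusive(even, bigger_than_3))  # 4 and 6 belong to both
print(even & bigger_than_3)
```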
| p(Heads) | means | "The probability of getting a head" |
| p(6) | means | "The probability of getting a (score of) 6" |
| p(Even number) | means | "The probability of getting an even number" |
If we want to consider the joint probability of two outcomes happening together, X and Y, then that is written using the symbol ∩, as follows:
| p(X ∩ Y) | means | "The probability of both X and Y happening" |
| p(red ∩ square) | means | "The probability of both the outcomes red and square happening" |
Similarly, the symbol ∪ is used to signify "or", i.e. the probability of either X or Y (or both!) happening:
| p(X ∪ Y) | means | "The probability of X or Y (or both) happening" |
| p(red ∪ square) | means | "The probability of either red or square (or both) happening" |
Sometimes we want to indicate the probability of one outcome that has been affected by another i.e. the probability of an outcome assuming that or given that another event has happened. For instance, if I chose an individual at random from all the inhabitants of New York, then the probability of that person being a police officer would be relatively low (out of the millions of inhabitants of New York, only a small minority of them are police officers). On the other hand, if I picked the individual only from the New Yorkers who happened to be sitting in police cars, then it is very likely that the person I chose would be a police officer, i.e. the probability that the person is a police officer given that (or assuming that) the person is sitting in a police car, is quite high.
The term "assuming that" or "given that" is written using the symbol | so
| p(X | Y) | means | "The probability of X assuming that Y has happened" |
| p(police officer | sitting in police car) | means | "The probability of getting a police officer given that the person is in a police car" |
The symbol ~ or ¬ means "not", in case we want to find the probability of an event not happening. In this way, the probability of getting any coloured bead other than red would be written as p(~red) or p(¬red). Similarly, the probability of neither outcome X nor outcome Y happening would be written as p(~(X ∪ Y)) or p(¬(X ∪ Y)). According to the rule that all probabilities must add to 1 (representing certainty), the following rule applies for any outcome (written as X):
p(X) + p(~X) = 1
Either X happens or it doesn't, so the probabilities of X happening and it not happening must add to give 1. This means that each of the probabilities is 1 minus the other probability.
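The AND rule gives the probability of two outcomes, A and B, both happening:
p(A ∩ B) = p(A).p(B | A)
or, in plain English: the probability of both A and B happening is the probability of A happening multiplied by the probability of B happening given that A has happened.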
If outcomes A and B are independent, then the probability of B happening is not affected by whether A has happened, i.e. p(B | A) = p(B). In the special case when A and B are independent, the AND rule simplifies to the following:
p(A ∩ B) = p(A).p(B)
For this reason, we often say that AND turns into MULTIPLY. For example, if we toss a coin (where the probability of a head is 0.5) and throw a die (where the probability of getting a score of 6 is 1/6), then the probability of getting both a head and a score of 6 is
p(Heads ∩ 6) = p(Heads).p(6 | Heads)
Since the toss of the coin and the throw of the die are independent (they don't affect each other), we can write this as:
p(Heads ∩ 6) = p(Heads).p(6) = 0.5 x 1/6 = 1/12 ≈ 0.083
However, the events aren't always independent. Suppose I choose a whole number at random in the range 1 to 100 inclusive. What is the probability of getting a multiple of 10 which is less than 70? We could solve this simply by counting how many multiples of 10 there are which are less than 70 (there are 6 of them), and calculate the probability as 6 / 100 = 0.06. Alternatively, we can use the AND rule:
p(Number is a multiple of 10 ∩ Number < 70)
= p(Number is a multiple of 10) x p(Number < 70 | Number is a multiple of 10)
= 0.1 x 0.6
= 0.06
This works because there are 10 multiples of 10 available from the 100 numbers in total (giving a probability of 0.1) and of those 10 multiples of 10, there are 6 which are less than 70 (giving a probability of 0.6).
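The two routes to 0.06 described above, direct counting and the AND rule, can be checked against each other with a short sketch. The variable names are illustrative, and the code assumes every whole number from 1 to 100 is equally likely to be chosen.

```python
from fractions import Fraction

numbers = range(1, 101)  # whole numbers 1 to 100 inclusive

multiples_of_10 = [n for n in numbers if n % 10 == 0]
both = [n for n in multiples_of_10 if n < 70]

# Direct counting: favourable outcomes over all outcomes.
p_direct = Fraction(len(both), len(numbers))

# AND rule: p(multiple of 10) x p(less than 70 | multiple of 10).
p_multiple = Fraction(len(multiples_of_10), len(numbers))
p_less_given_multiple = Fraction(len(both), len(multiples_of_10))
p_and_rule = p_multiple * p_less_given_multiple

print(p_direct, p_and_rule)
```

Both values come out as 3/50, i.e. 0.06.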
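The OR rule gives the probability of either A or B (or both) happening:
p(A ∪ B) = p(A) + p(B) - p(A ∩ B)
or in plain English: the probability of A or B happening is the probability of A plus the probability of B, minus the probability of both A and B happening.
The third term is important - many people forget that, as well as adding two terms together, a term needs to be subtracted. If outcomes A and B are mutually exclusive (i.e. they can't both happen), then the last term becomes 0, and the rule is simplified to the following:
p(A ∪ B) = p(A) + p(B)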
For this reason, we often say that OR turns into ADD. For example, going back to the number chosen at random in the range 1 to 100, we can calculate the probability that the number is either an even number or is bigger than 40.
Obviously, we could work out how many numbers fit the bill: there are 80 such numbers out of the 100 giving a probability of 0.8. However, it can easily be worked out using the OR rule.
There are 50 even numbers in the range 1 to 100, so p(Even number) = 0.5.
There are 60 numbers greater than 40, so p(Number > 40) = 0.6.
However, you can now see why we can't just add the probabilities and leave it at that. 0.5 + 0.6 = 1.1, and the final probability can't be bigger than 1! We must calculate the probability that the number chosen is both an even number and bigger than 40. There are 30 such numbers, giving a probability of 0.3, so the probability we want is:
p(Even number ∪ Number > 40) = 0.5 + 0.6 - 0.3 = 0.8
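The same example can be verified by counting with sets, as a rough sketch. The helper `p` and the set names are assumptions of this example, and each number from 1 to 100 is taken to be equally likely.

```python
from fractions import Fraction

numbers = set(range(1, 101))  # whole numbers 1 to 100 inclusive
even = {n for n in numbers if n % 2 == 0}
over_40 = {n for n in numbers if n > 40}

def p(event):
    """Probability of an event, given as the set of numbers that satisfy it."""
    return Fraction(len(event), len(numbers))

# OR rule: add the two probabilities, then subtract the overlap once.
p_or_rule = p(even) + p(over_40) - p(even & over_40)

# Direct counting of numbers that are even, bigger than 40, or both.
p_direct = p(even | over_40)

print(p_or_rule, p_direct)
```

Both come out as 4/5, i.e. 0.8, whereas adding without the subtraction would give 11/10.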
We know that
p(A ∩ B) = p(A).p(B | A)
However, since p(A ∩ B) means "the probability of both A and B happening," it can be rewritten as p(B ∩ A). This means that the AND rule can be rewritten as
p(B ∩ A) = p(B).p(A | B)
Bayes put these two versions of the rule together to get the following:
p(A).p(B | A) = p(B).p(A | B)
p(B | A) = p(B).p(A | B) / p(A)
Here is an example of how we might use Bayes' Rule.
A speech recognition device is to be trained to recognise the user saying either TV ON or DRAW CURTAINS and fitted in an interface to help disabled people. However, the device isn't always reliable. Assuming that the disabled person says one of the two commands, and the device recognises it as TV ON, what is the probability that TV ON really was the command uttered?
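The machine was trained and, during training, the user said TV ON 70 times and DRAW CURTAINS 92 times. The machine recognized the words as follows:
| | Recognized as TV ON | Recognized as DRAW CURTAINS | Total |
| TV ON spoken | 52 | 18 | 70 |
| DRAW CURTAINS spoken | 12 | 80 | 92 |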
p(TV ON spoken) = 70 / (70 + 92) = 0.432
p(TV ON was recognized) = (52 + 12) / (52 + 12 + 18 + 80) = 64 / 162 = 0.395
p(TV ON recognized | TV ON spoken) = 52 / 70 = 0.743
Representing "TV ON was recognized" by A and "TV ON was spoken" by B in Bayes' Rule above, we have
p(TV ON spoken | TV ON recognized)
= p(TV ON spoken).p(TV ON recognized | TV ON spoken) / p(TV ON was recognized)
= 0.432 x 0.743 / 0.395
= 0.813
If you compare this with the simple recognition rate which is (52 + 80) / (52 + 12 + 18 + 80) = 0.815, then you see that its recognition of TV ON is actually slightly worse than the average!
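The Bayes' Rule calculation can be reproduced from the raw counts, as a minimal sketch. The `counts` dictionary and variable names are hypothetical; the breakdown into 52, 18, 12 and 80 is the one implied by the figures used in the calculations above.

```python
from fractions import Fraction

# counts[spoken][recognized], as implied by the figures in the worked example.
counts = {
    "TV ON": {"TV ON": 52, "DRAW CURTAINS": 18},
    "DRAW CURTAINS": {"TV ON": 12, "DRAW CURTAINS": 80},
}
total = sum(sum(row.values()) for row in counts.values())

p_spoken = Fraction(sum(counts["TV ON"].values()), total)
p_recognized = Fraction(sum(row["TV ON"] for row in counts.values()), total)
p_recognized_given_spoken = Fraction(
    counts["TV ON"]["TV ON"], sum(counts["TV ON"].values())
)

# Bayes' Rule: p(B | A) = p(B).p(A | B) / p(A)
p_spoken_given_recognized = p_spoken * p_recognized_given_spoken / p_recognized

# Overall recognition rate: correct recognitions over all utterances.
recognition_rate = Fraction(
    counts["TV ON"]["TV ON"] + counts["DRAW CURTAINS"]["DRAW CURTAINS"], total
)

print(float(p_spoken_given_recognized), float(recognition_rate))
```

The exact answer is 13/16 = 0.8125, which agrees with the 0.813 above once the rounding of the intermediate values is taken into account. It is the same as 52 / 64, the share of "TV ON" recognitions that really were TV ON.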