The Basics of Probability

The probability of an outcome of a particular event happening is the chance or likelihood of it happening. We all have an intuitive understanding of whether something is likely to happen, but mathematically we can determine how likely something is exactly.

Events and outcomes

An event is something that happens. It may not be very large or grand (e.g. tossing a coin is an event), and it may not happen at all (being knocked down by a bus is an event, but if I decide not to leave the house, it won’t happen).

An event has outcomes. These are possible things that could arise from the event. For instance, buying a raffle ticket is an event. It has two basic outcomes: Either you win or you don’t!

Mathematical values for probability

Probabilities are constrained mathematically to lie between two limits: 0 represents "impossibility" (something simply cannot happen) and 1 represents "certainty" (guaranteed to happen). These are two extremes, and most probabilities lie somewhere between them. You can think of the range of probabilities as a short number line running from 0 at one end to 1 at the other.

We can estimate probability in one of three ways:

Classical probability

These are the sort of probabilities that textbooks always give - generally speaking, it means probabilities worked out from theoretical values, such as the formulae you will meet later. For instance, classical theory suggests that if you throw a die (singular of dice), then the probability of getting a score of 6 should be 1/6 (i.e. if you throw it 600 times, you would expect about 100 sixes). However, if you actually try this, you will find that the figures are unlikely to come out perfectly.
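One way to see the gap between the classical value and a real experiment is to simulate the throws. The following is a minimal sketch only: the function name throw_die and the seed value are assumptions made for illustration, and a pseudo-random generator stands in for a physical die.

```python
import random
from collections import Counter

def throw_die(n_throws, seed=0):
    """Simulate n_throws of a fair six-sided die and count each score."""
    rng = random.Random(seed)  # seeded only so that a run can be repeated
    return Counter(rng.randint(1, 6) for _ in range(n_throws))

counts = throw_die(600)
for score in range(1, 7):
    # Classical theory predicts 100 of each score; the simulated counts
    # will usually be close to 100 but not exactly equal to it.
    print(score, counts[score], round(counts[score] / 600, 3))
```

Changing the seed gives a different set of counts, which is the point: each run scatters around the theoretical 1/6 rather than matching it.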

Relative frequency

As a rule, a relative frequency probability is any probability that can be found by sampling a large amount of data. For instance, if you wanted to know the probability that your train would be late, then you would see what time it turned up every day for a month, and see how many times it turned up late. You obviously wouldn’t be able to get an exact probability - you would simply have to assume that the month you sampled was a fairly typical one. A similar argument applies to the probability that the next person who enters a supermarket is female - sample the next 100 people and see how many of them are women.

Subjective probability

Some probabilities can only be estimated by making an educated guess. For instance, we might guess that the probability of finding intelligent life elsewhere in the universe is quite high, but there is no way we can put an actual figure to it. A subjective probability is one that can only be estimated in this way. We might say that subjective probabilities are based solely on belief.

The most important of these as far as real-life probabilities are concerned is the second. For instance, if you want to know what the probability of a sunny day in May is, then you would look at May last year’s records (31 days) and count how many sunny days there were (say 17). The probability would then be 17 / 31 = 0.548
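The relative frequency calculation is simply "times it happened" divided by "times it was observed". A small sketch, using the 17 sunny days out of 31 from the example (the function name relative_frequency is an assumption, not something defined in the text):

```python
def relative_frequency(times_happened, times_observed):
    """Estimate a probability from sampled data."""
    if times_observed <= 0:
        raise ValueError("need at least one observation")
    return times_happened / times_observed

p_sunny = relative_frequency(17, 31)
print(round(p_sunny, 3))  # 0.548
```

The estimate is only as good as the sample: a different May could easily give a different figure.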

Calculating probability theoretically

This is the standard way to calculate probability when all the outcomes are equally likely. You count up the number of outcomes that match the condition you are interested in and divide it by the total number of outcomes that could possibly happen:

Probability of X happening = (Total number of outcomes in which X happens) / (Total number of possible outcomes)
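The counting rule can be sketched directly in code. This assumes every outcome in the list is equally likely; the helper name probability and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction

def probability(outcomes, condition):
    """Outcomes matching the condition divided by all possible outcomes."""
    matching = [o for o in outcomes if condition(o)]
    return Fraction(len(matching), len(outcomes))

die = [1, 2, 3, 4, 5, 6]
p_six = probability(die, lambda score: score == 6)
p_even = probability(die, lambda score: score % 2 == 0)
print(p_six)   # 1/6
print(p_even)  # 1/2
```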

Possibility space

Sometimes you want the probabilities of joint outcomes - the combined results of more than one event. For instance, if you toss a coin, you can get a head (H) or a tail (T). Similarly, if you throw a die, you can get a score from 1 to 6. We list all the combined results in a possibility space table. It doesn’t tell you the probability values themselves, but it lists all the possible outcomes:

 

                  Die
Coin    1     2     3     4     5     6
Head    H1    H2    H3    H4    H5    H6
Tail    T1    T2    T3    T4    T5    T6
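The same possibility space can be generated rather than written out by hand. A short sketch using the standard library (the variable names are illustrative):

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Pair every coin result with every die score.
possibility_space = [f"{c}{d}" for c, d in product(coin, die)]
print(possibility_space)
# ['H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'T1', 'T2', 'T3', 'T4', 'T5', 'T6']
print(len(possibility_space))  # 12
```

The number of joint outcomes is the product of the individual counts: 2 coin results times 6 die scores gives 12.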

Equiprobable outcomes

A set of outcomes of an event is said to be equiprobable if they all have the same chance of happening. For instance, if you toss a fair coin, you are equally likely to get a head as a tail. The probability of each of these is 0.5. If you throw a fair die, all the outcomes are equally likely. They all have a probability of 1/6.

The probability of any one of a set of equiprobable outcomes is equal to 1 divided by the number of outcomes. For instance, in the National Lottery, there are 49 numbered balls. They are all equally likely to appear, so the probability of any given ball being the first one drawn is 1 / 49.

Total Probabilities Must Add to 1

If you consider the probabilities of all the possible outcomes of an event, they must all add to give 1. This is because 1 represents certainty, and you can guarantee that one of these outcomes must happen. For instance, if a bag of beads contains only red, green and blue beads, and you reach in and take one out, then the probabilities of it being a red bead, a green bead and a blue bead respectively, must add to give 1 - these are the only colours present!
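A quick check of this rule in code. The bead counts below are hypothetical, since the text does not say how many beads of each colour are in the bag; any counts would give the same total.

```python
from fractions import Fraction

# Hypothetical contents of the bag (not given in the text).
bag = {"red": 5, "green": 3, "blue": 2}
total_beads = sum(bag.values())

probabilities = {colour: Fraction(n, total_beads) for colour, n in bag.items()}
for colour, p in probabilities.items():
    print(colour, p)

print(sum(probabilities.values()))  # 1
```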

Independent events

Two events are said to be independent if they cannot influence or affect each other. For instance, if I toss a coin and also throw a die (whether at the same time or one after another), then the two events and the outcomes from them are independent. There is no way that the outcome of the coin toss can affect the outcome of the die throw and vice-versa.

Suppose I had three red beads and three yellow beads in a bag. I reach in and pull out a bead at random and note its colour. Then I put the bead back in the bag and pull out a second bead at random. Because there are still three beads of each colour in the bag when I pull out the second bead, the chance of getting a red bead on the second draw is the same whatever colour came out of the bag on the first draw. This shows that the first and second draws are independent - the probabilities of the different outcomes on the second draw remain exactly the same regardless of what colour came out of the bag on the first draw.

On the other hand, suppose I didn't replace the first bead before pulling out the second bead. In this case, the odds of each colour on the second draw will be affected by the colour that came out on the first draw. For instance, if I pull out a red bead on the first draw, then the chances of a red bead on the second draw have been reduced slightly (0.4 instead of 0.5). On the other hand, if I pull out a yellow bead the first time round, then the probability of a red bead on the second draw increases slightly (0.6 instead of 0.5). In this case, the probabilities are changed for the second draw depending on the outcome of the first draw, so the two events are not independent.
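The two bead experiments can be compared side by side. This is a small sketch under the same setup (three red, three yellow); the function name p_red_second and its parameters are assumptions made for illustration.

```python
from fractions import Fraction

def p_red_second(red, yellow, first_colour, replace):
    """Probability of red on the second draw, given the first draw's colour."""
    if not replace:
        # The first bead stays out of the bag, so one colour is depleted.
        if first_colour == "red":
            red -= 1
        else:
            yellow -= 1
    return Fraction(red, red + yellow)

# With replacement: the first draw makes no difference (independent).
print(p_red_second(3, 3, "red", replace=True))      # 1/2
print(p_red_second(3, 3, "yellow", replace=True))   # 1/2

# Without replacement: the first draw changes the second (not independent).
print(p_red_second(3, 3, "red", replace=False))     # 2/5
print(p_red_second(3, 3, "yellow", replace=False))  # 3/5
```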

Mutually exclusive outcomes

When I toss a coin just once, there are two possible outcomes - a head or a tail. Since the coin is only tossed once, I can't get both a head and a tail. Only one of them is possible. If the coin comes up heads, then it can't come up tails on the same throw and vice-versa. We say that the two outcomes are mutually exclusive.

Outcomes are said to be mutually exclusive if any of those outcomes prevents the others from happening. For instance, if I toss a die, then the outcomes - 1, 2, 3, 4, 5 or 6 - are mutually exclusive. We can "recast" these outcomes as "odd number" (1, 3, 5) and "even number" (2, 4, 6) - and these are still mutually exclusive as the die score can't be both odd and even at the same time. However, the possible outcomes "even number" and "number bigger than 3" are not mutually exclusive as there are two possible numbers (4, 6) which are both even and bigger than 3 - the outcomes can both happen at the same time.
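Treating each outcome as a set of die scores makes the check mechanical: two outcomes are mutually exclusive when their sets share no scores. A minimal sketch (the helper name mutually_exclusive is an assumption):

```python
odd = {1, 3, 5}
even = {2, 4, 6}
bigger_than_3 = {4, 5, 6}

def mutually_exclusive(a, b):
    """True if no single score belongs to both outcomes."""
    return a.isdisjoint(b)

print(mutually_exclusive(odd, even))            # True
print(mutually_exclusive(even, bigger_than_3))  # False
print(sorted(even & bigger_than_3))             # [4, 6]
```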

Probability Notation

Probabilities are often written using the following notation. The probability of a certain outcome X (whatever X is) is written as p(X) (pronounced "p of X"). For instance:

p(Heads) means "The probability of getting a head"
p(6) means "The probability of getting a (score of) 6"
p(Even number) means "The probability of getting an even number"

If we want to consider the joint probability of two outcomes happening together, X and Y, then that is written using the symbol ∩, as follows:

p(X ∩ Y) means "The probability of both X and Y happening"
p(red ∩ square) means "The probability of both the outcomes red and square happening"

Similarly, the symbol ∪ is used to signify "or", i.e. the probability of either X or Y (or both!) happening:

p(X ∪ Y) means "The probability of X or Y (or both) happening"
p(red ∪ square) means "The probability of either red or square (or both) happening"

Sometimes we want to indicate the probability of one outcome that has been affected by another i.e. the probability of an outcome assuming that or given that another event has happened. For instance, if I chose an individual at random from all the inhabitants of New York, then the probability of that person being a police officer would be relatively low (out of the millions of inhabitants of New York, only a small minority of them are police officers). On the other hand, if I picked the individual only from the New Yorkers who happened to be sitting in police cars, then it is very likely that the person I chose would be a police officer, i.e. the probability that the person is a police officer given that (or assuming that) the person is sitting in a police car, is quite high.

The term "assuming that" or "given that" is written using the symbol | so

p(X | Y) means "The probability of X assuming that Y has happened"
p(police officer | sitting in police car) means "The probability of getting a police officer given that the person is in a police car"

The symbol ~ or ¬ means "not", in case we want to find the probability of an event not happening. In this way, the probability of getting any coloured bead other than red would be written as p(~red) or p(¬red). Similarly, the probability of neither outcome X nor Y happening would be written as p(~(X ∪ Y)) or p(¬(X ∪ Y)). According to the rule that all probabilities must add to 1 (representing certainty), the following rule applies for any outcome (written as X):

p(¬X) = 1 - p(X) or p(~X) = 1 - p(X)

Either X happens or it doesn't, so the probabilities of X happening and it not happening must add to give 1. This means that each of the probabilities is 1 minus the other probability.
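As a small worked check of the "not" rule, the probability of not throwing a 6 on a fair die can be found both by subtraction and by counting. The variable names are illustrative only.

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]

p_six = Fraction(1, 6)
p_not_six = 1 - p_six  # the "not" rule

# Counting the scores that are not 6 gives the same answer.
counted = Fraction(len([s for s in die if s != 6]), len(die))

print(p_not_six)             # 5/6
print(p_not_six == counted)  # True
```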

The AND rule

There is a rule that lets us calculate the probability of two outcomes both happening:

p(A ∩ B) = p(A) x p(B | A)

or, in plain English:

p(Both A and B happening) = p(A happens) x p(B happens given that A has happened)

If outcomes A and B are independent, then the probability of B happening is not affected by whether A has happened, i.e. p(B | A) = p(B). In the special case when A and B are independent, the AND rule simplifies to the following:

p(A ∩ B) = p(A) x p(B)

For this reason, we often say that AND turns into MULTIPLY. For example, if we toss a coin (where the probability of a head is 0.5) and throw a die (where the probability of getting a score of 6 is 1/6), then the probability of getting both a head and a score of 6 is

p(Head ∩ Six) = p(Head) x p(Six | Head)

Since the toss of the coin and the die are independent (they don't affect each other), we can write this as:

p(Head ∩ Six) = p(Head) x p(Six) = 0.5 x 1/6 = 1/12

However, the outcomes aren't always independent. Suppose I choose a whole number at random in the range 1 to 100 inclusive. What is the probability of getting a multiple of 10 which is less than 70? We could solve this simply by counting how many multiples of 10 there are which are less than 70 (there are 6 of them), and calculate the probability as 6 / 100 = 0.06. Alternatively, we can use the AND rule:

p(Number is a multiple of 10 ∩ Number < 70)

= p(Number is a multiple of 10) x p(Number < 70 | Number is a multiple of 10)

= 0.1 x 0.6

= 0.06

This works because there are 10 multiples of 10 available from the 100 numbers in total (giving a probability of 0.1) and of those 10 multiples of 10, there are 6 which are less than 70 (giving a probability of 0.6).
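Both routes to the answer, the AND rule and direct counting, can be checked in a few lines. This is a sketch of the worked example only; the variable names are chosen for illustration.

```python
from fractions import Fraction

numbers = range(1, 101)  # whole numbers 1 to 100 inclusive

multiples_of_10 = [n for n in numbers if n % 10 == 0]   # 10 of them
also_below_70 = [n for n in multiples_of_10 if n < 70]  # 6 of them

p_multiple = Fraction(len(multiples_of_10), len(numbers))
p_below_given_multiple = Fraction(len(also_below_70), len(multiples_of_10))

# AND rule: p(A and B) = p(A) x p(B given A)
p_and = p_multiple * p_below_given_multiple

# Direct count for comparison.
p_direct = Fraction(len(also_below_70), len(numbers))

print(p_and, float(p_and))  # 3/50 0.06
print(p_and == p_direct)    # True
```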

The OR rule

Similarly, there is a rule which lets us calculate the probability that either outcome A or B (or both!) happens.

p(A ∪ B) = p(A) + p(B) - p(A ∩ B)

or in plain English:

p(Either A or B or both happens) = p(A happens) + p(B happens) - p(Both A and B happen)

The third term is important - many people forget that as well as adding two terms together, a term needs to be subtracted. If outcomes A and B are mutually exclusive (i.e. they can't both happen), then the last term becomes 0, and the rule is simplified to the following:

p(A ∪ B) = p(A) + p(B)

For this reason, we often say that OR turns into ADD. For example, going back to the number chosen at random in the range 1 to 100, we can calculate the probability that the number is either an even number or is bigger than 40.

Obviously, we could work out how many numbers fit the bill: there are 80 such numbers out of the 100 giving a probability of 0.8. However, it can easily be worked out using the OR rule.

There are 50 even numbers in the range 1 to 100, so p(Even number) = 0.5.

There are 60 numbers greater than 40, so p(Number > 40) = 0.6.

However, you can now see why we can't just add the probabilities and leave it at that. 0.5 + 0.6 = 1.1, and the final probability can't be bigger than 1! We must calculate the probability that the number chosen is both an even number and bigger than 40. There are 30 such numbers, giving a probability of 0.3, so the probability we want is:

p(Even number ∪ Number > 40) = 0.5 + 0.6 - 0.3 = 0.8 or 80%

That's more reasonable!
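The OR rule for this example can be verified against a direct count using sets. A minimal sketch, with variable names chosen for illustration:

```python
from fractions import Fraction

numbers = range(1, 101)  # whole numbers 1 to 100 inclusive

even = {n for n in numbers if n % 2 == 0}    # 50 numbers
above_40 = {n for n in numbers if n > 40}    # 60 numbers

p_even = Fraction(len(even), len(numbers))
p_above_40 = Fraction(len(above_40), len(numbers))
p_both = Fraction(len(even & above_40), len(numbers))  # 30 numbers are in both sets

# OR rule: add the two probabilities, then subtract the overlap.
p_or = p_even + p_above_40 - p_both

# Direct count of numbers that are even, above 40, or both.
p_direct = Fraction(len(even | above_40), len(numbers))

print(p_or, float(p_or))  # 4/5 0.8
print(p_or == p_direct)   # True
```

Without the subtraction, the 30 numbers that are both even and above 40 would be counted twice.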

Bayes' Rule

The Reverend Bayes was an 18th century vicar who was also interested in gambling. He used the simple AND rule to develop the rule that bears his name:

We know that

p(A ∩ B) = p(A) x p(B | A)

However, since p(A ∩ B) means "the probability of both A and B happening," then it can be rewritten as p(B ∩ A). This means that the AND rule can be rewritten as

p(B ∩ A) = p(B) x p(A | B)

Bayes put these two versions of the rule together to get the following:

p(A ∩ B) = p(A) x p(B | A) = p(B ∩ A) = p(B) x p(A | B)

p(A) x p(B | A) = p(B) x p(A | B)

p(B | A) = p(B) x p(A | B) / p(A)

Here is an example of how we might use Bayes' Rule.

A speech recognition device is to be trained to recognise the user saying either TV ON or DRAW CURTAINS and fitted in an interface to help disabled people. However, the device isn't always reliable. Assuming that the disabled person says one of the two commands, and the device recognises it as TV ON, what is the probability that TV ON really was the command uttered?

The machine was trained and during training, the user said TV ON 70 times and DRAW CURTAINS 92 times, and the machine recognized words as follows:

 
                                    User said TV ON    User said DRAW CURTAINS
Machine recognized TV ON                  52                     12
Machine recognized DRAW CURTAINS          18                     80

p(TV ON spoken) = 70 / (70 + 92) = 0.432

p(TV ON was recognized) = (52 + 12) / (52 + 12 + 18 + 80) = 64 / 162 = 0.395

p(TV ON recognized | TV ON spoken) = 52 / 70 = 0.743

Representing "TV ON was recognized" by A and "TV ON was spoken" by B in Bayes' Rule above, we have

p(TV ON spoken | TV ON recognized)

= p(TV ON spoken) x p(TV ON recognized | TV ON spoken) / p(TV ON was recognized)

= 0.432 x 0.743 / 0.395

= 0.813

If you compare this with the simple recognition rate which is (52 + 80) / (52 + 12 + 18 + 80) = 0.815, then you see that its recognition of TV ON is actually slightly worse than the average!
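The whole Bayes' Rule calculation can be reproduced from the four training counts. The sketch below uses exact fractions so that no rounding creeps in along the way; the dictionary layout and variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Training counts, keyed by (what the machine recognized, what the user said).
counts = {
    ("TV ON", "TV ON"): 52,
    ("TV ON", "DRAW CURTAINS"): 12,
    ("DRAW CURTAINS", "TV ON"): 18,
    ("DRAW CURTAINS", "DRAW CURTAINS"): 80,
}
total = sum(counts.values())  # 162 utterances

spoken_tv = sum(n for (rec, said), n in counts.items() if said == "TV ON")     # 70
recognized_tv = sum(n for (rec, said), n in counts.items() if rec == "TV ON")  # 64

p_spoken = Fraction(spoken_tv, total)
p_recognized = Fraction(recognized_tv, total)
p_recognized_given_spoken = Fraction(counts[("TV ON", "TV ON")], spoken_tv)

# Bayes' Rule: p(spoken | recognized)
#   = p(spoken) x p(recognized | spoken) / p(recognized)
posterior = p_spoken * p_recognized_given_spoken / p_recognized
print(posterior, float(posterior))  # 13/16 0.8125

# Overall recognition rate: both correct cells over all utterances.
overall = Fraction(52 + 80, total)
print(posterior < overall)  # True
```

The exact answer is 52 / 64 = 0.8125, which is the 0.813 quoted above once rounded to three decimal places. It is the same figure you get by reading the "Machine recognized TV ON" row of the table directly.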


