The Binomial Distribution

A small reminder about probability distributions

Remember, any set of positive numbers that have a total of 1 can be thought of as a probability distribution. This means that they can represent the probabilities of various things happening. Here is an example of just such a set of numbers:

Red    Yellow   Green   Blue
0.2    0.3      0.1     0.4

I don't know what "red", "yellow" etc. mean (coloured beads in a bag, possibly?) but the figures underneath could well represent the probabilities of them happening, i.e. p(Red) = 0.2 and p(not Blue) = 1 - 0.4 = 0.6 etc. The important things about the set of numbers are:

  1. None of the numbers is negative.
  2. The numbers add up to 1.

If either of these two things is false, then the numbers cannot represent a probability distribution. Such a distribution is called a discrete distribution as the possible outcomes are discrete events. Another example of a discrete distribution is the set of outcomes from tossing a die. A die can only give one of 6 distinct scores: 1, 2, 3, 4, 5 or 6 - never anything between.
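The test for whether a set of numbers can be a discrete distribution is easy to automate. Here is a minimal Python sketch; the function name `is_discrete_distribution` and the tolerance `tol` are my own inventions for illustration, not part of any library.

```python
import math

def is_discrete_distribution(probs, tol=1e-9):
    """Return True if no number is negative and the numbers total 1.

    `tol` is an assumed allowance for floating-point rounding.
    """
    probs = list(probs)
    return all(p >= 0 for p in probs) and math.isclose(sum(probs), 1.0, abs_tol=tol)

beads = {"Red": 0.2, "Yellow": 0.3, "Green": 0.1, "Blue": 0.4}
print(is_discrete_distribution(beads.values()))   # the coloured example above
print(is_discrete_distribution([0.5, 0.6]))       # total is 1.1, so not a distribution
print(round(1 - beads["Blue"], 10))               # p(not Blue)
```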

Of course, there is no reason why a probability distribution cannot be continuous. In this case, it is shown by a line on a graph, either a straight line (as in the diagram below on the left) or a curve (such as the normal distribution, shown on the right).

There is a pair of rules equivalent to the ones above for continuous distributions:

  1. The line never goes below the horizontal axis.
  2. The total area under the line is 1.
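As a rough numerical check that a curve such as the normal distribution stays on or above the axis and encloses a total area of 1, here is a small Python sketch. The helper names, the trapezium rule and the range from -8 to 8 are all my own assumptions; the tails of the curve beyond that range are tiny but not exactly zero.

```python
import math

def normal_density(x):
    """Standard normal curve (mean 0, standard deviation 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def area_under(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi with the trapezium rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width

area = area_under(normal_density, -8.0, 8.0)
# Sample the curve every 0.01 units and record its lowest height
lowest = min(normal_density(-8.0 + i * 0.01) for i in range(1601))
print(round(area, 6), lowest >= 0)
```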

The Binomial Distribution

Now consider this:
$$(p + q)^n$$

Provided that p + q = 1, then $(p + q)^n = 1^n$, and this will always be 1, regardless of the value of n.

For instance, if p + q = 1, then

$$(p + q)^2 = p^2 + 2pq + q^2 = 1$$

$$(p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3 = 1$$

This is simply expanding the brackets, and is dealt with in a different section. I have stuck to the tradition of writing the terms in the expanded expression in a certain order, with the p terms in descending order of powers ($p^3$, then $p^2$, then $p$ and finally p disappears altogether). This has the effect that the q terms appear in ascending order of the powers (no q, then $q$, then $q^2$ etc.) You will see that the powers of p and q always mirror each other (as one increases, the other decreases).

There is no mathematical reason why you should write the terms in that order. For instance, we could just as easily write this:

$$(p + q)^3 = 3pq^2 + 3p^2q + q^3 + p^3$$

However, you will see that writing the terms in the conventional order has two advantages: It helps you to make sure that you haven't forgotten one, and it makes them easier to calculate using something called Pascal's Triangle.
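If you would like to see the arithmetic confirmed, here is a short Python sketch. The value p = 0.3 is an arbitrary choice of mine; any value between 0 and 1 should behave the same way, give or take floating-point rounding.

```python
p = 0.3          # arbitrary illustrative value between 0 and 1
q = 1 - p        # so that p + q = 1

square = p**2 + 2*p*q + q**2
cube = p**3 + 3*p**2*q + 3*p*q**2 + q**3

# The same cube with its terms shuffled into an unconventional order
cube_reordered = 3*p*q**2 + 3*p**2*q + q**3 + p**3

print(round(square, 12), round(cube, 12), round(cube_reordered, 12))
```

All three totals come out as 1, whatever order the terms are written in.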

Beautiful, but what's this got to do with probability?

Well, if we take p + q = 1 and both p and q as bigger than 0, then expanding $(p + q)^n$ gives a string of positive terms that add to give 1. Effectively, they are a probability distribution (or can be thought of as such). It is called the Binomial Distribution (I bet you didn't see that coming!) and is used to give the probabilities of various combinations of successes and failures in a certain number of trials.

Let me explain. Suppose you are going to try something n times, and each trial could go one of two ways: Success (probability p) or failure (probability q). Now, this means that p + q = 1, as these are the only two possibilities. You can probably see where I am going with this ...

The first term in the expansion (the $p^n$ term) gives the probability of n straight successes (a jackpot every time!). The next term (the $p^{n-1}q$ term preceded by some number - conventional order, remember!) gives the probability of one failure and n-1 successes (in any order). The next term, the term involving $p^{n-2}q^2$, gives the probability that all the trials will be successes except two of them, and so on. All the possibilities are covered, ending with the term $q^n$, which is the probability that all the trials will be abject failures.

Perhaps a concrete example will make this clearer. Suppose you have a bag with 3 red beads and 7 black beads in it. You are going to pull out a bead at random, examine it and then put it back in the bag! (I will explain why that is important later.) You are going to do this five times. Every time you get a red bead, that is a "success" (and you get a pat on the back!) and every time you get a black bead, that is a "failure" (and you get a nasty look!)

So we have 5 trials (n = 5) with a chance of success of 0.3 on each trial (p = 0.3) as there are 3 red beads out of 10 and all are equally likely to be drawn. Furthermore, the chance of failure is 0.7 (q = 0.7).

$$(p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5$$

$p^5 = (0.3)^5$
This is the probability of getting 5 straight successes.
$5p^4q = 5(0.3)^4(0.7)$
This is p(4 red beads and 1 black bead in any order).
$10p^3q^2 = 10(0.3)^3(0.7)^2$
This is p(3 red beads and 2 black beads in any order).
$10p^2q^3 = 10(0.3)^2(0.7)^3$
This is p(2 red beads and 3 black beads in any order).
$5pq^4 = 5(0.3)(0.7)^4$
This is p(1 red bead and 4 black beads in any order).
$q^5 = (0.7)^5$
This is the probability of getting 5 black beads.
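To put actual figures on those six terms, here is a minimal Python sketch. The coefficients are simply copied from the expansion; the variable names are my own.

```python
n, p, q = 5, 0.3, 0.7
coefficients = [1, 5, 10, 10, 5, 1]   # copied from the expansion of (p + q)^5

# Term number i (counting from 0) has i failures and n - i successes
probabilities = []
for failures, coefficient in enumerate(coefficients):
    successes = n - failures
    probabilities.append(coefficient * p**successes * q**failures)

for failures, prob in enumerate(probabilities):
    print(f"{n - failures} red, {failures} black: {prob:.5f}")
print(f"total: {sum(probabilities):.5f}")

# 5 red, 0 black: 0.00243
# 4 red, 1 black: 0.02835
# 3 red, 2 black: 0.13230
# 2 red, 3 black: 0.30870
# 1 red, 4 black: 0.36015
# 0 red, 5 black: 0.16807
# total: 1.00000
```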

You will notice that the probability for success stays the same throughout the different combinations (always 0.3 in this case). This means that whatever coloured bead is withdrawn from the bag at any stage, it must be returned before the next draw to make sure the probabilities don't change. If the coloured beads were thrown away afterwards, then certain combinations (such as 5 red beads) would become impossible.

Please note: the terms of the formula tell you nothing about the order in which the outcomes can happen. To get that information, you have to do the calculation the hard way, as shown in this example. Let's consider the second term of that expansion, the one that gives the probability of 4 red beads and a black bead. Where could that have come from? The following possibilities occur:

p(red, then red, then red, then red, then black) = 0.3 x 0.3 x 0.3 x 0.3 x 0.7
p(red, then red, then red, then black, then red) = 0.3 x 0.3 x 0.3 x 0.7 x 0.3
p(red, then red, then black, then red, then red) = 0.3 x 0.3 x 0.7 x 0.3 x 0.3
p(red, then black, then red, then red, then red) = 0.3 x 0.7 x 0.3 x 0.3 x 0.3
p(black, then red, then red, then red, then red) = 0.7 x 0.3 x 0.3 x 0.3 x 0.3

These terms all give the same probabilities, as it makes no difference in which order the four 0.3s and the 0.7 are multiplied. This means that the total probability for getting four reds and a black in any order is 5 times the probability of any particular row above, i.e. $5(0.3)^4(0.7)$, exactly as specified in the expansion. You can use a similar logic to prove that the other terms must be correct.
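The "hard way" can also be handed to a computer. This Python sketch lists every possible sequence of five draws, keeps the ones with exactly four reds, and adds up their probabilities. It is only an illustration of the counting argument, and the variable names are mine.

```python
from itertools import product

p_red, p_black = 0.3, 0.7

# All 32 possible sequences of 5 draws, filtered down to those with exactly 4 reds
sequences = [s for s in product(["red", "black"], repeat=5) if s.count("red") == 4]

total = 0.0
for seq in sequences:
    prob = 1.0
    for bead in seq:
        prob *= p_red if bead == "red" else p_black
    total += prob

print(len(sequences))                    # number of different orders
print(total, 5 * p_red**4 * p_black)     # the hard way against the binomial term
```

Changing the 4 in `s.count("red") == 4` to 3 or 2 should show 10 sequences, in line with the coefficients of the middle terms.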

Clearly, using the binomial expansion doesn't tell us anything that we couldn't work out the hard way. However, using the binomial expansion is much easier!

Pascal's Triangle

I promised you above that I would show you an easy way to expand (p + q)n. In the 17th century, the French mathematician Blaise Pascal developed his triangle of numbers, as follows:

$(p + q)^0 = 1$
$(p + q)^1 = 1p + 1q$
$(p + q)^2 = 1p^2 + 2pq + 1q^2$
$(p + q)^3 = 1p^3 + 3p^2q + 3pq^2 + 1q^3$
$(p + q)^4 = 1p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + 1q^4$

The pattern of the numbers is as follows: Down the side of the triangle, the first and last numbers are both 1 (indicating that the $p^n$ and $q^n$ terms will never have any number other than 1 in front of them). The coefficients within the body of the triangle are formed by adding the two coefficients diagonally above them. This only applies to the numbers in front of the ps and qs, of course. If you look at the last line displayed, you will see that the 4 comes from adding 1 and 3, the 6 from adding 3 and 3 etc.

The next line in the triangle is the one you saw above:

$$(p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5$$

Compare this with the pattern that I have just described. The first and last terms do not have numbers in front of them (equivalent to 1), the numbers 5 are found by adding 1 and 4, and the numbers 10 are found by adding 4 and 6.
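The adding rule is simple enough to write as a few lines of Python. This is just a sketch of the pattern described above; `pascal_row` is a name I have made up.

```python
def pascal_row(previous):
    """Build the next row: 1 at each end, sums of neighbouring pairs in between."""
    middle = [a + b for a, b in zip(previous, previous[1:])]
    return [1] + middle + [1]

row = [1]                 # the row for (p + q)^0
for n in range(6):
    print(n, row)
    row = pascal_row(row)

# 0 [1]
# 1 [1, 1]
# 2 [1, 2, 1]
# 3 [1, 3, 3, 1]
# 4 [1, 4, 6, 4, 1]
# 5 [1, 5, 10, 10, 5, 1]
```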

An example question

A machine produces bolts, and has a chance of 0.1 of producing a faulty bolt. Calculate the probability that in a sample of 4 bolts, there will be

  1. exactly 2 faulty bolts
  2. at least 2 faulty bolts

In this case n = 4, p = 0.1 and q = 0.9.

  1. The probability of exactly 2 bolts being faulty is $6(0.1)^2(0.9)^2 = 0.0486$.

  2. "At least 2" means 2, 3 or 4 faulty bolts.
    The probability of exactly 3 bolts being faulty is $4(0.1)^3(0.9) = 0.0036$.
    The probability of all 4 bolts being faulty is $(0.1)^4 = 0.0001$.
    The probability of at least 2 faulty bolts = 0.0486 + 0.0036 + 0.0001 = 0.0523.
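The bolt calculation can be sketched in Python as follows. The coefficients are read off the fourth row of Pascal's Triangle, and `prob_faulty` is a helper name of my own.

```python
n, p, q = 4, 0.1, 0.9
coefficients = [1, 4, 6, 4, 1]        # the (p + q)^4 row of Pascal's Triangle

def prob_faulty(k):
    """Probability of exactly k faulty bolts in the sample of n.

    The term containing p^k sits at position n - k in the conventional order.
    """
    return coefficients[n - k] * p**k * q**(n - k)

exactly_two = prob_faulty(2)
at_least_two = prob_faulty(2) + prob_faulty(3) + prob_faulty(4)
print(round(exactly_two, 4), round(at_least_two, 4))   # 0.0486 0.0523
```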

It may seem strange defining a "success" as getting a faulty bolt rather than a well constructed one, but that is perfectly acceptable mathematically. If it bothers you, then you can redefine a "success" (corresponding to p) as a functioning bolt (so p = 0.9) and "failure" (corresponding to q) as a faulty bolt (so q = 0.1). If you carry out the mathematics, then you will find that the answers come out the same as they did before.
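If you would rather not carry out the mathematics by hand, this Python sketch works out "at least 2 faulty bolts" both ways round. The helper `term` and the names `version_a` and `version_b` are mine, chosen only for this illustration.

```python
n = 4
coefficients = [1, 4, 6, 4, 1]   # the (p + q)^4 row of Pascal's Triangle

def term(i, p, q):
    """Term number i of the expansion (counting from 0): coefficient x p^(n-i) x q^i."""
    return coefficients[i] * p**(n - i) * q**i

# Version A: "success" is a faulty bolt (p = 0.1), so we want 4, 3 or 2 successes,
# which are the first three terms
version_a = sum(term(i, 0.1, 0.9) for i in (0, 1, 2))

# Version B: "success" is a working bolt (p = 0.9), so we want 2, 3 or 4 failures,
# which are the last three terms
version_b = sum(term(i, 0.9, 0.1) for i in (2, 3, 4))

print(round(version_a, 4), round(version_b, 4))
```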

The Algebraic Form of the Binomial Distribution

You may see the Binomial Distribution written slightly differently in some textbooks, so some explanation is called for. Firstly we need to define an operator called factorial, written using the ! symbol.

0! = 1
1! = 1
2! = 2 x 1 = 2
3! = 3 x 2 x 1 = 6
4! = 4 x 3 x 2 x 1 = 24
5! = 5 x 4 x 3 x 2 x 1 = 120 etc.

The pattern for n! ("n factorial") is all the numbers from 1 to n multiplied together. You can also see that to get n!, you can multiply n by the previous factorial in the list, so 6! is 6 x 5! = 6 x 120 = 720, and 7! = 7 x 6! = 7 x 720 = 5040. 0! and 1! are both defined to be 1. As you can see, factorial numbers tend to be very large (after all, we have only reached 7!, and already we are half way to 10,000). However, there are ways that we can use to avoid having to calculate huge numbers.
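Here is a minimal Python sketch of that pattern, building each factorial by multiplying up from 1. (Python's standard library already provides `math.factorial`; the hand-written version below is only there to show the idea.)

```python
def factorial(n):
    """n! as all the numbers from 1 to n multiplied together."""
    result = 1                      # 0! is defined to be 1
    for k in range(1, n + 1):
        result *= k
    return result

for n in range(8):
    print(f"{n}! = {factorial(n)}")
```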

Now we need to define another operator called combination. This is written in one of several different ways, depending on which textbooks you read, but I think the most common way is as follows: ${}^nC_r$. C is the combination operator, and n and r are numbers. The combination operator gives the number of ways of arranging n items, which are all of the same type, except for r items which are of another type.

That needs an example. Suppose you had 16 beads, all black except for 3 that were red. There are ${}^{16}C_3$ different ways of arranging these beads. ${}^nC_r$ is defined as follows:

$${}^nC_r = \frac{n!}{r!\,(n - r)!}$$

In the case of the 3 red beads among the 16 beads in total, the number of ways of arranging these is:

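$${}^{16}C_3 = \frac{16!}{3! \cdot 13!} = \frac{16 \times 15 \times 14 \times 13 \times 12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \cdot (13 \times 12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1)}$$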

Phew! We are never going to be able to calculate such huge numbers, but fortunately, we don't have to! Most of those numbers will cancel from the numerator and denominator, as follows:

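$${}^{16}C_3 = \frac{16 \times 15 \times 14 \times 13 \times 12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \cdot (13 \times 12 \times 11 \times 10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1)} = \frac{16 \times 15 \times 14}{3 \times 2 \times 1} = \frac{3360}{6} = 560$$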
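The same cancelling trick can be sketched in Python, so that the huge factorials are never formed. The function name `combinations` is my own; from Python 3.8 onwards the standard library's `math.comb(16, 3)` gives the same figure.

```python
def combinations(n, r):
    """nCr worked out after cancelling (n - r)! from the top and bottom."""
    numerator = 1
    for k in range(n, n - r, -1):     # n x (n - 1) x ... down to (n - r + 1)
        numerator *= k
    denominator = 1
    for k in range(1, r + 1):         # r!
        denominator *= k
    return numerator // denominator

print(combinations(16, 3))   # 3360 // 6 = 560
```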

This means that there are 560 distinct (different) ways of arranging 13 black beads and 3 red ones. Consider the value of ${}^nC_r$ when n = 5 and r varies from 0 to 5:

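$${}^5C_0 = \frac{5!}{0!\,5!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{1 \cdot (5 \times 4 \times 3 \times 2 \times 1)} = 1$$

$${}^5C_1 = \frac{5!}{1!\,4!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{1 \cdot (4 \times 3 \times 2 \times 1)} = 5$$

$${}^5C_2 = \frac{5!}{2!\,3!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1) \cdot (3 \times 2 \times 1)} = 10$$

$${}^5C_3 = \frac{5!}{3!\,2!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \cdot (2 \times 1)} = 10$$

$${}^5C_4 = \frac{5!}{4!\,1!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(4 \times 3 \times 2 \times 1) \cdot 1} = 5$$

$${}^5C_5 = \frac{5!}{5!\,0!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(5 \times 4 \times 3 \times 2 \times 1) \cdot 1} = 1$$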

Hmm! 1, 5, 10, 10, 5, 1. Those figures seem familiar. Of course, they are the numbers in front of the terms of the binomial expansion. In fact, ${}^nC_r$ gives the coefficients (the numbers in front of the terms) for each possible expansion of $(p + q)^n$. We can rewrite the expansion of $(p + q)^5$, for instance, as follows:

$$(p + q)^5 = {}^5C_0\,p^5 + {}^5C_1\,p^4q + {}^5C_2\,p^3q^2 + {}^5C_3\,p^2q^3 + {}^5C_4\,pq^4 + {}^5C_5\,q^5$$
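To see the link with Pascal's Triangle for yourself, here is a short Python sketch using `math.comb`, the standard library's combination function (available from Python 3.8). It prints the combination values row by row and checks the adding rule for the n = 5 row.

```python
from math import comb

# Each row of combination values should match a row of Pascal's Triangle
for n in range(6):
    print(n, [comb(n, r) for r in range(n + 1)])

# The adding rule: each inner coefficient is the sum of the two diagonally above it
n = 5
rule_holds = all(comb(n, r) == comb(n - 1, r - 1) + comb(n - 1, r) for r in range(1, n))
print(rule_holds)
```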

Now we have a general pattern that describes every term. The rth term of the expression (where r starts at 0, not 1) is given by

$${}^nC_r\,p^{n-r}q^r$$

This expression gives the probability of getting r failures (not successes, note! Look at the value of r as you move from left to right along that expression above) among n trials. When r = 0 (i.e. the probability of no failures at all), the q term disappears completely. This is because $q^r = q^0$ and anything to the power of 0 must be 1. When r = n (i.e. the probability of all failures, no successes) the p term disappears completely ($p^{n-r} = p^{n-n} = p^0$).

In fact, you could easily rewrite that expression so that it gives the probability of r successes, rather than failures. Just ensure that the powers of p increase as r increases. A slight change is all that is needed:

$${}^nC_r\,p^r q^{n-r}$$

I'll let you work out for yourself why this change is a valid one.
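While you are thinking about it, here is a Python sketch of both versions of the general term so that you can compare their output. The function names are my own, and `math.comb` is the standard library's combination function (Python 3.8 or later).

```python
from math import comb

def prob_failures(r, n, p):
    """Probability of exactly r failures in n trials: nCr x p^(n-r) x q^r."""
    q = 1 - p
    return comb(n, r) * p**(n - r) * q**r

def prob_successes(r, n, p):
    """Probability of exactly r successes in n trials: nCr x p^r x q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 5, 0.3
# 2 failures out of 5 trials is the same event as 3 successes out of 5
print(round(prob_failures(2, n, p), 6), round(prob_successes(3, n, p), 6))
```

With the bead figures from earlier (n = 5, p = 0.3), both calls describe 3 red beads and 2 black beads, so they should agree.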

The expression $(p + q)^n$ is the sum of all the terms in the expansion starting with r = 0 and ending when r = n:

$$(p + q)^n = \sum_{r=0}^{n} {}^nC_r\,p^{n-r}q^r$$

There, if that doesn't impress people at parties, I don't know what will!
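For the sceptics at the party, the summation can be checked numerically. This Python sketch adds up every term for a few values of n and p that I picked arbitrarily; each total should come out as 1, apart from tiny floating-point rounding.

```python
from math import comb

def binomial_sum(n, p):
    """Add up nCr x p^(n-r) x q^r for r = 0, 1, ..., n."""
    q = 1 - p
    return sum(comb(n, r) * p**(n - r) * q**r for r in range(n + 1))

for n, p in [(1, 0.5), (5, 0.3), (4, 0.1), (20, 0.85)]:
    print(n, p, round(binomial_sum(n, p), 12))
```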

