Let's consider the normal distribution graph that you saw in the Introduction to Normal Distributions. You'll remember that the horizontal axis indicates the heights of people in a population and the vertical axis indicates the number of people who have a certain height.
The mean height was the central point, 4.75 feet, with the curve distributed evenly to either side of it.
What proportion of the population has a height greater than 4.75 feet? Well, clearly exactly half of the population is taller, as shown by the shaded area in this graph.
Similarly, exactly half the population is shorter.
What proportion of the population is exactly the mean height, 4.75 feet? Well, nobody in fact! You may think that you are exactly 4.75 feet tall, but your height has only been measured to a certain degree of accuracy (e.g. you may be 4.75 feet to the nearest inch, or to the nearest centimetre or to the nearest millimetre). Even if you measured heights to an accuracy of half the size of an atom, we could still say that you weren't exactly 4.75 feet tall. We could always say that you were taller or shorter.
What proportion of the population is taller than 6 feet? Well, this question is a lot harder to answer. The proportion is the shaded area on the graph. It looks as if it is roughly 10% to me. Fortunately, there is a way to calculate it more exactly.
In order to work out the shaded area as a proportion of the entire area we need to know the mean value and the standard deviation. We need to know the mean so that we can compare it with 6 feet. After all, if 6 feet is greater than the mean (which it is in this case), then the proportion will be different to what it would be if 6 feet were less than the mean. The one thing that we don't actually need to know is the height of the graph. The proportion of the total area would be the same whether the graph were tall or short!
The values published are for the Standard Normal curve, which always has the mean value at 0 (this explains why there is a vertical axis slicing the graph in half at the mid point) and a standard deviation of 1. The way we represent this is as N(0,1) (where the first number represents the mean and the second the standard deviation).
I have included the table on another page. The table is complicated, and extracting the area that you want from it takes a bit of practice and a brave heart, so I have included a guide beneath the table that explains exactly how it is used.
So how does this help us with the "taller than 6 feet" problem? Well, there is a simple formula that allows you to translate any figure on any normal distribution graph into an equivalent figure on the standard normal distribution graph. After that, we can simply look up the correct figure in the table, and that will give us the proportion we want.
Here's the formula. In the following equation, m stands for the mean (central) value, s stands for the standard deviation, and x is the value which you are trying to translate.
$$Z = \frac{x - m}{s}$$
This is called "calculating the Z score" for a number, which is why we represent it by the letter Z. In the case of the heights of people, we would need to know the mean (4.75 ft) and the standard deviation (let's call it 1.5 ft) in order to produce a Z score for the height in which we are interested (6 ft). The Z score comes out as
$$Z = \frac{x - m}{s} = \frac{6 - 4.75}{1.5} = 0.833 \text{ (to 3 decimal places)}$$
Now we go to the table and look up 0.833 (or rather, look up 0.83, as the table only allows you to look up numbers to 2 decimal places) to get the value 0.7967. However, this is the area from minus infinity to 6 ft (i.e. the area left unshaded in the diagram above), so we need to subtract it from 1 to get the shaded area: 1 - 0.7967 = 0.2033. This means that 0.2033 or 20.33% of the population are taller than 6 feet. My estimate of 10% was rather inaccurate!
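As a cross-check on the table lookup, the same calculation can be sketched in Python. This is only a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist` class; the variable names are my own. Because the computer does not have to round the Z score to 2 decimal places, its answer (about 0.2023) differs slightly from the 0.2033 that the table gives.

```python
from statistics import NormalDist

mean = 4.75      # mean height in feet (m in the formula)
std_dev = 1.5    # standard deviation in feet (s in the formula)
height = 6.0     # the value we want to translate (x in the formula)

# Z = (x - m) / s
z = (height - mean) / std_dev

standard_normal = NormalDist(mu=0, sigma=1)

# cdf(z) is the area from minus infinity up to z, like the value in the table,
# so we subtract it from 1 to get the area to the right of z.
proportion_taller = 1 - standard_normal.cdf(z)

# The same lookup with Z rounded to 2 decimal places, as a printed table requires.
proportion_taller_table = 1 - standard_normal.cdf(round(z, 2))

print(f"Z = {z:.3f}")
print(f"Proportion taller than 6 ft (unrounded Z): {proportion_taller:.4f}")
print(f"Proportion taller than 6 ft (Z = 0.83):    {proportion_taller_table:.4f}")
```

The small gap between the two figures comes entirely from rounding the Z score before the lookup.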
We have an alternative way of thinking about this figure, and that's as a probability. Instead of asking ourselves "What proportion of people are taller than 6 feet?" we could ask "Suppose I chose someone at random from the population. What is the probability that this person would be taller than 6 feet?" Clearly, since 20.33% of the population fall into this category, the probability would be 20.33%. We can interpret the figure that we get either as a proportion of the population or as a probability.
The translation into a Z score works by panel-beating our normal distribution until it fits over the standard distribution exactly. Firstly we subtract the mean from the number we are interested in. This has the effect of moving our distribution down the axis until the mean sits on the y-axis. (Of course, if our mean was originally negative - unlikely, but technically possible - then subtracting our mean from any value would still shift the mean until it was at 0). Then we divide the value by the standard deviation. This has the effect of altering the width of the curve. Distributions with a large standard deviation ("fat" distributions) are slimmed down, and distributions with a small standard deviation (less than 1, i.e. "thin" distributions) are widened.
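To see that the shift and rescale really do lay our curve over the standard one, we can compare areas on both curves. The sketch below assumes Python 3.8 or later and the standard-library `statistics.NormalDist` class, and the sample heights in the loop are arbitrary values I picked for illustration. For each height it computes the area below that height on our original distribution, and the area below the matching Z score on N(0,1).

```python
from statistics import NormalDist

heights = NormalDist(mu=4.75, sigma=1.5)   # our original distribution
standard = NormalDist(mu=0, sigma=1)       # the standard normal, N(0,1)

for x in (3.0, 4.75, 6.0, 7.5):
    # Subtract the mean (slide the curve along the axis),
    # then divide by the standard deviation (slim or widen it).
    z = (x - heights.mean) / heights.stdev
    area_original = heights.cdf(x)
    area_standard = standard.cdf(z)
    print(f"x = {x:>4}: Z = {z:+.3f}, "
          f"area below x = {area_original:.4f}, area below Z = {area_standard:.4f}")
```

The two areas agree for every value, which is why a single table for N(0,1) is enough.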
In Great Britain, waiting times for non-urgent operations are normally distributed with a mean time of 18 months and a standard deviation of 4 months. What proportion of people have to wait between 6 months and a year for their operations?
This one is a little more complicated as we are trying to find a range of values by looking up two figures in the table - one corresponding to 6 months, the other corresponding to a year (12 months). The table doesn't let us look up areas for ranges between one value and another, so we have to use a bit of common sense.
The first thing to do is to translate both the figures into Z scores:
$$Z = \frac{6 - 18}{4} = -3$$

$$Z = \frac{12 - 18}{4} = -1.5$$
Can we look up these figures in the table? Well, no. Both Z scores are negative (-3 and -1.5), and the table only lets us look up +3 and +1.5. This doesn't really matter: the curve is symmetrical, so the area between -3 and -1.5 is the same as the area between +1.5 and +3, and we are only interested in the area between these values. When we get the two areas from the table, we simply subtract the smaller area from the larger one, and that tells us the proportion of the people that we want.
The table gives us 0.9332 for the value 1.5 and 0.9987 for the value 3, so the difference between these figures is 0.9987 - 0.9332 = 0.0655, or 6.55%. This means that 6.55% of the people waiting for non-urgent operations have to wait between 6 months and a year, and that if we chose someone at random from the people waiting for non-urgent operations, we would have a probability of 0.0655 or 6.55% of choosing someone who had to wait between 6 months and a year.
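The range calculation can be checked with a short Python sketch, again assuming Python 3.8 or later and the standard-library `statistics.NormalDist` class, with variable names of my own choosing. It works out the area both ways: directly from the negative Z scores, and from the mirrored positive values that a table would use.

```python
from statistics import NormalDist

mean_wait = 18   # mean waiting time in months
std_wait = 4     # standard deviation in months

z_low = (6 - mean_wait) / std_wait     # Z score for 6 months
z_high = (12 - mean_wait) / std_wait   # Z score for 12 months

standard = NormalDist()  # N(0,1)

# Directly: area below the upper Z score minus area below the lower one.
direct = standard.cdf(z_high) - standard.cdf(z_low)

# By symmetry: flip the signs and subtract the smaller area from the larger.
mirrored = standard.cdf(-z_low) - standard.cdf(-z_high)

print(f"Z scores: {z_low} and {z_high}")
print(f"Proportion waiting 6 to 12 months: {direct:.4f} (mirrored: {mirrored:.4f})")
```

Both routes give about 0.0655, matching the figure from the table.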
A machine packs nails into boxes, such that there is a mean number of 500 nails in each box. The distribution of the nails in the boxes is normal, with a standard deviation of 12 nails. Calculate the probability that a box chosen at random has 505 nails in it.
We could simply turn the value 505 into a Z score and then look it up in the table, but this would give us a range (up to 505 or so nails). What's the proper way to go about doing this?
The trouble with the normal distribution curve is that the mathematics doesn't realise that there are some situations where decimal numbers simply aren't appropriate. The machine could pack 505 nails into a box, or 506, or 504 (or more than 506 or fewer than 504 for that matter), but it couldn't pack 504.5 nails, or 506.1552 nails. However, the theory doesn't stop us from translating these stupid values into Z scores and calculating probabilities from them.
All we have to do is to round the ranges to the nearest whole number. The machine can't pack 504.8 nails into a box (although we can work out a figure for that), so we will assume that 504.8 represents 505 nails. Similarly, we can assume that 505.2 nails represents 505 nails. Indeed, we can take any value between 504.5 and 505.5 (including 504.5 but not 505.5 as that would round up to 506) and take it to represent 505.
So we convert the numbers 504.5 and 505.5 into Z scores ...
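$$Z = \frac{504.5 - 500}{12} = 0.375$$

$$Z = \frac{505.5 - 500}{12} = 0.458 \text{ (to 3 decimal places)}$$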
... and look up these values in the table (well, we can only look up 0.38 and 0.46 in the table - that's as close as we can get!) to give the numbers 0.6480 and 0.6772. The probability that we want is the difference between these (the shaded area in the diagram above): 0.6772 - 0.6480 = 0.0292, or 2.92%. We can say that 2.92% of the boxes have 505 nails in them, and that the chance of a randomly chosen box containing 505 nails is 2.92%.
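Here is a hedged Python sketch of the same rounding trick, assuming Python 3.8 or later and the standard-library `statistics.NormalDist` class; the names are my own. It computes the probability twice: once with the unrounded Z scores and once with the two-decimal values 0.38 and 0.46 that the table forces on us.

```python
from statistics import NormalDist

nails = NormalDist(mu=500, sigma=12)
target = 505

# Any value from 504.5 up to 505.5 is taken to represent 505 nails.
lower, upper = target - 0.5, target + 0.5

z_lower = (lower - nails.mean) / nails.stdev   # 0.375
z_upper = (upper - nails.mean) / nails.stdev   # about 0.458

standard = NormalDist()  # N(0,1)

# Area between the two unrounded Z scores.
prob_unrounded = standard.cdf(z_upper) - standard.cdf(z_lower)

# Area between the rounded Z scores that a 2-decimal table allows.
prob_table = standard.cdf(0.46) - standard.cdf(0.38)

print(f"Unrounded Z scores: {prob_unrounded:.4f}")
print(f"Table Z scores:     {prob_table:.4f}")
```

With unrounded Z scores the probability comes out at roughly 0.0305 rather than 0.0292. The difference arises because rounding 0.375 up to 0.38 and 0.458 up to 0.46 narrows the strip slightly.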
Of course, this fudge only applies to situations where the outcomes must be whole numbers. If you were trying to calculate the probability of finding 2 children in a family using the normal distribution, then you would be entitled to round the ranges as shown. If you were trying to find the probability of a piece of string being a certain length, then you wouldn't need to do any rounding (as a piece of string can be any length, including decimals).
A 95% confidence interval is a range of values, centred around the mean, that contains 95% of the items in the entire population. It is the shaded region on the diagram. It is normally quoted using the two values at the left and right end of the region. 95% of the entire curve is shaded, and 5% is left unshaded (the two "tails" on the right and left sides of the diagram).
Since the region has the mean at its centre, it must be symmetrical, and there must be 2.5% unshaded at each end of the diagram. The end values of the 95% confidence interval are found easily by multiplying the standard deviation by a critical number, which has the value 1.96. This gives the distance from the mean of each of the end values. The lower limit is found by subtracting this distance from the mean, the upper limit by adding the distance to the mean.
For instance, suppose we wanted to calculate a 95% confidence interval for a normal distribution in which the mean was 70 and the standard deviation was 10. We would multiply 10 by the critical number 1.96 (to get 19.6). Then we would subtract 19.6 from 70 to get the lower limit, i.e. 50.4, and add 19.6 to 70 to get the upper limit, i.e. 89.6. This gives us a confidence interval of 50.4 to 89.6, i.e. 95% of the items that contributed towards this normal distribution lie in the region 50.4 to 89.6.
Mathematically we can represent the confidence interval as

$$m \pm 1.96\,s$$

where m is the mean, s is the standard deviation, and the symbol ± means that you subtract for the lower limit and add for the upper limit.
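The interval calculation is simple enough to sketch in a few lines of Python. The function name `central_interval` is hypothetical, something I made up for this illustration, and the check at the end assumes Python 3.8 or later with the standard-library `statistics.NormalDist` class.

```python
from statistics import NormalDist

def central_interval(mean, std_dev, critical=1.96):
    """Return (lower, upper) limits at `critical` standard deviations from the mean."""
    distance = critical * std_dev      # distance of each end value from the mean
    return mean - distance, mean + distance

lower, upper = central_interval(70, 10)
print(f"95% interval: {lower:.1f} to {upper:.1f}")

# Check: the area between the two limits should be very close to 95%.
dist = NormalDist(mu=70, sigma=10)
coverage = dist.cdf(upper) - dist.cdf(lower)
print(f"Area between the limits: {coverage:.4f}")
```

For a mean of 70 and a standard deviation of 10 this reproduces the limits 50.4 and 89.6 from the worked example.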
You may be asking "What's so special about the number 1.96?" Well, that's just the way the mathematics comes out. If you look for the Z score 1.96 in the table you will find that it translates to the value 97.5% (or 0.975). This means that the area from minus infinity to 1.96 on the standard normal curve contains 97.5% of the area, and there is 2.5% of the area to the right of this figure. However, we must remember that in this case there is 2.5% of the area at both the extreme right and the extreme left.
Of course, we are only using the figure 1.96 because we want a 95% confidence interval. If we wanted a different interval (99% is the next most commonly used value), then we would use a different critical figure, 2.58. However, 95% confidence intervals are so common that I would recommend that you commit the number 1.96 to memory.
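If you ever forget a critical number, it can be recovered by working backwards from the area. The sketch below assumes Python 3.8 or later and the `inv_cdf` method of the standard-library `statistics.NormalDist` class; the helper name `critical_value` is my own invention. It takes the unshaded area, splits it equally between the two tails, and asks which Z score has everything except the right-hand tail to its left.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Z score that leaves (1 - confidence) / 2 of the area in each tail."""
    tail = (1 - confidence) / 2              # e.g. 2.5% in each tail for 95%
    return NormalDist().inv_cdf(1 - tail)    # Z score with area (1 - tail) to its left

for level in (0.95, 0.99):
    print(f"{level:.0%} interval: critical number = {critical_value(level):.2f}")
```

To 2 decimal places this gives 1.96 for a 95% interval and 2.58 for a 99% interval.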