We can divide mathematics into pure and applied maths. Pure maths is the theory of mathematics and is often very abstract. The work you have covered on algebra is mostly pure maths. Applied maths is all about taking the theory (that is, pure maths) and applying it to the real world. To be able to do applied maths, you first have to learn pure maths.
But what does this have to do with probability? Well, just as mathematics can be divided into pure maths and applied maths, statistics can be divided into probability theory and applied statistics. And just as you cannot do applied mathematics without knowing some theory, you cannot do statistics without first gaining some understanding of probability theory. Furthermore, just as it is not possible to describe what arithmetic is without describing what mathematics as a whole is, it is not possible to describe what probability theory is without some understanding of what statistics as a whole is about. In its broadest sense, statistics is about 'processes'.
A process describes how an object changes over time. For example, consider a coin. The coin by itself is not a process; it is simply an object. However, if I were to flip the coin (i.e. put it through a process), then after a certain amount of time (however long it takes to land) it reaches a final state. We usually call this final state 'heads' or 'tails', depending on which side of the coin lands face up, and it is this 'heads' or 'tails' that the statistician (a person who studies statistics) is interested in. Without the process there is nothing to examine. Of course, leaving the coin stationary is also a process, but we already know that its final state will be the same as its original state, so it is not a particularly interesting one. Usually, when we speak of a process, we mean one whose outcome is not yet known; otherwise there is no real point in analyzing it. With this understanding, it is easy to see precisely what probability theory is.
When we speak of probability theory as a whole, we mean the ways in which we quantify the possible outcomes of processes. Then, just as applied mathematics takes the methods of pure mathematics and applies them to real-world situations, applied statistics takes the methods of probability theory (i.e. the methods used to quantify the possible outcomes of processes) and applies them to real-world events in some way or another. For example, we might use probability theory to say that the coin-flip above has a 50% chance of coming up heads and a 50% chance of coming up tails, and then use statistics to apply this to a real-world situation by saying that if six coins are flipped, the most likely single result is that three coins come up heads and three come up tails. This, of course, may not happen, but if we were allowed to bet on only ONE outcome, we would probably bet on that one because it is the most probable. But here we are already getting ahead of ourselves, so let's back up a little.
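The six-coin claim can be checked directly. The sketch below assumes fair, independent coins and counts how many of the 2^6 = 64 equally likely heads/tails sequences give each number of heads; the function name `heads_distribution` is a hypothetical helper chosen for illustration.

```python
from fractions import Fraction
from math import comb


def heads_distribution(n_coins: int) -> dict[int, Fraction]:
    """Probability of each possible number of heads when n_coins fair coins are flipped.

    Assumes the coins are fair and independent, so all 2**n_coins sequences are equally likely.
    """
    total = 2 ** n_coins  # number of equally likely heads/tails sequences
    # comb(n, k) counts the sequences containing exactly k heads
    return {k: Fraction(comb(n_coins, k), total) for k in range(n_coins + 1)}


dist = heads_distribution(6)
for k, p in dist.items():
    print(f"{k} heads: {p}")

most_likely = max(dist, key=dist.get)
print("Most likely number of heads:", most_likely)
```

Three heads comes from 20 of the 64 sequences, so its probability is 20/64 = 5/16, roughly 31%. It is the single most probable result, yet it is still more likely *not* to happen, which is exactly why the bet above "may not" pay off.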
To quantify results, we may use a variety of methods, terms, and notations, but a few common ones are:
- a percentage (for example, '50%')
- a proportion of the total number of outcomes (for example, '5/10')
- a proportion of 1 (for example, '1/2')
You may notice that all three of the above examples represent the same probability. In fact, ANY method of quantifying probability is fundamentally based on the following procedure:
- Define a process.
- Define the total measure for all outcomes of the process.
- Describe the likelihood of each possible outcome of the process with respect to the total measure.
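The three-step procedure can be sketched in a few lines. This is only an illustration under an assumed representation: a process is modelled as a dictionary mapping each possible outcome to its share of the total measure, and `describe_likelihoods` is a hypothetical name.

```python
from fractions import Fraction


def describe_likelihoods(outcome_measures: dict[str, int]) -> dict[str, Fraction]:
    """Express each outcome's measure as a proportion of the total measure.

    outcome_measures is a hypothetical representation: it maps each possible
    outcome of a process to a non-negative share of the total measure.
    """
    total = sum(outcome_measures.values())  # step 2: the total measure
    if total <= 0:
        raise ValueError("the total measure must be positive")
    # step 3: describe each outcome with respect to the total measure
    return {outcome: Fraction(m, total) for outcome, m in outcome_measures.items()}


# step 1: define a process, here a single flip of a fair coin
coin_flip = {"heads": 1, "tails": 1}
likelihoods = describe_likelihoods(coin_flip)
print(likelihoods)  # {'heads': Fraction(1, 2), 'tails': Fraction(1, 2)}
```

Using `Fraction` rather than floating-point numbers keeps the proportions exact, so they always add up to exactly 1.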
The term 'measure' may be confusing, but you can think of it as a ruler. If we take a ruler that is 1 metre long, then half of that ruler is 50 centimetres, a quarter of it is 25 centimetres, and so on. The thing to remember, however, is that without the ruler it makes no sense to talk about proportions of a ruler! Indeed, the three examples above (50%, 5/10 and 1/2) represent the same probability; the only difference is how the total measure (the ruler) was defined. Thinking in terms of the ruler again, '50%' means '50/100': we are using 50 of the original 100 parts (centimetres) to quantify the outcome in question. '5/10' means that 5 of the original 10 parts (10-centimetre pieces) depict the outcome in question. And in the last example, '1/2' means we divide the ruler into two pieces and say that one of those two pieces represents the outcome in question. But these are all simply different ways of talking about the same 50 centimetres of the original 100 centimetres! In probability theory, we are only interested in proportions of a whole.
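The ruler argument can be checked mechanically: write each notation as a number of parts out of a total measure, then reduce it to a proportion of the whole. The sketch below uses Python's standard `fractions.Fraction`, which reduces fractions to lowest terms automatically; the `notations` table is illustrative data taken from the three examples.

```python
from fractions import Fraction

# Each notation names a number of parts and the total measure (the "ruler") they come from.
notations = {
    "50%": (50, 100),  # 50 parts of a ruler divided into 100 pieces (centimetres)
    "5/10": (5, 10),   # 5 parts of a ruler divided into 10 pieces (10-centimetre pieces)
    "1/2": (1, 2),     # 1 part of a ruler divided into 2 pieces
}

# Reduce each one to a proportion of the whole ruler.
proportions = {label: Fraction(parts, total) for label, (parts, total) in notations.items()}
for label, p in proportions.items():
    print(label, "->", p)  # every line ends in 1/2

# Only the proportion of the whole matters, so all three describe the same probability.
assert len(set(proportions.values())) == 1
```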
Although there are many ways to define a 'measure', the most common, and the easiest to generalize, is to use '1' as the total measure. So for the coin-flip we would say that (assuming the coin is fair) the likelihood of heads is 1/2 (i.e. half of one) and the likelihood of tails is 1/2. On the other hand, if we consider the process of not flipping the coin, then (assuming the coin was originally heads-side-up) the likelihood of heads is 1, while the likelihood of tails is 0. But we could equally have used '14' as the total measure and said that the likelihood of heads or tails on the coin-flip was '7 out of 14' each, while without the flip the likelihood of heads was '14 out of 14' and the likelihood of tails was '0 out of 14'. Similarly, for the throw of a fair six-sided die it may seem easiest to set the total measure to '6' and say that the likelihood of throwing a '4' is '1 out of 6', but usually we simply say that it is 1/6, i.e. '1/6 of 1'.
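Moving between a total measure of 1 and any other total measure is just multiplication. The sketch below rescales each probability from the paragraph (expressed as a proportion of 1) to a different total measure; `out_of` is a hypothetical helper name used only for illustration.

```python
from fractions import Fraction


def out_of(probability: Fraction, total_measure: int) -> Fraction:
    """Re-express a probability (a proportion of 1) as a share of another total measure."""
    return probability * total_measure


fair_flip_heads = Fraction(1, 2)  # fair coin-flip, heads
stationary_heads = Fraction(1)    # coin left heads-side-up and not flipped
stationary_tails = Fraction(0)
die_four = Fraction(1, 6)         # throwing a '4' on a fair six-sided die

print(out_of(fair_flip_heads, 14))   # 7  -> '7 out of 14'
print(out_of(stationary_heads, 14))  # 14 -> '14 out of 14'
print(out_of(stationary_tails, 14))  # 0  -> '0 out of 14'
print(out_of(die_four, 6))           # 1  -> '1 out of 6'
```

Dividing by the total measure reverses the conversion (for example, 7/14 is again 1/2), which is why a total measure of 1 is the natural common ground.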