Predicting the future: This article, originally published on QualityDigest.com, addresses the power of charts in providing useful predictions.
Predicting the future has been the stuff of folk tales for centuries. This century, however, has unprecedented access to predictive powers that no one in earlier times could have imagined.
The word predictive appears in many articles related to big data, analytics, artificial intelligence, and machine learning. The ability to make predictions has always been rewarded. Dr. Donald J. Wheeler says that “prediction is the essence of business.” With growing bodies of data and good analytical models, our predictions are getting better. You have undoubtedly experienced this as you type into the Google search box; with each new letter, a list of predictions shows up below. It can be a little unnerving to see what you are searching for appear after typing only a few characters!
The statistical models and algorithms behind prediction can be intimidating. Topics such as regression and neural networks can make the methods of prediction seem out of reach. However, a tool from statistical process control (SPC), the humble control chart, can put the power of useful prediction within your grasp.
Consider the following data series:
When presented in this form, what kind of predictions can you make about the response variable? You might note the scale of the chart and make this prediction:
Most of the data will be between 10 and 15.
The only basis you have for this prediction is data from a single month. Do you have much confidence in this prediction? More importantly, does this prediction provide any value? In other words, does it give you some business advantage to know this? The problem with this chart is that it gives you limited context for thinking about the underlying process, so it has little predictive value.
Consider the same data series with a mean line shown:
The chart now provides a single point of context for thinking about the data: how do the points compare to the average value? Despite your possible belief that all the neighbor kids are above average, some points will fall above the mean and some below it, and for roughly symmetric data the split is close to half and half. Given the addition of the mean line, we might stretch and make the following prediction:
The mean of the data will continue to be around 12.23.
How confident are we about this prediction? It might seem a little better than our previous prediction, but do we have any real evidence this will hold into the future? Once again, the construction of this chart leaves us with limited predictive power.
Here, the same data series is presented with a comparison to last year’s mean:
In this chart we are looking backwards, at last year’s mean, possibly to help us predict what will happen in the future. We might note visually that the mean was lower last year than it is in this first month of the year. Does this allow us to make any better predictions? For example, we might now make this prediction:
The mean this year will be higher than it was last year.
The prediction seems weak. If a lucrative contract required that you maintain this new, higher mean, would you bet your job on it? If so, you are taking a bit of a gamble.
In the chart below, we add an additional piece of data not considered so far, the target:
Until now, we may have been assuming that things are going at least okay with the underlying process. Suddenly, things look worse, as we see we are never meeting our target, at least not in this month. Does this addition of the target allow us to make a more useful or better prediction? We now might predict this:
We will not meet the target of zero in the future.
Wow. How good does that make you feel? Well, it actually doesn’t matter because this prediction is no better than our previous predictions. Because of the stark contrast between the target and what we are actually doing, this might feel like a better prediction, but it suffers the same problems as our previous predictions. We cannot be confident in this prediction. There is not enough context for making a well-informed analysis of this data series – at least not so far.
Next, we look at the same data in a control chart. Dr. Wheeler refers to these as process behavior charts, since they allow you to make predictions about the underlying process behavior:
In a control chart, we plot the mean and we also calculate and plot an upper and a lower limit. These limits are called control limits. The limits are calculated from the data points and represent (roughly) the mean plus 3 standard deviations (for the upper limit) and the mean minus 3 standard deviations (for the lower limit).
The addition of control limits, along with some rules for interpreting the control chart, now gives us the context we need to make useful predictions about the process.
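As a rough illustration of how such limits can be computed, here is a minimal Python sketch. It assumes an individuals (XmR) chart, a common choice when each point is a single measurement; the article does not say which chart type was used. In that chart the "3 standard deviations" are estimated from the average moving range using the conventional 2.66 scaling factor. The function name `xmr_limits` and the sample values are hypothetical and are not the data behind the charts in this article.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower_limit, upper_limit) for an individuals chart.

    Assumption: limits are mean +/- 2.66 * average moving range, the usual
    XmR convention. 2.66 is 3 / 1.128, where 1.128 is the constant that
    converts an average two-point moving range into a sigma estimate.
    """
    if len(values) < 2:
        raise ValueError("need at least two points to compute a moving range")
    center = mean(values)
    # Moving ranges: absolute differences between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


# Hypothetical daily values, used only to exercise the function.
sample = [12.1, 11.8, 12.9, 12.4, 11.5, 12.7, 12.0, 13.1, 11.9, 12.3]
center, lcl, ucl = xmr_limits(sample)
print(f"mean={center:.2f}  lower limit={lcl:.2f}  upper limit={ucl:.2f}")
```

Estimating the spread from point-to-point differences, rather than from the overall standard deviation, keeps a shift or trend in the data from inflating the limits and hiding itself.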
Some observations about this control chart:
No points fall outside the control limits.
There are no trends of 7 or more points in one direction.
There are no runs of 7 or more points on one side of the mean.
There are no other unusual patterns in the data points.
With these observations, we can say the process is in control. Alternatively, we might say the chart represents a stable process. This allows us to make the following, really useful, prediction:
Unless something in the process is changed, the probability is high that it will continue to produce results centered on this mean and varying within these limits.
In other words, we can predict this metric into the future. Wow! If satisfying your customer requires that you live up to predictions like this, then the humble control chart can be your friend.
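The first three checks listed above are mechanical enough to sketch in code. The Python below is one possible reading of them, not a standard implementation: it assumes a "trend" means 7 consecutive points that each rise (or each fall) relative to the previous point, and that a point exactly on the mean breaks a run. All function names are hypothetical, and the center line and limits are passed in rather than computed. The fourth check, "other unusual patterns," still calls for a human eye.

```python
def outside_limits(values, lcl, ucl):
    """Indexes of points beyond either control limit."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


def has_trend(values, length=7):
    """True if `length` consecutive points each rise, or each fall."""
    up = down = 1  # number of points in the current rising / falling streak
    for prev, cur in zip(values, values[1:]):
        up = up + 1 if cur > prev else 1
        down = down + 1 if cur < prev else 1
        if up >= length or down >= length:
            return True
    return False


def has_run(values, center, length=7):
    """True if `length` consecutive points sit on one side of the center."""
    above = below = 0
    for v in values:
        above = above + 1 if v > center else 0
        below = below + 1 if v < center else 0
        if above >= length or below >= length:
            return True
    return False


def looks_stable(values, center, lcl, ucl):
    """True when none of the three mechanical checks raises a signal."""
    return not (
        outside_limits(values, lcl, ucl)
        or has_trend(values)
        or has_run(values, center)
    )


# Hypothetical values, center line, and limits, for illustration only.
data = [12.0, 13.0, 11.0, 12.5, 11.5, 13.0, 12.0]
print(looks_stable(data, center=12.2, lcl=10.0, ucl=14.5))
```

A `True` result supports the "stable process" reading; any signal means the prediction about staying within the limits no longer applies.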
At this point a skeptical reader (maybe cautious is a better term) might ask this question:
What about data that is not in control?
Well, as we see in the example below, when viewing the data in a control chart context, you can still make valuable predictions. Consider the data in this chart – and think about the underlying process that might produce it:
Some observations about this chart:
Two points fall outside the control limits.
There is a run of 7 or more below the mean.
We can say that the underlying process is not in control. It is not stable. Since we are looking at this data through the eyes of a control chart, we can make this – still very useful – prediction:
We have no assurance that future data from this process will stay within these limits or be centered on this mean.
In other words, if we need to produce around a specified mean with a specified amount of variation, the process cannot do it unless some change is made to the process.
This is still a prediction you can make about the future and it still provides value. However, until you look at the data with a control chart, your predictions might as well be guesses made by throwing darts.
Control charts have been in use for almost 100 years. They are simple to construct and do not require deep math or statistical skills. They are powerful tools of prediction about the process being studied. If control charts are not part of your quality improvement toolbox, perhaps they should be.