Driverless analysis? Accurate predictions demand more than a chart

Barb Cleary

If you get off the highway and take an alternate route when traffic slows to one lane, you are making a prediction. Likewise, if you decide to invite someone to dinner, that too is a prediction. The scientific method? Predictive in nature. Every time you make a decision, you are making a prediction of an outcome, and choosing one over another based on this prediction.

Prediction skills become second nature through this daily application. These predictions may not be based on data or evidence; instead, they rest on subjective guesses about a preferred outcome. In the case of choosing a traffic route or a dinner invitation, it’s clear that not much data is involved. The decision involves subjective interpretations, intuitive hunches, and guesses about potential outcomes.

Will data analysis really enhance prediction accuracy? There are no guarantees without a certain amount of understanding of data, of variation, and of process performance.

Sometimes predictions fail, as they did colossally before the financial crisis of 2008, in spite of available data that might have suggested the economic downturn that ensued. Looking back, in fact, it appears that there was substantial data to hint at the possibility of this collapse, but it was apparently ignored in favor of more positive possibilities. Statistician and master of predictions Nate Silver suggests that something else is at work:  “We focus on those signals that tell a story about the world as we would like it to be, not how it really is” (Silver, 20).

While data analysis using control charts offers a path to understanding a process, it does not guarantee accuracy in predicting how a process will continue to work. While this statement may seem like heresy, there are indeed conditions that militate against accurate understanding of process performance. Three mistakes that are often made without complete understanding of data, variation, and process performance are misattribution of causes, insufficient data, and too little process knowledge.

1. Misattribution of causes: Assigning causes without getting the full picture

Noticing points out of control on a chart offers the opportunity to evaluate contributing causes to these points. Annotating charts to reflect this evaluation is a common practice. “New product released” may explain sales figures that are clearly outside control limits, and may stimulate a need in the long run to recalculate these limits, if the pattern persists. Points below control limits may be annotated as well: “Machine repair,” or “Power failure” can explain downtime that created an out-of-control situation.
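The flagging described above can be sketched in a few lines. This is a minimal illustration using an individuals (XmR) chart, where the limits are the mean plus or minus 2.66 times the average moving range; the sales figures and the "new product released" spike are hypothetical.

```python
# Sketch: flagging out-of-control points on an individuals (XmR) chart.
# The data are hypothetical; the 72 represents a "new product released" month.
monthly_sales = [52, 48, 50, 55, 49, 51, 47, 53, 50, 72, 54, 50]

mean = sum(monthly_sales) / len(monthly_sales)
moving_ranges = [abs(b - a) for a, b in zip(monthly_sales, monthly_sales[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

# Standard XmR-chart limits: mean +/- 2.66 * average moving range
ucl = mean + 2.66 * mr_bar
lcl = mean - 2.66 * mr_bar

out_of_control = [(i, x) for i, x in enumerate(monthly_sales) if x > ucl or x < lcl]
print(f"UCL={ucl:.1f}, LCL={lcl:.1f}, signals={out_of_control}")
```

Only the spike falls outside the limits; annotating why it did (and whether the cause is expected to persist) is the judgment the chart itself cannot supply.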

But without deeper analysis of a shift, the ability to predict future stability of the process is diminished; the shift in data generated by the process may have resulted from a variety of intermittent causes. For example, a downward trend might be attributed to an economic downturn when, coincidentally, a quality issue has started to affect new customers. When computing new limits, the reason for doing so is important and might be worth noting on the chart, so that a few months later, when the economy starts to boom but your sales do not, you can go back and examine your assumptions.

What else had been happening at that moment that may have affected the process? It may be easy to apply the rules that help identify instability in the process—patterns of data points, for example. But assuring stability in the future demands thoughtful attention to the forces that have contributed to changes.

2. Too little data: Jumping to conclusions without sufficient support

While we may know that one can’t predict next week’s weight based only on this week’s, we tend to do this anyway. Panicking at the sight of several data points out of control invites tampering with a process without sufficient evidence for change. This mentality not only bases predictions on only a few data points, but can inspire actions that are unproductive and even harmful.

The same is true in process management; tampering with a process because a few data points seem to suggest an unfavorable outcome is commonplace. We’ve all adjusted the thermostat in a room when a single observation suggests that it’s colder than usual, instead of collecting data over time, identifying trends, and acting on those trends to create comfortable temperatures throughout the day. A knee-jerk response to perceived cold in a room may lead to another adjustment that creates an environment that is too warm. The common wisdom is that more than a handful of data points—some say at least 25—must be recorded before an analysis can be said to be accurate.
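A quick simulation makes the "at least 25" guideline concrete. The sketch below (entirely simulated data, with assumed parameters) computes XmR limits from non-overlapping samples of a stable process and shows that the width of the limits wanders far more when each sample holds only 5 points than when it holds 25 or 100.

```python
# Sketch: how control limits computed from too few points wander.
# Purely simulated stable process; sample sizes are illustrative.
import random

random.seed(1)

def xmr_limits(data):
    """Individuals-chart limits: mean +/- 2.66 * average moving range."""
    mean = sum(data) / len(data)
    mr_bar = sum(abs(b - a) for a, b in zip(data, data[1:])) / (len(data) - 1)
    return mean - 2.66 * mr_bar, mean + 2.66 * mr_bar

process = [random.gauss(50, 2) for _ in range(1000)]

spreads = {}
for n in (5, 25, 100):
    widths = []
    for start in range(0, 1000 - n, n):
        lcl, ucl = xmr_limits(process[start:start + n])
        widths.append(ucl - lcl)
    spreads[n] = max(widths) - min(widths)
    print(f"n={n:3d}: limit width varies by {spreads[n]:.1f} units across samples")
```

The process never changes, yet limits built on 5 points swing widely from sample to sample: a recipe for false alarms and tampering.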

What has become known as Big Data is able to identify patterns of behavior—from how many steps per day you take to what clothing styles you prefer—because these predictions are based on mountains of data collected from fitness devices or web shopping sites. After all, buying a red car and a red dress and planting red carnations in the same week does not suggest that your future purchases will all be red. If, on the other hand, the entire city of San Diego has made the same kind of purchases, there may be something there.

3. Not enough knowledge about the process

However, just having a lot of data is not the Holy Grail. In one example, two competing machine learning models were looking at the same pile of data. A small tweak in initial settings generated accurate predictions from one model and specious ones from the other. This points to another possible mistake: not having the right knowledge about the algorithms and the data, which can lead to a wrong decision or application. In terms of control charts, this might equate to calculating limits with data that is not in chronological order. The resulting control limits and charts will not reflect the reality of any process that produces results over time.
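The chronological-order point is easy to demonstrate. XmR limits are built from the differences between successive points, so re-sorting the data first artificially shrinks the moving ranges and tightens the limits; the readings below are hypothetical.

```python
# Sketch: why time order matters for control limits.
# XmR limits use successive differences, so sorting the data first
# shrinks the moving ranges and produces limits that are far too tight.
data = [50, 47, 53, 49, 55, 46, 52, 48, 54, 51]  # hypothetical readings, time order

def xmr_width(values):
    """Total width (UCL - LCL) of individuals-chart limits."""
    mr_bar = sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)
    return 2 * 2.66 * mr_bar

print(f"limits width, time order: {xmr_width(data):.1f}")
print(f"limits width, sorted:     {xmr_width(sorted(data)):.1f}")
```

With the sorted data, every moving range collapses to 1, and the resulting limits would flag perfectly ordinary points as out of control.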

Knowledge of control charts depends on understanding and interpreting meaning in their patterns. Without knowing that a pattern of most data points falling near the mean is a signal to investigate further, one might conclude that this pattern is a positive one. If all points hug the same level near the centerline, the pattern is far from healthy, as an understanding of common-cause variation will indicate. Basic statistical tools are essential to predicting process outcomes, and understanding variation lies at the heart of this process.
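A simple check can spot the deceptively "good-looking" pattern described above. One common formulation (sometimes called the stratification test) flags a long run of consecutive points within one sigma of the centerline; the run length, sigma, and data below are all illustrative assumptions.

```python
# Sketch: detecting points that hug the centerline (stratification).
# One common rule of thumb flags 15 consecutive points within one
# sigma of the mean; the constants and data here are illustrative.
def hugs_centerline(data, mean, sigma, run=15):
    streak = 0
    for x in data:
        streak = streak + 1 if abs(x - mean) < sigma else 0
        if streak >= run:
            return True
    return False

# Hypothetical: limits built assuming sigma = 3, but every recent
# point falls within a fraction of a unit of the mean of 50.
recent = [50.2, 49.8, 50.1, 49.9] * 4  # 16 points hugging the mean
print(hugs_centerline(recent, mean=50, sigma=3))
```

A run like this is a signal to investigate, not to celebrate: it often means the limits were computed from mixed or mismatched data rather than that the process suddenly improved.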

While we may soon find ourselves in driverless cars, happily trusting the built-in technology of the machine to get us where we want to go, it’s important to remember that control charts are never driverless. To make accurate predictions about a process, it will take an understanding of the process, the data, and the statistical tools that support analysis and assure accurate predictions.
