Baseball may generate a greater array of statistics than any other sport. Aficionados love comparing home runs, hits, runs, doubles, triples, errors, batting averages, and other performance details, not only among individual players and teams but also against historic records. Sometimes they are collecting ammunition for a discussion with their brothers-in-law about who’s the best player or team, and how that player or team compares to record-breaking plays and players.
As with all statistics, sports statistics can be painfully distorted or innocently quoted to “prove” a point about a player or team. But statistics being statistics, they demand that any use of data respond to an appropriate question. Stats can be specious if the wrong question is being answered. Let’s see how this works. Can you answer this simple baseball question? (Feel free to look up statistics to support your response.)
“Who’s the best home run hitter in baseball history?”
- Barry Bonds
- Mark McGwire
- Alex Rodriguez
Look at this data:
Barry Bonds has hit 762 home runs—more than any other player in history.
But Mark McGwire has the highest career ratio of home runs to at-bats, at about 9.4 percent, or roughly one home run every 10.6 at-bats.
Alex Rodriguez has 688 home runs, at a rate of about 6.6 percent, but he will have more opportunities to better his record.
This is where your brother-in-law may take issue with your response to what may seem to be a simple question for someone who watches the game. You swear that Barry Bonds is the best; he insists on Mark McGwire. What happened?
The question itself is ambiguous. “Best” demands an operational definition and a more refined focus. If the question is “Who has hit the greatest number of home runs?”, the answer is clear. Likewise, if the question is “Which player has the highest ratio of home runs to at-bats?”, the answer is different but equally correct.
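A minimal Python sketch can make the point concrete: once “best” is written down as an explicit scoring rule, the answer depends on which rule is chosen. The players and career totals here are hypothetical, invented only for illustration, and the names `Hitter`, `definitions`, and `best_by` are assumptions of this sketch rather than anything from a real statistics package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hitter:
    name: str
    home_runs: int
    at_bats: int

    @property
    def hr_rate(self) -> float:
        # Home runs per at-bat: one possible operational definition of "best".
        return self.home_runs / self.at_bats


# Hypothetical career totals, invented for illustration only.
hitters = [
    Hitter("Slugger A", home_runs=700, at_bats=10_000),
    Hitter("Slugger B", home_runs=550, at_bats=6_000),
    Hitter("Slugger C", home_runs=650, at_bats=9_500),
]

# Each operational definition of "best" maps a hitter to a score.
definitions = {
    "most career home runs": lambda h: h.home_runs,
    "highest home runs per at-bat": lambda h: h.hr_rate,
}


def best_by(definition_name: str) -> Hitter:
    """Return the hitter with the top score under the named definition."""
    return max(hitters, key=definitions[definition_name])


for definition_name in definitions:
    print(f"{definition_name}: {best_by(definition_name).name}")
```

The same three records produce two different “best” hitters, and neither answer is wrong. Each one answers a different question.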
Before you stop inviting your brother-in-law to poker games, think about the ways in which this example may apply to the use of statistics in general. Meaningful interpretation of statistical data demands one essential step: that of asking the right question to get the information that you need. In manufacturing, of course, the questions will be different, but the same caveats apply.
For example, say that your company manufactures a roll of material that is 36 inches wide and is supposed to have a consistent thickness at the target value. “Thickness” needs a clear operational definition, and to this end, questions that may be posed include these:
- What is the average thickness along the length of the roll?
- What is the average thickness across the width of the roll?
- What is the thickness on the left side, middle, and right side of the roll?
- What is the thickness at the beginning of the roll compared to the end of the roll?
It may be that the thickness at the beginning of the roll met customer requirements, but by the end of the roll, the thickness had changed. “Thickness” should be defined with respect to the entire roll, since the customer’s expectations demand this.
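The questions above can each be turned into a separate calculation on the same set of measurements. The Python sketch below assumes a hypothetical sampling plan: readings at the left edge, middle, and right edge, taken at four points from the start to the end of one roll. The readings, the unit (mils), and the function names are all invented for illustration.

```python
from statistics import mean

# Hypothetical thickness readings in mils, invented for illustration.
# Each row is one sampling point along the roll, ordered from start to end;
# each row holds readings at the left edge, middle, and right edge.
readings = [
    (10.0, 10.1, 10.0),  # start of roll
    (10.0, 10.2, 10.1),
    (10.1, 10.3, 10.1),
    (10.3, 10.5, 10.4),  # end of roll
]
POSITIONS = ("left", "middle", "right")


def overall_average(rows):
    """Single average over every reading on the roll."""
    return mean(value for row in rows for value in row)


def average_by_position(rows):
    """Average at each cross-width position over the whole length."""
    return {pos: mean(row[i] for row in rows) for i, pos in enumerate(POSITIONS)}


def average_by_length_sample(rows):
    """Average across the width at each sampling point along the length."""
    return [mean(row) for row in rows]


def start_to_end_drift(rows):
    """Change in cross-width average from the first sample to the last."""
    return mean(rows[-1]) - mean(rows[0])


print("overall:", round(overall_average(readings), 3))
print("by position:", {k: round(v, 3) for k, v in average_by_position(readings).items()})
print("along length:", [round(v, 3) for v in average_by_length_sample(readings)])
print("start-to-end drift:", round(start_to_end_drift(readings), 3))
```

With these made-up numbers, a single overall average hides the fact that the roll gets thicker toward the end and runs thicker in the middle than at the edges. Which summary matters depends on the question being asked.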
Of course, other questions could also be raised, and each may call for a different data collection system to supply the information needed. In general, the more data that is collected, the greater the number and variety of analyses that can be applied. The customer undoubtedly wants the roll to be the same thickness across its width and throughout its length, and wants that consistent thickness to match the specified target value. It may be necessary to create a variety of chart types to respond to different questions. The essential part is knowing what questions need to be answered, and asking the right questions in order to collect the right data.
You may drive your brother-in-law crazy by insisting on the right questions when it comes to Barry Bonds, of course. Maybe it’s worth the risk.