Jumping to conclusions: Another exercise in manipulating data

Steve Daum, Barb Cleary

Jumping to conclusions based only on raw data: Without tools to interpret data, false conclusions are often drawn. And even charts can be misleading if they’re constructed without an understanding of these tools.

Figures don’t lie, we often hear. Seeing actual numbers as data can be convincing, leading to conclusions that are generally accepted as valid. If my friend says that she’s 5’10” in height, I may conclude that she’s tall. Of course, if I’m looking at her and she appears to be even shorter than my 5’3” stature, I will reject her data and draw a different conclusion. In this case, the data itself is flawed. More often, however, it is the interpretation of the data that is seriously off base. When I look at my friend, I can see both her data and the source of the data: my friend herself. Usually, however, when looking at a chart or a table of numbers in the media, we don’t get this opportunity to compare the raw data with the finished output.

Unfortunately, data can be manipulated, leading to false conclusions or encouraging leaps to conclusions that may not in fact be valid. Even charts with accurate data points can mislead if the wrong tools are used to interpret them. It’s worth remembering that although statistical data can give vital information, the temptation is sometimes to draw conclusions and articulate the meaning of a chart instead of collecting enough data to let it speak for itself.

Raw data is often transformed in order to present it in the media. This transformation can affect how it is interpreted. This may be intentional or it may be the result of using a visualization tool. In either case, the consumer is left with less context for meaningful understanding.

Let’s look at these two sets of data from the U.S. Bureau of Labor Statistics for state and local government employees and civilian workers.

The descriptions at the top help establish context. There are four columns of data. The first two columns show data for state and local government employees. The last two columns show the same data for civilian employees. The data set is a quarterly time series covering January 2007 through January 2017.

How we respond to data like this is rarely objective. Instead, our beliefs, biases, and history color our first impressions. For example, a knee-jerk response to this data might be, “Well, of course state and local government will spend more. After all, they have my taxes to spend.” Or maybe, “Insurance for government employees must be a lot better than my insurance.”

If the goal is to understand the data, perhaps to make sense of it for important decision making, our inclination to make swift judgements needs to be suppressed. Simply asking a few questions can help. Taking a cue from quality improvement practices, we might ask, “What is the operational definition of compensation?” or “What is the operational definition of health insurance?” For example, are dental and vision included? What about mental health, chiropractor visits, physical therapy? When comparing two data sets, it is important to consider the question, “Should this data even be compared?” The two sets may have been gathered under differing assumptions, resulting in an apples-to-oranges comparison.

Consider these two charts:

It is hard to believe, but the red line on the left represents the same data as shown in the chart on the right. How can this be? In the chart on the left, two different types of data are plotted on the same vertical scale. The blue line represents a percentage and the red line represents a dollar amount. This mistake distorts both lines: a single y-axis scale wide enough to hold both sets of values cannot show the variation in either one well, so the real variation that might be present is minimized. The two data values plotted on the left should not be shown on the same chart or, more precisely, they should not share the same vertical scale.
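The flattening effect of a shared vertical scale can be checked with simple arithmetic. The sketch below is only an illustration: the two series are invented values, not the BLS figures, and `axis_fraction` is a hypothetical helper written for this example. It measures how much of the axis height a series' own variation occupies.

```python
def axis_fraction(series, axis_min, axis_max):
    """Return the share of the axis height covered by the series' own range."""
    return (max(series) - min(series)) / (axis_max - axis_min)

# Hypothetical values invented for illustration (NOT actual BLS data).
percent_series = [10.0, 10.5, 11.0, 11.5, 12.0]  # a percentage
dollar_series = [4.00, 4.25, 4.50, 4.75, 5.00]   # a dollar amount

# A shared axis has to span the lowest and highest value of both series.
shared_min = min(percent_series + dollar_series)
shared_max = max(percent_series + dollar_series)

shared_pct = axis_fraction(percent_series, shared_min, shared_max)
shared_usd = axis_fraction(dollar_series, shared_min, shared_max)

# On its own axis, the dollar series can use the full height.
own_usd = axis_fraction(dollar_series, min(dollar_series), max(dollar_series))

print(f"percent series on shared axis: {shared_pct:.1%} of the height")
print(f"dollar series on shared axis:  {shared_usd:.1%} of the height")
print(f"dollar series on its own axis: {own_usd:.1%} of the height")
```

With these made-up numbers, the dollar series uses only one eighth of the shared axis, so the same change that fills a chart of its own looks nearly flat when it shares a scale with the percentage.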

Another distortion is seen in the chart on the right. When you squeeze the horizontal space available, the steepness of any trend, going up or going down, is exaggerated. Below, the same chart is shown with more horizontal space. The trend is still apparent but visually does not look as steep.
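The effect of horizontal space on apparent steepness can also be worked out directly. This is a minimal sketch under stated assumptions: the plot dimensions and the size of the rise are hypothetical, and `apparent_angle_deg` is a helper made up for this example. It computes the on-page angle of a trend line that runs across the full width of the plot area.

```python
import math

def apparent_angle_deg(rise_fraction, plot_width, plot_height):
    """On-page angle of a line spanning the full plot width while rising
    by rise_fraction of the plot height (width and height in the same units)."""
    return math.degrees(math.atan2(rise_fraction * plot_height, plot_width))

# Hypothetical trend: the line climbs half of the vertical axis.
rise = 0.5

# Same data, same height, two different widths (arbitrary units).
narrow = apparent_angle_deg(rise, plot_width=2.0, plot_height=4.0)
wide = apparent_angle_deg(rise, plot_width=8.0, plot_height=4.0)

print(f"narrow chart: {narrow:.1f} degrees")
print(f"wide chart:   {wide:.1f} degrees")
```

Under these assumptions the identical trend is drawn at about 45 degrees in the narrow chart and about 14 degrees in the wide one. The data has not changed, only the shape of the box it is drawn in.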

The chart below is more of an apples-to-apples comparison, since the same metric in the same units is compared for two populations on the same scale.

The Bureau of Labor Statistics offers its own warning: “Compensation cost levels in state and local government should not be directly compared with levels in private industry” (https://data.bls.gov/cgi-bin/surveymost). The site points out that differences in work activities, occupational structures, and professional and administrative support levels suggest that valid comparisons cannot be made. And yet the temptation to take figures at face value and make these comparisons is not only very real, but one to which many of us succumb.

Making comparisons without understanding how the two things being compared are both similar and different is very much like the proverbial comparison of apples to oranges, which are both fruit and are roughly round in shape. Beyond that, points of comparison will fail to give useful information.