
# Author Archives: Matt Savage

# Excluding the unwanted: Data that doesn’t belong

Do you have data that’s an anomaly or special cause that you want to exclude from your analysis? Do you want the ability to temporarily exclude certain data from your data set analysis? Special causes and outlying data can occur in any data collection process. Learn how to handle these situations easily.

If you are calculating and charting average weekly temperature in a room, but find that one night the thermostat has been inadvertently set to 90 degrees, how will that data point affect your average? Clearly, the answer is that it would create a false sense of a higher temperature average for the week, and in effect create a misleading report.

There may be times when it is appropriate to leave out irregular data points such as this one without feeling that you’re “cheating” on the numbers. There may also be special causes that have been addressed and no longer apply to the calculations, and you’d like to remove these. Or perhaps you’d like to exclude certain pieces of data temporarily to analyze their effect on the calculations. Having the flexibility to exclude anomalous data from your control chart helps you focus on your process and removes extra noise from the chart and the calculations.
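As a rough illustration of the thermostat example, here is a minimal Python sketch. The readings and the `excluded` set are made up for this example; the idea is only to show how one flagged point shifts the weekly average, and that excluding it temporarily leaves the original data untouched.

```python
from statistics import mean

# Hypothetical nightly room temperatures (degrees F) for one week.
# The 90 is the night the thermostat was set incorrectly.
readings = [68, 69, 67, 90, 68, 70, 69]

# Index positions flagged as special causes (assumed, for illustration).
excluded = {3}

# Build a filtered view instead of deleting data, so the exclusion
# can be reversed later.
kept = [t for i, t in enumerate(readings) if i not in excluded]

avg_all = mean(readings)
avg_kept = mean(kept)

print(f"Average with all points:      {avg_all:.1f}")
print(f"Average with point excluded:  {avg_kept:.1f}")
```

Keeping the exclusion as a flag rather than a deletion makes it easy to put the point back and compare the two results.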

# The differences between control limits and spec (specification) limits

The differences between control limits and spec (specification) limits may seem irrelevant or nonexistent to those outside process production, but the gulf between them is huge. In fact, they are two entirely different animals.

Spec limits may be designated by a customer, engineer, etc., indicating the allowable spread of a given measurement. Control limits, on the other hand, emerge from the process. The process data determine where the control limits fall and help reveal whether the process is stable.
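To make "emerge from the process" concrete, here is a small sketch that computes control limits for an individuals chart from the data alone, using the common moving-range formula (mean ± 2.66 × average moving range). The measurements are hypothetical, and the function name `individuals_limits` is ours, not from any particular software package.

```python
def individuals_limits(values):
    """Return (LCL, center line, UCL) for an individuals (X) chart.

    Uses the standard constant 2.66 (that is, 3 / 1.128) applied to the
    average moving range of consecutive points.
    """
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width


# Hypothetical measurements from a process.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0]
lcl, cl, ucl = individuals_limits(data)
print(f"LCL={lcl:.4f}  CL={cl:.4f}  UCL={ucl:.4f}")
```

Notice that no specification appears anywhere in the calculation. Spec limits would be supplied separately, by whoever sets the requirement.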

If you are tempted to use spec limits as control limits, the advice from process engineers and statisticians alike is simple: Don’t.

For an X-bar chart, for example, such as the one illustrated below, all of the X-bar values are well within the designated spec limits. Things are fine, right?

Not so fast. Remember that an “X-bar” is an average. And as PQ Development Manager Steve Daum points out, if you put one foot in a bucket of ice water and the other into extremely hot water, the average water temperature may be perfectly temperate, indicating a comfortable situation. In fact, the average does not reflect the range of the separate data points, one of which might be 33 degrees Fahrenheit, and the other 180 degrees. Comfortable? Probably not.

A histogram of the same process offers a much clearer picture of the reality of this process (see chart), with some data values well outside the specification limits, indicating an unacceptable result.

When it comes to control limits, let your data do the talking. Don’t confuse information from the process with requirements for the process.
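The bucket analogy can be checked with a few lines of Python. In this sketch the spec limits and the subgroup values are invented for illustration; the point is that every subgroup average can sit inside the spec limits while individual values fall outside them.

```python
# Hypothetical spec limits (assumed for this example only).
LSL, USL = 60, 110

# Each inner list is one subgroup; the first is the "two buckets" case.
subgroups = [[33, 180], [85, 95], [70, 100]]

# X-bar values: one average per subgroup.
xbars = [sum(g) / len(g) for g in subgroups]
xbars_in_spec = all(LSL <= m <= USL for m in xbars)

# Individual values that violate the spec limits.
out_of_spec = [x for g in subgroups for x in g if not LSL <= x <= USL]

print("Subgroup averages:", xbars)
print("All averages within spec?", xbars_in_spec)
print("Individual values out of spec:", out_of_spec)
```

A histogram of the individual values exposes the same problem at a glance.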

# Easy capability in five steps

Any data that is in measurement form, such as time to complete a task, can be converted to a capability index, provided at least one specification exists. So the recipe for a capability study is:

- Define what you will measure.
- Measure it consistently.
- Create a control chart, such as an individuals and moving range chart, from the data and ensure that the process is predictable. Most quality and patient safety departments have already completed these first three steps.
- Determine the upper and/or lower specification limit.
- Calculate the capability values. Software can greatly aid in performing the calculations and determining the percent of data expected beyond your limit.

The math is not complex. Two common capability indices are typically computed: Ppk and Cpk. For more on these indices, and responses to frequently asked questions about capability, visit this article in our Quality Advisor.
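For readers who want to see the arithmetic, here is a minimal sketch of the two indices for individuals data. It assumes the usual textbook definitions: Cpk uses a within-process sigma estimated from the average moving range (divided by 1.128), and Ppk uses the overall sample standard deviation. The data, the spec limits, and the function name `capability` are hypothetical.

```python
from statistics import mean, stdev


def capability(values, lsl=None, usl=None):
    """Return Cpk and Ppk for individuals data with one or two spec limits."""
    if lsl is None and usl is None:
        raise ValueError("at least one specification limit is required")
    xbar = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_within = mean(moving_ranges) / 1.128   # short-term estimate
    sigma_overall = stdev(values)                # long-term estimate

    def index(sigma):
        # Distance from the mean to each available limit, in 3-sigma units.
        distances = []
        if usl is not None:
            distances.append((usl - xbar) / (3 * sigma))
        if lsl is not None:
            distances.append((xbar - lsl) / (3 * sigma))
        return min(distances)

    return {"Cpk": index(sigma_within), "Ppk": index(sigma_overall)}


# Hypothetical task times with only an upper specification limit.
times = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0]
result = capability(times, usl=11.0)
print(result)
```

The two indices differ only in how sigma is estimated, which is why they can disagree when a process drifts over time.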

# Save time by creating your own defaults

“Easy to use right out of the box,” is what *SQCpack* users say about the software. It can be even easier if you customize default settings to align better with your specific needs.

Among the ways you can customize *SQCpack* are by changing default chart settings and altering entry preferences. Like the software itself, customizing defaults is easy to do and will save you time in the future.

When you select a default chart icon in *SQCpack*, the chart that appears represents default values; that is, it reflects common charting patterns that have been selected by the program’s developers.

# The capability index dilemma: Cpk, Ppk, or Cpm?

Lori, one of our customers, phoned to ask if Cpk is the best statistic to use in a process that slits metal to exacting widths. As a technical support analyst, I too wondered what index would be best suited for her application. Perhaps Cpk, Ppk, Cpm, or some other index offers the best means of reporting the capability of her product or process.

Lori’s process capability index, Cpk, has never dipped below 2 and typically averages above 3. Given this high degree of capability, she might consider reducing variation about the target. While Cpk and Ppk are well accepted and commonly used indices, they may not provide as much information as Lori needs to continue to improve the process. This is especially true if the target is not the midpoint of the specifications.

Cpm incorporates the target when calculating the standard deviation. Like the sigma of the individuals formula, the Cpm sigma compares each observation to a reference value. However, instead of comparing the data to the mean, it compares the data to the target specification. These differences are squared, so any observation that differs from the target specification will increase the standard deviation.

As this difference increases, so does the sigma of the Cpm. And as this sigma becomes larger, the Cpm index gets smaller. If the difference between the data and the target is small, so too is the sigma of the Cpm. And as this sigma gets smaller, the Cpm index becomes larger. The higher the Cpm index, the better the process, as shown in the diagrams below. (“Better” means closer to the target specification and reduced variation.)
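Here is a short sketch of that calculation. It assumes the form Cpm = (USL − LSL) / (6 × sigma), where sigma is computed from squared deviations about the target rather than the mean. The n − 1 divisor is an assumption; some references divide by n instead. The data sets and limits are hypothetical.

```python
from math import sqrt


def cpm(values, lsl, usl, target):
    """Cpm with sigma measured about the target (n - 1 divisor assumed)."""
    n = len(values)
    sigma_target = sqrt(sum((x - target) ** 2 for x in values) / (n - 1))
    return (usl - lsl) / (6 * sigma_target)


# Two hypothetical data sets with identical spread:
# one centered on the target, one shifted away from it.
on_target = [9.8, 9.9, 10.0, 10.1, 10.2]
shifted = [x + 0.3 for x in on_target]

cpm_on = cpm(on_target, lsl=9.0, usl=11.0, target=10.0)
cpm_shifted = cpm(shifted, lsl=9.0, usl=11.0, target=10.0)

print(f"Cpm, centered on target: {cpm_on:.3f}")
print(f"Cpm, shifted off target: {cpm_shifted:.3f}")
```

Both data sets have the same variation about their own mean, so the drop in the second Cpm comes entirely from the distance to the target.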

# Building sand castles: Best practices for data management and system improvement

Does your data management sometimes feel as if a giant load of sand has been dumped on your head? You may want to build a sand castle, after all, but without some sense of order, all you’ll end up with is a million grains of useless material. The same may be true of data management.

With 2.5 quintillion bytes of data created every day, and the fact that 90 percent of the world’s data has been created within the past two years, it’s hard to imagine what efficient data management will look like in the next few years, or how anyone can expect to keep up with the deluge.

# Before your analysis, take these steps: Assuring accuracy in data collection

**True or false:** Collecting data, unlike other processes, is not subject to variation.

Well, of course this statement is false. Variation is inherent to any system, and excessive variation in the collection process can produce highly misleading analysis, since it will appear on control charts as variation in the process, potentially skewing the outcome and inhibiting accurate analysis.

# Not all control limits are created equal

Take a look at the charts below.

Aside from being created with two different programs, can you tell the difference between these two control charts?

Give up? Well, to be fair, the answer isn’t all that clear. The same data set is repeated in both charts, and both use the same control limits.

So, what’s the big difference?

# Gage R&R study questions answered

We frequently entertain questions about MSA and specifically, gage R&R. Below are two questions we recently received:

**Question #1:** *“What are the requirements for the parts chosen in a study? Do the parts have to have the same specification?”*

**Answer:** The parts selected should be representative of the process variation that is producing them. This implies that selecting 10 consecutive pieces (parts) is not as good as using 10 parts obtained throughout the day or week. Part of what you are trying to do with an R&R study is determine whether your measurement system is capable of distinguishing parts made on the same process to the same specification. In summary, you want to select the parts in a way that represents the minimum sized part, the maximum sized part and those in between. If the selected parts have different specifications, they are different by design, not by random variation.

**Question #2:** *“Results can be calculated in several ways: using study parameters, specifications and others. Which is most acceptable for gage R&R?”*

**Answer:** The industry trend is to use study/process parameters. However, how you calculate the results of a gage R&R study depends on the purpose of the study, so decide on that primary purpose before the study begins. If you are trying to control your process, you need to be able to detect changes in the process. To do this, you should use study parameters or process parameters. If your focus is being able to compare a part to specifications, then you should use the specification method. Using the specification method suggests that you are trying to prove that your measurement system can distinguish between good and bad parts.
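To show how the choice of method changes the reported number, here is a rough sketch. It assumes the gage R&R sigma and the part-to-part sigma have already been estimated from a study (the values below are made up), and that the tolerance-based percentage uses a 6-sigma spread. Some references use 5.15 instead, so treat the `spread` parameter as an assumption.

```python
from math import sqrt


def percent_grr(sigma_grr, sigma_part, lsl, usl, spread=6.0):
    """Return (%GRR of study variation, %GRR of tolerance)."""
    # Total study variation combines measurement and part-to-part variation.
    sigma_total = sqrt(sigma_grr ** 2 + sigma_part ** 2)
    pct_study = 100 * sigma_grr / sigma_total
    pct_tolerance = 100 * spread * sigma_grr / (usl - lsl)
    return pct_study, pct_tolerance


# Hypothetical estimates from a gage R&R study.
pct_study, pct_tolerance = percent_grr(
    sigma_grr=0.03, sigma_part=0.04, lsl=9.5, usl=10.5
)
print(f"%GRR vs. study variation: {pct_study:.1f}")
print(f"%GRR vs. tolerance:       {pct_tolerance:.1f}")
```

The same measurement system can look very different under the two methods, which is why the purpose of the study should be settled first.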

If you have any other questions or concerns about MSA or gage R&R, please contact us at support@pqsystems.com, call 800-777-5060, or post them below.