I received a question after my last blog post asking me to clarify the concept of within and between subgroup variation which is used in calculating Cpk, Cp, Cr, Ppk, Pp, Pr and other statistics. Here is an example I used to help explain the differences.

Let’s say that every day I run about 30 minutes with my chocolate labrador, Cadbury (pictured below).

While running, I decide to measure how fast we are going. I measure the speed (pace) three times throughout the run: toward the beginning, the middle, and the end of the run. This data tells me a few things:

1. The pace at the beginning, middle, and end of the run.

2. The average pace we keep. This average pace is also called an X-bar.

3. The difference between the fastest pace and the slowest pace, also called the range.

4. Cadbury, like me, has a lot more energy at the beginning of our run than at the end.
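As a quick illustration of items 2 and 3, here is a minimal Python sketch that computes the X-bar and the range for a single run. The three pace readings, and the choice of minutes per mile as the unit, are made up for the example and are not data from the post.

```python
# Hypothetical pace readings (minutes per mile) from one run:
# beginning, middle, end. These numbers are invented for illustration.
paces = [8.5, 9.0, 9.8]

# Average pace of the run (the X-bar of this subgroup).
x_bar = sum(paces) / len(paces)

# Range: difference between the largest and smallest reading.
pace_range = max(paces) - min(paces)

print(f"X-bar: {x_bar:.2f} min/mile")
print(f"Range: {pace_range:.2f} min/mile")
```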

The data collected is easy enough to understand. But what if I want to know whether the process is capable of meeting some goal? This is the same question you have no doubt had about data collected for traditional quality improvement purposes. The tool I need to answer this question is capability analysis.

I am using the example of running to illustrate variability. As mentioned in #2 above, the average pace of a run is the X-bar. The range of the three measurements (#3 above) is a measure of the within subgroup variation. The within subgroup variation is usually expressed as the subgroup (or sample) standard deviation, which can be estimated from the subgroup ranges.

Let’s say Cadbury and I have run daily for 20 days. I took 3 measurements of my pace each day, so I now have 60 measurements. The variation of all 60 measurements is called the variation within and between the subgroups. This is sometimes called the total standard deviation.
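The sketch below shows one way the two kinds of variation could be computed for 20 days of 3 readings each. It assumes the within subgroup sigma is estimated as the average range divided by the control-chart constant d2 (1.693 for subgroups of size 3), which is one common estimate and not necessarily the one your software uses. The pace values and the day-to-day shifts are invented.

```python
import statistics

D2_N3 = 1.693  # standard control-chart constant d2 for subgroups of size 3


def within_sigma(subgroups, d2=D2_N3):
    """Estimate within-subgroup sigma as average range / d2 (R-bar / d2)."""
    ranges = [max(s) - min(s) for s in subgroups]
    return statistics.mean(ranges) / d2


def total_sigma(subgroups):
    """Sample standard deviation of all individual readings pooled together."""
    all_values = [x for s in subgroups for x in s]
    return statistics.stdev(all_values)


# Hypothetical data: the same beginning/middle/end pattern every day,
# shifted up or down by a made-up day-to-day offset (20 days in total).
pattern = [8.5, 9.0, 9.8]
day_shifts = [0.0, 0.6, -0.5, 0.9, -0.8] * 4
runs = [[x + shift for x in pattern] for shift in day_shifts]

sigma_within = within_sigma(runs)   # variation inside each day's run
sigma_total = total_sigma(runs)     # variation within and between days

print(f"Readings: {sum(len(r) for r in runs)}")
print(f"Within subgroup sigma: {sigma_within:.3f}")
print(f"Total sigma:           {sigma_total:.3f}")
```

Because the made-up runs shift from day to day, the total figure comes out larger than the within subgroup figure.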

Within subgroup variation (subgroup standard deviation) is used in calculating control limits and Cp, Cr, and Cpk. Within and between subgroup variation (total standard deviation) is used in calculating Pp, Pr, and Ppk. Keep the questions coming–Cadbury and I will try our best to answer them!
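To make the pairing concrete, here is a small sketch using the textbook definitions of these indices: Cp = (USL - LSL) / 6σ, Cr = 1 / Cp, and Cpk = min(USL - mean, mean - LSL) / 3σ, with Pp, Pr, and Ppk using the same formulas but the total standard deviation. The specification limits, mean, and sigma values are hypothetical.

```python
def capability_indices(mean, sigma, lsl, usl):
    """Return (potential, ratio, actual) capability for the given sigma.

    With the within-subgroup sigma these are Cp, Cr, Cpk;
    with the total sigma they are Pp, Pr, Ppk.
    """
    potential = (usl - lsl) / (6 * sigma)
    ratio = 1 / potential
    actual = min(usl - mean, mean - lsl) / (3 * sigma)
    return potential, ratio, actual


# Hypothetical values, invented for illustration only.
mean_pace = 9.1        # overall average pace, minutes per mile
sigma_within = 0.40    # subgroup standard deviation
sigma_total = 0.55     # total standard deviation
lsl, usl = 7.5, 10.5   # assumed lower and upper specification limits

cp, cr, cpk = capability_indices(mean_pace, sigma_within, lsl, usl)
pp, pr, ppk = capability_indices(mean_pace, sigma_total, lsl, usl)

print(f"Cp={cp:.2f}  Cr={cr:.2f}  Cpk={cpk:.2f}")
print(f"Pp={pp:.2f}  Pr={pr:.2f}  Ppk={ppk:.2f}")
```

In this example the total standard deviation is larger than the subgroup standard deviation, so Pp and Ppk come out lower than Cp and Cpk.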

When are total standard deviation and subgroup standard deviation the same?

Subgroup standard deviation is typically the variability within a subgroup of n measurement values, where n is your sample size. For example, if your sample size is 5, it describes how much variation you have among those 5 values.

Total standard deviation is the variability within your subgroups AND between your subgroups. This is often referred to as the standard deviation of the individual values.

When are they the same? When the variability within your subgroups is the same as the variability within and between your subgroups. In short, if your process does not have big swings from start to finish, your total variability will likely be similar to the variability you see within a sample.
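A rough way to see this is to compare a process with no day-to-day swings against one that drifts. The sketch below uses a pooled within subgroup standard deviation (the square root of the average subgroup variance, assuming equal subgroup sizes) as the within estimate. Both data sets are invented.

```python
import statistics


def pooled_within_sd(subgroups):
    """Pooled within-subgroup standard deviation (equal subgroup sizes assumed)."""
    variances = [statistics.variance(s) for s in subgroups]
    return statistics.mean(variances) ** 0.5


def total_sd(subgroups):
    """Sample standard deviation of every individual value."""
    return statistics.stdev([x for s in subgroups for x in s])


# Hypothetical data: 20 subgroups of 3 readings each.
pattern = [8.5, 9.0, 9.8]
stable = [list(pattern) for _ in range(20)]                         # no swings
drifting = [[x + 0.3 * day for x in pattern] for day in range(20)]  # steady drift

for name, data in [("stable", stable), ("drifting", drifting)]:
    within = pooled_within_sd(data)
    total = total_sd(data)
    print(f"{name:9s} within={within:.3f} total={total:.3f} ratio={total / within:.2f}")
```

In the stable data set the two figures are of similar size. In the drifting data set the within figure is unchanged, but the total is far larger because it also picks up the movement between subgroups.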

It is a great article, but if we have only one subgroup… are the within and overall standard deviations the same? Do you have a formula to calculate them? Thank you very much.

When you have one subgroup, you can only calculate the variability “within” that one subgroup. You then need to decide what you are going to use to represent the variability of the subgroup. Two common choices are the range of the subgroup (max value – min value) and the standard deviation. The formula for the standard deviation is:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where x represents each value in your sample, \bar{x} is the mean of the sample, and n is the number of values in the sample.
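For readers who prefer code, the formula above can be sketched in Python as follows. The three sample values are hypothetical. The result is compared with `statistics.stdev` from the Python standard library, which computes the same sample standard deviation.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical single subgroup of three readings.
subgroup = [8.5, 9.0, 9.8]

s = sample_std(subgroup)
subgroup_range = max(subgroup) - min(subgroup)  # the other common choice

print(f"Standard deviation: {s:.4f}")
print(f"Range:              {subgroup_range:.4f}")
print(f"Matches statistics.stdev: {math.isclose(s, statistics.stdev(subgroup))}")
```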

A nice approach to panel data.. Thanks a lot.

Hi

We are measuring four points on each sample and measuring 4 samples each time, which means 16 readings per measurement. Now the question is how to calculate the standard deviation. What is the formula? I should add that in 24 hours (a full day) there are 3 shifts of 8 hours each. We are doing this test 3 times a shift, which means there are 3*2*4*4 = 96 readings (3 shifts * 2 times a shift * 4 samples * 4 points per sample).

Thank you so much. I really appreciate your example with the dog. It brought a practical understanding that we can relate to our Minitab software. You’re awesome. Beautiful dog too.

Lana H