Matthew Grogan

The Mean, MAD, and More: Measuring accuracy and precision

Background

It is often important in a business setting to measure the accuracy and precision of a value compared to a target. However, it can be difficult to precisely communicate these measurements to non-technical teams. This is one approach.

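First, let’s define the terms:

Accuracy: how close the values are to the target.

Precision: how close the values are to one another, regardless of the target.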

We will look at metrics that can help us describe the degree of accuracy and precision of a dataset and discuss appropriate remediation.

Example

Imagine we have a jar of currency: coins and bills of various denominations. We ask four groups of thirty students to guess the total value of the currency in the jar. The students are not aware that the jar contains exactly $500.

The guesses are shown below:

> d.data
   Group_1 Group_2 Group_3 Group_4
1      441     171     259     122
2      441     191     311     175
3      446     204     313     213
4      447     211     324     239
5      462     216     342     256
6      467     216     348     308
7      471     217     358     319
8      476     222     369     346
9      476     229     373     387
10     477     230     433     509
11     488     233     467     519
12     489     235     485     520
13     490     235     495     520
14     497     237     500     523
15     501     243     520     533
16     502     244     533     539
17     516     245     543     582
18     517     246     559     601
19     518     248     563     633
20     518     252     567     640
21     525     263     577     678
22     533     264     600     730
23     533     271     610     741
24     534     272     629     762
25     537     274     644     781
26     542     285     705     883
27     552     285     708     894
28     556     289     710     914
29     566     291     723     934
30     571     297     782     972
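The plotting and summary code in the rest of this post works on a long-format data frame `d` with columns `group_nbr` and `x`, plus a `target` value, but the reshaping step is not shown. Here is a minimal sketch of one way to get there from a wide table like `d.data`, using base R's `stack()`. The three-row table and the choice to store `target` as a column are assumptions made for illustration.

```r
# Stand-in for d.data: only the first three rows of the table above
d.data <- data.frame(
  Group_1 = c(441, 441, 446),
  Group_2 = c(171, 191, 204),
  Group_3 = c(259, 311, 313),
  Group_4 = c(122, 175, 213)
)

# stack() turns the wide table into two columns:
# `values` (the guesses) and `ind` (the source column name)
long <- stack(d.data)

# Rename to the column names used in the rest of the post.
# Storing the target as a constant column is an assumption.
d <- data.frame(
  group_nbr = as.character(long$ind),
  x         = long$values,
  target    = 500
)

head(d)
```

With the full thirty-row table, the same steps would yield 120 rows, one per guess.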

Our task is to define numerical measurements that properly describe these four groups.

Visualization

Before we get into calculating numerical measurements, let’s take a look at the data.

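library(ggplot2)

# d is assumed to be in long format: one row per guess, with the guess in x
# and the group label in group_nbr
ggplot(d, aes(x=x)) +
  geom_histogram() +                                      # distribution of guesses
  facet_wrap(~ group_nbr) +                               # one panel per group
  geom_vline(xintercept=500, color="red", linetype=2) +   # dashed red line at the $500 target
  xlim(0, 1000)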

[Figure: histogram of guesses for each group, with the $500 target marked by a dashed red line]

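From this visualization, we can make the following observations:

Group one’s guesses cluster tightly around the target.

Group two’s guesses cluster tightly, but far below the target.

Group three’s guesses are centered near the target, but are more spread out.

Group four’s guesses are widely scattered.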

The Mean: How accurate is the group itself?

Taking the mean reveals the central tendency of each group.

> d %>%
+   group_by(group_nbr) %>%
+   summarize(mean = round(mean(x)))

  group_nbr  mean
1   Group_1   495
2   Group_2   260
3   Group_3   486
4   Group_4   395

The means are reasonably close to the target for the two accurate groups, one and three, with group four further off target. Group two has the “worst” performance, judging by the mean.

If we use only the mean to compare the groups, we lose the nuances between groups one and three. We also lose information regarding the precision of group two.

The mean does not give us enough information to characterize the groups.

The MAD: How accurate were the members of the group?

The Mean Absolute Deviation (MAD) is one way to numerically discriminate between groups one and three. Recall that both groups are accurate in that they cluster around the target, but they vary in their precision.

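To take the MAD, follow these steps:

1. Subtract the target from each value to get the deviation.
2. Take the absolute value of each deviation.
3. Take the mean of the absolute deviations.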

> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation = x - target,              # The deviation from target
+          abs_deviation = abs(deviation)) %>%  # Absolute value of deviation
+   summarize(mad = round(mean(abs_deviation))) # The mean absolute deviation

  group_nbr   mad
1   Group_1    27
2   Group_2   240
3   Group_3    79
4   Group_4   279

We can now say that, on average, students in group one were $27 away from the true value of $500. Some people conceptualize this as a percent: on average, students in group one were 5.4% (27 / 500 * 100%) off target, while students in group three were 15.8% off target.
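For readers who want the calculation outside of a dplyr pipeline, here is a small sketch of the same arithmetic as a plain function. The helper name `target_mad` and the toy guesses are made up for illustration. One caution: base R already has a function named `mad()`, but it computes the median absolute deviation around the median (scaled by a constant of 1.4826 by default), which is a different statistic from the one described here.

```r
# Mean absolute deviation from a fixed target (hypothetical helper name)
target_mad <- function(x, target) {
  mean(abs(x - target))
}

guesses <- c(480, 510, 530)   # toy data, not from the jar example
target  <- 500

# Deviations are -20, 10, 30, so the MAD is (20 + 10 + 30) / 3 = 20
mad_value <- target_mad(guesses, target)

# The same value as a percent of the target: 20 / 500 * 100 = 4
mad_pct <- mad_value / target * 100

# Base R's mad() is the scaled *median* absolute deviation around the
# median, so it will generally not match target_mad()
base_mad <- mad(guesses)

c(mad_value = mad_value, mad_pct = mad_pct, base_mad = base_mad)
```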

The Group-MAD: How precise are members of the group?

Let’s take another look at group two.

Group Two

The Mean and MAD calculations described group two’s guesses as being far from the target, which is true. But these measurements do not help us learn that the guesses are clustered together (that is, they are precise). The Group-MAD provides us insight into this nuance.

To calculate the Group-MAD, use the mean of the group in place of the target value and otherwise follow the MAD formula as described above.

> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation = x - mean(x),              # The deviation from group mean
+          abs_deviation = abs(deviation)) %>%  # Absolute value of deviation
+   summarize(grp_mad = round(mean(abs_deviation))) # The mean absolute deviation for the group

  group_nbr   grp_mad
1   Group_1    28
2   Group_2    28
3   Group_3    78
4   Group_4   256

Now we see that group two is as precise as group one (both have a Group-MAD of 28), but the group is obviously off target. Therefore, the group is precise, but not accurate. If we were to intervene (for example, with training), the focus should be on improving accuracy only.
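To see why the Group-MAD isolates precision, it may help to shift a set of guesses by a constant: the spread around the group's own mean does not change, while the distance from the target does. This is a sketch with toy numbers, and the helper names are hypothetical.

```r
# Mean absolute deviation around the group's own mean (hypothetical helper)
group_mad <- function(x) {
  mean(abs(x - mean(x)))
}

# Mean absolute deviation from a fixed target (hypothetical helper)
target_mad <- function(x, target) {
  mean(abs(x - target))
}

on_target <- c(480, 500, 520)   # toy data centered on 500
shifted   <- on_target - 250    # same spread, moved 250 below the target

# The Group-MAD is unchanged by the shift: both are 40 / 3
spread_on_target <- group_mad(on_target)
spread_shifted   <- group_mad(shifted)

# The MAD from the target grows from 40 / 3 to 250
miss_on_target <- target_mad(on_target, 500)
miss_shifted   <- target_mad(shifted, 500)
```

A group that looks like `shifted` needs a correction to where it aims, not to how consistently it guesses.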

The Bias: Do group members trend above or below the target?

Now we want to improve group two, but none of the measurements described above help us determine the direction to move. For this, we need the bias.

The procedure for calculating bias is the same as for the MAD, except we won’t take the absolute value.

> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation = x - target) %>%     # The deviation from target
+   summarize(bias = round(mean(deviation))) # The bias

  group_nbr   bias
1   Group_1    -5
2   Group_2  -240
3   Group_3   -14
4   Group_4  -105

All four groups have a negative bias, with some groups having a larger bias than others.
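A useful property when reading the bias next to the MAD: the size of the bias can never exceed the MAD, and the two are equal only when every value falls on the same side of the target. The sketch below illustrates this with toy numbers. The helper names are hypothetical.

```r
# Signed mean deviation from the target (hypothetical helper)
bias <- function(x, target) {
  mean(x - target)
}

# Mean absolute deviation from the target (hypothetical helper)
target_mad <- function(x, target) {
  mean(abs(x - target))
}

all_low <- c(250, 260, 270)   # toy data: every guess is below 500
mixed   <- c(450, 500, 550)   # toy data: misses on both sides cancel out

# Every deviation is negative, so the bias is the MAD with a minus sign
bias_low <- bias(all_low, 500)         # (-250 - 240 - 230) / 3 = -240
mad_low  <- target_mad(all_low, 500)   # 240

# Misses cancel in the bias but not in the MAD
bias_mixed <- bias(mixed, 500)         # (-50 + 0 + 50) / 3 = 0
mad_mixed  <- target_mad(mixed, 500)   # 100 / 3
```

A bias near zero therefore does not mean the individual guesses were good, only that the misses above and below the target balanced out.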

Insights and Responses

Since all metrics are now available, let’s take a look at them and see what insights we can gather.

> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation_tgt = x - target,                  # The deviation from target
+          abs_deviation_tgt = abs(deviation_tgt),      # Absolute value of deviation
+          grp_deviation = x - mean(x),                 # The deviation from group mean
+          grp_abs_deviation = abs(grp_deviation)       # Absolute value of deviation
+          ) %>%  
+   summarize(mean = round(mean(x)),                    # The mean
+             mad = round(mean(abs_deviation_tgt)),     # The MAD
+             mad_pct = round(mad / 500, 3),            # MAD as a % of target
+             grp_mad = round(mean(grp_abs_deviation)), # The Group-MAD
+             bias = round(mean(deviation_tgt))         # The bias
+             )

  group_nbr  mean   mad mad_pct grp_mad  bias
1   Group_1   495    27   0.054      28    -5
2   Group_2   260   240   0.480      28  -240
3   Group_3   486    79   0.158      78   -14
4   Group_4   395   279   0.558     256  -105

Group 1’s mean is nearest the target value of 500. The low MAD, Group-MAD, and near-zero Bias tell us that this group is the most accurate and precise of the four.

Recommended action: None

Group 2 clearly had trouble hitting the target, but the low Group-MAD tells us that this group is precise. The Bias matches the MAD in magnitude (-240 vs. 240), which tells us that the guesses fell consistently below the target.

Recommended action: Work to align the group with the target without affecting their precision.

Group 3’s mean and MAD are worse than group one’s. The closeness of the MAD and Group-MAD shows that the group’s guesses are centered around the target (that is, they are accurate). However, both metrics are higher than group one’s, which tells us that this group is less precise than group one.

Recommended action: Work to increase precision.

Group 4’s guesses are all over the place and the metrics show it! While the mean is not as off-target as group two’s, the MAD is worse (279 vs. 240). On average, the guesses of students in group four were about 56% off target.

Recommended action: Anything that helps both precision and accuracy.

Frequently Asked Questions

We already have a measure of precision. It’s called the standard deviation.

That’s not a question, professor! :p

These are simply other descriptive statistics that can help us communicate with stakeholders. I have found that saying individuals in a particular group miss a target by 5% can be easier to absorb than a standard deviation (or a confidence interval, for that matter). Your mileage may vary.

Why would you use MAD over RMSE?

The Root Mean Squared Error (RMSE) is another way of describing variation. Instead of taking the absolute values of the deviations and averaging them, the RMSE squares the deviations before averaging, then takes the square root to return to the original units.

This has two consequences. First, a squared number is never negative, so there’s no need to take an absolute value. Second, the RMSE penalizes larger deviations much more heavily.

The penalty is a good feature to use if larger errors result in much greater costs.
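To make the penalty concrete, here is a sketch comparing two toy sets of guesses with the same MAD. In the first, every guess misses by a small amount. In the second, a single guess carries all of the error. The helper names and the numbers are made up for illustration.

```r
# Mean absolute deviation from the target (hypothetical helper)
target_mad <- function(x, target) {
  mean(abs(x - target))
}

# Root mean squared error against the target (hypothetical helper)
rmse <- function(x, target) {
  sqrt(mean((x - target)^2))
}

even_misses <- c(490, 510, 490, 510)   # four misses of 10
one_big_miss <- c(500, 500, 500, 540)  # three hits and one miss of 40

# Both sets have the same MAD: 40 / 4 = 10
mad_even  <- target_mad(even_misses, 500)
mad_spiky <- target_mad(one_big_miss, 500)

# The RMSE separates them: sqrt(400 / 4) = 10 vs. sqrt(1600 / 4) = 20
rmse_even  <- rmse(even_misses, 500)
rmse_spiky <- rmse(one_big_miss, 500)
```

If a single miss of 40 costs more than four misses of 10, the RMSE reflects that and the MAD does not.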

Summary

Accuracy  Precision  Mean             MAD (Target)  MAD (Group)  Bias
High      High       Near Target      Lower         Lower        Near Zero
Low       High       Far from Target  Higher        Lower        Far from Zero
High      Low        Near Target      Lower         Higher       Near Zero
Low       Low        Any              Higher        Higher       Any
