It is often important in a business setting to measure the accuracy and precision of a value compared to a target. However, it can be difficult to precisely communicate these measurements to non-technical teams. This is one approach.
First, let’s define the terms: accuracy describes how close measurements are to a target value, while precision describes how close the measurements are to one another. We will look at metrics that can help us describe the degree of accuracy and precision of a dataset and discuss appropriate remediation.
Imagine we have a jar of currency: coins and bills of various denominations. We ask four groups of thirty students to guess the total value of the currency in the jar. The students are not aware that the jar contains exactly $500.
The guesses are shown below:
```r
> d.data
   Group_1 Group_2 Group_3 Group_4
1      441     171     259     122
2      441     191     311     175
3      446     204     313     213
4      447     211     324     239
5      462     216     342     256
6      467     216     348     308
7      471     217     358     319
8      476     222     369     346
9      476     229     373     387
10     477     230     433     509
11     488     233     467     519
12     489     235     485     520
13     490     235     495     520
14     497     237     500     523
15     501     243     520     533
16     502     244     533     539
17     516     245     543     582
18     517     246     559     601
19     518     248     563     633
20     518     252     567     640
21     525     263     577     678
22     533     264     600     730
23     533     271     610     741
24     534     272     629     762
25     537     274     644     781
26     542     285     705     883
27     552     285     708     894
28     556     289     710     914
29     566     291     723     934
30     571     297     782     972
```
Our task is to define numerical measurements that properly describe these four groups.
Before we get into calculating numerical measurements, let’s take a look at the data.
```r
ggplot(d, aes(x = x)) +
  geom_histogram() +
  facet_wrap(~ group_nbr) +
  geom_vline(xintercept = 500, color = "red", linetype = 2) +
  xlim(0, 1000)
```
From this visualization, we can already observe differences between the groups: some cluster tightly around the target line, while others sit far from it or spread widely.
Taking the mean reveals the central tendency of each group.
```r
> d %>%
+   group_by(group_nbr) %>%
+   summarize(mean = round(mean(x)))
  group_nbr mean
1 Group_1    495
2 Group_2    260
3 Group_3    486
4 Group_4    395
```
The means are reasonably close to the target for the two accurate groups, one and three, with group four further off. Group two has the “worst” performance, at least judging by the mean.
If we use only the mean to compare the groups, we lose the nuances between groups one and three. We also lose information regarding the precision of group two.
The mean does not give us enough information to characterize the groups.
The Mean Absolute Deviation (MAD) is one way to numerically discriminate between groups one and three. Recall that both groups are accurate in that they cluster around the target, but they vary in their precision.
To take the MAD, follow these steps:

1. Calculate each guess’s deviation from the target. For example, if a student guesses 441, the deviation will be 441 - 500 = -59.
2. Take the absolute value of each deviation (|-59| = 59). This prevents positive and negative deviations from offsetting each other.
3. Average the absolute deviations.
```r
> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation = x - target,               # The deviation from target
+          abs_deviation = abs(deviation)) %>%   # Absolute value of deviation
+   summarize(mad = round(mean(abs_deviation)))  # The mean absolute deviation
  group_nbr mad
1 Group_1    27
2 Group_2   240
3 Group_3    79
4 Group_4   279
```
We can now say that, on average, students in group one were $27 away from the true value of $500. Some people conceptualize this as a percent: on average, students in group one were 5.4% (27 / 500 * 100%) off target, while students in group three were 15.8% off target.
Let’s take another look at group two.
The Mean and MAD calculations described group two’s guesses as being far from the target, which is true. But these measurements do not help us learn that the guesses are clustered together (that is, they are precise). The Group-MAD provides us insight into this nuance.
To calculate the Group-MAD, use the mean of the group in place of the target value and otherwise follow the MAD formula as described above.
```r
> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation = x - mean(x),                  # The deviation from group mean
+          abs_deviation = abs(deviation)) %>%       # Absolute value of deviation
+   summarize(grp_mad = round(mean(abs_deviation)))  # The mean absolute deviation for the group
  group_nbr grp_mad
1 Group_1        28
2 Group_2        28
3 Group_3        78
4 Group_4       256
```
Now we see that group two is as precise as group one, but the group is obviously off target. Therefore, the group is precise, but not accurate. If we were to intervene (for example, training), the focus should be on improving accuracy only.
Now we want to improve group two, but none of the measurements described above help us determine the direction to move. For this, we need the bias.
The procedure for calculating bias is the same as for the MAD, except we won’t take the absolute value.
```r
> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation = x - target) %>%        # The deviation from target
+   summarize(bias = round(mean(deviation)))  # The bias
  group_nbr bias
1 Group_1     -5
2 Group_2   -240
3 Group_3    -14
4 Group_4   -105
```
All four groups have a negative bias, with some groups having a larger bias than others.
Since all metrics are now available, let’s take a look at them and see what insights we can gather.
```r
> d %>%
+   group_by(group_nbr) %>%
+   mutate(deviation_tgt = x - target,               # The deviation from target
+          abs_deviation_tgt = abs(deviation_tgt),   # Absolute value of deviation
+          grp_deviation = x - mean(x),              # The deviation from group mean
+          grp_abs_deviation = abs(grp_deviation)    # Absolute value of deviation
+   ) %>%
+   summarize(mean = round(mean(x)),                     # The mean
+             mad = round(mean(abs_deviation_tgt)),      # The MAD
+             mad_pct = round(mad / 500, 3),             # MAD as a % of target
+             grp_mad = round(mean(grp_abs_deviation)),  # The Group-MAD
+             bias = round(mean(deviation_tgt))          # The bias
+   )
  group_nbr mean mad mad_pct grp_mad bias
1 Group_1    495  27   0.054      28   -5
2 Group_2    260 240   0.480      28 -240
3 Group_3    486  79   0.158      78  -14
4 Group_4    395 279   0.558     256 -105
```
Group 1’s mean is nearest the target value of 500. The low MAD, Group-MAD, and near-zero Bias tell us that this group is the most accurate and precise of the four.
Recommended action: None
Group 2 clearly had some challenge hitting the target, but the low Group-MAD and the Bias’ consistency with the MAD (240 vs. -240) tell us that this group is precise.
Recommended action: Work to align the group with the target without affecting their precision.
Group 3’s mean and MAD are slightly worse than group one’s. The closeness of the MAD and Group-MAD shows that the group’s guesses are centered around the target (that is, they are accurate). However, both metrics are higher than group one’s, which tells us that this group is less precise.
Recommended action: Work to increase precision.
Group 4’s guesses are all over the place, and the metrics show it! While the mean is not as far off target as group two’s, the MAD is much worse. On average, the guesses of students in group four were 55.8% off target.
Recommended action: Anything that helps both precision and accuracy.
Don’t we already have a measure of precision? It’s called the standard deviation.
These are simply other descriptive statistics that can help us communicate with stakeholders. I have found that saying individuals in a particular group miss a target by 5% can be easier to absorb than a standard deviation (or a confidence interval, for that matter). Your mileage may vary.
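To see that the two spread measures tell a similar story, here is a quick sketch in Python (the post’s examples use R). The guesses below are made up for illustration and are not the post’s data; the $500 target is the post’s.

```python
import statistics

# Hypothetical, tightly clustered guesses (not the post's data).
guesses = [490, 495, 500, 505, 510]
mean = statistics.mean(guesses)

# Sample standard deviation: the conventional measure of spread.
sd = statistics.stdev(guesses)

# Group-MAD: average absolute distance from the group's own mean.
grp_mad = sum(abs(x - mean) for x in guesses) / len(guesses)

# The Group-MAD is easy to narrate as a percent of the $500 target:
# "on average, guesses were about 1.2% away from the group's center."
pct = grp_mad / 500 * 100

print(sd, grp_mad, pct)
```

Both numbers land on the same scale and move together; the Group-MAD is simply the one that translates directly into a plain-language sentence.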
Why would you use MAD over RMSE?
The Root Mean Squared Error (RMSE) is another way of describing variation. Instead of averaging the absolute deviations, the RMSE squares them prior to averaging, then takes the square root to return to the original units.
This has two consequences: first, a number squared is always positive so there’s no need to take an absolute value. Second, the penalty for larger deviations is much higher using the RMSE.
The penalty is a good feature to use if larger errors result in much greater costs.
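To make that penalty concrete, here is a sketch in Python (the post’s examples use R). Both sets of guesses below are hypothetical, chosen so their MADs are nearly identical while one set contains a single large miss; the $500 target is the post’s.

```python
# Two hypothetical sets of guesses against a $500 target (not the post's data).
target = 500
steady = [480, 490, 510, 520]        # every guess misses by 10 or 20
one_outlier = [499, 501, 500, 440]   # mostly on target, one big miss

def mad(xs, t):
    """Mean absolute deviation from the target."""
    return sum(abs(x - t) for x in xs) / len(xs)

def rmse(xs, t):
    """Root mean squared error: square, average, then square-root."""
    return (sum((x - t) ** 2 for x in xs) / len(xs)) ** 0.5

# The MADs are nearly identical...
print(mad(steady, target), mad(one_outlier, target))    # 15.0 15.5
# ...but the RMSE penalizes the single large miss much more heavily.
print(round(rmse(steady, target), 1), round(rmse(one_outlier, target), 1))  # 15.8 30.0
```

The one large deviation roughly doubles the RMSE while barely moving the MAD, which is exactly the behavior you want when big errors carry outsized costs.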
| Accuracy | Precision | Mean | MAD (Target) | MAD (Group) | Bias |
|----------|-----------|------|--------------|-------------|------|
| High | High | Near Target | Lower | Lower | Near Zero |
| Low | High | Far from Target | Higher | Lower | Far from Zero |
| High | Low | Near Target | Lower | Higher | Near Zero |
| Low | Low | Far from Target | Higher | Higher | Far from Zero |