How to Measure Anything: Finding the Value of Intangibles in Business
By Douglas W. Hubbard (2014)
I have heard managers say that since each new product is unique, they cannot extrapolate from historical data… therefore, they have to rely on their experience. Note that this is said with no hint of irony.
Define the decision which measurements aim to influence.
- Don’t measure if it’s not going to affect a decision (e.g., metrics that only feed dashboards)
- What’s the trigger? E.g., if x is > n, then we take action y
- The wrong decision should have negative consequences
Value of information.
- Expected value of information, etc., to determine the worth of further investigation
- The less we know, the more valuable information is
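Hubbard frames this as expected opportunity loss (EOL): the chance of being wrong times the cost of being wrong. For a binary decision, the expected value of perfect information (EVPI) is the EOL of the alternative you would otherwise choose. A minimal sketch; all probabilities and dollar figures below are illustrative assumptions:

```python
# EVPI for a binary go/no-go decision (Hubbard's EOL framing).
# EOL = chance of being wrong * cost of being wrong; EVPI is the EOL
# of the chosen alternative. All numbers here are illustrative.

p_success = 0.7            # current belief that the project succeeds
loss_if_fail = 400_000     # cost incurred if we proceed and it fails
gain_if_success = 250_000  # benefit forgone if we cancel and it would have succeeded

# EOL of proceeding: we are "wrong" if the project fails.
eol_go = (1 - p_success) * loss_if_fail
# EOL of cancelling: we are "wrong" if it would have succeeded.
eol_no_go = p_success * gain_if_success

# We would pick the alternative with the lower expected opportunity loss;
# perfect information is worth exactly that residual expected loss.
evpi = min(eol_go, eol_no_go)
print(round(evpi))  # 120000
```

This is the "less we know, the more valuable information is" point in miniature: moving p_success toward 0.5 increases the EOL of the chosen alternative, and with it the value of measuring.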
When we decompose a decision this way we get new insights. First, you find that there are several other important variables that pertain to the judgment. You might find that there are a lot of other things to measure besides what you first thought you needed to measure, and that one of these new variables is the most important measurement of all… Second, it turns out that merely decomposing highly uncertain estimates provides a huge improvement to estimates.
Examples of low-value measurements:
- Time spent in an activity
- Attendance to sales training
- Near-term costs of a project
- Number of violations found in safety inspections
Examples of high-value measurements:
- Value of an activity
- Effect of sales training on sales
- Long-term benefits of project
- Reduction in risk of catastrophic accidents
Confidence intervals to express uncertainty: a narrow range (e.g., 80–85%) signals low uncertainty; a wide range (e.g., 50–80%) signals higher uncertainty.
Monte Carlo simulations for calculations with confidence intervals.
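One common way to run a Hubbard-style Monte Carlo: model each 90% CI as a normal distribution with mean at the midpoint and standard deviation (upper - lower) / 3.29, where 3.29 is twice the 90% z-score of 1.645. A minimal stdlib sketch; the input ranges are made-up examples:

```python
import random

# Monte Carlo over 90% confidence intervals: model each input as a
# normal whose 90% CI matches the estimator's range, i.e.
# mean = midpoint, sd = (upper - lower) / 3.29. Inputs are illustrative.

def normal_from_ci(lower, upper):
    """Return a sampler for a normal matching a 90% CI."""
    mean = (lower + upper) / 2
    sd = (upper - lower) / 3.29  # 3.29 ≈ 2 * 1.645 (z-score for 90%)
    return lambda: random.gauss(mean, sd)

units_saved = normal_from_ci(10_000, 20_000)  # hours saved per year
value_per_unit = normal_from_ci(15, 35)       # dollars per hour

random.seed(0)
trials = [units_saved() * value_per_unit() for _ in range(100_000)]
trials.sort()

# Summarize the simulated distribution of annual savings.
median = trials[len(trials) // 2]
p05, p95 = trials[5_000], trials[95_000]
print(f"median ≈ {median:,.0f}, 90% CI ≈ ({p05:,.0f}, {p95:,.0f})")
```

The payoff is the output distribution itself: instead of a single point estimate, you can read off the chance the savings fall below any threshold that would trigger a different decision.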
Our intuitions about sampling are way off.
Rule of five. Poll a sample of five, and the median of the complete population will be in the range of the five values with 93.75% probability (no matter the population size).
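The 93.75% follows from a simple argument: each sample lands above the population median with probability 1/2, so the only failure modes are all five above or all five below, i.e. 2 × (1/2)^5 = 1/16. A quick simulation on an arbitrary (hypothetical) skewed population:

```python
import random

# Rule of five: with 5 random samples, the population median lies
# between the sample min and max with probability
# 1 - 2 * 0.5**5 = 93.75%, regardless of the distribution.

random.seed(1)
population = sorted(random.lognormvariate(0, 1) for _ in range(100_001))
true_median = population[50_000]

hits = 0
trials = 20_000
for _ in range(trials):
    sample = [random.choice(population) for _ in range(5)]
    if min(sample) <= true_median <= max(sample):
        hits += 1

print(hits / trials)  # ≈ 0.9375
```

Swapping the lognormal for any other distribution leaves the hit rate unchanged, which is the distribution-free point of the rule.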
Single sample majority rule, assuming all values of the population proportion are equally likely (uniform distribution):
If you randomly select one sample from a large population, even one numbering in the thousands or millions, where you initially believed the population proportion could be anything between 0% and 100%, there is a 75% chance that the characteristic you observe in that sample is shared by the majority.
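The 75% checks out by integration — with the proportion p uniform on (0, 1), the match probability is ∫ max(p, 1 - p) dp = 3/4 — and also by simulation:

```python
import random

# Single sample majority rule: if the population proportion p is
# uniform on (0, 1), one random sample matches the majority with
# probability integral of max(p, 1-p) over (0, 1) = 0.75.

random.seed(2)
matches = 0
trials = 100_000
for _ in range(trials):
    p = random.random()                  # unknown population proportion
    sample_has_trait = random.random() < p
    majority_has_trait = p > 0.5
    if sample_has_trait == majority_has_trait:
        matches += 1

print(matches / trials)  # ≈ 0.75
```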
Types of measurement (Stanley Smith Stevens, psychologist):
- Nominal: something is in a set
- Ordinal: something is more or less than something else, but by how much is unknown (e.g., movie ratings)
- Interval: we know by how much but zero is arbitrary (e.g., Celsius)
- Ratio: zero is not arbitrary, it is nil (e.g., Kelvin, money)
The more homogeneous the population, the fewer samples needed.
Techniques for calibrating your estimates:
- Pretend to bet money: would you rather bet on your 90% confidence interval being right, or take a spin with a 90% chance of winning? If you prefer the spin, your interval is too narrow; widen it until you are indifferent
- Assume your estimate is wrong and explain why
- Look at the upper and lower bounds separately
- Start with an absurdly wide range and eliminate ridiculous values (avoids anchoring on a point estimate)
Examples where statistical prediction was shown to outperform experts (Paul Meehl and Robyn Dawes),
- College freshman GPAs
- Medical student performance
- Navy recruits’ bootcamp performance
From worst to best:
- Unstructured information, subjective estimation process
- Subjective weighted scores on arbitrary scales with no standardization: may add new errors rather than removing any (no improvement)
- Structured, consistently represented information, informal assessment—at least removes some error due to inconsistently presented information (some improvement)
- Simple linear model with standardized z-scores—slightly better at aggregating multiple factors than unaided judges (some improvement again)
- Lens Model or Rasch Model (both big improvements)
- Lens: removes inconsistency for a judge and bias due to unrelated factors
- Rasch: standardizes results of different judges, different tests, and different situations
- Objective model, if you can get the historical data (big improvement)
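The simple-linear-model step above amounts to standardizing each factor to a z-score and combining with fixed weights. A minimal sketch with made-up candidate data and equal weights (both are illustrative assumptions):

```python
import statistics

# Simple linear model over standardized z-scores: convert each factor
# to z = (x - mean) / sd, then combine with equal weights.
# Candidate data and equal weighting are illustrative assumptions.

candidates = {
    "A": {"test_score": 82, "experience_yrs": 4, "interview": 7},
    "B": {"test_score": 74, "experience_yrs": 9, "interview": 8},
    "C": {"test_score": 90, "experience_yrs": 2, "interview": 6},
}

factors = ["test_score", "experience_yrs", "interview"]
stats = {
    f: (statistics.mean(c[f] for c in candidates.values()),
        statistics.stdev(c[f] for c in candidates.values()))
    for f in factors
}

def score(c):
    # Equal-weight sum of z-scores; a Lens-model version would instead
    # fit the weights by regressing a judge's ratings on these factors.
    return sum((c[f] - stats[f][0]) / stats[f][1] for f in factors)

ranked = sorted(candidates, key=lambda name: score(candidates[name]),
                reverse=True)
print(ranked)  # ['B', 'A', 'C']
```

Standardizing first matters: without z-scores, whichever factor happens to have the largest raw numbers would dominate the sum regardless of its actual importance.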