Goodhart's Law Isn't as Useful as You Might Think

‘Focus on the process, not the outcome’ is advice we often hear from successful people in disciplines ranging from business to sport. This is a brilliant piece for appreciating that principle better, especially for those of us who have to make decisions on problems where it is not obvious which input variables (often several) drive the desired outcome, and to what extent. The most common approach in industry is to set targets and push ourselves and our teams to hit them, come what may. However, the author makes the point that this often creates perverse incentives for those measured against the targets to game the system. That, in essence, is what Goodhart’s Law says: “when a measure becomes a target, it ceases to be a good measure.”
The author shows how organisations can defeat Goodhart’s Law, drawing on the case study of Amazon’s famous ‘Weekly Business Review’ process. Indeed, Amazon seems to have drawn a lot from W. Edwards Deming’s work on Statistical Process Control.
The author first frames the problem behind Goodhart’s Law as follows:
“…business processes are often processes where you don’t know the inputs to your desired output. So the first step is to figure out what those inputs are, and then figure out what subset of those you can influence, and then, finally, figure out the causal relationships between the input metrics and output metrics. A causal relationship looks something like: “an X% lift in input metric A usually leads to a Y% lift in output metric B. Oh, and output metric B is affected by five different input metrics, of which A is merely one”. It is not an accident that the best growth teams are able to say things like “a 30% increase in newsletter performance should drive a 5% improvement in new account signups” — if you ever hear something like this, you should assume that they’ve listened to the Voice of the Process very, very carefully.
…if you want to improve some process, you have to ignore the goal first, in favour of examining the process itself. On the face of it, this is common sense: you cannot expect your team to improve some metric if that metric isn’t directly controllable. No, you must first work out what set of controllable input metrics leads to the output metrics outcomes you desire, before you can even begin to talk about hitting targets.”
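To make the “X% lift in input metric A usually leads to a Y% lift in output metric B” idea concrete, here is a minimal Python sketch. It is our illustration, not something taken from the article or from Amazon: the function names and the weekly figures are hypothetical, and a least-squares slope only describes association, so on its own it cannot establish the causal link the author talks about.

```python
from statistics import mean


def pct_changes(series):
    """Week-over-week percentage changes; assumes no value in the series is zero."""
    return [(later - earlier) / earlier * 100
            for earlier, later in zip(series, series[1:])]


def lift_slope(input_changes, output_changes):
    """Least-squares slope: % change in output per 1% change in input."""
    mx, my = mean(input_changes), mean(output_changes)
    sxx = sum((x - mx) ** 2 for x in input_changes)
    if sxx == 0:
        raise ValueError("input metric never varied, so no slope can be estimated")
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(input_changes, output_changes))
    return sxy / sxx


# Hypothetical weekly data: an input metric and an output metric.
newsletter_clicks = [1000, 1100, 1050, 1260, 1300, 1500]
new_signups = [200, 204, 202, 209, 210, 216]

slope = lift_slope(pct_changes(newsletter_clicks), pct_changes(new_signups))
print(f"A 1% lift in clicks goes with roughly a {slope:.2f}% lift in signups")
```

A team would still need to test such an estimate over many weeks, and against the other input metrics that affect the same output, before trusting it.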
The article then talks about how Amazon became an efficient machine using this principle in its Weekly Business Review (WBR):
“The Amazon WBR is a weekly operational metrics review meeting in which Amazon’s leadership team gathers and reviews 400-500 metrics within 60-90 minutes. It occurs — or so I’m told — every Wednesday morning.
…The way that a WBR deck is constructed is instructive. Broadly speaking, Amazon divides its metrics into ‘controllable input metrics’ and ‘output metrics’. Output metrics are not typically discussed in detail, because there is no way of directly influencing them. (Yes, Amazon leaders understand that they are evaluated based on their output metrics, but they recognise these are lagging indicators and are not directly actionable). Instead, the majority of discussions during WBR meetings focus on exceptions and trends in controllable input metrics. In other words, a metrics owner is expected to explain abnormal variation or a worrying trend (slowing growth rate, say, or if a metric is lagging behind target) — and is expected to announce “nothing to see here” if the metric is within normal variance and on track to hit target. In the latter case, the entire room glances at the metric for a second, and then moves on to the next graph.
(Note that they do not skip over the metric entirely; glancing at the metric is quite important. You’ll see why in a minute.)
How do you come up with the right set of controllable input metrics? The short answer is that you do so by trial and error. Let’s pretend that you want to influence ‘Marketing Qualified Leads’ (or MQLs) and you hypothesise that ‘percentage of newsletters sent that is promotional’, ‘number of webinars conducted per week’ and ‘number of YouTube videos produced’ are controllable input metrics that affect this particular output metric. You include these three metrics in your WBR metrics deck, and charge the various metrics owners to drive up those numbers. Over the period of a few months (and recall, the WBR is conducted every week) your leadership team will soon say things like “Hmm, we’ve been driving up promotional newsletters for awhile now but there doesn’t seem to be a big difference in MQLs; maybe we should stop doing that” or “Number of webinars seems pretty predictive of a bump in MQLs, but why is the bump in numbers this week so large? You say it’s because of the joint webinar we did with Tableau? Well, should we track ‘number of webinars executed with a partner’ as a new controllable input metric and see if we can drive that up?”
The article then quotes from ‘Working Backwards’, a book by Amazon insiders, which describes a specific instance at Amazon where this trial-and-error method allowed them to identify the right control variables to get to the desired output:
‘One mistake we made at Amazon as we started expanding from books into other categories was choosing input metrics focused around selection, that is, how many items Amazon offered for sale. Each item is described on a “detail page” that includes a description of the item, images, customer reviews, availability (e.g., ships in 24 hours), price, and the “buy” box or button. One of the metrics we initially chose for selection was the number of new detail pages created, on the assumption that more pages meant better selection.
Once we identified this metric, it had an immediate effect on the actions of the retail teams. They became excessively focused on adding new detail pages—each team added tens, hundreds, even thousands of items to their categories that had not previously been available on Amazon. For some items, the teams had to establish relationships with new manufacturers and would often buy inventory that had to be housed in the fulfillment centers.
We soon saw that an increase in the number of detail pages, while seeming to improve selection, did not produce a rise in sales, the output metric. Analysis showed that the teams, while chasing an increase in the number of items, had sometimes purchased products that were not in high demand. This activity did cause a bump in a different output metric—the cost of holding inventory—and the low-demand items took up valuable space in fulfillment centers that should have been reserved for items that were in high demand.
When we realized that the teams had chosen the wrong input metric—which was revealed via the WBR process—we changed the metric to reflect consumer demand instead. Over multiple WBR meetings, we asked ourselves, “If we work to change this selection metric, as currently defined, will it result in the desired output?” As we gathered more data and observed the business, this particular selection metric evolved over time from
– number of detail pages, which we refined to
– number of detail page views (you don’t get credit for a new detail page if customers don’t view it), which then became
– the percentage of detail page views where the products were in stock (you don’t get credit if you add items but can’t keep them in stock), which was ultimately finalized as
– the percentage of detail page views where the products were in stock and immediately ready for two-day shipping, which ended up being called Fast Track In Stock.
You’ll notice a pattern of trial and error with metrics in the points above, and this is an essential part of the process. The key is to persistently test and debate as you go. For example, Jeff (Bezos) was concerned that the Fast Track In Stock metric was too narrow. Jeff Wilke argued that the metric would yield broad systematic improvements across the retail business. They agreed to stick with it for a while, and it worked out just as Jeff Wilke had anticipated.’
You can see how picking the wrong controllable input metric temporarily created a Goodhart’s Law type of situation within Amazon. But the nature of the WBR prevented the situation from persisting. Implicit in the WBR process is the understanding that the initial controllable input metrics you pick might be the wrong ones. As a result, the WBR acts as a safety net — a weekly checkpoint to examine the relationships between controllable input metrics (which are set up as targets for operational teams) and corresponding output metrics (which represent the fundamental business outcomes that Amazon desires). If the relationship is non-existent or negative, Amazon’s leadership knows to kill that particular input metric. Said differently, the WBR assumes that controllable input metrics are only important if they drive desirable outcomes — if the metric is wrong, or the metric stops driving output metrics at some point in the future, the metric is simply dropped.”
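The weekly habit of separating “abnormal variation” from “normal variance” can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the quoted passages do not say which chart or thresholds Amazon uses, so we borrow the XmR (individuals and moving range) chart from Statistical Process Control, whose limits sit at the mean plus or minus 2.66 times the average moving range. The function names and the numbers are hypothetical.

```python
from statistics import mean


def xmr_limits(baseline):
    """Return (lower, centre, upper) natural process limits for an XmR chart."""
    moving_ranges = [abs(later - earlier)
                     for earlier, later in zip(baseline, baseline[1:])]
    centre = mean(baseline)
    spread = 2.66 * mean(moving_ranges)  # standard XmR scaling constant
    return centre - spread, centre, centre + spread


def flag_exceptions(baseline, recent):
    """List (week_index, value) pairs in `recent` that fall outside the limits."""
    lower, _, upper = xmr_limits(baseline)
    return [(week, value) for week, value in enumerate(recent)
            if value < lower or value > upper]


# Hypothetical input metric: webinars-driven leads per week.
baseline_weeks = [10, 12, 11, 13, 12, 11, 10, 12]
latest_weeks = [12, 16, 11]

for week, value in flag_exceptions(baseline_weeks, latest_weeks):
    print(f"Week {week}: {value} is outside normal variation, owner should explain")
```

Weeks that stay inside the limits correspond to the “nothing to see here” case; only the flagged ones need a discussion.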

If you want to read our other published material, please visit https://marcellus.in/blog/

Note: the above material is neither investment research, nor financial advice. Marcellus does not seek payment for or business from this publication in any shape or form. Marcellus Investment Managers is regulated by the Securities and Exchange Board of India as a provider of Portfolio Management Services. Marcellus Investment Managers is also regulated in the United States as an Investment Advisor.
