A discussion of the data on cost-effectiveness comparisons

What it does (and doesn't) tell us about differences between charities

On our comparing charities page, we discuss one of our core claims, echoed by many others in the effective giving ecosystem: many of us can easily 100x our impact by donating to highly impactful charities. We cover why differences in impact exist and which factors contribute to them, and we demonstrate the vastness of these differences through a couple of real-life case studies. But we wouldn’t be doing justice to our effective giving values without discussing some of the data that is often used to support claims about the differences between charities. We chose to leave that discussion off the main page to avoid overwhelming our readers, so let’s dive into it here.

The DCP2 data: What does it say and why don't we use it on our comparing charities page?

[Figure: Intervention cost-effectiveness in global health, ordered by DALYs per $1,000 (y-axis), from the DCP2. Compiled from Toby Ord’s The Moral Imperative Towards Cost-Effectiveness.]

One of the data sets most often cited in discussions of how much different programs (or “interventions”) vary in the good they do per dollar (their “cost-effectiveness”) is the 2006 Disease Control Priorities in Developing Countries report (DCP2), a project of The World Bank, the National Institutes of Health, the World Health Organization, and the Population Reference Bureau. This data (shown above) compared the cost-effectiveness of different interventions in global health. As discussed by Toby Ord in his foundational essay The Moral Imperative Towards Cost-Effectiveness, it found huge variance between the best and worst global health interventions, and a still quite sizable difference (Ord suggests around 60x) between the median and the best. It also suggested that the best interventions were so much better that, as Ord writes in his essay, “if we funded all of these interventions equally, 80% of the benefits would be produced by the top 20% of the interventions.”
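To get an intuition for how a heavy-tailed spread produces this kind of concentration, here is a small, purely illustrative simulation. It assumes a lognormal distribution of cost-effectiveness with parameters we picked arbitrarily; it is a sketch of the general pattern, not a model fitted to the DCP2 figures.

```python
import numpy as np

# Purely illustrative: simulate a heavy-tailed (lognormal) spread of intervention
# cost-effectiveness and check what share of the total benefit the top 20% of
# interventions would produce if every intervention received equal funding.
# The distribution and its parameters are assumptions, not DCP2 estimates.
rng = np.random.default_rng(0)
cost_effectiveness = rng.lognormal(mean=0.0, sigma=1.75, size=10_000)  # good per $ spent

sorted_ce = np.sort(cost_effectiveness)[::-1]        # best interventions first
top_fifth = sorted_ce[: sorted_ce.size // 5]
share_from_top = top_fifth.sum() / sorted_ce.sum()

print(f"Share of total benefit from the top 20% of interventions: {share_from_top:.0%}")
# With a spread this wide, the top fifth typically accounts for roughly 80% of the benefit.
```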

While we think the patterns in the DCP2 data demonstrate a key effective giving principle — that there are substantial impact differences between the programs a charity may operate — we don’t rely on this data to demonstrate differences between charities. There are two reasons for this:

  1. Even if it were a highly reliable data set, the DCP2 data focuses on the average cost-effectiveness of interventions, whereas we are interested in the marginal cost-effectiveness of charities. While the data does have some relevance to the marginal cost-effectiveness of charities, it is far from a 1:1 comparison (more below).
  2. It is not a highly-reliable data set (though we do think the core principle about significant differences in the cost-effectiveness of interventions holds up).

On the first point:

Average vs. marginal

We think that when you make a charitable donation, the relevant question is “How much good will my donation do?” (cost-effectiveness on the margin) rather than “How much good does the average donation do?” (average cost-effectiveness). For example, suppose a charity is doing cutting-edge research on vaccine development, but it is already fully staffed and resourced and therefore has no need for additional funding. It may be highly cost-effective on average, but not on the margin.
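To make the distinction concrete, here is a minimal sketch using entirely hypothetical numbers (they do not describe any real charity). A program with diminishing returns can look excellent on average while an additional donation at the margin buys noticeably less.

```python
# Hypothetical illustration of average vs. marginal cost-effectiveness.
# The production function and all numbers are invented for demonstration.

def total_good(funding_usd: float) -> float:
    """Total units of good produced for a given budget (diminishing returns)."""
    return 100 * funding_usd ** 0.5  # concave: each extra dollar buys less than the last

current_budget = 1_000_000  # what the charity already has
donation = 1_000            # the extra gift we are considering

average_ce = total_good(current_budget) / current_budget
marginal_ce = (total_good(current_budget + donation) - total_good(current_budget)) / donation

print(f"Average cost-effectiveness:  {average_ce:.4f} units of good per $")
print(f"Marginal cost-effectiveness: {marginal_ce:.4f} units of good per $")
# Here the marginal figure is only about half the average, which is why we care about
# what an additional donation achieves, not what the average dollar has achieved.
```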

Interventions vs. charities

We think it’s important not to conflate data on the cost-effectiveness of an intervention with data on the cost-effectiveness of a charity. While differences in the cost-effectiveness of interventions indicate that the cost-effectiveness of charities will vary based on which programs (interventions) they facilitate, there are other factors at play as well. For example, supporting a charity that facilitates a particularly cost-effective intervention will likely still cost more per unit of good than the DCP2 data suggests, since that data won’t necessarily include all the associated costs a charity might incur.
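As a toy illustration of this gap (all numbers invented), a charity’s all-in cost per unit of benefit might look something like this compared to the intervention’s modelled cost:

```python
# Hypothetical illustration: an intervention's modelled cost per DALY averted
# vs. a charity's all-in cost once delivery and overhead are included.
# Every number here is made up for demonstration purposes.
intervention_cost_per_daly = 50.0       # e.g. a DCP2-style figure for the intervention alone
overhead_and_delivery_multiplier = 1.6  # staff, logistics, monitoring, fundraising, etc.

charity_cost_per_daly = intervention_cost_per_daly * overhead_and_delivery_multiplier

print(f"Modelled intervention cost per DALY: ${intervention_cost_per_daly:.0f}")
print(f"Charity's all-in cost per DALY:      ${charity_cost_per_daly:.0f}")
# The charity is somewhat less cost-effective than the bare intervention figure suggests,
# but a charity delivering a top intervention can still be far more cost-effective than
# one delivering an intervention that is thousands of times weaker.
```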

On the second point:

Reliability

In 2011, the charity evaluator GiveWell found some concerning errors when examining the DCP2’s figures for deworming, and GiveWell is now hesitant to put any weight on DCP2 cost-effectiveness estimates unless its research team can fully verify the calculations. Importantly, GiveWell only examined one intervention of many, so it isn’t clear whether there are additional errors in the data set; however, given the extent of the errors, and the fact that they weren’t caught prior to publication, there is good reason to question the reliability of this data source. As such, there is some doubt over how heavily donors should weigh the DCP2’s conclusions: on the one hand, this was a project of over 350 experts and very credible agencies; on the other, GiveWell found the DCP2’s cost-effectiveness estimate for deworming to be off by a factor of 100.

Does other data support the patterns in the DCP2? What does this data tell us about charity comparisons?

Giving What We Can still displays the DCP2 data in some places on our site, because we think the “heavy-tailed” pattern (which demonstrates the very large extent to which interventions vary in efficacy) is still accurate, even assuming some errors in the exact cost-effectiveness calculations. In other words, we think it’s fair to say that even if there were more errors in this data set, it’s highly likely it would still be heavy-tailed (given the very large spread we are starting with). That said, we don’t use the DCP2 to make claims about the exact cost-effectiveness figures of different interventions (or the exact amount by which interventions differ in impact), because we think GiveWell’s deep dive into the data casts doubt on those numbers. We remain confident, however, that the overall pattern shown by the data (that interventions vary greatly in their cost-effectiveness) holds up.

One of the reasons we are confident about this is that the 80/20 pattern (whereby the top 20% of interventions are so much better that implementing just those would do about 80% as much good as implementing all of them) is repeated in more recent data sets from a variety of sources. Indeed, when 80,000 Hours’ Benjamin Todd “checked” Ord’s foundational paper on the DCP2 against a collection of “all the studies [they] could find”, he concluded that “the 80/20 pattern basically holds up” and found the top 2.5% of interventions in the data sets he examined to be around 20-200 times more cost-effective than the median intervention (one somewhere in the middle of the range) and 8-20 times more cost-effective than the mean (the effectiveness you’d expect from picking an intervention at random).
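The gap between the “vs. median” and “vs. mean” multipliers Todd reports is itself a signature of heavy-tailed data: the tail pulls the mean well above the median, so the best interventions look less extreme relative to the mean than relative to the median. Here is a quick, purely illustrative check, again using an invented lognormal distribution rather than real data:

```python
import numpy as np

# Purely illustrative: compare the average cost-effectiveness of the top 2.5% of
# interventions to the median and to the mean of a heavy-tailed sample.
# The lognormal distribution and its parameters are assumptions, not real data.
rng = np.random.default_rng(1)
cost_effectiveness = rng.lognormal(mean=0.0, sigma=1.75, size=100_000)

n_top = int(0.025 * cost_effectiveness.size)
top_avg = np.sort(cost_effectiveness)[-n_top:].mean()

print(f"Top 2.5% vs. median: ~{top_avg / np.median(cost_effectiveness):.0f}x")
print(f"Top 2.5% vs. mean:   ~{top_avg / cost_effectiveness.mean():.0f}x")
# The multiplier relative to the median comes out several times larger than the
# multiplier relative to the mean, because the heavy tail drags the mean upward.
```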

So the data sets we have do support vast differences in the cost-effectiveness of different interventions, which means that, depending on which programs a charity implements, there will be vast differences in its overall impact.

In the examination referenced above, Todd also argues that the true impact difference might be lower than what we see in the data, because of the possibility of data errors of the kind GiveWell encountered in the DCP2, regression to the mean, the actual availability of the interventions examined, and positive secondary effects of some interventions; however, even taking these considerations into account, he still believes that the most effective interventions in an area are at least 3-10 times more cost-effective than the mean (“where the mean is the expected effectiveness you’d get from picking randomly”).

By the way, in case there is any confusion about why Todd's numbers (3-10x) are lower than our estimate of “easily doing 100x more”, we want to clarify that we are making slightly different claims. Todd is saying that if a donor previously picked an intervention at random, they’d be able to 3-10x their impact by switching to supporting an intervention in the top 2.5% of interventions in that area. We’re saying that, depending on one’s starting point, we can easily 100x our impact by following some key principles when choosing where to donate. We also suspect that the starting point of most donors is more likely to be closer to the median of interventions (a “typical” intervention) than to the mean (choosing randomly). Perhaps more importantly, Todd's estimates are confined to the difference between interventions working in the same area, while ours are not. Todd estimates the difference in effectiveness between interventions in different cause areas to be considerably higher, stating that if one were to "both find a pressing problem and an effective solution...the combined spread in effectiveness could be 1,000 or even 10,000 fold."

That said, as GiveWell points out, there are limitations to cost-effectiveness calculations. When GiveWell analyses its top charities, its researchers take into account not only cost-effectiveness but also factors like confidence in the organisation (such as whether it has a proven track record) and how much evidence there is that a specific intervention works well.

Takeaways from the data (and its issues)

  • We think it is significant that the data appears heavy-tailed in every data set examined. This directly supports the general claim that huge differences in impact exist between different interventions.
  • We agree with GiveWell that the DCP2 shouldn’t be relied on by itself to make this claim, and that it shouldn’t be used to find exact cost-effectiveness figures for specific interventions. However, we think it is useful for showing a general principle that most people aren’t aware of: just how much charities can vary in their efficacy. The DCP2 shows a huge spread, with the best interventions about 10,000 times more cost-effective than the worst. Even if these estimates were off by the factor GiveWell found in the deworming case (100x), a 10,000-fold spread reduced by a factor of 100 would still leave roughly a 100-fold difference; and even accounting for the added costs a charity would incur beyond the cost of the intervention itself, large differences would still exist between charities facilitating the best interventions and those facilitating the least cost-effective ones.
  • We also think the fact that GiveWell does its own cost-effectiveness analyses and finds large differences between good and great charities (with their top charities estimated to be at least ten times more cost-effective than cash transfers, an intervention that is already much more cost-effective than many others) further demonstrates that data showing large impact differences between interventions is consistent with the reality of the philanthropic landscape.
  • We agree with Benjamin Todd that data is not a perfect map of reality, and that true differences in impact are likely to be smaller than what is shown in the data sets we have. However, we think an imperfect map is better than no map at all, and we find it telling that, even accounting for this likely overstatement, the differences in the cost-effectiveness of interventions in the available data reflect the kind of impact variations we’d expect donors to experience by following #1 and #2 of our five ways to multiply impact. That said, there are also other ways to multiply impact (such as #3, #4, and #5) that the DCP2 data, and data like it, doesn’t capture.

Concluding thoughts

We hope you enjoyed this deep dive into the data! If you have thoughts or feedback on this discussion, please let us know.