Although we are familiar with the idea of criticising journalists for using misleading language in their verbal rhetoric, we are not as quick to point to the use of misleading data representation in statistical rhetoric, which is often non-verbal and highly visual. Graphs give a writer the opportunity to present statistics clearly, but also the opportunity to be confusing in two dimensions and multiple colours.
In a second post on the importance of conveying information persuasively when discussing moral issues, Yuan gives her recommendations on representing data in graphs in an easily understandable way and on minimising the 'lie factor'.
Watch this space for Yuan's third post in this series where she'll wade into the international Dollar vs real Dollar debate.
Representation: Statistics in Graphs
Representing numbers visually is much more arresting than presenting statistics numerically, so long as the graphics are easy to interpret correctly.
A common mistake among those graphing cost effectiveness data is to present differences on a logarithmic scale. This compresses differences in cost effectiveness of ten, a hundred or a thousand-fold into apparent visual differences of just two, three or four-fold. A good example is the linked figure on education interventions from the Poverty Action Lab. While the true variation may be understood by the academics reading these reports, it isn’t appreciated by most others. Even if someone does understand on an intellectual level that the differences are ten times larger than they appear - and many do not - the logarithmic scale makes this point less emotionally compelling. As a result, we strongly recommend retaining a linear scale to show just how much better some approaches are than others.
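To make the compression concrete, here is a rough sketch in Python. The scores and the axis starting point are hypothetical, chosen only for illustration: four interventions, each ten times as cost effective as the one before, drawn as bars on a log axis that is assumed to start one decade below the smallest value.

```python
import math

def log_bar_length(value, axis_min):
    """Drawn length of a bar on a log10 axis, in decades above axis_min."""
    return math.log10(value) - math.log10(axis_min)

# Hypothetical cost-effectiveness scores, each ten times the one before.
values = [1, 10, 100, 1000]
axis_min = 0.1  # assumption: the axis starts one decade below the smallest value

# How much better each intervention really is than the weakest one.
true_ratios = [v / values[0] for v in values]

# How much longer each bar is drawn than the weakest one's bar.
lengths = [log_bar_length(v, axis_min) for v in values]
visual_ratios = [length / lengths[0] for length in lengths]

for v, t, a in zip(values, true_ratios, visual_ratios):
    print(f"value {v:>4}: truly {t:.0f}x the smallest, drawn {a:.1f}x as long")
```

Under these assumptions the thousand-fold difference is drawn as a bar only about four times as long as the smallest one. A different axis starting point changes the exact figures, but not the fact that the visual gap is far smaller than the real one.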
A badly presented graph can place more mental strain on the reader than a simple table of numbers would. Edward Tufte’s book, The Visual Display of Quantitative Information (summarised here), is an excellent guide to the principles of graphical representation of statistics. In short, your images need to have graphical integrity: to present information as succinctly as possible, and to present information in a way that is not visually misleading (to minimise what Tufte calls the “Lie Factor” of a graph). Graphical integrity has a huge impact on people’s perception of the data presented.
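Tufte defines the Lie Factor as the size of the effect shown in the graphic divided by the size of the effect in the data, where the size of an effect is the relative change between two values. A factor close to 1 means the picture is honest. The snippet below is a minimal sketch of that calculation; the function names and the example numbers are my own, not Tufte's.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data."""
    shown = effect_size(drawn_first, drawn_second)
    actual = effect_size(data_first, data_second)
    return shown / actual

# Hypothetical example: the data doubles from 10 to 20 (a 100% increase),
# but the bars are drawn 1 cm and 5 cm tall (a 400% increase).
example = lie_factor(10, 20, 1.0, 5.0)
print(f"Lie factor: {example:.1f}")
```

In this made-up case the graphic exaggerates the change fourfold. Had the bars been drawn 1 cm and 2 cm tall, the factor would have been 1.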
Tufte’s key recommendations are as follows (with my comments in square brackets):
“1. The representation of numbers, as physically measured on the surface of the graph itself, should be directly proportional to the numerical quantities represented [i.e. no log scales; use the numbers in absolute values]
2. Clear, detailed and thorough labeling should be used to defeat graphical distortion and ambiguity. Write out explanations of the data on the graph itself. Label important events in the data.
3. Show data variation, not design variation. [Meaning: Do not let the presentation style obscure the variation in the data that you are trying to display, or introduce new variation when there isn’t any. Ask yourself: What comparison am I using this data to display? Consider that humans find it hard to make multi-layered comparisons, and simplify your graph into multiple graphs if necessary.]
4. In time-series displays of money, deflated and standardized units of monetary measurement are nearly always better than nominal units. [I will look at this in the next section.]
5. The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data.
6. Graphics must not quote data out of context.”
_Image courtesy of povertyactionlab_