Visualizing Uncertainty

CAPP 30239

What causes uncertainty?

  1. measurement error - The instrument used is not perfectly accurate. In a survey, the instrument could be a poorly worded question.
  2. model uncertainty - Models make assumptions and simplifications; different assumptions lead to different outcomes.
  3. sampling variability - Differences between sample & population.
  4. missing data - How missing data is accounted for & represented.

The result is that we have a range or distribution where we want a single number to map to one of our channels (Hue, X, Y, etc.).
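
A minimal sketch of how sampling variability alone turns one number into a range. The population, sample size, and seeds below are made up for illustration; only the standard library is used.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: 10,000 values, roughly normal around 50.
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

means = sample_means(population, sample_size=100, n_samples=500)

# Every sample gives a slightly different answer to the same question.
print(f"population mean: {statistics.mean(population):.2f}")
print(f"sample means range from {min(means):.2f} to {max(means):.2f}")
```

The spread of `means` is the distribution we are left holding when a channel wants a single value.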

Challenges of Uncertainty

Uncertainty is often left out, in part because it is hard to understand and even harder to visualize.

Omission, however, misleads audiences, especially when many significant figures are shown.

Global Population Uncertainty: ±160 million people (2%)
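
One way to avoid implying false precision is to round an estimate to the digit where its uncertainty starts. The sketch below assumes that convention. The helper `round_to_uncertainty` and the point estimate are hypothetical; only the ±160 million figure comes from the slide.

```python
import math

def round_to_uncertainty(value, uncertainty, sig_figs=1):
    """Round value and uncertainty to the place implied by the uncertainty."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Power of ten of the uncertainty's leading digit, e.g. 160_000_000 -> 8.
    exponent = math.floor(math.log10(uncertainty))
    # A negative ndigits rounds to the left of the decimal point.
    ndigits = -(exponent - (sig_figs - 1))
    return round(value, ndigits), round(uncertainty, ndigits)

# Hypothetical point estimate; the uncertainty is the slide's figure.
estimate = 8_045_311_447
value, err = round_to_uncertainty(estimate, 160_000_000)
print(f"{value / 1e9:.1f} billion ± {err / 1e9:.1f} billion")
# prints "8.0 billion ± 0.2 billion"
```

Printing all ten digits of `estimate` would suggest we know the population to the person, when the last eight digits are inside the error.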

Challenges of Uncertainty

Uncertainty estimates are simplified, often out of necessity.

30% chance of rain: "A 30% chance that at least 0.01" of rain will fall somewhere within a given area over a 12 hour period."

Do I bring an umbrella?

Challenges of Uncertainty

Visual complexity can overwhelm the audience and obscure other meaning.

From a data-ink ratio perspective, it is understandable that error bars would be omitted when they do not seem relevant to the narrative.

Including Uncertainty

If omitting uncertainty misleads, it violates our prime directive of graphical integrity.

The job, then, is to find ways that are audience-appropriate & don't obfuscate the meaning.

The difficulty will be in resolving this tension.

Common Techniques

  • Uncertainty as Probability
  • Error Bars
  • Confidence Bands

Uncertainty As Probability

Random waffle chart: works for cases with discrete outcomes.
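
A rough sketch of the data behind a random waffle chart: a grid where the share of filled cells equals the probability, with positions shuffled so the layout itself reads as chance. The 10x10 grid, the helper names, and the 30% value (borrowed from the rain example) are assumptions for illustration. The plotting function assumes Altair 5 is installed.

```python
import random

def waffle_cells(probability, rows=10, cols=10, seed=None):
    """Return one dict per grid cell, with `hit` cells scattered at random."""
    n_cells = rows * cols
    n_hits = round(probability * n_cells)
    flags = [True] * n_hits + [False] * (n_cells - n_hits)
    random.Random(seed).shuffle(flags)
    return [
        {"row": i // cols, "col": i % cols, "hit": flag}
        for i, flag in enumerate(flags)
    ]

def waffle_chart(cells):
    """Draw the cells as a grid of squares (assumes Altair 5 is installed)."""
    import altair as alt  # imported here so the data step runs without Altair

    return alt.Chart(alt.Data(values=cells)).mark_square(size=300).encode(
        x=alt.X("col:O").axis(None),
        y=alt.Y("row:O").axis(None),
        color=alt.Color("hit:N").title("Outcome occurs"),
    )

cells = waffle_cells(0.30, seed=1)
print(sum(c["hit"] for c in cells), "of", len(cells), "cells are filled")
# prints "30 of 100 cells are filled"
```

Calling `waffle_chart(cells)` in a notebook would render the grid; changing `seed` reshuffles which cells are filled without changing how many.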

Uncertainty As Probability

In practice, we often care about more than a boolean outcome.

Uncertainty of Point Estimates

These work when we're focused on uncertainty around a particular outcome.

Sometimes we need to show uncertainty around discrete measurements, or projections.

Error Bars

Error Bands

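import altair as alt
from vega_datasets import data

# Assumed data source: the cars dataset from vega_datasets, which has the
# Year and Miles_per_Gallon columns used below.
source = data.cars()

# Line through the mean MPG for each year.
line = alt.Chart(source).mark_line().encode(
    x='Year',
    y='mean(Miles_per_Gallon)'
)

# Shaded band: extent='ci' draws a 95% confidence interval of the mean.
band = alt.Chart(source).mark_errorband(extent='ci').encode(
    x='Year',
    y=alt.Y('Miles_per_Gallon').title('Miles/Gallon'),
)

# Layer the band underneath the line.
band + line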

Issues with Error Bars & Confidence Bands

  1. There is no pre-defined meaning of these intervals. They may show a standard deviation, a standard error, a confidence interval, or something else.
    If error bars or bands are included, the legend must include information on the meaning.
  2. Error bars are common in scientific & academic literature; other audiences cannot be assumed to understand them.
  3. Restricted to points positioned in 1D/2D. If the variable being expressed is mapped to color, area, etc., then alternative presentations are needed.
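
To make the first issue concrete, here is a small sketch that computes three common error-bar choices for the same data. The measurements are hypothetical, and the 95% interval uses a normal approximation (z = 1.96), which is an assumption; for a sample this small a t-based interval would be wider.

```python
import math
import statistics

def interval_half_widths(values, z=1.96):
    """Half-widths of three common error-bar choices for the same data."""
    n = len(values)
    stdev = statistics.stdev(values)   # spread of the observations
    stderr = stdev / math.sqrt(n)      # uncertainty of the mean
    ci95 = z * stderr                  # normal-approximation 95% CI
    return {"stdev": stdev, "stderr": stderr, "ci95": ci95}

# Hypothetical measurements, for illustration only.
measurements = [21.0, 23.5, 19.8, 22.1, 24.3, 20.6, 22.9, 21.7, 23.0, 20.4]
mean = statistics.mean(measurements)

for name, half in interval_half_widths(measurements).items():
    print(f"{name:>6}: {mean - half:.2f} to {mean + half:.2f}")
```

The same ten numbers produce three visibly different bars, which is why the legend has to say which one is drawn. In Altair, the `extent` argument of `mark_errorbar` and `mark_errorband` selects among choices like these ('ci', 'stderr', 'stdev', 'iqr').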

Variations on Error Bars & Intervals

Regression Uncertainty

Other Approaches

Showing Multiple Futures

Hurricane Uncertainty

On Maps

"Sketchiness"

Animating Uncertainty

References & Acknowledgements

What is this trying to show? source: https://www.ipcc.ch/report/ar6/wg1/figures/chapter-3/figure-3-4/

These show essentially the same thing: one shows individual models and the other uses aggregates with confidence intervals. Both are from the same page of the IPCC report.

We can convert this to discrete measurements: quantile dot plot.
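
A minimal sketch of the conversion, assuming the forecast can be described by a known distribution. Each dot stands for an equally likely slice of the distribution (here 20 dots, so 5% each). The normal forecast, dot count, and bin width are all hypothetical choices.

```python
from collections import Counter
from statistics import NormalDist

def quantile_dots(dist, n_dots=20, bin_width=1.0):
    """Return (bin_center, stack_height) pairs, one dot per equal-probability slice."""
    # Midpoint quantiles: (i + 0.5) / n_dots avoids the infinite 0% and 100% tails.
    quantiles = [dist.inv_cdf((i + 0.5) / n_dots) for i in range(n_dots)]
    # Snap each quantile to a bin so dots stack into a histogram-like shape.
    bins = Counter(round(q / bin_width) * bin_width for q in quantiles)
    return sorted(bins.items())

# Hypothetical forecast: normally distributed with mean 10 and std. dev. 2.
forecast = NormalDist(mu=10, sigma=2)

# Text rendering: one row per bin, one "o" per dot.
for center, height in quantile_dots(forecast):
    print(f"{center:5.1f} " + "o" * height)
```

Readers can then count dots: the chance of landing at or below a value is the number of dots at or below it, divided by `n_dots`.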

source: fivethirtyeight

When appropriate, can also be used to show multiple intervals.

Care should be taken that the distribution is indeed normal if curves, etc. are chosen.

source: https://tamucoa.b-cdn.net/app/uploads/2021/10/House2011TrackUncertaintyVisualization.pdf

source: https://www.e-education.psu.edu/geog486/sites/www.e-education.psu.edu.geog486/files/Lesson_07/Images/ex_vs_ont.PNG