---
theme: custom-theme
---
# Chart Design
## CAPP 30239
---
## Today
- What general **principles of visual design** are relevant to our work?
- What are the **common types of charts** and how do we use them?
- When and how do we break the rules?
---
## Edward Tufte
### The Visual Display of Quantitative Information
![](tufte.png)
---
## Key Ideas
- Graphical Integrity: Above all else, show the data.
- Maximize the data-ink ratio.
- Minimize chart junk.
- Aim for high chart density; consider *small multiples*.
- Revision & Editing are essential.
---
## Tufte's Principles for **Graphical Integrity**
---
1. The representation of numbers, as physically measured on the surface of the graphic itself, should be directly **proportional** to the numerical quantities represented.
![](liefactor.jpg)
Mileage increase: 53%
Graph length increase: 783%
"Lie Factor": 14.8x
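
A minimal sketch of the arithmetic behind the figure above, assuming the lie factor is the change shown in the graphic divided by the change in the data (the function name is illustrative):

```python
def lie_factor(graphic_change_pct: float, data_change_pct: float) -> float:
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return graphic_change_pct / data_change_pct


# Percentages taken from this slide.
factor = lie_factor(graphic_change_pct=783, data_change_pct=53)
print(round(factor, 1))  # 14.8
```

A value near 1 means the graphic is proportional to the data.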
---
2. Clear, detailed and thorough **labeling** should be used to defeat graphical distortion and ambiguity.
![bg left ](spinal.webp )
How many children get a spinal injury every year? (out of 74,000,000 children in US)
<!-- .0000003% -->
---
3. Write out explanations of the data on the graphic itself. **Label important events** in the data.
![](labeled.png)
---
4. Show **data variation, not design variation**.
Deflated & standardized units of money are almost always superior to nominal units.
The number of information-carrying (variable) dimensions depicted should not exceed the number of dimensions in the data. (roughly 1:1 channel mapping)
Exception: It is common (and OK) to pair color & shape, or, for print, color & texture, to address issues that color presents.
---
## Misleading Axes
- not starting at 0
- dual axes
- pie charts that don't add up to 100%
TODO: add examples
---
## Data-Ink Ratio
- **Data-ink**: Ink (pixels) used to show data.
- Data-ink ratio: data-ink / total-ink
![](francetrains.jpg)
---
![](eec.gif)
![bg right width:600px ](sizecycle.gif )
---
## Optimizing Data Density
Number of entries in DataFrame / Area of Graphic.
A classic example of high data density is the sparkline, which can fit on a line of text.
![](sparkline.png)
---
![bg left height:700px ](age-junk.png )
## Chart Junk
Anything that isn't relevant to understanding the data.
---
![](chartjunk-bullet.webp)
via junkcharts.typepad.com
---
## Common Chart Types
---
## How to Pick?
- Quantitative / Quantitative:
- Quantitative / Temporal:
- Quantitative / Nominal:
- Nominal / Nominal:
---
### Bar Charts & Histograms
- X/Y: Nominal (Binned Numerical - Histogram)
- Y/X: Quantitative
![](bars.png)
---
### Line & Area Charts
- X: Temporal / Quantitative
- Y: Quantitative (means / sums)
![bg right width:600px ](lines.png )
---
### When to use stacked area charts?
![bg left width:600px ](area.png )
Sum of stacked axis variable **must have meaning**.
---
### Heatmap
![bg right width:600px ](heatmap.png )
- X & Y: Quantitative or Nominal
- Color: Quantitative
- `mark_rect`
---
### Strip Plot
![bg left width:600px ](strip.png )
- Y: Nominal
- X: Temporal or Quantitative
- Color: Optional (any type)
- `mark_tick`
---
### Pie / Donut / Radial Charts
![](pyramid.png)
Theta: Quantitative (ratio)
Color: Nominal
Direct comparison of segments is very difficult at n > 2.
Use only when the most important information is the ratio between sizes and there are relatively few categories.
---
![](pie-comparison.png)
https://www.storytellingwithdata.com/blog/2020/5/14/what-is-a-pie-chart
---
### Bump / Rank Line Chart
![width:200px left ](rankline.png )
![width:500px left ](bump.png )
Useful for showing changes in relative positioning.
Requires some data manipulation, using `transform_window` or pre-computing ranks (see the Altair gallery examples).
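
A minimal sketch of the pre-computing option, using only the standard library. The records, field names, and `add_ranks` helper are hypothetical; ties are broken by input order here, which may not be what you want for real data.

```python
from collections import defaultdict

# Hypothetical records: (year, team, score)
records = [
    (2021, "A", 10), (2021, "B", 30), (2021, "C", 20),
    (2022, "A", 25), (2022, "B", 15), (2022, "C", 20),
]


def add_ranks(rows):
    """Return one dict per row with a within-year rank (1 = highest score)."""
    by_year = defaultdict(list)
    for year, team, score in rows:
        by_year[year].append((team, score))

    ranked = []
    for year in sorted(by_year):
        # Sort descending by score; Python's sort is stable, so ties keep input order.
        ordered = sorted(by_year[year], key=lambda pair: pair[1], reverse=True)
        for position, (team, score) in enumerate(ordered, start=1):
            ranked.append({"year": year, "team": team, "score": score, "rank": position})
    return ranked


ranked = add_ranks(records)
for row in ranked:
    print(row)
```

The resulting `rank` field is what would go on the Y axis of a bump chart, with `year` on X and one line per `team`.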
---
### Scatter & Bubble Plots
![bg left width:600px ](bubble.png )
- X / Y: Quantitative
Bubble charts use size as a 3rd dimension.
(Note subtle but useful transparency usage as well.)
---
### Small Multiples / Faceting
![facet ](facet.png )
![bg right ](small-maps.png )
<!-- source: https://www.juiceanalytics.com/writing/better-know-visualization-small-multiples -->
Useful when there is a nominal variable being compared across two other dimensions.
---
### Map Basics
Two most common:
- point maps
- choropleths
![bg left width:600px ](london-trees.png )
*Image: Trees in London, data.london.gov.uk*
<!-- source: https://data.london.gov.uk/dataset/local-authority-maintained-trees#:~:text=The%20data%20does%20not%20represent,streets%2C%20private%20gardens%20and%20more. -->
**We will revisit maps later in this course.**
---
## Two choropleths, same data.
![bg width:600px ](arcgis-chorolpleth.png )
![bg width:600px ](arcgis-choropleth2.png )
<!-- source: https://carto.maps.arcgis.com/apps/webappviewer/index.html?id=7475c5788efe4c75a9642f552f61d568 -->
Unit of measurement is incredibly important.
Consider alternatives if district/population sizes vary significantly.
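
One possible alternative, sketched under the assumption that the concern is raw counts being compared across districts of very different population: shade by a rate instead of a count. The district names and numbers below are made up for illustration.

```python
# Hypothetical districts with very different populations.
districts = [
    {"name": "North", "cases": 500, "population": 100_000},
    {"name": "South", "cases": 200, "population": 10_000},
]


def rate_per_1000(cases: int, population: int) -> float:
    """Cases per 1,000 residents."""
    return cases * 1000 / population


for d in districts:
    d["rate"] = rate_per_1000(d["cases"], d["population"])
    print(d["name"], d["cases"], d["rate"])
```

By raw count North looks worse (500 vs. 200); by rate South does (20 vs. 5 per 1,000). Which map is "right" depends on the question being asked.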
---
## When & How to Break the Rules
**When in doubt...**
- 95% of visualizations should be some variation of the common types.
- Focus on Tufte's rules for clarity.
---
### Case Study: Two Innovations
Two visualization types that have had their moment in the past 10-15 years:
- Hex/Grid Maps
- Word Clouds
---
## Grid Map
![](npr-side-by-side.png)
Introduced in <https://blog.apps.npr.org/2015/05/11/hex-tile-maps.html>
<!-- discuss: is this a good thing? -->
---
## Word Cloud
![](word-cloud.jpg)
---
![bg left ](nyt1.png )
![](nyt2.png)
Derived from the same data as the word cloud.
source: NYTimes via https://www.niemanlab.org/2011/10/word-clouds-considered-harmful/
---
## Narrative-supporting graphics
![bg left width:500px ](crochet.jpg )
by ulaniulani on flickr
---
### When it's OK to use 3D
You have data that truly makes more sense in 3D.
and/or
You work at CERN.
![bg right width:700px ](lhc.png )
(Image: CERN Large Hadron Collider)
---
## Acknowledgements & References
Thanks to Alex Hale, Andrew McNutt, and Jessica Hullman for sharing their materials.
- https://www2.cs.uh.edu/~ceick/NO/COSC3337-DV2.pdf
- Images from Tufte's Visual Display of Quantitative Information
- Images from Altair <https://altair-viz.github.io/gallery/index.html>