30239-notes/01.gog-altair/slides.md

215 lines
4.6 KiB
Markdown
Raw Normal View History

2024-09-26 04:41:41 +00:00
# Grammar of Graphics with Altair
## CAPP 30239
---
## Today
2024-09-27 22:28:23 +00:00
- What is a **grammar of graphics** and how do we use it in practice?
- What **types of data** do we encounter, and how does that affect visualizations?
- Introduction to **Altair**
2024-09-26 04:41:41 +00:00
---
## Grammar of Graphics
Hadley Wickham, creator of `ggplot2` and `tidyverse`, ["A Layered Grammar of Graphics"](http://vita.had.co.nz/papers/layered-grammar.pdf).
Key Idea: move beyond pre-defined composites like "scatter plot" and "bar chart" into a composable grammar from which we can construct a wide variety of visualizations.
---
## Wickham's Components:
1. data and aesthetic mappings,
2. one or more layers, each with
- a geometric object (line, point, etc.)
- (optional) statistical transformation
- (optional) position adjustment
3. one scale per aesthetic mapping (color, size, etc.)
4. a coordinate system
5. facet specification
---
## Types of Data
### **N** - Nominal
"strings" with no **order** (alphabetical does not count)
Species
States
Countries
### **O** - Ordered
- Grades: A, B, C, D, E, F
- Rankings: 1st, 2nd, 3rd
---
## Types of Data (Quantitative)
### **Q** - Interval (arbitrary zero)
- Dates (1 CE, Jan 1 1970, or...)
- Location (lat, lon)
Only differences matter, can't compare ratios.
_(What is 2024 / 1990?)_
### **T**emporal
Some systems (like Altair) will also offer this option specifically for dates and times.
### **Q** - Ratio (zero fixed)
Physical measurements, counts, amounts.
"4 km is _twice as far_ as 2 km"
---
## Types of Data (Operations)
| | =, != | <, >, <=, >= | +, - | ÷ |
| -------- | ----- | ------------ | ---- | --- |
| Nominal | ✓ | | | |
| Ordered | ✓ | ✓ | | |
| Interval | ✓ | ✓ | ✓ | |
| Ratio | ✓ | ✓ | ✓ | ✓ |
---
## Data Model to N, O, Q
- string?
- bool?
- float/int?
Possible exceptions?
---
## Data Model to N, O, Q
_Typically:_
- string - nominal or ordered
- bool - nominal
- float/int - interval or ratio
Possible Exceptions?
- Numeric IDs
- ZIP Codes
- ratio data stored with units (e.g. "10km")
---
## Mapping of Variables to Aesthetics
- position (X, Y, Z)
- length
- angle
- slope
- area
- volume
- density
- hue
- saturation
- texture
- connection
- containment/grouping
- shape
---
### Mackinlay's "effectiveness"
![width:800px](effectiveness.png)
---
## Altair
2024-09-26 04:41:41 +00:00
Altair is a Python visualization library that allows us to work from a grammar of graphics perspective.
2024-09-26 04:41:41 +00:00
It also is very flexible in output formats, which will be useful if you want to modify your graphics or make them interactive.
Altair is built on top of **Vega-Lite**.
Vega-Lite is a system that represents graphics in a JSON schema, and a set of tools that convert these JSON representations to images or interactive graphics.
2024-09-26 04:41:41 +00:00
---
## Vega-Lite Example
```json
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"description": "A scatterplot showing horsepower and miles per gallons for various cars.",
"data": {"url": "data/cars.json"},
"mark": "point",
"encoding": {
"x": {"field": "Horsepower", "type": "quantitative"},
"y": {"field": "Miles_per_Gallon", "type": "quantitative"}
}
}
```
Vega condenses several of the different pieces of the grammar to _"encoding channels"_.
---
![](vega.png)
---
## Altair
```python
import pandas as pd
import altair as alt
df = pd.read_csv("cars.csv")
alt.Chart(df).encode(
x="Horsepower:Q", # shorthand for simple features
alt.Y("Miles_per_Gallon:Q").title("Miles Per Gallon"), # longer form w/ customization
)
```
Altair is a Pythonic wrapper to create Vega-Lite JSON. If you use it in a notebook, the resulting graphs will render inline.
---
## Altair Notebook
<!-- at this point, see the marimo notebook in this directory -->
---
## Learning Altair
To master a library like Altair, you'll go through the following phases:
1. Learn the key concepts.
- Goal: Understand how the authors of Altair think about visualization.
- Achieved by: Reading user guide & watching tutorials.
2. Internalize concepts & API.
- Goal: Be able to do common tasks without referring to documentation. (You'll always lean on documentation for specifics.)
- Achieved by: Working on assignments & experimentation. Reading API reference as needed.
3. Mastery (not this quarter!)
- Goal: Be able to manipulate library to achieve most tasks. Understand limits.
- Achieved by: Regular use over months/years. Reading API reference and/or source code.
---
## Altair Assignment
<!-- walk through of assignment setup & how it'll be graded -->