# Introduction to Altair

## Before we start: Jupyter Notebooks

- Made up of "cells", which can be Python, Markdown, or (with extensions) other languages.
- Cells execute individually and can be run out of order. Always arrange cells in a logical order, and use comments and/or markdown cells to make your notebook a narrative.
- Tip: Before submitting a notebook, restart the kernel and run all cells in order. (See the Kernel menu for options.) This confirms that you didn't inadvertently break things.
- Not particularly Git-friendly, since the contents of the notebook are stored as JSON. (Still keep them in Git! Even if suboptimal, it's better than losing work!)
- An alternative is `marimo` notebooks, which fix a few of the above issues. They are still new and have some rough edges, so I'll be sticking with Jupyter.

`uv run jupyter lab` or `uv run jupyter notebook` will start the local notebook server (Lab is a newer UI, Notebook is the traditional one). You may opt to use VS Code, but I personally find its interface for notebooks more confusing than helpful.
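
To illustrate the point above about notebooks being stored as JSON, here is a minimal sketch using only the standard library. The `notebook` dictionary is a hypothetical, hand-written example of the notebook file layout, and `strip_outputs` is an illustrative helper, not part of Jupyter. Clearing outputs and execution counts before committing is one way to make diffs smaller.

```python
import json

# Hypothetical, hand-written notebook: a list of cells plus metadata.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Introduction to Altair"]},
        {
            "cell_type": "code",
            "execution_count": 18,
            "metadata": {},
            "source": ["import altair as alt"],
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hello\n"]}],
        },
    ],
}


def strip_outputs(nb: dict) -> dict:
    """Return a copy of the notebook with outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(nb))  # simple deep copy via a JSON round trip
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


cleaned = strip_outputs(notebook)
print(json.dumps(cleaned, indent=1))
```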

### Style Addendum (for all assignments in this course)

- Notebook style imports allowed/preferred.
- Limited use of global variables permitted with useful names and comments.
- Notebooks must execute sequentially!

## Altair expects "tidy" data

Altair expects our data to be [tidy](http://vita.had.co.nz/papers/tidy-data.html).

- Each variable is a column.
- Each observation is a row.
- Each type of observational unit is a table.

You may use `pandas` or `polars` DataFrames.
Class examples will focus on pre-cleaned data. I will use polars in examples, though what I show will rarely differ significantly in pandas.

In [18]:
# best to follow convention, "notebook" style imports are allowed/preferred
import marimo as mo
import altair as alt
import polars as pl
from pathlib import Path

In [22]:
# Also OK, limited use of global variables.
#
# Avoid `df` though because that is a common parameter name and not helpful in a global context.
# Also to avoid: `df2`, `df_pure`, `df_final_final_final_final_v4_v9`.

# first let's load and look at a dataframe with three columns
#  there is an observation for each state legislature, showing how many bills they introduced in a given year
bills = pl.read_csv("midwest_bills.csv")

# Having a dataframe or chart variable as the last line in a notebook cell will automatically display it.
#  Use sparingly, and typically only for final chart in your own work, but in tutorials I will show intermediate steps.
bills

| state | session_start_year | num_bills |
| --- | --- | --- |
| str | i64 | i64 |
| "IL" | 2017 | 13616 |
| "IL" | 2019 | 12760 |
| "IL" | 2021 | 12847 |
| "IL" | 2023 | 11951 |
| "MI" | 2017 | 4818 |
| … | … | … |
| "MI" | 2023 | 3424 |
| "WI" | 2017 | 1820 |
| "WI" | 2019 | 2264 |
| "WI" | 2021 | 2618 |


In [4]:
# Let's make our own charts of this data. First we bind the data to a new chart object.
# (We pass `bills`, the dataframe loaded above.)
chart = alt.Chart(bills)

In [8]:
# we add a geometry, we'll start with a point (at this point *something* can be displayed, but it won't be useful)
chart.mark_point()

In [29]:
# Example 1
# We use encodings to map our data to particular dimensions.
# Altair will then make appropriate choices based upon the type of data.

chart.mark_point().encode(
    y="state",
    x="num_bills"
)

In [10]:
# Example 2 - what happens when we try to add color?
chart.mark_point().encode(
    alt.Y("state"),
    alt.X("num_bills"),
    alt.Color("session_start_year"),
)

In [32]:
# Example 3
# The prior example treated year as Quantitative because it was numeric;
# for this data we would rather treat it as Nominal.
# We can use :Q, :O, :N, :T to mark the type that should be used
# (Quantitative, Ordinal, Nominal, Temporal).

by_year = chart.mark_point().encode(
    alt.Y("state:N"),
    alt.X("num_bills:Q"),
    alt.Color("session_start_year:N"),
)

# we're saving this one for later, so repeat the variable name to see it
by_year

In [34]:
# Example 4
# Here we make a different chart from the same base data 
# by re-using our `chart` variable.
#
# We choose a different shape (parameters that don't need to vary can be passed into the mark_* functions)
# We also use an aggregate function average(num_bills)

avgs = chart.mark_point(shape="wedge").encode(
    alt.Y("state:N"),
    alt.X("average(num_bills)"),
)
avgs

In [36]:
# Example 5
# We can combine compatible charts by using `+` to layer them.
# There are other operators you'll encounter that allow placing charts side by side.
by_year + avgs

In [41]:
# Example 6
# perhaps we don't want to use mark_point anymore, maybe a bar?
bar_avgs = chart.mark_bar(color="#ccc").encode(
    alt.Y("state"),
    alt.X("average(num_bills)"),
)
bar_avgs + by_year

In [42]:
# Example 7
# We can customize titles and other details by using `.title` and `.properties`
# the latter sets chart-wide properties.

final = chart.mark_point(shape="diamond").encode(
    alt.Y("state:N"),
    alt.X("num_bills:Q"),
    alt.Color("session_start_year:N").title("Session Year"),
) + chart.mark_bar(color="#70905050").encode(
    alt.Y("state"),
    alt.X("average(num_bills)").title("Number of Bills Introduced"),
)
final.properties(
    title='Midwest Bills by State',
    background='#f5f5dc'
)

## Exercise

Let's say we instead want to see if there are trends by year.
Try and create a new chart that:
- has year on X axis, and bills on Y
- is print & colorblind friendly (using multiple channels for encoding state)
- uses a custom color scale, not the default (to avoid confusion with the colors used in our earlier chart)

Try completing this below, before continuing on to look at my version.

In [43]:
# YOUR SOLUTION HERE

### Altair User Guide / Next Steps

See <https://capp30239.netlify.app/readings/> for details.

You'll want to work your way through the recommended material at your own pace, which will help you learn the material much better than me reading library documentation to you would. :)

**You aren't on your own**, we're here to help -- start with documentation and ask questions on Ed, in class, and in office hours.

Once you've read the guide and worked through the assignment you will have the core ideas of Altair.

The next steps towards mastery are using it a lot and asking questions to deepen understanding, *not leaning on GenAI too much*.

You're reading a fairly small subset of the library's documentation. 
The remaining sections are useful as reference, and as you use Altair you will find your way to them as you ask yourself questions like "how do I work with geospatial data?" or "how can I combine these axes?"

It's likely the most common thing you will use the documentation for is "what arguments can I pass to this?"

For that, use the [API Reference](https://altair-viz.github.io/user_guide/api.html) and find the class you're working with.

Example:

- Let's say we want to adjust the color scheme. Start with <https://altair-viz.github.io/user_guide/generated/channels/altair.Color.html>.
- Note that it can take a scale, and click through to <https://altair-viz.github.io/user_guide/generated/core/altair.Scale.html#altair.Scale>.
- Before long we find ourselves in the Vega-Lite documentation, here: <https://vega.github.io/vega-lite/docs/scale.html#scheme>. The Vega-Lite documentation can be very helpful for understanding the options that are available, since Altair is an interface to them.

In [54]:

# one possible solution to the exercise

color_scheme = alt.Scale(scheme="set2")
chart.mark_line().encode(
    alt.Y("num_bills"),
    alt.X("session_start_year:N"),
    alt.Color("state", scale=color_scheme),
) + chart.mark_point().encode(
    alt.Y("num_bills").title("Bills Introduced"),
    alt.X("session_start_year:N").title("Session Year"),
    alt.Color("state", scale=color_scheme),
    alt.Shape("state"),
).properties(
    title='Midwest Bills by Session',
)