switch to Jupyter, marimo was having strange issues

2024-09-26 15:43:02 -05:00 · 2024-09-26 15:43:02 -05:00 · cd42f56eae
commit cd42f56eae
parent d795795b89
8 changed files with 1253 additions and 245 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,2 @@
 *.pyc
+.ipynb_checkpoints
--- a/00.value-of-dataviz/slides.md
+++ b/00.value-of-dataviz/slides.md
@@ -233,7 +233,7 @@ Final project will have a place where D3 will be helpful, but other options will
 After introductory lecture, some examples will continue to be in D3, but you will not need to understand their inner workings.

 <!--
-It has however, become a "library's library" in some ways. Most developers interact with d3 through a higher-level interface. 
+It has, however, become a "library's library" in some ways. Most developers interact with D3 through a higher-level interface.

 We will be learning Altair, which generates Vega JSON, which in turn is drawn using D3.

@@ -260,9 +260,10 @@ So, if you are here to learn visualization, I think that it is fair that you can
 ## Course Staff

 - James Turk
-- TODO
+- Krisha Mehta
+- Sam Huang

-**All official information will be on the course site and/or Ed as appropriate.**
+**All official information will be on the course site and/or Ed.**


 ---
--- a/01.gog-altair/altair-notebook.py
+++ b/01.gog-altair/altair-notebook.py
@@ -1,226 +0,0 @@
-import marimo
-
-__generated_with = "0.8.20"
-app = marimo.App(width="medium")
-
-
-@app.cell
-def __():
-    import marimo as mo
-    import altair as alt
-    import polars as pl
-    from pathlib import Path
-    return Path, alt, mo, pl
-
-
-@app.cell
-def __(mo):
-    mo.md(
-        """
-        ## Tidy Data
-
-        Altair expects our data to be [tidy](http://vita.had.co.nz/papers/tidy-data.html).
-
-        - Each variable is a column.
-        - Each observation is a row. 
-        - Each type of observational unit is a table.
-
-        You can use `pandas` or `polars` DataFrames.
-        """
-    )
-    return
-
-
-@app.cell
-def __(Path, __file__, pl):
-    # first let's load and look at a dataframe with three columns
-    #  there is an observation for each state legislature, showing how many bills they introduced in a given year
-    df = pl.read_csv(Path(__file__).parent / "midwest_bills.csv")
-    # (having a dataframe or chart as the last line in a notebook cell will automatically display it)
-    df
-    return (df,)
-
-
-@app.cell
-def __(alt, df):
-    # Let's make our own charts of this data. First we bind the data to a new chart object.
-    chart = alt.Chart(df)
-    return (chart,)
-
-
-@app.cell
-def __(chart):
-    # we add a geometry, we'll start with a point (at this point *something* can be displayed, but it won't be useful)
-    chart.mark_point()
-    return
-
-
-@app.cell
-def __(chart):
-    # We use encodings to map our data to particular dimensions.
-    # Altair will then make appropriate choices based upon the type of data.
-    chart.mark_point().encode(
-        y="state",
-        x="num_bills"
-    )
-    return
-
-
-@app.cell
-def __():
-    return
-
-
-@app.cell
-def __(alt, chart):
-    # what happens when we try to add color?
-    chart.mark_point().encode(
-        alt.Y("state"),
-        alt.X("num_bills"),
-        alt.Color("session_start_year"),
-    )
-    return
-
-
-@app.cell
-def __(alt, chart):
-    # the prior example treated year as Quantitative because it was numeric
-    # instead we would treat it as Nominal for this data
-    # we can use :Q, :O, :N, :T to mark the type that should be used
-    by_year = chart.mark_point().encode(
-        alt.Y("state:N"),
-        alt.X("num_bills:Q"),
-        alt.Color("session_start_year:N"),
-    )
-    # we're saving this one for later
-    by_year
-    return (by_year,)
-
-
-@app.cell
-def __(alt, chart):
-    # Here we make a different chart from the same base data 
-    # by re-using our `chart` variable.
-    #
-    # We choose a different shape (parameters that don't need to vary can be passed into the mark_* functions)
-    # We also use an aggregate function average(num_bills)
-    avgs = chart.mark_point(shape="wedge", color="black").encode(
-        alt.Y("state"),
-        alt.X("average(num_bills)"),
-    )
-    avgs
-    return (avgs,)
-
-
-@app.cell
-def __(avgs, by_year):
-    # two charts with compatible data can be layered with +
-    by_year + avgs
-    return
-
-
-@app.cell
-def __(alt, by_year, chart):
-    # perhaps we don't want to use mark_point anymore, maybe a bar?
-    bar_avgs = chart.mark_bar(color="#ccc").encode(
-        alt.Y("state"),
-        alt.X("average(num_bills)"),
-    )
-    bar_avgs + by_year
-    return (bar_avgs,)
-
-
-@app.cell
-def __(alt, chart):
-    # We can customize titles and other details by using `.title` and `.properties`
-    # the latter sets chart-wide properties.
-    final = chart.mark_point(shape="diamond").encode(
-        alt.Y("state:N"),
-        alt.X("num_bills:Q"),
-        alt.Color("session_start_year:N").title("Session Year"),
-    ) + chart.mark_bar(color="#70905050").encode(
-        alt.Y("state"),
-        alt.X("average(num_bills)").title("Number of Bills Introduced"),
-    )
-    final.properties(
-        title='Midwest Bills by State',
-        background='#f5f5dc'
-    )
-    return (final,)
-
-
-@app.cell
-def __(alt, chart):
-    # Let's say we instead want to see if there are trends by year.
-    # create a new chart object with year on the X-axis, and bills on the Y-axis
-    # Also, make the chart print/colorblind friendly by encoding state in multiple ways.
-    new_chart = chart.mark_point().encode(
-        alt.Y("num_bills"),
-        alt.X("session_start_year:N"),
-        alt.Color("state"),
-        alt.Shape("state"),
-    )
-    new_chart.properties(
-        title='Midwest Bills by Year',
-        background='#f5f5dc'
-    )
-    return (new_chart,)
-
-
-@app.cell
-def __(mo):
-    mo.md(
-        """
-        ### Recommended Reading
-
-        Altair Tutorial
-
-        - Specifying Data (you can stop when you hit 'Generated Data')
-        - Encodings
-        - Encodings -> Channels (skim Channel Options)
-        - Marks (skim a few of the mark guides, including Bar & Point)
-        - Data Transformations (skim a few, including Regression)
-        - Layered and Multi-View Charts
-        - Customizing Visualizations
-
-        Once you've read the above you have the core ideas of Altair.
-        The remaining sections are useful as reference, and as you use Altair you will find your way to them as you ask yourself questions like "how do I work with geospatial data" or "how can I combine these axes"?
-
-        The other common thing you will use the documentation for is "what arguments can I pass to this?"
-
-        For that, use the [API Reference](https://altair-viz.github.io/user_guide/api.html) and find the class you're working with.
-
-        Example: 
-
-        - Let's say we want to adjust the color scheme, start with <https://altair-viz.github.io/user_guide/generated/channels/altair.Color.html>
-        - Note that it can take a scale, and click to <https://altair-viz.github.io/user_guide/generated/core/altair.Scale.html#altair.Scale>
-        """
-    )
-    return
-
-
-@app.cell
-def __(alt, chart):
-    color_scheme = alt.Scale(scheme="set2")
-    chart.mark_line().encode(
-        alt.Y("num_bills"),
-        alt.X("session_start_year:N"),
-        alt.Color("state", scale=color_scheme),
-    ) + chart.mark_point().encode(
-        alt.Y("num_bills").title("Bills Introduced"),
-        alt.X("session_start_year:N").title("Session Year"),
-        alt.Color("state", scale=color_scheme),
-        alt.Shape("state"),
-    ).properties(
-        title='Midwest Bills by Session',
-    )
-    return (color_scheme,)
-
-
-@app.cell
-def __():
-    return
-
-
-if __name__ == "__main__":
-    app.run()
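
The first markdown cell of the removed notebook describes tidy data (each variable a column, each observation a row). As a minimal stdlib-only sketch of that idea, the snippet below reshapes a wide table (one column per year) into tidy rows. The column names `state`, `session_start_year`, and `num_bills` match the encodings used in the notebook; the wide layout, the `to_tidy` helper, and the numbers are invented for illustration and are not taken from `midwest_bills.csv`.

```python
# Hypothetical wide-format rows: one column per session year.
# The values are made up for illustration only.
wide_rows = [
    {"state": "IL", "2021": 100, "2022": 120},
    {"state": "WI", "2021": 80, "2022": 90},
]


def to_tidy(rows, id_column="state"):
    """Turn each (state, year) cell into its own observation row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_column:
                continue  # the identifier is copied into every output row
            tidy.append(
                {
                    id_column: row[id_column],
                    "session_start_year": int(key),
                    "num_bills": value,
                }
            )
    return tidy


tidy_rows = to_tidy(wide_rows)
for observation in tidy_rows:
    print(observation)
```

Each resulting dictionary is one observation, which is the shape that encodings such as `alt.X("num_bills:Q")` and `alt.Color("session_start_year:N")` expect.
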
--- a/01.gog-altair/altair.ipynb
+++ b/01.gog-altair/altair.ipynb
--- a/01.gog-altair/slides.html
+++ b/01.gog-altair/slides.html
--- a/01.gog-altair/slides.md
+++ b/01.gog-altair/slides.md
@@ -135,10 +135,80 @@ Possible Exceptions?

 ---

-## Altair's Grammar
+## Altair

-Altair condenses several of the different pieces of the grammar to _"encoding channels"_.
+Altair is a Python visualization library that allows us to work from a grammar of graphics perspective.

-We've seen X, Y, and color, let's take a look at some examples of other encoding channels.
+It is also very flexible in its output formats, which will be useful if you want to modify your graphics or make them interactive.
+
+Altair is built on top of **Vega-Lite**.
+
+Vega-Lite is a system that represents graphics in a JSON schema, and a set of tools that convert these JSON representations to images or interactive graphics.

 ---
+
+## Vega-Lite Example
+
+```json
+{
+  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
+  "description": "A scatterplot showing horsepower and miles per gallons for various cars.",
+  "data": {"url": "data/cars.json"},
+  "mark": "point",
+  "encoding": {
+    "x": {"field": "Horsepower", "type": "quantitative"},
+    "y": {"field": "Miles_per_Gallon", "type": "quantitative"}
+  }
+}
+```
+Vega-Lite condenses several of the different pieces of the grammar into _"encoding channels"_.
+
+---
+
+![](vega.png)
+
+---
+
+## Altair
+
+```python
+import pandas as pd
+import altair as alt
+
+df = pd.read_csv("cars.csv")
+alt.Chart(df).mark_point().encode(
+  alt.Y("Miles_per_Gallon:Q").title("Miles Per Gallon"),  # longer form w/ customization
+  x="Horsepower:Q",        # shorthand for simple features (keyword args must come last)
+)
+```
+
+Altair is a Pythonic wrapper to create Vega-Lite JSON.  If you use it in a notebook, the resulting graphs will render inline.
+
+
+---
+
+## Altair Notebook
+
+<!-- at this point, see the Jupyter notebook (altair.ipynb) in this directory -->
+
+--- 
+
+## Learning Altair
+
+To master a library like Altair, you'll go through the following phases:
+
+1. Learn the key concepts.
+    - Goal: Understand how the authors of Altair think about visualization.
+    - Achieved by: Reading user guide & watching tutorials.
+2. Internalize concepts & API.
+    - Goal: Be able to do common tasks without referring to documentation. (You'll always lean on documentation for specifics.)
+    - Achieved by: Working on assignments & experimentation. Reading API reference as needed.
+3. Mastery (not this quarter!)
+    - Goal: Be able to manipulate the library to achieve most tasks. Understand its limits.
+    - Achieved by: Regular use over months/years. Reading API reference and/or source code.
+
+---
+
+## Altair Assignment
+
+<!-- walk through of assignment setup & how it'll be graded -->
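
The slides in this hunk describe Altair as a Pythonic wrapper that produces Vega-Lite JSON. As a rough sketch of what that means (not how Altair is implemented), the spec from the "Vega-Lite Example" slide can be built as a plain Python dictionary and serialized with the standard library. The `encode_channel` helper is hypothetical and is not part of Altair or Vega-Lite, and the extra `color` channel on an `Origin` field is an assumption added for illustration.

```python
import json


def encode_channel(field, type_):
    """Hypothetical helper: build one Vega-Lite encoding-channel definition."""
    return {"field": field, "type": type_}


spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"url": "data/cars.json"},
    "mark": "point",
    "encoding": {
        "x": encode_channel("Horsepower", "quantitative"),
        "y": encode_channel("Miles_per_Gallon", "quantitative"),
        # Assumed extra channel: color points by a nominal field.
        "color": encode_channel("Origin", "nominal"),
    },
}

# Serialize to the JSON text that a Vega-Lite renderer would consume.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

Every visual property here (position on each axis, color) is expressed the same way, as an entry under `"encoding"`, which is the sense in which the grammar is condensed into encoding channels.
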
--- a/01.gog-altair/vega.png
+++ b/01.gog-altair/vega.png
--- a/README.md
+++ b/README.md
@@ -14,19 +14,20 @@ Inside each directory, you're likely to find:

 - `slides.md` - My slides in raw markdown.
 - `slides.html` - My slides converted to a presentation. (using [`marp`](https://marpit.marp.app)) You can open this in your web browser (Type `open slides.html` from the command line.)
-- `*-notebook.py` - These are marimo notebooks (see below).
+- `*.ipynb` - These are Jupyter notebooks (see below).

-Not every week will have slides & a notebook, but one or the other should generally exist.
+Not every week will have both slides & a notebook.

 Other files, such as images & data will be kept in the appropriate folder.

-### Marimo Notebooks
+### Jupyter Notebooks

-Marimo notebooks are similar to Jupyter notebooks, but work much better with Git and have some other nice features I appreciate.
+You have a few options for working with `.ipynb` notebooks:

-If you have ever looked at a Jupyter notebook file (.ipynb) in an editor, you know they are large JSON files, and once they are checked into Git changes become very difficult to track.
+- `uv run jupyter lab` - the newer UI; starts a server
+- `uv run jupyter notebook` - the older UI, perfectly functional still
+- VS Code will open these in its own editor

-To interact with a notebook, run:
-
-`uv run marimo edit <notebook-file>`
+If you run one of the `uv run` options, you'll need to navigate to the .ipynb file in the window that opens in your browser.

+**Note:** To stop a server, press `Ctrl-C`, then answer `y` at the prompt.