# 30239: Data Visualization for Policy Analysis

## James Turk

---

## Today

- What is the value of data visualization?
- Focus of this course
- Course Logistics

---

![infographic: a day in data](a-day-in-data.jpg)

---

![data.gov](datagov.png)

---

![taxis](taxis.png)

---

OK, there is **a lot** of data, but isn't that a **good thing**?

---

"What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it."

~Herb Simon _as quoted by Hal Varian. Scientific American, September 1995_

---

## What is Visualization?

“... finding the **artificial memory** that best supports our natural means of perception.” _[Bertin 1967]_

“Transformation of the symbolic into the geometric” _[McCormick et al. 1987]_

“The use of computer-generated, interactive, visual representations of data to **amplify cognition**.” _[Card, Mackinlay, & Shneiderman 1999]_

---

## Information Visualization

"The use of computer-generated, interactive, visual representations of (abstract) data to **amplify cognition**."

What does it mean to amplify cognition?

Why do we often seek visualizations?

---

## Why do we create visualizations?

- What visualizations have you created?
- What visualizations have you seen that you remember?

---

## Reasons for Data Visualization

- Means of reasoning about large quantities without reduction/over-simplification.
- Assist in gaining unique insights into data: clustering, correlation, trends, etc.
- Deepen understanding, for ourselves or others.
---

### Four Samples

| x1   | y1    | x2   | y2   | x3   | y3    | x4   | y4    |
| ---- | ----- | ---- | ---- | ---- | ----- | ---- | ----- |
| 10.0 | 8.04  | 10.0 | 9.14 | 10.0 | 7.46  | 8.0  | 6.58  |
| 8.0  | 6.95  | 8.0  | 8.14 | 8.0  | 6.77  | 8.0  | 5.76  |
| 13.0 | 7.58  | 13.0 | 8.74 | 13.0 | 12.74 | 8.0  | 7.71  |
| 9.0  | 8.81  | 9.0  | 8.77 | 9.0  | 7.11  | 8.0  | 8.84  |
| 11.0 | 8.33  | 11.0 | 9.26 | 11.0 | 7.81  | 8.0  | 8.47  |
| 14.0 | 9.96  | 14.0 | 8.10 | 14.0 | 8.84  | 8.0  | 7.04  |
| 6.0  | 7.24  | 6.0  | 6.13 | 6.0  | 6.08  | 8.0  | 5.25  |
| 4.0  | 4.26  | 4.0  | 3.10 | 4.0  | 5.39  | 19.0 | 12.50 |
| 12.0 | 10.84 | 12.0 | 9.13 | 12.0 | 8.15  | 8.0  | 5.56  |
| 7.0  | 4.82  | 7.0  | 7.26 | 7.0  | 6.42  | 8.0  | 7.91  |
| 5.0  | 5.68  | 5.0  | 4.74 | 5.0  | 5.73  | 8.0  | 6.89  |

What sense can we make of this?

---

|                        | Sample 1          | Sample 2          | Sample 3          | Sample 4          |
| ---------------------- | ----------------- | ----------------- | ----------------- | ----------------- |
| Mean of x              | 9                 | 9                 | 9                 | 9                 |
| Variance of x          | 11                | 11                | 11                | 11                |
| Mean of y              | 7.50              | 7.50              | 7.50              | 7.50              |
| Variance of y (±0.003) | 4.125             | 4.125             | 4.125             | 4.125             |
| Correlation x & y      | 0.816             | 0.816             | 0.816             | 0.816             |
| Linear Regression      | y = 3.00 + 0.500x | y = 3.00 + 0.500x | y = 3.00 + 0.500x | y = 3.00 + 0.500x |
| R² coefficient         | 0.67              | 0.67              | 0.67              | 0.67              |

---

![Anscombe's quartet](anscome.svg)

---

Our understanding of the data is enhanced by these visualizations in a way summary statistics won't capture.

What things are easier to see?

---

## Benefits of Visualizing Data

- outliers
- "shape of data"
- clusters
- _intuition_ and _questions_
- "why is it like that?"
- tap into human capacity for pattern recognition
- often easier to test theories or models

_exploratory visualization_

---

## Explanatory Visualization

- Highlight interesting findings
- Tell a story
- Present a thesis
- Persuade
- Support larger story/argument
- Inspire

---

## Visualizations and Policy

Visualizations have an outsized influence on human cognition; we seem to trust images more than words.

This means that visualizations can be used to **persuade**, but also to **mislead**.

Misleading does not require intent; we can easily deceive ourselves without realizing it.

---

![](rockets_chart.png)

---

![bg](challenger.jpg)
![bg](feynman.jpg)

---

![](cholera.jpg)

---

## Data Visualization for Policy Analysis

---

## Course Goals

- Understand & appreciate what makes a good data visualization.
- Learn practical visualization techniques that will apply in any language & library.
- Build a portfolio of static & interactive visualizations using real-world policy data.
- Gain exposure to useful libraries in Python and JavaScript.

---

## Topics

- Fundamentals, Grammar of Graphics
- Design principles. How to use color, human perception, chart design.
- How to evaluate and critique visualizations.
- Uncertainty & Narrative
- HTML/CSS/JS overview
- D3.js
- Interactive & non-chart data visualization.
- Geospatial visualization
- Special Topics: to be discussed

---

## Programming Workload

Key Idea: You get better at visualizations by making *a lot* of them. (20-30 this quarter)

- Expect to write code every week, mostly fairly short Python functions.
- You will be learning at least one library (Altair) mostly independently.
- You'll also need to be comfortable with `pandas` or `polars`.

### JavaScript and D3

*"You aren't going to make them learn D3 are you?"*

- 1 Assignment
- 1-2 Lectures

The final project will have a place where D3 will be helpful, but other options will be presented.
After the introductory lecture, some examples will continue to be in D3, but you will not need to understand their inner workings.

---

## Course Structure

- **Lecture and Discussion**: Introduce and explore key concepts, mostly focused on theory.
- **Readings**: Supplement course materials with more examples, technical tutorials.
- **2 "minor" assignments (Altair & D3)**: Gain practice with commonly used tools in a structured setting.
- **Projects**: Learn to explore a topic on your own from conception to practice. Leave here with a portfolio you can show off.

---

## Course Staff

- James Turk
- Krisha Mehta
- Sam Huang

**All official information will be on the course site and/or Ed.**

---

## Projects

|                | Static | Interactive |
| -------------- | ------ | ----------- |
| Proposal       | Week 1 | Week 6      |
| Feedback Draft | Week 3 | Week 8      |
| Peer Critique  | Week 4 | Week 9      |
| Final          | Week 5 | Week 10     |

## Practice Assignments

- Altair (Week 2)
- D3 (Week 6)

---

## Grading

15 SNU grades

**Completion:** 9 (2x Proposals, Drafts & Critiques + 2 Labs + 1 Participation)

**Quality Grades:** 6 (2x Design, Narrative, Code Quality)

---

## AI Policy

In this course, all usage of generative AI must be *fully cited*.

Details on specific rules:

You are, as always, expected to turn in **your own work**, not the work of an LLM.

---

## Static Visualization Project

**End Product**

Using real data of your choosing:

- 8-12 distinct images, of at least 5 different types.
- Presented as part of a **narrative**: an article, infographic, poster.
- Cohesive visual design: custom theme for graphs, matching colors and fonts with supplementary material.

Use of Altair is **strongly** recommended, but other libraries are allowed.

---

## Examples

---

## Milestones

- Milestone 1 (Week 1): Draft proposal.
- Milestone 2 (Week 3): Draft of 8 visualizations for review & critique.
- Milestone 3 (Week 4): Peer Critique
- Milestone 4 (Week 5): Final Deliverable

---

## Acknowledgements & References

Thanks to Alex Hale, Andrew McNutt, and Jessica Hullman for sharing their materials.

[Why Is Data Visualization Important? What Is Important in Data Visualization?](https://hdsr.mitpress.mit.edu/pub/zok97i7p/release/4) - Antony Unwin
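
---

## Appendix: Checking the Four Samples

The summary statistics shown for the four samples (Anscombe's quartet) can be checked by hand. Below is a minimal sketch in plain Python, assuming the values are exactly those in the "Four Samples" table and that "variance" means the sample variance (dividing by n - 1). The names `SAMPLES`, `summarize`, and `SUMMARIES` are illustrative and not part of the course materials.

```python
from math import sqrt

# Values copied from the "Four Samples" table; samples 1-3 share the same x column.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X4 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]

SAMPLES = {
    "Sample 1": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Sample 2": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Sample 3": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Sample 4": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics from the slide for one (x, y) sample."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Ordinary least-squares fit of y = intercept + slope * x.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson correlation
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance (assumption: n - 1 divisor)
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": r,
        "slope": slope,
        "intercept": intercept,
        "r2": r ** 2,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in SAMPLES.items()}

# Print one line of rounded statistics per sample.
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four samples come out nearly identical on every statistic, which is why the differences only become visible once the points are plotted.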