diff --git a/14.misc/mosaic.jpg b/14.misc/mosaic.jpg new file mode 100644 index 0000000..f481c18 Binary files /dev/null and b/14.misc/mosaic.jpg differ diff --git a/14.misc/plotly-dash.png b/14.misc/plotly-dash.png new file mode 100644 index 0000000..94f894b Binary files /dev/null and b/14.misc/plotly-dash.png differ diff --git a/14.misc/slides.html b/14.misc/slides.html new file mode 100644 index 0000000..b6a53c8 --- /dev/null +++ b/14.misc/slides.html @@ -0,0 +1,227 @@ +
+

Data Visualization for Public Policy

+
+
+

This Week

+
    +
  • Project Questions / Deployment
  • +
  • Code Quality & Style
  • +
  • Dashboards
  • +
  • Visual Style Guides
  • +
  • 100 Visualizations
  • +
  • Animation & Interaction
  • +
+
+
+

Code Quality

+

When you are writing a data pipeline or application, code quality is of high importance.

+
    +
  • Readability & Documentation => easier maintenance & bugs prevented.
  • +
  • Small speed-ups from better algorithm/data structure choices can make big differences when that task executes millions of times.
  • +
  • Test coverage makes refactoring easier, prevents regressions.
  • +
+
+
+

Unique Considerations for Data Viz

+
    +
  • Typically little to no ongoing reuse/maintenance.
  • +
  • Visualization itself unlikely to be performance bottleneck compared to data manipulation.
  • +
  • Focus is on immediate visual output, testing de-emphasized.
  • +
  • Often written by solo developer, even in larger organizations.
  • +
+

Code quality still matters, but your main goal should be code that you can trust is correct. Testing, documentation, and the "right way" are less essential.

+
+
+

Dashboards

+ +

Long-lived data visualizations that typically run against a central repository of data.

+

You can use the same techniques & tools, or custom dashboard-focused tools like Tableau or Dash.

+

Key difference: You will likely need some degree of dynamic refresh (instead of loading a static CSV/JSON file, load data from a DB/API). This comes with caching and other performance considerations.

+
+
+

Dashboard Pseudocode

+
every interval {
+    data = update()
+    visualize(data)
+}
+
+

Can make use of animation to provide context:

+
    +
  • scrolling time series
  • +
  • animated dials to show directional changes
  • +
+
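The pseudocode above could be sketched in JavaScript with `setInterval`. This is a minimal sketch, not a definitive implementation: `startDashboard`, `update`, and `visualize` are hypothetical names, and a real dashboard would likely fetch data asynchronously and cache results.

```javascript
// Hypothetical helper: refresh a dashboard on a fixed interval.
// `update` returns fresh data; `visualize` redraws with it.
function startDashboard(update, visualize, everyMS) {
  const refresh = () => visualize(update());
  refresh(); // draw once immediately so the page is not blank
  const intervalId = setInterval(refresh, everyMS);
  // return a function that stops future refreshes
  return () => clearInterval(intervalId);
}

// Usage with stand-in functions (assumptions, not a real data source).
let errorCount = 0;
const drawn = [];
const stop = startDashboard(
  () => ({ errors: ++errorCount }), // pretend to query a DB/API
  (data) => drawn.push(data.errors), // pretend to redraw a chart
  60000
);
stop(); // stop refreshing, e.g. when the user leaves the page
```

Returning a stop function keeps the interval ID private, so callers cannot forget which ID to pass to `clearInterval`.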
+
+

Are Dashboards Bad?

+

Dashboards saw a surge in popularity a decade or so ago, and there are now plenty of bad dashboards out there.

+

Golden rule of dashboards: answer a question & make them actionable.

+

Too often people just throw all their data on a dashboard.

+

OK, I can see that 6 errors occurred in the last 24 hours...

+
    +
  • Is that a lot? Show trends where appropriate!
  • +
  • What can I do? Provide links/action items!
  • +
+

Without this focus, dashboards become decorations.

+
+
+

Style Guide

+

It can be helpful to create or build from a style guide, even for your own work.

+

Examples:

+ +
+
+

Key Elements

+

Typography

+

Select 2 complementary fonts:

+
    +
  • Prefer a very legible sans-serif font for data/axes labels.
  • +
  • Any legible font for chart titles/narrative/etc.
  • +
+

Color Selection

+

Best to have:

+
    +
  • Nominal data: Distinct, contrasting hues
  • +
  • Quantitative data: Linear or divergent gradients
  • +
  • Consider color-blindness and accessibility
  • +
+
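One way to check the accessibility point above is the WCAG 2.x contrast ratio between two colors. The sketch below assumes colors are given as `[r, g, b]` arrays with channels in 0-255; the function names are illustrative, not from any particular library.

```javascript
// Relative luminance of an sRGB color, following the WCAG 2.x definition.
function relativeLuminance([r, g, b]) {
  const [R, G, B] = [r, g, b].map((v) => {
    const c = v / 255;
    // linearize the gamma-encoded channel
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * R + 0.7152 * G + 0.0722 * B;
}

// Contrast ratio ranges from 1 (identical colors) to 21 (black on white).
function contrastRatio(colorA, colorB) {
  const la = relativeLuminance(colorA);
  const lb = relativeLuminance(colorB);
  const [lighter, darker] = la >= lb ? [la, lb] : [lb, la];
  return (lighter + 0.05) / (darker + 0.05);
}

const ratio = contrastRatio([0, 0, 0], [255, 255, 255]);
```

WCAG level AA asks for a ratio of at least 4.5:1 for normal-size text, which is a useful floor for axis labels and annotations.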
+
+

Style Guide: Chart Selection

+
    +
  • Match chart type to data characteristics and audience.
  • +
  • Consider: +
      +
    • Data dimensionality
    • +
    • Comparison needs
    • +
    • Narrative goals
    • +
    +
  • +
+
+
+

Creativity: 1 Dataset 100 Visualizations

+

https://100.datavizproject.com

+
+
+

Applications of Animation

+
    +
  • Demonstrate change over time: Data being added to chart as time "plays."
  • +
  • Highlight relationships: Hover/highlight/select modifies display of other data on page.
  • +
  • Focus attention: Show subsets of data at a time.
  • +
  • Show uncertainty: "wiggle", shifting trend line (next page)
  • +
+

More Examples:

+ +
+
+
+

Applications of Interaction

+
    +
  • Enable user-driven exploration of data. +
      +
    • "How do these two variables compare?"
    • +
    • "What happens if this price increases?"
    • +
    +
  • +
  • Allow personalization (e.g. enter your zip code) +
      +
    • "What is this like in my city?"
    • +
    +
  • +
  • Increased engagement/retention. Lots of evidence showing we learn best by participating.
  • +
+
+
+

JS setInterval

+
// will call `func` every `everyMS`
+let intervalId = setInterval(func, everyMS) 
+
+// stop calling func
+clearInterval(intervalId)
+
+

https://developer.mozilla.org/en-US/docs/Web/API/Window/setInterval

+
+
+

Interaction: Making Data Selections

+

For user-driven data explorations, selection is an important concept.

+

How do you want to let a user select individual records or groups of records?

+

Selection Spectrum: Simple to Complex

+
    +
  • Menu/Select Box
  • +
  • Hover/click on items on page (tooltips, etc.)
  • +
  • Drag/Region selection
  • +
  • Pre-written SQL queries with dropdowns/selects. (Common on dashboards.)
  • +
  • Allow user to write queries themselves in SQL or a custom query language. Common on advanced dashboards.
  • +
+
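As an illustration of drag/region selection, the sketch below filters records to those inside a rectangular brush. The record shape (`x`, `y`) and the brush shape (`x0`, `y0`, `x1`, `y1`) are assumptions made for this example; libraries such as D3 and Altair provide their own selection mechanisms.

```javascript
// Return the records whose (x, y) falls inside a rectangular brush.
// The brush corners may arrive in any order (e.g. dragging right-to-left),
// so normalize them to min/max first.
function selectInRegion(records, brush) {
  const xMin = Math.min(brush.x0, brush.x1);
  const xMax = Math.max(brush.x0, brush.x1);
  const yMin = Math.min(brush.y0, brush.y1);
  const yMax = Math.max(brush.y0, brush.y1);
  return records.filter(
    (r) => r.x >= xMin && r.x <= xMax && r.y >= yMin && r.y <= yMax
  );
}

const points = [
  { id: "a", x: 1, y: 1 },
  { id: "b", x: 5, y: 5 },
  { id: "c", x: 9, y: 2 },
];
// Brush dragged from (6, 6) back to (0, 0).
const selected = selectInRegion(points, { x0: 6, y0: 6, x1: 0, y1: 0 });
```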

Altair Selection: https://altair-viz.github.io/user_guide/interactions.html
+D3 Selection: https://observablehq.com/collection/@d3/d3-selection

+
+
+

Discussion: Major Visualization Challenges

+
    +
  • Missing/Incomplete data
  • +
  • Huge quantities of data
  • +
  • Complex, high-dimensional data
  • +
  • Uncertainty
  • +
  • Challenges of Scale
  • +
+
+
+

Missing/Incomplete Data

+
    +
  • Imputation of missing values.
  • +
  • Label missing data.
  • +
  • Regardless of choice, be transparent.
  • +
+
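A minimal sketch of combining imputation with transparency: fill gaps with the mean, but keep a flag so the chart can label imputed points differently. Mean imputation is only one possible choice, and `imputeWithFlag` is a hypothetical helper.

```javascript
// Treat null, undefined, and NaN as missing.
const isMissing = (v) => v == null || Number.isNaN(v);

// Mean-impute missing values and record which ones were filled in.
// Assumes at least one value is present; otherwise the mean is NaN.
function imputeWithFlag(values) {
  const present = values.filter((v) => !isMissing(v));
  const mean = present.reduce((sum, v) => sum + v, 0) / present.length;
  return values.map((v) =>
    isMissing(v) ? { value: mean, imputed: true } : { value: v, imputed: false }
  );
}

const cleaned = imputeWithFlag([4, null, 8]);
```

The `imputed` flag can then drive a visual encoding, such as a hollow marker or a dashed line segment.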
+
+

Big Data

+
    +
  • Aggregation
  • +
  • Sampling
  • +
  • Filtering/Interactives
  • +
+
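Aggregation and sampling can both be sketched in a few lines. The helpers below (`binCounts`, `sample`) are hypothetical names: the first reduces records to fixed-width bin counts, the second draws a simple random sample without replacement.

```javascript
// Aggregate raw values into fixed-width bins so the chart draws
// one mark per bin instead of one mark per record.
function binCounts(values, binWidth) {
  const counts = new Map();
  for (const v of values) {
    const binStart = Math.floor(v / binWidth) * binWidth;
    counts.set(binStart, (counts.get(binStart) || 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([binStart, count]) => ({ binStart, count }));
}

// Random sample of up to n items without replacement
// (partial Fisher-Yates shuffle on a copy of the input).
function sample(values, n) {
  const copy = values.slice();
  const k = Math.min(n, copy.length);
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(Math.random() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, k);
}

const bins = binCounts([1, 2, 11, 12, 13, 25], 10);
const subset = sample([1, 2, 3, 4, 5], 3);
```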
+
+

Lots of Attributes/Dimensions

+
    +
  • Small multiples approach
  • +
  • Pairwise charts. (XY, YZ, XZ)
  • +
  • Advanced: Dimensionality Reduction Algorithms (PCA, t-SNE, etc.)
  • +
  • Interactive exploration
  • +
+
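For pairwise charts, the set of panels is every unordered pair of attributes. A small sketch, with `attributePairs` as an illustrative name:

```javascript
// List every unordered pair of attributes, e.g. for a scatterplot matrix.
// For n attributes this yields n * (n - 1) / 2 pairs.
function attributePairs(attributes) {
  const pairs = [];
  for (let i = 0; i < attributes.length; i++) {
    for (let j = i + 1; j < attributes.length; j++) {
      pairs.push([attributes[i], attributes[j]]);
    }
  }
  return pairs;
}

const pairs = attributePairs(["X", "Y", "Z"]);
```

The pair count grows quadratically, which is one reason dimensionality reduction or interactive exploration becomes attractive as attributes are added.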
+
+

Handling Uncertainty

+
    +
  • Frequency Approach
  • +
  • Confidence intervals & error bars
  • +
  • Probabilistic visualizations
  • +
+
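Error bars need an interval to draw. The sketch below computes a mean with an approximate 95% confidence interval using the normal approximation (z = 1.96); this is an assumption suited to larger samples, and a t-multiplier is more appropriate for small ones. `meanWithCI` is a hypothetical helper.

```javascript
// Mean and approximate confidence interval for a sample.
// Uses the sample variance (n - 1 denominator) and a z-multiplier.
function meanWithCI(values, z = 1.96) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (n - 1);
  const margin = z * Math.sqrt(variance / n); // z times the standard error
  return { mean, lower: mean - margin, upper: mean + margin };
}

const interval = meanWithCI([2, 4, 4, 4, 5, 5, 7, 9]);
```

The `lower` and `upper` values map directly onto the two ends of an error bar.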
+
+

Visualizing Scale

+
    +
  • Hierarchical visualizations (treemaps)
  • +
  • Logarithmic scales when appropriate.
  • +
+
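A logarithmic scale maps equal ratios in the data to equal distances on screen. The sketch below is a hand-rolled illustration, assuming strictly positive values; `logScale` is a hypothetical name, and charting libraries ship their own log scales.

```javascript
// Map a positive value from a data domain to a pixel range on a log10 scale.
function logScale(value, [dMin, dMax], [rMin, rMax]) {
  const t =
    (Math.log10(value) - Math.log10(dMin)) /
    (Math.log10(dMax) - Math.log10(dMin));
  return rMin + t * (rMax - rMin);
}

// 1,000 sits halfway between 1 and 1,000,000 on a log scale.
const px = logScale(1000, [1, 1e6], [0, 600]);
```

On a linear scale the same value would land less than one pixel from the left edge, which is why log scales help when data spans several orders of magnitude.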
+
\ No newline at end of file diff --git a/14.misc/slides.md b/14.misc/slides.md new file mode 100644 index 0000000..ae79e1b --- /dev/null +++ b/14.misc/slides.md @@ -0,0 +1,236 @@ +# Data Visualization for Public Policy + +--- + +## This Week + +- Project Questions / Deployment +- Code Quality & Style +- Dashboards +- Visual Style Guides +- 100 Visualizations +- Animation & Interaction + +--- + +## Code Quality + +When you are writing a data pipeline or application, code quality is of high importance. + +- Readability & Documentation => easier maintenance & bugs prevented. +- Small speed-ups from better algorithm/data structure choices can make big differences when that task executes millions of times. +- Test coverage makes refactoring easier, prevents regressions. + +--- + +### Unique Considerations for Data Viz + +- Typically little to no ongoing reuse/maintenance. +- Visualization itself unlikely to be performance bottleneck compared to data manipulation. +- Focus is on **immediate visual output**, testing de-emphasized. +- Often written by solo developer, even in larger organizations. + +Code quality still matters, but your main goal should be code that you can trust is correct. Testing, documentation, and the "right way" are less essential. + +--- + + +## Dashboards + +![bg fit left](plotly-dash.png) + +Long-lived data visualizations that typically run against a central repository of data. + +You can use the same techniques & tools, or custom dashboard-focused tools like Tableau or Dash. + +Key difference: You will likely need some degree of dynamic refresh (instead of loading a static CSV/JSON file, load data from a DB/API). This comes with caching and other performance considerations. + +--- + +## Dashboard Pseudocode + +``` +every interval { + data = update() + visualize(data) +} +``` + +Can make use of animation to provide context: + +- scrolling time series +- animated dials to show directional changes + +--- + +## Are Dashboards Bad? 
+ +Dashboards saw a surge in popularity a decade or so ago, and there are now plenty of bad dashboards out there. + +Golden rule of dashboards: **answer a question & make them actionable**. + +Too often people just throw all their data on a dashboard. + +*OK, I can see that 6 errors occurred in the last 24 hours...* + +- Is that a lot? **Show trends where appropriate!** +- What can I do? **Provide links/action items!** + +Without this focus, dashboards become decorations. + +--- + +## Style Guide + +It can be helpful to create or build from a style guide, even for your own work. + +Examples: + +- [Sunlight Foundation](https://www.amycesal.com/portfolio#/data-visualization-style-guidelines/) +- [CFPB](https://www.amycesal.com/portfolio#/cfpb-design-manual-data-visualization/) + +--- + +## Key Elements + +### Typography + +Select 2 complementary fonts: + +- Prefer a very legible sans-serif font for data/axes labels. +- Any legible font for chart titles/narrative/etc. + +### Color Selection + +Best to have: + +- Nominal data: Distinct, contrasting hues +- Quantitative data: Linear or divergent gradients +- Consider color-blindness and accessibility + +--- + +## Style Guide: Chart Selection +- Match chart type to **data characteristics** and **audience**. +- Consider: + - Data dimensionality + - Comparison needs + - Narrative goals + +--- + +## Creativity: 1 Dataset 100 Visualizations + +<https://100.datavizproject.com> + +--- + +## Applications of Animation + +- Demonstrate change over time: Data being added to chart as time "plays." +- Highlight relationships: Hover/highlight/select modifies display of other data on page. +- Focus attention: Show subsets of data at a time. +- Show uncertainty: "wiggle", shifting trend line (next page) + +More Examples: +- +- + + +--- + +![bg fit](https://clauswilke.com/dataviz/visualizing_uncertainty_files/figure-html/mpg-uncertain-HOP-animated-1.gif) + +--- + +## Applications of Interaction + +- Enable user-driven exploration of data. 
+ - "How do these two variables compare?" + - "What happens if this price increases?" +- Allow personalization (e.g. enter your zip code) + - "What is this like in my city?" +- Increased engagement/retention. Lots of evidence showing we learn best by participating. + +--- + +## JS setInterval + +```js +// will call `func` every `everyMS` +let intervalId = setInterval(func, everyMS) + +// stop calling func +clearInterval(intervalId) +``` + +<https://developer.mozilla.org/en-US/docs/Web/API/Window/setInterval> + +--- + +## Interaction: Making Data Selections + +For user-driven data explorations, **selection** is an important concept. + +How do you want to let a user select individual records or groups of records? + +### Selection Spectrum: Simple to Complex + +- Menu/Select Box +- Hover/click on items on page (tooltips, etc.) +- Drag/Region selection +- Pre-written SQL queries with dropdowns/selects. (Common on dashboards.) +- Allow user to write queries themselves in SQL or a custom query language. Common on advanced dashboards. + +Altair Selection: <https://altair-viz.github.io/user_guide/interactions.html> +D3 Selection: <https://observablehq.com/collection/@d3/d3-selection> + +--- + +## Discussion: Major Visualization Challenges + +- Missing/Incomplete data +- Huge quantities of data +- Complex, high-dimensional data +- Uncertainty +- Challenges of Scale + +--- + +### Missing/Incomplete Data + +- Imputation of missing values. +- Label missing data. +- Regardless of choice, be transparent. + +--- + +### Big Data + +- Aggregation +- Sampling +- Filtering/Interactives + +--- + +### Lots of Attributes/Dimensions + +- Small multiples approach +- Pairwise charts. (XY, YZ, XZ) +- Advanced: Dimensionality Reduction Algorithms (PCA, t-SNE, etc.) +- Interactive exploration + +--- + +### Handling Uncertainty + +- Frequency Approach +- Confidence intervals & error bars +- Probabilistic visualizations + +--- + +### Visualizing Scale + +- Hierarchical visualizations (treemaps) +- Logarithmic scales when appropriate. 
diff --git a/15.conclusion/slides.md b/15.conclusion/slides.md new file mode 100644 index 0000000..6ea6ca5 --- /dev/null +++ b/15.conclusion/slides.md @@ -0,0 +1,119 @@ +# Data Visualization for Public Policy + +![bg fit right](mosaic.jpg) + +--- + +## Recap: Why Do We Create Visualizations? + +- To **better understand** large, complex datasets. +- To **influence others** through compelling, evidence-based storytelling. + +--- + +## Influence: The Power of Visual Communication + +Effective data visualizations can: + +- Draw attention to critical problems or potential solutions. +- Argue for specific policy interventions. +- Connect an audience with large and potentially abstract data concepts. + +--- + +## Key Ideas Exercise + +What are *your* golden rules of data visualization? + +--- + +## (Some) Key Rules for Effective Data Visualization + +### 1. Audience-Centered Design + +- Take time to consider and understand your audience's background, expertise, and information needs. +- The "best" data visualization is one that the audience understands & remembers. + +--- + +### 2. **Prioritize Truthful Representation** + +- Correct chart types & encodings. +- Never sacrifice **data integrity** in the name of a "better" chart. +- Avoid misleading choices: truncated axes, dual axes, etc. +- Consider the role of **uncertainty** in representing your data. + +--- + + + +![bg fit](../01.gog-altair/effectiveness.png) + +--- + +### 3. **Maximize Clarity and Comprehension** + +- Simplify complex information where possible. It is OK to refer a user to a table or other source for deeper analysis. +- Remove unnecessary visual elements -- "chart junk" +- Guide the viewer's attention to key insights with **labeling**. + +--- + +## Tufte's Key Ideas Revisited + +![bg fit right](../03.charts/tufte.png) + +- Graphical Integrity: Above all else, show the data. +- Maximize the data-ink ratio. +- Minimize chart junk. +- Aim for high chart density, consider *small multiples*. 
+- Revision & Editing are essential. + +--- + +### 4. **Optimize for Accessibility** + +- Use color-blind friendly palettes. +- Ensure readability for viewers with different visual capabilities. (Contrast, font size, etc.) +- Provide *alternative text descriptions* in web presentations. + - `A graphic representing the length of rivers...` +- Accessibility tools: contrast/color/WCAG checkers. + +--- + +### 5. **Build a Compelling Narrative** + +- Create a clear, coherent story and use graphics to support it. +- Each chart should have a clear "why" -- don't make users wonder why you're showing them something. +- Use visual elements & conventions to guide the viewer through key arguments and order. +- Connect data to broader context and implications. + +--- + +### 6. **Embrace Iterative Improvement** + +- Seek feedback from diverse perspectives, especially those represented in your audience. +- Be willing to revise and refine; if someone had an issue, others will too. + +--- + +### 7. **Consider Ethical Implications** + +- Represent marginalized groups respectfully: color choices, language. +- Remember that pixels often represent people; dismissing outliers, etc. should not be done without consideration. +- Be transparent about data sources and limitations. +- Use visualization as a tool for **understanding and persuasion, not manipulation**. + +--- + +## Conclusion + +Effective data visualization is both an art and a science. + +Understand your data and what you want people to understand. + +Center your audience. + +Prioritize clarity & truth. + +Be creative & have fun!