class: center, middle, inverse, title-slide # Visual Perception! ### Daniel Anderson ### Week 4, Class 1 --- layout: true <script> feather.replace() </script> <div class="slides-footer"> <span> <a class = "footer-icon-link" href = "https://github.com/uo-datasci-specialization/c2-dataviz-2021/raw/main/static/slides/w4p1.pdf"> <i class = "footer-icon" data-feather="download"></i> </a> <a class = "footer-icon-link" href = "https://dataviz-2021.netlify.app/slides/w4p1.html"> <i class = "footer-icon" data-feather="link"></i> </a> <a class = "footer-icon-link" href = "https://github.com/uo-datasci-specialization/c2-dataviz-2021"> <i class = "footer-icon" data-feather="github"></i> </a> </span> </div> --- class: inverse-blue # Data viz in the wild * Kavya * Meg * Anisha * Rachael ## Zach and Tess on deck --- class: inverse-red middle # Reminder Your final project proposals are due next week. Please look at the syllabus for the requirements. --- # Agenda * Aesthetic mappings and visual encodings of data * data/ink ratio * Some do's and don't's (which are all rules 👍) -- ### Learning Objectives * Understand how decisions you make may help or hinder comprehension --- class: inverse-red center middle # Disclaimer ### I'm not a psychologist I don't really know why we perceive things certain ways. I mainly care that we do, and that your visualizations should account for them. --- # Visual Cues .footnote[Taken from *Modern Data Science with R*, p. 15] * **Position:** *Numeric*. Where in relation to other things? * **Length:** *Numeric*. How big (in one dimension)? * **Angle:** *Numeric*. How wide? Parallel to something else? * **Direction:** *Numeric*. At what slope? In a time series, going up or down? --- # Visual Cues .footnote[Taken from *Modern Data Science with R*, p. 15] * **Shape:** *Categorical*. Belonging to which group? * **Area:** *Numeric*. How big (in two dimensions)? * **Volume:** *Numeric*. How big (in three dimensions)? * **Shade:** *Numeric or Categorical*. To what extent? How Severely? * **Color:** *Numeric or Categorical*. To what extent? How Severely? --- class: middle center # Encoding data ![](w4p1_files/figure-html/data-encoding-1.png)<!-- --> --- # Other elements to consider * Text + How is the text displayed (e.g., font, face, location)? + What is the purpose of the text? -- * Transparency + Are there overlapping pieces? + Can transparency help? -- * Type of data + Continuous/categorical + Which can be mapped to each aesthetic? - e.g., shape and line type can only be mapped to categorical data, whereas color and size can be mapped to either. --- # Talk with a neighbor How would you encode each column of data? | Month | Day | Location | Station ID | Temperature | |:-----:|:---:|:-------------|-------------| :----------:| | Jan | 1 | Chicago | USW00014819 | 25.6 | | Jan | 1 | San Diego | USW00093107 | 55.2 | | Jan | 1 | Houston | USW00012918 | 53.9 | | Jan | 1 | Death Valley | USC00042319 | 51.0 | | Jan | 2 | Chicago | USW00014819 | 25.5 | | Jan | 2 | San Diego | USW00093107 | 55.3 | | Jan | 2 | Houston | USW00012918 | 53.8 | | Jan | 2 | Death Valley | USC00042319 | 51.2 | | Jan | 3 | Chicago | USW00014819 | 25.3 | --- # Scales > A scale defines a unique mapping between data and aesthetics. Importantly, a scale must be one-to-one, such that for each specific data value there is exactly one aesthetics value and vice versa. If a scale isn't one-to-one, then the data visualization becomes ambiguous. -- * Which data values correspond to specific aesthetic values? --- class: middle center # Basic Scales ![](w4p1_files/figure-html/scales-wilke-1.png)<!-- --> --- # Putting it to practice ![](w4p1_files/figure-html/temp-change-1.png)<!-- --> --- # Basic code for previous plot ```r ggplot(temps_long, aes(date, temperature)) + geom_line(aes(color = location)) ``` ![](w4p1_files/figure-html/unnamed-chunk-2-1.png)<!-- --> --- # Change colors If you want to change the colors on the previous plot, you have to change the colors of the scale for the color mapping. -- In other words, color is being mapped to data, and you have to change the color scale. -- ```r ggplot(temps_long, aes(date, temperature)) + geom_line(aes(color = location)) + scale_color_brewer(palette = "Dark2") ``` --- ![](w4p1_files/figure-html/unnamed-chunk-4-1.png)<!-- --> --- # One more note on colors There are lots of different scales and some work better than others. We'll talk about them more next week. -- You **do not** use `scale_color_*()` if you are not mapping data to color -- Make sure to keep straight `scale_color_*()` and `scale_fill_*()` --- # Alternative representation ### Same plot as before, but with different scales ![](w4p1_files/figure-html/temp-change2-1.png)<!-- --> --- # Basic code for previous plot ```r temps_long %>% group_by(location, month) %>% summarize(temp = mean(temperature)) %>% ggplot(aes(month, location)) + geom_tile(aes(fill = temp), color = "white") + coord_fixed() ``` ![](w4p1_files/figure-html/unnamed-chunk-5-1.png)<!-- --> --- # Change the fill ```r temps_long %>% group_by(location, month) %>% summarize(temp = mean(temperature)) %>% ggplot(aes(month, location)) + geom_tile(aes(fill = temp), color = "white") + coord_fixed() + scico::scale_fill_scico(palette = "tokyo") ``` ![](w4p1_files/figure-html/unnamed-chunk-6-1.png)<!-- --> --- # Comparison * Both represent three scales + Two position scales (x/y axis) + One color scale (categorical for the first, continuous for the second) * More scales are possible --- class: center middle ![](w4p1_files/figure-html/five-scales-1.png)<!-- --> --- background-image:url(http://socviz.co/dataviz-pdfl_files/figure-html4/ch-01-multichannel-1.png) background-size:contain ## Additional scales can become lost without high structure in the data --- class: inverse-blue center middle # Data ink ratio --- # What is it? -- > ## Above all else, show the data <br> \-Edward Tufte -- * Data-Ink Ratio = Ink devoted to the data / total ink used to produce the figure -- * Common goal: Maximize the data-ink ratio --- # Example ![](img/six-boxplots.png) -- * First thought might be - Cool! --- class: inverse-red background-image:url(https://theamericanreligion.files.wordpress.com/2012/10/lee-corso-sucks.jpeg?w=660) background-size:cover --- # Minimize cognitive load * Empirically, Tufte's plot was **the most difficult** for viewers to interpret. -- * Visual cues (labels, gridlines) *reduce* the data-ink ratio, but can also reduce cognitive load. --- # Another example ### Which do you prefer? .pull-left[ ![](w4p1_files/figure-html/h3_bad-1.png)<!-- --> ] .pull-right[ ![](w4p1_files/figure-html/h3_good-1.png)<!-- --> ] --- # Advice from Wilke > Whenever possible, visualize your data with solid, colored shapes rather than with lines that outline those shapes. Solid shapes are more easily perceived, are less likely to create visual artifacts or optical illusions, and do more immediately convey amounts than do outlines. --- # Another example .pull-left[ ![](w4p1_files/figure-html/iris_lines-1.png)<!-- --> ] .pull-right[ ![](w4p1_files/figure-html/iris_colored_lines-1.png)<!-- --> ] --- class: center middle ![](w4p1_files/figure-html/iris_filled-1.png)<!-- --> --- class: inverse-red middle background-image:url(img/monstrous-costs.png) background-size: contain ## This? --- # The takeaway? * It can often be helpful to remove "chart junk" + Remove background + Unnecessary frills + Certainly don't use 3D when it's not clearly warranted -- ### But... * Infographics can often be more memorable --- # Compromise? In some cases, it may be easy and more memorable to use glyphs instead of points or squares -- * Install packages ```r install.packages("extrafont") remotes::install_github("wch/extrafontdb") remotes::install_github("wch/Rttf2pt1") remotes::install_github("hrbrmstr/waffle") ``` -- * Create data ```r parts <- c(`Un-breached\nUS Population` = (318 - 11 - 79), `Premera` = 11, `Anthem` = 79) ``` --- # Basic plot ```r library(waffle) waffle(parts, rows = 8, colors = c("#969696", "#1879bf", "#009bda")) ``` ![](w4p1_files/figure-html/waffle1-1.png)<!-- --> --- # Glyph plot Doesn't seem to work anymore...🤷♂️ * Download and install `fontawesome-webfont.ttf` on your machine locally (see [here](https://fontawesome.com/v4.7.0/)) * Import new fonts (including glyphs, via font awesome) ```r library(extrafont) font_import() loadfonts() ``` ```r waffle(parts/10, rows = 3, colors = c("#969696", "#1879bf", "#009bda"), * use_glyph = "medkit") ``` --- # Should look like this ![](https://github.com/hrbrmstr/waffle/raw/master/README_files/figure-gfm/medkit-1.png) --- class: inverse center middle background-image: url(https://pbs.twimg.com/media/DxiychAVYAAJ4CY.jpg) background-size: 100% 100% --- # You can create them! * Create plots * Use illustrator or similar to put them together * Add some annotations * Consider using glyphs for greater memory * You can do a lot in R without going to illustrator etc. by just using [**{patchwork}**](https://patchwork.data-imaginist.com) or [**{cowplot}**](https://wilkelab.org/cowplot/index.html) --- class: center middle # More visual properties ![](http://socviz.co/assets/ch-01-perception-adelson-checkershow.jpg) --- # Or in real life <iframe width="560" height="315" src="https://www.youtube.com/embed/z9Sen1HTu5o" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> --- background-image:url(http://socviz.co/dataviz-pdfl_files/figure-html4/ch-01-dual-search-1.png) background-size:contain ## Where's the blue circle in each plot? --- background-image:url(http://socviz.co/assets/ch-01-cleveland-task-types.png) background-size:contain # What are we good at perceiving? --- background-image:url(http://socviz.co/assets/ch-01-heer-bostock-results.png) background-size:contain --- background-image:url(http://socviz.co/assets/ch-01-channels-for-cont-data-vertical.png) background-size:contain ## Ordered ## data ## mappings: ## Ranked --- # Unordered data mappings ![](http://socviz.co/assets/ch-01-channels-for-cat-data-vertical.png) --- class: inverse-blue center middle # Some things to avoid --- # Line drawings ### As discussed earlier .pull-left[ ![](w4p1_files/figure-html/iris_lines2-1.png)<!-- --> ] .pull-right[ ![](w4p1_files/figure-html/iris_filled2-1.png)<!-- --> ] --- # Much worse ### Unnecessary 3D .pull-left[ ![](img/3d-pie-10-v2.png) ] .pull-right[ ![](img/3d-pie-20-v2.png) ] --- # Much worse ### Unnecessary 3D .pull-left[ ![](img/3d-pie-40-v2.png) ] .pull-right[ ![](img/3d-pie-80-v2.png) ] --- # Horrid example ### Used relatively regularly ![](img/3d_bar.png) --- # Pie charts w/lots of categories ![](https://serialmentor.com/dataviz/visualizing_proportions_files/figure-html/marketshare-pies-1.png) --- # Alternative representation ![](https://serialmentor.com/dataviz/visualizing_proportions_files/figure-html/marketshare-side-by-side-1.png) --- # A case for pie charts * `\(n\)` categories low, * differences are relatively large * familiar for some audiences ![](w4p1_files/figure-html/pie-1.png)<!-- --> --- # The anatomy of a pie chart Pie charts are just stacked bar charts with a radial coordinate system ![](w4p1_files/figure-html/stacked_bars_nopie-1.png)<!-- --> --- # Alternative represenation ![](w4p1_files/figure-html/horiz_stacked-1.png)<!-- --> --- # Or one of these ![](w4p1_files/figure-html/dodged_bars-1.png)<!-- -->![](w4p1_files/figure-html/dodged_bars-2.png)<!-- --> --- # Dual axes * One exception - if second axis is a direct transformation of the first + e.g., Miles/Kilometers, Fahrenheit/Celsius ![](img/dual_axes.png) --- # Another example ![](img/bedsheet-tangled.png) .footnote[See more examples [here](http://www.tylervigen.com/spurious-correlations)] --- # Truncated axes ![](img/truncated_axes.png) --- class: middle ![](img/truncated_axes2.png) --- # Not always a bad thing > It is tempting to lay down inflexible rules about what to do in terms of producing your graphs, and to dismiss people who don’t follow them as producing junk charts or lying with statistics. But **being honest with your data is a bigger problem than can be solved by rules of thumb** about making graphs. In this case there is a moderate level of agreement that bar charts should generally include a zero baseline (or equivalent) given that bars encode their variables as lengths. But it would be a mistake to think that a dot plot was by the same token deliberately misleading, just because it kept itself to the range of the data instead. --- # Bars ![](http://socviz.co/dataviz-pdfl_files/figure-html4/ch-01-bar-simple-1.png) --- # Points ![](http://socviz.co/dataviz-pdfl_files/figure-html4/ch-01-bar-simple-2.png) --- # Law school enrollments ![](http://socviz.co/dataviz-pdfl_files/figure-html4/ch-01-law-enrollments-1.png) --- # Start at zero ![](http://socviz.co/dataviz-pdfl_files/figure-html4/ch-01-law-enrollments-2.png) --- # Scaling issues ![](img/area_size.png) --- class: middle center # Poor binning choices ![](img/poor_binning.png) --- class: inverse-blue middle # Conclusions --- # Essentially never * Use dual axes (unles they are direct transformations, just produce separate plots instead) * Use 3D unnecessarily -- # Be wary of + Truncated axes -- # Do * Minimize cognitive load * Be as clear as possible --- class: inverse-green middle # Next time ## Lab 2 We'll replicate some plots produced by [fivethirtyeight](https://fivethirtyeight.com/)