class: center, middle, inverse, title-slide

# Colors!
### Daniel Anderson
### Week 5, Class 1

---
layout: true

<script>
feather.replace()
</script>

<div class="slides-footer">
<span>
<a class = "footer-icon-link" href = "https://github.com/uo-datasci-specialization/c2-dataviz-2021/raw/main/static/slides/w5p1.pdf">
<i class = "footer-icon" data-feather="download"></i>
</a>
<a class = "footer-icon-link" href = "https://dataviz-2021.netlify.app/slides/w5p1.html">
<i class = "footer-icon" data-feather="link"></i>
</a>
<a class = "footer-icon-link" href = "https://github.com/uo-datasci-specialization/c2-dataviz-2021">
<i class = "footer-icon" data-feather="github"></i>
</a>
</span>
</div>

---
class: inverse-blue

# Data viz in the Wild

Chris

Vinita

### Shijing and David on deck

---
# Agenda

* Color basics
  + 3 basic ways color is used
* Color blindness
* Some common problems with color use
* Quick discussion of palettes

--

### Learning Objectives

* Understand different types of color palettes
  + ...and when you should use one versus another
* Understand and be able to effectively evaluate concerns related to color blindness
* Be able to fluently change colors/fills within ggplot

---
# Before we get too deep
### Some very practical advice

* Keep track of when color is mapped to a variable through `aes` and when it modifies an element overall
  + The former requires `scale_color_*` or `scale_fill_*`, while the latter does not

--

* Keep colors and fills straight (see the previous bullet)

--

* Use the advice of others to your advantage (e.g., http://colorbrewer2.org/)

---
class: inverse bottom center
background-image:url(http://socviz.co/assets/ch-01-luminance-contrast-color.png)
background-size:contain

# Why color choice matters

---
class: inverse bottom center
background-image:url(http://socviz.co/assets/ch-01-luminance-contrast-bw.png)
background-size:contain

# Why color choice matters

---
class: inverse-red middle

# Another quick example

.realbig[
.middle[
.center[
[{rayshader}](https://resources.rstudio.com/rstudio-conf-2019/3d-mapping-plotting-and-printing-with-rayshader)
]
]
]

---
# 3 fundamental uses of color

--

1. Distinguish groups from each other

--

1. Represent data values

--

1. Highlight

---
class:inverse-blue middle

# Color as a tool to distinguish

---
# Discrete items

* Often no intrinsic order

--

### Qualitative color scale

* Finite number of colors
  + Chosen to maximize distinctness, while also being equivalent
  + Equivalent
      - No color should stand out
      - No impression of order

---
background-image:url(https://serialmentor.com/dataviz/color_basics_files/figure-html/qualitative-scales-1.png)
background-size:contain

# Some examples

.footnote[See more about the Okabe Ito palette origins [here](http://jfly.iam.u-tokyo.ac.jp/color/)]

---
# How do we use them?

Imagine we have data like this

```r
popgrowth_df
```

```
## # A tibble: 51 x 7
##   region    division           state          pop2000  pop2010    popgrowth
##   <fct>     <chr>              <fct>            <dbl>    <dbl>        <dbl>
## 1 Midwest   East North Central Michigan       9938444  9883640 -0.005514344
## 2 Northeast New England        Rhode Island   1048319  1052567  0.004052202
## 3 South     West South Central Louisiana      4468976  4533372  0.01440956
## 4 Midwest   East North Central Ohio          11353140 11536504  0.01615095
## 5 Northeast Middle Atlantic    New York      18976457 19378102  0.02116544
## 6 South     South Atlantic     West Virginia  1808344  1852994  0.02469110
## # … with 45 more rows, and 1 more variable: area <dbl>
```

---
# Maybe a plot like this

.pull-left[

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(alpha = 0.9)
```

]

.pull-right[

![](w5p1_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

]

---
# Alternatively, fill by region

.pull-left[

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9)
```

]

.pull-right[

![](w5p1_files/figure-html/unnamed-chunk-3-1.png)<!-- -->

]

---
# Problem with default palette

![](w5p1_files/figure-html/colorblind1-1.png)<!-- -->

---
# Alternative: viridis
.pull-left[

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
* scale_fill_viridis_d()
```

]

.pull-right[

![](w5p1_files/figure-html/unnamed-chunk-4-1.png)<!-- -->

]

---
# Revised version

![](w5p1_files/figure-html/unnamed-chunk-5-1.png)<!-- -->

---
# The Okabe Ito palette

.pull-left[

* From [Color Universal Design](http://jfly.iam.u-tokyo.ac.jp/color/)

```r
*library(colorblindr)
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
* scale_fill_OkabeIto()
```

]

.pull-right[

![](w5p1_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

]

---
# Okabe Ito for colorblindness

![](w5p1_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

---
# How am I checking for colorblindness?

* `cvd_grid()` is also part of the **{colorblindr}** package ([here](https://github.com/clauswilke/colorblindr))
  + depends on the dev versions of **{colorspace}** and **{cowplot}**, which are useful packages in their own right

```r
devtools::install_github("wilkelab/cowplot")
install.packages("colorspace", repos = "http://R-Forge.R-project.org")
devtools::install_github("clauswilke/colorblindr")
```

---

```r
p <- ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
  scale_fill_OkabeIto() +
* theme_void() # not necessary but I like it

*colorblindr::cvd_grid(p)
```

![](w5p1_files/figure-html/okabe-ito2-1.png)<!-- -->

---
class: inverse-orange middle

# Colors for continuous values

---
background-image:url(https://serialmentor.com/dataviz/color_basics_files/figure-html/sequential-scales-1.png)
background-size:contain

# Sequential scale examples

---
# Sequential scales

* Show which values are larger/smaller

--

* Show how distant two values are from each other

--

  + Scale must be perceptually uniform across its entire range

--

  + Similar to an interval scale, but for color

--

* Often based on a single .bolder[hue]

--

* Multi-hue sequential scales tend to follow gradients in the natural world

---
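# Checking a sequential scale

A rough sketch, not part of the original examples: one way to sanity-check whether a palette could serve as a sequential scale is to see whether its lightness moves in one direction only. This assumes base R (>= 3.6) and its built-in `hcl.colors()` palettes. The helpers `lightness()` and `is_monotone()` are made up for illustration, and monotone lightness is only a necessary condition, not proof of perceptual uniformity.

```r
# Hypothetical helper: CIE L* (lightness) for a vector of hex colors
lightness <- function(cols) {
  rgb <- t(grDevices::col2rgb(cols)) / 255   # n x 3 matrix on a 0-1 scale
  lab <- grDevices::convertColor(rgb, from = "sRGB", to = "Lab")
  lab[, 1]                                   # first column is L*
}

# Hypothetical helper: TRUE if values only go up or only go down
is_monotone <- function(x) all(diff(x) > 0) || all(diff(x) < 0)

blues <- hcl.colors(7, palette = "Blues 3")  # single-hue sequential palette
vir   <- hcl.colors(7, palette = "viridis")  # multi-hue sequential palette
rain  <- rainbow(7)                          # not a sequential palette

is_monotone(lightness(blues))
is_monotone(lightness(vir))
is_monotone(lightness(rain))
```

---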
class: inverse-red middle

# Common uses of sequential palettes

---
# Heatmaps

First the data:

```r
hm <- diamonds %>%
  select(table, price, depth, carat) %>%
  corrr::correlate() %>%
  pivot_longer(-rowname) %>%
  mutate(name = fct_reorder(name, value),
         rowname = fct_reorder(rowname, value))
hm
```

```
## # A tibble: 16 x 3
##   rowname name       value
##   <fct>   <fct>      <dbl>
## 1 table   table NA
## 2 table   price  0.1271339
## 3 table   depth -0.2957785
## 4 table   carat  0.1816175
## 5 price   table  0.1271339
## 6 price   price NA
## # … with 10 more rows
```

---

```r
ggplot(hm, aes(name, rowname)) +
  geom_tile(aes(fill = value)) +
  coord_fixed()
```

![](w5p1_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

---

```r
ggplot(hm, aes(name, rowname)) +
  geom_tile(aes(fill = value)) +
  coord_fixed() +
*scale_fill_distiller(palette = "Blues")
```

![](w5p1_files/figure-html/heatmap2-1.png)<!-- -->

---
# Change the NA value

In any `scale_*` you can change the `NA` value, including to `"transparent"`.

```r
ggplot(hm, aes(name, rowname)) +
  geom_tile(aes(fill = value)) +
  coord_fixed() +
  scale_fill_distiller(palette = "Blues",
*                      na.value = "#b0bfb0")
```

![](w5p1_files/figure-html/heatmap2b-1.png)<!-- -->

---

```r
ggplot(hm, aes(name, rowname)) +
  geom_tile(aes(fill = value)) +
  coord_fixed() +
*scale_fill_viridis_c()
```

![](w5p1_files/figure-html/heatmap3-1.png)<!-- -->

---

```r
ggplot(hm, aes(name, rowname)) +
  geom_tile(aes(fill = value)) +
  coord_fixed() +
*scale_fill_viridis_c(option = "magma")
```

![](w5p1_files/figure-html/heatmap4-1.png)<!-- -->

`option = c("viridis", "magma", "inferno", "plasma")`

---
# Choropleths

![](w5p1_files/figure-html/lane1-1.png)<!-- -->

---
# Heat palette

![](w5p1_files/figure-html/lane2-1.png)<!-- -->

---
# Options

* `scale_fill_continuous_sequential("Heat")`
* `scale_color_continuous_sequential("Heat")`
* `scale_fill_discrete_sequential("Heat")`
* `scale_color_discrete_sequential("Heat")`

---
# viridis palette

![](w5p1_files/figure-html/lane3-1.png)<!-- -->

---
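# Trying these scales yourself

A minimal sketch for experimenting with the sequential fill scales. The data behind the maps is not shown in these slides, so this uses ggplot2's built-in `faithfuld` data as a stand-in (an assumption, for illustration only). It also assumes an installed version of **{colorspace}** that provides `scale_fill_continuous_sequential()`.

```r
library(ggplot2)

# Stand-in data: 2D density estimate of Old Faithful eruptions
base_plot <- ggplot(faithfuld, aes(waiting, eruptions)) +
  geom_raster(aes(fill = density))

# Heat palette from {colorspace}
p_heat <- base_plot +
  colorspace::scale_fill_continuous_sequential(palette = "Heat")

# viridis scale, which ships with ggplot2
p_viridis <- base_plot +
  scale_fill_viridis_c()

p_heat
```

Swap `p_heat` for `p_viridis` to compare the two palettes on the same data.

---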
background-image:url(https://serialmentor.com/dataviz/color_basics_files/figure-html/diverging-scales-1.png)
background-size:contain

# Diverging palettes

---
# Earth palette

![](w5p1_files/figure-html/or1-1.png)<!-- -->

---

![](w5p1_files/figure-html/ca1-1.png)<!-- -->

---
class: inverse-blue center middle

# Color as a tool to highlight

---
# MPG data

Basic scatterplot of engine displacement to highway mpg

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point()
```

![](w5p1_files/figure-html/basic-scatter-1.png)<!-- -->

---
# Highlight compact cars

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(color = "gray80") +
  geom_point(data = filter(mpg, class == "compact"),
             color = "#C55644")
```

![](w5p1_files/figure-html/compact-cars-scatter-1.png)<!-- -->

---
# Highlight manual cars

```r
ggplot(mpg, aes(displ, hwy)) +
  geom_point(color = "gray80") +
  geom_point(data = filter(mpg, str_detect(trans, "manual")),
             color = "#C55644")
```

![](w5p1_files/figure-html/compact-cars-scatter-h1-1.png)<!-- -->

---
# Back to our states plot
### Highlight Oregon and Washington

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
  geom_col(data = filter(popgrowth_df,
                         state == "Oregon" | state == "Washington"),
           fill = "#C55644") +
  scale_fill_OkabeIto()
```

---

![](w5p1_files/figure-html/basic-highlight-or-eval-1.png)<!-- -->

---
# Color labels

```r
states <- unique(popgrowth_df$state)
label_color <- ifelse(states == "Oregon" | states == "Washington",
                      "#C55644",
                      "gray30")
label_color
```

```
##  [1] "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"
##  [8] "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"
## [15] "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"
## [22] "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"
## [29] "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "#C55644" "gray30"
## [36] "gray30"  "gray30"  "gray30"  "#C55644" "gray30"  "gray30"  "gray30"
## [43] "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"  "gray30"
## [50] "gray30"  "gray30"
```

```r
label_face <- ifelse(states == "Oregon" | states == "Washington",
                     "bold",
                     "plain")
label_face
```

```
##  [1] "plain" "plain" "plain" "plain" "plain" "plain" "plain" "plain" "plain"
## [10] "plain" "plain" "plain" "plain" "plain" "plain" "plain" "plain" "plain"
## [19] "plain" "plain" "plain" "plain" "plain" "plain" "plain" "plain" "plain"
## [28] "plain" "plain" "plain" "plain" "plain" "plain" "bold"  "plain" "plain"
## [37] "plain" "plain" "bold"  "plain" "plain" "plain" "plain" "plain" "plain"
## [46] "plain" "plain" "plain" "plain" "plain" "plain"
```

---

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
  geom_col(data = filter(popgrowth_df,
                         state == "Oregon" | state == "Washington"),
           fill = "#C55644") +
  scale_fill_OkabeIto() +
* theme(axis.text.y = element_text(color = label_color,
*                                  face = label_face))
```

---

![](w5p1_files/figure-html/orwa-highlight-eval-1.png)<!-- -->

---
# Even better

```r
accent_OkabeIto <- palette_OkabeIto[c(1, 2, 7, 4, 5, 3, 6)]
accent_OkabeIto[1:4] <- desaturate(lighten(accent_OkabeIto[1:4], .4), .8)
accent_OkabeIto[5:7] <- darken(accent_OkabeIto[5:7], .3)

gg_color_swatches(7) +
  scale_fill_manual(values = accent_OkabeIto)
```

![](w5p1_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

---

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
  geom_col(data = filter(popgrowth_df,
                         state == "Oregon" | state == "Washington"),
           fill = "#C55644") +
* scale_fill_manual(values = accent_OkabeIto) +
  theme(axis.text.y = element_text(color = label_color,
                                   face = label_face))
```

---

![](w5p1_files/figure-html/orwa-highlight-eval2-1.png)<!-- -->

---
# Or even better

```r
*library(ggtext)
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
  geom_col(data = filter(popgrowth_df,
                         state == "Oregon" | state == "Washington"),
           fill = "#C55644") +
  scale_fill_manual(values = accent_OkabeIto) +
* scale_x_continuous(expand = c(0, 0)) +
  labs(title = "Population growth by region",
       subtitle = "The <span style = 'color: #C55644'>**northwest**</span> is where it's at") +
  theme(axis.text.y = element_text(color = label_color,
                                   face = label_face),
*       plot.subtitle = element_markdown())
```

---

![](w5p1_files/figure-html/unnamed-chunk-12-1.png)<!-- -->

---
# Last example

```r
data(sleepstudy, package = "lme4")
head(sleepstudy)
```

```
##   Reaction Days Subject
## 1      250    0     308
## 2      259    1     308
## 3      251    2     308
## 4      321    3     308
## 5      357    4     308
## 6      415    5     308
```

---
# Plot by subject

```r
ggplot(sleepstudy, aes(Days, Reaction, group = Subject)) +
  geom_line()
```

![](w5p1_files/figure-html/plot1-1.png)<!-- -->

---

```r
*library(gghighlight)
ggplot(sleepstudy, aes(Days, Reaction, group = Subject)) +
  geom_line() +
*gghighlight(max(Reaction) > 400)
```

![](w5p1_files/figure-html/gghighlight1-1.png)<!-- -->

---

```r
library(gghighlight)
ggplot(sleepstudy, aes(Days, Reaction, color = Subject)) +
  geom_line() +
  gghighlight(max(Reaction) > 400) +
*scale_color_OkabeIto()
```

![](w5p1_files/figure-html/gghighlight2-1.png)<!-- -->

---

```r
library(gghighlight)
ggplot(sleepstudy, aes(Days, Reaction, color = Subject)) +
  geom_line() +
*facet_wrap(~Subject) +
  gghighlight(max(Reaction) > 400) +
  scale_color_OkabeIto()
```

![](w5p1_files/figure-html/gghighlight3-1.png)<!-- -->

---
class: inverse-red center middle

# A few other things to consider

---
# Double encodings

![](w5p1_files/figure-html/iris-scatter1-1.png)<!-- -->

--

This plot is less than ideal. Why?
---
# Color blindness

![](w5p1_files/figure-html/color-blind-iris_scatter-1.png)<!-- -->

---
# Better version

![](w5p1_files/figure-html/iris-scatter-1.png)<!-- -->

---
# Color blindness check

![](w5p1_files/figure-html/color-blind-iris_scatter2-1.png)<!-- -->

---
class:inverse-blue center middle

# Common problems with color

---
# Too many colors

More than 5-ish categories generally become too difficult to track

```r
ggplot(popgrowth_df, aes(pop2000, popgrowth, color = state)) +
  geom_point()
```

![](w5p1_files/figure-html/too-many-colors-1.png)<!-- -->

---
# Use labels

More than 5-ish categories generally become too difficult to track

```r
*library(ggrepel)
ggplot(popgrowth_df, aes(pop2000, popgrowth)) +
  geom_point(color = "gray70") +
*geom_text_repel(aes(label = state))
```

![](w5p1_files/figure-html/states-labeled-1.png)<!-- -->

---
# Better

Get a subset

```r
to_label <- c("Alaska", "Arizona", "California", "Florida", "Wisconsin",
              "Louisiana", "Nevada", "Michigan", "Montana", "New Mexico",
              "Pennsylvania", "New York", "Oregon", "Rhode Island",
              "Tennessee", "Texas", "Utah", "Vermont")

subset_states <- popgrowth_df %>%
  filter(state %in% to_label)
```

---

```r
*library(ggrepel)
ggplot(popgrowth_df, aes(pop2000, popgrowth)) +
  geom_point(color = "gray70") +
*geom_text_repel(aes(label = state),
*                data = subset_states,
*                min.segment.length = 0)
```

![](w5p1_files/figure-html/repeled-labels-1.png)<!-- -->

(There is still a lot more cleaning up we could do here...)

---
# Rainbow palette

```r
rainbow(3)
```

```
## [1] "#FF0000" "#00FF00" "#0000FF"
```

```r
rainbow(7)
```

```
## [1] "#FF0000" "#FFDB00" "#49FF00" "#00FF92" "#0092FF" "#4900FF" "#FF00DB"
```

---
# Pretty!
Doesn't work well

See [here](https://www.poynter.org/archive/2013/why-rainbow-colors-arent-always-the-best-options-for-data-visualizations/) for one of many articles on why this is the case

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = state)) +
  scale_fill_manual(values = rainbow(51)) +
  guides(fill = "none")
```

![](w5p1_files/figure-html/rainbow-pop-1.png)<!-- -->

---
# Last few notes on palettes

* Do some research, find what you like **and** what tends to work well
* Check for colorblindness
* Look into http://colorbrewer2.org/
  + `scale_color_brewer()` and `scale_fill_brewer()` ship with ggplot2

---
# For example

```r
ggplot(popgrowth_df, aes(x = popgrowth, y = state)) +
  geom_col(aes(fill = region), alpha = 0.9) +
* scale_fill_brewer(palette = "Set2")
```

---

![](w5p1_files/figure-html/unnamed-chunk-14-1.png)<!-- -->

---
# Paletteer package

.center[
[![](https://github.com/EmilHvitfeldt/paletteer/raw/master/man/figures/logo.png)](https://github.com/EmilHvitfeldt/paletteer)
]

---
class: inverse-green center middle

# Next time

Lab 3: Colors

Note: this will be our final lab, to make sure you have sufficient time for your final projects