Communication?

---
class: inverse-red center middle

# One continuous variable

---

# Histogram

![](w2p1_files/figure-html/histo-1.png)<!-- -->

---

# Density plot

![](w2p1_files/figure-html/dens-1.png)<!-- -->

---

# (Empirical) Cumulative Distribution

![](w2p1_files/figure-html/cum_dens-1.png)<!-- -->

---

# QQ Plot

Compare to theoretical quantiles (for normality)

![](w2p1_files/figure-html/qq-1.png)<!-- -->

---

# Empirical examples

I'll move fast, but if you want to (try to) follow along, or recreate anything here later, first run

```r
remotes::install_github("clauswilke/dviz.supp")
```

---

### Titanic data

```r
head(titanic)
```

```
##   class   age    sex survived
## 1   1st 29.00 female        1
## 2   1st  2.00 female        0
## 3   1st 30.00   male        0
## 4   1st 25.00 female        0
## 5   1st  0.92   male        1
## 6   1st 47.00   male        1
```

---

# Basic histogram

```r
ggplot(titanic, aes(x = age)) +
  geom_histogram()
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

![](w2p1_files/figure-html/age_hist-1.png)<!-- -->

---

# Make it a little prettier

```r
ggplot(titanic, aes(x = age)) +
  geom_histogram(fill = "#56B4E9",
                 color = "white",
                 alpha = 0.9)
```

```
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

![](w2p1_files/figure-html/age_hist2-1.png)<!-- -->

---

# Change the number of bins

```r
ggplot(titanic, aes(x = age)) +
  geom_histogram(fill = "#56B4E9",
                 color = "white",
                 alpha = 0.9,
*                bins = 50)
```

![](w2p1_files/figure-html/age_hist50-1.png)<!-- -->

---

# Vary the number of bins

![](w2p1_files/figure-html/bins-1.png)<!-- -->

---

# Density plot
### ugly 😫

```r
ggplot(titanic, aes(age)) +
  geom_density()
```

![](w2p1_files/figure-html/dens-titanic-1.png)<!-- -->

---

# Density plot
### Change the fill 😌

```r
ggplot(titanic, aes(age)) +
  geom_density(fill = "#56B4E9")
```

![](w2p1_files/figure-html/dens-titanic-blue-1.png)<!-- -->

---

# Density plot estimation

* Kernel density estimation
    + Different kernel shapes can be selected
    + Bandwidth matters most
    + Smaller bandwidths follow the data more closely

* Approximation of the underlying continuous probability density function
    + Integrates to 1.0 (y-axis is somewhat difficult to interpret)

---

# Density plot
### change the bandwidth

```r
ggplot(titanic, aes(age)) +
  geom_density(fill = "#56B4E9",
               bw = 5)
```

![](w2p1_files/figure-html/dens-titanic5-1.png)<!-- -->

---
class: middle

![](w2p1_files/figure-html/vary-bw-1.png)<!-- -->

---

# Quickly

How well does it approximate a normal distribution?

```r
ggplot(titanic, aes(sample = age)) +
  stat_qq_line(color = "#56B4E9") +
  geom_qq(color = "gray40")
```

![](w2p1_files/figure-html/qq-titanic-1.png)<!-- -->

---
class: inverse-red center middle

# Grouped data
### Distributions

How do we display more than one distribution at a time?
---

# Boxplots

![](w2p1_files/figure-html/boxplots-1.png)<!-- -->

---

# Violin plots

![](w2p1_files/figure-html/violin-1.png)<!-- -->

---

# Jittered points

![](w2p1_files/figure-html/jittered-1.png)<!-- -->

---

# Sina plots

![](w2p1_files/figure-html/sina-1.png)<!-- -->

---

# Stacked histograms

![](w2p1_files/figure-html/stacked-histo-1.png)<!-- -->

---

# Overlapping densities

![](w2p1_files/figure-html/overlap-dens-1.png)<!-- -->

---

# Ridgeline densities

![](w2p1_files/figure-html/ridgeline-1.png)<!-- -->

---
class: inverse-orange center middle

# Quick empirical examples

---

# Boxplots

```r
ggplot(titanic, aes(sex, age)) +
  geom_boxplot(fill = "#A9E5C5")
```

![](w2p1_files/figure-html/boxplots-empirical-1.png)<!-- -->

---

# Violin plots

```r
ggplot(titanic, aes(sex, age)) +
  geom_violin(fill = "#A9E5C5")
```

![](w2p1_files/figure-html/violin-empirical-1.png)<!-- -->

---

# Jittered point plots

```r
ggplot(titanic, aes(sex, age)) +
  geom_jitter(width = 0.3,
              height = 0)
```

![](w2p1_files/figure-html/jittered-empirical-1.png)<!-- -->

---

# Sina plot

```r
ggplot(titanic, aes(sex, age)) +
  ggforce::geom_sina()
```

![](w2p1_files/figure-html/sina-empirical-1.png)<!-- -->

---

# Stacked histogram

```r
ggplot(titanic, aes(age)) +
  geom_histogram(aes(fill = sex))
```

![](w2p1_files/figure-html/stacked-histo-empirical-1.png)<!-- -->

--

.realbig[🤨]

---

# Dodged

```r
ggplot(titanic, aes(age)) +
  geom_histogram(aes(fill = sex),
                 position = "dodge")
```

![](w2p1_files/figure-html/dodged-histo-empirical-1.png)<!-- -->

--

Note that `position = "dodge"` goes outside `aes()` because it is not mapped to a variable in your dataset

---

# Better

```r
ggplot(titanic, aes(age)) +
  geom_histogram(fill = "#A9E5C5",
                 color = "white",
                 alpha = 0.9) +
* facet_wrap(~sex)
```

![](w2p1_files/figure-html/wrapped-histo-empirical-1.png)<!-- -->

---

# Overlapping densities

```r
ggplot(titanic, aes(age)) +
  geom_density(aes(fill = sex),
               color = "white",
               alpha = 0.4)
```
![](w2p1_files/figure-html/overlap-dens-empirical-1.png)<!-- -->

--

Note the default colors really don't work well in most of these

---

```r
ggplot(titanic, aes(age)) +
  geom_density(aes(fill = sex),
               color = "white",
               alpha = 0.6) +
  scale_fill_manual(values = c("#009973", "#99ffe6"))
```

![](w2p1_files/figure-html/overlap-dens-empirical2-1.png)<!-- -->

---

# Ridgeline densities

```r
ggplot(titanic, aes(age, sex)) +
  ggridges::geom_density_ridges(color = "white",
                                fill = "#A9E5C5")
```

![](w2p1_files/figure-html/ridgeline-dens-empirical-1.png)<!-- -->

---
class: inverse-red center middle

# Visualizing amounts

---

# Bar plots

![](w2p1_files/figure-html/bars-1.png)<!-- -->

---

# Flipped bars

![](w2p1_files/figure-html/flipped_bars-1.png)<!-- -->

---

# Dotplot

![](w2p1_files/figure-html/dots-1.png)<!-- -->

---

# Heatmap

![](w2p1_files/figure-html/heatmap-1.png)<!-- -->

---

# Empirical examples
### How much does college cost?

```r
library(here)
library(rio)
tuition <- import(here("data", "us_avg_tuition.xlsx"),
                  setclass = "tbl_df")
head(tuition)
```

```
##  6 Alabama 2009-10 7188.954
##  7 Alabama 2010-11 8071.134
##  8 Alabama 2011-12 8451.902
##  9 Alabama 2012-13 9098.069
## 10 Alabama 2013-14 9358.929
## # … with 590 more rows
```

---

# Compute summaries

```r
annual_means <- tuition %>%
  pivot_longer(`2004-05`:`2015-16`,
               names_to = "year",
               values_to = "avg_tuition") %>%
  group_by(year) %>%
  summarize(mean_tuition = mean(avg_tuition))

annual_means
```

```
## # A tibble: 12 x 2
##    year    mean_tuition
##  * <chr>          <dbl>
##  1 2004-05     6409.564
##  2 2005-06     6654.177
##  3 2006-07     6809.914
##  4 2007-08     7085.881
##  5 2008-09     7156.560
##  6 2009-10     7761.810
##  7 2010-11     8228.834
##  8 2011-12     8539.115
##  9 2012-13     8842.357
## 10 2013-14     8947.938
## 11 2014-15     9037.357
## 12 2015-16     9317.633
```

---

# Good

```r
ggplot(annual_means, aes(year, mean_tuition)) +
  geom_col()
```

![](w2p1_files/figure-html/avg-tuition1-eval-1.png)<!-- -->

---

# Better?
```r
ggplot(annual_means, aes(year, mean_tuition)) +
  geom_col() +
  coord_flip()
```

![](w2p1_files/figure-html/avg-tuition2-1.png)<!-- -->

---

# Better still?

```r
ggplot(annual_means, aes(year, mean_tuition)) +
  geom_point() +
  coord_flip()
```

![](w2p1_files/figure-html/tuition3-1.png)<!-- -->

---

# Even better

```r
annual_means %>%
  mutate(year = readr::parse_number(year)) %>%
  ggplot(aes(year, mean_tuition)) +
  geom_line(color = "cornflowerblue") +
  geom_point()
```

![](w2p1_files/figure-html/tuition4-1.png)<!-- -->

--

Treat time (year) as a continuous variable

---

# Grouped points

Show change in tuition from 2005-06 to 2015-16

```r
tuition %>%
  select(State, `2005-06`, `2015-16`)
```

```
## 1 9751.101
## 2 9751.101
## 3 9751.101
## 4 9751.101
## 5 9751.101
## 6 9751.101
```

---

# Rearrange

```r
states <- states %>%
  gather(year, tuition, `2004-05`:`2015-16`)
head(states)
```

```
##        long      lat group order   State subregion    year  tuition
## 1 -87.46201 30.38968     1     1 alabama      <NA> 2004-05 5682.838
## 2 -87.48493 30.37249     1     2 alabama      <NA> 2004-05 5682.838
## 3 -87.52503 30.37249     1     3 alabama      <NA> 2004-05 5682.838
## 4 -87.53076 30.33239     1     4 alabama      <NA> 2004-05 5682.838
## 5 -87.57087 30.32665     1     5 alabama      <NA> 2004-05 5682.838
## 6 -87.58806 30.32665     1     6 alabama      <NA> 2004-05 5682.838
```

---

# Plot

```r
ggplot(states) +
  geom_polygon(aes(long, lat, group = group, fill = tuition)) +
* coord_fixed(1.3) +
  scale_fill_viridis_c(option = "magma") +
  facet_wrap(~year)
```

![](w2p1_files/figure-html/usa-plot-1.png)<!-- -->

---
background-image: url(img/states-heatmap.png)
class: inverse bottom
background-size:contain

---
class: inverse bottom right
background-image: url(img/states-heatmap-anim.gif)
background-size:cover

# Or animated

---
class: middle

# Wrapping up

* We've got a ways to go - today was just an introduction
* The geographic part in particular was too fast, and we'll talk about better ways later (note that Alaska/Hawaii were not even included)
* We basically didn't talk about
multivariate data (not even scatter plots)
* Other types of plots will be embedded within the topics later in the class

---
class:inverse-green

# Next time

### Lab 2 git/GitHub collaboration

It's already posted - feel free to start working on it whenever.

* Must be completed as a group
* Will use elements of what we talked about today, while also asking you to create branches, submit pull requests, etc.
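
---

# Extra: bandwidth by hand

To make the density estimation slide concrete, here is a minimal base-R sketch of a Gaussian kernel density estimate. It is an illustration rather than what `geom_density()` does internally: the `ages` vector is simulated (not the `titanic` ages), and `kde()`, `grid`, and `step` are names made up for this example. It checks two claims from that slide: the curve integrates to about 1 whatever the bandwidth, and a smaller bandwidth follows the data more closely.

```r
# Simulated stand-in data (an assumption, NOT the titanic ages)
set.seed(42)
ages <- rnorm(200, mean = 30, sd = 12)

# Gaussian KDE: at each grid point, average the normal densities
# centered on every observation, with standard deviation = bandwidth
kde <- function(x, grid, bw) {
  vapply(grid, function(g) mean(dnorm(g, mean = x, sd = bw)), numeric(1))
}

# Evaluation grid, padded well beyond the range of the data
grid <- seq(min(ages) - 40, max(ages) + 40, length.out = 2000)
step <- grid[2] - grid[1]

narrow <- kde(ages, grid, bw = 1)
wide   <- kde(ages, grid, bw = 5)

# Riemann-sum approximation of the area under each curve (close to 1)
area_narrow <- sum(narrow) * step
area_wide   <- sum(wide) * step

# Total up-and-down movement of each curve: larger means wigglier
wiggle_narrow <- sum(abs(diff(narrow)))
wiggle_wide   <- sum(abs(diff(wide)))
```

Calling `plot(grid, narrow, type = "l")` followed by `lines(grid, wide, col = "blue")` overlays the two curves, which should look much like the varying-bandwidth panels earlier in the deck.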