class: title-slide, center, middle

# Data Visualization with {ggplot2}

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# ggplot

`{ggplot2}` is the tidyverse package for **data visualization** that can make beautiful, customizable, publication-ready plots

--

It follows the **"grammar of graphics"**, creating graphics one layer at a time (like a painting)

---
background-image: url(images/hex/palmerpenguins.png)
background-position: 90% 5%
background-size: 10%

# Palmer penguins

We're going to use the **Palmer Penguins** dataset as an example throughout our discussion of `{ggplot2}`.

.center[
<img src="images/penguins.png" width="60%" />
]

--

The dataset **penguins** comes from the `{palmerpenguins}` package, which you can install with `install.packages("palmerpenguins")` and load with `library(palmerpenguins)`.

---
background-image: url(images/hex/palmerpenguins.png)
background-position: 90% 5%
background-size: 10%

# Palmer penguins

```r
library(palmerpenguins)
penguins
```

```
## # A tibble: 344 x 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm
##    <fct>   <fct>              <dbl>         <dbl>             <int>
##  1 Adelie  Torgersen           39.1          18.7               181
##  2 Adelie  Torgersen           39.5          17.4               186
##  3 Adelie  Torgersen           40.3          18                 195
##  4 Adelie  Torgersen           NA            NA                  NA
##  5 Adelie  Torgersen           36.7          19.3               193
##  6 Adelie  Torgersen           39.3          20.6               190
##  7 Adelie  Torgersen           38.9          17.8               181
##  8 Adelie  Torgersen           39.2          19.6               195
##  9 Adelie  Torgersen           34.1          18.1               193
## 10 Adelie  Torgersen           42            20.2               190
## # ... with 334 more rows, and 3 more variables: body_mass_g <int>,
## #   sex <fct>, year <int>
```

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# {ggplot2}

There are 3 components to making any plot.

1. The data set, including the variables you want to visually represent

--

2. Geoms or geometric shapes (e.g., bars, points, lines)

--

3. Aesthetic mappings (e.g., color, shape, transparency, location)

--

.center[
<img src="images/ggplot_summary.png" width="40%" />
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Data set

When creating a plot in `{ggplot2}`, the first thing you have to do is call the `ggplot()` function and tell it what data you want to graph.

--

The function `ggplot()` takes `data` as its first argument. Let's start there...

--

.panelset[
.panel[.panel-name[Code]

```r
library(tidyverse) # loads {ggplot2} and the %>% pipe used on later slides
ggplot(data = penguins)
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/plot-1-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Geoms

**Geoms** are geometric objects like points, bars, histograms, and boxplots

--

In order to paint a geom on our blank canvas, we need to tell `ggplot()` which geom we want to add

--

Some popular geoms are `geom_histogram()` for histograms, `geom_bar()` for bar charts, `geom_point()` for points (e.g., for scatter plots), and `geom_boxplot()` for boxplots

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Aesthetic mapping

Each `geom_*()` function in `{ggplot2}` takes a **mapping** argument.
--

**Aesthetic mapping** allows us to assign aesthetic properties to the geom, e.g., location, color, size, transparency (different geoms have different options)

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Geoms and aesthetic mapping

We're going to call `ggplot()` again, and this time add the `geom_histogram()` **layer**, telling `ggplot()` to map the variable `flipper_length_mm` to the x-axis

--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_histogram(mapping = aes(x = flipper_length_mm))
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-hist-basic-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

If we want to set a static property on a geom, we need to do so outside of the `mapping` argument. For example, we could change the number of bins...

--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_histogram(mapping = aes(x = flipper_length_mm), bins = 10)
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-hist-bins-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

...or make them a different color.
We can change the color of 2D objects with the `fill` aesthetic

--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_histogram(mapping = aes(x = flipper_length_mm), fill = "turquoise")
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-hist-fill-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

Now we'll use a different geom: we'll add a layer of points to our plot using `geom_point()`

--

```r
penguins %>% 
  ggplot() +
  geom_point(mapping = aes(x = flipper_length_mm))
```

```
## Error: geom_point requires the following missing aesthetics: y
```

![](09-slides_files/figure-html/penguins-point-error-1.png)<!-- -->

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

We get an error telling us that `geom_point()` requires the `y` aesthetic. This makes sense: we need both an x- and a y-axis to define where points belong on a scatter plot. Let's map `bill_length_mm` to the y-axis

--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_point(mapping = aes(x = flipper_length_mm, y = bill_length_mm))
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-point-fixed-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

Let's find out if the relationship between `flipper_length_mm` and `bill_length_mm` depends on the species of penguin. We'll map `species` to the `color` aesthetic (similar to `fill`, but for points and lines).
--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_point(
    mapping = aes(x = flipper_length_mm, y = bill_length_mm, color = species))
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-point-color-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

Notice that we included `color` **inside** our aesthetic mapping call (`mapping = aes()`) here, but not when we filled our histogram with the color turquoise earlier.

--

This is the difference between mapping an aesthetic to **data** and just setting an aesthetic to some **value** (e.g., "turquoise").

--

Mixing these two up is a fairly common mistake, so let's take a look at an example

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

What happens if we tell `ggplot()` to make our points turquoise but accidentally include that **inside** the `aes()` call?

--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_point(mapping = aes(x = flipper_length_mm, y = bill_length_mm, color = "turquoise"))
```
]

.panel[.panel-name[Plot]
.pull-left[
![](09-slides_files/figure-html/penguins-point-turq-out-1.png)<!-- -->
]
.pull-right[
This is not what we want! `ggplot()` is treating the value "turquoise" as if it were part of our data, which it isn't. The points get the default color and the legend shows a single entry labeled "turquoise".
]
]
]

---
class: yourturn

# Your Turn 1
1. Create a scatter plot to visualize the relationship between `flipper_length_mm` and `bill_length_mm`.

2. Build on your plot above by adding an aesthetic to visualize the effect of `species`. Choose any aesthetic you’d like or play around with a few. What do they do? How might you use more than one aesthetic?

*Note:* Options for aesthetics include `color`, `shape`, `size`, and `alpha`.

---
class: solution

# Solution

## Q1

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = bill_length_mm))
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-yt-q1-out-1.png)<!-- -->
]
]

---
class: solution

# Solution

## Q2 (answers will vary...)

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = bill_length_mm,
                 color = species, shape = species),
             alpha = 0.5, size = 3)
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-yt-q2-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

We could also make a separate graph for each `species` using `facet_wrap()`. We do this by passing a one-sided formula to `facet_wrap()`.

--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = bill_length_mm)) +
  facet_wrap(~species)
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/penguins-point-facet-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Mapping with geoms

Another thing we often want to do is to add a line over our scatterplot to describe the relationship between variables. We can do this by adding a `geom_smooth()` layer to our plot.
--

.panelset[
.panel[.panel-name[Code 1]

```r
penguins %>% 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = bill_length_mm, color = species)) +
  geom_smooth(aes(x = flipper_length_mm, y = bill_length_mm), color = "black")
```
]

.panel[.panel-name[Plot 1]
.pull-left[
![](09-slides_files/figure-html/penguins-smooth-out-1.png)<!-- -->
]
.pull-right[
Note that `geom_smooth()` uses "loess" smoothing by default for data sets of this size (fewer than 1,000 observations). Learn more on that [here](http://www.statisticshowto.com/lowess-smoothing/).
]
]

.panel[.panel-name[Code 2]
You can change that by setting the `method` argument in `geom_smooth()`. Let's change it to our old friend linear regression or "lm".

```r
penguins %>% 
  ggplot() +
  geom_point(aes(x = flipper_length_mm, y = bill_length_mm, color = species)) +
  geom_smooth(aes(x = flipper_length_mm, y = bill_length_mm), 
              color = "black", method = "lm")
```
]

.panel[.panel-name[Plot 2]

![](09-slides_files/figure-html/point-smooth-lm-out-1.png)<!-- -->
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Global aesthetic mapping

Our code so far has been getting rather repetitive. We're specifying the `x` and `y` aesthetics in each `geom_*()` call.

--

Instead, we can use **global** aesthetic mappings, which are specified in the `ggplot()` call.

--

Global mappings are inherited by each layer unless they're overridden.

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Global aesthetic mapping

Let's re-make our previous plot using global aesthetic mapping.
--

.panelset[
.panel[.panel-name[Code]

```r
penguins %>% 
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(aes(color = species)) +
  geom_smooth(color = "black", method = "lm")
```
]

.panel[.panel-name[Plot]

![](09-slides_files/figure-html/point-smooth-global-out-1.png)<!-- -->
]

.panel[.panel-name[Explanation]
So...what do we put in the global aesthetic mapping and what do we put in the aesthetic mapping of specific geoms?

You want to put anything in the global mapping that you want *every layer to inherit* (or at least the majority of them). In the code above, I defined the `x` and `y` aesthetics globally because I want them to be the same in every geom. However, I *don't* define the `color` aesthetic globally, because `color` is geom-specific in this case.
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Global aesthetic mapping

Let's take a look at the previous example again, but this time with `color` in the global aesthetic mapping...

--

.panelset[
.panel[.panel-name[Code 1]

```r
penguins %>% 
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm, color = species)) +
  geom_point() +             # inherits the global mapping
  geom_smooth(method = "lm") # inherits the global mapping
```
]

.panel[.panel-name[Plot 1]
.pull-left[
![](09-slides_files/figure-html/smooth-color-global-out-1.png)<!-- -->
]
.pull-right[
As you can see, the global aesthetic mapping gets inherited by every layer. We can override this by providing a different aesthetic mapping or a fixed value in individual `geom_*()` calls...
]
]

.panel[.panel-name[Code 2]

```r
penguins %>% 
  ggplot(aes(x = flipper_length_mm, y = bill_length_mm, color = species)) +
  geom_point() +                              # inherits the global mapping
  geom_smooth(method = "lm", color = "black") # overrides the global `color`
```
]

.panel[.panel-name[Plot 2]

![](09-slides_files/figure-html/smooth-color-global-override-out-1.png)<!-- -->
]
]

---
class: yourturn

# Your Turn 2
1. Take a look at the `diamonds` data set that is loaded as part of the `{ggplot2}` package. Use `glimpse()`, `str()`, `head()`, or any other data viewing function we've previously discussed.

2. Fill in the blanks in the code to re-create the plot below. *Note*: This plot uses a geom we haven't used yet called `geom_bar()`, which I've filled in for you.

<img src="images/ggplot2_yourturn_figure.png" width="50%" style="display: block; margin: auto;" />

---
class: solution

# Solution

.panelset[
.panel[.panel-name[Q1]

```r
glimpse(diamonds)
```

```
## Rows: 53,940
## Columns: 10
## $ carat   <dbl> 0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26, 0.22, ~
## $ cut     <ord> Ideal, Premium, Good, Premium, Good, Very Good, Very G~
## $ color   <ord> E, E, E, I, J, J, I, H, E, H, J, J, F, J, E, E, I, J, ~
## $ clarity <ord> SI2, SI1, VS1, VS2, SI2, VVS2, VVS1, SI1, VS2, VS1, SI~
## $ depth   <dbl> 61.5, 59.8, 56.9, 62.4, 63.3, 62.8, 62.3, 61.9, 65.1, ~
## $ table   <dbl> 55, 61, 65, 58, 58, 57, 57, 55, 61, 61, 55, 56, 61, 54~
## $ price   <int> 326, 326, 327, 334, 335, 336, 336, 337, 337, 338, 339,~
## $ x       <dbl> 3.95, 3.89, 4.05, 4.20, 4.34, 3.94, 3.95, 4.07, 3.87, ~
## $ y       <dbl> 3.98, 3.84, 4.07, 4.23, 4.35, 3.96, 3.98, 4.11, 3.78, ~
## $ z       <dbl> 2.43, 2.31, 2.31, 2.63, 2.75, 2.48, 2.47, 2.53, 2.49, ~
```
]

.panel[.panel-name[Q2]
.pull-left[

```r
diamonds %>% 
  ggplot(aes(x = cut, y = price, fill = color)) +
  geom_bar(position = "dodge", stat = "summary", fun = "mean") +
  facet_wrap(~clarity) +
  labs(title = "Diamond Price by Cut, Color, and Clarity",
       x = "Cut", 
       y = "Price", 
       fill = "Color")
```
]
.pull-right[
![](09-slides_files/figure-html/unnamed-chunk-9-1.png)<!-- -->
]
]
]

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Labels and themes

You can do a TON more customization of your plots than what we've covered so far. The possibilities with `{ggplot2}` really are endless!
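
As a small taste of that customization, here is a minimal sketch that adds labels and a theme as extra layers on the penguins scatter plot. The object name `p`, the label text, and the choice of `theme_minimal()` with a bottom legend are illustrative assumptions, not part of the original slides.

```r
library(ggplot2)
library(palmerpenguins)

# `p` is a hypothetical name; the label text below is illustrative only
p <- ggplot(penguins,
            aes(x = flipper_length_mm, y = bill_length_mm, color = species)) +
  geom_point(na.rm = TRUE) +          # na.rm = TRUE drops rows with missing values without a warning
  labs(title = "Flipper length vs. bill length",
       x = "Flipper length (mm)",
       y = "Bill length (mm)",
       color = "Species") +           # `color` renames the legend title
  theme_minimal() +                   # a built-in complete theme
  theme(legend.position = "bottom")   # tweak one element on top of that theme

p
```

`labs()` and `theme()` are added with `+` just like geoms, so they slot into any of the plots built earlier.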
--

For example, you can change your axis labels, tweak the theme of the plot, etc...

--

***

Good resources:

https://ggplot2.tidyverse.org/reference/theme.html

http://r-statistics.co/Complete-Ggplot2-Tutorial-Part2-Customizing-Theme-With-R-Code.html

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Eric Ekholm](https://github.com/ekholme/tidytuesday#most-recent-finished-contribution)]

<img src="images/ggplot_ex_1.png" width="60%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Eric Ekholm](https://github.com/ekholme/tidytuesday#most-recent-finished-contribution)]

<img src="images/ggplot_ex_2.jpg" width="60%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Georgios Karamanis](https://github.com/gkaramanis/tidytuesday#highlights-click-on-image-to-go-to-code-)]

<img src="images/ggplot_ex_3.png" width="60%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Georgios Karamanis](https://github.com/gkaramanis/tidytuesday#highlights-click-on-image-to-go-to-code-)]

<img src="images/ggplot_ex_4.png" width="40%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Jake Kaupp](https://github.com/jkaupp/tidytuesdays)]

<img src="images/ggplot_ex_5.jpg" width="80%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Georgios Karamanis](https://github.com/gkaramanis/tidytuesday#highlights-click-on-image-to-go-to-code-)]

<img src="images/ggplot_ex_6.jpg" width="80%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Torsten Sprenger](https://github.com/spren9er/tidytuesday)]

<img src="images/ggplot_ex_7.jpeg" width="60%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Ariane Aumaitre](https://twitter.com/ariamsita?s=20&t=e0M_tAtBxKKKdjlWpykazw)]

<img src="images/ggplot_ex_8.jpeg" width="80%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Cara Thompson](https://github.com/cararthompson/tidytuesdays)]

<img src="images/ggplot_ex_9.jpeg" width="60%" style="display: block; margin: auto;" />

---
background-image: url(images/hex/ggplot2.png)
background-position: 90% 5%
background-size: 10%

# Example 🤯

.footnote[Image from [Cameron S. Kay](https://twitter.com/cameronskay/status/1552003928295817217/photo/1) 🙃]

<img src="images/ggplot_ex_10.jpeg" width="100%" />

---
class: inverse, center, middle

# Q & A
---
class: inverse, center, middle

# Break!