Chapter 8 Data Visualisation 2
Intended Learning Outcomes
- Learn how to make multiple plots at the same time
- Learn the difference between global and local aesthetics
- Learn how to work with themes: adding labels, changing font size, colours and axes ticks.
- Learn to work with legends.
This lesson is led by Greta Todorova.
8.1 Pre-steps
Today, we will be working with ggplot2
which is part of tidyverse
and data from the Scottish Government saved in the file free_movement_uk.csv
. Load tidyverse
into the library, and read the data into our Global Environment
as migration_scot
.
library(tidyverse)
migration_scot <- read_csv("free_movement_uk.csv")
8.2 Re-introduction to the data
This data is openly available from the Scottish Government, and it introduces the flow of people at different ages and sex into and out of Scotland from the rest of UK (RUK) and Overseas. We have several variables to work with:
Variable | Description |
---|---|
FeatureCode | Codes given by the Scottish Government |
DateCode | Year of data collected |
Measurement | What type of measurement it is (here we have only counts i.e. the number of people) |
Units | Units (here we have number of people) |
Value | The actual count (i.e. the number of people) |
Age | Age of the counted people (separate by age, and total (sum of all age groups)) |
Sex | Sex of the counted people (separate by sex, and total (sum all sex groups)) |
Migration Source | Where the people are coming from (Overseas, RUK) |
Migration Type | Whether people are coming or leaving (In, Out, and Net (people coming in less the people leaving)) |
This is the wrangled data from last week.
traffic_scot <- migration_scot %>%
select(DateCode, Value, Sex,
`Migration Source`, `Migration Type`, Age) %>%
filter(DateCode == '2016', Age == 'All',
Sex != 'All', `Migration Type` != 'Net')
8.3 Create multiple plots in one
Sometimes, we have way too many variables and they are all important. You should always avoid having busy plots. They get too confusing, and it is easy for people to misinterpret them.
We are going to look at two ways to create multiple plots.
8.3.1 Select the data beforehand and create different plots.
Simple Task 1:
From the data we just created, select only data relevant for movement to and from the rest of the UK and save it to the Global Environment
as an object called rest_of_uk
.
rest_of_uk <- NULL
rest_of_uk <- traffic_scot %>%
filter(`Migration Source` == 'To-from Rest of UK')
Simple Task 2:
Create a column plot that shows the number of people for each sex using the data we just created using geom_col()
. Separate the data by using different colours based on Migration Type
. Don’t forget to use position = ‘dodge’
so that your columns do not overlap.
ggplot(NULL) + NULL
ggplot(rest_of_uk, aes(x = Sex, y = Value,
fill = `Migration Type`)) +
geom_col(position = 'dodge')
Simple Task 3:
Now plot only the data for the movement to and from Overseas following the previous two steps. This time, make sure you save your wrangled data into an object called overseas
and then use it to create your plot.
overseas <- NULL
ggplot(NULL) + NULL
overseas<- traffic_scot %>%
filter(`Migration Source` == 'To-from Overseas')
ggplot(overseas, aes(x = Sex, y = Value,
fill = `Migration Type`)) +
geom_col(position = 'dodge')
This becomes very cumbersome when we have more than one variable. Imagine you have data for 10 different countries. Can you imagine doing these plots by hand. Moreover, when you have to arrange them on a page to show how things differ you will loose a lot of time. Instead we can use facets.
8.3.2 Facets
Facets allow us to create separate plots without manually separating the data. Moreover we can specify how we want to put the plots in a grid: do we want them side by side or one on top of each other, etc.
We can use several facet functions.
facet_grid(variable_to_split_by)
structures the rows and columns of graphs based on a third variablefacet_grid(.~variable)
creates columns - i.e. side by side panelsfacet_grid(variable~.)
creates rows - i.e. one on top of the otherfacet_grid(variable1~variable2)
creates rows based on variable1 and columns based on variable2
facet_wrap(variable)
creates a grid with rectangular slots for the plots based on your variables
Let’s replicate the two plots we just created but using facet_grid()
and put them side-by-side in one. Now we have both plots next to each other, which makes it much easier to look at the differences in the migration patterns between the two Migration Sources.
ggplot(traffic_scot, aes(x = Sex, y = Value,
fill = `Migration Type`)) +
geom_col(position = 'dodge') +
facet_grid(.~`Migration Source`)
Question Time
Let’s look at these plots.
- Which sex tends to move to Scotland more, regardless of migration location?
- Which migration location do both males and females migrate to more when they leave Scotland?
8.4 Global and Local aesthetics
So far, we have been working with a Global definition of the aesthetics. This allows us to specify the axes and the groupings only once.
Sometimes, we want to use multiple data and put multiple plots on top of each other. For these occasions we can specify the aesthetics at a local level - i.e. in the geoms.
If you remember from last week, we specified both the colour and the shape in our ggplot()
when we were making the line graph. The geom_line()
inherited the colour but not the shape argument. However, if we have two geoms that share the same characteristics but we do not want to have the same colours, we can move them to each of the geoms instead of specifying them in ggplot()
.
Let’s redo the violins and boxplots from last week, and put them together into two separate layers of the same graph. This time, we will give them some colour. We will colour the violins with the fill
argument and make the boxplots transparent using the alpha
argument.
boxes <- migration_scot %>%
filter(Sex == 'Female',
`Migration Source` == 'To-from Overseas',
Age == 'All',
`Migration Type` != 'Net')
#make violins and boxplots
ggplot(boxes, aes(x = `Migration Type`, y = Value)) +
geom_violin(aes(fill = `Migration Type`)) +
geom_boxplot(aes(alpha = 0.5))
Because alpha
(the transparency argument) and fill
are arguments for both box plots and violins, if we had put them in the ggplot()
layer, they would both be inherited by the two geoms.
8.5 Themes: Making your plots pretty: looking the way you want them to
The great thing about making your own plots in R is that you can make it look the way you want it to look. Even better, if you are writing a paper, graphs are easy to adjust to match journals’ criteria. Let’s adjust the look of our graph.
8.5.1 Ready-made themes
ggplot2 comes with several ready made themes. Some of them are:
theme_bw(): a white background with major axes and border. theme_minimal(): a white background with major axes and more. Just use the cheetsheet for visualisations with ggplot for some more examples.
Themes are just like any other layer for ggplot. You just add them at the end of your plot with +
. Let’s add one of these theme formats to our plot above.
ggplot(boxes, aes(x = `Migration Type`, y = Value)) +
geom_violin(aes(fill = `Migration Type`)) +
geom_boxplot(aes(alpha = 0.5)) +
theme_bw()
8.5.2 Adding axes titles and headings
Next, let’s add some titles and better labels with labs()
. You need to specify which labels you want to change - x, y or the title, and type in the labels you want.
ggplot(boxes, aes(x = `Migration Type`, y = Value)) +
geom_violin(aes(fill = `Migration Type`)) +
geom_boxplot(aes(alpha = 0.5)) +
theme_bw()+
labs(title = 'Female migration to and from Overseas into Scotland',
x = 'Migration direction',
y = 'Number of people')
Sometimes, we need the titles to be a specific size. We can change all of that in an additional theme()
layer. We specify the argument we want to change and what we need to change. In our case, we want to change the size of the x axis title to 12pt, and the graph title to 20pt. Let’s also pretend we do not want the y axis title. We can also specify this in the theme()
layer.
We do this by specifying element_text()
and element_blank()
for the correct theme arguments (axis.title.x
, title
, axis.title.x
). Simply, for that element of the plot, what do we want to change - text and to make it blank.
ggplot(boxes, aes(x = `Migration Type`, y = Value)) +
geom_violin(aes(fill = `Migration Type`)) +
geom_boxplot(aes(alpha = 0.5)) +
theme_bw()+
labs(title = 'Female migration to and from\nOverseas into Scotland',
x = 'Migration direction',
y = 'Number of people') +
theme(
axis.title.x = element_text(size = 12),
# read this as: change the size of element_text of the
#title of the x axis
title = element_text(size = 20),
axis.title.y = element_blank()
# read this as: change the title of the y axis to blank
)
8.6 Changing colours
We can rely on the colour schemes automatically, or we can manually specify colours. To change any of the colours, we need to specify which scale we are changing. This is done by specifying the aesthetic. In our case, we are changing the colours of the fill
aesthetic so we use scale_fill_manual()
. Instead of using names, you can use any hex colour you want. A hex color is the representation of a color using a 6 digit code. scale_fill_manual()
works like another layer, which overwrites the automatic colour schemes. Because it overwrites an existing function, it has to be after the geom that uses the fill
argument.
ggplot(boxes, aes(x = `Migration Type`, y = Value)) +
geom_violin(aes(fill = `Migration Type`)) +
geom_boxplot(aes(alpha = 0.5)) +
scale_fill_manual(values = c('#127d69', '#cedc00')) +
#here I have picked two specific colours
theme_bw() +
labs(title = 'Female migration to and from\nOverseas into Scotland',
x = 'Migration direction',
y = 'Number of people') +
theme(
axis.title.x = element_text(size = 12),
title = element_text(size = 20),
axis.title.y = element_blank()
)
8.7 Legends
We can see that for any additional aesthetic we have put, ggplot has created a legend on the side.
Now let’s change what the labels actually say. We do this in the same place we specified the colours scale_fill_manual
because again, we are overwriting the automatic fill
labels.
Let’s also: 1) hide the legend associated with the box plots; 2) move the legend to the bottom of the graph using the legend.position
argument; and 3) change the size of the legend labels (using legend.title
). We do step 3 in the theme()
layer as well. Because we have more than one legend, everything we do will be applied to both legends. This is a good idea for consistency. But when we want to hide one of the legends, we have to specify that in the geom itself using the argument show.legend
.
ggplot(boxes, aes(x = `Migration Type`, y = Value)) +
geom_violin(aes(fill = `Migration Type`)) +
geom_boxplot(aes(alpha = 0.5), show.legend = FALSE) +
scale_fill_manual(values = c('#127d69', '#cedc00'),
labels = c('Coming to Scotland',
'Leaving Scotland')) +
theme_bw() +
labs(title = 'Female migration to and from\nOverseas into Scotland',
x = 'Migration direction',
y = 'Number of people') +
theme(
axis.title.x = element_text(size = 12),
title = element_text(size = 20),
axis.title.y = element_blank(),
legend.position = 'bottom', #move the legend to the bottom
legend.title = element_text(size = 10)
)
8.8 Changing axes ticks
Now, we also need to change what is stated on our x axis to match the legend. We use scale_x_discrete()
to give new names inside graph, so that we do not have to change our dataset. We specify the labels of the new groups by giving the old name in quotes and then the new name, also in quotes. We use scale_x_discrete()
, because we have discrete groups - a categorical variable.
ggplot(boxes, aes(x = `Migration Type`, y = Value)) +
geom_violin(aes(fill = `Migration Type`)) +
geom_boxplot(aes(alpha = 0.5), show.legend = FALSE) +
scale_fill_manual(values = c('#127d69', '#cedc00'),
labels = c('Coming to Scotland',
'Leaving Scotland')) +
scale_x_discrete(labels = c('In' = 'Coming to Scotland',
'Out' = 'Leaving Scotland')) +
theme_bw() +
labs(title = 'Female migration to and from\nOverseas into Scotland',
x = 'Migration direction',
y = 'Number of people') +
theme(
axis.title.x = element_text(size = 12),
title = element_text(size = 20),
axis.title.y = element_blank(),
legend.position = 'bottom',
legend.title = element_text(size = 10)
)
Let’s also say that we are plotting data which is continuous but we want to show each of the values. For example we have years and we want to show every year. ggplot
will try and skip some of the years to make the graph look neater. There are several ways we can change that. We can either make the variable into a factor or a character using as.character
so that ggplot
will be forced to look at it as a discrete variable or we can change the ticks of the x axis by giving them limits
and breaks
in the layer that overwrites the original x values - scale_x_continuous()
.
Let’s use the graph from last week which showed us the change in migration across the years.
# First, we are getting the data from last week to recreate the code.
traffic_scot2 <- migration_scot %>%
filter(Age == 'All',
`Migration Type` != 'Net',
Sex == 'All') %>%
select(DateCode, Value,`Migration Source`, `Migration Type`)
ggplot(traffic_scot2, aes(DateCode, Value,
shape = `Migration Source`,
colour = `Migration Type`)) +
geom_point() +
geom_line() +
scale_x_continuous(limits = c(2002,2018), breaks = c(2002:2018))
8.9 Saving plots
If you want to save your plot to your working directory, you can use the function ggsave
. First, you have to save your plot into an object in the Global Environment
.
# Save your plot to the Global Environment.
plot1 <- ggplot(traffic_scot2, aes(DateCode, Value,
shape = `Migration Source`,
colour = `Migration Type`)) +
geom_point() +
geom_line()
# Then save it to your working directory folder
ggsave('myplot.png', plot1, width = 9 , height = 6)
8.10 Summative Homework
The fourth summative assignment is available on moodle now.
Good luck.
Check that your Rmd file knits into a html file before submitting. Upload your Rmd file (not the knitted html) to moodle.