2A Lab 5 Week 7

This is the pair coding activity related to Chapter 5.

Task 1: Open the R project for the lab

Task 2: Create a new .Rmd file

… and name it something useful. If you need help, have a look at Section 1.3.

Task 3: Load in the library and read in the data

The data should already be in your project folder. If you want a fresh copy, you can download the data again here: data_pair_coding.

We are using the package tidyverse today, and the data file we need to read in is dog_data_clean_wide.csv. I’ve named my data object dog_data_wide to shorten the name but feel free to use whatever object name sounds intuitive to you.

Task 4: Re-create one of the 3 plots below

Re-create one of the 3 plot below:

  • grouped barchart (easy)
  • violin-boxplot (medium)
  • scatterplot (hard)

Difficulty level: easy

  • I’ve created a new data object data_bar to select the relevant variables but you could also work straight from dog_data_wide.
  • Consider turning the 2 categorical variables into factors before plotting
  • Plotting should be relatively straightforward - these are default colours and you would only need to change the axes labels/ legend title.

We can change all of the 3 labels in one go. Check out the ## Prettier grouped barchart in Section 4.5, where we did exactly that.

data_bar <- dog_data_wide %>% 
  select(RID, GroupAssignment, Year_of_Study) %>% 
  mutate(GroupAssignment = factor(GroupAssignment,
                                  levels = c("Direct", "Indirect", "Control")),
         Year_of_Study = factor(Year_of_Study,
                                levels = c("First", "Second", "Third", "Fourth", "Fifth or above")))
ggplot(data_bar, aes(x = GroupAssignment, fill = Year_of_Study)) +
  geom_bar(position = "dodge") +
  labs(x = "Experimental Group", y = "Count", fill = "Year of Study")

Difficulty level: medium

  • I’ve created a new data object data_vb to select the relevant variables but you could also work straight from dog_data_wide.
  • Consider turning the categorical variable into a factor before plotting
  • Plotting tips:
    • the colour scale is one of the viridis options
    • it’s a bit of trial and error for the opacity of the violin and the box width of the boxes (it is totally fine if it looks approximately right)
    • the tricky part might be adjusting the y-axis ticks. Take inspiration from the histogram in Section 5.3 (Tab Axes labels, margins, and breaks)
data_vb <- dog_data_wide %>% 
  select(RID, Year_of_Study, Loneliness_post) %>% 
  mutate(Year_of_Study = factor(Year_of_Study,
                                levels = c("First", "Second", "Third", "Fourth", "Fifth or above")))
ggplot(data_vb, aes(x = Year_of_Study, y = Loneliness_post, fill = Year_of_Study)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.25) +
  scale_y_continuous(breaks = c(seq(from = 1, to = 4, by = 0.5)),
                     limits = c(1, 4)) +
  scale_fill_viridis_d(option = "magma",
                       guide = "none") +
  labs(x = "Year of Study", y = "Loneliness scores post intervention") +
  theme_classic()

Difficulty level: hard

  • Data wrangling: Even though we cleaned the data, it may not be in the shape for the task at hand. Have a look what the data object dog_data_wide looks like and think about how you’d need to restructure it to be able to plot the scatterplot. As always, I would suggest creating a new data object for the scatterplot (e.g., data_scatter).
  • Once you have the data in the right shape, start plotting. Start with a basic scatterplot and then add various layers and change elements you notice.
  • Remember, some finetuning might need to be done in data_scatter rather than plot itself.
RID PANAS_PA_pre PANAS_PA_post PANAS_NA_pre PANAS_NA_post
1 3.2 3.8 1.0 1.2
2 3.0 3.2 1.8 1.0
3 2.8 3.0 1.6 1.6
4 4.2 3.8 1.8 1.6
5 3.4 4.0 2.2 1.6
RID Subscale pre post
1 Positive Affect 3.2 3.8
1 Negative Affect 1.0 1.2
2 Positive Affect 3.0 3.2
2 Negative Affect 1.8 1.0
3 Positive Affect 2.8 3.0
  • Step 1: select the variables you need from dog_data_wide.
  • Step 2: pivot all columns (bar the Participant ID) into long format
  • Step 3: think about how to separate information of the subscales and timepoints
  • Step 4: pivot from long into wide format. Take some inspiration from the Special case: Variables with subscales scenario above.
  • The colour scheme is Dark2 from the colour palette brewer
  • The colour of the trendline is #7570b3
  • Think about how to make the Negative and Positive Affect points different colours. The solution is in Section 5.4
  • Renaming the different facets is one of those things that should be fixed in the data object instead
data_scatter <- dog_data_wide %>% 
  select(RID, starts_with("PANAS")) %>% 
  pivot_longer(cols = -RID, names_to = "Q", values_to = "Values") %>% 
  separate(Q, into = c(NA, "Subscale", "Timepoint"), sep = "_") %>% 
  pivot_wider(names_from = Timepoint, values_from = Values) %>% 
  mutate(Subscale = case_match(Subscale,
                               "NA" ~ "Negative Affect",
                               "PA" ~ "Positive Affect"),
         Subscale = factor(Subscale)) %>% 
  drop_na()
ggplot(data_scatter, aes(x = pre, y = post, colour = Subscale)) +
  geom_point() +
  geom_smooth(method = lm, colour = "#7570b3") +
  facet_wrap(~Subscale) +
  labs(x = "Pre-Intervention (Timepoint 1)",
       y = "Post-Intervention (Timepoint 2)") +
  scale_colour_brewer(palette = "Dark2",
                      guide = "none") +
  theme_bw()

If you are extremely fast, challenge yourself and re-create one of the other plots.