2A Lab 2 Week 3

This is the pair coding activity related to Chapter 2.

We will continue working with the data from Binfet et al. (2021), focusing on the randomised controlled trial of therapy dog interventions. Today, our goal is to calculate an average Flourishing score for each participant at time point 1 (pre-intervention) using the raw data file dog_data_raw. Currently, the data looks like this:

RID F1_1 F1_2 F1_3 F1_4 F1_5 F1_6 F1_7 F1_8
1 6 7 5 5 7 7 6 6
2 5 7 6 5 5 5 5 4
3 5 5 5 6 6 6 5 5
4 7 6 7 7 7 6 7 4
5 5 5 4 6 7 7 7 6

However, we want the data to look like this:

RID Flourishing_pre
1 6.125
2 5.250
3 5.375
4 6.375
5 5.875
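As a quick sanity check of the target table, each Flourishing_pre value is simply the mean of that participant's eight F1_ items. The short sketch below recomputes participant 1's score by hand from the values in the first table; the object name rid1_items is purely illustrative.

```r
# Participant 1's eight pre-intervention Flourishing items (copied from the first table)
rid1_items <- c(6, 7, 5, 5, 7, 7, 6, 6)

# The items sum to 49, and 49 / 8 items = 6.125, matching Flourishing_pre for RID 1
mean(rid1_items)
```

The same arithmetic gives 5.250, 5.375, 6.375 and 5.875 for participants 2 to 5.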

Task 1: Open the R project you created last week

If you haven’t created an R project for the lab yet, please do so now. If you already have one set up, go ahead and open it.

Task 2: Open your .Rmd file from last week

Since we haven’t used it much yet, feel free to keep working in the .Rmd file you created in Task 2 last week.

Task 3: Load in the library and read in the data

The data should be in your project folder. If you didn’t download it last week, or if you’d like a fresh copy, you can download the data again here: data_pair_coding.

We will be using the tidyverse package today, and the data file we need to read in is dog_data_raw.csv.

# load the tidyverse package
library(???)

# reading in `dog_data_raw.csv`
dog_data_raw <- read_csv("???")

Task 4: Calculating the mean for Flourishing_pre

  • Step 1: Select all relevant columns from dog_data_raw, including participant ID and all items from the Flourishing questionnaire completed before the intervention. Store this data in an object called data_flourishing.

Look at the codebook. Try to determine:

  • The variable name of the column where the participant ID is stored.
  • The items related to the Flourishing scale at the pre-intervention stage.

From the codebook, we know that:

  • The participant ID column is called RID.
  • The Flourishing items at the pre-intervention stage start with F1_.
data_flourishing <- ??? %>% 
  select(???, F1_???:F1_???)
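If the colon in F1_???:F1_??? is unfamiliar, here is a minimal sketch of how select() treats a range. It uses a small hypothetical tibble (toy, with a made-up column called other) rather than dog_data_raw, so none of these values come from the real data.

```r
library(tidyverse)

# Hypothetical mini data set: an ID, three items, and one column we do not want
toy <- tibble(
  RID   = 1:2,
  F1_1  = c(6, 5),
  F1_2  = c(7, 7),
  F1_3  = c(5, 6),
  other = c("a", "b")  # hypothetical unrelated column
)

# The colon keeps every column from F1_1 up to and including F1_3,
# in the order they appear in the data; "other" is dropped
toy_selected <- toy %>%
  select(RID, F1_1:F1_3)

names(toy_selected)  # RID, F1_1, F1_2, F1_3
```

Because the colon works by column position, it relies on the Flourishing items sitting next to each other in dog_data_raw, which is worth confirming against the codebook. starts_with("F1_") is a position-independent alternative, assuming no other columns begin with F1_.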

  • Step 2: Pivot the data from wide format to long format so we can calculate the average score more easily (in Step 3).

Which pivot function should you use? We have pivot_wider() and pivot_longer() to choose from.

That function also needs three arguments:

  • The columns you want to select (e.g., all the Flourishing items),
  • The name of the column where the current column headings will be stored (e.g., “Questionnaire”),
  • The name of the column that should store all the values (e.g., “Responses”).

We need pivot_longer(). You already encountered pivot_longer() in first year (or in the individual walkthrough if you have already completed this chapter). The three arguments were also a giveaway: pivot_wider() needs only two (names_from and values_from).

  pivot_longer(cols = ???, names_to = "???", values_to = "???")
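To see what the three arguments do, here is a small sketch on a hypothetical wide tibble (toy_wide) with two participants and three items. The column names Questionnaire and Responses follow the examples given above; the values are invented for illustration.

```r
library(tidyverse)

# Hypothetical wide data: 2 participants x 3 items, one row per participant
toy_wide <- tibble(
  RID  = c(1, 2),
  F1_1 = c(6, 4),
  F1_2 = c(7, 5),
  F1_3 = c(5, 6)
)

# The item columns are stacked: their old names go into "Questionnaire"
# and their cell values go into "Responses"
toy_long <- toy_wide %>%
  pivot_longer(cols = F1_1:F1_3, names_to = "Questionnaire", values_to = "Responses")

toy_long  # 2 participants x 3 items = 6 rows: RID, Questionnaire, Responses
```

Here cols = -RID ("every column except RID") would pick the same three columns, because RID is the only non-item column in toy_wide.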

  • Step 3: Calculate the average Flourishing score per participant and name this column Flourishing_pre to match the table above.

Before summarising the mean, you may need to group the data.

To compute an average score per participant, we need to group by participant ID first.

  group_by(???) %>% 
  summarise(Flourishing_pre = mean(???)) %>% 
  ungroup()
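Why group first? The sketch below contrasts summarise() with and without group_by() on a hypothetical long tibble (toy_long, with made-up values). The summary column names overall_mean and mean_score are illustrative only.

```r
library(tidyverse)

# Hypothetical long data: 2 participants, 3 responses each
toy_long <- tibble(
  RID           = c(1, 1, 1, 2, 2, 2),
  Questionnaire = rep(c("F1_1", "F1_2", "F1_3"), times = 2),
  Responses     = c(6, 7, 5, 4, 5, 6)
)

# Without group_by(), summarise() collapses all six responses into one mean (5.5)
toy_overall <- toy_long %>%
  summarise(overall_mean = mean(Responses))

# With group_by(RID), summarise() returns one row per participant (6 and 5);
# ungroup() removes any remaining grouping so later steps are not affected
toy_means <- toy_long %>%
  group_by(RID) %>%
  summarise(mean_score = mean(Responses)) %>%
  ungroup()

toy_overall
toy_means
```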

# load the tidyverse package
library(tidyverse)

# reading in `dog_data_raw.csv`
dog_data_raw <- read_csv("dog_data_raw.csv")

# Task 4: Tidying 
data_flourishing <- dog_data_raw %>% 
  # Step 1
  select(RID, F1_1:F1_8) %>% 
  # Step 2
  pivot_longer(cols = -RID, names_to = "Questionnaire", values_to = "Responses") %>% 
  # Step 3
  group_by(RID) %>% 
  summarise(Flourishing_pre = mean(Responses)) %>% 
  ungroup()
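If you want to check the pipeline without the CSV, one option is to run the same three steps on the five rows shown at the top of this page. In the sketch below, dog_data_preview is a hypothetical stand-in built with tribble() from those rows only; it is not the full dog_data_raw.

```r
library(tidyverse)

# The first five rows of the raw data, copied from the first table.
# A stand-in for dog_data_raw so the pipeline can run without the file.
dog_data_preview <- tribble(
  ~RID, ~F1_1, ~F1_2, ~F1_3, ~F1_4, ~F1_5, ~F1_6, ~F1_7, ~F1_8,
  1,    6,     7,     5,     5,     7,     7,     6,     6,
  2,    5,     7,     6,     5,     5,     5,     5,     4,
  3,    5,     5,     5,     6,     6,     6,     5,     5,
  4,    7,     6,     7,     7,     7,     6,     7,     4,
  5,    5,     5,     4,     6,     7,     7,     7,     6
)

# Same three steps as the solution: select, pivot longer, group and summarise
preview_flourishing <- dog_data_preview %>%
  select(RID, F1_1:F1_8) %>%
  pivot_longer(cols = -RID, names_to = "Questionnaire", values_to = "Responses") %>%
  group_by(RID) %>%
  summarise(Flourishing_pre = mean(Responses)) %>%
  ungroup()

preview_flourishing  # should match the target table: 6.125, 5.250, 5.375, 6.375, 5.875
```

Note that mean() returns NA for any participant with a missing item; adding na.rm = TRUE would instead average only the items that participant answered.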