2B Lab 1 Week 2

This is the pair coding activity related to Chapter 8.

Task 1: Open the R project for the lab

Task 2: Create a new `.Rmd` file

… and name it something useful. If you need help, have a look at Section 1.3.

Task 3: Load in the library and read in the data

The data should already be in your project folder. If you want a fresh copy, you can download the data again here: data_pair_coding.

We are using the packages rstatix, tidyverse, qqplotr, and lsr today. Make sure to load rstatix before tidyverse.

We also need to read in dog_data_clean_wide.csv. Again, I’ve named my data object dog_data_wide to shorten the name, but feel free to use whatever object name is intuitive to you.

For the plot, we will need the data in long format. We can either read in dog_data_clean_long.csv to take a shortcut, or wrangle the data from dog_data_wide. I’ve taken the shortcut and named my data object dog_data_long.

Task 4: Tidy data for a paired t-test

Not much tidying to do for today.

Pick a variable of interest, select its pre- and post-scores, and calculate the difference score. Store them in a separate data object with a meaningful name.

I will use Loneliness as an example and call my data object dog_lonely. Regardless of which variable you choose, your data object should look similar to the table below (first five rows shown).

RID	Loneliness_pre	Loneliness_post	Loneliness_diff
1	2.25	1.70	-0.55
2	1.90	1.60	-0.30
3	2.25	2.25	0.00
4	1.75	2.05	0.30
5	2.85	2.70	-0.15
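To see the sign convention on concrete numbers, here is a minimal base-R sketch that rebuilds the Loneliness_diff column from the five rows shown in the table. It assumes a hand-typed data frame (`lonely_example` is a hypothetical name) instead of the CSV. It also uses base R rather than the tidyverse pipeline from the solution below.

```r
# Five example rows copied from the table above.
# `lonely_example` is a hypothetical object name; the lab reads the full data from CSV.
lonely_example <- data.frame(
  RID             = 1:5,
  Loneliness_pre  = c(2.25, 1.90, 2.25, 1.75, 2.85),
  Loneliness_post = c(1.70, 1.60, 2.25, 2.05, 2.70)
)

# Difference score = post minus pre, so negative values mean
# loneliness went down after the intervention.
lonely_example$Loneliness_diff <-
  lonely_example$Loneliness_post - lonely_example$Loneliness_pre

# Rounded to 2 decimals, this should match the Loneliness_diff column in the table
print(round(lonely_example$Loneliness_diff, 2))
```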

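In dog_data_long, we want to turn Stage into a factor so we can re-order its levels (i.e., “pre” before “post”). By default, R sorts character values alphabetically, which would put “post” before “pre” in the plot.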
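A tiny base-R illustration of why the explicit `levels` argument matters. The vector `stage` is a hypothetical stand-in for the Stage column.

```r
# Hypothetical example vector standing in for the Stage column
stage <- c("pre", "post", "pre", "post")

# Default: levels are sorted alphabetically, so "post" comes first
stage_default <- factor(stage)
levels(stage_default)   # "post" "pre"

# Supplying levels explicitly fixes the order used in plots and tables
stage_ordered <- factor(stage, levels = c("pre", "post"))
levels(stage_ordered)   # "pre" "post"
```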

Solution for Tasks 3 and 4

## Task 3
library(rstatix)   # load rstatix before tidyverse
library(tidyverse)
library(qqplotr)
library(lsr)

dog_data_wide <- read_csv("data/dog_data_clean_wide.csv")

Rows: 284 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): GroupAssignment, Year_of_Study, Live_Pets
dbl (21): RID, Age_Yrs, Consumer_BARK, Flourishing_pre, Flourishing_post, PA...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

dog_data_long <- read_csv("data/dog_data_clean_long.csv")

Rows: 568 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): GroupAssignment, Year_of_Study, Live_Pets, Stage
dbl (12): RID, Age_Yrs, Consumer_BARK, Flourishing, PANAS_PA, PANAS_NA, SHS,...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## Task 4
dog_lonely <- dog_data_wide %>% 
  select(RID, Loneliness_pre, Loneliness_post) %>% 
  mutate(Loneliness_diff = Loneliness_post - Loneliness_pre)

dog_data_long <- dog_data_long %>% 
  mutate(Stage = factor(Stage,
                        levels = c("pre", "post")))

Task 5: Compute descriptives

We want to compute the mean and standard deviation (SD) of:

the pre-scores
the post-scores, and
the difference scores

Store them in a data object called descriptives.

Solution

descriptives <- dog_lonely %>% 
  summarise(mean_pre = mean(Loneliness_pre),
            sd_pre = sd(Loneliness_pre),
            mean_post = mean(Loneliness_post),
            sd_post = sd(Loneliness_post),
            diff = mean(Loneliness_diff),
            sd_diff = sd(Loneliness_diff))
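The mean of the difference scores always equals the difference of the two means. In the full data, 1.914298 − 2.040187 ≈ −0.1259, which matches diff in the descriptives. The same shortcut does not work for the SD. The sketch below checks both points on the five example rows from Task 4. `pre`, `post` and `d` are hypothetical names.

```r
# Five example rows from the Task 4 table (hypothetical vectors)
pre  <- c(2.25, 1.90, 2.25, 1.75, 2.85)
post <- c(1.70, 1.60, 2.25, 2.05, 2.70)
d    <- post - pre

# The mean of the differences equals the difference of the means ...
mean_diff_direct     <- mean(d)
mean_diff_from_means <- mean(post) - mean(pre)
all.equal(mean_diff_direct, mean_diff_from_means)   # TRUE

# ... but the SD of the differences is NOT the difference of the SDs:
# its variance also depends on the covariance between pre and post.
sd(d)
sd(post) - sd(pre)
all.equal(var(d), var(pre) + var(post) - 2 * cov(pre, post))   # TRUE
```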

Task 6: Check assumptions

Assumption 1: Continuous DV

Is the dependent variable (DV) continuous? Answer:

Yes. The DV is the difference in loneliness scores and it is continuous.
No. The DV is the pre- and post-stages and it is categorical.
No. The DV is the difference in loneliness scores but it is categorical.

Assumption 2: Data are independent

Each pair of values in the dataset has to be independent of every other pair, meaning each pair of values needs to come from a separate participant. Answer:
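Independence is mainly a question of study design, so no code can prove it. One quick structural check in wide format is that each participant ID appears only once. This is a minimal sketch, assuming a hypothetical stand-in data frame `dog_lonely_example` built from the Task 4 rows.

```r
# Hypothetical stand-in for dog_lonely: one row per participant (RID)
dog_lonely_example <- data.frame(
  RID             = 1:5,
  Loneliness_pre  = c(2.25, 1.90, 2.25, 1.75, 2.85),
  Loneliness_post = c(1.70, 1.60, 2.25, 2.05, 2.70)
)

# In wide format, one pair per participant means each RID occurs exactly once.
# anyDuplicated() returns 0 when there are no repeated IDs.
one_row_per_participant <- anyDuplicated(dog_lonely_example$RID) == 0
one_row_per_participant
```

On the real data, the same check would be run on `dog_lonely$RID`.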

Assumption 3: Normality

Looking at the violin-boxplots below, do you think the assumption of normality holds?

Note

The axis label of Plot 2 turned out to be quite long here. I’ve used the escape character \n to break it up across 2 lines.

## Plot 1
ggplot(dog_data_long, aes(x = Stage, y = Loneliness, fill = Stage)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.4, alpha = 0.8) +
  scale_fill_viridis_d(guide = "none") +
  theme_classic() +
  labs(x = "Time point", y = "mean Loneliness Scores")

## Plot 2
ggplot(dog_lonely, aes(x = "", y = Loneliness_diff)) +
  geom_violin(fill = "#21908C", alpha = 0.5) +
  geom_boxplot(fill = "#21908C", width = 0.4) +
  theme_classic() +
  labs(x = "",
       y = "Difference in mean Loneliness scores \nbetween pre- and post-intervention") # \n forces a manual line break in the axis label

Plots displayed to assess normality assumption

Answer:

Yes, because both pre- and post-scores in Plot 1 are approximately normally distributed.
Yes, because the difference scores in Plot 2 are approximately normally distributed.
No, because both pre- and post-scores in Plot 1 are extremely skewed.
No, because the difference scores in Plot 2 are extremely skewed.
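The lab judges normality visually. As a supplementary check that is not part of the original lab, base R's `shapiro.test()` runs a Shapiro-Wilk test on the difference scores. Treat this sketch as a demonstration of the call only. With five hypothetical values it has almost no power, and with large samples (284 here) even trivial departures from normality can come out significant. In your own analysis you would pass `dog_lonely$Loneliness_diff`.

```r
# Difference scores (post - pre) for the five example rows from Task 4
loneliness_diff <- c(-0.55, -0.30, 0.00, 0.30, -0.15)

# Shapiro-Wilk test of normality (stats package, loaded by default).
# A small p-value (e.g., < .05) suggests the differences depart from normality.
sw <- shapiro.test(loneliness_diff)
sw$statistic   # W statistic
sw$p.value
```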

Conclusion from assumption tests

With all assumptions tested, which statistical test would you recommend for this analysis?

Answer:

All assumptions held. We will conduct a paired-samples t-test.
The assumption of normality was violated. We will conduct a Wilcoxon signed-rank test.
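If normality had been violated, the base-R call for the Wilcoxon signed-rank test mirrors the paired t-test syntax. This is a sketch on the five example rows (hypothetical vectors `pre` and `post`). `exact = FALSE` is chosen because one pair (RID 3) has a zero difference, so an exact p-value cannot be computed anyway.

```r
# Five example rows from Task 4 (hypothetical stand-ins for the dog_lonely columns)
pre  <- c(2.25, 1.90, 2.25, 1.75, 2.85)
post <- c(1.70, 1.60, 2.25, 2.05, 2.70)

# Wilcoxon signed-rank test: the non-parametric alternative to a paired t-test.
# exact = FALSE uses the normal approximation (with continuity correction).
wsr <- wilcox.test(pre, post, paired = TRUE, exact = FALSE)
wsr$statistic   # V statistic (sum of ranks of the positive differences)
wsr$p.value
```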

Task 7: Compute a paired-samples t-test with effect size and interpret the output

Step 1: Compute the paired-samples t-test. The structure of the function is as follows:

t.test(your_data$var1, your_data$var2, paired = TRUE)

Solution

t.test(dog_lonely$Loneliness_pre, dog_lonely$Loneliness_post, paired = TRUE)
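Under the hood, a paired-samples t-test is a one-sample t-test on the difference scores: t = mean(d) / (sd(d) / √n) with n − 1 degrees of freedom. The sketch below checks that by hand on the five example rows. `pre`, `post`, `t_hand`, `df_hand` and `p_hand` are hypothetical names. Note that `t.test()` computes the differences as its first argument minus its second (here pre − post).

```r
pre  <- c(2.25, 1.90, 2.25, 1.75, 2.85)
post <- c(1.70, 1.60, 2.25, 2.05, 2.70)

# Built-in paired t-test (differences are pre - post, i.e., x - y)
paired_res <- t.test(pre, post, paired = TRUE)

# The same test by hand: a one-sample t-test on the difference scores
d       <- pre - post
n       <- length(d)
t_hand  <- mean(d) / (sd(d) / sqrt(n))
df_hand <- n - 1
p_hand  <- 2 * pt(abs(t_hand), df_hand, lower.tail = FALSE)   # two-sided p-value

all.equal(unname(paired_res$statistic), t_hand)   # TRUE
all.equal(paired_res$p.value, p_hand)             # TRUE
```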

Step 2: Calculate an effect size

Calculate Cohen’s d. The structure of the function is as follows:

cohensD(your_data$var1, your_data$var2, method = "paired")

Solution

cohensD(dog_lonely$Loneliness_pre, dog_lonely$Loneliness_post, method = "paired")
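For paired data, Cohen's d is the mean difference divided by the SD of the differences. Our understanding is that this is what `cohensD(..., method = "paired")` from lsr computes, but check the lsr documentation to confirm. The printed values agree with it: 0.1258895 / 0.2290269 ≈ 0.5497, matching the `[1] 0.5496716` output below. The sketch uses only base R on the five example rows (hypothetical vectors). It takes the absolute value because the lab reports d as a positive number.

```r
pre  <- c(2.25, 1.90, 2.25, 1.75, 2.85)
post <- c(1.70, 1.60, 2.25, 2.05, 2.70)

# Cohen's d for paired samples: |mean of differences| / SD of differences
d_scores        <- pre - post
cohens_d_paired <- abs(mean(d_scores)) / sd(d_scores)
cohens_d_paired
```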

Step 3: Interpreting the output

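Below are the outputs for the descriptive statistics (table), the paired-samples t-test (main output), and Cohen’s d (last line, starting with [1]). Based on these, write up the results in APA style and provide an interpretation. Note that t.test() subtracts its second argument from its first (pre − post), so its mean difference is positive (0.1258895), whereas diff in the descriptives (post − pre) is negative (−0.1258895). Both describe the same drop in loneliness.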

mean_pre	sd_pre	mean_post	sd_post	diff	sd_diff
2.040187	0.5304488	1.914298	0.5344914	-0.1258895	0.2290269


    Paired t-test

data:  dog_lonely$Loneliness_pre and dog_lonely$Loneliness_post
t = 9.2632, df = 283, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 0.09913876 0.15264034
sample estimates:
mean difference 
      0.1258895

[1] 0.5496716
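As a cross-check before writing up, a few standard formulas link the three outputs. Plugging in the printed values, with n = 284 participants (df + 1), reproduces t and d up to rounding. Here \(\bar{d}\) is the mean of the pre − post differences and \(s_d\) is their SD.

```latex
\begin{aligned}
\bar{d} &= \bar{x}_{\text{pre}} - \bar{x}_{\text{post}} = 2.040187 - 1.914298 = 0.125889 \\
SE_{\bar{d}} &= \frac{s_d}{\sqrt{n}} = \frac{0.2290269}{\sqrt{284}} \approx 0.01359 \\
t &= \frac{\bar{d}}{SE_{\bar{d}}} \approx \frac{0.1258895}{0.01359} \approx 9.26, \qquad df = n - 1 = 283 \\
d &= \frac{\bar{d}}{s_d} = \frac{0.1258895}{0.2290269} \approx 0.5497, \qquad t = d\sqrt{n} \approx 0.5497 \times 16.852 \approx 9.26
\end{aligned}
```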

We hypothesised that there would be a significant difference between Loneliness measured before (M = ___, SD = ___) and after (M = ___, SD = ___) the dog intervention. On average, participants felt less lonely after the intervention (M_diff = ___, SD_diff = ___). Using a paired-samples t-test, the effect was found to be ___ and of a ___ magnitude, t(___) = ___, p ___, d = ___. We therefore ___.
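Rather than copying numbers by hand, you can pull them out of the object that `t.test()` returns and format them for an APA-style sentence. This is a minimal sketch on the five example rows. `res`, `p_text` and `apa_string` are hypothetical names. The formatting follows the common APA conventions of reporting very small p-values as "p < .001" and dropping the leading zero.

```r
pre  <- c(2.25, 1.90, 2.25, 1.75, 2.85)
post <- c(1.70, 1.60, 2.25, 2.05, 2.70)
res  <- t.test(pre, post, paired = TRUE)

# Pull the pieces an APA-style report needs out of the "htest" object
t_value <- unname(res$statistic)
df_val  <- unname(res$parameter)
p_value <- res$p.value

# Report very small p-values as "p < .001"; otherwise 3 decimals without leading zero
p_text <- if (p_value < .001) {
  "p < .001"
} else {
  paste0("p = ", sub("^0", "", sprintf("%.3f", p_value)))
}

apa_string <- sprintf("t(%d) = %.2f, %s", as.integer(df_val), t_value, p_text)
apa_string
```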
