2A Lab 8 Week 10

This is the pair coding activity related to Chapter 7.

Task 1: Open the R project for the lab

Task 2: Create a new `.Rmd` file

… and name it something useful. If you need help, have a look at Section 1.3.

Task 3: Load in the library and read in the data

The data should already be in your project folder. If you want a fresh copy, you can download the data again here: data_pair_coding.

We are using the packages tidyverse, car, and lsr today, and the data file we need to read in is dog_data_clean_wide.csv. I’ve named my data object dog_data_wide to shorten the name but feel free to use whatever object name sounds intuitive to you.

If you have not worked through Chapter 7 yet, you may need to install a few packages before you can load them into the library. For example, if car is missing, run install.packages("car") in your CONSOLE.
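
If you are unsure which of the three packages are already installed, a small check like the one below can tell you. This is only a sketch in base R: the object names `needed`, `installed`, and `missing_pkgs` are made up for illustration, and the code only prints a suggestion rather than installing anything.

```r
# Packages used in this lab
needed <- c("tidyverse", "car", "lsr")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

# Suggest the install command for anything that is missing
if (length(missing_pkgs) > 0) {
  message("Run in the console: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All packages for this lab are installed.")
}
```

Keep the actual install.packages() call in the console rather than in your `.Rmd` file, so the packages are not reinstalled every time you knit.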

Task 4: Tidy data for a two-sample t-test

For today’s task, we want to analyse how students’ psychological well-being scores differed at the post_intervention time point. Specifically, we will compare the scores of students who directly interacted with the dogs (Group direct) to those who only talked to the dog handlers (Group control).

To achieve that, we need to select all relevant columns from dog_data_wide, and narrow down the dataframe to only include students assigned either to the direct or the control groups.

Step 1: Select all relevant columns from dog_data_wide. For the task at hand, those would be the participant ID RID, GroupAssignment, and Flourishing_post. Store this data in an object called dog_independent.
Step 2: Narrow down dog_independent to only include the GroupAssignment groups direct and control.
Step 3: Convert GroupAssignment into a factor.

Hints

dog_independent <- ??? %>% 
  # Step 1
  select(???, ???, ???) %>% 
  # Step 2
  filter(??? %in% c(???, ???)) %>% 
  # Step 3
  mutate(GroupAssignment = ???())

Solution for Tasks 3 and 4

# loading tidyverse, car, and lsr into the library
library(tidyverse)
library(car)
library(lsr)

# reading in `dog_data_clean_wide.csv`
dog_data_wide <- read_csv("dog_data_clean_wide.csv")

# Task 4: Tidying 
dog_independent <- dog_data_wide %>% 
  # Step 1
  select(RID, GroupAssignment, Flourishing_post) %>% 
  # Step 2
  filter(GroupAssignment %in% c("Control", "Direct")) %>% 
  # Step 3
  mutate(GroupAssignment = factor(GroupAssignment))
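
Two details of this solution are easy to miss: `%in%` is case-sensitive, and the order of Steps 2 and 3 matters. The sketch below illustrates both with a toy data frame in base R. All values are invented, and the third group label "Indirect" is a hypothetical stand-in for whichever groups are filtered out.

```r
# Toy data standing in for dog_data_wide; all values are invented
toy <- data.frame(
  RID = 1:6,
  GroupAssignment = c("Control", "Direct", "Indirect",
                      "Control", "Direct", "Indirect"),
  Flourishing_post = c(5.5, 6.0, 5.8, 5.9, 6.2, 5.1)
)

# %in% is case-sensitive: lowercase labels match no rows at all
n_lowercase <- sum(toy$GroupAssignment %in% c("control", "direct"))

# Filter first, then convert: only the two retained groups become levels
kept <- toy[toy$GroupAssignment %in% c("Control", "Direct"), ]
filter_then_factor <- factor(kept$GroupAssignment)
levels(filter_then_factor)

# Convert first, then filter: the unused third level lingers
factor_then_filter <- factor(toy$GroupAssignment)
factor_then_filter <- factor_then_filter[factor_then_filter %in% c("Control", "Direct")]
levels(factor_then_filter)
```

A leftover empty level can trip up functions that expect exactly two groups, which is why the solution filters before calling factor().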

Task 5: Compute descriptives

Calculate the sample size (n), the mean, and the standard deviation of the psychological well-being score for both groups. Save the output in an object called dog_independent_descriptives. The resulting dataframe should look like this:

GroupAssignment	n	mean_Flourishing	sd_Flourishing
Control	94	5.718085	0.7709738
Direct	95	5.776316	0.8638912

Hints

dog_independent_descriptives <- dog_independent %>% 
  group_by(???) %>% 
  summarise(n = n(),
            mean_Flourishing = mean(???),
            sd_Flourishing = sd(???)) %>% 
  ungroup()

Solution

# Task 5: Means & SD
dog_independent_descriptives <- dog_independent %>% 
  group_by(GroupAssignment) %>% 
  summarise(n = n(), 
            mean_Flourishing = mean(Flourishing_post),
            sd_Flourishing = sd(Flourishing_post)) %>% 
  ungroup()

Task 6: Check assumptions

Assumption 1: Continuous DV

Is the dependent variable (DV) continuous? Answer:

No. The DV is called GroupAssignment and it is categorical
No. The DV is called Flourishing and it is categorical
Yes. The DV is called GroupAssignment and it is continuous
Yes. The DV is called Flourishing and it is continuous

Assumption 2: Data are independent

Each observation in the dataset has to be independent, meaning the value of one observation does not affect the value of any other. Answer:

Assumption 3: Homoscedasticity (homogeneity of variance)

I’ve computed Levene’s test below. How do you interpret the output?

leveneTest(Flourishing_post ~ GroupAssignment, data = dog_independent)

	Df	F value	Pr(>F)
group	1	0.7111707	0.4001329
	187	NA	NA
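
To see where the numbers in this table come from, it can help to rebuild the test by hand. As far as I know, leveneTest() by default centres each group on its median and then runs a one-way ANOVA on the absolute deviations. The sketch below does exactly that in base R on simulated scores (not the lab data), so the F and p values will differ from the table above. The names `sim`, `abs_dev`, and `levene_by_hand` are made up for illustration.

```r
set.seed(42)  # simulated scores, not the lab data

# Two groups with the same sample sizes as in the lab (94 and 95)
sim <- data.frame(
  group = factor(rep(c("Control", "Direct"), times = c(94, 95))),
  score = c(rnorm(94, mean = 5.7, sd = 0.8), rnorm(95, mean = 5.8, sd = 0.8))
)

# Absolute deviation of each score from its own group's median
sim$abs_dev <- abs(sim$score - ave(sim$score, sim$group, FUN = median))

# One-way ANOVA on those deviations: the F test asks whether the
# typical spread around the median differs between the groups
levene_by_hand <- anova(lm(abs_dev ~ group, data = sim))
levene_by_hand
```

The degrees of freedom follow the same pattern as in the table above: 1 for the two groups (2 - 1) and 187 for the residuals (189 participants - 2 groups).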

Answer:

The p-value of Levene's test is significant, therefore we conclude that there is a difference between the variances in the population.
The p-value of Levene's test is non-significant, therefore we conclude that there is a difference between the variances in the population.
The p-value of Levene's test is significant, therefore we conclude that the variances in the population are equal.
The p-value of Levene's test is non-significant, therefore we conclude that the variances in the population are equal.

Assumption 4: DV should be approximately normally distributed

Looking at the violin-boxplot below, are both groups normally distributed?

ggplot(dog_independent, aes(x = GroupAssignment, y = Flourishing_post, fill = GroupAssignment)) +
  geom_violin(alpha = 0.4) +
  geom_boxplot(width = 0.3, alpha = 0.8) +
  scale_fill_viridis_d(option = "cividis", guide = "none") +
  theme_classic() +
  labs(x = "Group", y = "Psychological well-being (post-intervention)")

Answer:

yes, both groups are normally distributed
no, both groups are slightly skewed
no, both groups are extremely skewed

Conclusion from assumption tests

With all assumptions tested, which statistical test would you recommend for this analysis?

Answer:

All assumptions held. We will conduct a Student two-sample t-test.
The assumption of normality was violated. We will conduct a Welch two-sample t-test because it has been shown to be robust to slight deviations from normality (Delacre et al., 2017).
The assumptions of normality and homoscedasticity were violated. Therefore, we will conduct a non-parametric test.

Task 7: Computing a two-sample t-test with effect size & interpret the output

Step 1: Compute the Welch two-sample t-test. The structure of the function is as follows:

t.test(DV ~ IV, data = your_dataframe, var.equal = FALSE, alternative = "two.sided")

Solution

t.test(Flourishing_post ~ GroupAssignment, data = dog_independent, var.equal = FALSE, alternative = "two.sided")
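
If you are curious what t.test() does with var.equal = FALSE, the sketch below recomputes the Welch statistic from the descriptives in Task 5 alone, using only base R. The variable names are made up for illustration. Because the inputs are rounded summary values, the results should land very close to, but not necessarily exactly on, what t.test() reports.

```r
# Descriptives from Task 5 (group 1 = Control, group 2 = Direct)
n1 <- 94; m1 <- 5.718085; s1 <- 0.7709738
n2 <- 95; m2 <- 5.776316; s2 <- 0.8638912

# Squared standard error of each group mean
se_sq1 <- s1^2 / n1
se_sq2 <- s2^2 / n2

# Welch t: difference in means divided by its standard error
se_diff <- sqrt(se_sq1 + se_sq2)
t_value <- (m1 - m2) / se_diff

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_welch <- (se_sq1 + se_sq2)^2 /
  (se_sq1^2 / (n1 - 1) + se_sq2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference
p_value <- 2 * pt(-abs(t_value), df = df_welch)
ci <- (m1 - m2) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(t = t_value, df = df_welch, p = p_value), 4)
round(ci, 4)
```

The non-integer degrees of freedom are the visible signature of the Welch correction: a Student t-test on the same data would use n1 + n2 - 2 = 187.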

Step 2: Calculate an effect size

Calculate Cohen’s d. The structure of the function is as follows:

cohensD(DV ~ IV, data = your_dataframe, method = "unequal")

Solution

cohensD(Flourishing_post ~ GroupAssignment, data = dog_independent, method = "unequal")
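
To my understanding, method = "unequal" divides the absolute mean difference by the square root of the average of the two group variances, which fits the Welch test because it does not pool the variances by sample size. The sketch below applies that formula to the descriptives from Task 5 in base R. The variable names are made up for illustration, and the formula is my reading of the lsr documentation rather than something stated in this lab.

```r
# Descriptives from Task 5 (group 1 = Control, group 2 = Direct)
m1 <- 5.718085; s1 <- 0.7709738
m2 <- 5.776316; s2 <- 0.8638912

# Standardiser: square root of the mean of the two variances
sd_average <- sqrt((s1^2 + s2^2) / 2)

# Cohen's d is reported as an absolute value
d <- abs(m1 - m2) / sd_average
round(d, 4)
```

Since d is an absolute value, check the group means to see which group scored higher before you write up the direction of the effect.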

Step 3: Interpreting the output

Below are the outputs for the descriptive statistics (table), Welch t-test (main output), and Cohen’s d (last line starting with [1]). Based on these, write up the results in APA style and provide an interpretation.

GroupAssignment	n	mean_Flourishing	sd_Flourishing
Control	94	5.718085	0.7709738
Direct	95	5.776316	0.8638912


    Welch Two Sample t-test

data:  Flourishing_post by GroupAssignment
t = -0.48902, df = 185.05, p-value = 0.6254
alternative hypothesis: true difference in means between group Control and group Direct is not equal to 0
95 percent confidence interval:
 -0.2931533  0.1766920
sample estimates:
mean in group Control  mean in group Direct 
             5.718085              5.776316

[1] 0.0711213

The Welch two-sample t-test revealed that there is ___ in psychological well-being scores between direct (N = ___, M = ___, SD = ___) and control group (N = ___, M = ___, SD = ___), t(___) = ___, p = ___, d = ___. The strength of the association between the variables is considered ___. We therefore ___.
