2A Lab 7 Week 9

This is the pair coding activity related to Chapter 6.

Important

The pair-coding tasks relating to the inferential chapters are a bit longer than the earlier ones, and you might not finish every step in the lab. Try to work through as much as you can without rushing, and treat any remaining steps as a “Challenge Yourself” activity.

Intentions behind it?

The purpose of the pair-coding tasks from this point on is to walk you through a more complete analysis using a dataset different from the one in the respective chapters. I have chosen not to skip major steps such as descriptives and assumption checks, because real-life data analysis involves more than just running a test. Some smaller steps may be omitted, though, so refer back to the chapters for the full version of the analysis process expected for your lab reports.

Task 1: Open the R project for the lab

Task 2: Create a new .Rmd file

… and name it something useful. If you need help, have a look at Section 1.3.

Task 3: Load in the library and read in the data

The data should already be in your project folder. If you want a fresh copy, you can download the data again here: data_pair_coding.

We are using the packages tidyverse and lsr today, and the data file we need to read in is dog_data_clean_wide.csv. I’ve named my data object dog_data_wide to shorten the name but feel free to use whatever object name sounds intuitive to you.

If you have not worked through Chapter 6 yet, you may need to install the package lsr before you can load it into the library. Run install.packages("lsr") in your console.

Task 4: Tidy data for a Chi-Square test

Look at dog_data_wide and choose two categorical variables. To guide you through this example, I have selected Year of Study and whether or not the students owned pets as my categorical variables.

  • Step 1: Select all relevant columns from dog_data_wide. In my case, those would be the participant ID RID, Year_of_Study, and Live_Pets. Store this data in an object called dog_chi.

  • Step 2: Check whether there are any missing values in dog_chi. If so, remove them with the function drop_na().

  • Step 3: Convert Year_of_Study and Live_Pets into factors. Feel free to order the categories meaningfully.

dog_chi <- ??? %>% 
  # Step 1
  select(???, ???, ???) %>% 
  # Step 2
  drop_na() %>% 
  # Step 3
  mutate(Year_of_Study = ???(Year_of_Study,
                                levels = c("First", "Second", "Third", "Fourth", "Fifth or above")),
         Live_Pets = ???(Live_Pets,
                            levels = c("yes", "no")))
# loading tidyverse and lsr into the library
library(tidyverse)
library(lsr)

# reading in `dog_data_clean_wide.csv`
dog_data_wide <- read_csv("dog_data_clean_wide.csv")

# Task 4: Tidying 
dog_chi <- dog_data_wide %>% 
  # Step 1
  select(RID, Year_of_Study, Live_Pets) %>% 
  # Step 2
  drop_na() %>% 
  # Step 3
  mutate(Year_of_Study = factor(Year_of_Study,
                                levels = c("First", "Second", "Third", "Fourth", "Fifth or above")),
         Live_Pets = factor(Live_Pets,
                            levels = c("yes", "no")))
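
Step 2 asks you to check for missing values before removing them, and Step 3 relies on the level names matching the data exactly. The sketch below illustrates both points on a small made-up tibble (toy, toy_clean and na_counts are hypothetical names, and the values are not from dog_data_wide), assuming tidyverse is installed.

```r
library(tidyverse)

# Hypothetical toy data for illustration, NOT the real dog_data_wide
toy <- tibble(
  RID = 1:5,
  Year_of_Study = c("First", "Second", NA, "Third", "First"),
  Live_Pets = c("yes", "no", "yes", NA, "Yes")
)

# Count missing values per column before dropping anything
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts

# Drop incomplete rows, then convert Live_Pets into a factor
toy_clean <- toy %>%
  drop_na() %>%
  mutate(Live_Pets = factor(Live_Pets, levels = c("yes", "no")))

# "Yes" (capital Y) is not one of the listed levels, so factor() turns it into NA
summary(toy_clean$Live_Pets)
```

If summary() reports NA values after the conversion even though drop_na() ran earlier, the level names probably do not match the spelling in the data.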

Task 5: Compute descriptives

Create a frequency table (or contingency table to be more exact) from dog_chi, i.e., we need counts for each combination of the variables. Store the data in a new data object dog_chi_contingency. dog_chi_contingency should look like this:

Year_of_Study    yes  no
First             21  89
Second            37  64
Third             12  22
Fourth            12  18
Fifth or above     3   2
dog_chi_contingency <- dog_chi %>% 
  count(???, ???) %>% 
  pivot_wider(names_from = ???, values_from = n)
# Task 5: Frequency table
dog_chi_contingency <- dog_chi %>% 
  count(Live_Pets, Year_of_Study) %>% 
  pivot_wider(names_from = Live_Pets, values_from = n)
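
Raw counts can be hard to compare because the year groups differ in size. As an optional extra, the sketch below adds row totals and the percentage of students living with pets in each year. The counts are typed in by hand from the table above so that the example runs on its own, and dog_chi_percent, total and percent_yes are names made up for this illustration.

```r
library(tidyverse)

# Counts copied from the contingency table in Task 5
dog_chi_contingency <- tibble(
  Year_of_Study = c("First", "Second", "Third", "Fourth", "Fifth or above"),
  yes = c(21, 37, 12, 12, 3),
  no  = c(89, 64, 22, 18, 2)
)

# Row totals and the percentage of "yes" responses within each year
dog_chi_percent <- dog_chi_contingency %>%
  mutate(total = yes + no,
         percent_yes = round(yes / total * 100, 1))

dog_chi_percent
```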

Task 6: Check assumptions

  1. Both variables should be categorical, measured at either the ordinal or nominal level. Answer: ____, as Year_of_Study is ____, and Live_Pets is ____.

  2. Each observation in the dataset has to be independent, meaning the value of one observation does not affect the value of any other. Answer: ____

  3. Cells in the contingency table are mutually exclusive. Answer: ____, because each individual can belong to ____ in the contingency table.

Task 7: Compute a chi-square test & interpret the output

  • Step 1: Use the function as.data.frame to turn dog_chi into a dataframe. Store this output in a new data object called dog_chi_df.
??? <- as.data.frame(???)
dog_chi_df <- as.data.frame(dog_chi)
  • Step 2: Run the associationTest() function from the lsr package to compute the Chi-Square test. The structure of the function is as follows:
associationTest(formula = ~ Variable1 + Variable2, data = your_dataframe)
associationTest(formula = ~ Year_of_Study + Live_Pets, data = dog_chi_df)
  • Step 3: Interpreting the output
Warning in associationTest(formula = ~Year_of_Study + Live_Pets, data =
dog_chi_df): Expected frequencies too small: chi-squared approximation may be
incorrect

     Chi-square test of categorical association

Variables:   Year_of_Study, Live_Pets 

Hypotheses: 
   null:        variables are independent of one another
   alternative: some contingency exists between variables

Observed contingency table:
                Live_Pets
Year_of_Study    yes no
  First           21 89
  Second          37 64
  Third           12 22
  Fourth          12 18
  Fifth or above   3  2

Expected contingency table under the null hypothesis:
                Live_Pets
Year_of_Study      yes    no
  First          33.39 76.61
  Second         30.66 70.34
  Third          10.32 23.68
  Fourth          9.11 20.89
  Fifth or above  1.52  3.48

Test results: 
   X-squared statistic:  12.276 
   degrees of freedom:  4 
   p-value:  0.015 

Other information: 
   estimated effect size (Cramer's v):  0.209 
   warning: expected frequencies too small, results may be inaccurate
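
The warning in the output refers to the expected frequencies. As a cross-check, the sketch below rebuilds the observed table by hand and uses base R's chisq.test() instead of associationTest(), so it is an alternative route rather than the lab's method. It assumes the common rule of thumb that expected counts below 5 are what trigger the warning, and it computes Cramér's V from its usual formula. The names observed, result, n and cramers_v are made up for this illustration.

```r
# Observed counts copied from the output above
observed <- matrix(
  c(21, 89,
    37, 64,
    12, 22,
    12, 18,
     3,  2),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Year_of_Study = c("First", "Second", "Third", "Fourth", "Fifth or above"),
    Live_Pets = c("yes", "no")
  )
)

# chisq.test() raises a similar warning for this table, so it is suppressed here
result <- suppressWarnings(chisq.test(observed))

# Expected counts under independence, and how many cells fall below 5
round(result$expected, 2)
sum(result$expected < 5)

# Cramér's V = sqrt(chi-square / (N * (min(rows, columns) - 1)))
n <- sum(observed)
cramers_v <- sqrt(unname(result$statistic) / (n * (min(dim(observed)) - 1)))
round(cramers_v, 3)
```

The expected counts, test statistic and effect size should agree with the associationTest() output, and the two small cells both sit in the "Fifth or above" row.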

The Chi-Square test revealed that there is ____ between Year of Study and whether students live with pets, \(\chi^2\)(____) = ____, p = ____, V = ____. The strength of the association between the variables is considered ____. We therefore ____.

Check which numbers need to be reported to 2 or 3 decimal places and which ones should (or should not) have a leading 0.
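
If you want R to do the rounding for you, the sketch below shows one way to format numbers with base R. It assumes the common APA-style convention that values which cannot exceed 1 (such as p) drop the leading zero, so check it against the guidance for your lab reports. The two input values are invented for illustration and are not the lab's results.

```r
# Hypothetical values for illustration only, NOT the results from this lab
statistic <- 3.4567
p_value <- 0.0421

# A test statistic can exceed 1: two decimals, leading zero kept if present
format_stat <- sprintf("%.2f", statistic)

# A p-value cannot exceed 1: three decimals, leading zero removed
format_p <- sub("^0", "", sprintf("%.3f", p_value))

format_stat  # "3.46"
format_p     # ".042"
```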