2A Lab 7 Week 9
This is the pair coding activity related to Chapter 6.
The pair-coding tasks relating to the inferential chapters are a bit longer than the earlier ones, and you might not finish every step in the lab. Try to work through as much as you can without rushing, and treat any remaining steps as a “Challenge Yourself” activity.
Intentions behind it?
The purpose of the pair-coding tasks from this point on is to walk you through a more complete analysis using a dataset different from the one in the respective chapters. I have chosen not to skip major steps such as descriptives and assumption checks, because real-life data analysis involves more than just running a test. Some smaller steps may be omitted, though, so refer back to the chapters for the full version of the analysis process expected for your lab reports.
Task 1: Open the R project for the lab
Task 2: Create a new .Rmd file
… and name it something useful. If you need help, have a look at Section 1.3.
Task 3: Load in the library and read in the data
The data should already be in your project folder. If you want a fresh copy, you can download the data again here: data_pair_coding.
We are using the packages tidyverse and lsr today, and the data file we need to read in is dog_data_clean_wide.csv. I’ve named my data object dog_data_wide to shorten the name, but feel free to use whatever object name sounds intuitive to you.
If you have not worked through Chapter 6 yet, you may need to install the package lsr before you can load it into the library. Run install.packages("lsr") in your CONSOLE.
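A minimal sketch of this task, assuming the CSV file sits directly in the project folder (the working directory). The file.exists() guard and the data_path helper are my additions so the chunk gives a readable message rather than an error if the file is somewhere else.

```r
# install.packages("lsr")  # run once in the console if lsr is not installed yet

library(tidyverse)  # read_csv(), the pipe, and the dplyr/tidyr verbs
library(lsr)        # associationTest()

# Assumption: the data file is in the project folder (the working directory)
data_path <- "dog_data_clean_wide.csv"

if (file.exists(data_path)) {
  dog_data_wide <- read_csv(data_path)
  glimpse(dog_data_wide)  # quick look at column names and types
} else {
  message("Could not find ", data_path, " in ", getwd())
}
```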
Task 4: Tidy data for a Chi-Square test
Look at dog_data_wide and choose two categorical variables. To guide you through this example, I have selected Year of Study and whether or not the students owned pets as my categorical variables.
Step 1: Select all relevant columns from dog_data_wide. In my case, those would be the participant ID RID, Year_of_Study, and Live_Pets. Store this data in an object called dog_chi.
Step 2: Check if we have any missing values in dog_chi. If so, remove them with the function drop_na().
Step 3: Convert Year_of_Study and Live_Pets into factors. Feel free to order the categories meaningfully.
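The three steps could look roughly like this. To keep the sketch runnable on its own, it uses a tiny made-up stand-in for dog_data_wide; the four rows and the Other_Variable column are hypothetical, and only the column names RID, Year_of_Study, and Live_Pets and the category labels come from this worksheet.

```r
library(tidyverse)

# Hypothetical stand-in for dog_data_wide: four made-up rows, one with a missing value
dog_data_wide <- tibble(
  RID            = 1:4,
  Other_Variable = c(19, 20, 21, 22),  # hypothetical column we do not need here
  Year_of_Study  = c("Second", "First", NA, "Fifth or above"),
  Live_Pets      = c("no", "yes", "yes", "no")
)

# Step 1: keep only the ID and the two categorical variables
dog_chi <- dog_data_wide %>%
  select(RID, Year_of_Study, Live_Pets)

# Step 2: count missing values per column, then drop incomplete rows
colSums(is.na(dog_chi))

dog_chi <- dog_chi %>%
  drop_na()

# Step 3: convert to factors with a meaningful level order
dog_chi <- dog_chi %>%
  mutate(
    Year_of_Study = factor(
      Year_of_Study,
      levels = c("First", "Second", "Third", "Fourth", "Fifth or above")
    ),
    Live_Pets = factor(Live_Pets, levels = c("yes", "no"))
  )

dog_chi
```

Setting the levels explicitly keeps the years in study order in later tables; without it, R would sort them alphabetically.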
Task 5: Compute descriptives
Create a frequency table (or contingency table to be more exact) from dog_chi, i.e., we need counts for each combination of the variables. Store the data in a new data object dog_chi_contingency. dog_chi_contingency should look like this:
Year_of_Study | yes | no |
---|---|---|
First | 21 | 89 |
Second | 37 | 64 |
Third | 12 | 22 |
Fourth | 12 | 18 |
Fifth or above | 3 | 2 |
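One way to get there is count() followed by pivot_wider(). This is only a sketch: because the raw file is not available here, dog_chi is rebuilt with one row per participant from the counts in the table above, and it leaves out the RID column.

```r
library(tidyverse)

# Rebuild one row per participant from the counts shown in the table above
dog_chi <- tribble(
  ~Year_of_Study,   ~Live_Pets, ~n,
  "First",          "yes",      21,
  "First",          "no",       89,
  "Second",         "yes",      37,
  "Second",         "no",       64,
  "Third",          "yes",      12,
  "Third",          "no",       22,
  "Fourth",         "yes",      12,
  "Fourth",         "no",       18,
  "Fifth or above", "yes",       3,
  "Fifth or above", "no",        2
) %>%
  uncount(n) %>%                               # repeat each row n times
  mutate(
    Year_of_Study = fct_inorder(Year_of_Study), # levels in order of appearance
    Live_Pets     = fct_inorder(Live_Pets)
  )

# Count each combination, then spread Live_Pets into columns
dog_chi_contingency <- dog_chi %>%
  count(Year_of_Study, Live_Pets) %>%
  pivot_wider(names_from = Live_Pets, values_from = n)

dog_chi_contingency
```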
Task 6: Check assumptions
Both variables should be categorical, measured at either the ordinal or nominal level. Answer: ____ as Year_of_Study is ____, and Live_Pets is ____.
Each observation in the dataset has to be independent, meaning the value of one observation does not affect the value of any other. Answer: ____
Cells in the contingency table are mutually exclusive. Answer: ____ because each individual can belong to ____ in the contingency table.
Task 7: Compute a chi-square test & interpret the output
- Step 1: Use the function as.data.frame() to turn dog_chi into a dataframe. Store this output in a new data object called dog_chi_df.
- Step 2: Run the associationTest() function from the lsr package to compute the Chi-Square test. The structure of the function is as follows:
- Step 3: Interpreting the output
Warning in associationTest(formula = ~Year_of_Study + Live_Pets, data =
dog_chi_df): Expected frequencies too small: chi-squared approximation may be
incorrect
Chi-square test of categorical association
Variables: Year_of_Study, Live_Pets
Hypotheses:
null: variables are independent of one another
alternative: some contingency exists between variables
Observed contingency table:
Live_Pets
Year_of_Study yes no
First 21 89
Second 37 64
Third 12 22
Fourth 12 18
Fifth or above 3 2
Expected contingency table under the null hypothesis:
Live_Pets
Year_of_Study yes no
First 33.39 76.61
Second 30.66 70.34
Third 10.32 23.68
Fourth 9.11 20.89
Fifth or above 1.52 3.48
Test results:
X-squared statistic: 12.276
degrees of freedom: 4
p-value: 0.015
Other information:
estimated effect size (Cramer's v): 0.209
warning: expected frequencies too small, results may be inaccurate
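The warning in this output is about the expected counts, two of which are small in the "Fifth or above" row. As a rough check, the expected table can be recomputed by hand from the observed counts: each expected count is the row total times the column total, divided by the grand total. The sketch below uses base R only, and the threshold of 5 is the conventional rule of thumb (base R's chisq.test() warns when any expected count is below it).

```r
# Observed counts copied from the output above
observed <- matrix(
  c(21, 89,
    37, 64,
    12, 22,
    12, 18,
     3,  2),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    Year_of_Study = c("First", "Second", "Third", "Fourth", "Fifth or above"),
    Live_Pets     = c("yes", "no")
  )
)

# Expected count = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# Which cells fall below the conventional threshold of 5, and how many?
expected < 5
sum(expected < 5)
```

With so few students in the last category, one option to consider is merging it with a neighbouring year before rerunning the test.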
The Chi-Square test revealed that there is ____ between Year of Study and whether students live with pets, \(\chi^2\)(____) = ____, p = ____, V = ____. The strength of the association between the variables is considered ____. We therefore ____.
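For reference, here is a hedged sketch of Steps 1 and 2 of this task. It rebuilds dog_chi from the observed counts in the output (without the RID column) and uses the call shown in the warning message. The cross-check with base R's chisq.test() and the hand-computed Cramér's V are my additions, not part of the worksheet.

```r
library(tidyverse)
library(lsr)

# Rebuild one row per participant from the observed counts in the output
dog_chi <- tribble(
  ~Year_of_Study,   ~Live_Pets, ~n,
  "First",          "yes",      21,
  "First",          "no",       89,
  "Second",         "yes",      37,
  "Second",         "no",       64,
  "Third",          "yes",      12,
  "Third",          "no",       22,
  "Fourth",         "yes",      12,
  "Fourth",         "no",       18,
  "Fifth or above", "yes",       3,
  "Fifth or above", "no",        2
) %>%
  uncount(n) %>%
  mutate(
    Year_of_Study = fct_inorder(Year_of_Study),
    Live_Pets     = fct_inorder(Live_Pets)
  )

# Step 1: convert the tibble to a base data frame
dog_chi_df <- as.data.frame(dog_chi)

# Step 2: one-sided formula naming the two factors
# (a warning about small expected frequencies is expected here)
associationTest(formula = ~Year_of_Study + Live_Pets, data = dog_chi_df)

# Cross-check with base R (my addition)
chi <- suppressWarnings(
  chisq.test(table(dog_chi_df$Year_of_Study, dog_chi_df$Live_Pets))
)

# Cramer's V by hand: sqrt(chi-squared / (N * (smaller table dimension - 1)))
n_obs     <- nrow(dog_chi_df)
k         <- min(dim(chi$observed)) - 1
cramers_v <- sqrt(unname(chi$statistic) / (n_obs * k))

round(c(chi_squared = unname(chi$statistic),
        df          = unname(chi$parameter),
        p           = chi$p.value,
        V           = cramers_v), 3)
```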