10 One-Way ANOVA

In this chapter we’ll use the lm() and ezANOVA() functions for fitting a one-way ANOVA model with dummy codes and contrast codes, which will take the following form:

\[y_i=\beta_0+\beta_1\left(X1_L\right)+\beta_2\left(X1_Q\right)+\epsilon_i\]

The code in this chapter only works if you’re following along with the Github folder for this book (which you can download here), you’ve correctly set your working directory to the data folder (which you can learn how to do in Chapter 4), and run the code in the order it appears in this chapter.

Importing

For this chapter we’ll be using the data_800m.csv file. This dataset contains information about a subject’s VO2max and their 800 meter run time.

data <- read.csv("data_800m.csv")

Viewing

The data_800m.csv dataset contains information about a subject’s VO2max and their 800 meter run time. Base off of that run time, they were categorized as faster, medium, or slower.

head(data)
  subID   VO2 time groups
1  s001 74.00  116 faster
2  s002 68.00  117 faster
3  s003 70.50  119 faster
4  s004 61.00  121 faster
5  s005 73.25  121 faster
6  s006 68.25  122 faster

More examples of viewing data can be found in Chapter 5

Formatting

There’s no formatting that needs to be done for this dataset. By default, R will create dummy codes for categorical variables. Lambda 1 and Lambda 2 will be assigned the codes 0,1,0 and 0,0,1. However, we could create a set of contrast codes to analyze linear and quadratic effects. We’ll recode the faster, medium, and slower categories as -1, 0, 1 and 0, 1, 2 for the linear and quadratic codes, respectively. One easy way to do this is with the mapvalues() function which comes from the plyr package. You’ll first need to install the package (install.packages(plyr)) and then load the library (library(plyr)) before using this function.

data$lin.c <- mapvalues(x = data$groups, 
                        from = c("faster", "medium", "slower"),
                        to = c(1,0,-1))

data$quad.c <- mapvalues(x = data$groups, 
                         from = c("faster", "medium", "slower"),
                         to = c(-1,2,-1))

More examples of formatting data can be found in Chapter 6

Modeling

The ezANOVA()Function

ezANOVA(
    data
    , dv
    , wid
    , within = NULL
    , within_full = NULL
    , within_covariates = NULL
    , between = NULL
    , between_covariates = NULL
    , observed = NULL
    , diff = NULL
    , reverse_diff = FALSE
    , type = 2
    , white.adjust = FALSE
    , detailed = FALSE
    , return_aov = FALSE
)

The ezANOVA() function is used for ANOVA models. There are many arguments for this function, but not all of them need to be specified. The ones that do need to be specified will depend on the exact ANOVA that you’re performing. If you’d like to learn more about functions and arguments, Chapter 2 covers basic programming concepts, including functions and arguments.

The lm() Function

lm(formula, data, subset, weights, na.action,
   method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
   singular.ok = TRUE, contrasts = NULL, offset, ...)

The lm() (linear model) function is used for fitting linear models. Notice that there are many arguments for this function, but the formula and dataarguments are the only ones that need to be specified. If you’d like to learn more about functions and arguments, Chapter 2 covers basic programming concepts, including functions and arguments.

Model with Dummy Codes

By default, the lm() and ezANOVA() functions use the dummy codes 0, 1, 0 and 0, 0, 1 for three categories. In the formatting section we created contrast codes, but in this example we will not use those contrast codes. We’ll let the function create dummy codes for us.

In this example we’ll use the ezANOVA() function. We need to specify the data, dv, wid, between, and type arguments. You can also set detailed and return_aov to TRUE for a more comprehensive output. The data argument is set equal to the entire dataset, the dv argument, which is short for dependent variable, is set equal to the column of the dependent variable. The wid argument is set equal to the column that contains the variable specifying the case/Ss identifier; usually the subID column, or equivalent. The between argument is set equal to the between-subjects factor(s), which in this case is the groups column.

ez::ezANOVA(data = data, 
            dv = VO2, 
            wid = subID, 
            between = .(groups),
            type = 3,
            detailed = TRUE, 
            return_aov = TRUE)
$ANOVA
       Effect DFn DFd       SSn      SSd           F            p p<.05
1 (Intercept)   1  18 82657.440 398.9286 3729.574754 2.528806e-22     *
2      groups   2  18   398.756 398.9286    8.996106 1.956932e-03     *
        ges
1 0.9951969
2 0.4998918

$`Levene's Test for Homogeneity of Variance`
  DFn DFd      SSn      SSd         F         p p<.05
1   2  18 18.05357 168.1071 0.9665392 0.3992852      

$aov
Call:
   aov(formula = formula(aov_formula), data = data)

Terms:
                  groups Residuals
Sum of Squares  398.7560  398.9286
Deg. of Freedom        2        18

Residual standard error: 4.707728
Estimated effects may be unbalanced

Notice the formatting in the code above. When using the between and within arguments, you need to use a period and parentheses, .(), when listing the column. You can list additional columns for both of these arguments by separating each column with a comma: between = .(groups, Column2).

Model with Contrast Codes

Rather than using the default dummy codes, we can use the set of contrast codes that were created in the formatting section above. We’ll use the lm() function to implement this model.

my_model <- lm(formula = VO2 ~ lin.c + quad.c, data = data)
summary(my_model)

Call:
lm(formula = VO2 ~ lin.c + quad.c, data = data)

Residuals:
   Min     1Q Median     3Q    Max 
-7.964 -1.750 -0.500  2.250  8.286 

Coefficients: (1 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   57.964      1.779  32.576  < 2e-16 ***
lin.c0         3.786      2.516   1.504 0.149817    
lin.c1        10.536      2.516   4.187 0.000554 ***
quad.c2           NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.708 on 18 degrees of freedom
Multiple R-squared:  0.4999,    Adjusted R-squared:  0.4443 
F-statistic: 8.996 on 2 and 18 DF,  p-value: 0.001957
Anova(my_model, type = "III", singular.ok = T)
Anova Table (Type III tests)

Response: VO2
          Sum Sq Df F values    Pr(>F)    
lin.c     388.50  1    17.53 0.0005542 ***
quad.c      0.00  0                       
Residuals 398.93 18                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For comparison purposes, here’s what the model printout looks like with the dummy codes using the lm() function:

summary(lm(VO2 ~ groups, data = data))

Call:
lm(formula = VO2 ~ groups, data = data)

Residuals:
   Min     1Q Median     3Q    Max 
-7.964 -1.750 -0.500  2.250  8.286 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)    68.500      1.779  38.497  < 2e-16 ***
groupsmedium   -6.750      2.516  -2.682 0.015209 *  
groupsslower  -10.536      2.516  -4.187 0.000554 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.708 on 18 degrees of freedom
Multiple R-squared:  0.4999,    Adjusted R-squared:  0.4443 
F-statistic: 8.996 on 2 and 18 DF,  p-value: 0.001957