10 One-Way ANOVA
In this chapter we’ll use the
lm()
andezANOVA()
functions for fitting a one-way ANOVA model with dummy codes and contrast codes, which will take the following form:
\[y_i=\beta_0+\beta_1\left(X1_L\right)+\beta_2\left(X1_Q\right)+\epsilon_i\]
The code in this chapter only works if you’re following along with the Github folder for this book (which you can download here), you’ve correctly set your working directory to the data folder (which you can learn how to do in Chapter 4), and run the code in the order it appears in this chapter.
Importing
For this chapter we’ll be using the data_800m.csv file. This dataset contains information about a subject’s VO2max and their 800 meter run time.
Viewing
The data_800m.csv dataset contains information about a subject’s VO2max and their 800 meter run time. Base off of that run time, they were categorized as faster
, medium
, or slower
.
subID VO2 time groups
1 s001 74.00 116 faster
2 s002 68.00 117 faster
3 s003 70.50 119 faster
4 s004 61.00 121 faster
5 s005 73.25 121 faster
6 s006 68.25 122 faster
More examples of viewing data can be found in Chapter 5
Formatting
There’s no formatting that needs to be done for this dataset. By default, R will create dummy codes for categorical variables. Lambda 1 and Lambda 2 will be assigned the codes 0,1,0
and 0,0,1
. However, we could create a set of contrast codes to analyze linear and quadratic effects. We’ll recode the faster
, medium
, and slower
categories as -1
, 0
, 1
and 0
, 1
, 2
for the linear and quadratic codes, respectively. One easy way to do this is with the mapvalues()
function which comes from the plyr
package. You’ll first need to install the package (install.packages(plyr)
) and then load the library (library(plyr)
) before using this function.
data$lin.c <- mapvalues(x = data$groups,
from = c("faster", "medium", "slower"),
to = c(1,0,-1))
data$quad.c <- mapvalues(x = data$groups,
from = c("faster", "medium", "slower"),
to = c(-1,2,-1))
More examples of formatting data can be found in Chapter 6
Modeling
The ezANOVA()
Function
ezANOVA(
data
, dv
, wid
, within = NULL
, within_full = NULL
, within_covariates = NULL
, between = NULL
, between_covariates = NULL
, observed = NULL
, diff = NULL
, reverse_diff = FALSE
, type = 2
, white.adjust = FALSE
, detailed = FALSE
, return_aov = FALSE
)
The ezANOVA()
function is used for ANOVA models. There are many arguments for this function, but not all of them need to be specified. The ones that do need to be specified will depend on the exact ANOVA that you’re performing. If you’d like to learn more about functions and arguments, Chapter 2 covers basic programming concepts, including functions and arguments.
The lm()
Function
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
The lm()
(linear model) function is used for fitting linear models. Notice that there are many arguments for this function, but the formula
and data
arguments are the only ones that need to be specified. If you’d like to learn more about functions and arguments, Chapter 2 covers basic programming concepts, including functions and arguments.
Model with Dummy Codes
By default, the lm()
and ezANOVA()
functions use the dummy codes 0
, 1
, 0
and 0
, 0
, 1
for three categories. In the formatting section we created contrast codes, but in this example we will not use those contrast codes. We’ll let the function create dummy codes for us.
In this example we’ll use the ezANOVA()
function. We need to specify the data
, dv
, wid
, between
, and type
arguments. You can also set detailed
and return_aov
to TRUE
for a more comprehensive output. The data
argument is set equal to the entire dataset, the dv
argument, which is short for dependent variable, is set equal to the column of the dependent variable. The wid
argument is set equal to the column that contains the variable specifying the case/Ss identifier; usually the subID
column, or equivalent. The between
argument is set equal to the between-subjects factor(s), which in this case is the groups
column.
ez::ezANOVA(data = data,
dv = VO2,
wid = subID,
between = .(groups),
type = 3,
detailed = TRUE,
return_aov = TRUE)
$ANOVA
Effect DFn DFd SSn SSd F p p<.05
1 (Intercept) 1 18 82657.440 398.9286 3729.574754 2.528806e-22 *
2 groups 2 18 398.756 398.9286 8.996106 1.956932e-03 *
ges
1 0.9951969
2 0.4998918
$`Levene's Test for Homogeneity of Variance`
DFn DFd SSn SSd F p p<.05
1 2 18 18.05357 168.1071 0.9665392 0.3992852
$aov
Call:
aov(formula = formula(aov_formula), data = data)
Terms:
groups Residuals
Sum of Squares 398.7560 398.9286
Deg. of Freedom 2 18
Residual standard error: 4.707728
Estimated effects may be unbalanced
Notice the formatting in the code above. When using the between
and within
arguments, you need to use a period and parentheses, .()
, when listing the column. You can list additional columns for both of these arguments by separating each column with a comma: between = .(groups, Column2)
.
Model with Contrast Codes
Rather than using the default dummy codes, we can use the set of contrast codes that were created in the formatting section above. We’ll use the lm()
function to implement this model.
Call:
lm(formula = VO2 ~ lin.c + quad.c, data = data)
Residuals:
Min 1Q Median 3Q Max
-7.964 -1.750 -0.500 2.250 8.286
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.964 1.779 32.576 < 2e-16 ***
lin.c0 3.786 2.516 1.504 0.149817
lin.c1 10.536 2.516 4.187 0.000554 ***
quad.c2 NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.708 on 18 degrees of freedom
Multiple R-squared: 0.4999, Adjusted R-squared: 0.4443
F-statistic: 8.996 on 2 and 18 DF, p-value: 0.001957
Anova Table (Type III tests)
Response: VO2
Sum Sq Df F values Pr(>F)
lin.c 388.50 1 17.53 0.0005542 ***
quad.c 0.00 0
Residuals 398.93 18
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For comparison purposes, here’s what the model printout looks like with the dummy codes using the lm()
function:
Call:
lm(formula = VO2 ~ groups, data = data)
Residuals:
Min 1Q Median 3Q Max
-7.964 -1.750 -0.500 2.250 8.286
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 68.500 1.779 38.497 < 2e-16 ***
groupsmedium -6.750 2.516 -2.682 0.015209 *
groupsslower -10.536 2.516 -4.187 0.000554 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.708 on 18 degrees of freedom
Multiple R-squared: 0.4999, Adjusted R-squared: 0.4443
F-statistic: 8.996 on 2 and 18 DF, p-value: 0.001957