9 One and Two Sample t-test
In this chapter we’ll use the t.test() function to fit one and two sample t-test models, which take the following forms:
\[y_i=\beta_0+\beta_1\left(X1\right)+\epsilon_i\] \[y_{ij}=\beta_0+\beta_1\left(X1\right)+U_i+\epsilon_{ij}\]
The first equation is the two sample independent t-test model, and the second is the two sample dependent t-test model.
The code in this chapter only works if you’re following along with the GitHub folder for this book (which you can download here), you’ve set your working directory to the data folder (Chapter 4 explains how), and you run the code in the order it appears in this chapter.
Importing
For this chapter we’ll be using three datasets: rmr.csv for the two sample independent t-test, lactate_threshold.csv for the two sample dependent t-test, and a third dataset for the one sample t-test, which will be created in R.
First we’ll create the data for the one sample t-test. The code below draws 100 random values from a normal distribution with a mean of 54.3 and a standard deviation of 5.3, both of which are arbitrary values.
The set.seed() function is used to ensure that the same random values are generated every time. Random numbers generated in R are not truly random (they are pseudorandom), so the values can be replicated with the set.seed() function. Doing this will allow you to copy and paste the code into your R session and get the same results.
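A minimal sketch of this step is shown here. The seed value 123 is an assumption (the seed used for the output later in this chapter is not stated), so the numbers it produces will not match that output exactly. The object name data1 is the one used later in the chapter.

```r
# Fix the random number generator so the draw is reproducible.
# NOTE: 123 is an assumed seed, not necessarily the one used in this book.
set.seed(123)

# Draw 100 values from a normal distribution with mean 54.3 and SD 5.3
data1 <- rnorm(n = 100, mean = 54.3, sd = 5.3)

# Resetting the same seed and drawing again reproduces the same values
set.seed(123)
data1_again <- rnorm(n = 100, mean = 54.3, sd = 5.3)
identical(data1, data1_again)
```

Because data1 is a plain numeric vector, it can be passed to t.test() directly without selecting a column.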
And we’ll import the two other datasets:
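One way the import could look is sketched here, assuming both CSV files sit in the current working directory and using base R's read.csv(). The object names data2 and data3 are the ones used later in this chapter; the files vector and the existence check are illustrative additions so the sketch reports a missing file instead of stopping with an error.

```r
# File names come from this chapter; the named vector is an illustrative helper
files <- c(data2 = "rmr.csv", data3 = "lactate_threshold.csv")

# Check that both files are in the working directory before reading them
not_found <- files[!file.exists(files)]

if (length(not_found) > 0) {
  message("Not found in ", getwd(), ": ", paste(not_found, collapse = ", "))
} else {
  data2 <- read.csv(files[["data2"]])  # resting metabolic rate data
  data3 <- read.csv(files[["data3"]])  # lactate and ventilatory threshold data
}
```

If the message appears, check the working directory with getwd() and change it with setwd() before running the import again.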
Viewing
Data for Two Sample Independent
The dataset for the two sample independent t-test contains resting metabolic rate values, RMR, for several males and females, which were collected using a metabolic cart.
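The output that follows has the shape of what head() and str() print. A minimal sketch of those calls is given here. So that it runs on its own, it rebuilds only the six rows printed in this section into a small stand-in data frame (the name data2_preview is an assumption); in the chapter itself the calls would be made on the full imported data2 object.

```r
# Stand-in for the imported dataset: only the six rows printed in this section
data2_preview <- data.frame(
  SubID   = 1:6,
  Sex     = rep("M", 6),
  VO2_abs = c(0.28, 0.29, 0.29, 0.52, 0.28, 0.27),
  VO2_rel = c(4.1, 3.6, 2.8, 7.3, 3.4, 3.7),
  RER     = c(0.80, 0.92, 0.94, 0.89, 0.86, 0.91),
  kcal_L  = c(4.80, 4.92, 4.99, 4.92, 4.86, 4.92),
  RMR     = c(1935.360, 2054.592, 2083.824, 3684.096, 1959.552, 1912.896),
  stringsAsFactors = FALSE
)

head(data2_preview)  # first six rows of the data frame
str(data2_preview)   # dimensions plus the type of each column
```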
SubID Sex VO2_abs VO2_rel RER kcal_L RMR
1 1 M 0.28 4.1 0.80 4.80 1935.360
2 2 M 0.29 3.6 0.92 4.92 2054.592
3 3 M 0.29 2.8 0.94 4.99 2083.824
4 4 M 0.52 7.3 0.89 4.92 3684.096
5 5 M 0.28 3.4 0.86 4.86 1959.552
6 6 M 0.27 3.7 0.91 4.92 1912.896
'data.frame': 16 obs. of 7 variables:
$ SubID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Sex : chr "M" "M" "M" "M" ...
$ VO2_abs: num 0.28 0.29 0.29 0.52 0.28 0.27 0.32 0.3 0.23 0.17 ...
$ VO2_rel: num 4.1 3.6 2.8 7.3 3.4 3.7 3.4 3.2 5.2 2.8 ...
$ RER : num 0.8 0.92 0.94 0.89 0.86 0.91 1.07 0.89 0.86 0.83 ...
$ kcal_L : num 4.8 4.92 4.99 4.92 4.86 4.92 5.05 4.92 4.86 4.83 ...
$ RMR : num 1935 2055 2084 3684 1960 ...
Data for Two Sample Dependent
The dataset for the two sample dependent t-test contains heart rate and VO2 data at lactate threshold and ventilatory threshold for several males and females. Lactate threshold is the point at which the blood concentration of lactate begins to increase exponentially (source). Ventilatory threshold is the point during exercise at which ventilation starts to increase at a faster rate than VO2 (source).
SubID VT_VO2_abs VT_HR LT_VO2_abs LT_HR
1 1 3.3 179 3.3 188
2 2 2.7 185 2.6 177
3 3 4.3 182 4.4 175
4 4 3.3 188 3.5 186
5 5 2.5 191 2.4 183
6 6 2.6 193 2.5 193
'data.frame': 24 obs. of 5 variables:
$ SubID : int 1 2 3 4 5 6 7 8 9 10 ...
$ VT_VO2_abs: num 3.3 2.7 4.3 3.3 2.5 2.6 3.77 1.58 4.22 2.51 ...
$ VT_HR : num 179 185 182 188 191 193 183 187 197 188 ...
$ LT_VO2_abs: num 3.3 2.6 4.4 3.5 2.4 2.5 3.7 1.5 3.8 2.3 ...
$ LT_HR : num 188 177 175 186 183 193 179 186 193 188 ...
More examples of viewing data can be found in Chapter 5.
Formatting
Data for Two Sample Independent
We’ll be comparing the resting metabolic rate of males and females for the independent t-test, so we need to split the data by sex. There are many ways to do this in R; one way is to split the data2 object into data2_M and data2_F objects, which contain only the data for the males and females, respectively. This can be accomplished with the filter() function. The filter() function comes from the dplyr package, which is loaded as part of the tidyverse, so make sure you’ve loaded that library into your workspace.
library(tidyverse) # loads dplyr, which provides filter()
data2_M <- filter(data2, Sex == "M") # keep only the rows for males
data2_F <- filter(data2, Sex == "F") # keep only the rows for females
Now the dataset has been split into two objects, each of which contains only one sex.
SubID Sex VO2_abs VO2_rel RER kcal_L RMR
1 1 M 0.28 4.1 0.80 4.80 1935.360
2 2 M 0.29 3.6 0.92 4.92 2054.592
3 3 M 0.29 2.8 0.94 4.99 2083.824
4 4 M 0.52 7.3 0.89 4.92 3684.096
5 5 M 0.28 3.4 0.86 4.86 1959.552
6 6 M 0.27 3.7 0.91 4.92 1912.896
SubID Sex VO2_abs VO2_rel RER kcal_L RMR
1 9 F 0.23 5.2 0.86 4.86 1609.632
2 10 F 0.17 2.8 0.83 4.83 1182.384
3 11 F 0.25 5.2 0.92 5.20 1872.000
4 12 F 0.29 3.5 0.83 4.83 2017.008
5 13 F 0.29 4.4 0.79 4.78 1996.128
6 14 F 0.27 3.9 0.90 4.92 1912.896
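Since there are many ways to split the data, here is a sketch of two base R alternatives that do not require the tidyverse. It uses a small stand-in data frame (data2_demo, a hypothetical name) built from a few of the rows printed above, so it runs without the CSV file; with the real data you would use data2 in its place.

```r
# Stand-in data: three male and three female rows taken from the output above
data2_demo <- data.frame(
  SubID = c(1, 2, 3, 9, 10, 11),
  Sex   = c("M", "M", "M", "F", "F", "F"),
  RMR   = c(1935.360, 2054.592, 2083.824, 1609.632, 1182.384, 1872.000),
  stringsAsFactors = FALSE
)

# Option 1: subset() keeps the rows where the condition is TRUE
data2_M <- subset(data2_demo, Sex == "M")

# Option 2: logical indexing with square brackets gives the same kind of result
data2_F <- data2_demo[data2_demo$Sex == "F", ]

data2_M
data2_F
```

Both options return a data frame containing only the matching rows, just as filter() does.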
More examples of formatting data can be found in Chapter 6.
Modeling
The t.test() Function
t.test(x, y = NULL,
alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE,
conf.level = 0.95, ...)
The t.test() function can be used for both one sample and two sample tests. For two sample tests, the observations can be either dependent or independent. The function has many arguments, but x is the only one that needs to be specified for one sample tests. Both x and y need to be specified for two sample independent t-tests, and x, y, and paired need to be specified for dependent t-tests. Chapter 2 covers basic programming concepts, including functions and arguments, if you’d like to learn more.
One Sample
For the one sample t-test, the x argument should be set equal to the object that contains the data, which in this case is the object data1. If the data object contained multiple columns, you would need to specify the column to use in your analysis (for example: data1$Column1), but in this example the data1 object only holds a single set of values. The default value for the population mean, mu, is 0, but in this made-up example we’ll say mu is equal to 49.1.
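A sketch of the call described above follows. To keep it runnable on its own it recreates data1 with an assumed seed of 123, so the statistics it prints will not match the output below exactly; the result name one_sample is also an assumption.

```r
# Recreate the simulated data (assumed seed; see the Importing section)
set.seed(123)
data1 <- rnorm(n = 100, mean = 54.3, sd = 5.3)

# One sample t-test: is the mean of data1 different from mu = 49.1?
one_sample <- t.test(x = data1, mu = 49.1)
one_sample

# Individual pieces of the result can be pulled out by name
one_sample$statistic  # t value
one_sample$parameter  # degrees of freedom (n - 1)
one_sample$conf.int   # confidence interval for the mean
```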
One Sample t-test
data: data1
t = 12.136, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 49.1
95 percent confidence interval:
53.93253 55.82168
sample estimates:
mean of x
54.8771
Two Sample: Independent / Unpaired
The goal of this analysis is to determine if there is a difference between the resting metabolic rate of males and females, on average.
The x and y arguments should be set equal to the object(s) that contain the data you want to compare. In this example, those are the data2_M and data2_F objects. Specifically, we want to compare the RMR column from each dataset, which can be selected with the dollar sign, $. By default, paired is set equal to FALSE, but it’s written out explicitly below to make the code clearer.
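A sketch of the call is given here. So that it runs without the CSV file, it uses only the six RMR values per sex that are printed in the Formatting section, stored in stand-in data frames; the full dataset has 16 observations, so the numbers it prints will differ from the output below. The result name independent is an assumption.

```r
# Stand-ins holding only the RMR values printed earlier (six per sex)
data2_M <- data.frame(
  RMR = c(1935.360, 2054.592, 2083.824, 3684.096, 1959.552, 1912.896)
)
data2_F <- data.frame(
  RMR = c(1609.632, 1182.384, 1872.000, 2017.008, 1996.128, 1912.896)
)

# Independent (unpaired) two sample t-test; with the default
# var.equal = FALSE this is the Welch version of the test
independent <- t.test(x = data2_M$RMR, y = data2_F$RMR, paired = FALSE)
independent
```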
Welch Two Sample t-test
data: data2_M$RMR and data2_F$RMR
t = 2.2322, df = 10.025, p-value = 0.04959
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.119132 1031.252868
sample estimates:
mean of x mean of y
2260.350 1744.164
Two Sample: Dependent / Paired
The goal of this analysis is to determine if lactate and ventilatory thresholds occur at the same exercise intensity, where exercise intensity is measured as VO2 consumption.
The x and y arguments should be set equal to the object(s) that contain the data you want to compare, which in this example is the data3 object. Specifically, we want to compare the VT_VO2_abs and LT_VO2_abs columns from the dataset, which can be selected with the dollar sign, $. By default, paired is set equal to FALSE, which needs to be changed to TRUE.
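A sketch of the paired call is given here. It uses only the first ten values of each column, as printed by str() in the Viewing section, in a stand-in data3 object; the full dataset has 24 observations, so the numbers will differ from the output below. The result name dependent is an assumption.

```r
# Stand-in holding the first ten rows shown in the Viewing section
data3 <- data.frame(
  VT_VO2_abs = c(3.3, 2.7, 4.3, 3.3, 2.5, 2.6, 3.77, 1.58, 4.22, 2.51),
  LT_VO2_abs = c(3.3, 2.6, 4.4, 3.5, 2.4, 2.5, 3.7, 1.5, 3.8, 2.3)
)

# Dependent (paired) t-test: each row is one subject measured twice,
# so x and y must have the same length and the same row order
dependent <- t.test(x = data3$VT_VO2_abs, y = data3$LT_VO2_abs, paired = TRUE)
dependent
```

A paired t-test is equivalent to a one sample t-test on the within-subject differences, which is why the degrees of freedom equal the number of pairs minus one.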
Paired t-test
data: data3$VT_VO2_abs and data3$LT_VO2_abs
t = 1.0898, df = 23, p-value = 0.2871
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.05052285 0.16302285
sample estimates:
mean of the differences
0.05625
Optional Arguments
There are additional arguments for the t.test() function that can be specified. By default, alternative is set equal to "two.sided", but this can be changed to "less" or "greater" for a one-sided test. var.equal is set equal to FALSE by default, which gives the Welch test shown above; setting it to TRUE assumes equal variances in the two groups and uses the pooled variance instead. 0.95 is the default confidence level (conf.level), but this can be set to any value between 0 and 1.
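To illustrate, the sketch below changes all three optional arguments in a single call. It uses the six RMR values per sex printed earlier in the chapter as stand-in vectors (rmr_M and rmr_F are hypothetical names), and the particular settings are chosen only for demonstration, not as a recommended analysis.

```r
# Stand-in vectors: the six RMR values per sex printed earlier
rmr_M <- c(1935.360, 2054.592, 2083.824, 3684.096, 1959.552, 1912.896)
rmr_F <- c(1609.632, 1182.384, 1872.000, 2017.008, 1996.128, 1912.896)

optional_args <- t.test(
  x = rmr_M,
  y = rmr_F,
  alternative = "greater",  # one-sided: is the mean of x greater than the mean of y?
  var.equal   = TRUE,       # pooled-variance test instead of Welch
  conf.level  = 0.90        # 90% confidence interval instead of 95%
)
optional_args
```

With var.equal = TRUE the degrees of freedom are the two sample sizes added together minus two, and with a one-sided alternative one end of the confidence interval is infinite.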