9 One and Two Sample t-test

In this chapter we’ll use the t.test() function for fitting one and two sample t-test models, which will take the following forms:

\[y_i=\beta_0+\beta_1\left(X1\right)+\epsilon_i\]
\[y_{ij}=\beta_0+\beta_1\left(X1\right)+U_i+\epsilon_{ij}\]

The first equation is the two sample independent t-test model, and the second is the two sample dependent t-test model.
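
One way to see the first equation in action is to fit it with lm() and compare the result to a pooled-variance t-test. The sketch below uses simulated data; the object names (sim, fit, tt) and the group means are made up for illustration and are not part of this chapter's datasets. It sets var.equal = TRUE because the linear model assumes equal variances in both groups, whereas t.test() defaults to Welch's test.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(42)
sim <- data.frame(
  y  = c(rnorm(10, mean = 50, sd = 5), rnorm(10, mean = 55, sd = 5)),
  X1 = rep(c(0, 1), each = 10)   # 0 = first group, 1 = second group
)

# Fit y_i = beta_0 + beta_1 * X1 + epsilon_i
fit <- lm(y ~ X1, data = sim)

# Pooled-variance two sample t-test on the same data
tt <- t.test(x = sim$y[sim$X1 == 1], y = sim$y[sim$X1 == 0],
             var.equal = TRUE)

# beta_1 is the difference in group means, and its t statistic
# matches the t statistic reported by t.test()
coef(summary(fit))["X1", c("Estimate", "t value")]
tt$statistic
```

Under these assumptions the slope estimate equals the difference between the two group means, so testing whether the slope is zero is the same as testing whether the means differ.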

The code in this chapter only works if you’re following along with the GitHub folder for this book (which you can download here), you’ve correctly set your working directory to the data folder (which you can learn how to do in Chapter 4), and you run the code in the order it appears in this chapter.

Importing

For this chapter we’ll be using three datasets: rmr.csv for the two sample independent t-test, lactate_threshold.csv for the two sample dependent t-test, and a third dataset, created in R, for the one sample t-test.

First we’ll create the data for the one sample t-test. The code below draws 100 random values from a normal distribution with a mean of 54.3 and a standard deviation of 5.3, which are both arbitrary values.

set.seed(1)
data1 <- rnorm(n = 100, mean = 54.3, sd = 5.3)

The set.seed() function ensures that the same values are generated every time. Random numbers generated in R are pseudo-random rather than truly random, so the same sequence can be reproduced by setting the same seed with set.seed(). This allows you to copy and paste the code into your R session and get the same results.
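
The effect of set.seed() can be checked directly. The short sketch below (the object names first_draw, second_draw, and third_draw are made up for illustration) draws five values twice with the same seed, then once more without resetting the seed.

```r
set.seed(1)
first_draw <- rnorm(n = 5, mean = 54.3, sd = 5.3)

set.seed(1)   # resetting the seed restarts the same sequence
second_draw <- rnorm(n = 5, mean = 54.3, sd = 5.3)

identical(first_draw, second_draw)   # TRUE

# Without resetting the seed, R continues further along the sequence
third_draw <- rnorm(n = 5, mean = 54.3, sd = 5.3)
identical(first_draw, third_draw)    # FALSE
```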

Next we’ll import the other two datasets:

data2 <- read.csv("rmr.csv")
data3 <- read.csv("lactate_threshold.csv")

Viewing

Data for Two Sample Independent

The dataset for the two sample independent t-test contains resting metabolic rate (RMR) values for several males and females, which were collected using a metabolic cart.

head(data2)
  SubID Sex VO2_abs VO2_rel  RER kcal_L      RMR
1     1   M    0.28     4.1 0.80   4.80 1935.360
2     2   M    0.29     3.6 0.92   4.92 2054.592
3     3   M    0.29     2.8 0.94   4.99 2083.824
4     4   M    0.52     7.3 0.89   4.92 3684.096
5     5   M    0.28     3.4 0.86   4.86 1959.552
6     6   M    0.27     3.7 0.91   4.92 1912.896
str(data2)
'data.frame':   16 obs. of  7 variables:
 $ SubID  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Sex    : chr  "M" "M" "M" "M" ...
 $ VO2_abs: num  0.28 0.29 0.29 0.52 0.28 0.27 0.32 0.3 0.23 0.17 ...
 $ VO2_rel: num  4.1 3.6 2.8 7.3 3.4 3.7 3.4 3.2 5.2 2.8 ...
 $ RER    : num  0.8 0.92 0.94 0.89 0.86 0.91 1.07 0.89 0.86 0.83 ...
 $ kcal_L : num  4.8 4.92 4.99 4.92 4.86 4.92 5.05 4.92 4.86 4.83 ...
 $ RMR    : num  1935 2055 2084 3684 1960 ...

Data for Two Sample Dependent

The dataset for the two sample dependent t-test contains heart rate and VO2 data at lactate threshold and ventilatory threshold for several males and females. Lactate threshold is the point at which the blood concentration of lactate begins to increase exponentially (source). Ventilatory threshold is the point during exercise at which ventilation starts to increase at a faster rate than VO2 (source).

head(data3)
  SubID VT_VO2_abs VT_HR LT_VO2_abs LT_HR
1     1        3.3   179        3.3   188
2     2        2.7   185        2.6   177
3     3        4.3   182        4.4   175
4     4        3.3   188        3.5   186
5     5        2.5   191        2.4   183
6     6        2.6   193        2.5   193
str(data3)
'data.frame':   24 obs. of  5 variables:
 $ SubID     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ VT_VO2_abs: num  3.3 2.7 4.3 3.3 2.5 2.6 3.77 1.58 4.22 2.51 ...
 $ VT_HR     : num  179 185 182 188 191 193 183 187 197 188 ...
 $ LT_VO2_abs: num  3.3 2.6 4.4 3.5 2.4 2.5 3.7 1.5 3.8 2.3 ...
 $ LT_HR     : num  188 177 175 186 183 193 179 186 193 188 ...

More examples of viewing data can be found in Chapter 5.

Formatting

Data for Two Sample Independent

We’ll be comparing the resting metabolic rate of males and females for the independent t-test, so we need to separate the data for males from the data for females. There are many ways to do this in R. One way is to split the data2 object into data2_M and data2_F objects, which contain only the data for the males and females, respectively. This can be accomplished with the filter() function. The filter() function comes from the dplyr package, which is loaded as part of the tidyverse, so make sure you’ve loaded that library into your workspace.

library(tidyverse)  # provides filter() (from dplyr)
data2_M <- filter(data2, Sex == "M")  # rows for males only
data2_F <- filter(data2, Sex == "F")  # rows for females only

Now the dataset has been split into two objects, each containing only one sex.

head(data2_M)
  SubID Sex VO2_abs VO2_rel  RER kcal_L      RMR
1     1   M    0.28     4.1 0.80   4.80 1935.360
2     2   M    0.29     3.6 0.92   4.92 2054.592
3     3   M    0.29     2.8 0.94   4.99 2083.824
4     4   M    0.52     7.3 0.89   4.92 3684.096
5     5   M    0.28     3.4 0.86   4.86 1959.552
6     6   M    0.27     3.7 0.91   4.92 1912.896
head(data2_F)
  SubID Sex VO2_abs VO2_rel  RER kcal_L      RMR
1     9   F    0.23     5.2 0.86   4.86 1609.632
2    10   F    0.17     2.8 0.83   4.83 1182.384
3    11   F    0.25     5.2 0.92   5.20 1872.000
4    12   F    0.29     3.5 0.83   4.83 2017.008
5    13   F    0.29     4.4 0.79   4.78 1996.128
6    14   F    0.27     3.9 0.90   4.92 1912.896
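
Since there are many ways to split a data frame by group, here is a sketch of three base R alternatives that do not require any packages. It uses a small data frame, toy, built from four of the rows shown above rather than the full rmr.csv file; the names toy, toy_M, toy_F, and by_sex are made up for illustration.

```r
# A small data frame built from four of the rows shown above
toy <- data.frame(
  SubID = c(1, 2, 9, 10),
  Sex   = c("M", "M", "F", "F"),
  RMR   = c(1935.360, 2054.592, 1609.632, 1182.384)
)

# Option 1: logical indexing with square brackets
toy_M <- toy[toy$Sex == "M", ]

# Option 2: subset()
toy_F <- subset(toy, Sex == "F")

# Option 3: split() returns a list with one data frame per sex
by_sex <- split(toy, toy$Sex)
names(by_sex)   # "F" "M"
```

All three give the same rows as filter(); which one to use is largely a matter of preference.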

More examples of formatting data can be found in Chapter 6.

Modeling

The t.test() Function

t.test(x, y = NULL,
       alternative = c("two.sided", "less", "greater"),
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

The t.test() function can be used for both one sample and two sample tests. For two sample tests, the observations can be either dependent or independent. The function has many arguments, but x is the only one that must be specified for a one sample test (although mu is usually specified as well, since its default is 0). For a two sample independent t-test, x and y need to be specified, and for a dependent t-test, x, y, and paired need to be specified. If you’d like to learn more about functions and arguments, see Chapter 2, which covers basic programming concepts.

One Sample

For the one sample t-test, the x argument should be set equal to the object that contains the data, which in this case is the object data1. If the data object were a data frame with multiple columns, you would need to specify the column to use in your analysis (for example: data1$Column1), but in this example data1 is a single numeric vector. The default value for the hypothesized population mean, mu, is 0, but in this made-up example we’ll say mu is equal to 49.1.

t.test(x = data1, mu = 49.1)

    One Sample t-test

data:  data1
t = 12.136, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 49.1
95 percent confidence interval:
 53.93253 55.82168
sample estimates:
mean of x 
  54.8771 
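
To show where these numbers come from, the sketch below recomputes the one sample t statistic by hand using the standard formula, t = (sample mean - mu) / (sample sd / sqrt(n)), and then pulls the same values out of the object that t.test() returns. The names mu0, t_manual, p_manual, and res are made up for illustration.

```r
set.seed(1)
data1 <- rnorm(n = 100, mean = 54.3, sd = 5.3)
mu0 <- 49.1   # hypothesized population mean

# t = (sample mean - mu) / (sample sd / sqrt(n))
n <- length(data1)
t_manual <- (mean(data1) - mu0) / (sd(data1) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = n - 1, lower.tail = FALSE)

# The same values can be extracted from the object t.test() returns
res <- t.test(x = data1, mu = mu0)
res$statistic   # t statistic
res$parameter   # degrees of freedom
res$p.value     # p-value
res$conf.int    # 95 percent confidence interval
```

Storing the result in an object like this is useful when you want to reuse the statistic, p-value, or confidence interval later in a script instead of reading them off the printed output.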

Two Sample: Independent / Unpaired

The goal of this analysis is to determine if there is a difference between the resting metabolic rate of males and females, on average.

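The x and y arguments should be set equal to the object(s) that contain the data you want to compare. In this example, those are the data2_M and data2_F objects. Specifically, we want to compare the RMR column from each dataset, which can be selected with the dollar sign, $. By default, paired is set equal to FALSE, but it’s written out explicitly below to make the code clearer. Because var.equal is also FALSE by default, R runs Welch’s t-test, which does not assume equal variances in the two groups; this is why the output is titled "Welch Two Sample t-test" and the degrees of freedom are not a whole number.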

t.test(x = data2_M$RMR, y = data2_F$RMR, paired = FALSE)

    Welch Two Sample t-test

data:  data2_M$RMR and data2_F$RMR
t = 2.2322, df = 10.025, p-value = 0.04959
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
    1.119132 1031.252868
sample estimates:
mean of x mean of y 
 2260.350  1744.164 
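
t.test() also accepts a formula, which avoids having to split the data by group first. The sketch below uses only the twelve rows displayed by head() earlier in this chapter (six males and six females), so its results will differ from the full-data output above; the object names rmr_head, res_formula, and res_xy are made up for illustration.

```r
# The twelve RMR values shown by head(data2_M) and head(data2_F)
rmr_head <- data.frame(
  Sex = rep(c("M", "F"), each = 6),
  RMR = c(1935.360, 2054.592, 2083.824, 3684.096, 1959.552, 1912.896,
          1609.632, 1182.384, 1872.000, 2017.008, 1996.128, 1912.896)
)

# Formula interface: outcome ~ grouping variable
res_formula <- t.test(RMR ~ Sex, data = rmr_head)

# Equivalent x/y call. In the formula version the groups are ordered
# alphabetically, so "F" plays the role of x and "M" the role of y
res_xy <- t.test(x = rmr_head$RMR[rmr_head$Sex == "F"],
                 y = rmr_head$RMR[rmr_head$Sex == "M"])

res_formula$statistic
res_xy$statistic
```

Note that the group order determines the sign of the t statistic and of the confidence interval, so the formula version reports females minus males here, the opposite of the x = males, y = females call used above.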

Two Sample: Dependent / Paired

The goal of this analysis is to determine if lactate and ventilatory thresholds occur at the same exercise intensity, where exercise intensity is measured as oxygen consumption (VO2).

The x and y arguments should be set equal to the object(s) that contain the data you want to compare, which in this example is the data3 object. Specifically, we want to compare the VT_VO2_abs and LT_VO2_abs columns from the dataset, which can be selected with the dollar sign, $. By default, paired is set equal to FALSE, which needs to be changed to TRUE.

t.test(x = data3$VT_VO2_abs, y = data3$LT_VO2_abs, paired = TRUE)

    Paired t-test

data:  data3$VT_VO2_abs and data3$LT_VO2_abs
t = 1.0898, df = 23, p-value = 0.2871
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.05052285  0.16302285
sample estimates:
mean of the differences 
                0.05625 
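
A paired t-test is equivalent to a one sample t-test on the within-subject differences, which is why the output reports a single "mean of the differences". The sketch below illustrates this using only the six rows displayed by head(data3) earlier, so its numbers will differ from the full-data output above; the names vt, lt, res_paired, and res_diff are made up for illustration.

```r
vt <- c(3.3, 2.7, 4.3, 3.3, 2.5, 2.6)   # VT_VO2_abs, first six rows
lt <- c(3.3, 2.6, 4.4, 3.5, 2.4, 2.5)   # LT_VO2_abs, first six rows

# Paired t-test on the two columns
res_paired <- t.test(x = vt, y = lt, paired = TRUE)

# One sample t-test on the differences, testing against a mean of 0
res_diff <- t.test(x = vt - lt, mu = 0)

# Both approaches give the same t statistic, p-value, and interval
res_paired$statistic
res_diff$statistic
```

Taking the difference within each subject removes anything that is constant for that subject, which is the role played by the subject term in the dependent t-test model.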

Optional Arguments

There are additional arguments for the t.test() function that can be specified. By default, alternative is set equal to "two.sided", but this can be changed to "less" or "greater" for a one-sided test. var.equal is set equal to FALSE by default, which gives Welch’s t-test for two independent samples; setting it to TRUE gives the pooled-variance t-test, which assumes equal variances. The default confidence level (conf.level) is 0.95, but this can be set to any value between 0 and 1.
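
As a sketch of how these arguments fit together, the example below changes all three at once on simulated data. The objects group_a, group_b, and res_opt, along with the means and standard deviations used to simulate them, are made up for illustration.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(2)
group_a <- rnorm(n = 12, mean = 10, sd = 2)
group_b <- rnorm(n = 12, mean = 9, sd = 2)

res_opt <- t.test(x = group_a, y = group_b,
                  alternative = "greater",  # H1: mean of x > mean of y
                  var.equal = TRUE,         # pooled-variance test
                  conf.level = 0.90)        # 90 percent interval

res_opt$method                        # name of the test that was run
attr(res_opt$conf.int, "conf.level")  # 0.9
res_opt$conf.int[2]                   # Inf
```

With a one-sided alternative the confidence interval is unbounded on one side, which is why its upper limit is Inf when alternative is "greater".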