Suppose CBF levels were measured in 100 subjects with low-grade tumors and 100 subjects with high-grade tumors. Here we simulate the measurements from normal distributions.
set.seed(12) # we set the seed to get a reproducible random result.
n = 100
low = rnorm(n = n, mean = 50, sd = 10)
high = rnorm(n = n, mean = 60, sd = 10)
In a real-world experiment you would first check whether your data are normally distributed (an assumption of the t-test). Draw quantile-quantile plots (QQ-plots) to check the normality assumption (qqnorm, qqline).
par(mfrow=c(1,2))
qqnorm(low,main='low')
qqline(low)
qqnorm(high,main='high')
qqline(high)
Illustrate the sample values by box-plots and histograms to check whether the assumption of equal variances seems reasonable.
boxplot(low, high, names=c("low", "high"),cex.lab=1.5, cex.axis=1.5,col=c(2,3))
grid()
hist(low, border=2, xlim=c(0, 100), ylim=c(0, 0.5*n), density=5, col=2)
hist(high, border=3, add=TRUE, density=2.5, col=3)
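As a numerical complement to the plots, the spread of the two samples can also be compared directly. The following is a minimal sketch that re-creates the simulated samples with the same seed so that it runs on its own. The F test via var.test is an optional addition that is not part of the original exercise, and it relies on the data being normally distributed.

```r
set.seed(12)
n = 100
low = rnorm(n = n, mean = 50, sd = 10)
high = rnorm(n = n, mean = 60, sd = 10)

# sample standard deviations; both populations were simulated with sd = 10
sd_low = sd(low)
sd_high = sd(high)
c(low = sd_low, high = sd_high)

# ratio of the sample variances; values near 1 support equal variances
var_ratio = var(low) / var(high)
var_ratio

# optional: F test of the null hypothesis that the variance ratio equals 1
f = var.test(low, high)
f$p.value
```

A large p-value here does not prove that the variances are equal; it only means the data give no evidence against that assumption.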
Perform the t-test (t.test).
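# Welch t-test (the default of t.test, does not assume equal variances)
a = t.test(low, high)
# Student's t-test with pooled variance; this overwrites the Welch result
a = t.test(low, high, var.equal = TRUE)
#a = t.test(low, high, paired = FALSE,
# alternative = "two.sided",
# conf.level = 0.95)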
a
##
## Two Sample t-test
##
## data: low and high
## t = -7.8758, df = 198, p-value = 2.19e-13
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.014823 -7.802415
## sample estimates:
## mean of x mean of y
## 49.68831 60.09693
# mean difference
a$estimate[1] - a$estimate[2]
## mean of x
## -10.40862
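To make the output easier to interpret, it can help to see what t.test computes when var.equal = TRUE. The sketch below re-creates the simulated samples and computes the pooled-variance t statistic and the two-sided p-value by hand. The names sp2, se, t_manual and p_manual are introduced here for illustration only.

```r
set.seed(12)
n = 100
low = rnorm(n = n, mean = 50, sd = 10)
high = rnorm(n = n, mean = 60, sd = 10)

# pooled variance: weighted average of the two sample variances
sp2 = ((n - 1) * var(low) + (n - 1) * var(high)) / (2 * n - 2)
# standard error of the difference in means
se = sqrt(sp2 * (1 / n + 1 / n))
# t statistic and two-sided p-value with 2n - 2 degrees of freedom
t_manual = (mean(low) - mean(high)) / se
p_manual = 2 * pt(-abs(t_manual), df = 2 * n - 2)

# compare the manual values with those from t.test
a = t.test(low, high, var.equal = TRUE)
c(manual = t_manual, t.test = unname(a$statistic))
c(manual = p_manual, t.test = a$p.value)
```

The manual and built-in values should agree, which shows that the test statistic is simply the mean difference divided by its estimated standard error.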
Interpret the results.
Repeat the above computer simulation with a sample size of \(n=5\) per group. Determine whether the test result is a false positive or a false negative decision.
set.seed(1)
n = 5 ### Sample Size
low = rnorm(n=n, mean=50, sd=10)
high = rnorm(n=n, mean=60, sd=10)
t.test(low, high, var.equal = TRUE)
##
## Two Sample t-test
##
## data: low and high
## t = -1.921, df = 8, p-value = 0.09098
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -22.133561 2.016246
## sample estimates:
## mean of x mean of y
## 51.29270 61.35136
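Because the two populations were simulated with different means (50 and 60), failing to reject the null hypothesis at the 5% level is a false negative (type II error). A single run does not show how often this happens, so the sketch below repeats the experiment many times and estimates the fraction of runs that detect the difference. The number of repetitions n_sim is an arbitrary choice, and power.t.test is used as a cross-check that is not part of the original exercise.

```r
set.seed(1)
n = 5         # sample size per group
n_sim = 2000  # number of repeated experiments (arbitrary choice)

# run the experiment n_sim times and keep the p-value of each run
p_values = replicate(n_sim, {
  low = rnorm(n = n, mean = 50, sd = 10)
  high = rnorm(n = n, mean = 60, sd = 10)
  t.test(low, high, var.equal = TRUE)$p.value
})

# fraction of runs that detect the true difference at the 5% level
power_sim = mean(p_values < 0.05)
power_sim

# analytical power for the same design
power_theory = power.t.test(n = n, delta = 10, sd = 10, sig.level = 0.05)$power
power_theory
```

The complement 1 - power_sim estimates the false negative rate for this design, which illustrates why small samples often miss a difference that really exists.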
Load the data frame in the dimarta_trial.csv file.
dimarta_df<-read.csv('./data/dimarta_trial.csv',sep=',')
str(dimarta_df)
## 'data.frame': 296 obs. of 6 variables:
## $ PatientID : chr "txdjezeo" "htxfjlxk" "vkdqhyez" "dbuvrwfq" ...
## $ arm : chr "I" "III" "I" "III" ...
## $ histamine_start: num 58.6 36.1 57.7 56.6 NA ...
## $ histamine_end : num 67 28 57.3 57.4 67.9 ...
## $ qol_start : int 3 3 2 2 5 2 4 2 3 1 ...
## $ qol_end : int 3 4 2 3 7 2 6 2 3 1 ...
Compute the difference between histamine_end level and histamine_start level and add the data to the data frame.
dimarta_df$histamine_change<-with(dimarta_df,histamine_end - histamine_start)
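The str output above shows that histamine_start contains missing values (NA), and a difference involving NA is itself NA. The sketch below illustrates this on a small toy data frame. toy_df is hypothetical and only mimics the column names; it is not read from dimarta_trial.csv.

```r
# hypothetical toy data, not taken from dimarta_trial.csv
toy_df = data.frame(
  histamine_start = c(58.6, 36.1, NA),
  histamine_end = c(67.0, 28.0, 67.9)
)

# the change is NA wherever either measurement is missing
toy_df$histamine_change = with(toy_df, histamine_end - histamine_start)
toy_df$histamine_change

# count the missing changes and average the available ones
n_missing = sum(is.na(toy_df$histamine_change))
mean_change = mean(toy_df$histamine_change, na.rm = TRUE)
c(n_missing = n_missing, mean_change = mean_change)
```

t.test removes missing values before computing the test, so the analysis still runs, but the effective sample size per arm is smaller than the number of rows.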
Use boxplots to check whether the assumption of equal variances of the histamine changes between arms seems reasonable.
armI<-subset(dimarta_df,arm=='I')$histamine_change
armII<-subset(dimarta_df,arm=='II')$histamine_change
armIII<-subset(dimarta_df,arm=='III')$histamine_change
boxplot(armI,armII,armIII,names=c('arm I','arm II','arm III'),
col=c('coral1','lightskyblue','mediumspringgreen'))
grid()
Check whether your data are normally distributed (an assumption of the t-test). Draw quantile-quantile plots (QQ-plots) to check the normality assumption (qqnorm, qqline).
par(mfrow=c(1,3))
qqnorm(armI,main='arm I',col='coral1')
qqline(armI)
qqnorm(armII,main='arm II',col='lightskyblue',ylab=NA)
qqline(armII)
qqnorm(armIII,main='arm III',col='mediumspringgreen',ylab=NA)
qqline(armIII)
You can also check this with the Shapiro-Wilk test, whose null hypothesis is that the data are normally distributed (shapiro.test).
shapiro.test(armI)
##
## Shapiro-Wilk normality test
##
## data: armI
## W = 0.99133, p-value = 0.7977
shapiro.test(armII)
##
## Shapiro-Wilk normality test
##
## data: armII
## W = 0.98671, p-value = 0.4717
shapiro.test(armIII)
##
## Shapiro-Wilk normality test
##
## data: armIII
## W = 0.98543, p-value = 0.4073
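To see how the Shapiro-Wilk p-value behaves, it can be applied to simulated data whose distribution is known. The sketch below compares a normal sample with a clearly skewed (exponential) sample. The seed, the sample size and the choice of the exponential distribution are arbitrary assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed for reproducibility
x_normal = rnorm(100, mean = 0, sd = 1)  # null hypothesis is true
x_skewed = rexp(100, rate = 1)           # null hypothesis is false

# a small p-value is evidence against normality
p_normal = shapiro.test(x_normal)$p.value
p_skewed = shapiro.test(x_skewed)$p.value
c(normal = p_normal, skewed = p_skewed)
```

A large p-value, as obtained for the three arms above, means that the data give no evidence against normality. It does not prove that the data are normally distributed.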
Let treatment arm I be the control arm and II and III two new types of drugs. Check if the population means of arm I and II are equal, then do the same for arm I and III (t.test).
# When the variances are not equal we use the Welch t-test (the default of t.test)
t.test(armI,armII)
##
## Welch Two Sample t-test
##
## data: armI and armII
## t = 3.6231, df = 185.99, p-value = 0.0003755
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.158997 3.930029
## sample estimates:
## mean of x mean of y
## 2.7351579 0.1906452
t.test(armI,armII,var.equal = TRUE)
##
## Two Sample t-test
##
## data: armI and armII
## t = 3.6225, df = 186, p-value = 0.0003762
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.158791 3.930235
## sample estimates:
## mean of x mean of y
## 2.7351579 0.1906452
t.test(armI,armIII)
##
## Welch Two Sample t-test
##
## data: armI and armIII
## t = 0.55219, df = 181.12, p-value = 0.5815
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.056181 1.877047
## sample estimates:
## mean of x mean of y
## 2.735158 2.324725
t.test(armI,armIII,var.equal = TRUE)
##
## Two Sample t-test
##
## data: armI and armIII
## t = 0.55318, df = 184, p-value = 0.5808
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.053399 1.874264
## sample estimates:
## mean of x mean of y
## 2.735158 2.324725
Interpret the results of the test.