Theory

Let \(X_{Ai}\sim N(\mu_a,\sigma^2)\) i.i.d. and \(X_{Bi}\sim N(\mu_b,\sigma^2)\) i.i.d., where \(i=1,\dots,n\) and the common variance \(\sigma^2\) is treated as known. Consider testing the null hypothesis of no treatment difference \(H_0: \mu_a=\mu_b\) against the two-sided alternative \(\mu_a \neq\mu_b\) with Type I error probability \(\alpha\) and power \(1-\beta\) at \(|\mu_a-\mu_b|=\delta\). Power is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true. The standardized test statistic is: \[ Z=\frac{1}{\sqrt{2n\sigma^2}}\left(\sum_{i=1}^{n}X_{Ai}-\sum_{i=1}^{n}X_{Bi}\right)\sim N\left(\sqrt{\frac{n}{2\sigma^2}}\left(\mu_a-\mu_b\right),1\right) \]

We reject \(H_{0}\) if \(|Z|>\Phi^{-1}(1-\alpha/2)\), where \(\Phi\) is the standard normal distribution function. The power requirement is satisfied if \(P(|Z|>\Phi^{-1}(1-\alpha/2))=1-\beta\) when \(Z\sim N\left(\delta\sqrt{\frac{n}{2\sigma^2}},1\right)\). Neglecting the very small probability of rejecting in the wrong tail, this means we need \(E(Z)=\Phi^{-1}(1-\alpha/2)+\Phi^{-1}(1-\beta)\) at \(\mu_a-\mu_b=\delta\). From the distribution of the standardized statistic we obtain: \[ \delta\sqrt{\frac{n}{2\sigma^2}}=\Phi^{-1}(1-\alpha/2)+\Phi^{-1}(1-\beta)\quad \Rightarrow n= \frac{2\sigma^2}{\delta^2}\left[\Phi^{-1}(1-\alpha/2)+\Phi^{-1}(1-\beta)\right]^2\,. \]
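
The closed-form expression for \(n\) can be wrapped in a small helper for quick checks. This is only a minimal sketch: the function name n_per_group_normal and its argument names are illustrative assumptions and not part of any package.

```r
# Normal-approximation sample size per group for a two-sided,
# two-sample comparison of means.
# Hypothetical helper: name and arguments are chosen for illustration only.
n_per_group_normal <- function(delta, sd, sig.level = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - sig.level / 2)  # critical value of the two-sided test
  z_beta  <- qnorm(power)              # quantile for the target power 1 - beta
  2 * sd^2 / delta^2 * (z_alpha + z_beta)^2
}

# Settings used in Exercise 7.1
n_per_group_normal(delta = 3, sd = 2)
```

The result is generally not a whole number, so in practice it would be rounded up.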

Exercises

Exercise 7.1

Assume you want to compare the concentration of a certain metabolite in blood samples from patients with a certain disease with samples from healthy controls. From a pilot sample, it is known that the standard deviation is 2 mmol/l (\(\sigma = 2\)). A difference of \(\delta = 3\) mmol/l between the group means is considered clinically relevant. If there truly is a difference of \(\delta = 3\) mmol/l between the group means, you want to detect it with a probability of 80%, i.e. you allow a false negative probability of 100% - 80% = 20%.

How many samples per group are necessary to fulfill the above demand for the power of 80% (power.t.test)?

# Exactly one of the parameters n, delta, power, sd, and sig.level
# must be passed as NULL, and that parameter is determined from the others.
power.t.test(delta       = 3, 
             sd          = 2, 
             power       = 0.80,
             sig.level   = 0.05,
             type        = "two.sample",
             alternative = "two.sided")

# Normal-approximation formula from the Theory section

((qnorm(1-0.025)+qnorm(1-0.2))^2)*(2*2^2)/(3^2)

## 
##      Two-sample t test power calculation 
## 
##               n = 8.06031
##           delta = 3
##              sd = 2
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
## 
## [1] 6.976782
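
Because the number of subjects must be a whole number, the fractional result is rounded up in practice. A minimal sketch, repeating the call above and extracting the n component of the returned object (the variable names are illustrative):

```r
res <- power.t.test(delta = 3, sd = 2, power = 0.80, sig.level = 0.05,
                    type = "two.sample", alternative = "two.sided")

n_per_group <- ceiling(res$n)  # round up so the power is at least 80%
n_per_group
2 * n_per_group                # total number of subjects across both groups
```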

Assume in the above setting that a smaller difference of \(\delta = 0.5\) is considered clinically relevant:

power.t.test(delta       = 0.5, 
             sd          = 2, 
             power       = 0.80,
             sig.level   = 0.05,
             type        = "two.sample")

# Normal-approximation formula from the Theory section
((qnorm(1-0.025)+qnorm(1-0.2))^2)*(2*2^2)/(0.5^2)

## 
##      Two-sample t test power calculation 
## 
##               n = 252.1281
##           delta = 0.5
##              sd = 2
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
## 
## [1] 251.1642

In your opinion, why does the result of power.t.test differ from the direct computation?

Answer

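The direct computation treats \(\sigma\) as known and uses normal quantiles. power.t.test instead bases the calculation on the t test, where \(\sigma\) is estimated from the data, so it works with the (noncentral) t distribution with \(2(n-1)\) degrees of freedom. The t critical value is larger than the normal one, which makes the required sample size slightly larger. In both examples above the difference is about one subject per group, and it becomes negligible in relative terms as \(n\) grows.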
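
One way to examine the gap is to plug the sample size from the direct computation back into power.t.test and look at the power it delivers for the t test. A sketch under the first setting of Exercise 7.1; the variable names are illustrative.

```r
# Sample size from the normal-approximation formula (delta = 3, sd = 2)
n_z <- 2 * 2^2 / 3^2 * (qnorm(1 - 0.05 / 2) + qnorm(0.80))^2

# Power of the two-sample t test at that n: it falls short of the 80% target
pow_at_nz <- power.t.test(n = n_z, delta = 3, sd = 2, sig.level = 0.05)$power
pow_at_nz

# Critical values: the t quantile at 2 * (n - 1) degrees of freedom
# exceeds the normal quantile, which explains the shortfall
crit <- c(normal = qnorm(0.975), t = qt(0.975, df = 2 * (n_z - 1)))
crit
```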

Change the value of the desired power and discuss the effect on the sample size with your neighbor. Also vary the value of the standard deviation from the pilot sample.

power.t.test(delta       = 0.01, 
             sd          = 2, 
             power       = 0.95,
             sig.level   = 0.10,
             type        = "two.sample")

## 
##      Two-sample t test power calculation 
## 
##               n = 865774.6
##           delta = 0.01
##              sd = 2
##       sig.level = 0.1
##           power = 0.95
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
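
To see these effects more systematically, the required sample size can be tabulated over a grid of power values and standard deviations. This is a sketch: the grids and the choice of delta = 0.5 are arbitrary and only serve as an illustration.

```r
powers <- c(0.70, 0.80, 0.90, 0.95)
sds    <- c(1, 2, 3)

# Required n per group for each combination (rows: sd, columns: power)
n_grid <- sapply(powers, function(p)
  sapply(sds, function(s)
    power.t.test(delta = 0.5, sd = s, power = p, sig.level = 0.05)$n))
dimnames(n_grid) <- list(sd = sds, power = powers)

round(n_grid, 1)
```

Reading along a row shows how the sample size grows with the desired power; reading down a column shows how it grows with the standard deviation.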

Now assume the researchers know that they can only afford to recruit \(n=10\) patients per group. They want to know what power they can achieve under this constraint for a specified \(\delta\) of 1.3 and a standard deviation of 1.5.

power.t.test(delta= 1.3, 
             sd= 1.5, 
             n= 10,
             type= "two.sample")

## 
##      Two-sample t test power calculation 
## 
##               n = 10
##           delta = 1.3
##              sd = 1.5
##       sig.level = 0.05
##           power = 0.4500251
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
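
With the same \(\delta\) and standard deviation, one can also look at how the power grows with \(n\) and how many patients per group would be needed for 80% power. A sketch; the grid of sample sizes is an arbitrary choice.

```r
# Power for a range of per-group sample sizes (delta = 1.3, sd = 1.5)
ns  <- c(10, 15, 20, 25, 30)
pow <- sapply(ns, function(n)
  power.t.test(n = n, delta = 1.3, sd = 1.5, sig.level = 0.05)$power)
data.frame(n = ns, power = round(pow, 3))

# Smallest whole number of patients per group reaching 80% power
n_needed <- ceiling(power.t.test(delta = 1.3, sd = 1.5, power = 0.80)$n)
n_needed
```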

Exercise 7.2

Assume you want to compare the rate of infections between two groups of subjects. One group (A) was strongly exposed to some risk factor; the other group (B) was only weakly exposed. From a study on a similar risk factor, it is known that the infection rate is P1 = 25% for group A and P2 = 18% for group B.

Determine the power to detect a difference between the infection rates of P1 - P2 = 7%, given that N = 100 subjects can be recruited per group (power.prop.test):

power.prop.test(n = 100, 
                p1 = 0.25, 
                p2 = 0.18,
                sig.level = 0.05)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 100
##              p1 = 0.25
##              p2 = 0.18
##       sig.level = 0.05
##           power = 0.2242613
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
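
The same power can be approximated by hand with a normal approximation for the difference of two proportions, which should agree with the value reported above. A sketch; the variable names are illustrative and the rejection probability in the opposite tail is ignored.

```r
p1 <- 0.25; p2 <- 0.18; n <- 100; alpha <- 0.05

p_bar <- (p1 + p2) / 2                      # pooled proportion under H0
sd0   <- sqrt(2 * p_bar * (1 - p_bar))      # SD scale of the difference under H0
sd1   <- sqrt(p1 * (1 - p1) + p2 * (1 - p2))  # SD scale under the alternative

# Probability that the standardized difference exceeds the critical value
power_hand <- pnorm((sqrt(n) * abs(p1 - p2) - qnorm(1 - alpha / 2) * sd0) / sd1)
power_hand
```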

Assume that a power of 80% is desired for the above study. What sample size per group would be necessary?

power.prop.test(power=0.8, 
                p1=0.25,
                p2=0.18,
                sig.level = 0.05)

## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 539.5113
##              p1 = 0.25
##              p2 = 0.18
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
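
For comparison, the normal approximation for two proportions can be solved for \(n\) in closed form, analogous to the formula in the Theory section. A sketch with illustrative variable names; the result should be close to the value reported above.

```r
p1 <- 0.25; p2 <- 0.18; alpha <- 0.05; power <- 0.80

p_bar  <- (p1 + p2) / 2  # pooled proportion under H0
n_hand <- (qnorm(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar)) +
           qnorm(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))^2 /
          (p1 - p2)^2
n_hand

ceiling(n_hand)  # whole number of subjects per group
```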
