Updated, Nov 6, 2020
To replicate the numbers in the Figure, we first look at what the authors say they did.
[You are also encouraged to read the other sections, as well as the full Statistical Analysis section.]
Interestingly, the target sample size calculation assumed a 15% non-informative loss to follow-up, 5% non-adherence to treatment assignment in either direction, and an annual HIV seroincidence of 2.5 per 100 person-years in the control group.
The Kaplan-Meier method was used to estimate the HIV event distribution over time by treatment, accounting for staggered enrolment and incomplete, discrete follow-up. The time of HIV-positive status was credited to the follow-up visit when HIV was first detected. HIV-negative participants were censored in the analysis at the last regular follow-up visit completed where HIV status was ascertained. Estimates of 2-year HIV seroincidences and corresponding standard errors obtained by Greenwood’s formula were used to test for differences between the treatments on the primary outcome (HIV seroconversion). The primary analysis was by intention-to-treat; participants were included in the analysis in the group to which they were randomly assigned and all participants with follow-up for HIV status were included in the analysis.
Thus, for the cumulative incidences, and their difference:
ds=read.table("http://www.biostat.mcgill.ca/hanley/c634/HIV/KenyaTrialData.txt",header=TRUE)
# re-arrange so the data for both arms are in the same row.
data=data.frame(
cbind(ds[seq(1,11,2),c(1,2,4,5)],
ds[seq(2,12,2),4:5]))
names(data)=c("t.0","t.end","n.C","n.HIV.C", "n.NotC","n.HIV.NotC")
data
## t.0 t.end n.C n.HIV.C n.NotC n.HIV.NotC
## 1 0 1 1367 4 1380 1
## 3 1 3 1351 2 1368 3
## 5 3 6 1323 5 1350 9
## 7 6 12 1287 3 1302 18
## 9 12 18 1029 0 1039 7
## 11 18 24 764 8 740 9
# calculations...
# C and NotC : shorthand for Circumcised and Not Circumcised
# p (conditional, window-specific) probability of seroconversion
# S (unconditional) probability of remaining seronegative
# CI cumulative incidence
# SE Greenwood Standard Error for S-hat
data$p.C = (data$n.C - data$n.HIV.C)/data$n.C #condn'l pr(Staying HIV-)
data$p.NotC = (data$n.NotC - data$n.HIV.NotC)/data$n.NotC #condn'l pr(Staying HIV-)
data$S.C = cumprod(data$p.C); # un-condn'l prob(Staying HIV-)
data$S.NotC = cumprod(data$p.NotC); # un-condn'l prob(Staying HIV-)
( data$CI.C = 100*(1 - data$S.C) ) ; # Cum.Incidence (%) of HIV+
## [1] 0.2926116 0.4402169 0.8164821 1.0476791 1.0476791 2.0838291
( data$CI.NotC = 100*(1 - data$S.NotC) ) ; # Cum.Incidence (%) of HIV+
## [1] 0.07246377 0.29160310 0.95632575 2.32559313 2.98364977 4.16357836
data$V.log.p.C = (1/data$n.C)*(1-data$p.C)/data$p.C # Greenwood variance term for S (or CI=1-S)
data$V.log.p.NotC = (1/data$n.NotC)*(1-data$p.NotC)/data$p.NotC # ditto, NotC
( data$SE.S.C =round(100 * data$S.C * sqrt(cumsum(data$V.log.p.C)),2) ) # Greenwood SE(%S)
## [1] 0.15 0.18 0.25 0.28 0.28 0.46
( data$SE.S.NotC =round(100 * data$S.NotC * sqrt(cumsum(data$V.log.p.NotC)),2) ) # SE(%S)=SE(%CI)
## [1] 0.07 0.15 0.26 0.41 0.48 0.61
Q = qnorm(c(0.025,0.975))
( round(data$CI.C[6] + Q * data$SE.S.C[6],1) ) # 95% CI for cum.inc
## [1] 1.2 3.0
( round(data$CI.NotC[6] + Q * data$SE.S.NotC[6],1) ) # likewise
## [1] 3.0 5.4
# z statistic for difference in cumulative incidence
SE.diff = sqrt( data$SE.S.C[6]^2 + data$SE.S.NotC[6]^2 )
( round( ( data$CI.C[6] - data$CI.NotC[6] ) / SE.diff, 2) )
## [1] -2.72
# IT MATTERS THAT YOU ROUND NUMBERS APPROPRIATELY. AND THINK OF END-USERS
A secondary analysis, that used the same statistical approach described above, excluded participants subsequently confirmed as HIV positive by PCR at baseline, and one further analysis excluded those confirmed positive at either baseline or at 1 month.
This is because an HIV test can still be negative for a while after someone is infected.
Furthermore, an as-treated analysis was done with a time-dependent covariate in a Cox regression model for circumcision status at each follow-up visit to take into account those individuals who did not adhere to their randomisation assignment; in this analysis, a time-dependent variable for the circumcision status of each participant at each follow-up visit was constructed and included as a single time-dependent predictor variable in a Cox regression model with all participants. Thus, irrespective of treatment assignment, participants were accounted in this analysis as they were treated with respect to circumcision. Cox regression models with fixed covariates were used to consider various baseline adjustments to the treatment effect. Age-group and variables that seemed to be slightly imbalanced were used — ie, ethnic group, occupation, infection with herpes simplex virus type 2, and infection with Chlamydia trachomatis. These variables were considered independently for association with HIV incidence, then singly, as adjustments to the treatment effect. Finally, the set of variables was included in a model as an adjustment to the treatment effect.
Note: An as-treated analysis means that if someone in the delayed-circumcision arm (the control arm) decided to get circumcised, he would be included in the circumcision arm; likewise, if someone in the circumcision arm (the treatment arm) decided not to get circumcised, he would be included in the control arm.
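An as-treated analysis with a time-dependent circumcision covariate can be set up with the counting-process (start, stop] form of Surv() in the survival package. Here is a minimal sketch with a made-up 3-person data layout (hypothetical, not the trial's data): participant 1 was randomized to control but got circumcised at month 6, so he contributes one row as uncircumcised and one as circumcised.
library(survival)
toy = data.frame(
id = c(1, 1, 2, 3),
start = c(0, 6, 0, 0),
stop = c(6, 24, 18, 24),
hiv = c(0, 0, 1, 1),
circumcised = c(0, 1, 1, 0) ) # id 1 'crossed over' at month 6
coxph( Surv(start, stop, hiv) ~ circumcised, data = toy )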
All hazard or risk ratios were estimated with the parameter estimates from Cox regression. An exact method for computing the likelihood was specified to handle ties.
It is not quite clear how the risk ratio of 0.47 (shown) was calculated, but here are a few possibilities.
Let’s take the ‘risk ratio’ term literally, i.e., as a ratio of the two 2-year risks. After all, a risk must have a time referent, and here the 1-KM(2y) numbers are risks.
If we already have the 2.1% (SE 0.46) and the 4.2% (SE 0.61), we can immediately calculate the ratio, and use the delta method applied to its log as a way to calculate its SE.
\[Var[ \log(RiskRatio)] = Var[ \log(Risk.C) - \log(Risk.NotC)]\] With \(Var[\log(RV)] \approx Var(RV)/E^2\), where \(E = E[RV]\), and using a few more decimal places, we have
( SE.log.ratio = sqrt(
data$SE.S.C[6]^2 / data$CI.C[6]^2 +
data$SE.S.NotC[6]^2 / data$CI.NotC[6]^2
) )
## [1] 0.2649419
# multiplicative margin of error (MME)
( MME = exp( qnorm(c(0.025, 0.5, 0.975)) * SE.log.ratio ) )
## [1] 0.5949508 1.0000000 1.6808113
( CI = round((data$CI.C[6]/data$CI.NotC[6]) * MME, 4) )
## [1] 0.2978 0.5005 0.8412
CLEARLY, no matter how roughly or finely we round this ratio, and its limits, this interpretation doesn’t replicate the 0.28, 0.47, 0.78 Risk Ratio shown in their Figure.
Another possibility is that they calculated a rate ratio (a ratio of two incidence densities) but called it a risk ratio. This somewhat loose terminology is common, and is even found in the 2 authoritative books by Breslow and Day.
You can see why the interchangeability arose: just as here, when we have low cumulative incidences, the Risk Ratio is mathematically close to the Hazard or IR or ‘rate’ Ratio. Assuming a hazard ratio that is constant over time, i.e., over the follow-up period \(t=0\) to \(t=2\) years, and denoting this constant rate ratio as \(\theta\), \[\lambda_1(u) = \lambda_0(u) \times \theta,\]
so \[Risk_1(t=2) = 1 - \exp\bigg[ - \int_0^2\lambda_1(u)du\bigg] \approx \int_0^2\lambda_1(u)du = \int_0^2 \theta \times \lambda_0(u)du \approx \theta \times Risk_0(t=2).\] So, maybe they calculated \(\hat{\lambda_1}/\hat{\lambda_0}\) and just called it a ‘Risk Ratio’ rather than its proper name, a ‘Rate Ratio’ or ‘Hazard Ratio’ or ‘Incidence Density Ratio’?
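As a quick numerical check of this approximation (a sketch, using the 2-year control-arm risk from above and a hazard ratio of 0.47):
Risk.0 = 0.0416 # 1 - KM(2y) in the NotC arm (4.16%, from above)
theta = 0.47
H.0 = -log(1 - Risk.0) # 2-year cumulative hazard in the NotC arm
c( exact = 1 - exp(-theta * H.0), approx = theta * Risk.0 ) # both close to 0.02, in line with the ~2.1% in the C arm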
Let’s see if that would fit. We need to calculate the 2 amounts of exposed population time, in terms of person years:
( PY.C = round( (1/12) * sum( data$n.C *(data$t.end - data$t.0)), 1))
## [1] 2209.8
( PY.NotC = round( (1/12) * sum( data$n.NotC*(data$t.end - data$t.0)), 1))
## [1] 2221
n.cases.HIV.C = sum( data$n.HIV.C ) ; n.cases.HIV.C
## [1] 22
n.cases.HIV.NotC = sum( data$n.HIV.NotC ) ; n.cases.HIV.NotC
## [1] 47
incidence.HIV.C = 100* n.cases.HIV.C / PY.C; incidence.HIV.C # per 100 PY
## [1] 0.9955652
incidence.HIV.NotC = 100* n.cases.HIV.NotC / PY.NotC; incidence.HIV.NotC # per 100 PY
## [1] 2.116164
( Rate.Ratio.hat = incidence.HIV.C / incidence.HIV.NotC )
## [1] 0.4704575
( SE.log.Rate.Ratio.hat =
sqrt( 1/n.cases.HIV.C + 1/n.cases.HIV.NotC ) )
## [1] 0.2583237
( MME=exp(qnorm(c(0.025,0.5,0.975)) * SE.log.Rate.Ratio.hat) )
## [1] 0.6027184 1.0000000 1.6591496
round(Rate.Ratio.hat * MME, 4)
## [1] 0.2836 0.4705 0.7806
Rounded to 2 decimal places, this calculation matches the reported point and interval estimates.
Notice that one can also arrive at these same estimates using POISSON REGRESSION.
data$n.HIV.C
## [1] 4 2 5 3 0 8
data$n.HIV.NotC
## [1] 1 3 9 18 7 9
( events = c(data$n.HIV.C, data$n.HIV.NotC ) )
## [1] 4 2 5 3 0 8 1 3 9 18 7 9
( py = c(data$n.C *(data$t.end - data$t.0)/12,
data$n.NotC*(data$t.end - data$t.0)/12) )
## [1] 113.9167 225.1667 330.7500 643.5000 514.5000 382.0000 115.0000 228.0000
## [9] 337.5000 651.0000 519.5000 370.0000
( circumcision = c(rep(1,6), rep(0,6)) )
## [1] 1 1 1 1 1 1 0 0 0 0 0 0
fit= glm(events ~ circumcision+offset(log(py)), family=poisson)
summary(fit)
##
## Call:
## glm(formula = events ~ circumcision + offset(log(py)), family = poisson)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.2007 -1.1053 0.1220 0.9265 2.0861
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.8556 0.1459 -26.432 < 2e-16 ***
## circumcision -0.7541 0.2583 -2.919 0.00351 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 35.640 on 11 degrees of freedom
## Residual deviance: 26.498 on 10 degrees of freedom
## AIC: 68.432
##
## Number of Fisher Scoring iterations: 5
( Rate.0.and.RateRatio = exp(fit$coefficients) )
## (Intercept) circumcision
## 0.02116164 0.47045042
( MME = exp(qnorm(c(0.025, 0.5, 0.975))*summary(fit)$coefficients[2,2]) )
## [1] 0.6027192 1.0000000 1.6591475
round(Rate.0.and.RateRatio[2] * MME,2)
## [1] 0.28 0.47 0.78
Rounded to 2 decimal places, this calculation ALSO matches the reported point and interval estimates.
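Aside: the Wald standard error for the circumcision coefficient in this Poisson fit is (to the rounding shown) just sqrt(1/22 + 1/47), i.e., the ‘events-only’ SE used above. A quick check, re-using the fit object:
c( poisson.SE = summary(fit)$coefficients[2,2], by.hand = sqrt(1/22 + 1/47) )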
But they mentioned using COX REGRESSION, so let’s see what that gives.
# Expand data set (could use weights instead, but a dataset with 1 row
# (or even several time slices) per participant is more natural)
month = NULL
hiv.pos = NULL
circumcised=NULL
for(i in 1:6){
for(j in 0:1){
n = (j==1)*data$n.C[i] + (j==0)*data$n.NotC[i]
n.pos = (j==1)*data$n.HIV.C[i] + (j==0)*data$n.HIV.NotC[i]
circumcised = c( circumcised, rep(j,n.pos) )
hiv.pos = c( hiv.pos,rep(1,n.pos ) )
month = c( month,rep(data$t.end[i],n.pos) )
if(i < 6) n.censored = n - n.pos - (j==1)*data$n.C[i+1] - (j==0)*data$n.NotC[i+1]
if(i == 6) n.censored = n - n.pos
circumcised = c(circumcised, rep(j,n.censored))
hiv.pos = c(hiv.pos,rep(0,n.censored))
month=c(month,rep(data$t.end[i],n.censored))
}
}
length(hiv.pos)
## [1] 2747
df=data.frame(month,circumcised,hiv.pos)
head(df,20) ; tail(df,20)
## month circumcised hiv.pos
## 1 1 0 1
## 2 1 0 0
## 3 1 0 0
## 4 1 0 0
## 5 1 0 0
## 6 1 0 0
## 7 1 0 0
## 8 1 0 0
## 9 1 0 0
## 10 1 0 0
## 11 1 0 0
## 12 1 0 0
## 13 1 1 1
## 14 1 1 1
## 15 1 1 1
## 16 1 1 1
## 17 1 1 0
## 18 1 1 0
## 19 1 1 0
## 20 1 1 0
## month circumcised hiv.pos
## 2728 24 1 0
## 2729 24 1 0
## 2730 24 1 0
## 2731 24 1 0
## 2732 24 1 0
## 2733 24 1 0
## 2734 24 1 0
## 2735 24 1 0
## 2736 24 1 0
## 2737 24 1 0
## 2738 24 1 0
## 2739 24 1 0
## 2740 24 1 0
## 2741 24 1 0
## 2742 24 1 0
## 2743 24 1 0
## 2744 24 1 0
## 2745 24 1 0
## 2746 24 1 0
## 2747 24 1 0
summary(df)
## month circumcised hiv.pos
## Min. : 1.00 Min. :0.0000 Min. :0.00000
## 1st Qu.:18.00 1st Qu.:0.0000 1st Qu.:0.00000
## Median :24.00 Median :0.0000 Median :0.00000
## Mean :19.36 Mean :0.4976 Mean :0.02512
## 3rd Qu.:24.00 3rd Qu.:1.0000 3rd Qu.:0.00000
## Max. :24.00 Max. :1.0000 Max. :1.00000
library(survival)
#
ties=c("efron","breslow","exact")
for(method in 1:3){
print( noquote("********************************"))
print( noquote(paste("Ties dealt with by",
ties[method],"method")) )
print( noquote("********************************"))
ph.fit = coxph(Surv(month,hiv.pos) ~ circumcised,
data=df, ties=ties[method] )
print(ph.fit)
}
## [1] ********************************
## [1] Ties dealt with by efron method
## [1] ********************************
## Call:
## coxph(formula = Surv(month, hiv.pos) ~ circumcised, data = df,
## ties = ties[method])
##
## coef exp(coef) se(coef) z p
## circumcised -0.7592 0.4680 0.2583 -2.939 0.00329
##
## Likelihood ratio test=9.27 on 1 df, p=0.00233
## n= 2747, number of events= 69
## [1] ********************************
## [1] Ties dealt with by breslow method
## [1] ********************************
## Call:
## coxph(formula = Surv(month, hiv.pos) ~ circumcised, data = df,
## ties = ties[method])
##
## coef exp(coef) se(coef) z p
## circumcised -0.7568 0.4691 0.2583 -2.93 0.00339
##
## Likelihood ratio test=9.21 on 1 df, p=0.002407
## n= 2747, number of events= 69
## [1] ********************************
## [1] Ties dealt with by exact method
## [1] ********************************
## Call:
## coxph(formula = Surv(month, hiv.pos) ~ circumcised, data = df,
## ties = ties[method])
##
## coef exp(coef) se(coef) z p
## circumcised -0.7616 0.4669 0.2591 -2.94 0.00329
##
## Likelihood ratio test=9.27 on 1 df, p=0.002331
## n= 2747, number of events= 69
# Since all 3 sets of results are SO close to each other, let's use the last one.
round(exp(ph.fit$coefficients),3) # Fitted HAZARD RATIO
## circumcised
## 0.467
MME = exp(qnorm(c(0.025,0.5,0.975))*summary(ph.fit)$coefficients[1,3] )
round(MME*exp(ph.fit$coefficients),2) # Point Estimate and CI
## [1] 0.28 0.47 0.78
SO, it looks like the fitted hazard ratio for the Cox Model might have been used, and reported as a ‘Risk Ratio’.
JH hopes to cover in class the various methods for handling ties, and to show the logLikelihood that lies behind the parameter-fitting for the Cox model.
From Abstract:
HIV incidence over 24 months was 0·66 cases per 100 person-years in the intervention group and 1·33 cases per 100 person-years in the control group (estimated efficacy of intervention 51%, 95% CI 16–72; p=0·006).
The as-treated efficacy was 55% (95% CI 22–75; p=0·002); efficacy from the Kaplan-Meier time-to-HIV-detection as-treated analysis was 60% (30–77; p=0·003)
It would be of interest to learn what the ratio was when the 2-year follow-up was complete. As it stands now, the overall ratio is more influenced (weighted) by the larger number of events in the earlier follow-up experience. So, it is somewhat arbitrary.
ds=read.table(
"http://www.biostat.mcgill.ca/hanley/c634/HIV/UgandaTrialData.txt",
header=TRUE)
ds
## t.from t.to circumcision n.at.risk n.seroconversions person.years
## 1 0.0 0.5 1 2263 14 1172.1
## 2 0.0 0.5 0 2319 19 1206.7
## 3 0.5 1.0 1 2235 5 1190.7
## 4 0.5 1.0 0 2229 14 1176.3
## 5 1.0 2.0 1 964 3 989.7
## 6 1.0 2.0 0 980 12 1008.7
# re-arrange so the data for both arms are in the same row.
data=cbind(ds[seq(1,5,2),c(1,2,4,5,6)],ds[seq(2,6,2),4:6])
names(data)=c("t.0","t.end", "n.C","n.HIV.C","py.C", "n.NotC","n.HIV.NotC","py.NotC") ; data
## t.0 t.end n.C n.HIV.C py.C n.NotC n.HIV.NotC py.NotC
## 1 0.0 0.5 2263 14 1172.1 2319 19 1206.7
## 3 0.5 1.0 2235 5 1190.7 2229 14 1176.3
## 5 1.0 2.0 964 3 989.7 980 12 1008.7
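To see, in numbers, the point made above about the event-rich earlier experience carrying more weight, here are the window-specific rate ratios (roughly 0.76, 0.35 and 0.25, i.e., further from 1 in the later windows):
round( (data$n.HIV.C/data$py.C) / (data$n.HIV.NotC/data$py.NotC), 2 ) # one ratio per follow-up window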
plot(c(0,2),c(-2500,2500),
col="white", xlab="Follup-Up Time (Years)",
ylab="PopulationSize",
cex.lab=1.75,
cex.axis=1.75)
for(i in 1:3){
rect( data$t.0[i], 0,
data$t.end[i], data$n.C[i],
col="lightgreen",border=NA)
rx = runif(data$n.HIV.C[i], data$t.0[i],data$t.end[i] )
ry = runif(data$n.HIV.C[i],
0.01*data$n.C[i],0.99*data$n.C[i])
points(rx,ry,cex=1,pch=19)
rect( data$t.0[i], 0,
data$t.end[i], -data$n.C[i],
col="lightblue",border=NA)
rx = runif(data$n.HIV.NotC[i], data$t.0[i],data$t.end[i] )
ry = runif(data$n.HIV.NotC[i],
0.01*data$n.NotC[i], 0.99*data$n.NotC[i])
points(rx,-ry,cex=1,pch=19)
}
By the way, a number of us are trying to promote this type of graph and have proposed calling it a Population-Time plot. See Figure 1 here, or Figure 3 here, and the various vignettes illustrating the casebase package for R. You saw another one last week when we were dealing with the discontinuations in the population time in which the IUD was being used.
See the 2015 Lancet article on the ‘Feasibility and effectiveness of oral cholera vaccine in an urban endemic setting in Bangladesh: a cluster randomised open-label trial’.
In the ‘Statistical analysis’ section, the authors state that
We calculated sample size by methods described elsewhere.[Donner and Klar text]. We calculated the intra-cluster correlation for cholera hospital admissions for 2008, and 2009. We assumed 65% efficacy and 65% coverage, yielding 42% overall protective efficacy, with a one-sided test (\(\alpha=0.05\)), 80% power, incidence of 1.6 cases per 1000 people per year, 25% yearly attrition, and 2 years of post-vaccination surveillance. On the basis of these assumptions, we calculated that we would need 236,340 participants (78,780 in each group).
The R code further below answers these questions:
What (assumed constant) attrition rate [expressed as an instantaneous rate of say x losses per 100 participant-days] would generate an annual attrition risk of 25% (so that only 3/4 of those randomized remain under follow-up [‘at risk’] at the end of year 1, and only 9/16 at the end of year 2)?
Assume 50,000 persons were to be randomized to the control arm. Using the attrition rate you just calculated, compute and graph the expected number under surveillance for each of the first 730 days of follow-up. Add up these daily numbers to get the expected total number of person-days (PD) of follow-up.
For the moment, ignore the variance (reciprocal of sample size) inflation caused by the cluster randomization design (i.e., natural cluster-to-cluster variation in attack rates) and by the fact that cholera can also easily spread between persons in the same cluster. Assume instead that the individual attacks are governed by a single homogeneous Poisson process, with \(\lambda_0\) = 1.6/1000 PersonYears (PY) in the ‘control arm’ area.
.
Convert this attack rate to an attack rate per person-day or per 100,000 person-days. [For interest, is it higher or lower that the observed rate given in Table 2?]
Let \(Y_0\) denote the number of attacks in the control arm. Using the rate you just calculated, and the PD from above, calculate E[\(Y_0\)], and (under the no extra-variance assumption) Var[\(Y_0\)].
Assume, for the purposes of hypothesis testing and setting a ‘positivity’ cutoff for a statistical test of the null, that the attack rate (\(\lambda_1\)) for persons in the ‘vaccination only’ area is also 1.6/1000PY, and that the 50,000 persons randomized to this arm are subject to the same attrition rate. Let \(Y_1\) denote the number of attacks in this arm, and let \(d = Y_1 - Y_0.\) [If the 2 amounts of experience were different, we would need to consider the difference in the rates, rather than in the numerators. This `close to 50:50’ randomization makes the planning a bit easier.]
.
Under this null assumption, calculate \(\sigma_{d|H_0} = Var[d|H_0]^{1/2}\), and compute \[d_{crit} = E[d|H_0] - 1.96 \times \sigma_{d|H_0}= 0 - 1.96 \times \sigma_{d|H_0}.\] (Note that this implies a 2-sided test with \(\alpha=0.05;\) it appears that the authors used a 1-sided test, so they would have used 1.645 SD’s as their criterion for test positivity). Sketch the distribution of \(d|H_0,\) leaving some space to the left of, and below, the distribution so as to be able to overlay the non-null version. (Given the large expected numbers, it is safe to use Normal approximations to the exact distributions of \(d|H_0\) and \(d|H_1.\))
Under the authors’ assumptions, what is the (non-null, \(H_1\)) value of \(\lambda_1\), of \(d\), and of \(\sigma_{d|H_1}\)? Sketch this distribution of \(d|H_1,\) to the left of the null distribution, and upside down [See diagram in section 4.3.2 (p14) of JH’s Notes on inference for a mean. It is a simpler (one-sample) context, and the alternative is on the positive side of the null, but it gives you the idea. For more on sample size calculations for a comparison of 2 means, see the Notes on comparison of 2 means in the Resources.] so it is easier to distinguish the various tail areas. Then use a normal approximation to the distribution of \(d|H_1\) to calculate what percentage of it lies to the left of \(d_{crit}.\) This percentage is called the power of this size study, i.e., the probability that – assuming the \(\lambda_1\) and \(\lambda_0\) are as specified – the study will yield a `positive’ (i.e., statistically significant) result. [It does not mean, as some investigators sloppily write, that the study has this power to detect a rate ratio reduction of 42%.]
# bangladesh cholera vaccine trial jh 2015.11.09
# attrition rate
# exp(-730 * rate ) = 9/16
a.rate = -log(9/16)/730
N=50000 # trial and error
n.c = N * exp( - a.rate *(1:730) )
plot(1:730,n.c, ylim=c(0,N),
col="white", xlab="Time: Follow-Up Day",
ylab="Population Size (Persons)",
cex.lab=1.5, cex.axis=1.5)
polygon(c(1:730,730,1),c(n.c,0,0),
col="lightblue", border=NA )
P1000Y=365*1000
100000*1.6/P1000Y
## [1] 0.4383562
( observed.attack.rate=106/(39327744) )
## [1] 2.695298e-06
PD = sum(n.c); PD # person days
## [1] 27743226
assumed.attack.rate=1.6/(1000*365)
c(assumed.attack.rate, observed.attack.rate)
## [1] 4.383562e-06 2.695298e-06
E.attacks = assumed.attack.rate * PD
E.attacks
## [1] 121.6141
# assuming for now, Y_0 and Y_1 are Poisson
# i.e. no extra-Poisson (between cluster) variation
d.crit = qnorm(0.025) * sqrt(E.attacks + E.attacks) # Var[d | H0] = E + E
# alt
Efficacy = 0.65; Coverage=0.65
alt.Rate.Ratio = 1-Efficacy*Coverage;
alt.Rate.Ratio
## [1] 0.5775
alt.E.attacks = alt.Rate.Ratio * E.attacks
c.crit.z.alt=(d.crit-
(alt.E.attacks-E.attacks) )/
sqrt(alt.E.attacks+E.attacks)
round(c.crit.z.alt, 3)
## [1] 1.503
round(pnorm(c.crit.z.alt), 3) # the power, under these assumptions
## [1] 0.934
## in a diagram
E.diff.n.cases.null = E.attacks - E.attacks
V.diff.n.cases.null = E.attacks + E.attacks
E.diff.n.cases.alt = alt.E.attacks - E.attacks
V.diff.n.cases.alt = alt.E.attacks + E.attacks
possible.diffs = floor( E.diff.n.cases.alt -
3*sqrt(V.diff.n.cases.alt)
):
ceiling( E.diff.n.cases.null +
3*sqrt(V.diff.n.cases.null)
)
prob.mass.fn.null = dnorm(possible.diffs,
E.diff.n.cases.null,
sqrt(V.diff.n.cases.null)
)
prob.mass.fn.alt = dnorm(possible.diffs,
E.diff.n.cases.alt,
sqrt(V.diff.n.cases.alt)
)
plot(possible.diffs,prob.mass.fn.null,
type="h",col="green",
ylim=c(-max(prob.mass.fn.alt),
max(prob.mass.fn.null)),
xlab="N.cases.vaccine.arm - N.cases.ctl.arm",
ylab="Prob. of observing.This difference",
cex.lab=1.5, cex.axis=1.5)
abline(v=0-1.96*sqrt(V.diff.n.cases.null),
col="green",lwd=3)
text(1*sqrt(V.diff.n.cases.null),
2/3*max(prob.mass.fn.null),
"NULL",col="green",adj=c(-0.5,0))
SD=sqrt(V.diff.n.cases.null)
h = exp(-1/2)*max(prob.mass.fn.null)
segments( 0,h,floor(SD),h,
col="green",lwd=2)
text(SD/2,h,toString(round(SD,1)),
col="green",adj=c(0.5,1.25) )
text(0,0,paste("0 = ",
toString(round(E.attacks,1)),
" - ",
toString(round(E.attacks,1)),
sep=""),
col="green",
adj=c(0,1.25) )
points(possible.diffs+0.5,-prob.mass.fn.alt,
type="h", col="red")
text(E.diff.n.cases.alt-1*sqrt(V.diff.n.cases.alt),
-2/3*max(prob.mass.fn.null),
"ALT",col="red",adj=c(1.5,0))
text(E.diff.n.cases.alt,0,
paste(
toString(alt.Rate.Ratio),
"
x ",
toString(round(E.attacks,1)),
"
-
",
toString(round(E.attacks,1)),
"
= ",
toString(round(E.diff.n.cases.alt,1)),
sep=""),
col="red",
adj=c(1,-0.1) )
text(E.diff.n.cases.alt,0,"|",col="red")
OBSERVED DATA IN ACTUAL TRIAL
In the ‘Results’ section, the authors address a measure of the 2-year protection afforded by the vaccine.
They had two ways of obtaining a crude measure:
as \(100 \times (1-RiskRatio),\) where the \(RiskRatio\) is estimated as the ratio of the 2-year (730 day) risks (i.e. approximately \(\widehat{R_1(2y)} = 1-0.9989 = 0.0011\) and \(\widehat{R_0(2y)} = 1-0.9981 = 0.0019\), obtained from the two Kaplan-Meier curves in Figure 3). This gives a crude estimate of \(100\times(1 - 0.0011/0.0019)\) = 42% protection.
as \(100 \times (1-RateRatio),\) or \(100 \times (1-HazardRatio),\) where the \(RateRatio\) or \(HazardRatio\) is estimated as the ratio of the attack rates calculated over the 730 days (i.e. approximately \(\widehat{\lambda_1} = 65/41,809,947PD = 0.1555\) attacks per 100000PD and \(\widehat{\lambda_0} = 106/39,327,744PD = 0.2695\) attacks per 100000PD, obtained from the data in the top row of Table 2). This also gives a crude estimate of \(100\times(1 - 0.1555/0.2695)\) = 42% protection.
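In R, both crude calculations are one-liners (numbers as read from Figure 3 and Table 2; both give 42):
round( 100*(1 - 0.0011/0.0019) ) # risk-based
round( 100*(1 - (65/41809947)/(106/39327744)) ) # rate-based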
Assuming no extra-Poisson variation, we have enough information to directly calculate a CI for the second version. We start by calculating a CI for the \(RateRatio\) or \(HazardRatio\), and then compute \(100 \times (1 - CI).\) But instead of working in the ratio scale, we work in the \(\log [Ratio]\) scale, so that \(Var\{ \log [ \widehat{\lambda_1} / \widehat{\lambda_0} ] \} = Var\{ \log [ \widehat{\lambda_1} ] \} + Var\{ \log [ \widehat{\lambda_0} ] \}.\)
Use your results from exercise 0.1 of the `models for intensity rates’ material to work out the variance (and thus a CI) for the log of the ratio, and from it, a CI for the ratio itself. Then, convert this (slightly asymmetric) CI for the ratio into a (similarly asymmetric) CI to accompany the point estimate of the percent protection. Can you think of reasons why their CI is slightly wider?
With a few approximations and interpolations, and again assuming no extra-binomial variation, we have enough information to directly calculate a Greenwood-based CI for the first version.
\[\widehat{RiskRatio} = \widehat{R_1(2y)} / \widehat{R_0(2y)} \]
\[\textrm{Var}\{ \log \widehat{RiskRatio} \} = \textrm{Var}\{ \log [ \widehat{R_1}] \} + \textrm{Var}\{ \log [ \widehat{R_0} ] \} \]
Writing \(S = 1 - R,\) noting that \(\textrm{Var} [ \hat{R} ] = \textrm{Var} [ 1- \hat{R} ] = \textrm{Var} [ \hat{S} ],\) and using the delta method, we can write each of the two Var{log} components as \[ (1/ \hat{R})^2 \times \textrm{Var} [ \hat{R} ] = (1/ \hat{R})^2 \times \textrm{Var} [ \hat{S} ] = (1/ \hat{R})^2 \times \hat{S}^2 \times \sum \{ n_i^{-2} \} \]
where the sum is over the risksets. In this application, each riskset, i.e., \(n_i\), is very large; if the attacks occur on separate days, so that each \(d_i\) is 1, then each \(d_i/[n_i(n_i-d_i)]\) term in the sum in the Greenwood formula can be approximated by \(1/n_i^2\). So, all we are missing for the two components are the 65 different \(n\)’s, i.e. the numbers at risk in the vaccination arm when each of the 65 attacks occurred, and the 106 numbers at risk in the control arm when each of the 106 attacks occurred. Figure 3 indicates that the events are distributed across the 730 days, but because there is more person time nearer to day 1 than day 730 (see your first set of calculations), the 65 events are concentrated nearer to day 1 too, and so the \(n\)’s at these times tend to be somewhat bigger than the average \(n\).
Generate a best guess for the 65 \(n\)’s at risk, and from them calculate the first variance component for \(\textrm{Var}\{ \log \widehat{RiskRatio} \}.\) Do the same for the other arm (with 106 attacks), and add the two variance components to get the variance of the \(\log RiskRatio\). From this, calculate a CI for \(\log RiskRatio\) and, from it, a CI for \(RiskRatio,\) and, from it, a CI for the \(Percent \ Protection\).
c=c(65,106); PD=c(41809947,39327744)
Rate=c/PD
round(Rate[1]/Rate[2],2)
## [1] 0.58
efficacy=round(100*(1-Rate[1]/Rate[2]),0)
log.ratio=log(Rate[1]/Rate[2])
SE.log = sqrt(sum(1/c))
round(100*(1-exp(log.ratio+c(-1.96,0,1.96)*SE.log)),0)
## [1] 58 42 21
round(100*(1-exp(log.ratio+c(-1.645,0,1.645)*SE.log)),0)
## [1] 55 42 25
N1=94675
a.rate = -log(38866/N1)/730
n.c = N1 * exp( - a.rate*(1:730) )
prob = n.c/sum(n.c)
#plot(1:730,prob, ylim=c(0,max(prob)))
t.events1 = sort( sample(1:730,65, prob=prob) )
n.riskset1 = n.c[t.events1]
Sum = sum(1/(n.riskset1^2))
Var.logR1 = (1/0.0011^2) * (0.9989^2) * Sum
N0=80056
a.rate = -log(37303/N0)/730
n.c = N0 * exp( - a.rate*(1:730) )
prob = n.c/sum(n.c)
t.events0 = sort( sample(1:730,106, prob=prob) )
n.riskset0 = n.c[t.events0]
Sum = sum(1/(n.riskset0^2))
Var.logR0 = (1/0.0019^2) * (0.9981^2) * Sum
Var.logRR = Var.logR1 + Var.logR0
CI.RR = round( 100*exp( log(0.0011/0.0019)+
c(-1.96,0,1.96)*sqrt(Var.logRR)
), 0)
100-CI.RR
## [1] 61 42 13
Refer again to the ‘Statistical analysis’ section, where the authors address the intra-cluster correlation for hospital admissions.
Assume, for simplicity, that all clusters have the same (average) cluster size of \(n=9,001,\) so that the variance inflation factor is \(VIF = 1+(n-1) \times ICC = 1 + 9000 \times ICC.\)
Using the same steps as in the power calculation above, and assuming for now that ICC=0, work out what sample size per arm would be required
for a type II error of 20% (80% power) if one uses a test of size \(\alpha=0.05\) (1-sided). Compare this with the requirement calculated by the authors, and deduce the value of ICC they must have used.
[You may also find the ‘Rate ratios’ section in the article ‘Sample Size, Precision and Power Calculations: A Unified Approach’ by Hanley and Moodie in J Biomet Biostat 2:124 in 2011 to be of help. The article can also be found using the REPRINTS link on JH’s homepage. It calculates the requirements in terms of numbers of events, but it is easy to work back from numbers of events to the numbers of person-days required to generate this many events.]
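Here is one way to sketch the answer in R, under the same simplifying assumptions as above (attrition and attack rates as before, ICC = 0 to start; the 78,780 per arm and the cluster size of 9,001 are the authors’ figures). This is a sketch, not the authors’ calculation, and the back-solved ICC is only as good as these simplifications (equal cluster sizes, pure Poisson variation, Normal approximations).
z.a = qnorm(0.95); z.b = qnorm(0.80) # 1-sided alpha = 0.05, 80% power
RR = 1 - 0.65*0.65 # assumed rate ratio, 0.5775
# control-arm events needed so that d falls below d.crit with probability 0.80
E.0 = ( (z.a*sqrt(2) + z.b*sqrt(1+RR)) / (1-RR) )^2
a.rate = -log(9/16)/730 # attrition rate, from above
PD.per.person = sum( exp(-a.rate*(1:730)) ) # expected follow-up days per person (~555)
attack.rate = 1.6/(1000*365) # 1.6 per 1000 PY, per person-day
n.per.arm = (E.0/attack.rate) / PD.per.person
round( c(events.ctl.arm = E.0, n.per.arm = n.per.arm) )
VIF = 78780/n.per.arm # ratio of the authors' requirement to this one
ICC = (VIF - 1)/9000 # from VIF = 1 + (n-1) x ICC, with n = 9001
round( c(VIF = VIF, ICC = ICC), 5 )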
This article was chosen in part to highlight the silliness of our obsession with ‘survival’ data and our *use of emotionally-laden words*. The title was ‘Marriage risk of cancer research fellows’; the label on the y-axis was ‘Marriage-Free Survival’.
The median in the K-M curve is about 6 years.
The text of the letter reads
11 of 13 participants (85%) got married by the 17-year cutoff.
Taken in isolation, this is not that helpful, since the ‘survival’ curve reaches 20% by the end of year 7 and 10% by the end of year 8 or so. The confusion is introduced by the mention of the ‘17-year cutoff.’
Where does this 17 years come from, given that the survival curve ends at zero at year 8? It comes from the fact that the first fellow entered follow-up in 1993 (the last one entered in 2008) and the analysis was carried out in 2010, some 17 years later. So the 17 is not relevant, and its mention in the results is misleading.
It is easy to determine the times of the 11 marriages, since each marriage results in a downwards jump in the curve. If we add up the 6 single jumps, and the triple jump at the end of year 6, and the double at the end of year 7, our total comes to the 6+3+2 = 11 reported.
It is also possible to determine how far out the two still-unmarried fellows were when the report was prepared. We can do so by noticing where the jump sizes increase. It is clear to the eye that the size of the jump at the end of year 5 (t = 5y) is larger than the size of the jump at t = 4.5y. So there must be a censored value somewhere in the second half of the 5th year. Likewise, the size of the triple-jump at the end of year 6 is more than 3 times the size of the jump at the end of year 5. So there must be a censored value somewhere in that 6th year.
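To see this jump-size logic in action, here is a small sketch with made-up times that are consistent with the letter’s description (6 single jumps, a triple at year 6, a double at year 7, and the two still-unmarried fellows censored at 4.8y and 5.5y); in the summary() output, the jumps get bigger after each censored value:
library(survival)
t.marriage = c(1, 2, 3, 4, 4.5, 5, 6, 6, 6, 7, 7) # 11 marriages (hypothetical times)
t.unmarried = c(4.8, 5.5) # the 2 still-unmarried fellows (hypothetical)
Time = c(t.marriage, t.unmarried)
married = c( rep(1,11), rep(0,2) )
summary( survfit( Surv(Time, married) ~ 1 ) )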
.
[Note: We shouldn’t say ‘at risk’, but we could say something like ‘still-unmarried’ or ‘had not changed their Facebook status to married’ instead. The key is to avoid ‘jargon’ and to instead say what it is in clear language that a lay person would easily understand. The use of the term ‘still unmarried’ emphasizes that marriage (the event) is a transition from the unmarried state to the married state. We are using the word ‘state’ here in the same way that people talk about a person’s ‘Facebook status.’]
JH omitted the descriptor subtitle: ‘A Matched Case-Control Study’.
===========
1.. Comment on the authors’ description of their study.
===========
It is NOT a case-control study, or even a so-called ‘case-control study’.
First, some clarifications, from a renowned teacher of epidemiology, whose 2011 book Epidemiological Research: Terms and Concepts should be regarded as the go-to book.
Case – Concerning a sickness or an illness (or whatever else), an instance of it. Note: A person with a case of an illness is not a case of that illness; and a number of cases of an illness is not a group but a series of these (in a group of people).
Case – A common misnomer for a person with a case of a particular illness (especially in the context of ‘case-control’ studies). (Cf. sect. I – 1. 2.)
Case-control study∗ (synonyms: retrospective study, case-history study) – “The observational epidemiological study of persons with the disease (or other outcome variable) of interest and a suitable control group of persons without the disease (comparison group, reference group). The potential relationship of a suspected risk factor or an attribute to the disease is examined by comparing the diseased and nondiseased with regard to how frequently the attribute or factor is present . . . in each of the groups (diseased and nondiseased) [refs.]” [4].
.
Note 1: Instead of the persons involved, studied is an occurrence relation (abstract).
.
Note 2: Even though the term ‘case-control study’ is commonly – including in the I.E.A. dictionary [4] – used as a synonym for ‘case-base study’ and ‘case-referent study,’ the concept of ‘case-control study’ actually is profoundly different from the referent of those two other terms.
.
Note 3: The concept of ‘case-control’ study – or ‘trohoc’ (heteropalindrome of ‘cohort’) study – is seriously malformed. It represents what may be termed the trohoc fallacy (corresponding to the cohort fallacy). A case series (not group) in any epidemiological study has its meaning only in the context of its referent, the study base, in reference to which it presumably is the totality of cases. Its associated non-case series (not group), in turn, has meaning only if it is a fair sample of the person-moments (infinite in number) constituting the population-time of the study base. And once the two series are thus construed, comparison between them – numerator and denominator series – is seen to be profoundly misguided. Very notably, the alternative to causality – confounding, that is – can never be understood in terms of the “case-control” comparison – but only in terms of the etiogenetic contrast in reference to the study base. (See ‘Directionality,’ ‘Overmatching,’ and ‘Etiologic study.’)
.
Note 4: The terms ‘retrospective study’ and ‘case-history study’ have lost favor in comparison with ‘case-control study,’ while the respective concepts – fundamental fallacies – are the very same. (Cf. ‘Etiologic study.’)
JH’s shorter (and less formal) version is a study that allows the ratio of two (time-based) rates
Rate\(_1\) = No. of cases (\(C_1\)) occurring in a certain amount of ‘exposed’ person.time / this amount of person.time (\(PT_1\)).
and
Rate \(_0\) = No. of cases (\(C_0\)) occurring in a certain amount of ‘un-exposed’ person.time / this amount of person.time (\(PT_0\))
to be estimated from the ‘case’ series, and a ‘denominator’ series, a fair sample of the person-moments making up the base in which, or from which, the cases arose. The base comprises \(PT\) = \(PT_1\) + \(PT_0\) units of Population-Time. Say the sample is of size \(n\).
The \(n\) person-moments are classified as either ‘exposed’ or ‘not exposed’ and the \(n_1: n_0\) ratio is used to estimate the relative sizes of the exposed and unexposed segments of the base. In other words, \[\widehat{PT_1} = [n_1/n] \times PT; \ \ \widehat{PT_0} = [n_0/n] \times PT.\]
If we knew the actual sizes of the 2 segments of the base, we would calculate \[RateRatio = \frac{C_1/PT_1}{C_0/PT_0}.\] Since we don’t, we substitute our best estimates of them, to get
\[\widehat{RateRatio} = \frac{C_1/([n_1/n] \times PT)} {C_0/([n_0/n] \times PT)}.\]
Since \(n\) and \(PT\) cancel out, we have \[\widehat{RateRatio} = \frac{C_1/n_1} {C_0/n_0} = \frac{C_1 \times n_0} {C_0 \times n_1} .\] The computational version on the right hand side is the familiar cross-product ratio that is often mistaken for and confused with an odds ratio. It is not a ratio of two odds; it is a ratio of two ‘quasi-rates’ or ‘quasi-incidence densities’.
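As a toy numerical illustration (entirely hypothetical numbers): suppose the case series contains 40 exposed and 60 unexposed cases, and that 250 of a denominator series of n = 1000 person-moments are classified as exposed.
C1 = 40; C0 = 60 # case series (hypothetical)
n1 = 250; n0 = 750 # denominator series (hypothetical)
c( quasi.rate.ratio = (C1/n1) / (C0/n0),
cross.product = (C1*n0) / (C0*n1) ) # identical; both estimate the rate ratio (= 2 here)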
===========
2.. Overall, the mortality rate in the ICU was 42% (18 of 43 patients) in the IV colistin group, compared with 24% (10 of 43 patients) in the AS-IV colistin group (P = .066).
Show how the authors arrived at their P = .066 for the all-cause mortality comparison. If you used an online calculator, include a screenshot. If you used R, show the code you used.
===========
Several of you got ‘close’, using various ‘canned’ routines.
Some did the calculation by hand (with the help of R). For example:
( z = (18/43 -10/43)/sqrt(
(18/43) * (25/43) / 43 + (10/43) * (33/43) / 43 ) )
## [1] 1.878352
round(2*(1-pnorm(z)),4)
## [1] 0.0603
Another possibility might be that they used a ‘continuity correction’. It would reduce the \(X^2\) statistic and thus increase the p-value.
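R’s chisq.test() makes it easy to see the effect of the correction (Yates’ correction is its default for a 2 x 2 table); a quick check with the mortality table:
deaths.table = matrix( c(18, 25, 10, 33), nrow = 2 ) # columns: IV, AS-IV; rows: died, survived
chisq.test(deaths.table, correct = FALSE) # X-squared = 3.39, p approx. 0.066, as reported
chisq.test(deaths.table, correct = TRUE) # Yates-corrected: X-squared approx. 2.59, p approx. 0.11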
Others, whose teachers told them that one can use separate variances when calculating the SE(diff. of 2 proportions) for a CI, but a common variance when calculating the SE(diff. of 2 proportions) for a test of the null, got
( z = (18/43 -10/43)/sqrt(
(28/86) * (58/86) / 43 + (28/86) * (58/86) / 43 ) )
## [1] 1.840968
round(2*(1-pnorm(z)),3)
## [1] 0.066
which matched theirs.
Here are JH’s Notes from when he taught the Intro biostatistics course for our Epidemiology students.
Notice the difference between the SE on page 1 (for the CI) and the SE on page 2 (for the test of the null H). The common \(p\) in the variance is dictated by the null. The CI reflects whatever the 2 proportions are, whereas the SE for the null is calculated under the common \(p\) specified by the \(H_0.\)
It is not widely appreciated by non-statisticians that the square of the z-statistic is the same as the Chi-Square statistic.
Nor, nowadays, do statisticians themselves use the ‘old-timer’ way of calculating the Chi-Square statistic, using nothing but integers. It avoids having to calculate the 4 expected frequencies in the \(2 \times 2\) table, then the 4 \((O-E)^2/E\) values, and finally their sum. This integer-only version of the Chi-Square statistic is illustrated on page 5 of these Notes.
a=18; b= 10; c = 25; d=33;
r1=a+b; r2=c+d; c1=43; c2=43; n=c1+c2;
( Ch.Sq.statistic = n*(a*d - b*c)^2/(r1*r2*c1*c2) )
## [1] 3.389163
z^2
## [1] 3.389163
round( pchisq(Ch.Sq.statistic,1,lower.tail=FALSE),3)
## [1] 0.066
===========
3.. How the curves were fitted, and the distribution of the times of the 10 and of the 18 deaths.
===========
Suspiciously, the 6 single jumps in the solid line curve are all of size 1/10 = 0.1, and the 2 double jumps are both of size 2/10 = 0.2. This would have to mean that the survival times of the 43-10 = 33 patients who did not die are all censored before the 6th day, the day of the first death. THIS DOES NOT MAKE MUCH SENSE.
Also, the fact that the survival curve goes down to 0 is very suspicious. One would think that the censored survival times are all past the 30 days, so the curve should end with a horizontal straight line at a height of \(S\) = 33/43 = 0.77.
The most logical conclusion is that the Kaplan-Meier curve was calculated just using the survival times of the 10 patients who died. The answer to the next question will support this.
===========
4.. How can the log-rank test give the reported P = .888 for all-cause mortality?
===========
THANKS to James for his ‘data extraction’
library(survival)
for(version in c("Dead Patients Only","All Patients")){
# the 10 deaths, without/with 33 censored included
t1 =c(6,6, 7, 8, 9, 13, 14,14, 25, 27)
dead1 = rep(1,length(t1))
# the 18, with 25 censored
t0 =c(5,5, 6,6,6, 7,7, 8, 9, 10,
12,12, 16,16,16, 20, 29, 30)
dead0 = rep(1,length(t0))
if(version=="All Patients"){
t1 = c(t1, rep(30,33) )
dead1 = c(dead1, rep(0,33) )
t0 =c(t0, rep(30,25))
dead0 = c( dead0, rep(0,25) )
}
Day = c(t1, t0)
iv.plus = c(rep(1,length(t1)),
rep(0,length(t0)) )
dead = c(dead1,dead0)
km = survfit(Surv(Day,dead)~iv.plus)
plot(km, ylab="Proportion Still Alive",
xlab="Day",
cex.lab=1.5, cex.axis=1.5,
lwd=5,
col=c("black","red"),
main=version)
abline(h=seq(0,1,0.1),col="lightblue")
log.rank = survdiff(Surv(Day,dead)~iv.plus)
print(log.rank)
# more decimals
print(c( round(log.rank$chisq,3),
round(pchisq(log.rank$chisq,1,lower.tail=FALSE),3)
)
)
if(version=="All Patients"){ # show details
print(noquote(""))
print(noquote("DETAILS .. by JH"))
print(noquote(""))
risk.set.day = sort( unique(Day[dead==1]) )
print(noquote(""))
print(noquote("Risksets on these days..."))
print(risk.set.day)
n.tables = length(risk.set.day)
m = matrix(NA,n.tables,7)
for(table in 1:n.tables){
table.day = risk.set.day[table]
m[table,1] = table.day
m[table,2] = sum( dead[ dead==1 & Day == table.day &
iv.plus ==1 ] ) # a
m[table,3] = sum( Day >= table.day &
iv.plus ==1 ) - m[table,2] # b
m[table,4] = sum( dead[ dead==1 & Day == table.day &
iv.plus ==0 ] ) # c
m[table,5] = sum( Day >= table.day &
iv.plus ==0 ) - m[table,4] # d
r1 = m[table,2]+ m[table,3]
r2 = m[table,4]+ m[table,5]
c1 = m[table,2]+ m[table,4]
c2 = m[table,3]+ m[table,5]
n = r1+r2
E.0 = r1*c1/n
V.0 = r1*r2*c1*c2/(n^3)
m[table,6] = E.0 # R
m[table,7] = V.0 # V
text(table.day-0.1, 1 - table.day / 30,
toString(m[table,2]),adj=c(1,-0.2))
text(table.day, 1 - table.day / 30,
toString(m[table,3]),adj=c(0,-0.2))
text(table.day-0.1, 1 - table.day / 30,
toString(m[table,4]),adj=c(1,1.2))
text(table.day, 1 - table.day / 30,
toString(m[table,5]),adj=c(0,1.2))
text(table.day -1.5, 1 - table.day / 30,
toString(round(m[table,2]-E.0,1)),
adj=c(1,0.5), col="blue",cex=1.4)
text(table.day -3, 1 - table.day / 30,
toString(round(V.0,4)),adj=c(1,0.5), col="blue",cex=1.4)
}
text(0, 0.15," V[ a | H0 ] and ( a - E[ a | H0 ] )",
col="blue",cex=2.5,adj=c(0,0.5))
print(noquote(""))
print(noquote("Sum(a), Sum(E.a.0), Sum(V.0)"))
print(noquote(""))
sums = apply(m,2,sum)[c(2,6,7)]
print (round( sums,4))
print(noquote(""))
print(noquote("( Sum(a) - Sum(E.a.0) )^2 / Sum(V.0)"))
print(noquote(""))
print( (sums[1]-sums[2])^2 / sums[3] )
}
} # who (dead only, all)
## Call:
## survdiff(formula = Surv(Day, dead) ~ iv.plus)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## iv.plus=0 18 18 18.33 0.00586 0.0199
## iv.plus=1 10 10 9.67 0.01111 0.0199
##
## Chisq= 0 on 1 degrees of freedom, p= 0.9
## [1] 0.020 0.888
## Call:
## survdiff(formula = Surv(Day, dead) ~ iv.plus)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## iv.plus=0 43 18 13.2 1.78 3.43
## iv.plus=1 43 10 14.8 1.58 3.43
##
## Chisq= 3.4 on 1 degrees of freedom, p= 0.06
## [1] 3.427 0.064
## [1]
## [1] DETAILS .. by JH
## [1]
## [1]
## [1] Risksets on these days...
## [1] 5 6 7 8 9 10 12 13 14 16 20 25 27 29 30
## [1]
## [1] Sum(a), Sum(E.a.0), Sum(V.0)
## [1]
## [1] 10.0000 14.8380 6.7355
## [1]
## [1] ( Sum(a) - Sum(E.a.0) )^2 / Sum(V.0)
## [1]
## [1] 3.475096
Not surprisingly, the two versions of the log-rank test give very different results. The comparison based only on those who died replicates the P=0.888 the authors reported.
JH’s suggestion is that some of you write a letter to the Editor explaining who you are and why the concerns in the original letter to the Editor are borne out by your analyses, and asking that the journal fix the error. You might also point out that this wrong way of analyzing data is very effective in minimizing any differences in the full data.
Your survey of the web/textbooks found both versions ‘out there’. C&H show us the rationale behind the more compact (1 term) version. It also fits with the more ‘modern’ (M-H) way of computing a chi-square statistic even in the case of a single 2 x 2 table.
Even though Mantel proposed this method, based on just the ‘a’ cell, back in 1959, it does not seem to have permeated the statistics community. Maybe some of the reason is that it is only suited to 2 x 2 tables, and not to larger ones. See page 8 of these notes for the details, and some worked examples.
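For a single 2 x 2 table, the one-term \((a - E[a|H_0])^2/V\) statistic differs from the Pearson Chi-Square only by a factor of \((n-1)/n\); here is a quick check, using the colistin mortality table from above:
a = 18; b = 10; cc = 25; d = 33 # 'cc' rather than 'c', to avoid masking R's c()
n = a + b + cc + d
r1 = a + b; r2 = cc + d; c1 = a + cc; c2 = b + d
E.a = r1*c1/n
V.a = r1*r2*c1*c2/(n^2*(n-1)) # (central hypergeometric) variance of the 'a' cell
c( one.term = (a - E.a)^2 / V.a,
Pearson = n*(a*d - b*cc)^2/(r1*r2*c1*c2) ) # 3.35 vs 3.39; their ratio is (n-1)/n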
The full article can be found here.
We will focus on the K-M curves in Figure 1 (and the Incidence Rate Ratios in the top half of Table 2) more to get experience with these entities than to make fair comparisons.
Hi Jim,
.
Sorry that I missed your lecture yesterday. I was watching the lecture recording just now, and you talked a lot about one of my favorite words for the past two years, “confounding”. It was excellently explained in a statistical way, but I noticed a small difference in the abstract figure you showed at last, that the bi-directional arrow is not proper if it is a DAG figure. And to be called a confounder in Epi, the direction can only be from the confounder to the “cause” otherwise it’s just a part of the effect of the cause on the outcome. Nevertheless, I am pretty sure that you have already known these things, just in case if it’s a typo.
.
Thanks,
.
Haoyu .
See more on confounding here and here.
.
The ‘adjusted’ comparison in Table 3 does try to make a fair comparison, by using regression models to (synthetically, artificially, mathematically) ‘match’ or ‘handicap’ men on age.
Read the last sentence of the Results section of the Abstract, and identify the table in which this result appears. Comment.
.
They took it from Table 2, which merely shows the crude comparison, i.e., the unadjusted (crude, confounded) incidence ratio. This is a sneaky practice and misleads university media offices, journalists and other casual readers who don’t have the time to read the full paper.
From data in the Figure, construct a dataset of 1104 observations (1 per participant) that comes close to the actual dataset, and use it – and the survival package in R – to generate the K-M curves. It will help to work with an electronic version of the Figure, so that you can enlarge it.
Log-rank test
Here is a version for the 0-10 and 11-20 push-ups groups: it is enough to show the issues.
Year=c(0.2,1:11)
set.seed(1234567)
( AT.RISK=t(matrix(
c( 75, 68, 63, 60, 53, 39, 32,28,26,23,17,17,
200,200,186,184,172,139,118,96,89,78,63,42),12,2) ))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 75 68 63 60 53 39 32 28 26 23 17 17
## [2,] 200 200 186 184 172 139 118 96 89 78 63 42
str(AT.RISK)
## num [1:2, 1:12] 75 200 68 200 63 186 60 184 53 172 ...
( t.event=list(
t1=c(1.01,1.10, 1.95,2.11, 2.95,4.6, 4.7,7.8),
t2=c(1.85,2.03, 3.4 ,4.8, 4.9, 5.35,6.6,7.4, 10.5)) )
## $t1
## [1] 1.01 1.10 1.95 2.11 2.95 4.60 4.70 7.80
##
## $t2
## [1] 1.85 2.03 3.40 4.80 4.90 5.35 6.60 7.40 10.50
L=c("0-10 Pushups","11-20 Pushups")
par(mfrow=c(1,1), mar=c(0,0,0,0))
plot(-5,-1,xlim=c(-0.5,13),ylim=c(-220,220))
for(no in seq(-75,200,25)){
text(-0.05,no,
toString(abs(no)), adj=c(1,0.5))
if(abs(no)<50) text(11.05,no,
toString(abs(no)), adj=c(0,0.5))
}
for(t in 0:11) text(t,220, toString(t), adj=c(0.5,0.5))
text(12.25,220,"Years\nFollowed",adj=c(0.5,0.5))
text(-0.75,-50,"<---------- | ------ No. Being Followed ---> ",
adj=c(0,0.5),srt=90)
RT = list(t1=NULL,t0=NULL)
TIMES=list(t1=NULL,t0=NULL)
Tt = sort(unlist(t.event))
for(row in 1:2){
n.t = function(t) approx(Year,AT.RISK[row,],t)$y
RT[[row]]=n.t( Tt )
t.n = function(n) {
if(n <= AT.RISK[row,12]) return(11)
if(n > AT.RISK[row,12]) return(
approx(AT.RISK[row,], Year,n)$y )
}
Sign = ((row==2)-(row==1))
nn=AT.RISK[row,1]
Times=NULL
for(i in 1:nn){
Times=c(Times,t.n(i))
}
TIMES[[row]]=
c(Times[sample(1:nn,nn-length(t.event[[row]]))],
t.event[[row]])
for(i in 1:nn){
ttt = sort(TIMES[[row]])[nn+1-i]
segments(0, Sign*(i),
ttt, Sign*(i),
col=c("grey20","blue")[row] ,lwd=0.1)
}
segments(unlist(t.event),0,
unlist(t.event),Sign*200,col="white",lwd=1.5)
PT= sum(TIMES[[row]])
n.events = length(t.event[[row]])
n=n.t(t.event[[row]])
r = runif(n,0.1,0.9)
points(t.event[[row]],Sign*r*n,pch=19,cex=0.5,col="red")
lambda.hat = n.events/PT
text(6.2,Sign*65*row,L[row])
text(6.2,Sign*50*row,
paste(toString(round(1000*lambda.hat,1)),"/ 1000 ManYears"),
col="red")
}
tt=sort(unlist(t.event))
dy=40
yy=-110 +c(0,-dy, dy,0,-dy,-2*dy, 0,-dy,
dy,0,-dy,-2*dy,0,0,0,-dy,0)
CEX=0.85
A = 0; EA=0; V=0
for(i in 1:length(tt)){
in.2 = tt[i] %in% t.event[[2]]
a=0; if(in.2) a=1
A=A+a
text(tt[i],yy[i],
paste("a=",toString(a),sep=""),cex=CEX,font=4,col="blue")
n2 = sum(TIMES[[2]]>= tt[i])
n1 = sum(TIMES[[1]]>= tt[i])
EA = EA+n2/(n1+n2)
text(tt[i],yy[i]-dy/3.5,
paste("E: ",toString(round(n2/(n2+n1),2)),sep=""),
cex=CEX*0.75,col="blue")
v = 1*(n1+n2-1)*n1*n2/( (n1+n2-1)*(n1+n2)^2 )
text(tt[i],yy[i]-2*dy/3.5,
paste("v: ",toString(round(v,2)),sep=""),
cex=CEX*0.75,col="blue")
V=V+v
}
ALL.TIMES = c(TIMES[[1]],TIMES[[2]])
EVENT.indicator =c(rep(0, 75-8), rep(1,8 ),
rep(0,200-9), rep(1,9 ) )
GROUP = c( rep(" 0-10",length(TIMES[[1]])),
rep("11-20",length(TIMES[[2]])) )
table(GROUP)
## GROUP
## 0-10 11-20
## 75 200
n = length(GROUP)
df = data.frame(ALL.TIMES , EVENT.indicator, GROUP )
df=df[order(df$ALL.TIMES , df$EVENT.indicator),]
head(df,20); tail(df,20)
## ALL.TIMES EVENT.indicator GROUP
## 59 0.2000000 0 0-10
## 43 0.3142857 0 0-10
## 190 0.6000000 0 11-20
## 56 0.6571429 0 0-10
## 252 0.7000000 0 11-20
## 66 0.7714286 0 0-10
## 84 0.8000000 0 11-20
## 13 0.8857143 0 0-10
## 194 0.9000000 0 11-20
## 31 1.0000000 0 0-10
## 183 1.0000000 0 11-20
## 68 1.0100000 1 0-10
## 129 1.1000000 0 11-20
## 69 1.1000000 1 0-10
## 40 1.2000000 0 0-10
## 235 1.2000000 0 11-20
## 93 1.3000000 0 11-20
## 256 1.4000000 0 11-20
## 122 1.5000000 0 11-20
## 8 1.6000000 0 0-10
## ALL.TIMES EVENT.indicator GROUP
## 180 11 0 11-20
## 210 11 0 11-20
## 212 11 0 11-20
## 213 11 0 11-20
## 215 11 0 11-20
## 216 11 0 11-20
## 219 11 0 11-20
## 220 11 0 11-20
## 224 11 0 11-20
## 226 11 0 11-20
## 234 11 0 11-20
## 242 11 0 11-20
## 244 11 0 11-20
## 245 11 0 11-20
## 246 11 0 11-20
## 250 11 0 11-20
## 251 11 0 11-20
## 260 11 0 11-20
## 261 11 0 11-20
## 266 11 0 11-20
library(survival)
logRank = survdiff( Surv(ALL.TIMES,EVENT.indicator) ~ GROUP)
logRank
## Call:
## survdiff(formula = Surv(ALL.TIMES, EVENT.indicator) ~ GROUP)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## GROUP= 0-10 75 8 4 3.99 5.23
## GROUP=11-20 200 9 13 1.23 5.23
##
## Chisq= 5.2 on 1 degrees of freedom, p= 0.02
text(11,-140,"||\n||\n||\n||\n||\n||\n||\n||\n||")
text(12.2,-80,
paste("Sum(a)\n",toString(A)),cex=CEX,font=4,
col="blue")
text(12.2,-80-3*dy/3.5,
paste("Sum(E[a])\n",toString(round(EA,1))),
cex=CEX*0.75,col="blue")
text(12.2,-80-6*dy/3.5,
paste("Sum(v)\n",toString(round(V,2))),
cex=CEX*0.75,col="blue")
text(12.2,-100-9*dy/3.5,
paste("(9-13.2)^2\n-------------\n 2.96\n\n=",
toString( round((A-EA)^2/V,2))),
cex=CEX*0.75,col="blue")
survdiff(Surv(ALL.TIMES, EVENT.indicator) ~ GROUP)
## Call:
## survdiff(formula = Surv(ALL.TIMES, EVENT.indicator) ~ GROUP)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## GROUP= 0-10 75 8 4 3.99 5.23
## GROUP=11-20 200 9 13 1.23 5.23
##
## Chisq= 5.2 on 1 degrees of freedom, p= 0.02
text(5.7,225,
"\n
N Observed Expected (O-E)^2/E (O-E)^2/V
GROUP= 0-10 75 8 3.83 4.53 5.86
GROUP=11-20 200 9 13.17 1.32 5.86
Chisq= 5.9 on 1 degrees of freedom, p= 0.02",
adj=c(0,1),family="mono", cex=1.2 )
KM = survfit( Surv(ALL.TIMES,EVENT.indicator) ~ GROUP)
par(mfrow=c(1,1), mar=c(5,5,4,0))
plot(KM, ylab=
"Proportion Still Free of Cardiovascular Disease",
xlab="Follow-Up Year",
cex.lab=1.5, cex.axis=1.5,
lwd=5,
col=c("black","red"),
main="Those whose Push-up Exercise Capacity was 0-10 (black) vs. 11-20 (red)")
abline(h=seq(0,1,0.1),col="lightblue")
In case you are wondering why it is called the log rank test, maybe this abstract and this presentation at the (Montreal) CRM Workshop Survival Analysis in 2005 and the Annual Conference of Applied Statistics in Ireland in 2006, might help.
8 * 100000 / 1757 = 455.3215709, back-calculated from the reported event rate of 1757 per 100,000 person-years, and the ‘numerator’ of 8 events.
plot( c(0,10), c(220,-80),
col="white",
xlab="Follow-up Year",
ylab="Population-Sizes (ignore the negative sign)",
cex.axis=1.5, cex.lab=1.5)
AT.RISK=t(matrix(
c( 75, 68, 63, 60, 53, 39, 32,28,26,23,17,
200,200,186,184,172,139,118,96,89,78,63),11,2) )
polygon( c(0, 0:10, 10),
c(0, AT.RISK[2,], 0), col="grey95",border=NA)
rect(0,0, 10,mean( AT.RISK[2,]) )
polygon( c(0, 0:10, 10),
c(0, -AT.RISK[1,], 0), col="grey85",border=NA)
rect(0,0,10,-mean( AT.RISK[1,]) )
text(0, -mean(AT.RISK[1,])/2,
toString( mean(AT.RISK[1,]) ),
adj=c(-0.25,0.5), cex=2 )
text(5, -mean(AT.RISK[1,]), "10",
adj=c(0.5,-0.2) , cex=2 )
8 * 100000 / 1757 # back-calculation of no. of person-years.
## [1] 455.3216
9 * 100000 /625
## [1] 1440
Calculate a crude IRR and 95% CI for the IRR for the 11-20 (index category) versus the 0-10 (reference category) contrast. Compare them with those reported in Table 2. [Hint: work with logIRR, so that its variance is the sum of the variances of the logs of the 2 Poisson random variables – encountered already in exercise 0.1 in the notes on intensity rates:- models / inference / planning ; compute the CI in the log scale, then transfer it back to the IRR scale.]
.
With 8 events in 455.3 PY, and 9 in 1440 PY, the rate.ratio = (9/1440)/(8/455.3) = 0.36.
.
The SE of the log of the ratio is SE.log = sqrt(1/9 + 1/8) = 0.49, so the multiplicative margin of error is MME = exp( qnorm(c(0.025,0.975))*SE.log ) = 0.39, 2.59, and so the CI for the rate ratio is round(rate.ratio*MME,2) = 0.14, 0.93, which closely matches the reported one.
.
In case the concept of a multiplicative margin of error is new and mysterious to you, think of it as a ‘shortcut’ that avoids having to take the log of the observed rate ratio, then calculate the CI in the log scale, then transform these limits back to the natural, ratio, scale. Instead we leave the rate ratio in place, and ‘multiply and divide’ it by the back-transformed margins of error. The MMEs are symmetric in the log scale, and allow us to describe the lower limit as ‘x fold’ or ‘x-times’ lower than the point estimate, and the upper limit as ‘x fold’ or ‘x-times’ higher than the point estimate. It encourages us to think in a multiplicative scale, and to get away from thinking that the only limits are \(+/-\) style limits. Now we have \(\div \ \times\) style limits.
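For completeness, here is the same calculation as a compact R chunk (person-years as back-calculated above); it also shows that the MME route and the work-in-the-log-scale route give identical limits.
events = c(9, 8); PY = c(1440, 455.3) # 11-20 group, then 0-10 group
rate.ratio = (events[1]/PY[1]) / (events[2]/PY[2])
SE.log = sqrt( sum(1/events) )
MME = exp( qnorm(c(0.025, 0.975)) * SE.log )
round( rate.ratio * MME, 2) # via the multiplicative margin of error
round( exp( log(rate.ratio) + qnorm(c(0.025, 0.975)) * SE.log ), 2) # via the log scale -- identical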
Why are the corresponding adjusted IRR (2 vs 1) estimates from model 2 in Table 3 closer to the null (i.e., to IRR=1) than the crude one in Table 2?
.
Because the UNadjusted one compared YOUNGER and fitter men with OLDER and less fit men. If we level the playing field (to use a term we hear a lot these days), or *take away* the starting age-advantage, then naturally, the event rate ratio (the Incidence Density Ratio, the Hazard Ratio) is less extreme.