Comment on the null hypothesis (first sentence of Section 9.2, page 83 of the Moderna protocol).
What does a ‘proportional hazards VE’ (top of p. 84) mean? Is proportional hazards a reasonable assumption?
What is a ‘stratified Cox proportional hazard model’? How is it fitted?
Why use a 1-sided 0.025 significance level? (cf. FDA guidelines).
‘The primary analysis population for efficacy will be the Per-Protocol (PP) Set, defined in Section 9.4. In the primary analysis of efficacy, cases will be counted starting 14 days after the second dose of IP.’ Explain what is meant by the ‘Per-Protocol (PP) Set.’
Rework the statistical calculations shown in Table 6. Do so ‘from scratch,’ using first principles, i.e. don’t use the referenced R package. Begin with the simplest case, where there are no interim analyses.
D = 151
alpha = 0.05
E.null = 0.3
V.P = 1
( P.case.was.V.null = V.P *(1-E.null) /
( V.P *(1-E.null) + 1*1 ) )
## [1] 0.4117647
7/17
## [1] 0.4117647
n.cases.in.V.arm = 0:D
probs.under.null = dbinom(
n.cases.in.V.arm, D, P.case.was.V.null )
( n.critical = max(n.cases.in.V.arm[
pbinom(n.cases.in.V.arm, D, P.case.was.V.null) < alpha/2]) )
## [1] 49
pbinom(n.critical,D,P.case.was.V.null)
## [1] 0.01711484
binom.test(n.critical,D,P.case.was.V.null)
##
## Exact binomial test
##
## data: n.critical and D
## number of successes = 49, number of trials = 151, p-value = 0.0314
## alternative hypothesis: true probability of success is not equal to 0.4117647
## 95 percent confidence interval:
## 0.2506521 0.4053987
## sample estimates:
## probability of success
## 0.3245033
E.alt = 0.6
( P.case.was.V.alt = V.P *(1-E.alt) /
( V.P *(1-E.alt) + 1*1 ) )
## [1] 0.2857143
pbinom(n.critical,D,P.case.was.V.alt)
## [1] 0.873225
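Not part of the Table 6 rework itself, but a natural follow-up from the same first principles: how many total cases D would be needed for roughly 90% power against VE = 60%, with a single (final) analysis? A minimal sketch, reusing the quantities defined above:
# How many cases D for ~90% power (single final analysis, exact binomial)?
power.for.D = function(D){
  n.crit = max( (0:D)[ pbinom(0:D, D, P.case.was.V.null) < alpha/2 ] )
  pbinom(n.crit, D, P.case.was.V.alt)
}
D.grid = 140:200
powers = sapply(D.grid, power.for.D)
min( D.grid[ powers >= 0.90 ] )
# a bit larger than 151; the protocol's 151 comes from a proportional-hazards
# (not exact-binomial) calculation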
library(gsDesign)
gsDesign(k = 3, test.type = 2,
sfu="OF",n.fix = 151)
## Symmetric two-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
##
##
## Analysis N Z Nominal p Spend
## 1 52 3.47 0.0003 0.0003
## 2 103 2.45 0.0071 0.0069
## 3 154 2.00 0.0225 0.0178
## Total 0.0250
##
## ++ alpha spending:
## O'Brien-Fleming boundary.
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 Total E{N}
## 0.0000 0.0003 0.0069 0.0178 0.025 152.7
## 0.2638 0.0565 0.5288 0.3147 0.900 120.6
##
## Lower boundary (futility or Type II Error)
## Analysis
## Theta 1 2 3 Total
## 0.0000 3e-04 0.0069 0.0178 0.025
## 0.2638 0e+00 0.0000 0.0000 0.000
gsDesign(k = 3, test.type = 2,
sfu="Pocock",n.fix = 151)
## Symmetric two-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
##
##
## Analysis N Z Nominal p Spend
## 1 58 2.29 0.011 0.0110
## 2 116 2.29 0.011 0.0079
## 3 174 2.29 0.011 0.0060
## Total 0.0250
##
## ++ alpha spending:
## Pocock boundary.
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 Total E{N}
## 0.0000 0.011 0.0079 0.0060 0.025 170.3
## 0.2638 0.389 0.3421 0.1689 0.900 108.9
##
## Lower boundary (futility or Type II Error)
## Analysis
## Theta 1 2 3 Total
## 0.0000 0.011 0.0079 0.006 0.025
## 0.2638 0.000 0.0000 0.000 0.000
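As a cross-check of the gsDesign boundaries above without using the package itself (a sketch; it assumes the mvtnorm package is available and reads the rounded Z-boundaries off the printouts): under H0 the three interim Z statistics are multivariate normal with Corr(Z_i, Z_j) = sqrt(t_i/t_j) for equally spaced information fractions, so the overall boundary-crossing probability can be computed directly.
library(mvtnorm)  # assumed available; used only to integrate the trivariate normal
t.frac = (1:3)/3                                    # equally spaced information fractions
R = outer(t.frac, t.frac,
          function(a,b) sqrt(pmin(a,b)/pmax(a,b)))  # Corr(Z_i,Z_j) = sqrt(t_i/t_j)
b.OF     = c(3.47, 2.45, 2.00)                      # O'Brien-Fleming boundaries printed above
b.Pocock = c(2.29, 2.29, 2.29)                      # Pocock boundaries printed above
1 - pmvnorm(upper = b.OF,     corr = R)  # close to 0.025 (rounded boundaries; distant lower bound ignored)
1 - pmvnorm(upper = b.Pocock, corr = R)  # close to 0.025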
cases = c(11,185)
PY = c(3304.9, 3273.7)
#See https://jhanley.biostat.mcgill.ca/bios601/COVIDvaccineTrials/VaccineTrials.html
# get binomial-based CI for expected (theoretical) proportion of vaccinated cases of COVID-19, i.e, of all COVID cases, what proportion ($\pi$) would you expect in the vaccinated PY?
binom.test(cases[1], sum(cases))
##
## Exact binomial test
##
## data: cases[1] and sum(cases)
## number of successes = 11, number of trials = 196, p-value < 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.02834578 0.09819815
## sample estimates:
## probability of success
## 0.05612245
If PY were 50:50, then we could back-translate these limits to VE limits, using the expression \[\pi = \frac{100-VE}{200-VE},\] and its inverse
\[VE = \frac{100-200\pi}{1-\pi} = \frac{100-200 \times 0.056}{1-0.056} = 94.1 \ percent.\]
Because the PY were not exactly equal, we can use the more general formula (see the HPV trial):
\[\pi = \frac{F \times (100-VE)}{F \times (100-VE) + (1-F) \times 100},\] where \(F\) is the fraction of all PY that were contributed by those in the Vaccinated arm [here, \(F\) = 3304.9/(3304.9+ 3273.7)].
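To make the back-translation concrete, here is a small sketch (added; not part of the protocol or the report) that inverts the general formula and applies it to the binomial point estimate and limits printed above, using the PY vector defined earlier:
F.V = PY[1]/sum(PY)   # the 'F' of the formula: fraction of all PY in the Vaccinated arm
VE.from.pi = function(pi) 100 * ( 1 - (pi/(1-pi)) * ((1-F.V)/F.V) )
round( VE.from.pi( c(0.09819815, 0.05612245, 0.02834578) ), 1 )
# lower VE limit, point estimate, upper VE limit (in %): roughly 89, 94, 97;
# the point estimate is close to the reported 94.1%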
Likewise, rework the statistical calculations shown in Table 19.
Some authorities are planning to use just a single dose. Can you use the data in the Kaplan-Meier plots to obtain a point and an interval estimate of the efficacy of a single dose, and if you can, what are the limitations? (You might find the information in Fig. 3 of the NEJM report of the BioNTech/Pfizer trial easier to work with.) (See also the paragraph beginning with “Delaying or deferring the second dose of the Pfizer or Moderna vaccine has the same problem” in Risks of altering vaccination protocols are unknown by Montreal Gazette columnist Dr. Christopher Labos, the comments by McGill’s Dr. Donald Vinh, as well as this BBC piece.)
Comment on Senn’s ‘Serendipity dividend?’ piece on the AstraZeneca/Oxford announcement. “As the press release put it, two different dosing regimens demonstrated efficacy with one showing a better profile.”
Paraphrase the message in this item from the StatsChat website at the University of Auckland.
Repeat the statistical calculations in the ‘SputnikV’ report. Also comment on any major design/analysis differences vis-a-vis the other trials you have looked at.
In order to better understand these curves, it would help to be able to recreate the daily numbers of patients receiving a first psychiatric diagnosis as well as the daily numbers at risk. Focus on the two curves (COVID-19 patients vs. influenza patients) in the top left [F20-F48] panel of Figure 1 in the main text, and on ‘Table 1 – Characteristics of the COVID-19 and influenza cohorts before and after matching’ on page 9 of the supplement.
For the Covid group, the rate of being diagnosed with psychiatric illness is ~95 people/day. For the control group, it is ~55 people/day. In total, there were ~10,000 at risk for a first psychiatric diagnosis in the Covid group at day 0, and ~12,750 at risk in the Control group at day 0. Therefore, the total number of patients receiving a first psychiatric diagnosis by day 90 is ~8,542 in the Covid group (85%) and ~4,950 in the Control group (40%).
# input numbers at risk into dataframe
psych_df <- data.frame(
day = c(30,45,60,75,90,30,45,60,75,90),
group = c(rep("Covid",5),rep("Control",5)),
nar = c(7112,5265,3827,2602,1458,11174,10199,9371,8634,7800))
# calculate slope for covid group ~95
(1458-7112)/(90-30)
## [1] -94.23333
# calculate slope for control group ~55
(7800-11174)/(90-30)
## [1] -56.23333
# calculate number at risk at day 0 for covid group
1458 + 95*90
## [1] 10008
# calculate number at risk at day 0 for control group
7800 + 55*90
## [1] 12750
# calculate total number who received first diagnosis in covid group
10000-1458
## [1] 8542
# calculate total number who received first diagnosis in control group
12750-7800
## [1] 4950
# calculate probability of receiving first diagnosis in covid group
8542/10000
## [1] 0.8542
# calculate probability of receiving first diagnosis in control group
4950/12750
## [1] 0.3882353
JH: These calculations are in conflict with the (low) cumulative incidence. There is a lot of censoring. See below.
# Day 0 number at risk: 24857
# Day 30: 7112 * 0.16, etc.
7112*0.16
## [1] 1137.92
5265 * 0.2
## [1] 1053
3827 * 0.4
## [1] 1530.8
2602 * 0.42
## [1] 1092.84
1458 * 0.5
## [1] 729
JH: looks like you are multiplying by CUMULATIVE incidence, rather than (daily) INCIDENCE, so double counting. See below, where the cumulative incidence is ‘de-cumulated’
library(data.table)
library(tidyverse)
library(splitstackshape)
library(survival)
##
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
##
## aml
load(url('https://osf.io/mxa4e/download'))
kmResults=resMain[[1]]$outcomes[[1]]$KM$survival
df <- as.data.frame(kmResults)
colors <- c("COVID-19" = "red", "Influenza" = "blue")
ggplot(df, aes(x = time, y = 1-values1, color="COVID-19")) +
geom_line() +
geom_ribbon(aes(ymin=1-CI1.lower, ymax=1-CI1.upper, color="COVID-19"),
fill="red", alpha=0.2)+
geom_line(aes(x = time, y = 1-values2, color="Influenza"))+
geom_ribbon(aes(ymin=1-CI2.lower, ymax=1-CI2.upper, color="Influenza"),
fill="blue", alpha=0.2)+
xlab("Days") +
ylab("Outcome probability (%)") +
theme_bw() +
scale_colour_manual(values=colors)+
theme(legend.title = element_blank())+
guides(color=guide_legend(override.aes=list(fill=NA,border = NA)))+
ggtitle("Psychiatric illness (F20-F48)")
## Getting Number of events based on data for COVID-19
n.at.risk <- data.frame(time=c(14,30,45,60,75,90),
n.at.risk1=c(25850,7112,5265,3827,2602,1458),
n.at.risk1.st=c(18738,1847,1438,1225,1144,1458),
n.at.risk2=c(24857,11174,10199,9371,8634,7800),
n.at.risk2.st=c(13683,975,828,737,834,7800))
df2 <- merge(df,n.at.risk,by="time",all.x = T)
df2$n.events.1 <- ifelse(df2$time==14,
round(df2$n.at.risk1*(1-df2$values1)),NA)
df2$n.events.1 <- ifelse(df2$time==30,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==14]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==45,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==30]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==60,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==45]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==75,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==60]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==90,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==75]))
,df2$n.events.1)
df2$n.events.2 <- ifelse(df2$time==14,
round(df2$n.at.risk2*(1-df2$values2)),NA)
df2$n.events.2 <- ifelse(df2$time==30,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==14]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==45,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==30]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==60,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==45]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==75,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==60]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==90,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==75]))
,df2$n.events.2)
df3 <- na.omit(df2)
sum(df3$n.events.1)
## [1] 252
sum(df3$n.events.2)
## [1] 272
str(df3)
## 'data.frame': 6 obs. of 13 variables:
## $ time : int 14 30 45 60 75 90
## $ values1 : num 1 0.986 0.975 0.964 0.955 ...
## $ CI1.lower : num 0.999 0.983 0.971 0.959 0.949 ...
## $ CI1.upper : num 1 0.988 0.979 0.968 0.96 ...
## $ values2 : num 1 0.993 0.988 0.983 0.978 ...
## $ CI2.lower : num 0.999 0.992 0.986 0.981 0.975 ...
## $ CI2.upper : num 1 0.995 0.99 0.986 0.98 ...
## $ n.at.risk1 : num 25850 7112 5265 3827 2602 ...
## $ n.at.risk1.st: num 18738 1847 1438 1225 1144 ...
## $ n.at.risk2 : num 24857 11174 10199 9371 8634 ...
## $ n.at.risk2.st: num 13683 975 828 737 834 ...
## $ n.events.1 : num 11 97 57 44 24 19
## $ n.events.2 : num 9 69 54 45 50 45
## - attr(*, "na.action")= 'omit' Named int [1:72] 2 3 4 5 6 7 8 9 10 11 ...
## ..- attr(*, "names")= chr [1:72] "2" "3" "4" "5" ...
# df.covid <- df3 %>% select(time, n.at.risk1, n.events.1,values1,n.at.risk1.st)
df.covid = df3[,c("time", "n.at.risk1", "n.events.1",
"values1","n.at.risk1.st")]
df.covid.2 <- expandRows(df.covid, "n.at.risk1.st", count.is.col = TRUE, drop = F)
df.covid.2 <- df.covid.2 %>% group_by(time) %>% mutate(observation = 1:n())
df.covid.2$status <- ifelse(df.covid.2$time==14 & df.covid.2$observation<=11,1,
ifelse(df.covid.2$time==30 & df.covid.2$observation<=97,1,
ifelse(df.covid.2$time==45 & df.covid.2$observation<=57,1,
ifelse(df.covid.2$time==60 & df.covid.2$observation<=44,1,
ifelse(df.covid.2$time==75 & df.covid.2$observation<=24,1,
ifelse(df.covid.2$time==90 & df.covid.2$observation<=29,1,
0))))))
covid.fit.log <- summary(survfit(Surv(df.covid.2$time,df.covid.2$status)~1), conf.type="log")
covid.fit.log
## Call: survfit(formula = Surv(df.covid.2$time, df.covid.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 25850 11 1.000 0.000128 0.999 1.000
## 30 7112 97 0.986 0.001381 0.983 0.989
## 45 5265 57 0.975 0.001960 0.971 0.979
## 60 3827 44 0.964 0.002565 0.959 0.969
## 75 2602 24 0.955 0.003118 0.949 0.961
## 90 1458 29 0.936 0.004641 0.927 0.945
covid.fit.log.log <- summary(survfit(Surv(df.covid.2$time,df.covid.2$status)~1), conf.type="log-log")
covid.fit.log.log
## Call: survfit(formula = Surv(df.covid.2$time, df.covid.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 25850 11 1.000 0.000128 0.999 1.000
## 30 7112 97 0.986 0.001381 0.983 0.989
## 45 5265 57 0.975 0.001960 0.971 0.979
## 60 3827 44 0.964 0.002565 0.959 0.969
## 75 2602 24 0.955 0.003118 0.949 0.961
## 90 1458 29 0.936 0.004641 0.927 0.945
##Comparing with the original data for COVID-19:
data.frame("Time"=unname(df3[1]),"Surv"=as.vector(covid.fit.log[6]),"Surv from data"=unname(df3[2]),
CI1.lower=covid.fit.log[c(14)],"lower from data"=unname(df3[3]),
CI1.upper=covid.fit.log[c(15)],"upper from data"=unname(df3[4]))
## Time surv Surv.from.data lower lower.from.data upper
## 1 14 0.9995745 0.9995575 0.9993231 0.9992011 0.9998259
## 17 30 0.9859414 0.9858750 0.9832392 0.9831579 0.9886509
## 32 45 0.9752673 0.9752556 0.9714331 0.9714446 0.9791167
## 47 60 0.9640544 0.9640431 0.9590403 0.9591157 0.9690948
## 62 75 0.9551623 0.9549915 0.9490705 0.9490222 0.9612932
## 77 90 0.9361639 0.9422307 0.9271119 0.9342407 0.9453042
## upper.from.data
## 1 0.9997549
## 17 0.9881564
## 32 0.9785635
## 47 0.9683865
## 62 0.9602766
## 77 0.9492763
## Getting Number of events based on data for Influenza
# df.influ <- df3 %>% select(time, n.at.risk2, n.events.2,values2,n.at.risk2.st)
df.influ <- df3[, c("time", "n.at.risk2", "n.events.2",
"values2","n.at.risk2.st") ]
df.influ.2 <- expandRows(df.influ, "n.at.risk2.st", count.is.col = TRUE, drop = F)
df.influ.2 <- df.influ.2 %>% group_by(time) %>% mutate(observation = 1:n())
df.influ.2$status <- ifelse(df.influ.2$time==14 & df.influ.2$observation<=9,1,
ifelse(df.influ.2$time==30 & df.influ.2$observation<=69,1,
ifelse(df.influ.2$time==45 & df.influ.2$observation<=54,1,
ifelse(df.influ.2$time==60 & df.influ.2$observation<=45,1,
ifelse(df.influ.2$time==75 & df.influ.2$observation<=50,1,
ifelse(df.influ.2$time==90 & df.influ.2$observation<=45,1,
0))))))
influenza.fit.log <- summary(survfit(Surv(df.influ.2$time,df.influ.2$status)~1), conf.type="log")
influenza.fit.log
## Call: survfit(formula = Surv(df.influ.2$time, df.influ.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 24857 9 1.000 0.000121 0.999 1.000
## 30 11174 69 0.993 0.000750 0.992 0.995
## 45 10199 54 0.988 0.001033 0.986 0.990
## 60 9371 45 0.983 0.001247 0.981 0.986
## 75 8634 50 0.978 0.001477 0.975 0.981
## 90 7800 45 0.972 0.001691 0.969 0.975
influenza.fit.log.log <- summary(survfit(Surv(df.influ.2$time,df.influ.2$status)~1), conf.type="log-log")
influenza.fit.log.log
## Call: survfit(formula = Surv(df.influ.2$time, df.influ.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 24857 9 1.000 0.000121 0.999 1.000
## 30 11174 69 0.993 0.000750 0.992 0.995
## 45 10199 54 0.988 0.001033 0.986 0.990
## 60 9371 45 0.983 0.001247 0.981 0.986
## 75 8634 50 0.978 0.001477 0.975 0.981
## 90 7800 45 0.972 0.001691 0.969 0.975
data.frame("Time"=unname(df3[1]),"Surv"=as.vector(influenza.fit.log[6]),"Surv from data"=unname(df3[5]),
CI1.lower=influenza.fit.log[c(14)],"lower from data"=unname(df3[6]),
CI1.upper=influenza.fit.log[c(15)],"upper from data"=unname(df3[7]))
## Time surv Surv.from.data lower lower.from.data upper
## 1 14 0.9996379 0.9996518 0.9994015 0.9993310 0.9998745
## 17 30 0.9934651 0.9934583 0.9919953 0.9918545 0.9949371
## 32 45 0.9882051 0.9881514 0.9861827 0.9859959 0.9902316
## 47 60 0.9834597 0.9833711 0.9810189 0.9807940 0.9859065
## 62 75 0.9777644 0.9777117 0.9748737 0.9746913 0.9806637
## 77 90 0.9721234 0.9721212 0.9688147 0.9686907 0.9754435
## upper.from.data
## 1 0.9998188
## 17 0.9947472
## 32 0.9899769
## 47 0.9856050
## 62 0.9803753
## 77 0.9751807
# Fig 1 top left
day=c(14,30,45,60,75,90)
n.at.risk.covid =c(24857, 7112, 5265,3827,2602,1458)
n.at.risk.flu =c(25850,11174,10199,9371,8634,7800)
# take advantage of approx function to interpolate
( N.at.risk.flu = round(approx(day,n.at.risk.flu,14:90)$y) )
## [1] 25850 24933 24016 23098 22181 21264 20346 19429 18512 17595 16678 15760
## [13] 14843 13926 13008 12091 11174 11109 11044 10979 10914 10849 10784 10719
## [25] 10654 10589 10524 10459 10394 10329 10264 10199 10144 10089 10033 9978
## [37] 9923 9868 9813 9757 9702 9647 9592 9537 9481 9426 9371 9322
## [49] 9273 9224 9174 9125 9076 9027 8978 8929 8880 8831 8781 8732
## [61] 8683 8634 8578 8523 8467 8412 8356 8300 8245 8189 8134 8078
## [73] 8022 7967 7911 7856 7800
( N.at.risk.covid = round(approx(day,n.at.risk.covid,14:90)$y) )
## [1] 24857 23748 22639 21530 20421 19312 18203 17094 15984 14875 13766 12657
## [13] 11548 10439 9330 8221 7112 6989 6866 6743 6619 6496 6373 6250
## [25] 6127 6004 5881 5758 5634 5511 5388 5265 5169 5073 4977 4882
## [37] 4786 4690 4594 4498 4402 4306 4210 4115 4019 3923 3827 3745
## [49] 3664 3582 3500 3419 3337 3255 3174 3092 3010 2929 2847 2765
## [61] 2684 2602 2526 2449 2373 2297 2221 2144 2068 1992 1916 1839
## [73] 1763 1687 1611 1534 1458
# Make 2 'POPULATION-TIME' PLOTS
# (areas are total amounts of population-time)
par(mfrow=c(1,1),mar=c(4,4,0,0))
plot(c(14,90),c(-30000,30000),col="white",
xlab="Day", ylab="No. at Risk")
polygon(c(14, 14:90, 90,14),
c(0,N.at.risk.covid,0,0),
border=NA,
col="pink")
polygon(c(14, 14:90, 90,14),
c(0,-N.at.risk.flu,0,0),
border=NA,
col="lightblue")
Rough re-construction of Numbers at Risk
cbind(14:90,N.at.risk.flu, N.at.risk.covid)[1:10,]
## N.at.risk.flu N.at.risk.covid
## [1,] 14 25850 24857
## [2,] 15 24933 23748
## [3,] 16 24016 22639
## [4,] 17 23098 21530
## [5,] 18 22181 20421
## [6,] 19 21264 19312
## [7,] 20 20346 18203
## [8,] 21 19429 17094
## [9,] 22 18512 15984
## [10,] 23 17595 14875
# Daily Number of Cases = daily rate * N. at Risk
# Daily Rate (eye fit)
# Theoretically, Risk = 1- exp [- integral of rate function ]
# if ultimate (90 day) risk is low,
# Risk = sum of daily rates, so
# (ave. of daily rates = 90-day-risk / No. of days)
# 90-day risks are 0.028 and 0.058, so ...
daily.incidence.rate.flu = 0.028/(90-13)
n.cases.flu.cohort = round(N.at.risk.flu *
daily.incidence.rate.flu )
daily.incidence.rate.covid = 0.058/(90-13)
n.cases.covid.cohort = round( N.at.risk.covid *
daily.incidence.rate.covid)
cbind( day, n.cases.flu.cohort,n.cases.covid.cohort )
## Warning in cbind(day, n.cases.flu.cohort, n.cases.covid.cohort): number of rows
## of result is not a multiple of vector length (arg 1)
## day n.cases.flu.cohort n.cases.covid.cohort
## [1,] 14 9 19
## [2,] 30 9 18
## [3,] 45 9 17
## [4,] 60 8 16
## [5,] 75 8 15
## [6,] 90 8 15
## [7,] 14 7 14
## [8,] 30 7 13
## [9,] 45 7 12
## [10,] 60 6 11
## [11,] 75 6 10
## [12,] 90 6 10
## [13,] 14 5 9
## [14,] 30 5 8
## [15,] 45 5 7
## [16,] 60 4 6
## [17,] 75 4 5
## [18,] 90 4 5
## [19,] 14 4 5
## [20,] 30 4 5
## [21,] 45 4 5
## [22,] 60 4 5
## [23,] 75 4 5
## [24,] 90 4 5
## [25,] 14 4 5
## [26,] 30 4 5
## [27,] 45 4 4
## [28,] 60 4 4
## [29,] 75 4 4
## [30,] 90 4 4
## [31,] 14 4 4
## [32,] 30 4 4
## [33,] 45 4 4
## [34,] 60 4 4
## [35,] 75 4 4
## [36,] 90 4 4
## [37,] 14 4 4
## [38,] 30 4 4
## [39,] 45 4 3
## [40,] 60 4 3
## [41,] 75 4 3
## [42,] 90 4 3
## [43,] 14 3 3
## [44,] 30 3 3
## [45,] 45 3 3
## [46,] 60 3 3
## [47,] 75 3 3
## [48,] 90 3 3
## [49,] 14 3 3
## [50,] 30 3 3
## [51,] 45 3 3
## [52,] 60 3 3
## [53,] 75 3 3
## [54,] 90 3 2
## [55,] 14 3 2
## [56,] 30 3 2
## [57,] 45 3 2
## [58,] 60 3 2
## [59,] 75 3 2
## [60,] 90 3 2
## [61,] 14 3 2
## [62,] 30 3 2
## [63,] 45 3 2
## [64,] 60 3 2
## [65,] 75 3 2
## [66,] 90 3 2
## [67,] 14 3 2
## [68,] 30 3 2
## [69,] 45 3 2
## [70,] 60 3 2
## [71,] 75 3 1
## [72,] 90 3 1
## [73,] 14 3 1
## [74,] 30 3 1
## [75,] 45 3 1
## [76,] 60 3 1
## [77,] 75 3 1
sum( n.cases.flu.cohort )
## [1] 318
sum( n.cases.covid.cohort )
## [1] 384
path = "/Users/jameshanley/OneDrive - McGill University/work/RF/OSF_COVIDPsychiatricSequelae.Rdata"
x = load(path)
y = get(x[1])
psy = y
# Extract data
timepoints <- resMain[[1]]$outcomes[[1]]$KM$survival$time
v1 <- resMain[[1]]$outcomes[[1]]$KM$survival$values1 # COVID-19 cohort
v2 <- resMain[[1]]$outcomes[[1]]$KM$survival$values2 # Influenza cohort
plot(timepoints,v1, pch=19,cex=0.5,
type="l",
ylab="Survival (proportion without Psych Dx)",
xlab="Days post Covid-19 Dx",
ylim=c(0.93, 1))
lines(timepoints,v2, pch=19,cex=0.5)
# CIs:
ci1 <- resMain[[1]]$outcomes[[1]]$KM$survival$CI1 # COVID-19 cohort
ci2 <- resMain[[1]]$outcomes[[1]]$KM$survival$CI2 # Influenza cohort
segments(timepoints, ci1[,1],
timepoints, ci1[,2] )
segments(timepoints, ci2[,1],
timepoints, ci2[,2] )
The team’s graph was entitled
Psychiatric diagnosis in women aged 20-48 y
JH: this is an amusing ‘translation’ of the abbreviations F20-F48! It is important to read not just the statistical methods section but also the other sections. Under ‘Variables of interest and their coding’ one can read
We defined a psychiatric illness as any of the ICD-10 codes F20–F48, comprising psychotic (F20–F29), mood (F30–F39), and anxiety (F40–F48) disorders.
# From rearranging the Greenwood formula, we get the variance:
# this assumes the “plain” conf.interval type and 95% interval
var <- ((v1 - ci1[,1])/1.96)^2 #covid-19 group
var2 <- ((v2 - ci2[,1])/1.96)^2 #influenza group
plot(timepoints,var, pch=19,
xlab="Days post Covid-19 Dx",
cex=0.5,
ylab="Variance[Survival proportion]")
points(timepoints,var2, pch=19,cex=0.5)
We used the Greenwood formula to estimate the variance of the survival function at each timepoint. As seen in the graph above, the variance increases over time, as the number at risk decreases. In the Covid-19 cohort, this increase in variance is more rapid compared to the influenza cohort.
JH: Good start, by getting approximate Greenwood Variance (GV). With the passage of time, as S falls, the first term in the variance falls, but the second term increases. See below on how to solve for the \(n_j\)’s.
JH: continued…
\[GV(\widehat{S[t]}) = S(t)^2 \times \sum_j \frac{1-s_j}{n_j \times s_j}, \] where the summation is over (risksets) \(j \le t\), where \(s_j\) refers to the conditional survival probability for riskset \(j\), and \(n_j\) is the size of the \(j \ th\) riskset.
Since the daily incidence is low (in the COVID cohort, for example, each \(1 - s_j \approx 0.058/75\)) and thus each \(s_j\) is close to 1, the denominators in the sum are close to \(n_j\).
If there were NO censoring, so that the \(n_j\) decrease only because of the new psych. diagnoses, the entire sum would be approximately
\[ \sum_{75 \ risksets} \frac{0.058/75}{slowly \ decreasing \ n's \ \times \ quantities \ close \ to \ 1} \] So, if S[90] = 1-0.058 = 0.942, then (approximately) \[GV(\widehat{S[t]}) = 0.942^2 \times \frac{0.058}{0.942 \times original \ n} = \frac{0.942 \times 0.058}{original \ n}.\]
This is the pure (single-binomial, at the end of 90 days) variance. Obviously, if the daily \(n_j\) decrease for other reasons, and by large amounts, then the GV will be a lot larger. Indeed, how much larger the variance of \(\widehat{S[90]}\) actually is than the simple binomial \(0.058 \times 0.942\) / ‘starting-out \(n\)’ gives a good sense of how much censoring there is. Of course, at the bottom of the Figure, we are given the numbers at risk at selected timepoints, and so we can see for ourselves. But, in situations where such numbers at risk are not provided, the extra variance can be a good indicator of the degree of censoring.
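A rough numerical sketch of this point, using the day-90 COVID-19 figures that appear in the output further below (S[90] from the data ≈ 0.942, logit-based limits ≈ 0.934 and 0.949, and 24,857 at risk at day 14); the half-width/1.96 step gives only an approximate SE because the reported CI is asymmetric:
# Approximate check: how much does censoring inflate Var(S-hat[90]) in the COVID-19 cohort?
SE.no.censoring = sqrt( 0.058 * 0.942 / 24857 )      # 'pure binomial', starting-out n
SE.implied      = (0.9493 - 0.9342) / (2 * 1.96)     # from the reported (logit-based) CI
round( c(SE.no.censoring, SE.implied), 4 )
( SE.implied / SE.no.censoring )^2                   # variance ratio: roughly 7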
Orientation [from letter under review]
Reconstruction Methods
The daily numbers of patients at risk, and the daily numbers of new cases, were back-calculated from the Rdata file (supplied as a supplement to the published article) using the widths of the confidence intervals and the cumulative incidences. The R code is given below.
The numbers at risk were back-calculated from the confidence intervals (CIs) and point estimates.
We illustrate using the day-14 point and interval estimate.
The CI was asymmetric, so we used trial and error to determine that the CI option used by the survfit program in R was the logit-based one. By using the appropriate Jacobian, we were thus able to back-calculate the Greenwood variance for S(14) – this variance is at the base of all CI options.
The Greenwood variance for S[14] is a function of S[14] itself (provided in the RData file), as well as the number of events and the number at risk. S[14] is a different function of the number of events and the number at risk. So, from the variance and the point estimate, we can solve for the 2 unknowns.
Again, the variance for S[15] involves S[15] itself, but it now involves a sum of two variances, the first of which we already have calculated when dealing with S[14], and the second which we obtain by subtraction. Knowing that S[15] is a product of two conditional probabilities, the first of which is S[14] itself, we can get the second. So, again, we can solve two equations to get the number of events and the number at risk for day 15.
And so on.
We were able to double-check our calculations in two ways.
First, we checked that our back-calculated numbers at risk matched the selected ones (at days 30, 45, etc) reported in the article.
Second, we ran our reconstructed daily numbers of cases and numbers at risk through the R survfit program, with the logit-based CI option, and compared the output with that in the Rdata file supplied.
First 2 calculations, in symbols:
upper case S = ‘cumulative’, lower case s = conditional.
The GV for \(S[14]\) is \[S[14]^2 \times \frac{1-s_{14}}{n_{14} \times s_{14}}\] For the very first day, \(S[14] = s_{14}\); we are given \(S[14]\), so if we can back-calculate the GV for \(S[14]\), we can solve for \(n_{14}\). We back-calculate the GV from the limits of the CI for \(S[14]\), which are included in the file (see below).
Now consider day 15, where \[S[15] = s_{14} \times s_{15}.\] We are given \(S[15]\), so we can back-calculate \(s_{15}\).
The GV for \(S[15]\) is \[S[15]^2 \times \{ \frac{1-s_{14}}{n_{14} \times s_{14}} + \frac{1-s_{15}}{n_{15} \times s_{15}} \}\] Again if we can back-calculate the GV from the limits of the CI for \(S[15]\), which are included in the file, we can solve for \(n_{15}\),
and so on
If the authors had used symmetric CIs, it would be very easy to back-calculate the GV: one would simply divide the width of the CI by 2 \(\times\) 1.96, and square this quantity.
However, as is clear from the graph above, the CI is not symmetric, so now it is a matter of figuring out which ‘link’ function (transformation) was used. JH did this by trial and error, each time using the Delta method to connect the variance of the transform and the GV value.
It appears that the authors used the logit transform. (This keeps the CI within the (0,1) bound for S)
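A minimal single-day illustration of this back-calculation, using the day-14 COVID-19 values shown in the comparison table above; the logit/delta-method step is the same one used in the ‘de luxe’ code that follows:
# Day-14 illustration: recover n at risk (and the number of events) from S[14]
# and its logit-based CI limits (values taken from the comparison table above)
logit = function(x) log(x/(1-x))
S14 = 0.9995575; L14 = 0.9992011; U14 = 0.9997549
SE.logit = ( logit(U14) - logit(L14) ) / (2*1.96)
GV14 = ( SE.logit * S14*(1-S14) )^2   # delta method: Var(S-hat) from Var(logit(S-hat))
( n14 = S14*(1-S14) / GV14 )          # close to the 24857 reported at risk on day 14
( events14 = n14 * (1-S14) )          # close to the 11 events in the survfit output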
# de luxe
# First, some double-checking of rescaling
# factors involved in variances of functions
# of random variables
logit = function(x) log(x/(1-x))
cll = function(x) log(-log(1-x))
S = 0.9; dS = 0.001
( VLogitSDirect = 1/(S*(1-S)) )
## [1] 11.11111
VS = S*(1-S)
( J = diff( logit(S + c(-dS/2,dS/2) ) /dS ) )
## [1] 11.11119
J^2
## [1] 123.4586
J^2 * ( S*(1-S) )
## [1] 11.11128
# Now to the 'data'
setwd("/Users/jameshanley/OneDrive - McGill University/work/RF")
load('OSF_COVIDPsychiatricSequelae.Rdata')
par(mfrow=c(2,3),mar=c(0,0,0,0))
Y0 = 55000
dS = 11
panels=1:6 # (all 6)
panels=1:1 # (just 1st)
if(length(panels)==6) par(mfrow=c(2,3),mar=c(0,0,0,0))
if(length(panels)==1) par(mfrow=c(1,1),mar=c(0,0,0,0))
for(ctlGp in panels){
km=resMain[[ctlGp]]$outcomes[[1]]$KM$survival
index=resMain[[ctlGp]]$cohort1
ref =resMain[[ctlGp]]$cohort2
outcome = resMain[[ctlGp]]$outcomeNames[1]
for(i in 1:2){
t = km$time
L = km$CI1$lower
S = km$values1
U = km$CI1$upper
if(i==2){
L = km$CI2$lower
S = km$values2
U = km$CI2$upper
}
# print(100*min(1-S))
m=cbind(t,L,S,U)
ii=1:50
t(apply( m[ii,2:4], 1, diff))
t(apply( cll(m[ii,2:4]), 1, diff))
t(apply( logit(m[ii,2:4]), 1, diff))
t(apply( m[ii,2:4], 1, diff))
W = (2*qnorm(0.975))
W = 2*1.96
SE = (logit(U) - logit(L))/W
Vlogit = SE^2
J = 1/(S * (1-S))
Vgreenwood = Vlogit / J^2
Sum = Vgreenwood/(S^2)
( v = c(Sum[1],diff(Sum[1:77])) )
n = rep(NA,77)
q= c( 1-S[1], 1- S[2:77]/S[1:76] )
imputed.n = ( q/(1-q) ) / v
n = imputed.n
# print(n)
for.missing = approx(1:77,n,1:77)$y
n[is.na(n)] = for.missing[is.na(n)]
# print(n)
n = round(n)
# print(n)
y = round( n[1:77] * q ) ; print(sum(y))
y[is.na(y)] = 0
YLAB=""
if(ctlGp %in% c(10,14)) YLAB="No. Persons At Risk"
title = paste(outcome,"; ",index,"vs.",ref)
if(i==1) cohort = index
if(i==2) cohort = ref
if(i==1){
plot(14:90,n,col="white",
xlim=c(10,100),
#xlab="Days Post COVID-19 Diagnosis",
xlab="",
ylim=c(-Y0,Y0+Y0/1.8),
ylab=YLAB,
yaxt="n",xaxt="n",
cex.main=1.8
)
abline(h=seq(-50000,50000,10000),
lwd=0.6,col="grey75")
}
xx = as.numeric(t(matrix(c(14:90,15:91),77,2)))
yy = rep(n,each=2)
if(i==2) polygon(c(xx,91,14),-c(yy,0,0),
col="lightblue",border=NA)
if(i==1) polygon(c(xx,91,14), c(yy,0,0),
col="pink",border=NA)
for(j in 1:77) {
cases = y[j]
Ymin=0; Ymax=n[j]
if(i==2) { Ymin = -n[j] ; Ymax = 0 }
if(cases>0) points(
14+j-1+runif(cases,0.05,0.95),
runif(cases,Ymin+50, Ymax-50),
pch=19,cex=0.15
)
}
yy = (Y0/1.6)*(i==1) -(Y0/1.6)*(i==2)
points(75,yy,pch=19,cex=0.2)
text( 100,yy, paste(toString(sum(y)),"Cases"),
adj=c(1,.5),cex=1.5)
YY=Y0+0.1*Y0
lines(t,YY+(Y0/2)*(1-S)*dS,
type='s',
col=c("pink","lightblue")[i],
lwd=2)
segments(14,YY,91,YY)
segments(14,YY,14,YY+Y0/2.5)
text((14+90)/2, YY -Y0/15,
"Day",adj=c(0.5,1.25))
PCT = round(100*(1-S[77]),1)
pct = toString( PCT )
txt =substr(pct,1,1)
text(94.5,YY+(Y0/2)*(1-S[77])*dS, txt,
adj=c(1,0.5) )
points(95,YY+(Y0/2)*(1-S[77])*dS,pch=19,cex=0.35)
txt =paste(substr(pct,3,3),"%",sep="")
text(95.5,YY+(Y0/2)*(1-S[77])*dS, txt,
adj=c(0,0.5) )
for(day in c(14,seq(30,90,15))){
a = 20-40*(i==2)
if(day==14) a = 0
text(day,YY,toString(day),
adj=c(0.5,1.25))
yy = n[day-13]+5000
if(i==2) yy = -yy +100
text(day,yy,
toString(n[day-13]),
adj=c(0.4,0.5),
srt=a)
if(day==14) text(20,yy,cohort, adj=c(0,0.5),
col= c("pink","lightblue")[i],
cex=1.4, font=2)
}
if(i==1) text(15, Y0 +Y0/1.65,
"Cumulative\nIncidence",
cex=1, adj=c(0,1.05) )
if(ctlGp ==1 ) print(n)
} # i
} # 6
## [1] 302
## [1] 24857 9914 9581 9314 9098 8895 8717 8568 8397 8199 8014 7841
## [13] 7694 7554 7433 7272 7112 6956 6823 6709 6580 6471 6337 6192
## [25] 6052 5962 5852 5740 5610 5500 5367 5265 5168 5059 4956 4865
## [37] 4763 4663 4558 4455 4360 4273 4186 4096 4015 3915 3827 3759
## [49] 3694 3618 3495 3399 3324 3254 3160 3099 3025 2951 2848 2758
## [61] 2670 2602 2524 2450 2368 2287 2206 2129 2053 1979 1906 1831
## [73] 1745 1675 1593 1515 1458
## [1] 285
## [1] 25850 12352 12250 12167 12085 12016 11930 11858 11782 11694 11600 11517
## [13] 11448 11386 11329 11257 11174 11099 11024 10962 10895 10830 10768 10688
## [25] 10629 10563 10508 10446 10393 10342 10268 10199 10145 10093 10026 9953
## [37] 9899 9834 9783 9736 9689 9645 9581 9528 9471 9417 9371 9336
## [49] 9286 9234 9182 9128 9060 9020 8965 8916 8863 8816 8764 8725
## [61] 8674 8634 8584 8534 8474 8410 8363 8312 8271 8230 8176 8119
## [73] 8046 7973 7912 7856 7800
# back translation to check that it works
# this 'if' statement turns a chunk of code on/off
# 1==2 turns it off
# 1==1 turns it on
# (saves comments)
if(1==1){
y
n
s = n-y
(c = c(s[1:76] - n[2:77],s[77]))
time=NULL
dx=NULL
for(i in 1:77){
if(y[i] > 0) {
time=c(time,rep(13+i,y[i]))
dx =c(dx,rep(1,y[i]))
}
if(c[i] > 0) {
time=c(time,rep(13+i,c[i]))
dx =c(dx,rep(0,c[i]))
}
}
sum(dx)
length(dx)
library(survival)
fit = survfit(Surv(time,dx)~1,conf.type="logit")
# plot(fit$time,1-fit$surv)
round(cbind(L[1:77], fit$lower, U[1:77],fit$upper),4)
}
## [,1] [,2] [,3] [,4]
## [1,] 0.9993 0.9993 0.9998 0.9998
## [2,] 0.9984 0.9984 0.9994 0.9994
## [3,] 0.9979 0.9979 0.9992 0.9992
## [4,] 0.9974 0.9974 0.9988 0.9988
## [5,] 0.9971 0.9971 0.9986 0.9986
## [6,] 0.9968 0.9968 0.9984 0.9984
## [7,] 0.9964 0.9964 0.9982 0.9982
## [8,] 0.9956 0.9956 0.9976 0.9976
## [9,] 0.9953 0.9953 0.9974 0.9974
## [10,] 0.9949 0.9949 0.9971 0.9971
## [11,] 0.9943 0.9943 0.9967 0.9967
## [12,] 0.9941 0.9941 0.9965 0.9965
## [13,] 0.9939 0.9939 0.9964 0.9964
## [14,] 0.9932 0.9932 0.9958 0.9958
## [15,] 0.9927 0.9927 0.9954 0.9955
## [16,] 0.9923 0.9923 0.9951 0.9951
## [17,] 0.9919 0.9919 0.9947 0.9947
## [18,] 0.9915 0.9915 0.9944 0.9944
## [19,] 0.9911 0.9911 0.9941 0.9941
## [20,] 0.9908 0.9908 0.9939 0.9939
## [21,] 0.9903 0.9903 0.9935 0.9935
## [22,] 0.9901 0.9901 0.9934 0.9934
## [23,] 0.9896 0.9896 0.9930 0.9930
## [24,] 0.9894 0.9894 0.9928 0.9928
## [25,] 0.9886 0.9886 0.9921 0.9921
## [26,] 0.9883 0.9883 0.9919 0.9919
## [27,] 0.9878 0.9878 0.9915 0.9915
## [28,] 0.9873 0.9873 0.9910 0.9910
## [29,] 0.9869 0.9869 0.9908 0.9908
## [30,] 0.9865 0.9865 0.9904 0.9904
## [31,] 0.9862 0.9862 0.9902 0.9902
## [32,] 0.9860 0.9860 0.9900 0.9900
## [33,] 0.9859 0.9859 0.9899 0.9899
## [34,] 0.9858 0.9858 0.9898 0.9898
## [35,] 0.9855 0.9855 0.9895 0.9895
## [36,] 0.9852 0.9852 0.9893 0.9894
## [37,] 0.9850 0.9850 0.9892 0.9892
## [38,] 0.9847 0.9847 0.9889 0.9889
## [39,] 0.9843 0.9843 0.9885 0.9885
## [40,] 0.9837 0.9837 0.9881 0.9881
## [41,] 0.9834 0.9834 0.9878 0.9878
## [42,] 0.9827 0.9827 0.9872 0.9872
## [43,] 0.9820 0.9820 0.9867 0.9867
## [44,] 0.9817 0.9817 0.9864 0.9864
## [45,] 0.9814 0.9814 0.9861 0.9861
## [46,] 0.9809 0.9809 0.9857 0.9857
## [47,] 0.9808 0.9808 0.9856 0.9856
## [48,] 0.9803 0.9803 0.9852 0.9852
## [49,] 0.9794 0.9794 0.9844 0.9844
## [50,] 0.9791 0.9791 0.9841 0.9841
## [51,] 0.9790 0.9790 0.9840 0.9840
## [52,] 0.9783 0.9783 0.9834 0.9834
## [53,] 0.9780 0.9780 0.9832 0.9832
## [54,] 0.9777 0.9777 0.9829 0.9829
## [55,] 0.9770 0.9770 0.9823 0.9823
## [56,] 0.9767 0.9767 0.9821 0.9821
## [57,] 0.9764 0.9764 0.9818 0.9818
## [58,] 0.9760 0.9760 0.9815 0.9815
## [59,] 0.9757 0.9757 0.9812 0.9812
## [60,] 0.9752 0.9752 0.9808 0.9808
## [61,] 0.9751 0.9751 0.9807 0.9807
## [62,] 0.9747 0.9747 0.9804 0.9804
## [63,] 0.9744 0.9744 0.9802 0.9802
## [64,] 0.9742 0.9742 0.9800 0.9800
## [65,] 0.9741 0.9741 0.9798 0.9798
## [66,] 0.9733 0.9733 0.9792 0.9792
## [67,] 0.9727 0.9727 0.9787 0.9787
## [68,] 0.9724 0.9725 0.9784 0.9784
## [69,] 0.9724 0.9725 0.9784 0.9784
## [70,] 0.9718 0.9718 0.9779 0.9779
## [71,] 0.9714 0.9714 0.9776 0.9776
## [72,] 0.9707 0.9707 0.9769 0.9769
## [73,] 0.9699 0.9699 0.9762 0.9762
## [74,] 0.9692 0.9692 0.9756 0.9756
## [75,] 0.9690 0.9690 0.9754 0.9754
## [76,] 0.9690 0.9690 0.9754 0.9754
## [77,] 0.9687 0.9687 0.9752 0.9752
The effect of a placebo relative to statins on symptom intensity controlled by taking neither treatment.
Equation: SI = symptom intensity \[\frac{(SI_{placebo} - SI_{nocebo}) } {(SI_{statin} - SI_{nocebo})}\]
JH: might have written: The (corrected for background) level of reported symptoms when taking placebo relative to when taking statins. He is not keen on the word ‘controlled’, especially when communicating with general audiences. ‘Net’ is another useful phrase. BTW, do you mean (Placebo-Nothing)/(Statin-Nothing)?
The high variance in the nocebo ratio was a result of the way the equation for it was defined. If the result were bounded, for example, between 0 and 3, there wouldn’t be such a wide range of possible values as there is with negative values and small values in the denominator. Small values of the denominator (statin-nocebo) lead to a high nocebo ratio, and if the denominator is negative, the resulting ratio will likely be negative too. The nocebo ratios within a patient could vary widely between samples with slight changes to symptom intensity.
JH: Indeed, the between-individual SI range was very large, even if we leave aside the correction for background. If for each of 60 individuals, we computed the ratio of the (grip) strength of the non-dominant to the dominant hand, but these individuals varied in age from 2 to 99, we would have some wild ratios. Ratios can be particularly volatile if the denominators are close to zero (imagine a BMI in newborns and adults, or estimating \(\pi\) from the areas of circles ranging in radius from mm to m, etc.) The effects of measurement errors, variation from day to day, etc. are amplified when one takes a ratio.
suppressMessages(library(tidyverse))
suppressMessages(library(data.table))
suppressMessages(library(MASS))
suppressMessages(library(boot))
ds=read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
print(str(ds))
## 'data.frame': 666 obs. of 4 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ PatientNo : int 1 1 1 1 1 1 1 1 1 1 ...
## $ BottleContent : chr "0" "0" "Statin" "Statin" ...
## $ MeanSymbolScore: num 1.07 0.81 0.71 0.99 0.96 0.81 0.96 0.71 0.74 0.89 ...
## NULL
nocebo <- read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv",
header=T) %>%
dplyr::select(-X) %>%
group_by(PatientNo, BottleContent) %>%
summarize(mean = mean(MeanSymbolScore), .groups='drop')
bratio <- function(x, i){
df<- x[i,]
df<-
df %>%
group_by(BottleContent) %>%
summarize(mean=mean(mean), .groups='drop') %>%
spread(BottleContent, mean)
(df$Placebo-df$`0`) / (df$Statin-df$`0`)
}
bratio(nocebo) # point estimate
## [1] 0.8536503
round(bratio(nocebo),2) # ADDED BY JH
## [1] 0.85
b <- boot(nocebo, bratio, R=1000)
ci <- boot.ci(b, conf=0.95, type="perc") #95%CI
ci
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = b, conf = 0.95, type = "perc")
##
## Intervals :
## Level Percentile
## 95% ( 0.3484, 1.7339 )
## Calculations and Intervals on Original Scale
Answer: (0.3492, 1.5814), with mean 0.85 (or 0.35 to 1.58, after JH rounding).
JH: this is a very wide CI. It looks like the function adds noise by using a single sample of size 666, with replacement, and ignores the fact that they belong to 60 patients. (See below).
Jackknife and bootstrap get you an interval estimate that can make sense.
The analysis method should be chosen after collecting and understanding the distribution of the data, rather than before. Use a measure that will result in reasonable values, i.e. not an equation that can produce a meaninglessly high or negative result. Consider the possible outcomes, and whether the analysis plan will yield a reasonable one.
JH: It’s very much a question of having an estimator that yields a stable number. A ratio estimator uses a ratio of 2 sums or means, not a mean of many patient-specific ratios. JH’s favourite example is from Miettinen, who asks how you would estimate the sex ratio in children attending small day-care centers of a few children each. Would you take 1 mean (or median) of the n very volatile center-specific ratios? Why not take 1 ratio of the total number of boys to the total number of girls? (A toy simulation of this example follows.)
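A toy simulation (hypothetical numbers, not from any of the teams’ data) of Miettinen’s day-care example: with only a few children per centre, the centre-specific ratios are wild, while the single ratio of the totals is well behaved.
# Hypothetical day-care example (made-up numbers): centre-specific sex ratios
# versus one ratio of the totals
set.seed(1)
n.centres   = 40
centre.size = sample(3:8, n.centres, replace = TRUE)   # a few children each
boys  = rbinom(n.centres, centre.size, 0.51)
girls = centre.size - boys
centre.ratios = boys / girls          # Inf whenever a centre happens to have no girls
summary(centre.ratios)                # very volatile; the mean can even be Inf
sum(boys) / sum(girls)                # the stable ratio-of-totals estimator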
As per our discussion last week, the “no prior data” issue should not have influenced the sample size calculation – which should be based on the minimum clinically relevant effect, and not on prior observed effect sizes.
- Nocebo ratio: The nocebo ratio is simply the ratio of symptoms induced by taking a placebo to those induced by taking statins. When the ratio is >1, there are more symptoms in the placebo group; when the ratio is <1, there are more symptoms in the statin group; when the ratio = 1, the level of symptoms is the same in the placebo and statin groups.
The subtraction of the level of ‘background’ symptoms is indeed implied by the word ‘induced’. You might wish to write of the ‘level’ or ‘extent’ of the symptoms.
- The nocebo ratio was initially calculated as: (symptoms with placebo - symptoms with nothing) / (symptoms with statins - symptoms with nothing) for each individual. For some people, the denominator of this calculation was very small or negative (meaning there were similar levels of symptoms when taking statins and when taking nothing, or in some instances more symptoms when taking nothing than when taking statins), and this can likely be attributed to random error. This led to an average nocebo ratio of 2.2 (meaning more symptoms on placebo than on statins) with very large confidence intervals, which doesn’t make a lot of sense biologically. To get around this, the authors pooled the symptom scores among everyone and re-calculated the nocebo ratio, which helped reduce the variance/error in the estimate.
Indeed, it stabilized the ratio. Interestingly, this advantage is not mentioned in the Wikipedia account of ratio estimators. The article focuses on bias. It also has a big blunder in the very first sentence (‘The ratio estimator is a statistical parameter!’).
- The 95% CI around the new nocebo ratio can be estimated via bootstrapping; we used 1000 bootstrap samples to estimate a nocebo ratio of 0.89 with 95% CI of 0.55-1.31 (using the percentile method). For simplicity, we did not do the bootstrapping at the patient level.
library(readr)
StatinOrPlacebo <- read_csv(
"https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
## New names:
## Rows: 666 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): BottleContent dbl (3): ...1, PatientNo, MeanSymbolScore
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
placebo.samps<-
StatinOrPlacebo[
StatinOrPlacebo$BottleContent== 'Placebo',] # 216
statin.samps<-
StatinOrPlacebo[
StatinOrPlacebo$BottleContent== 'Statin',] #218
nothing.samps<-
StatinOrPlacebo[StatinOrPlacebo$BottleContent== '0',] #222
bts.1<- function(n){
no.effect<-sum(sample(nothing.samps$MeanSymbolScore, n))
placebo.effect<-sum(sample(placebo.samps$MeanSymbolScore, n))
statin.effect<-sum(sample(statin.samps$MeanSymbolScore, n))
return((placebo.effect-no.effect)/(statin.effect- no.effect))
}
bts.1(100)
## [1] 0.7513779
reps<-seq(1,1000)
n<-100
obs<-c()
for (r in reps){
obs<-c(obs,bts.1(n))
}
mean(obs)
## [1] 0.8893852
quantile(obs, probs= c(0.025, 0.975))
## 2.5% 97.5%
## 0.5686764 1.3175538
This gives a wide CI, but, by keeping the 3 sample sizes (216, 218,222) fixed, not quite as wide as team A.
Again, the objective is stability. See comment to Team A.
As we discussed in class, its more about precision than power. For planning purposes, one could have made up some data, and used bootstrapping to calculate a SE.
library(tidyverse)
nocebo_df <- read.csv(
"https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
# view dataframe
head(nocebo_df,13)
## X PatientNo BottleContent MeanSymbolScore
## 1 1 1 0 1.07
## 2 2 1 0 0.81
## 3 3 1 Statin 0.71
## 4 4 1 Statin 0.99
## 5 5 1 Statin 0.96
## 6 6 1 Placebo 0.81
## 7 7 1 0 0.96
## 8 8 1 0 0.71
## 9 9 1 Statin 0.74
## 10 10 1 Placebo 0.89
## 11 11 1 Placebo 0.71
## 12 12 1 Placebo 0.81
## 13 13 2 Placebo 1.03
tail(nocebo_df,12)
## X PatientNo BottleContent MeanSymbolScore
## 655 655 58 Placebo 34.75
## 656 656 58 0 28.95
## 657 657 58 Placebo 21.47
## 658 658 58 Placebo 35.29
## 659 659 59 0 16.07
## 660 660 59 0 16.25
## 661 661 59 Statin 45.04
## 662 662 59 Statin 51.41
## 663 663 59 Statin 40.25
## 664 664 60 Placebo 45.87
## 665 665 60 0 43.85
## 666 666 60 Statin 51.52
# change row name
# nocebo_df
# convert to wide format and change variable name
nocebo_wide <- nocebo_df %>% spread(key = BottleContent, value = MeanSymbolScore) %>%
rename(None = "0")
## not sure if need to calculate means first then bootstrap or bootstrap from individual measurements
nocebo_patient <- nocebo_wide %>% group_by(PatientNo) %>%
summarise(mean_none = mean(None, na.rm = T),
mean_placebo = mean(Placebo, na.rm = T),
mean_statin = mean(Statin, na.rm = T)) %>% as.data.frame()
# bootstrap nocebo ratio
nocebo_vector <- NA
for(i in 1:1000){
none_sample <- sample(nocebo_patient$mean_none, size=nrow(nocebo_patient), replace = T)
placebo_sample <- sample(nocebo_patient$mean_placebo, size=nrow(nocebo_patient), replace = T)
statin_sample <- sample(nocebo_patient$mean_statin, size = nrow(nocebo_patient), replace = T)
nocebo_ratio <- (mean(placebo_sample, na.rm = T) - mean(none_sample, na.rm = T))/(mean(statin_sample, na.rm = T) - mean(none_sample, na.rm = T))
nocebo_vector[i] <- nocebo_ratio
}
# get 95% CI using percentile method. NR = 0.87 (95% CI: 0.37, 1.85)
nocebo_vector %>% quantile(c(0.025, 0.5, 0.975))
## 2.5% 50% 97.5%
## 0.3376564 0.8554278 1.6285486
# standard error - 0.4
sd(nocebo_vector)
## [1] 0.3206016
# histogram
hist(nocebo_vector, breaks=20)
The nocebo effect refers to a negative effect of treatment that occurs because someone anticipates a negative effect from taking the treatment (in contrast to the placebo effect, which is a positive effect of treatment that occurs because someone anticipates a positive effect from taking the treatment). So the nocebo ratio is the ratio of symptoms that can be attributed to the nocebo effect, calculated as the symptoms attributable to taking a placebo (symptoms from placebo - symptoms from nothing) divided by the symptoms attributable to taking the drug of interest (symptoms from drug - symptoms from nothing). If the nocebo ratio > 1, then the symptoms incurred from taking a placebo are larger than the symptoms incurred from taking the drug of interest. If the ratio = 1, then the symptoms are the same taking placebo or drug. If the ratio < 1, then the symptoms from taking the drug are larger than the symptoms from taking a placebo (indicating a negative effect of the drug itself). In plain words, the nocebo ratio is how badly your body reacts to taking just the placebo, compared with taking the statin, expressed as a ratio.
The original results were strange because they suggest the negative effects from taking a placebo are larger than from taking the actual drug. This was the case because, for some people, the symptoms attributable to taking the drug of interest were very small or even negative (fewer symptoms when someone was on the drug than when not). This would lead to large confidence intervals.
JH: Indeed. See above. But you mean the perceived effects, don’t you?
The preliminary results were calculated by taking the mean of individual nocebo ratios while the final results were calculated by taking the mean of symptom intensities and calculating an overall nocebo ratio.
Indeed
Treating each score for each individual as a separate observation, I bootstrapped a 95% CI of (0.59, 1.31). Taking the average score for each bottle content within individuals (placebo, statin, none) and then bootstrapping, I got a 95% CI of (0.38, 1.65).
Code for 1st CI:
library(tidyverse)
suppressMessages(library(tidyverse))
nocebo_df <- read.csv(
"https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
str(nocebo_df)
## 'data.frame': 666 obs. of 4 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ PatientNo : int 1 1 1 1 1 1 1 1 1 1 ...
## $ BottleContent : chr "0" "0" "Statin" "Statin" ...
## $ MeanSymbolScore: num 1.07 0.81 0.71 0.99 0.96 0.81 0.96 0.71 0.74 0.89 ...
# get scores for each bottle content
# alteration by JH
# The next 3 statements worked for him when he ran them in R, but they
# threw up an error when he ran the entire document through R Markdown.
# placebo <- nocebo_df %>% filter(BottleContent=="Placebo") %>% select(MeanSymbolScore) %>% pull()
# statin <- nocebo_df %>% filter(BottleContent == "Statin") %>% select(MeanSymbolScore) %>% pull()
# none <- nocebo_df %>% filter(BottleContent == "0") %>% select(MeanSymbolScore) %>% pull()
# So, he replaced then with his own old-fashioned ones
placebo <- nocebo_df$MeanSymbolScore[nocebo_df$BottleContent=="Placebo"]
statin <- nocebo_df$MeanSymbolScore[nocebo_df$BottleContent == "Statin"]
none <- nocebo_df$MeanSymbolScore[nocebo_df$BottleContent == "0" ]
# bootstrap 95% CI
## create empty vector
nr_vector <- NA
## bootstrap with 1000 repetitions
for(i in 1:1000){
none_sample <- sample(none, size=length(none), replace = T)
placebo_sample <- sample(placebo, size=length(placebo), replace = T)
statin_sample <- sample(statin, size = length(statin), replace = T)
nocebo_ratio <- (mean(placebo_sample, na.rm = T) - mean(none_sample, na.rm = T))/(mean(statin_sample, na.rm = T) - mean(none_sample, na.rm = T))
nr_vector[i] <- nocebo_ratio
}
## get mean and 95% CI
nr_vector %>% quantile(c(0.025, 0.5, 0.975))# JH would 'round' to 2 dec. places
## 2.5% 50% 97.5%
## 0.5933059 0.8813228 1.3136811
# but doesn't know the 'newer' way with this pipe %>% approach.
This gives a wide CI, but, by keeping the 3 sample sizes (216, 218,222) fixed, not quite as wide as team A.
JH: It’s always safer to use quantile-based than Gaussian-based limits. But JH would not have resisted having a look (via a histogram) at the shape of the distribution of the bootstrap estimates of the ratio.
JH: See comment above about keeping together the scores from the same patient.
- Pooling individual observations decreases variance compared to accounting for differences between individuals in analyses. Not sure which is the more appropriate method. Sometimes we should pool the data.
JH: IF we were using a formal statistical model that carefully and correctly laid out the components of variance, it might not be necessary to do the ‘pooling’ you speak of. The estimator, and the model, might just do the right thing (think, for example, of a hierarchical model, or even GEE). For example, if the contrast were defined as the absolute difference rather than the ratio (a difference of differences), it should be possible to fit a formal model; a sketch of such a model is given just before the code below. The one thing missing from the calculations (and even the bootstrapping) is a model: a model where the parameter of interest, the estimand, is front and centre, and visible.
- I don’t think they had 103 observations per treatment per participant. Although there was no prior data on the relative size of nocebo and statin effects, the authors could’ve considered prior data on statin effects.
JH: Indeed, it was very generic. It also lacked a model for the structure of the observations. Without a model, it is hard to think of precision, sampling variation, etc. This lack of a full data-analysis model is something JH rants about regularly. When teaching in a protocol (study design) course decades ago, he was always asked to teach about sample size / precision / power right up front, but refused to. He explained that how one analyzes the data determines the precision. For example, if one has a paired design but uses a 2-independent-samples t-test, one won’t have the precision one should have. Precision depends in part on the model used.
Likewise, even if we wait until the data are in (so no need to do power calculations: just look at the observed precision), JH is not that comfortable with the absence of a model that says what are the (systematic and random) components of each \(y_{ijk}\), the SI score reported by patient \(i\) in condition \(j\) (content of bottle), in month \(k\).
- Nocebo ratio: the fraction of symptoms of statins that was caused by the inert (placebo) pills.
- Their first approach produced unsatisfactory results because the mean symptom scores of the patients when not on a pill were similar enough to their mean symptom scores when on statins. Hence, sometimes, the denominator was close to zero, making the estimand unstable.
- The broader statistical methods message from this example is to
stabilize the estimand; do not keep a difference that can potentially be zero (or close to it) in the denominator.
JH: re TERMINOLOGY: The estimand is a parameter (the value you would obtain with an infinite sample size) so it is (unknown and) not under your control. An estimator (here we saw 2, the initial one, and the revised one) is a method or recipe (or formula) for converting the observed data (in the sample of 60 here) into a numerical estimate of the parameter.
The sample size calculation should be done based on the difference that would make a difference, and not on prior data.
JH: In this case it is a question of estimation, rather than testing, so it is more about precision than power.
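Following up on the earlier comment about a formal model: a minimal sketch of one such model (a linear mixed model on the difference scale, with a patient-level random intercept; it assumes the lme4 package is available, and it is illustrative only, not the analysis the authors used):
# Sketch only: a mixed model for the symptom scores, on the difference scale
library(lme4)                         # assumed available
ds = read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
ds$BottleContent = relevel(factor(ds$BottleContent), ref = "0")   # '0' = no pill
fit = lmer(MeanSymbolScore ~ BottleContent + (1 | PatientNo), data = ds)
round(summary(fit)$coefficients, 2)
# The 'Placebo' and 'Statin' coefficients estimate the placebo-minus-nothing and
# statin-minus-nothing mean differences; their ratio is one model-based analogue
# of the nocebo ratio, with the estimand explicit in the model.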
ds=read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
MEANS = by(ds$MeanSymbolScore,ds$BottleContent,mean)
str(MEANS)
## 'by' num [1:3(1d)] 7.96 15.69 16.79
## - attr(*, "dimnames")=List of 1
## ..$ ds$BottleContent: chr [1:3] "0" "Placebo" "Statin"
## - attr(*, "call")= language by.default(data = ds$MeanSymbolScore, INDICES = ds$BottleContent, FUN = mean)
RATIO = (MEANS[2]-MEANS[1])/(MEANS[3]-MEANS[1])
# Jack-knife
n = length(unique(ds$PatientNo))
ratio=rep(NA,n)
for(pt.id in 1:n){
leave.in = (ds$PatientNo != pt.id)
means = by(ds$MeanSymbolScore[leave.in],
ds$BottleContent[leave.in],mean)
ratio[pt.id] = (means[2]-means[1])/(means[3]-means[1])
}
( Var = ( (n-1)/n) * sum( (ratio - RATIO)^2 ) )
## [1] 0.01678853
( SE = sqrt(Var) )
## [1] 0.1295706
round(RATIO + qnorm(c(0.025,0.5,0.975))*SE,2)
## [1] 0.62 0.88 1.13
hist(ratio - RATIO)
Figure. Jackknife CI, with Jackknifing at the patient level
ds=read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
# bootstrap
measurements.per.pt = as.numeric(table(ds$PatientNo))
n.samples = 10000
bs.ratio = rep(NA,n.samples)
for(sample in 1:n.samples) {
weights = rep(0,n)
id.s = sample(1:n,n,replace=TRUE)
freq = table(id.s)
which.ones = as.numeric( dimnames(freq)$id.s )
weights[which.ones] = as.numeric(freq)
ds$weights = rep(weights,measurements.per.pt)
sums = by(ds$weights*ds$MeanSymbolScore,
ds$BottleContent,sum)
n.summed = by(ds$weights,
ds$BottleContent,sum)
means = sums/n.summed
bs.ratio[sample] = (means[2]-means[1])/(means[3]-means[1])
}
hist(bs.ratio,breaks=40)
Figure. Bootstrap CI, with bootstrapping at the patient level
#hist(log(bs.ratio),breaks=40)
summary(bs.ratio)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4711 0.7939 0.8753 0.8814 0.9623 1.5518
( SE = sd(bs.ratio) )
## [1] 0.1290422
round(quantile(bs.ratio,probs = c(0.025,.50, 0.975)),2)
## 2.5% 50% 97.5%
## 0.65 0.88 1.15
If you are able to locate the methods by which these country-specific projections were made, summarize them, and highlight the key assumptions.
If you are not, provide your own commentary on how plausible the projections are.
Comment, in particular, on the 3 ‘bridges’ based on the data from France.
- We were not able to find the data
JH: SEE BELOW
- The projections don’t seem very plausible; even with medical advances that we’d expect in the future, an average life expectancy of 105 is very old. As well, rates of obesity are increasing worldwide and we would expect this to have a detrimental impact on life expectancy.
JH: INDEED. Life expectancy (LE) at birth (LE[0]) is stalling in some nations, such as the USA. The big jumps ahead were as a result of public health advances that followed the adoption of the germ theory, sanitation improvements, refrigeration, mass vaccinations, antibiotics, improvements in standards of living, education, etc. One wonders if there were some numerical artifacts, especially with unstable death rates from countries with smaller populations.
- We were not clear what this animation was demonstrating.
JH: The animations shown in class should now have helped explain these. They show two cohort lifetables, but mainly ‘current’ or ‘period’ lifetables. The contrast between the cohort and current lifetables for France is notable: nobody could have foreseen whether (and when) WW I would begin, or that there would be an influenza pandemic in 1918-19. The big reductions in INFANT mortality are the biggest reason for the big jump in LE(0) – see Canada.
- Drops in life expectancy may not be a useful metric because although Covid is contributing to greater mortality this year, the impacts of Covid on mortality will (hopefully) just last for a few years. The life expectancy of a child born today based on current mortality rates that take into account excess death due to Covid may not be accurate because these several years of Covid will ideally just be an outlier, and Covid-attributable mortality in pediatric populations is extremely rare.
JH: it is indeed wrong to interpret a LE(0) calculated from today’s mortality rates as a sure guide to what will happen to new generations. LEs from a current lifetable are fictional/hypothetical: they are projections, and as such, have built-in assumptions. And, as Niels Bohr said, predictions are difficult, especially about the future. It is best to think of a LE[0] from a table made with current mortality data as JUST THAT: A 1-NUMBER SUMMARY OF THESE VERY MORTALITY RATES.
- I couldn’t find how they got to 105 years either, but from this site I did find the country-specific LE data under “Life Expectancy at Birth”: it gives life expectancy at birth in Ireland as 82.81, I think, further supporting the point that the estimate of 105 is nonsensical. Assumptions aren’t stated – presumably some assumptions about countries’ conditions inflate Ireland’s life expectancy.
JH: the projections were changed (revised DOWNWARDS) once the questionable projections were brought to the attention of UNICEF.
- Given that life expectancy at birth in 2019 was 78.7 for Europe and North America, and that a consistent growth rate would put it at 83.2 by 2050, the sharp increase to 105 in 2021 for Ireland seems implausible.
JH: indeed.
- The 3 ‘bridges’ based on the data from France:
Bridge 1: As expected, over time, bridge became thick at the beginning, slowly decreased, and thinned out towards the end after 80.
Bridge 2: Similar to bridge 1, though there was a longer plateau at the beginning, and the decrease began at 20 years. In general, there were longer periods of sustained value (less decrease from 24-45). Also fewer deaths right after birth, but many more around the late teenage years, possibly heavily influenced by the Depression and the world wars.
Bridge 3: Thick at the beginning and a very stable decrease until about 70 years. A larger increase in average death after 60 years. Presumably likely to die at war, leading to higher survival through early adulthood.
JH: Were you able to see, in the real (cohort) version, the SUDDEN DROP in the size of the living male cohort, starting with the YEAR 1914 and lasting to the year 1918/19? France lost more young men than any other country. Remember that there are two parallel timescales in a cohort lifetable: if the cohort was born in 1895, then age 19 is calendar year 1895 + 19 = 1914. You don’t have the same ‘real-ness’ when you calculate a lifetable from current data. The mortality rate for age 19 is based on survivors from those born 19 years earlier (be it in France or elsewhere). The mortality rate for age 29 is based on survivors from those born 29 years earlier. There is no ‘real’ going forward, the way there is in a cohort table. It is as if one took a still photo in the year in question. The table in Armitage’s book, contrasting the life table based on 1930-32 death rates with the actual progress of the real 1931 cohort, is a clever way to show the difference between the two.
- The use of Covid-19 mortalities towards life expectancies seems to serve as a useful metric since it shows an important dimension of the impact of the pandemic relative to the effects of other causes of death. However I don’t like to use them personally because I find their assumptions (i.e. that babies will experience the same age-specific mortality as present-day) make the measures themselves less interpretable. I’m more partial to simpler metrics like excess mortality or PYLL.
JH: Agree 110%. For those new to this terminology, PYLL means Potential Years of Life Lost.
JH: Impressive groundwork, ‘investigative journalism’ and new analyses. It includes technical details. The google doc also includes screenshots from the Bridges of Life based on current (1895) and cohort (1895-2004) data, where the dramatic drop in the size of the actual 1895 cohort during WW I is very obvious.
JH: Speaking of ‘visuals’ of the destruction of a ‘cohort’, here is a link to what many consider both the most effective and most poignant graph in history. See Wikipedia, the excellent data-visualization site by Michael Friendly. See also the ‘engineering’ side of Minard’s work that JH used in a class on communication for Concordia engineering students.
========
Here, via the McGill Library, is Winer’s 1962 book; here is the chapter ‘Single-factor experiments having repeated measures on the same elements’
The designs in this chapter may be said to involve correlated or dependent observations based on the following reasoning: in experimental work in the behavioral sciences the elements forming the statistical population are frequently people. Because of large differences in experience and background, the responses of people to the same experimental treatment may show relatively large variability. In many cases, much of this variability is due to differences between people existing prior to the experiment. If this latter source of variability can be separated from treatment effects and experimental error, then the sensitivity of the experiment may be increased. If this source of variability cannot be estimated, it remains part of the uncontrolled sources of variability and is thus automatically part of the experimental error. One of the primary purposes of experiments in which the same subject is observed under each of the treatments is to provide a control on differences between subjects. In this type of experiment, treatment effects for subject ‘i’ are measured relative to the average response made by subject ‘i’ on all treatments. In this sense each subject serves as his own control–responses of individual subjects to the treatments are measured in terms of deviations about a point which measures the average responsiveness of that individual subject. Hence variability due to differences in the average responsiveness of the subjects is eliminated from the experimental error (if an additive model is appropriate). Experiments in which the same elements are used under all the ‘k’ treatments require ‘k’ observations on each element. Hence the term repeated measurements to describe this kind of design. To the extent that unique characteristics of the individual elements remain constant under the different treatments, pairs of observations on the same elements will tend to be positively correlated. More generally, the observations will be dependent rather than independent. If the population distributions involved are multivariate normal, the terms dependent and correlated are synonymous, analogously. The terms independent and uncorrelated are synonymous in this context. Since the models that will be used are assumed to have underlying multivariate normal distributions, correlated measurements imply statistically dependent measurements. (PsycINFO Database Record (c) 2019 APA, all rights reserved)
and here is the pdf of the chapter
By simulation
Here are 1000 independent estimates of the mean (and of the SD) of a distribution where the true mean is 100 and the SD is 13. In each instance, the sample size is n=25 (as in the code below).
n.sims=1000
mu=100
sigma=13
n=25
estimate = function(i){   # i = the simulation index; it is not used inside the function
y=rnorm(n,mean=mu,sd=sigma)
return( c( mean(y), sd(y) ) )
}
estimates = t(matrix( unlist(lapply(1:n.sims,estimate)) ,
nrow = 2, ncol = n.sims ))
head(estimates)
## [,1] [,2]
## [1,] 97.02657 13.882639
## [2,] 106.03737 12.754475
## [3,] 98.96704 14.415149
## [4,] 98.45669 9.679836
## [5,] 95.32901 13.399238
## [6,] 98.54197 14.073843
apply(estimates,2,summary)
## [,1] [,2]
## Min. 92.32055 7.101628
## 1st Qu. 98.09844 11.577759
## Median 99.91087 12.776865
## Mean 99.86511 12.838734
## 3rd Qu. 101.53351 14.151189
## Max. 108.59311 18.597224
round(apply(estimates,2,sd),3)
## [1] 2.579 1.912
( CV = round(100* apply(estimates,2,sd) / c(mu,sigma),3) )
## [1] 2.579 14.705
By theory (the formula for SE(s) requires \(y \sim Normal\)):
\(SE(\bar{y}) = \sigma/\sqrt{n} = 13/\sqrt{25} = 13/5 = 2.6\)
\(SE(s) \approx \sigma/\sqrt{2(n-1)} = 13/\sqrt{48} = 13/6.928 = 1.876\)
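The same two theoretical values can be computed directly in R, for comparison with the simulation SDs of 2.579 and 1.912 above:
n = 25 ; sigma = 13
sigma/sqrt(n)          # theoretical SE of the sample mean: 13/5 = 2.6
sigma/sqrt(2*(n-1))    # approximate SE of the sample SD: 13/6.928 = 1.876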
Background. Mild perioperative hypothermia, which is common during major surgery, may promote surgical-wound infection by triggering thermoregulatory vasoconstriction, which decreases subcutaneous oxygen tension. Reduced levels of oxygen in tissue impair oxidative killing by neutrophils and decrease the strength of the healing wound by reducing the deposition of collagen. Hypothermia also directly impairs immune function. We tested the hypothesis that hypothermia both increases susceptibility to surgical-wound infection and lengthens hospitalization.
Methods. Two hundred patients undergoing colorectal surgery were randomly assigned to routine intraoperative thermal care (the hypothermia group) or additional warming (the normothermia group). The patients’ anesthetic care was standardized, and they were all given cefamandole and metronidazole. In a double-blind protocol, their wounds were evaluated daily until discharge from the hospital and in the clinic after two weeks; wounds containing culture-positive pus were considered infected. The patients’ surgeons remained unaware of the patients’ group assignments.
Results. The mean (SD) final intraoperative core temperature was 34.7(0.6) C in the hypothermia group and 36.6(0.5) C in the normothermia group (P<0.001). Surgical-wound infections were found in 18 of 96 patients assigned to hypothermia (19 percent) but in only 6 of 104 patients assigned to normothermia (6 percent, P=0.009). The sutures were removed one day later in the patients assigned to hypothermia than in those assigned to normothermia (P=0.002), and the duration of hospitalization was prolonged by 2.6 days (approximately 20 percent) in the hypothermia group (P=0.01).
Conclusions. Hypothermia itself may delay healing and predispose patients to wound infections. Maintaining normothermia intraoperatively is likely to decrease the incidence of infectious complications in patients undergoing colorectal resection and to shorten their hospitalizations. (N Engl J Med 1996;334:1209-15.)
–
The number of patients required for this trial was estimated on the basis of a preliminary study in which 80 patients undergoing elective colon surgery were randomly assigned to hypothermia (mean [SD] temperature, 34.4[0.4]C) or normothermia (involving warming with forced air and fluid to a mean temperature of 37[0.3]C). The number of wound infections (as defined by the presence of pus and a positive culture) was evaluated by an observer unaware of the patients’ temperatures and group assignments. Nine infections occurred in the 38 patients assigned to hypothermia, but there were only four in the 42 patients assigned to normothermia (P=0.16). Using the observed difference in the incidence of infection, we determined that an enrollment of 400 patients would provide a 90 percent chance of identifying a difference with an alpha value of 0.01. We therefore planned to study a maximum of 400 patients, with the results to be evaluated after 200 and 300 patients had been studied. The prospective criterion for ending the study early was a difference in the incidence of surgical-wound infection between the two groups with a P value of less than 0.01. To compensate for the two initial analyses, a P value of 0.03 would be required when the study of 400 patients was completed. The combined risk of a type I error was thus less than 5 percent.
Comment (JH)
To JH, the effect size in a power calculation should be the minimal effect that WOULD ‘MAKE A DIFFERENCE’ IF IT WERE TRUE. It is a difference one would not wish to ‘miss.’ By ‘miss’ we merely mean that the study would not produce a ‘positive’ result, i.e., that the observed difference would not be significantly different from the ‘null.’
Thus, the effect size (the difference that would make a difference) is a judgement, based on cost and other practical matters. In the example above, the observed difference was statistically different from zero, but the blanket costs a considerable amount, and is not re-usable.
The other example JH uses to illustrate what is involved is the choice between the riskier but earlier-news chorionic villous sampling versus the less risky but later-news amniocentesis.
Just about all medications involve a tradeoff between benefits and risks, and these come to the fore in the number needed to treat metric.
For several examples of such studies, see these 607 notes, used in the teaching of the one-sample t-test.
and a few here
This evaluation of three gastroenterological studies makes some important points.
Do starch blockers really block calorie absorption?
See here for a summary of the 5-subject study and page 6 of this document for a longer version. The full version is here and it includes a response by another investigator
We also recently completed a study in which a starch blocker was proved to be ineffective. In a double-blind crossover trial in six healthy male subjects, we were unable to demonstrate any effect on three markers of starch digestion — namely, serum glucose, insulin, and breath hydrogen.9 The two studies in human beings are complementary. The work of Bo-Linn et al. indicates that the bottom line is that α-amylase inhibitors do not result in excess calorie loss in the stool, and ours indicates that this failure is due primarily to a lack of inhibitory effect on starch hydrolysis in the small intestine.
Do infant formula samples reduce the duration of breastfeeding?
Short Version and Full Version
The first author urges younger investigators not to do what he did to calculate a sample size. He considered a difference of 10 percentage points as clinically significant, enough to call out the infant formula companies if it were true. But to arrive at a sample size, he simply calculated the n which would make an observed difference of 10 percentage points statistically significant. He did not realize that if the TRUE difference were 10 percentage points, there was a 50% chance that the observed difference would exceed 10 percentage points (so the study would have ‘established’ that samples reduce – by SOME amount – the rate of breastfeeding), AND a 50% chance that the observed difference would NOT exceed 10 percentage points (so the study would have been ‘negative’). In other words, his study had 50% power to show a ‘(statistically) significant’ reduction.
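A minimal sketch of this 50%-power trap, using hypothetical proportions (0.50 vs 0.40, i.e. a 10-point difference; the actual breastfeeding rates in the trial may have been different): first find the per-group n that makes an OBSERVED 10-point difference just ‘significant’, then ask what power that n gives if the TRUE difference really is 10 points.
p1 = 0.50 ; p2 = 0.40 ; delta = p1 - p2     # hypothetical proportions still breastfeeding
SE = function(n) sqrt(p1*(1-p1)/n + p2*(1-p2)/n)
# per-group n at which an observed 10-point difference is just significant (2-sided 0.05)
( n = ceiling( uniroot(function(n) delta - 1.96*SE(n), c(2,10000))$root ) )
# power of that n if the TRUE difference is 10 points: roughly 50%
power.prop.test(n = n, p1 = p1, p2 = p2, sig.level = 0.05)$power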
Bias and precision: see the diagram on page 1 of these notes. The concept is the same for studies as it is for individual measurements.
Bias Reduction Statistical Methods for Comparative Studies: Techniques for Bias Reduction
‘Non-confounding covariates’: Fairer and SHARPER: Hanley 1983
Different terms, same concept (eg: interaction and effect modification) Miettinen 1985, glossary and Miettinen 2011, Epidemiological Research: Terms and Concepts.
Factorial designs Encyclopedia of Biostatistics
Confounding, historically and today: See ‘The history of confounding’ by Vandenbroucke. See also Fisher, R.A. (1926), The arrangement of field experiments (https://link.springer.com/chapter/10.1007/978-1-4612-4380-9_8) and Kish, L. (1959), Some statistical problems in research design, American Sociological Review, 26, 328–38. See also Wikipedia
Confounding and Effect Modification are different concepts. See JH’s Notes on Confounding and on Effect-Modification. See also sections 5 and 6 in ‘Resources / Materials’ in the same webpage. In section 6, there is a helpful graphic from Miettinen showing that a covariate can be a modifier, a confounder, or both
Regression to the mean [Originally, reversion, and to mediocrity] (https://en.wikipedia.org/wiki/Regression_toward_the_mean) and how to avoid it. Original use of the term – with mediocrity instead of mean. Galton’s 1886 Figure and full article. From Miettinen 2011 III – 4, Terms and Concepts of Methods of Study: the tendency of a repeat observation (with chance error) to be closer to the mean. For an application to measurements of serum cholesterol, see Irwig, JAMA 1991
Comparative studies: experimental studies and non-experimental, i.e., observation ONLY, studies JH’s favourites in this list are Cox’s Design of Experiments and Campbell and Stanley’s Experimental and Quasi-Experimental Designs for Research
Match first (on important known factors); then randomize (for unknown factors). Darwin: The effects of cross and self fertilisation in the vegetable kingdom
Confidence intervals <—-> p values via Standard Errors (SEs). One can use qnorm(p.value/2) to recover the z value (in absolute value) from a 2-sided p-value. Knowing that the point estimate is z SEs away from the null, one can back-calculate the SE. From this one can use point.estimate +/- 1.96 SEs for a 95% CI. This was the basis for Miettinen’s clever test-based CI to accompany a Mantel-Haenszel summary ratio, before RGB (Rothman, Greenland, Breslow) came up with a fancier one, and Clayton a simpler one.
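A minimal sketch of such a test-based CI, with made-up numbers (a rate-ratio point estimate of 2.0 and a 2-sided p-value of 0.03; the calculation is done on the log scale):
RR.hat = 2.0 ; p.value = 0.03               # hypothetical point estimate and 2-sided p-value
z  = abs(qnorm(p.value/2))                  # the estimate sits z SEs from the null (log scale)
SE = log(RR.hat)/z                          # back-calculated SE of log(RR)
round( exp( log(RR.hat) + c(-1.96,0,1.96)*SE ), 2 )   # test-based 95% CI (and point estimate)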
p-values not helpful for Table 1. Ask instead which column of patients (those in column 1 or those in column 2) you would prefer to have on your shift.
2-sided vs 1-sided hypotheses/tests. See the [History and References sections of](https://en.wikipedia.org/wiki/One-_and_two-tailed_tests)
Effect Modification
Bias Reduction and More Precision (making comparisons fairer & sharper – see Hanley 1983 article)
Standard Error and Standard Deviation
Reporting Magnitudes of Differences / Scale
Coding of a binary (all or none) variate
Computing
aggregate(fruitfly$Thorax,by = list(type=fruitfly$Type),FUN = mean)
or
tapply()
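For example, the tapply equivalent of the aggregate call above (assuming the same fruitfly data frame):
tapply(fruitfly$Thorax, fruitfly$Type, mean)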
Use regression instead of t test. e.g.,
lm(longevity~type_female, data=df.fruitfly)
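A minimal check of the equivalence (assuming the df.fruitfly data frame above, with a binary type_female): the equal-variance 2-sample t test and the simple regression give the same estimate, SE, t statistic and p-value for the group difference.
t.test(longevity ~ type_female, data = df.fruitfly, var.equal = TRUE)
summary(lm(longevity ~ type_female, data = df.fruitfly))$coefficients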
Orthogonal (independent) contrasts Wikipedia
If there are 4 independent samples from 4 conditions A, B, C, D, then 3 independent (uncorrelated, non-overlapping) contrasts are possible.
{B,C,D} vs. A contrasts are correlated because they involve a common A as reference category.
Example of 3 orthogonal contrasts: AB vs CD; A vs B ; C vs D.
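A minimal numerical check of the orthogonality of these 3 contrasts (the coefficients below are one conventional choice, not taken from the class notes):
C.matrix = cbind(ABvsCD = c( 1,  1, -1, -1),
                 AvsB   = c( 1, -1,  0,  0),
                 CvsD   = c( 0,  0,  1, -1))
rownames(C.matrix) = c("A","B","C","D")
crossprod(C.matrix)   # zero off-diagonal entries confirm the 3 contrasts are orthogonal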
The overall F test with degrees of freedom = 4 - 1 = 3 tests the null hypothesis that all 4 means are equal, against the alternative that at least one [unspecified] mean is different from the others. It is blunt, because it penalizes the user for looking in 3 directions simultaneously, a bit like a (say radar) scanner that scans in 3-D from a central point to find ‘signals’. JH prefers sharper, more focused (directed) scans that look at just one (pre-specified) contrast (1-D).
Bonferroni corrections are used to penalize for multiple testing, especially if post hoc.
Difficult to tell incidental findings (needing replication) from ones based on pre-planned analyses
stepwise regression not appropriate in this context.
Effect Size (several types)
E.g., Difference family: Effect sizes based on differences between means
Male:Female. Birthweight: {\(\mu=\) 3.6Kg - \(\mu=\) 3.5Kg}/{\(\sigma =\) 0.5Kg} = 0.1/0.5 = 0.2. Adult height: (175cm - 165cm)/6cm ≈ 1.7.
MaternalSmoking:Not. Birthweight {\(\mu=\) 3.55Kg - \(\mu=\) 3.35Kg}/{\(\sigma =\) 0.5Kg} = 0.2/0.5 = 0.4.
DUMMY Variables INDICATOR Variables
In ‘better families’ we speak of INDICATOR variables, not DUMMY variables. This source reports that The International Statistical Institute’s Dictionary of Statistical Terms objects to the name: the term is ‘’used, rather laxly, to denote an artificial variable expressing qualitative characteristics …. [The] word ’dummy’ should be avoided.’’
see online dictionaries such as the Cambridge Dictionary of Statistics, or
Miettinen’s text:- Epidemiological Research: Terms and Concepts
Indicator variate∗ – A variate with 0 and 1 as its (only) realizations, with realization 1 indicating something particular. (Examples: Y = 1 indicating membership in the case series of person-moments and X1 = 1 indicating index category of the etio-genetic determinant in an etiogenetic study – in the logistic model for the object of study.)
Dummy variate∗ (synonym: indicator variate) – See ‘Indicator variate’ in section II – 2. Note: This term is a misnomer: there is nothing dummy about an indicator variate.
Regression models
For an index category vs. reference category contrast, use an indicator variable for the index category.
Center continuous covariates (or, e.g., relocate them so that they start at the minimum, or at a memorable value, such as 5 feet for human heights) so as to yield an interpretable intercept.
Change scores vs. initial values as a covariate
If, by fitting math2 ~ misled + math1c, we obtain 5.6 - 1.1*misled + 0.45*math1c, this is arithmetically equivalent to the result of fitting math2 - 0.45*math1c ~ misled. Contrast this with fitting math2 - 1.0*math1c ~ misled. Even without any intervention, if math1c has measurement error, we would not expect math2 ~ math1c to give a slope (correlation) of 1 (this mismeasurement produces regression to the mean). Because math2 and math1 are on the same scale, the slope in the regression of math2 on math1 is basically its correlation with math1. A slope further (down) from 1 means less reliable measurements.
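A minimal simulation (made-up data, not the class dataset) of that last point: with measurement error in the pre-test, the slope of the post-test on the pre-test falls below 1 even when nothing has intervened.
set.seed(1)
n     = 500
true  = rnorm(n, mean = 60, sd = 10)   # each student's 'true' math level
math1 = true + rnorm(n, sd = 6)        # pre-test  = truth + measurement error
math2 = true + rnorm(n, sd = 6)        # post-test = truth + measurement error
coef(lm(math2 ~ math1))                # slope near 10^2/(10^2 + 6^2) = 0.74, not 1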
A relevant (medical) example of the artifacts created by imperfect measurements is the re-reading of prostate cancer biopsies. It is a nice bit of statistical teaching – by a pathologist who is better known for the [Gleason] score that bears his name. He reproduces a diagram from his 1951 college text in biometric analysis.
Combining Ratios
DON’T, especially if they are based on small (unstable) denominators; better to sum them all up, as Mantel did in his odds ratio formula (in the 1959 paper with Haenszel) that summarizes the relationships in the individual 2x2 tables with frequencies \(a, b, c, d\), namely \(or_{MH} = \frac{\Sigma ad/n}{\Sigma bc/n}\). Think of Miettinen’s example of estimating the sex ratio in small daycare centers. Better to use M/F = Sum(m)/Sum(f) than the mean or median of the individual (m/f) ratios. See Ratio Estimators in sampling theory.
If we calculate sum(math2)/sum(math1) as a single ratio, we can use the jackknife or bootstrap (or delta) method to obtain an SE or CI.
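A minimal sketch (with made-up math1 and math2 vectors, resampling students): bootstrap the single ratio sum(math2)/sum(math1) to get an SE and a percentile CI.
set.seed(1)
math1 = rnorm(50, 60, 10) ; math2 = rnorm(50, 65, 10)   # made-up scores for 50 students
B = 5000
boot.ratio = replicate(B, { i = sample(seq_along(math1), replace = TRUE)
                            sum(math2[i]) / sum(math1[i]) })
sd(boot.ratio)                          # bootstrap SE of the ratio
quantile(boot.ratio, c(0.025, 0.975))   # percentile 95% CI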
Bootstrap dictionary
A data-based way to estimate the sampling distribution of a statistic, and to form a CI. Conceptually: assuming your sample (of size n) is ‘rich’ enough to represent the population [like Noah’s ark], clone the sample an infinite number of times. Then take many (say B) samples of size n, calculating the statistic of interest on each new sample, and use the B ‘bootstrap’ statistics as the estimated sampling distribution. In the simplest case, use the standard deviation of this estimated sampling distribution as the standard error for the observed statistic. Use it, or the quantiles directly, to form a confidence interval for the parameter that the statistic is an estimate of. In practice, the bootstrap doesn’t clone or photocopy the original sample. Instead, it merely samples from it WITH REPLACEMENT.
For a very readable explanation, see Computer-Intensive Methods in Statistics by Diaconis and Efron in Scientific American, 1983.
Graphs
R-squared
Women and Math
It would have been nice if the authors could have used both matching and randomization, as it would have meant less (mathematical) adjustment after the fact. It was probably logistically difficult to do, but any proxy measure they could have matched on would have helped keep the groups more balanced.
How predictable was it that an unconstrained randomization would produce math1 means as discrepant as (or more discrepant than) the ones observed? The p-value for a 1-way ANOVA on math1 (see code) provides the answer. That p-value is based on the variance of the 4 observed means. The simulation code also shows how to evaluate the probability [quite large!] of getting 4 means with a range as large as, or larger than, the observed range.
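A minimal simulation sketch of that second calculation, with made-up numbers (4 groups of 25, pre-test SD of 10, and a hypothetical ‘observed’ range of 5 points; the class code uses the real data):
set.seed(1)
math1 = rnorm(100, mean = 60, sd = 10)        # made-up pre-test scores for 100 students
observed.range = 5                            # hypothetical observed range of the 4 group means
sim.range = replicate(10000, {
  group = sample(rep(1:4, each = 25))         # one unconstrained randomization into 4 groups
  diff(range(tapply(math1, group, mean)))     # range of the 4 group means
})
mean(sim.range >= observed.range)             # how often chance alone is as (or more) extreme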
Of course, even if they had been able to achieve balance, it would have been important to include math1 in the model so as to reduce the noise in math2 – just like in the case of thorax in the fruitfly study.
The R code we ran in class, updated to include post/pre ratios.
Department’s 6 Public lecture Series in Epidemiology, 2014
Prelude to Oscars competition
Pardoe’s website, 2008 JRSS article, and ‘starter’ dataset (the dataset has been added to each year in course 624)
Some history of S curves: probit and logit for population growth and toxicology, and for the age-specific probabilities of having reached menarche. And, beginning with early work on the Framingham Heart Study, the importance of logistic regression in epidemiology.
The inverse-logit function \(\frac{e^{\beta x}}{1+e^{\beta x}}\) and the inverse-probit function \(\Phi[x]\) stay within the 0 to 1 probability bounds.
Why choose the logit vs. probit link for the binomial distribution? Both are used. The social sciences (and the toxicologists) like the probit, possibly because of the ‘hidden’ (or not hidden in the case of age at menarche) Gaussian distribution that is thought to underlie it. Epidemiology and Biostatistics use the logit, for substantive reasons. The 1970 book Analysis of Binary Data, by DR Cox did a lot to promote it, and the earlier use of odds ratios for case-control studies did as well. In the 1972 article by Nelder and Wedderburn, the logit is shown to be the ‘canonical’ link.
log vs. logit link? Although the logit is the canonical link, the odds and the odds ratio are not easy to communicate. A ratio of risks [via a log link] is easier and more transparent than a ratio of odds [via a logit link], and a ratio of odds can be deceptive. However, it may happen that the fitted risks/probabilities in a log-link model go outside of the (0,1) range.
(log or logit) vs. identity link? The identity link allows you to model proportions (and thus differences in proportions) directly. This absolute difference is preferred by public health people, and is used directly in cost-effectiveness calculations, such as the number needed to treat to prevent one case (it is simply the reciprocal of the risk difference). However, just as with the log link, it may happen that the fitted risks/probabilities in an identity-link model go outside of the (0,1) range.
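A minimal glm sketch of the three scales just discussed, using made-up grouped binomial counts (not the class ‘mystery’ dataset). An identity-link binomial fit would give the risk difference directly but needs care to keep fitted risks inside (0,1), so here the risk difference is taken from the raw proportions; the log-link fit can also need starting values in less well-behaved data.
d = data.frame(x = c(0, 1), deaths = c(30, 45), n = c(200, 200))   # hypothetical counts
fit.logit = glm(cbind(deaths, n - deaths) ~ x, family = binomial(link = "logit"), data = d)
fit.log   = glm(cbind(deaths, n - deaths) ~ x, family = binomial(link = "log"),   data = d)
exp(coef(fit.logit)["x"])    # odds ratio  (logit link)
exp(coef(fit.log)["x"])      # risk ratio  (log link)
diff(d$deaths / d$n)         # risk difference (identity scale)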
(The statistical geneticist) Woolf’s (1955) use of Comparative Rates when Estimating the Relation Between Blood Group and Disease, and his very modern conception of what today is still misleadingly called a ‘case-control’ study. ‘This kind of artefact is avoided if one works with incidence rates in the various blood groups. The data usually do not permit calculation of absolute rates, nor are they needed. What is wanted and readily obtained is an estimate of the ratio of one rate to another. ’The incidence in (exposure group) group \(\alpha\) will be h/H x some constant, and that in group \(\beta\) will be k/K x the same constant. An estimate of the (rate) ratio will be hK/Hk.’
Notice that Woolf never uses the term odds ratio or compares (the exposure distribution of the) ‘cases’ with (the exposure distribution of the) ‘controls’. He uses a numerator series [aka a case series] that splits into h and k, and a denominator series, which splits into H and K and represents the base, to proceed directly to the natural comparison of peptic ulcer rates in the two blood types. Sadly, today we still haven’t abandoned the case-control terminology in favour of one that conveys the idea that the two (quasi) rates used in the rate ratio are calculated from partial (sampled) denominators.
If the ratio were based on known denominators (which are roughly proportional to H and K times some blow-up factor), then the variance of the log of the ratio would be merely 1/h + 1/k. But, given that the relative sizes of the two denominators (or the ratio H/K) are estimated from the denominator series (which is merely a fair sample of the base), the additional cost, in variance terms, is 1/H + 1/K. This 1/h + 1/k + 1/H + 1/K (or, today, 1/a + 1/b + 1/c + 1/d) formula is referred to as Woolf’s formula. The illustrations in class bear out Samy Suissa’s contention that it may well be the most important formula in planning epidemiological research. An important goal is to minimize the variance, given the budget constraints.
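A minimal sketch of Woolf’s variance formula in use (the 2x2 frequencies below are made up purely for illustration):
a = 911 ; b = 4578 ; c = 579 ; d = 4219      # hypothetical 2x2 frequencies
or     = (a*d) / (b*c)                       # cross-product (odds / quasi-rate) ratio
SE.log = sqrt(1/a + 1/b + 1/c + 1/d)         # Woolf's formula for the SE of log(or)
round( exp( log(or) + c(-1.96, 0, 1.96)*SE.log ), 2 )   # point estimate and 95% CI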
Before logistic regression there was Discriminant Analysis. The goal is to select a linear combination of features that maximally separates those who are in one group from those in another [i.e., different classes of the response variable], or, in epidemiology, those who have experienced the event of interest (Y=1) from those who have not (Y=0).
In 1962, Cornfield reverse-engineered discriminant analysis to produce a logistic S curve that gave the probability of Y=1 as a function of the X values, and the 1967 Biometrika article showed how to directly estimate the regression coefficients – using iteratively re-weighted least squares to arrive at Maximum Likelihood estimates. This was a forerunner of the 1972 breakthrough by Nelder and Wedderburn on Generalized Linear Models, which used the same idea to fit a much wider range of models, using just two ‘toggles’: one for the link function, which specifies which function of \(\mu_{Y|X}\) (e.g., identity, log, complementary log-log, etc.) is to be linked to the linear predictor, i.e., to the linear combination of the X’s, and the other for the ‘error’ distribution [Normal, Bernoulli/Binomial, Gamma, Poisson, etc.] that describes the variation of \(Y|X\) about \(\mu_{Y|X}.\)
Conditional Logistic Regression. For the parallel history of conditional logistic regression in economics/marketing, see the 2003 review article by Norman Breslow. It begins as follows: ‘Econometricians Daniel McFadden and James Heckman won the 2000 Nobel Prize in economics for their work on discrete choice models and selection bias. Statisticians and epidemiologists have made similar contributions to medicine with their work on case-control studies, analysis of incomplete data, and causal inference.’ Further down, it explains that ‘McFadden’s method for discrete choice analysis, developed to assist a graduate student to study freeway routing choices made by the California Department of Transportation, had its basis in formal theories of economic behavior. Under Marschak’s random utility model, observed economic choices—such as whether to ride the bus or drive the car to work—result from multinomial sampling (Becker, DeGroot, and Marschak, 1963). Psychologist Duncan Luce (1959) added the rather strong condition that the ratio of odds for any two choices was independent of the set of other available choices. These assumptions led McFadden (1973) to his ‘conditional logit model’ \[Pr(c \ | \ x, \ C) = \frac{e^{v(c,x)}}{\Sigma_{C} \ e^{v(*,x)} },\] where \(c\) is the choice selected, \(C\) is the set of available choices, \(x\) measures attributes of the individual decision maker, and * runs over the elements/members of \(C\). [Notation altered by JH to be closer to conventional notation]. Manski and Lerman (1977) proposed a weighted likelihood method with origins in sampling theory. Manski and McFadden (1981) developed a pseudo likelihood procedure which they referred to as conditional maximum likelihood or CML. Cosslett (1981) studied semiparametric efficient ‘maximum likelihood’ estimation. For more, see McFadden’s website and the cv link in it.
The naturalness and efficiency of “outcome based sampling”
Parameter estimation in Conditional Logistic Regression is carried out by maximizing the sum of the (multinomial) log-likelihood contributions from the various contests or ‘risksets.’ Thus, each row in the dataset needs to have, in addition to an indicator of whether it was ‘chosen’ and the associated x vector, a variable that identifies which rows (persons or person-moments) are in the same ‘set’ (or contest in the Oscars setting, or ‘riskset’ in fitting the Cox model, or matched case-‘control’ set). In CLR, the experiences of persons in different sets are not compared, only those of persons in the same riskset. Thus the strata specification in clogit in the R survival package. For the Oscars example, the strata variable specifies who is in the same competition in the same year.
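A minimal clogit sketch (simulated contests, not the real Oscars data): each ‘year’ is a contest with 5 nominees and exactly one winner, and strata(year) defines the risksets.
library(survival)
set.seed(1)
oscars = do.call(rbind, lapply(1:40, function(yr) {
  x   = rnorm(5)                                   # a nominee-level covariate
  won = rmultinom(1, 1, prob = exp(0.8 * x))[, 1]  # exactly one winner per contest
  data.frame(year = yr, x = x, won = won)
}))
fit = clogit(won ~ x + strata(year), data = oscars)
summary(fit)$coefficients                          # estimate should be near the true 0.8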
Miscellaneous
A linear model is linear in the \(\beta\)’s: the linear predictor (say \(\beta_1 \times X_1 + \beta_2 \times X_2\)) is a linear combination of betas, with the X’s used as weights. The right hand side of the following Michaelis-Menten relationship \[E[y \ | \ x ] = \frac{\beta_1 \ x}{\beta_2 + x}\] is not (and cannot be made) linear in the \(\beta\)’s.
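A minimal sketch (simulated data) of what that non-linearity means in practice: the Michaelis-Menten model has to be fitted by nonlinear least squares (nls) rather than lm.
set.seed(1)
x = runif(60, 0, 10)
y = (5 * x) / (2 + x) + rnorm(60, sd = 0.3)         # true beta1 = 5, beta2 = 2
nls(y ~ (b1 * x) / (b2 + x), start = list(b1 = 4, b2 = 1))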
The estimate of the logit of a probability is most stable when the probability in question is near 0.5, whereas the estimate of a probability (proportion) is most stable when the probability in question is furthest from 0.5. This stems from the fact that the (sampling) variance of a binomial proportion is \(\pi(1-\pi) \times (1/n)\), whereas the (approximate) variance of its logit is \(\frac{1}{\pi(1-\pi)} \times (1/n).\)
‘Catchment area’: where cases come from
Miettinen’s definition of an experiment, from the glossary in his 1985 text: ‘A study in which a determinant is intentionally perturbed for reasons none other than the goals of the study itself.’ And his story about a doctor inviting you to take some medication ‘samples.’ It is an experiment if the doctor wishes to learn from your ‘experience,’ but not if the doctor didn’t ask you to report back to him/her. See also Miettinen’s newer dictionary.
Appropriate ‘controls’ in so-called ‘case-control’ studies. Some epidemiology texts take an overly-restrictive view, insisting that ‘controls’ must be selected from that portion of population time generated by persons who had a non-zero chance to represent an instance (a ‘case’) of the event of interest: thus, just females if the event is limited to females. What is required is a sample that fairly (unbiasedly) represents the distribution of the (putative etiologic) agent/trait in the base. Since the distribution of blood groups is the same in males and females, males could serve as a denominator (‘control’) series if the trait of interest was blood group – even if the event/condition is limited to females.
Also, the denominator (‘control’) series doesn’t have to be distinct from the case series: the two series can be arrived at independently of each other, just as in the hockey-epidemiology example about whether it was better to shoot low on Canadiens goalie Patrick Roy. The 51 goals constitute the case series; a random sample of all shots (no matter whether they resulted in goals or not) serves as a denominator series.
History of ‘Case-control’ studies, including this humourous one
Condo Prices
R markdown document, JH’s first (and thanks to Lamin for the template!). The main purpose was to show how to fit additive-in-floor and multiplicative-in-floor rate models, and to link them to the corresponding models for epidemiological rates (incidence densities) involving amounts of population-time rather than numbers of square feet. However, plots of the various fits suggest that (even apart from floor) area does not suffice, and that we would need to also study the role of the number of rooms – the 1½ or 2½ or 3½ measure used in Montreal, but that is not in the dataset. There is also a suspicious ‘outlier’ condo that is quite influential.
A couple of questions about GLMs [to be addressed next class]
Prizes for Oscars competition
Degrees of Freedom
Lung cancer screening
Model-based Standard Errors
Statistical Analysis of Correlated Data Using Generalized Estimating Equations: An Orientation
Examples of positively correlated [row 3: number who had visited a physician in the previous year] and negatively correlated [row 4: number who were of male gender] responses in a cluster sample of 30 households (data from Cochran, Sampling Techniques, 1953, pp. 124-127) in Table I of GEE Analysis of negatively correlated binary responses: a caution
Logit and Probit fits to data on ‘detectability’ of laced drinks
Physiological model: Gaussian/Logistic distribution of ‘detectability threshold’, by analogy with the ‘just lethal’ dose in toxicology (see also the age at menarche data below)
Puzzling data from smokers. Using log(added amounts) did not help. The poor fits may be because of the small number of smokers (just 11) or because they deliberately sabotaged the study.
See several analyses, including comparisons of logit and probit models (underlying/latent logistic and Gaussian distributions of ‘detectability’) here
Logit and Probit fits to data on age at menarche
See regression fits, fitted probability density curves, and Cumulative Distribution Functions here:
The data structure is identical to that in the laced-drinks example
Links for (binomial) models for Mystery dataset (and MORE GENERALLY)
identity: risk difference (invariant to use of desired/undesired outcome)
log: risk ratio. BUT WHICH? percentages that… survived? died?
logit (invariant to use of desired/undesired outcome)
For a very thoughtful (and data-based) discussion, see chapter II: Fundamental Measures of Disease Occurrence and Association, in volume 1 of this classic Breslow and Day textbook. Admittedly, in so-called case-control studies (case-base studies, or ‘estimated denominators’ studies), absolute differences in incidence density (or in risk) are not usually estimable, and so the preference for ratio measures is understandable. But the chapter has several compelling examples, all of them involving ‘full denominator’ datasets
If you have time, you could investigate which of the 3 links allows for the smallest model to summarize the survival (or mortality) patterns in the mystery dataset. After all, the purpose is to summarize (or reconstruct) the patterns in the 2 x 2 x 3 + 2 = 14 proportions with as few parameters as possible. Nobody wants to memorize the 14 percentages themselves.
100+ Years of Graphs of the Titanic Data:
The event that drove the water consumption pattern in Edmonton
The striking graph has probably drawn more views than this ‘classic’, depicting the losses during Napoleon’s march to, and retreat from, Moscow
Michael Friendly’s DataVis.ca site
==========
Comments on one (M:F) comparison of the longevity of the Titanic survivors that used a ratio of 2 SMRs – each of which involved the sex-specific Titanic mortality rates versus the general US mortality rates
[Incidentally] From the Encyclopedia Titanica website, which has biographies of every passenger and crew member, one learns that the two earliest deaths (both in 1912) were from meningitis.
Ratios of SMRs: See page 33 in Chapter 2, Important concepts in Epidemiology, by O. Miettinen, in the 2010 book Teaching Epidemiology: A guide for teachers in epidemiology, public health and clinical medicine, available via the McGill Library. See also Breslow and Day, Volume II, Chapter 2, and Modern Epidemiology (Rothman, Greenland, Lash).
‘Male survivors survived on average 15% more than women.’
Discussion of how best to describe mortality ratios, and of the dangers of mixing differences in the time (longevity) scale with ratios in the rate scale.
Contrast this: ‘Former NFL players die at a faster rate than other professional athletes, study finds’ with this: ‘Major league baseball [MLB] players tend to live about 24% longer than the average American man.’
Rough translation between mortality ratio and longevity difference: contact JH for one such ‘translator’.
Language/grammar/punctuation matters (remember the panda who ‘eats, shoots and leaves’).
Figure 1 shows a more detailed picture of the adjusted association between age and the odds of death by logistic regression.
Comments on a M:F comparison that used longevity from birth, rather than remaining longevity from the 1912 sinking onwards
See Premature Death in Jazz Musicians: Fact or Fiction? and the reply by Rothman. See also the longer list of longevity comparisons, some OK, some flawed
See also this 2016 presentation: ‘Immortal Time’ Blunders: history, identification, severity
Who needs the Cox model anyway? (Bendix Carstensen)
Carstensen is the author of the Epi R package that includes the Lexis functions
Here is his latest presentation on why [Poisson regression, based on the numbers of events in each of the Lexis cells, is a more transparent approach to the analysis of rates (hazards) than the Cox model](http://bendixcarstensen.com/WntCma.pdf). It is a followup on his 2005 piece Demography and epidemiology: Practical use of the Lexis diagram in the computer age
Here is a (Stata-based) presentation by another author, entitled Replicate a Cox model using Poisson regression
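A minimal R sketch of the same idea, using simulated data with a constant baseline hazard (so the two approaches should agree closely): split the follow-up into time bands (‘Lexis cells’), then fit a Poisson model for the event counts with log person-time offsets.
library(survival)
set.seed(1)
n = 2000 ; x = rbinom(n, 1, 0.5)
time   = pmin(rexp(n, rate = 0.1 * exp(0.5 * x)), 10)   # true log hazard ratio = 0.5
status = as.numeric(time < 10)                          # administrative censoring at t = 10
coef(coxph(Surv(time, status) ~ x))                     # Cox partial-likelihood estimate
sp = survSplit(Surv(time, status) ~ x, data = data.frame(time, status, x),
               cut = 1:9, episode = "band")             # 1-unit-wide time bands
sp$py = sp$time - sp$tstart                             # person-time in each band
coef(glm(status ~ x + factor(band) + offset(log(py)),
         family = poisson, data = sp))["x"]             # Poisson (piecewise-exponential) estimate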
The Lexis diagram brings up the issue of which time-scale to use in a Cox regression
Cox’s own advice is to match on (i.e., use as the time scale) the time scale that is more difficult to model, and to include the other one as a covariate. Since the force of mortality varies by several orders of magnitude over age, but is a far weaker function of calendar time, Cox would match on age – and put calendar time in as a covariate.
The same would apply to the Framingham study: just because the cohort starts in the year 1948 doesn’t mean that is the natural time scale to be matched on in a Cox regression. Time since 1948 (grant time) is an artificial time scale. Age is the natural one – and, as Gompertz found, the force of mortality is 1.08 to 1.10 times higher at each successive age. (Many of your analyses also found a similar coefficient value for age at entry to the Titanic cohort.)
This article in Demographic Research on Exploring the demographic history of populations with enhanced Lexis surfaces, sent to me by Jay Kaufman, has a very nice example, in Figures 4 and 5, of mortality in various French male birth cohorts. The excess mortality during the periods of the two World Wars stands out.
Inverse Probability Weights – and classical standardization – to reduce confounding in the M:F contrast
In the M:F contrast in the Titanic example, there is no natural ‘index’ category and ‘reference’ category. For the purpose of the following, we will take the males to be the reference category, and the females to be the index category. So we expect a rate ratio lower than 1, and a longevity difference greater than 0.
We will take age at entry (‘binned’ into 7 decades of age) and class as potential confounders.
Whereas standardized rates (or standardized risks) are more common, we begin with differences in mean longevity (or rather in the mean length of remaining life). We also address mortality rates.
In both, we ignore any cell where it’s all-or-none, i.e., all males or all females, so no contrast. [the Mantel-Haenszel summary rate ratio does this by construction]
The analyses are here in this separate document
We begin with the classical way to standardize: we use an agreed-on set of weights to compute a weighted mean of the cell-specific means (or rates) for males, and the SAME set of weights to compute a weighted mean of the cell-specific means (or rates) for females.
We then show that when one uses the inverses of the observed cell-specific proportions of males, and the inverses of the observed cell-specific proportions of females, as INDIVIDUAL weights, one obtains the same difference in means as if we had used the overall cell FREQUENCIES as weights for the cell- and sex-specific means.
We could have arrived at these same Prob[Female | class agebin] ‘propensities’ and their complements Prob[male | class agebin] ‘propensities’ by fitting a saturated logistic regression model. Typically, one would use a less-than-saturated model.
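A minimal simulated check of that equivalence (made-up data, not the Titanic dataset): direct standardization of the sex-specific cell means to the TOTAL cell frequencies gives exactly the inverse-probability-weighted difference in means.
set.seed(1)
n      = 2000
cell   = sample(1:6, n, replace = TRUE)                  # confounder cells (e.g. class x age bin)
female = rbinom(n, 1, plogis(-1 + 0.3 * cell))           # sex composition differs by cell
y      = 50 - 2 * cell + 3 * female + rnorm(n, sd = 5)   # 'longevity' depends on both
w      = table(cell) / n                                 # TOTAL-population weights for the 6 cells
means  = tapply(y, list(cell, female), mean)             # cell- and sex-specific means
std.diff = sum(w * means[, "1"]) - sum(w * means[, "0"]) # classical direct standardization
p.f    = ave(female, cell)                               # observed Prob[female | cell]
ipw    = ifelse(female == 1, 1/p.f, 1/(1 - p.f))         # inverse-probability weights
ipw.diff = weighted.mean(y[female == 1], ipw[female == 1]) -
           weighted.mean(y[female == 0], ipw[female == 0])
c(standardized = std.diff, IPW = ipw.diff)               # identical (up to rounding)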
MATCHING on the ‘PROPENSITY’ to be female, and AGGREGATING the (within propensity-score bins) DIFFERENCES in means, is ANOTHER way to reduce confounding in the M:F contrast
Legendre 1805: Sur la Méthode des moindres carrés [On the Method of Least Squares]
Submitted article on Teaching/learning multiple regression using family data – sent via email.
Galton’s first ‘regression’ line fitted to HUMAN data (he had earlier fitted a ‘regression’ line to data on sweet pea seeds)
See Poster explaining the data, the data-reduction, and the fits. More details, including recently uncovered raw data, on JH’s page on Galton
Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors
Table in the original. Galton was a very clear writer, and an excellent statistician. BUT, please do NOT take Galton’s ‘for each increase of one unit on the part of the parent seed, there is a mean increase of only one third of a unit in the filial seed’ as the way to report a regression slope, especially from non-experimental data. Better to simply say ‘From parent seeds one unit apart in size, the mean filial seeds differed by only one third of a unit.’ Compare the two phrasings:
- For each INCREASE of one unit on the part of the parent seed, there is a mean INCREASE of only one third of a unit in the filial seed.
- From parent seeds ONE UNIT APART in size, the mean filial seeds DIFFERED by only one third of a unit.
Multiple regression as a sequence of simple regressions
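A minimal simulated sketch of this idea: the coefficient of x1 in the multiple regression equals the slope from regressing the ‘x2-adjusted’ (residualized) y on the ‘x2-adjusted’ x1.
set.seed(1)
n  = 200
x2 = rnorm(n) ; x1 = 0.6 * x2 + rnorm(n)
y  = 2 + 1.5 * x1 - 1.0 * x2 + rnorm(n)
coef(lm(y ~ x1 + x2))["x1"]                         # multiple-regression coefficient of x1
coef(lm(resid(lm(y ~ x2)) ~ resid(lm(x1 ~ x2))))[2] # same number, via two simple regressions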
“TERM ENTERED LAST” tests
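In R terms (a minimal simulated example): ‘term entered last’ tests are what drop1() reports, whereas anova() on an lm fit gives sequential tests that depend on the order in which the terms were entered.
set.seed(1)
x1 = rnorm(100) ; x2 = 0.5 * x1 + rnorm(100)
y  = 1 + x1 + 0.5 * x2 + rnorm(100)
fit = lm(y ~ x1 + x2)
anova(fit)               # sequential: x1 first, then x2 given x1
drop1(fit, test = "F")   # each term tested as if it were entered last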
‘Immortal Time’ Blunders
Less consequential longevity comparisons:
MORE CONSEQUENTIAL longevity comparisons:
See, in the 1800s already, the striking teaching examples of William Farr
Updated, February 29, 2020