Comment on the null hypothesis (first sentence of Section 9.2, page 83 of the Moderna protocol).
What does a ‘proportional hazards VE’ (top of p. 84) mean? Is proportional hazards a reasonable assumption?
What is a ‘stratified Cox proportional hazard model’? How is it fitted?
Why use a 1-sided 0.025 significance level? (cf. FDA guidelines).
‘The primary analysis population for efficacy will be the Per-Protocol (PP) Set, defined in Section 9.4. In the primary analysis of efficacy, cases will be counted starting 14 days after the second dose of IP.’ Explain what is meant by the ‘Per-Protocol (PP) Set.’
Rework the statistical calculations shown in Table 6. Do so ‘from scratch,’ using first principles, i.e. don’t use the referenced R package. Begin with the simplest case, where there are no interim analyses.
D = 151
alpha = 0.05
E.null = 0.3
V.P = 1
( P.case.was.V.null = V.P *(1-E.null) /
( V.P *(1-E.null) + 1*1 ) )
## [1] 0.4117647
7/17
## [1] 0.4117647
n.cases.in.V.arm = 0:D
probs.under.null = dbinom(
n.cases.in.V.arm, D, P.case.was.V.null )
( n.critical = max(n.cases.in.V.arm[
pbinom(n.cases.in.V.arm, D, P.case.was.V.null) < alpha/2]) )
## [1] 49
pbinom(n.critical,D,P.case.was.V.null)
## [1] 0.01711484
binom.test(n.critical,D,P.case.was.V.null)
##
## Exact binomial test
##
## data: n.critical and D
## number of successes = 49, number of trials = 151, p-value = 0.0314
## alternative hypothesis: true probability of success is not equal to 0.4117647
## 95 percent confidence interval:
## 0.2506521 0.4053987
## sample estimates:
## probability of success
## 0.3245033
E.alt = 0.6
( P.case.was.V.alt = V.P *(1-E.alt) /
( V.P *(1-E.alt) + 1*1 ) )
## [1] 0.2857143
pbinom(n.critical,D,P.case.was.V.alt)
## [1] 0.873225
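Not part of the Table 6 rework itself, but a natural follow-up from the same first principles: how many total cases D would be needed for roughly 90% power against VE = 60%, with a single (final) analysis? A minimal sketch, reusing the quantities defined above:
# How many cases D for ~90% power (single final analysis, exact binomial)?
power.for.D = function(D){
  n.crit = max( (0:D)[ pbinom(0:D, D, P.case.was.V.null) < alpha/2 ] )
  pbinom(n.crit, D, P.case.was.V.alt)
}
D.grid = 140:200
powers = sapply(D.grid, power.for.D)
min( D.grid[ powers >= 0.90 ] )
# a bit larger than 151; the protocol's 151 comes from a proportional-hazards
# (not exact-binomial) calculation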
library(gsDesign)
gsDesign(k = 3, test.type = 2,
sfu="OF",n.fix = 151)
## Symmetric two-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
##
##
## Analysis N Z Nominal p Spend
## 1 52 3.47 0.0003 0.0003
## 2 103 2.45 0.0071 0.0069
## 3 154 2.00 0.0225 0.0178
## Total 0.0250
##
## ++ alpha spending:
## O'Brien-Fleming boundary.
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 Total E{N}
## 0.0000 0.0003 0.0069 0.0178 0.025 152.7
## 0.2638 0.0565 0.5288 0.3147 0.900 120.6
##
## Lower boundary (futility or Type II Error)
## Analysis
## Theta 1 2 3 Total
## 0.0000 3e-04 0.0069 0.0178 0.025
## 0.2638 0e+00 0.0000 0.0000 0.000
gsDesign(k = 3, test.type = 2,
sfu="Pocock",n.fix = 151)
## Symmetric two-sided group sequential design with
## 90 % power and 2.5 % Type I Error.
## Spending computations assume trial stops
## if a bound is crossed.
##
##
## Analysis N Z Nominal p Spend
## 1 58 2.29 0.011 0.0110
## 2 116 2.29 0.011 0.0079
## 3 174 2.29 0.011 0.0060
## Total 0.0250
##
## ++ alpha spending:
## Pocock boundary.
##
## Boundary crossing probabilities and expected sample size
## assume any cross stops the trial
##
## Upper boundary (power or Type I Error)
## Analysis
## Theta 1 2 3 Total E{N}
## 0.0000 0.011 0.0079 0.0060 0.025 170.3
## 0.2638 0.389 0.3421 0.1689 0.900 108.9
##
## Lower boundary (futility or Type II Error)
## Analysis
## Theta 1 2 3 Total
## 0.0000 0.011 0.0079 0.006 0.025
## 0.2638 0.000 0.0000 0.000 0.000
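As a cross-check of the gsDesign boundaries above without using the package itself (a sketch; it assumes the mvtnorm package is available and reads the rounded Z-boundaries off the printouts): under H0 the three interim Z statistics are multivariate normal with Corr(Z_i, Z_j) = sqrt(t_i/t_j) for equally spaced information fractions, so the overall boundary-crossing probability can be computed directly.
library(mvtnorm)  # assumed available; used only to integrate the trivariate normal
t.frac = (1:3)/3                                    # equally spaced information fractions
R = outer(t.frac, t.frac,
          function(a,b) sqrt(pmin(a,b)/pmax(a,b)))  # Corr(Z_i,Z_j) = sqrt(t_i/t_j)
b.OF     = c(3.47, 2.45, 2.00)                      # O'Brien-Fleming boundaries printed above
b.Pocock = c(2.29, 2.29, 2.29)                      # Pocock boundaries printed above
1 - pmvnorm(upper = b.OF,     corr = R)  # close to 0.025 (rounded boundaries; distant lower bound ignored)
1 - pmvnorm(upper = b.Pocock, corr = R)  # close to 0.025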
cases = c(11,185)
PY = c(3304.9, 3273.7)
#See https://jhanley.biostat.mcgill.ca/bios601/COVIDvaccineTrials/VaccineTrials.html
# get binomial-based CI for expected (theoretical) proportion of vaccinated cases of COVID-19, i.e, of all COVID cases, what proportion ($\pi$) would you expect in the vaccinated PY?
binom.test(cases[1], sum(cases))
##
## Exact binomial test
##
## data: cases[1] and sum(cases)
## number of successes = 11, number of trials = 196, p-value < 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.02834578 0.09819815
## sample estimates:
## probability of success
## 0.05612245
If PY were 50:50, then we could back-translate these limits to VE limits, using the expression \[\pi = \frac{100-VE}{200-VE},\] and its inverse
\[VE = \frac{100-200\pi}{1-\pi} = \frac{100-200 \times 0.056}{1-0.056} = 94.1 \ percent.\]
Because the PY were not exactly equal, we can use the more general formula (see the HPV trial):
\[\pi = \frac{F \times (100-VE)}{F \times (100-VE) + (1-F) \times 100},\] where \(F\) is the fraction of all PY that were contributed by those in the Vaccinated arm [here, \(F\) = 3304.9/(3304.9+ 3273.7)].
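To make the back-translation concrete, here is a small sketch (added; not part of the protocol or the report) that inverts the general formula and applies it to the binomial point estimate and limits printed above, using the PY vector defined earlier:
F.V = PY[1]/sum(PY)   # the 'F' of the formula: fraction of all PY in the Vaccinated arm
VE.from.pi = function(pi) 100 * ( 1 - (pi/(1-pi)) * ((1-F.V)/F.V) )
round( VE.from.pi( c(0.09819815, 0.05612245, 0.02834578) ), 1 )
# lower VE limit, point estimate, upper VE limit (in %): roughly 89, 94, 97;
# the point estimate is close to the reported 94.1%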
Likewise, rework the statistical calculations shown in Table 19.
Some authorities are planning to use just a single dose. Can you use the data in the Kaplan-Meier plots to obtain a point and an interval estimate of the efficacy of a single dose, and if you can, what are the limitations? (You might find the information in Fig. 3 of the NEJM report of the BioNTech/Pfizer trial easier to work with.) (See also the paragraph beginning with “Delaying or deferring the second dose of the Pfizer or Moderna vaccine has the same problem” in Risks of altering vaccination protocols are unknown by Montreal Gazette columnist Dr. Christopher Labos, the comments by McGill’s Dr. Donald Vinh, as well as this BBC piece.)
Comment on Senn’s ‘Serendipity dividend?’ piece on the AstraZeneca/Oxford announcement. “As the press release put it, two different dosing regimens demonstrated efficacy with one showing a better profile.”
Paraphrase the message in this item from the StatsChat website at the University of Auckland.
Repeat the statistical calculations in the ‘SputnikV’ report. Also comment on any major design/analysis differences vis-a-vis the other trials you have looked at.
In order to better understand these curves, it would help to be able to recreate the daily numbers of patients receiving a first psychiatric diagnosis as well as the daily numbers at risk. Focus on the two curves (COVID-19 patients vs. influenza patients) in the top left [F20-F48] panel of Figure 1 in the main text, and on ‘Table 1 – Characteristics of the COVID-19 and influenza cohorts before and after matching’ on page 9 of the supplement.
For the Covid group, the rate of being diagnosed with psychiatric illness is ~95 people/day. For the control group, it is ~55 people/day. In total, there were ~10,000 at risk for a first psychiatric diagnosis in the Covid group at day 0, and ~12,750 at risk in the Control group at day 0. Therefore, the total number of patients receiving a first psychiatric diagnosis by day 90 is ~8,542 in the Covid group (85%) and ~4,950 in the Control group (40%).
# input numbers at risk into dataframe
psych_df <- data.frame(
day = c(30,45,60,75,90,30,45,60,75,90),
group = c(rep("Covid",5),rep("Control",5)),
nar = c(7112,5265,3827,2602,1458,11174,10199,9371,8634,7800))
# calculate slope for covid group ~95
(1458-7112)/(90-30)
## [1] -94.23333
# calculate slope for control group ~55
(7800-11174)/(90-30)
## [1] -56.23333
# calculate number at risk at day 0 for covid group
1458 + 95*90
## [1] 10008
# calculate number at risk at day 0 for control group
7800 + 55*90
## [1] 12750
# calculate total number who received first diagnosis in covid group
10000-1458
## [1] 8542
# calculate total number who received first diagnosis in control group
12750-7800
## [1] 4950
# calculate probability of receiving first diagnosis in covid group
8542/10000
## [1] 0.8542
# calculate probability of receiving first diagnosis in control group
4950/12750
## [1] 0.3882353
JH: These calculations are in conflict with the (low) cumulative incidence. There is a lot of censoring. See below.
# Day 0 number at risk: 24857
# Day 30: 7112 * 0.16, etc.
7112*0.16
## [1] 1137.92
5265 * 0.2
## [1] 1053
3827 * 0.4
## [1] 1530.8
2602 * 0.42
## [1] 1092.84
1458 * 0.5
## [1] 729
JH: looks like you are multiplying by CUMULATIVE incidence, rather than (daily) INCIDENCE, so double counting. See below, where the cumulative incidence is ‘de-cumulated’
library(data.table)
library(tidyverse)
library(splitstackshape)
library(survival)
##
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
##
## aml
load(url('https://osf.io/mxa4e/download'))
kmResults=resMain[[1]]$outcomes[[1]]$KM$survival
df <- as.data.frame(kmResults)
colors <- c("COVID-19" = "red", "Influenza" = "blue")
ggplot(df, aes(x = time, y = 1-values1, color="COVID-19")) +
geom_line() +
geom_ribbon(aes(ymin=1-CI1.lower, ymax=1-CI1.upper, color="COVID-19"),
fill="red", alpha=0.2)+
geom_line(aes(x = time, y = 1-values2, color="Influenza"))+
geom_ribbon(aes(ymin=1-CI2.lower, ymax=1-CI2.upper, color="Influenza"),
fill="blue", alpha=0.2)+
xlab("Days") +
ylab("Outcome probability (%)") +
theme_bw() +
scale_colour_manual(values=colors)+
theme(legend.title = element_blank())+
guides(color=guide_legend(override.aes=list(fill=NA,border = NA)))+
ggtitle("Psychiatric illness (F20-F48)")
## Getting Number of events based on data for COVID-19
n.at.risk <- data.frame(time=c(14,30,45,60,75,90),
n.at.risk1=c(25850,7112,5265,3827,2602,1458),
n.at.risk1.st=c(18738,1847,1438,1225,1144,1458),
n.at.risk2=c(24857,11174,10199,9371,8634,7800),
n.at.risk2.st=c(13683,975,828,737,834,7800))
df2 <- merge(df,n.at.risk,by="time",all.x = T)
df2$n.events.1 <- ifelse(df2$time==14,
round(df2$n.at.risk1*(1-df2$values1)),NA)
df2$n.events.1 <- ifelse(df2$time==30,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==14]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==45,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==30]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==60,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==45]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==75,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==60]))
,df2$n.events.1)
df2$n.events.1 <- ifelse(df2$time==90,
round(df2$n.at.risk1*(1-df2$values1/df2$values1[df2$time==75]))
,df2$n.events.1)
df2$n.events.2 <- ifelse(df2$time==14,
round(df2$n.at.risk2*(1-df2$values2)),NA)
df2$n.events.2 <- ifelse(df2$time==30,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==14]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==45,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==30]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==60,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==45]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==75,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==60]))
,df2$n.events.2)
df2$n.events.2 <- ifelse(df2$time==90,
round(df2$n.at.risk2*(1-df2$values2/df2$values2[df2$time==75]))
,df2$n.events.2)
df3 <- na.omit(df2)
sum(df3$n.events.1)
## [1] 252
sum(df3$n.events.2)
## [1] 272
str(df3)
## 'data.frame': 6 obs. of 13 variables:
## $ time : int 14 30 45 60 75 90
## $ values1 : num 1 0.986 0.975 0.964 0.955 ...
## $ CI1.lower : num 0.999 0.983 0.971 0.959 0.949 ...
## $ CI1.upper : num 1 0.988 0.979 0.968 0.96 ...
## $ values2 : num 1 0.993 0.988 0.983 0.978 ...
## $ CI2.lower : num 0.999 0.992 0.986 0.981 0.975 ...
## $ CI2.upper : num 1 0.995 0.99 0.986 0.98 ...
## $ n.at.risk1 : num 25850 7112 5265 3827 2602 ...
## $ n.at.risk1.st: num 18738 1847 1438 1225 1144 ...
## $ n.at.risk2 : num 24857 11174 10199 9371 8634 ...
## $ n.at.risk2.st: num 13683 975 828 737 834 ...
## $ n.events.1 : num 11 97 57 44 24 19
## $ n.events.2 : num 9 69 54 45 50 45
## - attr(*, "na.action")= 'omit' Named int [1:72] 2 3 4 5 6 7 8 9 10 11 ...
## ..- attr(*, "names")= chr [1:72] "2" "3" "4" "5" ...
# df.covid <- df3 %>% select(time, n.at.risk1, n.events.1,values1,n.at.risk1.st)
df.covid = df3[,c("time", "n.at.risk1", "n.events.1",
"values1","n.at.risk1.st")]
df.covid.2 <- expandRows(df.covid, "n.at.risk1.st", count.is.col = TRUE, drop = F)
df.covid.2 <- df.covid.2 %>% group_by(time) %>% mutate(observation = 1:n())
df.covid.2$status <- ifelse(df.covid.2$time==14 & df.covid.2$observation<=11,1,
ifelse(df.covid.2$time==30 & df.covid.2$observation<=97,1,
ifelse(df.covid.2$time==45 & df.covid.2$observation<=57,1,
ifelse(df.covid.2$time==60 & df.covid.2$observation<=44,1,
ifelse(df.covid.2$time==75 & df.covid.2$observation<=24,1,
ifelse(df.covid.2$time==90 & df.covid.2$observation<=29,1,
0))))))
covid.fit.log <- summary(survfit(Surv(df.covid.2$time,df.covid.2$status)~1), conf.type="log")
covid.fit.log
## Call: survfit(formula = Surv(df.covid.2$time, df.covid.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 25850 11 1.000 0.000128 0.999 1.000
## 30 7112 97 0.986 0.001381 0.983 0.989
## 45 5265 57 0.975 0.001960 0.971 0.979
## 60 3827 44 0.964 0.002565 0.959 0.969
## 75 2602 24 0.955 0.003118 0.949 0.961
## 90 1458 29 0.936 0.004641 0.927 0.945
covid.fit.log.log <- summary(survfit(Surv(df.covid.2$time,df.covid.2$status)~1), conf.type="log-log")
covid.fit.log.log
## Call: survfit(formula = Surv(df.covid.2$time, df.covid.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 25850 11 1.000 0.000128 0.999 1.000
## 30 7112 97 0.986 0.001381 0.983 0.989
## 45 5265 57 0.975 0.001960 0.971 0.979
## 60 3827 44 0.964 0.002565 0.959 0.969
## 75 2602 24 0.955 0.003118 0.949 0.961
## 90 1458 29 0.936 0.004641 0.927 0.945
##Comparing with the original data for COVID-19:
data.frame("Time"=unname(df3[1]),"Surv"=as.vector(covid.fit.log[6]),"Surv from data"=unname(df3[2]),
CI1.lower=covid.fit.log[c(14)],"lower from data"=unname(df3[3]),
CI1.upper=covid.fit.log[c(15)],"upper from data"=unname(df3[4]))
## Time surv Surv.from.data lower lower.from.data upper
## 1 14 0.9995745 0.9995575 0.9993231 0.9992011 0.9998259
## 17 30 0.9859414 0.9858750 0.9832392 0.9831579 0.9886509
## 32 45 0.9752673 0.9752556 0.9714331 0.9714446 0.9791167
## 47 60 0.9640544 0.9640431 0.9590403 0.9591157 0.9690948
## 62 75 0.9551623 0.9549915 0.9490705 0.9490222 0.9612932
## 77 90 0.9361639 0.9422307 0.9271119 0.9342407 0.9453042
## upper.from.data
## 1 0.9997549
## 17 0.9881564
## 32 0.9785635
## 47 0.9683865
## 62 0.9602766
## 77 0.9492763
## Getting Number of events based on data for Influenza
# df.influ <- df3 %>% select(time, n.at.risk2, n.events.2,values2,n.at.risk2.st)
df.influ <- df3[, c("time", "n.at.risk2", "n.events.2",
"values2","n.at.risk2.st") ]
df.influ.2 <- expandRows(df.influ, "n.at.risk2.st", count.is.col = TRUE, drop = F)
df.influ.2 <- df.influ.2 %>% group_by(time) %>% mutate(observation = 1:n())
df.influ.2$status <- ifelse(df.influ.2$time==14 & df.influ.2$observation<=9,1,
ifelse(df.influ.2$time==30 & df.influ.2$observation<=69,1,
ifelse(df.influ.2$time==45 & df.influ.2$observation<=54,1,
ifelse(df.influ.2$time==60 & df.influ.2$observation<=45,1,
ifelse(df.influ.2$time==75 & df.influ.2$observation<=50,1,
ifelse(df.influ.2$time==90 & df.influ.2$observation<=45,1,
0))))))
influenza.fit.log <- summary(survfit(Surv(df.influ.2$time,df.influ.2$status)~1), conf.type="log")
influenza.fit.log
## Call: survfit(formula = Surv(df.influ.2$time, df.influ.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 24857 9 1.000 0.000121 0.999 1.000
## 30 11174 69 0.993 0.000750 0.992 0.995
## 45 10199 54 0.988 0.001033 0.986 0.990
## 60 9371 45 0.983 0.001247 0.981 0.986
## 75 8634 50 0.978 0.001477 0.975 0.981
## 90 7800 45 0.972 0.001691 0.969 0.975
influenza.fit.log.log <- summary(survfit(Surv(df.influ.2$time,df.influ.2$status)~1), conf.type="log-log")
influenza.fit.log.log
## Call: survfit(formula = Surv(df.influ.2$time, df.influ.2$status) ~
## 1)
##
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 14 24857 9 1.000 0.000121 0.999 1.000
## 30 11174 69 0.993 0.000750 0.992 0.995
## 45 10199 54 0.988 0.001033 0.986 0.990
## 60 9371 45 0.983 0.001247 0.981 0.986
## 75 8634 50 0.978 0.001477 0.975 0.981
## 90 7800 45 0.972 0.001691 0.969 0.975
data.frame("Time"=unname(df3[1]),"Surv"=as.vector(influenza.fit.log[6]),"Surv from data"=unname(df3[5]),
CI1.lower=influenza.fit.log[c(14)],"lower from data"=unname(df3[6]),
CI1.upper=influenza.fit.log[c(15)],"upper from data"=unname(df3[7]))
## Time surv Surv.from.data lower lower.from.data upper
## 1 14 0.9996379 0.9996518 0.9994015 0.9993310 0.9998745
## 17 30 0.9934651 0.9934583 0.9919953 0.9918545 0.9949371
## 32 45 0.9882051 0.9881514 0.9861827 0.9859959 0.9902316
## 47 60 0.9834597 0.9833711 0.9810189 0.9807940 0.9859065
## 62 75 0.9777644 0.9777117 0.9748737 0.9746913 0.9806637
## 77 90 0.9721234 0.9721212 0.9688147 0.9686907 0.9754435
## upper.from.data
## 1 0.9998188
## 17 0.9947472
## 32 0.9899769
## 47 0.9856050
## 62 0.9803753
## 77 0.9751807
# Fig 1 top left
day=c(14,30,45,60,75,90)
n.at.risk.covid =c(24857, 7112, 5265,3827,2602,1458)
n.at.risk.flu =c(25850,11174,10199,9371,8634,7800)
# take advantage of approx function to interpolate
( N.at.risk.flu = round(approx(day,n.at.risk.flu,14:90)$y) )
## [1] 25850 24933 24016 23098 22181 21264 20346 19429 18512 17595 16678 15760
## [13] 14843 13926 13008 12091 11174 11109 11044 10979 10914 10849 10784 10719
## [25] 10654 10589 10524 10459 10394 10329 10264 10199 10144 10089 10033 9978
## [37] 9923 9868 9813 9757 9702 9647 9592 9537 9481 9426 9371 9322
## [49] 9273 9224 9174 9125 9076 9027 8978 8929 8880 8831 8781 8732
## [61] 8683 8634 8578 8523 8467 8412 8356 8300 8245 8189 8134 8078
## [73] 8022 7967 7911 7856 7800
( N.at.risk.covid = round(approx(day,n.at.risk.covid,14:90)$y) )
## [1] 24857 23748 22639 21530 20421 19312 18203 17094 15984 14875 13766 12657
## [13] 11548 10439 9330 8221 7112 6989 6866 6743 6619 6496 6373 6250
## [25] 6127 6004 5881 5758 5634 5511 5388 5265 5169 5073 4977 4882
## [37] 4786 4690 4594 4498 4402 4306 4210 4115 4019 3923 3827 3745
## [49] 3664 3582 3500 3419 3337 3255 3174 3092 3010 2929 2847 2765
## [61] 2684 2602 2526 2449 2373 2297 2221 2144 2068 1992 1916 1839
## [73] 1763 1687 1611 1534 1458
# Make 2 'POPULATION-TIME' PLOTS
# (areas are total amounts of population-time)
par(mfrow=c(1,1),mar=c(4,4,0,0))
plot(c(14,90),c(-30000,30000),col="white",
xlab="Day", ylab="No. at Risk")
polygon(c(14, 14:90, 90,14),
c(0,N.at.risk.covid,0,0),
border=NA,
col="pink")
polygon(c(14, 14:90, 90,14),
c(0,-N.at.risk.flu,0,0),
border=NA,
col="lightblue")
Rough re-construction of Numbers at Risk
cbind(14:90,N.at.risk.flu, N.at.risk.covid)[1:10,]
## N.at.risk.flu N.at.risk.covid
## [1,] 14 25850 24857
## [2,] 15 24933 23748
## [3,] 16 24016 22639
## [4,] 17 23098 21530
## [5,] 18 22181 20421
## [6,] 19 21264 19312
## [7,] 20 20346 18203
## [8,] 21 19429 17094
## [9,] 22 18512 15984
## [10,] 23 17595 14875
# Daily Number of Cases = daily rate * N. at Risk
# Daily Rate (eye fit)
# Theoretically, Risk = 1- exp [- integral of rate function ]
# if ultimate (90 day) risk is low,
# Risk = sum of daily rates, so
# (ave. of daily rates = 90-day-risk / No. of days)
# 90-day risks are 0.028 and 0.058, so ...
daily.incidence.rate.flu = 0.028/(90-13)
n.cases.flu.cohort = round(N.at.risk.flu *
daily.incidence.rate.flu )
daily.incidence.rate.covid = 0.058/(90-13)
n.cases.covid.cohort = round( N.at.risk.covid *
daily.incidence.rate.covid)
cbind( day, n.cases.flu.cohort,n.cases.covid.cohort )
## Warning in cbind(day, n.cases.flu.cohort, n.cases.covid.cohort): number of rows
## of result is not a multiple of vector length (arg 1)
## day n.cases.flu.cohort n.cases.covid.cohort
## [1,] 14 9 19
## [2,] 30 9 18
## [3,] 45 9 17
## [4,] 60 8 16
## [5,] 75 8 15
## [6,] 90 8 15
## [7,] 14 7 14
## [8,] 30 7 13
## [9,] 45 7 12
## [10,] 60 6 11
## [11,] 75 6 10
## [12,] 90 6 10
## [13,] 14 5 9
## [14,] 30 5 8
## [15,] 45 5 7
## [16,] 60 4 6
## [17,] 75 4 5
## [18,] 90 4 5
## [19,] 14 4 5
## [20,] 30 4 5
## [21,] 45 4 5
## [22,] 60 4 5
## [23,] 75 4 5
## [24,] 90 4 5
## [25,] 14 4 5
## [26,] 30 4 5
## [27,] 45 4 4
## [28,] 60 4 4
## [29,] 75 4 4
## [30,] 90 4 4
## [31,] 14 4 4
## [32,] 30 4 4
## [33,] 45 4 4
## [34,] 60 4 4
## [35,] 75 4 4
## [36,] 90 4 4
## [37,] 14 4 4
## [38,] 30 4 4
## [39,] 45 4 3
## [40,] 60 4 3
## [41,] 75 4 3
## [42,] 90 4 3
## [43,] 14 3 3
## [44,] 30 3 3
## [45,] 45 3 3
## [46,] 60 3 3
## [47,] 75 3 3
## [48,] 90 3 3
## [49,] 14 3 3
## [50,] 30 3 3
## [51,] 45 3 3
## [52,] 60 3 3
## [53,] 75 3 3
## [54,] 90 3 2
## [55,] 14 3 2
## [56,] 30 3 2
## [57,] 45 3 2
## [58,] 60 3 2
## [59,] 75 3 2
## [60,] 90 3 2
## [61,] 14 3 2
## [62,] 30 3 2
## [63,] 45 3 2
## [64,] 60 3 2
## [65,] 75 3 2
## [66,] 90 3 2
## [67,] 14 3 2
## [68,] 30 3 2
## [69,] 45 3 2
## [70,] 60 3 2
## [71,] 75 3 1
## [72,] 90 3 1
## [73,] 14 3 1
## [74,] 30 3 1
## [75,] 45 3 1
## [76,] 60 3 1
## [77,] 75 3 1
sum( n.cases.flu.cohort )
## [1] 318
sum( n.cases.covid.cohort )
## [1] 384
path = "/Users/jameshanley/OneDrive - McGill University/work/RF/OSF_COVIDPsychiatricSequelae.Rdata"
x = load(path)
y = get(x[1])
psy = y
# Extract data
timepoints <- resMain[[1]]$outcomes[[1]]$KM$survival$time
v1 <- resMain[[1]]$outcomes[[1]]$KM$survival$values1 # COVID-19 cohort
v2 <- resMain[[1]]$outcomes[[1]]$KM$survival$values2 # Influenza cohort
plot(timepoints,v1, pch=19,cex=0.5,
type="l",
ylab="Survival (proportion without Psych Dx)",
xlab="Days post Covid-19 Dx",
ylim=c(0.93, 1))
lines(timepoints,v2, pch=19,cex=0.5)
# CIs:
ci1 <- resMain[[1]]$outcomes[[1]]$KM$survival$CI1 # COVID-19 cohort
ci2 <- resMain[[1]]$outcomes[[1]]$KM$survival$CI2 # Influenza cohort
segments(timepoints, ci1[,1],
timepoints, ci1[,2] )
segments(timepoints, ci2[,1],
timepoints, ci2[,2] )
The team’s graph was entitled
Psychiatric diagnosis in women aged 20-48 y
JH: this is an amusing ‘translation’ of the abbreviations F20-F48! It is important to read not just the statistical methods section but also the other sections. Under ‘Variables of interest and their coding’ one can read
We defined a psychiatric illness as any of the ICD-10 codes F20–F48, comprising psychotic (F20–F29), mood (F30–F39), and anxiety (F40–F48) disorders.
# From rearranging the Greenwood formula, we get the variance:
# this assumes the “plain” conf.interval type and 95% interval
var <- ((v1 - ci1[,1])/1.96)^2 #covid-19 group
var2 <- ((v2 - ci2[,1])/1.96)^2 #influenza group
plot(timepoints,var, pch=19,
xlab="Days post Covid-19 Dx",
cex=0.5,
ylab="Variance[Survival proportion]")
points(timepoints,var2, pch=19,cex=0.5)
We used the Greenwood formula to estimate the variance of the survival function at each timepoint. As seen in the graph above, the variance increases over time, as the number at risk decreases. In the Covid-19 cohort, this increase in variance is more rapid compared to the influenza cohort.
JH: Good start, by getting approximate Greenwood Variance (GV). With the passage of time, as S falls, the first term in the variance falls, but the second term increases. See below on how to solve for the \(n_j\)’s.
JH: continued…
\[GV(\widehat{S[t]}) = S(t)^2 \times \sum_j \frac{1-s_j}{n_j \times s_j}, \] where the summation is over (risksets) \(j \le t\), where \(s_j\) refers to the conditional survival probability for riskset \(j\), and \(n_j\) is the size of the \(j \ th\) riskset.
Since the daily incidence is low (in the COVID cohort, for example, each \(1 - s_j \approx 0.058/75\)) and thus each \(s_j\) is close to 1, the denominators in the sum are close to \(n_j\).
If there were NO censoring, so that the \(n_j\) decrease only because of the new psych. diagnoses, the entire sum would be approximately
\[ \sum_{75 \ risksets} \frac{0.058/75}{slowly \ decreasing \ n's \ \times \ quantities \ close \ to \ 1} \] So, if S[90] = 1-0.058 = 0.942, then (approximately) \[GV(\widehat{S[t]}) = 0.942^2 \times \frac{0.058}{0.942 \times original \ n} = \frac{0.942 \times 0.058}{original \ n}.\]
This is the pure (single-binomial, at the end of 90 days) variance. Obviously, if the daily \(n_j\) decrease for other reasons, and by large amounts, then the GV will be a lot larger. Indeed, how much larger the variance of \(\widehat{S[90]}\) actually is than the simple binomial \(0.058 \times 0.942\) / ‘starting-out \(n\)’ gives a good sense of how much censoring there is. Of course, at the bottom of the Figure, we are given the numbers at risk at selected timepoints, and so we can see for ourselves. But, in situations where such numbers at risk are not provided, the extra variance can be a good indicator of the degree of censoring.
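A rough numerical sketch of this point, using the day-90 COVID-19 figures that appear in the output further below (S[90] from the data ≈ 0.942, logit-based limits ≈ 0.934 and 0.949, and 24,857 at risk at day 14); the half-width/1.96 step gives only an approximate SE because the reported CI is asymmetric:
# Approximate check: how much does censoring inflate Var(S-hat[90]) in the COVID-19 cohort?
SE.no.censoring = sqrt( 0.058 * 0.942 / 24857 )      # 'pure binomial', starting-out n
SE.implied      = (0.9493 - 0.9342) / (2 * 1.96)     # from the reported (logit-based) CI
round( c(SE.no.censoring, SE.implied), 4 )
( SE.implied / SE.no.censoring )^2                   # variance ratio: roughly 7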
Orientation [from letter under review]
Reconstruction Methods
The daily numbers of patients at risk, and the daily numbers of new cases, were back-calculated from the Rdata file (supplied as a supplement to the published article) using the widths of the confidence intervals and the cumulative incidences. The R code is given below.
The numbers at risk were back-calculated from the confidence intervals (CIs) and point estimates.
We illustrate using the day-14 point and interval estimate.
The CI was asymmetric, so we used trial and error to determine that the CI option used by the survfit program in R was the logit-based one. By using the appropriate Jacobian, we were thus able to back-calculate the Greenwood variance for S(14) – this variance is at the base of all CI options.
The Greenwood variance for S[14] is a function of S[14] itself (provided in the RData file), as well as the number of events and the number at risk. S[14] is a different function of the number of events and the number at risk. So, from the variance and the point estimate, we can solve for the 2 unknowns.
Again, the variance for S[15] involves S[15] itself, but it now involves a sum of two variances, the first of which we already have calculated when dealing with S[14], and the second which we obtain by subtraction. Knowing that S[15] is a product of two conditional probabilities, the first of which is S[14] itself, we can get the second. So, again, we can solve two equations to get the number of events and the number at risk for day 15.
And so on.
We were able to double-check our calculations in two ways.
First, we checked that our back-calculated numbers at risk matched the selected ones (at days 30, 45, etc) reported in the article.
Second, we ran our reconstructed daily numbers of cases and numbers at risk through the R survfit program, with the logit-based CI option, and compared the output with that in the Rdata file supplied.
First 2 calculations, in symbols:
upper case S = ‘cumulative’, lower case s = conditional.
The GV for \(S[14]\) is \[S[14]^2 \times \frac{1-s_{14}}{n_{14} \times s_{14}}\] For the very first day, \(S[14] = s_{14}\); we are given \(S[14]\), so if we can back-calculate the GV for \(S[14]\), we can solve for \(n_{14}\). We back-calculate the GV from the limits of the CI for \(S[14]\), which are included in the file (see below).
Now consider day 15, where \[S[15] = s_{14} \times s_{15}.\] We are given \(S[15]\), so we can back-calculate \(s_{15}\).
The GV for \(S[15]\) is \[S[15]^2 \times \{ \frac{1-s_{14}}{n_{14} \times s_{14}} + \frac{1-s_{15}}{n_{15} \times s_{15}} \}\] Again if we can back-calculate the GV from the limits of the CI for \(S[15]\), which are included in the file, we can solve for \(n_{15}\),
and so on
If the authors had used symmetric CIs, it would be very easy to back-calculate the GV: one would simply divide the width of the CI by 2 \(\times\) 1.96, and square this quantity.
However, as is clear from the graph above, the CI is not symmetric, so now it is a matter of figuring out which ‘link’ function (transformation) was used. JH did this by trial and error, each time using the Delta method to connect the variance of the transform and the GV value.
It appears that the authors used the logit transform. (This keeps the CI within the (0,1) bound for S)
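A minimal single-day illustration of this back-calculation, using the day-14 COVID-19 values shown in the comparison table above; the logit/delta-method step is the same one used in the ‘de luxe’ code that follows:
# Day-14 illustration: recover n at risk (and the number of events) from S[14]
# and its logit-based CI limits (values taken from the comparison table above)
logit = function(x) log(x/(1-x))
S14 = 0.9995575; L14 = 0.9992011; U14 = 0.9997549
SE.logit = ( logit(U14) - logit(L14) ) / (2*1.96)
GV14 = ( SE.logit * S14*(1-S14) )^2   # delta method: Var(S-hat) from Var(logit(S-hat))
( n14 = S14*(1-S14) / GV14 )          # close to the 24857 reported at risk on day 14
( events14 = n14 * (1-S14) )          # close to the 11 events in the survfit output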
# de luxe
# First, some double-checking of rescaling
# factors involved in variances of functions
# of random variables
logit = function(x) log(x/(1-x))
cll = function(x) log(-log(1-x))
S = 0.9; dS = 0.001
( VLogitSDirect = 1/(S*(1-S)) )
## [1] 11.11111
VS = S*(1-S)
( J = diff( logit(S + c(-dS/2,dS/2) ) /dS ) )
## [1] 11.11119
J^2
## [1] 123.4586
J^2 * ( S*(1-S) )
## [1] 11.11128
# Now to the 'data'
setwd("/Users/jameshanley/OneDrive - McGill University/work/RF")
load('OSF_COVIDPsychiatricSequelae.Rdata')
par(mfrow=c(2,3),mar=c(0,0,0,0))
Y0 = 55000
dS = 11
panels=1:6 # (all 6)
panels=1:1 # (just 1st)
if(length(panels)==6) par(mfrow=c(2,3),mar=c(0,0,0,0))
if(length(panels)==1) par(mfrow=c(1,1),mar=c(0,0,0,0))
for(ctlGp in panels){
km=resMain[[ctlGp]]$outcomes[[1]]$KM$survival
index=resMain[[ctlGp]]$cohort1
ref =resMain[[ctlGp]]$cohort2
outcome = resMain[[ctlGp]]$outcomeNames[1]
for(i in 1:2){
t = km$time
L = km$CI1$lower
S = km$values1
U = km$CI1$upper
if(i==2){
L = km$CI2$lower
S = km$values2
U = km$CI2$upper
}
# print(100*min(1-S))
m=cbind(t,L,S,U)
ii=1:50
t(apply( m[ii,2:4], 1, diff))
t(apply( cll(m[ii,2:4]), 1, diff))
t(apply( logit(m[ii,2:4]), 1, diff))
t(apply( m[ii,2:4], 1, diff))
W = (2*qnorm(0.975))
W = 2*1.96
SE = (logit(U) - logit(L))/W
Vlogit = SE^2
J = 1/(S * (1-S))
Vgreenwood = Vlogit / J^2
Sum = Vgreenwood/(S^2)
( v = c(Sum[1],diff(Sum[1:77])) )
n = rep(NA,77)
q= c( 1-S[1], 1- S[2:77]/S[1:76] )
imputed.n = ( q/(1-q) ) / v
n = imputed.n
# print(n)
for.missing = approx(1:77,n,1:77)$y
n[is.na(n)] = for.missing[is.na(n)]
# print(n)
n = round(n)
# print(n)
y = round( n[1:77] * q ) ; print(sum(y))
y[is.na(y)] = 0
YLAB=""
if(ctlGp %in% c(10,14)) YLAB="No. Persons At Risk"
title = paste(outcome,"; ",index,"vs.",ref)
if(i==1) cohort = index
if(i==2) cohort = ref
if(i==1){
plot(14:90,n,col="white",
xlim=c(10,100),
#xlab="Days Post COVID-19 Diagnosis",
xlab="",
ylim=c(-Y0,Y0+Y0/1.8),
ylab=YLAB,
yaxt="n",xaxt="n",
cex.main=1.8
)
abline(h=seq(-50000,50000,10000),
lwd=0.6,col="grey75")
}
xx = as.numeric(t(matrix(c(14:90,15:91),77,2)))
yy = rep(n,each=2)
if(i==2) polygon(c(xx,91,14),-c(yy,0,0),
col="lightblue",border=NA)
if(i==1) polygon(c(xx,91,14), c(yy,0,0),
col="pink",border=NA)
for(j in 1:77) {
cases = y[j]
Ymin=0; Ymax=n[j]
if(i==2) { Ymin = -n[j] ; Ymax = 0 }
if(cases>0) points(
14+j-1+runif(cases,0.05,0.95),
runif(cases,Ymin+50, Ymax-50),
pch=19,cex=0.15
)
}
yy = (Y0/1.6)*(i==1) -(Y0/1.6)*(i==2)
points(75,yy,pch=19,cex=0.2)
text( 100,yy, paste(toString(sum(y)),"Cases"),
adj=c(1,.5),cex=1.5)
YY=Y0+0.1*Y0
lines(t,YY+(Y0/2)*(1-S)*dS,
type='s',
col=c("pink","lightblue")[i],
lwd=2)
segments(14,YY,91,YY)
segments(14,YY,14,YY+Y0/2.5)
text((14+90)/2, YY -Y0/15,
"Day",adj=c(0.5,1.25))
PCT = round(100*(1-S[77]),1)
pct = toString( PCT )
txt =substr(pct,1,1)
text(94.5,YY+(Y0/2)*(1-S[77])*dS, txt,
adj=c(1,0.5) )
points(95,YY+(Y0/2)*(1-S[77])*dS,pch=19,cex=0.35)
txt =paste(substr(pct,3,3),"%",sep="")
text(95.5,YY+(Y0/2)*(1-S[77])*dS, txt,
adj=c(0,0.5) )
for(day in c(14,seq(30,90,15))){
a = 20-40*(i==2)
if(day==14) a = 0
text(day,YY,toString(day),
adj=c(0.5,1.25))
yy = n[day-13]+5000
if(i==2) yy = -yy +100
text(day,yy,
toString(n[day-13]),
adj=c(0.4,0.5),
srt=a)
if(day==14) text(20,yy,cohort, adj=c(0,0.5),
col= c("pink","lightblue")[i],
cex=1.4, font=2)
}
if(i==1) text(15, Y0 +Y0/1.65,
"Cumulative\nIncidence",
cex=1, adj=c(0,1.05) )
if(ctlGp ==1 ) print(n)
} # i
} # 6
## [1] 302
## [1] 24857 9914 9581 9314 9098 8895 8717 8568 8397 8199 8014 7841
## [13] 7694 7554 7433 7272 7112 6956 6823 6709 6580 6471 6337 6192
## [25] 6052 5962 5852 5740 5610 5500 5367 5265 5168 5059 4956 4865
## [37] 4763 4663 4558 4455 4360 4273 4186 4096 4015 3915 3827 3759
## [49] 3694 3618 3495 3399 3324 3254 3160 3099 3025 2951 2848 2758
## [61] 2670 2602 2524 2450 2368 2287 2206 2129 2053 1979 1906 1831
## [73] 1745 1675 1593 1515 1458
## [1] 285
## [1] 25850 12352 12250 12167 12085 12016 11930 11858 11782 11694 11600 11517
## [13] 11448 11386 11329 11257 11174 11099 11024 10962 10895 10830 10768 10688
## [25] 10629 10563 10508 10446 10393 10342 10268 10199 10145 10093 10026 9953
## [37] 9899 9834 9783 9736 9689 9645 9581 9528 9471 9417 9371 9336
## [49] 9286 9234 9182 9128 9060 9020 8965 8916 8863 8816 8764 8725
## [61] 8674 8634 8584 8534 8474 8410 8363 8312 8271 8230 8176 8119
## [73] 8046 7973 7912 7856 7800
# back translation to check that it works
# this 'if' statement turns a chunk of code on/off
# 1==2 turns it off
# 1==1 turns it on
# (saves comments)
if(1==1){
y
n
s = n-y
(c = c(s[1:76] - n[2:77],s[77]))
time=NULL
dx=NULL
for(i in 1:77){
if(y[i] > 0) {
time=c(time,rep(13+i,y[i]))
dx =c(dx,rep(1,y[i]))
}
if(c[i] > 0) {
time=c(time,rep(13+i,c[i]))
dx =c(dx,rep(0,c[i]))
}
}
sum(dx)
length(dx)
library(survival)
fit = survfit(Surv(time,dx)~1,conf.type="logit")
# plot(fit$time,1-fit$surv)
round(cbind(L[1:77], fit$lower, U[1:77],fit$upper),4)
}
## [,1] [,2] [,3] [,4]
## [1,] 0.9993 0.9993 0.9998 0.9998
## [2,] 0.9984 0.9984 0.9994 0.9994
## [3,] 0.9979 0.9979 0.9992 0.9992
## [4,] 0.9974 0.9974 0.9988 0.9988
## [5,] 0.9971 0.9971 0.9986 0.9986
## [6,] 0.9968 0.9968 0.9984 0.9984
## [7,] 0.9964 0.9964 0.9982 0.9982
## [8,] 0.9956 0.9956 0.9976 0.9976
## [9,] 0.9953 0.9953 0.9974 0.9974
## [10,] 0.9949 0.9949 0.9971 0.9971
## [11,] 0.9943 0.9943 0.9967 0.9967
## [12,] 0.9941 0.9941 0.9965 0.9965
## [13,] 0.9939 0.9939 0.9964 0.9964
## [14,] 0.9932 0.9932 0.9958 0.9958
## [15,] 0.9927 0.9927 0.9954 0.9955
## [16,] 0.9923 0.9923 0.9951 0.9951
## [17,] 0.9919 0.9919 0.9947 0.9947
## [18,] 0.9915 0.9915 0.9944 0.9944
## [19,] 0.9911 0.9911 0.9941 0.9941
## [20,] 0.9908 0.9908 0.9939 0.9939
## [21,] 0.9903 0.9903 0.9935 0.9935
## [22,] 0.9901 0.9901 0.9934 0.9934
## [23,] 0.9896 0.9896 0.9930 0.9930
## [24,] 0.9894 0.9894 0.9928 0.9928
## [25,] 0.9886 0.9886 0.9921 0.9921
## [26,] 0.9883 0.9883 0.9919 0.9919
## [27,] 0.9878 0.9878 0.9915 0.9915
## [28,] 0.9873 0.9873 0.9910 0.9910
## [29,] 0.9869 0.9869 0.9908 0.9908
## [30,] 0.9865 0.9865 0.9904 0.9904
## [31,] 0.9862 0.9862 0.9902 0.9902
## [32,] 0.9860 0.9860 0.9900 0.9900
## [33,] 0.9859 0.9859 0.9899 0.9899
## [34,] 0.9858 0.9858 0.9898 0.9898
## [35,] 0.9855 0.9855 0.9895 0.9895
## [36,] 0.9852 0.9852 0.9893 0.9894
## [37,] 0.9850 0.9850 0.9892 0.9892
## [38,] 0.9847 0.9847 0.9889 0.9889
## [39,] 0.9843 0.9843 0.9885 0.9885
## [40,] 0.9837 0.9837 0.9881 0.9881
## [41,] 0.9834 0.9834 0.9878 0.9878
## [42,] 0.9827 0.9827 0.9872 0.9872
## [43,] 0.9820 0.9820 0.9867 0.9867
## [44,] 0.9817 0.9817 0.9864 0.9864
## [45,] 0.9814 0.9814 0.9861 0.9861
## [46,] 0.9809 0.9809 0.9857 0.9857
## [47,] 0.9808 0.9808 0.9856 0.9856
## [48,] 0.9803 0.9803 0.9852 0.9852
## [49,] 0.9794 0.9794 0.9844 0.9844
## [50,] 0.9791 0.9791 0.9841 0.9841
## [51,] 0.9790 0.9790 0.9840 0.9840
## [52,] 0.9783 0.9783 0.9834 0.9834
## [53,] 0.9780 0.9780 0.9832 0.9832
## [54,] 0.9777 0.9777 0.9829 0.9829
## [55,] 0.9770 0.9770 0.9823 0.9823
## [56,] 0.9767 0.9767 0.9821 0.9821
## [57,] 0.9764 0.9764 0.9818 0.9818
## [58,] 0.9760 0.9760 0.9815 0.9815
## [59,] 0.9757 0.9757 0.9812 0.9812
## [60,] 0.9752 0.9752 0.9808 0.9808
## [61,] 0.9751 0.9751 0.9807 0.9807
## [62,] 0.9747 0.9747 0.9804 0.9804
## [63,] 0.9744 0.9744 0.9802 0.9802
## [64,] 0.9742 0.9742 0.9800 0.9800
## [65,] 0.9741 0.9741 0.9798 0.9798
## [66,] 0.9733 0.9733 0.9792 0.9792
## [67,] 0.9727 0.9727 0.9787 0.9787
## [68,] 0.9724 0.9725 0.9784 0.9784
## [69,] 0.9724 0.9725 0.9784 0.9784
## [70,] 0.9718 0.9718 0.9779 0.9779
## [71,] 0.9714 0.9714 0.9776 0.9776
## [72,] 0.9707 0.9707 0.9769 0.9769
## [73,] 0.9699 0.9699 0.9762 0.9762
## [74,] 0.9692 0.9692 0.9756 0.9756
## [75,] 0.9690 0.9690 0.9754 0.9754
## [76,] 0.9690 0.9690 0.9754 0.9754
## [77,] 0.9687 0.9687 0.9752 0.9752
The effect of a placebo relative to statins on symptom intensity controlled by taking neither treatment.
Equation: SI = symptom intensity \[\frac{(SI_{placebo} - SI_{nocebo}) } {(SI_{statin} - SI_{nocebo})}\]
JH: might have written: The (corrected for background) level of reported symptoms when taking placebo relative to when taking statins. He is not keen on the word ‘controlled’, especially when communicating with general audiences. ‘Net’ is another useful phrase. BTW, do you mean (Placebo-Nothing)/(Statin-Nothing)?
The high variance in the nocebo ratio was a result of the way the equation for it was defined. If the result were bounded, for example, between 0 and 3, there wouldn’t be such a wide range of possible values as there is with negative values and small values in the denominator. Small values of the denominator (statin-nocebo) lead to a high nocebo ratio, and if the denominator is negative, the resulting ratio will likely be negative too. The nocebo ratios within a patient could vary widely between samples with slight changes to symptom intensity.
JH: Indeed, the between-individual SI range was very large, even if we leave aside the correction for background. If for each of 60 individuals, we computed the ratio of the (grip) strength of the non-dominant to the dominant hand, but these individuals varied in age from 2 to 99, we would have some wild ratios. Ratios can be particularly volatile if the denominators are close to zero (imagine a BMI in newborns and adults, or estimating \(\pi\) from the areas of circles ranging in radius from mm to m, etc.) The effects of measurement errors, variation from day to day, etc. are amplified when one takes a ratio.
suppressMessages(library(tidyverse))
suppressMessages(library(data.table))
suppressMessages(library(MASS))
suppressMessages(library(boot))
ds=read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
print(str(ds))
## 'data.frame': 666 obs. of 4 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ PatientNo : int 1 1 1 1 1 1 1 1 1 1 ...
## $ BottleContent : chr "0" "0" "Statin" "Statin" ...
## $ MeanSymbolScore: num 1.07 0.81 0.71 0.99 0.96 0.81 0.96 0.71 0.74 0.89 ...
## NULL
nocebo <- read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv",
header=T) %>%
dplyr::select(-X) %>%
group_by(PatientNo, BottleContent) %>%
summarize(mean = mean(MeanSymbolScore), .groups='drop')
bratio <- function(x, i){
df<- x[i,]
df<-
df %>%
group_by(BottleContent) %>%
summarize(mean=mean(mean), .groups='drop') %>%
spread(BottleContent, mean)
(df$Placebo-df$`0`) / (df$Statin-df$`0`)
}
bratio(nocebo) # point estimate
## [1] 0.8536503
round(bratio(nocebo),2) # ADDED BY JH
## [1] 0.85
b <- boot(nocebo, bratio, R=1000)
ci <- boot.ci(b, conf=0.95, type="perc") #95%CI
ci
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = b, conf = 0.95, type = "perc")
##
## Intervals :
## Level Percentile
## 95% ( 0.3484, 1.7339 )
## Calculations and Intervals on Original Scale
Answer: (0.3492, 1.5814), with mean 0.85 (or 0.35 to 1.58, after JH rounding).
JH: this is a very wide CI. It looks like the function adds noise by using a single sample of size 666, with replacement, and ignores the fact that they belong to 60 patients. (See below).
Jackknife and bootstrap get you an interval estimate that can make sense.
The analysis method should be chosen after collecting and understanding the distribution of the data, rather than before. Use a measure that will result in reasonable values, i.e. not an equation that can produce a meaninglessly high or negative result. Consider the possible outcomes, and whether the analysis plan will yield a reasonable one.
JH: It’s very much a question of having an estimator that yields a stable number. A ratio estimator uses a ratio of 2 sums or means, not a mean of many patient-specific ratios. JH’s favourite example is from Miettinen, who asks how you would estimate the sex ratio in children attending small day-care centers of a few children each. Would you take 1 mean (or median) of the n very volatile center-specific ratios? Why not take 1 ratio of the total number of boys to the total number of girls? (A toy simulation of this example follows.)
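A toy simulation (hypothetical numbers, not from any of the teams’ data) of Miettinen’s day-care example: with only a few children per centre, the centre-specific ratios are wild, while the single ratio of the totals is well behaved.
# Hypothetical day-care example (made-up numbers): centre-specific sex ratios
# versus one ratio of the totals
set.seed(1)
n.centres   = 40
centre.size = sample(3:8, n.centres, replace = TRUE)   # a few children each
boys  = rbinom(n.centres, centre.size, 0.51)
girls = centre.size - boys
centre.ratios = boys / girls          # Inf whenever a centre happens to have no girls
summary(centre.ratios)                # very volatile; the mean can even be Inf
sum(boys) / sum(girls)                # the stable ratio-of-totals estimator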
As per our discussion last week, the “no prior data” issue should not have influenced the sample size calculation – which should be based on the minimum clinically relevant effect, and not on prior observed effect sizes.
- Nocebo ratio: The nocebo ratio is simply the ratio of symptoms induced by taking a placebo to those induced by taking statins. When the ratio is >1, there are more symptoms in the placebo group; when the ratio is <1, there are more symptoms in the statin group; when the ratio = 1, the level of symptoms is the same in the placebo and statin groups.
The subtraction of the level of ‘background’ symptoms is indeed implied by the word ‘induced’. You might wish to write of the ‘level’ or ‘extent’ of the symptoms.
- The nocebo ratio was initially calculated as: (symptoms with placebo - symptoms with nothing) / (symptoms with statins - symptoms with nothing) for each individual. For some people, the denominator of this calculation was very small or negative (meaning there were similar levels of symptoms when taking statins and when taking nothing, or in some instances more symptoms when taking nothing than when taking statins), and this can likely be attributed to random error. This led to an average nocebo ratio of 2.2 (meaning more symptoms on placebo than on statins) with very large confidence intervals, which doesn’t make a lot of sense biologically. To get around this, the authors pooled the symptom scores among everyone and re-calculated the nocebo ratio, which helped reduce the variance/error in the estimate.
Indeed, it stabilized the ratio. Interestingly, this advantage is not mentioned in the Wikipedia account of ratio estimators. The article focuses on bias. It also has a big blunder in the very first sentence (‘The ratio estimator is a statistical parameter!’).
- The 95% CI around the new nocebo ratio can be estimated via bootstrapping; we used 1000 bootstrap samples to estimate a nocebo ratio of 0.89 with 95% CI of 0.55-1.31 (using the percentile method). For simplicity, we did not do the bootstrapping at the patient level.
library(readr)
StatinOrPlacebo <- read_csv(
"https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
## New names:
## Rows: 666 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (1): BottleContent dbl (3): ...1, PatientNo, MeanSymbolScore
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
placebo.samps<-
StatinOrPlacebo[
StatinOrPlacebo$BottleContent== 'Placebo',] # 216
statin.samps<-
StatinOrPlacebo[
StatinOrPlacebo$BottleContent== 'Statin',] #218
nothing.samps<-
StatinOrPlacebo[StatinOrPlacebo$BottleContent== '0',] #222
bts.1<- function(n){
no.effect<-sum(sample(nothing.samps$MeanSymbolScore, n))
placebo.effect<-sum(sample(placebo.samps$MeanSymbolScore, n))
statin.effect<-sum(sample(statin.samps$MeanSymbolScore, n))
return((placebo.effect-no.effect)/(statin.effect- no.effect))
}
bts.1(100)
## [1] 0.7513779
reps<-seq(1,1000)
n<-100
obs<-c()
for (r in reps){
obs<-c(obs,bts.1(n))
}
mean(obs)
## [1] 0.8893852
quantile(obs, probs= c(0.025, 0.975))
## 2.5% 97.5%
## 0.5686764 1.3175538
This gives a wide CI, but, by keeping the 3 sample sizes (216, 218,222) fixed, not quite as wide as team A.
Again, the objective is stability. See comment to Team A.
As we discussed in class, its more about precision than power. For planning purposes, one could have made up some data, and used bootstrapping to calculate a SE.
library(tidyverse)
nocebo_df <- read.csv(
"https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
# view dataframe
head(nocebo_df,13)
## X PatientNo BottleContent MeanSymbolScore
## 1 1 1 0 1.07
## 2 2 1 0 0.81
## 3 3 1 Statin 0.71
## 4 4 1 Statin 0.99
## 5 5 1 Statin 0.96
## 6 6 1 Placebo 0.81
## 7 7 1 0 0.96
## 8 8 1 0 0.71
## 9 9 1 Statin 0.74
## 10 10 1 Placebo 0.89
## 11 11 1 Placebo 0.71
## 12 12 1 Placebo 0.81
## 13 13 2 Placebo 1.03
tail(nocebo_df,12)
## X PatientNo BottleContent MeanSymbolScore
## 655 655 58 Placebo 34.75
## 656 656 58 0 28.95
## 657 657 58 Placebo 21.47
## 658 658 58 Placebo 35.29
## 659 659 59 0 16.07
## 660 660 59 0 16.25
## 661 661 59 Statin 45.04
## 662 662 59 Statin 51.41
## 663 663 59 Statin 40.25
## 664 664 60 Placebo 45.87
## 665 665 60 0 43.85
## 666 666 60 Statin 51.52
# change row name
# nocebo_df
# convert to wide format and change variable name
nocebo_wide <- nocebo_df %>% spread(key = BottleContent, value = MeanSymbolScore) %>%
rename(None = "0")
## not sure if need to calculate means first then bootstrap or bootstrap from individual measurements
nocebo_patient <- nocebo_wide %>% group_by(PatientNo) %>%
summarise(mean_none = mean(None, na.rm = T),
mean_placebo = mean(Placebo, na.rm = T),
mean_statin = mean(Statin, na.rm = T)) %>% as.data.frame()
# bootstrap nocebo ratio
nocebo_vector <- NA
for(i in 1:1000){
none_sample <- sample(nocebo_patient$mean_none, size=nrow(nocebo_patient), replace = T)
placebo_sample <- sample(nocebo_patient$mean_placebo, size=nrow(nocebo_patient), replace = T)
statin_sample <- sample(nocebo_patient$mean_statin, size = nrow(nocebo_patient), replace = T)
nocebo_ratio <- (mean(placebo_sample, na.rm = T) - mean(none_sample, na.rm = T))/(mean(statin_sample, na.rm = T) - mean(none_sample, na.rm = T))
nocebo_vector[i] <- nocebo_ratio
}
# get 95% CI using percentile method. NR = 0.87 (95% CI: 0.37, 1.85)
nocebo_vector %>% quantile(c(0.025, 0.5, 0.975))
## 2.5% 50% 97.5%
## 0.3376564 0.8554278 1.6285486
# standard error - 0.4
sd(nocebo_vector)
## [1] 0.3206016
# histogram
hist(nocebo_vector, breaks=20)
The nocebo effect refers to a negative effect of treatment that occurs because someone anticipates a negative effect from taking the treatment (in contrast to the placebo effect, which is a positive effect of treatment that occurs because someone anticipates a positive effect from taking the treatment). So the nocebo ratio is the ratio of symptoms that can be attributed to the nocebo effect, calculated as the symptoms attributable to taking a placebo (symptoms from placebo - symptoms from nothing) divided by the symptoms attributable to taking the drug of interest (symptoms from drug - symptoms from nothing). If the nocebo ratio > 1, then the symptoms incurred from taking a placebo are larger than the symptoms incurred from taking the drug of interest. If the ratio = 1, then the symptoms are the same taking placebo or drug. If the ratio < 1, then the symptoms from taking the drug are larger than the symptoms from taking a placebo (indicating a negative effect of the drug itself). In plain words, the nocebo ratio is how badly your body reacts to taking just the placebo, compared with taking the statin, expressed as a ratio.
The original results were strange because they suggest the negative effects from taking a placebo are larger than from taking the actual drug. This was the case because, for some people, the symptoms attributable to taking the drug of interest were very small or even negative (fewer symptoms when someone was on the drug than when not). This would lead to large confidence intervals.
JH: Indeed. See above. But you mean the perceived effects, don’t you?
The preliminary results were calculated by taking the mean of individual nocebo ratios while the final results were calculated by taking the mean of symptom intensities and calculating an overall nocebo ratio.
Indeed
Treating each score for each individual as a separate observation, I bootstrapped a 95% CI of (0.59, 1.31). Taking the average score for each bottle content within individuals (placebo, statin, none) and then bootstrapping, I got a 95% CI of (0.38, 1.65).
Code for 1st CI:
library(tidyverse)
suppressMessages(library(tidyverse))
nocebo_df <- read.csv(
"https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
str(nocebo_df)
## 'data.frame': 666 obs. of 4 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ PatientNo : int 1 1 1 1 1 1 1 1 1 1 ...
## $ BottleContent : chr "0" "0" "Statin" "Statin" ...
## $ MeanSymbolScore: num 1.07 0.81 0.71 0.99 0.96 0.81 0.96 0.71 0.74 0.89 ...
# get scores for each bottle content
# alteration by JH
# The next 3 statements worked for him when he ran them in R, but they
# threw up an error when he ran the entire document through R Markdown.
# placebo <- nocebo_df %>% filter(BottleContent=="Placebo") %>% select(MeanSymbolScore) %>% pull()
# statin <- nocebo_df %>% filter(BottleContent == "Statin") %>% select(MeanSymbolScore) %>% pull()
# none <- nocebo_df %>% filter(BottleContent == "0") %>% select(MeanSymbolScore) %>% pull()
# So, he replaced then with his own old-fashioned ones
placebo <- nocebo_df$MeanSymbolScore[nocebo_df$BottleContent=="Placebo"]
statin <- nocebo_df$MeanSymbolScore[nocebo_df$BottleContent == "Statin"]
none <- nocebo_df$MeanSymbolScore[nocebo_df$BottleContent == "0" ]
# bootstrap 95% CI
## create empty vector
nr_vector <- NA
## bootstrap with 1000 repetitions
for(i in 1:1000){
none_sample <- sample(none, size=length(none), replace = T)
placebo_sample <- sample(placebo, size=length(placebo), replace = T)
statin_sample <- sample(statin, size = length(statin), replace = T)
nocebo_ratio <- (mean(placebo_sample, na.rm = T) - mean(none_sample, na.rm = T))/(mean(statin_sample, na.rm = T) - mean(none_sample, na.rm = T))
nr_vector[i] <- nocebo_ratio
}
## get mean and 95% CI
nr_vector %>% quantile(c(0.025, 0.5, 0.975))# JH would 'round' to 2 dec. places
## 2.5% 50% 97.5%
## 0.5933059 0.8813228 1.3136811
# but doesn't know the 'newer' way with this pipe %>% approach.
This gives a wide CI, but, by keeping the 3 sample sizes (216, 218,222) fixed, not quite as wide as team A.
JH: It’s always safer to use quantile-based than Gaussian-based limits. But JH would not have resisted having a look (via a histogram) at the shape of the distribution of the bootstrap estimates of the ratio.
JH: See comment above about keeping together the scores from the same patient.
- Pooling individual observations decreases variance compared to accounting for differences between individuals in analyses. Not sure which is the more appropriate method. Sometimes we should pool the data.
JH: IF we were using a formal statistical model that carefully and correctly laid out the components of variance, it might not be necessary to do the ‘pooling’ you speak of. The estimator, and the model, might just do the right thing (think, for example, of a hierarchical model, or even GEE). For example, if the contrast were defined as the absolute difference rather than the ratio (a difference of differences), it should be possible to fit a formal model; a sketch of such a model is given just before the code below. The one thing missing from the calculations (and even the bootstrapping) is a model: a model where the parameter of interest, the estimand, is front and centre, and visible.
- I don’t think they had 103 observations per treatment per participant. Although there was no prior data on the relative size of nocebo and statin effects, the authors could’ve considered prior data on statin effects.
JH: Indeed, it was very generic. It also lacked a model for the structure of the observations. Without a model, it is hard to think of precision, sampling variation, etc. This lack of a full data-analysis model is something JH rants about regularly. When teaching in a protocol (study design) course decades ago, he was always asked to teach about sample size / precision / power right up front, but refused to. He explained that how one analyzes the data determines the precision. For example, if one has a paired design but uses a 2-independent-samples t-test, one won’t have the precision one should have. Precision depends in part on the model used.
Likewise, even if we wait until the data are in (so no need to do power calculations: just look at the observed precision), JH is not that comfortable with the absence of a model that says what are the (systematic and random) components of each \(y_{ijk}\), the SI score reported by patient \(i\) in condition \(j\) (content of bottle), in month \(k\).
- Nocebo ratio: the fraction of symptoms of statins that was caused by the inert (placebo) pills.
- Their first approach produced unsatisfactory results because the mean symptom scores of the patients when not on a pill were similar enough to their mean symptom scores when on statins. Hence, sometimes, the denominator was close to zero, making the estimand unstable.
- The broader statistical methods message from this example is to
stabilize the estimand; do not keep a difference that can potentially be zero (or close to it) in the denominator.
JH: re TERMINOLOGY: The estimand is a parameter (the value you would obtain with an infinite sample size) so it is (unknown and) not under your control. An estimator (here we saw 2, the initial one, and the revised one) is a method or recipe (or formula) for converting the observed data (in the sample of 60 here) into a numerical estimate of the parameter.
The sample size calculation should be done based on the difference that would make a difference, and not on prior data.
JH: In this case it is a question of estimation, rather than testing, so it is more about precision than power.
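Following up on the earlier comment about a formal model: a minimal sketch of one such model (a linear mixed model on the difference scale, with a patient-level random intercept; it assumes the lme4 package is available, and it is illustrative only, not the analysis the authors used):
# Sketch only: a mixed model for the symptom scores, on the difference scale
library(lme4)                         # assumed available
ds = read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
ds$BottleContent = relevel(factor(ds$BottleContent), ref = "0")   # '0' = no pill
fit = lmer(MeanSymbolScore ~ BottleContent + (1 | PatientNo), data = ds)
round(summary(fit)$coefficients, 2)
# The 'Placebo' and 'Statin' coefficients estimate the placebo-minus-nothing and
# statin-minus-nothing mean differences; their ratio is one model-based analogue
# of the nocebo ratio, with the estimand explicit in the model.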
ds=read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
MEANS = by(ds$MeanSymbolScore,ds$BottleContent,mean)
str(MEANS)
## 'by' num [1:3(1d)] 7.96 15.69 16.79
## - attr(*, "dimnames")=List of 1
## ..$ ds$BottleContent: chr [1:3] "0" "Placebo" "Statin"
## - attr(*, "call")= language by.default(data = ds$MeanSymbolScore, INDICES = ds$BottleContent, FUN = mean)
RATIO = (MEANS[2]-MEANS[1])/(MEANS[3]-MEANS[1])
# Jack-knife
n = length(unique(ds$PatientNo))
ratio=rep(NA,n)
for(pt.id in 1:n){
leave.in = (ds$PatientNo != pt.id)
means = by(ds$MeanSymbolScore[leave.in],
ds$BottleContent[leave.in],mean)
ratio[pt.id] = (means[2]-means[1])/(means[3]-means[1])
}
( Var = ( (n-1)/n) * sum( (ratio - RATIO)^2 ) )
## [1] 0.01678853
( SE = sqrt(Var) )
## [1] 0.1295706
round(RATIO + qnorm(c(0.025,0.5,0.975))*SE,2)
## [1] 0.62 0.88 1.13
hist(ratio - RATIO)
Figure. Jackknife CI, with Jackknifing at the patient level
ds=read.csv("https://jhanley.biostat.mcgill.ca/bios691/StatinOrPlacebo.csv")
# bootstrap
measurements.per.pt = as.numeric(table(ds$PatientNo))
n.samples = 10000
bs.ratio = rep(NA,n.samples)
for(sample in 1:n.samples) {
weights = rep(0,n)
id.s = sample(1:n,n,replace=TRUE)
freq = table(id.s)
which.ones = as.numeric( dimnames(freq)$id.s )
weights[which.ones] = as.numeric(freq)
ds$weights = rep(weights,measurements.per.pt)
sums = by(ds$weights*ds$MeanSymbolScore,
ds$BottleContent,sum)
n.summed = by(ds$weights,
ds$BottleContent,sum)
means = sums/n.summed
bs.ratio[sample] = (means[2]-means[1])/(means[3]-means[1])
}
hist(bs.ratio,breaks=40)
Figure. Bootstrap CI, with bootstrapping at the patient level
#hist(log(bs.ratio),breaks=40)
summary(bs.ratio)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4711 0.7939 0.8753 0.8814 0.9623 1.5518
( SE = sd(bs.ratio) )
## [1] 0.1290422
round(quantile(bs.ratio,probs = c(0.025,.50, 0.975)),2)
## 2.5% 50% 97.5%
## 0.65 0.88 1.15
If you are able to locate the methods by which these country-specific projections were made, summarize them, and highlight the key assumptions.
If you are not, provide your own commentary on how plausible the projections are.
Comment, in particular, on the 3 ‘bridges’ based on the data from France.
- We were not able to find the data
JH: SEE BELOW
- The projections don’t seem very plausible; even with medical advances that we’d expect in the future, an average life expectancy of 105 is very old. As well, rates of obesity are increasing worldwide and we would expect this to have a detrimental impact on life expectancy.
JH: INDEED. Life expectancy (LE) at birth (LE[0]) is stalling in some nations, such as the USA. The big jumps ahead were as a result of public health advances that followed the adoption of the germ theory, sanitation improvements, refrigeration, mass vaccinations, antibiotics, improvements in standards of living, education, etc. One wonders if there were some numerical artifacts, especially with unstable death rates from countries with smaller populations.
- We were not clear what this animation was demonstrating.
JH: The animations shown in class should now have helped explain these. They show two cohort lifetables, but mainly ‘current’ or ‘period’ lifetables. The contrast between the cohort and current lifetables for France is notable: nobody could have foreseen whether (and when) WW I would begin, or that there would be an influenza pandemic in 1918-19. The big reductions in INFANT mortality are the biggest reason for the big jump in LE(0) – see Canada.
- Drops in life expectancy may not be a useful metric because although Covid is contributing to greater mortality this year, the impacts of Covid on mortality will (hopefully) just last for a few years. The life expectancy of a child born today based on current mortality rates that take into account excess death due to Covid may not be accurate because these several years of Covid will ideally just be an outlier, and Covid-attributable mortality in pediatric populations is extremely rare.
JH: it is indeed wrong to interpret a LE(0) calculated from today’s mortality rates as a sure guide to what will happen to new generations. LEs from a current lifetable are fictional/hypothetical: they are projections, and as such, have built-in assumptions. And, as Niels Bohr said, predictions are difficult, especially about the future. It is best to think of a LE[0] from a table made with current mortality data as JUST THAT: A 1-NUMBER SUMMARY OF THESE VERY MORTALITY RATES.
- I couldn’t find how they got to 105 years either, but from this site I did find the country-specific LE data under “Life Expectancy at Birth”: it gives life expectancy at birth in Ireland as 82.81, I think, further supporting the point that the estimate of 105 is nonsensical. Assumptions aren’t stated – presumably some assumptions about countries’ conditions inflate Ireland’s life expectancy.
JH: the projections were changed (revised DOWNWARDS) once the questionable projections were brought to the attention of UNICEF.
- Given that life expectancy at birth in 2019 was 78.7 for Europe and North America, and that a consistent growth rate would put it at 83.2 by 2050, the sharp increase to 105 in 2021 for Ireland seems implausible.
JH: indeed.
- The 3 ‘bridges’ based on the data from France:
Bridge 1: As expected, over time, bridge became thick at the beginning, slowly decreased, and thinned out towards the end after 80.
Bridge 2: Similar to bridge 1, though there was a longer plateau at the beginning, and the decrease began at 20 years. In general, there were longer periods of sustained value (less decrease from 24-45). Also fewer deaths right after birth, but many more around the late teenage years, possibly heavily influenced by the Depression and the world wars.
Bridge 3: Thick at the beginning and a very stable decrease until about 70 years. A larger increase in average death after 60 years. Presumably likely to die at war, leading to higher survival through early adulthood.
JH: Were you able to see, in the real (cohort) version, the SUDDEN DROP in the size of the living male cohort, starting with the YEAR 1914 and lasting to the year 1918/19? France lost more young men than any other country. Remember that there are two parallel timescales in a cohort lifetable: if the cohort was born in 1895, then age 19 is calendar year 1895 + 19 = 1914. You don’t have the same ‘real-ness’ when you calculate a lifetable from current data. The mortality rate for age 19 is based on survivors from those born 19 years earlier (be it in France or elsewhere). The mortality rate for age 29 is based on survivors from those born 29 years earlier. There is no ‘real’ going forward, the way there is in a cohort table. It is as if one took a still photo in the year in question. The table in Armitage’s book, contrasting the life table based on 1930-32 death rates with the actual progress of the real 1931 cohort, is a clever way to show the difference between the two.
- The use of Covid-19 mortalities towards life expectancies seems to serve as a useful metric since it shows an important dimension of the impact of the pandemic relative to the effects of other causes of death. However I don’t like to use them personally because I find their assumptions (i.e. that babies will experience the same age-specific mortality as present-day) make the measures themselves less interpretable. I’m more partial to simpler metrics like excess mortality or PYLL.
JH: Agree 110%. For those new to this terminology, PYLL means Potential Years of Life Lost.
JH: Impressive groundwork, ‘investigative journalism’ and new analyses. It includes technical details. The google doc also includes screenshots from the Bridges of Life based on current (1895) and cohort (1895-2004) data, where the dramatic drop in the size of the actual 1895 cohort during WW I is very obvious.
JH: Speaking of ‘visuals’ of the destruction of a ‘cohort’, here is a link to what many consider both the most effective and most poignant graph in history. See Wikipedia, the excellent data-visualization site by Michael Friendly. See also the ‘engineering’ side of Minard’s work that JH used in a class on communication for Concordia engineering students.
========
Here, via the McGill Library, is Winer’s 1962 book; here is the chapter ‘Single-factor experiments having repeated measures on the same elements’
The designs in this chapter may be said to involve correlated or dependent observations based on the following reasoning: in experimental work in the behavioral sciences the elements forming the statistical population are frequently people. Because of large differences in experience and background, the responses of people to the same experimental treatment may show relatively large variability. In many cases, much of this variability is due to differences between people existing prior to the experiment. If this latter source of variability can be separated from treatment effects and experimental error, then the sensitivity of the experiment may be increased. If this source of variability cannot be estimated, it remains part of the uncontrolled sources of variability and is thus automatically part of the experimental error. One of the primary purposes of experiments in which the same subject is observed under each of the treatments is to provide a control on differences between subjects. In this type of experiment, treatment effects for subject ‘i’ are measured relative to the average response made by subject ‘i’ on all treatments. In this sense each subject serves as his own control–responses of individual subjects to the treatments are measured in terms of deviations about a point which measures the average responsiveness of that individual subject. Hence variability due to differences in the average responsiveness of the subjects is eliminated from the experimental error (if an additive model is appropriate). Experiments in which the same elements are used under all the ‘k’ treatments require ‘k’ observations on each element. Hence the term repeated measurements to describe this kind of design. To the extent that unique characteristics of the individual elements remain constant under the different treatments, pairs of observations on the same elements will tend to be positively correlated. More generally, the observations will be dependent rather than independent. If the population distributions involved are multivariate normal, the terms dependent and correlated are synonymous, analogously. The terms independent and uncorrelated are synonymous in this context. Since the models that will be used are assumed to have underlying multivariate normal distributions, correlated measurements imply statistically dependent measurements. (PsycINFO Database Record (c) 2019 APA, all rights reserved)
and here is the pdf of the chapter
By simulation
Here are 1000 independent estimates of the mean (and of the SD) of a distribution where the true mean is 100 and the SD is 13. In each instance, the sample size is n=25 (as in the code below).
n.sims=1000
mu=100
sigma=13
n=25
estimate = function(i){   # i = the simulation index; it is not used inside the function
y=rnorm(n,mean=mu,sd=sigma)
return( c( mean(y), sd(y) ) )
}
estimates = t(matrix( unlist(lapply(1:n.sims,estimate)) ,
nrow = 2, ncol = n.sims ))
head(estimates)
## [,1] [,2]
## [1,] 97.02657 13.882639
## [2,] 106.03737 12.754475
## [3,] 98.96704 14.415149
## [4,] 98.45669 9.679836
## [5,] 95.32901 13.399238
## [6,] 98.54197 14.073843
apply(estimates,2,summary)
## [,1] [,2]
## Min. 92.32055 7.101628
## 1st Qu. 98.09844 11.577759
## Median 99.91087 12.776865
## Mean 99.86511 12.838734
## 3rd Qu. 101.53351 14.151189
## Max. 108.59311 18.597224
round(apply(estimates,2,sd),3)
## [1] 2.579 1.912
( CV = round(100* apply(estimates,2,sd) / c(mu,sigma),3) )
## [1] 2.579 14.705
By theory (the formula for SE(s) requires \(y \sim Normal\)):
\(SE(\bar{y}) = \sigma/\sqrt{n} = 13/\sqrt{25} = 13/5 = 2.6\)
\(SE(s) \approx \sigma/\sqrt{2(n-1)} = 13/\sqrt{48} = 13/6.928 = 1.876\)
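The same two theoretical values can be computed directly in R, for comparison with the simulation SDs of 2.579 and 1.912 above:
n = 25 ; sigma = 13
sigma/sqrt(n)          # theoretical SE of the sample mean: 13/5 = 2.6
sigma/sqrt(2*(n-1))    # approximate SE of the sample SD: 13/6.928 = 1.876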
Background. Mild perioperative hypothermia, which is common during major surgery, may promote surgical-wound infection by triggering thermoregulatory vasoconstriction, which decreases subcutaneous oxygen tension. Reduced levels of oxygen in tissue impair oxidative killing by neutrophils and decrease the strength of the healing wound by reducing the deposition of collagen. Hypothermia also directly impairs immune function. We tested the hypothesis that hypothermia both increases susceptibility to surgical-wound infection and lengthens hospitalization.
Methods. Two hundred patients undergoing colorectal surgery were randomly assigned to routine intraoperative thermal care (the hypothermia group) or additional warming (the normothermia group). The patients’ anesthetic care was standardized, and they were all given cefamandole and metronidazole. In a double-blind protocol, their wounds were evaluated daily until discharge from the hospital and in the clinic after two weeks; wounds containing culture-positive pus were considered infected. The patients’ surgeons remained unaware of the patients’ group assignments.
Results. The mean (SD) final intraoperative core temperature was 34.7(0.6) C in the hypothermia group and 36.6(0.5) C in the normothermia group (P<0.001). Surgical-wound infections were found in 18 of 96 patients assigned to hypothermia (19 percent) but in only 6 of 104 patients assigned to normothermia (6 percent, P=0.009). The sutures were removed one day later in the patients assigned to hypothermia than in those assigned to normothermia (P=0.002), and the duration of hospitalization was prolonged by 2.6 days (approximately 20 percent) in the hypothermia group (P=0.01).
Conclusions. Hypothermia itself may delay healing and predispose patients to wound infections. Maintaining normothermia intraoperatively is likely to decrease the incidence of infectious complications in patients undergoing colorectal resection and to shorten their hospitalizations. (N Engl J Med 1996;334:1209-15.)
–
The number of patients required for this trial was estimated on the basis of a preliminary study in which 80 patients undergoing elective colon surgery were randomly assigned to hypothermia (mean [SD] temperature, 34.4[0.4]C) or normothermia (involving warming with forced air and fluid to a mean temperature of 37[0.3]C). The number of wound infections (as defined by the presence of pus and a positive culture) was evaluated by an observer unaware of the patients’ temperatures and group assignments. Nine infections occurred in the 38 patients assigned to hypothermia, but there were only four in the 42 patients assigned to normothermia (P=0.16). Using the observed difference in the incidence of infection, we determined that an enrollment of 400 patients would provide a 90 percent chance of identifying a difference with an alpha value of 0.01. We therefore planned to study a maximum of 400 patients, with the results to be evaluated after 200 and 300 patients had been studied. The prospective criterion for ending the study early was a difference in the incidence of surgical-wound infection between the two groups with a P value of less than 0.01. To compensate for the two initial analyses, a P value of 0.03 would be required when the study of 400 patients was completed. The combined risk of a type I error was thus less than 5 percent.
Comment (JH)
To JH, the effect size in a power calculation should be the minimal effect that WOULD ‘MAKE A DIFFERENCE’ IF IT WERE TRUE. It is a difference one would not wish to ‘miss.’ By ‘miss’ we merely mean that the study would not produce a ‘positive’ result, i.e., that the observed difference would not be significantly different from the ‘null.’
Thus, the effect size (the difference that would make a difference) is a judgement, based on cost and other practical matters. In the example above, the observed difference was statistically different from zero, but the blanket costs a considerable amount, and is not re-usable.
The other example JH uses to illustrate what is involved is the choice between the riskier but earlier-news chorionic villous sampling versus the less risky but later-news amniocentesis.
Just about all medications involve a tradeoff between benefits and risks, and these come to the fore in the number needed to treat metric.
For several examples of such studies, see these 607 notes, used in the teaching of the one-sample t-test.
and a few here
This evaluation of three gastroenterological studies makes some important points.
Do starch blockers really block calorie absorption?
See here for a summary of the 5-subject study and page 6 of this document for a longer version. The full version is here and it includes a response by another investigator
We also recently completed a study in which a starch blocker was proved to be ineffective. In a double-blind crossover trial in six healthy male subjects, we were unable to demonstrate any effect on three markers of starch digestion — namely, serum glucose, insulin, and breath hydrogen.9 The two studies in human beings are complementary. The work of Bo-Linn et al. indicates that the bottom line is that α-amylase inhibitors do not result in excess calorie loss in the stool, and ours indicates that this failure is due primarily to a lack of inhibitory effect on starch hydrolysis in the small intestine.
Do infant formula samples reduce the duration of breastfeeding?
Short Version and Full Version
The first author urges younger investigators not to do what he did to calculate a sample size. He considered a difference of 10 percentage points as clinically significant, enough to call out the infant formula companies if it were true. But to arrive at a sample size, he simply calculated the n which would make an observed difference of 10 percentage points statistically significant. He did not realize that if the TRUE difference were 10 percentage points, there was a 50% chance that the observed difference would exceed 10 percentage points (so the study would have ‘established’ that samples reduce – by SOME amount – the rate of breastfeeding), AND a 50% chance that the observed difference would NOT exceed 10 percentage points (so the study would have been ‘negative’). In other words, his study had 50% power to show a ‘(statistically) significant’ reduction.
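A minimal sketch of this 50%-power trap, using hypothetical proportions (0.50 vs 0.40, i.e. a 10-point difference; the actual breastfeeding rates in the trial may have been different): first find the per-group n that makes an OBSERVED 10-point difference just ‘significant’, then ask what power that n gives if the TRUE difference really is 10 points.
p1 = 0.50 ; p2 = 0.40 ; delta = p1 - p2     # hypothetical proportions still breastfeeding
SE = function(n) sqrt(p1*(1-p1)/n + p2*(1-p2)/n)
# per-group n at which an observed 10-point difference is just significant (2-sided 0.05)
( n = ceiling( uniroot(function(n) delta - 1.96*SE(n), c(2,10000))$root ) )
# power of that n if the TRUE difference is 10 points: roughly 50%
power.prop.test(n = n, p1 = p1, p2 = p2, sig.level = 0.05)$power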
Bias and precision: see the diagram on page 1 of these notes. The concept is the same for studies as it is for individual measurements.
Bias Reduction Statistical Methods for Comparative Studies: Techniques for Bias Reduction
‘Non-confounding covariates’: Fairer and SHARPER: Hanley 1983
Different terms, same concept (eg: interaction and effect modification) Miettinen 1985, glossary and Miettinen 2011, Epidemiological Research: Terms and Concepts.
Factorial designs Encyclopedia of Biostatistics
Confounding, historically and today: See ‘The history of confounding’ by Vandenbroucke. See also Fisher, R.A. (1926), The arrangement of field experiments (https://link.springer.com/chapter/10.1007/978-1-4612-4380-9_8) and Kish, L. (1959), Some statistical problems in research design, American Sociological Review, 26, 328–38. See also Wikipedia
Confounding and Effect Modification are different concepts. See JH’s Notes on Confounding and on Effect-Modification. See also sections 5 and 6 in ‘Resources / Materials’ in the same webpage. In section 6, there is a helpful graphic from Miettinen showing that a covariate can be a modifier, a confounder, or both
Regression to the mean [Originally, reversion, and to mediocrity] (https://en.wikipedia.org/wiki/Regression_toward_the_mean) and how to avoid it. Original use of the term – with mediocrity instead of mean. Galton’s 1886 Figure and full article. From Miettinen 2011 III – 4, Terms and Concepts of Methods of Study: the tendency of a repeat observation (with chance error) to be closer to the mean. For an application to measurements of serum cholesterol, see Irwig, JAMA 1991
Comparative studies: experimental studies and non-experimental, i.e., observation ONLY, studies JH’s favourites in this list are Cox’s Design of Experiments and Campbell and Stanley’s Experimental and Quasi-Experimental Designs for Research
Match first (on important known factors); then randomize (for unknown factors). Darwin: The effects of cross and self fertilisation in the vegetable kingdom
Confidence intervals <—-> p values via Standard Errors (SEs). One can use qnorm(p.value/2) to recover the z value (in absolute value) from a 2-sided p-value. Knowing that the point estimate is z SEs away from the null, one can back-calculate the SE. From this one can use point.estimate +/- 1.96 SEs for a 95% CI. This was the basis for Miettinen’s clever test-based CI to accompany a Mantel-Haenszel summary ratio, before RGB (Rothman, Greenland, Breslow) came up with a fancier one, and Clayton a simpler one.
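A minimal sketch of such a test-based CI, with made-up numbers (a rate-ratio point estimate of 2.0 and a 2-sided p-value of 0.03; the calculation is done on the log scale):
RR.hat = 2.0 ; p.value = 0.03               # hypothetical point estimate and 2-sided p-value
z  = abs(qnorm(p.value/2))                  # the estimate sits z SEs from the null (log scale)
SE = log(RR.hat)/z                          # back-calculated SE of log(RR)
round( exp( log(RR.hat) + c(-1.96,0,1.96)*SE ), 2 )   # test-based 95% CI (and point estimate)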
p-values not helpful for Table 1. Ask instead which column of patients (those in column 1 or those in column 2) you would prefer to have on your shift.
2-sided vs 1-sided hypotheses/tests. See the [History and References sections of](https://en.wikipedia.org/wiki/One-_and_two-tailed_tests)
Effect Modification
Bias Reduction and More Precision (making comparisons fairer & sharper – see Hanley 1983 article)
Standard Error and Standard Deviation
Reporting Magnitudes of Differences / Scale
Coding of a binary (all or none) variate
Computing
aggregate(fruitfly$Thorax,by = list(type=fruitfly$Type),FUN = mean)
or
tapply()
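For example, the tapply equivalent of the aggregate call above (assuming the same fruitfly data frame):
tapply(fruitfly$Thorax, fruitfly$Type, mean)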
Use regression instead of t test. e.g.,
lm(longevity~type_female, data=df.fruitfly)
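A minimal check of the equivalence (assuming the df.fruitfly data frame above, with a binary type_female): the equal-variance 2-sample t test and the simple regression give the same estimate, SE, t statistic and p-value for the group difference.
t.test(longevity ~ type_female, data = df.fruitfly, var.equal = TRUE)
summary(lm(longevity ~ type_female, data = df.fruitfly))$coefficients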
Orthogonal (independent) contrasts Wikipedia
If there are 4 independent samples from 4 conditions A, B, C, D, then 3 independent (uncorrelated, non-overlapping) contrasts are possible.
{B,C,D} vs. A contrasts are correlated because they involve a common A as reference category.
Example of 3 orthogonal contrasts: AB vs CD; A vs B ; C vs D.
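A minimal numerical check of the orthogonality of these 3 contrasts (the coefficients below are one conventional choice, not taken from the class notes):
C.matrix = cbind(ABvsCD = c( 1,  1, -1, -1),
                 AvsB   = c( 1, -1,  0,  0),
                 CvsD   = c( 0,  0,  1, -1))
rownames(C.matrix) = c("A","B","C","D")
crossprod(C.matrix)   # zero off-diagonal entries confirm the 3 contrasts are orthogonal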
The overall F test with degrees of freedom = 4 - 1 = 3 tests the null hypothesis that all 4 means are equal, against the alternative that at least one [unspecified] mean is different from the others. It is blunt, because it penalizes the user for looking in 3 directions simultaneously, a bit like a (say radar) scanner that scans in 3-D from a central point to find ‘signals’. JH prefers sharper, more focused (directed) scans that look at just one (pre-specified) contrast (1-D).
Bonferroni corrections are used to penalize for multiple testing, especially if post hoc.
Difficult to tell incidental findings (needing replication) from ones based on pre-planned analyses
stepwise regression not appropriate in this context.
Effect Size (several types)
E.g., Difference family: Effect sizes based on differences between means
Male:Female. Birthweight: {\(\mu=\) 3.6Kg - \(\mu=\) 3.5Kg}/{\(\sigma =\) 0.5Kg} = 0.1/0.5 = 0.2. Adult height: (175cm - 165cm)/6cm ≈ 1.7.
MaternalSmoking:Not. Birthweight {\(\mu=\) 3.55Kg - \(\mu=\) 3.35Kg}/{\(\sigma =\) 0.5Kg} = 0.2/0.5 = 0.4.
DUMMY Variables INDICATOR Variables
In ‘better families’ we speak of INDICATOR variables, not DUMMY variables. This source reports that The International Statistical Institute’s Dictionary of Statistical Terms objects to the name: the term is ‘’used, rather laxly, to denote an artificial variable expressing qualitative characteristics …. [The] word ’dummy’ should be avoided.’’
see online dictionaries such as the Cambridge Dictionary of Statistics, or
Miettinen’s text:- Epidemiological Research: Terms and Concepts
Indicator variate∗ – A variate with 0 and 1 as its (only) realizations, with realization 1 indicating something particular. (Examples: Y = 1 indicating membership in the case series of person-moments and X1 = 1 indicating index category of the etio-genetic determinant in an etiogenetic study – in the logistic model for the object of study.)
Dummy variate∗ (synonym: indicator variate) – See ‘Indicator variate’ in section II – 2. Note: This term is a misnomer: there is nothing dummy about an indicator variate.
Regression models
For an index category vs. reference category contrast, use an indicator variable for the index category.
Center continuous covariates (or, e.g., relocate them so that they start at the minimum, or at a memorable value, such as 5 feet for human heights) so as to yield an interpretable intercept.
Change scores vs. initial values as a covariate
If, by fitting math2 ~ misled + math1c, we obtain 5.6 - 1.1*misled + 0.45*math1c, this is arithmetically equivalent to the result of fitting math2 - 0.45*math1c ~ misled. Contrast this with fitting math2 - 1.0*math1c ~ misled. Even without any intervention, if math1c has measurement error, we would not expect math2 ~ math1c to give a slope (correlation) of 1 (this mismeasurement produces regression to the mean). Because math2 and math1 are on the same scale, the slope in the regression of math2 on math1 is basically its correlation with math1. A slope further (down) from 1 means less reliable measurements.
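A minimal simulation (made-up data, not the class dataset) of that last point: with measurement error in the pre-test, the slope of the post-test on the pre-test falls below 1 even when nothing has intervened.
set.seed(1)
n     = 500
true  = rnorm(n, mean = 60, sd = 10)   # each student's 'true' math level
math1 = true + rnorm(n, sd = 6)        # pre-test  = truth + measurement error
math2 = true + rnorm(n, sd = 6)        # post-test = truth + measurement error
coef(lm(math2 ~ math1))                # slope near 10^2/(10^2 + 6^2) = 0.74, not 1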
A relevant (medical) example of the artifacts created by imperfect measurements is the re-reading of prostate cancer biopsies. It is a nice bit of statistical teaching – by a pathologist who is better known for the [Gleason] score that bears his name. He reproduces a diagram from his 1951 college text in biometric analysis.
Combining Ratios
DON’T, especially if they are based on small (unstable) denominators; better to sum them all up, as Mantel did in his odds ratio formula (in the 1959 paper with Haenszel) that summarizes the relationships in the individual 2x2 tables with frequencies \(a, b, c, d\), namely \(or_{MH} = \frac{\Sigma ad/n}{\Sigma bc/n}\). Think of Miettinen’s example of estimating the sex ratio in small daycare centers. Better to use M/F = Sum(m)/Sum(f) than the mean or median of the individual (m/f) ratios. See Ratio Estimators in sampling theory.
If we calculate sum(math2)/sum(math1) as a single ratio, we can use the jackknife or bootstrap (or delta) method to obtain an SE or CI.
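A minimal sketch (with made-up math1 and math2 vectors, resampling students): bootstrap the single ratio sum(math2)/sum(math1) to get an SE and a percentile CI.
set.seed(1)
math1 = rnorm(50, 60, 10) ; math2 = rnorm(50, 65, 10)   # made-up scores for 50 students
B = 5000
boot.ratio = replicate(B, { i = sample(seq_along(math1), replace = TRUE)
                            sum(math2[i]) / sum(math1[i]) })
sd(boot.ratio)                          # bootstrap SE of the ratio
quantile(boot.ratio, c(0.025, 0.975))   # percentile 95% CI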
Bootstrap dictionary
A data-based way to estimate the sampling distribution of a statistic, and to form a CI. Conceptually: assuming your sample (of size n) is ‘rich’ enough to represent the population [like Noah’s ark], clone the sample an infinite number of times. Then take many (say B) samples of size n, calculating the statistic of interest on each new sample, and use the B ‘bootstrap’ statistics as the estimated sampling distribution. In the simplest case, use the standard deviation of this estimated sampling distribution as the standard error for the observed statistic. Use it, or the quantiles directly, to form a confidence interval for the parameter that the statistic is an estimate of. In practice, the bootstrap doesn’t clone or photocopy the original sample. Instead, it merely samples from it WITH REPLACEMENT.
For a very readable explanation, see Computer-Intensive Methods in Statistics by Diaconis and Efron in Scientific American, 1983.
Graphs
R-squared
Women and Math
It would have been nice if the authors could have used both matching and randomization, as it would have meant less (mathematical) adjustment after the fact. It was probably logistically difficult to do, but any proxy measure they could have matched on would have helped keep the groups more balanced.
How predictable was it that an unconstrained randomization would produce math1 means as discrepant as (or more discrepant than) the ones observed? The p-value for a 1-way ANOVA on math1 (see code) provides the answer. That p-value is based on the variance of the 4 observed means. The simulation code also shows how to evaluate the probability [quite large!] of getting 4 means with a range as large as, or larger than, the observed range.
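A minimal simulation sketch of that second calculation, with made-up numbers (4 groups of 25, pre-test SD of 10, and a hypothetical ‘observed’ range of 5 points; the class code uses the real data):
set.seed(1)
math1 = rnorm(100, mean = 60, sd = 10)        # made-up pre-test scores for 100 students
observed.range = 5                            # hypothetical observed range of the 4 group means
sim.range = replicate(10000, {
  group = sample(rep(1:4, each = 25))         # one unconstrained randomization into 4 groups
  diff(range(tapply(math1, group, mean)))     # range of the 4 group means
})
mean(sim.range >= observed.range)             # how often chance alone is as (or more) extreme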
Of course, even if they had been able to achieve balance, it would have been important to include math1 in the model so as to reduce the noise in math2 – just like in the case of thorax in the fruitfly study.
The R code we ran in class, updated to include post/pre ratios.
Department’s 6 Public lecture Series in Epidemiology, 2014
Prelude to Oscars competition
Pardoe’s website, 2008 JRSS article, and ‘starter’ dataset (the dataset has been added to each year in course 624)
Some history of S curves: probit and logit for population growth and toxicology, and for the age-specific probabilities of having reached menarche. And, beginning with early work on the Framingham Heart Study, the importance of logistic regression in epidemiology.
The inverse-logit function \(\frac{e^{\beta x}}{1+e^{\beta x}}\) and the inverse-probit function \(\Phi[x]\) stay within the 0 to 1 probability bounds.
Why choose the logit vs. probit link for the binomial distribution? Both are used. The social sciences (and the toxicologists) like the probit, possibly because of the ‘hidden’ (or not hidden in the case of age at menarche) Gaussian distribution that is thought to underlie it. Epidemiology and Biostatistics use the logit, for substantive reasons. The 1970 book Analysis of Binary Data, by DR Cox did a lot to promote it, and the earlier use of odds ratios for case-control studies did as well. In the 1972 article by Nelder and Wedderburn, the logit is shown to be the ‘canonical’ link.
log vs. logit link? Although the logit is the canonical link, the odds and the odds ratio are not easy to communicate. A ratio of risks [via a log link] is easier and more transparent than a ratio of odds [via a logit link], and a ratio of odds can be deceptive. However, it may happen that the fitted risks/probabilities in a log-link model go outside of the (0,1) range.
(log or logit) vs. identity link? The identity link allows you to model proportions (and thus differences in proportions) directly. This absolute difference is preferred by public health people, and is used directly in cost-effectiveness calculations, such as the number needed to treat to prevent one case (it is simply the reciprocal of the risk difference). However, just as with the log link, it may happen that the fitted risks/probabilities in an identity-link model go outside of the (0,1) range.
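A minimal glm sketch of the three scales just discussed, using made-up grouped binomial counts (not the class ‘mystery’ dataset). An identity-link binomial fit would give the risk difference directly but needs care to keep fitted risks inside (0,1), so here the risk difference is taken from the raw proportions; the log-link fit can also need starting values in less well-behaved data.
d = data.frame(x = c(0, 1), deaths = c(30, 45), n = c(200, 200))   # hypothetical counts
fit.logit = glm(cbind(deaths, n - deaths) ~ x, family = binomial(link = "logit"), data = d)
fit.log   = glm(cbind(deaths, n - deaths) ~ x, family = binomial(link = "log"),   data = d)
exp(coef(fit.logit)["x"])    # odds ratio  (logit link)
exp(coef(fit.log)["x"])      # risk ratio  (log link)
diff(d$deaths / d$n)         # risk difference (identity scale)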
(The statistical geneticist) Woolf’s (1955) use of Comparative Rates when Estimating the Relation Between Blood Group and Disease, and his very modern conception of what today is still misleadingly called a ‘case-control’ study. ‘This kind of artefact is avoided if one works with incidence rates in the various blood groups. The data usually do not permit calculation of absolute rates, nor are they needed. What is wanted and readily obtained is an estimate of the ratio of one rate to another. ’The incidence in (exposure group) group \(\alpha\) will be h/H x some constant, and that in group \(\beta\) will be k/K x the same constant. An estimate of the (rate) ratio will be hK/Hk.’
Notice that Woolf never uses the term odds ratio or compares (the exposure distribution of the) ‘cases’ with (the exposure distribution of the) ‘controls’. He uses a numerator series [aka a case series] that splits into h and k, and a denominator series, which splits into H and K and represents the base, to proceed directly to the natural comparison of peptic ulcer rates in the two blood types. Sadly, today we still haven’t abandoned the case-control terminology in favour of one that conveys the idea that the two (quasi) rates used in the rate ratio are calculated from partial (sampled) denominators.
If the ratio were based on known denominators (which are roughly proportional to H and K times some blow-up factor), then the variance of the log of the ratio would be merely 1/h + 1/k. But, given that the relative sizes of the two denominators (or the ratio H/K) are estimated from the denominator series (which is merely a fair sample of the base), the additional cost, in variance terms, is 1/H + 1/K. This 1/h + 1/k + 1/H + 1/K (or, today, 1/a + 1/b + 1/c + 1/d) formula is referred to as Woolf’s formula. The illustrations in class bear out Samy Suissa’s contention that it may well be the most important formula in planning epidemiological research. An important goal is to minimize the variance, given the budget constraints.
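A minimal sketch of Woolf’s variance formula in use (the 2x2 frequencies below are made up purely for illustration):
a = 911 ; b = 4578 ; c = 579 ; d = 4219      # hypothetical 2x2 frequencies
or     = (a*d) / (b*c)                       # cross-product (odds / quasi-rate) ratio
SE.log = sqrt(1/a + 1/b + 1/c + 1/d)         # Woolf's formula for the SE of log(or)
round( exp( log(or) + c(-1.96, 0, 1.96)*SE.log ), 2 )   # point estimate and 95% CI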
Before logistic regression there was Discriminant Analysis. The goal is to select a linear combination of features that maximally separates those who are in one group from those in another [i.e., different classes of the response variable], or, in epidemiology, those who have experienced the event of interest (Y=1) from those who have not (Y=0).
In 1962, Cornfield reverse-engineered discriminant analysis to produce a logistic S curve that gave the probability of Y=1 as a function of the X values, and the 1967 Biometrika article showed how to directly estimate the regression coefficients – using iteratively re-weighted least squares to arrive at Maximum Likelihood estimates. This was a forerunner of the 1972 breakthrough by Nelder and Wedderburn on Generalized Linear Models, which used the same idea to fit a much wider range of models, using just two ‘toggles’: one for the link function, which specifies which function of \(\mu_{Y|X}\) (e.g., identity, log, complementary log-log, etc.) is to be linked to the linear predictor, i.e., to the linear combination of the X’s, and the other for the ‘error’ distribution [Normal, Bernoulli/Binomial, Gamma, Poisson, etc.] that describes the variation of \(Y|X\) about \(\mu_{Y|X}.\)
Conditional Logistic Regression. For the parallel history of conditional logistic regression in economics/marketing, see the 2003 review article by Norman Breslow. It begins as follows: ‘Econometricians Daniel McFadden and James Heckman won the 2000 Nobel Prize in economics for their work on discrete choice models and selection bias. Statisticians and epidemiologists have made similar contributions to medicine with their work on case-control studies, analysis of incomplete data, and causal inference.’ Further down, it explains that ‘McFadden’s method for discrete choice analysis, developed to assist a graduate student to study freeway routing choices made by the California Department of Transportation, had its basis in formal theories of economic behavior. Under Marschak’s random utility model, observed economic choices—such as whether to ride the bus or drive the car to work—result from multinomial sampling (Becker, DeGroot, and Marschak, 1963). Psychologist Duncan Luce (1959) added the rather strong condition that the ratio of odds for any two choices was independent of the set of other available choices. These assumptions led McFadden (1973) to his ‘conditional logit model’ \[Pr(c \ | \ x, \ C) = \frac{e^{v(c,x)}}{\Sigma_{C} \ e^{v(*,x)} },\] where \(c\) is the choice selected, \(C\) is the set of available choices, \(x\) measures attributes of the individual decision maker, and * runs over the elements/members of \(C\). [Notation altered by JH to be closer to conventional notation]. Manski and Lerman (1977) proposed a weighted likelihood method with origins in sampling theory. Manski and McFadden (1981) developed a pseudo likelihood procedure which they referred to as conditional maximum likelihood or CML. Cosslett (1981) studied semiparametric efficient ‘maximum likelihood’ estimation. For more, see McFadden’s website and the cv link in it.
The naturalness and efficiency of “outcome based sampling”
Parameter estimation in Conditional Logistic Regression is carried out by maximizing the sum of the (multinomial) log-likelihood contributions from the various contests or ‘risksets.’ Thus, each row in the dataset needs to have, in addition to an indicator of whether it was ‘chosen’ and the associated x vector, a variable that identifies which rows (persons or person-moments) are in the same ‘set’ (or contest in the Oscars setting, or ‘riskset’ in fitting the Cox model, or matched case-‘control’ set). In CLR, the experiences of persons in different sets are not compared, only those of persons in the same riskset. Thus the strata specification in clogit in the R survival package. For the Oscars example, the strata variable specifies who is in the same competition in the same year.
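A minimal clogit sketch (simulated contests, not the real Oscars data): each ‘year’ is a contest with 5 nominees and exactly one winner, and strata(year) defines the risksets.
library(survival)
set.seed(1)
oscars = do.call(rbind, lapply(1:40, function(yr) {
  x   = rnorm(5)                                   # a nominee-level covariate
  won = rmultinom(1, 1, prob = exp(0.8 * x))[, 1]  # exactly one winner per contest
  data.frame(year = yr, x = x, won = won)
}))
fit = clogit(won ~ x + strata(year), data = oscars)
summary(fit)$coefficients                          # estimate should be near the true 0.8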
Miscellaneous
A linear model is linear in the \(\beta\)’s: the linear predictor (say \(\beta_1 \times X_1 + \beta_2 \times X_2\)) is a linear combination of betas, with the X’s used as weights. The right hand side of the following Michaelis-Menten relationship \[E[y \ | \ x ] = \frac{\beta_1 \ x}{\beta_2 + x}\] is not (and cannot be made) linear in the \(\beta\)’s.
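A minimal sketch (simulated data) of what that non-linearity means in practice: the Michaelis-Menten model has to be fitted by nonlinear least squares (nls) rather than lm.
set.seed(1)
x = runif(60, 0, 10)
y = (5 * x) / (2 + x) + rnorm(60, sd = 0.3)         # true beta1 = 5, beta2 = 2
nls(y ~ (b1 * x) / (b2 + x), start = list(b1 = 4, b2 = 1))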
The estimate of the logit of a probability is most stable when the probability in question is near 0.5, whereas the estimate of a probability (proportion) is most stable when the probability in question is furthest from 0.5. This stems from the fact that the (sampling) variance of a binomial proportion is \(\pi(1-\pi) \times (1/n)\), whereas the (approximate) variance of its logit is \(\frac{1}{\pi(1-\pi)} \times (1/n).\)
‘Catchment area’: where cases come from
Miettinen’s definition of an experiment, from the glossary in his 1985 text: ‘A study in which a determinant is intentionally perturbed for reasons none other than the goals of the study itself.’ And his story about a doctor inviting you to take some medication ‘samples.’ It is an experiment if the doctor wishes to learn from your ‘experience,’ but not if the doctor didn’t ask you to report back to him/her. See also Miettinen’s newer dictionary.
Appropriate ‘controls’ in so-called ‘case-control’ studies. Some epidemiology texts take an overly-restrictive view, insisting that ‘controls’ must be selected from that portion of population time generated by persons who had a non-zero chance to represent an instance (a ‘case’) of the event of interest: thus, just females if the event is limited to females. What is required is a sample that fairly (unbiasedly) represents the distribution of the (putative etiologic) agent/trait in the base. Since the distribution of blood groups is the same in males and females, males could serve as a denominator (‘control’) series if the trait of interest was blood group – even if the event/condition is limited to females.
Also, the denominator (‘control’) series doesn’t have to be distinct from the case series: the two series can be arrived at independently of each other, just as in the hockey-epidemiology example about whether it was better to shoot low on Canadiens goalie Patrick Roy. The 51 goals constitute the case series; a random sample of all shots (no matter whether they resulted in goals or not) serves as a denominator series.
History of ‘Case-control’ studies, including this humourous one
Condo Prices
R markdown document, JH’s first (and thanks to Lamin for the template!). The main purpose was to show how to fit additive-in-floor and multiplicative-in-floor rate models, and to link them to the corresponding models for epidemiological rates (incidence densities) involving amounts of population-time rather than numbers of square feet. However, plots of the various fits suggest that (even apart from floor) area does not suffice, and that we would need to also study the role of the number of rooms – the 1½ or 2½ or 3½ measure used in Montreal, but that is not in the dataset. There is also a suspicious ‘outlier’ condo that is quite influential.
A couple of questions about GLMs [to be addressed next class]
Prizes for Oscars competition
Degrees of Freedom
Lung cancer screening
Model-based Standard Errors
Statistical Analysis of Correlated Data Using Generalized Estimating Equations: An Orientation
Examples of positively correlated [row 3: number who had visited a physician in the previous year] and negatively correlated [row 4: number who were of male gender] responses in a cluster sample of 30 households (data from Cochran, Sampling Techniques, 1953, pp. 124-127) in Table I of GEE Analysis of negatively correlated binary responses: a caution
Logit and Probit fits to data on ‘detectability’ of laced drinks
Physiological model: Gaussian/Logistic distribution of ‘detectability threshold’, by analogy with the ‘just lethal’ dose in toxicology (see also the age at menarche data below)
Puzzling data from smokers. Using log(added amounts) did not help. The poor fits may be because of the small number of smokers (just 11) or because they deliberately sabotaged the study.
See several analyses, including comparisons of logit and probit models (underlying/latent logistic and Gaussian distributions of ‘detectability’) here
Logit and Probit fits to data on age at menarche
See regression fits, fitted probability density curves, and Cumulative Distribution Functions here:
The data structure is identical to that in the laced-drinks example
Links for (binomial) models for Mystery dataset (and MORE GENERALLY)
identity: risk difference (invariant to use of desired/undesired outcome)
log: risk ratio. BUT WHICH? percentages that… survived? died?
logit (invariant to use of desired/undesired outcome)
For a very thoughtful (and data-based) discussion, see chapter II: Fundamental Measures of Disease Occurrence and Association, in volume 1 of this classic Breslow and Day textbook. Admittedly, in so-called case-control studies (case-base studies, or ‘estimated denominators’ studies), absolute differences in incidence density (or in risk) are not usually estimable, and so the preference for ratio measures is understandable. But the chapter has several compelling examples, all of them involving ‘full denominator’ datasets
If you have time, you could investigate which of the 3 links allows for the smallest model to summarize the survival (or mortality) patterns in the mystery dataset. After all, the purpose is to summarize (or reconstruct) the patterns in the 2 x 2 x 3 + 2 = 14 proportions with as few parameters as possible. Nobody wants to memorize the 14 percentages themselves.
100+ Years of Graphs of the Titanic Data:
The event that drove the water consumption pattern in Edmonton
The striking graph has probably drawn more views than this ‘classic’, depicting the losses during Napoleon’s march to, and retreat from, Moscow
Michael Friendly’s DataVis.ca site
==========
Comments on one (M:F) comparison of the longevity of the Titanic survivors that used a ratio of 2 SMRs – each of which involved the sex-specific Titanic mortality rates versus the general US mortality rates
[Incidentally] From the Encyclopedia Titanica website, which has biographies of every passenger and crew member, one learns that the two earliest deaths (both in 1912) were from meningitis.
Ratios of SMRs: See page 33 in Chapter 2, Important concepts in Epidemiology, by O. Miettinen, in the 2010 book Teaching Epidemiology: A guide for teachers in epidemiology, public health and clinical medicine, available via the McGill Library. See also Breslow and Day, Volume II, Chapter 2, and Modern Epidemiology (Rothman, Greenland, Lash).
‘Male survivors survived on average 15% more than women.’
Discussion of how best to describe mortality ratios, and of the dangers of mixing differences in the time (longevity) scale with ratios in the rate scale.
Contrast this: ‘Former NFL players die at a faster rate than other professional athletes, study finds’ with this: ‘Major league baseball [MLB] players tend to live about 24% longer than the average American man.’
Rough translation between mortality ratio and longevity difference: contact JH for one such ‘translator’.
Language/grammar/punctuation matters (remember the panda who ‘eats, shoots and leaves’).
Figure 1 shows a more detailed picture of the adjusted association between age and the odds of death by logistic regression.
Comments on a M:F comparison that used longevity from birth, rather than remaining longevity from the 1912 sinking onwards
See Premature Death in Jazz Musicians: Fact or Fiction? and the reply by Rothman. See also the longer list of longevity comparisons, some OK, some flawed
See also this 2016 presentation: ‘Immortal Time’ Blunders: history, identification, severity
Who needs the Cox model anyway? (Bendix Carstensen)
Carstensen is the author of the Epi R package that includes the Lexis functions
Here is his latest presentation on why [Poisson regression, based on the numbers of events in each of the Lexis cells, is a more transparent approach to the analysis of rates (hazards) than the Cox model](http://bendixcarstensen.com/WntCma.pdf). It is a followup on his 2005 piece Demography and epidemiology: Practical use of the Lexis diagram in the computer age
Here is a (Stata-based) presentation by another author, entitled Replicate a Cox model using Poisson regression
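A minimal R sketch of the same idea, using simulated data with a constant baseline hazard (so the two approaches should agree closely): split the follow-up into time bands (‘Lexis cells’), then fit a Poisson model for the event counts with log person-time offsets.
library(survival)
set.seed(1)
n = 2000 ; x = rbinom(n, 1, 0.5)
time   = pmin(rexp(n, rate = 0.1 * exp(0.5 * x)), 10)   # true log hazard ratio = 0.5
status = as.numeric(time < 10)                          # administrative censoring at t = 10
coef(coxph(Surv(time, status) ~ x))                     # Cox partial-likelihood estimate
sp = survSplit(Surv(time, status) ~ x, data = data.frame(time, status, x),
               cut = 1:9, episode = "band")             # 1-unit-wide time bands
sp$py = sp$time - sp$tstart                             # person-time in each band
coef(glm(status ~ x + factor(band) + offset(log(py)),
         family = poisson, data = sp))["x"]             # Poisson (piecewise-exponential) estimate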
The Lexis diagram brings up the issue of which time-scale to use in a Cox regression
Cox’s own advice is to match on (i.e., use as the time scale) the time scale that is more difficult to model, and to include the other one as a covariate. Since the force of mortality varies by several orders of magnitude over age, but is a far weaker function of calendar time, Cox would match on age – and put calendar time in as a covariate.
The same would apply to the Framingham study: just because the cohort starts in the year 1948 doesn’t mean that is the natural time scale to be matched on in a Cox regression. Time since 1948 (grant time) is an artificial time scale. Age is the natural one – and, as Gompertz found, the force of mortality is 1.08 to 1.10 times higher at each successive age. (Many of your analyses also found a similar coefficient value for age at entry to the Titanic cohort.)
This article in Demographic Research on Exploring the demographic history of populations with enhanced Lexis surfaces, sent to me by Jay Kaufman, has a very nice example, in Figures 4 and 5, of mortality in various French male birth cohorts. The excess mortality during the periods of the two World Wars stands out.
Inverse Probability Weights – and classical standardization – to reduce confounding in the M:F contrast
In the M:F contrast in the Titanic example, there is no natural ‘index’ category and ‘reference’ category. For the purpose of the following, we will take the males to be the reference category, and the females to be the index category. So we expect a rate ratio lower than 1, and a longevity difference greater than 0.
We will take age at entry (‘binned’ into 7 decades of age) and class as potential confounders.
Whereas standardized rates (or standardized risks) are more common, we begin with differences in mean longevity (or rather in the mean length of remaining life). We also address mortality rates.
In both, we ignore any cell where it’s all-or-none, i.e., all males or all females, so no contrast. [the Mantel-Haenszel summary rate ratio does this by construction]
The analyses are here in this separate document
We begin with the classical way to standardize: we use an agreed-on set of weights to compute a weighted mean of the cell-specific means (or rates) for males, and the SAME set of weights to compute a weighted mean of the cell-specific means (or rates) for females.
We then show that when one uses the inverses of the observed cell-specific proportions of males, and the inverses of the observed cell-specific proportions of females, as INDIVIDUAL weights, one obtains the same difference in means as if we had used the overall cell FREQUENCIES as weights for the cell- and sex-specific means.
We could have arrived at these same Prob[Female | class agebin] ‘propensities’ and their complements Prob[male | class agebin] ‘propensities’ by fitting a saturated logistic regression model. Typically, one would use a less-than-saturated model.
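A minimal simulated check of that equivalence (made-up data, not the Titanic dataset): direct standardization of the sex-specific cell means to the TOTAL cell frequencies gives exactly the inverse-probability-weighted difference in means.
set.seed(1)
n      = 2000
cell   = sample(1:6, n, replace = TRUE)                  # confounder cells (e.g. class x age bin)
female = rbinom(n, 1, plogis(-1 + 0.3 * cell))           # sex composition differs by cell
y      = 50 - 2 * cell + 3 * female + rnorm(n, sd = 5)   # 'longevity' depends on both
w      = table(cell) / n                                 # TOTAL-population weights for the 6 cells
means  = tapply(y, list(cell, female), mean)             # cell- and sex-specific means
std.diff = sum(w * means[, "1"]) - sum(w * means[, "0"]) # classical direct standardization
p.f    = ave(female, cell)                               # observed Prob[female | cell]
ipw    = ifelse(female == 1, 1/p.f, 1/(1 - p.f))         # inverse-probability weights
ipw.diff = weighted.mean(y[female == 1], ipw[female == 1]) -
           weighted.mean(y[female == 0], ipw[female == 0])
c(standardized = std.diff, IPW = ipw.diff)               # identical (up to rounding)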
MATCHING on the ‘PROPENSITY’ to be female, and AGGREGATING the (within propensity-score bins) DIFFERENCES in means, is ANOTHER way to reduce confounding in the M:F contrast
Legendre 1805: Sur la Méthode des moindres carrés [On the Method of Least Squares]
Submitted article on Teaching/learning multiple regression using family data – sent via email.
Galton’s first ‘regression’ line fitted to HUMAN data (he had earlier fitted a ‘regression’ line to data on sweet pea seeds)
See Poster explaining the data, the data-reduction, and the fits. More details, including recently uncovered raw data, on JH’s page on Galton
Galton, Pearson, and the Peas: A Brief History of Linear Regression for Statistics Instructors
Table in the original. Galton was a very clear writer, and an excellent statistician. BUT, please do NOT take Galton’s ‘for each increase of one unit on the part of the parent seed, there is a mean increase of only one third of a unit in the filial seed’ as the way to report a regression slope, especially from non-experimental data. Better to simply say ‘From parent seeds one unit apart in size, the mean filial seeds differed by only one third of a unit.’ Compare the two phrasings:
- For each INCREASE of one unit on the part of the parent seed, there is a mean INCREASE of only one third of a unit in the filial seed.
- From parent seeds ONE UNIT APART in size, the mean filial seeds DIFFERED by only one third of a unit.
Multiple regression as a sequence of simple regressions
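A minimal simulated sketch of this idea: the coefficient of x1 in the multiple regression equals the slope from regressing the ‘x2-adjusted’ (residualized) y on the ‘x2-adjusted’ x1.
set.seed(1)
n  = 200
x2 = rnorm(n) ; x1 = 0.6 * x2 + rnorm(n)
y  = 2 + 1.5 * x1 - 1.0 * x2 + rnorm(n)
coef(lm(y ~ x1 + x2))["x1"]                         # multiple-regression coefficient of x1
coef(lm(resid(lm(y ~ x2)) ~ resid(lm(x1 ~ x2))))[2] # same number, via two simple regressions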
“TERM ENTERED LAST” tests
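In R terms (a minimal simulated example): ‘term entered last’ tests are what drop1() reports, whereas anova() on an lm fit gives sequential tests that depend on the order in which the terms were entered.
set.seed(1)
x1 = rnorm(100) ; x2 = 0.5 * x1 + rnorm(100)
y  = 1 + x1 + 0.5 * x2 + rnorm(100)
fit = lm(y ~ x1 + x2)
anova(fit)               # sequential: x1 first, then x2 given x1
drop1(fit, test = "F")   # each term tested as if it were entered last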
‘Immortal Time’ Blunders
Less consequential longevity comparisons:
MORE CONSEQUENTIAL longevity comparisons:
See, in the 1800s already, the striking teaching examples of William Farr
Updated, February 29, 2020