BIOS601 AGENDA: Agenda for Monday Oct 23 and Wed Oct 25, 2023
[updated Oct 19, 2023]
- Discussion of issues
in C&Hs Chapter 04 (Consecutive Follow-up Intervals)
and JH's
Notes and Assignments on this chapter
Questions to be addressed in class on Monday 23rd
4.01 [ Area under survival curve; link between hazard function & survival curve S(t) ]
4.02 [ Parametric & non-parametric Surv. curves: tumbler data ]
0.04 [ Greenwood's formula for SE of K-M-based S(t) estimate: JH Chapter on proportions ]
4.08 [ Kaplan-Meier vs. Nelson-Aalen Surv. curves ]
4.04 [ JH will address the portion on
Kaplan-Meier Theatre (courtesy Thomas Gerds) ]
Questions to be addressed in class on Wednesday 25th
4.10 [ Kenya RCT of adult circumcision ]
4.11 [ Uganda RCT of adult circumcision ]
4.22 [ The survival time of chocolates on hospital wards ]
5.07 [ Query regarding the longevity of Titanic survivors ]
Remarks on Notes:
These notes were developed to supplement the Clayton and Hills chapter,
which was aimed at epidemiologists, and which does not give the
derivations (the 'wiring' and 'theory') below the results (the user's view of the car).
It is important to read C&H first, before JH's notes.
The core topics in this chapter are non-parametric
(or more precisely,
distribution-free) approaches to estimating survival curves, and the
associated functions (e.g., pdf and hazard function) that can be derived from them.
In the last 2 weeks, in the orientation to ML estimation, several of our examples
involved specific candidate distributions for rv's that take on
values on the (0,Inf) scale, such as exponential, gamma, log-normal etc.
But, probably to your surprise, you will in one of the exercises learn that
whereas the Kaplan-Meier estimator is usually described as a non-parametric estimator,
it can also be shown to be THE survival curve, among
ALL POSSIBLE SURVIVAL CURVES, that makes the observed data most likely; and so it
is sometimes referred to as a non-parametric MLE -- 'NPMLE' -- almost a contradiction
in terms, especially when we emphasize that for ML one needs to specify
a distribution with a full (parametric) form.
C&H develop the K-M estimator very 'naturally' by slicing time
finer and finer, so that most conditional survival probabilities
in the product are unity, and can be omitted, leaving just the
(less than unity) conditional survival probabilities for the time-bands that contain
>= 1 event. Imagine checking on the tumblers every day, or every hour, or every 5 minutes.
One could begin even further back, and consider what the empirical cdf(t)
and thus its complement, the empirical S(t), would look like if there were no
censoring. In this case, when we got to the t where a cumulative total of
k subjects have made the transition from the initial state,
the empirical S(t) would be
[(n-1)/n] x [(n-2)/(n-1)] x [(n-3)/(n-2)] x ... x [(n-(k-1))/(n-(k-2))] x [(n-k)/(n-(k-1))]
and this simplifies, because k-1 terms on the top would cancel the same
k-1 terms in the bottom, to (n-k)/n.
In the K-M version, it's the same structure, BUT (because of censoring)
not all of the 'survivors' of one time-band experience the next time-band.
The no. at risk (the risk set) gets progressively smaller, not
just because of the transitions, but also because of the 'staggered' entries
and the losses to follow-up.
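Here is a minimal R sketch of this product-limit idea, using made-up follow-up times (not data from the course): at each distinct event time, multiply by the conditional survival probability, (number at risk minus number of events) divided by the number at risk.

  time   <- c(5, 7, 9, 10, 12, 12)   # hypothetical follow-up times
  status <- c(1, 0, 1, 1, 0, 1)      # 1 = event (transition), 0 = censored
  event.times <- sort(unique(time[status == 1]))
  km <- sapply(event.times, function(t) {
    prod(sapply(event.times[event.times <= t], function(s) {
      n.risk  <- sum(time >= s)                 # size of the risk set just before s
      n.event <- sum(time == s & status == 1)   # events at s
      (n.risk - n.event) / n.risk               # conditional survival probability
    }))
  })
  cbind(event.times, km)
  # library(survival); summary(survfit(Surv(time, status) ~ 1))  # same step heights

The commented last line shows how to check the step heights against survfit(), if the survival package is installed.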
-----
Since the C&H book was written, the Nelson-Aalen estimator
has become
more popular, and it is now found in all good survival analysis packages.
So it deserves some study, and to be properly understood.
As JH notes at the end of his expository article, there is some
confusion as to what a N-A curve means. It is often taken to mean the
integral of the estimated hazard function (JH thinks this is the more common
meaning), but it is sometimes used to refer to the survival curve
= exp[-integral of the estimated hazard function] that
one can derive from the integrated hazard function. If we want to think
of K-M and N-A curves 'in parallel', then it is this latter,
downward-travelling, step-function version,
taking on values from 1 to 0,
that makes the N-A step-function and the K-M step-function very close cousins.
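A small R sketch (same made-up data as the sketch above) of the two meanings of a 'N-A curve': the cumulative hazard, built from the d_j/n_j increments, and the survival-curve version exp(-cumulative hazard) that parallels the K-M curve.

  time   <- c(5, 7, 9, 10, 12, 12)   # hypothetical follow-up times (as above)
  status <- c(1, 0, 1, 1, 0, 1)
  event.times <- sort(unique(time[status == 1]))
  increments <- sapply(event.times, function(s)
    sum(time == s & status == 1) / sum(time >= s))    # d_j / n_j
  H <- cumsum(increments)                             # N-A cumulative hazard
  cbind(event.times, cum.hazard = H, surv = exp(-H))  # exp(-H): the 'survival' version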
Remarks on assigned (and non-assigned) exercises.
This year, we will leave Q4.3 (the 'NPMLE' property of the K-M estimator)
to the PhD students; given the nice article by Gerds, we will
however cover the (quite neat) 're-distribute-to-the-right'
representation that Efron pointed out.
We will leave the 'self-consistent' estimator representation to the PhD students.
It is the insight needed to understand how the Swiss statisticians
fitted the NPMLE survival curve to the avalanche data.
Q4.7 addresses another way to look at
part ii of Q4.1, and provides some insight for the N-A estimator in Q4.8 and Q4.9.
The exercises are also designed to i. get you familiar with
the Greenwood formula, and with how to obtain K-M and N-A
'curves' via R, ii. appreciate why, when, and by how much they differ,
and iii. see some live examples of survival analysis
and infection-rate analysis, and see how interval-censored
observations (such as those from HIV testing) are sometimes simplified
in actual analyses, especially if, as in the Kenya and Uganda examples,
simplifying the data doesn't change the estimates very much.
Before you start, you might wish to look at the Bridge-of-Life animations
on JH's website.
http://www.biostat.mcgill.ca/hanley/BridgeOfLife/
4.1
Results 1 and 2 are fundamental.
Result 1 is often used to estimate the mean duration, based on censored data that have been
converted to an (effectively) empirical CDF or 1-CDF. It was used by
one of our seminar speakers last year to calculate the average post-treatment survival for patients with
newly-diagnosed bladder cancer.
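As a small, purely illustrative R check of Result 1 (the mean equals the area under S(t)), using five made-up, uncensored durations:

  time <- c(2, 3, 5, 8, 12)                        # hypothetical durations
  S <- function(t) mean(time > t)                  # empirical survival function
  t.jump <- sort(unique(c(0, time)))
  sum(diff(t.jump) * sapply(head(t.jump, -1), S))  # area under S(t) = 6
  mean(time)                                       # also 6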
Incidentally, if you look at the blue curve in the animation you can produce with the R code (or look at the ages of
the tumblers at any time after the process has reached steady state) you will see that it gives
the current-age distribution of the living.
The total number of deaths is obtained by multiplying the hazard function at each age
by the person-years lived in the corresponding age-bin, and summing these products.
How does that fit with what you are asked to prove in part 1? Does it not just confirm the
definition of the hazard function? deaths in age-bin a = hazard(a) x S(a) x da, or
hazard(a) = density(a) / S(a).
So for each age-bin to generate the appropriate number of deaths, i.e.,
to generate the correct distribution of ages at death, the numbers
alive in bin 'a' must be [proportional to] S(a).
Result 2 is also fundamental, especially if you want to understand where the Nelson-Aalen estimator of S(t)
comes from.
Most times, the result is derived by solving the differential equation
implicit in the definition of the hazard function. But see question 4.7 for another way to look at it.
This alternative derivation also ties in with the idea of 'generations' that we touched on when dealing
with the Poisson distribution (see the 'Poisson generations' graph on the webpage).
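If you want a purely numerical check of Result 2 before attempting the derivation, here is an R sketch using one arbitrarily chosen Weibull distribution (the shape and scale values are just for illustration):

  shape <- 1.5; scale <- 10                          # hypothetical Weibull parameters
  hazard <- function(t) dweibull(t, shape, scale) /
                        pweibull(t, shape, scale, lower.tail = FALSE)
  t0 <- 7
  H  <- integrate(hazard, 0, t0)$value               # integral of the hazard from 0 to t0
  c(exp.minus.H = exp(-H),
    S.t0 = pweibull(t0, shape, scale, lower.tail = FALSE))   # the two agree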
4.2
One message from this question is how little one loses by discretizing time.
Back when plotting methods were still very crude and inaccessible,
JH saw a physician making a K-M curve using a rudimentary program.
Because the person had measured survival in days, the number of steps was very large,
and it was hard to draw them all manually. The person was very relieved
to be told that he would not lose much information by rounding the times to the nearest week or month.
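If you want to see this for yourself, here is a rough R sketch (with simulated, not real, survival times, and requiring the survival package) that overlays the K-M curve from exact-day times and the one from times rounded to whole weeks:

  library(survival)
  set.seed(1)
  days   <- rexp(200, rate = 1/300)                  # hypothetical survival times (days)
  status <- rbinom(200, 1, 0.7)                      # hypothetical event indicators
  weeks  <- pmax(1, round(days / 7))                 # times rounded to the nearest week
  fit.days  <- survfit(Surv(days,  status) ~ 1)
  fit.weeks <- survfit(Surv(weeks, status) ~ 1)
  plot(fit.days, conf.int = FALSE, xlab = "days", ylab = "S(t)")
  lines(fit.weeks$time * 7, fit.weeks$surv, type = "s", col = "red")  # nearly identical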
The point of part 3 of this question is to give some intuition for the name 'product limit' that K and M gave their estimator.
JH has put the full text of their 1958 'classic' on the webpage.
Meier died in 2011; see NYTimes
or Google "Paul Meier obituary".
Before testifying in court in 2013, JH was very glad to read Meier's advice on what (not) to do in such situations. See
his article in the
American Statistician Feb 2013 Issue on
Statistics and the Law.
4.3
As we remarked above, this aspect of the K-M estimator
is unusual. But why not think of it this way: imagine you can choose ANY
distribution you wish (as long as it's a legitimate cdf), and that its cdf is
simply called a 'no-name cdf' (it could have vertical jumps, and not be
a smooth function such as we have entertained so far).
Then in this example, what would the Likelihood be?
Wouldn't it be (no matter what cdf or S(t) we choose),
prob[1st observation | this cdf or this S(t) function]
x
prob[2nd observation | this cdf or this S(t) function]
x
prob[3rd observation | this cdf or this S(t) function].
Since the 2nd observation is that the transition (event) will occur
at some time point after t=7, i.e., it is right-censored at 7,
prob[2nd observation | this cdf or this S(t) function] is S(7),
or 1-CDF(7). So you read this off from the candidate S(t) function
you are 'trying on' for size.
For the likelihood contribution from the 1st observation,
we note that this is an uncensored observation,
or if you like, 'interval-censored' within a narrow interval
that contains the value 5. We need the probability of observing this.
Shouldn't we, by analogy with when we are constructing an empirical cdf
for n uncensored values, put a probability 'spike' or 'point mass'
at t=5? The question is how much to put. If all n observations were uncensored,
we would put a mass of 1/n at each value.
Likewise, we would need to put some probability mass at t=10.
The question is where else (if anywhere) should we put some mass?
How about 1/3 at t=5, 1/3 at t=10, and the other 1/3 spread out
uniformly over the interval t=7 to t=9 say. If we did this,
the S(t) curve would equal 1 until t=5,
take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
until t=7, then head downwards from t=7, until it reaches
S(9)=1/3, then head straight across to S(10) =1/3, then down to S(10+)=0.
We can now calculate the L under this 'candidate' S(t) function:
1/3 x 2/3 x 1/3 = 2/27.
How about 1/3 mass at t=5, 1/2 mass at t=10, and the other 1/6 mass spread out
uniformly over the interval t=7 to t=9 say. If we did this,
the S(t) curve would equal 1 until t=5,
take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
until t=7, then head downwards until it reaches
S(9)=1/2, then straight across to S(10)=1/2,
then head straight down to S(10+)=0.
The L under this 'candidate' S(t) function is:
1/3 x 2/3 x 1/2 = 2/18, better than before.
If you keep reducing the mass between 7 and 9, and instead placing it
at t=10, until you get to the S(t) function described in the question,
you get the L under this 'candidate' S(t) function:
1/3 x 2/3 x 2/3 = 4/27, better than either of the others.
This suggests that to maximize L, we should only put probability mass
at the times of the events (the so-called 'failure' times), and NONE
at the CENSORED times.
The question then is how much at each 'failure' time.
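Here is a small R sketch of the argument above, for the 3-observation example (event at t=5, censored at t=7, event at t=10): write the likelihood as p5 x S(7) x p10, where p5 and p10 are the masses at the two event times and S(7) = 1 - p5. The candidate masses and the grid are illustrative, not from the notes.

  lik <- function(p5, p10) p5 * (1 - p5) * p10       # L = p5 x S(7) x p10, with S(7) = 1 - p5
  lik(1/3, 1/3)   # 2/27: remaining 1/3 spread over (7, 9)
  lik(1/3, 1/2)   # 2/18: remaining 1/6 spread over (7, 9)
  lik(1/3, 2/3)   # 4/27: no mass between the event times
  # crude search over (p5, p10) with p5 + p10 <= 1:
  g <- expand.grid(p5 = seq(0.01, 0.99, 0.01), p10 = seq(0.01, 0.99, 0.01))
  g <- g[g$p5 + g$p10 <= 1, ]
  g[which.max(lik(g$p5, g$p10)), ]                   # close to p5 = 1/3, p10 = 2/3

The crude grid search lands at (about) p5 = 1/3 and p10 = 2/3, i.e., at the K-M / NPMLE solution.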
4.4
JH takes 'self-consistency' to mean that 'it obeys its own laws and definitions'.
Thus, in the equation, at each time t, we would estimate the proportion who
exceed it by counting all those who did, and by (circularly) using the full S(t) function
to make an estimate for each censored observation where we are not sure whether
or not it exceeds t. Clearly, the closer it already is, the higher the
probability that it will exceed t. The use of the fraction S(t)/S(T_i), rather than
a 0 or 1, or an 'I don't know', reflects this thinking.
To answer the self-consistency question, take the given S(t) function as correct, and
check at sufficiently many t values to be sure the equation involving it holds true.
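Here is a minimal R check of that idea, using the same 3-observation example as in Q4.3 (event at 5, censored at 7, event at 10) and its K-M estimate (S = 1, then 2/3, then 0). The coded right-hand side is JH's reading of the self-consistency equation, using the S(t)/S(T_i) fraction mentioned above.

  time   <- c(5, 7, 10)                              # the 3-observation example
  status <- c(1, 0, 1)                               # event, censored, event
  S <- function(t) ifelse(t < 5, 1, ifelse(t < 10, 2/3, 0))   # the K-M estimate
  rhs <- function(t) {                               # right-hand side of the equation
    w <- ifelse(time > t, 1,
                ifelse(status == 1, 0, S(t) / S(time)))   # censored & already passed: S(t)/S(T_i)
    mean(w)
  }
  sapply(c(4, 5, 7, 9, 10), function(t) c(S = S(t), rhs = rhs(t)))   # the two rows match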
Formerly, when teaching the 're-distribute to the right' representation,
JH used to give the analogy of the person moving out of an apartment
who gives the sofa to the new person coming in, and so on and so on.
But, now, the idea of tearing up and
redistributing one's page (one's weight of 1/n) in the 'Kaplan-Meier
theatre' by Thomas Gerds conveys the idea much more dramatically.
4.5 and 4.6
This question (relating to the conservation of probability, and the optimum placement
of it to maximize the likelihood) illustrates a very useful concept. Its use is not
limited to the easy (1-pass) case of right-censored data; it also applies to
interval-censored data.
A former 556-557 teacher, Alain Vandal, did his PhD work on such data, and has an R package
('Icens'). (Google "Alain Vandal" and "Constrained Estimation and Likelihood Intervals for Censored Data".)
Ming Gao Gu, formerly at McGill, also worked on this problem.
4.7
JH is thinking of restructuring this teaching article so as to focus more on
the N-A aspect, which is new for older statisticians and epidemiologists.
In April 2013, The Editor of Epidemiology wrote me:
"Dear Jim: I have a suggestion that might help you convey your ideas effectively:
Change the title to "Battle of the survival functions: The challenger, Nelson-Aalen vs. the champion, Kaplan-Meier.
Intuition, theory and practice." You can explain what the two estimators do, and show the different underlying theory
that you have provided. But you would also need to evaluate conditions under which the estimators give
discordant messages, and which estimator is better for various purposes when they are discordant.
If you're willing to tackle this, and if the new version appears to be a useful contribution, I will send it for review.
If this seems too discouraging, I would certainly understand. In that case, please notify our office so we can close your file. "
What do you think? Are you up for a try at co-authoring it?
4.8
Part 1 asks you to write out the ID(t) function, integrate it, and plug it into
the general formula linking S(t) and the integral.
In Part 2, you can start with either estimator and show when it is close to the
other one.
In Part 3, in practice, it is best to work with the variance of the
log of the estimator of S(t), and to form CIs on the log scale, then back-calculate at
the end. So, you might want to focus on the variance of the log, and not bother with the
variance of the antilog.
Incidentally, textbooks are divided on whether to take d_j as Binomial or Poisson.
JH favours the Poisson, as he thinks of it as a Poisson count within a time interval
dt, rather than a binomial 'at' time t. You will see both variance forms. Of course,
in most applications, with n_j fairly large, and d_j small, the difference
is inconsequential.
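For a sense of how small the difference usually is, here is a one-line R comparison of the two variance contributions from a single time-band, with made-up n_j and d_j: the binomial (Greenwood) term is d_j/[n_j(n_j - d_j)], the Poisson one d_j/n_j^2.

  n_j <- 100; d_j <- 3                               # hypothetical risk-set size and event count
  c(binomial = d_j / (n_j * (n_j - d_j)),            # Greenwood (binomial) term
    poisson  = d_j / n_j^2)                          # Poisson version; nearly the same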
4.9
JH expects that at first glance, most people who look at the step function in Fig 4 in the teaching article
take it to be a survival curve. BUT IT IS NOT.
This curve is simply the person-time curve: it drops off
at the time each observation is censored or otherwise discontinues.
It is becoming common in better journals to show K-M or N-A curves (or their complements)
with some additional data-rows at the bottom, showing the numbers 'at risk' at various selected
time points in the follow-up. You can think of this step function as the ultimate (most informative)
version of such data. The area under it is the sum of the products of numbers
of persons and time.
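A small R check of that last statement, with made-up exit times: the area under the number-still-being-followed step function equals the total person-time.

  time <- c(5, 7, 9, 10, 12, 12)                     # hypothetical exit times (any reason)
  n.at.risk <- function(t) sum(time > t)             # number still being followed at t
  t.grid <- sort(unique(c(0, time)))
  area <- sum(diff(t.grid) * sapply(head(t.grid, -1), n.at.risk))
  c(area = area, total.person.time = sum(time))      # both 55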
The questions re labels are meant to emphasize how important they are. If you want to see
how silly or uninformative some labels are, and how any lay person would be confused by them,
look at some of the medical journals. For JH, this is part of a biostatistician's
responsibility for communication. And just because the graphs are pretty, or came out of Stata, doesn't
make them correct or clear.
4.10
In previous years, I noticed that several students
did the calculations 'by hand'. If using R,
please use vectors to automate the calculations.
You might not agree exactly on the 'difference' or on its standard error.
The CI for the Risk ratio was probably calculated by working in the log-of-the-ratio
scale, and using a result we derived earlier for the log of a r.v.
Here, we do not have two clean binomials, as the numbers of persons being tested
are fewer further out in the follow-up time.
You should use the Greenwood variance for each of the two 'Risk' estimates (it takes
care of the diminishing numbers at risk),
and then calculate the CI for the log of their ratio,
i.e., use the square of each SE as the variance of the estimated risk,
and apply the delta method.
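As a sketch of those last two steps in R (the risks and SEs below are made up for illustration, not the trial's actual numbers):

  R1 <- 0.02; se1 <- 0.005                           # hypothetical risk & Greenwood SE, one arm
  R0 <- 0.04; se0 <- 0.007                           # hypothetical risk & Greenwood SE, other arm
  log.rr    <- log(R1 / R0)
  se.log.rr <- sqrt(se1^2 / R1^2 + se0^2 / R0^2)     # delta method: Var[log R] ~ Var[R] / R^2
  exp(log.rr + c(-1, 1) * qnorm(0.975) * se.log.rr)  # 95% CI for the risk ratio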
4.11
4.12
4.13
The author shared the raw data with JH so we can check how close you get.
4.14
You might want to look at this
graph, which JH has made from
what he thinks the data look like.