BIOS601 AGENDA: Tuesday October 02 and Thursday October 04, 2012
[updated Oct 01, 2012]
 Agenda for October 02 & 04, 2012 
  
  -   Discussion of issues in C&H's Chapter 04 (Consecutive Follow-up Intervals),
      and JH's Notes and Assignment on this chapter
 
 Answers to be handed in for: 
  (Supplementary) Exercises 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7
 
 Remarks on Notes:
 
 These notes were developed to supplement the Clayton and Hills chapter,
  which was aimed at epidemiologists, and which does not give the
  derivations (the 'wiring' and 'theory') behind the results (the user's view of the car).
 
 It is important to read C&H first, before JH's notes.
 
 The core topics in this chapter are non-parametric
   (or more precisely,
  distribution-free) approaches to estimating survival curves, and the
  associated functions (e.g., pdf and hazard function) that can be derived from them.
  Last week, in the orientation to ML estimation, several of our examples
  involved specific candidate distributions for rv's that take on
  values on the (0,Inf) scale, such as  exponential, gamma, log-normal etc. 
  But, perhaps to your surprise, you will learn in one of the exercises that,
  whereas the Kaplan-Meier estimator is usually described as a non-parametric estimator,
  it can also be shown to be the survival curve, among
  ALL POSSIBLE survival curves, that makes the observed data most likely; and so it
  is sometimes referred to as a non-parametric MLE -- almost a contradiction
  in terms, especially when we emphasize that for ML one needs to specify
  a distribution with a full (parametric) form.
 
 C&H develop the K-M estimator very 'naturally' by slicing time
  finer and finer, so that most conditional survival probabilities
  in the product are unity, and can be omitted, leaving just the
  (less than unity) conditional survival probabilities for the time-bands that contain 
  >= 1 event.
 
 One could begin even further back, and consider what the empirical cdf(t)
  and thus its complement, the empirical S(t), would look like if there were no
  censoring. In this case, when we got to the t where a cumulative total of
  k subjects have made the transition from the initial state,
  the empirical S(t) would be
 
 [(n-1)/n] x [(n-2)/(n-1)] x [(n-3)/(n-2)] x ... x [(n-(k-1))/(n-(k-2))] x [(n-k)/(n-(k-1))]
 
 and this simplifies to (n-k)/n, because the k-1 terms in the numerators
  cancel the same k-1 terms in the denominators.
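
 To see the cancellation numerically, here is a minimal R sketch (the values
  of n and k are arbitrary, chosen only for illustration):

    n <- 20; k <- 7                      # arbitrary illustrative values
    j <- 1:k                             # the k factors in the product
    prod((n - j) / (n - (j - 1)))        # [(n-1)/n] x ... x [(n-k)/(n-(k-1))]
    (n - k) / n                          # the simplified form; the two agree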
 
 In the K-M version, it's the same structure, BUT (because of censoring)
  not all of the 'survivors' of one time-band experience the next time-band.
  The number at risk (the risk set) gets progressively smaller, not
  just because of the transitions, but also because of the 'staggered' entries
  and the losses to follow-up.
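
 To make the product-limit construction concrete, here is a minimal base-R
  sketch with made-up times and censoring indicators (1 = event, 0 = censored);
  these are illustrative values only, not data from the course:

    time   <- c(2, 3, 3, 5, 8, 9, 12, 14)
    status <- c(1, 0, 1, 1, 0, 1,  0,  0)

    d.times <- sort(unique(time[status == 1]))                # distinct event times
    n.risk  <- sapply(d.times, function(t) sum(time >= t))    # risk set just before each event time
    d       <- sapply(d.times, function(t) sum(time == t & status == 1))  # events at each time
    S.km    <- cumprod(1 - d / n.risk)                        # product-limit (K-M) estimate
    data.frame(t = d.times, n.risk, d, S.km)

  The same estimate can be checked against survfit(Surv(time, status) ~ 1)
  from the 'survival' package.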
 
 -----
 
 Since the C&H book was written, the Nelson-Aalen estimator 
  has become
  more popular, and it is now found in all good survival analysis packages.
  So it deserves some study, and to be properly understood.
  As JH notes at the end of part II of his expository article, there is some
  confusion as to what a N-A curve means, since it is often taken to mean
  the integral of the estimated hazard function (JH thinks this is the more common
  meaning). But it is sometimes used to refer to the survival curve
  = exp[-integral of the estimated hazard function] that
  one can derive from the integrated hazard function. If we want to think
  of K-M and N-A curves 'in parallel', then it is this latter version, a
  downwards-travelling step-function taking on values between 1 and 0, that
  makes the N-A step-function and the K-M step-function very close cousins.
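
 The closeness of the two step-functions is easy to check numerically. A
  minimal R sketch, re-using the made-up data above and the 'survival' package:
  the Nelson-Aalen cumulative hazard is computed by hand from the risk sets and
  event counts that survfit() reports, and then exponentiated:

    library(survival)
    time   <- c(2, 3, 3, 5, 8, 9, 12, 14)
    status <- c(1, 0, 1, 1, 0, 1,  0,  0)

    fit  <- survfit(Surv(time, status) ~ 1)            # Kaplan-Meier
    H.na <- cumsum(fit$n.event / fit$n.risk)           # Nelson-Aalen cumulative hazard, by hand
    S.na <- exp(-H.na)                                 # the 'survival-curve' version of N-A
    cbind(t = fit$time, S.km = fit$surv, S.na = S.na)  # the two curves track each other closely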
 
 Remarks on assigned exercises.
 
 The exercises are also designed to i. get you familiar with
 the Greenwood formula, and with how to obtain K-M and N-A
 'curves' via R, ii. appreciate why, when, and by how much they differ,
 and iii. see some live examples of survival-analysis
 and infection-rate-analysis, and see how interval-censored
 observations (such as those from HIV testing) are sometimes simplified
 in actual analyses, especially if, as in the Kenya and Uganda examples,
 simplifying the data doesn't change the estimates very much.
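
 As a pointer for the Greenwood part, here is a minimal R sketch (same made-up
 data as above, not the course data) that computes the Greenwood variance
 S(t)^2 x sum[ d_j / (n_j (n_j - d_j)) ] directly from the risk sets and event
 counts; it should agree with the standard errors that summary(fit) reports for S(t):

   library(survival)
   time   <- c(2, 3, 3, 5, 8, 9, 12, 14)
   status <- c(1, 0, 1, 1, 0, 1,  0,  0)
   fit    <- survfit(Surv(time, status) ~ 1)

   gw.var <- fit$surv^2 * cumsum(fit$n.event / (fit$n.risk * (fit$n.risk - fit$n.event)))
   cbind(t = fit$time, S.km = fit$surv, se.greenwood = sqrt(gw.var))
   # compare with summary(fit)$std.err, the standard error reported for S(t)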
 
 4.1. As we remarked above, this aspect of the K-M estimator
 is unusual. But why not think of it this way: imagine you can choose ANY
 distribution you wish (as long as it's a legitimate cdf), and that its cdf is
 simply called a 'no-name-cdf' (it could have vertical jumps, and not be
 a smooth function such as we have entertained so far).
 Then in this example, what would the Likelihood be?
 Wouldn't it be (no matter what cdf or S(t) we choose)
 
 prob[1st observation  | this cdf or this S(t) function]
 x
 prob[2nd observation | this cdf or this S(t) function]
 x
 prob[3rd observation | this cdf or this S(t) function].
 
 Since the 2nd observation is that the transition (event) will occur
 at some time point after t=7, i.e., it is right-censored at 7,
 prob[2nd observation | this cdf or this S(t) function] is S(7)
 or 1-CDF(7). So you read this off from the candidate S(t) function
 you are 'trying on' for size.
 
 For the likelihood contribution from the 1st observation,
 we note that this is an uncensored observation,
 or if you like, 'interval-censored' within a  narrow interval
 that contains the value 5. We need the probability of observing this.
 Shouldn't we, by analogy with when we are constructing an empirical cdf
 for n uncensored values, put a probability 'spike' or 'point mass'
 at t=5? The question is how much mass to put there. If all n observations were uncensored,
 we would put a mass of 1/n at each value.
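
 In R, this is exactly what ecdf() does for uncensored values. A minimal sketch
 with three made-up values, purely for illustration:

   x  <- c(5, 7, 10)      # three uncensored times
   Fn <- ecdf(x)          # empirical cdf: a jump of 1/n = 1/3 at each value
   Fn(c(5, 7, 10))        # 1/3, 2/3, 1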
 
 Likewise, we would need to put some probability mass at t=10.
 The question is where else (if anywhere) we should put some mass.
 How about 1/3 at t=5, 1/3 at t=10, and the other 1/3 spread out
 uniformly over the interval t=7 to t=9, say? If we did this,
 the S(t) curve would equal 1 until t=5,
 take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
 until t=7, then head downwards from t=7, until it reaches
 S(9)=1/3, then head straight across to S(10)=1/3, then down to S(10+)=0.
 
 We can now calculate the L under this 'candidate' S(t) function:
 
 1/3
 x
 2/3
 x
 1/3
 
 = 2/27
 
 
 How about 1/3 mass at t=5, 1/2 mass at t=10, and the other 1/6 mass spread out
 uniformly over the interval t=7 to t=9, say? If we did this,
 the S(t) curve would equal 1 until t=5,
 take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
 until t=7, then head downwards until it reaches
 S(9)=1/2, then straight across to S(10)=1/2,
 then head straight down to S(10+)=0.
 
 The L under this 'candidate' S(t) function is:
 
 1/3
 x
 2/3
 x
 1/2
 
 = 2/18, better than before.
 
 
 If you keep reducing the mass between 7 and 9, and instead placing it
 at t=10, until you get to the S(t) function described in the question,
 then the L under this 'candidate' S(t) function is:
 
 1/3
 x
 2/3
 x
 2/3
 
 = 4/27, better than either of the others.
 
 
 This suggests that to maximize L, we should only put probability mass
at the times of the events (the so-called 'failure' times), and NONE 
at the CENSORED times.
 
 The question then is how much at each 'failure' time.
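
 One way to see the whole argument at once: with mass 1/3 at t=5 and the
 remaining 2/3 split between a point mass p at t=10 and (2/3 - p) spread over
 (7,9), the likelihood is (1/3) x S(7) x p = (1/3) x (2/3) x p, which only
 increases as p grows. A minimal R sketch of the three candidates considered above:

   L <- function(p) (1/3) * (2/3) * p   # mass at t=5  x  S(7)  x  mass at t=10
   p <- c(1/3, 1/2, 2/3)                # the three candidate masses at t=10
   cbind(p, L = L(p))                   # 2/27, 2/18, 4/27: L is maximized at p = 2/3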
 
 4.2  to come
 
 ...