BIOS601 AGENDA: Tuesday October 02 and Thursday October 04, 2012
[updated Oct 01, 2012]
 Agenda for October 02 & 04, 2012 
  
  -   Discussion of issues in C&H's Chapter 04 (Consecutive Follow-up Intervals),
      and JH's Notes and Assignment on this chapter
 
 Answers to be handed in for: 
  (Supplementary) Exercises 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 4.7
 
 Remarks on Notes:
 
 These notes were developed to supplement the Clayton and Hills chapter,
  which was aimed at epidemiologists, and which does not give the
  derivations (the 'wiring' and 'theory') behind the results (the user's view of the car).
 
 It is important to read C&H first, before JH's notes.
 
 The core topics in this chapter are non-parametric
   (or more precisely,
  distribution-free) approaches to estimating survival curves, and the
  associated functions (e.g., pdf and hazard function) that can be derived from them.
  Last week, in the orientation to ML estimation, several of our examples
  involved specific candidate distributions for rv's that take on
  values on the (0,Inf) scale, such as  exponential, gamma, log-normal etc. 
  But, perhaps to your surprise, you will learn in one of the exercises that,
  whereas the Kaplan-Meier estimator is usually described as a non-parametric estimator,
  it can also be shown to be the survival curve, among
  ALL POSSIBLE survival curves, that makes the observed data most likely; and so it
  is sometimes referred to as a non-parametric MLE -- almost a contradiction
  in terms, especially when we emphasize that for ML one needs to specify
  a distribution with a full (parametric) form.
 
 C&H develop the K-M estimator very 'naturally' by slicing time
  finer and finer, so that most conditional survival probabilities
  in the product are unity, and can be omitted, leaving just the
  (less than unity) conditional survival probabilities for the time-bands that contain 
  >= 1 event.
 
 One could begin even further back, and consider what the empirical cdf(t)
  and thus its complement, the empirical S(t), would look like if there were no
  censoring. In this case, when we got to the t where a cumulative total of
  k subjects have made the transition from the initial state,
  the empirical S(t) would be
 
 [(n-1)/n] x [(n-2)/(n-1)] x [(n-3)/(n-2)] x ... x [(n-(k-1))/(n-(k-2))] x [(n-k)/(n-(k-1))]
 
 and this simplifies to (n-k)/n, because the k-1 terms in the numerators
  cancel the same k-1 terms in the denominators.
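
 To see the cancellation numerically, here is a minimal R sketch (the values
  of n and k are arbitrary, chosen only for illustration):

    n <- 20; k <- 7                      # arbitrary illustrative values
    j <- 1:k                             # the k factors in the product
    prod((n - j) / (n - (j - 1)))        # [(n-1)/n] x ... x [(n-k)/(n-(k-1))]
    (n - k) / n                          # the simplified form; the two agree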
 
 In the K-M version, it's the same structure, BUT (because of censoring)
  not all of the 'survivors' of one time-band experience the next time-band.
  The number at risk (the risk set) gets progressively smaller, not
  just because of the transitions, but also because of the 'staggered' entries
  and the losses to follow-up.
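
 To make the product-limit construction concrete, here is a minimal base-R
  sketch with made-up times and censoring indicators (1 = event, 0 = censored);
  these are illustrative values only, not data from the course:

    time   <- c(2, 3, 3, 5, 8, 9, 12, 14)
    status <- c(1, 0, 1, 1, 0, 1,  0,  0)

    d.times <- sort(unique(time[status == 1]))                # distinct event times
    n.risk  <- sapply(d.times, function(t) sum(time >= t))    # risk set just before each event time
    d       <- sapply(d.times, function(t) sum(time == t & status == 1))  # events at each time
    S.km    <- cumprod(1 - d / n.risk)                        # product-limit (K-M) estimate
    data.frame(t = d.times, n.risk, d, S.km)

  The same estimate can be checked against survfit(Surv(time, status) ~ 1)
  from the 'survival' package.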
 
 -----
 
 Since the C&H book was written, the Nelson-Aalen estimator 
  has become
  more popular, and it is now found in all good survival analysis packages.
  So it deserves some study, and to be properly understood.
  As JH notes at the end of part II of his expository article, there is some
  confusion as to what a N-A curve means, since it is often taken to mean
  the integral of the estimated hazard function (JH thinks this is the more common
  meaning). But it is sometimes used to refer to the survival curve
  = exp[-integral of the estimated hazard function] that
  one can derive from the integrated hazard function. If we want to think
  of K-M and N-A curves 'in parallel', then it is this latter version, a
  downwards-travelling step-function taking on values between 1 and 0, that
  makes the N-A step-function and the K-M step-function very close cousins.
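
 The closeness of the two step-functions is easy to check numerically. A
  minimal R sketch, re-using the made-up data above and the 'survival' package:
  the Nelson-Aalen cumulative hazard is computed by hand from the risk sets and
  event counts that survfit() reports, and then exponentiated:

    library(survival)
    time   <- c(2, 3, 3, 5, 8, 9, 12, 14)
    status <- c(1, 0, 1, 1, 0, 1,  0,  0)

    fit  <- survfit(Surv(time, status) ~ 1)            # Kaplan-Meier
    H.na <- cumsum(fit$n.event / fit$n.risk)           # Nelson-Aalen cumulative hazard, by hand
    S.na <- exp(-H.na)                                 # the 'survival-curve' version of N-A
    cbind(t = fit$time, S.km = fit$surv, S.na = S.na)  # the two curves track each other closely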
 
 Remarks on assigned exercises.
 
 The exercises are also designed to i. get you familiar with
 the Greenwood formula, and with how to obtain K-M and N-A
 'curves' via R, ii. appreciate why, when, and by how much they differ,
 and iii. see some live examples of survival-analysis
 and infection-rate-analysis, and see how interval-censored
 observations (such as those from HIV testing) are sometimes simplified
 in actual analyses, especially if, as in the Kenya and Uganda examples,
 simplifying the data doesn't change the estimates very much.
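
 As a pointer for the Greenwood part, here is a minimal R sketch (same made-up
 data as above, not the course data) that computes the Greenwood variance
 S(t)^2 x sum[ d_j / (n_j (n_j - d_j)) ] directly from the risk sets and event
 counts; it should agree with the standard errors that summary(fit) reports for S(t):

   library(survival)
   time   <- c(2, 3, 3, 5, 8, 9, 12, 14)
   status <- c(1, 0, 1, 1, 0, 1,  0,  0)
   fit    <- survfit(Surv(time, status) ~ 1)

   gw.var <- fit$surv^2 * cumsum(fit$n.event / (fit$n.risk * (fit$n.risk - fit$n.event)))
   cbind(t = fit$time, S.km = fit$surv, se.greenwood = sqrt(gw.var))
   # compare with summary(fit)$std.err, the standard error reported for S(t)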
 
 4.1. As we remarked above, this aspect of the K-M estimator
 is unusual. But why not think of it this way: imagine you can choose ANY
 distribution you wish (as long as it's a legitimate cdf), and that its cdf is
 simply called a 'no-name-cdf' (it could have vertical jumps, and not be
 a smooth function such as we have entertained so far).
 Then in this example, what would the Likelihood be?
 Wouldn't it be (no matter what cdf or S(t) we choose)
 
 prob[1st observation  | this cdf or this S(t) function]
 x
 prob[2nd observation | this cdf or this S(t) function]
 x
 prob[3rd observation | this cdf or this S(t) function].
 
 Since the 2nd observation is that the transition (event) will occur
 at some time point after t=7, i.e., it is right-censored at 7,
 prob[2nd observation | this cdf or this S(t) function] is S(7)
 or 1-CDF(7). So you read this off from the candidate S(t) function
 you are 'trying on' for size.
 
 For the likelihood contribution from the 1st observation,
 we note that this is an uncensored observation,
 or if you like, 'interval-censored' within a  narrow interval
 that contains the value 5. We need the probability of observing this.
 Shouldn't we, by analogy with when we are constructing an empirical cdf
 for n uncensored values, put a probability 'spike' or 'point mass'
 at t=5? The question is how much mass to put there. If all n observations were uncensored,
 we would put a mass of 1/n at each value.
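
 In R, this is exactly what ecdf() does for uncensored values. A minimal sketch
 with three made-up values, purely for illustration:

   x  <- c(5, 7, 10)      # three uncensored times
   Fn <- ecdf(x)          # empirical cdf: a jump of 1/n = 1/3 at each value
   Fn(c(5, 7, 10))        # 1/3, 2/3, 1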
 
 Likewise, we would need to put some probability mass at t=10.
 The question is where else (if anywhere) we should put some mass.
 How about 1/3 at t=5, 1/3 at t=10, and the other 1/3 spread out
 uniformly over the interval t=7 to t=9, say? If we did this,
 the S(t) curve would equal 1 until t=5,
 take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
 until t=7, then head downwards from t=7, until it reaches
 S(9)=1/3, then head straight across to S(10)=1/3, then down to S(10+)=0.
 
 We can now calculate the L under this 'candidate' S(t) function:
 
 1/3
 x
 2/3
 x
 1/3
 
 = 2/27
 
 
 How about 1/3 mass at t=5, 1/2 mass at t=10, and the other 1/6 mass spread out
 uniformly over the interval t=7 to t=9, say? If we did this,
 the S(t) curve would equal 1 until t=5,
 take a vertical dive at t=5, and then head horizontally (at a height of 2/3)
 until t=7, then head downwards until it reaches
 S(9)=1/2, then straight across to S(10)=1/2,
 then head straight down to S(10+)=0.
 
 The L under this 'candidate' S(t) function is:
 
 1/3
 x
 2/3
 x
 1/2
 
 = 2/18, better than before.
 
 
 If you keep reducing the mass between 7 and 9, and instead placing it
 at t=10, until you get to the S(t) function described in the question,
 then the L under this 'candidate' S(t) function is:
 
 1/3
 x
 2/3
 x
 2/3
 
 = 4/27, better than either of the others.
 
 
 This suggests that to maximize L, we should only put probability mass
at the times of the events (the so-called 'failure' times), and NONE 
at the CENSORED times.
 
 The question then is how much at each 'failure' time.
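
 One way to see the whole argument at once: with mass 1/3 at t=5 and the
 remaining 2/3 split between a point mass p at t=10 and (2/3 - p) spread over
 (7,9), the likelihood is (1/3) x S(7) x p = (1/3) x (2/3) x p, which only
 increases as p grows. A minimal R sketch of the three candidates considered above:

   L <- function(p) (1/3) * (2/3) * p   # mass at t=5  x  S(7)  x  mass at t=10
   p <- c(1/3, 1/2, 2/3)                # the three candidate masses at t=10
   cbind(p, L = L(p))                   # 2/27, 2/18, 4/27: L is maximized at p = 2/3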
 
 4.2  to come
 
 ...