Discussion of issues
  in C&H's Chapter 05 (Rates)
  and JH's Notes and Assignment on this chapter

  Answers to be handed in for:
  (Supplementary) Exercises 5.1, 5.2, 5.3, 5.4, 5.5, 5.7
  
   
   
  Remarks on Notes:
  
  These notes were developed to supplement the Clayton and Hills chapter,
  which was aimed at epidemiologists, and which does not give the
  derivations (the 'wiring' and 'theory') underlying the results (the user's view of the car).
  
  
 It is important to read C&H first, before JH's notes.   
  
  
  The core topics in this chapter are rates,
  both CONSTANT-in-time and VARYING-over-time.
  
  
  
  
 Remarks on assigned (and non-assigned) exercises.
 
 
 The exercises are also designed to i. get you familiar with
 the Greenwood formula, and with how to obtain K-M and N-A
 'curves' via R, ii. appreciate why, by how much, and when they differ,
 and iii. see some live examples of survival analysis
 and infection-rate analysis, and see how interval-censored
 observations (such as those from HIV testing) are sometimes simplified
 in actual analyses, especially if, as in the Kenya and Uganda examples,
 simplifying the data doesn't change the estimates very much.
 
 
  5.1   
   For part 1, the likelihood has already been set up, so it's merely
   a matter of maximizing it (before you do so, please check that you
   understand how it was set up, and that there are no typos).
  
   For part 2, if you maximized the logLik w.r.t. lambda, then for the SE
   you need the sqrt of the negative of the reciprocal of the curvature.
   You can calculate the curvature numerically by using
   [ LogLik'(MLE + delta) - LogLik'(MLE - delta) ] / (2*delta)
   
   and you can calculate each LogLik'(.) by the same central-difference method
    [ LogLik(. + epsilon) - LogLik(. - epsilon) ] / (2*epsilon)
    
 You can double-check that LogLik'(MLE) equals 0.
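 
 Here is a minimal sketch of those central differences in R. It re-uses the
 same log-likelihood as the optim 'trick' below, written for a scalar lambda;
 the step sizes (1e-6, 1e-5) and the use of optimize() to locate the maximum
 are just convenient choices, not part of the exercise.
 
loglik.1 = function(lambda)                 # same logLik as below, scalar argument
  - ( (2300 + 2215)/2 + 968 ) * lambda +
    (19 + 14) * log(1 - exp(-lambda/2)) +
        12    * log(1 - exp(-lambda))

d1 = function(x, eps = 1e-6)                # central-difference LogLik'(x)
  ( loglik.1(x + eps) - loglik.1(x - eps) ) / (2 * eps)

mle  = optimize(loglik.1, c(0.001, 0.1), maximum = TRUE)$maximum
d1(mle)                                     # should be (essentially) 0
curv = ( d1(mle + 1e-5) - d1(mle - 1e-5) ) / (2 * 1e-5)   # LogLik''(mle): negative
sqrt( -1 / curv )                           # the SE of lambda.hat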
 
 
   You can also trick optim into calculating the hessian:
   
   pretend that there are 2 parameters, but use only 1
   of them in the log-likelihood:
  
loglik = function(par) {
	lambda = par[1]                        # only the 1st 'parameter' is actually used
	return(
	  - ( (2300+2215)/2 + 968 ) * lambda + # negative tests: 2300 and 2215 half-years, 968 full years
	  (19+14)*log(1-exp(-lambda/2)) +      # 19 and 14 positives in the two half-year intervals
	      12 *log(1-exp(-lambda))          # 12 positives in the final one-year interval
	)
}
par = c(45/3300, 0)                            # starting values: a rough rate, plus a dummy 2nd 'parameter'
( fit = optim(par, loglik,
  control = list(fnscale = -1),                # fnscale = -1 makes optim MAXIMIZE
  hessian = TRUE) )
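
 and then use the (1,1) element of the returned hessian for the SE. (With
 fnscale = -1, the hessian should be that of the logLik itself, so check that
 fit$hessian[1,1] comes out negative; the dummy second parameter makes the
 full 2 x 2 matrix singular, so don't try to invert the whole thing.)
 
SE.lambda = sqrt( -1 / fit$hessian[1,1] )   # curvature-based SE of lambda.hat
SE.lambda                                   # compare with sqrt(45)/3391.8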
    
 
 A double check on the curvature: the MLE should be close to
 the 45/3391.8 = 1.33/100 that the authors got, so the SE should be approximately sqrt(45)/3391.8,
 and thus the curvature should be close to - 1/(SE^2).
 
 
 You should not have to re-estimate log[lambda] from scratch, since the
 MLE of a function of a parameter is the function of the MLE. And you can use
 the change in scale to compute the new variance.
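 
 A minimal sketch of that change of scale (the delta method), using the
 45/3391.8 and sqrt(45)/3391.8 values mentioned above:
 
lambda.hat    = 45/3391.8                   # MLE on the rate scale
SE.lambda     = sqrt(45)/3391.8             # its (approximate) SE
SE.log.lambda = SE.lambda / lambda.hat      # delta method: SE on the log scale = 1/sqrt(45)
c( log(lambda.hat), SE.log.lambda )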
 
 One way to judge whether the transform improves the LogLik shape is to plot both.
 You should be aiming for a near-quadratic shape -- since that is the
 LogLik based on Gaussian variation.
 
 But you can also gauge things by the structure of the MLE, which is
 close to 45/3391.8, i.e., a Poisson R.V. divided by a 3391.8 that
 you can treat as a constant.
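 
 A minimal sketch of the two plots (the grid of lambda values is an arbitrary choice):
 
lam = seq(0.005, 0.03, length.out = 200)    # grid of rate values
ll  = - ( (2300 + 2215)/2 + 968 ) * lam +   # same logLik as above, vectorized
        (19 + 14) * log(1 - exp(-lam/2)) +
            12    * log(1 - exp(-lam))
par(mfrow = c(1, 2))
plot(     lam,  ll, type = "l", xlab = "lambda",      ylab = "logLik")
plot( log(lam), ll, type = "l", xlab = "log(lambda)", ylab = "logLik")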
 
  For part 3, make the tests more frequent, and the intervals shorter.
  Say you had weekly testing.
  
  For the first six months, there would still be the product of 19 terms,
   but now they would be 'personalized', e.g., the first HIV+ test might be in
   a man who was negative at the end of week 3 and positive by the end of week 4.
   In his case, the Likelihood contribution is S(3wks) - S(4wks).
  
  If lambda
  is measured in terms of years, then 3 weeks and 4 weeks are 
  at t.start = 3/52 years and t.end = 4/52 years. Thus the contribution is
  exp[- lambda*t.start] - exp[- lambda*t.end]. 
   But now dt is not 0.5 years,
  but just dt = (1/52) years, and we can approximate the contribution by a 
  rectangle with base dt = 1/52 and height = pdf(3.5/52).
  Since pdf(t) is lambda * exp(-lambda t), the contribution from this first
   event is ??? . Now repeat for the other 18, and see how much simpler it is
   than a product of 19 identical terms of the form
   1 - exp[ - 0.5 * lambda].
  
  So you can see how this simplifies matters, and with a likelihood
   that now has a closed form(*), you can head to the MLE directly (i.e., analytically).
  
   This is because when you collect ALL the negative tests
   over these 26 weeks, each one involving an exp[-lambda*dt], you will
   end up with almost as many of them as there were weeks*persons.
   In fact, it will be 26*2319 MINUS the ones that didn't need to be done because
   at some point in those 26 weeks, 19 men tested positive and so stopped
   being tested. If we say the average no. of weeks before they tested positive
   was 13, we end up with approx 26*2319 - 19*13 negative tests.
   So their collective
   contribution to the Likelihood is exp[ - {26*2319 - 19*13}*(1/52) * lambda].
  
   And remember that each of the 19 positive tests
   contributes (approximately) a lambda * exp[ - lambda * t ] (times the constant base dt),
   so there will be a lambda to the 19th power as well as 19 exp[ - lambda * t ] terms.
  
  
  But L is now quite amenable to taking its log, setting its
   derivative to zero, and solving directly for lambda.
  
  In fact, if you know the exact times of the 45 seroconversions, the MLE
  comes out to 45/sum(t), where the sum is over all men, and t is either
  the time of the conversion, or the times (2300 of them) 
  of the last negative test.
   
  
  
  
   The likelihood contributions for the next 26 weeks have the same structure:
   14 of the form ??? and (approx.) 2215*26 - 14*13 of the form
   exp[-lambda*(1/52)].
  
  The third period is 52 weeks long; we have 12 positive and
  (approx) 52*968 - 12*26  negative tests, with their respective contributions.
  Once you multiply all these together, you will notice that the ... in the
  exp[- lambda * ...] term is still close to 2300*(1/2) + 2215*(1/2) + 968*1 years.
  But now, with the small dt, each 1 - exp[ - lambda * (1/52) ] is very close to what?
 
 
 For part 4, use Cum. Inc. or Risk =
 1 - exp[- integral of the lambda function].
 
 With a constant rate, the integral to time t is lambda * t.
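 
 A minimal sketch, using the 45/3391.8 rate from above and (purely as an
 illustration) a 1-year horizon:
 
lambda.hat = 45/3391.8                      # infections per man-year
t          = 1                              # years; an arbitrary choice for illustration
1 - exp( - lambda.hat * t )                 # cumulative incidence (risk) up to time t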
 
  
  5.2  
  
  Rate of de novo mutations and the importance of father's age to disease risk
  
  
  
   In past years, some have misunderstood the meaning of a constant
    rate, and have ignored the amount of time during which this
    rate was in operation.
   
   
   Think of the data from the child as telling us how many
   mutations the father had acquired over however many years
   he had lived up to the time the child was conceived.
  
  
  Thus, if, for example, the rate was CONSTANT over age, say an average of
  2.5 mutations per year lived, the expected number of 
  mutations acquired by the time the father reached age 30 is 
  2.5 x 30  = 75.
  
  
  If the rate was not CONSTANT over age, but (to make a simple
  rate-increasing-with-age example) was say
  2.0 per year while the father was himself a child, i.e.,
  say for the first 15 years, and say 3.0 per year for the next 10
  years, and 4.0 per year for the next 5, the expected no. acquired by 
  age 30 would be 2.0 x 15 + 3.0 x 10 + 4.0 x 5  = 80. The 'linear-in-age'
  model is smoother than that, so one can use an integral to compute the
  expected no. by the age each father had his child. But you will see
  that the form is quite suitable for a linear regression: remember
  that in a linear regression with B0 + B1 x X1 + B2 x X2, the
  LINEARITY is in the B's:  X1 could be age, and X2
  could be age-CUBED, and the model is still linear -- in the B's !!
  
   
    
  
  
  A NON-LINEAR model is one where the expected Y (or function
  of the expected Y) is say B0/(B1 x X1 + B2 x X2). 
  
    
  
  The 'beyond-linear-in-age' rate model will have a slightly
  nastier integral for the expected value, so it may not be possible to put it into
  a 'linear regression' form. In this case, one can maximize
  the log-likelihood numerically with say optim.
  
  
  Think of this as being like getting gift money from your grandparents,
  with the amount increasing with your age! Suppose B0=1.5 and B1=0.12.
  Then the amount received between a = 2.25 years and 
  a = 2.26 years is (1.5 + 0.12*2.255) * 0.01. The total over 25 years
  would then be  the integral of (1.5 + 0.12*a) * da over that 
  25 year period.
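   
   A minimal sketch of that integral in R, taking the 25-year period to run
   from a = 0 to a = 25:
   
integrate( function(a) 1.5 + 0.12*a, lower = 0, upper = 25 )   # numerical integral
1.5*25 + 0.12*25^2/2                                           # closed form: 75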
  
   The paper referred to is THE glm paper. Its trick was to use iteratively
   re-weighted least squares to obtain the MLE. The generality is that each link and each
   error distribution determines a specific set of weights, but the
   core code is the same. glm is now built into R. So instead of having separate
   software for fitting a Normal model, or a Poisson model, or a
   Gamma model, or ..., or for fitting the mean E[y|x], or
   the log of E[y|x], or the logit or probit of E[y|x],
   as a linear compound, we have one core code that uses the links
   and the error distributions to toggle between weights.
   In all cases, the Poisson model for the observed
   counts continues to make sense. But remember that,
   although we make a statistical model for the expected
   COUNTS, that model involves the parameter(s) of the
   mutation-RATE function, coupled with the age-range over which
   the father was at risk of new mutations.
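   
   One possible way to put the 'linear-in-age' rate into glm form is sketched
   below. It assumes a hypothetical data frame 'denovo' with one row per child,
   the count 'n.mutations', and 'father.age' at conception -- names and starting
   values that are illustrative only, not necessarily the setup intended by the
   exercise. The point is that if the rate is B0 + B1*a, the expected COUNT by
   age A is B0*A + B1*(A^2)/2, which is linear in B0 and B1.
   
fit.lin = glm( n.mutations ~ -1 + father.age + I(father.age^2/2),
               family = poisson(link = "identity"),   # identity link: model the counts directly
               start  = c(1, 0.05),                   # rough guesses; identity link often needs them
               data   = denovo )                      # 'denovo' is a hypothetical data frame
summary(fit.lin)                                      # coefficients estimate B0 and B1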
   
   
  
 
 
 
 5.3  
 
 
1.  You will probably find it easier and
quicker to maximize each log-likelihood numerically.
However, the closed-form estimator is instructive, since
it shows what happens if one made the testing more
frequent, and thus made the time when the
infection occurred more precise.
2319 initially HIV- men were tested at the 6-month, 
or 0.5 year follow-up, and 19 of them were found to be HIV+, 
and the remaining 2300 were found to be HIV-.
The probability of observing these is proportional to
S^(2300) times (1-S)^19, where S is the probability of
going a HALF year without becoming HIV+.
But, parametrically, S = exp[- lambda * 0.5],
where lambda is the infection rate, i.e.,
lambda = 0.02 means 2 infections per 100
man-years at risk.
 
So the log-likelihood is 19 log(1-S) + 2300 log(S),
or 19 log(1-exp[ * ] ) + 2300 log(exp[ * ]). 
It is easy to see that the
MLE for the parameter S is 2300/2319, and then that
the MLE for lambda itself is - 2 times log(2300/2319).
(the 2 is to have the rate
for a full year)
The expansion of log(1 - x) about x=0 is
-x - x^2/2 - x^3/3 - etc....
So if we expand -log(2300/2319) = -log(1 - 19/2319), we
get 19/2319 + (19/2319)^2/2 + (19/2319)^3/3 + ...
If we ignore all but the first term we get 
lambda.hat.1 = 2 times 19/2319. This is slightly TOO SMALL.
The numerator of 19 is OK, but the DENOMINATOR
of 2319 is TOO LARGE: there were 2319 HIV-
at the beginning of the 6 months, but fewer than that no.
HIV- at the end of the 6 months, so the plot of
No. of persons (y axis) versus time (x axis, up to t=0.5)
does not form a rectangle of area 2319 x 0.5.
Instead it has 19 departures from the HIV- pool, at 19
(unknown) time locations along the way. The 19 amounts
of HIV+ time should be subtracted from the denominator.   
If we ignore all but the first  AND SECOND terms we get the 
slightly bigger
lambda.hat.2 =  2 times { 19/2319 + (19/2319)^2/2 }.
This second one makes a lot of sense. 
Write 19/2319 as 'L1', so that lambda.hat.2 = 2 times
{ L1 + (L1)^2/2 }.
Re-arranging lambda.hat.2 
so that the numerator is 19,
we have  lambda.hat.2  =  2 times 19/???, and solving for
??? , we get lambda.hat.2  = 2 times 19/2309.539.
Think of the 2309.539 as 2319 - 9.461241, 
or 2300 + 9.539. The latter is very close to 
2300 + 19/2. And so we could think of the total
HIV- person time as 2300 'full' intervals
of 6 months each, and each of the 19 'converters'
as contributing about 1/2 an interval.
Further corrections would refine this further.
In the limit, the MLE of the infection rate,
- 2 times log(2300/2319),
equals 19/1154.743, just as if we had
0.5 years of HIV- time from each
of 2 x 1154.743 = 2309.487 men. It would take a few
more correction terms to reach this.
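 
A quick R check of these numbers:
 
-2 * log(2300/2319)                         # exact MLE: events per man-year
2 * ( 19/2319 )                             # first-order approximation (slightly too small)
2 * ( 19/2319 + (19/2319)^2/2 )             # second-order approximation = 2 * 19/2309.539
19 / 1154.743                               # the exact MLE, written as 19 / person-years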
 
5.4 
 
Use the datafile with one record per patient,
and the Surv(t,event-indicator) form.
Note the way JH has referred to the indicator. It is 1
if the follow-up was terminated by the event; it is 0
if the follow-up was terminated because you (or the authors)
had to do the statistical analysis at the time you (they) did.
Also, try NOT to call these types of analyses 'time-to-event'
analyses. They result in an estimated 2-year or 3-year or 4-year
RISK. The majority of the subjects in this study
will never have the event of interest. So it makes no sense
to talk about the time when they will have the event.
THINK Risk. Or THINK Rates, from which you can calculate RISK. DON'T THINK 'time to event', no matter
how statistically sophisticated it makes you sound.
Try NOT to use the 'lost to follow-up' term. 
Very few patients in these industry-sponsored trials are
lost to follow-up: the doctors are paid big money
to ensure they are not. Use 'censored'
(or administratively censored) instead. 
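 
A minimal sketch of the corresponding R call; 'dat', 't' and 'event' are
placeholder names for whatever the datafile uses, and t is taken to be in years:
 
library(survival)
fit = survfit( Surv(t, event) ~ 1, data = dat )   # K-M from one record per patient
summary( fit, times = c(2, 3, 4) )                # 2-, 3- and 4-year RISK = 1 - surv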
   
5.5    
The Norwegian article is the one from  Stavanger, Norway.
Note that it is based on 5 years of data, from 1989 to 1993 inclusive.
 
In part 5, check whether the incidence rates reported in 
columns 2 and 3 of Table 1 were calculated using the numerators
in columns 4 and 5 (confirmed cases) or 8 and 9 (suspected cases).
The denominators in columns 6 and 7 are AVERAGES of the population 
sizes for the 5 years. Thus the number of person-years is 5 times
the average population size.
Norway (like the other Scandinavian countries) has a registration system that is the envy of
epidemiologists elsewhere. The authorities know at all times
who is living where. So (over say 5 years)
it is easy to compute
the actual no. of person-years spent in each age-band.
In other countries, the best that can be done is to take the numbers
when the census was taken (say every 5 or 10 years) and interpolate using
supplementary information from a number of sources. These often have to be
corrected once the next census data become available.
You should not think of these incidence rates as hazard rates. They
did not follow all individuals from birth to late in life. They did not even
follow them for the 5 years in question. They got their
numerators from the hospitals. They got their person-years (PY) by
multiplying the average population size by 5 years.
These PY were lived by an open population, not a (closed) cohort.
During that time, there was migration into and out of Norway.
The key is that the cases that occurred on say November 2, 1990
arose from the population living in Norway on that date. 
JH is not sure what they did with cases that occur in tourists.
Incidentally,
a male relative of the instructor
had his appendix removed while a tourist (at age 9) in
Northern Australia in the 1990s.

5.7
JH thinks the professor arrived at the mean of 68 years
by noting that the median survival is reached 40 years
after 1912, when the average age was 28. 
To try to see where he erred, track these people back an
average of 28 years, to the year 1912 - 28 = 1884.
Now look up the life expectancy AT BIRTH in the lifetables of
1884 (or look at the life tables (in blue) in JH's animations on his
Bridge of Life webpage).
Now ask: how many of those OTHER people born in 1884 were alive (and thus
eligible for comparison with the Titanic survivors) in 1912? 
How many of them died before their 1st or 5th or even 28th birthday?
So, is he making a fair comparison between the REMAINING lives
of the Titanic survivors and the TOTAL 
lives of those born in 1884?
This 'blunder' is all too common, and has now been given its own
name. The individual times (28 years, on average) between
being born and surviving the Titanic are called "immortal", since
one could not be a Titanic passenger unless one had lived until 1912.
In the paper "Avoiding blunders involving immortal time" (and associated material),
JH and colleague
Beth Foster have reviewed the history of this blunder,
going back to the writings of the eminent and influential English
'vital statistician' William Farr.
The 'immortal time' blunder in comparing the longevity of
people who developed skin cancer with people who did not is probably
the BIGGEST such blunder of all time. As we (and Keiding) say in IJE, the
authors should have known something was wrong with their comparison,
when they arrived at a p-value of < 2 times 10 to the power - 308 (Fig 1,
page 1490 of their article). This is close to the smallest P-value
we have found in the literature! We got 5 times 10 to the power -324.
Their almost-off-the-scale P-value is a reminder that with BIG DATA
one can be VERY (and PRECISELY!!) WRONG.
Reference 4 in the BMJ article on the Titanic survivors
(Spencer FJ. Premature death in jazz musicians: fact or fiction?)
is another example of this same type of blunder that the professor made. 
This link    explains why.