/*
Source
------

USA TODAY, August 1995

Sidebar to a photograph concerning a tropical storm

"Hurricanes in this century

 The number of hurricanes with winds geater than 110 mph has declined
 since the 1950s. Data list: Major hurricanes to hit the USA: (no. & decade) "

Data setup below
----------------

USA_yrs  10 for the full 10 years in the decades 1900 to 1980

          5 for the 1990's decade (1990-94 *)

              * Presuming the data given in Aug 95 exclude the 1995 'season',
                the time span included in the 1990s decade covers the 5
                seasons 1990, 1991, 1992, 1993 and 1994.

count:   No.  major hurricanes that hit the USA in indicated no. of 'USA-years'

*/

*  --- for SAS ---   (see belwoe for Stata );

data a;
input decade  USA_yrs count;
rate = count/USA_yrs ;       * compute observed rate each decade ;
LINES;
       1900      10     6
       1910      10     8
       1920      10     5
       1930      10     8
       1940      10     8
       1950      10     9
       1960      10     6
       1970      10     4
       1980      10     6
       1990       5     2
;
RUN;

title naive average (ignores fact that last obsn is for 5 years) ;

proc means data = a mean clm; var count ;   RUN;

title weighted average od rates, weights proportional to length of observation ;

proc means data = a  mean clm;
  var rate ;
  weight USA_yrs;     RUN;

title regression model.. Expected no. of events    = rate  x Denominator ;

*                    E[ count | Denominator ]  =  B1   x       X     ;

*    note that when Denominator = 0, expect 0 events.. ie intercept ("B0") = 0 ;

*    so ask that the 'constant' or 'intercept' be omitted.. ;

proc genmod data=a ;
  model count = USA_yrs / dist=poisson link=identity  noint waldci;  RUN;

title constant (rather than Poisson) variation around the expected value ;

* use constant (rather than Poisson) variation around the expected value  ;
* ( Gaussian variation is independent of mean [ homoskedastic ] )         ;

proc genmod data=a ;
  model count = USA_yrs / dist=normal link=identity  noint  waldci;   RUN;


*  XXXXXXXXXXXXXXXX  cut here  XXXXXXXXXXXXXXX ;

* ---- for Stata

 clear
input decade  USA_yrs count
       1900      10     6
       1910      10     8
       1920      10     5
       1930      10     8
       1940      10     8
       1950      10     9
       1960      10     6
       1970      10     4
       1980      10     6
       1990       5     2
end

* naive average (ignores fact that last obsn is for 5 years)

ci count

* use country-years as the denominato...

ci count, poisson exposure(USA_yrs)

* rate each decade...

gen rate = count/USA_yrs

* weighted average, weight proportional to length of observation

ci rate [aweight=USA_yrs]

* regression model.. Expected no. of events    = rate  x Denominator
*
*                    E[ count | Denominator ]  =  B1   x       X
*
*    note that when Denominator = 0, expect 0 events.. ie intercept ("B0") = 0
*
*    so ask that the 'constant' or 'intercept' be omitted..
*
glm count USA_yrs, family(poisson) link(identity) noconstant

*
* use constant (rather than Poisson) variation around the expected value
* ( Gaussian variation is independent of mean [ homoskedastic ] )
*
glm count USA_yrs, family(normal)  link(identity) noconstant


cii 95 62, poisson

cii  1 62, poisson