Centiles of Standardized Survival Functions

Background

In the previous tutorial I used stpm2_standsurv to obtain standardized survival functions. In this tutorial I show the first of a number of different measures of the standardized survival function where I obtain centiles of the standardized survival function.

As a reminder a centile of a survival function can be obtained by solving $S(t) = \alpha$ for $t$. For example, for the median survival time we set $\alpha = 0.5$, i.e. the 50th (per)centile. For simple parametric distributions, such as the Weibull, we can solve for $t$ analytically, but for more complex models the centile is obtained through iterative root finding techniques. In stpm2 I have used Brent’s root finder when evaluating centiles.

The centile of a standardized survival function is obtained by solving the following equation for t.

$$ E\left(S(t | X=x,Z\right) = \alpha $$

This is done through root finding (using Brent’s root finder) by solving,

$$ \frac{1}{N}\sum_{i=1}^N {S(t | X=x,Z)} - \alpha = 0 $$

Variances can be obtained using M-estimation .

Example

I use a colon cancer example. I first load and stset the data

. //use https://www.pclambert.net/data/colon, clear
. use c:/cansurv/data/colon, clear
(Colon carcinoma, diagnosed 1975-94, follow-up to 1995)

. drop if stage==0
(2,356 observations deleted)

. stset surv_mm, f(status=1,2) scale(12) exit(time 120)

     failure event:  status == 1 2
obs. time interval:  (0, surv_mm]
 exit on or before:  time 120
    t for analysis:  time/12

------------------------------------------------------------------------------
     13,208  total observations
          0  exclusions
------------------------------------------------------------------------------
     13,208  observations remaining, representing
      8,866  failures in single-record/single-failure data
 43,950.667  total analysis time at risk and under observation
                                                at risk from t =         0
                                     earliest observed entry t =         0
                                          last observed exit t =        10

I drop those with missing stage information (stage == 0). I am investigating all cause survival (status=1,2).

I fit a model that includes stage, sex and age (using a rectricted cubic splines). I assume proportional hazards, but if I relax this assusmption the syntax for stpm2_standsurv would be identical. Stage is classified as localised, regional and distant and is modelled using two dummy covariates with localised as the reference category.

. tab stage, gen(stage)

   Clinical |
   stage at |
  diagnosis |      Freq.     Percent        Cum.
------------+-----------------------------------
  Localised |      6,274       47.50       47.50
   Regional |      1,787       13.53       61.03
    Distant |      5,147       38.97      100.00
------------+-----------------------------------
      Total |     13,208      100.00

. gen female = sex==2

. rcsgen age, df(3) gen(agercs) center(60)
Variables agercs1 to agercs3 were created

. stpm2 stage2 stage3 female agercs*, scale(hazard) df(4) nolog eform

Log likelihood = -19665.932                     Number of obs     =     13,208

------------------------------------------------------------------------------
             |     exp(b)   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
xb           |
      stage2 |   1.758926   .0617187    16.09   0.000     1.642026    1.884149
      stage3 |   5.656322   .1390555    70.48   0.000      5.39024    5.935539
      female |   .8634681     .01891    -6.70   0.000     .8271894     .901338
     agercs1 |   .9958187   .0053355    -0.78   0.434     .9854161    1.006331
     agercs2 |   1.000015   .0000127     1.22   0.223     .9999906     1.00004
     agercs3 |   .9999654   .0000155    -2.24   0.025     .9999351    .9999957
       _rcs1 |   3.484444   .0392205   110.90   0.000     3.408415    3.562169
       _rcs2 |     1.1908   .0101285    20.53   0.000     1.171113    1.210817
       _rcs3 |   .9620673   .0050498    -7.37   0.000     .9522207    .9720158
       _rcs4 |   1.015331    .003491     4.43   0.000     1.008512    1.022197
       _cons |   .1615451   .0046341   -63.55   0.000     .1527132    .1708879
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.

The is a clear effect of stage with a hazard ratio of 5.66 for distant stage versus localised stage. Remember that I am modelling all cause survival and one would expect a cause-specific hazard ratio to be higher. The all-cause mortality rate for females is 14% lower than males.

I will now predict the standardized survival functions, one where I force all subjects to be male and one where I force everyone to be female.

. range tt 0 10 100
(13,108 missing values generated)

. stpm2_standsurv, at1(female 0) at2(female 1) timevar(tt) atvar(ms_male ms_female) ci

. twoway  (line ms_male ms_female tt, sort) ///
>         , yline(0.5, lpattern(dash) lcolor(black)) ///
>         yline(0.5, lpattern(dash) lcolor(black)) ///
>         xtitle("Years since diagnosis") ///
>         ytitle("S(t)", angle(h)) ///
>         ylabel(0(0.2)1, format(%3.1f) angle(h)) ///
>         legend(order(1 "Male" 2 "Female") ring(0) pos(1) cols(1))

The graph of the two standardised survival functions can be seen below.

As expected (given the hazard ratio) females have better survival than males. I have added a horizontal reference line at $S(t)=0.5$. Where this line crosses the survival curves gives the median survival time. Reading from the graph, this is just under 2 years for the males and just under 2.5 years for females. Using the centile option of stpm2_standsurv will estimate these values more accurately with 95% confidenec intervals. We are also interested in contrasts of the centiles, so use of the contrast option will calculate either a difference or ratio of the median survival times with a 95% confidence interval.

. stpm2_standsurv, at1(female 0) at2(female 1) timevar(tt) centile(50) ///
>                 atvar(med_male med_female) contrast(difference) ci

. list med_male* in 1,  ab(15)    

     +-----------------------------------------+
     |  med_male   med_male_lci   med_male_uci |
     |-----------------------------------------|
  1. | 1.9801987      1.8728516      2.0936987 |
     +-----------------------------------------+

. list med_female* in 1, ab(15)   

     +----------------------------------------------+
     | med_female   med_female_lci   med_female_uci |
     |----------------------------------------------|
  1. |  2.4249751        2.3000843        2.5566472 |
     +----------------------------------------------+

. list _contrast* in 1, ab(18)

     +----------------------------------------------------+
     | _contrast2_1   _contrast2_1_lci   _contrast2_1_uci |
     |----------------------------------------------------|
  1. |    .44477636          .30074772            .588805 |
     +----------------------------------------------------+

The median survival time is 1.98 years for males with a 95% CI (1.87 to 2.09). The median for females is 2.42 years (95% CI, 2.30 to 2.56). As I used the contrast option I also get the difference in the median of the standardised survival curves with a 95% CI. Thus the time at which 50% of females have died is 0.44 years more than the time at which 50% of males have died, 95% CI (0.30 to 0.59).

It is possible to predict for multiple centiles by passing a numlist to the centiles option. For example, the code below calculates centiles between 10 and 60 at 10 unit intervals.

. stpm2_standsurv, at1(female 0) at2(female 1) timevar(tt) centile(10(10)60) ///
>                 atvar(cen_males cen_females) contrast(difference) ci ///
>                 centvar(centiles) contrastvar(cendiff)

. list centile cen_males cen_females cendiff in 1/6, sep(0) noobs

  +----------------------------------------------+
  | centiles   cen_males   cen_fem~s     cendiff |
  |----------------------------------------------|
  |       10   .14919254   .16940399   .02021145 |
  |       20   .32365079   .38459929    .0609485 |
  |       30   .63821705   .78393263   .14571558 |
  |       40   1.1648032   1.4192444   .25444124 |
  |       50   1.9801987   2.4249751   .44477636 |
  |       60   3.4239223    4.277573   .85365069 |
  +----------------------------------------------+

We can then plot the difference in these various centiles.

. twoway (rarea cendiff_lci cendiff_uci centile, sort color(red%30)) ///
>         (line cendiff centile, color(black)) ///
>         , xtitle(centile) xlabel(,format(%3.0f)) ///
>         ytitle("Difference in centile") ylabel(0(0.2)1.2,format(%3.1f) angle(h)) ///
>         legend(off)

There are probably more innovative ways of presenting such data.

Acknowledgement

I would like to acknowledge David Druker of StataCorp who I discussed these ideas with at two Nordic Stata User group meetings. David has written a command that estimates centiles of standardized distributions using a two parameter gamma distribution which is available here.

References

Stefanski, L. & Boos, D. The Calculus of M-Estimation. The American Statistician 2002;56:29-38

Professor of Biostatistics