Statistical Methods for Survival Data Analysis, by Elisa T. Lee and John Wenyu Wang.
Praise for the Third Edition (Statistics in Medical Research).
Updated and expanded to reflect the latest developments, Statistical Methods for Survival Data Analysis, Fourth Edition continues …
Third Edition. Elisa T. Lee and John Wenyu Wang, Department of Biostatistics and Epidemiology and Center for …
Feinleib and MacMahon report that the agreement between the observed and calculated distributions is striking for each group except for women with chronic lymphocytic leukemia.
Bibliographic remarks conclude each chapter.
Reviews: "…an easy-to-read introduction to survival analysis which covers the major concepts and techniques of the subject."
In addition, patients with high triglyceride levels tend to have a higher incidence of retinopathy (p = …). However, patients who had developed macrovascular disease at the time of the baseline examination had a lower retinopathy incidence.
Many of these patients may have developed retinopathy, particularly the patients who have died, but were not included. Therefore, the lower incidence of retinopathy in patients who had macrovascular disease at baseline is probably the result of a selection bias.
Similarly, the large number of deaths plus the losses to follow-up may also contribute to the drop in retinopathy rate in patients who had had diabetes for more than 12 years at baseline.
Examine the simultaneous relationship of the variables to the development of retinopathy.
Univariate analysis of each variable using the contingency table or the chi-square test gives a preliminary idea of which individual variables might be of prognostic importance. The simultaneous effect of all the variables can be analyzed by the linear logistic regression model discussed in Section …. The results are consistent with those in the univariate analysis.
The linear logistic regression model is useful in identifying important risk factors. However, complete measurements of all the variables are needed; missing data are a problem. In this example, complete data are available on most of the patients.
This may not always be the case. Although there are methods of coping with missing data (discussed in Section …), it is extremely important for investigators to make every effort to obtain complete data on every subject.

Bibliographical Remarks
It is impossible to cite all the published examples of survival data analysis similar to those in this chapter. Other similar studies can be found in the literature: ….

Although the data can be used for various analyses throughout the book, the reader is asked here only to describe in detail how the data can be analyzed. The data appear in examples and other exercises in subsequent chapters.

Exercise Table 3.…
a ND, not done.
The investigator is interested in the response and survival of the patients and in identifying prognostic factors. How would you analyze the data? How would you analyze the data to answer these questions? Identify the important prognostic factors that are associated with survival.
a Number of years prior to treatment; negative value, no nephrectomy.
b 1, combined chemotherapy and immunotherapy; 2, others.
c 0, no response; 1, complete response; 2, partial response; 3, stable; 4, increasing disease; 9, unknown.
d 1, dead; 0, alive; 9, unknown.
a +, never in remission during study period.
b 1, relapsed; 0, still in remission; 9, never in remission during study period.
c 1, dead; 0, still alive.
d In millimeters; 99, unknown.
Lee et al.
a 1, normal; 2, borderline; 3, abnormal.
b 0, no; 1, yes.

Unfortunately, the simple method of Example 2.… Nonparametric or distribution-free methods are quite easy to understand and apply. Of the three survival functions, survivorship or its graphical presentation, the survival curve, is the most widely used.

Section 4.… With the increased availability of computers, the product-limit (PL) method is applicable to small, moderate, and large samples. However, if the data have already been grouped into intervals, or the sample size is very large, say in the thousands, or the interest is in a large population, it may be more convenient to perform a life-table analysis.
The PL estimates and life-table estimates of the survivorship function are essentially the same. Many authors use the term life-table estimates for the PL estimates. The only difference is that the PL estimate is based on individual survival times, whereas in the life-table method, survival times are grouped into intervals.
The PL estimate can be considered a special case of the life-table estimate in which each interval contains only one observation. In addition, the chapter introduces the standardized mortality ratio and the standardized incidence ratio.

Conceptually, we consider this group of patients as a random sample from a much larger population of similar patients. We relabel the n survival times $t_1, t_2, \ldots$ …. Following (2.…), … Equations (4.…) …. The following example illustrates the method (Example 4.…, Table 4.…).
From Figure 4.…, the estimated median survival time is 8 months. A more accurate estimate can be obtained by linear interpolation: …. However, when the median survival time must be estimated from a survival curve, a smooth curve such as Figure 4.… …
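The interpolation step can be sketched in Python. This is a minimal sketch that assumes complete (uncensored) data, so that $\hat S(t)$ is simply the proportion of patients surviving longer than t; the function names are illustrative, not from the book. The median is placed on the straight line joining the last point with $\hat S(t) > 0.5$ and the first point with $\hat S(t) \le 0.5$.

```python
def empirical_survival(times):
    """Return (t, S(t)) for each distinct time, with S(t) = proportion surviving longer than t."""
    n = len(times)
    return [(t, sum(1 for x in times if x > t) / n) for t in sorted(set(times))]


def interpolated_median(times):
    """Median survival time by linear interpolation of the empirical survival curve."""
    # Prepend (0, 1): every patient is alive at time zero.
    points = [(0.0, 1.0)] + empirical_survival(times)
    for (t1, s1), (t2, s2) in zip(points, points[1:]):
        if s1 > 0.5 >= s2:
            # Solve the line through (t1, s1) and (t2, s2) for S = 0.5.
            return t1 + (s1 - 0.5) * (t2 - t1) / (s1 - s2)
    return None  # the curve never falls to 0.5
```

For survival times 2, 4, and 6, $\hat S$ falls from 2/3 at t = 2 to 1/3 at t = 4, so the interpolated median is 3, whereas reading the step function alone would give 4.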
This method can be applied only if all the patients are followed to death. The rationale can be illustrated by the following simple example. At the end of the year, 20 additional patients join the study.
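Suppose that the study terminates at the end of … and you want to estimate the proportion of patients in the population surviving two years or more, that is, S(2). Kaplan and Meier argue that the second sample, under observation for only one year, can contribute to the estimate of S(2), because surviving two years means surviving the first year and then surviving the second year given survival through the first. That is, $S(2) = P(\text{survive year 1}) \times P(\text{survive year 2} \mid \text{survive year 1})$. Therefore, the second proportion in (4.…) …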
Column 1 contains all the survival times, both censored and uncensored, in order from smallest to largest.
The second column, labeled i, consists of the corresponding rank of each observation in column 1. The third column, labeled r, pertains to uncensored observations only. Let r = i. To summarize this procedure, let n be the total number of patients whose survival times, censored or not, are available.
The values of r are consecutive integers 1, 2, …. The following example illustrates the calculation procedures. … Six patients relapse at 3.… …. It is a conservative estimate. For the data in Example 4.…, …. In Example 4.…, ….
Table 4.…: Calculation of the PL estimate of S(t) in Example 4.…
Let t denote the observed remission time, uncensored or censored, in Table 4.….
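The tabular procedure translates directly into a short Python sketch, assuming the standard rank form of the PL estimator, $\hat S(t) = \prod (n - r)/(n - r + 1)$, the product running over the ranks r of uncensored observations with $t_{(r)} \le t$. It also assumes the usual convention that when an uncensored and a censored time are tied, the uncensored one is ranked first; `product_limit` is a hypothetical helper name.

```python
def product_limit(times, events):
    """Product-limit (Kaplan-Meier) estimate in rank form.

    times  -- observed survival times, censored or not
    events -- 1 for an uncensored time (death/relapse), 0 for a censored time
    Returns [(t, S(t))] at each distinct uncensored time.
    """
    n = len(times)
    # Rank all n observations; at a tie, the uncensored observation comes first.
    ordered = sorted(zip(times, events), key=lambda te: (te[0], -te[1]))
    s = 1.0
    estimates = []
    for i, (t, d) in enumerate(ordered, start=1):  # i is the rank
        if d != 1:
            continue  # censored observations only shrink the later risk sets
        s *= (n - i) / (n - i + 1)  # r = i for an uncensored observation
        if estimates and estimates[-1][0] == t:
            estimates[-1] = (t, s)  # tied deaths: keep the value after all of them
        else:
            estimates.append((t, s))
    return estimates
```

For remission times 1, 2+, 3, 4 (the + marking a censored time), the sketch gives $\hat S$ = 0.75, 0.375, and 0 at t = 1, 3, and 4.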
One can adapt this code to obtain the PL estimate of S(t) for any observed survival time data, censored or uncensored. The estimated S(t) is plotted in Figure 3.…. The median tumor-free time is approximately … days. Consider the data in Example 4.…. The mean survival time is estimated using (4.…). For example, if in Example 4.… …, Var = …. Kaplan and Meier suggest that (4.…) …. Consider the survival times in Example 4.…. The sample mean is $\bar t = \ldots$, and Var = 7.…. The Kaplan–Meier method provides very useful estimates of survival probabilities and a graphical presentation of the survival distribution.
Breslow and Crowley and Meier (…b) have shown that, under certain conditions, the estimate is consistent and asymptotically normal. However, a few critical features should be mentioned.
The Kaplan–Meier estimates are limited to the time interval in which the observations fall. If the largest observation is uncensored, the PL estimate at that time equals zero. Although this estimate may not be welcomed by physicians, it is correct, since no one in the sample lives longer. The most commonly used summary statistic in survival analysis is the median survival time. However, the solution may not be unique. Consider Figure 4.…. A practical way to handle the situation is to use probabilities of surviving a given length of time, say 1, 3, or 5 years, or the mean survival time limited to a given time t.
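The mean survival time limited to a given time can be computed as the area under the PL curve from 0 to that time. The sketch below assumes this common area-under-the-curve definition together with the rank form of the PL estimate; `restricted_mean` is a hypothetical name, and by default the limit is the largest observed time.

```python
def restricted_mean(times, events, limit=None):
    """Area under the product-limit curve from 0 up to `limit`.

    events -- 1 for an uncensored time, 0 for a censored time.
    By default `limit` is the largest observed time.
    """
    n = len(times)
    limit = max(times) if limit is None else limit
    ordered = sorted(zip(times, events), key=lambda te: (te[0], -te[1]))
    area, s, last_t = 0.0, 1.0, 0.0
    for i, (t, d) in enumerate(ordered, start=1):
        if t > limit:
            break
        if d == 1:
            area += s * (t - last_t)       # rectangle under the current step
            s *= (n - i) / (n - i + 1)     # PL drop at this uncensored time
            last_t = t
    return area + s * (limit - last_t)     # last step, up to the limit
```

With no censoring and the default limit, this equals the ordinary sample mean (2.5 for times 1, 2, 3, 4). Censoring the time 2 in that sample raises the estimate to 2.875, because the censored patient's share of probability is carried forward on the curve.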
The PL method assumes that the censoring times are independent of the survival times. In other words, the reason an observation is censored is unrelated to the cause of death. This assumption is true if the patient is still alive at the end of the study period. However, the assumption is violated if the patient develops severe adverse effects from the treatment and is forced to leave the study before death, or if the patient dies of a cause other than the one under study (e.g., …). When there is inappropriate censoring, the PL method is not appropriate. In practice, one way to alleviate the problem is to avoid it or to reduce it to a minimum. As with other estimators, the standard error (S.E.) of the PL estimate …
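A common way to obtain the standard error of the PL estimate is Greenwood's formula, which in rank form reads $\widehat{\mathrm{Var}}[\hat S(t)] = \hat S(t)^2 \sum 1/[(n - r)(n - r + 1)]$, summed over the uncensored ranks r with $t_{(r)} \le t$. The sketch below assumes that formula, which is the usual choice although it is not shown here; the function name is hypothetical.

```python
import math


def pl_with_greenwood(times, events):
    """PL estimate with Greenwood standard errors, in rank form.

    events -- 1 for an uncensored time, 0 for a censored time.
    Returns [(t, S(t), se)] at each distinct uncensored time.
    """
    n = len(times)
    ordered = sorted(zip(times, events), key=lambda te: (te[0], -te[1]))
    s, greenwood_sum, out = 1.0, 0.0, []
    for i, (t, d) in enumerate(ordered, start=1):
        if d != 1:
            continue
        s *= (n - i) / (n - i + 1)
        if n - i > 0:  # the term is undefined once S(t) has dropped to zero
            greenwood_sum += 1.0 / ((n - i) * (n - i + 1))
        row = (t, s, s * math.sqrt(greenwood_sum))  # S.E. = S(t) * sqrt(sum)
        if out and out[-1][0] == t:
            out[-1] = row  # tied deaths: keep the value after all of them
        else:
            out.append(row)
    return out
```

Summing the rank-form terms one death at a time gives the same result as the grouped Greenwood term $d/[n_j(n_j - d)]$ when several deaths are tied, so tied times need no special case beyond collapsing the rows.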
The life-table method has been used by actuaries, demographers, governmental agencies, and medical researchers in studies of survival, population growth, fertility, migration, length of married life, length of working life, and so on. There has been a decennial series of life tables on the entire U.S. population. States and local governments also publish life tables. As clinical and epidemiologic research has become more common, the life-table method has been applied to patients with a given disease who have been followed for a period of time.
Life tables constructed for patients are called clinical life tables. Although population and clinical life tables are similar in calculation, the sources of required data are different.
The cohort has to be followed from … until all of them die. The proportion of deaths (or survivors) is then used to construct life tables for successive calendar years.
This type of table, useful in population projection and prospective studies, is not often constructed, since it requires a long follow-up period. The starting point is birth at year 0. Two sources of data are required for constructing a population life table: …. For example, a current U.S. … The current life table, based on the life experience of an actual population over a short period of time, gives a good summary of current mortality.
This type of life table is regularly published by government agencies of different levels. One of the most often reported statistics from current life tables is the life expectancy. The term population life table is often used to refer to the current life table.
In the United States, the National Center for Health Statistics publishes detailed decennial life tables after each decennial census. These complete life tables use one-year age groups. Between censuses, annual life tables are also published. Tables 4.… … The abridged table in Table 4.… … Current life tables usually have the following columns:

Age interval [x, x + t). This is the time interval between two exact ages x and x + t; t is the length of the interval. For example, the interval 20–21 includes the time from the 20th birthday up to, but not including, the 21st birthday.
Proportion of persons alive at the beginning of the age interval but dying during the interval, $_tq_x$. The information is obtained from census data. For example, $_1q_{20}$ for the age interval 20–21 is the proportion of persons who died on or after their 20th birthday and before their 21st birthday.
It is an estimate of the conditional probability of dying in the interval given the person is alive at age x. This column is usually calculated from data of the decennial census of population and deaths occurring in the given time interval.
For example, the mortality rates in Table 4.… … This column is the foundation of the life table, from which all of the other columns are derived.

Number living at beginning of age interval, $l_x$. The initial value of $l_x$, the size of the hypothetical population, is usually 100,000 or 1,000,000. The successive values are computed using the formula $l_{x+t} = l_x(1 - {}_tq_x)$. For example, in Table 4.… …

Number dying during age interval, $_td_x$: $_td_x = l_x - l_{x+t}$.

Stationary population, $_tL_x$ and $T_x$.
Here $_tL_x$ is the total number of years lived in the ith age interval, or the number of person-years that $l_x$ persons, aged exactly x, live through the interval. For those who survive the interval, their contribution to $_tL_x$ is the length of the interval, t.
For those who die during the interval, we may not know the exact time of death, and the survival time must be estimated. Thus, $_tL_x = \ldots$. The symbol $T_x$ is the total number of person-years lived beyond age x by persons alive at that age, that is, $T_x = {}_tL_x + {}_tL_{x+t} + \cdots$. For example, according to the U.S. Decennial Life Tables for …–…, Vol. …, and … Life Tables, National Vital Statistics Reports, Vol. …, …. This means that, according to the mortality rates of …–…, newborns are expected to live … years. The life expectancy of a population is a general indication of the capability of prolonging life.
It is used to identify trends and to compare longevity. The overall life expectancy indicates an improvement in longevity in the United States over the time period. Population life tables can be constructed for various subgroups. For example, there are published life tables by gender, race, cause of death, as well as those which eliminate certain causes of death.
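The column relationships of a current life table can be sketched in Python. For illustration only, the sketch assumes that persons dying in an interval live on average half of it, so that $_tL_x = t(l_x - {}_td_x) + \tfrac{t}{2}\,{}_td_x$, and that the last interval is closed with $_tq_x = 1$; published tables treat the first year of life and the open-ended last interval more carefully. Function and argument names are hypothetical.

```python
def current_life_table(widths, q, radix=100_000):
    """Columns l, d, L, T, e of an abridged current life table.

    widths -- interval lengths t; q -- proportions dying tqx (the last should be 1)
    Assumes, for illustration, that those dying in an interval live half of it.
    """
    k = len(q)
    l = [float(radix)]
    for i in range(k - 1):
        l.append(l[i] * (1 - q[i]))                      # l_{x+t} = l_x (1 - tqx)
    d = [l[i] * q[i] for i in range(k)]                  # tdx = l_x * tqx
    L = [widths[i] * (l[i] - d[i]) + widths[i] / 2 * d[i] for i in range(k)]
    T = [sum(L[i:]) for i in range(k)]                   # person-years beyond age x
    e = [T[i] / l[i] for i in range(k)]                  # expectation of life at age x
    return l, d, L, T, e
```

With two 10-year intervals, $_{10}q_x$ = 0.5 and 1, and a radix of 100, the sketch gives $T_x$ = 1000 and 250 person-years and life expectancies of 10 and 5 years.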
Berkson and Gage and Cutler and Ederer give a life-table method for estimating the survivorship function; Gehan provides methods for estimating all three functions: survivorship, density, and hazard. The life-table method requires a fairly large number of observations, so that survival times can be grouped into intervals. Similar to the PL estimate, the life-table method incorporates all survival information accumulated up to the termination of the study.
In this way, the life-table technique uses incomplete data such as losses to follow-up and persons withdrawn alive as well as complete death data.
The columns are described below.

Interval $[t_i, t_{i+1})$. The interval is from $t_i$ up to but not including $t_{i+1}$.

Midpoint $t_{mi}$. The midpoint of each interval is designated $t_{mi}$. … Both functions (the estimated density and hazard) are plotted against $t_{mi}$.

Width $b_i$. The width of each interval, $b_i = t_{i+1} - t_i$.

Number lost to follow-up $l_i$. This is the number of people who are lost to observation and whose survival status is thus unknown in the ith interval.

Number withdrawn alive $w_i$. People withdrawn alive in the ith interval are those known to be alive at the closing date of the study. The survival time recorded for such persons is the length of time from entrance to the closing date of the study.

Number dying $d_i$. This is the number of people who die in the ith interval. The survival time of these people is the time from entrance to death.

Number entering the ith interval $n'_i$. …

Number exposed to risk $n_i$: $n_i = n'_i - \tfrac{1}{2}(l_i + w_i)$. Therefore, people lost or withdrawn in the interval are exposed to risk of death for one-half the interval. If there are no losses or withdrawals, $n_i = n'_i$.

The conditional proportion dying is $\hat q_i = d_i / n_i$. It is an estimate of the conditional probability of death in the ith interval given exposure to the risk of death in the ith interval.

The cumulative proportion surviving, $\hat S(t_i)$, is an estimate of the survivorship function at time $t_i$; it is often referred to as the cumulative survival rate. For i = 1, $\hat S(t_1) = 1$, and for i > 1, $\hat S(t_i) = \hat S(t_{i-1})(1 - \hat q_{i-1})$. …

Sacher derives an estimate of the hazard function by assuming that the hazard is constant within an interval but varies among intervals.
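The columns of a clinical life table can be computed in one pass. This sketch takes the number exposed to risk as $n_i = n'_i - (l_i + w_i)/2$, sets $\hat q_i = d_i/n_i$, $\hat p_i = 1 - \hat q_i$, and updates the cumulative survival rate with $\hat S(t_{i+1}) = \hat S(t_i)\hat p_i$. It also assumes the density and hazard estimates commonly attributed to Gehan, $\hat f(t_{mi}) = \hat S(t_i)\hat q_i/b_i$ and $\hat h(t_{mi}) = 2\hat q_i/[b_i(1 + \hat p_i)]$, which are not written out here. All names are hypothetical, and every interval, including the last, is given a finite width.

```python
def clinical_life_table(n_start, lost, withdrawn, deaths, widths):
    """Actuarial (clinical) life table, one dict per interval.

    n_start -- number entering the first interval
    lost, withdrawn, deaths, widths -- l_i, w_i, d_i, b_i for each interval
    """
    rows = []
    entering = n_start
    S = 1.0                                     # S(t_1) = 1
    for l_i, w_i, d_i, b_i in zip(lost, withdrawn, deaths, widths):
        at_risk = entering - (l_i + w_i) / 2    # losses/withdrawals count for half the interval
        q = d_i / at_risk                       # conditional proportion dying
        p = 1 - q
        rows.append({
            "entering": entering, "at_risk": at_risk, "q": q, "p": p, "S": S,
            "f": S * q / b_i,                   # density at the interval midpoint
            "h": 2 * q / (b_i * (1 + p)),       # hazard at the interval midpoint
        })
        S *= p                                  # S(t_{i+1}) = S(t_i) * p_i
        entering -= l_i + w_i + d_i             # n'_{i+1}
    return rows
```

For instance, with 100 patients entering, 4 lost, 6 withdrawn, and 19 deaths in the first one-year interval, 95 are exposed to risk, $\hat q_1$ = 0.2, and 71 patients enter the second interval with $\hat S$ = 0.8.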
Another interesting measure that can be obtained from the life table is the median remaining lifetime at time $t_i$, …. Survival time is computed from time of diagnosis in years. The life table uses 16 intervals of one year. The hazard rate is generally higher after the tenth year. Hence, the prognosis for a patient who has survived one year is better than that for a newly diagnosed patient if factors such as age, gender, and race are not considered. A similar interpretation is reached by examining the estimated median remaining lifetimes.
Initially, the estimated median remaining lifetime is 5.… years. It reaches a peak of 6.… …. The median survival time, either read from the survival curve or computed using (4.…), … (Gehan, …). Then the following SAS code can be used to produce a clinical life table such as Table 4.….
The relative survival rate evaluates the survival experience of patients in terms of the general population. To provide a more precise measure of the relationship of the observed and expected survival rates, Cutler et al. …. Using the notation in Table 4.…, …. The Connecticut life table for white females, …–…, is used in calculating the expected survival rate. The relative survival rates are plotted in Figure 4.….

Berkson suggests using a corrected survival rate. This is the survival rate if the disease under study alone is the cause of death. If $p_A$ denotes the survival rate when cancer alone is the cause of death, Berkson proposes that $p_A = \ldots$. The rate $p_A$ may be computed at any time after the initiation of follow-up; it provides a measure of the proportion of patients that escaped a death from cancer up to that point.
For example, the standardized mortality or morbidity ratio (SMR) is frequently used in occupational epidemiology as a measure of risk, and the standardized death rate is commonly used in comparing mortality experiences of different populations or of the same population at different times.
The concept of the SMR is very similar to that of the relative survival rate described above. The standardized morbidity ratio can be calculated simply by replacing the word deaths by disease cases in (4.…). If only new cases are of interest, we call the ratio the standardized incidence ratio (SIR).
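The SMR is commonly computed as the observed number of deaths divided by the number expected if the study group had experienced a reference population's group-specific (e.g., age-specific) rates. A minimal sketch under that assumption, since the book's exact equation is not shown here; passing new disease cases instead of deaths gives the SIR. Names and numbers are hypothetical.

```python
def standardized_ratio(observed, person_years, reference_rates):
    """SMR (or SIR): observed events / events expected under the reference rates.

    person_years    -- persons or person-years of the study group in each group
    reference_rates -- event rates of the reference (standard) population, same groups
    Returns (ratio, expected).
    """
    expected = sum(n * r for n, r in zip(person_years, reference_rates))
    return observed / expected, expected
```

With 30 observed deaths, 1000 and 2000 person-years in two age groups, and reference rates of 0.01 and 0.005, the expected count is 20 and the SMR is 1.5, often reported as 150 when multiplied by 100.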
If the populations are similar with respect to demographic variables such as age, gender, or race, the crude rate, or ratio of the number of persons to whom the event under study occurred to the total number of persons in the population, can safely be used for comparison.
The level of the crude rate is affected by demographic characteristics of the population for which the rate is computed. If populations have different demographic compositions, a comparison of the crude rates may be misleading. This is mainly because there is a large proportion of older people in Sunny City.
A crude death rate of a population may be relatively high merely because the population has a high proportion of older people; it may be relatively low because the population has a high proportion of younger people.
Thus, one should adjust the rate to eliminate the effects of age, gender, or other differences. The procedure of adjustment is called standardization, and the rate obtained after standardization is called the standardized rate. The most frequently used methods for standardization are the direct method and the indirect method.

Direct Method
In this method a standard population is selected. The distribution across the groups with different values of the demographic characteristic (e.g., …) …
Let $r_1, \ldots, r_k$ be the group-specific rates of the population being studied, and let $p_1, \ldots, p_k$ be the proportions of the standard population in the k groups. The formula for the direct standardized rate is

$R = \sum_{i=1}^{k} r_i p_i$.

If we choose a standard population whose distribution is shown in the second column of Table 4.…, …. These standardized rates are more reliable than the crude rates for comparison purposes.

… In this case, it is possible to standardize the rate by an indirect method if the following are available:
The number of persons to whom the event being studied occurred in the population, D. For example, if the death rate is being standardized, D is the number of deaths.
The distribution across the various groups for the population being studied, denoted by $n_1, \ldots, n_k$.
The crude rate of the standard population, denoted by r.
Thus, the indirect method adjusts the crude rate of the standard population by the ratio of the observed to expected number of persons to whom the event occurred in the population under study.
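Both standardization methods reduce to weighted sums. In this minimal sketch, `direct_rate` applies the study population's group-specific rates $r_i$ to the standard proportions $p_i$. `indirect_rate` computes the expected count from the study population's group sizes $n_i$ and the standard population's group-specific rates, then scales the standard crude rate by observed/expected. The `standard_rates` input is an assumption, since that ingredient is not listed here; all names and numbers are hypothetical.

```python
def direct_rate(group_rates, standard_proportions):
    """Direct standardized rate R = sum_i r_i * p_i.

    group_rates          -- rates r_i of the population being studied, by group
    standard_proportions -- proportions p_i of the standard population (sum to 1)
    """
    return sum(r * p for r, p in zip(group_rates, standard_proportions))


def indirect_rate(observed, group_sizes, standard_rates, standard_crude_rate):
    """Indirect standardized rate: standard crude rate times observed/expected.

    observed            -- D, number of events in the population being studied
    group_sizes         -- n_i, size of each group in the population being studied
    standard_rates      -- group-specific rates of the standard population (assumed input)
    standard_crude_rate -- r, crude rate of the standard population
    """
    expected = sum(n * s for n, s in zip(group_sizes, standard_rates))
    return standard_crude_rate * observed / expected
```

The direct method needs the study population's own group-specific rates. The indirect method needs only its total count D and its group sizes, which is why it is used when the group-specific rates are unavailable or unstable.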
The U.S. … The crude death rate of Oklahoma (9.…) …. However, the indirect standardized rates show a reverse relationship (8.… …). This, again, is because of the differences in age distribution. There is a higher proportion of people below the age of 25 in Arizona and a higher proportion of people above the age of 54 in Oklahoma.
Table 4.… (data from Grove and Hetzel): age groups <1, 1–4, 5–14, 15–24, 25–34, 35–44, 45–54, 55–64, 65–74, 75–84, 85+, and Total; columns for the U.S. standard population and, for each state, crude rates per thousand, observed deaths, expected deaths, and standardized rate per thousand.

… Hence, this selection should be done carefully. When discussing death rate by age, Shryock et al. …
If the death rates of two populations are being compared, it is best to use the average of the two distributions as a standard. No matter which method is used, standardized rates are meaningful only when compared with similarly computed rates.

Berkson, Berkson and Gage, Cutler and Ederer, and Gehan have written classic reports on life-table analysis. Peto et al. … The term life-table analysis that they use includes the PL method. Other references on life tables are, for example, Armitage and Shryock et al. Relative survival rates and corrected survival rates have been used by Cutler and co-workers in a series of survival studies on cancer patients in Connecticut in the …s and …s (Cutler et al.). Discussions of SMR, standardized rates, and related topics can be found in many standard epidemiology textbooks: ….

… Compare your results with Figure 3.….
What is the median survival time? Compute and plot the PL estimates of the survivorship functions for each group. What is the median survival time for each? … Measurements less than 10 × 10 (5 × 5 for mumps) are considered negative. Exercise Table 4.… … Provide a life table like Table 4.…. Plot the three survival functions. (Parker et al.)
Copyright American Medical Association. Do a complete life-table analysis of the survival time. Compute the direct standardized death rate for the states of Oklahoma and Montana using the U.S. … population as the standard.

Exercise Table 4.…: U.S. standard population (thousands) and proportion $p_i$ by age group. Sources: Grove and Hetzel; Shryock et al.

A laboratory researcher may want to compare the tumor-free times of two or more groups of rats exposed to carcinogens. A diabetologist may wish to compare the retinopathy-free times of two groups of diabetic patients.
A clinical oncologist may be interested in comparing the ability of two or more treatments to prolong life or maintain health. Almost invariably, the disease-free or survival times of the different groups vary.
These differences can be illustrated by drawing graphs of the estimated survivorship functions, but that gives only a rough idea of the difference between the distributions.
A statistical test is necessary. In Section 5.… … Section 5.… … Let $x_1, \ldots$ be the observations in group 1. In group 2, let $y_1, \ldots$ …. For example, the Wilcoxon test or the Mann–Whitney U-test can test the equality of two independent populations, and the sign test can be used for paired or dependent samples (Marascuilo and McSweeney). All the tests are designed to handle censored data; data without censored observations can be considered a special case.
For the purpose of illustration, let us assume that the alternative hypothesis is $H_1$: …. From either (5.…) … Otherwise, a continuity correction of 0.… …. Since W has an asymptotically normal distribution with mean zero and variance given in (5.…), …. Then U = ….

Example 5.… … At the end of two years, the following times to relapse (remission times, in months) are recorded: CMF (group 1): …. In fact, the approximate p value corresponding to Z = … …. Here R(t) is called the risk set at time t.
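Gehan's statistic W can also be computed by scoring each observation directly. $U_i$ is the number of observations in the pooled sample known to be shorter than observation i minus the number known to be longer, and W is the sum of $U_i$ over group 1. The sketch assumes this pairwise scoring, Mantel's permutational variance $n_1 n_2 \sum U_i^2 / [(n_1 + n_2)(n_1 + n_2 - 1)]$, and the convention that a censored time tied with a death counts as the longer one. Names are hypothetical.

```python
import math


def definitely_less(a, b):
    """True if observation a = (time, event) is known to be shorter than observation b."""
    (ta, da), (tb, db) = a, b
    # a must be an observed death; a censored b tied with a is treated as the longer.
    return da == 1 and (ta < tb or (ta == tb and db == 0))


def gehan_test(group1, group2):
    """Gehan's generalized Wilcoxon test from pairwise scores U_i.

    Each group is a list of (time, event) pairs, event = 1 for death, 0 for censored.
    Returns W (sum of group-1 scores), its permutational variance, and Z = W / sqrt(Var W).
    """
    pooled = group1 + group2
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    scores = []
    for obs in pooled:
        shorter = sum(definitely_less(other, obs) for other in pooled)
        longer = sum(definitely_less(obs, other) for other in pooled)
        scores.append(shorter - longer)
    W = sum(scores[:n1])
    var_W = n1 * n2 * sum(u * u for u in scores) / (n * (n - 1))
    return W, var_W, W / math.sqrt(var_W)
```

A positive W indicates that group 1 tends to survive longer. The scores of the pooled sample always sum to zero, which gives a quick check on hand calculations.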
Step 1. Rank from left to right, omitting censored observations.
Step 2. Assign the next-higher rank to censored observations.
Step 3. Reduce the rank of tied observations to the lower rank for the value.
Step 4. Rank from right to left.
…
Step 6. Reduce the rank of tied observations to the lowest rank for the value.
Step 7. Reduce the rank of censored observations to 1.
Step 8. …

From group 1, …. The total number of observations, failures or censored, in R(t) is r(t) = …. The following example illustrates the procedure. There are k = … …. The p value corresponding to Z = … ….

The scores are functions of the logarithm of the survival function (Table 5.…). Censored observations receive negative scores. The w scores sum identically to zero for the two groups together.
The logrank test is based on the sum S of the w scores of the two groups.
The permutational variance of S is given by

$\mathrm{Var}(S) = \dfrac{n_1 n_2 \sum_{i=1}^{n_1+n_2} w_i^2}{(n_1+n_2)(n_1+n_2-1)}$,

and $S/\sqrt{\mathrm{Var}(S)}$ is referred to the standard normal distribution to test $H_0$ against …. The following example illustrates the computational procedures.

For example, at t = …, …. For an uncensored observation, $w_i = \ldots$ …. The 10 scores $w_i$ sum to zero, which can be used to check the computation. From sample 2, …. The statistic S = …. The variance of S, computed by (5.…), is …. Hence, the test statistic L = …. The logrank statistic S can be shown to equal the sum of the failures observed minus the conditional failures expected computed at each failure time, or simply the difference between the observed and expected failures in one of the groups.
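The score form of the logrank test can be sketched as follows. The book's score formula is not recoverable here, so the sketch assumes one common version, $w = \delta - \hat H(t)$. Here δ is 1 for a death and 0 for a censored time, and $\hat H(t)$ is the Nelson–Aalen estimate of the cumulative hazard (deaths divided by number at risk, summed over death times up to t). With these scores, the pooled scores sum exactly to zero and the group-1 sum S equals observed minus expected deaths. Names are hypothetical.

```python
import math


def logrank_scores(pooled):
    """Scores w = delta - H(t) for (time, event) pairs, with H the Nelson-Aalen estimate."""
    cumulative, H = 0.0, {}
    for t in sorted({t for t, d in pooled if d == 1}):
        at_risk = sum(1 for u, _ in pooled if u >= t)
        deaths = sum(1 for u, d in pooled if u == t and d == 1)
        cumulative += deaths / at_risk
        H[t] = cumulative

    def H_at(t):
        # Cumulative hazard just after t (includes any deaths at exactly t).
        return max((h for u, h in H.items() if u <= t), default=0.0)

    return [d - H_at(t) for t, d in pooled]


def logrank_test(group1, group2):
    """Return S (sum of group-1 scores), its permutational variance, and L = S / sqrt(Var S)."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    w = logrank_scores(group1 + group2)
    S = sum(w[:n1])
    var_S = n1 * n2 * sum(x * x for x in w) / (n * (n - 1))
    return S, var_S, S / math.sqrt(var_S)
```

For group 1 = 1, 3+ and group 2 = 2, 4, the scores are 0.75, −7/12, 5/12, and −7/12, so S = 1/6. Group 1 has one observed death and 2/4 + 1/3 = 5/6 expected, so O − E is also 1/6.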
A similar version of the logrank test is a chi-square test which compares the observed number of failures to the expected number of failures under the hypothesis.
The number of deaths expected at an uncensored time is obtained by multiplying the deaths observed at that time by the proportion of patients exposed to risk in the treatment group.
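The observed-versus-expected version can be sketched directly from the risk sets. At each distinct death time, the deaths are split between the groups in proportion to the numbers at risk. The statistic is assumed here to take the usual approximate form $\sum_g (O_g - E_g)^2/E_g$ over the two groups, with 1 degree of freedom. Names are hypothetical.

```python
def logrank_chisquare(group1, group2):
    """Approximate logrank chi-square: sum over groups of (O - E)^2 / E."""
    pooled = [(t, d, 1) for t, d in group1] + [(t, d, 2) for t, d in group2]
    observed = {1: 0, 2: 0}
    expected = {1: 0.0, 2: 0.0}
    for t in sorted({t for t, d, _ in pooled if d == 1}):
        at_risk = [g for u, _, g in pooled if u >= t]
        deaths = sum(1 for u, d, _ in pooled if u == t and d == 1)
        for g in (1, 2):
            # deaths at t times the proportion of those at risk who are in group g
            expected[g] += deaths * at_risk.count(g) / len(at_risk)
    for _, d, g in pooled:
        observed[g] += d
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in (1, 2))
    return observed, expected, chi2
```

The expected counts always add up to the total number of deaths. For group 1 = 1, 3+ and group 2 = 2, 4, they are 5/6 and 13/6, and the statistic equals 3/65.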
The remission times in months are: …. Consider the following null and alternative hypotheses: …. Thus, d = …. …

Similar to the logrank test, this test assigns a score to every observation. For an uncensored observation t, the score is $u = \hat S(t+) + \hat S(t-) - 1$, and for an observation censored at T, the score is $u = \hat S(T) - 1$, where $\hat S$ is the Kaplan–Meier estimate of the survival function. If we use the notation of Section 5.…, …
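Peto and Peto's generalized Wilcoxon scores can be sketched with a PL estimate computed on the pooled sample. $\hat S(t+)$ is the value after any deaths at t, and $\hat S(t-)$ is the value just before t. A censored time tied with a death is assumed to follow it. The test statistic then reuses the permutational variance of the logrank test. Names are hypothetical.

```python
import math


def km_step_function(pooled):
    """Return S(t, left=False): the PL estimate from (time, event) pairs.

    left=False gives S(t+) (after deaths at t); left=True gives S(t-).
    """
    steps, s = [], 1.0
    for t in sorted({t for t, d in pooled if d == 1}):
        at_risk = sum(1 for u, _ in pooled if u >= t)
        deaths = sum(1 for u, d in pooled if u == t and d == 1)
        s *= 1 - deaths / at_risk
        steps.append((t, s))

    def S(t, left=False):
        value = 1.0
        for u, v in steps:
            if u < t or (u == t and not left):
                value = v
        return value

    return S


def peto_wilcoxon_scores(pooled):
    """u = S(t+) + S(t-) - 1 for a death at t; u = S(T) - 1 for a time censored at T."""
    S = km_step_function(pooled)
    return [S(t) + S(t, left=True) - 1 if d == 1 else S(t) - 1 for t, d in pooled]


def peto_wilcoxon_test(group1, group2):
    """Same procedure as the logrank test, but with the u scores."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    u = peto_wilcoxon_scores(group1 + group2)
    S = sum(u[:n1])
    var_S = n1 * n2 * sum(x * x for x in u) / (n * (n - 1))
    return S, var_S, S / math.sqrt(var_S)
```

For the pooled sample 1, 2+, 3, the scores are 2/3, −1/3, and −1/3. Early deaths receive large positive scores, which is why this test weights early differences more heavily than the logrank test does.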
The test procedure after the scores are assigned is the same as for the logrank test. Using the scores of group 1, we obtain S = …. … It is for singly censored or complete samples; it is not applicable to progressively censored data.
The procedure is as follows:
1. Rank the observations in the combined sample.
2. Replace the ranks by the corresponding expected order statistics in sampling from the unit exponential distribution [$f(t) = e^{-t}$]. Denote by $t_{(r)}$ the expected value of the rth observation in increasing order of magnitude:

$t_{(r)} = \dfrac{1}{n} + \dfrac{1}{n-1} + \cdots + \dfrac{1}{n-r+1}$.

In particular, $t_{(1)} = 1/n$. When two or more observations are tied, the average of the scores is used.
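The scoring step can be sketched in Python. The expected unit-exponential order statistics are partial sums of 1/n, 1/(n − 1), …, and tied observations share the average of the scores of the positions they occupy. The F ratio at the end assumes the complete-sample form $F = (T_1/n_1)/(T_2/n_2)$, where $T_j$ is the sum of scores in group j, referred to an F distribution with $2n_1$ and $2n_2$ degrees of freedom. That form and all names are assumptions rather than the book's stated formula.

```python
def exponential_scores(values):
    """Replace each value by the expected unit-exponential order statistic of its rank.

    t_(r) = 1/n + 1/(n-1) + ... + 1/(n-r+1); tied values share the average score.
    """
    n = len(values)
    expected, acc = [], 0.0
    for r in range(1, n + 1):
        acc += 1.0 / (n - r + 1)
        expected.append(acc)
    order = sorted(range(n), key=lambda k: values[k])  # positions in increasing order
    scores = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = sum(expected[i:j + 1]) / (j - i + 1)  # tied block i..j
        for k in range(i, j + 1):
            scores[order[k]] = average
        i = j + 1
    return scores


def cox_f(group1, group2):
    """F = (T1/n1) / (T2/n2) for complete samples, with T_j the sum of scores in group j."""
    scores = exponential_scores(group1 + group2)
    n1, n2 = len(group1), len(group2)
    return (sum(scores[:n1]) / n1) / (sum(scores[n1:]) / n2)
```

Because the expected order statistics of a unit exponential sample of size n add up to n, the scores always sum to n, with or without ties.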
Critical regions for testing $H_0$: …. The calculation of F is slightly different for singly censored data. … Then there are p = … …. Cox suggests using the scores $t_{(1)}, \ldots$ …. The following example illustrates the computation. Six mice are assigned to treatment A and six to treatment B. The experiment is terminated after 30 days. The following survival times in days are recorded: …. Our null and alternative hypotheses are $H_0$: …. To compute the test statistic, it is convenient to set up a table like Table 5.….
The second column contains the ordered exponential scores $t_{(r)}$. In this case, n = …. The scores are computed following (5.…). For example, $t_{(r)}$ for t = … is …. Tied observations receive an average score: …. The last two columns of Table 5.… … Thus, … = ….

… They can be further grouped into two categories: …. In the logrank test, if the statistic S is the sum of w scores in group 2, it is the same as U of the Cox–Mantel test. This can be seen in Examples 5.….
There is little difference between the Cox–Mantel and logrank tests, or between the two generalized Wilcoxon tests. When the samples are taken from Weibull distributions with a constant hazard ratio (i.e., …), …. However, when the hazard ratio is nonconstant, the two generalizations of the Wilcoxon test have more power than the other tests.
Thus, the logrank test is more powerful than the Wilcoxon tests in detecting departures when the two hazard functions are parallel (proportional hazards) or when there is random but equal censoring, and when there is no censoring in the samples (Crowley and Thomas). The generalized Wilcoxon tests appear to be more powerful than the logrank test for detecting many other types of differences, for example, when the hazard functions are not parallel, and when there is no censoring and the logarithm of the survival times follows the normal distribution with equal variance but possibly different means.
The generalized Wilcoxon tests give more weight to early failures than later failures, whereas the logrank test gives equal weight to all failures. Therefore, the generalized Wilcoxon tests are more likely to detect early differences in the two survival distributions, whereas the logrank test is more sensitive to differences in the right tails.
If heavy censoring exists, the test statistic is dominated by a small number of early failures and has very low power. There are situations in which neither the logrank nor Wilcoxon test is very effective.
When the two distributions differ but their hazard functions or survivorship functions cross, neither the Wilcoxon nor the logrank test is very powerful, and it is sensible to consider other tests. For example, Tarone and Ware discuss general statistics of similar form using scores, and Fleming and Harrington and Fleming et al. …. The latter approach is shown to be more effective than the logrank or Wilcoxon tests when two survival distributions differ substantially for some range of t values, but not necessarily elsewhere.
These statistics have not been widely applied. Interested readers are referred to the original papers. The test has been used in many clinical and epidemiological studies as a method of controlling the effects of confounding variables. For example, in comparing two treatments for malignant melanoma, it would be important to adjust the comparison for a possible confounding variable such as stage of the disease.
In studying the association of smoking and heart disease, it would be important to control the effects of age. Let s be the number of strata, and let $n_{ij}$ be the number of individuals in group j, j = 1, 2, in the ith stratum. … For each of the s strata, the data can be represented by a 2 × 2 contingency table: …. Thus, the test permits simultaneous comparison over all the s contingency tables of the difference in survival or death probabilities for the two groups.
This statistic follows the chi-square distribution with 1 degree of freedom. The following two examples illustrate the use of the test.
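The stratified statistic can be sketched as follows, assuming the usual Mantel–Haenszel form without a continuity correction. Each stratum's 2 × 2 table has a deaths and b survivors in group 1 and c deaths and d survivors in group 2, with n = a + b + c + d. Then $E(a) = (a + b)(a + c)/n$, $\mathrm{Var}(a) = (a + b)(c + d)(a + c)(b + d)/[n^2(n - 1)]$, and $\chi^2 = [\sum (a - E(a))]^2 / \sum \mathrm{Var}(a)$. The table layout and names are hypothetical; some presentations subtract 1/2 from $|\sum (a - E(a))|$ before squaring.

```python
def mantel_haenszel(tables):
    """Mantel-Haenszel chi-square (1 df) over strata.

    tables -- list of (a, b, c, d): group 1 deaths, group 1 survivors,
              group 2 deaths, group 2 survivors, one tuple per stratum.
    """
    diff, var = 0.0, 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        expected_a = (a + b) * (a + c) / n                       # E(a) under H0
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff += a - expected_a
    return diff * diff / var
```

Summing the differences across strata before squaring lets consistent differences in each stratum accumulate, while differences in opposite directions partly cancel.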
The study subjects are then divided into two strata: …. The following tables give the data for smokers, classified by elevated cholesterol: …. The null hypothesis is that the two survival distributions are the same. It is not necessary to set up 10 contingency tables for the 10 intervals. The chi-square value is easily calculated by constructing columns 7 to 12 directly from the life table. It should be noted that this chi-square test statistic, when applied to life tables, gives more weight to deaths that occur in early time intervals than to those that occur later.
That is, if the two groups are subject to the same probability of surviving through the entire study period, (5.…) …. Mantel gives the following illustration. Consider two groups of … persons each. Both have 50 deaths. The 2 × 2 table (group by deaths and survivors) for the first interval is …, and for the second interval it is …. From these two tables we have E(d) = …. The total deaths expected is 25 + ….

The problem is to decide whether the K independent samples can be regarded as coming from the same population, or, in practical terms, to see if the survival data from patients receiving the K treatments provide enough evidence to conclude that the K treatments are not equally effective.
This problem has been considered by many statisticians. In this section two nonparametric tests for the problem are presented. The first is the Kruskal–Wallis H-test; the second is a generalization of the H-test to censored data (Peto and Peto). Both use ranks instead of the original observations and are simple to apply.
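Let N be the total number of independent observations in the K samples, $n_j$ the number of observations in the jth sample, $j = 1, 2, \ldots, K$, and $t_{ij}$ the ith observation in the jth sample. Rank all N observations together from smallest to largest, and let $r_{ij}$ be the rank of $t_{ij}$. Compute, for $j = 1, 2, \ldots, K$, the rank sum $R_j = \sum_{i=1}^{n_j} r_{ij}$. The test statistic is

$$H = \frac{12}{N(N+1)} \sum_{j=1}^{K} \frac{R_j^2}{n_j} - 3(N+1).$$

Under the null hypothesis that the K samples come from the same population, H is approximately distributed as chi-square with K − 1 degrees of freedom when the samples are not too small. Tied observations are given the average of the ranks they would otherwise receive. To correct for the effects of ties, H is divided by $1 - \sum_{i=1}^{g} (m_i^3 - m_i)/(N^3 - N)$, where g is the number of groups of tied observations and $m_i$ is the number of observations in the ith group. In counting g, an untied observation is considered as a tied group of size 1. Thus, a general expression of H corrected for ties is

$$H = \frac{\dfrac{12}{N(N+1)} \displaystyle\sum_{j=1}^{K} \frac{R_j^2}{n_j} - 3(N+1)}{1 - \displaystyle\sum_{i=1}^{g} \frac{m_i^3 - m_i}{N^3 - N}}.$$

The following example illustrates the use of the test (data in Table 5.). The purpose of the study is to decide whether the three diets are equally effective in controlling cholesterol level.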
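In this case K = 3, so H is referred to the chi-square distribution with K − 1 = 2 degrees of freedom. Beyond the overall test in Example 5., the investigator may also be interested in knowing which particular diets differ from one another.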
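A minimal Python sketch of the tie-corrected H statistic. Tied observations receive midranks, and each tied group of size m contributes m³ − m to the correction, so untied observations (m = 1) contribute nothing, matching the counting rule above. The function names and the diet values in the usage lines are hypothetical; they are not the data of the example.

```python
import math
from collections import Counter
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """Rank pooled observations 1..N; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(samples: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H with the correction for ties."""
    pooled = [x for sample in samples for x in sample]
    n_total = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for sample in samples:
        r_j = sum(ranks[start:start + len(sample)])  # rank sum R_j
        total += r_j ** 2 / len(sample)
        start += len(sample)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    # Each tied group of size m adds m^3 - m; untied values (m = 1) add 0.
    ties = sum(m ** 3 - m for m in Counter(pooled).values())
    return h / (1 - ties / (n_total ** 3 - n_total))


def p_value_df2(h: float) -> float:
    """Chi-square upper tail for K - 1 = 2 df (three samples): exp(-h / 2)."""
    return math.exp(-h / 2)


# Hypothetical cholesterol reductions under three diets (not the text's data).
diets = [[10, 12, 15, 9], [20, 18, 22, 19], [14, 12, 16, 13]]
h = kruskal_wallis_h(diets)
print(round(h, 3), round(p_value_df2(h), 4))
```

With three samples the chi-square tail with 2 degrees of freedom has the closed form exp(−H/2), which keeps the sketch free of dependencies; for other values of K a chi-square table or a statistics library would be needed.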
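In this section we introduce some nonparametric methods for multiple comparisons based on Kruskal–Wallis rank sums. An excellent treatment of multiple comparisons is given by Miller. The null hypothesis is that the K samples come from identical populations, and the procedures decide, pair by pair, which samples differ. When the sample sizes are equal, that is, $n_1 = n_2 = \cdots = n_K = n$, the comparisons can be based directly on differences between the rank sums $R_j$. For cases of small, unequal sample sizes $n_1, n_2, \ldots, n_K$, tabulated critical values are needed.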
Values of x are given in Table B. To examine which particular diets differ from one another, we apply Equation 5. Since K = 3, there are $K(K-1)/2 = 3$ pairs of diets to compare; the calculation is shown in Table 5.
The K-sample test discussed in this section can be considered an extension of these tests and of the Kruskal–Wallis test.
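The pairwise diet comparisons described above can be sketched in Python with one commonly used large-sample rule, swapped in here as an illustration: a Dunn-type procedure that declares samples i and j different when the absolute difference of their mean ranks exceeds $z_{\alpha/[K(K-1)]}\sqrt{N(N+1)/12\,(1/n_i + 1/n_j)}$, a Bonferroni split of the overall level α over the K(K − 1)/2 pairs. It may differ in detail from the equal-size and small-sample rules based on the tabulated x values, and it ignores ties in the variance. The function name and data are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence, Tuple


def pairwise_mean_rank_tests(
    samples: Sequence[Sequence[float]], alpha: float = 0.05
) -> List[Tuple[int, int, float, float, bool]]:
    """Return (i, j, |mean-rank difference|, critical value, declared different?)."""
    pooled = sorted(x for s in samples for x in s)
    n_total = len(pooled)

    def midrank(x: float) -> float:
        # x occupies sorted positions first .. first + count - 1 (1-based).
        first = pooled.index(x) + 1
        return first + (pooled.count(x) - 1) / 2

    mean_ranks = [sum(midrank(x) for x in s) / len(s) for s in samples]
    k = len(samples)
    # Two-sided Bonferroni: each pair is tested at alpha / (K(K - 1)/2).
    z = NormalDist().inv_cdf(1 - alpha / (k * (k - 1)))
    results = []
    for i, j in combinations(range(k), 2):
        diff = abs(mean_ranks[i] - mean_ranks[j])
        se = sqrt(n_total * (n_total + 1) / 12 * (1 / len(samples[i]) + 1 / len(samples[j])))
        results.append((i, j, diff, z * se, diff > z * se))
    return results


# Hypothetical cholesterol reductions under three diets (not the text's data).
diets = [[10, 12, 15, 9], [20, 18, 22, 19], [14, 12, 16, 13]]
for row in pairwise_mean_rank_tests(diets):
    print(row)
```

Pairs flagged True would be declared different at overall level α; the three pairs printed correspond to the K(K − 1)/2 = 3 comparisons for three diets.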
Suppose that we have a set of N scores $w_1, w_2, \ldots, w_N$, one assigned to each of the N pooled observations. The sum of the N scores is zero. Let $S_j$ be the sum of the scores in the jth sample. The null hypothesis $H_0$ states that the K samples are from the same distribution. In this case, K = 3. A table similar to Table 5. can be constructed to obtain the scores; the computation is left to the reader as an exercise. The sums of scores in the three samples, $S_1$, $S_2$, and $S_3$, follow from that table.
The scores for the logrank test were proposed by Peto and Peto, along with another generalization of the Wilcoxon test.
In the same paper, they also discuss the K-sample test for censored data. The logrank test is also discussed in Peto et al. The Kruskal–Wallis one-way analysis of variance can be found in most standard textbooks under nonparametric methods.
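The K-sample score test described earlier in this section can be sketched in Python. Two assumptions are made here. First, the scores are logrank-type scores $w = \delta - \hat\Lambda(t)$, where δ is 1 for a death and 0 for a censored time and $\hat\Lambda(t)$ is the cumulative hazard estimate $\sum d/n$ over death times up to t; this is one common construction, and Peto and Peto's handling of ties may differ. Second, the statistic takes the permutation form $X^2 = (N-1)\sum_j S_j^2/n_j \big/ \sum_i w_i^2$, referred to chi-square with K − 1 degrees of freedom. With these scores the N scores sum to zero, as required. Function names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# One observation: (time, status), status 1 = death, 0 = censored.
Obs = Tuple[float, int]


def logrank_scores(data: Sequence[Obs]) -> List[float]:
    """Score each observation as status minus estimated cumulative hazard at its time."""
    death_times = sorted({t for t, s in data if s == 1})
    cum_hazard: Dict[float, float] = {}
    running = 0.0
    for u in death_times:
        at_risk = sum(1 for t, _ in data if t >= u)
        deaths = sum(1 for t, s in data if t == u and s == 1)
        running += deaths / at_risk
        cum_hazard[u] = running
    scores = []
    for t, s in data:
        # Cumulative hazard at t: value at the last death time not after t.
        past = [cum_hazard[u] for u in death_times if u <= t]
        scores.append(s - (past[-1] if past else 0.0))
    return scores


def k_sample_score_test(samples: Sequence[Sequence[Obs]]) -> float:
    """(N - 1) * sum_j S_j^2 / n_j / sum_i w_i^2, compared with chi-square, K - 1 df."""
    pooled = [obs for sample in samples for obs in sample]
    n_total = len(pooled)
    w = logrank_scores(pooled)
    total, start = 0.0, 0
    for sample in samples:
        s_j = sum(w[start:start + len(sample)])  # S_j, sum of scores in sample j
        total += s_j ** 2 / len(sample)
        start += len(sample)
    return (n_total - 1) * total / sum(x * x for x in w)


# Hypothetical remission times (weeks) under three treatments; status 0 = censored.
groups = [
    [(6, 1), (9, 1), (12, 0), (15, 1)],
    [(4, 1), (5, 1), (8, 1), (10, 0)],
    [(11, 1), (14, 0), (18, 1), (20, 0)],
]
print(round(k_sample_score_test(groups), 3))
```

For K = 2 the statistic reduces to $S_1^2/\operatorname{Var}(S_1)$ with the permutation variance $n_1 n_2 \sum w_i^2 / [N(N-1)]$, which is why this form is a natural K-sample extension of the two-sample score tests.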
Readers who are interested in the theoretical development or further properties of these tests should read the original papers cited above. Applications of these tests are given in the original papers and can easily be found in medical and epidemiological journals.
Do you get the same result as in Example 5.?
Is an elevated percentage of standard BMI associated with renal cell carcinoma after controlling for the effects of gender? The data are given in Exercise Table 5., with BMI expressed as a percentage of standard BMI.
It is known that under … (see Exercise Table 5.).
The students were randomly assigned to the three levels. If the levels differ, determine which levels differ from one another.
Are the four treatments equally effective?
In this chapter, several theoretical distributions that have been widely used to describe survival time are discussed, their characteristics summarized, and their applications illustrated. Researchers began to choose the exponential distribution to describe the life pattern of electronic systems.
Davis gives a number of examples, including bank statement and ledger errors, payroll check errors, automatic calculating machine failures, and radar set component failures, in which the failure data are well described by the exponential distribution.
Epstein and Sobel explain why they selected the exponential distribution over the popular normal distribution and show how to estimate its parameter when data are singly censored.
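For the exponential distribution with parameter λ, the survivorship function is $S(t) = e^{-\lambda t}$ and the density is $f(t) = \lambda e^{-\lambda t}$. Two properties that make it easy to recognize, a constant hazard and a straight-line plot of log S(t) against t, can be checked with a few lines of Python. The rate λ = 0.05 and the time points are hypothetical.

```python
import math
from typing import Sequence


def survival(t: float, lam: float) -> float:
    """Exponential survivorship function S(t) = exp(-lambda t)."""
    return math.exp(-lam * t)


def density(t: float, lam: float) -> float:
    """Exponential density f(t) = lambda exp(-lambda t)."""
    return lam * math.exp(-lam * t)


def hazard(t: float, lam: float) -> float:
    """h(t) = f(t) / S(t); for the exponential this equals lambda at every t."""
    return density(t, lam) / survival(t, lam)


def slope_of_log_survival(times: Sequence[float], lam: float) -> float:
    """Least-squares slope of log S(t) on t; the points lie on a line with slope -lambda."""
    ys = [math.log(survival(t, lam)) for t in times]
    mean_t = sum(times) / len(times)
    mean_y = sum(ys) / len(ys)
    num = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


lam = 0.05  # hypothetical failure rate per unit time
times = [1, 5, 10, 20, 40]
print([round(hazard(t, lam), 6) for t in times], slope_of_log_survival(times, lam))
```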
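The exponential distribution has since continued to play a role in lifetime studies analogous to that of the normal distribution in other areas of statistics. It is often referred to as a purely random failure pattern, because its hazard rate is constant: the risk of failure does not depend on how long an individual has already survived (see Figure 6.). With survivorship function $S(t) = e^{-\lambda t}$, taking natural logarithms gives $\log S(t) = -\lambda t$, a linear function of t; a plot of $\log S(t)$ against t is therefore a straight line through the origin with slope $-\lambda$.
Example 6. The system consists of injecting a tumor inoculum into inbred mice.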
These tumor cells then proliferate and eventually kill the animal, but survival time may be prolonged by an active drug. The data are given in Table 6. (from Zelen). Estimation procedures are discussed in Chapter 7.
The Weibull distribution is a generalization of the exponential distribution. However, unlike the exponential distribution, it does not assume a constant hazard rate and therefore has broader application. The distribution was proposed by Weibull, and its applicability to various failure situations was discussed again by Weibull. It has since been used in many studies of reliability and human disease mortality.
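Thus, the Weibull distribution may be used to model the survival distribution of a population with increasing, decreasing, or constant risk. Examples of increasing and decreasing hazard rates are, respectively, patients with lung cancer and patients who undergo successful major surgery. With scale parameter λ and shape parameter γ, the probability density function and cumulative distribution function are, respectively,

$$f(t) = \lambda\gamma(\lambda t)^{\gamma - 1} \exp[-(\lambda t)^{\gamma}], \qquad F(t) = 1 - \exp[-(\lambda t)^{\gamma}], \qquad t > 0,\ \lambda > 0,\ \gamma > 0,$$

so that the hazard function $h(t) = \lambda\gamma(\lambda t)^{\gamma - 1}$ increases when γ > 1, decreases when γ < 1, and is constant (the exponential case) when γ = 1. For the survival curve, it is simple to plot the logarithm of S(t), $\log S(t) = -(\lambda t)^{\gamma}$. Equation 6. can be written as

$$\log[-\log S(t)] = \gamma \log \lambda + \gamma \log t,$$

so a plot of $\log[-\log S(t)]$ against $\log t$ is a straight line with slope γ.
In an experiment reported by Pike, two groups of rats were distinguished by pretreatment regime. The times in days after the start of the experiment at which carcinoma was diagnosed were recorded for both groups of rats. The step functions are nonparametric estimates similar to the Kaplan–Meier estimate. Figure 6. (From Pike; reproduced with permission of the Biometric Society.)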
(From Pike; reproduced with permission of the Biometric Society.)
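The Weibull linearization can be checked numerically. This sketch assumes the parameterization $S(t) = \exp[-(\lambda t)^{\gamma}]$ with hazard $\lambda\gamma(\lambda t)^{\gamma-1}$; the values λ = 0.01 and γ = 2 are hypothetical, not estimates from Pike's data. A least-squares line through the points (log t, log[−log S(t)]) should recover slope γ and intercept γ log λ, and the hazard should rise for γ > 1, fall for γ < 1, and stay at λ for γ = 1.

```python
import math
from typing import List, Sequence, Tuple


def weibull_survival(t: float, lam: float, gamma: float) -> float:
    """S(t) = exp[-(lambda t)^gamma]."""
    return math.exp(-((lam * t) ** gamma))


def weibull_hazard(t: float, lam: float, gamma: float) -> float:
    """h(t) = lambda gamma (lambda t)^(gamma - 1)."""
    return lam * gamma * (lam * t) ** (gamma - 1)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def weibull_plot_points(times: Sequence[float], lam: float, gamma: float) -> List[Tuple[float, float]]:
    """(log t, log[-log S(t)]) pairs; for a Weibull these lie on a line with slope gamma."""
    return [(math.log(t), math.log(-math.log(weibull_survival(t, lam, gamma)))) for t in times]


lam, gamma = 0.01, 2.0  # hypothetical scale and shape
pts = weibull_plot_points([10, 25, 50, 75, 100], lam, gamma)
slope, intercept = fit_line([x for x, _ in pts], [y for _, y in pts])
print(slope, intercept, gamma * math.log(lam))
```

With real data the points would come from a nonparametric estimate of S(t), and departures from a straight line would suggest that a Weibull model is inadequate.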
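It is obvious that the Weibull distributions with γ = …
If the logarithm of the survival time T is normally distributed, T is said to follow the lognormal distribution. Its origin may be traced back to McAlister, who explicitly described a theory of the distribution. Most of its aspects have since been studied, and its history, properties, estimation problems, and uses in economics have been discussed in detail by Aitchison and Brown. If $\log T$ is normal with mean μ and standard deviation σ, the probability density function of T is

$$f(t) = \frac{1}{t\sigma\sqrt{2\pi}} \exp\left[-\frac{(\log t - \mu)^2}{2\sigma^2}\right], \qquad t > 0,$$

and the hazard rate starts at 0, increases to a maximum, and then decreases toward 0 as t grows. Therefore, the lognormal distribution is suitable for survival patterns with an initially increasing and then decreasing hazard rate. By a central limit theorem, it can be shown that the distribution of the product of n independent positive variates approaches a lognormal distribution under very general conditions: the logarithm of the product is the sum of the logarithms, to which the central limit theorem applies. The popularity of the lognormal distribution is due in part to the fact that the cumulative values of $y = \log t$ can be obtained from tables of the standard normal distribution. The hazard function is obtained from the density and the survivorship function $S(t) = 1 - \Phi[(\log t - \mu)/\sigma]$, where Φ is the standard normal distribution function, as $h(t) = f(t)/S(t)$.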
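A short numerical check of the lognormal hazard shape, assuming $\log T \sim N(\mu, \sigma^2)$ with hypothetical values μ = 0 and σ = 1. Python's `statistics.NormalDist` supplies the standard normal distribution function, playing the role of the normal tables mentioned above. On a grid of times, the hazard f(t)/S(t) rises to a maximum and then falls.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def lognormal_survival(t: float, mu: float, sigma: float) -> float:
    """S(t) = 1 - Phi((log t - mu) / sigma), with log T ~ N(mu, sigma^2)."""
    return 1 - STANDARD_NORMAL.cdf((math.log(t) - mu) / sigma)


def lognormal_density(t: float, mu: float, sigma: float) -> float:
    """f(t) = phi((log t - mu) / sigma) / (sigma t)."""
    return STANDARD_NORMAL.pdf((math.log(t) - mu) / sigma) / (sigma * t)


def lognormal_hazard(t: float, mu: float, sigma: float) -> float:
    """h(t) = f(t) / S(t)."""
    return lognormal_density(t, mu, sigma) / lognormal_survival(t, mu, sigma)


mu, sigma = 0.0, 1.0  # hypothetical parameters
grid = [0.1 * k for k in range(1, 201)]  # t = 0.1, 0.2, ..., 20.0
hazards = [lognormal_hazard(t, mu, sigma) for t in grid]
peak = grid[hazards.index(max(hazards))]
print(f"hazard rises to a maximum near t = {peak:.1f}, then declines")
```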
The two-parameter lognormal distribution can also be generalized to a three-parameter distribution by replacing t with t − γ in Equation 6. In certain situations the value of γ may be determined a priori, and it should then not be regarded as an unknown parameter that requires estimation.
If this is so, the variable T − γ may be considered in place of T, and the distribution of T − γ has all the properties of the two-parameter lognormal distribution.
When γ is unknown, however, the estimation procedures developed for the two-parameter case are not directly applicable to the distribution of T − γ. (From Feinleib and MacMahon; reproduced by permission of the publisher.)
The analysis of several subgroups of patients follows. The survival time of each patient is computed in months from the date of diagnosis. The method is discussed in Chapters 7 and 8. When 1 − S(t) is plotted on lognormal probability paper, a straight line is obtained if the data follow a two-parameter lognormal distribution. An inspection of the graph shows that the plotted curve is concave. Gaddum has pointed out that such a deviation can be corrected by subtracting an appropriate constant from the survival times.
In other words, the three-parameter lognormal distribution can be used. Similar graphs for male patients with chronic myelocytic leukemia and for female patients with chronic lymphocytic or myelocytic leukemia are given in Figures 6.
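The concavity seen on lognormal probability paper, and Gaddum's correction, can be reproduced in a small sketch. It assumes a three-parameter lognormal with hypothetical threshold γ = 5, μ = 2, and σ = 0.8 (not the Feinleib and MacMahon data), and it plots the normal quantile of 1 − S(t) against log t, which is what lognormal probability paper does. Without the shift, the slopes between successive points decrease, giving a concave curve; after γ is subtracted from the times, every slope equals 1/σ.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple

PHI = NormalDist()


def cdf_three_param(t: float, gamma: float, mu: float, sigma: float) -> float:
    """1 - S(t) = Phi((log(t - gamma) - mu) / sigma) for t > gamma."""
    return PHI.cdf((math.log(t - gamma) - mu) / sigma)


def probability_paper_points(
    times: Sequence[float], shift: float, gamma: float, mu: float, sigma: float
) -> List[Tuple[float, float]]:
    """(log(t - shift), normal quantile of 1 - S(t)): lognormal probability paper coordinates."""
    return [(math.log(t - shift), PHI.inv_cdf(cdf_three_param(t, gamma, mu, sigma))) for t in times]


def successive_slopes(points: Sequence[Tuple[float, float]]) -> List[float]:
    """Slopes between consecutive plotted points."""
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]


gamma, mu, sigma = 5.0, 2.0, 0.8  # hypothetical threshold and lognormal parameters
times = [6, 8, 10, 15, 20, 30, 40]
raw = successive_slopes(probability_paper_points(times, 0.0, gamma, mu, sigma))
shifted = successive_slopes(probability_paper_points(times, gamma, gamma, mu, sigma))
print([round(s, 3) for s in raw])      # decreasing slopes: a concave curve
print([round(s, 3) for s in shifted])  # constant slope 1 / sigma: a straight line
```

Here the threshold is known by construction; with real data an appropriate constant must be chosen or estimated, which is where the two-parameter estimation procedures stop being directly applicable.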