Tutorial 3

Evaluating Clinical Decision Rules



Clinical decision rules (CDRs; also referred to as clinical prediction rules) are tools designed to augment the clinician's abilities in several areas. They are used to estimate the probability of a disease or outcome or to suggest a diagnostic or therapeutic course of action. Investigators who publish articles on CDRs have completed at least 2 steps: deriving the rule and applying it to a population. CDRs are designed for use at the bedside, in contrast to decision analysis, which is designed for use in health care policy. Formal decision analysis will not be discussed here.

Step 1: Evaluate the study design

A series of tasks is involved in developing a CDR. A list of at least 3 potential predictors of the outcome* (obtained from history, physical examination and sometimes simple diagnostic tests) is assembled. Data (predictors and outcomes) are collected on a sample of patients. Statistical analysis is applied to detect which predictors are most strongly related to the outcome and should be included in the CDR. The rule itself is stated. The rule is then applied to a patient sample. In some cases an impact analysis is performed. Wasson et al suggested criteria for the systematic evaluation of clinical decision rules in 1985 (1). These criteria were further refined by Laupacis et al in 1997 (2) and serve as the basis for this tutorial.

*The use of the word outcome in this tutorial pertains to the outcome of interest being studied. It may be a patient's outcome, diagnosis, positive test result, or any number of other entities.

Step 1a: Will the rule serve a purpose if it is valid? Does the rule make clinical sense?

CDRs are most useful in situations that are complex, in which a missed diagnosis or treatment would be harmful, or in which there is a potential opportunity for savings. Evaluating the sensibility of a rule requires knowledge of and experience with the clinical subject matter. It involves judgment rather than statistical methods. Rules that are simply stated, easy to use, and suggest a clear course of action are more likely to be used by clinicians.

Step 1b: Is the outcome of interest clearly defined and clinically important?

Death would be an example of a clearly defined and clinically important outcome which all clinicians would be able to recognize in their own patient samples when using the rule. If a surrogate outcome is used (eg DVT instead of PE) it must have a definite relationship to a clinically important outcome.

Step 1c: Are the predictors clearly defined, sensible, reliable and reproducible as determined by different clinicians?

Prediction rules will be used by a wide range of clinicians. Interobserver variability for evaluating predictors is extremely important. Some symptoms and signs and diagnostic tests are especially subject to high interobserver and intraobserver variability. Unreliable predictors should be excluded from CDRs.

Step 1d: How did the authors construct the list of potential predictors (eg based on previous studies or derived from univariate analysis; see below)? Did the authors include all elements that might have predictive power when creating the rule?

One cannot answer these questions unless the authors report how they chose the predictors, which ones were excluded and why. There are several reasons why predictors may be excluded from a CDR. The variable may have had no predictive value. The variable may have predictive value on its own, but not above that of the other predictors. The predictor may be excluded because it is unreliable.

Step 1e: Were those who assessed predictors blinded to outcomes (and vice versa)?

The importance of blinding depends upon the clinical scenario. If there is no mention of blinding you can assume that it was absent from the study design.

Step 1f: Was the study site described?

The type of study site, including the types of patients and clinicians involved in the derivation of the rule, may impact the rule's applicability at other clinical sites. The type of institution (primary, secondary or tertiary and teaching or community) and the clinical setting (emergency department, clinic, inpatient ward, etc) should be reported. A description of how patients were selected for enrollment (or referral filter) should also be provided.

Step 1g: Were the data collected prospectively?

Retrospective data collection is generally not as complete or refined as prospective data collection.

Step 2: Evaluate the results

Step 2a: Does the rule have predictive power? Look at the results and statistical method(s) used to derive the rule. Were all relevant results reported and described? Were the statistical tests used and analysis appropriate?

Results can be reported in a number of ways, and how they are reported depends upon what is being studied. All results should be reported with 95% confidence intervals so that one can assess the precision of the estimates. The sensitivity** and specificity of the rule itself may be reported. Clinicians are generally more interested in the probability of their patient having or not having the outcome of interest than in the sensitivity and specificity of a rule. Results may therefore be reported in terms of the probability of the outcome using likelihood ratios, positive and negative predictive values, receiver operating characteristic (ROC) curves, and survival curves.

** Statistical terms are highlighted in italics. See appendix for further description of these terms.
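
The measures named in Step 2a can be computed directly from a 2x2 table of rule result against outcome. The following is a minimal sketch, not a method taken from any of the cited papers. The function names, the argument names (tp, fp, fn, tn) and the counts are hypothetical, and the Wilson score method is only one of several reasonable ways to obtain an approximate 95% confidence interval for a proportion.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def probability_measures(tp, fp, fn, tn):
    """Predictive values and likelihood ratios from 2x2 counts.

    tp, fp, fn, tn are hypothetical names for true positives, false
    positives, false negatives and true negatives. This sketch assumes
    specificity is neither 0 nor 1, so that no division by zero occurs.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "ppv": tp / (tp + fp),                        # positive predictive value
        "npv": tn / (tn + fn),                        # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Invented counts, for illustration only.
measures = probability_measures(tp=90, fp=40, fn=10, tn=160)
low, high = wilson_interval(90, 100)  # interval around a sensitivity of 90/100
print(measures, round(low, 3), round(high, 3))
```

A wide interval around a reported sensitivity or predictive value suggests that the estimate is imprecise, however favorable the point estimate looks.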

Most statistical techniques used in the development of CDRs are based on multivariate regression analyses. Regression analysis looks at the ability of certain chosen factors (independent variables) to predict or cause an outcome (dependent variable). When evaluating a regression analysis, it is more important to judge the rule by the magnitude of its predictive power than by whether the relationships are statistically significant. Other statistical methods used for creating CDRs include discriminant analysis (similar to regression analysis), recursive partitioning, and neural networks. Investigators should clearly describe the mathematical method used to derive the CDR and the reason for choosing that particular method.

Step 2b: Was there an adequate sample size, including an adequate number of positive outcomes?

A general rule is that there should be at least 10 positive outcome events for every predictor in the rule (2).
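
The rule of thumb above reduces to a single division. A minimal sketch, with hypothetical function names and invented numbers:

```python
def events_per_predictor(n_positive_outcomes, n_predictors):
    """Number of positive outcome events available for each predictor."""
    if n_predictors <= 0:
        raise ValueError("the rule must contain at least one predictor")
    return n_positive_outcomes / n_predictors


def meets_rule_of_thumb(n_positive_outcomes, n_predictors, minimum=10):
    """True when there are at least `minimum` events per predictor."""
    return events_per_predictor(n_positive_outcomes, n_predictors) >= minimum


# Invented example: a 5-predictor rule derived in a sample with 45 events.
print(events_per_predictor(45, 5))   # 9.0
print(meets_rule_of_thumb(45, 5))    # False
```

Note that the count that matters is the number of patients with the outcome, not the total sample size.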

Step 2c: Were the important predictors present in a significant proportion of the study population?

Step 2d: Were important patient characteristics reported?

At a minimum, the age and gender of the patient sample should be reported. Other factors that will affect the predictors and outcomes of interest (and therefore the performance of the rule) should also be reported.

Step 2e: Was interobserver reliability of the rule assessed and reported?

Assessing interobserver reliability requires 2 examiners to be present at the same time, which may be impossible with a large study population. The solution is to have at least 2 investigators assess a subset of patients. Interobserver variability should be assessed for both predictor and outcome detection. A kappa (κ) greater than 0.6 indicates good interobserver reliability.
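
Kappa compares the agreement actually observed between 2 examiners with the agreement expected by chance alone. The sketch below computes Cohen's kappa for 2 raters. The function name and the ratings are hypothetical, and the source does not state which variant of kappa it has in mind.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who assessed the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must assess the same, non-empty sample")
    n = len(rater_a)
    # Proportion of patients on whom the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented ratings: 1 = predictor present, 0 = predictor absent.
examiner_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
examiner_2 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(round(cohen_kappa(examiner_1, examiner_2), 2))  # 0.58
```

In this invented example the examiners agree on 8 of 10 patients, yet kappa falls just below the 0.6 threshold because about half of that agreement would be expected by chance.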

Step 3: Has the rule's validity been tested? If no, proceed to step 5.

Step 3a: Was the rule applied to a narrow or broad patient sample?

A narrow application would be using the rule on the same population from which the rule was derived or on a small or very similar patient sample. A broad application would be using the rule on a large and varied number of patients and clinicians in a variety of settings. The patients in a broad sample have a wide spectrum of severity of disease.

Step 3b: Were the patients in the validation sample enrolled in an unbiased manner?

Step 3c: Were different clinicians involved in the validation study? Was interobserver reliability of the rule assessed and reported?

Step 3d: Were those assessing predictors blinded to outcomes (and vice versa)?

Step 3e: Was the rule applied in a prospective or retrospective fashion?

Step 3f: Was the rule consistent and valid when applied?

The rule may not be consistent when tested on other patients for several reasons. Associations between predictors and outcomes stated by the rule may have actually occurred by chance when it was derived. Statistical testing can usually predict if this problem would occur. The rule may only be true for certain groups of patients. Clinicians may not be able to accurately and consistently apply the rule in practice. This is why validation of the rule in other patient samples is so important.

Step 3g: Was there excessive loss to follow-up?

Step 4: Did the rule have an impact? Has an impact analysis been performed? If no, proceed to step 5.

To assess the rule's effects in real life, an impact analysis must be performed. An impact analysis involves randomizing patients, in the form of larger units (eg hospitals or clinics), to implementation or no implementation of the CDR. Outcomes (quality of life, morbidity, mortality, cost savings, etc) are followed for the 2 groups. The patient population should be different from the one(s) in which the rule was derived and validated. Both the process of implementing the rule and the outcomes should be studied.
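
Randomizing by larger units means that whole sites, rather than individual patients, are allocated to using or not using the rule. A minimal sketch of such an allocation follows. The function name, arm labels and site names are hypothetical, and a real trial would typically use a more elaborate scheme (for example stratifying by site size).

```python
import random


def randomize_sites(sites, seed=None):
    """Randomly allocate whole sites to the rule arm or the comparison arm."""
    rng = random.Random(seed)   # a seed makes the allocation reproducible
    shuffled = list(sites)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "use_rule": sorted(shuffled[:half]),
        "usual_care": sorted(shuffled[half:]),
    }


# Invented site names, for illustration only.
arms = randomize_sites(["Clinic A", "Clinic B", "Clinic C",
                        "Clinic D", "Clinic E", "Clinic F"], seed=2024)
print(arms)
```

Every patient seen at a site follows that site's allocation, which is what distinguishes this design from randomizing individual patients.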

Step 4a: Did clinicians actually use the rule?

Even the best CDR may not be used by a clinician. Clinicians may find the rule extraneous (perhaps they feel that they can estimate the outcome of interest as accurately as or more accurately than the rule does) or complicated (they may not have the time to do calculations or they may be concerned that they will not apply the rule correctly). Despite good evidence for the rule they may not believe in the CDR. They may feel that their practice setting and patient population differ too much from the study sample to be able to apply the rule. If the rule worsens the financial status of the clinician or institution it may not get used. The clinician may fear patient resistance to the rule or legal implications and subsequently not use it. As mentioned in step 1, rules that are simply stated, easy to use and suggest a course of action are more likely to be implemented.

Step 4b: Did the rule work in practice? Did it improve outcomes or decrease costs while maintaining quality of care?

Step 5: Grade the rule

You are now ready to grade the rule on a scale from 1 to 4, where 1 is the best. This hierarchy of evidence is described in more detail in JAMA's Users' Guides to the Medical Literature (3). A rule must satisfy the criteria of the level below before graduating to the next higher level. CDRs can be graded as follows:

Grade 1: Rules that will improve patient care by affecting clinician practice in a wide variety of settings.

Criteria: Impact analysis demonstrating change in clinician practice and beneficial outcome with actual use.

Grade 2: Rules that have been demonstrated to be accurate and can be used in various settings.

Criteria: Rule was accurate in a large prospective sample with a variety of patients and clinicians or in a number of smaller varied populations.

Grade 3: Rules that clinicians could reasonably consider using with caution in patients similar to those studied.

Criteria: Rule has been derived and validated in only 1 narrow prospective sample.

Grade 4: Rules that need further validation before they can be used clinically.

Criteria: Rule has been derived but either not yet validated or not yet validated prospectively.
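
Because each grade requires that the criteria of the grade below be met first, the grading can be read as a sequence of checks. The sketch below is one possible reading of the 4 grades. The function and argument names are hypothetical, and each argument stands for a judgment the reader makes after working through steps 1 to 4.

```python
def grade_rule(validated_prospectively, validated_broadly, impact_shown):
    """Return a grade from 1 (best) to 4 for a derived decision rule.

    validated_prospectively: rule held up in at least 1 prospective sample
    validated_broadly: accurate in a large varied sample or several smaller
        varied samples
    impact_shown: impact analysis showed changed practice and better outcomes
    """
    if not validated_prospectively:
        return 4   # needs further validation before clinical use
    if not validated_broadly:
        return 3   # use with caution in patients similar to those studied
    if not impact_shown:
        return 2   # accurate and usable in various settings
    return 1       # shown to improve care in actual use


print(grade_rule(True, True, False))   # 2
print(grade_rule(False, True, True))   # 4
```

The order of the checks encodes the requirement that a rule cannot skip a level: a rule with a favorable impact analysis but no prospective validation still receives grade 4 in this sketch.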

Appendix: Biostatistics

Sensitivity refers to the proportion of patients with the outcome in whom the results of the decision rule are abnormal. Specificity refers to the proportion of patients without the outcome in whom the results of the decision rule are normal. To calculate sensitivity and specificity, results must be dichotomous. The importance of a high sensitivity or specificity depends on the particular clinical circumstances. The majority of the time, clinicians would sacrifice specificity for sensitivity (ie they would rather not miss anyone with the outcome of interest, even if the rule labels more patients without the outcome as abnormal) (4).
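
The 2 definitions above can be written out as a short calculation over paired observations. This is a minimal sketch with hypothetical names and invented data. Each patient contributes one dichotomous rule result and one dichotomous outcome.

```python
def sensitivity_specificity(rule_abnormal, has_outcome):
    """Sensitivity and specificity from paired dichotomous observations."""
    tp = fn = fp = tn = 0
    for abnormal, outcome in zip(rule_abnormal, has_outcome):
        if outcome and abnormal:
            tp += 1          # outcome present, rule abnormal
        elif outcome:
            fn += 1          # outcome present, rule normal (a miss)
        elif abnormal:
            fp += 1          # outcome absent, rule abnormal
        else:
            tn += 1          # outcome absent, rule normal
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need patients both with and without the outcome")
    return tp / (tp + fn), tn / (tn + fp)


# Invented data for 6 patients.
rule = [True, True, False, True, False, False]
outcome = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(rule, outcome)
print(round(sens, 2), round(spec, 2))  # 0.67 0.67
```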

ROC curves are a method for displaying the relationship between sensitivity and specificity for tests that have continuous results. ROC curves are created by plotting sensitivity against the false positive rate (1 - specificity). The closer an ROC curve is to the upper left corner of the graph, the more accurate the test (or in this case rule). ROC curves are of limited utility because they do not allow for an estimate of post-rule probability. They are most useful for comparing 2 tests (5).
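
An ROC curve can be built by treating every observed score as a possible cutoff and recording the sensitivity and false positive rate at each one. The sketch below does this for a rule that yields a numeric score and then summarizes the curve by its area, using the trapezoid method. The area under the curve is not discussed in this tutorial and is shown only as a common way of comparing 2 curves. The function names and data are hypothetical.

```python
def roc_points(scores, outcomes):
    """(false positive rate, sensitivity) pairs, one per candidate cutoff.

    outcomes holds 1 for patients with the outcome and 0 for those without.
    A patient is called rule-positive when the score is >= the cutoff.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need patients both with and without the outcome")
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, o in zip(scores, outcomes) if s >= cutoff and o)
        fp = sum(1 for s, o in zip(scores, outcomes) if s >= cutoff and not o)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoid-rule area under a list of ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented scores for 6 patients, 3 with the outcome.
scores = [5, 4, 4, 3, 2, 1]
outcomes = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, outcomes)
print(round(area_under_curve(curve), 3))  # 0.833
```

An area of 1.0 corresponds to a curve that reaches the upper left corner, and an area of 0.5 to a rule that performs no better than chance.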

See tutorial 2 for a description of likelihood ratios.

Survival curves are useful when one wants visual information about the frequency of a disorder over time (4).

The dependent variable is also termed the target or response variable because it is influenced or determined by other variables. The explanatory or predictor variables are termed independent variables. Independent and dependent variables may be continuous (numeric values) or dichotomous/binary (such as gender). The simplest approach is to report each predictor variable against the outcome of interest in a 2x2 table. This method, a univariate or simple regression analysis, allows only one independent variable to be studied at a time. An equation can be constructed that relates the independent variable (x) to the dependent variable (y). The equation is y = a + bx, where, if plotted on a graph, x is on the x axis, y is on the y axis, a is the y intercept, and b is the slope (also referred to as the regression coefficient). A t test may be used to see whether there is a significant relationship between x and y by testing whether b is different from 0. The relationship is stronger the more closely the data points fall along a straight line.

A regression analysis is referred to as linear regression when the dependent (outcome) variable is continuous. When the dependent variable is dichotomous (eg death or no death) the analysis is referred to as logistic regression. The risk of developing the outcome can be reported as the logarithm of the odds, which allows for the calculation of an odds ratio (see tutorial 5). Univariate regression analysis has the advantage of simplicity, but does not allow one to compare relationships among independent variables (predictors) (5).
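
For a single continuous predictor, the intercept a and slope b of y = a + bx can be estimated by ordinary least squares, and the slope can be divided by its standard error to obtain the t statistic for testing whether b differs from 0. The following is a minimal sketch under those assumptions, with a hypothetical function name and invented data. Converting the t statistic to a p value requires a t distribution with n - 2 degrees of freedom and is not shown.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares intercept a, slope b, and t statistic for b."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope (regression coefficient)
    a = mean_y - b * mean_x            # y intercept
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(residual_ss / (n - 2) / sxx)   # standard error of b
    t = b / se_b if se_b > 0 else math.inf
    return a, b, t


# Invented data, for illustration only.
a, b, t = simple_linear_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(round(a, 2), round(b, 2))  # 0.05 1.99
```

The closer the points lie to the fitted line, the smaller the residual sum of squares and the larger the t statistic for the slope.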

A multivariate or multiple regression analysis is a mathematical model that simultaneously considers a number of independent variables and can demonstrate the proportion of variation in the outcome (dependent variable) associated with each independent variable and with all of the independent variables together. P values are used to report statistical significance. A p value represents the probability that a difference at least as large as the one observed between a study group and a control group would occur by chance if no true difference exists in the larger population from which these groups were sampled. When evaluating a regression analysis, it is more important to judge the rule by the magnitude of its predictive power than by whether the relationships are statistically significant (5).

Other statistical methods used for creating decision rules include discriminant analysis (similar to regression analysis), recursive partitioning, and neural networks. Discriminant analysis is chosen when the outcome variable is categorical. Recursive partitioning successively divides patients into subgroups, eventually producing 1 or more subgroups that include only patients with a particular outcome. This approach is particularly useful for creating rules that require a very high sensitivity. Discriminant analysis and multiple logistic regression analysis are especially well suited to deriving rules when high overall accuracy is desired. Neural networks are infrequently used (4).
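
The idea behind recursive partitioning can be sketched in a few lines: choose the predictor that best separates patients with and without the outcome, split on it, and repeat within each subgroup until a subgroup contains only one outcome. The sketch below uses Gini impurity to choose each split. That criterion is an assumption made for illustration, since the source does not name one, and the function names, predictor names and data are hypothetical.

```python
def gini(outcomes):
    """Gini impurity of a list of 0/1 outcomes (0 means a pure subgroup)."""
    if not outcomes:
        return 0.0
    p = sum(outcomes) / len(outcomes)
    return 2 * p * (1 - p)


def partition(patients, predictors):
    """Recursively split patients on binary predictors.

    patients: list of (findings dict, outcome) pairs, with outcome 0 or 1.
    """
    outcomes = [o for _, o in patients]
    leaf = {"n": len(patients), "events": sum(outcomes)}
    if not predictors or gini(outcomes) == 0.0:
        return leaf

    def impurity_after_split(name):
        present = [o for f, o in patients if f[name]]
        absent = [o for f, o in patients if not f[name]]
        total = len(present) * gini(present) + len(absent) * gini(absent)
        return total / len(patients)

    best = min(predictors, key=impurity_after_split)
    present = [(f, o) for f, o in patients if f[best]]
    absent = [(f, o) for f, o in patients if not f[best]]
    if not present or not absent:
        return leaf   # the best predictor does not divide this subgroup
    remaining = [p for p in predictors if p != best]
    return {"split": best,
            "present": partition(present, remaining),
            "absent": partition(absent, remaining)}


# Invented data: 8 patients, 2 binary predictors.
data = [({"abnormal_exam": e, "age_over_65": a}, o) for e, a, o in [
    (1, 1, 1), (1, 0, 1), (1, 1, 1), (0, 1, 0),
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 0)]]
print(partition(data, ["abnormal_exam", "age_over_65"]))
```

In this invented sample the first split already yields a subgroup with no outcome events, which is the kind of subgroup a high-sensitivity rule would use to identify patients who need no further testing.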

References

1. Wasson JH, Sox HC, Neff RK, et al. Clinical prediction rules: application and methodological standards. N Engl J Med. 1985;313:793-799.

2. Laupacis A, Sekar N, Stiell IG. Clinical prediction rules. JAMA. 1997;277(6):488-494.

3. McGinn TG, Guyatt GH, Wyer PC, et al. Users' guides to the medical literature XXII: How to use articles about clinical decision rules. JAMA. 2000;284(1):79-84.

4. Guyatt G, Walter S, Shannon H, et al. Basic statistics for clinicians: 4. correlation and regression. Can Med Assoc J. 1995;152(4):497-504.

5. Dawson-Saunders B, Trapp RG. Basic & Clinical Biostatistics. 2nd ed. East Norwalk, CT: Appleton & Lange; 1994.

Other Links

http://med.mssm.edu/ebm examines some commonly used clinical decision rules.