Tutorial 4

How to evaluate a Systematic Review or Meta-Analysis

Step 1: Are the results valid?

As with evaluating articles on treatment, diagnosis, clinical decision rules, and harm, there are several key questions to ask to determine whether the results can be useful in your clinical practice.

Step 1a: Did the study ask a focused question?

Questions addressed by review articles usually specify a patient population, an exposure (a treatment, a diagnostic test, or a harmful exposure), and the associated outcome. The relevant question should be obvious from the title or abstract of the article and should be as specific as possible.

Step 1b: Were appropriate inclusion criteria used to select the primary studies?

The specific criteria used to find primary studies should be outlined in the review article. These should cover not only the patient population, exposure, and outcome of interest, but also the methods for finding the primary studies, such as a MEDLINE search using the terms of interest. With a database as vast as MEDLINE, it is easy to find studies that support preconceived conclusions. A good review will explicitly state its inclusion criteria and include all appropriate articles regardless of outcome. For example, a company that manufactures an anti-platelet agent might be tempted to produce a systematic review of only those articles that showed a mortality benefit in MI and ignore the rest.

Step 1c: Were important relevant studies missed?

The authors of reviews need not only to search databases such as MEDLINE but also to examine article references and contact experts in the field in order to find unpublished or not yet indexed studies. Not everyone agrees on whether unpublished studies should be included in reviews, but leaving them out increases the risk of "publication bias," which arises because studies with positive outcomes are more likely to be published than studies with negative outcomes.

Step 1d: Was the validity of primary studies examined?

The strategy used to determine the validity of the primary studies should be stated and should be similar to the criteria that you would use in appraising original research. In a review of a new therapy, the methods for establishing validity of the primary studies should be the same as those used when looking at the studies individually as described in the tutorial on treatment.

Step 1e: Was there reproducible assessment of the primary studies?

Judging the validity of studies is subjective and open to variability among investigators. Having more than one person assess the primary studies to determine what to include in the review reduces errors and bias.
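How well two reviewers agree can be put into a number. The sketch below uses Cohen's kappa, which corrects the raw agreement for the agreement expected by chance. The tutorial does not name a particular statistic, so the choice of kappa, the function name, and the include/exclude decisions shown are all assumptions made for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of studies on which both reviewers agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each reviewer's own include/exclude frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both reviewers used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions by two reviewers on ten candidate studies.
reviewer_1 = ["include", "include", "exclude", "include", "exclude",
              "exclude", "include", "include", "exclude", "include"]
reviewer_2 = ["include", "include", "exclude", "exclude", "exclude",
              "exclude", "include", "include", "include", "include"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))  # 0.58
```

Here the reviewers agree on 8 of 10 studies, but about half of that agreement would be expected by chance, so kappa is noticeably lower than the raw 80% figure.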

Step 1f: Were the results of primary studies similar?

Most studies included in reviews differ in patients, exposures, outcome measures, and research methods, which makes them difficult to compare. Meta-analyses use statistical tests (tests of homogeneity) to determine whether the differences in results from study to study are compatible with chance alone. When there is statistically significant heterogeneity, it is less likely that the differences in results are due to chance and more likely that the studies are not measuring the same underlying effect and have important differences in patients, exposures, outcome measures, or research methods. A nonsignificant test does not rule out important differences between individual studies, because these tests have low power when only a few studies are included.
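As a rough illustration of a test of homogeneity, the sketch below computes Cochran's Q and the derived I-squared statistic from per-study effect estimates and their variances. The tutorial does not say which test a given review will use, so Cochran's Q is an assumption here, and the four studies and their numbers are hypothetical. Q is compared with a chi-square distribution on k - 1 degrees of freedom, where k is the number of studies.

```python
def cochran_q(effects, variances):
    """Return (Q, degrees of freedom, I-squared in percent).

    effects   : per-study effect estimates (for example, log odds ratios)
    variances : the variance of each estimate
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled effect.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributed to heterogeneity, floored at 0.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared


# Hypothetical log odds ratios and variances from four primary studies.
log_ors = [0.10, 0.30, 0.50, 0.70]
variances = [0.04, 0.04, 0.04, 0.04]

q, df, i_squared = cochran_q(log_ors, variances)
print(round(q, 2), df, round(i_squared, 1))  # 5.0 3 40.0
```

With 3 degrees of freedom, the 5% critical value of the chi-square distribution is about 7.81, so Q = 5.0 is not statistically significant even though I-squared suggests that roughly 40% of the variation may reflect real differences between studies.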

Step 2: What are the results?

Counting the number of positive and negative studies is not a reliable method for summarizing the results. Instead, primary studies are typically given more or less weight based on their size and quality. Reviews of treatment or prognostic factors often use the odds ratio (OR) to summarize the results. The OR can be expressed as the odds of having an outcome in the exposed group divided by the odds of having the outcome in the control group, or equivalently as the odds that a patient with an adverse outcome was exposed divided by the odds that a control patient (one without the outcome) was exposed.
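The two expressions for the odds ratio give the same number, and a weighted summary can be built from the individual study results. The sketch below shows both. The weighting scheme is an assumption: it uses fixed-effect inverse-variance weighting on the log odds ratio, which is one common approach and not necessarily the one a given review uses. The function names and the two studies are hypothetical, and the code assumes no cell of the 2x2 table is zero.

```python
import math


def two_by_two(events_exposed, n_exposed, events_control, n_control):
    """Return cells (a, b, c, d): exposed with/without outcome, control with/without."""
    a = events_exposed
    b = n_exposed - events_exposed
    c = events_control
    d = n_control - events_control
    return a, b, c, d


def odds_ratio_by_outcome(a, b, c, d):
    # Odds of the outcome in the exposed group / odds of the outcome in controls.
    return (a / b) / (c / d)


def odds_ratio_by_exposure(a, b, c, d):
    # Odds that a patient with the outcome was exposed /
    # odds that a patient without the outcome was exposed.
    return (a / c) / (b / d)


def pooled_odds_ratio(tables):
    """Fixed-effect inverse-variance pooling on the log odds ratio scale."""
    weighted_sum = 0.0
    total_weight = 0.0
    for a, b, c, d in tables:
        log_or = math.log(odds_ratio_by_outcome(a, b, c, d))
        variance = 1 / a + 1 / b + 1 / c + 1 / d  # approximate variance of log OR
        weight = 1 / variance  # larger studies get larger weights
        weighted_sum += weight * log_or
        total_weight += weight
    return math.exp(weighted_sum / total_weight)


# Hypothetical studies: (events in exposed, N exposed, events in control, N control).
study_1 = two_by_two(10, 100, 20, 100)
study_2 = two_by_two(30, 200, 40, 200)

or_1 = odds_ratio_by_outcome(*study_1)
or_2 = odds_ratio_by_outcome(*study_2)
pooled = pooled_odds_ratio([study_1, study_2])
print(round(or_1, 2), round(or_2, 2))  # 0.44 0.71
print(pooled)
```

The pooled value lies between the two study estimates and sits closer to the larger study, which is the behaviour that weighting by size is meant to produce.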

If the outcome measures of different studies are not the same, the results of each study can be expressed as an "effect size," and the effect sizes from the studies can then be combined. The effect size tells you how much difference there is between the control and intervention groups and is calculated as the difference in mean outcome between the two groups divided by the standard deviation (SD). Although effect sizes are open to interpretation, an effect size of 0.8 is said to be a large effect (the two groups are separated by 0.8 SDs), 0.5 a moderate effect, and 0.2 a small effect.
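The effect size calculation can be sketched in a few lines. The tutorial does not say which SD to divide by, so the pooled SD of the two groups is used here as an assumption (this version is often called Cohen's d). The outcome scores are hypothetical, and the "negligible" label for values below 0.2 is added only for illustration.

```python
import math
import statistics


def effect_size(treatment, control):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    mean_difference = statistics.mean(treatment) - statistics.mean(control)
    # Pooled variance: the two sample variances weighted by degrees of freedom.
    pooled_variance = (
        (n_t - 1) * statistics.variance(treatment)
        + (n_c - 1) * statistics.variance(control)
    ) / (n_t + n_c - 2)
    return mean_difference / math.sqrt(pooled_variance)


def describe(d):
    """Label an effect size using the 0.2 / 0.5 / 0.8 rule of thumb."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label below 0.2 is an assumption, not from the tutorial


# Hypothetical outcome scores on the same scale for each group.
treatment_scores = [12, 14, 16, 18, 20]
control_scores = [10, 12, 14, 16, 18]

d = effect_size(treatment_scores, control_scores)
label = describe(d)
print(round(d, 2), label)  # 0.63 moderate
```

Because the result is expressed in SD units, studies that measured the outcome on different scales can be placed side by side before their effect sizes are combined.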

Step 3: Will the results help me in caring for my patients?

As with the tutorials on treatment and diagnostic tests, you need to determine whether the results are applicable to your patients. Look at the patient characteristics, subgroups, and inclusion and exclusion criteria to see whether these match your patients. A large review is more likely to cover a broad range of patients, but the treatment it addresses may be more general than you would like. If the review was of glycoprotein IIb/IIIa inhibitors for MI, you may wonder whether certain drugs in this category have more of an effect than others, or whether oral inhibitors perform the same as the IV preparation. Focused reviews cannot stand alone when deciding on individual management. All clinically relevant outcomes, as well as the risks and benefits of potential therapies, diagnostic tests, information on prognosis, and risk factors, need to be weighed when making an appropriate clinical decision.

References

1. Oxman AD, Cook DJ, Guyatt GH. Users' guides to the medical literature, VI: how to use an overview. JAMA. 1994;272:1367-1371.

2. Dawson-Saunders B, Trapp RG. Basic and Clinical Biostatistics. Norwalk, CT: Appleton & Lange; 1994.