Tutorial 2

How to evaluate an article about a diagnostic test

Step 1: Are the results valid?

When looking at the validity of the study, your goal is to determine if you can believe the results of the study and if the reported diagnostic accuracy is true.

Step 1a: Was there an independent, blind comparison with a reference standard?

Those interpreting the results of the new diagnostic test should not be aware of the results of the reference test. Knowing the result of one test influences how a subsequent test is interpreted. For example, seeing the results of a CT scan may influence your ability to pick up an AAA on a subsequent physical exam.

In order to determine the accuracy of a test, it must be compared with the "gold standard" for determining the diagnosis (biopsy, autopsy, long-term follow-up, or a widely accepted test). If you compared a new blood test for the diagnosis of prostate cancer to PSA, you would not learn the accuracy of the new test for diagnosing prostate cancer (biopsy is the gold standard), only how it compares with PSA testing.

Step 1b: Did the patient sample include an appropriate spectrum of patients?

A given test is only valuable if it can distinguish between having and not having a disorder in the same population in which the test would be clinically useful. A blood test shown to be positive in 80% of a population of patients with ST elevation MI may not be useful in a general population of patients with chest pain. A more useful study might look at the blood test in patients suspected of having MI.

Step 1c: Did the test results influence the decision to perform the reference standard?

The apparent characteristics of the study test will change if its results influence whether or not a confirmatory test is performed. This "verification bias" can be problematic when the reference standard is invasive or harmful and clinicians are reluctant to have patients undergo a potentially harmful test. If only patients with a positive screening test for PE underwent pulmonary angiography, you would not know how many PEs were missed in the group that did not have angiography.
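The effect of verification bias can be illustrated with a small arithmetic sketch. Every number here is a hypothetical assumption chosen for illustration (the cohort counts, the test's true accuracy, and the policy of verifying all test-positive patients but only 10% of test-negative patients); none of it comes from a real study.

```python
# Hypothetical full cohort (illustrative assumption, not study data):
# 100 patients with disease, 900 without.
tp, fn = 80, 20     # diseased patients: test positive / test negative
fp, tn = 90, 810    # non-diseased patients: test positive / test negative

true_sensitivity = tp / (tp + fn)   # 0.80
true_specificity = tn / (tn + fp)   # 0.90

# Assumed verification policy: every test-positive patient receives the
# reference standard, but only 10% of test-negative patients do.
verify_positive = 1.0
verify_negative = 0.10

# Counts that actually appear in the published 2x2 table.
v_tp = tp * verify_positive
v_fp = fp * verify_positive
v_fn = fn * verify_negative
v_tn = tn * verify_negative

apparent_sensitivity = v_tp / (v_tp + v_fn)
apparent_specificity = v_tn / (v_tn + v_fp)

print(f"True sensitivity:     {true_sensitivity:.1%}")
print(f"Apparent sensitivity: {apparent_sensitivity:.1%}")
print(f"True specificity:     {true_specificity:.1%}")
print(f"Apparent specificity: {apparent_specificity:.1%}")
```

Under these assumptions, most false negatives never reach the reference standard, so the test appears more sensitive and less specific than it really is.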

Step 1d: Were the methods for performing the test described in sufficient detail?

You must be able to replicate the exact test performed in the study in order to apply it to your patients. The technique for analysis and interpretation must be described, especially for more subjective tests.

Step 2: What are the results?

Step 2a: What is the likelihood ratio (LR)?

The LR expresses the accuracy with which a diagnostic test identifies a disorder. The LR should be calculated for each level of a result if the test does not give a yes/no or positive/negative answer (such as with VQ scans). The LR tells you how likely a given test result is in patients with a disease compared with how likely the same result is in patients without the disease. When calculating the LR (whether positive or negative), the proportion of patients with the disease is always in the numerator and the proportion without the disease is always in the denominator. An LR of 10 tells you that a given test result is 10 times more likely to occur in a patient with the disease than in a patient without it.

Example of LR from the article on D-Dimer sensitivity and specificity:

 D-dimer result    PE present    PE absent
 positive                 167          310
 negative                  30          670
 total                    197          980

LR+ = positive test in those with disease / positive test in those without disease

LR+ = (167/197) / (310/980) = sensitivity / (1-specificity) = 2.7

LR- = negative test in those with disease / negative test in those without disease

LR- = (30/197) / (670/980) = (1-sensitivity) / specificity = 0.22
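The same calculation can be written as a short Python sketch. The function name `likelihood_ratio` and its parameter names are illustrative choices, not part of the original article; the counts are taken from the D-dimer table above.

```python
def likelihood_ratio(result_with_disease, total_with_disease,
                     result_without_disease, total_without_disease):
    """LR for one test result: its proportion among patients with the
    disease divided by its proportion among patients without it."""
    proportion_with = result_with_disease / total_with_disease
    proportion_without = result_without_disease / total_without_disease
    return proportion_with / proportion_without


# Counts from the D-dimer table: 197 patients with PE, 980 without.
lr_positive = likelihood_ratio(167, 197, 310, 980)
lr_negative = likelihood_ratio(30, 197, 670, 980)

print(f"LR+ = {lr_positive:.1f}")   # LR+ = 2.7
print(f"LR- = {lr_negative:.2f}")   # LR- = 0.22
```

Because the function works on one result category at a time, the same call applies unchanged to tests with more than two result levels.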

How useful an LR is depends on how much it raises or lowers the pretest probability of the disorder. LRs > 1 increase the likelihood of a disorder and LRs < 1 decrease it; an LR of 1 leaves the probability unchanged. In general, LRs less than or equal to 0.1 and LRs greater than 10 will be robust enough to significantly change the pretest probability. The easiest way to convert pretest probability to post-test probability is to use the Fagan nomogram. Otherwise you need to convert the pretest probability to odds (probability/[1-probability]) and multiply by the LR to get the post-test odds. The post-test odds can then be converted to probability (odds/[odds+1]). Using LRs obviates the need for using sensitivity and specificity.
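The odds conversion described above can be sketched in a few lines of Python. The function name `post_test_probability` and the 30% pretest probability are assumptions made for illustration; the LRs of 2.7 and 0.22 are the D-dimer values calculated earlier.

```python
def post_test_probability(pre_test_probability, lr):
    """Convert a pretest probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * lr
    return post_test_odds / (post_test_odds + 1)


# Hypothetical patient with a 30% pretest probability of PE (assumed value).
pre_test = 0.30

after_positive = post_test_probability(pre_test, 2.7)    # positive D-dimer
after_negative = post_test_probability(pre_test, 0.22)   # negative D-dimer

print(f"After a positive test: {after_positive:.0%}")   # about 54%
print(f"After a negative test: {after_negative:.0%}")   # about 9%
```

With these numbers, a positive result moves the probability only modestly (an LR of 2.7 is well short of 10), while a negative result lowers it more substantially.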

Step 2b: Sensitivity and Specificity

Although the literature is moving towards reporting LRs, older studies will still put results in terms of sensitivity and specificity. Sensitivity is the proportion of patients with a disorder who have a positive test, and specificity is the proportion of patients without a disorder who have a negative test. A test with high sensitivity is useful for ruling out (Snout = "sensitivity rules out") a disorder, and a test with high specificity is useful for ruling in (Spin = "specificity rules in") a disorder. These terms are more limiting than LRs because test results must be dichotomous, or able to fit in a 2x2 table, in order to calculate sensitivity and specificity. A good example of these limitations comes from the PIOPED study, in which the authors had to compress the four possible VQ scan results (high probability, intermediate probability, low probability, and normal) into either positive or negative in order to calculate the sensitivity and specificity. LRs would have allowed them to keep each test result separate and calculate the likelihood of PE for any given VQ scan result.
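To show how LRs keep each result level separate, here is a minimal sketch for a four-level test. The counts are hypothetical and chosen only for illustration; they are not the PIOPED data, and the variable names are assumptions.

```python
# Hypothetical counts for a four-level test result (NOT the PIOPED data):
# level -> (patients with disease, patients without disease)
results = {
    "high probability": (100, 10),
    "intermediate probability": (100, 200),
    "low probability": (40, 250),
    "normal": (10, 140),
}

total_with = sum(with_d for with_d, _ in results.values())          # 250
total_without = sum(without_d for _, without_d in results.values())  # 600

# One LR per result level: proportion of diseased patients with that result
# divided by the proportion of non-diseased patients with that result.
level_lrs = {
    level: (with_d / total_with) / (without_d / total_without)
    for level, (with_d, without_d) in results.items()
}

for level, lr in level_lrs.items():
    print(f"{level}: LR = {lr:.2f}")
```

In this made-up example the highest and lowest categories carry most of the diagnostic information, while the intermediate category has an LR close to 1 and barely changes the pretest probability. Forcing the four levels into positive and negative would hide that difference.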

Example of sensitivity and specificity from the above table:

Sensitivity = proportion of patients with disease who have a positive test

Sensitivity = 167 / (167+30) = 167/197 = 85%

Specificity = proportion of patients without disease who have a negative test

Specificity = 670 / (310+670) = 670/980 = 68%
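When an older study reports only sensitivity and specificity, the LRs can be recovered from them. The sketch below uses the rounded values from this example; the function name `lrs_from_sensitivity_specificity` is an illustrative assumption.

```python
def lrs_from_sensitivity_specificity(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded values from the D-dimer example: sensitivity 85%, specificity 68%.
lr_positive, lr_negative = lrs_from_sensitivity_specificity(0.85, 0.68)

print(f"LR+ = {lr_positive:.2f}")
print(f"LR- = {lr_negative:.2f}")
```

The results differ slightly from LRs calculated directly from the counts because the inputs here are rounded percentages.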

Step 3: Will the results help in caring for my patients?

A test must be reproducible in your clinical setting in order to be useful. If the test requires highly skilled individuals to interpret it, or if observer disagreement is high, the test may be less useful outside of the study setting. Test properties can change in different subpopulations with more or less severe disease, or in the presence of other conditions that influence the test result. You must compare your population of patients with the study population and examine the inclusion and exclusion criteria. If your patients are similar enough to the study population, you then need to consider whether the test result is robust enough to move your pretest probability to a post-test probability that would alter treatment. In other words, would the test result influence the management of an individual patient?


1. Jaeschke R, Guyatt G, Sackett DL. Users' guides to the medical literature, III: how to use an article about a diagnostic test, A: are the results of the study valid? JAMA. 1994;271:389-391.

2. Jaeschke R, Guyatt G, Sackett DL. Users' guides to the medical literature, III: how to use an article about a diagnostic test, B: what are the results and will they help me in caring for my patients? JAMA. 1994;271:703-707.

3. Dawson-Saunders B, Trapp RG. Basic and Clinical Biostatistics. Norwalk, CT: Appleton & Lange; 1994.