'Level of Evidence': Meaning and Historical Context

The concept of 'level of evidence' is a cornerstone of evidence-based practice: it streamlines decision-making by systematically grading the quality of the available evidence. In evidence-based medicine (EBM), this hierarchy generally places high-quality randomized controlled trials (RCTs) at the top of the evidence pyramid and assigns the lower rungs to expert opinion and case studies. Over time, the system has been adapted repeatedly, with different organizations proposing their own grading structures to meet specific needs.

The 'levels of evidence' concept first appeared in a 1979 report by the Canadian Task Force on the Periodic Health Examination. The report set out to develop recommendations for periodic health examinations and to ground them in evidence from the medical literature. To do so, the authors devised a system for rating the evidence on the effectiveness of particular interventions. This pioneering system sorted evidence into three tiers: Level I for randomized controlled trials (RCTs); Level II, split into two sub-levels, for well-designed cohort or case-control studies and for time series comparisons or dramatic results from uncontrolled studies; and Level III for expert opinions.


Canadian Task Force on the Periodic Health Examination’s Levels of Evidence*

Level I: At least 1 RCT with proper randomization

Level II.1: Well-designed cohort or case-control study

Level II.2: Time series comparisons or dramatic results from uncontrolled studies

Level III: Expert opinions

*Adapted from Canadian Task Force on the Periodic Health Examination. The periodic health examination. Can Med Assoc J 1979;121:1193-254


A decade later, in 1989, the levels of evidence were further described and expanded by Sackett, often called the father of EBM, in an article on levels of evidence for antithrombotic agents. Sackett's hierarchy also places randomized controlled trials (RCTs) at the highest level and case series or expert opinions at the lowest. Such hierarchies rank studies according to their probability of bias. RCTs are given the highest level because they are designed to minimize bias and carry a lower risk of systematic error. For example, by randomly allocating subjects to two or more treatment groups, an RCT also distributes confounding factors that might bias the results evenly across the groups, on average. A case series or expert opinion, by contrast, is often biased by the author's experience or opinions, and confounding factors are not controlled.


Levels of Evidence from Sackett*

Level I: Large RCTs with clear-cut results

Level II: Small RCTs with unclear results

Level III: Cohort and case-control studies

Level IV: Historical cohort or case-control studies

Level V: Case series, studies with no controls

*Adapted from Sackett DL. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 1989;95:2S–4S
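The point that random allocation also balances confounders can be illustrated with a small simulation. The sketch below is illustrative only: the population, the confounder (age), and the 80%/20% allocation preference in the non-randomized case are assumptions invented for this example, not data from any study.

```python
import random
import statistics


def simulate_age_gaps(n=10_000, seed=0):
    """Compare the treated-vs-control age gap under two allocation schemes.

    Returns (gap_randomized, gap_selected), each the mean age of the
    treated group minus the mean age of the control group.
    """
    rng = random.Random(seed)

    # Hypothetical population: age is the confounder of interest.
    ages = [rng.gauss(50, 10) for _ in range(n)]

    # Scheme 1: random allocation (a fair coin flip per subject).
    randomized = [rng.random() < 0.5 for _ in ages]

    # Scheme 2: non-random allocation. We assume, for illustration,
    # that older subjects are far more likely to receive treatment.
    selected = [rng.random() < (0.8 if age > 50 else 0.2) for age in ages]

    def age_gap(assignment):
        treated = [a for a, t in zip(ages, assignment) if t]
        control = [a for a, t in zip(ages, assignment) if not t]
        return statistics.mean(treated) - statistics.mean(control)

    return age_gap(randomized), age_gap(selected)


if __name__ == "__main__":
    gap_randomized, gap_selected = simulate_age_gaps()
    print(f"Age gap with random allocation:     {gap_randomized:+.2f} years")
    print(f"Age gap with non-random allocation: {gap_selected:+.2f} years")
```

With a large sample, the age gap under coin-flip allocation should be close to zero, whereas the preference-based allocation leaves the treated group markedly older, so any outcome that depends on age would be confounded with the treatment.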


In the modern era, systems that grade the strength of recommendations according to the underlying evidence have evolved considerably. Various organizations and journals have adopted and adapted the classification, recognizing that different specialties often require modified types and levels of evidence.

The United States Preventive Services Task Force (USPSTF):

This system maintains five categories of evidence but describes each in more detail. Level I covers properly designed RCTs. Level II is divided into three subcategories: II-1 for well-designed controlled trials without randomization, II-2 for well-designed cohort or case-control analytic studies, and II-3 for multiple time series designs or dramatic results from uncontrolled trials. Level III covers the opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.


Level I: Evidence obtained from at least one properly designed randomized controlled trial.

Level II-1: Evidence obtained from well-designed controlled trials without randomization.

Level II-2: Evidence obtained from well-designed cohort or case-control analytic studies, preferably from more than one center or research group.

Level II-3: Evidence obtained from multiple time series designs with or without the intervention. Dramatic results in uncontrolled trials might also be regarded as this type of evidence.

Level III: Opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.
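The USPSTF levels listed above amount to a lookup from study design to level. The following is a minimal sketch of that lookup, not an official tool: the `StudyDesign` names and the `uspstf_level` function are hypothetical, and the mapping deliberately ignores the "properly designed" and "well-designed" qualifiers, which require human appraisal of the study itself.

```python
from enum import Enum


class StudyDesign(Enum):
    """Hypothetical design labels chosen for this example."""
    RCT = "randomized controlled trial"
    NON_RANDOMIZED_TRIAL = "controlled trial without randomization"
    COHORT = "cohort study"
    CASE_CONTROL = "case-control study"
    MULTIPLE_TIME_SERIES = "multiple time series design"
    DRAMATIC_UNCONTROLLED = "dramatic results in an uncontrolled trial"
    EXPERT_OPINION = "opinion of respected authorities"
    DESCRIPTIVE_STUDY = "descriptive study"
    EXPERT_COMMITTEE_REPORT = "report of an expert committee"


# Mapping taken from the level descriptions in the text above.
# Assumption: design alone determines the level; study quality is ignored.
USPSTF_LEVEL = {
    StudyDesign.RCT: "I",
    StudyDesign.NON_RANDOMIZED_TRIAL: "II-1",
    StudyDesign.COHORT: "II-2",
    StudyDesign.CASE_CONTROL: "II-2",
    StudyDesign.MULTIPLE_TIME_SERIES: "II-3",
    StudyDesign.DRAMATIC_UNCONTROLLED: "II-3",
    StudyDesign.EXPERT_OPINION: "III",
    StudyDesign.DESCRIPTIVE_STUDY: "III",
    StudyDesign.EXPERT_COMMITTEE_REPORT: "III",
}


def uspstf_level(design: StudyDesign) -> str:
    """Return the USPSTF level label for a given study design."""
    return USPSTF_LEVEL[design]


if __name__ == "__main__":
    for design in StudyDesign:
        print(f"{design.value}: Level {uspstf_level(design)}")
```

A table like this captures only the first step of grading. Deciding whether a given trial was in fact "properly designed" still depends on reading the methods.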


The Centre for Evidence-Based Medicine (CEBM), Oxford: 

This system provides a more granular classification, with different levels for systematic reviews, individual RCTs, cohort studies, case-control studies, case series, and expert opinions. It also introduces the concept of "all or none" studies, where all patients died before the treatment became available, but some now survive on it, or when some patients died before the treatment became available, but none now die on it.


1a = Systematic reviews (SR) (with homogeneity) of randomized controlled trials (RCTs)

1b = Individual RCT (with narrow confidence interval)

1c = All or none.  Met when all patients died before the Rx became available, but some now survive on it; or when some patients died before the Rx became available, but none now die on it.

2a = SR (with homogeneity) of cohort studies

2b = Individual cohort study (including low quality RCT; e.g., <80% follow-up)

2c = "Outcomes" research; Ecological studies

3a = SR (with homogeneity) of case-control studies

3b = Individual case-control study

4   = Case-series (and poor quality cohort and case-control studies)

5   = Expert opinion without explicit critical appraisal, or based on physiology, bench research or "first principles"
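Because the CEBM codes form an ordered scale, a collection of studies can be sorted from strongest to weakest evidence. The sketch below assumes that the order in which the codes are listed above is their rank order. The `sort_by_evidence` helper and the sample study labels are hypothetical and serve only to show the idea.

```python
# CEBM codes in the order listed in the text (assumed strongest first).
CEBM_ORDER = ["1a", "1b", "1c", "2a", "2b", "2c", "3a", "3b", "4", "5"]

# Lower rank number means a higher level of evidence.
CEBM_RANK = {code: rank for rank, code in enumerate(CEBM_ORDER)}


def sort_by_evidence(studies):
    """Sort (label, cebm_code) pairs from highest to lowest evidence level.

    Raises KeyError if a code is not one of the ten CEBM codes above.
    """
    return sorted(studies, key=lambda study: CEBM_RANK[study[1]])


if __name__ == "__main__":
    # Made-up study labels, for illustration only.
    studies = [
        ("Case series of 12 patients", "4"),
        ("Systematic review of RCTs", "1a"),
        ("Single cohort study", "2b"),
        ("Expert commentary", "5"),
    ]
    for label, code in sort_by_evidence(studies):
        print(f"{code:>2}  {label}")
```

Sorting by code reflects study design only. Two studies with the same code can still differ widely in how well they were conducted.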


To conclude, the 'level of evidence' is a hierarchical system central to evidence-based practice, providing a structured guide to decision-making that rests predominantly on study design. Study design is crucial in determining the level of evidence, but other factors, such as the quality of study execution, the consistency of results, and the directness of the evidence, also contribute significantly to the overall assessment. Despite the push for higher levels of evidence, lower-tier evidence such as case reports and case series should not be overlooked, because it can generate hypotheses that lead to more controlled studies. Moreover, some interventions, such as antibiotics for wound infections, have effects so indisputable that an RCT may not be required (the "all or none" situation). As we continue to strive for quality in healthcare decisions, the evolution of the 'level of evidence' remains integral to that pursuit.


* "All or none" Study

"All or none" is a term used in evidence-based medicine for a specific type of observational evidence, not a randomized controlled trial (RCT). The criterion is met in either of two situations: all patients with a certain condition died before a specific treatment became available, but some (or all) now survive with it; or some patients died before the treatment became available, but none now die with it. Such a pattern is considered strong evidence that the treatment is effective, even though it does not come from an RCT. For example, before the use of antibiotics, everyone with bacterial meningitis died; now that antibiotics are available, some people with bacterial meningitis survive. This meets the "all or none" criterion. "All or none" evidence is uncommon and is typically seen only where the effect of the treatment is dramatic and unequivocal. It is ranked as Level 1c in the Oxford Centre for Evidence-Based Medicine's levels of evidence.


Examples

Assess the level of evidence of this article (https://pubmed.ncbi.nlm.nih.gov/33567185/) based on the United States Preventive Services Task Force (USPSTF) system.

Once-Weekly Semaglutide in Adults with Overweight or Obesity (N Engl J Med)

 In this double-blind trial, we enrolled 1961 adults with a body-mass index (the weight in kilograms divided by the square of the height in meters) of 30 or greater (≥27 in persons with ≥1 weight-related coexisting condition), who did not have diabetes, and randomly assigned them, in a 2:1 ratio, to 68 weeks of treatment with once-weekly subcutaneous semaglutide (at a dose of 2.4 mg) or placebo, plus lifestyle intervention. The coprimary end points were the percentage change in body weight and weight reduction of at least 5%. The primary estimand (a precise description of the treatment effect reflecting the objective of the clinical trial) assessed effects regardless of treatment discontinuation or rescue interventions.

The article, titled "Once-Weekly Semaglutide in Adults with Overweight or Obesity", reports a randomized, double-blind, placebo-controlled trial. According to the United States Preventive Services Task Force (USPSTF) grading system, this type of study design falls under Level I evidence.

This is because the study was designed as a randomized controlled trial, which is the highest level of evidence in the hierarchy. In this study, participants were randomly assigned, in a 2:1 ratio, to receive either once-weekly subcutaneous semaglutide or placebo, both alongside a lifestyle intervention, and the change in body weight was assessed over 68 weeks.


Assess the level of evidence of this article (https://pubmed.ncbi.nlm.nih.gov/31483927/) based on the United States Preventive Services Task Force (USPSTF) system.

Women with confirmed pregnancy were identified and divided into acupuncture or control group for comparison of their outcomes. Differences in other factors such as age, and rate of high-risk pregnancy and multiple pregnancy were examined. In the acupuncture group, the most frequent acupuncture diagnosis codes and the timing of treatment were also investigated.

The article, titled "Safety of acupuncture during pregnancy: a retrospective cohort study in Korea", is a retrospective cohort study. In a retrospective cohort study, researchers identify a group of individuals from existing data (often medical records or interviews) and look back in time to determine their exposure to certain factors. A prospective cohort study, by contrast, follows a group of similar individuals who differ in the factors under study forward in time, to determine how those factors affect the rate of a certain outcome. Under the USPSTF system, well-designed cohort studies fall under Level II-2, so this study would be graded Level II-2 if its design is judged sound. The USPSTF definition does not distinguish retrospective from prospective cohorts, but retrospective designs are more prone to bias, and other grading systems may rank them lower; Sackett's hierarchy, for instance, places historical cohort studies at Level IV, below other cohort studies at Level III.