Error rates of human reviewers during abstract screening in systematic reviews

Zhen Wang; Tarek Nayfeh; Jennifer Tetzlaff; Peter O’Blenis; Mohammad Hassan Murad

doi:10.1371/journal.pone.0227742

Research output: Contribution to journal › Review article › peer-review

6 Scopus citations

Abstract

Background: Automated approaches to improve the efficiency of systematic reviews are greatly needed. When testing any of these approaches, the criterion standard of comparison (gold standard) is usually human reviewers. Yet, human reviewers make errors in inclusion and exclusion of references.

Objectives: To determine citation false inclusion and false exclusion rates during abstract screening by pairs of independent reviewers. These rates can help in designing, testing and implementing automated approaches.

Methods: We identified all systematic reviews conducted between 2010 and 2017 by an evidence-based practice center in the United States. Eligible reviews had to follow standard systematic review procedures with dual independent screening of abstracts and full texts, in which citation inclusion by one reviewer prompted automatic inclusion through the next level of screening. Disagreements between reviewers during full text screening were reconciled via consensus or arbitration by a third reviewer. A false inclusion or exclusion was defined as a decision made by a single reviewer that was inconsistent with the final included list of studies.

Results: We analyzed a total of 139,467 citations that underwent 329,332 inclusion and exclusion decisions from 86 unique reviewers. The final systematic reviews included 5.48% of the potential references identified through bibliographic database search (95% confidence interval (CI): 2.38% to 8.58%). After abstract screening, the total error rate (false inclusion and false exclusion) was 10.76% (95% CI: 7.43% to 14.09%).

Conclusions: This study suggests important false inclusion and exclusion rates by human reviewers. When deciding the validity of a future automated study selection algorithm, it is important to keep in mind that the gold standard is not perfect and that achieving error rates similar to humans may be adequate and can save resources and time.
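The abstract's definition of a false inclusion or exclusion (a single reviewer's decision that disagrees with the final included list) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the `Decision` record, the `error_rates` helper, the toy reviewer and citation IDs, and the choice to divide by the total number of single-reviewer decisions. The study's actual computation, including how rates were pooled across reviews and how confidence intervals were derived, is not described in the abstract and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    reviewer: str   # hypothetical reviewer ID
    citation: str   # hypothetical citation ID
    included: bool  # True if this reviewer voted to include at abstract screening


def error_rates(decisions, final_included):
    """Return (false_inclusion_rate, false_exclusion_rate, total_error_rate).

    Assumption: each rate is a fraction of all single-reviewer decisions.
    """
    if not decisions:
        raise ValueError("no decisions to evaluate")
    # Reviewer voted to include, but the citation is not in the final review.
    false_inclusions = sum(
        1 for d in decisions if d.included and d.citation not in final_included
    )
    # Reviewer voted to exclude, but the citation is in the final review.
    false_exclusions = sum(
        1 for d in decisions if not d.included and d.citation in final_included
    )
    n = len(decisions)
    return (
        false_inclusions / n,
        false_exclusions / n,
        (false_inclusions + false_exclusions) / n,
    )


# Toy data: four citations, each screened independently by two reviewers.
decisions = [
    Decision("r1", "c1", True),
    Decision("r2", "c1", True),
    Decision("r1", "c2", True),   # false inclusion: c2 is not in the final list
    Decision("r2", "c2", False),
    Decision("r1", "c3", False),  # false exclusion: c3 is in the final list
    Decision("r2", "c3", True),
    Decision("r1", "c4", False),
    Decision("r2", "c4", False),
]
final_included = {"c1", "c3"}

fi_rate, fe_rate, total_rate = error_rates(decisions, final_included)
print(f"false inclusion: {fi_rate:.2%}")
print(f"false exclusion: {fe_rate:.2%}")
print(f"total error:     {total_rate:.2%}")
```

In this toy setup, citation `c3` still reaches the final list despite one reviewer's exclusion, which mirrors the rule in the Methods that inclusion by one reviewer carries a citation to the next screening level.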

Original language	English (US)
Article number	e0227742
Journal	PLOS ONE
Volume	15
Issue number	1
DOIs	https://doi.org/10.1371/journal.pone.0227742
State	Published - Jan 1 2020

ASJC Scopus subject areas

General Biochemistry, Genetics and Molecular Biology
General Agricultural and Biological Sciences
General

Cite this

@article{641ea29cc14f439694d08f01a62b4563,

title = "Error rates of human reviewers during abstract screening in systematic reviews",

abstract = "Background Automated approaches to improve the efficiency of systematic reviews are greatly needed. When testing any of these approaches, the criterion standard of comparison (gold standard) is usually human reviewers. Yet, human reviewers make errors in inclusion and exclusion of references. Objectives To determine citation false inclusion and false exclusion rates during abstract screening by pairs of independent reviewers. These rates can help in designing, testing and implementing automated approaches. Methods We identified all systematic reviews conducted between 2010 and 2017 by an evidence-based practice center in the United States. Eligible reviews had to follow standard systematic review procedures with dual independent screening of abstracts and full texts, in which citation inclusion by one reviewer prompted automatic inclusion through the next level of screening. Disagreements between reviewers during full text screening were reconciled via consensus or arbitration by a third reviewer. A false inclusion or exclusion was defined as a decision made by a single reviewer that was inconsistent with the final included list of studies. Results We analyzed a total of 139,467 citations that underwent 329,332 inclusion and exclusion decisions from 86 unique reviewers. The final systematic reviews included 5.48% of the potential references identified through bibliographic database search (95% confidence interval (CI): 2.38% to 8.58%). After abstract screening, the total error rate (false inclusion and false exclusion) was 10.76% (95% CI: 7.43% to 14.09%). Conclusions This study suggests important false inclusion and exclusion rates by human reviewers. When deciding the validity of a future automated study selection algorithm, it is important to keep in mind that the gold standard is not perfect and that achieving error rates similar to humans may be adequate and can save resources and time.",

author = "Zhen Wang and Tarek Nayfeh and Jennifer Tetzlaff and Peter O{\textquoteright}Blenis and Murad, {Mohammad Hassan}",

note = "Publisher Copyright: {\textcopyright} 2020 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.",

year = "2020",

month = jan,

day = "1",

doi = "10.1371/journal.pone.0227742",

language = "English (US)",

volume = "15",

journal = "PLOS ONE",

issn = "1932-6203",

publisher = "Public Library of Science",

number = "1",

}

TY - JOUR

T1 - Error rates of human reviewers during abstract screening in systematic reviews

AU - Wang, Zhen

AU - Nayfeh, Tarek

AU - Tetzlaff, Jennifer

AU - O’Blenis, Peter

AU - Murad, Mohammad Hassan

N1 - Publisher Copyright: © 2020 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2020/1/1

Y1 - 2020/1/1

AB - Background Automated approaches to improve the efficiency of systematic reviews are greatly needed. When testing any of these approaches, the criterion standard of comparison (gold standard) is usually human reviewers. Yet, human reviewers make errors in inclusion and exclusion of references. Objectives To determine citation false inclusion and false exclusion rates during abstract screening by pairs of independent reviewers. These rates can help in designing, testing and implementing automated approaches. Methods We identified all systematic reviews conducted between 2010 and 2017 by an evidence-based practice center in the United States. Eligible reviews had to follow standard systematic review procedures with dual independent screening of abstracts and full texts, in which citation inclusion by one reviewer prompted automatic inclusion through the next level of screening. Disagreements between reviewers during full text screening were reconciled via consensus or arbitration by a third reviewer. A false inclusion or exclusion was defined as a decision made by a single reviewer that was inconsistent with the final included list of studies. Results We analyzed a total of 139,467 citations that underwent 329,332 inclusion and exclusion decisions from 86 unique reviewers. The final systematic reviews included 5.48% of the potential references identified through bibliographic database search (95% confidence interval (CI): 2.38% to 8.58%). After abstract screening, the total error rate (false inclusion and false exclusion) was 10.76% (95% CI: 7.43% to 14.09%). Conclusions This study suggests important false inclusion and exclusion rates by human reviewers. When deciding the validity of a future automated study selection algorithm, it is important to keep in mind that the gold standard is not perfect and that achieving error rates similar to humans may be adequate and can save resources and time.

UR - http://www.scopus.com/inward/record.url?scp=85077884055&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077884055&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0227742

DO - 10.1371/journal.pone.0227742

M3 - Review article

C2 - 31935267

AN - SCOPUS:85077884055

SN - 1932-6203

VL - 15

JO - PLOS ONE

JF - PLOS ONE

IS - 1

M1 - e0227742

ER -
