Systematic literature reviews (SLRs) are the foundation of evidence-based healthcare. Explicit methods must be applied throughout an SLR to minimize bias and produce reliable findings, since bias can affect every step of the review process. For instance, bias can arise while identifying and screening studies, while selecting studies (e.g. due to unclear inclusion criteria), during the data extraction process, and during the validity assessment of included studies. (1,2)

Data extraction, or data collection, is a critical step in conducting SLRs. It can be defined as the transfer of any type of data from primary studies into standardized tables. (2) Data extraction is one of the most time-consuming tasks performed in an SLR, and its accuracy is critical to the validity of the results. (3) In practice, it typically takes 2.5 to 6.5 years for a primary study publication to be incorporated into a published SLR. (4) Moreover, almost 23% of SLRs are out of date within 2 years of publication, because new evidence has emerged that might change their primary results. (5)

Evidence from the literature further reports a high prevalence of extraction errors, although these may have only a moderate impact on the results of an SLR. (2,6,7) Nevertheless, the frequency of data extraction errors underlines the importance of quality-assurance measures for data extraction in order to minimize the risk of biased results and wrong conclusions. (8) There is therefore a pressing need to identify ways to improve the quality of data extraction for SLRs.

One option is the use of two independent reviewers to extract the data, a process known as ‘double data extraction’. This process has been reported to result in fewer extraction errors; (9) however, it may not always be necessary, which justifies the alternative of ‘reduced extraction’. Reduced extraction focuses on identifying the critical aspects (e.g. primary outcomes) that form the basis of the conclusions, rather than emphasizing the extraction of less important parameters (e.g. patient characteristics, additional outcomes). (2) This is also recommended by the Methodological Expectations of Cochrane Intervention Reviews (MECIR), which state that “dual data extraction is particularly important for outcome data, which feed directly into syntheses of the evidence, and hence to the conclusions of the review”. (10) In addition, the Institute of Medicine (IOM) states that reviewers should “at minimum, use two or more researchers, working independently, to extract quantitative and other critical data from each study”. (11)

Moreover, training the reviewer team in data extraction (e.g. on a sample of studies) before performing the complete data extraction is essential to harmonize the end results and to clear up common misunderstandings; this particularly reduces interpretation and selection errors as well as time and effort. (6) The reduction of time and effort is especially useful for rapid reviews, which aim to deliver timely yet systematic results. (12)

Automated data extraction has recently been proposed as a way to reduce errors and to complete SLRs in a timely manner. Natural language processing (NLP) is one such approach: it uses computerized methods to derive new, previously unidentified information by automatically extracting data from different written resources. (13) The process typically involves concept extraction (entity recognition) and relation extraction (association extraction). NLP has been used to automate the extraction of genomic and clinical information from the biomedical literature; however, automation of the data extraction process for SLRs has not yet been fully explored. Techniques such as NLP could initially be used to monitor manual data extraction (currently performed in duplicate); then to validate extraction performed by a single reviewer; then become the primary source of extracted data elements, validated by a human; and eventually automate data extraction completely, enabling more efficient and faster SLRs. (14)
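To make the concept-extraction step concrete, the sketch below shows a deliberately simple, rule-based version using only the Python standard library: regular expressions pull a sample size and a p-value out of a results sentence. The patterns and function name are illustrative assumptions, not part of any cited tool; production NLP pipelines use trained entity-recognition models rather than hand-written rules.

```python
import re

def extract_entities(sentence: str) -> dict:
    """Rule-based concept extraction (illustrative only): find sample-size
    and p-value mentions in a trial results sentence."""
    entities = {}
    # e.g. "120 patients were randomised" -> sample size 120
    m = re.search(r"(\d+)\s+(?:patients|participants|subjects)", sentence, re.I)
    if m:
        entities["sample_size"] = int(m.group(1))
    # e.g. "p = 0.03" -> reported p-value
    m = re.search(r"p\s*[=<]\s*(0?\.\d+)", sentence, re.I)
    if m:
        entities["p_value"] = float(m.group(1))
    return entities

print(extract_entities(
    "A total of 120 patients were randomised; mortality was lower "
    "in the intervention arm (p = 0.03)."
))
```

Even this toy example highlights why human validation remains necessary: a pattern miss (e.g. "one hundred and twenty patients") silently returns no entity, which is exactly the kind of error the staged human-in-the-loop approach described above is meant to catch.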

That said, there are no specific, established standards for data extraction, because the actual benefit of any given extraction method (e.g. independent data extraction) and of particular reviewer-team characteristics (e.g. expertise) are not well proven. This warrants more comparative studies to better understand the influence of different extraction methods. In particular, studies exploring the need for training in data extraction are vital, given the lack of such analyses to date. Scientific expertise can be used more efficiently by applying methods that require less effort without threatening internal validity. Finally, enhancing the knowledge base would also help in planning effective training strategies for new reviewers and students in the future. (2)


  1. Felson DT. Bias in meta-analytic research. J Clin Epidemiol 1992; 45(8):885-92. 
  2. Mathes T, Klaßen P, Pieper D. Frequency of data extraction errors and methods to increase data extraction quality: A methodological review. BMC Med Res Methodol 2017; 17(1):152.
  3. Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions version 5.1.0 [updated March 2011]. The Cochrane Collaboration; 2011.
  4. Elliott J, Turner T, Clavisi O, et al. Living systematic reviews: an emerging opportunity to narrow the evidence-practice gap. PLoS Med 2014; 11:e1001603.
  5. Shojania KG, Sampson M, Ansari MT, et al. How quickly do systematic reviews go out of date? A survival analysis. Ann Intern Med 2007; 147(4):224-33.
  6. Haywood KL, Hargreaves J, White R, et al. Reviewing measures of outcome: reliability of data extraction. J Eval Clin Pract 2004; 10:329–337.
  7. Carroll C, Scope A, Kaltenthaler E. A case study of binary outcome data extraction across three systematic reviews of hip arthroplasty: errors and differences of selection. BMC research notes 2013; 6:539.
  8. Gøtzsche PC, Hróbjartsson A, Marić K, et al. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 2007; 298(4):430–437. 
  9. Tendal B, Higgins JP, Juni P, et al. Disagreements in meta-analyses using outcomes measured on continuous or rating scales: observer agreement study. BMJ 2009; 339: b3128. 
  10. Higgins JPT, Lasserson T, Chandler J, et al. Methodological Expectations of Cochrane Intervention Reviews. London: Cochrane; 2016. 
  11. Morton S, Berg A, Levit L, Eden J. Finding what works in health care: standards for systematic reviews. National Academies Press; 2011.
  12. Schünemann HJ, Moja L. Reviews: rapid! Rapid! Rapid! …and systematic. Syst Rev 2015; 4(1):4.
  13. Hearst MA. Untangling text data mining. Proceedings of the 37th annual meeting of the Association for Computational Linguistics. College Park, Maryland: Association for Computational Linguistics; 1999. pp. 3–10. 
  14. Jonnalagadda SR, Goyal P, Huffman MD. Automating data extraction in systematic reviews: a systematic review. Syst Rev 2015; 4:78.

Written by: Ms. Tanvi Laghate