Background and aims Summarising quality improvement (QI) research through systematic literature review has great potential to improve patient care. However, heterogeneous terminology, poor definition of QI concepts and overlap with other scientific fields can make it hard to identify and extract data from relevant literature. This report examines the compromises and pragmatic decisions that undertaking literature review in the field of QI requires and the authors propose recommendations for literature review authors in similar fields.
Methods Two authors (EJ and JF) provide a reflective account of their experiences of conducting a systematic literature review in the field of QI. They draw on wider literature to justify the decisions they made and propose recommendations to improve the literature review process. A third collaborator (WC) co-created the paper, challenging the views and perceptions of authors EJ and JF on the problems and solutions of conducting a review of literature in QI.
Results Two main challenges were identified when conducting a review in QI. These were defining QI and selecting QI studies. Strategies to overcome these problems include: select a multi-disciplinary authorship team; review the literature to identify published QI search strategies, QI definitions and QI taxonomies; contact experts in related fields to clarify whether a paper meets inclusion criteria; keep a reflective account of decision making; submit the protocol to a peer-reviewed journal for publication.
Conclusions The QI community should work together as a whole to create a scientific field with a shared vision of QI to enable accurate identification of QI literature. Our recommendations could be helpful for systematic reviewers wishing to evaluate complex interventions in both QI and related fields.
- quality improvement
- continuous quality improvement
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Enthusiasm for quality improvement (QI) in healthcare continues to gather momentum, leading to increasing interest in studies of its effectiveness and in the further development of QI.1–4 Quality improvement involves making changes to systems and can be used in healthcare to improve specific clinical procedures (such as antibiotic prescribing) or whole care pathways.3 QI projects are different to clinical research:1 5–7 they are typically pragmatic and led by local clinicians,8 often in a single setting,7 and they tend to be ongoing, or continual, in contrast to research which takes place over a defined period of time.9 QI can also be conducted as research. QI research has been likened to health services research, which typically aims to examine how care can be delivered to achieve the highest quality of care,9 and applied research, which can ‘bridge the gap’ (p.1962)10 between the ideal setting of clinical trials and actual routine care. This paper explores some of the difficulties of reviewing the QI literature.
Policy makers and healthcare staff may use literature reviews to choose QI interventions wisely, thus avoiding implementing changes which are unlikely to provide any benefit.11 12 Systematic review is one type of literature review which allows the findings and methodological quality of a large number of published articles in a scientific field to be summarised and this remains the gold standard for many decisions made in healthcare.13 Other types of review including scoping review, narrative review, critical review and systematised review are also important to contextualise and situate research helping to provide a focal point, and to understand the current state of knowledge in a field to prevent research duplication.14 15
Despite the obvious importance of conducting rigorous reviews, their execution is challenging.16 The choice of the review approach is contingent on the research question being pursued and its epistemological basis, the time available to conduct the review, the state and quality of the existing literature, and the quality standard required for the review output.15 Once an appropriate method has been selected, rigorous literature reviewing requires adherence to a series of predefined, reproducible steps to identify, select and critically appraise relevant research.14 These steps are described in reporting guidelines published by the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) network (such as the Preferred Reporting Items for Systematic Reviews and Meta-Analyses: the PRISMA statement),17 but following such guidance may not always be straightforward when reviewing literature on complex interventions such as QI.16
Researchers deliberate how to define QI and QI interventions,18 and QI terminology is vague, inconsistent and variable across disciplines.19 20 Perhaps as a consequence of this, or a precursor to it, the underlying quality of QI articles and their reporting is known to be poor.21 22 These definitional issues are not trivial problems. If a systematic review approach cannot capture all relevant literature because it is so difficult to classify articles as ‘QI’, or interventions as ‘QI interventions’,19 23 24 the review will be incomplete, and less reliable estimates of the effectiveness of QI interventions may be produced.13 Inconsistent terminology can also make it increasingly difficult for reviewers to assess studies for similarity during meta-analyses. If useful comparisons cannot be made between QI studies, knowledge synthesis becomes even more challenging.13 25 26 Definitional problems can create similar difficulties for authors using other literature review approaches, including reduced accuracy in article identification and data extraction. Reviewers may seek precedent in the published work of others to hone their own reporting skill, but if others have been faced with inconsistencies, previous confusion can add to the cycle of poor reporting.27
We each used a different approach to conduct a rigorous literature review within healthcare improvement: (1) Systematic review (EJ).21 28 (2) Integrative review (JF).29 EJ examined QI interventions in surgery, and JF examined how improvement capability is conceptualised and assessed. A systematic review attempts to identify experimental and quasiexperimental evidence that fits prespecified eligibility criteria to answer a specific research question. It uses systematic methods designed to minimise bias.30 An ‘integrative review can precisely represent the state of the current research literature. (It) can be used to … identify the need for future research, build a bridge between related areas of work, … identify a theoretical or conceptual framework’.31 While our studies were being conducted we were both part of a national programme with biannual learning sets where we could discuss reflections on our ongoing work. We each recorded these in our own reflective research diaries. After we had published our reviews we shared our experiences of conducting them during unstructured discussions over the telephone and by email. Despite different research fields and approaches to literature reviewing, the frustrations we encountered were surprisingly similar.
Our literature reviews required considerable debate to design a suitable search strategy, identify studies which could be defined as QI and categorise interventions into groups. In this account, we describe such debates as a ‘black box’. First, we attempt to unravel why debate arose and the compromises and pragmatic solutions we employed to resolve it. We have termed this a ‘black box’ because debates such as why some papers were excluded and not others are typically not included in scientific reporting, although authors are encouraged to explain when much arbitration was required to resolve disputes.32 Second, we justify the compromised and perhaps ‘imperfect’ decisions we made. Finally, by sharing what we learnt in the field of QI we make recommendations for others in fields characterised by similar complexity. To improve the credibility of the paper, a third author (WC) challenged the views of authors EJ and JF on the problems and solutions of conducting a review of literature in QI. She drew on her experiences of QI research and of writing a narrative synthesis (outside the field of QI) on the perceptions of patients and healthcare workers in women’s health.33
Challenges of conducting a literature review systematically in QI
We identified two main challenges in conducting a review in QI. These were defining QI and selecting QI studies.
We found it challenging to delineate which literature could be described as QI. QI is an ‘emerging’ field of science19 34 35 and contradicting views on how QI should be defined are commonplace in scientific abstracts18 19 36 where a large number of terms37 which change over time19 20 are being used. Conceptualisation of the term ‘quality improvement’ is also rather unique to the healthcare field. In the engineering and manufacturing industry terms such as total quality management (TQM) and ‘continuous improvement’ are much more common, although how they are used to achieve the aims of QI work in healthcare may be understood differently. Thus, it is not surprising that a single accepted definition of QI is still lacking within healthcare.38 To add to this, the volume of publications in QI in healthcare is growing,39 and the work is scattered across different types of journals (QI journals, clinical journals or management journals)21 and databases.24 This may be compounded by the lack of agreement as to ‘what is quality?’.40 41 Therefore, designing a robust review strategy on ‘fuzzy topics’—which are not self-defining—such as quality and QI requires the reviewer to apply judgement to show clear discernment. This may not be a problem which is unique to QI, and we therefore hope that our experiences could be useful to a wider group of researchers.
In addition, QI often uses structured techniques to support the implementation of the intervention, such as Plan-Do-Study-Act (PDSA) cycles. However, while some techniques are related to QI, they may not be easy to define as belonging to the field of ‘QI’. For example, DMAIC (Define, Measure, Analyse, Improve, Control) is commonly used within six sigma and lean six sigma in healthcare,42 and we define this as belonging to the field of QI. The discipline of Instructional Systems Design also involves five integrated steps: analysis, design, development, implementation and evaluation (ADDIE)43 and it has been used to solve specific problems in healthcare such as preventing heat loss in patients recovering from surgery.44 However, ADDIE originates in the field of behavioural psychology and human performance technology (HPT) and is not an easy ‘fit’ in the field of QI. ‘Quality improvement’ and HPT both evolved in the context of a growing emphasis on the importance of a systematic approach to examining the quality of care.45 However, differences in their lineage can be identified, with HPT being focused on the process of organisational learning and competency,46 and QI being focused on studying systems and variation within systems.47 48 Recognising the distinctions and similarities between QI and other similar fields is a mindful, defensible way of discerning what ‘counts’ as QI.
During our reflective discussions we (EJ, JF) realised that we had both conducted a scoping review to conceptualise the term ‘quality improvement’. Conceptualising QI early in the review process provided some reassurance that unwanted ‘noise’ could be excluded and that all reviewers would consistently identify relevant literature. A third ‘arbitrator’ (WC) helped us to identify the questions we had asked during our scoping reviews to resolve the problem of ambiguity in QI:
Who should form our authorship teams? Understanding who publishes research in QI (for example, sociologists, engineers and healthcare professionals) helped us ensure a balanced collection of viewpoints could be brought to our authorship teams. We both worked with clinicians and non-clinicians from different professional backgrounds. EJ (a physiotherapist by background) worked with medical sociologists and an anaesthetist, and JF (an engineer and healthcare manager by background) worked with experts in management and policy.
What research exists on resolving ambiguities in QI? For example, Rubenstein et al 23 classified QI articles into four types: (1) empirical literature on testing of QI; (2) QI theories and frameworks; (3) QI literature synthesis and meta-analysis; (4) development and testing of QI-related tools. Each category has strong face validity and is easily recognised.
What search terms have previous authors of QI in healthcare used in their published literature reviews? For example, the Health Foundation’s QI evidence scan48 and Walshe20 both review improvement methodologies using a variety of different terms. We both used a variety of different terms for ‘QI’ within searches to capture QI literature.21 29 We used free-text key words to account for the problem that phrases such as ‘quality improvement’ and ‘continuous improvement’ are sometimes used synonymously.
How is QI defined in the literature?
EJ used consensus meetings with her authorship team to reduce a list of QI definitions (box 1) to one definition deemed to be clear and easy to apply to the literature. Jones et al 21 paraphrased Batalden and Davidoff49 to define QI as: ‘deliberate structured process of purposeful efforts to make changes that will lead to better outcomes, better system performance, and better professional development’.
In addition to defining QI, JF needed to define ‘improvement capability’ to isolate papers for her review topic. Similarly, literature review revealed no accepted definition. Therefore, Furnival et al (2017)29 used Whittemore and Knafl’s integrative review method50 which allows for several perspectives of a topic to be used. Improvement capability was defined as: ‘the organisational ability to intentionally and systematically use improvement approaches, methods and practices, to change processes and products/services to generate improved performance.’29
How is QI distinct from, and how does it overlap with, other fields? To define the field of QI we worked with experts in QI and related fields. EJ adopted values and norms unique to QI in healthcare, distinguishing it as a separate field in order to deploy a specific review methodology in the field of surgery. JF positioned QI across fields, by examining how QI is used in healthcare based on its heritage from manufacturing and industry, partly because JF had a background in engineering, and because there is so much to learn from other fields. The resultant search sought to identify qualitative and quantitative instruments that had been tested across fields and all sectors were included. No date restrictions were used.
EJ contacted experts in related fields such as Human Factors to identify which features could be used to distinguish QI papers and Human Factors papers from each other. These discussions resulted in a coauthored manuscript with Human Factors and QI experts to expose the similarities and differences between the fields.51 Conversely, JF acknowledged that Human Factors and ergonomics are an integral part of a QI specialist’s skill set, and as such a definition of how QI overlaps with (rather than being distinct from) other fields became more important.
We conducted pilot literature searches to ensure a selection of known surgical QI papers42 and improvement capability papers52–54 were successfully captured. This verified that EJ’s choice of terms could capture papers which conceptualised QI as relating to a set of values and norms in the healthcare field, and that JF’s choice of terms could capture papers published in both healthcare and related fields.
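A pilot check of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure EJ or JF actually ran: the record identifiers are hypothetical placeholders, and the function names `missing_known_papers` and `recall` are our own.

```python
def missing_known_papers(search_results, known_papers):
    """Return the known papers that the pilot search failed to retrieve."""
    return sorted(set(known_papers) - set(search_results))


def recall(search_results, known_papers):
    """Proportion of the known papers that the pilot search retrieved."""
    known = set(known_papers)
    return len(known & set(search_results)) / len(known)


# Hypothetical identifiers standing in for database record numbers.
known_papers = ["known-01", "known-02", "known-03", "known-04"]
pilot_results = ["known-01", "known-03", "known-04", "other-17", "other-42"]

print(missing_known_papers(pilot_results, known_papers))
print(recall(pilot_results, known_papers))
```

Any paper listed as missing would prompt a revision of the search terms before the main search is run.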
Box 1 Selected descriptions of QI in the literature
‘QI is … a process of change in human behaviour, … driven largely by experiential learning’87
‘Modern QI concepts had their origins in the statistical process control measurements …the input is a work process needing improvement and the output is a new, improved version of the (existing) work process’88
‘Improvement science focuses on systematically and rigorously exploring ‘what works’ to improve quality in healthcare … the focus is … on structured quality improvement approaches, such as plan, do, study, act (PDSA) cycles.’48
‘Key elements … are the combination of a ‘change’ (improvement) combined with a ‘method’ (an approach or specific tools) to attain a superior outcome.’70
‘The combined and unceasing efforts of everyone—healthcare professionals, patients and their families, researchers, payers, planners and educators—to make the changes that will lead to better patient outcomes (health), better system performance (care) and better professional development.’49
‘Better patient experience and outcomes achieved through changing provider behaviour and organisation through using a systematic change method’89
The search strategy and databases used by EJ and JF are included in online supplementary file appendix 1.
The methods we used to conceptualise the term ‘quality improvement’ have some limitations. For example, asking experts to clarify the nature of QI is subject to heterogeneity in opinions. We also made some concessions. For example, EJ used broad search terms such as ‘quality adj2 improve$’ (where the terms ‘quality’ and variations of ‘improve’ such as ‘improvement’ are identified within two words of each other). This captured titles such as ‘Can quality circles improve hospital-acquired infection control?’,55 which would not otherwise be identified by ‘quality improvement’. However, this term also captured articles which described quality of care, rather than QI. To overcome problems with specificity, the Medical Subject Headings (MeSH) ‘Quality Improvement/’ and ‘TQM’ are now in use, which guard against the possibility that review teams may not have thought of all relevant synonyms.56
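The trade-off between sensitivity and specificity in a proximity search can be illustrated with a short Python sketch. It is an approximation under stated assumptions, not a reimplementation of any bibliographic database: we assume that ‘within two words’ means the two terms sit at most two token positions apart, in either order, and that the truncation symbol matches any word beginning with ‘improve’. The function names and the last three titles are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def adjacency_match(text, word, stem, within=2):
    """Return True if `word` and any token starting with `stem` occur
    at most `within` token positions apart, in either order."""
    tokens = tokenize(text)
    word_positions = [i for i, t in enumerate(tokens) if t == word]
    stem_positions = [i for i, t in enumerate(tokens) if t.startswith(stem)]
    return any(abs(i - j) <= within for i in word_positions for j in stem_positions)


titles = [
    "Can quality circles improve hospital-acquired infection control?",
    "A quality improvement programme in surgery",    # hypothetical title
    "Improved quality of care after hip fracture",   # hypothetical title
    "Quality of life after surgery",                 # hypothetical title
]

for title in titles:
    print(adjacency_match(title, "quality", "improve"), title)
```

In this sketch the third title is matched even though it describes quality of care rather than QI, which mirrors the loss of specificity described above.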
Additionally, our strategies to resolve ambiguity did not produce perfect classifications. In a relatively new field challenged by inconsistent conceptual and terminological definition, QI can mean different things to different people, depending on professional background, and research aims and intentions. However, documenting which compromises were made and presenting these decisions to others in a discerning, mindful way provides further opportunity for learning which otherwise might be missed. For example, EJ published the review protocol,28 and registered it with PROSPERO57 which allows readers to compare the published review with the protocol to check all data were extracted as intended. We both kept a reflective account of decision making. JF used her reflective account to explain that variable conceptualisation and assessment of capability could be eased by employing the integrative method which allows for more than one a priori definition to be used during searching.
Selecting suitable studies in the QI field
Accurately selecting QI studies during a literature review is problematic because it is hard to determine what makes an intervention a ‘QI intervention’.18 At first glance, any clinical healthcare intervention (for example, a pharmaceutical, surgical or physiotherapeutic intervention) could be classed as ‘quality improvement’. All of these interventions aim to improve outcome (eg, quality of life or length of stay), which could in turn impact quality of care. Likewise, checklists, algorithms or pathways are systematic clinical activities that aim to improve outcome. For example, Enhanced Recovery After Surgery (ERAS) pathways combine early mobilisation, intraoperative fluid balance and carbohydrate loading, to achieve an overall effect (reducing surgical stress response to reduce length of stay) which improves a clinical process (surgery).58–62 Like ERAS pathways, QI work tends to involve the implementation of multiple interventions to improve clinical processes, but ERAS is not routinely referred to as a QI intervention in the literature.
Poor conceptualisation of what makes an intervention a QI intervention prompted EJ to follow the recommendations of Shepperd et al,16 who suggest that known taxonomies (or rule sets) could aid intervention classification. Jones et al 21 used a taxonomy produced by Shojania et al 63 (table 1). This taxonomy was built on other well-established taxonomies of behaviour change interventions64–68 after examining QI across a range of medical specialities.69 Shojania et al 63 describe a QI intervention as something which can improve processes to support clinical activity (such as a reminder system to wash your hands). Therefore, not all interventions to improve quality of care are QI interventions, and the content of care (such as a new surgical technique) would not be classed as a QI intervention.9
EJ modified Shojania’s original taxonomy to stipulate that a QI intervention must be supported by a QI technique (such as a PDSA cycle) because QI techniques are often cited in the QI literature.48 70 Not all QI interventions require the use of a QI technique, but EJ and her authorship team used the wider literature to create a pragmatic operationalisation of what constitutes QI.21
Conversely, JF did not search for interventions, but frameworks and instruments for assessing or measuring improvement capability. However, JF faced similar challenges due to wide heterogeneity in the framework and instrument constructs. Rather than starting out with a narrow conceptualisation of QI and improvement capability, a broader literature was targeted. Keywords were selected to take account of the broad array of terminological heterogeneity across many disciplines, rather than a taxonomy of categories. Therefore this searching strategy required supplementation to identify further high-quality references.71 72 JF searched across several bibliographical databases with thesaurus terms, MeSH terms and broad-based terms. JF also used citation checking and reference searching to ‘snowball’ iteratively searching backwards and forwards on citations to find more obscure yet relevant articles until saturation was reached.73 JF operationalised inclusion criteria that improvement capability assessment instruments could only be included within the sample if they had been evaluated at least once and were supported by empirical data. Multiple uses or iterations of the same instrument were discarded.
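The ‘snowball’ procedure can be sketched as a simple traversal of a citation network. This is a minimal illustration under stated assumptions rather than JF’s actual workflow: the citation links are held in two hypothetical dictionaries, `is_relevant` stands in for the reviewer’s screening judgement, and ‘saturation’ is taken to mean that no unscreened relevant article remains.

```python
from collections import deque


def snowball(seeds, references, cited_by, is_relevant):
    """Follow citations backwards (references) and forwards (cited_by)
    from the seed articles until no new relevant article is found."""
    included = set(seeds)
    queue = deque(seeds)
    while queue:
        article = queue.popleft()
        neighbours = references.get(article, []) + cited_by.get(article, [])
        for candidate in neighbours:
            if candidate not in included and is_relevant(candidate):
                included.add(candidate)
                queue.append(candidate)
    return included


# Hypothetical citation network: article -> articles it cites / is cited by.
references = {"A": ["B", "X"], "B": ["C"]}
cited_by = {"A": ["D"], "C": ["E"]}
relevant = {"A", "B", "C", "D", "E"}  # "X" fails screening

found = snowball(["A"], references, cited_by, lambda article: article in relevant)
print(sorted(found))
```

In practice each candidate would be screened by a reviewer against the inclusion criteria, so the stopping point depends on those judgements as much as on the citation links.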
Our report provides a reflexive, critical overview of the process of reviewing the literature in the field of QI. Through clear documentation of reproducible methods of selecting articles for inclusion (such as using a known taxonomy or method) threats to the validity of our results could be reduced. However, the imperfections of these approaches are well known. Even when a research aid such as a taxonomy has been selected, it can be challenging to apply, especially when the terminology used is inconsistent and heterogeneous.36 37 74 For example, Colquhoun et al 75 explain that one author may use the behaviour change wheel76 to refer to an intervention as ‘environmental restructuring’, and another may use the EPOC (Cochrane Effective Practice and Organisation of Care) framework77 to describe the same thing as a ‘structural intervention’. This can add to confusion about how to select and implement the most appropriate taxonomy from the many aids to classification that exist in the literature.12 Also, the terminology used in Shojania et al’s63 taxonomy is not always consistent with the wider QI literature. While more time could have been invested to develop a new set of criteria, the divided opinions of researchers in the field may still have threatened its implementation. Therefore, developing a new taxonomy would have been an ‘over-engineered’ solution which would not necessarily be more helpful than what Jones et al 21 were able to produce. Asking experts whether they judge their work to sit within the field of QI is also an imperfect indicator of how QI should be defined, yet the literature is also imperfect.
The methods we suggest to identify scientific publications about QI are not infallible, but they are defensible. There is a large amount of heterogeneity between QI articles in terminology and methodology (diversity in intervention and study design).42 78–81 A ‘virtually infinite number of combinations of features and local environmental circumstances’ (p.244)82 also makes the synthesis of QI evidence extremely challenging. Additionally, terms such as QI and improvement capability do not have unproblematic widely agreed meanings. This necessitated judgements to be discerned from a variety of sources including existing literature, expert opinion and personal experience, but these judgements may not be universally accepted within the QI community. For example, EJ’s judgement that QI interventions do not include clinical interventions may be contentious for some, but based on the evidence available and epistemological standpoint, this was deemed wholly appropriate to satisfy the aims of the systematic review.
Recognising the value of a more plural epistemological outlook may also be important for authors reviewing the literature in complex or relatively new scientific fields. EJ followed a systematic review approach which is commonly associated with a positivist approach to gathering and synthesising knowledge. This was well suited to her research question: ‘what is the completeness of reporting of QI interventions in the perioperative literature?’. Conversely, JF adopted a constructivist perspective asking, ‘how is improvement capability conceptualised and assessed?’. While different review questions can drive the method selected for reviewing, the question alone cannot generate a typology of review method.83 We found value in exploring post hoc the different viewpoints from which we designed our review protocols, recognising their strengths, rather than polarities. Indeed, scholars in the field of QI itself argue that hard scientific evidence needs to be combined with an understanding of beliefs and ‘soft intelligence’ to ensure successful interventions can be spread and adopted within complex organisations such as the National Health Service.84 85
The problems we describe cannot be resolved immediately but methodological direction is developing.82 For example, Colquhoun et al,37 through an international working group, are investigating whether a single consensus on the taxonomic classification of interventions (including QI interventions) is possible. While in clinical practice and research, a degree of uncertainty and creativity can be celebrated to allow thinking to evolve,86 we suggest that it is important for the QI community as a whole to seek agreement on what constitutes the term QI for two key reasons. First, we must ensure that patients can gain the most benefit from systematic reviews of QI work. To translate the findings of QI into practice, reviewers must be able to identify the QI literature and accurately synthesise it so that robust conclusions may be drawn. Second, agreement on what constitutes QI is needed so that QI research capability can be better facilitated. Organisations offering funding or guidance may not be able to standardise eligibility for support if they struggle to identify which activities count as improvement and which do not. Such agreement may in time relieve the financial and social burden of poor-quality work.
After our reviews were published, we discussed the challenges we faced and how we resolved them during a series of reflective discussions by telephone and email. Through arbitration with a third author (WC), we were able to suggest what other authors could do to ease the process of literature-reviewing in complex fields. Our approach does not offer a definitive solution, but rather a starting point which should be critiqued and refined depending on the review question and aims. Our approach is based on our own reflections of using a systematic approach to reviewing literature in a complex field. There are many existing frameworks and resources (such as the PRISMA statement) to aid review authors, but by using the specific example of QI in healthcare we add to this literature, providing practical examples from our own experience. During our own work, in a field where there is much contention not just about what QI is, but where it fits in a landscape of other sciences, we made pragmatic choices to complete our reviews with finite time and resources. Therefore, we hope that our recommendations can be used as a starting point for authors in similar complex fields to strengthen the practice of systematic and literature reviews:
Maintain transparency and be as thorough as possible (using supplementary materials if required) to expose the challenges you faced to allow the reader to make judgements about the defensibility of your methodological choices.
Assemble a multiprofessional review team (our teams included clinicians, social scientists and engineers).
Ensure the review team can communicate in an open and honest way about uncertainties and compromises.
Conduct a literature scoping exercise to resolve ambiguities in defining the topic and key terms before starting the main review.
Apply a taxonomy or rule to aid classification of concepts.
Contact experts in related fields and authors of papers selected for full-text review to clarify whether a paper meets inclusion criteria if the literature review team cannot agree and ambiguity exists.
Maintain documentation of reflexivity and make this available to experts in your field.
Consider plurality as an approach when working with a multidisciplinary review team.
The QI community should work together as a whole to create a scientific field with a shared vision of what QI is and how literature can be accurately identified to sit within the field of QI. This in turn will allow successful application of solutions, such as those presented in this paper, to enable faster and more accurate identification and synthesis of QI evidence. This will ensure that effective QI work can be adopted rapidly and reliably with greatest impact for patient care. Our recommendations could also be helpful for systematic reviewers wishing to evaluate complex interventions in similar fields.
The authors thank Professor Natalie Armstrong (University of Leicester), Professor Graham Martin (THIS (The Healthcare Improvement Studies) Institute, University of Cambridge and the University of Leicester) and Professor Mary Dixon-Woods (The Healthcare Improvement Studies Institute, University of Cambridge) who provided comments on earlier manuscript drafts.
Contributors EJ planned and managed the overall conduct of this study. EJ, JF and WC contributed to writing the manuscript. EJ submitted the study.
Funding This work was completed with support from three PhD improvement science studentships for EJ, JF and WC from the Health Foundation.
Competing interests None declared.
Patient consent for publication Not required.
Provenance and peer review Not commissioned; externally peer reviewed.