Article Text

Quality of locally designed surveys in a quality improvement collaborative: review of survey validity and identification of common errors
  1. Julie E Reed1,2,
  2. Julie K Johnson3,
  3. Robert Zanni4,
  4. Randy Messier5,
  5. Fadi Asfour6,
  6. Marjorie M Godfrey5
  1. Julie Reed Consultancy Ltd, London, UK
  2. Halmstad University School of Health and Welfare, Halmstad, Sweden
  3. Northwestern Quality Improvement, Research, and Education in Surgery, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
  4. Robert Wood Johnson Barnabas Health Medical Group, Monmouth Medical Center, Long Branch, New Jersey, USA
  5. University of New Hampshire, Durham, New Hampshire, USA
  6. UTHSC, Utah, Utah, USA
  Correspondence to Professor Marjorie M Godfrey; margiegodfrey@gmail.com

Abstract

Objective Surveys are a commonly used tool in quality improvement (QI) projects, but little is known about the standards to which they are designed and applied. We aimed to investigate the quality of surveys used within a QI collaborative, and to characterise the common errors made in survey design.

Methods Five reviewers (two with research methodology and QI expertise, three with clinical and QI expertise) independently assessed 20 surveys, comprising 250 survey items, that were developed in a North American cystic fibrosis lung transplant transition collaborative. Content Validity Index (CVI) scores were calculated for each survey. Reviewer consensus discussions decided an overall quality assessment for each survey and survey item (analysed using descriptive statistics) and explored the rationale for scoring (using qualitative thematic analysis).

Results 3/20 surveys scored as high quality (CVI >80%). 19% (n=47) of survey items were recommended by the reviewers, with 35% (n=87) requiring improvements and 46% (n=116) not recommended. Quality assessment criteria were agreed upon. Common error types related to the ethics and appropriateness of questions and survey format; the usefulness of survey items to inform learning or lead to action; and methodological issues with survey questions, survey response options and overall survey design.

Conclusion Survey development is a task that requires careful consideration, time and expertise. QI teams should consider whether a survey is the most appropriate form for capturing information during the improvement process. There is a need to educate and support QI teams to adhere to good practice and avoid common errors, thereby increasing the value of surveys for evaluation and QI. The methodology, quality assessment criteria and common errors described in this paper can provide a useful resource for this purpose.

  • quality improvement methodologies
  • collaborative, breakthrough groups
  • surveys
  • healthcare quality improvement

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • Front-line improvement teams frequently default to the use of surveys to gather information to guide improvement activities; however, little is known about the quality of surveys developed by local quality improvement (QI) teams, and poorly designed surveys may misguide improvement activities.

WHAT THIS STUDY ADDS

  • This study demonstrates the variable quality of locally developed surveys for QI, indicating survey development requires careful consideration, time and expertise.

  • Key lessons are identified highlighting common errors in survey design relating to ethics, appropriateness, usefulness and methodological issues.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • This study presents a method for peer-reviewing surveys that can be applied by other QI teams and collaboratives.

  • The findings and key lessons can inform education about the design and development of surveys for front-line improvement teams.

Introduction

Measurement plays a central role in all quality improvement (QI) approaches.1–4 However, research has demonstrated that QI methods and approaches are not always used with high fidelity or scientific rigour,5 6 leading to calls to improve the quality of QI,7 including in the areas of measurement and evaluation.8

Research exploring the use of quantitative data in improvement efforts has highlighted the challenges of developing useful metrics, including precise measure definitions, reliable collection of high-quality data and appropriate analysis, interpretation and action in response to results.4 9 While these challenges exist in any nationally or regionally driven improvement effort, such problems are particularly prominent in local QI efforts where well-intended teams may not have extensive skills or experience in developing and using effective quantitative measures.4 While the literature on the challenges of quantitative measures and how to overcome them is growing, no equivalent investigation has taken place for the development and use of surveys to gain information as part of the QI process.

Surveys are a popular tool in QI that provide a structured approach to capture information from patient or staff respondents about their experiences, opinions, views and impressions.10 While surveys can include space for qualitative written responses, their power comes from their ability to translate qualitative information into semi-quantified data amenable to statistical analysis. For example, asking about levels of patient satisfaction using a 5-point Likert scale (ranging from very satisfied to very dissatisfied) translates a qualitative subjective opinion into a data point. This allows (a) descriptive statistics to be performed at the population level (eg, 83% of patients were very satisfied with the service), (b) comparative statistics to be performed between discrete populations (eg, more patients were very satisfied at hospital X than at hospital Y) or (c) comparisons to be made over time (eg, patient satisfaction increased from 40% to 60%).

Research to date has focused on the use of surveys in QI that have been designed for wide-scale use.11 12 The challenges of developing validated surveys are well recognised, and significant effort and expertise have been invested in survey development for national surveys and by clinical specialist groups, for example, patient experience and outcome measures, and patient safety.13–15 However, in QI initiatives, surveys are usually developed at a more granular local level to support bespoke investigation, evaluation and improvement efforts. The quality of survey instruments developed in such settings has not been explored.

This study aims to assess the quality of surveys produced by QI teams in a QI collaborative. The quality of surveys is often assessed using professional consensus methods such as the Content Validity Index (CVI).16 17 Such methods identify high-quality survey items based on a high proportion of favourable opinions among a group of independent assessors with relevant expertise. While such methods provide a subjective assessment that identifies problematic survey items, they do not provide insights as to why the decisions were made, and therefore an opportunity is lost to draw on the reviewers' insights to inform the design of future surveys. Therefore, this study also aims to explore the reasons behind the survey assessment scores in order to identify lessons about common errors of survey design, to help other QI teams avoid common pitfalls and strengthen local evaluations.

This study conducted primary research within a multi-site QI collaborative. Due to the ongoing and expanding nature of the collaborative, there was an opportunity for this research to inform the future work of current sites participating in the collaborative and of new sites joining as the collaborative expands. As such, this research also sought to make pragmatic contributions to the collaborative by identifying which surveys and survey items could be recommended for use by the teams, and by providing guidance to the teams on how to improve the quality of any bespoke surveys they developed. In addition, we believe the demonstration of a method for reviewing survey quality, and insights into common errors in survey design, will provide valuable guidance to ‘improvers’ at the front line of care delivery developing QI surveys, and to those responsible for running QI collaboratives or other large programmes of improvement work.

Methods

Setting: the cystic fibrosis lung transplant transition learning and leadership collaborative

The Cystic Fibrosis Foundation, based in the USA, has a long tradition of organising to improve care for people with cystic fibrosis (CF) and their families. In 2016, people with advanced CF lung disease who had undergone lung transplantation reported that, after their transplant, they felt they were no longer part of the CF family and that the referral processes from CF programmes to transplant programmes were ‘broken’.18

The CF Lung Transplant Transition Learning and Leadership Collaborative (CF LTT LLC, herein referred to as ‘the collaborative’), launched in 2017, was adapted from twenty years of experience designing improvement collaboratives for people with CF (led by MMG). The collaborative methodology was based on the original Institute for Healthcare Improvement Breakthrough Series framework19 and was modified to include the microsystem improvement process, including people with CF and family members,20 21 and team coaching.16 The original CF LLC programme was adapted for the collaborative to improve not just one microsystem, but two microsystems (CF referral and lung transplant programmes individually) and the mesosystem of CF lung transplantation (CF referral and lung transplant programmes together), with a shared purpose to improve care for people with advanced CF lung disease.

The aim of the collaborative was ‘within the context of a learning community, explore, improve and decrease practice variation in the systems and processes of lung transplant referrals and transitions from CF programmes to transplant programmes and then to a model of shared responsibility for the patient’s care’. In 2017, the original CF LTT collaborative launched 10 pilot pairs of CF referring and lung transplant improvement teams in the USA and Canada. After 18 months of the collaborative, a CF LTT regional dissemination network (RDN) was created to share the findings and lessons learned from the pilot programme with new CF referral programmes in each of the regional lung transplant programmes. CF LTT RDN wave 1 was launched in March 2019, engaging 10 new CF referring improvement teams, with wave 2 joining in June 2019 (six new CF referring teams) and wave 3 in September 2019 (seven new CF referring teams). This resulted in 33 CF teams partnering with 10 regionally based lung transplant teams. The surveys produced by these teams were reviewed in this study.

Within the LLC methodology, the teams were encouraged to explore their local microsystem to understand areas for improvement. Surveys were not a prescribed form of measurement but were suggested as a potential method to understand patient and staff perceptions to inform improvement work.

Review of surveys

An overview of the process of review of surveys is shown in figure 1.

Figure 1

Process of survey collection and analysis.

Data collection

Surveys were collated from all CF referring and lung transplant sites. As part of established knowledge-sharing procedures, teams shared copies of surveys in the collaborative ‘compendium’ (a 280-page document containing details of activities and lessons learned from all sites). This document was searched for surveys that were included in full or mentioned within site reports. Where surveys were mentioned but not included, copies were requested by email and obtained from sites. In addition, an email cascade through CF Quality coaches and clinical teams attempted to elicit any other surveys.

Data from the surveys were extracted into an Excel spreadsheet, including the survey name and each individual survey item (comprising a question and its response options). This spreadsheet acted as the data collection tool for individual reviewers to add quality scores and comments.

Independent survey scoring

An interprofessional panel of five people reviewed and rated each individual survey item, and each survey overall, for content validity using a trichotomous rating scale. A trichotomous scale was chosen to capture the range of views held by the review panel, and to inform future action in relation to specific survey items and surveys.

Two of the reviewers (JER and JJ) had research expertise and focused on methodological quality, scoring items as ‘good’, ‘fair’ or ‘poor’. Three of the reviewers (RM, RZ, FA) had topic-specific expertise (relating to the clinical issue of CF lung transplant and QI coaching) and focused on the usefulness of the questions to inform learning, action and improvement, scoring each item or survey as ‘very useful’, ‘somewhat useful’ or ‘not useful’. JJ, RM and RZ had previously worked with the study sites in their role as QI coaches.

CVI calculation

The independent reviewer scores were used to calculate a CVI value for each survey.17 22 The CVI calculates the proportion of items on an instrument that achieved a favourable rating by the reviewers: only a survey item that stands up to scrutiny by multiple reviewers is considered to be of good enough quality to be recommended for inclusion in a survey. The use of a diverse panel of reviewers is preferable for the CVI in order to identify as many ‘problems’ with a survey item as possible. It is expected that each individual reviewer will identify different types and numbers of problems, thus strengthening the quality of the review.23–25 As a result, inter-rater reliability is expected to be low across reviewers: what matters is the identification of questions that no one can find fault with.

While the trichotomous scale was valuable in aiding the conduct of the review and in guiding actions in response to it, a dichotomous scale is needed to calculate the CVI. The CVI was therefore calculated by transforming the trichotomous reviewer scale into a dichotomous scale, where ‘good’ (the top methodological score) or ‘very useful’ (the top utility score) equated to a favourable assessment (scoring 1.0), and ‘fair’, ‘poor’, ‘somewhat useful’ and ‘not useful’ equated to an unfavourable assessment (scoring 0.0). The CVI value for each survey was calculated by averaging the favourability scores for each question across the five reviewers, and then averaging these per-question scores across the survey. A CVI greater than 0.8 (80%) demonstrates a high level of agreement that the questions meet the reviewers’ standards; a CVI below 0.8 suggests the survey does not adequately meet the reviewers’ standards and would require substantial revision before further use.
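
To make this calculation concrete, the sketch below (in Python, using hypothetical ratings rather than data from this study) illustrates the dichotomisation and two-stage averaging described above.

```python
# Minimal sketch of the CVI calculation described above.
# The ratings are hypothetical examples, not data from this study.
FAVOURABLE = {"good", "very useful"}  # top methodological / utility ratings

def item_cvi(ratings):
    """Proportion of reviewers giving the item a favourable (top) rating."""
    return sum(1.0 if r in FAVOURABLE else 0.0 for r in ratings) / len(ratings)

def survey_cvi(items):
    """Average of the per-item favourability scores across the whole survey."""
    return sum(item_cvi(ratings) for ratings in items) / len(items)

# Hypothetical three-item survey rated by five reviewers
# (two methodological reviewers followed by three topic-specific reviewers).
example_survey = [
    ["good", "good", "very useful", "very useful", "very useful"],
    ["good", "fair", "very useful", "somewhat useful", "very useful"],
    ["poor", "fair", "not useful", "somewhat useful", "not useful"],
]

cvi = survey_cvi(example_survey)
print(f"CVI = {cvi:.2f}")  # 0.53 for this hypothetical survey
print("meets 0.8 benchmark" if cvi > 0.8 else "requires substantial revision")
```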

Consensus discussions

Normally, CVI assessments stop after the independent assessment, as the focus is to identify items for which there is a favourable consensus, rather than to understand why different reviewers were unfavourable about discarded items. However, in this study we were interested in understanding why the decisions were reached and what could be learnt from the different reviewers’ perspectives on the errors they identified. In addition, from a practical point of view, there was an opportunity to provide specific feedback to the collaborative programme to inform modification of individual survey items and to inform future survey development.

The conversations in the consensus review meetings (9.5 hours in total) were rich discussions in which different viewpoints were considered and a shared understanding of error types was developed.

Consensus discussions first took place within the two separate review groups to reach consensus on the methodological and usefulness scoring. A final consensus discussion was held between all five reviewers to decide whether the surveys and questions were both of good quality and of high use to the clinical teams. Given that a practical consideration was the recommendation of surveys to future sites in the collaborative, the consensus scores were ‘recommend’, ‘requires improvement’ and ‘do not recommend’. This was a rigorous quality assessment, in that a question or survey could only be considered ‘recommended’ if it was both very useful and of good methodological quality.

Survey review data analysis

The consensus scores were analysed in Excel using descriptive statistics to determine how many items received each score, and to understand the percentage agreement between different reviewers and reviewer groups.

The error types were coded and analysed thematically in Excel, and then discussed and further refined by the authors until agreement was reached.

During these discussions the reviewers articulated, and then iteratively developed, a description of the survey quality criteria that they had used to assess the surveys, until consensus was achieved. This knowledge had been implicit during the independent scoring, informing each reviewer’s mental model for the decisions they reached. Through consensus discussion the reviewers made these quality criteria explicit and formalised them as descriptions of each quality assessment category.

Results

Surveys

In total, 27 surveys were identified. Seven surveys were removed because they were duplicates or out of scope of the study (eg, surveys not developed for the CF LTT collaborative). Twenty surveys, containing a total of 250 individual questions, were retained for full analysis (average 12 questions per survey, ranging from 3 to 46 questions). Each of the 10 regions had developed at least one survey, with an average of two per region and a maximum of four. Fourteen surveys were for patients, and six were for staff. Topics covered by the surveys included patient experience and satisfaction with services, patient preferences and expectations, staff experience, staff education and confidence, and intervention impact assessment (eg, educational training sessions).

Survey review findings

CVI scores

The CVI score is a composite measure of independent assessments of survey item quality.17 22 The CVI score can range from 0 (poor quality, no survey items received favourable scores from reviewers) to 1 (high quality, all survey items received favourable scores from all reviewers). The average CVI score for the surveys was 0.54, ranging from 0.18 to 0.98. Three of the 20 surveys (15%) scored above the 0.8 quality benchmark. The scores for all three of these surveys were very high (S2=0.93; S4=0.96; S8=0.98), indicating strong agreement of favourability from the reviewers (see figure 2).

Figure 2

Bar chart shows the Content Validity Index (CVI) score by survey where 1.0 equals consensus on favourability of all survey items, and 0.0 equals consensus on unfavourability of all survey items. The minimum required agreement level of 0.80 is indicated by a grey dashed line. The colours of the bars represent the group agreement on the surveys arrived at after consensus discussion: green—recommend; amber—requires improvement; red—not recommended. Survey S19 was scored as requires improvement (useful topic but problems methodologically). However, a consensus decision was made to amend this score to not recommend due to the existence of a validated questionnaire exploring the same topic, making any potential improvements redundant. Hence, S19 is marked with red and amber stripes to show the agreed change to final recommendation status.

Consensus development of survey quality assessment criteria

Consensus was reached on all assessments. However, as expected, the percentage agreement between the independent reviewers prior to the consensus discussions was low.26–28 The individual question assessments had only 54% agreement between the methodological reviewers, and two-way agreement between each pair of the topic-specific reviewers ranged between 47% and 55% (three-way agreement between all topic-specific reviewers was 36%). Overall survey assessment agreement was higher, with 75% agreement between methodological reviewers, and two-way agreement between topic-specific reviewers ranging between 40% and 60% (three-way agreement 35%). Prior to consensus discussions there was also very low agreement between reviewers scoring for methodological quality and those scoring for usefulness (18% agreement between all five reviewers for individual questions, and 25% for overall surveys). This suggests that the reviewers brought a wide range of perspectives on what is problematic in a survey item, thereby strengthening the ability of the panel to exclude problematic questions.
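
The percentage agreement figures reported here are simple proportions of items on which reviewers gave identical ratings. A minimal sketch of that calculation is shown below (Python, with hypothetical ratings; the study’s exact tabulation in Excel may have differed).

```python
from itertools import combinations

def pairwise_agreement(ratings_a, ratings_b):
    """Proportion of items on which two reviewers gave the same rating."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)

def all_way_agreement(*rating_lists):
    """Proportion of items on which every reviewer gave the same rating."""
    return sum(len(set(item)) == 1 for item in zip(*rating_lists)) / len(rating_lists[0])

# Hypothetical ratings from three topic-specific reviewers on five items.
r1 = ["very useful", "somewhat useful", "not useful", "very useful", "not useful"]
r2 = ["very useful", "not useful", "not useful", "somewhat useful", "not useful"]
r3 = ["somewhat useful", "not useful", "not useful", "very useful", "very useful"]

reviewers = {"reviewer 1": r1, "reviewer 2": r2, "reviewer 3": r3}
for (name_a, a), (name_b, b) in combinations(reviewers.items(), 2):
    print(f"{name_a} vs {name_b}: {pairwise_agreement(a, b):.0%}")
print(f"three-way agreement: {all_way_agreement(r1, r2, r3):.0%}")
```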

Following completion of the consensus discussions there was still only 59% agreement between the methodological and usefulness reviewers, suggesting that the two groups of reviewers were indeed assessing different facets of quality in considering what was of methodological quality versus what was useful for QI.

As demonstrated by the CVI analysis, there was strong agreement on the assessment of the three surveys that scored above the 0.8 CVI threshold. In this subset of surveys there was 93% agreement between the methodological reviewers, and two-way agreement between the topic-specific reviewers ranged between 86% and 97% (three-way agreement 86%).

There was a strong tendency for consensus discussions to downgrade the rating of an item in response to new and valid concerns raised by the reviewers. For example, of the 114 questions on which the methodological reviewers’ independent assessments disagreed, only 5 (4%) were rated as ‘good’ in consensus discussions, whereas 68 (60%) were rated as ‘fair’ and 41 (36%) were rated as ‘poor’. This is consistent with the expectation that a panel review of survey items identifies high-quality questions by eliminating any problematic ones.

From the consensus discussions the reviewers agreed on standard definitions for assessment criteria (table 1).

Table 1

Survey quality assessment criteria

Survey quality assessment consensus scoring

Following the consensus discussion, of all of the individual survey items scored, only 26% (n=65) scored as good methodological quality, with 40% (n=101) and 34% (n=84) scoring as fair and poor methodological quality respectively. A higher proportion of the survey items were scored as very useful (38%, n=95), with 21% (n=53) and 41% (n=102) as somewhat useful and not useful, respectively. In the overall quality assessment of survey items (combining methodological score and usefulness score) only 19% (n=47) of survey items were recommended, with 35% (n=87) requiring improvements, and 46% (n=116) not recommended (figure 3).

Figure 3

Quality assessment of survey items.

For the overall surveys, only 20% (n=4) were scored as methodologically good, with 55% (n=11) scoring fair and 25% (n=5) scoring poor. In terms of overall survey usefulness, 30% (n=6) scored very useful, 50% (n=10) somewhat useful and 20% (n=4) not useful. For the overall quality assessment of surveys, only 15% (n=3) were recommended, the large majority (65%, n=13) required improvement, and 20% (n=4) were not recommended (see online supplemental file 1).

Supplemental material

Of the three surveys that were ‘recommended’, 25 of the 29 survey items were rated recommended and four required improvement (S2: 9 survey items, 6 recommended, 3 required improvement; S4: 9 survey items, 8 recommended, 1 required improvement; S8: all 11 items recommended).

Common errors

Following consensus discussions, thematic analysis was conducted on the reasons that surveys and survey items were scored unfavourably (‘fair’ or ‘poor’, ‘somewhat useful’ or ‘not useful’; n=203 questions and n=17 surveys). Twenty-three error types were identified and grouped under six themes covering the ethics and appropriateness of questions and survey format; the usefulness of survey items to inform learning or lead to action; and methodological issues with survey questions, survey response options and overall survey design (see tables 2 and 3 for details of the error types, with example survey items demonstrating each error type).

Table 2

Common error types in survey design

Table 3

Common error types in survey design

Many of the survey items had multiple errors. For example, the question ‘Would you ever consider a lung transplant?’ (response: yes; no; maybe) (S3:5) was considered unethical due to its potential to cause distress (eg, if this was the first time a patient was learning about lung transplant as a treatment option); unsuitable for survey format due to the complexity of the answer (eg, lung transplant is a life-changing decision with a risk of mortality, and any individual patient decision will be influenced by multiple complex factors), which would therefore be better discussed in person; and, given the simplicity of the response options, unlikely to generate useful learning that could be acted on by the improvement team.

Discussion

Summary of findings

This study demonstrates the variable quality of surveys developed by local, well-intended QI teams, with only a small proportion of surveys and survey items being recommended for use by the review panel. These findings echo similar results highlighting the low-quality use of quantitative measurement by local QI teams,4 and suggest that, as with quantitative measurement, developing surveys is a highly technical task that requires time and expertise to produce reliable and meaningful measures.29

The consensus discussions of the surveys highlighted the complex, multifaceted and nuanced examination of detail required for rigorous assessment of the methodological quality and usefulness of survey items. While each individual assessor came with relevant expertise, the diversity of views created a rich and dynamic discussion: where one reviewer had a favourable opinion of a survey item, another reviewer might identify a problem reflecting their particular knowledge, expertise and experience. This emphasises the value of drawing on multiple expert perspectives to identify flaws in face and content validity. If multiple diverse expert opinions are unable to find fault with a survey item, it is likely to be of high validity.17

During the review of surveys it became clear that ethics was an important consideration: whether it was appropriate and sensitive to ask patients specific questions that might cause distress, particularly in survey form. This was felt to be of high importance for the CF patient population given the serious morbidity and mortality associated with the disease and the young average age of those being surveyed. These findings resonate with previous calls for ethical oversight of QI activities.30

More broadly, reviewers questioned whether a survey was the most appropriate method to obtain responses, particularly where there were open-ended questions with complex answers that would be better suited to interview. The reviewers’ concerns resonate with previous studies that have explored the value and limitations of surveys versus interviews in understanding patient experience to inform QI, and suggest the need to better educate and support QI teams to be aware of the variety of methods available, and when and how best to use such methods to inform learning.31 32 The burden of surveys on patients and staff was also considered, especially in relation to obtaining feedback on small Plan-Do-Study-Act tests of change, where it was felt that a face-to-face conversation between staff would be a more efficient and effective way of obtaining feedback, or where existing quantitative data could be analysed rather than relying on patient (or staff) recall. These findings echo the call made by Meyer et al to streamline the growing volume of metrics used in QI, and to consider the parsimony and burden of metrics across a project, service or organisation.33

The common error themes of ethics and appropriateness, and of the usefulness of survey items to inform learning or lead to action, are a unique contribution of this research, reflecting issues of primary concern to the healthcare improvement community. The methodological themes, on the other hand, reflect well-known errors in survey design.17 34 The large number of methodological issues identified in the surveys suggests more is required to educate and support QI teams in developing quality surveys. All of these findings emphasise the importance of the normal steps of survey development, which should include iterative cycles of testing and development to assess and refine survey items. Survey developers should strive to put themselves in the shoes of the patients (or staff) intended to use the survey, ideally engaging representative respondents in the design and development of surveys, and ensuring that rigorous peer review and piloting of the surveys take place. Importantly, any survey should have a clear purpose that directly links to improvement goals, and a clear plan for how any data collected will inform the learning and action of the QI initiative.

Developing good quality surveys is a highly technical and time-consuming task. Based on the evidence of this study, we suggest that teams think carefully before deciding to embark on developing a new survey: in terms of the ethics, the appropriateness of the survey format, the burden on staff and patients, and the usefulness of the data collected to directly lead to learning and action to improve quality. Only if teams are satisfied that their needs meet all of these criteria should they proceed, and then with caution, ensuring there is sufficient time and resource to properly validate and pilot surveys before using them in practice. Given our experience in conducting this review, we would advocate the establishment of oversight groups to ensure the appropriateness and quality of surveys being used within a collaborative, which is a time-efficient approach to supporting QI teams.

Methodological considerations and further research

This study conducted a rigorous assessment of the content validity of the collated surveys using multiple expert reviewers. Further research could be conducted on the recommended surveys including cognitive interviewing and survey piloting (including data collection, cleaning and analysis), which will likely identify further areas for improvement. In addition, this study was limited to the review of the actual surveys, and did not consider any (formal or informal) protocols for sampling, data collection, cleaning or analysis.

This study is limited in that it considered only the reviewers’ assessments of the surveys. Three of the five reviewers had prior engagement coaching a small number of teams within the collaborative. This was a strength in terms of expert knowledge and contextual understanding of the relevance and utility of the surveys, but is also a limitation as a potential source of bias. The inclusion of five interprofessional reviewers who scored independently was used to reduce the risk of bias and increase the trustworthiness of the data. The QI teams that developed the surveys were not interviewed or observed to understand their perceptions of the survey instruments, or to establish the true utility of the surveys in a practice setting. Further research should be conducted to explore how locally developed surveys are perceived by QI teams, how they are used in practice and what, if any, impact they have on the QI initiative. For example, it may well be that a survey with poor face validity nonetheless provokes useful insights and leads to meaningful change. Understanding the holistic value of surveys in a practical QI setting is critical to inform how much time and resource should be invested. Successful improvements have been observed in response to patient surveys where there is strong QI infrastructure and culture to support acting on survey results,23 24 suggesting it is also important to understand the context in which surveys are being used.25

Implications for research and practice

This research was of direct value to the CF LTT Learning and Leadership Collaborative, highlighting which surveys were of high quality and which required improvement, helping to focus efforts on gathering information of interest. In addition, the identification of high-quality surveys through the review has the potential to save improvement teams time, avoiding the need for teams to create, format and test their own surveys when they can instead build on the experience of others in the collaborative. The reviewed surveys also serve to increase the quality of information gathered to inform improvement plans.

This research is also of value to other collaboratives and QI teams, providing advice to think rigorously about the value and science of survey development. We recommend that leaders of larger QI collaboratives invest in expertise to support cross-site survey design and timely review, to increase the rigour of measurement and its value to clinical teams and evaluators while meeting the pace at which QI projects operate.

The methodology used in this paper can be applied to the review of surveys in other QI collaboratives to help improve the quality of survey instruments. The quality assessment criteria and common errors identified through consensus discussion provide explicit guidance to others developing or reviewing surveys in QI. This list is not intended to be comprehensive, as it solely reflects the issues identified within this study, but it can act as a valuable starting point to guide review and expansion. The time invested by the reviewers was considered a valuable complement to the extensive effort already invested by all of the local QI teams.

This research also has implications for other researchers in considering what it means to apply QI approaches with fidelity and how learning can be supported in complex systems, and for the organisational resource infrastructure required to effectively support improvement in practice.

Conclusion

The development of surveys requires careful consideration, time and expertise. Before developing a survey, consideration should be given to whether a survey is the most appropriate form for capturing information, and whether a survey best meets the needs of the QI team and the targeted population. Once QI teams decide that a survey is the best way to gain knowledge from patients and/or staff, multiple issues require consideration to ensure the rigorous design of the survey. There is a need to educate and support QI teams to adhere to good practice and avoid common errors, thereby increasing the value of surveys for evaluation and QI. The methodology, quality assessment criteria and common errors described in this paper can provide a useful resource for others, and highlight the value of having an oversight group to ensure the quality of surveys and to facilitate the sharing of learning between improvement teams.

Patient and public involvement

Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research.

Data availability statement

No data are available.

Ethics statements

Patient consent for publication

Ethics approval

IRB approval was obtained from University of New Hampshire IRB: UNH IRB-FY2021-60.

Acknowledgments

We would like to acknowledge the Cystic Fibrosis Foundation leadership and all members of the CF LTT LLC and RDN communities who made this research possible.

References

Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

Footnotes

  • Contributors MMG conceived the study idea and is the guarantor of the research and publication. JER designed the study with input from JJ. JER led the conduct of the study with contributions made by JJ, RZ, RM and FA to data collection and analysis. All authors were involved with interpretation of the findings and writing of the manuscript. All authors have read and approved the final manuscript.

  • Funding This research was supported by award number GODFRE20QI2 from the Cystic Fibrosis Foundation. The authors gratefully acknowledge the financial support provided by the Cystic Fibrosis Foundation. MMG receives CF improvement collaborative grant funding from the CF Foundation. RZ, FA, RM, JJ are CF Quality team coach consultants for CF improvement collaboratives and JR was contracted as an independent improvement scientist consultant to conduct this research.

  • Competing interests None declared.

  • Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.