Original Article
Predicting data saturation in qualitative surveys with mathematical models from ecological research
Context
Surveys with open-ended questions are a simple design to explore the different aspects of a concept in a given population [1]. This design is popular in many fields, including health research, social science, and marketing. For example, in health research, surveys may help identify the topics that should be addressed in items of patient-reported outcomes [2]. The use of open-ended questions allows respondents to describe with nuance and detail how they perceive the concept under study. By
Methods
We used mathematical modeling to determine the point of data saturation in surveys using open-ended questions. The aim of our work was not to predict the themes, ideas, and meanings that patients may raise on the topic of interest but rather to estimate how these new ideas are discovered and accumulated across the whole sample of participants during a study.
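The idea of themes accumulating across participants can be sketched as an empirical accumulation curve, the same construct ecologists use for species. The snippet below is a minimal illustration only, not the authors' predictive model, which is not specified in this excerpt. The function name `accumulation_curve`, its parameters, and the toy responses are all hypothetical.

```python
import random


def accumulation_curve(responses, n_permutations=200, seed=0):
    """Mean number of distinct themes found after 1..n participants.

    responses: list of sets, one set of theme labels per participant.
    The curve is averaged over random participant orderings so that it
    does not depend on the order in which people happened to answer.
    """
    rng = random.Random(seed)
    n = len(responses)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = responses[:]
        rng.shuffle(order)
        seen = set()
        for i, themes in enumerate(order):
            seen |= themes          # add any themes not seen so far
            totals[i] += len(seen)  # cumulative count of distinct themes
    return [t / n_permutations for t in totals]


# Hypothetical toy data: each set holds the themes coded for one respondent.
toy_responses = [
    {"cost", "time"},
    {"time"},
    {"side effects", "cost"},
    {"time", "travel"},
    {"cost"},
]
curve = accumulation_curve(toy_responses)
```

A curve that flattens as participants are added suggests that few new themes remain to be found; a predictive model would extrapolate such a curve beyond the observed sample.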
Performance of the model
In both our study of the burden of treatment and the simulated data sets, the model reliably predicted the number of themes to be found by doubling the sample size of a study. In our study of the burden of treatment, the prediction errors were <2% (the difference between expected and observed numbers of themes was at most 2 of 123 themes) with initial samples of 25, 50, 100, and 200 participants (Table 1 and Fig. 1).
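As a quick consistency check on the reported figures, a difference of 2 themes out of 123 is about 1.6%, which is under the 2% bound. The sketch below assumes the error is expressed relative to the 123 observed themes; the predicted value of 121 is a hypothetical number chosen only to give a difference of 2.

```python
def relative_prediction_error(predicted, observed):
    """Absolute difference between predicted and observed, as a fraction of observed."""
    return abs(predicted - observed) / observed


# Hypothetical worst case: prediction off by 2 themes out of 123 observed.
worst_case = relative_prediction_error(predicted=121, observed=123)
worst_case_percent = round(worst_case * 100, 1)
```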
The excellent predictive capability of the model was confirmed with the first group
Discussion
In this study, we showed that models used in ecology to determine species richness could help with qualitative research involving surveys with open-ended questions to predict how many themes will be discovered with the inclusion of more units of analysis. Determining when to stop data collection is a thorny question asked by both novice and experienced researchers in qualitative research [6]. However, there is a surprising paucity of explicit discussion of this basic issue in textbooks and articles
Conclusions
In surveys with open-ended questions, the point of data saturation and number of participants to include can be estimated with mathematical models from ecological research.
Acknowledgments
The authors thank Laura Smales (BioMedEditing) for editing.
Authors' contributions: V.-T.T., R.P., V.-C.T., and P.R. conceived and designed the experiments. V.-T.T. and R.P. analyzed data. V.-T.T. wrote the first draft of the article. V.-T.T., R.P., V.-C.T., and P.R. contributed to the writing of the article. V.-T.T., R.P., V.-C.T., and P.R. met ICMJE criteria for authorship. V.-T.T., R.P., V.-C.T., and P.R. agreed with article results and conclusions. P.R. is the guarantor, had full access to
References (17)
- et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol (2007)
- The logic of qualitative survey research and its position in the field of social research methods. Forum Qual Social Res (2010)
- et al. The discipline and practice of qualitative research
- et al. The discovery of grounded theory: strategies for qualitative research (1967)
- et al. How many interviews are enough? An experiment with data saturation and variability. Field Methods (2006)
- et al. How many qualitative interviews are enough? Expert voices and early career reflections on sampling and cases in qualitative research (2012)
- Sample size in qualitative research. Res Nurs Health (1995)
- et al. The species accumulation curve and estimation of species richness. J Anim Ecol (2003)
Conflict of interest: None.
Funding: This study was funded by the French Health Ministry (PHRC AOM13127). Our team is supported by an academic grant from the program “Equipe espoir de la Recherche,” Fondation pour la Recherche Médicale, Paris, France (no. DEQ20101221475). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article.