Predicting data saturation in qualitative surveys with mathematical models from ecological research

doi:10.1016/j.jclinepi.2016.10.001

Journal of Clinical Epidemiology

Volume 82, February 2017, Pages 71-78.e2

Abstract

Objective

Sample size in surveys with open-ended questions relies on the principle of data saturation. Determining the point of data saturation is complex because researchers have information on only what they have found. The decision to stop data collection is solely dictated by the judgment and experience of researchers. In this article, we present how mathematical modeling may be used to describe and extrapolate the accumulation of themes during a study to help researchers determine the point of data saturation.

Study Design and Setting

The model considers a latent distribution of the probability of elicitation of all themes and infers the accumulation of themes as arising from a mixture of zero-truncated binomial distributions. We illustrate how the model could be used with data from a survey with open-ended questions on the burden of treatment involving 1,053 participants from 34 different countries and with various conditions. The performance of the model in predicting the number of themes to be found with the inclusion of new participants was investigated by Monte Carlo simulations. Then, we tested how the slope of the expected theme accumulation curve could be used as a stopping criterion for data collection in surveys with open-ended questions.
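The idea of a theme accumulation curve can be illustrated with a minimal sketch. It is not the authors' fitting procedure: it assumes the elicitation probabilities are known, that participants mention themes independently, and that the values in `probs` (invented here) are plausible. Under those assumptions the number of participants mentioning a theme is binomial, themes mentioned by nobody stay unobserved (hence zero truncation of the observed counts), and the expected number of distinct themes after n participants is the sum over themes of 1 - (1 - p)^n. The names `expected_themes` and `simulate_themes` are hypothetical.

```python
import random

def expected_themes(probs, n):
    """Expected number of distinct themes elicited by n participants,
    assuming each participant mentions theme i independently with probability probs[i]."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)

def simulate_themes(probs, n, rng):
    """One Monte Carlo draw: count the themes mentioned by at least one of n participants."""
    found = 0
    for p in probs:
        if any(rng.random() < p for _ in range(n)):
            found += 1
    return found

# Hypothetical elicitation probabilities (assumption): a few common themes, many rare ones.
probs = [0.30] * 10 + [0.05] * 40 + [0.01] * 50

rng = random.Random(0)
runs = [simulate_themes(probs, 100, rng) for _ in range(200)]
mc_mean = sum(runs) / len(runs)

print(round(expected_themes(probs, 100), 1), round(mc_mean, 1))
```

In a real study the probabilities are latent and must be inferred from the observed data, which is the part this sketch leaves out.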

Results

After initial samples of 25 to 200 participants, the model reliably predicted the number of themes that would be found by doubling the sample size. Mean estimation error ranged from 3% to 1% with simulated data and was <2% with data from the study of the burden of treatment. Sequentially calculating the slope of the expected theme accumulation curve for every five new participants included was a feasible way to weigh the benefit of including these new participants in the study. In our simulations, a stopping criterion based on a value of 0.05 for this slope allowed for identifying 97.5% of the themes while limiting the inclusion of participants eliciting nothing new in the study.
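A slope-based stopping rule of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slope is taken here as the expected number of new themes per additional participant over the next block of five participants, the expected curve comes from known probabilities rather than from a fitted model, and `probs`, `stopping_point`, and `max_n` are hypothetical.

```python
def expected_themes(probs, n):
    """Expected number of distinct themes after n participants (independence assumed)."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)

def stopping_point(probs, threshold=0.05, step=5, max_n=5000):
    """Return the first sample size (a multiple of `step`) at which the slope of the
    expected accumulation curve drops below `threshold`, or None if never reached."""
    n = step
    while n <= max_n:
        # Expected new themes per additional participant over the next `step` participants.
        slope = (expected_themes(probs, n + step) - expected_themes(probs, n)) / step
        if slope < threshold:
            return n
        n += step
    return None

# Hypothetical elicitation probabilities (assumption, not data from the study).
probs = [0.30] * 10 + [0.05] * 40 + [0.01] * 50

n_stop = stopping_point(probs)
share_found = expected_themes(probs, n_stop) / len(probs)
print(n_stop, round(share_found, 3))
```

A lower threshold finds more of the rare themes at the cost of including more participants who add nothing new, which is the trade-off the criterion is meant to make explicit.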

Conclusion

Mathematical models adapted from ecological research can accurately predict the point of data saturation in surveys with open-ended questions.

Context

Surveys with open-ended questions are a simple design to explore the different aspects of a concept in a given population [1]. This design is popular in many fields, including health research, social science, and marketing. For example, in health research, surveys may help identify the topics that should be addressed in items of patient-reported outcomes [2]. The use of open-ended questions allows respondents to describe with nuance and detail how they perceive the concept under study. By

Methods

We used mathematical modeling to determine the point of data saturation in surveys using open-ended questions. It is important to note that the aim of our work was not to predict the themes, ideas, and meanings that patients may elicit on the topic of interest but rather to estimate how these new ideas are discovered and accumulated across the whole sample of participants during a study.

Performance of the model

In both our study of the burden of treatment and the simulated data sets, the model reliably predicted the number of themes that would be found by doubling the sample size of a study. In our study of the burden of treatment, the prediction errors were <2% (the difference between the expected and observed numbers of themes was at most 2 of 123 themes) with initial samples of 25, 50, 100, and 200 participants (Table 1 and Fig. 1).
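The "<2%" figure follows directly from the numbers quoted: a gap of 2 themes out of 123 is about 1.6%. The short check below assumes the error is expressed relative to the 123 themes; the function name `relative_error` and the example value 125 are illustrative only.

```python
def relative_error(expected, observed):
    """Absolute prediction error as a fraction of the observed number of themes."""
    return abs(expected - observed) / observed

# Hypothetical worst case: the prediction is off by 2 themes when 123 are observed.
worst_case = relative_error(expected=125, observed=123)
print(f"{worst_case:.1%}")
```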

The excellent predictive capability of the model was confirmed with the first group

Discussion

In this study, we showed that models used in ecology to determine species richness could help with qualitative research involving surveys with open-ended questions to predict what themes will be discovered with the inclusion of more units of analysis. Determining when to stop data collection is a thorny question asked by both novice and experienced researchers in qualitative research [6]. However, there is a surprising paucity of explicit discussion of this basic issue in textbooks and articles

Conclusions

In surveys with open-ended questions, the point of data saturation and number of participants to include can be estimated with mathematical models from ecological research.

Acknowledgments

The authors thank Laura Smales (BioMedEditing) for editing.

Authors' contributions: V.-T.T., R.P., V.-C.T., and P.R. conceived and designed the experiments. V.-T.T. and R.P. analyzed data. V.-T.T. wrote the first draft of the article. V.-T.T., R.P., V.-C.T., and P.R. contributed to the writing of the article. V.-T.T., R.P., V.-C.T., and P.R. met ICMJE criteria for authorship. V.-T.T., R.P., V.-C.T., and P.R. agreed with article results and conclusions. P.R. is the guarantor, had full access to

References (17)

C.B. Terwee et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol (2007)
H. Jansen. The logic of qualitative survey research and its position in the field of social research methods. Forum Qual Social Res (2010)
N. Denzin et al. The discipline and practice of qualitative research
B. Glaser et al. The discovery of grounded theory: strategies for qualitative research (1967)
G. Guest et al. How many interviews are enough? An experiment with data saturation and variability. Field Methods (2006)
S. Baker et al. How many qualitative interviews are enough? Expert voices and early career reflections on sampling and cases in qualitative research (2012)
M. Sandelowski. Sample size in qualitative research. Res Nurs Health (1995)
K. Ugland et al. The species accumulation curve and estimation of species richness. J Anim Ecol (2003)

Conflict of interest: None.

Funding: This study was funded by the French Health Ministry (PHRC AOM13127). Our team is supported by an academic grant from the program “Equipe espoir de la Recherche,” Fondation pour la Recherche Médicale, Paris, France (no. DEQ20101221475). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article.
