## Background

Reporting information on the quality of healthcare providers is a popular strategy for improving overall healthcare quality at the national level.1 2 The basic idea is to force providers to compare themselves against each other and to stimulate poorly performing providers to improve their quality. In addition, patients may get the opportunity to select a well-performing provider when looking for a specific treatment.

A variety of quality indicators have been established over the last decades, and many countries have established a nationwide assessment of an indicator set.3–10 Publishing an annual report listing the results of each provider from the previous year is a common practice, and the listings may be available to the providers, a central body or even to the public (eg, ref 11–13). Using a fixed observation period of 1 year implies that the number of patients per provider (ie, the annual volume) is varying across the providers. An alternative approach would be to include the same fixed number of patients from each provider, going back in time as long as necessary for each provider. In other words, the sample size is fixed instead of the observation period. This alternative approach seems feasible today, as many countries have collected data on the quality indicators in a rather stable fashion over the last years. It is the purpose of this paper to discuss potential advantages and challenges associated with this alternative approach.

We start with a look at the current practice of presenting profile data in annual reports and of identifying providers with poor or good performance. We then present three advantages of the fixed sample size approach and six challenges in implementing such an approach. Finally, we discuss the prospect of supplementing the current reporting practice with reports based on fixed sample sizes.

### A short outline of the current methodology for provider profiling

Considering the case of a binary quality indicator (eg, 30-day mortality rate, readmission rate, wound infection rate, etc) and ignoring the need for case-mix adjustment, the data used for provider profiling simply consist of the observed relative frequency $\hat{p}_j$ and the volume $n_j$ of each provider $j$. To judge and compare the providers, the values $\hat{p}_j$ are inspected and compared with the overall level $\bar{p}$, that is, the relative frequency over all providers. It is widely accepted that the stochastic imprecision of each relative frequency should also be taken into account. This reflects the desire to base statements and comparisons on the true underlying probability $p_j$. This probability reflects the quality future patients can expect if the provider $j$ continues to manage patients at the current quality level. Two approaches to visualise the uncertainty are popular and illustrated in figure 1: (1) Each estimate $\hat{p}_j$ is surrounded by an *α*-CI covering the true value $p_j$ with probability *α*. (2) In a funnel plot,14 15 the estimates are contrasted with so-called control limits, such that the estimates should be within the control limits with probability *α*, if the true value $p_j$ is identical to the overall level $\bar{p}$. Popular choices for *α* are 95% or 99.8%.
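The two visualisations can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses the normal approximation for both the CI and the control limits, and the provider data and function names (`wald_ci`, `control_limits`) are hypothetical and not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def z_for_coverage(alpha: float) -> float:
    """Two-sided standard normal quantile for coverage level alpha (e.g. 0.95)."""
    return NormalDist().inv_cdf(0.5 + alpha / 2.0)


def wald_ci(events: int, volume: int, alpha: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI around the observed relative frequency."""
    p_hat = events / volume
    half = z_for_coverage(alpha) * sqrt(p_hat * (1.0 - p_hat) / volume)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)


def control_limits(overall: float, volume: int, alpha: float = 0.95) -> tuple[float, float]:
    """Funnel-plot control limits around the overall level for a given volume."""
    half = z_for_coverage(alpha) * sqrt(overall * (1.0 - overall) / volume)
    return max(0.0, overall - half), min(1.0, overall + half)


# Hypothetical data: (events, volume) per provider.
providers = {"A": (3, 40), "B": (20, 160), "C": (90, 640)}
overall = sum(e for e, _ in providers.values()) / sum(n for _, n in providers.values())

for name, (events, volume) in providers.items():
    lo, hi = wald_ci(events, volume)
    cl_lo, cl_hi = control_limits(overall, volume)
    print(f"{name}: estimate={events / volume:.3f} CI=({lo:.3f}, {hi:.3f}) "
          f"limits=({cl_lo:.3f}, {cl_hi:.3f})")
```

Both the CI and the control limits narrow as the volume grows, which is what produces the funnel shape.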

However, neither way of visualisation directly identifies any provider as poorly or well performing. This can be approached by applying corresponding rules. For example, a statistically significant deviation from the overall level may be required, that is, the CI does not cover the overall level, or the estimate is outside of the control limits. It is also possible to define a threshold for the true deviation from the overall level, $\delta_j = p_j - \bar{p}$ (the true probability $p_j$ of provider $j$ minus the overall level $\bar{p}$), and to aim at identifying the providers above (or below) this threshold. Such a threshold should reflect that even under ideal circumstances some variation in the true values is acceptable, for example, due to staff fluctuations and corresponding learning curve effects.

It has been recommended16 17 to also take the overall variation of the estimated values into account. If the overall variation is close to being explainable by random fluctuation (ie, most estimates are within the control limits), we may hesitate to call any provider a poorly performing provider. If the overall variation is high, we have good reasons to call at least some providers poor performers. Formally, this idea is typically approached by considering the so-called posterior distribution of the true probability $p_j$. This distribution reflects the knowledge about $p_j$ given the observed relative frequency $\hat{p}_j$, the volume $n_j$ and the overall variation. This distribution is located closer to the overall level $\bar{p}$ in the case of a low overall variation than in the case of a high overall variation. The degree of shrinkage towards $\bar{p}$ depends on the volume $n_j$. The smaller the volume, the larger is the degree of shrinkage, reflecting the limited knowledge about $p_j$. The posterior distribution (of $p_j$ or of the deviation $\delta_j = p_j - \bar{p}$) can be computed analytically (or at least approximated) and can serve as the basis for alternative rules. Popular choices are to compare the posterior mean of $\delta_j$ with the threshold, to require a certain posterior probability of $\delta_j$ to be above 0 or to require a certain posterior probability of $\delta_j$ to be above the threshold.16 18 19 It is also possible to go one step further and to consider the posterior distribution of the true rank of provider $j$ among all providers based on the true values $p_j$.20 21
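The shrinkage effect can be made concrete with a small sketch. The text does not specify a model, so the following assumes a simple normal-normal approximation in which the true deviations across providers have standard deviation `tau` and the sampling variance is evaluated at the overall level. The function names and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def posterior_of_deviation(p_hat, volume, overall, tau):
    """Approximate normal posterior (mean, sd) of the deviation p_j - overall.

    tau is the assumed SD of the true deviations across providers.
    """
    se2 = overall * (1.0 - overall) / volume  # sampling variance at the overall level
    weight = tau**2 / (tau**2 + se2)          # share of the observed deviation retained
    mean = weight * (p_hat - overall)
    sd = sqrt(weight * se2)                   # posterior SD in the normal-normal model
    return mean, sd


def prob_above(mean, sd, threshold):
    """Posterior probability that the deviation exceeds the threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


# Same observed frequency (15% against an overall level of 10%), different volumes.
for volume in (40, 640):
    mean, sd = posterior_of_deviation(p_hat=0.15, volume=volume, overall=0.10, tau=0.02)
    print(f"volume={volume}: posterior mean={mean:.4f}, "
          f"P(deviation > 0.02)={prob_above(mean, sd, 0.02):.3f}")
```

With identical observed frequencies, the low-volume provider is pulled much closer to the overall level than the high-volume provider, so rules based on the posterior treat the two differently.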

### Advantage 1: no dependence on volume

As mentioned above, it is desirable that classifying a provider as poorly performing should depend only on the true value $p_j$. Two providers with the same value of $p_j$ should have an equal probability of being labelled as a poor performer. Unfortunately, this is not the case for the commonly used rules. The probability of being labelled as a poor performer also depends on the volume of the provider. Such probabilities can be determined explicitly if assumptions are made about the distributions of the true values $p_j$ and the volumes $n_j$ across the providers. Table 1 specifies such a scenario, and the corresponding probabilities are shown in figure 2. For poorly performing providers with a true value above the threshold, the probability of being marked as poorly performing tends to increase with increasing volume. The variation can be quite substantial. For example, when using as a rule a significant deviation from 0 (SIG), a provider with a true value of about 3% has a probability of 13% to be labelled as a poorly performing provider in case of a volume of 40 patients, but a probability of 68% in case of a volume of 640 patients. When using the rule of a posterior mean being above the threshold (PMAT), the probabilities are 29% and 81%, respectively.
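The volume dependence of a significance-based rule can be illustrated with an exact binomial calculation. The sketch below does not reproduce the scenario of table 1. It assumes a hypothetical overall level of 10%, a true probability of 13%, and a rule that flags a provider when the observed frequency exceeds the upper 95% control limit.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def prob_flagged(true_p, overall, volume, alpha=0.95):
    """Probability that the observed frequency exceeds the upper control limit."""
    z = NormalDist().inv_cdf(0.5 + alpha / 2.0)
    upper = overall + z * sqrt(overall * (1.0 - overall) / volume)
    first_flagged = floor(upper * volume) + 1  # smallest event count above the limit
    # Exact binomial tail probability for the given true probability.
    return sum(
        comb(volume, k) * true_p**k * (1.0 - true_p) ** (volume - k)
        for k in range(first_flagged, volume + 1)
    )


for volume in (40, 160, 640):
    print(volume, round(prob_flagged(true_p=0.13, overall=0.10, volume=volume), 3))
```

Under these assumptions, the same true quality leads to a much higher chance of being flagged at a volume of 640 than at a volume of 40.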

Such a variation can be regarded as unfair, at least from the perspective of high-volume providers. It can also be seen as unfair from a patient perspective: why should a patient be at risk of overlooking that her or his personal provider is a poor performer, just because it is a low-volume provider? From a societal perspective the situation is less clear: detecting (and removing) poor quality in high-volume providers has a higher impact than in low-volume providers, as more patients would benefit.

### Advantage 2: simplified presentation and interpretation of results

Presenting and interpreting results becomes simpler when a fixed sample size is used. The upper half of figure 3 illustrates this point by replicating the upper half of figure 1 in the case of equal sample sizes for all providers. Considering the estimates themselves or the lower or upper boundary of the CIs always gives the same ordering of the providers. This reflects the simple fact that if all providers contribute with the same sample size, there is only one piece of information about the true underlying value $p_j$, namely the relative frequency $\hat{p}_j$ itself. Hence, any reasonable rule to order the providers will give the same ordering. Consequently, there remains little doubt about which provider has the worst *observed* quality.

The question remains which providers should be marked as poor performers, and the rules mentioned above can still be applied. However, there is the advantage that any rule results in a horizontal line, as illustrated in the lower half of figure 3.
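The point about identical orderings can be checked numerically. The sketch below uses the Wilson score interval, which is our choice for illustration (the paper does not state which CI is used), and hypothetical provider data. It orders providers by the estimate, by the lower CI bound and by the upper CI bound.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% interval


def wilson_ci(events, volume, z=Z):
    """Wilson score interval for a binomial proportion."""
    p_hat = events / volume
    denom = 1.0 + z**2 / volume
    centre = (p_hat + z**2 / (2 * volume)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / volume + z**2 / (4 * volume**2)) / denom
    return centre - half, centre + half


def orderings(data):
    """Provider names ordered by estimate, by lower bound and by upper bound."""
    by_estimate = sorted(data, key=lambda name: data[name][0] / data[name][1])
    by_lower = sorted(data, key=lambda name: wilson_ci(*data[name])[0])
    by_upper = sorted(data, key=lambda name: wilson_ci(*data[name])[1])
    return by_estimate, by_lower, by_upper


# Hypothetical (events, volume) pairs.
fixed_period = {"A": (7, 40), "B": (24, 160), "C": (64, 640)}   # volumes differ
fixed_size = {"A": (20, 134), "B": (17, 134), "C": (12, 134)}   # equal sample sizes

print("fixed period:", orderings(fixed_period))
print("fixed size:  ", orderings(fixed_size))
```

With differing volumes, the ordering by lower bound can disagree with the ordering by estimate. With equal sample sizes, all three orderings coincide.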

### Advantage 3: better decisions

Various types of decisions can be made based on provider profiling. Two examples are considered: (1) identifying the best local provider; (2) identifying poorly performing providers above the threshold. For each example, we consider various decision rules and compare their performance between using a fixed observation period and a fixed sample size. For the fixed observation period, we again consider the scenario described in table 1; for the fixed sample size, we set the sample size to 134, such that the overall number of patients included in an annual analysis is identical.

If patients use provider profiling to select the best provider, they often focus on a preselection of local providers. Hence, we consider the decision to identify the best out of five randomly chosen providers. As performance criteria for the decision rule we consider the probability of identifying the best local provider and the probability of identifying a provider with a performance at most 1 percentage point above the best local provider. According to table 2, a fixed sample size always implies a higher probability. Moreover, all three decision rules lead to identical decisions in the fixed sample size case, and hence perform identically.

Regulators may be interested in identifying all poorly performing providers with a value above the threshold, for example, if they want to invite poorly performing providers for a review of their quality management. Here, a decision rule has to generate a list of providers. The performance of such a rule can be assessed by its sensitivity and specificity, that is, the probability that a provider above the threshold is included in the list and the probability that a provider below the threshold is not included, respectively. According to figure 4, using a fixed sample size moves the point given by sensitivity and specificity closer to the optimal value (1.0, 1.0) in the upper right corner, independent of the decision rule used. It depends, however, on the decision rule whether mainly sensitivity or mainly specificity improves.
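How sensitivity and specificity of a listing rule could be estimated by simulation is sketched below. This is not the simulation behind figure 4. The distribution of true probabilities, the threshold, the volumes and the rule (observed frequency above the upper 95% control limit, with the overall level treated as known) are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% control limits


def flagged_sig(events, volume, overall):
    """SIG-type rule: observed frequency above the upper control limit."""
    upper = overall + Z * sqrt(overall * (1.0 - overall) / volume)
    return events / volume > upper


def sensitivity_specificity(volumes, overall=0.10, tau=0.03, threshold=0.02, seed=1):
    """Simulate one analysis and return (sensitivity, specificity)."""
    rng = random.Random(seed)
    tp = fn = tn = fp = 0
    for volume in volumes:
        # Assumed distribution of true probabilities across providers.
        true_p = min(max(rng.gauss(overall, tau), 0.001), 0.999)
        events = sum(rng.random() < true_p for _ in range(volume))
        poor = (true_p - overall) > threshold
        flagged = flagged_sig(events, volume, overall)
        if poor and flagged:
            tp += 1
        elif poor:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


varying = [40, 80, 160, 320, 640] * 200                  # fixed observation period
equal = [sum(varying) // len(varying)] * len(varying)    # fixed sample size, same total
print("fixed period:", sensitivity_specificity(varying))
print("fixed size:  ", sensitivity_specificity(equal))
```

Both scenarios include the same total number of patients, so any difference in the two (sensitivity, specificity) pairs reflects how the patients are distributed across providers rather than the amount of data.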

### Challenge 1: scheduling of analyses

When fixing the observation period to 1 year, an obvious choice for scheduling analyses is an annual scheduling. When fixing the sample size, there is no natural choice for the scheduling such as ‘after the next 100 patients’, as this is reached for each provider at different time points. Consequently, there is a need for criteria to decide when a fixed sample size analysis should be performed and which sample size should be used.

A starting point may be to stick to annual reporting and aim at including the same overall number of patients as before, as we did in our considerations above when discussing advantage 3. This would imply observation periods shorter than 1 year for high-volume providers and observation periods longer than 1 year for low-volume providers (cf scenario II in figure 5). However, annual reporting would then use only part of the patients available within each year from high-volume providers, that is, it would throw away information. Consequently, it would be natural to schedule the analyses more frequently to ensure that each patient contributes at least once to the analysis (scenario III). Providers with a very high volume and hence very short observation periods would then require very frequent analyses. To avoid this, prolonged observation periods have to be allowed for these providers (scenario IV). On the other hand, providers with a very low volume will have very long observation periods, and the connection to the actual situation for these providers may be lost. Consequently, a maximal value for the observation period may also be set, and some providers may be included with a smaller sample size (scenario V). A realistic choice may be to aim at three analyses per year and a maximal observation period of 3 years, implying that many providers would be included with the desired sample size even if they differ in annual volume by up to a factor of 9.
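The factor of 9 follows from the ratio between the longest allowed observation period (3 years) and the shortest one (a third of a year, given three analyses per year). A small sketch of this arithmetic, with hypothetical function names and under the assumption that patients arrive at a constant rate, could look as follows.

```python
def observation_window(annual_volume, sample_size, analyses_per_year=3, max_years=3.0):
    """Return (observation period in years, approximate number of patients included).

    The period needed to reach the sample size is clamped to the interval
    [1 / analyses_per_year, max_years], as in scenarios IV and V.
    """
    min_years = 1.0 / analyses_per_year
    needed_years = sample_size / annual_volume
    period = min(max(needed_years, min_years), max_years)
    included = round(annual_volume * period)
    return period, included


def volume_range_with_target_size(sample_size, analyses_per_year=3, max_years=3.0):
    """Annual volumes for which the target sample size is met exactly."""
    lowest = sample_size / max_years
    highest = sample_size * analyses_per_year
    return lowest, highest


lowest, highest = volume_range_with_target_size(134)
print(f"target size reached for annual volumes between {lowest:.1f} and {highest}")
print("ratio:", highest / lowest)

for annual_volume in (20, 134, 1000):
    print(annual_volume, observation_window(annual_volume, sample_size=134))
```

Providers below the lower bound are included with fewer patients than the target, and providers above the upper bound with more, which matches the deviations from a strictly fixed sample size described for scenarios IV and V.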

### Challenge 2: time-varying quality

The quality of a provider may vary over time. Consequently, fixing the observation period or fixing the sample size can lead to different conclusions about a single provider. The most crucial issue is a sudden change from good to poor performance. In a high-volume provider, this can be detected rather quickly when allowing short observation periods, in particular if several analyses per year are made (scenarios III–V in figure 5). On the other hand, for a low-volume provider a long observation period may mask a sudden change. However, even when fixing the observation period to a shorter interval, this change is unlikely to be detected because of the limited sample size the low-volume provider contributes.

If a quality indicator is affected by seasonal variation (eg, due to typical changes in the patient population or working conditions during a calendar year), it might be necessary to round the provider-specific observation periods to full years.

### Challenge 3: overlap between analyses

Scheduling fixed sample size analyses regularly in time implies that there is overlap between patient populations for different analyses, especially for low-volume providers (cf figure 5). This has at least two consequences. First, the results for low-volume providers from one analysis are highly predictive of the results from the subsequent analyses. This may encourage well-performing low-volume providers to reduce their efforts in maintaining high quality. On the other hand, poorly performing low-volume providers may want to include only new patients in the next analysis in order to have a chance that their efforts to improve quality become visible. Second, high-volume providers may interpret the higher fluctuations from analysis to analysis (compared with low-volume providers) as a higher risk of being marked as poorly performing due to random fluctuations, and hence as unfair. It is therefore essential to inform them that in the long run they have the same risk of being marked as any low-volume provider of the same quality.

### Challenge 4: introducing new indicators

Fixing the sample size implies that observation periods are defined retrospectively: starting at the current time point of the intended evaluation, we go backwards into the past until the sample size is reached. Introducing new indicators or a new assessment procedure for an existing indicator may imply that for some providers the intended sample size cannot be reached at the first evaluation after the introduction, so that the available number of patients has to be accepted as the sample size. This issue does not appear when using fixed observation periods if the introduction happens at an evaluation time point.
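The retrospective definition of the observation period can be sketched as a simple selection of the most recent patients before the evaluation date. The function name, the `indicator_start` parameter and the dates below are hypothetical and serve only to illustrate the case of a newly introduced indicator.

```python
from datetime import date, timedelta


def fixed_size_sample(treatment_dates, evaluation_date, sample_size, indicator_start=None):
    """Select the most recent `sample_size` patients up to the evaluation date.

    If the indicator was introduced at `indicator_start`, earlier patients are
    not available, so fewer than `sample_size` patients may be returned.
    """
    eligible = sorted(
        d for d in treatment_dates
        if d <= evaluation_date and (indicator_start is None or d >= indicator_start)
    )
    return eligible[-sample_size:]


# Hypothetical low-volume provider: one patient every 30 days over about 2 years.
dates = [date(2019, 1, 1) + timedelta(days=30 * i) for i in range(24)]

sample = fixed_size_sample(dates, date(2020, 12, 31), sample_size=10)
print(len(sample), "patients, observation period starts", sample[0])

# Indicator introduced in mid-2020: the target of 10 patients cannot be reached.
short = fixed_size_sample(dates, date(2020, 12, 31), sample_size=10,
                          indicator_start=date(2020, 6, 1))
print(len(short), "patients available since introduction")
```

The observation period is thus an output of the procedure (the date of the earliest selected patient) rather than an input.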

### Challenge 5: choosing the sample size

Fixing the sample size requires the choice of a sample size. Using statistical power considerations for sample size determination for provider profiling is not straightforward, as provider profiling can be used for different purposes, as illustrated by our two examples. Nevertheless, there exist practical suggestions for sample size calculations.15 22 23 The essential point is that sample size considerations can take into account knowledge about the expected prevalence and the expected spread in performance and volume across providers based on the data from previous years. By contrast, when fixing the observation period to 1 year, the sample size is determined completely by the annual volume of patients.

### Challenge 6: reorganisation of the data flow

Reporting provider profiles requires that some reporting body has access to all necessary information. Often, providers enter the individual patient data into a central database accessible to the body. The body can then directly implement changes in the reporting practice. However, the data flow between the providers and the central body is typically a complex process involving checking and cleaning procedures to ensure completeness and high data quality. Annual reporting typically defines a corresponding cycle in the data flow. Increasing the frequency of reporting creates a need for more continuous quality control, increasing the burden for the providers. This is even more the case if providers are already required to provide aggregated data.

## Discussion

Fixing the sample size for provider profiling analyses has some clear advantages compared with fixing the observation period: a dependence of decisions on the volume is avoided, the visualisation of the results and the ranking with respect to the observed quality become much simpler and the quality of decisions is improved in the long run. Practical challenges in implementing this idea may make it necessary to allow some deviation from a fixed sample size for some providers. However, even in that case the advantages shown above remain, in the sense that the results are still easier to compare across providers and decisions tend to be better.

One practical obstacle to fixing the sample size might be the necessity to measure quality under stable conditions for more than 1 year. However, this seems to be the case today in many countries for many indicators. On the other hand, the introduction of new indicators or major changes to the current assessment procedures will always make it necessary to deviate from a strict fixed sample size approach for some time periods. Practical obstacles may also arise from the need to adapt the data flow to a more continuous reporting.

In spite of these obstacles, we regard these advantages as sufficiently relevant to consider alternative reporting strategies aiming at more comparable sample sizes across providers. A first simple step would be to present reports based on a fixed sample size in addition to reports based on a fixed observation period. If this type of reporting becomes popular, further refinements may be considered. The long-term aim should be an ‘optimal strategy’ informed by knowledge about the expected magnitude of fluctuations in quality across providers and over time, while simultaneously taking organisational needs into account.

Moving in this direction will make provider profiling a more complex task than the current practice of annual reporting. However, it should be kept in mind that the current practice of annual reporting has just emerged over time and does not involve any considerations about optimal design or sample sizes. Hence, moving towards fixed sample sizes may make a significant contribution to improving the field, in particular if this happens simultaneously with additional methodological improvements.24

We have not considered in this paper the need for case-mix adjustment.17 25 Addressing this issue requires the use of standardised prevalence values instead of raw prevalence values. Most considerations presented in this paper apply equally to standardised prevalence values, except that the precision of these estimates also depends on the distribution of the patient characteristics for each provider. However, this variation in precision will usually be small. In our simulations, we also ignored a potential dependence of the deviations from the overall level on the volume of the providers.

Finally, we note that data on quality indicators can and should also be used for purposes other than comparing providers, for example, for internal quality monitoring or creating annual administrative reports. These purposes imply specific ways of reporting and should be handled separately.

## Conclusion

Fixing the sample size instead of fixing the observation period is a valuable alternative in performing provider profiling, and we recommend supplementing the current practice accordingly. This should be regarded as a step towards placing the design of the analysis and reporting strategy for provider profiling data on a more rational basis.

## Footnotes

Contributors WV developed the idea for this project and conducted all computations. All authors participated in the interpretation and phrasing of the results. All authors approved the final version of the manuscript.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.
