Healthcare organisations in the USA rank significantly lower in quality of care compared with other developed nations. Research shows US performance emphasises expensive treatment over effective prevention programmes. This study demonstrates how a comprehensive quality improvement programme can improve health outcomes in a large county-based Medicaid health plan. The health plan serves a diverse community of members spanning racial and ethnic groups with varying levels of clinical risk and social determinants of health burdens. We used a regression discontinuity design to evaluate the impact of a comprehensive quality improvement programme vs using mainly pay-for-performance on Healthcare Effectiveness Data and Information Set (HEDIS) metrics over the course of 10 years. We found significant improvements in several HEDIS metrics that occurred after the quality improvement programme was implemented. These results demonstrate the importance of using a comprehensive quality improvement strategy along with pay-for-performance to improve health outcomes. It was determined that this research was exempt from institutional review board approval, as it used administrative healthcare data, and did not involve direct interventions with human subjects.
- Pay for performance
- Quality improvement methodologies
- Health Promotion
Data availability statement
No data are available. These data are considered confidential and not subject to public disclosure under California WI Section 14087.38 (n), (o) or (p) or other applicable law.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
The quality of healthcare in the USA lags behind that of other industrialised countries.1 2 The USA lags in many measures of quality such as obesity rates, diabetes rates, low birth weight, infant mortality, ischaemic heart disease rates and AIDS incidence.2–4 The 2021 report from the Organisation for Economic Cooperation and Development (OECD)5 cites the USA as having a life expectancy lower than the OECD average, yet spending more on health as a percentage of gross domestic product than any other OECD country. The system in the USA must change so that providers are incentivised to provide high quality care, not just high cost or high volume care. Pay-for-performance (P4P) is one method that has been proposed as a means of addressing the current hurdles that face US healthcare.6 P4P is part of a larger redirection in payment methodologies, collectively referred to as value-based payment (VBP) methods. In 2015, the Centers for Medicare & Medicaid Services (CMS) set a goal of moving 85% of Medicare fee-for-service payments to VBP methods such as P4P by 2016 and 90% by 2018.7 Since then, value-based care has spread rapidly across the country, with approximately 40% of Medicare fee-for-service payments, 30% of commercial payments and 25% of Medicaid payments today being made through some form of value-based arrangement.8 9
However, P4P alone does not necessarily lead to improved healthcare outcomes. Rather, what is required is a coordinated quality improvement (QI) strategy across multiple fronts that will lead to improvements in healthcare quality and outcomes. The current study presents such a coordinated comprehensive strategy, incorporating periodic performance reporting, annual P4P and collaborative QI efforts.
One component of P4P programmes is the creation of provider reports. Healthcare payers provide performance reports to providers as a way of tracking and improving care.10 As part of the transition to VBP, CMS posts comparative performance data on their Hospital Compare website, but researchers and healthcare providers have expressed concerns about the quality and reliability of such report cards.11–14 Report cards rely on composite metrics to reflect performance across multiple domains. Some have cautioned about the use of composite measures in the creation of such report cards.10 15 It is essential that the method used to create composite measures, as well as the logic behind their selection is transparent, well rounded and not focused too heavily on only a few quality areas.
Overall the evidence for the effectiveness of P4P alone across multiple meta-analyses is not promising.16–19 A recent meta-analysis found few improvements in how P4P methods are implemented, often using blunt incentive methods not directly tied to improvements in outcomes and resulting in inconsistent results.19 Evidence of changes in healthcare quality correlated with P4P programmes often comes from studies lacking in methodological or statistical rigour.16 More rigorous research designs find little or no effect. Studies monitoring healthcare outcomes, the desired end state for QI efforts, showed smaller effects than studies monitoring process measures. One study actually found that clinics with fewer QI activities were more responsive to P4P than controls.20 So P4P and report cards may only benefit those groups that are already lagging in QI efforts. Jha21 also pointed out that in addition to the size and frequency of incentives, the selection of metrics and structure of the programme (how incentives are calculated, how complicated are the formulas) also impacts the ability of programmes to show positive outcomes.
VBP and QI Programme
The programme described here will be referred to as the VBP and QI Programme (VBP-QIP). It was developed by a large health plan in southern California, serving approximately 2.7 million members, to address the programmatic gaps identified above. A P4P programme had already been in place for 5 years before the VBP-QIP was implemented, and it had not made appreciable progress in improving healthcare outcomes. It does, however, provide a natural experiment in which to test the impact of the VBP-QIP longitudinally. With the implementation of the VBP-QIP, a comprehensive QI programme was put in place, along with increased focus from health plan leadership. At that time, leadership noted wide performance variation and identified performance improvement as an enterprise goal, which buoyed these efforts. The VBP-QIP was designed to have prioritised and readily interpretable measures, and to ensure transparency in how measures were selected and how composite metrics were calculated.
Much of California already uses a capitated payment methodology22 with providers receiving monthly payments based on the number of members they manage and the risk level of those members. This is already a form of VBP. By encouraging preventive care among their assigned members, providers can reduce the number of higher-cost urgent care or emergency room visits. If providers maintain the health of their members and minimise emergency room visits and inpatient stays, they receive a bonus at the end of the year based on those savings.
It is important that differences between provider groups be meaningful. Prior research using Medicare Compare data23 found that hospitals in different tiers of performance did not show statistically significant differences between them. The VBP-QIP was developed with the CMS Hospital VBP methodology.24 This methodology requires that the 50th percentile (referred to as the threshold in the CMS methodology) and the 95th percentile (referred to as the benchmark in the CMS methodology) differ to a statistically significant degree, and that a trimmed coefficient of variation (a measure of dispersion) fall within an acceptable range, indicating adequate dispersion.
The VBP-QIP was also presented by senior leadership to the leadership of all contracted provider groups prior to implementation. The measures and domains to be incorporated in the report were presented as well as the proposed scoring methodology. Feedback was encouraged and gathered ahead of implementation with all provider groups agreeing to the final proposed selection of metrics.
This paper will demonstrate the success of the VBP-QIP versus using mainly P4P. The VBP-QIP led to improvements in communication and coordination between plan level QI efforts, the C-suite and network providers. Using a type of interrupted time series design known as a regression discontinuity design,25 and taking advantage of the natural experiment that was created with 5 years of mainly P4P followed by 5 years of the VBP-QIP it is possible to determine the true impact of the programme relative to P4P alone. The 5 years prior to the VBP-QIP are said to have used mainly P4P because there was a quality programme, but it was not explicitly highlighted by leadership and integrated with the P4P programme.
The Value-Based Payment and Quality Improvement Programme
In 2010, a P4P programme was created at the health plan. It included several measurement domains and was mainly focused on providing financial incentives to providers for hitting certain performance targets. However the programme did not receive high level attention from health plan leadership, and by 2015, there was still considerable variation in performance among independent practice associations (IPAs). In 2015, the VBP-QIP was developed as a strategic tactic guided by the health plan’s enterprise goal of enhancing quality performance. The VBP-QIP was designed as a tool to evaluate provider groups on multiple performance measures. The goal was to identify lower performing IPAs and to provide them with the support needed to improve quality of care to patients. Many metrics were considered and tested for the report card, including HEDIS26 (Healthcare Effectiveness Data and Information Set; a standard set of metrics used to measure quality in managed care), member experience (as measured by member surveys), utilisation management (UM) (assessing inpatient and emergency room utilisation), encounters (measuring the volume and timeliness of capitated claims), pharmacy, compliance and network adequacy (measuring how well the plan network meets member needs).
The report card tool was finalised in February 2016 and baseline reports were created using the CMS VBP methodology.27 The final list of metrics included five domains of aggregated scores for HEDIS, Access and Availability of Care, Member Satisfaction, UM and Encounter Data. This paper will focus on improvements made to the HEDIS domain. The other domains will be the focus of future papers.
In 2017, the VBP-QIP incorporated a P4P component to strengthen the programme and provide value-based reimbursement for QI. Approximately US$15 million was budgeted to fund the programme. The programme is sustainable, as the budget is determined based on a capitation deduction model whereby dollars received from the state Medicaid programme are set aside to fund the incentive. For each member enrolled in the health plan a certain amount of money is set aside to fund the incentive. This is included in each annual budget and has been in place for the last 13 years. The health plan’s goal was to ensure the VBP-QIP was first well established with performance measurement and reporting, as well as with regular coordinated interventions, and then to introduce incentives to help support the programme. This was learnt from the prior P4P programme where incentives alone did not reduce variations in performance. In the largely capitated environment, where providers receive monthly rates based on membership, the VBP-QIP pays out a maximum of about 10% over providers’ capitated rates. This provides a solid business case for IPAs to invest in improving their performance on the VBP-QIP metrics.
A key component of the VBP-QIP is the IPA Action Plan process. It was developed by a multifunctional VBP-QIP Workgroup, including subject matter experts in QI, provider network management, communications, UM and encounters. The process required IPAs to create, implement and submit project improvement plans using the Specific, Measurable, Attainable, Relevant and Time-Bound (SMART) methodology. This process helps keep the IPAs actively engaged with the VBP-QIP, the health plan and its subcontracted health plan partners, and is therefore a vital tool for driving improvements.
Annual programme updates
Planning for each programme year is an iterative process with extensive discussions held among stakeholders. Domains, measures, weighting and scoring methodology are discussed, with targeted enhancements made each year. Feedback from the provider network is solicited, and each component is evaluated to determine the most effective means of selecting measures, measuring and reporting performance, and engaging IPAs in project improvement plan development. HEDIS, member satisfaction, UM and encounters are the programme domains currently in place.
The need for consistent and frequent performance reporting was recognised immediately. An important strategy for the plan to support the network in the VBP-QIP was the ability to provide actionable data so that providers could track, monitor and improve performance throughout the year. Reports included bimonthly HEDIS and UM gap-in-care reports, quarterly encounter reports, and distribution of updated measure performance targets. Comprehensive member experience reports are also shared with providers that provide a drilldown of results by demographics and a key driver analysis. All of these reports provide guidance on what services members should be receiving based on national standards as well as a snapshot of their potential final VBP-QIP results.
Provider engagement and training
The VBP-QIP and QI team determined that a comprehensive communications and engagement strategy would be needed to ensure the programme was a success. A multipronged approach was developed that included various means of outreach and interactions with IPAs, plan partners, clinics and physicians to discuss QI efforts. These included regular webinars and Continuing Medical Education Sessions as a method to engage and educate the provider network. Discussion topics included HEDIS and coding, the Action Plan process, encounter data submission, vaccine hesitancy and member experience.
Weekly collaborative meetings with the subcontracted health plans were implemented. These meetings are used to address any operational issues, discuss potential interventions and strategies for improving performance among the lower performing IPAs.
One-on-one meetings were also held between the plan and IPA QI staff. These meetings focused on lower performing IPAs to identify opportunities and strategies for improvement. IPA surveys have also been conducted to evaluate the network’s perception and use of QI reports. Other meetings included physician in-office visits from the plan’s staff to train on the intricacies of the VBP-QIP, provide useful resources and tips, discuss best practices and provide general support. Ad hoc requests were fielded by phone as well.
Provider recognition programme
A provider recognition awards programme was developed to acknowledge the performance of providers. Top performing and most improved practitioners, community clinics and IPAs were identified and recognised in articles published in a plan newsletter sent out to all contracted providers. The awarded providers were also sent plaques, and had billboards displayed throughout the county showing their image, company name and highlighting their award. In subsequent years, the plan instituted an annual awards banquet during which awards are handed out and winners are given a platform to share their experiences in delivering care and to offer best practices.
This analysis used a regression discontinuity design to compare performance prior to the launch of the VBP-QIP (2010–2015) with performance after the launch (2016–2021). A regression discontinuity design is a type of interrupted time-series design. It allows tests of differences in slope and y-intercept, as well as in means preintervention and postintervention, and it supports reasonably strong causal inference. Any sudden change in the trend, the intercept or the mean that corresponds with the timing of the intervention, in this case the VBP-QIP, is likely due to the intervention. Of course, there are potential confounding factors, such as other concurrent events, and those will be addressed in the discussion. Only HEDIS data were included in these analyses, and only those measures that could be trended (those that did not have significant measure specification changes over time). HEDIS data are based solely on administrative claims data. These are claims that are sent to the plan via standard claims submission pathways, as well as direct data feeds received from IPAs throughout the year to compensate for any data lost through the claims submission process.
For statistical analysis, we examined the HEDIS measures that were measured across the entire 10-year time frame. Thus, 10 HEDIS measures were examined from the VBP-QIP. For each measure, data are aggregated at the IPA level. These rates reflect the proportion of an IPA’s eligible membership that received recommended care, or reached outcome benchmarks, in the measurement year. To avoid overestimation or underestimation due to small sample sizes, any measure with fewer than 30 members was excluded from scoring.
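The aggregation and exclusion rule described above can be illustrated with a minimal sketch. The record layout, the names `MeasureCount` and `ipa_rates`, and the example IPA labels are hypothetical and are not taken from the health plan's systems; only the 30-member minimum comes from the text.

```python
from dataclasses import dataclass

# From the text: measures with fewer than 30 eligible members are not scored.
MIN_DENOMINATOR = 30


@dataclass
class MeasureCount:
    """Hypothetical IPA-level counts for one HEDIS measure in one year."""
    ipa: str
    measure: str
    numerator: int    # members who received recommended care
    denominator: int  # members eligible for the measure


def ipa_rates(counts):
    """Return {(ipa, measure): rate}, skipping small denominators."""
    rates = {}
    for c in counts:
        if c.denominator < MIN_DENOMINATOR:
            continue  # too few members for a stable rate
        rates[(c.ipa, c.measure)] = c.numerator / c.denominator
    return rates
```

For example, an IPA with 45 of 60 eligible members screened would receive a rate of 0.75, while an IPA with only 20 eligible members would be left out of scoring for that measure.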
Threshold and benchmark
Attainment and improvement scores are calculated relative to peer group performance for each measure. Attainment scores indicate an IPA’s performance compared with their peer group while improvement scores show an IPA’s performance compared with the prior year. In the VBP-QIP, the 50th percentile of the prior year’s performance distribution for each measure is set as the threshold, which refers to the minimum score a provider group needs in order to receive an attainment score greater than zero. The 95th percentile is set as the benchmark, or the high-end target that qualifies an IPA to receive the maximum attainment points possible for a given measure. If an IPA’s score for a measure is at or above the benchmark, the IPA receives a full 10 points for that measure. However, if the score is below the threshold no points are awarded. If the score is greater than the threshold, but less than the benchmark, one to nine points are awarded based on the linear distance between the threshold and benchmark values.
Improvement scores are calculated relative to an IPA’s prior year score and the benchmark. If the current year’s score for a measure is greater than the prior year score, but below the benchmark, the group is awarded up to nine points. If its score is equal to or lower than the prior-year score, it receives zero improvement points. The CMS Hospital Value-Based Purchasing formula is used to calculate these attainment and improvement scores.24 Lastly, the better of these two scores, attainment and improvement, becomes the final score for each measure.
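The scoring rules above can be sketched in code. This is an illustration under stated assumptions, not the plan's production logic: the formulas follow the commonly cited form of the CMS Hospital Value-Based Purchasing calculation, the percentile interpolation method is an assumption (the text does not specify one), a score exactly at the threshold is assumed to earn points, and all function names are hypothetical.

```python
import math
import statistics


def threshold_and_benchmark(prior_year_rates):
    """50th and 95th percentiles of the prior year's IPA rates.

    Assumption: linear interpolation ("inclusive" method).
    """
    cuts = statistics.quantiles(prior_year_rates, n=100, method="inclusive")
    return cuts[49], cuts[94]  # cuts[i] is the (i + 1)th percentile


def _round_half_up(x):
    # Python's round() rounds halves to even, so round half up explicitly.
    return math.floor(x + 0.5)


def attainment_points(score, threshold, benchmark):
    """0 below the threshold, 10 at or above the benchmark, else 1 to 9."""
    if score >= benchmark:
        return 10
    if score < threshold:
        return 0
    raw = 9 * (score - threshold) / (benchmark - threshold) + 0.5
    return min(9, max(1, _round_half_up(raw)))


def improvement_points(score, prior_score, benchmark):
    """0 if there is no improvement over the prior year, else up to 9."""
    if score <= prior_score or prior_score >= benchmark:
        return 0
    raw = 10 * (score - prior_score) / (benchmark - prior_score) - 0.5
    return min(9, max(0, _round_half_up(raw)))


def final_points(score, prior_score, threshold, benchmark):
    """The better of the attainment and improvement scores."""
    return max(
        attainment_points(score, threshold, benchmark),
        improvement_points(score, prior_score, benchmark),
    )
```

With a hypothetical threshold of 50, a benchmark of 80, a prior-year rate of 40 and a current rate of 65 (all in percentage points), this sketch gives 5 attainment points and 6 improvement points, so the final score for the measure is 6.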
To examine whether performance improved across the Health Plan’s provider network after the implementation of the VBP-QIP, we performed a regression discontinuity analysis comparing the slope, y-intercept and means between the time period of 2010–2015 and 2016–2021. The metrics compared were the thresholds and benchmarks for each HEDIS measure for which we had complete or near complete data across the timeframe.
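The kind of model behind such an analysis can be sketched as a segmented regression, a common way to analyse an interrupted time series. The exact model specification used in this study is not given in the text, so the four-term form below (intercept, pre-programme slope, level change and slope change) is an assumption, and the function names are hypothetical. Ordinary least squares is solved directly so that the sketch has no external dependencies.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_segmented_regression(years, values, first_post_year):
    """Least-squares fit of y = b0 + b1*time + b2*post + b3*time_since.

    time       : years since the first observation
    post       : 1 from first_post_year onwards, else 0
    time_since : years since first_post_year (0 before it)
    """
    rows = []
    for year in years:
        post = 1.0 if year >= first_post_year else 0.0
        rows.append([1.0, float(year - years[0]), post,
                     post * (year - first_post_year)])
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {"intercept": b0, "pre_slope": b1,
            "level_change": b2, "slope_change": b3}
```

In this parameterisation, `level_change` corresponds to a shift in the intercept at programme launch and `slope_change` to a change in trend afterwards. A full analysis would also need standard errors and significance tests, which are omitted here.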
Fifty-nine eligible IPAs have participated in this programme to date. The IPAs had 2 469 092 members who were ever enrolled in Medi-Cal in 2021, with 53.8% female, and 46.2% male. Examining the membership by age, 37.8% were 19 years old or younger, 38.5% were between the ages of 20 and 50, 13.3% were between the ages of 51 and 64, and 10.5% were 65 or older. Looking at race, 77.0% were white including Hispanic and non-Hispanic whites, 12.5% were black or African American and 8.3% were Asian. Examining language, 61.2% of members spoke English, 30.1% Spanish, 1.8% Armenian, 1.2% Mandarin and 1.0% Cantonese. We also segmented membership based on the California state aid codes that members fell into, with 54.4% of members qualifying through the Temporary Assistance for Needy Families Programme, 33.4% qualified under the Medicaid Coverage Expansion (MCE) and 5.8% qualified as seniors and persons with disabilities.
The following results reflect the changes in the 50th percentile, called the threshold in the VBP-QIP, from before and after the VBP-QIP was implemented. Using the regression discontinuity design allowed for the examination of preimplementation and postimplementation changes, changes in the slope of the trend, and changes purely related to the passage of time. If there was an impact of the VBP-QIP there would have been either statistically significant changes in the mean or changes in the slope that could indicate improvements or decreases in performance after the programme was implemented. Among the 10 HEDIS measures included in the analyses, there were statistically significant findings for eight of the measures. However, one of those differences was simply due to a statistically significant effect of time (see figure 1 and table 1). In order to create the graphs in figure 1, the denominators were held constant, although time varying denominators were included in analyses. The solid red lines indicate the best fit regression lines, and the dotted line displays the counterfactual, if there was no impact of the VBP-QIP and the trend remained constant. See table 1 for definitions of the HEDIS acronyms used below. There were statistically significant improvements in Adults with Acute Bronchitis (AAB), Breast Cancer Screening (BCS), Cervical Cancer Screening (CCS), Comprehensive Diabetes Care-Eye Exam (CDC-E) and Children With Pharyngitis (CWP). The improvements for BCS, CDC-E and CWP were the result of improving trends already in place before the VBP-QIP but maintained. AAB and CCS had statistically significant changes in slope, from decreasing to increasing after the programme was put in place. For none of the measures was there a statistically significant decrease in performance. 
For statistically significant results, the squared semipartial correlations were mostly between 0.2 and 0.5 indicating that there were practically meaningful effect sizes and explained variance, not just statistical significance.
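For readers less familiar with this effect size, the squared semipartial correlation of a predictor is conventionally defined as the drop in explained variance when that predictor is removed from the model. The notation below is ours and assumes this standard definition; the text does not spell out the formula used.

```latex
sr_j^2 = R^2_{\text{full}} - R^2_{(-j)}
```

Here $R^2_{\text{full}}$ is the proportion of variance explained by the model with all predictors, and $R^2_{(-j)}$ is the proportion explained when predictor $j$ is omitted. A value of 0.2 therefore means that the predictor uniquely accounts for 20% of the variance in the outcome.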
Because the time frame of the study overlaps with the COVID-19 pandemic, it was important to assess whether results might have been due to any changes in healthcare utilisation that occurred after 2019. The longitudinal graphs in figures 1 and 2 provide good evidence that the trends identified in analyses were not due to the pandemic. Those trends show that very little, if anything, changed in the threshold results due to the pandemic. Regression discontinuity analyses were also run excluding years 2020 and 2021. While most of the findings were no longer statistically significant, that was likely due to the shorter post-VBP-QIP time period included in the analyses. It has been demonstrated that statistical power is increased in time-series analyses by having data points equally distributed before and after an intervention.28 Eliminating postintervention years likely decreased power, although a visual inspection confirms that the trend remained constant. As a result, only the analyses including the years 2020 and 2021 are reported here.
The benchmarks, or 95th percentile rates, are mostly determined by one or two of the highest performing IPAs so there is more variability making it less likely for results to be statistically significant. There were, in fact, fewer statistically significant results. And those results did not correspond to increases in the thresholds. For BCS, there was an effect of time. For CCS, there was an effect of time and a change in the slope from positive to negative. And for CWP, there were effects of time, the VBP-QIP and a change in slope. The benchmark for CWP was already improving before the VBP-QIP was implemented (see figure 2 and table 2).
These results are mixed and require continued tracking of data to determine the full impact of the VBP-QIP. Furthermore, additional analyses are called for to determine if there was a positive return on investment.
By using a range of targets from the median (threshold) to the 95th percentile (benchmark), the CMS VBP scoring method aims to gradually improve performance. Over time, the range between the threshold and benchmark is expected to move up the scale. In this study, it appears that the majority of the improvement occurs on the low end, with the threshold. Changes in the benchmark were inconsistent. This is likely because the benchmarks are driven by one or two high scores, and those scores may not be as stable from year to year as the 50th percentile. The trends seen for many of the threshold values were encouraging. Some of the results were quite striking, such as the quick turnaround seen for the thresholds of AAB and CCS and the dramatic increases in performance for the prenatal and postpartum HEDIS measures. While the time series was short in terms of the number of data points, the trends are quite clear. For those effects that were statistically significant, the squared semipartial correlations were relatively large (0.2 to over 0.5), indicating that predictors were able to explain a substantial amount of the variance in the 50th percentile over time. It is possible that the change in slope was not statistically significant for the prenatal measure only because of a sudden drop in performance in 2020 and 2021, likely due to the impact of the pandemic. The VBP-QIP highlights the need to use a comprehensive QI programme to improve healthcare outcomes. Combining leadership buy-in, well-defined metrics, reporting, P4P and provider communication resulted in improved health outcomes for health plan members.
The selection of metrics was developed and agreed on by all stakeholders as was the methodology used to calculate scores and payments. This addresses a major concern of providers outlined above.21 The structure and terms of the programme were communicated multiple times in multiple formats.
Buy-in from providers
The health plan has seen improvement in clinical quality rates, data, process and collaboration through the VBP-QIP. Before this programme, the connection between incentive dollars and improvement was tenuous and the training and education on how to improve was minimal. Input from internal and external stakeholders was a critical element in the success of the programme. We followed a structured change plan22 to implement this large scale change. We executed steps outlined by Kotter29 such as establishing a sense of urgency and developing a change plan. We also worked very closely with our IPAs to ensure cooperation and adoption of the changes. Key to this success were meetings held between the leadership of every IPA and health plan leadership during the development and launch of this programme. IPA leadership all agreed with the selection of measures, domains and proposed payment calculations prior to programme launch. This level of leadership support and collaboration was essential to the success of this effort.
Opportunities for improvement
The process by which we calculate and produce the performance reports remains relatively manual and time-consuming. Moving to an online reporting portal is a target for the future. Furthermore, it is essential to streamline the formats of data sources. Currently, data are stored in multiple formats requiring extensive coding to be integrated. In addition, much of the quality assurance process is manual leading to longer processing times. The standardising of data formats and automation of quality assurance processes are already underway.
The VBP-QIP requires substantial resources to run successfully. It is acknowledged that other plans may not have the staff, resources, data or applications needed to create such reports. Where resources to administer such a large, complex programme are not available, focus should be placed on the QI communication process, which was central to the success of this programme.
Equity in payments is a concern for our health plan. As the literature has suggested,23 entities with greater resources may be better able to achieve higher quality scores and higher payments through these programmes. We are currently investigating the Health Equity Summary Score from CMS23 and other ways to ensure equity in payments to groups whose members face extensive social determinants of health burdens, but who have good programmes and are making steady progress.
This health plan is learning every year how to improve the programme and make it more efficient for all involved, while still working towards the Quadruple Aim of improved health outcomes, improved patient experience, lower per capita cost and provider satisfaction. Future improvements include: clinical data integration for all data to reside in similarly set and managed repositories, more efficient and shared programming and applications, and optimised communications.
Patient consent for publication
Contributors MP and KMP conceived and led the development of the paper. MP managed the construction of the paper, wrote the introduction and references, and collaborated with YK on the analyses. MP wrote the statistical analyses and results. HS wrote much of the methods section pertaining to the P4P programme, and JK provided analytic support. All individuals have read and reviewed the entire paper. MP is the responsible author and guarantor.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient and public involvement Patients and/or the public were not involved in the design, or conduct, or reporting, or dissemination plans of this research.
Provenance and peer review Not commissioned; externally peer reviewed.