Aggregated student confidence estimates support continuous quality improvements in a competencies-oriented curriculum
  1. Frank Joseph Papa,
  2. Jerry H Alexander
  1. Medical Education, University of North Texas Health Science Center, Fort Worth, Texas, USA
  1. Correspondence to Dr Frank Joseph Papa; frank.papa{at}


Introduction Competencies-oriented medical curricula are intended to support the development of those specific tasks likely to improve patient care outcomes. In 2005, our institution developed curricular objectives and instructional activities intended to enable our students to competently perform four specific clinical tasks (diagnose, treat, manage and explain phenomena) for each of approximately 100 common and/or important patient presentations (eg, dyspnoea). However, competencies-oriented curricula must also develop outcome metrics aligned with their objectives and instructional activities in order to launch a continuous quality improvement (CQI) programme. This investigation describes how a novel course evaluation methodology produced presentation and task-focused outcome metrics sufficient to support CQIs in our competencies-oriented curriculum.

Methods Literature suggests that aggregated, group opinions are much more reliable than individual opinions in a variety of settings, including education. In 2010, we launched a course evaluation methodology using aggregated student self-assessments of their confidence in performing the four tasks trained to in each presentation-focused instructional activity. These aggregated estimates were transformed into a variety of graphic and tabular reports which faculty used to identify, and then remediate, those specific instructional activities associated with suboptimal presentation and task-focused confidence metrics.

Results With academic year 2010–2011 serving as a baseline and academic year 2015–2016 as an endpoint, analysis of variance revealed a sustained and statistically significant gain in student confidence across this 6-year study period (p<0.001).

Discussion This investigation demonstrated that aggregated, presentation and task-specific confidence estimates enabled faculty to pursue and attain CQIs in a competencies-oriented curriculum. Suggestions for new approaches to confidence-related research are offered.

  • continuous quality improvement
  • evaluation methodology
  • medical education
  • quality improvement methodologies

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



Introduction

Since the early 2000s, medical education has been undergoing a transformational, competencies-oriented curricular reformation.1–3 One hallmark of such curricula is the creation of objectives and instructional activities designed to develop those specific clinical capabilities likely to improve patient care outcomes.4 5 In 2005, curriculum planners at the University of North Texas Health Science Center introduced their vision of a competencies-oriented curriculum in the form of patient presentation-focused instructional ‘modules’ (eg, acute chest pain, dyspnoea, abnormal vaginal bleeding) within year-two system courses.

Each module is designed to support students in competently performing four core, clinically relevant physician tasks, that is, how to diagnose, treat and manage the common and important disease aetiologies causing the patient presentation at hand, and use the biomedical sciences to explain the clinical phenomena associated with each of the diseases addressed in each module. Students attain these four task-specific competencies via an instructional methodology which provides numerous case-based application opportunities and feedback regarding their performance.

The rationale underlying the construction of these modules was derived from decades of learning sciences literature which can largely be summarised via two competencies-oriented, instructional design principles: (1) competence is much more heavily predicated on presentation-focused and task-specific knowledge rather than generalisable problem solving skills6–8 and (2) the development of competencies is expedited via multiple presentation and task-focused application opportunities and feedback.9 10 We refer to this instructional methodology as a Presentation-focused, Task-specific, Application-oriented Learning Module (PTALM).

However, the realisation of a fully developed, competencies-oriented curriculum requires more than highly specified performance objectives and instructional activities. It also requires the creation of outcome metrics aligned with the curriculum’s objectives and instructional activities.11–13 Such alignment makes it possible to create a robust continuous quality improvement (CQI) programme whereby outcome metrics provide a reliable estimate of the degree to which the curriculum’s objectives and instructional activities enable students to competently perform those specific tasks likely to lead to improved patient care outcomes.14 15

Our movement towards presentation and task-focused objectives and instructional activities caused us to realise the limitations associated with our previous curriculum’s outcome metrics. Specifically, our year two course examinations were designed to assess student performance via a single, overall grade for each course. Further, our previously employed course evaluations produced a single estimate of the students’ overall satisfaction for each course. Similarly, licensure board scores reflected a student’s overall performance on the examination, along with an overall grade for each discipline and system assessed.

Simply put, these broadly defined outcome metrics (course grades, course satisfaction evaluations and licensure scores) could not be meaningfully aligned with our PTALMs’ highly granular, presentation and task-specific objectives and associated instructional activities. The inability to align our highly granular objectives and instructional activities with equally granular outcome metrics would significantly impede our ability to launch and sustain a CQI programme.

Rationale underlying the use of highly specified course evaluation-based outcomes metrics: We quickly recognised that significant logistical impediments would need to be overcome before our own faculty or the licensing boards could create examinations containing the number of test items needed to produce objective and reliable assessment metrics reflecting the students’ patient presentation-focused and task-specific capabilities. We subsequently concluded that the initial development of our CQI programme would need to be predicated on outcome metrics produced by our course evaluations.

A suggestion was made to consider the use of student self-assessments, reflecting their confidence in performing each task associated with each PTALM, as the foundation of a new, highly granular approach to course evaluations-based outcomes metrics. However, previous research made clear that novice-derived self-assessments of confidence were unreliable at the level of the individual student. Fortunately, a new line of research suggested that the aggregation of individual opinions, such as self-assessments from novices, can produce a reliable evaluation of the issue at hand.16 17

Study purpose: In 2010, we launched a course evaluation methodology which used individual student self-assessments of confidence to produce an aggregated estimate of the classes’ confidence in performing each task associated with any given PTALM. We called these aggregated presentation-focused, task-specific estimates ‘confidence indices’ (CI). The purpose of this study was to determine if these aggregated student CI metrics could support CQIs in our new competencies-oriented curriculum.

Study overview: In the Methods section of this report, we describe the design and implementation of our CI-based course evaluation methodology. In the Results section, we provide examples of how this new metric provided faculty with formative feedback reflecting the students’ aggregated confidence in their ability to perform each of the four core physician tasks associated with each PTALM. We also provide evidence demonstrating that, over time, our three-step CQI process sustained measurable improvements in CIs associated with our year-two preclinical training programme. In the Discussion section, we elaborate on faculty acceptance of the CI metric and suggest opportunities for further research.



Methods

Our year-two medical training programme represents a hybrid curriculum consisting of (1) Systems-based courses, with each course augmented with (2) approximately 6–10 PTALMs. At the end of each year-two System course, students are asked to self-assess their level of confidence in performing each of the four core physician tasks (diagnose, treat, manage and explain) associated with each PTALM offered in the course. Students express their confidence using the qualitative scale of Very Confident, Somewhat Confident, Uncertain, Somewhat Not Confident and Not at all Confident. A schematic portrayal of the confidence data collection instrument for our year-two Cardiovascular System course and the six PTALMs offered during that course is provided (figure 1). The same data collection template is used for all year-two System courses.

Figure 1

Data collection scheme for students’ end-of-course self-assessment of their confidence in performing each task (Diagnose, Treat, Manage and Explain) for each PTALM in a given course. This example portrays the scheme used to collect student Confidence Indices for each of the six PTALMs associated with the year-two Cardiovascular System. PTALM, Presentation-focused, Task-specific, Application-oriented Learning Module.


To calculate an individual student’s CI for a given task, the Likert scale value associated with each response is multiplied by 20. Thus, a ‘Very Confident’ response of 5 becomes a CI of 100 and a ‘Not at all Confident’ response of 1 becomes a CI of 20. We then average all students’ CIs for any given task to create a task-specific CI for each of the four tasks associated with each PTALM. We posited that transforming confidence estimates from a 1 to 5 scale to CI values ranging from 20 to 100 would make them easier for faculty to interpret, because the resulting scale resembles that used to grade performance on an examination. For example, CIs hovering around 70 would represent a marginally acceptable score, while CIs below this would suggest that the associated instructional activity was suboptimal. Multiple confidence estimates are derived from the students’ self-assessments:

  1. Task CI values for each of the four tasks associated with each PTALM.

  2. A PTALM CI by averaging the four Task CIs associated with each PTALM.

  3. A System CI for each year-two System course, by averaging the CIs for all PTALMs comprising that course.

  4. An Annual CI representing the average of all System CIs for that year.
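As an illustration only, the four levels of aggregation listed above can be sketched in Python. The function names are hypothetical, the intermediate Likert codes (4, 3 and 2) are inferred from the order of the scale, and the analyses reported here were produced with SPSS rather than with this code.

```python
from statistics import mean

# Likert codes: 5 and 1 are stated in the text; 4, 3 and 2 are inferred
# from the order of the qualitative scale (an assumption).
LIKERT = {
    "Very Confident": 5,
    "Somewhat Confident": 4,
    "Uncertain": 3,
    "Somewhat Not Confident": 2,
    "Not at all Confident": 1,
}
TASKS = ("Diagnose", "Treat", "Manage", "Explain")


def response_to_ci(label):
    """Convert one student's response to a CI on the 20 to 100 scale."""
    return LIKERT[label] * 20


def task_ci(labels):
    """Task CI: average of all students' CIs for one task in one PTALM."""
    return mean(response_to_ci(label) for label in labels)


def ptalm_ci(task_cis):
    """PTALM CI: average of the four Task CIs (dict keyed by task name)."""
    return mean(task_cis[task] for task in TASKS)


def system_ci(ptalm_cis):
    """System CI: average of the PTALM CIs in one System course."""
    return mean(ptalm_cis)


def annual_ci(system_cis):
    """Annual CI: average of all System CIs for one academic year."""
    return mean(system_cis)
```

For example, three hypothetical responses of Very Confident, Somewhat Confident and Uncertain for a single task would yield a Task CI of (100 + 80 + 60) / 3 = 80.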


These four CI metrics are used to produce a variety of tables and graphs designed to facilitate our three-step CQI programme. First, faculty use these tables and graphics to readily identify those Task-specific instructional activities, PTALMs and System courses where instruction was suboptimal. Second, faculty draw on their understanding of our two competencies-oriented curricular design principles and associated faculty development initiatives to remediate those instructional activities identified as suboptimal. Third, the CI metrics associated with all Tasks, PTALMs and System courses offered the following year are reviewed to determine both the effectiveness of the previous year’s remediation efforts and whether, year after year, those efforts are producing gradually improving, gradually declining or unchanged CI metrics. These tables/graphics are further described in the Results section of this report.

It should be noted that no individual student’s CIs are identified or reported to faculty, per the requirements of our IRB approved research protocol. SPSS Statistics, V.23 (IBM, Armonk, New York, USA) was used to produce the statistical analyses reported in the Results section.


Results

Self-assessments of confidence were received from 1040 students across the 6 years of this study. On average, this represents a participation rate of 77% of each class. It should be noted that student participation in course evaluations is expected, but not mandated.

Using the Cardiovascular System course as an example, one of our 2015–2016 reports to faculty includes both Task CI and PTALM CI values (table 1). When distributed, all CI values are colour coded with shades of green or red, creating a gradient of colours that makes it easy to distinguish the relative strengths and weaknesses of each task-focused instructional activity within each presentation-focused PTALM. Note that the six Cardiovascular PTALMs are not listed alphabetically, but rather rank ordered in terms of their respective Presentation CI values (located in the last column of table 1). This reporting format makes it easy for faculty to determine that two PTALMs (Palpations/Dysrhythmias and Heart Murmur) received markedly lower CIs (72 and 76, respectively) than the four other PTALMs (Heart Failure, Acute Chest Pain, Shock and Syncope).

Also note that table 1 enables faculty to rapidly review each of the four task-specific CI estimates associated with each PTALM and thereby identify which particular task-focused instructional activities were most in need of remediation. This task-specific level review makes clear that the instructional activities associated with all four tasks trained to in both Palpations/Dysrhythmias and Heart Murmur received essentially the same low CI estimates. This finding suggested that improvements along the entirety of these two PTALMs needed to be pursued.
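The rank ordering and review described here can be sketched as follows. This is a minimal illustration, not the reporting tool used by the authors: the function names, the descending sort order and the cutoff of 80 are assumptions, and only the CIs of 72, 76 and 82 are quoted in the text; the remaining values are placeholders.

```python
# PTALM CIs for one course. 72, 76 and 82 are quoted in the text; the other
# three values are hypothetical placeholders, not the published table 1 data.
cardio_ptalm_ci = {
    "Heart Failure": 88,      # placeholder
    "Acute Chest Pain": 86,   # placeholder
    "Syncope": 84,            # placeholder
    "Shock": 82,
    "Heart Murmur": 76,
    "Palpations/Dysrhythmias": 72,
}


def rank_ptalms(ptalm_ci):
    """Return (name, CI) pairs ordered from highest to lowest CI."""
    return sorted(ptalm_ci.items(), key=lambda item: item[1], reverse=True)


def needs_review(ptalm_ci, cutoff):
    """Return the names of PTALMs whose CI falls below the chosen cutoff."""
    return [name for name, ci in rank_ptalms(ptalm_ci) if ci < cutoff]


# With an illustrative cutoff of 80, the two lower-ranked PTALMs are flagged.
flagged = needs_review(cardio_ptalm_ci, cutoff=80)
```

The same two functions could be applied to the four Task CIs within a single PTALM to see whether one task or the whole module is lagging.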

Table 1

Task-specific and presentation-specific confidence indices (CI), cardiovascular system PTALMs, 2015–2016

Another faculty report presents a longitudinal, summative perspective of the Cardiovascular course over the 6 years of this study (table 2). This longitudinal format, with both tabular and graphical elements, makes it easy for faculty to determine which PTALMs improved over time, remained relatively stagnant or declined. The longitudinal perspective represented in this report reveals a consistent upward trend in the CI metrics associated with the PTALMs comprising the Cardiovascular course over the 6 years of this study.

Table 2

Presentation-specific confidence indices (CI), cardiovascular system PTALMs, annual trend, 2010–2011 through 2015–2016

For example, the CI for the presentation of Shock showed dramatic improvement, rising from an initial CI of 49, in the Uncertain/Somewhat Not Confident range in academic year 2010–2011, to a CI of 82 in the Very Confident/Somewhat Confident range in 2015–2016. However, while our Palpation/Dysrhythmia and Heart Murmur PTALMs demonstrated a 10-point improvement in their CIs over the 6 years of this study, the most current year’s CI for each of these two PTALMs has remained markedly lower than the averaged CI score for other Cardiovascular PTALMs. Thus, longitudinal data may be used to design learning sciences principles-based faculty development for those responsible for poorly ranked PTALMs or, where necessary, to support an administrative decision to reassign faculty whose PTALM(s) have shown marginal performance over time. Similar gains in CI values were also found in the other eight Systems courses offered each year, over the 6 years of this investigation, but are not represented in this report.
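One simple way to label a PTALM as improving, declining or unchanged over the study period is to compare its first and last CI. The sketch below assumes a hypothetical tolerance of 2 CI points for "unchanged"; neither the function name nor the tolerance comes from the study.

```python
def classify_trend(yearly_cis, tolerance=2.0):
    """Label a series of yearly CIs by the change from first to last year.

    The tolerance (in CI points) is an illustrative assumption.
    """
    change = yearly_cis[-1] - yearly_cis[0]
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "declining"
    return "unchanged"


# Shock: CI of 49 in 2010-2011 and 82 in 2015-2016 (endpoints quoted in the
# text; the intermediate years are not reproduced here).
shock_trend = classify_trend([49, 82])  # a 33-point gain
```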

Gross, year-over-year gains in confidence are summarised by the Annual CI values gathered throughout this 6-year study (figure 2). While a visual inspection of the trend line reveals continuous improvements, an analysis of variance (ANOVA) was conducted to determine if this trend represented significant improvements. The ANOVA confirms that the improvements were statistically significant across this 6-year period (p<0.001) (table 3).
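For readers unfamiliar with the test, the F statistic of a one-way ANOVA can be sketched in plain Python. This is not the SPSS procedure used in the study; it assumes one group of CI observations per academic year, and the toy numbers in the usage line are not study data. Converting F to a p value additionally requires the F distribution, which is omitted here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)                       # number of groups (eg, years)
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data only: three groups of three values give F = 3.0 with df (2, 6).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```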

Table 3

Analysis of variance, annual confidence indices, 2010–2011 through 2015–2016

Figure 2

This figure portrays year-over-year increases in the annual Confidence Indices from 2010–2011 through 2015–2016. These annual Confidence Indices represent the average across all system-level Confidence Indices within each year.

Scheffé a posteriori comparisons revealed that the CI gains from the benchmark year, 2010–2011, to each subsequent year were statistically significant, with p<0.05, <0.001, <0.001, <0.001, <0.001. Further, in comparing each year’s Annual CI to the Annual CI of the year immediately following, it was determined that the CI gain from 2010–2011 to 2011–2012 was statistically significant (p<0.05), as was the CI gain from 2012–2013 to 2013–2014 (p<0.001).

It should be noted that consideration of student self-assessed confidence as a predictor of any individual student’s performance level was not originally a part of the study plan. However, in response to numerous faculty inquiries, we performed a 1-year analysis of the relationship (correlation) between our students’ individually averaged CI for the nine system courses and both their individually averaged grade for these courses and their performance on the United States Medical Licensing Examination (USMLE) 1. We found a positive and significant correlation between our students’ CI and their class grade (r=0.300, p<0.001, df=165), and also between their CI and USMLE 1 score (r=0.202, p=0.018, df=165).
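The reported coefficients can be read in two ways: the t statistic behind the significance test, and the proportion of variance shared by the two measures. The sketch below uses the standard relationship t = r·sqrt(df / (1 − r²)) and assumes that the reported df equals n − 2; the function names are hypothetical.

```python
import math


def t_from_r(r, df):
    """t statistic for a Pearson correlation r with df = n - 2 (assumed)."""
    return r * math.sqrt(df / (1.0 - r * r))


def shared_variance(r):
    """Proportion of variance shared by the two measures (r squared)."""
    return r * r


# Values of r and df as reported in the text.
t_grade = t_from_r(0.300, 165)      # approximately 4.04
t_usmle = t_from_r(0.202, 165)      # approximately 2.65
r2_grade = shared_variance(0.300)   # about 0.09, ie, 9% shared variance
r2_usmle = shared_variance(0.202)   # about 0.04, ie, 4% shared variance
```

Both correlations are therefore statistically detectable in a class of this size while accounting for less than one-tenth of the variance in grades or licensure scores.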


Discussion

Review of rationale

For decades, the learning sciences have provided ever mounting evidence demonstrating that competence is presentation and task-specific. This evidence is now playing a significant role in the evolution of competencies-oriented curricula as witnessed by a recent AAMC survey revealing that ‘curricula organised around clinical presentations will be the next evolutionary step of curriculum renewal for many (American) medical schools’.18

We described how this evidence and two learning sciences-derived principles were used to construct our competencies-oriented curriculum; an approach to training predicated on the establishment of highly specified course objectives and instructional activities delivered via an instructional methodology referred to as PTALMs. Faculty and students were generally pleased with our learning sciences-derived, principles-driven curriculum. Following the implementation and stabilisation of our new curriculum, we sought to establish outcome metrics (in the form of student performance assessments) with a level of granularity equal to our PTALMs’ objectives and instructional activities. Unfortunately, the logistical overhead associated with the production of multiple, presentation and task-specific test items exceeded our resources.

However, it was much simpler to create a course evaluation methodology that would align with our PTALM’s objectives and instructional activities. We posited that if we were to view our students as physicians-in-training, with the collective capacity to reliably estimate how well any given PTALM enabled them to perform its four core physician tasks (ie, used aggregated CI metrics), such metrics could serve as a reliable foundation for launching a meaningful CQI programme.

Interpretation of findings

Our aggregated confidence metrics were used to construct visually oriented (tabular and graphic) portrayals representing the relative rank ordering of the PTALMs comprising each System course within a given year. These visuals enabled faculty to readily identify suboptimally performing PTALMs and any suboptimal task-specific instructional activities within any given PTALM. Once these were identified, faculty could set out to improve the suboptimally performing instructional activities using our learning sciences-derived curricular design principles. These visuals were also used to portray how any given System course’s PTALMs were progressing over the several years of this study. Throughout this 6-year study, our use of aggregated student confidence estimates enabled us to pursue and attain incremental improvements in targeted instructional activities, as witnessed by the results of our ANOVA and post hoc Scheffé analysis.


The authors readily concede that presentation and task-focused student performance metrics would represent an optimal means of assessing and documenting the development of highly granular clinical capabilities in a competencies-oriented curriculum. However, the logistical overhead associated with the production of multiple, presentation-focused and task-specific test items is substantial. Thus, highly granular, presentation and task-specific course evaluations, such as produced in this investigation, might be among the only pragmatic, near-term means of initiating CQI programmes capable of sustaining continual improvements in the clinical capabilities of students within competencies-oriented curricula.

With regard to the utility of course evaluation-driven CQI programmes, Kogan and Shea have argued that ‘The best, yet most underutilized, reason to evaluate courses is to gather feedback for faculty and where necessary provide remediation to improve their teaching’.19 Evidence of the validity and utility of course evaluations based on aggregated medical student and resident self-assessments has been previously reported by Peterson et al.20 21 However, unlike Peterson’s work, our aggregated confidence-based metrics were used by faculty as ongoing, formative feedback intended to improve the curriculum’s instructional effectiveness in preparation for the next class of students. Evidence of the utility of aggregated learner feedback, as reported by Peterson et al and as also described herein, would appear to support Kogan and Shea’s position. We therefore suggest that most competencies-oriented curricular initiatives could easily implement and subsequently benefit from the development of course evaluation methodologies designed to produce outcome metrics aligned with their curriculum’s objectives and instructional activities.

Faculty acceptance and utilisation of the CI metrics introduced in this evaluation methodology proceeded cautiously. Initially, some faculty expressed concern that we might use our CI metric as a replacement for objective assessments of student performance. Accordingly, it was important to continually make clear that the CI metrics represented a curricular CQI initiative and that it was never our intention to use them as a replacement for objective assessments of student performance. In fact, no individual student confidence levels were ever reported. Thus, any potential utility for these CIs was always based on the aggregation of individual student confidence estimates.


The purpose of this investigation was to determine whether a course evaluation methodology predicated on aggregated presentation and task-focused outcome metrics could support CQIs in our competencies-oriented curriculum. However, some faculty were interested in pursuing the factors contributing to the finding identified at the end of our first year of gathering CI estimates, that is, a statistically significant yet low correlation between our students’ CI and both their course grades and subsequent USMLE scores. Unfortunately, the scope of this investigation precluded efforts to use confidence estimates as a predictor of future performance.

We recognise that the literature regarding the utility of self-assessments by individuals remains unsettled. For example, Caputo and Dunning found little relationship between an individual learner’s confidence and their performance, as did Dunning and Heath and Morgan and Cleave-Hogg.22–24 However, Stankov et al found confidence to be ‘by far the best predictor of both concurrent and future performance.’ 25 Are insights to be gained from launching new investigations into the use of individual self-assessments of confidence and performance against presentation and task-specific situations?

Future directions for confidence-based research: We speculate that the level of correlation between an individual’s confidence and their performance might be higher under the following circumstances. First, when the number of test items is sufficient to produce a reliable measure of a subject’s performance against each specific presentation and/or task of interest. Second, when subjects receive training sufficient to enable them to consciously, insightfully and reliably reflect on and monitor their confidence in performing against presentation and task-specific situations.

Curiously, there is little evidence that health sciences training programmes support the development of those metacognitive and/or executive skills theorised as responsible for self-reflection/self-monitoring. We also suggest that previous investigators did not fully appreciate the presentation and task-specific nature of competence, and thereby did not create the number of presentation and task-specific test items needed to produce a reliable performance metric. Thus, we posit that the correlations between confidence and performance found in future studies might be higher (and thereby more useful) if both of these concerns were addressed.

One benefit of research demonstrating the existence of higher confidence/performance correlations could be the reliable identification of poorly calibrated (high confidence/low performing) students or residents. Feedback regarding the specific presentations and/or tasks in which learners performed poorly yet were highly confident, could enable them to gain meaningful insights into the origin of their poor performance. Questions such as how many high confidence/low performing individuals exist in a class of students or residents, and how to support learners in gaining insights into the origins of poor performance, remain largely unaddressed. However, improving the learner’s capacity to identify and self-calibrate disparities between confidence and performance, could lead to improvements in diagnostic performance; an area of increasing concern to all working to reduce the number of unnecessary deaths due to diagnostic errors.26 27


Competencies-oriented curricular initiatives require the careful alignment of course objectives and instructional activities with outcomes metrics such as student performance assessments and course evaluations. Once aligned, CQI programmes can be designed to monitor and provide feedback by which continual curricular improvements might be pursued. In this investigation, the authors demonstrated the utility of a novel course evaluation methodology involving aggregated student confidence estimates as a means of implementing a CQI initiative in pursuit of incremental improvements in a presentation-focused, task-specific, application-oriented curriculum.

We encourage medical education researchers to explore the utility of aggregated estimates of confidence in efforts to continually improve their own highly specified, competencies-oriented curricular reforms. Further, institutions using aggregated confidence-based course evaluation metrics would be well positioned to conduct research exploring the use of individual student confidence estimates as a means of developing training programmes directed at improving their students’ evolving metacognitive skills.



  • Contributors FJP was the primary author of the principles underlying the development of the presentation-focused, task-specific competencies-oriented curriculum and the course evaluation metrics described herein. JHA established the basis for increasingly granular course evaluation metrics and their use as formative feedback for launching curricular CQI programming. JHA was also responsible for handling and analysing the data described herein. Both contributed equally to the initial drafting and multiple revisions of the manuscript.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Patient consent Not required.

  • Ethics approval Ethical approval has been granted by University of North Texas Health Science Center, Office for the Protection of Human Subjects, protocol number 2012-024, annually updated.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
