The Scientific World Journal
Volume 2012, Article ID 606154, 13 pages
doi:10.1100/2012/606154

Research Article

Benchmarking Strategies for Measuring the Quality of Healthcare: Problems and Prospects

Pietro Giorgio Lovaglio

CRISP and Department of Quantitative Methods, University of Bicocca-Milan, V. Sarca 202, 20146 Milan, Italy

Correspondence should be addressed to Pietro Giorgio Lovaglio, [email protected]

Received 12 October 2011; Accepted 29 November 2011

Academic Editors: V. Brusic, W. D. Evans, M. Fanucchi, and A. S. Levin

Copyright © 2012 Pietro Giorgio Lovaglio. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare and implementing benchmarking strategies. Besides offering accreditation and certification processes, recent approaches measure the performance of healthcare institutions in order to evaluate their effectiveness, defined as the capacity to provide treatment that modifies and improves the patient's state of health. This paper, dealing with hospital effectiveness, focuses on research methods for effectiveness analyses within a strategy comparing different healthcare institutions. The paper, after having introduced readers to the principal debates on benchmarking strategies, which depend on the perspective and type of indicators used, focuses on the methodological problems related to performing consistent benchmarking analyses. Particularly, statistical methods suitable for controlling case-mix, analyzing aggregate data, rare events, and continuous outcomes measured with error are examined. Specific challenges of benchmarking strategies, such as the risk of risk adjustment (case-mix fallacy, underreporting, risk of comparing noncomparable hospitals), selection bias, and possible strategies for the development of consistent benchmarking analyses, are discussed. Finally, to demonstrate the feasibility of the illustrated benchmarking strategies, an application focused on determining regional benchmarks for patient satisfaction (using the 2009 Lombardy Region Patient Satisfaction Questionnaire) is proposed.

1. Introduction

Over the last few years, increasing attention has been directed toward the problems inherent to measuring the quality of healthcare. Accreditation and certification procedures have acted as stimulating mechanisms for the discovery of skills and technology specifically designed to improve performance. Total Quality Management (TQM) and Continuous Quality Improvement (CQI) are the most widespread and recent approaches to implementing and improving healthcare quality control [1].

Besides offering accreditation and certification processes, recent approaches measure the performance of health structures in order to evaluate National Health Systems. For example, various international Agencies [2–4] measure the performance of health structures in different countries, considering three main dimensions: effectiveness, efficiency, and customer satisfaction.

In this perspective, performance measurement for healthcare providers, structures, or organizations (hereafter, hospitals) is becoming increasingly important for the improvement of healthcare quality.

However, the debate over which types of performance indicator are the most useful for monitoring healthcare quality remains a question of international concern [5].

In a classic formulation, Donabedian [6] asserted that quality of care includes (i) structure (characteristics of the resources in the healthcare system, including organization and system of care, accessibility of services, licensure, physical attributes, safety policies and procedures, viewed as the capacity to provide high-quality care), (ii) process (measures related to evaluating the process of care, including the management of disease, the existence of preventive care such as screening for disease, accuracy of diagnosis, the appropriateness of therapy, complications, and interpersonal aspects of care, such as service, timeliness, and coordination of care across settings and professional disciplines), and (iii) clinical outcomes.

A clinical outcome is defined as the “technical result of a diagnostic procedure or specific treatment episode” [7], or as the “result, often long term, on the state of patient well-being, generated by the delivery of a health service” [8].

Specifically, ongoing attention has been placed on the importance of combining structural aspects (such as governance and the healthcare workforce) with measures of outcomes to assess the quality of care [6, 9]. This consideration was taken into account by the Institute of Medicine, which, in 1990, stated that “quality of care is the degree to which health services for individuals and populations increase the likelihood of desired health outcomes and are consistent with current professional knowledge” [10].

This definition has been widely accepted and has proven to be a robust and useful reference in the formulation of practical approaches to quality assessment and improvement, emphasizing that the process of care increases the probability of desirable outcomes for patients while reducing the probability of undesired outcomes.

This paper deals with hospital effectiveness, defined as the capacity of hospitals to provide treatment that modifies and improves the patient's state of health. Of particular importance in this perspective is the concept of “relative effectiveness,” that is, the effectiveness of each specific hospital in modifying the patient's state of health within a strategy comparing different healthcare institutions; in short, effectiveness evaluation in a benchmarking framework [6].

Benchmarking in healthcare is defined as the continual and collaborative discipline of measuring and comparing the results of key work processes with those of the best performers in evaluating organizational performance [11].

Two types of benchmarking can be used to evaluate patient safety and quality performance. Internal benchmarking is used to identify best practices within an organization, to compare best practices within the organization, and to compare current practice over time. Competitive or external benchmarking involves using comparative data between organizations to judge performance and identify improvements that have proven to be successful in other organizations.

Our aim is to discuss the statistical aspects and possible strategies for the development of hospital benchmarking systems.

The paper is structured as follows: the next section introduces readers to the principal debates on benchmarking strategies, which depend on the perspective and type of indicators used. Section 3 presents statistical methods, while Section 4 explores the methodological problems related to performing consistent benchmarking analyses. Section 5 describes an application based on patient satisfaction that demonstrates the feasibility of the illustrated benchmarking strategies. Section 6 offers conclusions.

2. Perspective and Type of Indicators

The definition and assessment of “effectiveness” rest on a conceptual and operational definition of “quality of care,” which is an exceptionally difficult notion to define.

An important contextual issue is the purpose for which a performance indicator is to be used and by whom.

Performance indicators can be used for various objectives: to gain information for policy making or strategy development at a regional or national level, to improve the quality of care of a hospital, to monitor the performance of healthcare, to identify poor performers in order to protect public safety, as well as to provide information to consumers to facilitate the choice of hospital.

In general, the broader the perspective required, the greater the relevance of outcome measures, as they reflect the interplay of a wide variety of factors, some directly related to healthcare, others not. Because outcome measures are an indicator of health, they are valid as performance indicators inasmuch as the quality of health services has an impact on health. As the perspective narrows, to hospitals, to specialties, or indeed to individual doctors, outcome measures become relatively less indicative and process measures relatively more useful.

Process measures have two important advantages over outcome measures. First, if differences in outcome are observed, alternative explanations need to be considered before one can conclude that the difference reflects true variations in the quality of care. In contrast, a process measure lends itself to a straightforward interpretation (e.g., the more people without contraindications who receive a specific treatment, the better). Second, the necessary remedial action is clearer (use the treatment more often), whereas for an outcome measure (e.g., a higher mortality rate) it is not immediately obvious what action needs to be taken.

Despite these limitations, outcome measures have a role in the monitoring of the quality of healthcare that is important per se. To know that death rates from a specific diagnosis vary across hospitals is an essential finding, even if the reasons for the differences cannot be explained through the quality of care. Further, outcome measurement will reflect all aspects of the processes of care, although only a subset is measurable or measured (e.g., technical expertise and medical skill). Such aspects are likely to be important determinants of outcome in some situations and describe not only that a correct procedure is performed, but also the results for the patients.

Another possible reason why outcome indicators are often used in some countries is that the available data come from routine information systems (administrative archives), which regularly record clinical aspects and other dimensions useful for case-mix adjustment.

In the Italian context, at the patient level, the Hospital Discharge Card (HDC) is the only available administrative archive in the health sector. The HDC, introduced in Lombardy in 1975 with the introduction of the Diagnostic Related Group (DRG) reimbursement system, collects clinical information about patient discharge.

In this perspective, the debate on the use of clinical administrative data to furnish useful information on quality assessment remains open.

Many authors have criticized the use of clinical outcomes in the evaluation of quality of care and, particularly, mortality rates [12, 13]. According to Vincent and colleagues [14], administrative data does not provide a suitably transparent perspective on quality or improvement.

Others suggest that limited clinical content may compromise its utility for this purpose, posing serious caveats against drawing definitive conclusions [15, 16].

Despite such concerns, major consensus exists on the use of clinical outcomes from administrative data as a useful screening tool for identifying quality problems and targeting areas in which quality should be investigated in greater depth [4, 16, 17]. Excluding mortality, various clinical outcomes which could indicate malpractice are widely accepted by private or public Agencies [1–3, 18, 19] which evaluate national health sectors, for example, unscheduled surgical returns to the operating room within 48 hours, discharges against medical advice, death in low-mortality DRGs, or failure to rescue (indicating deaths among patients developing specified complications during hospitalization).

2.1. Outcome Variability. In order to consider the methodological problems that may limit benchmark strategies, it is necessary to explore the possible causes of variation in an outcome. Four major categories of explanation need to be considered. The first of these is whether observed differences might be due to differences in the type of patient cared for in the different hospitals (e.g., age, gender, comorbidity, severity of disease, etc.).

The importance of this cause of variation is illustrated by studies where differences in crude outcome disappear when the outcomes are adjusted to take account of these confounding factors. To this end, researchers propose risk-adjustment methodologies as proper methods of equitable comparison for evaluating the quality and effectiveness of hospitals [12, 15, 20].

A second cause of variation in outcome (or its risk-adjusted version) is differences in the way data is collected. Differences in the measurement of events of interest (e.g., deaths) or in the population at risk (typically the denominator of an event rate), depending on different inclusion criteria for determining denominators, or when different case-mix data is used to adjust for potential confounding, will lead to apparent differences in outcome.

Thirdly, observed differences may be due to chance. Random variation is influenced both by the number of cases included and by the frequency with which the outcome occurs. To this end, a fundamental issue is whether the outcome indicator is likely to have the statistical power to detect differences in quality. Statistical power depends upon how common the occurrence of the outcome is. For some rare events, the limited number of patients experiencing the events limits the power of the study [21].

Finally, differences in outcome may reflect real, although unobservable, differences in quality of care. This may be due to variations in different measurable or less measurable aspects, such as the interventions performed or the skill of the medical team.

Hence, as these are different causes of an outcome variation, the conclusion that a variation in outcome is due to a difference in quality of care among hospitals is essentially a diagnosis through exclusion: if the variation cannot be explained in terms of the previous components (case-mix, data collection, chance), then hospital quality of care (relative effectiveness) becomes a possible explanation.

3. Statistical Methods

As described above, if one cannot explain the variation in terms of differences in type of patient, in how data is collected, or in terms of chance, then quality of care becomes a possible explanation. Following the perspective that variations in outcome are due to a difference in quality of care only as a diagnosis through exclusion, institutional agencies gather large data sets from administrative archives and apply risk-adjustment in order to validate quality indicators and to benchmark hospitals.

Administrative archives are less prone to the problems related to how the data is collected, and reduce the possibility that differences in outcome may be due to chance (although this risk increases when analyzing rare outcomes). Usually, such databases cover the entire population of hospitalizations, enhancing their statistical power to detect important differences in outcomes.

Therefore, the last exclusion criterion invokes a consistent statistical model allowing comparisons between hospitals, in order to estimate relative effectiveness [22]. To this end, statistical methods for risk-adjustment identify and adjust variations in patient outcomes stemming from differences in patient characteristics (or risk factors) across hospitals and, therefore, allow fair and accurate interhospital comparisons.

However, the kind of adjustment required for assessing effectiveness is not the same for the various subjects interested in the results. In this regard, it is useful to distinguish between two types of effectiveness. In fact, potential patients (users) and institutional stakeholders (agents) are interested in different types of hospital effectiveness.

Following the approach of Raudenbush and Willms [23], in a comparative setting, relative effectiveness is usually assessed through a measure of performance adjusted for the factors outside the control of the hospital, so the difference between the two types of effectiveness simply lies in the kind of adjustment. The authors identify Type A and Type B relative effectiveness: Type A effectiveness deals with users interested in comparing the results they can obtain by enrolling in different hospitals, irrespective of the way such results are yielded; the performance of the hospital adjusted for the features of its users is evaluated. Type B effectiveness deals with stakeholders interested in assessing the “production process” in order to evaluate the ability of hospitals to exploit the available resources; in this case, the performance of the hospital is adjusted according to the features of its users, the features of the hospital itself, and the context in which it operates.

In the nineties, numerous authors proposed to estimate the concept of “relative effectiveness” by means of multilevel or hierarchical models [24, 25]. In fact, when the behaviour of individuals within organizations is studied, the data have a nested structure. Individuals/patients constitute the sampling units at the first and lowest level of the nested hierarchy. Organizations/hospitals constitute the sampling units at the second level.

Several recent statistical papers deal with risk-adjusted comparisons, related to mortality or morbidity outcomes, by means of multilevel models, in order to take into account the different case-mixes of patients (for a review, see Goldstein and Leyland [26] and Rice and Leyland [27]).

One of the most attractive features of multilevel models is the production of useful results in healthcare effectiveness by linking individual (patient) and organizational (hospital) characteristics (covariates). Multilevel models overcome small-sample problems by appropriately pooling information across organizations, introducing some correction or shrinkage, and providing a statistical framework that quantifies and explains variability in outcomes through the investigation of patient/hospital-level covariates [27].

Quality indicators are typically calculated and disseminated at the hospital level, dividing the number of events (in-hospital death, or an adverse event such as a clinical error which results in disability, death, or prolonged hospital stays) by the number of discharged patients at risk.

However, at the patient/individual level, the event of interest is typically a dichotomous variable, and the multilevel model version for this kind of outcome is the Logistic Multilevel Model (LMM, [25]).

For patient $i$ nested in hospital $j$, let $\pi_{ij}$ be the probability of occurrence of a dichotomous adverse event $Y_{ij}$, where $Y_{ij}$ is Bernoulli distributed with expected value $E(Y_{ij}) = P(Y_{ij} = 1) = \pi_{ij}$. Instead of $\pi_{ij}$, the LMM specifies as dependent outcome its logistic transformation $\eta_{ij} = \log(\pi_{ij}/(1-\pi_{ij}))$, as a function of possible covariates, where $\log$ is the logarithmic transformation and $\pi_{ij}/(1-\pi_{ij})$, the ratio of the probability that the adverse event occurs to the probability that it does not, is called the odds of the modelled event.

The LMM without patient and hospital covariates (intercept-only LMM) assumes that $\eta_{ij}$ depends only on the particular hospital charging patient $i$, specified by $\gamma_{0j}$, a nominal variable designating the $j$th hospital; the hospital effect is assumed to be random, meaning that hospitals are assumed to be randomly sampled from a large population of hospitals. Equations (1) and (2) define the intercept-only LMM:

$$\eta_{ij} = \gamma_{0j}, \quad (1)$$
$$\gamma_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \sigma_0^2), \quad (2)$$

where $\gamma_{0j}$ is the intercept (effect) for the $j$th hospital, which can be decomposed into $\gamma_{00}$, representing the average probability of adverse events (in the logit metric) across hospitals, and $u_{0j}$, a specific effect capturing the difference between the probability of adverse event for hospital $j$ and the average probability of adverse event across hospitals. These random effects are assumed to be independent and normally distributed with zero mean and variance $\sigma_0^2$, which describes the variability of the hospitals' effects. The intercept-only model constitutes a benchmark value of the degree of misfit of the model and can be used to compare models involving different covariates at different levels. Further, this model allows decomposing the total variance of the outcome into different variance components for each hierarchical level. Specifically, the Intraclass Correlation Coefficient (ICC), defined as the ratio between the variability among hospitals $\sigma_0^2$ and the total variability ($\sigma_0^2$ plus the variability among patients within the hospitals, $\sigma_e^2$), captures the proportion of total variability of a given risk factor that is due to systematic variation between hospitals. Nevertheless, in the case of a dichotomous outcome $Y_{ij}$, the usual first-level residuals $e_{ij}$, and hence their variance $\sigma_e^2$, are not in the model (1). This occurs since the outcome variance $\pi_{ij}(1-\pi_{ij})$, being part of the specification of the error distribution, depends on the mean $\pi_{ij}$ and thus does not have to be estimated separately.

However, approximating the variability of the first level with the variance of the standard logistic distribution ($\pi^2/3$) and summing this variance with the variability of the second level ($\sigma_0^2$) allows separating the total variance into two components, giving the intercept-only model $\mathrm{ICC} = \sigma_0^2/(\sigma_0^2 + \pi^2/3)$. This measure is used to assess the percentage of outcome heterogeneity existing between the hospitals involved in the analysis.

As the second step, the probability (in the logit metric $\eta_{ij}$) of an adverse event occurring for a patient can be a function of the patient's characteristics (case-mix), other than the hospital effect. Hence (1) can be extended assuming that $\eta_{ij}$ depends on $P$ ($p = 1, \ldots, P$) patient covariates $x_{pij}$:
$$\eta_{ij} = \gamma_{0j} + \sum_{p=1}^{P} \gamma_{pj} x_{pij}, \quad (3)$$
$$\gamma_{0j} = \gamma_{00} + u_{0j}, \quad (4)$$
$$\gamma_{pj} = \gamma_{p0} + u_{pj}, \quad (5)$$
where $\gamma_{pj}$ is the slope (regression coefficient) of the $p$th person characteristic in hospital $j$, which is allowed to vary randomly across hospitals (e.g., the effect of length of stay on adverse event occurrence varies among hospitals). In formulation (4), the specific effect of the $j$th hospital on the outcome ($u_{0j}$) is adjusted for the effects of the $P$ person-level characteristics ($x_{pij}$). In (5), $\gamma_{p0}$ represents the average slope across hospitals and $u_{pj}$ the specific contribution of hospital $j$ to the average slope (random effect). However, in effectiveness analyses, the slope parameters ($\gamma_{pj}$) are assumed to be fixed (putting $u_{pj} = 0$ in (5) for $p = 1, \ldots, P$), whereas only the intercept $u_{0j}$ is allowed to vary randomly across hospitals. Such models, in which the regression slopes are assumed fixed, are denoted as variance component models.

In the model composed of (3)-(4) and (5) with $u_{pj} = 0$, the $u_{0j}$ reflects the relative effectiveness of the $j$th hospital, adjusted only for individual case-mix characteristics, and thus potentially depending on different hospital characteristics (Type A effectiveness).

For Type B effectiveness, one can move to the next step, accounting for variation in the intercept parameters across hospitals by adding $Q$ ($q = 1, \ldots, Q$) hospital variables $z_{qj}$ to the level 2 equations. Hence, (4)-(5) become
$$\gamma_{0j} = \gamma_{00} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + u_{0j}, \quad (6)$$
$$\gamma_{pj} = \gamma_{p0} + \sum_{q=1}^{Q} \gamma_{pq} z_{qj}, \quad (7)$$

in which the slope parameters $\gamma_{pj}$ referring to (3) are specified as nonrandom across hospitals, but possibly vary depending on the characteristics $z_{qj}$ of hospital $j$.

Methodologically, this step is justified when, in the model (3)-(4), the intercepts $u_{0j}$ do vary significantly across hospitals (as assessed by investigating the associated residual ICC), once the patients' characteristics are controlled for.

The compact form of (3)-(6)-(7) is
$$\eta_{ij} = \gamma_{00} + \sum_{p=1}^{P} \gamma_{p0} x_{pij} + \sum_{q=1}^{Q} \gamma_{0q} z_{qj} + \sum_{p=1}^{P} \sum_{q=1}^{Q} \gamma_{pq} x_{pij} z_{qj} + u_{0j}, \quad (8)$$

where the double sum in (8) captures possible cross-level interactions between covariates at different levels (e.g., $\gamma_{pq}$ shows that, for hospital $j$, the effect of length of stay ($x_{pij}$) on adverse event occurrence ($\eta_{ij}$) may depend on the specialisation level $z_{qj}$ of the hospital).
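As an illustrative sketch of fitting a random-intercept logistic multilevel model of this kind, the fragment below uses statsmodels' BinomialBayesMixedGLM on simulated data; the variable names (los, spec), the data, and the effect sizes are assumptions for the example, not the paper's, and the variational Bayes fit is only one of several possible estimation routes.

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

rng = np.random.default_rng(0)

# Simulated data standing in for the paper's setting: patients (level 1)
# nested in hospitals (level 2); all values are illustrative.
n_hosp, n_pat = 50, 200
hospital = np.repeat(np.arange(n_hosp), n_pat)
u0 = rng.normal(0, 0.5, n_hosp)            # random hospital intercepts
los = rng.exponential(5, n_hosp * n_pat)   # patient covariate: length of stay
spec = rng.binomial(1, 0.4, n_hosp)        # hospital covariate: specialisation
eta = -3 + 0.05 * los + 0.3 * spec[hospital] + u0[hospital]
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))

df = pd.DataFrame({"y": y, "los": los,
                   "spec": spec[hospital], "hospital": hospital})

# Random-intercept logistic model with one patient-level and one
# hospital-level covariate (a variance-component model, slopes fixed).
model = BinomialBayesMixedGLM.from_formula(
    "y ~ los + spec", {"hospital": "0 + C(hospital)"}, df)
result = model.fit_vb()   # variational Bayes fit
print(result.summary())
```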

In model (8), the parameters $u_{0j}$, called level 2 residuals, specify the relative effectiveness of hospital $j$ (Type B effectiveness): they show the specific “managerial” contribution of the $j$th hospital to the risk of adverse events, adjusted for the overall risk ($\gamma_{00}$), the individual case-mix ($\sum_p \gamma_{p0} x_{pij}$), the structural/process characteristics of the hospitals ($\sum_q \gamma_{0q} z_{qj}$), and their interactions ($\sum_p \sum_q \gamma_{pq} x_{pij} z_{qj}$). To make this interpretation clear, (8) can be rewritten by isolating $u_{0j}$ on the right-hand side of expression (8): the effectiveness parameter $u_{0j}$ is thus the hospital-level unexplained deviation of the actual outcome ($\eta_{ij}$) from the expected outcome ($\gamma_{00} + \sum_p \gamma_{p0} x_{pij} + \sum_q \gamma_{0q} z_{qj} + \sum_p \sum_q \gamma_{pq} x_{pij} z_{qj}$). The expected outcome is the outcome predicted by the model based on the available hospital- and patient-level covariates. For patient $i$ of hospital $j$, the difference between the actual and expected outcome has a hospital-level component $u_{0j}$ (the effectiveness). Notice that, since the expected outcome depends on the covariates, the meaning of effectiveness depends on how the model adjusts for the covariates (Type A or Type B).

One method of estimating $u_{0j}$ is to use the empirical Bayes (EB) residual estimator [24]. The EB estimator can be interpreted as the difference between the “average” we actually observe for a hospital (the average of the actual outcome for a hospital) and the “average” that is expected for the hospital after controlling for the individual and hospital factors that influence the average (the average of the expected outcome for a hospital). Hence, adjusting for both individual and hospital-level sources of variation, the EB residual is that part of the evaluation of the variable at hand (adverse event occurrence) that we believe to be due to management practices. The exponential value of the estimated hospital-specific random effect $u_{0j}$ is the odds ratio (OR): the odds of experiencing an adverse event at the $j$th hospital divided by the odds at an average hospital, after controlling for the individual and hospital factors. Patients who are treated at hospitals with positive random effects (OR > 1.0) have greater odds of an adverse event than patients who are treated at an average hospital, whereas patients who are treated at hospitals with negative random effects (OR < 1.0) have lower odds of an adverse event than patients who are treated at an average hospital.

However, since the residuals are affected by sampling variability and other sources of error, the corresponding ranking has a degree of uncertainty. Such uncertainty is difficult to represent, since it involves multiple comparisons. If Hospital A's risk-adjusted outcomes are significantly better than those of Hospital B, then we are more confident that Hospital A offers high quality of care, but we cannot assume that Hospital A is actually better than Hospital B. Therefore, several authors [4, 8, 27] suggest avoiding hospital rankings based on risk-adjusted outcomes and instead placing hospitals into a limited number of groups, based on statistical criteria. In a conservative approach, the usual procedure is to build 95% pairwise confidence intervals (CIs) of the level 2 residuals, or their exponentiated values, and to situate hospitals into three groups: effective (problematic) hospitals are those with CIs entirely under (over) the risk-adjusted mean (e.g., regional) of the warning event, whereas CIs that cross the risk-adjusted mean define the intermediate group. Further, the effectiveness of two hospitals is statistically different when the 95% pairwise CIs of $u_{0j}$ do not overlap.
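A small sketch of this grouping rule, using hypothetical empirical Bayes estimates of $u_{0j}$ and their standard errors (illustrative values, not the paper's results):

```python
import numpy as np

# Hypothetical EB estimates of u0j and their standard errors.
u0 = np.array([-0.42, -0.05, 0.10, 0.55])
se = np.array([0.12, 0.20, 0.18, 0.15])

# 95% CI of the odds ratio exp(u0j).
lo, hi = np.exp(u0 - 1.96 * se), np.exp(u0 + 1.96 * se)

for j, (l, h) in enumerate(zip(lo, hi)):
    if h < 1.0:
        group = "effective (CI entirely below the risk-adjusted mean)"
    elif l > 1.0:
        group = "problematic (CI entirely above the risk-adjusted mean)"
    else:
        group = "intermediate (CI crosses the risk-adjusted mean)"
    print(f"hospital {j}: OR in [{l:.2f}, {h:.2f}] -> {group}")
```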

3.1. Case-Mix Adjustment. Typically, appropriate adjustment instruments must control for the principal diagnosis within a Diagnostic-Related Group (DRG, a categorization of each hospitalization based on the average resources used to treat patients), contain demographics as proxies for preexisting physiological reserve (e.g., gender, age, marital status, socioeconomic status), and measure the number and severity of comorbidities [28].

Comorbidities, or coexisting diseases, are obtained from the DRG and the principal-secondary diagnoses, whereas comorbidity severity is measured with different strategies: among others, (i) aggregating comorbidities reflecting different conditions leading to hospitalization [29], (ii) aggregating DRGs reflecting admission gravity (disease staging, [4, 30]). For example, disease staging maps the list of comorbid diagnoses to a severity scale that ranges from 1 to 4, where stage one is the least severely ill and stage four is death. In the absence of institutional software measuring severity, possible alternatives contained in Hospital Discharge Card data are length of stay, admission type (planned/urgent), hospitalization type (surgical/other), DRG, and DRG weight, a numeric value assigned to each discharge based on the average resources consumed to treat patients in that DRG.

To this end, risk-adjustment methods that use only administrative data appear to be a viable alternative to widely accepted severity adjustment methods when the additional clinical data (medical charts, laboratory values, etc.) required by existing severity adjustment strategies are not available [31].

3.2. Decomposing Total Variance. Various approaches have been proposed to examine the proportion of explained variance and to indicate how well the outcome is predicted in a multilevel model. A straightforward approach consists of examining the residual error variances and the residual ICC in a sequence of models.

However, in an LMM, if we start with an intercept-only model and then estimate a second model where we add a number of covariates (the linear predictor in (3)), we normally expect the variance components to become smaller. However, in logistic regression the first-level residual variance is again $\pi^2/3$. These implicit scale changes make it impossible to compare regression coefficients across models, or to investigate how the variance components change [25]. One possible solution is a multilevel extension of a method proposed by McKelvey and Zavoina [32], based on the explained variance of a latent outcome in a generalized linear model. In this formulation, for a specific model with $m$ covariates, the variance of $\eta_{ij}$ is decomposed into the first-level residual variance, which is fixed to $\pi^2/3$, the second-level intercept variance $\sigma_0^2$, and the variance $\sigma_F^2$ of the linear predictor (obtained by calculating the variability of the predictions arising from the fixed part of the model). The variance of the linear predictor is the systematic variance in the model, whereas the other two variances are the residual errors at the two levels. In this specification, we can rescale the variance estimates $\sigma_0^2$ and $\pi^2/3$ of a specified model with $m$ covariates by an appropriate scale correction factor that rescales the model to the same underlying scale as the intercept-only model. Let $\sigma^2 = \sigma_0^2 + \pi^2/3$ denote the total variance of the intercept-only model, and $\sigma_m^2 = \sigma_0^2 + \pi^2/3 + \sigma_F^2$ that of “model $m$” including the $m$ first-level covariates. Applying the scale correction factor $\sigma^2/\sigma_m^2$ to the variance components of model $m$, the corrected variance components can be used for assessing the ICC and the amount of variance explained at the two levels.
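A minimal sketch of this rescaling, with hypothetical variance components standing in for estimates from the intercept-only model and from a model with $m$ covariates:

```python
import math

# Hypothetical variance components on the logit scale; illustrative only.
sigma2_0_null = 0.40      # intercept variance, intercept-only model
sigma2_0_m = 0.30         # intercept variance, model with m covariates
sigma2_F = 0.90           # variance of the linear predictor (fixed part)
level1 = math.pi ** 2 / 3 # first-level variance, fixed in an LMM

total_null = sigma2_0_null + level1
total_m = sigma2_0_m + level1 + sigma2_F

# Rescale model m's components to the underlying scale of the null model.
scale = total_null / total_m
sigma2_0_resc, level1_resc = scale * sigma2_0_m, scale * level1

icc_resc = sigma2_0_resc / (sigma2_0_resc + level1_resc)
explained = scale * sigma2_F / total_null  # share of variance explained
print(f"rescaled ICC = {icc_resc:.3f}, explained variance = {explained:.3f}")
```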

3.3. Aggregate Data and Rare Events. Often, dichotomous data may be available at higher levels than the patient level (e.g., aggregated adverse events occurring in the $k$th Specialty belonging to hospital $j$). In that case, the individual dichotomous outcome $Y_{ij}$ becomes a proportion or an event rate ($\pi_{kj}$), defined as the number of events divided by the total population at risk. Specifically, $\pi_{kj}$ is the ratio between $Y_{kj}$, the count of adverse events occurring in the $k$th Specialty of the $j$th hospital (stratum $kj$), and $n_{kj}$, the size of the population at risk in stratum $kj$. Conditional on the covariates, $Y_{kj}$ is assumed to have a binomial error distribution, with expected value $\pi_{kj}$ and variance $\pi_{kj}(1 - \pi_{kj})/n_{kj}$, where $n_{kj}$ is the number of trials or the population at risk (e.g., discharged patients) in stratum $kj$.

In this case, with aggregate data, we can continue to use the LMM. Here, the first level refers to Specialty $k$, instead of patient $i$. In each stratum $kj$, we have a number of patients who may or may not experience the adverse event. For each patient $i$ in stratum $kj$, the probability of a warning event is the same, and the proportion of respondents in the $k$th Specialty of the $j$th hospital is $\pi_{kj}$, which is the dependent outcome to be modelled. This formulation does not model individual probability and does not use individual-level covariates. However, in the presence of individual dichotomous data ($Y_{ikj}$ for patient $i$ in stratum $kj$), we could have a model where each individual's probability varies with individual-level covariates in a three-level model.

With aggregate data, another possible way to model proportions is to use regression count models. Count data is increasingly common in clinical research [33]. Examples include the number of adverse events occurring during a follow-up period or the number of hospitalizations. Poisson Regression (PR, [34]) is the simplest regression model for count data and assumes that each observed count $Y_{kj}$ is drawn from a Poisson distribution with conditional mean $\mu_{kj}$ given a covariate vector $x_{kj}$ for stratum $kj$. If $Y_{kj}$ is assumed to be drawn from a Poisson distribution, mixed Poisson regression is useful when researchers are interested in whether the (logarithm of the) expected rates ($\mu_{kj}/n_{kj}$), which are incidence densities, vary across Specialty and hospital characteristics. Here, $n_{kj}$ may denote either the size of the population at risk in stratum $kj$ or the size of the time interval over which the events are counted. Indicating $\eta_{kj} = \log(\mu_{kj}/n_{kj})$, once index $i$ is substituted with index $k$, (8) identifies the Poisson Multilevel Model. It involves, as the dependent variable, an event rate, such as the ratio of clinical errors resulting in patient death to the total discharges in the $k$th Specialty belonging to hospital $j$, or the number of clinical errors resulting in patient death per charge period. The random error $u_{0j}$ continues to represent the specific managerial contribution of hospital $j$ to the rate of clinical errors, once Specialty characteristics (case-mix) and hospital structural characteristics are taken into account.

The main feature of the Poisson model is that the expected value of the random variable $Y_{kj}$ for stratum $kj$ is equal to its variance. However, this assumption of equidispersion, resulting in an underestimation of the outcome variability, is too restrictive for many empirical applications. In practice, the variance of observed count data usually exceeds the mean (overdispersion), due to unobserved heterogeneity and/or when modelling rare events. In this situation, one classic cause of overdispersion is the presence of an excess of zeroes in the analyzed outcome distribution (e.g., when many hospitals are not responsible for adverse events). Ignoring overdispersion seriously compromises the goodness of fit of the model, which also leads to an overestimation of the statistical significance of the explanatory variables.

In this perspective, as described in the previous sections, a fundamental issue for statistical models is whether the outcome indicator is likely to have the statistical power to detect differences in quality. In the presence of a rare event, the small number of patients experiencing said event limits the power of the study (at a given significance level), and one cannot conclude that some hospitals are better than the rest, or that a specific hospital with low performance (a high complication rate) is worse, as these differences might have arisen by chance.

When the data show overdispersion and an excess of zeros (rare events) compared to the expected number under the Poisson distribution, other count models, such as the Negative Binomial Regression model (NBR, [34]) and Zero-Inflated regression models, appear to be more flexible. NBR is able to model count data with overdispersion, because NBR is the extension of PR with a more liberal variance assumption, modelled by means of a dispersion parameter. Instead, Zero-Inflated regression models address the issue of excess zeroes in their own right, explicitly modelling the production of zero counts. Specifically, it is assumed that there are two processes producing the data: some of the zeros are part of the event count and are assumed to follow a Poisson model (or a negative binomial). Other zeros arise from a binary process determining whether the event takes place at all, modelled by a binomial model (logistic equation). These zeros are not part of the count; they are structural zeros, indicating that the event never takes place.

Thus, for count data with evidence of overdispersion, and when the overdispersion results from a high frequency of zero counts (rare events), several modelling strategies give satisfactory fitting measures.
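To illustrate, the sketch below simulates stratum-level counts with structural zeros and compares a plain Poisson fit against Negative Binomial and Zero-Inflated Poisson fits via statsmodels; the data-generating values are assumptions for the example, not results from the paper.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(1)

# Simulated stratum-level counts with excess zeros: many strata never
# produce the adverse event (structural zeros); values illustrative.
n = 500
x = rng.normal(size=n)
never = rng.binomial(1, 0.4, n)           # structural-zero process
mu = np.exp(0.3 + 0.5 * x)
y = np.where(never == 1, 0, rng.poisson(mu))
X = sm.add_constant(x)

poisson_fit = sm.Poisson(y, X).fit(disp=False)
negbin_fit = sm.NegativeBinomial(y, X).fit(disp=False)
zip_fit = ZeroInflatedPoisson(y, X).fit(disp=False)

# With overdispersion from excess zeros, the NB and zero-inflated
# models typically show a lower AIC than the plain Poisson model.
for name, fit in [("Poisson", poisson_fit),
                  ("NegBin", negbin_fit),
                  ("ZIPoisson", zip_fit)]:
    print(f"{name:10s} AIC = {fit.aic:.1f}")
```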

3.4. Continuous Outcomes. The rationale underlying the specification of (8) can be generalized to the case in which the outcome variable is assumed to be continuous (or is a scale in which the responses to a large number of questions are summated to one score) with a normal error distribution. However, two main differences arise. Firstly, in a Linear Multilevel Model [24], instead of modelling the logit of $Y_{ij}$, we directly model $Y_{ij}$; secondly, the model now involves the level 1 residuals $e_{ij}$ (assumed to have a normal distribution with zero mean and variance $\sigma_e^2$ and to be independent of the level 2 residuals $u_{0j}$). The parameters can be estimated by the full or restricted maximum likelihood method [24].

In the intercept-only model, the ICC ($= \sigma_0^2/(\sigma_0^2 + \sigma_e^2)$) indicates the proportion of the variance explained by the grouping structure in the population. Since, with additional covariates, all residual variance components become smaller, at each step we can decide which regression coefficients or variances to keep based on significance tests, the change in the deviance, and changes in the variance components (residual ICC).

When the response variable does not have a normal distribution, the parameter estimates produced by the maximum likelihood method are still consistent and asymptotically unbiased, meaning that they tend to get closer to the true population values as the sample size becomes larger. However, the asymptotic standard errors (the variance-covariance matrix of the estimated regression coefficients) are incorrect, and they cannot be trusted to produce accurate significance tests or confidence intervals for the fixed effects [24, page 60]. One available correction method to produce robust standard errors is the so-called Huber/White or sandwich estimator [35], where the variances of the estimated regression coefficients are obtained from the empirical residuals of the model (robust standard errors). This makes inference less dependent on the assumption of normality.

Further, when the problem involves violations of assumptions and the aim is to establish bias-corrected estimates and valid confidence intervals for the variance components, a viable alternative to asymptotic estimation methods is the bootstrap [25].

3.5. Outcomes Measured with Error. In specific circumstances, effectiveness analyses may be conducted by using quality of life outcomes (or patient satisfaction), which can constitute the basis for assessing different hospitals in a comparative setting. Quality of life indicators refer to the general condition of health of the patient (physical and mental health, functional state, independence in daily living, etc.) and describe the conditions in which services are distributed.

Although such variables are not directly observable, they can be estimated by analyzing tests administered to patients. Suppose we wish to analyze the data of a given class of $n$ independent subjects. Let $\xi$ denote the latent outcome (or patient satisfaction). The associated Linear Multilevel Model is
$$\xi_{ij} = \gamma_{00} + u_{0j} + \sum_{p=1}^{P} \gamma_{pj} x_{pij} + e_{ij}, \quad (9)$$

where the $e_{ij}$, conditioned on the variables in the linear predictor and on $\xi$, have zero mean and variance $\sigma_e^2$, and the $u_{0j}$, conditioned on the covariates and on $\xi$, are independent normal variables with zero mean and variance $\sigma_0^2$. However, $\xi_{ij}$ is latent, and we only observe a fallible measured version ($Y^o_{ij}$). In accordance with Classical Test Theory, which assumes that the observed scores for $K$ tests measure the same true latent outcome score plus an error term, this defines an explicit measurement model for the latent outcome:

$$Y^o_{ij} = \xi_{ij} + \delta_{ij}, \qquad \delta_{ij} \mid \xi_{ij} \sim N(0, \sigma_i^2), \quad (10)$$

in which the error term $\delta_{ij}$ is normally distributed with zero mean and variance $\sigma_i^2$, which varies across subjects ($i = 1, \ldots, n$) in the same manner across hospitals. For example, $Y^o_{ij}$ can be thought of as the total score obtained by summing the scores of patient $i$ in hospital $j$ over the $K$ administered tests, or as a composite score estimated by using one of the known models for continuous latent variables. From (10) we can decompose the variance (Var) of $Y^o_{ij}$ as the sum of its orthogonal variance components:
$$\mathrm{Var}(Y^o_{ij}) = \mathrm{Var}(\xi_{ij}) + \sigma_\delta^2, \quad (11)$$

where $\sigma_\delta^2 = N^{-1}\sum_i \sigma_i^2$ denotes the average of the individual measurement error variances.

In such circumstances, when the variable measured with error is the response variable of the model, its measurement error is absorbed by the model error and there are no consequences for the estimated parameters, but there are serious consequences for the variance components. In fact, (11) illustrates that, due to measurement error, the variance of the estimated latent variable overestimates the true latent variable variance.

Therefore, since instead of $\xi_{ij}$ we observe an error-contaminated estimate $Y^o_{ij}$, by adding $\delta_{ij}$ to both sides, the model (9) becomes
$$Y^o_{ij} = \xi_{ij} + \delta_{ij} = \gamma_{00} + u_{0j} + \sum_{p=1}^{P} \gamma_{pj} x_{pij} + e_{ij} + \delta_{ij}, \quad (12)$$

in which $\sigma_\delta^2$ (the variance of the measurement errors) enters as an additional random component in the total variance of $Y^o_{ij}$, thus modifying the formulas used to obtain the ICC. For the intercept-only model, $\mathrm{ICC} = \sigma_0^2/(\sigma_0^2 + \sigma_e^2 + \sigma_\delta^2)$, which results in an attenuated version of the true ICC, thus underestimating the variability of the outcome across hospitals. Hence, when the outcome is measured with error, the ICC must be disattenuated ($\mathrm{ICC}^{\S}$) by subtracting the term $\sigma_\delta^2$ from the denominator of the attenuated ICC.

To this end, different approaches can be utilized to estimate $\sigma_\delta^2$ (and thus $\mathrm{ICC}^{\S}$). These concerns can be addressed within the context of Rasch measurement models [36], which provide measures underlying Likert scales with optimal characteristics. The Rasch model directly furnishes individual estimates of $\sigma_i^2$ (the squared standard error of the estimated outcome for person $i$, measured across $K$ items), and averaging them provides an estimate of $\sigma_\delta^2$.

Another possibility deals with factor analysis (FA).

Without loss of generality, let us consider $K$ congeneric tests, allowing different error variances for the $K$ tests and removing the assumption that all tests are based on the same units of measurement. Suppose that the scores of the $K$ items for $n$ subjects are embedded in the vector of $K$ variables $Y^o = (Y^o_1, \ldots, Y^o_K)'$, and let $Y^o = \lambda\xi + \delta$ denote a single-factor analysis model for the $K$ items, where $\lambda = (\lambda_1, \ldots, \lambda_K)'$ and $\delta = (\delta_1, \ldots, \delta_K)'$ indicate the vector of partial regression coefficients of $\xi$ in the regression of $Y^o$ on $\xi$, and the error terms, respectively. In the FA model, $\sigma_\delta^2$ can be estimated once the reliability of the composite, $\rho = 1 - (\sigma_\delta^2/\sigma^2)$, defined as the ratio of true variance to observed score variance $\sigma^2$, is estimated.

Unlike traditional methods for computing composites as total scores, the use of maximally reliable composite scores [24] minimizes the measurement error in the items contributing to each scale, thus increasing the reliability of the computed scale scores. More specifically, let $\xi^* = \lambda^{*\prime}\Sigma^{*-1}Y^o$ denote the factor score estimates for the individuals, where $\Sigma^*$ is the estimated covariance matrix of the observed indicators and $\lambda^*$ the estimated vector of regression coefficients; the reliability of the composite $\xi^*$ is estimated as
$$r = \frac{w^{*\prime}(\Sigma^* - \Theta^*)w^*}{w^{*\prime}\Sigma^* w^*}, \quad (13)$$
where $w^*$ is the estimated vector of factor score regression weights ($w^* = \lambda^{*\prime}\Sigma^{*-1}$) that maximize $r$, and $\Theta^*$ is the diagonal matrix of the estimated error term variances $\delta_k$.
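A short numerical sketch of (13), with hypothetical loadings and error variances for $K = 4$ congeneric items:

```python
import numpy as np

# Illustrative single-factor estimates (hypothetical values).
lam = np.array([0.8, 0.7, 0.9, 0.6])        # loadings lambda*
theta = np.diag([0.36, 0.51, 0.19, 0.64])   # error variances Theta*
Sigma = np.outer(lam, lam) + theta          # implied covariance Sigma*

w = np.linalg.solve(Sigma, lam)             # factor score weights w*

# Reliability of the composite, equation (13).
r = w @ (Sigma - theta) @ w / (w @ Sigma @ w)
print(f"composite reliability r = {r:.3f}")
```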

Finally, measurement error bias becomes more serious when the model involves a covariate measured with error (e.g., when the outcome at baseline is used as a covariate to estimate performances), causing bias in the estimated parameters. This arises because the measurement error of the outcome at baseline is correlated with $e_{ij} + \delta_{ij}$ in (12).

4. Methodological Problems

As described, the proposed analyses of large administrative archives can be used for benchmarking purposes. Notwithstanding the illustrated advantages, these analyses also present specific challenges, due to the following potential areas for bias.

4.1. The Risk of Risk Adjustment. Firstly, risk adjustment can only adjust for factors that can be identified and measured accurately (the case-mix fallacy). Consequently, risk-adjusted benchmarking using administrative data can be hampered by underreporting, that is, the potential endogeneity of the recorded patient-level covariates (outcomes are correlated with the propensity to record information across hospitals), and by the potential for nonconsidered covariates (misspecification). For example, if an important severity measure is missing from the database, assuming that the distribution of this unmeasured covariate varies across hospitals, the variability of adjusted outcomes among hospitals may be overestimated [30].

Furthermore, when using administrative archives for adverse events, claims data is problematic in nature, given the limited number of claims generally emerging from administrative sources (underreporting, or the lack of close calls or near misses/errors that do not result in injury) and the lack of information on the causes of medical errors causing injury to patients (e.g., the processes and systems of care that may be responsible).

Secondly, unmeasured risk factors are not randomly distributed across hospitals, due to the clustering of certain types of patients in certain hospitals' practices. Users can easily draw incorrect conclusions, because the hospitals that appear to have the worst outcomes may simply have the most seriously ill patients. To this end, the practice of routinely disseminating risk-adjusted hospital comparisons has been strongly criticized, since an institution's position in rankings strongly depends on the method of risk adjustment used [37].

Third, since differences in the quality of care within hospitals (e.g., between DRGs and/or Specialties) may be greater than differences between hospitals, there is no clear evidence of a high correlation between how well a hospital performs on one standard of effective care and how well it performs on another. After risk adjustment, the remaining hospital variability (Type B effectiveness) may be imputable to complex factors, typically depending on a reciprocal interaction between patient case-mix (pathologies, clinical severity) and the institutional form of the hospital (profit, not-for-profit/public, private, University hospital, etc.). Therefore, the unexplained hospital variability appears to be physiological and impossible to eliminate completely [8, 22, 37]. In this perspective, it has become imperative to evaluate which benchmarks keep the risk of comparing noncomparable hospitals to a minimum.

To this end, some authors [38] propose to use additional factors, which contribute most to variability in patient experience, as supplementary adjustment variables for patient mix or as stratification variables, in order to present transparent benchmarking analyses.

4.2. Selection Bias. Patient selection bias is a distortion of results due to the way subjects are selected for inclusion in the study population. Patients are not randomly assigned to hospitals. Whereas randomized controlled trials reduce self-selection bias by evenly distributing subjects among treatments/hospitals, observational studies based on administrative databases are nonrandomized, and effectiveness results may be confounded by selection bias due to systematic differences in admission practices between (private/public) hospitals or differences in hospital referral patterns. Such selection biases may result in the preferential admission (or exclusion) of patients with different underlying prognoses, independently of the severity of patients' illness.

Estimates of the effects and outcomes can be biased due to a correlation between factors (such as baseline health status) associated with hospital selection and outcomes (endogeneity). In fact, the effectiveness random parameters $u_{0j}$ are assumed to be independent and uncorrelated with the fixed explanatory variables. When this correlation occurs (e.g., because patients are selected into hospitals), the hypothesis is not valid and the model is not appropriate. Such a correlation can result in erroneous inferences about the magnitude and statistical significance of hospital effects [25]. Assessment of such bias, which limits a suitable relative effectiveness evaluation of hospitals [39], would be extremely difficult and would require information about all possible hospital admissions.

A straightforward remedy to endogeneity due to a possible covariate $x_p$ is to add the hospital mean of $x_p$ to the model equation: this makes the patient-level covariate $x_p$ uncorrelated with the hospital effects, so valid estimates of the Type A effects can be obtained. In this sense, the bias is shifted to the Type B effects by the endogeneity of hospital-level covariates, which typically occurs through the omission of relevant covariates at this level.

Furthermore, to control for selection bias in observational data, different statistical techniques that adjust for observed and unknown differences in the baseline characteristics and prognostic factors of patients across hospitals can be used for evaluating hospital effectiveness. Propensity Score (PS), Instrumental Variable (IV), and Sample Selection Models (SSM) are three techniques developed to minimize this potential bias [39, 40].

PS is the individual probability that a patient will receive a particular treatment (i.e., chooses hospital $j$) and is estimated by a logistic regression that predicts a patient's choice as a function of covariates, including the patient's pretreatment characteristics (sociodemographic, comorbidities, diagnosis, and urgency-related factors). Using PS, the potential bias due to hospital choice is minimized if the choice and the outcome being evaluated are conditionally independent given the measured pretreatment characteristics.

Then, in a second stage, ad hoc models (e.g., an LMM, or multilevel versions of count regression models when data are aggregated) are used to estimate relative effectiveness across hospitals in the outcome equation, adjusting for posttreatment characteristics and the propensity scores. This can be done by adding the PS as an additional continuous covariate or by estimating hospital effectiveness in the outcome equation within propensity score strata, typically quintiles (both variants are sketched below).
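A hedged sketch of this two-stage logic in Python follows, assuming a DataFrame `df` with a binary hospital-type indicator `private`, a continuous `satisfaction` outcome, a `hospital` identifier, and pretreatment covariates; all column names are hypothetical, since the paper does not prescribe an implementation:

```python
# Two-stage propensity-score adjustment (illustrative names).
import pandas as pd
import statsmodels.formula.api as smf

# Stage 1: logistic regression of hospital choice on pretreatment
# characteristics; the fitted probability is the propensity score.
ps_model = smf.logit("private ~ age + female + comorbidity + urgency",
                     data=df).fit()
df["ps"] = ps_model.predict(df)

# Stage 2a: PS as an additional continuous covariate in a mixed model
# with a random hospital intercept.
fit_cov = smf.mixedlm("satisfaction ~ private + ps", df,
                      groups=df["hospital"]).fit()

# Stage 2b: alternatively, estimate effectiveness within PS quintiles.
df["ps_stratum"] = pd.qcut(df["ps"], q=5, labels=False)
fits_by_stratum = {
    s: smf.mixedlm("satisfaction ~ private", g, groups=g["hospital"]).fit()
    for s, g in df.groupby("ps_stratum")
}
```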

Sample Selection Models (SSMs) attempt to control the bias introduced by unobserved variables in hospital selection that are also correlated with the outcome of interest. SSMs, widely used in the econometrics literature, are a special case of Instrumental Variable (IV) models. The concept behind an IV is to identify a variable, the "instrument," that is associated with a subset of the variables that predict hospital choice but is independent of the patient's baseline characteristics. If a good IV is identified, both measured and unmeasured confounders can be accounted for in the analysis.

Typical instruments include severity of illness, the territorial supply of healthcare providers that may or may not offer specific treatments, the distance from each patient's home to the nearest hospital that performs the specific treatment, or the distance to the nearest hospital regardless of whether it provides that treatment [41].

SSMs are two-stage methods. Before the outcome equation (the second-stage model) is estimated, the probability that patient i has chosen hospital j is predicted as an endogenous variable, as a function of observed patient and hospital characteristics, including the instrumental variables. All instrumental variables are then excluded from the second-stage model.

The residual from the first stage is then added as an explanatory variable to the outcome equation: it captures the unobservable nonrandom component and allows one to control for selection bias. IV techniques, in contrast to SSMs, use a single equation to estimate relative effectiveness, without estimating the choice equation, which is replaced by the presence of the instruments in the outcome equation.
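The control-function flavour of this two-stage procedure might look as follows; the instrument `distance` and all other names are hypothetical, and a probit first stage with its generalized residual is only one of several possible choices:

```python
# Two-stage sample-selection (control-function) sketch.
import statsmodels.formula.api as smf

# First stage: probability that the patient chose hospital j, as a
# function of patient characteristics plus the excluded instrument.
choice = smf.probit("chose_hospital_j ~ age + severity + distance",
                    data=df).fit()
# The generalized residual captures the nonrandom unobserved component.
df["choice_resid"] = choice.resid_generalized

# Second stage: outcome equation with the first-stage residual added
# and the instrument excluded, controlling for selection bias.
outcome = smf.ols("satisfaction ~ age + severity + choice_resid",
                  data=df).fit()
```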

5. Application

To clarify the potential of the presented methods, this section focuses on hospital effectiveness concerning patient satisfaction. In Lombardy, the monitoring of patient satisfaction, mandatory for hospitals, is performed using the Official Customer Satisfaction (OCS) questionnaire of the Lombardy region. It contains 12 items regarding acceptance, healthcare performance, satisfaction with physicians and nurses, accommodation, and discharge, plus two items asking for an overall judgement of satisfaction. Each item is scored on a seven-point Likert scale ranging from 1 to 7; scores of 5 and over indicate increasing levels of satisfaction, whereas scores of 3 and below indicate dissatisfaction.

Available data, provided by the regional Directorate of Healthcare, refer to all Lombard hospitals, which, between April and November 2009, delivered the OCS questionnaire to a random sample of discharged patients proportional to their annual number of discharges in 2009.

For the analysis, we select only patients with planned admissions to general hospitals (excluding urgent admissions and specialist hospitals) in order to minimize the risk of patient selection for the analysed hospitals. Globally, the sample is composed of 46,096 patients nested in 64 hospitals (an average of 720 patients per hospital). Exploring the patient covariates embedded in the OCS, patients differ by gender (46% female), age class (7% < 24 years, 37% in the age class 25–54, and 55% > 54 years), schooling level (5% primary school, 50% middle school, 36% high school, 9% university degree), and nationality (94% Italian).

Table 1: Item analysis: missing values, percentage of patients satisfied, and item-component correlations (n = 46,096).

| Item description | Missing values | % Satisfied (scores 6+7) | Y1 ClinSAT | Y2 GenSAT | Y3 WaitLists |
|---|---|---|---|---|---|
| Nurses’ courtesy, attention, availability | 370 | 88.5% | 0.70 | 0.04 | 0.09 |
| Doctors’ courtesy, attention, availability | 606 | 89.4% | 0.83 | 0.19 | 0.08 |
| Satisfaction with the care provided | 1309 | 89.3% | 0.81 | 0.06 | 0.07 |
| Health status (and discharge) information | 609 | 85.1% | 0.79 | 0.06 | 0.05 |
| Privacy and consent information | 635 | 88.6% | 0.72 | 0.11 | −0.05 |
| Comfort, bed, food, cleanliness | 2150 | 83.6% | 0.12 | 0.78 | 0.11 |
| Organisation of the process of care | 627 | 81.6% | 0.11 | 0.78 | 0.02 |
| Recommend hospital (friends or relatives) | 1346 | 85.2% | 0.12 | 0.73 | −0.03 |
| Overall satisfaction | 704 | 85.3% | 0.05 | 0.72 | 0.01 |
| Waiting time to be admitted to the hospital | 1417 | 75.7% | 0.01 | −0.02 | 0.99 |

Available hospital structural characteristics involve sector (private/public), typology (university or not), size (three bed-size categories), and whether the hospital has an emergency unit. Hospital process measures (all measured in 2009 and obtained from Hospital Discharge Cards) involve the number of specialties in the hospital (N Specialties), the percentage of beds utilized (% Beds), the number of operating rooms utilized (N OpRoom), the total number of operating room hours utilized (Hours OpRoom), the average monthly hours per operating room (Ave MH OpRoom), and the case mix of discharged patients during 2009.

The case mix is measured as the percentage of (surgical and medical) discharges having a DRG weight above (High case mix) or below (Low case mix) the regional median DRG weight. In the analyzed sample, 52% are public hospitals, 85% have an emergency unit, 8% are university hospitals, and 36% have more than 250 beds (5% have fewer than 50 beds).

Analyzing the item scores (Table 1) with confirmatory factor analysis, we found three orthogonal (Varimax rotation) composites: the first deals with satisfaction with clinical aspects (Y1: ClinSAT), the second with general and accommodation aspects of satisfaction (Y2: GenSAT), and the third coincides with the single item dealing with satisfaction with the waiting time to be admitted to hospital (Y3: WaitLists). For the first two, the coefficient alphas (αY1 = 0.92; αY2 = 0.90) and composite reliabilities (rY1 = 0.89; rY2 = 0.84) indicate acceptable internal consistency and reliability for the estimated composites.
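For reference, coefficient alpha can be computed directly from the item scores; the following is a minimal sketch, assuming `items` is a DataFrame whose columns are the Likert items loading on one composite (illustrative, not the authors' actual pipeline):

```python
# Cronbach's alpha for a block of items (illustrative).
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    items = items.dropna()
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# e.g., alpha for the clinical items of Y1 (hypothetical column names):
# alpha_y1 = cronbach_alpha(df[["nurses", "doctors", "care", "info", "privacy"]])
```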

Despite many patients being very satisfied in many domains (column 3 of Table 1), a multilevel analysis is performed to assess whether there are meaningful differences between hospitals in evaluations of patient satisfaction and whether these differences remain after controlling for patient and hospital characteristics (hospital effectiveness). Specifically, we specify a linear multilevel model for the composites Y1 and Y2, whereas a logistic multilevel model is used for predicting the probability of being dissatisfied with waiting time, using as dependent outcome Y3d (WaitDISSAT), a dichotomous variable equal to 1 when the score on the waiting-time item is 3 or below and equal to 0 otherwise.
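A minimal sketch of the two specifications follows, assuming a DataFrame `df` with illustrative columns `y1`, `y3d`, `hospital`, and hypothetical covariates; statsmodels' MixedLM fits the linear case, and a variational Bayes mixed GLM is one readily available way to approximate the logistic multilevel model:

```python
# Linear and logistic multilevel models with a random hospital
# intercept (illustrative names; not the authors' exact code).
import statsmodels.formula.api as smf
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Linear multilevel model for a continuous composite (Y1 or Y2).
lmm_y1 = smf.mixedlm("y1 ~ n_specialties + n_oprooms", df,
                     groups=df["hospital"]).fit()

# Logistic multilevel model for dissatisfaction with waiting time (Y3d).
glmm_y3d = BinomialBayesMixedGLM.from_formula(
    "y3d ~ private + n_specialties",
    {"hospital": "0 + C(hospital)"},   # random intercept per hospital
    df,
).fit_vb()
```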

Table 2: ICC and significant hospital characteristics.

|  | Y1 ClinSAT | Y2 GenSAT | Y3d WaitDISSAT |
|---|---|---|---|
| ICC | 13.0%§ | 14.8%§ | 12.2% |
| Residual ICC | 2.7%§ | 9.5%§ | 1.2%# |
| Hospital characteristics | (model coefficients and significance) | | |
| Private Hosp | n.s. | 2.068∗∗ | 0.0420∗∗∗ |
| University Hosp | 1.729∗∗ | n.s. | n.s. |
| % Beds | −0.020∗ | −0.056∗∗ | n.s. |
| N Specialties | −0.079∗∗∗ | −0.281∗∗∗ | −0.0040∗∗∗ |
| N OpRoom | 0.072∗∗∗ | −0.102∗ | n.s. |
| % High medical case mix | 3.515∗ | n.s. | n.s. |
| Hours OpRoom | n.s. | 0.001∗∗∗ | n.s. |
| Ave MH OpRoom | n.s. | −0.058∗∗∗ | −0.0004∗ |

§ corrected for measurement error; # rescaled with the scale correction factor. ∗∗∗P-value < 0.01, ∗∗P-value < 0.05, ∗P-value < 0.10; n.s. = not significant.

The upper part of Table 2 exhibits, for Y1 and Y2, the corrected (disattenuated) ICCs in the intercept-only model and the residual ICCs (the remaining proportion of variability due to hospital differences once the covariates are inserted in the models). For Y3d, the residual ICC is rescaled with the scale correction factor in order to be comparable with the ICC of the intercept-only model.

The three patient outcomes appear to be strongly influenced by the hospital to which patients are admitted; for the continuous outcomes, the disattenuated ICCs (higher than the attenuated versions, which equal 8.2% and 10.4% for Y1 and Y2, resp.) show that a high proportion of the differences in the outcomes is attributable to differences between hospitals. This especially occurs for Y2, meaning that almost 15% of the variance in overall satisfaction (14.8%) lies across hospitals.
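For reference, in the intercept-only random-intercept model the (attenuated) ICC is simply the between-hospital share of the total variance; a minimal sketch in LaTeX, where the disattenuated version additionally corrects the patient-level variance for measurement error in the composites, as flagged by the § footnote of Table 2:

```latex
% Intraclass correlation in the intercept-only model, with
% \sigma^2_{u_0} the between-hospital (random-intercept) variance and
% \sigma^2_{e} the patient-level residual variance.
\mathrm{ICC} = \frac{\sigma^2_{u_0}}{\sigma^2_{u_0} + \sigma^2_{e}}
```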

To explain these differences, the available covariates are used. The lower part of Table 2 exhibits the covariates that are significant for at least one outcome. Notably, individual patient characteristics and the remaining hospital characteristics (such as the surgical case mix, hospital size, and the presence of an emergency unit) are found to be not significant (at the 0.05 significance level).


This highlights that the three patient satisfaction dimensions are not affected by patient characteristics and do not vary significantly with the remaining available hospital characteristics.

In contrast, for clinical satisfaction (Y1), most of the variation is associated with differences between hospitals in the number of specialties (inversely linked with Y1) and in the number of operating rooms (positively linked with Y1), with higher levels of Y1 for university hospitals, indicating that clinical satisfaction is higher in specialized university hospitals.

Overall satisfaction (Y2) is higher for private hospitals with high volumes of operating room hours and decreases for hospitals with several specialties and high utilization rates of operating rooms. Observing Y3d, it is of note that the significant covariates for predicting overall satisfaction (Y2) act in exactly the same direction when predicting dissatisfaction with waiting time: dissatisfaction is higher for private hospitals and lower for hospitals with many specialties and high average monthly utilization of operating rooms.

After controlling for hospital characteristics, the residual ICCs become very small, except for Y2, which decreases only to 9.5% from 14.8%. Globally, the significant hospital covariates explain 81%, 34%, and 90% of the outcome variability among hospitals for Y1, Y2, and Y3d, respectively.
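These percentages can be read as proportional reductions in between-hospital variance. A hedged reconstruction of the arithmetic from the ICCs in Table 2 follows; the small discrepancies with the reported 81% and 34% are presumably due to the measurement-error and scale corrections:

```latex
% Proportional reduction in between-hospital variance once the
% significant covariates are added:
\frac{\mathrm{ICC}-\mathrm{ICC}_{\mathrm{res}}}{\mathrm{ICC}}:\quad
Y_1:\ \frac{13.0-2.7}{13.0}\approx 0.79,\qquad
Y_2:\ \frac{14.8-9.5}{14.8}\approx 0.36,\qquad
Y_{3d}:\ \frac{12.2-1.2}{12.2}\approx 0.90.
```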

The remaining hospital differences (residuals) are interpreted as effects of management practices (Type B effectiveness) in increasing patient satisfaction in the three domains.

Before investigating the obtained rankings, we explore possible covariate endogeneity by means of three generalized linear models which specify, for each outcome, the hospital residuals (u0j) as the dependent variable. In these models, the effects of the hospital covariates are found to be not significant (at the 0.01 significance level).

The global F-tests, referring to the hypothesis that all covariate coefficients are equal to zero versus the alternative that at least one is not, are largely not significant (FY1 = 0.41, P-value = 0.954; FY2 = 0.49, P-value = 0.913; FY3d = 0.51, P-value = 0.987), meaning that no serious endogeneity is found and valid effectiveness parameters are obtained.
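A minimal sketch of such a check, assuming a hospital-level DataFrame `hosp_df` with one row per hospital carrying the estimated residual `u0j` and the hospital covariates (the paper does not specify its exact implementation):

```python
# Endogeneity check: regress estimated hospital residuals on the
# hospital-level covariates and inspect the global F-test.
import statsmodels.formula.api as smf

check = smf.ols("u0j ~ private + university + pct_beds + n_specialties",
                data=hosp_df).fit()
print(check.fvalue, check.f_pvalue)  # global F statistic and its P-value
```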

As a last step of the analysis, we check the concordance of the three hospital rankings based on the estimated u0j (Type B effectiveness). Spearman correlations (r) exhibit weak agreement between the estimated rankings for all outcomes, suggesting three fairly independent dimensions. Specifically, the ranking based on overall satisfaction is significantly and positively correlated with the ranking based on clinical satisfaction (r = 0.375, P-value = 0.002) and with that based on satisfaction with waiting time (r = 0.304, P-value = 0.014), although with modest strength. The correlation between the rankings of clinical and waiting-time satisfaction is positive but at the limit of statistical significance (r = 0.252, P-value = 0.045).
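For completeness, the rank-concordance computation is one call per pair of rankings; a minimal sketch, assuming three aligned arrays of estimated hospital effects (names illustrative):

```python
# Spearman rank correlations between the three hospital rankings.
from scipy.stats import spearmanr

rho_21, p_21 = spearmanr(u0j_gensat, u0j_clinsat)      # Y2 vs Y1
rho_23, p_23 = spearmanr(u0j_gensat, u0j_waitdissat)   # Y2 vs Y3d
rho_13, p_13 = spearmanr(u0j_clinsat, u0j_waitdissat)  # Y1 vs Y3d
```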

6. Conclusion

Using clinical outcomes for quality assessment represents an important approach to documenting the quality of care. Consumers of indicator information (stakeholders, clinicians, and patients) need reliable and valid information for benchmarking, making judgments, and determining priorities, accountability, and quality improvement.

Where health services have effects on outcomes, the use of outcome measures as performance indicators is appropriate, and efforts should be taken to ensure that the benchmarking strategies can be interpreted reliably. However, the conclusion that differences in outcome are due to differences in quality of care will always be tentative and open to the possibility that the apparent association between a given unit and poor outcomes is due to the confounding effect of some other factor that has not been measured, has been measured inadequately, or has been misspecified.

As the empirical application has shown, estimated hospital rankings must be interpreted with scrupulous care. Despite such limitations, clinical administrative data are broadly considered a useful screening tool for identifying quality-related problems and for targeting areas that potentially require in-depth investigation. The simultaneous monitoring of several outcomes that indicate malpractice appears to offer a useful strategy for helping hospitals and stakeholders to detect trends and identify extreme outliers.

Once a benchmark for each performance measure is determined, analyzing the resulting data becomes more meaningful.

However, moving from the evaluation step towards the phase of drawing statistical implications mainly depends on the way in which the monitored (e.g., adverse) events are distributed among hospitals. If a large proportion of adverse events is concentrated among relatively few hospitals, the traditional quality control approach, targeting error-prone, ineffective health structures for specific attention, has high potential value. When variation is discovered through continuous monitoring, or when unexpected events suggest performance problems, members of the organization may decide that there is an opportunity for improvement.

The opportunity may involve a process or an outcome that could be changed to better meet customer feedback, needs, or expectations.

In contrast, when ineffective hospitals are more diffusely distributed, targeting specific hospitals may be a less efficient strategy than investigating clinical processes in the framework of continuous quality improvement, with an emphasis on careful examination, rigorous scientific testing methods, statistical analysis, and the transparent adjustment of clinical processes.

To this end, exhaustive and exclusive measure specifications should be described, including specific definitions of the clinical indicators and standards and identification of the target population and data sources.

Steps can be taken to minimize the possibility of a false conclusion being drawn on the quality of care based on outcome measurement.

Standardising how data are collected can reduce the extent to which differences in measurement cause observed variation. Including sufficient numbers of patients will reduce the possibility of random variation masking real differences or making spurious differences appear. Development of sophisticated case-mix adjustment systems can reduce the possibility that observed differences are due to differences in the types of patients, as can the development of an analytical plan describing the statistical and clinical significance of the results to be assessed when comparing groups or comparing a group to a standard.

As part of the development process, indicator measurement can be made more efficient when incorporated into routine patient care, as part of clinicians' and administrators' documentation of required information on patient characteristics and care delivery that is already being recorded for clinical purposes (medical record data). This would eliminate duplicative clinical data collection for the purposes of clinical care and quality assessment.

In conclusion, another important topic that affects the evaluation of the quality of care in a benchmarking perspective is the institutional condition of the healthcare system and its modifications over time.

For example, from 2002 onwards the English National Health Service (NHS) developed a new era of hospital markets (under New Labour). Under this model, competition arises from patient choice, from the selective contracting of purchasers (primary care trusts) with providers, and from competition between different providers (NHS trusts, private providers, independent sector treatment centres, and NHS foundation trusts).

In Italy, since 2001, the healthcare system has moved in the direction of a welfare-mix system, characterized by freedom of choice for the consumer and by the joint presence of state agents (operating with functional and financial autonomy) and of private for-profit or nonprofit accredited companies endowed with autonomous decision-making and managerial procedures.

Hence, the specific question is to evaluate the relation between hospital competition and hospital quality. To this end, some recent econometric studies focusing on the NHS find causal effects of hospital competition on care quality. Specifically, they show that competition improves clinical quality (as measured by reductions in hospital mortality rates after myocardial infarction) while also reducing waiting times [42, 43].

In this perspective, other open questions remain crucial: does the available evidence-based research support institutional proposals to extend competition? How does competition compare with other policies for increasing hospital quality? More applied research is required on these topics.

Overall, the present paper suggests a launching board for discussions with experts in the fields of administrative data, risk adjustment, and performance measurement reporting. Clinicians and researchers should actively participate in designing future administrative databases to ensure that they are clinically meaningful and useful for quality measurement, offering regional stakeholders the opportunity to gain a deeper understanding of the problematic areas in clinical risk assessment.

References

[1] J. Øvretveit and D. Gustafson, "Improving the quality of healthcare: using research to inform quality programmes," British Medical Journal, vol. 326, no. 7392, pp. 759–761, 2003.

[2] JCAHO Joint Commission on Accreditation of Healthcare Organizations, A Guide to Establishing Programs for Assessing Outcomes in Clinical Settings, Oakbrook Terrace, Ill, USA, 1994.

[3] CIHI Canadian Institute for Health Information, Executive Summary: Data Quality Documentation, Discharge Abstract Database 2005-2006, CIHI Press, Ottawa, Canada, 2006.

[4] AHRQ Agency for Healthcare Research and Quality, "Guidance for using the AHRQ Quality Indicators for hospital-level public reporting or payment," 2006, http://www.qualityindicators.ahrq.gov/.

[5] S. F. Jencks, T. Cuerdon, D. R. Burwen et al., "Quality of medical care delivered to Medicare beneficiaries: a profile at state and national levels," Journal of the American Medical Association, vol. 284, no. 13, pp. 1670–1676, 2000.

[6] A. Donabedian, "The quality of care. How can it be assessed?" Journal of the American Medical Association, vol. 260, no. 12, pp. 1743–1748, 1988.

[7] L. J. Opit, The Measurement of Health Service Outcomes, Oxford Textbook of Health Care, 10, OLJ, London, UK, 1993.

[8] H. Goldstein and D. J. Spiegelhalter, "League tables and their limitations: statistical issues in comparisons of institutional performance," Journal of the Royal Statistical Society, Series A, vol. 159, no. 3, pp. 385–443, 1996.

[9] L. M. Koran, "The reliability of clinical methods, data and judgments. Part II," New England Journal of Medicine, vol. 293, no. 14, pp. 695–701, 1975.

[10] K. Lohr, Medicare: A Strategy for Quality Assurance, National Academy Press, Washington, DC, USA, 1990.

[11] R. G. Gift and D. Mosel, Benchmarking in Health Care, American Hospital Publishing, Chicago, Ill, USA, 1994.

[12] L. I. Iezzoni, A. S. Ash, M. Shwartz, J. Daley, J. S. Hughes, and Y. D. Mackiernan, "Judging hospitals by severity-adjusted mortality rates: the influence of the severity-adjustment method," American Journal of Public Health, vol. 86, no. 10, pp. 1379–1387, 1996.

[13] W. R. Best and D. C. Cowper, "The ratio of observed-to-expected mortality as a quality of care indicator in non-surgical VA patients," Medical Care, vol. 32, no. 4, pp. 390–400, 1994.

[14] C. Vincent, P. Aylin, B. D. Franklin et al., "Is health care getting safer?" British Medical Journal, vol. 337, no. 7680, pp. 1205–1207, 2008.

[15] L. I. Iezzoni, "The risks of risk adjustment," Journal of the American Medical Association, vol. 278, no. 19, pp. 1600–1607, 1997.

[16] L. I. Iezzoni, "Assessing quality using administrative data," Annals of Internal Medicine, vol. 127, no. 8, pp. 666–673, 1997.

[17] A. Epstein, "Performance reports on quality—prototypes, problems, and prospects," New England Journal of Medicine, vol. 333, no. 1, pp. 57–61, 1995.

[18] NHS, National Health Service, "Commission for Health Improvement. A commentary on Star Ratings 2002-2003," 2004, http://www.chi.nhs.uk./ratings/.

[19] "IQIP International Quality Indicator Project," 2004, http://www.internationalqip.com/.

[20] R. W. Dubois, R. H. Brook, and W. H. Rogers, "Adjusted hospital death rates: a potential screen for quality of medical care," American Journal of Public Health, vol. 77, no. 9, pp. 1162–1167, 1987.

[21] P. M. Rothwell and C. P. Warlow, "Interpretation of operative risks of individual surgeons," Lancet, vol. 353, no. 9161, p. 1325, 1999.


[22] C. Damberg, E. A. Kerr, and E. A. McGlynn, "Description of data sources and related issues," in Health Information Systems: Design Issues and Analytical Application, E. A. McGlynn, C. Damberg, E. A. Kerr, and R. A. Brook, Eds., pp. 43–76, RAND Health, Santa Monica, Calif, USA, 1998.

[23] S. W. Raudenbush and J. D. Willms, "The estimation of school effects," Journal of Educational and Behavioral Statistics, vol. 20, pp. 307–335, 1995.

[24] H. Goldstein, Multilevel Statistical Models, Edward Arnold, London, UK, 1995.

[25] T. A. B. Snijders and R. J. Bosker, Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modelling, Sage, London, UK, 1999.

[26] H. Goldstein and A. H. Leyland, Multilevel Modelling of Health Statistics, Wiley, New York, NY, USA, 2001.

[27] N. Rice and A. Leyland, "Multilevel models: applications to health data," Journal of Health Services Research & Policy, vol. 1, no. 3, pp. 154–164, 1996.

[28] N. P. Wray, J. C. Hollingsworth, N. J. Petersen, and C. M. Ashton, "Case-mix adjustment using administrative databases: a paradigm to guide future research," Medical Care Research and Review, vol. 54, no. 3, pp. 326–356, 1997.

[29] A. Elixhauser, C. Steiner, D. R. Harris, and R. M. Coffey, "Comorbidity measures for use with administrative data," Medical Care, vol. 36, no. 1, pp. 8–27, 1998.

[30] L. I. Iezzoni, Risk Adjustment for Measuring Health Care Outcomes, Health Administration Press, Ann Arbor, Mich, USA, 1994.

[31] T. Lagu, P. K. Lindenauer, M. B. Rothberg et al., "Development and validation of a model that uses enhanced administrative data to predict mortality in patients with sepsis," Critical Care Medicine, vol. 39, no. 11, pp. 2425–2430, 2011.

[32] R. McKelvey and W. Zavoina, "A statistical model for the analysis of ordinal level dependent variables," Journal of Mathematical Sociology, vol. 4, pp. 103–120, 1975.

[33] R. J. Glynn and J. E. Buring, "Ways of measuring rates of recurrent events," British Medical Journal, vol. 312, no. 7027, pp. 364–367, 1996.

[34] D. Lambert, "Zero-inflated Poisson regression, with an application to defects in manufacturing," Technometrics, vol. 34, no. 1, pp. 1–14, 1992.

[35] H. White, "Maximum likelihood estimation of misspecified models," Econometrica, vol. 50, pp. 1–25, 1982.

[36] B. D. Wright and M. Mok, "Rasch models overview," Journal of Applied Measurement, vol. 1, no. 1, pp. 83–106, 2000.

[37] R. Lilford, M. A. Mohammed, D. Spiegelhalter, and R. Thomson, "Use and misuse of process and outcome data in managing performance of acute medical care: avoiding institutional stigma," Lancet, vol. 363, no. 9415, pp. 1147–1154, 2004.

[38] B. M. Holzer and C. E. Minder, "A simple approach to fairer hospital benchmarking using patient experience data," International Journal for Quality in Health Care, vol. 23, no. 5, pp. 524–530, 2011.

[39] N. Zohoori and D. A. Savitz, "Econometric approaches to epidemiologic data: relating endogeneity and unobserved heterogeneity to confounding," Annals of Epidemiology, vol. 7, no. 4, pp. 251–257, 1997.

[40] P. Rosenbaum and D. Rubin, "Reducing bias in observational studies using subclassification on the propensity score," Journal of the American Statistical Association, vol. 79, pp. 516–524, 1984.

[41] H. S. Luft, D. W. Garnick, D. H. Mark et al., "Does quality influence choice of hospital?" Journal of the American Medical Association, vol. 263, no. 21, pp. 2899–2906, 1990.

[42] Z. Cooper, S. Gibbons, S. Jones, and A. McGuire, "Does hospital competition save lives? Evidence from the English NHS patient choice reforms," Economic Journal, vol. 121, pp. F228–F260, 2011.

[43] G. Bevan and M. Skellern, "Does competition between hospitals improve clinical quality? A review of evidence from two eras of competition in the English NHS," British Medical Journal, vol. 343, no. 7830, article d6470, 2011.
