
Benchmarking for improvement

Edited by Clive Grace

Contributors: Arnold F. Shober, Wendy Thomson, Alan Fenna, Elaine Yi Lu, Gwyn Bevan, Deborah Wilson, Steve Martin, James Downe, Sandra Nutley, Mark McAteer, David Martin, Michael Coughlin, Juliet Whitworth, Andrew Stephens, Sabine Kuhlmann, Tim Jäkel, Gerhard Hammerschmid, Steven Van de Walle, Vid Štimac, Tony Cutler, Toby James, Nicholas Prychodko, Michal Dziong, Barry Quirk

solace.org.uk

November 2013


Editor's note

This SFI pamphlet provides a Policy Briefing on the critical and ubiquitous role being performed by benchmarking in public services, both in the UK and internationally. It complements and partly draws on a special issue of Public Money and Management edited by me and Alan Fenna which also addresses these issues, and which includes some overlapping material treated in greater depth, and with comprehensive references (see Public services benchmarking and external performance assessment: An international perspective. Guest editors: Clive Grace and Alan Fenna (Vol. 33, No. 4, 2013) at http://www.tandfonline.com/r/pmm-benchmarking). The pieces by Bevan and Wilson, Coughlin, Downe et al, Fenna, Hammerschmid et al, Kuhlmann and Jäkel, Lu, McAteer and Martin, Shober, and Stephens will all be found there in one form or another. I am very grateful to the Editors of PMM for their support in preparing this publication, and especially their Managing Editor Micky Lavender. I also thank my colleagues at the Solace Foundation in particular for sharing their public platform with the Guardian in order to give these issues the widest possible airing, and David Gooda at Northern Design Collective for helping us present it so professionally.

Finally, I would emphasise our continuing appreciation to both the ESRC and the Forum of Federations for their support, which is explained in more detail in the Foreword.

Clive Grace
November 2013


Contents

Foreword – Clive Grace, James Downe, Alan Fenna, Felix Knüpling, Steve Martin, Sandra Nutley

Benchmarking and the Improvement End of the Telescope – Clive Grace, James Downe, Alan Fenna, Felix Knüpling, Steve Martin, Sandra Nutley

Benchmarking Inequality: Measuring Education Progress in American Education – Arnold F. Shober

Choosing to get better? A Canadian perspective on sector-led improvement in local children's services – Wendy Thomson

Benchmarking in a federal context – Alan Fenna

Unlocking the Black Box: Performance Evaluation Practices in China – Elaine Yi Lu

Does naming and shaming work? The impact of transparent public ranking on hospital and school performance – Gwyn Bevan and Deborah Wilson

Natural Laboratory: Learning from a comparison of Performance Regimes in the UK – Steve Martin, James Downe, Clive Grace and Sandra Nutley

Benchmarking and Service Improvement in Scottish Local Government – Mark McAteer and David Martin

Benchmarking Data for Improvement: Local Government and LG Inform – Michael Coughlin and Juliet Whitworth

Performance management and benchmarking: The Wales experience – Andrew Stephens

Why does performance benchmarking vary? Evidence from European local government – Sabine Kuhlmann and Tim Jäkel

What determines whether top public sector executives actually use performance information? – Gerhard Hammerschmid, Steven Van de Walle, and Vid Štimac

Persuasion and Evidence: an historical case study of public sector benchmarking and some theoretical reflections – Tony Cutler

Benchmarking Standards of UK Elections – Toby James

Listening to the Voice of Municipal Citizens: A Canadian Perspective – Nicholas Prychodko and Michal Dziong

Performance Management: a part of the answer – Barry Quirk



Foreword

Clive Grace, James Downe, Alan Fenna, Felix Knüpling, Steve Martin, Sandra Nutley

A Confluence of Interest

This SFI pamphlet is grounded in two major streams of work initially conducted by separate teams of researchers and policy analysts in the UK and in Canada. In the UK, a team based at Cardiff and Edinburgh (and later St Andrews) Universities in various combinations explored performance issues across UK local government through a series of studies funded by a range of government and research bodies. A particular focus for them became the variety and comparison of local/central regimes for assessing the performance of local government and of local services in the emerging natural laboratory of post-devolution UK. Meanwhile in Canada, the Forum of Federations developed the performance benchmarking of services between federal and state/provincial levels as a major theme of its work across the world, reflecting the growth of that activity in many federal jurisdictions. Its work covers Australia, Canada, the European Union, Germany, Switzerland and the United States.

The two came together in 2011 in a joint project funded partly by the Forum and by the ESRC (ESRC Knowledge Exchange Programme award number ES/J010707/1) with the aim to improve local public services through a series of linked conferences, seminars and workshops to enable two-way dialogue and collaboration to help improve the assessment of public services so they are more affordable, better meet community needs, and respond to underlying change. We also aimed to inform the design of subsequent research through identifying gaps in the knowledge base and possible avenues for future innovation and learning. The UK team was Dr James Downe (Principal Investigator), Dr Clive Grace, Professor Steve Martin, and Professor Sandra Nutley, and the Canadian team was Felix Knüpling and Professor Alan Fenna. Part of the prospectus for the project was to prepare a Policy Briefing to provide a review of international performance assessment written specifically for a policy and practitioner audience, and distributed widely via the web, professional associations and other media. It would aim to draw on existing research knowledge and bring together lessons from the conferences and seminars to highlight best practice from across the world. This pamphlet gives effect to that intention, together with a special issue of Public Money and Management edited by Grace and Fenna (http://www.tandfonline.com/r/pmm-benchmarking) which also addresses these issues, and which includes some overlapping material treated in greater depth.



The collection of pieces here starts with a flavour of the international range and variety. Shober documents the minuet which has taken place over many years between the US federal government and the states in relation to education performance, in a jurisdiction which few of us might immediately associate with benchmarking, at least in the public sector. Thomson then documents developments in the related field of child welfare, but in Canada and in relation to an even more complex set of issues and a much more varied set of delivery bodies. Fenna demonstrates the global character of benchmarking through his account of the Australian Report on Government Services. Lu then shows that these methodologies are also finding traction in China, for China is not, as she puts it, "immune to the global movement of performance evaluation". She explores the who, what and how of the subject in Guangdong province, after what is almost a decade of activity there in this field. This plants the thought that the scope for the application of performance assessment of public services in China as a whole is significant.

Bevan and Wilson provide the bridge to a group of pieces which assess developments in the UK and re-assess the UK's situation. They review both health and education performance in England and Wales to take account of that natural laboratory of public services differences. They conclude that exposing professionals to reputational risk has a significant impact on performance. This is not a ringing endorsement of terror and targets, but it might well encourage central policy-makers to be tougher about publication and transparency in relation to benchmarks and performance measures. Meanwhile Downe et al. review how performance regimes have developed across the UK. They find interesting variations and change in the positions being taken in England, Scotland and Wales, and those changes are not at all in one direction; there is no obvious maturity model at work here. Rather, the political and administrative context is perhaps what most explains the direction of travel, including directions of travel between England and Wales, for example, which look to be crossing each other in opposite paths to those they have previously taken. To underpin these broader analyses, Coughlin, McAteer and Martin, and Stephens describe recent developments in all three UK jurisdictions around what might be thought of as the nuts and bolts of benchmarking and performance management.

The message here is the medium as much as the content: all three local government associations are taking a stronger and more positive approach to the importance of collecting, validating and publishing benchmark and performance data. This is part of taking a wider and more mature role in sector-led improvement, because sector-led improvement requires local authorities (both individually and collectively) to take responsibility for improvement as well as merely to have it.

The significance of political and administrative context is apparent also in the piece by Kuhlmann and Jäkel, the first of two with a European comparative perspective. They stand back and review inter-municipal benchmarking regimes across four very different European jurisdictions, and find it possible to make the connections between regime and context. The way performance is compared and benchmarked among local governments varies widely in the OECD world, but the governance structures of inter-municipal benchmarking regimes currently to be found in European countries are largely shaped and influenced by the starting conditions of reforms. We are all prisoners of context now, it seems, because Hammerschmid et al. also find it to be of importance in explaining the actual use of performance information by a large cadre of high-level public sector executives from six European countries. The use of such information varied considerably, but that variation was seen to be strongly influenced by the context of the implementation of performance management instruments in an organization.

Three interesting niche areas of benchmarking are then explored by Cutler (the history of benchmarking school buildings in the UK), James (benchmarking and elections) and Prychodko and Dziong (customer focus in Canadian municipalities), before Quirk winds it all up through a practitioner's reflections on where benchmarking and performance assessment sit within the wider lexicon of improvement action.

    Next Steps

Readers of this pamphlet are encouraged to join the group set up on the LGA's Knowledge Hub to build a network of policy-makers, academics and practitioners with an interest in performance assessment and benchmarking. The group will facilitate further knowledge exchange and research opportunities. It contains all materials from the UK and international events (www.knowledgehub.local.gov.uk/register, then see Benchmarking and external performance assessment. See also www.forumfed.org).

Clive Grace is Honorary Research Fellow at Cardiff Business School, James Downe is Reader of Public Policy and Management at the Centre for Local & Regional Government Research, Cardiff Business School, Alan Fenna is Professor of Government at Curtin University, Perth, Australia, Felix Knüpling is Head of Programs, Forum of Federations, Steve Martin is Professor of Public Policy and Management at Cardiff Business School, and Sandra Nutley is Professor of Public Policy and Management at the University of St. Andrews.


Benchmarking and the Improvement End of the Telescope

Clive Grace, James Downe, Alan Fenna, Felix Knüpling, Steve Martin, Sandra Nutley

Introduction

Benchmarking of public services matters because it is critical for governments and communities who need to know whether services are effective and efficient, who is accountable for service delivery, and whether the outcomes of service delivery are in the interests of the citizenry. It is an important framework for policy decision-making as well as for improving delivery.

Narrowly defined, benchmarking involves the comparative measurement of performance, but we can understand it more broadly to mean the use of comparative performance measurement as a tool for identifying and adopting more efficient or effective practices. For us, it is more than an assessment device; it is also a learning and adjustment tool. Seen in this light, benchmarking is so ubiquitous within public services management and measurement that it is not so much a technique as a way of thinking: a disposition toward comparative assessment, learning, and action. Thus in the context of this pamphlet it refers to the comparison of some aspect of a public service against a standard, against the services of others, or against one's own services over time, coupled with an intention to learn and improve. Yet that simplicity masks a world of interesting and often difficult questions.

Benchmarking exercises have been widely adopted in devolved and federal systems. All devolved countries face the issue of balancing the interests of the national or federal government in key areas of public policy with the desire of subnational units or local government to have autonomy, or at least flexibility, in terms of how they manage programs. In many devolved countries there has been a trend towards a flexible kind of relationship between orders of government in areas of joint interest. Conditions imposed by the central/federal level are becoming less restrictive. As such controls are loosened, many devolved countries are showing a strong interest in benchmarking in order to determine good or best practices. This development can be observed in developed and developing countries alike.


It may be the chameleon character of benchmarking that underpins its popularity as an approach to performance management and measurement. In any event, that popularity is increasing internationally, and we should understand better why that is so and what its consequences are, as well as how it might be done better.

    Origins

Like other aspects of new public management, benchmarking is a practice that has spread from the private to the public sector with the promise that it will drive improvements in service delivery. Both external (voluntary with other companies) and internal (imposed by top management on company units) versions of benchmarking can be found in the private sector, often referred to as, respectively, bottom-up and top-down benchmarking. However, it is the top-down version that tends to predominate in public services.

The lack of intrinsic financial incentive and externally validated profit measures in the public sector is in some ways precisely the reason for introducing benchmarking there, just as it has been for internally imposed benchmarking in major private companies. Performance monitoring and the imposition of benchmarking requirements is a public sector surrogate for market forces. This may be initiated by an individual agency to improve its own performance, but given the lower level of intrinsic incentive and the greater difficulties, such action is likely to be the exception to the rule. In reality, the lower level of incentive means that public sector agencies are more likely to need such requirements to be imposed on them.

Hence, then, the attraction of a quite different form of sanction: the political device of naming and shaming. Here the exercise has the public as audience, an audience it is assumed can be reached effectively and will respond in a way that has the desired sanctioning effect. Reaching such an audience often means simplifying performance information to construct league tables ranking jurisdictions or agencies according to their performance. Well known in the context of schools' performance, this is a much debated device. It invites responses such as teaching to the test, where measured performance is enhanced by neglecting the broader suite of often less tangible or immediate concerns, and where the overall purpose may be eclipsed in these efforts to achieve the measured targets. Since indicators are at best incomplete representations of policy objectives, and sometimes vague proxies, there is always going to be a tendency to hit the target and miss the point. Gaming takes the problem one step further, with performance monitoring regimes giving agents an incentive to structure their activities in such a way as to produce the desired indication of results without necessarily generating any improvement in real results. We could expect that the higher the stakes involved, the higher the propensity for perverse behaviour of both those forms.

It is however possible to design systems to partly address such problems. Proponents argue that good design and improvement over time will minimise pathologies, and even if there are such dysfunctional responses, the overall gain may outweigh the costs. Further, such problems may be more likely to arise in the assessment of complex outcomes, but in any event some of the simpler benchmarking requirements which create less opportunity for gaming have real potential value. There is much utility in measuring public sector outputs and in measuring output efficiency (process benchmarking), and there are a number of practical services which government provides where difficult measures of impact are not the issue, although even here there may be significant challenges given the complexity of many public sector outputs.

For benchmarking advocates the creation of such regimes prompts and promotes progressive improvement in the data: a poor start is better than no start. One lesson of the UK experience with a performance monitoring reliance on quantitative indicators, though, seems to have been that significant qualitative dimensions slip through the net, with potential for quite misleading conclusions to be drawn. For public sector benchmarking, much hinges on the development of reliable indicators in regard to both processes and outcomes. In addition, it requires that data sets be fully consistent across the benchmarked entities and reasonably consistent over time. And, given the complex relationship between government action and particular economic or social desiderata and the degree to which circumstances vary, assessment of those data must be well contextualised to ensure adequate analysis and interpretation.


    Benchmarking in the UK

The UK warrants a particular focus when it comes to benchmarking. In the summer of 2010 the newly elected coalition government announced the abolition of the principal benchmarking and performance management regime for local government in England, the Comprehensive Area Assessment, and its intention to abolish the principal author and steward of that regime, the Audit Commission, as well. The government also announced new requirements for public services to publish more information so that an army of armchair auditors would be sufficiently equipped to hold those services to account directly.

These policies were introduced in the context of the wider programme with its emphasis on localism and on the Big Society, and a comprehensive assault on the many intermediary and arms-length bodies which were seen as fogging the relationship between government and citizenry. They were also a reaction to a decade or more of what was seen as top-down performance management, inhibiting the exercise of professional discretion at the front line and creating a bureaucratic morass.

The current localist philosophy propounded by some in the UK coalition government has taken UK policy-makers into uncharted waters. It shifts the balance of public and private accountability, and re-draws the lines between state intervention and individual responsibility. It assumes that local authorities will scrutinize their own performance and voters will make rational choices when presented with performance data, as they do in an efficient market. Thus public service performance regimes return to private sector benchmarking methodologies. But there are important questions to be answered: will the information be sufficient (or perhaps too much), and how can it be harnessed to support and inform consumer behaviour to drive the desired outcomes of greater efficiency and effectiveness? At the same time, governments in Wales and Scotland have electoral mandates that affect the performance assessment of local public services in those jurisdictions.

There are indeed many varieties of benchmarking in the UK. First, there is a wide range of service-based cost and technical comparisons conducted as benchmarking clubs of one kind or another. These include those of the Improvement Service in Scotland in conjunction with Solace; the Association for Public Service Excellence (APSE), a not-for-profit voluntary body established with service comparisons of blue collar local government services as a core aim; the Chartered Institute of Public Finance and Accountancy (CIPFA), a major professional accountancy body for, inter alia, local government finance staff; and the Wales Audit Office (WAO), the statutory public audit body for Wales. Also in this area are the communities of practice established across a range of different services by the Improvement and Development Agency (IDeA), an agency of the Local Government Association (LGA) which has now been absorbed within the LGA, and the development of the LGA's own LG Inform project.

Secondly, there has been a series of centrally determined performance indicator sets, with results often published in the form of league tables. Then there have been performance regimes for local authorities, looking at the whole organisation and testing them against pre-set frameworks, including the Comprehensive Performance Assessment (England), the Wales Programme for Improvement, and Best Value Audits (Scotland). These led in England to a yet wider programme of Comprehensive Area Assessments, which brought together data on a much wider group of local services. Alongside these performance regimes has been a programme of voluntary assessments using external peer review methods against a framework underpinned by the European Foundation for Quality Management (EFQM).

There have also been major excellence benchmarking schemes, which test projects and services against a pre-designed benchmark to identify best and excellent practice, most notably the central government-run Beacon Council Scheme in England.

Performance assessments such as these are important partly because of vertical fiscal imbalance, where there is a lack of alignment between the level of government or the agency which is paying for a service and that which is delivering it. Further, high-profile service failures have eroded confidence in professionals to protect the interests of their pupils, patients and clients, replacing traditional trust-based bureaucratic and professional controls with more explicit contractual relations. At the same time, the marketization and associated fragmentation of responsibility for public service delivery has left governments reaching for long-distance mechanisms of control to exert oversight over increasingly complex networks of providers. External assessment played a pivotal role in the Blair/Brown governments' strategy for public services reform. New Labour believed that top-down terror and targets provided an important stimulus for improvement. In contrast, the devolved administrations of Scotland and Wales have eschewed hard-edged performance regimes, developing their own more consensual approaches to assessment. The different methods of performance assessment adopted within the UK over the past 10 years provided a natural laboratory for comparing the effects of cooperative and competitive approaches. This is an issue that has gained interest internationally as governments move towards more self-regulation by local government and scrutiny by citizens acting as armchair auditors.

There is a quite strong and definite relationship between benchmarking instruments and theories of improvement, but it is not always easy to pin down in particular instances. The range, even at one particular moment, can be considerable, and available scenarios as to how the relationship might develop can carry a strongly normative character. Thus, a relationship of targets and terror carries both potential risks and rewards and a potential regulatory burden, but one which may pay dividends. Central and local government and regulators alike may remain wedded to such models when they have already passed their optimum effectiveness, and when central government needs to let go and local government needs to move beyond mere compliance. In contrast, an era of cooperation and contract in central-local relations invites the use of both different instruments and different behaviours, especially if the focus is switched to achieving desired outcomes rather than merely delivering desirable outputs. If the other end of the spectrum is reached, one that may be characterised as a locally driven approach of initiative and innovation, then the role of benchmarking is likely to look very different, and perhaps much less intensive.

As seen by one of the high priests of public service change and improvement in the UK, it is really a question of whether, for example, the entity to be improved needs to move from awful to adequate or is rather at the stage of going from good to great (Barber, 2007).

Benchmarking in one form or another has featured in all the theories of improvement as applied to UK local government. Moreover, each benchmarking instrument carries, at least potentially, a sub-theory which helps explain (were it to be articulated) what behaviour it is hoping to stimulate or inhibit through its application. Such theories are not, of course, always made explicit, and if they are they may not be right about the behaviour predicted. Nor is it always the case that where a bundle of instruments are explicitly assembled, the resulting composite theory will be internally coherent or fully comprehensive. Just as UK governments have been vigorous in their use of benchmarking for local government, so have they also been fairly explicit about what they hoped to achieve and how, but they may not have got it right.

    An International Phenomenon

The simplicity of benchmarking and the global reach of NPM within public services has given it an international character (see, for example, Mizell, 2009, which looks at developments in Norway, Italy, Austria, Denmark, Sweden, Finland, Ireland, and the Netherlands). This international character has at least three dimensions.

The first of these is its relevance for countries with developed public services but with a federal rather than a unitary character, such as Australia, Canada, and the United States.

A number of questions arise for federal systems, most notably in the way that benchmarking arrangements may affect intergovernmental relations and the functioning of the federal system, the extent to which it enhances federalism, and what form of benchmarking is most conducive to effective federal practice.

Alongside these, federal jurisdictions experience more universal issues, such as the challenges entailed in moving from performance monitoring to active policy learning, and whether benchmarking actually leads to improved outcomes.

In federal systems, central governments and constituent units have to balance the centripetal and centrifugal impulses for country-wide policy outcomes on the one hand, and policy outcomes that respect state autonomy or at least promote flexibility, on the other. Benchmarking has become part of that, and thus an important aspect of federal governance. But how to set up the governance of benchmarking regimes is also emerging as a key issue. One assessment is that models of a collegial nature, which are not based on hierarchy, targets and reputation effects (naming and shaming), encourage the greatest willingness of constituent units to participate. However, the jury is still out on whether it is those arrangements that best lead to performance improvement.

The second international dimension is the relevance of benchmarking to developing countries. This is well reflected in GIZ's Assessing Public Sector Performance (2011, Bonn), which reviews what are essentially benchmarking methodologies in the Philippines, Nepal, Indonesia, Ethiopia, and Paraguay. They demonstrate the variety of focus, indicators and methodologies which operate, and the critical role which administrative, political and developmental context plays in shaping their objectives and scope. They also establish clearly that whilst there are no one-size-fits-all solutions, there are factors without which success is unlikely, although they cannot guarantee success. These include issues of ownership, the importance of incentives, simplicity, transparency, and the need to relate performance measures to policy objectives for the public service sector which is at issue.

The third international dimension is the extent to which benchmarking permits comparison of public service performance between countries, and also regions within countries. The outstanding domain here is the educational attainment of young learners, which is captured in the OECD's PISA methodology (Programme for International Student Assessment), providing longitudinal and horizontal comparison in various fields of educational attainment across 70 developed and developing countries (see http://www.oecd.org/pisa). Interestingly, PISA has informed not only comparative assessment between countries and for individual countries over time in a developmental context. It also directly informed political and administrative action within the UK by the Welsh Government, which drew on PISA data to confirm that despite equivalent injections of resource into both Welsh and English education following devolution, the attainment of Welsh learners had fallen well behind their English counterparts. This appeared to be related to the differences in approach adopted in Welsh public services as compared to England; significantly, in a post-devolution context it had been necessary to employ wider international comparators because an in-UK comparison was not as such in practice otherwise available.

Another important benchmark in this area is PEFA (see http://www.pefa.org), the Public Expenditure and Financial Accountability framework, which provides an external benchmark against which countries can assess the comparative and absolute health of their systems for public finance. And one final aspect of this part of benchmarking's international character which warrants mention is the Millennium Development Goals (see http://www.un.org/millenniumgoals), which benchmark whole countries and regions across fundamental indicators concerned with poverty, health, and education, and which provide clear measures to assess whole-society progress.

Benchmarking's Modern Idioms: Outcomes and Austerity

Whether in the character of a mutant virus or an organism adapting sensibly to a new environment, benchmarking itself continues to evolve and develop. Two recent aspects concern the increasing focus on outcomes and also the extent to which benchmarking can help tackle the modern menace of austerity.

As to outcomes, it is very striking that outcomes are now a feature of many benchmarking regimes. For example, they figure not only in the Australian federal experience, but also in the performance regimes within devolved parts of the UK, notwithstanding the wide difference in constitutional arrangements and political systems. In part, this has reflected the recognition that top-down targets and external assessments on public services may distort behaviour, and encourage a focus on narrow scoring systems rather than the outcomes that matter most to citizens and service users. To that extent it may be more than just a defensive move by those who would rather not have their own direct performance scrutinized and compared unfavourably. It may instead reflect more mature debates and a greater understanding about the relationship of public services to things that matter for people and communities. It may also give effect to the generally better capacity and capability in public services and their delivery, and the vastly improved information communication systems that now exist. The LGA's LG Inform project, for example, could probably not have been contemplated in quite its current format until relatively recently.

One of the current high-water marks of an outcomes approach is the Single Outcome Agreements being implemented in Scotland. Not only does it engage the wicked issues that really do matter. It also serves as an instrument to connect and align the legitimate aspirations and democratic mandates both of local councils and of Scottish government ministers, and to bind in other key parties as well. It is at a relatively early stage of implementation, especially given how long many important outcomes take to achieve. But it shows considerable promise, and has attracted a good deal of positive attention in other parts of the UK. Importantly, given the character of most of the priority public services outcomes, it is necessary in many cases to treat with proxies and to measure intermediate output and process indicators as well. Testimony to that is the work by local authorities themselves to identify relevant indicators and to collect and compare quality data to measure them, which is going on in parallel to the broader outcomes approach in Scotland. It is an essential underpinning to the broader outcomes-based approach.

So the shift of focus to outcomes does not herald a decline in benchmarking so much as extend its range still further. It also heralds its application to more mature appreciations in many different countries of the need to measure what is important as well as what can be readily quantified and compared. By providing linkage between inputs, outputs and outcomes it also serves potentially to integrate internal performance management with the external impacts that public services organizations strive to deliver. It is no panacea, but contains much promise.


A further major contemporary challenge in both the UK context and many other jurisdictions is how best to deploy benchmarking in an age of austerity and the attendant cuts in public expenditure and retrenchment in public services. The benchmarking of unit costs may be something which is especially useful (and perhaps likely to become more common) given the pressures on public spending in the UK and elsewhere. Unit costs often vary widely, for example, between local authorities and health trusts, and, worse still, sometimes they do not know what their unit costs actually are. Bringing these costs into the open in order to ask whether high-cost services can learn anything from lower-cost ones is an important contribution for benchmarking to make to the austerity challenge.

Beyond that, in the UK it is widely acknowledged that long-term expenditure reductions will have to draw on change at the tactical, transactional, and transformational levels. At the tactical level (tightening efficiency in existing services, shrinking eligibility, and so on) financial indicators look to be the most useful. For transactional change (improving systems using lean methods or better technology, for example) process benchmarks are likely to be more relevant. But for transformational change (tackling the wicked issues, for example, that cross organisational boundaries, where services are being completely re-designed around customer needs, or where radical reconstruction is called for) probably only excellence benchmarking will be of any use at all, at least at the initial, innovatory stage when the early adopters are struggling at the leading edge. The transformational level will increasingly be required, yet it is the area in which benchmarking is weakest.

So as austerity deepens and persists, it is difficult to avoid the provisional conclusion that benchmarking has potentially less relevance than in less turbulent times.

    Conditions for Success

Benchmarking is popular partly because it is a simple and flexible instrument, and one which has shown its capacity to be developed and applied to a myriad of circumstances and problems. But its popularity also owes much to the way it engages some of the most important and universal themes of modern public life and public services, and in particular the way it can serve transparency, trust, and accountability. Even if benchmarking is not a panacea or a complete answer to public services performance, there really is no excuse for principled resistance to the application of benchmarking to public services in order to let funders, citizens, users, and fellow service providers know how a service is performing. That is not the whole story, of course, but it ought to be the starting point, and to underpin the many issues of how, when, and who; the issue of whether to benchmark should not even be on the agenda.

Beyond that it is important to highlight some of the conditions for successful benchmarking in addition to those already canvassed here, including the very important data-related issues. Four stand out.

First there is the issue of the role of professionals. The relationship of the professions to benchmarking has been very mixed. Some professionals have undoubtedly resisted the transparency and public scrutiny which accompanies benchmarking, and that has not been to their credit. It is undoubtedly the case that some top-down benchmarking has been misconceived and has failed to respect the pressures and the problems which professionals face on the front line. In other cases, however, it has been more a question of resistance to accountability and legitimate performance assessment. Either way, at the heart of professional culture at its best is a commitment to service and to doing things better, and a sense of values which underpins the professions generally. If benchmarking can engage those values, and that sense of professional self-respect which is reinforced in the opinions of one's professional peers, that is capable of becoming a powerful motivator for the comparison, learning, and improvement action which is the essence of benchmarking.

The second is the wider question of organisational cultures and the leadership which helps to shape and reinforce the attitudes and behaviours of all those engaged in public service. Leadership which is committed to transparency and improvement is not a guarantee of a culture which is conducive to benefit from benchmarking, but in its absence the prospects are very slim. Obviously this is not just a question of heroic leadership from the top; having the right behaviours and attitudes on the part of leaders at various levels within public service organisations is critical.

Thirdly, it is difficult to overstate the significance of the digital revolution and the new environment for public services which has been created as a result. That revolution has the potential to transform the delivery of public services, the relationship between services and their users, and the way in which services are produced and managed. The changes we have already seen are only the beginning. The digital revolution has been a disruptive technology, in the best sense of that word, and benchmarking is not always the best instrument to support that kind of change. But it also makes possible the more effective collection, validation, interpretation, and comparison of data, and that makes it imperative to assess and exploit the potential which it carries to conduct benchmarking more effectively.

Finally, a critical feature of successful benchmarking is whether the authors and stewards of a benchmarking regime have a clear and coherent theory of improvement. This again is a matter of a necessary but not sufficient condition for success. It is essential to the effective design of benchmarking systems that the relationship between the indicators chosen, the data to support them, the methods and resources for interpretation and assessment, and the levers for subsequent change, are thought through and rest in some kind of organised alignment. What is called for is a whole-system approach, even though the political and administrative context in which benchmarking systems are developed and introduced may not always be conducive to that.

Either way, the lesson is fairly clear. It is essential to think about and to deploy a combination of benchmarking (and other) tools from the improvement end of the telescope. Adopting an outcome focus, policy-makers need to ask themselves: What do you want to get better? What is the current context of change, and what are the key relationships and forces shaping that context? How do you think change will happen; what is your theory of improvement? What will be the role of benchmarking within that? And how best can you optimise that role?


This will still not fashion a silver bullet of change from the benchmarking tools at their disposal, but it will perhaps help to ensure that the triggers for improvement are more likely to work in the right place and in a timely manner.

    Key Themes and Future Research

The UK may be an outlier in the extent and nature of benchmarking of public services, but it is clear that the benchmarking of public services is very much an international phenomenon and that many features of performance measurement and management of subnational units are widely shared across a range of jurisdictions. Vertical fiscal imbalance is a major impulse to performance measurement, but data problems and issues bedevil comparison and inhibit clear causal analysis of why things go right or wrong and how they might best be copied or fixed. In the turbulent world of politics and public services the line between positive criticism and destructive blame is continually negotiated, as is the constant flux around self-regulatory and more incentivised and interventionist arrangements. But there are now visible performance regimes in many jurisdictions, underpinned by explicit or implicit theories of improvement, conveying complex and multiple purposes which are not always well related to the benchmarking systems they underpin. Effective comparison at a horizontal level and conducted voluntarily is difficult enough, but when overlaid by the multiple and intersecting accountabilities between national and subnational orders of government, and between government and the citizenry, the landscape becomes ever more turbulent and difficult to negotiate. For all the effort and energy devoted to benchmarking, it remains an instrument of improvement which is still developing and evolving. Its significance more than justifies a future research agenda, and from our work we have distilled six interwoven themes which warrant further enquiry.

The first of these is the need to capture the evolving landscape of benchmarking and external performance assessment both across the UK and in comparative jurisdictions. We have already seen significant developments in the UK's devolved context, and benchmarking systems are evolving both in developed and developing countries across the world. What is required in part is both a narrative and analytical account which is capable of recording the flux and variation as benchmarking systems and theories of improvement change or are engulfed by tsunamis of service crisis and political preference. The UK offers a special promise in the natural laboratory of public service which has emerged, but both in the UK and more broadly there is an opportunity to apply, test, extend and refine the key concepts and lessons of the benchmarking landscape.

Secondly, we need more empirical evidence on the impact of benchmarking systems. We need to develop a better understanding of what works, for what purposes, how, when, and why, and with what spillovers and opportunity costs. The limited evidence available suggests that benchmarking systems based on hierarchy, targets, and reputational effects have the most impact on performance improvement. However, they are very unpopular and are likely to have limited lives. Is there some way to get the benefit without the downside?

The third area is that of the multiple and crosscutting accountabilities which operate within public services in democratic jurisdictions. Local authorities have their own local democratic mandate, but often this is one which has at best an imperfect relationship to the wishes and needs of the citizens who they serve, given low election turnouts and the unaligned relationship between the responsibilities of local authorities and their tax and revenue base. At the higher (or different) level of state or central government, the issue is not simply that this may be the source of some or all of the funding for public services delivered at local level, although that in itself might be thought justification enough to require measurement, comparison, and improvement. More significantly, many of the services delivered at state or local level are of legitimate interest to other levels of government as a consequence of the democratic mandates which they also hold. Defensiveness and political difference may well intrude on what might otherwise be a natural partnership of interest in understanding comparative performance and making it better. Either way, this is a potentially significant area for future research interest.

Next, there is the question of the role of citizens and service users in benchmarking. In practice, these are often only marginal participants in many benchmarking systems. They may well be surveyed for their views on service quality and performance, but they are rarely involved in discussions about what the indicators should be, what they mean, and what should happen and change as a result of benchmarking results. There may also be scope for incorporating softer forms of intelligence about service quality and performance into benchmarking systems by using social media. Clearly, this communication and presentational dimension is very relevant to armchair auditors, but there is a significant question mark as to whether they actually exist, and if they do then how their audits can be made more effective.

Fifthly, there is the relationship between politics, politicians, and benchmarking. In many ways this is a marriage made in both heaven and hell, and with the media as the key witness. Benchmarking can underpin the critical political accountability to which all public services should be subject. However, there is a major issue in the problem of political time horizons, and media drivers. Between them they always risk turning a question of legitimate accountability into one of blame and point scoring. Great benchmarking requires tremendous political self-discipline, and a maturity of view which is not always in evidence. Indeed, natural features of government and politics may be in fundamental contradiction with what appear to be important principles of benchmarking, at least in the medium or long term. For example, one naturally looks to a comparative time-series of performance data to inform an assessment of progress and relative improvement. But the vagaries of politics and government may mean that a benchmarking system will have a shelf life of at best a few years. The UK experience suggests that 3 to 4 years is about the limit in the modern era. And yet, what if the medium and long term never arrives, nor is ever really intended to arrive? And what if a series of successive approximations and short-term gains were the only significant game in town?

Finally, there is the major question of whether sector-led approaches of self-regulation and comparison can deliver improvement consistently without constant, or at least occasional, injections of top-down discipline and incentives. The early work in some areas has been promising in part, albeit that the timescales for establishing voluntary schemes of benchmarking in the public sector appear to be very lengthy. A key issue will therefore be whether, having had such thorough preparation, they then deliver strong results and endure over the medium to long term. But there are also continuing concerns as to whether essentially voluntaristic approaches sufficiently involve the public, or have enough by way of intelligence to detect and address risk, or enable a properly joined-up approach to be taken to public services performance assessment. Voluntaristic approaches also raise difficult issues about the use and relevance of data for different audiences, and the different mandates which they bring to their use of that data. All of this takes us back again to the importance of understanding the ecologies and the impacts of benchmarking systems, and the way in which they are assessed and reviewed, and the way in which accountability deficits for public services in democratic jurisdictions can best be identified and rectified.

Clive Grace is Honorary Research Fellow at Cardiff Business School, James Downe is Reader of Public Policy and Management at the Centre for Local & Regional Government Research, Cardiff Business School, Alan Fenna is Professor of Government at Curtin University, Perth, Australia, Felix Knüpling is Head of Programs, Forum of Federations, Steve Martin is Professor of Public Policy and Management at Cardiff Business School, and Sandra Nutley is Professor of Public Policy and Management at the University of St. Andrews.

(This introductory essay is based on the work of the Forum of Federations, the ESRC Knowledge Exchange Programme, and on previous work of the Cardiff/Edinburgh-St Andrews research team. See the materials on the Knowledge Hub at www.knowledgehub.local.gov.uk/register and go to Benchmarking and external performance assessment. See also www.forumfed.org and Benchmarking in Federal Systems (2011), edited by Alan Fenna & Felix Knüpling, published by the Australian Productivity Commission and available at www.pc.gov.au, and in particular the pieces by Fenna, Grace and Knüpling.)

Barber, M. (2007) Instruction to Deliver: Tony Blair, the Public Services and the Challenge to Deliver, Politico's, London.

Mizell, L. (2009) Promoting Performance, Working Paper No. 5, OECD Network on Fiscal Relations, OECD, Paris.

Benchmarking Inequality: Measuring education progress in American education

Arnold F. Shober

In 2001, the U.S. Congress passed the No Child Left Behind Act (NCLB) to boost student academic achievement, a top-down benchmarking strategy par excellence. Yet Congress's choice of benchmarking rather than national standards, exams, curriculum, or direct spending illustrates the complex and unpredictable relationship between federalism and public policy. Benchmarking was virtually the only choice available to Congress. The American experience with federal education policy is notable because, first, benchmarks are used to address a national education policy problem in a country with almost no national educational capacity. Second, Congress arrived at benchmarks only after the states had co-opted previous national legislation. Third, NCLB's benchmarking was developed in the face of opposition from many of those states. Yet, despite these challenges, top-down benchmarking has unquestionably refocused American education on the national education agenda at the expense of federalism.

Offering American states federal monies in return for pursuing federal policy prescriptions is virtually the only lever the national government has over education policy. Education policy in the United States is primarily a local affair, and the federal government has little ability to gather educational data, develop policy alternatives, or effect change directly. Despite recent federal advances in the area, states and localities still contribute 90 percent of all educational revenues (Zhou 2010). The federal government's primary involvement in elementary and secondary education came only in 1965, and only after President Lyndon Johnson and Commissioner of Education Francis Keppel circumvented the issue by tying $1.3 billion to low-income students rather than to schools through the Elementary and Secondary Education Act (ESEA), the forerunner of NCLB (McGuinn 2006, p. 30). At the time, federal policymakers believed that school spending was the primary driver of unequal educational opportunities, a view suggested by the U.S. Supreme Court in Brown v. Board of Education (1954) (see Reynolds 2007). So long as schools spent federal money to aid low-income students, they were free to design their own educational programs as they wished, and schools could refuse the money. Thus, the federal government's attempt to improve educational opportunity depended wholly on state and school district willingness and capacity to do so.

This precedent was ill-suited to address the emerging problem of student achievement. In the thirty years after ESEA, it became painfully clear to most state and federal policymakers that simply boosting spending would not produce educational equality in any meaningful sense. As early as 1966, troubling data showed a vast gulf in student achievement among racial groups, some as large as a standard deviation in test scores (Coleman et al. 1966). Despite achievement gains for all racial groups over time, that gap has persisted into the 2000s (McCall et al. 2006).

When Congress attempted to shift federal focus from spending to achievement in the 1994 reauthorization of ESEA (known as the Improving America's Schools Act), federal policymakers saw low and divergent state education standards as a root cause of disparities in achievement, and many of them, including Presidents George H. W. Bush and Bill Clinton, supported the creation of national standards. But advocates of American federalism undermined coherent national education policy (Ravitch and Schlesinger 1996). Despite a rousing fight over national history standards (which came to nothing), federal law required states to create standards, assessments, and reporting of results. For the first time, federal money would be dependent on classroom content rather than student characteristics, but states still controlled standard setting and evaluation. State capacity remained the fulcrum on which federal policy rested (see Manna 2006).

The high-stakes benchmarking characteristic of NCLB came only after states had co-opted previous legislation. Through the 1990s, many states designed diffuse standards, created unclear metrics of success, set low bars to pass state exams, and occasionally excused low-performing students from taking the exams at all. By 1999, federal policymakers argued that states were subverting the spirit, if not the letter, of ESEA (McGuinn 2006, pp. 134-45). In response, policymakers adopted stringent benchmarks in No Child Left Behind to overcome the states' foot-dragging. NCLB created a bold, if simplistic, measure of success. Student exam performance would be categorized as 'below basic', 'basic', 'proficient', or 'advanced'. Exams would be given in at least reading and math every year from the third to the eighth grade and once in high school. At least 95 percent of students in each demographic subgroup would take the exams, and, by the end of the 2013-14 school year, all students must reach the 'proficient' level or schools would risk losing federal funds (Manna 2006, p. 128).

Further, these benchmarks would be widely disseminated to parents and the press. Federal policymakers argued that publicly reported benchmarks with common labels would weaken states' ability to skirt student achievement, essentially shaming them into adopting the federal government's focus on academic outputs rather than revenue inputs.

NCLB's benchmarks were not developed in response to states' commitment to improving academic achievement; indeed, they were developed to force that commitment, and the federal stick has proven an uneven motivator. This is evident in both state-reported student proficiency levels and standards for teacher quality. In 2009, the federal government compared state standards to federal standards for the National Assessment of Educational Progress (NAEP), a national exam. The NAEP scale score for eighth-grade mathematics proficiency was 299 that year. In comparison, Massachusetts' 'proficient' standard was more challenging than the federal standard, at 300; Wisconsin's was 262; and Tennessee's, at 229, fell below the federal standard for 'basic' (Bandeira de Mello 2011)! States also report divergent percentages of highly-qualified teachers. In 2005, Wisconsin claimed 99.5 percent of its core classes were taught by highly-qualified teachers, Massachusetts claimed 93 percent, and California 74 percent (Carey 2006, p. 18). It is apparent that states are neither speaking the same language nor sharing the same educational priorities as the federal government.
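To make that comparison concrete, the short sketch below restates it as a simple benchmarking calculation. It uses only the figures quoted above (the 2009 NAEP eighth-grade mathematics 'proficient' score of 299 and the three state cut scores); it is an illustrative reworking, not the NCES mapping methodology reported in Bandeira de Mello (2011).

# Illustrative only: compares the state 'proficient' cut scores quoted above
# with the 2009 NAEP grade 8 mathematics 'proficient' benchmark of 299.
# This is a simple gap calculation, not the NCES mapping methodology.

NAEP_PROFICIENT = 299

state_cut_scores = {   # NAEP-equivalent state 'proficient' standards (2009)
    "Massachusetts": 300,
    "Wisconsin": 262,
    "Tennessee": 229,
}

for state, cut in sorted(state_cut_scores.items(), key=lambda kv: -kv[1]):
    gap = cut - NAEP_PROFICIENT
    verdict = "at or above" if gap >= 0 else "below"
    print(f"{state:<14} {cut}  ({gap:+d} points, {verdict} the NAEP benchmark)")

Run against these three states, the output makes the point plain: a student counted 'proficient' in one state can sit some 70 points short of the federal benchmark in another.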


Despite these variations, the Obama Administration extended benchmarking to teacher quality by insisting that teachers be partly evaluated on student test scores (U.S. Department of Education 2010, p. 14). But even here, the Administration had to bow to federalist realities and found it had to promise states a waiver from elements of No Child Left Behind in order to gain their participation in further benchmarking (Cavanagh and Klein 2012). Many states have opted to ignore the offer.

Despite these obstacles, there is little doubt that NCLB's benchmarking has successfully shamed states into a fundamental re-orientation of their educational programs. While some legislators continue to talk about increasing teacher pay, reducing class sizes, or boosting teachers' professional development, no state policymaker publicly contests that academic performance is a central purpose of schooling (Shober 2012). States and schools do respond to public grading of their performance, and spend significant time defending decreases in NCLB ratings.

Teachers' unions, long staunch opponents of public ratings, have acquiesced to benchmarking, too. In 2010, the president of the American Federation of Teachers allowed that 'student test scores . . . should also be considered when evaluating teachers' (Weingarten 2010). NCLB also prompted states to take defensive measures against the future, as forty-five states quickly signed on to a new, state-driven consortium in 2010 and 2011 to develop common, national academic standards and assessments, the very reform that federalism's advocates scuttled in 1994 (Common Core State Standards Initiative 2012). Benchmarking brought together what federalism kept apart.

    Arnold F. Shober is Associate Professor of

    Government at Lawrence University, USA.


Bandeira de Mello, V. (2011), Mapping State Proficiency Standards Onto NAEP Scales: Variation and Change in State Standards for Reading and Mathematics 2005-2009 (National Center for Education Statistics, Washington, DC).

Carey, K. (2006), Hot Air: How States Inflate Their Educational Progress Under NCLB (Education Sector, Washington, DC).

Cavanagh, S. and Klein, A. (2012), Broad changes ahead as NCLB waivers roll out. Education Week (9 February).

Coleman, J., Campbell, E., Hobson, C., McPartland, J., Mood, A., Weinfeld, F. and York, R. (1966), Equality of Educational Opportunity (U.S. Department of Health, Education, and Welfare, Office of Education, Washington, DC).

Common Core State Standards Initiative (2012), Common Core State Standards (Author, Washington, DC). http://www.corestandards.org/.

Manna, P. (2006), School's In: Federalism and the National Education Agenda (Georgetown University Press, Washington, DC).

McCall, M., Houser, C., Cronin, J., Kingsbury, G. and Houser, R. (2006), Achievement Gaps: An Examination of Differences in Student Achievement and Growth (Northwest Evaluation Association, Lake Oswego, OR).

McGuinn, P. (2006), No Child Left Behind and the Transformation of Federal Education Policy, 1965-2005 (The University Press of Kansas, Lawrence, KS).

Patterson, J. (2001), Brown v. Board of Education: A Civil Rights Milestone and Its Troubled Legacy (Oxford University Press, New York).

Ravitch, D. and Schlesinger, A. (1996), The new, improved history standards. Wall Street Journal (3 April).

Reynolds, L. (2007), Uniformity of taxation and the preservation of local control in school finance reform. University of California Davis Law Review, 40, 5, pp. 1835-1895.

Shober, A. (2012), From Teacher Education to Student Progress: Teacher Quality Since NCLB (AEI, Washington, DC).

U.S. Department of Education (2010), A Blueprint for Reform (Author, Washington, DC). http://www2.ed.gov/policy/elsec/leg/blueprint/blueprint.pdf.

Weingarten, R. (2010), A new path forward: four approaches to quality teaching and better schools. Speech, Washington, DC (12 January). http://aft.3cdn.net/227d12e668432ca48e_twm6b90k1.pdf.

Zhou, L. (2010), Revenues and Expenditures for Public Elementary and Secondary Education: School Year 2007-08 (National Center for Education Statistics, Washington, DC).


Choosing to get better? A Canadian perspective on sector-led improvement in local children's services

Wendy Thomson

Sector-led improvement is currently seen as the way forward. Abolishing the Comprehensive Area Assessment and the Audit Commission, the UK Coalition Government declared that local authorities should be responsible for their own improvement and be held accountable by their local communities and electorate. Local government welcomed this radical change, and the Local Government Association has staked out its claim on the improvement territory.

The idea has obvious attractions, particularly in local services which are governed by elected politicians. A local approach should make for a better alignment of statutory powers with service responsibilities, and generate cost savings by reducing the administrative burden associated with top-down performance regimes. Given the magnitude of budget cuts, it gives authorities the flexibility to make tough choices locally. So sector-led improvement is an idea whose time has finally come.

Or has it? From its inception, some services were considered too important to be left to local government alone. Ofsted remains responsible for inspecting schools and high-risk children's services. The Care Quality Commission inspects the continuum of social care. So rather better evidence would be helpful in clarifying what is meant by this term, and in identifying some of the conditions for its success. Here I offer some reflections on the particular features and consequences associated with a sector-led model for improvement initiated in Ontario. Charged with promoting the sustainability of the child welfare system in Ontario, as one of three Commissioners I gained some insights into the questions posed by sector-led improvement. In different ways, this experience at times confirms and at times challenges what we think we know and have come to expect about improving public services.

    The position in Canada

In places like Canada, new public management never really caught on. By UK standards the public is quite tolerant of, and even fond of, its public services (though becoming decidedly grumpy about its health services). As responsibility for child welfare services rests with the


    provinces, the federal government has

    largely left them to account for themselvesrather than adopting national standardsor systems of measurement. In Ontario,services such as education and health havebeen the subject of some limited standardsetting and performance reporting.

However, this is not the case for child welfare services, which are delivered by forty-six independently governed Children's Aid Societies (CAS), mandated and funded by the Ontario government through the Ministry of Children and Youth Services (MCYS). Despite engagement with quality assurance programs and longstanding collaboration on developing outcome measures, little hard data is available that reveals much of importance about the performance of the child welfare system as a whole and its impact on children.

The paradox is that where governments have taken a traditional administrative approach, such as in many Canadian provinces, layers of process requirements have accumulated year after year. These requirements obscure the services' social purpose and client benefit. Consequently, little reliable information about results or clients is available, despite the multiple and time-consuming reporting up the line on compliance with standard processes, timeframes, expenditure audits and case-level checking.

Ontario decided to establish an arm's-length Commission to tackle its concerns about the child welfare sector's spending and performance. An arm's-length body, reasonably resourced, can provide some leadership, expertise and the discretion afforded by its independence from government and sector providers. However, ultimately it must act through others, and for the Commission working with the sector offered the shortest and most effective route to achieving tangible improvements for children's services.

The flipside of independence, of course, is the challenge of developing a system of performance measurement and improvement without the legitimacy of ministerially sanctioned priorities. As we sought to define measures to reflect progress on what were thought to be key government policies, many proved to be insufficiently clear or in conflict with other requirements. It was difficult, for example, to reconcile a policy of differential response with compliance with all aspects of the statutory protection standards. It was also impossible to honour commitments made to First Nations children and communities without an agreed form of headcount monitoring, or to hold CAS accountable for reducing admissions of children into care when the upstream services of early intervention and family support were delivered by a patchwork of other agencies with no duty to collaborate.

    A sector-led approach

Government exists to assume a strategic and systemic role. Better that it spend its political capital on achieving its policies, priorities and social value, and leave agencies the responsibility for managing their internal processes and accounts. This may be a major change for provincial governments, which are more occupied with day-to-day operations and enforcement of probity rules. An arm's-length body such as the Commission may play a part in this transition.

Of course, at the core of the idea of sector-led improvement is sector leadership, and most sectors are organised through a membership association. The Ontario Association of Children's Aid Societies (OACAS) took on this leadership, well served by a committed band of child welfare and measurement experts. It set up a sector advisory group for the Commission, which proved an indispensable advocate and arbitrator. The strengths are those well rehearsed by advocates of a sector-led approach: buy-in, commitment, specialist expertise, members' hands-on role in service delivery, and local accountability to community and clients. Together with the Commission, it was possible to make the case for using performance data and benchmarking ('dumb data') to foster the culture of curiosity and learning necessary to deliver improvement.

With the combined efforts of the OACAS, technical database support, the Commission, and supportive funding, quite remarkable progress was made. A list of child-centered, event-driven indicators was adopted (after much discussion), a technical guide was produced, data from multiple IT systems was downloaded and matched, and preliminary reports were produced.
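To give a flavour of what 'downloaded and matched' means in practice, here is a deliberately simplified sketch. The child identifiers, field names and the 'admission to care' indicator are all invented for illustration; the actual CAS extracts and indicator definitions were considerably more complex.

# Hypothetical illustration of matching child-level event records exported
# from different agency IT systems on a shared child identifier, then counting
# a simple event-driven indicator. All records and field names are invented.

from collections import defaultdict

intake_system = [
    {"child_id": "C001", "event": "investigation", "year": 2011},
    {"child_id": "C002", "event": "investigation", "year": 2011},
]
placement_system = [
    {"child_id": "C001", "event": "admission_to_care", "year": 2011},
    {"child_id": "C003", "event": "admission_to_care", "year": 2012},
]

# Pool every event under a single child identifier.
events_by_child = defaultdict(list)
for record in intake_system + placement_system:
    events_by_child[record["child_id"]].append(record)

# Indicator: distinct children admitted to care in 2011.
admitted_2011 = {
    child
    for child, events in events_by_child.items()
    if any(e["event"] == "admission_to_care" and e["year"] == 2011 for e in events)
}
print(f"Children admitted to care in 2011: {len(admitted_2011)}")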


Despite some obstacles along the way, all this was done in a few short months. The question for the future must be whether collaborative relationships are sufficiently resilient to sustain the transition to greater transparency, public reporting and the accountability agreements that the Ontario government is now introducing.

    Key issues

A major feature of membership associations is the need to reconcile their consensual decision-making style with their commitment to raising the service quality provided by all their members. Associations are voluntary affiliations, and when put to the test members can opt out or simply change their mind at any point in the process. So voluntary affiliation can rest uncomfortably with consensual leadership and claims to champion improvement. When the stakes are high and members are withholding their consent or threatening to withdraw, a sector association may have to choose between remaining loyal to its voluntary nature and running with the willing, or becoming comfortable with a more exclusive club that makes adherence to benchmarking and improvement a condition of membership. Yet the less willing may also include the least able and poorest performers.

Data consistency is another challenge, even for the most controlling of performance models, but it is a perennial problem for voluntary benchmarking systems. When performance indicators are developed and adopted by consensus among the sector's members, the risk is that they can be constantly subject to review. At best, such review and revision may improve the measures, but it may also undermine their value and credibility if not managed well. More powerful sector players cannot be seen to exert undue influence to gain agreement to indicators on which they perform well. Every change of indicator also means a break in the time series and significant system costs. The appetite for revising and designing new measures sometimes seems insatiable.

Many factors feature in how sector-led improvement works. The experience of the Ontario child welfare sector highlights what can be achieved in collaboration with an independent Commission, supported by technical expertise and government funding. No doubt the shadow of hierarchy and the risk of more muscular, top-down performance management was never far from this project. In this sense, it may be that the distinction between sector-led improvement and top-down command and control is more tactical than scientific, a voluntary choice made in conditions not entirely of our choosing. Nonetheless, there is real value in an approach that encourages accountability and improvement, and that itself sees the value of comparison in doing so.

    Dr Wendy Thomson is Professor of

    Social Work and Social Policy at McGill

    University, Canada.


Benchmarking in a federal context

Alan Fenna

In recent years there has been an unprecedented increase in the scale, scope and significance of external assessment and benchmarking of public services. It is a transnational phenomenon with profound implications for management practices around the globe. Processes of performance assessment and benchmarking have even attracted strong interest in federal systems such as Australia and Canada. The Australian case highlights differences in approach associated with the fundamental character of governmental and public service systems, but it also highlights an interesting degree of convergence of methods across disparate jurisdictions. Much (though not all) of public service benchmarking is vertically and horizontally intergovernmental.

On the vertical axis, it involves a central government mandating and/or facilitating in some way performance reporting by regional or local governments. On the horizontal axis, it involves the implicit or explicit comparison of performance between the individual jurisdictions (regional or local governments). This potentially creates problems, but in unitary states, where sovereignty is concentrated in the national government, the central government occupies a position of superiority, both constitutional and fiscal, vis-à-vis the regional or local authorities it is directing. The question is usually less whether that direction is legitimate than whether it is effective.

In federal systems, however, the constituent units are sovereign governments in their own right, and the idea that central governments could mandate intergovernmental benchmarking of functions that lie within the jurisdiction of the constituent units is alien. Central governments may promote intergovernmental benchmarking, but little more.

    However, in centralized federations thenational government enjoys a moredirective position. Australia is one suchcase, and it is a case where performancemonitoring is now well established. Itdemonstrates the convergence thatis taking place around benchmarkingbetween a number of federal and non-federal jurisdictions.


    The States and the Commonwealth

Australia's Commonwealth government exercises considerable authority over the country's six states, primarily because of its fiscal dominance. Most service delivery responsibilities rest with the states, but the Commonwealth controls most of the major tax bases and over the years has developed a substantial appetite for policy-setting in those areas of state jurisdiction. The states rely on central government transfers for half of their revenue needs, and they are subject to wide-ranging policy direction through conditional grants. In response to the high degree of entanglement and overlap that has resulted, Australian federalism has increasingly developed a network of intergovernmental machinery whereby the two levels of government cooperate.

At the apex of that machinery is an institutionalized system of heads-of-government meetings called COAG: the Council of Australian Governments.

Australia's Report on Government Services

For most of the 20th century, there was no formalized comparison of state performance across Australia. Each state was accountable to its own citizens. The very broad push to put the Australian economy on a more open and competitive footing in the 1980s and early 1990s required the two levels of government to develop much more collaborative relations. As part of that, Australia's governments launched a comprehensive arrangement for performance reporting of service delivery by the states and territories known as ROGS, the Report on Government Services. This is an annual compendium of performance data collated and published centrally. First produced in 1995, ROGS has been published every year since, broadening, deepening and improving from iteration to iteration. Currently, it covers 14 service domains comprising 23 specific services, representing over two-thirds of total government recurrent expenditure in Australia.

The success of ROGS has been attributed to the collaborative and consensual way it was established and continues to operate. It is produced under the direction of a steering committee on which sit representatives from each of the participating governments and which is chaired independently. The Productivity Commission produces the report and plays an important role as honest broker. Unsurprisingly, part of ROGS' success has been its avoidance of aggregate performance indicators or league-table-style reporting. ROGS is primarily a tool for government, not the public. The focus is on the data, with some contextualization to make inter-jurisdictional comparison more meaningful. Champions of ROGS point to where the report seems to have helped instigate or promote broader adoption of good policy, but there is only a weak connection between the findings in ROGS and the political and policy process. There is little by way of accountability mechanisms to ensure that performance reporting becomes performance improvement.

    Outcomes Focus

ROGS initially evaluated cost-effectiveness, meaning that both efficiency and outcomes data would be required. Over time, the emphasis on outcomes has increased and equity has been added as a third criterion. At the same time, ROGS notes the continuing importance of output indicators. ROGS has benefited from the iterative nature of the process, with data quality and range improving over time.

    To help drive that improvement, ROGS

    has always followed the approach ofpublishing data even if not all jurisdictionsare participating. The ROGS experiencehas been that no jurisdiction wants to beseen as not participating and so the gapsare soon filled.
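The publishing convention is easy to illustrate. In the sketch below (with invented figures and a hypothetical indicator) every jurisdiction keeps its row in the table and a missing return is simply shown as 'na', which is roughly how ROGS exposes non-participation without ranking anyone.

# Hypothetical illustration of ROGS-style publication: all jurisdictions are
# listed, gaps are shown rather than hidden, and no ranking is produced.
# The indicator and the figures are invented for this sketch.

indicator = "Elective surgery: median waiting time (days)"
returns = {"NSW": 36, "Vic": 33, "Qld": 29, "WA": None, "SA": 38, "Tas": None}

print(indicator)
for jurisdiction in ["NSW", "Vic", "Qld", "WA", "SA", "Tas"]:
    value = returns[jurisdiction]
    cell = f"{value:>4}" if value is not None else "  na"
    print(f"  {jurisdiction:<4}{cell}")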

    But this is now old news. Recentdevelopments in Australian federalismhave built on that framework in an attemptto consolidate a regime of outcomes-focused inter-jurisdictional benchmarking.

Australia is well into a second-generation version of its intergovernmental benchmarking scheme. This has involved an escalation of the regime rather than merely revision or remodelling. The reason was a major reform of Australia's system of intergovernmental fiscal transfers in 2009, which involved a retreat from the high degree of implementation conditionality traditionally characteristic of Commonwealth grants to the states, and a shift instead to a more arm's-length outcomes accountability.


    A new body, the COAG Reform Council,

    now analyses the data to assess how statesare performing in regard to their outcomestargets as part of the intergovernmentalmachinery of co-operative federalism inAustralia.

The targets are a mixture of output and outcome desiderata. In health, for instance, targets include such predictable (and manageable) output indicators as waiting times for elective surgery and availability of aged care accommodation. However, they also include such broad outcome objectives as the incidence of selected cancers, the prevalence of overweight and obesity, and levels of risky alcohol consumption.

It is relatively early days for this new outcomes-based regime. The results are very mixed and reveal that in many areas no progress is being made. The big question is policy impact, and here utilization has been the Achilles' heel, with little evidence that governments are improving their performance in response. The new performance reporting process has certainly upped the ante, but not, it would seem, sufficiently to give the system real political traction.

    Alan Fenna is Professor of Governmentat Curtin University, Australia.



Unlocking the Black Box: Performance Evaluation Practices in China

Elaine Yi Lu

Many countries are under increasing pressure to build or sustain a system of evaluation anchored in government program performance, and this movement has already spread to many developing countries, such as India, Malaysia and Sri Lanka. China is not immune to this globalization of the performance evaluation movement.

In 2003, Guangdong Province became the first provincial government in China to experiment with performance evaluation and budgeting, but performance-oriented efforts are still in their infancy in China. Since the mid-1990s, the State Council of China has incorporated project performance evaluation into its administrative review and approval of projects. By 2008, according to the Chinese Public Administration Society, one-third of provincial governments had experimented with various models of performance evaluation. In addition, local governments are, in an unprecedented way, using performance information in their decision making. The City of Hangzhou, the capital of Zhejiang Province in the economically developed eastern part of China, for instance, demoted the director of its Drug Control Bureau because the Bureau's performance in serving the public was rated as unsatisfactory in consecutive evaluations.

An effective performance evaluation system, however, is never built overnight. Given that the basic governance structure in developing countries is often incomplete, introducing performance evaluations in these countries is especially full of challenges. What seems to be happening in China is a vibrant and fluid

    situation on the ground. The demand foran evaluation of performance evaluation inChina is high.

    The study of performance evaluations inChina is also under-developed. The subjectsfall under either program/organizationalperformance evaluations or personnelperformance evaluations. The existingresearch can be categorized as westernexperience-centered research andChinese experience-centered research.In the first group, some studies elaborateon the western experience of performanceevaluation and call for more performanceevaluations in China.


    The key questions are: what is performance

    evaluation as implemented in the westerncountries? And can it be exported toChina? On the other hand, Chineseexperience-centered research focuseson an evaluation of Chinese governmentactivities and/or provokes thoughts on howto conduct performance evaluations andmanagement in a Chinese way.

    A Case Study: Zhejiang Province

Zhejiang Province has also experimented with performance evaluation practice. Between 2006 and 2008, the Province promulgated a series of major executive orders and reports regarding performance evaluation, and Zhejiang is one of the leaders in the country in this regard. Seventeen official performance evaluation reports of budget outcomes were obtained and analyzed, and interviews were conducted with 20 government employees. Their comments were then content-analyzed to tease out the patterns in their perceptions of the effectiveness of performance evaluation and the usage of this information in decision making.

Performance evaluation in China involves at least three groups of people: the requester, the evaluatee, and the evaluator. The requesters ask for a performance evaluation to be done. In general, performance evaluations are done at the request of a finance department, a supervising department and/or a department itself (self-evaluation). The evaluatees in this sample consist of both organizations and projects. Therefore, the two most common forms of evaluation are organizational performance evaluations and project-based evaluations. Overall, 62% of the evaluators were government employees, 34% third-party evaluators (the majority being accountants) and 4% academics. A wider range of evaluators was found than expected, going beyond government employees and including the use of third parties to conduct performance evaluations, a departure from China's traditionally hierarchical evaluation system that tends to be internally driven.

The evaluation reports contain three major components: project description, evaluation results and recommendations. A rather consistent measurement structure consists of four kinds of evaluative categories: Goal Quality Assessment, Goal Obtainment, Funds Management and Financial Capacity. The second category (Goal Obtainment) is the key part. Depending on the projects being evaluated, goals consist of both operational (output) and social impact (outcome) indicators. For instance, a street overhaul project was evaluated in terms of both the kilometers of streets being overhauled (output) and the promotion of city image (outcome). In addition to results-based evaluation, the majority of the reports (93%) contained an assessment of staff, management and institutional support levels for goal fulfillment.

The reports made clear that the purpose of doing so was to gauge organizational management capacity in the hope of indicating areas for managerial improvement. Another somewhat surprising finding is that, although the assessment of citizen satisfaction is not yet a regularly utilized tool in the feedback loop of the government decision-making process, 40% of the reports used some kind of target-population feedback mechanism as part of the evaluative efforts.
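The reports' scoring rules are not reproduced here, but the four-part structure lends itself to a simple weighted composite. The sketch below applies invented weights and scores to the street-overhaul example; it is an assumption-laden illustration of the structure, not the Zhejiang methodology.

# Hypothetical illustration of the four evaluative categories described above,
# combined into a weighted composite score. Weights and scores are invented;
# the actual Zhejiang scoring rules are not reproduced here.

weights = {
    "Goal Quality Assessment": 0.15,
    "Goal Obtainment": 0.50,   # the key category: output and outcome goals
    "Funds Management": 0.20,
    "Financial Capacity": 0.15,
}

scores = {   # category scores out of 100 (invented)
    "Goal Quality Assessment": 90,
    "Goal Obtainment": 92,     # e.g. kilometers overhauled (output) and
                               # promotion of city image (outcome)
    "Funds Management": 88,
    "Financial Capacity": 95,
}

composite = sum(weights[category] * scores[category] for category in weights)
print(f"Composite evaluation score: {composite:.1f} / 100")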

Interestingly, the majority of the reports (about three-quarters) received high scores (90 points and above), and the interviews confirmed that no interviewee could identify a case where a performance evaluation scored poorly. This may indicate a degree of caution and immaturity in the instrument of performance evaluation.


    Conclusion

China is in the early stages of developing performance evaluation, in a larger context where 1) centralized governance is in place, 2) various reform initiatives are taking shape, and 3) the boundaries of scientific evaluation and the potential usefulness of performance evaluations within its political environment are unknown. These conditions are not unique to China; many developing countries are in similar situations. The literature on performance evaluation and management stresses the importance of a performance system being politically feasible and technically sound in order for it to be successfully implemented.

China's approach seems to address the former (political feasibility) by legitimizing performance evaluations through issuing executive call letters, and to enhance the latter (technical soundness) by positioning the finance department to provide substantial amounts of central guidance. Of course, the introduction of performance-related evaluations in transitional countries may be a critical contribution to building up a professional public service and developing viable government institutions, or an extra burden on already over-burdened staff and a diversion from more urgent issues. China is still in the process of finding out which it will be, but there is no doubt that a start has been made.

    Elaine Yi Lu is Associate Professor in theDepartment of Public Management, JohnJay College of Criminal Justice, the CityUniversity of New York, USA.



Improving performance across public services is a consistent aim for government and a focus for ongoing programmes of reform. How best to manage public services in order to achieve such improvement is both highly politically contentious and the subject of much academic debate and research.

Various models of governance have been proposed, implemented and evaluated, but it is often difficult to distinguish precisely which aspects of such models have had a positive impact on performance, for two reasons: the usual lack of a control

