
ELLIS HOROWITZ

SARTAJ SAHNI

SANGUTHEVAR RAJASEKARAN


COMPUTER ALGORITHMS


COMPUTER SCIENCE PRESS

Alfred V. Aho, Columbia University
Jeffrey D. Ullman, Stanford University
Foundations of Computer Science: Pascal Edition
Foundations of Computer Science: C Edition

Michael J. Clancy, University of California at Berkeley
Marcia C. Linn, University of California at Berkeley
Designing Pascal Solutions: A Case Study Approach
Designing Pascal Solutions: Case Studies Using Data Structures

A. K. Dewdney, University of Western Ontario
The New Turing Omnibus: 66 Excursions in Computer Science
Introductory Computer Science: Bits of Theory, Bytes of Practice

Robert Floyd, Stanford University
Richard Beigel, Yale University
The Language of Machines: An Introduction to Computability and Formal Languages

Michael R. Garey, Bell Laboratories
David S. Johnson, Bell Laboratories
Computers and Intractability: A Guide to the Theory of NP-Completeness

Judith L. Gersting, University of Hawaii at Hilo
Mathematical Structures for Computer Science, Third Edition
Visual Basic Programming: A Laboratory Approach

Ellis Horowitz, University of Southern California
Sartaj Sahni, University of Florida
Fundamentals of Data Structures in Pascal, Fourth Edition

Ellis Horowitz, University of Southern California
Sartaj Sahni, University of Florida
Susan Anderson-Freed, Illinois Wesleyan University
Fundamentals of Data Structures in C

Ellis Horowitz, University of Southern California
Sartaj Sahni, University of Florida
Dinesh Mehta, University of Tennessee Space Institute
Fundamentals of Data Structures in C++

Ellis Horowitz, University of Southern California
Sartaj Sahni, University of Florida
Sanguthevar Rajasekaran, University of Florida
Computer Algorithms

Ellis Horowitz, University of Southern California
Sartaj Sahni, University of Florida
Sanguthevar Rajasekaran, University of Florida
Computer Algorithms/C++

Thomas W. Parsons, Hofstra University
Introduction to Compiler Construction

Gregory J. E. Rawlins, Indiana University
Compared to What?: An Introduction to the Analysis of Algorithms

Wei-Min Shen, Microelectronics and Computer Technology Corporation
Autonomous Learning from the Environment

James A. Storer, Brandeis University
Data Compression: Methods and Theory

Steven Tanimoto, University of Washington
Elements of Artificial Intelligence Using Common Lisp, Second Edition

Kim W. Tracy, Bell Labs/Lucent Technologies, Inc.
Peter Bouthoorn, Groningen University
Object-Oriented Artificial Intelligence Using C++

Jeffrey D. Ullman, Stanford University
Principles of Database and Knowledge-Base Systems, Vol I: Classical Database Systems
Principles of Database and Knowledge-Base Systems, Vol II: The New Technologies


COMPUTER ALGORITHMS

Ellis Horowitz
University of Southern California

Sartaj Sahni
University of Florida

Sanguthevar Rajasekaran
University of Florida

Computer Science Press
An imprint of W. H. Freeman and Company
New York


Acquisitions Editor: Richard Bonacci
Project Editor: Penelope Hull
Text Designer: The Authors
Text Illustrations: The Authors
Cover Designer: A Good Thing
Cover Illustration: Tomek Olbinski
Production Coordinator: Sheila Anderson
Composition: The Authors
Manufacturing: R. R. Donnelley & Sons Company

Library of Congress Cataloging-in-Publication Data

Horowitz, Ellis.
    Computer algorithms / Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran.
        p. cm.
    Includes bibliographical references and index.
    ISBN 0-7167-8316-9
    1. Computer algorithms. 2. Pseudocode (Computer program language).
    I. Sahni, Sartaj. II. Rajasekaran, Sanguthevar. III. Title.
    QA76.9.A43H67 1998
    005.1--dc21                                        97-20318
                                                       CIP

© 1998 by W. H. Freeman and Company. All rights reserved. No part of this book may be reproduced by any mechanical, photographic, or electronic process, or in the form of a phonographic recording, nor may it be stored in a retrieval system, transmitted, or otherwise copied for public or private use, without written permission from the publisher.

Printed in the United States of America
First printing, 1997

Computer Science Press
An imprint of W. H. Freeman and Company
41 Madison Avenue, New York, New York 10010
Houndmills, Basingstoke RG21 6XS, England


To my nuclear family,

MATiyAAfAfE, VTPI,CHAMOCH,andlKA

- Ellis Horowitz

To,

MEETA, AGAM, NEW A, and V ATIAM

- Sartaj Sahni

To,

KEETIAN, KTIXSUMA,VANVX, and VOMMUTHAX

- Sanguthevar Rajasekaran


Contents

PREFACE xv

1 INTRODUCTION 1
1.1 WHAT IS AN ALGORITHM? 1
1.2 ALGORITHM SPECIFICATION 5
1.2.1 Pseudocode Conventions 5
1.2.2 Recursive Algorithms 10
1.3 PERFORMANCE ANALYSIS 14
1.3.1 Space Complexity 15
1.3.2 Time Complexity 18
1.3.3 Asymptotic Notation (O, Ω, Θ) 29
1.3.4 Practical Complexities 37
1.3.5 Performance Measurement 40
1.4 RANDOMIZED ALGORITHMS 53
1.4.1 Basics of Probability Theory 53
1.4.2 Randomized Algorithms: An Informal Description 57
1.4.3 Identifying the Repeated Element 59
1.4.4 Primality Testing 61
1.4.5 Advantages and Disadvantages 65
1.5 REFERENCES AND READINGS 68

2 ELEMENTARY DATA STRUCTURES 69
2.1 STACKS AND QUEUES 69
2.2 TREES 76
2.2.1 Terminology 77
2.2.2 Binary Trees 78
2.3 DICTIONARIES 81
2.3.1 Binary Search Trees 83
2.3.2 Cost Amortization 89
2.4 PRIORITY QUEUES 91
2.4.1 Heaps 92
2.4.2 Heapsort 99
2.5 SETS AND DISJOINT SET UNION 101
2.5.1 Introduction 101
2.5.2 Union and Find Operations 102
2.6 GRAPHS 112
2.6.1 Introduction 112
2.6.2 Definitions 112
2.6.3 Graph Representations 118
2.7 REFERENCES AND READINGS 126

3 DIVIDE-AND-CONQUER 127
3.1 GENERAL METHOD 127
3.2 BINARY SEARCH 131
3.3 FINDING THE MAXIMUM AND MINIMUM 139
3.4 MERGE SORT 145
3.5 QUICKSORT 154
3.5.1 Performance Measurement 159
3.5.2 Randomized Sorting Algorithms 159
3.6 SELECTION 165
3.6.1 A Worst-Case Optimal Algorithm 169
3.6.2 Implementation of Select2 172
3.7 STRASSEN'S MATRIX MULTIPLICATION 179
3.8 CONVEX HULL 183
3.8.1 Some Geometric Primitives 184
3.8.2 The QuickHull Algorithm 185
3.8.3 Graham's Scan 187
3.8.4 An O(n log n) Divide-and-Conquer Algorithm 188
3.9 REFERENCES AND READINGS 193
3.10 ADDITIONAL EXERCISES 194

4 THE GREEDY METHOD 197
4.1 THE GENERAL METHOD 197
4.2 KNAPSACK PROBLEM 198
4.3 TREE VERTEX SPLITTING 203
4.4 JOB SEQUENCING WITH DEADLINES 208
4.5 MINIMUM-COST SPANNING TREES 216
4.5.1 Prim's Algorithm 218
4.5.2 Kruskal's Algorithm 220
4.5.3 An Optimal Randomized Algorithm (*) 225
4.6 OPTIMAL STORAGE ON TAPES 229
4.7 OPTIMAL MERGE PATTERNS 234
4.8 SINGLE-SOURCE SHORTEST PATHS 241
4.9 REFERENCES AND READINGS 249
4.10 ADDITIONAL EXERCISES 250

5 DYNAMIC PROGRAMMING 253
5.1 THE GENERAL METHOD 253
5.2 MULTISTAGE GRAPHS 257
5.3 ALL PAIRS SHORTEST PATHS 265
5.4 SINGLE-SOURCE SHORTEST PATHS: GENERAL WEIGHTS 270
5.5 OPTIMAL BINARY SEARCH TREES (*) 275
5.6 STRING EDITING 284
5.7 0/1-KNAPSACK 287
5.8 RELIABILITY DESIGN 295
5.9 THE TRAVELING SALESPERSON PROBLEM 298
5.10 FLOW SHOP SCHEDULING 301
5.11 REFERENCES AND READINGS 307
5.12 ADDITIONAL EXERCISES 308

6 BASIC TRAVERSAL AND SEARCH TECHNIQUES 313
6.1 TECHNIQUES FOR BINARY TREES 313
6.2 TECHNIQUES FOR GRAPHS 318
6.2.1 Breadth First Search and Traversal 320
6.2.2 Depth First Search and Traversal 323
6.3 CONNECTED COMPONENTS AND SPANNING TREES 325
6.4 BICONNECTED COMPONENTS AND DFS 329
6.5 REFERENCES AND READINGS 338

7 BACKTRACKING 339
7.1 THE GENERAL METHOD 339
7.2 THE 8-QUEENS PROBLEM 353
7.3 SUM OF SUBSETS 357
7.4 GRAPH COLORING 360
7.5 HAMILTONIAN CYCLES 364
7.6 KNAPSACK PROBLEM 368
7.7 REFERENCES AND READINGS 374
7.8 ADDITIONAL EXERCISES 375

8 BRANCH-AND-BOUND 379
8.1 THE METHOD 379
8.1.1 Least Cost (LC) Search 380
8.1.2 The 15-puzzle: An Example 382
8.1.3 Control Abstractions for LC-Search 386
8.1.4 Bounding 388
8.1.5 FIFO Branch-and-Bound 391
8.1.6 LC Branch-and-Bound 392
8.2 0/1 KNAPSACK PROBLEM 393
8.2.1 LC Branch-and-Bound Solution 394
8.2.2 FIFO Branch-and-Bound Solution 397
8.3 TRAVELING SALESPERSON (*) 403
8.4 EFFICIENCY CONSIDERATIONS 412
8.5 REFERENCES AND READINGS 416

9 ALGEBRAIC PROBLEMS 417
9.1 THE GENERAL METHOD 417
9.2 EVALUATION AND INTERPOLATION 420
9.3 THE FAST FOURIER TRANSFORM 430
9.3.1 An In-place Version of the FFT 435
9.3.2 Some Remaining Points 438
9.4 MODULAR ARITHMETIC 440
9.5 EVEN FASTER EVALUATION AND INTERPOLATION 448
9.6 REFERENCES AND READINGS 456

10 LOWER BOUND THEORY 457
10.1 COMPARISON TREES 458
10.1.1 Ordered Searching 459
10.1.2 Sorting 459
10.1.3 Selection 464
10.2 ORACLES AND ADVERSARY ARGUMENTS 466
10.2.1 Merging 467
10.2.2 Largest and Second Largest 468
10.2.3 State Space Method 470
10.2.4 Selection 471
10.3 LOWER BOUNDS THROUGH REDUCTIONS 474
10.3.1 Finding the Convex Hull 475
10.3.2 Disjoint Sets Problem 475
10.3.3 On-line Median Finding 477
10.3.4 Multiplying Triangular Matrices 477
10.3.5 Inverting a Lower Triangular Matrix 478
10.3.6 Computing the Transitive Closure 480
10.4 TECHNIQUES FOR ALGEBRAIC PROBLEMS (*) 484
10.5 REFERENCES AND READINGS 494

11 NP-HARD AND NP-COMPLETE PROBLEMS 495
11.1 BASIC CONCEPTS 495
11.1.1 Nondeterministic Algorithms 496
11.1.2 The classes NP-hard and NP-complete 504
11.2 COOK'S THEOREM (*) 508
11.3 NP-HARD GRAPH PROBLEMS 517
11.3.1 Clique Decision Problem (CDP) 518
11.3.2 Node Cover Decision Problem 519
11.3.3 Chromatic Number Decision Problem (CNDP) 521
11.3.4 Directed Hamiltonian Cycle (DHC) (*) 522
11.3.5 Traveling Salesperson Decision Problem (TSP) 525
11.3.6 AND/OR Graph Decision Problem (AOG) 526
11.4 NP-HARD SCHEDULING PROBLEMS 533
11.4.1 Scheduling Identical Processors 534
11.4.2 Flow Shop Scheduling 536
11.4.3 Job Shop Scheduling 538
11.5 NP-HARD CODE GENERATION PROBLEMS 540
11.5.1 Code Generation With Common Subexpressions 542
11.5.2 Implementing Parallel Assignment Instructions 546
11.6 SOME SIMPLIFIED NP-HARD PROBLEMS 550
11.7 REFERENCES AND READINGS 553
11.8 ADDITIONAL EXERCISES 553

12 APPROXIMATION ALGORITHMS 557
12.1 INTRODUCTION 557
12.2 ABSOLUTE APPROXIMATIONS 561
12.2.1 Planar Graph Coloring 561
12.2.2 Maximum Programs Stored Problem 562
12.2.3 NP-hard Absolute Approximations 563
12.3 ε-APPROXIMATIONS 566
12.3.1 Scheduling Independent Tasks 566
12.3.2 Bin Packing 569
12.3.3 NP-hard ε-Approximation Problems 572
12.4 POLYNOMIAL TIME APPROXIMATION SCHEMES 579
12.4.1 Scheduling Independent Tasks 579
12.4.2 0/1 Knapsack 580
12.5 FULLY POLYNOMIAL TIME APPROXIMATION SCHEMES 585
12.5.1 Rounding 587
12.5.2 Interval Partitioning 591
12.5.3 Separation 592
12.6 PROBABILISTICALLY GOOD ALGORITHMS (*) 596
12.7 REFERENCES AND READINGS 599
12.8 ADDITIONAL EXERCISES 600

13 PRAM ALGORITHMS 605
13.1 INTRODUCTION 605
13.2 COMPUTATIONAL MODEL 608
13.3 FUNDAMENTAL TECHNIQUES AND ALGORITHMS 615
13.3.1 Prefix Computation 615
13.3.2 List Ranking 618
13.4 SELECTION 627
13.4.1 Maximal Selection With n^2 Processors 627
13.4.2 Finding the Maximum Using n Processors 628
13.4.3 Maximal Selection Among Integers 629
13.4.4 General Selection Using n^2 Processors 632
13.4.5 A Work-Optimal Randomized Algorithm (*) 632
13.5 MERGING 636
13.5.1 A Logarithmic Time Algorithm 636
13.5.2 Odd-Even Merge 637
13.5.3 A Work-Optimal Algorithm 640
13.5.4 An O(log log m)-Time Algorithm 641
13.6 SORTING 643
13.6.1 Odd-Even Merge Sort 643
13.6.2 An Alternative Randomized Algorithm 644
13.6.3 Preparata's Algorithm 645
13.6.4 Reischuk's Randomized Algorithm (*) 647
13.7 GRAPH PROBLEMS 651
13.7.1 An Alternative Algorithm for Transitive Closure 654
13.7.2 All-Pairs Shortest Paths 655
13.8 COMPUTING THE CONVEX HULL 656
13.9 LOWER BOUNDS 659
13.9.1 A lower bound on average case sorting 660
13.9.2 Finding the maximum 662
13.10 REFERENCES AND READINGS 663
13.11 ADDITIONAL EXERCISES 665

14 MESH ALGORITHMS 667
14.1 COMPUTATIONAL MODEL 667
14.2 PACKET ROUTING 669
14.2.1 Packet Routing on a Linear Array 670
14.2.2 A Greedy Algorithm for PPR on a Mesh 674
14.2.3 A Randomized Algorithm With Small Queues 676
14.3 FUNDAMENTAL ALGORITHMS 679
14.3.1 Broadcasting 681
14.3.2 Prefix Computation 681
14.3.3 Data Concentration 685
14.3.4 Sparse Enumeration Sort 686
14.4 SELECTION 691
14.4.1 A Randomized Algorithm for n = p (*) 691
14.4.2 Randomized Selection For n > p (*) 692
14.4.3 A Deterministic Algorithm For n > p 692
14.5 MERGING 698
14.5.1 Rank Merge on a Linear Array 698
14.5.2 Odd-Even Merge on a Linear Array 699
14.5.3 Odd-Even Merge on a Mesh 699
14.6 SORTING 701
14.6.1 Sorting on a Linear Array 701
14.6.2 Sorting on a Mesh 703
14.7 GRAPH PROBLEMS 708
14.7.1 An n x n Mesh Algorithm for Transitive Closure 710
14.7.2 All Pairs Shortest Paths 711
14.8 COMPUTING THE CONVEX HULL 713
14.9 REFERENCES AND READINGS 718
14.10 ADDITIONAL EXERCISES 719

15 HYPERCUBE ALGORITHMS 723
15.1 COMPUTATIONAL MODEL 723
15.1.1 The Hypercube 723
15.1.2 The Butterfly Network 726
15.1.3 Embedding Of Other Networks 727
15.2 PPR ROUTING 732
15.2.1 A Greedy Algorithm 732
15.2.2 A Randomized Algorithm 733
15.3 FUNDAMENTAL ALGORITHMS 736
15.3.1 Broadcasting 737
15.3.2 Prefix Computation 737
15.3.3 Data Concentration 739
15.3.4 Sparse Enumeration Sort 742
15.4 SELECTION 744
15.4.1 A Randomized Algorithm for n = p (*) 744
15.4.2 Randomized Selection For n > p (*) 745
15.4.3 A Deterministic Algorithm For n > p 745
15.5 MERGING 748
15.5.1 Odd-Even Merge 748
15.5.2 Bitonic Merge 750
15.6 SORTING 752
15.6.1 Odd-Even Merge Sort 752
15.6.2 Bitonic Sort 752
15.7 GRAPH PROBLEMS 753
15.8 COMPUTING THE CONVEX HULL 755
15.9 REFERENCES AND READINGS 757
15.10 ADDITIONAL EXERCISES 758

INDEX 761


PREFACE

If we try to identify those contributions of computer science which will be long lasting, surely one of these will be the refinement of the concept called algorithm. Ever since man invented the idea of a machine which could perform basic mathematical operations, the study of what can be computed and how it can be done well was launched. This study, inspired by the computer, has led to the discovery of many important algorithms and design methods. The discipline called computer science has embraced the study of algorithms as its own. It is the purpose of this book to organize what is known about them in a coherent fashion so that students and practitioners can learn to devise and analyze new algorithms for themselves.

A book which contains every algorithm ever invented would be exceedingly large. Traditionally, algorithms books proceeded by examining only a small number of problem areas in depth. For each specific problem the most efficient algorithm for its solution is usually presented and analyzed. This approach has one major flaw. Though the student sees many fast algorithms and may master the tools of analysis, she/he remains unconfident about how to devise good algorithms in the first place.

The missing ingredient is a lack of emphasis on design techniques. A knowledge of design will certainly help one to create good algorithms, yet without the tools of analysis there is no way to determine the quality of the result. This observation that design should be taught on a par with analysis led us to a more promising line of approach: namely to organize this book around some fundamental strategies of algorithm design. The number of basic design strategies is reasonably small. Moreover all of the algorithms one would typically wish to study can easily be fit into these categories; for example, mergesort and quicksort are perfect examples of the divide-and-conquer strategy while Kruskal's minimum spanning tree algorithm and Dijkstra's single source shortest path algorithm are straightforward examples of the greedy strategy. An understanding of these strategies is an essential first step towards acquiring the skills of design.

Though we strongly feel that the emphasis on design as well as analysis is the appropriate way to organize the study of algorithms, a cautionary remark is in order. First, we have not included every known design principle.



One example is linear programming, which is one of the most successful techniques, but is often discussed in a course of its own. Secondly, the student should be inhibited from taking a cookbook approach to algorithm design by assuming that each algorithm must derive from only a single technique. This is not so.

A major portion of this book, Chapters 3 through 9, deals with the different design strategies. First each strategy is described in general terms. Typically a "program abstraction" is given which outlines the form that the computation will take if this strategy can be applied. Following this there are a succession of examples which reveal the intricacies and varieties of the general strategy. The examples are somewhat loosely ordered in terms of increasing complexity. The type of complexity may arise in several ways. Usually we begin with a problem which is very simple to understand and requires no data structures other than a one-dimensional array. For this problem it is usually obvious that the design strategy yields a correct solution. Later examples may require a proof that an algorithm based on this design technique does work. Or, the later algorithms may require more sophisticated data structures (e.g., trees or graphs) and their analyses may be more complex. The major goal of this organization is to emphasize the arts of synthesis and analysis of algorithms. Auxiliary goals are to expose the student to good program structure and to proofs of algorithm correctness.

The algorithms in this book are presented in a pseudocode that resembles C and Pascal. Section 1.2.1 describes the pseudocode conventions. Executable versions (in C++) of many of these algorithms can be found in our home page. Most of the algorithms presented in this book are short and the language constructs used to describe them are simple enough that anyone can understand. Chapters 13, 14, and 15 deal with parallel computing.

Another special feature of this book is that we cover the area of randomized algorithms extensively. Many of the algorithms discussed in Chapters 13, 14, and 15 are randomized. Some randomized algorithms are presented in the other chapters as well. An introductory one-quarter course on parallel algorithms might cover Chapters 13, 14, and 15 and perhaps some minimal additional material.

We have identified certain sections of the text (indicated with (*)) that are more suitable for advanced courses. We view the material presented in this book as ideal for a one semester or two quarter course given to juniors, seniors, or graduate students. It does require prior experience with programming in a higher level language but everything else is self-contained. Practically speaking, it seems that a course on data structures is helpful, if only for the fact that the students have greater programming maturity. For a school on the quarter system, the first quarter might cover the basic design techniques as given in Chapters 3 through 9: divide-and-conquer, the greedy method, dynamic programming, search and traversal, backtracking, branch-and-bound, and algebraic methods (see TABLE I). The second quarter would cover Chapters 10 through 15: lower bound theory, NP-completeness and approximation methods, PRAM algorithms, Mesh algorithms, and Hypercube algorithms (see TABLE II).

Subject                              Reading
Introduction                         1.1 to 1.3
Introduction                         1.4
Data structures                      2.1, 2.2
Data structures                      2.3 to 2.6
Divide-and-conquer                   Chapter 3; Assignment I due
The greedy method                    Chapter 4; Exam I
Dynamic programming                  Chapter 5
Search and traversal techniques      Chapter 6; Assignment II due
Backtracking                         Chapter 7
Branch-and-bound                     Chapter 8
Algebraic methods                    Chapter 9; Assignment III due; Exam II

TABLE I: FIRST QUARTER (weeks 1 through 10, in order)

For a semester schedule where the student has not been exposed to data structures and O-notation, Chapters 1 through 7, 11, and 13 is about the right amount of material (see TABLE III).

A more rigorous pace would cover Chapters 1 to 7, 11, 13, and 14 (see TABLE IV).

An advanced course, for those who have prior knowledge about data structures and O notation, might consist of Chapters 3 to 11, and 13 to 15 (see TABLE V).

Programs for most of the algorithms given in this book are available from the following URL: http://www.cise.ufl.edu/~raj/BOOK.html. Please send your comments to [email protected].

For homework there are numerous exercises at the end of each chapter. The most popular and instructive homework assignment we have found is one which requires the student to execute and time two programs using the same data sets. Since most of the algorithms in this book provide all the implementation details, they can be easily made use of. Translating these algorithms into any programming language should be easy. The problem then reduces to devising suitable data sets and obtaining timing results. The timing results should agree with the asymptotic analysis that was done


Subject                              Reading
Lower bound theory                   10.1 to 10.3
Lower bound theory                   10.4
NP-complete and NP-hard problems     11.1, 11.2
NP-complete and NP-hard problems     11.3, 11.4
NP-complete and NP-hard problems     11.5, 11.6
Approximation algorithms             12.1, 12.2; Assignment I due
Approximation algorithms             12.3 to 12.6; Exam I
PRAM algorithms                      13.1 to 13.4
PRAM algorithms                      13.5 to 13.9; Assignment II due
Mesh algorithms                      14.1 to 14.5
Mesh algorithms                      14.6 to 14.8
Hypercube algorithms                 15.1 to 15.3
Hypercube algorithms                 15.4 to 15.8; Assignment III due; Exam II

TABLE II: SECOND QUARTER (weeks 1 through 10, in order)


Subject                              Reading
Introduction                         1.1 to 1.3
Introduction                         1.4
Data structures                      2.1, 2.2
Data structures                      2.3 to 2.6
Divide-and-conquer                   3.1 to 3.4; Assignment I due
Divide-and-conquer                   3.5 to 3.7; Exam I
The greedy method                    4.1 to 4.4
The greedy method                    4.5 to 4.7; Assignment II due
Dynamic programming                  5.1 to 5.5
Dynamic programming                  5.6 to 5.10
Search and traversal                 6.1 to 6.4; Assignment III due; Exam II
Backtracking                         7.1 to 7.3
Backtracking                         7.4 to 7.6
NP-complete and NP-hard problems     11.1 to 11.3; Assignment IV due
NP-complete and NP-hard problems     11.4 to 11.6
PRAM algorithms                      13.1 to 13.4
PRAM algorithms                      13.5 to 13.9; Assignment V due; Exam III

TABLE III: SEMESTER - Medium pace (no prior exposure) (weeks 1 through 16, in order)


Subject                              Reading
Introduction                         1.1 to 1.3
Introduction                         1.4
Data structures                      2.1, 2.2
Data structures                      2.3 to 2.6
Divide-and-conquer                   3.1 to 3.5; Assignment I due
Divide-and-conquer                   3.6 to 3.7
The greedy method                    4.1 to 4.3; Exam I
The greedy method                    4.4 to 4.7
Dynamic programming                  5.1 to 5.7; Assignment II due
Dynamic programming                  5.8 to 5.10
Search and traversal techniques      6.1 to 6.2
Search and traversal techniques      6.3, 6.4
Backtracking                         7.1, 7.2
Backtracking                         7.3 to 7.6; Assignment III due; Exam II
NP-hard and NP-complete problems     11.1 to 11.3
NP-hard and NP-complete problems     11.4 to 11.6
PRAM algorithms                      13.1 to 13.4; Assignment IV due
PRAM algorithms                      13.5 to 13.9
Mesh algorithms                      14.1 to 14.3
Mesh algorithms                      14.4 to 14.8; Assignment V due; Exam III

TABLE IV: SEMESTER - Rigorous pace (no prior exposure) (weeks 1 through 16, in order)


Subject                              Reading
Divide-and-conquer                   3.1 to 3.5
Divide-and-conquer                   3.6, 3.7
The greedy method                    4.1 to 4.3
The greedy method                    4.4 to 4.7
Dynamic programming                  Chapter 5; Assignment I due
Search and traversal techniques      Chapter 6; Exam I
Backtracking                         Chapter 7
Branch-and-bound                     Chapter 8; Assignment II due
Algebraic methods                    Chapter 9
Lower bound theory                   Chapter 10
NP-complete and NP-hard problems     11.1 to 11.3; Exam II; Assignment III
NP-complete and NP-hard problems     11.4 to 11.6
PRAM algorithms                      13.1 to 13.4
PRAM algorithms                      13.5 to 13.9; Assignment IV due
Mesh algorithms                      14.1 to 14.5
Mesh algorithms                      14.6 to 14.8
Hypercube algorithms                 15.1 to 15.3
Hypercube algorithms                 15.4 to 15.8; Assignment V due; Exam III

TABLE V: SEMESTER - Advanced course (rigorous pace) (weeks 1 through 16, in order)


for the algorithm. This is a nontrivial task which can be both educational and fun. Most importantly it emphasizes an aspect of this field that is often neglected, that there is an experimental side to the practice of algorithms.

Acknowledgements

We are grateful to Martin J. Biernat, Jeff Jenness, Saleem Khan, Ming-Yang Kao, Douglas M. Campbell, and Stephen P. Leach for their critical comments which have immensely enhanced our presentation. We are thankful to the students at UF for pointing out mistakes in earlier versions. We are also thankful to Teo Gonzalez, Danny Krizanc, and David Wei who carefully read portions of this book.

Ellis Horowitz
Sartaj Sahni
Sanguthevar Rajasekaran
June, 1997


Chapter 1

INTRODUCTION

1.1 WHAT IS AN ALGORITHM?

The word algorithm comes from the name of a Persian author, Abu Ja'far Mohammed ibn Musa al Khowarizmi (c. 825 A.D.), who wrote a textbook on mathematics. This word has taken on a special significance in computer science, where "algorithm" has come to refer to a method that can be used by a computer for the solution of a problem. This is what makes algorithm different from words such as process, technique, or method.

Definition 1.1 [Algorithm]: An algorithm is a finite set of instructions that, if followed, accomplishes a particular task. In addition, all algorithms must satisfy the following criteria:

1. Input. Zero or more quantities are externally supplied.
2. Output. At least one quantity is produced.
3. Definiteness. Each instruction is clear and unambiguous.
4. Finiteness. If we trace out the instructions of an algorithm, then for all cases, the algorithm terminates after a finite number of steps.
5. Effectiveness. Every instruction must be very basic so that it can be carried out, in principle, by a person using only pencil and paper. It is not enough that each operation be definite as in criterion 3; it also must be feasible. □

An algorithm is composed of a finite set of steps, each of which may require one or more operations. The possibility of a computer carrying out these operations necessitates that certain constraints be placed on the type of operations an algorithm can include.


Criteria 1 and 2 require that an algorithm produce one or more outputs and have zero or more inputs that are externally supplied. According to criterion 3, each operation must be definite, meaning that it must be perfectly clear what should be done. Directions such as "add 6 or 7 to x" or "compute 5/0" are not permitted because it is not clear which of the two possibilities should be done or what the result is.

The fourth criterion for algorithms we assume in this book is that they terminate after a finite number of operations. A related consideration is that the time for termination should be reasonably short. For example, an algorithm could be devised that decides whether any given position in the game of chess is a winning position. The algorithm works by examining all possible moves and countermoves that could be made from the starting position. The difficulty with this algorithm is that even using the most modern computers, it may take billions of years to make the decision. We must be very concerned with analyzing the efficiency of each of our algorithms.

Criterion 5 requires that each operation be effective; each step must be such that it can, at least in principle, be done by a person using pencil and paper in a finite amount of time. Performing arithmetic on integers is an example of an effective operation, but arithmetic with real numbers is not, since some values may be expressible only by infinitely long decimal expansion. Adding two such numbers would violate the effectiveness property.

Algorithms that are definite and effective are also called computational procedures. One important example of computational procedures is the operating system of a digital computer. This procedure is designed to control the execution of jobs, in such a way that when no jobs are available, it does not terminate but continues in a waiting state until a new job is entered. Though computational procedures include important examples such as this one, we restrict our study to computational procedures that always terminate.

To help us achieve the criterion of definiteness, algorithms are written in a programming language. Such languages are designed so that each legitimate sentence has a unique meaning. A program is the expression of an algorithm in a programming language. Sometimes words such as procedure, function, and subroutine are used synonymously for program. Most readers of this book have probably already programmed and run some algorithms on a computer. This is desirable because before you study a concept in general, it helps if you had some practical experience with it. Perhaps you had some difficulty getting started in formulating an initial solution to a problem, or perhaps you were unable to decide which of two algorithms was better. The goal of this book is to teach you how to make these decisions.

The study of algorithms includes many important and active areas of research. There are four distinct areas of study one can identify:

1. How to devise algorithms - Creating an algorithm is an art which may never be fully automated. A major goal of this book is to study various design techniques that have proven to be useful in that they have often yielded good algorithms. By mastering these design strategies, it will become easier for you to devise new and useful algorithms. Many of the chapters of this book are organized around what we believe are the major methods of algorithm design. The reader may now wish to glance back at the table of contents to see what these methods are called. Some of these techniques may already be familiar, and some have been found to be so useful that books have been written about them. Dynamic programming is one such technique. Some of the techniques are especially useful in fields other than computer science such as operations research and electrical engineering. In this book we can only hope to give an introduction to these many approaches to algorithm formulation. All of the approaches we consider have applications in a variety of areas including computer science. But some important design techniques such as linear, nonlinear, and integer programming are not covered here as they are traditionally covered in other courses.

2. How to validate algorithms - Once an algorithm is devised, it is necessary to show that it computes the correct answer for all possible legal inputs. We refer to this process as algorithm validation. The algorithm need not as yet be expressed as a program. It is sufficient to state it in any precise way. The purpose of the validation is to assure us that this algorithm will work correctly independently of the issues concerning the programming language it will eventually be written in. Once the validity of the method has been shown, a program can be written and a second phase begins. This phase is referred to as program proving or sometimes as program verification. A proof of correctness requires that the solution be stated in two forms. One form is usually as a program which is annotated by a set of assertions about the input and output variables of the program. These assertions are often expressed in the predicate calculus. The second form is called a specification, and this may also be expressed in the predicate calculus. A proof consists of showing that these two forms are equivalent in that for every given legal input, they describe the same output. A complete proof of program correctness requires that each statement of the programming language be precisely defined and all basic operations be proved correct. All these details may cause a proof to be very much longer than the program.

3. How to analyze algorithms - This field of study is called analysis of algorithms. As an algorithm is executed, it uses the computer's central processing unit (CPU) to perform operations and its memory (both immediate and auxiliary) to hold the program and data. Analysis of algorithms or performance analysis refers to the task of determining how much computing time and storage an algorithm requires. This is a challenging area which sometimes requires great mathematical skill. An important result of this study is that it allows you to make quantitative judgments about the value of one algorithm over another. Another result is that it allows you to predict whether the software will meet any efficiency constraints that exist. Questions such as how well does an algorithm perform in the best case, in the worst case, or on the average are typical. For each algorithm in the text, an analysis is also given. Analysis is more fully described in Section 1.3.2.

4. How to test a program - Testing a program consists of two phases: debugging and profiling (or performance measurement). Debugging is the process of executing programs on sample data sets to determine whether faulty results occur and, if so, to correct them. However, as E. Dijkstra has pointed out, "debugging can only point to the presence of errors, but not to their absence." In cases in which we cannot verify the correctness of output on sample data, the following strategy can be employed: let more than one programmer develop programs for the same problem, and compare the outputs produced by these programs. If the outputs match, then there is a good chance that they are correct. A proof of correctness is much more valuable than a thousand tests (if that proof is correct), since it guarantees that the program will work correctly for all possible inputs. Profiling or performance measurement is the process of executing a correct program on data sets and measuring the time and space it takes to compute the results. These timing figures are useful in that they may confirm a previously done analysis and point out logical places to perform useful optimization. A description of the measurement of timing complexity can be found in Section 1.3.5. For some of the algorithms presented here, we show how to devise a range of data sets that will be useful for debugging and profiling.

These four categories serve to outline the questions we ask about algorithms throughout this book. As we can't hope to cover all these subjects completely, we content ourselves with concentrating on design and analysis, spending less time on program construction and correctness.

EXERCISES

1. Look up the words algorism and algorithm in your dictionary and write down their meanings.

2. The name al-Khowarizmi (algorithm) literally means "from the town of Khowarazm." This city is now known as Khiva, and is located in Uzbekistan. See if you can find this country in an atlas.

3. Use the WEB to find out more about al-Khowarizmi, e.g., his dates, a picture, or a stamp.


1.2 ALGORITHM SPECIFICATION

1.2.1 Pseudocode Conventions

In computational theory, we distinguish between an algorithm and a program. The latter does not have to satisfy the finiteness condition. For example, we can think of an operating system that continues in a "wait" loop until more jobs are entered. Such a program does not terminate unless the system crashes. Since our programs always terminate, we use "algorithm" and "program" interchangeably in this text.

We can describe an algorithm in many ways. We can use a natural language like English, although if we select this option, we must make sure that the resulting instructions are definite. Graphic representations called flowcharts are another possibility, but they work well only if the algorithm is small and simple. In this text we present most of our algorithms using a pseudocode that resembles C and Pascal.

1. Comments begin with // and continue until the end of line.

2. Blocks are indicated with matching braces: { and }. A compound statement (i.e., a collection of simple statements) can be represented as a block. The body of a procedure also forms a block. Statements are delimited by ;.

3. An identifier begins with a letter. The data types of variables are not explicitly declared. The types will be clear from the context. Whether a variable is global or local to a procedure will also be evident from the context. We assume simple data types such as integer, float, char, boolean, and so on. Compound data types can be formed with records. Here is an example:

    node = record
    {
        datatype_1 data_1;
        ...
        datatype_n data_n;
        node *link;
    }

In this example, link is a pointer to the record type node. Individual data items of a record can be accessed with -> and period. For instance if p points to a record of type node, p -> data_1 stands for the value of the first field in the record. On the other hand, if q is a record of type node, q.data_1 will denote its first field.
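For readers who want to see the record notation in a concrete language, the following is a rough C++ counterpart (a sketch only, not code from the book; the field names data_1 and data_n are the placeholder names used above, and their types are chosen arbitrarily):

    #include <iostream>

    // C++ struct playing the role of the pseudocode record "node".
    struct node {
        int   data_1;   // stands in for datatype_1 data_1
        float data_n;   // stands in for datatype_n data_n
        node *link;     // pointer to another record of type node
    };

    int main() {
        node q{10, 2.5f, nullptr};   // q is a record of type node
        node *p = &q;                // p points to a record of type node
        // p->data_1 and q.data_1 name the same field, mirroring p -> data_1 and q.data_1 above.
        std::cout << p->data_1 << ' ' << q.data_1 << '\n';
        return 0;
    }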


4. Assignment of values to variables is done using the assignment statement

    (variable) := (expression);

5. There are two boolean values true and false. In order to produce these values, the logical operators and, or, and not and the relational operators <, ≤, =, ≠, ≥, and > are provided.

6. Elements of multidimensional arrays are accessed using [ and ]. For example, if A is a two dimensional array, the (i, j)th element of the array is denoted as A[i, j]. Array indices start at zero.

7. The following looping statements are employed: for, while, and repeat-until. The while loop takes the following form:

    while (condition) do
    {
        (statement 1)
        ...
        (statement n)
    }

As long as (condition) is true, the statements get executed. When (condition) becomes false, the loop is exited. The value of (condition) is evaluated at the top of the loop. The general form of a for loop is

    for variable := value1 to value2 step step do
    {
        (statement 1)
        ...
        (statement n)
    }

Here value1, value2, and step are arithmetic expressions. A variable of type integer or real or a numerical constant is a simple form of an arithmetic expression. The clause "step step" is optional and taken as +1 if it does not occur. step could either be positive or negative. variable is tested for termination at the start of each iteration. The for loop can be implemented as a while loop as follows:


    variable := value1;
    fin := value2;
    incr := step;
    while ((variable - fin) * step ≤ 0) do
    {
        (statement 1)
        ...
        (statement n)
        variable := variable + incr;
    }

A repeat-until statement is constructed as follows:

    repeat
        (statement 1)
        ...
        (statement n)
    until (condition)

The statements are executed as long as (condition) is false. The value of (condition) is computed after executing the statements. The instruction break; can be used within any of the above looping instructions to force exit. In case of nested loops, break; results in the exit of the innermost loop that it is a part of. A return statement within any of the above also will result in exiting the loops. A return statement results in the exit of the function itself.

8. A conditional statement has the following forms:

    if (condition) then (statement)
    if (condition) then (statement 1) else (statement 2)

Here (condition) is a boolean expression and (statement), (statement 1), and (statement 2) are arbitrary statements (simple or compound). We also employ the following case statement:

    case
    {
        :(condition 1): (statement 1)
        ...
        :(condition n): (statement n)
        :else: (statement n + 1)
    }


Here (statement 1), (statement 2), etc. could be either simple statements or compound statements. A case statement is interpreted as follows. If (condition 1) is true, (statement 1) gets executed and the case statement is exited. If (condition 1) is false, (condition 2) is evaluated. If (condition 2) is true, (statement 2) gets executed and the case statement exited, and so on. If none of the conditions (condition 1), ..., (condition n) are true, (statement n + 1) is executed and the case statement is exited. The else clause is optional.

9. Input and output are done using the instructions read and write. No format is used to specify the size of input or output quantities.

10. There is only one type of procedure: Algorithm. An algorithm consists of a heading and a body. The heading takes the form

    Algorithm Name((parameter list))

where Name is the name of the procedure and ((parameter list)) is a listing of the procedure parameters. The body has one or more (simple or compound) statements enclosed within braces { and }. An algorithm may or may not return any values. Simple variables to procedures are passed by value. Arrays and records are passed by reference. An array name or a record name is treated as a pointer to the respective data type. As an example, the following algorithm finds and returns the maximum of n given numbers:

    1    Algorithm Max(A, n)
    2    // A is an array of size n.
    3    {
    4        Result := A[1];
    5        for i := 2 to n do
    6            if A[i] > Result then Result := A[i];
    7        return Result;
    8    }

In this algorithm (named Max), A and n are procedure parameters. Result and i are local variables.
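The same algorithm written as compilable C++, for readers who want to run it (a sketch under the assumption of a 0-based std::vector in place of the 1-based pseudocode array; the names are otherwise kept):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // C++ counterpart of Algorithm Max: return the largest of the given numbers.
    double Max(const std::vector<double>& A) {
        double result = A[0];                      // Result := A[1]
        for (std::size_t i = 1; i < A.size(); i++)
            if (A[i] > result) result = A[i];
        return result;
    }

    int main() {
        std::vector<double> A{3.0, 7.5, 1.2, 6.4};
        std::cout << Max(A) << '\n';               // prints 7.5
        return 0;
    }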

Next we present two examples to illustrate the process of translating a problem into an algorithm.

Example 1.1 [Selection sort] Suppose we must devise an algorithm that sorts a collection of n ≥ 1 elements of arbitrary type. A simple solution is given by the following


From those elements that are currently unsorted, find the smallest and place it next in the sorted list.

Although this statement adequately describes the sorting problem, it is not an algorithm because it leaves several questions unanswered. For example, it does not tell us where and how the elements are initially stored or where we should place the result. We assume that the elements are stored in an array a, such that the ith integer is stored in the ith position a[i], 1 ≤ i ≤ n. Algorithm 1.1 is our first attempt at deriving a solution.

    1    for i := 1 to n do
    2    {
    3        Examine a[i] to a[n] and suppose
    4        the smallest element is at a[j];
    5        Interchange a[i] and a[j];
    6    }

Algorithm 1.1 Selection sort algorithm

To turn Algorithm 1.1 into a pseudocode program, two clearly defined subtasks remain: finding the smallest element (say a[j]) and interchanging it with a[i]. We can solve the latter problem using the code

    t := a[i]; a[i] := a[j]; a[j] := t;

The first subtask can be solved by assuming the minimum is a[i], checking a[i] with a[i + 1], a[i + 2], ..., and, whenever a smaller element is found, regarding it as the new minimum. Eventually a[n] is compared with the current minimum, and we are done. Putting all these observations together, we get the algorithm SelectionSort (Algorithm 1.2).

The obvious question to ask at this point is, Does SelectionSort work correctly? Throughout this text we use the notation a[i : j] to denote the array elements a[i] through a[j].

Theorem 1.1 Algorithm SelectionSort(a, n) correctly sorts a set of n ≥ 1 elements; the result remains in a[1 : n] such that a[1] ≤ a[2] ≤ ... ≤ a[n].

Proof: We first note that for any i, say i = q, following the execution of lines 6 to 9, it is the case that a[q] ≤ a[r], q < r ≤ n. Also observe that when i becomes greater than q, a[1 : q] is unchanged. Hence, following the last execution of these lines (that is, i = n), we have a[1] ≤ a[2] ≤ ... ≤ a[n].

We observe at this point that the upper limit of the for loop in line 4 can be changed to n - 1 without damaging the correctness of the algorithm. □


    1    Algorithm SelectionSort(a, n)
    2    // Sort the array a[1 : n] into nondecreasing order.
    3    {
    4        for i := 1 to n do
    5        {
    6            j := i;
    7            for k := i + 1 to n do
    8                if (a[k] < a[j]) then j := k;
    9            t := a[i]; a[i] := a[j]; a[j] := t;
    10       }
    11   }

Algorithm 1.2 Selection sort
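A compilable C++ rendering of SelectionSort can be handy for the timing experiments suggested in the preface (a sketch; it sorts a 0-based std::vector rather than the 1-based array a[1 : n]):

    #include <cstddef>
    #include <iostream>
    #include <vector>

    // C++ counterpart of Algorithm 1.2: sort a into nondecreasing order.
    void SelectionSort(std::vector<int>& a) {
        const std::size_t n = a.size();
        for (std::size_t i = 0; i < n; i++) {
            std::size_t j = i;                      // index of the current minimum
            for (std::size_t k = i + 1; k < n; k++)
                if (a[k] < a[j]) j = k;
            int t = a[i]; a[i] = a[j]; a[j] = t;    // interchange a[i] and a[j]
        }
    }

    int main() {
        std::vector<int> a{5, 2, 9, 1, 7};
        SelectionSort(a);
        for (int x : a) std::cout << x << ' ';      // prints 1 2 5 7 9
        std::cout << '\n';
        return 0;
    }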

1.2.2 Recursive Algorithms

A recursive function is a function that is defined in terms of itself. Similarly, an algorithm is said to be recursive if the same algorithm is invoked in the body. An algorithm that calls itself is direct recursive. Algorithm A is said to be indirect recursive if it calls another algorithm which in turn calls A. These recursive mechanisms are extremely powerful, but even more importantly, many times they can express an otherwise complex process very clearly. For these reasons we introduce recursion here.

Typically, beginning programmers view recursion as a somewhat mystical technique that is useful only for some very special class of problems (such as computing factorials or Ackermann's function). This is unfortunate because any algorithm that can be written using assignment, the if-then-else statement, and the while statement can also be written using assignment, the if-then-else statement, and recursion. Of course, this does not say that the resulting algorithm will necessarily be easier to understand. However, there are many instances when this will be the case. When is recursion an appropriate mechanism for algorithm exposition? One instance is when the problem itself is recursively defined. Factorial fits this category, as well as binomial coefficients, where

    \binom{n}{m} = \binom{n-1}{m} + \binom{n-1}{m-1},  where  \binom{n}{m} = n!/(m!(n - m)!)
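As a small illustration of an algorithm that is direct recursive because the problem itself is recursively defined, here is a C++ sketch for the binomial coefficient (the function name and the explicit base cases binom(n, 0) = binom(n, n) = 1 are ours; compare Exercise 7 at the end of this section):

    #include <iostream>

    // Binomial coefficient computed straight from the recursive definition:
    // binom(n, m) = binom(n - 1, m) + binom(n - 1, m - 1), with binom(n, 0) = binom(n, n) = 1.
    long long binom(int n, int m) {
        if (m == 0 || m == n) return 1;
        return binom(n - 1, m) + binom(n - 1, m - 1);
    }

    int main() {
        std::cout << binom(5, 2) << '\n';   // prints 10
        return 0;
    }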

The following two examples show how to develop a recursive algorithm. In the first example, we consider the Towers of Hanoi problem, and in the second, we generate all possible permutations of a list of characters.


Example 1.2 [Towers of Hanoi] The Towers of Hanoi puzzle is fashioned after the ancient Tower of Brahma ritual (see Figure 1.1). According to legend, at the time the world was created, there was a diamond tower (labeled A) with 64 golden disks. The disks were of decreasing size and were stacked on the tower in decreasing order of size bottom to top. Besides this tower there were two other diamond towers (labeled B and C). Since the time of creation, Brahman priests have been attempting to move the disks from tower A to tower B using tower C for intermediate storage. As the disks are very heavy, they can be moved only one at a time. In addition, at no time can a disk be on top of a smaller disk. According to legend, the world will come to an end when the priests have completed their task.

[Figure: Tower A, Tower B, Tower C]

Figure 1.1 Towers of Hanoi

A very elegant solution results from the use of recursion. Assume that the number of disks is n. To get the largest disk to the bottom of tower B, we move the remaining n - 1 disks to tower C and then move the largest to tower B. Now we are left with the task of moving the disks from tower C to tower B. To do this, we have towers A and B available. The fact that tower B has a disk on it can be ignored as the disk is larger than the disks being moved from tower C and so any disk can be placed on top of it. The recursive nature of the solution is apparent from Algorithm 1.3. This algorithm is invoked by TowersOfHanoi(n, A, B, C). Observe that our solution for an n-disk problem is formulated in terms of solutions to two (n - 1)-disk problems. □

Example 1.3 [Permutation generator] Given a set of n ≥ 1 elements, the problem is to print all possible permutations of this set. For example, if the set is {a, b, c}, then the set of permutations is {(a, b, c), (a, c, b), (b, a, c),


    1    Algorithm TowersOfHanoi(n, x, y, z)
    2    // Move the top n disks from tower x to tower y.
    3    {
    4        if (n ≥ 1) then
    5        {
    6            TowersOfHanoi(n - 1, x, z, y);
    7            write ("move top disk from tower", x,
    8                "to top of tower", y);
    9            TowersOfHanoi(n - 1, z, y, x);
    10       }
    11   }

Algorithm 1.3 Towers of Hanoi
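A direct C++ transcription of Algorithm 1.3, for experimentation (a sketch; passing the towers as characters is our choice, not the book's):

    #include <iostream>

    // Move the top n disks from tower x to tower y, using tower z for intermediate storage.
    void TowersOfHanoi(int n, char x, char y, char z) {
        if (n >= 1) {
            TowersOfHanoi(n - 1, x, z, y);       // move n - 1 disks out of the way
            std::cout << "move top disk from tower " << x
                      << " to top of tower " << y << '\n';
            TowersOfHanoi(n - 1, z, y, x);       // bring the n - 1 disks back on top
        }
    }

    int main() {
        TowersOfHanoi(3, 'A', 'B', 'C');         // prints the 2^3 - 1 = 7 moves
        return 0;
    }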

(b, c, a), (c, a, b), (c, b, a)}. It is easy to see that given n elements, there are n! different permutations. A simple algorithm can be obtained by looking at the case of four elements (a, b, c, d). The answer can be constructed by writing

1. a followed by all the permutations of (b, c, d)
2. b followed by all the permutations of (a, c, d)
3. c followed by all the permutations of (a, b, d)
4. d followed by all the permutations of (a, b, c)

The expression "followed by all the permutations" is the clue to recursion. It implies that we can solve the problem for a set with n elements if we have an algorithm that works on n - 1 elements. These considerations lead to Algorithm 1.4, which is invoked by Perm(a, 1, n). Try this algorithm out on sets of length one, two, and three to ensure that you understand how it works. □

EXERCISES

1. Horner's rule is a means for evaluating a polynomial at a point x_0 using a minimum number of multiplications. If the polynomial is A(x) = a_n x^n + a_(n-1) x^(n-1) + ... + a_1 x + a_0, Horner's rule is


    1    Algorithm Perm(a, k, n)
    2    {
    3        if (k = n) then write (a[1 : n]); // Output permutation.
    4        else // a[k : n] has more than one permutation.
    5             // Generate these recursively.
    6            for i := k to n do
    7            {
    8                t := a[k]; a[k] := a[i]; a[i] := t;
    9                Perm(a, k + 1, n);
    10               // All permutations of a[k + 1 : n]
    11               t := a[k]; a[k] := a[i]; a[i] := t;
    12           }
    13   }

Algorithm 1.4 Recursive permutation generator
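The same generator in C++ (a sketch; it permutes the tail of a 0-based std::string, whereas Algorithm 1.4 works on a[k : n] of a 1-based array):

    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <utility>

    // Print all permutations of s[k .. s.size()-1], keeping s[0 .. k-1] fixed.
    void Perm(std::string& s, std::size_t k) {
        if (k + 1 >= s.size()) { std::cout << s << '\n'; return; }  // output permutation
        for (std::size_t i = k; i < s.size(); i++) {
            std::swap(s[k], s[i]);   // place s[i] first among the remaining elements
            Perm(s, k + 1);          // all permutations of the remaining suffix
            std::swap(s[k], s[i]);   // restore the original order
        }
    }

    int main() {
        std::string s = "abc";
        Perm(s, 0);                  // prints the 3! = 6 permutations of abc
        return 0;
    }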

    A(x_0) = ( ... (a_n x_0 + a_(n-1)) x_0 + ... + a_1) x_0 + a_0

Write an algorithm to evaluate a polynomial using Horner's rule.

2. Given n boolean variables x_1, x_2, ..., and x_n, we wish to print all possible combinations of truth values they can assume. For instance, if n = 2, there are four possibilities: true, true; true, false; false, true; and false, false. Write an algorithm to accomplish this.

3. Devise an algorithm that inputs three integers and outputs them in nondecreasing order.

4. Present an algorithm that searches an unsorted array a[1 : n] for the element x. If x occurs, then return a position in the array; else return zero.

5. The factorial function n! has value 1 when n ≤ 1 and value n * (n - 1)! when n > 1. Write both a recursive and an iterative algorithm to compute n!.

6. The Fibonacci numbers are defined as f_0 = 0, f_1 = 1, and f_i = f_(i-1) + f_(i-2) for i > 1. Write both a recursive and an iterative algorithm to compute f_i.

7. Give both a recursive and an iterative algorithm to compute the binomial coefficient \binom{n}{m} as defined in Section 1.2.2, where \binom{n}{0} = \binom{n}{n} = 1.

8. Ackermann's function A(m, n) is defined as follows:

    A(m, n) = n + 1                      if m = 0
              A(m - 1, 1)                if n = 0
              A(m - 1, A(m, n - 1))      otherwise

This function is studied because it grows very fast for small values of m and n. Write a recursive algorithm for computing this function. Then write a nonrecursive algorithm for computing it.

9. The pigeonhole principle states that if a function f has n distinct inputs but less than n distinct outputs, then there exist two inputs a and b such that a ≠ b and f(a) = f(b). Present an algorithm to find a and b such that f(a) = f(b). Assume that the function inputs are 1, 2, ..., and n.

10. Give an algorithm to solve the following problem: Given n, a positive integer, determine whether n is the sum of all of its divisors, that is, whether n is the sum of all t such that 1 ≤ t < n, and t divides n.

11. Consider the function F(x) that is defined by "if x is even, then F(x) = x/2; else F(x) = F(F(3x + 1))." Prove that F(x) terminates for all integers x. (Hint: Consider integers of the form (2i + 1)2^k - 1 and use induction.)

12. If S is a set of n elements, the powerset of S is the set of all possible subsets of S. For example, if S = (a, b, c), then powerset(S) = {( ), (a), (b), (c), (a, b), (a, c), (b, c), (a, b, c)}. Write a recursive algorithm to compute powerset(S).

1.3 PERFORMANCE ANALYSIS

One goal of this book is to develop skills for making evaluative judgments about algorithms. There are many criteria upon which we can judge an algorithm. For instance:

1. Does it do what we want it to do?

2. Does it work correctly according to the original specifications of the task?

3. Is there documentation that describes how to use it and how it works?


4. Are procedures created in such a way that they perform logical subfunctions?

5. Is the code readable?

These criteria are all vitally important when it comes to writing software, most especially for large systems. Though we do not discuss how to reach these goals, we try to achieve them throughout this book with the pseudocode algorithms we write. Hopefully this more subtle approach will gradually infect your own program-writing habits so that you will automatically strive to achieve these goals.

There are other criteria for judging algorithms that have a more direct relationship to performance. These have to do with their computing time and storage requirements.

Definition 1.2 [Space/Time complexity] The space complexity of an algorithm is the amount of memory it needs to run to completion. The time complexity of an algorithm is the amount of computer time it needs to run to completion. □

Performance evaluation can be loosely divided into two major phases: (1) a priori estimates and (2) a posteriori testing. We refer to these as performance analysis and performance measurement respectively.

1.3.1 Space Complexity

Algorithm abc (Algorithm 1.5) computes a + b + b * c + (a + b - c)/(a + b) + 4.0; Algorithm Sum (Algorithm 1.6) computes Σ_{i=1}^{n} a[i] iteratively, where the a[i]'s are real numbers; and RSum (Algorithm 1.7) is a recursive algorithm that computes Σ_{i=1}^{n} a[i].

    1    Algorithm abc(a, b, c)
    2    {
    3        return a + b + b * c + (a + b - c)/(a + b) + 4.0;
    4    }

Algorithm 1.5 Computes a + b + b * c + (a + b - c)/(a + b) + 4.0

The space needed by each of these algorithms is seen to be the sum of the following components:


    1    Algorithm Sum(a, n)
    2    {
    3        s := 0.0;
    4        for i := 1 to n do
    5            s := s + a[i];
    6        return s;
    7    }

Algorithm 1.6 Iterative function for sum

    1    Algorithm RSum(a, n)
    2    {
    3        if (n ≤ 0) then return 0.0;
    4        else return RSum(a, n - 1) + a[n];
    5    }

Algorithm 1.7 Recursive function for sum


1. A fixed part that is independent of the characteristics (e.g., number, size) of the inputs and outputs. This part typically includes the instruction space (i.e., space for the code), space for simple variables and fixed-size component variables (also called aggregate), space for constants, and so on.

2. A variable part that consists of the space needed by component variables whose size is dependent on the particular problem instance being solved, the space needed by referenced variables (to the extent that this depends on instance characteristics), and the recursion stack space (insofar as this space depends on the instance characteristics).

The space requirement S(P) of any algorithm P may therefore be written as S(P) = c + S_P(instance characteristics), where c is a constant.

When analyzing the space complexity of an algorithm, we concentrate solely on estimating S_P(instance characteristics). For any given problem, we need first to determine which instance characteristics to use to measure the space requirements. This is very problem specific, and we resort to examples to illustrate the various possibilities. Generally speaking, our choices are limited to quantities related to the number and magnitude of the inputs to and outputs from the algorithm. At times, more complex measures of the interrelationships among the data items are used.

Example 1.4 For Algorithm 1.5, the problem instance is characterized by the specific values of a, b, and c. Making the assumption that one word is adequate to store the values of each of a, b, c, and the result, we see that the space needed by abc is independent of the instance characteristics. Consequently, S_P(instance characteristics) = 0. □

Example 1.5 The problem instances for Algorithm 1.6 are characterized by n, the number of elements to be summed. The space needed by n is one word, since it is of type integer. The space needed by a is the space needed by variables of type array of floating point numbers. This is at least n words, since a must be large enough to hold the n elements to be summed. So, we obtain S_Sum(n) ≥ (n + 3) (n for a[], one each for n, i, and s). □

Example 1.6 Let us consider the algorithm RSum (Algorithm 1.7). As in the case of Sum, the instances are characterized by n. The recursion stack space includes space for the formal parameters, the local variables, and the return address. Assume that the return address requires only one word of memory. Each call to RSum requires at least three words (including space for the values of n, the return address, and a pointer to a[]). Since the depth of recursion is n + 1, the recursion stack space needed is ≥ 3(n + 1). □


1.3.2 Time Complexity

The time T(P) taken by a program P is the sum of the compile time and the run (or execution) time. The compile time does not depend on the instance characteristics. Also, we may assume that a compiled program will be run several times without recompilation. Consequently, we concern ourselves with just the run time of a program. This run time is denoted by t_P (instance characteristics).

Because many of the factors t_P depends on are not known at the time a program is conceived, it is reasonable to attempt only to estimate t_P. If we knew the characteristics of the compiler to be used, we could proceed to determine the number of additions, subtractions, multiplications, divisions, compares, loads, stores, and so on, that would be made by the code for P. So, we could obtain an expression for t_P(n) of the form

    t_P(n) = c_a ADD(n) + c_s SUB(n) + c_m MUL(n) + c_d DIV(n) + ...

where n denotes the instance characteristics, and c_a, c_s, c_m, c_d, and so on, respectively, denote the time needed for an addition, subtraction, multiplication, division, and so on, and ADD, SUB, MUL, DIV, and so on, are functions whose values are the numbers of additions, subtractions, multiplications, divisions, and so on, that are performed when the code for P is used on an instance with characteristic n.

Obtaining such an exact formula is in itself an impossible task, since the time needed for an addition, subtraction, multiplication, and so on, often depends on the numbers being added, subtracted, multiplied, and so on. The value of t_P(n) for any given n can be obtained only experimentally. The program is typed, compiled, and run on a particular machine. The execution time is physically clocked, and t_P(n) obtained. Even with this experimental approach, one could face difficulties. In a multiuser system, the execution time depends on such factors as system load, the number of other programs running on the computer at the time program P is run, the characteristics of these other programs, and so on.

Given the minimal utility of determining the exact number of additions, subtractions, and so on, that are needed to solve a problem instance with characteristics given by n, we might as well lump all the operations together (provided that the time required by each is relatively independent of the instance characteristics) and obtain a count for the total number of operations. We can go one step further and count only the number of program steps.

A program step is loosely defined as a syntactically or semantically meaningful segment of a program that has an execution time that is independent of the instance characteristics. For example, the entire statement

    return a + b + b * c + (a + b - c)/(a + b) + 4.0;


of Algorithm 1.5could be regardedas a step sinceits executiontime isindependentof the instancecharacteristics(this statementis not strictlytrue, sincethe time for a multiply and divide generallydependson thenumbersinvolved in the operation).

The number of stepsany programstatementis assigneddependson thekind of statement. For example,commentscount as zerosteps;anassignment statementwhich doesnot involve any callsto otheralgorithmsis countedasonestep;in an iterativestatementsuchas the for,while,andrepeat-untilstatements,we considerthe stepcountsonly for the controlpart of the statement.Thecontrolpartsfor for and whilestatementshavethe following forms:

    for i := (expr) to (expr1) do

    while ((expr)) do

Each execution of the control part of a while statement is given a step count equal to the number of step counts assignable to (expr). The step count for each execution of the control part of a for statement is one, unless the counts attributable to (expr) and (expr1) are functions of the instance characteristics. In this latter case, the first execution of the control part of the for has a step count equal to the sum of the counts for (expr) and (expr1) (note that these expressions are computed only when the loop is started). Remaining executions of the for statement have a step count of one; and so on.

We can determine the number of steps needed by a program to solve a particular problem instance in one of two ways. In the first method, we introduce a new variable, count, into the program. This is a global variable with initial value 0. Statements to increment count by the appropriate amount are introduced into the program. This is done so that each time a statement in the original program is executed, count is incremented by the step count of that statement.

Example 1.7 When the statements to increment count are introduced into Algorithm 1.6, the result is Algorithm 1.8. The change in the value of count by the time this program terminates is the number of steps executed by Algorithm 1.6.

Since we are interested in determining only the change in the value of count, Algorithm 1.8 may be simplified to Algorithm 1.9. For every initial value of count, Algorithms 1.8 and 1.9 compute the same final value for count. It is easy to see that in the for loop, the value of count will increase by a total of 2n. If count is zero to start with, then it will be 2n + 3 on termination. So each invocation of Sum (Algorithm 1.6) executes a total of 2n + 3 steps. □


1   Algorithm Sum(a, n)
2   {
3       s := 0.0;
4       count := count + 1; // count is global; it is initially zero.
5       for i := 1 to n do
6       {
7           count := count + 1; // For for
8           s := s + a[i]; count := count + 1; // For assignment
9       }
10      count := count + 1; // For last time of for
11      count := count + 1; // For the return
12      return s;
13  }

Algorithm 1.8 Algorithm 1.6 with count statements added

1   Algorithm Sum(a, n)
2   {
3       for i := 1 to n do count := count + 2;
4       count := count + 3;
5   }

Algorithm 1.9 Simplified version of Algorithm 1.8
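For readers who want to experiment, the following is a minimal C++ sketch (added here, not part of the original text) of the same instrumentation idea: a global counter is incremented exactly as in Algorithm 1.8, so the final value can be checked against the 2n + 3 formula. The helper name sum_instrumented and the array contents are illustrative choices.

    #include <iostream>
    #include <vector>

    long count = 0;  // global step counter, as in Algorithm 1.8

    double sum_instrumented(const std::vector<double>& a, int n)
    {
        double s = 0.0; count++;           // for the assignment to s
        for (int i = 1; i <= n; i++) {
            count++;                       // for the control part of the for
            s = s + a[i]; count++;         // for the assignment
        }
        count++;                           // last test of the for loop
        count++;                           // for the return
        return s;
    }

    int main()
    {
        int n = 10;
        std::vector<double> a(n + 1, 1.0); // a[0] unused, to mimic a[1:n]
        sum_instrumented(a, n);
        std::cout << "count = " << count
                  << ", 2n+3 = " << 2 * n + 3 << "\n";
        return 0;
    }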


Example 1.8 When the statements to increment count are introduced into Algorithm 1.7, Algorithm 1.10 is obtained. Let t_RSum(n) be the increase in the value of count when Algorithm 1.10 terminates. We see that t_RSum(0) = 2. When n > 0, count increases by 2 plus whatever increase results from the invocation of RSum from within the else clause. From the definition of t_RSum, it follows that this additional increase is t_RSum(n - 1). So, if the value of count is zero initially, its value at the time of termination is 2 + t_RSum(n - 1), n > 0.

1   Algorithm RSum(a, n)
2   {
3       count := count + 1; // For the if conditional
4       if (n ≤ 0) then
5       {
6           count := count + 1; // For the return
7           return 0.0;
8       }
9       else
10      {
11          count := count + 1; // For the addition, function
12                              // invocation and return
13          return RSum(a, n - 1) + a[n];
14      }
15  }

Algorithm 1.10 Algorithm 1.7 with count statements added

When analyzing a recursive program for its step count, we often obtain a recursive formula for the step count, for example,

    t_RSum(n) = 2                     if n = 0
    t_RSum(n) = 2 + t_RSum(n - 1)     if n > 0

These recursive formulas are referred to as recurrence relations. One way of solving any such recurrence relation is to make repeated substitutions for each occurrence of the function t_RSum on the right-hand side until all such occurrences disappear:


    t_RSum(n) = 2 + t_RSum(n - 1)
              = 2 + 2 + t_RSum(n - 2)
              = 2(2) + t_RSum(n - 2)
                ...
              = n(2) + t_RSum(0)
              = 2n + 2,   n ≥ 0

So the step count for RSum (Algorithm 1.7) is 2n + 2. □
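The recursive case can be checked by running an instrumented version. The C++ sketch below (an added illustration, not part of the original text) increments a global counter exactly as Algorithm 1.10 does, so the final value can be compared with 2n + 2.

    #include <iostream>
    #include <vector>

    long rcount = 0;  // global step counter, incremented as in Algorithm 1.10

    double rsum_instrumented(const std::vector<double>& a, int n)
    {
        rcount++;                        // for the if conditional
        if (n <= 0) {
            rcount++;                    // for the return
            return 0.0;
        }
        rcount++;                        // for the addition, call, and return
        return rsum_instrumented(a, n - 1) + a[n];
    }

    int main()
    {
        int n = 10;
        std::vector<double> a(n + 1, 1.0);
        rsum_instrumented(a, n);
        std::cout << "rcount = " << rcount
                  << ", 2n+2 = " << 2 * n + 2 << "\n";
        return 0;
    }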

The step count is useful in that it tells us how the run time for a program changes with changes in the instance characteristics. From the step count for Sum, we see that if n is doubled, the run time also doubles (approximately); if n increases by a factor of 10, the run time increases by a factor of 10; and so on. So, the run time grows linearly in n. We say that Sum is a linear time algorithm (the time complexity is linear in the instance characteristic n).

Definition 1.3 [Input size] One of the instance characteristics that is frequently used in the literature is the input size. The input size of any instance of a problem is defined to be the number of words (or the number of elements) needed to describe that instance. The input size for the problem of summing an array with n elements is n + 1, n for listing the n elements and 1 for the value of n (Algorithms 1.6 and 1.7). The problem tackled in Algorithm 1.5 has an input size of 3. If the input to any problem instance is a single element, the input size is normally taken to be the number of bits needed to specify that element. Run times for many of the algorithms presented in this text are expressed as functions of the corresponding input sizes. □

Example 1.9 [Matrix addition] Algorithm 1.11 is to add two m × n matrices a and b together. Introducing the count-incrementing statements leads to Algorithm 1.12. Algorithm 1.13 is a simplified version of Algorithm 1.12 that computes the same value for count. Examining Algorithm 1.13, we see that line 7 is executed n times for each value of i, or a total of mn times; line 5 is executed m times; and line 9 is executed once. If count is 0 to begin with, it will be 2mn + 2m + 1 when Algorithm 1.13 terminates.

From this analysis we see that if m > n, then it is better to interchange the two for statements in Algorithm 1.11. If this is done, the step count becomes 2mn + 2n + 1. Note that in this example the instance characteristics are given by m and n and the input size is 2mn + 2. □
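The two loop orders can be compared numerically with the small C++ sketch below (an added illustration, not from the original text). The two helper functions simply evaluate the closed-form counts that follow from the analysis above: the outer loop contributes its size plus one, each inner loop contributes its size plus one, and each element assignment contributes one.

    #include <iostream>

    // Step count of Algorithm 1.11 with rows in the outer loop: 2mn + 2m + 1.
    long add_steps_rows_outer(long m, long n) { return (m + 1) + m * (n + 1) + m * n; }

    // Step count with the two for statements interchanged: 2mn + 2n + 1.
    long add_steps_cols_outer(long m, long n) { return (n + 1) + n * (m + 1) + m * n; }

    int main()
    {
        long m = 1000, n = 10;   // m > n, so interchanging the loops helps
        std::cout << "rows outer: " << add_steps_rows_outer(m, n)
                  << "  cols outer: " << add_steps_cols_outer(m, n) << "\n";
        return 0;
    }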

1   Algorithm Add(a, b, c, m, n)
2   {
3       for i := 1 to m do
4           for j := 1 to n do
5               c[i, j] := a[i, j] + b[i, j];
6   }

Algorithm 1.11 Matrix addition

1   Algorithm Add(a, b, c, m, n)
2   {
3       for i := 1 to m do
4       {
5           count := count + 1; // For 'for i'
6           for j := 1 to n do
7           {
8               count := count + 1; // For 'for j'
9               c[i, j] := a[i, j] + b[i, j];
10              count := count + 1; // For the assignment
11          }
12          count := count + 1; // For loop initialization and
13                              // last time of 'for j'
14      }
15      count := count + 1; // For loop initialization and
16                          // last time of 'for i'
17  }

Algorithm 1.12 Matrix addition with counting statements

1   Algorithm Add(a, b, c, m, n)
2   {
3       for i := 1 to m do
4       {
5           count := count + 2;
6           for j := 1 to n do
7               count := count + 2;
8       }
9       count := count + 1;
10  }

Algorithm 1.13 Simplified algorithm with counting only

The second method to determine the step count of an algorithm is to build a table in which we list the total number of steps contributed by each statement. This figure is often arrived at by first determining the number of steps per execution (s/e) of the statement and the total number of times (i.e., frequency) each statement is executed. The s/e of a statement is the amount by which the count changes as a result of the execution of that statement. By combining these two quantities, the total contribution of each statement is obtained. By adding the contributions of all statements, the step count for the entire algorithm is obtained.

In Table 1.1, the number of steps per execution and the frequency of each of the statements in Sum (Algorithm 1.6) have been listed. The total number of steps required by the algorithm is determined to be 2n + 3. It is important to note that the frequency of the for statement is n + 1 and not n. This is so because i has to be incremented to n + 1 before the for loop can terminate.

Table 1.2 gives the step count for RSum (Algorithm 1.7). Notice that under the s/e (steps per execution) column, the else clause has been given a count of 1 + t_RSum(n - 1). This is the total cost of this line each time it is executed. It includes all the steps that get executed as a result of the invocation of RSum from the else clause. The frequency and total steps columns have been split into two parts: one for the case n = 0 and the other for the case n > 0. This is necessary because the frequency (and hence total steps) for some statements is different for each of these cases.

Table 1.3 corresponds to algorithm Add (Algorithm 1.11). Once again, note that the frequency of the first for loop is m + 1 and not m. This is so as i needs to be incremented up to m + 1 before the loop can terminate. Similarly, the frequency for the second for loop is m(n + 1).

When you have obtained sufficient experience in computing step counts, you can avoid constructing the frequency table and obtain the step count as in the following example.


Statement                        s/e   frequency   total steps
1  Algorithm Sum(a, n)            0        -            0
2  {                              0        -            0
3      s := 0.0;                  1        1            1
4      for i := 1 to n do         1      n + 1        n + 1
5          s := s + a[i];         1        n            n
6      return s;                  1        1            1
7  }                              0        -            0
Total                                                 2n + 3

Table 1.1 Step table for Algorithm 1.6

Statement                          s/e      frequency        total steps
                                            n = 0   n > 0    n = 0   n > 0
1  Algorithm RSum(a, n)             0         -       -        0       0
2  {                                0         -       -        0       0
3      if (n ≤ 0) then              1         1       1        1       1
4          return 0.0;              1         1       0        1       0
5      else return                 1 + x      0       1        0     1 + x
6          RSum(a, n - 1) + a[n];
7  }                                0         -       -        0       0
Total                                                          2     2 + x

x = t_RSum(n - 1)

Table 1.2 Step table for Algorithm 1.7


Statement                                      s/e   frequency    total steps
1  Algorithm Add(a, b, c, m, n)                 0        -             0
2  {                                            0        -             0
3      for i := 1 to m do                       1      m + 1         m + 1
4          for j := 1 to n do                   1      m(n + 1)      mn + m
5              c[i, j] := a[i, j] + b[i, j];    1      mn            mn
6  }                                            0        -             0
Total                                                               2mn + 2m + 1

Table 1.3 Step table for Algorithm 1.11

Example 1.10 [Fibonacci numbers] The Fibonacci sequence of numbers starts as

    0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...

Each new term is obtained by taking the sum of the two previous terms. If we call the first term of the sequence f_0, then f_0 = 0, f_1 = 1, and in general

    f_n = f_{n-1} + f_{n-2},   n ≥ 2

Fibonacci (Algorithm 1.14) takes as input any nonnegative integer n and prints the value f_n.

To analyze the time complexity of this algorithm, we need to consider the two cases (1) n = 0 or 1 and (2) n > 1. When n = 0 or 1, lines 4 and 5 get executed once each. Since each line has an s/e of 1, the total step count for this case is 2. When n > 1, lines 4, 8, and 14 are each executed once. Line 9 gets executed n times, and lines 11 and 12 get executed n - 1 times each (note that the last time line 9 is executed, i is incremented to n + 1, and the loop exited). Line 8 has an s/e of 2, line 12 has an s/e of 2, and line 13 has an s/e of 0. The remaining lines that get executed have s/e's of 1. The total steps for the case n > 1 is therefore 4n + 1. □
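Algorithm 1.14 is listed next. As an added illustration (not part of the original text), the C++ sketch below mirrors it and carries an explicit step counter, with the steps charged according to the s/e values used in the analysis, so the 4n + 1 figure can be checked by running it.

    #include <iostream>

    // Iterative Fibonacci in the style of Algorithm 1.14, with a step counter.
    void fibonacci(int n)
    {
        long steps = 0;
        steps++;                              // the if test
        if (n <= 1) {
            steps++;                          // the write
            std::cout << n << " (steps = " << steps << ")\n";
            return;
        }
        int fnm2 = 0, fnm1 = 1, fn = 0;
        steps += 2;                           // the two initializing assignments
        for (int i = 2; i <= n; i++) {
            steps++;                          // control part of the for loop
            fn = fnm1 + fnm2; steps++;
            fnm2 = fnm1; fnm1 = fn; steps += 2;
        }
        steps++;                              // last test of the for loop
        steps++;                              // the write of fn
        std::cout << fn << " (steps = " << steps
                  << ", 4n+1 = " << 4 * n + 1 << ")\n";
    }

    int main() { fibonacci(10); return 0; }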

1   Algorithm Fibonacci(n)
2   // Compute the nth Fibonacci number.
3   {
4       if (n ≤ 1) then
5           write (n);
6       else
7       {
8           fnm2 := 0; fnm1 := 1;
9           for i := 2 to n do
10          {
11              fn := fnm1 + fnm2;
12              fnm2 := fnm1; fnm1 := fn;
13          }
14          write (fn);
15      }
16  }

Algorithm 1.14 Fibonacci numbers

Summary of Time Complexity

The time complexity of an algorithm is given by the number of steps taken by the algorithm to compute the function it was written for. The number of steps is itself a function of the instance characteristics. Although any specific instance may have several characteristics (e.g., the number of inputs, the number of outputs, the magnitudes of the inputs and outputs), the number

of steps is computed as a function of some subset of these. Usually, we choose those characteristics that are of importance to us. For example, we might wish to know how the computing (or run) time (i.e., time complexity) increases as the number of inputs increase. In this case the number of steps will be computed as a function of the number of inputs alone. For a different algorithm, we might be interested in determining how the computing time increases as the magnitude of one of the inputs increases. In this case the number of steps will be computed as a function of the magnitude of this input, alone. Thus, before the step count of an algorithm can be determined, we need to know exactly which characteristics of the problem instance are to be used. These define the variables in the expression for the step count. In the case of Sum, we chose to measure the time complexity as a function of the number n of elements being added. For algorithm Add, the choice of characteristics was the number m of rows and the number n of columns in the matrices being added.

Once the relevant characteristics (n, m, p, q, r, ...) have been selected, we can define what a step is. A step is any computation unit that is independent of the characteristics (n, m, p, q, r, ...). Thus, 10 additions can be one step; 100 multiplications can also be one step; but n additions cannot. Nor can m/2 additions, p + q subtractions, and so on, be counted as one step.


A systematic way to assign step counts was also discussed. Once this has been done, the time complexity (i.e., the total step count) of an algorithm can be obtained using either of the two methods discussed.

The examples we have looked at so far were sufficiently simple that the time complexities were nice functions of fairly simple characteristics like the number of inputs and the number of rows and columns. For many algorithms, the time complexity is not dependent solely on the number of inputs or outputs or some other easily specified characteristic. For example, the searching algorithm you wrote for Exercise 4 in Section 1.2 may terminate in one step if x is the first element examined by your algorithm, or it may take two steps (this happens if x is the second element examined), and so on. In other words, knowing n alone is not enough to estimate the run time of your algorithm.

We can extricate ourselves from the difficulties resulting from situations when the chosen parameters are not adequate to determine the step count uniquely by defining three kinds of step counts: best case, worst case, and average. The best-case step count is the minimum number of steps that can be executed for the given parameters. The worst-case step count is the maximum number of steps that can be executed for the given parameters. The average step count is the average number of steps executed on instances with the given parameters.

Our motivation to determine step counts is to be able to compare the time complexities of two algorithms that compute the same function and also to predict the growth in run time as the instance characteristics change.

Determining the exact step count (best case, worst case, or average) of an algorithm can prove to be an exceedingly difficult task. Expending immense effort to determine the step count exactly is not a very worthwhile endeavor, since the notion of a step is itself inexact. (Both the instructions x := y; and x := y + z + (x/y) + (x * y * z - x/z); count as one step.) Because of the inexactness of what a step stands for, the exact step count is not very useful for comparative purposes. An exception to this is when the difference between the step counts of two algorithms is very large, as in 3n + 3 versus 100n + 10. We might feel quite safe in predicting that the algorithm with step count 3n + 3 will run in less time than the one with step count 100n + 10. But even in this case, it is not necessary to know that the exact step count is 100n + 10. Something like, "it's about 80n or 85n or 75n," is adequate to arrive at the same conclusion.

For most situations, it is adequate to be able to make a statement like c1 n^2 ≤ t_P(n) ≤ c2 n^2 or t_Q(n, m) = c1 n + c2 m, where c1 and c2 are nonnegative constants. This is so because if we have two algorithms with a complexity of c1 n^2 + c2 n and c3 n respectively, then we know that the one with complexity c3 n will be faster than the one with complexity c1 n^2 + c2 n for sufficiently large values of n. For small values of n, either algorithm could be faster (depending on c1, c2, and c3). If c1 = 1, c2 = 2, and c3 = 100, then c1 n^2 + c2 n ≤ c3 n for n ≤ 98 and c1 n^2 + c2 n > c3 n for n > 98. If c1 = 1, c2 = 2, and c3 = 1000, then c1 n^2 + c2 n ≤ c3 n for n ≤ 998.

No matter what the values of c1, c2, and c3, there will be an n beyond which the algorithm with complexity c3 n will be faster than the one with complexity c1 n^2 + c2 n. This value of n will be called the break-even point. If the break-even point is zero, then the algorithm with complexity c3 n is always faster (or at least as fast). The exact break-even point cannot be determined analytically. The algorithms have to be run on a computer in order to determine the break-even point. To know that there is a break-even point, it is sufficient to know that one algorithm has complexity c1 n^2 + c2 n and the other c3 n for some constants c1, c2, and c3. There is little advantage in determining the exact values of c1, c2, and c3.

1.3.3 Asymptotic Notation (O, Ω, Θ)

With the previous discussion as motivation, we introduce some terminology that enables us to make meaningful (but inexact) statements about the time and space complexities of an algorithm. In the remainder of this chapter, the functions f and g are nonnegative functions.

Definition 1.4 [Big "oh"] The function f(n) = O(g(n)) (read as "f of n is big oh of g of n") iff (if and only if) there exist positive constants c and n0 such that f(n) ≤ c * g(n) for all n, n ≥ n0. □

Example 1.11 The function 3n + 2 = O(n) as 3n + 2 ≤ 4n for all n ≥ 2. 3n + 3 = O(n) as 3n + 3 ≤ 4n for all n ≥ 3. 100n + 6 = O(n) as 100n + 6 ≤ 101n for all n ≥ 6. 10n^2 + 4n + 2 = O(n^2) as 10n^2 + 4n + 2 ≤ 11n^2 for all n ≥ 5. 1000n^2 + 100n - 6 = O(n^2) as 1000n^2 + 100n - 6 ≤ 1001n^2 for n ≥ 100. 6 * 2^n + n^2 = O(2^n) as 6 * 2^n + n^2 ≤ 7 * 2^n for n ≥ 4. 3n + 3 = O(n^2) as 3n + 3 ≤ 3n^2 for n ≥ 2. 10n^2 + 4n + 2 = O(n^4) as 10n^2 + 4n + 2 ≤ 10n^4 for n ≥ 2. 3n + 2 ≠ O(1) as 3n + 2 is not less than or equal to c for any constant c and all n ≥ n0. 10n^2 + 4n + 2 ≠ O(n). □

We write O(1) to mean a computing time that is a constant. O(n) is called linear, O(n^2) is called quadratic, O(n^3) is called cubic, and O(2^n) is called exponential. If an algorithm takes time O(log n), it is faster, for sufficiently large n, than if it had taken O(n). Similarly, O(n log n) is better than O(n^2) but not as good as O(n). These seven computing times (O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3), and O(2^n)) are the ones we see most often in this book.

As illustrated by the previous example, the statement f(n) = O(g(n)) states only that g(n) is an upper bound on the value of f(n) for all n, n ≥ n0. It does not say anything about how good this bound is. Notice that n = O(2n), n = O(n^2.5), n = O(n^3), n = O(2^n), and so on. For the statement f(n) = O(g(n)) to be informative, g(n) should be as small a function of n as one can come up with for which f(n) = O(g(n)). So, while we often say that 3n + 3 = O(n), we almost never say that 3n + 3 = O(n^2), even though this latter statement is correct.

From the definition of O, it should be clear that f(n) = O(g(n)) is not the same as O(g(n)) = f(n). In fact, it is meaningless to say that O(g(n)) = f(n). The use of the symbol = is unfortunate because this symbol commonly denotes the equals relation. Some of the confusion that results from the use of this symbol (which is standard terminology) can be avoided by reading the symbol = as "is" and not as "equals."

Theorem 1.2 obtains a very useful result concerning the order of f(n) (that is, the g(n) in f(n) = O(g(n))) when f(n) is a polynomial in n.

Theorem 1.2 If f(n) = a_m n^m + ... + a_1 n + a_0, then f(n) = O(n^m).

Proof:

    f(n) ≤ Σ_{i=0}^{m} |a_i| n^i
         ≤ n^m Σ_{i=0}^{m} |a_i| n^{i-m}
         ≤ n^m Σ_{i=0}^{m} |a_i|,   for n ≥ 1

So, f(n) = O(n^m) (assuming that m is fixed). □

Definition 1.5 [Omega] The function f(n) = Ω(g(n)) (read as "f of n is omega of g of n") iff there exist positive constants c and n0 such that f(n) ≥ c * g(n) for all n, n ≥ n0. □

Example 1.12 The function 3n + 2 = Ω(n) as 3n + 2 ≥ 3n for n ≥ 1 (the inequality holds for n ≥ 0, but the definition of Ω requires an n0 > 0). 3n + 3 = Ω(n) as 3n + 3 ≥ 3n for n ≥ 1. 100n + 6 = Ω(n) as 100n + 6 ≥ 100n for n ≥ 1. 10n^2 + 4n + 2 = Ω(n^2) as 10n^2 + 4n + 2 ≥ n^2 for n ≥ 1. 6 * 2^n + n^2 = Ω(2^n) as 6 * 2^n + n^2 ≥ 2^n for n ≥ 1. Observe also that 3n + 3 = Ω(1), 10n^2 + 4n + 2 = Ω(n), 10n^2 + 4n + 2 = Ω(1), 6 * 2^n + n^2 = Ω(n^100), 6 * 2^n + n^2 = Ω(n^50.2), 6 * 2^n + n^2 = Ω(n^2), 6 * 2^n + n^2 = Ω(n), and 6 * 2^n + n^2 = Ω(1). □

As in the case of the big oh notation, there are several functions g(n) for which f(n) = Ω(g(n)). The function g(n) is only a lower bound on f(n). For the statement f(n) = Ω(g(n)) to be informative, g(n) should be as large a function of n as possible for which the statement f(n) = Ω(g(n)) is true. So, while we say that 3n + 3 = Ω(n) and 6 * 2^n + n^2 = Ω(2^n), we almost never say that 3n + 3 = Ω(1) or 6 * 2^n + n^2 = Ω(1), even though both of these statements are correct.

Theorem 1.3 is the analogue of Theorem 1.2 for the omega notation.


Theorem 1.3 If f(n) = a_m n^m + ... + a_1 n + a_0 and a_m > 0, then f(n) = Ω(n^m).

Proof: Left as an exercise. □

Definition 1.6 [Theta] The function f(n) = Θ(g(n)) (read as "f of n is theta of g of n") iff there exist positive constants c1, c2, and n0 such that c1 g(n) ≤ f(n) ≤ c2 g(n) for all n, n ≥ n0. □

Example 1.13 The function 3n + 2 = Θ(n) as 3n + 2 ≥ 3n for all n ≥ 2 and 3n + 2 ≤ 4n for all n ≥ 2, so c1 = 3, c2 = 4, and n0 = 2. 3n + 3 = Θ(n), 10n^2 + 4n + 2 = Θ(n^2), 6 * 2^n + n^2 = Θ(2^n), and 10 * log n + 4 = Θ(log n). 3n + 2 ≠ Θ(1), 3n + 3 ≠ Θ(n^2), 10n^2 + 4n + 2 ≠ Θ(n), 10n^2 + 4n + 2 ≠ Θ(1), 6 * 2^n + n^2 ≠ Θ(n^2), 6 * 2^n + n^2 ≠ Θ(n^100), and 6 * 2^n + n^2 ≠ Θ(1). □

The theta notation is more precise than both the big oh and omega notations. The function f(n) = Θ(g(n)) iff g(n) is both an upper and lower bound on f(n).

Notice that the coefficients in all of the g(n)'s used in the preceding three examples have been 1. This is in accordance with practice. We almost never find ourselves saying that 3n + 3 = O(3n), that 10 = O(100), that 10n^2 + 4n + 2 = Ω(4n^2), that 6 * 2^n + n^2 = Θ(6 * 2^n), or that 6 * 2^n + n^2 = O(4 * 2^n), even though each of these statements is true.

Theorem 1.4 If f(n) = a_m n^m + ... + a_1 n + a_0 and a_m > 0, then f(n) = Θ(n^m).

Proof: Left as an exercise. □

Definition 1.7 [Little "oh"] The function f(n) = o(g(n)) (read as "f of n is little oh of g of n") iff

    lim_{n→∞} f(n)/g(n) = 0                                              □

Example 1.14 The function 3n + 2 = o(n^2) since lim_{n→∞} (3n + 2)/n^2 = 0. 3n + 2 = o(n log n). 3n + 2 = o(n log log n). 6 * 2^n + n^2 = o(3^n). 6 * 2^n + n^2 = o(2^n log n). 3n + 2 ≠ o(n). 6 * 2^n + n^2 ≠ o(2^n). □

Analogous to o is the notation ω defined as follows.


Definition 1.8 [Little omega] The function f(n) = ω(g(n)) (read as "f of n is little omega of g of n") iff

    lim_{n→∞} g(n)/f(n) = 0                                              □

Example 1.15 Let us reexamine the time complexity analyses of the previous section. For the algorithm Sum (Algorithm 1.6) we determined that t_Sum(n) = 2n + 3. So, t_Sum(n) = Θ(n). For Algorithm 1.7, t_RSum(n) = 2n + 2 = Θ(n). □

Although we might all see that the O, Ω, and Θ notations have been used correctly in the preceding paragraphs, we are still left with the question, Of what use are these notations if we have to first determine the step count exactly? The answer to this question is that the asymptotic complexity (i.e., the complexity in terms of O, Ω, and Θ) can be determined quite easily without determining the exact step count. This is usually done by first determining the asymptotic complexity of each statement (or group of statements) in the algorithm and then adding these complexities. Tables 1.4 through 1.6 do just this for Sum, RSum, and Add (Algorithms 1.6, 1.7, and 1.11).

Statement                        s/e   frequency   total steps
1  Algorithm Sum(a, n)            0        -          Θ(0)
2  {                              0        -          Θ(0)
3      s := 0.0;                  1        1          Θ(1)
4      for i := 1 to n do         1      n + 1        Θ(n)
5          s := s + a[i];         1        n          Θ(n)
6      return s;                  1        1          Θ(1)
7  }                              0        -          Θ(0)
Total                                                 Θ(n)

Table 1.4 Asymptotic complexity of Sum (Algorithm 1.6)

Although the analyses of Tables 1.4 through 1.6 are carried out in terms of step counts, it is correct to interpret t_P(n) = Θ(g(n)), t_P(n) = Ω(g(n)), or t_P(n) = O(g(n)) as a statement about the computing time of algorithm P. This is so because each step takes only Θ(1) time to execute.


Statement                          s/e      frequency        total steps
                                            n = 0   n > 0    n = 0   n > 0
1  Algorithm RSum(a, n)             0         -       -        0     Θ(0)
2  {                                0         -       -        0     Θ(0)
3      if (n ≤ 0) then              1         1       1        1     Θ(1)
4          return 0.0;              1         1       0        1     Θ(0)
5      else return                 1 + x      0       1        0     Θ(1 + x)
6          RSum(a, n - 1) + a[n];
7  }                                0         -       -        0     Θ(0)
Total                                                           2     Θ(1 + x)

x = t_RSum(n - 1)

Table 1.5 Asymptotic complexity of RSum (Algorithm 1.7)

Statement                                      s/e   frequency   total steps
1  Algorithm Add(a, b, c, m, n)                 0        -          Θ(0)
2  {                                            0        -          Θ(0)
3      for i := 1 to m do                       1      Θ(m)         Θ(m)
4          for j := 1 to n do                   1      Θ(mn)        Θ(mn)
5              c[i, j] := a[i, j] + b[i, j];    1      Θ(mn)        Θ(mn)
6  }                                            0        -          Θ(0)
Total                                                               Θ(mn)

Table 1.6 Asymptotic complexity of Add (Algorithm 1.11)


After you have had some experience using the table method, you will be in a position to arrive at the asymptotic complexity of an algorithm by taking a more global approach. We elaborate on this method in the following examples.

Example 1.16 [Permutation generator] Consider Perm (Algorithm 1.4). When k = n, we see that the time taken is Θ(n). When k < n, the else clause is entered. At this time, the second for loop is entered n - k + 1 times. Each iteration of this loop takes Θ(n + t_Perm(k + 1, n)) time. So, t_Perm(k, n) = Θ((n - k + 1)(n + t_Perm(k + 1, n))) when k < n. Since t_Perm(k + 1, n) is at least n when k + 1 ≤ n, we get t_Perm(k, n) = Θ((n - k + 1) t_Perm(k + 1, n)) for k < n. Using the substitution method, we obtain t_Perm(1, n) = Θ(n(n!)), n ≥ 1. □

Example 1.17 [Magic square] The next example we consider is a problem from recreational mathematics. A magic square is an n × n matrix of the integers 1 to n^2 such that the sum of every row, column, and diagonal is the same. Figure 1.2 gives an example magic square for the case n = 5. In this example, the common sum is 65.

    15    8    1   24   17
    16   14    7    5   23
    22   20   13    6    4
     3   21   19   12   10
     9    2   25   18   11

Figure 1.2 Example magic square

H. Coxeter has given the following simple rule for generating a magic square when n is odd:

    Start with 1 in the middle of the top row; then go up and left, assigning numbers in increasing order to empty squares; if you fall off the square imagine the same square as tiling the plane and continue; if a square is occupied, move down instead and continue.


The magic square of Figure 1.2 was formed using this rule. Algorithm 1.15 is for creating an n × n magic square for the case in which n is odd. This results from Coxeter's rule.

The magic square is represented using a two-dimensional array having n rows and n columns. For this application it is convenient to number the rows (and columns) from 0 to n - 1 rather than from 1 to n. Thus, when the algorithm "falls off the square," the mod operator sets i and/or j back to 0 or n - 1.

The time to initialize and output the square is Θ(n^2). The third for loop (in which key ranges over 2 through n^2) is iterated n^2 - 1 times and each iteration takes Θ(1) time. So, this for loop takes Θ(n^2) time. Hence the overall time complexity of Magic is Θ(n^2). Since there are n^2 positions in which the algorithm must place a number, we see that Θ(n^2) is the best bound an algorithm for the magic square problem can have. □

Example 1.18 [Computing x^n] Our final example is to compute x^n for any real number x and integer n ≥ 0. A naive algorithm for solving this problem is to perform n - 1 multiplications as follows:

    power := x;
    for i := 1 to n - 1 do power := power * x;

This algorithm takes O(n) time. A better approach is to employ the "repeated squaring" trick. Consider the special case in which n is an integral power of 2 (that is, in which n equals 2^k for some integer k). The following algorithm computes x^n.

    power := x;
    for i := 1 to k do power := power^2;

The value of power after q iterations of the for loop is x^{2^q}. Therefore, this algorithm takes only O(k) = O(log n) time, which is a significant improvement over the run time of the first algorithm.

Can the same algorithm be used when n is not an integral power of 2? Fortunately, the answer is yes. Let b_k b_{k-1} ... b_1 b_0 be the binary representation of the integer n. This means that n = Σ_{q=0}^{k} b_q 2^q. Now,

    x^n = x^{Σ_{q=0}^{k} b_q 2^q} = (x)^{b_0} * (x^2)^{b_1} * (x^4)^{b_2} * ... * (x^{2^k})^{b_k}

Also observe that b_0 is nothing but n mod 2 and that ⌊n/2⌋ is b_k b_{k-1} ... b_1 in binary form. These observations lead us to Exponentiate (Algorithm 1.16) for computing x^n.


1   Algorithm Magic(n)
2   // Create a magic square of size n, n being odd.
3   {
4       if ((n mod 2) = 0) then
5       {
6           write ("n is even"); return;
7       }
8       else
9       {
10          for i := 0 to n - 1 do // Initialize square to zero.
11              for j := 0 to n - 1 do square[i, j] := 0;
12          square[0, (n - 1)/2] := 1; // Middle of first row
13          // (i, j) is the current position.
14          i := 0; j := (n - 1)/2;
15          for key := 2 to n^2 do
16          {
17              // Move up and left. The next two if statements
18              // may be replaced by the mod operator if
19              // -1 mod n has the value n - 1.
20              if (i ≥ 1) then k := i - 1; else k := n - 1;
21              if (j ≥ 1) then l := j - 1; else l := n - 1;
22              if (square[k, l] ≥ 1) then i := (i + 1) mod n;
23              else // square[k, l] is empty.
24              {
25                  i := k; j := l;
26              }
27              square[i, j] := key;
28          }
29          // Output the magic square.
30          for i := 0 to n - 1 do
31              for j := 0 to n - 1 do write (square[i, j]);
32      }
33  }

Algorithm 1.15 Magic square
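The same construction translates readily into C++. The sketch below is an added illustration (not part of the original text); the function name magic, the use of vectors, and the tab-separated output are choices made here. It follows Coxeter's rule: place 1 in the middle of the top row, keep moving up and left with wraparound, and drop down one row whenever the target square is already occupied.

    #include <iostream>
    #include <vector>

    std::vector<std::vector<int>> magic(int n)   // n must be odd
    {
        std::vector<std::vector<int>> sq(n, std::vector<int>(n, 0));
        int i = 0, j = (n - 1) / 2;              // middle of the top row
        sq[i][j] = 1;
        for (int key = 2; key <= n * n; key++) {
            int k = (i - 1 + n) % n;             // wrap around the top edge
            int l = (j - 1 + n) % n;             // wrap around the left edge
            if (sq[k][l] != 0) i = (i + 1) % n;  // occupied: move down instead
            else { i = k; j = l; }
            sq[i][j] = key;
        }
        return sq;
    }

    int main()
    {
        int n = 5;
        auto sq = magic(n);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) std::cout << sq[i][j] << '\t';
            std::cout << '\n';
        }
        return 0;
    }

Running it with n = 5 reproduces the square of Figure 1.2.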


1   Algorithm Exponentiate(x, n)
2   // Return x^n for an integer n ≥ 0.
3   {
4       m := n; power := 1; z := x;
5       while (m > 0) do
6       {
7           while ((m mod 2) = 0) do
8           {
9               m := ⌊m/2⌋; z := z^2;
10          }
11          m := m - 1; power := power * z;
12      }
13      return power;
14  }

Algorithm 1.16 Computation of x^n

Proving the correctness of this algorithm is left as an exercise. The variable m starts with the value of n, and after every iteration of the innermost while loop (line 7), its value decreases by a factor of at least 2. Thus there will be only O(log n) iterations of the while loop of line 7. Each such iteration takes Θ(1) time. Whenever control exits from the innermost while loop, the value of m is odd and the instructions m := m - 1; power := power * z; are executed once. After this execution, since m becomes even, either the innermost while loop is entered again or the outermost while loop (line 5) is exited (in case m = 0). Therefore the instructions m := m - 1; power := power * z; can only be executed O(log n) times. In summary, the overall run time of Exponentiate is Θ(log n). □
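For readers who want to run the idea, the following C++ sketch (added here; a direct transcription of the structure of Algorithm 1.16, with the function name and test values chosen for illustration) computes x^n by repeated squaring, processing the binary representation of n from the low-order bit up.

    #include <iostream>

    double exponentiate(double x, unsigned n)
    {
        double power = 1.0, z = x;
        unsigned m = n;
        while (m > 0) {
            while (m % 2 == 0) { m /= 2; z = z * z; }  // strip a zero bit, square z
            m = m - 1; power = power * z;              // consume a one bit
        }
        return power;
    }

    int main()
    {
        std::cout << exponentiate(2.0, 10) << "\n";    // 1024
        std::cout << exponentiate(3.0, 5)  << "\n";    // 243
        return 0;
    }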

1.3.4 Practical Complexities

We have seen that the time complexity of an algorithm is generally some function of the instance characteristics. This function is very useful in determining how the time requirements vary as the instance characteristics change. The complexity function can also be used to compare two algorithms P and Q that perform the same task. Assume that algorithm P has complexity Θ(n) and algorithm Q has complexity Θ(n^2). We can assert that algorithm P is faster than algorithm Q for sufficiently large n. To see the validity of this assertion, observe that the computing time of P is bounded from above by cn for some constant c and for all n, n ≥ n1, whereas that of Q is bounded from below by dn^2 for some constant d and all n, n ≥ n2. Since cn ≤ dn^2 for n ≥ c/d, algorithm P is faster than algorithm Q whenever n ≥ max{n1, n2, c/d}.

You should always be cautiously aware of the presence of the phrase "sufficiently large" in an assertion like that of the preceding discussion. When deciding which of the two algorithms to use, you must know whether the n you are dealing with is, in fact, sufficiently large. If algorithm P runs in 10^6 n milliseconds, whereas algorithm Q runs in n^2 milliseconds, and if you always have n < 10^6, then, other factors being equal, algorithm Q is the one to use.

To get a feel for how the various functions grow with n, you are advised to study Table 1.7 and Figure 1.3 very closely. It is evident from Table 1.7 and Figure 1.3 that the function 2^n grows very rapidly with n. In fact, if an algorithm needs 2^n steps for execution, then when n = 40, the number of steps needed is approximately 1.1 * 10^12. On a computer performing one billion steps per second, this would require about 18.3 minutes. If n = 50, the same algorithm would run for about 13 days on this computer. When n = 60, about 310.56 years are required to execute the algorithm and when n = 100, about 4 * 10^13 years are needed. So, we may conclude that the utility of algorithms with exponential complexity is limited to small n (typically n ≤ 40).

log n    n     n log n    n^2      n^3       2^n
0        1     0          1        1         2
1        2     2          4        8         4
2        4     8          16       64        16
3        8     24         64       512       256
4        16    64         256      4,096     65,536
5        32    160        1,024    32,768    4,294,967,296

Table 1.7 Function values

Algorithms that have a complexity that is a polynomial of high degree are also of limited utility. For example, if an algorithm needs n^10 steps, then using our 1-billion-steps-per-second computer we need 10 seconds when n = 10, 3171 years when n = 100, and 3.17 * 10^13 years when n = 1000. If the algorithm's complexity had been n^3 steps instead, then we would need one second when n = 1000, 110.67 minutes when n = 10,000, and 11.57 days when n = 100,000.


Figure 1.3 Plot of function values (f(n) versus n for the functions of Table 1.7)


Table 1.8 gives the time needed by a one-billion-steps-per-second computer to execute an algorithm of complexity f(n) instructions. You should note that currently only the fastest computers can execute about 1 billion instructions per second. From a practical standpoint, it is evident that for reasonably large n (say n > 100), only algorithms of small complexity (such as n, n log n, n^2, and n^3) are feasible. Further, this is the case even if you could build a computer capable of executing 10^12 instructions per second. In this case, the computing times of Table 1.8 would decrease by a factor of 1000. Now, when n = 100, it would take 3.17 years to execute n^10 instructions and 4 * 10^10 years to execute 2^n instructions.

Time for f(n) instructions on a 10^9 instr/sec computer

n            f(n) = n   f(n) = n log2 n   f(n) = n^2   f(n) = n^3   f(n) = n^4     f(n) = n^10      f(n) = 2^n
10           .01 μs     .03 μs            .1 μs        1 μs         10 μs          10 s             1 μs
20           .02 μs     .09 μs            .4 μs        8 μs         160 μs         2.84 hr          1 ms
30           .03 μs     .15 μs            .9 μs        27 μs        810 μs         6.83 d           1 s
40           .04 μs     .21 μs            1.6 μs       64 μs        2.56 ms        121.36 d         18.3 min
50           .05 μs     .28 μs            2.5 μs       125 μs       6.25 ms        3.1 yr           13 d
100          .1 μs      .66 μs            10 μs        1 ms         100 ms         3171 yr          4*10^13 yr
1,000        1 μs       9.96 μs           1 ms         1 s          16.67 min      3.17*10^13 yr    32*10^283 yr
10,000       10 μs      130 μs            100 ms       16.67 min    115.7 d        3.17*10^23 yr
100,000      100 μs     1.66 ms           10 s         11.57 d      3171 yr        3.17*10^33 yr
1,000,000    1 ms       19.92 ms          16.67 min    31.71 yr     3.17*10^7 yr   3.17*10^43 yr

Table 1.8 Times on a 1-billion-steps-per-second computer

1.3.5 Performance Measurement

Performance measurement is concerned with obtaining the space and time requirements of a particular algorithm. These quantities depend on the compiler and options used as well as on the computer on which the algorithm is run. Unless otherwise stated, all performance values provided in this book are obtained using the Gnu C++ compiler, the default compiler options, and the Sparc 10/30 computer workstation.

In keeping with the discussion of the preceding section, we do not concern ourselves with the space and time needed for compilation. We justify this by the assumption that each program (after it has been fully debugged) is compiled once and then executed several times. Certainly, the space and time needed for compilation are important during program testing, when more time is spent on this task than in running the compiled code.

We do not consider measuring the run-time space requirements of a program. Rather, we focus on measuring the computing time of a program. To obtain the computing (or run) time of a program, we need a clocking procedure. We assume the existence of a program GetTime() that returns the current time in milliseconds.
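On a modern system, a GetTime-style routine can be written with the C++ <chrono> facilities. The sketch below is an added illustration and an assumption about the reader's environment, not something prescribed by the text; it returns milliseconds measured from an arbitrary but fixed origin, which is all that is needed for computing elapsed times.

    #include <chrono>
    #include <iostream>

    // Current time in milliseconds, from a fixed but arbitrary origin.
    long long get_time_ms()
    {
        using namespace std::chrono;
        return duration_cast<milliseconds>(
                   steady_clock::now().time_since_epoch()).count();
    }

    int main()
    {
        long long h = get_time_ms();
        volatile double s = 0.0;
        for (int i = 0; i < 10000000; i++) s = s + i;   // some work to time
        long long h1 = get_time_ms();
        std::cout << "elapsed: " << (h1 - h) << " ms\n";
        return 0;
    }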


Suppose we wish to measure the worst-case performance of the sequential search algorithm (Algorithm 1.17). Before we can do this, we need to (1) decide on the values of n for which the times are to be obtained and (2) determine, for each of the above values of n, the data that exhibit the worst-case behavior.

1   Algorithm SeqSearch(a, x, n)
2   // Search for x in a[1:n]. a[0] is used as additional space.
3   {
4       i := n; a[0] := x;
5       while (a[i] ≠ x) do i := i - 1;
6       return i;
7   }

Algorithm 1.17 Sequential search

The decision on which values of n to use is based on the amount of timing we wish to perform and also on what we expect to do with the times once they are obtained. Assume that for Algorithm 1.17, our intent is simply to predict how long it will take, in the worst case, to search for x, given the size n of a. An asymptotic analysis reveals that this time is Θ(n). So, we expect a plot of the times to be a straight line. Theoretically, if we know the times for any two values of n, the straight line is determined, and we can obtain the time for all other values of n from this line. In practice, we need the times for more than two values of n. This is so for the following reasons:

1. Asymptotic analysis tells us the behavior only for sufficiently large values of n. For smaller values of n, the run time may not follow the asymptotic curve. To determine the point beyond which the asymptotic curve is followed, we need to examine the times for several values of n.

2. Even in the region where the asymptotic behavior is exhibited, the times may not lie exactly on the predicted curve (straight line in the case of Algorithm 1.17) because of the effects of low-order terms that are discarded in the asymptotic analysis. For instance, an algorithm with asymptotic complexity Θ(n) can have time complexity c1 n + c2 log n + c3 or, for that matter, any other function of n in which the highest-order term is c1 n for some constant c1, c1 > 0.

It is reasonable to expect that the asymptotic behavior of Algorithm 1.17 begins for some n that is smaller than 100. So, for n > 100, we obtain the run time for just a few values. A reasonable choice is n = 200, 300, 400, ..., 1000. There is nothing magical about this choice of values. We can just as well use n = 500, 1,000, 1,500, ..., 10,000 or n = 512, 1,024, 2,048, ..., 2^15. It costs us more in terms of computer time to use the latter choices, and we probably do not get any better information about the run time of Algorithm 1.17 using these choices.

For n in the range [0, 100] we carry out a more refined measurement, since we are not quite sure where the asymptotic behavior begins. Of course, if our measurements show that the straight-line behavior does not begin in this range, we have to perform a more detailed measurement in the range [100, 200], and so on, until the onset of this behavior is detected. Times in the range [0, 100] are obtained in steps of 10 beginning at n = 0.

Algorithm 1.17 exhibits its worst-case behavior when x is chosen such that it is not one of the a[i]'s. For definiteness, we set a[i] = i, 1 ≤ i ≤ n, and x = 0. At this time, we envision using an algorithm such as Algorithm 1.18 to obtain the worst-case times.

1   Algorithm TimeSearch()
2   {
3       for j := 1 to 1000 do a[j] := j;
4       for j := 1 to 10 do
5       {
6           n[j] := 10 * (j - 1); n[j + 10] := 100 * j;
7       }
8       for j := 1 to 20 do
9       {
10          h := GetTime();
11          k := SeqSearch(a, 0, n[j]);
12          h1 := GetTime();
13          t := h1 - h;
14          write (n[j], t);
15      }
16  }

Algorithm 1.18 Algorithm to time Algorithm 1.17

The timing results of this algorithm are summarized in Table 1.9. The times obtained are too small to be of any use to us. Most of the times are zero; this indicates that the precision of our clock is inadequate. The nonzero times are just noise and are not representative of the time taken.

Page 65: Sahni

1.3.PERFORMANCEANALYSIS 43

n      time        n       time
0        0         100       0
10       0         200       0
20       0         300       1
30       0         400       0
40       0         500       1
50       0         600       0
60       0         700       0
70       0         800       1
80       0         900       0
90       0         1000      0

Table 1.9 Timing results of Algorithm 1.18. Times are in milliseconds.

To time a short event, it is necessary to repeat it several times and divide the total time for the event by the number of repetitions.

Since our clock has an accuracy of about one-tenth of a second, we should not attempt to time any single event that takes less than about one second. With an event time of at least ten seconds, we can expect our observed times to be accurate to one percent.

The body of Algorithm 1.18 needs to be changed to that of Algorithm 1.19. In this algorithm, r[i] is the number of times the search is to be repeated when the number of elements in the array is n[i]. Notice that rearranging the timing statements as in Algorithm 1.20 or 1.21 does not produce the desired results. For instance, from the data of Table 1.9, we expect that with the structure of Algorithm 1.20, the value output for n = 0 will still be 0. This is because there is a chance that in every iteration of the for loop, the clock does not change between the two times GetTime() is called. With the structure of Algorithm 1.21, we expect the algorithm never to exit the while loop when n = 0 (in reality, the loop will be exited because occasionally the measured time will turn out to be a few milliseconds).

Yet another alternative is shown in Algorithm 1.22. This approach can be expected to yield satisfactory times. It cannot be used when the timing procedure available gives us only the time since the last invocation of GetTime. Another difficulty is that the measured time includes the time needed to read the clock. For small n, this time may be larger than the time to run SeqSearch. This difficulty can be overcome by determining the time taken by the timing procedure and subtracting this time later.

Page 66: Sahni

44 CHAPTER1. INTRODUCTION

1   Algorithm TimeSearch()
2   {
3       // Repetition factors
4       r[21] := {0, 200000, 200000, 150000, 100000, 100000, 100000,
5                 50000, 50000, 50000, 50000, 50000, 50000, 50000, 50000,
6                 50000, 50000, 25000, 25000, 25000, 25000};
7       for j := 1 to 1000 do a[j] := j;
8       for j := 1 to 10 do
9       {
10          n[j] := 10 * (j - 1); n[j + 10] := 100 * j;
11      }
12      for j := 1 to 20 do
13      {
14          h := GetTime();
15          for i := 1 to r[j] do k := SeqSearch(a, 0, n[j]);
16          h1 := GetTime();
17          t1 := h1 - h;
18          t := t1; t := t/r[j];
19          write (n[j], t1, t);
20      }
21  }

Algorithm 1.19 Timing algorithm

1   t := 0;
2   for i := 1 to r[j] do
3   {
4       h := GetTime();
5       k := SeqSearch(a, 0, n[j]);
6       h1 := GetTime();
7       t := t + h1 - h;
8   }
9   t := t/r[j];

Algorithm 1.20 Improper timing construct

Page 67: Sahni

1.3.PERFORMANCEANALYSIS 45

1   t := 0;
2   while (t < DESIRED_TIME) do
3   {
4       h := GetTime();
5       k := SeqSearch(a, 0, n[j]);
6       h1 := GetTime();
7       t := t + h1 - h;
8   }

Algorithm 1.21 Another improper timing construct

1   h := GetTime(); t := 0;
2   while (t < DESIRED_TIME) do
3   {
4       k := SeqSearch(a, 0, n[j]);
5       h1 := GetTime();
6       t := h1 - h;
7   }

Algorithm 1.22 An alternate timing construct
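In C++, the construct of Algorithm 1.22 might look as follows. This is an added sketch, not part of the original text: DESIRED_TIME, the array contents, and the helper names are stand-ins, and a repetition counter has been added so that a per-search time can be reported at the end.

    #include <chrono>
    #include <iostream>
    #include <vector>

    long long now_ms()
    {
        using namespace std::chrono;
        return duration_cast<milliseconds>(
                   steady_clock::now().time_since_epoch()).count();
    }

    // Worst-case sequential search: x is absent, so all n elements are examined.
    int seq_search(const std::vector<int>& a, int x, int n)
    {
        int i = n - 1;
        while (i >= 0 && a[i] != x) i--;
        return i;
    }

    int main()
    {
        const long long DESIRED_TIME = 1000;   // clock the event for about one second
        int n = 1000;
        std::vector<int> a(n);
        for (int i = 0; i < n; i++) a[i] = i + 1;
        long long reps = 0, h = now_ms(), t = 0;
        while (t < DESIRED_TIME) {             // repeat the short event until enough
            seq_search(a, 0, n);               // time has accumulated
            reps++;
            t = now_ms() - h;
        }
        std::cout << "time per search: " << double(t) / reps << " ms\n";
        return 0;
    }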

Page 68: Sahni

46 CHAPTER1. INTRODUCTION

The timing results of Algorithm 1.19 are given in Table 1.10. The times for n in the range [0, 1000] are plotted in Figure 1.4. Values in the range [10, 100] have not been plotted. The linear dependence of the worst-case time on n is apparent from this graph.

n      t1      t          n       t1      t
0       308   0.002       100     1683   0.034
10      923   0.005       200     3359   0.067
20     1181   0.008       300     4693   0.094
30     1087   0.011       400     6323   0.126
40     1384   0.014       500     7799   0.156
50     1691   0.017       600     9310   0.186
60      999   0.020       700     5419   0.217
70     1156   0.023       800     6201   0.248
80     1306   0.026       900     6994   0.280
90     1460   0.029       1000    7725   0.309

Times are in milliseconds

Table 1.10 Worst-case run times for Algorithm 1.17

The graph of Figure 1.4 can be used to predict the run time for other values of n. We can go one step further and get the equation of the straight line. The equation of this line is t = c + mn, where m is the slope and c the value for n = 0. From the graph, we see that c = 0.002. Using the point n = 600 and t = 0.186, we obtain m = (t - c)/n = 0.184/600 = 0.0003067. So the line of Figure 1.4 has the equation t = 0.002 + 0.0003067n, where t is the time in milliseconds. From this, we expect that when n = 1000, the worst-case search time will be 0.3087 millisecond, and when n = 500, it will be 0.155 millisecond. Compared to the observed times of Table 1.10, we see that these figures are very accurate!

Figure 1.4 Plot of the data in Table 1.10 (worst-case time t in milliseconds versus n)

Summary of Running Time Calculation

To obtain the run time of a program, we need to plan the experiment. The following issues need to be addressed during the planning stage:

1. What is the accuracy of the clock? How accurate do our results have to be? Once the desired accuracy is known, we can determine the length of the shortest event that should be timed.

2. For each instance size, a repetition factor needs to be determined. This is to be chosen such that the event time is at least the minimum time that can be clocked with the desired accuracy.

3. Are we measuring worst-case or average performance? Suitable test data need to be generated.

4. What is the purpose of the experiment? Are the times being obtained for comparative purposes, or are they to be used to predict run times? If the latter is the case, then contributions to the run time from such sources as the repetition loop and data generation need to be subtracted (in case they are included in the measured time). If the former is the case, then these times need not be subtracted (provided they are the same for all programs being compared).

5. In case the times are to be used to predict run times, then we need to fit a curve through the points. For this, the asymptotic complexity should be known. If the asymptotic complexity is linear, then a least-squares straight line can be fit; if it is quadratic, then a parabola can be used (that is, t = a0 + a1 n + a2 n^2). If the complexity is O(n log n), then a least-squares curve of the form t = a0 + a1 n + a2 n log2 n can be fit. When obtaining the least-squares approximation, one should discard data corresponding to small values of n, since the program does not exhibit its asymptotic behavior for these n. (A small sketch of such a straight-line fit follows this list.)
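The following C++ sketch (added here for illustration) fits a least-squares line t = c + m*n to the n ≥ 100 measurements of Table 1.10; the slope and intercept it produces can be compared with the values read off the graph above.

    #include <iostream>
    #include <vector>

    int main()
    {
        // (n, t) pairs taken from the n >= 100 rows of Table 1.10.
        std::vector<double> n = {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000};
        std::vector<double> t = {0.034, 0.067, 0.094, 0.126, 0.156,
                                 0.186, 0.217, 0.248, 0.280, 0.309};
        double sn = 0, st = 0, snn = 0, snt = 0;
        int k = (int)n.size();
        for (int i = 0; i < k; i++) {
            sn += n[i]; st += t[i]; snn += n[i] * n[i]; snt += n[i] * t[i];
        }
        double m = (k * snt - sn * st) / (k * snn - sn * sn);  // least-squares slope
        double c = (st - m * sn) / k;                          // intercept
        std::cout << "t = " << c << " + " << m << " * n\n";
        return 0;
    }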

Generating Test Data

Generating a data set that results in the worst-case performance of an algorithm is not always easy. In some cases, it is necessary to use a computer program to generate the worst-case data. In other cases, even this is very difficult. In these cases, another approach to estimating worst-case performance is taken. For each set of values of the instance characteristics of interest, we generate a suitably large number of random test data. The run times for each of these test data are obtained. The maximum of these times is used as an estimate of the worst-case time for this set of values of the instance characteristics.

To measure average-case times, it is usually not possible to average over all possible instances of a given characteristic. Although it is possible to do this for sequential search, it is not possible for a sort algorithm. If we assume that all keys are distinct, then for any given n, n! different permutations need to be used to obtain the average time. Obtaining average-case data is usually much harder than obtaining worst-case data. So, we often adopt the strategy outlined above and simply obtain an estimate of the average time on a suitable set of test data.

Page 71: Sahni

1.3.PERFORMANCEANALYSIS 49

Whether we are estimating worst-case or average time using random data, the number of instances that we can try is generally much smaller than the total number of such instances. Hence, it is desirable to analyze the algorithm being tested to determine classes of data that should be generated for the experiment. This is a very algorithm-specific task, and we do not go into it here.

EXERCISES

1. Compare the two functions n^2 and 2^n/4 for various values of n. Determine when the second becomes larger than the first.

2. Prove by induction:

   (a) Σ_{i=1}^{n} i = n(n + 1)/2, n ≥ 1
   (b) Σ_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6, n ≥ 1
   (c) Σ_{i=0}^{n} x^i = (x^{n+1} - 1)/(x - 1), x ≠ 1, n ≥ 0

3. Determine the frequency counts for all statements in the following two algorithm segments:

   (a)
       1   for i := 1 to n do
       2       for j := 1 to i do
       3           for k := 1 to j do
       4               x := x + 1;

   (b)
       1   i := 1;
       2   while (i ≤ n) do
       3   {
       4       x := x + 1;
       5       i := i + 1;
       6   }

4. (a) Introduce statements to increment count at all appropriate points in Algorithm 1.23.

   (b) Simplify the resulting algorithm by eliminating statements. The simplified algorithm should compute the same value for count as computed by the algorithm of part (a).

   (c) What is the exact value of count when the algorithm terminates? You may assume that the initial value of count is 0.

   (d) Obtain the step count for Algorithm 1.23 using the frequency method. Clearly show the step count table.

5. Do Exercise 4 for Transpose (Algorithm 1.24).

6. Do Exercise 4 for Algorithm 1.25. This algorithm multiplies two n × n matrices a and b.

Page 72: Sahni

50 CHAPTER1. INTRODUCTION

1   Algorithm D(x, n)
2   {
3       i := 1;
4       repeat
5       {
6           x[i] := x[i] + 2; i := i + 2;
7       } until (i > n);
8       i := 1;
9       while (i ≤ ⌊n/2⌋) do
10      {
11          x[i] := x[i] + x[i + 1]; i := i + 1;
12      }
13  }

Algorithm 1.23 Example algorithm

1   Algorithm Transpose(a, n)
2   {
3       for i := 1 to n - 1 do
4           for j := i + 1 to n do
5           {
6               t := a[i, j]; a[i, j] := a[j, i]; a[j, i] := t;
7           }
8   }

Algorithm 1.24 Matrix transpose

Page 73: Sahni

1.3.PERFORMANCE ANALYSIS 51

1   Algorithm Mult(a, b, c, n)
2   {
3       for i := 1 to n do
4           for j := 1 to n do
5           {
6               c[i, j] := 0;
7               for k := 1 to n do
8                   c[i, j] := c[i, j] + a[i, k] * b[k, j];
9           }
10  }

Algorithm 1.25 Matrix multiplication

7. (a) Do Exercise 4 for Algorithm 1.26. This algorithm multiplies two matrices a and b, where a is an m × n matrix and b is an n × p matrix.

    1   Algorithm Mult(a, b, c, m, n, p)
    2   {
    3       for i := 1 to m do
    4           for j := 1 to p do
    5           {
    6               c[i, j] := 0;
    7               for k := 1 to n do
    8                   c[i, j] := c[i, j] + a[i, k] * b[k, j];
    9           }
    10  }

    Algorithm 1.26 Matrix multiplication

   (b) Under what conditions is it profitable to interchange the two outermost for loops?

8. Show that the following equalities are correct:

   (a) 5n^2 - 6n = O(n^2)
   (b) n! = O(n^n)
   (c) 2n^2 2^n + n log n = O(n^2 2^n)
   (d) Σ_{i=0}^{n} i^2 = Θ(n^3)


   (e) Σ_{i=0}^{n} i^3 = Θ(n^4)
   (f) n * 2^n + 6 * 2^n = O(n * 2^n)
   (g) n^3 + 10^6 n^2 = O(n^3)
   (h) 6n^3/(log n + 1) = O(n^3)
   (i) n^{1.001} + n log n = O(n^{1.001})
   (j) n^{k+e} + n^k log n = O(n^{k+e}) for all fixed k and e, k ≥ 0 and e > 0
   (k) 10n^3 + 15n^4 + 100n^2 2^n = O(100n^2 2^n)
   (l) 33n^3 + 4n^2 = Ω(n^2)
   (m) 33n^3 + 4n^2 = Ω(n^3)

9. Show that the following equalities are incorrect:

   (a) 10n^2 + 9 = O(n)
   (b) n^2 log n = Θ(n^2)
   (c) n^2/log n = Θ(n^2)
   (d) n^3 2^n + 6n^2 3^n = O(n^3 2^n)

10. Prove Theorems 1.3 and 1.4.

11. Analyze the computing time of SelectionSort (Algorithm 1.2).

12. Obtain worst-case run times for SelectionSort (Algorithm 1.2). Do this for suitable values of n in the range [0, 100]. Your report must include a plan for the experiment as well as the measured times. These times are to be provided both in a table and as a graph.

13. Consider the algorithm Add (Algorithm 1.11).

   (a) Obtain run times for n = 1, 10, 20, ..., 100.
   (b) Plot the times obtained in part (a).

14. Do the previous exercise for matrix multiplication (Algorithm 1.26).

15. A complex-valued matrix X is represented by a pair of matrices (A, B), where A and B contain real values. Write an algorithm that computes the product of two complex-valued matrices (A, B) and (C, D), where (A, B) * (C, D) = (A + iB) * (C + iD) = (AC - BD) + i(AD + BC). Determine the number of additions and multiplications if the matrices are all n × n.


1.4 RANDOMIZED ALGORITHMS

1.4.1 Basics of Probability Theory

Probability theory has the goal of characterizing the outcomes of natural or conceptual "experiments." Examples of such experiments include tossing a coin ten times, rolling a die three times, playing a lottery, gambling, picking a ball from an urn containing white and red balls, and so on.

Each possible outcome of an experiment is called a sample point and the set of all possible outcomes is known as the sample space S. In this text we assume that S is finite (such a sample space is called a discrete sample space). An event E is a subset of the sample space S. If the sample space consists of n sample points, then there are 2^n possible events.

Example 1.19 [Tossing three coins] When a coin is tossed, there are two possible outcomes: heads (H) and tails (T). Consider the experiment of throwing three coins. There are eight possible outcomes: HHH, HHT, HTH, HTT, THH, THT, TTH, and TTT. Each such outcome is a sample point. The sets {HHT, HTT, TTT}, {HHH, TTT}, and { } are three possible events. The third event has no sample points and is the empty set. For this experiment there are 2^8 possible events. □

Definition 1.9 [Probability] The probability of an event E is defined to be |E|/|S|, where S is the sample space. □

Example 1.20 [Tossing three coins] The probability of the event {HHT, HTT, TTT} is 3/8. The probability of the event {HHH, TTT} is 2/8 and that of the event { } is zero. □

Note that the probability of S, the sample space, is 1.

Example 1.21 [Rolling two dice] Let us look at the experiment of rolling two (six-faced) dice. There are 36 possible outcomes, some of which are (1,1), (1,2), (1,3), and so on. What is the probability that the sum of the two faces is 10? The event that the sum is 10 consists of the following sample points: (4,6), (5,5), and (6,4). Therefore, the probability of this event is 3/36 = 1/12. □

Definition 1.10 [Mutual exclusion] Two events E1 and E2 are said to be mutually exclusive if they do not have any common sample points, that is, if E1 ∩ E2 = ∅. □


Example 1.22 [Tossing three coins] When we toss three coins, let E1 be the event that there are two H's and let E2 be the event that there are at least two T's. These two events are mutually exclusive since there are no common sample points. On the other hand, if E2 is defined to be the event that there is at least one T, then E1 and E2 will not be mutually exclusive since they will have THH, HTH, and HHT as common sample points. □

The probability of event E is denoted as Prob.[E]. The complement of E, denoted Ē, is defined to be S - E. If E1 and E2 are two events, the probability of E1 or E2 or both happening is denoted as Prob.[E1 ∪ E2]. The probability of both E1 and E2 occurring at the same time is denoted as Prob.[E1 ∩ E2]. The corresponding event is E1 ∩ E2.

Theorem 1.5

1. Prob.[Ē] = 1 - Prob.[E].

2. Prob.[E1 ∪ E2] = Prob.[E1] + Prob.[E2] - Prob.[E1 ∩ E2]
                  ≤ Prob.[E1] + Prob.[E2]                                □

Definition 1.11 [Conditional probability] Let E1 and E2 be any two events of an experiment. The conditional probability of E1 given E2, denoted by Prob.[E1|E2], is defined as Prob.[E1 ∩ E2]/Prob.[E2]. □

Example 1.23 [Tossing four coins] Consider the experiment of tossing four coins. Let E1 be the event that the number of H's is even and let E2 be the event that there is at least one H. Then, E2 is the complement of the event that there are no H's. The probability of no H's is 1/16. Therefore, Prob.[E2] = 1 - 1/16 = 15/16. Prob.[E1 ∩ E2] is 7/16 since the event E1 ∩ E2 has the seven sample points HHHH, HHTT, HTHT, HTTH, THHT, THTH, and TTHH. Thus, Prob.[E1|E2] is (7/16)/(15/16) = 7/15. □

Definition 1.12 [Independence] Two events E1 and E2 are said to be independent if Prob.[E1 ∩ E2] = Prob.[E1] * Prob.[E2]. □

Example 1.24 [Rolling a die twice] Intuitively, we say two events E1 and E2 are independent if the probability of one event happening is in no way affected by the occurrence of the other event. In other words, if Prob.[E1|E2] = Prob.[E1], these two events are independent. Suppose we roll a die twice. What is the probability that the outcome of the second roll is 5 (call this event E1), given that the outcome of the first roll is 4 (call this event E2)? The answer is 1/6 no matter what the outcome of the first roll is. In this case E1 and E2 are independent. Therefore, Prob.[E1 ∩ E2] = 1/6 * 1/6 = 1/36. □


Example 1.25 [Flipping a coin 100 times] If a coin is flipped 100 times, what is the probability that all of the outcomes are tails? The probability that the first outcome is T is 1/2. Since the outcome of the second flip is independent of the outcome of the first flip, the probability that the first two outcomes are T's can be obtained by multiplying the corresponding probabilities to get 1/4. Extending the argument to all 100 outcomes, we conclude that the probability of obtaining 100 T's is (1/2)^100. In this case we say the outcomes of the 100 coin flips are mutually independent. □

Definition 1.13 [Random variable] Let S be the sample space of an experiment. A random variable on S is a function that maps the elements of S to the set of real numbers. For any sample point s ∈ S, X(s) denotes the image of s under this mapping. If the range of X, that is, the set of values X can take, is finite, we say X is discrete.

Let the range of a discrete random variable X be {r1, r2, ..., rm}. Then, Prob.[X = ri], for any i, is defined to be the number of sample points whose image is ri divided by the number of sample points in S. In this text we are concerned mostly with discrete random variables. □

Example 1.26 We flip a coin four times. The sample space consists of 2^4 sample points. We can define a random variable X on S as the number of heads in the coin flips. For this random variable, then, X(HTHH) = 3, X(HHHH) = 4, and so on. The possible values that X can take are 0, 1, 2, 3, and 4. Thus X is discrete. Prob.[X = 0] is 1/16, since the only sample point whose image is 0 is TTTT. Prob.[X = 1] is 4/16, since the four sample points HTTT, THTT, TTHT, and TTTH have 1 as their image. □

Definition 1.14 [Expected value] If the sample space of an experiment is S = {s1, s2, ..., sn}, the expected value or the mean of any random variable X is defined to be

    Σ_{i=1}^{n} Prob.[si] * X(si) = (1/n) Σ_{i=1}^{n} X(si)              □

Example 1.27 [Coin tosses] The sample space corresponding to the experiment of tossing three coins is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. If X is the number of heads in the coin flips, then the expected value of X is (1/8)(3 + 2 + 2 + 1 + 2 + 1 + 1 + 0) = 1.5. □

Definition 1.15 [Probability distribution] Let X be a discrete random variable defined over the sample space S. Let {r1, r2, ..., rm} be its range. Then, the probability distribution of X is the sequence Prob.[X = r1], Prob.[X = r2], ..., Prob.[X = rm]. Notice that Σ_{i=1}^{m} Prob.[X = ri] = 1. □


Example 1.28 [Coin tosses] If a coin is flipped three times and X is the number of heads, then X can take on four values, 0, 1, 2, and 3. The probability distribution of X is given by Prob.[X = 0] = 1/8, Prob.[X = 1] = 3/8, Prob.[X = 2] = 3/8, and Prob.[X = 3] = 1/8. □

Definition 1.16 [Binomial distribution] A Bernoulli trial is an experiment that has two possible outcomes, namely, success and failure. The probability of success is p. Consider the experiment of conducting the Bernoulli trial n times. This experiment has a sample space S with 2^n sample points. Let X be a random variable on S defined to be the number of successes in the n trials. The variable X is said to have a binomial distribution with parameters (n, p). The expected value of X is np. Also,

    Prob.[X = i] = (n choose i) p^i (1 - p)^{n-i}                         □

In several applications, it is necessary to estimate the probabilities at the tail ends of probability distributions. One such estimate is provided by the following lemma.

Lemma 1.1 [Markov's inequality] If X is any nonnegative random variable whose mean is μ, then

    Prob.[X ≥ x] ≤ μ/x                                                    □

Example 1.29 Let μ be the mean of a random variable X. We can use Markov's lemma (also called Markov's inequality) to make the following statement: "The probability that the value of X exceeds 2μ is ≤ 1/2." Consider the example: if we toss a coin 1000 times, what is the probability that the number of heads is ≥ 600? If X is the number of heads in 1000 tosses, then the expected value of X, E[X], is 500. Applying Markov's inequality with x = 600 and μ = 500, we infer that P[X ≥ 600] ≤ 5/6. □

ThoughMarkov's inequality can be appliedto any nonnegativerandomvariable,it is ratherweak. We can obtain tighter boundsfor a numberofimportantdistributionsincludingthe binomialdistribution.Theseboundsaredue to Chernoff. Chernoffboundsas appliedto the binomialdistributionareemployed in this text to analyze randomizedalgorithms.


Lemma 1.2 [Chernoff bounds] If X is a binomial random variable with parameters (n, p), and m > np is an integer, then

Prob.[X ≥ m] ≤ (np/m)^m e^(m − np)                         (1.1)

Also,    Prob.[X ≤ ⌊(1 − ε)np⌋] ≤ e^(−ε²np/2)              (1.2)
and      Prob.[X ≥ ⌈(1 + ε)np⌉] ≤ e^(−ε²np/3)              (1.3)

for all 0 < ε < 1. □

Example 1.30 Consider the experiment of tossing a coin 1000 times. We want to determine the probability that the number X of heads is ≥ 600. We can use Equation 1.3 to estimate this probability. The value for ε here is 0.2. Also, n = 1000 and p = 1/2. Equation 1.3 now becomes

Prob.[X ≥ 600] ≤ e^(−(0.2)²(500/3)) = e^(−20/3) ≤ 0.001273

This estimate is more precise than that given by Markov's inequality. □
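To see the difference between the two bounds concretely, the following C++ fragment (our illustrative sketch, not from the original text) evaluates the Markov bound of Example 1.29 and the Chernoff bound of Example 1.30 for the same event.

#include <cmath>
#include <iostream>

int main()
{
    double n = 1000.0, p = 0.5;       // 1000 tosses of a fair coin
    double np = n * p;                // expected number of heads = 500
    double x = 600.0, eps = 0.2;      // 600 = (1 + 0.2) * 500

    double markov   = np / x;                          // Markov: <= 5/6
    double chernoff = std::exp(-eps * eps * np / 3.0); // Equation 1.3

    std::cout << "Markov bound:   " << markov   << "\n";  // 0.8333...
    std::cout << "Chernoff bound: " << chernoff << "\n";  // about 0.00127
    return 0;
}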

1.4.2 Randomized Algorithms: An Informal Description

A randomized algorithm is one that makes use of a randomizer (such as a random number generator). Some of the decisions made in the algorithm depend on the output of the randomizer. Since the output of any randomizer might differ in an unpredictable way from run to run, the output of a randomized algorithm could also differ from run to run for the same input. The execution time of a randomized algorithm could also vary from run to run for the same input.

Randomizedalgorithmscan be categorizedinto two classes:The firstis algorithmsthat always producethe same(correct)output for the sameinput. ThesearecalledLas Vegasalgorithms.Theexecutiontimeof a LasVegasalgorithmdependson the output of the randomizer.If we are lucky,the algorithmmight terminatefast, and if not, it might run for a longerperiodof time. In generalthe executiontimeof a LasVegas algorithmischaracterizedas a randomvariable (seeSection1.4.1for a definition).Thesecondis algorithmswhoseoutputsmight differ from run to run (for the sameinput). ThesearecalledMonte Carloalgorithms.Considerany problemforwhich thereareonly two possibleanswers,say,yes and no.If a Monte Carloalgorithmis employedto solvesuchaproblem,thenthe algorithmmight giveincorrectanswers dependingon the output of the randomizer.We requirethat the probability of an incorrectanswer from a Monte Carloalgorithmbelow. Typically, for a fixed input, a Monte Carloalgorithmdoesnot display


much variation in executiontimebetweenruns,whereasin the caseof a LasVegasalgorithmthis variation is significant.

We can think of a randomized algorithm with one possible randomizer output to be different from the same algorithm with a different possible randomizer output. Therefore, a randomized algorithm can be viewed as a family of algorithms. For a given input, some of the algorithms in this family may run for indefinitely long periods of time (or may give incorrect answers). The objective in the design of a randomized algorithm is to ensure that the number of such bad algorithms in the family is only a small fraction of the total number of algorithms. If for any input we can show that at least a 1 − ε (ε being very close to 0) fraction of algorithms in the family will run quickly (respectively give the correct answer) on that input, then clearly, a random algorithm in the family will run quickly (or output the correct answer) on any input with probability ≥ 1 − ε. In this case we say that this family of algorithms (or this randomized algorithm) runs quickly (respectively gives the correct answer) with probability at least 1 − ε, where ε is called the error probability.

Definition 1.17 [The Õ()] Just as the O() notation is used to characterize the run times of nonrandomized algorithms, Õ() is used for characterizing the run times of Las Vegas algorithms. We say a Las Vegas algorithm has a resource (time, space, and so on) bound of Õ(g(n)) if there exists a constant c such that the amount of resource used by the algorithm (on any input of size n) is no more than cαg(n) with probability ≥ 1 − n^(−α). We shall refer to these bounds as high probability bounds.

Similar definitions apply also to such functions as Θ̃(), Ω̃(), õ(), etc. □

Definition 1.18 [High probability] By high probability we mean a probability of ≥ 1 − n^(−α) for any fixed α. We call α the probability parameter. □

As mentionedabove,the run timeT of any Las Vegasalgorithmistypically characterizedasa randomvariable over a samplespaceS.Thesamplepoints of S are allpossibleoutcomesfor the randomizerused in thealgorithm. Though it is desirableto obtain the distributionof T, often this isa challengingand unnecessarytask. Theexpectedvalue of T often sufficesas a goodindicatorof the run time. We can do better than obtainingthemeanof T but short of computingthe exactdistributionby obtainingthehighprobability bounds.Thehighprobability boundsof our interestareofthe form \"With highprobability the value of T will not exceedTo,\" for someappropriateTo.

Severalresultsfrom probability theory can be employed to obtain highprobability boundson any randomvariable.Two of the moreuseful suchresultsareMarkov's inequality and Chernoffbounds.


Next we give two examples of randomized algorithms. The first is of the Las Vegas type and the second is of the Monte Carlo type. Other examples are presented throughout the text. We say a Monte Carlo (Las Vegas) algorithm has failed if it does not give a correct answer (terminate within a specified amount of time).

1.4.3 Identifying the Repeated Element

Consider an array a[ ] of n numbers that has n/2 distinct elements and n/2 copies of another element. The problem is to identify the repeated element.

Any deterministic algorithm for solving this problem will need at least n/2 + 2 time steps in the worst case. This fact can be argued as follows: Consider an adversary who has perfect knowledge about the algorithm used and who is in charge of selecting the input for the algorithm. Such an adversary can make sure that the first n/2 + 1 elements examined by the algorithm are all distinct. Even after having looked at n/2 + 1 elements, the algorithm will not be in a position to infer the repeated element. It will have to examine at least n/2 + 2 elements and hence take at least n/2 + 2 time steps.

In contrast there is a simple and elegant randomized Las Vegas algorithm that takes only Õ(log n) time. It randomly picks two array elements and checks whether they come from two different cells and have the same value. If they do, the repeated element has been found. If not, this basic step of sampling is repeated as many times as it takes to identify the repeated element.

In this algorithm, the sampling performed is with repetitions; that is, the first and second elements are randomly picked from out of the n elements (each element being equally likely to be picked). Thus there is a probability (equal to 1/n) that the same array element is picked each time. If we just check for the equality of the two elements picked, our answer might be incorrect (in case the algorithm picked the same array index each time). Therefore, it is essential to make sure that the two array indices picked are different and the two array cells contain the same value.

This algorithm is given in Algorithm 1.27. The algorithm returns the array index of one of the copies of the repeated element. Now we prove that the run time of the above algorithm is Õ(log n). Any iteration of the while loop will be successful in identifying the repeated number if i is any one of the n/2 array indices corresponding to the repeated element and j is any one of the same n/2 indices other than i. In other words, the probability that the algorithm quits in any given iteration of the while loop is P = (n/2)((n/2) − 1)/n², which is ≥ 1/5 for all n ≥ 10. This implies that the probability that the algorithm does not quit in a given iteration is ≤ 4/5.


RepeatedElement(a, n)
// Finds the repeated element from a[1 : n].
{
    while (true) do
    {
        i := Random() mod n + 1; j := Random() mod n + 1;
        // i and j are random numbers in the range [1, n].
        if ((i ≠ j) and (a[i] = a[j])) then return i;
    }
}

Algorithm 1.27 Identifying the repeated array number
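For readers who prefer a compilable version, here is a C++ rendering of Algorithm 1.27 (a sketch under our own conventions: 0-based indexing and the standard rand() routine as the randomizer; it is not the book's code).

#include <cstdlib>

// Returns an index of one copy of the repeated element of a[0 : n-1].
// Assumes the input really contains an element with n/2 copies;
// otherwise the loop would not terminate.
int RepeatedElement(const int a[], int n)
{
    for (;;) {
        int i = std::rand() % n;   // random index in [0, n-1]
        int j = std::rand() % n;   // second, independent random index
        if (i != j && a[i] == a[j]) return i;
    }
}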

Therefore, the probability that the algorithm does not quit in 10 iterations is ≤ (4/5)^10 ≤ .1074. So, Algorithm 1.27 will terminate in 10 iterations or less with probability ≥ .8926. The probability that the algorithm does not terminate in 100 iterations is ≤ (4/5)^100 ≤ 2.04 * 10^(−10). That is, almost certainly the algorithm will quit in 100 iterations or less. If n equals 2 * 10^6, for example, any deterministic algorithm will have to spend at least one million time steps, as opposed to the 100 iterations of Algorithm 1.27!

In general, the probability that the algorithm does not quit in the first cα log n (c is a constant to be fixed) iterations is

≤ (4/5)^(cα log n) = n^(−cα log(5/4))

which will be ≤ n^(−α) if we pick c ≥ 1/log(5/4).

Thus the algorithm terminates in (1/log(5/4)) α log n iterations or less with probability ≥ 1 − n^(−α). Since each iteration of the while loop takes O(1) time, the run time of the algorithm is Õ(log n).

Note that this algorithm,if it terminates,will alwaysoutput the correctanswer and henceis of the Las Vegastype. Theabove analysis shows thatthe algorithmwill terminatequickly with highprobability.

The same problem of inferring the repeated element can be solved using many deterministic algorithms. For example, sorting the array is one way. But sorting takes Ω(n log n) time (proved in Chapter 10). An alternative is to partition the array into ⌈n/3⌉ parts, where each part (possibly except for one part) has three array elements, and to search the individual parts for the repeated element. At least one of the parts will have two copies of the repeated element. (Prove this!) The run time of this algorithm is O(n).

1.4.4 Primality Testing

Any integer greater than one is said to be a prime if its only divisors are 1 and the integer itself. By convention, we take 1 to be a nonprime. Then 2, 3, 5, 7, 11, and 13 are the first six primes. Given an integer n, the problem of deciding whether n is a prime is known as primality testing. It has a number of applications including cryptology.

If a number n is composite (i.e., nonprime), it must have a divisor ≤ ⌊√n⌋. This observation leads to the following simple algorithm for primality testing: Consider each number ℓ in the interval [2, ⌊√n⌋] and check whether ℓ divides n. If none of these numbers divides n, then n is prime; otherwise it is composite.

Assuming that it takes O(1) time to determine whether one integer divides another, the naive primality testing algorithm has a run time of O(√n). The input size for this problem is ⌊log n⌋ + 1, since n can be represented in binary form with these many bits. Thus the run time of this simple algorithm is exponential in the input size (notice that √n = 2^((log n)/2)).

We can devise a Monte Carlo randomized algorithm for primality testing that runs in time O((log n)²). The output of this algorithm is correct with high probability. If the input is prime, the algorithm never gives an incorrect answer. However, if the input number is composite (i.e., nonprime), then there is a small probability that the answer may be incorrect. Algorithms of this kind are said to have one-sided error.

Beforepresentingfurther details,we list two theoremsfrom numbertheory that will serveas the backboneof the algorithm.The proofs of thesetheoremscan be found in the referencessuppliedat the end of this chapter.

Theorem 1.6 [Fermat] If n is prime, then a^(n−1) = 1 (mod n) for any integer a, 1 ≤ a < n. □

Theorem 1.7 The equation x² = 1 (mod n) has exactly two solutions, namely 1 and n − 1, if n is prime. □

Corollary 1.1 If the equation x² = 1 (mod n) has roots other than 1 and n − 1, then n is composite. □

Note: Any integer x which is neither 1 nor n − 1 but which satisfies x² = 1 (mod n) is said to be a nontrivial square root of 1 modulo n.

Fermat's theorem suggests the following algorithm for primality testing: Randomly choose an a < n and check whether a^(n−1) = 1 (mod n) (call this Fermat's equation). If Fermat's equation is not satisfied, n is composite. If the equation is satisfied, we try some more random a's. If on each a tried, Fermat's equation is satisfied, we output "n is prime"; otherwise we output "n is composite." In order to compute a^(n−1) mod n, we could employ Exponentiate (Algorithm 1.16) with some minor modifications. The resultant primality testing algorithm is given as Algorithm 1.28. Here large is a number sufficiently large that ensures a probability of correctness of ≥ 1 − n^(−α).

Prime0(n, α)
// Returns true if n is a prime and false otherwise.
// α is the probability parameter.
{
    q := n − 1;
    for i := 1 to large do // Specify large.
    {
        m := q; y := 1;
        a := Random() mod q + 1;
        // Choose a random number in the range [1, n − 1].
        z := a;
        // Compute a^(n−1) mod n.
        while (m > 0) do
        {
            while (m mod 2 = 0) do
            {
                z := z² mod n; m := ⌊m/2⌋;
            }
            m := m − 1; y := (y * z) mod n;
        }
        if (y ≠ 1) then return false;
        // If a^(n−1) mod n is not 1, n is not a prime.
    }
    return true;
}

Algorithm 1.28 Primality testing: first attempt
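A compilable C++ sketch of this first attempt is given below. It is our illustration, not the book's code: LARGE plays the role of large, rand() is used as the randomizer, and repeated squaring takes the place of Exponentiate. Like Prime0, it applies only Fermat's equation, so Carmichael numbers (discussed next) can fool it; it also assumes n is small enough that (n − 1) * (n − 1) fits in a long long.

#include <cstdlib>

// Computes a^e mod n by repeated squaring.
long long powMod(long long a, long long e, long long n)
{
    long long result = 1 % n;
    a %= n;
    while (e > 0) {
        if (e & 1) result = (result * a) % n;
        a = (a * a) % n;
        e >>= 1;
    }
    return result;
}

// Fermat test: returns false if n is certainly composite; returns true
// if n passed LARGE random trials of Fermat's equation.
bool Prime0(long long n, int LARGE = 50)
{
    if (n < 4) return (n == 2 || n == 3);
    for (int i = 0; i < LARGE; ++i) {
        long long a = 2 + std::rand() % (n - 3);  // crude choice of a in [2, n-2]
        if (powMod(a, n - 1, n) != 1) return false;
    }
    return true;
}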

If the input is prime,Algorithm 1.28will never output an incorrectanswer. If n is composite,will Fermat'sequationnever be satisfied for any alessthan n and greaterthan one? If so,the above algorithmhas to examinejust one a beforecomingup with the correctanswer. Unfortunately, the


answer to this questionis no. Even if n is composite,Fermat'sequationmaybe satisfied dependingon the a chosen.

Is it the case that for every n (that is composite) there will be some nonzero constant fraction of a's less than n that will not satisfy Fermat's equation? If the answer is yes and if the above algorithm tries a sufficiently large number of a's, there is a high probability that at least one a violating Fermat's equation will be found and hence the correct answer be output. Here again, the answer is no. There are composite numbers (known as Carmichael numbers) for which every a that is less than and relatively prime to n will satisfy Fermat's equation. (The number of a's that do not satisfy Fermat's equation need not be a constant fraction.) The numbers 561 and 1105 are examples of Carmichael numbers.

Fortunately, a slight modification of the above algorithm takes care of these problems. The modified primality testing algorithm (also known as Miller-Rabin's algorithm) is the same as Prime0 (Algorithm 1.28) except that within the body of Prime0, we also look for nontrivial square roots of 1 modulo n. The modified version is given in Algorithm 1.29. We assume that n is odd.

Miller-Rabin's algorithm will never give an incorrect answer if the input is prime, since Fermat's equation will always be satisfied and no nontrivial square root of 1 modulo n can be found. If n is composite, the above algorithm will detect the compositeness of n if the randomly chosen a either leads to the discovery of a nontrivial square root of 1 or violates Fermat's equation. Call any such a a witness to the compositeness of n. What is the probability that a randomly chosen a will be a witness to the compositeness of n? This question is answered by the following theorem (the proof can be found in the references at the end of this chapter).

Theorem 1.8 There are at least (n − 1)/2 witnesses to the compositeness of n if n is composite and odd. □

Assume that n is composite (since if n is prime, the algorithm will always be correct). The probability that a randomly chosen a will be a witness is ≥ (n − 1)/(2n), which is very nearly equal to 1/2. This means that a randomly chosen a will fail to be a witness with probability ≤ 1/2.

Therefore, the probability that none of the first α log n a's chosen is a witness is ≤ (1/2)^(α log n) = n^(−α). In other words, the algorithm Prime will give an incorrect answer with only probability ≤ n^(−α).

The run time of the outermost while loop is nearly the same as that of Exponentiate (Algorithm 1.16) and equal to O(log n). Since this while loop is executed O(log n) times, the run time of the whole algorithm is O(log² n).


5. Given a 2-sided coin. Using this coin, how will you simulate an n-sided coin

(a) when n is a power of 2?
(b) when n is not a power of 2?

6. Compute the run time analysis of the Las Vegas algorithm given in Algorithm 1.30 and express it using the Õ() notation.

Algorithm LasVegas()
{
    while (true) do
    {
        i := Random() mod 2;
        if (i > 1) then return;
    }
}

Algorithm 1.30 A Las Vegas algorithm

7. There are √n copies of an element in the array c. Every other element of c occurs exactly once. If the algorithm RepeatedElement is used to identify the repeated element of c, will the run time still be Õ(log n)? If so, why? If not, what is the new run time?

8. What is the minimum number of times that an element should be repeated in an array (the other elements of the array occurring exactly once) so that it can be found using RepeatedElement in Õ(log n) time?

9. An array a has n/2 copies of a particular unknown element x. Every other element in a has at most n/4 copies. Present an Õ(log n) time Monte Carlo algorithm to identify x. The answer should be correct with high probability. Can you develop an Õ(log n) time Las Vegas algorithm for the same problem?

10. Consider the naive Monte Carlo algorithm for primality testing presented in Algorithm 1.31. Here Power(x, y) computes x^y. What should be the value of t for the algorithm's output to be correct with high probability?

11. Let A be a Monte Carlo algorithm that solves a decision problem π in time T. The output of A is correct with probability ≥ 3/4. Show how


Prime1(n)
{
    // Specify t.
    for i := 1 to t do
    {
        m := Power(n, 0.5);
        j := Random() mod m + 2;
        if ((n mod j) = 0) then return false;
        // If j divides n, n is not prime.
    }
    return true;
}

Algorithm 1.31 Another primality testing algorithm

you can modify A so that its answer is correctwith high probability.Themodifiedversion can take O(Tlogn)time.

12. In general a Las Vegas algorithm is preferable to a Monte Carlo algorithm, since the answer given by the former is guaranteed to be correct. There may be critical situations in which even a very small probability of an incorrect answer is unacceptable. Say there is a Monte Carlo algorithm for solving a problem π in T1 time units whose output is correct with probability ≥ 1/2. Also assume that there is another algorithm that can check whether a given answer is valid for π in T2 time units. Show how you use these two algorithms to arrive at a Las Vegas algorithm for solving π in time Õ((T1 + T2) log n).

13. The problem considered here is that of searching for an element x in an array a[1 : n]. Algorithm 1.17 gives a deterministic O(n) time algorithm for this problem. Show that any deterministic algorithm will have to take Ω(n) time in the worst case for this problem. In contrast a randomized Las Vegas algorithm that searches for x is given in Algorithm 1.32. This algorithm assumes that x is in a[ ]. What is the Õ() run time of this algorithm?


Algorithm RSearch(a, x, n)
// Searches for x in a[1 : n]. Assume that x is in a[ ].
{
    while (true) do
    {
        i := Random() mod n + 1;
        // i is random in the range [1, n].
        if (a[i] = x) then return i;
    }
}

Algorithm 1.32 Randomized search

1.5 REFERENCES AND READINGS

For a more detailed discussion of performance analysis and measurement, see Software Development in Pascal, Third Edition, by S. Sahni, NSPAN Printing and Publishing, 1993.

Fora discussionon mathematicaltoolsfor analysis seeConcreteMathematics: A Foundationfor ComputerScience,by R. L.Graham,D.E.Knuth,and O.Patashnik,Addison-Wesley,1989.

More detailsabout the primality testingalgorithmcanbe found inIntroduction to Algorithms,by T.H.Cormen,C.E.Leiserson,and R. L.Rivest,MITPress,1990.

An excellent introductory text on probability theory is Probability and Random Processes, by G. R. Grimmett and D. R. Stirzaker, Oxford University Press, 1988. A proof of Lemma 1.1 can be found in this book. For a proof of Lemma 1.2 see Queueing Systems, Vol. I, by L. Kleinrock, John Wiley & Sons, 1975.

A formal treatmentof randomizedalgorithmsand severalexamplescanbefound in \"Derivation of randomizedalgorithmsfor sortingand selection,\"by S.Rajasekaranand J. H.Reif, in ParallelAlgorithm DerivationandProgramTransformation, editedby R. Paige,J. H. Reif, and R. Wachter,Kluwer AcademicPublishers,1993,pp.187-205.Formoreon randomizedalgorithmsseeRandomized Algorithms by R. Motwani and P. Raghavan,CambridgeUniversity Press,1995.


Chapter 2

ELEMENTARY DATA STRUCTURES

Now that we have examinedthe fundamental methodswe needto expressand analyze algorithms,we might feel all set to begin.But, alas,we needto make one lastdiversion,and that is a discussionof datastructures.Oneof the basictechniquesfor improving algorithmsis to structurethe datain such a way that the resultingoperationscan be efficiently carriedout.In this chapter,we review only the most basicand commonly used datastructures.Many of theseareused in subsequentchapters.We shouldbefamiliar with stacksand queues(Section2.1),binary trees(Section2.2),andgraphs (Section2.6)and beableto refer to the otherstructuresas needed.

2.1 STACKS AND QUEUES

One of the most common forms of data organization in computer programs is the ordered or linear list, which is often written as a = (a1, a2, ..., an). The ai are referred to as atoms and they are chosen from some set. The null or empty list has n = 0 elements. A stack is an ordered list in which all insertions and deletions are made at one end, called the top. A queue is an ordered list in which all insertions take place at one end, the rear, whereas all deletions take place at the other end, the front.

The operations of a stack imply that if the elements A, B, C, D, and E are inserted into a stack, in that order, then the first element to be removed (deleted) must be E. Equivalently we say that the last element to be inserted into the stack is the first to be removed. For this reason stacks are sometimes referred to as Last In First Out (LIFO) lists. The operations of a queue require that the first element that is inserted into the queue is the first one to be removed. Thus queues are known as First In First Out (FIFO) lists. See Figure 2.1 for examples of a stack and a queue each containing the same


[Figure 2.1 Example of a stack and a queue: the same five elements A-E stored in a stack (E at the top) and in a queue (front at A, rear at E)]

five elementsinsertedin the sameorder. Note that the data objectqueueas defined hereneednot correspondto the conceptof queuethat is studiedin queuingtheory.

The simplest way to represent a stack is by using a one-dimensional array, say stack[0 : n − 1], where n is the maximum number of allowable entries. The first or bottom element in the stack is stored at stack[0], the second at stack[1], and the ith at stack[i − 1]. Associated with the array is a variable, typically called top, which points to the top element in the stack. To test whether the stack is empty, we ask "if (top < 0)". If not, the topmost element is at stack[top]. Checking whether the stack is full can be done by asking "if (top ≥ n − 1)". Two more substantial operations are inserting and deleting elements. The corresponding algorithms are Add and Delete (Algorithm 2.1).

Eachexecutionof Add or Deletetakesa constantamountof timeand isindependentof the numberof elementsin the stack.

Another way torepresenta stackis by usinglinks (or pointers).A nodeis a collectionof data and link information.A stackcan be representedbyusingnodeswith two fields, possibly calleddata and link.Thedata fieldof eachnodecontainsan itemin the stackand the correspondinglink fieldpoints to the nodecontainingthe next itemin the stack.The link field ofthe last nodeis zero,for we assumethat all nodeshave an addressgreaterthan zero.Forexample,a stackwith the itemsA, B,C,D, and E insertedin that order,looksas in Figure2.2.



Algorithm Add(item)
// Push an element onto the stack. Return true if successful;
// else return false. item is used as an input.
{
    if (top ≥ n − 1) then
    {
        write ("Stack is full!"); return false;
    }
    else
    {
        top := top + 1; stack[top] := item; return true;
    }
}

Algorithm Delete(item)
// Pop the top element from the stack. Return true if successful;
// else return false. item is used as an output.
{
    if (top < 0) then
    {
        write ("Stack is empty!"); return false;
    }
    else
    {
        item := stack[top]; top := top − 1; return true;
    }
}

Algorithm 2.1 Operations on a stack
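The same operations in compilable C++ (a sketch with fixed capacity and global state, mirroring Algorithm 2.1; the names and the capacity constant are ours):

#include <iostream>

const int n = 100;      // maximum number of allowable entries
int stack_[n];          // stack_[0] is the bottom element
int top = -1;           // index of the top element; -1 means empty

bool Add(int item)                    // push
{
    if (top >= n - 1) { std::cout << "Stack is full!\n"; return false; }
    stack_[++top] = item;
    return true;
}

bool Delete(int &item)                // pop
{
    if (top < 0) { std::cout << "Stack is empty!\n"; return false; }
    item = stack_[top--];
    return true;
}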

[Figure 2.2 Example of a five-element, linked stack: top points to the node containing E, which links through D, C, and B to the node containing A, whose link field is 0; each node has the fields data and link]


// Type is the type of data.
node = record
{
    Type data; node *link;
}

Algorithm Add(item)
{
    // Get a new node.
    temp := new node;
    if (temp ≠ 0) then
    {
        (temp → data) := item; (temp → link) := top;
        top := temp; return true;
    }
    else
    {
        write ("Out of space!");
        return false;
    }
}

Algorithm Delete(item)
{
    if (top = 0) then
    {
        write ("Stack is empty!");
        return false;
    }
    else
    {
        item := (top → data); temp := top;
        top := (top → link);
        delete temp; return true;
    }
}

Algorithm 2.2 Link representation of a stack


Thevariable top points to the topmostnode(the last iteminserted)inthe list.Theempty stackis representedby setting top :=0.Becauseof theway the links are pointing,insertionand deletionare easy to accomplish.SeeAlgorithm 2.2.

In the case of Add, the statement temp := new node; assigns to the variable temp the address of an available node. If no more nodes exist, it returns 0. If a node exists, we store appropriate values into the two fields of the node. Then the variable top is updated to point to the new top element of the list. Finally, true is returned. If no more space exists, it prints an error message and returns false. Referring to Delete, if the stack is empty, then trying to delete an item produces the error message "Stack is empty!" and false is returned. Otherwise the top element is stored as the value of the variable item, a pointer to the first node is saved, and top is updated to point to the next node. The deleted node is returned for future use and finally true is returned.

Theuse of links to representa stackrequiresmorestoragethan thesequential array stack[0: n \342\200\224 1]does.However, there is greaterflexibilitywhen using links,for many structurescansimultaneouslyusethe samepoolof available space.Most importantly the timesfor insertionand deletionusingeitherrepresentationare independentof the sizeof the stack.

An efficient queue representation can be obtained by taking an array q[0 : n − 1] and treating it as if it were circular. Elements are inserted by increasing the variable rear to the next free position. When rear = n − 1, the next element is entered at q[0] in case that spot is free. The variable front always points one position counterclockwise from the first element in the queue. The variable front = rear if and only if the queue is empty and we initially set front := rear := 0. Figure 2.3 illustrates two of the possible configurations for a circular queue containing the four elements J1 to J4 with n > 4.

To insert an element, it is necessary to move rear one position clockwise. This can be done using the code

    if (rear = n − 1) then rear := 0;
    else rear := rear + 1;

A more elegant way to do this is to use the built-in modulo operator which computes remainders. Before doing an insert, we increase the rear pointer by saying rear := (rear + 1) mod n;. Similarly, it is necessary to move front one position clockwise each time a deletion is made. An examination of Algorithm 2.3(a) and (b) shows that by treating the array circularly, addition and deletion for queues can be carried out in a fixed amount of time or O(1).

Onesurprisingfeature in thesetwo algorithmsis that the test for queuefull in AddQ and the test for queueempty in DeleteQare the same.Inthe


[Figure 2.3 Circular queue of capacity n − 1 containing four elements J1, J2, J3, and J4, shown in two configurations: front = 0, rear = 4 and front = n − 4, rear = 0]

Another way to representa queue is by using links. Figure2.4showsa queuewith the four elementsA, B,C,and D enteredin that order. Aswith the linked stackexample,eachnodeof the queueis composedof thetwo fieldsdataand link.A queueis pointedat by two variables,frontandrear.Deletionsaremadefrom the front, and insertionsat the rear.Variablefront= 0 signalsan empty queue.Theproceduresfor insertionand deletionin linked queuesareleft as exercises.

EXERCISES1.Write algorithmsfor AddQ and DeleteQ,assumingthe queueis

represented as a linked list.

Page 95: Sahni

2.1.STACKSAND QUEUES 75

1234567891011121314ir>

l(i17IX

1920

1234567891011121311ir>

AlgorithmAddQ(item)IIII

Insertitemin the circularqueuestoredin q[G :n \342\200\224

l\\.rear points to the last item,and frontis one

II positioncounterclockwisefrom the first itemin q.{

}

rear := (rear+ 1) modn; // Advance rear clockwise.if (front= rear) then{

write (\"Queueis full!\;if (front\342\200\224 0) thenrear :\342\200\224 n \342\200\224 1;elserear :=rear \342\200\224 1;// Move rear onepositioncounterclockwise.returnfalse;

}else{

q[rear]:\342\200\224 item;// Insertnew item.returntrue;

}(a) Addition of an element

AlgorithmDeleteQ(iiem)//{

}

Removesand returns the front elementof the queueq[0 :n \342\200\224 1]if (front= rear) then{

write (\"Queueis empty!\;returnfalse;

}else{

front:=(front+ 1) modn; // Advance front clockwise.item:=q[front];// Set itemtofront of queue.returntrue;

}(b) Deletionof an element

Algorithm2.3Basicqueueoperations


[Figure 2.4 A linked queue with four elements: front points to the node containing A, which links through B and C to the node containing D (link = 0); rear points to the node containing D]

2. A linear list is being maintained circularly in an array c[0 : n − 1] with f and r set up as for circular queues.

(a) Obtain a formula in terms of f, r, and n for the number of elements in the list.

(b) Write an algorithm to delete the kth element in the list.

(c) Write an algorithm to insert an element y immediately after the kth element.

What is the time complexity of your algorithms for parts (b) and (c)?

3. Let X = (x1, ..., xn) and Y = (y1, ..., ym) be two linked lists. Write an algorithm to merge the two lists to obtain the linked list Z = (x1, y1, x2, y2, ..., xm, ym, xm+1, ..., xn) if m ≤ n or Z = (x1, y1, x2, y2, ..., xn, yn, yn+1, ..., ym) if m > n.

4. A double-endedqueue(deque)is a linearlist for which insertionsanddeletionscan occurat eitherend.Showhow to representa dequein aone-dimensionalarray and write algorithmsthat insert and deleteateitherend.

5. Consider the hypothetical data object X2. The object X2 is a linear list with the restriction that although additions to the list can be made at either end, deletions can be made from one end only. Design a linked list representation for X2. Specify initial and boundary conditions for your representation.

2.2 TREES

Definition 2.1 [Tree] A tree is a finite set of one or more nodes such that there is a specially designated node called the root and the remaining nodes are partitioned into n ≥ 0 disjoint sets T1, ..., Tn, where each of these sets is a tree. The sets T1, ..., Tn are called the subtrees of the root. □


2.2.1 Terminology

There are many terms that are often used when referring to trees. Consider the tree in Figure 2.5. This tree has 13 nodes, each data item of a node being a single letter for convenience. The root contains A (we usually say node A), and we normally draw trees with their roots at the top. The number of subtrees of a node is called its degree. The degree of A is 3, of C is 1, and of F is 0. Nodes that have degree zero are called leaf or terminal nodes. The set {K, L, F, G, M, I, J} is the set of leaf nodes of Figure 2.5. The other nodes are referred to as nonterminals. The roots of the subtrees of a node X are the children of X. The node X is the parent of its children. Thus the children of D are H, I, and J, and the parent of D is A.

Figure 2.5 A sample tree

Childrenof the sameparent are said to be siblings.ForexampleH, I,and J aresiblings.We canextendthis terminology if we needto sothat wecan ask for the grandparentof M, which is D, and soon.The degreeof atreeis the maximumdegreeof the nodesin the tree.Thetree in Figure2.5has degree3.The ancestorsof a nodeareallthe nodesalongthe path fromthe rootto that node.Theancestorsof M areA, D,and H.

The level of a nodeis definedby initially lettingthe rootbe at level one.If a nodeis at level p, then its childrenareat level p + 1.Figure2.5showsthe levelsof all nodesin that tree.The height or depth of a tree is definedto be the maximumlevel of any nodein the tree.

A forestis a setof n > 0 disjointtrees.Thenotionof a forest is very closeto that of a treebecauseif we remove the rootof a tree,we get a forest.Forexample,in Figure2.5if we remove A, we get a forest with threetrees.


Now how do we representa tree in a computer'smemory? If we wishto use a linked list in which one nodecorrespondsto onenodein the tree,then a nodemust have a varying numberof fieldsdependingon the numberof children. However, it is often simplerto write algorithmsfor a datarepresentationin which the nodesizeis fixed.We can representa treeusinga fixed nodesizelist structure. Sucha list representationfor the tree ofFigure2.5is given in Figure2.6.In this figure nodeshave threefields:tag,data,and link.Thefieldsdataand link areusedas beforewith theexceptionthat when tag = 1,datacontainsa pointertoa list ratherthan a dataitem.A treeis representedby storingthe rootin the first nodefollowed by nodesthat point to sublistseachof which containsonesubtreeof the root.

[Figure 2.6: each node has the fields tag, data, and link; the tag field of a node is one if it has a down-pointing arrow; otherwise it is zero]

Figure 2.6 List representation for the tree of Figure 2.5

2.2.2 Binary Trees

A binary tree is an important type of tree structure that occurs very often. It is characterized by the fact that any node can have at most two children; that is, there is no node with degree greater than two. For binary trees we distinguish between the subtree on the left and that on the right, whereas for other trees the order of the subtrees is irrelevant. Furthermore a binary tree is allowed to have zero nodes whereas any other tree must have at least one node. Thus a binary tree is really a different kind of object than any other tree.

Definition2.2A binary tree is a finite set of nodesthat is eitheremptyor consistsof a rootand two disjointbinary treescalledthe left and rightsubtrees. \342\226\241

Figure2.7shows two samplebinary trees.Thesetwo treesare specialkinds of binary trees.Figure2.7(a) is a skewedtree, skewed to the left.


Thereis a correspondingtreeskewedto the right, which is not shown.Thetree in Figure2.7(b) is calleda completebinary tree. This kind of tree isdefined formally lateron.Noticethat for this treeall terminalnodesareontwo adjacentlevels.Thetermsthat we introducedfor trees,suchas degree,level,height,leaf, parent,and child,all apply to binary trees in the sameway.

[Figure 2.7 Two sample binary trees: (a) a skewed tree, (b) a complete binary tree]

Lemma 2.1 The maximum number of nodes on level i of a binary tree is 2^(i−1). Also, the maximum number of nodes in a binary tree of depth k is 2^k − 1, k > 0. □

The binary tree of depth k that has exactly 2^k − 1 nodes is called a full binary tree of depth k. Figure 2.8 shows a full binary tree of depth 4. A very elegant sequential representation for full binary trees results from sequentially numbering the nodes, starting with the node on level one, then going to those on level two, and so on. Nodes on any level are numbered from left to right (see Figure 2.8). A binary tree with n nodes and depth k is complete iff its nodes correspond to the nodes that are numbered one to n in the full binary tree of depth k. A consequence of this definition is that in a complete tree, leaf nodes occur on at most two adjacent levels. The nodes


of an n-node complete tree may be compactly stored in a one-dimensional array, tree[1 : n], with the node numbered i being stored in tree[i]. The next lemma shows how to easily determine the locations of the parent, left child, and right child of any node i in the binary tree without explicitly storing any link information.

Figure 2.8 Full binary tree of depth 4

Lemma 2.2 If a complete binary tree with n nodes is represented sequentially as described before, then for any node with index i, 1 ≤ i ≤ n, we have:

1. parent(i) is at ⌊i/2⌋ if i ≠ 1. When i = 1, i is the root and has no parent.

2. lchild(i) is at 2i if 2i ≤ n. If 2i > n, i has no left child.

3. rchild(i) is at 2i + 1 if 2i + 1 ≤ n. If 2i + 1 > n, i has no right child. □

This representation can clearly be used for all binary trees though in most cases there is a lot of unutilized space. For complete binary trees the representation is ideal as no space is wasted. For the skewed tree of Figure 2.7, however, less than a third of the array is utilized. In the worst case a right-skewed tree of depth k requires 2^k − 1 locations. Of these only k are occupied.
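The index arithmetic of Lemma 2.2 translates directly into C++ (our sketch; 1-based indexing into tree[1 : n], with 0 returned when the requested node does not exist):

// parent, left child, and right child of node i in a complete binary
// tree of n nodes stored sequentially in tree[1 : n]
int parent(int i)        { return (i == 1) ? 0 : i / 2; }            // floor(i/2)
int lchild(int i, int n) { return (2 * i     <= n) ? 2 * i     : 0; }
int rchild(int i, int n) { return (2 * i + 1 <= n) ? 2 * i + 1 : 0; }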

Although the sequentialrepresentation,as in Figure2.9,appearsto begoodfor completebinary trees,it is wasteful for many otherbinary trees.Inaddition,the representationsuffers from the generalinadequaciesofsequential representations.Insertionor deletionof nodesrequiresthe movement


[Figure 2.9 Sequential representation of the binary trees of Figure 2.7: positions (1) through (16) hold A, B, -, C, -, -, -, D, -, ..., E for the skewed tree and A through I for the complete tree]

of potentially many nodesto reflect the changein level numberof theremaining nodes.Theseproblemscanbeeasilyovercomethroughthe useof alinked representation.Eachnodehas threefields:Ichild,data,and rchild.Although this nodestructuremakesit difficult to determinethe parentof anode,for most applicationsit is adequate.In caseit is often necessaryto beableto determinethe parent of a node,then a fourth field,parent,can beincludedwith the obvious interpretation.The representationof the binarytreesof Figure2.7usinga three-fieldstructureis given in Figure2.10.

2.3 DICTIONARIES

An abstract data type that supports the operations insert, delete, and search is called a dictionary. Dictionaries have found application in the design of numerous algorithms.

Example 2.1 Consider the database of books maintained in a library system. When a user wants to check whether a particular book is available, a search operation is called for. If the book is available and is issued to the user, a delete operation can be performed to remove this book from the set of available books. When the user returns the book, it can be inserted back into the set. □

It is essentialthat we are ableto support the above-mentionedoperations as efficiently as possiblesincetheseoperationsare performedquitefrequently. A numberof datastructureshave beendevisedto realizeadictionary. At a very highlevel thesecanbecategorizedascomparisonmethodsanddirectaccessmethods.Hashingis an exampleof the latter.We elaborateonly on binary searchtreeswhich arean exampleof the former.


[Figure 2.10 Linked representations for the binary trees of Figure 2.7]


2.3.1 Binary Search Trees

Definition 2.3 [Binary search tree] A binary search tree is a binary tree. It may be empty. If it is not empty, then it satisfies the following properties:

1.Every elementhas a key and no two elementshave the samekey (i.e.,the keys aredistinct).

2.Thekeys (if any) in the left subtreearesmallerthan the key in theroot.

3. The keys (if any) in the right subtreeare largerthan the key in theroot.

4. Theleft and right subtreesarealsobinary searchtrees. \342\226\241

A binary searchtree can supportthe operationssearch,insert,and deleteamongothers.In fact, with a binary searchtree,we can searchfor a dataelementboth by key value and by rank (i.e.,find an elementwith key x,find the fifth-smallest element,deletethe elementwith key x, deletethefifth-smallestelement,insertan elementand determineits rank, and soon).

Thereis someredundancy in the definitionof a binary searchtree.Properties 2,3,and 4 togetherimply that the keysmust bedistinct.So,property1can bereplacedby the property:The roothas a key.

Someexamplesof binary treesin which the elementshave distinctkeysareshown in Figure2.11.Thetreeof Figure2.11(a)is not a binary searchtree,despitethe fact that it satisfiesproperties1,2,and 3.The right subtreefails to satisfy property 4. This subtreeis not a binary searchtree,as itsright subtreehas a key value (22)that is smallerthan that in the subtree'sroot(25).Thebinary treesof Figure2.11(b)and (c)arebinary searchtrees.

Searching a Binary Search Tree

Since the definition of a binary search tree is recursive, it is easiest to describe a recursive search method. Suppose we wish to search for an element with key x. An element could in general be an arbitrary structure that has as one of its fields a key. We assume for simplicity that the element just consists of a key and use the terms element and key interchangeably. We begin at the root. If the root is 0, then the search tree contains no elements and the search is unsuccessful. Otherwise, we compare x with the key in the root. If x equals this key, then the search terminates successfully. If x is less than the key in the root, then no element in the right subtree can have key value x, and only the left subtree is to be searched. If x is larger than the key in the root, only the right subtree needs to be searched. The subtrees can be searched recursively as in Algorithm 2.4. This function assumes a linked


[Figure 2.11 Binary trees: three examples labeled (a), (b), and (c)]

representation for the search tree. Each node has the three fields lchild, rchild, and data. The recursion of Algorithm 2.4 is easily replaced by a while loop, as in Algorithm 2.5.

Algorithm Search(t, x)
{
    if (t = 0) then return 0;
    else if (x = t → data) then return t;
    else if (x < t → data) then
        return Search(t → lchild, x);
    else return Search(t → rchild, x);
}

Algorithm 2.4 Recursive search of a binary search tree

If we wish to searchby rank, eachnodeshouldhave an additionalfieldleftsize,which is oneplus the numberof elementsin the left subtreeof thenode.Forthe searchtree of Figure2.11(b),the nodeswith keys 2,5, 30,and 40, respectively, have leftsizeequal to 1,2,3,and 1.Algorithm 2.6searchesfor the fcth-smallestelement.

As can be seen, a binary search tree of height h can be searched by key as well as by rank in O(h) time.
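A C++ sketch of both kinds of search (ours, combining the ideas of Algorithms 2.4 to 2.6; the node layout with a leftsize field is as described above):

struct TreeNode {
    int data;
    int leftsize;                  // 1 + number of elements in the left subtree
    TreeNode *lchild, *rchild;
};

// Search by key (iterative form of Algorithm 2.4).
TreeNode* Search(TreeNode *t, int x)
{
    while (t != nullptr) {
        if (x == t->data) return t;
        t = (x < t->data) ? t->lchild : t->rchild;
    }
    return nullptr;                // not found
}

// Search for the kth-smallest element (as in Algorithm 2.6).
TreeNode* Searchk(TreeNode *t, int k)
{
    while (t != nullptr) {
        if (k == t->leftsize) return t;
        if (k < t->leftsize) t = t->lchild;
        else { k -= t->leftsize; t = t->rchild; }
    }
    return nullptr;
}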


Algorithm ISearch(x)
{
    found := false;
    t := tree;
    while ((t ≠ 0) and not found) do
    {
        if (x = (t → data)) then found := true;
        else if (x < (t → data)) then t := (t → lchild);
        else t := (t → rchild);
    }
    if (not found) then return 0;
    else return t;
}

Algorithm 2.5 Iterative search of a binary search tree

Algorithm Searchk(k)
{
    found := false; t := tree;
    while ((t ≠ 0) and not found) do
    {
        if (k = (t → leftsize)) then found := true;
        else if (k < (t → leftsize)) then t := (t → lchild);
        else
        {
            k := k − (t → leftsize); t := (t → rchild);
        }
    }
    if (not found) then return 0;
    else return t;
}

Algorithm 2.6 Searching a binary search tree by rank


Insertion into a Binary Search Tree

To insert a new element x, we must first verify that its key is different from those of existing elements. To do this, a search is carried out. If the search is unsuccessful, then the element is inserted at the point the search terminated. For instance, to insert an element with key 80 into the tree of Figure 2.12(a), we first search for 80. This search terminates unsuccessfully, and the last node examined is the one with key 40. The new element is inserted as the right child of this node. The resulting search tree is shown in Figure 2.12(b). Figure 2.12(c) shows the result of inserting the key 35 into the search tree of Figure 2.12(b).

[Figure 2.12 Inserting into a binary search tree: trees (a), (b), and (c)]

Algorithm 2.7 implements the insert strategy just described. If a node has a leftsize field, then this is updated too. Regardless, the insertion can be performed in O(h) time, where h is the height of the search tree.

Deletion from a Binary Search Tree

Deletion of a leaf element is quite easy. For example, to delete 35 from the tree of Figure 2.12(c), the left-child field of its parent is set to 0 and the node disposed. This gives us the tree of Figure 2.12(b). To delete the 80 from this tree, the right-child field of 40 is set to 0; this gives the tree of Figure 2.12(a). Then the node containing 80 is disposed.

Thedeletionof a nonleafelementthat has only one child is alsoeasy.Thenodecontainingthe elementtobe deletedis disposed,and the singlechildtakesthe placeof the disposednode.So,to deletethe element5 fromthe tree of Figure2.12(b),we simply changethe pointerfrom the parentnode(i.e.,the nodecontaining30) to the single-childnode(i.e.,the nodecontaining2).


Algorithm Insert(x)
// Insert x into the binary search tree.
{
    found := false;
    p := tree;
    // Search for x. q is the parent of p.
    while ((p ≠ 0) and not found) do
    {
        q := p; // Save p.
        if (x = (p → data)) then found := true;
        else if (x < (p → data)) then p := (p → lchild);
        else p := (p → rchild);
    }
    // Perform insertion.
    if (not found) then
    {
        p := new TreeNode;
        (p → lchild) := 0; (p → rchild) := 0; (p → data) := x;
        if (tree ≠ 0) then
        {
            if (x < (q → data)) then (q → lchild) := p;
            else (q → rchild) := p;
        }
        else tree := p;
    }
}

Algorithm 2.7 Insertion into a binary search tree


When the element to be deleted is in a nonleaf node that has two children, the element is replaced by either the largest element in its left subtree or the smallest one in its right subtree. Then we proceed to delete this replacing element from the subtree from which it was taken. For instance, if we wish to delete the element with key 30 from the tree of Figure 2.13(a), then we replace it by either the largest element, 5, in its left subtree or the smallest element, 40, in its right subtree. Suppose we opt for the largest element in the left subtree. The 5 is moved into the root, and the tree of Figure 2.13(b) is obtained. Now we must delete the second 5. Since this node has only one child, the pointer from its parent is changed to point to this child. The tree of Figure 2.13(c) is obtained. We can verify that regardless of whether the replacing element is the largest in the left subtree or the smallest in the right subtree, it is originally in a node with a degree of at most one. So, deleting it from this node is quite easy. We leave the writing of the deletion procedure as an exercise. It should be evident that a deletion can be performed in O(h) time if the search tree has a height of h.

[Figure 2.13 Deletion from a binary search tree: trees (a), (b), and (c)]

Height of a Binary Search Tree

Unless care is taken, the height of a binary search tree with n elements can become as large as n. This is the case, for instance, when Algorithm 2.7 is used to insert the keys [1, 2, 3, ..., n], in this order, into an initially empty binary search tree. It can, however, be shown that when insertions and deletions are made at random using the procedures given here, the height of the binary search tree is O(log n) on the average.

Search trees with a worst-case height of O(log n) are called balanced search trees. Balanced search trees that permit searches, inserts, and deletes to be performed in O(log n) time are listed in Table 2.1. Examples include AVL trees, 2-3 trees, Red-Black trees, and B-trees. On the other hand splay trees take O(log n) time for each of these operations in the amortized sense. A description of these balanced trees can be found in the book by E. Horowitz, S. Sahni, and D. Mehta cited at the end of this chapter.

Data structure        search            insert            delete
Binary search tree    O(n) (wc)         O(n) (wc)         O(n) (wc)
                      O(log n) (av)     O(log n) (av)     O(log n) (av)
AVL tree              O(log n) (wc)     O(log n) (wc)     O(log n) (wc)
2-3 tree              O(log n) (wc)     O(log n) (wc)     O(log n) (wc)
Red-Black tree        O(log n) (wc)     O(log n) (wc)     O(log n) (wc)
B-tree                O(log n) (wc)     O(log n) (wc)     O(log n) (wc)
Splay tree            O(log n) (am)     O(log n) (am)     O(log n) (am)

Table 2.1 Summary of dictionary implementations. Here (wc) stands for worst case, (av) for average case, and (am) for amortized cost.

2.3.2 Cost Amortization

Suppose that a sequence I1, I2, D1, I3, I4, I5, I6, D2, I7 of insert and delete operations is performed on a set. Assume that the actual cost of each of the seven inserts is one. (We use the terms cost and complexity interchangeably.) By this, we mean that each insert takes one unit of time. Further, suppose that the delete operations D1 and D2 have an actual cost of eight and ten, respectively. So, the total cost of the sequence of operations is 25.

In an amortization scheme we charge some of the actual cost of an operation to other operations. This reduces the charged cost of some operations and increases that of others. The amortized cost of an operation is the total cost charged to it. The cost transferring (amortization) scheme is required to be such that the sum of the amortized costs of the operations is greater than or equal to the sum of their actual costs. If we charge one unit of the cost of a delete operation to each of the inserts since the last delete operation (if any), then two units of the cost of D1 get transferred to I1 and I2 (the charged cost of each increases by one), and four units of the cost of D2 get transferred to I3 to I6. The amortized cost of each of I1 to I6 becomes two, that of I7 becomes equal to its actual cost (that is, one), and that of each of D1 and D2 becomes 6. The sum of the amortized costs is 25, which is the same as the sum of the actual costs.

Now supposewe can prove that no matterwhat sequenceof insert anddeleteoperationsis performed,we can chargecostsin sucha way that theamortizedcostof eachinsertionis nomorethan two and that of eachdeletion


is no more than six. This enables us to claim that the actual cost of any insert/delete sequence is no more than 2 * i + 6 * d, where i and d are, respectively, the number of insert and delete operations in the sequence. Suppose that the actual cost of a deletion is no more than ten and that of an insertion is one. Using actual costs, we can conclude that the sequence cost is no more than i + 10 * d. Combining these two bounds, we obtain min{2 * i + 6 * d, i + 10 * d} as a bound on the sequence cost. Hence, using the notion of cost amortization, we can obtain tighter bounds on the complexity of a sequence of operations.

The amortized time complexity to perform insert, delete, and search operations in splay trees is O(log n). This amortization is over n operations. In other words, the total time taken for processing an arbitrary sequence of n operations is O(n log n). Some operations may take much longer than O(log n) time, but when amortized over n operations, each operation costs O(log n) time.

EXERCISES

1. Write an algorithm to delete an element x from a binary search tree t. What is the time complexity of your algorithm?

2. Present an algorithm to start with an initially empty binary search tree and make n random insertions. Use a uniform random number generator to obtain the values to be inserted. Measure the height of the resulting binary search tree and divide this height by log₂ n. Do this for n = 100, 500, 1,000, 2,000, 3,000, ..., 10,000. Plot the ratio height/log₂ n as a function of n. The ratio should be approximately constant (around 2). Verify that this is so.

3. Supposethat eachnode in a binary searchtree also has the fieldleftsizeas describedin the text. Designan algorithmto insert anelementx into sucha binary searchtree.Thecomplexityof youralgorithm shouldbe O(h), whereh is the heightof the searchtree. Showthat this is the case.

4. Do Exercise3,but this timepresentan algorithmtodeletethe elementwith the fcth-smallestkey in the binary searchtree.

5. Find an efficient data structure for representing a subset S of the integers from 1 to n. Operations we wish to perform on the set are

• INSERT(i) to insert the integer i to the set S. If i is already in the set, this instruction must be ignored.

• DELETE to delete an arbitrary member from the set.

• MEMBER(i) to check whether i is a member of the set.


Your datastructureshouldenableeachoneof the above operationsinconstanttime(irrespectiveof the cardinality of S).

6. Any algorithm that merges two sorted lists of size n and m, respectively, must make at least n + m − 1 comparisons in the worst case. What implications does this have on the run time of any comparison-based algorithm that combines two binary search trees that have n and m elements, respectively?

7. It is known that every comparison-based algorithm to sort n elements must make Ω(n log n) comparisons in the worst case. What implications does this have on the complexity of initializing a binary search tree with n elements?

2.4 PRIORITY QUEUES

Any data structure that supports the operations of search min (or max), insert, and delete min (or max, respectively) is called a priority queue.

Example 2.2 Suppose that we are selling the services of a machine. Each user pays a fixed amount per use. However, the time needed by each user is different. We wish to maximize the returns from this machine under the assumption that the machine is not to be kept idle unless no user is available. This can be done by maintaining a priority queue of all persons waiting to use the machine. Whenever the machine becomes available, the user with the smallest time requirement is selected. Hence, a priority queue that supports delete min is required. When a new user requests the machine, his or her request is put into the priority queue.

If each user needs the same amount of time on the machine but people are willing to pay different amounts for the service, then a priority queue based on the amount of payment can be maintained. Whenever the machine becomes available, the user willing to pay the most is selected. This requires a delete max operation. □

Example2.3Supposethat we aresimulatinga largefactory. This factoryhas many machinesand many jobsthat requireprocessingon someof themachines.An event is said to occurwhenever a machinecompletestheprocessingof a job.When an event occurs,the jobhas tobe moved to thequeuefor the next machine(if any) that it needs.If this queueis empty,the jobcanbeassignedto the machineimmediately.Also,a new jobcan bescheduledon the machinethat has becomeidle(provided that its queueisnot empty).

To determinethe occurrenceof events,a priority queue is used.Thisqueuecontainsthe finish timeof all jobsthat arepresently beingworkedon.


The next event occurs at the least time in the priority queue. So, a priority queue that supports delete min can be used in this application. □

The simplest way to represent a priority queue is as an unordered linear list. Suppose that we have n elements in this queue and the delete max operation is to be supported. If the list is represented sequentially, additions are most easily performed at the end of this list. Hence, the insert time is Θ(1). A deletion requires a search for the element with the largest key, followed by its deletion. Since it takes Θ(n) time to find the largest element in an n-element unordered list, the delete time is Θ(n). Each deletion takes Θ(n) time. An alternative is to use an ordered linear list. The elements are in nondecreasing order if a sequential representation is used. The delete time for each representation is Θ(1) and the insert time O(n). When a max heap is used, both additions and deletions can be performed in O(log n) time.
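In C++, a ready-made max-heap priority queue is available as std::priority_queue; the short sketch below (ours, for illustration) shows insertion and delete max, each costing O(log n).

#include <iostream>
#include <queue>

int main()
{
    std::priority_queue<int> pq;            // max heap by default
    pq.push(40); pq.push(80); pq.push(35);  // each push: O(log n)
    while (!pq.empty()) {
        std::cout << pq.top() << " ";       // largest remaining element
        pq.pop();                           // delete max: O(log n)
    }
    std::cout << "\n";                      // prints: 80 40 35
    return 0;
}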

2.4.1 Heaps

Definition 2.4 [Heap] A max (min) heap is a complete binary tree with the property that the value at each node is at least as large as (as small as) the values at its children (if they exist). Call this property the heap property. □

In this sectionwe study in detailan efficient way of realizinga priorityqueue.We might first considerusinga queuesinceinsertingnew elementswould be very efficient. But finding the largestelementwould necessitatea scan of the entirequeue.A secondsuggestionmight be to use a sortedlist that is storedsequentially.But an insertioncouldrequiremoving allof the itemsin the list.What we want is a datastructurethat allows both

operationsto be doneefficiently. Onesuchdatastructureis the maxheap.Thedefinition of a maxheap impliesthat one of the largestelementsis

at the rootof the heap.If the elementsaredistinct,then the rootcontainsthe largestitem.A maxheapcan be implementedusingan array a[].

To insert an elementinto the heap,one adds it \"at the bottom\" of theheapand then comparesit with its parent,grandparent,greatgrandparent,and so on, until it is lessthan or equal to one of thesevalues. AlgorithmInsert (Algorithm 2.8)describesthis processin detail.

Figure2.14shows oneexampleof how Insert would insert a new valueinto an existingheap of sixelements.It is clearfrom Algorithm 2.8andFigure2.14that the time for Insert can vary. In the best casethe newelementis correctlypositionedinitially and no valuesneedto berearranged.In the worst casethe numberof executionsof the while loopis proportionalto the numberof levelsin the heap.Thus if therearen elementsin the heap,insertinga new elementtakes\302\251(logn) timein the worst case.


Algorithm Insert(a, n)
{
    // Inserts a[n] into the heap which is stored in a[1 : n - 1].
    i := n; item := a[n];
    while ((i > 1) and (a[⌊i/2⌋] < item)) do
    {
        a[i] := a[⌊i/2⌋]; i := ⌊i/2⌋;
    }
    a[i] := item; return true;
}

Algorithm 2.8 Insertion into a heap

Figure 2.14 Action of Insert inserting 90 into an existing heap
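For readers who wish to experiment, the following is a minimal C++ sketch of the same insertion step on a 1-based array; the function name maxHeapInsert and the small driver in main are our own additions and are not part of the book's pseudocode.

    #include <iostream>
    #include <vector>

    // Insert item into a max heap stored in heap[1..n-1] (heap[0] unused),
    // so that heap[1..n] is again a max heap. Mirrors Algorithm 2.8.
    void maxHeapInsert(std::vector<int>& heap, int n, int item)
    {
        int i = n;
        while (i > 1 && heap[i / 2] < item) {  // parent of position i is i/2
            heap[i] = heap[i / 2];             // pull the smaller parent down
            i /= 2;
        }
        heap[i] = item;
    }

    int main()
    {
        std::vector<int> heap(8);              // positions 1..7 used
        int data[] = {40, 80, 35, 90, 45, 50, 70};
        for (int n = 1; n <= 7; n++)           // build a heap by repeated insertion
            maxHeapInsert(heap, n, data[n - 1]);
        for (int i = 1; i <= 7; i++) std::cout << heap[i] << ' ';
        std::cout << '\n';                     // heap[1] now holds the maximum, 90
    }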


To delete the maximum key from the max heap, we use an algorithm called Adjust. Adjust takes as input the array a[ ] and the integers i and n. It regards a[1 : n] as a complete binary tree. If the subtrees rooted at 2i and 2i + 1 are already max heaps, then Adjust will rearrange elements of a[ ] such that the tree rooted at i is also a max heap. The maximum element from the max heap a[1 : n] can be deleted by deleting the root of the corresponding complete binary tree. The last element of the array, that is, a[n], is copied to the root, and finally we call Adjust(a, 1, n - 1). Both Adjust and DelMax are described in Algorithm 2.9.

Algorithm Adjust(a, i, n)
// The complete binary trees with roots 2i and 2i + 1 are combined
// with node i to form a heap rooted at i. No node has an address
// greater than n or less than 1.
{
    j := 2i; item := a[i];
    while (j ≤ n) do
    {
        if ((j < n) and (a[j] < a[j + 1])) then j := j + 1;
            // Compare left and right child and let j be the larger child.
        if (item ≥ a[j]) then break; // A position for item is found.
        a[⌊j/2⌋] := a[j]; j := 2j;
    }
    a[⌊j/2⌋] := item;
}

Algorithm DelMax(a, n, x)
// Delete the maximum from the heap a[1 : n] and store it in x.
{
    if (n = 0) then
    {
        write ("heap is empty"); return false;
    }
    x := a[1]; a[1] := a[n];
    Adjust(a, 1, n - 1); return true;
}

Algorithm 2.9 Deletion from a heap
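A possible C++ rendering of Adjust and DelMax, again on a 1-based array, is sketched below; the names adjust and delMax are ours, the caller is assumed to keep track of the current heap size n, and the write statement of the pseudocode is replaced by a boolean return.

    #include <iostream>
    #include <vector>

    // Combine the heaps rooted at 2i and 2i+1 with node i so that the tree
    // rooted at i is again a max heap; a[1..n] is 1-based, a[0] unused.
    void adjust(std::vector<int>& a, int i, int n)
    {
        int j = 2 * i, item = a[i];
        while (j <= n) {
            if (j < n && a[j] < a[j + 1]) j++;   // j points to the larger child
            if (item >= a[j]) break;             // item's position found
            a[j / 2] = a[j];                     // move the larger child up
            j *= 2;
        }
        a[j / 2] = item;
    }

    // Delete the maximum of the max heap a[1..n] and store it in x.
    bool delMax(std::vector<int>& a, int n, int& x)
    {
        if (n == 0) return false;                // heap is empty
        x = a[1];
        a[1] = a[n];                             // move last element to the root
        adjust(a, 1, n - 1);                     // restore the heap property
        return true;
    }

    int main()
    {
        std::vector<int> a = {0, 90, 80, 70, 40, 45, 35, 50}; // a max heap in a[1..7]
        int x, n = 7;
        while (delMax(a, n, x)) { std::cout << x << ' '; n--; } // 90 80 70 50 45 40 35
        std::cout << '\n';
    }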


Note that the worst-case run time of Adjust is also proportional to the height of the tree. Therefore, if there are n elements in a heap, deleting the maximum can be done in O(log n) time.

To sort n elements, it suffices to make n insertions followed by n deletions from a heap. Algorithm 2.10 has the details. Since insertion and deletion take O(log n) time each in the worst case, this sorting algorithm has a time complexity of O(n log n).

Algorithm Sort(a, n)
// Sort the elements a[1 : n].
{
    for i := 1 to n do Insert(a, i);
    for i := n to 1 step -1 do
    {
        DelMax(a, i, x); a[i] := x;
    }
}

Algorithm 2.10 A sorting algorithm

It turns out that we can insert n elements into a heap faster than we can apply Insert n times. Before getting into the details of the new algorithm, let us consider how the n inserts take place. Figure 2.15 shows how the data (40, 80, 35, 90, 45, 50, and 70) move around until a heap is created when using Insert. Trees to the left of any → represent the state of the array a[1 : i] before some call of Insert. Trees to the right of → show how the array was altered by Insert to produce a heap. The array is drawn as a complete binary tree for clarity.

The data set that causes the heap creation method using Insert to behave in the worst way is a set of elements in ascending order. Each new element rises to become the new root. There are at most 2^{i-1} nodes on level i of a complete binary tree, 1 ≤ i ≤ ⌈log2(n + 1)⌉. For a node on level i the distance to the root is i - 1. Thus the worst-case time for heap creation using Insert is

    Σ_{1 ≤ i ≤ ⌈log2(n+1)⌉} (i - 1) 2^{i-1} < ⌈log2(n + 1)⌉ 2^{⌈log2(n+1)⌉} = O(n log n)

A surprising fact about Insert is that its average behavior on n random inputs is asymptotically faster than its worst case, O(n) rather than O(n log n).


Figure 2.15 Forming a heap from the set {40, 80, 35, 90, 45, 50, 70}


This implies that on the average each new value only rises a constant number of levels in the tree.

It is possible to devise an algorithm that can perform n inserts in O(n) time rather than O(n log n). This reduction is achieved by an algorithm that regards any array a[1 : n] as a complete binary tree and works from the leaves up to the root, level by level. At each level, the left and right subtrees of any node are heaps. Only the value in the root node may violate the heap property.

Given n elements in a[1 : n], we can create a heap by applying Adjust. It is easy to see that leaf nodes are already heaps. So we can begin by calling Adjust for the parents of leaf nodes and then work our way up, level by level, until the root is reached. The resultant algorithm is Heapify (Algorithm 2.11). In Figure 2.16 we observe the action of Heapify as it creates a heap out of the given seven elements. The initial tree is drawn in Figure 2.16(a). Since n = 7, the first call to Adjust has i = 3. In Figure 2.16(b) the three elements 118, 151, and 132 are rearranged to form a heap. Subsequently Adjust is called with i = 2 and i = 1; this gives the trees in Figure 2.16(c) and (d).

Algorithm Heapify(a, n)
// Readjust the elements in a[1 : n] to form a heap.
{
    for i := ⌊n/2⌋ to 1 step -1 do Adjust(a, i, n);
}

Algorithm 2.11 Creating a heap out of n arbitrary elements

For the worst-case analysis of Heapify let 2^{k-1} ≤ n < 2^k, where k = ⌈log(n + 1)⌉, and recall that the levels of the n-node complete binary tree are numbered 1 to k. The worst-case number of iterations for Adjust is k - i for a node on level i. The total time for Heapify is proportional to

    Σ_{1 ≤ i ≤ k-1} 2^{i-1}(k - i) = Σ_{1 ≤ i ≤ k-1} i 2^{k-i-1} ≤ n Σ_{1 ≤ i ≤ k-1} i/2^i ≤ 2n = O(n)        (2.1)

Comparing Heapify with the repeated use of Insert, we see that the former is faster in the worst case, requiring O(n) versus O(n log n) operations. However, Heapify requires that all the elements be available before heap creation begins. Using Insert, we can add a new element into the heap at any time.

Our discussion on insert, delete, and so on, so far has been with respect to a max heap. It should be easy to see that a parallel discussion could have been carried out with respect to a min heap.


Figure 2.16 Action of Heapify(a, 7) on the data (100, 119, 118, 171, 112, 151, and 132)


For a min heap it is possible to delete the smallest element in O(log n) time and also to insert an element in O(log n) time.

2.4.2 Heapsort

The best-known example of the use of a heap arises in its application to sorting. A conceptually simple sorting strategy has been given before, in which the maximum value is continually removed from the remaining unsorted elements. A sorting algorithm that incorporates the fact that n elements can be inserted in O(n) time is given in Algorithm 2.12.

Algorithm HeapSort(a, n)
// a[1 : n] contains n elements to be sorted. HeapSort
// rearranges them in place into nondecreasing order.
{
    Heapify(a, n); // Transform the array into a heap.
    // Interchange the new maximum with the element
    // at the end of the array.
    for i := n to 2 step -1 do
    {
        t := a[i]; a[i] := a[1]; a[1] := t;
        Adjust(a, 1, i - 1);
    }
}

Algorithm 2.12 Heapsort
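The sketch below is a self-contained C++ version of the same strategy (heapify, then repeatedly exchange the root with the last element and readjust); heapSort and siftDown are our names, and siftDown plays the role of Adjust.

    #include <iostream>
    #include <utility>
    #include <vector>

    // Restore the max-heap property for the tree rooted at i in a[1..n].
    static void siftDown(std::vector<int>& a, int i, int n)
    {
        int j = 2 * i, item = a[i];
        while (j <= n) {
            if (j < n && a[j] < a[j + 1]) j++;  // larger of the two children
            if (item >= a[j]) break;
            a[j / 2] = a[j];
            j *= 2;
        }
        a[j / 2] = item;
    }

    // Sort a[1..n] in place into nondecreasing order (a[0] is unused).
    void heapSort(std::vector<int>& a, int n)
    {
        for (int i = n / 2; i >= 1; i--)        // heap creation: O(n) total
            siftDown(a, i, n);
        for (int i = n; i >= 2; i--) {          // n - 1 exchange/readjust steps
            std::swap(a[1], a[i]);              // move current maximum to a[i]
            siftDown(a, 1, i - 1);              // O(log n) per readjustment
        }
    }

    int main()
    {
        std::vector<int> a = {0, 100, 119, 118, 171, 112, 151, 132};
        heapSort(a, 7);
        for (int i = 1; i <= 7; i++) std::cout << a[i] << ' ';
        std::cout << '\n';                      // 100 112 118 119 132 151 171
    }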

Though the call of Heapify requires only O(n) operations, Adjust possibly requires O(log n) operations for each invocation. Thus the worst-case time is O(n log n). Notice that the storage requirements, besides a[1 : n], are only for a few simple variables.

A number of other data structures can also be used to implement a priority queue. Examples include the binomial heap, deap, Fibonacci heap, and so on. A description of these can be found in the book by E. Horowitz, S. Sahni, and D. Mehta. Table 2.2 summarizes the performances of these data structures. Many of these data structures support the operations of deleting and searching for arbitrary elements (the red-black tree being an example), in addition to the ones needed for a priority queue.


Data structure    insert                        delete min
Min heap          O(log n) (wc)                 O(log n) (wc)
Min-max heap      O(log n) (wc)                 O(log n) (wc)
Deap              O(log n) (wc)                 O(log n) (wc)
Leftist tree      O(log n) (wc)                 O(log n) (wc)
Binomial heap     O(log n) (wc), O(1) (am)      O(log n) (wc), O(log n) (am)
Fibonacci heap    O(log n) (wc), O(1) (am)      O(log n) (wc), O(log n) (am)
2-3 tree          O(log n) (wc)                 O(log n) (wc)
Red-Black tree    O(log n) (wc)                 O(log n) (wc)

Table 2.2 Performances of different data structures when realizing a priority queue. Here (wc) stands for worst case and (am) denotes amortized cost.

EXERCISES

1. Verify for yourself that algorithm Insert (Algorithm 2.8) uses only a constant number of comparisons to insert a random element into a heap by performing an appropriate experiment.

2. (a) Equation 2.1 makes use of the fact that the sum Σ_{i≥1} i/2^i converges. Prove this fact.

   (b) Use induction to show that Σ_{i=1}^{k} 2^{i-1}(k - i) = 2^k - k - 1, k ≥ 1.

3. Program and run algorithm HeapSort (Algorithm 2.12) and compare its time with the time of any of the sorting algorithms discussed in Chapter 1.

4. Design a data structure that supports the following operations: INSERT and MIN. The worst-case run time should be O(1) for each of these operations.

5. Notice that a binary search tree can be used to implement a priority queue.

   (a) Present an algorithm to delete the largest element in a binary search tree. Your procedure should have complexity O(h), where h is the height of the search tree. Since h is O(log n) on average, you can perform each of the priority queue operations in average time O(log n).


   (b) Compare the performances of max heaps and binary search trees as data structures for priority queues. For this comparison, generate random sequences of insert and delete max operations and measure the total time taken for each sequence by each of these data structures.

6. Input is a sequence X of n keys with many duplications such that the number of distinct keys is d (≤ n). Present an O(n log d)-time sorting algorithm for this input. (For example, if X = 5, 6, 1, 18, 6, 4, 4, 1, 5, 17, the number of distinct keys in X is six.)

2.5 SETS AND DISJOINT SET UNION

2.5.1 Introduction

In this section we study the use of forests in the representation of sets. We shall assume that the elements of the sets are the numbers 1, 2, 3, ..., n. These numbers might, in practice, be indices into a symbol table in which the names of the elements are stored. We assume that the sets being represented are pairwise disjoint (that is, if S_i and S_j, i ≠ j, are two sets, then there is no element that is in both S_i and S_j). For example, when n = 10, the elements can be partitioned into three disjoint sets, S1 = {1, 7, 8, 9}, S2 = {2, 5, 10}, and S3 = {3, 4, 6}. Figure 2.17 shows one possible representation for these sets. In this representation, each set is represented as a tree. Notice that for each set we have linked the nodes from the children to the parent, rather than our usual method of linking from the parent to the children. The reason for this change in linkage becomes apparent when we discuss the implementation of set operations.

Figure 2.17 Possible tree representation of sets


The operations we wish to perform on these sets are:

1. Disjoint set union. If S_i and S_j are two disjoint sets, then their union S_i ∪ S_j = all elements x such that x is in S_i or S_j. Thus, S1 ∪ S2 = {1, 7, 8, 9, 2, 5, 10}. Since we have assumed that all sets are disjoint, we can assume that following the union of S_i and S_j, the sets S_i and S_j do not exist independently; that is, they are replaced by S_i ∪ S_j in the collection of sets.

2. Find(i). Given the element i, find the set containing i. Thus, 4 is in set S3, and 9 is in set S1.

2.5.2 Union and Find Operations

Let us consider the union operation first. Suppose that we wish to obtain the union of S1 and S2 (from Figure 2.17). Since we have linked the nodes from children to parent, we simply make one of the trees a subtree of the other. S1 ∪ S2 could then have one of the representations of Figure 2.18.

Figure 2.18 Possible representations of S1 ∪ S2

To obtain the union of two sets, all that has to be done is to set the parent field of one of the roots to the other root. This can be accomplished easily if, with each set name, we keep a pointer to the root of the tree representing that set. If, in addition, each root has a pointer to the set name, then to determine which set an element is currently in, we follow parent links to the root of its tree and use the pointer to the set name. The data representation for S1, S2, and S3 may then take the form shown in Figure 2.19.

In presenting the union and find algorithms, we ignore the set names and identify sets just by the roots of the trees representing them. This simplifies the discussion.


Figure 2.19 Data representation for S1, S2, and S3

The transition to set names is easy. If we determine that element i is in a tree with root j, and j has a pointer to entry k in the set name table, then the set name is just name[k]. If we wish to unite sets S_i and S_j, then we wish to unite the trees with roots FindPointer(S_i) and FindPointer(S_j). Here FindPointer is a function that takes a set name and determines the root of the tree that represents it. This is done by an examination of the [set name, pointer] table. In many applications the set name is just the element at the root. The operation of Find(i) now becomes: Determine the root of the tree containing element i. The function Union(i, j) requires two trees with roots i and j to be joined. Also to simplify, assume that the set elements are the numbers 1 through n.

Since the set elements are numbered 1 through n, we represent the tree nodes using an array p[1 : n], where n is the maximum number of elements. The ith element of this array represents the tree node that contains element i. This array element gives the parent pointer of the corresponding tree node. Figure 2.20 shows this representation of the sets S1, S2, and S3 of Figure 2.17. Notice that root nodes have a parent of -1.

i    [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10]
p     -1    5   -1    3   -1    3    1    1    1    5

Figure 2.20 Array representation of S1, S2, and S3 of Figure 2.17


We can now implement Find(i) by following the indices, starting at i until we reach a node with parent value -1. For example, Find(6) starts at 6 and then moves to 6's parent, 3. Since p[3] is negative, we have reached the root. The operation Union(i, j) is equally simple. We pass in two trees with roots i and j. Adopting the convention that the first tree becomes a subtree of the second, the statement p[i] := j; accomplishes the union.

Algorithm SimpleUnion(i, j)
{
    p[i] := j;
}

Algorithm SimpleFind(i)
{
    while (p[i] > 0) do i := p[i];
    return i;
}

Algorithm 2.13 Simple algorithms for union and find
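A direct C++ transcription of these two routines might look as follows; the global array p, the initial value -1 for roots (as in Figure 2.20), and the small driver are our own choices.

    #include <iostream>
    #include <vector>

    // Disjoint sets over elements 1..n, stored as parent pointers in p[1..n].
    // A nonpositive entry marks a root (we use -1, as in Figure 2.20).
    std::vector<int> p;

    void simpleUnion(int i, int j) { p[i] = j; }   // make root i a child of root j

    int simpleFind(int i)
    {
        while (p[i] > 0) i = p[i];                 // climb to the root
        return i;
    }

    int main()
    {
        int n = 5;
        p.assign(n + 1, -1);                       // n singleton sets
        simpleUnion(1, 2);                         // arguments must be roots
        simpleUnion(2, 3);
        simpleUnion(3, 4);                         // degenerate chain 1 -> 2 -> 3 -> 4
        std::cout << simpleFind(1) << ' ' << simpleFind(5) << '\n';  // prints 4 5
    }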

Algorithm 2.13 gives the descriptions of the union and find operations just discussed. Although these two algorithms are very easy to state, their performance characteristics are not very good. For instance, if we start with n elements each in a set of its own (that is, S_i = {i}, 1 ≤ i ≤ n), then the initial configuration consists of a forest with n nodes, and p[i] = 0, 1 ≤ i ≤ n. Now let us process the following sequence of union-find operations:

Union(1,2), Union(2,3), Union(3,4), Union(4,5), ..., Union(n-1, n)
Find(1), Find(2), ..., Find(n)

This sequence results in the degenerate tree of Figure 2.21. Since the time taken for a union is constant, the n - 1 unions can be processed in time O(n). However, each find requires following a sequence of parent pointers from the element to be found to the root. Since the time required to process a find for an element at level i of a tree is O(i), the total time needed to process the n finds is O(Σ_{i=1}^{n} i) = O(n^2).

We can improve the performance of our union and find algorithms by avoiding the creation of degenerate trees. To accomplish this, we make use of a weighting rule for Union(i, j).

Figure 2.21 Degenerate tree

Definition 2.5 [Weighting rule for Union(i, j)] If the number of nodes in the tree with root i is less than the number in the tree with root j, then make j the parent of i; otherwise make i the parent of j. □

When we use the weighting rule to perform the sequence of set unions given before, we obtain the trees of Figure 2.22. In this figure, the unions have been modified so that the input parameter values correspond to the roots of the trees to be combined.

To implement the weighting rule, we need to know how many nodes there are in every tree. To do this easily, we maintain a count field in the root of every tree. If i is a root node, then count[i] equals the number of nodes in that tree. Since all nodes other than the roots of trees have a positive number in the p field, we can maintain the count in the p field of the roots as a negative number.

Using this convention, we obtain Algorithm 2.14. In this algorithm the time required to perform a union has increased somewhat but is still bounded by a constant (that is, it is O(1)). The find algorithm remains unchanged. The maximum time to perform a find is determined by Lemma 2.3.

Lemma 2.3 Assume that we start with a forest of trees, each having one node. Let T be a tree with m nodes created as a result of a sequence of unions each performed using WeightedUnion. The height of T is no greater than ⌊log2 m⌋ + 1.

Proof: The lemma is clearly true for m = 1. Assume it is true for all trees with i nodes, i ≤ m - 1. We show that it is also true for i = m.

Figure 2.22 Trees obtained using the weighting rule

Algorithm WeightedUnion(i, j)
// Union sets with roots i and j, i ≠ j, using the
// weighting rule. p[i] = -count[i] and p[j] = -count[j].
{
    temp := p[i] + p[j];
    if (p[i] > p[j]) then
    {   // i has fewer nodes.
        p[i] := j; p[j] := temp;
    }
    else
    {   // j has fewer or equal nodes.
        p[j] := i; p[i] := temp;
    }
}

Algorithm 2.14 Union algorithm with weighting rule


Let T be a tree with m nodes created by WeightedUnion. Consider the last union operation performed, Union(k, j). Let a be the number of nodes in tree j, and m - a the number in k. Without loss of generality we can assume 1 ≤ a ≤ m/2. Then the height of T is either the same as that of k or is one more than that of j. If the former is the case, the height of T is ≤ ⌊log2(m - a)⌋ + 1 ≤ ⌊log2 m⌋ + 1. However, if the latter is the case, the height of T is ≤ ⌊log2 a⌋ + 2 ≤ ⌊log2(m/2)⌋ + 2 ≤ ⌊log2 m⌋ + 1. □

Example 2.4 shows that the bound of Lemma 2.3 is achievable for some sequence of unions.

Example 2.4 Consider the behavior of WeightedUnion on the following sequence of unions starting from the initial configuration p[i] = -count[i] = -1, 1 ≤ i ≤ 8 = n:

Union(1,2), Union(3,4), Union(5,6), Union(7,8), Union(1,3), Union(5,7), Union(1,5)

The trees of Figure 2.23 are obtained. As is evident, the height of each tree with m nodes is ⌊log2 m⌋ + 1. □

From Lemma 2.3, it follows that the time to process a find is O(log m) if there are m elements in a tree. If an intermixed sequence of u - 1 union and f find operations is to be processed, the time becomes O(u + f log u), as no tree has more than u nodes in it. Of course, we need O(n) additional time to initialize the n-tree forest.

Surprisingly, further improvement is possible. This time the modification is made in the find algorithm using the collapsing rule.

Definition 2.6 [Collapsing rule] If j is a node on the path from i to its root and p[j] ≠ root(i), then set p[j] to root(i). □

CollapsingFind (Algorithm 2.15) incorporates the collapsing rule.

Example 2.5 Consider the tree created by WeightedUnion on the sequence of unions of Example 2.4. Now process the following eight finds:

Find(8), Find(8), ..., Find(8)

If SimpleFind is used, each Find(8) requires going up three parent link fields for a total of 24 moves to process all eight finds. When CollapsingFind is used, the first Find(8) requires going up three links and then resetting two links. Note that even though only two parent links need to be reset, CollapsingFind will reset three (the parent of 5 is reset to 1). Each of the remaining seven finds requires going up only one link field. The total cost is now only 13 moves. □


Figure 2.23 Trees achieving worst-case bound: (a) initial height-1 trees; (b) height-2 trees following Union(1,2), (3,4), (5,6), and (7,8); (c) height-3 trees following Union(1,3) and (5,7); (d) height-4 tree following Union(1,5)


Algorithm CollapsingFind(i)
// Find the root of the tree containing element i. Use the
// collapsing rule to collapse all nodes from i to the root.
{
    r := i;
    while (p[r] > 0) do r := p[r]; // Find the root.
    while (i ≠ r) do // Collapse nodes from i to root r.
    {
        s := p[i]; p[i] := r; i := s;
    }
    return r;
}

Algorithm 2.15 Find algorithm with collapsing rule
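Combining the weighting rule with the collapsing rule, one possible C++ sketch of the resulting disjoint-set structure is shown below; the class name DisjointSets and its method names are ours, and the driver replays the unions of Example 2.4.

    #include <iostream>
    #include <vector>

    // Disjoint sets over 1..n with the weighting rule for unions and the
    // collapsing rule for finds; p[root] holds -(number of nodes in that tree).
    class DisjointSets {
        std::vector<int> p;
    public:
        explicit DisjointSets(int n) : p(n + 1, -1) {}

        void weightedUnion(int i, int j)      // i and j must be distinct roots
        {
            int temp = p[i] + p[j];           // -(total number of nodes)
            if (p[i] > p[j]) {                // i has fewer nodes
                p[i] = j; p[j] = temp;
            } else {                          // j has fewer or equal nodes
                p[j] = i; p[i] = temp;
            }
        }

        int collapsingFind(int i)
        {
            int r = i;
            while (p[r] > 0) r = p[r];        // find the root
            while (i != r) {                  // collapse nodes from i to r
                int s = p[i]; p[i] = r; i = s;
            }
            return r;
        }
    };

    int main()
    {
        DisjointSets s(8);
        int ops[][2] = {{1,2},{3,4},{5,6},{7,8},{1,3},{5,7},{1,5}}; // Example 2.4
        for (auto& u : ops)
            s.weightedUnion(s.collapsingFind(u[0]), s.collapsingFind(u[1]));
        std::cout << s.collapsingFind(8) << '\n';   // every element now has root 1
    }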

In the algorithms WeightedUnion and CollapsingFind, use of the collapsing rule roughly doubles the time for an individual find. However, it reduces the worst-case time over a sequence of finds. The worst-case complexity of processing a sequence of unions and finds using WeightedUnion and CollapsingFind is stated in Lemma 2.4. This lemma makes use of a function α(p, q) that is related to a functional inverse of Ackermann's function A(i, j). These functions are defined as follows:

    A(1, j) = 2^j                     for j ≥ 1
    A(i, 1) = A(i - 1, 2)             for i ≥ 2
    A(i, j) = A(i - 1, A(i, j - 1))   for i, j ≥ 2

    α(p, q) = min{z ≥ 1 | A(z, ⌊p/q⌋) > log2 q},  p ≥ q ≥ 1

The function A(i, j) is a very rapidly growing function. Consequently, α grows very slowly as p and q are increased. In fact, since A(3, 1) = 16, α(p, q) ≤ 3 for q < 2^16 = 65,536 and p ≥ q. Since A(4, 1) is a very large number and in our application q is the number n of set elements and p is n + f (f is the number of finds), α(p, q) ≤ 4 for all practical purposes.

Lemma 2.4 [Tarjan and Van Leeuwen] Assume that we start with a forest of trees, each having one node. Let T(f, u) be the maximum time required to process any intermixed sequence of f finds and u unions. Assume that u ≥ n/2. Then


    k1[n + f α(f + n, n)] ≤ T(f, u) ≤ k2[n + f α(f + n, n)]

for some positive constants k1 and k2. □

The requirement that u ≥ n/2 in Lemma 2.4 is really not significant, as when u < n/2, some elements are involved in no union operation. These elements remain in singleton sets throughout the sequence of union and find operations and can be eliminated from consideration, as find operations that involve these can be done in O(1) time each. Even though the function α(f, u) is a very slowly growing function, the complexity of our solution to the set representation problem is not linear in the number of unions and finds. The space requirements are one node for each element.

In the exercises, we explore alternatives to the weight rule and the collapsing rule that preserve the time bounds of Lemma 2.4.

EXERCISES

1. Suppose we start with n sets, each containing a distinct element.

   (a) Show that if u unions are performed, then no set contains more than u + 1 elements.

   (b) Show that at most n - 1 unions can be performed before the number of sets becomes 1.

   (c) Show that if fewer than ⌈n/2⌉ unions are performed, then at least one set with a single element in it remains.

   (d) Show that if u unions are performed, then at least max{n - 2u, 0} singleton sets remain.

2. Experimentally compare the performance of SimpleUnion and SimpleFind (Algorithm 2.13) with WeightedUnion (Algorithm 2.14) and CollapsingFind (Algorithm 2.15). For this, generate a random sequence of union and find operations.

3. (a) Present an algorithm HeightUnion that uses the height rule for union operations instead of the weighting rule. This rule is defined below:

   Definition 2.7 [Height rule] If the height of tree i is less than that of tree j, then make j the parent of i; otherwise make i the parent of j. □

   Your algorithm must run in O(1) time and should maintain the height of each tree as a negative number in the p field of the root.


   (b) Show that the height bound of Lemma 2.3 applies to trees constructed using the height rule.

   (c) Give an example of a sequence of unions that start with n singleton sets and create trees whose heights equal the upper bounds given in Lemma 2.3. Assume that each union is performed using the height rule.

   (d) Experiment with the algorithms WeightedUnion (Algorithm 2.14) and HeightUnion to determine which produces better results when used in conjunction with CollapsingFind (Algorithm 2.15).

4. (a) Write an algorithm SplittingFind that uses path splitting, defined below, for the find operations instead of path collapsing.

   Definition 2.8 [Path splitting] The parent pointer in each node (except the root and its child) on the path from i to the root is changed to point to the node's grandparent. □

   Note that when path splitting is used, a single pass from i to the root suffices. R. Tarjan and J. Van Leeuwen have shown that Lemma 2.4 holds when path splitting is used in conjunction with either the weight or the height rule for unions.

   (b) Experiment with CollapsingFind (Algorithm 2.15) and SplittingFind to determine which produces better results when used in conjunction with WeightedUnion (Algorithm 2.14).

5. (a) Design an algorithm HalvingFind that uses path halving, defined below, for the find operations instead of path collapsing.

   Definition 2.9 [Path halving] In path halving, the parent pointer of every other node (except the root and its child) on the path from i to the root is changed to point to the node's grandparent. □

   Note that path halving, like path splitting (Exercise 4), can be implemented with a single pass from i to the root. However, in path halving, only half as many pointers are changed as in path splitting. Tarjan and Van Leeuwen have shown that Lemma 2.4 holds when path halving is used in conjunction with either the weight or the height rule for unions.

   (b) Experiment with CollapsingFind and HalvingFind to determine which produces better results when used in conjunction with WeightedUnion (Algorithm 2.14).


2.6 GRAPHS

2.6.1 Introduction

The first recorded evidence of the use of graphs dates back to 1736, when Leonhard Euler used them to solve the now classical Konigsberg bridge problem. In the town of Konigsberg (now Kaliningrad) the river Pregel (Pregolya) flows around the island Kneiphof and then divides into two. There are, therefore, four land areas that have this river on their borders (see Figure 2.24(a)). These land areas are interconnected by seven bridges, labeled a to g. The land areas themselves are labeled A to D. The Konigsberg bridge problem is to determine whether, starting at one land area, it is possible to walk across all the bridges exactly once in returning to the starting land area. One possible walk: Starting from land area B, walk across bridge a to island A, take bridge e to area D, take bridge g to C, take bridge d to A, take bridge b to B, and take bridge f to D.

This walk does not go across all bridges exactly once, nor does it return to the starting land area B. Euler answered the Konigsberg bridge problem in the negative: The people of Konigsberg cannot walk across each bridge exactly once and return to the starting point. He solved the problem by representing the land areas as vertices and the bridges as edges in a graph (actually a multigraph) as in Figure 2.24(b). His solution is elegant and applies to all graphs. Defining the degree of a vertex to be the number of edges incident to it, Euler showed that there is a walk starting at any vertex, going through each edge exactly once and terminating at the start vertex if and only if the degree of each vertex is even. A walk that does this is called Eulerian. There is no Eulerian walk for the Konigsberg bridge problem, as all four vertices are of odd degree.

Since this first application, graphs have been used in a wide variety of applications. Some of these applications are the analysis of electric circuits, finding shortest routes, project planning, identification of chemical compounds, statistical mechanics, genetics, cybernetics, linguistics, social sciences, and so on. Indeed, it might well be said that of all mathematical structures, graphs are the most widely used.

2.6.2 Definitions

A graph G consists of two sets V and E. The set V is a finite, nonempty set of vertices. The set E is a set of pairs of vertices; these pairs are called edges. The notations V(G) and E(G) represent the sets of vertices and edges, respectively, of graph G. We also write G = (V, E) to represent a graph. In an undirected graph the pair of vertices representing any edge is unordered. Thus, the pairs (u, v) and (v, u) represent the same edge. In a directed graph each edge is represented by a directed pair ⟨u, v⟩; u is the tail and v the head of the edge.


Figure 2.24 Section of the river Pregel in Konigsberg and Euler's graph


Therefore, ⟨v, u⟩ and ⟨u, v⟩ represent two different edges. Figure 2.25 shows three graphs: G1, G2, and G3. The graphs G1 and G2 are undirected; G3 is directed.

Figure 2.25 Three sample graphs: (a) G1, (b) G2, (c) G3

The set representations of these graphs are

    V(G1) = {1, 2, 3, 4}           E(G1) = {(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)}
    V(G2) = {1, 2, 3, 4, 5, 6, 7}  E(G2) = {(1,2), (1,3), (2,4), (2,5), (3,6), (3,7)}
    V(G3) = {1, 2, 3}              E(G3) = {⟨1,2⟩, ⟨2,1⟩, ⟨2,3⟩}

Notice that the edges of a directed graph are drawn with an arrow from the tail to the head. The graph G2 is a tree; the graphs G1 and G3 are not.

Since we define the edges and vertices of a graph as sets, we impose the following restrictions on graphs:

1. A graph may not have an edge from a vertex v back to itself. That is, edges of the form (v, v) and ⟨v, v⟩ are not legal. Such edges are known as self-edges or self-loops. If we permit self-edges, we obtain a data object referred to as a graph with self-edges. An example is shown in Figure 2.26(a).

2. A graph may not have multiple occurrences of the same edge. If we remove this restriction, we obtain a data object referred to as a multigraph (see Figure 2.26(b)).

The number of distinct unordered pairs (u, v) with u ≠ v in a graph with n vertices is n(n - 1)/2. This is the maximum number of edges in any n-vertex, undirected graph. An n-vertex, undirected graph with exactly n(n - 1)/2 edges is said to be complete. The graph G1 of Figure 2.25(a) is the complete graph on four vertices, whereas G2 and G3 are not complete graphs.


Figure 2.26 Examples of graphlike structures: (a) graph with a self-edge, (b) multigraph

In the case of a directed graph on n vertices, the maximum number of edges is n(n - 1).

If (u, v) is an edge in E(G), then we say vertices u and v are adjacent and edge (u, v) is incident on vertices u and v. The vertices adjacent to vertex 2 in G2 are 4, 5, and 1. The edges incident on vertex 3 in G2 are (1,3), (3,6), and (3,7). If ⟨u, v⟩ is a directed edge, then vertex u is adjacent to v, and v is adjacent from u. The edge ⟨u, v⟩ is incident to u and v. In G3, the edges incident to vertex 2 are ⟨1,2⟩, ⟨2,1⟩, and ⟨2,3⟩.

A subgraph of G is a graph G' such that V(G') ⊆ V(G) and E(G') ⊆ E(G). Figure 2.27 shows some of the subgraphs of G1 and G3.

A path from vertex u to vertex v in graph G is a sequence of vertices u, i1, i2, ..., ik, v, such that (u, i1), (i1, i2), ..., (ik, v) are edges in E(G). If G' is directed, then the path consists of the edges ⟨u, i1⟩, ⟨i1, i2⟩, ..., ⟨ik, v⟩ in E(G'). The length of a path is the number of edges on it. A simple path is a path in which all vertices except possibly the first and last are distinct. A path such as (1,2), (2,4), (4,3) is also written as 1, 2, 4, 3. Paths 1, 2, 4, 3 and 1, 2, 4, 2 of G1 are both of length 3. The first is a simple path; the second is not. The path 1, 2, 3 is a simple directed path in G3, but 1, 2, 3, 2 is not a path in G3, as the edge ⟨3, 2⟩ is not in E(G3).

A cycle is a simple path in which the first and last vertices are the same. The path 1, 2, 3, 1 is a cycle in G1 and 1, 2, 1 is a cycle in G3. For directed graphs we normally add the prefix "directed" to the terms cycle and path.

In an undirected graph G, two vertices u and v are said to be connected iff there is a path in G from u to v (since G is undirected, this means there must also be a path from v to u). An undirected graph is said to be connected iff for every pair of distinct vertices u and v in V(G), there is a path from u to v in G. Graphs G1 and G2 are connected, whereas G4 of Figure 2.28 is not.


Figure 2.27 Some subgraphs: (a) some of the subgraphs of G1; (b) some of the subgraphs of G3


A connected component (or simply a component) H of an undirected graph is a maximal connected subgraph. By "maximal," we mean that G contains no other subgraph that is both connected and properly contains H. G4 has two components, H1 and H2 (see Figure 2.28).

Figure 2.28 A graph with two connected components

A tree is a connected acyclic (i.e., has no cycles) graph. A directed graph G is said to be strongly connected iff for every pair of distinct vertices u and v in V(G), there is a directed path from u to v and also from v to u. The graph G3 (repeated in Figure 2.29(a)) is not strongly connected, as there is no path from vertex 3 to 2. A strongly connected component is a maximal subgraph that is strongly connected. The graph G3 has two strongly connected components (see Figure 2.29(b)).

The degree of a vertex is the number of edges incident to that vertex. The degree of vertex 1 in G1 is 3. If G is a directed graph, we define the in-degree of a vertex v to be the number of edges for which v is the head. The out-degree is defined to be the number of edges for which v is the tail. Vertex 2 of G3 has in-degree 1, out-degree 2, and degree 3. If d_i is the degree of vertex i in a graph G with n vertices and e edges, then the number of edges is

    e = (Σ_{i=1}^{n} d_i) / 2

In the remainder of this chapter, we refer to a directed graph as a digraph. When we use the term graph, we assume that it is an undirected graph.


Figure 2.29 A graph and its strongly connected components

2.6.3 Graph Representations

Although several representations for graphs are possible, we study only the three most commonly used: adjacency matrices, adjacency lists, and adjacency multilists. Once again, the choice of a particular representation depends on the application we have in mind and the functions we expect to perform on the graph.

Adjacency Matrix

Let G = (V, E) be a graph with n vertices, n ≥ 1. The adjacency matrix of G is a two-dimensional n x n array, say a, with the property that a[i, j] = 1 iff the edge (i, j) (⟨i, j⟩ for a directed graph) is in E(G). The element a[i, j] = 0 if there is no such edge in G. The adjacency matrices for the graphs G1, G3, and G4 are shown in Figure 2.30. The adjacency matrix for an undirected graph is symmetric, as the edge (i, j) is in E(G) iff the edge (j, i) is also in E(G). The adjacency matrix for a directed graph may not be symmetric (as is the case for G3). The space needed to represent a graph using its adjacency matrix is n^2 bits. About half this space can be saved in the case of an undirected graph by storing only the upper or lower triangle of the matrix.

From the adjacency matrix, we can readily determine whether there is an edge connecting any two vertices i and j. For an undirected graph the degree of any vertex i is its row sum:

    d_i = Σ_{j=1}^{n} a[i, j]

For a directed graph the row sum is the out-degree, and the column sum is the in-degree.

        1 2 3 4              1 2 3
    1 [ 0 1 1 1 ]        1 [ 0 1 0 ]
    2 [ 1 0 1 1 ]        2 [ 1 0 1 ]
    3 [ 1 1 0 1 ]        3 [ 0 0 0 ]
    4 [ 1 1 1 0 ]
        (a) G1               (b) G3

        1 2 3 4 5 6 7 8
    1 [ 0 1 1 0 0 0 0 0 ]
    2 [ 1 0 0 1 0 0 0 0 ]
    3 [ 1 0 0 1 0 0 0 0 ]
    4 [ 0 1 1 0 0 0 0 0 ]
    5 [ 0 0 0 0 0 1 0 0 ]
    6 [ 0 0 0 0 1 0 1 0 ]
    7 [ 0 0 0 0 0 1 0 1 ]
    8 [ 0 0 0 0 0 0 1 0 ]
        (c) G4

Figure 2.30 Adjacency matrices

Suppose we want to answer a nontrivial question about graphs, such as How many edges are there in G? or Is G connected? Adjacency matrices require at least n^2 time, as n^2 - n entries of the matrix (diagonal entries are zero) have to be examined. When graphs are sparse (i.e., most of the terms in the adjacency matrix are zero), we would expect that the former question could be answered in significantly less time, say O(e + n), where e is the number of edges in G, and e ≪ n^2/2. Such a speedup is made possible through the use of a representation in which only the edges that are in G are explicitly stored. This leads to the next representation for graphs, adjacency lists.
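As a small illustration of the adjacency-matrix representation, the following C++ sketch builds the matrix of the undirected graph G1 of Figure 2.25(a) from its edge list and computes each degree as a row sum; all variable names are ours.

    #include <iostream>
    #include <vector>

    int main()
    {
        int n = 4;                                        // G1 of Figure 2.25(a)
        int edges[][2] = {{1,2},{1,3},{1,4},{2,3},{2,4},{3,4}};

        // a[i][j] = 1 iff (i, j) is an edge; row and column 0 are unused.
        std::vector<std::vector<int>> a(n + 1, std::vector<int>(n + 1, 0));
        for (auto& e : edges) {
            a[e[0]][e[1]] = 1;
            a[e[1]][e[0]] = 1;                            // undirected: symmetric
        }

        for (int i = 1; i <= n; i++) {                    // degree of i = row sum
            int degree = 0;
            for (int j = 1; j <= n; j++) degree += a[i][j];
            std::cout << "degree(" << i << ") = " << degree << '\n';  // 3 for each
        }
    }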

Adjacency Lists

In this representation of graphs, the n rows of the adjacency matrix are represented as n linked lists. There is one list for each vertex in G. The nodes in list i represent the vertices that are adjacent from vertex i. Each node has at least two fields: vertex and link. The vertex field contains the indices of the vertices adjacent to vertex i. The adjacency lists for G1, G3, and G4 are shown in Figure 2.31.


Figure 2.31 Adjacency lists: (a) G1, (b) G3, (c) G4


Notice that the vertices in each list are not required to be ordered. Each list has a head node. The head nodes are sequential, and so provide easy random access to the adjacency list for any particular vertex.

For an undirected graph with n vertices and e edges, this representation requires n head nodes and 2e list nodes. Each list node has two fields. In terms of the number of bits of storage needed, this count should be multiplied by log n for the head nodes and log n + log e for the list nodes, as it takes O(log m) bits to represent a number of value m. Often, you can sequentially pack the nodes on the adjacency lists, and thereby eliminate the use of pointers. In this case, an array node[1 : n + 2e + 1] can be used. The node[i] gives the starting point of the list for vertex i, 1 ≤ i ≤ n, and node[n + 1] is set to n + 2e + 2. The vertices adjacent from vertex i are stored in node[i], ..., node[i + 1] - 1, 1 ≤ i ≤ n. Figure 2.32 shows the sequential representation for the graph G4 of Figure 2.28.

position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
contents: 10 12 14 16 18 19 21 23 24  3  2  4  1  1  4  2  3  6  7  5  6  8  7

Figure 2.32 Sequential representation of graph G4: Array node[1 : n + 2e + 1]

The degree of any vertex in an undirected graph can be determined by just counting the number of nodes in its adjacency list. So, the number of edges in G can be determined in O(n + e) time.
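The following C++ sketch carries out this computation for the graph G4 of Figure 2.28, using a vector of vectors in place of explicit linked list nodes; the names adj and twiceEdges are ours.

    #include <iostream>
    #include <vector>

    int main()
    {
        int n = 8;                                   // G4 of Figure 2.28
        int edges[][2] = {{1,2},{1,3},{2,4},{3,4},{5,6},{6,7},{7,8}};

        // adj[i] plays the role of the adjacency list of vertex i (1-based).
        std::vector<std::vector<int>> adj(n + 1);
        for (auto& e : edges) {
            adj[e[0]].push_back(e[1]);               // each undirected edge gives
            adj[e[1]].push_back(e[0]);               // two list entries
        }

        int twiceEdges = 0;
        for (int i = 1; i <= n; i++) {
            std::cout << "degree(" << i << ") = " << adj[i].size() << '\n';
            twiceEdges += (int)adj[i].size();        // O(n + e) overall
        }
        std::cout << "e = " << twiceEdges / 2 << '\n';   // prints e = 7
    }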

For a digraph, the number of list nodes is only e. The out-degree of any vertex can be determined by counting the number of nodes on its adjacency list. Hence, the total number of edges in G can be determined in O(n + e) time. Determining the in-degree of a vertex is a little more complex. If there is a need to access repeatedly all vertices adjacent to another vertex, then it may be worth the effort to keep another set of lists in addition to the adjacency lists. This set of lists, called inverse adjacency lists, contains one list for each vertex. Each list contains a node for each vertex adjacent to the vertex it represents (see Figure 2.33).

One can also adopt a simpler version of the list structure in which each node has four fields and represents one edge. The node structure is

    tail | head | column link for head | row link for tail

Figure 2.34 shows the resulting structure for the graph G3 of Figure 2.25(c). The head nodes are stored sequentially.


Figure 2.33 Inverse adjacency lists for G3 of Figure 2.25(c)

Figure 2.34 Orthogonal list representation for G3 of Figure 2.25(c)


Adjacency Multilists

In the adjacency-list representation of an undirected graph, each edge (u, v) is represented by two entries, one on the list for u and the other on the list for v. In some applications it is necessary to be able to determine the second entry for a particular edge and mark that edge as having been examined. This can be accomplished easily if the adjacency lists are maintained as multilists (i.e., lists in which nodes can be shared among several lists). For each edge there is exactly one node, but this node is in two lists (i.e., the adjacency lists for each of the two nodes to which it is incident). The new node structure is

    m | vertex1 | vertex2 | list1 | list2

where m is a one-bit mark field that can be used to indicate whether the edge has been examined. The storage requirements are the same as for normal adjacency lists, except for the addition of the mark bit m. Figure 2.35 shows the adjacency multilists for G1 of Figure 2.25(a).

Each of the six edge nodes N1, ..., N6 of G1 represents one edge: N1 = (1,2), N2 = (1,3), N3 = (1,4), N4 = (2,3), N5 = (2,4), and N6 = (3,4). The lists are: vertex 1: N1 → N2 → N3; vertex 2: N1 → N4 → N5; vertex 3: N2 → N4 → N6; vertex 4: N3 → N5 → N6.

Figure 2.35 Adjacency multilists for G1 of Figure 2.25(a)
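In C++, the shared edge node of an adjacency multilist might be declared as sketched below; the struct name EdgeNode, its field names, and the helper nextOnList are our own choices and only illustrate the idea.

    // One node per undirected edge (u, v); the node appears on the adjacency
    // multilist of both u and v, as in Figure 2.35.
    struct EdgeNode {
        bool mark;           // m: has this edge been examined?
        int  vertex1;        // u
        int  vertex2;        // v
        EdgeNode* list1;     // next node on the multilist of vertex1
        EdgeNode* list2;     // next node on the multilist of vertex2
    };

    // Follow the multilist of vertex v: return the link field that belongs to v.
    EdgeNode* nextOnList(EdgeNode* node, int v)
    {
        return (node->vertex1 == v) ? node->list1 : node->list2;
    }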


Weighted Edges

In many applications, the edges of a graph have weights assigned to them. These weights may represent the distance from one vertex to another or the cost of going from one vertex to an adjacent vertex. In these applications, the adjacency matrix entries a[i, j] keep this information too. When adjacency lists are used, the weight information can be kept in the list nodes by including an additional field, weight. A graph with weighted edges is called a network.

EXERCISES

1. Does the multigraph of Figure 2.36 have an Eulerian walk? If so, find one.

Figure 2.36 A multigraph

2. For the digraph of Figure 2.37 obtain

   (a) the in-degree and out-degree of each vertex
   (b) its adjacency-matrix representation
   (c) its adjacency-list representation
   (d) its adjacency-multilist representation
   (e) its strongly connected components

3. Devise a suitable representation for graphs so that they can be stored on disk. Write an algorithm that reads in such a graph and creates its adjacency matrix. Write another algorithm that creates the adjacency lists from the disk input.

4. Draw the complete undirected graphs on one, two, three, four, and five vertices. Prove that the number of edges in an n-vertex complete graph is n(n - 1)/2.


Figure 2.37 A digraph

5. Is the directed graph of Figure 2.38 strongly connected? List all the simple paths.

Figure 2.38 A directed graph

6. Obtain the adjacency-matrix, adjacency-list, and adjacency-multilist representations of the graph of Figure 2.38.

7. Show that the sum of the degrees of the vertices of an undirected graph is twice the number of edges.

8. Prove or disprove:

   If G(V, E) is a finite directed graph such that the out-degree of each vertex is at least one, then there is a directed cycle in G.

9. (a) Let G be a connected, undirected graph on n vertices. Show that G must have at least n - 1 edges and that all connected, undirected graphs with n - 1 edges are trees.


   (b) What is the minimum number of edges in a strongly connected digraph with n vertices? What form do such digraphs have?

10. For an undirected graph G with n vertices, prove that the following are equivalent:

    (a) G is a tree.
    (b) G is connected, but if any edge is removed, the resulting graph is not connected.
    (c) For any two distinct vertices u ∈ V(G) and v ∈ V(G), there is exactly one simple path from u to v.
    (d) G contains no cycles and has n - 1 edges.

11. Write an algorithm to input the number of vertices in an undirected graph and its edges one by one and to set up the linked adjacency-list representation of the graph. You may assume that no edge is input twice. What is the run time of your procedure as a function of the number of vertices and the number of edges?

12. Do the preceding exercise but now set up the multilist representation.

13. Let G be an undirected, connected graph with at least one vertex of odd degree. Show that G contains no Eulerian walk.

2.7 REFERENCES AND READINGS

A wide-ranging examination of data structures and their efficient implementation can be found in the following:

Fundamentals of Data Structures in C++, by E. Horowitz, S. Sahni, and D. Mehta, Computer Science Press, 1995.

Data Structures and Algorithms 1: Sorting and Searching, by K. Mehlhorn, Springer-Verlag, 1984.

Introduction to Algorithms: A Creative Approach, by U. Manber, Addison-Wesley, 1989.

Handbook of Algorithms and Data Structures, second edition, by G. H. Gonnet and R. Baeza-Yates, Addison-Wesley, 1991.

The proof of Lemma 2.4 can be found in "Worst-case analysis of set union algorithms," by R. Tarjan and J. Van Leeuwen, Journal of the ACM 31, no. 2 (1984): 245-281.


Chapter 3

DIVIDE-AND-CONQUER

3.1 GENERAL METHOD

Given a function to compute on n inputs the divide-and-conquer strategy suggests splitting the inputs into k distinct subsets, 1 < k ≤ n, yielding k subproblems. These subproblems must be solved, and then a method must be found to combine subsolutions into a solution of the whole. If the subproblems are still relatively large, then the divide-and-conquer strategy can possibly be reapplied. Often the subproblems resulting from a divide-and-conquer design are of the same type as the original problem. For those cases the reapplication of the divide-and-conquer principle is naturally expressed by a recursive algorithm. Now smaller and smaller subproblems of the same kind are generated until eventually subproblems that are small enough to be solved without splitting are produced.

To be more precise, suppose we consider the divide-and-conquer strategy when it splits the input into two subproblems of the same kind as the original problem. This splitting is typical of many of the problems we examine here. We can write a control abstraction that mirrors the way an algorithm based on divide-and-conquer will look. By a control abstraction we mean a procedure whose flow of control is clear but whose primary operations are specified by other procedures whose precise meanings are left undefined. DAndC (Algorithm 3.1) is initially invoked as DAndC(P), where P is the problem to be solved.

Small(P) is a Boolean-valued function that determines whether the input size is small enough that the answer can be computed without splitting. If this is so, the function S is invoked. Otherwise the problem P is divided into smaller subproblems. These subproblems P1, P2, ..., Pk are solved by recursive applications of DAndC. Combine is a function that determines the solution to P using the solutions to the k subproblems.


Algorithm DAndC(P)
{
    if Small(P) then return S(P);
    else
    {
        divide P into smaller instances P1, P2, ..., Pk, k ≥ 1;
        Apply DAndC to each of these subproblems;
        return Combine(DAndC(P1), DAndC(P2), ..., DAndC(Pk));
    }
}

Algorithm 3.1 Control abstraction for divide-and-conquer

If the size of P is n and the sizes of the k subproblems are n1, n2, ..., nk, respectively, then the computing time of DAndC is described by the recurrence relation

    T(n) = g(n)                                     n small
    T(n) = T(n1) + T(n2) + ... + T(nk) + f(n)       otherwise           (3.1)

where T(n) is the time for DAndC on any input of size n and g(n) is the time to compute the answer directly for small inputs. The function f(n) is the time for dividing P and combining the solutions to subproblems. For divide-and-conquer-based algorithms that produce subproblems of the same type as the original problem, it is very natural to first describe such algorithms using recursion.

The complexity of many divide-and-conquer algorithms is given by recurrences of the form

    T(n) = T(1)                  n = 1
    T(n) = aT(n/b) + f(n)        n > 1                                  (3.2)

where a and b are known constants. We assume that T(1) is known and n is a power of b (i.e., n = b^k).

One of the methods for solving any such recurrence relation is called the substitution method. This method repeatedly makes substitution for each occurrence of the function T in the right-hand side until all such occurrences disappear.


Example 3.1 Consider the case in which a = 2 and b = 2. Let T(1) = 2 and f(n) = n. We have

    T(n) = 2T(n/2) + n
         = 2[2T(n/4) + n/2] + n
         = 4T(n/4) + 2n
         = 4[2T(n/8) + n/4] + 2n
         = 8T(n/8) + 3n
         :

In general, we see that T(n) = 2^i T(n/2^i) + in, for any log2 n ≥ i ≥ 1. In particular, then, T(n) = 2^{log2 n} T(n/2^{log2 n}) + n log2 n, corresponding to the choice of i = log2 n. Thus, T(n) = nT(1) + n log2 n = n log2 n + 2n. □

Beginning with the recurrence (3.2) and using the substitution method, it can be shown that

    T(n) = n^{log_b a} [T(1) + u(n)]

where u(n) = Σ_{j=1}^{k} h(b^j) and h(n) = f(n)/n^{log_b a}. Table 3.1 tabulates the asymptotic value of u(n) for various values of h(n). This table allows one to easily obtain the asymptotic value of T(n) for many of the recurrences one encounters when analyzing divide-and-conquer algorithms. Let us consider some examples using this table.

    h(n)                       u(n)
    O(n^r), r < 0              O(1)
    Θ((log n)^i), i ≥ 0        Θ((log n)^{i+1}/(i + 1))
    Ω(n^r), r > 0              Θ(h(n))

Table 3.1 u(n) values for various h(n) values

Example 3.2 Look at the following recurrence when n is a power of 2:

    T(n) = T(1)             n = 1
    T(n) = T(n/2) + c       n > 1


Comparing with (3.2), we see that a = 1, b = 2, and f(n) = c. So, log_b a = 0 and h(n) = f(n)/n^{log_b a} = c = c(log n)^0 = Θ((log n)^0). From Table 3.1, we obtain u(n) = Θ(log n). So, T(n) = n^{log_b a}[c + Θ(log n)] = Θ(log n). □

Example 3.3 Next consider the case in which a = 2, b = 2, and f(n) = cn. For this recurrence, log_b a = 1 and h(n) = f(n)/n = c = Θ((log n)^0). Hence, u(n) = Θ(log n) and T(n) = n[T(1) + Θ(log n)] = Θ(n log n). □

Example 3.4 As another example, consider the recurrence T(n) = 7T(n/2) + 18n^2, n ≥ 2 and a power of 2. We obtain a = 7, b = 2, and f(n) = 18n^2. So, log_b a = log2 7 ≈ 2.81 and h(n) = 18n^2/n^{log2 7} = 18n^{2 - log2 7} = O(n^r), where r = 2 - log2 7 < 0. So, u(n) = O(1). The expression for T(n) is

    T(n) = n^{log2 7}[T(1) + O(1)] = Θ(n^{log2 7})

as T(1) is assumed to be a constant. □

Example 3.5 As a final example, consider the recurrence T(n) = 9T(n/3) + 4n^6, n ≥ 3 and a power of 3. Comparing with (3.2), we obtain a = 9, b = 3, and f(n) = 4n^6. So, log_b a = 2 and h(n) = 4n^6/n^2 = 4n^4 = Ω(n^4). From Table 3.1, we see that u(n) = Θ(h(n)) = Θ(n^4). So,

    T(n) = n^2[T(1) + Θ(n^4)] = Θ(n^6)

as T(1) can be assumed constant. □

EXERCISES

1. Solve the recurrence relation (3.2) for the following choices of a, b, and f(n) (c being a constant):

   (a) a = 1, b = 2, and f(n) = cn
   (b) a = 5, b = 4, and f(n) = cn^2
   (c) a = 28, b = 3, and f(n) = cn^3

2. Solve the following recurrence relations using the substitution method:

   (a) All three recurrences of Exercise 1.

   (b) T(n) = 1                   n ≤ 4
       T(n) = T(√n) + c           n > 4


   (c) T(n) = 1                   n ≤ 4
       T(n) = 2T(√n) + log n      n > 4

   (d) T(n) = 1                   n ≤ 4
       T(n) = 2T(√n) + n/2        n > 4

3.2 BINARY SEARCH

Let a_i, 1 ≤ i ≤ n, be a list of elements that are sorted in nondecreasing order. Consider the problem of determining whether a given element x is present in the list. If x is present, we are to determine a value j such that a_j = x. If x is not in the list, then j is to be set to zero. Let P = (n, a_i, ..., a_l, x) denote an arbitrary instance of this search problem (n is the number of elements in the list, a_i, ..., a_l is the list of elements, and x is the element searched for).

Divide-and-conquer can be used to solve this problem. Let Small(P) be true if n = 1. In this case, S(P) will take the value i if x = a_i; otherwise it will take the value 0. Then g(1) = Θ(1). If P has more than one element, it can be divided (or reduced) into a new subproblem as follows. Pick an index q (in the range [i, l]) and compare x with a_q. There are three possibilities: (1) x = a_q: In this case the problem P is immediately solved. (2) x < a_q: In this case x has to be searched for only in the sublist a_i, a_{i+1}, ..., a_{q-1}. Therefore, P reduces to (q - i, a_i, ..., a_{q-1}, x). (3) x > a_q: In this case the sublist to be searched is a_{q+1}, ..., a_l. P reduces to (l - q, a_{q+1}, ..., a_l, x).

In this example, any given problem P gets divided (reduced) into one new subproblem. This division takes only Θ(1) time. After a comparison with a_q, the instance remaining to be solved (if any) can be solved by using this divide-and-conquer scheme again. If q is always chosen such that a_q is the middle element (that is, q = ⌊(n + 1)/2⌋), then the resulting search algorithm is known as binary search. Note that the answer to the new subproblem is also the answer to the original problem P; there is no need for any combining. Algorithm 3.2 describes this binary search method, where BinSrch has four inputs a[ ], i, l, and x. It is initially invoked as BinSrch(a, 1, n, x).

A nonrecursive version of BinSrch is given in Algorithm 3.3. BinSearch has three inputs a, n, and x. The while loop continues processing as long as there are more elements left to check. At the conclusion of the procedure 0 is returned if x is not present, or j is returned, such that a[j] = x.

Is BinSearch an algorithm? We must be sure that all of the operations such as comparisons between x and a[mid] are well defined. The relational operators carry out the comparisons among elements of a correctly if these operators are appropriately defined. Does BinSearch terminate?


Algorithm BinSrch(a, i, l, x)
// Given an array a[i : l] of elements in nondecreasing
// order, 1 ≤ i ≤ l, determine whether x is present, and
// if so, return j such that x = a[j]; else return 0.
{
    if (l = i) then // If Small(P)
    {
        if (x = a[i]) then return i;
        else return 0;
    }
    else
    { // Reduce P into a smaller subproblem.
        mid := ⌊(i + l)/2⌋;
        if (x = a[mid]) then return mid;
        else if (x < a[mid]) then
            return BinSrch(a, i, mid - 1, x);
        else return BinSrch(a, mid + 1, l, x);
    }
}

Algorithm 3.2 Recursive binary search

Algorithm BinSearch(a, n, x)
// Given an array a[1 : n] of elements in nondecreasing
// order, n ≥ 0, determine whether x is present, and
// if so, return j such that x = a[j]; else return 0.
{
    low := 1; high := n;
    while (low ≤ high) do
    {
        mid := ⌊(low + high)/2⌋;
        if (x < a[mid]) then high := mid - 1;
        else if (x > a[mid]) then low := mid + 1;
        else return mid;
    }
    return 0;
}

Algorithm 3.3 Iterative binary search
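A C++ version of the iterative algorithm, searching a 1-based array, is sketched below; binSearch is our name for the function, and the driver searches the 14 sorted entries used in the trace that follows.

    #include <iostream>
    #include <vector>

    // Return j with a[j] == x (1 <= j <= n) if x occurs in the nondecreasing
    // array a[1..n]; return 0 otherwise. Mirrors Algorithm 3.3.
    int binSearch(const std::vector<int>& a, int n, int x)
    {
        int low = 1, high = n;
        while (low <= high) {
            int mid = (low + high) / 2;
            if (x < a[mid])      high = mid - 1;
            else if (x > a[mid]) low = mid + 1;
            else                 return mid;
        }
        return 0;
    }

    int main()
    {
        std::vector<int> a = {0, -15, -6, 0, 7, 9, 23, 54,
                              82, 101, 112, 125, 131, 142, 151};  // a[0] unused
        std::cout << binSearch(a, 14, 151) << ' '   // 14
                  << binSearch(a, 14, 9)   << ' '   // 5
                  << binSearch(a, 14, -14) << '\n'; // 0 (not present)
    }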


We observe that low and high are integer variables such that each time through the loop either x is found or low is increased by at least one or high is decreased by at least one. Thus we have two sequences of integers approaching each other and eventually low becomes greater than high and causes termination in a finite number of steps if x is not present.

Example 3.6 Let us select the 14 entries

    -15, -6, 0, 7, 9, 23, 54, 82, 101, 112, 125, 131, 142, 151

place them in a[1 : 14], and simulate the steps that BinSearch goes through as it searches for different values of x. Only the variables low, high, and mid need to be traced as we simulate the algorithm. We try the following values for x: 151, -14, and 9 for two successful searches and one unsuccessful search. Table 3.2 shows the traces of BinSearch on these three inputs. □

    x = 151                 x = -14                  x = 9
    low  high  mid          low  high  mid           low  high  mid
     1    14    7            1    14    7             1    14    7
     8    14   11            1     6    3             1     6    3
    12    14   13            1     2    1             4     6    5
    14    14   14            2     2    2                  found
         found               2     1
                             not found

Table 3.2 Three examples of binary search on 14 elements

These examples may give us a little more confidence about Algorithm 3.3, but they by no means prove that it is correct. Proofs of algorithms are very useful because they establish the correctness of the algorithm for all possible inputs, whereas testing gives much less in the way of guarantees. Unfortunately, algorithm proving is a very difficult process and the complete proof of an algorithm can be many times longer than the algorithm itself. We content ourselves with an informal "proof" of BinSearch.

Theorem 3.1 Algorithm BinSearch(a, n, x) works correctly.

Proof: We assume that all statements work as expected and that comparisons such as x > a[mid] are appropriately carried out. Initially low = 1, high = n, n ≥ 0, and a[1] ≤ a[2] ≤ ... ≤ a[n]. If n = 0, the while loop is not entered and 0 is returned.


not enteredand 0 is returned.Otherwisewe observethat eachtimethroughthe loopthe possibleelementsto be checkedfor equality with x area[low],a[low+ 1],- - -, a[mid],...,a[high].If x = a[mid],then the algorithmterminates successfully. Otherwisethe range is narrowed by eitherincreasinglow to mid+ 1or decreasinghigh to mid \342\200\224 1. Clearly this narrowingofthe rangedoesnot affect the outcomeof the search.If low becomesgreaterthan high, then x is not presentand hencethe loopis exited. \342\226\241

Noticethat to fully testbinary search,we neednot concernourselveswiththe values of a[l:n].By varying x sufficiently, we can observeall possiblecomputationsequencesof BinSearchwithout devisingdifferent values for a.To test allsuccessfulsearches,x must takeon the n values in a.To test allunsuccessfulsearches,x needonly takeon n + 1different values.Thus thecomplexityof testing BinSearchis 2n+ 1for eachn.

Now let'sanalyze the executionprofile of BinSearch.The two relevantcharacteristicsof this profilearethe frequencycountsand spacerequiredforthe algorithm.ForBinSearch,storageis requiredfor the n elementsof thearray plus the variableslow,high,mid,and x, or n +4 locations.As for thetime,thereare threepossibilitiesto consider:the best,average,and worstcases.

Supposewe beginby determiningthe timefor BinSearchon theprevious data set. We observethat the only operationsin the algorithmarecomparisonsand somearithmeticand datamovements.We concentrateoncomparisonsbetweenx and the elementsin a[ ], recognizingthat thefrequency count of all otheroperationsis of the sameorderas that for thesecomparisons.Comparisonsbetweenx and elementsof a[ ] are referredtoas elementcomparisons.We assumethat only onecomparisonis neededtodeterminewhich of the threepossibilitiesof the if statementholds.Thenumberof elementcomparisonsneededto find eachof the 14elementsis

                 [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]  [10]  [11]  [12]  [13]  [14]
    Elements:    -15   -6    0    7    9   23   54   82  101   112   125   131   142   151
    Comparisons:   3    4    2    4    3    4    1    4    3     4     2     4     3     4

No element requires more than 4 comparisons to be found. The average is obtained by summing the comparisons needed to find all 14 items and dividing by 14; this yields 45/14, or approximately 3.21, comparisons per successful search on the average. There are 15 possible ways that an unsuccessful search may terminate depending on the value of x. If x < a[1], the algorithm requires 3 element comparisons to determine that x is not present. For all the remaining possibilities, BinSearch requires 4 element comparisons. Thus the average number of element comparisons for an unsuccessful search is (3 + 14 * 4)/15 = 59/15 ≈ 3.93.
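These counts are easy to reproduce mechanically. The short C program below is our own test sketch (the instrumented search_count function and the driver are not part of the text); it charges one comparison per loop iteration, matching the three-way if of BinSearch, and prints the two averages 45/14 and 59/15 for the 14-element set:

#include <stdio.h>

/* Binary search instrumented to count element comparisons. */
static int search_count(const int a[], int n, int x)
{
    int low = 1, high = n, count = 0;
    while (low <= high) {
        int mid = (low + high) / 2;
        count++;                      /* one comparison per iteration */
        if (x < a[mid - 1]) high = mid - 1;
        else if (x > a[mid - 1]) low = mid + 1;
        else break;                   /* found */
    }
    return count;
}

int main(void)
{
    int a[] = {-15, -6, 0, 7, 9, 23, 54, 82, 101, 112, 125, 131, 142, 151};
    int n = 14, i, succ = 0, unsucc = 0;

    for (i = 0; i < n; i++)           /* all n successful searches */
        succ += search_count(a, n, a[i]);
    for (i = 0; i <= n; i++)          /* one probe in each of the n+1 gaps */
        unsucc += search_count(a, n, (i < n ? a[i] - 1 : a[n - 1] + 1));
    printf("successful:   %d/%d = %.2f\n", succ, n, (double)succ / n);
    printf("unsuccessful: %d/%d = %.2f\n", unsucc, n + 1, (double)unsucc / (n + 1));
    return 0;
}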

The analysis just done applies to any sorted sequence containing 14 elements. But the result we would prefer is a formula for n elements. A good way to derive such a formula plus a better way to understand the algorithm is to consider the sequence of values for mid that are produced by BinSearch for all possible values of x. These values are nicely described using a binary decision tree in which the value in each node is the value of mid. For example, if n = 14, then Figure 3.1 contains a binary decision tree that traces the way in which these values are produced by BinSearch.

Figure 3.1 Binary decision tree for binary search, n = 14

The first comparison is x with a[7]. If x < a[7], then the next comparison is with a[3]; similarly, if x > a[7], then the next comparison is with a[11]. Each path through the tree represents a sequence of comparisons in the binary search method. If x is present, then the algorithm will end at one of the circular nodes that lists the index into the array where x was found. If x is not present, the algorithm will terminate at one of the square nodes. Circular nodes are called internal nodes, and square nodes are referred to as external nodes.

Theorem 3.2 If n is in the range [2^(k−1), 2^k), then BinSearch makes at most k element comparisons for a successful search and either k − 1 or k comparisons for an unsuccessful search. (In other words the time for a successful search is O(log n) and for an unsuccessful search is Θ(log n).)

Proof: Consider the binary decision tree describing the action of BinSearch on n elements. All successful searches end at a circular node whereas all unsuccessful searches end at a square node. If 2^(k−1) ≤ n < 2^k, then all circular nodes are at levels 1, 2, ..., k whereas all square nodes are at levels


k and k + 1 (note that the root is at level 1). The number of element comparisons needed to terminate at a circular node on level i is i whereas the number of element comparisons needed to terminate at a square node at level i is only i − 1. The theorem follows. □

Theorem 3.2 states the worst-case time for binary search. To determine the average behavior, we need to look more closely at the binary decision tree and equate its size to the number of element comparisons in the algorithm. The distance of a node from the root is one less than its level. The internal path length I is the sum of the distances of all internal nodes from the root. Analogously, the external path length E is the sum of the distances of all external nodes from the root. It is easy to show by induction that for any binary tree with n internal nodes, E and I are related by the formula

    E = I + 2n

It turns out that there is a simple relationship between E, I, and the average number of comparisons in binary search. Let As(n) be the average number of comparisons in a successful search, and Au(n) the average number of comparisons in an unsuccessful search. The number of comparisons needed to find an element represented by an internal node is one more than the distance of this node from the root. Hence,

    As(n) = 1 + I/n

The number of comparisons on any path from the root to an external node is equal to the distance between the root and the external node. Since every binary tree with n internal nodes has n + 1 external nodes, it follows that

    Au(n) = E/(n + 1)

Using these three formulas for E, As(n), and Au(n), we find that

    As(n) = (1 + 1/n)Au(n) − 1

From this formula we see that As(n) and Au(n) are directly related. The minimum value of As(n) (and hence Au(n)) is achieved by an algorithm whose binary decision tree has minimum external and internal path length. This minimum is achieved by the binary tree all of whose external nodes are on adjacent levels, and this is precisely the tree that is produced by binary search. From Theorem 3.2 it follows that E is proportional to n log n. Using this in the preceding formulas, we conclude that As(n) and Au(n) are both proportional to log n. Thus we conclude that the average- and worst-case numbers of comparisons for binary search are the same to within a constant


factor. The best-case analysis is easy. For a successful search only one element comparison is needed. For an unsuccessful search, Theorem 3.2 states that ⌊log n⌋ element comparisons are needed in the best case.

In conclusion we are now able to completely describe the computing time of binary search by giving formulas that describe the best, average, and worst cases:

    successful searches                 unsuccessful searches
    Θ(1), Θ(log n), Θ(log n)            Θ(log n)
    best, average, worst                best, average, worst

Canwe expectanothersearchingalgorithmto besignificantlybetterthanbinary searchin the worst case? This questionis pursued rigorously in

Chapter10. But we can anticipatethe answer here,which is no. Themethodfor proving suchan assertionis to view the binary decisiontreeasa generalmodelfor any searchingalgorithmthat dependson comparisonsof entireelements.Viewed in this way, we observethat the longestpath todiscoverany elementis minimizedby binary search,and soany alternativealgorithmis no betterfrom this point of view.

Beforewe end this section,there is an interestingvariation of binarysearchthat makesonly one comparisonper iterationof the while loop.This variation appearsas Algorithm 3.4.Thecorrectnessproofof thisvariation is left as an exercise.

BinSearch will sometimes make twice as many element comparisons as BinSearch1 (for example, when x > a[n]). However, for successful searches BinSearch1 may make (log n)/2 more element comparisons than BinSearch (for example, when x = a[mid]). The analysis of BinSearch1 is left as an exercise. It should be easy to see that the best-, average-, and worst-case times for BinSearch1 are Θ(log n) for both successful and unsuccessful searches.

These two algorithms were run on a Sparc 10/30. The first two rows in Table 3.3 represent the average time for a successful search. The second set of two rows give the average times for all possible unsuccessful searches. For both successful and unsuccessful searches BinSearch1 did marginally better than BinSearch.

EXERCISES

1. Run the recursive and iterative versions of binary search and compare the times. For appropriate sizes of n, have each algorithm find every element in the set. Then try all n + 1 possible unsuccessful searches.

2. Prove by induction the relationship E = I + 2n for a binary tree with n internal nodes. The variables E and I are the external and internal path length, respectively.


Algorithm BinSearch1(a, n, x)
// Same specifications as BinSearch except n > 0.
{
    low := 1; high := n + 1;
    // high is one more than possible.
    while (low < (high − 1)) do
    {
        mid := ⌊(low + high)/2⌋;
        if (x < a[mid]) then high := mid;
            // Only one comparison in the loop.
        else low := mid; // x ≥ a[mid]
    }
    if (x = a[low]) then return low; // x is present.
    else return 0; // x is not present.
}

Algorithm 3.4 Binary search using one comparison per cycle
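A C sketch of this one-comparison-per-iteration variant follows (our own rendering; the name bin_search1 and the 0-based C indexing shifted to the book's 1-based positions are assumptions):

/* Binary search with one element comparison per loop iteration.
   Returns the 1-based position of x in a[0..n-1] (nondecreasing),
   or 0 if x is absent. Requires n > 0. */
int bin_search1(const int a[], int n, int x)
{
    int low = 1, high = n + 1;     /* high is one more than possible */
    while (low < high - 1) {
        int mid = (low + high) / 2;
        if (x < a[mid - 1])
            high = mid;            /* only one comparison in the loop */
        else
            low = mid;             /* x >= a[mid] */
    }
    return (x == a[low - 1]) ? low : 0;
}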

    Array sizes           5,000   10,000   15,000   20,000   25,000   30,000

    successful searches
    BinSearch             51.30    67.95    67.72    73.85    76.77    73.40
    BinSearch1            47.68    53.92    61.98    67.46    68.95    71.11

    unsuccessful searches
    BinSearch             50.40    66.36    76.78    79.54    78.20    81.15
    BinSearch1            41.93    52.65    63.33    66.86    69.22    72.26

Table 3.3 Computing times for two binary search algorithms; times are in microseconds


3. In an infinite array, the first n cells contain integers in sorted order and the rest of the cells are filled with ∞. Present an algorithm that takes x as input and finds the position of x in the array in O(log n) time. You are not given the value of n.

4. Devise a "binary" search algorithm that splits the set not into two sets of (almost) equal sizes but into two sets, one of which is twice the size of the other. How does this algorithm compare with binary search?

5. Devise a ternary search algorithm that first tests the element at position n/3 for equality with some value x, and then checks the element at 2n/3 and either discovers x or reduces the set size to one-third the size of the original. Compare this with binary search.

6. (a) Prove that BinSearch1 works correctly.
   (b) Verify that the following algorithm segment functions correctly according to the specifications of binary search. Discuss its computing time.

       low := 1; high := n;
       repeat
       {
           mid := ⌊(low + high)/2⌋;
           if (x > a[mid]) then low := mid;
           else high := mid;
       } until ((low + 1) = high)

3.3 FINDING THE MAXIMUM AND MINIMUM

Let us consideranother simpleproblemthat can be solved by the divide-and-conquertechnique.The problemis to find the maximumand minimumitemsin a set of n elements.Algorithm 3.5is a straightforward algorithmto accomplishthis.

In analyzing the time complexity of this algorithm, we once again concentrate on the number of element comparisons. The justification for this is that the frequency count for other operations in this algorithm is of the same order as that for element comparisons. More importantly, when the elements in a[1:n] are polynomials, vectors, very large numbers, or strings of characters, the cost of an element comparison is much higher than the cost of the other operations. Hence the time is determined mainly by the total cost of the element comparisons.

StraightMaxMin requires 2(n − 1) element comparisons in the best, average, and worst cases. An immediate improvement is possible by realizing


Algorithm StraightMaxMin(a, n, max, min)
// Set max to the maximum and min to the minimum of a[1:n].
{
    max := min := a[1];
    for i := 2 to n do
    {
        if (a[i] > max) then max := a[i];
        if (a[i] < min) then min := a[i];
    }
}

Algorithm 3.5 Straightforward maximum and minimum

that the comparison a[i] < min is necessary only when a[i] > max is false. Hence we can replace the contents of the for loop by

    if (a[i] > max) then max := a[i];
    else if (a[i] < min) then min := a[i];

Now the best case occurs when the elements are in increasing order. The number of element comparisons is n − 1. The worst case occurs when the elements are in decreasing order. In this case the number of element comparisons is 2(n − 1). The average number of element comparisons is less than 2(n − 1). On the average, a[i] is greater than max half the time, and so the average number of comparisons is 3n/2 − 1.

A divide-and-conquer algorithm for this problem would proceed as follows: Let P = (n, a[i], ..., a[j]) denote an arbitrary instance of the problem. Here n is the number of elements in the list a[i], ..., a[j] and we are interested in finding the maximum and minimum of this list. Let Small(P) be true when n ≤ 2. In this case, the maximum and minimum are a[i] if n = 1. If n = 2, the problem can be solved by making one comparison.

If the list has more than two elements, P has to be divided into smaller instances. For example, we might divide P into the two instances P1 = (⌊n/2⌋, a[1], ..., a[⌊n/2⌋]) and P2 = (n − ⌊n/2⌋, a[⌊n/2⌋ + 1], ..., a[n]). After having divided P into two smaller subproblems, we can solve them by recursively invoking the same divide-and-conquer algorithm. How can we combine the solutions for P1 and P2 to obtain a solution for P? If MAX(P) and MIN(P) are the maximum and minimum of the elements in P, then MAX(P) is the larger of MAX(P1) and MAX(P2). Also, MIN(P) is the smaller of MIN(P1) and MIN(P2).


Algorithm 3.6 results from applying the strategy just described. MaxMin is a recursive algorithm that finds the maximum and minimum of the set of elements {a(i), a(i + 1), ..., a(j)}. The situations of set sizes one (i = j) and two (i = j − 1) are handled separately. For sets containing more than two elements, the midpoint is determined (just as in binary search) and two new subproblems are generated. When the maxima and minima of these subproblems are determined, the two maxima are compared and the two minima are compared to achieve the solution for the entire set.

1   Algorithm MaxMin(i, j, max, min)
2   // a[1:n] is a global array. Parameters i and j are integers,
3   // 1 ≤ i ≤ j ≤ n. The effect is to set max and min to the
4   // largest and smallest values in a[i : j], respectively.
5   {
6       if (i = j) then max := min := a[i]; // Small(P)
7       else if (i = j − 1) then // Another case of Small(P)
8       {
9           if (a[i] < a[j]) then
10          {
11              max := a[j]; min := a[i];
12          }
13          else
14          {
15              max := a[i]; min := a[j];
16          }
17      }
18      else
19      {   // If P is not small, divide P into subproblems.
20          // Find where to split the set.
21          mid := ⌊(i + j)/2⌋;
22          // Solve the subproblems.
23          MaxMin(i, mid, max, min);
24          MaxMin(mid + 1, j, max1, min1);
25          // Combine the solutions.
26          if (max < max1) then max := max1;
27          if (min > min1) then min := min1;
28      }
29  }

Algorithm 3.6 Recursively finding the maximum and minimum
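The recursion translates directly into C. The sketch below is our own rendering (0-based inclusive indices i and j, int elements, results returned through pointer parameters):

/* Recursively find the maximum and minimum of a[i..j] (inclusive). */
void max_min(const int a[], int i, int j, int *max, int *min)
{
    if (i == j) {                       /* one element */
        *max = *min = a[i];
    } else if (i == j - 1) {            /* two elements: one comparison */
        if (a[i] < a[j]) { *max = a[j]; *min = a[i]; }
        else             { *max = a[i]; *min = a[j]; }
    } else {                            /* divide into two subproblems */
        int max1, min1, mid = (i + j) / 2;
        max_min(a, i, mid, max, min);
        max_min(a, mid + 1, j, &max1, &min1);
        if (*max < max1) *max = max1;   /* combine the solutions */
        if (*min > min1) *min = min1;
    }
}

A call such as max_min(a, 0, n - 1, &x, &y) corresponds to the book's invocation MaxMin(1, n, x, y).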


The procedure is initially invoked by the statement

    MaxMin(1, n, x, y)

Suppose we simulate MaxMin on the following nine elements:

    a:  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]  [9]
         22   13   -5   -8   15   60   17   31   47

A good way of keeping track of recursive calls is to build a tree by adding a node each time a new call is made. For this algorithm each node has four items of information: i, j, max, and min. On the array a[ ] above, the tree of Figure 3.2 is produced.

[Figure 3.2 shows the tree of recursive calls; its root node holds (1, 9, 60, -8).]

Figure 3.2 Tree of recursive calls of MaxMin

ExaminingFigure3.2,we seethat the rootnodecontains1and 9 as thevaluesof i and j correspondingto the initialcallto MaxMin.Thisexecutionproducestwo new callsto MaxMin,wherei and j have the values 1,5 and6,9,respectively,and thus split the set into two subsetsof approximatelythe samesize. From the tree we can immediately seethat the maximumdepth of recursionis four (includingthe first call).Thecirclednumbersinthe upper left cornerof eachnoderepresentthe ordersin which max andmin areassignedvalues.


Now what is the number of element comparisons needed for MaxMin? If T(n) represents this number, then the resulting recurrence relation is

    T(n) = { T(⌈n/2⌉) + T(⌊n/2⌋) + 2    n > 2
           { 1                            n = 2
           { 0                            n = 1

When n is a power of two, n = 2^k for some positive integer k, then

    T(n) = 2T(n/2) + 2
         = 2(2T(n/4) + 2) + 2
         = 4T(n/4) + 4 + 2
         ⋮
         = 2^(k−1) T(2) + Σ_{1 ≤ i ≤ k−1} 2^i
         = 2^(k−1) + 2^k − 2 = 3n/2 − 2

Note that 3n/2 − 2 is the best-, average-, and worst-case number of comparisons when n is a power of two.

Compared with the 2n − 2 comparisons for the straightforward method, this is a saving of 25% in comparisons. It can be shown that no algorithm based on comparisons uses less than 3n/2 − 2 comparisons. So in this sense algorithm MaxMin is optimal (see Chapter 10 for more details). But does this imply that MaxMin is better in practice? Not necessarily. In terms of storage, MaxMin is worse than the straightforward algorithm because it requires stack space for i, j, max, min, max1, and min1. Given n elements, there will be ⌊log₂ n⌋ + 1 levels of recursion and we need to save seven values for each recursive call (don't forget the return address is also needed).

Let us see what the count is when element comparisons have the same cost as comparisons between i and j. Let C(n) be this number. First, we observe that lines 6 and 7 in Algorithm 3.6 can be replaced with

    if (i ≥ j − 1) then { // Small(P)

to achieve the same effect. Hence, a single comparison between i and j − 1 is adequate to implement the modified if statement. Assuming n = 2^k for some positive integer k, we get

    C(n) = { 2C(n/2) + 3    n > 2
           { 2               n = 2                    (3.3)


Solving this equation, we obtain

    C(n) = 2C(n/2) + 3
         = 4C(n/4) + 6 + 3
         ⋮
         = 2^(k−1) C(2) + 3 Σ_{0 ≤ i ≤ k−2} 2^i
         = 2^k + 3 · 2^(k−1) − 3
         = 5n/2 − 3                                   (3.4)

The comparative figure for StraightMaxMin is 3(n − 1) (including the comparison needed to implement the for loop). This is larger than 5n/2 − 3. Despite this, MaxMin will be slower than StraightMaxMin because of the overhead of stacking i, j, max, and min for the recursion.

Algorithm 3.6makesseveralpoints.If comparisonsamongthe elementsof a[ ] aremuch morecostly than comparisonsof integervariables,then thedivide-and-conquertechniquehas yielded a moreefficient (actually anoptimal) algorithm.On the otherhand, if this assumptionis not true,thetechnique yields a less-efficientalgorithm.Thusthe divide-and-conquerstrategyis seento be only a guideto betteralgorithmdesignwhich may not alwayssucceed.Also we seethat it is sometimesnecessaryto work out theconstants associatedwith the computingtimebound for an algorithm.BothMaxMin and StraightMaxMinare0(n),sothe useof asymptotic notation isnot enoughof a discriminatorin this situation.Finally, seethe exercisesfor another way to find the maximumand minimum usingonly 3n/2\342\200\224 2comparisons.Note:In the designof any divide-and-conqueralgorithm,typically, it is astraightforward task to defineSmall(P)and S(P).So,from now on, we onlydiscusshow todivide any given problemP and how to combinethe solutionsto subproblems.

EXERCISES1.TranslatealgorithmMaxMin into a computationallyequivalent

procedure that usesno recursion.

2.Testyour iterativeversion of MaxMin derived above againstStraightMaxMin. Countallcomparisons.

3.Thereis an iterativealgorithmfor finding the maximumand minimumwhich, though not a divide-and-conquer-basedalgorithm,isprobably

moreefficient than MaxMin. It works by comparingconsecutivepairsof elementsand then comparingthe largerone with the currentmaximumand the smalleronewith the currentminimum.Write out



the algorithmcompletelyand analyze the numberof comparisonsitrequires.

4. In Algorithm 3.6,what happensif lines7 to 17aredropped?Doestheresultantfunction stillcomputethe maximumand minimum elementscorrectly?

3.4 MERGE SORT

As another example of divide-and-conquer, we investigate a sorting algorithm that has the nice property that in the worst case its complexity is O(n log n). This algorithm is called merge sort. We assume throughout that the elements are to be sorted in nondecreasing order. Given a sequence of n elements (also called keys) a[1], ..., a[n], the general idea is to imagine them split into two sets a[1], ..., a[⌊n/2⌋] and a[⌊n/2⌋ + 1], ..., a[n]. Each set is individually sorted, and the resulting sorted sequences are merged to produce a single sorted sequence of n elements. Thus we have another ideal example of the divide-and-conquer strategy in which the splitting is into two equal-sized sets and the combining operation is the merging of two sorted sets into one.

MergeSort(Algorithm3.7) describesthis processvery succinctlyusingrecursionand a function Merge (Algorithm 3.8)which mergestwo sortedsets.BeforeexecutingMergeSort,the n elementsshouldbeplacedin a[\\ :n].Then MergeSort(l,n)causesthe keys to be rearrangedinto nondecreasingorderin a.

Example3.7Considerthe array oftenelementsa[l:10]= (310,285,179,652,351,423,861,254,450,520).Algorithm MergeSortbeginsby splittinga[ ] into two subarrays eachof sizefive (a[l:5] and a[6:10]).Theelementsin a[l:5] are then split into two subarrays of sizethree(a[l:3])and two(a[4 : 5]).Then the itemsin a[\\ : 3] are split into subarrays of sizetwo(a[l: 2]) and one (a[3 : 3]).The two values in a[\\ : 2] are split a finaltime into one-elementsubarrays, and now the mergingbegins.Note thatno movement of data has yet taken place. A recordof the subarrays isimplicitly maintainedby the recursivemechanism.Pictoriallythe file cannow be viewedas

(310| 285| 179| 652,351| 423,861,254, 450,520)

whereverticalbarsindicatethe boundariesof subarrays.Elementsa[l]anda[2] aremergedto yield

(285,310| 179| 652,351| 423,861,254,450,520)


Algorithm MergeSort(low, high)
// a[low : high] is a global array to be sorted.
// Small(P) is true if there is only one element
// to sort. In this case the list is already sorted.
{
    if (low < high) then // If there are more than one element
    {
        // Divide P into subproblems.
        // Find where to split the set.
        mid := ⌊(low + high)/2⌋;
        // Solve the subproblems.
        MergeSort(low, mid);
        MergeSort(mid + 1, high);
        // Combine the solutions.
        Merge(low, mid, high);
    }
}

Algorithm 3.7 Merge sort

Then a[3] is mergedwith a[l:2] and

(179,285,310| 652,351| 423,861,254,450,520)

is produced.Next,elementsa[4] and a[5] aremerged:

(179,285,310| 351,652| 423,861,254,450,520)

and then a[l:3] and a[4 :5]:

(179,285,310,351,652| 423,861,254, 450,520)

At this point the algorithmhas returnedto the first invocationof MergeSortand is about to processthe secondrecursivecall. Repeatedrecursivecallsare invokedproducingthe following subarrays:

(179,285,310,351,652| 423 | 861| 254 | 450,520)

Elementsa[6] and a[7] aremerged.Then a[8] is mergedwith a[6:7]:


Algorithm Merge(low, mid, high)
// a[low : high] is a global array containing two sorted
// subsets in a[low : mid] and in a[mid + 1 : high]. The goal
// is to merge these two sets into a single set residing
// in a[low : high]. b[ ] is an auxiliary global array.
{
    h := low; i := low; j := mid + 1;
    while ((h ≤ mid) and (j ≤ high)) do
    {
        if (a[h] ≤ a[j]) then
        {
            b[i] := a[h]; h := h + 1;
        }
        else
        {
            b[i] := a[j]; j := j + 1;
        }
        i := i + 1;
    }
    if (h > mid) then
        for k := j to high do
        {
            b[i] := a[k]; i := i + 1;
        }
    else
        for k := h to mid do
        {
            b[i] := a[k]; i := i + 1;
        }
    for k := low to high do a[k] := b[k];
}

Algorithm 3.8 Merging two sorted subarrays using auxiliary storage
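Putting MergeSort and Merge together, a compact C sketch (our own; it allocates one auxiliary buffer b of the same size as a rather than using a global array, and uses 0-based indices) looks like this:

#include <stdlib.h>

/* Merge the sorted runs a[low..mid] and a[mid+1..high] using buffer b. */
static void merge(int a[], int b[], int low, int mid, int high)
{
    int h = low, i = low, j = mid + 1, k;
    while (h <= mid && j <= high)
        b[i++] = (a[h] <= a[j]) ? a[h++] : a[j++];
    while (h <= mid)  b[i++] = a[h++];   /* copy whichever run remains */
    while (j <= high) b[i++] = a[j++];
    for (k = low; k <= high; k++) a[k] = b[k];
}

static void merge_sort_rec(int a[], int b[], int low, int high)
{
    if (low < high) {
        int mid = (low + high) / 2;      /* divide */
        merge_sort_rec(a, b, low, mid);  /* solve the subproblems */
        merge_sort_rec(a, b, mid + 1, high);
        merge(a, b, low, mid, high);     /* combine */
    }
}

/* Sort a[0..n-1] into nondecreasing order. */
void merge_sort(int a[], int n)
{
    int *b = malloc(n * sizeof *b);
    if (b == NULL) return;               /* allocation failed; a is left unchanged */
    merge_sort_rec(a, b, 0, n - 1);
    free(b);
}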


(179,285,310,351,652| 254, 423,861| 450,520)

Next a[9] and a[10] are merged, and then a[6:8] and a[9:10]:

    (179, 285, 310, 351, 652 | 254, 423, 450, 520, 861)

At this point thereare two sortedsubarrays and the final mergeproducesthe fully sortedresult

(179,254, 285,310,351,423,450,520,652,861)

Figure 3.3 Tree of calls of MergeSort(1, 10)

Figure 3.3 is a tree that represents the sequence of recursive calls that are produced by MergeSort when it is applied to ten elements. The pair of values in each node are the values of the parameters low and high. Notice how the splitting continues until sets containing a single element are produced. Figure 3.4 is a tree representing the calls to procedure Merge by MergeSort. For example, the node containing 1, 2, and 3 represents the merging of a[1:2] with a[3]. □

If the time for the merging operation is proportional to n, then the computing time for merge sort is described by the recurrence relation

    T(n) = { a               n = 1, a a constant
           { 2T(n/2) + cn    n > 1, c a constant


When n is a power of 2, n = 2^k, we can solve this equation by successive substitutions:

    T(n) = 2(2T(n/4) + cn/2) + cn
         = 4T(n/4) + 2cn
         = 4(2T(n/8) + cn/4) + 2cn
         ⋮
         = 2^k T(1) + kcn
         = an + cn log n

It is easy to see that if 2^k < n < 2^(k+1), then T(n) ≤ T(2^(k+1)). Therefore

    T(n) = O(n log n)

Figure 3.4 Tree of calls of Merge

ThoughAlgorithm 3.7nicely capturesthe divide-and-conquernature ofmergesort,thereremainseveralinefficienciesthat can and shouldbeeliminated. We presenttheserefinementsin an attemptto producea versionofmergesortthat is goodenoughto execute.Despitetheseimprovementsthealgorithm'scomplexityremains0(nlogn).We see in Chapter10that nosortingalgorithmbasedon comparisonsof entirekeys can do better.

One complaintwe might raiseconcerningmergesort is its use of 2nlocations.Theadditionaln locationswere neededbecausewe couldn'treasonably mergetwo sortedsetsin place.But despitethe useof this spacethe


algorithmmust stillwork hard and copy the resultplacedinto b[low :high]backinto a[low:high]on eachcallof Merge.An alternativeto this copyingis to associatea new field of information with eachkey. (The elementsina[ ] arecalledkeys.) This field is used to link the keys and any associatedinformation togetherin a sortedlist (keysand relatedinformation arecalledrecords).Thenthe mergingof the sortedlistsproceedsby changingthe linkvalues,and no recordsneedbemoved at all.A field that containsonly a linkwill generallybesmallerthan an entirerecord,solessspacewill be used.

Along with the original array a[ ], we define an auxiliary array link[1 : n] that contains integers in the range [0, n]. These integers are interpreted as pointers to elements of a[ ]. A list is a sequence of pointers ending with a zero. Below is one set of values for link that contains two lists: Q and R. The integer Q = 2 denotes the start of one list and R = 5 the start of the other.

    link:  [1]  [2]  [3]  [4]  [5]  [6]  [7]  [8]
            6    4    7    1    3    0    8    0

The two lists are Q = (2, 4, 1, 6) and R = (5, 3, 7, 8). Interpreting these lists as describing sorted subsets of a[1 : 8], we conclude that a[2] ≤ a[4] ≤ a[1] ≤ a[6] and a[5] ≤ a[3] ≤ a[7] ≤ a[8].

Another complaintwe couldraiseabout MergeSortis the stackspacethatis necessitatedby the useof recursion.Sincemergesortsplitseachset intotwo approximatelyequal-sizedsubsets,the maximumdepth of the stackisproportionalto logn.Theneedfor stackspaceseemsindicatedby the top-down mannerin which this algorithmwas devised.Theneedfor stackspacecanbeeliminatedif we build an algorithmthat works bottom-up;seetheexercisesfor details.

As can beseenfrom function MergeSortand the previousexample,evensetsof sizetwo will causetwo recursivecallstobemade.For smallsetsizesmostof the timewill be spent processingthe recursioninsteadof sorting.This situationcan be improved by not allowing the recursionto go to thelowest level.In termsof the divide-and-conquercontrolabstraction,we aresuggestingthat when Small is true for mergesort,moreworkshouldbedonethan simply returningwith no action. We use a secondsortingalgorithmthat works well on small-sizedsets.

Insertionsortworks exceedinglyfast on arrays of lessthan, say, 16elements, though for largen its computingtimeis 0(n2).Itsbasicideaforsortingthe itemsin a[\\ :n] is as follows:

    for j := 2 to n do
    {
        place a[j] in its correct position in the sorted set a[1 : j − 1];
    }


Though all the elements in a[1 : j − 1] may have to be moved to accommodate a[j], for small values of n the algorithm works well. Algorithm 3.9 has the details.

Algorithm InsertionSort(a, n)
// Sort the array a[1 : n] into nondecreasing order, n ≥ 1.
{
    for j := 2 to n do
    {
        // a[1 : j − 1] is already sorted.
        item := a[j]; i := j − 1;
        while ((i ≥ 1) and (item < a[i])) do
        {
            a[i + 1] := a[i]; i := i − 1;
        }
        a[i + 1] := item;
    }
}

Algorithm 3.9 Insertion sort
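In C the same idea reads as follows (a sketch of ours with 0-based indexing; the book's a[1:n] becomes a[0..n-1]):

/* Sort a[0..n-1] into nondecreasing order by insertion. */
void insertion_sort(int a[], int n)
{
    int i, j;
    for (j = 1; j < n; j++) {
        int item = a[j];                 /* a[0..j-1] is already sorted */
        for (i = j - 1; i >= 0 && item < a[i]; i--)
            a[i + 1] = a[i];             /* shift larger keys right */
        a[i + 1] = item;
    }
}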

The statements within the while loop can be executed zero up to a maximum of j times. Since j goes from 2 to n, the worst-case time of this procedure is bounded by

    Σ_{2 ≤ j ≤ n} j = n(n + 1)/2 − 1 = Θ(n²)

Its best-case computing time is O(n) under the assumption that the body of the while loop is never entered. This will be true when the data is already in sorted order.

We are now ready to present the revised version of merge sort with the inclusion of insertion sort and the links. Function MergeSort1 (Algorithm 3.10) is initially invoked by placing the keys of the records to be sorted in a[1 : n] and setting link[1 : n] to zero. Then one says MergeSort1(1, n). A pointer to a list of indices that give the elements of a[ ] in sorted order is returned. Insertion sort is used whenever the number of items to be sorted is less than 16. The version of insertion sort as given by Algorithm 3.9 needs to be altered so that it sorts a[low : high] into a linked list. Call the altered version InsertionSort1. The revised merging function, Merge1, is given in Algorithm 3.11.


Algorithm MergeSort1(low, high)
// The global array a[low : high] is sorted in nondecreasing order
// using the auxiliary array link[low : high]. The values in link
// represent a list of the indices low through high giving a[ ] in
// sorted order. A pointer to the beginning of the list is returned.
{
    if ((high − low) < 15) then
        return InsertionSort1(a, link, low, high);
    else
    {
        mid := ⌊(low + high)/2⌋;
        q := MergeSort1(low, mid);
        r := MergeSort1(mid + 1, high);
        return Merge1(q, r);
    }
}

Algorithm 3.10 Merge sort using links

Example 3.8 As an aid to understanding this new version of merge sort, suppose we simulate the algorithm as it sorts the eight-element sequence (50, 10, 25, 30, 15, 70, 35, 55). We ignore the fact that less than 16 elements would normally be sorted using InsertionSort. The link array is initialized to zero. Table 3.4 shows how the link array changes after each call of MergeSort1 completes. On each row the value of p points to the list in link that was created by the last completion of Merge1. To the right are the subsets of sorted elements that are represented by these lists. For example, in the last row p = 2 which begins the list of links 2, 5, 3, 4, 7, 1, 8, and 6; this implies a[2] ≤ a[5] ≤ a[3] ≤ a[4] ≤ a[7] ≤ a[1] ≤ a[8] ≤ a[6]. □

EXERCISES1.Why is it necessaryto have the auxiliary array b[low :high]in function

Merge?Give an examplethat showswhy in-placemergingis inefficient.

2.Theworst-casetimeof procedureMergeSortis 0{nlogn).What is itsbest-casetime?Can we say that the timefor MergeSortis O(nlogn)?

3.A sortingmethodis said to be stableif at the end of the method,identicalelementsoccurin the sameorderas in the originalunsorted


Algorithm Merge1(q, r)
// q and r are pointers to lists contained in the global array
// link[0 : n]. link[0] is introduced only for convenience and need
// not be initialized. The lists pointed at by q and r are merged
// and a pointer to the beginning of the merged list is returned.
{
    i := q; j := r; k := 0;
    // The new list starts at link[0].
    while ((i ≠ 0) and (j ≠ 0)) do
    { // While both lists are nonempty do
        if (a[i] ≤ a[j]) then
        { // Find the smaller key.
            link[k] := i; k := i; i := link[i];
            // Add a new key to the list.
        }
        else
        {
            link[k] := j; k := j; j := link[j];
        }
    }
    if (i = 0) then link[k] := j;
    else link[k] := i;
    return link[0];
}

Algorithm 3.11 Merging linked lists of sorted elements
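To see the link manipulation concretely, here is a C sketch of the merging step alone (our own rendering; a[] holds keys in positions 1..n, link[] is sized n+1, link[0] serves as the list header exactly as in the pseudocode):

/* Merge two sorted index lists that start at q and r.
   a[] holds the keys, link[] the successor indices (0 ends a list).
   Returns the index of the first element of the merged list. */
int merge1(const int a[], int link[], int q, int r)
{
    int i = q, j = r, k = 0;            /* the new list starts at link[0] */
    while (i != 0 && j != 0) {
        if (a[i] <= a[j]) { link[k] = i; k = i; i = link[i]; }
        else              { link[k] = j; k = j; j = link[j]; }
    }
    link[k] = (i == 0) ? j : i;         /* append whichever list remains */
    return link[0];
}

Because only link values change, no records are moved during the merge; that is the point of the link-based refinement.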


                (0)  (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)
    a:           -   50   10   25   30   15   70   35   55
    link:        0    0    0    0    0    0    0    0    0
    q  r  p
    1  2  2      2    0    1    0    0    0    0    0    0   (10,50)
    3  4  3      3    0    1    4    0    0    0    0    0   (10,50),(25,30)
    2  3  2      2    0    3    4    1    0    0    0    0   (10,25,30,50)
    5  6  5      5    0    3    4    1    6    0    0    0   (10,25,30,50),(15,70)
    7  8  7      7    0    3    4    1    6    0    8    0   (10,25,30,50),(15,70),(35,55)
    5  7  5      5    0    3    4    1    7    0    8    6   (10,25,30,50),(15,35,55,70)
    2  5  2      2    8    5    4    7    3    0    1    6   (10,15,25,30,35,50,55,70)

    MergeSort1 applied to a[1:8] = (50, 10, 25, 30, 15, 70, 35, 55)

Table 3.4 Example of link array changes

set. Is merge sort a stable sorting method?

4. Suppose a[1 : m] and b[1 : n] both contain sorted elements in nondecreasing order. Write an algorithm that merges these items into c[1 : m + n]. Your algorithm should be shorter than Algorithm 3.8 (Merge) since you can now place a large value in a[m + 1] and b[n + 1].

5. Given a file of n records that are partially sorted as x₁ ≤ x₂ ≤ ··· ≤ x_m and x_{m+1} ≤ ··· ≤ x_n, is it possible to sort the entire file in time O(n) using only a small fixed amount of additional storage?

6. Another way to sort a file of n records is to scan the file, merge consecutive pairs of size one, then merge pairs of size two, and so on. Write an algorithm that carries out this process. Show how your algorithm works on the data set (100, 300, 150, 450, 250, 350, 200, 400, 500).

7. A version of insertion sort is used by Algorithm 3.10 to sort small subarrays. However, its parameters and intent are slightly different from the procedure InsertionSort of Algorithm 3.9. Write a version of insertion sort that will work as Algorithm 3.10 expects.

8. The sequences X₁, X₂, ..., X_ℓ are sorted sequences such that Σ_{i=1}^{ℓ} |X_i| = n. Show how to merge these ℓ sequences in time O(n log ℓ).

3.5 QUICKSORT

The divide-and-conquer approach can be used to arrive at an efficient sorting method different from merge sort. In merge sort, the file a[1 : n] was divided


at its midpoint into subarrays which were independentlysortedand latermerged.In quicksort,the division into two subarrays is madesothat thesortedsubarrays do not need to be mergedlater. This is accomplishedbyrearrangingthe elementsin a[l:n] suchthat a[i] < a[j]for all i between1and m and allj betweenm + 1and n for somem, 1< m < n. Thus, theelementsin a[l:m] and a[rn + 1:n] canbe independentlysorted.No mergeis needed.The rearrangementof the elementsis accomplishedby pickingsomeelementof a[ ],say t = a[s],and then reorderingthe otherelementssothat allelementsappearingbeforet in a[l:n] are lessthan or equal tot and all elementsappearingafter t are greaterthan or equal to t. Thisrearrangingis referredtoaspartitioning.

Function Partition of Algorithm 3.12 (due to C. A. R. Hoare) accomplishes an in-place partitioning of the elements of a[m : p − 1]. It is assumed that a[p] ≥ a[m] and that a[m] is the partitioning element. If m = 1 and p − 1 = n, then a[n + 1] must be defined and must be greater than or equal to all elements in a[1 : n]. The assumption that a[m] is the partition element is merely for convenience; other choices for the partitioning element than the first item in the set are better in practice. The function Interchange(a, i, j) exchanges a[i] with a[j].

Example 3.9 As an example of how Partition works, consider the following array of nine elements. The function is initially invoked as Partition(a, 1, 10). The ends of the horizontal line indicate those elements which were interchanged to produce the next row. The element a[1] = 65 is the partitioning element and it is eventually (in the sixth row) determined to be the fifth smallest element of the set. Notice that the remaining elements are unsorted but partitioned about a[5] = 65. □

    (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)  (10)    i   p
     65   70   75   80   85   60   55   50   45   +∞     2   9
     65   45   75   80   85   60   55   50   70   +∞     3   8
     65   45   50   80   85   60   55   75   70   +∞     4   7
     65   45   50   55   85   60   80   75   70   +∞     5   6
     65   45   50   55   60   85   80   75   70   +∞     6   5
     60   45   50   55   65   85   80   75   70   +∞

Using Hoare's clever method of partitioning a set of elements about a chosen element, we can directly devise a divide-and-conquer method for completely sorting n elements. Following a call to the function Partition, two sets S₁ and S₂ are produced. All elements in S₁ are less than or equal


Algorithm Partition(a, m, p)
// Within a[m], a[m+1], ..., a[p − 1] the elements are
// rearranged in such a manner that if initially t = a[m],
// then after completion a[q] = t for some q between m
// and p − 1, a[k] ≤ t for m ≤ k < q, and a[k] ≥ t
// for q < k < p. q is returned. Set a[p] = ∞.
{
    v := a[m]; i := m; j := p;
    repeat
    {
        repeat
            i := i + 1;
        until (a[i] ≥ v);
        repeat
            j := j − 1;
        until (a[j] ≤ v);
        if (i < j) then Interchange(a, i, j);
    } until (i ≥ j);
    a[m] := a[j]; a[j] := v; return j;
}

Algorithm Interchange(a, i, j)
// Exchange a[i] with a[j].
{
    p := a[i];
    a[i] := a[j]; a[j] := p;
}

Algorithm 3.12 Partition the array a[m : p − 1] about a[m]


to the elements in S₂. Hence S₁ and S₂ can be sorted independently. Each set is sorted by reusing the function Partition. Algorithm 3.13 describes the complete process.

Algorithm QuickSort(p, q)
// Sorts the elements a[p], ..., a[q] which reside in the global
// array a[1 : n] into ascending order; a[n + 1] is considered to
// be defined and must be ≥ all the elements in a[1 : n].
{
    if (p < q) then // If there are more than one element
    {
        // divide P into two subproblems.
        j := Partition(a, p, q + 1);
        // j is the position of the partitioning element.
        // Solve the subproblems.
        QuickSort(p, j − 1);
        QuickSort(j + 1, q);
        // There is no need for combining solutions.
    }
}

Algorithm 3.13 Sorting by partitioning
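A direct C transcription of Partition and QuickSort is sketched below (our own; instead of the +∞ sentinel a[n + 1], the inward scans are bounded explicitly, which changes nothing else about Hoare's scheme; the first element of the subfile is the partitioning element):

/* Partition a[lo..hi] around v = a[lo]. On return v sits at its final
   position j, with a[lo..j-1] <= v <= a[j+1..hi]; j is returned. */
int partition(int a[], int lo, int hi)
{
    int v = a[lo], i = lo, j = hi + 1;
    for (;;) {
        do { i++; } while (i <= hi && a[i] < v);   /* scan right */
        do { j--; } while (a[j] > v);              /* scan left; a[lo] stops it */
        if (i >= j) break;
        { int t = a[i]; a[i] = a[j]; a[j] = t; }
    }
    a[lo] = a[j];
    a[j] = v;
    return j;
}

/* Quicksort a[lo..hi] in place. */
void quick_sort(int a[], int lo, int hi)
{
    if (lo < hi) {
        int j = partition(a, lo, hi);   /* j is the pivot's final position */
        quick_sort(a, lo, j - 1);
        quick_sort(a, j + 1, hi);       /* no combining step is needed */
    }
}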

In analyzing QuickSort, we count only the number of element comparisons C(n). It is easy to see that the frequency count of other operations is of the same order as C(n). We make the following assumptions: the n elements to be sorted are distinct, and the input distribution is such that the partition element v = a[m] in the call to Partition(a, m, p) has an equal probability of being the ith smallest element, 1 ≤ i ≤ p − m, in a[m : p − 1].

First, let us obtain the worst-case value Cw(n) of C(n). The number of element comparisons in each call of Partition is at most p − m + 1. Let r be the total number of elements in all the calls to Partition at any level of recursion. At level one only one call, Partition(a, 1, n + 1), is made and r = n; at level two at most two calls are made and r = n − 1; and so on. At each level of recursion, O(r) element comparisons are made by Partition. At each level, r is at least one less than the r at the previous level as the partitioning elements of the previous level are eliminated. Hence Cw(n) is the sum on r as r varies from 2 to n, or Θ(n²). Exercise 7 examines input data on which QuickSort uses Ω(n²) comparisons.

The average value C_A(n) of C(n) is much less than Cw(n). Under the assumptions made earlier, the partitioning element v has an equal probability


of being the ith-smallest element, 1 ≤ i ≤ p − m, in a[m : p − 1]. Hence the two subarrays remaining to be sorted are a[m : j] and a[j + 1 : p − 1] with probability 1/(p − m), m ≤ j < p. From this we obtain the recurrence

    C_A(n) = n + 1 + (1/n) Σ_{1 ≤ k ≤ n} [C_A(k − 1) + C_A(n − k)]              (3.5)

The number of element comparisons required by Partition on its first call is n + 1. Note that C_A(0) = C_A(1) = 0. Multiplying both sides of (3.5) by n, we obtain

    n C_A(n) = n(n + 1) + 2[C_A(0) + C_A(1) + ··· + C_A(n − 1)]                 (3.6)

Replacing n by n − 1 in (3.6) gives

    (n − 1) C_A(n − 1) = n(n − 1) + 2[C_A(0) + ··· + C_A(n − 2)]

Subtracting this from (3.6), we get

    n C_A(n) − (n − 1) C_A(n − 1) = 2n + 2 C_A(n − 1)

or

    C_A(n)/(n + 1) = C_A(n − 1)/n + 2/(n + 1)

Repeatedly using this equation to substitute for C_A(n − 1), C_A(n − 2), ..., we get

    C_A(n)/(n + 1) = C_A(n − 2)/(n − 1) + 2/n + 2/(n + 1)
                   = C_A(n − 3)/(n − 2) + 2/(n − 1) + 2/n + 2/(n + 1)
                   ⋮
                   = C_A(1)/2 + 2 Σ_{3 ≤ k ≤ n+1} 1/k
                   = 2 Σ_{3 ≤ k ≤ n+1} 1/k                                      (3.7)

Since

    Σ_{3 ≤ k ≤ n+1} 1/k ≤ ∫_2^{n+1} dx/x = log_e(n + 1) − log_e 2

(3.7) yields


    C_A(n) ≤ 2(n + 1)[log_e(n + 2) − log_e 2] = O(n log n)

Even though the worst-case time is O(n²), the average time is only O(n log n). Let us now look at the stack space needed by the recursion. In the worst case the maximum depth of recursion may be n − 1. This happens, for example, when the partition element on each call to Partition is the smallest value in a[m : p − 1]. The amount of stack space needed can be reduced to O(log n) by using an iterative version of quicksort in which the smaller of the two subarrays a[p : j − 1] and a[j + 1 : q] is always sorted first. Also, the second recursive call can be replaced by some assignment statements and a jump to the beginning of the algorithm. With these changes, QuickSort takes the form of Algorithm 3.14.

We can now verify that the maximum stack space needed is O(log n). Let S(n) be the maximum stack space needed. Then it follows that

    S(n) ≤ 2 + S(⌊(n − 1)/2⌋),    n > 1

which is less than 2 log n.

As remarked in Section 3.4, InsertionSort is exceedingly fast for n less than about 16. Hence InsertionSort can be used to speed up QuickSort2 whenever q − p < 16. The exercises explore various possibilities for selection of the partition element.

3.5.1 Performance Measurement

QuickSort and MergeSort were evaluated on a SUN workstation 10/30. In both cases the recursive versions were used. For QuickSort the Partition function was altered to carry out the median of three rule (i.e., the partitioning element was the median of a[m], a[⌊(m + p − 1)/2⌋], and a[p − 1]). Each data set consisted of random integers in the range (0, 1000). Tables 3.5 and 3.6 record the actual computing times in milliseconds. Table 3.5 displays the average computing times. For each n, 50 random data sets were used. Table 3.6 shows the worst-case computing times for the 50 data sets.

Scanningthe tables,we immediately see that QuickSortis faster thanMergeSortfor all values. Even though both algorithmsrequire0(nlogn)timeon the average,QuickSortusually performswell in practice.Theexercises discussotherteststhat would make useful comparisons.

3.5.2 Randomized Sorting Algorithms

Though algorithm QuickSort has an average time of O(n log n) on n elements, its worst-case time is O(n²). On the other hand it does not make use of any


Algorithm QuickSort2(p, q)
// Sorts the elements in a[p : q].
// stack is a stack of size 2 log(n).
{
    repeat
    {
        while (p < q) do
        {
            j := Partition(a, p, q + 1);
            if ((j − p) < (q − j)) then
            {
                Add(j + 1); // Add j + 1 to stack.
                Add(q);     // Add q to stack.
                q := j − 1;
            }
            else
            {
                Add(p);     // Add p to stack.
                Add(j − 1); // Add j − 1 to stack.
                p := j + 1;
            }
        } // Sort the smaller subfile.
        if stack is empty then return;
        Delete(q); Delete(p); // Delete q and p from stack.
    } until (false);
}

Algorithm 3.14 Iterative version of QuickSort

additional memory as does MergeSort. A possible input on which QuickSort displays worst-case behavior is one in which the elements are already in sorted order. In this case the partition will be such that there will be only one element in one part and the rest of the elements will fall in the other part. The performance of any divide-and-conquer algorithm will be good if the resultant subproblems are as evenly sized as possible. Can QuickSort be modified so that it performs well on every input? The answer is yes. Is the technique of using the median of the three elements a[p], a[⌊(q + p)/2⌋], and a[q] the solution? Unfortunately it is possible to construct inputs for which even this method will take Ω(n²) time, as is explored in the exercises.

The solution is the use of a randomizer. While sorting the array a[p : q], instead of picking a[m], pick a random element (from among a[p], ..., a[q]) as the partition element. The resultant randomized algorithm (RQuickSort)


    n           1000    2000    3000    4000    5000
    MergeSort   72.8    167.2   275.1   378.5   500.6
    QuickSort   36.6    85.1    138.9   205.7   269.0

    n           6000    7000    8000    9000    10000
    MergeSort   607.6   723.4   811.5   949.2   1073.6
    QuickSort   339.4   411.0   487.7   556.3   645.2

Table 3.5 Average computing times for two sorting algorithms on random inputs

    n           1000    2000    3000    4000    5000
    MergeSort   105.7   206.4   335.2   422.1   589.9
    QuickSort   41.6    97.1    158.6   244.9   397.8

    n           6000    7000    8000    9000    10000
    MergeSort   691.3   794.8   889.5   1067.2  1167.6
    QuickSort   383.8   497.3   569.9   616.2   738.1

Table 3.6 Worst-case computing times for two sorting algorithms on random inputs


works on any input and runs in an expected O(n log n) time, where the expectation is over the space of all possible outcomes for the randomizer (rather than the space of all possible inputs). The code for RQuickSort is given in Algorithm 3.15. Note that this is a Las Vegas algorithm since it will always output the correct answer. Every call to the randomizer Random takes a certain amount of time. If there are only a very few elements to sort, the time taken by the randomizer may be comparable to the rest of the computation. For this reason, we invoke the randomizer only if (q − p) > 5. But 5 is not a magic number; in the machine employed, this seems to give the best results. In general this number should be determined empirically.

Algorithm RQuickSort(p, q)
// Sorts the elements a[p], ..., a[q] which reside in the global
// array a[1 : n] into ascending order. a[n + 1] is considered to
// be defined and must be ≥ all the elements in a[1 : n].
{
    if (p < q) then
    {
        if ((q − p) > 5) then
            Interchange(a, Random() mod (q − p + 1) + p, p);
        j := Partition(a, p, q + 1);
        // j is the position of the partitioning element.
        RQuickSort(p, j − 1);
        RQuickSort(j + 1, q);
    }
}

Algorithm 3.15 Randomized quick sort algorithm
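In C the randomization amounts to one extra swap before partitioning. The sketch below is our own illustration; it reuses the partition function sketched for Algorithm 3.13 (declared here as a prototype) and C's rand(), so it is a sketch rather than a drop-in routine:

#include <stdlib.h>

int partition(int a[], int lo, int hi);   /* as sketched for Algorithm 3.13 */

/* Randomized quicksort: pick a random element of a[lo..hi] as the
   partition element by swapping it into position lo first. */
void r_quick_sort(int a[], int lo, int hi)
{
    if (lo < hi) {
        int j;
        if (hi - lo > 5) {                        /* randomize only larger subfiles */
            int r = lo + rand() % (hi - lo + 1);  /* uniform index in [lo, hi] */
            int t = a[lo]; a[lo] = a[r]; a[r] = t;
        }
        j = partition(a, lo, hi);
        r_quick_sort(a, lo, j - 1);
        r_quick_sort(a, j + 1, hi);
    }
}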

The proof of the fact that RQuickSort has an expected O(n log n) time is the same as the proof of the average time of QuickSort. Let A(n) be the average time of RQuickSort on any input of n elements. Then the number of elements in the second part will be 0, 1, 2, ..., n − 2, or n − 1, all with an equal probability of 1/n (in the probability space of outcomes for the randomizer). Thus the recurrence relation for A(n) will be

    A(n) = (1/n) Σ_{1 ≤ k ≤ n} (A(k − 1) + A(n − k)) + n + 1

This is the same as Equation 3.5, and hence its solution is O(n log n).

RQuickSort and QuickSort (without employing the median of three elements rule) were evaluated on a SUN 10/30 workstation. Table 3.7 displays


the times for the two algorithms in milliseconds averaged over 100 runs. For each n, the input considered was the sequence of numbers 1, 2, ..., n. As we can see from the table, RQuickSort performs much better than QuickSort. Note that the times shown in this table for QuickSort are much more than the corresponding entries in Tables 3.5 and 3.6. The reason is that QuickSort makes Θ(n²) comparisons on inputs that are already in sorted order. However, on random inputs its average performance is very good.

    n            1000    2000    3000    4000    5000
    QuickSort    195.5   759.2   1728    3165    4829
    RQuickSort   9.4     21.0    30.5    41.6    52.8

Table 3.7 Comparison of QuickSort and RQuickSort on the input a[i] = i, 1 ≤ i ≤ n; times are in milliseconds.

The performance of RQuickSort can be improved in various ways. For example, we could pick a small number (say 11) of the elements in the array a[ ] randomly and use the median of these elements as the partition element. These randomly chosen elements form a random sample of the array elements. We would expect that the median of the sample would also be an approximate median of the array and hence result in an approximately even partitioning of the array.

An even more generalized version of the above random sampling technique is shown in Algorithm 3.16. Here we choose a random sample S of s elements (where s is a function of n) from the input sequence X and sort them using HeapSort, MergeSort, or any other sorting algorithm. Let ℓ₁, ℓ₂, ..., ℓ_s be the sorted sample. We partition X into s + 1 parts using the sorted sample as partition keys. In particular X₁ = {x ∈ X | x ≤ ℓ₁}; X_i = {x ∈ X | ℓ_{i−1} < x ≤ ℓ_i}, for i = 2, 3, ..., s; and X_{s+1} = {x ∈ X | x > ℓ_s}.

After having partitioned X into s + 1 parts, we sort each part recursively. For a proper choice of s, the number of comparisons made in this algorithm is only n log n + o(n log n). Note the constant 1 before n log n. We see in Chapter 10 that this number is very close to the information theoretic lower bound for sorting.

Choose s = n/(log² n). The sample can be sorted in O(s log s) = O(n/log n) time and comparisons if we use HeapSort or MergeSort. If we store the sorted sample elements in an array, say b[ ], for each x ∈ X, we can determine which part X_i it belongs to in ≤ log n comparisons using binary search on b[ ]. Thus the partitioning process takes n log n + O(n) comparisons. In the exercises you are asked to show that with high probability the cardinality


Algorithm RSort(a, n)
// Sort the elements a[1 : n].
{
    Randomly sample s elements from a[ ];
    Sort this sample;
    Partition the input using the sorted sample as partition keys;
    Sort each part separately;
}

Algorithm 3.16 A randomized algorithm for sorting

of each X_i is no more than O((n/s) log n) = O(log³ n). Using HeapSort or MergeSort to sort each of the X_i's (without employing recursion on any of them), the total cost of sorting the X_i's is

    Σ_{i=1}^{s+1} O(|X_i| log |X_i|) = max_{1 ≤ j ≤ s+1} {log |X_j|} Σ_{i=1}^{s+1} O(|X_i|)

Since each |X_i| is O(log³ n), the cost of sorting the s + 1 parts is O(n log log n) = o(n log n). In summary, the number of comparisons made in this randomized sorting algorithm is n log n + o(n log n).

EXERCISES

1. Show how QuickSort sorts the following sequences of keys: 1, 1, 1, 1, 1, 1, 1 and 5, 5, 8, 3, 4, 3, 2.

2. QuickSort is not a stable sorting algorithm. However, if the key in a[i] is changed to a[i] * n + i − 1, then the new keys are all distinct. After sorting, which transformation will restore the keys to their original values?

3. In the function Partition, Algorithm 3.12, discuss the merits or demerits of altering the statement if (i < j) to if (i ≤ j). Simulate both algorithms on the data set (5, 4, 3, 2, 5, 8, 9) to see the difference in how they work.

4. Function QuickSort uses the output of function Partition, which returns the position where the partition element is placed. If equal keys are present, then two elements can be properly placed instead of one. Show


how you might change Partition so that QuickSort can take advantage of this situation.

5. In addition to Partition, there are many other ways to partition a set. Consider modifying Partition so that i is incremented while a[i] ≤ v instead of a[i] < v. Rewrite Partition making all of the necessary changes to it and then compare the new version with the original.

6. Compare the sorting methods MergeSort1 and QuickSort2 (Algorithms 3.10 and 3.14, respectively). Devise data sets that compare both the average- and worst-case times for these two algorithms.

7. (a) On which input data does the algorithm QuickSort exhibit its worst-case behavior?
   (b) Answer part (a) for the case in which the partitioning element is selected according to the median of three rule.

8. With MergeSort we included insertion sorting to eliminate the bookkeeping for small merges. How would you use this technique to improve QuickSort?

9. Take the iterative versions of MergeSort and QuickSort and compare them for the same-size data sets as used in Section 3.5.1.

10. Let S be a sample of s elements from X. If X is partitioned into s + 1 parts as in Algorithm 3.16, show that the size of each part is O((n/s) log n).

3.6 SELECTION

The Partition algorithm of Section 3.5 can also be used to obtain an efficient solution for the selection problem. In this problem, we are given n elements a[1 : n] and are required to determine the kth-smallest element. If the partitioning element v is positioned at a[j], then j − 1 elements are less than or equal to a[j] and n − j elements are greater than or equal to a[j]. Hence if k < j, then the kth-smallest element is in a[1 : j − 1]; if k = j, then a[j] is the kth-smallest element; and if k > j, then the kth-smallest element is the (k − j)th-smallest element in a[j + 1 : n]. The resulting algorithm is function Select1 (Algorithm 3.17). This function places the kth-smallest element into position a[k] and partitions the remaining elements so that a[i] ≤ a[k], 1 ≤ i < k, and a[i] ≥ a[k], k < i ≤ n.

Example 3.10 Let us simulate Select1 as it operates on the same array used to test Partition in Section 3.5. The array has the nine elements 65, 70,


Algorithm Select1(a, n, k)
// Selects the kth-smallest element in a[1 : n] and places it
// in the kth position of a[ ]. The remaining elements are
// rearranged such that a[m] ≤ a[k] for 1 ≤ m < k, and
// a[m] ≥ a[k] for k < m ≤ n.
{
    low := 1; up := n + 1;
    a[n + 1] := ∞; // a[n + 1] is set to infinity.
    repeat
    {
        // Each time the loop is entered,
        // 1 ≤ low ≤ k ≤ up ≤ n + 1.
        j := Partition(a, low, up);
        // j is such that a[j] is the jth-smallest value in a[ ].
        if (k = j) then return;
        else if (k < j) then up := j; // j is the new upper limit.
        else low := j + 1; // j + 1 is the new lower limit.
    } until (false);
}

Algorithm 3.17 Finding the kth-smallest element
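Select1 translates into a compact C "quickselect" routine. The sketch below is our own (0-based array, the partitioning loop written inline, and the kth-smallest value returned directly rather than only being moved into place):

/* Return the k-th smallest element of a[0..n-1], 1 <= k <= n.
   The array is rearranged in the process. */
int select1(int a[], int n, int k)
{
    int low = 0, up = n - 1;
    for (;;) {
        /* partition a[low..up] around v = a[low] (Hoare's scheme) */
        int v = a[low], i = low, j = up + 1;
        for (;;) {
            do { i++; } while (i <= up && a[i] < v);
            do { j--; } while (a[j] > v);
            if (i >= j) break;
            { int t = a[i]; a[i] = a[j]; a[j] = t; }
        }
        a[low] = a[j]; a[j] = v;
        if (j == k - 1) return a[j];       /* pivot landed in position k */
        else if (j > k - 1) up = j - 1;    /* k-th smallest is to the left */
        else low = j + 1;                  /* k-th smallest is to the right */
    }
}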

75, 80, 85, 60, 55, 50, and 45, with a[10] = ∞. If k = 5, then the first call of Partition will be sufficient since 65 is placed into a[5]. Instead, assume that we are looking for the seventh-smallest element of a, that is, k = 7. The next invocation of Partition is Partition(6, 10).

    a:  (5)  (6)  (7)  (8)  (9)  (10)
         65   85   80   75   70   +∞
         65   70   80   75   85   +∞

This last call of Partition has uncovered the ninth-smallest element of a. The next invocation is Partition(6, 9).

    a:  (5)  (6)  (7)  (8)  (9)  (10)
         65   70   80   75   85   +∞
         65   70   80   75   85   +∞

This time, the sixth element has been found. Since k ≠ j, another call to Partition is made, Partition(7, 9).


    a:  (5)  (6)  (7)  (8)  (9)  (10)
         65   70   80   75   85   +∞
         65   70   75   80   85   +∞

Now 80 is the partition value and is correctly placed at a[8]. However, Select1 has still not found the seventh-smallest element. It needs one more call to Partition, which is Partition(7, 8). This performs only an interchange between a[7] and a[8] and returns, having found the correct value. □

In analyzing Select1, we make the same assumptions that were made for QuickSort:

1. The n elements are distinct.

2. The input distribution is such that the partition element can be the ith-smallest element of a[m : p − 1] with an equal probability for each i, 1 ≤ i ≤ p − m.

Partition requires O(p − m) time. On each successive call to Partition, either m increases by at least one or j decreases by at least one. Initially m = 1 and j = n + 1. Hence, at most n calls to Partition can be made. Thus, the worst-case complexity of Select1 is O(n²). The time is Ω(n²), for example, when the input a[1 : n] is such that the partitioning element on the ith call to Partition is the ith-smallest element and k = n. In this case, m increases by one following each call to Partition and j remains unchanged. Hence, n calls are made for a total cost of O(Σ_{1 ≤ i ≤ n} i) = O(n²). The average computing time of Select1 is, however, only O(n). Before proving this fact, we specify more precisely what we mean by the average time.

Let T_A^k(n) be the average time to find the kth-smallest element in a[1 : n]. This average is taken over all n! different permutations of n distinct elements. Now define T_A(n) and R(n) as follows:

    T_A(n) = (1/n) Σ_{1 ≤ k ≤ n} T_A^k(n)

and

    R(n) = max_k {T_A^k(n)}

T_A(n) is the average computing time of Select1. It is easy to see that T_A(n) ≤ R(n). We are now ready to show that T_A(n) = O(n).

Theorem 3.3 The average computing time T_A(n) of Select1 is O(n).


Proof: On the first call to Partition, the partitioning element v is the ith-smallest element with probability 1/n, 1 ≤ i ≤ n (this follows from the assumption on the input distribution). The time required by Partition and the if statement in Select1 is O(n). Hence, there is a constant c, c > 0, such that

    T_A^k(n) ≤ cn + (1/n) [ Σ_{1 ≤ i < k} T_A^{k−i}(n − i) + Σ_{k < i ≤ n} T_A^k(i − 1) ],   n ≥ 2

So,

    R(n) ≤ cn + (1/n) max_k { Σ_{1 ≤ i < k} R(n − i) + Σ_{k < i ≤ n} R(i − 1) }

    R(n) ≤ cn + (1/n) max_k { Σ_{i=n−k+1}^{n−1} R(i) + Σ_{i=k}^{n−1} R(i) },   n ≥ 2        (3.8)

We assume that c is chosen such that R(1) ≤ c and show, by induction on n, that R(n) ≤ 4cn.

Induction Base: For n = 2, (3.8) gives

    R(n) ≤ 2c + (1/2) max {R(1), R(1)} ≤ 2.5c ≤ 4cn

Induction Hypothesis: Assume R(n) ≤ 4cn for all n, 2 ≤ n < m.

Induction Step: For n = m, (3.8) gives

    R(m) ≤ cm + (1/m) max_k { Σ_{i=m−k+1}^{m−1} R(i) + Σ_{i=k}^{m−1} R(i) }

Since we know that R(n) is a nondecreasing function of n, it follows that

    Σ_{i=m−k+1}^{m−1} R(i) + Σ_{i=k}^{m−1} R(i)

is maximized if k = m/2 when m is even and k = (m + 1)/2 when m is odd. Thus, if m is even, we obtain

    R(m) ≤ cm + (2/m) Σ_{i=m/2}^{m−1} R(i)
         ≤ cm + (8c/m) Σ_{i=m/2}^{m−1} i
         ≤ 4cm

If m is odd,

    R(m) ≤ cm + (2/m) Σ_{i=(m+1)/2}^{m−1} R(i)
         ≤ cm + (8c/m) Σ_{i=(m+1)/2}^{m−1} i
         ≤ 4cm

Since T_A(n) ≤ R(n), it follows that T_A(n) ≤ 4cn, and so T_A(n) is O(n). □

The space needed by Select1 is O(1).

Algorithm 3.15 is a randomized version of QuickSort in which the partition element is chosen from the array elements randomly with equal probability. The same technique can be applied to Select1 and the partition element can be chosen to be a random array element. The resulting randomized Las Vegas algorithm (call it RSelect) has an expected time of O(n) (where the expectation is over the space of randomizer outputs) on any input. The proof of this expected time is the same as in Theorem 3.3.

3.6.1 A Worst-Case Optimal Algorithm

By choosing the partitioning element v more carefully, we can obtain a selection algorithm with worst-case complexity O(n). To obtain such an algorithm, v must be chosen so that at least some fraction of the elements is smaller than v and at least some (other) fraction of elements is greater than v. Such a selection of v can be made using the median of medians (mm) rule. In this rule the n elements are divided into ⌊n/r⌋ groups of r elements each (for some r, r > 1). The remaining n − r⌊n/r⌋ elements are not used. The median m_i of each of these ⌊n/r⌋ groups is found. Then, the median mm of the m_i's, 1 ≤ i ≤ ⌊n/r⌋, is found. The median mm is used as the partitioning element. Figure 3.5 illustrates the m_i's and mm when n = 35 and r = 7. The five groups of elements are B_i, 1 ≤ i ≤ 5. The seven elements in each group have been arranged into nondecreasing order down the column. The middle elements are the m_i's. The columns have been arranged in nondecreasing order of m_i. Hence, the m_i corresponding to column 3 is mm.

Since the median of r elements is the ⌈r/2⌉th-smallest element, it follows (see Figure 3.5) that at least ⌈⌊n/r⌋/2⌉ of the m_i's are less than or equal to mm and at least ⌊n/r⌋ − ⌈⌊n/r⌋/2⌉ + 1 ≥ ⌈⌊n/r⌋/2⌉ of the m_i's are greater than or equal to mm. Hence, at least ⌈r/2⌉⌈⌊n/r⌋/2⌉ elements are less than



[Figure 3.5 shows five columns B1, B2, B3, B4, B5 of seven elements each, arranged in nondecreasing order down each column and ordered left to right by their medians m_i; mm is the median of column 3, the elements above and to the left of it are ≤ mm, and those below and to the right are ≥ mm.]

Figure 3.5 The median of medians when r = 7, n = 35

When r = 5, this quantity is at least 1.5⌊n/5⌋. Thus, if we use the median of medians rule with r = 5 to select v = mm, we are assured that at least 1.5⌊n/5⌋ elements will be greater than or equal to v. This in turn implies that at most n − 1.5⌊n/5⌋ ≤ .7n + 1.2 elements are less than v. Also, at most .7n + 1.2 elements are greater than v. Thus, the median of medians rule satisfies our earlier requirement on v.

The algorithm to select the kth-smallest element uses the median of medians rule to determine a partitioning element. This element is computed by a recursive application of the selection algorithm. A high-level description of the new selection algorithm appears as Select2 (Algorithm 3.18). Select2 can now be analyzed for any given r. First, let us consider the case in which r = 5 and all elements in a[ ] are distinct. Let T(n) be the worst-case time requirement of Select2 when invoked with up − low + 1 = n. Lines 4 to 9 and 11 to 12 require at most O(n) time (note that since r = 5 is fixed, each m[i] (lines 8 and 9) can be found in O(1) time). The time for line 10 is T(n/5). Let S and R, respectively, denote the elements a[low : j − 1] and a[j + 1 : up]. We see that |S| and |R| are at most .7n + 1.2, which is no more than 3n/4 for n ≥ 24. So, the time for lines 13 to 16 is at most T(3n/4) when n ≥ 24. Hence, for n ≥ 24, we obtain


1   Algorithm Select2(a, k, low, up)
2   // Find the kth-smallest in a[low : up].
3   {
4       n := up − low + 1;
5       if (n < r) then sort a[low : up] and return the kth element;
6       Divide a[low : up] into n/r subsets of size r each;
7       Ignore excess elements;
8       Let m[i], 1 ≤ i ≤ (n/r), be the set of medians of
9           the above n/r subsets.
10      v := Select2(m, ⌈(n/r)/2⌉, 1, n/r);
11      Partition a[low : up] using v as the partition element;
12      Assume that v is at position j;
13      if (k = (j − low + 1)) then return v;
14      else if (k < (j − low + 1)) then
15          return Select2(a, k, low, j − 1);
16      else return Select2(a, k − (j − low + 1), j + 1, up);
17  }

Algorithm 3.18 Selection pseudocode using the median of medians rule

T(n) ≤ T(n/5) + T(3n/4) + cn        (3.9)

where c is chosen sufficiently large that

T(n) ≤ cn for n < 24

A proof by induction easily establishes that T(n) ≤ 20cn for n ≥ 1. Algorithm Select2 with r = 5 is a linear time algorithm for the selection problem on distinct elements! The exercises examine other values of r that also yield this behavior. Let us now see what happens when the elements of a[ ] are not all distinct. In this case, following a use of Partition (line 11), the size of S or R may be more than .7n + 1.2 as some elements equal to v may appear in both S and R. One way to handle the situation is to partition a[ ] into three sets U, S, and R, such that U contains all elements equal to v, S has all elements smaller than v, and R has the remainder. Lines 11 to 16 become:

    Partition a[ ] into U, S, and R, as above.
    if (|S| ≥ k) then return Select2(a, k, low, low + |S| − 1);
    else if ((|S| + |U|) ≥ k) then return v;
    else return Select2(a, k − |S| − |U|, low + |S| + |U|, up);
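One way to realize the three-way split of a[ ] into S, U, and R is a single Dutch-national-flag style pass. The following C++ fragment is a minimal sketch of this idea; the name partition3 and the returned index pair are choices made for this illustration, not the book's.

    #include <utility>
    #include <vector>

    // Rearrange a[low..up] so that elements < v come first (S), then elements == v (U),
    // and finally elements > v (R). Returns the index range occupied by U.
    std::pair<int,int> partition3(std::vector<int>& a, int low, int up, int v) {
        int lt = low, i = low, gt = up;
        while (i <= gt) {
            if (a[i] < v)       std::swap(a[i++], a[lt++]);
            else if (a[i] > v)  std::swap(a[i], a[gt--]);
            else                i++;
        }
        return {lt, gt};   // |S| = lt - low, |U| = gt - lt + 1
    }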


When this is done, the recurrence (3.9) is still valid as |S| and |R| are ≤ .7n + 1.2. Hence, the new Select2 will be of linear complexity even when elements are not distinct.

Another way to handle the case of nondistinct elements is to use a different r. To see why a different r is needed, let us analyze Select2 with r = 5 and nondistinct elements. Consider the case when .7n + 1.2 elements are less than v and the remaining elements are equal to v. An examination of Partition reveals that at most half the remaining elements may be in S. We can verify that this is the worst case. Hence, |S| ≤ .7n + 1.2 + (.3n − 1.2)/2 = .85n + .6. Similarly, |R| ≤ .85n + .6. Since the total number of elements involved in the two recursive calls (in lines 10 and 15 or 16) is now 1.05n + .6 > n, the complexity of Select2 is not O(n). If we try r = 9, then at least 2.5⌊n/9⌋ elements will be less than or equal to v and at least this many will be greater than or equal to v. Hence, the size of S and R will be at most n − 2.5⌊n/9⌋ + 1/2(2.5⌊n/9⌋) = n − 1.25⌊n/9⌋ ≤ 31n/36 + 1.25 ≤ 63n/72 for n ≥ 90. Hence, we obtain the recurrence

T(n) ≤ T(n/9) + T(63n/72) + c₁n     for n ≥ 90
T(n) ≤ c₁n                          for n < 90

where c₁ is a suitable constant. An inductive argument shows that T(n) ≤ 72c₁n, n ≥ 1. Other suitable values of r are obtained in the exercises.

As far as the additionalspaceneededby Select2is concerned,we seethat spaceis neededfor the recursionstack. The recursivecallfrom line15or 16is easily eliminatedas this callis the last statementexecutedinSelect2.Hence,stackspaceis neededonly for the recursionfrom line 10.The maximumdepth of recursionis logn. Therecursionstackshouldbecapableof handlingthis depth.In addition to this stackspace,spaceisneededonly for somesimplevariables.

3.6.2 Implementation of Select2

Before attempting to write a pseudocode algorithm implementing Select2, we need to decide how the median of a set of size r is to be found and where we are going to store the ⌊n/r⌋ medians of lines 8 and 9. Since we expect to be using a small r (say r = 5 or 9), an efficient way to find the median of r elements is to sort them using InsertionSort(a, i, j). This algorithm is a modification of Algorithm 3.9 to sort a[i : j]. The median is now the middle element in a[i : j]. A convenient place to store these medians is at the front of the array. Thus, if we are finding the kth-smallest element in a[low : up], then the elements can be rearranged so that the medians are a[low], a[low + 1], a[low + 2], and so on. This makes it easy to implement line 10 as a selection on consecutive elements of a[ ]. Function Select2 (Algorithm 3.19) results from the above discussion and the replacement of the recursive calls of lines 15 and 16 by equivalent code to restart the algorithm.


1   Algorithm Select2(a, k, low, up)
2   // Return i such that a[i] is the kth-smallest element in
3   // a[low : up]; r is a global variable as described in the text.
4   {
5       repeat
6       {
7           n := up − low + 1; // Number of elements
8           if (n < r) then
9           {
10              InsertionSort(a, low, up);
11              return low + k − 1;
12          }
13          for i := 1 to ⌊n/r⌋ do
14          {
15              InsertionSort(a, low + (i − 1) * r, low + i * r − 1);
16              // Collect medians in the front part of a[low : up].
17              Interchange(a, low + i − 1,
18                  low + (i − 1) * r + ⌈r/2⌉ − 1);
19          }
20          j := Select2(a, ⌈⌊n/r⌋/2⌉, low, low + ⌊n/r⌋ − 1); // mm
21          Interchange(a, low, j);
22          j := Partition(a, low, up + 1);
23          if (k = (j − low + 1)) then return j;
24          else if (k < (j − low + 1)) then up := j − 1;
25          else
26          {
27              k := k − (j − low + 1); low := j + 1;
28          }
29      } until (false);
30  }

Algorithm 3.19 Algorithm Select2


An alternative to moving the medians to the front of the array a[low : up] (as in the Interchange statement within the for loop) is to delete this statement and use the fact that the medians are located at low + (i − 1)r + ⌈r/2⌉ − 1, 1 ≤ i ≤ ⌊n/r⌋. Hence, Select2, Partition, and InsertionSort need to be rewritten to work on arrays for which the interelement distance is b, b ≥ 1. At the start of the algorithm, all elements are a distance of one apart, i.e., a[1], a[2], ..., a[n]. On the first call of Select2 we wish to use only elements that are r apart starting with a[⌈r/2⌉]. At the next level of recursion, the elements will be r² apart and so on. This idea is developed further in the exercises. We refer to arrays with an interelement distance of b as b-spaced arrays.
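As a small illustration of the indexing involved, here is a hedged C++ sketch of an insertion sort restricted to a b-spaced subarray; the function name and parameters are assumptions made for this example only.

    #include <vector>

    // Insertion sort on the b-spaced subarray a[first], a[first+b], a[first+2b], ..., a[last].
    void insertionSortSpaced(std::vector<int>& a, int first, int last, int b) {
        for (int i = first + b; i <= last; i += b) {
            int x = a[i], j = i - b;
            while (j >= first && a[j] > x) { a[j + b] = a[j]; j -= b; }
            a[j + b] = x;
        }
    }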

Algorithms Selectl(Algorithm 3.17)and Select2(Algorithm 3.19)wereimplementedand run on a SUN Sparcstation10/30.Table3.8summarizesthe experimentalresultsobtained.Timesshown are in milliseconds.Thesealgorithmswere testedon randomintegersin the range [0, 1000]and theaverageexecutiontimes(over 500input sets) were computed.Selectloutperforms Select2on randominputs. But if the input is already sorted(ornearly sorted),Select2can beexpectedtobesuperiorto Selectl.

    n       Select1   Select2   |     n       Select1   Select2
  1,000       7.42     49.54    |   6,000      70.88    341.34
  2,000      23.50    104.02    |   7,000      83.14    414.06
  3,000      30.44    174.54    |   8,000      95.00    476.98
  4,000      39.24    233.56    |   9,000     101.32    532.30
  5,000      52.36    288.64    |  10,000     111.92    604.40

Table 3.8 Comparison of Select1 and Select2 on random inputs

EXERCISES

1. Rewrite Select2, Partition, and InsertionSort using b-spaced arrays.

2. (a) Assume that Select2is to beusedonly when all elementsin a aredistinct.Which of the following valuesof r guarantee0(n)worst-caseperformance:r = 3,5,7,9,and 11?Proveyour answers.

(b) Do you expectthe computingtimeof Select2to increaseordecreaseif a larger(but stilleligible)choicefor r is made?Why?


3. Do Exercise2 for the case in which a is not restrictedto distinctelements.Let r = 7, 9,11,13,and 15in part (a).

4. Section3.6describesan alternativeway to handlethe situationwhena[ } is not restrictedto distinctelements.Usingthe partitioningelement v, a[] is divided into threesubsets.Write algorithmscorresponding

to Selectland Select2usingthis idea.Usingyour new version ofSelect2show that the worst-casecomputingtimeis 0(n)even whenr = 5.

5.Determineoptimalr values for worst-caseand averageperformancesof function Select2.

6. [Shamos] Let x[1 : n] and y[1 : n] contain two sets of integers, each sorted in nondecreasing order. Write an algorithm that finds the median of the 2n combined elements. What is the time complexity of your algorithm? (Hint: Use binary search.)

7. Let S be a (not necessarily sorted) sequence of n keys. A key k in S is said to be an approximate median of S if |{k' ∈ S : k' < k}| ≥ n/4 and |{k' ∈ S : k' > k}| ≥ n/4. Devise an O(n) time algorithm to find all the approximate medians of S.

8. Input are a sequence S of n distinct keys, not necessarily in sorted order, and two integers m1 and m2 (1 ≤ m1, m2 ≤ n). For any x in S, we define the rank of x in S to be |{k ∈ S : k < x}|. Show how to output all the keys of S whose ranks fall in the interval [m1, m2] in O(n) time.

9. The kth quantiles of an n-element set are the k − 1 elements from the set that divide the sorted set into k equal-sized sets. Give an O(n log k) time algorithm to list the kth quantiles of a set.

10. Input is a (not necessarily sorted) sequence S = k1, k2, ..., kn of n arbitrary numbers. Consider the collection C of n² numbers of the form min{ki, kj}, for 1 ≤ i, j ≤ n. Present an O(n)-time and O(n)-space algorithm to find the median of C.

11. Given two vectors X = (x1, ..., xn) and Y = (y1, ..., yn), X < Y if there exists an i, 1 ≤ i ≤ n, such that xj = yj for 1 ≤ j < i and xi < yi. Given m vectors each of size n, write an algorithm that determines the minimum vector. Analyze the time complexity of your algorithm.

12.Presentan 0(1)timeMonte Carloalgorithmto find the medianofan array of n numbers. The answer output shouldbe correctwithprobability > -k


Input is an array a[ ] of n numbers.Presentan O(logn) timeMonteCarloalgorithmto output any memberof a[ ] that is greaterthan orequal to the median.Theanswer shouldbe correctwith highprobability. Providea probability analysis.

Given a setJofn numbers,how will you find an elementof X whoserank in X is at most jt^t, using a Monte Carloalgorithm? Your

algorithmshouldrun in time0(f(n)logn). Prove that the outputwill becorrectwith highprobability.

In addition to Select1 and Select2, we can think of at least two more selection algorithms. The first of these is very straightforward and appears as Algorithm 3.20 (Algorithm Select3). The time complexity of Select3 is

O(n min{k, n − k + 1})

Hence, it is very fast for values of k close to 1 or close to n. In the worst case, its complexity is O(n²). Its average complexity is also O(n²). Another selection algorithm proceeds by first sorting the n elements into nondecreasing order and then picking out the kth element. A complete sort can be avoided by using a minheap. Now, only k elements need to be removed from the heap. The time to set up the heap is O(n). An additional O(k log n) time is needed to make k deletions. The total complexity is O(n + k log n). This basic algorithm can be improved further by using a maxheap when k > n/2 and deleting n − k + 1 elements. The complexity is now O(n + log n min{k, n − k + 1}). Call the resulting algorithm Select4. Now that we have four plausible selection algorithms, we would like to know which is best. On the basis of the asymptotic analyses of the four selection algorithms, we can make the following qualitative statements about our expectations on the relative performance of the four algorithms.

• Because of the overhead involved in Select1, Select2, and Select4 and the relative simplicity of Select3, Select3 will be fastest both on the average and in the worst case for small values of n. It will also be fastest for large n and very small or very large k, for example, k = 1, 2, n, or n − 1.

• For larger values of n, Select1 will have the best behavior on the average.

• As far as worst-case behavior is concerned, Select2 will outperform the others when n is suitably large. However, there will probably be a range of n for which Select4 will be faster than both Select2 and Select3. We expect this because of the relatively large


1   Algorithm Select3(a, n, k)
2   // Rearrange a[ ] such that a[k] is the kth smallest.
3   {
4       if (k ≤ ⌊n/2⌋) then
5           for i := 1 to k do
6           {
7               q := i; min := a[i];
8               for j := i + 1 to n do
9                   if (a[j] < min) then
10                  {
11                      q := j; min := a[j];
12                  }
13              Interchange(a, q, i);
14          }
15      else
16          for i := n to k step −1 do
17          {
18              q := i; max := a[i];
19              for j := (i − 1) to 1 step −1 do
20                  if (a[j] > max) then
21                  {
22                      q := j; max := a[j];
23                  }
24              Interchange(a, q, i);
25          }
26  }

Algorithm 3.20 Straightforward selection algorithm

overhead in Select2 (i.e., the constant term in O(n) is relatively large).

• As a result of the above assertions, it is desirable to obtain composite algorithms for good average and worst-case performances. The composite algorithm for good worst-case performance will have the form of function Select2 but will include the following after the first if statement:

    if (n < c1) then return Select3(a, m, p, k);
    else if (n < c2) then return Select4(a, m, p, k);

Since the overheads in Select1 and Select4 are about the same, the constants associated with the average computing times will be about


the same. Hence, Select1 may always be better than Select4 or there may be a small c3 such that Select4 is better than Select1 for n < c3. In any case, we expect there is a c4, c4 > 0, such that Select3 is faster than Select1 on the average for n < c4.

To verify the preceding statements and determine c1, c2, c3, and c4, it is necessary to program the four algorithms in some programming language and run the four corresponding programs on a computer. Once the programs have been written, test data are needed to determine average and worst-case computing times. So, let us now say something about the data needed to obtain computing times from which c_i, 1 ≤ i ≤ 4, can be determined. Since we would also like information

regardingthe averageand worst-casecomputingtimesof the resultingcompositealgorithms,we need test data for this too.We limit ourtesting to the caseof distinctelements.To obtain worst-casecomputingtimesfor Selectl,we changethealgorithm slightly. This changewill not affect its worst-casecomputingtimebut will enableus to use a rathersimpledata set todeterminethis timefor various values of n. We dispensewith the randomselection rulefor Partition and insteadusea[m] as the partitioningelement.It is easy to see that the worst-casetime is obtainedwith a[i] = i,1 < i < n, and k = n. As far as the averagetimefor any given nis concerned,it is not easy to arrive at onedataset and a k thatexhibits this time. On the otherhand, trying out all n!different inputpermutationsand k = 1,2,...,n for eachof theseis not a feasiblewayto find the average.An approximationto the averagecomputingtimecan be obtainedby trying out a few (say ten) randompermutationsof the numbers1,2,..., n and for eachof theseusinga few (say five)randomvalues of k. Theaverageof the timesobtainedcan be usedas an approximationto the averagecomputingtime.Of course,usingmorepermutationsand morek values resultsin a betterapproximation. However,the numberof permutationsand k values we can use islimitedby the amountof computationalresources(in termsof time)we have available.ForSelect2,the averagetimecan be obtainedin the sameway as forSelectl.Fortheworst-casetimewe caneithertry to figure out an inputpermutationfor which the numberof elementslessthan the medianofmediansis always as largeas possibleand then use k = 1.A simplerapproachis to find justan approximationto the worst-casetime.Thiscan be done by taking the maxof the computingtimesfor all thetestsused to obtainthe averagecomputingtime.Sincethe computingtimesfor Select2vary with r, it is first necessaryto determinean rthat yields optimumbehavior.Note that the r'sfor optimumaverageand worst-casebehaviorsmay be different.


We can verify that the worst-case data for Select3 are a[i] = n + 1 − i, for 1 ≤ i ≤ n, and k = ⌊n/2⌋. The computing time for Select3 is relatively insensitive to the input permutation. This permutation affects only the number of times the second if statement of Algorithm 3.20 is executed. On the average, this will be done about half the time. This can be achieved by using a[i] = n + 1 − i, 1 ≤ i ≤ n/2, and a[i] = n + 1, n/2 < i ≤ n. The k value needed to obtain the average computing time is readily seen to be n/4.

(a) What testdatawould you use to determineworst-caseandaverage

timesfor Select4?(b) Usethe ideasabove to obtain a tableof worst-caseand average

timesfor Selectl,Select2,Select3,and Select4.

16.ProgramSelectland Select3.Determinewhen algorithmSelectlbecomesbetterthan Select3on the averageand alsowhen Select2betterthan Select3for worst-caseperformance.

17.[Project]Programthe algorithmsof Exercise4 as well as Select3andSelect4.Carry out a completetestalongthe linesdiscussedin Exercise15.Write a detailedreporttogetherwith graphsexplainingthe datasets,test strategies,and determinationof ci,...,C4.Write the finalcompositealgorithmsand give tablesof computingtimesfor thesealgorithms.

3.7 STRASSEN'S MATRIX MULTIPLICATION

Let A and B be two n × n matrices. The product matrix C = AB is also an n × n matrix whose i, jth element is formed by taking the elements in the ith row of A and the jth column of B and multiplying them to get

C(i, j) = Σ_{1≤k≤n} A(i, k) B(k, j)        (3.10)

for all i and j between 1 and n. To compute C(i, j) using this formula, we need n multiplications. As the matrix C has n² elements, the time for the resulting matrix multiplication algorithm, which we refer to as the conventional method, is Θ(n³).
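For reference, the conventional method is just the triple loop implied by (3.10); the following C++ fragment is a minimal sketch of it (the Matrix alias and the function name are ours, not the book's).

    #include <vector>
    using Matrix = std::vector<std::vector<double>>;

    // Conventional Theta(n^3) matrix multiplication following formula (3.10).
    Matrix multiply(const Matrix& A, const Matrix& B) {
        int n = A.size();
        Matrix C(n, std::vector<double>(n, 0.0));
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
        return C;
    }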

The divide-and-conquer strategy suggests another way to compute the product of two n × n matrices. For simplicity we assume that n is a power of 2, that is, that there exists a nonnegative integer k such that n = 2^k. In case n is not a power of two, then enough rows and columns of zeros can be added to both A and B so that the resulting dimensions are a power of two


(see the exercises for more on this subject). Imagine that A and B are each partitioned into four square submatrices, each submatrix having dimensions n/2 × n/2. Then the product AB can be computed by using the above formula for the product of 2 × 2 matrices: if AB is

[ A11  A12 ] [ B11  B12 ]   =   [ C11  C12 ]        (3.11)
[ A21  A22 ] [ B21  B22 ]       [ C21  C22 ]

then

C11 = A11 B11 + A12 B21
C12 = A11 B12 + A12 B22
C21 = A21 B11 + A22 B21
C22 = A21 B12 + A22 B22        (3.12)

If n = 2, then formulas (3.11)and (3.12)arecomputedusingamultiplication operationfor the elementsof A and B.Theseelementsaretypicallyfloating point numbers.Forn > 2, the elementsof C can be computedusingmatrix multiplicationand additionoperationsappliedto matricesofsizen/2x n/2.Sincen is a power of 2,thesematrixproductscanberecursively computedby the samealgorithmwe areusingfor the nxncase.Thisalgorithmwill continueapplying itselfto smaller-sizedsubmatricesuntil nbecomessuitably small(n = 2) sothat the productis computeddirectly.

To compute AB using (3.12), we need to perform eight multiplications of n/2 × n/2 matrices and four additions of n/2 × n/2 matrices. Since two n/2 × n/2 matrices can be added in time cn² for some constant c, the overall computing time T(n) of the resulting divide-and-conquer algorithm is given by the recurrence

T(n) = b                    for n ≤ 2
T(n) = 8T(n/2) + cn²        for n > 2

where b and c are constants.

This recurrence can be solved in the same way as earlier recurrences to obtain T(n) = O(n³). Hence no improvement over the conventional method has been made. Since matrix multiplications are more expensive than matrix additions (O(n³) versus O(n²)), we can attempt to reformulate the equations for C_ij so as to have fewer multiplications and possibly more additions. Volker Strassen has discovered a way to compute the C_ij's of (3.12) using only 7 multiplications and 18 additions or subtractions. His method involves first computing the seven n/2 × n/2 matrices P, Q, R, S, T, U, and V as in (3.13). Then the C_ij's are computed using the formulas in (3.14). As can be seen, P, Q, R, S, T, U, and V can be computed using 7 matrix multiplications and 10 matrix additions or subtractions. The C_ij's require an additional 8 additions or subtractions.


P = (A11 + A22)(B11 + B22)
Q = (A21 + A22) B11
R = A11 (B12 − B22)
S = A22 (B21 − B11)
T = (A11 + A12) B22
U = (A21 − A11)(B11 + B12)
V = (A12 − A22)(B21 + B22)        (3.13)

C11 = P + S − T + V
C12 = R + T
C21 = Q + S
C22 = P + R − Q + U               (3.14)

The resulting recurrence relation for T(n) is

T(n) = b                    for n ≤ 2
T(n) = 7T(n/2) + an²        for n > 2

where a and b are constants. Working with this formula, we get

T(n) = an²[1 + 7/4 + (7/4)² + ··· + (7/4)^{k−1}] + 7^k T(1)
     ≤ cn² (7/4)^{log₂ n} + 7^{log₂ n},   c a constant
     = cn^{log₂ 4 + log₂ 7 − log₂ 4} + n^{log₂ 7}
     = O(n^{log₂ 7}) ≈ O(n^{2.81})
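The following C++ fragment is a hedged sketch of the recursion defined by (3.13) and (3.14) for n a power of 2; the helper names add, sub, and block, and the vector-of-vectors representation, are choices made for this illustration only, and no padding of other input sizes is attempted.

    #include <vector>
    using Matrix = std::vector<std::vector<double>>;

    // Element-wise add/subtract of equal-size square matrices (helpers for this sketch).
    static Matrix add(const Matrix& A, const Matrix& B) {
        int n = A.size(); Matrix C(n, std::vector<double>(n));
        for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) C[i][j] = A[i][j] + B[i][j];
        return C;
    }
    static Matrix sub(const Matrix& A, const Matrix& B) {
        int n = A.size(); Matrix C(n, std::vector<double>(n));
        for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) C[i][j] = A[i][j] - B[i][j];
        return C;
    }

    // Strassen's method for n x n matrices, n a power of 2, following (3.13) and (3.14).
    Matrix strassen(const Matrix& A, const Matrix& B) {
        int n = A.size();
        if (n <= 2) {                               // small case: conventional product
            Matrix C(n, std::vector<double>(n, 0.0));
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    for (int j = 0; j < n; j++) C[i][j] += A[i][k] * B[k][j];
            return C;
        }
        int h = n / 2;
        auto block = [&](const Matrix& M, int r, int c) {   // extract an h x h submatrix
            Matrix S(h, std::vector<double>(h));
            for (int i = 0; i < h; i++) for (int j = 0; j < h; j++) S[i][j] = M[r + i][c + j];
            return S;
        };
        Matrix A11 = block(A,0,0), A12 = block(A,0,h), A21 = block(A,h,0), A22 = block(A,h,h);
        Matrix B11 = block(B,0,0), B12 = block(B,0,h), B21 = block(B,h,0), B22 = block(B,h,h);
        Matrix P = strassen(add(A11,A22), add(B11,B22));
        Matrix Q = strassen(add(A21,A22), B11);
        Matrix R = strassen(A11, sub(B12,B22));
        Matrix S = strassen(A22, sub(B21,B11));
        Matrix T = strassen(add(A11,A12), B22);
        Matrix U = strassen(sub(A21,A11), add(B11,B12));
        Matrix V = strassen(sub(A12,A22), add(B21,B22));
        Matrix C11 = add(sub(add(P,S),T),V), C12 = add(R,T);
        Matrix C21 = add(Q,S), C22 = add(sub(add(P,R),Q),U);
        Matrix C(n, std::vector<double>(n));
        for (int i = 0; i < h; i++)                 // assemble the four quadrants
            for (int j = 0; j < h; j++) {
                C[i][j] = C11[i][j];      C[i][j+h] = C12[i][j];
                C[i+h][j] = C21[i][j];    C[i+h][j+h] = C22[i][j];
            }
        return C;
    }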

EXERCISES

1. Verify by hand that Equations 3.13 and 3.14 yield the correct values for C11, C12, C21, and C22.

2. Write an algorithm that multiplies two n × n matrices using O(n³) operations. Determine the precise number of multiplications, additions, and array element accesses.

3. If k is a nonnegative constant, then prove that the recurrence

T(n) = k                   for n = 1
T(n) = 3T(n/2) + kn        for n > 1

has the following solution (for n a power of 2):

T(n) = 3k n^{log₂ 3} − 2kn        (3.17)


Figure3.6Convexhull: an example

(1) obtain the verticesof the convexhull (theseverticesarealsocalledextreme points),and (2) obtain the verticesof the convexhull in someorder(clockwise,for example).

Here is a simple algorithm for obtaining the extreme points of a given set S of points in the plane. To check whether a particular point p ∈ S is extreme, look at each possible triplet of points and see whether p lies in the triangle formed by these three points. If p lies in any such triangle, it is not extreme; otherwise it is. Testing whether p lies in a given triangle can be done in O(1) time (using the methods described in Section 3.8.1). Since there are O(n³) possible triangles, it takes O(n³) time to determine whether a given point is an extreme point or not. Since there are n points, this algorithm runs in a total of O(n⁴) time.

Usingdivide-and-conquer,we can solve both versionsof the convexhullproblemin 0(nlogn)time.We developthreealgorithmsfor the convexhullin this section.The first has a worst-casetimeof 0(n2)whereasitsaverage

timeis 0(nlogn).This algorithmhas a divide-and-conquerstructuresimilarto that of QuickSort.The secondhas a worst-casetimecomplexityof 0(nlogn)and is not basedon divide-and-conquer.The third algorithmis basedon divide-and-conquerand has a timecomplexityof 0(nlogn)inthe worst case.Beforegiving further details,we digressto discusssomeprimitive geometricmethodsthat areused in the convexhull algorithms.

3.8.1 Some Geometric Primitives

Let A be an n × n matrix whose elements are {a_ij}, 1 ≤ i, j ≤ n. The ijth minor of A, denoted as A_ij, is defined to be the submatrix of A obtained by deleting the ith row and jth column. The determinant of A, denoted


det(A), is given by

det(A) = a11                                                                  for n = 1
det(A) = a11 det(A11) − a12 det(A12) + ··· + (−1)^{n+1} a1n det(A1n)          for n > 1

Consider the directed line segment (p1, p2) from some point p1 = (x1, y1) to some other point p2 = (x2, y2). If q = (x3, y3) is another point, we say q is to the left (right) of (p1, p2) if the angle p1 p2 q is a left (right) turn. [An angle is said to be a left (right) turn if it is less than or equal to (greater than or equal to) 180°.]

We can check whether q is to the left (right) of (p1, p2) by evaluating the determinant of the following matrix:

| x1  x2  x3 |
| y1  y2  y3 |
|  1   1   1 |

If this determinant is positive (negative), then q is to the left (right) of (p1, p2). If this determinant is zero, the three points are colinear. This test can be used, for example, to check whether a given point p is within a triangle formed by three points, say p1, p2, and p3 (in clockwise order). The point p is within the triangle iff p is to the right of the line segments (p1, p2), (p2, p3), and (p3, p1).

Also, for any three points (x1, y1), (x2, y2), and (x3, y3), the signed area formed by the corresponding triangle is given by one-half of the above determinant.

Let p1, p2, ..., pn be the vertices of the convex polygon Q in clockwise order. Let p be any other point. It is desired to check whether p lies in the interior of Q or outside. Consider a horizontal line h that extends from −∞ to ∞ and goes through p. There are two possibilities: (1) h does not intersect any of the edges of Q, (2) h intersects some of the edges of Q. If case (1) is true, then p is outside Q. In case (2), there can be at most two points of intersection. If h intersects Q at a single point, it is counted as two. Count the number of points of intersection that are to the left of p. If this number is even, then p is external to Q; otherwise it is internal to Q. This method of checking whether p is interior to Q takes O(n) time.
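As a small illustration of these primitives, here is a C++ sketch of the determinant (signed area) test and the resulting point-in-triangle check; the struct and function names are ours, not the book's, and the boundary is treated as inside.

    struct Point { double x, y; };

    // The 3 x 3 determinant described above (twice the signed area of the triangle).
    // Positive: the third point is to the left of (p1, p2); negative: to the right;
    // zero: the three points are colinear.
    double signedDet(Point p1, Point p2, Point q) {
        return p1.x * (p2.y - q.y) - p2.x * (p1.y - q.y) + q.x * (p1.y - p2.y);
    }

    // q lies in the clockwise triangle (a, b, c) iff it is to the right of (or on)
    // each of the directed segments (a, b), (b, c), and (c, a).
    bool inTriangle(Point a, Point b, Point c, Point q) {
        return signedDet(a, b, q) <= 0 && signedDet(b, c, q) <= 0 && signedDet(c, a, q) <= 0;
    }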

3.8.2 The QuickHull Algorithm

An algorithm that is similar to QuickSort can be devised to compute the convex hull of a set X of n points in the plane. This algorithm, called QuickHull, first identifies the two points (call them p1 and p2) of X with the smallest and largest x-coordinate values. Assume now that there are no ties. Later we see how to handle ties. Both p1 and p2 are extreme points and part of the convex hull. The set X is divided into X1 and X2 so that


X1 has all the points to the left of the line segment (p1, p2) and X2 has all the points to the right of (p1, p2). Both X1 and X2 include the two points p1 and p2. Then, the convex hulls of X1 and X2 (called the upper hull and lower hull, respectively) are computed using a divide-and-conquer algorithm called Hull. The union of these two convex hulls is the overall convex hull.

If there is more than one point with the smallest x-coordinate, let p1' and p1'' be the points from among these with the least and largest y-coordinates, respectively. Similarly define p2' and p2'' for the points with the largest x-coordinate values. Now X1 will be all the points to the left of (p1'', p2'') (including p1'' and p2'') and X2 will be all the points to the right of (p1', p2') (including p1' and p2'). In the rest of the discussion we assume for simplicity that there are no ties for p1 and p2. Appropriate modifications are needed in the event of ties.

We now describe how Hull computes the convex hull of X1. We determine a point of X1 that belongs to the convex hull of X1 and use it to partition the problem into two independent subproblems. Such a point is obtained by computing the area formed by p1, p, and p2 for each p in X1 and picking the one with the largest (absolute) area. Ties are broken by picking the point p for which the angle p p1 p2 is maximum. Let p3 be that point.

Now X1 is divided into two parts; the first part contains all the points of X1 that are to the left of (p1, p3) (including p1 and p3), and the second part contains all the points of X1 that are to the left of (p3, p2) (including p3 and p2) (see Figure 3.7). There cannot be any point of X1 that is to the left of both (p1, p3) and (p3, p2). Also, all the other points are interior points and can be dropped from future consideration. The convex hull of each part is computed recursively, and the two convex hulls are merged easily by placing one next to the other in the right order.

If there are m points in X1, we can identify the point of division p3 in time O(m). Partitioning X1 into two parts can also be done in O(m) time. Merging the two convex hulls can be done in time O(1). Let T(m) stand for the run time of Hull on a list of m points and let m1 and m2 denote the sizes of the two resultant parts. Note that m1 + m2 ≤ m. The recurrence relation for T(m) is T(m) = T(m1) + T(m2) + O(m), which is similar to the one for the run time of QuickSort. The worst-case run time is thus O(m²) on an input of m points. This happens when the partitioning at each level of recursion is highly uneven.

If the partitioningis nearly even at eachlevel of recursion,then the runtimewill equal0(mlogm)as in the caseof QuickSort.Thus the averagerun timeof QuickHull is 0(nlogn), on an input of sizen, underappropriateassumptionson the input distribution.


Figure3.7Identifying a point on the convexhull of Xi

3.8.3 Graham's Scan

If S is a set of points in the plane, Graham's scan algorithm identifies the point p from S with the lowest y-coordinate value (ties are broken by picking the leftmost among these). It then sorts the points of S according to the angle subtended by the points and p with the positive x-axis. Figure 3.8 gives an example. After having sorted the points, if we scan through the sorted list starting at p, every three successive points will form a left turn if all of these points lie on the hull. On the other hand if there are three successive points, say p1, p2, and p3, that form a right turn, then we can immediately eliminate p2 since it cannot lie on the convex hull. Notice that it will be an internal point because it lies within the triangle formed by p, p1, and p3.

We can eliminate all the interior points using the above procedure. Starting from p, we consider three successive points p1, p2, and p3 at a time. To begin with, p1 = p. If these points form a left turn, we move to the next point in the list (that is, we set p1 = p2, and so on). If these three points form a right turn, then p2 is deleted since it is an interior point. We move one point behind in the list by setting p1 equal to its predecessor. This process of scanning ends when we reach the point p again.

Example3.11In Figure3.8,the first threepointslookedat arep,1,and 2.Sincetheseform a left turn, we move to 1,2,and 3.Theseform a right turnand hence2 is deleted.Next,the threepointsp,1,and 3 are considered.Theseform a left turn and hencethe pointeris moved to point 1.Thepoints


Figure3.8Graham'sscanalgorithmsortsthe pointsfirst

1,3,and 4 alsoform a left turn, and the scanproceedsto 3,4,and 5 andthen to 4,5,and 6. Now point 5 gets deleted.The triplets3,4,6;4,6,7;and 6,7,8form left turns whereasthe next triplet7,8,9 forms a right turn.Therefore,8 getsdeletedand in the next round 7 alsogetseliminated.Thenext threetripletsexaminedare4,6,9;6,9,10;and 9,10,p,allof which areleft turns. Thefinal hull obtainedis p,1,3,4,6,9, and 10,which arepointson the hull in counterclockwise(ccw)order. \342\226\241

Thisscanprocessis given in Algorithm 3.21.In this algorithmthe setofpointsis realizedasa doubly linked list ptslist.FunctionScanruns in 0{n)timesincefor eachtripletexamined,eitherthe scanmoves onenodeaheador onepoint getsremoved.In the lattercase,the scanmoves onenodeback.Also note that for eachtriplet,the test as to whethera left or right turn isformed can be done in 0(1)time.FunctionArea computesthe signedareaformed by threepoints.Themajorwork in the algorithmis in sortingthepoints.Sincesortingtakes0(nlogn) time,the totaltimeof Graham'sscanalgorithmis 0(nlogn).

3.8.4 An O(n log n) Divide-and-Conquer Algorithm

In this section we present a simple divide-and-conquer algorithm, called DCHull, which also takes O(n log n) time and computes the convex hull in clockwise order.


point = record {
    float x; float y;
    point *prev; point *next;
}

1   Algorithm Scan(list)
2   // list is a pointer to the first node in the input list.
3   {
4       *p := list; *p1 := list;
5       repeat
6       {
7           p2 := (p1 -> next);
8           if ((p2 -> next) ≠ 0) then p3 := (p2 -> next);
9           else return; // End of the list
10          temp := Area((p1 -> x), (p1 -> y), (p2 -> x),
11                  (p2 -> y), (p3 -> x), (p3 -> y));
12          if (temp > 0.0) then p1 := (p1 -> next);
13          // If p1, p2, p3 form a left turn, move one point ahead;
14          // If not, delete p2 and move back.
15          else
16          {
17              (p1 -> next) := p3; (p3 -> prev) := p1; delete p2;
18              p1 := (p1 -> prev);
19          }
20      } until (false);
21  }

1   Algorithm ConvexHull(ptslist)
2   {
3       // ptslist is a pointer to the first item of the input list. Find
4       // the point p in ptslist of lowest y-coordinate. Sort the
5       // points according to the angle made with p and the x-axis.
6       Sort(ptslist); Scan(ptslist); PrintList(ptslist);
7   }

Algorithm 3.21 Graham's scan algorithm


Given a set X of n points, like that in the case of QuickHull, the problem is reduced to finding the upper hull and the lower hull separately and then putting them together. Since the computations of the upper and lower hulls are very similar, we restrict our discussion to computing the upper hull. The divide-and-conquer algorithm for computing the upper hull partitions X into two nearly equal halves. Partitioning is done according to the x-coordinate values of points using the median x-coordinate as the splitter (see Section 3.6 for a discussion on median finding). Upper hulls are recursively computed for the two halves. These two hulls are then merged by finding the line of tangent (i.e., a straight line connecting a point each from the two halves, such that all the points of X are on one side of the line) (see Figure 3.9).

Figure 3.9 Divide and conquer to compute the convex hull

To begin with, the points p1 and p2 are identified [where p1 (p2) is the point with the least (largest) x-coordinate value]. This can be done in O(n) time. Ties can be handled in exactly the same manner as in QuickHull. So, assume that there are no ties. All the points that are to the left of the line segment (p1, p2) are separated from those that are to the right. This separation also can be done in O(n) time. From here on, by "input" and "X" we mean all the points that are to the left of the line segment (p1, p2). Also let |X| = N.

are identified [wherep\\ (^2) is thepoint with the least(largest)x-coordinatevalue].Thiscan bedonein 0{n)time.Tiescan be handledin exactlythe samemanneras in QuickHull.So,assumethat thereare no ties.All the points that are to the left of theline segment(pi,P2)are separatedfrom thosethat are to the right. Thisseparationalsocan be done in 0{n)time. Fromhereon, by \"input\" and\"X\" we meanall the pointsthat are to the left of the linesegment(p\\,p2).Also let \\X\\=N.

Sort the input points according to their x-coordinate values. Sorting can be done in O(N log N) time. This sorting is done only once in the computation of the upper hull. Let q1, q2, ..., qN be the sorted order of these


points. Now partition the input into two equal halves with q1, q2, ..., q_{N/2} in the first half and q_{N/2+1}, q_{N/2+2}, ..., qN in the second half. The upper hull of each half is computed recursively. Let H1 and H2 be the upper hulls. Upper hulls are maintained as linked lists in clockwise order. We refer to the first element in the list as the leftmost point and the last element as the rightmost point.

The line of tangent is then found in O(log² N) time. If (u, v) is the line of tangent, then all the points of H1 that are to the right of u are dropped. Similarly, all the points that are to the left of v in H2 are dropped. The remaining part of H1, the line of tangent, and the remaining part of H2 form the upper hull of the given input set.

If T(N) is the run time of the above recursive algorithm for the upper hull on an input of N points, then we have

T(N) = 2T(N/2) + O(log² N)

which solves to T(N) = O(N). Thus the run time is dominated by the initial sorting step.

The only part of the algorithm that remains to be specified is how to find the line of tangent (u, v) in O(log² N) time. The way to find the tangent is to start from the middle point, call it p, of H1. Here the middle point refers to the middle element of the corresponding list. Find the tangent of p with H2. Let (p, q) be the tangent. Using (p, q), we can determine whether u is to the left of, equal to, or to the right of p in H1. A binary search in this fashion on the points of H1 reveals u. Use a similar procedure to isolate v.

Lemma 3.1 Let H1 and H2 be two upper hulls with at most m points each. If p is any point of H1, its point q of tangency with H2 can be found in O(log m) time.

Proof: If q' is any point in H2, we can check whether q' is to the left of, equal to, or to the right of q in O(1) time (see Figure 3.10). In Figure 3.10, x and y are the left and right neighbors of q' in H2, respectively. If ∠pq'x is a right turn and ∠pq'y is a left turn, then q is to the right of q' (see case 1 of Figure 3.10). If ∠pq'x and ∠pq'y are both right turns, then q' = q (see case 2 of Figure 3.10); otherwise q is to the left of q' (see case 3 of Figure 3.10). Thus we can perform a binary search on the points of H2 and identify q in O(log m) time. □

Proof.Let u\342\202\254 Hi and v \342\202\254 H2 be such that {u,v) is the lineof tangent.

Also letp bean arbitrary point of Hi and letq \342\202\254 H2 besuchthat {p,q) is a


Figure 3.10 Proof of Lemma 3.1

tangent of H2. Given p and q, we can check in O(1) time whether u is to the left of, equal to, or to the right of p (see Figure 3.11). Here x and y are left and right neighbors, respectively, of p in H1. If (p, q) is also tangential to H1, then p = u. If ∠xpq is a left turn, then u is to the left of p; otherwise u is to the right of p. This suggests a binary search for u. For each point p of H1 chosen, we have to determine the tangent from p to H2 and then decide the relative positioning of p with respect to u. We can do this computation in O(log m × log m) = O(log² m) time. □

In summary, given two upper hulls with N/2 points each, the line of tangent can be computed in O(log² N) time.

Theorem3.4A convexhull of n points in the planecan be computedin0(nlogn) time. \342\226\241


Figure 3.11 Proof of Lemma 3.2

EXERCISES

1. Write an algorithm in pseudocode that implements QuickHull and test it using suitable data.

2. Codethe divide-and-conqueralgorithmDCHull and test it usingappropriate data.

3. Run the threealgorithmsfor convexhull discussedin this sectiononvarious randominputs and comparetheir performances.

4. Algorithm DCHull can be modified as follows: Insteadof using themedianas the splitter,we coulduse a randomly chosenpoint as thesplitter.ThenX is partitionedinto two aroundthis point.Therestofthe function DCHullis the same.Write codefor this modifiedalgorithmand compareit with DCHull empirically.

5. Let Sbea setof n points in the plane.It is given that thereis only aconstant (say c) numberof pointson the hull of S.Can you deviseaconvexhull algorithmfor S that runs in timeo(nlogn)? Conceiveofspecialalgorithmsfor c= 3 and c= 4 first and then generalize.

3.9 REFERENCES AND READINGS

Algorithm MaxMin (Algorithm 3.6) is due to I. Pohl and the quicksort algorithm (Algorithm 3.13) is due to C. A. R. Hoare. The randomized sorting algorithm in Algorithm 3.16 is due to W. D. Frazer and A. C. McKellar and


the selectionalgorithmof Algorithm 3.19is due to M. Blum,R. Floyd, V.Pratt, R. Rivest and R. E. Tarjan.

For moreon randomizedsortingand selectionsee:\"Expectedtimeboundsfor selection,\"by R. Floyd and R. Rivest,Communications of the ACM 18,no.3 (1975):165-172.\"Samplesort:A SamplingApproach to Minimal StorageTreeSorting,\"byW. D. Frazerand A. C.McKellar,Journalof the ACM 17,no. 3 (1970):496-507.\"Derivation of RandomizedSortingand SelectionAlgorithms,\"by S.Ra-jasekaranand J. H.Reif, in ParallelAlgorithm Derivationand ProgramTransformation, editedby R. Paige,J. H. Reif, and R. Wachter, KluwerAcademicPublishers,1993,pp.187-205.

The matrixmultiplicationalgorithmin Section3.7is due to V.Strassen.For moreinformation onthe matrixmultiplicationproblemsee\"Matrixmultiplication via arithmeticprogressions,\"by D. Coppersmithand S.Wino-grad, Journalof Symbolic Computation 9 (1990):251-280.A complex0(n2'376)timealgorithmfor multiplying two nx n matricesis given in thispaper.

For moreapplicationsof divide-and-conquersee:ComputationalGeometry, by F. Preparataand M. I. Shamos,Springer-Verlag,1985.ComputationalGeometry:An IntroductionThrough RandomizedAlgorithmsby K.Mulmuley,Prentice-Hall,1994.Introductionto Algorithms: A Creative Approach,by U.Manber,Addison-Wesley, 1989.

3.10 ADDITIONAL EXERCISES

1. What happens to the worst-case run time of quicksort if we use the median of the given keys as the splitter key? (Assume that the selection algorithm of Section 3.6 is employed to determine the median.)

2. The setsA and B have n elementseachgiven in the form of sortedarrays. Presentan 0(n)timealgorithmtocomputeAUBand AC\\B.

3. The setsA and B have m and n elements(respectively)from a linearorder.Thesesetsarenot necessarilysorted.Alsoassumethat m <n.Show how to computeAU B and A n B in O(nlogm)time.

4. Considerthe problemof sortinga sequenceX of n keys whereeachkey is eitherzeroor one (i.e.,eachkey is a bit).One way of sorting


X is to start with two empty lists L0 and L1. Let X = k1, k2, ..., kn. For each 1 ≤ i ≤ n do: If ki = 0, then append ki to L0. If ki = 1, then append ki to L1. After processing all the keys of X in this manner, output the list L0 followed by the list L1.

The above idea of sorting can be extended to the case in which each key is of length more than one bit. In particular, if the keys are integers in the range [0, m − 1], then we start with m empty lists, L0, L1, ..., L_{m−1}, one list (or bucket) for each possible value that a key can take. Then the keys are processed in a similar fashion. In particular, if a key has a value ℓ, then it will be appended to the ℓth list.

Write an algorithm that employs this idea to sort n keys assuming that each key is in the range [0, m − 1]. Show that the run time of your algorithm is O(n + m). This algorithm is known as the bucket sort.

5. Considerthe problemof sortingn two-digit integers.Theideaof radixsortcan beemployed.We first sortthe numbersonly with respecttotheir leastsignificantdigits (LSDs).Followedby this, we apply a sortwith respectto their secondLSDs.More generally, d-digitnumberscan be sortedin d phases,where in the ith phase(1< i < d) wesortthe keys only with respectto their ith LSDs.Will this algorithmalwayswork?

As an example, let the input be k1 = 12, k2 = 45, k3 = 23, k4 = 14, k5 = 32, and k6 = 57. After sorting these keys with respect to their LSDs, we end up with: k5 = 32, k1 = 12, k3 = 23, k4 = 14, k2 = 45, and k6 = 57. When we sort the resultant sequence with respect to the keys' second LSDs (i.e., the next-most significant digits), we get k1 = 12, k4 = 14, k3 = 23, k5 = 32, k2 = 45, and k6 = 57, which is the correct answer! But note that in the second phase of the algorithm, k4 = 14, k1 = 12, k3 = 23, k5 = 32, k2 = 45, k6 = 57 is also a valid sort with respect to the second LSDs. The result in any phase of radix sorting can be forced to be correct by enforcing the following condition on the sorting algorithm to be used: "Keys with equal values should remain in the same relative order in the output as they were in the input." Any sorting algorithm that satisfies this is called a stable sort.

Note that in the above example, if the algorithm used to sort the keys in the second phase is stable, then the output will be correct. In summary, radix sort can be employed to sort d-digit numbers in d phases such that the sort applied in each phase (except the first phase) is stable. More generally, radix sort can be used to sort integers of arbitrary length. As usual, the algorithm will consist of phases in each of which the keys are sorted only with respect to certain parts of their keys.


The parts used in each phase could be single bits, single digits, or more generally, ℓ bits, for some appropriate ℓ.

In Exercise4, you showed that n integersin the range [0,m \342\200\224 1] canbe sortedin 0(n+m) time. Isyour algorithmstable?If not, makeit stable.As a specialcase,your algorithmcan sortn integersin therange [0,n \342\200\224 1]in 0(n)time. Usethis algorithmtogetherwith theideaof radixsortingtodevelopan algorithmthat can sortn integersin the range [0,nc \342\200\224 1](for any fixed c) in 0(n)time.

6.Two setsA and B have n elementseach.Assume that eachelementisan integerin the range [0,n100].Thesesetsarenot necessarilysorted.Show how tocheckwhetherthesetwo setsaredisjointin 0(n)time.Your algorithmshoulduse0{n)space.

7. Input are the sets S1, S2, ..., and Sℓ (where ℓ ≤ n). Elements of these sets are integers in the range [0, n^c − 1] (for some fixed c). Also let Σ_{i=1}^{ℓ} |S_i| = n. The goal is to output S1 in sorted order, then S2 in sorted order, and so on. Present an O(n) time algorithm for this problem.

8. Input is an array of n numberswhereeachnumberis an integerin therange[0,N] (for someN \302\273 n). Presentan algorithmthat runs in theworst casein timeO and checkswhetherall thesen numbersaredistinct.Your algorithmshoulduseonly 0(n)space.

9.Let S be a sequenceof n2 integersin the range [l,n].Let R(i) bethe numberof i'sin the sequence(for i = 1,2,...,n). Given S,wehave to computean approximatevalue of R(i) for eachi.IfN(i) is anapproximationtoR(i),i= 1,...,n,it shouldbe the casethat (withhigh probability) N(i) > R(i) for eachi and Ya=i N{i) = 0{n2).Of coursewe can do this computationin deterministic0(n2)time.Designa randomizedalgorithmfor this problemthat runs in time0(nlog\302\260^n).


Chapter 4

THE GREEDY METHOD

4.1 THE GENERALMETHOD

Thegreedy methodis perhapsthe most straightforward designtechniqueweconsiderin this text,and what's moreit canbeappliedto a wide variety ofproblems.Most, thoughnot all,of theseproblemshave n inputsand requireus to obtaina subsetthat satisfiessomeconstraints.Any subsetthatsatisfies thoseconstraintsis calleda feasiblesolution.We needto find a feasiblesolutionthat eithermaximizesor minimizesa given objective function. Afeasiblesolutionthat doesthis is calledan optimal solution.Thereisusually an obvious way to determinea feasiblesolutionbut not necessarilyanoptimalsolution.

The greedy method suggests that one can devise an algorithm that works in stages, considering one input at a time. At each stage, a decision is made regarding whether a particular input is in an optimal solution. This is done by considering the inputs in an order determined by some selection procedure. If the inclusion of the next input into the partially constructed optimal solution will result in an infeasible solution, then this input is not added to the partial solution. Otherwise, it is added. The selection procedure itself is based on some optimization measure. This measure may be the objective function. In fact, several different optimization measures may be plausible for a given problem. Most of these, however, will result in algorithms that generate suboptimal solutions. This version of the greedy technique is called the subset paradigm.

We can describethe subsetparadigmabstractly, but morepreciselythanabove, by consideringthe controlabstractionin Algorithm 4.1.

The function Selectselectsan input from a[ ] and removesit.Theselectedinput's value is assignedto x. Feasibleis a Boolean-valuedfunction thatdetermineswhetherx canbeincludedinto the solutionvector.ThefunctionUnion combinesx with the solutionand updatesthe objectivefunction.The

197


1   Algorithm Greedy(a, n)
2   // a[1 : n] contains the n inputs.
3   {
4       solution := ∅; // Initialize the solution.
5       for i := 1 to n do
6       {
7           x := Select(a);
8           if Feasible(solution, x) then
9               solution := Union(solution, x);
10      }
11      return solution;
12  }

Algorithm 4.1 Greedy method control abstraction for the subset paradigm

function Greedydescribesthe essentialway that a greedyalgorithmwill look,oncea particularproblemis chosenand the functions Select,Feasible,andUnion areproperly implemented.

For problemsthat do not callfor the selectionof an optimalsubset,in thegreedy methodwe make decisionsby consideringthe inputs in someorder.Eachdecisionis madeusingan optimizationcriterionthat canbecomputedusingdecisionsalready made. Callthis version of the greedy methodtheorderingparadigm.Sections4.2,4.3,4.4,and 4.5considerproblemsthat fit

the subsetparadigm,and Sections4.6,4.7,and 4.8considerproblemsthatfit the orderingparadigm.

EXERCISE

1. Write a control abstraction for the ordering paradigm.

4.2 KNAPSACK PROBLEM

Let us try to apply the greedy method to solve the knapsack problem. We are given n objects and a knapsack or bag. Object i has a weight w_i and the knapsack has a capacity m. If a fraction x_i, 0 ≤ x_i ≤ 1, of object i is placed into the knapsack, then a profit of p_i x_i is earned. The objective is to obtain a filling of the knapsack that maximizes the total profit earned. Since the knapsack capacity is m, we require the total weight of all chosen objects to be at most m. Formally, the problem can be stated as


maximize Σ_{1≤i≤n} p_i x_i                (4.1)

subject to Σ_{1≤i≤n} w_i x_i ≤ m          (4.2)

and 0 ≤ x_i ≤ 1, 1 ≤ i ≤ n                (4.3)

The profits and weights are positive numbers.

A feasiblesolution(or filling) is any set (xi,...,xn) satisfying (4.2)and(4.3) above. An optimalsolutionis a feasiblesolutionfor which (4.1)ismaximized.

Example 4.1 Consider the following instance of the knapsack problem: n = 3, m = 20, (p1, p2, p3) = (25, 24, 15), and (w1, w2, w3) = (18, 15, 10). Four feasible solutions are:

        (x1, x2, x3)       Σ w_i x_i    Σ p_i x_i
    1.  (1/2, 1/3, 1/4)      16.5         24.25
    2.  (1, 2/15, 0)         20           28.2
    3.  (0, 2/3, 1)          20           31
    4.  (0, 1, 1/2)          20           31.5

Of these four feasible solutions, solution 4 yields the maximum profit. As we shall soon see, this solution is optimal for the given problem instance. □

Lemma 4.1 In case the sum of all the weights is ≤ m, then x_i = 1, 1 ≤ i ≤ n, is an optimal solution. □

So let us assume the sum of weights exceeds m. Now all the x_i's cannot be 1. Another observation to make is:

Lemma4.2All optimalsolutionswill fill the knapsackexactly. \342\226\241

Lemma 4.2 is true because we can always increase the contribution of some object i by a fractional amount until the total weight is exactly m.

Note that the knapsackproblemcallsfor selectinga subset of theobjects and hencefits the subsetparadigm.In additionto selectinga subset,the knapsackproblemalsoinvolves the selectionof an x% for eachobject.Severalsimplegreedy strategiesto obtainfeasiblesolutionswhosesumsareidenticallym suggestthemselves.First,we can try to fill the knapsackbyincluding next the objectwith largestprofit. If an objectunderconsiderationdoesn't;fit, then a fractionof it is includedto fill the knapsack.Thus eachtimean objectis included(exceptpossiblywhen the last objectis included)


into the knapsack,we obtain the largestpossibleincreasein profit value.Note that if only a fractionof the last objectis included,then it may bepossibleto get a biggerincreaseby usinga different object.For example,ifwe have two units of spaceleft and two objectswith (pi = 4, Wi = 4) and(pj = 3,Wj = 2) remaining,then usingj is betterthan usinghalf of i. Letus use this selectionstrategy on the dataof Example4.1.

Object one has the largest profit value (p1 = 25). So it is placed into the knapsack first. Then x1 = 1 and a profit of 25 is earned. Only 2 units of knapsack capacity are left. Object two has the next largest profit (p2 = 24). However, w2 = 15 and it doesn't fit into the knapsack. Using x2 = 2/15 fills the knapsack exactly with part of object 2 and the value of the resulting solution is 28.2. This is solution 2 and it is readily seen to be suboptimal. The method used to obtain this solution is termed a greedy method because at each step (except possibly the last one) we chose to introduce that object which would increase the objective function value the most. However, this greedy method did not yield an optimal solution. Note that even if we change the above strategy so that in the last step the objective function increases by as much as possible, an optimal solution is not obtained for Example 4.1.

We can formulate at least two othergreedy approachesattemptingtoobtainoptimalsolutions.Fromthe precedingexample,we note thatconsidering objectsin orderof nonincreasingprofit valuesdoesnot yield an optimalsolutionbecauseeven thoughthe objectivefunction value takeson largeincreasesat eachstep,the numberof stepsis few as the knapsackcapacity isusedup at a rapidrate. So,let us try tobegreedy with capacity and use it

up as slowlyas possible.This requiresus toconsiderthe objectsin orderofnondecreasingweights W{. UsingExample4.1,solution3 results.This toois suboptimal.This time,even thoughcapacity is usedslowly,profits aren'tcomingin rapidly enough.

Thus,our next attemptis an algorithmthat strives to achievea balancebetweenthe rate at which profit increasesand the rate at which capacity isused.At eachstepwe includethat objectwhich has the maximumprofitper unit of capacity used.This meansthat objectsareconsideredin orderof the ratioPi/wi- Solution4 of Example4.1is producedby this strategy.Ifthe objectshave already beensortedinto nonincreasingorderof Pi/wi,thenfunction GreedyKnapsack(Algorithm4.2)obtainssolutionscorrespondingtothis strategy. Note that solutionscorrespondingto the first two strategiescanbeobtainedusingthis algorithmif the objectsareinitially in theappropriate order.Disregardingthe timeto initially sortthe objects,eachof thethreestrategiesoutlinedabove requiresonly 0(n)time.
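As an illustration of this third strategy (and not a transcription of Algorithm 4.2), here is a minimal C++ sketch that sorts objects by profit density and fills the knapsack greedily; the function name and the index-sorting approach are assumptions made for this example.

    #include <algorithm>
    #include <vector>

    // Greedy fractional knapsack by nonincreasing p[i]/w[i].
    // Returns the total profit; x[i] receives the fraction of object i taken.
    double greedyKnapsack(std::vector<double> p, std::vector<double> w,
                          double m, std::vector<double>& x) {
        int n = p.size();
        std::vector<int> idx(n);
        for (int i = 0; i < n; i++) idx[i] = i;
        std::sort(idx.begin(), idx.end(),
                  [&](int a, int b) { return p[a] / w[a] > p[b] / w[b]; });
        x.assign(n, 0.0);
        double profit = 0.0, cap = m;           // remaining capacity
        for (int i : idx) {
            if (w[i] <= cap) {                  // whole object fits
                x[i] = 1.0; cap -= w[i]; profit += p[i];
            } else {                            // take a fraction and stop
                x[i] = cap / w[i]; profit += p[i] * x[i];
                break;
            }
        }
        return profit;
    }

On the data of Example 4.1 this returns 31.5 with x = (0, 1, 1/2), which is solution 4.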

We have seenthat when one appliesthe greedy methodto the solutionof the knapsackproblem,thereareat leastthreedifferent measuresonecanattemptto optimizewhen determiningwhich objectto includenext.Thesemeasuresaretotalprofit, capacity used,and the ratioof accumulatedprofitto capacity used.Oncean optimizationmeasurehas beenchosen,the greedy


2. [0/1 Knapsack] Consider the knapsack problem discussed in this section. We add the requirement that x_i = 1 or x_i = 0, 1 ≤ i ≤ n; that is, an object is either included or not included into the knapsack. We wish to solve the problem

maximize Σ_{i=1}^{n} p_i x_i

subject to Σ_{i=1}^{n} w_i x_i ≤ m

and x_i = 0 or 1, 1 ≤ i ≤ n

One greedy strategy is to consider the objects in order of nonincreasing density p_i/w_i and add the object into the knapsack if it fits. Show that this strategy doesn't necessarily yield an optimal solution.

4.3 TREE VERTEX SPLITTING

Consider a directed binary tree each edge of which is labeled with a real number (called its weight). Trees with edge weights are called weighted trees. A weighted tree can be used, for example, to model a distribution network in which electric signals or commodities such as oil are transmitted. Nodes in the tree correspond to receiving stations and edges correspond to transmission lines. It is conceivable that in the process of transmission some loss occurs (drop in voltage in the case of electric signals or drop in pressure in the case of oil). Each edge in the tree is labeled with the loss that occurs in traversing that edge. The network may not be able to tolerate losses beyond a certain level. In places where the loss exceeds the tolerance level, boosters have to be placed. Given a network and a loss tolerance level, the Tree Vertex Splitting Problem (TVSP) is to determine an optimal placement of boosters. It is assumed that the boosters can only be placed in the nodes of the tree.

The TVSP can be specified more precisely as follows. Let T = (V, E, w) be a weighted directed tree, where V is the vertex set, E is the edge set, and w is the weight function for the edges. In particular, w(i, j) is the weight of the edge (i, j) ∈ E. The weight w(i, j) is undefined for any (i, j) ∉ E. A source vertex is a vertex with in-degree zero, and a sink vertex is a vertex with out-degree zero. For any path P in the tree, its delay, d(P), is defined to be the sum of the weights on that path. The delay of the tree T, d(T), is the maximum of all the path delays.

Let T/X be the forest that results when each vertex u in X is split into two nodes u^i and u^o such that all the edges (u, j) ∈ E ((j, u) ∈ E) are replaced by edges of the form (u^o, j) ((j, u^i)).


Figure 4.1 A tree before and after splitting the node 3

In other words, outbound edges from u now leave from u^o and inbound edges to u now enter at u^i. Figure 4.1 shows a tree before and after splitting the node 3. A node that gets split corresponds to a booster station. The TVSP is to identify a set X ⊆ V of minimum cardinality for which d(T/X) ≤ δ, for some specified tolerance limit δ. Note that the TVSP has a solution only if the maximum edge weight is ≤ δ. Also note that the TVSP naturally fits the subset paradigm.

Given a weighted tree T(V, E, w) and a tolerance limit δ, any subset X of V is a feasible solution if d(T/X) ≤ δ. Given an X, we can compute d(T/X) in O(|V|) time. A trivial way of solving the TVSP is to compute d(T/X) for each possible subset X of V. But there are 2^|V| such subsets! A better algorithm can be obtained using the greedy method.

For the TVSP, the quantity that is optimized (minimized) is the number of nodes in X. A greedy approach to solving this problem is to compute for each node u ∈ V, the maximum delay d(u) from u to any other node in its subtree. If u has a parent v such that d(u) + w(v, u) > δ, then the node u gets split and d(u) is set to zero. Computation proceeds from the leaves toward the root.

In the tree of Figure 4.2, let δ = 5. For each of the leaf nodes 7, 8, 5, 9, and 10 the delay is zero. The delay for any node is computed only after the delays for its children have been determined. Let u be any node and C(u) be the set of all children of u. Then d(u) is given by

    d(u) = max{d(v) + w(u, v) | v ∈ C(u)}

Using the above formula, for the tree of Figure 4.2, d(4) = 4. Since d(4) + w(2, 4) = 6 > δ, node 4 gets split. We set d(4) = 0. Now d(2) can be computed and is equal to 2. Since d(2) + w(1, 2) exceeds δ, node 2 gets split and d(2) is set to zero. Then d(6) is equal to 3. Also, since d(6) + w(3, 6) > δ, node 6 has to be split. Set d(6) to zero. Now d(3) is computed as 3. Finally, d(1) is computed as 5.

Figure 4.2 An example tree

Figure 4.3 shows the final tree that results after splitting the nodes 2, 4, and 6. This algorithm is described in Algorithm 4.3, which is invoked as TVS(root, δ), root being the root of the tree. The order in which TVS visits (i.e., computes the delay values of) the nodes of the tree is called the post order and is studied again in Chapter 6.

Figure 4.3 The final tree after splitting the nodes 2, 4, and 6


1   Algorithm TVS(T, δ)
2   // Determine and output the nodes to be split.
3   // w() is the weighting function for the edges.
4   {
5       if (T ≠ 0) then
6       {
7           d[T] := 0;
8           for each child v of T do
9           {
10              TVS(v, δ);
11              d[T] := max(d[T], d[v] + w(T, v));
12          }
13          if ((T is not the root) and
14              (d[T] + w(parent(T), T) > δ)) then
15          {
16              write (T); d[T] := 0;
17          }
18      }
19  }

Algorithm 4.3 The tree vertex splitting algorithm

Algorithm TVS takes Θ(n) time, where n is the number of nodes in the tree. This can be seen as follows: When TVS is called on any node T, only a constant number of operations are performed (excluding the time taken for the recursive calls). Also, TVS is called only once on each node T in the tree.
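To make the recursion concrete, here is a small Python sketch of the same greedy computation (an illustration, not part of the text). It assumes the tree is supplied as a dictionary mapping each node to a list of (child, weight of the edge to that child) pairs; the example dictionary encodes the tree of Figure 4.2 as it can be reconstructed from the arrays given below.

    # Greedy tree vertex splitting in the spirit of Algorithm 4.3.
    # tree: node -> list of (child, weight of edge parent->child); delta: tolerance.
    def tvs(tree, root, delta):
        split, d = [], {}

        def visit(u, parent_weight):
            d[u] = 0
            for v, w in tree.get(u, []):          # post order: children first
                visit(v, w)
                d[u] = max(d[u], d[v] + w)
            if u != root and d[u] + parent_weight > delta:
                split.append(u)                    # u becomes a booster station
                d[u] = 0

        visit(root, 0)
        return split

    # Tree of Figure 4.2 (assumed reconstruction from the sequential arrays):
    fig42 = {1: [(2, 4), (3, 2)], 2: [(4, 2)], 3: [(5, 1), (6, 3)],
             4: [(7, 1), (8, 4)], 6: [(9, 2), (10, 3)]}
    # tvs(fig42, 1, 5) returns [4, 2, 6], matching the splits of Figure 4.3.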

Algorithm 4.4 is a revised version of Algorithm 4.3 for the special case of directed binary trees. A sequential representation of the tree (see Section 2.2) has been employed. The tree is stored in the array tree[ ] with the root at tree[1]. Edge weights are stored in the array weight[ ]. If tree[i] has a tree node, the weight of the incoming edge from its parent is stored in weight[i]. The delay of node i is stored in d[i]. The array d[ ] is initialized to zero at the beginning. Entries in the arrays tree[ ] and weight[ ] corresponding to nonexistent nodes will be zero. As an example, for the tree of Figure 4.2, tree[ ] will be set to {1, 2, 3, 0, 4, 5, 6, 0, 0, 7, 8, 0, 0, 9, 10} starting at cell 1. Also, weight[ ] will be set to {0, 4, 2, 0, 2, 1, 3, 0, 0, 1, 4, 0, 0, 2, 3} at the beginning, starting from cell 1. The algorithm is invoked as TVS(1, δ). Now we show that TVS (Algorithm 4.3) will always split a minimal number of nodes.


1   Algorithm TVS(i, δ)
2   // Determine and output a minimum cardinality split set.
3   // The tree is realized using the sequential representation.
4   // Root is at tree[1]. N is the largest number such that
5   // tree[N] has a tree node.
6   {
7       if (tree[i] ≠ 0) then // If the tree is not empty
8       {
9           if (2i > N) then d[i] := 0; // i is a leaf.
10          else
11          {
12              TVS(2i, δ);
13              d[i] := max(d[i], d[2i] + weight[2i]);
14              if (2i + 1 <= N) then
15              {
16                  TVS(2i + 1, δ);
17                  d[i] := max(d[i], d[2i + 1] + weight[2i + 1]);
18              }
19          }
20          if ((tree[i] ≠ 1) and (d[i] + weight[i] > δ)) then
21          { write (tree[i]); d[i] := 0; }
22      }
23  }

Algorithm 4.4 TVS for the special case of binary trees

Theorem 4.2 Algorithm TVS outputs a minimum cardinality set U such that d(T/U) <= δ on any tree T, provided no edge of T has weight > δ.

Proof: The proof is by induction on the number of nodes in the tree. If the tree has a single node, the theorem is true. Assume the theorem for all trees of size <= n. We prove it for trees of size n + 1 also.

Let T be any tree of size n + 1 and let U be the set of nodes split by TVS. Also let W be a minimum cardinality set such that d(T/W) <= δ. We have to show that |U| <= |W|. If |U| = 0, this is true. Otherwise, let x be the first vertex split by TVS. Let T_x be the subtree rooted at x. Let T' be the tree obtained from T by deleting T_x except for x. Note that W has to have at least one node, say y, from T_x. Let W' = W − {y}. If there is a W* such that |W*| < |W'| and d(T'/W*) <= δ, then since d(T/(W* ∪ {x})) <= δ, W is not a minimum cardinality split set for T. Thus, W' has to be a minimum cardinality split set such that d(T'/W') <= δ.


If algorithm TVS is run on tree T', the set of split nodes output is U − {x}. Since T' has <= n nodes, U − {x} is a minimum cardinality split set for T'. This in turn means that |W'| >= |U| − 1. In other words, |W| >= |U|. □

EXERCISES

1. For the tree of Figure 4.2 solve the TVSP when (a) δ = 4 and (b) δ = 6.

2. Rewrite TVS (Algorithm 4.3) for general trees. Make use of pointers.

4.4 JOB SEQUENCING WITH DEADLINES

We are given a set of n jobs. Associated with job i is an integer deadline d_i >= 0 and a profit p_i > 0. For any job i the profit p_i is earned iff the job is completed by its deadline. To complete a job, one has to process the job on a machine for one unit of time. Only one machine is available for processing jobs. A feasible solution for this problem is a subset J of jobs such that each job in this subset can be completed by its deadline. The value of a feasible solution J is the sum of the profits of the jobs in J, or Σ_{i∈J} p_i. An optimal solution is a feasible solution with maximum value. Here again, since the problem involves the identification of a subset, it fits the subset paradigm.

Example 4.2 Let n = 4, (p_1, p_2, p_3, p_4) = (100, 10, 15, 27) and (d_1, d_2, d_3, d_4) = (2, 1, 2, 1). The feasible solutions and their values are:

        feasible solution    processing sequence    value
    1.       (1, 2)               2, 1               110
    2.       (1, 3)            1, 3 or 3, 1          115
    3.       (1, 4)               4, 1               127
    4.       (2, 3)               2, 3                25
    5.       (3, 4)               4, 3                42
    6.       (1)                  1                  100
    7.       (2)                  2                   10
    8.       (3)                  3                   15
    9.       (4)                  4                   27

Solution 3 is optimal. In this solution only jobs 1 and 4 are processed and the value is 127. These jobs must be processed in the order job 4 followed by job 1. Thus the processing of job 4 begins at time zero and that of job 1 is completed at time 2. □


To formulate a greedy algorithm to obtain an optimal solution, we must formulate an optimization measure to determine how the next job is chosen. As a first attempt we can choose the objective function Σ_{i∈J} p_i as our optimization measure. Using this measure, the next job to include is the one that increases Σ_{i∈J} p_i the most, subject to the constraint that the resulting J is a feasible solution. This requires us to consider jobs in nonincreasing order of the p_i's. Let us apply this criterion to the data of Example 4.2. We begin with J = ∅ and Σ_{i∈J} p_i = 0. Job 1 is added to J as it has the largest profit and J = {1} is a feasible solution. Next, job 4 is considered. The solution J = {1, 4} is also feasible. Next, job 3 is considered and discarded as J = {1, 3, 4} is not feasible. Finally, job 2 is considered for inclusion into J. It is discarded as J = {1, 2, 4} is not feasible. Hence, we are left with the solution J = {1, 4} with value 127. This is the optimal solution for the given problem instance. Theorem 4.4 proves that the greedy algorithm just described always obtains an optimal solution to this sequencing problem.

Before attempting the proof, let us see how we can determine whether a given J is a feasible solution. One obvious way is to try out all possible permutations of the jobs in J and check whether the jobs in J can be processed in any one of these permutations (sequences) without violating the deadlines. For a given permutation σ = i_1, i_2, i_3, ..., i_k, this is easy to do, since the earliest time job i_q, 1 <= q <= k, will be completed is q. If q > d_{i_q}, then using σ, at least job i_q will not be completed by its deadline. However, if |J| = i, this requires checking i! permutations. Actually, the feasibility of a set J can be determined by checking only one permutation of the jobs in J. This permutation is any one of the permutations in which jobs are ordered in nondecreasing order of deadlines.
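As a small illustration (not from the text), the one-permutation test can be coded directly; the dictionary of deadlines below is just Example 4.2 recast in Python.

    # Feasibility test for unit-time jobs: process them in nondecreasing
    # order of deadlines and check that the q-th job has deadline >= q.
    def is_feasible(jobs, d):
        for q, i in enumerate(sorted(jobs, key=lambda i: d[i]), start=1):
            if d[i] < q:
                return False
        return True

    deadlines = {1: 2, 2: 1, 3: 2, 4: 1}          # Example 4.2
    # is_feasible({1, 4}, deadlines)     -> True
    # is_feasible({1, 3, 4}, deadlines)  -> False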

Theorem 4.3 Let J be a set of k jobs and σ = i_1, i_2, ..., i_k a permutation of jobs in J such that d_{i_1} <= d_{i_2} <= ··· <= d_{i_k}. Then J is a feasible solution iff the jobs in J can be processed in the order σ without violating any deadline.

Proof: Clearly, if the jobs in J can be processed in the order σ without violating any deadline, then J is a feasible solution. So, we have only to show that if J is feasible, then σ represents a possible order in which the jobs can be processed. If J is feasible, then there exists σ' = r_1, r_2, ..., r_k such that d_{r_q} >= q, 1 <= q <= k. Assume σ' ≠ σ. Then let a be the least index such that r_a ≠ i_a. Let r_b = i_a. Clearly, b > a. In σ' we can interchange r_a and r_b. Since d_{r_a} >= d_{r_b}, the resulting permutation σ'' = s_1, s_2, ..., s_k represents an order in which the jobs can be processed without violating a deadline. Continuing in this way, σ' can be transformed into σ without violating any deadline. Hence, the theorem is proved. □

Theorem 4.3 is true even if the jobs have different processing times t_i > 0 (see the exercises).


Theorem 4.4 The greedy method described above always obtains an optimal solution to the job sequencing problem.

Proof: Let (p_i, d_i), 1 <= i <= n, define any instance of the job sequencing problem. Let I be the set of jobs selected by the greedy method. Let J be the set of jobs in an optimal solution. We now show that both I and J have the same profit values and so I is also optimal. We can assume I ≠ J as otherwise we have nothing to prove. Note that if J ⊂ I, then J cannot be optimal. Also, the case I ⊂ J is ruled out by the greedy method. So, there exist jobs a and b such that a ∈ I, a ∉ J, b ∈ J, and b ∉ I. Let a be a highest-profit job such that a ∈ I and a ∉ J. It follows from the greedy method that p_a >= p_b for all jobs b that are in J but not in I. To see this, note that if p_b > p_a, then the greedy method would consider job b before job a and include it into I.

Now, consider feasible schedules S_I and S_J for I and J respectively. Let i be a job such that i ∈ I and i ∈ J. Let i be scheduled from t to t + 1 in S_I and t' to t' + 1 in S_J. If t < t', then we can interchange the job (if any) scheduled in [t', t' + 1] in S_I with i. If no job is scheduled in [t', t' + 1] in I, then i is moved to [t', t' + 1]. The resulting schedule is also feasible. If t' < t, then a similar transformation can be made in S_J. In this way, we can obtain schedules S'_I and S'_J with the property that all jobs common to I and J are scheduled at the same time. Consider the interval [t_a, t_a + 1] in S'_I in which the job a (defined above) is scheduled. Let b be the job (if any) scheduled in S'_J in this interval. From the choice of a, p_a >= p_b. Scheduling a from t_a to t_a + 1 in S'_J and discarding job b gives us a feasible schedule for job set J' = J − {b} ∪ {a}. Clearly, J' has a profit value no less than that of J and differs from I in one less job than J does.

By repeatedly using the transformation just described, J can be transformed into I with no decrease in profit value. So I must be optimal. □

A high-level description of the greedy algorithm just discussed appears as Algorithm 4.5. This algorithm constructs an optimal set J of jobs that can be processed by their due times. The selected jobs can be processed in the order given by Theorem 4.3.

Now, let us see how to represent the set J and how to carry out the test of lines 7 and 8 in Algorithm 4.5. Theorem 4.3 tells us how to determine whether all jobs in J ∪ {i} can be completed by their deadlines. We can avoid sorting the jobs in J each time by keeping the jobs in J ordered by deadlines. We can use an array d[1 : n] to store the deadlines of the jobs in the order of their p-values. The set J itself can be represented by a one-dimensional array J[1 : k] such that J[r], 1 <= r <= k, are the jobs in J and d[J[1]] <= d[J[2]] <= ··· <= d[J[k]]. To test whether J ∪ {i} is feasible, we have just to insert i into J preserving the deadline ordering and then verify that d[J[r]] <= r, 1 <= r <= k + 1. The insertion of i into J is simplified by the use of a fictitious job 0 with d[0] = 0 and J[0] = 0. Note also that if job i is to be inserted at position q, then only the positions of jobs J[q], J[q + 1],


1   Algorithm GreedyJob(d, J, n)
2   // J is a set of jobs that can be completed by their deadlines.
3   {
4       J := {1};
5       for i := 2 to n do
6       {
7           if (all jobs in J ∪ {i} can be completed
8               by their deadlines) then J := J ∪ {i};
9       }
10  }

Algorithm 4.5 High-level description of job sequencing algorithm

..., J[k] are changed after the insertion. Hence, it is necessary to verify only that these jobs (and also job i) do not violate their deadlines following the insertion. The algorithm that results from this discussion is function JS (Algorithm 4.6). The algorithm assumes that the jobs are already sorted such that p_1 >= p_2 >= ··· >= p_n. Further it assumes that n >= 1 and the deadline d[i] of job i is at least 1. Note that no job with d[i] < 1 can ever be finished by its deadline. Theorem 4.5 proves that JS is a correct implementation of the greedy strategy.

Theorem 4.5 Function JS is a correct implementation of the greedy-based method described above.

Proof: Since d[i] >= 1, the job with the largest p_i will always be in the greedy solution. As the jobs are in nonincreasing order of the p_i's, line 8 in Algorithm 4.6 includes the job with largest p_i. The for loop of line 10 considers the remaining jobs in the order required by the greedy method described earlier. At all times, the set of jobs already included in the solution is maintained in J. If J[i], 1 <= i <= k, is the set already included, then J is such that d[J[i]] <= d[J[i + 1]], 1 <= i < k. This allows for easy application of the feasibility test of Theorem 4.3. When job i is being considered, the while loop of line 15 determines where in J this job has to be inserted. The use of a fictitious job 0 (line 7) allows easy insertion into position 1. Let w be such that d[J[w]] <= d[i] and d[J[q]] > d[i], w < q <= k. If job i is included into J, then jobs J[q], w < q <= k, have to be moved one position up in J (line 19). From Theorem 4.3, it follows that such a move retains feasibility of J iff d[J[q]] ≠ q, w < q <= k. This condition is verified in line 15. In addition, i can be inserted at position w + 1 iff d[i] > w. This is verified in line 16 (note r = w on exit from the while loop if d[J[q]] ≠ q, w < q <= k). The correctness of JS follows from these observations. □


1   Algorithm JS(d, j, n)
2   // d[i] >= 1, 1 <= i <= n are the deadlines, n >= 1. The jobs
3   // are ordered such that p[1] >= p[2] >= ··· >= p[n]. J[i]
4   // is the ith job in the optimal solution, 1 <= i <= k.
5   // Also, at termination d[J[i]] <= d[J[i + 1]], 1 <= i < k.
6   {
7       d[0] := J[0] := 0; // Initialize.
8       J[1] := 1; // Include job 1.
9       k := 1;
10      for i := 2 to n do
11      {
12          // Consider jobs in nonincreasing order of p[i]. Find
13          // position for i and check feasibility of insertion.
14          r := k;
15          while ((d[J[r]] > d[i]) and (d[J[r]] ≠ r)) do r := r − 1;
16          if ((d[J[r]] <= d[i]) and (d[i] > r)) then
17          {
18              // Insert i into J[ ].
19              for q := k to (r + 1) step −1 do J[q + 1] := J[q];
20              J[r + 1] := i; k := k + 1;
21          }
22      }
23      return k;
24  }

Algorithm 4.6 Greedy algorithm for sequencing unit time jobs with deadlines and profits

For JS there are two possible parameters in terms of which its complexity can be measured. We can use n, the number of jobs, and s, the number of jobs included in the solution J. The while loop of line 15 in Algorithm 4.6 is iterated at most k times. Each iteration takes Θ(1) time. If the conditional of line 16 is true, then lines 19 and 20 are executed. These lines require Θ(k − r) time to insert job i. Hence, the total time for each iteration of the for loop of line 10 is Θ(k). This loop is iterated n − 1 times. If s is the final value of k, that is, s is the number of jobs in the final solution, then the total time needed by algorithm JS is Θ(sn). Since s <= n, the worst-case time, as a function of n alone, is O(n^2). If we consider the job set p_i = d_i = n − i + 1, 1 <= i <= n, then algorithm JS takes Θ(n^2) time to determine J. Hence, the worst-case computing time for JS is Θ(n^2). In addition to the space needed for d, JS needs Θ(s) amount of space for J.


Note that the profit values are not needed by JS. It is sufficient to know that p_i >= p_{i+1}, 1 <= i < n.

The computing time of JS can be reduced from Θ(n^2) to nearly O(n) by using the disjoint set union and find algorithms (see Section 2.5) and a different method to determine the feasibility of a partial solution. If J is a feasible subset of jobs, then we can determine the processing times for each of the jobs using the rule: if job i hasn't been assigned a processing time, then assign it to the slot [α − 1, α], where α is the largest integer r such that 1 <= r <= d_i and the slot [α − 1, α] is free. This rule simply delays the processing of job i as much as possible. Consequently, when J is being built up job by job, jobs already in J do not have to be moved from their assigned slots to accommodate the new job. If for the new job being considered there is no α as defined above, then it cannot be included in J. The proof of the validity of this statement is left as an exercise.

Example 4.3 Let n = 5, (p_1, ..., p_5) = (20, 15, 10, 5, 1) and (d_1, ..., d_5) = (2, 2, 1, 3, 3). Using the above feasibility rule, we have

    J          assigned slots           job considered   action             profit
    ∅          none                     1                assign to [1, 2]      0
    {1}        [1, 2]                   2                assign to [0, 1]     20
    {1, 2}     [0, 1], [1, 2]           3                cannot fit; reject   35
    {1, 2}     [0, 1], [1, 2]           4                assign to [2, 3]     35
    {1, 2, 4}  [0, 1], [1, 2], [2, 3]   5                reject               40

The optimal solution is J = {1, 2, 4} with a profit of 40. □

Since there are only n jobs and each job takes one unit of time, it is necessary only to consider the time slots [i − 1, i], 1 <= i <= b, such that b = min{n, max{d_i}}. One way to implement the above scheduling rule is to partition the time slots [i − 1, i], 1 <= i <= b, into sets. We use i to represent the time slot [i − 1, i]. For any slot i, let n_i be the largest integer such that n_i <= i and slot n_i is free. To avoid end conditions, we introduce a fictitious slot [−1, 0] which is always free. Two slots i and j are in the same set iff n_i = n_j. Clearly, if i and j, i < j, are in the same set, then i, i + 1, i + 2, ..., j are in the same set. Associated with each set k of slots is a value f(k). Then f(k) = n_i for all slots i in set k. Using the set representation of Section 2.5, each set is represented as a tree. The root node identifies the set. The function f is defined only for root nodes. Initially, all slots are free and we have b + 1 sets corresponding to the b + 1 slots [i − 1, i], 0 <= i <= b. At this time f(i) = i, 0 <= i <= b. We use p(i) to link slot i into its set tree. With the conventions for the union and find algorithms of Section 2.5, p(i) = −1, 0 <= i <= b, initially. If a job with deadline d is to be scheduled, then we need to find the root of the tree containing the slot min{n, d}. If this root is j, then f(j) is the nearest free slot, provided f(j) ≠ 0. Having used this slot, the set with root j should be combined with the set containing slot f(j) − 1.

Example 4.4 The trees defined by the p(i)'s for the first three iterations in Example 4.3 are shown in Figure 4.4. □

Figure 4.4 Fast job scheduling

The fast algorithm appears as FJS (Algorithm 4.7). Its computing time is readily observed to be O(n α(2n, n)) (recall that α(2n, n) is the inverse of Ackermann's function defined in Section 2.5). It needs an additional 2n words of space for f and p.


1   Algorithm FJS(d, n, b, j)
2   // Find an optimal solution J[1 : k]. It is assumed that
3   // p[1] >= p[2] >= ··· >= p[n] and that b = min{n, max_i(d[i])}.
4   {
5       // Initially there are b + 1 single node trees.
6       for i := 0 to b do f[i] := i;
7       k := 0; // Initialize.
8       for i := 1 to n do
9       { // Use greedy rule.
10          q := CollapsingFind(min(n, d[i]));
11          if (f[q] ≠ 0) then
12          {
13              k := k + 1; J[k] := i; // Select job i.
14              m := CollapsingFind(f[q] − 1);
15              WeightedUnion(m, q);
16              f[q] := f[m]; // q may be new root.
17          }
18      }
19  }

Algorithm 4.7 Faster algorithm for job sequencing
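The following Python sketch (an illustration, not part of the text) mirrors the structure of FJS with a simple union-find using path compression; jobs are assumed to be given in nonincreasing order of profit, so only the deadline array is needed.

    # Fast job sequencing in the spirit of Algorithm 4.7: slot i stands for
    # the interval [i-1, i]; f[root] is the nearest free slot in that set.
    def fjs(d):                      # d[0..n-1] = deadlines, by profit order
        n = len(d)
        b = min(n, max(d))
        parent = list(range(b + 1))  # each slot is its own set initially
        f = list(range(b + 1))       # f[i] = i initially

        def find(x):                 # collapsing find
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        J = []
        for i, di in enumerate(d, start=1):
            q = find(min(n, di))
            if f[q] != 0:            # job i fits into slot [f[q]-1, f[q]]
                J.append(i)
                m = find(f[q] - 1)   # set containing the next lower slot
                parent[q] = m        # union; root m keeps its free slot f[m]
            # else job i is rejected
        return J

    # Example 4.3: fjs([2, 2, 1, 3, 3]) returns [1, 2, 4].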

EXERCISES

1. You are given a set of n jobs. Associated with each job i is a processing time t_i and a deadline d_i by which it must be completed. A feasible schedule is a permutation of the jobs such that if the jobs are processed in that order, then each job finishes by its deadline. Define a greedy schedule to be one in which the jobs are processed in nondecreasing order of deadlines. Show that if there exists a feasible schedule, then all greedy schedules are feasible.

2. [Optimal assignment] Assume there are n workers and n jobs. Let v_ij be the value of assigning worker i to job j. An assignment of workers to jobs corresponds to the assignment of 0 or 1 to the variables x_ij, 1 <= i, j <= n. Then x_ij = 1 means worker i is assigned to job j, and x_ij = 0 means that worker i is not assigned to job j. A valid assignment is one in which each worker is assigned to exactly one job and exactly one worker is assigned to any one job. The value of an assignment is Σ_i Σ_j v_ij x_ij.


For example, assume there are three workers w_1, w_2, and w_3 and three jobs j_1, j_2, and j_3. Let the values of assignment be v_11 = 11, v_12 = 5, v_13 = 8, v_21 = 3, v_22 = 7, v_23 = 15, v_31 = 8, v_32 = 12, and v_33 = 9. Then, a valid assignment is x_12 = 1, x_23 = 1, and x_31 = 1. The rest of the x_ij's are zeros. The value of this assignment is 5 + 15 + 8 = 28. An optimal assignment is a valid assignment of maximum value. Write algorithms for two different greedy assignment schemes. One of these assigns a worker to the best possible job. The other assigns to a job the best possible worker. Show that neither of these schemes is guaranteed to yield optimal assignments. Is either scheme always better than the other? Assume v_ij > 0.

3. (a) What is the solution generated by the function JS when n = 7, (p_1, p_2, ..., p_7) = (3, 5, 20, 18, 1, 6, 30), and (d_1, d_2, ..., d_7) = (1, 3, 4, 3, 2, 1, 2)?

   (b) Show that Theorem 4.3 is true even if jobs have different processing requirements. Associated with job i is a profit p_i > 0, a time requirement t_i > 0, and a deadline d_i >= t_i.

   (c) Show that for the situation of part (a), the greedy method of this section doesn't necessarily yield an optimal solution.

4. (a) For the job sequencing problem of this section, show that the subset J represents a feasible solution iff the jobs in J can be processed according to the rule: if job i in J hasn't been assigned a processing time, then assign it to the slot [α − 1, α], where α is the least integer r such that 1 <= r <= d_i and the slot [α − 1, α] is free.

   (b) For the problem instance of Exercise 3(a) draw the trees and give the values of f(i), 0 <= i <= n, after each iteration of the for loop of line 8 of Algorithm 4.7.

4.5 MINIMUM-COST SPANNING TREES

Definition 4.1 Let G = (V, E) be an undirected connected graph. A subgraph t = (V, E') of G is a spanning tree of G iff t is a tree. □

Example 4.5 Figure 4.5 shows the complete graph on four nodes together with three of its spanning trees. □

Figure 4.5 An undirected graph and three of its spanning trees

Spanning trees have many applications. For example, they can be used to obtain an independent set of circuit equations for an electric network. First, a spanning tree for the electric network is obtained. Let B be the set of network edges not in the spanning tree. Adding an edge from B to the spanning tree creates a cycle. Kirchhoff's second law is used on each cycle to obtain a circuit equation. The cycles obtained in this way are independent (i.e., none of these cycles can be obtained by taking a linear combination of the remaining cycles) as each contains an edge from B that is not contained in any other cycle. Hence, the circuit equations so obtained are also independent. In fact, it can be shown that the cycles obtained by introducing the edges of B one at a time into the resulting spanning tree form a cycle basis, and so all other cycles in the graph can be constructed by taking a linear combination of the cycles in the basis.

Another application of spanning trees arises from the property that a spanning tree is a minimal subgraph G' of G such that V(G') = V(G) and G' is connected. (A minimal subgraph is one with the fewest number of edges.) Any connected graph with n vertices must have at least n − 1 edges and all connected graphs with n − 1 edges are trees. If the nodes of G represent cities and the edges represent possible communication links connecting two cities, then the minimum number of links needed to connect the n cities is n − 1. The spanning trees of G represent all feasible choices.

In practical situations, the edges have weights assigned to them. These weights may represent the cost of construction, the length of the link, and so on. Given such a weighted graph, one would then wish to select cities to have minimum total cost or minimum total length. In either case the links selected have to form a tree (assuming all weights are positive). If this is not so, then the selection of links contains a cycle. Removal of any one of the links on this cycle results in a link selection of less cost connecting all cities. We are therefore interested in finding a spanning tree of G with minimum cost. (The cost of a spanning tree is the sum of the costs of the edges in that tree.) Figure 4.6 shows a graph and one of its minimum-cost spanning trees. Since the identification of a minimum-cost spanning tree involves the selection of a subset of the edges, this problem fits the subset paradigm.

Figure 4.6 A graph and its minimum cost spanning tree

4.5.1 Prim's Algorithm

A greedy method to obtain a minimum-cost spanning tree builds this tree edge by edge. The next edge to include is chosen according to some optimization criterion. The simplest such criterion is to choose an edge that results in a minimum increase in the sum of the costs of the edges so far included. There are two possible ways to interpret this criterion. In the first, the set of edges so far selected form a tree. Thus, if A is the set of edges selected so far, then A forms a tree. The next edge (u, v) to be included in A is a minimum-cost edge not in A with the property that A ∪ {(u, v)} is also a tree. Exercise 2 shows that this selection criterion results in a minimum-cost spanning tree. The corresponding algorithm is known as Prim's algorithm.

Example 4.6 Figure 4.7 shows the working of Prim's method on the graph of Figure 4.6(a). The spanning tree obtained is shown in Figure 4.6(b) and has a cost of 99. □

Figure 4.7 Stages in Prim's algorithm

Having seen how Prim's method works, let us obtain a pseudocode algorithm to find a minimum-cost spanning tree using this method. The algorithm will start with a tree that includes only a minimum-cost edge of G. Then, edges are added to this tree one by one. The next edge (i, j) to be added is such that i is a vertex already included in the tree, j is a vertex not yet included, and the cost of (i, j), cost[i, j], is minimum among all edges (k, l) such that vertex k is in the tree and vertex l is not in the tree. To determine this edge (i, j) efficiently, we associate with each vertex j not yet included in the tree a value near[j]. The value near[j] is a vertex in the tree such that cost[j, near[j]] is minimum among all choices for near[j]. We define near[j] = 0 for all vertices j that are already in the tree. The next edge to include is defined by the vertex j such that near[j] ≠ 0 (j not already in the tree) and cost[j, near[j]] is minimum.

In function Prim (Algorithm 4.8), line 9 selects a minimum-cost edge. Lines 10 to 15 initialize the variables so as to represent a tree comprising only the edge (k, l). In the for loop of line 16 the remainder of the spanning tree is built up edge by edge. Lines 18 and 19 select (j, near[j]) as the next edge to include. Lines 23 to 25 update near[ ].

The time required by algorithm Prim is O(n^2), where n is the number of vertices in the graph G. To see this, note that line 9 takes O(|E|) time and line 10 takes Θ(1) time. The for loop of line 12 takes Θ(n) time. Lines 18 and 19 and the for loop of line 23 require O(n) time. So, each iteration of the for loop of line 16 takes O(n) time. The total time for the for loop of line 16 is therefore O(n^2). Hence, Prim runs in O(n^2) time.


If we store the nodes not yet included in the tree as a red-black tree (see Section 2.4.2), lines 18 and 19 take O(log n) time. Note that a red-black tree supports the following operations in O(log n) time: insert, delete (an arbitrary element), find-min, and search (for an arbitrary element). The for loop of line 23 has to examine only the nodes adjacent to j. Thus its overall frequency is O(|E|). Updating in lines 24 and 25 also takes O(log n) time (since an update can be done using a delete and an insertion into the red-black tree). Thus the overall run time is O((n + |E|) log n).
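For comparison, here is an illustrative Python sketch (not from the text) of the same strategy using adjacency lists and a binary heap with lazy deletion in place of the red-black tree; its running time is O(|E| log |V|).

    import heapq

    # Prim's method with adjacency lists and a binary heap (lazy deletion).
    # adj: vertex -> list of (neighbor, cost); every vertex is a key of adj.
    def prim(adj, start):
        in_tree = {start}
        heap = [(c, start, v) for v, c in adj[start]]
        heapq.heapify(heap)
        t, mincost = [], 0
        while heap and len(in_tree) < len(adj):
            c, u, v = heapq.heappop(heap)
            if v in in_tree:                  # stale heap entry; skip it
                continue
            in_tree.add(v)
            t.append((u, v))
            mincost += c
            for w, cw in adj[v]:
                if w not in in_tree:
                    heapq.heappush(heap, (cw, v, w))
        return t, mincost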

The algorithm can be speeded a bit by making the observation that a minimum-cost spanning tree includes for each vertex v a minimum-cost edge incident to v. To see this, suppose t is a minimum-cost spanning tree for G = (V, E). Let v be any vertex in t. Let (v, w) be an edge with minimum cost among all edges incident to v. Assume that (v, w) ∉ E(t) and cost[v, w] < cost[v, x] for all edges (v, x) ∈ E(t). The inclusion of (v, w) into t creates a unique cycle. This cycle must include an edge (v, x), x ≠ w. Removing (v, x) from E(t) ∪ {(v, w)} breaks this cycle without disconnecting the graph (V, E(t) ∪ {(v, w)}). Hence, (V, E(t) ∪ {(v, w)} − {(v, x)}) is also a spanning tree. Since cost[v, w] < cost[v, x], this spanning tree has lower cost than t. This contradicts the assumption that t is a minimum-cost spanning tree of G. So, t includes minimum-cost edges as stated above.

From this observation it follows that we can start the algorithm with a tree consisting of any arbitrary vertex and no edge. Then edges can be added one by one. The changes needed are to lines 9 to 17. These lines can be replaced by the lines

    9'       mincost := 0;
    10'      for i := 2 to n do near[i] := 1;
    11'      // Vertex 1 is initially in t.
    12'      near[1] := 0;
    13'-16'  for i := 1 to n − 1 do
    17'      { // Find n − 1 edges for t.

4.5.2 Kruskal's Algorithm

There is a second possible interpretation of the optimization criteria mentioned earlier in which the edges of the graph are considered in nondecreasing order of cost. This interpretation is that the set t of edges so far selected for the spanning tree be such that it is possible to complete t into a tree. Thus t may not be a tree at all stages in the algorithm. In fact, it will generally only be a forest since the set of edges t can be completed into a tree iff there are no cycles in t. We show in Theorem 4.6 that this interpretation of the greedy method also results in a minimum-cost spanning tree. This method is due to Kruskal.


1   Algorithm Prim(E, cost, n, t)
2   // E is the set of edges in G. cost[1 : n, 1 : n] is the cost
3   // adjacency matrix of an n vertex graph such that cost[i, j] is
4   // either a positive real number or ∞ if no edge (i, j) exists.
5   // A minimum spanning tree is computed and stored as a set of
6   // edges in the array t[1 : n − 1, 1 : 2]. (t[i, 1], t[i, 2]) is an edge in
7   // the minimum-cost spanning tree. The final cost is returned.
8   {
9       Let (k, l) be an edge of minimum cost in E;
10      mincost := cost[k, l];
11      t[1, 1] := k; t[1, 2] := l;
12      for i := 1 to n do // Initialize near.
13          if (cost[i, l] < cost[i, k]) then near[i] := l;
14          else near[i] := k;
15      near[k] := near[l] := 0;
16      for i := 2 to n − 1 do
17      { // Find n − 2 additional edges for t.
18          Let j be an index such that near[j] ≠ 0 and
19          cost[j, near[j]] is minimum;
20          t[i, 1] := j; t[i, 2] := near[j];
21          mincost := mincost + cost[j, near[j]];
22          near[j] := 0;
23          for k := 1 to n do // Update near[ ].
24              if ((near[k] ≠ 0) and (cost[k, near[k]] > cost[k, j]))
25                  then near[k] := j;
26      }
27      return mincost;
28  }

Algorithm 4.8 Prim's minimum-cost spanning tree algorithm


Example 4.7 Consider the graph of Figure 4.6(a). We begin with no edges selected. Figure 4.8(a) shows the current graph with no edges selected. Edge (1, 6) is the first edge considered. It is included in the spanning tree being built. This yields the graph of Figure 4.8(b). Next, the edge (3, 4) is selected and included in the tree (Figure 4.8(c)). The next edge to be considered is (2, 7). Its inclusion in the tree being built does not create a cycle, so we get the graph of Figure 4.8(d). Edge (2, 3) is considered next and included in the tree (Figure 4.8(e)). Of the edges not yet considered, (7, 4) has the least cost. It is considered next. Its inclusion in the tree results in a cycle, so this edge is discarded. Edge (5, 4) is the next edge to be added to the tree being built. This results in the configuration of Figure 4.8(f). The next edge to be considered is the edge (7, 5). It is discarded, as its inclusion creates a cycle. Finally, edge (6, 5) is considered and included in the tree being built. This completes the spanning tree. The resulting tree (Figure 4.6(b)) has cost 99. □

For clarity, Kruskal's method is written out more formally in Algorithm 4.9. Initially E is the set of all edges in G. The only functions we wish to perform on this set are (1) determine an edge with minimum cost (line 4) and (2) delete this edge (line 5). Both these functions can be performed efficiently if the edges in E are maintained as a sorted sequential list. It is not essential to sort all the edges so long as the next edge for line 4 can be determined easily. If the edges are maintained as a minheap, then the next edge to consider can be obtained in O(log |E|) time. The construction of the heap itself takes O(|E|) time.

To be able to perform step 6 efficiently, the vertices in G should be grouped together in such a way that one can easily determine whether the vertices v and w are already connected by the earlier selection of edges. If they are, then the edge (v, w) is to be discarded. If they are not, then (v, w) is to be added to t. One possible grouping is to place all vertices in the same connected component of t into a set (all connected components of t will also be trees). Then, two vertices v and w are connected in t iff they are in the same set. For example, when the edge (2, 6) is to be considered, the sets are {1, 2}, {3, 4, 6}, and {5}. Vertices 2 and 6 are in different sets so these sets are combined to give {1, 2, 3, 4, 6} and {5}. The next edge to be considered is (1, 4). Since vertices 1 and 4 are in the same set, the edge is rejected. The edge (3, 5) connects vertices in different sets and results in the final spanning tree. Using the set representation and the union and find algorithms of Section 2.5, we can obtain an efficient (almost linear) implementation of line 6. The computing time is, therefore, determined by the time for lines 4 and 5, which in the worst case is O(|E| log |E|).

Figure 4.8 Stages in Kruskal's algorithm

If the representations discussed above are used, then the pseudocode of Algorithm 4.10 results. In line 6 an initial heap of edges is constructed. In line 7 each vertex is assigned to a distinct set (and hence to a distinct tree). The set t is the set of edges to be included in the minimum-cost spanning tree and i is the number of edges in t. The set t can be represented as a sequential list using a two-dimensional array t[1 : n − 1, 1 : 2]. Edge (u, v) can be added to t by the assignments t[i, 1] := u; and t[i, 2] := v;. In the while loop of line 10, edges are removed from the heap one by one in nondecreasing order of cost. Line 14 determines the sets containing u and v. If j ≠ k, then vertices u and v are in different sets (and so in different trees) and edge (u, v) is included into t. The sets containing u and v are combined (line 20). If j = k, the edge (u, v) is discarded as its inclusion into t would create a cycle. Line 23 determines whether a spanning tree was found. It follows that i ≠ n − 1 iff the graph G is not connected. One can verify that the computing time is O(|E| log |E|), where E is the edge set of G.

Theorem 4.6 Kruskal's algorithm generates a minimum-cost spanning tree for every connected undirected graph G.


1   t := ∅;
2   while ((t has less than n − 1 edges) and (E ≠ ∅)) do
3   {
4       Choose an edge (v, w) from E of lowest cost;
5       Delete (v, w) from E;
6       if (v, w) does not create a cycle in t then add (v, w) to t;
7       else discard (v, w);
8   }

Algorithm 4.9 Early form of minimum-cost spanning tree algorithm due to Kruskal

1   Algorithm Kruskal(E, cost, n, t)
2   // E is the set of edges in G. G has n vertices. cost[u, v] is the
3   // cost of edge (u, v). t is the set of edges in the minimum-cost
4   // spanning tree. The final cost is returned.
5   {
6       Construct a heap out of the edge costs using Heapify;
7       for i := 1 to n do parent[i] := −1;
8       // Each vertex is in a different set.
9       i := 0; mincost := 0.0;
10      while ((i < n − 1) and (heap not empty)) do
11      {
12          Delete a minimum cost edge (u, v) from the heap
13          and reheapify using Adjust;
14          j := Find(u); k := Find(v);
15          if (j ≠ k) then
16          {
17              i := i + 1;
18              t[i, 1] := u; t[i, 2] := v;
19              mincost := mincost + cost[u, v];
20              Union(j, k);
21          }
22      }
23      if (i ≠ n − 1) then write ("No spanning tree");
24      else return mincost;
25  }

Algorithm 4.10 Kruskal's algorithm
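An illustrative Python rendering of this method (not from the text) replaces the heap by sorting and uses a simple union-find to detect cycles:

    # Kruskal's method: edges given as (cost, u, v) triples, vertices 1..n.
    def kruskal(n, edges):
        parent = list(range(n + 1))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        t, mincost = [], 0
        for cost, u, v in sorted(edges):      # nondecreasing order of cost
            ru, rv = find(u), find(v)
            if ru != rv:                      # (u, v) joins two different trees
                parent[ru] = rv
                t.append((u, v))
                mincost += cost
                if len(t) == n - 1:
                    break
        if len(t) != n - 1:
            raise ValueError("No spanning tree")   # G is not connected
        return t, mincost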


Proof: Let G be any undirected connected graph. Let t be the spanning tree for G generated by Kruskal's algorithm. Let t' be a minimum-cost spanning tree for G. We show that both t and t' have the same cost.

Let E(t) and E(t') respectively be the edges in t and t'. If n is the number of vertices in G, then both t and t' have n − 1 edges. If E(t) = E(t'), then t is clearly of minimum cost. If E(t) ≠ E(t'), then let q be a minimum-cost edge such that q ∈ E(t) and q ∉ E(t'). Clearly, such a q must exist. The inclusion of q into t' creates a unique cycle (Exercise 5). Let q, e_1, e_2, ..., e_k be this unique cycle. At least one of the e_i's, 1 <= i <= k, is not in E(t) as otherwise t would also contain the cycle q, e_1, e_2, ..., e_k. Let e_j be an edge on this cycle such that e_j ∉ E(t). If e_j is of lower cost than q, then Kruskal's algorithm will consider e_j before q and include e_j into t. To see this, note that all edges in E(t) of cost less than the cost of q are also in E(t') and do not form a cycle with e_j. So cost(e_j) >= cost(q).

Now, reconsider the graph with edge set E(t') ∪ {q}. Removal of any edge on the cycle q, e_1, e_2, ..., e_k will leave behind a tree t'' (Exercise 5). In particular, if we delete the edge e_j, then the resulting tree t'' will have a cost no more than the cost of t' (as cost(e_j) >= cost(q)). Hence, t'' is also a minimum-cost tree.

By repeatedly using the transformation described above, tree t' can be transformed into the spanning tree t without any increase in cost. Hence, t is a minimum-cost spanning tree. □

4.5.3 An Optimal Randomized Algorithm (*)

Any algorithm for finding the minimum-cost spanning tree of a given graph G(V, E) will have to spend Ω(|V| + |E|) time in the worst case, since it has to examine each node and each edge at least once before determining the correct answer. A randomized Las Vegas algorithm that runs in time O(|V| + |E|) can be devised as follows: (1) Randomly sample m edges from G (for some suitable m). (2) Let G' be the induced subgraph; that is, G' has V as its node set and the sampled edges in its edge set. The subgraph G' need not be connected. Recursively find a minimum-cost spanning tree for each component of G'. Let F be the resultant minimum-cost spanning forest of G'. (3) Using F, eliminate certain edges (called the F-heavy edges) of G that cannot possibly be in a minimum-cost spanning tree. Let G'' be the graph that results from G after elimination of the F-heavy edges. (4) Recursively find a minimum-cost spanning tree for G''. This will also be a minimum-cost spanning tree for G.

Steps 1 to 3 are useful in reducing the number of edges in G. The algorithm can be speeded up further if we can reduce the number of nodes in the input graph as well. Such a node elimination can be effected using the Boruvka steps. In a Boruvka step, for each node, an incident edge with minimum weight is chosen. For example in Figure 4.9(a), the edge (1, 3) is chosen for node 1, the edge (6, 7) is chosen for node 7, and so on. All the chosen edges are shown with thick lines. The connected components of the induced graph are found. In the example of Figure 4.9(a), the nodes 1, 2, and 3 form one component, the nodes 4 and 5 form a second component, and the nodes 6 and 7 form another component. Replace each component with a single node. The component with nodes 1, 2, and 3 is replaced with the node a. The other two components are replaced with the nodes b and c, respectively. Edges within the individual components are thrown away. The resultant graph is shown in Figure 4.9(b). In this graph keep only an edge of minimum weight between any two nodes. Delete any isolated nodes.

Since an edge is chosen for every node, the number of nodes after one Boruvka step reduces by a factor of at least two. A minimum-cost spanning tree for the reduced graph can be extended easily to get a minimum-cost spanning tree for the original graph. If E' is the set of edges in the minimum-cost spanning tree of the reduced graph, we simply include into E' the edges chosen in the Boruvka step to obtain the minimum-cost spanning tree edges for the original graph. In the example of Figure 4.9, a minimum-cost spanning tree for (c) will consist of the edges (a, b) and (b, c). Thus a minimum-cost spanning tree for the graph of (a) will have the edges: (1, 3), (3, 2), (4, 5), (6, 7), (3, 4), and (2, 6). More details of the algorithms are given below.
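An illustrative Python sketch of one Boruvka step (not from the text) is given below; the graph is assumed to be supplied as a vertex set and a list of (u, v, weight) edges.

    # One Boruvka step: pick a minimum-weight incident edge per node,
    # contract the resulting components, and keep only the cheapest edge
    # between any two components.
    def boruvka_step(vertices, edges):
        best = {}
        for u, v, w in edges:
            for x in (u, v):
                if x not in best or w < best[x][2]:
                    best[x] = (u, v, w)
        chosen = set(best.values())

        parent = {v: v for v in vertices}     # union-find over chosen edges
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        for u, v, _ in chosen:
            parent[find(u)] = find(v)

        reduced = {}                          # cheapest edge per component pair
        for u, v, w in edges:
            a, b = find(u), find(v)
            if a != b:
                key = (min(a, b), max(a, b))
                if key not in reduced or w < reduced[key]:
                    reduced[key] = w
        new_vertices = {find(v) for v in vertices}
        new_edges = [(a, b, w) for (a, b), w in reduced.items()]
        return chosen, new_vertices, new_edges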

Definition 4.2 Let F be a forest that forms a subgraph of a given weighted graph G(V, E). If u and v are any two nodes in F, let F(u, v) denote the path (if any) connecting u and v in F and let Fcost(u, v) denote the maximum weight of any edge in the path F(u, v). If there is no path between u and v in F, Fcost(u, v) is taken to be ∞. Any edge (x, y) of G is said to be F-heavy if cost[x, y] > Fcost(x, y) and F-light otherwise. □

Note that all the edges of F are F-light. Also, any F-heavy edge cannot belong to a minimum-cost spanning tree of G. The proof of this is left as an exercise. The randomized algorithm applies two Boruvka steps to reduce the number of nodes in the input graph. Next, it samples the edges of G and processes them to eliminate a constant fraction of them. A minimum-cost spanning tree for the resultant reduced graph is recursively computed. From this tree, a spanning tree for G is obtained. A detailed description of the algorithm appears as Algorithm 4.11.

Lemma 4.3 states that Step 4 can be completed in time O(|V| + |E|). The proof of this can be found in the references supplied at the end of this chapter. Step 1 takes O(|V| + |E|) time and step 2 takes O(|E|) time. Step 6 takes O(|E|) time as well. The time taken in all the recursive calls in steps 3 and 5 can be shown to be O(|V| + |E|). For a proof, see the references at the end of the chapter. A crucial fact that is used in the proof is that both the number of nodes and the number of edges are reduced by a constant factor, with high probability, in each level of recursion.

Figure 4.9 A Boruvka step

Lemma 4.3 Let G(V, E) be any weighted graph and let F be a subgraph of G that forms a forest. Then, all the F-heavy edges of G can be identified in time O(|V| + |E|). □

Theorem 4.7 A minimum-weight spanning tree for any given weighted graph can be computed in time O(|V| + |E|). □

EXERCISES

1. Compute a minimum cost spanning tree for the graph of Figure 4.10 using (a) Prim's algorithm and (b) Kruskal's algorithm.

2. Prove that Prim's method of this section generates minimum-cost spanning trees.


Step 1. Apply two Boruvka steps. At the end, the number of nodes will have decreased by a factor at least 4. Let the resultant graph be G(V, E).

Step 2. Form a subgraph G'(V', E') of G, where each edge of G is chosen randomly to be in E' with probability 1/2. The expected number of edges in E' is |E|/2.

Step 3. Recursively find a minimum-cost spanning forest F for G'.

Step 4. Eliminate all the F-heavy edges from G. With high probability, at least a constant fraction of the edges of G will be eliminated. Let G'' be the resultant graph.

Step 5. Compute a minimum-cost spanning tree (call it T'') for G'' recursively. The tree T'' will also be a minimum-cost spanning tree for G.

Step 6. Return the edges of T'' together with the edges chosen in the Boruvka steps of step 1. These are the edges of a minimum-cost spanning tree for G.

Algorithm 4.11 An optimal randomized algorithm

3. (a) Rewrite Prim's algorithm under the assumption that the graphs are represented by adjacency lists.

   (b) Program and run the above version of Prim's algorithm against Algorithm 4.9. Compare the two on a representative set of graphs.

   (c) Analyze precisely the computing time and space requirements of your new version of Prim's algorithm using adjacency lists.

4. Program and run Kruskal's algorithm, described in Algorithm 4.10. You will have to modify functions Heapify and Adjust of Chapter 2. Use the same test data you devised to test Prim's algorithm in Exercise 3.

5. (a) Show that if t is a spanning tree for the undirected graph G, then the addition of an edge q, q ∉ E(t) and q ∈ E(G), to t creates a unique cycle.


Figure 4.10 Graph for Exercise 1

   (b) Show that if any of the edges on this unique cycle is deleted from E(t) ∪ {q}, then the remaining edges form a spanning tree of G.

6. In Figure 4.9, find a minimum-cost spanning tree for the graph of part (c) and extend the tree to obtain a minimum cost spanning tree for the graph of part (a). Verify the correctness of your answer by applying either Prim's algorithm or Kruskal's algorithm on the graph of part (a).

7. Let G(V, E) be any weighted connected graph.

   (a) If C is any cycle of G, then show that the heaviest edge of C cannot belong to a minimum-cost spanning tree of G.

   (b) Assume that F is a forest that is a subgraph of G. Show that any F-heavy edge of G cannot belong to a minimum-cost spanning tree of G.

8. By considering the complete graph with n vertices, show that the number of spanning trees in an n vertex graph can be greater than 2^(n−1) − 2.

4.6 OPTIMAL STORAGE ON TAPES

There are n programs that are to be stored on a computer tape of length l. Associated with each program i is a length l_i, 1 <= i <= n. Clearly, all programs can be stored on the tape if and only if the sum of the lengths of the programs is at most l. We assume that whenever a program is to be retrieved from this tape, the tape is initially positioned at the front. Hence, if the programs are stored in the order I = i_1, i_2, ..., i_n, the time t_j needed to retrieve program i_j is proportional to Σ_{1<=k<=j} l_{i_k}. If all programs are retrieved equally often, then the expected or mean retrieval time (MRT) is (1/n) Σ_{1<=j<=n} t_j. In the optimal storage on tape problem, we are required to find a permutation for the n programs so that when they are stored on the tape in this order the MRT is minimized. This problem fits the ordering paradigm. Minimizing the MRT is equivalent to minimizing d(I) = Σ_{1<=j<=n} Σ_{1<=k<=j} l_{i_k}.

Example 4.8 Let n = 3 and (l_1, l_2, l_3) = (5, 10, 3). There are n! = 6 possible orderings. These orderings and their respective d values are:

    ordering I        d(I)
    1, 2, 3      5 + 5 + 10 + 5 + 10 + 3  = 38
    1, 3, 2      5 + 5 + 3 + 5 + 3 + 10   = 31
    2, 1, 3      10 + 10 + 5 + 10 + 5 + 3 = 43
    2, 3, 1      10 + 10 + 3 + 10 + 3 + 5 = 41
    3, 1, 2      3 + 3 + 5 + 3 + 5 + 10   = 29
    3, 2, 1      3 + 3 + 10 + 3 + 10 + 5  = 34

The optimal ordering is 3, 1, 2. □

A greedy approach to building the required permutation would choose the next program on the basis of some optimization measure. One possible measure would be the d value of the permutation constructed so far. The next program to be stored on the tape would be one that minimizes the increase in d. If we have already constructed the permutation i_1, i_2, ..., i_r, then appending program j gives the permutation i_1, i_2, ..., i_r, i_{r+1} = j. This increases the d value by Σ_{1<=k<=r} l_{i_k} + l_j. Since Σ_{1<=k<=r} l_{i_k} is fixed and independent of j, we trivially observe that the increase in d is minimized if the next program chosen is the one with the least length from among the remaining programs.

The greedy algorithm resulting from the above discussion is so simple that we won't bother to write it out. The greedy method simply requires us to store the programs in nondecreasing order of their lengths. This ordering can be carried out in O(n log n) time using an efficient sorting algorithm (e.g., heap sort from Chapter 2). For the programs of Example 4.8, note that the permutation that yields an optimal solution is the one in which the programs are in nondecreasing order of their lengths. Theorem 4.8 shows that the MRT is minimized when programs are stored in this order.
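As a quick illustration (not from the text), d(I) and the greedy ordering for Example 4.8 can be computed in a few lines of Python:

    from itertools import accumulate

    # d(I): sum over j of the prefix sums of the program lengths in order I.
    def d(lengths, order):
        return sum(accumulate(lengths[i - 1] for i in order))

    lengths = [5, 10, 3]                                  # Example 4.8
    greedy = sorted(range(1, 4), key=lambda i: lengths[i - 1])
    # greedy == [3, 1, 2]; d(lengths, greedy) == 29, the minimum value.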


Theorem 4.8 If l_1 <= l_2 <= ··· <= l_n, then the ordering i_j = j, 1 <= j <= n, minimizes

    d(I) = Σ_{k=1}^{n} Σ_{j=1}^{k} l_{i_j}

over all possible permutations of the i_j.

Proof: Let I = i_1, i_2, ..., i_n be any permutation of the index set {1, 2, ..., n}. Then

    d(I) = Σ_{k=1}^{n} Σ_{j=1}^{k} l_{i_j} = Σ_{k=1}^{n} (n − k + 1) l_{i_k}

If there exist a and b such that a < b and l_{i_a} > l_{i_b}, then interchanging i_a and i_b results in a permutation I' with

    d(I') = [ Σ_{k ≠ a, b} (n − k + 1) l_{i_k} ] + (n − a + 1) l_{i_b} + (n − b + 1) l_{i_a}

Subtracting d(I') from d(I), we obtain

    d(I) − d(I') = (n − a + 1)(l_{i_a} − l_{i_b}) + (n − b + 1)(l_{i_b} − l_{i_a})
                 = (b − a)(l_{i_a} − l_{i_b}) > 0

Hence, no permutation that is not in nondecreasing order of the l_i's can have minimum d. It is easy to see that all permutations in nondecreasing order of the l_i's have the same d value. Hence, the ordering defined by i_j = j, 1 <= j <= n, minimizes the d value. □

The tape storage problem can be extended to several tapes. If there are m > 1 tapes, T_0, ..., T_{m−1}, then the programs are to be distributed over these tapes. For each tape a storage permutation is to be provided. If I_j is the storage permutation for the subset of programs on tape j, then d(I_j) is as defined earlier. The total retrieval time (TD) is Σ_{0<=j<=m−1} d(I_j). The objective is to store the programs in such a way as to minimize TD.

The obvious generalization of the solution for the one-tape case is to consider the programs in nondecreasing order of l_i's. The program currently


1   Algorithm Store(n, m)
2   // n is the number of programs and m the number of tapes.
3   {
4       j := 0; // Next tape to store on
5       for i := 1 to n do
6       {
7           write ("append program", i,
8                  "to permutation for tape", j);
9           j := (j + 1) mod m;
10      }
11  }

Algorithm 4.12 Assigning programs to tapes

being considered is placed on the tape that results in the minimum increase in TD. This tape will be the one with the least amount of tape used so far. If there is more than one tape with this property, then the one with the smallest index can be used. If the jobs are initially ordered so that l_1 <= l_2 <= ··· <= l_n, then the first m programs are assigned to tapes T_0, ..., T_{m−1} respectively. The next m programs will be assigned to tapes T_0, ..., T_{m−1} respectively. The general rule is that program i is stored on tape T_{i mod m}. On any given tape the programs are stored in nondecreasing order of their lengths. Algorithm 4.12 presents this rule in pseudocode. It assumes that the programs are ordered as above. It has a computing time of Θ(n) and does not need to know the program lengths. Theorem 4.9 proves that the resulting storage pattern is optimal.

Theorem 4.9 If l_1 <= l_2 <= ··· <= l_n, then Algorithm 4.12 generates an optimal storage pattern for m tapes.

Proof: In any storage pattern for m tapes, let r_i be one greater than the number of programs following program i on its tape. Then the total retrieval time TD is given by

    TD = Σ_{i=1}^{n} r_i l_i

In any given storage pattern, for any given j, there can be at most m programs for which r_i = j. From Theorem 4.8 it follows that TD is minimized if the m longest programs have r_i = 1, the next m longest programs have r_i = 2, and so on. When programs are ordered by length, that is, l_1 <= l_2 <= ··· <= l_n, then this minimization criteria is satisfied if r_i = ⌈(n − i + 1)/m⌉. Observe that Algorithm 4.12 results in a storage pattern with these r_i's. □

The proof of Theorem 4.9 shows that there are many storage patterns that minimize TD. If we compute r_i = ⌈(n − i + 1)/m⌉ for each program i, then so long as all programs with the same r_i are stored on different tapes and have r_i − 1 programs following them, TD is the same. If n is a multiple of m, then there are at least (m!)^{n/m} storage patterns that minimize TD. Algorithm 4.12 produces one of these.

EXERCISES

1. Find an optimal placement for 13 programs on three tapes T_0, T_1, and T_2, where the programs are of lengths 12, 5, 8, 32, 7, 5, 18, 26, 4, 3, 11, 10, and 6.

2. Show that replacing the code of Algorithm 4.12 by

       for i := 1 to n do
           write ("append program", i, "to permutation for tape", (i − 1) mod m);

   does not affect the output.

3. Let P_1, P_2, ..., P_n be a set of n programs that are to be stored on a tape of length l. Program P_i requires a_i amount of tape. If Σ a_i <= l, then clearly all the programs can be stored on the tape. So, assume Σ a_i > l. The problem is to select a maximum subset Q of the programs for storage on the tape. (A maximum subset is one with the maximum number of programs in it.) A greedy algorithm for this problem would build the subset Q by including programs in nondecreasing order of a_i.

   (a) Assume the P_i are ordered such that a_1 <= a_2 <= ··· <= a_n. Write a function for the above strategy. Your function should output an array s[1 : n] such that s[i] = 1 if P_i is in Q and s[i] = 0 otherwise.

   (b) Show that this strategy always finds a maximum subset Q such that Σ_{P_i∈Q} a_i <= l.

   (c) Let Q be the subset obtained using the above greedy strategy. How small can the tape utilization ratio (Σ_{P_i∈Q} a_i)/l get?

   (d) Suppose the objective now is to determine a subset of programs that maximizes the tape utilization ratio. A greedy approach would be to consider programs in nonincreasing order of a_i. If there is enough space left on the tape for P_i, then it is included in Q. Assume the programs are ordered so that a_1 >= a_2 >= ··· >= a_n. Write a function incorporating this strategy. What is its time and space complexity?

   (e) Show that the strategy of part (d) doesn't necessarily yield a subset that maximizes (Σ_{P_i∈Q} a_i)/l. How small can this ratio get? Prove your bound.

4. Assume n programs of lengths l_1, l_2, ..., l_n are to be stored on a tape. Program i is to be retrieved with frequency f_i. If the programs are stored in the order i_1, i_2, ..., i_n, the expected retrieval time (ERT) is

       [ Σ_{j=1}^{n} f_{i_j} ( Σ_{k=1}^{j} l_{i_k} ) ] / Σ_{i} f_i

   (a) Show that storing the programs in nondecreasing order of l_i does not necessarily minimize the ERT.

   (b) Show that storing the programs in nonincreasing order of f_i does not necessarily minimize the ERT.

   (c) Show that the ERT is minimized when the programs are stored in nonincreasing order of f_i/l_i.

5. Consider the tape storage problem of this section. Assume that two tapes T1 and T2 are available and we wish to distribute n given programs of lengths l_1, l_2, ..., l_n onto these two tapes in such a manner that the maximum retrieval time is minimized. That is, if A and B are the sets of programs on the tapes T1 and T2 respectively, then we wish to choose A and B such that max{ Σ_{i∈A} l_i, Σ_{i∈B} l_i } is minimized. A possible greedy approach to obtaining A and B would be to start with A and B initially empty. Then consider the programs one at a time. The program currently being considered is assigned to set A if Σ_{i∈A} l_i = min{ Σ_{i∈A} l_i, Σ_{i∈B} l_i }; otherwise it is assigned to B. Show that this does not guarantee optimal solutions even if l_1 <= l_2 <= ··· <= l_n. Show that the same is true if we require l_1 >= l_2 >= ··· >= l_n.

4.7 OPTIMAL MERGE PATTERNS

In Section 3.4 we saw that two sorted files containing n and m records respectively could be merged together to obtain one sorted file in time O(n + m). When more than two sorted files are to be merged together, the merge can be accomplished by repeatedly merging sorted files in pairs. Thus, if files x_1, x_2, x_3, and x_4 are to be merged, we could first merge x_1 and x_2 to get a file y_1. Then we could merge y_1 and x_3 to get y_2. Finally, we could merge y_2 and x_4 to get the desired sorted file. Alternatively, we could first merge x_1 and x_2 getting y_1, then merge x_3 and x_4 and get y_2, and finally merge y_1 and y_2 and get the desired sorted file. Given n sorted files, there are many ways in which to pairwise merge them into a single sorted file. Different pairings require differing amounts of computing time. The problem we address ourselves to now is that of determining an optimal way (one requiring the fewest comparisons) to pairwise merge n sorted files. Since this problem calls for an ordering among the pairs to be merged, it fits the ordering paradigm.

Example 4.9 The files x_1, x_2, and x_3 are three sorted files of length 30, 20, and 10 records each. Merging x_1 and x_2 requires 50 record moves. Merging the result with x_3 requires another 60 moves. The total number of record moves required to merge the three files this way is 110. If, instead, we first merge x_2 and x_3 (taking 30 moves) and then x_1 (taking 60 moves), the total record moves made is only 90. Hence, the second merge pattern is faster than the first. □

A greedy attempt to obtain an optimal merge pattern is easy to formulate. Since merging an n-record file and an m-record file requires possibly n + m record moves, the obvious choice for a selection criterion is: at each step merge the two smallest size files together. Thus, if we have five files (x_1, ..., x_5) with sizes (20, 30, 10, 5, 30), our greedy rule would generate the following merge pattern: merge x_3 and x_4 to get z_1 (|z_1| = 15), merge z_1 and x_1 to get z_2 (|z_2| = 35), merge x_2 and x_5 to get z_3 (|z_3| = 60), and merge z_2 and z_3 to get the answer z_4. The total number of record moves is 205. One can verify that this is an optimal merge pattern for the given problem instance.
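As an illustration (not from the text), the total record-move count produced by this rule can be computed with a heap of file sizes:

    import heapq

    # Total record moves for the greedy two-way merge pattern:
    # repeatedly merge the two smallest files.
    def merge_cost(sizes):
        heap = list(sizes)
        heapq.heapify(heap)
        moves = 0
        while len(heap) > 1:
            a = heapq.heappop(heap)
            b = heapq.heappop(heap)
            moves += a + b
            heapq.heappush(heap, a + b)
        return moves

    # merge_cost([20, 30, 10, 5, 30]) == 205, as in the five-file example above.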

The merge pattern such as the one just described will be referred to as a two-way merge pattern (each merge step involves the merging of two files). The two-way merge patterns can be represented by binary merge trees. Figure 4.11 shows a binary merge tree representing the optimal merge pattern obtained for the above five files. The leaf nodes are drawn as squares and represent the given five files. These nodes are called external nodes. The remaining nodes are drawn as circles and are called internal nodes. Each internal node has exactly two children, and it represents the file obtained by merging the files represented by its two children. The number in each node is the length (i.e., the number of records) of the file represented by that node.

The external node x4 is at a distance of 3 from the root node z4 (a node at level i is at a distance of i - 1 from the root). Hence, the records of file x4 are moved three times, once to get z1, once again to get z2, and finally one more time to get z4. If d_i is the distance from the root to the external


Figure 4.11 Binary merge tree representing a merge pattern

node for file x_i and q_i the length of x_i, then the total number of record moves for this binary merge tree is

    Σ_{i=1}^{5} d_i q_i

This sum is called the weighted external path length of the tree. An optimal two-way merge pattern corresponds to a binary merge tree with minimum weighted external path length. The function Tree of Algorithm 4.13 uses the greedy rule stated earlier to obtain a two-way merge tree for n files. The algorithm has as input a list list of n trees. Each node in a tree has three fields, lchild, rchild, and weight. Initially, each tree in list has exactly one node. This node is an external node and has lchild and rchild fields zero whereas weight is the length of one of the n files to be merged. During the course of the algorithm, for any tree in list with root node t, t → weight is the length of the merged file it represents (t → weight equals the sum of the lengths of the external nodes in tree t). Function Tree uses two functions, Least(list) and Insert(list, t). Least(list) finds a tree in list whose root has least weight and returns a pointer to this tree. This tree is removed from list. Insert(list, t) inserts the tree with root t into list. Theorem 4.10 shows that Tree (Algorithm 4.13) generates an optimal two-way merge tree.


    treenode = record
    {
        treenode * lchild; treenode * rchild;
        integer weight;
    };

    1    Algorithm Tree(n)
    2    // list is a global list of n single node
    3    // binary trees as described above.
    4    {
    5        for i := 1 to n - 1 do
    6        {
    7            pt := new treenode; // Get a new tree node.
    8            (pt → lchild) := Least(list); // Merge two trees with
    9            (pt → rchild) := Least(list); // smallest lengths.
    10           (pt → weight) := ((pt → lchild) → weight)
    11                            + ((pt → rchild) → weight);
    12           Insert(list, pt);
    13       }
    14       return Least(list); // Tree left in list is the merge tree.
    15   }

Algorithm 4.13 Algorithm to generate a two-way merge tree

Example 4.10 Let us see how algorithm Tree works when list initially represents six files with lengths (2, 3, 5, 7, 9, 13). Figure 4.12 shows list at the end of each iteration of the for loop. The binary merge tree that results at the end of the algorithm can be used to determine which files are merged. Merging is performed on those files which are lowest (have the greatest depth) in the tree. □

The main for loop in Algorithm 4.13 is executed n - 1 times. If list is kept in nondecreasing order according to the weight value in the roots, then Least(list) requires only O(1) time and Insert(list, t) can be done in O(n) time. Hence the total time taken is O(n^2). In case list is represented as a minheap in which the root value is less than or equal to the values of its children (Section 2.4), then Least(list) and Insert(list, t) can be done in O(log n) time. In this case the computing time for Tree is O(n log n). Some speedup may be obtained by combining the Insert of line 12 with the Least of line 9.
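For concreteness, here is a short Python sketch (ours, not part of the text) of the minheap variant just described; Python's heapq plays the role of Least and Insert, and the routine returns only the total number of record moves rather than the merge tree itself.

    import heapq

    def optimal_merge_cost(lengths):
        # Greedy two-way merge: repeatedly merge the two smallest files.
        # Returns the total number of record moves, i.e., the weighted
        # external path length of the resulting binary merge tree.
        heap = list(lengths)
        heapq.heapify(heap)               # list kept as a minheap
        total_moves = 0
        for _ in range(len(lengths) - 1):
            a = heapq.heappop(heap)       # Least(list)
            b = heapq.heappop(heap)       # Least(list)
            merged = a + b                # weight of the new internal node
            total_moves += merged
            heapq.heappush(heap, merged)  # Insert(list, pt)
        return total_moves

    print(optimal_merge_cost([20, 30, 10, 5, 30]))   # 205, the five files above
    print(optimal_merge_cost([2, 3, 5, 7, 9, 13]))   # 93, the files of Example 4.10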


Theorem 4.10 If list initially contains n ≥ 1 single node trees with weight values (q1, q2, ..., qn), then algorithm Tree generates an optimal two-way merge tree for n files with these lengths.

Proof: The proof is by induction on n. For n = 1, a tree with no internal nodes is returned and this tree is clearly optimal. For the induction hypothesis, assume the algorithm generates an optimal two-way merge tree for all (q1, q2, ..., qm), 1 ≤ m < n. We show that the algorithm also generates optimal trees for all (q1, q2, ..., qn). Without loss of generality, we can assume that q1 ≤ q2 ≤ ... ≤ qn and q1 and q2 are the values of the weight fields of the trees found by algorithm Least in lines 8 and 9 during the first iteration of the for loop. Now, the subtree T of Figure 4.13 is created. Let T' be an optimal two-way merge tree for (q1, q2, ..., qn). Let p be an internal node of maximum distance from the root. If the children of p are not q1 and q2, then we can interchange the present children with q1 and q2 without increasing the weighted external path length of T'. Hence, T is also a subtree in an optimal merge tree. If we replace T in T' by an external node with weight q1 + q2, then the resulting tree T'' is an optimal merge tree for (q1 + q2, q3, ..., qn). From the induction hypothesis, after replacing T by the external node with value q1 + q2, function Tree proceeds to find an optimal merge tree for (q1 + q2, q3, ..., qn). Hence, Tree generates an optimal merge tree for (q1, q2, ..., qn). □

The greedy method to generate merge trees also works for the case of k-ary merging. In this case the corresponding merge tree is a k-ary tree. Since all internal nodes must have degree k, for certain values of n there is no corresponding k-ary merge tree. For example, when k = 3, there is no k-ary merge tree with n = 2 external nodes. Hence, it is necessary to introduce a certain number of dummy external nodes. Each dummy node is assigned a q_i of zero. This dummy value does not affect the weighted external path length of the resulting k-ary tree. Exercise 2 shows that a k-ary tree with all internal nodes having degree k exists only when the number of external nodes n satisfies the equality n mod (k - 1) = 1. Hence, at most k - 2 dummy nodes have to be added. The greedy rule to generate optimal merge trees is: at each step choose k subtrees with least length for merging. Exercise 3 proves the optimality of this rule.

Huffman Codes

Another application of binary trees with minimal weighted external path length is to obtain an optimal set of codes for messages M1, ..., Mn+1. Each code is a binary string that is used for transmission of the corresponding message. At the receiving end the code is decoded using a decode tree. A decode tree is a binary tree in which external nodes represent messages.


Figure 4.12 Trees in list of Tree for Example 4.10


Figure 4.13 The simplest binary merge tree

Figure 4.14 Huffman codes

The binary bits in the code word for a message determine the branching needed at each level of the decode tree to reach the correct external node. For example, if we interpret a zero as a left branch and a one as a right branch, then the decode tree of Figure 4.14 corresponds to codes 000, 001, 01, and 1 for messages M1, M2, M3, and M4, respectively. These codes are called Huffman codes. The cost of decoding a code word is proportional to the number of bits in the code. This number is equal to the distance of the corresponding external node from the root node. If q_i is the relative frequency with which message M_i will be transmitted, then the expected decode time is Σ_{1≤i≤n+1} q_i d_i, where d_i is the distance of the external node for message M_i from the root node. The expected decode time is minimized by choosing code words resulting in a decode tree with minimal weighted external path length! Note that Σ_{1≤i≤n+1} q_i d_i is also the expected length of a transmitted message. Hence the code that minimizes expected decode time also minimizes the expected length of a message.
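As an illustration (ours, not from the text), the Python sketch below builds a decode tree greedily, exactly as Tree does, and then reads the code words off the root-to-leaf paths; the function name huffman_codes and the tie-breaking counter are our own choices.

    import heapq
    from itertools import count

    def huffman_codes(freqs):
        # freqs maps each message to its relative frequency q_i.
        # Returns a dict: message -> binary code word.
        tiebreak = count()                    # avoids comparing tree nodes
        heap = [(q, next(tiebreak), m) for m, q in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            q1, _, left = heapq.heappop(heap)   # two least-weight subtrees
            q2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (q1 + q2, next(tiebreak), (left, right)))
        _, _, root = heap[0]
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):         # internal node
                walk(node[0], prefix + "0")     # zero = left branch
                walk(node[1], prefix + "1")     # one  = right branch
            else:
                codes[node] = prefix or "0"     # single-message edge case
        walk(root, "")
        return codes

    # Messages M1..M7 with relative frequencies (4,5,7,8,10,12,20) as in Exercise 4:
    print(huffman_codes({f"M{i+1}": q for i, q in enumerate([4, 5, 7, 8, 10, 12, 20])}))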


EXERCISES

1. Find an optimal binary merge pattern for ten files whose lengths are 28, 32, 12, 5, 84, 53, 91, 35, 3, and 11.

2. (a) Show that if all internal nodes in a tree have degree k, then the number n of external nodes is such that n mod (k - 1) = 1.
   (b) Show that for every n such that n mod (k - 1) = 1, there exists a k-ary tree T with n external nodes (in a k-ary tree all nodes have degree at most k). Also show that all internal nodes of T have degree k.

3. (a) Show that if n mod (k - 1) = 1, then the greedy rule described following Theorem 4.10 generates an optimal k-ary merge tree for all (q1, q2, ..., qn).
   (b) Draw the optimal three-way merge tree obtained using this rule when (q1, q2, ..., qn) = (3, 7, 8, 9, 15, 16, 18, 20, 23, 25, 28).

4. Obtain a set of optimal Huffman codes for the messages (M1, ..., M7) with relative frequencies (q1, ..., q7) = (4, 5, 7, 8, 10, 12, 20). Draw the decode tree for this set of codes.

5. Let T be a decode tree. An optimal decode tree minimizes Σ q_i d_i. For a given set of q's, let D denote all the optimal decode trees. For any tree T ∈ D, let L(T) = max {d_i} and let SL(T) = Σ d_i. Schwartz has shown that there exists a tree T* ∈ D such that L(T*) = min_{T∈D} {L(T)} and SL(T*) = min_{T∈D} {SL(T)}.
   (a) For (q1, ..., q8) = (1, 1, 2, 2, 4, 4, 4, 4) obtain trees T1 and T2 such that L(T1) > L(T2).
   (b) Using the data of (a), obtain T1 and T2 ∈ D such that L(T1) = L(T2) but SL(T1) > SL(T2).
   (c) Show that if the subalgorithm Least used in algorithm Tree is such that in case of a tie it returns the tree with least depth, then Tree generates a tree with the properties of T*.

4.8 SINGLE-SOURCE SHORTEST PATHS

Graphs can be used to represent the highway structure of a state or country with vertices representing cities and edges representing sections of highway. The edges can then be assigned weights which may be either the distance between the two cities connected by the edge or the average time to drive along that section of highway. A motorist wishing to drive from city A to B would be interested in answers to the following questions:


(a) Graph

    Path          Length
    1) 1,4          10
    2) 1,4,5        25
    3) 1,4,5,2      45
    4) 1,3          45

(b) Shortest paths from 1

Figure 4.15 Graph and shortest paths from vertex 1 to all destinations

• Is there a path from A to B?
• If there is more than one path from A to B, which is the shortest path?

The problems defined by these questions are special cases of the path problem we study in this section. The length of a path is now defined to be the sum of the weights of the edges on that path. The starting vertex of the path is referred to as the source, and the last vertex the destination. The graphs are digraphs to allow for one-way streets. In the problem we consider, we are given a directed graph G = (V, E), a weighting function cost for the edges of G, and a source vertex v0. The problem is to determine the shortest paths from v0 to all the remaining vertices of G. It is assumed that all the weights are positive. The shortest path between v0 and some other node v is an ordering among a subset of the edges. Hence this problem fits the ordering paradigm.

Example 4.11 Consider the directed graph of Figure 4.15(a). The numbers on the edges are the weights. If node 1 is the source vertex, then the shortest path from 1 to 2 is 1, 4, 5, 2. The length of this path is 10 + 15 + 20 = 45. Even though there are three edges on this path, it is shorter than the path 1, 2 which is of length 50. There is no path from 1 to 6. Figure 4.15(b) lists the shortest paths from node 1 to nodes 4, 5, 2, and 3, respectively. The paths have been listed in nondecreasing order of path length. □

To formulate a greedy-based algorithm to generate the shortest paths, we must conceive of a multistage solution to the problem and also of an optimization measure. One possibility is to build the shortest paths one by



one. As an optimization measure we can use the sum of the lengths of all paths so far generated. For this measure to be minimized, each individual path must be of minimum length. If we have already constructed i shortest paths, then using this optimization measure, the next path to be constructed should be the next shortest minimum length path. The greedy way (and also a systematic way) to generate the shortest paths from v0 to the remaining vertices is to generate these paths in nondecreasing order of path length. First, a shortest path to the nearest vertex is generated. Then a shortest path to the second nearest vertex is generated, and so on. For the graph of Figure 4.15(a) the nearest vertex to v0 = 1 is 4 (cost[1, 4] = 10). The path 1,4 is the first path generated. The second nearest vertex to node 1 is 5 and the distance between 1 and 5 is 25. The path 1,4,5 is the next path generated. In order to generate the shortest paths in this order, we need to be able to determine (1) the next vertex to which a shortest path must be generated and (2) a shortest path to this vertex. Let S denote the set of vertices (including v0) to which the shortest paths have already been generated. For w not in S, let dist[w] be the length of the shortest path starting from v0, going through only those vertices that are in S, and ending at w. We observe that:

1. If the next shortest path is to vertex u, then the path begins at v0, ends at u, and goes through only those vertices that are in S. To prove this, we must show that all the intermediate vertices on the shortest path to u are in S. Assume there is a vertex w on this path that is not in S. Then, the v0 to u path also contains a path from v0 to w that is of length less than the v0 to u path. By assumption the shortest paths are being generated in nondecreasing order of path length, and so the shorter path v0 to w must already have been generated. Hence, there can be no intermediate vertex that is not in S.

2. The destination of the next path generated must be that vertex u which has the minimum distance, dist[u], among all vertices not in S. This follows from the definition of dist and observation 1. In case there are several vertices not in S with the same dist, then any of these may be selected.

3. Having selected a vertex u as in observation 2 and generated the shortest v0 to u path, vertex u becomes a member of S. At this point the length of the shortest paths starting at v0, going through vertices only in S, and ending at a vertex w not in S may decrease; that is, the value of dist[w] may change. If it does change, then it must be due to a shorter path starting at v0 and going to u and then to w. The intermediate vertices on the v0 to u path and the u to w path must all be in S. Further, the v0 to u path must be the shortest such path; otherwise dist[w] is not defined properly. Also, the u to w path can be chosen so as not to contain any intermediate vertices. Therefore,


we can conclude that if dist[w] is to change (i.e., decrease), then it is because of a path from v0 to u to w, where the path from v0 to u is the shortest such path and the path from u to w is the edge (u, w). The length of this path is dist[u] + cost[u, w].

The above observations lead to a simple Algorithm 4.14 for the single-source shortest path problem. This algorithm (known as Dijkstra's algorithm) only determines the lengths of the shortest paths from v0 to all other vertices in G. The generation of the paths requires a minor extension to this algorithm and is left as an exercise. In the function ShortestPaths (Algorithm 4.14) it is assumed that the n vertices of G are numbered 1 through n. The set S is maintained as a bit array with S[i] = 0 if vertex i is not in S and S[i] = 1 if it is. It is assumed that the graph itself is represented by its cost adjacency matrix with cost[i, j]'s being the weight of the edge (i, j). The weight cost[i, j] is set to some large number, ∞, in case the edge (i, j) is not in E(G). For i = j, cost[i, j] can be set to any nonnegative number without affecting the outcome of the algorithm.

From our earlier discussion, it is easy to see that the algorithm is correct. The time taken by the algorithm on a graph with n vertices is O(n^2). To see this, note that the for loop of line 7 in Algorithm 4.14 takes O(n) time. The for loop of line 12 is executed n - 2 times. Each execution of this loop requires O(n) time at lines 15 and 16 to select the next vertex and again at the for loop of line 18 to update dist. So the total time for this loop is O(n^2). In case a list t of vertices currently not in S is maintained, then the number of nodes on this list would at any time be n - num. This would speed up lines 15 and 16 and the for loop of line 18, but the asymptotic time would remain O(n^2). This and other variations of the algorithm are explored in the exercises.

Any shortest path algorithm must examine each edge in the graph at least once since any of the edges could be in a shortest path. Hence, the minimum possible time for such an algorithm would be Ω(|E|). Since cost adjacency matrices were used to represent the graph, it takes O(n^2) time just to determine which edges are in G, and so any shortest path algorithm using this representation must take Ω(n^2) time. For this representation then, algorithm ShortestPaths is optimal to within a constant factor. If a change to adjacency lists is made, the overall frequency of the for loop of line 18 can be brought down to O(|E|) (since dist can change only for vertices adjacent from u). If V - S is maintained as a red-black tree (see Section 2.4.2), each execution of lines 15 and 16 takes O(log n) time. Note that a red-black tree supports the following operations in O(log n) time: insert, delete (an arbitrary element), find-min, and search (for an arbitrary element). Each update in line 21 takes O(log n) time as well (since an update can be done using a delete and an insertion into the red-black tree). Thus the overall run time is O((n + |E|) log n).


    1    Algorithm ShortestPaths(v, cost, dist, n)
    2    // dist[j], 1 <= j <= n, is set to the length of the shortest
    3    // path from vertex v to vertex j in a digraph G with n
    4    // vertices. dist[v] is set to zero. G is represented by its
    5    // cost adjacency matrix cost[1 : n, 1 : n].
    6    {
    7        for i := 1 to n do
    8        { // Initialize S.
    9            S[i] := false; dist[i] := cost[v, i];
    10       }
    11       S[v] := true; dist[v] := 0.0; // Put v in S.
    12       for num := 2 to n - 1 do
    13       {
    14           // Determine n - 1 paths from v.
    15           Choose u from among those vertices not
    16           in S such that dist[u] is minimum;
    17           S[u] := true; // Put u in S.
    18           for (each w adjacent to u with S[w] = false) do
    19               // Update distances.
    20               if (dist[w] > dist[u] + cost[u, w]) then
    21                   dist[w] := dist[u] + cost[u, w];
    22       }
    23   }

Algorithm 4.14 Greedy algorithm to generate shortest paths
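The following Python sketch (ours, not from the text) is the adjacency-list variant discussed above; a binary heap with lazy deletion stands in for the red-black tree and gives the same O((n + |E|) log n) bound. The graph at the bottom is a small digraph consistent with the paths described in Example 4.11; the edge weights not stated there are our own guesses, and the name dijkstra_lengths is our own.

    import heapq

    def dijkstra_lengths(adj, v0):
        # adj[u] is a list of (w, c) pairs meaning an edge (u, w) of cost c.
        # Returns dist, the length of a shortest path from v0 to each
        # vertex reachable from v0.
        dist = {v0: 0}
        done = set()                      # the set S of the text
        heap = [(0, v0)]
        while heap:
            d, u = heapq.heappop(heap)    # lines 15-16: closest vertex not in S
            if u in done:
                continue                  # stale heap entry, skip it
            done.add(u)                   # put u in S
            for w, c in adj.get(u, []):
                nd = d + c
                if w not in done and nd < dist.get(w, float("inf")):
                    dist[w] = nd          # line 21: update dist[w]
                    heapq.heappush(heap, (nd, w))
        return dist

    adj = {1: [(2, 50), (3, 45), (4, 10)], 2: [(3, 10), (4, 15)],
           4: [(5, 15)], 5: [(2, 20), (3, 35)], 6: [(5, 3)]}
    print(dijkstra_lengths(adj, 1))   # {1: 0, 4: 10, 5: 25, 2: 45, 3: 45}; 6 unreachable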

Example 4.12 Consider the eight vertex digraph of Figure 4.16(a) with cost adjacency matrix as in Figure 4.16(b). The values of dist and the vertices selected at each iteration of the for loop of line 12 in Algorithm 4.14 for finding all the shortest paths from Boston are shown in Figure 4.17. To begin with, S contains only Boston. In the first iteration of the for loop (that is, for num = 2), the city u that is not in S and whose dist[u] is minimum is identified to be New York. New York enters the set S. Also the dist[ ] values of Chicago, Miami, and New Orleans get altered since there are shorter paths to these cities via New York. In the next iteration of the for loop, the city that enters S is Miami since it has the smallest dist[ ] value from among all the nodes not in S. None of the dist[ ] values are altered. The algorithm continues in a similar fashion and terminates when only seven of the eight vertices are in S. By the definition of dist, the distance of the last vertex, in this case Los Angeles, is correct as the shortest path from Boston to Los Angeles can go through only the remaining six vertices. □


(a) Digraph (vertices: Los Angeles, San Francisco, Denver, Chicago, Boston, New York, Miami, New Orleans)

(b) Length-adjacency matrix

Figure 4.16 Figures for Example 4.12

One can easily verify that the edges on the shortest paths from a vertex v to all remaining vertices in a connected undirected graph G form a spanning tree of G. This spanning tree is called a shortest-path spanning tree. Clearly, this spanning tree may be different for different root vertices v. Figure 4.18 shows a graph G, its minimum-cost spanning tree, and a shortest-path spanning tree from vertex 1.


    Iteration  S                  Vertex    LA     SF     DEN    CHI    BOST   NY     MIA    NO
                                  selected  [1]    [2]    [3]    [4]    [5]    [6]    [7]    [8]
    Initial    {5}                --        +∞     +∞     +∞     1500   0      250    +∞     +∞
    1          {5,6}              6         +∞     +∞     +∞     1250   0      250    1150   1650
    2          {5,6,7}            7         +∞     +∞     +∞     1250   0      250    1150   1650
    3          {5,6,7,4}          4         +∞     +∞     2450   1250   0      250    1150   1650
    4          {5,6,7,4,8}        8         3350   +∞     2450   1250   0      250    1150   1650
    5          {5,6,7,4,8,3}      3         3350   3250   2450   1250   0      250    1150   1650
    6          {5,6,7,4,8,3,2}    2         3350   3250   2450   1250   0      250    1150   1650

Figure 4.17 Action of ShortestPaths

EXERCISES

1. Use algorithm ShortestPaths to obtain in nondecreasing order the lengths of the shortest paths from vertex 1 to all remaining vertices in the digraph of Figure 4.19.

2. Using the directed graph of Figure 4.20 explain why ShortestPaths will not work properly. What is the shortest path between vertices v1 and v7?

3. Rewrite algorithm ShortestPaths under the following assumptions:
   (a) G is represented by its adjacency lists. The head nodes are HEAD(1), ..., HEAD(n) and each list node has three fields: VERTEX, COST, and LINK. COST is the length of the corresponding edge and n the number of vertices in G.
   (b) Instead of representing S, the set of vertices to which the shortest paths have already been found, the set T = V(G) - S is represented using a linked list.
   What can you say about the computing time of your new algorithm relative to that of ShortestPaths?

4. Modify algorithm ShortestPaths so that it obtains the shortest paths in addition to the lengths of these paths. What is the computing time of your algorithm?


(a) A graph   (b) Minimum-cost spanning tree   (c) Shortest-path spanning tree from vertex 1

Figure 4.18 Graphs and spanning trees

Figure 4.19 Directed graph


Figure 4.20 Another directed graph

4.9 REFERENCES AND READINGS

The linear time algorithm in Section 4.3 for the tree vertex splitting problem can be found in "Vertex upgrading problems for VLSI," by D. Paik, Ph.D. thesis, Department of Computer Science, University of Minnesota, October 1991.

The two greedy methods for obtaining minimum-cost spanning trees are due to R. C. Prim and J. B. Kruskal, respectively.

An O(e log log v) time spanning tree algorithm has been given by A. C. Yao.

The optimal randomized algorithm for minimum-cost spanning trees presented in this chapter appears in "A randomized linear-time algorithm for finding minimum spanning trees," by P. N. Klein and R. E. Tarjan, in Proceedings of the 26th Annual Symposium on Theory of Computing, 1994, pp. 9-15. See also "A randomized linear-time algorithm to find minimum spanning trees," by D. R. Karger, P. N. Klein, and R. E. Tarjan, Journal of the ACM 42, no. 2 (1995): 321-328.

Proof of Lemma 4.3 can be found in "Verification and sensitivity analysis of minimum spanning trees in linear time," by B. Dixon, M. Rauch, and R. E. Tarjan, SIAM Journal on Computing 21 (1992): 1184-1192, and in "A simple minimum spanning tree verification algorithm," by V. King, Proceedings of the Workshop on Algorithms and Data Structures, 1995.

A very nearly linear time algorithm for minimum-cost spanning trees appears in "Efficient algorithms for finding minimum spanning trees in undirected and directed graphs," by H. N. Gabow, Z. Galil, T. Spencer, and R. E. Tarjan, Combinatorica 6 (1986): 109-122.


A linear time algorithm for minimum-cost spanning trees on a stronger model where the edge weights can be manipulated in their binary form is given in "Trans-dichotomous algorithms for minimum spanning trees and shortest paths," by M. Fredman and D. E. Willard, in Proceedings of the 31st Annual Symposium on Foundations of Computer Science, 1990, pp. 719-725.

The greedy method developed here to optimally store programs on tapes was first devised for a machine scheduling problem. In this problem n jobs have to be scheduled on m processors. Job i takes t_i amount of time. The time at which a job finishes is the sum of the job times for all jobs preceding and including job i. The average finish time corresponds to the mean access time for programs on tapes. The (m!)^{n/m} schedules referred to in Theorem 4.9 are known as SPT (shortest processing time) schedules. The rule to generate SPT schedules as well as the rule of Exercise 4 (Section 4.6) are due to W. E. Smith.

The greedy algorithm for generating optimal merge trees is due to D. Huffman.

For a given set {q1, ..., qn} there are many sets of Huffman codes minimizing Σ q_i d_i. From amongst these code sets there is one that has minimum Σ d_i and minimum max {d_i}. An algorithm to obtain this code set was given by E. S. Schwartz.

The shortest-path algorithm of the text is due to E. W. Dijkstra.

For planar graphs, the shortest-path problem can be solved in linear time as has been shown in "Faster shortest-path algorithms for planar graphs," by P. Klein, S. Rao, and M. Rauch, in Proceedings of the ACM Symposium on Theory of Computing, 1994.

The relationship between greedy methods and matroids is discussed in Combinatorial Optimization, by E. Lawler, Holt, Rinehart and Winston, 1976.

4.10 ADDITIONAL EXERCISES

1. [Coin changing] Let An = {a1, a2, ..., an} be a finite set of distinct coin types (for example, a1 = 50¢, a2 = 25¢, a3 = 10¢, and so on.) We can assume each a_i is an integer and a1 > a2 > ... > an. Each type is available in unlimited quantity. The coin-changing problem is to make up an exact amount C using a minimum total number of coins. C is an integer > 0.


(a) Show that if an ≠ 1, then there exists a finite set of coin types and a C for which there is no solution to the coin-changing problem.

(b) Show that there is always a solution when an = 1.

(c) When an = 1, a greedy solution to the problem makes change by using the coin types in the order a1, a2, ..., an. When coin type a_i is being considered, as many coins of this type as possible are given. Write an algorithm based on this strategy. Show that this algorithm doesn't necessarily generate solutions that use the minimum total number of coins.

(d) Show that if An = {k^{n-1}, k^{n-2}, ..., k^0} for some k > 1, then the greedy method of part (c) always yields solutions with a minimum number of coins.

2. [Set cover] You are given a family S of m sets S_i, 1 ≤ i ≤ m. Denote by |A| the size of set A. Let |S_i| = j_i, that is, S_i = {s1, s2, ..., s_{j_i}}. A subset T = {T1, T2, ..., Tk} of S is a family of sets such that for each i, 1 ≤ i ≤ k, T_i = S_r for some r, 1 ≤ r ≤ m. The subset T is a cover of S iff ∪ T_i = ∪ S_i. The size of T, |T|, is the number of sets in T. A minimum cover of S is a cover of smallest size. Consider the following greedy strategy: build T iteratively; at the kth iteration T = {T1, ..., T_{k-1}}; now add to T a set S_j from S that contains the largest number of elements not already in T, and stop when ∪ T_i = ∪ S_i.

   (a) Assume that ∪ S_i = {1, 2, ..., n} and m ≤ n. Using the strategy outlined above, write an algorithm to obtain set covers. How much time and space does your algorithm require?

   (b) Show that the greedy strategy above doesn't necessarily obtain a minimum set cover.

   (c) Suppose now that a minimum cover is defined to be one for which Σ_{i=1}^{k} |T_i| is minimum. Does the above strategy always find a minimum cover?

3. [Node cover] Let G = (V, E) be an undirected graph. A node cover of G is a subset U of the vertex set V such that every edge in E is incident to at least one vertex in U. A minimum node cover is one with the fewest number of vertices. Consider the following greedy algorithm for this problem:


    1    Algorithm Cover(V, E)
    2    {
    3        U := ∅;
    4        repeat
    5        {
    6            Let q be a vertex from V of maximum degree;
    7            Add q to U; Eliminate q from V;
    8            E := E - {(x, y) such that x = q or y = q};
    9        } until (E = ∅); // U is the node cover.
    10   }

Does this algorithm always generate a minimum node cover?

4. [Traveling salesperson] Let G be a directed graph with n vertices. Let length(u, v) be the length of the edge (u, v). A path starting at a given vertex v0, going through every other vertex exactly once, and finally returning to v0 is called a tour. The length of a tour is the sum of the lengths of the edges on the path defining the tour. We are concerned with finding a tour of minimum length. A greedy way to construct such a tour is: let (P, v) represent the path so far constructed; it starts at v0 and ends at v. Initially P is empty and v = v0. If all vertices in G are on P, then include the edge (v, v0) and stop; otherwise include an edge (v, w) of minimum length among all edges from v to a vertex w not on P. Show that this greedy method doesn't necessarily generate a minimum-length tour.


Chapter 5

DYNAMIC PROGRAMMING

5.1 THE GENERAL METHOD

Dynamic programming is an algorithm design method that can be used when the solution to a problem can be viewed as the result of a sequence of decisions. In earlier chapters we saw many problems that can be viewed this way. Here are some examples:

Example 5.1 [Knapsack] The solution to the knapsack problem (Section 4.2) can be viewed as the result of a sequence of decisions. We have to decide the values of x_i, 1 ≤ i ≤ n. First we make a decision on x1, then on x2, then on x3, and so on. An optimal sequence of decisions maximizes the objective function Σ p_i x_i. (It also satisfies the constraints Σ w_i x_i ≤ m and 0 ≤ x_i ≤ 1.) □

Example 5.2 [Optimal merge patterns] This problem was discussed in Section 4.7. An optimal merge pattern tells us which pair of files should be merged at each step. As a decision sequence, the problem calls for us to decide which pair of files should be merged first, which pair second, which pair third, and so on. An optimal sequence of decisions is a least-cost sequence. □

Example 5.3 [Shortest path] One way to find a shortest path from vertex i to vertex j in a directed graph G is to decide which vertex should be the second vertex, which the third, which the fourth, and so on, until vertex j is reached. An optimal sequence of decisions is one that results in a path of least length. □


For some of the problems that may be viewed in this way, an optimal sequence of decisions can be found by making the decisions one at a time and never making an erroneous decision. This is true for all problems solvable by the greedy method. For many other problems, it is not possible to make stepwise decisions (based only on local information) in such a manner that the sequence of decisions made is optimal.

Example 5.4 [Shortest path] Suppose we wish to find a shortest path from vertex i to vertex j. Let A_i be the vertices adjacent from vertex i. Which of the vertices in A_i should be the second vertex on the path? There is no way to make a decision at this time and guarantee that future decisions leading to an optimal sequence can be made. If on the other hand we wish to find a shortest path from vertex i to all other vertices in G, then at each step, a correct decision can be made (see Section 4.8). □

One way to solve problems for which it is not possible to make a sequence of stepwise decisions leading to an optimal decision sequence is to try all possible decision sequences. We could enumerate all decision sequences and then pick out the best. But the time and space requirements may be prohibitive. Dynamic programming often drastically reduces the amount of enumeration by avoiding the enumeration of some decision sequences that cannot possibly be optimal. In dynamic programming an optimal sequence of decisions is obtained by making explicit appeal to the principle of optimality.

Definition 5.1 [Principle of optimality] The principle of optimality states that an optimal sequence of decisions has the property that whatever the initial state and decision are, the remaining decisions must constitute an optimal decision sequence with regard to the state resulting from the first decision. □

Thus, the essential difference between the greedy method and dynamic programming is that in the greedy method only one decision sequence is ever generated. In dynamic programming, many decision sequences may be generated. However, sequences containing suboptimal subsequences cannot be optimal (if the principle of optimality holds) and so will not (as far as possible) be generated.

Example 5.5 [Shortest path] Consider the shortest-path problem of Example 5.3. Assume that i, i1, i2, ..., ik, j is a shortest path from i to j. Starting with the initial vertex i, a decision has been made to go to vertex i1. Following this decision, the problem state is defined by vertex i1 and we need to find a path from i1 to j. It is clear that the sequence i1, i2, ..., ik, j must constitute a shortest i1 to j path. If not, let i1, r1, r2, ..., rq, j be a shortest i1 to j path. Then i, i1, r1, ..., rq, j is an i to j path that is shorter than the path i, i1, i2, ..., ik, j. Therefore the principle of optimality applies for this problem. □


Example 5.6 [0/1 knapsack] The 0/1 knapsack problem is similar to the knapsack problem of Section 4.2 except that the x_i's are restricted to have a value of either 0 or 1. Using KNAP(l, j, y) to represent the problem

    maximize   Σ_{l≤i≤j} p_i x_i
    subject to Σ_{l≤i≤j} w_i x_i ≤ y                                    (5.1)
               x_i = 0 or 1,  l ≤ i ≤ j

the knapsack problem is KNAP(1, n, m). Let y1, y2, ..., yn be an optimal sequence of 0/1 values for x1, x2, ..., xn, respectively. If y1 = 0, then y2, y3, ..., yn must constitute an optimal sequence for the problem KNAP(2, n, m). If it does not, then y1, y2, ..., yn is not an optimal sequence for KNAP(1, n, m). If y1 = 1, then y2, ..., yn must be an optimal sequence for the problem KNAP(2, n, m - w1). If it isn't, then there is another 0/1 sequence z2, z3, ..., zn such that Σ_{2≤i≤n} w_i z_i ≤ m - w1 and Σ_{2≤i≤n} p_i z_i > Σ_{2≤i≤n} p_i y_i. Hence, the sequence y1, z2, z3, ..., zn is a sequence for (5.1) with greater value. Again the principle of optimality applies. □

Let S0 be the initial problem state. Assume that n decisions d_i, 1 ≤ i ≤ n, have to be made. Let D1 = {r1, r2, ..., rj} be the set of possible decision values for d1. Let S_i be the problem state following the choice of decision r_i, 1 ≤ i ≤ j. Let Γ_i be an optimal sequence of decisions with respect to the problem state S_i. Then, when the principle of optimality holds, an optimal sequence of decisions with respect to S0 is the best of the decision sequences r_i, Γ_i, 1 ≤ i ≤ j.

Example 5.7 [Shortest path] Let A_i be the set of vertices adjacent to vertex i. For each vertex k ∈ A_i, let Γ_k be a shortest path from k to j. Then, a shortest i to j path is the shortest of the paths {i, Γ_k | k ∈ A_i}. □

Example 5.8 [0/1 knapsack] Let g_j(y) be the value of an optimal solution to KNAP(j + 1, n, y). Clearly, g_0(m) is the value of an optimal solution to KNAP(1, n, m). The possible decisions for x1 are 0 and 1 (D1 = {0, 1}). From the principle of optimality it follows that

    g_0(m) = max {g_1(m), g_1(m - w1) + p1}                            (5.2)    □

While the principle of optimality has been stated only with respect to the initial state and decision, it can be applied equally well to intermediate states and decisions. The next two examples show how this can be done.

Example 5.9 [Shortest path] Let k be an intermediate vertex on a shortest i to j path i, i1, i2, ..., k, p1, p2, ..., j. The paths i, i1, ..., k and k, p1, ..., j must, respectively, be shortest i to k and k to j paths. □


Example 5.10 [0/1 knapsack] Let y1, y2, ..., yn be an optimal solution to KNAP(1, n, m). Then, for each j, 1 ≤ j ≤ n, y1, ..., yj and y_{j+1}, ..., yn must be optimal solutions to the problems KNAP(1, j, Σ_{1≤i≤j} w_i y_i) and KNAP(j + 1, n, m - Σ_{1≤i≤j} w_i y_i), respectively. This observation allows us to generalize (5.2) to

    g_i(y) = max {g_{i+1}(y), g_{i+1}(y - w_{i+1}) + p_{i+1}}          (5.3)    □

The recursive application of the optimality principle results in a recurrence equation of type (5.3). Dynamic programming algorithms solve this recurrence to obtain a solution to the given problem instance. The recurrence (5.3) can be solved using the knowledge g_n(y) = 0 for all y ≥ 0 and g_n(y) = -∞ for y < 0. From g_n(y), one can obtain g_{n-1}(y) using (5.3) with i = n - 1. Then, using g_{n-1}(y), one can obtain g_{n-2}(y). Repeating in this way, one can determine g_1(y) and finally g_0(m) using (5.3) with i = 0.

Example 5.11 [0/1 knapsack] Consider the case in which n = 3, w1 = 2, w2 = 3, w3 = 4, p1 = 1, p2 = 2, p3 = 5, and m = 6. We have to compute g_0(6). The value of g_0(6) = max {g_1(6), g_1(4) + 1}. In turn, g_1(6) = max {g_2(6), g_2(3) + 2}. But g_2(6) = max {g_3(6), g_3(2) + 5} = max {0, 5} = 5. Also, g_2(3) = max {g_3(3), g_3(3 - 4) + 5} = max {0, -∞} = 0. Thus, g_1(6) = max {5, 2} = 5. Similarly, g_1(4) = max {g_2(4), g_2(4 - 3) + 2}. But g_2(4) = max {g_3(4), g_3(4 - 4) + 5} = max {0, 5} = 5. The value of g_2(1) = max {g_3(1), g_3(1 - 4) + 5} = max {0, -∞} = 0. Thus, g_1(4) = max {5, 0} = 5.

Therefore, g_0(6) = max {5, 5 + 1} = 6. □

Example 5.12 [Shortest path] Let P_j be the set of vertices adjacent to vertex j (that is, k ∈ P_j iff (k, j) ∈ E(G)). For each k ∈ P_j, let Γ_k be a shortest i to k path. The principle of optimality holds and a shortest i to j path is the shortest of the paths {Γ_k, j | k ∈ P_j}.

To obtain this formulation, we started at vertex j and looked at the last decision made. The last decision was to use one of the edges (k, j), k ∈ P_j. In a sense, we are looking backward on the i to j path. □

Example 5.13 [0/1 knapsack] Looking backward on the sequence of decisions x1, x2, ..., xn, we see that

    f_j(y) = max {f_{j-1}(y), f_{j-1}(y - w_j) + p_j}                  (5.4)

where f_j(y) is the value of an optimal solution to KNAP(1, j, y).


The value of an optimal solution to KNAP(1, n, m) is f_n(m). Equation 5.4 can be solved by beginning with f_0(y) = 0 for all y, y ≥ 0, and f_0(y) = -∞ for all y, y < 0. From this, f_1, f_2, ..., f_n can be successively obtained. □

The solution method outlined in Examples 5.12 and 5.13 may indicate that one has to look at all possible decision sequences to obtain an optimal decision sequence using dynamic programming. This is not the case. Because of the use of the principle of optimality, decision sequences containing subsequences that are suboptimal are not considered. Although the total number of different decision sequences is exponential in the number of decisions (if there are d choices for each of the n decisions to be made then there are d^n possible decision sequences), dynamic programming algorithms often have a polynomial complexity.

Another important feature of the dynamic programming approach is that optimal solutions to subproblems are retained so as to avoid recomputing their values. The use of these tabulated values makes it natural to recast the recursive equations into an iterative algorithm. Most of the dynamic programming algorithms in this chapter are expressed in this way.

The remaining sections of this chapter apply dynamic programming to a variety of problems. These examples should help you understand the method better and also realize the advantage of dynamic programming over explicitly enumerating all decision sequences.

EXERCISES

1. The principle of optimality does not hold for every problem whose solution can be viewed as the result of a sequence of decisions. Find two problems for which the principle does not hold. Explain why the principle does not hold for these problems.

2. For the graph of Figure 5.1, find the shortest path between the nodes 1 and 2. Use the recurrence relations derived in Examples 5.10 and 5.13.

5.2 MULTISTAGE GRAPHS

A multistage graph G = (V, E) is a directed graph in which the vertices are partitioned into k ≥ 2 disjoint sets V_i, 1 ≤ i ≤ k. In addition, if (u, v) is an edge in E, then u ∈ V_i and v ∈ V_{i+1} for some i, 1 ≤ i < k. The sets V_1 and V_k are such that |V_1| = |V_k| = 1. Let s and t, respectively, be the vertices in V_1 and V_k. The vertex s is the source, and t the sink. Let c(i, j) be the cost of edge (i, j). The cost of a path from s to t is the sum of the costs of the edges on the path. The multistage graph problem is to find a minimum-cost


Figure 5.1 Graph for Exercise 2 (Section 5.1)

path from s to t. Each set V_i defines a stage in the graph. Because of the constraints on E, every path from s to t starts in stage 1, goes to stage 2, then to stage 3, then to stage 4, and so on, and eventually terminates in stage k. Figure 5.2 shows a five-stage graph. A minimum-cost s to t path is indicated by the broken edges.

Many problems can be formulated as multistage graph problems. We give only one example. Consider a resource allocation problem in which n units of resource are to be allocated to r projects. If j, 0 ≤ j ≤ n, units of the resource are allocated to project i, then the resulting net profit is N(i, j). The problem is to allocate the resource to the r projects in such a way as to maximize total net profit. This problem can be formulated as an r + 1 stage graph problem as follows. Stage i, 1 ≤ i ≤ r, represents project i. There are n + 1 vertices V(i, j), 0 ≤ j ≤ n, associated with stage i, 2 ≤ i ≤ r. Stages 1 and r + 1 each have one vertex, V(1, 0) = s and V(r + 1, n) = t, respectively. Vertex V(i, j), 2 ≤ i ≤ r, represents the state in which a total of j units of resource have been allocated to projects 1, 2, ..., i - 1. The edges in G are of the form (V(i, j), V(i + 1, l)) for all j ≤ l and 1 ≤ i < r. The edge (V(i, j), V(i + 1, l)), j ≤ l, is assigned a weight or cost of N(i, l - j) and corresponds to allocating l - j units of resource to project i, 1 ≤ i < r. In addition, G has edges of the type (V(r, j), V(r + 1, n)). Each such edge is assigned a weight of max_{0≤p≤n-j} {N(r, p)}. The resulting graph for a three-project problem with n = 4 is shown in Figure 5.3. It should be easy to see that an optimal allocation of resources is defined by a maximum cost s to t path. This is easily converted into a minimum-cost problem by changing the sign of all the edge costs.


Figure 5.2 Five-stage graph

A dynamic programming formulation for a k-stage graph problem is obtained by first noticing that every s to t path is the result of a sequence of k - 2 decisions. The ith decision involves determining which vertex in V_{i+1}, 1 ≤ i ≤ k - 2, is to be on the path. It is easy to see that the principle of optimality holds. Let p(i, j) be a minimum-cost path from vertex j in V_i to vertex t. Let cost(i, j) be the cost of this path. Then, using the forward approach, we obtain

    cost(i, j) = min_{l ∈ V_{i+1}, (j,l) ∈ E} {c(j, l) + cost(i + 1, l)}            (5.5)

Since cost(k - 1, j) = c(j, t) if (j, t) ∈ E and cost(k - 1, j) = ∞ if (j, t) ∉ E, (5.5) may be solved for cost(1, s) by first computing cost(k - 2, j) for all j ∈ V_{k-2}, then cost(k - 3, j) for all j ∈ V_{k-3}, and so on, and finally cost(1, s). Trying this out on the graph of Figure 5.2, we obtain

    cost(3, 6) = min {6 + cost(4, 9), 5 + cost(4, 10)} = 7
    cost(3, 7) = min {4 + cost(4, 9), 3 + cost(4, 10)} = 5


X = max {N(3, 0), N(3, 1)};  Y = max {N(3, 0), N(3, 1), N(3, 2)}

Figure 5.3 Four-stage graph corresponding to a three-project problem


    cost(3, 8) = 7
    cost(2, 2) = min {4 + cost(3, 6), 2 + cost(3, 7), 1 + cost(3, 8)} = 7
    cost(2, 3) = 9
    cost(2, 4) = 18
    cost(2, 5) = 15
    cost(1, 1) = min {9 + cost(2, 2), 7 + cost(2, 3), 3 + cost(2, 4), 2 + cost(2, 5)} = 16

Note that in the calculation of cost(2, 2), we have reused the values of cost(3, 6), cost(3, 7), and cost(3, 8) and so avoided their recomputation. A minimum cost s to t path has a cost of 16. This path can be determined easily if we record the decision made at each state (vertex). Let d(i, j) be the value of l (where l is a node) that minimizes c(j, l) + cost(i + 1, l) (see Equation 5.5). For Figure 5.2 we obtain

    d(3, 6) = 10;  d(3, 7) = 10;  d(3, 8) = 10
    d(2, 2) = 7;   d(2, 3) = 6;   d(2, 4) = 8;   d(2, 5) = 8
    d(1, 1) = 2

Let the minimum-cost path be s = 1, v2, v3, ..., v_{k-1}, t. It is easy to see that v2 = d(1, 1) = 2, v3 = d(2, d(1, 1)) = 7, and v4 = d(3, d(2, d(1, 1))) = d(3, 7) = 10.

Before writing an algorithm to solve (5.5) for a general k-stage graph, let us impose an ordering on the vertices in V. This ordering makes it easier to write the algorithm. We require that the n vertices in V are indexed 1 through n. Indices are assigned in order of stages. First, s is assigned index 1, then vertices in V_2 are assigned indices, then vertices from V_3, and so on. Vertex t has index n. Hence, indices assigned to vertices in V_{i+1} are bigger than those assigned to vertices in V_i (see Figure 5.2). As a result of this indexing scheme, cost and d can be computed in the order n - 1, n - 2, ..., 1. The first subscript in cost, p, and d only identifies the stage number and is omitted in the algorithm. The resulting algorithm, in pseudocode, is FGraph (Algorithm 5.1).

The complexity analysis of the function FGraph is fairly straightforward. If G is represented by its adjacency lists, then r in line 9 of Algorithm 5.1 can be found in time proportional to the degree of vertex j. Hence, if G has |E| edges, then the time for the for loop of line 7 is Θ(|V| + |E|). The time for the for loop of line 16 is Θ(k). Hence, the total time is Θ(|V| + |E|). In addition to the space needed for the input, space is needed for cost[ ], d[ ], and p[ ].



    1    Algorithm FGraph(G, k, n, p)
    2    // The input is a k-stage graph G = (V, E) with n vertices
    3    // indexed in order of stages. E is a set of edges and c[i, j]
    4    // is the cost of (i, j). p[1 : k] is a minimum-cost path.
    5    {
    6        cost[n] := 0.0;
    7        for j := n - 1 to 1 step -1 do
    8        { // Compute cost[j].
    9            Let r be a vertex such that (j, r) is an edge
    10           of G and c[j, r] + cost[r] is minimum;
    11           cost[j] := c[j, r] + cost[r];
    12           d[j] := r;
    13       }
    14       // Find a minimum-cost path.
    15       p[1] := 1; p[k] := n;
    16       for j := 2 to k - 1 do p[j] := d[p[j - 1]];
    17   }

Algorithm 5.1 Multistage graph pseudocode corresponding to the forward approach
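The sketch below is a direct Python rendering of the forward approach (ours, not part of the text); the small four-stage graph at the bottom is made up for illustration and is not Figure 5.2.

    INF = float("inf")

    def multistage_forward(n, edges):
        # Vertices are 1..n, indexed in order of stages (1 = s, n = t).
        # edges[j] is a list of (r, c) pairs: an edge (j, r) of cost c.
        # Returns (cost of a minimum-cost s-to-t path, the path itself).
        cost = [INF] * (n + 1)
        d = [0] * (n + 1)
        cost[n] = 0.0
        for j in range(n - 1, 0, -1):              # compute cost[j]
            for r, c in edges.get(j, []):
                if c + cost[r] < cost[j]:
                    cost[j], d[j] = c + cost[r], r # remember the decision
        path = [1]
        while path[-1] != n:                       # recover p[1..k]
            path.append(d[path[-1]])
        return cost[1], path

    # A made-up four-stage graph with vertices 1..6:
    edges = {1: [(2, 1), (3, 4)], 2: [(4, 2), (5, 5)],
             3: [(4, 6), (5, 3)], 4: [(6, 4)], 5: [(6, 2)]}
    print(multistage_forward(6, edges))   # (7.0, [1, 2, 4, 6])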

The multistage graph problem can also be solved using the backward approach. Let bp(i, j) be a minimum-cost path from vertex s to a vertex j in V_i. Let bcost(i, j) be the cost of bp(i, j). From the backward approach we obtain

    bcost(i, j) = min_{l ∈ V_{i-1}, (l,j) ∈ E} {bcost(i - 1, l) + c(l, j)}            (5.6)

Since bcost(2, j) = c(1, j) if (1, j) ∈ E and bcost(2, j) = ∞ if (1, j) ∉ E, bcost(i, j) can be computed using (5.6) by first computing bcost for i = 3, then for i = 4, and so on. For the graph of Figure 5.2, we obtain

    bcost(3, 6) = min {bcost(2, 2) + c(2, 6), bcost(2, 3) + c(3, 6)}
                = min {9 + 4, 7 + 2} = 9
    bcost(3, 7) = 11
    bcost(3, 8) = 10
    bcost(4, 9) = 15


    bcost(4, 10) = 14
    bcost(4, 11) = 16
    bcost(5, 12) = 16

The corresponding algorithm, in pseudocode, to obtain a minimum-cost s to t path is BGraph (Algorithm 5.2). The first subscript on bcost, p, and d are omitted for the same reasons as before. This algorithm has the same complexity as FGraph provided G is now represented by its inverse adjacency lists (i.e., for each vertex v we have a list of vertices w such that (w, v) ∈ E).

    1    Algorithm BGraph(G, k, n, p)
    2    // Same function as FGraph
    3    {
    4        bcost[1] := 0.0;
    5        for j := 2 to n do
    6        { // Compute bcost[j].
    7            Let r be such that (r, j) is an edge of
    8            G and bcost[r] + c[r, j] is minimum;
    9            bcost[j] := bcost[r] + c[r, j];
    10           d[j] := r;
    11       }
    12       // Find a minimum-cost path.
    13       p[1] := 1; p[k] := n;
    14       for j := k - 1 to 2 do p[j] := d[p[j + 1]];
    15   }

Algorithm 5.2 Multistage graph pseudocode corresponding to backward approach

It should be easy to see that both FGraph and BGraph work correctly even on a more generalized version of multistage graphs. In this generalization, the graph is permitted to have edges (u, v) such that u ∈ V_i, v ∈ V_j, and i < j.

Note: In the pseudocodes FGraph and BGraph, bcost(i, j) is set to ∞ for any (i, j) ∉ E. When programming these pseudocodes, one could use the maximum allowable floating point number for ∞. If the weight of any such edge is added to some other costs, a floating point overflow might occur. Care should be taken to avoid such overflows.


EXERCISES

1. Find a minimum-cost path from s to t in the multistage graph of Figure 5.4. Do this first using the forward approach and then using the backward approach.

Figure 5.4 Multistage graph for Exercise 1

2. Refine Algorithm 5.1 into a program. Assume that G is represented by its adjacency lists. Test the correctness of your code using suitable graphs.

3. Program Algorithm 5.1. Assume that G is an array G[1 : e, 1 : 3]. Each edge (i, j), i < j, of G is stored in G[q], for some q, and G[q, 1] = i, G[q, 2] = j, and G[q, 3] = cost of edge (i, j). Assume that G[q, 1] ≤ G[q + 1, 1] for 1 ≤ q < e, where e is the number of edges in the multistage graph. Test the correctness of your function using suitable multistage graphs. What is the time complexity of your function?

4. Program Algorithm 5.2 for the multistage graph problem using the backward approach. Assume that the graph is represented using inverse adjacency lists. Test its correctness. What is its complexity?

5. Do Exercise 4 using the graph representation of Exercise 3. This time, however, assume that G[q, 2] ≤ G[q + 1, 2] for 1 ≤ q < e.

6. Extend the discussion of this section to directed acyclic graphs (dags). Suppose the vertices of a dag are numbered so that all edges have the form (i, j), i < j. What changes, if any, need to be made to Algorithm 5.1 to find the length of the longest path from vertex 1 to vertex n?


7. [W. Miller] Show that BGraph1 computes shortest paths for directed acyclic graphs represented by adjacency lists (instead of inverse adjacency lists as in BGraph).

    1    Algorithm BGraph1(G, n)
    2    {
    3        bcost[1] := 0.0;
    4        for j := 2 to n do bcost[j] := ∞;
    5        for j := 1 to n - 1 do
    6            for each r such that (j, r) is an edge of G do
    7                bcost[r] := min(bcost[r], bcost[j] + c[j, r]);
    8    }

Note: There is a possibility of a floating point overflow in this function. In such cases the program should be suitably modified.

5.3 ALL-PAIRS SHORTEST PATHS

Let G = (V, E) be a directed graph with n vertices. Let cost be a cost adjacency matrix for G such that cost(i, i) = 0, 1 ≤ i ≤ n. Then cost(i, j) is the length (or cost) of edge (i, j) if (i, j) ∈ E(G) and cost(i, j) = ∞ if i ≠ j and (i, j) ∉ E(G). The all-pairs shortest-path problem is to determine a matrix A such that A(i, j) is the length of a shortest path from i to j. The matrix A can be obtained by solving n single-source problems using the algorithm ShortestPaths of Section 4.8. Since each application of this procedure requires O(n^2) time, the matrix A can be obtained in O(n^3) time. We obtain an alternate O(n^3) solution to this problem using the principle of optimality. Our alternate solution requires a weaker restriction on edge costs than required by ShortestPaths. Rather than require cost(i, j) ≥ 0, for every edge (i, j), we only require that G have no cycles with negative length. Note that if we allow G to contain a cycle of negative length, then the shortest path between any two vertices on this cycle has length -∞.

Let us examine a shortest i to j path in G, i ≠ j. This path originates at vertex i and goes through some intermediate vertices (possibly none) and terminates at vertex j. We can assume that this path contains no cycles for if there is a cycle, then this can be deleted without increasing the path length (no cycle has negative length). If k is an intermediate vertex on this shortest path, then the subpaths from i to k and from k to j must be shortest paths from i to k and k to j, respectively. Otherwise, the i to j path is not of minimum length. So, the principle of optimality holds. This alerts us to the prospect of using dynamic programming. If k is the intermediate vertex with highest index, then the i to k path is a shortest i to k path in G going through no vertex with index greater than k - 1. Similarly the k to j path is a shortest k to j path in G going through no vertex of index greater than


k - 1. We can regard the construction of a shortest i to j path as first requiring a decision as to which is the highest indexed intermediate vertex k. Once this decision has been made, we need to find two shortest paths, one from i to k and the other from k to j. Neither of these may go through a vertex with index greater than k - 1. Using A^k(i, j) to represent the length of a shortest path from i to j going through no vertex of index greater than k, we obtain

    A(i, j) = min { min_{1≤k≤n} {A^{k-1}(i, k) + A^{k-1}(k, j)}, cost(i, j) }            (5.7)

Clearly, A^0(i, j) = cost(i, j), 1 ≤ i ≤ n, 1 ≤ j ≤ n. We can obtain a recurrence for A^k(i, j) using an argument similar to that used before. A shortest path from i to j going through no vertex higher than k either goes through vertex k or it does not. If it does, A^k(i, j) = A^{k-1}(i, k) + A^{k-1}(k, j). If it does not, then no intermediate vertex has index greater than k - 1. Hence A^k(i, j) = A^{k-1}(i, j). Combining, we get

    A^k(i, j) = min {A^{k-1}(i, j), A^{k-1}(i, k) + A^{k-1}(k, j)},  k ≥ 1            (5.8)

The following example shows that (5.8) is not true for graphs with cycles of negative length.

Example 5.14 Figure 5.5 shows a digraph together with its matrix A^0. For this graph A^2(1, 3) ≠ min {A^1(1, 3), A^1(1, 2) + A^1(2, 3)} = 2. Instead we see that A^2(1, 3) = -∞. The length of the path 1, 2, 1, 2, 1, 2, ..., 1, 2, 3 can be made arbitrarily small. This is so because of the presence of the cycle 1 2 1 which has a length of -1. □

Recurrence (5.8) can be solved for A^n by first computing A^1, then A^2, then A^3, and so on. Since there is no vertex in G with index greater than n, A(i, j) = A^n(i, j). Function AllPaths computes A^n(i, j). The computation is done in place so the superscript on A is not needed. The reason this computation can be carried out in-place is that A^k(i, k) = A^{k-1}(i, k) and A^k(k, j) = A^{k-1}(k, j). Hence, when A^k is formed, the kth column and row do not change. Consequently, when A^k(i, j) is computed in line 11 of Algorithm 5.3, A(i, k) = A^{k-1}(i, k) = A^k(i, k) and A(k, j) = A^{k-1}(k, j) = A^k(k, j). So, the old values on which the new values are based do not change on this iteration.


    A^0 = |  0   1   ∞ |
          | -2   0   1 |
          |  ∞   ∞   0 |

Figure 5.5 Graph with negative cycle

    0    Algorithm AllPaths(cost, A, n)
    1    // cost[1 : n, 1 : n] is the cost adjacency matrix of a graph with
    2    // n vertices; A[i, j] is the cost of a shortest path from vertex
    3    // i to vertex j. cost[i, i] = 0.0, for 1 <= i <= n.
    4    {
    5        for i := 1 to n do
    6            for j := 1 to n do
    7                A[i, j] := cost[i, j]; // Copy cost into A.
    8        for k := 1 to n do
    9            for i := 1 to n do
    10               for j := 1 to n do
    11                   A[i, j] := min(A[i, j], A[i, k] + A[k, j]);
    12   }

Algorithm 5.3 Function to compute lengths of shortest paths


Example 5.15 The graph of Figure 5.6(a) has the cost matrix of Figure 5.6(b). The initial A matrix, A^(0), plus its values after 3 iterations, A^(1), A^(2), and A^(3), are given in Figure 5.6. □

(a) Example digraph

    (b) A^0            (c) A^1            (d) A^2            (e) A^3
        1   2   3          1   2   3          1   2   3          1   2   3
    1   0   4  11      1   0   4  11      1   0   4   6      1   0   4   6
    2   6   0   2      2   6   0   2      2   6   0   2      2   5   0   2
    3   3   ∞   0      3   3   7   0      3   3   7   0      3   3   7   0

Figure 5.6 Directed graph and associated matrices

Let M = max {cost(i, j) | (i, j) ∈ E(G)}. It is easy to see that A^n(i, j) ≤ (n - 1)M. From the working of AllPaths, it is clear that if (i, j) ∉ E(G) and i ≠ j, then we can initialize cost(i, j) to any number greater than (n - 1)M (rather than the maximum allowable floating point number). If, at termination, A(i, j) > (n - 1)M, then there is no directed path from i to j in G. Even for this choice of ∞, care should be taken to avoid any floating point overflows.

The time needed by AllPaths (Algorithm 5.3) is especially easy to determine because the looping is independent of the data in the matrix A. Line 11 is iterated n^3 times, and so the time for AllPaths is Θ(n^3). An exercise examines the extensions needed to obtain the i to j paths with these lengths. Some speedup can be obtained by noticing that the innermost for loop need be executed only when A(i, k) and A(k, j) are not equal to ∞.
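A compact Python rendering of AllPaths (ours, using 0-based indexing) can be checked against Example 5.15; the cost matrix below is the one reconstructed in Figure 5.6(b).

    def all_paths(cost):
        # In-place evaluation of recurrence (5.8): after iteration k,
        # A[i][j] is the length of a shortest i-to-j path whose
        # intermediate vertices all have index at most k.
        n = len(cost)
        A = [row[:] for row in cost]          # copy cost into A
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if A[i][k] + A[k][j] < A[i][j]:
                        A[i][j] = A[i][k] + A[k][j]
        return A

    INF = float("inf")
    cost = [[0, 4, 11],
            [6, 0, 2],
            [3, INF, 0]]
    print(all_paths(cost))   # [[0, 4, 6], [5, 0, 2], [3, 7, 0]], i.e., A^3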


EXERCISES

1. (a) Does the recurrence (5.8) hold for the graph of Figure 5.7? Why?
   (b) Why does Equation 5.8 not hold for graphs with cycles of negative length?

Figure 5.7 Graph for Exercise 1

2. Modify the function AllPaths so that a shortest path is output for each pair of vertices (i, j). What are the time and space complexities of the new algorithm?

3. Let A be the adjacency matrix of a directed graph G. Define the transitive closure A+ of A to be a matrix with the property A+(i, j) = 1 iff G has a directed path, containing at least one edge, from vertex i to vertex j. A+(i, j) = 0 otherwise. The reflexive transitive closure A* is a matrix with the property A*(i, j) = 1 iff G has a path, containing zero or more edges, from i to j. A*(i, j) = 0 otherwise.
   (a) Obtain A+ and A* for the directed graph of Figure 5.8.

Figure 5.8 Graph for Exercise 3

   (b) Let A^k(i, j) = 1 iff there is a path with zero or more edges from i to j going through no vertex of index greater than k. Define A^0 in terms of the adjacency matrix A.


   (c) Obtain a recurrence between A^k and A^{k-1} similar to (5.8). Use the logical operators or and and rather than min and +.
   (d) Write an algorithm, using the recurrence of part (c), to find A*. Your algorithm can use only O(n^2) space. What is its time complexity?
   (e) Show that A+ = A × A*, where matrix multiplication is defined as A+(i, j) = ∨_{k=1}^{n} (A(i, k) ∧ A*(k, j)). The operation ∨ is the logical or operation, and ∧ the logical and operation. Hence A+ may be computed from A*.

5.4 SINGLE-SOURCE SHORTEST PATHS: GENERAL WEIGHTS

We now consider the single-source shortest path problem discussed in Section 4.8 when some or all of the edges of the directed graph G may have negative length. ShortestPaths (Algorithm 4.14) does not necessarily give the correct results on such graphs. To see this, consider the graph of Figure 5.9. Let v = 1 be the source vertex. Referring back to Algorithm 4.14, since n = 3, the loop of lines 12 to 22 is iterated just once. Also u = 3 in lines 15 and 16, and so no changes are made to dist[ ]. The algorithm terminates with dist[2] = 7 and dist[3] = 5. The shortest path from 1 to 3 is 1, 2, 3. This path has length 2, which is less than the computed value of dist[3].

Figure5.9Directedgraphwith a negative-lengthedge

When negativeedgelengthsare permitted,we requirethat the graphhave no cycles of negativelength. This is necessaryto ensurethat shortestpathsconsistof a finite numberof edges.For example,in the graphof Figure5.5,the lengthof the shortestpath from vertex 1to vertex3 is \342\200\224oo. Thelengthof the path

1,2,1,2,1,2,---,1,2,3can bemadearbitrarily smallas was shown in Example5.14.

When thereare no cycles of negativelength, there is a shortestpathbetweenany two verticesof an n-vertexgraph that has at most n \342\200\224 1edges

Page 287: Sahni

5.4.SINGLE-SOURCESHORTESTPATHS:GENERAL WEIGHTS271

on it. To seethis, note that a path that has morethan n \342\200\224 1edgesmustrepeatat leastone vertexand hencemust containa cycle.Eliminationofthe cycles from the path resultsin anotherpath with the samesourceanddestination.This path is cycle-freeand has a length that is no morethanthat of the originalpath, as the lengthof the eliminatedcycleswas at leastzero.We can use this observationon the maximumnumberof edgeson acycle-freeshortestpath to obtainan algorithmtodeterminea shortestpathfrom a sourcevertexto all remainingverticesin the graph.As in the caseof ShortestPaths(Algorithm4.14),we computeonly the length,dist[u],ofthe shortestpath from the sourcevertexv tou. An exerciseexaminestheextensionneededto constructthe shortestpaths.

Let disr[u]be the length of a shortestpath from the sourcevertex vto vertexu under the constraintthat the shortestpath containsat most Iedges.Then,distl[u]= cost[v,u],1< u < n. As notedearlier,when thereare no cycles of negativelength,we can limit our searchfor shortestpathsto paths with at most n \342\200\224 1edges.Hence,distn~l[u]is the length of anunrestrictedshortestpath from v to u.

Our goal then is to computedistn~1[u]for all u. This can be doneusing the dynamic programmingmethodology.First,we makethe followingobservations:

1.If the shortestpath from v tou with at most k, k > 1,edgeshas nomorethan k \342\200\224 1edges,then distk[u] \342\200\224 distk~l[u\\.

2.If the shortestpath from v to u with at most k, k > 1,edgeshasexactlyk edges,then it is madeup of a shortestpath from v to somevertexj followed by the edge{j,u).Thepath from v toj has k \342\200\224 1edges,and its length is distk~l[j].All verticesi such that the edge(i,u)is in the grapharecandidatesfor j. Sincewe areinterestedin ashortestpath, the i that minimizesdistk~1[i]+ cost[i,u] is the correctvalue for j.

Theseobservationsresult in the following recurrencefor dist:

distk[u] = min {distk~1[u],min {distk~l[i\\+ cost[i,u]}}i

Thisrecurrencecanbeusedto computedistk from dist1'-1,for k \342\200\224 2,3,...,n \342\200\224 1.

Example5.16Figure5.10gives a seven-vertexgraph, togetherwith thearrays distk,k = 1,...,6.Thesearrays were computedusingthe equationjust given. Forinstance,distk[l]= 0 for all k since1 is the sourcenode.Also, dist1^]= 6, dist1[3] = 5, and distx[A] = 5, sincethereareedgesfrom

Page 288: Sahni

272 CHAPTER5. DYNAMICPROGRAMMING

1to thesenodes.Thedistancedistl\\\\ is oo for the nodes5,6,and 7 sincethereareno edgesto thesefrom 1.

dist2[2]= min {dist1[2],mhij distl[i]+cost[i,2]}= min {6,0+6,5\342\200\224 2,5+oo,oo +oo,oo +oo,oo +oo}= 3

Herethe terms0 +6,5 \342\200\224 2, 5 +oo,oo +oo,oo +oo,and oo + oo correspondto a choiceof i = 1,3,4,5,6,and 7, respectively.Therestof the entriesarecomputedin an analogousmanner. \342\226\241

k

1

23456

1

000000

2631

1

1

1

dis3533333

f*[1..7]4 55 \302\260\302\260

5 55 25 05 05 0

6 7OO OO

4 OO

4 7

4 54 34 3

(a) A directedgraph (b)distk

Figure5.10Shortestpaths with negativeedgelengths

An exerciseshows that if we use the samememory locationdist[u] for

distk[u],k = 1,..., n \342\200\224 1,then the final value of dist[u] is stilldistn~l[u\\.Usingthis fact and the recurrencefor distshown above,we arrive at thepseudocodeof Algorithm 5.4to computethe length of the shortestpathfrom vertexv toeachothervertexof the graph.This algorithmis referredto as the Bellmanand Fordalgorithm.

Each iterationof the for loopof lines7 to 12takes0(n2)timeifadjacency matricesareused and O(e) timeif adjacencylistsareused.Hereeis the numberof edgesin the graph.Theoverall complexityis Ofa3) whenadjacencymatricesareusedand 0(ne)when adjacencylistsareused.Theobservedcomplexityof the shortest-pathalgorithmcan be reducedbynoting that if noneof the distvalues changeon one iterationof the for loopof lines 7 to 12,then none will changeon successiveiterations.So,thisloopcan be rewrittento terminateeitherafter n \342\200\224 1iterationsor after the

Page 289: Sahni

5.4.SINGLE-SOURCESHORTESTPATHS:GENERAL WEIGHTS273

1 AlgorithmBe\\\\manFord(v, cost,dist,n)2 // Single-source/all-destinationsshortest3 // paths with negativeedgecosts4 {5 for i :=1to n do // Initializedist.6 dist[i]:=cost[v,\342\200\242/,'];

7 for k :=2 to n - 1do8 for eachu suchthat

\302\253/\302\273and u has

9 at leastoneincomingedgedo10 for each(i,u) in the graphdo11 if dist[u]> dist[i]+cost[i,u]then12 dist[u]:=dist[i]+cost[i,u];13 }

Algorithm5.4Bellmanand Fordalgorithmto computeshortestpaths

first iterationin which no distvalues are changed,whicheveroccursfirst.Another possibility is to maintaina queueof verticesi whose distvalueschangedon the previousiterationof the for loop.Thesearethe only valuesfor i that needtobeconsideredin line 10duringthe next iteration.Whena queueof thesevalues is maintained,we can rewritethe loopof lines7 to12sothat on eachiteration,a vertexi is removed from the queue,and thedistvalues of allverticesadjacentfrom i areupdatedas in lines11and 12.Verticeswhosedistvahies decreaseas a result of this areaddedto the endof the queueunlessthey are already on it. Theloopterminateswhen thequeuebecomesempty. Thesetwo strategiestoimprove the performanceofBellmanFord areconsideredin the exercises.Otherstrategiesfor improvingperformancearediscussedin Referencesand Readings. \342\226\241

EXERCISES1.Find the shortestpaths from node1to every othernodein the graph

of Figure5.11usingthe Bellmanand Fordalgorithm.2.Provethe correctnessof BellmanFord (Algorithm 5.4).Note that this

algorithmdoesnot faithfully implementthe computationof therecurrence for distk.In fact, for k < n \342\200\224 1,the distvaluesfollowing iterationk of the for loopof lines7 to12may not bedistk.

3.Transform BellmanFord into a program.Assume that graphsarerepresented usingadjacencylistsin which eachnodehas an additionalfield

Page 290: Sahni

274 CHAPTER5. DYNAMICPROGRAMMING

Figure5.11Graph for Exercise1

calledcostthat gives the lengthof the edgerepresentedby that node.As a result of this, thereis no costadjacencymatrix.Generatesometest graphsand test the correctnessof your program.

4. Rewritethe algorithmBellmanFord sothat the loopof lines7 to 12terminateseitherafter n \342\200\224 1 iterationsor after the first iterationinwhich no distvalues arechanged,whicheveroccursfirst.

5. RewriteBellmanFord by replacingthe loopof lines7 to 12with codethat usesa queueof verticesthat may potentially result in a reductionof otherdistvertices.Thisqueueinitially containsall verticesthat areadjacentfrom the sourcevertexv. On eachsuccessiveiterationof thenew loop,a vertex i is removed from the queue (unlessthe queueisempty), and the distvalues to verticesadjacentfrom i areupdatedasin lines11and 12of Algorithm 5.4.When the distvalue of a vertexis reducedbecauseof this, it is addedto the queueunlessit is alreadyon the queue.

(a) Prove that the new algorithmproducesthe sameresultsas theoriginalone.

(b) Show that the complexityof the new algorithmis no morethanthat of the originalone.

6. Comparethe run-time performanceof the Bellmanand Fordalgorithms of the precedingtwo exercisesand that of Algorithm 5.4.Forthis, generatetestgraphsthat will exposethe relativeperformancesofthe threealgorithms.

Page 291: Sahni

5.5.OPTIMAL BINARYSEARCHTREES(*) 275

7. ModifyalgorithmBellmanFord sothat it obtainsthe shortestpaths, inadditionto the lengthsof thesepaths.What is the computingtimeofyour algorithm?

5.5 OPTIMAL BINARYSEARCHTREES (*)

for

( do ) ('while)

( int ) ( if \") (while

(a)

Figure5.12Two possiblebinary searchtrees

Given a fixed set of identifiers,we wish to createa binary searchtree(seeSection2.3)organization.We may expectdifferent binary searchtreesfor the sameidentifierset to have different performancecharacteristics.Thetreeof Figure5.12(a),in the worst case,requiresfour comparisonsto findan identifier,whereasthe treeof Figure5.12(b)requiresonly three.On theaveragethe two treesneed12/5and 11/5comparisons,respectively.Forexample,in the caseof tree(a), it takes1,2,2,3,and 4 comparisons,respectively, to find the identifiersfor,do.while,int,and if. Thus the averagenumberof comparisonsis 1+2+2+3+4_

j^ This calculationassumesthateachidentifieris searchedfor with equalprobability and that no unsuccessfulsearches(i.e.,searchesfor identifiers not in the tree) aremade.

In a generalsituation,we can expectdifferent identifiers to be searchedfor with different frequencies(or probabilities).In addition,we can expectunsuccessfulsearchesalsoto be made. Let us assumethat the given setof identifiers is {01,02,...,an}with ai < 02 < \342\200\242\342\200\242\342\200\242 < an. Let p(i) be theprobability with which we searchfor Oj. Let q(i) be the probability thatthe identifier x beingsearchedfor is such that Oj < x < ai+i,0 < i < n

(assumeoq =\342\200\22400 and an+\\ = +00).Then,X^o<i<n?(*)^s the probability of

Page 292: Sahni

276 CHAPTER5. DYNAMICPROGRAMMING

an unsuccessfulsearch.Clearly, Yl,i<i<nP{^)+ So<i<n</(*) = 1-Given thisdata,we wish to constructan optimalbinary searclTtreefor {a\\,02,...,an}.First, of course,we must be preciseabout what we mean by an optimalbinary searchtree.

In obtaininga costfunction for binary searchtrees,it is useful to add afictitiousnodein placeof every empty subtreein the searchtree.Suchnodes,calledexternalnodes,aredrawn squarein Figure5.13.All othernodesareinternalnodes.If a binary searchtree representsn identifiers,then therewill beexactlyn internalnodesand n + 1(fictitious)externalnodes.Everyinternalnoderepresentsa point wherea successfulsearchmay terminate.Every externalnoderepresentsa point wherean unsuccessfulsearchmayterminate.

(a)

Figure5.13Binary searchtreesof Figure5.12with externalnodesadded

If a successfulsearchterminatesat an internalnodeat level I,then Iiterations of the while loopof Algorithm 2.5areneeded.Hence,the expectedcostcontributionfrom the internalnodefor aj is p(i) * level(aj).

Unsuccessfulsearchesterminatewith t = 0 (i.e.,at an externalnode) inalgorithmISearch(Algorithm 2.5).The identifiers not in the binary searchtreecan be partitionedinto n + 1equivalenceclassesEi,0< i < n. TheclassEq containsall identifiers x such that x < a\\. TheclassEi containsall identifiers x such that aj < x < aj+i,1< i < n. TheclassEn containsall identifiersx, x > an. It is easy toseethat for all identifiers in the sameclassEi,the searchterminatesat the sameexternalnode.Foridentifiers indifferent Ei the searchterminatesat different externalnodes.If the failure

Page 293: Sahni

5.5.OPTIMAL BINARYSEARCHTREES(*) 277

node for Ei is at level Z, then only I \342\200\224 1 iterationsof the while looparemade.Hence,the costcontributionof this node is q(i) * (level(Ei)\342\200\224 1).

The precedingdiscussionleadsto the following formula for the expectedcostof a binary searchtree:

J2 p(i) * level{a,i)+ Y^ q{i)* {level(Ei)- 1) (5.9)l<i<n 0<i<n

We defineanoptimalbinary searchtreefor the identifierset{a\\,02,...,an}tobea binary searchtreefor which (5.9)is minimum.

Example5.17Thepossiblebinary searchtreesfor the identifier set (ai,02^3)= (do,if, while) aregiven if Figure5.14.With equalprobabilitiesp(i) = q(i) = 1/7for alli,we have

cost(treea) = 15/7 cost(treeb) = 13/7cosi(treec) = 15/7 cost(treed) = 15/7cost(tieee) = 15/7

As expected,tree b is optimal. With p(l)= .5,p(2) = .1,p(3) = .05,q(0)= .15,q(l) = .1,q(2)= .05and g(3)= .05we have

cost(treea)cost(tveec)co.s\302\243(tree e)

= 2.65= 1.5= 1.6

cost(treeb) =

cost(treed) == 1.9= 2.05

Forinstance,cost(treea) can be computedas follows. Thecontributionfrom successfulsearchesis 3 * 0.5+2*0.1+0.05\342\200\224 1.75and the contributionfrom unsuccessfulsearchesis 3 * 0.15+ 3 * 0.1+ 2 * 0.05+0.05= 0.90.Allthe othercostscanalsobecalculatedin a similarmanner.Treec is optimalwith this assignmentof p'sand g's. \342\226\241

To apply dynamic programmingto the problemof obtainingan optimalbinary searchtree,we need to view the constructionof sucha treeas theresult of a sequenceof decisionsand then observethat the principleof op-timality holdswhen appliedto the problemstateresultingfrom a decision.A possibleapproachto this would be to makea decisionas to which of theaj'sshouldbe assignedto the rootnodeof the tree. If we choosea^, thenit is clearthat the internalnodesfor a\\, a?,...,afc_i as well as the externalnodesfor the classesEq,Ei,...,E^-iwill liein the left subtreeI of the root.Theremainingnodeswill be in the right subtreer. Define

cost(l)= ^ p(i) * level(aj)+ ^ q(i) *(level(\302\243j)

- 1)Ki<k 0<i<k

Page 294: Sahni

278 CHAPTER5. DYNAMICPROGRAMMING

while

if ^\342\226\241

if

do ) (while

do\342\226\241

(b)

(a)

do

while

(c) (d) (e)

Figure5.14Possiblebinary searchtrees for the identifier set {do,if,while}

Page 295: Sahni

5.5.OPTIMAL BINARYSEARCHTREES(*) 279

and

cost(r)= ^2 P(0* level(aj)+ J^ q(i) * (level(-Ej)\342\200\224 1)k<i<n k<i<n

In both casesthe level is measuredby regardingthe rootof the respectivesubtreeto be at level 1.

( ak )

I r

Figure5.15An optimalbinary searchtreewith roota^

Usingw(i,j)to representthe sum q(i)+ X];=i+i(9(0+p(0)>we obtainthe following as the expectedcostof the searchtree (Figure5.15):

p(k)+cost(l)+cost(r)+w(0,k - 1)+w(k,n) (5.10)If the tree is optimal,then (5.10)must be minimum. Hence,cost(l)

must beminimum over allbinary searchtreescontaininga\\, 02,...,a,k-iandEq,E\\,...,Eie_i.Similarly cost(r)must be minimum.If we usec(i,j)torepresentthe costof an optimalbinary searchtreetij containingaj+i,...,a,jand E{,...,Ej,then for the tree to be optimal,we must have cost(I) =c(0,k\342\200\224 1) and cost(r)= c(k,n).In addition,k must bechosensuchthat

p(k) + c(0,k -1)+c(k,n) +w(0,k - 1)+w(k,n)

is minimum.Hence,for c(0,n) we obtain

c(0,n)=min {c(0,k-1)+c(k,n)+p(k)+w(0,k-l) +w(k,n)} (5.11)l<fe<n

We cangeneralize(5.11)to obtain for any c(i,j)

c(i,j) = min {c(i,k \342\200\224 1)+c(k,j)+p(k)+w(i,k \342\200\224 1)+w(k,j)}i<k<j

Page 296: Sahni

280 CHAPTER5. DYNAMICPROGRAMMING

c(i,j) = mm {c(i,k-1)+c(kj)}+w(i,j) (5.12)i<k<]

Equation5.12can be solved for c(0,n)by first computingallc(i,j)suchthat j \342\200\224 i = 1 (notec(i,i)= 0 and w(i,i)= q(i), 0 < i < n). Next wecan computeallc(i,j)such that j \342\200\224 i = 2,then allc(i,j)with j \342\200\224 i = 3,and so on.If duringthis computationwe recordthe rootr(i,j)of eachtreeUj, then an optimalbinary searchtreecanbeconstructedfrom theser(i,j).Note that r(i,j)is the value of k that minimizes(5.12).Example5.18Let n = 4 and (01,02,03,04)= (do,if, int,while).Letp(l:4) = (3,3,1,1)and g(0 :4) = (2,3,1,1,1).Thep'sand g'shave beenmultipliedby 16for convenience.Initially, we have w(i,i)= q(i),c(i,i)= 0andr(i,i)= 0,0 < i < 4.UsingEquation5.12and the observationw(i,j)=P(j)+ <l(j)+w(i,j-1),we get

u>(0,l)c(0,l)r(0,l)iu(l,2)c(l,2)r(0,2)u;(2,3)c(2,3)r(2,3)u;(3,4)c(3,4)r(3,4)

= p(l)+q{l)+w(0,0)=8= io(0,l)+min{c(0,0)+c(l,l)}= 1= p(2)+<7(2)+ti>(l,l)= 7= u;(l,2) + min {c(l,1)+c(2,2)}= 2= p(3)+g(3)+u;(2,2)= 3= w(2,3) + min {c(2,2)+c(3,3)}= 3= p(4)+<7(4)+ti>(3,3)= 3= io(3,4)+ min {c(3,3) +c(4,4)}= 4

Knowingio(i,i + 1) andc(i,\302\253

+ 1),0 < \302\253 < 4, we can againuse Equation5.12to computew(i,i+2), c(i,i+2), and r(i,i+2), 0 < i < 3.Thisprocesscan be repeateduntil io(0,4),c(0,4),and r(0,4)are obtained.The tableof Figure5.16shows the resultsof this computation.Theboxin row i andcolumnj showsthe valuesof w(j,j +i), c(j,j+i) and r(j,j+i) respectively.Thecomputationis carriedout by row from row 0 to row 4.Fromthe tablewe see that c(0,4)= 32 is the minimum costof a binary searchtree for(01,02,03,04).Therootof tree

\302\24304is 0-2-Hence,the left subtreeis

\302\24301and

the right subtree 4- Tree\302\24301

has roota\\ and subtrees\302\24300

and t\\\\. Tree\302\24324

has root03;its left subtreeis\302\24322

and its right subtree\302\24334. Thus, with the

data in the tableit is possibleto reconstruct\302\243q4- Figure5.17shows

\302\243q4-a

Page 297: Sahni

5.5.OPTIMAL BINARYSEARCHTREES(*) 281

0 12 3 4

w00 = 2coo=0roo =0

Woi = 8c01= 8'\"01

= 1

w02 = 12c02= 19r02 = 1

w03 = 14c03= 25^03 = 2

W04 = 16c04= 32ro4 = 2

wu =3cu = 0r\342\200\236

=0

w12 =7c12=7r12 =2

W|3 = 9C,3= 12'\"13=2

W14 = 11c14= 19r14= 2

W22 = 1c22=0r22 =0w23 =3C23=3^23 =3

w24 =5c24= 8r24 = 3

w33 = 1

c33=0r33 =0w34 =3c34= 3r34=4

w44

c44r44

= 1

= 0= 0

Figure5.16Computationof c(0,4),w(0,4),and r(0,4)

Figure5.17Optimalsearchtreefor Example5.18

Page 298: Sahni

282 CHAPTER5. DYNAMICPROGRAMMING

Theabove exampleillustrateshow Equation5.12can be used todetermine the c'sand r'sand alsohow to reconstructton knowingthe r's.Let usexaminethe complexityof this procedureto evaluatethe c'sand r's.Theevaluationproceduredescribedin the aboveexamplerequiresus to computec(i,j)for (j \342\200\224 i) = 1,2,...,n in that order. When j \342\200\224 i = m, therearen \342\200\224 m + 1c(i,j)'sto compute.Thecomputationof eachof thesec(i,j)'srequiresus tofind the minimum of m quantities(seeEquation5.12).Hence,eachsuch c(i,j)can be computedin time0(m).The total time for allc(i,j)'swith j \342\200\224 i = m is therefore0(nm\342\200\224 m2).Thetotaltimeto evaluateall the

c(\302\253,j)'sand r(\302\253,j)'s is therefore

Y^ {nm-m2)= 0(n3)l<m<n

We candobetterthan this usinga resultdue to D.E.Knuthwhichshowsthat the optimalk in Equation5.12can be found by limitingthe searchtothe range r(i,j \342\200\224 1) < k < r(i+ l,j). In this casethe computingtimebecomes0(n2)(seethe exercises).Thefunction OBST(Algorithm5.5)usesthis result to obtain the values of w(i,j),r(i,j),and c(i,j),0 < i < j < n,in 0(n2)time.Thetree ton can be constructedfrom the values of r(i,j)in

0(n)time.Thealgorithmfor this is left as an exercise.

EXERCISES

1.Usefunction OBST(Algorithm 5.5) to computew(i,j),r(i,j),andc(i,j),0 < i < j < 4, for the identifier set (01,02,03,04)= (cout,float,if, while) with p(l)= 1/20,p(2) = 1/5,p(3) = 1/10,p(4) =1/20,q{0)= 1/5,g(l) = 1/10,q{2)= 1/5,g(3) = 1/20,and g(4) =1/20.Usingthe r(\302\253,j)'s, constructthe optimalbinary searchtree.

2. (a) Show that the computingtimeof function OBST(Algorithm 5.5)is 0(n2).

(b) Write an algorithmto constructthe optimalbinary searchtreegiven the rootsr(i,j),0< i < j <n. Show that this can bedonein timeO(n).

3. Sinceoften only the approximatevaluesof thep'sand g'sareknown, it

is perhapsjust asmeaningfulto find a binary searchtreethat is nearlyoptimal. That is,its cost,Equation5.9,is almostminimalfor thegiven p'sand g's.This exerciseexploresan 0(nlogn) algorithmthatresultsin nearly optimalbinary searchtrees.Thesearchtreeheuristicwe use is

Page 299: Sahni

5.5.OPTIMAL BINARYSEARCHTREES(*) 283

1 AlgorithmOBST(p,q,n)2 // Given n distinctidentifiers a\\ < a,2 < \342\226\240\342\200\242\342\200\242< an and probabilities3 // p[i],1< i <n, and q[i], 0 < i < n, this algorithmcomputes4 // the costc[i,j]of optimalbinary searchtreest^ for identifiers5 // ai+i,,, , , cij.It alsocomputesr[i,j],the rootof tij.6 // w[i,j] is the weight of tij.7 {8 for i :=0 to n \342\200\224 1do9 {10 // Initialize,11 w[i,i]:=q[i];r[i,i]:=0;c[i,i]:=0.0;12 // Optimaltreeswith onenode13 w[i,i+ i\\:=q[i]+q[i+ l]+p[i+ l]i14 r[i,i+ l]:=i+ l;15

c[\302\273,\302\273+ l] :=q[i] +q[i + 1]+p[i +1];

16 }17 io[n,n] :=q[n];r[n,n\\ :=0;c[n,n]:=0.0;18 for m :=2 to n do // Find optimaltreeswith m nodes.19 for i :=0 to n \342\200\224 m do20 {21 j :=i +m;22

w;[\302\253',j]:=io[i,j -1]+p[j]+ q\\j]\\

23 // Solve5.12usingKnuth's result.24 k :=Find(c,r,i,j);25 // A value of I in the ranger[i,j\342\200\224 1]< I26 //<?'[\302\253 +1,j] that minimizesc[i,I \342\200\224 1]+c[l,j];27 c[i,j]:=w[i,j]+c[i,k-1]+c[k,j]-,28 r[i,j]:=fc;29 }30 write (c[0,n],w[0,n],r[0,n]);31 }1 AlgorithmFind(c,r,\302\253, j)2 ^3 min :=oo;4 for m :=r[i,j\342\200\224 1]to r[i+ 1,j]do5 if (c[i,m\342\200\224 1]+c[m,j])<mm then6 {7 mm :=c[i,m\342\200\224 1]+c[m,,7'];I :=m;8 }9 returnI;10 }

Algorithm5.5Findinga minimum-costbinary searchtree

Page 300: Sahni

284 CHAPTER5. DYNAMICPROGRAMMING

Choosethe root k such that \\w(0, k \342\200\224 1) \342\200\224 w(k,n)\\ is assmallaspossible.Repeat this procedureto find the left andright subtreesof the root.

(a) Usingthis heuristic,obtain the resultingbinary searchtree forthe dataof Exercise1.What is its cost?

(b) Write an algorithmimplementingthe above heuristic.Youralgorithm shouldhave timecomplexity0(nlogn).

5.6 STRING EDITINGWe are given two stringsX = x\\,X2,...,xnand Y = 2/1,2/2,\342\200\242\342\200\242-,2/m, where%i, 1< i < n, and yj, 1< j< m, aremembersof a finite set of symbolsknown as the alphabet. We want to transform X into Y usinga sequenceof edit operationson X. Thepermissibleeditoperationsare insert,delete,and change(a symbolof X into another),and thereis a costassociatedwithperformingeach.Thecostof a sequenceof operationsis the sum of the costsof the individual operationsin the sequence.Theproblemof stringeditingis to identify a minimum-costsequenceof editoperationsthat will transformX into Y.

Let D(xi)be the costof deletingthe symbol X{ from X, I(yj) be the costof insertingthe symbol yj into X, and C(xi,yj)be the costof changingthesymbol X{ of X into yj.

Example5.19Considerthe sequencesX =xi,X2,X3,x\302\261,x\302\247

= a,a,b,a,band Y = 2/1,2/2,2/3,2/4= b,a,b,b.Let the costassociatedwith eachinsertionand deletionbe1(for any symbol).Also letthe costof changingany symbolto any othersymbol be 2. One possibleway of transformingX into Y isdeleteeachXj, 1< i < 5, and insert eachyj, 1< j' < 4. Thetotal costofthis edit sequenceis 9. Another possibleeditsequenceis deletex\\ and X2and insert 2/4 at the end of stringX. Thetotalcostis only 3. \342\226\241

A solutionto the stringeditingproblemconsistsof asequenceof decisions,one for eachedit operation.Let \302\243 be a minimum-costedit sequencefortransformingX into Y. The first operation,O, in \302\243 is delete,insert,orchange.If \302\243'

=\302\243

\342\200\224 {O}and X' is the resultof applying O on X, then \302\243'

shouldbe a minimum-costeditsequencethat transformsX' into Y. Thusthe principleof optimality holdsfor this problem.A dynamic programmingsolutionfor this problemcan beobtainedas follows. Definecost(i,j)to bethe minimum costof any edit sequencefor transformingxi,X2, \342\226\240\342\226\240

\342\226\240, Xj into2/1,2/2,\342\200\242\342\200\242\342\200\242, 2/j (for 0 < i < n and 0 < j < m). Computecost(i,j)for eachiand j. Then cost(n,m) is the costof an optimaleditsequence.

Fori = j = 0,cost(i,j)= 0,sincethe two sequencesare identical(andempty). Also, if j = 0 and i > 0,we can transform X into Y by a sequenceof

Page 301: Sahni

5.6.STRINGEDITING 285

deletes.Thus,cost(i,0)= cost(i\342\200\2241,0) +D(xi).Similarly,if i = 0 andj > 0,we get cost(0,j) = cost(0,j \342\200\224 1)+ I(yj).If i / 0 and j / 0,x\\,X2, \342\226\240\342\226\240

\342\226\240, x\\

can be transformedinto yi,y2,---,Vjin oneof threeways:

1.Transformx\\, X2, \342\200\242\342\200\242

\342\200\242, a>j-i into j/i, j/2> \342\200\242\342\200\242\342\200\2425 J/j usinga minimum-costeditsequenceand then deletex\\. Thecorrespondingcostis cost(i\342\200\224 1,j)+D(xi).

2.Transform x\\,X2, \342\200\242\342\200\242

\342\200\242, \302\243j_iinto yi, j/2, \342\200\242\342\200\242

\342\200\242, J/j-iusinga minimum-costedit sequenceand then changethe symbol x\\ to j/j. Theassociatedcostis cost(i\342\200\224 l,j\342\200\224 1)+ C(xi,yj).

3.Transformx\\,X2, \342\226\240\342\226\240

\342\226\240, X{ into j/i,y2, \342\226\240\342\226\240

\342\226\240, yj-iusinga minimum-costeditsequenceand then insert yj. This correspondsto a costof cost(i,j\342\200\224

1)+/(%\342\226\240)\342\200\242

The minimum costof any edit sequencethat transformsx\\,X2, \342\226\240\342\226\240\342\226\240,Xiinto yi, y2, \342\200\242\342\200\242

\342\200\242, yj (for i > 0 and j > 0) is the minimum of the above threecosts,accordingto the principleof optimality. Therefore,we arrive at thefollowing recurrenceequationfor cost(i,j):

cost(i,j)0 i=j=0cost(i-1,0)+D(xi) j = 0, i > 0cost(0,j-l)+ I(yj) i = 0, j>0cost'(i,j) i > 0, j > 0

(5.13)

wherecost'(i,j)= min { cost(i\342\200\224 l,j)+D(xi),cost(i- l,j- 1)+C(xi,yj),cost(i,j-l)+ I(yj) }

We have to computecost(i,j) for allpossiblesvaluesof i and j (0< i < nand 0 < j <m). Thereare (n+ l)(m+1) suchvalues.Thesevalues canbecomputedin the form of a table,M, whereeachrow of M correspondsto aparticularvalue of i and eachcolumnof M correspondsto a specificvalueof j. M(i,j)storesthe value cost(i,j).Thezerothrow can be computedfirst sinceit correspondsto performinga seriesof insertions.Likewisethezerothcolumncan alsobe computed.After this, one couldcomputetheentriesof M in row-majororder,starting from the first row. Rows shouldbe processedin the order1,2,...,n. Entriesin any row are computedinincreasingorderof columnnumber.

Theentriesof M can alsobe computedin column-majororder,startingfrom the first column.Lookingat Equation5.13,we seethat eachentry ofM takesonly 0(1)timeto compute.Thereforethe whole algorithmtakes0(mn)time.Thevalue cost(n,m)is the final answer we areinterestedin.Having computedall the entriesof M, a minimum edit sequencecan be

Page 302: Sahni

286 CHAPTER5. DYNAMICPROGRAMMING

obtainedby a simplebackward tracefrom cost(n,m). This backward traceis enabledby recordingwhich of the threeoptionsfor i > 0,j' > 0 yieldedthe minimum costfor eachi and j.

Example5.20Considerthe stringeditingproblemof Example5.19.X =a,a,b, a,b and Y = 6,a,b, b. Eachinsertionand deletionhas a unit costanda changecosts2 units. Forthe casesi = 0,j > 1,and j = 0,i > 1,cost(i,j)can becomputedfirst (Figure5.18).Let us computethe restof the entriesin row-majororder.Thenext entry to becomputedis cost(l,1).cost(l,l)= min {cost(0,1)+D(xi),cost(0,0) +C(xi,yi),cost(l,0) +I(yi)}= min {2,2,2}=2

Next is computedcost(l,2).

cost{\\,2)= min {cosi(0,2) +D(xi),cosf(0,1)+ C{xuy2),cost{l,1)+ I{y2)}= min {3,1,3}= 1

Therestof the entriesarecomputedsimilarly. Figure5.18displays thewhole table.The value cosi(5,4)= 3. One possibleminimum-costeditsequenceis deletex\\, deletex2,and insert j/4. Another possibleminimumcosteditsequenceis changex\\ to y2 and delete

\302\2434.D

l~* 0 12 3 4

0-01234! \342\200\224 1 2 1 2 3

2-232343-3 2 3 2 3

4-4 3 2 3 45-54323Figure5.18Costtablefor Example5.20

Page 303: Sahni

5.7.0/1KNAPSACK 287

EXERCISES1.Let X = a,a,b,a,a,b,a,b,a,aand Y = b,a,b,a,a,b,a,b.Find a

minimum-costeditsequencethat transformsX into Y.

2. Presenta pseudocodealgorithmthat implementsthe string editingalgorithmdiscussedin thissection.Programit and test its correctnessusingsuitabledata.

3. Modifythe above programnot only to computecost(n,m) but alsotooutput a minimum-costeditsequence.What is the timecomplexityofyour program?

4. GivenasequenceX of symbols,a subsequenceof X is dennedto beanycontiguousportionof X. Forexample,if X = x\\, X2, \302\2433, \302\2434, \302\2435, X2, \302\2433

and #1,0:2,2:3aresubsequencesof X. Given two sequencesX and Y,presentan algorithmthat will identify the longestsubsequencethatis commonto both X and Y. This problemis known as the longestcommonsubsequenceproblem.What is the timecomplexityof youralgorithm?

5.7 0/1KNAPSACKThe terminology and notation used in this sectionis the sameas that inSection5.1.A solutionto the knapsackproblemcanbeobtainedby makinga sequenceof decisionson the variablesx\\, X2, \342\226\240

\342\226\240., xn. A decisiononvariableXi involves determiningwhichof the values 0 or 1is tobeassignedto it.Letus assumethat decisionson the x% aremadein the orderxn,xn-i,...,x\\.Followinga decisionon xn, we may be in one of two possiblestates:thecapacity remainingin the knapsackis m and no profit has accruedor thecapacity remainingis m \342\200\224 wn and a profit of pn has accrued.It is clearthatthe remainingdecisionsxn\342\200\224\\,..., X\\ must be optimalwith respectto theproblemstateresultingfrom the decisionon xn. Otherwise,xn,...,X\\ willnot beoptimal.Hence,the principleof optimality holds.

Let fj(y) be the value of an optimalsolutionto KNAP(1,j,y). Sincetheprincipleof optimality holds,we obtain

fn(m) = max{/\342\200\236-i(m),/\342\200\236_i(m

-wn) +pn} (5.14)Forarbitrary fi(y), i > 0,Equation5.14generalizesto

/i(y)=max{fl-i(y)Ji-\\(y-wi)+pl} (5.15)Equation5.15canbesolvedfor fn(m) by beginningwith the knowledgefo(y)= 0 for ally and fi(y) = \342\200\224 00,y <0.Then /1,/2,...,fn can be successivelycomputedusing(5.15).

Page 304: Sahni

288 CHAPTER5. DYNAMICPROGRAMMING

When the ioj'sare integer,we needtocomputefi(y) for integery, 0 <y < m. Sincefi(y) = \342\200\224 oo for y < 0, thesefunction values neednot becomputedexplicitly.Sinceeachfi canbecomputedfrom /$_i in 0(m)time,it takes@(mn)timeto computefn. When the w^s arerealnumbers,fi(y) isneededfor realnumbersy such that 0 < y < m. So,/j cannotbeexplicitlycomputedfor ally in this range.Even when the ioj'sareinteger,the explicit@(mn)computationof fn may not be the most efficient computation.So,we explorean alternativemethodfor both cases.

Noticethat fi(y) is an ascendingstep function;i.e.,thereare a finitenumberof y's, 0 = yx < y2 < \342\226\240\342\226\240\342\226\240 < yk, such that fi(y{) < fi(y2) < \342\226\240\342\226\240\342\226\240<fi(yk); fi(y) = -oo,y < yv, fi(y) = f(yk), y > Vk\\ and My) = U(yj),Vj < y < Uj+i- So,we needto computeonly fi(yj), 1< j < k. We use theorderedsetSl= {(/(yj),yj)|l< j < k} to representfi(y). EachmemberofSlis a pair(P,W), whereP = /.-(yy)and W = yr Noticethat 5\302\260

= {(0,0)}.We can computeSl+1from Slby first computing

S\\= {(P,W)\\(P-Pi,W -Wi) \342\202\254 S?} (5.16)

Now, Sl+1can becomputedby mergingthe pairsin S1and S\\ together.Note that if Sl+1containstwo pairs(Pj,Wj) and (Pk,Wk) with the propertythat Pj < Pfc and Wj > W^, then the pair(Pj,Wj) canbediscardedbecauseof (5.15).Discardingor purgingrulessuch as this one are alsoknown asdominancerules. Dominatedtuplesget purged. In the above, (Pk,Wk)dominates(Pj,Wj).

Interestingly,the strategy we have comeup with can alsobe derived byattemptingto solve the knapsackproblemvia a systematicexaminationofthe up to 2\342\204\242 possibilitiesfor x\\,X2, \342\226\240\342\226\240\342\226\240,xn. Let Sl representthe possiblestatesresultingfrom the 21decisionsequencesfor xx,...,X{.A staterefersto a pair (Pj,Wj),Wj being the total weight of objectsincludedin theknapsackand Pj beingthe correspondingprofit. To obtainSl+1,we notethat the possibilitiesfor Xi+\\ are xi+\\ = 0 or Xj+i = 1.When X{+\\ = 0,theresultingstatesare the sameas for Sl.When Xi+\\ = 1,the resultingstatesareobtainedby adding(pi+i,Wi+i)to eachstate in Sl.Callthe setof theseadditionalstatesS\\.The S\\ is the sameas in Equation5.16.Now, Sl+1canbecomputedby mergingthe statesin Sland S\\ together.

Example5.21Considerthe knapsackinstancen = 3,(w\\,W2,w^)= (2,3,4),(PiiP2,Pa)= (1,2,5),and m = 6.Forthesedatawe have

S\302\260= {(0,0)};5?={(1,2)}

51 = {(0,0),(1,2)};511={(2,3),(3,5)}52 = {(0,0),(1,2),(2,3),(3,5)};5? ={(5,4),(6,6),(7,7),(8,9)}53 = {(0,0),(1,2),(2,3),(5,4),(6,6),(7,7),(8,9)}

Page 305: Sahni

5.7.0/1KNAPSACK 289

Note that the pair (3, 5) has beeneliminatedfrom S3 as a result of thepurgingrulestatedabove. \342\226\241

When generatingthe S^'s,we canalsopurgeallpairs(P,W) with W > mas thesepairsdeterminethe value of fn(%) only for x > m. Sincetheknapsackcapacity is m,we arenot interestedin the behaviorof fn for x >m.When all pairs(Pj,Wj) with Wj > m are purgedfrom the Sl,s,fn(m) isgiven by the P value of the last pair in Sn (note that the Sl,sareorderedsets).Note alsothat by computingSn, we can find the solutionsto all theknapsackproblemsKNAP(1,n, x), 0 < x < m,and not justKNAP(1,n,m).Since,we want only a solutionto KNAP(1,n,m),we can dispensewith thecomputationof Sn.Thelast pairin Sn is eitherthe last one in S12^1or it is(Pj+pn,Wj +wn), where(Pj,Wj) \342\202\254 S\"\"\"1 suchthat Wj +wn<rnand Wjis maximum.

If (PI,Wl) is the last tuple in Sn, a set of 0/1values for the x^s suchthat Y^Pi'xi = PI and J2wixi= Wl can be determinedby carrying outa searchthrough the Sls.We can set xn = 0 if (P1,W1)E Sn^. If

(PI,Wl) 0Sn~l,then (PI-pn, Wl -wn) e S\"\"-1 and we can setxn = 1.This leavesus to determinehow either(PI,Wl) or (PI\342\200\224 pn, Wl \342\200\224 wn) wasobtainedin Sn~~l.Thiscan bedonerecursively.

Example5.22With m = 6,the value of /s(6)is given by the tuple (6,6)in S3 (Example5.21).The tuple (6, 6) 0 S'2,and sowe must set

\302\2433

= 1.Thepair (6, 6) camefrom the pair (6 \342\200\224 ^3,6\342\200\224 w^) = (1,2).Hence(1,2)e S2.Since(1,2)e 51,we can set x-i = 0. Since(1,2) 0 S\302\260,

we obtain#1=1.Hencean optimalsolutionis (xi,X2,X3)= (1,0,1). \342\226\241

We cansum up allwe have saidsofar in the form of an informal algorithmDKP (Algorithm 5.6).To evaluatethe complexityof the algorithm,weneedto specify how the setsSland S\\ are to be represented;provide analgorithmto mergeS1 and S\\; and specify an algorithmthat will tracethroughS\"\"-1,...,Sland determinea setof 0/1values for xn,...,x\\.

We canusean array pair[] to representall the pairs(P,W). TheP valuesarestoredin pair[].pand the W values in pair[].w.Sets

S\302\260, Sl,...,Sn~lcan bestoredadjacentto eachother.This requiresthe useof pointersb[i],0 < i < n, whereb[i] is the locationof the first elementin Sl,0 < i < n,and b[n] is onemorethan the locationof the last elementin S12\"1.

Example5.23Usingthe representationabove,the setsS\302\260,S}, and S2 ofExample5.21appearas

Page 306: Sahni

290 CHAPTER5. DYNAMICPROGRAMMING

1 AlgorithmDKP(p,w,n,m)2 {345678910111213141516}

5\302\260:= {(0,0)};for i :=1to n \342\200\224 1do{

S*\"1:={(P,W)\\(P-Pi,W -Wi) GS*-1and W <m};Sl:=MergePurge(5i-1,S,j-1);

}(PX,Ifl) :=lastpairin S*1\"1;(PF,WT) :=(P'+p\342\200\236,

W' +wn) whereW is the largestW in

any pair in S12\"1suchthat W + wn <m;II Tracebackfor xn,xn-i,...,x\\.if (PX > PY) thenxn :=0;CloC

\342\200\242I'Tl

\342\200\242\342\200\224 J-5

TraceBackFor(a;\342\200\236_i,...,xi);

Algorithm5.6Informalknapsackalgorithm

pa\302\253r[

pa\302\253r[

1

0

0

2

0

0

3

1

2

4

0

0

5

1

2

6

2

3

7

3

5

t t t t6[0] 6[1] 6[2] 6[3] \342\226\241

Themergingand purgingof Sl~~land S\\~~l canbecarriedout at the sametimethat S\\~ is generated.Sincethe pairsin Sl~l are in increasingorderof P and W, the pairsfor Slaregeneratedin this order. If the next pairgeneratedfor S\\~~ is (PQ,WQ),then we can mergeinto S1allpairsfromS1\"1with W value < WQ. Thepurgingrulecanbeusedto decidewhetherany pairsget purged.Hence,no additionalspaceis neededin which to storesi_i.

DKnap (Algorithm 5.7)generatesSlfrom Sl~~lin this way. The Sl,saregeneratedin the for loopof lines 7 to 42 of Algorithm 5.7.At the startof eachiterationt = b[i \342\200\224 1]and h is the indexof the last pair in S1\"1.Thevariable k points to the next tuple in S1\"1that has to be mergedintoSl. In line 10,the function Largestdeterminesthe largestq, t < q < h,

Page 307: Sahni

5.7.0/1KNAPSACK 291

for which pair[q].w+ w[i] < m. This can be doneby performinga binarysearch. The codefor this function is left as an exercise.Sinceu is setsuch that for all Wj,h > j > u, Wj + Wi > m, the pairsfor S\\~ are(P(j)+Pi,W(j) +Wi), 1< j' < u. The for loopof lines11to 33 generatesthesepairs.Eachtimea pair(pp,ww)is generated,allpairs(P,W) in S1\"1with W < ww not yet purgedor mergedinto Slaremergedinto S1.Notethat noneof thesemay bepurged.Lines21to 25 handlethe casewhen thenextpair in S1^1has a W value equal to ww. In this casethe pair withlesserP value getspurged.In casepp >P(next \342\200\224 1),then the pair(pp,ww)getspurged.Otherwise,(pp,ww) is addedto Sl.Thewhile loopof lines31and 32 purgesall unmergedpairsin S1\"1that can be purgedat this time.Finally, following the mergingof SPf1into S{,theremay bepairsremainingin Sl~l to be mergedinto Sl.This is taken careof in the while loopoflines35 to 39.Notethat becauseof lines31and 32,none of thesepairscan be purged.FunctionTraceBack(line43) implementsthe if statementand trace-backstepof the function DKP (Algorithm 5.6).This is left as anexercise.

If \\Sl\\ is the numberof pairsin S\\ then the array pairshouldhave aminimum dimensionof d =

X^o<\302\253<n-i l^l- Sinceit is not possibleto predictthe exactspaceneeded,it is necessaryto test for next> d eachtimenextis incremented.SinceeachS\\ i >0,is obtainedby mergingS1\"1and S\\~and l-Si\"1!< |<S'i_1|,it follows that \\Sl\\ < 2|S'l~1|.In the worst caseno pairswill get purgedand

Y^ \\Sl\\= Y 2l = 2n -1

0<K\302\273-1 0<i<n-lThe timeneededto generateSz from S*~~l is \302\251(IS\"1\"1!). Hence,the time

neededto computeall the S^'s, 0 < i < n, is6(\302\243 IS1^1)).Since\\Sl\\ < 2\\

the timeneededto computeall the S^'s is 0(2n).If the pfsare integers,then eachpair(P,W) in Slhas an integerP and P < ^2i<j<iPj-Similarly,if the lOj's are integers,eachW is an integerand W < m. In any Sl thepairshave distinctW values and alsodistinctP values.Hence,

when the pj'sare integersand

IS*|< 1+min { ^ Wj,m}

Page 308: Sahni

292 CHAPTER5. DYNAMICPROGRAMMING

17 pair18 pair

PW = record{floatp; float w; }1 Algorithm DKnap(p,w,x,n,to)2 {3 // pair[}is an array of PWs.4 6[0]:=1;pair[l].p:=pair[l].w:=0.0;// S\302\260

5 t:=l;h:=1;// Start and end of S\302\260

6 b[l] :=next:=2; // Next free spot in pair[]7 for i :=1ton \342\200\224 1do8 {// GenerateS\\9 k :=t;10 u :=Largest(pair,w, t, h, i,to);11 for j := t tou do12 {// GenerateS\\~l and merge.13 pp :=pair[j].p+ p[i];ww :=pair[j].w+ w[i];14 // (pp,ww)is the next element in 5J_1.15 while ((k < h) and (pair[k].w< ww)) do16 {

\342\226\240[next].p:=pair[k].p;

next].w:=pair[k].w,19 next:=next+ 1;k :=k + 1;20 }21 if ((fc < h) and (pair[k].w= ww)) then22 {23 if pp < pair[k].pthen pp :=pair[k].p;24 k:=k+l;25 }26 if pp > pair[next \342\200\224 l].pthen27 {28 pair[next].p:=pp;pair[next].w:=ww;29 next:=next+ 1;30 }31 while ((k < h) and (pair[k].p< pair[next\342\200\224 l].p))32 dok:=k+ 1;33 }34 // Mergein remaining terms from Sl~l\342\226\240

35 while (k < h) do36 {37 pair[next].p:=pair[k].p;pair[next].w:=pair[k].w;38 next:=next+ 1;k :=k + 1;39 }40 // Initialize for Si+l.41 t :=h + 1;h :=next \342\200\224 1;b[i + 1]:=next;42 }43 TraceBack(p,w,pair,x,to, n);44 >

Algorithm5.7 Algorithm for 0/1knapsackproblem

Page 309: Sahni

5.7.0/1KNAPSACK 293

when the wj's are integers.When both the py's and Wj's are integers,thetimeand spacecomplexityof DKnap (excludingthe time for TraceBack)is 0(min{2n,nEi<i<nft)nm})'^n this bound Y^\\<i<nPi can be replacedby \302\243i<i<\342\200\236Wgcd CPi,---,Pn)and m by gcd (wuw2,\342\226\240\342\226\240

\342\226\240, wn,m) (seetheexercises).Theexercisesindicatehow TraceBackmay beimplementedsoastohave a spacecomplexity0(1)and a timecomplexity0(n2).

Although the above analysis may seemto indicatethat DKnap requirestoomuch computationalresourceto be practicalfor largen, in practicemany instancesof this problemcan be solved in a reasonableamount oftime.This happensbecauseusually, all the p'sand io'sare integersand mis much smallerthan 2\342\204\242. Thepurgingrule is effective in purgingmost of thepairsthat would otherwiseremainin the Sl,s.

Algorithm DKnap can be speededup by the use of heuristics.Let Lbe an estimateon the value of an optimalsolutionsuch that fn(m) > L.Let PLEFT(i)= Y,i<j<nPj-If ^ containsa tuple (P,W) such that P +PLEFT(i)< L, then (P,W) can be purgedfrom S\\ To seethis, observethat (P,W) can contributeat bestthe pair(P+Y.i<j<nPjiW +Y.i<j<nw)to S?~l.SinceP + Ei<j<nPj= p + PLEFT(i)< L, it follows that'thispaircannotleadto a pair with value at leastL and socannotdetermineanoptimalsolution.A simpleway toestimateL such that L < fn(m) is toconsiderthe last pair (P,W) in S\\ Then,P < fn{m).A betterestimateisobtainedby addingsomeof the remainingobjectsto (P,W). Example5.24illustratesthis.Heuristicsfor the knapsackproblemarediscussedin greaterdetailin the chapteron branch-and-bound.Theexercisesexplorea divide-and-conquerapproachto speedup DKnap so that the worst casetime is0{2nl'2).Example5.24Considerthe following instanceof the knapsackproblem:n = 6,(pi,P2,P3,P4,P5,Pe)= {wi,W2,wz,w\302\261,W5,w&) = (100,50,20,10,7,3), and m = 165.Attempting to fill the knapsackusingobjectsin the order1,2,3, 4, 5,and 6,we seethat objects1,2, 4, and 6 fit in and yield a profitof 163and a capacity utilizationof 163.We can thus beginwith L = 163asa value with the property L < fn(m). Sincepi = wi, every pair(P,W) \342\202\254 Sl,0 < i < 6 has P = W. Hence,eachpair can bereplacedby the singletonPor W. PLEFT(O)= 190,PLEFT(l)= 90,PLEFT(2)= 40, PLEFT(3)=20,PLEFT(4)= 10,PLEFT(5)= 3,and PLEFT(6)= 0.EliminatingfromeachSlany singletonP suchthat P+ PLEFT(i)< L, we obtain

,S0= {0};5? = {100}S{= {100};S{= {150}

S2= {150};S2=<f>

Page 310: Sahni

294 CHAPTER5. DYNAMICPROGRAMMING

S3= {150};5? ={160}SA = {160};Sf =

<f>

S5= {160}Thesingleton0 is deletedfrom S1as 0 + PLEFT(1)<163.Theset Sf

doesnot containthe singleton150+ 20 = 170asm< 170.S3 doesnotcontainthe 100or the 120as eachis lessthan L \342\200\224 PLEFT(3).And so on.Thevalue /g(165)can bedeterminedfrom S5.In this example,the value ofL did not change.In general,L will changeif a betterestimateis obtainedasa resultof the computationof someS1.If the heuristicwasn'tused,thenthe computationwould have proceededas

S\302\260= {0}

51 = {0,100}52 = {0,50,100,150}53 = {0,20,50,70,100,120,150}54 = {0,10,20,30,50,60,70,80,100,110,120,130,150,160}55 = {0,7,10,17,20,27,30,37,50,57,60,67,70,77,80,87,100,

107,110,117,120,127,130,137,150,157,160}The value /e(165)can now be determinedfrom S5, using the knowledge(P6,we)- (3,3). \342\226\241

EXERCISES1.GeneratethesetsSl,0 < i < 4 (Equation5.16),when (w\\,W2, w^,Wi) =

(10,15,6,9)and (pi,p2,P3,P4)= (2,5,8,1).2.Write a function Largest(pair,w, t, h, i,m) that usesbinary searchto

determinethe largestq, t < q < h, suchthat pair[q].w+w[i] < m.

3. Write a function TraceBackto determinean optimalsolutionx\\, X2, \342\226\240

\342\226\240.,

xn to the knapsackproblem.Assume that Sl,0 < i < n, have alreadybeencomputedas in function DKnap. Knowing b(i) and b(i + 1),you can use a binary searchto determinewhether (P',W')\302\243 Sl.Hence,the timecomplexityof your algorithmshouldbeno morethan0(nmaxi{log|5i|})=0(n2).

4. Give an exampleof a set of knapsackinstancesfor which \\Sl\\= 2l,

0< i <n. Your setshouldincludeone instancefor eachn.

Page 311: Sahni

5.8.RELIABILITYDESIGN 295

5. (a) Showthat if thepj'sareintegers,then the sizeof eachSl,\\Sl\\, inthe knapsackproblemis no morethan 1+Xa<i<jPj/gcd(pi,P2,\342\226\240\342\226\240

\342\200\242,

pn), wheregcd(p\\,p2,\342\226\240\342\226\240\342\226\240,pn) is the greatestcommondivisor ofthe pi's.

(b) Showthat when the w/s areinteger,then |S*|< 14-mm{Y,l<j<iWj,m}/gcd(wi,w2,...,wn,m).

6. (a) Usinga divide-and-conquerapproachcoupledwith the setgeneration approachof the text, show how toobtain an 0(2n'2)algorithm for the 0/1knapsackproblem.

(b) Developan algorithmthat uses this approachto solve the 0/1knapsackproblem.

(c) Comparethe run timeand storagerequirementsof this approachwith thoseof Algorithm 5.7.Usesuitabletest data.

7. Considerthe integerknapsackproblemobtainedby replacingthe 0/1constraint in (5.2)by xj > 0 and integer. Generalizefi(x) to thisproblemin the obviousway.

(a) Obtainthe dynamicprogrammingrecurrencerelationcorresponding

to(5.15).(b) Showhow to transform thisprobleminto a 0/1knapsackproblem.

(Hint: Introducenew 0/1variablesfor eachX{. If0 < X{ < 2J,then introducej variables,one for eachbit in the binaryrepresentation OfXi.)

5.8 RELIABILITYDESIGNIn this sectionwe lookat an exampleof how to use dynamic programmingto solve a problemwith a multiplicativeoptimizationfunction. Theproblem is to designa system that is composedof severaldevicesconnectedinseries(Figure5.19).Let i\\ be the reliability of deviceD{ (that is,n is theprobability that devicei will function properly).Then,the reliability of theentiresystem is Ilrj. Even if the individual devicesare very reliable(therj'sare very closeto one), the reliability of the system may not be verygood.Forexample,if n = 10and r{ = .99,1< i < 10,then Un = .904.Hence,it is desirableto duplicatedevices.Multiple copiesof the samedevice type areconnectedin parallel(Figure5.20)throughthe useof switchingcircuits.Theswitchingcircuitsdeterminewhich devicesin any given groupare functioningproperly.They then make use of one suchdeviceat eachstage.

If stage% containsmi copiesof deviceDi,then the probability that allrrii have a malfunction is (1\342\200\224 ri)mi.Hencethe reliability of stagei becomes

Page 312: Sahni

296 CHAPTER5. DYNAMICPROGRAMMING

Dx 3\302\273 D2 \342\200\224=\302\273

\302\243>3

Figure5.19n devicesZ?,, 1< % <n, connectedin series

stage 1 stage 2 stage 3 stage n

DiDxDx

D2D2

D3\302\2733

\302\243>3

^3

Figure5.20Multiple devicesconnectedin parallelin eachstage

1\342\200\224 (1\342\200\224 ri)mi.Thus, if T{ = .99and mi = 2,the stagereliability becomes.9999.In any practicalsituation, the stagereliability is a little lessthan1\342\200\224 (1\342\200\224 ri)mi becausethe switchingcircuitsthemselvesarenot fully reliable.Also,failuresof copiesof the samedevicemay not befully independent(e.g.,if failure is due to designdefect).Let us assumethat the reliability of stagei is given by a function 0j(mj),1< n. (It is quiteconceivablethat <pi(m,i)may decreaseafter a certainvalue of m,.)The reliability of the system ofstagesis Ui<i<n(pi(mi).

Our problemis to use deviceduplicationto maximizereliability. Thismaximizationis to be carriedout under a costconstraint.Let c, be thecostof eachunit of device% and let c be the maximumallowablecostofthe system being designed.We wish to solve the following maximizationproblem:

maximizeX\\.\\<i<n 4>i{mi)

subjectto y~\" Ciirii <c (5-17)\\<i<n

mi > 1and integer,1< i < n

Page 313: Sahni

5.8.RELIABILITYDESIGN 297

A dynamic programmingsolutioncanbeobtainedin a mannersimilartothat usedfor the knapsackproblem.Since,we canassumeeachCj > 0,eachrrii must be in the range 1< rrii < Ui, where

Ui (c+Ci -^c?)/c*

TheupperboundUi follows from the observationthat rrij > 1.An optimalsolutionrri\\, rri2,...,rnn is the resultof a sequenceof decisions,onedecisionfor eachmt. Let ft(x) representthe maximumvalue of fli<j<,<p(rrij) subjectto the constraintsJ2i<j<icjrnj< x and 1< rnj < uji 1<J <

*\342\200\242 Then, thevalue of an optimalsoTuFionis /n(c).Thelast decisionmaderequiresonetochoosemn from {1,2,3,..., un}.Oncea value for mn has beenchosen,theremainingdecisionsmust besuchas tousethe remainingfunds c \342\200\224 cnmn inan optimalway. Theprincipalof optimality holdsand

fn(c)= max {(pn{mn)fn-i(c-cnmn)} (5.18)l<m\342\200\236<un

Forany fi{x),i > 1,this equationgeneralizesto

fi{x)= max {(pi{mt)fi-i(x-Ciirii)} (5.19)\\<rrii<Ui

Clearly, fo{x)= 1for allx, 0 < x <c Hence,(5.19)can be solvedusingan approachsimilarto that used for the knapsackproblem.Let Slconsistof tuplesof the form (/,x), where/ = fi(x).Thereis at most onetuple foreachdifferent x that resultsfrom a sequenceof decisionson rri\\,m2, \342\226\240\342\226\240

\342\226\240, mn.Thedominancerule {f\\,x\\) dominates(/2,#2)iff/i > f'2 and x\\ < X2 holdsfor this problemtoo.Hence,dominatedtuplescan bediscardedfrom Sl.

Example5.25We are to designa threestagesystem with devicetypesZ?i,Z?2,and D%. The costsare$30,$15,and $20 respectively.The costofthe system is to beno morethan $105.The reliability of eachdevicetype is.9,.8and .5respectively.We assumethat if stage% has rrii devicesof type %

in parallel,then ^(rn,) = 1\342\200\224 (1\342\200\224 r,)m\\ In termsof the notationusedearlier,c\\ = 30,C2 = 15,C3 = 20,c = 105,r\\ = .9,r^ = .8,r% = .5,u\\=2,U2= 3,and t/3 = 3.

We use Sl to representthe set of all undominatedtuples (/,x) thatmay result from the various decisionsequencesfor mi,m2,\342\226\240\342\226\240\342\226\240,rri{.Hence,f(x) = fi{x).Beginningwith S\302\260

= {(1,0)},we canobtaineachSlfrom S*~lby trying out allpossiblevalues for rrii and combiningthe resultingtuplestogether.UsingS1a to representalltuplesobtainablefrom Sl~l by choosingrrii = j, we obtain S{ = {(.9,30)}and S% = {(.9,30),(.99,60)}.Theset

Page 314: Sahni

298 CHAPTER5. DYNAMICPROGRAMMING

S2= {(.72,45),(.792,75)};S2={(.864,60)}.Note that the tuple (.9504,90)which comesfrom (.99,60)has beeneliminatedfrom S2as this leavesonly$10.This is not enoughto allow m3 = 1.Theset S$= {(.8928,75)}.Combining, we get S2= {(.72,45),(.864,60),(.8928,75)}as the tuple (.792,75)isdominatedby (.864,60).Theset Sf = {(.36,65),(.432,80),(.4464,95)},5|= {(.54,85),(.648,100)},and 5|= {(.63,105)}.Combining,we get S3 ={(.36,65),(.432,80), (.54,85), (.648,100)}.

Thebestdesignhas a reliability of .648and a costof 100.Tracingbackthroughthe Sl,s,we determinethat mj = l,m2= 2,and m^ = 2. \342\226\241

As in the caseof the knapsackproblem,acompletedynamic programmingalgorithmfor the reliability problemwill useheuristicsto reducethe sizeofthe Sl,s.Thereis no needto retainany tuple (/,x) in S1 with x valuegreaterthat c \342\200\224

Y^i<j<ncj as such a tuple will not leave adequatefundsto completethe system.In addition,we can devisea simpleheuristictodeterminethe bestreliability obtainableby completinga tuple (f,x)in Sl.If this is lessthan a heuristicallydeterminedlower bound on the optimalsystem reliability, then (/,x) can beeliminatedfrom Sl.

EXERCISE1. (a) Presentan algorithmsimilarto DKnap to solve the recurrence

(5.19).(b) What are the timeand spacerequirementsof your algorithm?(c) Testthe correctnessof your algorithmusingsuitabletest data.

5.9 THE TRAVELINGSALESPERSONPROBLEM

We have seenhow to apply dynamic programmingto a subsetselectionproblem (0/1knapsack).Now we turn our attention to a permutationproblem.Note that permutationproblemsusually aremuch harderto solvethansubset problemsas thereare n! different permutationsof n objectswhereasthereare only 2\342\204\242 different subsetsof n objects(n! > 2n). Let G = (V,E)bea directedgraphwith edgecostsqj.Thevariable qj is definedsuchthatCij > 0 for all i and j and Cy = co if (i,j)0 E. Let \\V\\

= n and assumen > 1.A tour of G is a directedsimplecycle that includesevery vertex inV. Thecostof a tour is the sum of the costof the edgeson the tour. Thetraveling salespersonproblemis to find a tour of minimum cost.

Thetravelingsalespersonproblemfinds applicationin a variety ofsituations. Supposewe have to routea postalvan to pick up mail from mail

Page 315: Sahni

5.9.THETRAVELINGSALESPERSONPROBLEM 299

boxeslocatedat n different sites.An n + 1vertexgraph can be used torepresentthe situation.Onevertexrepresentsthe postoffice from which thepostalvan startsand towhich it must return.Edge(i,j)is assigneda costequaltothe distancefrom sitei tositej. Theroutetakenby the postalvanis a tour, and we areinterestedin finding a tour of minimum length.

As a secondexample,supposewe wish to use a robotarm to tightenthe nuts on somepieceof machinery on an assembly line. The arm willstart from its initial position(which is over the first nut to be tightened),successivelymove to eachof the remainingnuts, and return to the initialposition.The path of the armis clearly a tour on a graph in which verticesrepresentthe nuts. A minimum-costtour will minimizethe timeneededforthe arm to completeits task (note that only the totalarmmovement timeis variable;the nut tighteningtimeis independentof the tour).

Ourfinal exampleis from a productionenvironmentin whichseveralcommodities aremanufacturedon the sameset of machines.The manufactureproceedsin cycles.In eachproductioncycle,n different commoditiesareproduced.When the machinesarechangedfrom productionof commodity% tocommodity j, a changeover costc%j is incurred.It is desiredto find asequencein which to manufacturethesecommodities.Thissequenceshouldminimizethe sum of changeover costs(the remainingproductioncostsaresequenceindependent).Sincethe manufactureproceedscyclically,it isnecessary to includethe costof starting the next cycle.This is just the changeover costfrom the last to the first commodity. Hence,this problemcan beregardedasa traveling salespersonproblemon an n vertexgraphwith edgecostCjj'sbeingthe changeovercostfrom commodity i to commodityj.

In the following discussionwe shall, without lossof generality, regarda tour to be a simplepath that starts and ends at vertex 1. Every tourconsistsof an edge(l,k)for somek \302\243 V \342\200\224 {1}and a path from vertexk tovertex 1.Thepath from vertex k to vertex 1goesthrougheachvertex inV \342\200\224 {1,k}exactlyonce.It is easy to seethat if the tour is optimal,then thepath from k to 1must be a shortestk to 1path goingthroughallverticesin V \342\200\224 {l,k}.Hence,the principleof optimality holds.Let g(i,S)be thelengthof a shortestpath starting at vertexi,goingthroughallverticesinS,and terminatingat vertex1.Thefunction g(l,V \342\200\224 {1})is the lengthofan optimalsalespersontour.Fromthe principalof optimality it follows that

0(1,V-{!})=min {clk+g(k,V -{1,k})} (5.20)2<k<n

Generalizing(5.20),we obtain (for % 0S)

g(i,S) = minted+g(j,S-{j})} (5.21)

Equation5.20canbe solvedfor g(l,V \342\200\224 {1})if we knowg(k,V \342\200\224 {1,k})for allchoicesof k. Theg values can be obtainedby using(5.21).Clearly,

Page 316: Sahni

300 CHAPTER5. DYNAMICPROGRAMMING

g(i,4>) = cji,1< i <n.Hence,we can use (5.21)to obtaing(i,S) for allSof size1.Then we can obtaing(i,S)for S with |,S*| = 2, and soon.When\\S\\ < n \342\200\224 1,the values of i and S for which g(i,S) is neededaresuch thati\302\261 1,10S,and i 05.

Example5.26Considerthe directedgraph of Figure5.21(a).The edgelengthsaregiven by matrixc of Figure5.21(b).

0 10 15 205 0 9 106 13 0 128 8 9 0

(b)

Figure5.21Directedgraphand edgelengthmatrixc

Thus g(2,(p)= c2i = 5,#(3,0)= c31 = 6, and g(4,</>) = c4X = 8. Using(5.21),we obtain

ff(2,{3})= c23+ff(3,0)= 15 ff(2,{4})= 18ff(3,{2})= 18 ff(3,{4})= 20ff(4,{2})= 13 ff(4,{3})= 15

Next,we computeg(i,S) with \\S\\ =2,i^1,105and i 0S.

#(2,{3,4})= min {c23+ff(3,{4}),C24+ff(4,{3})}= 25#(3,{2,4})= min {c32+ff(2,{4}),C34+ff(4,{2})}= 25ff(4,{2,3})= min {c42+ff(2,{3}),c43+ff(3,{2})}= 23

Finally, from (5.20)we obtain

<7(1,{2,3,4})= min {c12+#(2,{3,4}),c13+ </(3,{2,4}),c14+0(4,{2,3})}= min {35,40,43}= 35

(a)

Page 317: Sahni

5.10.FLOW SHOPSCHEDULING 301

An optimaltour of the graph of Figure5.21(a)has length 35. A tourof this lengthcan beconstructedif we retainwith eachg(i,S) the value ofj that minimizesthe right-handsideof (5.21).Let J(i,S)be this value.Then, J(l,{2,3,4})= 2. Thus the tour starts from 1and goesto 2. Theremainingtour canbeobtainedfrom g(2,{3,4}).SoJ(2,{3,4})= 4.Thusthe next edgeis (2,4).The remainingtour is for g(4, {3}).SoJ(4,{3})=3.The optimaltour is 1,2,4, 3, 1. \342\226\241

Let N be the numberof g(i,,S)'sthat have to becomputedbefore(5.20)can beusedto computeg(l,V \342\200\224 {1}).Foreachvalue of \\S\\ therearen \342\200\224 1choicesfor i. Thenumberof distinctsetsS of sizek not including1and i

/n-2\\is k \342\200\242 Hence

n\342\200\2242

JV=f>-l)(V)= (n-l)2n-1\342\200\224n V /fc=0

An algorithmthat proceedsto find an optimaltour by using(5.20)and (5.21)will require0(n22\")timeas the computationof g(i,S) with |,S|= k requiresk \342\200\224 1comparisonswhen solving (5.21).This is betterthan enumeratingalln! different tours to find the bestone. The mostseriousdrawback of thisdynamic programmingsolutionis the spaceneeded,0{n2n).This is toolargeeven for modestvalues of n.

EXERCISE1. (a) Obtaina datarepresentationfor the valuesg(i,S) of the traveling

salespersonproblem.Your representationshouldallow for easyaccessto the value of g(i,S),given i and S.(i) Howmuch spacedoesyour representationneedfor an n vertexgraph? (ii) Howmuch timeis neededtoretrieveor updatethe value of g(i,S)l

(b) Usingthe representationof (a), developan algorithmcorresponding

to the dynamic programmingsolutionof the travelingsalesperson problem.

(c) Test the correctnessof your algorithmusingsuitabletestdata.

5.10FLOW SHOPSCHEDULINGOften the processingof a jobrequiresthe performanceof severaldistincttasks.Computerprogramsrun in a multiprogrammingenvironmentareinput and thenexecuted.Followingthe execution,thejobis queuedfor output

Page 318: Sahni

302 CHAPTER5. DYNAMICPROGRAMMING

and the output eventually printed.In a generalflow shop we may have njobseachrequiringm tasksTu,T2i,...,Tmi,1< % < n, to be performed.Task Tji is to beperformedon processorPy, 1<j < m . The timerequiredto completetask Tji is tji.A schedulefor the n jobsis an assignmentof tasksto timeintervals on the processors.Task Tji must beassignedto processorPj. No processormay have morethan one task assignedto it in any timeinterval.Additionally, for any jobi the processingof task Tji,j > 1,cannotbestarteduntil task Tj^\\^ has beencompleted.

Example5.27Two jobshave to be scheduledon threeprocessors,task timesaregiven by the matrixJ The

J =2 03 35 2

Two possibleschedulesfor the jobsareshown in Figure5.22, \342\226\241

time 0

Pi

10 12r\342\200\236

7*22 T2i T22

r3i 7*32

time 0 2 3

(a)

5 6 11r,i

7*22 Tu

T32 T31

(b)

Figure5.22Two possibleschedulesfor Example5.27

Page 319: Sahni

5,10.FLOW SHOPSCHEDULING 303

A nonpreemptivescheduleis a schedulein which the processingof a taskon any processoris not terminateduntil the task is complete.A schedulefor which this neednot be true is calledpreemptive.The scheduleofFigure 5.22(a)is a preemptiveschedule.Figure5.22(b)showsa nonpreemptiveschedule.The finish time fi(S)of jobi is the timeat which alltasksof jobi have beencompletedin scheduleS. In Figure5.22(a),fi(S)= 10andf2(S) = 12.In Figure5.22(b),/j(5)= 11and f2(S) = 5, Thefinish timeF(S)of a scheduleS is given by

F(S)=max{/,-(\302\243)} (5.22)\\<i<n

Themean flow time MFT(S')is defined to be

MFT(S)= - J2 MS) (5-23)\\<i<n

An optimalfinish time(OFT) schedulefor a given set of jobsis a non-preemptivescheduleS for which F(S)is minimum over allnonpreemptiveschedulesS. A preemptiveoptimalfinish time(POFT)schedule,optimalmean finish timeschedule(OMFT),and preemptiveoptimalmean finish(POMFT)schedulearedefined in the obvious way.

Although the generalproblemof obtainingOFTand POFTschedulesform > 2 and of obtainingOMFTschedulesis computationallydifficult (seeChapter11),dynamic programmingleadsto an efficient algorithmtoobtainOFTschedulesfor the casem = 2. In this sectionwe considerthis specialcase.

For convenience,we shall use a{ to representtu, and 6j to representt2i- For the two-processorcase,one can readily verify that nothing is tobegainedby usingdifferent processingorderson the two processors(this isnot true for m > 2), Hence,a scheduleis completelyspecifiedby providinga permutationof the jobs.Jobswill be executedon eachprocessorin thisorder.Each task will be startedat the earliestpossibletime.Thescheduleof Figure5,23is completelyspecifiedby the permutation(5, 1,3, 2,4).We make the simplifyingassumptionthat a? ^ 0,1< i < n. Note that ifjobswith ai = 0 areallowed,then an optimalschedulecan be constructedby first finding an optimalpermutationfor all jobswith ai ^ 0 and thenaddingalljobswith a^ =0 (in any order) in front of this permutation(seethe exercises).

It is easy to seethat an optimalpermutation(schedule)has the propertythat given the first jobin the permutation,the remainingpermutationisoptimalwith respectto the state the two processorsare in following thecompletionof the first job.Let g\\, a2,...,<7fc beapermutationprefixdefininga schedulefor jobsT\\, T2,...,T^. Forthisschedulelet f\\ and f2 bethe timesat which the processingof jobsTi,T2,...,Tk is completedon processorsPi

Page 320: Sahni

304 CHAPTER5. DYNAMICPROGRAMMING

a5 a]

b5

a-s a2

bx b,

a$

b2 b4

Figure5.23A schedule

and P2 respectively.Let t = fa \342\200\224

f\\- Thestateof the processorsfollowingthe sequenceof decisionsT\\, J2,...,T^ is completelycharacterizedby t. Letg(S,t) bethe lengthof an optimalschedulefor the subsetof jobsS underthe assumptionthat processor2 is not available until timet. Thelengthofan optimalschedulefor the jobset{1,2,...,n} is #({1,2,...,n},0).

Sincethe principleof optimality holds,we obtain

g({l,2,\342\226\240\342\200\242\342\200\242, n},0) = min {a,+g({l,2,\342\200\242\342\200\242

\342\200\242, n}-{i},&,-)} (5.24)\\<i<n

Equation5,24generalizesto (5.25)for arbitrary S and t. Thisgeneralization requiresthat g((p,t) =

max{\302\243, 0}and that aj ^ 0, 1< % <n.

g(S,t) = min {a2-+g(S-{i},ft,- +max{i- a,-,0})} (5.25)

Thetermmax{t\342\200\224 aj,0}comesinto (5.25)as task T21cannotstart until

max{a,,t}(P2 is not availableuntil timet).Hence/2 \342\200\224/1

= bi +max{a,i,t}\342\200\224

<H= h +max{\302\243

\342\200\224 aj,0}.We can solvefor g(S,t)usingan approachsimilarto that usedto solve (5.21).However,it turns out that (5.25)can be solvedalgebraicallyand avery simpleruleto generatean optimalscheduleobtained.

Considerany scheduleR for a subsetof jobsS, Assume that P2 is notavailable until timet. Let i and j be the first two jobsin this schedule.Then, from (5,25)we obtain

g(S,t) = ai + g(S-{i},bt+max{t-ai,0})g(S,t) = ai + aj + g(S \342\200\224 {i,j},bj+ max {b{+ max {t \342\200\224 aj,0}\342\200\224 aj,0})

(5.26)

Page 321: Sahni

5.10.FLOW SHOPSCHEDULING 305

Equation5.26can be simplifiedusingthe following result:

tij = bj +max {bi+max {t\342\200\224 a2,0}\342\200\224 a,-,0}= bj + br \342\200\224

a,j +max {max{t \342\200\224 di,0},a,j\342\200\224 bi}= bj + bi \342\200\224 (ij +max {t\342\200\224 a,,Oj \342\200\224 6j,0}

iy = ^ + bi \342\200\224

a,j\342\200\224 a,i +max

{\302\243,aj +a, \342\200\224 6j,a^}

If jobs% and j are interchangedin R, then the finish timeg'(S,t)is

#'(SU) = ai +a,j+g(S-{i,j},tji)

wheretji = bj + bi \342\200\224 aj \342\200\224 di +max{\302\243,

a, +aj \342\200\224 bj,dj}

Comparingg(S,t)and g'(S,t),we seethat if (5.28)below holds, theng(S,t)<g'(S,t).

max{\302\243, aj +aj \342\200\224 ij,aj}< max

{\302\243, aj +aj \342\200\224 6j,aj} (5.28)In orderfor (5.28)to hold for all values of t, we need

max {di+aj \342\200\224 ij,aj}< max {aj+ a,j ~ bj,a,j}

or di +aj +max{\342\200\224bi, \342\200\224dj}

<di+dj +max{\342\200\224bj,\342\200\224di}

or min {ij,aj}>min{bj,di} (5.29)From(5.29)we can concludethat thereexistsan optimalschedulein

which for every pair (i,j)of adjacentjobs,min{ij,dj}> min{ij,aj}.Exercise 4 shows that all scheduleswith this property have the samelength.Hence,it suffices to generateany schedulefor which (5.29)holdsfor everypairof adjacentjobs.We canobtaina schedulewith thisproperty by makingthe following observationsfrom (5.29).If min{ai,ct2,...,an,&i, 62, \342\200\242\342\200\242

\342\200\242, bn}is aj,thenjobi shouldbe the first jobin an optimalschedule.If min{ai,02,...,an,bi,b2,...,bn}is bj, then jobj shouldbe the lastjobin an optimalschedule.This enablesus to make a decisionas to the positioningof oneof the n jobs.Equation5.29can now be usedon the remainingn \342\200\224 1jobsto correctlypositionanotherjob,and so on.Theschedulingrule resultingfrom (5.29)is therefore:

Page 322: Sahni

306 CHAPTER5. DYNAMICPROGRAMMING

1.Sort all the Oj's and 6j'sinto nondecreasingorder.

2, Considerthissequencein thisorder.If the nextnumberin the sequenceis cij andjobj hasn'tyet beenscheduled,schedulejobj at the leftmostavailable spot.If the next number is bj and jobj hasn't yet beenscheduled,schedulejobj at the rightmostavailable spot.If j hasalready beenscheduled,go to the next numberin the sequence.

Note that the above rule alsocorrectlypositionsjobswith Oj = 0. Hence,thesejobsneednot beconsideredseparately.

Example5.28Let n = 4, (01,02,03,04)= (3,4, 8,10),and (&i,&2)&3>&4)=(6, 2, 9,15).Thesortedsequenceof a'sand 6'sis (62^1,02,61,03,63,04,64)= (2,3,4, 6,8,9,10,15).Let 01,0-2,0-3,and 04 be the optimalschedule.Sincethe smallestnumberis 62, we set04 = 2.Thenext numberis a\\ andwe set o\\ = a\\. Thenext smallestnumberis 02- Job2 has already beenscheduled.Thenext numberis 61.Job1has already beenscheduled.Thenext is 03 and we set03,This leaves 03 free and job4 unscheduled.Thus,o3 = 4. \342\226\241

Theschedulingrule above can be implementedto run in timeO(nlogn)(seeexercises).Solving(5.24)and (5.25)directly for g(l,2,...,n,0)for theoptimalschedulewill take$7(2\") timeas therearethesemany different S\"sfor whichg{S,t)will becomputed.

EXERCISES1.N jobsare to be processed.Two machinesA and B areavailable.If

job% is processedon machineA, then Oj units of processingtimeareneeded.If it isprocessedonmachineB,then 62-unitsof processingtimeareneeded.Becauseof the peculiaritiesof the jobsand the machines,it is quite possiblethat en > 62- for some% while a,j < bj for somej'; j jL i. Obtain a dynamic programmingformulation to determinethe minimum timeneededtoprocessallthejobs.Note that jobscannotbe split betweenmachines.Indicatehow you would go about solvingthe recurrencerelationobtained.Do thisonan exampleof your choice.Also indicatehow you would determinean optimalassignmentof jobsto machines.

2, N jobshave tobescheduledfor processingononemachine.Associatedwith jobi is a 3-tuple(pi,ti,di).Thevariable ti is the processingtimeneededto completejobi, If jobi is completedby its deadlined,,thena profit pi is earned.If not, then nothingis earned.PromSection4,4we know that J is a subsetof jobsthat can allbecompletedby their

Page 323: Sahni

5.11.REFERENCESAND READINGS 307

deadlinesiff the jobsin J can be processedin nondecreasingorderofdeadlineswithout violatingany deadline.Assumec4 <dj+i,1< % <n.Let fi(x) be the maximumprofit that can beearnedfrom a subsetJof jobswhen n = i. Herefn{dn)is the value of an optimalselectionofjobsJ. Let fo(x) = 0.Show that for x < t{,

fi(x) = max{/j_!(x),fi-i(x-U) +pi}

3.Let / be any instanceof the two-processorflow shopproblem.

(a) Show that the lengthof every POFTschedulefor / is the sameas the lengthof every OFTschedulefor I. Hence,the algorithmof Section5,10alsogeneratesa POFTschedule.

(b) Show that thereexistsan OFTschedulefor / in which jobsareprocessedin the sameorderon both processors,

(c) Show that thereexistsan OFTschedulefor / defined by somepermutationa of the jobs(seepart (b)) such that alljobswitha, = 0 areat the front of thispermutation.Further,showthat theorderin which thesejobsappearat the front of the permutationis not important.

4. Let / be any instanceof the two-processorflow shop problem.Leta = o\\02 \342\226\240\342\200\242\342\200\242

ct\342\200\236

bea permutationdefiningan OFTschedulefor /.(a) Use (5.29)to argue that there exists an OFTa such that

min {bi,aj)> min {bj,at}for every i and j such that i = a^and j =

CTfc+i (that is,i and j areadjacent).(b) Foraa satisfyingthe conditionsof part (a),showthat min{6j,aj}>

mm{bj,a,}for every i and j suchthat i \342\200\224 o^ and j \342\200\224 ar,k< r.(c) Show that all schedulescorrespondingto it'ssatisfying the

conditions of part (a) have the samefinish time. (Hint:usepart (b)totransform oneof two different schedulessatisfying (a) into theotherwithout increasingthe finish time.)

5.11REFERENCES AND READINGSTwo classicreferenceson dynamic programmingare:Introductionto Dynamic Programming,by G.Nemhauser,John Wiley andSons,1966.AppliedDynamic Programmingby R. E.Bellmanand S.E.Dreyfus,Princeton UniversityPress,1962.

Page 324: Sahni

308 CHAPTER5. DYNAMICPROGRAMMING

SeealsoDynamic Programming,by E.V.Denardo,Prentice-Hall,1982.The dynamic programmingformulation for the shortest-pathsproblem

was given by R. Floyd.Bellmanand Ford'salgorithmfor the single-sourceshortest-pathproblem

(with generaledgeweights)canbe found in Dynamic Programmingby R. E.Bellman,PrincetonUniversity Press,1957.

The constructionof optimalbinary searchtreesusingdynamicprogramming

is describedin The Art of Programming:Sortingand Searching,Vol.3,by D.E.Knuth,Addison Wesley, 1973.

Thestringeditingalgorithmdiscussedin this chapteris in \"The string-to-stringcorrectionproblem,\"by R. A. Wagner and M. J.Fischer,Journalof the ACM21,no.1(1974):168-173.

The set generationapproachto solving the 0/1knapsackproblemwasformulatedby G.Nemhauserand Z.Ullman,and E.Horowitzand S.Sahni.

Exercise6 in Section5.7is due to E.Horowitzand S.Sahni.

Thedynamicprogrammingformulation for the travelingsalespersonproblem was given by M. Heldand R. Karp.

Thedynamic programmingsolutionto the matrixproductchainproblem(Exercises1and 2 in Additional Exercises)is due toS.Godbole.

5.12 ADDITIONALEXERCISES1.[Matrix productchains] Let A, B,and C bethreematricessuchthat

C = A x B. Let the dimensionsof A, B,and C respectivelybe m xn,nx p,and m x p. Fromthe definition of matrixmultiplication,

C(i,j)= J2Ad,k)B(k,j)k=\\

(a) Write an algorithmto computeC directly usingthe aboveformula. Show that the numberof multiplicationsneededby youralgorithmis mnp.

(b) Let M\\ x M2 x \342\200\242\342\200\242\342\200\242 x Mr be a chain of matrixproducts.Thischainmay beevaluatedin severaldifferent ways. Twopossibilitiesare

(\342\200\242

\342\200\242\342\200\242((Mi x M2) x M3) x M4) x \342\200\242\342\200\242

\342\200\242)

x Mr and (Mx x (M2 x(\342\200\242

\342\200\242\342\200\242 x (Mr_i x Mr) \342\200\242\342\200\242

\342\200\242).

The costof any computationof M\\ x

Page 325: Sahni

5.12.ADDITIONALEXERCISES 309

M2 x \342\200\242\342\200\242\342\200\242 x Mr is the numberof multiplicationsused.Considerthe caser = 4 and matricesMi through M4 with dimensions100x 1,1x 100,100x 1,and lx 100respectively.What is thecostof eachof the five ways to computeM\\ x M2 x M3 x M4? Show that the optimalway has a costof 10,200and the worstway has a costof 1,020,000.Assume that all matrixproductsarecomputedusingthe algorithmof part (a).

(c) Let My denotethe matrixproductM, x Mj+i x \342\200\242\342\200\242\342\200\242x Mj.Thus,Ma = Mj,1< i < r. S = Pi,P2,\342\226\240\342\226\240

\342\226\240, Pr~\\ is a product sequencecomputingM\\r iff eachproductPk is of the form Mi3 x Mj+ii9,where My- and Mj+itQ have beencomputedeitherby anearlier productP[,I< k, or representan input matrixMu- Notethat Mtj x Mj+ij = Mj9. Also note that every validcomputation of M\\r using only pairwisematrixproductsat eachstep is defined by a productsequence.Two productsequencesS\\ = Pi,P2,\342\200\242\342\200\242

\342\200\242, Pr-1and S2 = U\\, U2, \342\200\242\342\200\242

\342\200\242, Ur-\\ are different ifPi 7^ Uj for somei. Show that the numberof different productsequencesif (r \342\200\224 1)!

(d) Although thereare (r \342\200\224 1)!different product,sequences,many oftheseare essentially the samein the sensethat the samepairsof matricesare multiplied.For example,the sequencesSi =(Mi x M2),(M3x M4), (M12x M34)and S2 = (M3 x M4), (Mi xM2),(Mi2x M34)are different under the definition of part (c).However,the samepairsof matricesaremultipliedin both S\\ andS2.Show that if we consideronly thoseproductsequencesthatdiffer from eachother in at leastone matrixproduct,then thenumberof different sequencesis equalto the numberof differentbinary treeshaving exactlyr \342\200\224 1nodes.

(e) Show that the numberof different binary treeswith n nodesis

1 (2n\\n + 1y n J

2. [Matrix productchains] In the precedingexerciseit was establishedthat the numberof different ways to evaluatea matrixproductchainis very largeeven when r is relatively small(say 10or 20).In thisexercisewe shalldevelopan 0{r^)algorithmto find an optimalproductsequence(that is,oneof minimum cost).Let D(i),0< i < r,representthe dimensionsof the matrices;that is,Mj hasD(i~ 1) rows and D(i)columns.Let C(i,j)be the costof computingM{j usingan optimalproductsequencefor Mij.Observethat C(i,i)= 0,1< i < r, andthat C(i,i + 1) = D(i- l)D(i)D(i+ 1),1< i < r.

Page 326: Sahni

CHAPTER5. DYNAMICPROGRAMMING

(a) Obtain a recurrencerelationfor C(i,j),j> i. This recurrencerelationwill besimilarto Equation5.14.

(b) Write an algorithmto solvethe recurrencerelationof part (a) for

C(l,r).Your algorithmshouldbeof complexity0(r3).(c) What changesare neededin the algorithmof part (b) to

determine an optimalproductsequence.Write an algorithmtodetermine sucha sequence.Show that the overall complexity of youralgorithmremains0(r3).

(d) Work through your algorithm(by hand) for the productchainof part (b) of the previousexercise.What are the values ofC(i,j),1< i < r and j > i? What is an optimalway tocomputeM14?

Thereare two warehousesW\\ and W2 from which suppliesare to beshippedto destinationsD{,1 < i < n. Let di be the demandat D{and let r, be the inventory at Wj. Assume r\\ + r^ = X!di- Let Cjj(xy)bethe costof shippingX{j units from warehouseWi to destinationDj.The warehouseproblemis to find nonnegativeintegersXij, 1< i < 2and 1<j < n, suchthat x\\j +X2j

= dj,1<j < n, and J2ij cij(xij)ISminimized.Let gi(x) be the costincurredwhen W\\ has an inventoryof x and suppliesaresent to Dj,l<j < i,in an optimalmanner(theinventory at W2 is Y^\\<j<i dj \342\200\224 x).The costof an optimalsolutiontothe warehouseproblemis gn{f\\).

(a) Usethe optimality principleto obtain a recurrencerelationfor9i(x)-

(b) Write an algorithmto solvethis recurrenceand obtainan optimalsequenceof values for x^, 1< i <2,1<j <n.

Given a warehousewith a storagecapacity of B units and an initialstockof v units, lety, be the quantity soldin eachmonth,i,1< i <n.Let Pi be the per-unit sellingpricein month i, and Xj the quantitypurchasedin monthi. Thebuying priceis q perunit. At the end ofeachmonth,the stockin hand must beno morethan B.That is,

v + J2 (X*~

Vi) ^ -B' l ^ J - n

\\<i<j

The amount soldin eachmonth cannot be morethan the stockatthe end of the previousmonth (new stockarrives only at the end of amonth). That is,

Vi < v + Y^ (XJ~ %')' l<i<n

Page 327: Sahni

5.12.ADDITIONALEXERCISES 311

Also, we requireX{ and yi to benonnegativeintegers.The totalprofitderived is

n

Pn = ^{PjVj-CjXj)i=i

Theproblemis to determineXj and yj such that Pn is maximized.Let fi{vi)representthe maximumprofit that canbeearnedin monthsi + 1,i + 2,...,n, starting with Wj units of stockat the end of monthi. Then fo(v) is the maximumvalue of Pn.

(a) Obtain the dynamic programmingrecurrencefor fi(vi) in termsof /i+i(\302\253j).

(b) What is/\342\200\236(Wi)?

(c) Solvepart (a) analytically to obtain the formula

fi(vi) = a%Xi + biVi

for someconstantsa% and 6j.(d) Showthat an optimalPn is obtainedby usingthe following

strategy:

i.pt > ciA. If 6j+i > c^ then yi = V{ and x\\ = B.B. If 6j+i <q, then y, = Vj and Xj = 0.

ii.c% >piA. If bi+\\ > ci,then j/j = 0 and Xi = B ~ vi-B. If 6j+i <p^ then yj = wj and a;j = 0.C If Pi < 6j+i < Ci, then j/j = 0 and Xj = 0.

(e) Usethe pi and cj in Figure5.24and obtain an optimaldecisionsequencefrom part (d).

i 12 3 4 5 6 7 8Pi 88234325ct 3 6 7 1 4 5 1 3

Figure5.24pj and cj for Exercise4

Assume the warehousecapacity tobe 100and the initialstocktobe 60.

Page 328: Sahni

312 CHAPTER5. DYNAMICPROGRAMMING

(f) Frompart (d)concludethat an optimalsetof values for %i and y%

will always leadto the following policy:Do no buying or sellingfor the first k months(k may bezero)and then oscillatebetweena full and an empty warehousefor the remainingmonths.

5. Assume that n programsare to be storedon two tapes.Let lj bethe length of tape neededto store the ith program. Assume thatJ2h <

\302\243,where L is the length of eachtape. A programcan be

storedon eitherof the two tapes.If S\\ is the setof programson tape1,then the worst-caseaccesstimefor a programis proportionaltomax{Eies1hi JligSih}- An optimalassignmentof programsto tapesminimizesthe worst-caseaccesstimes.Formulatea dynamicprogramming approachto determinethe worst-caseaccesstimeof an optimalassignment.Write an algorithmtodeterminethis time. What is thecomplexityof your algorithm?

6.RedoExercise5 makingthe assumptionthat programswill bestoredon tape 2 usinga different tape density than that usedon tape 1.Ifli is the tape lengthneededby programi when storedon tape1,thenah is the tape lengthneededon tape2.

7. Let L bean array of n distinctintegers.Give an efficient algorithmtofind the lengthof a longestincreasingsubsequenceof entriesin L.Forexample,if the entriesare 11,17,5,8,6,4,7,12,3,a longestincreasingsubsequenceis 5,6, 7,12.What is the run timeof your algorithm?

Page 329: Sahni

Chapter6

BASICTRAVERSAL ANDSEARCHTECHNIQUES

Thetechniquestobediscussedin thischapteraredivided into two categories.The first categoryincludestechniquesapplicableonly to binary trees.Asdescribed,thesetechniquesinvolve examiningevery nodein the given dataobjectinstance.Hence,thesetechniquesarereferredto as traversalmethods.Thesecondcategoryincludestechniquesapplicableto graphs(andhencealsoto treesand binary trees).Thesemay not examineall verticesand soarereferredto only as searchmethods.Duringa traversalor searchthe fieldsof a nodemay be used severaltimes. It may be necessaryto distinguishcertainusesof the fieldsof a node.Duringtheseuses,the nodeis saidtobevisited. Visitinga node may involve printingout its data field, evaluatingthe operationspecifiedby the nodein the caseof a binary treerepresentingan expression,setting a mark bit to one or zero,and so on. Sincewe aredescribingtraversalsand searchesof treesand graphsindependentlyof theirapplication,we use the term \"visited\" ratherthan the termfor the specificfunction performedon the nodeat this time.

6.1 TECHNIQUESFOR BINARYTREESThesolutionto many problemsinvolves the manipulationof binary trees,trees,or graphs.Often this manipulationrequiresus to determinea vertex(node)or a subsetof verticesin the given dataobjectthat satisfies a givenproperty. Forexample,we may wish tofind all verticesin a binary treewitha datavalue lessthan x or we may wish to find allverticesin a given graphG that can be reachedfrom another given vertex v. The determinationof this subsetof verticessatisfying a given property can be carriedout bysystematically examiningthe verticesof the given data object.This oftentakesthe form of a searchin the dataobject.When the searchnecessarily

313

Page 330: Sahni

314CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

treenode= record{

Type data;// Type is the data type of data,treenode*lchild;treenode*rchild;

}1 AlgorithmInOrder(i)2 // t is a binary tree.Eachnodeof t has3 // threefields:Ichild,data,and rchild.4 {5 if t^ 0 then6 {7 In0rder(i->\342\226\240 Ichild);8 Visit(i);9 In0rder(i->\342\226\240 rchild);10 }11 }

Algorithm6.1Recursiveformulation of inordertraversal

involves the examinationof every vertex in the objectbeingsearched,it iscalleda traversal.

We have already seenan exampleof a problemwhosesolutionrequiredasearchof a binary tree. In Section5.5we presentedan algorithmto searcha binary searchtree for an identifier x. This algorithmis not a traversalalgorithmas it doesnot examineevery vertexin the searchtree.Sometimes,we may wish to traversea binary searchtree(e.g.,when we wish to list outallthe identifiersin the tree).Algorithmsfor this arestudiedin this chapter.

Therearemany operationsthat we want to performon binary trees.Onethat arisesfrequently is traversinga tree,or visiting eachnodein the treeexactlyonce.A traversalproducesa linearorderfor the information in atree.This linearordermay befamiliar and useful.When traversinga binarytree,we want to treat eachnodeand its subtreesin the samefashion. Ifwe let L,D,and R stand for moving left, printing the data, and movingright when at a node,then therearesixpossiblecombinationsof traversal:LDR, LRD, DLR, DRL, RDL, and RLD.If we adoptthe conventionthatwe traverseleft beforeright, then only threetraversalsremain:LDR, LRD,and DLR.To thesewe assignthe namesinorder,postorder,and preorder.Recursivefunctions for thesethree traversalsare given in Algorithms 6.1and 6.2.

Page 331: Sahni

6.1.TECHNIQUESFORBINARY TREES 315

1 AlgorithmPreOrder(i)2 // t is a binary tree.Eachnodeof t has3 // threefields:Ichild,data,and rchild.4 {5 if t ^ o then6 {7 Visit(i);8 PreOrder(i->\342\200\242 Ichild);9 PreOrder(i->\342\226\240 rchild);10 }11 }1 AlgorithmPostOrder(i)2 // t is a binary tree.Eachnodeof \302\243 has3 // threefields:Ichild,data,and rchild.4 {5 if t^ 0 then6 {7 PostOrder(i->\342\226\240 Ichild);8 PostOrder(\302\243 \342\200\224>\342\226\240 rchild);9 Visit(i);10 }11 }

Algorithm6.2Preorderand postordertraversals

Figure6.1shows a binary treeand Figure6.2traceshow InOrderworkson it. This traceassumesthat visiting a noderequiresonly the printingof its data field. The output resultingfrom this traversalis FDHGIBEAC.With Visit(i) replacedby a printingstatement,the applicationof Algorithm6.2to the binary treeof Figure6.1resultsin the outputs ABDFGHIECandFHIGDEBCA,respectively.

Theorem6.1LetT(n) and S(n) respectivelyrepresentthe timeand spaceneededby any one of the traversalalgorithmswhen the input tree t hasn > 0 nodes.If the timeand spaceneededto visit a nodeare 0(1),thenT(n) = 0(n)and S(n) = 0(n).Proof:Each traversalcan be regardedas a walk throughthe binary tree.Duringthis walk, eachnodeis reachedthreetimes:oncefrom its parent (orasthe startnodein casethe nodeis the root),onceon returningfrom its left

Page 332: Sahni

316CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

Figure6.1A binary tree

callof valueInOrder in root action

main123443455455233122

A

BDF

GH

I

E

C\342\200\224

print ('F')print ('D')

print ('H')print ('G')

print (T)print ('B')print ('E')print ('A')

print ('C')

Figure6.2Inordertraversalof the binary treeof Figure6.1

Page 333: Sahni

6.1.TECHNIQUESFORBINARY TREES 317

subtree,and onceon returningfrom its right subtree.In eachof thesethreetimesa constantamount of work is done.So, the total timetaken by thetraversalis O(n). The only additionalspaceneededis that for the recursionstack.If t has depth d, then this spaceis 0(d).Foran n-nodebinary tree,d <nand so S(n) \342\200\224 0(n). \342\226\241

EXERCISESUnlessotherwisestated,all binary treesare representedusingnodeswiththreefields:Ichild,data,and rchild.

Give an algorithmto count the numberof leafnodesin a binary treet. What is itscomputingtime?

Write an algorithmSwapTree(i)that takesa binary treeand swaps theleft and right childrenof every node.An exampleis given in Figure6.3.Useoneof the threetraversalmethodsdiscussedin Section6.1.

(K) (G)SwapTree(t)

Figure6.3Swappingleft and right children

3. Useone of the three traversalmethodsdiscussedin Section6.1toobtain an algorithmEquiv(i,u)that determineswhether the binarytreest and u areequivalent.Two binary treest and u areequivalentif and only if they are structurallyequivalent and if the data in thecorrespondingnodesof t and u arethe same.

4. Show the following:

(a) Inorderand postordersequencesof a binary treeuniquely definethe binary tree.

(b) Inorderand preordersequencesof a binary treeuniquely definethe binary tree.

Page 334: Sahni

318CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

(c) Preorderand postordersequencesof a binary treedo not uniquelydefine the binary tree.

5. In the proofof Theorem6.1,show,usinginduction,that T(n) < C2n+c\\ (whereC2 is a constant>2c\\).

6.Write a function to constructthe binary tree with a given inordersequence/ and a given postordersequenceP.What is the complexityof your function?

7. DoExercise6 for a given inorderand preordersequence.8.Write a nonrecursivealgorithmfor the preordertraversalof a binary

treet. Your algorithmmay usea stack.What are the timeand spacerequirementsof your algorithm?

9. Do Exercise8 for postorderas well as inordertraversals.10.[Triple-ordertraversal] A triple-ordertraversalof a binary tree t is

definedrecursivelyby Algorithm6.3.A very simplenonrecursivealgorithm for sucha traversalis given in Algorithm 6.4.In this algorithmp, q, and r point respectivelyto the presentnode,previously visitednode,and next nodeto visit. Thealgorithmassumesthat t ^ 0 andthat an empty subtreeof nodep is representedby a link to p ratherthan a zero.Provethat Algorithm 6.4is correct.(Hint: Threelinks,Ichild,rchild,and onefrom itsparent,areassociatedwith eachnodes. Each times is visited,the links are rotatedcounterclockwise,andsoafter threevisits they arerestoredto the originalconfigurationandthe algorithmbacksup the tree.)

11.[Level-ordertraversal] In a level-ordertraversalof a binary tree t allnodeson level i are visited beforeany nodeon level i' + 1 is visited.Within a level,nodesarevisited left to right. In level-orderthe nodesof the treeof Figure6.1arevisited in the orderABCDEFGHI.Writean algorithmLevel(t) to traversethe binary treet in level order.Howmuch timeand spaceareneededby your algorithm?

6.2 TECHNIQUESFOR GRAPHSA fundamentalproblemconcerninggraphs is the reachability problem.Inits simplestform it requiresus to determinewhetherthereexistsa path inthe given graph G = (V, E) such that this path startsat vertexv and endsat vertexu. A moregeneralform is to determinefor a given startingvertexv E V allverticesu suchthat thereis a path from v to u.Thislatterproblemcanbesolvedby startingat vertexv and systematically searchingthe graphG for verticesthat can bereachedfrom v. We describetwo searchmethodsfor this.

Page 335: Sahni

6.2.TECHNIQUESFOR GRAPHS 319

1234567891011

AlgorithmTriple(i){

if t / 0 then{

Visit(t);Triple^\342\200\224> Ichild);Visit(i);Triple(t \342\200\224> rchild);Visit (t);

}}

Algorithm6.3Triple-ordertraversalfor Exercise10

1234567891011

AlgorithmTrip(i);//{

}

It is assumedthat Ichildand rchildfieldsare>0.

Pw{

\\

= t;g:=-l;hile(p ^ -1)do

Visit(p);r :=(p \342\200\224> Ichild);(p \342\200\224> Ichild):= (p \342\200\224> rchild);(p \342\200\224> rchild) :=q; q :=p;p :=r;

Algorithm6.4A nonrecursivealgorithmfor the triple-ordertraversalforExercise10

Page 336: Sahni

320CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

6.2.1BreadthFirstSearchandTraversalIn breadthfirst searchwe start at a vertex v and mark it as having beenreached(visited). The vertex v is at this timesaid to be unexplored.Avertexis saidto have beenexploredby an algorithmwhen the algorithmhasvisited allverticesadjacentfrom it. All unvisited verticesadjacentfrom varevisited next.Thesearenew unexploredvertices.Vertexv has now beenexplored.Thenewly visited verticeshaven'tbeenexploredand areput ontothe endof a list of unexploredvertices.Thefirst vertexonthis list is the nextto beexplored.Explorationcontinuesuntil no unexploredvertexis left.Thelist of unexploredverticesoperatesasa queueand can berepresentedusingany of the standardqueuerepresentations(seeSection2.1).BFS(Algorithm6.5)describes,in pseudocode,the detailsof the search.It makesuse of thequeuerepresentationgiven in Section2.1(Algorithm2.3).

Example6.1Let us try out the algorithmon the undirectedgraphofFigure 6.4(a).If the graph is representedby its adjacencylistsas in Figure6.4(c),then the verticesget visited in the order1,2, 3,4, 5, 6, 7, 8. A

breadthfirst searchof the directedgraphof Figure6.4(b)starting at vertex1resultsin only the vertices1,2, and 3 beingvisited.Vertex4 cannot bereachedfrom 1. \342\226\241

Theorem6.2Algorithm BFSvisits allverticesreachablefrom v.

Proof:Let G = (V, E) be a graph (directedor undirected)and let v \302\243 V.We prove the theoremby inductionon the lengthof the shortestpath fromv to every reachablevertexw EV.The length(i.e.,numberof edges)of theshortestpath from v to a reachablevertexw is denotedby d(v,w).

BasisStep.Clearly, allverticesw with d(v,w) < 1get visited.

InductionHypothesis.Assume that all verticesw with d(v,w) < r getvisited.

InductionStep.We now show that allverticesw with d(v, w) = r + 1alsoget visited.

Let wbeavertex in V suchthat d(v,w) = r + 1.Let u bea vertexthatimmediately precedesw on a shortestv to w path. Then d(v,u) = r and sou getsvisited by BFS.We canassume

\302\253/\302\253and r > 1.Hence,immediately

beforeu getsvisited,it is placedon the queueq of unexploredvertices.Thealgorithmdoesn'tterminateuntil q becomesempty. Hence,u is removedfrom q at sometimeand allunvisited verticesadjacentfrom it get visited inthe for loopof line11of Algorithm 6.5.Hence,w getsvisited. \342\226\241

Page 337: Sahni

6.2.TECHNIQUESFORGRAPHS 321

1 AlgorithmBFS(w)2 //A breadthfirst searchof G is carriedout beginning3 // at vertexv. Forany nodei,visited[i]= 1if i has4 // already beenvisited.Thegraph G and array visited[]5 // areglobal;visited[] is initializedto zero.6 {7 u :=v; // q is a queueof unexploredvertices.8 visited[v] :=1;9 repeat10 {11 for allverticesw adjacentfrom u do12 {13 if (visited[w]= 0) then14 {15 Add w to q; // w is unexplored.16 visited[w] :=1;17 }18 }19 if q is empty thenreturn;// No unexploredvertex.20 Deleteu from q; // Get first unexploredvertex.21 }until(false);22 \\

Algorithm6.5Pseudocodefor breadthfirst search

Theorem6.3Let T(n,e) and S(n,e) bethe maximumtimeand maximumadditionalspacetaken by algorithmBFS on any graph G with n verticesand e edges.T(n,e)= 9(n+ e) and S(n,e)= 9(n) if G is representedby its adjacencylists.If G is representedby its adjacencymatrix,thenT(n,e)= 6(n2)and S(n,e)= 0(n).Proof:Verticesget addedtothe queueonly in line15of Algorithm 6.5.Avertexw cangetontothe queueonly if visited[w] = 0.Immediatelyfollowingw's additionto the queue,visited[w] is set to 1(line16).Hence,eachvertexcanget ontothe queueat most once.Vertexv never getsontothe queueandsoat most n \342\200\224 1additionsaremade.Thequeuespaceneededis at most n \342\200\224 1.Theremainingvariablestake0(1)space.Hence,S(n,e) = 0(n).If G is ann-vertexgraphwith v connectedtothe remainingn \342\200\224 1vertices,then all n \342\200\224 1verticesadjacentfrom v areon the queueat the sametime. Furthermore,Q(n) spaceis neededfor the array visited.HenceS(n,e)= 9(n). Thisresult is independentof whetheradjacencymatricesor listsareused.

Page 338: Sahni

322CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

(a)Undirectedgraph G

(b)Directedgraph

headnodes

[1][2][3][4]

[5]

[6][7]

[8]

- 0

-, i

-, i

-, 0

-, 0

> 3

-, T.

-^ A

-, 3

-, /|

- 6

- 8

-, 8

-, 8

> 8

-, ^

0

0000

-, ^

-~, 7

-, 6

0o-

^

(c)Adjacencylist for G

Figure6.4Examplegraphsand adjacencylists

Page 339: Sahni

6.2.TECHNIQUESFOR GRAPHS 323

1 AlgorithmBFT(G,n)2 // Breadthfirst traversalof G3 {4 for i :=1to n do // Mark allverticesunvisited.5 visited[i]:=0;6 for i :=1to n do7 if (visited[i]= 0) thenBFS(i);8 }

Algorithm6.6Breadthfirst graphtraversal

If adjacencylistsare used,then allverticesadjacentfrom u can bedetermined in timed(u), whered(u) is the degreeof u if G is undirectedandd(u) is the out-degreeof u if G is directed.Hence,when vertexu is beingexplored,the timefor the for loopof line 11of Algorithm 6.5is Q(d(u)).Sinceeachvertex in G can be exploredat most once,the total time forthe repeatloopof line 9 is 0(^2d(u)) = 0(e).Then visited[i]has to beinitializedto 0,1< i < n. This takes0(n)time. Thetotal timeistherefore 0(n+ e). If adjacencymatricesareused,then it takes0(n)timetodetermineall verticesadjacentfrom u and the timebecomes0(n2).If Gis a graph such that all verticesare reachablefrom v, then allverticesgetexploredand the timeis at least0(n+ e) and 0(n2)respectively.Hence,T(n,e)= 0(n+ e) when adjacencylistsareused,and T(n,e)= 0(n2)whenadjacencymatricesareused. \342\226\241

If BFSis usedon a connectedundirectedgraphG,then allverticesin Gget visited and the graph is traversed.However,if G is not connected,thenat leastonevertexof G is not visited.A completetraversalof the graphcanbemadeby repeatedlycallingBFSeachtimewith a new unvisited startingvertex.Theresultingtraversalalgorithmis known as breadthfirst traversal(BFT)(seeAlgorithm 6.6).The proofof Theorem6.3can beused for BFTtooto show that the timeand additionalspacerequiredby BFT on an n-vertexe-edgegraphare0(n+ e) and 0(n)respectivelyif adjacencylistsareused.If adjacencymatricesareused,then the boundsare0(n2)and

\302\251(n)

respectively.

6.2.2 DepthFirstSearchandTraversalA depth first searchof a graphdiffers from a breadthfirst searchin that theexplorationof a vertexv is suspendedassoonas a new vertexis reached.At

Page 340: Sahni

324 CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

1 AlgorithmDFS(w)2 // Given an undirected(directed)graph G = (V,E) with3 // n verticesand an array visited[] initially set4 //to zero,this algorithmvisits allvertices5 // reachablefrom v. G and visited[] areglobal.6 {7 visited[v] :=1;8 for eachvertexw adjacentfrom v do9 {10 if (visited[w]= 0) thenDFS(u;);11 }12 }

Algorithm6.7Depthfirst searchof a graph

this timethe explorationof the new vertexu begins.When this new vertexhas beenexplored,the explorationof v continues.The searchterminateswhen all reachedverticeshave beenfully explored.This searchprocessisbestdescribedrecursivelyas in Algorithm 6.7.

Example6.2A depth first searchof the graphof Figure6.4(a)startingatvertex1and usingthe adjacencylistsof Figure6.4(c)resultsin the verticesbeingvisited in the order1,2, 4, 8, 5, 6, 3,7. \342\226\241

Onecan easily prove that DFSvisits allverticesreachablefrom vertexv.If T(n,e) and S(n,e) representthe maximumtimeand maximumadditionalspacetaken by DFS for an n-vertex e-edgegraph, then S(n,e) =

\302\251(n)

and T(n,e)= 9(n+ e) if adjacencylistsare used and T(n,e)= 9(n2)ifadjacencymatricesareused (seethe exercises).

A depth first traversalof a graph is carriedout by repeatedlycallingDFS,with a new unvisited startingvertexeachtime.Thealgorithmfor this(DFT)differs from BFT only in that the callto BFS(i)is replacedby a callto DFS(i).Theexercisescontainsomeproblemsthat aresolvedbestby BFSand othersthat aresolved bestby DFS.Latersectionsof this chapteralsodiscussgraphproblemssolvedbestby DFS.

BFSand DFSare two fundamentally different searchmethods.In BFSanodeis fully exploredbeforethe explorationof any othernodebegins.Thenext nodeto exploreis the first unexplorednoderemaining.The exercisesexaminea searchtechnique(D-search)that differs from BFS only in that

Page 341: Sahni

6.3.CONNECTEDCOMPONENTSAND SPANNINGTREES 325

the next nodeto exploreis the most recently reachedunexplorednode.InDFS the explorationof a nodeis suspendedas soon as a new unexplorednodeis reached.Theexplorationof this new nodeis immediately begun.

EXERCISES1.Devisean algorithmusingthe ideaof BFSto find a shortest(directed)

cycle containinga given vertex v. Prove that your algorithmfindsa shortestcycle.What are the timeand spacerequirementsof youralgorithm?

2.Showthat DFSvisits allverticesin G reachablefrom v.

3.Provethat the boundsof Theorem6.3holdfor DFS.

4. It is easy to seethat for any graph G,both DFS and BFS will takealmostthe sameamount of time. However, the spacerequirementsmay beconsiderablydifferent.

(a) Give an exampleof an n-vertexgraph for which the depth ofrecursion of DFSstartingfrom a particularvertexv is n \342\200\224 1whereasthe queueof BFShas at most onevertexat any given timeif BFSis startedfrom the samevertexv.

(b) Give an exampleof an n-vertexgraphfor which the queueof BFShas n \342\200\224 1verticesat one timewhereasthe depth of recursionofDFS is at most one. Both searchesare startedfrom the samevertex.

5. Another way tosearcha graph is D-search.This methoddiffers fromBFS in that the next vertex to exploreis the vertex most recentlyaddedto the list of unexploredvertices.Hence,this list operatesas astackratherthan a queue.

(a) Write an algorithmfor D-search.(b) Showthat D-searchstartingfrom vertexv visits allvertices

reachable from v.

(c) What arethe timeand spacerequirementsof your algorithm?

6.3 CONNECTEDCOMPONENTSANDSPANNINGTREES

If G is a connectedundirectedgraph, then allverticesof G will get visitedon the first call to BFS (Algorithm 6.5).If G is not connected,then at

Page 342: Sahni

326 CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

leasttwo callsto BFSwill beneeded.Hence,BFScan beusedtodeterminewhetherG is connected.Furthermore,allnewly visited verticeson a calltoBFSfrom BFT representthe verticesin a connectedcomponentof G.Hencethe connectedcomponentsof a graphcan beobtainedusingBFT.Forthis,BFS can be modified sothat all newly visited verticesare put onto a list.Then the subgraphformed by the verticeson this list make up a connectedcomponent.Hence,if adjacencylistsareused,a breadthfirst traversalwillobtain the connectedcomponentsin 9(n+ e) time.

BFT can alsobeused to obtain the reflexive transitiveclosurematrixofan undirectedgraph G. If A* is this matrix,then A*(i,j)= 1 iff eitheri = j or i ^ j and i and j are in the sameconnectedcomponent.We canset up in Q(n + e) timean array connecsuch that connec[i]is the indexof the connectedcomponentcontainingvertex i, 1 < i < n. Hence,wecan determinewhetherA*(i,j),i ^ J, is 1or 0 by simply seeingwhetherconnec[i]= connec[j].The reflexive transitiveclosurematrixof anundirected graph G with n verticesand e edgescan thereforebe computedin0(n2)timeand G(n)spaceusingeitheradjacencylistsormatrices(thespacecountdoesnot includethe spaceneededfor A* itself).

As a final applicationof breadthfirst search,considerthe problemofobtaininga spanningtree for an undirectedgraph G. Thegraph G has aspanningtreeiff G is connected.Hence,BFSeasily determinesthe existenceof a spanningtree.Furthermore,considerthe setof edges(u,w) used in thefor loopof line 11of algorithmBFS to reachunvisited verticesw. Theseedgesarecalledforward edges.Let t denotethe setof theseforward edges.We claimthat if G is connected,then tisa spanningtreeof G.Forthe graphof Figure6.4(a)the setof edgest will bealledgesin G except(5,8), (6,8),and (7,8) (seeFigure6.5(b)).Spanningtreesobtainedusinga breadthfirstsearcharecalledbreadth first spanningtrees.

(a)DFS(l)spanningtree (b)BFS(l)spanningtree

Figure6.5DFSand BFSspanningtreesfor the graphof Figure6.4(a)

Page 343: Sahni

6.3.CONNECTEDCOMPONENTSAND SPANNINGTREES 327

Theorem6.4Modify algorithmBFS by addingon the statementst :=0;and t := t U {(u,w)};to lines 8 and 16,respectively.Call the resultingalgorithmBFS*.If BFS*is calledsothat v is any vertex in a connectedundirectedgraph G, then on termination,the edgesin t form a spanningtreeof G.Proof:We have already seenthat if G is a connectedgraphon n vertices,then all n verticeswill getvisited.Also,eachof these,exceptthe startvertexv, will get ontothe queueonce(line15).Hence,t will containexactlyn \342\200\224 1edges.All theseedgesaredistinct.Then \342\200\224 1edgesin t will thereforedefinean undirectedgraph on n vertices.This graph will be connectedsinceitcontainsa path from the start vertexv to every othervertex(andsothereis a path betweeneachtwo vertices).A simpleproofby inductionshowsthat every connectedgraphon n verticeswith exactlyn \342\200\224 1edgesis a tree.Hencet is a spanningtreeof G. \342\226\241

As in the caseof BFT,the connectedcomponentsof a graph can beobtainedusingDFT.Similarly,the reflexive transitiveclosurematrixof anundirectedgraphcanbefound usingDFT.If DFS(Algorithm6.7)is modifiedby adding t := 0;and t := t U {(v,w)}',to line 7 and the if statementofline 10,respectively, then when DFS terminates,the edgesin t define aspanningtree for the undirectedgraph G if G is connected.A spanningtree obtainedin this manner is calleda depth first spanningtree. Forthegraphof Figure6.4(a)the spanningtreeobtainedwill includealledgesin Gexceptfor (2,5),(8,7),and (1,3)(seeFigure6.5(a)).Hence,DFSand BFSareequally powerfulfor the searchproblemsdiscussedso far.

EXERCISES1.Show that for any undirectedgraph G = (V, E),a callto BFS(w)with

v \342\202\254 V resultsin visiting all the verticesin the connectedcomponentcontainingv.

2. RewriteBFS and BFT so that all the connectedcomponentsof theundirectedgraph G get printedout.Assume that G is input inadjacency list form.

3. Provethat if G is a connectedundirectedgraph with n verticesandn \342\200\224 1edges,then G is a tree.

4. Presenta D-search-basedalgorithmthat producesa spanningtreeforan undirectedconnectedgraph.

5. (a) Theradiusof a treeis its depth.Showthat the forward edgesusedin BFS(w)define a spanningtree with root v having minimumradius among all spanningtrees,for the undirectedconnectedgraph G having rootv.

Page 344: Sahni

328CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

(b) Usingthe resultof part (a),write an algorithmto find a minimum-radiusspanningtreefor G.What arethe timeand spacerequirements of your algorithm?

6. The diameterof a treeis the maximumdistancebetweenany twovertices. Let d be the diameterof a minimum-diameterspanningtreeforan undirectedconnectedgraphG.Let r be the radiusof a minimum-radiusspanningtreefor G.

(a) Show that 2r- 1< d < 2r.(b) Write an algorithmto find a minimum-diameterspanningtree

for G. (Hint: Usebreadth-firstsearchfollowed by somelocalmodification.)

(c) Provethat your algorithmis correct.(d) What are the timeand spacerequirementsof your algorithm?

7. A bipartite graph G = [V, E) is an undirectedgraph whoseverticescan be partitionedinto two disjointsets V\\ and V2 = V \342\200\224

V\\ withthe propertiesthat no two verticesin V\\ are adjacentin G and notwo verticesin V2 are adjacentin G. Thegraph G of Figure6.4(a)is bipartite.A possiblepartitioningof V is V\\ = {1,4,5,6,7}andV2 \342\200\224 {2,3,8}.Write an algorithmto determinewhethera graph G isbipartite.If G is bipartite,your algorithmshouldobtaina partitioningof the verticesinto two disjointsetsV\\ and V2 satisfying the propertiesabove. Show that if G is representedby its adjacencylists,then thisalgorithmcan be madeto work in time0(n+e),wheren =

\\V\\ ande= \\E\\.

8.Write an algorithmto find the reflexive transitiveclosurematrixA*of a directedgraphG.Show that if G has n verticesand e edgesandis representedby its adjacencylists,then this can be done in time9(n2+ne).How much spacedoesyour algorithmtakein additiontothat neededfor G and A*? (Hint: UseeitherBFSor DFS.)

9. Input is an undirectedconnectedgraph G(V,E) eachone of whoseedgeshas the sameweight w (w beinga realnumber).Give an 0(\\E\\)timealgorithmto find a minimum-costspanningtreefor G.What isthe weight of this tree?

10.Given are a directedgraph G(V,E)and a nodev \342\202\254 V. Write anefficient algorithmto decidewhetherthereis a directedpath from vto every othernodein the graph.What is the worst-caserun timeofyour algorithm?

11.Designan algorithmto decidewhethera given undirectedgraphG(V,E)containsa cycleof length4.Therunningtimeof the algorithmshouldbeO(\\V\\3).

Page 345: Sahni

6.4.BICONNECTEDCOMPONENTSAND DFS 329

12.Let G(V,E) bea binary treewith n nodes.Thedistancebetweentwoverticesin G is the lengthof the path connectingthesetwo vertices.Theproblemis to constructannxnmatrixwhoseijthentry is thedistance betweenvi and Vj. Designan 0(n2)timealgorithmto constructsuch a matrix. Assume that the tree is given in the adjacency-listrepresentation.

13.Presentan 0(|V|)timealgorithmto checkwhethera given undirectedgraph G(V,E) is a tree. The graph G is given in the form of anadjacencylist.

6.4 BICONNECTEDCOMPONENTSAND DFS

In this section,by \"graph\" we always meanan undirectedgraph.A vertexv in a connectedgraph G is an articulationpoint if and only if the deletionof vertexv togetherwith all edgesincidentto v disconnectsthe graph intotwo or morenonempty components.

Example6.3In the connectedgraphof Figure6.6(a)vertex2 is anarticulation point as the deletionof vertex2 and edges(1,2),(2,3),(2,5),(2,7),and (2,8)leaves behind two disconnectednonempty components(Figure6.6(b)).Graph G of Figure6.6(a)has only two other articulationpoints:vertex5 and vertex3.Note that if any of the remainingverticesis deletedfrom G,then exactlyonecomponentremains. \342\226\241

A graphG is biconnectedif and only if it containsno articulationpoints.Thegraph of Figure6.6(a)is not biconnected.Thegraph of Figure6.7isbiconnected.The presenceof articulationpoints in a connectedgraph canbean undesirablefeaturein many cases.For example,if G representsacommunication network with the verticesrepresentingcommunicationstationsand the edgescommunicationlines,then the failure of a communicationstation i that is an articulationpoint would result in the lossof communicationto points other than i too.On the otherhand, if G has no articulationpoint, then if any stationi fails, we can stillcommunicatebetweeneverytwo stationsnot includingstationi.

In this sectionwe developan efficient algorithmto test whetheraconnected graphis biconnected.For the caseof graphsthat arenot biconnected,this algorithmwill identify all the articulationpoints.Onceit has beendetermined that a connectedgraph G is not biconnected,it may bedesirableto determinea set of edgeswhoseinclusionmakesthe graph biconnected.Determiningsucha set of edgesis facilitatedif we know the maximalsubgraphs of G that arebiconnected.G'= (V',E')is a maximalbiconnectedsubgraphof G if and only if G has no biconnectedsubgraphG\" = (V\", E\

Page 346: Sahni

330CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

(a)Graph G (b)Resultof deletingvertex 2

Figure6.6An examplegraph

such that V C V\" and E' c E\". A maximalbiconnectedsubgraph is abiconnectedcomponent.

Thegraph of Figure6.7has only one biconnectedcomponent(i.e.,theentiregraph). The biconnectedcomponentsof the graph of Figure6.6(a)areshown in Figure6.8.

Figure6.7A biconnectedgraph

It is relatively easy to show that

Lemma6.1Two biconnectedcomponentscan have at most one vertex incommonand this vertexis an articulationpoint. \342\226\241

Hence,no edgecan be in two different biconnectedcomponents(as thiswould requiretwo commonvertices).The graph G can be transformedintoa biconnectedgraphby usingthe edgeadditionschemeof Algorithm 6.8.

Page 347: Sahni

6.4.BICONNECTEDCOMPONENTSAND DFS 331

Sinceevery biconnectedcomponentof a connectedgraph G containsatleasttwo vertices(unlessG itselfhas only onevertex),it follows that the Wj

of line5 exists.

Example6.4Usingthe above schemeto transformthe graph of Figure6.6(a)into a biconnectedgraph requiresus to addedges(4,10)and (10,9)(correspondingto the articulationpoint 3),edge(1,5)(correspondingto thearticulationpoint 2), and edge(6,7) (correspondingto point 5). \342\226\241

Figure6.8Biconnectedcomponentsof graph of Figure6.6(a)

Note that oncethe edges(vj,Vj+i) of line 6 (Algorithm6.8)areadded,vertex a is no longeran articulationpoint.Hencefollowing the addition

1 for eacharticulationpoint a do2 {3 Let B\\, B'2,\342\226\240\342\226\240

\342\226\240, -B/c be the biconnected4 componentscontainingvertexa;5 Let Vi, Vi j\302\243 a, bea vertexin Bi,1< i < k;6 Add toG the edges(wi,^+i), 1< i < k;7 }

Algorithm6.8Schemetoconstructa biconnectedgraph

Page 348: Sahni

332CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

of the edgescorrespondingto allarticulationpoints,G has no articulationpoints and sois biconnected.If G has p articulationpoints and b bicon-nectedcomponents,then the schemeof Algorithm 6.8introducesexactlyb \342\200\224 p new edgesinto G. One can show that this schememay use morethan the minimum numberof edgesneededto make G biconnected(seetheexercises).

Now, letus attackthe problemof identifying the articulationpointsandbiconnectedcomponentsof a connectedgraph G with n > 2 vertices.Theproblemis efficiently solved by consideringa depthfirst spanningtreeof G.

Figure6.9(a)and (b) shows a depthfirst spanningtreeof the graph ofFigure6.6(a).In eachfigure thereis a numberoutsideeachvertex.Thesenumberscorrespondto the orderin which a depthfirst searchvisits theseverticesand arereferredto as the depth first numbers (dfns) of the vertex.Thus, dfn[l]= 1,dfn[4] = 2,dfn[6] = 8,and soon.In Figure6.9(b)solidedgesform the depth first spanningtree. Theseedgesarecalledtreeedges.Brokenedges(i.e.,all the remainingedges)arecalledbackedges.

(a) (b)

Figure6.9A depthfirst spanningtreeof the graphof Figure6.6(a)

Depthfirst spanningtreeshave a property that is very usefulin identifyingarticulationpointsand biconnectedcomponents

Page 349: Sahni

6.4.BICONNECTEDCOMPONENTSAND DFS 333

Lemma6.2If (it, v) is any edgein G,then relativeto the depth firstspanning

treet, eitheru is an ancestorof v orv is an ancestorof it.So,thereareno crossedgesrelativeto a depth first spanningtree ((it,v) is a crossedgerelativeto t if and only if u is not an ancestorof v and v is not an ancestorof it).

Proof:To seethis, assumethat (it, v) \342\202\254 E(G)and (it, v) is a crossedge.Then (it, v) cannotbe a treeedgeas otherwiseu is the parent of v or viceversa.So,(it,v) must be a backedge.Without lossof generality, we canassumedfn[u] < dfn[v]. Sincevertexu is visited first, itsexplorationcannotbe completeuntil vertex v is visited. Fromthe definition of depth firstsearch,it follows that it is an ancestorof all the verticesvisited until it iscompletelyexplored.Henceit is an ancestorof v in t and (it, v) cannotbe acrossedge. \342\226\241

We make the following observation

Lemma6.3Therootnodeof a depth first spanningtreeis an articulationpoint iff it has at leasttwo children.Furthermore,if it is any othervertex,then it is not an articulationpoint iff from every childw of it it is possibleto reachan ancestorof u usingonly a path madeup of descendentsof w anda backedge. \342\226\241

Note that if this cannotbedonefor somechildw of u, then the deletionofvertex it leaves behindat leasttwo nonempty components(onecontainingthe root and the othercontainingvertexw). This observationleadsto asimplerule to identify articulationpoints.For eachvertexit, define L[u] asfollows:

L[u] \342\200\224 min {dfn[u\\, min {L[w}\\

w is a childof it},min {dfn[w} \\

(u,w) is a back edge}}

It shouldbe clearthat L[u] is the lowestdepthfirst numberthat can bereachedfrom it usinga path of descendentsfollowed by at most one backedge.From the precedingdiscussionit follows that if u is not the root,thenit is an articulationpoint iff it has a childw suchthat L[w] >dfn[u].

Example6.5Forthe spanningtreeof Figure6.9(b)the L values areL[l:10]= {1,1,1,1,6,8,6,6,5,4}.Vertex3 is an articulationpoint as child10has

\302\243[10]= 4 and d/n[3]= 3. Vertex2 is an articulationpoint as child5

has L[5]= 6 and dfn[2] = 6. Theonly otherarticulationpoint is vertex5;child6 has L[6]= 8 and dfn[5\\

= 7. O

L[u] can be easily computedif the verticesof the depthfirst spanningtree are visited in postorder.Thus, to determinethe articulationpoints,

Page 350: Sahni

334 CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

it is necessarytoperforma depth first searchof the graph G and visit thenodesin the resultingdepth first spanningtreein postorder.It is possibletodoboth thesefunctions in parallel.PseudocodeArt (Algorithm 6.9)carriesout a depth first searchof G.Duringthis searcheachnewly visited vertexgetsassignedits depthfirst number.At the sametime,L[i] is computedforeachvertex in the tree. This algorithmassumesthat the connectedgraphG and the arrays dfn and L areglobal.In addition,it is assumedthat thevariable num is alsoglobal.It is clearfrom the algorithmthat when vertexu has beenexploredand a return madefrom the function,then L[u] hasbeencorrectlycomputed.Note that in the elseclauseof line15,if w ^ v,then either(u,w) is a backedgeor dfn[w] > dfn[u] > L[u\\. In eithercase,L[u] is correctlyupdated.The initial callto Art is Art(l,0).Note dfn isinitializedto zerobeforeinvoking Art.

1 AlgorithmArt(it, v)2 // u is a start vertexfor depth first search,v is its parent if any3 // in the depth first spanningtree. It is assumedthat the global4 // array dfn is initializedto zeroand that the globalvariable5 // num is initializedto 1.n is the numberof verticesin G.6 {7 dfn[u] :=num,', L[u] :=num,', num :=num + 1;8 for eachvertexw adjacentfrom u do9 {10 if (dfn[w]= 0) then11 {12 Art(w,u);// w is unvisited.13 L[u] :=mm(L[u],L[w]);14 }15 elseif (w ^ v) thenL[u] :=mm(L[u],dfn[w]);16 }17 }

Algorithm6.9Pseudocodeto computedfn and L

OnceL[l:n] has beencomputed,the articulationpointscanbe identifiedin 0(e)time.SinceArt has a complexity0(n+e),wheree is the numberofedgesin G,the articulationpointsof G canbedeterminedin 0(n+e) time.

Now,what needsto bedoneto determinethe biconnectedcomponentsofGl If following the callto Art (line12)L[w] >dfn[u],then we know that uis eitherthe rootor an articulationpoint.Regardlessof whetheru is not therootoris the rootand has oneormorechildren,the edge(u,w) togetherwith

Page 351: Sahni

6.4.BICONNECTEDCOMPONENTSAND DFS 335

all edges(both tree and back)encounteredduring this callto Art (exceptfor edgesin other biconnectedcomponentscontainedin subtreew) formsa biconnectedcomponent.A formal proofof this statementappearsin theproofof Theorem6.5.Themodifiedalgorithmappearsas Algorithm 6.10.

AlgorithmBiComp(u,v)II u is a start vertexfor depthfirst search,v is its parent if

// any in the depthfirst spanningtree. It is assumedthat the// globalarray dfn is initially zeroand that the globalvariable// num is initializedto 1.n is the numberof verticesin G.

dfn[u] :=num',L[u] :=num',num :=num + 1;for eachvertexw adjacentfrom u do{

if ((w y^ w) and (dfn[w]< dfn[u]))thenadd (u,w) to the topof a stacks;

if (dfn[w]= 0) then{

if (L[w]>dfn[u])then{

write (\"Newbicomponent\;repeat{

Deletean edgefrom the topof stacks;Let this edgebe (a;,y)',write (x,y);

}until(((x,y) = (u,w)) or ((x,y) = (w,u)));}.B\\Comp(w,u)',I/ w is unvisited.L[u] :=min(L[it],L[\302\253;]);

}elseif (w ^ v) thenL[u] :=mm(L[u],dfn[w])',

1234567899.19.2101111.111.211.311.411.511.611.711.811.911.10121314151617

Algorithm6.10Pseudocodetodeterminebicomponents

Onecanverify that the computingtimeof Algorithm 6.10remains0(n+e).Thefollowing theoremestablishesthe correctnessof the algorithm.Notethat when G has only onevertex,it has no edgessothe algorithmgenerates

Page 352: Sahni

336CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

no output.In this caseG doeshave a biconnectedcomponent,namely itssinglevertex.Thiscasecanbehandledseparately.Theorem6.5Algorithm 6.10correctlygeneratesthe biconnectedcomponents of the connectedgraph G when G has at leasttwo vertices.

Proof:Thiscanbeshown by inductionon the numberof biconnectedcomponents in G.Clearly, for allbiconnectedgraphsG,the rootu of the depthfirst spanningtree has only one childw. Furthermore,w is the onlyvertex for which L[w] > dfn[u] in line 11.1of Algorithm 6.10.By the timew has beenexplored,all edgesin G have beenoutput as one biconnectedcomponent.

Now assumethe algorithmworkscorrectlyfor all connectedgraphsG withat most m biconnectedcomponents.We show that it alsoworks correctlyfor all connectedgraphswith m + 1biconnectedcomponents.Let G beanysuch graph. Considerthe first time that L[w] > dfn[u] in line 11.1.At

this timeno edgeshave beenoutput and soall edgesin G incidentto thedescendentsof w areon the stackand areabove the edge(u,w). Sincenoneof the descendentsof u is an articulationpoint and u is,it follows that the setof edgesabove (u,w) on the stackforms a biconnectedcomponenttogetherwith the edge(u,w). Oncetheseedgeshave beendeletedfrom the stackand output, the algorithmbehavesessentially as it would on the graph G',obtainedby deletingfrom G the biconnectedcomponentjust output.Thebehaviorof the algorithmon G differs from that on G'only in that duringthe completionof the explorationof vertexu, someedges(u,r) such that(u,r) is in the componentjust output may be considered.However,for allsuchedges,dfn[r]^ 0 and dfn[r] > dfn[u] > L[u].Hence,theseedgesonlyresult in a vacuous iterationof the for loopof line8 and do not materiallyaffect the algorithm.

Onecaneasilyestablishthat G'has at leasttwo vertices.Sincein additionG' has exactlym biconnectedcomponents,it follows from the inductionhypothesis that the remainingcomponentsarecorrectlygenerated. \342\226\241

It shouldbe noted that the algorithmdescribedabove will work withany spanningtree relative to which the given graph has no crossedges.Unfortunately, graphscan have crossedgesrelativeto breadthfirst spanningtrees.Hence,algorithmArt cannotbeadaptedto BFS.

EXERCISES1.For the graphsof Figure6.10identify the articulationpointsand draw

the biconnectedcomponents.2. Showthat if G is a connectedundirectedgraph,then no edgeof G can

be in two different biconnectedcomponents.

Page 353: Sahni

6.4.BICONNECTEDCOMPONENTSAND DFS 337

(a)

Figure6.10Graphsfor Exercise1

3. Let Gi = (Vi,Ei),l< i < k, be the biconnectedcomponentsof aconnectedgraphG.Show that

(a) If % t^ j, then ViflVj containsat most onevertex.(b) Vertexv is an articulationpoint of G iff {v}= Vi fl Vj for somei

and j, i ^ j.4. Show that the schemeof Algorithm 6.8may use morethan the

minimum numberof edgesneededto make G biconnected.

5.Let G be a connectedundirectedgraph. Write an algorithmto findthe minimum numberof edgesthat have to be addedto G so thatG becomesbiconnected.Your algorithmshouldoutput sucha set ofedges.What are the timeand spacerequirementsof your algorithm?

6.Show that if t is a breadthfirst spanningtree for an undirectedconnected graphG,then G may have crossedgesrelativetot.

7. Provethat a nonrootvertexu is an articulationpoint iff L[w]> dfn[u]for somechildw of u.

8.Prove that in BiComp(Algorithm 6.10)if eitherv = w or dfn[w] >dfn[u],then edge(u,w) is eitheralready on the stackof edgesor hasbeenoutput as part of a biconnectedcomponent.

9. Let G(V,E)be any connectedundirectedgraph. A bridge of G isdefined tobean edgeof G which when removed from G,will makeitdisconnected.Presentan 0(|-E|)timealgorithmto find allthe bridgesof G.

10.Let S(V,T)be any DFStree for a given connectedundirectedgraphG(V,E).Provethat a leaf of 5\" can not be an articulationpoint of G.

Page 354: Sahni

338CHAPTER6. BASICTRAVERSALAND SEARCHTECHNIQUES

11.Proveordisprove:\"An undirectedgraphG(V,E) is biconnectedif andonly if for eachpair of distinctverticesv and w in V thereare twodistinctpaths from v to w that have no verticesin commonexceptvand to.\"

6.5 REFERENCES AND READINGSSeveralapplicationsof depth first searchto graph problemsare given in\"Depthfirst searchand lineargraphalgorithms,\"by R. Tarjan,SIAMJournal on Computing1,no.2 (1972):146-160.

The0{n+e) depthfirst algorithmfor biconnectedcomponentsis due toR. Tarjanand appearsin the precedingpaper.This paperalsocontainsan0(n+e) algorithmto find the strongly connectedcomponentsof a directedgraph.

An 0(n+ e) algorithmto find a smallestset of edgesthat, when addedto a graphG,producesa biconnectedgraphhasbeengiven by A. Rosenthaland A. Goldner.

Foran extensivecoverageon graph algorithmssee:DataStructuresand Network Algorithms,by R. E.Tarjan,SocietyforIndustrial and Applied Mathematics,1983.Algorithmic Graph Theory, by A. Gibbons,CambridgeUniversity Press,1985.Algorithmic Graph Theory and PerfectGraphs,by M. Golumbic,AcademicPress,1980.

Page 355: Sahni

Chapter7

BACKTRACKING

7.1 THE GENERALMETHOD

In the searchfor fundamentalprinciplesof algorithmdesign,backtrackingrepresentsone of the most generaltechniques.Many problemswhich dealwith searchingfor a set of solutionsor which ask for an optimalsolutionsatisfyingsomeconstraintscanbesolvedusingthe backtrackingformulation.The namebacktrackwas first coinedby D. H.Lehmerin the 1950s.Earlyworkerswho studiedthe processwereR. J.Walker, who gave an algorithmicaccountof it in 1960,and S.Golomband L.Baumertwho presenteda verygeneraldescriptionof it as well as a variety of applications.

In many applicationsof the backtrackmethod,the desiredsolutionisexpressibleas an n-tuple (xi,...,xn),wherethe %i fire chosenfrom somefinite set Sj. Often the problemto be solved callsfor finding one vectorthat maximizes(orminimizesorsatisfies)a criterionfunctionP(x\\,\342\226\240\342\226\240

\342\226\240, %n)-Sometimesit seeksallvectorsthat satisfy P.Forexample,sortingthe arrayof integersin a[l: n] is a problemwhosesolutionis expressibleby an n-tuple,wherex\\ is the indexin a of the ith smallestelement.The criterionfunctionP is the inequality a[xj\\ < a[xi+\\]for 1< i <n.ThesetSj is finiteand includesthe integers1throughn.Thoughsortingis not usually oneofthe problemssolvedby backtracking,it is oneexampleof a familiar problemwhosesolutioncan be formulated asan n-tuple.In this chapterwe study acollectionof problemswhosesolutionsarebestdoneusingbacktracking.

Supposeml is the sizeof set Si.Then thereare m = m\\m2 \342\226\240\342\226\240\342\226\240mn n-tuplesthat arepossiblecandidatesfor satisfying the functionP. The bruteforceapproachwould be to form all thesen-tuples,evaluateeachone withP, and save thosewhich yield the optimum.Thebacktrackalgorithmhasas its virtue the ability to yield the sameanswer with far fewer than mtrials.Itsbasicideais to build up the solutionvectoronecomponentat atimeand to usemodifiedcriterionfunctions Pi{x\\,...,xi)(sometimescalled

339

Page 356: Sahni

340 CHAPTER7. BACKTRACKING

boundingfunctions)to testwhetherthe vectorbeingformed has any chanceof success.The majoradvantageof this methodis this:if it is realizedthatthe partialvector(x\\,X2,\342\226\240\342\226\240\342\226\240,x{)can in no way leadtoan optimalsolution,then mj+i\342\226\240\342\200\242\342\200\242mn possibletest vectorscan be ignoredentirely.

Many of the problemswe solve usingbacktrackingrequirethat all thesolutionssatisfy a complexset of constraints.For any problemtheseconstraints can be divided into two categories:explicitand implicit.

Definition7.1Explicitconstraintsare rulesthat restricteachX{ to takeon values only from a given set. \342\226\241

Commonexamplesof explicitconstraintsare

X{ > 0 ot Si = {allnonnegativerealnumbers}xt = 0 or 1 ov Si = {0,1}h < %i < ut or Si = {a:li < a <ut}

Theexplicitconstraintsdependon the particularinstance/ of the problembeingsolved.All tuplesthat satisfy the explicitconstraintsdefinea possiblesolutionspacefor /.Definition7.2Theimplicitconstraintsare rulesthat determinewhich ofthe tuples in the solutionspaceof I satisfy the criterionfunction. Thusimplicitconstraintsdescribethe way in which the Xi must relateto eachother. D

Example7.1[8-queens]A classiccombinatorialproblemis to placeeightqueenson an 8 x 8 chessboardsothat no two \"attack,\" that is,sothat notwo of them areon the samerow, column,or diagonal.Let us numbertherows and columnsof the chessboard1through 8 (Figure7.1).Thequeenscanalsobenumbered1through8.Sinceeachqueenmust beon a differentrow, we can without lossof generalityassumequeen i is to be placedonrow i. All solutionsto the 8-queensproblemcan thereforebe representedas 8-tuples(x\\,...,xs),whereX{ is the columnon which queeni is placed.Theexplicitconstraintsusingthis formulation areSi = {1,2,3,4,5,6,7, 8},1< i < 8.Thereforethe solutionspaceconsistsof 88 8-tuples.The implicitconstraintsfor this problemare that no two x^s can bethe same(i.e.,allqueensmust beon different columns)and no two queenscanbeon the samediagonal.The first of thesetwo constraintsimpliesthat all solutionsarepermutationsof the 8-tuple(1,2,3, 4, 5,6,7, 8).This realizationreducesthe sizeof the solutionspacefrom 88 tuplesto 8!tuples.We seelaterhow toformulatethe secondconstraintin termsof the x\\. Expressedas an 8-tuple,the solutionin Figure7.1is (4, 6,8,2,7, 1,3,5). \342\226\241

Page 357: Sahni

7.1.THEGENERAL METHOD 341

column \342\226\272

12 3 4 5 6 7 8

Q

Q

Q

Q

Q

Q

Q

Q

Figure7.1Onesolutionto the 8-queensproblem

Example7.2 [Sum of subsets]Given positivenumbersWi, 1< i < n, andm, this problemcallsfor finding all subsetsof the w^ whose sumsare m.For example,if n = 4, (tui, 102,103,104)= (11,13,24, 7), and m = 31,thenthe desiredsubsetsare (11,13,7) and (24,7). Rather than representthesolutionvectorby the w% which sum to m, we couldrepresentthe solutionvectorby giving the indicesof theseWj. Now the two solutionsaredescribedby the vectors(1,2, 4) and (3, 4). In general,all solutionsare A;-tuples(a;i,\302\2432,

\342\226\240\342\226\240

\342\226\240, 37/c), 1< k < n, and different solutionsmay have different-sizedtuples.TheexplicitconstraintsrequireX{ \342\202\254 {j \\ j is an integerand 1<j < n}.Theimplicitconstraintsrequirethat no two be the sameand thatthe sum of the correspondingiuj'sbem. Sincewe wish toavoid generatingmultipleinstancesof the samesubset(e.g.,(1,2,4) and (1,4, 2) representthesamesubset),another implicitconstraintthat is imposedis that xi < rcj+i,1< i < k.

Inanotherformulation of the sumof subsetsproblem,eachsolutionsubsetis representedby an n-tuple(rci,\302\2432,

\342\226\240\342\226\240

\342\226\240,xn)suchthat x% \342\202\254 {0,1},1< i <n.

Then x\\ = 0 if Wi is not chosenand x\\ = 1 if Wj is chosen.The solutionsto the above instanceare (1,1,0, 1) and (0, 0, 1,1).This formulationexpressesall solutionsusing a fixed-sizedtuple.Thus we concludethattheremay be severalways to formulate a problemsothat all solutionsaretuples that satisfy someconstraints.One can verify that for both of theabove formulations,the solutionspaceconsistsof 2\" distincttuples. \342\226\241

Page 358: Sahni

342 CHAPTER7. BACKTRACKING

Backtrackingalgorithmsdetermineproblemsolutionsby systematicallysearchingthe solutionspacefor the given probleminstance.This searchisfacilitatedby usinga treeorganizationfor the solutionspace.Fora givensolutionspacemany treeorganizationsmay be possible.The next twoexamples examinesomeof the ways to organizea solutioninto a tree.

Example7.3 [n-queens]Then-queensproblemis a generalizationof the 8-queensproblemof Example7.1.Now n queensare to beplacedon an n x nchessboardsothat no two attack;that is,no two queensareonthe samerow,column,or diagonal.Generalizingour earlierdiscussion,the solutionspaceconsistsof alln!permutationsof the n-tuple(1,2,..., n). Figure7.2showsa possibletreeorganizationfor the casen = 4. A treesuchas this is calleda permutationtree. Theedgesare labeledby possiblevalues of X{. Edgesfrom level 1 to level 2 nodesspecify the values for x\\. Thus, the leftmostsubtreecontainsall solutionswith x\\ = 1;its leftmost subtreecontainsallsolutionswith x\\ = 1and x<i = 2,and soon.Edgesfrom level i to level i +1arelabeledwith the values of x\\. The solutionspaceis definedby allpathsfrom the rootnodeto a leaf node.Thereare4!= 24 leafnodesin the treeof Figure7.2. \342\226\241

\302\256\302\251\302\256\302\251\302\251\302\251@\302\251

Figure7.2Treeorganizationof the 4-queenssolutionspace.Nodesarenumberedas in depth first search.

Page 359: Sahni

7.1.THEGENERAL METHOD 343

Example7.4 [Sum of subsets]In Example7.2we gave two possibleformulations of the solutionspacefor the sum of subsetsproblem.Figures7.3and7.4show a possibletreeorganizationfor eachof theseformulationsfor thecasen = 4. The tree of Figure7.3correspondsto the variable tuple sizeformulation.Theedgesarelabeledsuch that an edgefrom a level i nodetoa level i +1noderepresentsa value for x^. At eachnode,the solutionspaceis partitionedinto subsolutionspaces.Thesolutionspaceis defined by allpaths from the rootnodeto any nodein the tree,sinceany suchpathcorresponds to a subsetsatisfying the explicitconstraints.Thepossiblepaths are() (this correspondsto the empty path from the root to itself), (1),(1,2),(1,2,3),(1,2,3,4),(1,2,4),(1,3,4),(2), (2,3),and so on.Thus, theleftmost subtreedefines all subsetscontainingw\\, the next subtreedefines allsubsetscontainingW2 but not wi, and so on.

The tree of Figure7.4 correspondsto the fixed tuple sizeformulation.Edgesfrom level i nodesto level i + 1nodesare labeledwith the value ofXi, which is eitherzeroorone.All paths from the root to a leafnodedefinethe solutionspace.The left subtreeof the rootdefinesallsubsetscontainingw\\, the right subtreedefinesall subsetsnot containingw\\, and soon.Nowthereare2 leafnodeswhich represent16possibletuples. \342\226\241

Figure7.3A possiblesolutionspaceorganizationfor the sum of subsetsproblem.Nodesarenumberedas in breadth-firstsearch.

At this point it is useful to developsometerminology regardingtreeorganizationsof solutionspaces.Each nodein this treedefines a problem

Page 360: Sahni

344 CHAPTER7. BACKTRACKING

state. All paths from the root to othernodesdefine the statespaceof theproblem.Solutionstatesare thoseproblemstatess for which the path fromthe rootto sdefinesa tuple in the solutionspace.In the treeof Figure7.3allnodesaresolutionstateswhereasin the treeof Figure7.4only leaf nodesaresolutionstates.Answer statesare thosesolutionstatess for which the pathfrom the rootto sdefinesa tuplethat is a memberof the setof solutions(i.e.,it satisfies the implicitconstraints)of the problem.Thetreeorganizationofthe solutionspaceis referredto as the statespacetree.

Figure7.4 Another possibleorganizationfor the sum of subsetsproblems.Nodesarenumberedas in D-search.

At eachinternal node in the spacetree of Examples7.3 and 7.4 thesolutionspaceis partitionedinto disjointsub-solutionspaces.Forexample,at node1of Figure7.2the solutionspaceis partitionedinto four disjointsets.Subtrees2,18,34, and 50 respectivelyrepresentall elementsof thesolutionspacewith x\\ = 1,2, 3, and4.At node2 the sub-solutionspacewithx\\ = 1 is further partitionedinto threedisjointsets.Subtree3 representsallsolutionspaceelementswith x\\ = 1and x<i = 2.For all the statespacetreeswe study in this chapter,the solutionspaceis partitionedinto disjointsub-solutionspacesat eachinternalnode.It shouldbe noted that this is

Page 361: Sahni

7.1.THEGENERAL METHOD 345

not a requirementon a statespacetree.The only requirementis that everyelementof the solutionspaceberepresentedby at leastonenodein the statespacetree.

Thestate spacetreeorganizationsdescribedin Example7.4 are calledstatic trees. This terminology follows from the observationthat the treeorganizationsare independentof the probleminstancebeing solved.Forsomeproblemsit is advantageousto usedifferent treeorganizationsfordifferent probleminstances.In this casethe treeorganizationis determineddynamically as the solutionspaceis beingsearched.Treeorganizationsthatareprobleminstancedependentare calleddynamic trees.As an example,considerthe fixed tuplesizeformulation for the sumof subsetsproblem(Example 7.4).Usinga dynamic treeorganization,one probleminstancewithn = 4 can be solved by meansof the organizationgiven in Figure7.4.Another probleminstancewith n = 4 canbesolvedby meansof a treein whichat level 1 the partitioningcorrespondsto X2 = 1and X2 = 0. At level 2the partitioningcouldcorrespondto x\\ = 1and x\\ = 0,at level 3 it couldcorrespondto X;$

= 1and x% \342\200\224 0,and soon.We seemoreof dynamic treesin Sections7.6and 8.3.

Oncea statespacetreehas beenconceivedof for any problem,thisproblem canbe solved by systematically generatingthe problemstates,determining which of theseare solutionstates,and finally determiningwhichsolutionstatesare answer states.Thereare two fundamentally differentways to generatethe problemstates.Both of thesebegin with the rootnodeand generateothernodes.A nodewhich has beengeneratedand allof whosechildrenhave not yet beengeneratedis calleda live node. Thelive nodewhosechildrenarecurrently beinggeneratedis calledthe E-node(nodebeingexpanded).A dead node is a generatednodewhich is not tobeexpandedfurther or allof whosechildrenhave beengenerated.In bothmethodsof generatingproblemstates,we have a list of live nodes.In thefirst of thesetwo methodsas soonas a new childC of the current E-nodeR is generated,this childwill becomethe new E-node.ThenR will becomethe E-nodeagainwhen the subtreeC has beenfully explored.Thiscorresponds to a depth first generationof the problemstates.In the secondstategenerationmethod,the -E-noderemainsthe .E-nodeuntil it is dead.In bothmethods,boundingfunctionsareused to kill live nodeswithout generatingall theirchildren.This is donecarefullyenoughthat at the conclusionof theprocessat leastoneanswer nodeis alwaysgeneratedorallanswer nodesaregeneratedif the problemrequiresus to find allsolutions.Depthfirst nodegenerationwith boundingfunctions is calledbacktracking.Stategenerationmethodsin which the E-noderemainsthe E-nodeuntil it is deadleadtobranch-and-boundmethods.Thebranch-and-boundtechniqueis discussedin Chapter8.

Thenodesof Figure7.2have beennumberedin the orderthey would begeneratedin a depth first generationprocess.The nodesin Figures7.3and

Page 362: Sahni

346 CHAPTER7. BACKTRACKING

7.4 have beennumberedaccordingto two generationmethodsin which theE-noderemainsthe E-nodeuntil it is dead.In Figure7.3eachnew nodeisplacedinto a queue.When all the childrenof the current-E-nodehave beengenerated,the next nodeat the front of the queuebecomesthe new .E-node.In Figure7.4new nodesareplacedinto a stackinsteadof a queue.Currentterminologyis not uniform in referringto thesetwo alternatives.Typicallythe queuemethodis calledbreadthfirst generationand the stackmethodiscalledD-search(depthsearch).

Example7.5 [4-queens]Let us seehow backtrackingworkson the 4-queensproblemof Example7.3.As a boundingfunction,we usethe obviouscriteriathat if {x\\,X2,\342\226\240\342\226\240

\342\226\240, Xi) is the path to the current E-node,then all childrennodeswith parent-childlabelingsXi+\302\261

aresuchthat (x\\,...,a?j+i)representsa chessboardconfigurationin which no two queensareattacking.We startwith the root nodeas the only live node.This becomesthe E-nodeandthe path is (). We generateonechild.Let us assumethat the childrenaregeneratedin ascendingorder.Thus,nodenumber2 of Figure7.2isgeneratedand the path is now (1).This correspondsto placingqueen1on column1. Node2 becomesthe E-node.Node3 is generatedand immediatelykilled.The next nodegeneratedis node8 and the path becomes(1,3).Node8 becomesthe E-node.However, it gets killedas all its childrenrepresentboardconfigurationsthat cannot lead to an answer node.Webacktrackto node2 and generateanotherchild,node13.Thepath is now(1,4). Figure7.5shows the boardconfigurationsas backtrackingproceeds.Figure7.5showsgraphicallythe stepsthat the backtrackingalgorithmgoesthrough as it tries to find a solution.Thedotsindicateplacementsof aqueenwhich were triedand rejectedbecauseanotherqueenwas attacking.In Figure7.5(b)the secondqueenis placedon columns1and 2 and finallysettleson column3. In Figure7.5(c)the algorithmtriesall four columnsand is unableto placethe next queenon a square.Backtrackingnow takesplace. In Figure7.5(d) the secondqueen is moved to the next possiblecolumn,column4 and the third queenis placedon column2.The boardsinFigure7.5(e), (f), (g),and (h) show the remainingstepsthat the algorithmgoesthroughuntil a solutionis found.

Figure7.6shows the part of the tree of Figure 7.2 that is generated.Nodesarenumberedin the orderin which they aregenerated.A nodethatgets killedas a result of the boundingfunction has a B underit. Contrastthis treewith Figure7.2which contains31nodes. \342\226\241

With this examplecompleted,we are now ready to presenta preciseformulation of the backtrackingprocess.We continueto treat backtrackingin a generalway. We assumethat allanswer nodesare to be found and notjust one.Let (x\\,X2,\342\226\240

\342\226\240., Xi) bea path from the rootto a nodein a statespacetree. Let T(x\\,X2,\342\226\240\342\200\242

\342\226\240, xi) be the setof allpossiblevalues for Xi+\\ suchthat(x\\,X2,\342\226\240\342\226\240\342\226\240,Xi+i) is alsoa path to a problemstate.T(x\302\261,X2, \342\226\240\342\226\240\342\226\240,xn) = 0.

Page 363: Sahni

7.1.THEGENERAL METHOD 347

(a) (b)

1 1

21

21

32

(c) (d)

1

32

1 1

23

1

4

2

(e) (f) (g) (h)

Figure7.5Exampleof a backtracksolutionto the 4-queensproblem

x* =l

xA=3

Figure7.6Portionof the treeof Figure7.2that is generatedduringbacktracking

Page 364: Sahni

348 CHAPTER7. BACKTRACKING

We assumetheexistenceof boundingfunction\302\243?*+i (expressedas predicates)

such that if Bi+i(x\\,X2,\342\226\240\342\226\240\342\226\240, #i+i) is false for a path (x\302\261,X2,\342\226\240\342\226\240

\342\226\240, #j+i) fromthe root nodeto a problemstate, then the path cannot be extendedtoreachan answer node.Thus the candidatesfor positioni +1of the solutionvector (x\\,...,xn) are those values which are generatedby T and satisfyBi+\\. Algorithm 7.1presentsa recursiveformulation of the backtrackingtechnique.It is natural to describebacktrackingin this way sinceit isessentiallya postordertraversalof a tree (seeSection6.1).This recursiveversionis initially invokedby

Backtrack(l);

1 AlgorithmBacktrack(/c)2 // This schemadescribesthe backtrackingprocessusing3 // recursion.On entering,the first k \342\200\224 1values

1],x[2],..., x[k\342\200\224 1] of the solutionvectorn] have beenassigned.x[ ] and n areglobal.

4 I/ x5 II x6 {7 for (eachx[k]\342\202\254 T(x[l],...,x[k-1])do8 {9 if (Bk{x[l],x[2],...,x[k])+ 0) then10 {11 if (a;[l],a;[2],...,x[k]is a path to an answer node)12 then write (x[l :k]);13 if {k <n) thenBacktrack(fe+ 1);14 }15 }16 >

Algorithm7.1Recursivebacktrackingalgorithm

Thesolutionvector(x\\,...,xn), is treatedas a globalarray x[l:ri\\. Allthe possibleelementsfor the kth positionof the tuple that satisfy Bk aregenerated,one by one,and adjoinedto the current vector {x\\,...,xk_\\).Each timexk is attached,a checkis madeto determinewhethera solutionhas beenfound. Then the algorithmis recursively invoked. When the forloopof line7 is exited,no morevalues for xk existand the currentcopy ofBacktrackends.Thelastunresolvedcallnow resumes,namely, the one thatcontinuesto examinethe remainingelementsassumingonly k \342\200\224 2 valueshave beenset.

Page 365: Sahni

7.1.THEGENERALMETHOD 349

Note that this algorithmcausesallsolutionsto be printed and assumesthat tuplesof various sizesmay make up a solution.If only a singlesolutionis desired,then a flag can be addedas a parameterto indicatethe firstoccurrenceof success.

1 AlgorithmIBacktrack(n)2 // This schemadescribesthe backtrackingprocess.3 // All solutionsaregeneratedin x[l:n] and printed4 // assoonas they aredetermined.5 {6 fc:=l;7 while

(jfc / 0) do8 {9 if (thereremainsan untriedx[k] \342\202\254 T(a;[l],x[2],...,10 x[k\342\200\224 1]) andf?fc(a;[l],...,x[k])is true) then11 {12 if (x[l],...,x[k]is a path to an answer node)13 thenwrite (x[l : k]);14 k :=k + 1;// Considerthe next set.15 }16 elsek :=k \342\200\224 1;// Backtrackto the previousset.17 }18 }

Algorithm7.2Generaliterativebacktrackingmethod

An iterativeversionof Algorithm 7.1appearsin Algorithm 7.2.Note thatT() will yield the set of all possiblevalues that can be placedas the firstcomponentx\\ of the solutionvector.Thecomponentx\\ will takeon thosevalues for which the boundingfunction B\\{xi)is true. Also note how theelementsaregeneratedin a depth first manner.Thevariable k is continuallyincrementedand a solutionvectoris grown until eithera solutionis found orno untriedvalue of Xf. remains.When k is decremented,the algorithmmustresumethe generationof possibleelementsfor the kih positionthat havenot yet beentried.Thereforeone must developa procedurethat generatesthesevalues in someorder.If only one solutionis desired,replacingwrite(x[l:k]);with {write(x[l:k]);return;}suffices.

The efficiency of both the backtrackingalgorithmswe'vejust seendepends very much on four factors:(1) the timeto generatethe next Xf., (2)the numberof Xf. satisfying the explicitconstraints,(3) the timefor theboundingfunctions

\302\243?\302\243,

and (4) the numberof Xf. satisfying the B^. Bound-

Page 366: Sahni

354 CHAPTER7. BACKTRACKING

1 AlgorithmP\\ace(k,i)2 // Returnstrueif a queencan beplacedin kth row and3 // ith column.Otherwiseit returnsfalse.x[ ] is a4 // globalarray whose first (k \342\200\224 1) values have beenset.5 // Abs(r) returns the absolutevalue of r.6 {7 for j :=1to k \342\200\224 1do8 if ((x[j]= i) II Two in the samecolumn9 or {Abs(x[j}- i) = Abs(j - k)))10 // or in the samediagonal11 thenreturnfalse;12 returntrue;13 }

Algorithm7.4 Can a new queenbeplaced?

1 AlgorithmNQueens(/c,n)2 // Usingbacktracking,this procedureprints all3 // possibleplacementsof n queenson an n x n4 // chessboardsothat they arenonattacking.5

\"t

6 for i :=1to n do7 {8 if Place(/c,i)then9 {10 x[k]:=i;11 if (k = n) thenwrite (x[l:n]);12 elseNQueens(A;+l,n);13 }14 }15 >

Algorithm7.5All solutionsto the n-queensproblem

Page 367: Sahni

7.2. THE8-QUEENSPROBLEM 355

At this point we might wonder how effective function NQueensis over thebrute forceapproach.For an 8 x8 chessboardthereare ( g ) possibleways toplace8 pieces,or approximately4.4billion8-tuplesto examine.However,byallowingonly placementsof queenson distinctrowsand columns,we requirethe examinationof at most 8!,oronly 40,3208-tuples.

We can use Estimate to estimatethe numberof nodesthat will begenerated by NQueens.Note that the assumptionsthat are neededfor Estimatedo hold for NQueens. The boundingfunction is static.No changeis madeto the function as the searchproceeds.In addition,allnodeson the samelevel of the statespacetreehave the samedegree.In Figure7.8we seefive8x8chessboardsthat were createdusingEstimate.

As required,the placementof eachqueenon the chessboardwas chosenrandomly. With eachchoicewe kept trackof the numberof columnsa queencould legitimatelybe placedon. Thesenumbersare listedin the vectorbeneatheachchessboard.Thenumberfollowing the vectorrepresentsthevalue that function Estimate would producefrom thesesizes.The averageof thesefive trialsis 1625.The totalnumberof nodesin the 8-queensstatespacetreeis

7

l +EN=o(8-0]=69,281j=0

So the estimatednumberof unboundednodesis only about 2.34%of thetotalnumberof nodesin the 8-queensstate spacetree. (Seethe exercisesfor moreideasabout the efficiency of NQueens.)

EXERCISES1.Algorithm NQueenscanbemademoreefficient by redefiningthe

function Place(/c,i)sothat it eitherreturns the next legitimatecolumnonwhich to placethe kth. queenoran illegalvalue.Rewriteboth functions(Algorithms 7.4and 7.5)so they implementthis alternatestrategy.

2. For the n-queensproblemwe observethat somesolutionsare simplyreflectionsor rotationsof others.Forexample,when n = 4, the twosolutionsgiven in Figure7.9areequivalent underreflection.Observethat for finding inequivalentsolutionsthe algorithmneedonlysetx[l]= 2,3,...,\\n/2].(a) ModifyNQueenssothat only inequivalentsolutionsarecomputed.(b) Run the n-queensprogramdevisedabove for n = 8,9, and 10.

Tabulatethe number of solutionsyour programfinds for eachvalue of n.

Page 368: Sahni

356 CHAPTER7. BACKTRACKING

3

l

4

(8,5,4

IT

6

4

2

5

,3,2)=1649

7

3

5

2

6

3

1

4

2

5

(8,5,3,1,2,1)= 769

1

4

2

5

3

(8,6,4,2,1,1,1)= 1401 (8,6,4,3,2)=1977

5

3

1

6

8

2

4

7

(8,5,3,2,2,1,1,1)= 2329

Figure7.8Five walks throughthe 8-queensproblemplus estimatesof thetreesize

Page 369: Sahni

7.3. SUMOFSUBSETS 357

3

1

4

2 2

4

1

3

Figure7.9Equivalent solutionsto the 4-queensproblem

3. Given an n x n chessboard,a knightwith coordinates(x^y). Theproblemoves such that every square ofsequenceof moves exists.Present

is placedon an arbitrary squarem is to determinen2 \342\200\224 1knight

the boardis visited onceif such aan algorithmto solve this problem.

7.3 SUM OF SUBSETSiveof

Supposewe are given n distinctpositand we desireto find allcombinationsThis is calledthe sum of subsetsproblerjiwe couldformulate this problemusingWe considera backtrackingsolutionthis casethe elementXi of the solutionon whetherthe weight Wi is includedoi

Thechildrenof any nodein Figureat level i the left childcorrespondsto

A simplechoicefor the bounding

numbers(usually calledweights)thesenumberswhosesumsarem.Examples7.2and 7.4showedhow

eitherfixed-or variable-sizedtuples.usingthe fixed tuple sizestrategy. Invectoris eitheroneorzerodepending

not.7.4areeasily generated.Fora node

1and the right to X{ = 0.ions is Bk{xi,...,Xk) = true ifffuncti

i=fcflm

i=\\

Clearly xi,...,Xkcannot lead to an answernodeif this conditionis notsatisfied.Theboundingfunctions canbare initially in nondecreasingorder. Inan answer nodeif

Ywixi+wk+i > rn

j=i

The boundingfunctions we usearetheiefore

strengthenedif we assumethe twj'sthis casex\\,...,Xkcannot leadto

Page 370: Sahni

358 CHAPTER7. BACKTRACKING

k n

Bk(xi,...,Xk)= true iffy\302\2432wiXi+ ]P wt > mi=l i=fc+l

k

and VJli'jiCj + Wfc+i < rn (7-1)j=i

Sinceour algorithmwill not make useof Bn, we neednot be concernedbythe appearanceof wn+\\ in this function.Although we have now specifiedallthat is neededto directly useeitherof the backtrackingschemas,a simpleralgorithmresultsif we tailoreitherof theseschemasto the problemat hand.This simplificationresultsfrom the realizationthat if a^ = 1,then

k n

^2 wixi + X! wi > mi = 1 i=k+l

Forsimplicity we refine the recursiveschema.The resultingalgorithmisSumOfSub(Algorithm 7.6).

Algorithm SumOfSubavoids computingEj=iwixi and Y^i-k+iwi eacntimeby keepingthesevalues in variablessand r respectively.The algorithmassumesw\\ < m and Ya=i wi ^ m- TheinitialcallisSumOfSub(0,1,Ef=iwi)It is interestingto note that the algorithmdoesnot explicitlyuse the testk >n to terminatethe recursion.This test is not neededas on entry to thealgorithm,s/ m and s+ r > m. Hence,r / 0 and sok can be no greaterthan n. Also note that in the elseif statement(line11),sinces+ Wf. < mand s+ r > m, it follows that r / Wf. and hence k + 1 < n. Observealsothat if s+ Wk = m (line9), then Xk+i,\342\226\240\342\226\240\342\226\240-,xn must be zero. Thesezerosare omittedfrom the output of line 9. In line 11we do not test for

Etiwtxi +ELfc+i Wi > m, as we already know s+ r >mand x^ = 1.

Example7.6Figure7.10shows the portionof the statespacetreegenerated by function SumOfSubwhile workingon the instancen = 6, m = 30,and w[\\ : 6] = {5,10,12,13,15,18}.Therectangularnodeslist the valuesof s,k, and r on eachof the callsto SumOfSub. Circularnodesrepresentpointsat which subsetswith sumsm areprintedout.At nodes 4,5,andC the output is respectively(1,1,0,0,1),(1,0,1,1),and (0, 0,1,0,0,1).Note that the tree of Figure7.10containsonly 23 rectangularnodes.The full statespacetreefor n = 6 contains26 \342\200\224 1= 63 nodesfrom whichcallscouldbemade(thiscountexcludesthe 64 leaf nodesas no callneedbemadefrom a leaf). \342\226\241

Page 371: Sahni

7.3. SUMOFSUBSETS 359

1 AlgorithmSumOfSub(s,/c,r)2 // Find allsubsetsof w[l :n] that sum to m. The values of x\\j],3 // 1<J < &> have already beendetermined,s=

YjjZ\\ w\\j\\ * x[j]4 // and r =

Y7j=k w\\j]- The\302\253;[?']'s

are in nondecreasingorder.5 // It is assumedthat w[l] < m and Yl?=iw[^\\ > m-6 {7 // Generateleft child.Note:s+w[k] < m sinceBk-\\ is true.8 x[k] :=1;9 if (s +w[k] \342\200\224 rn) thenwrite (x[\\ :k]);// Subsetfound10 // Thereis no recursivecallhereas w[j] > 0, 1<j <n.11 else if (s +w[k] +w[k+ 1]<m)12 thenSumOfSub(s+ \302\253;[\302\243;],

/c + 1,r -w[k});13 // Generateright childand evaluateB^.14 if ((s+ r \342\200\224 w[k] >m) and (s +

t\302\253[/c+ 1]<m)) then

15 {16 a;[fc]:=0;17 SumOfSub(s,fc+ l,r-\302\253;[18 }19 }

Algorithm7.6Recursivebacktrackingalgorithmfor sum of subsetsproblem

EXERCISES

1.Provethat the sizeof the setof allsubsetsof n elementsis 2\".

2.Let w = {5,7,10,12,15,18,20}and m \342\200\224 35.Find allpossiblesubsetsof w that sum to rn. Do this usingSumOfSub. Draw the portionofthe statespacetreethat is generated.

3. With m = 35,run SumOfSubonthedata(a)w = {5,7,10,12,15,18,20},(b) w = {20,18,15,12,10,7,5},and (c) w = {15,7,20,5,18,10,12}.Are thereany discernibledifferencesin the computingtimes?

4. Write a backtrackingalgorithmfor the sum of subsetsproblemusingthe statespacetreecorrespondingto the variable tuple sizeformulation.

Page 372: Sahni

360 CHAPTER7. BACKTRACKING

x^=l

Figure7.10Portionof statespacetreegeneratedby SumOfSub

7.4 GRAPH COLORINGLet G be a graph and m be a given positiveinteger.We want to discoverwhetherthe nodesof G can be coloredin sucha way that no two adjacentnodeshave the samecoloryet only m colorsareused.This is termedthem-colorabilitydecisionproblemand it is discussedagainin Chapter11.Notethat if d is the degreeof the given graph, then it can becoloredwith d + 1colors.The m-colorabilityoptimizationproblemasksfor the smallestintegerm for which the graph G can becolored.This integeris referredto as thechromaticnumber of the graph.Forexample,the graph of Figure7.11canbecoloredwith threecolors1,2,and 3.The colorof eachnodeis indicatednext to it.It canalsobeseenthat threecolorsareneededto colorthis graphand hencethis graph'schromaticnumberis 3.

Page 373: Sahni

7.4. GRAPHCOLORING 361

Figure7.11An examplegraphand itscoloring

A graph is said to be planariff it can be drawn in a plane in such away that no two edgescrosseachother. A famous specialcaseof the m-colorabilitydecisionproblemis the 4-colorproblemfor planargraphs.Thisproblemasks the following question:given any map,can the regionsbecoloredin such a way that no two adjacentregionshave the samecoloryet only four colorsare needed?This turns out to be a problemfor whichgraphsarevery useful,becausea map caneasilybetransformedinto agraph.Each regionof the map becomesa node,and if two regionsare adjacent,then the correspondingnodesarejoinedby an edge.Figure7.12shows amap with five regionsand itscorrespondinggraph.This map requiresfourcolors.Formany years it was known that five colorsweresufficient to colorany map,but no map that requiredmorethan four colorshad ever beenfound. After severalhundredyears, this problemwas solved by a groupofmathematicianswith the helpof a computer.They showedthat in fact fourcolorsare sufficient. In this sectionwe considernot only graphs that areproducedfrom maps but all graphs.We are interestedin determiningallthe different ways in which a given graph can be coloredusingat most mcolors.

Supposewe representa graph by its adjacencymatrixG[\\ : n, 1 : n],whereG[i,j] = 1if (i,j) is anedgeof G,andG[i,j] = 0 otherwise.Thecolorsarerepresentedby the integers1,2,...,m and the solutionsaregiven by then-tuple(xi,\342\226\240.. ,xn),whereX{ is the colorof nodei. Usingthe recursivebacktrackingformulation as given in Algorithm 7.1,the resultingalgorithmis mColoring (Algorithm 7.7). The underlying state spacetree used is atree of degreem and heightn + 1. Eachnodeat level % has m childrencorrespondingto the m possibleassignmentsto xt, 1 < i < n. Nodesat

Page 374: Sahni

362 CHAPTER7. BACKTRACKING

4

3

2

1

5

Figure7.12A map and its planargraph representation

level n + 1are leaf nodes.Figure7.13shows the statespacetreewhen n =3 and m = 3.

Function mColoring is begun by first assigningthe graph to itsadjacency matrix,setting the array x[ ] to zero,and then invoking the statementmColoring(l);.

Noticethe similarity betweenthis algorithmand the generalform of therecursivebacktrackingschemaof Algorithm 7.1.FunctionNextValue(Algorithm 7.8) producesthe possiblecolorsfor x^ after x\\ through Xk_i havebeendefined.Themainloopof mColoring repeatedlypicksan elementfromthe setof possibilities,assignsit to a^, and then callsmColoring recursively.Forinstance,Figure7.14showsa simplegraphcontainingfour nodes.Belowthat is the tree that is generatedby mColoring.Eachpath to a leafrepresents a coloringusingat mostthreecolors.Note that only 12solutionsexistwith exactly threecolors.In this tree, after choosingx\\ = 2 and X2 = 1,the possiblechoicesfor x% are 2 and 3.After choosingx\\ = 2,X2 = 1,andX3 = 2,possiblevalues for x\\ are 1and 3.And soon.

An upperboundon the computingtimeof mColoringcanbearrivedat bynoticingthat the numberof internalnodesin the statespacetreeis

Y~HZ\302\247m%\342\226\240

At eachinternalnode,0(mn)timeis spent by NextValue to determinethechildrencorrespondingto legalcolorings.Hencethe total timeis boundedby E7=oml+1n= Y!i=imln = n(mn+1-2)/(m- 1)= 0{nmn).

Page 375: Sahni

7.4. GRAPH COLORING 363

1 AlgorithmmColoring(A;)2 // This algorithmwas formed usingthe recursivebacktracking3 // schema.The graph is representedby its booleanadjacency4 // matrixG[l:n, 1:n].All assignmentsof 1,2,...,m to the5 // verticesof the graphsuch that adjacentverticesare6 // assigneddistinctintegersareprinted,k is the index7 // of the next vertexto color.8 {9 repeat10 {//Generateall legalassignmentsfor x[k].11 NextValue(/c);// Assign to x[k]a legalcolor.12 if (x[k]= 0) thenreturn;// No new colorpossible13 if (k = n) then // At mostm colorshave been14 // used to colorthe n vertices.15 write (x[l:n]);16 elsemColoring(A; + 1);17 }until(false);18 }

Algorithm7.7 Findingallm-coloringsof a graph

xt, =1

Figure7.13Statespacetreefor mColoring when n =3 and m \342\200\224 3

Page 376: Sahni

364 CHAPTER7. BACKTRACKING

1 AlgorithmNextValue(/c)2 // a;[l],...,x[k\342\200\224 1]have beenassignedintegervalues in3 // the range [l,m]such that adjacentverticeshave distinct4 // integers.A value for x[k] is determinedin the range5 // [0,m].x[k] is assignedthe next highestnumberedcolor6 // while maintainingdistinctnessfrom the adjacentvertices7 // of vertexk. If no suchcolorexists,then x[k] is 0.8 {9 repeat10 {11 x[k] := (x[k]+ 1) mod(m+ 1);// Next highestcolor.12 if (x[k]= 0) thenreturn;// All colorshave beenused.13 for j :=1to n do14 { // Checkif this coloris15 // distinctfrom adjacentcolors.16 if ((G[fc,i]#0)and (x[k]= x[j}))17 // If (k,j)is and edgeand if adj.18 // verticeshave the samecolor.19 then break;20 }21 if (j = n + 1) thenreturn;// New colorfound22 }until(false);// Otherwisetry to find anothercolor.23 }

Algorithm7.8Generatinga next color

EXERCISE

1.Programand run mColoring (Algorithm7.7)usingasdatathecompletegraphsof sizen = 2,3, 4, 5, 6,and 7. Let the desirednumberof colorsbe k = n and k = n/2.Tabulatethe computingtimesfor eachvalueof n and k.

7.5 HAMILTONIANCYCLES

Let G = (V, E) be a connectedgraphwith n vertices.A Hamiltoniancycle(suggestedby Sir William Hamilton)is a round-trippath alongn edgesofG that visits every vertexonceand returns to its startingposition.In otherwords if a Hamiltoniancyclebeginsat somevertexv\\ G G and the vertices

Page 377: Sahni

7.5. HAMILTONIANCYCLES 365

Figure7.14A 4-nodegraphand allpossible3-colorings

of G arevisited in the orderv\\, V2, \342\226\240\342\226\240

\342\226\240, wn+i? then the edges(uj, Uj+i)are in\302\243,

1< * <n, and the m aredistinctexceptfor vj and un+i? whichareequal.Thegraph G\\ of Figure7.15containsthe Hamiltoniancycle 1,2,8, 7,

6,5, 4, 3,1.Thegraph G2 of Figure7.15containsno Hamiltoniancycle.Thereis no known easy way to determinewhethera given graphcontainsaHamiltoniancycle.We now lookat a backtrackingalgorithmthat finds allthe Hamiltoniancyclesin a graph.Thegraphmay bedirectedorundirected.Only distinctcyclesareoutput.

Thebacktrackingsolutionvector(xi,...,xn) is defined sothat X{

represents the ith visited vertexof the proposedcycle.Now allwe needdo isdeterminehow to computethe setof possibleverticesfor x^ if xi,...,x^-ihave already beenchosen.If k = 1,then x\\ canbeany of the n vertices.Toavoid printingthe samecyclen times,we requirethat x\\ = \\. If 1< k <n,then Xk can be any vertexv that is distinct from x\\,X2, \342\226\240\342\226\240

\342\226\240, #fc-i and v isconnectedby an edgeto x^-i- Thevertexxn canonly be the oneremainingvertexand it must beconnectedto both xn-\\ and x\\. We beginbypresenting

function NextValue(/c) (Algorithm7.9),which determinesa possiblenext

Page 378: Sahni

366 CHAPTER7. BACKTRACKING

Gl:

G2:

Figure7.15Two graphs,onecontaininga Hamiltoniancycle

vertexfor the proposedcycle.UsingNextValue we can particularizethe recursivebacktrackingschema

to find all Hamiltoniancycles (Algorithm 7.10).This algorithmis startedby first initializingthe adjacencymatrixG[l:n, 1:n], then settingx[2:n]to zeroand x[l]to 1,and then executingHamiltonian(2).

Recallfrom Section5.9the travelingsalespersonproblemwhichaskedfora tour that has minimum cost.This tour is a Hamiltoniancycle.For thesimplecaseof a graphallof whoseedgecostsareidentical,Hamiltonian willfind a minimum-costtour if a tour exists.If the commonedgecostis c, thecostof a tour is en sincetherearen edgesin a Hamiltoniancycle.

EXERCISES

1.Determinethe orderof magnitudeof the worst-casecomputingtimefor the backtrackingprocedurethat finds allHamiltoniancycles.

2.Draw the portionof the statespacetreegeneratedby Algorithm 7.10for the graphGlof Figure7.15.

3. GeneralizeHamiltonian sothat it processesa graphwhoseedgeshavecostsassociatedwith them and finds a Hamiltoniancycle withminimum cost.You can assumethat alledgecostsarepositive.

Page 379: Sahni

7.5. HAMILTONIANCYCLES 367

1 AlgorithmNextValue(A:)2 // x[l:k \342\200\224 1] is a path of k \342\200\224 1distinctvertices.If x[fe]= 0, then3 // no vertexhas as yet beenassignedto x[k\\. After execution,4 // x[k]is assignedto the next highestnumberedvertexwhich5 // doesnot already appearin x[l:k \342\200\224 1] and is connectedby6 // an edgetox[k\342\200\224 1].Otherwisex[k]=0.If k = n, then7 // in additionx[k] is connectedto x[l].8 {9 repeat10 {11 x[k] :={x[k\\ + 1) mod(n + 1);// Next vertex.12 if (x[k]= 0) thenreturn;13 if (G[x[k-l],x[k}]^0) then14 { // Istherean edge?15 for j :=1to k \342\200\224 1do if (x[j]= x[k])thenbreak;16 // Checkfor distinctness.17 if (j = k) then// If true,then the vertexis distinct.18 if {{k<n) or ((A: = n) andG[x[n],x[l]]^ 0))19 then return;20 }21 }until(false);22 >

Algorithm7.9Generatinga next vertex

Page 380: Sahni

368 CHAPTER7. BACKTRACKING

1 AlgorithmHamiltonian(A:)2 // This algorithmusesthe recursiveformulation of3 // backtrackingtofind all the Hamiltoniancycles4 // of a graph.Thegraph is storedas an adjacency5 // matrixG[l:n, 1:n].All cyclesbeginat node1.6 {7 repeat8 {// Generatevalues for x[k].9 NextValue(A:);// Assign a legalnext value to x[k].10 if (x[k]= 0) thenreturn;11 if (k = n) thenwrite (x[l :n]);12 elseHamiltonian(A; + 1);13 }until(false);14 }

Algorithm7.10FindingallHamiltoniancycles

7.6 KNAPSACKPROBLEMIn this sectionwe reconsidera problemthat was definedand solvedby adynamic programmingalgorithmin Chapter5, the 0/1knapsackoptimizationproblem.Given n positiveweights Wi, n positiveprofits pi,and a positivenumberm that is the knapsackcapacity, this problemcallsfor choosingasubsetof the weights suchthat

YJ WiXi < m and YJ piXi is maximized (7.2)l<i<n \\<i<n

TheXj-'s constitutea zero-one-valuedvector.Thesolutionspacefor this problemconsistsof the 2\342\204\242 distinct ways to

assignzeroor one values to the xj's.Thus the solutionspaceis the sameas that for the sum of subsetsproblem.Two possibletreeorganizationsarepossible.One correspondsto the fixed tuple sizeformulation (Figure7.4)and the othertothe variable tuplesizeformulation (Figure7.3).Backtracking algorithmsfor the knapsackproblemcan be arrivedat usingeitherofthesetwo statespacetrees.Regardlessof which is used,boundingfunctionsare neededto help kill somelive nodeswithout expandingthem. A goodboundingfunction for this problemis obtainedby usingan upperboundon the value of the bestfeasiblesolutionobtainableby expandingthe givenlive nodeand any of its descendants.If this upperbound is not higherthan

Page 381: Sahni

7.6. KNAPSACK PROBLEM 369

the value of the bestsolutiondeterminedsofar, then that live nodecan bekilled.

We continuethe discussionusingthe fixed tuple sizeformulation.If atnodeZ the values of x,;,1< i < k, have already beendetermined,then anupperboundfor Z can beobtainedby relaxingthe requirementXi = 0 or 1to0 < Xi < 1for k + 1< i < n and usingthe greedy algorithmof Section4.2to solve the relaxedproblem.FunctionBound(cp,cw,k) (Algorithm 7.11)determinesan upperbound on the best solutionobtainableby expandingany nodeZ at level k + 1of the state spacetree. The objectweights andprofits are w[i] and p[i].It is assumedthat j9[i]/w[i]> p[i + l]/w[i+ 1],1< i <n.

1 AlgorithmBound(cp,cw,k)2 // cp is the currentprofit total,cw is the current3 / / weight total;k is the indexof the last removed4 // item;and m is the knapsacksize.5 {6 b :=cp;c :=cw;7 for i :=k + 1to n do8 {9 c :=c+ w[i];9 if (c <m) thenb :=b + p[i];10 elsereturnb+(1\342\200\224 (c \342\200\224 m)/w[i\\)*p[i\\;11 }12 returnb;13 }

Algorithm7.11A boundingfunction

FromBound it follows that the boundfor a feasibleleft childof a nodeZis the sameas that for Z. Hence,the boundingfunction neednot be usedwheneverthe backtrackingalgorithmmakesa move to the left childof anode.The resultingalgorithmis BKnap (Algorithm 7.12).It was obtainedfromthe recursivebacktrackingschema.Initially set fp :=

\342\200\2241;.This algorithm

is invokedas

BKnap(l,0,0);

When fp^ \342\200\2241, x[i],1< i <n, is suchthat S\"=iP[i]a;[i]= fp- In lines8to 18left childrenaregenerated.In line20,Bound is usedto testwhethera

Page 382: Sahni

370 CHAPTER7. BACKTRACKING

1234567891011121314151617181920212223242526272829

AlgorithmBKnap(A;, cp,cw)II m is: the sizeof the knapsack;n is the numberof weightsII and profits.w[ ] and p[ ] are the weightsand profits.IIpW .w[i] > p\\i + V\\/w\\i + 11.fw is the final weight ofII knapsack;fp is the final maximumprofit.x[k]= 0 if w[k]II{

}

is not in the knapsack;elsex[k] = 1.

//if ({

}//if({

}

Generateleft child.cw + w[k] <m) then

y[k] :=1;if (k <n) thenBKnap(A; + 1,cp+ p[k],cw + w[k]);if ((cp+ p[k] > fp) and (k = n)) then{

fp :=cp + p[k]',fw :=cw + w[k];for j :=ltokdox[j]:=y[j};

}Generateright child.Bound(cp,cw,k) > fp) then

y[k] :=0;if (k <n) thenBKnap(A; + 1,cp, cw);if ((cp> fp) and (k = n)) then{

fp \342\226\240= cp;fw :=cw;for j :=1to k dox[j]:=y[j];

}

Algorithm7.12Backtrackingsolutionto the 0/1knapsackproblem

Page 383: Sahni

7.6. KNAPSACK PROBLEM 371

right childshouldbegenerated.The path y[i], 1< i < k, is the path tothecurrentnode.Thecurrentweight cw \342\200\224 J^^i M^M*']and cp = ^2 p[i]y[i]\342\226\240

In lines13to 17and 23 to 27the solutionvectoris updatedif needbe.So far, all our backtrackingalgorithmshave worked on a static state

spacetree.We now seehow a dynamic statespacetreecan beused for theknapsackproblem.One methodfor dynamically partitioningthe solutionspaceis basedon trying to obtain an optimalsolutionusing the greedyalgorithmof Section4.2.We first replacethe integerconstraintx^ = 0 or 1by the constraint0 < xt < 1.This yields the relaxedproblem

max 22 Pixi subjectto ^J Wixi <m (7.3)l<i<n l<i<n

0<Xi<l, 1< i < n

If the solutiongeneratedby the greedymethodhas allXj'sequalto zeroorone,then it is alsoan optimalsolutionto the original0/1knapsackproblem.If this is not the case,then exactlyone Xj will be suchthat 0 < Xj < 1.Wepartitionthe solutionspaceof (7.2) into two subspaces.In one Xj = 0and in the otherXj = 1.Thus the left subtreeof the state spacetree willcorrespondto x^ = 0 and the right to Xj = 1.In general,at eachnodeZof the statespacetreethe greedy algorithmis usedtosolve (7.3)under theaddedrestrictionscorrespondingto the assignmentsalready madealongthepath from the root to this node.In casethe solutionis all integer,then anoptimalsolutionfor this nodehas beenfound. If not, then thereis exactlyone Xi suchthat 0 < xt < 1.The left childof Z correspondsto Xj = 0,andthe right toxi = 1.

Thejustificationfor this partitioningschemeis that the nonintegerXj iswhat prevents the greedy solutionfrom beinga feasiblesolutionto the 0/1knapsackproblem.So,we would expectto reacha feasiblegreedy solutionquicklyby forcing this Xj tobeinteger.Choosingleft branchesto correspondto Xi = 0 ratherthan Xj = 1is alsojustifiable.Sincethe greedy algorithmrequiresPj/wj >pj+i/wj+i,we would expectmostobjectswith low index(i.e.,smallj and hencehighdensity) to be in an optimalfilling of theknapsack. When Xi is set to zero,we are not preventingthe greedy algorithmfrom usingany of the objectswith j < i (unlessXj has already beenset tozero).On the otherhand, when Xj is set to one,someof the Xj'swith j < iwill not be ableto get into the knapsack.Thereforewe expectto arrive atan optimalsolutionwith Xj = 0.Sowe wish the backtrackingalgorithmtotry this alternativefirst. Hencethe left subtreecorrespondsto Xj = 0.

Example7.7 Let us try out a backtrackingalgorithmand the abovedynamic partitioningschemeon the following data:p = {11,21,31,33,43,53,55,65},w = {1,11,21,23,33,43,45,55},m = 110,and n = 8. Thegreedy

Page 384: Sahni

372 CHAPTER7. BACKTRACKING

solutioncorrespondingto the rootnode(i.e.,Equation(7.3))is x = {1,1,1,1,1,21/45,0,0}.Itsvalue is 164.88.Thetwo subtreesof the rootcorrespondto X6 = 0 and

x\302\247

= 1,respectively(Figure7.16).Thegreedy solutionatnode2 is x = {1,1,1,1,1,0,21/45,0}.Itsvalue is 164.66.The solutionspaceat node2 is partitionedusingxj = 0 and xj = 1.The next E-nodeisnode3.Thesolutionherehas xg = 21/55.Thepartitioningnow is with xg= 0 and xg = 1.Thesolutionat node4 is all integersothereis no needtoexpandthis nodefurther. Thebestsolutionfound so far has value 139andx = {1,1,1,1,1,0,0,0}.Node5 is the next E-node.Thegreedy solutionforthis nodeis x = {1,1,1,22/23,0,0,0,1}.Itsvalue is 159.56.Thepartitioning

is now with X4 = 0 and\302\2434

= 1.Thegreedy solutionat node6 has value156.66and X5 = 2/3.Next,node7 becomesthe E-node.Thesolutionhereis {1,1,1,0,0,0,0,1}.Itsvalue is 128.Node7 is not expandedas the greedysolutionhereis all integer.At node8 the greedy solutionhas value 157.71and X3 = 4/7.Thesolutionat node9 is all integerand has value 140.Thegreedy solutionat node10is {1,0,1,0,1,0,0,1}.Itsvalue is 150.ThenextE-nodeis 11.Itsvalue is 159.52and X3 = 20/21.Thepartitioningis nowon X3 = 0 and X3 = 1.Theremainderof the backtrackingprocesson thisknapsackinstanceis left as an exercise. \342\226\241

Experimentalwork due to E. Horowitz and S.Sahni, citedin thereferences, indicatesthat backtrackingalgorithmsfor the knapsackproblemgenerallywork in lesstimewhen usinga statictree than when usingadynamic tree. The dynamic partitioningschemeis,however, useful in thesolutionof integerlinearprograms.Thegeneralintegerlinearprogramismathematicallystatedin (7.4).

minimize Yli<j<n cj xj

subjectto \302\243i<j<n chj Xj < b{, 1< i < m (7.4)

x'jSarenonnegativeintegers

If the integerconstraintsonthe xj'sin (7.4)arereplacedby the constraintXi >0,then we obtain a linearprogramwhoseoptimalsolutionhas a valueat leastas largeas the value of an optimalsolutionto (7.4).Linearprogramscanbesolvedusingthe simplexmethods(seethe references).If the solutionis not all integer,then a nonintegerx2 is chosento partitionthe solutionspace.Let us assumethat the value of x\\ in the optimalsolutionto thelinearprogramcorrespondingto any nodeZ in the statespaceis v and v isnot an integer.The left childof Z correspondstoXj < [v\\ whereasthe rightchildof Z correspondto Xi >

\\v~\\. Sincethe resultingstatespacetreehas a

potentially infinite depth (note that on the path from the root to a nodeZ

Page 385: Sahni

7.6. KNAPSACK PROBLEM 373

the solutionspacecan be partitionedon one x% many timesas eachx% canhave as value any nonnegativeinteger),it is almostalwayssearchedusingabranch-and-boundmethod(seeChapter8).

x*=0

Figure7.16Partof the dynamic statespacetreegeneratedin Example7.7

EXERCISES1. (a) Presenta backtrackingalgorithmfor solving the knapsack

optimization problemusingthe variable tuple sizeformulation.(b) Draw the portionof the statespacetreeyour algorithmwill

generate when solving the knapsackinstanceof Example7.7.

2. Completethe statespacetreeof Figure7.16.3. Give a backtrackingalgorithmfor the knapsackproblemusing the

dynamic statespacetreediscussedin this section.4. [Programmingproject](a) Programthe algorithmsof Exercises1and

3. Run thesetwo programsand BKnap usingthe following data:p =

Page 386: Sahni

374 CHAPTER7. BACKTRACKING

{11,21,31,33,43,53,55,65},w = {1,11,21,23,33,43,45,55},m =110,and n \342\200\224 8.Which algorithmdo you expectto performbest?(b) Now programthe dynamic programmingalgorithmof Section5.7for the knapsackproblem.Usethe heuristicssuggestedat the end ofSection5.7.Obtaincomputingtimesand comparethis programwiththe backtrackingprograms.

5. (a) Obtaina knapsackinstancefor which morenodesaregeneratedby the backtrackingalgorithmusinga dynamic treethan usingastatictree.

(b) Obtaina knapsackinstancefor which morenodesaregeneratedby the backtrackingalgorithmusinga static tree than usingadynamic tree.

(c) Strengthenthe backtrackingalgorithmswith the followingheuristic: Build an array minw[ ] with the property that minw[i]is the indexof the objectthat has leastweight amongobjectsi,i+1,...,n. Now any \302\243?-node at whichdecisionsfor x\\,...,x^-\\have beenmadeand at which the unutilizedknapsackcapacity islessthan

i<;[mmi<;[\302\253]]canbeterminatedprovidedthe profit earned

up tothis nodeis no morethan the maximumdeterminedsofar.Incorporatethis into your programsof Exercise4(a).Rerun thenew programson the samedata setsand seewhat (if any)improvements result.

7.7 REFERENCES AND READINGSAn early modernaccountof backtrackingwas given by R. J. Walker. Thetechniquefor estimatingthe efficiency of a backtrackprogramwas firstproposedby M. Hall and D.E.Knuthand the dynamic partitioningschemeforthe 0/1knapsackproblemwas proposedby H.Greenbergand R. Hegerich.Experimentalresultsshowingstatictreestobesuperiorfor this problemcanbe found in \"Computingpartitionswith applicationsto the knapsackproblem,\" by E.Horowitz and S.Sahni,Journalof the ACM 21,no.2 (1974):277-292.

Data presentedin the above papershows that the divide-and-conquerdynamic programmingalgorithmfor the knapsackproblemis superiortoBKnap.

Fora proofof the four-colortheoremseeEvery PlanarMap is FourColorable, by K.I. Appel, AmericanMathematicalSociety, Providence,RI,1989.

Page 387: Sahni

7.8. ADDITIONALEXERCISES 375

A discussionof the simplexmethodfor solving linearprogramsmay befound in:LinearProgramming:An Introductionwith Applications,by A. Sultan,Academic Press,1993.LinearOptimization and Extensions,by M. Padberg,Springer-Verlag,1995.

7.8 ADDITIONALEXERCISES1.Supposeyou aregiven n menand n womenand two nxnarrays P and

Q suchthat P(i,j)is the preferenceof man i for womanj and Q(i,j)is the preferenceof woman i for man j. Given an algorithmthat findsa pairingof menand womensuchthat the sum of the productof thepreferencesis maximized.

2. Let A(l :n, 1:n) be an n x n matrix.The determinantof A is thenumber

det(A) = ^sgn(s)aM(i)a2s(2)' ' '

an,s(n)

wherethe sum is takenover allpermutationss(l),...,s(n) of {1,2,...,n} and sgn(s) is + 1or \342\200\2241 accordingto whethers is an even or oddpermutation.Thepermanentof A is defined as

per(A)= Y,Oi,s(i)02)S(2)\342\200\242\342\200\242\342\200\242

oB,s(\342\200\236)

The determinantcan becomputedas a by-product of Gaussianelimination requiring0(n3)operations,but no polynomial timealgorithmis known for computingpermanents.Write an algorithmthatcomputes the permanentof a matrixby generatingthe elementsof s usingbacktracking.Analyze the timeof your algorithm.

3. Let MAZE(1 : n, 1 : n) be a zero-or one-valued,two-dimensionalarray that representsa maze.A onemeansa blockedpath whereasazerostandsfor an openposition.You areto developan algorithmthatbeginsat MAZE(1,1)and triesto find a path to positionMAZE(n,n).Onceagainbacktrackingis necessaryhere.Seeif you can analyze thetimecomplexityof your algorithm.

4. The assignmentproblemis usually statedthis way: Therearen peopleto be assignedto n jobs.Thecostof assigningthe ith personto thejthjobis cost(i,j).You areto developan algorithmthat assignseveryjobtoa personand at the sametimeminimizesthe totalcostof theassignment.

Page 388: Sahni

376 CHAPTER7. BACKTRACKING

5. Thisproblemis calledthe postagestampproblem.Envisiona countrythat issuesn different denominationsof stampsbut allows no morethan m stampson a singleletter. Forgiven values of m and n, writean algorithmthat computesthe greatestconsecutiverangeof postagevalues, from one on up, and all possiblesetsof denominationsthatrealizethat range.For example,for n = 4 and m = 5,the stampswithvalues (1,4, 12,21)allow the postagevalues 1through71.Are thereany othersetsof four denominationsthat have the samerange?

6. Hereis a gamecalledHi-Q.Thirty-two piecesarearrangedona boardas shown in Figure7.17.Only the centerpositionis unoccupied.Apieceis only allowed to move by jumping over one of its neighborsinto an empty space.Diagonaljumpsare not permitted. When apieceis jumped,it is removed from the board.Write an algorithmthat determinesa seriesof jumpssothat all the piecesexceptoneareeventually removedand that final pieceendsup at the centerposition.

7. Imaginea set of 12plane figures eachcomposedof five equal-sizesquares.Each figure differs in shapefrom the others,but togetherthey canbearrangedtomakedifferent-sizedrectangles.In Figure7.18thereis a pictureof 12pentominoesthat arejoinedtocreatea 6 x 10rectangle.Write an algorithmthat finds allpossibleways to placethepentominoessothat a 6 x 10rectangleis formed.

Figure 7.17 A Hi-Q board in its initial state [figure not reproduced]

8. Supposea set of electriccomponentssuch as transistorsare to beplacedon a circuitboard.We aregiven a connectionmatrixCONN,whereCONN(i,j)equalsthe numberof connectionsbetweencomponent i and componentj and a matrixDIST,whereDIST(r,s) is


Figure 7.18 A pentomino configuration [figure not reproduced; the 12 numbered pentominoes tile a 6 x 10 rectangle]

the distancebetweenpositionr and positions on the circuitboard.The wiring of the boardconsistsof placingeachof n componentsatsomelocation.The costof a wiring is the sum of the productsofCONN(i,j)* DIST(r,s),wherecomponenti is placedat locationrand componentj is placedat locations.Composean algorithmthatfinds an assignmentof componentsto locationsthat minimizesthetotalcostof the wiring.

9. Suppose there are n jobs to be executed but only k processors that can work in parallel. The time required by job i is t_i. Write an algorithm that determines which jobs are to be run on which processors and the order in which they should be run so that the finish time of the last job is minimized.

10. Two graphs G(V, E) and H(A, B) are called isomorphic if there is a one-to-one onto correspondence of the vertices that preserves the adjacency relationships. More formally, if f is a function from V to A and (v, w) is an edge in E, then (f(v), f(w)) is an edge in H. Figure 7.19 shows two directed graphs that are isomorphic under the mapping that takes 1, 2, 3, 4, and 5 to a, b, c, d, and e. A brute force algorithm to test two graphs for isomorphism would try out all n! possible correspondences and then test to see whether adjacency was preserved. A backtracking algorithm can do better than this by applying some obvious pruning to the resultant state space tree. First of all we know that for a correspondence to exist between two vertices, they must have the same degree. So we can select at an early stage vertices of degree k for which the second graph has the fewest number of vertices of degree k. This exercise calls for devising an isomorphism algorithm that is based on backtracking and makes use of these ideas.


Figure 7.19 Two isomorphic graphs (Exercise 10) [figure not reproduced]

11. A graph is called complete if all its vertices are connected to all the other vertices in the graph. A maximal complete subgraph of a graph is called a clique. By "maximal" we mean that this subgraph is contained within no other subgraph that is also complete. A clique of size k has (k choose i) subcliques of size i, 1 <= i <= k. This implies that any algorithm that looks for a maximal clique must be careful to generate each subclique the fewest number of times possible. One way to generate the clique is to extend a clique of size m to size m + 1 and to continue this process by trying out all possible vertices. But this strategy generates the same clique many times; this can be avoided as follows. Given a clique X, suppose node v is the first node that is added to produce a clique of size one greater. After the backtracking process examines all possible cliques that are produced from X and v, then no vertex adjacent to v need be added to X and examined. Let X and Y be cliques and let X be properly contained in Y. If all cliques containing X and vertex v have been generated, then all cliques with Y and v can be ignored. Write a backtracking algorithm that generates the maximal cliques of an undirected graph and makes use of these last rules for pruning the state space tree.


Chapter 8

BRANCH-AND-BOUND

8.1 THE METHOD

This chapter makes extensive use of terminology defined in Section 7.1. The reader is urged to review this section before proceeding.

The term branch-and-bound refers to all state space search methods in which all children of the E-node are generated before any other live node can become the E-node. We have already seen (in Section 7.1) two graph search strategies, BFS and D-search, in which the exploration of a new node cannot begin until the node currently being explored is fully explored. Both of these generalize to branch-and-bound strategies. In branch-and-bound terminology, a BFS-like state space search will be called FIFO (First In First Out) search as the list of live nodes is a first-in-first-out list (or queue). A D-search-like state space search will be called LIFO (Last In First Out) search as the list of live nodes is a last-in-first-out list (or stack). As in the case of backtracking, bounding functions are used to help avoid the generation of subtrees that do not contain an answer node.
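The following Python sketch is our own illustration of this skeleton, not code from the text; the function and parameter names (branch_and_bound, children, is_answer, bound_ok) are assumptions made for the example. It shows that FIFO and LIFO branch-and-bound differ only in whether the list of live nodes is treated as a queue or as a stack.

    from collections import deque

    def branch_and_bound(root, children, is_answer, bound_ok, fifo=True):
        # Generic skeleton: every child of the current E-node is generated
        # before any other live node becomes the E-node.  Treating the list
        # of live nodes as a queue gives FIFO (BFS-like) search; treating it
        # as a stack gives LIFO (D-search-like) search.  bound_ok() plays the
        # role of the bounding function: children that cannot lead to an
        # answer node are killed immediately.
        live = deque([root])
        while live:
            e_node = live.popleft() if fifo else live.pop()   # next E-node
            for child in children(e_node):                    # expand E-node
                if is_answer(child):
                    return child
                if bound_ok(child):                           # keep it live
                    live.append(child)
        return None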

Example 8.1 [4-queens] Let us see how a FIFO branch-and-bound algorithm would search the state space tree (Figure 7.2) for the 4-queens problem. Initially, there is only one live node, node 1. This represents the case in which no queen has been placed on the chessboard. This node becomes the E-node. It is expanded and its children, nodes 2, 18, 34, and 50, are generated. These nodes represent a chessboard with queen 1 in row 1 and columns 1, 2, 3, and 4 respectively. The only live nodes now are nodes 2, 18, 34, and 50. If the nodes are generated in this order, then the next E-node is node 2. It is expanded and nodes 3, 8, and 13 are generated. Node 3 is immediately killed using the bounding function of Example 7.5. Nodes 8 and 13 are added to the queue of live nodes. Node 18 becomes the next E-node. Nodes 19, 24, and 29 are generated. Nodes 19 and 24 are killed as a result of the bounding functions. Node 29 is added to the queue of live


nodes. The E-node is node 34. Figure 8.1 shows the portion of the tree of Figure 7.2 that is generated by a FIFO branch-and-bound search. Nodes that are killed as a result of the bounding functions have a "B" under them. Numbers inside the nodes correspond to the numbers in Figure 7.2. Numbers outside the nodes give the order in which the nodes are generated by FIFO branch-and-bound. At the time the answer node, node 31, is reached, the only live nodes remaining are nodes 38 and 54. A comparison of Figures 7.6 and 8.1 indicates that backtracking is a superior search method for this problem. □

Figure 8.1 Portion of 4-queens state space tree generated by FIFO branch-and-bound [figure not reproduced; node 31 is the answer node]

8.1.1 Least Cost (LC) Search

In both LIFOand FIFObranch-and-boundthe selectionrule for the nextE-nodeis ratherrigidand in a senseblind.Theselectionrule for the nextE-nodedoesnot give any preferenceto a nodethat has a very goodchanceof gettingthe searchto an answernodequickly. Thus, in Example8.1,whennode30is generated,it shouldhave becomeobvious to the searchalgorithmthat this nodewill leadtoan answernodein onemove.However,the rigidFIFOrule first requiresthe expansionof alllive nodesgeneratedbeforenode30 was expanded.


The search for an answer node can often be speeded by using an "intelligent" ranking function ĉ(·) for live nodes. The next E-node is selected on the basis of this ranking function. If in the 4-queens example we use a ranking function that assigns node 30 a better rank than all other live nodes, then node 30 will become the E-node following node 29. The remaining live nodes will never become E-nodes as the expansion of node 30 results in the generation of an answer node (node 31).

The ideal way to assign ranks would be on the basis of the additional computational effort (or cost) needed to reach an answer node from the live node. For any node x, this cost could be (1) the number of nodes in the subtree x that need to be generated before an answer node is generated or, more simply, (2) the number of levels the nearest answer node (in the subtree x) is from x. Using cost measure 2, the cost of the root of the tree of Figure 8.1 is 4 (node 31 is four levels from node 1). The costs of nodes 18 and 34, 29 and 35, and 30 and 38 are respectively 3, 2, and 1. The costs of all remaining nodes on levels 2, 3, and 4 are respectively greater than 3, 2, and 1. Using these costs as a basis to select the next E-node, the E-nodes are nodes 1, 18, 29, and 30 (in that order). The only other nodes to get generated are nodes 2, 34, 50, 19, 24, 32, and 31. It should be easy to see that if cost measure 1 is used, then the search would always generate the minimum number of nodes every branch-and-bound type algorithm must generate. If cost measure 2 is used, then the only nodes to become E-nodes are the nodes on the path from the root to the nearest answer node. The difficulty with using either of these ideal cost functions is that computing the cost of a node usually involves a search of the subtree x for an answer node. Hence, by the time the cost of a node is determined, that subtree has been searched and there is no need to explore x again. For this reason, search algorithms usually rank nodes only on the basis of an estimate ĝ(·) of their cost.

Let ĝ(x) be an estimate of the additional effort needed to reach an answer node from x. Node x is assigned a rank using a function ĉ(·) such that ĉ(x) = f(h(x)) + ĝ(x), where h(x) is the cost of reaching x from the root and f(·) is any nondecreasing function. At first, we may doubt the usefulness of using an f(·) other than f(h(x)) = 0 for all h(x). We can justify such an f(·) on the grounds that the effort already expended in reaching the live nodes cannot be reduced and all we are concerned with now is minimizing the additional effort we spend to find an answer node. Hence, the effort already expended need not be considered.

Using f(·) = 0 usually biases the search algorithm to make deep probes into the search tree. To see this, note that we would normally expect ĝ(y) <= ĝ(x) for y, a child of x. Hence, following x, y will become the E-node, then one of y's children will become the E-node, next one of y's grandchildren will become the E-node, and so on. Nodes in subtrees other than the subtree x will not get generated until the subtree x is fully searched. This would not


be a cause for concern if ĝ(x) were the true cost of x. Then, we would not wish to explore the remaining subtrees in any case (as x is guaranteed to get us to an answer node quicker than any other existing live node). However, ĝ(x) is only an estimate of the true cost. So, it is quite possible that for two nodes w and z, ĝ(w) < ĝ(z) and z is much closer to an answer node than w. It is therefore desirable not to overbias the search algorithm in favor of deep probes. By using f(·) not identically 0, we can force the search algorithm to favor a node z close to the root over a node w which is many levels below z. This would reduce the possibility of deep and fruitless searches into the tree.

A search strategy that uses a cost function ĉ(x) = f(h(x)) + ĝ(x) to select the next E-node would always choose for its next E-node a live node with least ĉ(·). Hence, such a search strategy is called an LC-search (Least Cost search). It is interesting to note that BFS and D-search are special cases of LC-search. If we use ĝ(x) = 0 and f(h(x)) = level of node x, then an LC-search generates nodes by levels. This is essentially the same as a BFS. If f(h(x)) = 0 and ĝ(x) >= ĝ(y) whenever y is a child of x, then the search is essentially a D-search. An LC-search coupled with bounding functions is called an LC branch-and-bound search.

In discussing LC-searches, we sometimes make reference to a cost function c(·) defined as follows: if x is an answer node, then c(x) is the cost (level, computational difficulty, etc.) of reaching x from the root of the state space tree. If x is not an answer node, then c(x) = ∞ provided the subtree x contains no answer node; otherwise c(x) equals the cost of a minimum-cost answer node in the subtree x. It should be easy to see that ĉ(·) with f(h(x)) = h(x) is an approximation to c(·). From now on c(x) is referred to as the cost of x.

8.1.2The15-puzzle:An ExampleThe15-puzzle(invented by SamLoyd in 1878)consistsof 15numberedtileson a squareframe with a capacity of 16tiles (Figure8.2).We are givenan initial arrangementof the tiles,and the objectiveis to transform thisarrangementinto the goal arrangementof Figure8.2(b)through a seriesof legalmoves. Theonly legalmoves are ones in which a tile adjacenttothe empty spot(ES) is moved to ES.Thus from the initial arrangementof Figure 8.2(a),four moves are possible.We can move any one of thetilesnumbered2, 3, 5, or 6 to the empty spot.Followingthis move, othermoves can be made. Eachmove createsa new arrangementof the tiles.Thesearrangementsarecalledthe statesof the puzzle.Theinitialand goalarrangementsarecalledthe initialand goalstates.A stateis reachablefromthe initialstate iff thereis a sequenceof legalmoves from the initial stateto this state.Thestate spaceof an initialstate consistsof all statesthatcanbereachedfrom the initialstate.Themost straightforward way to solvethe puzzlewould be to searchthe statespacefor the goalstateand usethe


path from the initial state to the goal state as the answer. It is easy to see that there are 16! (16! ≈ 20.9 x 10^12) different arrangements of the tiles on the frame. Of these only one-half are reachable from any given initial state. Indeed, the state space for the problem is very large. Before attempting to search this state space for the goal state, it would be worthwhile to determine whether the goal state is reachable from the initial state. There is a very simple way to do this. Let us number the frame positions 1 to 16. Position i is the frame position containing tile numbered i in the goal arrangement of Figure 8.2(b). Position 16 is the empty spot. Let position(i) be the position number in the initial state of the tile numbered i. Then position(16) will denote the position of the empty spot.

(a) An arrangement:        (b) Goal arrangement:       (c) [shaded positions;
     1  3  4 15                 1  2  3  4                  not reproduced]
     2     5 12                 5  6  7  8
     7  6 11 14                 9 10 11 12
     8  9 10 13                13 14 15

Figure 8.2 15-puzzle arrangements

For any state let less(i) be the number of tiles j such that j < i and position(j) > position(i). For the state of Figure 8.2(a) we have, for example, less(1) = 0, less(4) = 1, and less(12) = 6. Let x = 1 if in the initial state the empty spot is at one of the shaded positions of Figure 8.2(c) and x = 0 if it is at one of the remaining positions. Then, we have the following theorem:

Theorem 8.1 The goal state of Figure 8.2(b) is reachable from the initial state iff Σ_{i=1}^{16} less(i) + x is even.

Proof: Left as an exercise. □
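A minimal Python sketch of the Theorem 8.1 test follows (our own illustration, not the book's code). Here `position` maps each tile number (1 through 16, with 16 standing for the empty spot) to its frame position in the initial state, and `shaded`, the set of shaded frame positions of Figure 8.2(c), is passed in as a parameter since that panel is not reproduced here.

    def is_goal_reachable(position, shaded):
        # position[i] = frame position (1..16) of tile i in the initial state,
        # with tile 16 standing for the empty spot; shaded is the set of
        # shaded frame positions of Figure 8.2(c), supplied by the caller.
        def less(i):
            return sum(1 for j in range(1, i) if position[j] > position[i])
        x = 1 if position[16] in shaded else 0
        return (sum(less(i) for i in range(1, 17)) + x) % 2 == 0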

Theorem8.1can be used to determinewhetherthe goalstate is in thestatespaceof the initialstate.If it is,then we can proceedto determineasequenceof moves leadingto the goalstate.To carry out this search,thestate spacecan be organizedinto a tree. The childrenof eachnodex inthis treerepresentthe statesreachablefrom statex by one legalmove. Itis convenient to think of a move as involving a move of the empty spaceratherthan a move of a tile.Theempty space,on eachmove, moves eitherup, right, down, or left. Figure8.3shows the first threelevelsof the state


spacetreeof the 15-puzzlebeginningwith the initialstateshown in the root.Partsof levels4 and 5 of the treearealsoshown.Thetreehas beenpruneda little.No node p has a child state that is the sameas p 'sparent. Thesubtreeeliminatedin this way is already presentin the tree and has rootparent(p).As can beseen,thereis an answer nodeat level 4.

Figure 8.3 Part of the state space tree for the 15-puzzle [figure not reproduced; edges are labeled according to the direction in which the empty space moves]

A depth first state spacetree generationwill result in the subtreeofFigure8.4when the next moves areattemptedin the order:move the emptyspaceup, right, down, and left.Successiveboardconfigurationsrevealthateachmove gets us farther from the goalratherthan closer.The searchofthe state spacetree is blind.It will take the leftmost path from the rootregardlessof the starting configuration.As a result,an answernodemaynever be found (unlessthe leftmost path ends in sucha node).In a FIFOsearchof the tree of Figure8.3,the nodeswill be generatedin the ordernumbered.A breadthfirst searchwill always find a goalnodenearestto theroot.However,sucha searchis alsoblind in the sensethat no matterwhatthe initialconfiguration,the algorithmattemptsto make the samesequenceof moves.A FIFOsearchalways generatesthe statespacetreeby levels.


Figure 8.4 First ten steps in a depth first search [figure not reproduced; it shows the successive board configurations, with each edge labeled up, right, down, or left]

What we would like is a more "intelligent" search method, one that seeks out an answer node and adapts the path it takes through the state space tree to the specific problem instance being solved. We can associate a cost c(x) with each node x in the state space tree. The cost c(x) is the length of a path from the root to a nearest goal node (if any) in the subtree with root x. Thus, in Figure 8.3, c(1) = c(4) = c(10) = c(23) = 3. When such a cost function is available, a very efficient search can be carried out. We begin with the root as the E-node and generate a child node with c()-value the same as the root. Thus children nodes 2, 3, and 5 are eliminated and only node 4 becomes a live node. This becomes the next E-node. Its first child, node 10, has c(10) = c(4) = 3. The remaining children are not generated. Node 4 dies and node 10 becomes the E-node. In generating node 10's children, node 22 is killed immediately as c(22) > 3. Node 23 is generated next. It is a goal node and the search terminates. In this search strategy, the only nodes to become E-nodes are nodes on the path from the root to a nearest goal node. Unfortunately, this is an impractical strategy as it is not possible to easily compute the function c(·) specified above.

We can arrive at an easy to compute estimate ĉ(x) of c(x). We can write ĉ(x) = f(x) + ĝ(x), where f(x) is the length of the path from the root to node x and ĝ(x) is an estimate of the length of a shortest path from x to a goal node in the subtree with root x. One possible choice for ĝ(x) is

    ĝ(x) = number of nonblank tiles not in their goal position

Clearly, at least ĝ(x) moves have to be made to transform state x to a goal state. More than ĝ(x) moves may be needed to achieve this. To see this, examine the problem state of Figure 8.5. There ĝ(x) = 1 as only tile 7 is not in its final spot (the count for ĝ(x) excludes the blank tile). However, the number of moves needed to reach the goal state is many more than ĝ(x). So ĉ(x) is a lower bound on the value of c(x).
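A short Python sketch of this heuristic follows (our own illustration, not the book's code); states are 4 x 4 tuples of tuples with 0 standing for the blank, and the names g_hat, c_hat, and fig_8_5 are ours.

    GOAL = ((1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12), (13, 14, 15, 0))

    def g_hat(state):
        # g^(x): number of nonblank tiles not in their goal position
        return sum(1 for r in range(4) for c in range(4)
                   if state[r][c] != 0 and state[r][c] != GOAL[r][c])

    def c_hat(path_length, state):
        # c^(x) = f(x) + g^(x), where f(x) is the path length from the root
        return path_length + g_hat(state)

    # The problem state of Figure 8.5: only tile 7 is out of place, so g^ = 1
    fig_8_5 = ((1, 2, 3, 4), (5, 6, 0, 8), (9, 10, 11, 12), (13, 14, 15, 7))
    print(g_hat(fig_8_5))   # 1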


An LC-search of Figure 8.3 using ĉ(x) will begin by using node 1 as the E-node. All its children are generated. Node 1 dies and leaves behind the live nodes 2, 3, 4, and 5. The next node to become the E-node is a live node with least ĉ(x). Then ĉ(2) = 1 + 4, ĉ(3) = 1 + 4, ĉ(4) = 1 + 2, and ĉ(5) = 1 + 4. Node 4 becomes the E-node. Its children are generated. The live nodes at this time are 2, 3, 5, 10, 11, and 12. So ĉ(10) = 2 + 1, ĉ(11) = 2 + 3, and ĉ(12) = 2 + 3. The live node with least ĉ is node 10. This becomes the next E-node. Nodes 22 and 23 are generated next. Node 23 is determined to be a goal node and the search terminates. In this case LC-search was almost as efficient as using the exact function c(). It should be noted that with a suitable choice for ĉ(), an LC-search will be far more selective than any of the other search methods we have discussed.

     1  2  3  4
     5  6     8
     9 10 11 12
    13 14 15  7

Figure 8.5 Problem state

8.1.3 Control Abstractions for LC-Search

Let t be a state space tree and c() a cost function for the nodes in t. If x is a node in t, then c(x) is the minimum cost of any answer node in the subtree with root x. Thus, c(t) is the cost of a minimum-cost answer node in t. As remarked earlier, it is usually not possible to find an easily computable function c() as defined above. Instead, a heuristic ĉ that estimates c() is used. This heuristic should be easy to compute and generally has the property that if x is either an answer node or a leaf node, then c(x) = ĉ(x). LCSearch (Algorithm 8.1) uses ĉ to find an answer node. The algorithm uses two functions Least() and Add(x) to delete and add a live node from or to the list of live nodes, respectively. Least() finds a live node with least ĉ(). This node is deleted from the list of live nodes and returned. Add(x) adds the new live node x to the list of live nodes. The list of live nodes will usually be implemented as a min-heap (Section 2.4). Algorithm LCSearch outputs the path from the answer node it finds to the root node t. This is easy to do if with each node x that becomes live, we associate a field parent which gives the parent of node x. When an answer node g is found, the path from


g to t can be determined by following a sequence of parent values starting from the current E-node (which is the parent of g) and ending at node t.

    listnode = record {
        listnode *next, *parent; float cost;
    }

    1   Algorithm LCSearch(t)
    2   // Search t for an answer node.
    3   {
    4       if *t is an answer node then output *t and return;
    5       E := t; // E-node.
    6       Initialize the list of live nodes to be empty;
    7       repeat
    8       {
    9           for each child x of E do
    10          {
    11              if x is an answer node then output the path
    12                  from x to t and return;
    13              Add(x); // x is a new live node.
    14              (x -> parent) := E; // Pointer for path to root.
    15          }
    16          if there are no more live nodes then
    17          {
    18              write ("No answer node"); return;
    19          }
    20          E := Least();
    21      } until (false);
    22  }

Algorithm 8.1 LC-search

The correctness of algorithm LCSearch is easy to establish. Variable E always points to the current E-node. By the definition of LC-search, the root node is the first E-node (line 5). Line 6 initializes the list of live nodes. At any time during the execution of LCSearch, this list contains all live nodes except the E-node. Thus, initially this list should be empty (line 6). The for loop of line 9 examines all the children of the E-node. If one of the children is an answer node, then the algorithm outputs the path from x to t and terminates. If a child of E is not an answer node, then it becomes a live node. It is added to the list of live nodes (line 13) and its parent field set to


E (line 14). When all the children of E have been generated, E becomes a dead node and line 16 is reached. This happens only if none of E's children is an answer node. So, the search must continue further. If there are no live nodes left, then the entire state space tree has been searched and no answer nodes found. The algorithm terminates in line 18. Otherwise, Least(), by definition, correctly chooses the next E-node and the search continues from here.

From the precedingdiscussion,it is clearthat LCSearchterminatesonlywhen eitheran answer nodeis found or the entirestatespacetreehas beengeneratedand searched.Thus,terminationis guaranteedonly for finite statespacetrees.Terminationcanalsobeguaranteedfor infinite statespacetreesthat have at leastone answer nodeprovided a \"proper\" choicefor the costfunction c() is made.This is the case,for example,when c(x) > c(y) forevery pairof nodesx and y suchthat the level numberof x is \"sufficiently\"

higherthan that of y. For infinite statespacetreeswith no answer nodes,LCSearchwill not terminate.Thus, it is advisableto restrictthe searchtofind answer nodeswith a costno morethan a given boundC.

One should note the similarity between algorithm LCSearch and algorithms for a breadth first search and D-search of a state space tree. If the list of live nodes is implemented as a queue with Least() and Add(x) being algorithms to delete an element from and add an element to the queue, then LCSearch will be transformed to a FIFO search schema. If the list of live nodes is implemented as a stack with Least() and Add(x) being algorithms to delete and add elements to the stack, then LCSearch will carry out a LIFO search of the state space tree. Thus, the algorithms for LC, FIFO, and LIFO search are essentially the same. The only difference is in the implementation of the list of live nodes. This is to be expected as the three search methods differ only in the selection rule used to obtain the next E-node.

8.1.4 Bounding

A branch-and-bound method searches a state space tree using any search mechanism in which all the children of the E-node are generated before another node becomes the E-node. We assume that each answer node x has a cost c(x) associated with it and that a minimum-cost answer node is to be found. Three common search strategies are FIFO, LIFO, and LC. (Another method, heuristic search, is discussed in the exercises.) A cost function ĉ(·) such that ĉ(x) <= c(x) is used to provide lower bounds on solutions obtainable from any node x. If upper is an upper bound on the cost of a minimum-cost solution, then all live nodes x with ĉ(x) > upper may be killed as all answer nodes reachable from x have cost c(x) >= ĉ(x) > upper. The starting value for upper can be obtained by some heuristic or can be set to ∞. Clearly, so long as the initial value for upper is no less than the cost of a minimum-cost answer node, the above rules to kill live nodes will not result in the killing of


a live node that can reacha minimum-costanswer node.Each timea newanswer nodeis found, the value of uppercan beupdated.

Let us see how these ideas can be used to arrive at branch-and-bound algorithms for optimization problems. In this section we deal directly only with minimization problems. A maximization problem is easily converted to a minimization problem by changing the sign of the objective function. We need to be able to formulate the search for an optimal solution as a search for a least-cost answer node in a state space tree. To do this, it is necessary to define the cost function c(·) such that c(x) is minimum for all nodes representing an optimal solution. The easiest way to do this is to use the objective function itself for c(·). For nodes representing feasible solutions, c(x) is the value of the objective function for that feasible solution. For nodes representing infeasible solutions, c(x) = ∞. For nodes representing partial solutions, c(x) is the cost of the minimum-cost node in the subtree with root x. Since c(x) is in general as hard to compute as the original optimization problem is to solve, the branch-and-bound algorithm will use an estimate ĉ(x) such that ĉ(x) <= c(x) for all x. In general then, the ĉ(·) function used in a branch-and-bound solution to optimization functions will estimate the objective function value and not the computational difficulty of reaching an answer node. In addition, to be consistent with the terminology used in connection with the 15-puzzle, any node representing a feasible solution (a solution node) will be an answer node. However, only minimum-cost answer nodes will correspond to an optimal solution. Thus, answer nodes and solution nodes are indistinguishable.

As an example optimization problem, consider the job sequencing with deadlines problem introduced in Section 4.4. We generalize this problem to allow jobs with different processing times. We are given n jobs and one processor. Each job i has associated with it a three tuple (p_i, d_i, t_i). Job i requires t_i units of processing time. If its processing is not completed by the deadline d_i, then a penalty p_i is incurred. The objective is to select a subset J of the n jobs such that all jobs in J can be completed by their deadlines. Hence, a penalty can be incurred only on those jobs not in J. The subset J should be such that the penalty incurred is minimum among all possible subsets J. Such a J is optimal.

Consider the following instance: n = 4, (p1, d1, t1) = (5, 1, 1), (p2, d2, t2) = (10, 3, 2), (p3, d3, t3) = (6, 2, 1), and (p4, d4, t4) = (3, 1, 1). The solution space for this instance consists of all possible subsets of the job index set {1, 2, 3, 4}. The solution space can be organized into a tree by means of either of the two formulations used for the sum of subsets problem (Example 7.2). Figure 8.6 corresponds to the variable tuple size formulation while Figure 8.7 corresponds to the fixed tuple size formulation. In both figures square nodes represent infeasible subsets. In Figure 8.6 all nonsquare nodes are answer nodes. Node 9 represents an optimal solution and is the only minimum-cost answer node. For this node J = {2, 3} and the penalty (cost)


is 8. In Figure 8.7 only nonsquare leaf nodes are answer nodes. Node 25 represents the optimal solution and is also a minimum-cost answer node. This node corresponds to J = {2, 3} and a penalty of 8. The costs of the answer nodes of Figure 8.7 are given below the nodes.
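As a quick check of this claim, the following brute-force Python sketch (ours, not the book's) enumerates every subset J of the instance above and confirms that J = {2, 3}, with penalty 8, is optimal. The feasibility test assumes the standard fact that a job set is schedulable iff processing it in nondecreasing deadline order meets every deadline.

    from itertools import combinations

    jobs = {1: (5, 1, 1), 2: (10, 3, 2), 3: (6, 2, 1), 4: (3, 1, 1)}  # i: (p, d, t)

    def feasible(J):
        # J is feasible iff processing its jobs in nondecreasing deadline
        # order meets every deadline.
        finish = 0
        for i in sorted(J, key=lambda i: jobs[i][1]):
            finish += jobs[i][2]
            if finish > jobs[i][1]:
                return False
        return True

    best = min((sum(jobs[i][0] for i in jobs if i not in J), J)
               for r in range(len(jobs) + 1)
               for J in combinations(jobs, r) if feasible(J))
    print(best)   # (8, (2, 3)): J = {2, 3}, penalty p1 + p4 = 8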

Figure 8.6 State space tree corresponding to variable tuple size formulation [figure not reproduced; each edge is labeled with the x_i choice made and each node with its ĉ value]

We can define a cost function c() for the state space formulations of Figures 8.6 and 8.7. For any circular node x, c(x) is the minimum penalty corresponding to any node in the subtree with root x. The value of c(x) = ∞ for a square node. In the tree of Figure 8.6, c(3) = 8, c(2) = 9, and c(1) = 8. In the tree of Figure 8.7, c(1) = 8, c(2) = 9, c(5) = 13, and c(6) = 8. Clearly, c(1) is the penalty corresponding to an optimal selection J.

A bound ĉ(x) such that ĉ(x) <= c(x) for all x is easy to obtain. Let S_x be the subset of jobs selected for J at node x. If m = max {i | i ∈ S_x}, then ĉ(x) = Σ_{i<m, i∉S_x} p_i is an estimate for c(x) with the property ĉ(x) <= c(x). For each circular node x in Figures 8.6 and 8.7, the value of ĉ(x) is the number outside node x. For a square node, ĉ(x) = ∞. For example, in Figure 8.6 for node 6, S6 = {1, 2} and hence m = 2. Also, Σ_{i<2, i∉S6} p_i = 0. For node 7, S7 = {1, 3} and m = 3. Therefore, Σ_{i<3, i∉S7} p_i = p2 = 10. And so on. In Figure 8.7, node 12 corresponds to the omission of job 1 and hence a penalty of 5; node 13 corresponds to the omission of jobs 1 and 3 and hence a penalty of 11; and so on.

A simple upper bound u(x) on the cost of a minimum-cost answer node in the subtree x is u(x) = Σ_{i∉S_x} p_i. Note that u(x) is the cost of the solution S_x corresponding to node x.
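Both bounds can be computed directly from S_x. The small Python sketch below (our own illustration, not the book's code) does so for the instance above and reproduces the values just quoted for nodes 6 and 7 of Figure 8.6.

    p = {1: 5, 2: 10, 3: 6, 4: 3}   # penalties of the instance above

    def c_hat(Sx):
        # c^(x): penalties of the jobs already passed over, i.e. jobs i < m
        # (m = largest index in Sx) that were not selected.
        if not Sx:
            return 0
        m = max(Sx)
        return sum(p[i] for i in p if i < m and i not in Sx)

    def u(Sx):
        # u(x): penalty if nothing further is added to Sx; every job
        # outside Sx pays its penalty.
        return sum(p[i] for i in p if i not in Sx)

    print(c_hat({1, 2}), u({1, 2}))   # node 6 of Figure 8.6: c^ = 0, u = 9
    print(c_hat({1, 3}), u({1, 3}))   # node 7 of Figure 8.6: c^ = 10, u = 13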


Figure 8.7 State space tree corresponding to fixed tuple size formulation [figure not reproduced; the costs of the answer nodes appear below them]

8.1.5 FIFO Branch-and-Bound

A FIFO branch-and-bound algorithm for the job sequencing problem can begin with upper = ∞ (or upper = Σ_{1<=i<=n} p_i) as an upper bound on the cost of a minimum-cost answer node. Starting with node 1 as the E-node and using the variable tuple size formulation of Figure 8.6, nodes 2, 3, 4, and 5 are generated (in that order). Then u(2) = 19, u(3) = 14, u(4) = 18, and u(5) = 21. For example, node 2 corresponds to the inclusion of job 1. Thus u(2) is obtained by summing the penalties of all the other jobs. The variable upper is updated to 14 when node 3 is generated. Since ĉ(4) and ĉ(5) are greater than upper, nodes 4 and 5 get killed (or bounded). Only nodes 2 and 3 remain alive. Node 2 becomes the next E-node. Its children, nodes 6, 7, and 8, are generated. Then u(6) = 9 and so upper is updated to 9. The cost ĉ(7) = 10 > upper and node 7 gets killed. Node 8 is infeasible and so it is killed. Next, node 3 becomes the E-node. Nodes 9 and 10 are now generated. Then u(9) = 8 and so upper becomes 8. The cost ĉ(10) = 11 > upper, and this node is killed. The next E-node is node 6. Both its children are infeasible. Node 9's only child is also infeasible. The minimum-cost answer node is node 9. It has a cost of 8.

When implementing a FIFO branch-and-bound algorithm, it is not economical to kill live nodes with ĉ(x) > upper each time upper is updated. This is so because live nodes are in the queue in the order in which they were generated. Hence, nodes with ĉ(x) > upper are distributed in some


random way in the queue. Instead, live nodes with ĉ(x) > upper can be killed when they are about to become E-nodes.

From here on we shall refer to the FIFO-based branch-and-bound algorithm with an appropriate ĉ(·) and u(·) as FIFOBB.

8.1.6 LC Branch-and-Bound

An LC branch-and-bound search of the tree of Figure 8.6 will begin with upper = ∞ and node 1 as the first E-node. When node 1 is expanded, nodes 2, 3, 4, and 5 are generated in that order. As in the case of FIFOBB, upper is updated to 14 when node 3 is generated and nodes 4 and 5 are killed as ĉ(4) > upper and ĉ(5) > upper. Node 2 is the next E-node as ĉ(2) = 0 and ĉ(3) = 5. Nodes 6, 7, and 8 are generated and upper is updated to 9 when node 6 is generated. So, node 7 is killed as ĉ(7) = 10 > upper. Node 8 is infeasible and so killed. The only live nodes now are nodes 3 and 6. Node 6 is the next E-node as ĉ(6) = 0 < ĉ(3). Both its children are infeasible. Node 3 becomes the next E-node. When node 9 is generated, upper is updated to 8 as u(9) = 8. So, node 10 with ĉ(10) = 11 is killed on generation. Node 9 becomes the next E-node. Its only child is infeasible. No live nodes remain. The search terminates with node 9 representing the minimum-cost answer node.

From here on we refer to the LC (LIFO)-based branch-and-bound algorithm with an appropriate ĉ(·) and u(·) as LCBB (LIFOBB).

EXERCISES

1. Prove Theorem 8.1.

2. Present an algorithm schema FifoBB for a FIFO branch-and-bound search for a least-cost answer node.

3. Give an algorithm schema LcBB for a LC branch-and-bound search for a least-cost answer node.

4. Write an algorithm schema LifoBB for a LIFO branch-and-bound search for a least-cost answer node.

5. Draw the portion of the state space tree generated by FIFOBB, LCBB, and LIFOBB for the job sequencing with deadlines instance n = 5, (p1, p2, ..., p5) = (6, 3, 4, 8, 5), (t1, t2, ..., t5) = (2, 1, 2, 1, 1), and (d1, d2, ..., d5) = (3, 1, 4, 2, 4). What is the penalty corresponding to an optimal solution? Use a variable tuple size formulation and ĉ(·) and u(·) as in Section 8.1.


6. Write a branch-and-bound algorithm for the job sequencing with deadlines problem. Use the fixed tuple size formulation.

7. (a) Write a branch-and-bound algorithm for the job sequencing with deadlines problem using a dominance rule (see Section 5.7). Your algorithm should work with a fixed tuple size formulation and should generate nodes by levels. Nodes on each level should be kept in an order permitting easy use of your dominance rule.

   (b) Convert your algorithm into a program and, using randomly generated problem instances, determine the worth of the dominance rule as well as the bounding functions. To do this, you will have to run four versions of your program: ProgA (bounding functions and dominance rules are removed), ProgB (dominance rule is removed), ProgC (bounding function is removed), and ProgD (bounding functions and dominance rules are included). Determine computing time figures as well as the number of nodes generated.

8.2 0/1 KNAPSACK PROBLEM

To use the branch-and-bound technique to solve any problem, it is first necessary to conceive of a state space tree for the problem. We have already seen two possible state space tree organizations for the knapsack problem (Section 7.6). Still, we cannot directly apply the techniques of Section 8.1 since these were discussed with respect to minimization problems whereas the knapsack problem is a maximization problem. This difficulty is easily overcome by replacing the objective function Σ p_i x_i by the function -Σ p_i x_i. Clearly, Σ p_i x_i is maximized iff -Σ p_i x_i is minimized. This modified knapsack problem is stated as (8.1).

        minimize   -Σ_{i=1}^{n} p_i x_i

        subject to  Σ_{i=1}^{n} w_i x_i <= m                    (8.1)

                    x_i = 0 or 1,   1 <= i <= n

We continue the discussion assuming a fixed tuple size formulation for the solution space. The discussion is easily extended to the variable tuple size formulation. Every leaf node in the state space tree representing an assignment for which Σ_{1<=i<=n} w_i x_i <= m is an answer (or solution) node. All other leaf nodes are infeasible. For a minimum-cost answer node to correspond to any optimal solution, we need to define c(x) = -Σ_{1<=i<=n} p_i x_i for every


answer node x. The cost c(x) = ∞ for infeasible leaf nodes. For nonleaf nodes, c(x) is recursively defined to be min {c(lchild(x)), c(rchild(x))}.

We now need two functions ĉ(x) and u(x) such that ĉ(x) <= c(x) <= u(x) for every node x. The cost ĉ(·) and u(·) satisfying this requirement may be obtained as follows. Let x be a node at level j, 1 <= j <= n + 1. At node x assignments have already been made to x_i, 1 <= i < j. The cost of these assignments is -Σ_{1<=i<j} p_i x_i. So, c(x) <= -Σ_{1<=i<j} p_i x_i and we may use u(x) = -Σ_{1<=i<j} p_i x_i. If q = -Σ_{1<=i<j} p_i x_i, then an improved upper bound function u(x) is u(x) = UBound(q, Σ_{1<=i<j} w_i x_i, j - 1, m), where UBound is defined in Algorithm 8.2. As for ĉ(x), it is clear that Bound(-q, Σ_{1<=i<j} w_i x_i, j - 1) <= c(x), where Bound is as given in Algorithm 7.11.

    Algorithm UBound(cp, cw, k, m)
    // cp, cw, k, and m have the same meanings as in
    // Algorithm 7.11. w[i] and p[i] are respectively
    // the weight and profit of the ith object.
    {
        b := cp; c := cw;
        for i := k + 1 to n do
        {
            if (c + w[i] <= m) then
            {
                c := c + w[i]; b := b - p[i];
            }
        }
        return b;
    }

Algorithm 8.2 Function u(·) for knapsack problem

8.2.1 LC Branch-and-Bound Solution

Example 8.2 [LCBB] Consider the knapsack instance n = 4, (p1, p2, p3, p4) = (10, 10, 12, 18), (w1, w2, w3, w4) = (2, 4, 6, 9), and m = 15. Let us trace the working of an LC branch-and-bound search using ĉ(·) and u(·) as defined previously. We continue to use the fixed tuple size formulation. The search begins with the root as the E-node. For this node, node 1 of Figure 8.8, we have ĉ(1) = -38 and u(1) = -32.


Figure 8.8 LC branch-and-bound tree for Example 8.2 [figure not reproduced; the upper number at each node is ĉ and the lower number is u]


The computation of u(1) and ĉ(1) is done as follows. The bound u(1) has a value UBound(0, 0, 0, 15). UBound scans through the objects from left to right starting from j; it adds these objects into the knapsack until the first object that doesn't fit is encountered. At this time, the negation of the total profit of all the objects in the knapsack plus cw is returned. In Function UBound, c and b start with a value of zero. For i = 1, 2, and 3, c gets incremented by 2, 4, and 6, respectively. The variable b also gets decremented by 10, 10, and 12, respectively. When i = 4, the test (c + w[i] <= m) fails and hence the value returned is -32. Function Bound is similar to UBound, except that it also considers a fraction of the first object that doesn't fit the knapsack. For example, in computing ĉ(1), the first object that doesn't fit is 4 whose weight is 9. The total weight of the objects 1, 2, and 3 is 12. So, Bound considers a fraction 1/3 of the object 4 and hence returns -32 - (1/3) * 18 = -38.
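The small Python sketch below (ours, not the book's code; it uses 0-based indices and treats the capacity m as a module-level value rather than a parameter) reproduces these two numbers for the instance of Example 8.2.

    p = [10, 10, 12, 18]          # profits of Example 8.2
    w = [2, 4, 6, 9]              # weights
    m = 15                        # knapsack capacity

    def ubound(cp, cw, k):
        # Whole objects only, in the spirit of Algorithm 8.2.
        b, c = cp, cw
        for i in range(k, len(p)):
            if c + w[i] <= m:
                c += w[i]
                b -= p[i]
        return b

    def bound(cp, cw, k):
        # Also takes a fraction of the first object that does not fit,
        # in the spirit of Bound in Algorithm 7.11.
        b, c = cp, cw
        for i in range(k, len(p)):
            if c + w[i] <= m:
                c += w[i]
                b -= p[i]
            else:
                return b - (m - c) / w[i] * p[i]
        return b

    print(ubound(0, 0, 0), bound(0, 0, 0))   # -32 -38.0, i.e. u(1) and c^(1)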

Since node 1 is not a solution node, LCBB sets ans = 0 and upper = -32 (ans being a variable to store intermediate answer nodes). The E-node is expanded and its two children, nodes 2 and 3, generated. The cost ĉ(2) = -38, ĉ(3) = -32, u(2) = -32, and u(3) = -27. Both nodes are put onto the list of live nodes. Node 2 is the next E-node. It is expanded and nodes 4 and 5 generated. Both nodes get added to the list of live nodes. Node 4 is the live node with least ĉ value and becomes the next E-node. Nodes 6 and 7 are generated. Assuming node 6 is generated first, it is added to the list of live nodes. Next, node 7 joins this list and upper is updated to -38. The next E-node will be one of nodes 6 and 7. Let us assume it is node 7. Its two children are nodes 8 and 9. Node 8 is a solution node. Then upper is updated to -38 and node 8 is put onto the live nodes list. Node 9 has ĉ(9) > upper and is killed immediately. Nodes 6 and 8 are two live nodes with least ĉ. Regardless of which becomes the next E-node, ĉ(E) >= upper and the search terminates with node 8 the answer node. At this time, the value -38 together with the path 8, 7, 4, 2, 1 is printed out and the algorithm terminates. From the path one cannot figure out the assignment of values to the x_i's such that Σ p_i x_i = upper. Hence, a proper implementation of LCBB has to keep additional information from which the values of the x_i's can be extracted. One way is to associate with each node a one-bit field, tag. The sequence of tag bits from the answer node to the root give the x_i values. Thus, we have tag(2) = tag(4) = tag(6) = tag(8) = 1 and tag(3) = tag(5) = tag(7) = tag(9) = 0. The tag sequence for the path 8, 7, 4, 2, 1 is 1 0 1 1 and so x4 = 1, x3 = 0, x2 = 1, and x1 = 1. □

To use LCBBto solve the knapsackproblem,we needtospecify (1) thestructureof nodesin the statespacetreebeingsearched,(2)how to generatethe childrenof a given node,(3) how to recognizea solutionnode,and (4)a representationof the list of live nodesand a mechanismfor addinga nodeinto the list as well as identifying the least-costnode.Thenodestructureneededdependson which of the two formulationsfor the statespacetree isbeingused.Let us continuewith a fixed sizetuple formulation.Eachnode


x that is generated and put onto the list of live nodes must have a parent field. In addition, as noted in Example 8.2, each node should have a one-bit tag field. This field is needed to output the x_i values corresponding to an optimal solution. To generate x's children, we need to know the level of node x in the state space tree. For this we shall use a field level. The left child of x is chosen by setting x_level(x) = 1 and the right child by setting x_level(x) = 0. To determine the feasibility of the left child, we need to know the amount of knapsack space available at node x. This can be determined either by following the path from node x to the root or by explicitly retaining this value in the node. Say we choose to retain this value in a field cu (capacity unused). The evaluation of ĉ(x) and u(x) requires knowledge of the profit Σ_{i<level(x)} p_i x_i earned by the filling corresponding to node x. This can be computed by following the path from x to the root. Alternatively, this value can be explicitly retained in a field pe. Finally, in order to determine the live node with least ĉ value or to insert nodes properly into the list of live nodes, we need to know ĉ(x). Again, we have a choice. The value ĉ(x) may be stored explicitly in a field ub or may be computed when needed. Assuming all information is kept explicitly, we need nodes with six fields each: parent, level, tag, cu, pe, and ub.

Using this six-field node structure, the children of any live node x can be easily determined. The left child y is feasible iff cu(x) >= w_level(x). In this case, parent(y) = x, level(y) = level(x) + 1, cu(y) = cu(x) - w_level(x), pe(y) = pe(x) + p_level(x), tag(y) = 1, and ub(y) = ub(x). The right child can be generated similarly. Solution nodes are easily recognized too. Node x is a solution node iff level(x) = n + 1.

We are now left with the task of specifying the representation of the list of live nodes. The functions we wish to perform on this list are (1) test if the list is empty, (2) add nodes, and (3) delete a node with least ub. We have seen a data structure that allows us to perform these three functions efficiently: a min-heap. If there are m live nodes, then function (1) can be carried out in O(1) time, whereas functions (2) and (3) require only O(log m) time.
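A minimal Python sketch of this node structure and of a heap-based live-node list follows (our own illustration; the field names mirror the six fields just described, Python's heapq stands in for the min-heap, and left_child is a hypothetical helper, not code from the text).

    import heapq
    from dataclasses import dataclass, field

    @dataclass(order=True)
    class Node:
        ub: float                                       # c^(x): the heap key
        parent: "Node" = field(compare=False, default=None)
        level: int = field(compare=False, default=1)
        tag: int = field(compare=False, default=0)      # 1 means x_level = 1
        cu: int = field(compare=False, default=0)       # capacity unused
        pe: int = field(compare=False, default=0)       # profit earned

    def left_child(x, wt, pr):
        # Feasible iff the object considered at this level fits: cu(x) >= wt.
        if x.cu < wt:
            return None
        return Node(ub=x.ub, parent=x, level=x.level + 1, tag=1,
                    cu=x.cu - wt, pe=x.pe + pr)

    live = []                                   # the min-heap of live nodes
    heapq.heappush(live, Node(ub=-38, cu=15))   # add a node: O(log m)
    x = heapq.heappop(live)                     # least-c^ live node: O(log m)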

8.2.2 FIFO Branch-and-Bound Solution

Example 8.3 Now, let us trace through the FIFOBB algorithm using the same knapsack instance as in Example 8.2. Initially the root node, node 1 of Figure 8.9, is the E-node and the queue of live nodes is empty. Since this is not a solution node, upper is initialized to u(1) = -32.

We assume the children of a node are generated left to right. Nodes 2 and 3 are generated and added to the queue (in that order). The value of upper remains unchanged. Node 2 becomes the next E-node. Its children, nodes 4 and 5, are generated and added to the queue. Node 3, the next


Figure 8.9 FIFO branch-and-bound tree for Example 8.3 [figure not reproduced; the upper number at each node is ĉ and the lower number is u]


E-node, is expanded. Its children nodes are generated. Node 6 gets added to the queue. Node 7 is immediately killed as ĉ(7) > upper. Node 4 is expanded next. Nodes 8 and 9 are generated and added to the queue. Then upper is updated to u(9) = -38. Nodes 5 and 6 are the next two nodes to become E-nodes. Neither is expanded as for each, ĉ() > upper. Node 8 is the next E-node. Nodes 10 and 11 are generated. Node 10 is infeasible and so killed. Node 11 has ĉ(11) > upper and so is also killed. Node 9 is expanded next. When node 12 is generated, upper and ans are updated to -38 and 12 respectively. Node 12 joins the queue of live nodes. Node 13 is killed before it can get onto the queue of live nodes as ĉ(13) > upper. The only remaining live node is node 12. It has no children and the search terminates. The value of upper and the path from node 12 to the root is output. As in the case of Example 8.2, additional information is needed to determine the x_i values on this path. □

At first we may be temptedto discardFIFOBBin favor of LCBBinsolvingknapsack.Our intuitionleadsus to believethat LCBBwill examinefewer nodesin its quest for an optimalsolution.However,we shouldkeepinmind that insertionsinto and deletionsform a heapare far moreexpensive(proportionalto the logarithmof the heap size) than the correspondingoperationson a queue (0(1)).Consequently, the work donefor eachE-nodeis morein LCBBthan in FIFOBB.UnlessLCBBusesfar fewer E-nodesthan FIFOBB,FIFOBBwill outperform(intermsof realcomputationtime)LCBB.

We have now seen four different approaches to solving the knapsack problem: dynamic programming, backtracking, LCBB, and FIFOBB. If we compare the dynamic programming algorithm DKnap (Algorithm 5.7) and FIFOBB, we see that there is a correspondence between generating the S^i's and generating nodes by levels. S^i contains all pairs (P, W) corresponding to nodes on level i + 1, 0 <= i <= n. Hence, both algorithms generate the state space tree by levels. The dynamic programming algorithm, however, keeps the nodes on each level ordered by their profit earned (P) and capacity used (W) values. No two tuples have the same P or W value. In FIFOBB we may have many nodes on the same level with the same P or W value. It is not easy to implement the dominance rule of Section 5.7 into FIFOBB as nodes on a level are not ordered by their P or W values. However, the bounding rules can easily be incorporated into DKnap. Toward the end of Section 5.7 we discussed some simple heuristics to determine whether a pair (P, W) ∈ S^i should be killed. These heuristics are readily seen to be bounding functions of the type discussed here. Let the algorithm resulting from the inclusion of the bounding functions into DKnap be DKnap1. DKnap1 is expected to be superior to FIFOBB as it uses the dominance rule in addition to the bounding functions. In addition, the overhead incurred each time a node is generated is less.


To determinewhich of the knapsackalgorithmsis best,it is necessaryto programthem and obtain realcomputingtimesfor different data sets.Sincethe effectivenessof the boundingfunctions and the dominancerule ishighly data dependent,we expecta wide variation in the computingtimefor different probleminstanceshaving the samenumberof objectsn.To getrepresentativetimes,it is necessaryto generatemany probleminstancesfora fixed n and obtain computingtimesfor theseinstances.Thegenerationof thesedatasetsand the problemof conductingthe testsis discussedin aprogrammingprojectat the end of this section.The resultsof sometestscan be found in the referencesto this chapter.

Before closing our discussion of the knapsack problem, we briefly discuss a very effective heuristic to reduce a knapsack instance with large n to an equivalent one with smaller n. This heuristic, Reduce, uses some of the ideas developed for the branch-and-bound algorithm. It classifies the objects {1, 2, ..., n} into one of three categories I1, I2, and I3. I1 is a set of objects for which x_i must be 1 in every optimal solution. I2 is a set for which x_i must be 0. I3 is {1, 2, ..., n} - I1 - I2. Once I1, I2, and I3 have been determined, we need to solve only the reduced knapsack instance

        maximize    Σ_{i∈I3} p_i x_i

        subject to  Σ_{i∈I3} w_i x_i <= m - Σ_{i∈I1} w_i           (8.2)

                    x_i = 0 or 1,   i ∈ I3

From the solution to (8.2) an optimal solution to the original knapsack instance is obtained by setting x_i = 1 if i ∈ I1 and x_i = 0 if i ∈ I2.

Function Reduce (Algorithm 8.3) makes use of two functions Ubb and Lbb. The bound Ubb(I1, I2) is an upper bound on the value of an optimal solution to the given knapsack instance with added constraints x_i = 1 if i ∈ I1 and x_i = 0 if i ∈ I2. The bound Lbb(I1, I2) is a lower bound under the constraints of I1 and I2. Algorithm Reduce needs no further explanation. It should be clear that I1 and I2 are such that from an optimal solution to (8.2), we can easily obtain an optimal solution to the original knapsack problem.

The time complexity of Reduce is O(n^2). Because the reduction procedure is very much like the heuristics used in DKnap1 and the knapsack algorithms of this chapter, the use of Reduce does not decrease the overall computing time by as much as may be expected by the reduction in the number of objects. These algorithms do dynamically what Reduce does. The exercises explore the value of Reduce further.


    Algorithm Reduce(p, w, n, m, I1, I2)
    // Variables are as described in the discussion.
    // p[i]/w[i] >= p[i+1]/w[i+1], 1 <= i < n.
    {
        I1 := I2 := ∅;
        q := Lbb(∅, ∅);
        k := largest j such that w[1] + ... + w[j] <= m;
        for i := 1 to k do
        {
            if (Ubb(∅, {i}) < q) then I1 := I1 ∪ {i};
            else if (Lbb(∅, {i}) > q) then q := Lbb(∅, {i});
        }
        for i := k + 1 to n do
        {
            if (Ubb({i}, ∅) < q) then I2 := I2 ∪ {i};
            else if (Lbb({i}, ∅) > q) then q := Lbb({i}, ∅);
        }
    }

Algorithm 8.3 Reduction pseudocode for knapsack problem


EXERCISES

1. Work out Example 8.2 using the variable tuple size formulation.

2. Work out Example8.3usingthe variable tuple sizeformulation.

3. Draw the portionof the state spacetreegeneratedby LCBBfor thefollowing knapsackinstances:

(a) n = 5, (p1, p2, ..., p5) = (10, 15, 6, 8, 4), (w1, w2, ..., w5) = (4, 6, 3, 4, 2), and m = 12.

(b) n = 5, (p1, p2, p3, p4, p5) = (w1, w2, w3, w4, w5) = (4, 4, 5, 8, 9) and m = 15.

4. Do Exercise3 usingLCBBon a dynamic statespacetree(seeSection7.6).Usethe fixed tuple sizeformulation.

5. Write a LCBBalgorithmfor the knapsackproblemusing the ideasgiven in Example8.2.

6.Write a LCBBalgorithmfor the knapsackproblemusing the fixedtuplesizeformulation and the dynamic statespacetreeof Section7.6.

7. [Programming project] Program the algorithms DKnap (Algorithm 5.7), DKnap1 (page 399), LCBB for knapsack, and Bknap (Algorithm 7.12). Compare these programs empirically using randomly generated data as below:

   (a) Random w_i and p_i, w_i ∈ [1, 100], p_i ∈ [1, 100], and m = Σ_i w_i / 2.
   (b) Random w_i and p_i, w_i ∈ [1, 100], p_i ∈ [1, 100], and m = 2 max {w_i}.
   (c) Random w_i, w_i ∈ [1, 100], p_i = w_i + 10, and m = Σ_i w_i / 2.
   (d) Same as (c) except m = 2 max {w_i}.
   (e) Random p_i, p_i ∈ [1, 100], w_i = p_i + 10, and m = Σ_i w_i / 2.
   (f) Same as (e) except m = 2 max {w_i}.

Obtain computingtimesfor n = 5,10,20,30,40,....Foreachn,generate (say) ten probleminstancesfrom eachof the above data sets.Reportaverageand worst-casecomputingtimesfor eachof the abovedatasets.Fromthesetimescan you say anything about the expectedbehaviorof thesealgorithms?Now, generateprobleminstanceswith^j = ioj, 1< i < n, m = ^2tOj/2,and Yl WiXi ^ m for any 0,1assignmenttothe x^s. Obtaincomputingtimesfor your four programsfor n = 10,20,and 30.Now study theeffect of changingthe range to [1,1000]in data sets(a) through (f).In sets(c)to (f) replacepi = u>i +10by pi = u>i +100and w^ = Pi +10by Wi=pi+ 100.


8. [Programmingproject]

(a) Programthe reductionheuristicReduceof Section8.2.Generateseveralprobleminstancesfrom the data setsof Exercise7 anddeterminethe sizeof the reducedprobleminstances.Usen =100,200,500,and 1000.

(b) ProgramDKnap and the backtrackingalgorithmBknap for theknapsackproblem.Comparethe effectivenessof Reducebyrunning severalprobleminstances(as in Exercise7). Obtainaverageand worst-casecomputingtimesfor DKnap and Bknap for thegeneratedprobleminstancesand alsofor the reducedinstances.To the timesfor the reducedprobleminstances,add the timerequiredby Reduce. What conclusioncan you draw from yourexperiments?

8.3 TRAVELING SALESPERSON (*)

An O(n^2 2^n) dynamic programming algorithm for the traveling salesperson problem was arrived at in Section 5.9. We now investigate branch-and-bound algorithms for this problem. Although the worst-case complexity of these algorithms will not be any better than O(n^2 2^n), the use of good bounding functions will enable these branch-and-bound algorithms to solve some problem instances in much less time than required by the dynamic programming algorithm.

Let G = (V, E) be a directed graph defining an instance of the traveling salesperson problem. Let c_ij equal the cost of edge (i, j), c_ij = ∞ if (i, j) ∉ E, and let |V| = n. Without loss of generality, we can assume that every tour starts and ends at vertex 1. So, the solution space S is given by S = {1, π, 1 | π is a permutation of (2, 3, ..., n)}. Then |S| = (n - 1)!. The size of S can be reduced by restricting S so that (1, i1, i2, ..., i_{n-1}, 1) ∈ S iff (i_j, i_{j+1}) ∈ E, 0 <= j <= n - 1, and i0 = i_n = 1. S can be organized into a state space tree similar to that for the n-queens problem (see Figure 7.2). Figure 8.10 shows the tree organization for the case of a complete graph with |V| = 4. Each leaf node L is a solution node and represents the tour defined by the path from the root to L. Node 14 represents the tour i0 = 1, i1 = 3, i2 = 4, i3 = 2, and i4 = 1.

To use LCBB to search the traveling salesperson state space tree, we need to define a cost function c(·) and two other functions ĉ(·) and u(·) such that ĉ(r) <= c(r) <= u(r) for all nodes r. The cost c(·) is such that the solution node with least c(·) corresponds to a shortest tour in G. One choice for c(·) is

    c(A) = length of the tour defined by the path from the root to A, if A is a leaf
         = cost of a minimum-cost leaf in the subtree A, if A is not a leaf


Figure 8.10 State space tree for the traveling salesperson problem with n = 4 and i0 = i4 = 1 [figure not reproduced]

A simple ĉ(·) such that ĉ(A) <= c(A) for all A is obtained by defining ĉ(A) to be the length of the path defined at node A. For example, the path defined at node 6 of Figure 8.10 is i0, i1, i2 = 1, 2, 4. It consists of the edges (1, 2) and (2, 4). A better ĉ(·) can be obtained by using the reduced cost matrix corresponding to G. A row (column) is said to be reduced iff it contains at least one zero and all remaining entries are non-negative. A matrix is reduced iff every row and column is reduced. As an example of how to reduce the cost matrix of a given graph G, consider the matrix of Figure 8.11(a). This corresponds to a graph with five vertices. Since every tour on this graph includes exactly one edge (i, j) with i = k, 1 <= k <= 5, and exactly one edge (i, j) with j = k, 1 <= k <= 5, subtracting a constant t from every entry in one column or one row of the cost matrix reduces the length of every tour by exactly t. A minimum-cost tour remains a minimum-cost tour following this subtraction operation. If t is chosen to be the minimum entry in row i (column j), then subtracting it from all entries in row i (column j) introduces a zero into row i (column j). Repeating this as often as needed, the cost matrix can be reduced. The total amount subtracted from the columns and rows is a lower bound on the length of a minimum-cost tour and can be used as the ĉ value for the root of the state space tree. Subtracting 10, 2, 2, 3, 4, 1, and 3 from rows 1, 2, 3, 4, and 5 and columns 1 and 3 respectively of the matrix of Figure 8.11(a) yields the reduced matrix of Figure 8.11(b). The total amount subtracted is 25. Hence, all tours in the original graph have a length at least 25.
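Here is a small Python sketch of this row-and-column reduction (our own code, not the book's; it uses the cost matrix of Figure 8.11(a)). It returns the reduced matrix together with the total subtracted, which is the ĉ value of the root node.

    INF = float('inf')

    cost = [[INF, 20, 30, 10, 11],      # Figure 8.11(a)
            [15, INF, 16, 4, 2],
            [3, 5, INF, 2, 4],
            [19, 6, 18, INF, 3],
            [16, 4, 7, 16, INF]]

    def reduce_matrix(a):
        # Subtract each row's minimum, then each column's minimum, from a
        # copy of a; the total subtracted is a lower bound on any tour length.
        a = [row[:] for row in a]
        n, total = len(a), 0
        for i in range(n):
            t = min(a[i])
            if 0 < t < INF:
                total += t
                a[i] = [v - t if v < INF else v for v in a[i]]
        for j in range(n):
            t = min(a[i][j] for i in range(n))
            if 0 < t < INF:
                total += t
                for i in range(n):
                    if a[i][j] < INF:
                        a[i][j] -= t
        return a, total

    reduced, root_c_hat = reduce_matrix(cost)
    print(root_c_hat)   # 25: the c^ value of the root of the state space tree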

We can associate a reduced cost matrix with every node in the traveling salesperson state space tree. Let A be the reduced cost matrix for node R. Let S be a child of R such that the tree edge (R, S) corresponds to including


edge (i, j) in the tour. If S is not a leaf, then the reduced cost matrix for S may be obtained as follows: (1) Change all entries in row i and column j of A to ∞. This prevents the use of any more edges leaving vertex i or entering vertex j. (2) Set A(j, 1) to ∞. This prevents the use of edge (j, 1). (3) Reduce all rows and columns in the resulting matrix except for rows and columns containing only ∞. Let the resulting matrix be B. Steps (1) and (2) are valid as no tour in the subtree S can contain edges of the type (i, k) or (k, j) or (j, 1) (except for edge (i, j)). If r is the total amount subtracted in step (3) then ĉ(S) = ĉ(R) + A(i, j) + r. For leaf nodes, ĉ(·) = c(·) is easily computed as each leaf defines a unique tour. For the upper bound function u, we can use u(R) = ∞ for all nodes R.
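Continuing the sketch above (reduce_matrix, cost, reduced, root_c_hat, and INF as defined there), steps (1) through (3) and the child cost ĉ(S) can be written as follows; the check against node 4 of Figure 8.12 is our own.

    def child_cost(A, c_hat_R, i, j):
        # Child S reached from R by including edge (i, j); i and j are
        # 1-based to match the text.  Returns (B, c^(S)).
        n = len(A)
        B = [row[:] for row in A]
        edge = B[i - 1][j - 1]
        for k in range(n):
            B[i - 1][k] = INF        # (1) no more edges leaving vertex i
            B[k][j - 1] = INF        #     or entering vertex j
        B[j - 1][0] = INF            # (2) forbid edge (j, 1)
        B, r = reduce_matrix(B)      # (3) reduce what is left
        return B, c_hat_R + edge + r

    B4, c4 = child_cost(reduced, root_c_hat, 1, 4)
    print(c4)   # 25: the c^ value of node 4 (path 1,4) in Figure 8.12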

(a) Cost matrix                    (b) Reduced cost matrix, L = 25

     ∞  20  30  10  11                  ∞  10  17   0   1
    15   ∞  16   4   2                 12   ∞  11   2   0
     3   5   ∞   2   4                  0   3   ∞   0   2
    19   6  18   ∞   3                 15   3  12   ∞   0
    16   4   7  16   ∞                 11   0   0  12   ∞

Figure 8.11 An example

Let us now trace the progressof the LCBBalgorithmon the probleminstanceof Figure8.11(a).We use c and u as above. The initial reducedmatrixis that of Figure8.11(b)and upper = oo.Theportionof the statespacetree that getsgeneratedis shown in Figure8.12.Starting with therootnodeas the _E-node,nodes2,3, 4, and 5 aregenerated(in that order).Thereducedmatricescorrespondingto thesenodesareshownin Figure8.13.Thematrixof Figure8.13(b)is obtainedfrom that of 8.11(b)by (1)settingallentriesin row 1and column3 to oo,(2) settingthe elementat position(3, 1) to oo,and (3)reducingcolumn1by subtractingby 11.Thec for node3 is therefore25 + 17(the costof edge(1,3)in the reducedmatrix) + 11= 53.Thematricesand c value for nodes2, 4, and 5 areobtainedsimilarly.Thevalue of upper is unchangedand node4 becomesthe next _E-node.Itschildren6,7, and 8 aregenerated.Thelive nodesat this timearenodes2,3, 5, 6,7, and 8. Node 6 has leastc value and becomesthe next _E-node.Nodes9 and 10aregenerated.Node 10is the next _E-node.Thesolutionnode,node11,is generated.Thetour lengthfor this nodeis c(ll)= 28 andupper is updatedto 28.Forthe next i?-node,node5, c(5) = 31> upper.Hence,LCBBterminateswith 1,4, 2,5,3,1as the shortestlengthtour.

An exercise examines the implementation considerations for the LCBB algorithm. A different LCBB algorithm can be arrived at by considering


Numbers outside the nodes are c values

Figure 8.12 State space tree generated by procedure LCBB


a different tree organization for the solution space. This organization is reached by regarding a tour as a collection of n edges. If G = (V, E) has e edges, then every tour contains exactly n of the e edges. However, for each i, 1 ≤ i ≤ n, there is exactly one edge of the form (i,j) and one of the form (k,i) in every tour. A possible organization for the state space is a binary tree in which a left branch represents the inclusion of a particular edge while the right branch represents the exclusion of that edge. Figures 8.14(b) and (c) represent the first two levels of two possible state space trees for the three-vertex graph of Figure 8.14(a). As is true of all problems, many state space trees are possible for a given problem formulation. Different trees differ in the order in which decisions are made. Thus, in Figure 8.14(c) we first decide the fate of edge (1,2). Rather than use a static state space tree, we now consider a dynamic state space tree (see Section 7.1). This is also a binary tree. However, the order in which edges are considered depends on the particular problem instance being solved. We compute c in the same way as we did using the earlier state space tree formulation.

Figure 8.13 Reduced cost matrices corresponding to nodes in Figure 8.12
(a) Path 1,2; node 2     (b) Path 1,3; node 3     (c) Path 1,4; node 4
(d) Path 1,5; node 5     (e) Path 1,4,2; node 6   (f) Path 1,4,3; node 7
(g) Path 1,4,5; node 8   (h) Path 1,4,2,3; node 9 (i) Path 1,4,2,5; node 10
[The nine 5 × 5 reduced matrices of the figure are not reproduced here.]

As an example of how LCBB would work on the dynamic binary tree formulation, consider the cost matrix of Figure 8.11(a). Since a total of 25


Figure 8.14 An example
(a) Graph    (b) Part of a state space tree    (c) Part of a state space tree
[The tree edges carry labels such as include <1,2>, exclude <1,2>, include <3,1>, and exclude <2,3>.]

needs to be subtracted from the rows and columns of this matrix to obtain the reduced matrix of Figure 8.11(b), all tours have a length at least 25. This fact is represented by the root of the state space tree of Figure 8.15. Now, we must decide which edge to use to partition the solution space into two subsets. If edge (i,j) is used, then the left subtree of the root represents all tours including edge (i,j) and the right subtree represents all tours that do not include edge (i,j). If an optimal tour is included in the left subtree, then only n − 1 edges remain to be selected. If all optimal tours lie in the right subtree, then we have still to select n edges. Since the left subtree selects fewer edges, it should be easier to find an optimal solution in it than to find one in the right subtree. Consequently, we would like to choose as the partitioning edge an edge (i,j) that has the highest probability of being


Figure 8.15 State space tree for Figure 8.11(a)
[The tree edges carry labels such as include <1,4> and exclude <3,1>.]

in an optimal tour. Several heuristics for determining such an edge can be formulated. A selection rule that is commonly used is to select that edge which results in a right subtree that has the highest c value. The logic behind this is that we soon have right subtrees (perhaps at lower levels) for which the c value is higher than the length of an optimal tour. Another possibility is to choose an edge such that the difference in the c values for the left and right subtrees is maximum. Other selection rules are also possible.

When LCBB is used with the first of the two selection rules stated above and the cost matrix of Figure 8.11(a), the tree of Figure 8.15 is generated. At the root node, we have to determine an edge (i,j) that will maximize the c value of the right subtree. If we select an edge (i,j) whose cost in the reduced matrix (Figure 8.11(b)) is positive, then the c value of the right subtree will remain 25. This is so as the reduced matrix for the right subtree will have B(i,j) = ∞ and all other entries will be identical to those in Figure 8.11(b). Hence B will be reduced and c cannot increase. So, we must choose an edge with reduced cost 0. If we choose (1,4), then B(1,4) = ∞ and we need to subtract 1 from row 1 to obtain a reduced matrix. In this case c will be 26. If (3,1) is selected, then 11 needs to be subtracted from column 1 to obtain the reduced matrix for the right subtree. So, c will be 36. If A is the reduced cost matrix for node R, then the selection of edge (i,j) (A(i,j) = 0) as the next partitioning edge will increase the c of the


Figure 8.16 Reduced cost matrices for Figure 8.15
(a) Node 2    (b) Node 3    (c) Node 4
(d) Node 5    (e) Node 6    (f) Node 7
[The six 5 × 5 reduced matrices of the figure are not reproduced here.]

right subtree by Δ = min_{k≠j}{A(i,k)} + min_{k≠i}{A(k,j)}, as this much needs to be subtracted from row i and column j to introduce a zero into both. For edges (1,4), (2,5), (3,1), (3,4), (4,5), (5,2), and (5,3), Δ = 1, 2, 11, 0, 3, 3, and 11 respectively. So, either of the edges (3,1) or (5,3) can be used. Let us assume that LCBB selects edge (3,1). The c(2) (Figure 8.15) can be computed in a manner similar to that for the state space tree of Figure 8.12. In the corresponding reduced cost matrix all entries in row 3 and column 1 will be ∞. Moreover the entry (1,3) will also be ∞ as inclusion of this edge will result in a cycle. The reduced matrices corresponding to nodes 2 and 3 are given in Figures 8.16(a) and (b). The c values for nodes 2 and 3 (as well as for all other nodes) appear outside the respective nodes.
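The selection rule itself is equally short in code; a small Python sketch (the names are ours), operating on a reduced matrix that uses a floating-point infinity for ∞:

    def best_partition_edge(a):
        # Among the zero entries of reduced matrix a, return the edge (i, j)
        # (1-based) that maximizes Delta = (min of row i excluding column j)
        #                                + (min of column j excluding row i).
        n = len(a)
        best, best_delta = None, -1
        for i in range(n):
            for j in range(n):
                if a[i][j] == 0:
                    row_min = min(a[i][k] for k in range(n) if k != j)
                    col_min = min(a[k][j] for k in range(n) if k != i)
                    if row_min + col_min > best_delta:
                        best, best_delta = (i + 1, j + 1), row_min + col_min
        return best, best_delta

On the reduced matrix of Figure 8.11(b) this returns Δ = 11, attained first at edge (3,1), matching the hand computation above.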

Node 2 is the next E-node. For edges (1,4), (2,5), (4,5), (5,2), and (5,3), Δ = 3, 2, 3, 3, and 11 respectively. Edge (5,3) is selected and nodes 4 and 5 generated. The corresponding reduced matrices are given in Figures 8.16(c) and (d). Then c(4) becomes 28 as we need to subtract 3 from column 2 to reduce this column. Note that entry (1,5) has been set to ∞ in Figure 8.16(c). This is necessary as the inclusion of edge (1,5) to the collection {(3,1), (5,3)} will result in a cycle. In addition, entries in column 3 and row 5 are set to ∞. Node 4 is the next E-node. The Δ values corresponding to edges (1,4), (2,5), and (4,2) are 9, 2, and 0 respectively. Edge (1,4) is selected and nodes 6 and 7 generated. The edge selection at node 6 is {(3,1), (5,3), (1,4)}. This corresponds to the path 5, 3, 1, 4. So, entry (4,5) is set to ∞ in Figure 8.16(e). In general, if edge (i,j) is selected, then the entries in row i and column j are set to ∞ in the left subtree. In addition, one more entry needs to be set to ∞. This is an entry whose inclusion in the set of edges would create a cycle (Exercise 4 examines how to determine this). The next E-node is node 6. At this time three of the five edges have already been selected. The remaining two may be selected directly. The only possibility is {(4,2), (2,5)}. This gives the path 5, 3, 1, 4, 2, 5 with length 28. So upper is updated to 28. Node 3 is the next E-node. Now LCBB terminates as c(3) = 36 > upper.

In the preceding example, LCBB was modified slightly to handle nodes close to a solution node differently from other nodes. Node 6 is only two levels from a solution node. Rather than evaluate c at the children of 6 and then obtain their grandchildren, we just obtained an optimal solution for that subtree by a complete search with no bounding. We could have done something similar when generating the tree of Figure 8.12. Since node 6 is only two levels from the leaf nodes, we can simply skip computing c for the children and grandchildren of 6, generate all of them, and pick the best. This works out to be quite efficient as it is easier to generate a subtree with a small number of nodes and evaluate all the solution nodes in it than it is to compute c for one of the children of 6. This latter statement is true of many applications of branch-and-bound. Branch-and-bound is used on large subtrees. Once a small subtree is reached (say one with 4 or 6 nodes in it), then that subtree is fully evaluated without using the bounding functions.

We have now seen several branch-and-bound strategies for the traveling salesperson problem. It is not possible to determine analytically which of these is the best. The exercises describe computer experiments that determine empirically the relative performance of the strategies suggested.

EXERCISES

1. Consider the traveling salesperson instance defined by the cost matrix

     ∞    3    5    9   18
     7    ∞    8    3   14
     3    6    ∞    5    9
    12   14    6    ∞    8
     8    9   18   11    ∞

(a) Obtain the reduced cost matrix.

(b) Using a state space tree formulation similar to that of Figure 8.10 and c as described in Section 8.3, obtain the portion of the state space tree that will be generated by LCBB. Label each node by its c value. Write out the reduced matrices corresponding to each of these nodes.

(c) Do part (b) using the reduced matrix method and the dynamic state space tree approach discussed in Section 8.3.


2. Do Exercise 1 using the following traveling salesperson cost matrix:

     ∞    8    8   11    6
    11    ∞    4   10    9
    10    7    ∞    5    5
     9    3    4    ∞    5
     6    4    8    5    ∞

3. (a) Describe an efficient implementation for an LCBB traveling salesperson algorithm using the reduced cost matrix approach and (i) a dynamic state space tree and (ii) a static tree as in Figure 8.10.

(b) Are there any problem instances for which the LCBB will generate fewer nodes using a static tree than using a dynamic tree? Prove your answer.

4. Consider the LCBB traveling salesperson algorithm described using the dynamic state space tree formulation. Let A and B be nodes. Let B be a child of A. If the edge (A, B) represents the inclusion of edge (i,j) in the tour, then in the reduced matrix for B all entries in row i and column j are set to ∞. In addition, one more entry is set to ∞. Obtain an efficient way to determine this entry.

5. [Programming project] Write computer programs for the following traveling salesperson algorithms:

(a) The dynamic programming algorithm of Chapter 5
(b) A backtracking algorithm using the static tree formulation of Section 8.3
(c) A backtracking algorithm using the dynamic tree formulation of Section 8.3
(d) An LCBB algorithm corresponding to (b)
(e) An LCBB algorithm corresponding to (c)

Design data sets to be used to compare the efficiency of the above algorithms. Randomly generate problem instances from these data sets and obtain computing times for your programs. What conclusions can you draw from your computing times?

8.4 EFFICIENCY CONSIDERATIONS

One can pose several questions concerning the performance characteristics of branch-and-bound algorithms that find least-cost answer nodes. We might ask questions such as:


1. Does the use of a better starting value for upper always decrease the number of nodes generated?

2. Is it possible to decrease the number of nodes generated by expanding some nodes with c(·) > upper?

3. Does the use of a better c always result in a decrease in (or at least not an increase in) the number of nodes generated? (A c2 is better than c1 iff c1(x) ≤ c2(x) ≤ c(x) for all nodes x.)

4. Does the use of dominance relations ever result in the generation of more nodes than would otherwise be generated?

In this section we answer these questions. Although the answers to most of the questions examined agree with our intuition, the answers to others are contrary to intuition. However, even in cases in which the answer does not agree with intuition, we can expect the performance of the algorithm to generally agree with the intuitive expectations. All the following theorems assume that the branch-and-bound algorithm is to find a minimum-cost solution node. Consequently, c(x) = cost of the minimum-cost solution node in subtree x.

Theorem 8.2 Let t be a state space tree. The number of nodes of t generated by FIFO, LIFO, and LC branch-and-bound algorithms cannot be decreased by the expansion of any node x with c(x) > upper, where upper is the current upper bound on the cost of a minimum-cost solution node in the tree t.

Proof: The theorem follows from the observation that the value of upper cannot be decreased by expanding x (as c(x) > upper). Hence, such an expansion cannot affect the operation of the algorithm on the remainder of the tree. □

Theorem 8.3 Let U1 and U2, U1 < U2, be two initial upper bounds on the cost of a minimum-cost solution node in the state space tree t. Then FIFO, LIFO, and LC branch-and-bound algorithms beginning with U1 will generate no more nodes than they would if they started with U2 as the initial upper bound.

Proof: Left as an exercise. □

Theorem 8.4 The use of a better c function in conjunction with FIFO and LIFO branch-and-bound algorithms does not increase the number of nodes generated.


Proof: Left as an exercise. □

Theorem 8.5 If a better c function is used in an LC branch-and-bound algorithm, the number of nodes generated may increase.

Proof: Consider the state space tree of Figure 8.17. All leaf nodes are solution nodes. The value outside each leaf is its cost. From these values it follows that c(1) = c(3) = 3 and c(2) = 4. Outside each of nodes 1, 2, and 3 is a pair of numbers (c1, c2). Clearly, c2 is a better function than c1. However, if c2 is used, node 2 can become the E-node before node 3, as c2(2) = c2(3). In this case all nine nodes of the tree will get generated. When c1 is used, nodes 4, 5, and 6 are not generated. □

Figure 8.17 Example tree for Theorem 8.5

Now, let us look at the effect of dominance relations. Formally, a dominance relation D is given by a set of tuples, D = {(i1, i2), (i3, i4), (i5, i6), ...}. If (i,j) ∈ D, then node i is said to dominate node j. By this we mean that subtree i contains a solution node with cost no more than the cost of a minimum-cost solution node in subtree j. Dominated nodes can be killed without expansion.

Since every node dominates itself, (i,i) ∈ D for all i and D. The relation (i,i) should not result in the killing of node i. In addition, it is quite possible for D to contain tuples (i1, i2), (i2, i3), (i3, i4), ..., (in, i1). In this case, the transitivity of D implies that each node ik dominates all nodes ij, 1 ≤ j ≤ n. Care should be taken to leave at least one of the ij's alive. A dominance relation D2 is said to be stronger than another dominance relation D1 iff D1 ⊂ D2. In the following theorems I denotes the identity relation {(i,i) | 1 ≤ i ≤ n}.


Theorem 8.6 The number of nodes generated during a FIFO or LIFO branch-and-bound search for a least-cost solution node may increase when a stronger dominance relation is used.

Proof: Consider the state space tree of Figure 8.18. The only solution nodes are leaf nodes. Their cost is written outside the node. For the remaining nodes the number outside each node is its c value. The two dominance relations to use are D1 = I and D2 = I ∪ {(5,2), (5,8)}. Clearly, D2 is stronger than D1 and fewer nodes are generated using D1 rather than D2. □

Figure 8.18 Example tree for Theorem 8.6

Theorem 8.7 Let D1 and D2 be two dominance relations. Let D2 be stronger than D1 and such that (i,j) ∈ D2, i ≠ j, implies c(i) < c(j). An LC branch-and-bound using D1 generates at least as many nodes as one using D2.

Proof: Left as an exercise. □

Theorem 8.8 If the condition c(i) < c(j) in Theorem 8.7 is removed, then an LC branch-and-bound using the relation D1 may generate fewer nodes than one using D2.

Proof: Left as an exercise. □


EXERCISES

1. Prove Theorem 8.3.

2. Prove Theorem 8.4.

3. Prove Theorem 8.7.

4. Prove Theorem 8.8.

5. [Heuristic search] Heuristic search is a generalization of FIFO, LIFO, and LC searches. A heuristic function h(·) is used to evaluate all live nodes. The next E-node is the live node with least h(·). Discuss the advantages of using a heuristic function h(·) different from c(·) in the search for a least-cost answer node. Consider the knapsack and traveling salesperson problems as two example problems. Also consider any other problems you wish. For these problems devise reasonable functions h(·) (different from c(·)). Obtain problem instances on which heuristic search performs better than LC-search.

8.5 REFERENCES AND READINGS

LC branch-and-bound algorithms have been extensively studied by researchers in areas such as artificial intelligence and operations research.

Branch-and-bound algorithms using dominance relations in a manner similar to that suggested by FIFOKNAP (resulting in DKnap1) were given by M. Held and R. Karp.

The reduction technique for the knapsack problem is due to G. Ingargiola and J. Korsh.

The reduced matrix technique to compute c is due to J. Little, K. Murty, D. Sweeny, and C. Karel. They employed the dynamic state space tree approach.

The results of Section 8.4 are based on the work of W. Kohler, K. Steiglitz, and T. Ibaraki.

The application of branch-and-bound and other techniques to the knapsack and related problems is discussed extensively in Knapsack Problems: Algorithms and Computer Implementations, by S. Martello and P. Toth, John Wiley and Sons, 1990.


Chapter 9

ALGEBRAIC PROBLEMS

9.1 THE GENERAL METHOD

In this chapter we shift our attention away from the problems we've dealt with previously to concentrate on methods for dealing with numbers and polynomials. Though computers have the ability already built in to manipulate integers and reals, they are not directly equipped to manipulate symbolic mathematical expressions such as polynomials. One must determine a way to represent them and then write procedures that perform the desired operations. A system that allows for the manipulation of mathematical expressions (usually including arbitrary precision integers, polynomials, and rational functions) is called a mathematical symbol manipulation system. These systems have been fruitfully used to solve a variety of scientific problems for many years. The techniques we study here have often led to efficient ways to implement the operations offered by these systems.

The first design technique we present is called algebraic transformation. Assume we have an input I that is a member of set S1 and a function f(I) that describes what must be computed. Usually the output f(I) is also a member of S1. Though a method may exist for computing f(I) using operations on elements in S1, this method may be inefficient. The algebraic transformation technique suggests that we alter the input into another form to produce a member of set S2. The set S2 contains exactly the same elements as S1 except it assumes a different representation for them. Why would we transform the input into another form? Because it may be easier to compute the function f for elements of S2 than for elements of S1. Once the answer in S2 is computed, an inverse transformation is performed to yield the result in set S1.

Example 9.1 Let S1 be the set of integers represented using decimal notation, and S2 the set of integers using binary notation. Given two integers from set S1, plus any arithmetic operations to carry out on these numbers, today's computers can transform the numbers into elements of set S2, perform the operations, and transform the result back into decimal form. The algorithms for transforming the numbers are familiar to most students of computer science. To go from elements of set S1 to set S2, repeated division by 2 is used, and from set S2 to set S1, repeated multiplication is used. The value of binary representation is the simplification that results in the internal circuitry of a computer. □

Example 9.2 Let S1 be the set of n-degree polynomials (n ≥ 0) with integer coefficients represented by a list of their coefficients; e.g.,

A(x) = a_n x^n + ··· + a_1 x + a_0

The set S2 consists of exactly the same set of polynomials but is represented by their values at 2n + 1 points; that is, the 2n + 1 pairs (x_i, A(x_i)), 1 ≤ i ≤ 2n + 1, would represent the polynomial A. (At this stage we won't worry about what the values of x_i are, but for now you can consider them consecutive integers.) The function f to be computed is the one that determines the product of two polynomials A(x) and B(x), assuming the set S1 representation to start with. Rather than forming the product directly using the conventional method (which requires O(n²) operations, where n is the degree of A and B and any possible growth in the size of the coefficients is ignored), we could transform the two polynomials into elements of the set S2. We do this by evaluating A(x) and B(x) at 2n + 1 points. The product can now be computed simply, by multiplying the corresponding points together. The representation of A(x)B(x) in set S2 is given by the tuples (x_i, A(x_i)B(x_i)), 1 ≤ i ≤ 2n + 1, and requires only O(n) operations to compute. We can determine the product A(x)B(x) in coefficient form by finding the polynomial that interpolates (or satisfies) these 2n + 1 points. It is easy to show that there is a unique polynomial of degree ≤ 2n that goes through 2n + 1 points.

Figure 9.1 describes these transformations in a graphical form indicating the two paths one can take to reach the coefficient product domain, either directly by conventional multiplication or indirectly by algebraic transformation. The transformation in one direction is effected by evaluation whereas the inverse transformation is accomplished by interpolation. The value of the scheme rests entirely on whether these transformations can be carried out efficiently.

For instance, if A(x) = 3x² + 4x + 1 and B(x) = x² + 2x + 5, these can be represented by the pairs (0,1), (1,8), (2,21), (3,40), and (4,65) and (0,5), (1,8), (2,13), (3,20), and (4,29), respectively. Then A(x)B(x) in S2 takes the form (0,5), (1,64), (2,273), (3,800), and (4,1885). □
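The whole transformation of Example 9.2 fits in a few lines of Python. The sketch below is ours and illustrative only: it uses Horner's rule for evaluation and a direct Lagrange-style interpolation, and it reproduces the numbers of the example.

    def evaluate(coeffs, x):
        # coeffs = [a0, a1, ..., an], low-order coefficient first; Horner's rule.
        s = 0
        for a in reversed(coeffs):
            s = s * x + a
        return s

    def times_linear(p, c):
        # Multiply the polynomial p (coefficient list) by (x - c).
        res = [0.0] * (len(p) + 1)
        for k, a in enumerate(p):
            res[k + 1] += a
            res[k] -= a * c
        return res

    def interpolate(points):
        # Coefficients of the unique polynomial through the given points.
        n = len(points)
        total = [0.0] * n
        for i, (xi, yi) in enumerate(points):
            basis, denom = [1.0], 1.0
            for j, (xj, _) in enumerate(points):
                if j != i:
                    basis = times_linear(basis, xj)
                    denom *= xi - xj
            for k, a in enumerate(basis):
                total[k] += a * yi / denom
        return total

    A = [1, 4, 3]                       # 3x^2 + 4x + 1
    B = [5, 2, 1]                       # x^2 + 2x + 5
    pts = [(x, evaluate(A, x) * evaluate(B, x)) for x in range(5)]
    print(pts)                          # (0,5), (1,64), (2,273), (3,800), (4,1885)
    print(interpolate(pts))             # [5.0, 22.0, 24.0, 10.0, 3.0]

The last list is the product 3x⁴ + 10x³ + 24x² + 22x + 5, read with the constant term first.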

The world of algebraic algorithms is so broad that we only attempt to cover a few of the interesting topics. In Section 9.2 we discuss the question


Figure 9.1 Transformation technique for polynomial products
[The figure shows two paths from S1 (coefficients) to the coefficient form of the product: directly by conventional multiplication, or by evaluation into S2 (points), pairwise multiplication of the point values, and interpolation back.]

of polynomial evaluation at one or more points and the inverse operation of polynomial interpolation at n points. Then in Section 9.3 we discuss the same problems as in Section 9.2 but this time assuming the n points are nth roots of unity. This is shown to be equivalent to computing the Fourier transform. We also show how the divide-and-conquer strategy leads to the fast Fourier transform algorithm. In Section 9.4 we shift our attention to integer problems, in this case the processes of modular arithmetic. Modular arithmetic can be viewed as a transformation scheme that is useful for speeding up large precision integer arithmetic operations. Moreover we see that transformation into and out of modular form is a special case of evaluation and interpolation. Thus there is an algebraic unity to Sections 9.2, 9.3, and 9.4. Finally, in Section 9.5 we present asymptotically efficient algorithms for n-point evaluation and interpolation.

EXERCISES

1. Devise an algorithm that accepts a number in decimal and produces the equivalent number in binary.

2. Devise an algorithm that performs the inverse transformation of Exercise 1.


3. Show the tuples that would result by representing the polynomials 5x² + 3x + 10 and 7x + 4 at the values x = 0, 1, 2, 3, 4, 5, and 6. What set of tuples is sufficient to represent the product of these two polynomials?

4. If A(x) = a_n x^n + ··· + a_1 x + a_0, then the derivative of A(x) is A'(x) = n a_n x^{n−1} + ··· + a_1. Devise an algorithm that produces the value of a polynomial and its derivative at a point x = v. Determine the number of required arithmetic operations.

9.2 EVALUATION AND INTERPOLATION

In this section we examine the operations on polynomials of evaluation and interpolation. As we search for efficient algorithms, we see examples of another design strategy called algebraic simplification. When applied to algebraic problems, algebraic simplification refers to the process of reexpressing computational formulas so that the required number of operations to compute these formulas is minimized. One issue we ignore here is the numerical stability of the resulting algorithms. Though this is often an important consideration, it is too far from our purposes.

A univariate polynomial is generally written as

A(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0

where x is an indeterminate and the a_i may be integers, floating point numbers, or more generally elements of a commutative ring or a field. If a_n ≠ 0, then n is called the degree of A.

When considering the representation of a polynomial by its coefficients, there are at least two alternatives. The first calls for storing the degree followed by degree + 1 coefficients:

(n, a_n, a_{n−1}, ..., a_1, a_0)

This is termed the dense representation because it explicitly stores all coefficients whether or not they are zero. We observe that for a polynomial such as x^1000 + 1 the dense representation is wasteful since it requires 1002 locations although there are only two nonzero terms.

The second representation calls for storing only each nonzero coefficient and its corresponding exponent; for example, if all the a_i are nonzero, then the polynomial is stored as

(n, a_n, n−1, a_{n−1}, ..., 1, a_1, 0, a_0). This is termed the sparse representation because the storage depends directly on the number of nonzero terms and not on the degree. For a polynomial


Algorithm StraightEval(A, n, v)
{
    r := 1; s := a_0;
    for i := 1 to n do
    {
        r := r * v;
        s := s + a_i * r;
    }
    return s;
}

Algorithm 9.1 Straightforward evaluation

of degree n, all of whose coefficients are nonzero, this second representation requires roughly twice the storage of the first. However, that is the worst case. For high-degree polynomials with few nonzero terms, the second representation is many times better than the first.

Secondarily we note that the terms of a polynomial will often be linked together rather than sequentially stored. However, we will avoid this complication in the following algorithms and assume that we can access the ith coefficient by writing a_i.

Suppose we are given the polynomial A(x) = a_n x^n + ··· + a_0 and we wish to evaluate it at a point v, that is, compute A(v). The straightforward or right-to-left method adds a_1 v to a_0 and a_2 v² to this sum and continues as described in Algorithm 9.1. The analysis of this algorithm is quite simple: 2n multiplications, n additions, and 2n + 2 assignments are made (excluding the for loop).

An improvement to this procedure was devised by Isaac Newton in 1711. The same improvement was used by W. G. Horner in 1819 to evaluate the coefficients of A(x + c). The method came to be known as Horner's rule. They rewrote the polynomial as

A(x) = (···((a_n x + a_{n−1}) x + a_{n−2}) x + ··· + a_1) x + a_0

This is our first and perhaps most famous example of algebraic simplification. The function for evaluation that is based on this formula is given in Algorithm 9.2. Horner's rule requires n multiplications, n additions, and n + 1 assignments (excluding the for loop). Thus we see that it is an improvement over the straightforward method by a factor of 2. In fact in Chapter 10


Algorithm Horner(A, n, v)
{
    s := a_n;
    for i := n − 1 to 0 step −1 do
    {
        s := s * v + a_i;
    }
    return s;
}

Algorithm 9.2 Horner's rule

we see that Horner's rule yields the optimal way to evaluate an nth-degree polynomial.

Now suppose we consider the sparse representation of a polynomial, A(x) = a_m x^{e_m} + ··· + a_1 x^{e_1}, where the a_i ≠ 0 and e_m > e_{m−1} > ··· > e_1 ≥ 0. The straightforward algorithm (Algorithm 9.1), when generalized to this sparse case, is given in Algorithm 9.3.

Algorithm SStraightEval(A, m, v)
// Sparse straightforward evaluation.
// m is the number of nonzero terms.
{
    s := 0;
    for i := 1 to m do
    {
        s := s + a_i * Power(v, e_i);
    }
    return s;
}

Algorithm 9.3 Sparse evaluation

Power(v, e) returns v^e. Assuming that v^e is computed by repeated multiplication with v, this operation requires e − 1 multiplications and Algorithm 9.3 requires e_m + e_{m−1} + ··· + e_1 multiplications, m additions, and m + 1 assignments. This is horribly inefficient and can easily be improved


Algorithm NStraightEval(A, m, v)
{
    s := 0; e_0 := 0; t := 1;
    for i := 1 to m do
    {
        r := Power(v, e_i − e_{i−1});
        t := t * r;
        s := s + a_i * t;
    }
    return s;
}

Algorithm 9.4 Evaluating a polynomial represented in coefficient-exponent form

by an algorithm based on computing

v^{e_1};  v^{e_2 − e_1} · v^{e_1};  v^{e_3 − e_2} · v^{e_2};  ...

Algorithm 9.4 requires e_m + m multiplications, 3m + 3 assignments, m additions, and m subtractions.

A more clever scheme is to generalize Horner's strategy in the revised formula

A(x) = (···((a_m x^{e_m − e_{m−1}} + a_{m−1}) x^{e_{m−1} − e_{m−2}} + ··· + a_2) x^{e_2 − e_1} + a_1) x^{e_1}

The function of Algorithm 9.5 is based on this formula. The number of multiplications required is

(e_m − e_{m−1} − 1) + ··· + (e_1 − e_0 − 1) + m = e_m

which is the degree of A. In addition there are m additions, m subtractions, and m + 2 assignments. Thus we see that Horner's rule is easily adapted to either the sparse or the dense polynomial model and in both cases the number of operations is bounded and linear in the degree. With a little more work one can find an even better method, assuming a sparse representation, which requires only m + log₂ e_m multiplications. (See the exercises for a hint.)
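Both the dense and the sparse forms of Horner's rule are easy to try out; the following Python sketch (our own naming, not the text's pseudocode) checks them against each other on x^1000 + 1:

    def horner(coeffs, v):
        # Dense form: coeffs = [a0, a1, ..., an].
        s = 0
        for a in reversed(coeffs):
            s = s * v + a
        return s

    def sparse_horner(terms, v):
        # Sparse form: terms = [(e1, a1), ..., (em, am)], exponents increasing.
        s, prev_e = 0, None
        for e, a in reversed(terms):        # highest exponent first
            if prev_e is not None:
                s = s * v ** (prev_e - e)
            s = s + a
            prev_e = e
        return s * v ** prev_e              # final factor v^{e1}

    dense = [1] + [0] * 999 + [1]           # x^1000 + 1
    sparse = [(0, 1), (1000, 1)]
    assert horner(dense, 2) == sparse_horner(sparse, 2) == 2 ** 1000 + 1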

Given n points (x_i, y_i), the interpolation problem is to find the coefficients of the unique polynomial A(x) of degree ≤ n − 1 that goes through these n points. Mathematically the answer to this problem was given by Lagrange:


Algorithm SHorner(A, m, v)
{
    s := 0; e_0 := 0;
    for i := m to 1 step −1 do
    {
        s := (s + a_i) * Power(v, e_i − e_{i−1});
    }
    return s;
}

Algorithm 9.5 Horner's rule for a sparse representation

A(x) = Σ_{1≤i≤n} ( Π_{1≤j≤n, j≠i} (x − x_j)/(x_i − x_j) ) y_i                    (9.1)

To verify that A(x) does satisfy the n points, we observe that

A(x_i) = ( Π_{j≠i} (x_i − x_j)/(x_i − x_j) ) y_i = y_i                            (9.2)

since every other term becomes zero. The numerator of each term is a product of n − 1 factors and hence the degree of A is ≤ n − 1.

Example 9.3 Consider the input (0,5), (1,10), and (2,21). Using Equation 9.1, we get

A(x) = [(x−1)(x−2) / ((0−1)(0−2))]·5 + [(x−0)(x−2) / ((1−0)(1−2))]·10 + [(x−0)(x−1) / ((2−0)(2−1))]·21
     = (5/2)(x² − 3x + 2) − 10(x² − 2x) + (21/2)(x² − x)
     = 3x² + 2x + 5

We can verify that A(0) = 5, A(1) = 10, and A(2) = 21. □

We now give an algorithm (Algorithm 9.6) that produces the coefficients of A(x) using Equation 9.1. We need to perform some addition and multiplication of polynomials. So we assume that the operators +, −, *, and / have been overloaded to take polynomials as operands.

Algorithm Lagrange(X, Y, n, A)
// X and Y are one-dimensional arrays containing
// n points (x_i, y_i), 1 ≤ i ≤ n. A is a
// polynomial that interpolates these points.
{
    // poly is a polynomial.
    A := 0;
    for i := 1 to n do
    {
        poly := 1; denom := 1;
        for j := 1 to n do
            if (i ≠ j) then
            {
                poly := poly * (x − X[j]);
                // x − X[j] is a degree one polynomial in x.
                denom := denom * (X[i] − X[j]);
            }
        A := A + (poly * Y[i] / denom);
    }
}

Algorithm 9.6 Lagrange interpolation

An analysis of the computing time of Lagrange is instructive. The if statement is executed n² times. The time to compute each new value of denom is one subtraction and one multiplication, but the execution of * (as applied to polynomials) requires more than constant time per call. Since the degree of x − X[j] is one, the time for one execution of * is proportional to the degree of poly, which is at most j − 1 on the jth iteration.

Therefore the total cost of the polynomial multiplication step is

Σ_{1≤i≤n} Σ_{1≤j≤n} (j − 1) = Σ_{1≤i≤n} (n(n+1)/2 − n) = n²(n+1)/2 − n²


Thus

Σ_{1≤i≤n} Σ_{1≤j≤n} O(j − 1) = O(n³)                                              (9.3)

This result is discouraging because it is so high. Perhaps we should search for a better method. Suppose we already have an interpolating polynomial A(x) such that A(x_i) = y_i for 1 ≤ i ≤ n and we want to add just one more point (x_{n+1}, y_{n+1}). How would we compute this new interpolating polynomial given the fact that A(x) was already available? If we could solve this problem efficiently, then we could apply our solution n times to get an n-point interpolating polynomial.

Let G_{j−1}(x) interpolate the j − 1 points (x_k, y_k), 1 ≤ k < j, so that G_{j−1}(x_k) = y_k. Also let D_{j−1}(x) = (x − x_1)···(x − x_{j−1}). Then we can compute G_j(x) by the formula

G_j(x) = [y_j − G_{j−1}(x_j)] D_{j−1}(x)/D_{j−1}(x_j) + G_{j−1}(x)                 (9.4)

We observe that

G_j(x_k) = [y_j − G_{j−1}(x_j)] D_{j−1}(x_k)/D_{j−1}(x_j) + G_{j−1}(x_k)

but D_{j−1}(x_k) = 0 for 1 ≤ k < j. So

G_j(x_k) = G_{j−1}(x_k) = y_k

Also we observe that

G_j(x_j) = [y_j − G_{j−1}(x_j)] D_{j−1}(x_j)/D_{j−1}(x_j) + G_{j−1}(x_j) = y_j − G_{j−1}(x_j) + G_{j−1}(x_j) = y_j

Example 9.4 Consider again the input (0,5), (1,10), and (2,21). Here G_1(x) = 5 and D_1(x) = (x − x_1) = x.

G_2(x) = [y_2 − G_1(x_2)] D_1(x)/D_1(x_2) + G_1(x) = (10 − 5)(x/1) + 5 = 5x + 5


Also, D_2(x) = (x − x_1)(x − x_2) = (x − 0)(x − 1) = x² − x. Finally,

G_3(x) = [y_3 − G_2(x_3)] D_2(x)/D_2(x_3) + G_2(x)
       = [21 − 15] (x² − x)/2 + (5x + 5) = 3x² + 2x + 5                            □


Having verified that this formula is correct, we present an algorithm (Algorithm 9.7) for computing the interpolating polynomial that is based on Equation 9.4. Notice that from the equation, two applications of Horner's rule are required, one for evaluating G_{j−1}(x) at x_j and the other for evaluating D_{j−1}(x) at x_j.

Algorithm Interpolate(X, Y, n, G)
// Assume n ≥ 2. X[1 : n] and Y[1 : n] are the
// n pairs of points. The unique interpolating
// polynomial of degree < n is returned in G.
{
    // D is a polynomial.
    G := Y[1];       // G begins as a constant.
    D := x − X[1];   // D is a linear polynomial.
    for j := 2 to n do
    {
        denom := Horner(D, j − 1, X[j]); // Evaluate D at X[j].
        num := Horner(G, j − 2, X[j]);   // Evaluate G at X[j].
        G := G + (D * (Y[j] − num)/denom);
        D := D * (x − X[j]);
    }
}

Algorithm 9.7 Newtonian interpolation

On the jth iteration D has degree j − 1 and G has degree j − 2. Therefore the invocations of Horner require

Σ_{1≤j≤n−1} (j − 1 + j − 2) = n(n−1) − 3(n−1) = n² − 4n + 3                        (9.5)

multiplications in total. The term (Y[j] − num)/denom in Algorithm 9.7 is a constant. Multiplying this constant by D requires j multiplications and multiplying D by x − X[j] requires j multiplications. The addition with G requires zero multiplications. Thus the remaining steps require

Σ_{1≤j≤n−1} (2j) = n(n−1)                                                          (9.6)

operations, so the entire algorithm Interpolate requires O(n²) operations.

In conclusion we observe that for a dense polynomial of degree n, evaluation can be accomplished using O(n) operations or, for a sparse polynomial with m nonzero terms and degree n, evaluation can be done using at most O(m + n) = O(n) operations. Also, given n points, we can produce the interpolating polynomial in O(n²) time. In Chapter 10 we discuss the question of the optimality of Horner's rule for evaluation. Section 9.5 presents an even faster way to perform the interpolation of n points as well as the evaluation of a polynomial at n points.
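Equation 9.4 translates almost line for line into code. The Python sketch below is ours (polynomials are coefficient lists, low-order term first); it builds the interpolating polynomial incrementally and reproduces Example 9.4.

    def poly_eval(p, x):                    # Horner's rule on a coefficient list
        s = 0
        for a in reversed(p):
            s = s * x + a
        return s

    def times_linear(p, c):                 # multiply p by (x - c)
        res = [0.0] * (len(p) + 1)
        for k, a in enumerate(p):
            res[k + 1] += a
            res[k] -= a * c
        return res

    def newton_interpolate(xs, ys):
        g = [float(ys[0])]                  # G_1(x) = y_1
        d = [-float(xs[0]), 1.0]            # D_1(x) = x - x_1
        for j in range(1, len(xs)):
            scale = (ys[j] - poly_eval(g, xs[j])) / poly_eval(d, xs[j])
            g = g + [0.0] * (len(d) - len(g))
            g = [gi + scale * di for gi, di in zip(g, d)]   # Equation 9.4
            d = times_linear(d, xs[j])
        return g

    print(newton_interpolate([0, 1, 2], [5, 10, 21]))   # [5.0, 2.0, 3.0]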

EXERCISES

1. Devise a divide-and-conquer algorithm to evaluate a polynomial at a point. Analyze carefully the time for your algorithm. How does it compare to Horner's rule?

2. Present algorithms for overloading the operators + and * in the case of polynomials.

3. Assume that polynomials such as A(x) = a_n x^n + ··· + a_0 are represented using the dense form. Present an algorithm that overloads the operators + and = to perform the instruction r := s + t;, where r, s, and t are arbitrary polynomials.

4. Using the same assumptions as for Exercise 3, write an algorithm to perform r := s * t;.

5. Let A(x) = a_n x^n + ··· + a_0, p = ⌊n/2⌋ and q = ⌈n/2⌉. Then a variation of Horner's rule states that

A(x) = (···(a_{2p} x² + a_{2p−2}) x² + ···) x² + a_0
       + ((···(a_{2q−1} x² + a_{2q−3}) x² + ···) x² + a_1) x

Show how to use this formula to evaluate A(x) at x = v and x = −v.

6. Given the polynomial A(x) in Exercise 5, devise an algorithm that computes the coefficients of the polynomial A(x + c) for some constant c.


7. Suppose the polynomial A(x) has real coefficients but we wish to evaluate A at the complex number x = u + iv, u and v being real. Develop an algorithm to do this.

8. Suppose the polynomial A(x) = a_m x^{e_m} + ··· + a_1 x^{e_1}, where a_i ≠ 0 and e_m > e_{m−1} > ··· > e_1 ≥ 0, is represented using the sparse form. Write a function PAdd(r, s, t) that computes the sum of two such polynomials r and s and stores the result in t.

9. Using the same assumptions as in Exercise 8, write a function PMult(r, s, t) that computes the product of the polynomials r and s and places the result in t. What is the computing time of your function?

10. Determine the polynomial of smallest degree that interpolates the points (0,1), (1,2), and (2,3).

11. Given n points (x_i, y_i), 1 ≤ i ≤ n, devise an algorithm that computes both the interpolating polynomial A(x) and its derivative at the same time. How efficient is your algorithm?

12. Prove that the polynomial of degree ≤ n that interpolates n + 1 points is unique.

13. The binary method for exponentiation uses the binary expansion of the exponent n to determine when to square the temporary result and when to multiply it by x. Since there are ⌊log n⌋ + 1 bits in n, the algorithm requires O(log n) operations; this algorithm is an order of magnitude faster than iteration. The method appears as Algorithm 1.20. Show how to use the binary method to evaluate a sparse polynomial in time m + log e_m.

14. Suppose you are given the real and imaginary parts of two complex numbers. Show that the real and imaginary parts of their product can be computed using only three multiplications.

15. (a) Show that the polynomials ax + b and cx + d can be multiplied using only three scalar multiplications.

    (b) Employ the above algorithm to devise a divide-and-conquer algorithm to multiply two given nth degree polynomials in time Θ(n^{log₂ 3}).

16. The Fibonacci sequence is defined as f_0 = 0, f_1 = 1, and f_n = f_{n−1} + f_{n−2} for n ≥ 2. Give an O(log n) algorithm to compute f_n. (Hint:

    [ f_{n−1} ]   [ 0  1 ] [ f_{n−2} ]
    [ f_n     ] = [ 1  1 ] [ f_{n−1} ]   )


9.3 THE FAST FOURIER TRANSFORM

If one is able to devise an algorithm that is an order of magnitude faster than any previous method, that is a worthy accomplishment. When the improvement is for a process that has many applications, then that accomplishment has a significant impact on researchers and practitioners. This is the case of the fast Fourier transform. No algorithm improvement has had a greater impact in the recent past than this one. The Fourier transform is used by electrical engineers in a variety of ways including speech transmission, coding theory, and image processing. But before this fast algorithm was developed, the use of this transform was considered impractical.

The Fourier transform of a continuous function a(t) is given by

A(f) = ∫_{−∞}^{∞} a(t) e^{2πift} dt                                                (9.7)

whereas the inverse transform of A(f) is

a(t) = (1/2π) ∫_{−∞}^{∞} A(f) e^{−2πift} df                                        (9.8)

The i in the above two equations stands for the square root of −1. The constant e is the base of the natural logarithm. The variable t is often regarded as time, and f is taken to mean frequency. Then the Fourier transform is interpreted as taking a function of time into a function of frequency.

Corresponding to this continuous Fourier transform is the discrete Fourier transform which handles sample points of a(t), namely, a_0, a_1, ..., a_{N−1}. The discrete Fourier transform is defined by

A_j = Σ_{0≤k≤N−1} a_k e^{2πijk/N},   0 ≤ j ≤ N − 1                                 (9.9)

and the inverse is

a_k = (1/N) Σ_{0≤j≤N−1} A_j e^{−2πijk/N},   0 ≤ k ≤ N − 1                          (9.10)

In the discrete case a set of N sample points is given and a resulting set of N points is produced. An important fact to observe is the close connection between the discrete Fourier transform and polynomial evaluation. If we imagine the polynomial

a(x) = a_{N−1} x^{N−1} + a_{N−2} x^{N−2} + ··· + a_1 x + a_0


then the Fourier element A_j is the value of a(x) at x = w^j, where w = e^{2πi/N}. Similarly for the inverse Fourier transform, if we imagine the polynomial with the Fourier coefficients

A(x) = A_{N−1} x^{N−1} + A_{N−2} x^{N−2} + ··· + A_1 x + A_0

then each a_k is the value of A(x)/N at x = (w^{−1})^k, where w = e^{2πi/N}. Thus, the discrete Fourier transform corresponds exactly to the evaluation of a polynomial at N points: w^0, w^1, ..., w^{N−1}.

From the preceding section we know that we can evaluate an Nth-degree polynomial at N points using O(N²) operations. We apply Horner's rule once for each point. The fast Fourier transform (abbreviated FFT) is an algorithm for computing these N values using only O(N log N) operations. This algorithm was popularized by J. M. Cooley and J. W. Tukey in 1965, and the long history of this method was traced by J. M. Cooley, P. A. Lewis, and P. D. Welch.

A hint that the Fourier transform can be computed faster than by Horner's rule comes from observing that the evaluation points are not arbitrary but are in fact very special. They are the N powers w^j for 0 ≤ j ≤ N − 1, where w = e^{2πi/N}. The point w is a primitive Nth root of unity in the complex plane.

Definition 9.1 An element w in a commutative ring is called a primitive Nth root of unity if

1. w ≠ 1
2. w^N = 1
3. Σ_{0≤p≤N−1} w^{jp} = 0,  1 ≤ j ≤ N − 1                                          □

Example 9.5 Let N = 4. Then w = e^{πi/2} = cos(π/2) + i sin(π/2) = i. Thus, w ≠ 1, and w⁴ = i⁴ = 1. Also, Σ_{0≤p≤3} w^{jp} = 1 + i^j + i^{2j} + i^{3j} = 0 for 1 ≤ j ≤ 3. □

We now present two simple properties of Nth roots from which we can see how the FFT algorithm can easily be understood.

Theorem 9.1 Let N = 2n and suppose w is a primitive Nth root of unity. Then −w^j = w^{j+n}.

Proof: Here (w^{j+n})² = (w^j)²(w^n)² = (w^j)²(w^{2n}) = (w^j)² since w^{2n} = w^N = 1. Since the w^j are distinct, we know that w^j ≠ w^{j+n}, so we can conclude that w^{j+n} = −w^j. □


Theorem 9.2 Let N = 2n and w a primitive Nth root of unity. Then w² is a primitive nth root of unity.

Proof: Since w^N = w^{2n} = 1, (w²)^n = 1; this implies w² is an nth root of unity. In addition we observe that (w²)^j ≠ 1 for 1 ≤ j ≤ n − 1, since otherwise we would have w^k = 1 for some 1 ≤ k < 2n = N, which would contradict the fact that w is a primitive Nth root of unity. Therefore w² is a primitive nth root of unity. □

From this theorem we can conclude that if w^j, 0 ≤ j ≤ N − 1, are the powers of a primitive Nth root of unity and N = 2n, then w^{2j}, 0 ≤ j ≤ n − 1, are powers of a primitive nth root of unity. Using these two theorems, we are ready to show how to derive a divide-and-conquer algorithm for the Fourier transform. The complexity of the algorithm is O(N log N), an order of magnitude faster than the O(N²) of the conventional algorithm which uses polynomial evaluation.

Again let a_{N−1}, ..., a_0 be the coefficients to be transformed and let a(x) = a_{N−1} x^{N−1} + ··· + a_1 x + a_0. We break a(x) into two parts, one of which contains even-numbered exponents and the other odd-numbered exponents.

a(x) = (a_{N−1} x^{N−1} + a_{N−3} x^{N−3} + ··· + a_1 x)
       + (a_{N−2} x^{N−2} + ··· + a_2 x² + a_0)

Letting y = x², we can rewrite a(x) as a sum of two polynomials.

a(x) = (a_{N−1} y^{n−1} + a_{N−3} y^{n−2} + ··· + a_1) x + (a_{N−2} y^{n−1} + a_{N−4} y^{n−2} + ··· + a_0)
     = c(y) x + b(y)

Recall that the values of the Fourier transform are a(w^j), 0 ≤ j ≤ N − 1. Therefore the values of a(x) at the points w^j, 0 ≤ j ≤ n − 1, are now expressible as

a(w^j)     = c(w^{2j}) w^j + b(w^{2j})
a(w^{j+n}) = −c(w^{2j}) w^j + b(w^{2j})

These two formulas are computationally valuable in that they reveal how to take a problem of size N and transform it into two identical problems of size n = N/2. These subproblems are the evaluation of b(y) and c(y), each of degree n − 1, at the points (w²)^j, 0 ≤ j ≤ n − 1, and these points are primitive nth roots. This is an example of divide-and-conquer, and we can apply the divide-and-conquer strategy again as long as the number of points remains even. This leads us to always choose N as a power of 2, N = 2^m, for then we can continue to carry out the splitting procedure until a trivial problem is reached, namely, evaluating a constant polynomial.

FFT (Algorithm 9.8) combines all these ideas into a recursive version of the fast Fourier transform algorithm. Dense representation for polynomials is assumed. We overload the operators +, −, *, and = with regard to complex numbers.

Algorithm FFT(N, a(x), w, A)
// N = 2^m, a(x) = a_{N−1} x^{N−1} + ··· + a_0, and w is a
// primitive Nth root of unity. A[0 : N − 1] is set to
// the values a(w^j), 0 ≤ j ≤ N − 1.
{
    // b and c are polynomials.
    // B, C, and wp are complex arrays.
    if N = 1 then A[0] := a_0;
    else
    {
        n := N/2;
        b(x) := a_{N−2} x^{n−1} + ··· + a_2 x + a_0;
        c(x) := a_{N−1} x^{n−1} + ··· + a_3 x + a_1;
        FFT(n, b(x), w², B);
        FFT(n, c(x), w², C);
        wp[−1] := 1/w;
        for j := 0 to n − 1 do
        {
            wp[j] := w * wp[j − 1];
            A[j] := B[j] + wp[j] * C[j];
            A[j + n] := B[j] − wp[j] * C[j];
        }
    }
}

Algorithm 9.8 Recursive fast Fourier transform

Now let us derive the computing time of FFT. Let T(N) be the time for the algorithm applied to N inputs. Then we have


T(N) = 2T(N/2) + DN

where D is a constant and DN is a bound on the time needed to form b(x), c(x), and A. Since T(1) = d, where d is another constant, we can repeatedly simplify this recurrence relation to get

T(2^m) = 2T(2^{m−1}) + D·2^m
       = D·m·2^m + T(1)·2^m
       = D·N log₂ N + d·N
       = O(N log₂ N)

Suppose we return briefly to the problem considered at the beginning of this chapter, the multiplication of polynomials. The transformation technique calls for evaluating A(x) and B(x) at 2N + 1 points (where N is the degree of A and B), computing the 2N + 1 products A(x_i)B(x_i), and then finding the product A(x)B(x) in coefficient form by computing the interpolating polynomial that satisfies these points. In Section 9.2 we saw that N-point evaluation and interpolation required O(N²) operations, so that no asymptotic improvement is gained by using this transformation over the conventional multiplication algorithm. However, in this section we have seen that if the points are chosen to be the N = 2^m distinct powers of a primitive Nth root of unity, then evaluation and interpolation can be done using at most O(N log N) operations. Therefore by using the fast Fourier transform algorithm, we can multiply two N-degree polynomials in O(N log N) operations.

The divide-and-conquer strategy plus some simple properties of primitive Nth roots of unity leads to a very nice conceptual framework for understanding the FFT. The above analysis shows that asymptotically it is better than the direct method by an order of magnitude. However, the version we have produced uses auxiliary space for b, c, B, and C. We need to study this algorithm more closely to eliminate this overhead.

Example 9.6 Consider the case in which a(x) = a_3 x³ + a_2 x² + a_1 x + a_0. Let us walk through the execution of Algorithm 9.8 on this input. Here N = 4, n = 2, and w = i. The polynomials b and c are constructed as b(x) = a_2 x + a_0 and c(x) = a_3 x + a_1. Function FFT is invoked on b(x) and c(x) to get B[0] = a_0 + a_2, B[1] = a_0 + a_2 w², C[0] = a_1 + a_3, and C[1] = a_1 + a_3 w².

In the for loop, the array A[ ] is modified. When j = 0, wp[0] = 1. Thus, A[0] = B[0] + C[0] = a_0 + a_1 + a_2 + a_3 and A[2] = B[0] − C[0] =


a_0 + a_2 − a_1 − a_3 = a_0 + a_1 w² + a_2 w⁴ + a_3 w⁶ (since w² = −1, w⁴ = 1, and w⁶ = −1). When j = 1, wp[1] = w. Then A[1] = B[1] + wC[1] = a_0 + a_2 w² + w(a_1 + a_3 w²) = a_0 + a_1 w + a_2 w² + a_3 w³ and A[3] = B[1] − wC[1] = a_0 + a_2 w² − w(a_1 + a_3 w²) = a_0 − a_1 w + a_2 w² − a_3 w³ = a_0 + a_1 w³ + a_2 w⁶ + a_3 w⁹ (since w² = −1, w⁴ = 1, and w⁶ = −1). □

9.3.1 An In-place Version of the FFT

Recall that if we view the elements of the vector (a_0, ..., a_{N−1}) to be transformed as coefficients of a polynomial a(x), then the Fourier transform is the same as computing a(w^j) for 0 ≤ j < N. This transformation is also equivalent to computing the remainder when a(x) is divided by the linear polynomial x − w^j, for if q(x) and c are the quotient and remainder such that

a(x) = (x − w^j) q(x) + c

then a(w^j) = 0 · q(w^j) + c = c. We could divide a(x) by these N linear polynomials, but that would require O(N²) operations. Instead we make use of the principle called balancing and compute these remainders with the help of a process that is structured like a binary tree.

Consider the product of the linear factors (x − w⁰)(x − w¹)···(x − w⁷) = x⁸ − w⁰. All the intermediate terms cancel and leave only the exponents eight and zero with nonzero coefficients. If we select out from this product the even- and odd-degree terms, a similar phenomenon occurs: (x − w⁰)(x − w²)(x − w⁴)(x − w⁶) = x⁴ − w⁰ and (x − w¹)(x − w³)(x − w⁵)(x − w⁷) = x⁴ − w⁴. Continuing in a similar fashion, we see in Figure 9.2 that the selected products have only two nonzero terms and we can continue this splitting until only linear factors are present.

Now suppose we want to compute the remainders of a(x) by the eight linear factors (x − w⁰), ..., (x − w⁷). We begin by computing the remainder of a(x) divided by the product d(x) = (x − w⁰)···(x − w⁷). If a(x) = q(x)d(x) + r(x), then a(w^j) = r(w^j), 0 ≤ j ≤ 7, since d(w^j) = 0 and the degree of r(x) is less than the degree of d(x), which equals 8. Now we divide r(x) by x⁴ − w⁰ and obtain s(x), and by x⁴ − w⁴ and obtain t(x). Then a(w^j) = r(w^j) = s(w^j) for j = 0, 2, 4, and 6 and a(w^j) = r(w^j) = t(w^j) for j = 1, 3, 5, and 7, and the degrees of s and t are less than 4. Next we divide s(x) by x² − w⁰ and x² − w⁴ and obtain remainders u(x) and v(x), where a(w^j) = u(w^j) for j = 0 and 4 and a(w^j) = v(w^j) for j = 2 and 6. Notice how each divisor has only two nonzero terms and so the division process will be fast. Continuing in this way, we eventually conclude with the eight values a(x) mod (x − w^j) for j = 0, 1, ..., 7.


Σ_{1≤i≤m} Σ_{1≤j≤2^{i−1}} c·2^{m−i+1} = Σ_{1≤i≤m} c·2^m = c·m·2^m = O(N log N)

Example 9.7 Now suppose we simulate the algorithm as it works on the particular case N = 4. We assume as inputs the symbolic quantities a[1] = a_0, a[2] = a_1, a[3] = a_2, and a[4] = a_3. Initially m = 2 and N = 4. After the first for loop is completed, the array contains the elements permuted as a[1] = a_0, a[2] = a_2, a[3] = a_1, and a[4] = a_3. The main for loop is executed for i = 1 and i = 2. After the i = 1 pass is completed, the array contains a[1] = a_0 + a_2, a[2] = a_0 − a_2, a[3] = a_1 + a_3, and a[4] = a_1 − a_3. At this point we observe that in general w^{N/2} = −1 and in this case w² = −1, and the complex number expressed as the 2-tuple (cos π + i sin π) is equal to w². At the end of the algorithm the final values in the array a are a[1] = a_0 + a_1 + a_2 + a_3, a[2] = a_0 + w a_1 + w² a_2 + w³ a_3, a[3] = a_0 + w² a_1 + a_2 + w² a_3, and a[4] = a_0 + w³ a_1 + w² a_2 + w a_3. □
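Algorithm 9.9, the in-place version being simulated here, is not reproduced in this extraction. For concreteness, the following is a standard iterative, in-place formulation in Python (a bit-reversal permutation followed by butterfly passes); it is our own illustration of the idea and is not claimed to be identical to the text's procedure. On the input of Example 9.7 its first steps produce exactly the permuted and partially transformed arrays listed above.

    import cmath

    def fft_inplace(a):
        # len(a) must be a power of 2; a is overwritten with its transform.
        n = len(a)
        j = 0
        for i in range(1, n):                   # bit-reversal permutation
            bit = n >> 1
            while j & bit:
                j ^= bit
                bit >>= 1
            j |= bit
            if i < j:
                a[i], a[j] = a[j], a[i]
        length = 2
        while length <= n:                      # butterfly passes
            w_len = cmath.exp(2 * cmath.pi * 1j / length)
            for start in range(0, n, length):
                w = 1
                for k in range(length // 2):
                    u = a[start + k]
                    v = a[start + k + length // 2] * w
                    a[start + k] = u + v
                    a[start + k + length // 2] = u - v
                    w *= w_len
            length <<= 1
        return a

    print(fft_inplace([1, 2, 3, 4]))            # same values as the recursive FFT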

9.3.2 Some Remaining Points

Up to now we have been treating the value w as e^{2πi/N}. This is a complex number (it has an imaginary part) and its value cannot be represented exactly in a digital computer. Thus the arithmetic operations performed in the Fourier transform algorithm were assumed to be operations on complex numbers, and this implies they are approximations to the actual values. When the inputs to be transformed are readings from a continuous signal, approximations of w do not cause any significant loss in accuracy. However, there are occasions when we would prefer an exact result, for instance, when we are using the FFT for polynomial multiplication in a mathematical symbol manipulation system. It is possible to circumvent the need for approximate, complex arithmetic by working in a finite field.

Let p be chosen such that it is a prime that is less than your computer's word size and such that the integers 0, 1, ..., p − 1 contain a primitive Nth root of unity. By doing all the arithmetic of the fast Fourier transform modulo p, all the results are single precision. By choosing p to be a prime, the integers 0, 1, ..., p − 1 form a field and all arithmetic operations including division can be performed. If all values during the computation are bounded by p − 1, then the exact answer is formed since x mod p = x if 0 ≤ x < p. However, if one or more values exceed p − 1, the exact answer can still be produced by repeating the transform using several different primes followed by the Chinese Remainder Theorem as described in the next section. So the question that remains is, given an N, can one find a sufficient number of primes of a certain size that contain Nth roots? From finite field theory {0, 1, ..., p − 1} contains a primitive Nth root if and only if N divides p − 1.


Therefore, to transform a sequence of size N = 2^m, primes of the form p = 2^e k + 1, where m ≤ e, must be found. Call such a number a Fourier prime. J. Lipson has shown that there are more than x/(2^{e−1} ln x) Fourier primes less than x with exponent e, and hence there are more than enough for any reasonable application. For example, if the word size is 32 bits, let x = 2^{31} and e = 20. Then there are approximately 182 primes of the form 2^f k + 1, where f ≥ 20. Any of these Fourier primes would suffice to compute the FFT of a sequence of size at most 2^{20}. See the exercises for more details.
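Finding a Fourier prime and a corresponding root of unity is straightforward by search; a brute-force Python sketch (ours, with no claim about which particular prime the search finds first):

    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    def fourier_prime(e):
        # Smallest prime of the form k * 2^e + 1, k = 1, 2, 3, ...
        k = 1
        while not is_prime(k * (1 << e) + 1):
            k += 1
        return k * (1 << e) + 1

    def root_of_unity(p, N):
        # A primitive Nth root of unity mod p; N a power of 2 dividing p - 1.
        for g in range(2, p):
            w = pow(g, (p - 1) // N, p)
            if pow(w, N // 2, p) != 1:
                return w

    p = fourier_prime(20)
    w = root_of_unity(p, 1 << 20)
    print(p, w, pow(w, 1 << 20, p))             # the last value printed is 1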

EXERCISES

1. A polynomial of degree n > 0 has n derivatives, each one obtained by taking the derivative of the previous one. Devise an algorithm that produces the values of a polynomial and its n derivatives.

2. Show the result of applying the Fourier transform to the sequence (a_0, ..., a_7).

3. The Fourier transform can be generalized to k dimensions. For example, the two-dimensional transform takes the matrix a(0 : n − 1, 0 : n − 1) and yields the transformed matrix

A(i,j) = Σ_{0≤k≤n−1} Σ_{0≤l≤n−1} a_{k,l} w^{ik+jl}                                 (9.11)

for an n×n matrix with elements in GF(p). The inverse transformation is

a(i,j) = (1/n²) Σ_{0≤k≤n−1} Σ_{0≤l≤n−1} A(k,l) w^{−(ik+jl)}                        (9.12)

Define the two-dimensional convolution C(i,j) = A(i,j)B(i,j) and derive an efficient algorithm for computing it.

4. Present an O(n) time algorithm to compute the coefficients of the polynomial (1 + x)^n. How much time is needed if you use the FFT algorithm to solve this problem?

5. An n×n Toeplitz matrix is a matrix A with the property that A[i,j] = A[i−1, j−1], 2 ≤ i, j ≤ n. Give an O(n log n) algorithm to multiply a Toeplitz matrix with an arbitrary (n×1) column vector.


9.4 MODULAR ARITHMETIC

Another example of a useful set of transformations is modular arithmetic. Modular arithmetic is useful in one context because it allows the reformulation of the way addition, subtraction, and multiplication are performed. This reformulation is one that exploits parallelism whereas the normal methods for doing arithmetic are serial. The growth of special computers that make it desirable to perform parallel computation makes modular arithmetic attractive. A second use of modular arithmetic is with systems that allow symbolic mathematical computation. These software packages usually provide operations that permit arbitrarily large integers and rational numbers as operands. Modular arithmetic has been found to yield efficient algorithms for the manipulation of large numbers. Finally there is an intrinsic interest in finite field arithmetic (the integers 0, 1, ..., p − 1, where p is a prime, form a field) by number theorists and electrical engineers specializing in communications and coding theory. In this section we study this subject from a computer scientist's point of view, namely, the development of efficient algorithms for the required operations.

The mod operator is defined as

x mod y = x − y⌊x/y⌋,  if y ≠ 0
x mod 0 = x

Note that ⌊ ⌋ corresponds to fixed point integer division, which is commonly found on most current-day computers.

We denote the set of integers {0, 1, ..., p − 1}, where p is a prime, by GF(p) (the Galois field with p elements), named after the mathematician E. Galois who studied and characterized the properties of these fields. Also we assume that p is a single precision number for the computer you plan to execute on. It is, in fact, true that the set GF(p) forms a field under the following definitions of addition, subtraction, multiplication, and division, where a, b ∈ GF(p):

(a + 6) modp

(a \342\200\224 b) modp

(ab) modp \342\200\224 r suchthat r is the remainderwhen the productab is dividedby p;ab = qp+ r,where0 < r <p.

(a/6) modp = (a6-1)modp = r, the uniqueremainderwhen ab~l isdivided by p;ab~l= qp+ r, 0 < r <p.

-{-{

a + b if a + 6 <pa + b \342\200\224 p ifa +6>pa \342\200\224 b if a \342\200\224 6 > 0a \342\200\224 b +p if a \342\200\224 6 < 0
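As a concrete illustration of these definitions, here is a small Python sketch (ours, not the text's); Python's built-in pow(b, -1, p) supplies the inverse b^{-1} needed for division, playing the role of the ExEuclid algorithm developed below.

# Sketch of arithmetic in GF(p) following the definitions above.
# pow(b, -1, p) (Python 3.8+) stands in for the extended Euclidean
# algorithm ExEuclid that the text develops below.

p = 7    # any single precision prime

def add(a, b): return (a + b) % p              # (a + b) mod p
def sub(a, b): return (a - b) % p              # (a - b) mod p
def mul(a, b): return (a * b) % p              # (ab) mod p
def div(a, b): return (a * pow(b, -1, p)) % p  # (a b^-1) mod p, b != 0

# Every nonzero element has a unique inverse: b * b^-1 mod p = 1.
for b in range(1, p):
    assert mul(b, pow(b, -1, p)) == 1
print(div(3, 5))    # 3 * 5^-1 mod 7 = 3 * 3 mod 7 = 2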


The factor b^{-1} is the multiplicative inverse of b in GF(p). For every element b in GF(p) except zero, there exists a unique element called b^{-1} such that bb^{-1} mod p = 1.

Now what are the computing times of these operations? We have assumed that p is a single precision integer; this implies that all a, b ∈ GF(p) are also single precision integers. The time for addition, subtraction, and multiplication mod p, given the formulas above, is easily seen to be O(1). But before we can determine the time for division, we must develop an algorithm to compute the multiplicative inverse of an element b ∈ GF(p).

By definition we know that to find x = b^{-1}, there must exist an integer k, 0 ≤ k < p, such that bx = kp + 1. For example, if p = 7,

b:       1  2  3  4  5  6    (element)
b^{-1}:  1  4  5  2  3  6    (inverse)
k:       0  1  2  1  2  5

An algorithm for computing the inverse of b in GF(p) is provided by generalizing Euclid's algorithm for the computation of greatest common divisors (gcds). Given two nonnegative integers a and b, Euclid's algorithm computes their gcd. The essential step that guarantees the validity of his method consists of showing that the greatest common divisor of a and b (a > b ≥ 0) is equal to a if b is zero and is equal to the greatest common divisor of b and the remainder of a divided by b if b is nonzero.

Example 9.8

gcd(22, 8) = gcd(8, 6) = gcd(6, 2) = gcd(2, 0) = 2

and gcd(21, 13) = gcd(13, 8) = gcd(8, 5) = gcd(5, 3) = gcd(3, 2) = gcd(2, 1) = gcd(1, 0) = 1 □

Expressing this process as a recursive function gives Algorithm 9.10. Using Euclid's algorithm, it is also possible to compute two more integers x and y such that ax + by = gcd(a, b). Letting a be a prime p and b ∈ GF(p), the gcd(p, b) = 1 (since the only divisors of a prime are itself and one), and Euclid's generalization reduces to finding integers x and y such that px + by = 1. This implies that y is the multiplicative inverse of b mod p.

A close examination of ExEuclid (Algorithm 9.11) shows that Euclid's gcd algorithm is carried out by the steps q := ⌊c/d⌋;, e := c - d*q;, c := d;, and d := e;. The only other steps are the updatings of x and y as the algorithm proceeds. To analyze the time for ExEuclid, we need to know the number of divisions Euclid's algorithm may require. This was answered in the worst case by G. Lamé in 1845.


1    Algorithm GCD(a, b)
2    // Assume a > b ≥ 0.
3    {
4        if (b ≠ 0) then return GCD(b, a mod b);
5        else return a;
6    }

Algorithm 9.10 Algorithm to compute the gcd of two numbers

1    Algorithm ExEuclid(b, p)
2    // b is in GF(p), p being a prime. ExEuclid is a function
3    // whose result is the integer x such that bx + kp = 1.
4    {
5        c := p; d := b; x := 0; y := 1;
6        while (d ≠ 1) do
7        {
8            q := ⌊c/d⌋;
9            e := c - d * q;
10           w := x - y * q;
11           c := d; d := e; x := y; y := w;
12       }
13       if (y < 0) then y := y + p;
14       return y;
15   }

Algorithm 9.11 Extended Euclidean algorithm
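For readers who want to run the extended Euclidean algorithm directly, the following Python transcription of Algorithm 9.11 is one possible rendering (a sketch; the variable names follow the pseudocode). It reproduces the small inverse table given earlier for p = 7.

# Python transcription of ExEuclid (Algorithm 9.11): returns y with
# b*y ≡ 1 (mod p).  Assumes 1 <= b < p and p prime.

def ex_euclid(b, p):
    c, d, x, y = p, b, 0, 1
    while d != 1:
        q = c // d
        e = c - d * q          # remainder, as in Euclid's algorithm
        w = x - y * q          # carry the multiplier along
        c, d, x, y = d, e, y, w
    if y < 0:
        y += p
    return y

# Reproduces the table for p = 7: inverses 1 4 5 2 3 6
print([ex_euclid(b, 7) for b in range(1, 7)])   # [1, 4, 5, 2, 3, 6]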


Theorem 9.3 [G. Lamé, 1845] For n ≥ 1, let a and b be integers, a > b > 0, such that Euclid's algorithm applied to a and b requires n division steps. Then n < 5 log_10 b. □

Thus the while loop is executed no more than O(log_10 p) times, and this is the computing time for the extended Euclidean algorithm and hence for modular division. By modular arithmetic we mean the operations of addition, subtraction, multiplication, and division modulo p as previously defined.

Now let's see how we can use modular arithmetic as a transformation technique to help us work with integers. We begin by looking at how we can represent integers using a set of moduli, then how we perform arithmetic on this representation, and finally how we can produce the proper integer result.

Let a and b be integers and suppose that a is represented by the r-tuple (a_1, ..., a_r), where a_i = a mod p_i, and b is represented by (b_1, ..., b_r), where b_i = b mod p_i. The p_i are typically single precision primes. This is called a mixed radix representation, which contrasts with the conventional representation of integers using a single radix such as 10 (decimal) or 2 (binary). The rules for addition, subtraction, and multiplication using a mixed radix representation are as follows:

(a_1, ..., a_r) + (b_1, ..., b_r) = ((a_1 + b_1) mod p_1, ..., (a_r + b_r) mod p_r)
(a_1, ..., a_r) * (b_1, ..., b_r) = ((a_1 b_1) mod p_1, ..., (a_r b_r) mod p_r)

Example 9.9 For example, let the moduli be p_1 = 3, p_2 = 5, and p_3 = 7, and suppose we start with the integers 10 and 15.

10 = (10 mod 3, 10 mod 5, 10 mod 7) = (1, 0, 3)
15 = (15 mod 3, 15 mod 5, 15 mod 7) = (0, 0, 1)

Then

10 + 15 = (25 mod 3, 25 mod 5, 25 mod 7) = (1, 0, 4)
        = ((1 + 0) mod 3, (0 + 0) mod 5, (3 + 1) mod 7) = (1, 0, 4)

Also

15 - 10 = (5 mod 3, 5 mod 5, 5 mod 7) = (2, 0, 5)
        = ((0 - 1) mod 3, (0 - 0) mod 5, (1 - 3) mod 7) = (2, 0, 5)


Also

10 * 15 = (150 mod 3, 150 mod 5, 150 mod 7) = (0, 0, 3)
        = ((1 * 0) mod 3, (0 * 0) mod 5, (3 * 1) mod 7) = (0, 0, 3)   □
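The bookkeeping in Example 9.9 is mechanical enough to automate. The Python sketch below (ours, with illustrative names) converts integers to residue tuples, operates componentwise, and checks the results against ordinary arithmetic.

# Sketch: arithmetic on residue tuples, as in Example 9.9.
moduli = (3, 5, 7)

def to_residues(x):
    return tuple(x % p for p in moduli)

def add(u, v):
    return tuple((a + b) % p for a, b, p in zip(u, v, moduli))

def mul(u, v):
    return tuple((a * b) % p for a, b, p in zip(u, v, moduli))

a, b = to_residues(10), to_residues(15)
print(a, b)              # (1, 0, 3) (0, 0, 1)
print(add(a, b))         # (1, 0, 4), the same as to_residues(25)
print(mul(a, b))         # (0, 0, 3), the same as to_residues(150)
assert add(a, b) == to_residues(10 + 15)
assert mul(a, b) == to_residues(10 * 15)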

After we have performed some desired sequence of arithmetic operations using these r-tuples, we are left with some r-tuple (c_1, ..., c_r). We now need some way of transforming back from modular form with the assurance that the resulting integer is the correct one. The ability to do this is guaranteed by the following theorem, which was first proven in full generality by L. Euler in 1734.

Theorem 9.4 [Chinese Remainder Theorem] Let p_1, ..., p_r be positive integers that are pairwise relatively prime (no two integers have a common factor). Let p = p_1 ··· p_r and let b, a_1, ..., a_r be integers. Then there is exactly one integer a that satisfies the conditions

b ≤ a < b + p   and   a ≡ a_i mod p_i for 1 ≤ i ≤ r

Proof: Let x be another integer, different from a, such that a ≡ x mod p_i for 1 ≤ i ≤ r. Then a - x is a multiple of p_i for all i. Since the p_i are pairwise relatively prime, it follows that a - x is a multiple of p. Thus there can be only one solution that satisfies these relations. We show how to construct this value in a moment. □

A pictorial view of these transformations when applied to integer multiplication is given in Figure 9.3. Instead of using conventional multiplication, which requires O((log a)^2) operations (a = max(a, b)), we choose a set of primes p_1, ..., p_r and compute a_i = a mod p_i, b_i = b mod p_i, and then c_i = a_i b_i mod p_i. These are all single precision operations and so they require O(r) steps. The r must be sufficiently large so that ab < p_1 ··· p_r. The precision of a is proportional to log a, and hence the precision of ab is no more than 2 log a = O(log a). Thus r = O(log a) and the time for transformation into modular form and computing the r products is O(log a). Therefore the value of this method rests on how fast we can perform the inverse transformation by the Chinese Remainder Algorithm.

Suppose we consider how to compute the value in the Chinese Remainder Theorem for only two moduli: given a mod p and b mod q, we wish to determine the unique c such that c mod p = a and c mod q = b. The value for c that satisfies these two constraints is easily seen to be

c = (b - a)sp + a

where s is the multiplicative reciprocal of p mod q; that is, s satisfies ps mod q = 1. To show that this is correct, we note that


Figure 9.3 Integer multiplication by mod p transformations (diagram: the integers a and b are mapped by division into residues mod the p_i; single precision mod p_i multiplications give the products mod p_i; the Chinese Remainder Algorithm maps these back to the integer product, the same result as conventional multiplication)

((b - a)sp + a) mod p = a

since the term (b - a)sp has p as a factor. Secondly,

((b - a)sp + a) mod q = ((b - a)sp) mod q + a mod q
                      = (b - a) mod q + a mod q
                      = (b - a + a) mod q
                      = b

OneStepCRA (Algorithm 9.12) uses ExEuclid and arithmetic modulo p to compute the formula we have just described. The computing time is dominated by the call to ExEuclid, which requires O(log q) operations.

The simplest way to use this procedure to implement the Chinese Remainder Theorem for r moduli is to apply it r - 1 times in the following way. Given a set of congruences a_i mod p_i, 1 ≤ i ≤ r, we let OneStepCRA be called r - 1 times with the following set of values for the parameters.

                   a        p                    b      q      output
First time         a_1      p_1                  a_2    p_2    c_1
Second time        c_1      p_1 p_2              a_3    p_3    c_2
Third time         c_2      p_1 p_2 p_3          a_4    p_4    c_3
  ...
(r-1)st time       c_{r-2}  p_1 p_2 ··· p_{r-1}  a_r    p_r    c_{r-1}


1    Algorithm OneStepCRA(a, p, b, q)
2    // a is in GF(p) and b is in GF(q), gcd(p, q) = 1. This function
3    // returns a c such that c mod p = a and c mod q = b.
4    {
5        t := a mod q; pb := p mod q; s := ExEuclid(pb, q);
6        u := ((b - t) * s) mod q; if (u < 0) then u := u + q;
7        t := u * p + a; return t;
8    }

Algorithm 9.12 One-step Chinese Remainder Algorithm
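A direct Python transcription of OneStepCRA is equally short (a sketch; pow(pb, -1, q) stands in for the call to ExEuclid), and it reproduces the value 52 computed in Example 9.10 that follows.

# Sketch of OneStepCRA (Algorithm 9.12): returns c with
# c mod p = a and c mod q = b, assuming gcd(p, q) = 1.

def one_step_cra(a, p, b, q):
    t = a % q
    pb = p % q
    s = pow(pb, -1, q)        # multiplicative inverse of p mod q (ExEuclid)
    u = ((b - t) * s) % q     # Python's % already yields a value in [0, q)
    return u * p + a

print(one_step_cra(3, 7, 8, 11))   # 52, as in Example 9.10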

The final result c_{r-1} is an integer such that c_{r-1} mod p_i = a_i for 1 ≤ i ≤ r and c_{r-1} < p_1 ··· p_r. The total computing time is O(r log q) = O(r^2).

Example 9.10 Suppose we wish to take 4, 6, and 8 and compute 4 + 8*6 = 52. Let p_1 = 7 and p_2 = 11.

4       = (4 mod 7, 4 mod 11)           = (4, 4)
6       = (6 mod 7, 6 mod 11)           = (6, 6)
8       = (8 mod 7, 8 mod 11)           = (1, 8)
8*6     = (1*6 mod 7, 8*6 mod 11)       = (6, 4)
4 + 8*6 = ((4+6) mod 7, (4+4) mod 11)   = (3, 8)

So we must convert the 2-tuple (3, 8) back to integer notation. Using OneStepCRA with a = 3, b = 8, p = 7, and q = 11, we get

t  = a mod q = 3 mod 11 = 3
pb = p mod q = 7 mod 11 = 7
s  = ExEuclid(pb, q) = 8;  k = 5
u  = ((b - t)s) mod q = (8 - 3)8 mod 11 = 40 mod 11 = 7
return (u*p + a) = 7*7 + 3 = 52   □

In conclusion we review the computing times for modular arithmetic. If a, b ∈ GF(p), where p is single precision, then

operation                                      computing time
a + b                                          O(1)
ab                                             O(1)
a/b                                            O(log p)
c → (c_1, ..., c_r), where c_i = c mod p_i     O(r log c)
(c_1, ..., c_r) → c                            O(r^2)


EXERCISES

1. Given the finite field A = {0, 1, ..., p-1}, one of these elements x is such that x^0, x, x^2, ..., x^{p-2} are equal to all the nonzero elements of A. The element x is called a primitive element. If x is a primitive element and n divides p - 1, then x^{(p-1)/n} is a primitive nth root of unity. To find such a value x, we use the fact that x^{(p-1)/q} ≠ 1 for each prime factor q of p - 1. Use this fact to write an algorithm that, when given a, b, and e, finds the a largest Fourier primes less than or equal to b of the form 2^f k + 1 with f ≥ e. For example, if a = 10, b = 2^31, and e = 20, the answer is:

   p             f    least primitive root
   2130706433    24   3
   2114977793    20   3
   2113929217    25   5
   2099249153    21   3
   2095054849    21   11
   2088763393    23   5
   2077229057    20   3
   2070937601    20   6
   2047868929    20   13
   2035286017    20   10

2. [Diffie, Hellman, Rivest, Shamir, Adleman] Some people are connected to a computer network. They need a mechanism with which they can send messages to one another that can't be decoded by a third party (security) and that in addition can prove any particular message to have been sent by a given person (a signature). In short, each person needs an encoding mechanism E and a decoding mechanism D such that D(E(M)) = M for any message M. A signature feature is possible if the sender A first decodes her or his message and sends it, and it is encoded by the receiver using A's encoding scheme E (E(D(M)) = M). The E for all users is published in a public directory. The scheme to implement D and E proposed by Rivest, Shamir, and Adleman relies on the difficulty of factoring versus the simplicity of determining several large (100 digit) primes. Using modular arithmetic, see whether you can construct an encoding function that is invertible but only if the factors of a number are known.


9.5 EVEN FASTER EVALUATION AND INTERPOLATION

In this section we study four problems:

1. From an n-precision integer compute its residues modulo n single precision primes.

2. From an n-degree polynomial compute its values at n points.

3. From n single precision residues compute the unique n-precision integer that is congruent to the residues.

4. From n points compute the unique interpolating polynomial through those points.

We saw in Sections 9.2 and 9.4 that the classical methods for problems 1 to 4 take O(n^2) operations. Here we show how to use the fast Fourier transform to speed up all four problems. In particular we derive algorithms for problems 1 and 2 whose times are O(n(log n)^2) and for problems 3 and 4 whose times are O(n(log n)^3). These algorithms rely on the fast Fourier transform as it is used to perform n-precision integer multiplication in time O(n log n log log n). This algorithm, developed by A. Schönhage and V. Strassen, is the fastest known way to multiply. Because this algorithm is complex to describe and already appears in several places (see, e.g., D. E. Knuth, cited in References and Readings at the end of this chapter), we simply assume its existence here. Moreover, to simplify things somewhat, we assume that for n-precision integers and for n-degree polynomials the time to add or subtract is O(n) and the time to multiply or divide is O(n log n). In addition we assume that an extended gcd algorithm is available (see Algorithm 9.11) for integers or polynomials whose computing time is O(n(log n)^2).

Now consider the binary tree as shown in Figure 9.4. As we go down the tree, the level numbers increase, while the root of the tree is at the top at level 1. The ith level has 2^{i-1} nodes, and a tree with m levels has a total of 2^m - 1 nodes. We are interested in computing different functions at every node of such a binary tree. Algorithm 9.13 is an algorithm for moving up the tree.

Subsequently we are concerned about the cost of the operation *, which is denoted by C(*). Given the value of C(*) on the ith level (call it C_i(*)) and Algorithm 9.13, the total time needed to compute every node in a tree is

Σ_{1≤i≤m-1} 2^{i-1} C_i(*)        (9.13)


[Figure 9.4 A binary tree: the root t[1,1] is at level 1; level i holds the nodes t[i,1], ..., t[i, 2^{i-1}]; the leaves t[m,1], ..., t[m,n] are at level m.]

1    Algorithm MoveUpATree(t, n)
2    // n = 2^{m-1} values are stored in t[1:m, 1:n] in locations
3    // t[m, 1:n]. The algorithm causes the nodes of a binary tree
4    // to be visited so that at each node an abstract binary operation
5    // denoted by * is performed. The resulting values are stored
6    // in the array as indicated in Figure 9.4.
7    {
8        for i := m - 1 to 1 step -1 do
9        {
10           p := 1;
11           for j := 1 to 2^{i-1} do
12           {
13               t[i, j] := t[i+1, p] * t[i+1, p+1];
14               p := p + 2;
15           }
16       }
17   }

Algorithm 9.13 Moving up a tree


Similarly, Algorithm 9.14 is an algorithm that computes elements as we go down the tree. We now proceed to the specific problems.

1    Algorithm MoveDownATree(s, t, m)
2    // n = 2^{m-1} and t[1,1] is given. Also, s[1:m, 1:n] is given
3    // containing a binary tree of values. The algorithm produces
4    // elements and stores them in the array t[1:m, 1:n] at the
5    // positions that correspond to the nodes of the binary tree
6    // in Figure 9.4.
7    {
8        for i := 2 to m do
9        {
10           p := 1;
11           for j := 1 to 2^{i-1} step 2 do
12           {
13               t[i, j] := s[i, j] * t[i-1, p];
14               t[i, j+1] := s[i, j+1] * t[i-1, p];
15               p := p + 1;
16           }
17       }
18   }

Algorithm 9.14 Moving down a tree

Problem 1 Let u be an n-precision integer and p_1, ..., p_n be single precision primes. We wish to compute the n residues u_i = u mod p_i that give the mixed radix representation for u. We consider the binary tree in Figure 9.5. Starting from the leaves of the tree, we move up the tree, computing the products indicated at each node of the tree.

If n = 2^{m-1}, then the products on the ith level have precision 2^{m-i}, 1 ≤ i ≤ m. Using our fast integer multiplication algorithm, we can compute the elements going up the tree. Therefore C_i(*) is 2^{m-i-1}(m - i - 1) and the total time to complete the tree is

Σ_{1≤i≤m-1} 2^{i-1} 2^{m-i-1}(m - i - 1) = 2^{m-2} ((m-1)(m-2)/2) = O(n(log n)^2)        (9.14)

[Figure 9.5 Binary tree with moduli: the leaves hold p_1, p_2, ..., p_8; the next level holds p_1 p_2, p_3 p_4, p_5 p_6, p_7 p_8; then p_1 p_2 p_3 p_4 and p_5 p_6 p_7 p_8; the root holds p_1 ··· p_8.]

Now to compute the n residues u_i = u mod p_i, we reverse direction and proceed to compute functions down the tree. Since u is n-precision and the primes are all near the maximum size of a single precision number, we first compute u mod p_1 ··· p_n = u_b. Then the algorithm continues by computing

u_{2,1} = u_b mod p_1 ··· p_{n/2}   and   u_{2,2} = u_b mod p_{n/2+1} ··· p_n

Then we compute

u_{3,1} = u_{2,1} mod p_1 ··· p_{n/4},         u_{3,2} = u_{2,1} mod p_{n/4+1} ··· p_{n/2}
u_{3,3} = u_{2,2} mod p_{n/2+1} ··· p_{3n/4},  u_{3,4} = u_{2,2} mod p_{3n/4+1} ··· p_n

and so on down the tree until we have

u_{m,1} = u_1, u_{m,2} = u_2, ..., u_{m,2^{m-1}} = u_n

A node on level i is computed from the previously computed product of primes stored at that position and the element computed at its parent node on level i - 1. The computation requires a division operation, so C_i(*) is 2^{m-i+1}(m - i + 1) and the total time for problem 1 is

Σ_{1≤i≤m} 2^{i-1} 2^{m-i+1}(m - i + 1) = 2^m (m(m+1)/2) = O(n(log n)^2)        (9.15)
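The two passes just analyzed, products going up the tree and remainders coming down, are easy to prototype. The Python sketch below (ours; it assumes the number of primes is a power of two, as in the text) mirrors MoveUpATree with * taken as multiplication and MoveDownATree with * taken as mod; it relies on Python's built-in multiprecision arithmetic rather than the fast multiplication assumed in the analysis, so it illustrates only the structure of the remainder tree, not its asymptotic speed.

# Sketch of Problem 1: compute u mod p_i for all i with a remainder tree.
# Assumes len(primes) is a power of two, as in the text (n = 2^{m-1}).

def residues(u, primes):
    # Move up the tree: level i holds products of 2^{m-i} consecutive primes.
    tree = [list(primes)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([level[j] * level[j + 1] for j in range(0, len(level), 2)])
    tree.reverse()                      # tree[0] is the root product p_1 ... p_n

    # Move down the tree: each node gets (parent value) mod (product at node).
    vals = [u % tree[0][0]]
    for level in tree[1:]:
        vals = [vals[j // 2] % level[j] for j in range(len(level))]
    return vals                         # vals[i] = u mod primes[i]

primes = [3, 5, 7, 11]
u = 123456
print(residues(u, primes))              # [0, 1, 4, 3]
assert residues(u, primes) == [u % p for p in primes]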


[Figure 9.6 A binary tree with linear moduli: the leaves hold (x - x_1), (x - x_2), ..., (x - x_8); the next level holds (x - x_1)(x - x_2), ..., (x - x_7)(x - x_8); then (x - x_1)···(x - x_4) and (x - x_5)···(x - x_8); the root holds (x - x_1)···(x - x_8).]

Problem 2 Let P(x) be an n-degree polynomial and x_1, ..., x_n be n single precision points. We wish to compute the n values P(x_i), 1 ≤ i ≤ n. We can use the binary tree in Figure 9.6 to perform this computation.

First, we move up the tree and compute the products indicated at each node of the tree. If n = 2^{m-1}, the products on the ith level have degree 2^{m-i}. Using fast polynomial multiplication, we compute the elements going up the tree. Therefore C_i(*) is 2^{m-i-1}(m - i - 1) and the total time to complete the tree is

Σ_{1≤i≤m-1} 2^{i-1} 2^{m-i-1}(m - i - 1) = 2^{m-2} ((m-1)(m-2)/2) = O(n(log n)^2)        (9.16)

Now to compute the n values P(x_i), we reverse direction and proceed to compute functions down the tree. If D(x) = (x - x_1)···(x - x_n), then we can divide P(x) by D(x) and obtain the quotient and remainder

P(x) = D(x)Q(x) + R_n(x)

where the degree of R_n is less than the degree of D. By substitution it follows that

P(x_i) = R_n(x_i),   1 ≤ i ≤ n


The algorithm would continue by dividing R_n(x) by the first n/2 factors of D(x) and then by the second n/2 factors. Calling these polynomials D_1(x) and D_2(x), we get the quotients and remainders

R_n(x) = D_1(x)Q_1(x) + R_{12}(x)
R_n(x) = D_2(x)Q_2(x) + R_{22}(x)

By the same argument we see that

P(x_i) = R_{12}(x_i)  for 1 ≤ i ≤ n/2,   P(x_i) = R_{22}(x_i)  for n/2 + 1 ≤ i ≤ n        (9.17)

Eventually we arrive at constants R_{m,1}, ..., R_{m,2^{m-1}}, where P(x_i) = R_{m,i} for 1 ≤ i ≤ n. Since the time for multiplication and division of polynomials is the same, C_i(*) is 2^{m-i}(m - i) and the total time for problem 2 is

Σ_{1≤i≤m-1} 2^{i-1} 2^{m-i}(m - i) = 2^{m-1} ((m^2 - m)/2) = O(n(log n)^2)        (9.18)
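The same down-the-tree idea gives multipoint evaluation once polynomial division with remainder is available. The following Python sketch (ours; it uses schoolbook O(n^2) polynomial arithmetic and exact rational coefficients, so again only the structure is illustrated) evaluates a small polynomial at four points through the tree of remainders.

# Sketch of Problem 2: evaluate P at x_1..x_n via a remainder tree.
# Coefficients are listed highest degree first; len(points) must be a
# power of two, as in the text.
from fractions import Fraction

def poly_mul(a, b):
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_mod(a, b):
    a = [Fraction(x) for x in a]
    while len(a) >= len(b):
        f = a[0] / Fraction(b[0])
        for i in range(len(b)):
            a[i] -= f * b[i]
        a.pop(0)                       # leading term is now zero
    return a or [Fraction(0)]

def eval_at_points(P, points):
    tree = [[[Fraction(1), -Fraction(x)] for x in points]]   # leaves (x - x_i)
    while len(tree[-1]) > 1:
        lvl = tree[-1]
        tree.append([poly_mul(lvl[j], lvl[j + 1]) for j in range(0, len(lvl), 2)])
    tree.reverse()
    vals = [poly_mod(P, tree[0][0])]
    for lvl in tree[1:]:
        vals = [poly_mod(vals[j // 2], lvl[j]) for j in range(len(lvl))]
    return [v[-1] for v in vals]       # remainder mod (x - x_i) is P(x_i)

P = [1, 0, -2, 5]                      # x^3 - 2x + 5
pts = [1, 2, 3, 4]
print([int(v) for v in eval_at_points(P, pts)])   # [4, 9, 26, 61]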

Problem 3 Given n residues u_i of n single precision primes p_i, we wish to find the unique n-precision integer u such that u mod p_i = u_i, 1 ≤ i ≤ n. It follows from the Chinese Remainder Theorem, Theorem 9.4, that this integer exists and is unique. For this problem, as for problem 1, we assume the binary tree in Figure 9.5 has already been computed. What we need to do is go up the tree and at each node compute a new integer that is congruent to the integers at the children nodes modulo their respective moduli. For example, at the first level let u_i = u_{m,i}, 1 ≤ i ≤ n = 2^{m-1}. Then for i odd, we compute from u_{m,i} mod p_i and u_{m,i+1} mod p_{i+1} the unique integer u_{m-1,⌈i/2⌉} such that u_{m-1,⌈i/2⌉} ≡ u_{m,i} mod p_i and u_{m-1,⌈i/2⌉} ≡ u_{m,i+1} mod p_{i+1}. Thus u_{m-1,⌈i/2⌉} lies in the range [0, p_i p_{i+1}). Repeating this process up the tree, we eventually produce the integer u in the interval [0, p_1 ··· p_n). So we need to develop an algorithm that proceeds from level i to i - 1. But we already have such an algorithm, the one-step Chinese Remainder Algorithm OneStepCRA. The time for this algorithm was shown to be dominated by the time for ExEuclid. Using our assumption that ExEuclid can be done in O(n(log n)^2) operations, where n is the maximum precision of the moduli, this is also the time for OneStepCRA. Note the difference between its use in this section and in Section 9.4: in Section 9.4 only one of the moduli was growing.

We now apply this one-step algorithm to an algorithm that proceeds up the tree of Figure 9.5. The total time for problem 3 is seen to be


Σ_{1≤i≤m-1} 2^{i-1} 2^{m-i-1}(m - i - 1)^2 = 2^{m-2} Σ_{1≤i≤m-1} (m - i - 1)^2 = O(n(log n)^3)        (9.19)

Problem 4 Given n values y_1, ..., y_n at n = 2^{m-1} points (x_1, ..., x_n), we wish to compute the unique interpolating polynomial P(x) of degree ≤ n - 1 such that P(x_i) = y_i. For this problem, as for problem 2, we assume that the binary tree in Figure 9.6 has already been computed. Again we need an algorithm that goes up the tree and at each node computes a new interpolating polynomial from its two children. For example, at level m we compute polynomials R_{m,1}(x), ..., R_{m,n}(x) such that R_{m,i}(x_i) = y_i, 1 ≤ i ≤ n. Then at level m - 1 we compute R_{m-1,1}, ..., R_{m-1,n/2} such that, for 1 ≤ i ≤ n/2,

R_{m-1,i}(x_{2i-1}) = y_{2i-1}
R_{m-1,i}(x_{2i}) = y_{2i}

and so on, until R_{1,1}(x) = P(x). Therefore we need an algorithm that combines two interpolating polynomials to give a third that interpolates at both sets of points. This requires a generalization, Algorithm 9.15, of algorithm Interpolate, Algorithm 9.7. In this algorithm the operators +, -, *, and mod have been overloaded to take polynomial operands. Also, Q1(x) = (x - x_1)(x - x_2)···(x - x_{k/2}) and Q2(x) = (x - x_{k/2+1})···(x - x_k) with gcd(Q1, Q2) = 1. BalancedInterp returns a polynomial A such that A(x_i) = U1(x_i) for 1 ≤ i ≤ k/2 and A(x_i) = U2(x_i) for k/2 + 1 ≤ i ≤ k. The degree of A is ≤ k - 1.

We note that lines 5, 7, and 8 of Algorithm 9.15 imply that there exist quotients C1, C2, and C3 such that

U1 = Q2 * C1 + P1,   deg(P1) < deg(Q2)        (a)
Q1 = Q2 * C2 + P2,   deg(P2) < deg(Q2)        (b)
P4 * P2 + C3 * Q2 = 1,   deg(P4) < deg(Q2)    (c)

P4 is the multiplicative inverse of P2 mod Q2. Therefore

A = U1 + (U2 - P1) * P4 * Q1                            (i)
A = U1 + (U2 + Q2*C1 - U1) ((1 - C3*Q2)/P2) * Q1        (ii)


1    Algorithm BalancedInterp(U1, U2, Q1, Q2, k)
2    // U1, U2, Q1, and Q2 are all polynomials in x. U1 interpolates
3    // the first k/2 points and U2 interpolates the next k/2 points.
4    {
5        P1 := U1 mod Q2;
6        // a mod b computes the poly. remainder of a(x)/b(x).
7        P2 := Q1 mod Q2;
8        P3 := ExEuclid(P2, Q2);
9        // The extended Euclidean alg. for polynomials.
10       P4 := P3 mod Q2;
11       return U1 + (U2 - P1) * P4 * Q1;
12   }

Algorithm 9.15 Balanced interpolation

using (a) and (c). By (i), A(x_i) = U1(x_i) for 1 ≤ i ≤ k/2 since Q1(x) evaluated at those points is zero. By (ii), it is easy to see that A(x) = U2(x) at the points x_{k/2+1}, ..., x_k.

Now lines 5 and 7 take O(k log k) operations. To compute the multiplicative inverse of P2, we use the extended gcd algorithm for polynomials, which takes O(k(log k)^2) operations. The time for line 10 is no more than O(k log k), so the total time for one-step interpolation is O(k(log k)^2).

Applying this one-step algorithm as we proceed up the tree gives a total computing time for problem 4 of

Σ_{1≤i≤m-1} 2^{i-1} 2^{m-i-1}(m - i - 1)^2 = O(n(log n)^3)        (9.20)

The exercises show how one can further reduce the time for problems 3 and 4 using preconditioning.

EXERCISES

1. Investigate the problem of evaluating an nth-degree polynomial a(x) at the n points 2^i, 0 ≤ i ≤ n - 1. Note that a(2^i) requires no multiplications, only n additions and n shifts.

2. Given the n points (2^i, y_i), 0 ≤ i ≤ n - 1, where y_i is an integer, design an algorithm that produces the unique interpolating polynomial of degree < n. Try to minimize the number of multiplications.

3. In Section 9.5 the time for the n-value Chinese Remainder Algorithm and n-point interpolation is shown to be O(n(log n)^3). However, it is possible to get modified algorithms whose complexities are O(n(log n)^2) if we allow certain values to be computed in advance without cost. Assuming the moduli and the points are so known, what should be computed in advance to lower the complexity of these two problems?

9.6 REFERENCES AND READINGS

The fast Fourier transform algorithm was popularized by J. M. Cooley and J. W. Tukey. For more details on the FFT see:

The Fast Fourier Transform and Its Applications, by O. Brigham, Prentice-Hall, 1988.

Numerical Recipes: The Art of Scientific Computing, by W. H. Press, B. P. Flannery, and S. A. Teukolsky, Cambridge University Press, 1986.

Numerical Recipes in C, by W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Cambridge University Press, 1988.

For an interesting collection of papers that deal with evaluation, interpolation, and modular arithmetic see The Computational Complexity of Algebraic and Numeric Problems, by A. B. Borodin and I. Munro, American Elsevier, 1975.

For a survey on the gcd and related number theoretic algorithms see "Number-Theoretic Algorithms," by E. Bach, Annual Review of Computer Science 4 (1990): 119-172.

The use of the FFT plus modular arithmetic for multiplying large precision integers was originally given by A. Schönhage and V. Strassen. English accounts of the method, which requires O(n log n log log n) operations to multiply two n-bit integers, can be found in:

The Design and Analysis of Computer Algorithms, by A. V. Aho, J. E. Hopcroft, and J. D. Ullman, Addison-Wesley, 1974.

The Art of Computer Programming: Semi-Numerical Algorithms, Vol. II, by D. E. Knuth, Addison-Wesley, 1969.


Chapter 10

LOWER BOUND THEORY

In the previous nine chapters we surveyed a broad range of problems and their algorithmic solution. Our main task for each problem was to obtain a correct and efficient solution. If two algorithms for solving the same problem were discovered and their times differed by an order of magnitude, then the one with the smaller order was generally regarded as superior. But still we are left with the question: is there a faster method? The purpose of this chapter is to expose you to some techniques that have been used to establish that a given algorithm is the most efficient possible. The way this is done is by discovering a function g(n) that is a lower bound on the time that any algorithm must take to solve the given problem. If we have an algorithm whose computing time is the same order as g(n), then we know that asymptotically we can do no better.

Recall from Chapter 1 that there is a mathematical notation for expressing lower bounds. If f(n) is the time for some algorithm, then we write f(n) = Ω(g(n)) to mean that g(n) is a lower bound for f(n). Formally this equation can be written if there exist positive constants c and n_0 such that |f(n)| ≥ c|g(n)| for all n ≥ n_0. In addition to developing lower bounds to within a constant factor, we are also concerned with determining more exact bounds whenever this is possible.

Deriving good lower bounds is often more difficult than devising efficient algorithms. Perhaps this is because a lower bound states a fact about all possible algorithms for solving a problem. Usually we cannot enumerate and analyze all these algorithms, so lower bound proofs are often hard to obtain.

However, for many problems it is possible to easily observe that a lower bound identical to n exists, where n is the number of inputs (or possibly outputs) to the problem. For example, consider all algorithms that find the maximum of an unordered set of n integers. Clearly every integer must be examined at least once, so Ω(n) is a lower bound for any algorithm that solves this problem. Or suppose we wish to find an algorithm that efficiently multiplies two n x n matrices. Then Ω(n^2) is a lower bound on any such algorithm since there are 2n^2 inputs that must be examined and n^2 outputs that must be computed. Bounds such as these are often referred to as trivial lower bounds because they are so easy to obtain. We know how to find the maximum of n elements by an algorithm that uses only n - 1 comparisons, so there is no gap between the upper and lower bounds for this problem. But for matrix multiplication the best-known algorithm requires O(n^{2+ε}) operations (ε > 0), and so there is no reason to believe that a better method cannot be found.¹

In Section 10.1 we present the computational model called comparison trees. These are useful for determining lower bounds for sorting and searching problems. In Section 10.2 we examine a technique for establishing lower bounds called an oracle and study a closely related method called an adversary argument. Deriving lower bounds with the technique of reductions is introduced in Section 10.3. In Section 10.4 we study some arguments that have been used to find lower bounds for the arithmetic and algebraic problems discussed in Chapter 9.

10.1 COMPARISON TREES

In this section we study the use of comparison trees for deriving lower bounds on problems that are collectively called sorting and searching. We see how these trees are especially useful for modeling the way in which a large number of sorting and searching algorithms work. By appealing to some elementary facts about trees, the lower bounds are obtained.

Suppose that we are given a set S of distinct values on which an ordering relation < holds. The sorting problem calls for determining a permutation of the integers 1 to n, say p(1) to p(n), such that the n distinct values from S stored in A[1:n] satisfy A[p(1)] < A[p(2)] < ··· < A[p(n)]. The ordered searching problem asks whether a given element x ∈ S occurs within the elements in A[1:n] that are ordered so that A[1] < ··· < A[n]. If x is in A[1:n], then we are to determine an i between 1 and n such that A[i] = x. The merging problem assumes that two ordered sets of distinct inputs from S are given in A[1:m] and B[1:n] such that A[1] < ··· < A[m] and B[1] < ··· < B[n]; these m + n values are to be rearranged into an array C[1:m+n] so that C[1] < ··· < C[m+n]. For all these problems we restrict the class of algorithms we are considering to those which work solely by making comparisons between elements. No arithmetic involving elements is permitted, though it is possible for the algorithm to move elements around. These algorithms are referred to as comparison-based algorithms. We rule out algorithms such as radix sort that decompose the values into subparts.

¹See Chapter 3 for more details.


10.1.1 Ordered Searching

In obtaining the lower bound for the ordered searching problem, we consider only those comparison-based algorithms in which every comparison between two elements of S is of the type "compare x and A[i]." The progress of any searching algorithm that satisfies this restriction can be described by a path in a binary tree. Each internal node in this tree represents a comparison between x and an A[i]. There are three possible outcomes of this comparison: x < A[i], x = A[i], or x > A[i]. We can assume that if x = A[i], the algorithm terminates. The left branch is taken if x < A[i], and the right branch is taken if x > A[i]. If the algorithm terminates following a left or right branch (but before another comparison between x and an element of A[ ]), then no i has been found such that x = A[i], and the algorithm must declare the search unsuccessful.

Figure 10.1 shows two comparison trees, one modeling a linear search algorithm and the other a binary search (see Algorithm 3.2). It should be easy to see that the comparison tree for any search algorithm must contain at least n internal nodes corresponding to the n different values of i for which x = A[i] and at least one external node corresponding to an unsuccessful search.

Theorem 10.1 Let A[1:n], n ≥ 1, contain n distinct elements, ordered so that A[1] < ··· < A[n]. Let FIND(n) be the minimum number of comparisons needed, in the worst case, by any comparison-based algorithm to recognize whether x ∈ A[1:n]. Then FIND(n) ≥ ⌈log(n + 1)⌉.

Proof: Consider all possible comparison trees that model algorithms to solve the searching problem. The value of FIND(n) is bounded below by the distance of the longest path from the root to a leaf in such a tree. There must be n internal nodes in all these trees corresponding to the n possible successful occurrences of x in A. If all internal nodes of a binary tree are at levels less than or equal to k, then there are at most 2^k - 1 internal nodes. Thus n ≤ 2^k - 1 and FIND(n) = k ≥ ⌈log(n + 1)⌉. □

From Theorem 10.1 and Theorem 3.2 we can conclude that binary search is an optimal worst-case algorithm for solving the searching problem.

10.1.2 Sorting

Now let's consider the sorting problem. We can describe any sorting algorithm that satisfies the restrictions of the comparison tree model by a binary tree. Consider the case in which the n numbers A[1:n] to be sorted are distinct. Now, any comparison between A[i] and A[j] must result in one of two possibilities: either A[i] < A[j] or A[i] > A[j]. So the comparison tree is a binary tree in which each internal node is labeled by the pair i : j


[Figure 10.1 Comparison trees for two searching algorithms: in the tree for linear search, x is compared with A(1), A(2), ..., A(n) in turn, with an external Failure node after each unsuccessful comparison; in the tree for binary search, x is first compared with A(⌊(n+1)/2⌋) and the search continues in the left or right half, again ending in Failure external nodes.]

which represents the comparison of A[i] with A[j]. If A[i] is less than A[j], then the algorithm proceeds down the left branch of the tree; otherwise it proceeds down the right branch. The external nodes represent termination of the algorithm. Associated with every path from the root to an external node is a unique permutation. To see that this permutation is unique, note that the algorithms we allow are only permitted to move data and make comparisons. The data movement on any path from the root to an external node is the same no matter what the initial input values are. As there are n! different possible permutations of n items, and any one of these might legitimately be the only correct answer for the sorting problem on a given instance, the comparison tree must have at least n! external nodes.

Figure 10.2 shows a comparison tree for sorting three items. The first comparison is A[1] : A[2]. If A[1] is less than A[2], then the next comparison is A[2] with A[3]. If A[2] is less than A[3], then the left branch leads to an external node containing 1, 2, 3. This implies that the original set was already sorted, for A[1] < A[2] < A[3]. The other five external nodes correspond to the other possible orderings that could yield a sorted set.

Figure 10.2 A comparison tree for sorting three items

Example 10.1 Let A[1] = 21, A[2] = 13, and A[3] = 18. At the root of the comparison tree (in Figure 10.2), 21 and 13 are compared, and as a result the computation proceeds to the right subtree. Now 13 and 18 are compared and the computation proceeds to the left subtree. Then A[1] and A[3] are compared and the computation proceeds to the right subtree to yield the permutation A[2], A[3], A[1]. □

We consider the worst case for all comparison-based sorting algorithms. Let T(n) be the minimum number of comparisons that are sufficient to sort n items in the worst case. Using our knowledge of binary trees once again, if all internal nodes are at levels less than k, then there are at most 2^k external nodes (one more than the number of internal nodes). Therefore, if we let k = T(n),

n! ≤ 2^{T(n)}

Since T(n) is an integer, we get the lower bound

T(n) ≥ ⌈log n!⌉

By Stirling's approximation (see Exercise 7) it follows that


n           1  2  3  4  5  6   7   8   9   10  11  12  13
⌈log n!⌉    0  1  3  5  7  10  13  16  19  22  26  29  33
BISORT(n)   0  1  3  5  8  11  14  17  21  25  29  33  37

Table 10.1 Bounds for minimum comparison sorting

⌈log n!⌉ = n log n - n/ln 2 + (1/2) log n + O(1)

where ln 2 refers to the natural logarithm of 2, whereas log n is the logarithm to the base 2 of n. This formula shows that T(n) is of the order n log n. Hence we say that any comparison-based sorting algorithm needs Ω(n log n) time. (This bound can be shown to hold even when operations more complex than comparisons are allowed, for example, operations such as addition, subtraction, and in some cases arbitrary analytic functions.)

How close do the known sorting methods get to this lower bound of T(n)? Consider the bottom-up version of merge sort, which first orders consecutive pairs of elements and then merges adjacent groups of size 2, 4, 8, ..., until the entire sorted set is produced. The worst-case number of comparisons required by this algorithm is bounded by

Σ_{1≤i≤k} (n/2^i)(2^i - 1) ≤ n log n - O(n)        (10.1)

Thus we know at least one algorithm that requires slightly less than n log n comparisons. Is there still a better method?

The sorting strategy called binary insertion sorting works in the following way. The next unsorted item is chosen and a binary search (see Algorithm 3.2) is performed on the sorted set to determine where to place this new item. Then the sorted items are moved to make room for the new value. This algorithm requires O(n^2) data movements to sort the entire set but far fewer comparisons. Let BISORT(n) be the number of comparisons it requires. Then by the results of Section 3.2

BISORT(n) = Σ_{1≤k≤n} ⌈log k⌉        (10.2)

which is equal to

n⌈log n⌉ - 2^{⌈log n⌉} + 1

Now suppose we compare BISORT(n) with the theoretical lower bound. This is done in Table 10.1. Scanning Table 10.1, we observe that for n = 1, 2, 3, and 4 the values are the same, so binary insertion is optimal. But for n = 5 there is a difference of one, and so we are left with the question of whether 7 or 8 is the minimum number of comparisons in the worst case needed to sort five items. This question has been answered by L. Ford, Jr., and S. Johnson, who presented a sorting algorithm that requires even fewer comparisons than the binary insertion method. In fact their method requires exactly T(n) comparisons for 1 ≤ n ≤ 11 and 20 ≤ n ≤ 21.
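The entries of Table 10.1 can be recomputed directly from the two formulas; the short Python check below (ours, not part of the text) prints the two columns side by side.

# Recompute Table 10.1: the lower bound ceil(log2 n!) versus
# BISORT(n) = sum of ceil(log2 k) for k = 1..n.
import math

def lower_bound(n):
    return math.ceil(math.log2(math.factorial(n)))

def bisort(n):
    return sum(math.ceil(math.log2(k)) for k in range(1, n + 1))

for n in range(1, 14):
    print(n, lower_bound(n), bisort(n))
# The columns first differ at n = 5: 7 versus 8.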

To see how the Ford-Johnson method works, we consider the sorting of 17 items that originally reside in SORTED[1:17]. We begin by comparing consecutive pairs SORTED[1]:SORTED[2], SORTED[3]:SORTED[4], ..., SORTED[15]:SORTED[16] and placing the larger items into the array HIGH and the smaller items into the array LOW. The item LOW[9] gets SORTED[17]. Then we sort the array HIGH using this algorithm recursively and permute the array LOW according to the same permutation. When this is done, we have LOW[1] < HIGH[1] < HIGH[2] < ··· < HIGH[8]. Though LOW[2] through LOW[9] remain unsorted, we do know that LOW[i] < HIGH[i] for 2 ≤ i ≤ 8. Now inserting LOW[2] into the sorted set possibly requires two comparisons and at the same time causes the insertion of LOW[3] to possibly require three more comparisons, for a total of five. A better approach is to insert first LOW[3] among the items LOW[1], HIGH[1], and HIGH[2] using binary insertion and then insert LOW[2]. Each insertion requires only two comparisons and the merged elements are stored back into the array SORTED. This gives us the new relationships SORTED[1] < SORTED[2] < ··· < SORTED[6] < HIGH[4] < HIGH[5] < HIGH[6] < HIGH[7] < HIGH[8] and LOW[i] < HIGH[i] for 4 ≤ i ≤ 8. Eleven items are now sorted and six remain to be merged. If we insert LOW[4] followed by LOW[5], three and four comparisons may be needed respectively. Once again it is more economical to first insert LOW[5] and then LOW[4]; each insertion requires at most three comparisons. This gives us the new situation SORTED[1] < ··· < SORTED[10] < HIGH[6] < HIGH[7] < HIGH[8] and LOW[i] < HIGH[i] for 6 ≤ i ≤ 8. Inserting LOW[7] requires only four comparisons, and then inserting LOW[8] requires five comparisons. However, if we insert LOW[9] and then LOW[8], LOW[7], and LOW[6], then each item requires at most four comparisons. We do the insertions in the order LOW[9] to LOW[6] and get the completely sorted set of 17 items.

A count of the total number of comparisons needed to sort the 17 items is 8 to compare SORTED[i]:SORTED[i+1], 16 to sort HIGH[1:8] using merge insertion recursively, 4 to insert LOW[3] and LOW[2], 6 to insert LOW[5] and LOW[4], and 16 to insert LOW[9] to LOW[6], for a total of 50. The value of ⌈log n!⌉ for n = 17 is 49, and so merge insertion requires only one more comparison than the theoretical lower bound.


In general, merge insertion can be summarized as follows. Let the items to be sorted be SORTED[1:n]. Make pairwise comparisons of SORTED[i] and SORTED[i+1]; place the larger items into an array HIGH and the smaller items into array LOW. If n is odd, then the last item of SORTED is appended to LOW. Now apply merge insertion to the elements of HIGH. Permute LOW using the same permutation. Then we know that HIGH[1] < HIGH[2] < ··· < HIGH[⌊n/2⌋] and LOW[i] < HIGH[i] for 1 ≤ i ≤ ⌊n/2⌋. Now we insert the items of LOW into the HIGH array using binary insertion. However, the order in which we insert the LOWs is important. We want to select the maximum number of items in LOW so that the number of comparisons required to insert each one into the already sorted list is a constant j. As we have seen from our example, the insertion proceeds in the order LOW[t_j], LOW[t_j - 1], ..., LOW[t_{j-1} + 1], where the t_j are a set of increasing integers. In fact t_j has the form t_j = 2^j - t_{j-1}, and in the exercises it is shown that this recurrence relation can be solved to give the formula t_j = (2^{j+1} + (-1)^j)/3.

Thus items are inserted in the order LOW[3], LOW[2]; LOW[5], LOW[4]; LOW[11], LOW[10], LOW[9], LOW[8], LOW[7], LOW[6]; and so on.

It can be shown that the time for this algorithm is

Σ_{1≤k≤n} ⌈log (3k/4)⌉        (10.3)

For n = 1 to 21, the values of this sum are

0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, 34, 38, 42, 46, 50, 54, 58, 62, 66

Comparing these values with ⌈log n!⌉, we see that merge insertion is truly optimal for 1 ≤ n ≤ 11 and n = 20, 21.
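Both the batch sizes t_j and the comparison count (10.3) are easy to tabulate; the Python check below (ours; the starting values t_0 = t_1 = 1 are taken from the closed form) verifies the closed form against the recurrence and reproduces the values just listed.

# Check t_j = 2^j - t_{j-1} against the closed form (2^(j+1) + (-1)^j)/3
# and tabulate the merge insertion comparison count (10.3).
import math

t = [1, 1]                               # t_0 = t_1 = 1 (from the closed form)
for j in range(2, 8):
    t.append(2**j - t[j - 1])
print(t[1:])                             # [1, 3, 5, 11, 21, 43, 85]
assert all(t[j] == (2**(j + 1) + (-1)**j) // 3 for j in range(8))

def merge_insertion_cost(n):
    return sum(math.ceil(math.log2(3 * k / 4)) for k in range(1, n + 1))

print([merge_insertion_cost(n) for n in range(1, 22)])
# 0, 1, 3, 5, 7, 10, 13, 16, 19, 22, 26, 30, ..., 66 as listed above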

10.1.3 Selection

From our previous discussion it should be clear that any comparison tree that models comparison-based algorithms for finding the maximum of n elements has at least 2^{n-1} external nodes since each path from the root to an external node must contain at least n - 1 internal nodes. This implies at least n - 1 comparisons, for otherwise at least two of the input items never lose a comparison and the largest is not yet found.

Now suppose we let L_k(n) denote a lower bound for the number of comparisons necessary for a comparison-based algorithm to determine the largest, second largest, ..., kth largest out of n elements, in the worst case. L_1(n) = n - 1 as previously. Since the comparison tree must contain enough external nodes to allow for any possible permutation of the input, it follows immediately that L_k(n) ≥ ⌈log n(n-1)···(n-k+1)⌉.


Theorem 10.2 L_k(n) ≥ n - k + ⌈log n(n-1)···(n-k+2)⌉ for all integers k and n, where 1 ≤ k ≤ n.

Proof: As before, internal nodes of the comparison tree contain integers of the form i : j that imply a comparison between the input items A[i] and A[j]. If A[i] < A[j], then the algorithm proceeds down the left branch; otherwise it proceeds down the right branch. Now consider the set of all possible inputs and place inputs into the same equivalence class if their k - 1 largest values appear in the same positions. There will be n(n-1)···(n-k+2) equivalence classes, which we denote by E_i, i = 1, 2, .... Now consider the external nodes for the set of inputs in the equivalence class E_i (for some i). The external nodes of the entire tree are also partitioned into classes called X_i. For all external nodes in X_i the positions of the largest, ..., (k-1)st-largest elements are identical. If we examine the subtree of the original comparison tree that defines the class X_i, then we observe that all comparisons are made on the positions of the n - k + 1 smallest elements; in essence we are trying to determine the kth-largest element. Therefore this subtree can be viewed as a comparison tree for finding the largest of n - k + 1 elements, and it has at least 2^{n-k} external nodes.

Hence the original tree contains at least n(n-1)···(n-k+2) 2^{n-k} external nodes and the theorem follows. □

EXERCISES

1. Draw the comparison tree for sorting four elements.

2. Draw the comparison tree for sorting four elements that is produced by the binary insertion method.

3. When equality between keys is permitted, there are 13 possible permutations when sorting three elements. What are they?

4. When keys are allowed to be equal, a comparison can have one of three results: A[i] < A[j], A[i] = A[j], or A[i] > A[j]. Sorting algorithms can therefore be represented by extended ternary comparison trees. Draw an extended ternary tree for sorting three elements when equality is allowed.

5. Let TE(n) be the minimum number of comparisons needed to sort n items and to determine all equalities between them. It is clear that TE(n) ≥ T(n) since the n items could be distinct. Show that TE(n) = T(n).

6. Find a comparison tree for sorting six elements that has all external nodes on levels 10 and 11.


7. Stirling's approximation is n! ≈ √(2πn) (n/e)^n e^{1/(12n)}. Show how this approximation is used to demonstrate that ⌈log n!⌉ = n log n - n/(ln 2) + (1/2) log n + O(1).

8. Prove that the closed form for BISORT(n) = n⌈log n⌉ - 2^{⌈log n⌉} + 1 is correct.

9. Show that log(n!) is approximately equal to n log n - n log e + O(1) by using the fact that the function log k is monotonic and bounded below by ∫ from k-1 to k of log x dx.

10. Show that the sum 2^k - 2^{k-1} + 2^{k-2} - ··· + (-1)^k 2^0 = (2^{k+1} + (-1)^k)/3.

11. A partial ordering is a binary relation, denoted by ≤, such that if x ≤ y and y ≤ z, then x ≤ z, and if x ≤ y and y ≤ x, then x = y. A total ordering is a partial ordering such that for all x and y, either x ≤ y or y ≤ x. How can a directed graph be used to model a partial ordering or a total ordering?

12. Let A[1:n] and B[1:n] each contain n unordered elements. Show that if comparisons between pairs of elements of A or B are not allowed, then Ω(n^2) operations are required to test whether the elements of A are identical (though possibly a permutation) of the elements of B.

13. In the derivation of the Ford-Johnson sorting algorithm, the sequence t_j must be determined. Explain why t_j + t_{j-1} = 2^j. Then show how to derive the formula t_j = (2^{j+1} + (-1)^j)/3.

10.2 ORACLES AND ADVERSARY ARGUMENTS

One of the proof techniques that is useful for obtaining lower bounds consists of making use of an oracle. The most famous oracle in history was called the Delphic oracle, located in Delphi, Greece. This oracle can still be found, situated in the side of a hill embedded in some rocks. In olden times people would approach the oracle and ask it a question. After some period of time elapsed, the oracle would reply and a caretaker would interpret the oracle's answer.

A similar phenomenon takes place when we use an oracle to establish a lower bound. Given some model of computation such as comparison trees, the oracle tells us the outcome of each comparison. To derive a good lower bound, the oracle tries its best to cause the algorithm to work as hard as it can. It does this by choosing as the outcome of the next test the result that causes the most work to be required to determine the final answer. And by keeping track of the work that is done, a worst-case lower bound for the problem can be derived.

10.2.1 Merging

Now we consider the merging problem. Given the sets A[1:m] and B[1:n], where the items in A and the items in B are sorted, we investigate lower bounds for algorithms that merge these two sets to give a single sorted set. As was the case for sorting, we assume that all the m + n elements are distinct and that A[1] < A[2] < ··· < A[m] and B[1] < B[2] < ··· < B[n]. It is possible that after these two sets are merged, the n elements of B can be interleaved within A in every possible way. Elementary combinatorics tells us that there are (m+n choose n) ways that the A's and B's can merge together while preserving the ordering within A and B. For example, if m = 3, n = 2, A[1] = x, A[2] = y, A[3] = z, B[1] = u, and B[2] = v, there are (3+2 choose 2) = 10 ways in which A and B can merge: u,v,x,y,z; u,x,v,y,z; u,x,y,v,z; u,x,y,z,v; x,u,v,y,z; x,u,y,v,z; x,u,y,z,v; x,y,u,v,z; x,y,u,z,v; and x,y,z,u,v.

Thus if we use comparison trees as our model for merging algorithms, then there will be (m+n choose n) external nodes, and therefore at least

⌈log (m+n choose n)⌉

comparisons are required by any comparison-based merging algorithm. The conventional merging algorithm that was given in Section 3.4 (Algorithm 3.8) takes m + n - 1 comparisons. If we let MERGE(m, n) be the minimum number of comparisons needed to merge m items with n items, then we have the inequality

⌈log (m+n choose n)⌉ ≤ MERGE(m, n) ≤ m + n - 1

The exercises show that these upper and lower bounds can get arbitrarily far apart as m gets much smaller than n. This should not be a surprise because the conventional algorithm is designed to work best when m and n are approximately equal. In the extreme case when m = 1, we observe that binary insertion would require the fewest number of comparisons needed to merge A[1] into B[1], ..., B[n].

When m and n are equal, the lower bound given by the comparison tree model is too low, and the number of comparisons for the conventional merging algorithm can be shown to be optimal.

Theorem 10.3 MERGE(m, m) = 2m - 1, for m ≥ 1.

Theorem10.3MERGE(m,m)= 2m -1,for m > 1.


Proof: Consider any algorithm that merges the two sets A[1] < ··· < A[m] and B[1] < ··· < B[m]. We already have an algorithm that requires 2m - 1 comparisons. If we can show that MERGE(m, m) ≥ 2m - 1, then the theorem follows. Consider any comparison-based algorithm for solving the merging problem and an instance for which the final result is B[1] < A[1] < B[2] < A[2] < ··· < B[m] < A[m], that is, for which the B's and A's alternate. Any merging algorithm must make each of the 2m - 1 comparisons B[1]:A[1], A[1]:B[2], B[2]:A[2], ..., B[m]:A[m] while merging the given inputs. To see this, suppose that a comparison of type B[i]:A[i] is not made for some i. Then the algorithm cannot distinguish between the previous ordering and the one in which

B[1] < A[1] < ··· < A[i-1] < A[i] < B[i] < B[i+1] < ··· < B[m] < A[m]

So the algorithm will not necessarily merge the A's and B's properly. If a comparison of type A[i]:B[i+1] is not made, then the algorithm will not be able to distinguish between the case in which B[1] < A[1] < B[2] < ··· < B[m] < A[m] and the one in which B[1] < A[1] < B[2] < A[2] < ··· < A[i-1] < B[i] < B[i+1] < A[i] < A[i+1] < ··· < B[m] < A[m]. So any algorithm must make all 2m - 1 comparisons to produce this final result. The theorem follows. □

10.2.2 Largest and Second Largest

For another example that we can solve using oracles, consider the problem of finding the largest and the second largest elements out of a set of n. What is a lower bound on the number of comparisons required by any algorithm that finds these two quantities? Theorem 10.2 has already provided us with an answer using comparison trees. An algorithm that makes n - 1 comparisons to find the largest and then n - 2 to find the second largest gives an immediate upper bound of 2n - 3. So a large gap still remains.

This problem was originally stated in terms of a tennis tournament in which the values are called players, the largest value is interpreted as the winner, and the second largest as the runner-up. Figure 10.3 shows a sample tournament among eight players. The winner of each match (which is the larger of the two values being compared) is promoted up the tree until the final round, which, in this case, determines McMahon as the winner. Now, who are the candidates for second place? The runner-up must be someone who lost to McMahon but who did not lose to anyone else. In Figure 10.3 that means that either Guttag, Rosen, or Francez is a possible candidate for second place.

Figure 10.3 leads us to another algorithm for determining the runner-up once the winner of a tournament has been found. The players who have lost to the winner play a second tournament to determine the runner-up. This second tournament need only be replayed along the path that the winner, in


[Figure 10.3 A tennis tournament: in the first round Rosen beats Cline, McMahon beats Francez, Guttag beats Taylor, and Daks beats Lynch; in the second round McMahon beats Rosen and Guttag beats Daks; in the final McMahon beats Guttag.]

this case McMahon, followed as he rose through the tree. For a tournament with n players, there are ⌈log n⌉ levels, and hence only ⌈log n⌉ - 1 comparisons are required for this second tournament. This new algorithm, which was first suggested by J. Schreier in 1932, requires a total of n - 2 + ⌈log n⌉ comparisons. Therefore we have an identical agreement between the known upper and lower bounds for this problem.
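The tournament method just described translates directly into code. The sketch below (ours; names are illustrative) counts comparisons and, for n a power of two, uses exactly n - 2 + ⌈log n⌉ of them: n - 1 in the knockout rounds and ⌈log n⌉ - 1 to replay the winner's opponents.

# Sketch of the second-tournament method: largest and runner-up with
# n - 2 + ceil(log2 n) comparisons when n is a power of two.
import math, random

def largest_and_runner_up(items):
    comparisons = 0
    beaten_by = {x: [] for x in items}   # players that x has defeated
    round_ = list(items)
    while len(round_) > 1:               # the knockout tournament: n - 1 comparisons
        nxt = []
        for i in range(0, len(round_), 2):
            a, b = round_[i], round_[i + 1]
            comparisons += 1
            winner, loser = (a, b) if a > b else (b, a)
            beaten_by[winner].append(loser)
            nxt.append(winner)
        round_ = nxt
    winner = round_[0]
    # The runner-up is among the ceil(log2 n) players the winner defeated.
    candidates = beaten_by[winner]
    runner_up = candidates[0]
    for x in candidates[1:]:             # ceil(log2 n) - 1 more comparisons
        comparisons += 1
        if x > runner_up:
            runner_up = x
    return winner, runner_up, comparisons

vals = random.sample(range(1000), 16)
w, r, c = largest_and_runner_up(vals)
assert [w, r] == sorted(vals)[-1:-3:-1]
print(c, 16 - 2 + math.ceil(math.log2(16)))   # both are 18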

Now we show how the same lower bound can be derived using an oracle.

Theorem 10.4 Any comparison-based algorithm that computes the largest and second largest of a set of n unordered elements requires n - 2 + ⌈log n⌉ comparisons.

Proof: Assume that a tournament has been played and the largest element and the second-largest element obtained by some method. Since we cannot determine the second-largest element without having determined the largest element, we see that at least n - 1 comparisons are necessary. Therefore all we need to show is that there is always some sequence of comparisons that forces the second largest to be found in ⌈log n⌉ - 1 additional comparisons.

Suppose that the winner of the tournament has played x matches. Then there are x people who are candidates for the runner-up position. The runner-up has lost only once, to the winner, and the other x - 1 candidates must have lost to one other person. Therefore we produce an oracle that decides the results of matches in such a way that the winner plays ⌈log n⌉ other people.

In a match between a and b the oracle declares a the winner if a is previously undefeated and b has lost at least once, or if both a and b are undefeated but a has won more matches than b. In any other case the oracle can decide arbitrarily, as long as it remains consistent.

Now consider a tournament in which the outcome of each match is determined by the above oracle. Imagine drawing a directed graph with n vertices corresponding to this tournament. Each vertex corresponds to one of the n players. Draw a directed edge from vertex b to a, b ≠ a, if and only if either player a has defeated b or a has defeated another player who has defeated b. It is easy to see by induction that any player who has played and won only x matches can have at most 2^x - 1 edges pointing into her or his corresponding node. Since for the overall winner there must be an edge from each of the remaining n - 1 vertices, it follows that the winner must have played at least ⌈log n⌉ matches. □

10.2.3 State Space Method

Another technique for establishing lower bounds that is related to oracles is the state space description method. Often it is possible to describe any algorithm for solving a given problem by a set of n-tuples. A state space description is a set of rules that show the possible states (n-tuples) that an algorithm can assume from a given state and a single comparison. Once the state transitions are given, it is possible to derive lower bounds by arguing that the finish state cannot be reached using any fewer transitions. As an example of the state space description method, we consider a problem originally defined and solved in Section 3.3: given n distinct items, find the maximum and the minimum. Recall that the divide-and-conquer-based solution required ⌈3n/2⌉ - 2 comparisons. We would like to show that this algorithm is indeed optimal.

Theorem 10.5 Any algorithm that computes the largest and smallest elements of a set of n unordered elements requires ⌈3n/2⌉ - 2 comparisons.

Proof: The technique we use to establish a lower bound is to define an oracle by a state table. We consider the state of a comparison-based algorithm as being described by a 4-tuple (a, b, c, d), where a is the number of items that have never been compared, b is the number of items that have won but never lost, c is the number of items that have lost but never won, and d is the number of items that have both won and lost. Originally the algorithm is in state (n, 0, 0, 0) and concludes with (0, 1, 1, n - 2). Then, after each comparison the tuple (a, b, c, d) can make progress only if it assumes one of the five possible states shown in Figure 10.4.

To get the state (0, 1, 1, n - 2) from the state (n, 0, 0, 0), ⌈3n/2⌉ - 2 comparisons are needed. To see this, observe that the quickest way to get the a component to zero requires n/2 state changes, yielding the tuple (0, n/2, n/2, 0). Next the b and c components are reduced; this requires an additional n - 2 state changes. □
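For comparison with this lower bound, here is a sketch (ours) of a pairing strategy that meets it; the divide-and-conquer algorithm of Section 3.3 achieves the same count. The items are first compared in pairs, then the winners are scanned for the maximum and the losers for the minimum, which gives ⌈3n/2⌉ - 2 comparisons in total.

# Sketch of a pairing strategy for max and min that uses
# ceil(3n/2) - 2 comparisons, matching the lower bound just proved.

def max_min(a):
    comparisons = 0
    winners, losers = [], []
    for i in range(0, len(a) - 1, 2):     # compare the items in pairs
        comparisons += 1
        if a[i] > a[i + 1]:
            winners.append(a[i])
            losers.append(a[i + 1])
        else:
            winners.append(a[i + 1])
            losers.append(a[i])
    if len(a) % 2:                        # odd n: the last item joins both lists
        winners.append(a[-1])
        losers.append(a[-1])
    mx, mn = winners[0], losers[0]
    for x in winners[1:]:                 # the maximum is among the winners
        comparisons += 1
        if x > mx:
            mx = x
    for x in losers[1:]:                  # the minimum is among the losers
        comparisons += 1
        if x < mn:
            mn = x
    return mx, mn, comparisons

print(max_min([60, 12, 41, 95, 7, 88]))   # (95, 7, 7); ceil(3*6/2) - 2 = 7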


(a - 2, b + 1, c + 1, d)                                              if a ≥ 2    // Two items from a are compared.
(a - 1, b, c + 1, d) or (a - 1, b + 1, c, d) or (a - 1, b, c, d + 1)  if a ≥ 1    // An item from a is compared with one from b or c.
(a, b - 1, c, d + 1)                                                  if b ≥ 2    // Two items from b are compared.
(a, b, c - 1, d + 1)                                                  if c ≥ 2    // Two items from c are compared.

Figure 10.4 States for max-min problem

10.2.4 Selection

We end this section by deriving another lower bound on the selection problem. We originally studied this problem in Chapter 3, where we presented several solutions. One of the algorithms presented there has a worst-case complexity of O(n) no matter what value is being selected. Therefore we know that asymptotically any selection algorithm requires Θ(n) time. Let SEL_k(n) be the minimum number of comparisons needed for finding the kth element of an unordered set of size n. We have already seen that for k = 1, SEL_1(n) = n - 1 and, for k = 2, SEL_2(n) = n - 2 + ⌈log n⌉. In the following paragraphs we present a state table that shows that

n - k + (k - 1)⌈log (n/(2(k-1)))⌉ ≤ SEL_k(n)

We continue to use the terminology that refers to an element of the set as a player and to a comparison between two players as a match that must be won by one of the players. A procedure for selecting the kth-largest element is referred to as a tournament that finds the kth-best player.

To derive this lower bound on the selectionproblem,an oracleisconstructed in the form of astatetransitiontablethat will causeany comparison-basedalgorithmtomakeat leastn \342\200\224 k+(k\342\200\224 1) log2(k-\\:) comparisons.Thetuplesizefor statesin this caseis two, (it was four for the max-minproblem),and the componentsof a tuple,say (Map,Set),areMap,a mappingfromthe integers1,2,...,nonto itself, and Set,an orderedsubsetof the input.The initialstate is the identity mapping(that is,Map(i) = 1,1< i < n)and the empty set.Intuitively, at any given time,the players in Setarethetop players (from amongall).In particular,the ith player that entersSetis the \302\253th-best player. Candidatesfor enteringSetarechosenaccordingtotheir Map values. At any timeperiodt the oracleis assumedto be giventwo unorderedelementsfrom the input, say a and b, and the oracleactsasfollows:

Page 482: Sahni

472 CHAPTER10.LOWER BOUNDTHEORY

1.If a and b areboth in Setat timet, then a wins iff a > b. Thetuple(Map,Set) remainsunchanged.

2. If a is in Setand b is not in Set,then a wins and the tuple (Map,Set)remainsunchanged.

3.If a and b are both not in Setand if Map(a) > Map(b) at time t,then a wins. If Map(a) = Map(b), then it doesn'tmatterwho winsas long as no inconsistencywith any previousdecisionis made. Ineithercase,if Map(a) + Map(b)> n/(k \342\200\224 1) at timet, then Map isunchangedand the winner is insertedinto Setas a new member.IfMap(a)+Map(b)< n/(k \342\200\224 1),Setstays the sameand we setMap(theloser):=0 at timet +1and Map(thewinner):=Map(a)+Map(b)attimet + 1and, for all itemsw,w ^ a,w ^ b, Map(w)stays the same.

Lemma 10.1Usingthe oraclejustdefined, the k \342\200\224 1bestplayers will have

played at least (k \342\200\224 1) log^\".n matcheswhen the tournament is

completed.

Proof:At timet the numberof matcheswon by any player x is greaterthan or equal to [logMap(x)~\\. The elementsin Setare orderedsothatx\\ < \342\226\240\342\226\240\342\226\240< Xj. Now for all w in the input J2W Map(w)= n.Let W = {y :y isnot in Setbut Map(y) >0}.Sincefor allw in the input Map(w)<n/(k \342\200\224 l),it follows that the sizeof Setplusthe sizeof W isgreaterthan k\342\200\2241. However,sincethe elementsy in W can only belessthan someX{ in Set,if the sizeofSetis lessthan k \342\200\224 1at the endof the tournament,then any player in SetorW is a candidateto beoneof the k \342\200\224 1bestplayers.This is a contradiction,soit follows that at the end of the tournament \\Set\\ > k \342\200\224 1.

Let x be any elementof Set.If it has enteredSetby defeatingy, it canonly be becauseMap(x)+ Map(y) > n/(k \342\200\224 1) and Map(x)> Map(y).This in turn means that Map(x) > 2(k-i\\ implying that x has played at

least log2(k-i\\ matches.This is true for every memberof Setand \\Set\\ >(k-l). D

We arenow in a positionto establishthe main theorem.

Theorem10.6[Hayfil] Thefunction SELk(n)satisfies

SELfc(n) > n-k+ (k-l) log2(fc-l)Proof:Accordingto Lemma10.1,the k\342\200\2241 best players have played atleast(k \342\200\224 1) log2(k-i} matches.Any player who is not amongthe k best

Page 483: Sahni

10.2.ORACLESAND ADVERSARYARG UMENTS 473

players has lost at leastone match against a player who is not amongthek \342\200\224 1best.Thus therearen \342\200\224 k additionalmatchesthat were not includedin the countof the matchesplayed by the k \342\200\224 1top players.The theoremfollows. \342\226\241

EXERCISES1.Let rn = an. Then by Stirling'sapproximationlog (\302\260\"^n)

= n[(l+a) log (1+ a) \342\200\224 aloga] \342\200\224 ^ logn+ 0(1).Show that as a \342\200\224>\342\226\240 0, thedifferencebetweenthis formula and m + n \342\200\224 1gets arbitrarily large.

2.Let F(n)be the minimum numberof comparisons,in the worst case,neededto insert B[l]into the orderedset A[l] < A[2] < \342\226\240\342\226\240\342\226\240< A[n).Proveby inductionthat F(n)> [logn+ 1].

3. A searchprogramis a finite sequenceof instructionsof threetypes: (1)if (f(x) r 0) gotoLI;elsegotoL2;,wherer is either<,>,or = andx is a vector;(2)accept;and (3)reject.Thesum of subsetsproblemasks for a subsetI of the integers1,2,...,n for the inputsw\\,...,wnsuch that J2iei(wi)= b, whereb is a given number.Considersearchprogramsfor which the function / is restrictedso that it canonly makecomparisonsof the form

Z>\302\253)= b (10.4)

iei

Usingthe adversary techniqueD. Dobkinand R. Liptonhave shownthat f2(2n) suchoperationsare requiredto solve the sum of subsetsproblem(ioi,...,wn, b). Seeif you can derive their proof.

4. [W. Miller]

(a) Let (TV, R) denotethe reflexive transitiveclosureof a directedgraph (TV, E).Thus (u,v) is an edgein R if thereis a path fromu to v usingzeroor moreedgesin E. Show that R is a partialorderon N iff (TV, E) is acyclic.

(b) Provethat (TV, E U (it, v)) is acyclic iff (N,E) is acyclicand thereis no path from v to u usingedgesin E.

(c) Provethat if (TV, E) is acyclic and u and v aredistinctelementsof TV, then (N,EU{u,v))or (N,E\\J{v,u))) is acyclic.

(d) Show that it is natural to think of an oracleas constructinganacyclic digraphon the set N of players.Interpret(b) and (c) asrulesgoverninghow the oraclemay resolvematches.

Page 484: Sahni

474 CHAPTER10.LOWER BOUNDTHEORY

10.3LOWERBOUNDSTHROUGH REDUCTIONS

Herewe discussa very importanttechniquethat canbeusedto derive lowerbounds.This techniquecallsfor reducingthe given problemto anotherproblemfor which a lower bound is already known.

Definition10.1Let Piand P2 beany two problems.We say Pi reducestoP2 (alsowrittenPi ocP2) in timeT{n)if an instanceof Pi canbeconvertedinto an instanceof P2 and a solutionfor Pi canbeobtainedfrom a solutionfor P2 in time< i~(n). \342\226\241

Example10.2Let Pi be the problemof selection(discussedin Section3.6)and P2 be the problemof sorting.Let the input have n numbers.Ifthe numbersaresorted,say in an array A[ ],the \302\253th-smallest elementof theinput can beobtainedas A[i]. Thus Pi reducesto P2 in 0(1)time. \342\226\241

Example10.3Let 5i and 52 be two setswith m elementseach. TheproblemPi is to checkwhetherthe two setsare disjoint,that is,whetherSifl 52 = 0.P2 is the sortingproblem.We can show that Pi ocP2 in 0(m)timeas follows.

Let Si = {ki,k2,...,km} and 52 = {^1,^2,\342\200\242\342\200\242\342\200\242

,\302\243m}-The instanceof

P2 to be createdhas n = 2m and the sequenceof keys to be sortedisX = (ki,l),(k2,l),...,(km,l),(ii,2),(i2,2),...,(\302\243m,2).

In otherwords,eachkey in X is a tuple and the sorting has to be done in lexicographicorder. The conversionof Pi to P2 takes0{m)time,sinceit involves thecreationof 2m tuples.

Now we have to show that a solutionto Pi is obtainablefrom the solutionto P2 in 0(m)time. Let X' be X in sortedorder. OnceX' has beencomputed,what remainsto bedoneis to sequentiallygothoughtheelementsof X' from left to right and checkwhethertherearetwo successiveelements(x,1) and (y,2) suchthat x = y. If thereareno suchelements,then Siand52aredisjoint;otherwisethey arenot. \342\226\241

Note: If Pi reducesto P2 in T(n) timeand if T{n) is a lower bound onthe solutiontimeof Pi,then, clearly, T(n) \342\200\224 r{n) is a lower bound on thesolutiontimeof P2.In Example10.3,we can infer that P2 hasa lower boundof T(n) \342\200\224 0(n),whereT(n) is a lower bound for the solutionof Pi on twosetswith m = n/2elementseach. In Chapter11we revisit the notion ofreductionin the contextof AfV-h&vd problems.

Now we presentseveralexamplesto illustratethe above techniqueofreduction in derivinglower bounds.Thefirst problemis that of computingtheconvexhull of pointsin the plane(seeSection3.8)and the secondproblemis Pi of Example10.3.

Page 485: Sahni

10.3.LOWER BOUNDSTHROUGHREDUCTIONS 475

10.3.1Findingthe ConvexHullGivenn pointsin the plane,recallthat the convex hull problemis to identifythe verticesof the hull in someorder(clockwiseor counterclockwise).Wenow show that any algorithmfor this problemtakesfi(nlogn)time. If wecallthe convexhull problemP<i and the problemof sortingPi,then thislower bound is obtainedby showing that P\\ reducesto P<i in 0(n)time.Sincesortingof n pointsneedsfi(nlogn)time(seeSection10.1),the resultreadily follows.

Let P\\ bethe problemof sortingthe sequenceof numbersK = k\\, A,-2,...,kn. P2 takesas input points in the plane.We convert the n numbersintothe n points(&i,k\\), (\302\243,'2, k%), \342\226\240\342\226\240

\342\226\240, (kn, k\\). Constructionof thesepointstakes0(n)time.Thesepoints lieon the parabolay = x2.

Theconvex hull of thesepointshas all the n pointsas verticesandmoreover they areorderedaccordingtotheir ^-coordinatevalues (that is,k%

values). If X =(\302\2431, \302\243'(), (\302\2432, ?%),...,(\302\243n, \302\243n)

is the output (in counterclockwiseorder) for P2,we can identify the point of X with the leastx-coordinatein0(n)time.If

(\302\243',\302\243' ) is this point,the sortedorderof K is the x-coordinates

of points in X starting from \302\243' and moving counterclockwise.Thusdetermining the output for Pi alsotakes0(n)time.

Example10.4Considerthe problemof sortingthe numbers2, 3,1,and 4.The four pointscreatedare (2,4),(3,9),(1,1),and (4,16).Theconvexhullof thesepointsis shown in Figure10.5.All the four pointsareonthe hull. Acounterclockwiseorderingof the hull is (3,9),(4,16),(1,1),and (2,4),fromwhich the sortedorderof the pointscan beretrieved. \342\226\241

Thereforewe arrive at the following lemma.

Lemma 10.2Computingthe convexhull of n given points in the planeneedsfi(nlogn)time. \342\226\241

10.3.2DisjointSetsProblemIn Example10.3we showed a reductionfrom the problemPi of decidingwhethertwo given setsaredisjointto the problemP2 of sorting.Thisreduction canthen beusedto derivea lower boundon the solutiontimeof P2,makinguse of any known lower boundsfor the solutionof Pi.But the onlylower boundwe have proved so far is for P2 and not for Pi.Now we wouldlike toderive a lower bound for Pi.

Lemma 10.3Any algorithmfor solvingPi,when the setsareof sizen each,needsfi(nlogn)time.

Page 486: Sahni

476 CHAPTER10.LOWER BOUNDTHEORY

161412108

64

2 12 3 4

Figure10.5Sortingnumbersusingthe convex hull reduction

Proof:We show that P2 ocPi in 0(n)time.Let K = k\\, fc2,...,kn be anysequenceof n numbersthat we want to sort.Also let X = x\\,X2, \342\226\240\342\226\240

\342\226\240, xn bethe sortedorderof the sequenceK. To constructan instanceof Pi,we letSi= {(fci,0),(k2,0),...,(kn,0)}and S2 = {(fc1}1),(fc2,1),\342\200\242\342\200\242

\342\200\242, (kn, 1)}-Any algorithmfor Pi must compare(xj,1) with (xj+i,0)for each

\302\253,

1<z < n \342\200\224 1.If not, we can replace(x{,1) in 52 with (xj_|_i,0) and force thealgorithmto output an incorrectanswer. Replacing(xj,1) with (xj+i,0)doesnot in any way alterthe outcomesof othercomparisonsmadeby thealgorithm.

Our claimis that the above comparisonsare sufficient to sortK. Thesortedorderof Kcanbeobtainedby constructingagraphG(V,E) as follows:V = {k\\, fo,...,kn}and thereis adirectededgefrom ki to kj if the algorithmfor Pi has determinedthat ki < kj (for any 1 < i,j < n). This graphis constructiblein 0(n+ T(n)) time,whereT(n) is the run timeof thealgorithmfor Pi. To obtain the elementsof K in sortedorder,find thesmallestelementx\\ in 0{n)time.Findthe smallestamongall the neighborsof x\\ in G;this will bex2. Then find the smallestamongall the neighborsof X2 in G; this will be X3. And soon. The total timespent is clearly0(n+ \\V\\ + \\E\\) =0{n+T{n)). \342\226\241

Page 487: Sahni

10.3.LOWER BOUNDSTHROUGHREDUCTIONS 477

10.3.3On-lineMedianFindingIn this problem,at every timestepwe aregiven a new key. We arerequiredto computethe medianof all the given keysbeforethe nextkey is given.So,if n keys aregiven in all,we have to computen medians.

Example10.5Let n = 7. Say the first key input is 7. We output themedianas 7. After this we are given the secondkey, say 15.Themedianis either7 or 15.Say the next 5 keys input are3,17,8,11,and 5, in thisorder.The correspondingmediansoutput will be 7,7 or 15,8,8or 11,and8, respectively. \342\226\241

Lemma 10.4The on-linemedian finding problem(call it P2) requiresfi(nlogn)timefor its solution,wheren is the totalnumberof keys input.

Proof:The lemmais proved by reducingPi to P2,whereP\\ is the problemof sorting. Let Pi be the problemof sorting the sequenceof keys K =ki,k2,---,kn.The instanceof P2 shouldbe such that eachkey of K isan on-linemedian.Also, the smallestkey of K shouldbe output beforethesecond-smallestkey, which in turn shouldbeoutput beforethe third smallestkey, and soon.Suchan instanceof P2 is createdby extendingthe sequenceK with \342\200\224 00s and 00s as follows:

\342\200\224 OO,\342\200\224OO,..., \342\200\224OO, k\\, &2, \342\200\242\342\200\242\342\200\242

> km OO,OO,OO,OO,..., OOOO

That is,the input consistsof n \342\200\224 00sfollowed by K and then 2n 00s.Thisinstanceof P2 can be generatedin 0(n)time. ThesolutiontoP2 consistsof An medians.Note that when the first 00 is input, the medianof all givenkeys is the smallestkey of K. Also,when the third 00 is input, the medianisthe second-largestkey of K. And soon.In other words the sortedorderofK is nothingbut the In+1stmedian,the In+3rdmedian,...,the An \342\200\224 1stmedianoutput by any algorithmfor P2.Therefore,Pi can be solved givena solutionto P2,and hencePi reducesto P2 in 0(n)time.

If S(n) is the timeneededtosortn keys and M(m) is the solutiontimeof P2 on a total of rn keys, then the above reductionshows that S(n) <M(An) + 0(n). But we know that S(n) > enlogn for someconstant c.Therefore,M(An) > enlogn -0{n).That is,M(n) > c\\ logf -0(f);thisimpliesthat M(n) = fi(nlogn). \342\226\241

10.3.4MultiplyingTriangularMatricesAn n x n matrixA whose elementsare {fly},1< i,j < n, is saidto beupper triangularif a^ = 0 whenever i >j. It is said to be lower triangularifciij = 0 for i <j. A matrixthat is eitheruppertriangularorlower triangularis saidtobetriangular.

Page 488: Sahni

478 CHAPTER10.LOWER BOUNDTHEORY

We areinterestedin derivinglower boundsfor the problemofmultiplying

two triangularmatrices.We restrictour discussionto lower triangularmatrices.But the resultsto be derived hold for upper triangularmatricesalso.Let A and B be two n x n lower triangularmatrices.If M(n) is thetimeneededtomultiply two n x n full matrices(callthis problemPi)andMt(n) is the timeneededto multiply two lower triangularmatrices(callthisproblemP2), then clearly M(n) > Mt(n). More interestingly, it turns outthat Mt(n) = f2(M(n));that is,the problemof multiplying two triangularmatricesis asymptotically no easierthan multiplying two full matricesofthe samedimensions.

Lemma 10.5Mt(n) = fi(M(n)).

Proof:We show that Pi reducestoP2 in 0{n2)time. Notethat M(n) =fi(n2) sincethereare 2n2 elementsin the input and n2 in the output.Letthe two matricesto be multipliedbe A and B and of sizen x n each.Theinstanceof P2 to becreatedis the following:

A'OOOOOOO A O

B'OOOBOOOOO

HereO stands for the zeromatrix,that is,an n x n matrixall of whoseentriesarezeros.Both A' and B'areof size3n x 3n each.Multiplying thetwo matrices,we get 'OOO

A'B'= OOOAB O O

Thus the productAB is easily obtainablefrom the productA'B'.Problem Pi reducesto P2 in 0(n2)time. This reductionimpliesthat M(n) <Mt(3n)+0{n2);this in turn meansMt(n) >

M(\302\247)

-0(n2).SinceM(n) =n(n2), M(\302\247)

= fi(M(n)). Hence,Mt{n)= fi(M(n)). \342\226\241

Note that the above lemmaalsoimpliesthat Mt{n)= Q(M(n)).

10.3.5Invertinga LowerTriangularMatrixLet A be an n x n matrix.Also, let I be the n x n identity matrix,that is,the matrixfor which i^k =

1\302\273

f\302\260r 1< k <n, and whoseevery otherelementis zero.The elementsa^k of any matrixA arecalledthe diagonalelementsof A. Every elementof I is zeroexceptfor the diagonalelementswhich areall ones.If thereexistsan n x n matrixB suchthat AB = I, then we sayB is the inverseof A and A is saidto be invertible.The inverse of A is

Page 489: Sahni

10.3.LOWER BOUNDSTHROUGHREDUCTIONS 479

alsodenotedas A 1.Not every matrixis invertible.Forexample,the zeromatrixhas no inverse.

In thissectionwe concentrateonobtaininga lower boundon the inversiontimeof a lower triangularmatrix.It canbe shown that a triangularmatrixis invertibleif and only if all itsdiagonalelementsarenonzero.Let Pi betheproblemof multiplying two full matricesand P2 be the problemof invertinga lower triangularmatrix.Let M(n) and It(n) be the correspondingtimecomplexities.We show that M(n) = Q(It(n)).

Lemma 10.6M(n) = 0(It(n)).

Proof:Theclaimis that Pi reducesto P2 in 0(n2)time,from which thelemmafollows.Let A and Bbe the two n xn full matricesto bemultiplied.We constructthe following lower triangularmatrixin 0(n2)time:

CIOOBIOO A I

wherethe O'sand 7'saren x n zeromatricesand identity matrices,respectively. C is a 3n x 3n matrix.Theinverseof C is

C~I

-BAB

0 0I 0

-A I

where \342\200\224A referstoA with all the elementsnegated.Herealsowe seethatthe productAB is obtainableeasily from the inverse of C. Thus we getM(n) < It(3n)+0(n2),and henceM(n) = 0(It(n)). \342\226\241

Lemma 10.7It(n) = 0(M(n)).

Proof:Lot A be the nxn lower triangularmatrixto be inverted.PartitionA into four submatricesof size eachas follows:

An OA21 A22

Both An and A22 are lower triangularmatricesand A21 couldpossibly bea full matrix.The inverseof A can beverified tobe

~A22 -A21.A11 A22

Page 490: Sahni

480 CHAPTER10.LOWER BOUNDTHEORY

The above equationsuggestsa divide-and-conqueralgorithmfor invertingA. To invert A which is of sizen x n, it suffices to invert two lowertriangular matrices(An and A22) of size|x|eachand performtwo matrixmultiplications(i.e.,computeD=A^2 (A2iA^l )) and negatea matrix(D).

2D can be negatedin *j- time. Therun timeof sucha divide-and-conqueralgorithmsatisfies the following recurrencerelation:

Usingrepeatedsubstitution,we get

/((n)<2MQ+22M^)+...+ 0(n2)

SinceM(n) = f2(n2),the above simplifiesto

It{n)= 0(M(n)+n2)= 0(M(n)).D

Lemmas10.6and 10.7togetherimply that It(n) = 6(M(n)).

10.3.6Computingthe TransitiveClosureLet G be a directedgraph whoseadjacencymatrixis A. Recallthat thereflexive transitiveclosure(or simply the transitiveclosure)of G,denotedA*, is a matrixsuchthat A*(i,j)= 1if and only if thereis a directedpathof lengthzeroor morefrom nodei to nodej in G.In this sectionwe wishto computelower boundson the computingtimeof A* given A.

In the following discussionwe assumethat all the diagonalelementsof Aarezeros.Thereis an interestingrelationshipbetweenthe different powersof A and A* capturedin the following lemma

Lemma 10.8Let A be the adjacencymatrixof a given directedgraph G.Then, Ak(i:j)= 1 if and only if thereis a path from nodei to nodej oflengthexactlyequalto k, for any 0 < k <n. Herethe matrixproductsareinterpretedas follows:Scalaradditioncorrespondsto booleanor and scalarmultiplicationcorrespondsto booleanand.

Proof:We prove the lemmaby inductionon k. When k = 0,A0 is theidentity matrixand the lemmaholds,sincethereis a path of lengthzerofromevery nodeto itself.When k = 1,the lemmais alsotrue,sinceA(i,j)= 1if and only if thereis an edgefrom i to j. Assume that the lemmais truefor all path lengthsup to k \342\200\224 1,k > 1.We prove it for a path lengthof k.

Page 491: Sahni

10.3.LOWER BOUNDSTHROUGHREDUCTIONS 481

If thereis a path of length k from nodei to nodej, this path consistsof an edgefrom nodei to someother nodeq and then a path from q to jof lengthk \342\200\224 1.In otherwords, usingthe inductionhypothesis, thereexistsa q such that A(i,q) = 1and A^ (g,j) = 1. If there is such a g, thenAfc(z, j) = (A* Ak~l)(i,j)is surely 1.

Conversely,if Ak(i,j) = 1,then sinceAk = A * Ak~l, thereexistsa gsuch that A(i,q) = 1and Ak~1(q,j)= 1.This meansthat thereis a pathfrom nodei to nodej of length fc. \342\226\241

If thereis a path at all from nodei to nodej in G,the shortestsuchpathwill beof length< (n \342\200\224 1),n beingthe numberof nodesin G.

Lemma 10.9A* I + A +A2 + ---+An~l. \342\226\241

Lemma10.9gives another algorithmfor computingA*. (We saw search-basedalgorithmsin Section6.3.)Let T{n) be the timeneededto computethe transitiveclosureof an n-nodegraph.

Lemma 10.10M(n) <T(3n)+0(n2),and henceM(n) = 0(T(n)).

Proof:If A and B are the given n x n matricesto be multiplied,form thefollowing 3n x 3n matrixC in 0(n2)time:

C=O A OO O BOOO

C2 is given by

C2O O ABOOOOOO

Also, Ck = O for k > 3.Therefore,usingLemma10.9,

c*= i+c+c2+ \342\226\240\342\226\240\342\226\240+cn-1 I+C+ C2I A ABO I BO O I

Given C*,it is easy to obtainthe productAB. \342\226\241

Lemma 10.11T(n) = 0(M(n)).

Page 492: Sahni

482 CHAPTER10.LOWER BOUNDTHEORY

Proof:Theproofis analogousto that of Lemma10.7.Let G(V,E)be thegraph under considerationand A its adjacencymatrix. The matrixA ispartitionedinto four submatricesof size|x|each:

A =A21 A22

Recallthat row i of A correspondsto edgesgoingout of nodei. Let V\\ bethe setof nodescorrespondingto rows 1,2,...,^of A and V2 be the setofnodescorrespondingto the restof the rows.

Theentry A^i(i,j) = 1 if and only if thereis a path from nodei G Vito nodej G Vi allof whose intermediatenodesarealsofrom Vi. A similarproperty holdsfor A22.

Let D = A12A21and let u and v G Vi. Then, D(u,v) = 1 if and onlyif thereexistsa w G V2 such that (u,w) and (w,v) are in E. A similarstatementholds for A21A12.

Let the transitiveclosureof G be given by

A* = Cn C12C21 C22

Our goal is to derive a divide-and-conqueralgorithmfor computingA*.Therefore,we shouldfind a way of computingCn,Ci2,C2i,and C22 fromA*n and A*22.

First we considerthe computationof Cn. Note that Cn correspondsto paths from i to j, wherei and j are in Vi. Of coursethe intermediatenodesin suchpaths couldas well be from V2. Any suchpath from i to jcan have severalsegments,whereeachsegmentstarts from a node,say uq,from Vi, goesto wq G V2 throughan edge,goesto xq of V2 througha pathof arbitrary length,and goesto yq G Vi throughan edge(seeFigure10.6).Any suchsegmentcorrespondsto An +A^A^^i-Sincetherecouldbeanarbitrary numberof suchsegments,we get

Cn = {An+Ai2A*22A2i)*

Usingsimilarreasoning,the rest of A* can alsobe determined:Cyi =CnAi2A22, C21= ^22-^21Cii)and C22=

A\\2 +A22A2iCnAi2A22.Thusthe above divide-and-conqueralgorithmfor computingA* performs

two transitive closureson matricesof sizef x|each (A22 and (An +^12^22^21)*),sixmatrixmultiplications,and two matrixadditionson ma-

S each.Thereforewe get

T(n)<2Tg)+6M(|)+0(n2

Page 493: Sahni

10.3.LOWER BOUNDSTHROUGHREDUCTIONS 483

V

V,

Figure10.6A possiblepath in C\\\\

Repeatedsubstitutionyields

T(n)< 6M(|j+12Mf-]+24M/n+ +0(n2)

But, M(n) > n2, and henceM(n/2) < 4M(n). Usingthis fact, we seethatT(n) = 0(M(n)+n2) = 0(M(n)). O

Lemmas10.10and 10.11show that T(n) = 9(M(n)).

EXERCISES1.In Section3.8we stated two variants of the convexhull problem.

Lemma10.2proves a lower bound for one version of the problem.In the otherversion,we areonly supposedto find the verticesof thehull (notnecessarilyin any order).Will Lemma10.2hold even for thisversion?

2. If M(n) is the timeneededto multiply two nxn matrices,and S(n) isthe timeneededto squarean nxnmatrix,showthat M(ri) = Q(S(n))(i.e.,show that multiplying and squaringmatriceshave essentially thesamedifficulty).

3. Considerthe disjointsetsproblemof Exercise10.3.Say the elementsof 5] as well as thoseof 52areintegersfrom the range [0,nc] for someconstantc, where|5i|= IS2I= n. Will the lower bound of Lemma10.3stillhold? If yes, why? If not, presentan algorithmwith o(nlogn)time.

Page 494: Sahni

484 CHAPTER10.LOWER BOUNDTHEORY

4. In the disjointsetsproblem,if \\S\\\\= m and IS^I= O(l),will Lemma

10.3stillbe valid? Proveyour answer.

5.The distinctelementsproblemis totakeas input n numbersand decidewhetherthesenumbersaredistinct(i.e.,no numberis repeated).Showthat any algorithmfor the solutionof the distinctelementsproblemneedsfi(nlogn)time.

10.4 TECHNIQUESFORALGEBRAICPROBLEMS (*)

In this sectionwe examinetwo methods,substitutionand linearindependence, for derivinglowerboundsonarithmeticand algebraicproblems.Thealgebraicproblemswe considerhereareoperationson integers,polynomials,

and rationalfunctions. Solutionsto theseproblemswere presentedinChapter9. In addition we alsoincludematrixmultiplicationand relatedoperationswhich were discussedin Chapter3.

The modelof computationwe use is calleda straight-lineprogram.It iscalledthis becausethereare no branchinginstructionsallowed. Thisimplies that if we know a way of solvinga problemfor n inputs, then a setofstraight-lineprograms,oneeachfor solvinga different sizen, can be given.Theonly statementin a straight-lineprogramis the assignmentwhich hasthe form S:=p opg;.HereS,p,and q arevariablesof boundedsizeand opis typically oneof the arithmeticoperations:addition,subtraction,multiplication,

ordivision.Moreoverp and q areeitherconstants,input variables,orvariablesthat have already appearedonthe left of an assignmentstatement.Forexample,onepossiblestraight-lineprogramthat computesthe value ofa degree-twopolynomial a2X2+ a\\x + ao has the form

vl := a2 * x;

vl := vl + a\\;

vl := vl * x;

ans := vl +ao;

To determinethe complexityof a straight-lineprogram,we assumethat eachinstructiontakesoneunit of timeand requiresoneunit of space.Then thetimecomplexityof a straight-lineprogramis its numberof assignmentsorits length. A morerealisticassumptiontakesinto accountthe fact that anintegern requires[lognj+ 1bitsto representit. But in this sectionweassumethat alloperandsaresmallenoughto occupy a fixed-sizedregister,and hencethe unit-costassumptionis appropriate.

Page 495: Sahni

10.4.TECHNIQUESFORALGEBRAICPROBLEMS(*) 485

Now we needto considerthe classof constantswe intend to allow.Thisrequiressomeelementarydefinitions from algebra.

Definition10.2A ring is an algebraicstructurecontaininga set ofelements S and two binary operationsdenotedby + and *. Foreacha,6 GSta + b arid a * b arealsoin S.Also the following propertieshold:

(a + b) +c(a * b) * c

a + b

(a + b) * ca* (b +c)

a +0a * 1

= a+ (b + c) and= a * (b * c)= b +a= a*c+b*c and= a *b+a *c= 0 + a = a= 1* a = a

(associativity)(commutativity)

(distributivity)(additiveidentity, 0 G S)(multiplicativeidentity, 1G S)

Foreacha G 5, there is an additive inverse denotedby \342\200\224a such thata +

(\342\200\224a)

=(\342\200\224a)

+ \302\253

= 0. If multiplicationis alsocommutative,then thering is calledcommutative. \342\226\241

Definition10.3A field is a commutativering such that for eachelementa E S (other than 0), thereis a multiplicativeinverse denotedby a-1G Sthat satisfies the equationa * a-1= 1. \342\226\241

Example10.6The realnumbersform a field underthe regularoperationsof additionand multiplication.Similarlyfor the complexnumbers.However,the integerswith operations+ and * do not form a field sinceonly plus orminus one has a multiplicativeinverse.Another field is the set of integersmoduloa primeas discussedin Chapter9.It forms a finite field consistingof the integers{0,1,...,p\342\200\224 1}. \342\226\241

Definition10.4An indeterminateover an algebraicsystem 5 is asymbol that doesnot occurin S. The extensionof S by the indeterminatesx\\,...,xn is the smallestcommutativering that containsall combinationsof the elementsof Sand the indeterminates.Suchan extensionis denotedbyS[x\\,...,xn].When an extensionis madeto a field that allowsfor quotientsof combinationsof elementsof S and indeterminates,then that is denotedby S(xi,...,xn). n

Theelementsin an extensionS[x\\,...,xn] canbeviewedas polynomialsin the variablesx^ with coefficientsfrom the set S.Theelementsin anextension S(x\\,\342\226\240\342\226\240\342\226\240,xn) shouldbeviewedas rationalfunctions of the variablesXi with coefficientsthat arefrom S.The indeterminatesareindependentinthe sensethat no one can be expressedby the others,and hencetwo suchpolynomials or rationalfunctions areequalonly if one can be transformedinto the otherusingthe laws of the ringorfield.

Page 496: Sahni

486 CHAPTER10.LOWER BOUNDTHEORY

Thefield of constantscanmakean importantdifferencein the complexityof the algorithmsfor someproblems.Forexample,if we wish toexamineprogramsfor computingx2 +y2, wherethe field is the reals,then twomultiplications arerequired.Howeverif the field is the complexnumbers,thenonly onecomplexmultiplicationis needed,namely (x + iy)(x \342\200\224 iy).

Theorem10.7Every algorithmfor computingthe value of a generalnth-degreepolynomial that usesonly +, \342\200\224,

and * requiresn additionsorsubtractions.

Proof:Any straight-lineprogramthat computesthe value of anxn+ \342\200\242\342\200\242\342\200\242+aocanbetransformedinto a programto computean + \342\226\240\342\226\240\342\226\240+ao given somefieldof constantsF and a vectorA = (an,...,ao) of indeterminates.This newprogramis producedby insertingthe statements := 1;at the beginningand then replacingevery occurrenceof x by s.We now prove by inductionthat an + \342\200\242\342\200\242\342\200\242+ ao requiresn additionsor subtractions.Forn = 1,we needto computea\\ + ao as an elementin F[a\\,ao].If we disallow additionsor subtractions,then by the definition of extensiononly productsof theaj multipliedby constantsfrom the field can be produced.Thus a\\ + aorequiresoneaddition.Now supposewe have computeda sum or differenceof at least two terms,whereeachterm is possibly a productof elementsfrom the vectorA and possibly a field element.Without lossof generalityassumethat an appearsin oneof theseterms.If we substitutezerofor an,then this eliminatesthe needfor this first additionor subtractionsinceoneof the argumentsis zero.We arenow computingan-\\+ \342\200\242\342\200\242\342\200\242+ ao which bythe inductionhypothesesrequiresn \342\200\224 1additionsorsubtractions.Thusthetheoremfollows. \342\226\241

Thebasicideaof thisproofis the substitutionargument.Usingthe sametechnique,onecan derive a not much morecomplicatedtheoremthat showsthat Horner'srule is optimalwith respecttomultiplicationsor divisions.

Definition10.5SupposeF and G are two fields suchthat F is containedin G and we arecomputingin G(a\\,...,an).The operation/ opg, whereop is * or /, is saidto be inactive if one of the following holds:(1) g G F,(2)/ G F and the operationis multiplication,or (3) / G G and g G G. \342\226\241

Any multiplicationordivision that is not inactiveis calledactive.So,forexample,operationssuchasx * x or 15* a$ are inactivewhereasoperationssuchas x * en or a\\ * 02 or 15/ajareactive.

Definition10.6Let A = (ao,...,an).Then pi(A),...,pu(A) is linearlyindependentif theredoesnot exista nontrivial set of constantsc\\,...,cnsuchthat ^CiPi= a constant. \342\226\241

Page 497: Sahni

10.4.TECHNIQUESFOR ALGEBRAICPROBLEMS(*) 487

Thepolynomial P(A,x) canbe thought of as a generalpolynomial in thesensethat it is a function not only of x but alsoof the inputsA. We canwriteP(A,x) as ^2{pi(A)xl)+ r(x), whereu of the Pi are linearly independent.

Theorem10.8[Borodin,Munro] If u active * or / arerequiredto computeP(A,x),then n active* or / arerequiredto evaluatea generalnth-degreepolynomial.

Proof:Theproofproceedsby inductiononu.Supposeu = 1.If thereis noactive * or /, then it is only possibleto form Pi(A) + r(x) for somei. Nowsuppose(pi{A)+ r'l (x))* {pj{A)+^(a;)) is the first active multiplicationina straight-lineprogramthat computesP(A,x).Without lossof generalityassumethat Pj{A) ^ a constant. Then, in the straight-lineprogramletPj(A)+ 7-2(2;)bereplacedby a constantd suchthat no illegaldivision byzerois caused.This can alwaysbe donefor if pj is a linearcombinationofconstantsCj timesaj and sincetheremust exista j :Cj ^ 0,then by setting

a.j = ^ c%a% + r2{x)-d (10.5)

it follows that (pj(A)+ r-2{x))= d. Now considerP(A,x),wherethesubstitution of aj has beenmade.Thepolynomial P can berewritten in theform

Y^ Pl(x)xl+r'(x) (10.6)0<\302\253<n

Thereforeby makingthe onereplacement,we can remove oneactivemultiplication or division and we arenow computinga new expression.If it canbeshown that thereareu \342\200\224 \\ linearly independentpj,then by the inductionhypothesis thereareat leastu \342\200\224 1remainingactive* or / and the theoremfollows.Proofof this canbe found in the exercises. \342\226\241

Corollary10.1Horner'srule is an optimalalgorithmwith respectto thenumberof multiplicationsand divisionsnecessaryto evaluatea polynomial.

Proof:Fromthe previoustheorem,the result in the exercisesthat undersubstitutionu \342\200\224 1 linearly independentcombinationsremain,and the factthat Horner'srule requiresonly n multiplications,the corollary follows. \342\226\241

Another methodof prooffor deriving lower boundsfor algebraicproblemsis to considertheseproblemsin a matrixsetting.Returningto polynomialevaluation,we can expressthis problemin the following way: computethe1x (n + 1) by (n + 1) x 1matrixproduct

Page 498: Sahni

488 CHAPTER10.LOWER BOUNDTHEORY

1xx* X

a0

(10.7)

which is the productof two vectors.Another problemis complexnumbermultiplication.Theproductof (a+

ib)(c+ id) = ac \342\200\224 bd + (be+ad)i canbewritten in termsof matricesas

a \342\200\224b

b aac \342\200\224 bdbe+ad ;io.8)

In moregeneraltermswe wish toconsiderproblemsthat canbe formulatedas the productof a matrixtimesa vector:

an,

flmli

a\\r Xl

xr.;io.9)

Definition10.7Let F be a field and x\\,...,xnbe indeterminates.LetFm[xi,...,xn]stand for the m-dimensionalspaceof vectorswithcomponents from F[x\\,...,xn] and Fmstand for the m-dimensionalspaceofvectors with componentsfrom F.A setof vectorsv\\, \342\226\240.., Vk from Fm[x\\,...,xn]is linearly independentmodulo Fm if for m,...,Uk in F the sum X^=i(uiui)= 0 in Fmimpliesthe ui areallzero.If the vi arenot linearly independent,then they arecalledlinearly dependentmoduloFm. The row rank of amatrix A moduloFr is the numberof linearly independentrows moduloFr.The column rank is the numberof linearly independentcolumns. \342\226\241

We now state the maintheoremof this section.

Theorem10.9sion field F[x\\,indeterminates.

Let A be an r x s matrixwith elementsfrom the exten-..,xn]and y = [yi,...,ys] a columnvectorcontainings

1.If the row rank of A is v, then any computationof Ay requiresat leastv activemultiplications.

2. If the columnrank of A is w, then any computationof Ay requiresatleastw active multiplications.

Page 499: Sahni

10.4.TECHNIQUESFOR ALGEBRAIC PROBLEMS(*) 489

3. If A containsa submatrixB of sizev x w such that for any vectorsp \342\202\254 Fv and q \342\202\254 Fw, pTBq\342\202\254 F iff p = 0 orq = 0,thenany computationof Ay requiresv +w \342\200\224 1multiplications.

Proof:Fora proofof part (1)seethe paperby S.Winograd.Fora proofofparts (2) and (3) seethe papersby C.Fiduccia.Also seeA. V. Alio, J.E.Hopcroftand J.D.Ullman. \342\226\241

Example10.7Reconsiderthe problemof multiplying two 2x2matrices

a bc d

'e fg h

ae+ bgce+ dg

af + bh

cf+dh

whichby definitionseeminglyrequireseightmultiplications.We canrephrasethis computationin termsof a matrix-vectorproductas shown in Figure10.7.The first 2x2matrix,say A, has beenexpandedas the 4x4 matrix

A OO A

This matrixis then decomposedinto a sum of seven matrices,eachof size4x4.Both the row rank and the columnrank of eachmatrixis one andhenceby Theorem10.9we seethat seven multiplicationsarenecessary. \342\226\241

Example10.8Given two complexnumbersa + ib and c+id,the product(a+ ib)(c+id) = ac \342\200\224 bd + i(ad+be) canbedescribedby the matrix-vectorcomputation

ab

~ba

cd

ac \342\200\224 bdbc+ cd (10.10)

which seeminglyrequires4 multiplications,but it can alsobewrittenas

0a + b0 + -b

bcd fl0.ll)

Therow and columnrank of the first matrixis two whereasthe row andcolumn rank of the secondmatrixis 1.Thusthreemultiplicationsarenecessary.The productcan becomputedas:

1.a(d \342\200\224 c)2. (a +b)c

3.b{c+d)

Page 500: Sahni

490 CHAPTER10.LOWER BOUNDTHEORY

a 6 0 0 1c d 0 00 0 a b0 0 c d

e9fh

(

V

a-b 0 0 00 0 0 0

a +6 0 0 00 0 0 0

+b 6 0 0-6-60 00 0 0 00 0 0 0

+0 0 00 0 00 0 00 0 0

0c \342\200\224 d

0-c+d

+0 00 00 00 0

00-cc

00-cc

+0 0 0 00 0 0 0

a +c 0 a +c 00 0 0 0

+0 0 0 00 b+d 0 b+d0 0 0 00 0 0 0

+0 0 0

b +c 0 0-b-c 0 00 0 0

0-b-cb +c

0

\\

)

e9fh

Figure10.7Multiplying two 2x2matrices

Then (2)- (3) = ac-bdand (1) + (2) = ad+be. \342\226\241

Example10.9Equation10.7phrasesthe evaluationof an nth-degreepolynomial in termsof a matrix-vectorproduct.Thematrixhas n linearlyindependent columnsmodulothe constantfield F,and thus by Theorem10.9,nmultiplicationsarenecessary. \342\226\241

In this sectionwe've already seenthat any algorithmthat evaluatesageneralnth-degreepolynomial requiresn multiplicationsor divisionsand nadditionsor subtractions.Thisassertionwas basedon the assumptionthatthe input into any algorithmwas both the value of x plus the coefficientsof the polynomial.We might takeanotherview and considerhow well wecan do if the coefficientsof the polynomial areknown in advanceandfunctions of thesecoefficientscan be computedwithout costbeforeevaluationbegins.Thisprocessof computingfunctions of the coefficientsis referredtoas preconditioning.

Page 501: Sahni

10.4.TECHNIQUESFORALGEBRAIC PROBLEMS(*) 491

Supposewe begin by consideringthe generalfourth-degreepolynomialA(x) = CJ4X4 +a$ar+a2x2+ a\\x +a^x0 and the scheme

y :=(x + c0)x+a; A{x):={{y+x + c2)y+c3)c4;

Only threemultiplicationsand five additionsarerequiredif we candeterminethe values of the c, in termsof the a,.ExpandingA(x) in termsof x andthe Cj, we get

A(x) = C4X4 + (2c0C4+ C4)x3+ (cq+2ci+ c0c4+ c2Ci)x2+ (2c0CiC4+ C\\C,\\ + C0C2C4)x+ (c^C4+C1C2C4+ C3C4)

and equatingthe above coefficientswith the aj,we get

c4 = a4; c0 = (a3/a4- l)/26 = a2/a4-c0(c0+ 1)ci = ai/a4- cQb; c2 = b -

2c\302\261; C3= a0/a4-ci(ci+c2)

Example10.10Applying the above methodto the polynomial A(x) \342\200\224

\342\200\224xA + 3x3 \342\200\224 2x2 +2x+ 1yields the straight-lineprogram

q:=x-2',r :=q * x;y:=r-2;s :=y + x;t :=s + 4;u :=t * y;v :=u +3;p := \342\200\224 1* u;

which evaluatesto A(x) in just threemultiplications. \342\226\241

In fact it can be shown that for any polynomial A(x) of degreen > 3,thereexistrealnumbersc,di,and e,for 0 < i < [n/2]\342\200\224 1 = m suchthatj4(x)can beevaluatedin [n/2\\+ 2 multiplicationsand n additionsby thefollowing scheme:

y :=x +c; w :=y * y\\

z := (an *y +do) *y + eo {neven); z :=an * y + eo (n odd);2 :=z * (w \342\200\224 di) + ti\\ for i = 1,2,...,m;answer:=z',

Page 502: Sahni

492 CHAPTER10.LOWER BOUNDTHEORY

Now that we have a schemethat reducesthe numberof requiredmultiplications by about one-half,it is natural to ask how closewe have cometothe optimal.The lower boundwe areabout to presentfollows from the factthat any straight-lineprogramcan be put into a normalform involving alimitednumberof constants.We restrictour argumentshereto programswithout division,leaving the extensiontointerestedreaders.

Lemma 10.12[Motzkin 1954]For any straight lineprogramwith k

multiplications and a singleinput variable x, thereexistsan equivalent programusingat most 2k constants.

Proof:Let Sj, 0 < i < k, denotethe result of the ith multiplication.Wecan rewritethe programas

so \342\200\242\342\200\242= x;si :=Li * Rf, for 1< i < k

A(x) :=Lk+i;

whereeachLi and Ri is a certainsum of a constant(whichmay accumulateotherconstantsfrom the originalprogram)and an earlierSj (an Sj mayappearseveraltimesin this sum). Thefirst product\302\253i

= (c\\ +m\\x)(c2+m,2x) can be replacedby s\\ = mx(x+ c), where m = m\\m2 and c =m\\C2 + TO2C1,provided that laterconstantsaresuitably altered. \342\226\241

Lemma 10.13[Belaga1958]Forany straight-lineprogramwith k addition-subtractionsand a singleinput variablex, thereexistsan equivalentprogramusingat most k + 1constants.

Proof:Let Sj,0 < i < k, be the resultof the kth addition-subtraction.Asin the previousproof, we can rewritethe programas

so :=x;Si :=Ci * pi + di * qi; 1< i < k

A(x) :=cfc+i*Pk+i',

whereeachpi and g2 is a productof earliersj.Fork = 1,2,...,replaceSiby Si = (cid~[l)pi+ qi simultaneouslyreplacingsubsequentreferencesto Siby diSi. \342\226\241

Theorem10.10[Motzkin, Belaga]A randomly selectedpolynomial ofdegree n has probability zeroof beingcomputableeitherwith lessthan \\{n + 1)/2]multiplications-divisionsor with lessthan n addition-subtractions.

Page 503: Sahni

10.4.TECHNIQUESFOR ALGEBRAIC PROBLEMS(*) 493

Proofsketch:If a given straight-lineprogramwith the singleinputvariable x has only a few operations,then we can assumethat it has at mostn constants.Eachtimetheseconstantsareset,they determinea set ofcoefficients of the polynomialcomputedby the last operationof the program,GivenA(x)of degreen, the probability is zerothat the program'sn orfewerconstantscan beadjustedto alignthe computedpolynomial with alln + 1of the given polynomial coefficients.A formal proofhererelieson showingthat the subsetof (n + l)-dimensionalspacethat can besorepresentedhasLebesquemeasurezero,It follows (becausethe setof straight-lineprogramsis enumerableif we identify programsdiffering only in their constants)thatthe constantsof any such short programcan be set soas to evaluatethepolynomial with only zeroprobability, \342\226\241

The above theoremshows that the preconditioningmethodpreviouslygiven comesvery closeto beingoptimal,but someroomfor improvementremains.

EXERCISES

1.Let A be an n x n symmetric matrixA(i,j)= A(j,i)for 1< i,j< n.Show that if p is the numberof nonzeroentriesof A(i,j),i< j, thenn + p multiplicationsaresufficient to computeAx.

2. Showhow an n xn matrixcanbemultipliedby two nxlvectorsusing(3n2 + 5n)/2 multiplications.

3. [Borodin,Munro] Thisexercisecompletesthe proofof Theorem10.9,Let pi(ai...as),\342\226\240\342\226\240\342\226\240

,pu(\302\260i\342\226\240\342\226\240\342\226\240o>s) beu linearly independentfunctions

of a,\\,...,as.Let a\\ = p(a,2 \342\200\242\342\200\242\342\200\242as).

Then show that thereareat leastu \342\200\224 1linearly independentpt = pi,wherea-\\ is replacedby p.

4. [W. Miller]Show that the innerproductof two n-vectorscan becomputed in [n/2]multiplicationsif separatepreconditioningof the vectorelementsis not counted.

5.Considerthe problemof determininga lower bound for the problemof multiplying an m x n matrixA by an n x 1vector. Show howtoreexpressthis problemusinga different matrixformulation sothatTheorem10.9can beappliedand yield the lower boundof mn

multiplications.

6. Write an exponentiationprocedurewhich computesxn usingthe low-orderto the high-orderbits of n.

Page 504: Sahni

494 CHAPTER10.LOWER BOUNDTHEORY

10.5REFERENCESAND READINGSFora detailedaccountof lower boundsfor sorting,merging,and selecting,seeSections5.3,5.3.1,5.3.2,and 5.3.3in The Art of ComputerProgramming,

Vol. IllSortingand Searching,by D.E.Knuth,Addison-Wesley,1973.

Thesortingalgorithmthat requiresthe fewestknownnumberofcomparisons was originallypresentedby L.Ford,Jr.,and S.Johnson.

The lower boundon the selectionproblemcan be found in \"Boundsforselection,\"by L.Hyafil, SIAM Journalon Computing5,no.1(March1976):109-114.

Exercise3 (Section10.2)is due toD.Dobkinand R. Lipton.Theproofof Lemma10.3is from \"A lower boundresult for the common

elementproblemand its implicationfor reflexive reasoning,\"by P.Dietz,D.Krizanc,S.Rajasekaran,and L.Shastri,manuscript,1993.

Many of the algebraiclower boundscan be found in the following twobooks:The Computation Complexityof Algebraic and NumericProblems,by A.Borodinand I.Munro, AmericanElsevier,1975.The Design and Analysis of ComputerAlgorithms by A. V. Aho, J. E.Hopcroft and J.D.Ullman,Addison-Wesley,1974.

Theorem10.9was proven by S.Winograd and Fiduccia.See:\"On the numberof multiplicationsnecessarytocomputecertainfunctions\"by S.Winograd, Comm.Pureand AppliedMath. 23 (1970):165-179.\"On obtainingupperboundson the complexityof matrixmultiplication\"by C.Fiduccia,Proc.IBMSymposiumon complexityof computercomputations,

March 1972.\"Fast matrixmultiplication\"by C.Fiduccia,Proc.3rd Annual ACMsymposium on theory of computing,(1971):45-49.

Page 505: Sahni

Chapter 11A/\"P-HARDANDA/\"P-COMPLETEPROBLEMS

11.1BASICCONCEPTSIn this chapterwe areconcernedwith the distinctionbetweenproblemsthatcan be solved by a polynomial timealgorithmand problemsfor which nopolynomial timealgorithmis known. It is an unexplainedphenomenonthatfor many of the problemswe know and study, the bestalgorithmsfor theirsolutionshave computingtimesthat clusterinto two groups.The first groupconsistsof problemswhose solutiontimesare boundedby polynomials ofsmalldegree.Exampleswe have seenin this bookincludeorderedsearching,which is O(logn),polynomial evaluationwhich is 0(n),sorting which is0(nlogn),and stringeditingwhich is 0(mn).

The secondgroup is madeup of problemswhose best-knownalgorithmsarenonpolynomial.Exampleswe have seenincludethe traveling salespersonand the knapsackproblemsfor which the bestalgorithmsgiven in this texthave complexities0(n22n)and 0(2n'2)respectively.In the quest to developefficient algorithms,no onehas beenableto developa polynomialtimealgorithm for any problemin the secondgroup.This is very importantbecausealgorithmswhose computingtimesare greaterthan polynomial (typicallythe timeis exponential)very quickly requiresuchvast amountsof timetoexecutethat even moderate-sizeproblemscannotbe solved (seeSection1.3for moredetails).

The theory of ./V'P-completenesswhichwe presentheredoesnot provide amethodof obtainingpolynomial timealgorithmsfor problemsin the secondgroup. Nor doesit say that algorithmsof this complexitydo not exist.Instead,what we do is show that many of the problemsfor which thereare

495

Page 506: Sahni

496 CHAPTER11.AfV-HARD AND MV-COMPLETEPROBLEMS

no known polynomial timealgorithmsarecomputationallyrelated.In fact,we establishtwo classesof problems.Thesearegiven the namesA/T'-hardand AfV-complete.A problemthat is A/T'-completehas the property thatit can be solved in polynomial time if and only if all otherA/T'-completeproblemscan alsobe solved in polynomial time. If an A/T'-hard problemcan be solved in polynomial time,then all A/T'-completeproblemscan besolved in polynomial time. All A/T'-completeproblemsareAfV-hard, butsomeAfV-havd problemsarenot known to beA/T'-complete.

Although onecan define many distinctproblemclasseshaving theproperties statedabove for the A/'P-hardand A/T'-completeclasses,the classeswe study arerelatedto nondeterministiccomputations(to bedefined later).The relationshipof theseclassesto nondeterministiccomputationstogetherwith the apparentpower of nondeterminismleadsto the intuitive (thoughas yet unproved)conclusionthat no A/T'-completeor AfV-hard problemispolynomiallysolvable.

We seethat the classof A/T'-hard problems(and the subclassof HV-completeproblems)is very richas it containsmany interestingproblemsfroma wide variety of disciplines.First,we formalize the precedingdiscussionofthe classes.

11.1.1NondeterministicAlgorithmsUp to now the notionof algorithmthat we have beenusinghas the propertythat the resultof every operationis uniquely defined.Algorithms with thisproperty are termeddeterministicalgorithms.Suchalgorithmsagreewiththe way programsareexecutedona computer.In a theoreticalframeworkwecan remove this restrictionon the outcomeof every operation.We canallowalgorithmstocontainoperationswhoseoutcomesare not uniquely definedbut are limitedto specifiedsetsof possibilities.The machineexecutingsuchoperationsis allowed to chooseany one of theseoutcomessubjecttoa terminationconditionto be defined later. This leadsto the conceptof anondeterministicalgorithm.To specify suchalgorithms,we introducethreenew functions:

1.Choice(S')arbitrarily choosesoneof the elementsof set 5*.

2. Failure() signalsan unsuccessfulcompletion.3.Success()signalsa successfulcompletion.

Theassignmentstatementx := Choice(l,n)couldresult in x beingassigned any oneof the integersin the range[1,n].Thereis no rulespecifyinghow this choiceis to bemade.TheFailure() and Success()signalsareusedtodefinea computationof the algorithm.Thesestatementscannotbeusedtoeffect a return.Whenever thereis a setof choicesthat leadsto a successful

Page 507: Sahni

11.1.BASICCONCEPTS 497

completion,then one suchset of choicesis always madeand the algorithmterminatessuccessfully.A nondeterministicalgorithm terminates

unsuccessfully if and only if there existsno set of choicesleading to a successsignal.The computingtimesfor Choice,Success,and Failure are taken to be 0(1).A machinecapableof executinga nondeterministicalgorithmin this wayis calleda nondeterministicmachine.Although nondeterministicmachines(as defined here) do not existin practice,we seethat they provide strongintuitive reasonsto concludethat certainproblemscannotbe solvedby fastdeterministicalgorithms.

Example11.1Considerthe problemof searchingfor an elementx in agiven setof elementsA[\\ :?t],n > 1.We arerequiredto determinean indexj suchthat A[j] = x or j = 0 if x is not in A. A nondeterministicalgorithmfor this is Algorithm 11.1.

1 j :=Choice(l,n);2 if A[j] = x then{write(j);Success();}3 write (0); FailureQ;

Algorithm11.1Nondeterministicsearch

Fromthe way a nondeterministiccomputationis defined, it follows thatthe number0 can beoutput if and only if thereis no j suchthat A[j] = x.Algorithm 11.1is of nondeterministiccomplexity0(1).Note that sinceA isnot ordered,every deterministicsearchalgorithmis of complexityQ(n). \342\226\241

Example11.2[Sorting] Let A[i], 1< i < n, be an unsortedarray ofpositive integers.ThenondeterministicalgorithmNSort(.A,n) (Algorithm 11.2)sortsthe numbersinto nondecreasingorderand then outputs them in thisorder. An auxiliary array B[\\ : n] is used for convenience.Line 4initializes B tozerothough any value different from all the A[i] will do.In thefor loopof lines5 to 10,eachA[i] is assignedto a positionin B. Line 7nondeterministicallydeterminesthis position.Line 8 ascertainsthat B[j]has not already beenused.Thus, the orderof the numbersin B is somepermutationof the initialorderin A. Thefor loopof lines11and 12verifiesthat B is sortedin nondecreasingorder.A successfulcompletionis achievedif and only if the numbersareoutput in nondecreasingorder.Sincethereisalways a set of choicesat line 7 for suchan output order,algorithmNSortis a sortingalgorithm.Itscomplexityis 0(n).Recallthat alldeterministicsortingalgorithmsmust have a complexityQ(nlog7i). \342\226\241

Page 508: Sahni

498 CHAPTER11.MV-EARD AND MV-COMPLETEPROBLEMS

123456789101112131415

AlgorithmNSort(^4,n)II{

}

Sort n positiveintegers.

for i :=1to n doB[i]:=0;// InitializeB[].for i :=1to n do{

j :=Choice(l,n);if B[j] 0thenFailure();B\\j]:=A[i\\;

}for i :=1to n -1do // Verify order.

if B[i]> B[i+ 1]thenFailure();write (B[l:n]);SuccessQ;

Algorithm11.2Nondeterministicsorting

A deterministicinterpretationof a nondeterministicalgorithmcan bemadeby allowingunboundedparallelismin computation.In theory, eachtimea choiceis to be made,the algorithmmakesseveralcopiesof itself.One copy is madefor eachof the possiblechoices.Thus, many copiesareexecutingat the sametime.Thefirst copy to reacha successfulcompletionterminatesall othercomputations.If a copy reachesa failure completion,then only that copy of the algorithmterminates.Although thisinterpretation may enableonetobetterunderstandnondeterministicalgorithms,it isimportantto rememberthat a nondeterministicmachinedoesnot make anycopiesof an algorithmevery timea choiceis to bemade.Instead,it has theability toselecta \"correct\"elementfrom the setof allowablechoices(if suchan elementexists)every timea choiceis to be made.A correctelementisdefined relativeto a shortestsequenceof choicesthat leadstoa successfultermination.In casethereis no sequenceof choicesleadingto a successfultermination,we assumethat the algorithmterminatesin one unit of timewith output \"unsuccessfulcomputation.\"Whenever successfulterminationis possible,a nondeterministicmachinemakesa sequenceof choicesthat isa shortestsequenceleadingto a successfultermination.Since,the machinewe are defining is fictitious,it is not necessaryfor us to concernourselveswith how the machinecan make a correctchoiceat eachstep.Definition11.1Any problemfor which the answer is eitherzeroorone iscalleda decisionproblem.An algorithmfor a decisionproblemis termed

Page 509: Sahni

11.1.BASICCONCEPTS 499

a decisionalgorithm. Any problemthat involves the identificationof anoptimal(eitherminimum or maximum)value of a given cost function isknown as an optimization problem.An optimization algorithm is used tosolve an optimizationproblem. \342\226\241

It is possibleto constructnondeterministicalgorithmsfor which manydifferent choicesequencesleadto successfulcompletions.Algorithm NSortof Example11.2is onesuchalgorithm.If the numbersA[i] arenot distinct,then many different permutationswill result in a sortedsequence.If NSortwerewritten to output the permutationusedratherthan the .A[i]'sin sortedorder,then its output would not be uniquely defined.We concernourselvesonly with thosenondeterministicalgorithmsthat generateuniqueoutputs.In particularwe consideronly nondeterministicdecisionalgorithms.Asuccessful completionis madeif and only if the output is 1.A 0 is output ifand only if thereis no sequenceof choicesleadingto a successfulcompletion.Theoutput statementis implicitin the signalsSuccessand Failure. Noexplicit output statementsarepermittedin a decisionalgorithm.Clearly,ourearlierdefinitionof a nondeterministiccomputationimpliesthat the outputfrom a decisionalgorithmis uniquely defined by the input parametersandalgorithmspecification.

Although the ideaof a decisionalgorithmmay appearvery restrictiveatthis time,many optimizationproblemscanberecastinto decisionproblemswith the property that the decisionproblemcanbesolvedin polynomialtimeif and only if the correspondingoptimizationproblemcan. In other cases,we can at leastmake the statementthat if the decisionproblemcannotbesolved in polynomial time,then the optimizationproblemcannoteither.

Example11.3[Maximum clique] A maximalcompletesubgraphof a graphG = (V,E) is a clique.The sizeof the cliqueis the numberof verticesin it.The max cliqueproblem is an optimizationproblemthat has to determinethe sizeof a largestcliquein G. Thecorrespondingdecisionproblemis todeterminewhetherG has a cliqueof sizeat leastk for somegiven k. LetDCIique(G,k) be a deterministicdecisionalgorithmfor the cliquedecisionproblem.If the numberof verticesin G is n, the sizeof a maxcliqueinG can be found by makingseveralapplicationsof DCIique.DCIiqueis usedoncefor eachk, k = n, 7i \342\200\224 1,n \342\200\224 2,...,until the output from DCIiqueis 1.Ifthe timecomplexityof DCIiqueis /(n),then the sizeof a maxcliquecan befound in time< n f{n).Also, if the sizeof a maxcliquecan bedeterminedin timeg(n), then the decisionproblemcan be solved in timeg{n).Hence,the maxcliqueproblemcan be solved in polynomial timeif and only if thecliquedecisionproblemcan be solved in polynomial time. \342\226\241

Example11.4[0/1knapsack]The knapsackdecisionproblemis todetermine whetherthereis a 0/1assignmentof values to x,,1< i <n, suchthatY-,Vixi > r and ^2WiXi <m. The r is a given number.Thep^s and w^s are

Page 510: Sahni

500 CHAPTER11.HV'-HARDAND HV'-COMPLETEPROBLEMS

nonnegativenumbers.If the knapsackdecisionproblemcannotbe solved indeterministicpolynomialtime,then the optimizationproblemcannoteither.

\342\226\241

Beforeproceedingfurther,it is necessaryto arrive at a uniform parametern to measurecomplexity.We assumethat n is the length of the inputto the algorithm(that is,n is the input size).We alsoassumethat allinputs are integer.Rationalinputs can be provided by specifyingpairsofintegers.Generally, the lengthof an input is measuredassuminga binaryrepresentation;that is,if the number10is to be input, then in binary itis representedas 1010.Its length is 4. In general,a positiveinteger khas a length of

Ll\302\260g2 &J + 1bitswhen representedin binary. The lengthof the binary representationof 0 is 1.The size,or length, n of the inputto an algorithmis the sum of the lengthsof the individual numbersbeinginput. In casethe input is given usinga different representation(say radixr),then the lengthof a positivenumberk is

Ll\302\260gr &J + 1-Thus, in decimalnotation, r = 10and the number 100has a length log10100+ 1 = 3.Sincelogrk = log2&;/log2r,the lengthof any input usingradixr (r > 1)representationis c(r)n,wheren is the lengthusinga binary representationand c(r) is a numberthat is fixed for a given r.

When inputs are given using the radix r = 1, we say the input is in unary form. In unary form, the number 5 is input as 11111. Thus, the length of a positive integer k is k. It is important to observe that the length of a unary input is exponentially related to the length of the corresponding r-ary input for radix r, r > 1.

Example 11.5 [Max clique] The input to the max clique decision problem can be provided as a sequence of edges and an integer k. Each edge in E(G) is a pair of numbers (i, j). The size of the input for each edge (i, j) is ⌊log₂ i⌋ + ⌊log₂ j⌋ + 2 if a binary representation is assumed. The input size of any instance is

    n = Σ_{(i,j)∈E(G), i<j} (⌊log₂ i⌋ + ⌊log₂ j⌋ + 2) + ⌊log₂ k⌋ + 1

Note that if G has only one connected component, then n ≥ |V|. Thus, if this decision problem cannot be solved by an algorithm of complexity p(n) for some polynomial p(), then it cannot be solved by an algorithm of complexity p(|V|). □

Example 11.6 [0/1 knapsack] Assuming p_i, w_i, m, and r are all integers, the input size for the knapsack decision problem is

    q = Σ_{1≤i≤n} (⌊log₂ p_i⌋ + ⌊log₂ w_i⌋) + 2n + ⌊log₂ m⌋ + ⌊log₂ r⌋ + 2

Note that q ≥ n. If the input is given in unary notation, then the input size s is Σ p_i + Σ w_i + m + r. Note that the knapsack decision and optimization problems can be solved in time p(s) for some polynomial p() (see the dynamic programming algorithm). However, there is no known algorithm with complexity O(p(n)) for some polynomial p(). □

We are now ready to formally define the complexity of a nondeterministic algorithm.

Definition 11.2 The time required by a nondeterministic algorithm performing on any given input is the minimum number of steps needed to reach a successful completion if there exists a sequence of choices leading to such a completion. In case successful completion is not possible, then the time required is O(1). A nondeterministic algorithm is of complexity O(f(n)) if for all inputs of size n, n ≥ n₀, that result in a successful completion, the time required is at most c·f(n) for some constants c and n₀. □

In Definition 11.2 we assume that each computation step is of a fixed cost. In word-oriented computers this is guaranteed by the finiteness of each word. When each step is not of a fixed cost, it is necessary to consider the cost of individual instructions. Thus, the addition of two m-bit numbers takes O(m) time, their multiplication takes O(m²) time (using classical multiplication), and so on. To see the necessity of this, consider the algorithm Sum (Algorithm 11.3). This is a deterministic algorithm for the sum of subsets decision problem. It uses an (m + 1)-bit word s. The ith bit in s is zero if and only if no subset of the integers A[j], 1 ≤ j ≤ n, sums to i. Bit 0 of s is always 1 and the bits are numbered 0, 1, 2, ..., m right to left. The function Shift shifts the bits in s to the left by A[i] bits. The total number of steps for this algorithm is only O(n). However, each step moves m + 1 bits of data and would take O(m) time on a conventional computer. Assuming one unit of time is needed for each basic operation for a fixed word size, the complexity is O(nm) and not O(n).

The virtue of conceiving of nondeterministic algorithms is that often what would be very complex to write deterministically is very easy to write nondeterministically. In fact, it is very easy to obtain polynomial time nondeterministic algorithms for many problems that can be deterministically solved by a systematic search of a solution space of exponential size.

Example 11.7 [Knapsack decision problem] DKP (Algorithm 11.4) is a nondeterministic polynomial time algorithm for the knapsack decision problem. The for loop of lines 4 to 8 assigns 0/1 values to x[i], 1 ≤ i ≤ n. It also computes the total weight and profit corresponding to this choice of x[ ]. Line 9 checks to see whether this assignment is feasible and whether the resulting profit is at least r. A successful termination is possible iff the answer to the decision problem is yes. The time complexity is O(n). If q is the input length using a binary representation, the time is O(q). □


    1    Algorithm Sum(A, n, m)
    2    {
    3        s := 1;
    4        // s is an (m+1)-bit word. Bit zero is 1.
    5        for i := 1 to n do
    6            s := s or Shift(s, A[i]);
    7        if the mth bit in s is 1 then
    8            write ("A subset sums to m.");
    9        else write ("No subset sums to m.");
    10   }

Algorithm 11.3 Deterministic sum of subsets
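A compilable counterpart of Algorithm 11.3 is sketched below in C++ (an assumption of this sketch: the word s is a 64-bit integer, so only m ≤ 63 is handled). Each iteration performs the same Shift-and-or step; on real hardware such a step costs time proportional to the word length, which is exactly the point made above.

    #include <cstdint>
    #include <vector>

    // Bit-parallel subset-sum test in the spirit of Algorithm 11.3.
    // Bit i of s becomes 1 iff some subset of a sums to i (0 <= i <= 63).
    bool subsetSums(const std::vector<int>& a, int m)   // requires 0 <= m <= 63
    {
        std::uint64_t s = 1;                  // bit 0 is 1 (the empty subset)
        for (int x : a)
            if (x >= 0 && x <= 63)
                s |= (s << x);                // the Shift-and-or step
        return (s >> m) & 1ULL;               // is the mth bit 1?
    }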

Example 11.8 [Max clique] Algorithm DCK (Algorithm 11.5) is a nondeterministic algorithm for the clique decision problem. The algorithm begins by trying to form a set of k distinct vertices. Then it tests to see whether these vertices form a complete subgraph. If G is given by its adjacency matrix and |V| = n, the input length m is n² + ⌊log₂ k⌋ + ⌊log₂ n⌋ + 2. The for loop of lines 4 to 9 can easily be implemented to run in nondeterministic time O(n). The time for the for loop of lines 11 and 12 is O(k²). Hence the overall nondeterministic time is O(n + k²) = O(n²) = O(m). There is no known polynomial time deterministic algorithm for this problem. □

Example 11.9 [Satisfiability] Let x₁, x₂, ... denote boolean variables (their value is either true or false). Let x̄ᵢ denote the negation of xᵢ. A literal is either a variable or its negation. A formula in the propositional calculus is an expression that can be constructed using literals and the operations and and or. Examples of such formulas are (x₁ ∧ x₂) ∨ (x₃ ∧ x̄₄) and (x₃ ∨ x̄₄) ∧ (x₁ ∨ x̄₂). The symbol ∨ denotes or and ∧ denotes and. A formula is in conjunctive normal form (CNF) if and only if it is represented as ∧ᵢ₌₁ᵏ Cᵢ, where the Cᵢ are clauses each represented as ∨ lᵢⱼ. The lᵢⱼ are literals. It is in disjunctive normal form (DNF) if and only if it is represented as ∨ᵢ₌₁ᵏ Cᵢ and each clause Cᵢ is represented as ∧ lᵢⱼ. Thus (x₁ ∧ x₂) ∨ (x₃ ∧ x̄₄) is in DNF whereas (x₃ ∨ x̄₄) ∧ (x₁ ∨ x̄₂) is in CNF. The satisfiability problem is to determine whether a formula is true for some assignment of truth values to the variables. CNF-satisfiability is the satisfiability problem for CNF formulas.

It is easy to obtain a polynomial time nondeterministic algorithm that terminates successfully if and only if a given propositional formula E(x₁, ..., xₙ) is satisfiable. Such an algorithm could proceed by simply choosing (nondeterministically) one of the 2ⁿ possible assignments of truth values to (x₁, ..., xₙ) and verifying that E(x₁, ..., xₙ) is true for that assignment.


    1    Algorithm DKP(p, w, n, m, r, x)
    2    {
    3        W := 0; P := 0;
    4        for i := 1 to n do
    5        {
    6            x[i] := Choice(0, 1);
    7            W := W + x[i] * w[i]; P := P + x[i] * p[i];
    8        }
    9        if ((W > m) or (P < r)) then Failure();
    10       else Success();
    11   }

Algorithm 11.4 Nondeterministic knapsack algorithm
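Since Choice() has no deterministic analog, a deterministic program can only simulate DKP by trying every possible choice sequence. The C++ sketch below (an illustration, not the book's algorithm) does exactly that and therefore runs in exponential rather than nondeterministic O(n) time.

    #include <vector>

    // Deterministic simulation of DKP: succeed iff some 0/1 choice for
    // x[1..n] satisfies the weight and profit constraints.
    bool dkpSimulate(const std::vector<int>& p, const std::vector<int>& w,
                     long long m, long long r)
    {
        int n = static_cast<int>(p.size());            // small n only
        for (unsigned long long mask = 0; mask < (1ULL << n); ++mask) {
            long long W = 0, P = 0;
            for (int i = 0; i < n; ++i)
                if (mask & (1ULL << i)) { W += w[i]; P += p[i]; }
            if (W <= m && P >= r) return true;         // some choice succeeds
        }
        return false;                                  // every choice fails
    }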

    1    Algorithm DCK(G, n, k)
    2    {
    3        S := ∅; // S is an initially empty set.
    4        for i := 1 to k do
    5        {
    6            t := Choice(1, n);
    7            if t ∈ S then Failure();
    8            S := S ∪ {t}; // Add t to set S.
    9        }
    10       // At this point S contains k distinct vertex indices.
    11       for all pairs (i, j) such that i ∈ S, j ∈ S, and i ≠ j do
    12           if (i, j) is not an edge of G then Failure();
    13       Success();
    14   }

Algorithm 11.5 Nondeterministic clique pseudocode


Eval (Algorithm 11.6) does this. The nondeterministic time required by the algorithm is O(n) to choose the value of (x₁, ..., xₙ) plus the time needed to deterministically evaluate E for that assignment. This time is proportional to the length of E. □

    1    Algorithm Eval(E, n)
    2    // Determine whether the propositional formula E is
    3    // satisfiable. The variables are x1, x2, ..., xn.
    4    {
    5        for i := 1 to n do // Choose a truth value assignment.
    6            xi := Choice(false, true);
    7        if E(x1, ..., xn) then Success();
    8        else Failure();
    9    }

Algorithm 11.6 Nondeterministic satisfiability
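For CNF formulas, Eval can likewise be simulated deterministically by enumerating all 2ⁿ assignments. In the C++ sketch below (an illustration only) a clause is a list of signed integers, +i standing for xᵢ and −i for its negation; this encoding is an assumption of the sketch, not part of the text.

    #include <cstdlib>
    #include <vector>

    using Clause  = std::vector<int>;      // literal +i is x_i, -i is its negation
    using Formula = std::vector<Clause>;   // conjunction of clauses (CNF)

    // Try all 2^n truth assignments; exponential time, in contrast with
    // the O(n) nondeterministic time of Algorithm 11.6.
    bool satisfiable(const Formula& f, int n)
    {
        for (unsigned long long mask = 0; mask < (1ULL << n); ++mask) {
            bool all = true;
            for (const Clause& c : f) {
                bool clauseTrue = false;
                for (int lit : c) {
                    bool val = ((mask >> (std::abs(lit) - 1)) & 1ULL) != 0;
                    if ((lit > 0) == val) { clauseTrue = true; break; }
                }
                if (!clauseTrue) { all = false; break; }
            }
            if (all) return true;
        }
        return false;
    }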

11.1.2 The Classes NP-hard and NP-complete

In measuring the complexity of an algorithm, we use the input length as the parameter. An algorithm A is of polynomial complexity if there exists a polynomial p() such that the computing time of A is O(p(n)) for every input of size n.

Definition 11.3 P is the set of all decision problems solvable by deterministic algorithms in polynomial time. NP is the set of all decision problems solvable by nondeterministic algorithms in polynomial time. □

Since deterministic algorithms are just a special case of nondeterministic ones, we conclude that P ⊆ NP. What we do not know, and what has become perhaps the most famous unsolved problem in computer science, is whether P = NP or P ≠ NP.

Is it possible that for all the problems in NP, there exist polynomial time deterministic algorithms that have remained undiscovered? This seems unlikely, at least because of the tremendous effort that has already been expended by so many people on these problems. Nevertheless, a proof that P ≠ NP is just as elusive and seems to require as yet undiscovered techniques.


But as with many famous unsolved problems, they serve to generate other useful results, and the question of whether NP ⊆ P is no exception. Figure 11.1 displays the relationship between P and NP assuming that P ≠ NP.

Figure 11.1 Commonly believed relationship between P and NP

S. Cook formulated the following question: Is there any single problem in NP such that if we showed it to be in P, then that would imply that P = NP? Cook answered his own question in the affirmative with the following theorem.

Theorem 11.1 [Cook] Satisfiability is in P if and only if P = NP.

Proof: See Section 11.2. □

We are now ready to define the NP-hard and NP-complete classes of problems. First we define the notion of reducibility. Note that this definition is similar to the one made in Section 10.3.

Definition 11.4 Let L₁ and L₂ be problems. Problem L₁ reduces to L₂ (also written L₁ ∝ L₂) if and only if there is a way to solve L₁ by a deterministic polynomial time algorithm using a deterministic algorithm that solves L₂ in polynomial time. □

This definition implies that if we have a polynomial time algorithm for L₂, then we can solve L₁ in polynomial time. One can readily verify that ∝ is a transitive relation (that is, if L₁ ∝ L₂ and L₂ ∝ L₃, then L₁ ∝ L₃).

Definition 11.5 A problem L is NP-hard if and only if satisfiability reduces to L (satisfiability ∝ L). A problem L is NP-complete if and only if L is NP-hard and L ∈ NP. □


Figure 11.2 Commonly believed relationship among P, NP, NP-complete, and NP-hard problems

It is easy to see that there are NP-hard problems that are not NP-complete. Only a decision problem can be NP-complete. However, an optimization problem may be NP-hard. Furthermore, if L₁ is a decision problem and L₂ an optimization problem, it is quite possible that L₁ ∝ L₂. One can trivially show that the knapsack decision problem reduces to the knapsack optimization problem. For the clique problem one can easily show that the clique decision problem reduces to the clique optimization problem. In fact, one can also show that these optimization problems reduce to their corresponding decision problems (see the exercises). Yet, optimization problems cannot be NP-complete whereas decision problems can. There also exist NP-hard decision problems that are not NP-complete. Figure 11.2 shows the relationship among these classes.

Example 11.10 As an extreme example of an NP-hard decision problem that is not NP-complete consider the halting problem for deterministic algorithms. The halting problem is to determine for an arbitrary deterministic algorithm A and an input I whether algorithm A with input I ever terminates (or enters an infinite loop). It is well known that this problem is undecidable. Hence, there exists no algorithm (of any complexity) to solve this problem. So, it clearly cannot be in NP. To show satisfiability ∝ the halting problem, simply construct an algorithm A whose input is a propositional formula X. If X has n variables, then A tries out all 2ⁿ possible truth assignments and verifies whether X is satisfiable. If it is, then A stops. If it is not, then A enters an infinite loop. Hence, A halts on input X if and only if X is satisfiable. If we had a polynomial time algorithm for the halting problem, then we could solve the satisfiability problem in polynomial time using A and X as input to the algorithm for the halting problem. Hence, the halting problem is an NP-hard problem that is not in NP. □

Definition 11.6 Two problems L₁ and L₂ are said to be polynomially equivalent if and only if L₁ ∝ L₂ and L₂ ∝ L₁. □

To show that a problem L₂ is NP-hard, it is adequate to show L₁ ∝ L₂, where L₁ is some problem already known to be NP-hard. Since ∝ is a transitive relation, it follows that if satisfiability ∝ L₁ and L₁ ∝ L₂, then satisfiability ∝ L₂. To show that an NP-hard decision problem is NP-complete, we have just to exhibit a polynomial time nondeterministic algorithm for it.

Later sections show many problems to be NP-hard. Although we restrict ourselves to decision problems, it should be clear that the corresponding optimization problems are also NP-hard. The NP-completeness proofs are left as exercises (for those problems that are NP-complete).

EXERCISES

1. Given two sets S₁ and S₂, the disjoint sets problem is to check whether the sets have a common element (see Section 10.3.2). Present an O(1) time nondeterministic algorithm for this problem.

2. Given a sequence of n numbers, the distinct elements problem is to check if there are equal numbers (see Section 10.3, Exercise 5). Give an O(1) time nondeterministic algorithm for this problem.

3. Obtain a nondeterministic algorithm of complexity O(n) to determine whether there is a subset of n numbers aᵢ, 1 ≤ i ≤ n, that sums to m.

4. (a) Show that the knapsack optimization problem reduces to the knapsack decision problem when all the p's, w's, and m are integer and the complexity is measured as a function of input length. (Hint: If the input length is q, then Σ pᵢ ≤ n2^q, where n is the number of objects. Use a binary search to determine the optimal solution value.)

(b) Let DK be an algorithm for the knapsack decision problem. Let r be the value of an optimal solution to the knapsack optimization problem. Show how to obtain a 0/1 assignment for the xᵢ, 1 ≤ i ≤ n, such that Σ pᵢxᵢ = r and Σ wᵢxᵢ ≤ m by making n applications of DK.


5. Show that the clique optimization problem reduces to the clique decision problem.

6. Let Sat(E) be an algorithm to determine whether a propositional formula E in CNF is satisfiable. Show that if E is satisfiable and has n variables x₁, x₂, ..., xₙ, then using Sat(E) n times, one can determine a truth value assignment for the xᵢ's for which E is true.

7. Let π₂ be a problem for which there exists a deterministic algorithm that runs in time 2ⁿ (where n is the input size). Prove or disprove: If π₁ is another problem such that π₁ is polynomially reducible to π₂, then π₁ can be solved in deterministic O(2ⁿ) time on any input of size n.

11.2 COOK'S THEOREM (*)

Cook's theorem (Theorem 11.1) states that satisfiability is in P if and only if P = NP. We now prove this important theorem. We have already seen that satisfiability is in NP (Example 11.9). Hence, if P = NP, then satisfiability is in P. It remains to be shown that if satisfiability is in P, then P = NP. To do this, we show how to obtain from any polynomial time nondeterministic decision algorithm A and input I a formula Q(A, I) such that Q is satisfiable iff A has a successful termination with input I. If the length of I is n and the time complexity of A is p(n) for some polynomial p(), then the length of Q is O(p³(n) log n) = O(p⁴(n)). The time needed to construct Q is also O(p³(n) log n). A deterministic algorithm Z to determine the outcome of A on any input I can be easily obtained. Algorithm Z simply computes Q and then uses a deterministic algorithm for the satisfiability problem to determine whether Q is satisfiable. If O(q(m)) is the time needed to determine whether a formula of length m is satisfiable, then the complexity of Z is O(p³(n) log n + q(p³(n) log n)). If satisfiability is in P, then q(m) is a polynomial function of m and the complexity of Z becomes O(r(n)) for some polynomial r(). Hence, if satisfiability is in P, then for every nondeterministic algorithm A in NP we can obtain a deterministic Z in P. So, the above construction shows that if satisfiability is in P, then P = NP.

Before going into the construction of Q from A and I, we make some simplifying assumptions on our nondeterministic machine model and on the form of A. These assumptions do not in any way alter the class of decision problems in NP or P. The simplifying assumptions are as follows.

1. The machine on which A is to be executed is word oriented. Each word is w bits long. Multiplication, addition, subtraction, and so on, between numbers one word long take one unit of time. If numbers are longer than a word, then the corresponding operations take at least as many units as the number of words making up the longest number.

2. A simple expression is an expression that contains at most one operator and all operands are simple variables (i.e., no array variables are used). Some sample simple expressions are −B, B + C, D or E, and F. We assume that all assignments in A are in one of the following forms:

(a) (simple variable) := (simple expression)
(b) (array variable) := (simple variable)
(c) (simple variable) := (array variable)
(d) (simple variable) := Choice(S), where S is a finite set {S₁, S₂, ..., Sₖ} or l, u. In the latter case the function chooses an integer in the range [l : u].

Indexing within an array is done using a simple integer variable and all index values are positive. Only one-dimensional arrays are allowed. Clearly, all assignment statements not falling into one of the above categories can be replaced by a set of statements of these types. Hence, this restriction does not alter the class NP.

3. All variables in A are of type integer or boolean.

4. Algorithm A contains no read or write statements. The only input to A is via its parameters. At the time A is invoked, all variables (other than the parameters) have value zero (or false if boolean).

5. Algorithm A contains no constants. Clearly, all constants in any algorithm can be replaced by new variables. These new variables can be added to the parameter list of A and the constants associated with them can be part of the input.

6. In addition to simple assignment statements, A is allowed to contain only the following types of statements:

(a) The statement goto k, where k is an instruction number.
(b) The statement if c then goto a;. Variable c is a simple boolean variable (i.e., not an array) and a is an instruction number.
(c) Success(), Failure().
(d) Algorithm A may contain type declaration and dimension statements. These are not used during execution of A and so need not be translated into Q. The dimension information is used to allocate array space. It is assumed that successive elements in an array are assigned to consecutive words in memory.


It is assumed that the instructions in A are numbered sequentially from 1 to ℓ (if A has ℓ instructions). Every statement in A has a number. The goto instructions in (a) and (b) use this numbering scheme to effect a branch. It should be easy to see how to rewrite repeat-until, for, and so on, statements in terms of goto and if c then goto a statements. Also, note that the goto k statement can be replaced by the statement if true then goto k. So, this may also be eliminated.

7. Let p(n) be a polynomial such that A takes no more than p(n) time units on any input of length n. Because of the complexity assumption of 1), A cannot change or use more than p(n) words of memory. We assume that A uses some subset of the words indexed 1, 2, 3, ..., p(n). This assumption does not restrict the class of decision problems in NP. To see this, let f(1), f(2), ..., f(k), 1 ≤ k ≤ p(n), be the distinct words used by A while working on input I. We can construct another polynomial time nondeterministic algorithm A′ that uses 2p(n) words indexed 1, 2, ..., 2p(n) and solves the same decision problem as A does. A′ simulates the behavior of A. However, A′ maps the addresses f(1), f(2), ..., f(k) onto the set {1, 2, ..., k}. The mapping function used is determined dynamically and is stored as a table in words p(n) + 1 through 2p(n). If the entry at word p(n) + i is j, then A′ uses word i to hold the same value that A stored in word j. The simulation of A proceeds as follows: Let k be the number of distinct words referenced by A up to this time. Let j be a word referenced by A in the current step. A′ searches its table to find word p(n) + i, 1 ≤ i ≤ k, such that the contents of this word is j. If no such i exists, then A′ sets k := k + 1; i := k; and word p(n) + k is given the value j. A′ makes use of the word i to do whatever A would have done with word j. Clearly, A′ and A solve the same decision problem. The complexity of A′ is O(p²(n)) as it takes A′ p(n) time to search its table and simulate a step of A. Since p²(n) is also a polynomial in n, restricting our algorithms to use only consecutive words does not alter the classes P and NP.

Formula Q makes use of several boolean variables. We state the semantics of two sets of variables used in Q:

1. B(i, j, t), 1 ≤ i ≤ p(n), 1 ≤ j ≤ w, 0 ≤ t ≤ p(n)

B(i, j, t) represents the status of bit j of word i following t steps (or time units) of computation. The bits in a word are numbered from right to left. The rightmost bit is numbered 1. Q is constructed so that in any truth assignment for which Q is true, B(i, j, t) is true if and only if the corresponding bit has value 1 following t steps of some successful computation of A on input I.


2. S(j, t), 1 ≤ j ≤ ℓ, 1 ≤ t ≤ p(n)

Recall that ℓ is the number of instructions in A. S(j, t) represents the instruction to be executed at time t. Q is constructed so that in any truth assignment for which Q is true, S(j, t) is true if and only if the instruction executed by A at time t is instruction j.

Q is made up of six subformulas, C, D, E, F, G, and H: Q = C ∧ D ∧ E ∧ F ∧ G ∧ H. These subformulas make the following assertions:

C: The initial status of the p(n) words represents the input I. All noninput variables are zero.

D: Instruction 1 is the first instruction to execute.

E: At the end of the ith step, there can be only one next instruction to execute. Hence, for any fixed i, exactly one of the S(j, i), 1 ≤ j ≤ ℓ, can be true.

F: If S(j, i) is true, then S(j, i + 1) is also true if instruction j is a Success or Failure statement. S(j + 1, i + 1) is true if j is an assignment statement. If j is a goto k statement, then S(k, i + 1) is true. The last possibility for j is the if c then goto a statement. In this case S(a, i + 1) is true if c is true and S(j + 1, i + 1) is true if c is false.

G: If the instruction executed at step t is not an assignment statement, then the B(i, j, t)'s are unchanged. If this instruction is an assignment and the variable on the left-hand side is X, then only X may change. This change is determined by the right-hand side of the instruction.

H: The instruction to be executed at time p(n) is a Success instruction. Hence the computation terminates successfully.

Clearly, if C through H make the above assertions, then Q = C ∧ D ∧ E ∧ F ∧ G ∧ H is satisfiable if and only if there is a successful computation of A on input I. We now give the formulas C through H. While presenting these formulas, we also indicate how each may be transformed into CNF. This transformation increases the length of Q by an amount independent of n (but dependent on w and ℓ). This enables us to show that CNF-satisfiability is NP-complete.

1. Formula C describes the input I. We have

    C = ∧_{1≤i≤p(n), 1≤j≤w} T(i, j, 0)

T(i, j, 0) is B(i, j, 0) if the input calls for bit B(i, j, 0) (i.e., bit j of word i) to be 1. T(i, j, 0) is B̄(i, j, 0) otherwise. Thus, if there is no input, then

    C = ∧_{1≤i≤p(n), 1≤j≤w} B̄(i, j, 0)

Clearly, C is uniquely determined by I and is in CNF. Also, C is satisfiable only by a truth assignment representing the initial values of all variables in A.

2. D = S(1, 1) ∧ S̄(2, 1) ∧ S̄(3, 1) ∧ ... ∧ S̄(ℓ, 1)

Clearly, D is satisfiable only by the assignment S(1, 1) = true and S(i, 1) = false, 2 ≤ i ≤ ℓ. Using our interpretation of S(i, 1), this means that D is true if and only if instruction 1 is the first to be executed. Note that D is in CNF.

3. E = ∧_{1≤t≤p(n)} E_t

Each E_t will assert that there is a unique instruction for step t. We can define E_t to be

    E_t = (S(1, t) ∨ S(2, t) ∨ ... ∨ S(ℓ, t)) ∧ (∧_{1≤j<k≤ℓ} (S̄(j, t) ∨ S̄(k, t)))

One can verify that E_t is true iff exactly one of the S(j, t)'s, 1 ≤ j ≤ ℓ, is true. Also, note that E is in CNF.

4. F = ∧_{1≤i≤ℓ, 1≤t≤p(n)} F_{i,t}

Each F_{i,t} asserts that either instruction i is not the one to be executed at time t or, if it is, then the instruction to be executed at time t + 1 is correctly determined by instruction i. Formally, we have

    F_{i,t} = S̄(i, t) ∨ L

where L is defined as follows:

(a) If instruction i is Success or Failure, then L is S(i, t + 1). Hence the program cannot leave such an instruction.

(b) If instruction i is goto k, then L is S(k, t + 1).

(c) If instruction i is if X then goto k and variable X is represented by word j, then L is ((B(j, 1, t − 1) ∧ S(k, t + 1)) ∨ (B̄(j, 1, t − 1) ∧ S(i + 1, t + 1))). This assumes that bit 1 of X is 1 if and only if X is true.

(d) If instruction i is not any of the above, then L is S(i + 1, t + 1).

The F_{i,t}'s defined in cases (a), (b), and (d) are in CNF. The F_{i,t} in case (c) can be transformed into CNF using the boolean identity a ∨ (b ∧ c) ∨ (d ∧ e) = (a ∨ b ∨ d) ∧ (a ∨ c ∨ d) ∧ (a ∨ b ∨ e) ∧ (a ∨ c ∨ e).

5. G = ∧_{1≤i≤ℓ, 1≤t≤p(n)} G_{i,t}

Each G_{i,t} asserts that at time t either instruction i is not executed or it is and the status of the p(n) words after step t is correct with respect to the status before step t and the changes resulting from instruction i. Formally, we have

    G_{i,t} = S̄(i, t) ∨ M

where M is defined as follows:

(a) If instruction i is a goto, if-then-goto, Success, or Failure statement, then M asserts that the status of the p(n) words is unchanged; that is, B(k, j, t − 1) = B(k, j, t), 1 ≤ k ≤ p(n), 1 ≤ j ≤ w:

    M = ∧_{1≤k≤p(n), 1≤j≤w} ((B(k, j, t − 1) ∧ B(k, j, t)) ∨ (B̄(k, j, t − 1) ∧ B̄(k, j, t)))

In this case, G_{i,t} can be written as

    G_{i,t} = ∧_{1≤k≤p(n), 1≤j≤w} (S̄(i, t) ∨ (B(k, j, t − 1) ∧ B(k, j, t)) ∨ (B̄(k, j, t − 1) ∧ B̄(k, j, t)))

Each clause in G_{i,t} is of the form z ∨ (x ∧ s) ∨ (x̄ ∧ s̄), where z is S̄(i, t), x represents a B( , , t − 1), and s represents a B( , , t). Note that z ∨ (x ∧ s) ∨ (x̄ ∧ s̄) is equivalent to (x ∨ s̄ ∨ z) ∧ (x̄ ∨ s ∨ z). Hence, G_{i,t} can be transformed into CNF easily.

(b) If i is an assignment statement of type 2(a), then M depends on the operator (if any) on the right-hand side. We first describe the form of M for the case in which instruction i is of type Y := V + Z. Let Y, V, and Z be respectively represented in words y, v, and z. We make the simplifying assumption that all numbers are nonnegative. The exercises examine the case in which negative numbers are allowed and 1's complement arithmetic is used. To get a formula asserting that the bits B(y, j, t), 1 ≤ j ≤ w, represent the sum of B(v, j, t − 1) and B(z, j, t − 1), 1 ≤ j ≤ w, we have to make use of w additional bits C(j, t), 1 ≤ j ≤ w. C(j, t) represents the carry from the addition of the bits B(v, j, t − 1), B(z, j, t − 1), and C(j − 1, t), 1 < j ≤ w. C(1, t) is the carry from the addition of B(v, 1, t − 1) and B(z, 1, t − 1). Recall that a bit is 1 iff the corresponding variable is true. Performing a bitwise addition of V and Z, we obtain C(1, t) = B(v, 1, t − 1) ∧ B(z, 1, t − 1) and B(y, 1, t) = B(v, 1, t − 1) ⊕ B(z, 1, t − 1), where ⊕ is the exclusive or operation (a ⊕ b is true iff exactly one of a and b is true). Note that a ⊕ b = (a ∨ b) ∧ ¬(a ∧ b) = (a ∨ b) ∧ (ā ∨ b̄). Hence, the right-hand side of the expression for B(y, 1, t) can be transformed into CNF using this identity. For the other bits of Y, one can verify that

    B(y, j, t) = B(v, j, t − 1) ⊕ (B(z, j, t − 1) ⊕ C(j − 1, t))   and

    C(j, t) = (B(v, j, t − 1) ∧ B(z, j, t − 1)) ∨ (B(v, j, t − 1) ∧ C(j − 1, t)) ∨ (B(z, j, t − 1) ∧ C(j − 1, t))

Finally, we require that C(w, t) = false (i.e., there is no overflow). Let M′ be the and of all the equations for B(y, j, t) and C(j, t), 1 ≤ j ≤ w. M is given by

    M = (∧_{1≤k≤p(n), k≠y, 1≤j≤w} ((B(k, j, t − 1) ∧ B(k, j, t)) ∨ (B̄(k, j, t − 1) ∧ B̄(k, j, t)))) ∧ M′

G_{i,t} can be converted into CNF using the idea of 5(a). This transformation increases the length of G_{i,t} by a constant factor independent of n. We leave it to the reader to figure out what M is when instruction i is either of the forms Y := V; and Y := V O Z;, for O one of −, /, *, <, >, ≤, =, and so on.

When i is an assignment statement of type 2(b) or 2(c), then it is necessary to select the correct array element. Consider an instruction of type 2(b): R[m] := X;. In this case formula M can be written as

    M = W ∧ (∧_{1≤j≤u} Mⱼ)

where u is the dimension of R. Note that because of restriction (7) on algorithm A, u ≤ p(n). W asserts that 1 ≤ m ≤ u. The specification of W is left as an exercise. Each Mⱼ asserts that either m ≠ j or m = j and only the jth element of R changes. Let us assume that the values of X and m are respectively stored in words x and m and that R[1 : u] is stored in words a, a + 1, ..., a + u − 1. Mⱼ is given by

    Mⱼ = (∨_{1≤k≤w} T(m, k, t − 1)) ∨ Z

where T is B if the kth bit in the binary representation of j is 0 and T is B̄ otherwise. Z is defined as

    Z = (∧_{1≤k≤w, 1≤r≤p(n), r≠a+j−1} ((B(r, k, t − 1) ∧ B(r, k, t)) ∨ (B̄(r, k, t − 1) ∧ B̄(r, k, t))))
        ∧ (∧_{1≤k≤w} ((B(a + j − 1, k, t) ∧ B(x, k, t − 1)) ∨ (B̄(a + j − 1, k, t) ∧ B̄(x, k, t − 1))))

Note that the number of literals in M is O(p²(n)). Since j is w bits long, it can represent only numbers smaller than 2^w. Hence, for u ≥ 2^w, we need a different indexing scheme. A simple generalization is to allow multiprecision arithmetic. The index variable j can then use as many words as needed. The number of words used depends on u. At most log(p(n)) words are needed. This calls for a slight change in Mⱼ, but the number of literals in M remains O(p²(n)). There is no need to explicitly incorporate multiprecision arithmetic as by giving the program access to individual words in a multiprecision index j, we can require the program to simulate multiprecision arithmetic.

When i is an instruction of type 2(c), the form of M is similar to that obtained for instructions of type 2(b). Next, we describe how to construct M for the case in which i is of the form Y := Choice(S);, where S is either a set of the form S = {S₁, S₂, ..., Sₖ} or S is of the form r, u. Assume Y is represented by word y. If S is a set, then we define

    M = ∨_{1≤j≤k} Mⱼ

Mⱼ asserts that Y is Sⱼ. This is easily done by choosing Mⱼ = a₁ ∧ a₂ ∧ ... ∧ a_w, where a_l = B(y, l, t) if bit l is 1 in Sⱼ and a_l = B̄(y, l, t) if bit l is zero in Sⱼ. If S is of the form r, u, then M is just the formula


that asserts r ≤ Y ≤ u. This is left as an exercise. In both cases, G_{i,t} can be transformed into CNF and the length of G_{i,t} increased by at most a constant amount.

6. Let i₁, i₂, ..., iₖ be the statement numbers corresponding to success statements in A. H is given by

    H = S(i₁, p(n)) ∨ S(i₂, p(n)) ∨ ... ∨ S(iₖ, p(n))

One can readily verify that Q = C ∧ D ∧ E ∧ F ∧ G ∧ H is satisfiable if and only if the computation of algorithm A with input I terminates successfully. Further, Q can be transformed into CNF as described above. Formula C contains wp(n) literals, D contains ℓ literals, E contains O(ℓ²p(n)) literals, F contains O(ℓp(n)) literals, G contains O(ℓwp³(n)) literals, and H contains at most ℓ literals. The total number of literals appearing in Q is O(ℓwp³(n)) = O(p³(n)) as ℓw is constant. Since there are O(wp²(n) + ℓp(n)) distinct literals in Q, each literal can be written using O(log(wp²(n) + ℓp(n))) = O(log n) bits. The length of Q is therefore O(p³(n) log n) = O(p⁴(n)) as p(n) is at least n. The time to construct Q from A and I is also O(p³(n) log n).

The preceding construction shows that every problem in NP reduces to satisfiability and also to CNF-satisfiability. Hence, if either of these two problems is in P, then NP ⊆ P and so P = NP. Also, since satisfiability is in NP, the construction of a CNF formula Q shows that satisfiability ∝ CNF-satisfiability. This together with the knowledge that CNF-satisfiability is in NP implies that CNF-satisfiability is NP-complete. Note that satisfiability is also NP-complete as satisfiability ∝ satisfiability and satisfiability is in NP.

EXERCISES

1. In conjunction with formula G in the proof of Cook's theorem (Section 11.2), obtain M for the following cases for instruction i. Note that M can contain at most O(p(n)) literals (as a function of n). Obtain M under the assumption that negative numbers are represented in one's complement. Show how the corresponding G_{i,t}'s can be transformed into CNF. The length of G_{i,t} must increase by no more than a constant factor (say w²) during this transformation.

(a) Y := Z;
(b) Y := V − Z;
(c) Y := V + Z;
(d) Y := V * Z;
(e) Y := Choice(0, 1);
(f) Y := Choice(r, u);, where r and u are variables

2. Show how to encode the following instructions as CNF formulas: (a) for and (b) while.

3. Prove or disprove: If there exists a polynomial time algorithm to convert a boolean formula in CNF into an equivalent formula in DNF, then P = NP.

Figure 11.3 Reduction of L₁ to L₂

11.3 NP-HARD GRAPH PROBLEMS

The strategy we adopt to show that a problem L₂ is NP-hard is:

1. Pick a problem L₁ already known to be NP-hard.

2. Show how to obtain (in polynomial deterministic time) an instance I′ of L₂ from any instance I of L₁ such that from the solution of I′ we can determine (in polynomial deterministic time) the solution to instance I of L₁ (see Figure 11.3).

3. Conclude from step (2) that L₁ ∝ L₂.

4. Conclude from steps (1) and (3) and the transitivity of ∝ that L₂ is NP-hard.

For the first few proofs we go through all the above steps. Later proofs explicitly deal only with steps (1) and (2). An NP-hard decision problem L₂ can be shown to be NP-complete by exhibiting a polynomial time nondeterministic algorithm for L₂. All the NP-hard decision problems we deal with here are NP-complete. The construction of polynomial time nondeterministic algorithms for these problems is left as an exercise.


11.3.1 Clique Decision Problem (CDP)

The clique decision problem was introduced in Section 11.1. We show in Theorem 11.2 that CNF-satisfiability ∝ CDP. Using this result, the transitivity of ∝, and the knowledge that satisfiability ∝ CNF-satisfiability (Section 11.2), we can readily establish that satisfiability ∝ CDP. Hence, CDP is NP-hard. Since CDP ∈ NP, CDP is also NP-complete.

Theorem 11.2 CNF-satisfiability ∝ clique decision problem.

Proof: Let F = ∧_{1≤i≤k} Cᵢ be a propositional formula in CNF. Let xᵢ, 1 ≤ i ≤ n, be the variables in F. We show how to construct from F a graph G = (V, E) such that G has a clique of size at least k if and only if F is satisfiable. If the length of F is m, then G is obtainable from F in O(m) time. Hence, if we have a polynomial time algorithm for CDP, then we can obtain a polynomial time algorithm for CNF-satisfiability using this construction.

For any F, G = (V, E) is defined as follows: V = {⟨σ, i⟩ | σ is a literal in clause Cᵢ} and E = {(⟨σ, i⟩, ⟨δ, j⟩) | i ≠ j and σ ≠ δ̄}. A sample construction is given in Example 11.11.

Claim: F is satisfiable if and only if G has a clique of size ≥ k.

Proof of Claim: If F is satisfiable, then there is a set of truth values for xᵢ, 1 ≤ i ≤ n, such that each clause is true with this assignment. Thus, with this assignment there is at least one literal σ in each Cᵢ such that σ is true. Let S = {⟨σ, i⟩ | σ is true in Cᵢ} be a set containing exactly one ⟨σ, i⟩ for each i. Between any two nodes ⟨σ, i⟩ and ⟨δ, j⟩ in S there is an edge in G, since i ≠ j and both σ and δ have the value true. Thus, S forms a clique in G of size k.

Similarly, if G has a clique K = (V′, E′) of size at least k, then let S = {⟨σ, i⟩ | ⟨σ, i⟩ ∈ V′}. Clearly, |S| = k as G has no clique of size more than k. Furthermore, if S′ = {σ | ⟨σ, i⟩ ∈ S for some i}, then S′ cannot contain both a literal δ and its complement δ̄ as there is no edge connecting ⟨δ, i⟩ and ⟨δ̄, j⟩ in G. Hence by setting xᵢ = true if xᵢ ∈ S′ and xᵢ = false if x̄ᵢ ∈ S′ and choosing arbitrary truth values for variables not in S′, we can satisfy all clauses in F. Hence, F is satisfiable iff G has a clique of size at least k. □

Example 11.11 Consider F = (x₁ ∨ x₂ ∨ x₃) ∧ (x̄₁ ∨ x̄₂ ∨ x̄₃). The construction of Theorem 11.2 yields the graph of Figure 11.4. This graph contains six cliques of size two. Consider the clique with vertices {⟨x₁, 1⟩, ⟨x̄₂, 2⟩}. By setting x₁ = true and x̄₂ = true (that is, x₂ = false), F is satisfied. The x₃ may be set either to true or false. □
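The construction of Theorem 11.2 is mechanical and easy to program. The C++ sketch below (an illustration, not from the text; the signed-literal encoding is an assumption) builds the vertices and adjacency matrix of G from a CNF formula; F is then satisfiable iff this graph has a clique of size equal to the number of clauses.

    #include <vector>

    using Clause  = std::vector<int>;      // literal +i is x_i, -i is its negation
    using Formula = std::vector<Clause>;

    struct Vertex { int lit; int clause; };   // the pair <sigma, i>

    // One vertex per literal occurrence; an edge joins occurrences in
    // different clauses whose literals are not complements of each other.
    void buildCliqueGraph(const Formula& f,
                          std::vector<Vertex>& v,
                          std::vector<std::vector<bool>>& adj)
    {
        v.clear();
        for (int i = 0; i < static_cast<int>(f.size()); ++i)
            for (int lit : f[i]) v.push_back({lit, i});
        int n = static_cast<int>(v.size());
        adj.assign(n, std::vector<bool>(n, false));
        for (int a = 0; a < n; ++a)
            for (int b = 0; b < n; ++b)
                if (v[a].clause != v[b].clause && v[a].lit != -v[b].lit)
                    adj[a][b] = true;
    }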


Figure 11.4 A sample graph and satisfiability

11.3.2 Node Cover Decision Problem (NCDP)

A set S ⊆ V is a node cover for a graph G = (V, E) if and only if all edges in E are incident to at least one vertex in S. The size |S| of the cover is the number of vertices in S.

Example 11.12 Consider the graph of Figure 11.5. S = {2, 4} is a node cover of size 2. S = {1, 3, 5} is a node cover of size 3. □

Figure 11.5 A sample graph and node cover

In the node cover decision problem we are given a graph G and an integer k. We are required to determine whether G has a node cover of size at most k.

Theorem 11.3 The clique decision problem ∝ the node cover decision problem.

Proof: Let G = (V, E) and k define an instance of CDP. Assume that |V| = n. We construct a graph G′ such that G′ has a node cover of size at most n − k if and only if G has a clique of size at least k. Graph G′ is given by G′ = (V, Ē), where Ē = {(u, v) | u ∈ V, v ∈ V and (u, v) ∉ E}. The graph G′ is known as the complement of G.

Now, we show that G has a clique of size at least k if and only if G′ has a node cover of size at most n − k. Let K be any clique in G. Since there are no edges in Ē connecting vertices in K, the remaining n − |K| vertices in G′ must cover all edges in Ē. Similarly, if S is a node cover of G′, then V − S must form a complete subgraph in G.

Since G′ can be obtained from G in polynomial time, CDP can be solved in polynomial deterministic time if we have a polynomial time deterministic algorithm for NCDP. □

Example 11.13 Figure 11.6 shows a graph G and its complement G′. In this figure, G′ has a node cover of {4, 5}, since every edge of G′ is incident either on the node 4 or on the node 5. Thus, G has a clique of size 5 − 2 = 3 consisting of the nodes 1, 2, and 3. □

Figure 11.6 A graph and its complement
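The reduction of Theorem 11.3 amounts to complementing the adjacency matrix. A minimal C++ sketch (an illustration, not the book's code):

    #include <vector>

    // Complement of an undirected graph given as an n x n adjacency
    // matrix: (u,v) is an edge of G' exactly when u != v and (u,v) is
    // not an edge of G.  A clique of size k in G corresponds to a node
    // cover of size n - k in G', and vice versa.
    std::vector<std::vector<bool>>
    complementGraph(const std::vector<std::vector<bool>>& g)
    {
        int n = static_cast<int>(g.size());
        std::vector<std::vector<bool>> gc(n, std::vector<bool>(n, false));
        for (int u = 0; u < n; ++u)
            for (int v = 0; v < n; ++v)
                if (u != v) gc[u][v] = !g[u][v];
        return gc;
    }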

Note that since CNF-satisfiability ∝ CDP, CDP ∝ NCDP, and ∝ is transitive, it follows that NCDP is NP-hard. NCDP is also in NP because we can nondeterministically choose a subset C ⊆ V of size k and verify in polynomial time that C is a cover of G. So NCDP is NP-complete.


11.3.3 Chromatic Number Decision Problem (CNDP)

A coloring of a graph G = (V, E) is a function f : V → {1, 2, ..., k} defined for all i ∈ V. If (u, v) ∈ E, then f(u) ≠ f(v). The chromatic number decision problem is to determine whether G has a coloring for a given k.

Example 11.14 A possible 2-coloring of the graph of Figure 11.5 is f(1) = f(3) = f(5) = 1 and f(2) = f(4) = 2. Clearly, this graph has no 1-coloring. □

In proving CNDP to be NP-hard, we shall make use of the NP-hard problem SATY. This is the CNF-satisfiability problem with the restriction that each clause has at most three literals. The reduction CNF-satisfiability ∝ SATY is left as an exercise.

Theorem 11.4 Satisfiability with at most three literals per clause ∝ chromatic number decision problem.

Proof: Let F be a CNF formula having at most three literals per clause and having r clauses C₁, C₂, ..., Cᵣ. Let xᵢ, 1 ≤ i ≤ n, be the n variables in F. We can assume n ≥ 4. If n < 4, then we can determine whether F is satisfiable by trying out all eight possible truth value assignments to x₁, x₂, and x₃. We construct, in polynomial time, a graph G that is n + 1 colorable if and only if F is satisfiable. The graph G = (V, E) is defined by

    V = {x₁, x₂, ..., xₙ} ∪ {x̄₁, x̄₂, ..., x̄ₙ} ∪ {y₁, y₂, ..., yₙ} ∪ {C₁, C₂, ..., Cᵣ}

where y₁, y₂, ..., yₙ are new variables, and

    E = {(xᵢ, x̄ᵢ), 1 ≤ i ≤ n} ∪ {(yᵢ, yⱼ) | i ≠ j} ∪ {(yᵢ, xⱼ) | i ≠ j} ∪ {(yᵢ, x̄ⱼ) | i ≠ j} ∪ {(xᵢ, Cⱼ) | xᵢ ∉ Cⱼ} ∪ {(x̄ᵢ, Cⱼ) | x̄ᵢ ∉ Cⱼ}

To see that G is n + 1 colorable if and only if F is satisfiable, we first observe that the yᵢ's form a complete subgraph on n vertices. Hence, each yᵢ must be assigned a distinct color. Without loss of generality we can assume that in any coloring of G, yᵢ is given the color i. Since yᵢ is also connected to all the xⱼ's and x̄ⱼ's except xᵢ and x̄ᵢ, the color i can be assigned to only xᵢ and x̄ᵢ. However, (xᵢ, x̄ᵢ) ∈ E and so a new color, n + 1, is needed for one of these vertices. The vertex that is assigned the new color n + 1 is called a false vertex. The other vertex is a true vertex. The only way to color G using n + 1 colors is to assign color n + 1 to one of {xᵢ, x̄ᵢ} for each i, 1 ≤ i ≤ n.

Under what conditions can the remaining vertices be colored using no new colors? Since n ≥ 4 and each clause has at most three literals, each Cᵢ is adjacent to a pair of vertices xⱼ, x̄ⱼ for at least one j. Consequently, no Cᵢ can be assigned the color n + 1. Also, no Cᵢ can be assigned a color corresponding to an xⱼ or x̄ⱼ not in clause Cᵢ. The last two statements imply that the only colors that can be assigned to Cᵢ correspond to vertices xⱼ or x̄ⱼ that are in clause Cᵢ and are true vertices. Hence, G is n + 1 colorable if and only if there is a true vertex corresponding to each Cᵢ. So, G is n + 1 colorable iff F is satisfiable. □

11.3.4 Directed Hamiltonian Cycle (DHC) (*)

A directed Hamiltonian cycle in a directed graph G = (V, E) is a directed cycle of length n = |V|. So, the cycle goes through every vertex exactly once and then returns to the starting vertex. The DHC problem is to determine whether G has a directed Hamiltonian cycle.

Example 11.15 1, 2, 3, 4, 5, 1 is a directed Hamiltonian cycle in the graph of Figure 11.7. If the edge (5, 1) is deleted from this graph, then it has no directed Hamiltonian cycle. □

Figure 11.7 A sample graph and Hamiltonian cycle

Theorem 11.5 CNF-satisfiability ∝ directed Hamiltonian cycle.

Proof: Let F be a propositional formula in CNF. We show how to construct a directed graph G such that F is satisfiable if and only if G has a directed Hamiltonian cycle. Since this construction can be carried out in time polynomial in the size of F, it will follow that CNF-satisfiability ∝ DHC. Understanding the construction of G is greatly facilitated by the use of an example. The example we use is F = C₁ ∧ C₂ ∧ C₃ ∧ C₄, where

    C₁ = x₁ ∨ x₂ ∨ x̄₄ ∨ x₅
    C₂ = x̄₁ ∨ x₂ ∨ x̄₃
    C₃ = x₁ ∨ x̄₃ ∨ x₅
    C₄ = x₁ ∨ x₂ ∨ x̄₃ ∨ x̄₄ ∨ x̄₅


Assume that F has r clauses C₁, C₂, ..., Cᵣ and n variables x₁, x₂, ..., xₙ. Draw an array with r rows and 2n columns. Row i denotes clause Cᵢ. Each variable xᵢ is represented by two adjacent columns, one for each of the literals xᵢ and x̄ᵢ. Figure 11.8 shows the array for the example formula. Insert a ⊙ into column xᵢ and row Cⱼ if and only if xᵢ is a literal in Cⱼ. Insert a ⊙ into column x̄ᵢ and row Cⱼ if and only if x̄ᵢ is a literal in Cⱼ. Between each pair of columns xᵢ and x̄ᵢ introduce two vertices uᵢ and vᵢ, uᵢ at the top and vᵢ at the bottom of the column. For each i, draw two chains of edges upward from vᵢ to uᵢ, one connecting together all ⊙s in column xᵢ and the other connecting all ⊙s in column x̄ᵢ (see Figure 11.8). Now, draw edges (uᵢ, vᵢ₊₁), 1 ≤ i < n. Introduce a box [i] at the right end of each row Cᵢ, 1 ≤ i ≤ r. Draw the edges (uₙ, [1]) and ([r], v₁). Draw edges ([i], [i+1]), 1 ≤ i < r (see Figure 11.8).

Figure 11.8 Array structure for the formula in Theorem 11.5

To complete the graph, we replace each ⊙ and [i] by a subgraph. Each ⊙ is replaced by the subgraph of Figure 11.9(a) (of course, unique vertex labelings are needed for each copy of the subgraph). Each box [i] is replaced by the subgraph of Figure 11.10. In this subgraph Aᵢ is an entrance node and Bᵢ an exit node. The edges ([i], [i+1]) referred to earlier are really (Bᵢ, Aᵢ₊₁). Edge (uₙ, [1]) is (uₙ, A₁) and ([r], v₁) is (Bᵣ, v₁). The variable jᵢ is the number of literals in clause Cᵢ. In the subgraph of Figure 11.10 an edge of the type shown in Figure 11.11 indicates a connection to a ⊙ subgraph in row Cᵢ. R_{i,a} is connected to the 1 vertex of the ⊙ and R_{i,a+1} (or Bᵢ if a = jᵢ) is entered from the 3 vertex.


Figure 11.9 The ⊙ subgraph and its insertion into column 2

Figure 11.10 The [i] subgraph

Figure 11.11 A construct in the proof of Theorem 11.5


Figure 11.12 Another construct in the proof of Theorem 11.5

Thus in the ⊙ subgraph (shown in Figure 11.12) of Figure 11.9(b), w₁ and w₃ are the 1 and 3 vertices respectively. The incoming edge is (R_{i,1}, w₁) and the outgoing edge is (w₃, R_{i,2}). This completes the construction of G.

If F is satisfiable, then let S be an assignment of truth values for which F is true. A Hamiltonian cycle for G can start at v₁ and go to u₁, then to v₂, then to u₂, then to v₃, then to u₃, ..., and then to uₙ. In going from vᵢ to uᵢ, this cycle uses the column corresponding to xᵢ if xᵢ is true in S. Otherwise it goes up the column corresponding to x̄ᵢ. From uₙ this cycle goes to A₁ and then through R_{1,1}, R_{1,2}, R_{1,3}, ..., R_{1,j₁} and B₁ to A₂ to ... to v₁. In going from R_{i,a} to R_{i,a+1} in any subgraph [i], a diversion is made to a ⊙ subgraph in row i if and only if the vertices of that ⊙ subgraph are not already on the path from v₁ to R_{i,a}. Note that if Cᵢ has jᵢ literals, then the construction of [i] allows a diversion to at most jᵢ − 1 ⊙ subgraphs. This is adequate as at least one ⊙ subgraph must already have been traversed in row Cᵢ (because at least one such subgraph must correspond to a true literal). So, if F is satisfiable, then G has a directed Hamiltonian cycle.

It remains to show that if G has a directed Hamiltonian cycle, then F is satisfiable. This can be seen by starting at vertex v₁ on any Hamiltonian cycle for G. Because of the construction of the ⊙ and [i] subgraphs, such a cycle must proceed by going up exactly one column of each pair (xᵢ, x̄ᵢ). In addition, this part of the cycle must traverse at least one ⊙ subgraph in each row. Hence the columns used in going from vᵢ to uᵢ, 1 ≤ i ≤ n, define a truth assignment for which F is true.

We conclude that F is satisfiable if and only if G has a Hamiltonian cycle. The theorem now follows from the observation that G can be obtained from F in polynomial time. □

11.3.5 Traveling Salesperson Decision Problem (TSP)

The traveling salesperson problem was introduced in Chapter 5. The corresponding decision problem is to determine whether a complete directed graph G = (V, E) with edge costs c(u, v) has a tour of cost at most M.

Figure 11.13 Graphs representing problems

Theorem 11.6 Directed Hamiltonian cycle (DHC) ∝ the traveling salesperson decision problem (TSP).

Proof: From the directed graph G = (V, E) construct the complete directed graph G′ = (V, E′), E′ = {(i, j) | i ≠ j} and c(i, j) = 1 if (i, j) ∈ E; c(i, j) = 2 if i ≠ j and (i, j) ∉ E. Clearly, G′ has a tour of cost at most n iff G has a directed Hamiltonian cycle. □
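The cost function of this proof is equally easy to tabulate. A minimal C++ sketch (an illustration only):

    #include <vector>

    // Cost matrix of the complete directed graph G' of Theorem 11.6:
    // cost 1 on edges of G, cost 2 on the remaining ordered pairs of
    // distinct vertices.  G has a directed Hamiltonian cycle iff G'
    // has a tour of cost at most n.
    std::vector<std::vector<int>>
    tspCosts(const std::vector<std::vector<bool>>& g)   // adjacency matrix of G
    {
        int n = static_cast<int>(g.size());
        std::vector<std::vector<int>> c(n, std::vector<int>(n, 0));
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (i != j) c[i][j] = g[i][j] ? 1 : 2;
        return c;
    }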

11.3.6 AND/OR Graph Decision Problem (AOG)

Many complex problems can be broken down into a series of subproblems such that the solution of all or some of these results in the solution of the original problem. These subproblems can be broken down further into sub-subproblems, and so on, until the only problems remaining are sufficiently primitive as to be trivially solvable. This breaking down of a complex problem into several subproblems can be represented by a directed graphlike structure in which nodes represent problems and descendents of nodes represent the subproblems associated with them.

Example 11.16 The graph of Figure 11.13(a) represents a problem A that can be solved by solving either both the subproblems B and C or the single subproblem D or E. □

Groups of subproblems that must be solved in order to imply a solution to the parent node are joined together by an arc going across the respective edges (as the arc across the edges (A, B) and (A, C)). By introducing dummy nodes in Figure 11.13(b), all nodes can be made to be such that their solution requires either all descendents to be solved or only one descendent to be solved. Nodes of the first type are called AND nodes and those of the latter type OR nodes. Nodes A and A″ of Figure 11.13(b) are OR nodes whereas node A′ is an AND node. The AND nodes are drawn with an arc across all edges leaving the node. Nodes with no descendents are called terminal. Terminal nodes represent primitive problems and are marked either solvable or not solvable. Solvable terminal nodes are represented by rectangles. An AND/OR graph need not always be a tree.

Breaking down a problem into several subproblems is known as problem reduction. Problem reduction has been used on such problems as theorem proving, symbolic integration, and analysis of industrial schedules. When problem reduction is used, two different problems may generate a common subproblem. In this case it may be desirable to have only one node representing the subproblem (this would imply that the subproblem is to be solved only once). Figure 11.14 shows two AND/OR graphs for cases in which this is done.

Figure 11.14 Two AND/OR graphs that are not trees

Note that the graph is no longer a tree. Furthermore, such graphs may have directed cycles as in Figure 11.14(b). The presence of a directed cycle does not in itself imply the unsolvability of the problem. In fact, problem A of Figure 11.14(b) can be solved by solving the primitive problems G, H, and I. This leads to the solution of D and E and hence of B and C. A solution graph is a subgraph of solvable nodes that shows that the problem is solved. Possible solution graphs for the graphs of Figure 11.14 are shown by heavy edges.

Let us assume that there is a cost associated with each edge in the AND/OR graph. The cost of a solution graph H of an AND/OR graph G is the sum of the costs of the edges in H. The AND/OR graph decision problem (AOG) is to determine whether G has a solution graph of cost at most k, for k a given input.

Example 11.17 Consider the directed graph of Figure 11.15. The problem to be solved is P₁. To do this, one can solve node P₂, P₃, or P₇, as P₁ is an OR node. The cost incurred is then either 2, 2, or 8 (i.e., cost in addition to that of solving one of P₂, P₃, or P₇). To solve P₂, both P₄ and P₅ have to be solved, as P₂ is an AND node. The total cost to do this is 2. To solve P₃, we can solve either P₅ or P₆. The minimum cost to do this is 1. Node P₇ is free. In this example, then, the optimal way to solve P₁ is to solve P₆ first, then P₃, and finally P₁. The total cost for this solution is 3. □

Figure 11.15 AND/OR graph

Theorem 11.7 CNF-satisfiability ∝ the AND/OR graph decision problem.

Proof: Let P be a propositional formula in CNF. We show how to transform a formula P in CNF into an AND/OR graph such that the AND/OR graph so obtained has a certain minimum cost solution if and only if P is satisfiable. Let

    P = ∧ Cᵢ,   Cᵢ = ∨ lⱼ

where the lⱼ's are literals. The variables of P, V(P), are x₁, x₂, ..., xₙ. The AND/OR graph will have nodes as follows:

1. There is a special node S with no incoming arcs. This node represents the problem to be solved.


2. The node S is an AND node with descendent nodes P, x₁, x₂, ..., xₙ.

3. Each node xᵢ represents the corresponding variable xᵢ in the formula P. Each xᵢ is an OR node with two descendents denoted Txᵢ and Fxᵢ respectively. If Txᵢ is solved, then this will correspond to assigning a truth value of true to the variable xᵢ. Solving node Fxᵢ will correspond to assigning a truth value of false to xᵢ.

4. The node P represents the formula P and is an AND node. It has k descendents C₁, C₂, ..., Cₖ. Node Cᵢ corresponds to the clause Cᵢ in the formula P. The nodes Cᵢ are OR nodes.

5. Each node of type Txᵢ or Fxᵢ has exactly one descendent node that is terminal (i.e., has no edges leaving it). These terminal nodes are denoted v₁, v₂, ..., v₂ₙ.

To complete the construction of the AND/OR graph, the following edges and costs are added:

1. From each node Cᵢ an edge (Cᵢ, Txⱼ) is added if xⱼ occurs in clause Cᵢ. An edge (Cᵢ, Fxⱼ) is added if x̄ⱼ occurs in clause Cᵢ. This is done for all variables xⱼ appearing in the clause Cᵢ. Clause Cᵢ is designated an OR node.

2. Edges from nodes of type Txᵢ or Fxᵢ to their respective terminal nodes are assigned a weight, or cost, of 1.

3. All other edges have a cost of 0.

In order to solve S, each of the nodes P, x₁, x₂, ..., xₙ must be solved. Solving nodes x₁, x₂, ..., xₙ costs n. To solve P, we must solve all the nodes C₁, C₂, ..., Cₖ. The cost of a node Cᵢ is at most 1. However, if one of its descendent nodes was solved while solving the nodes x₁, x₂, ..., xₙ, then the additional cost to solve Cᵢ is 0, as the edges to its descendent nodes have cost 0 and one of its descendents has already been solved. That is, a node Cᵢ can be solved at no cost if one of the literals occurring in the clause Cᵢ has been assigned a value of true. From this it follows that the entire graph (that is, node S) can be solved at a cost n if there is some assignment of truth values to the xᵢ's such that at least one literal in each clause is true under that assignment, i.e., if the formula P is satisfiable. If P is not satisfiable, then the cost is more than n.

We have now shown how to construct an AND/OR graph from a formula P such that the AND/OR graph so constructed has a solution of cost n if and only if P is satisfiable. Otherwise the cost is more than n. The construction clearly takes only polynomial time. This completes the proof. □


Example 11.18 Consider the formula

    P = (x₁ ∨ x₂ ∨ x₃) ∧ (x₁ ∨ x₂ ∨ x̄₃) ∧ (x₁ ∨ x₂);   V(P) = {x₁, x₂, x₃};   n = 3

Figure 11.16 shows the AND/OR graph obtained by applying the construction of Theorem 11.7.

The nodes Tx₁, Tx₂, and Tx₃ can be solved at a total cost of 3. The node P costs nothing extra. The node S can then be solved by solving all its descendent nodes and the nodes Tx₁, Tx₂, and Tx₃. The total cost for this solution is 3 (which is n). Assigning the truth value of true to the variables of P results in P's being true. □

EXERCISES

1. Let SATY be the problem of determining whether a propositional formula in CNF having at most three literals per clause is satisfiable. Show that CNF-satisfiability ∝ SATY. (Hint: Show how to write a clause with more than three literals as the and of several clauses each containing at most three literals. For this you have to introduce some new variables. Any assignment that satisfies the original clause must satisfy all the new clauses created.)

2. Let SAT3 be similar to SATY (Exercise 1) except that each clause has exactly three literals. Show that SATY ∝ SAT3.

3. Let F be a propositional formula in CNF. Two literals x and y in F are compatible if and only if they are not in the same clause and x ≠ ȳ. The literals x and y are incompatible if and only if x and y are not compatible. Let SATINC be the problem of determining whether a formula F in which each literal is incompatible with at most three other literals is satisfiable. Show that SAT3 ∝ SATINC.

4. Let 3-NODE COVER be the node cover decision problem of Section 11.3 restricted to graphs of degree 3. Show that SATINC ∝ 3-NODE COVER (see Exercise 3).

5. [Feedback node set]

(a) Let G = (V, E) be a directed graph. Let S ⊆ V be a subset of vertices such that the deletion of S and all edges incident to vertices in S results in a graph G′ with no directed cycles. Such an S is a feedback node set. The size of S is the number of vertices in S. The feedback node set decision problem (FNS) is to determine for a given input k whether G has a feedback node set of size at most k. Show that the node cover decision problem ∝ FNS.


Figure 11.16 AND/OR graph for Example 11.18 (AND nodes joined by arc; all other nodes are OR)


(b) Write a polynomial time nondeterministic algorithm for FNS.

6. [Feedback arc set] Let G = (V, E) be a directed graph. S ⊆ E is a feedback arc set of G if and only if every directed cycle in G contains an edge in S. The feedback arc set decision problem (FAS) is to determine whether G has a feedback arc set of size at most k.

(a) Show that the node cover decision problem ∝ FAS.

(b) Write a polynomial time nondeterministic algorithm for FAS.

7. The feedback node set optimization problem is to find a minimum feedback node set (see Exercise 5). Show that this problem reduces to FNS.

8. Show that the feedback arc set minimization problem reduces to FAS (Exercise 6).

9. [Hamiltonian cycle] Let UHC be the problem of determining whether in any given undirected graph G, there exists an undirected cycle going through each vertex exactly once and returning to the start vertex. Show that DHC ∝ UHC (DHC is defined in Section 11.3).

10. Show UHC ∝ CNF-satisfiability.

11. Show DHC ∝ CNF-satisfiability.

12. [Hamiltonian path] An i to j Hamiltonian path in graph G is a path from vertex i to vertex j that includes each vertex exactly once. Show that UHC is reducible to the problem of determining whether G has an i to j Hamiltonian path.

13. [Minimum equivalent graph] A directed graph G = (V, E) is an equivalent graph of the directed graph G′ = (V, E′) if and only if E ⊆ E′ and the transitive closures of G and G′ are the same. G is a minimum equivalent graph if and only if |E| is minimum among all equivalent graphs of G′. The minimum equivalent graph decision problem (MEG) is to determine whether G′ has a minimum equivalent graph with |E| ≤ k, where k is some given input.

(a) Show that DHC ∝ MEG.

(b) Write a nondeterministic polynomial time algorithm for MEG.

14. [Clique cover] The clique cover decision problem (CC) is to determine whether G is the union of l or fewer cliques. Show that the chromatic number decision problem ∝ CC.


15. [Set cover] Let F = {Sᵢ} be a finite family of sets. Let T ⊆ F be a subset of F. T is a cover of F iff

    ∪_{Sᵢ∈T} Sᵢ = ∪_{Sᵢ∈F} Sᵢ

The set cover decision problem is to determine whether F has a cover T containing no more than k sets. Show that the node cover decision problem is reducible to this problem.

16. [Exact cover] Let F = {Sᵢ} be as in Exercise 15. T ⊆ F is an exact cover of F iff T is a cover of F and the sets in T are pairwise disjoint. Show that the chromatic number decision problem reduces to the problem of determining whether F has an exact cover.

17.Show that SAT3 oc EXACT COVER(seeExercise16).18.[Hittingset] Let F beas in Exercise16.The hittingsetproblemis to

determinewhetherthereexistsa setH suchSj \302\243 F. Show that exactcover oc hittingset.determinewhetherthereexistsa setH suchthat \\H f~l Sj\\

= 1for all

19.[Tautology] A propositionalformula is a tautology if and only if it istrue for allpossibletruth assignmentsto its variables.The tautologyproblemis to determinewhethera DNF formula is a tautology.

(a) Show that CNF-satisfiability ocDNF tautology.(b) Write a polynomial timenondeterministicalgorithmTAUT(.F)

that terminatessuccessfullyif and only if F is not a tautology.

20.[Minimum booleanform] Let the lengthof a propositionalformula beequaltothe sum of the numberof literalsin eachclause.Two formulasF and G on variables equivalent if for all assignmentsto xi,...,xn,F is true if and only if G is true. Show that decidingwhetherF has an equivalent formula of lengthno morethan k is MV-hard. (Hint ShowDNF tautology reducestothis problem.)

11.4A/T-HARDSCHEDULINGPROBLEMSTo prove the resultsof this section,we need to use the AfV-havd problemcalledpartition.This problemrequiresus todecidewhethera givenmultiset A = {ai,a,2,\342\226\240\342\226\240\342\226\240,an}of n positiveintegershas a partitionP such thatJ2iePai = ^Zigpai- We can show this problemis ATP-hard by first showingthe sum of subsetsproblem(Chapter7) to beAfV-hard. Recallthat in thesum of subsetsproblemwe have todeterminewhetherA = {a\\,a,2,\342\226\240\342\226\240

\342\226\240, an}has a subsetS that sumstoa given integerM.

Page 544: Sahni

534 CHAPTER11.MV-HARD AND HV-COMPLETEPROBLEMS

Theorem11.8Exactcover ocsum of subsets.

Proof:Theexactcover problemis shownATP-hard in Section11.3,Exercise16.In this problemwe aregiven a family of setsF = {Si,S2,\342\226\240\342\226\240

\342\226\240, Sk}andare requiredto determinewhetherthereis a subsetTCFofdisjointsetssuchthat

(J Si = {J Si = {ui,u2,...,un}Si\302\243T S,eF

Promany given instanceof this problem,constructthe sum of subsetsproblem A = {a\\,...,ajt}with aj \342\200\224 X)i<i<n\302\243ji(k

+lY~l,whereeji= 1if Ui \342\202\254 Sjand eji= 0 otherwise,and M = J2o<i<n{k+1)*= {(k+1)\"\342\200\224 l)/k.Clearly,F has an exactcover if and only if A = {a\\,..., a^} has a subsetwith sumM. SinceA and M can be constructedfrom F in polynomial time,exactcover ocsum of subsets. \342\226\241

Theorem11.9Sumof subsetsocpartition.

Proof:Let A = {a\\,...,an}and M definean instanceof the sum of subsetsproblem.Constructthe set B = {61,62 \342\200\242\342\200\242\342\200\242, ^+2}with bi = ai,1< i < n,bn+i = M + 1,and bn+2 = (Xa<i<na\302\253)

+ 1\342\200\224 M. B has a partitionif andonly if A has a subsetwith sum M. SinceB can be obtainedfrom A andM in polynomial time,sum of subsetsoc partition. \342\226\241

One can easily show partitionoc 0/1-knapsackand partitionoc jobsequencing with deadlines.Hence,theseproblemsarealsoAfV-havd.

11.4.1SchedulingIdenticalProcessorsLet Pi,1< i < m, be m identicalprocessors(or machines).ThePi could,for example,be lineprintersin a computeroutput room.Let Ji,1< i < n,ben jobs.JobJi requiresti processingtime.A scheduleS is an assignmentof jobstoprocessors.ForeachjobJi,S specifiesthe timeintervals and theprocessor(s)on which thisjobis tobeprocessed.A jobcannotbeprocessedby morethan oneprocessorat any given time.Let fi bethe timeat whichtheprocessingof jobJi is completed.Themean finish time(MFT)of scheduleS is

MFT(S)= - Y, fi1<1<71

Let Wi be a weight associatedwith eachjobJj. The weightedmean finishtime(WMFT) of scheduleS is

WMFT(S)= - V wifil<i<n

Page 545: Sahni

11.4.MV-HARD SCHEDULINGPROBLEMS 535

Let Ti be the timeat which Pi finishesprocessingalljobs(orjobsegments)assignedto it.Thefinish time(FT) of S is

FT(5)= max{T}l<l<m

ScheduleS is a nonpreemptivescheduleif and only if eachjobJi is processedcontinuously from start to finish on the sameprocessor.In a preemptivescheduleeachjobneednot beprocessedcontinuously tocompletionon oneprocessor.

At this point it is worth notingthe similarity betweenthe optimaltapestorageproblemof Section4.6and nonpreemptiveschedules.Mean retrievaltime,weightedmeanretrievaltime,and maximumretrievaltimerespectivelycorrespondto meanfinish time,weightedmeanfinish time,and finish time.Minimumfinish timeschedulescanthereforebeobtainedusingthe algorithmdevelopedin Section4.6.Obtainingminimum weightedmeanfinish timeandminimum finish timenonpreemptiveschedulesis ATP-hard.

Theorem11.10Partitionocminimum finish timenonpreemptiveschedule.

Proof:We prove this for m = 2. The extensionto m > 2 is trivial. Leto-i, 1 < i < n, be an instanceof the partitionproblem.Define n jobswith processingrequirementst% = ai,1< i < n. Thereis a nonpreemptiveschedulefor this setof jobson two processorswith finish timeat most

\302\243) tj/2iff thereis a partitionof the a^s. \342\226\241

Example11.19Considerthe following input to the partitionproblem:a\\ = 2, a<i = 5,a% = 6,a^ = 7, and

a\302\247

= 10.The correspondingminimum finish timenonpreemptivescheduleproblemhas the input t\\ =2,^2=5,^3 = 6, \302\2434

= 7, and\302\2435

= 10.Thereis a nonpreemptiveschedulefor thissetof jobswith finish time15:P\\ takesthe jobs\302\2432

and\302\2435; P2 takesthe jobs

t\\,t%, and\302\2434.

This solutionyields a solutionfor the partitionproblemalso:{a2,a5},{ai,a3,a4}. \342\226\241

Theorem11.11Partitionoc minimum WMFT nonpreemptiveschedule.

Proof:Onceagainwe prove this for m = 2 only. Theextensionto m > 2is trivial. Let ai,1< i < n, define an instanceof the partitionproblem.Constructa two-processorschedulingproblemwith n jobsand Wi = U = a,i,1< i < n. Forthis set of jobsthereis a nonpreemptivescheduleS withweighted mean flow timeat most l/2^af+ l/4(Sai)2if and oruy if theai'shave a partition.To seethis, let the weightsand timesof jobson Pi be(wi,i\"i),..., (wk,ik)and on P2 be (u>i, t\\), \342\226\240\342\226\240

\342\226\240, {wi,ti).Assume this is the

Page 546: Sahni

536 CHAPTER11.MV-HARD AND MV-COMPLETEPROBLEMS

orderin which the jobsareprocessedon theirrespectiveprocessors.Then,for this scheduleSwe have

n*WMFT(S)= ttfiii +u}2(*i+*2)+ ---+\302\253>*(*!+ \342\200\242\342\200\242\342\200\242**)

+wltl+w2(ii+I2)H \\-wi(ii-\\ ti)

= 5E^2+ ^(E^)2+ i(E^-E^)2Thus,n * WMFT(5')> (1/2)\302\243 w}+ (1/4)(Ew()2.This value is obtainableiff the lOf's (and so alsothe a^s) have a partition. \342\226\241

Example11.20Consideragainthe partitionproblema\\ = 2,0,2= 5,a% =6,a4= 7, and a5 = 10.Here,^\302\253i

= W* + 52 + 62 + 72 + 102)= 107,Ea;= 30,and \\{Y.ai)2= 225.Thus, 1/2\302\243

a?+1/4(Ea{)2= 107+225=332.The correspondingminimum WMFT nonpreemptivescheduleproblemhas the input wi = ti = ai for 1< i < 5. If we assignthe jobs\302\2432

and\302\2435

toPi and the remainingjobsto P2,n * WFMT(S')=5*5+ 10(5+ 10)+ 2 * 2 +6(2+ 6) +7(2+ 6 + 7) = 332Thesamealsoyields a solutionto the partitionproblem. \342\226\241

11.4.2FlowShopSchedulingWe shall use the flow shop terminologydevelopedin Section5.10.Whenm = 2, minimum finish timeschedulescan be obtainedin 0(nlogn) timeif n jobsare to be scheduled.When m = 3, obtainingminimum finishtimeschedules(whetherpreemptiveor nonpreemptive)is .ATP-hard. Forthe caseof nonpreemptiveschedulesthis is easy to see(Exercise2). Weprove the resultfor preemptiveschedules.Theproofwe give is alsovalidfor the nonpreemptivecase.However,a much simplerproofexistsfor thenonpreemptivecase.

Theorem11.12Partitionocthe minimum finish timepreemptiveflow shopschedule(m > 2).Proof:We use only threeprocessors.Let A = {0,1,0,2,...,an}define aninstanceof the partitionproblem.Constructthe following preemptiveflowshop instanceFS,with n + 2 jobs,m = 3 machines,and at most2 nonzerotasksperjob:

h,i= (H\\ h,i= 0; *3,i = a,i,l<i<n

h,n+i = T/2; \302\2432,71+1

= T; \302\2433^+1= 0

\302\2431,71+2

= 0; \302\2432,71+2

= T; \302\2433,71+2

= T/2

Page 547: Sahni

11.4.NV-HARDSCHEDULINGPROBLEMS 537

n

where T = ^ atl

We now show that the precedingflow shopinstancehas a preemptiveschedule with finish timeat most 2Tif and only if A has a partition.

1.If A has a partitionu, then there is a nonpreemptiveschedulewithfinish time2T.Onesuchscheduleis shown in Figure11.17.

2. If A has no partition,then allpreemptiveschedulesfor FS must havea finish timegreaterthan 2T. This can be shown by contradiction.Assume that thereis preemptiveschedulefor FS with finish timeatmost 2T.We make the following observationsregardingthis schedule:

(a) Task\302\243i,n+i

must finish by timeT as\302\2432,71+1

= T and cannotstartuntil

\302\243i,n+ifinishes.

(b) Task\302\2433^+2

cannot start beforeT units of timehave elapsedash,n+2 = T.

Observation(a) impliesthat only T/2 of the first T timeunits are free onprocessorone.Let V be the setof indicesof taskscompletedon processor1by timeT (excludingtask

\302\243i,n+i). Then,

X>i,i<T/2iev

as A has no partition.Hence

Z) %>T/2l<i<n

Theprocessingof jobsnot includedin V cannotcommenceon processor3until after timeT sincetheirprocessor1processingis not completeduntilafter T. This togetherwith observation(b) impliesthat the totalamountofprocessingleft for processor3 at timeT is

h,n+2 + Z *3,i >Tigv

l<i<n

The schedulelengthmust thereforebemorethan 2T. \342\226\241

Page 548: Sahni

538 CHAPTER11.NV-HARDAND HV'-COMPLETEPROBLEMS

{tij\\ieu} h,n + l

tl,n+2

{\302\2433 ,\342\200\242

1 ieu}

{tij\\i\302\243u}

h,n + l

h,n +2 {t3J\\ieu}

0 T/2 T 37V2 XT

Figure11.17A possibleschedule

11.4.3Job ShopSchedulingA jobshop, like a flow shop,has m different processors.Then jobstobescheduledrequirethe completionof severaltasks.Thetimeof the j'thtaskfor jobJi is tkjj. Task j is to be performedon processorP^. Thetasksfor any jobJ{are to be carriedout in the order1,2,3,...,and soon.Taskj cannot beginuntil task j \342\200\224 1 (if j > 1) has beencompleted.Notethatit is quite possiblefor a jobto have many tasksthat are tobe performedon the sameprocessor.In a nonpreemptiveschedule,a task oncebegunis processedwithout interruptionuntil it is completed.Thedefinitions ofFT(S')and MFT(S')extendto this problemin a natural way. Obtainingeithera minimum finish timepreemptivescheduleora minimum finish timenonpreemptivescheduleis A/''P-hardeven when m = 2. The prooffor thenonpreemptivecaseis very simple(usepartition).We presentthe proofforthe preemptivecase.This proofwill alsobe valid for the nonpreemptivecasebut will not be the simplestprooffor this case.Theorem11.13Partitionoc minimum finish time preemptivejobshopschedule(m > 1).Proof:We use only two processors.Let A = {ai,a,2,...,an}define aninstanceof the partitionproblem.Constructthe following jobshopinstanceJS,with n + 1jobsand m = 2 processors.

Jobs1,...,n : t\\^i = t2,i,2=<H for 1<

\302\253< n

Jobn + 1:\302\2432,71+1,1

=\302\2431,71+1,2

=\302\2432,71+1,3

=\302\2431,71+1,4

= T/2n

where T = ^jaj1

We showthat the jobshopproblemhas a preemptiveschedulewith finishtimeat most2Tif and only if S has a partition.

Page 549: Sahni

11.4.NT'-HARDSCHEDULINGPROBLEMS 539

1.If A has a partition,u then thereis a schedulewith finish timeIT (seeFigure11.18).

2. If A has no partition,then allschedulesfor JSmust have a finish timegreaterthan 2T.To seethis, assumethat thereis a scheduleS for JSwith finish timeat mostTT.Then,jobn + 1must bescheduledas inFigure11.18.Also, therecan beno idletimeon eitherP\\ or P^. LetR be the set of jobsscheduledon Pi in the interval [0,T/2].Let R!be the subsetof R representingjobswhose first task is completedonPi in this interval.Sincethe a^s have no partition,YljeR'ti,j,i<T/2.Consequently,YljeR'*2j,2 < T/2.Sinceonly the secondtasksof jobsin R' can be scheduledon P2 m the interval [T/2,T],it follows thatthere is someidletimeon Pi in this interval. Hence,S must havefinish timegreaterthan 2T. \342\226\241

{/,,,-,lieu]

h.n+l.l

tl.n+1,2

{?2,i,2 l\302\253eu}

{/,,,-,lieu]

h,n+\\,3

t l,n + l,4

{t2iii2\\i\302\243u}

0 T/2 T 3T/2 2T

Figure11.18Another schedule

EXERCISES1.[Jobsequencing]Showthat the jobsequencingwith deadlinesproblem

(Section8.1.4)is AfV-hard.

2. Show that partitionoc the minimum finish timenonpreemptivethree-processorflow shopschedule.Useonly onejobthat has threenonzerotasks.All otherjobshave only onenonzerotask.

3. Show that partitionoc the minimum finish timenonpreemptivetwo-processorjobshopschedule.Useonly onejobthat has threenonzerotasks.All otherjobshave only onenonzerotask.

4. Let Ji,..., Jn benjobs.Jobi has a processingtimet{ and a deadlined{. Jobi is not available for processinguntil timer{. Show thatdecidingwhetheralln jobscan be processedon onemachinewithoutviolatingany deadlineis A^'P-hard. (Hint: Usepartition.)

Page 550: Sahni

540 CHAPTER11.MV-HARD ANDHV'-COMPLETEPROBLEMS

5. Let Ji, 1< i < n, be n jobsas in Exercise4. Assume r^ = 0,1<i < n. Let fi be the finish timeof Ji in a one-processorscheduleS. The tardinessTj of Jj is max {0,/j\342\200\224 <ij}.Let ioj,1 <

\302\253< n,

be nonnegativeweights associatedwith the Ji's.Thetotal weightedtardinessis J2wiTi- Show that finding a scheduleminimizing WiTiis J\\fV-hard. (Hint:Usepartition).

6. Let Ji, 1< i < n, be n jobs.JobJ; has a processingtimeof U. Itsprocessingcannotbeginuntil timefj. Let Wi be a weight associatedwith Ji.Let /{ be the finish timeof Ji in a one-processorscheduleS.Show that finding a one-processorschedulethat minimizesJ2Wifi isAfV-h&rd.

7. Show that the problemof obtainingoptimalfinish timepreemptiveschedulesfor a two-processorflow shop is A^'P-hard when jobsarereleasedat two different timesR\\ and Ra- Jobsreleasedat Ri cannotbescheduledbeforeRi.

11.5AfP-HARDCODEGENERATIONPROBLEMS

Thefunction of a compileris to translateprogramswritten in somesourcelanguageinto an equivalentassemblylanguageormachinelanguageprogram.Thus,the C++compileron the Sparc10translatesC++programsinto themachinelanguageof this machine.We lookat the problemof translatingarithmeticexpressionsin a languagesuch as C++into assembly languagecode.The translationclearly dependson the particularassembly language(andhencemachine)beingused.To begin,we assumea very simplemachinemodel.We callthis modelmachineA. This machinehas only one registercalledthe accumulator.All arithmetichas to beperformedin this register.If0 representsa binary operatorsuchas +,\342\200\224,*, and /, then the left operandof 0 must be in the accumulator.Forsimplicity, we restrictourselvestothesefour operators.The discussioneasily generalizesto otheroperators.Therelevant assembly languageinstructionsare:

LOAD X loadaccumulatorwith contentsof memory locationX.STOREX storecontentsof accumulatorinto memory locationX.OPX OP may be ADD, SUB,MPY, orDIV.

The instructionOP X computesthe operatorOPusingthe contentsofthe accumulatoras the left operandand that of memory locationX as theright operand.As an example,considerthe arithmeticexpression(a+6)/(c+d). Two possibleassembly languageversionsof this expressionaregiven in

Figure11.19.Tl and T2aretemporarystorageareasin memory. In both

Page 551: Sahni

11.5.AfV-HARD CODEGENERATIONPROBLEMS 541

casesthe resultis left in the accumulator.Code(a) is two instructionslongerthan code(b). If eachinstructiontakesthe sameamount of time,then code(b) will take25% lesstimethan code(a). Forthe expressions(a+b)/(c+d)and the given machineA, it is easy to seethat code(b) is optimal.

LOAD aADD bSTORE TlLOAD cADD dSTORE T2LOAD TlDIV T2

LOAD cADD dSTORE TlLOAD aADD bDIV Tl

(a) (b)

Figure11.19Two possiblecodesfor (a +b)/(c+ d)

Definition11.7A translationof an expressionE into the machineorassembly languageof a given machineis optimalif and only if it has a minimumnumberof instructions. \342\226\241

Definition11.8A binary operator\302\251 is commutative in the domainD iff

a \302\251 b = b \302\251 a for alla and b in D. \342\226\241

MachineA can be generalizedto another machineB. MachineB hasN > 1registersin whicharithmeticcanbeperformed.Therearefour typesof machineinstructionsfor B:

1. LOAD M,R,2. STORE M,R3. OP R1,M,R24. OP R1,R2,R3

Thesefour instructiontypes performthe following functions:

1.LOAD M,R placesthe contentsof memory locationM into registerR,1<R<N.

2. STOREM, R storesthe contentsof registerR, 1< R < JV, intomemory

locationM.

Page 552: Sahni

542 CHAPTER11.AfV-HARD ANDMV-COMPLETEPROBLEMS

3.OP Rl, M, R2 computescontents(i?l)OP contents(M)and placesthe result in registerR2. OP is any binary operator(for example,+,

\342\200\224, *, or /);Rl and R2 areregisters;Rl may equalR2;M is a memorylocation.

4. OPRl,R2,R3 is similarto instructiontype (3). HereRl,R2,and R3areregisters.Someor all of theseregistersmay be the same.

In comparingthe two machinemodelsA and B,we note that when N =1,instructionsof types (1),(2) and (3) for modelB are the sameas thecorrespondinginstructionsfor modelA. Instructionsof type (4) only allowtrivial operationslike a+a,a\342\200\224 a,a*a,and a/ato beperformedwithout anadditionalmemoryaccess.Thisdoesnot changethe numberof instructionsin the optimalcodesfor A and Bwhen N = 1.Hence,modelA is in a senseidenticaltomodelB when N = 1.FormodelB,we seethat the optimalcodefor a given expressionE may be different for different values of N.Figure11.20shows the optimalcodefor the expression(a+b)/(c*d).Twocasesareconsidered,N = 1and N = 2.Notethat when N = 1,onestorehas to bemadewhereaswhen JV = 2,no storesareneeded.TheregistersarelabeledRl and R2. RegisterTl is a temporarystoragelocationin memory.

LOADMPYSTORELOADADDDIV

c,RlRl, d,RlR1,T1a,RlRl,b,RlR1,T1,R1

LOADMPYLOADADDDIV

c,RlRl,d,Rla,R2R2,b,R2R2,R1,R1

(a) N = 1 (b) N = 2

Figure11.20Optimalcodesfor N = 1and N = 2

Given an expressionE, the first questionwe ask is:can E be evaluatedwithout any STOREs?A closelyrelatedquestionis:what is the minimumnumberof registersneededto evaluateE without any stores?We show thatthis problemis AfP-hard.

11.5.1CodeGenerationwith CommonSubexpressionsWhen arithmeticexpressionshave commonsubexpressions,they canberepresented by a directedacyclicgraph (dag). Every internalnode(nodewith

Page 553: Sahni

11.5.AfV-HARD CODEGENERATIONPROBLEMS 543

nonzeroout-degree)in the dagrepresentsan operator.Assumingtheexpression containsonly binary operators,eachinternalnodeP hasout-degreetwo.The two nodesadjacentfrom P arecalledthe left and right childrenof Prespectively.The childrenof P are the rootsof the dags for the left andright operandsof P. Node P is the parent of its children.Figure11.21showssomeexpressionsand their dag representations.

Definition11.9A leaf is a nodewith out-degreezero.A level-onenodeisa nodeboth of whosechildrenareleaves.A sharednodeis a nodewith morethan oneparent.A leaf dag is a dag in whichallsharednodesareleaves.Alevel-onedag is a dag in which allsharednodesare level-onenodes. \342\226\241

a+(b+a*c) (a+b)*(a+b+c) (a +b)*c/((a+b)*c-d)

Figure11.21Expressionsand their dags

Example11.21Thedag of Figure11.21(a)is a leaf dag.Figure11.21(b)is a level-onedag.Figure11.21(c)is neithera leafdag nor a level-onedag.

\342\226\241

A leaf dag resultsfrom an arithmeticexpressionin which the onlycommon subexpressionsaresimplevariablesorconstants.A level-onedagresultsfrom an expressionin which the only commonsubexpressionsareof the forma Ob,wherea and b aresimplevariablesorconstantsand Q is an operator.

Theproblemof generatingoptimalcodefor level-onedags is AAP-hardeven when the machinefor which codeis beinggeneratedhas only oneregister. Determiningthe minimum numberof registersneededto evaluateadag with no STOREsis alsoA/T-hard.

Page 554: Sahni

544 CHAPTER11.MV-HARD AND AfV'-COMPLETEPROBLEMS

Example11.22Theoptimalcodesfor the dag of Figure11.21(b)for one-and two-registermachinesis given in Figure11.22.

The minimum numberof registersneededto evaluatethis dag withoutany STOREsis two. \342\226\241

LOADADDSTOREADDSTORELOADMUL

a,RlRl,b,RlT1,R1Rl,c,RlT2,R1T1,R1R1,T2,Rl

LOADADDADDMUL

a,RlRl,b,RlRl,c,R2R1,R2,R1

(a) (b)

Figure11.22Optimalcodesfor one-and two-registermachines

To prove the abovestatements,we usethe feedbacknodeset(FNS)problem that is shown to be J\\f V-hard in Exercise5 (Section11.3).

FNS:Given a directedgraph G = (V, E) and an integerk, determinewhetherthereexistsa subsetV' of verticesV' C.Vand \\V'\\ < k suchthatthe graphH= (V -V,E -{(u,v)\\u G V or v G V'})obtainedfrom G bydeletingallverticesin V' and alledgesincidentto a vertex in V containsno directedcycles.

We explicitlyprove only that generatingoptimalcodeis J\\fV-hard.Using the constructionof this proof, we can alsoshow that determiningtheminimum numberof registersneededto evaluatea dag with no STOREsisJ\\f V-hardas well.Theproofassumesthat expressionscancontaincommutative operatorsand that sharednodesmay becomputedonly once.It is easilyextendedto allow recomputationof sharednodes.Usingan ideadue to R.Sethi,the proofis easilyextendedto the casein whichonly noncommutativeoperationsareallowed (seeExercise1).Theorem11.14FNSa the optimalcodegenerationfor level-onedagsona one-registermachine.

Proof:Let G,k be an instanceof FNS.Let n be the numberof verticesinG. We constructa dag A with the property that the optimalcodefor theexpressioncorrespondingto A has at most n + k LOADs if and only if Ghas a feedbacknodesetof sizeat mostR.

Page 555: Sahni

11.5.AfV-HARD CODEGENERATIONPROBLEMS 545

Thedag A consistsof threekinds of nodes:leaf nodes,chainnodes,andtreenodes.All chainand treenodesareinternalnodesrepresentingcommutative operators(for example,+). Leaf nodesrepresentdistinctvariables.We use dv to denotethe out-degreeof vertexv of G.Correspondingtoeachvertex v of G, there is a directedchainof chainnodesv\\, V2, \342\226\240\342\200\242

\342\200\242, Vdv+\\ mA. NodeVdv+\\ is the head nodeof the chainfor v and is the parent of twoleafnodesvl and vr (seeExample11.23and Figure11.23).Vertexv\\ isthe tail of the chain. Fromeachof the chainnodescorrespondingto vertexv, exceptthe headnode,thereis onedirectededgeto the headnodeof oneof the chainscorrespondingto a vertexw suchthat (v,w) is an edgein G.Eachsuchedgegoestoa distincthead.Note that asa resultof the additionof theseedges,eachchainnodenow has out-degreetwo. Sinceeachchainnoderepresentsa commutativeoperator,it doesnot matterwhichof its twochildrenis regardedas the left child.

At this point we have a dag in which the tailof every chainhas in-degreezero.We now introducetreenodesto combineall the tailssothat we areleftwith only one node(the root) with in-degreezero.SinceG has n vertices,we needn \342\200\224 1treenodes(note that every binary tree with n \342\200\224 1 internalnodeshas n externalnodes).Thesen \342\200\224 1nodesareconnectedtogethertoform a binary tree (any binary treewith n \342\200\224 1nodeswill do).In placeofthe externalnodeswe connectthe tailsof the n chains(seeFigure11.23(b)).This yields a dag A correspondingto an arithmeticexpression.

It is easy toseethat every optimalcodefor A will have exactlyn LOADsof leafnodes.Also, therewill beexactlyoneinstructionof type Q for everychainnodeand treenode (we assumethat a sharednode is computedonlyonce).Hence,the only variable is the numberof LOADs and STOREsofchain and tree nodes.If G has no directedcycles,then its verticescanbearrangedin topologicalorder(vertexu precedesvertexv in a topologicalorderingonly if thereis no directedpath from u to v in G).Let v-\\,V2,...,vnbe a topologicalorderingof the verticesin G. The expressionA can becomputedusingno LOADs of chainand treenodesby first computingallnodeson the chainfor vn and storingthe result of the tail node.Next,allnodeson the chainfor wn-i canbecomputed.In addition,we can computeany nodeson the path from the tail for vn^\\ to the root for which bothoperandsare available. Finally, one result needsto be stored.Next,thechain for wn_2 can be computed.Again, we can computeallnodeson thepath from this chaintail to the rootfor which both operandsareavailable.Continuingin this way, the entireexpressioncan becomputed.

If G containsat leastone cycle v\\, v?,..., Wj,Wi, then every codefor Amust contain at leastone LOAD of a chain nodeon a chain for one of\302\2531, V2, \342\226\240

\342\226\240., Uj. Further, if noneof theseverticesis on any othercycle,thenall their chainnodescan be computedusingonly one loadof a chainnode.This argumentis readily generalizedto show that if the sizeof a minimumfeedbacknodeset is p,then every optimalcodefor A containsexactlyn +p

Page 556: Sahni

546 CHAPTER11.AfV-HARD AND AfV'-COMPLETEPROBLEMS

LOADs. Thep LOADs correspondto a combinationof tail nodescorresponding to a minimum feedbacknodeset and the siblingsof thesetailnodes.If we had usednoncommutativeoperatorsfor chainnodesand madeeachsuccessoron a chain the left childof its parent,then the p LOADswould correspondto the tailsof the chainsof any minimum feedbackset.Furthermore,if the optimalcodecontainsp LOADsof chainnodes,then Ghas a feedbacknodesetof sizep. \342\226\241

Example11.23Figure11.23(b)shows the dag A correspondingto thegraph G of Figure11.23(a).The set {r,s} is a minimum feedbacknodeset for G.Theoperatorin eachchainand treenodecan be assumedtobe+.Eachcodefor A hasa loadcorrespondingto oneof {pl,Pr),(ql,Qr),\342\200\242\342\200\242

\342\200\242,

and {ul,ur).TheexpressionA canbecomputedusingonly two additionalLOADsby computingnodesin the orderr4, S2, q2, Qi, P2, Pi,c, \302\2533, U2, \302\253i,

\302\2432, t\\i e, si,r%, r2,n, d, b, and a.Note that a LOADis neededtocomputesiand alsoto computer%. \342\226\241

11.5.2ImplementingParallelAssignmentInstructionsA parallelassignmentinstructionhas the format (vi,1)2,\342\226\240\342\226\240

\342\226\240, vn) := (ei,e2'\342\200\236...,en)wherethe Uj'saredistinctvariablenamesand theej'sareexpressions.Thesemanticsof this statementis that the value of Wj is updatedto be thevalue of the expressionej,1< i < n.Thevalue of the expressione% is tobecomputedusingthe values the variablesin ej have beforethis instructionisexecuted.

Example11.24 1.(A,B) :=(5,C);is equivalent toA :=B;B :=C;.2. {A,B):={B,A);is equivalent to T :=A; A :=B;B :=T;.3. {A,B):= {A + B,A-B);is equivalent to Tl := A; T2 :=B;A :=

Tl+T2;5:=Tl-T2;andalsotoTl:=A; A :=A+B;B :=T1-B;.\342\226\241

As the above exampleindicates,it may be necessaryto storesomeofthe uj'sin temporary locationswhen executinga parallelassignment.Thesestoresareneededonly when someof the Uj'sappearin the expressionsej,1 < j < n. A variable V{ is referencedby expressionej if and only if V{

appearsin ej. It shouldbe clearthat only referencedvariablesneedto becopiedinto temporarylocations.Further,parts(2)and (3)of Example11.24show that not allreferencedvariablesneedto copied.

An implementationof a parallelassignmentstatementis a sequenceofinstructionsof types Tj =

Wj and Wj=

e\\, wheree'j is obtainedfrom ej byreplacingall occurrencesof a V{ that have already beenupdatedwithreferences to the temporary locationsin which the old values of V{ has been

Page 557: Sahni

11.5.MV-HARD CODEGENERATIONPROBLEMS 547

a) Graph G

Pl Pr Ql <]r rL rR SL SR tL tR UL UR

b) CorrespondingdagA

Figure11.23A graphand its correspondingdag

Page 558: Sahni

548 CHAPTER11.AfV-HARD AND AfV'-COMPLETEPROBLEMS

saved. Let R = (r(l),..., T(n)) be a permutationof (1,2,...,n). Then Ris a realizationof an assignmentstatement.It specifiesthe orderin whichstatementsof type V{ =

e\\ appearin an implementationof a parallelassignment statement.The orderis vT^ = e'jjv,!)^)= e'/2N, and soon. Theimplementationalsohas statementsof type Tj = vi interspersed.Withoutlossof generalitywe can assumethat the statementTj = Uj (if it appearsin the implementation)immediately precedesthe statementV{ =

e\\. Hence,a realizationcompletelycharacterizesan implementation.The minimumnumberof instructionsof type Tj =

v\302\261

for any given realizationis easy todetermine.Thisnumberis the costof the realization.ThecostC(R) of arealization R is the numberof vi that arereferencedby an ej that correspondsto an instructionVj

= e' that appearsafter the instructionV{ =e\\.

Example11.25Considerthe statement(A,B,C):={D,A+ B,A-B)\\.The3!= 6 different realizationsand their costsaregiven in Figure11.24.The realization3,2,1correspondingto the implementationC= A \342\200\224 B;B=A +B;A= D;needsno temporary stores{C(R)= 0). \342\226\241

R C{R)1,2,3 21,3, 2 22,13 22, 3,1 13,1,2 13,2,1 0

Figure11.24Realizationfor Example11.25

An optimalrealizationfor a parallelassignmentstatementis one withminimum cost.When the expressionsejareallvariable namesorconstants,an optimalrealizationcan be found in lineartime(O(n)). When the e,areallowedto beexpressionswith operatorsthen finding an optimalrealizationis AfP-Hard.We prove this statementusingthe feedbacknodesetproblem.

Theorem11.15FNSa the minimum-costrealization.

Proof:Let G = (V, E) be any n-vertexdirectedgraph. ConstructtheparallelassignmentstatementP : (vi,V2,..\342\226\240, vn) :=(ei,e2,..\342\226\240,en),wherethe Uj'scorrespondto the n verticesin V and ej is the expressionVix + vi2 +\342\226\240\342\226\240\342\226\240+ Uj..The set {v^,Uj2,...,Uj.}is the set of verticesadjacentfrom Uj

Page 559: Sahni

11.5.AfV-HARD CODEGENERATIONPROBLEMS 549

(that is,(wj, Uj-) G E(G),1< / < j). This constructionrequiresat most0(n2)time.

Let U be any feedbacknodeset for G. Let G'= {V,E')= {V -U,E\342\200\224 {(x,y)\\xG U or y G J7})be the graphobtainedby deletingvertexset Uand alledgesincidentto verticesin U. From the definitionof a feedbacknodeset,it follows that G'is acyclic.So,the verticesin V \342\200\224 U can bearrangedin a sequencesi,S2,\342\226\240

\342\226\240., sm, wherem = \\V\342\200\224

U\\ and E' containsno edge(sj,Si)for any i and j,l<i<j<m.Hence,an implementationof P inwhich variablescorrespondingto verticesin U arefirst storedin temporarylocationsfollowed by the instructionsUj = e' correspondingto vi G U,followed by the correspondinginstructionsfor si,S2,\342\226\240\342\226\240

\342\226\240, sm (in that order),will be a correctimplementation.(Notethat e' is ej with all occurrencesofVi G U replacedby the correspondingtemporary location.)TherealizationR correspondingto this implementationhas C(R) =

\\U\\. Hence,if G hasa feedbacknodeset of sizeat most k, then P has an optimalrealizationofcostat most k.

SupposeP has a realizationR of costk. Let U be the setof k variablesthat have to be storedin temporarylocationsand let R = (qi,q2,\342\226\240\342\226\240\342\226\240,Qn)-Fromthe definitionof C(R) it follows that no eQi referencesa vq. with j < iunlessvqj G U. Hence,the deletionof verticesin U from G leavesG acyclic.Thus, U definesa feedbacknodesetof sizek for G.

G hasafeedbacknodesetof sizeat most A; if and only if P hasa realizationof costat most k. Thus we can solve the feedbacknodeset probleminpolynomial timeif we have a polynomial timealgorithmthat determinesaminimum-costrealization. \342\226\241

EXERCISES

1. (a) How shouldthe proofof Theorem11.14be modified to permitrecomputationof sharednodes?

(b) [R. Sethi]Modify the proofof Theorem11.14so that it holdsfor level-onedagsrepresentingexpressionsin which all operatorsare noncommutative.{Hint Designatethe successorvertexona chainto be the left childof its predecessorvertexand use then+1nodebinary treeof Figure11.25to connecttogetherthe tailnodesof the n chains.)

(c) Show that optimalcodegenerationis AAP-hard for leafdagsonan infinite registermachine.(Hint: UseFNS.)

Page 560: Sahni

550 CHAPTER11.AfV-HARD AND AfV'-COMPLETEPROBLEMS

n +1 nodes

connectionstotail nodes

leaf node

Figure11.25Figurefor Exercise1

11.6SOME SIMPLIFIEDA/T-HARDPROBLEMS

Oncewe have shown a problemL to be J\\fV-hard, we would be inclinedto dismissthe possibility that L can be solved in deterministicpolynomialtime. At this point, however, we can naturally ask the question:Can asuitably restrictedversion(i.e.,somesubclass)of an J\\f V-hardproblembesolved in deterministicpolynomial time? It shouldbe easy to seethat byplacingenoughrestrictionson any J\\fV-hard problem(or by definingasufficiently smallsubclass),we can arriveat a polynomiallysolvableproblem.As examples,considerthe following:

1.CNF-satisfiability with at most threeliteralsper clauseis J\\fV-haid.If eachclauseis restrictedto have at most two literals,then CNF-satisfiability is polynomiallysolvable.

2. Generatingoptimalcodefor a parallelassignmentstatementis J\\fV-

hard. However,if the expressionse^ are restrictedto be simplevariables, then optimalcodecan begeneratedin polynomial time.

3. Generatingoptimalcodefor level-onedags is A/^P-hard, but optimalcodefor treescan begeneratedin polynomial time.

Page 561: Sahni

11.6.SOMESIMPLIFIEDAfV-HARD PROBLEMS 551

4. Determiningwhethera planargraphis threecolorableis J\\fV-hard. Todeterminewhetherit is two colorable,we only have to seewhetheritis bipartite.

Sinceit isvery unlikelythat J\\f V-haidproblemsarepolynomiallysolvable,it is important to determinethe weakest restrictionsunder which we cansolvea problemin polynomial time.

To narrow the gap betweensubclassesfor which polynomial timealgorithms are known and those for which such algorithmsare not known, itis desirableto obtainas stronga setof restrictionsunder which a problemremainsJ\\fV-hard orAfP-complete.

We statewithout proofthe severestrestrictionsunderwhichcertainproblems areknown to be J\\fV-hard orAfP-complete.We statethesesimplifiedor restrictedproblemsas decisionproblems.Foreachproblemwe specifyonly the input and the decisionto bemade.

Theorem11.16The following decisionproblemsareJ\\fV-complete.

1.NodecoverInput:An undirectedgraph G with nodedegreeat most 3 and anintegerk.Decision:DoesG have a nodecover of sizeat most hi

2.PlanarNodeCoverInput:A planarundirectedgraphG with nodedegreeat most 6 andan integerk.Decision:DoesG have a nodecover of sizeat most kl

3. ColorabilityInput:A planarundirectedgraph G with nodedegreeat most four.Decision:IsG threecolorable?

4. UndirectedHamiltonianCycleInput:An undirectedgraph G with nodedegreeat most three.Decision:DoesG have a Hamiltoniancycle?

5. PlanarUndirectedHamiltonianCycleInput:A planarundirectedgraph.Decision:DoesG have a Hamiltoniancycle?

6.PlanarDirectedHamiltonianPathInput:A planardirectedgraph G with in-degreeat most 3 and out-degreeat most 4.Decision:DoesG have a directedHamiltonianpath?

Page 562: Sahni

552 CHAPTER11.AfV-HARD ANDMV-COMPLETEPROBLEMS

7. Unary InputPartitionInput:Positiveintegersaj, 1< i <m, n, and B suchthat

}j a,i = nB, \342\200\224 < a% <\342\200\224-,

1< i < m, m = 3nl<i<m

Input is in unary notation.Decision:Istherea partition{Ai,.,.,An} of the a^s suchthat eachA{ containsthreeelementsand

yj a = B, 1< i < na\302\243Ai

8.Unary Flow ShowInput:Task timesin unary notationand an integerT.Decision:Istherea two-processornonpreemptiveschedulewith meanfinish timeat most T?

9. SimpleMax CutInput:A graph G = (V, E) and an integerk.Decision:DoesV have a subset Vi such that thereare at leastkedges(u,v) G E with u G V\\ and v $_ V\\1

10.SAT2Input:A propositionalformula F in CNF.Eachclausein F has atmost two literals.An integerk.Decision:Can at leastk clausesof F be satisfied?

11.MinimumEdgeDeletionBipartiteSubgraphInput:An undirectedgraph G and an integerk.Decision:Can G be madebipartiteby the deletionof at most kedges?

12.MinimumNodeDeletionBipartiteSubgraphInput:An undirectedgraph G and an integerk.Decision:Can G be madebipartiteby the deletionof at most kvertices

13.MinimumCut intoEqual-SizedSubsetsInput:An undirectedgraph G = (V,E),two distinguishedverticessand t, and a positiveintegerW.Decision:Is therea partitionV = V\\ U V2, V\\ n V2 =

<t>, \\V\\\\=

IV2I, s G Vi,t G V2, and \\{(u,v)\\u G Vuv G V2 and (it, v) G E}\\ < W1

Page 563: Sahni

11.7.REFERENCESAND READINGS 553

14.SimpleOptimalLinearArrangementInput:An undirectedgraph G = (V, E) and an integerk. \\V\\

= n.Decision:Is therea one-to-onefunction / : V \342\200\224> {1,2,...,n}suchthat

(u,v)eE

11.7REFERENCES AND READINGS

A comprehensivetreatmentof AfP-hard and J\\fV-complete problemscanbe found in Computersand intractability: A Guide to the Theory of NP-Completeness,by M. Garey and D.Johnson,W. H.Freeman,1979.

Our proofsatisfiability oc directedHamiltoniancycle is due to P.Hermann. Theproofsatisfiability a AND/ORGraphsand the proofused inthe textfor Theorem11.11weregiven by S.Sahni.Theorem11.11is due toJ.Bruno,E.G.Coffman, Jr.,and R. Sethi.

Theorems11.12and 11.13are due to T. Gonzalezand S.Sahni. Theproofof Theorem11.13is due to D.Nassimi.Theproofof Theorem11.14is due toA. Aho, S.Johnson,and J.Ullman.

The fact that the codegenerationproblemfor one-registermachinesisA/^P-hard wasfirst provedby J.Brunoand R. Sethi.Theresultin theirpaperis strongerthan Theorem11.14as it applieseven to expressionscontainingno commutativeoperators.Theorem11.15is due to R. Sethi.

The resultsstatedin Section11.6were presentedby D.Johnsonand L.Stockmeyer.

Foradditionalmaterialon complexitytheory seeComplexityTheory, byC.H.Papadimitriou,Addison-Wesley,1994.

11.8ADDITIONALEXERCISES

1.[Circuitrealization]Let C bea circuitmadeup of and,or,and notgates.Let x\\,...,xn bethe inputsand / the output.Showthatdeciding

whetherf{x\\,...,xn) = F{x\\,...,xn), whereF is a propositionalformula, is A/\"P-hard.

2. Show that determiningwhetherC is a minimum circuit(i.e.,has aminimum numberof gates,seeExercise1) realizinga formula F isAAP-hard.

Page 564: Sahni

554 CHAPTER11.MV-HARD AND MV-COMPLETEPROBLEMS

3. [0/1knapsack]Showthat Partitiona the 0/1knapsackdecisionproblem.

4. [Quadraticprogramming]Show that finding the maximumof afunction f(x\\,...,xn) subjectto the linearconstraintsJ2neqj<naijxj\342\200\224

bi, 1< i < n, and Xi > 0, 1< i < n is J\\fV-hard. Thefunction / isrestrictedto be of the form J2cix1+ J2(kxi-

5. Let G = (V, E) bea graph.Let w(i,j)bea weightingfunction for theedgesof G.A cut of G is a subsetSCV.The weight of a cut is

A max-cutis a cut of maximumweight. Show that the problemofdeterminingthe weight of a max-cutis J\\f V-h&rd.

6. [Plant location]Let Si,1< i < n, be n possiblesitesat which plantscan be located.At eachsiteat most one plant can be located.If aplant is locatedat siteSi,then a fixed costFi is incurred.This is thecostof setting up the plant. A plant locatedat Si has a maximumproductioncapacity of Cj.Therearen destinationsDi,1< i < m, towhichproductshave to beshipped.Thedemandat Di is d^, 1< i < in.Theper-unitcostof shippinga productfrom sitei todestinationj isCij. A destinationcan be suppliedfrom many plants. Define yi = 0if no plant is locatedat site i and iji = 1otherwise.Let Xij be thenumberof units of the productshippedfrom Si to Dj.Then,the totalcostis

^2 FiXji +XIIIcijxij,^2 xij = dji and ^2 xij < CiVii i j i j

All x^ arenonnegativeintegers.We assumethat J2Cij> J2di- Showthat finding yi and that the totalcostis minimizedis J\\f V-haid.

7. [Concentratorlocation]Thisproblemis very similartothe plantlocation problem(Exercise6).Theonly differenceis that eachdestinationmay besuppliedby only oneplant.When this restrictionis imposed,the plant locationproblembecomesthe concentratorlocationproblemarisingin computernetwork design.Thedestinationsrepresentcomputer terminals.Theplantsrepresentthe concentrationof informationfrom the terminalswhich they supply. Show that the concentratorlocation problemis AAP-hard undereachof the following conditions:

(a) n = 2, C\\ = C2, and F\\ = F2.(Hint: UsePartition.)(b) Fi/Ci= Fi+i/Ci+i,l< i < n, and di = 1.(Hint: Useexact

cover.)

Page 565: Sahni

11.8.ADDITIONALEXERCISES 555

8. [Steinertrees]Let T bea treeand R a subsetof the verticesin T, Letw(i,j)be the weight of edge(i,j)in T. If (i,j)is not an edgein T,then w(i,j)= oo.A Steinertree is a subtreeof T that includesthevertexset R. It may includeotherverticestoo.Itscostis the sumof the weights of the edgesin it. Show that finding a minimum-costSteinertreeis AfV-h&rd.

9. Assume that P is a parallelassignmentstatement(v\\,...,vn) := (e\\,...,en);,whereeache$ isa simplevariableand the Uj'saredistinct.Forconvenience,assumethat the distinctvariablesin P are (\302\253i,... ,vm)with m > n and that E =

(ii,\302\2532,\342\200\242\342\200\242\342\200\242,in) is a set of indicessuch that

eb =Vii\" Then write an 0(n)timealgorithm to find an optimal

realizationfor P.10.Let F = {Sj}bea finite family of sets.Let T < F bea subfamily of

F.Thesizeof T,\\T\\, is the numberof setsin T. Let Si and Sjbe twosetsin T. Also

\302\243j

and 5jaredisjointif and only if SzD5j= 4>.T is adisjointsubsetof F if and only if every two setsin T aredisjoint.Thesetpackingproblemis to determinea disjointsubfamilyT of maximumsize.Show that cliquea setpacking.

11.Show that the following decisionproblemis A/'P-complete.Input:Positiveintegern;Wi, 1< i < n, and M.Decision:Do thereexistnonnegativeintegersXi >0,1< i < n; suchthat

Y^ WiXi = M\\<i<n

12.An independentsetin an undirectedgraph G(V,E) is a setof verticesno two of which areconnected.Given a graph G and an integerk, theproblemis todeterminewhetherG has an independentset of sizek.Show that this problemis AfV-complete.

13.Given an undirectedgraph G(V,E) and an integerk, the goal is todeterminewhetherG has a cliqueof sizek andan independentsetofsizek. Show that this problemis AT'-complete.

14.Isthe following problemin VI If yes, give a polynomialtimealgorithm;if not, show it is A/\"\"P-complete.

Input are an undirectedgraph G = (V,E) of degree1000and an integerk(< \\V\\). DecidewhetherG has a cliqueofsizek.

15.Given an integerm x n matrixA and an integerin x 1vectorb, the0-1integerprogrammingproblemaskswhetherthereis an integern x 1vectorx with elementsin the the set {0,1}such that Ax < b. Provethat 0-1integerprogrammingis A/vP-complete.

Page 566: Sahni

556 CHAPTER11.AfV-HARD AND AfV-COMPLETEPROBLEMS

16.Input are finite setsAi, A2, - --,Am and B\\1B21\342\226\240\342\226\240

\342\226\240, Bn. The setintersection problem is to decidewhether there is a set T such that\\T n Ai\\ > 1for i = 1,2,...,m, and |Tn -B,-|< 1for j = 1,2,...,n.Show that the set intersectionproblemis AA'P-complete.

17.We say an undirectedgraphG(V,E) is k colorableif eachnodeof G canbe labeledwith an integerin the range [1,A;], such that no two nodesconnectedby an edgehave the samelabel.Isthe following probleminVI If yes, presenta polynomial timealgorithmfor its solution.If not,show that it is AA'P-complete.

Given an undirectedacyclicgraphG(V,E) and an integerk,decidewhetherG is k colorable.

18.Isthe following problemin VI If yes, presenta polynomial timealgorithm; if not, show that it is AA'P-complete.

Input are an undirectedgraph G(V,E)and an integer1<k < \\V\\. Also, assumethat the degreeof eachnodein G is|V| \342\200\224 O(l).Theproblemis tocheckwhetherG hasa vertexcover of sizek.

19.Assume that thereis a polynomial timealgorithmCLQto solve theCLIQUEdecisionproblem.

(a) Show how to use CLQto determinethe maximumcliquesizeofa given graph in polynomial time.

(b) Show how to use CLQto find a maximumcliquein polynomialtime.

Page 567: Sahni

Chapter12

APPROXIMATIONALGORITHMS

12.1INTRODUCTION

In the precedingchapterwe saw strongevidenceto support the claimthatno J\\fV-ha,rd problemcanbesolvedin polynomialtime.Yet, many J\\fV-haidoptimizationproblemshave greatpracticalimportanceand it is desirabletosolve largeinstancesof theseproblemsin a reasonableamount of time.Thebest-knownalgorithmsfor AA'P-hard problemshave a worst-casecomplexitythat is exponentialin the numberof inputs. Although the resultsof the lastchaptermay favor abandoningthe quest for polynomial timealgorithms,thereis stillplenty of roomfor improvement in an exponentialalgorithm.We can lookfor algorithmswith subexponentialcomplexity, say 2n/c (forc > 1),2^,or nlogn.In the exercisesof Section5.7an 0(2n/2) algorithmfor the knapsackproblemwas developed.This algorithmcan alsobe usedfor the partition,sum of subsets,and exactcover problems.0(2n/3) timealgorithmsfor the max-clique,max-independentset,and minimum nodecover problemsareknown (seethe referencesat the endof thischapter).Thediscoveryof a subexponentialalgorithmfor an J\\fV-haxd problemincreasesthe maximumproblemsizethat can be solved.However,for largeprobleminstances,even an 0(n4) algorithmrequirestoomuch computationaleffort.Clearly, what is neededis an algorithmof low polynomial complexity (say0(n)or 0{n2)).

Theuse of heuristicsin an existingalgorithmmay enableit toquicklysolve a largeinstanceof a problemprovided the heuristicworks on thatinstance.Thiswasclearlydemonstratedin the chaptersonbacktrackingandbranch-and-bound.A heuristic,however, doesnot work equally effectivelyon all probleminstances.Exponentialtimealgorithms,even coupledwithheuristics,stillshow exponentialbehavioron someset of inputs. If we are

557

Page 568: Sahni

558 CHAPTER12.APPROXIMATIONALGORITHMS

toproducean algorithmof low polynomialcomplexitytosolvean AA'P-hardoptimizationproblem,then it is necessarytorelaxthe meaningof \"solve.\" Inthischapterwe discusstwo relaxationsof the meaningof \"solve.\" In the firstwe removethe requirementthat the algorithmthat solves the optimizationproblemP must always generatean optimalsolution.This requirementisreplacedby the requirementthat the algorithmfor P must alwaysgeneratea feasiblesolutionwith value closeto the value of an optimalsolution.Afeasiblesolutionwith value closeto the value of an optimalsolutionis calledan approximatesolution.An approximationalgorithm for P is an algorithmthat generatesapproximatesolutionsfor P.

Although at first one may discountthe virtue of an approximatesolution, oneshouldbearin mind that often, the data for the probleminstancebeingsolved is only known approximately.Hence,an approximatesolution(providedits value is sufficiently closetothat of an exactsolution)may beno lessmeaningfulthan an exactsolution.In the caseof AA'P-hardproblems, approximatesolutionshave addedimportanceas exactsolutions(i.e.,optimalsolutions)may not beobtainablein a feasibleamount of computingtime. An approximatesolutionmay be all one can get usinga reasonableamountof computingtime.

In the secondrelaxationwe lookfor an algorithmfor P that almost alwaysgeneratesoptimalsolutions.Algorithms with this property arecalledprobabilistically goodalgorithms.Theseare consideredin Section12.6.In theremainderof this sectionwe developthe terminologyto beusedin discussingapproximationalgorithms.

Let P be a problemsuch as the knapsackor the travelingsalespersonproblem.Let / bean instanceof problemP and let F*(I) bethe value of anoptimalsolutionto/. An approximationalgorithmgenerallyproducesafeasible solutionto I whosevalue F(I)is lessthan (greaterthan) F*(I)if P is amaximization(minimization)problem.Severalcategoriesof approximationalgorithmscan bedenned.

Let A bean algorithmthat generatesa feasiblesolutiontoevery instance/ of a problemP. Let F*(I)be the value of an optimalsolutiontoI andletF(I)be the value of the feasiblesolutiongeneratedby A.

Definition12.1A is an absolute approximationalgorithmfor problemPif and only if for every instanceI of P, \\F*(I) \342\200\224 F(I)\\ < k for someconstantk. \342\226\241

Definition12.2A is an f (n)-approximatealgorithmif and only if for everyinstanceJ of sizen, \\F*(I) -F(I)\\/F*(I)< /(n) for F*(I)>0. \342\226\241

Definition12.3An (.-approximatealgorithmis an /(n)-approximatealgorithm for which /(n)< e for someconstante. \342\226\241

Page 569: Sahni

12.1.INTRODUCTION 559

Note that for maximizationproblems,\\F*(I) -F(I)\\/F*(I)< 1forevery feasiblesolutionto7. Hence,for maximizationproblemswe normallyrequiree < 1for an algorithmto bejudgede-approximate.In the next fewdefinitionswe consideralgorithmsA(e) with e an input to A.

Definition12.4A(e) is an approximationschemeif and only if for everygiven e > 0 and probleminstance7, A{e)generatesa feasiblesolutionsuchthat \\F*(I) -F{I)\\/F*(I)< e. Again, we assumeF*(I)>0. \342\226\241

Definition12.5An approximationschemeis a polynomial timeapproximation schemeif and only if for every fixed e > 0,it has a computingtimethat is polynomial in the problemsize. \342\226\241

Definition12.6An approximationschemewhosecomputingtimeisapolynomial both in the problemsizeand in 1/eis a fully polynomial timeapproximation scheme. \342\226\241

Clearly,the most desirablekind of approximationalgorithmis anabsoluteapproximationalgorithm.Unfortunately, for most A/'T-'-hardproblemsit canbeshownthat fast algorithmsof this type existonly if V = MV.Surprisingly,this statementis true even for the existenceof /(n)-approximatealgorithmsfor certainA/'T-'-hardproblems.

Example12.1Considerthe knapsackinstancen = 3,m = 100,{pi,P2,P3}= {20,10,19},and {wi,W2,wz}= {65,20,35}.The solution (2:1,2:2,2:3)=(1,1,1) is not a feasiblesolutionas J2Wi%i > fn- Thesolution(x\\, X2, \302\2433)

=(1,0,1)is an optimalsolution.Itsvalue, J2Pixiils 39.Hence,F*(I)= 39 forthis instance.The solution(2:1,2:2,2:3)= (1,1,0)is suboptimal.Itsvalue isJ2Pi%i = 30.Thisis a candidatefor a possibleoutput from an approximationalgorithm.In fact, every feasiblesolution (in this caseall three-element0/1vectorsother than (1,1,1)are feasible)is a candidatefor output byan approximationalgorithm.If the solution(1,1,0) is generatedby anapproximationalgorithmon this instance,then F(I)= 30,\\F*(I) \342\200\224 F(I)\\=9,and F*(I)-F(I)\\/F*(I)= 0.3. \342\226\241

Example12.2Considerthe following approximationalgorithmfor the 0/1knapsackproblem:assumethe objectsare in nonincreasingorderof pi/w{.If objecti fits, then set Xi \342\200\224 1;otherwiseset Xi = 0.When this algorithmis usedon the instance(p\\,p2)= (100,20),(101,102)= (4,1),and m = 4,the objectsare consideredin the order1,2and the result is (2:1,2:2)=(1,0)which is optimal. Now, considerthe instancen = 2, (pi,p2)=(2,r),(wi,W2) = (1,?*),and m = r. When r > 2,the optimalsolutionis(x\\, X2) \342\200\224 (0,1).Itsvalue, F*(I),isr. Thesolutiongeneratedby theapproximation algorithmis (2:1,2:2)= (1,0).Itsvalue, F(I),is 2.Hence,\\F*(I)\342\200\224

Page 570: Sahni

560 CHAPTER12.APPROXIMATIONALGORITHMS

F(I)\\= r \342\200\224 2.Our approximationalgorithmis not an absoluteapproximation algorithmas thereexistsno constant k such that \\F*{I) \342\200\224 F(I)\\ < k

for all instancesJ. Furthermore,note that \\F*(I) -F(I)\\/F*(I)= 1-2/r.This approaches1as r becomeslarge. \\F*(I) -F(I)\\/F*(I)< 1 forevery feasiblesolutiontoevery knapsackinstance.Sincethe above algorithmalways generatesa feasiblesolution,it is a 1-approximatealgorithm.It is,however, not an e-approximatealgorithmfor any e,e < 1. \342\226\241

Correspondingto the notions of an absoluteapproximationalgorithmand /(n)-approximatealgorithm,we can defineapproximationproblemsinthe obviousway. So,we can speakof fc-absoluteapproximateproblemsand/(n)-approximateproblems.The .5-approximateknapsackproblemis tofind any 0/1feasiblesolutionwith \\F*(I)-F(I)\\/F*(I)< .5.

Approximationalgorithmsare usually just heuristicsor rules that onthe surfacelooklike they might solve the optimizationproblemexactly.However, they do not. Instead,they only guaranteeto generatefeasiblesolutionswith values within someconstant or somefactor of the optimalvalue.Beingheuristicin nature, thesealgorithmsarevery much dependenton the individual problembeingsolved.

EXERCISE1.The following A/\"\"P-hard problemswere denned in Chapter11.For

thosedenned in Chapter11,the exercisenumbersappearinparentheses. For eachof theseproblems,clearly state the correspondingabsoluteapproximationproblem.(Someof the problemslistedbelowwere dennedas decisionproblems.For these,therecorrespondobvious optimizationproblemsthat are alsoJ\\fV-haid. Define theabsolute approximationproblemrelativetothe correspondingoptimizationproblem.)Also, show that the correspondingabsoluteapproximationproblemis AfV-h&rd.

(a) Node cover

(b) Set cover (Section11.3Problem15)(c) Set packing(Chapter11,Additional Exercise10)(d) Feedbacknodeset(e) Feedbackarcset (Section11.3,Exercise6)

(f) Chromaticnumber

(g) Cliquecover (Section11.3,Exercise14)(h) Max-independentset (seeSection12.6)

Page 571: Sahni

12.2.ABSOLUTEAPPROXIMATIONS 561

1 AlgorithmAColor(F,E)2 // Determinean approximationto the minimum numberof colors.3 {4 if V = 0 thenreturn0;5 elseif E = 0 thenreturn1;6 elseif G is bipartitethenreturn2;7 elsereturn4;8 }

Algorithm12.1Approximatecoloring

(i) Nonpreemptiveschedulingof independenttasksto minimizefinishtimeon m > 1processors(Section12.3)

(j) Flow shopschedulingtominimizefinish time(m > 2)

(k) Jobshopschedulingto minimizefinish time(m > 1)

12.2 ABSOLUTEAPPROXIMATIONS

12.2.1PlanarGraphColoringThereare very few A/'P-hardoptimizationproblemsfor which polynomialtimeabsoluteapproximationalgorithmsareknown. Oneproblemis that ofdeterminingthe minimum numberof colorsneededto colora planargraphG = (V,E).It is known that every planargraph is four colorable.Onecaneasily determinewhethera graph is zero,one,or two colorable.It is zerocolorableiff V = 0. It is one colorableiff E = 0. It is two colorableiffit is bipartite(seeSection6.3,Exercise7). Determiningwhethera planargraph is threecolorableis A/'P-hard. However,all planar graphsare fourcolorable.An absoluteapproximationalgorithmwith \\F*(I)\342\200\224 F(I)\\ < 1iseasy toobtain.Algorithm 12.1is suchan algorithm.It finds an exactanswer when the graphcan becoloredusingat most two colors.Sincewe candeterminewhethera graph is bipartitein time0(\\V\\ + \\E\\), the complexityof the algorithmis 0(\\V\\ + \\E\\).

Page 572: Sahni

562 CHAPTER12.APPROXIMATIONALGORITHMS

1 AlgorithmPStore(/,n, L)2 // Assume that l[i] < l[i+ 1],1< i <n.3 {4 i :=1;5 for j :=1to 2 do6 {7 sum:=0;// Amount of disk j already assigned8 while (sum+ l[i])< L do9 {10 write (\"Storeprogram\",\302\253,

\"on disk\",j);11 sum:=sum+/[\302\253];

i :=i + 1;12 if i > n thenreturn;13 }14 }15 }

Algorithm12.2Approximationalgorithmto storeprograms

12.2.2 MaximumProgramsStoredProblemAssume that we have n programsand two storagedevices(say disks ortapes).We assumethe devicesare disks.Our discussionappliesto anykind of storagedevice.Let Zj be the amountof storageneededtostoretheith program.Let L be the storagecapacity of eachdisk.Determiningthemaximumnumberof thesen programsthat can be storedon the two disks(without splittinga programover the disks)is AA'P-hard.

Theorem12.1Partitionoc maximumprogramsstored.

Proof:Let {oi,02,\342\226\240\342\226\240

\342\226\240, an}definean instanceof the partitionproblem.WecanassumeJ2ai = 2T.Definean instanceof the maximumprogramsstoredproblemas follows:L = T and Zj

= Oj, 1< i <n. Clearly,{ai,...,an}hasa partitionif and only if alln programscan bestoredon the two disks. \342\226\241

By consideringprogramsin orderof nondecreasingstoragerequirement/j,we can obtaina polynomial timeabsoluteapproximationalgorithm.Function PStore(Algorithm12.2)assumesh < h < \342\226\240\342\226\240\342\226\240 < ln and assignsprograms to disk 1solongas enoughspaceremainson this tape.Then it beginsassigningprogramstodisk 2.In additionto the timeneededto initially sortthe programsinto nondecreasingorderof /j, 0(n)timeis neededto obtainthe storageassignment.

Page 573: Sahni

12.2.ABSOLUTEAPPROXIMATIONS 563

Example12.3Let L = 10,n = 4, and (l\\,h,h,h)= (2,4,5,6).FunctionPStorewill storeprograms1and 2 on disk 1and only program3 on disk 2.An optimalstorageschemestoresall four programs.Oneway to do this istostoreprograms1and 4 on disk 1and the othertwo on disk 2. \342\226\241

Theorem12.2Let / be any instanceof the maximumprogramsstoredproblem. Let F*(I)be the maximumnumber of programsthat can bestoredon two diskseachof lengthL. Let F(I)be the numberof programsstoredusingthe function PStore.Then \\F*(I)-F(I)\\ < 1.

Proof:Assume that k programsare storedwhen PStoreis used.Then,F(I)= k. Considerthe programstorageproblemwhen only onedisk ofcapacity 2L is available.In this case,consideringprogramsin orderof the non-decreasingstoragerequirementmaximizesthe numberof programsstored.Assume that p programsget storedwhen this strategy is used on a singledisk of length2L.Clearly,p > F*(I)and Y?\\h < 2L.Let J be the largestindexsuch that J2{h < L. It is easy toverify that j < p and that PStoreassignsthe first j programsto disk 1.Also,

p-i p

\302\243h <

\302\243h <l

i=j+l i=j+2

Hence,PStoreassignsat leastprogramsj + 1,j + 2,...,p \342\200\224 1to disk 2.So,F(I)>p-la.nd\\F*(I)-F(I)\\<l. \342\226\241

FunctionPStorecan be extendedin the obvious way toobtain a k \342\200\224 1absoluteapproximationalgorithmfor the caseof k disks.

12.2.3A^P-hard AbsoluteApproximationsTheabsoluteapproximationalgorithmsfor the planargraphcoloringand themaximumprogramstorageproblemsare very simpleand straightforward.Thus, one may expectthat polynomial timeabsoluteapproximationalgorithms existfor most otherJ\\fV-hard problems.Unfortunately, for themajority of AA'P-hard problemsonecan provide very simpleproofs to show thata polynomial timeabsoluteapproximationalgorithmexistsif and only if apolynomial timeexactalgorithmdoes.Let us lookat somesampleproofs.

Theorem12.3Theabsoluteapproximateknapsackproblemis A/'P-hard.

Proof:We show that the 0/1knapsackproblemwith integerprofitsreduces to the absoluteapproximateknapsackproblem.The theoremthen

Page 574: Sahni

564 CHAPTER12.APPROXIMATIONALGORITHMS

follows from the observationthat the knapsackproblemwith integerprofitsis AA'P-hard. Assume thereis a polynomial timealgorithmA thatguarantees feasiblesolutionssuchthat \\F*(I)\342\200\224 F(I)\\ < k for every instance7 anda fixed k. Let (j>i,Wi), 1< i < n, and m define an instanceof theknapsack problem.Assume the pi areinteger.Let 7' be the instancedennedby((k+ 1)pi,Wi), 1< i < n, and m. Clearly,7 and 7' have the samesetoffeasible solutions.Further,F*(I')= (k + 1)F*(I),and 7 and 7' have the sameoptimalsolutions.Also,sinceallthepi areinteger,it follows that allfeasiblesolutionsto7' eitherhave value F*(I')or value at most F*(I')\342\200\224 {k + 1).If F(I')is the value of the solutiongeneratedby A, for instanceI', thenF*(I')-F(I')is either0 or at leastk +1.Henceif F*(I')-F(I')< k, thenF*(I')= F(I').So,A can be usedtoobtainan optimalsolutionfor 7' andhence7. Sincethe lengthof 7' is at most(logA;)(length of 7), it follows thatusingthe above construction,we canobtaina polynomialtimealgorithmforthe knapsackproblemwith integerprofits. \342\226\241

Example12.4Considerthe knapsackinstancen = 3, in = 100,{pi,P2,P?i)= (1,2,3),and (wi,W2,W3) = (50,60,30).The feasiblesolutionsare (1,0, 0), (0, 1,0), (0, 0, 1),(1,0, 1),and (0, 1,1). The values of thesesolutionsare 1,2, 3, 4, and 5 respectively. If we multiply the p'sby 5,then (pi,p2,P3)= (5,10,15).Thefeasiblesolutionsare unchanged.Theirvalues are now 5, 10,15,20,and 25 respectively.If we had an absoluteapproximationalgorithmfor k = 4, then this algorithmwouldhave to outputthe solution(0, 1,1) as no othersolutionwould be within 4 of the optimalsolutionvalue. \342\226\241

Now, considerthe problemof obtaininga maximumcliqueof anundirected graph. Thefollowing theoremshows that obtaininga polynomialtimeabsoluteapproximationalgorithmfor this problemis as hard asobtaining a polynomial timealgorithmfor the exactproblem.

Theorem12.4Max cliqueocabsoluteapproximationmaxclique.

Proof:Assume that the algorithmfor the absoluteapproximationproblemfinds solutionssuch that \\F*(I)\342\200\224 F(I)\\ < k. Promany given graph G =(V, E),we constructanothergraphG'= (V,E')sothat G'consistsof k + 1copiesof G connectedtogethersuchthat thereis an edgebetweenevery twoverticesin distinctcopiesof G.That is,if V = {v\\, t>2, \342\226\240\342\226\240

\342\200\242, vn},then

v = u*+>i,^,...x}and\302\243' = (U^{(vi,vi)\\(vp,vr)eE})U{(vi,vi)\\i?j}

Page 575: Sahni

12.2.ABSOLUTEAPPROXIMATIONS 565

(a)

(b)

Figure12.1Graphsfor Example12.5

Clearly,the maximumcliquesizein G is q if and only if the maximumcliquesizein G'is (k + l)q.Further,any cliquein G'that is within k of the optimalcliquesizein G'must containa subcliqueof sizeq, which is a cliqueof sizeq in G. Hence,we can obtain a maximumcliquefor G from a fc-absoluteapproximatemaximumcliquefor G'. \342\226\241

Example12.5Figure12.1(b) shows the graph G' that resultswhen theconstructionof Theorem12.4is appliedto the graph of Figure12.1(a).Wehave assumedk = 1. The graph of Figure 12.1(a)has two cliques.Oneconsistsof the vertexset {1,2},and the other{2,3,4}.Thus, an absoluteapproximationalgorithmfor k = 1 could output eitherof the two as asolutionclique.In the graphof Figure12.1(b),however, the two cliquesare{1,2,1',2'}and {2,3,4,2',3',4'}.Only the lattercan beoutput. Hence,anabsoluteapproximationalgorithmwith k = 1outputs the maximumclique.

\342\226\241

Page 576: Sahni

566 CHAPTER12.APPROXIMATIONALGORITHMS

12.3 e-APPROXIMATIONS12.3.1SchedulingIndependentTasksObtainingminimum finish timescheduleson m, in > 2, identicalprocessors is AfV-h&rd. Thereexistsa very simpleschedulingrule that generatesscheduleswith a finish timevery closeto that of an optimalschedule.Aninstance/ of the schedulingproblemis dennedby a set of n task times

\302\243j,

1< i < n, and m, the numberof processors.Theschedulingrule we areabout todescribeis known as the LPT (longestprocessingtime) rule. AnLPTscheduleis a schedulethat resultsfrom this rule.

Definition12.7An LPTscheduleis one that is the resultof an algorithmthat, whenever a processorbecomesfree,assignsto that processora taskwhosetimeis the largestof thosetasksnot yet assigned.Tiesarebrokeninan arbitrary manner. \342\226\241

Example12.6Let m = 3,n = 6, and(\302\243i, \302\2432, \302\2433, \302\2434, \302\2435, te) = (8,7, 6,5,4, 3).

In an LPTschedule,tasks1,2,and 3 areassignedto processors1,2,and 3respectively.Tasks 4, 5, and 6 are respectively assignedto processors3, 2,and 1.Figure12.2shows this LPTschedule.The finish timeis 11.SinceJ2\302\243i/3

= 11,the scheduleis alsooptimal. \342\226\241

6 7 8 111

2

3

6

5

4

Figure12.2LPTschedulefor Example12.6

Example12.7Let m = 3, n = 7, and(\302\2431, \302\2432, \302\2433, \302\2434, \302\2435, \302\2436,^7)

= (5,5,4,4,3,3,3).Figure12.3(a)shows the LPTschedule.This has a finish timeof11.Figure12.3(b)shows an optimalschedule.Itsfinish timeis 9. Hence,for this instance\\F*{I) -F(I)\\/F*(I)= (11-9)/9= 2/9. \342\226\241

It is possibleto implementthe LPTrulesothat at most O(nlogn)timeis neededto generatean LPT schedulefor n taskson m processors.Anexerciseexaminesthis.The precedingexamplesshow that although the

Page 577: Sahni

12.3.e-APPROXIMATIONS 567

0 4 5 8 111

2

3

5

6

4

7

(a)LPT schedule

0 5 9

1

2

5

3

4

6 7

(b)Optimal schedule

Figure12.3LPTand optimalschedulesfor Example12.7

LPT rule may generateoptimalschedulesfor someprobleminstances,itdoesnot do sofor all instances.How bad can LPTschedulesbe relativetooptimalschedules?Thisquestionis answeredby the following theorem.

Theorem12.5[Graham] Let F*(I)be the finish timeof an optimalm-

processorschedulefor instance/ of the taskschedulingproblem.LetF(I)be the finish timeof an LPTschedulefor the sameinstance.Then,

\\F*(I)-F(I)\\ 1 1IF*(/)| -

3 3m

Proof:Thetheoremis clearly true for in = 1.Soassumein > 2. Assumethat for somein, in > 1,thereexistsa set of tasksfor which the theoremis not true. Then, let (ii,^\342\200\242\342\200\242\342\200\242,tn) define an instanceI with the fewestnumberof tasksfor which the theoremis violated.We may assumethatti > t2 > \342\200\242\342\200\242\342\200\242> tn and that an LPTscheduleis obtainedby assigningtasksin the order1,2,3,...,n.

Let S be the LPTscheduleobtainedby assigningthesen tasksin thisorder.Let F(I)be its finish time.Let k be the indexof a taskwith latestcompletiontime. Then, k = n. To seethis, supposek < n. Then, thefinish time/ of the LPT schedulefor tasks1,2,...,kis alsoF(I). The

Page 578: Sahni

568 CHAPTER12.APPROXIMATIONALGORITHMS

finish time /* of an optimalschedulefor thesek tasks is no morethanF*(I).Hence,|/*- f\\/f* > \\F*(I) -F(I)\\/F*(I)> 1/3- l/(3m).(Thelatter inequality follows from the assumptionon 7.) Then |/* \342\200\224 f\\/f* >1/3\342\200\224 l/(3m)contradictsthe assumptionthat 7 is the smallestm-processorinstancefor which the theoremdoesnot hold.Hence,k = n.

Now,we show that in no optimalschedulefor 7 can morethan two tasksbe assignedto any processor.Hence,n < 2m. Sincetask n has the latestcompletiontimein the LPTschedulefor 7, it follows that this taskis startedat timeF(I) \342\200\224 tn in this schedule.Further,no processorcan have any idletimeuntil this time.Hence,we obtain

F(I)-tn < ^E?_1*i

So,F(7) < lYZU+ '&tnSinceF*(I) >

\302\243 \302\243i*i

we can concludethat

F(I)-F*(I) < ^tn01 F*(I) - m F*(I)

But, from the assumptionon 7, the left-handsideof the above inequalityis greaterthan 1/3\342\200\224 l/(3m).So,

1 1_ ^ m\342\200\2241 tn3 3m ^ m F*(I)

1 , 3(m\342\200\224l)tnorm-1< p*(i)

orF*(7) < 3tn

Hence,in an optimalschedulefor 7, no morethan two taskscan beassignedto any processor.When the optimalschedulecontainsat mosttwo taskson any processor,then it can be shown that the LPT scheduleis alsooptimal. We leave this part of the proofas an exercise.Hence,\\F*(I) \342\200\224 F(I)\\/F*(I)= 0 for this case.Thiscontradictsthe assumptionon7. So,therecan be no 7 that violatesthe theorem. \342\226\241

Theorem12.5establishesthe LPTruleasa (1/3\342\200\224 l/(3m))-approximaterulefor task scheduling.As remarkedearlier,this rulecanbeimplementedtohave complexity0(nlogn).The following exampleshowsthat 1/3\342\200\224 l/(3m)is a tight boundon the worst-caseperformanceof the LPTrule.

Page 579: Sahni

12.3.e-APPROXIMATIONS 569

3m\342\200\224I 4m\342\200\224I 3m

Pi

P2

Pi

Pm-2

Pm-\\

Pm

1

2

3

1m

2m-l2m\342\200\2242

2m+l

m\342\200\2241

m\342\200\224\\

m

m +3

m +2

m+\\

PiPiP^

P/n-2

Pm-\\

Pm

1

2

3

2m\342\200\2242

2m-32m-4

m\342\200\224 2

m\342\200\2241

2m-1

m +1

m

2m 2m+1

(a)LPT schedule (b)Optimal schedule

Figure12.4Schedulesfor Example12.8

Example12.8Let n = 2m + 1,U = 2m-[(i+ l)/2j,i = 1,2,...,2m,andhm+i = rn- Figure12.4(a)shows the LPTschedule.This has a finish timeof 4m \342\200\224 1.Figure12.4(b)showsan optimalschedule.Itsfinish timeis 3m.Hence,\\F*(I)-F(I)\\/F*(I)= 1/3- l/(3m). \342\226\241

ForLPT schedules,the worst-caseerrorbound of 1/3\342\200\224 l/(3m)is notvery indicativeof the expectedclosenessof LPT finish timesto optimalfinish times. When m = 10,the worst-caseerrorbound is .3.Efficient e-approximatealgorithmsexistfor many schedulingproblems.Thereferencesat the end of this chapterpoint to someof the better-knowne-approximateschedulingalgorithms.Someof thesealgorithmsare alsodiscussedin theexercises.

12.3.2 BinPackingIn this problemwe are given n objectsthat have to be placedin bins ofequal capacity L. Objecti requireslj units of bin capacity.Theobjectiveis to determinethe minimum numberof binsneededto accommodateallnobjects.No objectmay beplacedpartly in onebin and partly in another.

Example12.9Let L = 10,n = 6, and {h,h,h,h,h,k)= (5,6,3,7,5,4).Figure12.5showsa packingof the sixobjectsusingonly threebins.Numbersin bins areobjectindices.Obviously,at leastthreebins areneeded. \342\226\241

Page 580: Sahni

570 CHAPTER12.APPROXIMATIONALGORITHMS

Figure12.5Optimalpackingfor Example12.9

Thebin packingproblemcanberegardedas a variation of the schedulingproblemconsideredearlier.Thebins representprocessorsand L is the timeby which all tasksmust be completed.The variable Zj is the processingrequirementof task i. Theproblemis to determinethe minimum numberof processorsneededto accomplishthis.An alternativeinterpretationis toregardthe bins as tapes.ThevariableL is the lengthof a tape,and l{ thetape length neededto storeprogrami. The problemis to determinetheminimum numberof tapesneededto storeall n programs.Clearly, manyinterpretationsexistfor this problem.

Theorem12.6Thebin packingproblemis A/^P-hard.

Proof:To seethis, considerthe partitionproblem.Let {a\\,02,...,an}bean instanceof the partitionproblem.Definean instanceof the bin packingproblemas Zj

= Oj, 1< i <n, and L = 5Za$/2.Clearly, the minimumnumber of bins neededis two if and only if thereis a partitionfor {ai,02,\342\226\240\342\226\240

\342\226\240, an}.aOnecandevisemany simpleheuristicsfor the bin packingproblem.These

will not, in general,obtain optimalpackings.They will, however, obtainpackingsthat useonly a smallfractionof bins morethan an optimalpacking.Foursimpleheuristicsare:

1.FirstFit (FF):Indexthe bins 1,2,3,.... All bins are initially filled tolevel zero. Objectsare consideredfor packingin the order1,2,...,n. Topack objecti, find the leastindexj such that bin j is filled to a level r,r < L \342\200\224 li.Packi into bin j. Binj is now filled to level r + Zj.

2.BestFit (BF):Theinitialconditionson the bins and objectsarethe sameas for FF.When object% is beingconsidered,find the leastj such that binj is filled to a level r, r < L \342\200\224 /j and as largeas possible.Packi into bin j.Binj is now filled to level r + Zj.

3. FirstFit Decreasing(FFD):Reorderthe objectssothat Zj > Zj+i, 1<i <n. Now useFF to packthe objects.

Page 581: Sahni

12.3.e-APPROXIMATIONS 571

4. BestFit Decreasing(BFD):Reorderthe objectssothat Zj > k+i,1<i <n. Now useBF to packthe objects.

Example12.10Considerthe probleminstanceof Example12.9.Figure 12.6shows the packingsresultingwhen eachof the four packingheuristics is used.ForFFDand BFDthe sixobjectsareconsideredin the order4,2,1,5,6,3.As is evident from the figure, FFDand BFDdo betterthaneitherFF or BFon this instance.Although FFDand BFDobtain optimalpackingson this instance,they do not in generalobtainsuchpackings. \342\226\241

3

1

6

2 4 5

2 3

(a)First fit

5

1

1

3

2

?

4

36

(b)Bestfit

3

4

6

2

5

1

1 2 3

(c)First and best fit decreasing

Figure12.6Packingsresultingfrom the four heuristics

Page 582: Sahni

574 CHAPTER12.APPROXIMATIONALGORITHMS

this problemis concernedwith optimally locatingm plants. Thereare npossiblesitesfor theseplants,n > m. At most one plant may be locatedin any of thesen sites.We use xj^, 1 < i < n, 1 < k < m, as ran 0/1variables.Thevariablex^h = 1if and only if plant k is to belocatedat sitei. Thelocationof the plants is to bechosenso as to minimizethe totalcostof transportinggoodsbetweenplants.Let dk,i be the amountof goodsto betransportedfrom plant k to plant /. We have d^^ = 0,1< k <m. Let cybe the costof transportingoneunit of the goodsfrom sitei to sitej. ThenCij = 0, 1 < i < n. The quadratic assignmentproblem has the followingmathematicalformulation

minimizef(x) Yl Yl ciJdkjXi,kXj,li,j=lk,l=\\

Subjectto Y1T=1xi,k \342\200\224 17 1< i < 77.

Eti^fc = 1,l<fc<m

%i,k= 0,1,for alli,k

Ci,j,dk,i ^ 0, 1< i,j <n, 1<k,l< m

Thefirst conditionensuresthat at most oneplant is locatedat any site.Thesecondconditionensuresthat every plant is locatedat exactlyonesite.Thefunction f(x) is the totaltransportationcost.

Example12.11Assume two plants are to be located(m = 2) and therearethreepossiblesites(n = 3).Assume

duefci

Cn C12C21 C22C31 C32

dn'

^22 .C13

\"

C23C33 .

=

0 4'10 0

\"09 35 0 102 6 0

and

If plant 1is locatedat site1and plant 2 at site2,then the transportationcostf(x) is 9*4+5*10= 86.If plant 1is locatedat site3 and plant 2 at site1,then the costf(x) is 2*4+ 3*10= 38.The optimallocationsareplant 1at site1and plant 2 at site3.Thecostf(x) is 3*4+ 2*10= 32. \342\226\241

Theorem12.10Hamiltoniancycleex e-approximatequadraticassignment.

Page 583: Sahni

12.3.e-APPROXIMATIONS 575

Proof:Let G(N,A) be an undirectedgraph with m = \\N\\. Thefollowingquadraticassignmentinstanceis constructedfrom G:

if i \342\200\224 (j mod m) + 1,1< i,j <motherwiseif {k,l)eA, 1< k,l<motherwise

The total cost f(j) of an assignment 7 of plants to locationsisYl?=ici,jd d) a), wherej = (i modm) + 1 and 7(4) is the indexof theplant assignedto locationi. If G has a Hamiltoniancycle ii,i2,...,in,i\\,then the assignmentj(j)= ij has a costf(-y) = m. If G has no Hamiltoniancycle, then at leastone of the values d ,^ ,,. modm+n must be u and sothe costbecomes> m + u \342\200\224 l.Choosingu > (1+ e)mresultsin optimalsolutions with a value of m if G has a Hamiltoniancycle,and value > (1+ e)mif G has no Hamiltoniancycle.Thus,from an e-approximatesolution,it canbedeterminedwhetherG has a Hamiltoniancycle. \342\226\241

Many othere-approximationproblemsareknown to beA^P-hard. Someof theseareexaminedin the exercises.Although the threeproblemsjustdiscussed wereATP-hard for e,e > 0,it is quitepossiblefor an e-approximationproblemto be ft/V-hard only for e in somerange,say 0 < e < r. Fore > r,theremay existsimplepolynomial timeapproximationalgorithms.

EXERCISES1.Obtainan O(nlogn)algorithmthat implementsthe LPTscheduling

rule.

2. Show that LPT schedulesare optimalfor all task setsthat haveoptimal schedulesin which no morethan two tasksareassignedto anyprocessor.

3. A uniform processorsystem is a set of m > 1processors.Processorioperatesat a speedSj, st >0.If task i requiresU units of processing,then it may be completedin ij/sjunits of realtimeon processorpi.When Si = 1,1< i <m,we have a system of identicalprocessors(Section 12.3).An MLPTscheduleis defined to be any scheduleobtainedby assigningtasksto processorsin orderof nonincreasingprocessingtimes.When a task is beingconsideredfor assignmentto a processor,it is assignedto that processoron which its finishingtimewill beearliest. Tiesare brokenby assigningthe task to a processorwith leastindex.

n =

^3S

dk,i =

Page 584: Sahni

576 CHAPTER12.APPROXIMATIONALGORITHMS

(a) Let m = 3, si= 1,\302\2532

= 2, and S3= 3.Let the numbern of tasksbe 6.Let (ii,i2,*3,*4,*5,*6)= (9,6,3,3,2,2).Obtainthe MLPTschedulefor this setof tasks.Isthis an optimalschedule?If not,obtainan optimalschedule.

(b) Show that thereexistsa two processorsystem and a set J forwhich \\F*(I)-F(I)\\/F*(I)> 1/3- l/(3m).ThetimeF(I)isthe finish timeof the MLPTschedule.Note that 1/3- l/(3m)isthe boundfor LPTscheduleson identicalprocessors.

(c) Write an algorithmto obtainMLPTschedules.What is the timecomplexityof your algorithm?

4. Let I beany instanceof the uniform processorschedulingproblem.LetF(I)and F*(I)respectivelybe the finish timesof MLPTand optimalschedules.Show that F{I)/F*{I)< 2m/(m+ 1) (seeExercise3).

5.For a uniform processorsystem (seeExercises3 and 4), show thatwhen m = 2,F{I)/F*{I)< (1+ \\/l7)/4. Show that this is the bestpossiblebound for m = 2.

6.Let Pi,...,Pmbe a set of processors.Let t^, tij > 0,be the timeneededto processtaski if its processingis carriedout onprocessorPj,1< i <n, 1<j <m. Fora uniform processorsystem, Uj/tik = Sk/sj,wheres^ and Sj arethe speedsof P^ andPj respectively.In a system ofnonidenticalprocessors,sucha relationneednot exist.As an example,considern = 2, m = 2, and

*11*12hi \302\24322

= 12'3 2

If task 1is processedon P2 and task2 on Pi,then the finish timeis3. If task 1is processedon Pi and task2 on P2,the finish timeis 2.Show that if a scheduleis constructedby assigningtask i to processorj sothat Uj < tik, 1< k <m, then F{I)/F*{I)<m. ThetimesF(I)and F*(I)are the finish timesof the scheduleconstructedand of anoptimalschedule,respectively.Show that this bound is bestpossiblefor this algorithm.

7. Forthe schedulingproblemof Exercise6,define function Scheduleasin Algorithm 12.3.Thenf[j]is the currentfinish timeon processorj.So,F(I)= maxj{/[j]}.Show that F(I)/F*(I)< m and this boundis bestpossible.

8.In Exercise7, first orderthe taskssothat minj {tij}> mm,{U+ij},1< i < n. Then use function Schedule.Show that F(I)/F*(I)< mand this bound is bestpossible.

Page 585: Sahni

12.3.e-APPROXIMATIONS 577

1 AlgorithmSchedule(/,t)2 {3 for j :=1 to m do f\\j] :=0;4 for i .=1to n do5 {6 k :=leastj suchthat7 f\\j] + t[i,j]</[/]+*[*,/],l</<m;8 /[*]:=/[*]+ *[\302\273,*];

9 }10 }

Algorithm12.3Scheduling

9. Show that the resultsof Exercise7 hold even if the initialorderingissuchthat maxj{Uj}>maxj{ti+ij},1< i <n.

10.Considerthe following heuristicfor the maxcliqueproblem:deletefrom G a vertexthat is not connectedtoevery othervertexand repeatuntil the remaininggraph is a clique.Show that this heuristicdoesnot result in an e-approximatealgorithmfor the maxcliqueproblemfor any e, 0 < e < 1.

11.Forthe maxcliqueproblem,considerthe following heuristic:Let S=0.Add to Sa vertexnot in S that is connectedto allverticesin S.Ifthereis no suchvertex,then stopwith S the approximatemaxclique;otherwiserepeat.Showthat the algorithmresultingfrom this heuristicin not an e-approximatealgorithmfor the max-cliqueproblemfor anye, e < 1.

12.Show the function Color (Algorithm 12.4)is not an e-approximatecoloring algorithmfor the minimum colorabilityproblemfor any e, e >0.

13.Considerany tour for the travelingsalespersonproblem.Let city i\\

be the starting point.Assume the n citiesappearin the tour in theorder

\302\253i,*2,\302\2533,\342\200\242\342\200\242\342\200\242,*n>?'n+i = i\\- Let l(ij,ij+i)be the lengthof edge

(ij,ij+i).Thearrival timeY^ at city ik is

fe-i

3=1

Page 586: Sahni

578 CHAPTER12.APPROXIMATIONALGORITHMS

1 AlgorithmColor(G)2 11G = (V, E) is a graphwith |V|= n vertices.col[i]3 // is the colorto use for vertexi, 1< % <n.4 {5 i :=1;// Next colorto use6 j :=0;// Number of verticescolored7 whilej ^ n do8 {9 S :=0;// Verticescoloredwith i10 while thereis an uncoloredvertex,v,11 not adjacentto a vertex in Sdo12 {13 col[v] :=i;S :=SU {v};j :=j + 1;14 }15 i:=i+ l;16 }17 }

Algorithm12.4Functionfor Exercise12

Themeanarrival timeY is

-, n+l

k=2

Show that the e-approximateminimum meanarrival timeproblemisAfP-hard for all e, e > 0.

14.Let Yk and Y be as in Exercise13.Thevariancea in arrival timesis

-, n+l

Show that the e-approximateminimum variancetimeproblemis AfV-hard for alle, e >0.

Page 587: Sahni

12.4.POLYNOMIALTIMEAPPROXIMATIONSCHEMES 579

12 15 141

2

4

3

1

3 4

2

5 t

1

2

4

3

6

5

(a)Optimal for 4 tasks (b)Completedschedule (c)Overall optimal

Figure12.8Usingthe approximationschedulewith k = 4

12.4 POLYNOMIALTIME APPROXIMATIONSCHEMES

12.4.1SchedulingIndependentTasksWe have seenthat the LPTrule leadstoa (1/3\342\200\224 l/(3m))-approximatealgorithm for the problemof obtainingan m processorschedulefor n tasks.A

polynomial timeapproximationschemeis alsoknown for this problem.Thisschemerelieson the following schedulingrule:Let k be somespecifiedandfixed integer.Obtainan optimalschedulefor the k longesttasks.Schedulethe remainingn \342\200\224 k tasksusingthe LPTrule.

Example12.12Let m = 2, n = 6, (*i,*2,*3,*4,*5,*6)= (8,6,5,4,4,1),and k = 4.Thefour longesttaskshave task times8,6,5, and 4 respectively.An optimalschedulefor thesehas finish time12(Figure12.8(a)).Whenthe remainingtwo tasksarescheduledusingthe LPTrule,the scheduleofFigure12.8(b)results.This has finish time15.Figure12.8(c)shows anoptimalschedule.This has finish time14. \342\226\241

Theorem12.11[Graham] Let / bean m-processorinstanceof thescheduling problem.Let F*(I) be the finish timeof an optimalschedulefor / andlet F(I)be the length of the schedulegeneratedby the above schedulingrule.Then,

\\F*(I)-F(I)\\K 1-1/mF*{I) ~ 1+[k/m\\

Proof:Let r be the finish timeof an optimalschedulefor the k longesttasks. If F(I) = r, then F*(I)= F(I) and the theoremis proved. So,assumeF(I)> r. Let ij, 1< i < n, be the task timesof the n tasksof /.

Page 588: Sahni

580 CHAPTER12.APPROXIMATIONALGORITHMS

Without lossof generality, we can assumeU >i\302\253+i,

1< i < n, and n > k.Also, we can assumen > m. Let j, j > k, be such that task j has finishtimeF(I). Then, no processoris idlein the interval [0,F(I)\342\200\224 tj].Sincetk+i > tj, it follows that no processoris idlein the interval [0,F(I)\342\200\224

tk+\\].Hence,

Etiti > m(F(I)-tk+l)+ tk+l

and so,F*(I) > \302\261Y,iU> F(I)- s^tk+1

or\\F*(I)-F(I)\\< ^tk+1Sinceij > tk+i, 1< i < k + 1,and at leastoneprocessormust executeatleast1+ [k/m\\ of thesek + 1tasks,it follows that

F*(I)>(l+ [k/m\\)tk+1

Combiningthesetwo inequalities,we obtain

\\F*(I)-F(I)\\< (m-l)/m=1-1/m QF*(7)

- 1+ [k/rn\\ 1+ [k/m\\

Usingthe result of Theorem12.11,we can constructa polynomial timee-approximationschemefor the schedulingproblem.This schemehas e asan input variable.For any input e, it computesan integer k such thate < (1\342\200\224 1/m)I{I+ [k/m\\). This defines the k to be used in thescheduling

rule describedabove. Solvingfor k, we obtain that any integerk, k >(m\342\200\224 l)/e\342\200\224 m,guaranteese-approximateschedules.Thetimerequiredtoobtain suchschedules,however, dependsmainly on the timeneededto obtainan optimalschedulefor k taskson m machines.Usinga branch-and-boundalgorithm,this timeis 0(mk).Thetimeneededto arrangethe taskssothatU > ti+i and alsoto obtain the LPTschedulefor the remainingn \342\200\224 k tasksis 0(nlogn).Hencethe total timeneededby the e-approximateschemeis0(nlogn+ mk) = 0(nlogn+ rn^m~l^t^m'). Sincethis timeis notpolynomial in 1/e(it is exponentialin 1/e),this approximationschemeis nota fully polynomial timeapproximationscheme.It is a polynomial timeapproximation scheme(for any fixed m) as the computingtimeis polynomialin the numberof tasksn.

12.4.2 0/1KnapsackThe0/1knapsackheuristicproposedin Example12.2doesnot result in ane-approximatealgorithmfor any e, 0 < e < 1.Supposewe try the heuristic

Page 589: Sahni

12.4.POLYNOMIALTIMEAPPROXIMATIONSCHEMES 581

1 AlgorithmEpsilonApprox(p,w, m, n, k)2 // Thesizeof a combinationis the numberof objectsin it.3 // Theweight of a combinationis the sum of the weights4 //of the objectsin that combination;k is a nonnegative5 // integerthat defines the orderof the algorithm.6 {7 Pmax :=0;8 for all combinations/ of size< k and weight < m do9 {10 Pi:=EieiPH11 Pmax :=max(Pmaa;,Pr + LBound(7,p, w, m. n));12 }13 returnPmax;14 }

Algorithm12.5Heuristicalgorithmfor knapsackproblem

describedby function EpsilonApprox (Algorithm 12.5).In this function p[ ]and w[ ] are the setsof profits and weights respectively.It is assumedthatPi/wi > pi+i/wi+i,1< i < n. The variable m is the knapsackcapacityand k a nonnegativeinteger.In the for loopof lines8 to 12,all J2i=o(\different subsetsI consistingof at most k of the n objectsaregenerated.Ifthe currently generatedsubset/ is such that J2ieiwi > rni ^ ^s discarded(as it is infeasible).Otherwise,the spaceremainingin the knapsack(thatis,m \342\200\224 J2ieiwi) is filledusingthe heuristicdescribedin Example12.2.Thisheuristicis statedmoreformally as function LBound (Algorithm 12.6).

Example12.13Considerthe knapsackprobleminstancewith n = 8objects, sizeof knapsack= m = 110,p = {11,21,31,33,43,53,55,65},andw = {1,11,21,23,33,43,45,55}.

Theoptimalsolutionis obtainedby puttingobjects1,2,3,5, and 6 intothe knapsack.This resultsin an optimalprofit p* of 159and a weight of109.

We obtainthe following approximationsfor different k:

1.k = 0.Pmax is just the lower boundsolutionLBound((f),p,w, m,n) =139,a:= (1,1,1,1,1,0,0,0),w = J2lxlwl= 89,and (p*-Pmax)jP*=20/159= .126.

2. k = 1. Pmax = 151,x = (1,1,1,1,0,0,1,0),w = 101,and (p* -Pmax)/p*= 8/159= .05.

Page 590: Sahni

582 CHAPTER12.APPROXIMATIONALGORITHMS

1 AlgorithmLBound(7,p, w, m, n)2 {3 s:=0;4 t :=m -Eie/\342\204\242H

5 for i :=1to n do6 if (z \302\243 I) and (w[i] < t) then7 <8 s :=s+p[i};t :=t \342\200\224 w[i];9 }10 returns;11 }

Algorithm12.6Subalgorithmfor function EpsilonApprox

3. k = 2.Pmax= p* = 159,x = (1,1,1,0,1,1,0,0), and w = 109.Table12.1gives the detailsfor k = 1.It is interestingto note that the

combinationsI = {1},{2},{3},{4},{5}neednot betriedsincefor 7 = 0,#6 is the first Xj, whichis 0,andsothesecombinationsyield the samePmax.This is true for allcombinationsI that includeonly objectsfor which X{ was1in the solutionfor 7 = 0. \342\226\241

Theorem12.12Let J bean instanceof the knapsackproblem.Let n, m,p and w be as defined for function EpsilonApprox.Let p* be the value of anoptimalsolutionfor J. Let Pmax be as defined by function EpsilonApproxon termination.Then,

\\p*-Pmax\\/p*< \\/{k + 1)

Proof:Let R bethe setof objectsincludedin the knapsackin someoptimalsolution.So, ienPi= p* and X^ei?wi ^ m- If the numberof objectsinR, \\R\\, is suchthat \\R\\

< k, then at sometimein the executionof functionEpsilonApprox,I = R and soPmax = p*.Therefore,assume\\R\\ > k. Let{Pi,Wi), 1< i < \\R\\, be the profits and weights of the objectsin R. Assumethesehave beenindexedsothat fti,...,Pk arethe k largestprofits in R, andPi/wi > pi+i/wi+i,k < i < \\R\\. Fromthe first of theseassumptions,itfollows that pk+q <P*/(k+ l),1< q <

\\R\\\342\200\224 k. Sincethe for loopof lines

8 to11triesout all combinationsof sizeat most k, it follows that in someiteration,I correspondsto the set of k largestprofits in R. Hence,Pi =J2ieiPi= J2i=iPi-Considerthe computationof line10in this iteration.Inthe computationof LBound(/,p,w, m,n), letj be the leastindexsuchthat

Page 591: Sahni

12.4.POLYNOMIALTIMEAPPROXIMATIONSCHEMES 583

/0678

Pmax

0139149151

Pi

11535565

Ri

1434555

LBound

12896963

PMAX = max{Pmax,Pi+ LBound}139149151151

\342\200\242^optimal

(1,1,1,1,1,0,0,0)(1,1,1,1,0,1,0,0)(1,1,1,1,0,0,1,0)(1,1,1,1,0,0,1,0)

* Note that ratherthan updatexopttmai, it is easierto updatethe optimal/and recomputexoptimai at the end.

Table12.1Expansionof Example12.13for k = 1

j 2\" /,Wj > t, and j E R. Thus,objectj correspondsto one of the objects(pr,wr), k < r < \\R\\, and j is not includedin the knapsackby functionLBound.Let objectj correspondto (pq,wq).

At the timeobjectj is considered,t < Wj= wq. Theamount of space

filled by function LBound ism\342\200\224^2ieI u>i\342\200\224t,

andthis is largerthan Y^lZk+i Wi

(as J21Wi ^ m)- Sincethis amount of spaceis filled by consideringobjectsin nondecreasingorderof Pi/wt, it follows that the profit saddedby LBoundis no lessthan

E A + ^A

whereA = m \342\200\224 t \342\200\224

J2\\ wi

Also,El\302\243gPi

< f^m-Er1^)Fromthesetwo inequalities,we obtain

p* = P/ +ES,A= Pi+ s+pq{t/wq)< Pj+S+pg

Since,Pmax> Pi+ s and pq <p* /{k+ 1),it follows that

\\p*\342\200\224 Pmax\\ pq 1

p* p*~

k + 1

Page 592: Sahni

584 CHAPTER12.APPROXIMATIONALGORITHMS

Thiscompletesthe proof. \342\226\241

Thetimerequiredby Algorithm 12.5is 0(nk+l).To seethis, note thatthe totalnumberof subsetstriedis

k k k k+\\ i

\302\243(?)and

\302\243(0<\302\243n*= ^-\342\200\224-i = 0(n*)

i=0 i=0 i=0

FunctionLBound has complexity0{n).So,the totaltimeis 0(nfc+1).FunctionEpsilonApproxcan beusedas a polynomial timeapproximation

scheme.Forany given e, 0 < e < 1,we can choosek to be the leastintegergreaterthan or equalto (1/e)\342\200\224 1.This will guaranteea fractionalerrorinthe solutionvalue of at most e.Thecomputingtimeis 0{nlle).

Although Theorem12.12provides an upperboundon \\p*\342\200\224 Pmax\\/p*,

it doesnot say anything about how goodthis bound is. Nor doesit sayanything about the kind of performancewe can expectin practice.Let usnow addressthesetwo problems.

Theorem12.13Forevery k thereexistknapsackinstancesfor which \\(p*\342\200\224

Pmax)/p*\\getsas closeto l/{k+ 1) as desired.

Proof:Forany k, the simplestexamplesapproachingthe lower boundareobtainedby settingn = k+2;w\\ = 1;p\\ = 2;pi,Wi = r,2 < i <k+2,r > 2;and m = (k + l)r.Thenp* = (k + l)r.ThePmax given by EpsilonApproxfor this k is kr +2 and therefore\\(p*

\342\200\224 Pmax)/p*\\= (1\342\200\224 2/r)/(k+ 1).Bychoosingr increasinglylarge,onecan get as closeto l/(k+ 1)asdesired.\342\226\241

Another upperboundon the value of \\(p*\342\200\224 Pmax)/p*\\can beobtained

from the proofof Theorem12.12.We know that p* \342\200\224 Pmax<pq and p* >Pmax. Alsosincepq is oneof pk+i, \342\226\240\342\226\240\342\226\240,P|i?|5it follows that pq <p,wherep isthe (k +l)stlargestp.Hence\\{p*

\342\200\224 Pmax)/p*\\< min{l/(fc+1),p/Pmax}.In most casesp/Pmax will besmallerthan l/(k+1)andsowill give a betterestimateof closenessin casesin which the optimalis not known. We notethat p is easy tocompute.

Theorem12.14The deviationof the solutionPmax obtainedfrom thee-approximatealgorithmfrom the true optimalp* is bounded by \\{p*

\342\200\224

Pmax)/p*\\< min {l/{k+ 1),p/Pmax}. \342\226\241

EXERCISES1.Show that if line 11of Algorithm 12.5is changedto Pmax = max

{Pmax,LBound(/,p,tf,m,n)}and the fourth line of function LBound

Page 593: Sahni

12.5.FULLYPOLYNOMIALTIMEAPPROXIMATIONSCHEMES585

replacedby the line t :=m;,then the resultingalgorithmis not e-approximatefor any e, 0 < e < 1.Note that the new heuristicconstrains / to beoutsidethe knapsack.Theoriginalheuristicconstrains/ tobe insidethe knapsack.

2. Let G = (V, E) be an undirectedgraph. Assume that the verticesrepresentdocuments.The edgesare weightedso that w(i,j)is thedissimilarity betweendocumentsi and j. It is desiredto partitiontheverticesinto k > 3 disjointclusterssuchthat

J2 J2 w(u,v)i=l{u,v)eE

u,v&Ci

is minimized.ThesetCj is the setof documentsin clusteri. Showthatthe e-approximateversionof this problemis A/\"'P-hard for alle, e > 0.Note that k is a fixed integerprovidedwith eachprobleminstanceandmay be different for different instances.

3.Show that if we changethe optimizationfunction of Exercise2 tomaximize

5Z wiuiv)uec,

(u,v)\302\243E

then thereis a polynomial timee-approximationalgorithmfor somee, 0 < e < 1.

12.5 FULLY POLYNOMIALTIMEAPPROXIMATIONSCHEMES

Theapproximationalgorithmsandschemeswe have seensofar areparticulartothe problemconsidered.Thereis no setof well-definedtechniquesthat wecan use toobtainsuchalgorithms.Theheuristicsuseddependvery muchon the particularproblembeing solved. For the caseof fully polynomialtimeapproximationschemes,we can identify threeunderlying techniques.Thesetechniquesapply to a variety of optimizationproblems.We discussthesethreetechniquesin termsof maximizationproblems.We assumethemaximizationproblemtobeof the form

max^PiXi-l

Page 594: Sahni

586 CHAPTER12.APPROXIMATIONALGORITHMS

n

subjectto 22,aiiXi\342\200\224 bj-, 1< J < ttt.

i=lx,- = 0orl, l<i<n (12.1)

Pi, a-ij > 0

Without lossof generality, we assumethat aij < bj, 1 < i < n and1< j <m.

If 1< k < n, then the assignmentx\\ = m, is said to be a feasibleassignment if and only if thereexistsat leastone feasiblesolutionto (12.1)with Xi = M, 1 < i < k. A completionof a feasibleassignmentX{ = yiis any feasiblesolutionto (12.1)with x\\ = yi, 1 < i < k. Let X{ = yiand Xi = Zi, 1 < i < k, be two feasibleassignmentssuch that for atleastone j, 1 < j < k, y{ ^ Zj. Let Yli=iPiVi ~ J2i=iPi^i-We saythat y\\,...,ykdominateszi,...,Zkif and only if thereexistsacompletion yi,...,yk,yk+x,...,yn such that Ya=\\ PiVi is greaterthan or equal toJ2iKi<nPizif\302\260r au completionsZ\\,...,zn of Z\\,...,zk. TheapproximationtecEniquesto bediscussedapply to thoseproblemsthat canbeformulatedas(12.1)and for whichsimplerulescanbefound to determinewhenonefeasibleassignmentdominatesanother.Suchrulesexist,for example,for problemssolvable by the dynamic programmingtechnique.Somesuchproblemsare0/1knapsack,jobsequencingwith deadlines,jobsequencingto minimizefinish time,and jobsequencingto minimizeweightedmeanfinish time.

Oneway to solveproblemsstatedas above is tosystematically generateall feasibleassignmentsstarting from the null assignment.Let S^ representthe set of all feasibleassignmentsfor xi,X2,\342\226\240\342\226\240\342\226\240,Xi. Then S^ representsthe null assignmentand S^ the set of all completions.Theanswer to ourproblemis an assignmentin S^ that maximizesthe objectivefunction.Thesolutionapproachis then to generateS(t+1'from S^\\l < i < n. Ifan S^ containstwo feasibleassignmentsyi,...,yi and z\\,...,Zi such that

J2)=iPjyj= J2)=iPjzj,thenuseof the dominancerulesenablesus to discardor kill that assignmentwhich is dominated.In somecasesthe dominancerulesmay permitthe discardingorkillingof a feasibleassignmenteven whenHPjVj7^ J2Pjzj-This happens,for instance,in the knapsackproblem(seeSection5.7).Followingthe useof the dominancerules,it is the casethat foreachfeasibleassignmentin S^\\ J2)=iPjxjis distinct.However,despitethis,it is possiblefor eachS^>to containtwice as many feasibleassignmentsas inS(*-i).This resultsin a worst-casecomputingtimethat is exponentialin n.Note that this solutionapproachis identicalto the dynamic programmingsolutionmethodologyfor the knapsackproblem(Section5.7)and alsoto thebranch-and-boundalgorithmlaterdevelopedfor this problem(Section8.2).

The approximationmethodswe discussarecalledrounding,intervalpartitioning, and separation.Thesemethodsrestrictthe numberof distinct

Page 595: Sahni

12.5.FULLYPOLYNOMIALTIMEAPPROXIMATION SCHEMES587

Y^j=\\ Pjxj t\302\260 only a polynomialfunction of n.Theerrorintroducedis withinsomeprespecifiedbound.

12.5.1RoundingThe aimof roundingis to start from a probleminstance/ that isformulated as in (12.1)and to transformit to another probleminstance/' thatis easierto solve.This transformationis carriedout in sucha way that theoptimalsolutionvalue of /' is closeto the optimalsolutionvalue of /. Inparticular,if we are providedwith a bound e on the fractionaldifferencebetweenthe exactand approximatesolutionvalues, then we requirethat\\F*{I)-F*{I')/F*{I)\\< e, whereF*{I)and F*{I')representthe optimalsolutionvalues of / and /' respectively.

Probleminstance/' is obtainedfrom / by changingthe objectivefunctionto maxY^ Qixi- Since/ and /' have the sameconstraints,they have the samefeasiblesolutions.Hence,if the p^s and q^s differ by only a smallamount,the value of an optimalsolutionto/' will becloseto the value of an optimalsolutionto /.

Forexample,if the pi in / have the values (pi,p2, P3, Pa) = (1-1,2.1,1001.6,1002.3)and we construct/' with {qu q2, q3, <?4) = (0, 0, 1000,1000),it is easy to see that the value of any solutionin / is at most7.1morethan the value of the samesolutionin /'.This worst-casedifferenceis achievedonly when Xj = 1,1< i < 4, is a feasiblesolutionfor / (andhencealsofor /').Sincea^ < bj, 1< i < n and 1< j < m, it follows thatF*(I)> 1002.3(as onefeasiblesolutionis x\\ = x2 = X3 = 0 and X4 = 1).But F*{I)-F*{I')< 7.1and so(F*{I)-F*{I'))/F*{I)< 0.007.Solving/ usingthe procedureoutlinedabove,the feasibleassignmentsin S\302\256 couldhave the following distinctprofit values:

5(\302\260) {0}S^ {0,1.1}S& {0,1.1,2.1,3.2}S^ {0,1.1,2.1,3.2,1001.6,1002.7,1003.7,1004.8}S^ {0,1.1,2.1,3.2,1001.6,1002.3,1002.7,1003.4,1003.7,

1004.4,1004.8,1005.5,2003.9,2005,2006,2007.1}Thus, barring any eliminationof feasibleassignmentsresultingfrom thedominancerulesor from any heuristic,the solutionof / usingthe procedureoutlinedrequiresthe computationof J2o<i<n\\& 1=31feasibleassignments.

Page 596: Sahni

588 CHAPTER12.APPROXIMATION ALGORITHMS

Thefeasibleassignmentsfor /' have the following values:

5(\302\260) {0}SW {0}5(2) {0}S(3) {0,1000}S<4) {0,1000,2000}

Note that Ya=o \\S(i)\\ is only 8.HenceV can be solved in about one-fourththe timeneededfor /. An inaccuracyof at most.7%is introduced.

Given the p^s and an e, what shouldthe q^s besothat

n

\\F*{I)-F*{I')\\/F*{I)< e and ^ |S\302\253|<u{n,1/e)

i=0

whereu is a polynomial in n and 1/e?Oncewe figure this out, we have afully polynomial approximationschemefor our problemsinceit is possibletogo from S1^-1)toS'W in timeproportionalto 0{Sl-l~1^)(seethe knapsackalgorithmof Section5.7).

Let LB bean estimatefor F*(I)suchthat F*(I)> LB.Clearly, we canassumeLB > max{pi}.If

n

*\302\243\\Pl-qi\\<eF*(I)i=l

then it is clearthat [F*(I)-F*(I')]/F*(I)< e. Define q{ = p{- rem(p4,(LB-e)/n),whererem(a,b) is the remainderof a/b, that is,a \342\200\224

\\a/b\\ b (forexample,rem(7,6) = 1/6and rem(2.2,1.3)= .9).Sincerem(pi,LB-e/n)<LB \342\200\242e/n,it follows that Yl \\Pi

~Qi\\ < LB \342\200\242e < F* \342\226\240e.Hence,if an optimal

solutionto /' is usedas an optimalsolutionfor /, the fractionalerroris lessthan e.

To determinethe timerequiredtosolve/' exactly,it is usefulto introduceanotherproblem/\" with Si, 1< i <n, as its objectivefunction coefficients.Define Sj =

\\_(pi\342\226\240n)/(LB\342\200\242e)J, 1 < i < n. It is easy to see that Si =

{li\342\200\242

\302\273i)/(LB\342\200\242e).Clearly, the S^'scorrespondingto the solutionsof /' and

/\" will have the samenumberof tuples.The(r,t) is a tuple in an S\302\256 for /'if and only if ((r \342\200\242n)/(LB\342\200\242e),t)is a tuple in the S^ for /\".Hence,the timeneededto solve /' is the sameas that neededto solve /\".Sincepi < LB,itfollows that Si < \\n/e\\. Hence

\\S^\\<l+ Y!j=iSj< l + i\\n/e\\

and soZ7=o \\S{t)\\ < n +Z?=o*L\"/eJ= 0(n3/e)

Page 597: Sahni

12.5.FULLYPOLYNOMIALTIMEAPPROXIMATIONSCHEMES589

Thus, if we can go from S{t~^ to S\302\256 in 0{\\S^-^\\)time, then I\" andhence/' can be solved in 0(n3/e)time. Moreover, the solutionfor /' willbe an e-approximatesolutionfor / and we thus have a fully polynomialtimeapproximationscheme.When usingrounding,we solve /\" and usetheresultingoptimalsolutionas the solutionto /.

Example12.14Considerthe 0/1knapsackproblemof Section5.7.While

solving this problemby successivelygeneratingS^\\ S^\\ ...,S^n\\ thefeasible assignmentsfor S^' can be representedby tuples of the form (r,t),where

% ir = J^PjxJ and * =

\302\243 wj'XJ

The dominancerule developedin Section5.7 for this problemis (ri,ii)dominates{r2,t2)iff \302\2431

< t2 and r\\ >r2.Let us solve the following instanceof the 0/1knapsackproblem:n =

5, m = 1112,and (pu p2, ;>?, p\302\261, Pb) = (wi, w2l w3, u;4, w5) = {1,2, 10,100,1000}.SinceK = wt, 1< i < 5, the tuples(r,t)in S\302\256,

0 < i < 5, haver = t. Consequently,it is necessaryto retainonly oneof the two coordinatesr and t. The S\302\256 obtainedfor this instanceare S(\302\260)

= {0},S^ = {0,1},5(2)= {0,1,2,3},S^ = {0,1,2,3,10,11,12,13},S^ = {0,1,2,3, 10,11,12,13,100,101,102,103,110,111,112,113},and S& = {0,1,2,3,10,11,12,13,100,101,102,103,110,111,112,113,1000,1001,1002,1003,1010,1011,1012,1013,1100,1101,1102,1103,1110,1111,1112}.

Theoptimalsolutionhas value J2Pixi= 1112.Now,letus useroundingon thisprobleminstancetofind an approximate

solutionwith value at most 10%lessthan the optimalvalue.We thus havee = 1/10.Also, we know that F*(I)> LB > max {pz}= 1000.Theproblem/\" to be solved is n = 5, m = 1112,(a'i, s2..S3, S4, S5) = (0,0,0,5, 50), and (101,w2, w3, 104, w5) = (1,2, 10,100,1000).Hence,S(0)= S(i) = S(2)= S(3)= {(0,0)},5(4) = {(0,0),(5.100)},and S^ ={(0,0),(5,100),(50,1000),(55,1100)}.

Theoptimalsolutionis (x\\, x2, \302\2433, \302\2434, \302\2435)

= (0,0,0,1,1).Itsvalue in

/\" is 55,and in the originalproblem1100.Theerror{F*{I)-F(I))/F*(I)is therefore12/1112< 0.011< e. At this timewe seethat the solutioncanbe improvedby settingeitherx\\ = 1or X3 = 1. \342\226\241

Roundingas describedin its full generalityresultsin 0(n3/e)timeapproximation schemes.It is possibleto specializethis techniqueto thespecific problembeing solved. In particular,we can obtain specializedand

Page 598: Sahni

590 CHAPTER12.APPROXIMATIONALGORITHMS

asymptotically faster polynomial timeapproximationschemesfor theknapsack problemas well as for the problemof schedulingtaskson twoprocessors to minimizefinish time. Thecomplexityof the resultingalgorithmsis0(n(logn+ l/e2)).

Let us investigatethe specializedroundingschemefor the 0/1knapsackproblem.Let / be an instanceof this problemand let e be the desiredaccuracy. Let P*(I)be the value of an optimalsolution.First, a goodestimateUB for P*(I)is obtained.This is doneby orderingthe n objectsin / so that pi/wi >pi+i/wi+i,1< i <n. Next,we find the largestj suchthat Y,{Wi < m. If j = n, then the optimalsolutionis xt; = 1,1< i < n,and P*(I)= J^Pi- So,assumej < n. DefineUB = J2{ Pi- We can show

\\ UB < P*{I)<UB.The inequality P*{I)< UB follows from the orderingonPi/wi.Theinequality \\ UB < P*(I)follows from the observationthat

p*{i)>J2p*and p*(7) ^ max|X>>Pi+i|Hence,2P*(J)> J2{+1Pi = UB.

Now, let 5 = UBe2/9.Divide the n objectsinto two classes,BIGandSMALL. BIGincludesall objectswith pi > eUB/3.SMALL includesallother objects.Let the numberof objectsin BIGbe r. Replaceeachpi inBIGby qi such that q\\

= [pi/5\\ (this is the roundingstep).Theknapsackproblemis solvedexactlyusingtheser objectsand the q^s.

Let S^r> be the set of tuplesresultingfrom the dynamic programmingalgorithm.Foreachtuple (x,y) E S^r\\ fill the remainingspacem \342\200\224 y byconsideringthe objectsin SMALLin nondecreasingorderof pi/wi. Usethefilling that has maximumvalue as the answer.

Example12.15Considerthe probleminstanceof Example12.14:n = 5,(pi,P2,P3, Pi,Pb) =

{w\\, w2, w3, wA, w5) = (1,2, 10,100,1000),m =1112,and e = 1/10.Theobjectsarealready in nonincreasingorderoipi/wi.For this instance,UB = Y.\\Pi = 1H3.Hence,S = 3.71/3and e.UB/3=37.1.SMALL, therefore,includesobjects1,2,and 3. BIG= {4,5}.So54 =

\\j)i/S\\= 94 and q^ = |_Ps/<5J = 946.Solvingthe knapsackinstance

n = 2,m = 1112,{qA,wA) = (94,100),and (q5,w5)= (946,1000),we obtain5(\302\260)

= {(0,0)},SW = {(0,0), (94, 100)},and S^ = {(0,0), (94, 100),(946,1000),(1040,1100)}.Filling(0,0) from SMALL,we get the tuple (13,13).Filling(94,100),(946,1000),and (1040,1100)yields the tuples (107,113),(959,1013),and (1043,1100)respectively.Theanswer is given by thetuple (1043,1100).This correspondsto (x\\, X2, \302\2433, \302\2434, \302\2435)

= (1,1,0,1,1) and Y,PiXi= 1103. \342\226\241

Page 599: Sahni

12.5.FULLYPOLYNOMIALTIMEAPPROXIMATIONSCHEMES591

An exerciseexploresa modificationto the basicroundingschemeillustrated in Example12.15.This modificationresultsin bettersolutions.

Theorem12.15[Ibarra and Kim] The algorithmjust describedis an eapproximate algorithmfor the 0/1-kiiapsackproblem. \342\226\241

The timeneededto initially sort accordingto Pijwi is 0(nlogn). SoUB can be computedin 0(n)time. SinceP*(I)< UB,thereareat mostUB/5\342\200\224 9/e2tuples in any S^ in the solutionof BIG.Thetimeto obtainS^ is therefore0(r/e2)< 0(n/e2).Fillingeachtuple in S^ with objectsfrom SMALLtakes0(|SMALL|)time.Since|S<r)|< 9/e2,the totaltimeforthis step is at most 0(n/e2).The total timefor the algorithmis therefore0(n(logn+l/e2)).A faster approximationschemefor the knapsackproblemhas beenobtainedby G.Lawler.His schemealsousesrounding.

12.5.2IntervalPartitioningUnlike rounding,interval partitioningdoesnot transform the originalproblem instanceinto one that is easiertosolve. Instead,an attemptis madeto solve the probleminstance/ by generatinga restrictedclassof thefeasible assignmentsfor

S^\302\260\\ S^,...,S^n\\ Let P{ be the maximumY?j=iPjxjamongall feasibleassignmentsgeneratedfor S^'.Then the profit interval[0,Pi] is divided into subintervals eachof sizePie/(n\342\200\224 1) (exceptpossiblythe last interval which may be a littlesmaller).All feasibleassignmentsinS\302\256 with

\302\243!\342\226\240 =lPjXjin the samesubinterval areregardedas having the sameY^j=\\Pjxjan(ltne dominancerulesareusedto discardallbut oneof them.The S^%> resultingfrom this eliminationare used in the generationof theS(t+1'.Sincethe numberof subintervals for eachS^ is at most \\n/e\\ + 1,\\S&\\ < \\n/e\\ + 1.Hence,\302\243i \\S^\\ = 0{n2/e).

Theerrorintroducedin eachfeasibleassignmentdueto thiseliminationinS'W is lessthan the subinterval length. This errormay, however, propagatefrom S^1)up through S^n\\ However,the erroris additive.Let F(I)be thevalue of the optimalgeneratedusing interval partitioning,and F*(I)thevalue of a true optimal.It follows that

n-lF*(I)-F(I)<(e'\302\243Pi)/(n-l)

i=l

SincePt < F*{I),it follows that [F*(I)-F(I)/F*(I)]< e,as desired.In many casesthe algorithmcan be speededby starting with a good

estimateLB for F*(I)such that F*(I)> LB.The subinterval sizeis then

Page 600: Sahni

592 CHAPTER12.APPROXIMATIONALGORITHMS

LBe/(n\342\200\224 1) ratherthan Pie/(n\342\200\224 1).When a feasibleassignmentwith valuegreaterthan LB is discovered,the subinterval sizecan alsobe chosenasdescribed.

Example12.16Considerthe sameinstanceof the 0/1knapsackproblemas in Example12.14.Then e = 1/10and F*{I)> LB > 1000.We canstart with a subinterval sizeof LB.e/(n\342\200\224 1) = 1000/40= 25. Sincealltuples (p,t) in S'W have p = t, only p is explicitlyretained.Theintervals are[0,25),[25,50),...,and soon.Usinginterval partitioning,we obtain S(\302\260)

=S(i) = g(2)= S(3)= {Q},^)= {0,100},and S^ = {0,100,1000,1100}.

Thebestsolutiongeneratedusinginterval partitioningis (xi,X2, \302\2433, \302\2434,

\302\2435)

= (0,0,0,1,1)and its value F{I)is 1100.Then [F*(I)-F{I)]/F*{I)=12/1112<0.011< e. Again, the solutionvalue can be improved by usingaheuristicto changesomeof the xj'sfrom 0 to 1. \342\226\241

12.5.3SeparationAssume that in solving a probleminstance/, we have obtainedan S^with feasiblesolutionshaving J2i<j<iPjxj:0? 3.9,4.1,7.8,8.2,11.9,12.1.Furtherassumethat the interval sizePie/(n\342\200\224 1) is 2.Thenthe subintervalsare [0,2),[2,4),[4,6),[6,8),[8,10),[10,12),and [12,14).Each feasiblesolutionvalue falls in a different subinterval and sono feasibleassignmentsareeliminated.However, thereare threepairsof assignmentswith valueswithin Pje/(n \342\200\224 1).If the dominancerulesareusedfor eachpair,only fourassignmentsremain. Theerrorintroducedis at most

P\302\253e/(n\342\200\224 1).More

formally, let ao, ai,02,\342\200\242\342\200\242

\342\200\242, Qr be the distinctvalues of Y^j=\\Pjxim S \342\200\242

Let us assumeao < a\\ < a<i < \342\226\240\342\226\240\342\226\240<ar- We constructa new setJ from S'W

by makinga left to right scanand retaininga tuple only if its value exceedsthe value of the last tuple in J by morethan Pie/(n \342\200\224 1).This is describedby Algorithm 12.7.This algorithmassumesthat the assignmentwith lessprofit dominatesthe one with moreprofit if we regardboth assignmentsasyielding the sameprofit J2Pjxj-If the reverseis true, the algorithmcanstart with ar and workdownward. Theanalysis for this strategy is the sameas that for interval partitioning.Thesamecommentsregardingthe useof agoodestimatefor F*(I)holdheretoo.

Intuitively onemay expectseparationtoalwaysworkbetterthan intervalpartitioning.Thefollowing exampleillustratesthat thisneednot bethe case.However,empiricalstudieswith one problemindicateinterval partitioningto be inferior in practice.

Example12.17Usingseparationon the dataof Example12.14yields thesameS^ as obtainedusing interval partitioning.We have already seen

Page 601: Sahni

12.5.FULLYPOLYNOMIALTIMEAPPROXIMATIONSCHEMES593

1 J :=assignmentcorrespondingto ao;XP :=ao;2 for j :=1to r do3 if aj >XP +Pie/{n- 1) then4 {5 put assignmentcorrespondingto cij into J;6 XP:=aj',7 }

Algorithm12.7Separationmethod

an instancein which separationperformsbetter than intervalpartitioning. Now, we see an examplein which interval partitioningdoesbetterthan separation.Assume that the subinterval sizeLB.e/(n\342\200\224 1) is 2. Thenthe intervals are [0,2),[2,4),[4,6),..., and soon. Assume further that(pi,p2, Pa, Pa, Pa) = (3, 1,5.1,5.1,5.1).Then, following the use ofinterval partitioning,we have S^ ={0},S^ ={0,3},S^ ={0,3,4},S<3)= {0,3, 4,8.1},S^ = {0,3, 4, 8.1,13.2},and S^ = {0,3, 4,8.1,13.2,18.3}.

Usingseparationwith LB.e/(n- 1) = 2,we have S^ = {0},S^ = {0,3},5(2)= {0,3},S^ = {0,3, 5.1,8.1},S& = {0,3, 5.1,8.1,10.2,13.2},and S<5)= {0,3, 5.1,8.1,10.2,13.2,15.3,18.3}. \342\226\241

Theexercisesexaminesomeof the otherproblemsto which thesetechniques apply. It is interestingto note that onecan coupleexistingheuristicstothe approximationschemesthat result from thesethreetechniques.Thisis becauseof the similarity in solutionproceduresfor the exactandapproximate problems.In the approximationalgorithmsof Sections12.2to 12.4itis usually not possibleto useexistingheuristics.

At this point, one might well ask what kind of AfV-h&rd problemscanhave fully polynomial time approximationschemes? No AfV-h&rd

e-approximationproblemcanhave suchaschemeunlessV = HV.A strongerresult can be proven. This strongerresult is that the only AAP-hardproblems that can have fully polynomial timeapproximationschemes(unlessV= MV) are thosewhich are polynomially solvableif restrictedto probleminstancesin which allnumbersareboundedby a fixed polynomial in n.Examples of suchproblemsarethe knapsackand jobsequencingwith deadlinesproblems.

Definition12.8[Garey and Johnson]Let L besomeproblem.Let / beaninstanceof L and letLENGTH(/) bethe numberof bitsin the representationof /. Let MAX(7) be the magnitudeof the largestnumberin /. Without

Page 602: Sahni

594 CHAPTER12.APPROXIMATIONALGORITHMS

lossof generality,we canassumethat allnumbersin 7 areinteger.For somefixed polynomialp, let Lp be problemL restrictedto thoseinstances7 forwhich MAX(7) < p(LENGTH(7)).ProblemL is stronglyMV-hardif andonly if thereexistsa polynomialp suchthat Lp is AfV-h&rd. \342\226\241

Examplesof problemsthat arestrongly AfV-h&rd areHamiltoniancycle,nodecover,feedbackarcset,travelingsalesperson,maxclique,and soon.The 0/1knapsackproblemis probably not strongly J\\f V-h&rd (note thatthereis no known way toshow that a problemis not strongly J\\fV-h&rd) aswhen MAX(/) <p(LENGTH(7)),/ can be solved in time0(LENGTH(7)2\302\243>(LENGTH(7))usingthe dynamic programmingalgorithmof Section5.7.

Theorem12.16[Garey and Johnson] Let L be an optimizationproblemsuch that all feasiblesolutionsto all possibleinstanceshave a value thatis a positiveinteger. Further, assumethat for all instances7 of L, theoptimalvalue F*(I)is bounded by a polynomial function p in thevariables LENGTH(7)and MAX(7); that is,0 < F*(I)< p(LENGTH(7),MAX (7)) and F*(I)is an integer.If L has a fully polynomial timeapproximation scheme,then L has an exactalgorithmof complexitya polynomialin LENGTH(7)and MAX(7).

Proof:SupposeL has a fully polynomial timeapproximationscheme.Weshow how to obtain optimalsolutionsto L in polynomial time. Let 7 beany instanceof L.Definee = l/p(LENGTH(7),MAX(7)).With this e,theapproximationschemeis forcedto generatean optimalsolution.To seethis,let F(I)be the value of the solutiongenerated.Then,

\\F*(I)-F(I)\\ < eF*(I)< F*(7)/p(LENGTH(7),MAX(7))< 1

Since,by assumption,all feasiblesolutionsare integervalued, F*(I)=F(I).Therefore,with this e, the approximationschemebecomesan exactalgorithm.

The complexity of the resultingexactalgorithmis easy to obtain.Letg(LENGTH(7),1/e)bea polynomialsuchthat the complexityof theapproximation schemeis 0(g(LENGTH(7),1/e)).The complexity of this schemewhen e is chosenas above is 0(g(LENGTH(7),p(LENGTH(7),MAX(7))),which is 0(</(LENGTH(7),MAX(7)))for somepolynomialq'. \342\226\241

When Theorem12.16is appliedtointeger-valuedproblemsthat areHV-hard in the strong sense,we see that no such problemcan have a fully

polynomial timeapproximationschemeunlessV =HV\342\226\240 Theabove

theorem alsotellsus somethingabout the kind of exactalgorithmsobtainablefor strongly AAP-hard problems.A pseudo-polynomialtimealgorithmis one

Page 603: Sahni

12.5.FULLYPOLYNOMIALTIMEAPPROXIMATIONSCHEMES595

whosecomplexityis a polynomial in LENGTH(J)and MAX(J). Thedynamic programmingalgorithmfor the knapsackproblem(Section5.7) is apseudo-polynomialtimealgorithm.No strongly ./VP-hardproblemcan havea pseudo-polynomialtimealgorithmunlessP = HV-

EXERCISES1.Considerthe 0(n(logn+ l/e2))roundingalgorithmfor the 0/1

knapsack problem.Let S^r'be the final setof tuplesin the solutionof BIG.Showthat no morethan (9/e2)/ objectswith roundedprofit value qi

cancontributeto any tuple in S^r>.Fromthis, concludethat BIGcanhave at most(9/e2)/qjobjectswith roundedprofit value gj. Hence,r < XX9/e2)/<7i>whereq% is in the range [3/e,9/e2].Now, show thatthe timeneededto obtain S^r' is (3(81/e4In (3/e)).Usethe relation

9^29/e2 r9A2 9 dqt 9,3/ \342\200\224 - / T \342\200\224 = -7T In -yc Qi Jz/e e2 q% e2 e

2.Write an algorithmfor the (3(n(logn+ l/e2))roundingschemediscussed in Section12.5.When solvingBIG,use threetuples (P,Q,W)suchthat P =

\302\243inxi, Q = E Qi^i,and W =\302\243 wlxl.Tuple(Pi,Q\\,W\\)

dominates{P21Q21W2) if and only if Q\\ > Q2 and W\\ < W2- In caseQi = Q2 and W\\ = W2, then an additionaldominancecriteriacanbeused.In this casethe tuple (Pi,Q\\,W\\)dominates(P2,Q2,W2)ifand only if Pi > P2.Otherwise,(P2,Q2jW2) dominates(P],Qi,Wi).Show that your algorithmis of timecomplexity0(n(logn+ l/e2)).

3.Useseparationtoobtaina fully polynomialtimeapproximationschemefor the independenttaskschedulingproblemwhen rn = 2 (seeSection12.4).

4. Do Exercise3 for the casein whichthe two processorsoperateat speedsS[ and S2,si7^ S2 (seeExercise3).

5. Do Exercise3 for the casewhen the two processorsare nonidentical(seeExercise4).

6.Useseparationto obtain a fully polynomial timeapproximationalgorithm for the jobsequencingwith deadlinesproblem.

7. Useseparationto obtaina fully polynomialtimeapproximationschemefor the problemof obtainingtwo processorscheduleswith minimummean weightedfinish time (seeSection11.4).Assume that the twoprocessorsareidentical.

Page 604: Sahni

596 CHAPTER12.APPROXIMATIONALGORITHMS

8. Do Exercise7 for the casein whicha minimum meanfinish timeschedule that hasminimum finish timeamongallminimum meanfinish timeschedulesis desired.Again, assumetwo identicalprocessors.

9. Do Exercise3 usingrounding.10.Do Exercise4 usingrounding.11.Do Exercise5 usingrounding.12.Do Exercise6 usingrounding.13.Do Exercise7 usingrounding.14.Do Exercise8 usingrounding.15.Show that the following problemsarestrongly NV-h.&rd:

(a(b(c(d(e:(f(g(h

Max cliqueSet cover

NodecoverSet packingFeedbacknodesetFeedbackarcsetChromaticnumberCliquecover

12.6 PROBABILISTICALLYGOODALGORITHMS(*)

Theapproximationalgorithmsof the precedingsectionshad the niceproperty that their worst-caseperformancecouldbeboundedby someconstants(k in the caseof an absoluteapproximationand e in the caseof an e-approximation).Therequirementof boundedperformancetends tocategorize otheralgorithmsthat usually work well as beingbad.Somealgorithmswith unboundedperformancemay in fact almostalways eithersolve theproblemexactlyor generatea solutionthat is exceedinglyclosein value tothe value of an optimalsolution.Suchalgorithmsaregoodin a probabilisticsense.If we pick a probleminstance/ at random,then thereis a very highprobability that the algorithmwill generatea very goodapproximatesolution. In this sectionwe considertwo algorithmswith this property. Bothalgorithmsarefor AAP-hard problems.

First, sincewe carry out a probabilisticanalysis of the algorithms,weneedto definea samplespaceof inputs. Thesamplespaceis set up by first


defining a sample space S_n for each problem size n. Problem instances of size n are drawn from S_n. Then, the overall sample space is the infinite Cartesian product S_1 × S_2 × S_3 × ... × S_n × .... An element of the sample space is a sequence X = x_1, x_2, ..., x_n, ... such that x_i is drawn from S_i.

Definition 12.9 [Karp] An algorithm A solves a problem L almost everywhere (abbreviated a.e.) if, when X = x_1, x_2, ..., x_n, ... is drawn from the sample space S_1 × S_2 × S_3 × ... × S_n × ..., the number of x_i on which the algorithm fails to solve L is finite with probability 1. □

Since both the algorithms we discuss are for NP-hard graph problems, we first describe the sample space for which the probabilistic analysis is carried out. Let p(n) be a function such that 0 < p(n) < 1 for all n > 0. A random n-vertex graph is constructed by including edge (i, j), i ≠ j, with probability p(n).

The first algorithm we consider is an algorithm to find a Hamiltonian cycle in an undirected graph. Informally, this algorithm proceeds as follows. First, an arbitrary vertex (say vertex 1) is chosen as the start vertex. The algorithm maintains a simple path P starting from vertex 1 and ending at vertex k. Initially P is a trivial path with k = 1; that is, there are no edges in P. At each iteration of the algorithm an attempt is made to increase the length of P. This is done by considering an edge (k, j) incident to the endpoint k of P. When edge (k, j) is being considered, one of three possibilities exists:

1. j = 1 and path P includes all the vertices of the graph. In this case a Hamiltonian cycle has been found and the algorithm terminates.

2. j is not on the path P. In this case the length of path P is increased by adding (k, j) to it. Then j becomes the new endpoint of P.

3. j is already on path P. Now there is a unique edge e = (j, m) in P such that the deletion of e from P and the inclusion of (k, j) to P result in a simple path. Then e is deleted and (k, j) is added to P. P is now a simple path with endpoint m.

The algorithm is constrained so that case 3 does not generate two paths of the same length having the same endpoint. With a proper choice of data representations, this algorithm can be implemented to run in time O(n²), where n is the number of vertices in graph G. It is easy to see that this algorithm does not always find a Hamiltonian cycle in a graph that contains such a cycle.

Theorem 12.17 [Posa] If p(n) ≥ (α ln n)/n, α > 1, then the preceding algorithm finds a Hamiltonian cycle (a.e.). □


Example 12.18 Let us try out the above algorithm on the five-vertex graph of Figure 12.9. The path P initially consists of vertex 1 only. Assume edge (1, 4) is chosen. This represents case 2 and P is expanded to {1, 4}. Assume edge (4, 5) is chosen next. Path P now becomes {1, 4, 5}. Edge (1, 5) is the only possibility for the next edge. This results in case 3 and P becomes {1, 5, 4}. Now assume edges (4, 3) and (3, 2) are considered. Path P becomes {1, 5, 4, 3, 2}. If edge (1, 2) is next considered, a Hamiltonian cycle is found and the algorithm terminates. □

Figure 12.9 Graph for Example 12.18
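The path-growing procedure described above is easy to experiment with. The Python sketch below is our own illustrative rendering (the function name, the adjacency-set representation, the random edge choice, and the max_steps cutoff are all assumptions made for this sketch); in particular it omits the bookkeeping that prevents two paths of the same length from sharing an endpoint, so it should not be read as the exact algorithm analyzed in Theorem 12.17.

    import random

    def hamiltonian_cycle_search(adj, max_steps=10000):
        # adj: dict mapping each vertex (1..n) to the set of its neighbors.
        # Returns a Hamiltonian cycle as a list of vertices, or None if no cycle
        # was found within max_steps iterations (failure is possible even when
        # the graph is Hamiltonian).  Every vertex is assumed to have degree >= 1.
        n = len(adj)
        path = [1]                    # the simple path P, starting at vertex 1
        on_path = {1}
        for _ in range(max_steps):
            k = path[-1]              # current endpoint of P
            j = random.choice(list(adj[k]))       # an edge (k, j) out of the endpoint
            if j == path[0] and len(path) == n:
                return path + [j]     # case 1: a Hamiltonian cycle is found
            if j not in on_path:
                path.append(j)        # case 2: extend P; j is the new endpoint
                on_path.add(j)
            else:
                # case 3 (rotation): j is on P; deleting the edge (j, m) of P and
                # adding (k, j) yields a simple path whose new endpoint is m.
                pos = path.index(j)
                path = path[:pos + 1] + path[pos + 1:][::-1]
        return None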

The next probabilistically good algorithm we look at is for the maximum independent set problem. A subset of vertices N of graph G(V, E) is said to be independent if and only if no two vertices in N are adjacent in G. Indep (Algorithm 12.8) is a greedy algorithm to construct a maximum independent set.

One can easily construct examples of n-vertex graphs for which Indep generates independent sets of size 1 when in fact a maximum independent set contains n − 1 vertices. However, for certain probability distributions it can be shown that Indep generates good approximations almost everywhere. If F*(I) and F(I) represent the size of a maximum independent set and one generated by algorithm Indep, respectively, then the following theorem is obtained.

Theorem 12.18 [Karp] If p(n) = c, for some constant c, then for every ε > 0, we have

    [F*(I) − F(I)]/F*(I) < 0.5 + ε   (a.e.)   □

Algorithm Indep can easily be implemented to have polynomial complexity.



1    Algorithm Indep(V, E)
2    {
3        N := ∅;
4        while there is a v ∈ (V − N) and
5                  v not adjacent to any vertex in N do
6            N := N ∪ {v};
7        return N;
8    }

Algorithm 12.8 Finding an independent set
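In Python, Indep translates almost line for line into the following sketch (the function name and the adjacency-dictionary representation are our choices, not the text's):

    def indep(vertices, adj):
        # Greedy construction of a maximal independent set, mirroring Algorithm 12.8:
        # keep adding any vertex that is not adjacent to a vertex already in N.
        N = set()
        for v in vertices:
            if all(u not in adj[v] for u in N):
                N.add(v)
        return N

    # Example: a star graph; the answer depends on which vertex is tried first.
    adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
    print(indep([1, 2, 3, 4], adj))   # {1}: size 1, while {2, 3, 4} is optimal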

Some other NP-hard problems for which probabilistically good algorithms are known are the Euclidean traveling salesperson, minimal coloring of graphs, set covering, maximum weighted clique, and partition.

EXERCISE

1. Show that function Indep is not an ε-approximate algorithm for the maximum independent set problem for any ε, 0 < ε < 1.

12.7 REFERENCES AND READINGS

The terms approximation scheme, polynomial time approximation scheme, and fully polynomial time approximation scheme were coined by M. Garey and D. Johnson. S. Sahni pointed out that for the 0/1 knapsack problem the corresponding absolute approximation problem is also NP-hard. The polynomial time approximation scheme for the 0/1 knapsack problem discussed in Section 12.4 is also due to S. Sahni.

The analysis of the LPT rule of Section 12.3 is due to R. Graham. The polynomial time approximation scheme for scheduling independent tasks that was discussed in Section 12.4 is also due to him.

An excellent bibliography on approximation algorithms is "Approximation algorithms for combinatorial problems: an annotated bibliography," by M. Garey and D. Johnson, in Algorithms and Complexity: Recent Results and New Directions, J. Traub, ed., Academic Press, 1976.

The approximation algorithm MSat2 (Exercise 2) was given by K. Lieberherr. S. Sahni and T. Gonzalez were the first to show the existence of NP-hard ε-approximate problems. Garey and Johnson have shown that the ε-approximate graph coloring problem is NP-hard for ε < 1. O. Ibarra


and C. Kim were the first to discover the existence of fully polynomial time approximation schemes for NP-hard problems.

Our discussion of the general techniques rounding, interval partitioning, and separation is based on the work of Sahni. The notion of strongly NP-hard is due to Garey and Johnson. Theorem 12.16 is also due to them.

The discussion of probabilistically good algorithms is based on the work of R. Karp. Theorem 12.17 is due to L. Posa.

For additional material on complexity theory see Complexity Theory, by C. H. Papadimitriou, Addison-Wesley, 1994.

12.8 ADDITIONAL EXERCISES

1. The satisfiability problem was introduced in Chapter 11. Define maximum satisfiability to be the problem of determining a maximum subset of clauses that can be satisfied simultaneously. If a formula has p clauses, then all p clauses can be simultaneously satisfied if and only if the formula is satisfiable. For function MSat (Algorithm 12.9), show that for every instance i, |F*(i) − F(i)|/F*(i) ≤ 1/(k + 1). Here k is the minimum number of literals in any clause of i. Show that this bound is best possible for this algorithm.

2. Show that if function MSat2 (Algorithm 12.10) is used for the maximum satisfiability problem of Exercise 1, then |F*(i) − F(i)|/F*(i) ≤ 1/2^k, where k, F, and F* are as in Exercise 1.

3. Consider the set cover problem of Section 11.3, Exercise 15. Show that if the function SetCover (Algorithm 12.11) is used for the optimization version of this problem, then

    F(I)/F*(I) ≤ Σ_{j=1}^{k} 1/j

where k is the maximum number of elements in any set. Show that this bound is best possible.

4. Consider a modified set cover problem (MSC) in which we are required to find a cover T such that Σ_{S∈T} |S| is minimum.

(a) Show that exact cover ∝ MSC (see Section 11.3, Exercise 16).
(b) Show that the function MSC (Algorithm 12.12) is not an ε-approximate algorithm for this problem for any ε, ε > 0.


1    Algorithm MSat(I)
2    // Approximation algorithm for maximum satisfiability.
3    // I is a formula. Let x[1 : n] be the variables in I
4    // and let C_i, 1 ≤ i ≤ p, be the clauses.
5    {
6        CL := ∅; // Set of clauses simultaneously satisfiable
7        Left := {C_i | 1 ≤ i ≤ p}; // Remaining clauses
8        Lit := {x_i, x̄_i | 1 ≤ i ≤ n}; // Set of all literals
9        while Lit contains a literal occurring in a clause in Left do
10       {
11           Let y be a literal in Lit that is in
12               the most clauses of Left;
13           Let R be the subset of clauses in Left that contain y;
14           CL := CL ∪ R; Left := Left − R;
15           Lit := Lit − {y, ȳ};
16       }
17       return CL;
18   }

Algorithm 12.9 Function for Exercise 1
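A possible Python transcription of MSat is sketched next; representing each clause as a set of nonzero integers (i for the literal x_i, −i for its complement) is a choice made for this sketch, not something the text prescribes.

    def msat(clauses, n):
        # clauses: list of sets of literals; returns the indices of the clauses
        # that the greedy strategy of Algorithm 12.9 satisfies simultaneously.
        CL = set()                              # clauses chosen so far
        left = set(range(len(clauses)))         # indices of remaining clauses
        lit = {l for v in range(1, n + 1) for l in (v, -v)}
        while True:
            # pick the literal that occurs in the most remaining clauses
            best, best_count = None, 0
            for y in lit:
                count = sum(1 for c in left if y in clauses[c])
                if count > best_count:
                    best, best_count = y, count
            if best is None:                    # no literal of lit occurs in left
                break
            R = {c for c in left if best in clauses[c]}
            CL |= R
            left -= R
            lit -= {best, -best}
        return CL

    # Example: (x1 or x2), (not x1), (not x1 or x2)
    print(msat([{1, 2}, {-1}, {-1, 2}], 2))   # {0, 1, 2}: x1 = false, x2 = true works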

5. An edge disjoint cycle cover of an undirected graph G is a set of edge disjoint cycles such that every vertex is included in at least one cycle. The size of such a cycle cover is the number of cycles in it.

(a) Show that finding a minimum cycle cover of this type is NP-hard.
(b) Show that the ε-approximation version of this problem is NP-hard for all ε, ε > 0.

6. Show that if the cycles in Exercise 5 are constrained to be vertex disjoint, then the problem remains NP-hard. Show that the ε-approximate version is NP-hard for all ε, ε > 0.

7. Let G = (V, E) be an undirected graph. Let f : E → Z be an edge weighting function and let w : V → Z be a vertex weighting function. Let k be a fixed integer, k ≥ 2. The problem is to obtain k disjoint sets S_1, ..., S_k such that:

(a) ∪ S_i = V
(b) S_i ∩ S_j = ∅ for i ≠ j
(c) Σ_{v∈S_i} w(v) ≤ W, 1 ≤ i ≤ k


1    Algorithm MSat2(I)
2    // Same function as MSat
3    {
4        w[i] := 2^{−|C_i|}, 1 ≤ i ≤ p;
5        // Weighting function; |C_i| = number of literals in C_i.
6        CL := ∅; Left := {C_i | 1 ≤ i ≤ p};
7        Lit := {x_i, x̄_i | 1 ≤ i ≤ n};
8        while Lit contains a literal occurring in a clause in Left do
9        {
10           Let y ∈ Lit be such that y occurs in a clause in Left;
11           Let R be the subset of clauses in Left containing y;
12           Let S be the subset of clauses in Left containing ȳ;
13           if Σ_{C_i ∈ R} w[i] ≥ Σ_{C_i ∈ S} w[i] then
14           {
15               CL := CL ∪ R;
16               Left := Left − R;
17               w[i] := 2 * w[i] for each C_i ∈ S;
18           }
19           else
20           {
21               CL := CL ∪ S;
22               Left := Left − S;
23               w[i] := 2 * w[i] for each C_i ∈ R;
24           }
25           Lit := Lit − {y, ȳ};
26       }
27       return CL;
28   }

Algorithm 12.10 Function for Exercise 2


1    Algorithm SetCover(F)
2    // S_i, 1 ≤ i ≤ m are the sets in F. |S_i| is the
3    // number of elements in S_i. |∪ S_i| = n.
4    {
5        G := ∪ S_i;
6        for i := 1 to m do R_i := S_i;
7        cov := ∅;  // Elements covered
8        T := ∅;    // Cover being constructed
9        while cov ≠ G do
10       {
11           Let R_j be such that |R_j| ≥ |R_q|, 1 ≤ q ≤ m;
12           cov := cov ∪ R_j; T := T ∪ {S_j};
13           for i := 1 to m do R_i := R_i − R_j;
14       }
15       return T;
16   }

Algorithm 12.11 Function for Exercise 3

Algorithm12.11Functionfor Exercise3

1 AlgorithmMSC(F)2 // Samevariablesas in SetCover3 {4 T:=0;Left:={St\\l< i <m};G :=USf,5 while G ^ 0 do6 {7 Let Sjbea set in Leftsuchthat8 \\Sj

-G\\/\\Sj f]G\\<\\Sq

-G\\/\\Sq fl G|for all SQin Le/t;

9 T:=TUSj;G:=G-Sj;Left:=Left-Sj;10 }11 returnT;12 }

Algorithm12.12Functionfor Exercise4

Page 612: Sahni

604 CHAPTER12.APPROXIMATIONALGORITHMS

(d) ELiE(u,v)eE f(u,v) is maximizedu,veS{

W is a number that may vary from instance to instance. This partitioning problem finds application in the minimization of the cost of interpage references between subroutines of a program. Show that the ε-approximate version of this problem is NP-hard for all ε, 0 < ε < 1.

8. In one interpretation of the generalized assignment problem, we have m agents who have to perform n tasks. If agent i is assigned to perform task j, then cost c_{ij} is incurred. When agent i performs task j, r_{ij} units of her or his resources are used. Agent i has a total of b_i units of resource. The objective is to find an assignment of agents to tasks such that the total cost of the assignment is minimized and no agent requires more than her or his total available resource to complete the tasks she or he is assigned to. Only one agent may be assigned to a task. Using x_{ij} as a 0/1 variable such that x_{ij} = 1 if agent i is assigned to task j and x_{ij} = 0 otherwise, the generalized assignment problem can be formulated mathematically as

    minimize Σ_{i=1}^{m} Σ_{j=1}^{n} c_{ij} x_{ij}

    subject to Σ_{j=1}^{n} r_{ij} x_{ij} ≤ b_i, 1 ≤ i ≤ m
               Σ_{i=1}^{m} x_{ij} = 1, 1 ≤ j ≤ n
               x_{ij} = 0 or 1, for all i and j

The constraints Σ x_{ij} = 1 ensure that exactly one agent is assigned to each task. Many other interpretations are possible for this problem. Show that the corresponding ε-approximation problem is NP-hard for all ε, ε > 0.


Chapter 13

PRAM ALGORITHMS

13.1 INTRODUCTION

So far our discussion of algorithms has been confined to single-processor computers. In this chapter we study algorithms for parallel machines (i.e., computers with more than one processor). There are many applications in day-to-day life that demand real-time solutions to problems. For example, weather forecasting has to be done in a timely fashion. In the case of severe hurricanes or snowstorms, evacuation has to be done in a short period of time. If an expert system is used to aid a physician in surgical procedures, decisions have to be made within seconds. And so on. Programs written for such applications have to perform an enormous amount of computation. In the forecasting example, large-sized matrices have to be operated on. In the medical example, thousands of rules have to be tried. Even the fastest single-processor machines may not be able to come up with solutions within tolerable time limits. Parallel machines offer the potential of decreasing the solution times enormously.

Example 13.1 Assume that you have 5 loads of clothes to wash. Also assume that it takes 25 minutes to wash one load in a washing machine. Then, it will take 125 minutes to wash all the clothes using a single machine. On the other hand, if you had 5 machines, washing could be completed in just 25 minutes! In this example, if there are p washing machines and p loads of clothes, then the washing time can be cut down by a factor of p compared to having a single machine. Here we have assumed that every machine takes exactly the same time to wash. If this assumption is invalid, then the washing time will be dictated by the slowest machine. □

Example 13.2 As another example, say there are 100 numbers to be added and there are two persons A and B. Person A can add the first 50 numbers. At the same time B can add the next 50 numbers. When they are done, one



of them can add the two individual sums to get the final answer. So, two people can add the 100 numbers in almost half the time required by one. □

The idea of parallel computing is very similar. Given a problem to solve, we partition the problem into many subproblems; let each processor work on a subproblem; and when all the processors are done, the partial solutions are combined to arrive at the final answer. If there are p processors, then potentially we can cut down the solution time by a factor of p. We refer to any algorithm designed for a single-processor machine as a sequential algorithm and any designed for a multiprocessor machine as a parallel algorithm.

Definition 13.1 Let π be a given problem for which the best-known sequential algorithm has a run time of S'(n), where n is the problem size. If a parallel algorithm on a p-processor machine runs in time T'(n, p), then the speedup of the parallel algorithm is defined to be S'(n)/T'(n, p). If the best-known sequential algorithm for π has an asymptotic run time of S(n) and if T(n, p) is the asymptotic run time of a parallel algorithm, then the asymptotic speedup of the parallel algorithm is defined to be S(n)/T(n, p). If S(n)/T(n, p) = Θ(p), then the algorithm is said to have linear speedup. □

Note: In this book we use the terms "speedup" and "asymptotic speedup" interchangeably. Which one is meant is clear from the context.

Example 13.3 For the problem of Example 13.2, the 100 numbers can be added sequentially in 99 units of time. Person A can add 50 numbers in 49 units of time. At the same time, B can add the other 50 numbers. In another unit of time, the two partial sums can be added; this means the parallel run time is 50. So the speedup of this parallel algorithm is 99/50 = 1.98, which is very nearly equal to 2! □

Example 13.4 There are many sequential sorting algorithms such as heapsort (Section 2.4.2) that are optimal and run in time O(n log n), n being the number of keys to be sorted. Let A be an n-processor parallel algorithm that sorts n keys in Θ(log n) time and let B be an n²-processor algorithm that also sorts n keys in Θ(log n) time.

Then, the speedup of A is Θ(n log n)/Θ(log n) = Θ(n). On the other hand, the speedup of B is also Θ(n log n)/Θ(log n) = Θ(n). Algorithm A has linear speedup, whereas B does not have a linear speedup. □

Definition 13.2 If a p-processor parallel algorithm for a given problem runs in time T(n, p), the total work done by this algorithm is defined to


be pT(n, p). The efficiency of the algorithm is defined to be S(n)/(pT(n, p)), where S(n) is the asymptotic run time of the best-known sequential algorithm for solving the same problem. Also, the parallel algorithm is said to be work-optimal if pT(n, p) = O(S(n)). □

Note: A parallel algorithm is work-optimal if and only if it has linear speedup. Also, the efficiency of a work-optimal parallel algorithm is Θ(1).

Example 13.5 Let w be the time needed to wash one load of clothes on a single machine in Example 13.1. Also let n be the total number of loads to wash. A single machine will take time nw. If there are p machines, the washing time is ⌈n/p⌉w. Thus the speedup is n/⌈n/p⌉. This speedup is ≥ p/2 if n ≥ p. So, the asymptotic speedup is Ω(p) and hence the parallel algorithm has linear speedup and is work-optimal. Also, the efficiency is nw/(p⌈n/p⌉w) = n/(p⌈n/p⌉). This is Θ(1) if n ≥ p. □

Example 13.6 For the algorithm A of Example 13.4, the total work done is nΘ(log n) = Θ(n log n). Its efficiency is Θ(n log n)/Θ(n log n) = Θ(1). Thus, A is work-optimal and has a linear speedup. The total work done by algorithm B is n²Θ(log n) = Θ(n² log n) and its efficiency is Θ(n log n)/Θ(n² log n) = Θ(1/n). As a result, B is not work-optimal! □

Is it possible to get a speedup of more than p for any problem on a p-processor machine? Assume that it is possible (such a speedup is called a superlinear speedup). In particular, let π be the problem under consideration and S be the best-known sequential run time. If there is a parallel algorithm on a p-processor machine whose speedup is better than p, it means that the parallel run time T satisfies T < S/p; that is, pT < S. Note that a single step of the parallel algorithm can be simulated on a single processor in time ≤ p. Thus the whole parallel algorithm can be simulated sequentially in time ≤ pT < S. This is a contradiction since by assumption S is the run time of the best-known sequential algorithm for solving π.

The preceding discussion is valid only when we consider asymptotic speedups. When the speedup is defined with respect to the actual run times on the sequential and parallel machines, it is possible to obtain superlinear speedup. Two of the possible reasons for such an anomaly are (1) p processors have more aggregate memory than one and (2) the cache-hit frequency may be better for the parallel machine as the p processors may have more aggregate cache than does one processor.

One way of solving a given problem in parallel is to explore many techniques (i.e., algorithms) and identify the one that is the most parallelizable. To achieve a good speedup, it is necessary to parallelize every component of


the underlying technique. If a fraction f of the technique cannot be parallelized (i.e., has to be run serially), then the maximum speedup that can be obtained is limited by f. Amdahl's law (proof of which is left as an exercise) relates the maximum speedup achievable with f and p (the number of processors used) as follows.

Lemma 13.1 Maximum speedup = 1 / (f + (1 − f)/p)

Example 13.7 Consider some technique for solving a problem π. Assume that p = 10. If f = 0.5 for this technique, then the maximum speedup that can be obtained is 1/(0.5 + 0.05) = 20/11, which is less than 2! If f = 0.1, then the maximum speedup is 1/0.19, which is slightly more than 5! Finally, if f = 0.01, then the maximum speedup is 1/0.109, which is slightly more than 9! □
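Amdahl's bound is trivial to evaluate; the tiny Python helper below (written for this discussion, not taken from the text) reproduces the three numbers of Example 13.7.

    def max_speedup(f, p):
        # Amdahl's law: f is the serial fraction, p the number of processors.
        return 1.0 / (f + (1.0 - f) / p)

    print(max_speedup(0.5, 10))    # about 1.82 -- less than 2
    print(max_speedup(0.1, 10))    # about 5.26 -- slightly more than 5
    print(max_speedup(0.01, 10))   # about 9.17 -- slightly more than 9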

EXERCISES

1. Algorithms A and B are parallel algorithms for solving the selection problem (Section 3.6). Algorithm A uses n^{0.5} processors and runs in time Θ(n^{0.5}). Algorithm B uses n processors and runs in Θ(log n) time. Compute the works done, speedups, and efficiencies of these two algorithms. Are these algorithms work-optimal?

2. Mr. Ultrasmart claims to have found an algorithm for selection that runs in time Θ(log n) using n^{3/4} processors. Is this possible?

3. Prove Amdahl's law.

13.2 COMPUTATIONAL MODEL

The sequential computational model we have employed so far is the RAM (random access machine). In the RAM model we assume that any of the following operations can be performed in one unit of time: addition, subtraction, multiplication, division, comparison, memory access, assignment, and so on. This model has been widely accepted as a valid sequential model. On the other hand when it comes to parallel computing, numerous models have been proposed and algorithms have been designed for each such model.

An important feature of parallel computing that is absent in sequential computing is the need for interprocessor communication. For example, given any problem, the processors have to communicate among themselves and agree on the subproblems each will work on. Also, they need to communicate to see whether every one has finished its task, and so on. Each machine


Figure 13.1 Examples of fixed connection machines: (a) mesh, (b) hypercube, (c) butterfly

or processor in a parallel computer can be assumed to be a RAM. Various parallel models differ in the way they support interprocessor communication. Parallel models can be broadly categorized into two: fixed connection machines and shared memory machines.

A fixed connection network is a graph G(V, E) whose nodes represent processors and whose edges represent communication links between processors. Usually we assume that the degree of each node is either a constant or a slowly increasing function of the number of nodes in the graph. Examples include the mesh, hypercube, butterfly, and so on (see Figure 13.1). Interprocessor communication is done through the communication links. Any two processors connected by an edge in G can communicate in one step. In general two processors can communicate through any of the paths connecting them. The communication time depends on the lengths of these paths (at least for small packets). More details of these models are provided in Chapters 14 and 15.

In shared memory models [also called PRAMs (Parallel Random Access Machines)], a number (say p) of processors work synchronously. They communicate with each other using a common block of global memory that is accessible by all. This global memory is also called common or shared memory (see Figure 13.2). Communication is performed by writing to and/or reading from the common memory. Any two processors i and j can communicate in two steps. In the first step, processor i writes its message into memory cell j, and in the second step, processor j reads from this cell. In contrast, in a fixed connection machine, the communication time depends on the lengths of the paths connecting the communicating processors.

Each processor in a PRAM is a RAM with some local memory. A single step of a PRAM algorithm can be one of the following: arithmetic operation (such as addition, division, and so on), comparison, memory access (local or global), assignment, etc. The number (m) of cells in the global memory is typically assumed to be the same as p. But this need not always be the case.


Figure 13.2 A parallel random access machine (processors 1, 2, 3, ..., p sharing a global memory of cells 1, 2, ..., m)

In fact we present algorithms for which m is much larger or smaller than p. We also assume that the input is given in the global memory and there is space for the output and for storing intermediate results. Since the global memory is accessible by all processors, access conflicts may arise. What happens if more than one processor tries to access the same global memory cell (for the purpose of reading from or writing into)? There are several ways of resolving read and write conflicts. Accordingly, several variants of the PRAM arise.

EREW (Exclusive Read and Exclusive Write) PRAM is the shared memory model in which no concurrent read or write is allowed on any cell of the global memory. Note that ER or EW does not preclude different processors simultaneously accessing different memory cells. For example, at a given time step, processor one might access cell five and at the same time processor two might access cell 12, and so on. But processors one and two cannot access memory cell ten, for example, at the same time. CREW (Concurrent Read and Exclusive Write) PRAM is a variation that permits concurrent reads but not concurrent writes. Similarly one could also define the ERCW model. Finally, the CRCW PRAM model allows both concurrent reads and concurrent writes.

In a CREW or CRCW PRAM, if more than one processor tries to read from the same cell, clearly, they will read the same information. But in a CRCW PRAM, if more than one processor tries to write in the same cell, then possibly they may have different messages to write. Thus there has to be an additional mechanism to determine which message gets to be written. Accordingly, several variants of the CRCW PRAM can be derived. In a common CRCW PRAM, concurrent writes are permitted in any cell only if all the processors conflicting for this cell have the same message to write. In an arbitrary CRCW PRAM, if there is a conflict for writing, one of the processors will succeed in writing and we don't know which one. Any algorithms designed for this model should work no matter which processors succeed in the event of conflicts. The priority CRCW PRAM lets the processor with


the highest priority succeed in the case of conflicts. Typically each processor is assigned a (static) priority to begin with.

Example 13.8 Consider a 4-processor machine and also consider an operation in which each processor has to read from the global cell M[1]. This operation can be denoted as

    Processor i (in parallel for 1 ≤ i ≤ 4) does:
        Read M[1];

This concurrent read operation can be performed in one unit of time on the CRCW as well as on the CREW PRAMs. But on the EREW PRAM, concurrent reads are prohibited. Still, we can perform this operation on the EREW PRAM making sure that at any given time no two processors attempt to read from the same memory cell. One way of performing this is as follows: processor 1 reads M[1] at the first time unit; processor 2 reads M[1] at the second time unit; and processors 3 and 4 read M[1] at the third and fourth time units, respectively. The total run time is four. Better algorithms for general cases are considered in later sections (see Section 13.6, Exercise 11).

Now consider the operation in which each processor has to access M[1] for writing at the same time. Since only one message can be written to M[1], one has to assume some scheme for resolving contentions. This operation can be denoted as

    Processor i (in parallel for 1 ≤ i ≤ 4) does:
        Write M[1];

Again, on the CRCW PRAM, this operation can be completed in one unit of time. On the CREW and EREW PRAMs, concurrent writes are prohibited. However, these models can simulate the effects of a concurrent write. Consider our simple example of four processors trying to write in M[1]. Simulating a common CRCW PRAM requires the four processors to verify that all wish to write the same value. Following this, processor 1 can do the writing. Simulating a priority CRCW PRAM requires the four processors to first determine which has the highest priority, and then the one with this priority does the write. Other models may be similarly simulated. Exercise 12 of Section 13.6 deals with more general concurrent writes. □

Note that any algorithm that runs on a p-processor EREW PRAM in time T(n, p), where n is the problem size, can also run on a p-processor CREW PRAM or a CRCW PRAM within the same time. But a CRCW PRAM algorithm or a CREW PRAM algorithm may not be implementable on an


Processor i (in parallel for 1 ≤ i ≤ n) does:
    if (A[i] = 1) then A[0] := A[i];

Algorithm 13.1 Computing the boolean OR in O(1) time

EREW PRAM preserving the asymptotic run time. In Example 13.8, we saw that the implementation of a single concurrent write or concurrent read step takes much more time on the EREW PRAM. Likewise, a p-processor CRCW PRAM algorithm may not be implementable on a p-processor CREW PRAM preserving the asymptotic run time. It turns out that there is a strict hierarchy among the variants of the PRAM in terms of their computational power. For example, a CREW PRAM is strictly more powerful than an EREW PRAM. This means that there is at least one problem that can be solved in asymptotically less time on a CREW PRAM than on an EREW PRAM, given the same number of processors. Also, any version of the CRCW PRAM is more powerful than a CREW PRAM as is demonstrated by Example 13.9.

Example 13.9 A[0] = A[1] || A[2] || ... || A[n] is the boolean (or logical) OR of the n bits A[1 : n]. A[0] is easily computed in O(n) time on a RAM. Algorithm 13.1 shows how A[0] can be computed in O(1) time using an n-processor CRCW PRAM.

Assume that A[0] is zero to begin with. In the first time step, processor i, for 1 ≤ i ≤ n, reads memory location A[i] and proceeds to write a 1 in memory location A[0] if A[i] is a 1. Since several of the A[i]'s may be 1, several processors may write to A[0] concurrently. Hence the algorithm cannot be run (as such) on an EREW or CREW PRAM. In fact, for these two models, it is known that the parallel complexity of the boolean OR problem is Ω(log n) no matter how many processors are used. Note that the algorithm of Algorithm 13.1 works on all three varieties of the CRCW PRAM. □

Theorem 13.1 The boolean OR of n bits can be computed in O(1) time on an n-processor common CRCW PRAM. □
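A sequential simulation of this single CRCW step is shown below (a sketch; the for loop merely stands in for the n processors acting at once, and since every writer stores the same value 1, the concurrent write is legal on the common CRCW PRAM).

    def crcw_boolean_or(A):
        # A[1..n] are the input bits; A[0] is the output cell, initially zero.
        A[0] = 0
        for i in range(1, len(A)):     # "processor i", conceptually in parallel
            if A[i] == 1:
                A[0] = A[i]
        return A[0]

    A = [0, 0, 1, 0, 1]                # A[0] is the result cell
    print(crcw_boolean_or(A))          # 1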

There exists a hierarchy among the different versions of the CRCW PRAM also. Common, arbitrary, and priority form an increasing hierarchy of computing power. Let EREW(p, T(n, p)) denote the set of all problems that can be solved using a p-processor EREW PRAM in time T(n, p) (n being the


problem size). Similarly define CREW(p, T(n, p)) and CRCW(p, T(n, p)). Then,

    EREW(p, T(n, p)) ⊆ CREW(p, T(n, p)) ⊆ Common CRCW(p, T(n, p)) ⊆ Arbitrary CRCW(p, T(n, p)) ⊆ Priority CRCW(p, T(n, p))

All the algorithms developed in this chapter for the PRAM model assume some relationship between the problem size n and the number of processors p. For example, the CRCW PRAM algorithm of Algorithm 13.1 solves a problem of size n using n processors. In practice, however, a problem of size n is solved on a computer with a constant number p of processors. All the algorithms designed under some assumptions about the relationship between n and p can also be used when fewer processors are available as there is a general slow-down lemma for the PRAM model.

Let A be a parallel algorithm for solving problem π that runs in time T using p processors. The slow-down lemma concerns the simulation of the same algorithm on a p'-processor machine (for p' ≤ p).

Each step of algorithm A can be simulated on the p'-processor machine (call it M) in time ≤ ⌈p/p'⌉, since a processor of M can be in charge of simulating ⌈p/p'⌉ processors of the original machine. Thus, the simulation time on M is ≤ T⌈p/p'⌉. Therefore, the total work done on M is ≤ p'T⌈p/p'⌉ ≤ pT + p'T = O(pT). This results in the following lemma.

Lemma 13.2 [Slow-down lemma] Any parallel algorithm that runs on a p-processor machine in time T can be run on a p'-processor machine in time O(pT/p'), for any p' ≤ p. □

Since no such slow-down lemma is known for the models of Chapters 14 and 15, we need to develop different algorithms when the number of processors changes relative to the problem size. So, in Chapters 14 and 15 we develop algorithms under different assumptions about the relationship between n and p.

Example 13.10 Algorithm 13.1 runs in O(1) time using n processors. Using the slow-down lemma, the same algorithm also runs in O(log n) time using n/log n processors; it also runs in O(√n) time using √n processors; and so on. When p = 1, the algorithm runs in time O(n), which is the same as the run time of the best sequential algorithm! □

Note: In Chapters one through nine we presented various algorithm design techniques and demonstrated how they can be applied to solve several specific problems. In the domain of parallel algorithms also some common


ideas have been repeatedly employed to design algorithms over a wide variety of models. In Chapters 13, 14, and 15 we consider the PRAM, mesh, and hypercube models, respectively. In particular, we study the following problems: prefix computation, list ranking, selection, merging, sorting, some basic graph problems, and convex hull. For each of these problems a common theme is used to solve it on the three different models. In Chapter 13 we present full details of these common themes. In Chapters 14 and 15 we only point out the differences in implementation.

EXERCISES

1. Present an O(1) time n-processor common CRCW PRAM algorithm for computing the boolean AND of n bits.

2. Input is an array of n elements. Give an O(1) time, n-processor common CRCW PRAM algorithm to check whether the array is in sorted order.

3. Solve the boolean OR and AND problems on the CREW and EREW PRAMs. What are the time and processor bounds of your algorithms?

4. The array A is an array of n keys, where each key is an integer in the range [1, n]. The problem is to decide whether there are any repeated elements in A. Show how you do this in O(1) time on an n-processor CRCW PRAM. Which version of the CRCW PRAM are you using?

5. Can Exercise 4 be solved in O(1) time using n processors on any of the PRAMs if the keys are arbitrary? How about if there are n² processors?

6. The string matching problem takes as input a text t and a pattern p, where t and p are strings from an alphabet Σ. The problem is to determine all the occurrences of p in t. Present an O(1) time PRAM algorithm for string matching. Which PRAM are you using and what is the processor bound of your algorithm?

7. The algorithm A is a parallel algorithm that has two components. The first component runs in Θ(log log n) time using n/log log n EREW PRAM processors. The second component runs in Θ(log n) time using n/log n CREW PRAM processors. Show that the whole algorithm can be run in Θ(log n) time using n/log n CREW PRAM processors.


13.3 FUNDAMENTAL TECHNIQUES AND ALGORITHMS

In this section we introduce two basic problems that arise in the parallel solution of numerous problems. The first problem is known as the prefix computation problem and the second one is called the list ranking problem.

13.3.1 Prefix Computation

Let Σ be any domain in which the binary associative operator ⊕ is defined. An operator ⊕ is said to be associative if for any three elements x, y, and z from Σ, ((x ⊕ y) ⊕ z) = (x ⊕ (y ⊕ z)); that is, the order in which the operation ⊕ is performed does not matter. It is also assumed that ⊕ is unit-time computable and that Σ is closed under this operation; that is, for any x, y ∈ Σ, x ⊕ y ∈ Σ. The prefix computation problem on Σ has as input n elements from Σ, say, x_1, x_2, ..., x_n. The problem is to compute the n elements x_1, x_1 ⊕ x_2, ..., x_1 ⊕ x_2 ⊕ x_3 ⊕ ... ⊕ x_n. The output elements are often referred to as the prefixes.

Example 13.11 Let Σ be the set of integers and ⊕ be the usual addition operation. If the input to the prefix computation problem is 3, −5, 8, 2, 5, 4, the output is 3, −2, 6, 8, 13, 17. As another example, let Σ be the set of integers and ⊕ be the multiplication operation. If 2, 3, 1, −2, −4 is the input, the output is 2, 6, 6, −12, 48. □

Example 13.12 Let Σ be the set of all integers and ⊕ be the minimum operator. Note that the minimum operator is associative. If the input to the prefix computation problem is 5, 8, −2, 7, −11, 12, the output is 5, 5, −2, −2, −11, −11. In particular, the last element output is the minimum among all the input elements. □
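Sequentially, the prefixes are produced by one left-to-right scan; in Python this is exactly itertools.accumulate (shown here only to pin down the definition with the two examples above).

    from itertools import accumulate
    from operator import add

    print(list(accumulate([3, -5, 8, 2, 5, 4], add)))      # [3, -2, 6, 8, 13, 17]
    print(list(accumulate([5, 8, -2, 7, -11, 12], min)))   # [5, 5, -2, -2, -11, -11]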

The prefix computation problem can be solved in O(n) time sequentially. Any sequential algorithm for this problem needs Ω(n) time. Fortunately, work-optimal algorithms are known for the prefix computation problem on many models of parallel computing. We present a CREW PRAM algorithm that uses n/log n processors and runs in O(log n) time. Note that the work done by such an algorithm is Θ(n) and hence the algorithm has an efficiency of Θ(1) and is work-optimal. Also, the speedup of this algorithm is Θ(n/log n).

We employ the divide-and-conquer strategy to devise the prefix algorithm. Let the input be x_1, x_2, ..., x_n. Without loss of generality assume that n is an integral power of 2. We first present an n-processor and O(log n) time algorithm (Algorithm 13.2).


Step 0. If n = 1, one processor outputs x_1.

Step 1. Let the first n/2 processors recursively compute the prefixes of x_1, x_2, ..., x_{n/2} and let y_1, y_2, ..., y_{n/2} be the result. At the same time let the rest of the processors recursively compute the prefixes of x_{n/2+1}, x_{n/2+2}, ..., x_n and let y_{n/2+1}, y_{n/2+2}, ..., y_n be the output.

Step 2. Note that the first half of the final answer is the same as y_1, y_2, ..., y_{n/2}. The second half of the final answer is y_{n/2} ⊕ y_{n/2+1}, y_{n/2} ⊕ y_{n/2+2}, ..., y_{n/2} ⊕ y_n. Let the second half of the processors read y_{n/2} concurrently from the global memory and update their answers. This step takes O(1) time.

Algorithm 13.2 Prefix computation in O(log n) time

Example 13.13 Let n = 8 and p = 8. Let the input to the prefix computation problem be 12, 3, 6, 8, 11, 4, 5, 7 and let ⊕ be addition. In step 1, processors 1 to 4 compute the prefix sums of 12, 3, 6, 8 to arrive at 12, 15, 21, 29. At the same time processors 5 to 8 compute the prefix sums of 11, 4, 5, 7 to obtain 11, 15, 20, 27. In step 2, processors 1 to 4 don't do anything. Processors 5 to 8 update their results by adding 29 to every prefix sum and get 40, 44, 49, 56. □
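The divide-and-conquer structure of Algorithm 13.2 is easy to simulate sequentially, as in the sketch below (the real algorithm executes the two recursive calls and the final updates in parallel, which is what yields the O(log n) depth; the function name and the use of an arbitrary binary operator op are our choices).

    from operator import add

    def prefix(x, op=add):
        # Simulates Algorithm 13.2: solve both halves ("in parallel"), then apply
        # the last left-half prefix to every right-half prefix.
        n = len(x)
        if n == 1:
            return [x[0]]
        left = prefix(x[:n // 2], op)
        right = prefix(x[n // 2:], op)
        return left + [op(left[-1], r) for r in right]

    print(prefix([12, 3, 6, 8, 11, 4, 5, 7]))   # [12, 15, 21, 29, 40, 44, 49, 56]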

What is the time complexity of Algorithm 13.2? Let T(n) be the run time of Algorithm 13.2 on any input of size n using n processors. Step 1 takes T(n/2) time and step 2 takes O(1) time. So, we get the following recurrence relation for T(n):

    T(n) = T(n/2) + O(1),   T(1) = 1

This solves to T(n) = O(log n). Note that in defining the run time of a parallel divide-and-conquer algorithm, it is essential to quantify it with the number of processors used.

Algorithm 13.2 is not work-optimal for the prefix computation problem since the total work done by this algorithm is Θ(n log n), whereas the run time of the best-known sequential algorithm is O(n). A work-optimal algorithm can be obtained by decreasing the number of processors used to n/log n,


Step 1. Processor i (i = 1, 2, ..., n/log n) in parallel computes the prefixes of its log n assigned elements x_{(i−1)log n+1}, x_{(i−1)log n+2}, ..., x_{i log n}. This takes O(log n) time. Let the results be z_{(i−1)log n+1}, z_{(i−1)log n+2}, ..., z_{i log n}.

Step 2. A total of n/log n processors collectively employ Algorithm 13.2 to compute the prefixes of the n/log n elements z_{log n}, z_{2 log n}, z_{3 log n}, ..., z_n. Let w_{log n}, w_{2 log n}, w_{3 log n}, ..., w_n be the result.

Step 3. Each processor updates the prefixes it computed in step 1 as follows. Processor i computes and outputs w_{(i−1)log n} ⊕ z_{(i−1)log n+1}, w_{(i−1)log n} ⊕ z_{(i−1)log n+2}, ..., w_{(i−1)log n} ⊕ z_{i log n}, for i = 2, 3, ..., n/log n. Processor 1 outputs z_1, z_2, ..., z_{log n} without any modifications.

Algorithm 13.3 Work-optimal logarithmic time prefix computation

while keeping the asymptotic run time the same. The number of processors used can be decreased to n/log n as follows. We first reduce the number of inputs to n/log n, apply the non-work-optimal Algorithm 13.2 to compute the prefixes of the reduced input, and then finally compute all the n prefixes. Every processor will be in charge of computing log n final answers. If the input is x_1, x_2, ..., x_n and the output is y_1, y_2, ..., y_n, let processor i be in charge of the outputs y_{(i−1)log n+1}, y_{(i−1)log n+2}, ..., y_{i log n}, for i = 1, 2, ..., n/log n. The detailed algorithm appears as Algorithm 13.3. The correctness of the algorithm is clear. Step 1 takes O(log n) time. Step 2 takes O(log(n/log n)) = O(log n) time (using Algorithm 13.2). Finally, step 3 also takes O(log n) time. Thus we get the following theorem.

Theorem 13.2 Prefix computation on an n-element input can be performed in O(log n) time using n/log n CREW PRAM processors. □

Example 13.14 Let the input to the prefix computation be 5, 12, 8, 6, 3, 9, 11, 12, 1, 5, 6, 7, 10, 4, 3, 5 and let ⊕ stand for addition. Here n = 16 and log n = 4. Thus in step 1, each of the four processors computes prefix sums on four numbers each. In step 2, prefix sums on the local sums is


computed, and in step 3, the locally computed results are updated. Figure 13.3 illustrates these three steps. □

                   processor 1    processor 2    processor 3    processor 4
input:             5,12,8,6       3,9,11,12      1,5,6,7        10,4,3,5

step 1 (local to processors):
                   5,17,25,31     3,12,23,35     1,6,12,19      10,14,17,22
(local sums):      31             35             19             22

step 2 (global computation):
                   31             66             85             107

step 3 (update):
                   5,17,25,31     34,43,54,66    67,72,78,85    95,99,102,107

Figure 13.3 Prefix computation - an example
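The three steps of Algorithm 13.3 can likewise be simulated sequentially (a sketch; each loop below corresponds to work the n/log n processors would perform in parallel, and the block size is passed in as a parameter b instead of being fixed at log n).

    from operator import add

    def blocked_prefix(x, b, op=add):
        # Step 1: each "processor" computes the prefixes of its block of b elements.
        blocks = [x[i:i + b] for i in range(0, len(x), b)]
        local = []
        for blk in blocks:
            z = [blk[0]]
            for v in blk[1:]:
                z.append(op(z[-1], v))
            local.append(z)
        # Step 2: prefixes of the block sums (last element of each local prefix list).
        w = [local[0][-1]]
        for z in local[1:]:
            w.append(op(w[-1], z[-1]))
        # Step 3: processor i (i >= 2) applies w[i-1] to each of its local prefixes.
        out = list(local[0])
        for i in range(1, len(local)):
            out += [op(w[i - 1], v) for v in local[i]]
        return out

    x = [5, 12, 8, 6, 3, 9, 11, 12, 1, 5, 6, 7, 10, 4, 3, 5]
    print(blocked_prefix(x, 4))
    # [5, 17, 25, 31, 34, 43, 54, 66, 67, 72, 78, 85, 95, 99, 102, 107]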

13.3.2 List Ranking

List ranking plays a vital role in the parallel solution of several graph problems. The input to the problem is a list given in the form of an array of nodes. A node consists of some data and a pointer to its right neighbor in the list. The nodes themselves need not occur in any order in the input. The problem is to compute for each node in the list the number of nodes to its right (also called the rank of the node). Since the data contained in


    A[1]  A[2]  A[3]  A[4]  A[5]  A[6]
     5     4     2     0     3     1

Figure 13.4 Input to the list ranking problem and the corresponding list

any node is irrelevant to the list ranking problem, we assume that each node contains only a pointer to its right neighbor. The rightmost node's pointer field is zero.

Example 13.15 Consider the input A[1 : 6] of Figure 13.4. The right neighbor of node A[1] is A[5]. The right neighbor of node A[2] is A[4]. And so on. Node A[4] is the rightmost node; hence its rank is zero. Node A[2] has rank 1 since the only node to its right is A[4]. Node A[5] has rank 3 since the nodes A[3], A[2], and A[4] are to its right. In this example, the left-to-right order of the list nodes is given by A[6], A[1], A[5], A[3], A[2], A[4]. □

List ranking can be done sequentially in linear time. First, the list head is determined by examining A[1 : n] to identify the unique i, 1 ≤ i ≤ n, such that A[j] ≠ i, 1 ≤ j ≤ n. Node A[i] is the head. Next, a left-to-right scan of the list is made and nodes are assigned the ranks n − 1, n − 2, ..., 0 in this order. In this section, we develop two parallel algorithms for list ranking. The first is an n-processor O(log n) time EREW PRAM algorithm and the second is an (n/log n)-processor O(log n) time EREW PRAM algorithm. The speedups of both the algorithms are Θ(n/log n). The efficiency of the first algorithm is Θ(n)/Θ(n log n) = Θ(1/log n), whereas the efficiency of the second algorithm is Θ(n)/Θ(n) = Θ(1). Thus the second algorithm is work-optimal but the first algorithm is not.
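The sequential method just described is only a few lines of Python (a sketch using 1-based node indices, with Neighbor[i] = 0 marking the rightmost node as in Figure 13.4; neighbor[0] is an unused placeholder).

    def sequential_list_rank(neighbor):
        # neighbor[1..n]: index of the right neighbor of node i (0 for the last node).
        n = len(neighbor) - 1
        pointed_to = set(neighbor[1:])
        head = next(i for i in range(1, n + 1) if i not in pointed_to)
        rank = [0] * (n + 1)
        i, r = head, n - 1
        while i != 0:                  # left-to-right walk of the list
            rank[i] = r
            r -= 1
            i = neighbor[i]
        return rank[1:]

    print(sequential_list_rank([0, 5, 4, 2, 0, 3, 1]))   # [4, 1, 2, 0, 3, 5]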

Deterministic list ranking

One of the crucial ideas behind these parallel algorithms is pointer jumping. To begin with, each node in the list points to its immediate right neighbor (see Figure 13.5(a)). In one step of pointer jumping, the right neighbor of


every node is modified to be the right neighbor of its right neighbor (see Figure 13.5(b)). Note that if we have n processors (one processor per node), this can be done in O(1) time. Now every node points to a node that was originally a distance of 2 away. In the next step of pointer jumping every node will point to a node that was originally a distance of 4 away. And so on. (See Figure 13.5(c) and (d).) Since the length of the list is n, within ⌈log n⌉ pointer jumping steps, every node will point to the end of the list.

Figure 13.5 Pointer jumping applied to list ranking

In each step of pointer jumping a node also collects information as to how many nodes are between itself and the node it newly points to. This information is easy to accumulate as follows. To start with, set the rank field of each node to 1 except for the rightmost node whose rank field is zero. Let Rank[i] and Neighbor[i] stand for the rank field and the right neighbor of node i. At any step of pointer jumping, Rank[i] is modified to Rank[i] + Rank[Neighbor[i]] in parallel for all nodes other than those with Neighbor[i] = 0. This is followed by making i point to Neighbor[Neighbor[i]]. The complete algorithm is given in Algorithm 13.4. Processor i is associated with the node A[i], 1 ≤ i ≤ n.

Example 13.16 For the input of Figure 13.4, Figure 13.6 walks through the steps of Algorithm 13.4. To begin with, every node has a rank of one except for node 4. When q = 1, for example, node 1's Rank field is changed to two, since its right neighbor (i.e., node 5) has a rank of one. Also, node 1's Neighbor field is changed to the neighbor of node 5, which is node 3. And so on. □


for q := 1 to ⌈log n⌉ do
    Processor i (in parallel for 1 ≤ i ≤ n) does:
        if (Neighbor[i] ≠ 0) then
        {
            Rank[i] := Rank[i] + Rank[Neighbor[i]];
            Neighbor[i] := Neighbor[Neighbor[i]];
        }

Algorithm 13.4 An O(n log n) work list ranking algorithm

    Neighbor                 Rank
    5  4  2  0  3  1         1  1  1  0  1  1     to begin with
    3  0  4  0  2  5         2  1  2  0  2  2     q = 1
    4  0  0  0  0  2         4  1  2  0  3  4     q = 2
    0  0  0  0  0  0         4  1  2  0  3  5     q = 3

Figure 13.6 Algorithm 13.4 working on the input of Figure 13.4
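A sequential simulation of Algorithm 13.4 follows (a sketch; old copies of Rank and Neighbor are taken at the start of each iteration so that all "processors" read the values from before the step, which is how the synchronous EREW steps behave).

    from math import ceil, log2

    def pointer_jump_rank(neighbor):
        # neighbor[1..n]: right neighbor of node i (0 for the rightmost node);
        # neighbor[0] is an unused placeholder.
        n = len(neighbor) - 1
        nb = list(neighbor)
        rank = [1] * (n + 1)
        for i in range(1, n + 1):
            if nb[i] == 0:
                rank[i] = 0
        for _ in range(ceil(log2(n))):
            old_nb, old_rank = list(nb), list(rank)      # synchronous snapshot
            for i in range(1, n + 1):                    # "in parallel"
                if old_nb[i] != 0:
                    rank[i] = old_rank[i] + old_rank[old_nb[i]]
                    nb[i] = old_nb[old_nb[i]]
        return rank[1:]

    print(pointer_jump_rank([0, 5, 4, 2, 0, 3, 1]))      # [4, 1, 2, 0, 3, 5]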

Definition 13.3 Let A be an array of nodes corresponding to a list. Also let node i have a real weight w_i associated with it and let ⊕ be any associative binary operation defined on the weights. The list prefix computation is the problem of computing, for each node in the list, w_i ⊕ w_{i_1} ⊕ w_{i_2} ⊕ ... ⊕ w_{i_k}, where i_1, i_2, ..., i_k are the nodes to the right of i. □

Note that the list ranking problem corresponds to a list prefix sums computation, where each node has a weight of 1 except for the rightmost node


whose weight is zero. Algorithm 13.4 can be easily modified to compute list prefixes without any change in the processor and time bounds.

Randomized list ranking

Next we present a work-optimal randomized algorithm for list ranking. Each processor is in charge of computing the rank of log n nodes in the input. Processor i is assigned the nodes A[(i − 1) log n + 1], A[(i − 1) log n + 2], ..., A[i log n]. The algorithm runs in stages. In any stage, a fraction of the existing nodes is selected and eliminated (or spliced out). When a node i is spliced out, the relevant information about this node is stored so that in the future its correct rank can be determined. When the number of remaining nodes is two, the list ranking problem is solved trivially. From the next stage on, spliced-out nodes get inserted back (i.e., spliced in). When a node is spliced in, its correct rank will also be determined. Nodes are spliced in in the reverse order in which they were spliced out. The splicing-out process is depicted in Algorithm 13.5.

Node insertion is also done in stages. When a node x is spliced in, its correct rank can be determined as follows: If LNeighbor[x] was the pointer stored when it was spliced out, the rank of x is the current rank of LNeighbor[x] minus the rank that was stored when x was spliced out. Pointers are also adjusted to take into account the fact that now x has been inserted (see Figure 13.7).

We show that the total number of stages of splicing-out is O(log n). If a node gets spliced out in stage q, then it'll be spliced in in stage 2s − q + 1. So the overall structure of the algorithm is as follows: In stages 1, 2, ..., s, nodes are successively spliced out. Stage s is such that there are only two nodes left and one of them is spliced out. In stage s + 1, the node that was spliced out in stage s is spliced in. In stage s + 2, the nodes that were spliced out in stage s − 1 are spliced in. And so on. Following the last stage, we know the ranks of all the nodes in the original list.

The nodes spliced out in any stage are such that (1) from among the nodes associated with any processor, at most one node is selected and (2) no two adjacent nodes of the list are selected. Since in any stage, processor q considers only one node, at most one of its nodes is spliced out. Also realize that no two adjacent nodes from the list are spliced out in any stage. This is because a processor with a head proceeds to splice the chosen node only if the right neighbor's processor does not have a head. Therefore, the time spent by any processor in a given stage is only O(1).

To compute the total run time of the algorithm, we only have to compute the value of s, the number of stages. This can be done if we can estimate the number of nodes that will be spliced out in any stage. If q is any processor, in a given stage its chosen node x is spliced out with probability at least 1/4. The reasons are (1) the probability for q to come up with a head is 1/2 and (2)


Step 1. Doubly link the list. With n processors, this can be done in O(1) time as follows. Processor i is associated with node A[i] (for 1 ≤ i ≤ n). In one step, processor i writes i in memory cell Neighbor[i] so that in the next step the processor associated with the node A[Neighbor[i]] will know its left neighbor. Using the slow-down lemma, this can also be done in O(log n) time using n/log n processors. Let LNeighbor[i] and RNeighbor[i] stand for the left and right neighbors of node A[i]. To begin with, the rank field of each node is as shown in Figure 13.5(a).

Step 2.
while (the number of remaining nodes is > 2) do
{
    Step a. Processor q (1 ≤ q ≤ n/log n) considers the next unspliced node (call it x) associated with it. It flips a two-sided coin. If the outcome is a tail, the processor becomes idle for the rest of the stage. In the next stage it again attempts to splice out x. On the other hand, if the coin flip results in a head, it checks whether the right neighbor of x is being considered by the corresponding processor. If the right neighbor of x is being considered and the coin flip of that processor is also a head, processor q gives up and is idle for the rest of the stage. If not, q decides to splice out x.

    Step b. When node x is spliced out, processor q stores in node x the stage number, the pointer LNeighbor[x], and Rank[LNeighbor[x]]. Rank[LNeighbor[x]] at this time is the number of nodes between LNeighbor[x] and x. Processor q also sets Rank[LNeighbor[x]] := Rank[LNeighbor[x]] + Rank[x];. Finally it sets RNeighbor[LNeighbor[x]] := RNeighbor[x]; and LNeighbor[RNeighbor[x]] := LNeighbor[x];.
}

Algorithm 13.5 Splicing out nodes


Figure 13.7 Splicing in and splicing out nodes. Only right links of nodes are shown. (The figure shows the sequence: splice out, recursively find ranks, splice in; * denotes nodes spliced out.)


the probability that the right neighbor of x (call it y) either has not been chosen or, if chosen, y's processor has a tail is ≥ 1/2. Since events 1 and 2 are independent, the claim follows.

Every processor begins the algorithm with log n nodes, and in every stage it has a probability of ≥ 1/4 of splicing out a node. Thus it follows that the expected value of s is ≤ 4 log n. We can also use Chernoff bounds (Equation 1.2 with parameters 12α log n and 1/4 with ε = 1/2) to show that the value of s is ≤ 12α log n with probability ≥ (1 − n^{−α}) for any α ≥ 1. As a result we get the following theorem.

Theorem 13.3 List ranking on a list of length n can be performed in O(log n) time using n/log n EREW PRAM processors.

EXERCISES

1. There is some data in cell M_1 of the global memory. The goal is to make copies of this data in cells M_2, M_3, ..., M_n. Show how you accomplish this in O(log n) time using n EREW PRAM processors.

2. Present an O(log n) time, (n/log n)-processor EREW PRAM algorithm for the problem of Exercise 1.

3. Show that Theorem 13.2 holds for the EREW PRAM also.

4. Let f(x) = a_n x^n + a_{n−1} x^{n−1} + ... + a_1 x + a_0. Present an O(log n) time, (n/log n)-processor CREW PRAM algorithm to evaluate the polynomial f at a given point y.

5. The segmented prefix problem is defined as follows: Array A has n elements from some domain Σ. Array B[1 : n] is a Boolean array with B[1] = 1. Define a segment of A to be A[i : j], where B[i] = 1, B[j] = 1, and B[k] = 0, i < k < j. As a convention assume that B[n + 1] = 1. The problem is to perform several independent prefix computations on A, one for each segment. Show how to solve this problem in O(log n) time on an (n/log n)-processor CREW PRAM.

6. If k_1, k_2, ..., k_n are from Σ and ⊕ is a binary associative operator on Σ, the suffix computation problem is to output k_n, k_{n−1} ⊕ k_n, ..., k_1 ⊕ k_2 ⊕ ... ⊕ k_n. Show how you'll solve this problem in O(log n) time on an (n/log n)-processor CREW PRAM as well as EREW PRAM.

7. The inputs are an array A of n elements and an element x. The goal is to rearrange the elements of A such that all the elements of A that are less than or equal to x appear first (in successive cells) followed by


the rest of the elements. Give an O(log n) time, (n/log n)-processor CREW PRAM algorithm for this problem.

8. Array A is an array of n elements, where each element has a label of zero or one. The problem is to rearrange A so that all the elements with a zero label appear first, followed by all the others. Show how to perform this rearrangement in O(log n) time using n/log n CREW PRAM processors.

9. Let A be an array of n keys. The rank of a key x in A is defined to be one plus the number of elements in A that are less than x. Given A and an x, show how you compute the rank of x using n/log n CREW PRAM processors. Your algorithm should run in O(log n) time.

10. Show how you modify Algorithm 13.4 and the randomized list ranking algorithm so they solve the list prefix computation problem.

11. Array A is an array of nodes representing a list. Each node has a label of either zero or one. You are supposed to split the list in two, the first list containing all the elements with a zero label and the second list consisting of all the rest of the nodes. The original order of nodes should be preserved. For example, if x is a node with a zero label and the next right node with a zero label is z, then x should have z as its right neighbor in the first list created. Present an O(log n) time algorithm for this problem. You can use up to n/log n CREW PRAM processors.

12. Present an O(log n) time, (n²/log n)-processor CREW PRAM algorithm to multiply an n × n matrix with an n × 1 column vector. How will you solve the same problem on an EREW PRAM? You can use O(n²) global memory. (Hint: See Exercise 1.)

13. Show how to multiply two n × n matrices using n³/log n CREW PRAM processors and O(log n) time.

14. Strassen's algorithm for matrix multiplication was introduced in Section 3.7. Using the same technique, design a divide-and-conquer algorithm for matrix multiplication that uses only n^{log₂ 7} CREW PRAM processors and runs in O(log n) time.

15. Prove that two n × n boolean matrices can be multiplied in O(1) time on any of the CRCW PRAMs. What is the processor bound of your algorithm?

16. In Section 10.3, a divide-and-conquer algorithm was presented for inverting a triangular matrix. Parallelize this algorithm on the CREW


PRAM to get a run time of O(log² n). What is the processor bound of your algorithm?

17. A tridiagonal matrix has nonzero elements only in the diagonal and its two neighboring diagonals (one below and one above). Present an O(log n) time, (n/log n)-processor CREW PRAM algorithm to solve the system of linear equations Ax = b, where A is an n × n tridiagonal matrix and x (unknown) and b are n × 1 column vectors.

18. An optimal algorithm for FFT was given in Section 9.3. Present a work-optimal parallelization of this algorithm on the CREW PRAM. Your algorithm should run in O(log n) time.

19. Let X and Y be two sorted arrays with n elements each. Show how you merge these in O(1) time on the common CRCW PRAM. How many processors do you use?

13.4 SELECTION

The problem of selection was introduced in Section 3.6. Recall that this problem takes as input a sequence of n keys and an integer i, 1 ≤ i ≤ n, and outputs the ith smallest key from the sequence. Several algorithms were presented for this problem in Section 3.6. One of these algorithms (Algorithm 3.19) has a worst-case run time of O(n) and hence is optimal. In this section we study the parallel complexity of selection. We start by presenting algorithms for many special cases. Finally, we give an O(log n) time, (n/log n)-processor common CRCW PRAM algorithm. Since the work done in this algorithm is Θ(n), it is work-optimal.

13.4.1 Maximal Selection with n² Processors

Here we consider the problem of selection for i = n; that is, we are interested in finding the maximum of n given numbers. This can be done in O(1) time using an n²-processor CRCW PRAM.

Let k_1, k_2, ..., k_n be the input. The idea is to perform all pairs of comparisons in one step using n² processors. If we name the processors p_{ij} (for 1 ≤ i, j ≤ n), processor p_{ij} computes x_{ij} = (k_i < k_j). Without loss of generality assume that all the keys are distinct. Even if they are not, they can be made distinct by replacing key k_i with the tuple (k_i, i) (for 1 ≤ i ≤ n); this amounts to appending each key with only a (log n)-bit number. Of all the input keys, there is only one key k which, when compared with every other key, would have yielded the same bit zero. This key can be identified using the boolean OR algorithm (Algorithm 13.1) and is the maximum of all. The resultant algorithm appears as Algorithm 13.6.


Step 0. If n = 1, output the key.

Step 1. Processor p_ij (for each 1 <= i, j <= n in parallel) computes x_ij := (k_i < k_j).

Step 2. The n^2 processors are grouped into n groups G_1, G_2, ..., G_n, where G_i (1 <= i <= n) consists of the processors p_i1, p_i2, ..., p_in. Each group G_i computes the boolean OR of x_i1, x_i2, ..., x_in.

Step 3. If G_i computes a zero in step 2, then processor p_i1 outputs k_i as the answer.

Algorithm 13.6 Finding the maximum in O(1) time

Steps 1 and 3 of this algorithm take one unit of time each. Step 2 takes O(1) time (see Theorem 13.1). Thus the whole algorithm runs in O(1) time; this implies the following theorem.

Theorem 13.4 The maximum of n keys can be computed in O(1) time using n^2 common CRCW PRAM processors. □

Note that the speedup of Algorithm 13.6 is Θ(n)/O(1) = Θ(n). Total work done by this algorithm is Θ(n^2). Hence its efficiency is Θ(n)/Θ(n^2) = Θ(1/n). Clearly, this algorithm is not work-optimal!
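The comparison-matrix idea is easy to simulate sequentially. The following Python sketch (ours, not from the text; the name max_by_all_pairs is invented for illustration) serializes what the n^2 processors of Algorithm 13.6 do in one parallel step.

    def max_by_all_pairs(keys):
        # Simulates Algorithm 13.6: processor p_ij would set x_ij = (k_i < k_j);
        # group G_i then computes the boolean OR of row i.  The key whose row
        # ORs to zero is the maximum (on the PRAM all of this takes O(1) time).
        n = len(keys)
        if n == 1:
            return keys[0]
        for i in range(n):
            if not any(keys[i] < keys[j] for j in range(n)):
                return keys[i]

    # Example: max_by_all_pairs([13, 2, 21, 8]) returns 21.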

13.4.2 Finding the Maximum Using n Processors

Now we show that maximal selection can be done in O(log log n) time using n common CRCW PRAM processors. The technique to be employed is divide-and-conquer. To simplify the discussion, we assume n is a perfect square (when n is not a perfect square, replace √n by ⌈√n⌉ in the following discussion).

Let the input sequence be k_1, k_2, ..., k_n. We are interested in developing an algorithm that can find the maximum of n keys using n processors. Let T(n) be the run time of this algorithm. We partition the input into √n parts, where each part consists of √n keys. Allocate √n processors to each part so that the maximum of each part can be computed in parallel. Since the recursive maximal selection of each part involves √n keys and an equal number of processors, this can be done in T(√n) time. Let M_1, M_2, ..., M_√n be the group maxima. The answer we are supposed to output is the maximum of these maxima. Since now we only have √n keys, we can find the maximum of these employing all the n processors (see Algorithm 13.7).

Step 0. If n = 1, return k_1.

Step 1. Partition the input keys into √n parts K_1, K_2, ..., K_√n, where K_i consists of k_((i-1)√n+1), k_((i-1)√n+2), ..., k_(i√n). Similarly partition the processors so that P_i (1 <= i <= √n) consists of the processors p_((i-1)√n+1), p_((i-1)√n+2), ..., p_(i√n). Let P_i find the maximum of K_i recursively (for 1 <= i <= √n).

Step 2. If M_1, M_2, ..., M_√n are the group maxima, find and output the maximum of these maxima employing Algorithm 13.6.

Algorithm 13.7 Maximal selection in O(log log n) time

Step 1 of this algorithm takes T(√n) time and step 2 takes O(1) time (c.f. Theorem 13.4). Thus T(n) satisfies the recurrence

T(n) = T(√n) + O(1)

which solves to T(n) = O(log log n). Therefore the following theorem arises.

Theorem 13.5 The maximum of n keys can be found in O(log log n) time using n common CRCW PRAM processors. □

Total work done by Algorithm 13.7 is Θ(n log log n) and its efficiency is Θ(n)/Θ(n log log n) = Θ(1/log log n). Thus this algorithm is not work-optimal.
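A short sequential Python rendering of this recursion (again ours, with the hypothetical name max_doubly_log) may help make the T(n) = T(√n) + O(1) structure concrete; Python's built-in max stands in for the O(1)-time Algorithm 13.6 step.

    import math

    def max_doubly_log(keys):
        # Simulates Algorithm 13.7: split the keys into groups of about sqrt(n),
        # find each group's maximum recursively (done in parallel on the PRAM),
        # then take the maximum of the roughly sqrt(n) group maxima, which
        # Algorithm 13.6 does in O(1) time with n processors.
        n = len(keys)
        if n <= 2:
            return max(keys)
        size = math.isqrt(n)
        groups = [keys[i:i + size] for i in range(0, n, size)]
        maxima = [max_doubly_log(g) for g in groups]
        return max(maxima)   # stands in for the O(1)-time Algorithm 13.6 step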

13.4.3 Maximal Selection Among Integers

Consider again the problem of finding the maximum of n given keys. If each one of these keys is a bit, then the problem of finding the maximum reduces to computing the boolean OR of n bits and hence can be done in O(1) time using n common CRCW PRAM processors (see Algorithm 13.1). This raises the following question: What can be the maximum magnitude of each key if we desire a constant time algorithm for maximal selection using n processors? Answering this question in its full generality is beyond the scope of this book. Instead we show that if each key is an integer in the range [0, n^c], where c is a constant, maximal selection can be done work-optimally in O(1) time. Speedup of this algorithm is Θ(n) and its efficiency is Θ(1).

Since each key is of magnitude at most n^c, it follows that each key is a binary number with <= c log n bits. Without loss of generality assume that every key is of length exactly equal to c log n. (We can add leading zero bits to numbers with fewer bits.) Suppose we find the maximum of the n keys only with respect to their (log n)/2 most significant bits (MSBs) (see Figure 13.8). Let M be the maximum value. Then, any key whose (log n)/2 MSBs do not equal M can be dropped from future consideration since it cannot possibly be the maximum. After this step, many keys can potentially survive. Next we compute the maximum of the remaining keys with respect to their next (log n)/2 MSBs and drop keys that cannot possibly be the maximum. We repeat this basic step 2c times (once for every (log n)/2 bits in the input keys). One of the keys that survives the very last step can be output as the maximum. Refer to the (log n)/2 MSBs of any key as its first part, the next most significant (log n)/2 bits as its second part, and so on. There are 2c parts for each key. The 2cth part may have less than (log n)/2 bits. The algorithm is summarized in Algorithm 13.8. To begin with, all the keys are alive.

for i := 1 to 2c do
{
    Step 1. Find the maximum of all the alive keys with respect to their ith parts. Let M be the maximum.

    Step 2. Delete each alive key whose ith part is < M.
}
Output one of the alive keys.

Algorithm 13.8 Integer maximum

Figure 13.8 Finding the integer maximum (each key viewed as 2c parts of (log n)/2 bits each)

We now show that step 1 of Algorithm 13.8 can be completed in O(1) time using n common CRCW PRAM processors. Note that if a key has at most (log n)/2 bits, its maximum magnitude is √n − 1. Thus each step of Algorithm 13.8 is nothing but the task of finding the maximum of n keys, where each key is an integer in the range [0, √n − 1]. Assign one processor to each key. Make use of √n global memory cells (which are initialized to −∞). Call these cells M_0, M_1, ..., M_(√n−1). In one parallel write step, if processor i has a key k_i, then it tries to write k_i in M_(k_i). For example, if processor i has a key valued 10, it will attempt to write 10 in M_10. After this write step, the problem of computing the maximum of the n keys reduces to computing the maximum of the contents of M_0, M_1, ..., M_(√n−1). Since these are only √n numbers, their maximum can be found in O(1) time using n processors (see Theorem 13.4). As a result we get the following theorem.

Theorem 13.6 The maximum of n keys can be found in O(1) time using n CRCW PRAM processors provided the keys are integers in the range [0, n^c] for any constant c. □

Example 13.17 Consider the problem of finding the maximum of the following four four-bit keys: k_1 = 1010, k_2 = 1101, k_3 = 0110, and k_4 = 1100. Here n = 4, c = 2, and log n = 2. In the first basic step of Algorithm 13.8, the maximum of the four numbers with respect to their MSB is 1. Thus k_3 gets eliminated. In the second basic step, the maximum of k_1, k_2, and k_4 with respect to their second part (i.e., second MSB) is found. As a result, k_1 is dropped. In the third basic step, no key gets eliminated. Finally, in the fourth basic step, k_4 is deleted to output k_2 as the maximum. □
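The part-by-part elimination is easy to mimic sequentially. The Python sketch below is our illustration (the name integer_max and the bit bookkeeping are ours); it treats each key as 2c parts of roughly (log n)/2 bits, in the spirit of Algorithm 13.8, where on the PRAM each round costs only O(1) time.

    def integer_max(keys, c):
        # Keys are assumed to lie in [0, n^c].  View each key as 2c parts of
        # about (log n)/2 bits, most significant part first; in every round keep
        # only the alive keys whose current part is maximum (Algorithm 13.8).
        n = len(keys)
        half = (n.bit_length() + 1) // 2          # roughly (log n)/2 bits per part
        total_bits = 2 * c * half
        alive = list(keys)
        for part in range(2 * c):
            shift = total_bits - (part + 1) * half
            mask = (1 << half) - 1
            m = max((k >> shift) & mask for k in alive)
            alive = [k for k in alive if ((k >> shift) & mask) == m]
        return alive[0]

    # e.g. integer_max([0b1010, 0b1101, 0b0110, 0b1100], c=2) returns 0b1101 (= 13).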


13.4.4 General Selection Using n^2 Processors

Let X = k_1, k_2, ..., k_n be a given sequence of distinct keys and say we are interested in selecting the ith smallest key. The rank of any key x in X is defined to be one plus the number of keys of X that are less than x. If we have n/log n CREW PRAM processors, we can compute the rank of any given key x in O(log n) time (see Section 13.3, Exercise 9).

If we have n^2/log n processors, we can group them into G_1, G_2, ..., G_n such that each G_j has n/log n processors. G_j computes the rank of k_j in X (for 1 <= j <= n) using the algorithm of Section 13.3, Exercise 9. This will take O(log n) time. One of the processors in the group whose key has rank i will output the answer. Thus we get the following theorem.

Theorem 13.7 Selection can be performed in O(log n) time using n^2/log n CREW PRAM processors. □

The algorithm of Theorem 13.7 has a speedup of Θ(n)/O(log n) = Θ(n/log n). Its efficiency is Θ(n/log n)/(n^2/log n) = Θ(1/n); that is, the algorithm is not work-optimal!

13.4.5 A Work-Optimal Randomized Algorithm (*)

In this section we show that selection can be done in O(log n) time using n/log n common CRCW PRAM processors. The randomized algorithm chooses a random sample (call it S) from X of size n^(1−ε) (for some suitable ε) and selects two elements of S as splitters. A choice of ε = 0.6 suffices. Let l_1 and l_2 be the splitters. The keys l_1 and l_2 are such that the element to be selected has a value between l_1 and l_2 with high probability (abbreviated w.h.p.). In addition, the number of keys of X that have a value in the range [l_1, l_2] is small, O(n^((1+ε)/2) √(log n)) to be specific.

Having chosen l_1 and l_2, we partition X into X_1, X_2, and X_3, where X_1 = {x ∈ X | x < l_1}, X_2 = {x ∈ X | l_1 <= x <= l_2}, and X_3 = {x ∈ X | x > l_2}. While performing this partitioning, we also count the size of each part. If |X_1| < i <= |X_1| + |X_2|, the element to be selected lies in X_2. If this is the case, we proceed further. If not, we start all over again. We can show that the ith smallest element of X will indeed belong to X_2 with high probability and also that |X_2| = N = O(n^((1+ε)/2) √(log n)). The element to be selected will be the (i − |X_1|)th smallest element of X_2.

The preceding process of sampling and elimination is repeated until the number of remaining keys is <= n^0.4. After this, we perform an appropriate selection from out of the remaining keys using the algorithm of Theorem 13.7. More details of the algorithm are given in Algorithm 13.9. To begin with, each input key is alive. There are n/log n processors and each processor gets log n keys. Concentration (in steps 3 and 6) refers to collecting the relevant keys and putting them in successive cells in global memory (see Section 13.3, Exercise 8).

Let a stage refer to one run of the while loop. The number of samples in any given stage is binomial with parameters N and N^(−ε). Thus the expected number of sample keys is N^(1−ε). Using Chernoff bounds, we can show that |S| is O(N^(1−ε)).

Let S be a sample of s elements from a set X of n elements. Let r_j = rank(select(j, S), X). Here rank(x, X) is defined to be one plus the number of elements in X that are less than x, and select(j, S) is defined to be the jth smallest element of S. The following lemma provides a high probability confidence interval for r_j.

Lemma 13.3 For every α, Prob.(|r_j − j(n/s)| > √(3α) (n/√s) √(log n)) < n^(−α). □

For a proof of this lemma, see the references supplied at the end of this chapter. Using this lemma, we can show that only O(N^((1+ε)/2) √(log N)) keys survive at the end of any stage, where N is the number of alive keys at the beginning of this stage. This in turn implies there are only O(1) stages in the algorithm.

Broadcasting in any parallel machine is the operation of sending a specific piece of information to a specified set of processors. In the case of a CREW PRAM, broadcasting can be done in O(1) time with a concurrent read operation. In Algorithm 13.9, steps 1 and 2 take O(log n) time each. In steps 3 and 6, concentration can be done using a prefix sums computation followed by a write. Thus step 3 takes O(log n) time. Also, the sample size in steps 3 and 6 is O(n^0.4). Thus these keys can be sorted in O(log n) time using a simple algorithm (given in Section 13.6). Alternatively, the selections performed in steps 3 and 6 can be accomplished using the algorithm of Theorem 13.7. Two prefix sums computations are done in step 5 for a total of O(log n) time. Therefore, each stage of the algorithm runs in O(log n) time and the whole algorithm also terminates in time O(log n); this implies the following theorem.

Theorem 13.8 Selection from out of n keys can be performed in O(log n) time using n/log n CREW PRAM processors. □
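The essence of the sampling strategy can be seen in the following sequential Python sketch (ours; the constants and the retry loop are simplifications, not the PRAM implementation of Algorithm 13.9): pick a small random sample, choose two splitters around the target rank, keep only the keys falling between them, and finish on the much smaller surviving set.

    import math, random

    def randomized_select(keys, i):
        # Returns the ith smallest key (i is 1-based); keys are assumed distinct.
        # One sampling stage in the spirit of Algorithm 13.9; retried if the
        # target escapes the splitters (this happens only with small probability).
        n = len(keys)
        eps, d = 0.6, 2.0          # d plays the role of the constant > sqrt(3*alpha)
        while True:
            sample = sorted(k for k in keys if random.random() < n ** -eps)
            q = len(sample)
            if q == 0:
                continue
            pos = i * q // n       # expected rank of the target within the sample
            dev = int(d * math.sqrt(q * math.log(n + 1))) + 1
            l1 = sample[max(0, pos - dev)]
            l2 = sample[min(q - 1, pos + dev)]
            t = sum(1 for k in keys if k < l1)                  # |X_1|
            middle = sorted(k for k in keys if l1 <= k <= l2)   # X_2
            if t < i <= t + len(middle):
                return middle[i - t - 1]

    # Example: randomized_select(list(range(1000)), 500) returns 499.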

EXERCISES

1. Present an O(log log n) time algorithm for finding the maximum of n arbitrary numbers using n/log log n common CRCW PRAM processors.


N := n; // N at any time is the number of live keys
while (N > n^0.4) do
{
    Step 1. Each live key is included in the random sample S with probability N^(−ε). This step takes log n time and with high probability, O(N^(1−ε)) keys (from among all the processors) are in the random sample.

    Step 2. All processors perform a prefix sums operation to compute the number of keys in the sample. Let q be this number. Broadcast q to all the processors. If q is not in the range [0.5 N^(1−ε), 1.5 N^(1−ε)], go to step 1.

    Step 3. Concentrate and sort the sample keys.

    Step 4. Select keys l_1 and l_2 from S with ranks ⌈iq/N⌉ − d √(q log N) and ⌈iq/N⌉ + d √(q log N), respectively, d being a constant > √(3α). Broadcast l_1 and l_2 to all the processors. The key to be selected has a value in the range [l_1, l_2] w.h.p.

    Step 5. Count the number r of live keys that are in the range [l_1, l_2]. Also count the number of live keys that are < l_1. Let this count be t. Broadcast r and t to all the processors. If i is not in the interval (t, t + r] or if r is not O(N^((1+ε)/2) √(log N)), go to step 1; else kill (i.e., delete) all the live keys with a value < l_1 or > l_2 and set i := i − t and N := r.
}
Step 6. Concentrate and sort the live keys. Identify and output the ith smallest key.

Algorithm 13.9 A work-optimal randomized selection algorithm


2. Show that prefix minima computation can be performed in O(log log n) time using n/log log n common CRCW PRAM processors.

3. Given an array A of n elements, we would like to find the largest i such that A[i] = 1. Give an O(1) time algorithm for this problem on an n-processor common CRCW PRAM.

4. Algorithm 13.6 runs in time O(1) using n^2 processors. Show how to modify this algorithm so that the maximum of n elements can be found in O(1) time using n^(1+ε) processors for any fixed ε > 0.

5. If k is any integer > 1, the kth quantiles of a sequence X of n numbers are defined to be those k − 1 elements of X that evenly divide X. For example, if k = 2, there is only one quantile, namely, the median of X. Show that the kth quantiles of any given X can be computed in O(log k log n) time using n^2/log n CREW PRAM processors.

6. Present an O(1) time n-processor algorithm for finding the maximum of n given arbitrary numbers. (Hint: Employ random sampling along the same lines as in Algorithm 13.9.)

7. Given an array A of n elements, the problem is to find any element of A that is greater than or equal to the median. Present an O(1) time algorithm for this problem. You can use a maximum of log^2 n CRCW PRAM processors.

8. The distinct elements problem was posed in Section 13.2, Exercises 4 and 5. Assume that the elements are integers in the range [0, n^c], where c is a constant. Show how to solve the distinct elements problem in O(1) time using n CRCW PRAM processors. (Hint: You can use O(n^c) global memory.)

9. Show how to reduce the space bound of the above algorithm to O(n^(1+ε)) for any fixed ε > 0. (Hint: Use the idea of radix reduction (see Figure 13.8).)

10. If X is a sorted array of n elements and x is any element, we can make use of binary search to check whether x ∈ X in O(log n) time sequentially. Assume that we have k processors, where k > 1. Can the search be done faster? One way of making use of all the k processors is to partition X into k nearly equal parts. Each processor is assigned a part. A processor then compares x with the two endpoints of the part assigned to it to check whether x falls in its part. If no part has x, then the answer is immediate. If x ∈ X, only one part survives. In the next step, all the k processors can work on the surviving part in a similar manner. This is continued until the position of x is pinpointed. The preceding algorithm is called a k-ary search. What is the run time of a k-ary search algorithm? Show that if there are n^ε CREW PRAM processors (for any fixed ε > 0), we can check whether x ∈ X in O(1) time.

13.5 MERGING

The problem of merging is to take two sorted sequences as input and produce a sorted sequence of all the elements. This problem was studied in Chapter 3 and an O(n) time algorithm (Algorithm 3.8) was presented. Merging is an important problem. For example, an efficient merging algorithm can lead to an efficient sorting algorithm (as we saw in Chapter 3). The same is true in parallel computing also. In this section we study the parallel complexity of merging.

13.5.1 A Logarithmic Time Algorithm

Let X_1 = k_1, k_2, ..., k_m and X_2 = k_(m+1), k_(m+2), ..., k_2m be the input sorted sequences to be merged. Assume without loss of generality that m is an integral power of 2 and that the keys are distinct. Note that the merging of X_1 and X_2 can be reduced to computing the rank of each key k in X_1 ∪ X_2. If we know the rank of each key, then the keys can be merged by writing the key whose rank is i into global memory cell i. This writing will take only one time unit if we have n = 2m processors.

For any key k, let its rank in X_1 (X_2) be denoted r_1 (r_2). If k = k_j ∈ X_1, then note that r_1 = j. If we allocate a single processor π to k, π can perform a binary search (see Algorithms 3.2 and 3.3) on X_2 and figure out the number q of keys in X_2 that are less than k. Once q is known, π can compute k's rank in X_1 ∪ X_2 as j + q. If k belongs to X_2, a similar procedure can be used to compute its rank in X_1 ∪ X_2. In summary, if we have 2m processors (one processor per key), merging can be completed in O(log m) time.

Theorem 13.9 Merging of two sorted sequences each of length m can be completed in O(log m) time using m CREW PRAM processors. □

Since two sorted sequences of length m each can be sequentially merged in Θ(m) time, the speedup of the above algorithm is Θ(m)/O(log m) = Θ(m/log m); its efficiency is Θ(m/log m)/m = Θ(1/log m). This algorithm is not work-optimal!
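The rank-based merge is straightforward to simulate sequentially; the short Python sketch below (ours; bisect_left plays the role of the binary search that each processor performs) writes every key directly into the output cell given by the sum of its two ranks.

    from bisect import bisect_left

    def merge_by_ranks(x1, x2):
        # x1 and x2 are sorted and their keys are distinct (as assumed in the text).
        # For each key, its rank in its own sequence is its index and its rank in
        # the other sequence is found by binary search; the key is then written to
        # the output cell given by the sum of the two ranks (0-based here).
        out = [None] * (len(x1) + len(x2))
        for j, k in enumerate(x1):
            out[j + bisect_left(x2, k)] = k
        for j, k in enumerate(x2):
            out[j + bisect_left(x1, k)] = k
        return out

    # merge_by_ranks([2, 5, 8], [4, 9, 12]) returns [2, 4, 5, 8, 9, 12].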


13.5.2 Odd-Even Merge

Odd-even merge is a merging algorithm based on divide-and-conquer that lends itself to efficient parallelization. If X_1 = k_1, k_2, ..., k_m and X_2 = k_(m+1), k_(m+2), ..., k_2m (where m is an integral power of 2) are the two sorted sequences to be merged, then Algorithm 13.10 uses 2m processors.

Step 0. If m = 1, merge the sequences with one comparison.

Step 1. Partition X_1 and X_2 into their odd and even parts. That is, partition X_1 into X_1^odd = k_1, k_3, ..., k_(m-1) and X_1^even = k_2, k_4, ..., k_m. Similarly, partition X_2 into X_2^odd and X_2^even.

Step 2. Recursively merge X_1^odd with X_2^odd using m processors. Let L_1 = ℓ_1, ℓ_2, ..., ℓ_m be the result. Note that X_1^odd, X_1^even, X_2^odd, and X_2^even are in sorted order. At the same time merge X_1^even with X_2^even using the other m processors to get L_2 = ℓ_(m+1), ℓ_(m+2), ..., ℓ_2m.

Step 3. Shuffle L_1 and L_2; that is, form the sequence L = ℓ_1, ℓ_(m+1), ℓ_2, ℓ_(m+2), ..., ℓ_m, ℓ_2m. Compare every pair (ℓ_(m+i), ℓ_(i+1)) and interchange them if they are out of order. That is, compare ℓ_(m+1) with ℓ_2 and interchange them if need be, compare ℓ_(m+2) with ℓ_3 and interchange them if need be, and so on. Output the resultant sequence.

Algorithm 13.10 Odd-even merge algorithm

Example 13.18 Let X_1 = 2, 5, 8, 11, 13, 16, 21, 25 and X_2 = 4, 9, 12, 18, 23, 27, 31, 34. Figure 13.9 shows how the odd-even merge algorithm can be used to merge these two sorted sequences. □

Let M(m) be the run time of Algorithm 13.10 on two sorted sequences of length m each using 2m processors. Then, step 1 takes O(1) time. Step 2 takes M(m/2) time. Step 3 takes O(1) time. This yields the following recurrence relation: M(m) = M(m/2) + O(1), which solves to M(m) = O(log m). Thus we arrive at the following theorem.

Theorem 13.10 Two sorted sequences of length m each can be merged in O(log m) time using 2m EREW PRAM processors. □


X_1 = 2, 5, 8, 11, 13, 16, 21, 25                X_2 = 4, 9, 12, 18, 23, 27, 31, 34

    odd parts:  2, 8, 13, 21  and  4, 12, 23, 31
    even parts: 5, 11, 16, 25 and  9, 18, 27, 34

merge:
    L_1 = 2, 4, 8, 12, 13, 21, 23, 31            L_2 = 5, 9, 11, 16, 18, 25, 27, 34

shuffle:
    L = 2, 5, 4, 9, 8, 11, 12, 16, 13, 18, 21, 25, 23, 27, 31, 34

compare-exchange:
    2, 4, 5, 8, 9, 11, 12, 13, 16, 18, 21, 23, 25, 27, 31, 34

Figure 13.9 Odd-even merge - an example
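For readers who prefer running code, here is a compact sequential Python rendering of Algorithm 13.10 (ours; on the PRAM the two recursive calls of step 2 and all the compare-exchanges of step 3 execute in parallel).

    def odd_even_merge(x1, x2):
        # x1 and x2 are sorted, |x1| = |x2| = m, and m is a power of 2.
        m = len(x1)
        if m == 1:
            return [min(x1[0], x2[0]), max(x1[0], x2[0])]
        l1 = odd_even_merge(x1[0::2], x2[0::2])   # step 2: merge the odd parts
        l2 = odd_even_merge(x1[1::2], x2[1::2])   #         merge the even parts
        out = [None] * (2 * m)
        out[0::2], out[1::2] = l1, l2             # step 3: shuffle L_1 and L_2
        for i in range(1, 2 * m - 1, 2):          # compare-exchange adjacent pairs
            if out[i] > out[i + 1]:
                out[i], out[i + 1] = out[i + 1], out[i]
        return out

    # Reproduces Figure 13.9:
    # odd_even_merge([2, 5, 8, 11, 13, 16, 21, 25], [4, 9, 12, 18, 23, 27, 31, 34])
    # returns [2, 4, 5, 8, 9, 11, 12, 13, 16, 18, 21, 23, 25, 27, 31, 34].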


The correctness of the merging algorithm can be established using the zero-one principle. The validity of this principle is not proved here.

Theorem 13.11 [Zero-one principle] If any oblivious comparison-based sorting algorithm sorts an arbitrary sequence of n zeros and ones correctly, then it will also sort any sequence of n arbitrary keys. □

A comparison-based sorting algorithm is said to be oblivious if the sequence of cells to be compared in the algorithm is prespecified. For example, the next pair of cells to be compared cannot depend on the outcome of comparisons made in the previous steps.

Example 13.19 Let k_1, k_2, ..., k_n be a sequence of bits. One way of sorting this sequence is to count the number z of zeros in the sequence, followed by writing z zeros and n − z ones in succession. The zero-one principle cannot be applied to this algorithm since the algorithm is not comparison based.

Also, the quicksort algorithm (Section 3.5), even though comparison based, is not oblivious. The reason is as follows. At any point in the algorithm, the next pair of cells to be compared depends on the number of keys in each of the two parts. For example, if there are only two keys in the first part, these two cells are compared next. On the other hand, if there are ten elements in the first part, then the comparison sequence is different.

Note that merging is a special case of sorting and also the odd-even merge algorithm is oblivious. This is because the sequence of cells to be compared is always the same. Thus the zero-one principle can be applied to odd-even merge. □

Theorem 13.12 Algorithm 13.10 correctly merges any two sorted sequences of arbitrary numbers.

Proof: The correctness of Algorithm 13.10 can be proved using the zero-one principle. Let X_1 and X_2 be sorted sequences of zeros and ones with |X_1| = |X_2| = m. Both X_1 and X_2 have a sequence of zeros followed by a sequence of ones. Let q_1 (q_2) be the number of zeros in X_1 (X_2, respectively). The number of zeros in X_1^odd is ⌈q_1/2⌉ and the number of zeros in X_1^even is ⌊q_1/2⌋. Thus the number of zeros in L_1 is z_1 = ⌈q_1/2⌉ + ⌈q_2/2⌉ and the number of zeros in L_2 is z_2 = ⌊q_1/2⌋ + ⌊q_2/2⌋.

The difference between z_1 and z_2 is at most 2. This difference is exactly two if and only if both q_1 and q_2 are odd. In all the other cases the difference is <= 1. Assume that |z_1 − z_2| = 2. The other cases are similar. L_1 has two more zeros than L_2. When these two are shuffled in step 3, L contains a sequence of zeros, followed by 10, and then by a sequence of ones. The only unsorted portion in L (also called the dirty sequence) will be 10. When the final comparison and interchange is performed in step 3, the dirty sequence, and hence the whole sequence, is sorted. □


Figure 13.10 A work-optimal merging algorithm

13.5.3 A Work-Optimal Algorithm

In this section we show how to merge two sorted sequences with m elements each in logarithmic time using only 2m/log m processors. This algorithm reduces the original problem into O(m/log m) subproblems, where each subproblem is that of merging two sorted sequences each of length O(log m). Each such subproblem can be solved using the sequential algorithm (Algorithm 3.8) of Chapter 3 in O(log m) time.

Thus the algorithm is complete if we describe how to reduce the original problem into O(m/log m) subproblems. Let X_1 and X_2 be the sequences to be merged. Partition X_1 into m/log m parts, where there are log m keys in each part. Call these parts A_1, A_2, ..., A_M, where M = m/log m. Let the largest key in A_i be l_i (for i = 1, 2, ..., M). Assign a processor to each of these l_i's. The processor associated with l_i performs a binary search on X_2 to find the correct (i.e., sorted) position of l_i in X_2. This induces a partitioning of X_2 into M parts. Note that some of these parts could be empty (see Figure 13.10). Let the corresponding parts of X_2 be B_1, B_2, ..., B_M. Call B_i the corresponding subset of A_i in X_2.

Now, the merge of X_1 and X_2 is nothing but the merge of A_1 and B_1, followed by the merge of A_2 and B_2, and so on. That is, merging X_1 and X_2 reduces to merging A_i with B_i for i = 1, 2, ..., M. We know that the size of each A_i is log m. But the sizes of the B_i's could be very large (or very small). How can we merge A_i with B_i? We can use the idea of partitioning one more time.

Let A_i and B_i be an arbitrary pair. If |B_i| = O(log m), they can be merged in O(log m) time using one processor. Consider the case when |B_i| is ω(log m). Partition B_i into ⌈|B_i|/log m⌉ parts, where each part has at most log m successive keys of B_i. Allocate one processor to each part so that the processor can find the corresponding subset of this part in A_i in O(log log m) time. As a result, the problem of merging A_i and B_i has been reduced to ⌈|B_i|/log m⌉ subproblems, where each subproblem is that of merging two sequences of length O(log m).

The number of processors used is Σ_(i=1)^M ⌈|B_i|/log m⌉, which is <= 2M. Thus we conclude the following.

Theorem 13.13 Two sorted sequences of length m each can be merged in O(log m) time using 2m/log m CREW PRAM processors. □

13.5.4 An O(log log m)-Time Algorithm

Now we present a very fast algorithm for merging. This algorithm can merge two sorted sequences in O(log log m) time, where m is the number of elements in each of the two sequences. The number of processors used is 2m. The basic idea behind this algorithm is the same as the one used for the algorithm of Theorem 13.13. In addition we employ the divide-and-conquer technique.

X_1 and X_2 are the given sequences. Assume that the keys are distinct. The algorithm reduces the problem of merging X_1 and X_2 into N <= 2√m subproblems, where each subproblem is that of merging two sorted sequences of length O(√m). This reduction is completed in O(1) time using m processors. If T(m) is the run time of the algorithm using 2m processors, then T(m) satisfies the recurrence relation T(m) = T(O(√m)) + O(1), whose solution is O(log log m). Details of the algorithm are given in Algorithm 13.11.

Step 1. Partition X_1 into √m parts with √m elements each. Call these parts A_1, A_2, ..., A_√m. Let the largest key in A_i be l_i (for i = 1, 2, ..., √m). Assign √m processors to each of these l_i's. The processors associated with l_i perform a √m-ary search on X_2 to find the correct (i.e., sorted) position of l_i in X_2 in O(1) time (see Section 13.4, Exercise 10). This induces a partitioning of X_2 into √m parts. Note that some of these parts could be empty (see Figure 13.10). Let the corresponding parts of X_2 be B_1, B_2, ..., B_√m. The subset B_i is the corresponding subset of A_i in X_2.

Step 2. Now, the merge of X_1 and X_2 is nothing but the merge of A_1 and B_1, followed by the merge of A_2 and B_2, and so on. That is, merging X_1 and X_2 reduces to merging A_i with B_i for i = 1, 2, ..., √m. We know that the size of each A_i is √m. But the sizes of the B_i's could be very large (or very small). To merge A_i with B_i, we can use the idea of partitioning one more time. Let A_i and B_i be an arbitrary pair. If |B_i| = O(√m), we can merge them in O(1) time using an m^ε-ary search. Consider the case when |B_i| is ω(√m). Partition B_i into ⌈|B_i|/√m⌉ parts, where each part has at most √m successive keys of B_i. Allocate √m processors to each part so that the processors can find the corresponding subset of this part in A_i in O(1) time. As a result the problem of merging A_i and B_i has been reduced to ⌈|B_i|/√m⌉ subproblems, where each subproblem is that of merging two sequences of length O(√m). The number of processors used is Σ_(i=1)^√m √m ⌈|B_i|/√m⌉, which is <= 2m.

Algorithm 13.11 Merging in O(log log m) time

The correctness of Algorithm 13.11 is quite clear; we infer this theorem.

Theorem 13.14 Two sorted sequences of length m each can be merged in O(log log m) time using 2m CREW PRAM processors. □

The above algorithm has a speedup of Θ(m)/O(log log m) = Θ(m/log log m), which is very close to m. Its efficiency is Θ(1/log log m), and hence the algorithm is not work-optimal!

EXERCISES

1. Modify Algorithm 13.11 so that it uses only 2m/log log m CREW PRAM processors and merges X_1 and X_2 in O(log log m) time.

2. A sequence K = k_1, k_2, ..., k_n is said to be bitonic either (1) if there is a 1 <= j <= n such that k_1 <= k_2 <= ... <= k_j >= k_(j+1) >= ... >= k_n or (2) if a cyclic shift of K satisfies (1). For example, 3, 8, 12, 17, 24, 15, 9, 6 and 21, 35, 19, 16, 8, 5, 1, 15, 17 are bitonic. If K is a bitonic sequence with n elements (for n even), let a_i = min {k_i, k_(i+n/2)} and b_i = max {k_i, k_(i+n/2)}. Also let L(K) = min {k_1, k_(1+n/2)}, min {k_2, k_(2+n/2)}, ..., min {k_(n/2), k_n} and H(K) = max {k_1, k_(1+n/2)}, max {k_2, k_(2+n/2)}, ..., max {k_(n/2), k_n}. Show that:

(a) L(K) and H(K) are both bitonic.

(b) Every element of L(K) is smaller than any element of H(K).

In other words, to sort K, it suffices to sort L(K) and H(K) separately and output one followed by the other. The above properties suggest a divide-and-conquer algorithm for sorting a given bitonic sequence. Present the details of this algorithm together with an analysis of the time and processor bounds. Show how to make use of the resultant sorting algorithm to merge two given sorted sequences. Such an algorithm is called the bitonic merger.

3. Given two sorted sequences of length n each, how will you merge them in O(1) time using n^2 CREW PRAM processors?

13.6 SORTING

Given a sequence of n keys, recall that the problem of sorting is to rearrange this sequence into either ascending or descending order. In this section we study several algorithms for parallel sorting. If we have n^2 processors, the rank of each key can be computed in O(log n) time by comparing, in parallel, all possible pairs (see the proof of Theorem 13.7). Once we know the rank of each key, in one parallel write step they can be written in sorted order (the key whose rank is i is written in cell i). Thus we have the following theorem.

Theorem 13.15 We can sort n keys in O(log n) time using n^2 CREW PRAM processors. □

The work done by the preceding algorithm is O(n^2 log n). On the other hand we have seen several sequential algorithms with run times of O(n log n) (Chapter 3) and have also proved a matching lower bound (Chapter 10). The preceding algorithm is not work-optimal.

13.6.1 Odd-Even Merge Sort

Odd-even merge sort employs the classical divide-and-conquer strategy. Assume for simplicity that n is an integral power of two and that the keys are distinct. If X = k_1, k_2, ..., k_n is the given sequence of n keys, it is partitioned into two subsequences X_1' = k_1, k_2, ..., k_(n/2) and X_2' = k_(n/2+1), k_(n/2+2), ..., k_n of equal length. X_1' and X_2' are sorted recursively assigning n/2 processors to each. The two sorted subsequences (call them X_1 and X_2, respectively) are then finally merged.

The preceding description of the algorithm is exactly the same as that of merge sort. The difference between the two algorithms lies in how the two subsequences X_1 and X_2 are merged. In the merging algorithm used in Section 3.4, the minimum elements from the two sequences are compared and the minimum of these two is output. This step continues until the two sequences are merged. As is seen, this process seems to be inherently sequential in nature. Instead, we employ the odd-even merge algorithm (Algorithm 13.10) of Section 13.5.

Theorem 13.16 We can sort n arbitrary keys in O(log^2 n) time using n EREW PRAM processors.

Proof: The sorting algorithm is described in Algorithm 13.12. It uses n processors. Define T(n) to be the time taken by this algorithm to sort n keys using n processors. Step 1 of this algorithm takes O(1) time. Step 2 runs in T(n/2) time. Finally, step 3 takes O(log n) time (c.f. Theorem 13.10). Therefore, T(n) satisfies T(n) = O(1) + T(n/2) + O(log n) = T(n/2) + O(log n), which solves to T(n) = O(log^2 n). □

Example 13.20 Consider the problem of sorting the 16 numbers 25, 21, 8, 5, 2, 13, 11, 16, 23, 31, 9, 4, 18, 12, 27, 34 using 16 processors. In step 1 of Algorithm 13.12, the input is partitioned into two: X_1' = 25, 21, 8, 5, 2, 13, 11, 16 and X_2' = 23, 31, 9, 4, 18, 12, 27, 34. In step 2, processors 1 to 8 work on X_1', recursively sort it, and obtain X_1 = 2, 5, 8, 11, 13, 16, 21, 25. At the same time processors 9 to 16 work on X_2', sort it, and obtain X_2 = 4, 9, 12, 18, 23, 27, 31, 34. In step 3, X_1 and X_2 are merged as shown in Example 13.18 to get the final result: 2, 4, 5, 8, 9, 11, 12, 13, 16, 18, 21, 23, 25, 27, 31, 34. □

The work done by Algorithm 13.12 is O(n log^2 n). Therefore, its efficiency is O(1/log n). It has a speedup of Θ(n/log n).
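Using the odd_even_merge sketch from Section 13.5.2, the whole sort is only a few lines of sequential Python (ours; on the PRAM the two recursive calls of step 2 run in parallel and step 3 is Algorithm 13.10).

    def odd_even_merge_sort(x):
        # n = len(x) is assumed to be a power of 2; relies on the odd_even_merge
        # function sketched in Section 13.5.2.
        n = len(x)
        if n <= 1:
            return list(x)
        return odd_even_merge(odd_even_merge_sort(x[:n // 2]),
                              odd_even_merge_sort(x[n // 2:]))

    # odd_even_merge_sort([25, 21, 8, 5, 2, 13, 11, 16])
    # returns [2, 5, 8, 11, 13, 16, 21, 25].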

13.6.2 An Alternative Randomized Algorithm

We can get the result of Theorem 13.16 using the randomized selection algorithm of Section 13.4. Theorem 13.8 states that selection from out of n keys can be performed in O(log n) time using n/log n processors. Assume that there are n processors. The median k of the n given keys can be found in O(log n) time. Having found the median, partition the input into two parts.


Step 0. If n <= 1, return X.

Step 1. Let X = k_1, k_2, ..., k_n be the input. Partition the input into two: X_1' = k_1, k_2, ..., k_(n/2) and X_2' = k_(n/2+1), k_(n/2+2), ..., k_n.

Step 2. Allocate n/2 processors to sort X_1' recursively. Let X_1 be the result. At the same time employ the other n/2 processors to sort X_2' recursively. Let X_2 be the result.

Step 3. Merge X_1 and X_2 using Algorithm 13.10 and n = 2m processors.

Algorithm 13.12 Odd-even merge sort

The first part X_1' contains all the input keys < k and the second part X_2' contains all the rest of the keys. The parts X_1' and X_2' are sorted recursively with n/2 processors each. The output is X_1' in sorted order followed by X_2' in sorted order. If T(n) is the sorting time of n keys using n processors, we have T(n) = T(n/2) + O(log n), which solves to T(n) = O(log^2 n).

Theorem 13.17 Sorting n keys can be performed in O(log^2 n) time using n CREW PRAM processors. □

13.6.3 Preparata's Algorithm

Preparata's algorithm runs in O(log n) time and uses n log n CREW PRAM processors. This is a recursive divide-and-conquer algorithm wherein the rank of each input key is computed and the keys are output according to their ranks (see the proof of Theorem 13.15). Let k_1, k_2, ..., k_n be the input sequence. Preparata's algorithm partitions the input into log n parts K_1, K_2, ..., K_(log n), where there are n/log n keys in each part. If k is any key in the input, its rank in the input is computed as follows. First, the rank r_i of k in K_i is computed for each i, 1 <= i <= log n. Then, the total rank of k is computed as Σ_(i=1)^(log n) r_i. One of the results that it makes use of is Theorem 13.14.

The details of Preparata's algorithm are given in Algorithm 13.13.

Step 0. If n is a small constant, sort the keys using any algorithm and quit.

Step 1. Partition the given n keys into log n parts, with n/log n keys in each part. Sort each part recursively and separately in parallel, assigning n processors to each part. Let S_1, S_2, ..., S_(log n) be the sorted sequences.

Step 2. Merge S_i with S_j for 1 <= i, j <= log n in parallel. This can be done by allocating n/log n processors to each pair (i, j). That is, using n log n processors, this step can be accomplished in O(log log n) time with Algorithm 13.11. As a by-product of this merging step, we have computed the rank of each key in each one of the S_i's (1 <= i <= log n).

Step 3. Allocate log n processors to compute the rank of each key in the original input. This is done in parallel for all the keys by adding the log n ranks computed (for each key) in step 2. This can be done in O(log log n) time using the prefix computation algorithm (see Algorithm 13.3). Finally, the keys are written out in the order of their ranks.

Algorithm 13.13 Preparata's sorting algorithm

Let T(n) be the run time of Preparata's algorithm using n log n processors. Clearly, step 1 takes T(n/log n) time and steps 2 and 3 together take O(log log n) time. Thus we have

T(n) = T(n/log n) + O(log log n)

which can be solved by repeated substitution to get T(n) = O(log n). Also, the number of processors used in each step is n log n. We get the following.

Theorem 13.18 Any n arbitrary keys can be sorted in O(log n) time using n log n CREW PRAM processors. □

Applying the slow-down lemma (Lemma 13.2) to the above theorem, we infer a corollary.

Corollary 13.1 Any n general keys can be sorted in O(t log n) time using n log n / t CREW PRAM processors, for any t >= 1. □

Preparata's algorithm does the same total work as the odd-even merge sort. But its speedup is Θ(n), which is better than that of the odd-even merge sort. Efficiency of both the algorithms is the same; i.e., Θ(1/log n).
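The rank-summing idea is easy to demonstrate sequentially. In the sketch below (ours; the name rank_sort is hypothetical) the rank of a key in each group is obtained by binary search, whereas Preparata's algorithm obtains these ranks as a by-product of the pairwise mergings of step 2.

    from bisect import bisect_left
    import math

    def rank_sort(keys):
        # Split the input into about log n groups, sort each group (done
        # recursively and in parallel on the PRAM), compute every key's rank in
        # every group, and write each key to the cell given by its total rank.
        n = len(keys)
        g = max(1, int(math.log2(n)))            # number of groups
        size = -(-n // g)                        # ceiling of n/g keys per group
        groups = [sorted(keys[i:i + size]) for i in range(0, n, size)]
        out = [None] * n
        for k in keys:                           # keys are assumed distinct
            rank = sum(bisect_left(grp, k) for grp in groups)  # keys smaller than k
            out[rank] = k
        return out

    # rank_sort([25, 21, 8, 5, 2, 13, 11, 16]) returns [2, 5, 8, 11, 13, 16, 21, 25].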

13.6.4 Reischuk's Randomized Algorithm (*)

This algorithm uses n processors and runs in time O(log n). Thus its efficiency is Θ(n log n)/Θ(n log n) = Θ(1); i.e., the algorithm is work-optimal with high probability! The basis for this algorithm is Preparata's sorting scheme and the following theorem. (For a proof see the references at the end of this chapter.)

Theorem 13.19 We can sort n keys, where each key is an integer in the range [0, n (log n)^c] (c is any constant), in O(log n) time using n/log n CRCW PRAM processors. □

Reischuk's algorithm runs in the same time bound as Preparata's (with high probability) but uses only n processors. The idea is to randomly sample N = n/log^4 n keys from the input and sort these using a non-work-optimal algorithm like Preparata's. The sorted sample partitions the original problem into N + 1 independent subproblems of nearly equal size, and all these subproblems can be solved easily. These ideas are made concrete in Algorithm 13.14.

Step 1. N = n/log^4 n processors randomly sample a key (each) from X = k_1, k_2, ..., k_n, the given input sequence.

Step 2. Sort the N keys sampled in step 1 using Preparata's algorithm. Let l_1, l_2, ..., l_N be the sorted sequence.

Step 3. Let K_1 = {k ∈ X | k < l_1}; K_i = {k ∈ X | l_(i-1) <= k < l_i}, i = 2, 3, ..., N; and K_(N+1) = {k ∈ X | k >= l_N}. Partition the given input X into K_i's as defined. This is done by first finding the part each key belongs to (using binary search in parallel). Now partitioning the keys reduces to sorting the keys according to their part numbers.

Step 4. For 1 <= i <= N + 1 in parallel sort K_i using Preparata's algorithm.

Step 5. Output sorted(K_1), sorted(K_2), ..., sorted(K_(N+1)).

Algorithm 13.14 Work-optimal randomized algorithm for sorting

Step 2 of Algorithm 13.14 can be done using N log N <= N log n processors in O(log N) = O(log n) time (c.f. Theorem 13.18). In step 3, the partitioning of X can be done using binary search and the integer sort algorithms (c.f. Theorem 13.19). If there is a processor associated with each key, the processor can perform a binary search in l_1, l_2, ..., l_N to figure out the part number the key belongs to. Note that the part number of each key is an integer in the range [1, N + 1]. Therefore the keys can be sorted according to their part numbers using Theorem 13.19.

Thus step 3 can be performed in O(log n) time, using <= n processors. With high probability, there will be no more than O(log^5 n) keys in each of the K_i's (1 <= i <= N). The proof of this fact is left as an exercise. Within the same processor and time bounds, we can also count |K_i| for each i. In step 4, each K_i can be sorted in O(log |K_i|) time using |K_i| log |K_i| processors. Also, K_i can be sorted in (log |K_i|)^2 time using |K_i| processors (see Corollary 13.1). So step 4 can be completed in (max_i log |K_i|)^2 time using n processors. If max_i |K_i| = O(log^5 n), step 4 takes O((log log n)^2) time. Thus we have proved the following.

Theorem 13.20 We can sort n general keys using n CRCW PRAM processors in O(log n) time. □
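The sample-and-partition structure translates directly into a short sequential Python sketch (ours; sorted() stands in for the parallel Preparata sorts, and the sample size follows the n/log^4 n choice in the text).

    import math, random

    def sample_sort(keys):
        # Sort a small random sample, use it as splitters to partition the input
        # into independent parts, sort every part, and concatenate (steps 1-5 of
        # Algorithm 13.14, serialized).
        n = len(keys)
        if n < 16:
            return sorted(keys)
        size = max(1, n // int(math.log2(n)) ** 4)   # about n / log^4 n splitters
        splitters = sorted(random.sample(keys, size))
        parts = [[] for _ in range(size + 1)]
        for k in keys:
            lo, hi = 0, size                         # binary search for the part number
            while lo < hi:
                mid = (lo + hi) // 2
                if k < splitters[mid]:
                    hi = mid
                else:
                    lo = mid + 1
            parts[lo].append(k)
        return [k for part in parts for k in sorted(part)]

    # sample_sort(list(range(100, 0, -1))) returns [1, 2, ..., 100].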

EXERCISES

1. In step 3 of Algorithm 13.12, we could employ the merging algorithm of Algorithm 13.11. If so, what would be the run time of Algorithm 13.12? What would be the processor bound?

2. If we have n numbers to sort and each number is a bit, one way of sorting X could be to make use of prefix computation algorithms as in Section 13.3, Exercise 8. This amounts to counting the number of zeros and the number of ones. If z is the number of zeros, we output z zeros followed by n − z ones. Using this idea, design an O(log n) time algorithm to sort n numbers, where each number is an integer in the range [0, log n − 1]. Your algorithm should run in O(log n) time using no more than n/log n CREW PRAM processors. Recall that n numbers in the range [0, n^c] can be sequentially sorted in O(n) time (the corresponding algorithm is known as the radix sort).

3. Make use of the algorithm designed in the previous problem together with the idea of radix sorting to show that n numbers in the range [0, (log n)^c] can be sorted in O(log n) time using n/log n CREW PRAM processors.

4. Given two sets A and B of size n each (in the form of arrays), the goal is to check whether the two sets are disjoint or not. Show how to solve this problem:

(a) In O(1) time using n^2 CRCW PRAM processors

(b) In O(log n) time using n CREW PRAM processors

5. Sets A and B are given such that |A| = n, |B| = m, and n >= m. Show that we can determine whether A and B are disjoint in O((log n)(log m)) time using n/log n CREW PRAM processors.

6. Show that if a set X of n keys is partitioned using a random sample of size s (as in Reischuk's algorithm), the size of each part is O((n/s) log n).

7. Array A is an almost sorted array of n elements. It is given that the position of each key is at most a distance of d from its final sorted position, where d is a constant. Give an O(1) time n-processor EREW PRAM algorithm to sort A. Prove the correctness of your algorithm using the zero-one principle.

8. The original algorithm of Reischuk was recursive and had the following steps:

(a) Select a random sample of size √n − 1 and sort it using Theorem 13.15.

(b) Partition the input into √n parts making use of the sorted sample (similar to step 3 of Algorithm 13.14).

(c) Assign a linear number of processors to each part and recursively sort each part in parallel.

(d) Output the sorted parts in the correct order.

See if you can analyze the run time of this algorithm.

9. It is known that prefix sums computation can be done in time O(log n/log log n) using n log log n/log n CRCW PRAM processors, provided the numbers are integers in the range [0, n^c] for any constant c. Assuming this result, show that sorting can be done in O(log n/log log n) time using n^2 CRCW PRAM processors.

10. Adopt the algorithms of Exercise 7 and Section 13.4, Exercise 10, and the O(log n/log log n) time algorithm for integer prefix sums computation to show that n numbers can be sorted in O(log n/log log n) time using n (log n)^ε CRCW PRAM processors (for any constant ε > 0).

11. The random access read (RAR) operation in a parallel machine is defined as follows: Each processor wants to read a data item from some other processor. In the case of a PRAM, it is helpful to assume that each processor has an associated part of the global memory, and reading from a processor means reading from the corresponding part of shared memory. It may be the case that several processors want to read from the same processor. Note that on the CRCW PRAM or on the CREW PRAM, a RAR operation can be performed in one unit of time. Devise an efficient algorithm for RAR on the EREW PRAM. (Hint: If processor i wants to read from processor j, create a tuple (j, i) corresponding to this request. Sort all the tuples (j, i) in lexicographic order.)

12. We can define a random access write (RAW) operation similar to RAR as follows. Every processor has an item of data to be sent to some other processor (that is, an item of data to be written in the shared memory part of some other processor). Several processors might want to write in the same part and hence a resolution scheme (such as common, priority, etc.) is also supplied. On the CRCW PRAM (with the same resolution scheme), this can be done in one unit of time. Develop efficient algorithms for RAW on the CREW and EREW PRAMs. (Hint: Make use of sorting (see Exercise 11).)

13.7 GRAPH PROBLEMS

We consider the problems of transitive closure, connected components, minimum spanning tree, and all-pairs shortest paths in this section. Efficient sequential algorithms for these problems were studied in Chapters 6 and 10. We begin by introducing a general framework for solving these problems.

Definition 13.4 Let M be an n x n matrix with nonnegative integer coefficients. Let M* be a matrix defined as follows:

M*(i, i) = 0 for every i

M*(i, j) = min {M(i_0, i_1) + M(i_1, i_2) + ... + M(i_(k-1), i_k)} for every i ≠ j

where i_0 = i, i_k = j, and the minimum is taken over all sequences i_0, i_1, ..., i_k of elements from the set {1, 2, ..., n}. □

Example 13.21 Let G(V, E) be a directed graph with V = {1, 2, ..., n}. Define M as M(i, j) = 0 if either i = j or there is a directed edge from node i to node j in G, and M(i, j) = 1 otherwise. For this choice of M, it is easy to see that M*(i, j) = 0 if and only if there is a directed path from node i to node j in G.

In Figure 13.11, a directed graph is shown together with its M and M*. M*(1, 5) is zero since M(1, 2) + M(2, 5) = 0. Similarly, M*(2, 1) is zero since M(2, 5) + M(5, 6) + M(6, 1) = 0. On the other hand, M*(3, 1) is one since for every choice of i_1, i_2, ..., i_(k-1), the sum M(i_0, i_1) + M(i_1, i_2) + ... + M(i_(k-1), i_k) is > 0. □

Figure 13.11 An example graph and its M and M*

Theorem 13.21 M* can be computed from an n x n matrix M in O(log n) time using n^(3+ε) common CRCW PRAM processors, for any fixed ε > 0.

Proof: We make use of O(n^3) global memory. In particular we use the variables m[i, j] for 1 <= i, j <= n and q[i, j, k] for 1 <= i, j, k <= n. The algorithm to be employed is given in Algorithm 13.15.

m[i, j] := M[i, j] for 1 <= i, j <= n in parallel;
for r := 1 to log n do
{
    Step 1. In parallel set q[i, j, k] := m[i, j] + m[j, k] for 1 <= i, j, k <= n.

    Step 2. In parallel set m[i, j] := min {q[i, 1, j], q[i, 2, j], ..., q[i, n, j]} for 1 <= i, j <= n.
}
Set M*(i, i) := 0 for all i and M*(i, j) := m[i, j] for i ≠ j.

Algorithm 13.15 Computation of M*

Initializing m[ ] takes O(1) time. Step 1 of Algorithm 13.15 takes O(1) time using n^3 processors. In step 2, n^2 different m[i, j]'s are computed. The computation of a single m[i, j] involves computing the minimum of n numbers and hence can be completed in O(1) time using n^2 CRCW PRAM processors (Theorem 13.4). In fact this minimum can also be computed in O(1) time using n^(1+ε) processors for any fixed ε > 0 (Section 13.4, Exercise 4). In summary, step 2 can be completed in O(1) time using n^(3+ε) common CRCW PRAM processors. Thus the for loop runs in O(log n) time. The final computation of M* also can be done in O(1) time using n^2 processors. □

The correctness of Algorithm 13.15 can be proven by induction on r. We can show that the value of m[i, j] at the end of the rth iteration of the for loop is min {M(i_0, i_1) + M(i_1, i_2) + ... + M(i_(k-1), i_k)}, where i = i_0, j = i_k, and the minimum is taken over all the sequences i_0, i_1, ..., i_k of elements of {1, 2, ..., n} such that k <= 2^r. Algorithm 13.15 can be specialized to solve several problems including the transitive closure, connected components, minimum spanning tree, and so on.
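A sequential Python simulation of Algorithm 13.15 (ours; it assumes, as in Example 13.21, that M(i, i) = 0 for all i) makes the repeated min-plus squaring explicit; on the PRAM each of the ⌈log n⌉ rounds costs only O(1) time.

    import math

    def closure(M):
        # M is an n x n list of lists of nonnegative integers with M[i][i] == 0.
        # Returns M* as in Definition 13.4 by ceil(log n) rounds of
        # m[i][j] := min over l of (m[i][l] + m[l][j])   (Algorithm 13.15).
        n = len(M)
        m = [row[:] for row in M]
        for _ in range(max(1, math.ceil(math.log2(n)))):
            m = [[min(m[i][l] + m[l][j] for l in range(n)) for j in range(n)]
                 for i in range(n)]
        return [[0 if i == j else m[i][j] for j in range(n)] for i in range(n)]

    # With the 0/1 matrix M of Example 13.21, closure(M)[i][j] == 0 exactly when
    # node j is reachable from node i.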

Theorem 13.22 The transitive closure matrix of an n-vertex directed graph can be computed in O(log n) time using n^(3+ε) common CRCW PRAM processors.

Proof: If M is defined as in Example 13.21, the transitive closure of G can be easily obtained once M* is computed. In accordance with Theorem 13.21, M* can be computed within the stated resource bounds. □

Theorem 13.23 The connected components of an n-vertex graph can be determined in O(log n) time using n^(3+ε) common CRCW PRAM processors.

Proof: Define M(i, j) to be zero if either i = j or i and j are connected by an edge; M(i, j) is one otherwise. Nodes i and j are in the same connected component if and only if M*(i, j) = 0. □

Theorem 13.24 A minimum spanning tree for an n-vertex weighted graph G(V, E) can be computed in O(log n) time using n^(5+ε) common CRCW PRAM processors.

Proof: The algorithm is a parallelization of Kruskal's sequential algorithm (see Section 4.5). In Kruskal's algorithm, the edges in the given graph G are sorted according to nondecreasing edge weights. A forest F of trees is maintained. To begin with, F consists of |V| isolated nodes. The edges are processed from the smallest to the largest. An edge (u, v) gets included in F if and only if (u, v) connects two different trees in F.

In parallel, the edges can be sorted in O(log n) time using n^2 processors (Theorem 13.15). Let e_1, e_2, ..., e_n be the edges of G. For each edge e_j = (u, v), we can decide, in parallel, whether it will belong to the final tree as follows. Find the transitive closure of the graph G_j that has V as its node set and whose edges are e_1, e_2, ..., e_(j-1). The edge e_j will get included in the final spanning tree if and only if u and v are not in the same connected component of G_j.

Thus, using Theorem 13.23, the test as to whether an edge belongs to the final answer can be performed in O(log n) time given n^(3+ε) processors. Since there are at most n^2 edges, the result follows. □

13.7.1 An Alternative Algorithm for Transitive Closure

Now we show how to compute the transitive closure of a given directed graph G(V, E) in O(log^2 n) time using n^3/log n CREW PRAM processors. In Section 10.3 (Lemma 10.9) we showed that if A is the adjacency matrix of G, then the transitive closure matrix M is given by M = I + A + A^2 + ... + A^(n-1). Along the same lines, we can also show that M = (I + A)^n. A proof by induction will establish that for any k, 1 <= k <= n, (I + A)^k (i, j) = 1 if and only if there is a directed path from node i to node j of length <= k.

Thus M can be computed by evaluating (I + A)^n, which can be rewritten as (I + A)^(2^⌈log n⌉). Therefore, computing M reduces to a sequence of ⌈log n⌉ matrix squarings (or multiplications). Since two matrices can be multiplied in O(log n) time using n^3/log n CREW PRAM processors (see Section 13.3, Exercise 12), we have the following theorem.

Theorem 13.25 The transitive closure of an n-node directed graph can be computed in O(log^2 n) time using n^3/log n CREW PRAM processors. □

Page 663: Sahni

13.7.GRAPH PROBLEMS 655

13.7.2 All-Pairs Shortest Paths

An O(n^3) time algorithm was developed in Section 5.3 for the problem of identifying the shortest path between every pair of vertices in a given weighted directed graph. The basic principle behind this algorithm was to define A^k(i, j) to represent the length of a shortest path from i to j going through no vertex of index greater than k and then to infer that

A^k(i, j) = min {A^(k-1)(i, j), A^(k-1)(i, k) + A^(k-1)(k, j)}, k >= 1.

The same paradigm can be used to design a parallel algorithm as well. The importance of the above relationship between A^k and A^(k-1) is that the computation of A^k from A^(k-1) corresponds to matrix multiplication, where min and addition take the place of addition and multiplication, respectively. Under this interpretation of matrix multiplication, the problem of all-pairs shortest paths reduces to computing A^n = A^(2^⌈log n⌉). We get this theorem.

Theorem 13.26 The all-pairs shortest-paths problem can be solved in O(log^2 n) time using n^3/log n CREW PRAM processors. □

EXERCISES

1. Compute the speedup, total work done, and efficiency for each of the algorithms given in this section.

2. Let G(V, E) be a directed acyclic graph (dag). The topological sort of G is defined to be a linear ordering of the vertices of G such that if (u, v) is an edge of G, then u appears before v in the linear ordering. Show how to employ the general paradigm introduced in this section to obtain an O(log n) time algorithm for topological sort using n^(3+ε) common CRCW PRAM processors.

3. Present an efficient parallelization of Prim's algorithm for minimum spanning trees (see Section 4.5).

4. Present an efficient parallel algorithm to check whether a given undirected graph is acyclic. Analyze the processor and time bounds.

5. If G is any undirected graph, G^k is defined as follows: There will be an edge between nodes i and j in G^k if and only if there is a path of length k in G between i and j. Present an O(log n log k) time algorithm to compute G^k from G. You can use a maximum of n^3/log n CREW PRAM processors.

6. Present an efficient parallel minimum spanning tree algorithm for the special case when the edge weights are zero and one.

Page 664: Sahni

656 CHAPTER13.PRAM ALGORITHMS

7. Present an efficient parallelization of the Bellman and Ford algorithm (see Section 5.4).

13.8 COMPUTING THE CONVEX HULL

In this section we revisit the problem of constructing the convex hull of n points in 2D in clockwise order. The technique to be used is the same as the one we employed sequentially. The parallel algorithm will have a run time of O(log n) using n CREW PRAM processors. Note that in Chapter 10 we proved a lower bound of Ω(n log n) for the convex hull problem and hence the parallel algorithm to be studied is work-optimal.

The sequential algorithm was based on divide-and-conquer (see Section 3.8.4). It computed the upper hull and the lower hull of the given point set separately. Thus let us restrict our discussion to the computation of the upper hull only. The given points were partitioned into two halves on the basis of their x-coordinate values. All points with an x-coordinate < the median formed the first part. The rest of the points belonged to the second part. Upper hulls were recursively computed for the two halves. These two hulls were then merged by finding the line of tangent.

We adopt a similar technique in parallel. First, the points with the minimum and maximum x-coordinate values are identified. This can be done using the prefix computation algorithm in O(log n) time and n/log n processors. Let p_1 and p_2 be these points. All the points that are to the left of the line segment (p_1, p_2) are separated from those that are to the right. This separation also can be done using a prefix computation. Points of the first (second) kind contribute to the upper (lower) hull. The computations of the upper hull and the lower hull are done independently. From here on we only consider the computation of the upper hull. By "input" we mean all the points that are to the left of (p_1, p_2). We denote the number of such points by N.

Sort the input points according to their x-coordinate values. This can be done in O(log N) time using N processors. In fact there are deterministic algorithms with the same time and processor bounds as well (see the references at the end of this chapter). This sorting is done only once in the computation of the upper hull. Let q_1, q_2, ..., q_N be the sorted order of these points. The recursive algorithm for computing the upper hull is given in Algorithm 13.16. An upper hull is maintained in clockwise order as a list. We refer to the first element in the list as the leftmost point and the last element as the rightmost point.

Step 0. If N <= 2, solve the problem directly.

Step 1. Partition the input into two halves with q_1, q_2, ..., q_(N/2) in the first half and q_(N/2+1), q_(N/2+2), ..., q_N in the second half.

Step 2. Compute the upper hull of each half (in clockwise order) recursively, assigning N/2 processors to each half. Let H_1 and H_2 be the upper hulls.

Step 3. Find the line of tangent (see Figure 3.9) between the two upper hulls. Let (u, v) be the tangent.

Step 4. Drop all the points of H_1 that are to the right of u. Similarly, drop all the points to the left of v in H_2. The remaining part of H_1, the tangent, and the remaining part of H_2 form the upper hull of the given input set.

Algorithm 13.16 Parallel convex hull algorithm

We show that step 3 can be performed in O(1) time using N processors. Step 4 also can be completed in O(1) time. If T(N) is the run time of Algorithm 13.16 for finding the upper hull on an input of N points using N processors, then we have

T(N) = T(N/2) + O(1)

which solves to T(N) = O(log N). The number of processors used is N.

The only part of the algorithmthat remainsto be specifiedis how tofind the tangent (u,v) in 0(1)timeusingN processors.Firststart from themiddlepoint p of H\\. Herethe middlepoint refersto the middleelementof the correspondinglist.Find the tangent of p with H2.Let (p,q) be thetangent. Using(p,q),we can determinewhetheru is to the left of, equalto,or to the right of p in Hi. A A;-ary search(for somesuitableA;) in thisfashion on the pointsof Hiwill revealu.Usethe sameprocedureto isolatev.

Lemma 13.4Let Hi and Hi be two upperhulls with at most rn pointseach. If p is any point of Hi, its tangent q with H2 can be found in O(l)timeusingme processorsfor any fixed e >0.

Proof.If q' is any point in H2,we can checkwhetherq' is to the left of,equalto,ortothe right of q in O(l)timeusinga singleprocessor(seeFigure3.10).If Ipq'xis a right turn and Ipq'yis a left turn, then q is to the right

Page 666: Sahni

658 CHAPTER13.PRAM ALGORITHMS

of q';if Lpq'x and Ipq'yare both right turns, then, q' = q; otherwiseq isto the left of q'.Thus if we have m processors,we can assignoneprocessorto eachpoint of H2 and identify q in 0(1)time.Theidentificationof q canalsobe doneusingan me-ary search(seeSection13.4,Exercise10)in 0(1)timeand me processors,for any fixed e >0. \342\226\241

Lemma 13.5If H\\ and H2 aretwo upperhulls with at most m pointseach,theircommontangentcan becomputedin 0(1)timeusingm processors.Proof.Let u G H\\ and v G H2 be such that (u,v) is the line of tangent.Also let p be an arbitrary point of H\\ and let q G H2 be such that (p,q) isa tangent of H2.Givenp and q, we can checkin 0(1)timewhetheru is tothe left of, equal to,or to the right of p (seeFigure3.11).If (p,q) is alsotangentialtoHi,then p = u. IfLxpqis a left turn, then u is tothe left ofp;elseu is tothe right of p.This suggestsan me-ary searchfor u. Foreachpoint p of Hi chosen,we have todeterminethe tangent from p to H2 andthen decidethe relativepositioningof p with respectto u. Thus indeedweneedm\302\243 x me processorsto determineu in 0(1)time.If we choosee = 1/2,we can makeuseof all the m processors. \342\226\241

The following theorem summarizes these findings.

Theorem 13.27 The convex hull of n points in the plane can be computed in O(log n) time using n CREW PRAM processors. □

Algorithm 13.16 has a speedup of Θ(n); its efficiency is Θ(1).

EXERCISES

1. Show that the vertices of the convex hull of n given points can be identified in O(1) time using a common CRCW PRAM.

2. Present an O(log n / log log n) time CRCW PRAM algorithm for the convex hull problem. How many processors does your algorithm use?

3. Present an O(log n) time n-processor CREW PRAM algorithm to compute the area of the convex hull of n given points in 2D.

4. Given a simple polygon and a point p, the problem is to check whether p is internal to the polygon. Present an O(log n) time (n/log n)-processor CREW PRAM algorithm for this problem.

5. Present an O(1) time algorithm to check whether any three of n given points are collinear. You can use up to n^3 CRCW PRAM processors. Can you decrease the processor bound further?


6. Assume that n points in 2D are given in sorted order with respect to the polar angle subtended at x, where x is the point with the lowest y-coordinate. Present an O(log n)-time CREW PRAM algorithm for finding the convex hull. What is the processor bound of your algorithm?

7. Given two points p = (x1, y1) and q = (x2, y2) in the plane, p is said to dominate q if x1 > x2 and y1 > y2. The dominance counting problem is defined as follows. Given two sets X = {p1, p2, ..., pm} and Y = {q1, q2, ..., qn} of points in the plane, determine for each point pi the number of points in Y that are dominated by pi. Present an O(log(m + n)) time algorithm for dominance counting. How many processors does your algorithm use?

13.9 LOWER BOUNDS

In this section we present some lower bounds on parallel computation related to comparison problems such as sorting, finding the maximum, merging, and so on. The parallel model assumed is called the parallel comparison tree (PCT). This model can be thought of as the parallel analog of the comparison tree model introduced in Section 10.1.

A PCT with p processors is a tree wherein at each node, at most p pairs of comparisons are made (at most one pair per processor). Depending on the outcomes of all these comparisons, the computation proceeds to an appropriate child of the node. Whereas in the sequential comparison tree each node can have at most two children, in the PCT the number of children of any node can be more than two (depending on p). The external nodes of a PCT represent termination of the algorithm. Associated with every path from the root to an external node is a unique permutation. As there are n! different possible permutations of n items and any one of these might be the correct answer for the given sorting problem, the PCT must have at least n! external nodes. A typical computation for a given input on a PCT proceeds as follows. We start at the root and perform p pairs of comparisons. Depending on the outcomes of these comparisons (which in turn depend on the input), we branch to an appropriate child. At this child we perform p more pairs of comparisons. And so on. This continues until we reach an external node, at which point the algorithm terminates and the correct answer is obtained from the external node reached.

Example 13.22 Figure 13.12 shows a PCT with two processors that sorts three given numbers k1, k2, and k3. Rectangular nodes are external nodes that give the final answers. At the root of this PCT, two comparisons are made and hence there are four possible outcomes. There is a child of the root corresponding to each of these outcomes. For example, if both of the comparisons made at the root yielded "yes," then clearly the sorted order of the keys is k1, k2, k3. On the other hand if the root comparisons yielded "yes" and "no," respectively, then k1 is compared with k3, and depending on the outcome, the final permutation is obtained. The depth of this PCT and hence the worst-case run time of this parallel algorithm is two. □

Figure 13.12 PCT with two processors that sorts three numbers (the root makes the comparisons k1 < k2? and k2 < k3?)

The worst-case time of any algorithm on a PCT is the maximum depth of any external node. The average case time is the average depth of an external node, all possible paths being equally likely. Note that in a PCT, while computing the time of any algorithm, we only take into account comparisons. Any other operation such as the addition of numbers, data movement, and so on, is assumed to be free. Also, at any node of the PCT, p pairs of comparisons are performed in one unit of time. As a consequence of these assumptions, any comparison problem can be solved in O(1) time, given enough processors. Since a PCT is more powerful than any of the PRAMs, lower bounds derived for the PCT hold for the PRAMs as well.

Example 13.23 Suppose we are given n numbers from a linear order. There are only (n choose 2) pairs of comparisons that can ever be made. Therefore, if p = (n choose 2), all these comparisons can be made in one unit of time, and as a result we can solve the following problems in one unit of time: selection, sorting, and so on. (Note that a PCT charges only for the comparisons made.) □

13.9.1 A lower bound on average-case sorting

If p ≥ (n choose 2), sorting can be done in O(1) time on a PCT (see Example 13.23). So assume that p < (n choose 2). The lower bound follows from two lemmas.


The first lemma relates the average depth of an external node to the degree (i.e., the maximum number of children of any node) and the number of external nodes.

Lemma 13.6 [Shannon] A tree with degree d and ℓ external nodes has an average depth of at least (log ℓ)/(log d). □

We can apply the above lemma to obtain a lower bound for sorting, except that we don't know what d is. Clearly, ℓ has to be at least n!. Note that at each node of a PCT, we make p pairs of comparisons. So, there can be as many as 2^p possible outcomes. (The first pair of comparison can yield either "yes" or "no"; independently, the second comparison can yield "yes" or "no"; and so on.) If we substitute this value for d, Lemma 13.6 yields a lower bound of (log n!)/p.

A better lower bound is achieved by noting that not all of the 2^p possible outcomes at any node of the PCT are feasible. As an example, take p = 3; suppose the three comparisons made at a given node are x : y, y : z, and x : z. In this case, it is impossible for the three outcomes to be "yes," "yes," and "no."

To obtain a better estimate of d, we introduce a graph. This graph has n nodes, one node per input number. Such a graph Gv is conceived of for each node v in the PCT. For each pair x : y of comparisons made at the PCT node v, we draw an edge between x and y. Thus Gv has p edges and is undirected. We can orient (i.e., give a direction to) each edge of Gv depending on the outcome of the corresponding comparison. Say we direct the edge from x to y if x > y. Note that the degree of the node v is the number of ways in which we can orient the p edges of Gv.

Since the input numbers are from a linear order, any orientation of the edges of Gv that introduces a directed cycle is impossible. The question then is how many such acyclic orientations are possible? This number will be a better estimate of d. U. Manber and M. Tompa have proved the following.

Lemma 13.7 [Manber and Tompa] A graph with n vertices and m edges has at most (1 + 2m/n)^n acyclic orientations. □

Combining Lemmas 13.6 and 13.7, we get the following theorem.

Theorem 13.28 Any PCT with p processors needs an average case time of Ω(log n / log(1 + p/n)) to sort n numbers.

Proof: Using Lemma 13.7, a better estimate for d is (1 + 2p/n)^n. Then, according to Lemma 13.6, the average case time for sorting is

Ω( log n! / (n log(1 + 2p/n)) ) = Ω( n log n / (n log(1 + 2p/n)) ) = Ω( log n / log(1 + 2p/n) ). □


13.9.2 Finding the maximum

Now we prove a lower bound on the problem of identifying the maximum of n given numbers. This problem can also be solved in O(1) time if we have p ≥ (n choose 2).

Theorem 13.29 [Valiant] Given n unordered elements and p = n PCT processors, if MAX(n) is a lower bound on the worst-case time needed to determine the maximum value in parallel time, then MAX(n) ≥ log log n − c, where c is a constant.

Proof: Consider the information determined from the set of comparisons that can be made by time t for some parallel maximum finding algorithm. Some of the elements have been shown to be smaller than other elements, and so they have been eliminated. The others form a set S which contains the correct answer. If at time t two elements not in S are compared, then no progress is made in decreasing set S. If an element in set S and one not in S are compared and the larger element is in S, then again no improvement has been made. Assume that the worst case holds; this means that the only way to decrease the set S is to make comparisons between pairs of its elements.

Imagine a graph in which the nodes represent the values in the input and a directed edge from a to b implies that b is greater than a. A subset of the nodes is said to be stable if no pair from it is connected by an edge. (In Figure 13.13, the nodes e, b, g, and f form a stable set.) Then the size of S at time t can be expressed as

|S at time t| ≥ min { max { h : G contains a stable set of size h } : G is a graph with |S| nodes and n edges }

It has been shown by Turán in On the Theory of Graphs (Colloq. Math., 1954) that the size of S at time t is at least the size of S at time t − 1, squared and then divided by 2p plus the size of S at time t − 1; that is, |S at time t| ≥ |S at time t−1|^2 / (2p + |S at time t−1|). We can solve this recurrence relation using the fact that initially the size of S equals n; this shows that the size of S will be greater than one so long as t < log log n − c. □
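
To see concretely why this recurrence stalls only after about log log n steps, one can iterate it directly. The sketch below is our own illustration (it takes p = n and iterates the bound with equality, which is the adversary's worst case):

    import math

    def rounds(n):
        # Iterate s <- s^2 / (2p + s) with p = n, starting from s = n,
        # and count how many rounds pass before s drops to 1 or below.
        s, p, t = float(n), n, 0
        while s > 1:
            s = s * s / (2 * p + s)
            t += 1
        return t

    for n in (2**8, 2**16, 2**32):
        print(n, rounds(n), round(math.log2(math.log2(n))))   # rounds tracks log log n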

EXERCISES

1. [Valiant] Devise a parallel algorithm that produces the maximum of n unordered elements in log log n + c parallel time, where c is a constant.

2. [Valiant] Devise a parallel sorting algorithm that takes a time of at most 2 log n log log n + O(log n).


Figure 13.13 The set {e, f, b, g} is stable.

3. Theorem: Given n bits, any algorithm for computing the parity of these bits will need Ω(log n / log log n) time in the worst case if there are only a polynomial number of CRCW PRAM processors.

Using this theorem prove that any algorithm for sorting n given numbers will need Ω(log n / log log n) time in the worst case, if the number of processors used is n^O(1).

13.10 REFERENCES AND READINGS

Three excellent texts on parallel computing are:
Introduction to Parallel Algorithms and Architectures: Arrays-Trees-Hypercubes, by Tom Leighton, Morgan-Kaufmann, 1992.
Parallel Algorithms: Design and Analysis, by J. Ja Ja, Addison-Wesley, 1992.
Synthesis of Parallel Algorithms, edited by J. H. Reif, Morgan-Kaufmann, 1992.

For a definition of the PRAM and the metrics see:
"Parallel algorithmic techniques for combinatorial computation," by D. Eppstein and Z. Galil, Annual Reviews in Computer Science 3 (1988): 233-283.
"Performance metrics: keeping the focus on runtime," by S. Sahni and V. Thanvantri, IEEE Parallel and Distributed Technology, Spring 1996: 1-14.

Material on prefix computation and list ranking can be found in the three texts mentioned above.


The work-optimal randomized selection algorithm is based on:
"Expected time bounds for selection," by R. W. Floyd and R. L. Rivest, Communications of the ACM 18, no. 3 (1975): 165-172.
"Derivation of randomized algorithms for sorting and selection," by S. Rajasekaran and J. H. Reif, in Parallel Algorithm Derivation and Program Transformation, edited by R. Paige, J. H. Reif, and R. Wachter, Kluwer, 1993, pp. 187-205.

For a survey of parallel sorting and selection algorithms see "Sorting and selection on interconnection networks," by S. Rajasekaran, DIMACS Series in Discrete Mathematics and Theoretical Computer Science 21, 1995: 275-296.

Preparata's algorithm first appeared in "New parallel sorting schemes," by F. P. Preparata, IEEE Transactions on Computers C27, no. 7 (1978): 669-673. The original Reischuk's algorithm was recursive and appeared in "Probabilistic parallel algorithms for sorting and selection," by R. Reischuk, SIAM Journal of Computing 14, no. 2 (1985): 396-409. The version presented in this chapter is based on "Random sampling techniques and parallel algorithms design," by S. Rajasekaran and S. Sen, in Synthesis of Parallel Algorithms, J. H. Reif, ed., Morgan-Kaufmann, 1993, pp. 411-451. A deterministic algorithm for sorting with a run time of O(log n) using n EREW PRAM processors can be found in "Parallel merge sort," by R. Cole, SIAM Journal on Computing 17, no. 4 (1988): 770-785.

For a proof of Theorem 13.19 see "Optimal and sub-logarithmic time randomized parallel sorting algorithms," by S. Rajasekaran and J. H. Reif, SIAM Journal on Computing 18, no. 3 (1989): 594-607. Solutions to Exercises 2, 3, and 10 of Section 13.6 can be found in this paper.

The general paradigm for the solution of many graph problems is given in "Parallel computation and conflicts in memory access," by L. Kucera, Information Processing Letters (1982): 93-96.

For more material on convex hull and related problems see the text by J. Ja Ja. For the lower bound proof for finding the maximum see "Parallelism in comparison problems," by L. Valiant, SIAM Journal on Computing 4, no. 3 (1975): 348-355.

Theorem 13.28 was first proved in "The average complexity of deterministic and randomized parallel comparison-sorting algorithms," by N. Alon and Y. Azar, SIAM Journal on Computing (1988): 1178-1192. The proof was greatly simplified in "The average-case parallel complexity of sorting," by R. B. Boppana, Information Processing Letters 33 (1989): 145-146. A proof of Lemma 13.7 can be found in "The effect of number of Hamiltonian paths on the complexity of a vertex coloring problem," by U. Manber and M. Tompa, SIAM Journal on Computing 13 (1984): 109-115. Lemma 13.6 was proved in "A mathematical theory of communication," by Shannon, Bell System Technical Journal 27 (1948): 379-423 and 623-656.

13.11 ADDITIONAL EXERCISES

1. Suppose you have a sorted list of n keys in common memory. Give an O(log n/(log p)) time algorithm that takes a key x as input and searches the list for x using p CREW PRAM processors.

2. A sequence of n keys k1, k2, ..., kn is input. The problem is to find the right neighbor of each key in sorted order. For instance if the input is 5.2, 7, 2, 11, 15, 13, the output is 7, 11, 5.2, 13, ∞, 15.

(a) How will you solve this problem in O(1) time using n^3 CRCW PRAM processors?

(b) How will you solve the same problem using a Las Vegas algorithm in O(1) time employing n^2 CRCW PRAM processors?

3. The input is a sequence S of n arbitrary numbers with many duplications, such that the number of distinct numbers is O(1). Present an O(log n) time algorithm to sort S using n/log n priority-CRCW PRAM processors.

4. A, B, and C are three sets of n numbers each, and t is another number. Show how to check whether there are three elements, picked one each from the three sets, whose sum is equal to t. Your algorithm should run in O(log n) time using at most n^2 CRCW PRAM processors.

5. An array A of size n is input. The array can only be of one of the following three types:

Type I: A has all zeros.
Type II: A has all ones.
Type III: A has n/2 ones and n/2 zeros.

How will you identify the type of A in O(1) time using a Monte Carlo algorithm? You can use log n CRCW PRAM processors. Show that the probability of a correct answer will be ≥ 1 − n^{−α} for any fixed α ≥ 1.

6. Input is an array A of n numbers. Any number in A occurs either only once or more than n^{3/4} times. Elements that occur more than n^{3/4} times each are called significant elements. Present a Monte Carlo algorithm with a run time of O(n^{3/4} log n) to identify all the significant elements of A. Prove that the output will be correct with high probability.

7. Let A be an array of size n such that each element is marked with a bucket number in the range [1, 2, ..., m], where m divides n. The number of elements belonging to each bucket is exactly n/m. Develop a randomized parallel algorithm to rearrange the elements of A so that the elements in the first bucket appear first, followed by the elements of the second bucket, and so on. Your Las Vegas algorithm should run in O(log n) time using n/log n CRCW PRAM processors. Prove the correctness and time bound of your algorithm.

8. Input are a directed graph G(V, E) and two nodes v, w ∈ V. The problem is to determine whether there exists a directed path from v to w of length ≤ 3. How will you solve this problem in O(1) time using |V|^2 CRCW PRAM processors? Assume that G is available in common memory in the form of an adjacency matrix.

9. Given is an undirected graph G(V, E) in adjacency matrix form. We need to decide if G has a triangle, that is, three mutually adjacent vertices. Present an O(log n) time, (n^3/log n)-processor CRCW PRAM algorithm to solve this problem.


Chapter 14

MESH ALGORITHMS

14.1 COMPUTATIONAL MODEL

A mesh is an a × b grid in which there is a processor at each grid point. The edges correspond to communication links and are bidirectional. Each processor of the mesh can be labeled with a tuple (i, j), where 1 ≤ i ≤ a and 1 ≤ j ≤ b. Every processor of the mesh is a RAM with some local memory. Hence each processor can perform any of the basic operations such as addition, subtraction, multiplication, comparison, local memory access, and so on, in one unit of time. The computation is assumed to be synchronous; that is, there is a global clock and in every time unit each processor completes its intended task. In this chapter we consider only square meshes, that is, meshes for which a = b. A √p × √p mesh is shown in Figure 14.1(a).

A closely related model is the linear array (Figure 14.1(b)). A linear array consists of p processors (named 1, 2, ..., p) connected as follows. Processor i is connected to the processors i − 1 and i + 1, for 2 ≤ i ≤ p − 1; processor 1 is connected to processor 2 and processor p is connected to processor p − 1. Processors 1 and p are known as the boundary processors. Processor i − 1 (i + 1) is called the left neighbor (right neighbor) of i. Processor 1 does not have a left neighbor and processor p does not have a right neighbor. Here also we assume that the links are bidirectional. A √p × √p mesh has several subgraphs that are √p-processor linear arrays. Often, the individual steps of mesh algorithms can be thought of as operations on linear arrays.

Interprocessor communication in any fixed connection machine occurs with the help of communication links. If two processors connected by an edge want to communicate, they can do so in one unit of time. If there is no edge connecting two given processors that desire to communicate, then communication is enabled using any of the paths connecting them and hence the time for communication depends on the path length (at least for small-sized messages).


Figure 14.1 A mesh-connected computer and a linear array ((a) a √p × √p mesh; (b) a linear array)


It is assumed that in one unit of time a processor can perform a local computation and/or communicate with all its up to four neighbors.

In a mesh, all processors whose first (second) coordinates are the same form a row (column) of the mesh. For example, row i is made up of the processors (i, 1), (i, 2), ..., (i, √p). Each such row or column is a √p-processor linear array. Often, a mesh algorithm consists of steps that are local to individual rows or columns.

14.2 PACKET ROUTING

A single step of interprocessor communication in a fixed connection network can be thought of as the following task, called packet routing: Each processor in the network has a packet of information that has to be sent to some other processor. The task is to send all the packets to their correct destinations as quickly as possible so that at most one packet passes through any link at any time. Since the bandwidth of any communication channel is limited, it becomes necessary to impose the restriction that at most one packet pass through the channel at any time. It is possible that two or more packets arrive at some processor v at the same time and all of them want to use the same link going out of v. In this case, only one packet will be sent out in the next time unit and the rest of the packets will be queued at v for future transmission. We use a priority scheme to decide which packet is transmitted in such cases of link contentions. Farthest destination first (the packet whose destination is the farthest wins), farthest origin first (the packet whose origin is the farthest wins), first-in first-out (FIFO), and so on, are examples of priority schemes.

Partial permutation routing (PPR) is a special case of the routing problem. In PPR, each processor is the origin of at most one packet and each processor is the destination of no more than one packet. Note that on the EREW PRAM, PPR can be performed in one simultaneous write step. But in the case of any fixed connection network, PPR is achieved by sending and receiving packets along communication edges and is often a challenging task. Also, in any fixed connection network, typically, the input is given to processors in some order and the output is also expected to appear in a specified order. Just rearranging the data in the right order may involve several PPRs. Thus any nontrivial algorithm to be designed on a fixed connection network invariably requires PPRs. This is one of the crucial differences between network algorithms and PRAM algorithms.

A packet routing algorithm is judged by its run time, that is, the time taken by the last packet to reach its destination, and its queue length, the maximum number of packets any processor has to store during routing. Note that the queue length is lower bounded by the maximum number of packets destined for any node and the maximum number of packets originating from any node. We assume that a packet not only contains the message (from one processor to another) but also the origin and destination information of this packet. An algorithm for packet routing is specified by the path to be taken by each packet and a priority scheme. The time taken by any packet to reach its destination is dictated by the distance between the packet's origin and destination and the amount of time (referred to as the delay) the packet spends waiting in queues.

Example 14.1 Consider the packets a, b, c, and d in Figure 14.2(a). Their final destinations are shown in Figure 14.2(g). Let us assume the FIFO priority scheme in which ties are broken arbitrarily. Also let each packet take the shortest path from its origin to its destination. At time step t = 1, every packet moves one edge closer to its destination. As a result, packets a and b reach the same node. So, at t = 2, one of a and b has to be queued. Since both a and b have reached this node at the same time, there is a tie. This can be broken arbitrarily. Assume that a has won. Also at t = 2, the packets c and d move one step closer to their final destinations and hence join b (see Figure 14.2(c)). At t = 3, packet b moves out since it has higher priority than c and d. At t = 4, packets c and d contend for the same edge. Since both of these have the same priority, the winner is chosen arbitrarily. Let d be the winner. It takes two more steps for c to reach its destination. By then every packet is at its destination.

The distance packet c has to travel is four. Its delay is two since it has been queued twice (once each because of the packets b and d). So, c has taken six steps in toto. Can the run time be improved using a different priority scheme? Say we use the farthest destination first scheme. Then, at t = 4, packet c will have a higher priority and hence will advance. Under this scheme, then, the run time will reduce to five! □

14.2.1 Packet Routing on a Linear Array

In a linear array, since the links are bidirectional, a processor can receive and send messages from each of its neighbors in one unit of time. This assumption implies that if there is a stream of packets going from left to right and another stream going from right to left, then these two streams do not affect each other; that is, they won't contend for the same link. In this section we show that PPR on a linear array can be done in p − 1 steps or less. Note that in the worst case, p − 1 steps are needed, since, for example, a packet from processor 1 may be destined for processor p. In addition to PPR, we also study some more general routing problems on a linear array.

Example 14.2 In Figure 14.3, packets going from left to right are marked with circles and those going from right to left are marked with ticks. For example, packets a and b have to cross the same edge at the first time step but in opposite directions.


Figure 14.2 Packet routing - an example (snapshots at t = 0 through t = 6)


Figure 14.3 Left and right flows are independent!

Since the edges are bidirectional, there is no contention; both can cross at the same time. Also, note that a packet that originates at node 1 and whose destination is p has to cross p − 1 edges and hence needs at least p − 1 steps. □

Problem 1 [One packet at each origin] On a p-processor linear array, assume that at most one packet originates from any processor and that the destinations of the packets are arbitrary. Route the packets. □

Lemma 14.1 Problem 1 can be solved in p − 1 steps or less.

Proof: Each packet q can be routed using the shortest path between its origin and destination. Consider only the packets that travel from left to right (since those that travel from right to left can be analyzed independently in the same manner). If q originates at processor i and is destined for processor j, then it needs only j − i steps to reach its destination. Note that a packet can only travel one link at a time. There is no delay associated with q since it never gets to meet any other packet. The maximum of this time over all possible packets is p − 1. Also, the queue length of this algorithm is the maximum number of packets destined for any node. □

Problem 2 [At most one packet per destination] On a p-processor linear array, processor i has ki (0 ≤ ki ≤ p) packets initially (for i = 1, 2, ..., p) such that k1 + k2 + ··· + kp = p. Each processor is the destination for exactly one packet. Route the packets. □

Lemma 14.2 If the farthest destination first priority scheme is used, the time needed for a packet starting at processor i to reach its destination is no more than the distance between i and the boundary in the direction the packet is moving. That is, if the packet is moving from left to right, then this time is no more than (p − i) and, if the packet is moving from right to left, this time is ≤ (i − 1).


Figure 14.4 Free sequence

Proof: Consider a packet q at processor i and destined for j. Assume without loss of generality that it is moving from left to right. Ignore the presence of packets that travel from right to left for reasons stated in Example 14.2. Let every packet traverse the shortest path connecting its origin and destination. Packet q can only be delayed by the packets that have destinations ≥ j and are to the left of their destinations. Let k1, k2, ..., kj−1 be the numbers of such packets (at the beginning) at processors 1, 2, ..., j − 1, respectively. (Notice that k1 + k2 + ··· + kj−1 ≤ p − j.)

Let m be such that km−1 > 1 and km' ≤ 1 for m ≤ m' ≤ j − 1. Call the sequence km, km+1, ..., kj−1 the free sequence. Realize that a packet in the free sequence will not be delayed by any other packet in the future. Moreover, at every time step at least one new packet joins the free sequence. Figure 14.4 presents an example. In this figure, the numbers displayed denote the numbers of packets in the corresponding nodes. For example, there are three packets in node i at t = 0. At t = 0, the sequence 1, 0, 1, 1 is a free sequence. Also note in this figure how the number of packets in the free sequence increases as time progresses. For example, at t = 1, one new packet joins the free sequence. At t = 2, four new packets join the free sequence!

Thus, after p − j steps, all packets that can possibly delay q have joined the free sequence. Packet q needs only an additional j − i steps, at most, to reach its destination (see Figure 14.5). The case when the packet moves from right to left is similar. □
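
A small simulation can be used to check Lemma 14.2 experimentally. The sketch below is our own illustration (it considers only the left-to-right stream, represents a packet by its destination, and breaks contentions by the farthest destination first rule); the number of steps it reports never exceeds p − 1:

    import random

    def route_right(packets, p):
        # Route packets moving left to right on a p-processor linear array.
        # At each step and each edge (i, i+1), the waiting packet with the
        # farthest destination wins.  Returns the number of steps used.
        queues = {i: [] for i in range(1, p + 1)}
        for origin, dest in packets:
            queues[origin].append(dest)
        steps = 0
        while any(d != i for i, q in queues.items() for d in q):
            arrived = {i: [] for i in range(1, p + 1)}
            for i in range(1, p):
                waiting = [d for d in queues[i] if d > i]
                if waiting:
                    winner = max(waiting)        # farthest destination first
                    queues[i].remove(winner)
                    arrived[i + 1].append(winner)
            for i in range(1, p + 1):
                queues[i].extend(arrived[i])
            steps += 1
        return steps

    p = 16
    packets = [(random.randint(1, d), d) for d in range(1, p + 1)]  # one packet per destination
    print(route_right(packets, p), "steps; Lemma 14.2 promises at most", p - 1)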


Figure 14.5 Proof of Lemma 14.2

Problem 3 [General packet routing] In a linear array with p processors assume that more than one packet can originate from any processor and more than one packet can be destined for any processor. In addition, the number of packets originating from the processors 1, 2, ..., j is no more than j + f(p) (for any j and some function f). Route the packets.

Lemma 14.3 Under the furthest origin first priority scheme, Problem 3 can be solved within p + f(p) steps.

Proof: Let q be a packet originating at processor i and destined for processor j (to the right of i). Then q can potentially be delayed by at most i + f(p) packets (since only these many packets can originate from the processors 1, 2, ..., i and hence have a higher priority than q). If q is delayed by each of these packets at most once, then it follows that the delay q suffers is ≤ i + f(p). Else, if a packet r with higher priority delays q twice (say), then it means that r has been delayed by another packet that has even higher priority and will never get to delay q. Therefore, the delay for q is ≤ i + f(p). Since q only needs an additional j − i steps to reach its destination, the total time needed for q is ≤ j + f(p). The maximum of this time over all packets is p + f(p). □

Example 14.3 Figure 14.6 illustrates the proof of Lemma 14.3. There are eight packets in this example: a, b, c, d, e, f, g, and h. Let g be the packet of our concern. Packet g can possibly be delayed only by the packets a, b, c, d, e, f, and h. Packet g reaches its destination at t = 9. The distance it travels is two and its delay is seven. In this figure, packets that have crossed node j are not displayed. □

14.2.2 A Greedy Algorithm for PPR on a Mesh

For the PPR problem on a √p × √p mesh, we see that if a packet at processor (1,1) has (√p, √p) as its destination, then it has to travel a distance of 2(√p − 1). Hence 2(√p − 1) is a lower bound on the worst-case routing time of any packet routing algorithm.


Figure 14.6 An example to illustrate Lemma 14.3

A simple PPR algorithm that makes use of the packet routing algorithms we have seen for a linear array is the following. Let q be an arbitrary packet with (i, j) as its origin and (k, ℓ) as its destination. Packet q uses a two-phase algorithm. In phase 1 it travels along column j to row k along the shortest path. In phase 2 it traverses along row k to its correct destination again using the shortest path. A packet can start its phase 2 immediately on completion of its phase 1.

Phase 1 can be completed in √p − 1 steps or less since Lemma 14.1 applies. Phase 2 also takes ≤ √p − 1 steps in accordance with Lemma 14.2. So, this algorithm takes at most 2(√p − 1) steps and is optimal.

But there is a severe drawback with this algorithm, namely, that the queue size needed is as large as √p/2. Let the partial permutation to be routed be such that all packets that originate from column 1 are destined for row √p/2. For this PPR problem, the processor (√p/2, 1) gets two packets (one from above and one from below) at every time step. Since both of these want to use the same link, only one can be sent out and the other has to be queued. This continues until step √p/2, at which time there will be about √p/2 packets in the queue of (√p/2, 1) (see Figure 14.7).
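
A quick calculation confirms this queue growth. The sketch below is our own simplified accounting (it tracks only the processor (√p/2, 1), assumes it receives a packet from each direction until that side is exhausted, and lets it forward one packet per step along its row):

    def greedy_column_queue(sqrt_p):
        # All sqrt_p packets of column 1 are destined for row r = sqrt_p // 2;
        # track the queue at processor (r, 1) under the greedy two-phase algorithm.
        r = sqrt_p // 2
        above = r - 1                 # packets approaching (r, 1) from above
        below = sqrt_p - r            # packets approaching it from below
        queue = 1                     # its own packet
        max_queue = queue
        while above > 0 or below > 0:
            arrivals = (1 if above > 0 else 0) + (1 if below > 0 else 0)
            above, below = max(above - 1, 0), max(below - 1, 0)
            queue += arrivals - 1     # one packet leaves on the row link per step
            max_queue = max(max_queue, queue)
        return max_queue

    for s in (16, 64, 256):
        print(s, greedy_column_queue(s), s // 2)   # grows like sqrt(p)/2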

Ideally we would like to design algorithms that require a queue size that is O(1) (or a slowly increasing function of p such as O(log p)).


Figure 14.7 The greedy algorithm needs large queues

14.2.3 A Randomized Algorithm with Small Queues

The two-phase algorithm can be modified to ensure queues of size O(log p) with the help of randomization. There are three phases in the new algorithm and the run time is 3√p + o(√p). Let q be any packet whose origin and destination are (i, j) and (k, ℓ), respectively. The algorithm employed by q is depicted in Algorithm 14.1. In this algorithm, the three phases are disjoint; that is, a packet can start its phase 2 only after all packets have completed their phase 1, and it can start its phase 3 only after all packets have completed their phase 2. This constraint makes the analysis simpler.

Theorem 14.1 Algorithm 14.1 terminates in time 3√p + O(p^{1/4} log p).

Proof: Phase 1 takes √p time or less applying Lemma 14.1, since no packet suffers any delays.

Consider a packet that starts phase 2 at (i', j). Without loss of generality assume that it is moving to the right. The number of packets starting this phase from processor (i', j) is a binomial distribution, B(√p, 1/√p). This is because there are √p packets in column j and each one can end up at the end of phase 1 at (i', j) with probability 1/√p. In turn, the number of packets that start their phase 2 from (i', 1), (i', 2), ..., or (i', j) is a binomial distribution, B(j√p, 1/√p). (We have made use of the fact that the sum of B(n1, x) and B(n2, x) is B(n1 + n2, x).) The mean of this variable is j. Using Chernoff bounds (Equation 1.1), this number is no more than j + 3α p^{1/4} log_e p with probability ≥ 1 − p^{−α−1} for any α ≥ 1.


Phase 1. Packet q chooses a random processor (i', j) in the column of its origin and traverses to (i', j) using the shortest path.

Phase 2. Packet q traverses along row i' to (i', ℓ).

Phase 3. Finally, packet q travels along column ℓ to its correct destination.

Algorithm 14.1 A randomized packet routing algorithm

Thus this number is j + O(p^{1/4} log p). Now, applying Lemma 14.3, we see that phase 2 terminates in time √p + O(p^{1/4} log p).

At the beginning of phase 3, there are ≤ √p packets starting from any column and each processor is the destination of at most one packet. Thus in accordance with Lemma 14.2, phase 3 takes ≤ √p steps. □

Note: In this analysis it was shown that for a specific packet there is a high probability that it will terminate within the stated time bounds. But for the algorithm to terminate within a specified amount of time, every packet should reach its destination within this time. If the probability that an individual packet takes more than time T (for some T) to reach its destination can be shown to be ≤ p^{−α−1}, then the probability that there is at least one packet that takes more than T time is ≤ p^{−α−1} p = p^{−α}. That is, every packet will reach its destination within time T with probability ≥ 1 − p^{−α}.

The queue length of Algorithm 14.1 is O(log p). During any phase of routing, note that the queue length at any processor is no more than the maximum of the number of packets at the beginning of the phase in this processor and the number of packets in this processor at the end of the phase. Consider any processor (i, j) in the mesh. During phase 1, only one packet starts from any processor and the number of packets that end up at this processor at the end of phase 1 is B(√p, 1/√p). The mean of this binomial is 1. Using Chernoff bounds (Equation 1.1), this number can be shown to be O(log p). During phase 2, O(log p) packets start from any processor. Also, O(log p) packets end up in any processor (the proof of this is left as an exercise). In phase 3, O(log p) packets start from any processor and only one packet ends up in any processor. Therefore, the queue length of the whole algorithm is O(log p). □
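
The phase-boundary occupancies can also be estimated empirically. The following sketch is our own illustration (it simulates only where packets sit between phases, not the step-by-step motion or the in-transit queues); the maxima it reports stay small, in line with the O(log p) claim above:

    import random
    from collections import Counter

    def phase_boundary_queues(sqrt_p, trials=20):
        # For random permutations on a sqrt_p x sqrt_p mesh, each packet picks a
        # random intermediate row (phase 1); report the largest number of packets
        # held by any processor at the phase 1/2 and phase 2/3 boundaries.
        worst = 0
        for _ in range(trials):
            p = sqrt_p * sqrt_p
            dests = list(range(p)); random.shuffle(dests)
            after1, after2 = Counter(), Counter()
            for origin in range(p):
                oc = origin % sqrt_p                    # origin column
                dr, dc = divmod(dests[origin], sqrt_p)  # destination row, column
                ir = random.randrange(sqrt_p)           # random intermediate row
                after1[(ir, oc)] += 1                   # location after phase 1
                after2[(ir, dc)] += 1                   # location after phase 2
            worst = max(worst, max(after1.values()), max(after2.values()))
        return worst

    for s in (16, 32, 64):
        print(s, phase_boundary_queues(s))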


EXERCISES

1. In Example 14.1, compute the run times for the priority schemes farthest origin first and last-in first-out.

2. On a p-node linear array there are two packets at each node to begin with. Assume p is even. Packets at node 1 are destined for node p/2 + 1, packets at node 2 are destined for node p/2 + 2, and so on. Let each packet take the shortest path from its origin to its destination. Compute the routing time for this problem under the following priority schemes: farthest origin first, farthest destination first, FIFO, and LIFO.

3. Do the above problem when the packets from node one are destined for node p, packets from node two are destined for p − 1, and so on.

4. Partition a √p × √p mesh into four quadrants as shown in Figure 14.8. There is a packet at each node to start with. Packets in quadrant I are to be exchanged with packets in quadrant IV. Also, packets in quadrant II have to be exchanged with packets in quadrant III. The ordering of packets in individual quadrants should not change. Show how you route the packets in time ≤ √p.

Figure 14.8 Figure for Exercise 4 (the four quadrants I, II, III, and IV)

5. In the three-phase randomized algorithm prove that the number of packets that end up in any processor at the end of phase 2 is O(log p).

6. The randomized mesh routing algorithm (Algorithm 14.1) of this section can be improved as follows. In phase 1, partition the mesh into slices so that each slice consists of √p/q rows (for some integer q ≥ 1). A packet that starts from (i, j) chooses a random processor in the same column and slice as its origin and goes there using the shortest path. Phases 2 and 3 remain the same. Show that this algorithm runs in time 2√p + O(√p/q) and has a queue length O(q). Note that when q = log p, the run time of this algorithm is 2√p + o(√p) and the queue length is O(log p).

7. Suppose that in a p-processor linear array at most k packets originate from any processor and at most k packets are destined for any processor. Show how to perform this routing task in time kp/2, or less.

8. In a √p × √p mesh, assume that at most one packet originates from any processor and that the destination of each packet is chosen uniformly randomly to be any processor of the mesh. Prove that for this special case the greedy algorithm runs in time 2√p + o(√p) with a queue length of O(log p).

9. Suppose that at most one packet originates from any processor of a √p × √p mesh and that the destination of any packet is no more than d distance away from its origin. Present an O(d) algorithm for routing this special case.

10. A p-processor ring is a p-processor linear array in which the processors 1 and p are connected by a link (this link is also known as the wraparound connection). Show that the PPR problem can be solved on a p-processor ring in time p/2.

11. How fast can you solve Problem 2 on a ring (see Exercise 10)? How about Problem 3?

12. A √p × √p torus is a √p × √p mesh in which each row and each column has a wraparound connection. A 5 × 5 torus is shown in Figure 14.9. Present an implementation of the randomized three-phase algorithm (Algorithm 14.1) on a torus to achieve a run time of 1.5√p + o(√p).

13. A string a1 a2 ... ap from some alphabet Σ is called a palindrome if it is identical to ap ap−1 ... a1. A string of length p is input on a p-processor linear array. How can you test whether the string is a palindrome in O(p) time?

14.3 FUNDAMENTAL ALGORITHMS

In this section we present mesh algorithms for some basic operations such as broadcasting, prefix sums computation, and data concentration. All these algorithms take O(√p) time on a √p × √p mesh.


Figure 14.9 A 5 × 5 torus

For many nontrivial problems including the preceding, sorting, convex hull, and so on, 2(√p − 1) is a lower bound. This follows from the fact that a data item from one corner of the mesh needs 2(√p − 1) time to reach an opposite corner. In the worst case, two processors in opposite corners have to communicate. The distance 2(√p − 1) is the diameter of the mesh. (The diameter of any interconnection network is defined to be the maximum distance between any two processors in the network.) For any nontrivial problem to be solved on an interconnection network, the diameter is usually a lower bound on the run time.

The bisection width of a network can also be used to derive lower bounds. The bisection width of a network is the minimum number of links that have to be removed to partition the network into two identical halves. For example, consider a 4 × 4 mesh. If we remove the four links ((1,2),(1,3)), ((2,2),(2,3)), ((3,2),(3,3)), and ((4,2),(4,3)), two identical 4 × 2 submeshes arise. Here the bisection width is 4. In general, the bisection width of a √p × √p mesh can be seen to be √p.

The problem of k − k routing is defined as follows. At most k packets originate from any processor and at most k packets are destined for any processor of the network. Route these packets. Let b be the bisection width of the network under concern. By definition, removal of b links results in an even partitioning of the network. If the routing problem is such that exactly k packets originate from and are destined for any processor and the packets from one half have to be exchanged with packets from the other half, this exchange can happen only through these b links. Thus any routing algorithm will need at least (kn/2)/b = kn/(2b) time to perform this routing on an n-processor bisection-b network. On a √p × √p mesh, this lower bound becomes k√p/2.

14.3.1 Broadcasting

The problem of broadcasting in an interconnection network is to send a copy of a message that originates from a particular processor to a specified subset of other processors. Unless otherwise specified, this subset is assumed to consist of every other processor. Broadcasting is a primitive form of interprocessor communication and is widely used in the design of several algorithms. Let L be a linear array with the processors 1, 2, ..., p. Also let M be a message that originates from processor 1. Message M can be broadcast to every other processor as follows. Node 1 sends a copy of M to processor 2, which in turn forwards a copy to processor 3, and so on. This algorithm takes p − 1 steps and this run time is the best possible. If the processor of message origin is different from processor 1, a similar strategy could be employed. If processor i is the origin, i could start by making two copies of M and sending a copy in each direction.

In the case of a √p × √p mesh broadcasting can be done in two phases. If (i, j) is the processor of message origin, in phase 1, M could be broadcast to all processors in row i. In phase 2, broadcasting of M is done in each column. This algorithm takes ≤ 2(√p − 1) steps. This can be expressed in a theorem.

Theorem 14.2 Broadcasting on a p-processor linear array can be completed in p steps or less. On a √p × √p mesh the same can be performed in ≤ 2(√p − 1) = O(√p) time. □

Example 14.4 On a 4 × 4 mesh, let the message to be broadcast originate at (2,3). In phase 1, this message is broadcast in row 2. The nodes (2,1), (2,2), (2,3), and (2,4) get the message at the end of phase 1. In phase 2, node (2,1) broadcasts in column 1; node (2,2) broadcasts in column 2; and nodes (2,3) and (2,4) broadcast in columns 3 and 4, respectively (see Figure 14.10). □
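
The two-phase broadcast is easy to trace. The sketch below (our illustration, using 0-based indices) computes the time step at which each processor receives the message and compares the worst arrival with the 2(√p − 1) bound:

    def broadcast_times(sqrt_p, si, sj):
        # Arrival time at each processor when the message originating at (si, sj)
        # travels along row si in phase 1 and then down/up every column in phase 2.
        t = [[0] * sqrt_p for _ in range(sqrt_p)]
        for j in range(sqrt_p):
            row_time = abs(j - sj)                 # phase 1: along row si
            for i in range(sqrt_p):
                t[i][j] = row_time + abs(i - si)   # phase 2: along column j
        return t

    times = broadcast_times(4, 1, 2)               # origin (2,3) in 1-based terms
    print(max(max(row) for row in times), 2 * (4 - 1))   # worst arrival vs. 2(sqrt(p) - 1)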

14.3.2 Prefix Computation

Let Σ be any domain in which the binary associative unit time computable operator ⊕ is defined (see Section 13.3.1). Recall that the prefix computation problem on Σ has as input n elements from Σ, say, x1, x2, ..., xn. The problem is to compute the n elements x1, x1 ⊕ x2, ..., x1 ⊕ x2 ⊕ x3 ⊕ ··· ⊕ xn. The output elements are often referred to as the prefixes. For simplicity, we refer to the operation ⊕ as addition.


Figure 14.10 Broadcasting in a mesh (phase 1 and phase 2 on a 4 × 4 mesh)

In the case of a linear array with p processors, assume that there is an element xi at processor i (for i = 1, 2, ..., p). We have to compute the prefixes of x1, x2, ..., xp. After this computation, processor i should have the value x1 ⊕ x2 ⊕ ··· ⊕ xi. One way of performing this computation is as follows. In step 1, processor 1 sends x1 to the right. In step 2, processor 2 computes x1 ⊕ x2, stores this answer, and sends a copy to its right neighbor. In step 3, processor 3 receives x1 ⊕ x2 from its left neighbor, computes x1 ⊕ x2 ⊕ x3, stores this result, and also sends a copy to the right neighbor. And so on. In general in step i, processor i adds the element received from its left neighbor to xi, stores the answer, and sends a copy to the right. This algorithm (Algorithm 14.2) then will take p steps to compute all prefixes. Thus we get the following lemma.

Lemma 14.4 Prefix computation on a p-processor linear array can be performed in p steps. □

A similar algorithm can be adopted on a mesh also. Consider a √p × √p mesh in which there is an element of Σ at each processor. Since the mesh is a two-dimensional structure, there is no natural linear ordering of the processors. We could come up with many possible orderings. Any such ordering of the processors is called an indexing scheme. Examples of indexing schemes are row major, column major, snakelike row major, blockwise snakelike row major, and so on (see Figure 14.11).


Processor i (in parallel for 1 ≤ i ≤ p) does:

if (i = 1) processor 1 sends x1 to the right in step 1;
else if (i = p) processor p receives an element (call it z_{p−1}) in step p from processor p − 1 and computes and stores z_p = z_{p−1} ⊕ x_p;
else processor i receives an element (call it z_{i−1}) in step i from processor i − 1, computes and stores z_i = z_{i−1} ⊕ x_i, and sends z_i to processor i + 1;

Algorithm 14.2 Prefix computation on a linear array
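
When Algorithm 14.2 is simulated sequentially, the pipelined data movement collapses into an ordinary left-to-right scan. The following Python sketch (ours, with 0-based indexing and ⊕ passed in as an ordinary function) mirrors what processor i stores at step i:

    def linear_array_prefix(x, op=lambda a, b: a + b):
        # In step i, processor i combines the value received from its left
        # neighbor with x_i, stores it, and forwards a copy to the right.
        result, carry = [], None
        for xi in x:
            carry = xi if carry is None else op(carry, xi)
            result.append(carry)
        return result

    print(linear_array_prefix([3, 1, 4, 1, 5]))   # [3, 4, 8, 9, 14]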

In the row major indexing scheme, processors are ordered as (1,1), (1,2), ..., (1,√p), (2,1), (2,2), ..., (2,√p), ..., (√p,√p). In the snakelike row major indexing scheme, they are ordered as (1,1), (1,2), ..., (1,√p), (2,√p), (2,√p−1), ..., (2,1), (3,1), (3,2), ..., (√p,√p); that is, it is the same as the row major ordering except that alternate rows reverse. In the blockwise snakelike row major indexing scheme, the mesh is partitioned into small blocks of appropriate size. Within each block the processors can be ordered in any fashion. The blocks themselves are ordered according to the snakelike row major scheme.

The problem of computing prefix sums on the mesh can be reduced to three phases in each of which the computation is local to the individual rows or columns (Algorithm 14.3). This algorithm assumes the row major indexing scheme. The prefix computations in phases 1 and 2 take √p steps each (c.f. Lemma 14.4), the shifting in phase 2 takes one step, and the broadcasting in phase 3 takes √p steps. The final update of the answers needs an additional step.

Theorem 14.3 Prefix computation on a √p × √p mesh in row major order can be performed in 3√p + 2 = O(√p) steps. □

Example 14.5 Consider the data on the 4 × 4 mesh of Figure 14.12(a) and the problem of prefix sums under the row major indexing scheme. In phase 1, each row computes its prefix sums (Figure 14.12(b)). In phase 2, prefix sums are computed only in the fourth column (Figure 14.12(c)). Finally, in phase 3, the prefix sums are updated (Figure 14.12(d)). □


Figure 14.11 Examples of indexing schemes (row major, snakelike row major, column major, and blockwise snakelike row major)

Figure 14.12 Prefix computation


Phase 1. Row i (for i = 1, 2, ..., √p) computes the prefixes of its √p elements. At the end, the processor (i, j) has y(i,j) = x(i,1) ⊕ x(i,2) ⊕ ··· ⊕ x(i,j).

Phase 2. Only column √p computes prefixes of the sums computed in phase 1. Thus at the end, processor (i, √p) has z(i,√p) = y(1,√p) ⊕ y(2,√p) ⊕ ··· ⊕ y(i,√p). After the computation of prefixes shift them down by one processor; i.e., have processor (i, √p) send z(i,√p) to processor (i + 1, √p) (for i = 1, 2, ..., √p − 1).

Phase 3. Broadcast z(i,√p) in row i + 1 (for i = 1, 2, ..., √p − 1). Node j in row i + 1 finally updates its result to z(i,√p) ⊕ y(i+1,j).

Algorithm 14.3 Prefix computation on a mesh
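
A sequential simulation of the three phases of Algorithm 14.3 is shown below (our own sketch; the input is a square list-of-lists in row major order and ⊕ is taken to be addition):

    def mesh_prefix(a):
        # Phase 1: prefix sums within each row.  Phase 2: prefix sums down the
        # last column.  Phase 3: every row i > 0 adds the (shifted) column value
        # of the previous row to each of its entries.
        n = len(a)
        y = [[0] * n for _ in range(n)]
        for i in range(n):                         # phase 1
            s = 0
            for j in range(n):
                s += a[i][j]
                y[i][j] = s
        z, s = [0] * n, 0                          # phase 2
        for i in range(n):
            s += y[i][n - 1]
            z[i] = s
        for i in range(1, n):                      # phase 3
            for j in range(n):
                y[i][j] += z[i - 1]
        return y

    print(mesh_prefix([[1, 2], [3, 4]]))           # [[1, 3], [6, 10]]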

Prefix computations with respect to many other indexing schemes can also be performed in O(√p) time on a √p × √p mesh (see the exercises).

14.3.3 Data Concentration

In a p-processor interconnection network assume that there are d < p data items distributed arbitrarily with at most one data item per processor. The problem of data concentration is to move the data into the first d processors of the network, one data item per processor. This problem is also known as packing. In the case of a p-processor linear array, we have to move the data into the processors 1, 2, ..., d. On a mesh, we might require the data items to move according to any indexing scheme of our choice. For example, the data could be moved into the first ⌈d/√p⌉ rows.

Data concentration on any network is achieved by first performing a prefix computation to determine the destination of each packet and then routing the packets using an appropriate packet routing algorithm.

Let L be a p-processor linear array with d data items. To find the destination of each data item, we make use of a variable x. If processor i has a data item, then it sets xi = 1; otherwise it sets xi = 0. Let the prefixes of the sequence x1, x2, ..., xp be y1, y2, ..., yp. If processor i has a data item, then the destination of this item is yi. The destinations for the data items having been determined, they are routed.


Figure 14.13 Data concentration on a linear array (initial data location, prefix computation, final data location)

Prefix computation (Lemma 14.4) as well as packet routing on a linear array (Lemma 14.1) takes p time steps each. Thus the total run time is 2p.

Example 14.6 Consider a six-processor linear array in which there is an item in processors 1, 3, 4, and 6. Then, (x1, x2, x3, x4, x5, x6) = (1, 0, 1, 1, 0, 1) and (y1, y2, y3, y4, y5, y6) = (1, 1, 2, 3, 3, 4). So, the items will be sent to the processors 1, 2, 3, and 4 as expected (see Figure 14.13). □
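
The destination computation is just a prefix sum over the occupancy indicators. The sketch below (our own illustration) reproduces the destinations of Example 14.6:

    def concentration_destinations(has_item):
        # Processor i (1-based) holding an item is sent to processor y_i,
        # the prefix sum of the indicator sequence.
        dest, count = {}, 0
        for i, flag in enumerate(has_item, start=1):
            count += flag
            if flag:
                dest[i] = count
        return dest

    print(concentration_destinations([1, 0, 1, 1, 0, 1]))   # {1: 1, 3: 2, 4: 3, 6: 4}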

On a mesh too, the same strategy of computing prefixes followed by packet routing can be employed. Prefix computation takes 3√p + 2 steps (c.f. Theorem 14.3), whereas packet routing can be done in 3√p + O(p^{1/4} log p) steps (c.f. Theorem 14.1).

Example 14.7 Figure 14.14 shows a mesh in which there are six data items a, b, c, d, e, and f. The parallel variable x takes a value of one corresponding to any element and zero otherwise. Prefix sums are computed on x1, x2, ..., x16, and finally the data items are routed to their destinations. We have assumed the row major indexing scheme. □

Theorem 14.4 Data concentration on a p-processor linear array takes 2p steps or less. On a √p × √p mesh, it takes 6√p + O(p^{1/4} log p) steps. □

14.3.4 Sparse Enumeration Sort

An instance of sorting in which the number of keys to be sorted is much less than the network size is referred to as the sparse enumeration sort.


Figure 14.14 Data concentration on a 4 × 4 mesh (initial data location, prefix computation, final data location)

If the network size is p, the number of keys to be sorted is typically assumed to be p^ε for some constant ε ≤ 1/2. In the following discussion we assume that ε = 1/2. On the mesh, sparse enumeration sort can be done by computing the rank of each key and routing the key to its correct position in sorted order (see the proof of Theorem 13.15).

Let the sequence to be sorted be X = k1, k2, ..., k√p. We need to sort X using a √p × √p mesh. Assume that the key kj is input at the processor (1, j) (for j = 1, 2, ..., √p). We also require the final output to appear in the first row of the mesh in nondecreasing order, one key per processor. To begin with, kj is broadcast in column j so that each row has a copy of X. In row i compute the rank of ki. This is done by broadcasting ki to all processors in row i followed by a comparison of ki with every key in the input and then by a prefix computation. The rank of kj is sent to processor (1, j). Finally, the keys are routed to their correct destinations in sorted order. In particular, the key whose rank is r is sent to the processor (1, r). A formal description of this algorithm appears as Algorithm 14.4.

Algorithm 14.4 is a collection of operations local to the columns or the rows. The operations involved are prefix computation, broadcast, routing (see Exercise 10), and comparison, each of which can be done in O(√p) time. Thus the whole algorithm runs in time O(√p).

Theorem 14.5 Sparse enumeration sort can be completed in O(√p) time on a √p × √p mesh when the number of keys to be sorted is at most √p. □

Example 14.8 Consider the problem of sorting the four keys k1, k2, k3, k4 = 8, 5, 3, 7 on a 4 × 4 mesh. Input to the mesh is given in the first row (Figure 14.15(a)) and the output should also appear in the same row (Figure 14.15(e)).


Step 1. In parallel, for 1 ≤ j ≤ √p, broadcast kj along column j.

Step 2. In parallel, for 1 ≤ i ≤ √p, broadcast ki along row i.

Step 3. In parallel, for 1 ≤ i ≤ √p, compute the rank of ki in row i using a prefix sums computation.

Step 4. In parallel, for 1 ≤ j ≤ √p, send the rank of key kj to (1, j).

Step 5. In parallel, for 1 ≤ r ≤ √p, route the key whose rank is r to the node (1, r).

Algorithm 14.4 Sparse enumeration sort

In step 1 of the algorithm, keys are broadcast along columns (Figure 14.15(b)). In step 2, ki is broadcast along row i (for 1 ≤ i ≤ 4). At the end of step 4, the ranks of the keys are available in the first row. Figure 14.15(d) shows the keys and their ranks. Finally, in step 5, keys are routed according to their ranks (Figure 14.15(e)). □
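
Sequentially, the rank-and-route idea of Algorithm 14.4 looks as follows (our own sketch; ties are broken by input position, which plays the role of the prefix computation in step 3):

    def sparse_enumeration_sort(keys):
        # The rank of a key is the number of keys smaller than it (ties broken
        # by index); the key of rank r goes to output position r.
        n = len(keys)
        out = [None] * n
        for i, k in enumerate(keys):
            rank = sum(1 for j, other in enumerate(keys)
                       if other < k or (other == k and j < i))
            out[rank] = k
        return out

    print(sparse_enumeration_sort([8, 5, 3, 7]))   # [3, 5, 7, 8]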

EXERCISES

1. Let x1, x2, ..., xn be elements from Σ in which ⊕ is an associative unit time computable operator. The suffix computation problem is to compute x1 ⊕ x2 ⊕ ··· ⊕ xn, x2 ⊕ x3 ⊕ ··· ⊕ xn, ..., xn−1 ⊕ xn, xn. Present an O(p) time algorithm for the suffix computation problem on a p-node linear array.

2. Show how to solve the suffix computation problem on a √p × √p mesh in time O(√p).

3. Compute the prefix sums on the mesh of Figure 14.16 for the following indexing schemes: row major, snakelike row major, column major, and snakelike column major.

4. Show that prefix computations with respect to the following indexing schemes can also be performed in O(√p) time on a √p × √p mesh: snakelike column major and blockwise row major (where the blocks are of size p^{1/4} × p^{1/4}). Employ the row major indexing scheme within each block.


Figure 14.15 Sparse enumeration sort on a mesh

3 4 1 2
5 6 3 4
1 5 8 6
7 9 2 3

Figure 14.16 Figure for Exercise 3



5. On a p-processor linear array there are p items per processor. Show how you compute the prefixes of these p^2 items in O(p) time. The indexing scheme to be used is the following: all the items in processor 1 are ordered first, all items in processor 2 are ordered next, and so on.

6. On a √p × √p mesh there are √p items per processor. Show how you compute the prefixes of these p√p items in O(√p) time. Use the same indexing scheme as in Exercise 5.

7. Let f(x) = a_n x^n + a_{n−1} x^{n−1} + ··· + a_1 x + a_0. Present linear array and mesh algorithms to evaluate the polynomial f at a given point y. What are the run times of your algorithms?

8. Present efficient linear array and mesh algorithms for the segmented prefix problem (see Section 13.3, Exercise 5) and analyze their time complexities.

9. You are given a sequence A of p elements and an element x. You are to rearrange the elements of A so that all elements of A that are ≤ x appear first (in successive processors) followed by the rest of the elements. Present an O(p) time algorithm on a p-processor linear array and an O(√p) time algorithm on a √p × √p mesh for this.

10.Presentan 0(^/p) timedeterministicalgorithmfor step5 of Algorithm14.4.11.Let A be a sequenceof p keys. Show how you computethe rank of a

given key x in A on a p-processorlineararray as well ason a y/p x y/pmesh.Therun timesshouldbe 0(p)and 0(y/p), respectively.

12.Let M. bea y/p x y/p meshand letA bea y/p x y/p matrixstoredin M.in row majororder,oneelementperprocessor.Considerthe followingrecursivealgorithmfor transposingA.

(a) Partitionthe matrixinto four submatricesof size^ x -^r each;An A12A21 A22

let the partitionbe

(b) InterchangeA12 with A21 \342\226\240

(c) Recursivelytransposeeachsubmatrix.

Page 699: Sahni

14.4.SELECTION 691

Show that this algorithmis correctand alsodeterminethe run timeofthis algorithm.

13.MatricesA and Baretwo y/p x y/p matricesstoredin a y/p x ^Jpmeshin row majororder.Showhow to multiply them.What is the run timeof your algorithm?

14.Show how tocomputethe FFT(seeSection9.3)of a vectorof lengthp in 0(p)timeon a p-processorlineararray.

15.Implementthe FFTalgorithmof Section9.3on a y/px y/p mesh.Whatis the run timeof your algorithm?

14.4 SELECTIONGiven a sequenceof n keys and an integeri, 1 < i < n, the problemofselectionis to find the ith smallestkey from the sequence.We have seenboth sequentialalgorithms(Section3.6)and PRAM algorithms(Section13.4)for selection.We considertwo different versions of selectionon themesh. In the first version we assumethat p \342\200\224 n, p being the numberofprocessorsand n beingthe numberof input keys. In the secondversion weassumethat n > p. In the caseof a PRAM, the slow-downlemmacan beemployed to derive an algorithmfor the secondcasegiven an algorithmforthe first caseand preservethe work done.But no suchgeneralslow-downlemmaexistsfor the mesh.Thus it becomesessentialto handlethe secondversion separately.

14.4.1A RandomizedAlgorithmfor n = p (*)The work-optimalalgorithmof Section13.4.5can be adaptedto runoptimally on the meshalso.A summary of this algorithmfollows. If X =k\\,k'2,.\342\226\240\342\226\240,kn is the input, the algorithmchoosesa randomsample(callit S)from X and identifies two elementsl\\ and l<i from S.Theelementschosenaresuch that they bracketthe elementto beselectedwith highprobabilityand alsothe numberof input keys that are in the range [Zi, I2] is small.

After choosingl\\ and I2, we determinewhetherthe elementto beselectedis in the range [l\\, I2].If this is the case,we proceedfurther and the elementto beselectedis the

(i\342\200\224 |.X\"i|)th elementof X2. Ifthe elementto beselectedis not in the range [Zi,Z^], we start allover again.

Theaboveprocessof samplingand eliminationis repeateduntil thenumber of remainingkeys is < n0'4. After this, we performan appropriateselection from out of the remainingkeys usingthe sparseenumerationsort(Theorem14.5).For moredetails,seeAlgorithm 13.9.

Page 700: Sahni

692 CHAPTER14.MESHALGORITHMS

A stagerefersto one run of the while loop.As shown in Section13.4.5,thereareonly 0(1)stagesin the algorithm.

Step1of Algorithm 13.9takesO(l)timeon the mesh.Theprefixcomputations of steps2 and 5 can bedonein a totalof 0(y/p) time(c.f.Theorem14.3).Concentrationof steps3 and 6 takes0(y/p) timeeach(seeTheorem14.4).Also, sparseenumerationsort takesthe sametime in steps3 and6 in accordancewith Theorem14.5.Theselectionsof steps4 and 6 takeonly 0(1)timeeachsincetheseareselectionsfrom sortedsequences.Thebroadcastsof steps2,4, and 5 take0(^/p) timeeach(c.f.Theorem14.2).As a result we arrive at the following theorem.

Theorem14.6Selectionfrom n = p keys canbeperformedin O(yfp) timeon a y/p x ^Jp mesh. \342\226\241

14.4.2 RandomizedSelectionfor n >p (*)Now we considerthe problemof selectionwhen the numberof keys is largerthan the network size.In particular,assumethat n = pc for someconstantc > 1.Algorithm 13.9can be used for this caseas well with someminormodifications.Eachprocessorhas - keys to beginwith. Theconditionforthe whilestatementis changedto (N > D) (whereD is a constant).In step1a processorincludeseachof its keyswith probability Afl_(11/3c).So,thisstepnow takestime^. Thenumberof keys in the sampleis 0(Nllic)= o(^/p).Step2 remainsthe sameand stilltakes0(y/p) time. Sincethereare only0(N1'3c)samplekeys, they canbeconcentratedand sortedin step3 in time0(y/p) (c.f.Theorems14.4and 14.5).Step4 takesO(yfp) timeas dosteps5 and 6.So,eachstagetakestime0(-+ y/p).

Lemma13.3can be used to show that the numberof keys that surviveat the endof any stageis < 2v/aiV(1-(1/6c))J^N = 0(N^-^^C^ y/E^N),whereN is the numberof alive keys at the beginningof this stage.This inturn impliesthereareonly 0(loglogp)stagesin the algorithm.In summary,we have the following theorem.

Theorem14.7If n = pc for someconstant c > 1,selectionfrom n keyscan beperformedon a ^fp x y/p meshin timeO ((^ + ^/p) loglogpJ. \342\226\241

14.4.3 A DeterministicAlgorithmFor n >p

In this sectionwe presenta deterministicalgorithmfor selectionwhoseruntimeis 0(-loglogp+ y/p logn).Thebasicideabehindthis algorithmisthe sameas the oneemployedin the sequentialalgorithmof Section3.6.The

Page 701: Sahni

14.4.SELECTION 693

sequentialalgorithmpartitionsthe input into groups(of size,say, 5), findsthe medianof eachgroup,and computesrecursivelythe median(callit M)of thesegroupmedians.Then the rank tm of M in the input is computed,and as a result,allelementsfrom the input that areeither<M or >M aredropped,dependingon whetheri > r^ or i <tm,respectively.Finally, anappropriateselectionis performedfrom the remainingkeys recursively. Wcshowed that the run timeof this algorithmwas 0(n).

If one has toemploy this algorithmon an interconnectionnetwork, onehas toperform periodicloadbalancing(i.e.,distributingthe remainingkeysuniformly amongall the processors).Load balancingis a time-consumingoperationand can be avoided as follows. To begin with, eachprocessorhas exactly- keys. As the algorithmproceeds,keys get droppedfromfuture consideration.Therearealwaysp groups(onegroupperprocessor).Theremainingkeys at eachprocessorconstituteitsgroup.We identify themedianof eachgroup.Insteadof pickingthe medianof thesemediansas thesplitterkey M, we choosea weightedmedian of thesemedians.Eachgroupmedianis weightedwith the numberof remainingkeys in that processor.Definition14.1Let X = ki,k2l---,kn be a sequenceof keys, wherekeyhi has an associatedweight wl7 for 1< i < n. Also let W =

5I\342\204\242=1 Wi. Theweighted medianof X is that kj \342\202\254 X which satisfies Ylk,ex,kt<kwkt > \\and J2kiex,ki>kwh > -f-- In otherwords, the totalweight of allkeys of Xthat are< kj shouldbe > ~ and the totalweight of allkeys that are> kjalsoshouldbe > ^y. \342\226\241

Example14.9Let X = 9,15,12,6, 5, 2, 21,17and let the respectiveweightsbe 1,2,1,2, 3,1,7, 5. HereW = 22.The weightedmedianof X is 17.Oneway of identifying the weightedmedianis to sortX; let the sortedsequencebe k[,k'2, \342\226\240\342\226\240

\342\226\240, k'n; let the correspondingweight sequencebe w\\,w2,...,w'n;and computethe prefix sumsy\\, y2l \342\226\240\342\226\240. , yn on this weight sequence.If yj isthe leftmost prefix sum that is > ^-, then kj is the weightedmedian.

For X, the sortedorderis 2,5,6,9,12,15,17,21and the correspondingweights are 1,3,2,1,1,2, 5, 7. Theprefix sumsof this weight sequenceare1,4,6,7, 8,10,15,22.Theleftmost prefixsum that exceeds11is 15and hencethe weightedmedianis 17. \342\226\241

The deterministicselectionalgorithmmakesuse of the techniquejustdescribedfor finding the weightedmedian.To beginwith, thereareexactly^ keys at eachprocessor.We needto find the ith smallestkey. Thedetaileddescriptionof the algorithmappearsas Algorithm14.5.HereD is aconstant.

Example14.10Considera 3 x 3 meshwheretherearethreekeys at eachprocessorto beginwith. Also let i = 8. Let the input be 11,6,3,18,2,14,

Page 702: Sahni

694 CHAPTER14.MESHALGORITHMS

N :=n;Step0.If \\og(n/p)is < loglogp,then sortthe elementsat eachprocessor;elsepartitionthe keys at eachprocessorinto logpequalpartssuchthat thekeys in eachpart are< keys in the partstothe right.while (iV > D) do{

Step1. In parallelfind the medianof keys at eachprocessor.Let Mq be the medianand Nq be the numberof remainingkeysat processorq, 1< q <p.

Step 2. Find and broadcast the weighted median ofMi,M2,...,Mp, wherekey Mq has a weight of Nq, 1< q < p.Let M be the weightedmedian.

Step3.Count the rank tm of M from out of all remainingkeysand broadcastit.

Step4. If i < rM, then eliminateall remainingkeys that are> M; elseeliminateallremainingkeys that are<M.

Step5. Computeand broadcastE, the numberof keyseliminated. If i > rMi then i :=i ~ E;N :=N \342\200\224 E;

}

Output the ith smallestkey from out of the remainingkeys.

Algorithm14.5Deterministicselectionon a ^Jpx y/p mesh

Page 703: Sahni

14.4.SELECTION 695

10,17,5,21,26,27,12,7,25,24,4,9,19,20,23,15,8,22,1,13,6.Figure14.17shows the stepsin selectingthe ith smallestkey. It is assumedthat partsareof size1in step0 of Algorithm 14.5,to make the discussionsimple.

Themedianof eachprocessoris found in step1.Sinceeachprocessorhasthe samenumberof keys, the weightedmedianof thesemediansis nothingbut the medianof these.The weightedmedianM is found to be 14in step2. The rank tm of this weighed medianis 14.Sincei < tm, aU the keysthat aregreaterthan or equal to 14aredeleted.The i remainsthe same.This completesone run of the while loop.

In the next run of the while loop,the weighted median is found bysorting the localmedians.The localmediansare 3,2,5,7,4,8,6.Theircorrespondingweights are 2,1,2,2,2,1,3.Sortedorderof thesemediansis 2,3,4,5,6,7,8,the respectiveweights being 1,2,2,2,3,2,1.Thus theweightedmedianM is found to be 5. The rank rj,[of M is 5. So,keysthat are lessthan or equal to 5 get eliminated.The value of i becomes8-5= 3.

In the third run of the while loop,thereareeight keys to begin with.Theweightedmedianis found to be 8,whose rank happensto be 3,whichis the sameas the value of i. Thusthe algorithmterminatesand 8 is outputas the correctanswer. \342\226\241

In step0,the partitioningof the elementsinto logp partscanbedonein|loglogp time(seeSection13.4,Exercise5).Sortingcan be done in time0(~logr~). Thusstep0 takestime|min {log(n/p),loglogp}.At the endof

step0,the keys in eachprocessorhave beenpartitionedinto approximatelylogpapproximatelyequalparts. Calleachsuchpart a block.

In step1,we can find the medianat any processoras follows.Determinefirst the blockthe medianis in and then performan appropriateselectioninthat block(usingAlgorithm 3.19).Thetotaltimeis 0{-^~).

In step2,we can sort the mediansto identify the weightedmedian.IfM{,M!2,\342\200\242..,

M' is the sortedorderof the medians,then we needto identify

j such that X/jUi^'k \342\200\224 T an(^ X/fc=i N'k < y. Sucha j can be computedwith an additionalprefix computation.Sortingcan be done in 0(s/p)time(as we show in Section14.6).The prefix computationtakes0{s/p)timeaswell (seeTheorem14.3).Thus M, the weightedmedian,can be identifiedin time0(^/p).

In step 3, eachprocessorcan identify the numberof remainingkeys inits queueand then all processorscan performa prefix sumscomputation.Therefore,this steptakes0(^/p) time.

In step 4, the appropriatekeys in any processorcan be eliminatedasfollows. First identify the blockB that M falls in. This can be done in

O(logp) time. After this, we compareM with the elementsof blockB todeterminethe keys to be eliminated.If i > tm {i < ?\"m), of courseall

Page 704: Sahni

696 CHAPTER14. MESHALGORITHMS

Proc. (1,1)(1,2) (1,3) (2,1) (2,2) (2,3) (3,1) (3,2) (3,3)

11\302\251

3

182

<y>

\302\251

175

21\302\256

27

\302\251

725

244

(2)

19\302\251

\302\256)8

23 22

Weighted median is 14.

\302\251_

\302\251_

Weighted median is 5.

\302\251

Weighted median is 8.

113

\302\251

11\302\251

\302\251

10\302\251

12\302\251

\302\251-

9-

\302\251

113

\302\251

13\302\251

Answer is 8.

Figure14.17Deterministicselectionwhen n >p

Page 705: Sahni

14.4.SELECTION 697

blocksto the left (right)of B areeliminateden masse.Totaltimeneededis0(logpH\342\200\2241^-^),

which is 0{ t\" ) sincen = pc for someconstantc.Step5 takes0{^/p) time,sinceit involves a prefix computationand a

broadcast.Broadcastingof steps2,3, and 4 takes0(^/p) timeeach(c.f.Theorem

14.2).Thuseachrun of the while looptakesQ( ,\" + ^/p) time.How many keys areeliminatedin eachrun of the while loop?Assume

that i > r^ in a given run. (Theothercasecan be arguedsimilarly.)Thenumberof keys eliminatedis at leastYJk=i ~~2~ >

which is > ^-. Therefore,it follows that the while loopis executedO(logn) times. Thus we get(assumingthat n = pc and hencelognis asymptotically the sameas logp)the following theorem.

Theorem14.8Selectionfrom n keyscanbeperformedona y/p x ^fp meshin time0(^loglogp+ ^Jp logn). \342\226\241

EXERCISES1.Considerthe selectionproblemon a ^fp x ^fp mesh,wherep = n.

Let A be a medianfinding algorithmthat runs in timeT(v/p). Howcan you make use of A tosolve an arbitrary instanceof the selectionproblemand what is the resultant run time?

2.Presentan efficient algorithmfor finding the A:th quantilesof any givensequenceof n keys on a y/p x y/p mesh.Considerthe casesn = p andn >p.

3. Given an array A of n elements,presentan algorithmto find anyelementof A that is greaterthan orequalto the medianon a ^/p x ^fpmesh.Your algorithmshouldrun in time2^/p +d(^/p). Assume thatp = n.

4. Considera y/p x y/p meshin whichthereis a key at eachnodeto beginwith. Assume that the keysareintegersin the range [0,p\302\243

\342\200\224 1],whereeis a constant< 1.Designan efficient deterministicselectionalgorithmfor this input. What is the run timeof your algorithm?

5. Develop an efficient deterministicselectionalgorithm for the meshwhen n = p. Also assumethat i (the rank of the elementto beselected)is either<pe or >p \342\200\224 p( for somefixed e < 1.What is the runtimeof your algorithm?

6.Designa deterministicalgorithmfor selectionwhen n = p.Youralgorithm shouldhave a run timeof O(^)on a ^fp x ^Jp)mesh.

Page 706: Sahni

698 CHAPTER14. MESHALGORITHMS

Figure14.18Computingthe ranksof keys

14.5 MERGINGTheproblemof mergingis to taketwo sortedsequencesas input and producea sortedsequenceof allelements.This problemwas studiedin Chapters3and 13.

14.5.1Rank Mergeona LinearArrayThe mergeby ranking algorithmof Section13.5.1can be implementedto run in lineartime on a lineararray. Let \302\243 be a lineararray withp = m processors.The input sequencesare X\\ = k\\, hi, \342\226\240\342\226\240\342\226\240, km and X2 \342\200\224

km+i,km+2,\342\226\240\342\226\240

\342\226\240, &2m- To beginwith, processori has the keys ki and ki+m.Followingthe merge,the two smallestkeys are in processor1,the next twosmallestkeys are in processor2,and soon.We show how to computetherank of eachkey k \342\202\254 X\\ and routeit to its right place. An analogousalgorithm can be appliedfor X2 also.Processori initiatesa counterq witha value of 0. A packetcontainingq togetherwith the value of k{ is sentalongboth directions.(If i = 1or m, it is sent in only onedirection.)Thetwo copiesof Cj travel all the way up to the two boundariesand comebacktoi (seeFigure14.18).Processorj on receiptof c4 increments

c\302\273 by one if

kj < kf, otherwiseit doesn'talterCj. (This incrementoccursonly when Cjis in its forward journey.) In any caseit forwards Cj to its neighbor.Whenthe two copiesof Ci return to processori,the rank of hi canbecomputedbysummingthe two copiesand addingone.Thetimeneededfor rankcomputations is 2{p\342\200\224 1)or less.Oncewe know the ranksof the keys, they can beroutedin time0(p)usingLemma14.1.In particularif rj is the rank of hi,this key is sent to processor[?f].None of the Cj'sget queuedsinceno twocounterscontendfor the samelink ever.

Lemma 14.5Mergingtwo sortedsequenceseachof lengthp can becompleted in 0(p)timeon a p-processorlineararray. \342\226\241

Page 707: Sahni

14.5.MERGING 699

(a) (b) (c)

Oj \302\243, 6>2 \302\2432 0, 6>2 E, E2

k~^ i^~^ k\"~^

)_ *^_f ?-$-^H$J\342\200\224\302\245-*--$--MF-*-? J--?-M^-*-*-^-$(d) (e) (0

Figure14.19Odd-evenmergeon a lineararray

14.5.2Odd-EvenMergeona LinearArrayTheodd-evenmergealgorithmwas describedin Section13.5.2(Algorithm13.10).On a p = 2m-processorlineararray assumethat X\\ is input inthe first m processorsand X? is input in the next m processors.In step1of Algorithm 13.10,X\\ and

X<\302\261are separatedinto oddand even parts

Oi,E],0'2,and E-2- This takesy stepsof data movement.Next,E\\ and02are interchanged.This alsotakes ^ steps.In step 2, 0\\ is mergedrecursivelywith O2 to get O.At the sametimeE\\ is mergedwith E<i to getE. In step3, O and E are shuffled in a totalof < m data movement steps.Finally, adjacentelementsarecomparedand interchangedif out of order.IfM(m) is the run timeof this algorithmon two sequencesof lengthm each,then we have M(rn) < M(m/2)+1m+ 1which solves to M(m) = 0(m).Lemma 14.6Two sortedsequencesof lengthm eachcan be mergedon a2m-processorlineararray in 0(m)time. \342\226\241

Example14.11Figure14.19showsthe mergingof two sortedsequencesoflengthfour eachon an 8-nodelineararray. Separationof the sequencesintotheir oddand even parts is shown in Figure14.19(b).In Figure14.19(c),O2 and E\\ are interchanged.0\\ and O2 as well as E\\ and E<i arerecursively mergedto get O and E, respectively (Figure14.19(d)).Next O andE are sliufiled (Figure14.19(e)).A comparison-exchangeoperationamongneighborsis performedto arrive at the final sortedorder(Figure14.19(f)).

\342\226\241

14.5.3Odd-EvenMergeona MeshNow we considera ^/p x ^/p mesh.Assume that the two sequencesto bemergedare input in the first and secondhalves of the mesh in snakelike

Page 708: Sahni

700 CHAPTER14. MESHALGORITHMS

-5 6-,44\342\200\2248-

15 18125 20126 29135 361253

6>

-2\342\200\2245-

-8 7-15 1926 2032 3642 4046 56

(d) (e) (f)

Figure14.20Odd-evenmergeon the mesh

row majororder(seeFigure14.20(a)).TheX\\ andX<\302\261

aresnakeswith ^columnsand ^/p rows each.The final mergewill bea snakeof size^Jpx ^/p(as in Figure14.20(f)).Assume that y/p is an integralpower of 2. As thealgorithmproceeds,moreand moresnakesarecreatedallof which have thesamenumberof rows. Only the number \302\243 of columnswill diminish. Thebasecaseis when \302\243

\342\226\240= 1. A completeversion of the algorithmis given in

Algorithm 14.6.This algorithmmergestwo snakeswith \302\243 columnseach.Let

M{\302\243)be the run timeof Algorithm 14.6on two sortedsnakeswith \302\243

columnseach.In step0,we have to mergetwo sortedcolumns.Notethat the algorithm

of Lemma14.5canbeusedsincethe datafrom onecolumncanbemoved tothe othercolumnin onestepand then the algorithmof Lemma14.5applied.This takes0(^/p) time.

Steps1,2, and 4 take< |,|,and \302\243 stepsof datamovement,respectively.Step3 takesM(|)time.Thus, M{\302\243)

satisfiesM{\302\243)

<M(|)+ 2\302\243,which on

solutionimpliesM(\302\243)< 4\302\243 +M(l);that is,M{jp/2)= 0(^/p).

*1-49\342\200\2242

-37 3240 411

-48 4656 65

(a) (b) (c)

-44\342\200\2249J

18 25129 28135 37148 411

-53 65

-#=^5-^44-ll5 IOC^29 26\"~1l8^32 35^6^48- 4i~%\\-^\342\226\24046 5f~^6 65

44\342\200\2249 8 7-15 18\342\200\22419 2029 28\342\200\22426 25

36 37-41 4056 65

Page 709: Sahni

14.6.SORTING 701

Step0. If \302\243

= 1,mergethe two snakesusingLemma14.5.

Step1. PartitionX\\ into its oddand even parts, 0\\ andE\\, respectively.Similarly partitionXi into Oi and E<i- PartsOi,\302\243a,C>2i and E<i are snakeswith|columnseach(seeFigure14.20(b)).Step2.InterchangeO^ with E\\ as in Figure14.20(c).Step3.Recursivelymerge0\\ with 02to get the snakeO. At

the sametimemergeE\\ with E<i to get the snakeE. (SeeFigure14.20(d)).Step4. Shuffle O with E (seeFigure14.20(e)).Compareadjacent elementsand interchangethem if they areout of order.

Algorithm14.6Theodd-evenmergealgorithmon the mesh

Theorem14.9Two sortedsnakesof sizey/p x ^ eachcan be mergedintime0{yjp)on a y/p x y/p mesh. \342\226\241

14.6 SORTINGGivena sequenceof n keys, recallthat the problemof sortingis to rearrangethis sequencein eitherascendingor descendingorder. In this sectionwestudy severalalgorithmsfor sortingon both a lineararray and a mesh.

14.6.1Sortingona LinearArrayRank sortThefirst algorithm,we aregoingto study, rank sort,computesthe rank ofeachkey and then routesthe keys to their correctpositions.If therearepprocessorsin the lineararray with one key per processor,the ranks of allkeys can be computedin 0(p)timeusingan algorithmsimilarto the oneemployedin the proofof Lemma14.5.Followingthis, the key whoserank isr is routedtoprocessorr. This routingalsotakes0(p)time(Lemma14.1).Thus we get the following lemma.

Page 710: Sahni

702 CHAPTER14. MESHALGORITHMS

for i :=1top doIf i is odd,compareand exchangekeys at processors2j \342\200\224 1and 2j for j = 1,2,...; elsecompareandexchange keysat processors2jand 2j+1for j = 1,2,....

Algorithm14.7Odd-eventranspositionsort

j I ? I ? j ? T 11 j 1 ? ? 4> ? T 1 I j ? ? ? ft 7(a) (b) (c)

i p? o f^ t p? n> fi Pt i ? ? i ^ ^ t ?(d) (e) (f)

Figure14.21Odd-eventranspositionsorton a lineararray

Lemma 14.7A totalof p keys can besortedon a p-processorlineararrayin 0(p)time. \342\226\241

Odd-eventranspositionsortAn algorithmsimilarto bubblesortcanalsobeusedto sorta lineararray in

0(p)time. This algorithm(Algorithm 14.7)is alsoknown as the odd-eventranspositionsort.\"Compareand exchange\"refersto comparingtwo keysand interchangingthem if they are out of order.Each iterationof the forlooptakesonly 0(1)time.Thusthe wholealgorithmterminatesin 0(p)timesteps.Thecorrectnessof this algorithmcan be proved usingthe zero-oneprincipleand is left as an exercise.

Lemma 14.8The odd-eventranspositionsort runs in 0(p)timeon a p-processorlineararray. \342\226\241

Example14.12Let p = 8 and let the keys to besortedbe4,5,1,8,2,6, 3,7.Figure14.21shows the stepsof the odd-eventranspositionsort. \342\226\241

Page 711: Sahni

14.6.SORTING 703

Odd-evenmergesortThelastalgorithmwe study on the lineararray is basedon mergesort;thatis,it makesuseof a knownmergingalgorithmin orderto sort.If therearepkeys on a p-processorlineararray, we can recursivelysortthe first half andthe secondhalf at the sametime. Oncethe resultsare ready, they can bemergedusingthe odd-evenmergealgorithm(Lemma14.6).Theresultantodd-evenmergesorthas a run timeT(p) = T(p/2)+0(p),which solvestoTip) = 0(p).Lemma 14.9Odd-evenmergesortruns in 0(p)timeonap-processorlineararray. \342\226\241

14.6.2 Sortingona MeshWe study two different algorithmsfor sortingon a mesh.Thefirst is calledShearsortand takes0{^/plogp)timeto sorta y/p x y/p mesh.The secondis an implementationof odd-evenmergesort.Thisalgorithmruns in 0(^/p)timeand henceis asymptotically optimal.

ShearsortThis algorithm(Algorithm 14.8)works by alternatelysortingthe rows andcolumns.If thereis a key at eachprocessorof a ^fp x ^fp mesh,therearelogp+ 1phasesin the algorithm.At the end, the meshwill be sortedinsnakelikerow majororder.Sincea lineararray with ^fp processorscan besortedin 0{^/p) time(c.f.Lemma14.8),Algorithm 14.8runs in a totalof

0(v/P(1\302\260gP + 1)) = 0{^/p\\ogp)time.

Example14.13Considerthe keys on a 4 x 4 meshof Figure14.22(a).Inphase1,we sort the rows, sortingalternaterows in oppositeorders.Theresult is Figure14.22(b).The resultsof the next four phasesareshown in

Figure14.22(c),(d), (e),and (f), respectively.At the end of the fifth phase,the meshis sorted. Q

Note that Algorithm 14.8is comparisonbasedand is alsooblivious andhencethe zero-oneprinciplecan be used to prove its correctness.Assumethat the input consistsof only zerosand ones.Definea row to be dirty if ithas both onesand zeros,cleanotherwise.Notethat if the mesh is sorted,therewill beonly onedirty row and the restof the rows will eitherhave allonesor all zerosand hencewill be clean.To beginwith, therecouldbe asmany as ^/p dirty rows; that, is,eachrow couldbe dirty.

Calla stageof the algorithmto be sortingall rows followed by sortingall columns(i.e.,a stageconsistsof two phases). We show that if N is

Page 712: Sahni

704 CHAPTER14. MESHALGORITHMS

(a) (b) (c)

15

7

2

18

1213

16

11

8

6

19

5

32

17

25

3

8

17

2

18

1213

16

11

15

7

19

5

32

6

25

3

2

8

17

18

1112

13

16

5

7

15

19

3

6

25

32

(d) (e) (f)

2

1213

32

3

8

15

19

5

7

17

18

116

25

16

2

12

13

32

3

8

15

19

5

7

17

18

6

1116

25

2

1213

32

3

1115

25

5

8

16

19

6

7

17

18

Figure14.22Shearsort-an example

for i :=1to logp+ 1doIf i is even,sort the columnsin increasingorderfromtop to bottom;elsesortthe rows.The rows aresortedin sucha way that alternaterows aresortedin reverseorder.Thefirst row is sortedin increasingorderfromleft to right, the secondrow is sortedin decreasingorderfrom left to right, and soon.

Algorithm14.8Shearsort

Page 713: Sahni

14.6.SORTING 705

0 0\342\200\242

\342\200\242 \342\200\2420 1 1-\342\226\240-1 0---0---01--10 \342\200\242\342\200\242\342\200\2420 1 \342\200\242\342\200\242\342\200\242

1

1 1\342\200\242\342\200\242\342\200\242 1 0 0-\342\226\240-0 1\342\226\240\342\200\242 \342\200\242

1 0 \342\200\242\342\226\240 \342\200\242 0 1\342\200\242\342\226\240\342\200\2421 \342\200\242 \342\200\242\342\200\242 1 0 \342\200\242\342\200\242 \342\200\2420

(a) (b) (c)

Figure14.23Provingthe correctnessof Algorithm 14.8

the numberof dirty rows at the beginningof any stage,then the numberof dirty rows at the end of the stageis no morethan y. This will thenimply that after log(^/p)stages,therewill be at mostone dirty row leftwhich can be sortedin an additionalrow sort.Thus there will be only2 \\og(^/p)+ 1 = logp+ 1phasesin the algorithm.

Look at two adjacentdirty rows at the beginningof any stage. Thereare threepossibilities:(1) thesetwo rows put togethermay have an equalnumberof onesand zeros,(2) the two rows may have morezerosthan ones,and (3)the two rowsmay have moreonesthan zeros.In the first phaseof thisstagethe rowsaresortedand in the secondphasethe columnsaresorted.Incase1,when the rows aresorted,they will looklike Figure14.23(a).Then,when the columnsare sorted,the two rows will contributetwo cleanrows(onewith allonesand the otherwill all zeros).If case2 is true, after therow sorting,the two rows will looklike Figure14.23(b).When the columnsaresorted,the two rows will contributeonecleanrow consistingof allzeros.In case3 also,a cleanrow (consistingof all ones)will be contributed.Insummary, any two adjacentdirty rowswill contributeat leastonecleanrow.That is,the numberof dirty rows will decreasein any phaseby a factor ofat least2.

Theorem14.10TheShearsortalgorithm(Algorithm14.8)workscorrectlyand runs in time0{^/p\\ogp)on a ^/p x ^/p mesh. \342\226\241

Odd-evenmergesortNow we implementthe odd-evenmergesortmethodon the mesh.If X =ki,&2, \342\200\242\342\200\242. , kn is the given sequenceof n keys, odd-evenmergesortpartitionsX into two subsequencesX[ \342\200\224 hi,&2,\342\226\240\342\226\240\342\200\242

\342\226\240,kn/2

and X'2 =\302\243Vi/2+ii^n/2+2>...,kn of equal length. SubsequencesX[ and Xf2 are sortedrecursively

assigningn/2processorsto each. The two sortedsubsequences(callthemX\\ and X2, respectively)are then finally mergedusingthe odd-evenmergealgorithm.

Page 714: Sahni

706 CHAPTER14.MESHALGORITHMS

We have already seenhow the odd-evenmergealgorithmworks on themesh in 0(^/p) time(Algorithm14.6).This algorithmcan be used in themergingpart. Given p keys distributedon a y/p x ^fp mesh (onekey perprocessor),we canpartitionthem into four equalpartsof size-^px -^p each.Sorteachpart recursivelyinto snakelikerow majororder.Theresultis shownin Figure14.24(b).Now,mergethe toptwo snakesusingAlgorithm 14.6.At

the sametimemergethe bottomtwo snakesusingthe samealgorithm.Thesemergingstaketime0{^/p).After thesemergings,the meshlookslike Figure14.24(c).Finally mergethesetwo snakesby properly modifying Algorithm14.6.This mergingalsotakes0{^/p) time. After this merging,the wholemeshis in snakelikerow majorsortedorder(as in Figure14.24(d)).

136

107

112

168

9

3

14

155

1214

^\342\200\224fs

t5-tt-f\342\200\224Si

ifr^O

-5\342\200\2245

(a) (b)

\342\226\2402\342\200\2243\342\200\2245\342\200\2246

15 13 119

-i\342\200\2244\342\200\2247-

16141210

-i\342\200\2242\342\200\2243\342\200\2244

-1\342\200\2246\342\200\2245\"

9 101112i

16 15 14 13

(c) (d)

Figure14.24Odd-evenmergesorton the mesh

IfS(\302\243)

is the timeneededto sortan \302\243 x \302\243 meshusingthe above divide-and-conqueralgorithm,then we have

S{\302\243)=

s(^+0{\302\243)which solves to

S(\302\243)-

0(\302\243).

Theorem14.11We cansortp elementsin 0(y/p) timeona ^/p x ^/p meshinto snakelikerow majororder. \342\226\241

Page 715: Sahni

14.6.SORTING 707

Example14.14Figure14.24(a)showsa 4 x 4 meshin which thereis a keyat eachnodeto beginwith. Themeshis partitionedinto four quadrantsandeachquadrant is recursivelysorted.Theresult is Figure14.24(b).Thetoptwo quadrantsas well as the bottomtwo quadrantsaremergedin parallel(Figure14.24(c).Theresultanttwo snakesaremerged(Figure14.24(d)).\342\226\241

EXERCISES1.Provethe correctnessof Algorithm 14.7usingthe zero-oneprinciple.2. Presentan implementationof rank sorton a ^fp x ^fp mesh.What is

the run timeof your algorithm?3.The randomizedrouting algorithmof Section14.2can be made

deterministic with the helpof sorting.Theroutingalgorithmworks asfollows. Partitionthe mesh into blocksof size-^ x ^ each;sorteachblockin columnmajororderaccordingto the destinationcolumnof the packets(the advantage of such a sorting is that now packetsin any blockthat have the samedestinationcolumnwill be found insuccessiveprocessorsaccordingto columnmajororder).Fromthenon the packetsusephases2 and 3 of Algorithm 14.1.Provethat thisalgorithmhas a run timeof 2^/p + O(^)with a queuesizeof 0(q).

4. Assume that eachprocessorof a y/p x yjp meshis the originof exactlyonepacketand eachprocessoris the destinationof exactlyonepacket.Presentan 0(v/p)-time0(l)-queuesdeterministicalgorithmfor thisroutingproblem.(Hint:Make useof sorting.)

5.Makinguseof the ideaof Exercise4, devisean 0(y/p)-time0(l)-queue-lengthdeterministicalgorithmfor the PPRproblem(seeSection14.2).

6. The array A is an almost-sortedarray of n elements.It is given thatthe positionof eachkey is at most a distanced away from its finalsortedposition.Give an 0(d)-timealgorithmfor sortingA on an n-processorlineararray. Provethe correctnessof your algorithmusingthe zero-oneprinciple.

7. Let \302\243 bea lineararray with lognprocessors.Eachprocessorhas -^^keys tobeginwith. Thegoalis to sortthe array. At the end,processor1shouldhave the leastj^-keys. Node 2 shouldhave the next bigger-^^ keys. And so on.Establishthat this sortingcan beaccomplishedin 0(n)time.

8. Prove that if in a y/p x y/p meshthe rows are sortedand then thecolumns,the rows remainsorted.

Page 716: Sahni

708 CHAPTER14.MESHALGORITHMS

9. Considerthe following algorithmfor sortinga y/p x ^/p mesh:

(a) Partitionthe meshinto four quadrantsof size 2 x 2 each.

(b) Sorteachquadrant recursively.(c) Performfive stagesof Algorithm 14.8on,the whole mesh.

Provethat this algorithmcorrectlysortsarbitrary numbers.What isthe run timeof this algorithm?

10.In this sectionwe showed how to sortkeys on a ^Jp x ^Jp mesh intosnakelikerow majororderin 0{^/p) time.Provethat the meshcanbesortedinto the following indexingschemesalsoin 0{xJp)time:columnmajorand blockwisecolumnmajorwherethe blocksareof sizep1/4xp1' (within eachblockemploy the snakelikecolumnmajororder).

11.Given is a sequenceX of n keys k\\, &2,...,kn. Foreachkey ki (1< i <n), its positionin sortedorderdiffers from % by at most d.Presentan0(nlogd)-timesequentialalgorithmto sortX. Provethe correctnessof your algorithmusingthe zero-oneprinciple.Implementthe samealgorithmon a \\/n x \\Jn mesh.What is the resultantrun time?

14.7 GRAPH PROBLEMSIn Section13.7,we introduceda generalframeworkfor solvingthe transitiveclosure,connectedcomponents,and all-pairsshortest-pathsproblems.Wemake useof this frameworkherealso.

ThematrixM (seeSection13.7)can becomputedfrom M in 0(nlogn)timeon an n x n x n mesh.An n x n x n meshis a three-dimensionalgrid,in which eachgrid point correspondsto a processingelementand eachlinkcorrespondsto a bidirectionalcommunicationlink. Each processorin ann x n x n meshcan be denotedwith a triple(i,j,k),where1< i,j,k< n(seeFigure14.25).

Definition14.2In an n x n x n mesh,let (i,*, *) stand for allprocessorswhose first coordinateis i. This is indeedan n x n mesh.Similarly define(*,j,*) and (*,*,&).Also define (i,j,*) to be all processorswhose firstcoordinateis % and whosesecondcoordinateis j. This is a lineararray.Similarlydefine (i,*,A;)and (*,j,k). \342\226\241

Theorem14.12M canbecomputedfrom annxnmatrixM in 0(nlogn)timeusingan n x n x n mesh.

Page 717: Sahni

14.7.GRAPHPROBLEMS 709

Z

\342\226\240I*

Figure14.25A 3D mesh

Proof:The n x n x n meshalgorithmis the sameas the CRCW PRAMalgorithmof Algorithm 13.15.We storethe matrixm[ ] in (*,*, 1),which isinitializedto M.

In step1,to updateq[i,j,k],the correspondingprocessorhas to accessboth m[i,j]and m[j,k].Eachprocessorcan accessm[i,j]by broadcastingm[i,j] along(i,j,*). Forprocessor(i,j,k) to getm[j,k], we do the following:Transposethe matrixm[ ] and storethe transposeas x[ ] in (*,*, 1).Nowelementx[i,j]is broadcastalong(i,j,*).(Verifying that this ensuresthateachprocessorgetsthe correctdatais left as an exercise.)Broadcastingcanbe done in O(n) timeeach.Transposingthe matrixcan alsobe completedin 0(n)time(seeSection14.3,Exercise12).

In step 2 of Algorithm 13.15,eachm[i,j]is updatedto min {^[i,l,j],q[i,2,j],...,q[i,n,j]},for 1< i,j < n. This can be doneas follows: Notethat the n itemsof interestare localto the linear array (i,*,j).Usingthis array, the updatedvalue of m[i,j]can be computedand storedin theprocessor(i,l,j) in 0(n)time. Thesen1updatedvalues of m[ ] have tobe moved to (*,*,0).This transfercan be performedwith two broadcastoperations.Firstbroadcastthe updatedm[i,j]alongthe array (i,*,j).Eachlineararray (i,*,j)now hasa copy of the updatedm[i,j].Secondbroadcastthis rn[i,j]alongthe lineararray (i,j,*) sothat the processor(i,j,0)getsacopy of the updatedm[i,j]. Now the updatingof m[i,j]can be donelocalto the lineararray (i,j,*) in 0(n)time.

Thuseachrun of the for looptakes0(n)time. \342\226\241

Consequently,the following theoremsalsohold.

Theorem14.13Thetransitiveclosurematrixof an n-vertexdirectedgraphcan becomputedin 0(nlogn) timeon an n x n x n mesh. \342\226\241

Page 718: Sahni

710 CHAPTER14.MESHALGORITHMS

Theorem14.14The connectedcomponentsof an n-vertexgraph can bedeterminedin 0(nlogn) timeon an n x n x n mesh. \342\226\241

14.7.1An n x n MeshAlgorithmfor TransitiveClosureIn Section13.7.1we saw that the transitiveclosureof an n-vertexgraphcouldbecomputedby performing[logn\\ multiplicationsof an n x n matrix.In Theorem14.15we show that eachof thesemultiplicationscanbedonein

O(n) timeon an n x n mesh.

Theorem14.15Two n x n matricesA = a[i,j]and B = b[i,j]can bemultipliedin O(n) timeon an n x n mesh.

Proof:Let C = c[i,j]be the producttobe computed.Assume that thematricescan be input to the meshas shown in Figure14.26.In particular,the first columnof A traversesthroughthe first columnof the meshoneitemand oneprocessorpertimeunit. That is,in the first timeunit a[l,1]reachesthe processor(1,1).In the secondtimeunit a[l,l]reachesthe processor(2,1)and at the sametimeo[l,2]reachesthe processor(1,1).And so on.The secondcolumnof A traversesthroughthe secondcolumnof the meshstartingfrom the secondtimeunit (that is,a[2,1]reachesthe processor(1,2)at timestep2,and soon).

In general,the processor(i,j)getsboth a[i,k]and b[k,j]at timestepi+j+k \342\200\224 2.Node (i,j)is in chargeof computingc[i,j].Note that c[i,j]=]Cfe=i\302\253.[\302\253', A;]6[A;,j].Node (i,j)usesthe following simplealgorithm:If it getstwo items(one from above and one from the left), it multipliesthem andaccumulatesthe productto c[^j]-It then forwards the data itemsto itsbottomand right neighborsrespectively.Sinceit is guaranteedto get alla[i,A;]'sand b[k,jYs, at the end of the algorithmit has correctlycomputedthe value of cf?,.?].Also, the processor(i,j)completesits task by step(i + j + n \342\200\224 2) sincethereareonly n possiblevalues that k can take.Thusthe whole algorithmterminateswithin 3n \342\200\224 2 steps.

Forthis algorithmwe have assumedthat the right dataitemcomesto theright placeat the right time.What happensif the two matricesarealreadystoredin the mesh? (In fact this is the casefor the applicationof matrixmultiplicationin the solutionof the transitiveclosureproblem.)Let the twomatricesbestoredin the meshin row majorordertobeginwith. Transposeboth the matricesin 0(n)time. Simulatethe effect of data comingfromabove and the left as shown in Figure14.27.In eachrow (column),thereisa streammoving to the right (down) and another moving to the left (up).The streamcorrespondingto the ith. row (jthcolumn)shouldstart at timestepi (j). \342\226\241

Page 719: Sahni

14.7.GRAPHPROBLEMS 711

b[xi]bv-M \302\243n,i

\302\243[3,2] \302\243[2,2] \302\243[1,2]

\302\243[3,3] \302\243[2,3] \302\243[1,3]

Figure14.26Multiplying two matrices

As a result,we alsoget the following theorem.

Theorem14.16The transitiveclosurematrixof a given undirectedgraphwith n processorscan becomputedin 0(nlogn) timeon an n x n mesh.\342\226\241

14.7.2 All-PairsShortestPathsIn Section13.7.2we presenteda PRAM algorithmfor the all-pairsshortest-paths problem.Theideawas todefine Ak(i,j) to representthe lengthof ashortestpath from i to j goingthroughno vertexof indexgreaterthan k,and then to infer that

Ak(i,j)=mm {A^iiJ),Ak-l{i,k)+Ak-l(k,j)},k>\\.

The importanceof this relationshipbetweenAk and Ak~l is that thecomputationof A from A correspondstomatrixmultiplication,wheremin and additiontake the placeof additionand multiplication,respectively.Under this interpretationof matrixmultiplication,the all-pairsshortest-paths problemreducesto computingAn = A1 . We get this theorem.

Page 720: Sahni

712 CHAPTER14.MESHALGORITHMS

[i,i]

Co-\342\200\224o\342\200\224o

OjklU] fc[l,l] fc[2,l] fe[3,l]

0'a[i,3]

Figure14.27Simulatingthe data flow

Theorem14.17The all-pairsshortest-pathsproblemcan be solved inO(nlogn)timeon an n x n mesh. \342\226\241

EXERCISES1.Usethe generalparadigmof this sectionto designa meshalgorithm

for finding a minimum spanningtreeof a given weightedgraph.Youcan useeithera k x k meshor an / x I x I mesh (for an appropriatekor I).Analyze the timeand processorbounds.

2. Presentan efficient algorithmfor topologicalsorton the mesh.

3. Give an efficient meshalgorithmto checkwhethera given undirectedgraph is acyclic.Analyze the processorand timebounds.

4. If G is any undirectedgraph, Gk is defined as follows:Thereis a linkbetweenprocessorsi and j in Gk if and only if there is a path oflengthk in G betweeni and j. Presentan 0(nlogA;)-time n x n meshalgorithmto computeGk from G.

5. You are given a directedgraph whoselinks have a weight of zeroorone. Presentan efficient minimum spanningtree algorithmfor thisspecialcaseon the mesh.

6.Presentan efficient mesh implementationof the Bellmanand Fordalgorithm(seeSection5.4).

7. Show how to invert a triangularyjp x yjp matrixon a yjp x yjp meshin 0(y/p) time.

8. Presentan 0(v/p)-timealgorithmfor inverting a y/p x yjp tridiagonalmatrixon a y/p x yjp mesh.

h

Page 721: Sahni

14.8.COMPUTINGTHECONVEXHULL 713

14.8 COMPUTINGTHE CONVEXHULL

Theconvexhull of n points in the planecan becomputedin O(n) timeonan n-processorlineararray (the proofof this is left as an exercise).Moreinterestingly, the sameproblemcan be solved in 0(y/n) timeon a y/n x y/nmesh.

We beginby showingthat a straightforward implementationof thealgorithm of Sections3.8.4and 13.8resultsin a run timeof 0(y/nlog2n). Laterwe show how toreducethis run timeto 0(y/n).

In the preprocessingstepof Algorithm 13.16,the input pointsaresortedaccordingto theirx-coordinatevalues.The variable N is usedto denotethenumberof points to the left of (pi,P2)-This sortingcan bedonein 0(\\/N)timeon a v^V x \\/N mesh. If qi,q2, \342\226\240.. , qN is the sortedorderof thesepoints,step1of Algorithm 13.16canbedoneby partitioningthe input intotwo partswith qi,g2, \342\200\242\342\200\242\342\200\242, Qn/2 m the first part and qN/2+iiQn/2+2^\342\226\240\342\226\240\342\226\240

\342\226\240>QN m

the secondpart. The first half is placedin the first half (i.e.,the first ;y^columns)of the meshand the secondpart is kept in the secondhalf of themesh in snakelikerow majororder. In step2,the upper hull of eachhalfis recursively computed.Let the upper hulls be arrangedin snakelikerowmajororder(in successiveprocessors).Let Hi and H% be the upper hulls.Step3 is completedin 0(\\/NlogN) time.Step4 callsfor dataconcentrationand hencecan be completedin O(VN) time.

LetT(\302\243)

be the run timeof the above recursivealgorithmfor the upperhull on an input of \302\243 columns;then we have

T{\302\243)= T{\302\243/2)+0(VN logN)

which solvesto T{y/N)= 0(y/N log2N) +T(l).Sincethe convexhull on a\\/7V-processorlineararray can be found in 0(\\/N) time(seethe exercises),we have T(y/N) = 0(^N log2N).

Theonly partof the algorithmthat remainsto bespecifiedis how to findthe tangent (u,v) in 0(\\/NlogN)time.Theway to find the tangent is tostart from the middlepoint,callit p,of H\\. Find the tangentof p with H2.Let (jo, q) be the tangent. Using(p,q),determinewhetheru is to the left of,equal to,or to the right of p in H\\. A binary searchin this fashion on thepointsof H\\ revealsu. Usethe sameprocedureto isolatev as well.

Lemma 14.10Let H\\ and Hibe two upperhulls with at most N pointseach.If p is any point of Hi,its tangentq with H2 can be found in 0(\\/N)time.

Proof.Broadcastp to all processorscontainingHi (i.e.,the secondhalf ofthe mesh).Consideran arbitrary processorin the secondhalf. Let q' be the

Page 722: Sahni

714 CHAPTER14.MESHALGORITHMS

point this processorhas. Also let x and y be the left and right neighborsof q' in the hull H2.Eachprocessorcan accessthe left and right neighborsin 0(1)time(sincethe pointsare arrangedin snakelikerow majororder).If Ipq'xand /-pq'y areboth left turns, then q' is the point we are lookingfor (seeFigure3.10).The point q' is broadcastto the whole mesh.Eachbroadcasttakes0(y/N) time. \342\226\241

Lemma 14.11If Hi and Hi are two upper hulls with at most N pointseach,their commontangentcan becomputedin 0(y/NlogN)time.

Proof:Similarto that of Lemma13.5(seeExercise1). \342\226\241

In summary, we have the following theorem.

Theorem14.18Theconvexhull of iV pointsin the planecanbecomputedin 0(y/Nlog2N) timeon a y/N x y/N mesh. \342\226\241

The run timeof the precedingalgorithmcanbereducedtoO(y/N logN)usingthe following strategy. In the precedingalgorithm,at every level ofrecursionthe numberof rows remainsthe same.Even though the numberof pointsdecreaseswith increasingrecursionlevel,thereis no correspondingreductionin the mergingtime. At eachlevel mergingtakes0(y/NlogN)time.This suggeststhat we shouldattemptto simultaneouslydecreasethenumberof rows as well as columnsin any subproblem.One way of doingthis is topartitionthe input into four equalpartslike we did in the caseofodd-evenmergesort (seeFigure14.24).After partitioning,the convexhullof eachquadrant is obtainedin snakelikerow majororder.Thesefour upperhulls are then mergedas shown in Figure14.24(i.e.,first mergethe twoupper quadrantsand the two lower quadrants, and then mergethe upperhalf with the lower half). Eachmergingcanbedoneusingthe just-discussedmergingtechniquein O(y/N logN) time.If

T(\302\243)is the run timeof computing

the convexhull of a y/l x y/l mesh,T(\302\243)=

T(\302\243/2) + O(^log^);this solvesto

T(\302\243)=

0(\302\243log\302\243).Thus on a y/N x y/N mesh,the run timewill be

0(y/NlogN).

Example14.15Considerthe problemin which iV = 16and q\\,q2,.\342\226\240

\342\226\240, <7i6are (1,1),(1.1,4), (1.5,3.5),(2, 6), (2.2,4), (3, 4.5),(4, 7.5),(4.1,6),(4.5,5.5),(5, 5), (6, 8), (6.3,7), (6.5,5), (7, 6), (8, 7), (9,6).Thesepointsareorganizedon a 4 x 4 meshas shown in Figure14.28(a).Note that thegi'shave beenpartitionedinto four and eachquadrant of the mesh has apart. Within eachquadrant the pointsarearrangedin sortedorder(of theirx-coordinatevalues) usingthe snakelikerow majorindexingscheme.Theupperhull of eachquadrant is recursivelycomputed.The resultsareshown

Page 723: Sahni

14.8.COMPUTINGTHECONVEXHULL 715

(1,1)d.1,4)

(2,6)(1.5,2.5)

(4.5,5.5)(5,5)

(6.3,7)(6,8)

(2.2,4)(3,4.5)

(4.1,6)(4,7.5)

(6.5,5)(7,6)

(9,6) (8,7)

(a)

(1,1)(1.1,4)(2,6) (4,7.5)

(4.1,6)

(4.5,5.5)(6,8) (8,7) (9,6)

(1,1)(1,1.4)

(2,6)

(4.5,5.5)(6,8)

(6.3,7)

(2,2.4)(3,4.5)

(4.1,6)

(6.5,5)(7,6)

(9,6) (8,7)

(b)

(1,1)(1.1,4)(2,6) (4,7.5)

(9,6) (8,7) (6,8)

(c) (d)

Figure14.28Upper-hullcomputation

in Figure14.28(b).Theupperhulls in the top two quadrantsaremerged.At

the sametime,the upperhulls in the bottomtwo quadrantsarealsomerged(seeFigure14.28(c).Finally, the upper hull in the top half is mergedwiththe upperhull of the bottomhalf. The result is (1,1),(1.1,4), (2, 6), (4,7.5),(6, 8), (8, 7), (9,6). \342\226\241

We canreducethe run timefurther if we can find a faster way of mergingthe two upperhulls.We devisea recursivealgorithmfor mergingtwo upperhulls alongthe samelinesas above. Considerthe problemof mergingH\\

and H-2on an \302\243 x \302\243 mesh,whereeachhull is in snakelikerow majororderasshown in Figure14.29.Thereareat most \302\2432 points in all.Let (u,v) be thetangent. Algorithm 14.9describesthe algorithmin detail.

Page 724: Sahni

716 CHAPTER14.MESHALGORITHMS

Step0. If \302\243

= 1,u is the leftmost point and v is the rightmostpoint.

Step1.Let p be the middlepoint of H\\. Find the point q oftangent of p with H% in

0(\302\243)time. Now decidein 0(1)time

whetheru is to the left of, to the right of, or equaltop. As aresulteliminateonehalf of Hithat doesnot containu. Similarlyeliminateonehalf of H?.

Step2. Do step 2 onemoretimesothat at the end only aquarterof eachof H\\ and H^ remains.

Step3. Now rearrangethe remainingpoints of H\\ and H% sothey occupy a submeshof size|x|in the sameorderas in

Figure14.29.

Step4. Recursivelywork on the submeshto determineu and v.

Algorithm14.9Mergingtwo upperhulls in0(\302\243)

time

Figure14.29Mergingtwo upper hulls

Page 725: Sahni

14.8.COMPUTINGTHECONVEXHULL 717

LetM(\302\243)

be the run timeof Algorithm 14.9.In step1,eliminationisdoneby broadcastingp sothat processorsthat have a point to beeliminatedwill not participatein the future. Step1takes

0(\302\243)time. Sodoesstep2.

Rearrangingin step3 can be doneas follows. Firstperforma prefix sumsoperationto determinethe addressof eachsurviving point in the f x fsubmesh.Then routethe points to theiractualdestinations.This routingand hencestep3 take

0(\302\243)time. Step4 takes

M(\302\243/2)time. In summary,

we have

M(\302\243)=

M^+0(\302\243)

whosesolutionisM(\302\243)

=0(\302\243).

Lemma 14.12Two upper hulls can be mergedin0{\302\243)

timeon an t x \302\243

mesh. \342\226\241

As a corollary to Lemma14.12,the recurrencerelationforT{\302\243) (the time

neededtofind the convexhull on an \302\243 x \302\243 mesh)becomes

T{\302\243)=t(^+0{\302\243)

which alsosolves toT{\302\243)

=0(\302\243).

Theorem14.19The convexhull of n pointson a y/n x y/n meshcan becomputedin 0(y/n) time. \342\226\241

EXERCISES1.ProveLemma14.11.2.Showthat the convexhull of n given pointscanbedeterminedin 0[n)

timeon an n-processorlineararray.

3. Presentan 0(y/n)-timealgorithmto computethe areaof the convexhull of n given pointson a y/n x y/n mesh.

4. Given a simplepolygon and a point p, presentan 0(v/n)-timealgorithm ona y/nXy/ri meshto checkwhetherp is internalto the polygon.

5. Presentan efficient algorithmto checkwhetherany threeof n givenpoints are colinearboth on a lineararray and on a y/n x y/n mesh.What arethe timeand processorbounds?

Page 726: Sahni

718 CHAPTER14.MESHALGORITHMS

14.9 REFERENCESAND READINGS

Fora comprehensivecollectionof meshalgorithmsseeIntroductiontoParallel Algorithms and Architectures:Arrays-Trees-Hypercubes,by T.Leighton,Morgan-Kaufmann,1992.

The three-phaserandomizedpacketrouting algorithmis due to\"Universal schemesfor parallelcommunication,\"by L. Valiant and G.Brebner,Proceedingsof the 13thAnnual ACM Symposiumon Theory of Computing,1981,pp. 263-277.Relatedrandomizedpacketroutingalgorithmscan befound in:\"Optimalrouting algorithmsfor mesh-connectedprocessorarrays,\" by S.Rajasekaranand T.Tsantilas,Algorithmica8 (1992):21-38.\"Randomisedalgorithmsfor packetroutingonthe mesh,\"by S.Rajasekaran,in Advances in ParallelAlgorithms,editedby L.Kronsjoand D.Shumsherud-din, Blackwell,1992,pp.277-301.

Thereexistoptimaldeterministicalgorithmsfor packetroutingon themesh.Seefor example:\"Parallelpermutationand sortingalgorithmsand a generalized

interconnection network,\"by D. Nassimi and S.Sahni,Journalof the ACM 2%, no.3(1982):642-667.\"A 2n \342\200\224 2 stepalgorithmfor routing in an n x n mesh,\"by T. Leighton,F.Makedon, and I. Tollis,Proceedingsof the ACM Symposiumon ParallelAlgorithmsand Architectures,1989,pp.328-335.\"Constantqueueroutingon a mesh,\"by S.Rajasekaranand R. Overholt,Journalof Paralleland Distributed Computing15(1992):160-166.

Sparseenumerationsortwas introducedby D.Nassimi and S.Sahni.

Therandomizedand deterministicselectionalgorithmspresentedin thischaptercan be found in \"Unifying themesfor network selection,\"by S.Rajasekaran, W. Chen,and S.Yooseph,Proceedingsof the Fifth InternationalSymposiumon Algorithmsand Computation, August 1994.

For a comprehensivecoverageof sorting and selectionalgorithmssee\"Sortingand selectionon interconnectionnetworks,\"by S.Rajasekaran,DI-MACS Seriesin DiscreteMathematicsand TheoreticalComputerScience21,1995:275-296.

TheShearsortalgorithmwas independentlypresentedin:\"Someparallelsortson a mesh-connectedprocessorarray and their timeefficiency,\"by K.Sadoand Y. Igarashi,Journalof Paralleland DistributedComputing3 (1986):398-410.

Page 727: Sahni

14.10.ADDITIONALEXERCISES 719

\"Shear-sort:A true two-dimensionalsortingtechniquefor VLSInetworks,\"by I.Scherson,S.Sen,and A. Shamir,Proceedingsof the InternationalConference on ParallelProcessing,1986,pp.903-908.

Theodd-evenmergesorton the mesh is basedon \"Sortingon a mesh-connectedparallelcomputer,\"by C.Thompsonand H.Kung,Communications of the ACM20,no.4 (1977):263-271.Formoreon sortingalgorithmson the meshsee:\"Randomizedsortingand selectiononmesh-connectedprocessorarrays,\"byC.Kaklanianis,D.Krizanc,L.Narayanan, and T.Tsantilas,Proceedingsofthe ACM Symposiumon ParallelAlgorithmsand Architectures,1991.\"Blockgossipingon gridsand tori:Deterministicsortingand routingmatchthe bisectionbound,\"by M. Kunde,Proceedingsof the First AnnualEuropean Symposiumon Algorithms,1993,pp.272-283.M. Kaufmann, J.Sibeyn,and T.Suel, \"Derandomizingalgorithmsforrouting and sortingon meshes,\"Proceedingsof the Symposiumon DiscreteAlgorithms, 1994,pp.669-679.

Mesh algorithmsfor various graphproblemscan be found in:\"Solvingtreeproblemson a mesh-connectedprocessorarray,\" by M. Atal-

lah and S.Hambrusch,Informationand Computation69(1986):168-187.\"Graphproblemson a mesh-connectedprocessorarray,\" by M. Atallah andS.Kosaraju,Journalof the ACM31,no.3 (1984):649-667.

Optimalalgorithmsfor convexhull and relatedproblemscan beseenin:\"Finding connectedcomponentsand connectedoneson a mesh-connectedparallelcomputer,\"by D.Nassimiand S.Sahni,SIAM JournalonComputing 9,no.4 (1980):744-757.\"Meshcomputeralgorithmsfor computationalgeometry,\"by R. Miller andQ.Stout, IEEETransactionson Computers38,no.3 (1989):321-340.\"Parallelgeometricalgorithmsona mesh-connectedcomputer,\"by C.Jeongand D.Lee,Algorithmica 5, no.2 (1990):155-177.\"Optimalmeshalgorithmsfor the Voronoi diagramof linesegmentsandmotion planningin the plane,\"by S.Rajasekaranand S.Ramaswanii,Journalof Paralleland Distributed Computing26,no.1(1995):99-115.

14.10ADDITIONALEXERCISES

1.A binary tree of processors,or simply a binary tree, is a completebinary tree in which thereis a processorat eachnodeand the linkscorrespondto communicationlinks.Figure14.30showsa 4-leafbinarytree.Theinputs to the binary treeareusually at the leaves.An n-leaf

Page 728: Sahni

720 CHAPTER14.MESHALGORITHMS

Figure14.30A binary treeof processors

binary tree has 2n \342\200\224 1processorsand is of height logn.If there isa numberat eachleaf, we can computethe sum of thesenumbersasfollows. Each leafstartsby sendingits numberto its parent. Everyinternalprocessor,on receiptof two numbersfrom below,addsthemand sendsthe result to its parent.Usingthis simplestrategy, the sumis at the rootafter lognsteps.You arerequiredto solvethe prefixcomputationproblemon an n-leafbinary tree. Thereis an elementto beginwith at eachleafprocessor.The prefixesshouldalsobe output from the leaves.Show how youperformthis task in O(logn) time.

2. Assume that eachleafof an n-leafbinary tree (seeExercise1) haslogn elementsto beginwith. The problemis tocomputethe prefixesof thesenlognelements.Presentan O(logn) timealgorithmfor thisproblem.Note that suchan algorithmis work-optimal.

3. Therearej0gn keys at eachleafof a (logn)-leafbinary tree(see

Exercise 1).Showhow to sortthesekeys in O(n) time.You canstore0(n)itemsat any processor.

4. Thereis a dataitemat eachleaf of an n-leafbinary tree (seeExercise1).Thegoalis to interchangethe dataitemsin the left half with thosein the right half. Presentan 0(n)timealgorithm.Is it possibletodevisea o(n) timealgorithmfor this problem?

5. Presentan efficient algorithmto sortan n-leafbinary tree(seeExercise1),in which eachleafis input a singlekey.

6. A meshof treesis a yfp x yjp meshin whicheachrow and eachcolumnhas an associatedbinary tree.The row orcolumnprocessorsform theleavesof the correspondingbinary trees.Figure14.31is a 4 x 4 meshof treesin which only the columntreesareshown.In a yfp x yjp mesh

Page 729: Sahni

14.10.ADDITIONALEXERCISES 721

of trees,thereare y/p data itemsin the first row. They have to beroutedaccordingto an arbitrary permutationn and the resultstoredin the samerow. Presentan 0(logp)timealgorithmfor this problem.

Figure14.31A meshof trees(only columntreesareshown)

7. Thereis a key at eachprocessorin the first row of a y/p x y/p meshof trees(seeExercise6).Presentan 0(logp)-timealgorithmto sortthesekeys.

8.The first row of a y/p x y/p mesh of trees (seeExercise6) has y/ppoints in the plane(onepointperprocessor).Showhow you computethe convexhull of thesepoints in O(logp)time.

9. Every processorof a y/p x y/p meshof trees(seeExercise6) has a dataitem.The goalis to perform a prefix computationon the wholemesh(insnakelikerow majororder).Showthat this canbedonein O(logp)time.

10.Show that the FFT (seeSection9.3)of a vectorof length n can becomputedin O(logn) timeon an n x n meshof trees(seeExercise6).

11.Provethat an n x n matrixcanbe multipliedwith an n x 1vectorin

O(logn) timeon an n x n meshof trees(seeExercise6).12.Theproblemof one-dimensionalconvolution takesas input two arrays

I[0:n \342\200\224 1]and T[0: m \342\200\224 1].The output is anotherarray C[0:n \342\200\224 1],whereC[i]= ET=o 7K\302\253

+ k) modn\\T[k]-, for 0 < i < n. Employ an

Page 730: Sahni

722 CHAPTER14.MESHALGORITHMS

n-processormeshto solve this problem.Assume that eachprocessorhas 0(m)localmemory. What is the run timeof your algorithm?

13.SolveExercise12on an n-processormeshof trees(seeExercise6).14.SolveExercise12on an n-leafbinary tree (seeExercise1).15.Theproblemof template matchingtakesas input two matricesI[0:

n \342\200\224 1,0 : n \342\200\224 1]and T[0: m \342\200\224 1,0:m \342\200\224 1].The output is a matrixC[0: n-1,0:n-1],where

m\342\200\2241m\342\200\2241

C[i,j]=Y^ ^2 I[(i+k) modn,(j+I) modn]T[k,l]fc=0 1=0

for 0 < i,j <n.Presentan n2-processormeshalgorithmfor templatematching. Assume that eachprocessorhas 0(m)memory. What isthe timeboundof your algorithm?

16.SolveExercise15on an n2-processormeshof trees(seeExercise6).17.SolveExercise15on an n2-leafbinary tree (seeExercise1).

Page 731: Sahni

Chapter 15

HYPERCUBEALGORITHMS

15.1COMPUTATIONALMODEL

15.1.1TheHypercubeA hypercubeof dimensiond, denotedHd, has p = 2d processors.Eachprocessorin T-Ld can be labeledwith a d-bit binary number. Forexample,the processorsof ^3 canbelabeledwith 000,001,010,011,100,101,110,and111(seeFigure15.1).We use the samesymbol to denotea processorandits label.If v is a d-bit binary number,then the first bit of v is the mostsignificantbit of v. The secondbitoivis the next-mostsignificantbit.Andso on.The dth bitoivis its leastsignificantbit.Let t>W stand for the binarynumberthat differs from v only in the ith bit.For example,if v is 1011,then vW is 1001.A four-dimensionalhypercubeis shown in Figure15.2.

Any processorv in W.d is connectedonly to the processorsv^' for i =1,2,...,d. In %3, for instance,the processor110is connectedto theprocessors 010,100,and 111(seeFigure 15.1).The link (u,t>M)is calledalevel i link. The link (101,001)is a level one link. Sinceeachprocessorinlidis connectedto exactlyd otherprocessors,the degreeof 'Hd is d. TheHammingdistancebetweentwo binary numbersu and v is defined to bethe numberof bit positionsin which they differ. For any two processorsuand v in a hypercube,thereis a path betweenthem of lengthequal to theHammingdistancebetweenu and v. Forexample,thereis a path of length4 betweenthe processors10110and 01101in a five-dimensionalhypercube:10110,00110,OHIO,01100,OllOl.In generalif u and v areany two

processors,a path betweenthem (of lengthequalto their Hammingdistance)can

bedeterminedin the following way. Letii.1'2,\342\226\240\342\226\240

\342\226\240, ik be the bit positions(inincreasingorder) in which u and v differ. Then, the following path exists

723

Page 732: Sahni

724 CHAPTER15.HYPERCUBEALGORITHMS

no

010

\\

100

111

on

101

000 001

Figure15.1A three-dimensionalhypercube

Figure15.2A hypercubeof dimensionfour

Page 733: Sahni

15.1.COMPUTATIONALMODEL 725

betweenu and v. u, to,, , tOj2, \342\226\240\342\226\240

\342\226\240, Wik, v, wherewi has the samebitsas v inpositions1 through ij and the rest of the bitsare the sameas thoseof u

(for 1< j < k). In otherwords, for eachstep in the path, one bit of u is\"corrected\"to coincidewith the correspondingbit of v.

It follows that the diameter(for a definition seeSection14.3)of a d-dimensionalhypercubeis equal to d sinceany two processorsu and v candiffer in at most d bits.For instance,the Hammingdistancebetweentheprocessors00\342\200\242\342\200\242\342\200\242 0 and 11\342\200\242\342\200\242\342\200\2421is d. Every processorof the hypercubeis aRAM with somelocalmemory and can performany of the basicoperationssuchasaddition,subtraction,multiplication,comparison,localmemoryaccess, and so on, in one unit of time.

Interprocessorcommunicationhappenswith the help of communicationlinks (that is,links) in a hypercube.If thereis no link connectingtwo givenprocessorsthat desireto communicate,then communicationis enabledusing any of the paths connectingthemand hencethe timefor communicationdependsonthe path length.Therearetwo variants of the hypercube.In thefirst version,knownas the sequentialhypercubeorsingle-porthypercube,it isassumedthat in oneunit of timea processorcancommunicatewith only oneof its neighbors.In contrast,the secondversion,known as the parallelhypercube or multiport hypercube,assumesthat in oneunit of timea processorcancommunicatewith allits d neighbors.In our discussion,we indicatetheversion used when needed.Both theseversionsassumesynchronouscomputations; that is,in every timeunit, eachprocessorcompletesits intendedtask.

A hypercubenetworkpossessesnumerousspecialfeatures.Oneis its lowdiameter.If therearep processorsin a hypercube,then its diameteris onlylogp. On the otherhand,a meshwith the samenumberof processorshas adiameterof 2(y/p \342\200\224 1).Also, a hypercubeV-d+i canbe built recursivelyasfollows. Take two identicalcopiesof H^. Callthem %' and %\".Prefixthelabelof eachprocessorin %' with zeroand prefix thoseof %\" with ones.Ifv is any processorof %',connectit with its correspondingprocessorin V.\".

Example15.1The hypercubeof Figure15.1canbe built from two copiesof V.2. Each%2 has four processors,00,01,10,11.Nodesin %'areprefixedwith zeroto get 000,001,010,011.Nodesin %\" are prefixedwith one toget 100,101,110,111.Now connectthe correspondingprocessorswith links;that is,connect000and 100,001and 101,and soon.The result is Figure15.1.

Similarly, a four-dimensionalhypercubecan be constructedfrom twocopiesof a three-dimensionalhypercubeby connectingthe correspondingnodeswith links (seeFigure15.2).And so on. \342\226\241

Likewise,V.^ has two copiesof V-d-i (for d > 1).For example,all theprocessorsin H^ whose first bit is zeroform a subcube\"Hd-i (ignoringall

Page 734: Sahni

726 CHAPTER15.HYPERCUBEALGORITHMS

the otherprocessorsand links from them). Also, all the processorswhosefirst bit is one form a %d-\\- Howabout all the processorswhoseqih bit iszero(or one), for some1< q < d? They alsoform a %d-i^ Equivalently,if we remove all the level i links (for some1< i < d), we end up with twocopiesof %d-\\- In generalif we fix somei bitsand vary the remainingbitsof a d-bit number,the correspondingprocessorsform a subcubeV-d-i in lid-

15.1.2TheButterflyNetworkThe butterfly network is closely relatedto the hypercube. Algorithmsdesigned for the butterfly can easily be adaptedfor the hypercubeand viceversa.In fact, for severalproblemsit is easierto developalgorithmsfor thebutterfly and then adaptthem to the hypercube.

A d-dimensionalbutterfly, denotedBd, has p = (d + l)2d processorsandd2d+1links.Eachprocessorin Ba canberepresentedasa tuple (r,\302\243),

where0 < r < 2d - 1 and 0 < \302\243

< d. The variabler is calledthe row of theprocessorand \302\243 is calledthe level of the processor.A processoru = (r,l) inBd is connectedto two processorsin level \302\243 + 1 (for 0 < \302\243 < d). Thesetwo

processorsare v =(r,\302\243

+ 1) and w ={Al+l\\\302\243 + 1).Therow numberof v

is the sameas that of u and the row numberof w differs from r only in the(\302\243

+ l)th bit.Both v and w are in level \302\243 + 1.The link (u,v) is known asthe directlink and the link (u,w) is known as the crosslink. Both of theselinks arecalledlevel

(\302\243

+ 1) links.

row 000001010011100101110111level = 0

level = 1

level = 2

level = 3

Figure15.3A three-dimensionalbutterfly

\302\243?3is shown in Figure15.3.In Figure15.3,for example,the processor

(Oil,1) is connectedto the processors(011,2)and (001,2).Sinceeachpro-

Page 735: Sahni

15.1.COMPUTATIONALMODEL 727

cessoris connectedto at mostfour otherprocessors,the degreeof Bd (forany d) is four and henceis independentof the size(p) of the network.If uis any processorin level 0 and v is any processorin level d, thereis a uniquepath betweenu and v of lengthd.Let u = (r,0) and v =

(?\342\200\242', d). Theuniquepath is (r,0),(n,l),(r2>2),..., {r',d),wherer\\ has the samefirst bit asr', r2 has the samefirst and secondbitsas r',and soon.Note that suchapath existsby the definition of the butterfly links. We refer to suchpathsas greedypaths.

Example15.2In Figure15.3,let u = (100,0)and v = (010,3).Then, theunique path betweenu and v is (100,0),(000,1),(010,2),(010,3). \342\226\241

As a consequence,it follows that the distancebetweenany two processorsin Bd is < 2d.So,the diameterof Bd is 2d.A butterfly alsohas a recursivestructurelike that of a hypercube. For example,if the level 0 processorsand incidentlinks areremovedfrom Bd, two copiesof Bd-\\result.In Figure15.3,removal of level zeroprocessorsand links yields two copiesof Bi-

Thereis a closerelationshipbetweenHd and Bd- If eachrow of a Bd iscollapsedinto a singleprocessorpreservingall the links,then the resultantgraph is

%</\342\200\242

In Figure15.3.collapsingeachrow into a singleprocessor,weget eightprocessors.Theseprocessorscansimply be labeledwith their rownumbers.When this is done,the collapsedprocessor110,for example,haslinks to the processors010,100,and 111,which areexactlythe sameas ina hypercube. Of coursenow therecan be multiplelinks betweenany twoprocessors;we keeponly one copy. Also, the crosslink from any processor(r, \302\243)

tolevel \302\243 +1correspondstothe level(\302\243

+1) link of r in the hypercube.As a resultof this correspondence,we get the following lemma.

Lemma 15.1Eachstepof Bd can besimulatedin onestepon the parallelversionof Hd- Also eachstep of Bd can be simulatedin d stepson thesequentialversionof Hd- \342\226\241

Definition15.1Any algorithmthat runson Bd is saidto bea normalbutterfly algorithm if at any given time,processorsin only one level participatein the computation. \342\226\241

Lemma 15.2A singlestepof any normalalgorithmon Bd canbesimulatedin onestepon the sequentialHd- \342\226\241

15.1.3Embeddingof OtherNetworks

Many networks suchas the ring,mesh,and binary treecan be shown tobesubgraphsof a hypercube.A generalmappingof one network into anotheris calledan embedding.More precisely, if G{V\\,E\\) and i?(V2,i?2)are any

Page 736: Sahni

728 CHAPTER15.HYPERCUBEALGORITHMS

Figure15.4Embedding- an example

two connectednetworks, an embeddingof G into H is a mappingof V\\ intoV-2. Embeddingresultsareimportant,for instance,to simulateonenetworkon another.Usingan embeddingof G into H,an algorithmdesignedfor Gcan besimulatedon H.

Example15.3Referringto Figure15.4,one possiblemappingof thevertices of G into thoseof H is 1->6, 2 ->c,and 3 ->a.With this mapping,the link (1,2)is mappedtothe path (b,d),(d,c).Similarly (1,3)is mappedto (b, d),(d,a).And soon. \342\226\241

Definition15.2Theexpansionof an embeddingis defined to be Wpr. Thelength of the longestpath that any link of G is mappedto is calledthedilation.The congestionof any link of H is defined to be the numberofpaths (correspondingto the links of G) that it is on.Thecongestionof theembeddingis defined to be the maximumcongestionof any link in H. \342\226\241

Example15.4For the graphs of Figure 15.4,the expansionis |.Thedilationis 2,sinceevery link of G is mappedtoa path of length2 in H. Thecongestionof the link (6,d) is 2 sinceit is on the paths for the links (1,2)and (1,3).Thecongestionof every otherlink of Hcanalsoseento be2.Sothe congestionof the embeddingis 2. \342\226\241

Embeddingofa ringIn this sectionwe show that a ring with 2d processorscanbe embeddedinHd- Recallthat the processorsof V.^ arelabeledwith d-hitbinary numbers.

Page 737: Sahni

15.1.COMPUTATIONALMODEL 729

If 0,1,...,2rf \342\200\224 1arethe processorsof the ring,processor0 is mappedto theprocessor00\342\200\242\342\200\242\342\200\2420 of

%</\342\200\242

The mappingsfor the otherprocessorsareobtainedusingthe Gray code.

Definition15.3The Gray codeof orderk, denotedGk-, definesan orderingamongall the fc-bit binary numbers.The Gray codeG\\ is defined as 0,1.The Gray codeGk (for any k > 1) is defined recursively in termsof Gk-ias

0[\302\243fc_i], l[^._i]r.Here0[\302\243/t_i]

stands for the sequenceof elementsinGk-isuchthat eachelementis prefixedwith a zero.Theexpressionl^^i]7\"stands for the sequenceof elementsin Gk-im reverseorder,whereeachelementis prefixedwith a one. \342\226\241

Example15.50[Gi]correspondsto 00,01andl[\302\243i]r correspondsto 11,10.

Thus G-2 is 00,01,11,10(seeFigure15.5).Given G2, Gz cannow bederivedas 000,001,011,010,110,111,101,100,etc. \342\226\241

0,1

prefixwith zero

00 11

1,0prefixwith one

11 00 ^2

Figure15.5Constructionof Gray codes an example

Oneof the propertiesof Gk 1Sthat any two adjacententriesdiffer in onlyonebit.This meansthat Ga IS an orderingof all the processorsof Hd suchthat any two adjacentprocessorsareconnectedby a link. Let g(i,k) denotethe ith. elementof Gk- Then, map processori (0< i < 2d \342\200\224 1)of the ring toprocessorg(i,d)ofhd-Suchan embeddinghas an expansion,dilation,andcongestionof one.For a ringof eightprocessors,the embeddingof the ringinto H3 is given by 0 -\302\273 000,1-\302\273 001,2 ->011,3 ->010,4 ->110,5 ->111,6 ->101,and 7 ->100.Lemma 15.3A ring with 2d processorscanbeembeddedinto 'Hdso as tohave an expansion,dilation,and congestionof one. \342\226\241

Page 738: Sahni

730 CHAPTER15.HYPERCUBEALGORITHMS

Embeddingofa torusLet M. bea torus (for a definitionseeSection14.2,Exercise12)of size2r x2C.We show that there is an embeddingof M. into Hr+C whoseexpansion,dilation,and congestionareallone.This follows as a corollaryto Lemma15.3.Thereare 2r rows and 2C columnsin M.. As has beenmentionedbefore,if we fix any q bitsof a d-bit numberand vary the otherbits,theresultantnumbersdescribeprocessorsof a subcubeV-d-q of %</.

Therefore,if we fix the r most-significantbits(MSBs)of an (r +c)-bitbinary numberand vary the otherbits,2C numbersarisewhich correspondto a subcubeHc. In accordancewith Lemma15.3,this subcubehas aring embeddedin it. For eachpossiblechoicefor the r MSBs,there is acorresponding1tc.A row of Ai getsmappedto onesuchHc.In particular,row i getsmappedto that subcubeHcall of whose processorshave g(i,r)for their r MSBs.Thismappingof a ringinto Hcis asdescribedin the proofof Lemma15.3.In otherwords, if (i,j)is any processorof .M,it is mappedto the processorg(i,r)g(j,c).

Sinceall the processorsin any given row of M. are mappedinto a Hc,in accordancewith Lemma15.3,the mappedprocessorsand links form aring. Likewise,all the processorsin any columnof M. get mappedto a %r,and hencethe correspondingprocessorsand links of V.r alsoform a ring.Therefore,this embeddingresultsin an expansion,dilation,and congestionof one.

Lemma 15.4A 2r x 2C meshcan be embeddedinto 'Hr+csothat theexpansion, dilation,and congestionareone. \342\226\241

Example15.6Figure15.6shows an embeddingof a 2 x 4 torus into 'H^.For the torus thereare two rows, namely, row 0 and row 1. Therearefour columns:0,1,2,and 3.Forinstance,the node (1,2)of the torus(Figure 15.6(a))is mappedto the nodeg(l,l)g(2,2)= 111of the hypercube(Figure15.6(b)).In the figure both (1,2)and 111arelabeledg. \342\226\241

Embeddingofa binary treeTherearemany ways in whicha binary treecanbeembeddedinto ahypercube. Herewe show that a p-leaffull binary treeT (wherep = 2d for someintegerd) can beembeddedinto Hd- Note that a p-leaffull binary treehasa totalof 2p \342\200\224 1processors.Hencethe mappingcannotbeone-to-one.Morethan one processorof T may have to be mappedinto the sameprocessorof lid- If the tree leaves are0,1,...,p\342\200\224 1,then leafi is mappedto theith processorof %d- Each internalprocessorof T is mappedto the sameprocessorof Hd as its leftmost descendantleaf; alsoFigure15.7shows the

Page 739: Sahni

15.1.COMPUTATIONALMODEL 731

^00) (U) (0,2)^^3)

e^\342\200\224i# a

(1,0) (1,1) (1,2) (1,3)

(a)

001

(b)

010

Figure15.6Embeddingof a torus- an example

110

000 001 010 Oil 100 101 110111

Figure15.7Embeddinga binary tree into a hypercube

Page 740: Sahni

732 CHAPTER15.HYPERCUBEALGORITHMS

embeddingof an eight-leafbinary tree.Thelabeladjacentto eachprocessoris the hypercubeprocessorthat it is mappedto.

Theembeddingjust discussedcouldbe used to simulatetreealgorithmsefficiently on a sequentialhypercube. If any stepof computationinvolvesonly onelevel of the tree,then this stepcanbesimulatedin onestepon thehypercube.

EXERCISES1.How many links aretherein 'H(p-

2.What is the bisectionwidth of H^3. Computethe bisectionwidth of Bj.4. Derive the Gray codesQ\\ and

Q\302\247.

15.2 PPR ROUTINGTheproblemof PPR was defined in Section14.2as:Eachprocessorin thenetwork is the originof at mostonepacketand is the destinationof at mostonepacket;sendthe packetsto their destinations.In this sectionwe developPPRalgorithmsfor 'H^,-

15.2.1A GreedyAlgorithmWe considerthe problemof routingon Bd, wherethereis a packetat eachprocessorof level 0.The destinationsof the packetsare in level d in suchaway that the destinationrows form a partialpermutationof the originrows.In #3,for instance,the originrows couldbeallthe rows and the destinationrows couldbe 001,000,100,111,101,010,011,110.A greedy algorithmforroutingany PPR is to let eachpacketusethe greedy path betweenits originand destination.The distancetraveledby any packetis d usingthisalgorithm. To analyze the run timeof this algorithm,we only needto computethe maximumdelay any packetsuffers.

Let u = (r, \302\243)

be any processorin B^. Then thereare < 2e packetsthat can potentially go throughthe processoru. This is becauseu has twoneighborsin level \302\243

\342\200\224 1,eachoneof whichhas two neighborsin level^\342\200\2242,

andsoon.As an example,the only packetsthat can go throughthe processor(011,2)have originrows001,011,101,and 111(inFigure15.3).Similarly,apacketthat goesthroughu canreachonly oneof 2d~ipossibledestinations.This impliesthat the maximumnumberof packetsthat can contendfor anylink in level \302\243 is min {2^_1,2d~^}.Let ir bean arbitrary packet;it can only

Page 741: Sahni

15.2.PPRROUTING 733

suffer a delay of < min {2f ] ,2df} in crossinga level \302\243 link and has tocrossa level \302\243 link for \302\243

= 1,2,...,d. Thus the maximumdelay the packetcan suffer is D = Ylg=i min {2i-\\2d~t}.Assumewithout lossof generalitythat d is even.Then,D canberewrittenasD= 5Zi=i^~X+YHe.=d/2+i ^d~^= 2 * 2dl'1-2= 0{2dl2).The value 0{2dl2)is alsoan upperboundon thequeuelengthof the algorithm,sincethe numberof packetsgoingthroughany processorin level \302\243 is < min {2i,2d~i}.Themaximumof this numberover allts is 2d'2.

Lemma 15.5Thegreedy algorithmon B^ runs in 0{2dl2)time,the queuelengthbeing0{2d/2). 0

15.2.2A RandomizedAlgorithmWe can improve the performanceof the precedinggreedy algorithmdrastically usingrandomization.Recallthat in the caseof routingon the mesh,we were ableto reducethe queuelength of the greedy routing algorithmwith the introductionof an additionalphasewherethe packetsaresent torandomintermediateprocessors.The reasonfor sendingpacketstorandomprocessorsis that with highprobability a packetdoesnot get tomeetmanyother packetsand hencethe numberof possiblelink contentionsdecreases.A similarstrategy can beappliedon the butterfly also.

The routingproblemconsideredis the sameas before;that is,thereisa packetat eachprocessorof level zeroand the packetshave destinationsin level d. Therearethreephasesin the algorithm.In the first phaseeachpacketchoosesa randomintermediatedestinationin level d and goesthereusingthe greedy path. In the secondphaseit goesto its actualdestinationrow but in level zero. Finally, in the third phase,the packetsgo to theiractualdestinationsin level d. In the third phase,eachpackethas to travelto level d usingthe directlink at eachlevel.This takesd steps.Figure15.8illustratesthesethreephases.In this figure, r is a randomnodein level d.The variablesu and v are the originand destinationof the packetunderconcern.The secondphaseis the reverseof phase1,and henceit sufficestocomputethe run timeof phase 1 to calculatethe run timeof the wholealgorithm.Thefollowing lemmaproves helpful in the analysis of phase1.

Lemma 15.6[Queuelinelemma] Let V bethe collectionof paths to betakenby packetsin a network.If the paths in V are nonrepeating\\ then the delaysufferedby any packetir is no morethan the numberof distinctpacketsthatoverlap with -k. A set of paths V is saidto benonrepeatingif any two pathsin V that meet,sharesomesuccessivelinks,and divergenever meetagain.Forexample,the greedy paths in B& arenonrepeating.Two packetsaresaidto overlap if they shareat leastone link in their paths.

Page 742: Sahni

734 CHAPTER15.HYPERCUBEALGORITHMS

r vphase3

Figure15.8Threephasesof randomizedrouting

Proof:Let tt bean arbitrary packet.If tt is delayed by eachof the packetsthat overlap with tt no morethan once,the lemmais proven. Else,if apacket(callit q) overlappingwith tt delays tt twice (say), then q has beendelayed by anotherpacketwhich alsooverlaps with tt and which never getsto delay tt. \342\226\241

Analysisofphase1Let tt bean arbitrary packet.Also let ej bethe link that tt traversesin leveli, for 1< i < d. To computethe maximumdelay that tt canever suffer, it

suffices to computethe numberof distinctpacketsthat overlap with tt (c.f.the queuelinelemma).If rii is the numberof packetsthat have the link ej in

Page 743: Sahni

15.2.PPRROUTING 735

their paths, then D = J2i=ini *s an upper boundon the numberof packetsthat overlap with ir.

Considerthe link t{. The number of packetsthat can potentially gothrough this link is 2i~1sincethereare only 2i_1processorsat level zerofor which therearegreedy paths throughthe link et.Eachsuchpackethasa probability of \342\226\240\342\200\224 of goingthroughej. This is becausea packetstartingat level zerocan take eitherthe directlink or the crosslink, eachwith aprobability of ^. Onceit reachesa processorat level one,it can againtakeeithera crosslink or a directlink with probability i. And soon. If thepackethas togo through eZl it shouldpick the right link at eachlevel andthereare i suchlinks.

Therefore,the number rii of packetsthat go though ej is a binomialB(21\"1,i).The expectedvalue of this is 5.Sincethe expectationof a sumis the sum of expectations,the expectedvalue of Ylt=i ni IS f \342\200\242 Now we showthat the totaldelay is 0(d)with highprobability. The variable D is upperboundedby the binomialB(d,5).UsingChernoffboundsEquation1.1,

Prob.[Z?> ead] < (j^)\"\"*eead~d/2< (^b) eead

^(it)\342\204\24211

<1-ead<p~a~l

Herea > 1and we have madeuse of the fact that d = @(logp).Sincethereare < p packets,the probability that at leastone of the packetshasa delay of morethan 2eadis <p~a~1p= p~a. We arrive at the followingtheorem.

Theorem15.1The randomizedalgorithmfor routingon B^ runs in time6(d). \342\226\241

Sincethediameterof any networkis a lowerboundonthe worst-casetimefor PPR in any network, the above algorithmis asymptotically optimal.

QueuelengthanalysisThequeuelengthof the precedingalgorithmis also0(d).Let V{ beanyprocessor in level i (for 1< i <d). The numberof packetsthat can potentiallygo thoughthis processoris 2*. Eachsuchpackethas a probability of ~i ofgoingthroughV{. Thusthe expectednumberof packetsgoingthroughuj is2l~i= 1.UsingChernoffboundsEquation1.1,the numberof packetsgoingthroughVi canbe shown to be0(d).

Page 744: Sahni

736 CHAPTER15.HYPERCUBEALGORITHMS

Theorem15.1togetherwith Lemma15.1yields Theorem15.2.

Theorem15.2Any PPR canberoutedon a parallelT-L^ in 0(d)time,thequeuelengthbeing0(d). \342\226\241

EXERCISES

1.Lemma15.5proves an upper bound on the run timeof the greedyalgorithmon B&. Prove a matchinglower bound. (Hint: Considerthe bit reversalpermutation.In this permutationif 6i&2'''&<2 is theoriginrow of any packet,its destinationrow is b^bd-i \342\226\240\342\226\240\342\226\24062&1\342\226\240 Forthispermutation,computethe traffic throughany level|link.)

2.Assume that d packetsoriginatefrom every processorof level zeroin a Bd- Thesepacketsare destinedfor level d with d packetsperprocessor.Analyze the run timeand queuelengthof the randomizedroutingalgorithmon this problem.

3. If q packetsoriginatefrom every processorof level zeroandeachpackethasa randomdestinationin level d of a B^,presenta routingalgorithmthat runs in time0(q+d).

4. In a B&, at mostone packetis destinedfor any processorin level d.The packet(if any) destinedfor processori is at the beginningplacedrandomly in one of the processorsof level zero(eachsuchprocessorbeingequally likely).Thereareonly a totalof

(2d)\342\202\254 packets,for someconstante > 0. If the greedy algorithmis used to route,what is theworst-caserun time?What is the queuelength?

5. Forthe routingproblemof Exercise4, what is the run timeand queuelength if the randomizedrouting algorithmof Section15.2.2isemployed?

15.3FUNDAMENTALALGORITHMS

In this sectionwe presenthypercubealgorithmsfor suchbasicoperationsasbroadcasting,prefix sumscomputation,and data concentration.All thesealgorithmstakeO(d) timeon lid-Sincethe diameteris a lower boundonthe solutiontimeof any nontrivial problemin an interconnectionnetwork,thesealgorithmsareasymptotically optimal.

Page 745: Sahni

15.3.FUNDAMENTALALGORITHMS 737

000001 010Oil 000001 010Oil 000001 010Oil

step1 step2

Figure15.9Broadcastingon a %2

15.3.1BroadcastingThe problemof broadcastingin an interconnectionnetwork is to send acopy of a messagethat originatesfrom a particularprocessorto a subsetofotherprocessors.Broadcastingis quite useful sinceit is widely used in thedesignof severalalgorithms.To perform broadcastingon Hd, we employ thebinary treeembedding(seeFigure15.7).Assume that the messageM to bebroadcastis at the rootof the tree(i.e.,at the processor00 \342\226\240\342\226\240\342\226\2400).Therootmakestwo copiesof M and sendsa copy to eachof its two childrenin thetree.Eachinternalprocessor,on receiptof a messagefrom its parent,makestwo copiesand sendsa copy to eachof its children.This proceedsuntil allthe leaveshave a copy of M.Note that the heightof this treeis d.Thus in0(d)steps,eachleaf processorhas a copy of M.

In this algorithm,computationhappensonly at one level of the treeatany given time. Thus eachstepof this algorithmcan be run in one timeunit on the sequential%d-

Lemma 15.7Broadcastingof a messagecanbedoneon the sequential%din Q(d) time. \342\226\241

Example15.7Stepsinvolved in broadcastingona %2 areshown in Figure15.9.Thealgorithmcompletesin two steps. \342\226\241

15.3.2PrefixComputationWe againmake useof the binary treeembeddingto perform prefixcomputation on Hd- Let xi be input at the ith leaf of a 2d-leafbinary tree.Therearetwo phasesin the algorithm,namely, the forward phaseand the reversephase.In the forward (reverse)phase,data itemsflow from bottomto top

Page 746: Sahni

738 CHAPTER15.HYPERCUBEALGORITHMS

(top to bottom).In eachstepof the algorithmonly one level of the tree isactive.Algorithm 15.1gives the algorithm.

Forward phase

Theleavesstart by sendingtheir dataup to theirparents. Each internalprocessoron receiptof two items(say y from its left child and z from its right child)computesw = y \302\251 z, storesa copy of y and w, andsendsw to its parent.At the end of d steps,eachprocessor in the treehas storedin its memory the sum ofall the data itemsin the subtreerootedat thisprocessor. In particular,the roothas the sum of all theelementsin the tree.

Reversephase

The rootstartsby sendingzerotoits left childand itsy to its right child.Eachinternalprocessoron receiptof a datum (say q) from its parent sendsq to its leftchild and q \302\251 y to its right child.When the ith leafgetsa datum q from its parent,it computes<? \302\251 x; andstoresit as the final result.

Algorithm15.1Prefixcomputationon a binary tree

Example15.8Let S bethe setof allintegersand \302\251 be the usualaddition.Considera four-leaf binary tree with the following input: 5,8,1,3.Figure15.10shows the executionof every step of Algorithm 15.1.The datuminsideeachinternalprocessoris its y-value. In step1,the leavessendtheirdata up (Figure15.10(a)).In step2, the internalprocessorssend 13and4, respectively,storing5 and 1 (Figure15.10(b))as their y-values. In step3,the rootsends0 to the left and 13to the right (Figure15.10(c)).Inthenext step,the leftmost internalprocessorsends0 to the left and 5 to theright. The rightmostinternalprocessorsends13to the left and 14to theright (Figure15.10(d)).In step5, the prefixesarecomputedat the leaves.

D

In the forward phaseof Algorithm 15.1,eachinternalprocessorcomputesthe sum of allthe data in its subtree.Let v beany internalprocessorand v'be the leftmost leafin the subtreerootedat v. Then, in the reversephaseof

Page 747: Sahni

15.3.FUNDAMENTALALGORITHMS 739

(a) (b) (c)

(d) (e)

Figure15.10Prefixcomputationon a binary tree

the algorithm,the datum q receivedby v canbeseentobe X^=o xi- Thatis,q is the sum of all input data itemsto the left of v'. Thecorrectnessofthe algorithmfollows. Also both the forward phaseand the reversephasetaked stepseach.Moreover, at any given timeunit, only one level of thetreeis active.Thuseachstepof Algorithm 15.1canbesimulatedin onestepon Hd.

Lemma 15.8Prefixcomputationon a 2 -leafbinary treeas well asHd canbeperformedin Q(d) timesteps. \342\226\241

Note:Theproblemof data sum is tocomputex\\ \302\251 X2 \302\251

\342\226\240\342\226\240\342\226\240

\302\251 xn, given theXj's.Theforward phaseof Algorithm 15.1suffices tocomputethe datasum.Thus the timeto computethe data sum is only one-halfthe timetaken tocomputeall the prefixes.

15.3.3DataConcentrationOn Hd assumethat thereare k <p data itemsdistributedarbitrarily withat most onedatum perprocessor.Theproblemof data concentrationis tomove the data into the processors0,1,...,k \342\200\224 1of Hd one data itemperprocessor.If we cancomputethe final destinationaddressfor eachdataitem,

Page 748: Sahni

740 CHAPTER15.HYPERCUBEALGORITHMS

then the randomizedroutingalgorithmof Section15.2can be employedtoroutethe packetsin time0(d).Note that the randomizedroutingalgorithmassumesthe parallelhypercube.

Thereis a much simplerdeterministicalgorithmthat runs in the sameasymptotic timeon the sequentialhypercube.In fact we presenta normalbutterfly algorithmwith the samerun timeand then invoke Lemma15.2.Welist somepropertiesof the butterfly network that areneededin the analysisof the algorithm.

Property1If the level d processorsand incidentlinks areeliminatedfromBd, two copiesof Bd-\\ result. As an example,in Figure15.11,removal oflevel 3 processorsand links resultsin two independent 's.One of thesebutterfliesconsistsof only even rows (shownwith thick lines)and the otherconsistsof only oddrows. Callthe former the even subbutterfly and thelatterthe oddsubbutterfly.

row 000001010011100101110111level = 0

level = 1

level = 2

level = 3

Figure15.11Removal of leveld processorsand links

Property2 All processorsat level d are connectedby a full binary tree.Forexample,if we traceall the descendantsof the processor00\342\200\242\342\200\242\342\200\2420 of levelzero,the resultis a full binary treewith the processorsof level d as its leaves.In fact this is true for eachprocessorat level zero.

Now we areready to describethe algorithmfor dataconcentration.Assume that the k <2ddata itemsarearbitrarily distributedin level d of Bd-At the end, thesedata itemshave to be moved to successiverows of level

Page 749: Sahni

15.3.FUNDAMENTALALGORITHMS 741

zero.Forexample,if therearefive itemsin level 3 of #3,row 001\342\200\224\342\226\272 a (thisnotationmeansthat the processor(001,3}has the itema), row 010\342\200\224> b, row100\342\200\224> c, row 101\342\200\224> d, and row 111\342\200\224> e, then at the end, theseitemswillbe at level zeroand row 000\342\200\224> a, row 001\342\200\224> b, row 010\342\200\224> c, row 011\342\200\224> <i,and row 100\342\200\224> e.Thereare two phasesin the algorithm.In the first phasea prefix sumsoperationis performedtocomputethe destinationaddressofeachdataitem.In the secondphaseeachpacketis routedtoits destinationusingthe greedy path from its originto its destination.

The prefix computationcanbe doneusingany of the treesmentionedinproperty 2 and Lemma15.8.Theprefix sumsarecomputedon a sequence,xq,x\\,....x2d_i,of zerosand ones.Leaf i setsXj toone if it has a datum,otherwiseto zero. In accordancewith Lemma15.8,this phasetakesO(d)time.

In the secondphasepacketsareroutedusingthe greedy paths.Theclaimis that no packetgets to meetany otherand hencethereis no possibility oflink contentions.Considerthe first stepin whichthe packetstravel from leveld to level d \342\200\224 1.If two packetsmeetat level d \342\200\224 1,it couldbeonly becausethey originatedfrom two successiveprocessorsof level d. If two packetsoriginatefrom two successiveprocessors,then they arealsodestinedfor twosuccessiveprocessors.In particular,one has an oddrow as its destinationand the otherhas an even row. That is,onebelongstothe oddsubbutterflyand the otherbelongsto the even subbutterfly (seeFigure15.11).Withoutlossof generality assumethat the packetsthat meetat level d \342\200\224 1meetat aprocessorof the oddsubbutterfly. Then it is impossiblefor oneof thesetwoto reachany processorof the even subbutterfly. In summary, no two packetscan meetat level d \342\200\224 1.

After the first step,the problemof concentrationreducestotwo subprob-lems:concentratingthe packetsin the oddsubbutterfly and concentratingthe itemson the even subbutterfly. But thesesubbutterfliesareof dimensiond \342\200\224 1.Thus by inductionit follows that thereis no possibility of any twopackets'meetingin the wholealgorithm.

Thefirst phaseas well as the secondphaseof this algorithmtakes@(d)timeeach.Alsonotethat the wholealgorithmis normal.We get this lemma.

Lemma 15.9Dataconcentrationcan be performedon Bd as well as thesequential%d in &(d) time. \342\226\241

Definition15.4Theproblemof data spreadingis thereare k < 2d itemsin successiveprocessorsat level zeroof Bd (starting from row zero). Theproblemis to routethem to somek specifiedprocessorsat level d (oneitemperprocessor).Thedestinationscan be arbitrary exceptthat the orderofthe itemsmust be preserved(that is,the packetoriginatingfrom row zeromust be the leftmost packetat level d, the packetoriginatingfrom row onemust bethe next packetat level <i, and soon). \342\226\241

Page 750: Sahni

742 CHAPTER15.HYPERCUBEALGORITHMS

Definition15.5Theproblemof monotonerouting is therearek < 2dpackets arbitrarily distributed,at mostoneperprocessor,at level d of Bd- Theyaredestinedfor somek arbitrary processorsat level zerosuchthat the orderof packetsis preserved. \342\226\241

Dataspreadingis just the reverseof dataconcentration.Also, monotoneroutingcanbeperformedby performinga dataconcentrationfollowed by adataspreading.Thuseachcan be donein time@(d).

Lemma 15.10Dataspreadingas well as monotoneroutingtakes&(d) timeon Bd and the sequentialT-Ld- a

15.3.4SparseEnumerationSortTheproblemof sparseenumerationsort was introducedin Section14.3.4.Let the sequencetobesortedon T-Ld beX = ko, &2,...,k^p-i,wherep = 2d.Without lossof generality assumethat d is even.Let the input begiven onekey perprocessorin the subcubedefined by fixing the first ^ bitsof a cf-bitnumbertozeros(andvarying the otherbits).Thesortedoutput alsoshouldappearin this subcubeonekey perprocessor.

Let v be any processorof %d- Itslabelis a cf-bit binary number.Thesamelabelcan be thoughtof as a tuple (i,j),wherei is the first ^ bitsandj is the next|bits.All processorswhose first ^ bitsarethe sameand equalto i form a %dft (f\302\260r

each0 < i < 2d>2 \342\200\224 1).Callthis subcuberow i. Alsoall processorsof 7-Cd whose last ^ bitsare the sameand equal toj form asubcube%dj2- Callthis subcubecolumnj. The^Jp numberstobe sortedare input in row zero. Theoutput alsoshouldappearin row zero. To bespecific,the key whoserank is r shouldbeoutput at the processor(0,r \342\200\224 1}(for 1< r < 2dl2).

To beginwith, kj is broadcastin columnj (for 0 < j < ^/p \342\200\224 1) sothateachrow has a copy of X. In row i computethe rank of ki. This is doneby broadcastingki toallthe processorsin row i followed by a comparisonofki with every key in the input and a datasumcomputation.Therank of kiis broadcastto all the processorsin row i. If the rank of ki is r^, then theprocessor(i,ri \342\200\224 1}broadcastski alongthe columnr^ \342\200\224 1sothat at the endof this broadcastingthe processor(0,r; \342\200\224 1}getsthe key that it is supposedto output.

Theprecedingalgorithmis a collectionof operationslocaltothe columnsor the rows. Theoperationsinvolved are prefix computation,broadcast,and comparisoneachof which can be done in 0{d)time. Thus the wholealgorithmruns in time@(d).

Page 751: Sahni

15.3.FUNDAMENTALALGORITHMS 743

Lemma 15.11Sparseenumerationsortcan becompletedin &(d) timeona sequentialTid, wherethe numberof keys to besortedis at most ^/p. \342\226\241

EXERCISES1.Thebroadcastingalgorithmof Section15.3.1assumesthat the message

originatesfrom processor00\342\200\242\342\200\242\342\200\2420.Howcanyou broadcastin &(d) timeon T-Ld if the originis an arbitrary processor?

2.ProveLemma15.10.3.In a sequentialTid, every processorhas a packetto be sent to every

otherprocessor.Presentan 0(pd)algorithmfor this routingproblem,wherep = 2d.

4. Presentan 0(p)-timealgorithmfor the problemof Exercise3.5. If we fix the first d \342\200\224 k bits of a of-bit binary numberand vary the

last k bits,the correspondingprocessorsform a subcube%k m %d-Thereare 2d~k suchsubcubes.Thereis a messagein eachof thesesubcubes.Presentan algorithmfor every subcubeto broadcastitsmessagelocally (knownas window broadcast).What is the run timeofyour algorithm?

6. The data concentrationalgorithmof Lemma15.9assumesthat thedataoriginatein level d.Presentan 0(d)timealgorithmfor the casein which the dataoriginateand aredestinedfor level zero.

7. On a Bd,thereis a datumat eachprocessorof level zero.Theproblemis to shift the data clockwiseby i positions(for somegiven 1< i <2d \342\200\224 1).Forexample,on a B3 let the distributionof data (startingfromrow zero)be *, a,*, *, 6,c,*, d (* indicatesthe absenceof a datum). Ifi is 3,the final distributionof datahas to bec,*, <i, *, a,*, *, b. Presentan 0(d)algorithmfor performingshifts.

8. The problemof window shifts is the sameas shifts (seeExercise7)exceptthat windowshifts are localto eachsubcubeof size2k. Showhow to perform windowshifts in 0(k)time.

9. Give an O(k) timealgorithmfor the window prefix computationproblem, whereprefix computationhas to bedone in eachsubcube\"Hk-

10.On a Bd, thereare d itemsat eachprocessorof level zero. Let theitemsat row i be kj,Uj,...,kf. Theproblemis to computed batchesof prefixes.The first batch is on the sequence&q,k\\,...,k\\d_,\\ thesecondbatch is on the sequence&q,kf,...,k\"id_.\\ and so on.Showhow

Page 752: Sahni

744 CHAPTER15.HYPERCUBEALGORITHMS

to computeall thesebatch prefixesin 0(d)time. Isyour algorithmnormal?

11.Let f(x) = ap-\\xp~l+ap-2Xp~2+ \342\226\240\342\226\240-+a\\x +aQ,wherep = 2d. Presentan 0(d)timealgorithmto evaluatethe polynomial/ at a given pointy on Hd- Assume that a,i is input at processori.

12.Present0(<i)-timealgorithmsfor the segmentedprefix problem(seeSection13.3,Exercise5) on Hd and Bd,wherethe input sequenceis of

length2d. On Bd,dataare input at level zero.

13.You aregiven a sequenceA of p = 2d elementsand an elementx. Thegoal is to rearrangethe elementsof A sothat all the elementsof Athat are< x appearfirst (insuccessiveprocessors)followed by the restof the elements.Presentan 0(d)timealgorithmon Hd-

14.In a sequentialHd, 2d>2packetshave to berouted.No morethan onepacketoriginatesfrom any processorand no morethan one packetisdestinedfor any processor.Presentan 0(d)timealgorithmfor thisroutingproblem.

15.If T is the run timeof any algorithmon an arbitrary 2d-processordegreed network, show that the samealgorithmcan be simulatedonHd in time0(Td).

15.4 SELECTIONGiven a sequenceof n keys and an integeri, 1 < i < n, the problemofselectionis to find the ith-smallestkey from the sequence.We have seensequentialalgorithms(Section3.6),PRAM algorithms(Section13.4),andmeshalgorithms(Section14.4)for selection.Like in the caseof the mesh,we considertwo different versionsof selectionon Hd- In the first version weassumethat p = n, p beingthe numberof processorsand n the numberofinput keys. In the secondversion we assumethat n > p. It is necessaryto handlethe secondversion separately,sinceno generalslow-downlemma(likethe onefor PRAMs) existsfor Hd-

15.4.1A RandomizedAlgorithmforn= p (*)Theworkoptimalalgorithm(Algorithm13.9)of Section13.4.5canbeadaptedto run optimally onHd as well.Thereare0(1)stagesin thisalgorithm.Step1of Algorithm 13.9canbeimplementedon Hd in 0(1)time.In steps2 and5 prefixcomputationscanbedonein a totalof 0(d)time(c.f.Lemma15.8).Concentrationin steps3 and 6 takes0(d)timeeach(seeLemma15.9).Also,sparseenumerationsort takesthe sametimein steps3 and 6 in accordance

Page 753: Sahni

15.4.SELECTION 745

with Lemma15.11.Selectionsin steps4 and 6 take only 0(1)timeeachsincetheseare selectionsfrom sortedsequences.Broadcastsin steps2, 4,and 5 take 0(d)timeeach(c.f.Lemma15.7).We arrive at this theorem.

Theorem15.3Selectionfrom n = p keys can be performedin 0(d)timeon Hd- \302\260

15.4.2RandomizedSelectionfor n > p (*)Now we considerthe problemof selectionwhen n = pc for someconstantc >1.Algorithm 13.9canbeusedfor this caseas well with somemodifications.The modificationsarethe sameas the oneswe did for the mesh(seeSection14.4.2).Each processorhas ^ keys to beginwith. The conditionfor thewhilestatementis changedto(N > D) (whereD is a constant).In step1aprocessorincludeseachoneof its keys with probability 1_|1/3c).Thusthisstepnow takestime-.Step2 remainsthe sameand stilltakes0(d)time.Theconcentrationand sparseenumerationsortof step3 can be performedin timeO(d) (c.f.Lemmas15.9and 15.11).Step4 takes0(d)timeandsodo steps5 and 6. Thuseachstagetakestime0(^+d). Thereareonly

O(loglogp)stagesin the algorithm.Thefinal result is this theorem.

Theorem15.4Selectionfrom out of n = pc keys can beperformedon V.^

in timeO ((^ +d) loglogpj. \342\226\241

15.4.3A DeterministicAlgorithmfor n >pThedeterministicmeshselectionalgorithm(Algorithm14.5)canbeadaptedto a hypercube.The correctnessof this algorithmhas already beenestablished in Section14.4.3.We only have to computethe run timeof thisalgorithmwhen implementedon a hypercube.

In step0,the elementscan be partitionedinto logpparts in ^ loglogptime (seeSection13.4,Exercise5) and the sorting can be done in timeO(Mog^).Thusstep0 takes^ min {log(n/p),loglogp}time.At the endof step0,the keys in any processorhave beenpartitionedinto approximatelylogpnearly equalparts.Calleachsuchpart a block.

In step1,we can find the medianat any processoras follows.Firstdetermine the blockthe medianis in and then performan appropriateselectionin that block(usingAlgorithm 3.19).The totaltimeis

0(\342\200\224^--).

In step 2, we can sort the mediansto identify the weighted median.IfM{,M2,...,Mpis the sortedorderof the medians,we needto identify jsuchthat JjI=iN'k > y and

\302\243{=i N'k < y. Sucha j canbecomputedwith

Page 754: Sahni

746 CHAPTER15.HYPERCUBEALGORITHMS

an additionalprefix computation.Sortingcan be done in 0(d2)time(as isshown in Section15.6).Theprefixcomputationtakes0(d)time(seeLemma15.8).ThusM, the weightedmedian,can be identified in 0(d2)time.

In step3,eachprocessorcan identify the numberof remainingkeys in itsqueueand then all the processorscan performa prefix sumscomputation.Therefore,this steptakes0( ,\" +d) time.

In step 4, the appropriatekeys in any processorcan be eliminatedasfollows. First identify the blockB that M falls in. This can be done in0(logp)time. After this, compareM with the elementsof blockB todeterminethe keys to be eliminated.If i > tm

(\302\253

< ?\"m), all blockstothe left (right) of B are eliminateden masse. The total timeneededis0(logp+ ,\" ), which is 0( ,\" ) sincen = pc for someconstantc > 1.

Step5 takes0(d)timeas it involves aprefixcomputationand abroadcast.Thebroadcastingin steps2, 3,and 4 takes0(d)timeeach(c.f.Lemma

15.7).Thus eachrun of the while looptakes0( .\" + d2) time. (Notethat d = \\ogp.)Thewhileloopis executedO(logn) times(seethe proofofTheorem14.8).Thus we get the following theorem(assumingthat n = pcand hencelognis asymptotically the sameas logp).

Theorem15.5Selectionon Hd can be performedin time0(-loglogp+d2 logn). \342\226\241

Example15.9On a Tis, let eachprocessorhave five keys to beginwith.Considerthe selectionproblemin which %

= 32.The input keys areshownin Figure15.12.Forsimplicity neglectstep 0 of Algorithm 14.3;that is,assumethat the partsareof size1.

In step 1,localmediansare found. Thesemediansare circledin thefigure.The sortedorderof thesemediansis 6,16,18,22,25,26,45,55. Sinceat the beginningeachprocessorhas the samenumberof keys, the weightedmedianis the sameas the regularmedian.A medianM of thesemediansis22.In step3,the rank of M is determinedas 21.Sincei > 21,all the keysthat arelessthan or equalto 22 areeliminated.We updatei to 32 \342\200\224 21= 11.This completesone run of the while loop.

In the secondrun of the whileloop,thereare19keys tobeginwith. Thelocalmediansare 27,23,24,35,63,36,45,28with correspondingweights of1,1,2,2,3,3,4,3,respectively.Thesortedorderof thesemediansis 23,24,27,28,35,36,45,63with correspondingweightsof 1,2,1,3,2, 3,4,3,respectively.TheweightedmedianM is 36.Itsrank is 10.Thusallthe keys that arelessthan or equal to 36areeliminated.We updatei to 11\342\200\224 10= 1,and thiscompletesthe secondrun of the while loop.

Therestof the computationproceedssimilarly to finally output 42 as theanswer. \342\226\241

Page 755: Sahni

15.4.SELECTION 747

Processor000 001 010 Oil 100 101 110 111

5 2010 15

4 (16)27 23

\302\251 @

Figure15.12Deterministicselection

EXERCISES

1.CompleteExample15.9.

2.Presentan efficient algorithmfor finding the kth quantilesof any givensequenceof n keys on Hd- Considerthe casesn = p and n >p.

3. Given an array A of n elements,the problemis to find any elementof A that is greaterthan or equal to the median.Presenta simpleMonte Carloalgorithmfor this problemon %d- You cannot use theselectionalgorithmof this section.Your algorithmshouldrun in timeO(d).Show that the output of your algorithmis correctwith highprobability. Assume that p = n.

24121332

\302\251

32

Weighted median is 22.

\302\251 @71 4755 26 (45)

30

!125

Weighted median is 36.

_ 42 637155

47 625145

Page 756: Sahni

748 CHAPTER15.HYPERCUBEALGORITHMS

15.5MERGINGTheproblemof mergingis to taketwo sortedsequencesas input and producea sortedsequenceof all the elements.Thisproblemwas studiedin Chapters3,10,13,and 14.If the two sequencesto be mergedareof lengthm each,they canbemergedin O(logm)timeusinga hypercubewith 0(m2)processors (applying the sparseenumerationsort).In thissectionwe areinterestedin mergingon a hypercubewith only 2mprocessors(assumingthat m is anintegralpower of 2).The techniqueemployed is the odd-evenmerge.

15.5.1Odd-EvenMergeLet X\\ = ko,ki,...,km-\\ and X2 = km, A;m+i,...,&2m-i be the two sortedsequencestobemerged,where2m= 2d. We show that thereexistsa normalbutterfly algorithmthat can mergeX\\ and X2 in 0(d)time.

We use a slightly different versionof the odd-evenmerge.Firstweseparate the oddand even partsof X\\ and X2. Let them be Oi,Ei,02,andE2.Then we recursivelymergeE\\ with O2 to obtain A = ao,a\\,...,am-\\.Also we recursivelymerge0\\ with E2 to obtain B = bo,b\\,...,bm-\\.Afterthis, A and B are shuffled to form C = ao,bo,a\\, 61,...,am_i,bm-\\. Nowwe compareaj with bi (for 0 < % < m \342\200\224 1) and interchangethem if they areout of order.Theresultantsequenceis in sortedorder.Thecorrectnessofthis algorithmcan beestablishedusingthe zero-oneprincipleand is left asan exercise.

Example15.10Let XY = 8,12,25,31and X2 = 3,5,28,46.Forthis case,Ox = 12,31and Ex = 8,25.02= 5,46and E2 = 3,28.MergingE1with 02we get A = 5,8,25,46.Similarly we get 5= 3,12,28,31.Shuffling A withB gives C = 5,3,8,12,25,28,46,31.Next we interchange5 and 3,and 46and 31to get 3,5,8,12,25,28,31,46. \342\226\241

It turns out that the modified algorithmis very easy to implementonBa- Forexample,partitioningX\\ and X2 into their oddand even partscan be easily done in onestepon a B&. Theshuffling operationcan alsobeperformedeasily on a B&. On a Bd, assumethat both X\\ and X2 are inputin level d. Let

X\302\261be input in the first m rows (i.e.,rows0,1,2,...,m \342\200\224 1)

and X2 in the next m rows.Thefirst stepof the algorithmis to separateX\302\261

and X2 into their oddand even parts.After this, we recursivelymergeE\\

with O2,and 0\\ with E2.To do this, routethe keys in the first m rowsusingdirectlinks and routethe otherkeys usingcrosslinks (seeFigure15.13).

After this routing we have E\\ and O2 in the even subbutterfly and 0\\and E2 in the oddsubbutterfly. In particular,E\\ is in the first half of therows of the even subbutterfly, 02 is in the secondhalf of the rows of theeven subbutterfly, and soon (seeFigure15.13(a)).ThepartsE\\ and O2 are

Page 757: Sahni

15.5.MERGING 749

(a) (b)

Figure15.13Odd-evenmergeon the butterfly

recursively mergedin the even subbutterfly. At the sametime,0\\ and E2arerecursivelymergedin the oddsubbutterfly. Oncethe recursivecallsareover, A will be ready in the even subbutterfly (at level d \342\200\224 1) and B willbe ready in the oddsubbutterfly (seeFigure15.13(b)).What remainstobedone is a shuffle and a compare-exchange.They can be doneas follows.Eachprocessorat level d \342\200\224 1 sendsitsresult alongthe crosslink as well asthe directlink. When the processorin row i at level d receivestwo datafrom above,it keepsthe minimum of the two if i is even;otherwiseit keepsthe maximum.Forexample,the processor(0,d) keepsthe minimum of aoand &o- Theprocessor(1,d) keepsthe maximumof ao and bo- And soon.

IfT(\302\243)

is the run timeof the above algorithmon a butterfly of dimension\302\243,

then the timeneededto partitionX\\ and X2 is 0(1).Thetimetakento recursively mergeE\\ with 02 and 0\\ with E2 is

T(\302\243

\342\200\224 1) sincethesemergingshappen in the even and oddsubbutterflieswhich areof dimensiononelessthan I. Oncethe recursivemergesareready, shuffling A and B andperformingthe compare-exchangeoperationalsotakea totalof 0(1)time.So, T(\302\243)

satisfiesT(\302\243)

=T(\302\243

- 1)+O(l),which solves toT(\302\243)

=0(\302\243).

Therearetwo phasesin the overall algorithm.In the first phasedataflowfrom bottomto topand in the secondphasedata flow from topto bottom.In the first phase,when any dataitemprogressestoward level zero,it enterssubbutterfliesof smallerand smallerdimensions.In particular,if it is inlevel

\302\243,

it is in a Be- In the Be the datum is in, if it is in the first half of therows, it takesthe directlink; otherwiseit takesthe crosslink. When all the

Page 758: Sahni

750 CHAPTER15.HYPERCUBEALGORITHMS

datareachlevel zero,the first phaseis complete.In the secondphase,dataflow from topto bottomone level at a time. When the data are at levelI, eachprocessorat level \302\243 sendsits datum both alongthe directlink andalongthe crosslink. In the next timestep,processorsin level \302\243 + 1keepeitherthe minimum or the maximumof the two itemsreceivedfrom abovedependingon whetherthe processoris in an even row or an oddrow.Whenthe data reachlevel d, the final result is computed.This alsoverifies thatthe algorithmtakesonly time0(d)on any Bd- Note alsothat this algorithmis indeednormal.

Theorem15.6Two sortedsequencesof lengthm eachcanbemergedon aBd in 0(d)time,given that 2m= 2d. UsingLemma15.2,mergingcan alsobe doneon a sequentialV.d in 0(d)time. \342\226\241

15.5.2BitonicMergeIn Section13.5,Exercise2, the notionof a bitonicsequencewas introduced.To recall,a sequenceK = ko, k\\,...,kn-\\ is saidtobe bitoniceither(1) ifthere is a 0 < j < n \342\200\224 1 such that ko < k\\ < \342\226\240\342\226\240\342\226\240 < kj > kj+\\ > \342\200\242\342\200\242\342\200\242 >kn-\\ or (2) if a cyclic shift of K satisfies condition1. If K is a bitonicsequencewith n elements(for n even), let ai = min {ki,ki+n/2}and 6j =max {ki,ki+n/2},0 < i < n/2 \342\200\224 1.Also let L(K) = ao,a\\,...,an/2-iandH(K)= bo,b\\,...,fen/2-i- A bitonicsequencehas the following properties(whichyou have already proved):

1.L(K) and H(K)areboth bitonic.2. Every elementof L(K) is smallerthan any elementof H(K).In other

words, to sort K, it suffices to sort L(K) and H(K)separatelyandoutput one followed by the other.

Given two sortedsequencesof m elementseach,we can form a bitonicsequenceout of them by following one by the other in reverseorder. Forexample,if we have the sequences5,12,16,22 and 6,14,18,32,the bitonicsequence5,12,16,22,32,18,14,6 can be formed.If we have an algorithmthat takesas input a bitonicsequenceand sortsthis sequence,then thatalgorithm canbeusedto mergetwo sortedsequences.Theresultantalgorithmis calledbitonic merge.

We show how to sort a bitonicsequenceon a butterfly usinga normalalgorithm.Let the bitonicsequenceX = ko, k\\,...,&n_i with n = 2d beinput at level zeroof Bd- We makeuse of the fact that if we remove allthe zerolevel processorsand incidentlinks of a Bd, then two copiesof Bd-iresult (seeFigure15.14).Callthe subbutterfly with rows0,l,...,2d-1- 1the left subbutterfly (shown with dottedthick lines in the figure) and theothersubbutterfly the right subbutterfly.

Page 759: Sahni

15.5.MERGING 751

level = 0

level = 1

level = 2

level = 3

row 000001010Oil 100101110111

Figure15.14Bitonicmergeon Bd

In the first step,eachlevel zeroprocessorsendsits key alongthe directlink and the crosslink as shown in Figure15.14.A level oneprocessor,onreceiptof two keys from above, keepsthe minimum of the two if it is in theleft subbutterfly. Otherwiseit keepsthe maximum.At the end of this step,the left subbutterfly has L(K) and the right subbutterfly has H(K).TheL(K) and H(K)are recursively sortedin the left and right subbutterflies,respectively.Oncethe recursivecallsarecomplete,the sortedsequenceis atlevel d.

IfT(\302\243)

is the run timeof this algorithm,we have T(l)= T(i-1)+ O(l);that is, T(i)=

0(\302\243).

Theorem15.7A bitonicsequenceof length 2d can be sortedon a Bd in

0(d)time. In accordancewith Lemma15.2,sortingcan alsobe doneon asequentialTid in 0(d)time. \342\226\241

EXERCISE

1.Provethe correctnessof the modifiedversionof odd-evenmergeusingthe zero-oneprinciple.

Page 760: Sahni

752 CHAPTER15.HYPERCUBEALGORITHMS

15.6SORTING15.6.1Odd-EvenMergeSortThe first algorithmis an implementationof the odd-evenmergesort.If X =ko, k\\,...,kn-\\ is the given sequenceof n keys, odd-evenmergesortpartitions X into two subsequencesX[ = ko,k\\,...,

fc\342\200\236/2-i

and X'2 = kn/2,kn/2+i,...,fcn-i of equallength.ThesubsequencesX[ and X2 aresortedrecursivelyassigningn/2processorstoeach. The two sortedsubsequences(callthemX\\ and X2, respectively)are then finally mergedusingthe odd-evenmergealgorithm.

Given 2d keys on level d of Bd (onekey perprocessor),we can partitionthem into two equalpartsof which the first part is in rows0,1,..., 2d~l\342\200\224 1and the secondpart is in the remainingrows. Sort eachpart recursively.Specifically,sort the first part usingthe left subbutterfly and at the sametimesortthe secondpart usingthe right subbutterfly. At the endof sorting,the sortedsequencesappearin level d.Now mergethemusingthe odd-evenmergealgorithm(Theorem15.6).

IfS(\302\243)

is the timeneededto sort on a Be usingthe above divide-and-conqueralgorithm,then we have

S(i)= S(\302\243-l)+ 0(\302\243)

which solves toS(\302\243)

= O(f).Theorem15.8We can sortp = 2d elementsin 0(d2)timeon a Bd- As aconsequence,the samecanbedonein 0{d2)timeon a sequentialT-id aswell(c.f.Lemma15.2). \342\226\241

15.6.2BitonicSortTheideaof mergesortcan alsobe appliedin conjunctionwith the bitonicmergealgorithm(Theorem15.7).In this case,we have p numbersinput atlevel zeroof Bd- We sendthe data to level d \342\200\224 1,sothat the first half of theinput is in the left subbutterfly and the next half is in the right subbutterfly.The left half of the input is sortedrecursively using the left subbutterflyin increasingorder. At the sametimethe right half of the input is sortedusingthe right subbutterfly in decreasingorder. The sortedsequencesareavailableat level d\342\200\224\\. They arenow sent back to level d, so that at level dwe have a bitonicsequence.Thissequenceis thensortedusingthe algorithmof Theorem15.7.

Again, ifS(\302\243)

is the timeneededto sorton a Beusingthe above bitonicsortmethod,then we have

Page 761: Sahni

15.7.GRAPHPROBLEMS 753

S(\302\243)= S(\302\243-l)+ 0(\302\243)

which solves toS(\302\243)

= O(f).Theorem15.9We can sortp = 2d elementsin 0(d2)timeon a Bd usingbitonicsort.As a result,applying Lemma15.2,sortingcan alsobe done in0(d2)timeon a sequentialJid usingbitonicsort. \342\226\241

EXERCISES1.Eachprocessorof a sequentialTid is the originof exactlyone packet

and eachprocessoris the destinationof exactlyonepacket.Presentan0(d2)time0(l)-queue-sizeddeterministicalgorithmfor this routingproblem.

2.Makinguseof the ideaof Exercise1,devisean 0(d2)time0(l)-queue-lengthdeterministicalgorithmfor the PPRproblem.

3.You aregiven 2d k-bit keys. Presentan 0(kd)timealgorithmto sortthesekeys on a Bd-

4. Array A is an almost-sortedarray of p = 2d elements.It is given thatthe positionof eachkey is at most a distanceq away from its finalsortedposition.How fast canyou sortthis sequenceon the sequentialTid and the Bd'l Expressthe run timeof your algorithmas a functionof d and q. Provethe correctnessof your algorithmusingthe zero-oneprinciple.

15.7 GRAPH PROBLEMSWe usethe generalframeworkof Section13.7to solvegraphproblemssuchas transitiveclosure,connectedcomponents,and minimum spanningtree.We seehow to implementAlgorithm 13.15on an n3-processorhypercubesothat its timecomplexity is 0(logn).

Let the elementsof M be indexedas M(i,j)for 0 < i,j < n \342\200\224 1,asthis will simplify the notation.Assume that n = 2lfor someintegerI. We

employ a 'Hu- Note thatH3\302\243

has n3 processors.Each processorof a Ti^ecanbelabeledwith a 3\302\243-bit binary number.View the labelof any processoras a triple(i,j,k), wherei is the first \302\243 bits,j is the next \302\243 bits,and k is thelast \302\243 bits.

Definition15.6On the hypercubeV.^ let (i,*, *) stand for allprocessorswhose first \302\243 bits equal the integeri, for 0 < i < 2l\342\200\224 1.Theseprocessors

Page 762: Sahni

754 CHAPTER15.HYPERCUBEALGORITHMS

form a 7i2t- Similarly define (*,j,*) and (*,*,k).Also define (i,j,*)tobeall the processorswhose first \302\243 bitsequali and whosesecond\302\243 bitsequalj.Theseprocessorsform a \"H^. Similarlydefine (i,*,k)and (*,j,k). \342\226\241

Theorem15.10MatrixM can be computedfrom an n x n matrixM in

0(log2n)timeusinga Use-

Proof: The proof parallels that of Theorem 14.12. In step 1, to update q[i, j, k], the corresponding processor has to access both m[i, j] and m[j, k]. Each processor can access m[i, j] by broadcasting m[i, j] in the subcube (i, j, *). For processor (i, j, k) to get m[j, k], transpose the matrix m[ ] and store the transpose as x[ ] in (*, *, 0). Now element x[i, j] is broadcast in the subcube (i, j, *). Broadcasting can be done in O(log n) time each. Transposing the matrix can also be completed in O(log n) time, and the details are left as an exercise.

In step 2 of Algorithm 13.14, the updated value of m[i, j] can be computed and stored in the processor (i, 0, j) in O(log n) time using the prefix computation algorithm. The two broadcasts used in the mesh implementation to transfer the n² updated m[ ] values to (*, *, 0) can be done in O(log n) time using the hypercube broadcast algorithm.

Thus each run of the for loop takes O(log n) time. So, the overall time of Algorithm 13.15 is O(log² n). □

The following theorems are implied as corollaries.

Theorem 15.11 The transitive closure matrix of an n-vertex directed graph can be computed in O(log² n) time on an n³-processor hypercube. □

Theorem 15.12 The connected components of an n-vertex graph can be determined in O(log² n) time on an n³-processor hypercube. □
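Algorithms 13.14 and 13.15 themselves are not reproduced in this chapter. Purely as a sequential point of reference for these corollaries, the sketch below (an illustration only; boolMult and transitiveClosure are hypothetical names) computes the transitive closure by repeatedly squaring the boolean adjacency matrix, about log n products in all; each such product is what the n³-processor hypercube performs in O(log n) time, which is where the O(log² n) bounds come from.

    #include <vector>
    using Matrix = std::vector<std::vector<int>>;

    // Boolean "multiplication": c[i][j] = OR over k of (a[i][k] AND b[k][j]).
    Matrix boolMult(const Matrix& a, const Matrix& b) {
        int n = (int)a.size();
        Matrix c(n, std::vector<int>(n, 0));
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                if (a[i][k])
                    for (int j = 0; j < n; j++)
                        if (b[k][j]) c[i][j] = 1;
        return c;
    }

    // Transitive closure: start from (A OR I) and square about ceil(log2 n) times,
    // doubling the path lengths accounted for in each round.
    Matrix transitiveClosure(Matrix m) {
        int n = (int)m.size();
        for (int i = 0; i < n; i++) m[i][i] = 1;   // reflexive step
        for (int len = 1; len < n; len *= 2)
            m = boolMult(m, m);
        return m;
    }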

All Pairs Shortest Paths

In Section 13.7.2, we noted that this problem can be solved by performing log n matrix multiplications. Two n × n matrices can be multiplied on an (n³/log n)-processor hypercube in O(log n) time (see Exercise 3), so we get this theorem.

Theorem 15.13 The all-pairs shortest paths problem can be solved in O(log² n) time on an (n³/log n)-processor hypercube. □
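The "multiplication" here is over the (min, +) semiring. As a sequential reference only (minPlus and allPairs are hypothetical names, and the formulation of Section 13.7.2 is not reproduced here), repeated squaring of the weight matrix about log n times yields all-pairs shortest path lengths; each product is the operation the hypercube carries out in O(log n) time.

    #include <algorithm>
    #include <vector>

    const long long INF = 1000000000000000000LL;
    using Mat = std::vector<std::vector<long long>>;

    // (min, +) product: c[i][j] = min over k of (a[i][k] + b[k][j]).
    Mat minPlus(const Mat& a, const Mat& b) {
        int n = (int)a.size();
        Mat c(n, std::vector<long long>(n, INF));
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                if (a[i][k] < INF)
                    for (int j = 0; j < n; j++)
                        if (b[k][j] < INF)
                            c[i][j] = std::min(c[i][j], a[i][k] + b[k][j]);
        return c;
    }

    // d[i][j] = edge weight (INF if absent), d[i][i] = 0.  After about
    // ceil(log2 n) squarings, d[i][j] is the length of a shortest i-to-j path.
    Mat allPairs(Mat d) {
        int n = (int)d.size();
        for (int len = 1; len < n; len *= 2) d = minPlus(d, d);
        return d;
    }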


EXERCISES

1. Show how to transpose an n × n matrix on an n³-processor hypercube in O(log n) time. Assume that n = 2^ℓ for some integer ℓ.

2. Prove that two n × n matrices can be multiplied in O(log n) time on an n³-processor hypercube. Assume that n is an integral power of two.

3. Show that two n × n matrices can be multiplied on an (n³/log n)-processor hypercube in O(log n) time.

4. Use the general paradigm of this section to design a hypercube algorithm for finding a minimum spanning tree of a given weighted graph. Analyze the time and processor bounds.

5. Present an efficient algorithm for topological sort on the hypercube.

6. Give an efficient hypercube algorithm to check whether a given undirected graph is acyclic. Analyze the processor and time bounds.

7. If G is any undirected graph, G^k is defined as follows: there will be a link between vertices i and j in G^k if and only if there is a path of length k in G between i and j. Present an O(log n log k) time n³-processor hypercube algorithm to compute G^k from G. It is given that n is an integral power of two.

8. You are given a directed graph whose links have a weight of zero or one. Present an efficient minimum spanning tree algorithm for this special case on the hypercube.

9. Present an efficient hypercube implementation of the Bellman and Ford algorithm (see Section 5.4).

10. Show how to invert a triangular n × n matrix on an n³-processor hypercube in O(log² n) time.

11. Present an O(log n) time algorithm for inverting an n × n tridiagonal matrix on an n²-processor hypercube.

15.8 COMPUTING THE CONVEX HULL

Like the PRAM and mesh convex hull algorithms of Sections 13.8 and 14.8, the hypercube algorithm is based on the divide-and-conquer algorithm of Section 3.8.4. Assume that the n points are input on H_d (where n = 2^d), one point per processor. The final output is the clockwise sequence of points on the hull. We need to specify an indexing scheme for the processors. One possibility is to use the embedding of a ring on the hypercube; that is, the ordering among the processors is as specified by the embedding.
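The text does not fix a particular ring embedding at this point; a standard choice (assumed here purely for illustration) is the reflected Gray code, under which consecutive ring positions map to hypercube labels that differ in exactly one bit.

    #include <cstdio>

    // Reflected Gray code: ring position r -> hypercube label r XOR (r >> 1).
    unsigned ringToLabel(unsigned r) { return r ^ (r >> 1); }

    int main() {
        unsigned d = 3, p = 1u << d;            // p = 2^d processors
        for (unsigned r = 0; r < p; r++)        // consecutive labels differ in one bit
            std::printf("ring %u -> node %u\n", r, ringToLabel(r));
        return 0;
    }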

Referring to the preprocessing step of Algorithm 13.16, we can sort the N input points according to their x-coordinate values in O(log N) time (Theorem 15.8). Let q_0, q_1, ..., q_{N−1} be the sorted order of these points. In step 1, we partition the input into two equal parts, with q_0, q_1, ..., q_{N/2−1} in the first part and q_{N/2}, q_{N/2+1}, ..., q_{N−1} in the second part. The first part is placed in the subcube (call it the first subcube) of processors that have zeros in their first bits. The second part is kept in the subcube (call it the second subcube) of processors that have ones in their first bits. In step 2, the upper hull of each half is recursively computed. Let H_1 and H_2 be the upper hulls. In step 3, we find the common tangent in O(log N) time using the same procedures as used in the mesh implementation (see Lemmas 14.10 and 14.11). Let (u, v) be the common tangent. In step 4, all the points of H_1 that are to the right of u are dropped. Similarly, all the points of H_2 to the left of v are dropped. The remaining part of H_1, the common tangent, and the remaining part of H_2 are output.

IfT{\302\243)

is the run timeof this recursivealgorithmfor the upperhull onan input of size2f employingTii,then we have

T(\302\243)

=T{\302\243-\\) + 0{\302\2432)

which solves toT{\302\243)

=0(\302\2433).

We still have to indicate how to find the tangent (u, v) in O(log N) time. The way to find the tangent is to start from the middle point, call it p, of H_1. Find the tangent of p with H_2. Let (p, q) be the tangent. Using (p, q), determine whether u is to the left of, equal to, or to the right of p in H_1. A binary search in this fashion on the points of H_1 reveals u. Use the same procedure to isolate v as well. Similar to Lemmas 14.10 and 14.11, the following lemmas can be proved.

Lemma 15.12 Let H_1 and H_2 be two upper hulls with at most N points each. If p is any point of H_1, its tangent with H_2 can be found in O(log N) time. □

Lemma 15.13 If H_1 and H_2 are two upper hulls with at most N points each, their common tangent can be computed in O(log N) time. □

In summary, we have the following theorem.

Theorem 15.14 The convex hull of n points in the plane can be computed in O(log³ n) time on an H_d, where n = 2^d. □
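As a sequential reference for the tangent computation of Lemma 15.12 (an illustrative sketch, not the procedure of the text): store H_2 in left-to-right order; for a point p of H_1, which lies to the left of every point of H_2, the tangent point is the vertex of H_2 that leaves all other vertices on or below the line through p and it. The linear scan below makes this test explicit; the slopes from p to the hull vertices rise and then fall along the convex chain, so the scan can be replaced by a binary search to meet the O(log N) bound.

    #include <vector>

    struct Pt { double x, y; };

    // > 0 if c lies to the left of (counterclockwise from) the directed line a->b.
    double cross(const Pt& a, const Pt& b, const Pt& c) {
        return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    }

    // Upper tangent point from p (assumed strictly to the left of every hull point)
    // to the upper hull h (vertices in increasing x order): the vertex t such that
    // no vertex of h lies above the line p->t.
    int upperTangent(const Pt& p, const std::vector<Pt>& h) {
        int best = 0;
        for (int i = 1; i < (int)h.size(); i++)
            if (cross(p, h[best], h[i]) > 0)   // h[i] is above the line p->h[best],
                best = i;                      // so h[i] is a better candidate
        return best;                           // O(N) scan; binary search gives O(log N)
    }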


EXERCISES

1. Prove Lemmas 15.12 and 15.13.

2. Present an O(log³ n) time algorithm to compute the area of the convex hull of n given points in 2D on an H_d with n = 2^d.

3. Given a simple polygon and a point p, present an O(log n) time algorithm on an H_d (where n = 2^d) to check whether p is internal to the polygon.

4. Present an efficient algorithm to check whether any three of n given points are colinear. Use a hypercube with n processors. What is the time bound?

15.9 REFERENCES AND READINGS

For a comprehensive collection of hypercube algorithms see:

Introduction to Parallel Algorithms and Architectures: Arrays-Trees-Hypercubes, by Tom Leighton, Morgan-Kaufmann, 1992.

Hypercube Algorithms with Applications to Image Processing and Pattern Recognition, by S. Ranka and S. Sahni, Springer-Verlag Bilkent University Lecture Series, 1990.

The randomized packet routing algorithm is due to "Universal schemes for parallel communication," by L. Valiant and G. Brebner, Proceedings of the 13th Annual ACM Symposium on Theory of Computing, 1981, pp. 263-277.

A more general sparse enumeration sort was given by D. Nassimi and S. Sahni. They showed that n keys can be sorted on an m-processor hypercube (m ≥ n) in time O(log² n / log(m/n)). See "Parallel permutation and sorting algorithms and a new generalized connection network," Journal of the ACM 29, no. 3 (1982): 642-667.

The randomized selection algorithm can be found in "Randomized parallel selection," by S. Rajasekaran, Proceedings of the Tenth Conference on Foundations of Software Technology and Theoretical Computer Science, 1990, Springer-Verlag Lecture Notes in Computer Science 472, pp. 215-224.

The deterministic selection algorithm presented in this chapter is based on "Unifying themes for network selection," by S. Rajasekaran, W. Chen, and S. Yooseph, Proceedings of the Fifth International Symposium on Algorithms and Computation, August 1994.


For a comprehensive coverage of sorting and selection algorithms see "Sorting and selection on interconnection networks," by S. Rajasekaran, DIMACS Series in Discrete Mathematics and Theoretical Computer Science 21, 1995, pp. 275-296.

An O(log n) time algorithm for sorting on an n-processor hypercube can be found in "A logarithmic time sort for linear size networks," by J. H. Reif and L. Valiant, Journal of the ACM 34, no. 1 (1987): 60-76.

A fairly involved O(log n log log n) time deterministic sorting algorithm can be found in "Deterministic sorting in nearly logarithmic time on the hypercube and related computers," by R. Cypher and G. Plaxton, Proceedings of the ACM Symposium on Theory of Computing, 1990, pp. 193-203.

15.10 ADDITIONAL EXERCISES

1. Show how to compute the FFT (see Section 9.3) on an input vector of length 2^d in O(d) time employing a B_d.

2. Prove that two polynomials of degree 2^d can be multiplied in O(d) time on a B_d.

3. A d-dimensional Cube Connected Cycles network, or CCC, is an H_d in which each processor is replaced with a cycle of d processors (one for each dimension). A three-dimensional CCC is shown in Figure 15.15. There is a close connection between the CCC, butterfly, and hypercube networks. Compute the degree, diameter, and bisection width of a d-dimensional CCC.

4. Show how to perform prefix computations on a d-dimensional CCC (see Exercise 3) in O(d) time. Assume that the 2^d data are given in the processors (i, 1) (0 ≤ i ≤ 2^d − 1).

5. Present an algorithm for sorting 2^d keys in O(d²) time on a d-dimensional CCC (see Exercise 3).

6. The problem of one-dimensional convolution takes as input two arrays I[0 : n − 1] and T[0 : m − 1]. The output is another array C[0 : n − 1], where C[i] = Σ_{k=0}^{m−1} I[(i + k) mod n] T[k], for 0 ≤ i < n. Employ an n-processor hypercube to solve this problem. Assume that each processor has O(m) local memory. What is the run time of your algorithm?

7. Solve Exercise 6 on an n-processor CCC (see Exercise 3).


Figure 15.15 A 3D cube connected cycles network (each hypercube node i ∈ {000, ..., 111} is replaced by a cycle of three processors labeled (i, 1), (i, 2), (i, 3))

8. The problem of template matching takes as input two matrices I[0 : n − 1, 0 : n − 1] and T[0 : m − 1, 0 : m − 1]. The output is a matrix C[0 : n − 1, 0 : n − 1], where

C[i, j] = Σ_{k=0}^{m−1} Σ_{l=0}^{m−1} I[(i + k) mod n, (j + l) mod n] T[k, l]

for 0 ≤ i, j < n. Present an n²-processor hypercube algorithm for template matching. Assume that each processor has O(m) memory. What is the time bound of your algorithm?

9. Solve Exercise 8 on an n²-processor CCC (see Exercise 3).


INDEX

O,5815-puzzle,382-3862-3tree,88-89,1004-queens,3468-queens,340, 353-357e-approximatealgorithms,558,566-575Absolute approximationalgorithm,558,

561-565Ackermann'sfunction,14Adelman, R.,447Adjacency list,119Adjacencymatrix,118Adjacencymultilists,123Adversary argument,458, 466-473Aho, A.V.,456,494, 553Algebraicproblems,417-456Algorithm, definition, 1-4Algorithm, specification,5All pairsshortestpaths,

hypercube,754mesh,711-712PRAM, 655

Alon, N.,664Amdahl'slaw, 608Amortization, 89-90And/Or Graphs,526-530Appel, K.I.,374Approximationalgorithms,557-604Asymptotic notation (O,Q, etc.),29Asymptotic speedup,606Atallah, M.,719AVL tree,88-89Azar, Y.,664

B-tree,88-89Bach,E.,456

Backtracking,339-378Baeza-Yates,126Balancingprinciple,435Belaga,492Bellman,R.E.,307,308Bernoullitrial,56Bestfit, 570Bestfit decreasing,571BFS,320-323Biconnectedcomponents,329-336Binpacking,569Binary search,131-138Binary searchtrees,83-89

optimal,275-282Binary treeof processors,719Binary trees,78-81,313-318

complete,79full, 79inordertraversal,314postordertraversal,315preordertraversal,315skewed,78tripleordertraversal,318-319

Binomialdistribution,56Binomialheap,99Bisectionwidth, 680Bitonicmerge,750-751Bitonicsort,752-753Blum,M.,194Bopanna,R.B.,664Borodin,A.B.,456, 486,493Boundingfunctions,340, 369,

388-391Branchand bound,379-416

FIFO,391,397-401LC,392,394-397

Breadthfirst search,320-323



Brebner,G.,718,757Brigham,O.,456Broadcasting

on a hypercube,737on a mesh,681

Bruno,J.,553Bruteforcemethod,339Bucketsort,194-195Butterfly network, 726

Chen,W., 718,757Chernoffbounds,57Children,in a tree,77Chineseremaindertheorem,

438,444Chromaticnumber,360,521-522,560Circuitrealization,553Clique,

seeMaximum cliqueCliquecover,532Codegenerationproblems,540-549Coffman, E.G.,Jr.,553Coinchanging,250Cole,R.,664Colorability,551,556Comparisontrees,458-465Complexityclasses,504-507Concentratorlocation,554Conditionalprobability, 54Conjunctivenormalform, 502Connectedcomponents,325-327Controlabstraction,198Convexhull,

hypercube,755-756lower bound,475-476mesh,713-717PRAM, 656-658sequential,183-193

Convolution,721,758Cook,S.A.,505Cook'stheorem,508-516Cooley,J.M.,431,456Coppersmith,D.,194Cormen,T.H.,68CRCW PRAM, 610CREW PRAM, 610

Cubeconnectedcycles,758Cyclecover,601Cypher, R.,758

Dataconcentration,on a hypercube,739-742on a mesh,685-686

Datastructures,69-126queues,73-75stacks,69-73trees,76-81

Deaps,100Decisionalgorithm,498-499Decisionproblem,498Degreeof a tree,77Demers,A., 572Denardo,E.V.,308Depthfirst search,323-325DFS,323-325Diameter,680Dictionaries,81-90Dietz,P.,494Diffie, 447Dijkstra,E.W.,250Disjointsetunion,101-110Disjunctivenormalform, 503Divide-and-conquer,127-196Dixon,B.,249Dobkin,D.,494Dominancecounting,659Dominancerules,288,414,

586Dreyfus, S.E.,307D-search,325,344, 346, 379Dynamic programming,

253-312

Embedding,727-732Embeddinga binary tree,730-732Embeddinga ring,728-729Embeddinga torus,730Eppstein,D.,663EREW PRAM, 610Euclid'salgorithm,441-444Euler,112Exactcover,533



Expectedvalue, 55Exponentiation,35-37Externalpath length,136

Fast FourierTransform, 430-439in-place,435-438nonrecursive,435-438

Feedbackarcset,532Feedbacknodeset,530Fermat'stheorem,61FFT,430-439,627,691,721,758Fibonacciheap,100Fibonaccinumbers,26,429Fiduccia,C,183,494Field,485FIFOsearch,379Find the maxand min, 139-144Find,union,101-110Firstfit, 570Firstfit decreasing,570Fischer,M.J.,308Fixedconnectionnetworks,609Flannery, B.P.,456Flow shopscheduling,301-306,

536-538Floyd, R.,194,308,664Ford,L.,494Forest,77Fourierprimes,440-446Frazer,W.D.,193,194Fredman,M.,250Freesequence,673Fully polynomial time

approximation,585-595

Gabow, H.N.,249Galil,Z.,249, 663Galoisfield, GF(p),440Garey, M.,553,572,593,599,600Generalizedassignmentproblem,

604Gibbons,A., 338Goldner,A., 338Golumbic,M.,338Gonnet,126Gonzalez,T.,553,599

Graham,R.L.,68,572,579,599Graham'sscan,187-189Graphcoloring,360-364Graphs,112-124

adajacencylists,119adjacencymatrix,118and/or,526-530biconnectedcomponents,329-336bipartite,328bridgeof, 337complete,378connected,115cycle,115definitions,112-118in-degree,117isomorphic,377minimum spanningtrees,216-228out-degree,117path, 115planar,361shortestpaths,241-247,265-273strongly connected,117traversals,318-325

Gray code,729Greatestcommondivisor, 441-444Greedy method,197-252

for packetrouting,674-675,732-733

Greenberg,H.,374Grimmet,G.R.,68

Hall,M.,374Haltingproblem,506-507Hambrusch,S.,719Hamiltoniancycles,364-366,532,

551,574-575directed,522-525

Hammingdistance,723Hayfil, L.,494Heaps,92Heapsort,99Hegerich,R.,374Heightof a tree,77Heightrule,110Held,M.,308,416Hellman,447



Hermann,P.,553Heuristicsearch,416High probability, 58Highprobability bounds,58Hi-Qgame,376Hittingset,533Hoare,C.A.R.,193Holt,250Hopcroft,J.E.,456, 494Horner,W.G.,421Horner'srule,421-422Horowitz,E.,126,308,374Huffman codes,238-240Huffman, D.,250Hypercube,723Hypercubealgorithms,723-759

allpairsshortestpaths,754broadcasting,737convexhull, 755-756dataconcentration,739-742embedding,727-732merging,748-751packetrouting,732-736prefix computation,737-739selection,744-747sorting,752-753sparseenumerationsort,742-743transitiveclosure,754

Ibaraki,T.,416Ibarra,O.,591,599Igarashi,Y., 718Independence,54Independentset,555Indexingschemes,684Ingargiola,G.,416Input size,22Insertionsort,151,172Integerprogramming,573Integersort,195Internalnodes,135Internalpath length,136Interpolation,423-428,448-455Intervalpartitioning,591-592

Ja Ja, J.,663,664

Jeong,C,719Jobsequencingwith deadlines,

208-215,539Jobshopscheduling,538-539Johnson,D.,553,572,593,599,600Johnson,S.,494, 553

k \342\200\224 k routing,680Kaklamanis,C,719Karel,C,416Karger,D.R.,249Karp,R.,308,416,598,600Kaufmann,M.,719Kim,C,591,600King,V.,249Klein,P.N.,249, 250Kleinrock,68Knapsackproblem,198-202,253,

255-256,287-294,368-373,393-401,499,501,554, 580

Knuth,D.E.,68,308,374, 448, 456,494

Kohler,W., 416Konigsberg,112-113Korsh,J.,416Kosaraju,S.,719Krizanc,D.,494, 719Kruskal,J.B.,249Kruskal'salgorithm,220-225Kucera,L.,664Kunde,M.,719Kung,H.T.,719

Lagrange,423,425Lagrangianinterpolation,423-426Lame,G.,443LasVegasalgorithm,57Lawler,E.,250Leaf nodes,77Leastcost(LC)search,380-391Lee,D.T.,719Leftist tree,100Leighton,F.T.,663,718,757Leiserson,C.E.,68Level of a node,77Lewis,P.A.,431



Lieberherr,K.,599LIFOsearch,379Lineararrangement,553Lineararray, 668Linearspeedup,606Lipton,R.,494List ranking,618-625Little,J.,416Longestcommonsubsequence,287Lowerbounds,457-494

for merging,467-470for selection,464-465,662for sorting,459-462,660-661

LPTschedule,566

Magicsquare,34-36Makedon,F.,718Manber,U.,126,194,661,665Markov's inequality, 56Martello,S.,416Matrix addition,23Matrix inversion,478-480,755Matrixmultiplication,179-181,

477on a hypercube,755on a mesh,691on a PRAM, 626

Matrix productchains,308-310Matrix transpose,

on a hypercube,755on a mesh,690-691

Max cut,552Maximalselection,627-631Maximumclique,499,500,502,

518-520,556McKeller,A.C.,193,194Median,165-174Median of medians,169Mehlhorn,K.,126Mehta,D.,126Merge,

hypercube,748-751mesh,698-701PRAM, 636-642sequential,147,153

Mergesort,145-153

Mesh, 668Mesh algorithms,667-722

allpairsshortestpaths,711-712broadcasting,681convexhull, 713-717dataconcentration,685-686merge,698-701packetrouting,669-676prefix computation,681-685selection,691-697sorting,701-707sparseenumerationsort,686-688transitiveclosure,709-711

Mesh of trees,721Miller, G.,63Miller,R.,719Miller, W., 265,493Minimumbooleanform, 533Minimumcostspanningtrees,

216-228Kruskal'salgorithm,220-225PRAM algorithm,654Prim'salgorithm,218-221randomizedalgorithm,225-228

Minimumcut, 552Minimumedgedeletion,552Minimumequivalent graph,532Minimumnodedeletion,552Modular arithmetic,440-446Monte Carloalgorithm,57Motwani, R.,68Motzkin, 492Mulmuley,K.,194Multiport hypercube,725Multistagegraphs,257-263Munro, I.,456,487,493Murty, K.,416

n-queens,342Narayanan, L.,719Nassimi,D.,553,718,719,757Nemhauser,G.,308Newton, L, 421



Newtonian interpolation,426-428Node cover,251,519,551Nodes,live or dead,345Nondeterministicalgorithms,

496-507Nonpreemptiveschedule,303Normal butterfly algorithm,727Af'P-completeproblems,495-553MV'-hard approximations,563-565Af'P-hard problems,495-553

Odd-evenmerge,hypercube,748-750mesh,699-701PRAM, 637-639

Odd-evenmergesort,hypercube,752-753lineararray, 703mesh,705-707

Odd-eventranspositionsort,lineararray, 702

On-linemedian,477Optimalbinary searchtrees,275-283Optimalfinish time,303Optimalmergepatterns,234-239,253Optimalstorageon tapes,229-233Optimizationmeasure,197Oracle,458,466-473Overholt,R.,718

Packetrouting,on a hypercube,732-736on a mesh,669-677

Padberg,M.,375Paige,R.,68,194,664Paik,D.,249Papadimitriou,C.H.,553,600Parallelcomparisontrees,659Parallelhypercube,725ParallelRandomAccessMachine,

605-666Parent,in a tree,77Partialordering,466Partialpermutationrouting,669Partition,154,165,533-539,573Patashnik,O.,68

Path halving, 111Pathsplitting,111Performanceanalysis, 14-49Performancemeasurement,40-49,

159Permutationgenerator,11-13Pigeonholeprinciple,14Plant location,554Plaxton,G.,758Polynomial complexity,504Polynomialequivalence,507Polynomialtimeapproximations,

579-584Polynomials,418-421

denserepresentation,420evaluation,420-423,448-455625,690,744sparserepresentation,420

Posa,L.,597,600Postordertraversal,315Postagestampproblem,376Practicalcomplexities,37-40Pratt, V.,194Preemptiveschedule,303Prefixcomputation,

mesh,681-685on a binary tree,737-739on a hypercube,737-739PRAM, 615-618

Preparata,F.,194,664Preparata'salgorithm,645-647Press,W.H.,456Prim,R.C.,249Prim'salgorithm,218-221Primality testing,61-64Principleof optimality, 254Priority queues,91-100Priority schemes,684Probabilisticallygoodalgorithms,

596-599Probability distribution,55Probabilitytheory, 53-57PRAM algorithms,605-666

allpairsshortestpaths,655convexhull, 656-658list ranking,618-625



merge,636-642prefix computation,615-618selection,627-634sorting,643-649transitiveclosure,653-654

Quadraticprogramming,554Queuelength,669Queuelinelemma,733Queues,73-75Quickhull,185-187Quicksort,154-159

Rabin,M. 0.,63Radixsort,195Raghavan, P.,68Rajasekaran,S.,68,194,494, 664, 718,

719,757,758Ramaswami,S.,719Randomaccessread,650-651Randomaccesswrite, 651Randomvariable,55Randomizedalgorithms,57-65,159-164,

225-228,622-625,632-634,647-649,676-677,691-692,733-736,744-745

Randomizedlist ranking,622-625Randomizedpacketrouting,

on a hypercube,733-736on a mesh,676-677

Randomizedselection,hypercube,745mesh.691-692PRAM, 632-634

Randomizedsorting,PRAM, 647-649sequential,159-164

Rank merge,698Rank sort,701-702Ranka, S.,757Rao,S.,250Rauch,M.,249,250Recurrencerelations,128-130Red-blacktree,89,100Reducedcostmatrix,403-411Reductions,474-483,505Reif, J.H.,68,194,663,664,758

Reischuk,R.,664Reischuk'salgorithm,647-649Reliability design,295-298Repeatedelement,59-61Resourceallocation,257Retrieval time,

mean,230total,231

Rinehart,250Ring,485Ringof processors,679Rivest, R.L.,68,194,447, 664Root of unity, 431Rosenthal,A., 338Rounding,587-591

Sado,K.,718Sahni,S.,68,126,308,374, 553,599,

600,663,718,719,757Satisfiability,502,508-516,

528,550,552,600-603Scheduling,534-536,575-577

independenttasks,566-569,579-580

Scherson,L, 719Schonhage,H.,448, 456Schwartz,E.S.,250Searchand traversal,313-338

BFS,320-323DFS,323-325

Search,ordered,459Segmentedprefix,625,690Selection,

hypercube,744-747mesh,691-697PRAM, 627-634sequential,165-174

Selectionsort,8-10Sen,S.,664, 719Separation,592-595Sequentialhypercube,725Sequentialrepresentationof,

graphs,121trees,81

Set cover,251,533,600-604Set intersectionproblem,556



Sethi,R.,549, 553Sets,101-110Shamir,A., 447, 719Shamos,M.I.,175,194Shannon,I.,661,665Shastri,S.,494Shearsort,703-705Shortestpaths,

allpairs,265-268single-source,241-247,270-273

Sibeyn,J.,719Simplemaxcut, 552Simpleoptimallinear

arrangement,553Single-porthypercube,725Slow-downlemma,613Smith,W.E.,250Solvingrecurrences,128-130Sorting,

bucketsort,194-195Ford-Johnson,463-464hypercube,752-753insertionsort,151,172integersort,195mergesort,145-152mesh,701-707nondeterministic,497-498PRAM, 643-649quicksort,154-159randomizedsort,159-164selectionsort,8-10shearsort,703-705

Spacecomplexity,15Spanningtrees,216,325-327Sparseenumerationsort

on a hypercube,742-743on a mesh,686-688

Spencer,T.,249Splay tree,89Stablesort,195Stacks,69-73Statespace,344

dynamic trees,345statictrees,345tree,344, 360,363,384, 395,398,402,404, 406,407

Statespacemethod,470-473Steiglitz,K.,416Steinertrees,555Steptable,24, 25Stirzaker,D.R.,68Stockmeyer,L.,553Stout, Q.,719Straightlineprograms,484Strassen,V.,179-181,194,448, 456,Stringediting,284-286StronglyA^-hard, 593-594Subgraph,115Suel,T.,719Suffix computation,625,688Sultan,A., 375Sumof subsets,341-346,357-359,

534Superlinearspeedup,607Sweeny,D.,416

Tarjan,R.E.,109,126,194,249,338

Tautology,533Templatematching,722,759Teukolsky,S.A.,456Thanvantri, V.,663Thompson,CD.,719Timecomplexity,15Tollis,I.,718Tompa,M.,661,665Topologicalsort,655Torus,679-680Totalordering,466Toth,P.,416Tournament,469Towersof Hanoi,11Transitiveclosure,325-327,480-483

hypercube,754mesh,709PRAM, 653-654

Traub,J.,599Traveling salesperson,252,298-301,

403-411,525-526Treevertexsplitting,203-208Trees,

binary, 78-81



binary search,83comparison,458-465dynamic, 345statespace,344static,345traversals,313-317

Tsantilas,T.,718,719Tukey, J.W.,431,456Turan,662

Ullman,J.D.,456,494, 553,572Ullman,Z.,308Unary flow show, 552Unary input partition,552Union,disjointset,101-110Valiant, L.G.,662,664,718,757,758Van Leeuwen,109,126

Vetterling,W.T.,456

Wachter, R.,68,194,664Wagner, R.A.,308Walker, R.J.,374Weighted edges,124Weighted median,692-697,745-747Welch, P.D.,431Willard, D.E.,250Window broadcast,743Window shifts, 743Winograd, S.,182,194,494Winston, 250

Yao, A.C.,249Yooseph,S.,718,757

Zero-oneprinciple,639


Ellis Horowitz, University of Southern California
Sartaj Sahni, University of Florida
Sanguthevar Rajasekaran, University of Florida

The text that helped establish the discipline of computer science returns in this thoroughly revised and updated edition. The text incorporates the latest research and state-of-the-art applications, bringing this classic to the forefront of modern computer science education. Computer Algorithms emphasizes:

• Design techniques: Analysis of algorithms is based on actual design.

• Examples: Using a wide range of examples provides students with the actual implementation of correct design.

• The latest research: The new edition includes complete chapters on probabilistic and parallel algorithms.

• Full integration of randomized algorithms: Performance with nonrandomized algorithms is thoroughly compared.

Computer Algorithms is appropriate as a core text for the upper- and graduate-level analysis of algorithms course. Also available is a version of the book that implements the popular object-oriented language C++ (0-7167-8315-0).

ISBN 0-7167-8316-9
W. H. Freeman and Company

