Presented by: Sagnik Bhattacharya

Date post: 22-Feb-2016
Description:
Cellular Disco. Kingshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum. Presented by: Sagnik Bhattacharya. Overview: problems of current shared memory multiprocessors and our requirements; Cellular Disco as a solution (architecture, prototype, hardware-fault containment).
Transcript
Page 1: Presented by: Sagnik Bhattacharya

Presented by: Sagnik Bhattacharya

Kingshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum

Page 2: Presented by: Sagnik Bhattacharya

Overview

• Problems of current shared memory multiprocessors and our requirements
• Cellular Disco as a solution
  – architecture
  – prototype
  – hardware-fault containment
  – CPU management
  – Memory management
  – statistics
• Cellular Disco and ubiquitous environments
• Conclusion

Page 3: Presented by: Sagnik Bhattacharya

Problem

• Extending modern Operating systems to run efficiently on shared memory multiprocessors.

• Software development has not kept pace with hardware development.

• Common operating systems fail beyond 12 processors.

Page 4: Presented by: Sagnik Bhattacharya

What we need…

• The system should be reliable.
• It should be scalable.
• It should be fault-tolerant.
• It should not take too much development time or effort.

Page 5: Presented by: Sagnik Bhattacharya

Traditional approaches

• Hardware partitioning - lacks resource sharing, makes physical clusters.
• Software-centric approaches (significant development time and cost):
  – modify existing OS
  – develop new OS

Page 6: Presented by: Sagnik Bhattacharya

A scenario…

[Figure: a control unit, a smart space, and four processors (Proc). Annotation: "No rebooting necessary".]

Page 7: Presented by: Sagnik Bhattacharya

Solution: Cellular Disco

• Extension of previous work - Disco
• Uses the concept of virtual machine monitors
• Partitions the multiprocessor system into virtual clusters.

Page 8: Presented by: Sagnik Bhattacharya

Virtual Machine Monitor

[Figure: a Virtual Machine Monitor layered on the hardware, hosting two virtual machines with guest OSes Win NT and IRIX 6.2. VM1 uses µPs 1, 2, 3; VM2 uses µPs 1, 3, 5, 8.]

Page 9: Presented by: Sagnik Bhattacharya

Virtual Machine Monitor

[Figure: VM1 (µPs 1, 2, 3) and VM2 (µPs 1, 3, 5, 8) with guest OSes Win NT and IRIX 6.2 above the Virtual Machine Monitor. Step shown: I/O request.]

Page 10: Presented by: Sagnik Bhattacharya

Virtual Machine Monitor

[Figure: VM1 (µPs 1, 2, 3) and VM2 (µPs 1, 3, 5, 8) with guest OSes Win NT and IRIX 6.2 above the Virtual Machine Monitor. Step shown: trap I/O request and perform I/O.]

Page 11: Presented by: Sagnik Bhattacharya

Virtual Machine Monitor

[Figure: VM1 (µPs 1, 2, 3) and VM2 (µPs 1, 3, 5, 8) with guest OSes Win NT and IRIX 6.2 above the Virtual Machine Monitor. Step shown: perform I/O and send interrupt.]

Page 12: Presented by: Sagnik Bhattacharya

Virtual Machine Monitor

[Figure: VM1 (µPs 1, 2, 3) and VM2 (µPs 1, 3, 5, 8) with guest OSes Win NT and IRIX 6.2 above the Virtual Machine Monitor, with no I/O annotation.]

Page 13: Presented by: Sagnik Bhattacharya

Issues it addresses

• Address scalability
• NUMA awareness
• Hardware fault-containment
• Resource management

Page 14: Presented by: Sagnik Bhattacharya

Basic Cellular Disco Architecture

Page 15: Presented by: Sagnik Bhattacharya

Prototype

• Runs on a 32-processor SGI Origin 2000
• Supports shared memory systems based on the MIPS R10000 architecture.
• The prototype runs piggybacked on IRIX 6.4
• The host OS is made dormant and is only used to invoke some device drivers.

Page 16: Presented by: Sagnik Bhattacharya

Hardware Virtualization

• Physical resources - the resources visible to a virtual machine

• Machine resources - the actual resources, allocated by Cellular Disco

• Cellular Disco (CD) operates in the kernel mode of the MIPS processor

• CD intercepts all system calls.
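
The physical/machine distinction can be pictured as a per-VM lookup table. The sketch below is only an illustration under assumptions: the class name `PhysicalToMachineMap`, its methods, and the page numbers are hypothetical and are not taken from the slides.

```python
class PhysicalToMachineMap:
    """Hypothetical per-VM table: guest 'physical' page -> machine page."""

    def __init__(self):
        self._pmap = {}

    def map_page(self, physical_page, machine_page):
        # The monitor decides which machine page backs a physical page.
        self._pmap[physical_page] = machine_page

    def translate(self, physical_page):
        # The guest OS only ever names physical pages; the monitor
        # substitutes the machine page it actually allocated.
        return self._pmap[physical_page]


# Two VMs can both use physical page 0 without colliding,
# because each is backed by a different machine page.
vm1 = PhysicalToMachineMap()
vm2 = PhysicalToMachineMap()
vm1.map_page(0, 4096)
vm2.map_page(0, 8192)
```

Keeping one table per virtual machine is what lets each guest OS believe it owns a contiguous physical address space.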

Page 17: Presented by: Sagnik Bhattacharya

Resource Management

• CPU management - Each processor maintains its own run queue
• Memory management - Memory borrowing mechanism
• Each OS instance is only given as many resources as it can handle. Large applications are split, and communication between the parts is established using shared-memory regions.

Page 18: Presented by: Sagnik Bhattacharya

CPU Management

• VCPU migration :

- Intra node (37 µsec)

- Inter node (520 µsec)

- Inter Cell (1520 µsec)
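
To make the three cost levels concrete, here is a minimal sketch that classifies a migration and looks up the cost quoted above. The `(cell, node, cpu)` tuple layout and the function names are assumptions made for illustration; only the three microsecond figures come from the slide.

```python
# Costs in microseconds, as quoted on the slide.
MIGRATION_COST_US = {"intra_node": 37, "inter_node": 520, "inter_cell": 1520}


def migration_level(src, dst):
    """Classify a move between two CPUs given as (cell, node, cpu) tuples.

    The tuple layout is hypothetical; node ids are only compared
    when both CPUs are in the same cell.
    """
    if src[0] != dst[0]:
        return "inter_cell"
    if src[1] != dst[1]:
        return "inter_node"
    return "intra_node"


def migration_cost_us(src, dst):
    return MIGRATION_COST_US[migration_level(src, dst)]


same_node = migration_cost_us((0, 0, 0), (0, 0, 1))
other_node = migration_cost_us((0, 0, 0), (0, 1, 2))
other_cell = migration_cost_us((0, 1, 2), (1, 3, 6))
```

A balancer could use such a lookup to weigh the benefit of a move against its cost before migrating a VCPU.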

Page 19: Presented by: Sagnik Bhattacharya

VCPU migration

[Figure: Cellular Disco running above an interconnect of six nodes, each with its CPUs, grouped into three cells; a VCPU is shown on one CPU.]

Page 20: Presented by: Sagnik Bhattacharya

Intra Node

[Figure: six nodes with their CPUs, grouped into three cells under Cellular Disco; the VCPU moves between CPUs of the same node.]

Page 21: Presented by: Sagnik Bhattacharya

Inter Node

[Figure: six nodes with their CPUs, grouped into three cells under Cellular Disco; the VCPU moves to a CPU on a different node.]

Page 22: Presented by: Sagnik Bhattacharya

Inter Cell

[Figure: six nodes with their CPUs, grouped into three cells under Cellular Disco; the VCPU moves to a CPU in a different cell.]

Page 23: Presented by: Sagnik Bhattacharya

CPU Management (contd.)

• CPU balancing: idle balancer and periodic balancer

Load Balancing Scenario

Page 24: Presented by: Sagnik Bhattacharya

Idle balancer

[Figure: CPU0 to CPU3 with VCPUs B0, A1, B1 and A0. An idle CPU asks about a candidate VCPU: "Does this have enough cache affinity to CPU2?"]

Page 25: Presented by: Sagnik Bhattacharya

Idle balancer

[Figure: CPU0 to CPU3 with VCPUs B0, A1, B1 and A0. The idle CPU asks: "Does this have enough cache affinity to CPU2?" Answer: "NO!!"]

Page 26: Presented by: Sagnik Bhattacharya

Idle balancer

[Figure: CPU0 to CPU3 with VCPUs B0, A1, B1 and A0 after the idle balancer's decision; VC B1 now also appears next to VC A0.]
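
The idle-balancer sequence can be sketched as a small stealing routine. This is a simplified illustration, not the prototype's code: the numeric affinity scores, the `AFFINITY_THRESHOLD` cutoff, the function name, and the example queues are all assumptions.

```python
AFFINITY_THRESHOLD = 0.5  # assumed cutoff; the slides give no number


def pick_vcpu_to_steal(run_queues, affinity, idle_cpu):
    """Return (cpu, vcpu) for the first waiting VCPU worth stealing, else None.

    run_queues: cpu name -> list of waiting VCPU names (hypothetical layout)
    affinity:   (vcpu, cpu) -> cache-affinity score in [0, 1] (hypothetical)
    """
    for cpu, queue in run_queues.items():
        if cpu == idle_cpu:
            continue
        for vcpu in queue:
            # Only steal a VCPU that lacks enough cache affinity to the
            # CPU it is currently queued on.
            if affinity.get((vcpu, cpu), 0.0) < AFFINITY_THRESHOLD:
                return cpu, vcpu
    return None


run_queues = {"CPU0": ["B0"], "CPU1": ["A1"], "CPU2": ["B1"], "CPU3": []}
affinity = {("B0", "CPU0"): 0.9, ("A1", "CPU1"): 0.8, ("B1", "CPU2"): 0.1}
choice = pick_vcpu_to_steal(run_queues, affinity, idle_cpu="CPU3")
```

In this made-up setup, B1 is the only VCPU whose affinity falls below the cutoff, so it is the one the idle CPU takes.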

Page 27: Presented by: Sagnik Bhattacharya

Periodic Balancer

• Does depth-first traversal of the load tree

[Figure: load tree with root 4, children 1 and 3, and leaves 1, 0, 2, 1; arrows mark the traversal.]

Page 28: Presented by: Sagnik Bhattacharya

Periodic Balancer

• Checks the difference between 2 siblings; ignores it if the difference is < 2

[Figure: load tree with root 4, children 1 and 3, and leaves 1, 0, 2, 1; two sibling pairs are marked Diff=1.]

Page 29: Presented by: Sagnik Bhattacharya

Periodic Balancer

• If diff >= 2, does load balancing if benefit > cost

[Figure: load tree with root 4, children 1 and 3, and leaves 1, 0, 2, 1; one sibling pair is marked Diff=2.]
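
The traversal on these slides can be sketched as follows, using the same load tree (root 4, children 1 and 3, leaves 1, 0, 2, 1). The `LoadNode` class and `find_imbalances` function are hypothetical names, and the benefit-versus-cost check is left out because the slides do not say how either is computed.

```python
class LoadNode:
    """Hypothetical node of the load tree; `load` counts runnable VCPUs below it."""

    def __init__(self, load, children=()):
        self.load = load
        self.children = list(children)


def find_imbalances(node, min_diff=2):
    """Depth-first traversal returning (left_load, right_load) for sibling
    pairs whose loads differ by at least min_diff."""
    found = []
    if len(node.children) == 2:
        left, right = node.children
        found += find_imbalances(left, min_diff)
        found += find_imbalances(right, min_diff)
        if abs(left.load - right.load) >= min_diff:
            found.append((left.load, right.load))
    return found


tree = LoadNode(4, [
    LoadNode(1, [LoadNode(1), LoadNode(0)]),
    LoadNode(3, [LoadNode(2), LoadNode(1)]),
])
imbalances = find_imbalances(tree)
```

Only the pair with loads 1 and 3 qualifies here; the two leaf pairs differ by 1 and are ignored, matching the Diff=1 and Diff=2 annotations.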

Page 30: Presented by: Sagnik Bhattacharya

Gang Scheduling

• For each physical CPU, we select the VCPU that is to run on it.
• The VCPU selected is the highest-priority gang-runnable VCPU. A VCPU is gang-runnable if all non-idle VCPUs of its VM are either
  – running, or
  – waiting on run queues of processors running lower-priority VMs.
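
The gang-runnable condition above can be written as a short predicate. This is a sketch under assumptions: the dictionary layout, the names, and the two-VM example are hypothetical, and treating an empty CPU as acceptable is a choice made here, not something the slide states.

```python
def is_gang_runnable(vm, vcpus, running, queued, priority):
    """True if every non-idle VCPU of `vm` is running, or waits on a CPU
    that currently runs a lower-priority VM.

    vcpus:    vcpu -> {"vm": name, "idle": bool}   (hypothetical layout)
    running:  cpu -> vcpu currently executing
    queued:   vcpu -> cpu whose run queue it waits on
    priority: vm -> int, larger means higher priority
    """
    running_set = set(running.values())
    for name, info in vcpus.items():
        if info["vm"] != vm or info["idle"]:
            continue  # other VMs and idle VCPUs do not matter
        if name in running_set:
            continue
        cpu = queued.get(name)
        if cpu is None:
            return False  # neither running nor queued anywhere
        current = running.get(cpu)
        if current is not None and priority[vcpus[current]["vm"]] >= priority[vm]:
            return False  # blocked behind an equal or higher-priority VM
    return True


vcpus = {
    "A0": {"vm": "A", "idle": False}, "A1": {"vm": "A", "idle": False},
    "B0": {"vm": "B", "idle": False}, "B1": {"vm": "B", "idle": False},
}
running = {"cpu0": "A0", "cpu1": "B0"}
queued = {"A1": "cpu1", "B1": "cpu0"}
priority = {"A": 2, "B": 1}
a_ok = is_gang_runnable("A", vcpus, running, queued, priority)
b_ok = is_gang_runnable("B", vcpus, running, queued, priority)
```

In this example VM A is gang-runnable because A1 waits behind the lower-priority B0, while VM B is not because B1 waits behind the higher-priority A0.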

Page 31: Presented by: Sagnik Bhattacharya

Example

[Figure: three processors (µP1, µP2, µP3), each with a currently executing VCPU and a wait queue, ordered by priority. VCPU labels as extracted: VC1, VC2, VC5; VC7, VC5; VC1, VC9; VC3, VC4. VM1 has VCs 1, 3, 8 (idle); VM2 has VCs 2, 4, 6 (idle), 7; VM3 has VCs 5, 9.]

Page 32: Presented by: Sagnik Bhattacharya

Example

[Figure: the same three processors (µP1, µP2, µP3) with VCPU labels as extracted: VC1, VC2, VC5; VC7, VC5; VC1, VC9; VC3, VC4. VM1 has VCs 1, 3, 8 (idle); VM2 has VCs 2, 4, 6 (idle), 7; VM3 has VCs 5, 9. One entry is marked "Gang Runnable".]

Page 33: Presented by: Sagnik Bhattacharya

Example

[Figure: the three processors (µP1, µP2, µP3) after rescheduling, with the new executing VCPUs and new wait queues. VCPU labels as extracted: VC5, VC9, VC5; VC7, VC1; VC1, VC2; VC3, VC4. VM1 has VCs 1, 3, 8 (idle); VM2 has VCs 2, 4, 6 (idle), 7; VM3 has VCs 5, 9.]

Page 34: Presented by: Sagnik Bhattacharya

Memory Management

• Each cell maintains its own freelist, and allocates memory to other cells in its allocation preference list on request (RPC).
• Speed - 758 µsec for 4 MB.
• A threshold is set for the minimum amount of local free memory.
• Paging is avoided as far as possible.

Page 35: Presented by: Sagnik Bhattacharya

Memory Borrowing

• freelist - list of free pages in the cell
• allocation preference list - list of cells from which borrowing memory is more beneficial than paging.

Page 36: Presented by: Sagnik Bhattacharya

Memory Borrowing

[Figure: freelist sizes of Cells 1 to 5 plotted against a borrowing threshold and a lending threshold; marks at 16 MB and 32 MB.]

Page 37: Presented by: Sagnik Bhattacharya

Memory Borrowing

[Figure: freelist sizes of Cells 1 to 5 against the borrowing and lending thresholds (marks at 16 MB and 32 MB). Annotation: one cell asks another for memory.]

Page 38: Presented by: Sagnik Bhattacharya

Memory Borrowing

[Figure: freelist sizes of Cells 1 to 5 against the borrowing and lending thresholds (marks at 16 MB and 32 MB). Annotation: the request is refused.]

Page 39: Presented by: Sagnik Bhattacharya

Memory Borrowing

[Figure: freelist sizes of Cells 1 to 5 against the borrowing and lending thresholds (marks at 16 MB and 32 MB). Annotation: "cannot ask".]

Page 40: Presented by: Sagnik Bhattacharya

Memory Borrowing

[Figure: freelist sizes of Cells 1 to 5 against the borrowing and lending thresholds (marks at 16 MB and 32 MB). Annotation: the cell asks a different cell.]

Page 41: Presented by: Sagnik Bhattacharya

Memory Borrowing

[Figure: freelist sizes of Cells 1 to 5 against the borrowing and lending thresholds (marks at 16 MB and 32 MB). Annotation: the asked cell gives 4 MB.]

Page 42: Presented by: Sagnik Bhattacharya

Memory Borrowing

[Figure: final freelist sizes of Cells 1 to 5 against the borrowing and lending thresholds (marks at 16 MB and 32 MB).]
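
The borrowing sequence can be sketched as a loop over the allocation preference list. This is an illustration under assumptions: reading 16 MB as the borrowing threshold and 32 MB as the lending threshold is a guess from the figure labels, the 4 MB chunk matches the "Gives 4 MB" slide, and the function name and cell data are hypothetical.

```python
BORROW_THRESHOLD_MB = 16  # assumed: a cell borrows when its freelist drops below this
LEND_THRESHOLD_MB = 32    # assumed: a cell lends only while its freelist is above this
CHUNK_MB = 4              # amount handed over per request


def try_borrow(free_mb, borrower, preference_list):
    """Ask cells in preference order; return the lender's name or None.

    free_mb maps cell name -> free memory in MB and is updated in place.
    """
    if free_mb[borrower] >= BORROW_THRESHOLD_MB:
        return None  # enough local free memory, no need to borrow
    for lender in preference_list:
        if free_mb[lender] > LEND_THRESHOLD_MB:
            free_mb[lender] -= CHUNK_MB
            free_mb[borrower] += CHUNK_MB
            return lender
        # otherwise this cell refuses and the next one is asked
    return None


free_mb = {"cell1": 10, "cell2": 20, "cell3": 40}
lender = try_borrow(free_mb, "cell1", ["cell2", "cell3"])
```

Here cell2 refuses because it is below the lending threshold, and cell3 gives 4 MB, which mirrors the asks, refused, asks, gives steps in the figures.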

Page 43: Presented by: Sagnik Bhattacharya

Memory Management (contd.)

• Paging: algorithm - second-chance FIFO
• Page sharing information is kept in a control data structure
• Cellular Disco traps all read and write requests made by the operating systems

Page 44: Presented by: Sagnik Bhattacharya

Second-chance FIFO

• A reference bit is added to each page in the FIFO scheme.
• Every time the page is accessed, the bit is set to 1.
• If the page is selected by FIFO and the reference bit is 1, the bit is set to 0 and another page is looked for.
• A page is the target page if it is selected by FIFO and the reference bit is 0.
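
The four rules above translate into a short victim-selection loop. This sketch uses the common textbook variant in which a page that gets its second chance moves to the back of the queue; the slide does not say where the page goes, so that detail, the function name, and the example pages are assumptions. The queue is assumed to be non-empty.

```python
from collections import deque


def select_victim(fifo, ref_bit):
    """Pick the page to evict using second-chance FIFO.

    fifo:    deque of page names, oldest on the left (must be non-empty)
    ref_bit: page -> 0 or 1, updated in place
    """
    while True:
        page = fifo.popleft()
        if ref_bit[page] == 1:
            # Recently used: clear the bit and give the page a second chance.
            ref_bit[page] = 0
            fifo.append(page)
        else:
            return page  # selected by FIFO with reference bit 0


fifo = deque(["A", "B", "C"])
ref_bit = {"A": 1, "B": 0, "C": 1}
victim = select_victim(fifo, ref_bit)
```

The loop always terminates because each pass clears a reference bit, so at worst the oldest page is chosen on the second lap.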

Page 45: Presented by: Sagnik Bhattacharya

Example

[Figure: on a page fault, FIFO selects the oldest page. The page table's reference bit (RB) column shows 1 for the oldest page and 0 for the second oldest page.]

Page 46: Presented by: Sagnik Bhattacharya

Example

[Figure: on a page fault, second-chance FIFO is applied. The page table's reference bit (RB) column now shows 0 for the oldest page and 0 for the second oldest page.]

Page 47: Presented by: Sagnik Bhattacharya

Example

[Figure: the page table after the selection, showing an oldest page with reference bit (RB) 0.]

Page 48: Presented by: Sagnik Bhattacharya

Hardware fault-containment

• The failure rate increases as the number of processors increases.

• Internally structured as a set of semi-independent cells.

• Failure in one cell does not impact VM’s running in other cells (localization of faults)

• Assumption - CD is a trusted software layer

Page 49: Presented by: Sagnik Bhattacharya

Cellular Structure

Fault in one cell does not affect others

Page 50: Presented by: Sagnik Bhattacharya

Hardware fault-containment (contd.)

• Communication modes:
  – Fast inter-processor RPC
  – Message

• Side benefit - Software fault containment, i.e., individual OS crashes do not impact the system.

Page 51: Presented by: Sagnik Bhattacharya

Hardware-Fault recovery

• liveset - set of still functioning nodes.
• Failure - removal from liveset
• Recovery - insert back to liveset
• Virtual machines dependent on the failed cell are terminated.
• Memory dependencies are updated when a cell fails.
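
The liveset bookkeeping can be sketched as a small class. This is a simplified illustration: the class name `FaultManager`, the per-VM dependency sets, and the example dependencies are hypothetical, and memory-dependency updates are not modeled.

```python
class FaultManager:
    """Hypothetical tracker for the liveset and the VMs that depend on each node."""

    def __init__(self, nodes, vm_deps):
        self.liveset = set(nodes)
        # vm name -> set of nodes the VM depends on (assumed bookkeeping)
        self.vm_deps = {vm: set(deps) for vm, deps in vm_deps.items()}
        self.running = set(vm_deps)

    def fail(self, node):
        """Remove a node from the liveset and terminate dependent VMs."""
        self.liveset.discard(node)
        terminated = {vm for vm in self.running if node in self.vm_deps[vm]}
        self.running -= terminated
        return terminated

    def recover(self, node):
        """Insert a repaired node back into the liveset."""
        self.liveset.add(node)


manager = FaultManager(
    nodes=[1, 2, 3, 4, 5, 6],
    vm_deps={"VM 1": {1, 2}, "VM 2": {5, 6}, "VM 3": {2, 3, 4}},
)
terminated = manager.fail(2)
live_after_failure = set(manager.liveset)
manager.recover(2)
```

With these made-up dependencies, a failure of node 2 terminates VM 1 and VM 3 while VM 2 keeps running, and recovery restores the full liveset without restarting the terminated VMs.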

Page 52: Presented by: Sagnik Bhattacharya

Example

[Figure: six nodes (Node1 to Node6) with their CPUs on the interconnect, grouped into three cells under Cellular Disco, running VM 1, VM 2 and VM 3. Liveset - 1,2,3,4,5,6]

Page 53: Presented by: Sagnik Bhattacharya

Example

[Figure: the same six nodes, three cells and VMs 1 to 3, with a hardware failure marked "BOOM". Liveset - 1,2,3,4,5,6]

Page 54: Presented by: Sagnik Bhattacharya

Example

[Figure: after the failure, only Node3 to Node6 are shown and only VM 2 remains. Liveset - 5,6]

Page 55: Presented by: Sagnik Bhattacharya

Example

[Figure: all six nodes are shown again with VM 2 running; an interrupt is marked. Liveset - 5,6]

Page 56: Presented by: Sagnik Bhattacharya

Example

[Figure: all six nodes in three cells with VM 2 running after recovery. Liveset - 1,2,3,4,5,6]

Page 57: Presented by: Sagnik Bhattacharya

Fault-Recovery overhead

Page 58: Presented by: Sagnik Bhattacharya

Virtualization Overheads

(The first column shows the execution time on IRIX 6.4 and the second shows the execution time on Cellular Disco.)

Page 59: Presented by: Sagnik Bhattacharya

Cellular Disco and Ubiquitous environments

• Provides raw computational power for our smart spaces.

• More importantly, a hardware fault does not bring down the whole system; fault recovery is built in.

• Adaptable to new operating systems

Page 60: Presented by: Sagnik Bhattacharya

Grey Areas

• Will the source simplicity remain if it is not piggybacked on IRIX 6.4?
• Will it work on non-uniform multiprocessor systems?
  – Probable solution - development of a hardware virtualization standard

Page 61: Presented by: Sagnik Bhattacharya

In conclusion…

• Cellular Disco presents a midway path between hardware- and software-directed techniques.

• It can be used on the central control unit for our smart spaces because it is scalable and fault-tolerant.

