Slide 1
ISTORE Software Runtime Architecture
Aaron Brown, David Oppenheimer, Kimberly Keeton, Randi Thomas, Jim Beck, John
Kubiatowicz, and David Patterson
http://iram.cs.berkeley.edu/istore
1999 Winter IRAM Retreat
Slide 2
ISTORE Runtime Software Architecture
• Runtime system goals for the ISTORE meta-appliance
  (1) Provide mechanisms that allow network service applications to exploit introspection (monitor + adapt)
  (2) Allow the appliance designer to tailor runtime system policies and interfaces
• How the goals are achieved
  (1) Introspection: layered local and global runtime system libraries that manipulate and react to monitoring data
  (2) Specialization: runtime system is extensible using domain-specific languages (DSLs)
Slide 3
Roadmap
• Layered software structure
• Example of introspection
• Runtime system extensibility using DSLs
• Conclusion
Slide 4
Layered software structure

[Diagram: device nodes and front-end node(s) connected by a switch to the LAN/WAN. Each device node layers, bottom to top: HW Device, Device Interface & RTOS, Local Runtime, Distributed Global Runtime, Parallel App. Worker Code. Each front-end node layers: HW Device (NIC), Device Interface & RTOS, Local Runtime, Distributed Global Runtime, Application Front-End Code.]
Slide 5
Device Interface Layer

[Diagram: the layered structure from Slide 4, highlighting the Device Interface & RTOS layer on the device and front-end nodes.]
Slide 6
Device interface layer
• Microkernel OS modules
• Traditional OS services
  – Networking, memory management, process scheduling, threads, ...
• Device-specific monitoring
  – Raw access patterns
  – Utilization statistics
  – Environmental parameters
  – Indications of impending failure
• Self-characterization of performance, functional capabilities (see the sketch below)
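
A minimal C++ sketch of the kind of health record and predictive check such a module might export; the DiskHealth fields, failure_suspected function, and its threshold are illustrative assumptions, not the actual ISTORE interface:

#include <cstdint>
#include <cstdio>

// Per-device health record a microkernel monitoring module might export.
struct DiskHealth {
    uint64_t reads, writes;   // raw access counts
    double   utilization;     // fraction of time busy, 0..1
    uint32_t ecc_retries;     // corrected-error retries since boot
    uint32_t media_errors;    // unrecoverable media errors
    double   temperature_c;   // environmental parameter
};

// Rising ECC retry rates or media errors often precede outright failure.
bool failure_suspected(const DiskHealth& prev, const DiskHealth& cur) {
    return cur.media_errors > prev.media_errors ||
           (cur.ecc_retries - prev.ecc_retries) > 100;  // arbitrary threshold
}

int main() {
    DiskHealth before{1000, 500, 0.42, 3, 0, 38.0};
    DiskHealth after {1500, 700, 0.55, 180, 0, 41.5};
    if (failure_suspected(before, after))
        std::printf("notify global fault handling mechanism\n");
    return 0;
}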
Slide 7
Local runtime layer

[Diagram: the layered structure from Slide 4, highlighting the Local Runtime layer.]
Slide 8
Local runtime layer
• Non-distributed mechanisms needed by network service applications
• Feeds information to global layer or performs local operations on behalf of global layer
• Example mechanisms
  – Application-specific filtering/aggregation of device monitoring data (see the sketch after this list)
    » Example: OLTP server vs. DSS server
  – Data layout and naming
    » Example: record-based interface for DB, file-based for web server
  – Device scheduling
    » Example: maximize TPS vs. maximize disk bandwidth utilization
  – Caching
    » Coherence essential vs. coherence unnecessary
    » More efficient caching implementation possible in second case
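
A small C++ sketch of the filtering/aggregation idea: the local runtime lets the appliance designer plug in a reducer that condenses raw device samples into whatever the global layer should see. All names here (IoSample, Aggregator, the reducers) are hypothetical:

#include <cstdio>
#include <functional>
#include <vector>

// Raw per-request record produced by the device interface layer.
struct IoSample { double latency_ms; std::size_t bytes; };

// Application-installed reducer from raw samples to one reported metric.
using Aggregator = std::function<double(const std::vector<IoSample>&)>;

// OLTP server: transactions are latency-bound, so report mean latency.
double oltp_reducer(const std::vector<IoSample>& s) {
    double sum = 0;
    for (const auto& x : s) sum += x.latency_ms;
    return s.empty() ? 0.0 : sum / s.size();
}

// DSS server: scans are bandwidth-bound, so report total bytes moved.
double dss_reducer(const std::vector<IoSample>& s) {
    std::size_t bytes = 0;
    for (const auto& x : s) bytes += x.bytes;
    return static_cast<double>(bytes);
}

int main() {
    std::vector<IoSample> window{{2.1, 4096}, {3.4, 8192}, {1.9, 4096}};
    Aggregator agg = oltp_reducer;  // chosen per application; dss_reducer for DSS
    std::printf("report to global layer: %.2f\n", agg(window));
    return 0;
}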
Slide 9
Global runtime layer

[Diagram: the layered structure from Slide 4, highlighting the Distributed Global Runtime layer.]
Slide 10
Global runtime layer
• Aggregate, process, react to monitoring data
• Relies on local per-device runtime mechanisms to provide monitoring data, implement control actions
• Provides application interface that hides distributed implementation of runtime services
• Example services
  – High-level services
    » Load balancing: replicate and/or migrate heavily used data objects when a disk becomes over-utilized
    » Availability: replicate data from failed or failing component to restore required redundancy
    » Plug-and-play: integrate new devices into the system
  – Low-level services used to implement high-level global services
    » Distributed directory tracks data and metadata objects (see the sketch after this list)
    » Migration, replication, caching
    » Inter-brick communication
    » Distributed transactions
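
A hedged C++ sketch of the distributed directory service named above, showing how it can hide the distributed implementation behind a single-system interface; the class and method names are invented for illustration:

#include <cstdio>
#include <map>
#include <string>
#include <vector>

using ObjectId = std::string;
using DeviceId = int;

// Maps each object to the devices holding its replicas. A real version
// would itself be partitioned across bricks; applications only see this
// single-system interface.
class Directory {
    std::map<ObjectId, std::vector<DeviceId>> replicas_;
public:
    void add_replica(const ObjectId& o, DeviceId d) { replicas_[o].push_back(d); }
    void delete_replicas(const ObjectId& o) { replicas_.erase(o); }
    // Load-balancing hook: a real implementation would pick the
    // least-loaded replica; this stub just takes the first.
    DeviceId locate(const ObjectId& o) const { return replicas_.at(o).front(); }
};

int main() {
    Directory dir;
    dir.add_replica("customer-table/page42", 3);
    dir.add_replica("customer-table/page42", 7);  // redundancy
    std::printf("route request to device %d\n", dir.locate("customer-table/page42"));
    return 0;
}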
Slide 11
Distributed application worker code

[Diagram: the layered structure from Slide 4, highlighting the Parallel App. Worker Code layer on the device nodes.]
Slide 12
Distributed application worker code
• Runs on top of global runtime system
• Written by appliance designer
• Application-specific (see the sketch after this list)
  – Database
    » scan, sort, join, aggregate, update record, delete record, ...
  – Transformational web proxy
    » fetch web page (from disk or remote site), apply transformation filter, update user preferences database, ...
• System administration tools implemented at this level
  – Customized runtime system defines administrative interface tailored to application
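
As a concrete (and entirely hypothetical) illustration, worker code might be exposed to the front-end as a table of named operators registered by the appliance designer:

#include <algorithm>
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <vector>

using Record = std::string;
using Operator = std::function<std::vector<Record>(std::vector<Record>)>;

// Application-specific operators that run on the device bricks, on top
// of the global runtime; "scan" and "sort" are stubs for illustration.
std::map<std::string, Operator> worker_ops = {
    {"scan", [](std::vector<Record> in) { return in; }},
    {"sort", [](std::vector<Record> in) {
        std::sort(in.begin(), in.end());
        return in;
    }},
};

int main() {
    auto out = worker_ops["sort"]({"beta", "alpha"});
    std::printf("%s %s\n", out[0].c_str(), out[1].c_str());
    return 0;
}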
Slide 13
Application front-end code

[Diagram: the layered structure from Slide 4, highlighting the Application Front-End Code layer on the front-end node(s).]
Slide 14
Application front-end code
• Runs on front-end interface bricks
• Accepts requests from LAN/WAN connection
  – Incoming requests made using standard high-level protocols
    » HTTP, NFS, SQL, ODBC, ...
• Invokes and coordinates appropriate worker code components that execute on internal bricks
  – Takes into account locality and load balancing (see the sketch after this list)
  – Database: front-end performs SQL query optimization, invokes distributed relational operators on data storage devices
  – Transformational proxy: front-end invokes distiller thread on appropriate device brick
    » if data is cached, invoke on disk node
    » otherwise, fetch data from web and invoke on compute node or disk node
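
A minimal sketch of the locality-aware dispatch described above for the transformational proxy; the node-selection functions are assumptions, not the real ISTORE front-end API:

#include <cstdio>
#include <optional>
#include <string>

using NodeId = int;

// Stub for a directory lookup: which disk node, if any, caches this URL?
std::optional<NodeId> cached_on(const std::string& url) {
    if (url == "http://example.com/a.html") return 3;
    return std::nullopt;
}

// Front-end picks where the distiller (worker) thread runs: on the disk
// node holding the cached copy, otherwise on some compute node, which
// fetches the page from the web first.
NodeId choose_worker_node(const std::string& url) {
    if (auto n = cached_on(url)) return *n;  // locality: run where the data is
    return 9;                                // stub: any available compute node
}

int main() {
    std::printf("distill on node %d\n", choose_worker_node("http://example.com/a.html"));
    std::printf("distill on node %d\n", choose_worker_node("http://example.com/b.html"));
    return 0;
}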
Slide 15
Roadmap
• Layered software structure
• Example of introspection
• Runtime system extensibility using DSLs
• Conclusion
Slide 16
From introspection to adaptation
• Example: slowly-failing data disk in large DB system
  (1) Detect problem
  (2) Repair problem while continuing to handle incoming requests
  (3) Return to normal system operation

Intelligent HW components + Continuous monitoring + Extensible, application-tailored runtime system = Adaptive, self-maintaining appliance
Slide 17
Failing disk: detection
• Microkernel monitoring module that continuously tracks the disk's health detects an exceptional condition, e.g.
  – ECC failures
  – Media errors
  – Increased rates of ECC retries
• Notifies global fault handling mechanism
Slide 18
Failing disk: reaction
• Global fault handling mechanism...
  – Prevents system from sending more work to failed device
    » Modifies global directory to remove entries corresponding to failed component's data
  – Applies application-specific response to impending failure
    » Transactional system: discard work currently in progress on failing device, reissue to another data replica
    » Non-transactional system w/o coherent replicas: checkpoint computation, restore on another data replica
    » Transformational web proxy: do nothing
  – Instructs disk runtime system to shut disk down
    » Disk device is considered failed
Slide 19
Failing disk: return to normal operation
• Global fault handling mechanism...
  – Rebuilds data redundancy
    » By allocating space for a new replica on a functioning disk and copying data to it from existing replicas
    » Using an application-specific data replication mechanism
      • Where to allocate new replicas, how to copy data, how to lay out data for new replicas, how to update global directory
      • Example in upcoming slide
• Life returns to normal
  – Degree of fault-tolerance has been restored
• Failed component can be replaced during regularly-scheduled maintenance
Slide 20
Roadmap• Layered software structure• Example of introspection• Runtime system extensibility using DSLs• Conclusion
Slide 21
Runtime system extensibility
• Two ways of looking at system
  – Partitioned on functional/mechanism boundaries
    » Collection of libraries: failure detection, transactions, ...
    » Mechanisms are isolated

[Diagram: the application sitting atop a row of isolated libraries -- libfail, librepl, libtrxn, libcache -- atop the OS.]
Slide 22
Runtime system extensibility
• Two ways of looking at system
  – Partitioned on functional/mechanism boundaries
    » Collection of libraries: failure detection, transactions, ...
    » Mechanisms are isolated
  – Partitioned on global system properties
    » This is how the programmer thinks about the system (high-level)
    » e.g. application-specific data availability policy
      • Failure detection (which devices to monitor, ...)
      • Replication (used to restore redundancy)
      • Transactions (how to restart work in progress)
      • Caching (how to handle dirty cached objects during failure)

[Diagram: the same library stack -- application, libfail, librepl, libtrxn, libcache, OS.]
Slide 23
Runtime system extensibility
• Two ways of looking at system
  – Partitioned on functional/mechanism boundaries
    » Collection of libraries: failure detection, transactions, ...
    » Mechanisms are isolated
  – Partitioned on global system properties
    » This is how the programmer thinks about the system (high-level)
    » e.g. application-specific data availability policy
      • Failure detection (which devices to monitor, ...)
      • Replication (used to restore redundancy)
      • Transactions (how to restart work in progress)
      • Caching (how to handle dirty cached objects during failure)

[Diagram: the same library stack, now with a policy specification fed through a compiler to produce a customized runtime system library spanning the base libraries between the application and the OS.]
Slide 24
Extensibility using DSLs
• DSLs are languages specialized for a particular task
• Each ISTORE DSL
  – Encapsulates high-level semantics of one system behavior
  – Allows declarative specification of
    » Behavior of one aspect of the system (a "policy")
    » Interfaces to coordinated mechanisms that implement the policy
  – Is compiled into an implementation that might coordinate several local and/or global base runtime system mechanisms
    » May be implemented as background and/or foreground tasks
• Analysis tools can potentially infer unspecified emergent system behaviors from the specifications
  – e.g. what impact will a new redundancy policy have on transaction commit time?
• Extensions compiled together with local and global base mechanisms form the distributed runtime system
Slide 25
Extensibility using DSLs: Example

Avail::FailureDetected(Device d) {
  Object o; ObjList objs;
  Transaction t; TxnList txns;
  Replica x, c, r;

  Directory::MarkDeviceDisabled(d);
  Admin::AlertFailure(d);
  objs = Directory::GetObjects(d);        // objs stored on failed device
  foreach o (objs) {
    x = Directory::GetReplica(o, d);      // find o's replica on d
    Directory::DeleteReplica(x);          // delete from global directory
    txns = Txn::GetActiveTxns(x);
    foreach t (txns) {
      Txn::AbortTxn(t);                   // abort pending txns for o on d
    }
    c = Directory::GetReplica(o);         // find still-accessible copy
    r = LoadBalancer::AllocateReplica(o); // get space for new replica
    LocalRuntime::CopyObject(c, c->device, r, r->device); // copy it
    Directory::AddReplica(r, r->device);  // update directory
    foreach t (txns) {
      Txn::IssueTxn(t, r);                // reissue txns on new replica
    }
  }
}
Slide 26
Extensibility using DSLs (cont.)
• Similar specification written for each extension to base library
• Other examples of extensible system behaviors
  – Transaction response time requirements
  – Prioritizing operations based on type of data processed
  – Resource allocation
  – Backup policy
  – Exported administrative interface
Slide 27
Why use DSLs?
• Possible choices
  – Each appliance designer writes runtime system from scratch
    » Similar to exokernel operating systems
  – All designers use single parameterized runtime system library
    » Similar to tunable kernel parameters in modern OSs
  – Designer writes high-level specification of system behavior
    » DSL compiler automatically translates specification into runtime system extensions that coordinate base mechanisms
    » Advantages include
      • Programmability
      • Performance
      • Reliability, verifiability, safety
      • Artificial diversity
Slide 28
DSL advantages (cont.)
• Programmability
  – High-level specification close to designer's abstraction level
    » Easier to write, reason about, maintain, modify runtime system code
    » Simple enough to allow site-specific customization at installation time
• Performance
  – Aggressive DSL compiler can take advantage of high-level semantics of specification language
  – Base library mechanisms can be highly optimized; optimization complexity hidden from appliance designer
  – Web example: infer that TCP checksums should be stored with web pages
Slide 29
DSL advantages (cont.)
• Reliability
  – Automatically generate code that's easy to forget or get wrong
    » Example: synchronization operations to serialize accesses to distributed data structure
• Verifiability
  – Of input code (DSL specification)
    » More abstract form of semantic checking
    » e.g. DSL supports types natural to behavior being specified => type-checking verifies some semantic constraints
      • e.g. "ensure no unencrypted objects are written to disk" (see the sketch after this list)
  – Of output code (coordinated use of base mechanisms)
    » DSL compiler writer satisfied DSL compiler is correct => appliance designer inherits verification effort
• Safety (prevent runtime errors)
  – Whole classes of general programming errors not possible
    » DSLs hide details: runtime memory management, IPC, ...
    » Compiler automatically adds code: synchronization, ...
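
A small C++ analogue of the type-checking point above: if the DSL gives encrypted and unencrypted data distinct types and the disk-write interface accepts only the encrypted one, the constraint becomes a compile-time error. The types here are invented for illustration:

#include <cstdio>
#include <string>

struct Plaintext { std::string bytes; };
struct Encrypted { std::string bytes; };  // distinct type, not an alias

// Stand-in for the appliance's cipher; a real one would do actual crypto.
Encrypted encrypt(const Plaintext& p) { return Encrypted{p.bytes}; }

// Accepts only Encrypted values, so "no unencrypted objects are written
// to disk" is verified by the type checker, not by a runtime test.
void write_to_disk(const Encrypted& e) {
    std::printf("wrote %zu bytes\n", e.bytes.size());
}

int main() {
    Plaintext p{"secret record"};
    write_to_disk(encrypt(p));  // OK
    // write_to_disk(p);        // rejected at compile time
    return 0;
}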
Slide 30
DSL advantages (cont.)
• Artificial diversity
  – Potentially allow system to continue operation in face of internal bugs or malicious attack
    » Multiple implementations of component run simultaneously on different data replicas
    » Continuously check each other with respect to high-level behavior (see the sketch below)
    » Non-traditional fault-tolerance, but related to process pairs
  – Potentially usable to enhance performance
    » Select best-performing implementation(s) for future use; periodically reevaluate choice
  – Examples of possible implementation differences
    » Low-level: runtime memory layout, code ordering and layout
    » High-level: system resource usage (recompute vs. use stored data, general space/time/bandwidth tradeoffs)

[Diagram: a single specification fed through the DSL compiler, producing Implementation 1, Implementation 2, Implementation 3, ...]
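
A toy C++ sketch of the cross-checking idea: two independently derived implementations of the same specification run and compare their results, flagging a fault on divergence. The example computation is arbitrary:

#include <cstdio>
#include <cstdlib>

// Two "diverse" implementations of one specification: sum of 1..n.
long sum_loop(int n) { long s = 0; for (int i = 1; i <= n; ++i) s += i; return s; }
long sum_formula(int n) { return static_cast<long>(n) * (n + 1) / 2; }

int main() {
    const int n = 1000;
    long a = sum_loop(n), b = sum_formula(n);
    if (a != b) {  // implementations disagree: signal a fault
        std::fprintf(stderr, "divergence detected: %ld vs %ld\n", a, b);
        return EXIT_FAILURE;
    }
    std::printf("implementations agree: %ld\n", a);
    return EXIT_SUCCESS;
}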
Slide 31
ISTORE software summary
• ISTORE software architecture provides an extensible runtime environment for distributed network service application code
  – Layered local and global mechanism libraries provide introspection and self-maintenance
  – Mechanisms can be customized using DSL-based specifications of application policy
    » DSL code coordinates base mechanisms to implement application semantics and interfaces
    » DSL-based extension offers significant advantages in programmability, performance, reliability, safety, diversity
Slide 32
ISTORE summary
• Network services are increasing in importance
  – Self-maintaining, scalable storage appliances match the needs of these services
• ISTORE provides a flexible architecture for implementing storage-based network service apps
  – Modular, intelligent, fault-tolerant hardware platform is easy to configure, scale, and administer
  – Runtime system allows applications to leverage intelligent hardware, achieve introspection, and provide self-maintenance through
    » Layered runtime software structure
    » DSL-based extensibility that allows easy application-specific customization
Slide 33
Agenda
• Overview of ISTORE: Motivation and Architecture
• Hardware Details and Prototype Plans
• Software Architecture
• Discussion and Feedback
Slide 34
Backup slides
Slide 35
What ISTORE is not
• An extensible operating system
  – Use commodity OS, only add hardware monitoring module
    » MM could just be a device driver => no need for microkernel OS
    » ISTORE could be built on top of an extensible operating system for even greater flexibility
• An attempt to make commodity OSs extensible
  – Extensible runtime system allows designer to customize higher-level operations than OS extensions do
  – Closest to an extensible distributed operating system built on top of a commodity single-node operating system
• A multiple-protection-domain system
  – Assumes non-malicious programmer
  – If user-downloaded code permitted, sandbox must be implemented as part of (trusted) application
  – DSLs specify resource allocation/scheduling policies; appliance designer responsible for ensuring fairness
• A framework for building generic servers
Slide 36
ISTORE boot process
(1) Initially, undifferentiated ISTORE system
(2) On boot, each device brick contacts system boot server
(3) Device bricks download customized runtime system and application worker code
  – Front-end bricks also download application front-end code
• Runtime system libraries structured as shared libraries => hot upgrade (see the sketch below)
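
A sketch of the shared-library hot-upgrade point using POSIX dlopen; the library name and entry-point symbol are hypothetical:

#include <cstdio>
#include <dlfcn.h>

int main() {
    // Load (or, on upgrade, reload) the customized runtime downloaded at boot.
    void* h = dlopen("./libistore_runtime.so", RTLD_NOW);
    if (!h) { std::fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }

    // Look up an entry point in the freshly loaded version.
    using InitFn = void (*)();
    auto init = reinterpret_cast<InitFn>(dlsym(h, "runtime_init"));
    if (init) init();  // switch over to the new runtime code

    // At shutdown; a hot upgrade would dlopen the new version first,
    // then dlclose the old one once its work has drained.
    dlclose(h);
    return 0;
}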
Slide 37
Example Appliances
• E-commerce
• Web search engine
• Transformational web/PDA proxy
• Election server
• Mail server
• News server
• NFS server
• Database server: OLTP, DSS, mixed OLTP-DSS
• Video server