Self-* Networks of Unmanned Vehicles
CSE 597c, Fall 2006
Introduction to Self-* Systems, Sept. 14, 2006
Bhuvan Urgaonkar
Definition
Self-* systems: "Self-*" reads like a regular expression, with * standing for any suffix
Self-tuning, self-configuring, self-healing, self-stabilizing, …
Autonomic computing [IBM]: inspired by the autonomic nervous system in a living organism
In humans and other vertebrates, the autonomic nervous system is the part of the nervous system that regulates the involuntary activity of the heart, intestines and glands.
Some History
What do you think the first Self-* system was? A wind/water mill?
Emergence of (semi-)autonomous systems starting with the industrial revolution
Steam engine, printing press, car, …
Could carry out certain tasks without human intervention
Development of feedback-control theory, signal processing
Thermostat (Albert Butz of the Thermo-Electric Regulator Co., Minneapolis, 1885)
Cruise control
Early 20th century onwards: major advances in engineering and the emergence of computing
Now one could program a mechanical/electrical/… system
More complex autonomous systems
More History
Artificial Intelligence: make a machine/computer do what a (smart/able) human can do
Learn like a human does
Sometimes easy, very often not!
Turing Test: a computer that can pass itself off as a human passes the Turing test
A definition of self-*-ness?
Would imitating human behavior alone be enough?
Complexity of Modern Systems
Computer systems grew in complexity (others did as well, but let's talk about CS)
NYTimes: "All science is computer science"
Complex h/w and s/w, distributed systems, heterogeneity, …
Can't be managed by housewives who are given a manual, as in WW II!
IBM's DB2 database server has about 80 parameters!
Modern systems operate in highly dynamic conditions
Human-intervention-based operation is often infeasible: error-prone, slow, expensive, …
Operating Environments that Prohibit Human Participation
Robots or machines operating in mines, under oceans, in volcanic areas, …
They must "take care" of themselves
Defining Self-*-ness
The Turing test doesn't quite capture self-*-ness
Sometimes we want better than what even the smartest/fastest human can do!
Not quite the same as the original AI goal, and not a superset of it
Some intersection, but also some orthogonal requirements
Outline: Motivation and history, Examples, Self-* networks/distributed systems, Relevant areas/useful techniques, Summary
Example 1: General-purpose Operating Systems
CPU scheduling and memory management
The first computers did batch processing of jobs; a human would schedule the jobs
Then multi-programming came along:
Dynamically changing set of processes
Interleaving of computation and I/O
Response-time-sensitive processes such as editors
The CPU scheduler had to adapt to these dynamics
Self-tuning behavior was desired; the same held for the memory manager
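One textbook way a CPU scheduler adapts to such a changing process mix is to predict each process's next CPU burst from its own history by exponential averaging, tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n, and then favor processes with short predicted bursts (an approximation of shortest-job-first). The sketch below only computes the predictions; alpha = 0.5, the initial guess of 10 ms and the burst trace are assumed values, and this is one classic technique rather than a description of any particular OS.

```python
def predict_bursts(observed, alpha=0.5, initial_guess=10.0):
    """Exponential averaging of CPU-burst lengths.

    Returns the prediction made before each observed burst, followed by the
    prediction for the next (not yet observed) burst.
    alpha and initial_guess are assumed values, not taken from the slides.
    """
    predictions = [initial_guess]
    for burst in observed:
        # tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n
        predictions.append(alpha * burst + (1 - alpha) * predictions[-1])
    return predictions

# Hypothetical burst lengths in ms: short bursts, then the process turns CPU-bound.
history = [6, 4, 6, 4, 13, 13, 13]
print(predict_bursts(history))  # [10.0, 8.0, 6.0, 6.0, 5.0, 9.0, 11.0, 12.0]
```

The estimate tracks the process's recent behavior without any human tuning: when the bursts jump to 13 ms, the prediction climbs toward 13 within a few bursts. A larger alpha reacts faster but is noisier.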
Self-tuning Systems
[Figure: feedback loop in which the external environment (including inputs) acts on the system components, and the system output (e.g., performance) is fed back to them.]
Keep output within desired bounds even when the external environment is changing
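A minimal sketch of this loop, under loudly stated assumptions: the "system" is a toy model whose response time is simply load divided by capacity (not a real queueing model), and the controller adjusts capacity each step in proportion to how far the measured response time is from a target. The function names, the target of 1.0 and the gain of 0.5 are hypothetical. When the external load doubles, the output briefly leaves the target and the loop pulls it back.

```python
def measured_response_time(load, capacity):
    """Toy plant model (assumption): response time = load per unit of capacity."""
    return load / capacity

def run_feedback_loop(loads, target=1.0, gain=0.5, capacity=10.0):
    """Each step: measure the output, compare it with the target, adjust capacity.

    Returns a list of (load, measured response time, new capacity) per step.
    """
    history = []
    for load in loads:                      # external environment: changing input
        r = measured_response_time(load, capacity)
        error = r - target                  # > 0: too slow, < 0: over-provisioned
        capacity = max(1.0, capacity * (1 + gain * error))
        history.append((load, r, capacity))
    return history

# The load doubles halfway through; capacity is re-tuned without human intervention.
trace = run_feedback_loop([20] * 30 + [40] * 30)
# Roughly 1.0, then about 2.0 right after the load step, then back to about 1.0.
print(trace[29][1], trace[30][1], trace[-1][1])
```

In this toy model the remaining capacity gap shrinks by a factor (1 - gain * target) per step, so a gain of 0.5 halves it each step; a gain above 1 would overshoot and oscillate, and above 2 it would diverge.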
Example 2: Mission-critical Operating Systems
OSes running on spacecraft: the system had to discover errors and recover on its own
Self-healing systems. Initial/simple solutions relied on a high degree of redundancy:
Introduce redundancy to deal with failures
Implement mechanisms to quickly discover failures
OK for a spacecraft, but not for a more "down-to-earth" system, where it could be very expensive
How can a system self-heal without excessive redundancy?
Later, software became very complex: s/w failures became a far more serious problem than h/w failures!
Software engineering, programming languages
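A common form of the redundancy described above is N-modular redundancy (triple modular redundancy when N = 3): run several replicas of the same computation and vote on their outputs, which both masks a faulty replica and identifies it. The voter below is a minimal sketch of that idea; the function name and the choice to return the dissenting replicas are illustrative assumptions, not taken from any particular spacecraft system.

```python
from collections import Counter

def majority_vote(outputs):
    """N-modular redundancy voter (hypothetical helper).

    Returns (value, suspects): the value reported by a strict majority of
    replicas and the indices of replicas that disagreed with it. Returns
    (None, None) when no strict majority exists, i.e. failures cannot be masked.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        return None, None
    suspects = [i for i, out in enumerate(outputs) if out != value]
    return value, suspects

print(majority_vote([42, 42, 17]))  # (42, [2]): replica 2 is masked and flagged
print(majority_vote([1, 2, 3]))     # (None, None): too many disagreeing replicas
```

Masking f faulty replicas this way needs 2f + 1 replicas in total, which is exactly the cost that makes heavy redundancy expensive for "down-to-earth" systems.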
Self-healing Systems
Keep output within reasonable bounds even when internal components fail
What's different from a self-tuning system?
Failures are internal events; changes in the operating environment are external events
Note: failures might be induced by external events
[Figure: the self-tuning feedback loop (external environment including inputs; system output, e.g., performance; feedback), with component failure as an additional, internal disturbance.]
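Discovering component failures quickly is often done with heartbeats: each component reports periodically, and a monitor that hears nothing from a component for longer than a timeout suspects it has failed and triggers recovery. The class below is a minimal single-process sketch; the class name, the timeout value and the restart callback are hypothetical, and a real detector must also cope with false suspicions caused by slow networks.

```python
class HeartbeatMonitor:
    """Timeout-based failure detector with a recovery hook (hypothetical sketch)."""

    def __init__(self, components, timeout, restart):
        self.timeout = timeout        # max silence (time units) before suspecting failure
        self.restart = restart        # callback that "heals" a suspected component
        self.last_seen = {c: 0.0 for c in components}

    def heartbeat(self, component, now):
        """Record that `component` reported in at time `now`."""
        self.last_seen[component] = now

    def check(self, now):
        """Suspect silent components, restart them, and return their names."""
        failed = [c for c, t in self.last_seen.items() if now - t > self.timeout]
        for c in failed:
            self.restart(c)
            self.last_seen[c] = now   # fresh grace period after the restart
        return failed

restarted = []
monitor = HeartbeatMonitor(["a", "b", "c"], timeout=3.0, restart=restarted.append)
monitor.heartbeat("a", now=2.0)
monitor.heartbeat("b", now=2.0)
print(monitor.check(now=4.0))  # ['c']: silent since t=0, so it is restarted
```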
Self-Stabilization
[Figure: states of the system, with good states in green and bad states in blue.]
Guaranteed to return to a good state, eventually, on its own
Related to fault tolerance
How?
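One classic answer to "How?" is Dijkstra's K-state token ring (1974): N machines in a ring each hold a counter in 0..K-1, and a machine is "privileged" (holds the token) when its local rule is enabled. From any initial state, however corrupted, the ring reaches a state with exactly one privilege and then stays in such states. The simulation below is a sketch assuming a central daemon that fires one privileged machine at a time, chosen at random, and K >= N; the values N = 5, K = 6 and the random seed are arbitrary choices.

```python
import random

def privileged(x):
    """Indices of machines that currently hold a privilege (the 'token')."""
    n = len(x)
    return [i for i in range(n)
            if (i == 0 and x[0] == x[n - 1]) or (i > 0 and x[i] != x[i - 1])]

def step(x, k, rng):
    """Central daemon: let one randomly chosen privileged machine make its move."""
    i = rng.choice(privileged(x))  # never empty: some machine is always privileged
    if i == 0:
        x[0] = (x[0] + 1) % k      # bottom machine: increment its counter modulo K
    else:
        x[i] = x[i - 1]            # other machines: copy the left neighbour's counter

def stabilize(n=5, k=6, seed=0, max_steps=10_000):
    """Start from an arbitrary state; return (moves taken, state) once exactly one privilege remains."""
    rng = random.Random(seed)
    x = [rng.randrange(k) for _ in range(n)]   # arbitrary, possibly "bad", initial state
    for moves in range(max_steps):
        if len(privileged(x)) == 1:
            return moves, x
        step(x, k, rng)
    raise RuntimeError("did not stabilize within max_steps")

moves, state = stabilize(seed=1)
print(moves, state, privileged(state))
```

No machine ever needs a global view: each rule reads only the machine's own counter and its left neighbour's. Once a single privilege exists, every move just passes it on, so the ring never leaves the good states (closure); the number of moves taken before that point is one measure of how quickly it stabilizes.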
Classification of Self-* Systems
Self-tuning: performance
Self-healing: failure handling
Self-stabilizing: convergence
Is this a good classification?
Note: the classes are not necessarily disjoint
Defining Self-*-ness (contd.)
First, define it for each member of our classification
Quantifying Self-tunability
How good is the system at meeting performance targets under dynamic operating conditions?
E.g., can the system ensure that response-time degradation is always at most proportional to the increase in request arrivals?
Note: the system can change its internal state (e.g., increase its capacity dynamically) to achieve its goal
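To make "at most proportional" concrete, assume (purely for illustration) that the system behaves like an M/M/1 queue, whose mean response time is R = 1/(mu - lambda) for arrival rate lambda below service rate mu. The sketch checks the criterion R_new / R_old <= lambda_new / lambda_old with fixed capacity and with capacity scaled up along with the load; the rates and function names are hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue (finite only if arrival_rate < service_rate)."""
    if arrival_rate >= service_rate:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)

def degrades_proportionally(lam_old, lam_new, mu_old, mu_new):
    """True if the response-time increase is at most proportional to the arrival increase."""
    r_old = mm1_response_time(lam_old, mu_old)
    r_new = mm1_response_time(lam_new, mu_new)
    return r_new / r_old <= lam_new / lam_old

# Arrivals grow from 5 to 8 requests/s (a 1.6x increase).
print(degrades_proportionally(5, 8, mu_old=10, mu_new=10))  # False: fixed capacity
print(degrades_proportionally(5, 8, mu_old=10, mu_new=16))  # True: capacity scaled with load
```

With fixed capacity, response time grows 2.5x (from 0.2 s to 0.5 s) for a 1.6x load increase, so the target is missed; changing internal state, here the service rate, is what lets the system meet it.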
Quantifying the Goodness of a Self-healing System
How good is the system at maintaining functionality under failures?
E.g. 1: can the system continue functioning even after N failures?
E.g. 2: can the system continue to offer the same response time even after N failures?
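One way to put a number on example 1, assuming a system of n identical replicas that keeps functioning as long as at least k of them are up (so it tolerates N = n - k failures), and that replicas fail independently with probability p: the survival probability is the sum over i from k to n of C(n, i) (1 - p)^i p^(n - i). The function name, replica counts and p = 0.1 are hypothetical.

```python
from math import comb

def survival_probability(n, k, p_fail):
    """P(at least k of n independent replicas are still up), each failing with prob. p_fail."""
    p_up = 1.0 - p_fail
    return sum(comb(n, i) * p_up**i * p_fail**(n - i) for i in range(k, n + 1))

# 2-out-of-3 (tolerates N = 1 failure) vs. a single unreplicated component.
print(round(survival_probability(3, 2, 0.1), 3))  # 0.972
print(round(survival_probability(1, 1, 0.1), 3))  # 0.9
```

This covers only example 1; example 2 is stricter, since the surviving replicas must also have enough spare capacity to keep the response time unchanged.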
Quantifying the Goodness of a Self-stabilizing System
How long does it take the system to return to a good state after a perturbation?
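One concrete way to measure this, borrowed from control theory's notion of settling time: sample the output at fixed intervals after the perturbation and report the first sample from which it stays within a tolerance band around the target for the rest of the observation window. The band (what counts as a "good state"), the trace and the function name are assumptions for illustration.

```python
def recovery_time(samples, target, tolerance):
    """Index of the first sample from which every later sample stays within
    target +/- tolerance; None if the trace does not end inside the band."""
    settled_from = None
    for i, value in enumerate(samples):
        if abs(value - target) <= tolerance:
            if settled_from is None:
                settled_from = i        # entered the band; may still leave it again
        else:
            settled_from = None         # left the band: recovery not yet complete
    return settled_from

# Hypothetical response times (seconds), sampled once per minute after a failure.
trace = [3.0, 2.2, 1.6, 1.3, 1.05, 0.98, 1.02, 1.0]
print(recovery_time(trace, target=1.0, tolerance=0.1))  # 4, i.e. about 4 minutes
```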
Defining Self-*-ness (contd.)
One approach: define a vector whose individual elements characterize self-tunability, goodness of self-healing, and self-stabilization
E.g., <ST=excellent, SH=poor, SS=good>
Conflicting goals!
E.g., maintaining performance might require fewer components, while dealing with failures might require redundancy
Need to understand what is more important; this is context dependent
The relative importance of the various self-* properties varies across systems
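Once the context is known, one simple way to collapse such a vector into a single ranking is a weighted sum, a basic multi-criteria technique: map each grade to a number and weight each property by its importance in that context. The grade scale and both weight sets below are hypothetical; the point is only that the same <ST=excellent, SH=poor, SS=good> system ranks differently under different priorities.

```python
# Hypothetical ordinal scale for the grades used on the slide.
GRADE = {"poor": 1, "fair": 2, "good": 3, "excellent": 4}

def weighted_score(profile, weights):
    """Collapse a self-* vector <ST, SH, SS> into one number using context-dependent weights."""
    total = sum(weights.values())
    return sum(GRADE[profile[prop]] * w for prop, w in weights.items()) / total

system = {"ST": "excellent", "SH": "poor", "SS": "good"}
spacecraft_weights = {"ST": 1, "SH": 4, "SS": 2}   # hypothetical: failures matter most
web_service_weights = {"ST": 4, "SH": 1, "SS": 1}  # hypothetical: performance matters most
print(weighted_score(system, spacecraft_weights))             # 2.0
print(round(weighted_score(system, web_service_weights), 2))  # 3.33
```

A weighted sum hides trade-offs behind the choice of weights; more careful multi-criteria methods keep the vector and look for Pareto-optimal designs instead.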
Distributed Systems: How do things change?
Cons: problems associated with a distributed system
Data consistency, larger communication delays, heterogeneity, more failures and more kinds of failures, …
Pros: more sources of redundancy might mean better self-healing
More resources might mean more options to self-tune
Any more?
Example 3: Networking: TCP/IP
Simple AIMD-based congestion control
Decentralized, only at the end-points
Has worked pretty well and has scaled to the current Internet
I consider TCP a good self-tuning protocol
What about link failures and how IP handles them?
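AIMD (additive increase, multiplicative decrease) is what makes this end-point control self-tuning: each sender probes for bandwidth by growing its congestion window by a constant every round-trip time (RTT) and backs off by a constant factor when it detects loss. The sketch below traces one sender's window using the classic +1 segment per RTT and halving on loss; it deliberately omits slow start, timeouts and fast recovery, so it is a simplification rather than any real TCP implementation, and the function and parameter names are hypothetical.

```python
def aimd(loss_events, cwnd=1.0, increase=1.0, decrease=0.5):
    """Trace a congestion window (in segments) across RTTs under AIMD.

    loss_events[t] is True if a loss was detected during RTT t.
    Additive increase: +increase segments per loss-free RTT.
    Multiplicative decrease: cwnd *= decrease on loss (never below 1 segment).
    """
    trace = []
    for lost in loss_events:
        cwnd = max(1.0, cwnd * decrease) if lost else cwnd + increase
        trace.append(cwnd)
    return trace

print(aimd([False] * 7 + [True] + [False] * 3))
# [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 4.0, 5.0, 6.0, 7.0]
```

The resulting sawtooth needs no central coordinator: each sender reacts only to its own loss signals.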
Example 4: Enterprise/Utility Computing
Varying workloads, complex applications
Human management is infeasible and error-prone
How to manage resources to maximize revenue while meeting client requirements?
Example 5: Search Engine: Google
Web content is highly dynamic
Self-tuning: how good is the search engine at keeping up with changes in Web content?
Self-healing: thousands of servers and disks in their data center, with failures every few hours!
Does google.com keep working despite these failures?
How much human intervention does this need?
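Why failures every few hours are expected at this scale: assume N components that fail independently with exponentially distributed lifetimes of mean MTTF, each replaced on failure. Failure rates of independent components add, so the fleet as a whole sees a failure every MTTF/N on average. With hypothetical round numbers, 100,000 disks and servers at an MTTF of 300,000 hours each:

```latex
\lambda_{\text{fleet}} = \sum_{i=1}^{N} \frac{1}{\mathrm{MTTF}} = \frac{N}{\mathrm{MTTF}},
\qquad
\mathrm{MTBF}_{\text{fleet}} = \frac{1}{\lambda_{\text{fleet}}} = \frac{\mathrm{MTTF}}{N}
= \frac{300{,}000\ \mathrm{h}}{100{,}000} = 3\ \mathrm{h}.
```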
Relevant areas/useful techniques
Multi-criteria optimization techniques (economics)
Analytical modeling (e.g., to infer the resource needs of an app)
Measurement techniques
Feedback-control theory (reactive)
Statistical techniques for prediction and learning (reactive + proactive)
Biological, ecological and social networks
How do termites with pinhead-sized brains build air-conditioned colonies?
Theoretical CS: online algorithms, approximation algorithms
Distributed computing
Systems issues (efficient & bug-free software, prototyping, simulation, experiment design)
Summary: Key Principles
Keep it simple, silly!
Occam's razor
E.g., partial automation vs. complete automation
Understand and define system goals clearly
Which self-* properties are essential, and which are not?
Understand system properties and operating environments
One size may not fit all
Measurements; prediction, classification, learning, feedback control
Design for agility (assuming online operation)
Efficient algorithms & systems mechanisms