14 Feb 2001 OASIS PI Meeting
Computational Resiliency
Steve J. Chapin, Susan Older
Syracuse University
Gregg Irvin
Mobium Enterprises
Recap: What is Computational Resiliency?
The ability to sustain application operation and dynamically restore the level of assurance during an attack.
Application-centric self-defense, built on replication, migration, functionality mutation, and camouflage.
Computational Resiliency
[Figure: an attack on a mission-critical application leaves a degraded application trying to perform the mission-critical function (result of attack). Computational resiliency techniques are applied to correct the situation, and the degraded application is sufficiently improved by resiliency to perform the mission-critical function.]
Example of CRLib
[Figure: testbed network. A "Safe Zone" under OASIS protection and "The Wild" with limited protection sit behind a firewall facing "The Net". Machines: three clusters labeled "16 2x Pentium", one labeled "16 Alpha", two Intel 8x SMPs, and an SGI Origin, connected through four 3Com Superstack 3300 switches.]
The Players
Rocky & Bullwinkle: our heroes, both air and ground forces.
Dudley: representative of an allied power.
Boris & Natasha: directed by a shadowy figure (Fearless Leader). Mission: big trouble for Moose and Squirrel.
Snidely: attempting to disrupt Dudley's jobs.
The Benign State
[Figure: testbed network running Dudley's job (low priority), Bullwinkle's job, and Rocky's job.]
The Attacks
[Figure: testbed network.]
Snidely: blocked at the firewall.
Dudley does nothing.
Natasha attacks Rocky; caught by IDS.
Rocky's job migrates back into the safe zone; Dudley must give up resources.
Boris attacks Bullwinkle's job. Some attacks succeed.
Bullwinkle's job employs camouflage, decoys, and migration.
Multi-Faceted Approach
Strong theoretical basis: reason about conformance to policy
Computational resiliency library: dynamic application management
System software support: scheduling/policy frameworks
Computational Resiliency Library
Group messaging: a group contains multiple nodes; all nodes receive all messages sent to the group.
Replication/recovery with migration: liveness check at synchronization points; application readiness restored via node creation and migration.
Groups and Messaging
[Figure: Groups 1, 2, and 3, each made up of nodes, with channels between them.]
One group per cooperating task in a distributed computation.
Group Messaging Detail
[Figure: Group 1 and Group 2.]
In actuality, each member of Group 1 has a channel to each member of Group 2.
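A minimal Python sketch of the group-messaging model on these slides. It assumes a group is simply a list of named nodes, each with one inbox. The `Group`, `connect`, and `group_send` names are hypothetical, not CRLib's API. The sketch shows why the channels form a full bipartite mesh: a 3-node group talking to a 2-node group needs 3 x 2 = 6 channels, and a message sent to a group lands at every member.

```python
from itertools import product


class Group:
    """Hypothetical stand-in for a CRLib group: the replica nodes of one task."""

    def __init__(self, name, size):
        self.name = name
        self.nodes = [f"{name}.n{i}" for i in range(size)]
        # One inbox per node; a real system would use network channels/queues.
        self.inbox = {node: [] for node in self.nodes}


def connect(src, dst):
    """One channel per (sender, receiver) pair: every src node to every dst node."""
    return list(product(src.nodes, dst.nodes))


def group_send(sender, src, dst, msg):
    """Deliver msg from one node of src to every node of dst."""
    if sender not in src.nodes:
        raise ValueError(f"{sender} is not a member of {src.name}")
    for receiver in dst.nodes:
        dst.inbox[receiver].append((sender, msg))


g1, g2 = Group("G1", 3), Group("G2", 2)
print(len(connect(g1, g2)))  # 6 channels (3 x 2)
group_send("G1.n0", g1, g2, "partial result")
print(g2.inbox["G2.n1"])     # [('G1.n0', 'partial result')]
```

If every replica in G1 sends the same logical message, each G2 node receives three copies. This is where the communication cost of replication comes from.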
Mapping of Nodes to Processors (channels not shown)
[Figure: nodes of a group placed on processors.]
Nodes of a group mapped across processors
Multiple nodes as threads in a single process
One or more processes per processor
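One plausible placement policy, sketched in Python as an assumption: the slides do not say how CRLib chooses a mapping. Each group's nodes are spread round-robin over the processors. As long as a group has no more nodes than there are processors, losing one processor then costs each group at most one replica. `map_nodes` and `replicas_lost` are hypothetical helpers.

```python
from collections import defaultdict


def map_nodes(groups, processors):
    """Spread each group's nodes round-robin over the processors.

    groups: dict mapping group name -> number of nodes (replicas)
    processors: list of processor names
    Returns processor -> [(group, node index)]. Nodes that share a processor
    would run as threads inside one process there (an assumption; the slides
    allow one or more processes per processor).
    """
    placement = defaultdict(list)
    start = 0
    for group, size in groups.items():
        for i in range(size):
            proc = processors[(start + i) % len(processors)]
            placement[proc].append((group, i))
        start += size  # stagger groups so they do not all begin on one processor
    return dict(placement)


def replicas_lost(placement, failed):
    """How many nodes of each group disappear if one processor fails."""
    lost = defaultdict(int)
    for group, _ in placement.get(failed, []):
        lost[group] += 1
    return dict(lost)


placement = map_nodes({"G1": 3, "G2": 3}, ["p0", "p1", "p2", "p3"])
print(placement["p0"])                 # [('G1', 0), ('G2', 1)]
print(replicas_lost(placement, "p0"))  # {'G1': 1, 'G2': 1}
```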
Periodic Liveness Check
Done at user-defined synchronization points in the computation.
All group members send ping messages to all other members of the same group.
A Local Group Leader (LGL, one per group) is elected; it is responsible for restoring the intra-group replication level.
The LGLs elect a Global Group Leader (GGL), responsible for inter-group coordination.
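The check-and-elect sequence above can be sketched in Python as follows. The slides do not name an election algorithm, so "lowest live ID wins" is a hypothetical rule chosen only for illustration. Which pings arrive is simulated with an `alive` set, and `liveness_check`, `elect`, and `check_all` are illustrative names.

```python
def liveness_check(members, alive):
    """One liveness-check round inside a group.

    members: node IDs the group expects
    alive:   node IDs that are really running (simulates which pings arrive)
    Each live member pings every other member, so a node is seen as live
    exactly when its pings reach the others. Returns (live members, pings sent).
    """
    live = [m for m in members if m in alive]
    pings = len(live) * (len(members) - 1)
    return live, pings


def elect(candidates):
    """Hypothetical election rule: the lowest ID wins.

    The slides do not say which election algorithm CRLib uses.
    """
    return min(candidates)


def check_all(groups, alive):
    """Run the check in every group, elect LGLs, then let the LGLs elect a GGL."""
    lgls = {}
    for name, members in groups.items():
        live, _ = liveness_check(members, alive)
        if live:  # a group with no survivors has no leader
            lgls[name] = elect(live)
    ggl = elect(lgls.values()) if lgls else None
    return lgls, ggl


groups = {"G1": [1, 2, 3], "G2": [4, 5, 6]}
print(liveness_check(groups["G1"], {2, 3, 4, 6}))  # ([2, 3], 4)
print(check_all(groups, {2, 3, 4, 6}))             # ({'G1': 2, 'G2': 4}, 2)
```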
Periodic Liveness Check II
LGLs determine local status by fiat, restore the replication level, and report to the GGL: new threads are created by cloning the LGL; a consensus option is in place but currently unused.
The GGL reports the results of LGL actions to the other LGLs.
The LGL and GGL return to normal duty.
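A Python sketch of the recovery step described above, under stated assumptions. The LGL is taken to be the first survivor. A clone is a new node carrying a deep copy of the LGL's state, and node IDs come from a simple counter. `Node`, `restore_replication`, and `ggl_relay` are hypothetical names, not CRLib's interface.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    state: dict = field(default_factory=dict)


def restore_replication(live_nodes, target, next_id):
    """LGL restores its group's replication level by cloning itself.

    live_nodes: survivors of the liveness check; live_nodes[0] is the LGL
    target:     replication level the group should have
    next_id:    first unused node ID (hypothetical ID scheme)
    The LGL decides on its own ("by fiat"); no consensus round is run.
    Returns (restored node list, report for the GGL).
    """
    lgl = live_nodes[0]
    nodes = list(live_nodes)
    created = []
    while len(nodes) < target:
        # Deep-copy the leader's state so the clone resumes at this sync point
        # without sharing mutable data with the leader.
        clone = Node(next_id, copy.deepcopy(lgl.state))
        nodes.append(clone)
        created.append(next_id)
        next_id += 1
    report = {"lgl": lgl.node_id, "survivors": len(live_nodes), "created": created}
    return nodes, report


def ggl_relay(reports):
    """GGL sends each LGL the reports of all the other LGLs."""
    return {r["lgl"]: [o for o in reports if o["lgl"] != r["lgl"]] for r in reports}


survivors = [Node(2, {"step": 7}), Node(3, {"step": 7})]
nodes, report = restore_replication(survivors, target=3, next_id=10)
print(report)  # {'lgl': 2, 'survivors': 2, 'created': [10]}
```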
Simple Application
Simple Application After Process Taken Out by Attacker
Application After Second Processor Lost
Current Issues
Exploring through in-house red teaming and modeling:
Efficiency of basic mechanisms: multiplicative communication load; additive computation load.
Efficacy of basic mechanisms: window of attack between liveness checks; attack during liveness check; agreement algorithms.
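One reading of "multiplicative communication load" and "additive computation load", sketched in Python. It assumes that every replica of a sending group sends each message to every replica of the receiving group (as on the Group Messaging Detail slide) and that every replica repeats its group's work. The helper names and workload numbers are illustrative, not measurements.

```python
def comm_load(replication, edges):
    """Physical messages per round if every replica of the sending group
    sends each logical message to every replica of the receiving group.

    replication: group -> replication level r
    edges: (src group, dst group, logical messages per round)
    """
    return sum(replication[s] * replication[d] * m for s, d, m in edges)


def compute_load(replication, work):
    """Total work if every replica repeats its group's computation."""
    return sum(replication[g] * w for g, w in work.items())


def ping_load(replication):
    """Pings per liveness check: each member pings every other member."""
    return sum(r * (r - 1) for r in replication.values())


edges = [("A", "B", 10)]
work = {"A": 5, "B": 5}
for r in (1, 3):
    rep = {"A": r, "B": r}
    print(r, comm_load(rep, edges), compute_load(rep, work), ping_load(rep))
# 1 10 10 0
# 3 90 30 12
```

Under these assumptions, tripling the replication level multiplies message traffic by 9 (3 x 3) but computation only by 3, and the liveness-check pings grow roughly quadratically with group size.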
Next Steps
Additional policy choices (not necessarily orthogonal choices): agreement protocols; replication/recovery methods; message-passing schemes.
Tool for user policy expression: state-dependent policy specified via a “Chinese menu” approach; logical predicates, state transitions.
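A Python sketch of what a "Chinese menu" policy with state-dependent transitions might look like: pick one item from each column, reject combinations that conflict, and switch policies when a predicate on observed state fires. Every option name, the conflict rule, the states, and the observation keys are hypothetical. The slides name only the three categories and the predicate/state-transition idea.

```python
# Hypothetical option names; the slides name only the three categories.
MENU = {
    "agreement": {"leader-by-fiat", "consensus"},
    "recovery": {"clone-leader", "migrate"},
    "messaging": {"all-to-all", "leader-relay"},
}
# Choices are not necessarily orthogonal: an illustrative (made-up) conflict.
INCOMPATIBLE = {("consensus", "leader-relay")}


def make_policy(**choices):
    """Pick exactly one item from each column of the menu and reject conflicts."""
    if set(choices) != set(MENU):
        raise ValueError(f"need exactly one choice for each of {sorted(MENU)}")
    for column, pick in choices.items():
        if pick not in MENU[column]:
            raise ValueError(f"{pick!r} is not a {column} option")
    picks = set(choices.values())
    for a, b in INCOMPATIBLE:
        if a in picks and b in picks:
            raise ValueError(f"{a} cannot be combined with {b}")
    return choices


# State-dependent policy: (state, predicate on observations, next state, policy).
TRANSITIONS = [
    ("benign", lambda obs: obs["ids_alerts"] > 0, "alert",
     make_policy(agreement="consensus", recovery="migrate", messaging="all-to-all")),
    ("alert", lambda obs: obs["ids_alerts"] == 0 and obs["lost_nodes"] == 0, "benign",
     make_policy(agreement="leader-by-fiat", recovery="clone-leader", messaging="all-to-all")),
]


def step(state, policy, obs):
    """Fire the first transition whose state and predicate match; else stay put."""
    for src, predicate, dst, new_policy in TRANSITIONS:
        if src == state and predicate(obs):
            return dst, new_policy
    return state, policy


start = make_policy(agreement="leader-by-fiat", recovery="clone-leader", messaging="all-to-all")
state, policy = step("benign", start, {"ids_alerts": 1, "lost_nodes": 0})
print(state, policy["agreement"])  # alert consensus
```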
Next Steps
π-calculus-based formal model for core library behavior
Split/merge for groups: all nodes in a group must be identical; basis for load balancing and functionality mutation; first demo at the summer PI meeting, 2001.
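A Python sketch of the split/merge constraint. It assumes that "identical" can be modeled by a single code identifier shared by all nodes in a group. The `Group` class, the `code` field, and the variant names are hypothetical. A split yields two identical halves, for example to move one half to a less loaded machine. Once one half has been mutated, the two halves can no longer be merged.

```python
from dataclasses import dataclass


@dataclass
class Group:
    code: str    # what every node runs (hypothetical stand-in for "identical")
    nodes: list


def split(group, k):
    """Split off the first k nodes into a new group; both halves stay identical."""
    if not 0 < k < len(group.nodes):
        raise ValueError("both halves must be non-empty")
    return Group(group.code, group.nodes[:k]), Group(group.code, group.nodes[k:])


def merge(a, b):
    """Merge two groups, allowed only when their nodes are identical."""
    if a.code != b.code:
        raise ValueError("cannot merge groups whose nodes differ")
    return Group(a.code, a.nodes + b.nodes)


g = Group("solver-v1", ["n0", "n1", "n2", "n3"])
left, right = split(g, 2)  # e.g. move `right` to a less loaded machine
right.code = "solver-v2"   # functionality mutation of one half
# merge(left, right) would now raise ValueError
```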
Schedule
[Gantt chart, 6/00 to 12/03 in six-month steps (6/00, 12/00, 6/01, 12/01, 6/02, 12/02, 6/03, 12/03). Tasks: Basic π-calc; Formal equivalence; Policy/Protocol Analysis; Basic CRLib.]
Schedule II
[Gantt chart, 6/00 to 12/03 in six-month steps (6/00, 12/00, 6/01, 12/01, 6/02, 12/02, 6/03, 12/03). Tasks: Funct. Mut. (functionality mutation); Policy Frameworks; Camouflage; Schedulers; Hard. Apps.; Integration; Demos.]
Open Issues
Cost/benefit analysis of CR: how much protection do we provide if the attacker knows what we're trying to do? How much is performance affected by message load, active replication, etc.?
Potential integration with other OASIS projects: complementary with system-hardening technology (e.g., Dependable Intrusion Tolerance).