Mathematical model of computer viruses
Ferenc Leitold, Hunix Ltd., Hungary
fleitold@hunix.hu
Table of contents
• Models of computation
• Operating system
• Virus definition
• What can we do with this mathematical model?
Turing Machine
$T = \langle Q, S, I, \delta, b, q_0, q_f \rangle$
Q: set of states
S: tape symbols
I: input symbols, $I \subset S$
b: blank symbol, $b \in S \setminus I$
δ: move function, $\delta: Q \times S \to Q \times S \times \{l, r, s\}$
q_0: initial state
q_f: final state
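The tuple above can be exercised with a small simulator. This is a minimal sketch, not part of the original slides: the name `run_tm`, the dictionary encoding of δ, and the bit-flipping example machine are illustrative assumptions.

```python
from typing import Dict, Tuple

# delta maps (state, symbol) -> (new state, written symbol, move),
# where move is 'l', 'r' or 's' (left, right, stay), as in the definition.
Delta = Dict[Tuple[str, str], Tuple[str, str, str]]
MOVES = {"l": -1, "r": 1, "s": 0}


def run_tm(delta: Delta, tape_input: str, blank: str, q0: str, qf: str,
           max_steps: int = 10_000) -> str:
    """Run the machine from q0 until it reaches qf.

    Returns the tape content with leading and trailing blanks removed.
    Assumes `blank` is a single character. A missing (state, symbol)
    entry in `delta` raises KeyError.
    """
    tape = dict(enumerate(tape_input))  # sparse tape: position -> symbol
    state, head, steps = q0, 0, 0
    while state != qf:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before the final state")
        symbol = tape.get(head, blank)
        state, written, move = delta[(state, symbol)]
        tape[head] = written
        head += MOVES[move]
        steps += 1
    if not tape:
        return ""
    cells = (tape.get(i, blank) for i in range(min(tape), max(tape) + 1))
    return "".join(cells).strip(blank)


# Hypothetical example machine: flips every bit, halts on the first blank.
FLIP_BITS: Delta = {
    ("q0", "0"): ("q0", "1", "r"),
    ("q0", "1"): ("q0", "0", "r"),
    ("q0", "b"): ("qf", "b", "s"),
}

print(run_tm(FLIP_BITS, "0110", blank="b", q0="q0", qf="qf"))
```

The step limit is only a convenience of the sketch; a real Turing Machine has no such bound, which is exactly what the halting problem is about.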
Random Access Machine
[Figure: memory cells m0, m1, m2, m3, m4, ... and an Accumulator]
RASPM
RASPM with ABS
RASPM with SABS
RASPM with ABS: definition
$G = \langle V, U, T, f, q, M \rangle$
V: set of symbols
U: operation codes, $U \subseteq V$
T: set of the processor's activities
f: $f: U \to T$
q: initial value of the IP
M: initial memory content
Instruction set
• move (LOAD, STORE)
• logical (AND, OR, XOR)
• arithmetic (ADD, SUB, MULT, DIV)
• branch (JUMP, JGTZ, JZERO)
• input/output tape handling (READ, WRITE)
• background tape handling (GET, PUT, SEEK, SETDRIVE)
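To make the roles of f, q and M concrete, here is a toy interpreter for a subset of the instruction set. It is only a sketch under stated assumptions: the numeric operation codes, the two-cell instruction layout (code, operand address) and the HALT instruction are not given on the slides, and the logical, MULT/DIV and background tape instructions are left out.

```python
from typing import Callable, Dict, List, Sequence


class Machine:
    """State of a toy stored-program machine."""

    def __init__(self, memory: Sequence[int], ip: int, input_tape: Sequence[int]):
        self.mem = list(memory)      # M: initial memory content
        self.ip = ip                 # q: initial value of the IP
        self.acc = 0                 # accumulator
        self.inp = list(input_tape)  # input tape
        self.out: List[int] = []     # output tape
        self.halted = False


# Activities (elements of T): each takes the machine and an operand address.
def load(m: Machine, a: int) -> None:
    m.acc = m.mem[a]

def store(m: Machine, a: int) -> None:
    m.mem[a] = m.acc

def add(m: Machine, a: int) -> None:
    m.acc += m.mem[a]

def sub(m: Machine, a: int) -> None:
    m.acc -= m.mem[a]

def jump(m: Machine, a: int) -> None:
    m.ip = a

def jgtz(m: Machine, a: int) -> None:
    if m.acc > 0:
        m.ip = a

def jzero(m: Machine, a: int) -> None:
    if m.acc == 0:
        m.ip = a

def read(m: Machine, a: int) -> None:
    m.mem[a] = m.inp.pop(0)

def write(m: Machine, a: int) -> None:
    m.out.append(m.mem[a])

def halt(m: Machine, a: int) -> None:  # HALT is an assumption of this sketch
    m.halted = True


# f: U -> T. The numeric operation codes are arbitrary choices.
F: Dict[int, Callable[[Machine, int], None]] = {
    0: halt, 1: load, 2: store, 3: add, 4: sub,
    5: jump, 6: jgtz, 7: jzero, 8: read, 9: write,
}


def run(memory: Sequence[int], ip: int = 0, input_tape: Sequence[int] = (),
        max_steps: int = 10_000) -> List[int]:
    """Fetch (code, operand) at IP, advance IP, perform the activity f(code)."""
    m = Machine(memory, ip, input_tape)
    for _ in range(max_steps):
        if m.halted:
            break
        code, operand = m.mem[m.ip], m.mem[m.ip + 1]
        m.ip += 2
        F[code](m, operand)
    return m.out


# READ 20; READ 21; LOAD 20; ADD 21; STORE 22; WRITE 22; HALT, then data cells.
PROGRAM = [8, 20, 8, 21, 1, 20, 3, 21, 2, 22, 9, 22, 0, 0] + [0] * 9

print(run(PROGRAM, input_tape=[2, 3]))
```

Because program and data share one memory, a STORE can overwrite instructions. That property of the stored-program model is what the later virus definition relies on.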
Operating System
• system of programs
• able to handle separate program or data files
• able to run a specified program
Operating Systems under RASPM with ABS
• The OS is in the initial memory (M) → OS-specific machine
• The OS is on the background tape → OS-independent machine
• The OS is on the input tape → unusable
Comparing RASPMs with ABS
$G_1 = \langle V_1, U_1, T_1, f_1, q_1, M_1 \rangle$
$G_2 = \langle V_2, U_2, T_2, f_2, q_2, M_2 \rangle$
$\{q_1, M_1\} \neq \{q_2, M_2\}$
• different operating systems
• different loader programs
$\{f_1, T_1, U_1\} \neq \{f_2, T_2, U_2\}$
• different instruction sets (activities)
• different sets of operation codes
• different operation codes
$V_1 \neq V_2$
• different symbols
• different tape formats
Computer virus
• a (part of a) program
• it is attached to a program area
• it is able to link itself to other program areas
• it is executed when the host program area is to be executed
Virus spreading modes
• machine specific
• machine independent
• operating system specific
• operating system independent
• direct
• indirect
What can we do with this mathematical model?
• Examining the virus detection problem
• Examining searching techniques
• Examining polymorphic viruses
• Examining multiplatform viruses
General virus detection problem
Theorem:
It is impossible to build a Turing Machine which could decide whether an executable file in a RASPM with ABS contains a virus or not.
General virus detection problem
Proof:
[Figure: layout of the constructed executable: Host program | Virus | TM prg | TM input]
A decider for the virus detection problem would also decide the TM halting problem.
“An anti-virus has its limit, thanks to Turing, and a virus can find those limits, exploit them, thanks to Darwin.”
from the Giant Black Book of Computer Viruses
Searching technique questions
• For what kind of viruses can it be used?
• What is the probability of false alarms?
• What are the expense criteria?
Sequence searching algorithm
• for non-polymorphic known viruses
• false alarms: $p \approx \dfrac{L \cdot M}{n^N}$
• expense criteria: P, polynomial: $\le L \cdot M \cdot N$ comparisons
L: size of suspicious area
M: number of sequences
N: size of a sequence
n: number of values in one cell
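The false-alarm estimate and the comparison bound can be tried out with a naive scanner. This is a sketch only: the function names are hypothetical, the estimate assumes uniformly random cell contents, and the sequences are plain integers rather than real signatures.

```python
from typing import Iterable, List, Sequence


def false_alarm_estimate(L: int, M: int, N: int, n: int) -> float:
    """p ≈ L * M / n**N, assuming uniformly random cell contents."""
    return L * M / n ** N


def scan(area: Sequence[int], sequences: Iterable[Sequence[int]]) -> List[int]:
    """Return the indices of the sequences that occur somewhere in `area`.

    Naive search: for M sequences of size N over an area of size L this
    performs at most about L * M * N cell comparisons.
    """
    hits = []
    for index, seq in enumerate(sequences):
        seq = list(seq)
        for start in range(len(area) - len(seq) + 1):
            if list(area[start:start + len(seq)]) == seq:
                hits.append(index)
                break  # one occurrence is enough to report this sequence
    return hits


# 64 KiB area, 1000 sequences of 16 cells, 256 possible values per cell.
print(false_alarm_estimate(L=65536, M=1000, N=16, n=256))
print(scan([1, 2, 3, 4, 5], [[3, 4], [9, 9]]))
```

The estimate falls off exponentially in N, which is why even fairly short sequences give a negligible false-alarm probability under the randomness assumption.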
“Heuristic” algorithm
• for known viruses
• expense criteria: NP
[Figure: Host program | Decoder (cycle) | Body]
Executes $2^n$ cycles!
How can we measure the power of polymorphism?
[Figure: Host program | Decoder | Body]
$\alpha = \dfrac{\text{size of variable parts of the virus}}{\text{full size of the virus}}$
$\beta = \text{number of variants of the decoders}$
Flowchart of a virus
• search for an uninfected program
• append virus
• choose a random instruction in the virus
• swap with the next instruction
• repeat 100 times
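The mutation loop of the flowchart can be simulated on a harmless list of mnemonics. This is a sketch under assumptions that the slides do not state: the loop is taken to cover the "choose" and "swap" steps, the chosen instruction is restricted to those that have a successor, and the name `mutate` is hypothetical. Searching and appending are not modelled.

```python
import random
from typing import List, Optional


def mutate(instructions: List[str], rounds: int = 100,
           rng: Optional[random.Random] = None) -> List[str]:
    """Return a copy of `instructions` after `rounds` random adjacent swaps."""
    rng = rng or random.Random()
    code = list(instructions)  # work on a copy, leave the input untouched
    if len(code) < 2:
        return code            # nothing to swap
    for _ in range(rounds):
        i = rng.randrange(len(code) - 1)  # assumption: pick one with a successor
        code[i], code[i + 1] = code[i + 1], code[i]
    return code


original = ["LOAD", "ADD", "STORE", "WRITE"]
print(mutate(original, rng=random.Random(0)))
```

The result always contains the same instructions in a different order, so a fixed sequence taken from the original ordering need not match a mutated copy, and most mutated copies will no longer behave like the original program.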
Name: RIPPER
Aliases: Jack Ripper
Status: Common
Origin: Norway
Length: 1024 bytes (2 sectors)
Infect: MBR, Boot sector
Other: Resident, Stealth, Disk corruption
The virus swaps two words in the DOS write buffer. This happens at random, in approximately 1 of every 1024 writes.
Multiplatform viruses
$G_1 = \langle V_1, U_1, T_1, f_1, q_1, M_1 \rangle$
$G_2 = \langle V_2, U_2, T_2, f_2, q_2, M_2 \rangle$
Conditions:
$V_1 \cap U_2 \neq \emptyset$: G1 has to know some operation codes of G2
$U_1 \cap V_2 \neq \emptyset$: G2 has to know some operation codes of G1
Conditions on the virus code:
$U_1 \cap U_2 \neq \emptyset$: the virus code can be the same
$U_1 \cap U_2 = \emptyset$: the virus code must be different
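The set conditions for multiplatform viruses translate directly into intersection tests. A minimal sketch, assuming the symbol and operation code sets are given as Python sets; the function names and the toy sets are hypothetical.

```python
from typing import Set


def multiplatform_possible(V1: Set[str], U1: Set[str],
                           V2: Set[str], U2: Set[str]) -> bool:
    """Both conditions: V1 ∩ U2 ≠ ∅ and U1 ∩ V2 ≠ ∅."""
    return bool(V1 & U2) and bool(U1 & V2)


def same_code_possible(U1: Set[str], U2: Set[str]) -> bool:
    """U1 ∩ U2 ≠ ∅: the code can be the same; if empty, it must differ."""
    return bool(U1 & U2)


# Toy machines with U ⊆ V, as in the definition of G.
V1, U1 = {"a", "b", "c", "d"}, {"a", "b"}
V2, U2 = {"b", "c", "e"}, {"b", "c"}

print(multiplatform_possible(V1, U1, V2, U2))
print(same_code_possible(U1, U2))
```

Shrinking U2 to {"c"} keeps both multiplatform conditions satisfied while making U1 ∩ U2 empty, which is the case where the virus needs different code on each machine.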