1 Finding Diversity in Remote Code Injection Exploits Presented by Kenneth Poon Fai Yiu 2.4.2007...

1

Finding Diversity in Remote Code Injection Exploits

Presented by Kenneth Poon Fai Yiu

2.4.2007

University of California, San Diego: Justin Ma, Stefan Savage, Geoffrey M. Voelker

and

Microsoft Research: John Dunagan, Helen J. Wang

2

Outline

Introduction / Objectives

Benefits of Malware Family Tree

A Remote Code Injection Attack

Shellcode

Methodology for Measuring Diversity

Analysis of Exploit Diversity

NIDS vs Polymorphism

Factors Driving the Evolution

Observations

3

Introduction

Internet users face increasing threats from online crime because of the large amount of malware circulating on the Internet

Previous studies focused on methods for defending against such attacks

Little research has been done on the malware ecosystem, such as

• the relationship between different pieces of malware

• the factors that drive the structural and functional evolution of malware

4

Objectives

To develop a measurement methodology for identifying and measuring the diversity among remote code injection exploits

Use the measured data to

• understand the diversity of today’s malware, and

• construct a shellcode phylogeny (i.e. a malware family tree) for selected vulnerabilities

5

Benefits of Malware Family Tree

Simplify the categorization and analysis of malware

Provide insight into the factors influencing malware development and evolution

Help in estimating the market-share and vigor of different cyber-criminal organizations

6

Glossary

Vulnerability

• A system bug or design flaw allowing an attacker to misuse an application (e.g. executing commands on the system)

Malware (“Malicious Software”)

• Software designed to infiltrate or damage a computer system, e.g. computer viruses, worms, spyware and adware

Exploit

• Software that attacks a vulnerability of a system in order to gain control of it

• A “remote exploit” is an exploit that works over a network

• A kind of malware; used interchangeably with “malware” throughout the presentation

Code injection

• A technique for adding code to a computer program to modify its functionality

Shellcode

• A piece of machine code used as the payload of an exploit

• May contain mechanisms to avoid detection by intrusion detection systems

Phylogeny

• A biological term: the study of evolutionary relationships among organisms

• Here, the classification of exploits according to their relationships in evolutionary history

(Source: Wikipedia)

7


First, there exists a piece of software (e.g. MS Windows XP) with a vulnerability (e.g. a stack-based buffer overflow) and corresponding malware that targets this vulnerability

[Figure labels: Windows XP, Malware]

Second, a computer with an Internet connection has this software installed without the patch applied

Third, the malware attacks the computer by injecting exploit code (shellcode, data and random filler characters) through the vulnerability

8


Fourth, the injected code overruns the buffer boundary and overwrites adjacent memory locations, which may be used by other buffers and variables. If the buffer is stack-based, the return address of the calling function can also be changed (e.g. to the address of the shellcode)

Fifth, the exploit gains control of the computer and executes the shellcode

Sixth, the shellcode may (1) download additional software to the computer, (2) join a centralized “botnet” or (3) reconfigure the operating system to evade detection

[Figure labels: Exploit Packets, Host Memory, Vulnerable Buffer, Return Address]

9

Shellcode

Small, simple, hand-coded machine programs

Initial payload of an exploit that first executes on a newly compromised machine

Polymorphism (variation in the style of construction)

May be encrypted and only decrypted just before execution

XOR encoding is a commonly used encoding scheme

May contain anti-debugging code (including self-modifying code) to complicate disassembly and analysis of the shellcode
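To make the XOR encoding point concrete, here is a minimal sketch of single-byte XOR encoding applied to harmless placeholder bytes. The helper name `xor_encode` and the key values are assumptions made for illustration; the slides do not say which keys or encoder variants the observed shellcodes used.

```python
def xor_encode(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key (hypothetical helper)."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


# Placeholder bytes standing in for a payload; not real shellcode.
payload = b"example payload bytes"

encoded_a = xor_encode(payload, 0x5A)
encoded_b = xor_encode(payload, 0xC3)

# XOR is its own inverse, so applying the same key again decodes.
decoded = xor_encode(encoded_a, 0x5A)
```

The same payload encoded under two keys yields two different byte strings, which is why an analysis that compares raw payload bytes needs to see the decoded form first.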

10

Methodology for Measuring Diversity

Exploit collection

(To collect exploit samples)

Extracting shellcodes

(To extract shellcodes from the collected exploit samples)

Exploit emulation

(To run the extracted shellcodes to retrieve the instruction code bytes)

Clustering

(To group the instruction code bytes into families)

11

Methodology for Measuring Diversity: “Exploit Collection”

Examine network traces of traffic using a fully-patched Windows XP computer connected to a residential DSL network

Capture exploit attempts arriving over the DSL network against 4 well-known vulnerabilities for 2 days, starting 6/9/2006 at 5:00 pm

The 4 vulnerabilities are

• SQL Name Resolution (Slammer)

• LSASS (Sasser)

• MS RPC ISystemActivator (Blaster)

• MS RPC RemoteActivation (Blaster)

12

Methodology for Measuring Diversity: “Extracting Shellcodes”

Extract shellcodes directly from the collected network trace using “Shield”

“Shield”

• A tool originally designed for filtering exploits for known vulnerabilities

• But modified to collect data that is beyond the buffer boundary

13

Methodology for Measuring Diversity: “Exploit Emulation”

Most shellcodes are encrypted; decoding is needed to reveal the actual executable code

The solution is restricted binary emulation, i.e. allowing the exploit decoding routines to execute in order to reveal the actual instruction codes

Implement the emulator on a Linux platform

Load an encoded shellcode, declare it as a statically allocated buffer, treat the buffer as a function and allow it to run

Overcome the issue with non-executable prefixes by iteratively retrying failed emulations at subsequent offsets

Mark the executed instruction bytes for later analysis

Emulation stops when the control flow makes an absolute jump to a location outside the buffer
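The retry-at-subsequent-offsets idea and the marking of executed bytes can be sketched with a toy interpreter. Everything below is an assumption made for illustration: the three-opcode instruction set is invented and is not x86, and the function names are hypothetical. The actual emulator described above runs the real decoding routine on a Linux host.

```python
from typing import Optional, Set, Tuple


class EmulationFault(Exception):
    """Raised when the toy emulator hits something it cannot execute."""


# Toy instruction set (purely illustrative, NOT x86):
#   0x90        one-byte no-op
#   0x01 imm8   two-byte instruction with an immediate operand
#   0xFF        jump out of the buffer, which ends emulation
def emulate_from(buf: bytes, start: int) -> Set[int]:
    """Run from `start`; return the offsets of all executed bytes."""
    executed: Set[int] = set()
    pc = start
    while True:
        if pc >= len(buf):
            raise EmulationFault("ran off the end of the buffer")
        op = buf[pc]
        if op == 0x90:
            executed.add(pc)
            pc += 1
        elif op == 0x01:
            if pc + 1 >= len(buf):
                raise EmulationFault("truncated instruction")
            executed.update((pc, pc + 1))
            pc += 2
        elif op == 0xFF:
            executed.add(pc)
            return executed
        else:
            raise EmulationFault(f"invalid opcode {op:#04x} at {pc}")


def emulate_with_retries(buf: bytes) -> Optional[Tuple[int, Set[int]]]:
    """Retry at each subsequent offset until one emulation succeeds."""
    for offset in range(len(buf)):
        try:
            return offset, emulate_from(buf, offset)
        except EmulationFault:
            continue
    return None


# Two non-executable prefix bytes, then a short executable run.
sample = bytes([0x41, 0x42, 0x90, 0x90, 0x01, 0x07, 0xFF, 0x43])
result = emulate_with_retries(sample)
```

For `sample`, the first two start offsets fault on the prefix bytes and the third succeeds, so the marked offsets cover the executable run only; the trailing byte after the jump is never marked.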

14

Methodology for Measuring Diversity: “Clustering” (1)

A data-mining technique for grouping objects with similar characteristics

Perform clustering on the shellcode instruction bytes using exedit distance, a metric for the similarity between two sets of shellcode instruction bytes generated by binary emulation

Construct a dendrogram to visualize the clustering results

Evaluate the resulting clusters manually to confirm the constructed family tree is a sensible representation of the phylogeny of the exploit families
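As a rough illustration of the clustering step, the sketch below performs agglomerative clustering on a precomputed distance matrix and stops merging once the closest pair of clusters is farther apart than a threshold. Average linkage is an assumption here, since the slides do not state which linkage rule was used, and the distance values are made up.

```python
from itertools import combinations
from typing import List, Tuple

Cluster = List[int]


def agglomerate(dist: List[List[float]],
                threshold: float) -> Tuple[List[Cluster], List[Tuple[float, Cluster, Cluster]]]:
    """Merge closest clusters until the next merge would exceed `threshold`.

    Returns the remaining clusters ("families") and the list of merges,
    each recorded as (distance, left cluster, right cluster). The merge
    list is what a dendrogram plots.
    """
    clusters: List[Cluster] = [[i] for i in range(len(dist))]
    merges: List[Tuple[float, Cluster, Cluster]] = []

    def linkage(a: Cluster, b: Cluster) -> float:
        # Average linkage (assumed): mean pairwise distance.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        d = linkage(clusters[x], clusters[y])
        if d > threshold:
            break
        merges.append((d, clusters[x], clusters[y]))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters), merges


# Made-up relative distances between four samples.
dist = [
    [0.00, 0.02, 0.90, 0.88],
    [0.02, 0.00, 0.91, 0.90],
    [0.90, 0.91, 0.00, 0.05],
    [0.88, 0.90, 0.05, 0.00],
]
families, merges = agglomerate(dist, threshold=0.10)
```

With these values, samples 0 and 1 merge first, then 2 and 3; the two resulting groups are too far apart to merge under a 0.10 threshold, so they remain separate families.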

15

Methodology for Measuring Diversity: “Clustering” (2)

Exedit Distance

Relative edit distance over the shellcode instruction bytes; edit distance is the minimum number of edit operations (insertion, deletion, substitution) needed to transform one string into another

For each sample,

• Mark the executed instruction bytes

• Concatenate the marked bytes in the order they appear in the payload (i.e., memory order) to construct a string representation

• Compress each consecutive run of NOP (no-operation) instructions into a single NOP instruction

Compute the relative edit distance over all exploits using these strings
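The steps above can be sketched as follows. Two points are assumptions rather than details from the slides: the edit distance is normalized by the length of the longer string, and a NOP is recognized simply as the byte `0x90` (the x86 one-byte NOP opcode), without checking instruction boundaries.

```python
NOP = 0x90  # x86 one-byte NOP opcode


def compress_nops(code: bytes) -> bytes:
    """Collapse each run of consecutive NOP bytes into a single NOP.

    Simplification: any 0x90 byte is treated as a NOP instruction.
    """
    out = bytearray()
    for b in code:
        if b == NOP and out and out[-1] == NOP:
            continue
        out.append(b)
    return bytes(out)


def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]


def relative_edit_distance(a: bytes, b: bytes) -> float:
    """Edit distance after NOP compression, divided by the longer length
    (normalization choice is an assumption)."""
    a, b = compress_nops(a), compress_nops(b)
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


# Two made-up executed-byte strings that differ in NOP padding and one byte.
s1 = b"\x90\x90\x90\x31\xc0\x50"
s2 = b"\x90\x31\xc0\x51"
d = relative_edit_distance(s1, s2)
```

After NOP compression both strings are four bytes long and differ in one position, so `d` is 0.25; without the compression step, the NOP padding alone would inflate the distance.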

16

Analysis of Exploit Diversity: “SQL Name Resolution” (1)

Malware: Slammer worm

• First noticed on 25.1.2003; infected 75,000 computers in 10 minutes

• Exploited two buffer overflow bugs in Microsoft's SQL Server and Desktop Engine database products

• Spread by generating random IP addresses and sending itself to those addresses

• Dramatically slowed down general Internet traffic

• Patch was available six months before the worm’s first launch

(Source: Wikipedia)

17

Analysis of Exploit Diversity: “SQL Name Resolution” (2)

767 exploit samples were collected

2 apparent variations of Slammer were detected

766 exploits with the exact same payload and 1 outlier

The outlier was identical to all the other payloads except for the last 91 bytes; evidence shows that the payload was likely corrupted on the network before being captured in the trace

After discarding the outlier sample, only 1 distinct Slammer exploit remained in the DSL trace, so there was no exploit diversity

18

Analysis of Exploit Diversity: “LSASS” (1)

Malware: Sasser worm

• First noticed on 30.4.2004; disrupted operations for airlines, banks, and government offices globally

• Exploited a buffer overflow vulnerability in LSASS (Local Security Authority Subsystem Service) of MS Windows 2000 and XP

• Spread by scanning different ranges of IP addresses and connecting to victims’ computers, primarily through TCP port 139 or 445

• Patch was available in 4.2004, prior to the release of the worm

• Written by an 18-year-old CS student in Germany, who was arrested and received a 21-month suspended sentence

(Source: Wikipedia)

19



56 distinct payloads were identified

[Figure: histogram of shellcode instances]

20


Each x-axis position represents a unique shellcode

The y-axis shows relative edit distance

A horizontal line segment at y-axis value ‘y’ indicates that two sub-clusters had cluster distance ‘y’ when they were merged into one cluster

Dendrogram

21


Most cluster merges occurred at a small exedit distance of 10%; 10% is therefore used as the threshold for defining families among the exploits

5 families of shellcodes can be identified

Manual examination of the shellcodes concluded that the identified families were indeed 5 separate code bases

LSASS-2, 3 and 4 were similar enough to conclude that they evolved from the same code base

[Figures: dendrogram; evolution diagram]

22


Shellcodes within each family exhibit a small amount of variation, which corresponds to the phone-home/connect-back IP addresses encoded in the payload; the victim connects to the specified host to download additional code or files

• Connect-back refers to connecting to the victim’s immediate parent in the infection chain

• Phone-home refers to connecting to a central location


23

Analysis of Exploit Diversity: “ISystemActivator” (1)

Malware: Blaster worm

• First noticed on 11.8.2003; infected hundreds of thousands of computers within the first 24 hours, and several millions more in the following few months

• Exploited a buffer overflow in the RPC service of Windows 2000 and XP

• Launched a DDoS attack against MS’s “windowsupdate.com”

• The worm contains a hidden string, which reads “billy gates why do you make this possible? Stop making money and fix your software!!”

• Patch was available 1 month before the release of the worm

• Written by an 18-year-old US resident, who was arrested and sentenced to an 18-month prison term

(Source: Wikipedia)

24



90 distinct payloads were identified

10 variations were responsible for most of the observed exploits, while 80 distinct shellcodes appeared only once
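Counts like these come from grouping identical payloads and tallying how often each appears. A minimal sketch, using made-up placeholder payloads and a hypothetical helper name:

```python
from collections import Counter
from typing import List, Tuple


def payload_histogram(payloads: List[bytes]) -> Tuple[List[Tuple[bytes, int]], int]:
    """Return payloads ranked by frequency, plus the number seen only once."""
    ranked = Counter(payloads).most_common()
    singletons = sum(1 for _, count in ranked if count == 1)
    return ranked, singletons


# Placeholder data: two frequent payloads and two one-off payloads.
observed = [b"family-A"] * 5 + [b"family-B"] * 3 + [b"one-off-1", b"one-off-2"]
ranked, singletons = payload_histogram(observed)
```

Here `ranked` lists four distinct payloads in descending order of frequency and `singletons` is 2, mirroring the shape of the histogram: a few common payloads and a long tail of payloads seen once.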


25


Most cluster merges happened below a distance of 10%, so this value is used as the threshold for defining families among the exploits

6 families of shellcodes can be identified

The low initial threshold distance of 10% and the large gap before the cluster merges at a distance of 85% indicate that exploits within a family are similar, but vary substantially between families

Dendrogram

26


Manual examination of the shellcodes confirmed that the clusters reflected 6 different code bases

Slight differences among exploits within each family were due to variations in data constants

Relatively low 10% exedit distance between ISys-2 and ISys-3 implied a close relationship


27



The only difference was that ISys-3 contained a connect call, whereas ISys-2 contained bind, listen, and accept calls; these two families are believed to derive from the same code base, except that

• ISys-3 required the newly-infected host to connect back to the infecting host, while

• ISys-2 required the newly-infected host to bind on a socket and wait for a connection attempt from the infecting host

28

Analysis of Exploit Diversity: “RemoteActivation” (1)

Malware: Blaster worm

RemoteActivation was the original MS RPC vulnerability that Blaster and its variants exploited before also targeting ISystemActivator

338 distinct exploit payloads were identified; each exploit attempt used a unique payload


29


Exedit distance among the shellcodes was very small; most cluster merges occur below a distance of 1%

Using this value as the threshold results in 2 distinct families; the 1.3% interfamily exedit distance indicates that the families are closely related

Dendrogram

30


Manual examination of the shellcodes revealed that the last third of the payload contained randomly generated characters, which accounted for the variation within each family

Two very similar but functionally different types of RemoteActivation exploits appeared in the trace; 10% belonged to Remact-0, the bind version, while the other 90% belonged to Remact-1, the connect-back version

All payloads shared the same prefix, which resembles part of the Metasploit Framework, although this cannot be proven (Metasploit is a toolkit for generating exploits, and includes options for generating encoded shellcodes and random filler characters)


31

NIDS vs Polymorphism

To what extent will exploit polymorphism limit the effectiveness of Network Intrusion Detection Systems (NIDS)?

Tried to generate the signatures required to exhaustively cover all exploits observed for each vulnerability in the DSL residential trace

For each individual vulnerability except LSASS, one signature sufficed to cover the set of exploits; the size of each signature is 100 bytes

Tested the signatures against a 5-GB trace of network traffic and none of the signatures yielded false positives

The results indicate that polymorphism was not effective for evading detection
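The slides do not describe how the signatures were generated, so the following is only one plausible sketch of the idea: take a byte substring shared by every observed exploit for a vulnerability (capped at 100 bytes, matching the signature size quoted above) and count how many benign packets it matches. The function names and the sample bytes are made up for illustration.

```python
from typing import List


def common_signature(samples: List[bytes], max_len: int = 100) -> bytes:
    """Longest substring (up to max_len bytes) present in every sample.

    Brute-force search over substrings of the shortest sample; fine for
    a sketch, too slow for large traces.
    """
    if not samples:
        return b""
    base = min(samples, key=len)
    for length in range(min(max_len, len(base)), 0, -1):
        for start in range(len(base) - length + 1):
            candidate = base[start:start + length]
            if all(candidate in s for s in samples):
                return candidate
    return b""


def false_positives(signature: bytes, benign_packets: List[bytes]) -> int:
    """Number of benign packets that the signature would flag."""
    if not signature:
        return 0
    return sum(1 for pkt in benign_packets if signature in pkt)


# Made-up samples sharing a 4-byte core surrounded by varying filler.
samples = [
    b"AAAA\x31\xc0\x50\x68BBBB",
    b"CC\x31\xc0\x50\x68DDDDDD",
    b"\x31\xc0\x50\x68EE",
]
benign = [b"GET /index.html HTTP/1.1", b"hello"]

sig = common_signature(samples)
hits = false_positives(sig, benign)
```

In this toy data the varying filler does not hide the shared core, so one signature covers all three samples and matches neither benign packet, which is the pattern the slide reports for the real traces.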

32

Factors Driving the Evolution

Having reviewed the relationships between different pieces of malware, what are the factors that drive the structural and functional evolution of malware?

Two hypotheses are:

• The malware authors wish to use polymorphism to prevent the malware from being caught by NIDS signatures (perhaps they do not realize that their polymorphism is ineffective for evasion), or

• Today’s polymorphism is unrelated to evading NIDS signatures; the variation in shellcodes was due to functional variation (e.g., the bind and connect-back varieties)

33

Observations (1)

About 4,500 exploit samples were collected on one DSL connection in 2 days; this indicates that once a computer is connected to the Internet, it is exposed to a huge number of malware attacks (roughly one attack every 40 seconds)

For all the Microsoft vulnerabilities studied in the paper, Microsoft had in fact released the relevant patches before the exploit attacks were first launched

Users should be able to protect their machines from such attacks if patches for the vulnerabilities are applied promptly

The public announcement of patch releases by Microsoft advertises the existence of the vulnerabilities to malware authors, who can reverse-engineer a patch to discover the vulnerability and write malware
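The attack rate quoted on this slide follows from simple arithmetic on the sample count and the capture period. A quick check, using the rounded figure of 4,500 samples:

```python
samples = 4500                      # approximate number of exploit samples
capture_seconds = 2 * 24 * 60 * 60  # 2 days of capture

# Average time between consecutive exploit attempts, in seconds.
interval = capture_seconds / samples
```

This gives an average of 38.4 seconds between attempts, which the slide rounds to 40 seconds.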

34

Observations (2)

Identifying exploit families by a cluster-merge threshold seems arbitrary; choosing a different threshold value will change the number of families and their composition

Although the exploit families can be verified by manual examination of the shellcodes, this approach is not scalable and may not be appropriate if the samples number in the millions

Simple relationships are established for some shellcode instances, but the relationships of the other instances remain unknown, so a complete family tree (phylogeny) cannot be built

Unlike the relationships among organisms, the correctness of the constructed shellcode phylogeny is difficult to prove

It is recommended to repeat the research using other data-mining techniques and distance metrics to see their effects on the resulting exploit families

35

~ End ~
