DEEPSEC 2013: Malware Datamining And Attribution

Date post: 11-May-2015
Upload: michael-boman
Description:
Greg Hoglund explained at BlackHat 2010 that the development environments malware authors use leave traces in the code, and that these traces can be used to attribute malware to an individual or a group of individuals. The attribution does not reach the precision of name, date of birth and address, but it provides evidence: an arrested suspect's computer can be analysed and compared with the "tool marks" on the collected malware sample.
Transcript
Page 1: DEEPSEC 2013: Malware Datamining And Attribution

Malware Attribution: Theory, Code and Result

Page 2: DEEPSEC 2013: Malware Datamining And Attribution

Who am I?

• Michael Boman, M.A.R.T. project

• Have been “playing around” with malware analysis “for a while”

• Working for FireEye

• This is a HOBBY project that I use my SPARE TIME to work on

Page 3: DEEPSEC 2013: Malware Datamining And Attribution

Agenda

Theory behind Malware Attribution

Code to conduct Malware Attribution analysis

Result of analysis

Page 4: DEEPSEC 2013: Malware Datamining And Attribution

Theory

Page 5: DEEPSEC 2013: Malware Datamining And Attribution

• Malware Attribution: tracking cyber spies - Greg Hoglund, Blackhat 2010

http://www.youtube.com/watch?v=k4Ry1trQhDk

Page 6: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do?

Binary → Human (move this way)

Page 7: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do?

Binary → Human

• Blacklists, Net Recon, Command and Control

• Developer Fingerprints

• Tactics, Techniques, Procedures

• Social Cyberspace

• DIGINT

• Physical Surveillance

• HUMINT

Page 10: DEEPSEC 2013: Malware Datamining And Attribution

• Blacklists, Net Recon, Command and Control

• Developer Fingerprints

• Tactics, Techniques, Procedures

• Social Cyberspace

• DIGINT

• Physical Surveillance

• HUMINT

• Actions / Intent

• Installation / Deployment

• CNA (spreader) / CNE (search & exfil tool)

• COMS

• Defensive / Anti-forensic

• Exploit

• Shellcode

• DNS, Command and Control Protocol, Encryption

Page 12: DEEPSEC 2013: Malware Datamining And Attribution

Steps

• Step 0: Gather malware

• Step 1: Extract metadata from binary

• Step 2: Store metadata and binary in MongoDB

• Step 3: Analyze collected data

Page 14: DEEPSEC 2013: Malware Datamining And Attribution

Step 1: Extract metadata from binary

Page 15: DEEPSEC 2013: Malware Datamining And Attribution

Development Steps

Core “backbone” sourcecode

Tweaks & Mods

3rd party sourcecode

3rd party libraries

Compiler

Runtime libraries

Time

Paths

MAC Address

Malware

Packing

Machine Binary

Source

Page 18: DEEPSEC 2013: Malware Datamining And Attribution

Step 1: Extract metadata from binary

• Hashes (for sample identification)

• md5, sha1, sha256, sha512, ssdeep etc.

• File type / Exif / PEiD

• Compiler / Packer etc.

• PE Headers / Imports / Exports etc.

• VirusTotal results

• Tags
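The identification hashes listed above can be computed with Python's standard `hashlib`. This is a minimal sketch rather than the project's own code: the function name `sample_hashes` is hypothetical, and ssdeep is left out because it requires a third-party module.

```python
import hashlib

# Hash algorithms named on the slide that ship with Python's standard library.
ALGORITHMS = ("md5", "sha1", "sha256", "sha512")


def sample_hashes(data: bytes) -> dict:
    """Return a dict mapping each algorithm name to the hex digest of data."""
    return {name: hashlib.new(name, data).hexdigest() for name in ALGORITHMS}


if __name__ == "__main__":
    # Hypothetical usage: a few bytes standing in for a real sample.
    for name, digest in sample_hashes(b"MZ\x90\x00").items():
        print(name, digest)
```

Storing several digests side by side makes it easier to cross-reference a sample with external sources that index by different hash types.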

Page 19: DEEPSEC 2013: Malware Datamining And Attribution

Identifying compiler / packer

• PEiD

• Python

• peutils.SignatureDatabase().match_all()
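A sketch of how the `peutils.SignatureDatabase().match_all()` call named above might be wired up. It assumes the third-party `pefile` package (which ships `peutils`) is installed and that a PEiD signature file is available; both paths are placeholders. The helper `signature_names` is hypothetical and deliberately defensive, because the shape of the value returned by `match_all` is not taken for granted here.

```python
def identify_packer(sample_path, signature_db_path):
    """Match a PE file against a PEiD signature file and return the raw result."""
    # Imported lazily: pefile (which provides peutils) is a third-party package.
    import pefile
    import peutils

    pe = pefile.PE(sample_path)
    signatures = peutils.SignatureDatabase(signature_db_path)
    # ep_only=True restricts matching to signatures anchored at the entry point.
    return signatures.match_all(pe, ep_only=True)


def signature_names(matches):
    """Flatten a match result into a flat list of signature name strings."""
    if isinstance(matches, str):
        return [matches]
    if isinstance(matches, (list, tuple)):
        names = []
        for item in matches:
            names.extend(signature_names(item))
        return names
    return []  # None (no match) or non-string values such as offsets
```

Hypothetical usage: `signature_names(identify_packer("sample.exe", "userdb.txt"))`.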

Page 20: DEEPSEC 2013: Malware Datamining And Attribution

PE Header information

Page 21: DEEPSEC 2013: Malware Datamining And Attribution

VirusTotal Results

Page 22: DEEPSEC 2013: Malware Datamining And Attribution

Tags

• User-supplied tags to identify sample source and behavior

• analyst / analyst-system supplied

Page 23: DEEPSEC 2013: Malware Datamining And Attribution

Step 2: Store metadata and binary in MongoDB

Page 24: DEEPSEC 2013: Malware Datamining And Attribution

Components

• Modified VXCage server

• Collects a lot more metadata than the original

• Stores malware & metadata in MongoDB instead of FS / ORDBMS
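As an illustration of storing a sample and its metadata in MongoDB, here is a sketch under stated assumptions: it uses the third-party `pymongo` package, keeps the binary in GridFS (the usual choice for files that may exceed MongoDB's 16 MB document limit), and invents the database, collection and field names. Whether the modified VXCage server is organised this way is not stated in the slides.

```python
import hashlib
from datetime import datetime, timezone


def build_document(data: bytes, tags):
    """Build a metadata record for one sample (field names are hypothetical)."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "tags": sorted(set(tags)),
        "created_at": datetime.now(timezone.utc),
    }


def store_sample(data: bytes, tags, uri="mongodb://localhost:27017"):
    """Store the binary in GridFS and its metadata in a collection."""
    # Imported lazily: pymongo (which provides gridfs) is a third-party package.
    import gridfs
    from pymongo import MongoClient

    db = MongoClient(uri)["vxcage"]  # database name is an assumption
    document = build_document(data, tags)
    # Link the metadata record to the stored binary through its GridFS id.
    document["file_id"] = gridfs.GridFS(db).put(data, filename=document["sha256"])
    return db["samples"].insert_one(document).inserted_id
```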

Page 25: DEEPSEC 2013: Malware Datamining And Attribution

VXCage REST API

• /malware/add

• Add sample

• /malware/get/<filehash>

• Download sample. If no local sample, search other repos

• /malware/find

• Search for sample by md5, sha256, ssdeep, tag, date

• /tags/list

• List tags
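A client for the endpoints above could start from something like the following. Only the paths and the search keys come from the slide; the server address, the use of POST with form-encoded fields for `/malware/find`, and the function names are assumptions. The request is built but not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # hypothetical server address
SEARCH_KEYS = {"md5", "sha256", "ssdeep", "tag", "date"}  # keys listed on the slide


def get_url(filehash: str, base_url: str = BASE_URL) -> str:
    """URL for downloading a sample via /malware/get/<filehash>."""
    return f"{base_url}/malware/get/{quote(filehash)}"


def find_request(key: str, value: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a search request for /malware/find."""
    if key not in SEARCH_KEYS:
        raise ValueError(f"unsupported search key: {key}")
    # Assumption: the server accepts form-encoded POST fields.
    body = urlencode({key: value}).encode("ascii")
    return Request(f"{base_url}/malware/find", data=body, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would perform the actual search against a running server.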

Page 26: DEEPSEC 2013: Malware Datamining And Attribution

Step 3: Analyze collected data

Page 27: DEEPSEC 2013: Malware Datamining And Attribution

Identifying development environments

• Compiler / Linker / Libraries

• Strings

• Paths

• PE Translation header

• Compile times

• Number of times the software has been built
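Of the indicators above, the compile time is the easiest to read directly: it is the `TimeDateStamp` field of the COFF header in a PE file. The sketch below uses only the standard library; the function name is hypothetical, and the value should be treated with care because the field can be altered after the build.

```python
import struct
from datetime import datetime, timezone


def pe_compile_time(data: bytes) -> datetime:
    """Read the COFF TimeDateStamp from the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The COFF header follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)
```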

Page 28: DEEPSEC 2013: Malware Datamining And Attribution

Cataloging behaviors

• Packers

• Encryption

• Anti-debugging

• Anti-VM

• Anti-forensics

Page 29: DEEPSEC 2013: Malware Datamining And Attribution

Result

Page 30: DEEPSEC 2013: Malware Datamining And Attribution

Have I seen you before?

• Detects similar malware (based on SSDEEP fuzzy hashing)

Page 31: DEEPSEC 2013: Malware Datamining And Attribution

Different MD5, 100% SSDeep match

Page 32: DEEPSEC 2013: Malware Datamining And Attribution

SSDEEP Analysis (3007)

Page 34: DEEPSEC 2013: Malware Datamining And Attribution

SSDEEP Analysis (851)

Page 35: DEEPSEC 2013: Malware Datamining And Attribution

Challenges

• Party handshake problem (every sample is compared with every other sample):

• 707k samples analyzed and counting (resulting in over 250 billion compares!)

• Need a better target (pre-)selection
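The scale of the handshake problem follows from the pair formula n(n-1)/2. The sketch below computes it and adds one possible pre-selection rule that is not taken from the talk: ssdeep hashes are only comparable when their block sizes are equal or differ by a factor of two, so other pairs can be skipped without running a comparison. The function names are hypothetical.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n samples: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def block_size(ssdeep_hash: str) -> int:
    """Extract the block size, the first field of 'blocksize:hash1:hash2'."""
    return int(ssdeep_hash.split(":", 1)[0])


def worth_comparing(hash_a: str, hash_b: str) -> bool:
    """True if the block sizes are equal or differ by a factor of two."""
    a, b = block_size(hash_a), block_size(hash_b)
    return a == b or a == 2 * b or b == 2 * a


if __name__ == "__main__":
    print(pair_count(707_000))  # 249924146500
```

For exactly 707,000 samples this gives about 2.5 × 10^11 comparisons, the order of magnitude quoted on the slide. Grouping samples by block size before comparing is one way to shrink that number.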

Page 36: DEEPSEC 2013: Malware Datamining And Attribution

What compilers / packers are common?

1. "Borland Delphi 3.0 (???)", 54298

2. "Microsoft Visual C++ v6.0", 33364

3. "Microsoft Visual C++ 8", 28005

4. "Microsoft Visual Basic v5.0 - v6.0", 26573

5. "UPX v0.80 - v0.84", 22353

Page 37: DEEPSEC 2013: Malware Datamining And Attribution

Are there any unidentified packers?

• How to identify a packer

• A PE section is empty in the binary (no raw data on disk) but is writable and executable
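The heuristic above can be expressed as a check on two section header fields. This sketch assumes "empty in binary" means a raw data size of zero; the `Section` type and function names are hypothetical, while the flag values are the standard PE section characteristics.

```python
from dataclasses import dataclass

IMAGE_SCN_MEM_EXECUTE = 0x20000000  # section can be executed
IMAGE_SCN_MEM_WRITE = 0x80000000    # section can be written to


@dataclass
class Section:
    name: str
    size_of_raw_data: int   # bytes the section occupies in the file
    characteristics: int    # section flags from the section header


def looks_like_unpack_target(section: Section) -> bool:
    """True for a section with no data on disk that is writable and executable."""
    flags = IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_WRITE
    return section.size_of_raw_data == 0 and (section.characteristics & flags) == flags


def suspicious_sections(sections):
    """Names of all sections matching the heuristic."""
    return [s.name for s in sections if looks_like_unpack_target(s)]
```

A section like this is typically where an unpacking stub writes the original code at run time before jumping to it.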

Page 38: DEEPSEC 2013: Malware Datamining And Attribution

How common are anti-debugging techniques?

• 31622 out of 531182 PE binaries use IsDebuggerPresent (6 %)

• Packed executables are not counted
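The percentage above can be reproduced as follows, together with a simple check of whether a binary's import list names a given API. Both function names are hypothetical. The check only sees statically declared imports, which is consistent with packed executables being left out of the count.

```python
def uses_api(imported_names, api_name="IsDebuggerPresent"):
    """True if api_name appears in an iterable of imported function names."""
    return any(name == api_name for name in imported_names)


def percentage(part: int, whole: int) -> float:
    """Share of part in whole, expressed in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    print(round(percentage(31622, 531182), 2))  # 5.95
```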

Page 39: DEEPSEC 2013: Malware Datamining And Attribution

Analysis Coverage

Core “backbone” sourcecode

Tweaks & Mods

3rd party sourcecode

3rd party libraries

Compiler

Runtime libraries

Time

Paths

MAC Address

Malware

Packing

Machine Binary

Source

Page 40: DEEPSEC 2013: Malware Datamining And Attribution

Future

Page 41: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do in the future

Binary → Human

• Blacklists, Net Recon, Command and Control

• Developer Fingerprints

• Tactics, Techniques, Procedures

• Social Cyberspace

• DIGINT

• Physical Surveillance

• HUMINT

Expand scope of analysis: + network, + memory, + OS changes, + behavior

Page 42: DEEPSEC 2013: Malware Datamining And Attribution

What am I trying to do in the future

• More automation

• More modular design

• Solve the “Big Data” issue I am getting myself into (Hadoop?)

• More pretty graphs

Page 43: DEEPSEC 2013: Malware Datamining And Attribution

Thank you

• Michael Boman

[email protected]

• @mboman

• http://blog.michaelboman.org

• Code available at https://github.com/mboman/vxcage

