Semantically-enabled Digital Investigations
by Spyridon Dosis
Outline
• Problem
• Background
• Developed Method
• Demonstration
• Conclusions
2023-04-15 ISACA Dagen 2013
Problem Area
• Complex attacks against networked systems
• Multiple data sources of possible evidentiary value
– Volume & Variety
– “looking for a needle in a stack of needles” – Paul Pillar, CIA CoA
• Analysis of the collected digital data
– Least formalized process step
– Rely on investigators’ expertise and experience
Digital Evidence / Investigations
• Reliable digital data that support hypothesizing about a security incident
• Sound methods for collecting and interpreting digital data
• Reconstruct events found to be criminal (DF)
• Investigate and learn from information security breaches (IR)
Forensic Tools
• Interpreters between data abstraction layers
– e.g. Reconstruct raw disk data into filesystem hierarchy and objects (files, directories)
• Evidence- but not investigation-centric design
• Limited tool interoperability
– Manual integration of tool findings
– Multiple (proprietary, undocumented) data formats/models
A Digital Investigation Example
Semantic Web & Linked Data Technologies
• “… information is given well-defined meaning, better enabling computers and people to work in cooperation” – (Tim Berners-Lee, 2001)
• Ontology – “explicit and formal specification of a conceptualization”
– Entities, attributes, relationships
• Metadata – Context-based or domain-specific annotation of data
• Reasoning and inference of implicit facts
Semantic Web Architecture
• URI/IRI enables global data object identification
• XML provides a machine-readable, validatable data encoding scheme
• RDF(S) is a metadata data model and knowledge representation language
– Subject-Property-Object/Value statements
– Class and Property hierarchies
• OWL 2 is a more expressive KR language for specifying ontologies
– Restrictions, Equivalence, Cardinality, Property Chains
• Rule and RDF-query languages
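The Subject-Property-Object model and class hierarchies listed above can be illustrated with a minimal plain-Python sketch. It stores statements as tuples and derives implied rdf:type statements from rdfs:subClassOf. All `ex:` names and the path value are hypothetical and are not taken from the presented ontologies; a real system would presumably use an RDF library and a reasoner instead.

```python
# Statements are (subject, property, object) tuples; all ex: names are hypothetical.
triples = {
    ("ex:File", "rdfs:subClassOf", "ex:FileSystemObject"),
    ("ex:FileSystemObject", "rdfs:subClassOf", "ex:Artifact"),
    ("ex:file1", "rdf:type", "ex:File"),
    ("ex:file1", "ex:hasPathName", "/tmp/sample.bin"),
}


def infer_types(statements):
    """Apply the RDFS subclass rule until no new rdf:type statement appears."""
    inferred = set(statements)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        typed = [(s, o) for s, p, o in inferred if p == "rdf:type"]
        for instance, cls in typed:
            for sub, sup in subclass:
                new = (instance, "rdf:type", sup)
                if cls == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


closure = infer_types(triples)
types_of_file1 = sorted(o for s, p, o in closure if s == "ex:file1" and p == "rdf:type")
print(types_of_file1)  # ['ex:Artifact', 'ex:File', 'ex:FileSystemObject']
```

Only ex:File is asserted for ex:file1; the two other types are implicit facts made explicit by the class hierarchy.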
Method Overview
• Data Collection
• Semantic Representation
• Ontological Reasoning
• Rule-based Reasoning
• Integrated Query
Domain Ontologies
• Introduced a set of lightweight domain-specific OWL ontologies
– Storage Media
– Network Traffic
– Windows Firewall Log, WHOIS RIR DB
– Malicious Networks Reputation List
– Malware Detection
Evidence Representation (Graph)
Semantic Representation
• Resource Unique Identification Scheme
• Parsing tools able to process each source type with respect to the domain ontology
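The slides do not spell out the identification scheme, so the following is only one possible sketch. It assumes a URI is minted from a case identifier, a source type and a hash of the artifact's identifying properties, so that parsing the same artifact twice yields the same resource. The namespace, the function name `mint_uri` and the property names are all hypothetical.

```python
import hashlib
from urllib.parse import quote

BASE = "http://example.org/evidence"  # hypothetical namespace


def mint_uri(case_id, source_type, key_properties):
    """Build a deterministic URI from a case, a source type and identifying properties."""
    # Sort the properties so the same artifact always yields the same digest.
    canonical = "|".join(f"{k}={key_properties[k]}" for k in sorted(key_properties))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{BASE}/{quote(case_id)}/{quote(source_type)}/{digest}"


a = mint_uri("case-001", "disk", {"path": "/tmp/sample.bin", "inode": 42})
b = mint_uri("case-001", "disk", {"inode": 42, "path": "/tmp/sample.bin"})
c = mint_uri("case-001", "pcap", {"path": "/tmp/sample.bin", "inode": 42})
print(a == b, a == c)  # True False
```

Deterministic identifiers make re-parsing idempotent, while the source type in the path keeps artifacts from different evidence sources apart until an explicit link is established.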
Evidence Integration
• Automated linking among homogeneous and heterogeneous evidence sources based on key properties & matching rules
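A matching rule on a key property could look like the sketch below. It assumes that an HTTP body and a disk file are linked when their MD5 digests are equal; the property name integration:HTTPContentToMediaFile is taken from the sample query later in the deck, while the records, the identifiers and the helper `link_on_key` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records, as two different parsers might emit them.
disk_files = [
    {"id": "file:1", "md5": "d41d8cd98f00b204e9800998ecf8427e"},
    {"id": "file:2", "md5": "9e107d9d372bb6826bd81d3542a419d6"},
]
http_bodies = [
    {"id": "httpbody:7", "md5": "9e107d9d372bb6826bd81d3542a419d6"},
]


def link_on_key(left, right, key, link_property):
    """Emit (left, link_property, right) for every pair sharing the same key value."""
    index = defaultdict(list)
    for item in right:
        index[item[key]].append(item["id"])
    return [
        (item["id"], link_property, match)
        for item in left
        for match in index.get(item[key], [])
    ]


links = link_on_key(http_bodies, disk_files, "md5", "integration:HTTPContentToMediaFile")
print(links)  # [('httpbody:7', 'integration:HTTPContentToMediaFile', 'file:2')]
```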
Evidence Correlation
• Link instances of dissimilar type across a shared domain
• Temporal Correlation
– Rules for establishing time instant & interval relations among recovered artifacts
• Mereological Correlation
– “partOf” transitivity relations
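A temporal correlation rule can be sketched as a function that relates two time intervals, treating an instant as an interval whose start and end coincide. The set of relations below is an assumption (a coarse simplification in the spirit of Allen's interval algebra), and the timestamps are made up for illustration.

```python
from datetime import datetime


def interval_relation(a_start, a_end, b_start, b_end):
    """Return a coarse relation of interval A with respect to interval B."""
    if a_end < b_start:
        return "before"
    if b_end < a_start:
        return "after"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if b_start <= a_start and a_end <= b_end:
        return "during"
    if a_start <= b_start and b_end <= a_end:
        return "contains"
    return "overlaps"


# Hypothetical artifacts: a TCP flow (interval) and a file creation time (instant).
flow = (datetime(2013, 1, 1, 10, 0, 0), datetime(2013, 1, 1, 10, 0, 10))
created = datetime(2013, 1, 1, 10, 0, 5)
relation = interval_relation(created, created, *flow)
print(relation)  # during
```

Each derived relation can then be stored as a new statement between the two artifacts, which lets later queries ask for "created during" or "accessed after" without recomputing timestamps.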
Semantic Integration & Correlation
Integrated Query
• Purpose-built triplestore (graph) database engine can store the final dataset
– Up to billions of triples
• SQL-like (SPARQL) queries against the integrated/correlated evidence set
• Graph pattern matching techniques
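Graph pattern matching can be illustrated with a small matcher that binds variables (terms starting with `?`) so that every triple pattern is satisfied at once. This is a naive sketch in plain Python, not how a triplestore evaluates queries; the property names come from the sample query in this deck, while the data values and the function name `match` are hypothetical.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings satisfying every triple pattern (variables start with '?')."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject the triple if it is bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


# Hypothetical data reusing property names from the sample query.
triples = [
    ("flow:1", "packetcapture:hasDestinationIP", "ip:1"),
    ("ip:1", "packetcapture:hasIPValue", "203.0.113.5"),
    ("flow:2", "packetcapture:hasDestinationIP", "ip:2"),
    ("ip:2", "packetcapture:hasIPValue", "198.51.100.9"),
]
patterns = [
    ("?flow", "packetcapture:hasDestinationIP", "?ip"),
    ("?ip", "packetcapture:hasIPValue", "?value"),
]
results = list(match(patterns, triples))
values = sorted(r["?value"] for r in results)
print(values)  # ['198.51.100.9', '203.0.113.5']
```

The shared variable `?ip` is what joins the two patterns, in the same way the sample query chains file, HTTP, TCP flow and WHOIS statements.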
A PoC Instantiation
• Evidence Manager
• Filtering / Pre-processing
• Semantic Parser
• Inference Engine
• Classification, Inverse & Transitive Properties
• Rule & Query Engines
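The inverse and transitive property inferences named in the list above can be sketched as two small closure functions. This is an illustration in plain Python rather than the inference engine used in the PoC; `ex:partOf` mirrors the “partOf” relation mentioned earlier, and `ex:hasPart` and the instance names are hypothetical.

```python
def apply_inverse(triples, prop, inverse_prop):
    """For each (s, prop, o) add (o, inverse_prop, s)."""
    return set(triples) | {(o, inverse_prop, s) for s, p, o in triples if p == prop}


def apply_transitive(triples, prop):
    """Add (a, prop, c) whenever (a, prop, b) and (b, prop, c) hold, until a fixpoint."""
    closed = set(triples)
    while True:
        edges = [(s, o) for s, p, o in closed if p == prop]
        new = {(a, prop, d) for a, b in edges for c, d in edges if b == c} - closed
        if not new:
            return closed
        closed |= new


# Hypothetical containment chain: file -> directory -> partition -> disk.
asserted = {
    ("ex:file1", "ex:partOf", "ex:dir1"),
    ("ex:dir1", "ex:partOf", "ex:partition1"),
    ("ex:partition1", "ex:partOf", "ex:disk1"),
}
facts = apply_inverse(apply_transitive(asserted, "ex:partOf"), "ex:partOf", "ex:hasPart")
print(("ex:file1", "ex:partOf", "ex:disk1") in facts)  # True
print(("ex:disk1", "ex:hasPart", "ex:file1") in facts)  # True
```

Applying transitivity before the inverse means the derived containment statements also receive their inverse counterparts.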
Experiment A
Experiment B
Sample Query
• “Is any file resident on the disk malicious and, if so, where was it downloaded from and which ISP did the IP address belong to?”
Sample Query
SELECT DISTINCT ?pathName ?uri ?ipvalue ?asnumber ?link
WHERE {
  ?file rdf:type digitalmedia:File .
  ?file digitalmedia:hasPathName ?pathName .
  ?file digitalmedia:hasMD5 ?md5 .
  ?httpbody integration:HTTPContentToMediaFile ?file .
  ?file integration:MediaFileToVTFile ?vtfile .
  ?vtfile virustotal:hasAVReport ?report .
  ?report virustotal:hasPermanentLink ?link .
  ?httpresp http:body ?httpbody .
  ?httpreq http:requestURI ?uri .
  ?httpreq http:resp ?httpresp .
  ?http packetcapture:hasHTTPRequest ?httpreq .
  ?http rdf:type packetcapture:HTTP .
  ?tcpflow packetcapture:hasApplicationLayerProtocol ?http .
  ?tcpflow packetcapture:hasDestinationIP ?destip .
  ?destip packetcapture:hasIPValue ?ipvalue .
  ?destip integration:PcapIPToWHOISIpAddr ?whoisip .
  ?whoisip whois:isContainedInRange ?range .
  ?range whois:hasRange ?rangeValue .
  ?range whois:isContainedInAS ?as .
  ?as whois:hasNetName ?netname .
  ?as whois:hasASNumber ?asnumber
}
Example Hypotheses-Queries
• Have there been any unsuccessful connection attempts from systems in the same network as the one that hosted the malicious file?
• Which disk files have been created or accessed shortly after the malicious file was downloaded?
• Has there been any successful connection between our system and a known malicious host?
• Which files have been accessed shortly before the host communicated with any blacklisted network host?
• Which websites have been visited by the user shortly before the download of the malicious file?
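The first hypothesis above (unsuccessful connection attempts from the network that hosted the malicious file) can be sketched outside the triplestore with Python's standard `ipaddress` module. In the presented method this would be expressed as a query over the integrated evidence; the address range, the log records and their field names below are assumptions made for illustration.

```python
import ipaddress

# Hypothetical inputs: the range containing the host that served the malicious file,
# and simplified firewall log records (field names are assumptions).
malicious_range = ipaddress.ip_network("203.0.113.0/24")
firewall_log = [
    {"src": "203.0.113.77", "action": "DROP"},
    {"src": "203.0.113.12", "action": "ALLOW"},
    {"src": "198.51.100.4", "action": "DROP"},
]

# Keep dropped connections whose source address falls inside the malicious range.
suspicious = [
    entry["src"]
    for entry in firewall_log
    if entry["action"] == "DROP"
    and ipaddress.ip_address(entry["src"]) in malicious_range
]
print(suspicious)  # ['203.0.113.77']
```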
Summary
• Ability to represent and integrate heterogeneous data
• Supports the formulation and execution of complex queries
• Expandable (ontologies, rules, queries)
• Computational complexity depends on the ontology, rules, amount of data
• Reliance on online data sources may affect the accuracy of the results
Future Work
• Advanced reasoning capabilities (e.g. detect anti-forensic inconsistencies)
• Extended analysis techniques (e.g. additional data sources, user activities)
• Large scale performance evaluation, distributed architecture
• User-friendly graphical interface for rule/query formulation and result navigation
Thank you