Date posted: 05-Dec-2014
Category: Technology
Uploaded by: basis-technology
Triaging Foreign Language Documents for MEDEX
Brian Carrier
VP Digital Forensics
Basis Technology
Scenarios / Problem Statement
1. Media triage is performed in the field. Triage reveals dozens of non-English documents. The translator is busy talking with the suspect.
2. Medium-dive analysis is performed at a base. Even more documents are found. Few translators are available.
How does the examiner or operator prioritize the documents for the translator?
Ideal Solution: Translated Gist
▪ A several-page non-English document is turned into an English executive summary.
▪ Lets the user see who is mentioned, what is discussed, and where.
▪ No one provides that solution today.
Our Proposed 70% Solution
▪ Show human-generated gists when they already exist.
▪ Use Rosette Named Entity software to find names of people, places, and organizations:
– Who and where
▪ Use name matching software to identify people on watch lists.
▪ Use dictionaries to find concepts (financial, drugs, IED).
– What
▪ Use graphical techniques to show relationships and context.
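Watch-list matching can be illustrated with a minimal sketch. It swaps in Python's standard difflib similarity ratio in place of the commercial name-matching software named above, and the watch list, the query names, and the 0.85 threshold are hypothetical values chosen for illustration.

```python
import difflib
import unicodedata
from typing import List, Tuple


def normalize_name(name: str) -> str:
    """Casefold, strip diacritics, and collapse whitespace so that
    trivially different spellings compare equal."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.split())


def match_watch_list(name: str, watch_list: List[str],
                     threshold: float = 0.85) -> List[Tuple[str, float]]:
    """Return (entry, score) pairs whose similarity to `name` is at least
    `threshold`, best match first."""
    target = normalize_name(name)
    hits = []
    for entry in watch_list:
        score = difflib.SequenceMatcher(None, target, normalize_name(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 3)))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical watch list, for illustration only.
WATCH_LIST = ["Mohammed al-Rashid", "Ivan Petrov", "José Martínez"]

print(match_watch_list("Muhammad al-Rashid", WATCH_LIST))
print(match_watch_list("jose martinez", WATCH_LIST))
```

A character-level ratio tolerates spelling variants such as Muhammad/Mohammed, but it knows nothing about transliteration conventions or name order, which dedicated name-matching software is designed to handle.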
Names
▪ Rosette® Entity Extractor:
– Uses statistical models, regular expressions, and gazetteers to find names.
– Works on 17 languages.
▪ Rosette® Name Translator:
– Translates names from native language to English.
– Uses linguistic algorithms, dictionaries, and statistical inference.
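The Entity Extractor is described as combining statistical models, regular expressions, and gazetteers. A minimal sketch of the last two ideas follows; the statistical model is omitted entirely, and the gazetteer entries, entity types, and patterns are hypothetical examples, not Rosette's actual resources or API.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: known surface forms mapped to an entity type.
GAZETTEER: Dict[str, str] = {
    "kabul": "LOCATION",
    "kandahar": "LOCATION",
    "red crescent": "ORGANIZATION",
}

# Hypothetical patterns for entities that follow a regular shape.
PATTERNS = {
    "PHONE": re.compile(r"\+\d{1,3}(?:[ -]\d{2,4}){2,4}"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def extract_entities(text: str) -> List[Tuple[str, str, int]]:
    """Return (entity_type, surface_text, start_offset) tuples found by
    gazetteer lookup and regular expressions, ordered by offset."""
    found = []
    for term, etype in GAZETTEER.items():
        # Case-insensitive, whole-word match against the original text.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((etype, m.group(), m.start()))
    for etype, regex in PATTERNS.items():
        for m in regex.finditer(text):
            found.append((etype, m.group(), m.start()))
    return sorted(found, key=lambda item: item[2])


sample = "Meet in Kabul, then call +93 700 123 456 or write to ali@example.org."
for entity in extract_entities(sample):
    print(entity)
```

Gazetteers catch names that are already known, and patterns catch regularly shaped strings; a statistical model is what finds previously unseen names, which this sketch cannot do.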
Concept Dictionary
▪ User-generated dictionary of concepts that matter to the user.
▪ Contains both native-language and English words.
▪ Text in documents is normalized using Rosette Base Linguistics.
▪ Concepts are identified in either the native language or English.
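Concept matching can be sketched as follows, assuming a deliberately simple stand-in for Rosette Base Linguistics: Unicode decomposition, casefolding, and removal of combining marks. Real linguistic normalization is considerably richer, and the dictionary entries below are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List


def normalize(text: str) -> str:
    """Crude stand-in for base linguistics: casefold, then drop combining
    marks such as Latin accents and Arabic short-vowel marks."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical user-built dictionary with native-language and English terms.
CONCEPTS: Dict[str, List[str]] = {
    "financial": ["money", "transfer", "مال", "حوالة"],
    "drugs": ["heroin", "opium", "مخدرات"],
    "ied": ["ied", "detonator", "متفجرات"],
}

# Index every normalized term so each token needs only one dict lookup.
TERM_TO_CONCEPT = {
    normalize(term): concept
    for concept, terms in CONCEPTS.items()
    for term in terms
}


def find_concepts(text: str) -> Dict[str, List[str]]:
    """Map each concept to the normalized tokens in `text` that triggered it."""
    hits: Dict[str, List[str]] = defaultdict(list)
    # Normalize before tokenizing so stripped marks cannot split a word.
    for token in re.findall(r"\w+", normalize(text)):
        concept = TERM_TO_CONCEPT.get(token)
        if concept is not None:
            hits[concept].append(token)
    return dict(hits)


print(find_concepts("The HEROIN money went via حوالة; متفجرات were stored."))
```

Because both the dictionary and the document pass through the same normalization, a vowelled spelling in a document still matches an unvowelled dictionary entry.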
Navigation Techniques
▪ Goals:
– Provide a summary of names and concepts.
– Provide context to show what was mentioned nearby.
▪ Finding the approach that works best is an area of ongoing research.
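The summary-plus-context goal resembles a classic keyword-in-context (KWIC) view. A minimal sketch, assuming whitespace tokenization and single-word terms, lists each hit with a few surrounding words; it is one possible approach, not the prototype interfaces in this deck.

```python
import re
from typing import List


def keyword_in_context(text: str, term: str, window: int = 4) -> List[str]:
    """Return one snippet per occurrence of a single-word `term`, showing
    up to `window` words on each side with the hit in brackets."""
    words = text.split()
    target = term.casefold()
    snippets = []
    for i, word in enumerate(words):
        # Ignore leading/trailing punctuation when comparing the word.
        bare = re.sub(r"^\W+|\W+$", "", word)
        if bare.casefold() == target:
            left = words[max(0, i - window):i]
            right = words[i + 1:i + 1 + window]
            snippets.append(" ".join(left + [f"[{word}]"] + right))
    return snippets


text = ("The shipment leaves Kandahar on Friday and reaches Kabul "
        "two days later. Payment is made in Kabul.")
for line in keyword_in_context(text, "kabul", window=3):
    print(line)
```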
Concise, but no context
Prototype Interface 1
Prototype Interface 2
Deployment Platform
▪ Autopsy™ is an open source digital forensics platform.
▪ Development started after our first Open Source Digital Forensics Conference (OSDFCon) in 2010.
▪ The community wanted an end-to-end platform instead of many stand-alone tools.
▪ Version 3.0 was released in September 2012.
▪ Development received some US Army funding.
Autopsy 3
Autopsy Capabilities
▪ Ingests hard drives, media cards, and other digital media.
▪ Identifies suspicious files based on:
– Keywords
– Hash databases
– File types
▪ Allows the operator to quickly focus on recent user activity:
– Web artifacts
▪ Provides fast results to enable field-based scenarios.
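Identifying files by hash database can be sketched as follows. MD5 is used because many forensic hash sets include MD5 values; the hash sets in the demo are hypothetical, and this is not how Autopsy itself implements hash lookup.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable, List, Set, Tuple


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large evidence files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_by_hash(paths: Iterable[Path], known_bad: Set[str],
                     known_good: Set[str]) -> List[Tuple[str, str]]:
    """Label each file as known-bad, known-good (safe to skip), or unknown."""
    results = []
    for path in paths:
        h = md5_of_file(path)
        if h in known_bad:
            status = "known-bad"
        elif h in known_good:
            status = "known-good"
        else:
            status = "unknown"
        results.append((path.name, status))
    return results


# Demo with throwaway files and hypothetical hash sets.
with tempfile.TemporaryDirectory() as tmp:
    contents = {"a.bin": b"flagged content", "b.txt": b"stock OS file",
                "c.doc": b"new document"}
    paths = []
    for name, data in contents.items():
        p = Path(tmp, name)
        p.write_bytes(data)
        paths.append(p)
    bad = {hashlib.md5(b"flagged content").hexdigest()}
    good = {hashlib.md5(b"stock OS file").hexdigest()}
    RESULTS = classify_by_hash(paths, bad, good)

print(RESULTS)
```

Filtering out known-good files is what makes field triage fast: only the "unknown" and "known-bad" files need an examiner's attention.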
Autopsy Extensibility
▪ Ingest modules analyze media on import:
– Hash analysis, keyword search, registry, web artifacts
▪ Content viewers display files:
– Text, image, text analytics, video triage, …
▪ Report modules generate final reports:
– HTML, XML, …
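The ingest-module idea can be illustrated with a toy pipeline: a common interface, several independent modules, and a runner that passes each imported file through all of them. The class and method names are hypothetical and do not reflect Autopsy's actual module API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class IngestedFile:
    name: str
    text: str
    findings: Dict[str, object] = field(default_factory=dict)


class IngestModule(ABC):
    """Hypothetical plug-in interface: each module records its findings."""

    name = "base"

    @abstractmethod
    def process(self, f: IngestedFile) -> None:
        ...


class KeywordModule(IngestModule):
    name = "keywords"

    def __init__(self, keywords: Iterable[str]) -> None:
        self.keywords = {k.casefold() for k in keywords}

    def process(self, f: IngestedFile) -> None:
        words = {w.casefold() for w in f.text.split()}
        f.findings[self.name] = sorted(words & self.keywords)


class NonLatinScriptModule(IngestModule):
    """Crude signal for non-English text: share of letters above U+024F,
    i.e. outside Basic Latin through Latin Extended-B."""

    name = "non_latin_ratio"

    def process(self, f: IngestedFile) -> None:
        letters = [ch for ch in f.text if ch.isalpha()]
        non_latin = sum(1 for ch in letters if ord(ch) > 0x24F)
        f.findings[self.name] = round(non_latin / len(letters), 2) if letters else 0.0


def run_ingest(files: List[IngestedFile],
               modules: List[IngestModule]) -> List[IngestedFile]:
    """Pass every file through every module in order, as on import."""
    for f in files:
        for module in modules:
            module.process(f)
    return files


FILES = run_ingest(
    [IngestedFile("memo.txt", "Meeting about the transfer"),
     IngestedFile("note.txt", "اجتماع حول الحوالة")],
    [KeywordModule(["transfer", "heroin"]), NonLatinScriptModule()],
)
for f in FILES:
    print(f.name, f.findings)
```

Keeping each module behind one interface means a new analysis, such as text gisting, can be added without changing the runner.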
Text Gisting Module
Another Module Example
Scenario: USB-based Triage
▪ USB drive from media triage:
– Logical files are added to Autopsy.
– User can navigate all documents and images.
Review with Text Gist module
Tag High Priority Files
Translator Focuses on Tagged Files
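The triage scenario amounts to ranking documents so the translator sees the most promising ones first. A minimal sketch, assuming each document has already been summarized into counts of watch-list hits, concept hits, and extracted entities; the weights and the capacity cutoff are hypothetical, and the deck does not specify any scoring formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DocSummary:
    """Per-document counts produced by earlier analysis steps."""
    name: str
    watch_list_hits: int
    concept_hits: int
    entity_count: int


# Hypothetical weights: a watch-list hit outweighs many ordinary entities.
WEIGHTS = {"watch_list_hits": 10.0, "concept_hits": 3.0, "entity_count": 0.5}


def priority_score(doc: DocSummary) -> float:
    """Weighted sum of the signals the examiner cares about."""
    return (WEIGHTS["watch_list_hits"] * doc.watch_list_hits
            + WEIGHTS["concept_hits"] * doc.concept_hits
            + WEIGHTS["entity_count"] * doc.entity_count)


def tag_high_priority(docs: List[DocSummary], capacity: int) -> List[str]:
    """Return the names of the `capacity` highest-scoring documents,
    i.e. as many as the translator has time for."""
    ranked = sorted(docs, key=priority_score, reverse=True)
    return [doc.name for doc in ranked[:capacity]]


DOCS = [
    DocSummary("a.docx", watch_list_hits=0, concept_hits=1, entity_count=4),
    DocSummary("b.pdf", watch_list_hits=1, concept_hits=0, entity_count=2),
    DocSummary("c.txt", watch_list_hits=0, concept_hits=0, entity_count=10),
    DocSummary("d.doc", watch_list_hits=0, concept_hits=3, entity_count=1),
]
print(tag_high_priority(DOCS, capacity=2))
```

A fixed capacity mirrors the constraint in the problem statement: the number of tagged files is set by translator time, not by a score threshold.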
Scenario 2: Medium Dive
▪ A media card, hard drive, or cell phone is added.
▪ File system is analyzed.
▪ User navigates media using:
– Hash lookup
– Keyword search
– Web browser activity
– E-mail analysis
▪ User evaluates documents with the triage module as they are found.
▪ User tags priority files.
80% Solution
▪ Entity resolution integration.
▪ Topic classifiers.
▪ More advanced analysis relating concepts and entities.
▪ More advanced interface approaches.
Questions?
Brian Carrier
VP of Digital Forensics
Basis Technology
617-386-2000