HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier

Date post: 05-Dec-2014
Upload: basis-technology
Description:
When digital forensics investigators come across multilingual documents during an examination, how can they quickly check the content without a translator in the room? Basis Technology has built a document triage solution that integrates entity extraction and translation capabilities with navigation tools to help the examiner quickly determine a document's priority. The solution is a module for Autopsy, an open source digital forensics platform with thousands of users and contributors.
Transcript
Page 1: HLT 2013 - Triaging Foreign Language Documents for MEDEX by Brian Carrier

Triaging Foreign Language Documents for MEDEX

Brian Carrier

VP Digital Forensics

Basis Technology

Scenarios / Problem Statement

1. Media triage is performed in the field. Triage reveals dozens of non-English documents. The translator is busy talking with the suspect.

2. Medium-dive analysis is performed at a base. Even more documents are found. Few translators are available.

How does the examiner or operator prioritize the documents for the translator?

Ideal Solution: Translated Gist

▪ A non-English document several pages long is condensed into an English executive summary.

▪ Lets the user understand who, what, and where are mentioned.

▪ No one provides that solution today.

Our Proposed 70% Solution

▪ Show human-generated gists when they are known.

▪ Use Rosette Named Entity software to find names of people, places, and organizations:

– Who and where

▪ Use name-matching software to identify people on watch lists.

▪ Use dictionaries to find concepts (financial, drugs, IED).

– What

▪ Use graphical techniques to show relationships and context.
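The watch-list step above depends on fuzzy name comparison, because one person's name can be transliterated several ways. As a minimal sketch, assuming a plain list of Latin-script watch-list names, the code below uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. It is not Rosette's name-matching algorithm; `normalize_name`, `match_watch_list`, and the 0.75 threshold are hypothetical choices for illustration.

```python
import difflib
import unicodedata


def normalize_name(name: str) -> str:
    """Strip accents, case-fold, and collapse whitespace so spellings compare fairly."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())


def match_watch_list(
    name: str, watch_list: list[str], threshold: float = 0.75
) -> list[tuple[str, float]]:
    """Return watch-list entries whose similarity to `name` is at least `threshold`, best first."""
    query = normalize_name(name)
    hits = []
    for entry in watch_list:
        # ratio() is 2 * matching_chars / total_chars, so it lies in [0.0, 1.0].
        score = difflib.SequenceMatcher(None, query, normalize_name(entry)).ratio()
        if score >= threshold:
            hits.append((entry, round(score, 3)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    watch_list = ["Mohammed al-Rashid", "Jane Doe", "Ahmad Karimi"]
    print(match_watch_list("Muhammad Al Rashid", watch_list))
```

A character-level ratio is crude; production name matchers also model transliteration variants, name order, and nicknames, which this sketch does not attempt.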

Names

▪ Rosette® Entity Extractor:

– Uses statistical models, regular expressions, and gazetteers to find names.

– Works on 17 languages.

▪ Rosette® Name Translator:

– Translates names from native language to English.

– Uses linguistic algorithms, dictionaries, and statistical inference.
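Two of the three techniques listed for the entity extractor, gazetteers and regular expressions, can be illustrated in a few lines. This is a simplified sketch, not Rosette's implementation: it omits the statistical models entirely, and the gazetteer entries, the `PHONE` and `EMAIL` patterns, and `find_entities` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


# Hypothetical gazetteer: known surface forms mapped to entity types.
GAZETTEER = {
    "Kandahar": "LOCATION",
    "Kabul": "LOCATION",
    "Basis Technology": "ORGANIZATION",
}

# Hypothetical regular expressions for pattern-based entities.
PATTERNS = {
    "PHONE": re.compile(r"\+?\d[\d\- ]{7,}\d"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def find_entities(text: str) -> list[Entity]:
    """Return gazetteer and regex matches ordered by position (no statistical model)."""
    found = []
    for surface, label in GAZETTEER.items():
        # Word boundaries keep "Kabul" from matching inside a longer token.
        for m in re.finditer(rf"\b{re.escape(surface)}\b", text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(Entity(m.group(), label, m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)
```

Gazetteers only find names already on the list and regular expressions only find well-formed patterns, which is why a statistical model is needed for unseen names.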

Concept Dictionary

▪ User-generated dictionary of the concepts that are important to the user.

▪ Contains both native-language words and English words.

▪ Text in documents is normalized using Rosette Base Linguistics.

▪ Concepts are identified in either the native language or English.
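A concept-dictionary lookup can be sketched as follows, assuming the dictionary maps each concept to a list of English and native-language terms. Unicode NFKC normalization plus case folding stands in here for Rosette Base Linguistics, which this sketch does not call; unlike a real linguistic normalizer it performs no lemmatization, so inflected forms will not match. `CONCEPTS`, `normalize`, and `find_concepts` are hypothetical, and the Persian entries are illustrative.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical user-built dictionary: concept -> English and native-language (Persian) terms.
CONCEPTS = {
    "financial": ["wire transfer", "hawala", "حواله"],
    "drugs": ["opium", "heroin", "تریاک"],
}

TOKEN_RE = re.compile(r"\w+")


def normalize(text: str) -> list[str]:
    """Stand-in normalizer: NFKC, case folding, word tokens. No lemmatization."""
    return TOKEN_RE.findall(unicodedata.normalize("NFKC", text).casefold())


def find_concepts(text: str, concepts: dict[str, list[str]] = CONCEPTS) -> dict[str, list[str]]:
    """Map each concept to the dictionary terms (native or English) found in `text`."""
    # Index every term by its normalized token sequence so multi-word terms work.
    index: dict[tuple[str, ...], str] = {}
    for concept, terms in concepts.items():
        for term in terms:
            key = tuple(normalize(term))
            if key:
                index[key] = concept
    longest = max((len(key) for key in index), default=0)

    tokens = normalize(text)
    hits: dict[str, list[str]] = defaultdict(list)
    for i in range(len(tokens)):
        # Try every term length starting at token i.
        for n in range(1, longest + 1):
            key = tuple(tokens[i:i + n])
            if len(key) == n and key in index:
                hits[index[key]].append(" ".join(key))
    return dict(hits)
```

Because English and native terms share one index, a document mentioning either form is tagged with the same concept.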

Navigation Techniques

▪ Goals:

– Provide a summary of names and concepts.

– Provide context showing what was mentioned nearby.

▪ This is an area of research; the goal is to find the approach that works best.
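One simple way to supply "what was mentioned nearby" is a keyword-in-context view that shows each hit with a few surrounding words. The sketch below is a hypothetical illustration of that idea, not one of the prototype interfaces in the presentation. It matches single whitespace-delimited words only, so it would not work for scripts written without spaces; `keyword_in_context` and its `window` parameter are assumptions.

```python
def keyword_in_context(text: str, term: str, window: int = 3) -> list[str]:
    """Return each occurrence of a single-word `term` with up to `window` words on each side."""
    words = text.split()  # whitespace tokenization; unsuitable for unsegmented scripts
    target = term.casefold()
    snippets = []
    for i, word in enumerate(words):
        # Ignore surrounding punctuation when comparing, but display the word as written.
        if word.strip(".,;:!?\"'()").casefold() == target:
            left = words[max(0, i - window):i]
            right = words[i + 1:i + 1 + window]
            snippets.append(" ".join(left + [f"[{word}]"] + right))
    return snippets


if __name__ == "__main__":
    sample = "The shipment leaves Kabul on Friday with the opium."
    for line in keyword_in_context(sample, "opium", window=4):
        print(line)  # prints "on Friday with the [opium.]"
```

This trades conciseness for context: a name list is shorter, but the snippet shows which concepts appear next to which names.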

Concise, but no context

Prototype Interface 1

Prototype Interface 2

Deployment Platform

▪ Autopsy™ is an open source digital forensics platform.

▪ Development started after our first Open Source Digital Forensics Conference (OSDFCon) in 2010.

▪ The community wanted an end-to-end platform instead of many stand-alone tools.

▪ Version 3.0 was released in September 2012.

▪ Autopsy development received some US Army funding.

Autopsy 3

Autopsy Capabilities

▪ Ingests hard drives, media cards, and other digital media.

▪ Identifies suspicious files based on:

– Keywords

– Hash databases

– File types

▪ Allows the operator to quickly focus on recent user activity:

– Web artifacts

– E-mail

▪ Provides fast results to enable field-based scenarios.
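Hash-database lookup, one of the identification methods listed above, amounts to hashing each file and checking the digest against a set of known hashes. A minimal sketch, assuming the known-file set is already loaded into memory as lowercase hex strings. SHA-256 is used here for illustration; many forensic hash sets are keyed by MD5, and `hashlib.md5` can be swapped in. `file_sha256` and `flag_known_files` are hypothetical names.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large evidence files are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_known_files(paths: Iterable[Path], known_hashes: set[str]) -> list[Path]:
    """Return the paths whose SHA-256 digest appears in the known-file hash set."""
    return [path for path in paths if file_sha256(path) in known_hashes]
```

A set gives constant-time membership checks, which matters when the hash set holds millions of entries.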

Autopsy Extensibility

▪ Ingest modules analyze media on import:

– Hash analysis, keyword search, registry, web artifacts

▪ Content viewers display files:

– Text, image, text analytics, video triage, …

▪ Report modules generate final reports:

– HTML, XML, …
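The three extension points above can be pictured as a small plugin architecture. The Python sketch below is hypothetical and only mirrors the roles described on the slide; Autopsy's actual module interfaces are defined in its own code and differ in names and signatures. `FileRecord`, `KeywordIngest`, `TextViewer`, `CountReport`, and `run_pipeline` are all illustrative.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical stand-in for a file pulled from the ingested media."""
    name: str
    data: bytes
    results: dict = field(default_factory=dict)


class IngestModule(ABC):
    """Analyzes each file as the media is imported."""
    @abstractmethod
    def process(self, record: FileRecord) -> None: ...


class ContentViewer(ABC):
    """Displays one file to the examiner."""
    @abstractmethod
    def render(self, record: FileRecord) -> str: ...


class ReportModule(ABC):
    """Produces the final report from all analyzed files."""
    @abstractmethod
    def generate(self, records: list[FileRecord]) -> str: ...


class KeywordIngest(IngestModule):
    def __init__(self, keywords: list[str]):
        self.keywords = [k.casefold() for k in keywords]

    def process(self, record: FileRecord) -> None:
        text = record.data.decode("utf-8", errors="replace").casefold()
        record.results["keywords"] = [k for k in self.keywords if k in text]


class TextViewer(ContentViewer):
    def render(self, record: FileRecord) -> str:
        return record.data.decode("utf-8", errors="replace")


class CountReport(ReportModule):
    def generate(self, records: list[FileRecord]) -> str:
        rows = [f"{r.name},{len(r.results.get('keywords', []))}" for r in records]
        return "\n".join(["name,keyword_hits", *rows])


def run_pipeline(records, ingest_modules, report):
    """Run every ingest module over every file, then hand the results to the report module."""
    for record in records:
        for module in ingest_modules:
            module.process(record)
    return report.generate(records)
```

Separating the roles this way lets a new capability, such as a text-gisting viewer, be added without changing the ingest or report code.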

Text Gisting Module

Another Module Example

Scenario: USB-based Triage

▪ USB drive from media triage:

– Logical files are added to Autopsy.

– The user can navigate all documents and images.
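In the USB scenario, the first step is an inventory of the logical files so the user can browse documents and images. A minimal sketch, assuming the drive is mounted as a directory. It classifies by file extension only, whereas forensic tools generally inspect file signatures, so renamed files would be misclassified here. The extension sets and `inventory` are hypothetical.

```python
from pathlib import Path

# Hypothetical extension lists; real tools usually identify types by file signature.
DOCUMENT_EXTS = {".doc", ".docx", ".pdf", ".rtf", ".txt"}
IMAGE_EXTS = {".bmp", ".gif", ".jpeg", ".jpg", ".png"}


def inventory(root: Path) -> dict[str, list[Path]]:
    """Walk a mounted drive and group its regular files into documents, images, and other."""
    groups: dict[str, list[Path]] = {"documents": [], "images": [], "other": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip directories and anything that is not a regular file
        suffix = path.suffix.lower()
        if suffix in DOCUMENT_EXTS:
            groups["documents"].append(path)
        elif suffix in IMAGE_EXTS:
            groups["images"].append(path)
        else:
            groups["other"].append(path)
    return groups
```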

Review with Text Gist module

Tag High Priority Files

Translator Focuses on Tagged Files

Scenario 2: Medium Dive

▪ A media card, hard drive, or cell phone is added.

▪ The file system is analyzed.

▪ The user navigates the media using:

– Hash lookup

– Keyword search

– Web browser activity

– E-mail analysis

▪ The user evaluates documents with the triage module as they are found.

▪ The user flags priority files with tags.
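On the slides, tagging is the examiner's decision. Where a tool wants to suggest candidates for the translator's queue, one hypothetical approach is a weighted score over the triage signals described earlier (watch-list hits, concept hits, entity counts). The weights, the threshold of 10, the `high-priority` tag name, and `tag_and_queue` are assumptions for illustration, not values from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class TriageResult:
    path: str
    watch_list_hits: int = 0
    concept_hits: int = 0
    entity_count: int = 0
    tags: set[str] = field(default_factory=set)


# Hypothetical weights: a watch-list hit counts far more than a generic entity mention.
WEIGHTS = {"watch_list_hits": 10, "concept_hits": 3, "entity_count": 1}


def score(result: TriageResult) -> int:
    return sum(weight * getattr(result, name) for name, weight in WEIGHTS.items())


def tag_and_queue(results: list[TriageResult], threshold: int = 10) -> list[TriageResult]:
    """Tag results scoring at least `threshold`, then return all tagged ones, highest score first."""
    for result in results:
        if score(result) >= threshold:
            result.tags.add("high-priority")
    # Files the examiner already tagged by hand stay in the queue regardless of score.
    tagged = [r for r in results if "high-priority" in r.tags]
    return sorted(tagged, key=score, reverse=True)
```

Keeping manual tags in the queue means the automatic score only adds suggestions and never overrides the examiner.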

80% Solution

▪ Entity resolution integration.

▪ Topic classifiers.

▪ More advanced analysis relating concepts and entities.

▪ More advanced interface approaches.

Questions?

Brian Carrier

VP of Digital Forensics

Basis Technology

617-386-2000

