Vigiles Overview June 2010

Date post: 02-Aug-2015
Upload: graeme-mcgowan
Copyright © gillc S.A. 2009. All rights reserved

Flexible and Intelligent Access to Information

ColumboDiscovery™ and ColumboForensics™

June 2010


Contents

Context

Information Challenges

Our Approach to Information Discovery

The Technologies we Use

Columbo® Information Discovery Platform

Automatic Entity Extraction

Themes and Links

Forensics Case Study

Summary and benefits


Context

Economic pressures versus increasing demand

Increasing technical sophistication of criminals and terrorists

Criminal investigations versus digital investigations

Information technology – failure of expectations

Shortcomings of software - 8 years on from Soham

LEAs working together?


Information Challenges

Massive data volumes
  Petabytes + of data (1 petabyte (1000 terabytes) = approx. 3000 million documents)
  Forensic analysis of hundreds of devices on large cases

Diverse sources
  The Internet – www, blogs, Twitter, social networks, virtual worlds, chat-rooms
  Internal – mail, office systems, intelligence databases, operational systems
  Third-party databases such as ISPs and Telcos
  Computers, storage devices, mobile phones, cameras, sat-navs, Wi
  Intelligence from other law enforcement agencies

Integration of data in multiple formats
  Structured, unstructured (text), multi-media (image, voice, video)
  Deleted and hidden
  Languages / alphabets

Dangers and shortcomings of search
  Search engine issues – ranking, relevance etc.
  Terminology, expert knowledge of subject…
  Can distort investigative approach
  Spellings / misspellings


Some spelling challenges


Mohammed, Muhammad, Mohammad, Muhammed, Mohamed, Mohamad, Mahammed, Mohammod, Mahamed, Muhammod, Muhamad, Mohmmed, Mohamud, Mohammud

hydrogen peroxide

hydrogen peroxode

hydrogen peoxide

hydrogen perioxide

hydrogen peroxcide

hydrogen peroxyde

hydrogen proxide

hydrogen pyroxide

hydrogenperoxide

hydrogen-peroxide

hydrogen peroxide.

hydrogen peroxide)

hydrogen peroxide-

hydrogen peroxide,

hydrogen peroxides

Hussein, Hussain, Husain, Husayn, Husein, Husen, Huseyin, Hussayn

112 different combinations! (14 spellings of the first name × 8 spellings of the second)
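
One common way to cope with variants like these is approximate string matching. The Python sketch below is illustrative only and is not taken from Columbo®: it strips spacing and punctuation, then accepts a candidate if it lies within a small edit distance of the target term. The function names and the max_edits threshold of 2 are assumptions chosen for this example.

```python
def normalise(text):
    """Lower-case the text and drop spaces, hyphens and punctuation."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def is_variant(candidate, target, max_edits=2):
    """True if candidate is a near-spelling of target (threshold is an assumption)."""
    return edit_distance(normalise(candidate), normalise(target)) <= max_edits


print(is_variant("hydrogen peroxcide", "hydrogen peroxide"))
print(is_variant("hydrogen sulphide", "hydrogen peroxide"))
```

Under these assumptions every "hydrogen peroxide" variant listed above is accepted, while an unrelated term such as "hydrogen sulphide" is rejected.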


Our Approach to Information Discovery

Help the user to understand and explore the content

We identify entities, themes (subjects) and links – in most cases automatically
  People, Places, Objects, Account Numbers, Telephone Numbers, etc.
  Themes, Concepts, Sentiment
  Hard and soft (weak) links between Entities and Themes

We present this in ways that help users understand and explore (discover) the data
  Entity / Theme Extractions
  Summaries
  Timelines and Graphs
  Connection and Relationship Diagrams
  Geo-location Maps

Intelligent search
  Prompted
  Sounds like / spelt like
  Semantic (find similar content to this)
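
A "sounds like" search is often built on a phonetic key. The Python sketch below implements the classic American Soundex code as an illustration. The slides do not say which phonetic technique Columbo® uses, so treat this as an assumed stand-in rather than the product's method.

```python
# Soundex digit for each consonant group; vowels, H, W and Y have no digit.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character American Soundex key for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != previous:
            key += code
        if ch not in "HW":  # H and W do not separate letters with equal codes
            previous = code
    return (key + "000")[:4]


for spelling in ("Mohammed", "Muhammad", "Mohamud"):
    print(spelling, soundex(spelling))  # each prints the key M530
```

Under this scheme Hussein, Husain and Huseyin likewise share the key H250, so a search on one spelling can retrieve the others.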

Automate processes including reports, where possible


Some of the Technologies we use

We use advanced analysis techniques that result in much better conceptual understanding and forensic performance

These techniques include using semantic indexing and linking and more novel proprietary ‘digital fingerprint’ techniques (CSI – Columbo® Semantic Indexing)
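
CSI is proprietary and the slides give no detail on how it works. As a rough intuition for "find similar content to this", the Python sketch below ranks documents by the cosine similarity of simple word counts. It is a deliberately basic stand-in, not Columbo® Semantic Indexing, and the function names are invented for this example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count lower-cased alphanumeric tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (score, document) pairs, most similar first."""
    query_vector = term_vector(query)
    scored = [(cosine_similarity(query_vector, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Word-count similarity only captures shared vocabulary. A semantic index would also be expected to relate documents that use different words for the same concept.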

Our platform is scalable and our techniques are geared to indexing and comparing massive amounts of information – many ‘discovery’ requirements are a numbers and speed game

Our platform can be trained to recognise certain patterns where appropriate (both text and image based), and can run autonomously and covertly if required

A key difference is that our solutions ‘turn search upside down’

We get the data to tell us what is there, rather than just looking for something specific

We don’t search for the needle hidden in the haystack – we remove the hay and find the needle together with whatever else might be there

Gillc has a number of products and applications, the main one of which is ColumboDiscovery™, our integrated information discovery platform


Columbo® Information Discovery Platform

[Diagram: Columbo® platform components]

ColumboCORE™ (Columbo Object Resource Enhancement)

ColumboDiscovery™ (Intelligence Operations & Analysis Techniques)

Entities & Themes

Relationships

Timelines & Events

Geo-Location

Reports & Comparisons

Automatic Entity Extraction

All structured and unstructured information resources can be automatically processed for entity extraction, including:

  Documents – including web pages, social media, office applications, email, databases
  Digital devices – cameras, phones, SIM cards, storage devices

The entity types shown (left) are a selection of those already coded into Columbo® software. Others could include for example:

  Airports and airlines
  Known street gangs

Additional types can be added by Gillc or added as Custom types by the end user

Metadata from applications, image files and digital devices is also extracted as entity information. For example:

  Device type and ID – for phones, cameras, computers etc.
  Author and creation date – for enterprise documents etc.

Entity classification is customisable, and includes various identification and matching techniques, for example:

  Detect entities where slang, codes or ‘street names’ are used
  Detect entities where there are multiple spellings
  Detect complex / variable formats – e.g. phone numbers, dates
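
To illustrate the "complex / variable formats" point, the Python sketch below finds telephone numbers written in several layouts and reduces them to one canonical form. It assumes UK mobile numbers, in the style of the 07771 123456 example used elsewhere in this deck. The pattern and function names are hypothetical and do not describe Columbo®'s own matching rules.

```python
import re

# Assumed format: UK mobile numbers, written as 07xxx xxxxxx or +44 7xxx xxxxxx,
# with optional spaces or hyphens between the digit groups.
PHONE_PATTERN = re.compile(
    r"(?<!\d)(?:\+44\s?|0)7\d{3}[\s-]?\d{3}[\s-]?\d{3}(?!\d)"
)


def normalise_phone(raw):
    """Strip separators and rewrite a leading 44 country code as 0."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("44"):
        digits = "0" + digits[2:]
    return digits


def extract_phones(text):
    """Return the distinct phone numbers in the text, in canonical form."""
    return sorted({normalise_phone(m.group()) for m in PHONE_PATTERN.finditer(text)})


sample = "Call 07771 123456, or +44 7771 123456. Old note: 07771-123-456."
print(extract_phones(sample))  # all three spellings collapse to one number
```

Normalising before comparison is what lets the same entity be recognised across devices and documents that format it differently.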


Themes and Links

Themes and Classification

Themes and sub-themes are automatically identified from textual resource information

Various techniques are used for theme deduction

Various techniques are used for image classification / identification

Links

Hard and soft links can be identified or uncovered by interacting with the information within Columbo®

Hard links show direct links between entities, between entities and themes, and between themes

Soft links (or weak links) can be identified by:
– Analysing the presence/popularity of entities and themes in different resources/devices
– Using Columbo® Semantic Indexing (CSI) to identify varying levels of link strength
– CSI is also used for linking / categorising images
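
The first of these ideas, scoring a soft link by how often two entities appear in the same resource or device, can be sketched in a few lines of Python. This is a minimal illustration under assumed data structures (a mapping from resource name to the set of entities extracted from it). It is not Columbo®'s algorithm, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def link_strengths(resources):
    """Count, for every pair of entities, how many resources contain both."""
    counts = Counter()
    for entities in resources.values():
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


def linked_to(entity, strengths):
    """List (other_entity, strength) pairs for one entity, strongest first."""
    links = []
    for (a, b), count in strengths.items():
        if entity == a:
            links.append((b, count))
        elif entity == b:
            links.append((a, count))
    return sorted(links, key=lambda item: (-item[1], item[0]))


# Illustrative data only: entity sets as they might come out of extraction.
resources = {
    "phone_a": {"Suspect 1", "07771 123456"},
    "laptop_b": {"Suspect 1", "07771 123456", "Suspect 2"},
    "camera_c": {"Suspect 2"},
}
print(linked_to("07771 123456", link_strengths(resources)))
```

A raw co-occurrence count is the simplest possible strength measure. A real system would probably weight it, for example by how common each entity is overall.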


ColumboForensics™ – case study

[Diagram: Suspects 1 to 7, with labels ranging from X 2 to X 10]

Forensics Process

[Diagram labels: Suspect One; Suspect Two; E01 images; Image Process; Gathering, Indexing and Analysis; Pro-active Comparison Between Suspects]

3 days (7 suspects, 22 phones, 37 computers)

The existing search-driven approach requires each device to be analysed separately: an estimated 55 to 75 days
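
The arithmetic behind these figures can be worked through in a few lines of Python. All numbers come from this slide. The only assumption is that the 3 days and the 55 to 75 day estimate cover the same 59 devices.

```python
def speedup(existing_days, new_days):
    """How many times faster the new elapsed time is than the existing one."""
    return existing_days / new_days


phones, computers = 22, 37
devices = phones + computers  # 59 devices in total

new_days = 3
existing_low, existing_high = 55, 75

print(devices)
print(round(speedup(existing_low, new_days), 1))   # lower bound of the speed-up
print(round(speedup(existing_high, new_days), 1))  # upper bound of the speed-up
print(round(existing_low / devices, 2), round(existing_high / devices, 2))  # days per device
```

On these figures the existing approach works out at roughly one day per device, and the 3-day result is somewhere between 18 and 25 times faster.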


ColumboForensics™ – case study benchmarks

Task                                                                         FTK results (secs)   ColumboForensics™ (secs)
List all the documents containing paint and brush                            10                   10
Which people are mentioned in documents containing paint and brush           not possible         < 10
Bookmark and extract all relevant content of all documents containing paint  approx 1 day         20
Extract all the sentences mentioning paint                                   approx 1 day         20
Which telephone numbers are associated with 07771 123456                     not possible         < 10
Which names are associated with 07771 123456                                 not possible         < 10


Some other Law Enforcement considerations

All necessary security features including:

Multiple protection levels

Security at document, entity and word level – extensive audit trail options

Can build case / suspect ‘databases’ allowing:

Intra-case analysis

Cross-case analysis

Suspect consolidation whilst retaining case integrity

Secure links between agencies could allow controlled comparison of content

Performant

Quick response times and turn-around offer a real opportunity to change processes

Potential for comprehensive but rapid triage


Summary and Benefits

The Columbo® group of products are powerful, next-generation information discovery applications

Columbo® applications are tailored towards ‘discovery’, as opposed to ‘search’

Search implies that the user already knows what to look for

Discovery allows the data to identify what may be relevant, and allows the user to interact with it in order to find the information contained within it

The software delivers significant efficiency savings, by both rapidly finding relevant data and automating much of the process including reporting

The software enhances effectiveness, automatically compares content and incrementally builds an intelligence repository

Columbo® is “implementation-lite” and has the capacity to readily link diverse agencies together, sharing critical data and collaborating on it as appropriate
