“Big Data: Big Costs, Enormous Risks and Whopping Opportunities”
- Is your archive half empty, half full or are you afraid to look?
www.matrixconnexions.com
Matrix Connexions Consultancy and Advisory Services
Dr Michael R. Taylor – Managing Consultant
Mobile: 07595 359 506
Presentation Outline:

1. Data becomes BIG DATA
2. Data Structures
3. Data Usage – STRUCTURED and UNSTRUCTURED
4. Meteoric rise of UNSTRUCTURED Data
5. Big Costs, Enormous Risks
6. Information Management Imperatives
7. Whopping Opportunities

OPTIONAL OVER LUNCH – ‘Big Data Jargon’ and the ‘Big Data Quick Step’
Public Sector Group - IRMS
* Data becomes BIG DATA
Gartner on BIG DATA: “Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
• Volume
• Velocity
• Variety
What is Creating this BIG DATA?
SMART phones and TABLET computers generate UNSTRUCTURED data (including sound and video files)
• SMART phone ownership: 72% of UK population in 2013 - expected to be 98% by 2014/5
• TABLET ownership: 22% of UK population in 2013 - expected to be 50% by 2014/5
• More than 50% of adult population using social media - UK Office for National Statistics
Gartner on BIG DATA: “The size, complexity of formats and speed of delivery exceeds the capabilities of traditional data management technologies”
What is Creating this BIG DATA?
Nordicana 2014: 1st February 2014, Old Truman Brewery, London, E1
SMART phone photo e-mailed to 5 friends and Facebook - 10 MByte storage
- each friend forwards to 2 other friends - additional 20 MByte storage
- add iCloud and other back-up storage - >100 MByte storage required for 1 photo!
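The first two steps of this arithmetic can be reproduced with a short sketch. The 2 MByte photo size is an assumption inferred from the slide's own figures (5 copies = 10 MByte), and the function name is purely illustrative. Facebook and back-up copies are left out because the slide gives no per-copy figure for them.

```python
PHOTO_MB = 2  # assumed size of one photo, inferred from "5 friends - 10 MByte"

def fan_out_storage_mb(friends, forwards_per_friend, photo_mb=PHOTO_MB):
    """Return (first-hop MB, second-hop MB) for one shared photo."""
    first_hop = friends * photo_mb                          # copies e-mailed to friends
    second_hop = friends * forwards_per_friend * photo_mb   # copies forwarded onwards
    return first_hop, second_hop

first, second = fan_out_storage_mb(friends=5, forwards_per_friend=2)
print(first, second, first + second)  # 10 20 30
```

On these assumptions the e-mail copies alone account for 30 MByte; the slide attributes the remainder of its >100 MByte figure to iCloud and other back-up storage.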
What is Creating this BIG DATA?
3 photos use more storage capacity than a 1979 250 MByte Hard Drive!

500 SMART phone users taking only 3 photos each would typically require 600 Hard Drives of 1979 vintage to store the 150 GBytes of DATA generated!
(1000 Megabytes = 1 Gigabyte, 1000 Gigabytes = 1 Terabyte, 1000 Terabytes = 1 Petabyte)
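As a quick check of these figures, the sketch below redoes the calculation using the decimal units given above. It takes the ">100 MByte per photo" figure as exactly 100 MByte, which is an assumption, and the function name is illustrative.

```python
MB_PER_GB = 1000      # decimal units, as stated on the slide
PER_PHOTO_MB = 100    # the ">100 MByte" per-photo figure, taken as 100 (assumption)
DRIVE_1979_MB = 250   # capacity of the 1979 hard drive cited on the slide

def drives_needed(users, photos_each):
    """Return (total storage in GB, number of 1979 drives) for the photos taken."""
    total_mb = users * photos_each * PER_PHOTO_MB
    return total_mb / MB_PER_GB, total_mb / DRIVE_1979_MB

total_gb, drives = drives_needed(users=500, photos_each=3)
print(total_gb, drives)  # 150.0 600.0
```

The same constants confirm the first claim: 3 photos at 100 MByte each come to 300 MByte, which exceeds a single 250 MByte drive.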
* Data Structures
Gartner on Dark Data: “DARK DATA are the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).

…organizations often retain dark data for compliance purposes only. Storing and securing (dark) data typically incurs more expense (and sometimes greater risk) than value.”
Public Sector Data in 2014
It is frequently DISPARATE and
• 90% of all digital data has been created in the last two years
Different Data Structures - These categories do not have 100% support but are possibly the most used.
Structured Data:
 Data that resides in a fixed field within a record or file is called structured data – relational databases and spreadsheets.
Unstructured Data:
 Data (mainly in the form of text) that can't readily be classified, particularly webpages, PDF files, PowerPoint, emails, blog entries, wikis and word processing documents.
Semi-Structured Data:
 This is data that is a cross between the two and may also include visual and acoustic files. It has been referred to as the ‘Duck Billed Platypus of Data’.
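The three categories can be illustrated with a small sketch. All sample values are invented for illustration, and treating JSON as the semi-structured example is an assumption based on common usage rather than something stated on the slide.

```python
import csv
import io
import json

# Structured: every record has the same fixed fields (as in a spreadsheet).
structured = list(csv.DictReader(io.StringIO("id,name,borough\n1,Smith,Camden\n")))

# Semi-structured: self-describing, but fields may be nested or optional.
semi_structured = json.loads('{"id": 1, "tags": ["photo"], "meta": {"device": "phone"}}')

# Unstructured: free text with no fields to query directly.
unstructured = "Met Mr Smith in Camden on Tuesday; he mentioned the photo."

print(structured[0]["borough"])           # Camden
print(semi_structured["meta"]["device"])  # phone
print("Camden" in unstructured)           # True
```

The structured and semi-structured records can be queried by field name, whereas the free text can only be searched, which is why unstructured data is the hardest of the three to analyse.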
* Data Usage – Scientific Analysis
1960/70s Physics Research at RRE Malvern Using Transistor Computer
- 24K words of core store (100K Byte Memory)
- Paper tape for Input and Output of Data
- Mag tape for Data storage.
• ‘White hot heat of technology’ - lab notes and all DATA records very tightly controlled (lasers, LCDs and superconductors)
• Failure to lodge DATA with Physics Registry a disciplinary offence.
Scientific Discoveries Based on Analysis of STRUCTURED DATA
Data Usage – Predictive Analysis
Predictive Flight Performance Based on STRUCTURED DATA
1970/80s NASA Space Exploration using General Purpose Computers (GPC)
- 400K Byte of storage
- upgraded to 1 M Byte in 1991
• BUT physical and memory constraints meant little capacity for mission DATA
• HP 41 Programmable Calculators used for mission specific DATA and onboard experiments. Paul Fisher’s anti-gravity pens used to record UNSTRUCTURED DATA!
1972 GPC as fitted to Challenger
Data Usage – Visualisation of Unstructured Text
1990s Metropolitan Police Intelligence System (MCRAC)
• 32 independent Borough based Criminal Intelligence Data Bases
• Joined up 32 Data Bases as a POC Demo
• 3 significant investigations solved in an afternoon!
Intelligence Operations Based on the Visualisation of UNSTRUCTURED DATA
Data Usage – Assessment, Visualisation and Sharing of Unstructured Text
2006 Sensitive Data sharing across Whitehall JIC Members
‘Joined Up’ Users of UNSTRUCTURED and Previously DARK DATA
2014 UNSTRUCTURED DATA was not as DARK as it was in 2001!
Data Analytics
Taylor M. R., August 2011, “Latent Semantic Indexing – Why Conceptual Search is Vital in the Analysis of Large Multi-Lingual Data Sets”. Digital Forensics Magazine, Issue 08, pp. 17-23.
Key Message: Document Analytics is Complicated!
Search Engines are not the ANSWER – value is not in the words but in what the words are being used to say!

• Dynamic Clustering
• Concept-based Categorization
• Conceptual Search
• Summarization
• Near Duplication Detection
• Language Analytics
• E-mail Analytics
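To make one of the techniques listed above concrete, here is a minimal sketch of near-duplicate detection using word shingles and Jaccard similarity. This is one common approach chosen for illustration; the presentation does not say which method is used, and the sample sentences and function names are hypothetical.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b, k=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

doc_a = "the quarterly report is attached for your review"
doc_b = "the quarterly report is attached for your comments"
print(jaccard(doc_a, doc_b) > 0.5)  # True: the texts differ in one word only
```

A keyword search would treat these two documents as unrelated matches, whereas a similarity score flags them as versions of the same text.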
* Meteoric Rise of UNSTRUCTURED Data
The Velocity and Volume of Data Creation is Overwhelming
- dominated by UNSTRUCTURED DATA, the most difficult data to analyze!
- 90% of all data will be UNSTRUCTURED data types by 2015
Some Data is Dark because it is siloed and inaccessible!
Legacy IT Business Solutions were built to solve specific business problems under the policies in force at the time of procurement.
• Proprietary
• Monolithic
• Slow and
• Expensive.
Legacy systems are frequently inflexible and unable to share their siloed data – hence the rise of Dark Data!

Less than 0.5% is analysed for any purpose.
90% of Public Sector Departments Cannot Access the Right Data.
Photo Credit: © Mark Richards 2012 / from The Human Face of Big Data.
In 2012 more than 90% of Local Government Departments could not access & process the correct information to support their current business outcomes!
* BIG DATA, Big Costs
BIG DATA repositories typically contain 85% of an organization's UNSTRUCTURED information resource.
Documents, particularly e-mails, have a habit of growing exponentially

• Data repositories typically contain 40% DUPLICATIONS
• Duplicates cost between £5 - £80 per document
• Increased risk of accidental distribution
• Near Duplicates can create multiple versions of the ‘Truth’
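Exact duplicates of the kind described above can be found by comparing content hashes. The sketch below is illustrative only: the file names and contents are invented, and applying the £5 - £80 range to each surplus copy is an assumption about how that figure is meant to be used.

```python
import hashlib
from collections import defaultdict

def find_duplicates(docs):
    """Group document names whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for name, content in docs.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

# Hypothetical repository: two identical files and one unique file.
docs = {"a.txt": b"minutes of meeting", "b.txt": b"minutes of meeting", "c.txt": b"agenda"}
groups = find_duplicates(docs)
extra_copies = sum(len(g) - 1 for g in groups)  # copies beyond the first of each group

print(groups, extra_copies)                  # [['a.txt', 'b.txt']] 1
print(extra_copies * 5, extra_copies * 80)   # cost range in pounds: 5 80
```

Hashing only catches identical copies; near duplicates, which differ by a few words, need a similarity measure instead.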
Why are Duplicates so Damaging?
Because holding duplicate records is like driving a car with a DIRTY WINDSCREEN:
Duplicates:
• Devalue your analysis
• Slow down system response
• Create unnecessary storage costs
• Compromise compliance with information legislation
• Cause unintentional policy violations.
BIG DATA, Enormous Risks
BIG DATA repositories create Enormous Risks – mobile devices contain >80% of an organization's IP.

Some security related issues:

• Big Data sources can create policy noncompliance.
• Is your data stored securely and who is authorised to access it?
• Who has accessed it already?
• Is the data correctly Protectively Marked?
* Information Management Imperatives
Photo Credit: © Jason Grow 2012
• Identify the information required to improve business outcomes – develop an Information Strategy.
• Manage the data they have and who has access to it – implement effective Information Management.
• Leverage Unstructured Data through Big Data analytics.
All of the above simultaneously!
* Whopping Opportunities for the Public Sector
Public Sector managers that leverage their ‘BIG DATA’ assets will be tomorrow’s leaders!

• BIG DATA tools provide powerful insights and the ability to predict trends.
• The measurable outcomes for these Public Sector Departments are increased efficiency and effectiveness.
• Whopping opportunities arise from combining and analyzing data from multiple sources so you can take the right action, at the right time and in the right place.
Public Sector Opportunities
Identify process ‘risk and pain points’ that would be removed by leveraging your Big (Disparate & Dark) Data.
• Manage your Big DATA
  - Remove duplicates
  - Remove non-compliant files
• Protect your Big DATA
  - Check protective marking
  - Re-label if necessary
  - Move sensitive Data to secure storage
Public Sector Opportunities

• FREEDOM OF INFORMATION (FOI). Is there a business case to automate the FOI process?
• FRAUD. Would it be beneficial to identify multiple identities and fraudulent claim patterns?
“Think Big, Start Small, Deliver Benefits Incrementally.”

THANK YOU
Please feel free to contact me if you would like to discuss aspects of this presentation.

Dr Michael R. Taylor
Managing Consultant
Matrix Connexions
michael.taylor@matrixconnexions
Tel: +44(0) 7595 359 506