Date posted: 23-Aug-2014
Category: News & Politics
Uploaded by: chris-taggart
Open Data
Not Just Good. Better
Open Data is Good!
http://www.flickr.com/photos/stolidsoul/433129708/sizes/o/in/photostream/
But we’re not the ones we need to convince
http://okfestival.org/open-government-data-camp/
Most people don’t care about ‘open’
http://www.flickr.com/photos/erlin1/9312646298/sizes/l/in/photostream/
Even though open data is better (than closed/proprietary)
• Better for innovation
• Better for competition
• Better for efficiency
• Better for sharing (esp. cross-organisation or cross-border)
But open has a secret weapon
http://www.flickr.com/photos/x-ray_delta_one/8493335701/sizes/l/in/photostream/
It’s better quality too
http://www.flickr.com/photos/infusionsoft/4484373179/sizes/l/in/photostream/
Common proprietary data quality issues
Problem | Cause
Data accuracy | Data is re-keyed. Few eyeballs. Often little downside to lying
Gaps in data | High (& often duplicated) cost of data entry. Limited to payers
Lack of granularity | Legacy systems/data models hard to reengineer in closed world
Errors go uncorrected | Few feedback mechanisms
Black box/No provenance | Can’t reveal (sometimes dubious) sources. Limits usefulness/trust
Isolated | Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality
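To make the "Isolated" row concrete, here is a minimal sketch of why a shared public identifier helps. It assumes two independent datasets that both key their records on a jurisdiction code plus an official company number. That key scheme, the `link` helper and every record below are hypothetical, made up purely for illustration.

```python
# Hypothetical records from two independent sources; all values are made up.
# Both sources use the same public key: (jurisdiction, company number).
registry = {
    ("gb", "00000001"): {"name": "Example Widgets Ltd", "status": "active"},
    ("gb", "00000002"): {"name": "Sample Holdings Ltd", "status": "dissolved"},
}
procurement = {
    ("gb", "00000001"): {"name": "Example Widgets Ltd", "contracts": 3},
    ("gb", "00000003"): {"name": "Unknown Trading Ltd", "contracts": 1},
}


def link(a, b):
    """Join two datasets on their shared key and report keys found only in b."""
    matched = {key: {**a[key], **b[key]} for key in a.keys() & b.keys()}
    only_in_b = sorted(b.keys() - a.keys())
    return matched, only_in_b


matched, unmatched = link(registry, procurement)
print(matched)
print("Not found in registry:", unmatched)
```

With internal, proprietary IDs on either side, the join above has no common key to work with, and the unmatched record (a possible error worth investigating) would go unnoticed.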
A concrete example: corporate networks
Hugely important (and valuable)
• The dataset we need to understand the corporate world
• Who we (or the government) are really doing business with
• Political influence/donations/lobbying
• Tax/resource extraction
• Corporate Governance
• Credit risk
But proprietary datasets on this are problematic
• Expensive, so relatively few users
• Huge gaps in data
• Uses proprietary IDs (so not clear what it refers to)
• Restrictive licences
• Opaque: no info re calculations, provenance or confidence
Result: low-quality data
The open data alternative
Enabled by a grant from the Alfred P Sloan Foundation
Data from disparate public sources
finding new insights
no such company
...and errors too
What a modern financial company looks like (highly simplified & truncated views)
private unlimited company
Crowd-sourcing?
Ninja-sourcing!
http://www.flickr.com/photos/danielygo/5531024732/sizes/l/in/photostream/
The company that wants to know your network... every friend... every interaction
http://www.flickr.com/photos/jeffmcneill/5260815552/sizes/l/
why bother?
Facebook, Inc
This is what we got from their SEC filings as text (and turned into data)
Pinnacle Sweden AB
Vitesse LLC
Facebook Operations LLC
Facebook Ireland Limited
Edge Network Services Limited
Andale Acquisition Corp
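As a rough illustration of the "text turned into data" step, the sketch below parses a plain-text list of names into simple parent/subsidiary records that keep their source. The one-name-per-line input format, the `Relationship` type and the `parse_subsidiaries` helper are assumptions for illustration, not the pipeline actually used. Treating the names on the slide as subsidiaries of Facebook, Inc is likewise an assumption based on the slide's layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    parent: str
    subsidiary: str
    source: str  # where the claim came from, so provenance is never lost


def parse_subsidiaries(parent, text, source):
    """Turn one-name-per-line text into Relationship records."""
    records = []
    for line in text.splitlines():
        name = " ".join(line.split())  # collapse stray whitespace
        if name and name != parent:    # skip blank lines and the parent itself
            records.append(Relationship(parent, name, source))
    return records


# Names as they appear on the slide; the input layout is assumed.
filing_text = """
Pinnacle Sweden AB
Vitesse LLC
Facebook Operations LLC
Facebook Ireland Limited
Edge Network Services Limited
Andale Acquisition Corp
"""

records = parse_subsidiaries("Facebook, Inc", filing_text, "SEC filing")
for r in records:
    print(f"{r.parent} -> {r.subsidiary} (source: {r.source})")
```

Keeping the `source` on every record is the design choice that matters here: it is what lets others check a claim and correct it.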
Then we started investigating
Facebook, Inc
Facebook Ireland Limited
Edge Network Services Limited
Facebook Cayman Holdings Unlimited I
Facebook Cayman Holdings Unlimited II
Facebook Cayman Holdings Unlimited III
Facebook Cayman Holdings Unlimited IV
Facebook Ireland Holdings
Randomus Investments Limited
Facebook International Holdings I Ltd
Facebook International Holdings II Ltd
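One way to hold the results of an investigation like this is as a directed graph of "controls" relationships, which can then be walked from any starting company. The sketch below assumes that representation. The entity names and edges are placeholders, because the slides do not say which entity controls which, and the `descendants` helper is hypothetical.

```python
from collections import deque

# Placeholder edges: parent -> entities it controls (not real data).
controls = {
    "Parent Inc": ["Holding I", "Operating Ltd"],
    "Holding I": ["Holding II"],
    "Holding II": ["Operating Ltd"],
}


def descendants(root, edges):
    """Breadth-first walk returning {entity: depth below root}."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in depth:  # guards against cycles and repeat visits
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth


network = descendants("Parent Inc", controls)
for entity, level in network.items():
    print("  " * level + entity)
```

The `if child not in depth` check matters in practice, since real corporate structures can contain circular or multiply-held ownership.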