Open Corporate Data: not just good, better

Post on 23-Aug-2014

description

Presentation given by Chris Taggart, CEO and Co-Founder of OpenCorporates, at Open Knowledge Festival, Geneva, September 2013, discussing the benefits and quality of open corporate hierarchy (network) data.

transcript

Open Data

Not Just Good. Better

Open Data is Good!

http://www.flickr.com/photos/stolidsoul/433129708/sizes/o/in/photostream/

But we’re not the ones we need to convince

http://okfestival.org/open-government-data-camp/

Most people don’t care about ‘open’

http://www.flickr.com/photos/erlin1/9312646298/sizes/l/in/photostream/

Even though open data is better (than closed/proprietary)

• Better for innovation

• Better for competition

• Better for efficiency

• Better for sharing (esp. cross-organisation or cross-border)

But open has a secret weapon

http://www.flickr.com/photos/x-ray_delta_one/8493335701/sizes/l/in/photostream/

It’s better quality too

http://www.flickr.com/photos/infusionsoft/4484373179/sizes/l/in/photostream/

Common proprietary data quality issues

Problem | Cause
Data accuracy | Data is re-keyed. Few eyeballs. Often little downside to lying
Gaps in data | High (& often duplicated) cost of data entry. Limited to payers
Lack of granularity | Legacy systems/data models hard to reengineer in closed world
Errors go uncorrected | Few feedback mechanisms
Black box/No provenance | Can’t reveal (sometimes dubious) sources. Limits usefulness/trust
Isolated | Proprietary IDs are internal identifiers & are barriers to sharing & improved data quality

A concrete example: corporate networks

Hugely important (and valuable)

• The dataset we need to understand the corporate world

• Who we (or the government) are really doing business with

• Political influence/donations/lobbying

• Tax/resource extraction

• Corporate Governance

• Credit risk

But proprietary datasets on this are problematic

• Expensive, so relatively few users

• Huge gaps in data

• Uses proprietary IDs (so not clear what they refer to)

• Restrictive licences

• Opaque – no info re calculations, provenance or confidence

Result: low-quality data

The open data alternative

Enabled by a grant from the Alfred P Sloan Foundation

Data from disparate public sources

finding new insights

...and errors too

(diagram label: no such company)

What a modern financial company looks like (highly simplified & truncated views)

(diagram label: private unlimited company)

Crowd-sourcing?

Ninja-sourcing!

http://www.flickr.com/photos/danielygo/5531024732/sizes/l/in/photostream/

The company that wants to know your network... every friend... every interaction

http://www.flickr.com/photos/jeffmcneill/5260815552/sizes/l/

why bother?

Facebook, Inc

This is what we got from their SEC filings as text (and turned into data)

Pinnacle Sweden AB
Vitesse LLC
Facebook Operations LLC
Facebook Ireland Limited
Edge Network Services Limited
Andale Acquisition Corp
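As a rough illustration of what "turned into data" could mean here, the sketch below parses a plain-text list of names into relationship records that keep their provenance. It is a minimal sketch, not the OpenCorporates pipeline: the `Relationship` class, the `parse_subsidiaries` function, the one-name-per-line input layout and the "SEC filing" source label are all assumptions made for illustration, as is treating every listed name as a subsidiary of Facebook, Inc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One parent/subsidiary statement (hypothetical record layout)."""
    parent: str  # company that made the filing
    child: str   # company named in the filing
    source: str  # provenance: where the statement came from


def parse_subsidiaries(parent: str, text: str, source: str) -> list[Relationship]:
    """Turn one-name-per-line text into relationship records."""
    # Drop blank lines and surrounding whitespace, keep the listed order.
    names = [line.strip() for line in text.splitlines() if line.strip()]
    return [Relationship(parent, name, source) for name in names]


# The six names listed on the slide, assumed to be one per line.
FILING_TEXT = """
Pinnacle Sweden AB
Vitesse LLC
Facebook Operations LLC
Facebook Ireland Limited
Edge Network Services Limited
Andale Acquisition Corp
"""

relationships = parse_subsidiaries("Facebook, Inc", FILING_TEXT, source="SEC filing")

for rel in relationships:
    print(f"{rel.parent} -> {rel.child} (source: {rel.source})")
```

Keeping the source on every record is the point of the design: each statement can be traced back and challenged, which is what the proprietary "black box" datasets described earlier do not allow.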

Then we started investigating

Facebook, Inc
Facebook Ireland Limited
Edge Network Services Limited
Facebook Cayman Holdings Unlimited IV
Facebook Cayman Holdings Unlimited II
Facebook Cayman Holdings Unlimited III
Facebook Ireland Holdings
Randomus Investments Limited
Facebook International Holdings II Ltd
Facebook International Holdings I Ltd
Facebook Cayman Holdings Unlimited I