Not all data is born equal - B.C Open Data Summit 2013

Posted on 22-Nov-2014

description

Several criteria are used to evaluate the value of a dataset. This presentation illustrates how some data is easy to value, while other data is much more complex. For the latter (which might be a large part of open data), the value discovery process might be long but worthwhile.

transcript

Not all open data is born equal

• Canadian nonprofit that builds websites and tools to help governments and citizens engage with each other

• Follows two main strategies:

Improve access to government information via open data

Make participation easy and meaningful

Some context

• Citizen Budget: an online consultative budget simulator for municipalities and civil society organizations

• Represent: the largest database and open API of elected Canadian officials, with two Drupal modules for easy website integration

• MaMairie/MyCityHall: an online portal for tracking and interacting with your city hall

• Open511: an open data standard for traffic data and basic related tools

Ongoing projects

Data = Natural Resources?

Source: James St-Jones (cc-by)

Source: USGS

Value! Meh?

Hint: this is bauxite

Value extraction

Diamond

Extract

Cut

Tada!

Aluminum

Discover it’s valuable

Elaborate process

Industrialize process…

Cans, Car parts, etc.

• Sort of case study

– San Francisco Bay Area: 2 leading organizations

• Bay Area Rapid Transit (BART): 80+ apps

• Metropolitan Transportation Commission (MTC): handful of apps

– Same region (full of geeks and startups)

– Same “type” of data (transportation)

– Both organizations are innovative

Let’s look at “intrinsic” data value

Traffic and Transit data

• Transit data

– GTFS & SIRI: open data-oriented standards

– Used by 250 transit/transportation agencies

• Traffic

– Several standards (TMDD, TPEG, etc.), but difficult to use in an open data context

Standard = low barrier to entry: tools/apps built for these standards can reach lots of customers

1. Standardization

• Transit data

– Data can be interpreted on its own. No need for external data

• Traffic

– Several subsets of related data (accidents, construction, road data, etc.)

– Data managed by several jurisdictions (local, regional, provincial, federal)

Managing several sources and several datasets is always… complex

2. Self-sufficiency

• Transit

– (Quite) simple: some schedules, some fares, some spatial data

• Traffic

– Complex: networks are wide, intertwined, with lots of rules, lots of “free” actors

Modeling complex data is… complex and more prone to discrepancies

3. Complexity

• Transit

– Usually buses and trains follow their schedule

– Adding a GPS unit to every bus is simple and makes the data almost 100% reliable

• Traffic

– Impossible to monitor every single road segment

Lack of reliability has a strong, negative impact on data value

4. Reliability

Techno-utopian dream

Your iPhone 8S

Dear smartphone, I need to pick up the kids at school as fast as possible. What’s the best choice?

A wealth of data

Road events

Parking data

Crowdsourced data

Realtime traffic sensors (gov)

Realtime traffic (business)

Planned trip

Road data

Gas price

Personal data: car, location, habits

Car efficiency

• “Diamond” data is self-sufficient: a strength for adoption

• For all data: real value is in cross-use with other datasets

• Some datasets will find their value because of the existence of other datasets

• Adding new datasets has a multiplier effect on existing related datasets

Multiplicative effect

• Usually open data = open government data

• But open data can be much more

Not only gov data

Gas price

Road events

Road data

Traffic data

Parking data

Parking data

Car, transit pass, bike share

Transportation habits

Planned trip

Traffic data

Vehicle efficiency

Crowdsourced data

Open gov data

Open personal data

Open (?) data from companies

Bike share

Gartner’s hype cycle of innovation (but it is not only about hype)

Some innovation theory

Innovation trigger

Peak of inflated expectations

Trough of disillusionment

Slope of enlightenment

Plateau of productivity

Stairway to heaven (internet-style)

Abyssal crash

You might be here…

…or here

• Assess your datasets using the diamond vs. bauxite analogy or any other analysis framework

• Not all datasets are born equal; some might take more time to show their value

• Support the discovery and value extraction process

• Follow “open” standards when they exist, or participate in their development

• Improve reliability of data where possible

• Be patient… but active!

Conclusion

Twitter: @opennorth

Facebook: OpenNorth.NordOuvert

Blog: www.opennorth.ca/blog

Stéphane Guidoin @hoedic