
Why should you trust my data code4lib 2016

Date post: 26-Jan-2017
Upload: flyingzumwalt
Why Should You Trust My Data? building data infrastructure that accommodates networks of trust Matt Zumwalt datjawn.com | databindery.com @flyingzumwalt code{4}lib 2016
Transcript
  • Why Should You Trust My Data?

    building data infrastructure that accommodates networks of trust

    Matt Zumwalt

    datjawn.com | databindery.com

    @flyingzumwalt

    code{4}lib 2016

    http://datjawn.com

    http://databindery.com

  • I'm interested in trust.

  • I'm interested in trust.

    particularly trust & trustworthiness

    when people exchange data

  • there's a rhythm to the computing world

    centralization decentralization

    client-server peer-to-peer

  • mainframes

    personal computers

    server farms

    [internet of everything]

    the cloud

    the PC revolution

    computers

    the diamond age

  • remember mainframes?

  • image credit wikipedia

    https://en.wikipedia.org/wiki/UNIVAC#/media/File:UnivacII.jpg

  • the www

  • host data

    reference each other

  • but data

  • image credit Torkild Retvedt

    https://www.flickr.com/photos/torkildr/3462606643

  • $$

    $$

    $$

    $

  • By 2019 the data created by IoE devices alone will be 49 times higher than all the traffic that moved through datacenters in 2014.

    it won't scale.

    Reference: Cisco Global Cloud Index

    http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html

  • Worldwide Storage Capacity in 2012: 2.5 zettabytes

    Total Data Center Traffic in 2016: 10.4 zettabytes per year

    Anticipated data created by Internet of Everything (IoE) devices in 2019:

    507.5 zettabytes per year

    References: NetApp, Cisco Global Cloud Index, gigaom, Washington Post

    http://siliconangle.com/blog/2012/05/21/when-will-the-world-reach-8-zetabytes-of-stored-data-infographic/

    http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html

    https://gigaom.com/2012/05/30/heres-what-our-web-addiction-looks-like-in-2016/

    https://www.washingtonpost.com/blogs/ezra-klein/post/how-big-can-the-internet-get/2012/05/30/gJQAu9OH2U_blog.html

  • distributed data web

    "You can't propose that something be a universal space and at the same time keep control of it." - Tim Berners-Lee

    http://webfoundation.org/about/vision/history-of-the-web/

  • this relies on trust

  • elements of trustworthiness

    authority & reputation

    integrity & provenance

    synergy or compatibility

    consistency

    etc.

  • we've got this

    Organisms have been solving these problems for eons

    Humans for millennia

    Librarians for centuries

    Software developers for decades

  • git for (tabular) data

    transparency & reproducibility

    http://datjawn.com builds from the work of http://dat-data.com

    Tabular: rows & columns (i.e. spreadsheets, CSV, SQL DBs)

    http://datjawn.com

    http://dat-data.com

  • history has branches

  • initial commit

    a set of changes

    commit those changes and describe them

    Who made the changes? Why did they make them?

    When did they commit them?

  • more changes

    commit those changes

  • different changes committed to a different branch

  • other changes on another branch

  • merge two branches

  • get a specific version, prove it's identical, know who made it
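
    The commit slides above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how dat or git actually store history: the function names (commit_id, make_commit, verify), the in-memory store, and the sample rows, authors, and timestamps are all made up for the example. Each commit records who, why, and when, and its id is a hash of its content plus its parents' ids, so fetching a version by id also lets you prove it is identical.

```python
import hashlib
import json


def commit_id(commit: dict) -> str:
    """Content address: SHA-256 of the commit's canonical JSON form."""
    canonical = json.dumps(commit, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_commit(store: dict, rows: list, author: str, message: str,
                when: str, parents: list) -> str:
    """Record a snapshot of the rows plus who/why/when; return its id."""
    commit = {
        "rows": rows,
        "author": author,            # who made the changes
        "message": message,          # why they made them
        "when": when,                # when they committed them
        "parents": sorted(parents),  # sorted so parent order does not change the id
    }
    cid = commit_id(commit)
    store[cid] = commit
    return cid


def verify(store: dict, cid: str) -> bool:
    """Re-hash a commit and all of its ancestors; tampering changes an id."""
    commit = store.get(cid)
    if commit is None or commit_id(commit) != cid:
        return False
    return all(verify(store, parent) for parent in commit["parents"])


# Hypothetical history: an initial commit, two branches, then a merge.
store = {}
header = ["id", "title"]
root = make_commit(store, [header], "alice", "initial commit",
                   "2016-03-07T10:00:00Z", [])
a = make_commit(store, [header, [1, "Dune"]], "alice", "add Dune",
                "2016-03-07T11:00:00Z", [root])
b = make_commit(store, [header, [2, "Emma"]], "bob", "add Emma",
                "2016-03-07T11:30:00Z", [root])
merged = make_commit(store, [header, [1, "Dune"], [2, "Emma"]], "alice",
                     "merge bob's branch", "2016-03-07T12:00:00Z", [a, b])

print(verify(store, merged))
```

    Because the merge commit's id depends on both parent ids, changing any field of any ancestor makes verification of the merged version fail.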

  • Files are data. They have histories.

    Metadata are data. They have histories too.

    Whatever the data, the same patterns apply.

  • How does this get replicated?

  • client-server approach

  • peer-to-peer approach

  • the tide has already shifted

  • Stop building server-side applications.

    Assume that data are anywhere and/or everywhere.

    Assume that your software will be run in many places.

    Erase your distinctions between server and client.

    Let data grow branches - build trees (i.e. Merkle DAGs)

    Stop thinking of data as singular.

    Stop thinking of datasets as monolithic.

    Embrace redundancy & replication.

    Understand that trustworthiness and authority are dynamic.

    Broaden your sense of now.

    Appreciate provenance.

    there are no servers

    there is only the web
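
    One way to picture "data are anywhere" is a dataset split into chunks that are each named by their hash. The sketch below is a hedged illustration, not dat's actual protocol: digest, build_manifest, fetch_verified, and the sample peers and CSV rows are all hypothetical. It shows that once you trust a root id, it no longer matters which peer hands you the bytes, because every chunk can be checked against its own name.

```python
import hashlib


def digest(data: bytes) -> str:
    """Name a piece of data by its SHA-256 hash."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(chunks: list) -> tuple:
    """Hash every chunk, then hash the list of chunk ids to get a root id."""
    leaf_ids = [digest(chunk) for chunk in chunks]
    root_id = digest("\n".join(leaf_ids).encode("ascii"))
    return root_id, leaf_ids


def fetch_verified(peers: list, chunk_id: str) -> bytes:
    """Ask each peer in turn; accept only bytes whose hash matches the id."""
    for peer in peers:
        data = peer.get(chunk_id)
        if data is not None and digest(data) == chunk_id:
            return data
    raise LookupError("no peer returned a valid copy of " + chunk_id)


# Hypothetical tabular dataset, one chunk per row.
chunks = [b"id,title\n", b"1,Dune\n", b"2,Emma\n"]
root_id, leaf_ids = build_manifest(chunks)

# Two peers modeled as dicts: one serves a corrupted row, one is honest.
bad_peer = {leaf_ids[1]: b"1,Doom\n"}
honest_peer = dict(zip(leaf_ids, chunks))

row = fetch_verified([bad_peer, honest_peer], leaf_ids[1])
print(row)
```

    The corrupted copy from the first peer is rejected and the honest copy is used instead, which is the sense in which redundancy and replication stop being a threat to integrity.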

  • Meet the dat jawn team on Wednesday

    Matt Zumwalt

    datjawn.com | databindery.com

    @flyingzumwalt

    code{4}lib 2016

    http://datjawn.com

    http://databindery.com

