Date posted: 26-Jan-2017
Uploaded by: flyingzumwalt
Why Should You Trust My Data?
Building data infrastructure that accommodates networks of trust
Matt Zumwalt
datjawn.com | databindery.com
@flyingzumwalt
code{4}lib 2016
http://datjawn.com | http://databindery.com
I'm interested in trust,
particularly trust & trustworthiness
when people exchange data.
there's a rhythm to the computing world
centralization decentralization
client-server peer-to-peer
mainframes
personal computers
server farms
[internet of everything]
the cloud
the PC revolution
computers
the diamond age
remember mainframes?
Image credit: Wikipedia
https://en.wikipedia.org/wiki/UNIVAC#/media/File:UnivacII.jpg
the www
host data
reference each other
but data
Image credit: Torkild Retvedt
https://www.flickr.com/photos/torkildr/3462606643
By 2019 the data created by IoE devices alone will be 49 times higher than all the traffic that moved through
datacenters in 2014.
it won't scale.
Reference: Cisco Global Cloud Index
http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html
Worldwide Storage Capacity in 2012: 2.5 zettabytes
Total Data Center Traffic in 2016: 10.4 zettabytes per year
Anticipated data created by Internet of Everything (IoE) devices in 2019: 507.5 zettabytes per year
References:
NetApp: http://siliconangle.com/blog/2012/05/21/when-will-the-world-reach-8-zetabytes-of-stored-data-infographic/
Cisco Global Cloud Index: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/global-cloud-index-gci/Cloud_Index_White_Paper.html
GigaOM: https://gigaom.com/2012/05/30/heres-what-our-web-addiction-looks-like-in-2016/
Washington Post: https://www.washingtonpost.com/blogs/ezra-klein/post/how-big-can-the-internet-get/2012/05/30/gJQAu9OH2U_blog.html
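As a quick check on the figures above, the "49 times" comparison matches dividing the IoE estimate (507.5 ZB) by the 10.4 ZB data center traffic figure. The sketch below only redoes that arithmetic with the numbers quoted on the slides. The constant names are illustrative.

```python
# Figures quoted on the slides above, in zettabytes (ZB).
STORAGE_CAPACITY_2012 = 2.5   # worldwide storage capacity
DATA_CENTER_TRAFFIC = 10.4    # total data center traffic per year
IOE_DATA_2019 = 507.5         # anticipated IoE-generated data per year


def ratio(numerator: float, denominator: float) -> float:
    """How many times larger one quantity is than another."""
    return numerator / denominator


ioe_vs_traffic = ratio(IOE_DATA_2019, DATA_CENTER_TRAFFIC)
ioe_vs_storage = ratio(IOE_DATA_2019, STORAGE_CAPACITY_2012)

print(f"IoE data vs. data center traffic: {ioe_vs_traffic:.1f}x")     # 48.8x
print(f"IoE data vs. 2012 storage capacity: {ioe_vs_storage:.0f}x")   # 203x
```

Rounded, the first ratio is the 49x on the slide. The second puts the same IoE estimate at roughly 200 times the world's entire 2012 storage capacity.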
distributed data web
"You can't propose that something be a universal space and at the
same time keep control of it." - Tim Berners-Lee
http://webfoundation.org/about/vision/history-of-the-web/
this relies on trust
elements of trustworthiness
authority & reputation
integrity & provenance
synergy or compatibility
consistency
etc.
we've got this
Organisms have been solving these problems for eons,
humans for millennia,
librarians for centuries,
software developers for decades.
git for (tabular) data
transparency & reproducibility
http://datjawn.com builds from the work of http://dat-data.com
Tabular: rows & columns (e.g., spreadsheets, CSV, SQL databases)
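A minimal sketch of what "git for tabular data" can mean at the row level. It assumes each table has a primary-key column (here a hypothetical `id`), so two versions of a CSV are compared row by row rather than line by line. `load_rows` and `diff_tables` are illustrative helpers, not part of dat's API.

```python
from __future__ import annotations

import csv
import io


def load_rows(csv_text: str, key: str) -> dict[str, dict[str, str]]:
    """Parse CSV text into {primary key: row} so rows are compared by identity, not position."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(csv_text))}


def diff_tables(old_csv: str, new_csv: str, key: str) -> dict[str, list[str]]:
    """Return the primary keys of rows added, removed, and changed between two versions."""
    old, new = load_rows(old_csv, key), load_rows(new_csv, key)
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
    }


v1 = "id,title,year\n1,Moby-Dick,1851\n2,Walden,1854\n"
v2 = "id,title,year\n1,Moby-Dick,1851\n2,Walden; or Life in the Woods,1854\n3,Leaves of Grass,1855\n"

print(diff_tables(v1, v2, key="id"))
# {'added': ['3'], 'removed': [], 'changed': ['2']}
```

Keying on a primary key means that reordering rows is not reported as a change. A line-based diff would flag every moved line.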
history has branches
initial commit
a set of changes
commit those changes and describe them
Who made the changes? Why did they make them?
When did they commit them?
more changes
commit those changes
different changes committed to a different branch
other changes on another branch
merge two branches
get a specific version, prove it's identical, know who made it
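The commit walkthrough above can be sketched as a small content-addressed history. The sketch assumes each commit stores a full snapshot, who/why/when, and the hashes of its parents. `Commit`, `Repo`, and the naive merge rule are hypothetical illustrations of the git pattern, not how dat or datjawn actually store data.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    data: dict           # snapshot of the dataset at this version
    author: str          # who made the changes
    message: str         # why they made them
    timestamp: str       # when they committed them
    parents: tuple = ()  # hashes of parent commits (two for a merge)

    def digest(self) -> str:
        """Content address: SHA-256 over every field, including the parent hashes."""
        payload = json.dumps(
            {"data": self.data, "author": self.author, "message": self.message,
             "timestamp": self.timestamp, "parents": list(self.parents)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Repo:
    """Hypothetical in-memory history: an object store plus named branch heads."""

    def __init__(self) -> None:
        self.objects: dict[str, Commit] = {}  # commit hash -> commit
        self.branches: dict[str, str] = {}    # branch name -> head commit hash

    def commit(self, branch: str, data: dict, author: str, message: str,
               timestamp: str, extra_parent: Optional[str] = None) -> str:
        # The branch's current head (if any) is the first parent; a merge adds a second.
        parents = tuple(p for p in (self.branches.get(branch), extra_parent) if p)
        c = Commit(data, author, message, timestamp, parents)
        h = c.digest()
        self.objects[h] = c
        self.branches[branch] = h
        return h

    def merge(self, into: str, other: str, author: str, timestamp: str) -> str:
        """Naive merge: union of both snapshots; where both have a key, `other` wins."""
        ours = self.objects[self.branches[into]].data
        theirs = self.objects[self.branches[other]].data
        return self.commit(into, {**ours, **theirs}, author,
                           f"merge {other} into {into}", timestamp,
                           extra_parent=self.branches[other])

    def verify(self, h: str) -> bool:
        """Prove a stored version is identical to what was committed: re-hash and compare."""
        return self.objects[h].digest() == h


repo = Repo()
root = repo.commit("main", {"title": "Walden"}, "alice", "initial commit", "2016-03-07T10:00")
repo.branches["full-title"] = root  # start a second branch at the initial commit
repo.commit("main", {"title": "Walden", "year": "1854"}, "alice", "add year", "2016-03-07T11:00")
repo.commit("full-title", {"title": "Walden; or, Life in the Woods"}, "bob",
            "use full title", "2016-03-07T11:30")
m = repo.merge("main", "full-title", "alice", "2016-03-07T12:00")

print(repo.objects[m].data)  # {'title': 'Walden; or, Life in the Woods', 'year': '1854'}
print(len(repo.objects[m].parents), repo.verify(m))  # 2 True
```

A commit's hash covers its parents' hashes, so one hash commits to that version's entire ancestry as well as its contents. A real system would also need conflict handling rather than the "other wins" rule used here.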
Files are data. They have histories.
Metadata are data. They have histories too.
Whatever the data, the same patterns apply.
How does this get replicated?
client-server approach
peer-to-peer approach
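In a peer-to-peer approach there is no single trusted server, so one common technique is content addressing. A reader requests data by its hash and accepts a copy from any peer only if that copy re-hashes to the same ID. The `Peer` class and `fetch` function below are a hypothetical sketch of that idea, not dat's replication protocol.

```python
from __future__ import annotations

import hashlib
from typing import Optional


def content_id(blob: bytes) -> str:
    """Address data by what it is (its SHA-256 digest), not by where it lives."""
    return hashlib.sha256(blob).hexdigest()


class Peer:
    """A hypothetical node that stores blobs under their content IDs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, bytes] = {}

    def put(self, blob: bytes) -> str:
        cid = content_id(blob)
        self.store[cid] = blob
        return cid

    def get(self, cid: str) -> Optional[bytes]:
        return self.store.get(cid)


def fetch(cid: str, peers: list[Peer]) -> tuple[bytes, str]:
    """Ask peers in turn; accept the first copy whose hash matches the requested ID."""
    for peer in peers:
        blob = peer.get(cid)
        if blob is not None and content_id(blob) == cid:
            return blob, peer.name
    raise LookupError(f"no peer holds a valid copy of {cid[:12]}")


library, mirror, tampered = Peer("library"), Peer("mirror"), Peer("tampered")
cid = library.put(b"id,title\n1,Walden\n")
mirror.store[cid] = library.store[cid]         # an honest replica
tampered.store[cid] = b"id,title\n1,Waldon\n"  # a corrupted copy filed under the same ID

blob, source = fetch(cid, [tampered, mirror, library])
print(source)  # mirror
```

It does not matter which peer served the bytes, because the hash makes the copy verifiable. Establishing who published that hash in the first place (authority, provenance) is a separate problem that this sketch leaves out.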
the tide has already shifted
Stop building server-side applications.
Assume that data are anywhere and/or everywhere.
Assume that your software will be run in many places.
Erase your distinctions between server and client.
Let data grow branches: build trees (i.e., Merkle DAGs).
Stop thinking of data as singular.
Stop thinking of datasets as monolithic.
Embrace redundancy & replication.
Understand that trustworthiness and authority are dynamic.
Broaden your sense of now.
Appreciate provenance.
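To illustrate "build trees (i.e., Merkle DAGs)", here is a minimal binary Merkle tree over the rows of a dataset. A Merkle tree is the simplest kind of Merkle DAG. The sketch makes three assumptions: rows are raw bytes, SHA-256 is the hash, and the `leaf:`/`node:` prefixes are an illustrative way to keep leaf and interior hashes distinct. It shows two properties the advice relies on: any change propagates to the root, and unchanged subtrees keep their hashes, so replicas can share them.

```python
from __future__ import annotations

import hashlib


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_levels(rows: list[bytes]) -> list[list[str]]:
    """Build a binary Merkle tree bottom-up; return every level, leaves first, root last."""
    if not rows:
        raise ValueError("a Merkle tree needs at least one row")
    level = [sha(b"leaf:" + row) for row in rows]
    levels = [level]
    while len(level) > 1:
        parent_level = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                # An interior node hashes the concatenation of its two children's hashes.
                parent_level.append(sha(b"node:" + (level[i] + level[i + 1]).encode()))
            else:
                parent_level.append(level[i])  # odd node out: promote it unchanged
        level = parent_level
        levels.append(level)
    return levels


def merkle_root(rows: list[bytes]) -> str:
    return merkle_levels(rows)[-1][0]


v1 = [b"1,Moby-Dick,1851", b"2,Walden,1854", b"3,Leaves of Grass,1855", b"4,Little Women,1868"]
v2 = v1[:3] + [b"4,Little Women,1868-1869"]  # edit a single row

a, b = merkle_levels(v1), merkle_levels(v2)
print(merkle_root(v1) != merkle_root(v2))  # True: the change reaches the root
print(a[1][0] == b[1][0])  # True: the subtree over rows 1-2 is unchanged and shareable
```

Two parties can compare versions by exchanging root hashes and then descend only into subtrees whose hashes differ. This is what makes a large dataset cheap to sync and easy to verify.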
there are no servers; there is only the web
Meet the dat jawn team on Wednesday