Building the Data Infrastructure Solutions of Tomorro · 2018-08-03 · science and engineering...

Date post: 17-Jul-2020
Building the Data Infrastructure Solutions of Tomorrow
Transcript
Page 1: Building the Data Infrastructure Solutions of Tomorro · 2018-08-03 · science and engineering users and developers of ... • Semantics, Ontology, Metadata, Data Mining, Web, Search

Building the Data Infrastructure Solutions of Tomorrow

Page 2

Research Data Challenges

• Storage
• Everything else!!!

• The bytes are not enough on their own: 00110100 00110010

• Metadata, curation tools, indexes, storage abstraction, replication, data transfer, authentication, access control, transformation, analysis, tools, computation, …
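The point about bare bytes can be made concrete with a small sketch. It takes the two bytes shown on the slide and reads them three ways; the choice of interpretations (ASCII text, big-endian and little-endian 16-bit integers) is an illustration, not something the slide specifies.

```python
import struct

# The two bytes from the slide: 00110100 00110010
raw = bytes([0b00110100, 0b00110010])

# Without metadata saying what the bytes encode, several readings are valid.
as_text = raw.decode("ascii")                   # two ASCII characters
as_big_endian = struct.unpack(">H", raw)[0]     # unsigned 16-bit, big-endian
as_little_endian = struct.unpack("<H", raw)[0]  # unsigned 16-bit, little-endian

print(as_text, as_big_endian, as_little_endian)  # prints: 42 13362 12852
```

The same 16 bits yield "42", 13362, or 12852 depending on the assumed encoding, which is why the metadata and tooling listed above matter as much as storage.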

Page 3

Cyberinfrastructure for the 21st Century Vision (CIF21) - 2012

• Develop a deep symbiotic relationship between science and engineering users and developers of cyberinfrastructure to simultaneously advance new research practices and open transformative opportunities across all science and engineering fields

• Provide an integrated and scalable cyberinfrastructure that leverages existing and new components across all areas of CIF21 and establishes a national data infrastructure and services capability

• Ensure long-term sustainability for cyberinfrastructure, via community development, learning and workforce development in CDS&E and transformation of practice

http://www.nsf.gov/cif21/

Page 4

Architectural Vision for Research Cyberinfrastructure
https://dibbs17.org

[Figure: layered architecture diagram. Labels: Research Facilities; Science Portals; Integrative Services; Discipline Specific Environments; Applications & Frameworks; Resources (University Resources, International Resources, NSF Resources, Commercial Resources)]

Page 5

Smarr Taxonomy of Research CI Components

• Data Applications
  • Particles, Materials, Astro, Geo, GIS, Bio, Social, Environ, Ag, Medical, Sensors, etc.
• Data Cyberinfrastructure
  • Computing, Storage, Federation, Clouds, Networking, SDN
• Data Trust, Security, and Privacy
• Data Curation
  • Capture, Annotation, Documentation, Archiving, Libraries, Management, Publishing
• Data Discovery and Exploration
  • Semantics, Ontology, Metadata, Data Mining, Web, Search
• Data Sharing Middleware
  • Accessibility, Collaboration, Hubs, Repositories
• Data Workflows
• Data Analytics and Analysis
  • Data-Intensive Computing, Machine Learning, NLP, Statistics

Page 6

[Figure: diagram of data types and analyses; labels in extraction order]

Tabular Data; Gap Filling; Climate Modeling; Lidar; Flood Plain Analysis; River Depth Distribution; River Maturity; Stream Detection and Sinuosity; Satellite/Aerial Photos; Land Cover/Usage; Water Detection (e.g. Lakes, Retaining Ponds); Green Infrastructure; Hyperspectral; Radar; Photos; 3D Reconstruction; 3D Data; Human Preference Modeling; Video; People Detection/Tracking; Large Dynamic Group Behavior; Bee Detection/Tracking; Bee Colony Behavior; Underwater Photos; Color Correction; Image Stitching; Mapping; Event Detection; Species Detection/Counting; Reef Changes; Food Supply; Structural Defects; Hazard Modeling; Microscopy Images; Pollen Detection/Classification; Paleoclimate; Evolution; Root Tip Tracking; Phenomics; Materials Development; Cell Tracking; Tissue Classification; Renal Failure; Loss of Organ Function; Feedlot Tracking; Disease Detection; Historic Maps; River Meander; Coastline Changes; Documents; NLP; Sentiment Analysis; Regions in Conflict; Handwritten Documents; Pre-Digital Datasets; Databases; Web Sites; Publications; Simulations

Page 7

Page 8

Brown Dog - A Science Driven Data Transformation Service

• Extensibility
  • Easy to add new transformations (i.e. converters and extractors)
  • Encapsulated transformation software & dependencies
• API
  • Supporting other applications/frameworks to build on top of
  • Support for diverse usage (i.e. clients, languages, community tools & applications)
• Scalability, Distributed, Data Movement, Provenance with File Validation & Information Loss, Tool Preservation & Publication, Open Source

Image credit: https://en.wikipedia.org/wiki/Mongrel
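One way to picture the extensibility goal is a registry keyed by input and output format, so that adding a transformation means registering one more function. The sketch below is purely illustrative: the names `CONVERTERS`, `converter`, and `convert` are hypothetical and do not describe how Brown Dog itself registers or deploys its converters and extractors.

```python
import csv
import io
import json
from typing import Callable, Dict, Tuple

# Hypothetical registry: (input format, output format) -> conversion function.
CONVERTERS: Dict[Tuple[str, str], Callable[[bytes], bytes]] = {}


def converter(src: str, dst: str):
    """Decorator that registers a function as the src -> dst converter."""
    def register(func: Callable[[bytes], bytes]) -> Callable[[bytes], bytes]:
        CONVERTERS[(src, dst)] = func
        return func
    return register


@converter("csv", "json")
def csv_to_json(data: bytes) -> bytes:
    """Example transformation: CSV rows become a JSON list of objects."""
    rows = list(csv.DictReader(io.StringIO(data.decode("utf-8"))))
    return json.dumps(rows).encode("utf-8")


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Look up and apply a registered converter, or fail loudly."""
    try:
        func = CONVERTERS[(src, dst)]
    except KeyError:
        raise ValueError(f"no converter registered for {src} -> {dst}")
    return func(data)
```

Under this assumed design, supporting a new format pair only requires another decorated function; callers keep using `convert` unchanged.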

Page 9

Conversion

Extraction

Page 10

Brown Dog

curl -s -F "[email protected]" https://bd-api.ncsa.illinois.edu/v1/conversions/pgm/ -H "Transfer-Encoding: chunked" -H "Accept: text/plain" -H "Authorization: e6dab924-04c8-45c0-94aa-f0608c3c1a45"

import requests
response = requests.post('https://bd-api.ncsa.illinois.edu/v1/conversions/ed.zip/', files={'file': open('US-Dk3-2001-2003.xml', 'rb')}, headers={'Accept': 'text/plain', 'Authorization': 'e6dab924-04c8-45c0-94aa-f0608c3c1a45'})

curl -s https://bd-api.ncsa.illinois.edu/v1/extractions/url/ -X POST -d '{"fileurl":"http://browndog.ncsa.illinois.edu/examples/IMG_0997.jpg"}' -H "Content-Type: application/json" -H "Authorization: e6dab924-04c8-45c0-94aa-f0608c3c1a45" | jq -r ".id"
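The calls above follow a simple pattern: the target format is the last path segment of the conversions URL, and an extraction is requested by POSTing a JSON body with a `fileurl` field. The sketch below mirrors that pattern using only the standard library and prepares the request without sending it. The helper names `conversion_url` and `build_extraction_request` are hypothetical, the token is a placeholder, and the endpoint layout is taken from the slide rather than from current API documentation, so the service may behave differently or be unreachable.

```python
import json
import urllib.request

# Base URL as it appears on the slide (assumption: still the same layout).
BD_API = "https://bd-api.ncsa.illinois.edu/v1"


def conversion_url(output_format: str) -> str:
    """URL a file is POSTed to in order to request conversion to output_format."""
    return f"{BD_API}/conversions/{output_format}/"


def build_extraction_request(file_url: str, token: str) -> urllib.request.Request:
    """Prepare, but do not send, the extraction-by-URL request."""
    body = json.dumps({"fileurl": file_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BD_API}/extractions/url/",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": token},
        method="POST",
    )


if __name__ == "__main__":
    req = build_extraction_request(
        "http://browndog.ncsa.illinois.edu/examples/IMG_0997.jpg",
        token="YOUR-TOKEN",  # placeholder, not a real credential
    )
    print(req.get_method(), req.full_url)
    print(conversion_url("pgm"))
```

Passing the prepared request to `urllib.request.urlopen` would send it; in the slide's curl version, the `id` field of the JSON response is extracted with `jq`.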

Page 11

Page 12

Geospatial Software

Page 13

General Software

Page 14

Operating Systems

Page 15

Total Code from 2 Files

Lines of Code: 273
Other files: Model, Image Sample
Dependencies: numpy, argparse, glob, cv2, cPickle, random, h5py, skimage, sklearn, scipy
Difficulties: Install OpenCV (cv2)

Page 16

Total Code from 1 File

Lines of Code: 47
Other files: None
Dependencies: bd, requests, os, glob, argparse, time, json, PIL
Difficulties:

Page 17

Page 18

Page 19

Clowder (2013-Present)

NSF Innovative Systems and Software: Applications to NARA Research Problems (OCI-0525308)

Page 20

Page 21

Page 22

@NCSABrownDog
http://browndog.ncsa.illinois.edu

