Building the Data Infrastructure Solutions of Tomorrow
Research Data Challenges
• Storage
• Everything else!!!
• The bytes are not enough on their own
00110100 00110010
• Metadata, curation tools, indexes, storage abstraction, replication, data transfer, authentication, access control, transformation, analysis, tools, computation, …
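The bit pattern on the slide makes the point concrete. As a minimal sketch (the three interpretations below are assumptions chosen for illustration, not part of the original slides), the same two bytes yield different values depending on what the reader assumes about their encoding:

```python
# The two bytes shown on the slide, written as bit strings.
bits = "00110100 00110010"
raw = bytes(int(b, 2) for b in bits.split())

# Interpretation 1 (assumed): the bytes are ASCII text.
as_text = raw.decode("ascii")

# Interpretation 2 (assumed): one unsigned 16-bit integer, big-endian.
as_uint16_be = int.from_bytes(raw, "big")

# Interpretation 3 (assumed): the same integer, little-endian.
as_uint16_le = int.from_bytes(raw, "little")

print(as_text, as_uint16_be, as_uint16_le)
```

This prints `42 13362 12852`. Without metadata that records the encoding, none of the three readings can be preferred over the others.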
Cyberinfrastructure for the 21st Century Vision (CIF21) - 2012
• Develop a deep symbiotic relationship between science and engineering users and developers of cyberinfrastructure to simultaneously advance new research practices and open transformative opportunities across all science and engineering fields
• Provide an integrated and scalable cyberinfrastructure that leverages existing and new components across all areas of CIF21 and establishes a national data infrastructure and services capability
• Ensure long-term sustainability for cyberinfrastructure, via community development, learning and workforce development in CDS&E and transformation of practice
http://www.nsf.gov/cif21/
Architectural Vision for Research Cyberinfrastructure
https://dibbs17.org
[Figure: layered architecture diagram. Labels: Research Facilities; Science Portals; Discipline Specific Environments; Applications & Frameworks; Integrative Services; Resources (University Resources, International Resources, NSF Resources, Commercial Resources)]
Smarr Taxonomy of Research CI Components
• Data Applications
  • Particles, Materials, Astro, Geo, GIS, Bio, Social, Environ, Ag, Medical, Sensors, etc.
• Data Cyberinfrastructure
  • Computing, Storage, Federation, Clouds, Networking, SDN
• Data Trust, Security, and Privacy
• Data Curation
  • Capture, Annotation, Documentation, Archiving, Libraries, Management, Publishing
• Data Discovery and Exploration
  • Semantics, Ontology, Metadata, Data Mining, Web, Search
• Data Sharing Middleware
  • Accessibility, Collaboration, Hubs, Repositories
• Data Workflows
• Data Analytics and Analysis
  • Data-Intensive Computing, Machine Learning, NLP, Statistics
Tabular Data
Gap Filling
Climate Modeling
Lidar
Flood Plain Analysis
River Depth Distribution
River Maturity
Stream Detection and Sinuosity
Satellite/Aerial Photos
Land Cover/Usage
Water Detection (e.g. Lakes, Retaining Ponds)
Green Infrastructure
Hyperspectral
Radar
Photos
3D Reconstruction
3D Data
Human Preference Modeling
Video
People Detection/Tracking
Large Dynamic Group Behavior
Bee Detection/Tracking
Bee Colony Behavior
Underwater Photos
Color Correction
Image Stitching
Mapping
Event Detection
Species Detection/Counting
Reef Changes
Food Supply
Structural Defects
Hazard Modeling
Microscopy Images
Pollen Detection/Classification
Paleoclimate
Evolution
Root Tip Tracking
Phenomics
Materials Development
Cell Tracking
Tissue Classification
Renal Failure
Loss of Organ Function
Feedlot Tracking
Disease Detection
Historic Maps
River Meander
Coastline Changes
Documents
NLP
Sentiment Analysis
Regions in Conflict
Handwritten Documents
Pre-Digital Datasets
Databases
Web Sites
Publications
Simulations
Brown Dog - A Science Driven Data Transformation Service
• Extensibility
• Easy to add new transformations (i.e. converters and extractors)
• Encapsulated transformation software & dependencies
https://en.wikipedia.org/wiki/Mongrel
• API
• Supporting other applications/frameworks to build on top of
• Support for diverse usage (i.e. clients, languages, community tools & applications)
• Scalability, Distributed, Data Movement, Provenance with File Validation & Information Loss, Tool Preservation & Publication, Open Source
Conversion
Extraction
Brown Dog
curl -s -F "[email protected]" https://bd-api.ncsa.illinois.edu/v1/conversions/pgm/ -H "Transfer-Encoding: chunked" -H "Accept: text/plain" -H "Authorization: e6dab924-04c8-45c0-94aa-f0608c3c1a45"
import requests
response = requests.post('https://bd-api.ncsa.illinois.edu/v1/conversions/ed.zip/', files={'file': open('US-Dk3-2001-2003.xml', 'rb')}, headers={'Accept': 'text/plain', 'Authorization': 'e6dab924-04c8-45c0-94aa-f0608c3c1a45'})
curl -s https://bd-api.ncsa.illinois.edu/v1/extractions/url/ -X POST -d '{"fileurl":"http://browndog.ncsa.illinois.edu/examples/IMG_0997.jpg"}' -H "Content-Type: application/json" -H "Authorization: e6dab924-04c8-45c0-94aa-f0608c3c1a45" | jq -r ".id"
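The extraction call above can also be made from Python. The following is a minimal sketch using only the standard library. The endpoint, the `fileurl` field, the headers, and the `id` field of the response are taken from the curl command. The function names and the `YOUR-API-TOKEN` placeholder are assumptions introduced for illustration, and the response is assumed to be a JSON object, as the `jq` filter suggests.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
EXTRACTIONS_URL = "https://bd-api.ncsa.illinois.edu/v1/extractions/url/"


def build_extraction_request(file_url, token):
    """Build (but do not send) the POST request that the curl command issues."""
    body = json.dumps({"fileurl": file_url}).encode("utf-8")
    return urllib.request.Request(
        EXTRACTIONS_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": token},
    )


def submit_extraction(file_url, token):
    """Send the request and return the "id" field of the JSON response.

    Mirrors the `| jq -r ".id"` step; assumes the service replies with JSON.
    """
    request = build_extraction_request(file_url, token)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["id"]


# Build the request without contacting the service.
req = build_extraction_request(
    "http://browndog.ncsa.illinois.edu/examples/IMG_0997.jpg",
    "YOUR-API-TOKEN",  # hypothetical placeholder, not a real token
)
print(req.get_method(), req.full_url)
```

Splitting request construction from sending keeps the example inspectable offline; `submit_extraction` is the only function that touches the network.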
Geospatial Software
General Software
Operating Systems
Total Code from 2 Files
• Lines of Code: 273
• Other files: Model, Image Sample
• Dependencies: numpy, argparse, glob, cv2, cPickle, random, h5py, skimage, sklearn, scipy
• Difficulties: Install OpenCV (cv2)
Total Code from 1 File
• Lines of Code: 47
• Other files: None
• Dependencies: bd, requests, os, glob, argparse, time, json, PIL
• Difficulties: (none listed)
Clowder (2013-Present)
NSF Innovative Systems and Software: Applications to NARA Research Problems (OCI-0525308)
@NCSABrownDog
http://browndog.ncsa.illinois.edu