
Platforms for simulation, visualisation and data analysis

Date post: 14-Apr-2017
Upload: uvacolloquium
Transcript
Page 1: Platforms for simulation, visualisation and data analysis

Platforms for simulation, visualisation and data analysis
Joris Borgdorff

Understanding large-scale human behaviour and, more generally, large complex systems and datasets. Most important: in this age, you should not treat data, simulations and human understanding separately. Software platforms that combine all three are a step in this direction. This is a technical talk!


Page 3

SIM-CITY: Understanding and responding to problems of urbanisation through computation

Page 4

Shortage of:
- 97% of the required fire stations
- 80% of the fire-fighting vehicles
- 96% of fire fighters

Fire: high risk, poor infrastructure.
- Static data: road network; fire stations; department census data; hazard map
- Dynamic data: origin-destination matrix
- Real-time data: traffic density, fire engine locations
- Control: fire station placement, road police interventions
- Output: traffic behaviour, response times, optimal fire station placement, optimal road interventions
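To make the "response times" output concrete, here is a minimal Scala sketch of one way to compute it: a multi-source Dijkstra search over the road network that gives every node the travel time from its nearest fire station. The `Road` type, integer node ids and travel times in whole seconds are assumptions for illustration; the talk does not say how SIM-CITY computes response times, and a real model would also feed in traffic density.

```scala
import scala.collection.mutable

object ResponseTimes {
  // A one-way road segment; travel time in whole seconds (hypothetical unit).
  final case class Road(to: Int, seconds: Int)

  // Marker for nodes that no station can reach.
  val Unreachable: Int = Int.MaxValue

  /** Multi-source Dijkstra: travel time from the nearest fire station to every node. */
  def responseTimes(numNodes: Int, roads: Map[Int, Seq[Road]], stations: Set[Int]): Array[Int] = {
    val best = Array.fill(numNodes)(Unreachable)
    // PriorityQueue is a max-heap, so reverse the ordering to pop the smallest time first.
    val queue = mutable.PriorityQueue.empty[(Int, Int)](Ordering.by[(Int, Int), Int](_._1).reverse)
    stations.foreach { s =>
      best(s) = 0
      queue.enqueue((0, s))
    }
    while (queue.nonEmpty) {
      val (time, node) = queue.dequeue()
      // Skip stale entries that were superseded by a shorter route.
      if (time == best(node)) {
        for (road <- roads.getOrElse(node, Seq.empty)) {
          val candidate = time + road.seconds
          if (candidate < best(road.to)) {
            best(road.to) = candidate
            queue.enqueue((candidate, road.to))
          }
        }
      }
    }
    best
  }
}
```

For example, with roads 0→1 (120 s), 1→2 (60 s), 1→3 (300 s) and 2→3 (90 s) and a single station at node 0, the result is `Array(0, 120, 180, 270)`. Seeding the queue with every station at once is what makes this a single search rather than one search per station.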

Page 5

A software platform with this combination facilitates an iterative scientific process, from the early phases of planning and setting up the experiment to predicting behaviour.

Page 6

Scenario run: response times in a low-traffic situation.
- More scenarios and fire station placements need to be run for a better overview.
- Jump to micro-simulation.
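Comparing fire station placements can be framed as an optimisation problem. A minimal sketch, assuming a precomputed matrix `travel(site)(district)` of travel times (for instance from a shortest-path search) and a minimax objective, that is, minimise the worst district's response time; neither the objective nor the exhaustive search is taken from the talk.

```scala
object StationPlacement {
  /**
   * Exhaustive search over all k-element subsets of candidate sites.
   * travel(site)(district) is an assumed, precomputed travel time in seconds.
   * Objective (an assumption): minimise the worst-case response time over all districts.
   * Requires 1 <= k <= number of candidate sites.
   */
  def bestPlacement(travel: Vector[Vector[Int]], k: Int): (Seq[Int], Int) = {
    val districts = travel.head.indices
    // Worst district response time when stations sit at the given sites.
    def worstCase(sites: Seq[Int]): Int =
      districts.map(d => sites.map(s => travel(s)(d)).min).max
    travel.indices.combinations(k).map(sites => (sites, worstCase(sites))).minBy(_._2)
  }
}
```

Exhaustive search visits every k-subset of candidate sites, so it only scales to small candidate sets; larger instances would need a heuristic or a dedicated solver, which is exactly the kind of run a parameter-optimisation component would schedule.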

Page 7

[Diagram: scenario exploration connecting models, computing infrastructure, input data and output. Labels include: user, sensors, public sources, GIS, experiments; files, streams, database; emergency support, epidemics; parameter exploration, parameter optimization; cluster, cloud, HPC; metrics, statistics, likely scenarios; trigger and intervene, update, show, analysis.]

Assisted decision support

Used in SIM-CITY; to be repeated in Dynaslum with Depraj, in the Kumbh Mela project and in an Indo-Dutch project.
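Parameter exploration, one of the components of this decision-support loop, can start as a plain grid sweep in which every combination of parameter values becomes one scenario run. A minimal sketch; the parameter names and values used below are hypothetical, and each resulting map would be handed to whatever submits the simulation jobs.

```scala
object ParameterExploration {
  /** Expand named parameter value lists into every combination (a full grid sweep). */
  def grid(params: Seq[(String, Seq[String])]): Seq[Map[String, String]] =
    params.foldLeft(Seq(Map.empty[String, String])) { case (partial, (name, values)) =>
      // Extend every partial scenario with each value of the next parameter.
      for (scenario <- partial; value <- values) yield scenario + (name -> value)
    }
}
```

For instance, `grid(Seq("traffic" -> Seq("low", "high"), "stations" -> Seq("A", "B", "C")))` yields six scenario configurations, the first being traffic = low with stations = A. The grid grows multiplicatively with each parameter, which is why parameter optimisation becomes attractive once the space gets large.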

Page 8

[Architecture diagram with legend: services, back-end, data. Components: geographic, statistics and simulation site (web service); Python scenario exploration; Jupyter Notebook; Geoserver; Xenon; computing (cluster); simulation; provenance (CouchDB); files (WebDAV); raw data (Apache Spark); geographic & aggregated data (PostGIS). Labelled connections: modify data and parameters, update parameter study (REST API), execute, process, show output, scheduled data processing, connect, rethink.]

Prototype in SIM-CITY.
- First, the upper part: the web interface.
- Who has used Jupyter Notebooks? -> Binder.
- Geoserver: makes geographic data understandable for machines and web interfaces.
- Custom web services: essential to provide new functionality, not to serve web pages.
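Geoserver publishes its layers through OGC web services such as WFS, which is what makes the geographic data readable by machines and web interfaces. A minimal sketch of building a WFS 2.0.0 `GetFeature` request that asks for GeoJSON; the base URL and the layer name `simcity:fire_stations` are hypothetical, and it assumes the server offers `application/json` output, which GeoServer normally does.

```scala
import java.net.URLEncoder

object WfsQuery {
  /** Build an OGC WFS 2.0.0 GetFeature URL requesting GeoJSON output. */
  def getFeatureUrl(baseUrl: String, typeName: String, maxFeatures: Int): String = {
    val params = Seq(
      "service" -> "WFS",
      "version" -> "2.0.0",
      "request" -> "GetFeature",
      "typeNames" -> typeName,         // WFS 2.0 uses typeNames (1.x used typeName)
      "count" -> maxFeatures.toString, // WFS 2.0 feature limit (1.x used maxFeatures)
      "outputFormat" -> "application/json"
    )
    // Percent-encode each value so characters such as ':' and '/' are safe in a query string.
    val query = params
      .map { case (key, value) => s"$key=${URLEncoder.encode(value, "UTF-8")}" }
      .mkString("&")
    s"$baseUrl?$query"
  }
}
```

A web interface or notebook can fetch such a URL directly and draw the returned GeoJSON features, with no custom code on the server side.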

Page 9

Web interfaces

All demonstrations are on https://github.com/NLeSC/collab-demos
Crossfilter: make dynamic selections.
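Crossfilter itself is a JavaScript library, but the idea behind a dynamic selection fits in a few lines of Scala: a selection on one dimension (here the hour of day) restricts the records, and the grouped counts for another dimension (here the district) are recomputed. The `Response` record is hypothetical, and real Crossfilter updates its groups incrementally rather than recomputing them from scratch.

```scala
object CrossfilterSketch {
  // Hypothetical simulation output: one fire-engine response.
  final case class Response(hour: Int, district: String)

  /** Counts per district, restricted to the hours currently selected on another chart. */
  def countsByDistrict(data: Seq[Response], selectedHours: Range): Map[String, Int] =
    data
      .filter(r => selectedHours.contains(r.hour))
      .groupBy(_.district)
      .map { case (district, responses) => district -> responses.size }
}
```

Dragging a brush over the hour chart would correspond to calling `countsByDistrict` with a new range and redrawing the district chart.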

Page 10

source: computerweekly.com

Docker

Who knows Docker?
- Very lightweight.
- Combine different TCP/IP services with docker-compose.
- Not yet available everywhere.


Page 12

[Architecture diagram, repeated from page 8]

- Who has ever lost track of which simulations they ran? Provenance: keep track of tasks and their configuration, use it as a cache, HTTP support.
- File service: again, needs HTTP support; WebDAV does this out of the box.
- For large amounts of raw data that you want to analyse multiple times, you want server-side processing: use Apache Spark (see more later).
- Store aggregates of the raw data in a separate database.
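Using provenance as a cache can be sketched as follows: derive a deterministic task id from the configuration, and rerun a simulation only when that id has not been recorded yet. This is an in-memory stand-in assuming string-valued configurations; in SIM-CITY the records live in CouchDB, and the hashing scheme here is an illustrative choice, not the project's.

```scala
import java.security.MessageDigest
import scala.collection.mutable

object ProvenanceCache {
  // Hypothetical provenance record: which configuration produced which output.
  final case class TaskRecord(id: String, config: Map[String, String], output: String)

  // In-memory stand-in for the provenance database (not thread-safe).
  private val records = mutable.Map.empty[String, TaskRecord]

  /** Deterministic task id: SHA-256 of the configuration with its keys sorted. */
  def taskId(config: Map[String, String]): String = {
    val canonical = config.toSeq.sorted.map { case (k, v) => s"$k=$v" }.mkString("\n")
    MessageDigest
      .getInstance("SHA-256")
      .digest(canonical.getBytes("UTF-8"))
      .map(b => "%02x".format(b))
      .mkString
  }

  /** Return the recorded task for this configuration; run the simulation only if it is new. */
  def runOrReuse(config: Map[String, String])(simulate: Map[String, String] => String): TaskRecord = {
    val id = taskId(config)
    records.getOrElseUpdate(id, TaskRecord(id, config, simulate(config)))
  }
}
```

Sorting the keys means the same configuration always maps to the same id, regardless of insertion order. The simple `key=value` encoding could collide if keys or values contain `=` or newlines; an unambiguous serialisation such as canonical JSON would avoid that.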

Page 13

Apache Spark

Source: arstechnica.com

Who has heard of MapReduce or Hadoop? And of Apache Spark?

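Page 24

Apache Spark example

    import org.apache.spark.rdd.RDD

    // Document, DictionaryItem, myReadFunc and lowerThreshold are defined elsewhere (not shown in the talk).
    val documents: RDD[Document] = myReadFunc()
    val dictionary: RDD[DictionaryItem] = documents
      .flatMap(_.tokens.distinct)      // each distinct token once per document
      .map((_, 1L))                    // (token, 1)
      .reduceByKey(_ + _)              // (token, document frequency)
      .filter(_._2 >= lowerThreshold)  // drop rare tokens
      .zipWithIndex()                  // ((token, frequency), index)
      .map { case ((token, frequency), index) => DictionaryItem(index, token, frequency) }
      .cache()                         // keep in memory, since it is reused
    dictionary.saveAsTextFile("dictionary.txt") // writes a directory of part files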

Resilient Distributed Dataset
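The same dictionary-building steps can run on ordinary Scala collections, which is handy for unit-testing the logic without a Spark cluster. In this sketch `Document` and `DictionaryItem` are hypothetical definitions matching how the slide uses them, and `groupBy` followed by a sum stands in for `reduceByKey`.

```scala
object LocalDictionary {
  // Hypothetical definitions matching how the slide uses Document and DictionaryItem.
  final case class Document(tokens: Seq[String])
  final case class DictionaryItem(id: Long, token: String, frequency: Long)

  /** The RDD pipeline's steps on an in-memory Seq: document frequency per token, then ids. */
  def buildDictionary(documents: Seq[Document], lowerThreshold: Long): Seq[DictionaryItem] =
    documents
      .flatMap(_.tokens.distinct) // each token once per document
      .map((_, 1L))
      .groupBy(_._1)
      .map { case (token, pairs) => token -> pairs.map(_._2).sum } // local equivalent of reduceByKey
      .filter(_._2 >= lowerThreshold)
      .toSeq
      .sortBy(_._1) // fixed order so that ids are deterministic
      .zipWithIndex
      .map { case ((token, frequency), index) => DictionaryItem(index.toLong, token, frequency) }
}
```

Sorting before `zipWithIndex` keeps the ids reproducible here; on an RDD, `zipWithIndex` numbers elements by partition order, so ids may differ when the partitioning differs.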

Page 25

[Architecture diagram, repeated from page 8]

- Simulations: interface with clusters.
- Get the data to the right place.

Page 26

Testing: Travis + Docker
Extensions: pyxenon, noodles, sim-city-client

Page 27

Code quality

• Git
• Travis or Jenkins
• Code quality
• Docker images

• Software and data carpentry

- Bringing it together:
- Git (GitHub)
- Continuous integration (Travis)
- Code quality
- Docker
- A list of software quality measures: https://github.com/NLeSC/estep-checklist/blob/master/checklist.md

Page 28

Kumbh Mela

80 million people

Page 29

Kumbh Mela project

• Current focus: data gathering

1. Distribute 3,200 very cheap bracelets with WiFi

2. Camera feeds

3. GPS trackers

4. Questionnaires

• A month's worth of data: tens of terabytes

Plan: use Spark with Jupyter Notebooks and custom services for data analysis, and later for simulations.
Large-scale projects and datasets like these benefit from a platform approach.
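As an illustration of the kind of analysis planned for the bracelet data, here is a sketch that counts distinct bracelets per WiFi receiver per time window, a rough crowd-density indicator. The `Detection` record, its fields and the window width are assumptions; the talk does not describe the data format. It runs on local collections; at tens of terabytes the same grouping would be expressed as Spark RDD operations.

```scala
object CrowdCounts {
  // Hypothetical detection record: a WiFi receiver saw a bracelet at a given time.
  final case class Detection(braceletId: String, receiverId: String, epochSeconds: Long)

  /** Distinct bracelets per (receiver, window start); windowSeconds is an assumed bin width. */
  def distinctPerWindow(detections: Seq[Detection], windowSeconds: Long): Map[(String, Long), Int] =
    detections
      // Key each detection by receiver and the start of its time window.
      .groupBy(d => (d.receiverId, d.epochSeconds / windowSeconds * windowSeconds))
      // Count each bracelet once per window, however often it was detected.
      .map { case (key, ds) => key -> ds.map(_.braceletId).distinct.size }
}
```

Counting distinct ids rather than raw detections matters because a bracelet near a receiver is typically seen many times per window.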

Page 30

Combine data, simulations and human understanding.
- Useful in large-scale contexts, in two ways: a large project or a large community.
- A partner like SURFsara or the eScience Center can help.
- Direct access by users is needed in academia.

