
Mingfang Wu, Stefanie Kethers , Andrew Treloar

Date post: 22-Feb-2016
Page 1: Mingfang  Wu, Stefanie  Kethers , Andrew  Treloar

Mingfang Wu, Stefanie Kethers, Andrew Treloar

Getting from managed to reused: Making it easier for researchers to do something useful with data

Page 2

What is ANDS?

• ANDS is supported by the Australian Government
• Began in 2009, currently funded to mid 2015
• Collaboration between Monash University, CSIRO and the Australian National University
• Staff in 6 cities across the country
• Funded 200+ projects across 68 institutions

ANDS aims to make data more valuable to researchers, research institutions and the nation

Page 3

So that researchers can easily publish, discover, access and use research data through the Australian Research Data Commons.

How Do We Make Data More Valuable?

Value

Page 4

ANDS Programs

• Underpinning infrastructure for discovery and citation (ARDC Core)
• Enable rich metadata about data to be managed and accessible (Metadata Stores)
• Make new data and associated metadata available from a range of instruments (Data Capture)
• Make a selection of existing data and associated metadata available from Australia’s research-producing universities (Seeding the Commons)
• Make data and associated metadata available from government departments (Public Sector Data)
• Provide the overall policy and practice frameworks to support better data management and re-use (Frameworks and Capabilities)
• Demonstrate the value of doing all these (Applications)

Page 5

Tools for Data-reuse

[Diagram. Research activities: Form Hypothesis; Design & Run Experiment; Look Up Data; Analyse Data/Results; Publish Paper, Data, Software. Tools: Discover Data; Extract Data; Transform Data; Integrate Data; Analyse Data; Visualise Data; Register Data; Workflow; Computing. Other labels: Data Collections; Metadata; Data.]

Page 6

The ANDS Applications Program

• Funded through EIF (Education Investment Fund)
• Focus on software infrastructure to enable research
• Goal of the Applications program: “to produce compelling demonstrations of the value of having data available for re-use” (i.e. enabling research across many sources of data that was not previously possible).

Page 7

Developed software might…

• empower researchers to solve important problems
• build new connections
• enable important problems to be solved
• enable new questions to be answered
• simplify problems
• accelerate solving problems, or analysing data

Page 8

What has been funded under the apps program?

• 7 projects in bio/characterisation
• 8 projects in climate change adaptation
• 10 others (urban planning, marine research, public health, humanities)
• For a complete list of the apps projects and their profiles, please visit the ANDS project registry: https://projects.ands.org.au/getAllProjects.php?start=app

Page 9

What kind of tools have been developed?

• Data transformation
• Data linkage and integration
• Data service
• Data analysis and modelling
• Data visualisation
• Data manipulation workflow
• …

Page 10

Example Applications

• Climate Model Downscaling Data for Impacts Research
• Cancer Genomics Linkage Application
• Brain Mapping National Resource
• POSITIVE PLACES: Spatial Analysis of Public Open Space

Page 11

Climate Model Downscaling Data for Impacts Research

Regional Climate Model Data Collection

Very big!
• High spatial and temporal resolution
• Large region
• Many climate variables
• Many atmospheric layers
• Multiple simulations

Data on an irregular model grid

Stored in netCDF

Page 12

Regional Climate Model Downscaling Data

Agricultural Impacts Researchers

Hydrological Impact Researchers

Health Impacts Researchers

Ecological Impacts Group

Page 13

Climate Change Impact Researchers: I see some problems!

• What is a Regional Climate Model?
• I don’t have enough disk space for this dataset on my computer
• I can’t find data for the sites I’m interested in
• My software tools can’t handle this irregular grid.
• I can’t read this netCDF data format
• This data set doesn’t contain data for my site
• This data gives me strange results for the current climate
• This dataset is great! How can I share my work on it with others?

Regional Climate Model Downscaling Data

Impacts-relevant high res

Very big!
• High spatial and temporal resolution
• Large region
• Many climate variables
• Many atmospheric layers
• Multiple simulations

Data on an irregular model grid

Stored in netCDF

Page 14

Data service – Climate Model Downscaling Data for Impact Research (CliMDDIR) (AP04, UNSW)

http://www.climddir.org/node/33

Provide open source software to transform RCM (Regional Climate Model) data
• Extract subsets of data (e.g. variables, regions)
• Regrid or interpolate data to sites
• Reformat data (e.g. GIS, ASCII, CSV)
• Calculate derived variables (e.g. pan evaporation)
• Apply statistical corrections (if necessary)
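The transformation steps listed for CliMDDIR (extract a subset, map irregular-grid data to a site, reformat to CSV) can be illustrated with a minimal Python sketch. This is not the CliMDDIR software: the grid values, the variable names `tasmax` and `pr`, and the helper functions are all hypothetical, and nearest-neighbour matching in degree space stands in for whatever regridding method the real tools use.

```python
import csv
import io
import math

# Hypothetical toy data on an "irregular grid": every cell carries its own
# latitude/longitude, so cells cannot be located by row/column arithmetic.
GRID = [
    # (lat, lon, {variable: value})
    (-33.9, 151.1, {"tasmax": 26.4, "pr": 2.1}),
    (-33.7, 150.6, {"tasmax": 27.9, "pr": 1.4}),
    (-35.3, 149.2, {"tasmax": 24.8, "pr": 0.9}),
    (-31.0, 146.5, {"tasmax": 33.2, "pr": 0.2}),
]


def extract_subset(grid, variables, lat_range, lon_range):
    """Keep only the requested variables for cells inside a bounding box."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    subset = []
    for lat, lon, values in grid:
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            subset.append((lat, lon, {v: values[v] for v in variables}))
    return subset


def nearest_cell(grid, lat, lon):
    """Pick the grid cell closest to a site (simplified: distance in degrees)."""
    return min(grid, key=lambda cell: math.hypot(cell[0] - lat, cell[1] - lon))


def to_csv(cells, variables):
    """Reformat a list of cells as CSV text with one row per cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["lat", "lon", *variables])
    for lat, lon, values in cells:
        writer.writerow([lat, lon, *(values[v] for v in variables)])
    return buffer.getvalue()


# Usage: subset one variable for a region, then look up a single site.
subset = extract_subset(GRID, ["tasmax"], (-36.0, -33.0), (149.0, 152.0))
csv_text = to_csv(subset, ["tasmax"])
site_cell = nearest_cell(GRID, -33.87, 151.21)
print(csv_text)
```

A real netCDF workflow would read the data with a dedicated library and use proper geodesic distances or interpolation; the sketch only shows how the three steps chain together.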

Page 15

CliMDDIR Service

[Screenshots: Collection Description at RDA; Service Description at RDA]

Page 16

CliMDDIR Service Portal

Climate impact researchers can
• select region
• select time coverage
• select variables
• select simulation models
• select output format
• share (sub-set) data with other researchers

Page 17

Agricultural Impact Researchers

Assess how climate change impacts wheat cropping in NSW using the APSIM agriculture model

Climate Modellers

IT Specialists

Page 18

Workflow - Cancer Genome Linkage Project

Challenges faced by biologists and clinicians:
• The manual process required to integrate their research data with other data sets
• No availability of standardised analytical processes
• The delay in transitioning from analysis to publication-ready results

http://ap27-cgla.blogspot.com.au/

Raw data: tttctgaaga ccatggacta tgagacctct

Derived data (i.e. mutation info) is released through the ICGC Data Portal

Page 19

Workflow - Cancer Genome Linkage Project

Variant detection pipeline in Galaxy

Provide software/infrastructure to enable integration/transformation of multiple datasets within the GVL environment
• Software development by QFAB (Queensland Facility for Advanced Bioinformatics, UQ)
• Development aligned with that of the NeCTAR GVL
• Inclusion of the very large raw ICGC Pancreatic Dataset into the NeCTAR GVL
• Development of (reusable) Galaxy workflows for easier mutation searching
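To give a feel for what "mutation searching" produces, here is a toy Python sketch that compares a sample sequence with a reference and lists single-base substitutions. It is not the project's Galaxy pipeline: real variant detection works on aligned sequencing reads with quality scores. The reference below reuses the raw-data snippet shown on the earlier slide purely as a stand-in, and the mutated sample is hypothetical.

```python
# Stand-in reference: the raw-data snippet from the slide, spaces removed.
REFERENCE = "tttctgaagaccatggactatgagacctct"


def find_substitutions(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. This toy comparison assumes both sequences are
    already aligned and of equal length, so it cannot detect insertions
    or deletions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical sample: the reference with the base at position 5 changed to "a".
sample = REFERENCE[:4] + "a" + REFERENCE[5:]
variants = find_substitutions(REFERENCE, sample)
print(variants)
```

The list of positions and base changes is the kind of derived "mutation info" that is far smaller than the raw data it comes from.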

Page 20

Workflow - Cancer Genome Linkage Project

Screenshots of output data

Page 21

Workflow - Cancer Genome Linkage Project

Page 22

Data Visualisation

Brain Mapping National Resource
• Funded at QCIF and Centre for Advanced Imaging, UQ
• Developed TissueStack, which lets users link to specific parts of the data, and rapidly view and collaboratively annotate very large 3D datasets via a web browser.

For details, please go to Dr. Andrew Janke’s presentation on Wed. 12:05 – 12:25, Room: P1

Page 23

POSITIVE PLACES: spatial analysis of public open space (POS)

• Are the current provisions of POS and parks adequate for the projected urban densification and population growth?
• Will there be enough POS? (i.e. will it still meet the 10% land provision?)
• Will the provision of different park types and facilities that encourage use by different population demographics (e.g. small pocket parks with play equipment for young children) or for different uses (e.g. active or passive recreation) be adequate? What more / less will be needed?
• Is there sufficient large open space for active recreation and sporting needs?
• What type of POS can promote increased social connectedness within communities?

Challenge: lack of a comprehensive and consistent digital dataset of public open space

Page 24

http://positiveplaces.blogspot.com.au/

Data integration and interrogation: Public Open Space (POS) Tool developed at UWA

With advanced features, users can:
• define area of interest directly on screen
• upload a user defined region as a GIS shapefile
• scenario test the relationship between changes in population structure for a user defined area and the provision of POS

POS statistics of a searched suburb or LGA can be downloaded as an Excel spreadsheet

7624 areas of POS
• 3813 parks (up to 43 different facilities and amenities per park)
• 820 school grounds/playing fields
• 1860 natural and conservation or bushland areas
• 771 areas of residual green space
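The kind of scenario test described above can be sketched in a few lines of Python. This is not the POS Tool's code: the suburb, its areas and its population figures are hypothetical, and the 10% land provision is assumed here to mean POS area as a share of total land area.

```python
from dataclasses import dataclass


@dataclass
class Area:
    """A user defined area of interest (all values here are hypothetical)."""
    name: str
    land_ha: float     # total land area in hectares
    pos_ha: float      # public open space in hectares
    population: int    # current resident population


def pos_share(area):
    """POS as a fraction of total land area."""
    return area.pos_ha / area.land_ha


def meets_provision(area, threshold=0.10):
    """Check the assumed 10% land provision."""
    return pos_share(area) >= threshold


def pos_per_1000_residents(area, population=None):
    """Hectares of POS per 1000 residents, optionally for a projected population."""
    residents = area.population if population is None else population
    return area.pos_ha / residents * 1000


# Scenario: the same POS area serving a larger projected population.
suburb = Area(name="Example suburb", land_ha=500.0, pos_ha=60.0, population=8000)
current = pos_per_1000_residents(suburb)
projected = pos_per_1000_residents(suburb, population=12000)
print(meets_provision(suburb), current, projected)
```

The land-share check is unaffected by population growth, while POS per resident falls, which is why a scenario test needs both measures.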

Page 25

Who benefits from the applications projects?

Researchers
• Conduct existing research more efficiently
• Enable new research
• Increase research collaboration opportunities
• Strengthen relationships with government agencies and industries
• Connect science to the public

Government agencies, urban planners, infrastructure planners, …

The public

Prof. Charles Watson, from Curtin University and Neuroscience Research Australia, commented that “The ability to share data from cloud, access it through TissueStack, would make a huge difference to the way we are able to interact, the ability for all participants to access the same dataset, to annotate it and to have a discussion on the way forward.”

Max De Antoni Migliorati (PhD candidate from QUT) on Semaphore: Monitoring and Modelling Australian Gas Emissions: It is much more time effective, it is much easier to get our result with Semaphore. Now I can run 5 simulations in a day, while with the previous method it took me one day to get one simulation done.

Page 26

Summary

• Substantial data infrastructures have been built to enable data sharing and data reuse
• The ANDS Applications program has demonstrated the value of data sharing and data reuse

Page 27

Information

• ANDS project registry: https://projects.ands.org.au/getAllProjects.php?start=all
• Project blogs: http://andsapps.blogspot.com.au/p/project-feed.html
• Demonstrations of value: http://andsapps.blogspot.com.au/p/resources.html

Page 28

Thanks

• To Ian Macadam (from UNSW) for providing some slides about the CliMDDIR project
• To all who have participated in and contributed to the program

Page 29

Questions?
