The New e-Science
David De Roure
Eindhoven Edition
Due to the complexity of the software and the backend infrastructural requirements, e-Science projects usually involve large teams managed and developed by research laboratories, large universities or governments.
e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.
How do we know when e-Science has succeeded?
Not just accelerated but new
A. When everyone is using the Grid
B. When there are routine scientific advances that would not have happened otherwise
How do we move from heroic scientists doing heroic science with heroic infrastructure to everyday scientists doing science they couldn’t do before?
humanists
archaeologists
geographers
musicologists
...
researchers!
research
It’s the democratisation of e-Research
scientists
Local Web
Repositories
Digital Libraries
Graduate Students
Undergraduate Students
Virtual Learning Environment
Technical Reports
Reprints
Peer-Reviewed Journal &
Conference Papers
Preprints &
Metadata
Certified Experimental
Results & Analyses
experimentation
Data, Metadata, Provenance, Workflows, Ontologies
The social process of science
Between 19th October and 23rd November 2007 I attended six international meetings related to e-Science
Grid 2007
Scientific and Scholarly Workflows
e-Social Science 2007
W3C
Open Grid Forum
Microsoft e-Science
This is what I found
Everyday researchers doing everyday research
• Not just a specialist few doing heroic science with heroic infrastructure
• Chemists are blogging the lab
• Everyone is mashing up
• Everyday hardware – multicore machines and mobile devices
1
A data-centric perspective, like researchers
• Data is large, rich, complex and real-time
• There is new value in data, through new digital artefacts and through metadata e.g. context, provenance, workflows
• This isn’t “anti-computation” – design interaction around data
2
Collaborative and participatory
• The social process of science revisited in the digital age
• Collaborative tools – blogs and Wikis
• e-Science now focuses on publishing as well as consuming
• Scholarly lifecycle perspective
3
Benefitting from the scale of digital science activity to support science
• This is new and powerful!
• Community intelligence
• Review
• Usage informing recommendation
• e.g. OpenWetWare
• e.g. myExperiment
4
Increasingly open
• Preprint servers and institutional repositories
• Open journals
• Open access to data
• Science Commons
• Object Reuse & Exchange
5
Better not Perfect
• The technologies people are using are not perfect
• They are better
• They are easy to use
• They are chosen by scientists
6
Empowering researchers
• The success stories come from the researchers who have learned to use ICT
• Domain ICT experts are delivering the solutions
• Anything that takes away autonomy will be resisted
7
About pervasive computing
• e-Science is about the intersection of the digital and physical worlds
• Sensor networks
• Mobile handheld devices
8
1. Everyday researchers doing everyday research
2. A data-centric perspective, like researchers
3. Collaborative and participatory
4. Benefitting from the scale of digital science activity to support science
5. Increasingly open
6. Better not Perfect
7. Empowering researchers
8. About pervasive computing
Signs of the Times
• e-Science is now enabling researchers to do some completely new stuff!
• As the individual pieces become easy to use, researchers can bring them together in new ways and ask new questions
• “The next level”
Onward and Upward
“Standing on the shoulders of giants”
www.w3.org/2007/Talks/www2007-AnsweringScientificQuestions-Ruttenberg.pdf
(Everyday researchers are giants too)
Note to Reader. The next slides are not intended to be anti-grid. Everyone working on Grid is doing great work.
• Everyday researchers doing everyday research BUT heroic Grid infrastructure not being adopted
• A data-centric perspective, like researchers BUT Grid gives APIs to computation not data
• Collaborative and participatory BUT Grid has deeply rooted service provider mindset
• Better not Perfect BUT Grid aims to provide well-engineered perfect solution
• Giving autonomy to researchers BUT Grid has feel of institutional control (at this time)
• About pervasive computing BUT Grid is about portals, not the next generation of users
The Grid Problem
[Diagram labels: CS Research; EE Research; e-Science Technology Creators & Integrators; Applications Research; e-Science bespoke tailoring; Mass Use by Researchers; Socio-economic & Commercial Innovation. Timeline: 5 years, 5 years, 5 years. Scale: 10s of integrators, 100s of embedded consultants, 1000s of research users.]
The Arrow Problem
e-Science Pipeline
Malcolm Atkinson
NB This isn’t wrong!
Don’t think rollout of technologies...
Think roll-in of researchers...
Mass Use by Researchers
Knowledge co-production vs Service Delivery!
[Diagram labels]
Web Services, RESTful APIs, command lines, ssh, http
Web browser, mobile phone, iPod, car, equipment, PDA
P2P, mashups, workflows, services, applications
Subject ICT experts, Computer Scientists, Software Companies, Scientists, open source Software Engineers
Workflow tools, Ruby on Rails
ecosystem
NeSC, OeRC
• It’s about empowerment as well as provision
• People power – the new instrument of scale!
• Hence usability:
– Simple/familiar interfaces for users
– Simple/familiar interfaces for developers
– No need for a summer school!
• Step into user space and look back
• Computer Scientists as facilitators and problem solvers(?)
For a flourishing ecosystem...
• Wikis
• Mashups
• REST APIs
• Google Maps
• Technologies:
– AJAX, JSON, Ruby on Rails, ...
• Social networking
• Web as a distributed application platform
– Amazon S3 and EC2
But what about Web 2.0?!
Signs of the Times
The Long Tail
Data is the Next Intel Inside
Users add value
Network effects by default
Some Rights Reserved
The Perpetual Beta
Cooperate, don’t Control
Software above the level of the single device
Web 2.0 patterns
www.oreilly.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
1. Everyday researchers doing everyday research
2. A data-centric perspective, like researchers
3. Collaborative and participatory
4. Benefitting from the scale of digital science activity
5. Increasingly open
6. Better not Perfect
7. Empowering researchers
8. About pervasive computing
[Diagram annotations on the list above]
use Web 2.0 here?
Grid
use Web 2.0 here
Grid, cloud, HPC
A utility is a directly and immediately useable service with established functionality, performance and dependability, illustrating the emphasis on user needs and issues such as trust
Services are knowledge-assisted (‘semantic’) to facilitate automation and advanced functionality, the knowledge aspect reinforced by the emphasis on delivering high level services to the user
The architecture comprises services which may be instantiated and assembled dynamically, hence the structure, behaviour and location of software is changing at run-time
Service-Oriented Knowledge Utility
semanticgrid.org/NGG3
If you peel back the label and it says “Grid” or “OGSA” underneath… it is not a cloud.
If you need to send a 40 page requirements document to the vendor then… it is not cloud.
If you can’t buy it on your personal credit card… it is not a cloud.
If they are trying to sell you hardware… it is not a cloud.
If there is no API… it is not a cloud.
If you need to rearchitect your systems for it… it is not a cloud.
If it takes more than ten minutes to provision… it is not a cloud.
If you can’t deprovision in less than ten minutes… it is not a cloud.
If you know where the machines are… it is not a cloud.
If there is a consultant in the room… it is not a cloud.
If you need to specify the number of machines you want upfront… it is not a cloud.
If it only runs one operating system… it is not a cloud.
If you can’t connect to it from your own machine… it is not a cloud.
If you need to install software to use it… it is not a cloud.
If you own all the hardware… it is not a cloud.
James Governor
Multicore chips will offer so much performance that we need not cobble together heterogeneous resources but rather can deploy simple powerful systems
Geoffrey Fox
Intel Developer Forum
• Web 2.0 is not high performance
– It improves the performance of science and people!
• Web 2.0 is not a properly engineered solution
– Scientists want better, not perfect. And agility.
• Web 2.0 is not secure
– People do lots of “secure” things on the Web
• Web 2.0 is a fad that will pass
– It’s inevitable and it’s already happened!
• Web 2.0 works for teenagers but it won’t for scientists
– See OpenWetWare
• Web 2.0 lets the oiks in and this is a bad thing
– Now we can do peer review even better!
Myths
[Diagram: N things connected to N things]
No middleware: N^2
One Middleware: 2N
Middleware? (several middlewares): Polynomial involving N1, N2 and M
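The middleware slide gives only three labels: N2 (read here as N squared), 2N for one middleware, and "polynomial involving N1, N2 and M" for several. A minimal Python sketch of the arithmetic follows. It assumes the counts are point-to-point integrations between N clients and N services, which the slide does not state. The function `many_middleware_links` uses one possible polynomial, `(n1 + n2) * m`, purely as an illustration; the slide does not give the actual formula.

```python
def direct_links(n: int) -> int:
    """n clients each wired directly to n services (assumed reading of N^2)."""
    return n * n


def one_middleware_links(n: int) -> int:
    """n clients plus n services, each wired once to one shared middleware (2N)."""
    return 2 * n


def many_middleware_links(n1: int, n2: int, m: int) -> int:
    """ASSUMPTION, not from the slide: each of n1 clients and n2 services
    needs one adapter per middleware, giving (n1 + n2) * m."""
    return (n1 + n2) * m


if __name__ == "__main__":
    # Compare how the two slide formulas grow with n.
    for n in (5, 10, 50):
        print(n, direct_links(n), one_middleware_links(n))
```

With a single middleware the assumed polynomial reduces to the slide's 2N when `n1 == n2`, which is the only check this sketch can make against the source.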
www.myexperiment.org
Workflows are the new rock and roll
Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources
The era of Service Oriented Applications
Repetitive and mundane boring stuff made easier
E. Science laboris
Carole Goble
Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle
Paul meets Jo. Jo is investigating Whipworm in mouse.
Jo reuses one of Paul’s workflows without change.
Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
Previously a manual two year study by Jo had failed to do this.
Recycling, Reuse, Repurposing
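The reuse story can be sketched in a few lines of Python. This is a toy illustration, not Taverna and not Paul's actual workflow: the step functions, gene identifiers and pathway table are all hypothetical stand-ins for remote services. It shows only the idea that a workflow is a fixed chain of steps that a second researcher can run unchanged on different input.

```python
from typing import Callable, Iterable, List

Step = Callable[[List[str]], List[str]]


def make_workflow(steps: Iterable[Step]) -> Step:
    """Compose steps into one callable that feeds each output to the next step."""
    chain = list(steps)

    def run(items: List[str]) -> List[str]:
        for step in chain:
            items = step(items)
        return items

    return run


# Hypothetical stand-ins for the services a real workflow would call.
def normalise_ids(genes: List[str]) -> List[str]:
    return sorted({g.strip().upper() for g in genes})


def lookup_pathways(genes: List[str]) -> List[str]:
    # Made-up lookup table; a real workflow would query a pathway database.
    table = {"GENE_A": "pathway_1", "GENE_B": "pathway_2", "GENE_C": "pathway_1"}
    return [table[g] for g in genes if g in table]


def unique(pathways: List[str]) -> List[str]:
    return sorted(set(pathways))


# One researcher defines the workflow once...
pathway_workflow = make_workflow([normalise_ids, lookup_pathways, unique])

# ...and it is run unchanged on two different (made-up) gene lists.
cattle_pathways = pathway_workflow(["gene_a", " gene_b "])
mouse_pathways = pathway_workflow(["gene_c", "gene_x"])
```

The design choice being illustrated is that the workflow carries the method, so the second user supplies only data.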
[Chart: Taverna downloads per day, 2003 to 2007]
taverna.sourceforge.net
• Run on your laptop – no sysadmin required
• Access independent third party world-wide service providers of applications, tools and datasets
– 850 databases, 166 web servers (Nucleic Acids Research, Jan 2006)
• My local applications, tools and datasets. In the Enterprise. In the laboratory.
• Easily incorporate new services without coding
The Superclient
Kepler
Triana
BPEL
Ptolemy II
myExperiment.org is… “Facebook for Scientists”... but different to Facebook!
• A community social network
• A gateway to other publishing environments
• A federated repository
• A platform for launching workflows
• Publishing self-describing Encapsulated myExperiment Objects
• Mindful publication
• Started March 2007
• Closed beta since July 2007
• Open beta November 2007
myExperiment.org is...
Google Gadget
Ownership and Attribution
24/5/2007 | myExperiment | Slide 46
[Architecture diagram labels]
Data: users, descriptions, groups, friendships, tags, blobs, workflows
APIs: Search API, Enactor API, Workflow API, Social Net API, Tag API, EMO API, Ownership Sharing API
Enactor
Repositories: EPrints, DSpace, Fedora, S3, SRB
EMO manifest: HTML, XML
Snapshot map of resources with their relationships and versions
scientists
Local Web
Repositories
Graduate Students
Undergraduate Students
Virtual Learning Environment
Technical Reports
Reprints
Peer-Reviewed Journal &
Conference Papers
Preprints &
Metadata
Certified Experimental
Results & Analyses
experimentation
Data, Metadata, Provenance, Workflows, Ontologies
Digital Libraries
The social process of science 2.0
• e-Research is about doing new research
• Grid is just one part of the solution
• Users are not just consumers of infrastructure. Empower them.
• Web 2.0 is a set of design patterns
• Think Web 2.0 coupling Grid and other services
• Workflows make e-Science easier, and Web 2 makes workflows easier
Take Homes 2.0
Contact
David De Roure [email protected]
Carole Goble [email protected]
Thanks
Malcolm Atkinson, Geoffrey Fox, Jeremy Frey, Savas Parastatidis,
The myGrid Family