Manchester Computing
Cross Council ICT Conference for e-Science & GRID
17-19 May 2004
End to End Services to support an e-Science Community
Professor M J Clark
Director IS, Manchester Computing
2
Agenda
End to End Services to support an e-Science Community
– How does it relate to institutional Strategy?
– Myth & Magic?
– What do we demand?
– End-to-end issues!
– The challenges!
– Economic issues
– Should I bury my head in the sand!
3
2004-2010
The vision central to the University's IS Strategy is:
To provide a transparent and seamless interface to teaching, RESEARCH and administrative information services.
4
An IS architecture to provide an environment
where the IS solutions maximize the efficiency and effectiveness of handling:
– routine transactions and access to support
– creating solutions for less routine but essential transactions
that enables University staff to provide the highest levels of customer service
– whilst maintaining high degrees of job satisfaction
where staff have ready access to tools necessary to do their job efficiently and effectively
with simplified processes and policies within constraints acknowledging risks associated with devolved authority
rich in services through a single aggregated interface accessible from networked devices
5
The Principles
Strive for Simplification
– Develop tools that can be flexibly applied to reduce the complexity of University business processes.
Enhance Individuals' Productivity
– Provide flexible tools that individuals can use to perform their roles more effectively.
Encourage Collaboration and Common Process approaches
– Alliances with and between stakeholders in process mechanisms in order to further the University's goals.
Empower Technologies as an Investment
– View IS investment in systems, staff and process as an investment that will yield a return in exchange for up-front expenditures, with full transparency of any assumptions of risk.
Focus on Outcomes
– Measure and assess projects and teams by what is accomplished.
6
How does this translate into End-to-end support for Research?
To demonstrably enhance the research process
– from idea
– through planning and resourcing
– supporting access to: data, codes & algorithms, computing through to supercomputing
– post-processing & visualization
– to results and scientific insight
– leading to innovation
and to deliver this formidable advantage:
– to all researchers,
– in the most natural and powerful way possible
Adding value to users' research
– Collaboration in and through projects
7
e-Science: What does it mean to me?
We were told!
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
‘e-Science will change the dynamic of the way science is undertaken.’
John Taylor,
Director General of Research Councils,
Office of Science and Technology
8
GRIDs
[…provides] "Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources"
– From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
"…enables communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals -- assuming the absence of central location, central control, omniscience, existing trust relationships."
9
My translation
Little problems are no longer good enough
Large-scale research is done through
– the interaction of people,
– heterogeneous computing resources, information systems, and instruments,
– all of which are geographically and organizationally dispersed.
The overall motivation for “Grids” is to facilitate the routine interactions of people & resources in order to support large-scale science(s) and engineering.
GRID was a bad noun to choose
10
So then: What & when?
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
On-demand, ubiquitous access to computing, data, and all kinds of services
New capabilities constructed dynamically and transparently from distributed services
When, as a:
– Research Project
– Pilot Service
– Full production service
11
An advanced IT infrastructure & standards
An infrastructure that is hidden from the real ‘science’
When research is facilitated invisibly by the infrastructures
– The standards are embedded within the research support infrastructures and are transparent to the ‘science’
12
Gartner’s Hype Cycle Technology Maturity Assessment Methodology
13
The 5 phases of the Hype Cycle
A Hype Cycle is a graphic representation of the maturity, adoption and business application of specific technologies.
1. "Technology Trigger": The first phase of a Hype Cycle is the "technology trigger" or breakthrough, product launch or other event that generates significant press and interest.
2. "Peak of Inflated Expectations": In the next phase, a frenzy of publicity typically generates over-enthusiasm and unrealistic expectations. There may be some successful applications of a technology, but there are typically more failures.
3. "Trough of Disillusionment": Technologies enter the "trough of disillusionment" because they fail to meet expectations and quickly become unfashionable. Consequently, the press usually abandons the topic and the technology.
4. "Slope of Enlightenment": Although the press may have stopped covering the technology, some businesses continue through the "slope of enlightenment" and experiment to understand the benefits and practical application of the technology.
5. "Plateau of Productivity": A technology reaches the "plateau of productivity" as its benefits become widely demonstrated and accepted. The technology becomes increasingly stable and evolves in second and third generations. The final height of the plateau varies according to whether the technology is broadly applicable or benefits only a niche market.
Gartner Hype cycle for emerging technologies July 2003
15
Gartner on Commercial GRIDs
Definition:
– Grid formed for non-scientific, non-technical tasks across multiple enterprises to address a single, large-scale purpose. Grids can also be used within one enterprise. The term "grid" is sometimes misused to denote the related technologies of distributed and utility computing.
Time to Plateau/Adoption Speed:
– Five to 10 years.
Justification for Hype Cycle Position/Adoption Speed:
– Growing movement by vendors to call products and long-term visions "grid." Confusion over definitions, benefits, maturity and applicability. Little is known about what commercial grid applications might be.
Business Impact Areas:
– New industry models could replace third-party intermediaries for large, multi-enterprise systems. Joint business opportunities with combined data warehouse and analytics. Distributed computing to increase efficiency and use of IT resources. Some claim grids will transform commercial IT operations.
Analysis by Carl Claunch
16
Grids consist of …
Computational facilities (supercomputers, clusters, workstations, small processors, …)
Access to mass storage (disk drives, tapes, …)
Networking (including wireless, distributed, ubiquitous)
Digital libraries/databases
Sensors/effectors
Software (operating systems, middleware, domain specific tools/platforms for building applications)
Services (education, training, consulting, user assistance)
With people: All working together in an integrated fashion.
17
Advances in Technology Can and Will Fix Most of the Fundamental Problems
Infrastructure: Make It Robust, Reliable and Invisible
Application Development: Make It Faster, Cheaper and Holistic
Application Maintenance: Make It Inexpensive
Application Deployment: Make It Secure, Reliable and Interenterprise
When!
18
What are the issues facing service deliverers?
No central coordination
– Lack of joined-up requirement, commitment, or resource demands
Little predetermination
– Making it up as we go
– No planned investment, based on need as it arises
– No shared understanding of the problems created
Difficult to say no!
– But saying yes is also bad!
No experience-based trust relationships
– We are asked to support untrusted third parties
– They want to install beta-class software! (and that’s generous)
– They want ports and security left wide open!!!
• AND THEY WANT IT FREE!
19
Today’s Grid is demanding
Transparent wide-area access to large data banks
Transparent wide-area access to applications on heterogeneous platforms
Transparent wide-area access to processing resources
Security, certification, single sign-on authentication (AAA)
– Grid Security Infrastructure
Data access, Transfer & Replication
– GridFTP
Computational resource discovery, allocation and process creation
– GRAM, Unicore, Condor-G
20
E-science developments predict shifts in current research practice:
Research (in many disciplines) is being revolutionized by the use of computers, digital data, and networks to replace and extend traditional efforts.
– Do we have enough resources directed at scaling solutions to problems?
– HPC is running on legacy codes not designed for big problems
– Algorithms are not designed for new paradigms/new science
New technology-mediated, distributed work environments relax constraints of distance and time
– Do we train to work in virtual organisations?
21
Challenges to classical approach
The two classic approaches to scientific research, theoretical/analytical and experimental/observational, have been extended to in silico simulation and modelling to explore new possibilities and to achieve new precision.
Challenged by:
– The enormous performance leap of computers (and networks) enables simulations of far more complex systems and phenomena, as well as visualizing the outputs
– Advanced computing is no longer restricted to a few research groups in a few fields such as weather prediction and high-energy physics, but pervades scientific and engineering research, including the biological, chemical, social, and environmental sciences, medicine, and nanotechnology.
– The primary access to the latest findings in a growing number of fields is through the Web
– Crucial data collections in the social, biological, and physical sciences are online and remotely accessible
22
‘Tomorrow’ we might expect (1)
Combine raw data and new models from many sources, and utilize the most up-to-date tools to analyze, visualize, and simulate complex interrelations
Collect / make information widely available
– E.g. the outputs of all major observatories and astronomical satellites, satellite and land-based weather data, three-dimensional images of anthropologically important objects
– leading to a qualitative change in the way research is done and the type of science that results.
Work across traditional disciplinary boundaries: environmental scientists will take advantage of climate models, physicists will make direct use of astronomical observations, social scientists will analyze interactive behaviour of scientists as well as others
23
‘Tomorrow’ we might expect (2)
Simulate more complex and exciting systems
– E.g. cells and organisms rather than proteins and DNA;
– the entire earth system rather than air, water, land, and snow independently
Access the entire published record of science online
Make publications incorporating rich media (hypertext, video, photographic images)
Visualize the results of complex data sets in new and exciting ways,
– and create techniques for understanding and acting on these observations
Work routinely with colleagues at distant institutions
– even ones that are not traditionally considered research universities, and with junior scientists and students as genuine peers, despite differences in age, experience, race, or physical limitations.
24
Knowledge environments for ‘GRID’ working
Community-Specific Knowledge Environments for researcher communities
– Customised for specific disciplines/inter-disciplines
[Diagram: layered services]
– HPC services
– Data, information and knowledge management services
– Observation, measurement and data-collection services
– Interfaces and visualisation services
– Collaboration services
– Networking, Operating Systems, Middleware
– Infrastructures: Computation, storage, communication
Legend entry: grid infrastructures
25
NSF Cyberinfrastructure – panel conclusion
The Panel’s overarching finding is that:
A new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology; and pulled by the expanding complexity, scope, and scale of today’s research challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive “cyberinfrastructure” on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy.
The cost of not doing this is high, both in opportunities lost and through increasing fragmentation and balkanization of the research communities.
26
Is There a Definition for Cyberinfrastructure (CI)?
Not really - means different things to different groups - but there are commonalities
Literally, infrastructure composed of “cyber” elements
Includes High-End Computing (HEC, or supercomputing), grid computing, distributed computing, etc.
27
Is There a Definition of Cyberinfrastructure (CI)?
Working definition: an integrated system of interconnected computation/communication/information elements that supports a range of applications
Note: We are only at the beginning of infrastructure developments
Cyberinfrastructure is the means; “e-Science” is the result
28
Integrated architectures
[Diagram: layered architecture]
Layers: Hardware; Grid Services & Middleware; Development tools & Libraries; Domain specific tools; Applications
Side labels: Discovery & innovation; Education & training; Discipline-independent infrastructures
29
In Ten Years, an infrastructure that is…
rich in resources, comprehensive in functionality, and ubiquitous;
easily usable by all scientists and engineers;
accessible anywhere, anytime needed by authenticated users;
interoperable, extendable, flexible, tailorable, and robust;
funded by multiple agencies, states, campuses, and organizations;
supported and utilised by educational programs at all levels.
30
Some characteristics:
Built on broadly accessible, highly capable network: 100’s of terabits backbones down to intermittent, wireless connectivity at low speeds
Contains significant and varied computing resources: 100’s of petaflops at high end, with capacity to support most scientific work
Contains significant storage capacity: exabyte collections common; high-degree of DB confederation possible
Allows wide range of sensors/effectors to be connected: sensor nets of millions of elements attached
Contains a broad variety of intelligent visualization, search, database, programming and other services that are fitted to specific disciplines
31
The initial Challenges
Technical Challenges
– Computer Science and Engineering broadly
– How to build the components?
– Networks, processors, storage devices, sensors, software
– How to shape the technical architecture?
– Pervasive, many cyberinfrastructures, constantly evolving/changing capabilities
– How to customize CI to particular Sci & Eng domains
Operational Challenges
– Data standards
– General interoperability
– Resource allocation
– Security and privacy
– Training
– Continuous evolution
Funding/Ownership Challenges
– Cooperation among agencies
– Cooperation between federal and state/private levels
– Role of campuses
– Interaction with private industry
– ££££’s !
32
Computer Services
Must run or be extensively involved in “Grids”
– Experimental services for developers
– Production service for developers
– Production service for users
Must be resourced to support the ‘new world’
33
The computer centre remit
Once at the heart of the Computing/Network research agenda
– Now support the core business
– Provide ‘plain old internet services’
– Significantly about quantity rather than quality
– Resource limited; minimum risk environment; intolerant user base
Manchester Computing is not typical
– Staff actively engaged doing research
– Success through partnerships
– Risk taking within constraints
– Entrepreneurial; >50% of funding external
34
Success through internal Partnerships with
ESNW
– Computer Science and increasingly all Schools
– Backed by £3.1m institutional investment for 2004
BioBank (Hub & Spoke)
– With Medicine
NCeSS
– With Social Sciences, Computer Science, Economics, Geography + Essex
National Text Mining Centre
– UMIST & Manchester + Salford + Liverpool
35
The MANs & Campus Networks
The Metropolitan Area Networks will provide GRID capabilities
– High Speed access regardless of location
• Resilience: how important?
• Who pays and how?
– Fairness (location) v cost
– Commercial partner issues
– Very significant Quality of Service issues to be resolved!
Campus Networks
– The ‘last mile’ syndrome
– Commodity v research needs (also for MANs)
– Security v accessibility
– Who has the accountability/responsibility? Who has the sanctions?
– Very real threats through providing ‘access’
36
Don’t forget the Library (knowledge) services
Knowledge is premised on the access to information
Librarians are professionals at information management
– The nature of the medium holding knowledge is changing
– The nature of the learned article is changing
• May contain multimedia
• Or datasets including access to applications to re-run the ‘experiment’ and even modify the parameters
The data available to the ‘researcher’ is growing at an alarming rate
Understanding of the IPR issues in relation to knowledge, information and data is a ‘professional’ issue
They ‘could’ be the experts for digital curation support
– A growing responsibility for us all!
37
The ‘missing’ skills
Where are the people who are going to develop new codes for new architectures, including data resources?
– Optimisation and recovery need to be integral
Requires comprehension of the science and understanding of the algorithms
Needs to be driven by the demands for efficiency/effectiveness of solutions
Needs to understand the associated datasets
Note US Gov is funding both architecture and ‘language’ development for the 2015 timeframe
– UK must not lose future benefit through under-investment
38
Visualisation
A picture is worth a thousand words
Complex information (data) requires simplification for the human consumer
The cost of local visualisation facilities has radically diminished
– 3D, virtual reality, high definition…
However, it will need to handle complex datasets or real-time processing
– The visualisation may cause the need to steer the science dynamically
An area of support/requirement expected to expand dramatically, with significant new tools/techniques required
39
The migration from research to production
From developer/champion -> don’t care user
– From 1st user -> Thousands
Research Project -> Computer Service
– Who does QA?
– Who does integration?
– Who supports it?
– Who promotes it?
– Who does the development?
Open Middleware Infrastructure Institute (OMII) will do some
– Quality assurance, testing as the community will not abide ‘bugs’
Computer services like ‘supported’ products
– Only the ‘best’ survive?
40
AAA
Authentication, Authorisation and Accounting (AAA)
Managing access to resources involves a number of processes:
– authentication - identifying the person requesting the access
– authorisation - determining from that person's identity, and often using other sources of information, what privileges the individual has and hence whether access should be allowed or not
– accounting - maintaining logs of events for the purpose of generating management information on resource usage
A big challenge to provide certificates for every learner
– Is there anyone who is not a learner?
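The three AAA processes listed above can be sketched as a single access check. This is a minimal illustration only, not a description of any Grid middleware: the in-memory `CREDENTIALS` and `PRIVILEGES` tables, the `UsageLog` class and the function names are all hypothetical, and a real service would delegate each step to a directory, a policy engine and a logging service.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stores standing in for a directory and a policy service.
CREDENTIALS = {"alice": "correct-horse"}   # identity -> shared secret
PRIVILEGES = {"alice": {"hpc-queue"}}      # identity -> resources the user may access


@dataclass
class UsageLog:
    """Accounting: keeps one (user, resource, outcome) event per request."""
    events: list = field(default_factory=list)

    def record(self, user: str, resource: str, outcome: str) -> None:
        self.events.append((user, resource, outcome))


def authenticate(user: str, secret: str) -> bool:
    """Authentication: is the requester who they claim to be?"""
    return user in CREDENTIALS and CREDENTIALS[user] == secret


def authorise(user: str, resource: str) -> bool:
    """Authorisation: does this identity hold a privilege for the resource?"""
    return resource in PRIVILEGES.get(user, set())


def request_access(user: str, secret: str, resource: str, log: UsageLog) -> bool:
    """Run authentication, then authorisation, and always log the outcome."""
    if not authenticate(user, secret):
        outcome = "authentication-failed"
    elif not authorise(user, resource):
        outcome = "not-authorised"
    else:
        outcome = "granted"
    log.record(user, resource, outcome)
    return outcome == "granted"
```

Every request is logged, whether it is granted or refused, so the accounting record can feed the management information on resource usage that the slide mentions.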
41
The digital certificate
An ‘attachment’ to an electronic message used for security purposes, e.g. to verify that a user sending a message is who he/she claims to be, and to provide the receiver with the means to encode a reply.
An individual wishing to send an encrypted message applies for a digital certificate from a Certificate Authority (CA). The CA issues a digital certificate, signed with the CA's private key, containing the applicant's public key and a variety of other identification information. The CA makes its own public key readily available through print publicity or perhaps on the Internet.
The recipient of a message uses the CA's public key to check the digital certificate attached to the message, verifies it as issued by the CA and then obtains the sender's public key and identification information held within the certificate. With this information, the recipient can send an encrypted reply.
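The issue-and-check steps described above can be sketched with a toy example. It assumes the CA's operation is modelled as an RSA signature over the certificate contents. The key is a textbook-sized RSA pair (primes 61 and 53) chosen so the arithmetic is easy to follow; it offers no security at all. The `Certificate` class and the function names are hypothetical and do not correspond to any real certificate format or library.

```python
import hashlib
from dataclasses import dataclass

# Toy RSA key pair for the CA. Illustration only: these numbers are far too
# small for real use.
CA_N = 61 * 53   # public modulus (3233)
CA_E = 17        # public exponent, published by the CA
CA_D = 2753      # private exponent, kept by the CA: (17 * 2753) % 3120 == 1


@dataclass(frozen=True)
class Certificate:
    subject: str             # identification information
    subject_public_key: str  # the applicant's public key (opaque text here)
    signature: int           # CA's signature over the two fields above


def _digest(subject: str, public_key: str) -> int:
    """Hash the certificate contents and reduce the hash into the toy modulus."""
    data = f"{subject}|{public_key}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N


def issue_certificate(subject: str, public_key: str) -> Certificate:
    """CA side: sign the contents with the CA's private exponent."""
    signature = pow(_digest(subject, public_key), CA_D, CA_N)
    return Certificate(subject, public_key, signature)


def verify_certificate(cert: Certificate) -> bool:
    """Recipient side: check the signature using only the CA's public key."""
    expected = _digest(cert.subject, cert.subject_public_key)
    return pow(cert.signature, CA_E, CA_N) == expected
```

A recipient who trusts the CA's published key can call `verify_certificate` and, if it returns `True`, use `subject_public_key` to encrypt a reply. A certificate whose signature has been altered fails the check.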
42
Digital certificates
Digital certificates are required as the means of authenticating individuals in e-science Grid projects; they will become more widespread in normal campus operations.
Issues to be investigated:
– certificate profiling
– life-cycle management of certificates, including revocation mechanisms
– key recovery mechanisms
– use of certificates on public-access workstations
– user mobility (on and off campus)
– "mixed economy" working, i.e. use of certificates alongside more traditional forms of electronic credentials
– development of open source tools to facilitate deployment of certificates in typical university or college environments
43
The ‘REAL’ Challenge
Educational Challenges
– How to make sure that future generations of scientists and engineers can fully utilize emerging ‘enabling’ infrastructures
New paradigms, methods, objectives
How to retrain current scientists and engineers
How to make sure that new ideas for extending supporting architectures continue to come from those who are using them
44
Competition v collaboration
It is a cultural agenda
Assumes we can build virtual organisations
We have been ‘indoctrinated’ to compete!
– Why collaborate?
Why do I want to open my resources to others – over whom I have little or no ‘control’
Who gives me resource to facilitate collaboration
The Japanese say: ‘We collaborate to compete’
45
Reality Checks!!
The Technology is Ready?
– Not true: it’s emerging and certainly not robust
• Building middleware, Advancing Standards, Developing Dependability
• Building demonstrators.
• The computational grid is in advance of the data intensive middleware
• Integration and data curation are probably the obstacles
• But!! It doesn’t have to be all there to be useful.
We know how we will use grid services?
– No: disruptive technology
• We need to lower the barriers of entry.
46
Grid Evolution
1st Generation Grid
– Computationally intensive, file access/transfer
– Bag of various heterogeneous protocols & toolkits
– Recognises internet, Ignores Web
– Academic teams
2nd Generation Grid
– Data intensive -> knowledge intensive
– Services-based architecture
– Recognises Web and Web services
– Global Grid Forum
– Industry participation
We are here!
47
Sharing & Funding
The current philosophy is for resource owners to donate some portion of ‘their’ resource to the Grid.
Who will donate resources to the ‘GRID’ and why?
– Will my VC?
Quite reasonably a funding body could argue
– If you only need X units of the resource to do the science you indicated in your case, then that is what you should get.
– Alternatively, if you need additional or alternative resources you should have indicated this in your original request.
How should requests be cast?
– Should a researcher bid for 110% of what is needed and then put the 10% into the Grid?
– Should the user bid for 90% of what's needed and assume the rest is from the Grid?
48
In conclusion
The biggest issues to be faced:
The real challenge is cultural change in the research community
– Getting researchers to see and prepare for the change that is coming
– It’s not about the infrastructures
• They will emerge
Resources on a GRID are not, and will not be, free!
– Resources have costs
– Support for a GRID
• equally is not free of cost
• SOMEONE must pay
Are we equipping the new graduates and post-graduates for this ‘new world’?
Thank you.
Prof M.J. Clark
Manchester Computing
The University of Manchester
M13 9PL