Request for Information (RFI)
Data storage and services for DIRISA facility
CSIR RFI No. 9000/17/07/2018
Date of Issue: 29 November 2017
Closing Date: Wednesday, 17 January 2018 @ 16:30
Place: Tender box, CSIR Main Reception, Gate 3 (North Gate),
CSIR Pretoria Campus, or emailed to [email protected]
Enquiries: CSIR Strategic Procurement Unit, E-mail: [email protected]
CSIR business hours: 08h00 – 16h30
Category: Computer Hardware and Software
CSIR RFI No 9000/17/07/2018 Page 2 of 13
TABLE OF CONTENTS
1 INTRODUCTION
2 BACKGROUND
3 PURPOSE OF THIS RFI
4 INVITATION FOR INFORMATION ON INTEGRATED TECHNOLOGY SOLUTIONS
5 INFORMATION REQUESTED
6 SUBMISSION OF INFORMATION
7 VENUE FOR SUBMISSIONS
8 DEADLINE FOR SUBMISSION
9 MEDIUM OF COMMUNICATION
10 COST OF INFORMATION
11 DISCLAIMER
12 APPENDIX: BREAKDOWN OF COST ESTIMATE
SECTION A: TECHNICAL INFORMATION
1 INTRODUCTION
The Council for Scientific and Industrial Research (CSIR) is one of the leading scientific
research and technology development organisations in Africa. In partnership with
national and international research and technology institutions, CSIR undertakes
directed and multidisciplinary research and technology innovation that contributes to the
improvement of the quality of life of South Africans. The CSIR’s main site is in Pretoria
while it is represented in other provinces of South Africa through regional offices.
2 BACKGROUND
The Data Intensive Research Initiative of South Africa (DIRISA) is one component of
South Africa’s National Integrated Cyberinfrastructure System (NICIS), together with the
Centre for High Performance Computing (CHPC) and the South African National
Research Network (SANReN). The goals of DIRISA are principally to enable and
support innovation through data intensive research and its remit can be described as
the “national data infrastructure layer providing services and support for the (re)use,
transformation, curation, preservation, exchange and interoperability of research data
serving to catalyse the production of research outputs.”
This Request for Information is solely to gather information for a petascale research
data repository of 8 petabytes of raw storage hosted at two sites of the CSIR, together
with the information and communications technologies that facilitate the deposit
(upload), search (discover), access and download of data sets in a controlled (secure
and reliable) manner.
2.1 Current DIRISA Architecture
The current DIRISA storage facility comprises 2 PB of racked disk drives. This storage
volume is physically divided between two mirrored sites, one at the CHPC in Cape Town
and the other at the CSIR campus in Pretoria, providing usable (“real”) storage of
approximately 700 TB. Besides this facility, it is also important that the proposed solution
interfaces appropriately with CHPC’s high performance computing resources.
3 PURPOSE OF THIS RFI
This request forms part of the process for CSIR to gather information about information
and communications technologies that can be deployed for a data repository and to
provide services that enable the research community to use this repository to securely
and reliably manage research data. This document describes the general terms and
conditions with which all respondents must comply. Responses to this RFI may inform, but
will not necessarily determine, any decisions on technologies for the DIRISA storage
facility.
4 INVITATION FOR INFORMATION ON INTEGRATED TECHNOLOGY SOLUTIONS
The CSIR invites submissions of information on integrated technology solutions, as
detailed in Section 5.1, for a data repository and related services for securely and
reliably managing data objects throughout their life cycle. Submissions should also
include information about options for:
1 Maintenance and support services and training interventions to maintain this facility;
and
2 Training interventions for designated employees of the CSIR in order to transfer the
knowledge and skills necessary to monitor and maintain the deployed system
subsequent to the termination of a three-year period of support.
5 INFORMATION REQUESTED
1 Architectural description. A description of a proposed system detailing the
storage and computing facilities as well as the network topology and security
architecture.
2 Hardware. Detailed specifications of the hardware equipment (rack space, blades,
physical CPUs and cores, memory, disk racks, switches and routers, etc.) together
with their specifications (performance, power consumption, speeds, latencies, etc.)
and upgrade capacities providing a minimum of eight (8) petabytes of raw disk
storage and configured to provide at least 2.7 petabytes of usable storage.
Describe the features that make your equipment better than others and specify the
cost-benefit implications of redundant storage options.
3 Software. Details of the entire software stack (from the systems and middleware layers
to the application layer) and services for reliable and secure access, use, administration
and management of this storage facility. Motivate your choice of operating environment
and interface (cloud-based, web portals, gateways, etc.).
4 Maintenance and support. Description of the maintenance and support options to
ensure that the deployed system and services remain robustly operational for a
period of at least three (3) years from the date of commissioning.
5 Skills transfer. Training intervention options that can be provided to designated
employees of the CSIR in order to transfer the knowledge and skills necessary to
monitor and maintain the deployed system subsequent to the termination of the
three-year period of support.
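As a rough check on the capacity figures in item 2, the ratio of usable to raw storage can be worked out directly. The sketch below is illustrative only: the function name and the decimal unit convention (1 PB = 1000 TB) are assumptions, not part of this RFI. It compares the requested configuration (8 PB raw, at least 2.7 PB usable) with the current facility described in Section 2.1 (2 PB raw, approximately 700 TB usable).

```python
TB_PER_PB = 1000  # assumption: decimal units (1 PB = 1000 TB)


def usable_fraction(raw_tb: float, usable_tb: float) -> float:
    """Return usable capacity as a fraction of raw capacity."""
    if raw_tb <= 0:
        raise ValueError("raw capacity must be positive")
    return usable_tb / raw_tb


# Figures taken from this RFI.
requested = usable_fraction(8 * TB_PER_PB, 2.7 * TB_PER_PB)
current = usable_fraction(2 * TB_PER_PB, 700)

print(f"requested: {requested:.2%} of raw capacity usable")
print(f"current:   {current:.2%} of raw capacity usable")
```

Both ratios come to roughly a third of raw capacity, so the requested configuration is in line with the way the current facility is provisioned.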
5.1 Business Requirements and Constraints
Responses should specifically address and/or provide solutions to the following
business requirements and constraints.
5.1.1 Open systems and interoperability
The system should preferably utilise Open Standards-based software and have the
capability to federate other data repositories in an interoperable way. Hardware and
software systems should be contemporary and based on mainstream technologies.
The Open Source systems software and middleware currently in use are Red Hat
Enterprise Linux (RHEL) and Red Hat OpenStack.
5.1.2 User trust
The system must be configured and maintained in a manner that allows for secure,
reliable and long-term storage for datasets. The provision of secure and reliable
storage for datasets is a stringent requirement that critically impacts the uptake and
use of this facility.
5.1.3 Diverse user skills
The main end user group of this facility comprises academics and researchers at
institutions primarily in South Africa from across all academic disciplines including the
Arts, Social Sciences, Health and Education. The system should cater for end users
with wide ranging skills and experience in information technologies.
5.1.4 Collaborative relationship
Proposed solutions should be “future-proof”, i.e., can be readily refreshed with
contemporary and emerging technologies. Elaborate on possibilities to enter into a
collaborative relationship in order to co-develop solutions for petascale data
infrastructures.
5.1.5 Integration with existing infrastructure and services
The proposed solution should integrate with the existing storage infrastructure and
with resources provided by the CHPC and SANReN. A possible architectural option
is that the proposed solution is deployed alongside, but linked to, the existing storage
infrastructure and services. This architectural model results in two physically distinct
but linked storage systems. However, respondents are free to propose alternative
architectures.
5.1.6 Scalability and modularity
The system should be scalable to readily allow for the upgrade of storage volume,
connectivity bandwidth and computing capability. The system should be modular to
facilitate easy upgrade or substitution of hardware and software components in
future.
5.1.7 Minimal disruption and continued operation
While the system is not acutely business critical, it is important that the disruption of
services be minimised. It may be possible to interrupt services for a short, pre-planned
and well-communicated period of time, preferably outside normal business
hours. Continued operation of the facility must be ensured for more than 99.0% of
the time (24 hours a day, 7 days a week, 365 days a year), and failover mechanisms
should be implemented.
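To make the availability target concrete, the following sketch converts a percentage into a yearly downtime budget. It assumes a 365-day year and counts planned and unplanned outages alike; the function name is illustrative and not part of this RFI.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year


def max_downtime_hours(availability_pct: float) -> float:
    """Hours per year the facility may be unavailable at a given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - availability_pct) / 100


print(max_downtime_hours(99.0))
```

At 99.0% the budget is 87.6 hours per year, or about 3.65 days, which must cover any pre-planned maintenance windows as well as unexpected failures.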
5.1.8 Security and privacy
Related to the user trust requirement, secure access is a paramount business
requirement: secure access to data and the system must be ensured at all times.
Some datasets are confidential and their confidentiality must be maintained
throughout.
This may include, for example, the functionality for users to encrypt their datasets or
for users to restrict access to their datasets to specific users or groups of users. The
system must be designed to allow for compliance with the ISO 27000 series of
standards and relevant information security / privacy / data protection laws and
regulations, such as the Protection of Personal Information Act 4 of 2013. The entire
technology stack of the system must be configured to protect data from malicious or
unintended access, modification, disclosure, destruction or misuse.
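One way to picture the access restriction described above is a per-dataset policy that lists the users and groups allowed to read a dataset. The sketch below is purely illustrative: the `AccessPolicy` class, its fields and the default-deny rule are assumptions rather than requirements of this RFI.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical per-dataset access policy (default deny)."""
    owner: str
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)

    def can_read(self, user: str, user_groups=()) -> bool:
        """Grant access to the owner, named users, or members of allowed groups."""
        if user == self.owner or user in self.allowed_users:
            return True
        # True when the user belongs to at least one allowed group.
        return not self.allowed_groups.isdisjoint(user_groups)


policy = AccessPolicy(owner="alice", allowed_users={"bob"}, allowed_groups={"genomics"})
print(policy.can_read("bob"))                   # named user
print(policy.can_read("carol", ["genomics"]))   # member of an allowed group
print(policy.can_read("dave", ["astronomy"]))   # neither, so access is denied
```

Denying by default means a dataset is visible only to its owner until access is explicitly granted, which suits the confidentiality requirement.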
5.1.9 Reliability and performance
Low latency, appropriate backup and failover mechanisms (such as the currently
implemented data mirroring functionality) should be maintained to ensure a robust
and fault tolerant system for persistent retention and integrity of data and services.
Performance should degrade gracefully in cases of failure.
5.1.10 Close consultation with hosts of existing system
The existing storage technology is hosted at the CHPC in Cape Town and at CSIR
ICT Shared Services in Pretoria. The hardware and related equipment forming part of
a specified solution would be installed and deployed at these sites.
5.1.11 Power, cooling and connectivity constraints
Information is required on how the proposed solution will minimise environmental
impact (power, cooling, recycling, hazardous materials, etc.), and what measures can
be put into place to manage future impact. Responses should take into account a five-
to ten-year cycle and include windows within which technology would be refreshed. The
SANReN-managed communications network must be utilised for any
external (“off-site”) networking activities (e.g., mirroring and backup).
5.2 High Level Functional Requirements
The overarching functional requirements of the data infrastructure are reliable and
secure storage, and sound management of datasets throughout the entire data
lifecycle (from acquisition to use, preservation or expunction). While many models of
such lifecycles exist, typical data life cycle stages are as shown in Figure 1. Two high-
level functional components can be distinguished:
Active Data Management supports the data intensive research mandate of DIRISA,
i.e., enabling the use and reuse of data for e-research (or e-science). Users should
be able, in real time, to deposit (upload) datasets in a well-controlled manner; to
search for or discover datasets and obtain metadata about them; to be
granted authorised access to datasets; and to download or stage datasets or portions of
datasets for processing or analysis.
Passive Data Management concerns the preservation or curation of datasets that
have potential research or national value.
Figure 1: Stages in a typical data lifecycle. A passive and active component can be
distinguished. Standards compliance processes are not shown.
The CHPC and SANReN are presently providing high performance computing and high
bandwidth connectivity resources and services that leverage these resources. The
storage facility should interface appropriately with these resources and services. An
example of a service environment interface for the end user functions is shown in Figure
2. Prevailing interactive cloud-based technologies could provide the needed services, in
which case specific details are required about securing information and data
sovereignty. Interoperability with external services is a further key aspect to consider in
the design of the system.
Figure 2: Example of a user interface showing core DIRISA functions. At least the three functions
DSubscribe, DataDrop and FindGet should be provided.
Typically, an end user subscribes to the system, i.e., becomes uniquely identifiable and
obtains access to the functions provided, using the DSubscribe function. The use of
Persistent Identifiers [1] (PIDs) to manage digital objects is an important requirement for
properly managing data assets. Upon deposit (DataDrop), a dataset is assigned a
unique PID together with other metadata that supports search and management. Data
may be structured or unstructured, and may be text, audio or video in any of a number of
formats, although open formats are preferred.
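The deposit step described above can be sketched as follows. This is a minimal illustration, not a specification of DataDrop: the `deposit` function, the `DepositRecord` fields and the use of a UUID as a stand-in for a PID are all assumptions. A production system would obtain identifiers from a persistent identifier service rather than generate them locally.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DepositRecord:
    """Hypothetical metadata record created when a dataset is deposited."""
    pid: str
    title: str
    depositor: str
    sha256: str
    size_bytes: int
    deposited_at: str
    keywords: tuple = field(default_factory=tuple)


def deposit(data: bytes, title: str, depositor: str, keywords=()) -> DepositRecord:
    """Assign an identifier and basic metadata to a deposited dataset.

    uuid4 is only a placeholder for a PID issued by an identifier service.
    """
    return DepositRecord(
        pid=f"pid:{uuid.uuid4()}",
        title=title,
        depositor=depositor,
        sha256=hashlib.sha256(data).hexdigest(),  # checksum for later integrity checks
        size_bytes=len(data),
        deposited_at=datetime.now(timezone.utc).isoformat(),
        keywords=tuple(keywords),
    )


record = deposit(b"example dataset", "Example", "researcher-01", ["demo"])
print(record.pid, record.sha256[:12])
```

Storing a checksum alongside the identifier lets the repository verify later that a stored object is still the one that was deposited, and the title and keywords give the discovery function something to search on.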
The FindGet function provides a data discovery service while SafeShare allows the user
to set and assign privacy and access criteria for users and groups through a policy
management environment. The DataStage service prepares selected datasets for
processing on CHPC computing resources.
[1] A PID is a globally unique computer-actionable string that references a digital object and is
intended to function for a long time. “Persistent” refers to the service that administers and
manages them, rather than the identifier itself.
6 SUBMISSION OF INFORMATION
Service providers may submit solutions in a format that they deem fit but consistent with
the flow of this RFI. Submissions should specify at least the following deliverables and
outcomes.
1. Architectural descriptions of the proposed system detailing the storage and
computing facility, as well as the network topology and security architecture;
2. Detailed description and specifications of the hardware equipment (rack space,
blades, physical CPUs and cores, memory, disk racks, switches and routers, etc.)
together with their specifications (performance, power consumption, speeds,
latencies, etc.) and upgrade capacities;
3. Detailed description of the systems level software, including middleware (operating
systems, file systems, physical and virtual operating environments);
4. Descriptions of software environments and services (application layer) supporting
management, administration, functional and business requirements;
5. Installation, configuration, deployment and delivery schedule together with unit and
system acceptance testing and validation regimes;
6. Warranty and licensing terms and conditions as well as a description of information
security and privacy functionality and services within the entire technology stack;
7. Near-time and in-time support and maintenance services for a minimum of three
years; user and system documentation and manuals;
8. Knowledge and skills transfer interventions, together with their outcomes and the
benefits of a collaborative relationship; and
9. Indicative pricing options detailing a bill of materials, installation, configuration,
testing and deployment costs. The Pricing Breakdown given in the Appendix should
be used as a basis.
7 VENUE FOR SUBMISSIONS
All information may be submitted via email to [email protected] or in sealed envelopes
at:
CSIR GATE 03 - Main Reception Area (in the Tender box) at the following address:
Council for Scientific and Industrial Research (CSIR)
Meiring Naudé Road
Brummeria
Pretoria
Submissions in sealed envelopes must include a hardcopy document as well as a digital
copy of the document in PDF format on a malware-free and Windows-accessible
flash drive. The hardcopy document shall take precedence where there are differences
between the printed and digital versions.
8 DEADLINE FOR SUBMISSION
Submissions must be delivered or emailed to the addresses mentioned above no
later than the closing date of Wednesday, 17 January 2018, during CSIR’s business
hours.
The CSIR business hours are between 08h00 and 16h30. Where a submission is not
received by the CSIR by the due date and at the stipulated place, it will be regarded as a
late submission. Late submissions will not be considered.
The last date for submission of queries is 12 January 2018.
9 MEDIUM OF COMMUNICATION
All documentation submitted in response to this RFI must be in English, unless indicated
otherwise.
10 COST OF INFORMATION
Service providers are expected to fully acquaint themselves with the conditions,
requirements and specifications of this RFI before submitting information. Each service
provider assumes all risks for resource commitment and expenses, direct or indirect, of
information preparation and participation throughout the RFI process. The CSIR is not
responsible directly or indirectly for any costs incurred by a service provider.
11 DISCLAIMER
This RFI is a request for information only and not an offer document; responses to it must
not be construed in any manner as acceptance of an offer or as implying the existence of a
contract between the parties. By submitting information, service providers shall be
deemed to have satisfied themselves with and to have accepted all Terms & Conditions
of this RFI. The CSIR makes no representation, warranty, assurance, guarantee or
endorsement to the service provider concerning the RFI, whether with regard to its
accuracy, completeness or otherwise, and the CSIR shall have no liability towards the
service provider or any other party in connection therewith.
12 APPENDIX: BREAKDOWN OF COST ESTIMATE
The following table can be used in submissions. Shaded cells should be completed. A
minimum of 3 years duration is assumed for all prices with subscription options. Standard list
pricing and additional options can be listed and detailed as needed.
Item Subtotal (ZAR) Total (ZAR)
1 System architectural design
2 Hardware and equipment 0.00
    Computing and servers
    Storage (minimum 8 PB)
    Networking and connectivity
3 Software 0.00
    Operating systems
    Middleware and operating environments
    System management and administration
    Application services and other
4 Delivery, installation, configuration and deployment
5 Warranties and licensing
6 Knowledge transfer, training and documentation
7 Supplier collaboration
8 Maintenance, support and technical services (minimum 3 years)
Total ZAR excluding VAT: 0.00
Less Discount:
Nett Price (Excl. VAT): 0.00