2014 © Trivadis
BASEL BERN BRUGG GENF LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Big Data Infrastructure. Appliance, Cloud, or Do-it-Yourself. Daniel Steiger Discipline Manager Infrastructure Engineering
DOAG Jahreskonferenz 2014 Big Data Infrastructure
Trivadis is a leader in IT consulting, system integration, solution engineering, and the delivery of IT services, with a focus on … technologies in the D-A-CH region. Our strategic business fields...
Our Company
With over 600 IT and subject-matter experts on site with you
12 Trivadis locations with more than 600 employees
200 service level agreements
More than 4,000 training participants
Research and development budget: CHF 5.0 million / EUR 4.0 million
Financially independent and sustainably profitable
Experience from more than 1,900 projects per year for over 800 customers
(As of 12/2013)
Hamburg
Düsseldorf
Frankfurt
Freiburg München Wien
Basel Zürich Bern
Lausanne
Stuttgart
Brugg
1. Big Data Infrastructure Challenges
2. Hadoop on an Appliance
3. Hadoop in the Cloud
4. Hadoop Do-it-Yourself
5. Conclusion
Agenda
Big Data Infrastructure Challenges
Trailwise – a "quantified self" use case
47'295 data points rendered in 643ms
11'000 data points rendered in 165ms
Trailwise – Infrastructure for a Proof of Concept
§ Hadoop HDFS as data store
§ HBase for real-time data access
§ Hadoop Map/Reduce
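The three building blocks above combine into a simple pipeline: raw track points land in HDFS, Map/Reduce aggregates them, HBase serves the results. A minimal sketch of such an aggregation follows; the field names (trail_id, lat, lon, ts) are illustrative assumptions, not the actual Trailwise schema. With Hadoop Streaming, the map and reduce steps would read stdin and emit tab-separated key/value lines; here they are plain functions so the flow is easy to follow.

```python
from itertools import groupby
from operator import itemgetter

def map_points(lines):
    """Emit (trail_id, 1) for every raw CSV record: trail_id,lat,lon,ts."""
    for line in lines:
        trail_id, lat, lon, ts = line.strip().split(",")
        yield trail_id, 1

def reduce_counts(pairs):
    """Sum the counts per trail_id (pairs must be sorted by key, which the
    Hadoop shuffle phase guarantees; here we sort explicitly)."""
    for trail_id, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield trail_id, sum(count for _, count in group)

records = [
    "eiger-trail,46.57,8.00,1400000000",
    "eiger-trail,46.58,8.01,1400000060",
    "rigi-trail,47.05,8.48,1400000000",
]
counts = dict(reduce_counts(map_points(records)))
print(counts)  # {'eiger-trail': 2, 'rigi-trail': 1}
```

In the real PoC the reducer output would be written back into an HBase table keyed by trail, giving the millisecond-range rendering times shown on the previous slide.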
For a proof of concept, Hadoop in the cloud (e.g. on Amazon EC2) is perfect...
+ Fast and easy deployment
+ Optimized Hadoop/HBase setup
+ HBase real-time performance
+ Map/Reduce scalability
+ Affordable, ca. EUR 15.-/day

Concerns...
§ Scalability
§ Costs for "always up"
§ Setup and administration of a large cluster on AWS
§ Break-even cloud vs. on-premise
Trailwise – Infrastructure Lessons Learned
§ Big Data means big data volume
  § Petabytes and exabytes
§ Scalability
  § 10, 20, 50, 100, ... cluster nodes
  § Costs should scale as well...
§ High demands on machine-to-machine networks
  § In Big Data, for every one client interaction there may be hundreds or thousands of server and data node interactions
  § This generates far more east-west (server-to-server or server-to-storage) network traffic than north-south (server-to-client or server-to-outside) network traffic
§ And many others, like integration, data protection, operation, etc.
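The east-west amplification can be made concrete with a back-of-the-envelope sketch: a single client write of D bytes into HDFS with replication factor R is pipelined to R data nodes, so roughly (R - 1) x D bytes cross the machine-to-machine network on top of the D bytes arriving from the client. The numbers below are illustrative, not measurements from the talk.

```python
def east_west_bytes(client_bytes, replication=3, first_replica_local=True):
    """Server-to-server traffic generated by one client write.

    If the writing process runs on a data node, the first replica is written
    locally and only (replication - 1) copies traverse the cluster network.
    """
    hops = replication - 1 if first_replica_local else replication
    return client_bytes * hops

north_south = 1 * 10**12                 # client writes 1 TB (north-south)
east_west = east_west_bytes(north_south, replication=3)
print(east_west / north_south)           # 2.0 -> twice the client traffic
                                         # stays inside the cluster
```

Reads fan out similarly when many mappers pull blocks from remote nodes, which is why the network, not the clients, becomes the scaling bottleneck.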
Big Data Infrastructure Challenges
§ Infrastructure must be engineered to scale
§ The network has to provide high bandwidth, low latency, and should scale seamlessly with Hadoop clusters to provide predictable performance
§ And many more, like
  § Integration with operational data systems
  § Authentication, authorization, encryption
  § Centralized management
Infrastructure Requirements
Embedded source excerpt (Barroso/Hölzle, The Datacenter as a Computer, Section 1.6.1, Storage; the page also shows Figure 1.2, a row of servers in a Google WSC, 2012):

Disk drives or Flash devices are connected directly to each individual server and managed by a global distributed file system (such as Google's GFS), or they can be part of Network Attached Storage (NAS) devices directly connected to the cluster-level switching fabric. A NAS tends to be a simpler solution to deploy initially, because it allows some of the data management responsibilities to be outsourced to a NAS appliance vendor. Keeping storage separate from computing nodes also makes it easier to enforce quality-of-service guarantees, since the NAS runs no compute jobs besides the storage server. In contrast, attaching disks directly to compute nodes can reduce hardware costs (the disks leverage the existing server enclosure) and improve networking fabric utilization (each server network port is effectively dynamically shared between the computing tasks and the file system).

The replication model between these two approaches is also fundamentally different. A NAS tends to provide high availability through replication or error-correction capabilities within each appliance, whereas systems like GFS implement replication across different machines and consequently use more networking bandwidth to complete write operations. However, GFS-like systems can keep data available even after the loss of an entire server enclosure or rack, and may allow higher aggregate read bandwidth, because the same data can be sourced from multiple replicas.
Will my infrastructure meet my needs now and in the future without putting my business at risk?
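The replication trade-off described above can be quantified: HDFS-style 3x replication turns raw capacity into usable capacity at a 3:1 ratio, but survives the loss of whole nodes or racks. The figures below match the node specs quoted later in the deck (18 nodes x 48 TB raw per full rack); the 3x factor is the HDFS default, an assumption for this sketch.

```python
def usable_capacity_tb(nodes, raw_tb_per_node, replication=3):
    """Usable distributed-file-system capacity under n-way replication."""
    return nodes * raw_tb_per_node / replication

raw = 18 * 48                          # 864 TB raw in a full rack
usable = usable_capacity_tb(18, 48)    # 288 TB usable with 3x replication
print(raw, usable)
```

A NAS with internal RAID would lose far less raw capacity, which is exactly the bandwidth-versus-capacity trade the excerpt describes.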
Where to Deploy your Hadoop Cluster?
When enterprises adopt Hadoop, one of the decisions they must make is the deployment model. There are four options as illustrated in Figure 1:
§ On-premise full custom. With this option, businesses purchase commodity hardware, then install the software and operate it themselves. This option gives businesses full control of the Hadoop cluster.
§ Hadoop appliance. This preconfigured Hadoop cluster allows businesses to bypass detailed technical configuration decisions and jumpstart data analysis.
§ Hadoop hosting. Much as with a traditional ISP model, organizations rely on a service provider to deploy and operate Hadoop clusters on their behalf.
§ Hadoop-as-a-Service. This option gives businesses instant access to Hadoop clusters with a pay-per-use consumption model, providing greater business agility.
To determine which of these options presents the right deployment model, organizations must consider five key areas. The first is the price-performance ratio, and it is the focus of this paper. The Hadoop-as-a-service model is typically cloud-based and uses virtualization technology to automate deployment and operation processes (in comparison, the other models typically use physical machines directly).
Two divergent views have existed regarding the price-performance ratio of Hadoop deployments. One view is that a virtualized Hadoop cluster is slower, because Hadoop's workload has intensive I/O operations, which tend to run slowly in virtualized environments. The other view is that the cloud-based model provides compelling cost savings, because its individual server nodes tend to be less expensive; furthermore, Hadoop is horizontally scalable.
The second area of consideration is data privacy, which is a common concern when storing data outside of corporate-owned infrastructure. Cloud-based deployment requires a comprehensive cloud-data privacy strategy that encompasses areas such as proper implementation of legal requirements, well-orchestrated data-protection technologies, as well as the organization’s culture with regard to adopting emerging technologies. Accenture Cloud Data Privacy Framework outlines a detailed approach to help clients address this issue.
The third area is data gravity. Once data volume reaches a certain point, physical data migration becomes prohibitively slow, which means that many organizations are locked into their current data platform. Therefore, the portability of data, the anticipated future growth of data, and the location of data must all be carefully considered.
A related and fourth area is data enrichment, which involves leveraging multiple datasets to uncover new insights. For example, combining a consumer’s purchase history and social-networking activities can yield a deeper understanding of the consumer’s lifestyle and key personal events and therefore enable companies to introduce new services and products of interest. The primary challenge is that the storage of these multiple datasets increases the volume of data, resulting in slow connectivity. Therefore, many organizations choose to co-locate these datasets. Given volume and portability considerations, most organizations choose to move the smaller datasets to the location of the larger ones. Thus, thinking strategically about where to house your data, considering both current and future needs, is key.
The fifth area is the productivity of developers and data scientists. They tap into the datasets, create a “sandbox” environment, explore the data analysis ideas, and deploy them into production. Cloud’s self-service deployment model tends to expedite this process.
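The five areas lend themselves to a simple weighted-scoring comparison of the deployment options. All weights and scores below are made-up placeholders (every organization must supply its own), but the mechanics show how such a decision can be structured and made repeatable.

```python
AREAS = ["price_performance", "data_privacy", "data_gravity",
         "data_enrichment", "productivity"]

def score(option_scores, weights):
    """Weighted sum over the five decision areas (scores on a 1..5 scale)."""
    return sum(weights[a] * option_scores[a] for a in AREAS)

# Hypothetical weights reflecting one organization's priorities.
weights = {"price_performance": 0.3, "data_privacy": 0.25,
           "data_gravity": 0.2, "data_enrichment": 0.15, "productivity": 0.1}

# Hypothetical scores for two of the four deployment options.
options = {
    "on_premise_full_custom": {"price_performance": 3, "data_privacy": 5,
                               "data_gravity": 4, "data_enrichment": 2,
                               "productivity": 2},
    "hadoop_as_a_service":    {"price_performance": 4, "data_privacy": 2,
                               "data_gravity": 2, "data_enrichment": 4,
                               "productivity": 5},
}

ranking = sorted(options, key=lambda o: score(options[o], weights),
                 reverse=True)
print(ranking)
```

With these placeholder numbers the privacy-heavy weighting favors on-premise; shifting weight toward productivity flips the result, which is the point of making the trade-off explicit.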
Figure 1. The spectrum of Hadoop deployment options (bare metal to cloud): on-premise full custom, Hadoop appliance, Hadoop hosting, Hadoop-as-a-Service
Reference: Hadoop Deployment Comparison Study, Price-Performance Comparison, Accenture Technology Labs, 2013
Hadoop on an Appliance Oracle Big Data Appliance
Overview: Oracle's Big Data Solution
§ A complete and optimized solution for big data
§ Tight integration with Exadata, Exalogic, Exalytics and SPARC Supercluster using Infiniband network
§ Single-vendor support for both hardware and software
Full Rack Configuration (up to 18 racks)
§ 18 x compute/storage nodes
Per Node:
§ 2 x Eight-Core Intel ® Xeon ® E5-2650 V2 Processors
§ 64 GB Memory (up to 512 GB)
§ 48 TB Raw Storage Capacity
§ 40 Gb/sec Infiniband Network
§ 10 Gb/sec Data Center Connectivity
Oracle Big Data Appliance X4-2 HW
Source: Oracle®
Oracle Big Data Appliance Internal Network Connectivity
Source: Oracle Big Data Appliance: Datacenter Network Integration, Oracle White Paper, 2012
§ Oracle R Distribution
§ Oracle NoSQL DB Community Ed.
§ BDA Enterprise Manager Plug-In
§ Optional Software*
  § Oracle Big Data SQL
  § Oracle Big Data Connectors
  § Oracle Audit Vault & Database Firewall for Hadoop Auditing
  § Oracle Data Integrator
  § Oracle NoSQL Database EE
§ Oracle Linux 6.4 with UEK
§ Oracle Java JDK 7
§ Cloudera Enterprise Data Hub Edition
  § Apache Hadoop HDFS
  § HBase
  § Cloudera Impala
  § Cloudera Search
  § Cloudera Manager
  § Apache Spark
Big Data Appliance Software Stack
*Connectors are licensed separately from Oracle Big Data Appliance
§ Oracle R Support for Big Data
  § R is an open-source language and environment for statistical analysis and graphing
  § The standard R distribution is installed on all nodes of Oracle Big Data Appliance
  § Oracle R Connector for Hadoop provides R users with high-performance, native access to HDFS and the MapReduce programming framework
  § Oracle R Enterprise is a separate package that provides real-time access to Oracle Database
§ Oracle NoSQL Database
  § Oracle NoSQL Database is a distributed key-value database built on the storage technology of Berkeley DB Java Edition
  § An intelligent driver on top of Berkeley DB keeps track of the underlying storage topology, shards the data, and knows where data can be placed with the lowest latency
BDA Specific Software Features
§ Oracle SQL Connector for HDFS
§ Oracle Loader for Hadoop
§ Oracle R Connector for Hadoop
§ Oracle Data Integrator Application Adapter for Hadoop
§ Data in HDFS (and NoSQL) is accessible through the relational database external table mechanism (HDFS as a cluster file system)
*The connectors are licensed separately from Oracle Big Data Appliance
Oracle Big Data Connectors
Source: Oracle®. Reference: Oracle Big Data Connectors Data Sheet
Oracle Big Data SQL: one tool for all data sources
Reference: https://www.oracle.com/webfolder/s/delivery_production/docs/FY15h1/doc6/1-T2-BigData.pdf
§ Oracle Big Data Lite VM
  § http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
§ MOS Notes
  § Information Center: Oracle Big Data Appliance (Doc ID 1445762.2)
  § Big Data Connectors (Doc ID 1487399.2)
  § Sqoop Frequently Asked Questions (FAQ) (Doc ID 1510470.1)
Oracle Big Data Appliance Resources
Hadoop in the Cloud
There are five key areas to consider when choosing the right deployment model*:
*Public cloud, private cloud, community cloud, or hybrid cloud
Deployment Considerations
Out of these five key areas, Accenture assessed the price-performance ratio between bare-metal Hadoop clusters and Hadoop-as-a-Service on Amazon Web Services™. (A bare-metal Hadoop cluster refers to a Hadoop cluster deployed on top of physical servers without a virtualization layer. Currently, it is the most common Hadoop deployment option in production environments.)

For the experiment, we first built the total cost of ownership (TCO) model to control the two environments at a matched cost level. Then, using the Accenture Data Platform Benchmark as real-world workloads, we compared the performance of both a bare-metal Hadoop cluster and Amazon Elastic MapReduce (Amazon EMR™). Employing these empirical and systematic analyses, Accenture's study revealed that Hadoop-as-a-Service offers the better price-performance ratio. This result debunks the idea that the cloud is not suitable for Hadoop MapReduce workloads, with their heavy I/O requirements. Moreover, the benefit of performance tuning is so large that the cloud's virtualization-layer overhead is a worthy investment, as it expands performance-tuning opportunities. Lastly, despite the sizable benefit, the performance-tuning process is complex and time-consuming and thus requires automated tuning tools. The results are explored in detail in our full study, "Hadoop Deployment Comparison Study".
Five key areas to consider when choosing the right deployment model: price-performance ratio, data privacy, data gravity, data enrichment, and the productivity of developers and data scientists.
Reference: Where to Deploy your Hadoop Cluster?, Executive Summary, Accenture Technology Labs, 2013
EC2 Instance for Hadoop/MapReduce
Storage optimized – current generation
§ Instance "hs1.8xlarge" § 16 vCPUs (Intel Xeon) § 117GB RAM § 24 x 2000GB = 48TB § 10 Gigabit network
§ MapR as option § M3, M5 or M7 edition
Amazon EMR with the MapR Distribution for Hadoop
Reference: http://aws.amazon.com/elasticmapreduce/mapr/
Costs for "hs1.8xlarge" Instance
§ Medium Utilization Reserved Instances
  § 1-year term: upfront $9'200, $1.809 per hour
  § 3-year term: upfront $14'109, $1.581 per hour
§ Data Transfer IN to Amazon EC2 from internet: $0.0 per GB
§ Data Transfer OUT from Amazon EC2 to internet: $0.12 per GB up to 10TB/month ($120 per TB)
§ MapR M7: $1.49 per Hour
§ Total: approx. $2'600/month, $31'200/year (at 24 h/day, 365 days/year utilization)
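The monthly total can be reproduced from the listed prices. The sketch below assumes the 3-year reserved term (upfront amortized over 36 months) and a 720-hour month; actual AWS billing details may differ.

```python
HOURS_PER_MONTH = 720                 # 24 h x 30 days, an approximation

upfront_3y = 14109                    # USD, 3-year reserved term upfront
instance_per_hour = 1.581             # USD, hs1.8xlarge reserved rate
mapr_m7_per_hour = 1.49               # USD, MapR M7 software charge

monthly = (upfront_3y / 36
           + (instance_per_hour + mapr_m7_per_hour) * HOURS_PER_MONTH)
yearly = monthly * 12
print(round(monthly), round(yearly))  # ~2603 and ~31236, i.e. the
                                      # ~$2'600/month, ~$31'200/year above
```

Running the same arithmetic with the 1-year term gives a noticeably higher monthly figure, which is why the total quoted here corresponds to the 3-year commitment.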
Amazon EMR with the MapR Distribution for Hadoop
Hadoop on Do-It-Yourself Infrastructure
Do-it-Yourself (experimental setup)
Source: http://blog.ittoby.com/
HP ProLiant DL380p Gen8
§ 2 x Eight-Core Intel ® Xeon ® E5-2650 V2
§ 64 GB Memory (up to 512 GB)
§ 48 TB Raw Storage Capacity
§ 40 Gb/sec Infiniband Network
§ 10 Gb/sec Data Center Connectivity
§ About $20'000 + Rack + Network + Work
Do-it-Yourself (enterprise class setup)
Source: Technical white paper, HP Reference Architecture for MapR M5
This section specifies which server to use and the rationale behind it. The Reference Architectures section provides topologies for the deployment of control and worker services across the nodes for clusters of varying sizes.

Processor configuration. MapR manages the amount of work each server is able to undertake via the number of Map/Reduce slots configured for that server. The more cores available to the server, the more Map/Reduce slots can be configured for it (see the Computation section for more detail). We recommend 6-core processors for a good balance of price and performance, and that Hyper-Threading is turned on.

Drive configuration. Redundancy is built into the MapR architecture, so there is no need for RAID or additional hardware components to improve redundancy on the server; it is all coordinated and managed in the MapR software. Drives should use a Just a Bunch of Disks (JBOD) configuration, which can be achieved with the HP P420 RAID controller by configuring each individual disk as a separate RAID 0 volume. We recommend disabling array acceleration on the controller to better handle large block I/Os in the Hadoop environment. Lastly, servers should provide a large amount of storage capacity, which increases the total capacity of the distributed file system; provide that capacity with at least twelve 2 TB Large Form Factor (LFF) drives for optimum I/O performance. The DL380e supports 14 LFF drives, which allows one either to use all 14 drives for data or to use 12 drives for data and the additional 2 for mirroring the operating system and the MapR runtime. Hot-pluggable drives are recommended so that drives can be replaced without restarting the server.

Memory configuration. Servers running the node processes should have sufficient memory for either HBase or the number of Map/Reduce slots configured on the server. A server with a larger RAM configuration will deliver optimum performance for both HBase and Map/Reduce. To ensure optimal memory performance and bandwidth, we recommend using 8 GB or 16 GB DIMMs to populate each of the 6 memory channels as needed.

Network configuration. The DL380e includes four 1GbE NICs onboard. MapR automatically identifies the available NICs on the server and bonds them via the MapR software to increase throughput. Each of the reference architecture configurations specifies an additional top-of-rack switch for redundancy. To best make use of this, we recommend cabling the ProLiant DL380e worker nodes so that NIC 1 is cabled to switch 1 and NIC 2 to switch 2, repeating the same process for NICs 3 and 4. Each NIC in the server should have its own IP subnet instead of sharing a subnet with other NICs.

HP ProLiant DL380e Gen8. The HP ProLiant DL380e Gen8 (2U) is an excellent choice as the server platform for the worker nodes. (Figure 6: HP ProLiant DL380e Gen8 Server.)
§ Cloudera Enterprise Data Hub Edition 5.x
§ ca. $2'500/node + support
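The drive recommendations above translate directly into cluster sizing: each worker contributes 12 x 2 TB of distributed-file-system capacity (the other two bays hold the OS/runtime mirror), which default 3x replication divides by three. The helper below estimates the worker count for a target amount of usable capacity; the 3x replication factor and the 25% free-space headroom are assumptions for this sketch, not HP's figures.

```python
import math

DATA_DRIVES = 12          # of 14 LFF bays; 2 reserved for OS/MapR mirror
DRIVE_TB = 2              # 2 TB LFF drives per the reference architecture
REPLICATION = 3           # assumed replication factor

def workers_needed(usable_tb, headroom=0.25):
    """Worker nodes required for a target of usable (post-replication) TB."""
    per_node = DATA_DRIVES * DRIVE_TB / REPLICATION   # 8 TB usable per node
    return math.ceil(usable_tb * (1 + headroom) / per_node)

print(workers_needed(100))   # 100 TB usable + 25% headroom -> 16 workers
```

At roughly $20'000 per node plus the per-node Cloudera/MapR subscription, this kind of estimate is the starting point for the DIY cost comparison at the end of the deck.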
Conclusion
Appliance, Cloud or DIY?

Oracle BDA
+ High-performance, scalable network architecture
+ Highly integrated into the Oracle ecosystem
+ Complete software stack, Oracle & Hadoop
+ Single point of support
+ Competitive price/performance ratio for enterprise-class demands

Amazon EC2 Instances
+ Fast and easy deployment
+ Scales from very small to very large cluster setups
+ Capacity on demand on an hourly basis
+ Optional enterprise-class Hadoop distribution
+ Interesting price model for volatile utilisation and capacity on demand

Do it Yourself
+ Low entry point
+ Free choice of hardware
+ Free choice of software stack
§ Building an enterprise-class Hadoop infrastructure is a challenge
§ Analysing and prioritizing your requirements (business and IT) is crucial
§ Start "small & fast" with a proof of concept
§ Consider the various deployment models (on-premise, appliance, IaaS, PaaS, HaaS, ...)
§ The Oracle Big Data Appliance is a very competitive offering, especially as an extension to your existing Oracle operational data systems
Conclusion
Thank you. Daniel Steiger, Discipline Manager Infrastructure Engineering
Tel: +41 58 459 50 88 [email protected]
Trivadis at DOAG: level 3, right next to the escalator
We look forward to your visit. Because with Trivadis, you always win.
Cost comparison
Attribute           | Oracle BDA    | Amazon EMR   | DIY
Type                | X4-2          | hs1.8xlarge  | DL-380
CPU                 | 2x8-Core      | 16 vCPU      | 2x8-Core
RAM                 | 64 GB         | 117 GB       | 64 GB
Storage             | 48 TB         | 48 TB        | 8 TB
Network             | 10 GB / 40 GB | 10 GB        | 10 GB / 40 GB
Hadoop Distr.       | Cloudera      | MapR         | Cloudera
Price / year        | 525'000       | 562'256      | 405'000
Maintenance / year  | 63'000        | -            | 40'000
Total year 1        | 588'000       | 562'256      | 445'000
Total 3 years       | 714'000       | 1'686'768    | 525'000
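The three-year totals in the table follow from the yearly figures; this sketch replays that arithmetic (all amounts as listed, currency as in the source). The table's "price" behaves as a one-off acquisition cost for the BDA and DIY options with maintenance recurring yearly, while EMR has no separate maintenance line because support is part of the hourly price; that reading is an assumption inferred from the numbers.

```python
def total_cost(acquisition, maintenance_per_year, years):
    """Hardware/software paid once up front, maintenance recurs yearly."""
    return acquisition + maintenance_per_year * years

bda_3y = total_cost(525_000, 63_000, 3)   # 714'000
diy_3y = total_cost(405_000, 40_000, 3)   # 525'000
emr_3y = 562_256 * 3                      # 1'686'768, pure pay-per-use
print(bda_3y, diy_3y, emr_3y)
```

The crossover is visible immediately: at 24/365 utilization the cloud option costs more than twice either on-premise option over three years, whereas for short-lived or bursty workloads the pay-per-use model wins.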