
Big Data in the Cloud? Yes, you can do it in OpenStack

Date post: 22-Jan-2018
Upload: obed-n-munoz

Big Data in the Cloud? Yes, you can do it in OpenStack

Hello!
I am Obed N Muñoz.
I am here because I love to give presentations.

-vvvv

Who am I?
- Software Engineer
- Musician
- Fast driver

Agenda
- Introduction: Cloud and OpenStack
- Data-Processing
- Sahara Project

1. Introduction
Cloud Computing and OpenStack

Cloud and XaaS Era
Everything as a Service

“The term cloud computing is used for a variety of services and applications that users access on demand over the Internet, as opposed to using them via on-premises means.”

OpenStack

“OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard, CLI, RESTful API ...”
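As one illustration of the RESTful API mentioned in the quote, the sketch below builds (without sending) a password-authentication request for Keystone, the OpenStack identity service, using only the Python standard library. The endpoint host, user, password, and project names are placeholder assumptions, not values from the talk, and the helper name `build_token_request` is hypothetical.

```python
import json
import urllib.request


def build_token_request(auth_url, username, password, project_name,
                        domain_name="Default"):
    """Build, but do not send, a Keystone v3 password-auth request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": domain_name},
                        "password": password,
                    }
                },
            },
            # Scope the token to one project so it can be used for that
            # project's compute, storage, and networking resources.
            "scope": {
                "project": {
                    "name": project_name,
                    "domain": {"name": domain_name},
                }
            },
        }
    }
    return urllib.request.Request(
        url=auth_url.rstrip("/") + "/v3/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values (assumptions): replace with a real endpoint and account.
req = build_token_request("http://controller.example:5000",
                          "demo", "secret", "demo")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` against a reachable Keystone endpoint should return the token in the `X-Subject-Token` response header. In practice a client library or the CLI usually handles this step.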

Architecture

2. Data-Processing
Data-Processing in the Cloud

What’s around Data-Processing?
◇ Big Data
◇ Data Science
◇ Cloud
◇ Machine Learning
◇ Pattern Recognition
◇ Neural Networks
◇ Etc ...

Data-Processing Technologies

3. Sahara Project
Data-Processing in OpenStack

OpenStack Sahara
The Sahara project provides a simple means to provision a data-intensive application cluster (Spark or Hadoop) on top of OpenStack.

https://wiki.openstack.org/wiki/Sahara

Architecture

http://docs.openstack.org/developer/sahara/architecture.html

Getting Started
- Clusters
- Templates
- Provisioning Plugins
- Image Registry
- Data Processing Frameworks
- Elastic Data Processing (EDP)

http://docs.openstack.org/developer/sahara/userdoc/edp.html
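To make the relationship between clusters, templates, and plugins more concrete, here is a small conceptual model in plain Python. It is not the Sahara API: the class names, field names, flavor names, process names, and the plugin version string are all illustrative assumptions. It only shows the idea that a cluster template names a plugin and combines node group templates with instance counts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeGroupTemplate:
    """Hypothetical model: one kind of node and the processes it runs."""
    name: str
    flavor: str
    node_processes: tuple


@dataclass
class ClusterTemplate:
    """Hypothetical model: a plugin plus node groups with instance counts."""
    name: str
    plugin_name: str
    plugin_version: str
    # Maps group name -> (NodeGroupTemplate, instance count).
    node_groups: dict = field(default_factory=dict)

    def add_group(self, template, count):
        if count < 1:
            raise ValueError("a node group needs at least one instance")
        self.node_groups[template.name] = (template, count)

    def total_instances(self):
        return sum(count for _, count in self.node_groups.values())


# Illustrative values only (assumptions, not taken from the talk).
master = NodeGroupTemplate("master", "m1.medium",
                           ("namenode", "resourcemanager"))
worker = NodeGroupTemplate("worker", "m1.large",
                           ("datanode", "nodemanager"))

template = ClusterTemplate("hadoop-demo", "vanilla", "2.7.1")
template.add_group(master, 1)
template.add_group(worker, 3)

print(template.total_instances())  # prints 4
```

A provisioning step would then take such a template, a registered image, and a keypair, and launch one instance per counted node. Those details depend on the chosen plugin and are left out here.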

More Features ...
- OpenStack Block Storage support
- Cluster Scaling
- Data locality
- Distributed Mode
- Hadoop HDFS High Availability
- Orchestration support
- …

Clusters (Hadoop)


Data-Processing Frameworks

- Hadoop
- Spark
- Storm


Provisioning Plugins
- Vanilla: Vanilla Apache Hadoop
- Ambari: Hortonworks Data Platform
- Spark: Apache Spark with Cloudera HDFS
- MapR Distribution: MapR plugin with MapR File System
- Cloudera: Cloudera Hadoop


Elastic Data Processing (EDP)
Allows the execution of jobs on clusters created by Sahara. It supports:
- Hive, Pig, MapReduce.Streaming, Java, and Shell job types on Hadoop clusters
- Spark jobs
- Job binaries stored in the Shared File Systems service (manila) or Sahara's own database
- Access to input and output data sources in:
  - HDFS
  - Swift
  - Manila

http://docs.openstack.org/developer/sahara/userdoc/edp.html
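The EDP rules on this slide can be sketched as a small pre-submission check. This is a simplified reading of the slide, not Sahara's own validation: the function name `check_job`, the cluster kinds, the URL examples, and the assumption that Spark jobs run only on Spark clusters are all hypothetical.

```python
from urllib.parse import urlparse

# Job types per cluster kind, as listed on the slide (simplified assumption).
JOB_TYPES_BY_CLUSTER = {
    "hadoop": {"Hive", "Pig", "MapReduce.Streaming", "Java", "Shell"},
    "spark": {"Spark"},
}

# Data source back ends named on the slide, expressed as URL schemes.
DATA_SOURCE_SCHEMES = {"hdfs", "swift", "manila"}


def check_job(cluster_kind, job_type, input_url, output_url):
    """Return a list of problems; an empty list means the job looks valid."""
    problems = []
    allowed = JOB_TYPES_BY_CLUSTER.get(cluster_kind, set())
    if job_type not in allowed:
        problems.append(
            f"{job_type} jobs are not listed for {cluster_kind} clusters")
    for label, url in (("input", input_url), ("output", output_url)):
        scheme = urlparse(url).scheme
        if scheme not in DATA_SOURCE_SCHEMES:
            problems.append(
                f"{label} data source uses unsupported scheme {scheme!r}")
    return problems


# Illustrative URLs (assumptions): a Swift container in, HDFS out.
print(check_job("hadoop", "Pig",
                "swift://demo-container/input",
                "hdfs://namenode/output"))  # prints []
```

A call such as `check_job("spark", "Hive", "swift://demo-container/input", "file:///tmp/out")` would report two problems: the job type does not match the cluster kind, and the output location is not one of the listed data sources.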

Resources
- Documentation
  - http://docs.openstack.org/developer/sahara/
  - https://wiki.openstack.org/wiki/Sahara
- Hadoop/Spark Images
  - http://sahara-files.mirantis.com/images/upstream/mitaka/
- OpenStack Auto-deployment with RDO
  - https://www.rdoproject.org/install/quickstart/
- Videos
  - https://www.youtube.com/watch?v=idAaLo1stbw
  - https://www.youtube.com/watch?v=TgPTjrf1y0A


http://hackathon.openstackgdl.org/

Q & A
CONCLUSION

Thanks!
Any questions?
You can find me at:
◇ @obedmr
◇ [email protected]

