
Oracle Big Data Cloud Service

Presented by: Mandeep Kaur Sandhu, Senior Oracle DBA

Download these slides from: mandysandhu.com

• Introduction to Big Data
• Oracle Big Data deployment models
• Oracle Big Data Cloud Service
• Core principles
• Access and admin tasks
• Data management tools
• Event Hub
• Conclusion

2

Goals

3

What is Big Data??

Diagram: the three Vs of Big Data – Volume (terabytes to zettabytes), Velocity (batch to streaming data), Variety (structured to structured and unstructured).

• Big Data is a term that describes large or complex datasets

• Traditional data processing systems fail to analyse such data

• Big Data identifies the value in the data

An open-source software platform for distributed storage and processing – highly scalable, reliable and available

4

What is Hadoop??

Hadoop

Logically Distributed file system

Framework for processing

Designed to run on small or large machines for parallel processing

Allows resource growth

Avoids vendor lock-in

HDFS and MapReduce

HDFS stores the data in the cluster

• NameNode
• DataNode
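
A quick sketch of interacting with HDFS from a cluster node (the directory and file names are illustrative placeholders, not from the slides):

# create an HDFS directory, load a local file into it, and list the result
hdfs dfs -mkdir -p /user/mandy
hdfs dfs -put /tmp/bigdata01.csv /user/mandy/
hdfs dfs -ls /user/mandy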

5

Two Components

Programming Model for processing large data sets

• Map – takes a set of data and converts it into another set of data
• Reduce – takes the output of Map as input and combines it into a smaller set

MapReduce
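
A minimal MapReduce sketch using Hadoop Streaming, where Map and Reduce are plain shell commands; the streaming jar path and HDFS directories below are assumptions that vary by installation:

# word count – the mapper emits one word per line, the framework sorts/groups by word,
# and the reducer counts consecutive identical words
hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -input /user/mandy/input \
  -output /user/mandy/wordcount_out \
  -mapper "tr -s ' ' '\n'" \
  -reducer "uniq -c"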

6

7

Oracle Big Data Deployment Models

• Oracle Big Data Cloud at Customer (BDCC) – the Oracle Big Data Cloud service model delivered in your data centre, behind your firewall

• Oracle Big Data Appliance X6 – on-premises engineered system designed to deliver predictable Hadoop infrastructure

• Oracle Big Data Cloud Service (BDCS) – Oracle public cloud infrastructure with cluster nodes and data sources

Operational efficiency
• Out-of-the-box installation
• Automated cluster management
• Cloudera Manager

Security
• Data is encrypted – at rest and in motion
• Authorization and authentication
• Network firewall

Versatility
• Cloudera distribution – Apache Hadoop Enterprise Data Hub
• Install and operate third-party software

8

BDCS - Core Principles

Highly efficient cluster management
• Fault tolerant – HA Hadoop infrastructure
• Fully tested Hadoop upgrades

Cluster nodes
• A cluster is a collection of nodes
• Permanent nodes
• Edge nodes
• Compute nodes

9

BDCS - Features

• Master or data node
• Lasts for the lifetime of the cluster
• Each node has:
  • 32 OCPUs
  • 256 GB RAM
  • 48 TB storage
  • Full Cloudera distribution – licence and support

10

Permanent Nodes

• Empty nodes – OS and disk
• Hadoop client configs
• Interface between the Hadoop cluster and the outside network
• Permanent node

Note: no DataNode role

11

Edge Nodes

• CPU and memory
• No disks
• Temporary nodes
• Need an existing cluster to add compute nodes
• A cluster can be extended with up to 15 compute nodes
• No HDFS data

12

Compute Nodes

• Oracle Linux 6 and Oracle Java – JDK 8
• Cloudera Enterprise (Data Hub Edition)
  • CDH 5.x with support for YARN and MR2
  • Cloudera Impala
  • HBase
  • Cloudera Search
  • Apache Spark
• Oracle R Distribution
• Oracle Big Data Spatial and Graph

13

BDCS – Included Software

Oracle Big Data SQL Cloud Service
• Unified SQL access
• Dedicated instances

14

BDCS – Additional Component

Diagram: Oracle Cloud with the Cloudera cluster and Oracle Database 12c.

• Log in to Oracle Cloud
• Choose Oracle Big Data Cloud Service
• Starter Pack 1 –> 3 nodes
• Additional nodes – added later
• Big Data SQL node

15

Oracle BDCS – Service Instance

• Go to the Oracle Big Data service instance
• Create a service cluster
• Provide tags and an instance name

16

Oracle BDCS – Service Cluster

• Select Big Data Appliance system – service instance
• SSH keys

17

Oracle BDCS – Service Cluster

Starter Pack 1 –> 3 instances
Lowest IP address –> master node

18

Oracle BDCS – Admin page

• You can connect via the opc user
• CLI – bdacli
• Overall information about the cluster
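
A hedged connection sketch (the key file, node IP and bdacli parameter are illustrative placeholders; available subcommands vary by release):

# SSH to a cluster node as the opc user, then query cluster details with bdacli
ssh -i ~/.ssh/mykey opc@<node-public-ip>
sudo bdacli getinfo cluster_name    # example parameter – check the bdacli documentation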

19

Oracle BDCS – Connect

• Open the Cloudera console
• Username/password

20

Access Cloudera console

• Add nodes in one-node increments – up to 60 nodes in total
• Four permanent Hadoop nodes – allows additional edge nodes
• Extend/shrink the service

21

Administrative Tasks

• Open Cloudera console – Hue

• Same account details as CM
• Add group
• Add user
• Upload file

22

Hue – Group/user and File upload

• GUI-based console
• Login username – bigdatamgr
• Explore jobs and stored data
• Usage and health of the cluster
• YARN jobs

23

Big Data Manager Console

• Zeppelin notebooks – interactive analysis using R and Python

24

Oracle Big Data Manager – Notebook

odcp
• Command-line tool for copying large files
• Takes the input and splits it into chunks
• Uses Spark to provide parallel transfer

Examples:

odcp hdfs:///user/mandy/bigdata01.csv hdfs:///user/mandy/bigdata01.csv_copy

odcp hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy

odcp hdfs:///user/mandy/bigdata01.csv s3://aserver/bigdata01.csv_copy

odcp s3://user/mandy/bigdata01.csv s3://mandy01/bigdata01.csv_copy

25

Data Management - odcp

odiff
• Oracle distribution diff – to compare large data sets
• Compatible with the Cloudera distribution
• Minimum block size to compare – 5 MB
• Maximum – 2 GB

Examples:

/usr/bin/odiff hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy

/usr/bin/odiff -V hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy

/usr/bin/odiff -d hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy

26

Data Management - odiff

bda-oss-admin
• To manage data and resources
• Can set environment variables
• Configure the cluster with a storage provider

Examples:

bda-oss-admin --cm-username admin --cm-password abce1234

bda-oss-admin restart_cluster

#!/bin/bash
export CM_ADMIN="my_CM_admin_username"

27

Data Management – bda-oss-admin

bdm-cli
• Big Data command-line interface to copy data and manage copy jobs
• Duplicates the odcp commands (see the example below)

bdm-cli copy

bdm-cli create_job
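
Since bdm-cli duplicates the odcp commands, a copy call can be modelled on the odcp examples above; the argument layout below is an assumption, not taken from the slides:

# copy a file from HDFS to an object storage container
bdm-cli copy hdfs:///user/mandy/bigdata01.csv swift://aserver.1234/bigdata01.csv_copy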

28

Data Management – bdm-cli

Direct ingest into Oracle Big Data Cloud Service

29

Data ingest options

Diagram: data from the customer data centre reaches Oracle BDCS over VPN or FastConnect – common ingests use Flume or ETL tools, or direct copy via SCP (SSH protocol).
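
A direct SCP ingest can be as simple as copying a file to a cluster node over SSH and loading it into HDFS (the host, key and paths are placeholders; adjust HDFS permissions as needed):

# copy the file to an edge node, then put it into HDFS
scp -i ~/.ssh/mykey /data/bigdata01.csv opc@<edge-node-ip>:/tmp/
ssh -i ~/.ssh/mykey opc@<edge-node-ip> 'hdfs dfs -put /tmp/bigdata01.csv /user/mandy/'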

• Open-source stream processing
• Real-time streaming
• High-throughput, low-latency platform

30

Apache Kafka

Diagram: common Kafka use cases – stream processing (IoT, anomaly detection), data integration (data lakes, HDFS, object storage), log aggregation (click streams, server logs) and messaging (traditional apps, micro-services).

• Fully managed streaming data platform
• Provides the world's most popular message broker (Kafka)
• Flexible
• Available in fully managed and dedicated deployment options
• Elastic – scales horizontally and vertically
• Access
  • REST API access
  • SSH access to the Kafka cluster (see the example below)
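
With SSH access to the Kafka cluster, the standard Kafka console tools can be used to publish and read a test topic (the broker host, port and topic name are placeholders; script names may or may not carry the .sh suffix depending on the distribution):

# write a few test messages, then read them back from the beginning of the topic
kafka-console-producer.sh --broker-list <broker-host>:9092 --topic test-topic
kafka-console-consumer.sh --bootstrap-server <broker-host>:9092 --topic test-topic --from-beginning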

31

Oracle Event Hub Cloud Service

• Start your Big Data journey now
• Build and populate a data lake
• Help the business solve problems by using data
• Register for an Oracle Cloud free trial

https://cloud.oracle.com/tryit

32

Conclusion

Thank you for your time!!

Follow and subscribe:

Blog: mandysandhu.com · Twitter: @mandysandhu14 · LinkedIn: kaurmandeep88