Date post: | 27-Jan-2015 |
Upload: | ramazan-firin |
This document is intended for only AVEA İletişim Hizmetleri A.Ş.("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are confidential and any disclosure, copying, distribution and/or taking any action in reliance with the content of this document is prohibited. AVEA is not liable for the transmission of this document in any manner to any third parties that are not authorised to receive.
Big Data – Hadoop - NoSQL and Graph Database
Ramazan FIRIN
20.11.2012
2
AGENDA
• Big Data
• Hadoop
• NoSQL
• Graph DB and Neo4j
• Possible Usage in Telco
• Demo
3
Executive Summary
R&D / MW Development, AVEA
• Big Data is a new IT trend
• Hadoop and NoSQL can be used to process Big Data
• Possible usage areas in telco:
– Preventing churn
– Offering customer-specific campaigns
– Acquiring more customers
4
What is Big Data?
Datasets that are too awkward to work with using traditional, hands-on database management tools.
5
Big Data - 3V Concept (Volume, Velocity, Variety)
6
Big Data Sources
1. Social network profiles - Facebook, LinkedIn, Yahoo, Google
2. Social influencers - blog comments, user forums, review sites
3. Activity-generated data - application logs, sensor data
4. Public - Wikipedia, IMDb, etc.
5. Data warehouse appliances - transactional data
6. Network and in-stream monitoring
7. Legacy documents—
7
Big Data To Smart Data
Cover of The Economist
8
Volume
9
New Data Sources - Internet
• 2 billion internet users by 2011
• Twitter processes 7 terabytes of data every day
• Facebook processes 10 terabytes of data every day
• 4.6 billion mobile phones
• Google processes 24 petabytes of data every day
10
Big Data Approach
11
Big Data Design
12
Big Data Usage Sector
13
Sample Usage - 360° View of the Customer
14
Sample Usage – Customer Sentiment
15
Sample Usage – Detect Churn Pattern
16
Sample Usage - Healthcare
17
Big Data Market
18
Big Data Solutions – Oracle Big Data Appliance
19
Big Data Solutions – IBM Pure Data
20
Top 10 Technology Trends 2012 from CSC
21
Gartner: Top 10 IT Trends for 2013
22
Gartner: 10 Critical IT Trends for the Next Five Years
• The third trend is bigger data and storage:
• By 2015, big data demand will generate 1 million jobs in the Global 1000,
• but only one-third of those jobs will be filled due to a shortage of talent.
• Analytics and pattern recognition are key.
• New specialized ARM-based servers are appearing for specialty analytics.
23
HADOOP
24
What is HADOOP?
The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets
across clusters of computers using simple programming models.
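The programming model most closely associated with Hadoop is MapReduce. The sketch below imitates its three stages (map, shuffle, reduce) for a word count in plain single-process Python. It is not the Hadoop API; the function names and the sample documents are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = []
    # On a real cluster each input split would be mapped on a different node;
    # here the documents are simply processed one after another.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))

# Hypothetical input documents.
counts = word_count(["big data hadoop", "hadoop nosql", "big graph"])
```

Because each map call only sees its own document and each reduce call only sees one key, both stages can be spread across many machines without shared state.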
25
History
26
Hadoop Components
27
28
Hadoop Ecosystem
Pig - data processing language that simplifies Hadoop programming
Hive - SQL-like queries
HBase - random read/write access to tables with billions of rows and millions of columns (NoSQL)
29
Other Google Research
30
NoSQL
31
RDBMS PERFORMANCE
32
Join is killer...
33
What is NoSQL?
• Stands for Not Only SQL
• Non-relational
• Cheap, easy to implement
• Scalability
– Vertically - add more resources (CPU, RAM, disk) to a single server
– Horizontally - add more servers to the cluster
• No pre-defined schema
• No join operations
• Typically not ACID; design trade-offs follow the CAP theorem
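One common way such stores scale horizontally is to partition keys across nodes by hashing. The sketch below shows simple modulo partitioning in Python. The node names and the `subscriber:` key format are hypothetical, and no specific product's algorithm is implied.

```python
import hashlib

def node_for_key(key, nodes):
    """Pick the node that owns a key using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

def distribute(keys, nodes):
    """Return a mapping of node -> list of keys stored on that node."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for_key(key, nodes)].append(key)
    return placement

# Hypothetical cluster and keys.
nodes = ["node-a", "node-b", "node-c"]
keys = [f"subscriber:{i}" for i in range(1000)]
placement = distribute(keys, nodes)
```

Modulo partitioning moves most keys when a node is added or removed, which is why many distributed stores use consistent hashing instead.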
34
NoSQL DB Types
1. Key-Value Stores
2. Document Databases
3. Column Family Stores
4. Graph Databases
35
Key-Value Stores
- Redis, Voldemort
36
Document Database
- CouchDB, MongoDB
37
Column Family Stores
- Cassandra, HBase
38
Graph Database
- Neo4J, InfoGrid, Infinite Graph
39
RDBMS Support ACID
• Atomicity - a transaction is all or nothing
• Consistency - only valid data is written to the database
• Isolation - concurrent transactions behave as if they were executed one after another
• Durability - once a transaction is committed, its changes survive failures
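Atomicity and consistency can be illustrated with Python's built-in `sqlite3` module. In this sketch the `account` table, the names, and the balances are hypothetical. A CHECK constraint plays the role of the validity rule, and a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount between accounts; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit
        # applied in the first UPDATE has been rolled back as well.
        return False

ok = transfer(conn, "alice", "bob", 30)       # valid transfer
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

The second transfer credits the target first and only then fails on the debit, so the unchanged balances afterwards show that the whole transaction was undone.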
40
NoSQL Support CAP Theorem
41
NoSQL Support CAP Theorem
• Consistency - each client always has the same view of the data.
• Availability - all clients can always read and write.
• Partition tolerance - the system keeps working even when network failures cut off communication between nodes
You can pick only two...
42
Visual Guide to NoSQL Systems
43
NoSQL Complexity
44
NoSQL Performance
45
Job Trends
46
Graph DB and Neo4j
47
Graph DB
A graph database uses graph structures with nodes, edges, and properties to represent and store data.
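To make the idea of querying by following edges concrete, the sketch below stores a small social graph as an adjacency list in Python and finds friends of friends with a breadth-first traversal. The names and the `knows` edges are hypothetical, and this is not how any particular graph database stores data internally.

```python
from collections import deque

# Hypothetical social graph: each key is a person, each value the people they know.
knows = {
    "Ali": ["Ayse", "Mehmet"],
    "Ayse": ["Ali", "Zeynep"],
    "Mehmet": ["Ali", "Zeynep", "Can"],
    "Zeynep": ["Ayse", "Mehmet"],
    "Can": ["Mehmet"],
}

def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning each reachable node and its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    del distance[start]
    return distance

# People exactly two hops away from Ali.
friends_of_friends = sorted(
    name for name, hops in within_hops(knows, "Ali", 2).items() if hops == 2
)
```

The traversal only touches the neighbourhood of the start node, whereas the equivalent relational query would typically join a friendship table with itself.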
48
Graph DB Usage Area
• Recommendations
• Business Intelligence
• Social networking
• Master Data Management (MDM)
• System Management
• Time Series data
• Product Catalogue
• Web Analytics
• Scientific Computing
• Indexing your slow RDBMS
49
Relational Databases are Graphs!
50
Neo4j
• Leading Graph Database
• Transaction support (ACID)
• Indexing
• Querying
• REST support
• Disk Based
• Open source
• Traversal framework
• High performance (traverses 1,000,000+ relationships per second)
• Robust (in 24/7 operation since 2003)
• Massive scalability
51
Neo4j Data Model
Neo4j has nodes and relationships.
Nodes and relationships have properties.
Node1 - properties: name, surname
Node2 - properties: name, surname
Relationship between Node1 and Node2 - type: knows; property: date of meeting
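The data model on this slide can be sketched in Python with two small classes. This is an illustration of the property-graph idea only, not Neo4j's API. The class names, the property values, and the direction of the relationship (Node1 to Node2) are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node carrying arbitrary key/value properties."""
    properties: dict = field(default_factory=dict)

@dataclass
class Relationship:
    """A typed, directed link between two nodes with its own properties."""
    start: Node
    end: Node
    rel_type: str
    properties: dict = field(default_factory=dict)

# Hypothetical values for the two nodes shown on the slide.
node1 = Node({"name": "Ali", "surname": "Yilmaz"})
node2 = Node({"name": "Ayse", "surname": "Demir"})

# The relationship has a type ("knows") and a property (date of meeting).
knows = Relationship(node1, node2, "knows", {"date_of_meeting": "2012-11-20"})
```

Keeping properties on the relationship itself is what lets a query ask not only who knows whom, but also since when.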
52
Neo4j Performance
http://www.neotechnology.com/2012/10/20-billion-relationships-imported-into-neo4j-on-ec2/
53
Who Uses Neo4j?
• Cisco - Master Data Management
• Telenor Group: Customer organization structure (203 million subscribers)
• Deutsche Telekom: Social football site (150 million subscribers)
54
Cypher for Queries
55
Sample Code
56
Spring Data Neo4j
57
Neoclipse
58
Product Catalog
59
Sample OM Data Model
60
Hardware Calculating Tool
61
Hardware Calculating Tool Result
Calculation Result for the Production Environment
• 4 physical machines
• 3 nodes on each machine
• 1024 MHz CPU
• 65536 MB RAM
62
OrientDB
• The document-graph database
• ACID support
• SQL and native queries
• Schema-less, schema-full and schema-mixed modes
• Roles + security
• Functions
• HTTP / RESTful / JSON / binary protocol support
• Hooks
• Fetch plans
• Inheritance
• 200,000 inserts per second (6 M node traversals with cache)
63
FluxGraph
• Temporal Graph Database
• Has checkpoint
• Compatible with Neo4j
64
Examples for Telcos
• CDR
• Routing
• Social graphs
• Master Data Management
• Spatial and LBS
• Network topology analysis
• Neo4j and Android
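As an illustration of the first item, call detail records (CDRs) map naturally onto a graph in which subscribers are nodes and calls are weighted edges. The Python sketch below aggregates a handful of records and ranks one subscriber's contacts. The phone numbers, the durations, and the record layout are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical call detail records: (caller, callee, duration in seconds).
cdrs = [
    ("5550001", "5550002", 120),
    ("5550001", "5550003", 30),
    ("5550002", "5550001", 60),
    ("5550001", "5550002", 45),
    ("5550004", "5550001", 300),
]

def build_call_graph(records):
    """Aggregate records into weighted edges: caller -> callee -> total seconds."""
    graph = defaultdict(Counter)
    for caller, callee, seconds in records:
        graph[caller][callee] += seconds
    return graph

def top_contacts(graph, subscriber, limit=3):
    """Return the subscriber's most-called numbers by total call duration."""
    return graph[subscriber].most_common(limit)

graph = build_call_graph(cdrs)
contacts = top_contacts(graph, "5550001")
```

The same structure could feed the social graph and churn scenarios listed in this deck, for example by checking whether a subscriber's strongest contacts have already left the network.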
65
CDR Analysis
66
Master Data Management
67
Network Management
68
Cell Network Analysis
69
Sample Scenarios
• Customer-specific campaigns
• Prevent churn
• Get more customers
• Special offers for campaigns
70
Thanks