State of the Art of NoSQL Technologies, Sergio Rodríguez

Post on 02-Dec-2014

description

In a short time, NoSQL has become one of the hot topics for handling information in Web, mobile, and Internet of Things environments. This has given rise to a new software segment, NoSQL databases, with many vendors and widely varying architectures and features.

transcript

NoSQL

[ { "name": "Sergio Rodríguez de Guzmán", "email": "sergio@corenetworks.es", "department": "training" } ]

BUSINESS LINES

IT TRAINING, PRODUCT DEVELOPMENT, CONSULTING, IDENTITY AND ACCESS MANAGEMENT

BIG DATA

Official IT training center.

Agreements with Cloudera, Oracle, MongoDB, Apple, IBM, VMware, Microsoft, Red Hat, Cisco…

On-site, Virtual, Online

Customization per project

Consulting, Integration, and Support.

Technologies: Oracle, Microsoft, Red Hat, ForgeRock, OWS2, Open source…

24x7 support, turnkey projects

Big Data experts since 2011

Agreements with Cloudera, Oracle, MongoDB

Predefined service packages for ETL, Architecture, Security, Descriptive Analytics, and Production Certification

Identity and Access Management solutions.

Technologies: Oracle, Microsoft, Red Hat, ForgeRock, OWS2, Open source…

24x7 support, turnkey projects

MDM projects

Security and auditing solutions for the data center

Multivendor patch management and updates for the data center

Mobile application development

Timeline, 1980 to 2010: Rise of Relational

Persistence, Integration, SQL, Transactions, Reporting

IMPEDANCE MISMATCH

Timeline, 1980 to 2010: Rise of Object Databases

Billing

Inventory

Integration Database

Timeline, 1980 to 2010: Relational Dominance

Lots of Traffic

SQL SQL

BigTable

Dynamo

“NoSQL”

Johan Oskarsson

London

San Francisco

#nosql

Dynomite

Characteristics of NoSQL

Non-relational

Open Source

Schema-less

Cluster-friendly

21st Century Web

DATA MODEL

DOCUMENT

GRAPH

COLUMN

KEY-VALUE

NoSQL

KEY-VALUE

10025

10026

10043

10048

DOCUMENT

{ "id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"] }

{ "id": 1, "name": "A blue door", "price": 14.50, "tags": ["home", "blue"], "discount": true }

No schema

anOrder["price"] * anOrder["quantity"]

Implicit schema
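The implicit-schema point can be sketched in a few lines of Python. This is only an illustration: the `orders` list, the `order_total` helper, and the `broken` document are hypothetical, and a plain list stands in for a schemaless store.

```python
# Two documents from a hypothetical "orders" collection. The second one has
# an extra field; a schemaless store would accept both without complaint.
orders = [
    {"id": 1, "price": 12.50, "quantity": 2},
    {"id": 2, "price": 14.50, "quantity": 1, "discount": True},
]


def order_total(an_order):
    # The code, not the database, assumes "price" and "quantity" exist.
    # That assumption is the implicit schema.
    return an_order["price"] * an_order["quantity"]


totals = [order_total(o) for o in orders]

# A document that breaks the implicit schema is stored without error and
# only fails later, when the application reads it.
broken = {"id": 3, "cost": 9.99, "quantity": 1}
try:
    order_total(broken)
    schema_error = None
except KeyError as exc:
    schema_error = exc.args[0]  # name of the missing field
```

The schema has not disappeared; it has moved from the database into every piece of code that reads the data.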

KEY-VALUE

10025

10026

10043

DOCUMENT

{ "id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"] }

{ "id": 1, "name": "A blue door", "price": 14.50, "tags": ["home", "blue"], "discount": true }

customer_id: 7231

metadata

key

Key-Value Document

Aggregate-Oriented

Aggregate

Order

Line Item
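A minimal sketch of an aggregate, assuming an order with its line items is stored and fetched as one unit. The `store` dict stands in for a key-value or document database, and the field names and the `save_order` / `load_order` helpers are hypothetical.

```python
import json

# The whole order, including its line items, is one aggregate.
order_aggregate = {
    "order_id": "OR1001",
    "customer_id": 7231,
    "line_items": [
        {"product": "A green door", "price": 12.50, "quantity": 2},
        {"product": "A blue door", "price": 14.50, "quantity": 1},
    ],
}

store = {}  # stand-in for a key-value or document store


def save_order(order):
    # One write persists the complete aggregate under a single key.
    store[order["order_id"]] = json.dumps(order)


def load_order(order_id):
    # One read returns the order and all of its line items, with no joins.
    return json.loads(store[order_id])


save_order(order_aggregate)
loaded = load_order("OR1001")
order_total = sum(i["price"] * i["quantity"] for i in loaded["line_items"])
```

Because the aggregate is the unit of storage, it is also a natural unit for distributing data across a cluster.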

KEY-VALUE

10025

10026

10043

DOCUMENT

{ "id": 1, "name": "A green door", "price": 12.50, "tags": ["home", "green"] }

{ "id": 1, "name": "A blue door", "price": 14.50, "tags": ["home", "blue"], "discount": true }

Value == Aggregate

Document == Aggregate

COLUMN-FAMILY

row key: 1234

column family: profile (column key, column value)

name "sergio"

billingAddress data…

payment data…

column family: orders (column key, column value)

OR1001 data…

OR1002 data…

OR1003 data…

OR1004 data…

Order1001

aggregate
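The column-family layout above can be modelled, as a rough sketch, with nested dictionaries: row key, then column family, then column key to column value. The `get_column` and `get_family` helpers are hypothetical and do not correspond to any particular database API.

```python
# row key -> column family -> column key -> column value
rows = {
    "1234": {
        "profile": {
            "name": "sergio",
            "billingAddress": "data…",
            "payment": "data…",
        },
        "orders": {
            "OR1001": "data…",
            "OR1002": "data…",
            "OR1003": "data…",
            "OR1004": "data…",
        },
    },
}


def get_column(row_key, family, column_key):
    # Read a single column value from one row.
    return rows[row_key][family][column_key]


def get_family(row_key, family):
    # Read a whole column family for one row, e.g. all orders of a customer.
    return dict(rows[row_key][family])
```

Reading `get_family("1234", "orders")` returns every order column for that row in one access, which is the kind of read this model is meant to serve.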

Product      Revenue   Prior revenue
321291233    3083      7043
343412758    5032      4782
131494408    2198      3187
…            …         …

Order

Line Item Product1

DOCUMENT

GRAPH

COLUMN

KEY-VALUE

NoSQL

Aggregate-Oriented

Document

Column-Family

Key-value

Graph

Graph

BigCo

Sergio Lucia

Rocio Rosana

employee_of employee_of

friend

friend

START rocio = node:nodeIndex(name = "Rocio")
MATCH (rocio)-[:FRIEND]->(friend_node)
RETURN friend_node.name, friend_node.location

friend
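What the query asks for can be sketched in plain Python over a small edge list. The node names come from the slide, but the exact edges and every `location` value are assumptions made for illustration, and `friends_of` is a hypothetical helper.

```python
# Node properties (locations are invented for the example).
nodes = {
    "Rocio": {"location": "Madrid"},
    "Sergio": {"location": "Madrid"},
    "Lucia": {"location": "Sevilla"},
    "Rosana": {"location": "Valencia"},
    "BigCo": {"location": "Madrid"},
}

# Directed, typed relationships: (source, relationship type, target).
edges = [
    ("Rocio", "FRIEND", "Sergio"),
    ("Rocio", "FRIEND", "Lucia"),
    ("Rosana", "FRIEND", "Rocio"),
    ("Sergio", "EMPLOYEE_OF", "BigCo"),
    ("Lucia", "EMPLOYEE_OF", "BigCo"),
]


def friends_of(name):
    # Follow outgoing FRIEND relationships from the start node and return
    # (name, location) for each node reached, like the RETURN clause.
    return [
        (dst, nodes[dst]["location"])
        for src, rel, dst in edges
        if src == name and rel == "FRIEND"
    ]


rocio_friends = friends_of("Rocio")
```

A graph database indexes these relationships so that such traversals stay cheap as the graph grows, whereas this sketch scans every edge.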

Aggregate-Oriented

Document

Column-Family

Key-value

Graph

Schemaless

NOSQL AND CONSISTENCY

RDBMS == ACID

NoSQL == BASE

Aggregate-Oriented

Document

Column-Family

Key-value

Graph

ACID

Browser Server Database

Get Get

Post

Post

Offline Lock

v101

v101 v101

Version Stamp

v102

v101
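The version-stamp idea can be sketched as an optimistic check on write: each client reads a value together with its version, and a post is accepted only if that version is still current. The `Database` class and `StaleVersionError` are hypothetical names for this illustration.

```python
class StaleVersionError(Exception):
    """Raised when a write is based on an outdated version stamp."""


class Database:
    def __init__(self, value, version=101):
        self.value = value
        self.version = version

    def get(self):
        # A read returns the data together with its version stamp.
        return self.value, self.version

    def post(self, new_value, expected_version):
        # Accept the write only if nobody else has written in between.
        if expected_version != self.version:
            raise StaleVersionError(
                "expected v%d, found v%d" % (expected_version, self.version)
            )
        self.value = new_value
        self.version += 1
        return self.version


db = Database("initial")
_, version_a = db.get()  # first client reads v101
_, version_b = db.get()  # second client also reads v101

new_version = db.post("update from A", version_a)  # accepted, now v102

try:
    db.post("update from B", version_b)  # still based on v101
    conflict = False
except StaleVersionError:
    conflict = True
```

The second client has to re-read and retry, so the first update is not silently overwritten.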

Consistency

Logical

Replication

Lucia Sergio

Consistency

Availability

CAP Theorem

Consistency

Availability

Partition Tolerance

Pick any 2

Partition

Consistency

Availability

OR

Partition

Consistency

Availability

Consistency

Response Time

Safety

Liveness

Relaxing Durability

Eventual Consistency

Quorums

Read-Your-Writes Consistency
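For quorums, the usual rule of thumb with N replicas, W write acknowledgements, and R replicas read is that reads see the latest write when R + W > N, and two concurrent writes cannot both succeed when W > N/2. A minimal sketch, with hypothetical function names:

```python
def read_sees_latest_write(n, w, r):
    # Read and write sets must overlap in at least one replica.
    return r + w > n


def writes_cannot_conflict(n, w):
    # Two concurrent writes cannot both reach a majority of replicas.
    return w > n / 2


# With three replicas, writing to two and reading from two is consistent,
# while writing to one and reading from one trades consistency for speed.
strict = read_sees_latest_write(n=3, w=2, r=2)
relaxed = read_sees_latest_write(n=3, w=1, r=1)
```

Lowering R or W reduces response time and raises availability, at the price of possibly reading stale data.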

WHEN AND WHY TO USE NOSQL?

easier development

large scale data

NoSQL

Billing

Inventory

Integration Database

Billing

Inventory

Application Database

Web service

API

Timeline, 1980 to 2010: NoSQL?

Timeline, 1980 to 2010: Polyglot Persistence

User sessions: Redis

Financial Data: RDBMS

Shopping Cart: Riak

Recommendation: Neo4J

Product Catalog: MongoDB

Reporting: RDBMS

Analytics: Cassandra

User activity logs: Cassandra

Speculative Retailers Web Application
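In code, polyglot persistence often starts as a routing table from workload to store. The sketch below only encodes the assignments from the slide; the workload keys and the `store_for` helper are hypothetical, and no real database clients are involved.

```python
from collections import defaultdict

# Workload -> data store, as assigned on the slide.
STORE_FOR_WORKLOAD = {
    "user_sessions": "Redis",
    "financial_data": "RDBMS",
    "shopping_cart": "Riak",
    "recommendation": "Neo4J",
    "product_catalog": "MongoDB",
    "reporting": "RDBMS",
    "analytics": "Cassandra",
    "user_activity_logs": "Cassandra",
}


def store_for(workload):
    # Fail loudly for workloads nobody has made a storage decision about.
    try:
        return STORE_FOR_WORKLOAD[workload]
    except KeyError:
        raise ValueError("no store configured for %r" % workload)


# Inverse view: which workloads each store has to serve.
workloads_by_store = defaultdict(list)
for workload, store in STORE_FOR_WORKLOAD.items():
    workloads_by_store[store].append(workload)
```

Each entry in this table is a decision about where data lives, and each distinct store adds deployment and administration work.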

Problems

Decisions

Organizational Change

Immaturity

Eventual Consistency

Strategic

Rapid time to market

Data intensive

and

and/or

Possible Use Cases
• Use A NoSQL Database For A Particular Application Feature
• Use A NoSQL Database For Speedy Batch Processing
• Use A NoSQL Database For Distributed Logging
• Use A NoSQL Database For Large Tables
• Use A RDBMS For Reporting

What's The Catch?
• Difficult For Data In Different Databases To Interact
• You Now Have To Decide Where To Store Data
• Increased Application And Deployment Complexity
• Additional Administrative Responsibilities
• Training

APIS

Java

NoSQL

Python

Javascript

Ektorp, Jrelax, CouchDB4J

Who Is Actually Doing This?

Twitter
• Vertically and horizontally partitioned MySQL
• Several layers of aggressive caching, all application managed
• Schema changes impossible, resulting in the use of bitfields and piggyback tables
• Hardware intensive
• Error prone
• Hitting MySQL limits
• Already eventually consistent

Twitter

FlockDB

Twitter
• Migrating from MySQL to Cassandra as their main online data store
• Hadoop/HBase used for people search feature
• FlockDB used to manage the social graph
• Hadoop for analytics
• “As with all NoSQL systems, strengths in different situations” - Kevin Weil, Analytics Lead, Twitter

http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010

Twitter
• Increased availability
• The ability to support new features
• The ability to analyze their massive amount of data in a reasonable amount of time

http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010

NoSQL Job Trends

NoSQL Job Growth by Project

NoSQL Job Growth by Project (Relative)

NOSQL + BIG DATA SIMPLE SAMPLE

Grokking Twitter

Step by Step
• Use/Install Hadoop NoSQL Plugin
• Import tweets from twitter
• Write mapper in Java/Python
• Write reducer in Java/Python
• Call myself a data scientist

Grokking Twitter

curl --get 'https://stream.twitter.com/1.1/statuses/sample.json' \
  --header 'Authorization: OAuth oauth_consumer_key="OsITqnRiCTmkQcv4dtPPj3mnq", oauth_nonce="d41d45177ab9b450f7d1cb82b0d37328", oauth_signature="bOpdpvFNxPuqrlUV4nBhiyyGWbA%3D", oauth_signature_method="HMAC-SHA1", oauth_timestamp="1411988258", oauth_token="295079318-AbQu8sOPCaxXebjwDnDhOUjMST8bgs60JajOffMn", oauth_version="1.0"' \
  --verbose | mongoimport -d test -c live

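Map Hashtags in Python

#!/usr/bin/env python
# Hadoop streaming mapper: emits one {'_id': hashtag, 'count': 1} pair per hashtag.

import sys
sys.path.append(".")  # make the local pynosql_hadoop module importable

from pynosql_hadoop import BSONMapper

def mapper(documents):
    for doc in documents:
        # Some stream messages may carry no 'entities' key; skip them safely.
        for hashtag in doc.get('entities', {}).get('hashtags', []):
            yield {'_id': hashtag['text'], 'count': 1}

BSONMapper(mapper)
sys.stderr.write("Done Mapping.\n")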

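Reduce Hashtags in Python

#!/usr/bin/env python
# Hadoop streaming reducer: sums the counts emitted for each hashtag.
# Python 2 style, as on the original slide: key is a unicode string.

import sys
sys.path.append(".")  # make the local pynosql_hadoop module importable

from pynosql_hadoop import BSONReducer

def reducer(key, values):
    sys.stderr.write("Hashtag %s\n" % key.encode('utf8'))
    _count = 0
    for v in values:
        _count += v['count']
    return {'_id': key.encode('utf8'), 'count': _count}

BSONReducer(reducer)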
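The mapper and reducer logic can be tried locally before submitting a Hadoop job. The sketch below is an assumption-laden stand-in: it replaces BSONMapper and BSONReducer with a hypothetical `run_local` function that does the grouping in memory, and the sample tweets are invented.

```python
from collections import defaultdict


def mapper(documents):
    # Emit one {'_id': hashtag, 'count': 1} pair per hashtag occurrence.
    for doc in documents:
        for hashtag in doc["entities"]["hashtags"]:
            yield {"_id": hashtag["text"], "count": 1}


def reducer(key, values):
    # Sum the counts emitted for one hashtag.
    return {"_id": key, "count": sum(v["count"] for v in values)}


def run_local(documents):
    # Stand-in for the shuffle phase: group mapper output by key, reduce
    # each group, then sort by descending count (ties broken by hashtag).
    grouped = defaultdict(list)
    for pair in mapper(documents):
        grouped[pair["_id"]].append(pair)
    results = [reducer(key, values) for key, values in grouped.items()]
    return sorted(results, key=lambda r: (-r["count"], r["_id"]))


# Invented sample documents with the same shape the mapper expects.
sample_tweets = [
    {"entities": {"hashtags": [{"text": "android"}, {"text": "gameinsight"}]}},
    {"entities": {"hashtags": [{"text": "gameinsight"}]}},
    {"entities": {"hashtags": []}},
]

top_hashtags = run_local(sample_tweets)
```

This only checks the counting logic; the input and output formats and the cluster configuration are exercised only by the real job.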

All Together

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -libjars /usr/lib/hadoop/lib/nosql-hadoop.jar,/usr/lib/hadoop/lib/nosql-hadoop-streaming-1.4.0-SNAPSHOT.jar \
    -mapper /tmp/twit_hashtag_map.py \
    -reducer /tmp/twit_hashtag_reduce.py \
    -jobconf nosql.input.uri=nosqldb://127.0.0.1/test.live \
    -inputformat com.nosqldb.hadoop.mapred.NoSQLInputFormat \
    -jobconf nosql.output.uri=nosqldb://127.0.0.1/test.twit_reduction \
    -outputformat com.nosqldb.hadoop.mapred.NoSQLOutputFormat \
    -io nosqldb \
    -input /tmp/in -output /tmp/out \
    -file /tmp/twit_hashtag_map.py -file /tmp/twit_hashtag_reduce.py

Popular Hashtags

db.twit_hashtags.find().sort( {'count' : -1 })
{ "_id" : "gameinsight", "count" : 1367 }
{ "_id" : "رتويت", "count" : 1135 }
{ "_id" : "넌감동이야", "count" : 796 }
{ "_id" : "비투비", "count" : 778 }
{ "_id" : "ضع_معلومة_غريبة_عنك", "count" : 768 }
{ "_id" : "ريتويت", "count" : 757 }
{ "_id" : "ماهي_وظيفتك_في_قروبات_الواتساب", "count" : 748 }
{ "_id" : "androidgames", "count" : 706 }
{ "_id" : "android", "count" : 683 }
{ "_id" : "الفريدي_احسن_من_الثنيان", "count" : 680 }

sales@corenetworks.es