
NoSQL: From Oracle to MongoDB

Date post: 04-Apr-2018
Upload: istvan-reiter
  • 7/29/2019 NoSQL: From Oracle to MongoDB

    Pablo [email protected]

    06.10.2012

    A real use case at Telefónica PDI

    From Oracle to MongoDB

  • Content

    01 Introduction
       - Telefónica PDI. Who?
       - Personalisation Server. Why? What?
    02 The SQL version
       - Data model and architecture
       - Integrations, problems and improvements
    03 The NoSQL version
       - Data model and architecture
       - Performance boost
       - The bad
    04 Conclusions
       - Conclusions
       - Personal thoughts

  • 01 Introduction

  • Telefónica PDI. Who?

    Telefónica
    - Fifth largest telecommunications company in the world
    - Operations in Europe (7 countries), the United States and Latin America (15 countries)

    Telefónica Digital
    - Web and mobile digital contents and services division

    Product Development and Innovation unit
    - Formerly Telefónica R&D
    - Product & service development, platforms development, research, technology strategy, user experience and deployment & operation
    - Around 70 different ongoing projects at all times

  • Personalisation Server. What?

    - User profiling system
    - Machine learning
    - Recommendations
    - Customers' profile storage

  • Opt-in and profile module. Why?

    Users' data (profile and permissions) was scattered across different storages:

    - IPTV service: gender; film and music preferences
    - Mobile service: permission to contact by SMS?; gender
    - Music tickets service: address; music preferences
    - Location-based offers: address; permission to contact by SMS?

    "So you want to know my address AGAIN?!"


  • Opt-in and profile module. Why?

    Provide a module to become the master storage for customers' data

    [Diagram: a single store (gender, film and music preferences, permission to contact by SMS, address) used by the IPTV service, the mobile service, the music tickets service and location-based offers]

  • Opt-in and profile module. What?

    Features:
    - Flexible profile definition, classified in services
    - Profile sharing options between different services
    - Real time API
    - Supplementary offline batch interface
    - Authorization system
    - High availability
    - Inexpensive solution & hardware

  • 02 The SQL solution

  • Data model: services, users and their profile

    - Services defined a set of attributes (their profile), with default value and data type
    - Users were registered in services
    - Users defined values for some of the services' attributes
    - Each attribute value had an update date to avoid overwriting newer changes through batch loads
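The update-date rule can be illustrated with a minimal Python sketch. The record layout (`value`, `update_date`) and the function name `merge_value` are assumptions made for illustration; the slides do not show how the check was actually implemented.

```python
from datetime import datetime

def merge_value(stored, incoming):
    """Keep whichever record carries the newer update date (hypothetical helper)."""
    if stored is None or incoming["update_date"] > stored["update_date"]:
        return incoming
    return stored

# Value already in the store, e.g. written through the real time API.
stored = {"value": "yes", "update_date": datetime(2012, 10, 5)}
# Two rows arriving later through a batch load.
stale = {"value": "no", "update_date": datetime(2012, 10, 1)}   # older than stored
fresh = {"value": "no", "update_date": datetime(2012, 10, 6)}   # newer than stored

print(merge_value(stored, stale)["value"])  # the stored value survives
print(merge_value(stored, fresh)["value"])  # the batch value replaces it
```

With this rule a batch file generated before an API update cannot roll that update back, whatever order the loads run in.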

  • Data model: services profile sharing matrix

    - Services could access attributes declared inside other services
    - There were sharing rights for read or read and write
    - The user had to be registered in both services
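A small sketch of how such a sharing check could work, in Python. The dictionaries `SHARING` and `REGISTRATIONS`, the service names and the function `can_access` are all hypothetical; they only mirror the three rules listed on the slide.

```python
# (owner service, consumer service) -> "R" (read) or "RW" (read and write).
# Hypothetical in-memory stand-in for the sharing matrix.
SHARING = {("iptv", "mobile"): "R", ("iptv", "tickets"): "RW"}

# Services each user is registered in (hypothetical data).
REGISTRATIONS = {"user1": {"iptv", "mobile"}, "user2": {"iptv"}}

def can_access(user, owner, consumer, write=False):
    """May `consumer` read (or write) `owner`'s attributes for this user?"""
    registered = REGISTRATIONS.get(user, set())
    if owner == consumer:
        return owner in registered
    mode = SHARING.get((owner, consumer))
    if mode is None:
        return False                      # nothing shared between these services
    if write and mode != "RW":
        return False                      # shared read-only
    # The user has to be registered in both services.
    return owner in registered and consumer in registered

print(can_access("user1", "iptv", "mobile"))              # True
print(can_access("user1", "iptv", "mobile", write=True))  # False
```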

  • Data model: authorization system

    - Everything that could be accessed in the PS was a resource
    - Roles defined access rights (read or read and write) of resources
    - Auth users had roles
    - Roles could include other roles
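Because roles could include other roles, computing what an auth user may do means walking the role graph. The following Python sketch shows one way to do that; the role names, the two dictionaries and `effective_rights` are illustrative assumptions, not the original schema.

```python
# Rights granted directly by each role: resource -> "R" or "RW" (hypothetical data).
ROLE_RIGHTS = {
    "READER": {"profiles": "R"},
    "LOADER": {"batch": "RW"},
    "ADMIN": {"stats": "RW"},
}
# Roles included by other roles (hypothetical data).
ROLE_INCLUDES = {"ADMIN": ["READER", "LOADER"]}

def effective_rights(role, seen=None):
    """Collect the rights of a role plus those of every role it includes."""
    seen = set() if seen is None else seen
    if role in seen:                 # guard against circular inclusion
        return {}
    seen.add(role)
    rights = dict(ROLE_RIGHTS.get(role, {}))
    for child in ROLE_INCLUDES.get(role, []):
        for resource, mode in effective_rights(child, seen).items():
            if rights.get(resource) != "RW":   # "RW" wins over "R"
                rights[resource] = mode
    return rights

print(effective_rights("ADMIN"))
```

In a relational schema each level of this walk is another join, which fits the join counts reported later for the SQL version.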

  • Data model: bonus features!

    Multiple IDs:
    - A user's profile could be accessed with different equivalent IDs depending on the service
    - Each user ID was defined by an ID type (phone number, email, portal ID, hash) and the ID value

  • High level logical architecture

    Everything running on Red Hat EL 5.4 64 bits


  • Integration: planned integration

    - The PS replaces all customers' profile and permissions DBs
    - All systems access this data through the PS real time API
    - In special cases, some PS consumers could use the batch interface
    - In the same way, new services could be added quite easily

  • Integration: problems arise

    - Budget restrictions: adapting all services to use the API was too expensive
    - Keep the independent systems' DBs and synchronize the PS through batch
    - Use the DBs' built-in massive extraction feature to generate daily batch files
    - However, in most cases those DBs were not able to generate Delta (only changes) extractions
    - Provide full daily snapshots!

  • First version performance (Ireland)

    1.8M customers, 180 profile attributes, 6 services

    Sizes
    - Tables + indexes size: 65GB
    - 30% of the size were indexes

    Batch
    - Full DWH customers' profile import: > 24 hours
    - Delta extractions: 4 - 6 hours
    - Loads and extractions performance proportional to data size

    API
    - Response time with average traffic: 110ms

  • 03 The SQL solution: second version

  • Second version: high level logical architecture

    New approach: batch processes access the DB directly

  • Second version: batch processes

    Batch processes had to:
    - Validate authentication and authorization
    - Verify user, service and attribute existence
    - Check equivalent IDs
    - Validate sharing matrix rights
    - Validate values' data type
    - Check the update date of the existing values

  • Second version: DB batch processing

    [Image: our DBAs]

  • Second version: new DB-based batch loading process

    - Preprocess the incoming batch file in the BE servers
      - Validate format, services and attributes existence and values' data types
      - Generate an intermediate file with a structure like the target DB table
    - Load the intermediate file (Oracle's SQL*Loader) into a temporary table
    - Switch the DB to deferred writing, storing all incoming modifications
    - Merge the temporary table and the final table, checking the values' update date
    - Replace the old users' attributes values table with the merge result
    - Apply the deferred writing operations

  • Second version: new batch extraction process

    - Generate a temporary DB table with a format similar to the final batch file. Two loops over the users' attributes values table are required:
      - Select the format of the table: number and order of columns / attributes
      - Fill the new table
    - Loop over the whole temporary table for final formatting (empty fields)
    - From the batch side, loop across the whole table (SELECT * FROM )
    - Write each retrieved row as a line in the resulting file
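The extraction turns a tall table (one row per user and attribute) into a wide file (one line per user, one column per attribute). The Python sketch below reproduces that shape in memory. The sample rows, the `|` delimiter, the `user_id` header and the function `extract` are assumptions for illustration; the original work was done in PL/SQL.

```python
import csv
import io

# Tall layout: (user, attribute, value). Hypothetical sample data.
rows = [("u1", "gender", "F"), ("u1", "sms_ok", "yes"), ("u2", "gender", "M")]

def extract(rows, delimiter="|"):
    """Pivot (user, attribute, value) rows into one delimited line per user."""
    # First loop: decide the format, i.e. number and order of the columns.
    columns = sorted({attr for _, attr, _ in rows})
    # Second loop: fill the wide table.
    table = {}
    for user, attr, value in rows:
        table.setdefault(user, {})[attr] = value
    # Final formatting: missing values become empty fields.
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["user_id"] + columns)
    for user in sorted(table):
        writer.writerow([user] + [table[user].get(col, "") for col in columns])
    return out.getvalue()

print(extract(rows))
```

Both loops touch every stored value, which matches the slide's remark that extraction time was proportional to data size.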

  • Second version performance: Ireland performance requirements

    Batch time window: 3:30 hours
    - Full DWH load
    - Two Delta loads
    - Three Delta extractions

    API
    - Ireland requirement: < 500ms

  • Second version performance (Ireland)

    1.8M customers, 180 profile attributes, 6 services

    Sizes
    - Tables + indexes size: 65GB
    - 30% of the size were indexes
    - Temporary tables size increases almost exponentially: 15GB and above
    - Intermediate file size: from 700MB to 7GB

    Batch
    - Full DWH customers' profile import: 2:30 hours
    - Delta extractions: 1:00 hour
    - Loads performance worsened quickly (almost exponentially): 6:00 hours
    - Extractions performance proportional to data size
    - Concurrent batch processes may halt the DB

    API
    - Response time with average traffic: 80ms
    - Response time while loading was unpredictable: > 300ms

  • 04 The SQL solution: third version

  • Third version: speed up DB batch processes

    [Image: our DBAs (again)]

  • Third version: new (second) DB-based batch loading process

    - Minor preprocessing of the incoming batch file in the BE servers
      - Just validate format
      - No intermediate file needed!
    - Load the validated file (Oracle's SQL*Loader) into a temporary table
    - Loop over the temporary table merging the values into the final table, checking the values' update date and data types
      - Use several concurrent writing jobs
    - Store results on the real table, no need to replace!
    - No deferred writing!

  • Third version: enhancements to extraction process

    - Optimized loops to generate the temporary output table
      - Use several concurrent writing jobs
      - We achieved a speed-up of between 1.5 and 2
    - Loop over the whole temporary table for final formatting (empty fields)
    - Download and write lines directly inside Oracle's sqlplus
      - No SELECT * FROM query from the batch side!

  • Third version performance (Ireland)

    [Image: our DBAs]

    1.8M customers, 180 profile attributes, 6 services

    Sizes
    - Tables + indexes size: 65GB
    - 30% of the size were indexes
    - Temporary tables: 15GB

    Batch
    - Full DWH customers' profile import: 1:10 hours (vs. 2:30 hours)
    - Three Delta extractions: 2:15 hours (vs. 3:00 hours)
    - Loads and extractions performance proportional to data size
    - Concurrent batch processes not so harmful

    API
    - Response time with average traffic: 110ms
    - Response time while loading: 400ms

  • Third version performance (United Kingdom)

    [Image: our DBAs]

    25M customers, 150 profile attributes, 15 services

    Sizes
    - Tables + indexes size: 700GB
    - 40% of the size were indexes

    Batch
    - Two Delta imports: < 2:00 hours
    - Two Delta extractions: < 2:00 hours
    - Loads and extractions performance proportional to data size

    API
    - Response time with average traffic: 90ms

  • Third version performance

    [Image: our DBAs]

    Ireland               3rd version            2nd version
    DB size               65GB + 15GB (temp)     65GB + > 15GB
    Full DWH load         1:10 hours             2:30 hours
    Three Delta exports   2:15 hours             3:00 hours
    Batch stability       Stable, linear         Unstable, exponential
    API response time     110ms                  110ms
    API while loading     400ms                  Unpredictable

    United Kingdom        3rd version
    DB size               700GB
    Two Delta loads       < 2:00 hours
    Three Delta exports   < 2:00 hours
    API response time     90ms

  • Third version: DB stats

    - 20 database tables
    - API: several queries with up to 35 joins and even some unions
    - Authorization: 5 joins to validate auth users' access
    - Batch:
      - Load: 1700 lines of PL/SQL
      - Extraction: 1200 lines of PL/SQL

  • Mission completed?

  • Third version performance (Mexico)

    20M customers, 200 profile attributes, 10 services

    Mexico time window: 4:00 hours
    - Full DWH load!
    - Additional Delta feeds loads
    - At least two Delta extractions

    [Image: our DBAs]

  • 05 The NoSQL solution

  • MongoDB data model: services and their profile + sharing matrix

    {
      _id : 7,
      service_name : "root",
      id_type : 1,
      default_values : false,
      owned_attribs : [
        {
          attrib_id : 70005,
          attrib_name : "marketing.consent",
          attrib_data_type : 1,
          attrib_def_value : "no",
          attrib_status : 1
        }, ...
      ],
      shared_attribs : [
        { attrib_id : 20144, sharing_mode : 0 }, ...
      ]
    }

    attrib_id = service_id * 10000 + num attribs + 1
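The attribute ID formula packs the owning service into the ID, so the owner can be recovered without a lookup. A short Python sketch of both directions follows; the function names are made up for illustration, while the formula and the sample numbers come from the slide.

```python
ATTRIBS_PER_SERVICE = 10000  # multiplier from the slide's formula

def next_attrib_id(service_id, num_attribs):
    """attrib_id = service_id * 10000 + num attribs + 1"""
    return service_id * ATTRIBS_PER_SERVICE + num_attribs + 1

def owner_service(attrib_id):
    """Recover the owning service from an attribute ID (integer division)."""
    return attrib_id // ATTRIBS_PER_SERVICE

# Service 7 already owning 4 attributes gets 70005 for the next one.
print(next_attrib_id(7, 4))
# The shared attribute 20144 in the document above belongs to service 2.
print(owner_service(20144))
```

One consequence of this encoding is that a service can own at most 9999 attributes before its IDs would run into the next service's range.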

  • MongoDB data model: users and their profile + multiple IDs

    {
      _id : "011234",
      services_list : [
        { service_id : 1, reg_date : {"$date" : 1318040693000} }, ...
      ],
      user_values : [
        { attrib_id : 10140, attrib_value : "Open", update_date : {"$date" : 1317110161000} }, ...
      ]
    }

    Equivalent ID document:

    { _id : "05abcd", ue : "011234" }

    _id = id type + user ID
    attrib_id = service_id * 10000 + num attribs + 1
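A lookup by any of a user's IDs can be sketched in Python as follows. Plain dictionaries stand in for the users collection and for the equivalent ID documents; the two-character ID type prefix is an assumption read off the examples ("01", "05"), and `make_key` and `find_user` are hypothetical names.

```python
ID_TYPE_WIDTH = 2  # assumption: the id type is a zero-padded two-digit prefix

def make_key(id_type, user_id):
    """_id = id type + user ID, e.g. (1, "1234") -> "011234"."""
    return f"{id_type:0{ID_TYPE_WIDTH}d}{user_id}"

# Stand-ins for the two kinds of document shown above.
users = {"011234": {"services_list": [], "user_values": []}}
equivalents = {"05abcd": "011234"}   # equivalent ID -> main user _id

def find_user(id_type, user_id):
    """Resolve an ID of any type to the user's profile document."""
    key = make_key(id_type, user_id)
    key = equivalents.get(key, key)   # follow the equivalent ID if there is one
    return users.get(key)

print(find_user(1, "1234") is find_user(5, "abcd"))  # both IDs reach the same profile
```

Against a real database this is at most two primary-key reads, with no joins involved.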

  • MongoDB data model: authorization system

    AUTH USERS COLLECTION:

    {
      _id : "admin",
      auth_pswd : "XXX",
      auth_roles : ['PS_ADMIN_ROLE', ...],
      auth_uris : [
        {uri_path : "/**", method : 'R'},
        {uri_path : "/stats/**", method : 'RW'},
        {uri_path : "/kpis/**", method : 'IMPORT'}, ...
      ]
    }

    RESOURCES COLLECTION:

    { _id : "admin.**", role_uri : "/**" }

    ROLES COLLECTION:

    {
      _id : 'PS_ADMIN_ROLE',
      roles_resources : [
        {resource_id : "admin.**", method : 'R'},
        {resource_id : "stats.**", method : 'IMPORT'}, ...
      ]
    }

    Replicate uris (from resources) and methods (from roles) in the auth users documents
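Since URIs and methods are replicated into each auth user document, an access check only needs that one document. The Python sketch below shows such a check. The matching semantics are assumptions: a trailing `/**` is taken to match the prefix and everything below it, and `RW` is taken to grant both `R` and `W`; the slides do not spell out either rule.

```python
def uri_matches(pattern, path):
    """Assumed semantics: "/stats/**" matches "/stats" and anything under it."""
    if pattern.endswith("/**"):
        prefix = pattern[:-3]
        return path == prefix or path.startswith(prefix + "/")
    return pattern == path

def is_allowed(auth_user, path, method):
    """Check a request against the auth_uris replicated in the user document."""
    for entry in auth_user["auth_uris"]:
        granted = entry["method"]
        method_ok = method == granted or (granted == "RW" and method in ("R", "W"))
        if method_ok and uri_matches(entry["uri_path"], path):
            return True
    return False

admin = {
    "_id": "admin",
    "auth_uris": [
        {"uri_path": "/**", "method": "R"},
        {"uri_path": "/stats/**", "method": "RW"},
        {"uri_path": "/kpis/**", "method": "IMPORT"},
    ],
}

print(is_allowed(admin, "/users/011234", "R"))  # read access everywhere
print(is_allowed(admin, "/users/011234", "W"))  # no write access outside /stats
```

The price of this denormalization is that a change to a role or resource has to be propagated to every auth user document that replicates it.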

  • MongoDB data model: DB stats

    - Only 5 collections
    - API: typically 2 accesses (services and users collections)
    - Authorization: access only 1 collection to grant access
    - Batch: all processing done outside the DB

  • NoSQL version: high level logical architecture

    Everything running on Red Hat EL 6.2 64 bits

  • NoSQL version performance: Ireland (at PDI lab)

    1.8M customers, 180 profile attributes, 6 services

    Sizes
    - Collections + indexes size: 20GB (vs. 65GB)
    - < 5% of the size are indexes (vs. 30%)

    Batch
    - Full DWH customers' profile import: 0:12 hours (vs. 1:10 hours)
    - Three Delta extractions: 0:40 hours (vs. 2:15 hours)
    - Loads and extractions performance proportional to data size
    - Concurrent batch processes without performance impact

    API
    - Response time with average traffic: < 10ms (vs. 110ms)
    - Response time while loading: the same
    - High load (600 TPS) response time while loading: 300ms

  • NoSQL version performance: United Kingdom (at PDI lab)

    25M customers, 150 profile attributes, 15 services

    Sizes
    - Collections + indexes size: 210GB (vs. 700GB)
    - < 5% of the size were indexes

    Batch
    - Two Delta imports: < 0:40 hours (vs. 2:00 hours)
    - Loads and extractions performance proportional to data size

  • NoSQL version performance: Mexico

    20M customers, 200 profile attributes, 15 services

    Sizes
    - Collections + indexes size: 320GB
    - Indexes size: 1.2GB

    Batch
    - Initial Full import (20M, 40 attributes): 2:00 hours
    - Small Full import (20M, 6 attributes): 0:40 hours

    API
    - Response time with average traffic: < 10ms (vs. 90ms)
    - Response time while loading: the same
    - High load (500 TPS) response time while loading: 270ms

  • NoSQL version performance

    [Image: our DBAs]

    Ireland                      NoSQL version   SQL version
    DB size                      20GB            80GB
    Full DWH load                0:12 hours      1:10 hours
    Three Delta exports          0:40 hours      2:15 hours
    API while loading            < 10ms          400ms
    API 600 TPS + loading        300ms           Timeout / failure

    United Kingdom               NoSQL version   SQL version
    DB size                      210GB           700GB
    Two Delta loads              < 0:40 hours    < 2:00 hours

    Mexico                       NoSQL version
    DB size                      320GB
    Initial Full load (40 attr)  2:00 hours
    Daily Full load (6 attr)     0:40 hours
    API while loading            < 10ms
    API 500 TPS + loading        270ms
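The Ireland figures can be turned into speed-up factors with a few lines of Python. The helper names `minutes` and `speedup` are illustrative; the durations and sizes are the ones reported in the comparison.

```python
def minutes(duration):
    """Convert an "h:mm" duration, as written on the slides, to minutes."""
    hours, mins = duration.split(":")
    return int(hours) * 60 + int(mins)

def speedup(before, after):
    """How many times faster `after` is than `before`."""
    return minutes(before) / minutes(after)

# Ireland, SQL third version vs. NoSQL version.
print(round(speedup("1:10", "0:12"), 1))   # full DWH load
print(round(speedup("2:15", "0:40"), 1))   # three Delta exports
print(80 / 20)                             # DB size ratio in GB
```

That is roughly 5.8 times faster for the full load, 3.4 times faster for the exports, and a quarter of the disk space.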

  • Mission completed?

  • The bad

    - Batch load process was too fast
      - To keep secondary nodes synched we needed an oplog of 16 or 24GB
      - We had to disable journaling for the first migrations
    - Labels of document fields take up disk space
      - Reduced them to just 2 chars: attribute_id -> ai
    - Respect the unwritten law of at least 70% of size in RAM
    - Take care with compound indexes, order matters
      - You can save one index or you can have problems
      - Put the most important key (never nullable) first
    - DBAs whining and complaining about NoSQL
      - If we had enough RAM for all data, Oracle would outperform MongoDB
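Field names are stored inside every document, so long labels are paid for once per value. The Python sketch below estimates the effect on one user value. Only `ai` comes from the slide; the abbreviations `av` and `ud` are hypothetical, and the JSON length is used as a rough stand-in for the stored BSON size.

```python
import json

# Long label -> short label. "ai" is from the slide; "av" and "ud" are made up here.
SHORT = {"attrib_id": "ai", "attrib_value": "av", "update_date": "ud"}

def shorten(doc):
    """Rename known fields to their two-character labels."""
    return {SHORT.get(key, key): value for key, value in doc.items()}

value = {"attrib_id": 10140, "attrib_value": "Open", "update_date": 1317110161000}

long_size = len(json.dumps(value))
short_size = len(json.dumps(shorten(value)))
print(long_size, short_size)  # bytes per value, before and after
```

With hundreds of values per user and millions of users, a few dozen bytes saved per value adds up to a noticeable share of the collection size.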

  • The ugly

    - Second migration once the PS is already running
      - Full import adding 30 new attribute values: 10:00 hours
      - Full import adding 150 new attribute values: 40:00 hours
    - Considerably increasing the documents' size (i.e. adding lots of new values to the users) makes MongoDB rearrange the documents, performing around 5 times slower
      - That's a problem when you are updating 10k documents per second
    - Solutions?
      - Avoid this situation at all costs. Run away!
      - Normalize users' values; move them to a new individual collection
      - Preallocate the size with a faux field. You could waste space!
      - Load in a new collection, merge and swap, like we did in Oracle
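The faux-field idea can be sketched in plain Python: a document is first stored with a throwaway padding field so that space is reserved, and the padding is dropped when the real values arrive. The field name `_pad`, the helper names and the byte counts are all assumptions for illustration; with pymongo the second step would presumably be a single update that removes the padding and adds the new values.

```python
def with_padding(doc, expected_growth_bytes, field="_pad"):
    """Return a copy of the document carrying a dummy string of the expected size."""
    padded = dict(doc)
    padded[field] = "x" * expected_growth_bytes
    return padded

def without_padding(doc, field="_pad"):
    """Return a copy of the document with the dummy field removed."""
    return {key: val for key, val in doc.items() if key != field}

user = {"_id": "011234", "user_values": []}

# Insert time: reserve roughly the room that later imports are expected to need.
stored = with_padding(user, expected_growth_bytes=2048)

# Later import: drop the padding and add the new values in its place.
grown = without_padding(stored)
grown["user_values"] = grown["user_values"] + [{"attrib_id": 10140, "attrib_value": "Open"}]
print(sorted(grown))
```

The trade-off is the one the slide warns about: if the expected growth never arrives, the padding is simply wasted space.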

  • 06 Conclusions

  • Conclusions & personal thoughts

    - Awesome performance boost
      - But not all use cases fit in a MongoDB / NoSQL solution!
    - New technology, different limitations
      - Fear of the unknown
      - SSD performance?
      - Long term performance and stability?
    - Python + MongoDB + pymongo = fast development
      - I mean, really fast
    - MongoDB Monitoring Service (MMS)
    - 10gen people were very helpful

  • Questions?


  • SQL physical architecture

    - Scale horizontally by adding more BE or DB servers, or disks in the SAN
    - Virtualized or physical servers depending on the deployment

  • MongoDB physical architecture

    - MongoDB arbiters running on BE servers
    - Scale horizontally by adding more BE servers or disks in the SAN
    - Sharding may already be configured to scale by adding more replica sets

