7/29/2019 NoSQL: From Oracle to MongoDB
Pablo [email protected]
06.10.2012
A real use case at Telefónica PDI
From Oracle to MongoDB
Content
01 Introduction
- Telefónica PDI. Who?
- Personalisation Server. Why? What?
02 The SQL version
- Data model and architecture
- Integrations, problems and improvements
03 The NoSQL version
- Data model and architecture
- Performance boost
- The bad
04 Conclusions
- Conclusions
- Personal thoughts
01 Introduction
Telefónica PDI. Who?
Telefónica
- Fifth largest telecommunications company in the world
- Operations in Europe (7 countries), the United States and Latin America (15 countries)
Telefónica Digital
- Web and mobile digital contents and services division
Product Development and Innovation unit
- Formerly Telefónica R&D
- Product & service development, platforms development, research, technology strategy, user experience, and deployment & operation
- Around 70 different ongoing projects at any time
Personalisation Server. What?
- User profiling system
- Machine learning
- Recommendations
- Customers profile storage
Opt-in and profile module. Why?
Users' data, profile and permissions were scattered across different storages
[Diagram: the IPTV service, mobile service, music tickets service and location-based offers each keep their own copy of data such as gender, address, film and music preferences and permission to contact by SMS]
"So you want to know my address AGAIN?!"
Opt-in and profile module. Why?
Provide a module to become the master customers data storage
[Diagram: gender, film and music preferences, permission to contact by SMS and address held once in the module and shared by the IPTV service, mobile service, music tickets service and location-based offers]
Opt-in and profile module. What?
Features:
- Flexible profile definition, classified in services
- Profile sharing options between different services
- Real-time API
- Supplementary offline batch interface
- Authorization system
- High availability
- Inexpensive solution & hardware
02 The SQL solution
Data model
- Services defined a set of attributes (their profile), each with a default value and data type
- Users were registered in services
- Users defined values for some of the services' attributes
- Each attribute value had an update date to avoid overwriting newer changes through batch loads
Services, users and their profile
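A minimal sketch of this data model in Python, assuming attribute names are unique across services and using Python types as stand-ins for the DB data types. The class and function names are hypothetical, not the PS schema; the point is how a user's value falls back to the service default when the user has not set one.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional, Set


@dataclass
class Attribute:
    name: str
    data_type: type      # hypothetical: a Python type stands in for the DB data type
    default_value: Any


@dataclass
class Service:
    name: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)  # the service's profile


@dataclass
class UserValue:
    value: Any
    update_date: datetime  # lets batch loads skip values older than this one


@dataclass
class User:
    user_id: str
    services: Set[str] = field(default_factory=set)             # services the user is registered in
    values: Dict[str, UserValue] = field(default_factory=dict)  # attribute name -> value


def effective_value(user: User, service: Service, attr_name: str) -> Optional[Any]:
    """User's value for a service attribute, or the service default if unset."""
    if service.name not in user.services:
        return None  # not registered in this service: no profile to read
    stored = user.values.get(attr_name)
    if stored is not None:
        return stored.value
    return service.attributes[attr_name].default_value


iptv = Service("iptv", {"gender": Attribute("gender", str, "unknown")})
alice = User("011234", services={"iptv"})
print(effective_value(alice, iptv, "gender"))  # unknown
alice.values["gender"] = UserValue("female", datetime(2012, 5, 1))
print(effective_value(alice, iptv, "gender"))  # female
```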
Data model
- Services could access attributes declared inside other services
- There were sharing rights for read, or read and write
- The user had to be registered in both services
Services profile sharing matrix
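The sharing rules above boil down to a small access check. A sketch in Python, assuming rights are kept per pair of services (the slides call it a "services profile sharing matrix" and do not show its table layout) and that a service always has full access to its own attributes; all names are hypothetical.

```python
from typing import Dict, Set, Tuple

READ, READ_WRITE = "R", "RW"

# Hypothetical sharing matrix: (consumer service, owner service) -> granted right.
# Simplification: rights are per service pair, not per attribute.
SharingMatrix = Dict[Tuple[str, str], str]


def can_access(
    matrix: SharingMatrix,
    registrations: Dict[str, Set[str]],  # user ID -> services the user is registered in
    user_id: str,
    consumer: str,
    owner: str,
    write: bool,
) -> bool:
    """May `consumer` read (or write) a user's attribute declared by `owner`?"""
    if consumer == owner:
        right = READ_WRITE  # assumption: a service fully controls its own attributes
    else:
        right = matrix.get((consumer, owner))
        if right is None:
            return False  # no sharing agreed between the two services
    if write and right != READ_WRITE:
        return False
    registered = registrations.get(user_id, set())
    # The user has to be registered in both services
    return consumer in registered and owner in registered
```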
Data model
- Everything that could be accessed in the PS was a resource
- Roles defined access rights (read, or read and write) on resources
- Auth users had roles
- Roles could include other roles
Authorization system
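Because roles can include other roles, an auth user's effective rights are the union over the whole role graph. The slides don't show how this was resolved in SQL; a sketch in Python under that gap, keeping the strongest right per resource and guarding against cycles. The role names and dictionary layout are hypothetical.

```python
from typing import Dict, Iterable, List, TypedDict


class RoleDef(TypedDict):
    includes: List[str]      # other roles this role contains
    rights: Dict[str, str]   # resource -> "R" (read) or "RW" (read and write)


def effective_rights(user_roles: Iterable[str], roles: Dict[str, RoleDef]) -> Dict[str, str]:
    """Flatten an auth user's roles, following included roles, into resource -> right."""
    rights: Dict[str, str] = {}
    seen = set()
    pending = list(user_roles)
    while pending:
        role = pending.pop()
        if role in seen:
            continue  # also protects against cycles between roles
        seen.add(role)
        definition = roles.get(role, {"includes": [], "rights": {}})
        for resource, access in definition["rights"].items():
            if rights.get(resource) != "RW":  # keep the strongest right granted
                rights[resource] = access
        pending.extend(definition["includes"])
    return rights


def may(rights: Dict[str, str], resource: str, write: bool = False) -> bool:
    granted = rights.get(resource)
    return granted == "RW" or (granted == "R" and not write)
```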
Data model
- Multiple IDs: a user's profile could be accessed with different equivalent IDs, depending on the service
- Each user ID was defined by an ID type (phone number, email, portal ID, hash) and the ID value
Bonus features!
High level logical architecture
Everything running on Red Hat Enterprise Linux 5.4 (64-bit)
Integration
- PS replaces all customer profile and permissions DBs
- All systems access this data through the PS real-time API
- In special cases, some PS consumers could use the batch interface
- In the same way, new services could be added quite easily
Planned integration
Integration
- Budget restrictions: adapting all services to use the API was too expensive
- Keep the systems' independent DBs and synchronize the PS through batch
- Use the DBs' built-in mass extraction feature to generate daily batch files
- However, in most cases those DBs were not able to generate Delta (changes only) extractions
- Provide full daily snapshots!
Problems arise
First version performance: Ireland
Sizes
- 1.8M customers, 180 profile attributes, 6 services
- Tables + indexes size: 65 GB (indexes made up 30% of the size)
Batch
- Full DWH customers profile import: > 24 hours
- Delta extractions: 4-6 hours
- Loads and extractions performance proportional to data size
API
- Response time with average traffic: 110 ms
03 The SQL solution
Second version
Second version
New approach: batch processes access the DB directly
High level logical architecture
Second version
Batch processes had to:
- Validate authentication and authorization
- Verify user, service and attribute existence
- Check equivalent IDs
- Validate sharing matrix rights
- Validate value data types
- Check the update date of the existing values
Batch processes
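The per-line checks listed above can be sketched as one Python function that returns the reasons a batch line must be rejected. This is an illustration, not the PS code: it assumes authentication and authorization of the sending system were already checked, that write sharing is granted per pair of services, and uses hypothetical names and dictionary layouts throughout.

```python
from datetime import datetime
from typing import Dict, List, Set, Tuple


def validate_record(
    record: dict,                                   # one batch line, already parsed
    batch_service: str,                             # service that sent the file
    catalogue: Dict[str, Tuple[str, type]],         # attribute -> (owner service, expected type)
    equivalent_ids: Dict[str, str],                 # any known user ID -> canonical user ID
    registrations: Dict[str, Set[str]],             # canonical user ID -> registered services
    write_sharing: Set[Tuple[str, str]],            # (consumer, owner) pairs allowed to write
    stored_dates: Dict[Tuple[str, str], datetime],  # (user, attribute) -> current update date
) -> List[str]:
    """Return why a batch line must be rejected; an empty list means it can be loaded."""
    user = equivalent_ids.get(record["user_id"])    # check equivalent IDs
    if user is None:
        return ["unknown user"]
    registered = registrations.get(user, set())
    if batch_service not in registered:
        return ["user not registered in service"]
    attr = record["attribute"]
    if attr not in catalogue:
        return ["unknown attribute"]
    owner, expected_type = catalogue[attr]
    errors = []
    if owner != batch_service and (
        (batch_service, owner) not in write_sharing or owner not in registered
    ):
        errors.append("no write sharing right")
    if not isinstance(record["value"], expected_type):
        errors.append("wrong data type")
    current = stored_dates.get((user, attr))
    if current is not None and record["update_date"] <= current:
        errors.append("not newer than stored value")
    return errors
```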
Second version
DB batch processing
Our DBAs
Second version
- Preprocess the incoming batch file in BE servers
  - Validate format, service and attribute existence, and value data types
  - Generate an intermediate file with the same structure as the target DB table
- Load the intermediate file into a temporary table (Oracle's SQL*Loader)
- Switch the DB to deferred writing, storing all incoming modifications
- Merge the temporary table and the final table, checking value update dates
- Replace the old users' attribute values table with the merge result
- Apply the deferred writing operations
New DB-based batch loading process
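The core rule of the merge step, stripped of SQL*Loader and deferred writing, can be sketched in Python with dictionaries standing in for the two tables. This illustrates the update-date rule only; it is not the PL/SQL that was used, and the key and value layouts are assumptions.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

Key = Tuple[str, int]          # (user ID, attribute ID)
Entry = Tuple[str, datetime]   # (value, update date)


def merge_by_update_date(
    final: Dict[Key, Entry], staged: Iterable[Tuple[Key, Entry]]
) -> Dict[Key, Entry]:
    """Merge temporary-table rows into a copy of the final table.

    A staged value replaces the stored one only when its update date is newer,
    so an old full snapshot cannot overwrite a fresher change made via the API.
    Building a copy mirrors the "replace the old table with the merge result" step.
    """
    merged = dict(final)
    for key, (value, update_date) in staged:
        current = merged.get(key)
        if current is None or update_date > current[1]:
            merged[key] = (value, update_date)
    return merged
```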
Second version
- Generate a temporary DB table with a format similar to the final batch file. Two loops over the users' attribute values table were required:
  - Select the table format: number and order of columns / attributes
  - Fill the new table
- Loop the whole temporary table for final formatting (empty fields)
- From the batch side, loop across the whole table (SELECT * FROM …)
- Write each retrieved row as a line in the resulting file
New batch extraction process
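What the extraction does is essentially a pivot: one row per (user, attribute) value in the DB becomes one line per user in the file, with a fixed column order and empty fields for missing values. A Python sketch of that shape change, assuming a pipe-delimited output and a leading user_id column (both hypothetical; the slides don't give the file format).

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def extract_wide_file(rows: Iterable[Tuple[str, str, str]], columns: List[str]) -> str:
    """Turn (user ID, attribute, value) rows into one delimited line per user.

    `columns` fixes the number and order of attribute columns; attributes a
    user has no value for become empty fields.
    """
    per_user: Dict[str, Dict[str, str]] = {}
    for user_id, attr, value in rows:  # first pass: group the values of each user
        per_user.setdefault(user_id, {})[attr] = value

    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(["user_id", *columns])
    for user_id in sorted(per_user):   # second pass: one formatted line per user
        values = per_user[user_id]
        writer.writerow([user_id, *(values.get(col, "") for col in columns)])
    return out.getvalue()
```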
Second version performance
Batch time window: 3:30 hours for
- Full DWH load
- Two Delta loads
- Three Delta extractions
API: Ireland requirement: < 500 ms
Ireland performance requirements
Second version performance: Ireland
Sizes
- 1.8M customers, 180 profile attributes, 6 services
- Tables + indexes size: 65 GB (indexes made up 30% of the size)
- Temporary table size increased almost exponentially: 15 GB and above
- Intermediate file size: from 700 MB to 7 GB
Batch
- Full DWH customers profile import: 2:30 hours
- Delta extractions: 1:00 hour
- Loads performance worsened quickly (almost exponentially): 6:00 hours
- Extractions performance proportional to data size
- Concurrent batch processes may halt the DB
API
- Response time with average traffic: 80 ms
- Response time while loading was unpredictable: > 300 ms
04 The SQL solution
Third version
Third version
Speed up DB batch processes
Our DBAs (again)
Third version
- Minor preprocessing of the incoming batch file in BE servers
  - Just validate format
  - No intermediate file needed!
- Load the validated file into a temporary table (Oracle's SQL*Loader)
- Loop the temporary table, merging the values into the final table and checking value update dates and data types
- Use several concurrent writing jobs
- Store results in the real table: no need to replace it! No deferred writing!
New (second) DB-based batch loading process
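Splitting the merge across several concurrent jobs is safe as long as no two jobs touch the same row. One way to guarantee that, sketched in Python with threads standing in for the Oracle writing jobs, is to partition the staged rows by user. The partitioning scheme is an assumption; the slides don't say how the work was split.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from typing import Dict, List, Tuple

Key = Tuple[str, int]          # (user ID, attribute ID)
Entry = Tuple[str, datetime]   # (value, update date)
Row = Tuple[Key, Entry]


def partition_by_user(rows: List[Row], jobs: int) -> List[List[Row]]:
    """Same user -> same job, so no two jobs ever write the same row."""
    buckets: List[List[Row]] = [[] for _ in range(jobs)]
    for row in rows:
        user_id = row[0][0]
        buckets[zlib.crc32(user_id.encode()) % jobs].append(row)  # stable hash across runs
    return buckets


def merge_in_place(final: Dict[Key, Entry], bucket: List[Row]) -> int:
    """Write straight into the final table: no table swap, no deferred writes."""
    for key, (value, update_date) in bucket:
        current = final.get(key)
        if current is None or update_date > current[1]:
            final[key] = (value, update_date)
    return len(bucket)


def concurrent_merge(final: Dict[Key, Entry], staged: List[Row], jobs: int = 4) -> int:
    """Run one merge job per partition; returns the number of staged rows processed."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        processed = pool.map(lambda bucket: merge_in_place(final, bucket),
                             partition_by_user(staged, jobs))
        return sum(processed)
```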
Third version
- Optimized loops to generate the temporary output table
  - Use several concurrent writing jobs
  - We achieved a speed-up of between 1.5 and 2
- Loop the whole temporary table for final formatting (empty fields)
- Download and write lines directly inside Oracle's SQL*Plus
  - No SELECT * FROM query from the batch side!
Enhancements to extraction process
Third version performance: Ireland
Sizes
- 1.8M customers, 180 profile attributes, 6 services
- Tables + indexes size: 65 GB (indexes made up 30% of the size)
- Temporary tables: 15 GB
Batch
- Full DWH customers profile import: 1:10 hours (vs. 2:30 hours)
- Three Delta extractions: 2:15 hours (vs. 3:00 hours)
- Loads and extractions performance proportional to data size
- Concurrent batch processes not so harmful
API
- Response time with average traffic: 110 ms
- Response time while loading: 400 ms
Third version performance: United Kingdom
Sizes
- 25M customers, 150 profile attributes, 15 services
- Tables + indexes size: 700 GB (indexes made up 40% of the size)
Batch
- Two Delta imports: < 2:00 hours
- Two Delta extractions: < 2:00 hours
- Loads and extractions performance proportional to data size
API
- Response time with average traffic: 90 ms
Third version performance

Ireland                 3rd version            2nd version
DB size                 65 GB + 15 GB (temp)   65 GB + > 15 GB
Full DWH load           1:10 hours             2:30 hours
Three Delta exports     2:15 hours             3:00 hours
Batch stability         Stable, linear         Unstable, exponential
API response time       110 ms                 110 ms
API while loading       400 ms                 Unpredictable

United Kingdom          3rd version
DB size                 700 GB
Two Delta loads         < 2:00 hours
Three Delta exports     < 2:00 hours
API response time       90 ms
Third version performance
- 20 database tables
- API: several queries with up to 35 joins, and even some unions
- Authorization: 5 joins to validate auth users' access
- Batch:
  - Load: 1,700 lines of PL/SQL
  - Extraction: 1,200 lines of PL/SQL
DB stats
Mission completed?
Third version performance: Mexico
- 20M customers, 200 profile attributes, 10 services
- Mexico time window: 4:00 hours for
  - Full DWH load!
  - Additional Delta feed loads
  - At least two Delta extractions
05 The NoSQL solution
MongoDB Data Model
Services and their profile + sharing matrix
{
  _id: 7,
  service_name: "root",
  id_type: 1,
  default_values: false,
  owned_attribs: [
    {
      attrib_id: 70005,
      attrib_name: "marketing.consent",
      attrib_data_type: 1,
      attrib_def_value: "no",
      attrib_status: 1
    },
    ...
  ],
  shared_attribs: [
    { attrib_id: 20144, sharing_mode: 0 },
    ...
  ]
}
attrib_id = service_id * 10000 + num attribs + 1
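The formula packs the owner service into the attribute ID, so the owning service can be read back without a lookup: 70005 is the fifth attribute of service 7, and the shared attribute 20144 belongs to service 2. A small Python sketch of both directions; the function names and the range check (at most 9,999 attributes per service, implied by the multiplier) are assumptions.

```python
from typing import Tuple

ATTRIBS_PER_SERVICE = 10000  # attrib_id = service_id * 10000 + num attribs + 1


def next_attrib_id(service_id: int, num_attribs: int) -> int:
    """ID of the next attribute for a service that already owns num_attribs attributes."""
    if not 0 <= num_attribs < ATTRIBS_PER_SERVICE - 1:
        raise ValueError("no attribute IDs left in this service's range")
    return service_id * ATTRIBS_PER_SERVICE + num_attribs + 1


def split_attrib_id(attrib_id: int) -> Tuple[int, int]:
    """Recover (owner service_id, attribute number within that service)."""
    return divmod(attrib_id, ATTRIBS_PER_SERVICE)


print(next_attrib_id(7, 4))    # 70005
print(split_attrib_id(20144))  # (2, 144)
```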
MongoDB Data Model
Users and their profile + multiple IDs
{
  _id: "011234",
  services_list: [
    { service_id: 1, reg_date: { "$date": 1318040693000 } },
    ...
  ],
  user_values: [
    {
      attrib_id: 10140,
      attrib_value: "Open",
      update_date: { "$date": 1317110161000 }
    },
    ...
  ]
}
Equivalent ID document:
{ _id: "05abcd", ue: "011234" }
_id = id type + user ID
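Resolving a request made with any equivalent ID then costs at most one extra lookup. A Python sketch with a dict standing in for the collection, assuming the equivalent-ID documents live in the same collection as the user documents (the slides don't say which collection holds them) and that ID types are two-character prefixes, as in "01" and "05" above.

```python
from typing import Dict, Optional


def make_id(id_type: str, user_id: str) -> str:
    # _id = id type + user ID; two-character type codes are assumed from the examples
    return id_type + user_id


def find_user(users: Dict[str, dict], id_type: str, user_id: str) -> Optional[dict]:
    """Fetch a user's profile document from any of its equivalent IDs."""
    doc = users.get(make_id(id_type, user_id))
    if doc is not None and "ue" in doc:
        doc = users.get(doc["ue"])  # equivalent-ID document: follow "ue" to the main one
    return doc


users = {
    "011234": {"_id": "011234", "services_list": [], "user_values": []},
    "05abcd": {"_id": "05abcd", "ue": "011234"},
}
print(find_user(users, "05", "abcd")["_id"])  # 011234
```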
MongoDB Data Model
Authorization system
AUTH USERS COLLECTION:
{
  _id: "admin",
  auth_pswd: "XXX",
  auth_roles: ["PS_ADMIN_ROLE"],
  auth_uris: [
    { uri_path: "/**", method: "R" },
    { uri_path: "/stats/**", method: "RW" },
    { uri_path: "/kpis/**", method: "IMPORT" },
    ...
  ]
}
RESOURCES COLLECTION:
{ _id: "admin.**", role_uri: "/**" }
ROLES COLLECTION:
{
  _id: "PS_ADMIN_ROLE",
  roles_resources: [
    { resource_id: "admin.**", method: "R" },
    { resource_id: "stats.**", method: "IMPORT" },
    ...
  ]
}
Replicate URIs (from resources) and methods (from roles) into the auth user document
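With URIs and methods replicated into each auth user document, granting access needs only that one document. A Python sketch of the check, assuming Ant-style wildcards ("*" is one path segment, "**" any number of segments) and that "RW" also grants "R"; neither rule is spelled out in the slides, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def path_matches(pattern: str, path: str) -> bool:
    """Ant-style match: '*' is one path segment, '**' is zero or more segments."""
    pat = [p for p in pattern.split("/") if p]
    segs = [s for s in path.split("/") if s]

    def match(i: int, j: int) -> bool:
        if i == len(pat):
            return j == len(segs)
        if pat[i] == "**":
            return any(match(i + 1, k) for k in range(j, len(segs) + 1))
        if j == len(segs):
            return False
        return pat[i] in ("*", segs[j]) and match(i + 1, j + 1)

    return match(0, 0)


# Assumed meaning of the method codes: "RW" also grants "R"; "IMPORT" is separate.
GRANTS: Dict[str, Set[str]] = {"R": {"R"}, "RW": {"R", "RW"}, "IMPORT": {"IMPORT"}}


def is_allowed(auth_user: dict, path: str, method: str) -> bool:
    """Check one auth user document; no role or resource lookups are needed."""
    entries: List[dict] = auth_user["auth_uris"]
    return any(
        path_matches(e["uri_path"], path) and method in GRANTS.get(e["method"], set())
        for e in entries
    )
```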
MongoDB Data Model
- Only 5 collections
- API: typically 2 accesses (services and users collections)
- Authorization: access only 1 collection to grant access
- Batch: all processing done outside the DB
DB stats
NoSQL version
Everything running on Red Hat Enterprise Linux 6.2 (64-bit)
High level logical architecture
NoSQL version performance: Ireland (at PDI lab)
Sizes
- 1.8M customers, 180 profile attributes, 6 services
- Collections + indexes size: 20 GB (vs. 65 GB)
- Indexes make up < 5% of the size (vs. 30%)
Batch
- Full DWH customers profile import: 0:12 hours (vs. 1:10 hours)
- Three Delta extractions: 0:40 hours (vs. 2:15 hours)
- Loads and extractions performance proportional to data size
- Concurrent batch processes without affecting performance
API
- Response time with average traffic: < 10 ms (vs. 110 ms)
- Response time while loading: the same
- High load (600 TPS) response time while loading: 300 ms
NoSQL version performance: United Kingdom (at PDI lab)
Sizes
- 25M customers, 150 profile attributes, 15 services
- Collections + indexes size: 210 GB (vs. 700 GB)
- Indexes made up < 5% of the size
Batch
- Two Delta imports: < 0:40 hours (vs. 2:00 hours)
- Loads and extractions performance proportional to data size
NoSQL version performance: Mexico
Sizes
- 20M customers, 200 profile attributes, 15 services
- Collections + indexes size: 320 GB
- Indexes size: 1.2 GB
Batch
- Initial Full import (20M, 40 attributes): 2:00 hours
- Small Full import (20M, 6 attributes): 0:40 hours
API
- Response time with average traffic: < 10 ms (vs. 90 ms)
- Response time while loading: the same
- High load (500 TPS) response time while loading: 270 ms
NoSQL version performance

Ireland                 NoSQL version    SQL version
DB size                 20 GB            80 GB
Full DWH load           0:12 hours       1:10 hours
Three Delta exports     0:40 hours       2:15 hours
API while loading       < 10 ms          400 ms
API 600 TPS + loading   300 ms           Timeout / failure

United Kingdom          NoSQL version    SQL version
DB size                 210 GB           700 GB
Two Delta loads         < 0:40 hours     < 2:00 hours

Mexico                        NoSQL version
DB size                       320 GB
Initial Full load (40 attr)   2:00 hours
Daily Full load (6 attr)      0:40 hours
API while loading             < 10 ms
API 500 TPS + loading         270 ms
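The Ireland batch figures can be turned into speed-up factors (SQL third version time divided by NoSQL time). A small Python helper for the "h:mm" durations used in these comparisons; the United Kingdom rows only give upper bounds, so they are left out.

```python
def minutes(hhmm: str) -> int:
    """Convert an "h:mm" duration into minutes."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def speedup(sql: str, nosql: str) -> float:
    return minutes(sql) / minutes(nosql)


# Ireland figures: SQL (third) version vs. NoSQL version
for task, sql, nosql in [("Full DWH load", "1:10", "0:12"),
                         ("Three Delta exports", "2:15", "0:40")]:
    print(f"{task}: {speedup(sql, nosql):.1f}x")  # 5.8x, then 3.4x
```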
Mission completed?
The bad
- Batch load process was too fast
  - To keep secondary nodes in sync we needed an oplog of 16 or 24 GB
  - We had to disable journaling for the first migrations
- Document field labels take up disk space
  - We reduced them to just 2 chars: attribute_id -> ai
- Respect the unwritten law of keeping at least 70% of the data size in RAM
- Take care with compound indexes: order matters
  - You can save one index, or you can have problems
  - Put the most important key (never nullable) first
- DBAs whining and complaining about NoSQL
  - If we had enough RAM for all data, Oracle would outperform MongoDB
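Why compound index order matters: a compound index can serve queries on any leading prefix of its keys, which is how it can save a separate single-field index, but it cannot be used efficiently by queries that skip its first key. A simplified Python model of that prefix rule for equality queries (it ignores ranges, sorting and the query planner); the field names are hypothetical.

```python
from typing import Iterable, Sequence


def usable_prefix(index_keys: Sequence[str], query_fields: Iterable[str]) -> int:
    """How many leading keys of a compound index an equality query can use.

    0 means the query does not constrain the first key, so the index prefix
    cannot narrow the search.
    """
    fields = set(query_fields)
    used = 0
    for key in index_keys:
        if key not in fields:
            break
        used += 1
    return used


index = ("service_id", "attrib_id")                          # hypothetical compound index
print(usable_prefix(index, {"service_id"}))               # 1
print(usable_prefix(index, {"service_id", "attrib_id"}))  # 2
print(usable_prefix(index, {"attrib_id"}))                # 0
```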
The ugly
- Second migration once the PS is already running
  - Full import adding 30 new attribute values: 10:00 hours
  - Full import adding 150 new attribute values: 40:00 hours
- Considerably increasing document size (e.g. adding lots of new values to the users) makes MongoDB rearrange the documents, performing around 5 times slower
  - That's a problem when you are updating 10k documents per second
- Solutions?
  - Avoid this situation at all costs. Run away!
  - Normalize users' values: move them to a new individual collection
  - Preallocate the size with a faux field (you could waste space!)
  - Load into a new collection, merge and swap, like we did in Oracle
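The faux-field workaround, as we read it: reserve the space when the document is first inserted, then swap the faux field for the real values in a single update so the document does not outgrow the space already allocated to it (the storage engine of the time moved documents that outgrew their allocated space). Sizing the pad is a guess, which is where the wasted space comes from. A Python sketch that only builds the documents; the field and function names are hypothetical.

```python
from typing import List

PAD_FIELD = "pad"  # hypothetical name for the faux field


def new_user_document(user_id: str, expected_value_bytes: int) -> dict:
    """Insert-time document with a throwaway field about as big as the values to come."""
    return {"_id": user_id, "services_list": [], "user_values": [],
            PAD_FIELD: "x" * expected_value_bytes}


def fill_values_update(values: List[dict]) -> dict:
    """One update (MongoDB operator syntax) that drops the pad and writes the real values."""
    return {"$unset": {PAD_FIELD: ""}, "$set": {"user_values": values}}
```

With pymongo 3 or later, the update document could be applied with `collection.update_one({"_id": user_id}, fill_values_update(values))`.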
06 Conclusions
Conclusions & personal thoughts
- Awesome performance boost
  - But not all use cases fit in a MongoDB / NoSQL solution!
- New technology, different limitations
- Fear of the unknown
  - SSD performance? Long-term performance and stability?
- Python + MongoDB + pymongo = fast development
  - I mean, really fast
- MongoDB Monitoring Service (MMS)
  - 10gen people were very helpful
Questions?
SQL Physical architecture
- Scale horizontally by adding more BE or DB servers, or disks in the SAN
- Virtualized or physical servers, depending on the deployment
MongoDB Physical architecture
- MongoDB arbiters running on BE servers
- Scale horizontally by adding more BE servers, or disks in the SAN
- Sharding may already be configured, to scale by adding more replica sets
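An arbiter votes in replica set elections but stores no data, which is why a BE server can host one cheaply and give a two-node data replica set its odd third vote. A Python sketch that builds a replica set configuration document of the shape MongoDB's `rs.initiate()` accepts; the set name and host names are hypothetical.

```python
from typing import List


def replica_set_config(name: str, data_hosts: List[str], arbiter_hosts: List[str]) -> dict:
    """Replica set configuration document with data-bearing members plus arbiters."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    members += [
        {"_id": len(data_hosts) + i, "host": host, "arbiterOnly": True}
        for i, host in enumerate(arbiter_hosts)
    ]
    # MongoDB recommends an odd number of voting members so elections can reach a
    # majority; this sketch simply refuses even counts.
    if len(members) % 2 == 0:
        raise ValueError("use an odd number of voting members")
    return {"_id": name, "members": members}


config = replica_set_config("ps0", ["db1:27017", "db2:27017"], ["be1:27017"])
```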