Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Date post: 15-Apr-2017
Upload: couchbase
Transcript
Page 1: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Confidential and Proprietary

CONNECTING THE DOTS WITH COUCHBASE

Nov 2016

Jay Duraisamy

Gijun Lee

Page 2: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Presenters

Jay Duraisamy – VP Technology

• Currently leading the Data Platforms group within Equifax, a core platform organization that supports the US Consumer Information Solutions group ($1.3 billion). The group is responsible for petabyte-scale infrastructure (5 PB and growing) for both offline (MPP and Big Data) and online (incl. NoSQL) systems.

• 18 years of industry experience in building teams and platforms, leveraging expertise in software architecture and design philosophies. Worked as a developer, lead and architect on B2B, B2C and Big Data technologies. Graduate degree from the Indian Institute of Technology and an MBA from Goizueta Business School at Emory University, Atlanta.

• Enjoys jogging, reading and spending time with his twin daughters!

Gijun Lee – Application Developer IV

• Currently working on the Equifax B2B data platform that supports US consumer data analytics & processing, both offline & online. Recently developed a B2B REST service in Java that serves the financial history of U.S. consumers from Couchbase.

• 16 years of design/development experience in financial applications, including infrastructure and online/offline data analytics & processing in C/C++ & Java on Linux/Unix platforms. Huge interest in NoSQL & Hadoop platforms in the Big Data space. Master of Science in Computer Science from the University of Arkansas at Fayetteville.

• Enjoys hiking, watching movies, and travelling with family.

Page 3: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Equifax & The Business of Big Data

• An information technology company that operates in 24 countries.
• A consumer credit company grown into a leading provider of insights and knowledge that helps its customers make informed decisions.
• The company organizes, assimilates and analyzes data on more than 820 million consumers and more than 91 million businesses worldwide, and its database includes employee data contributed from more than 5,000 employers.

• Big Data before Big Data
• First MPP/Grid Computing in 2003, currently in production
• Focused on high-throughput systems to deliver terabytes of data and insights to FIs and banks

• Petabytes in scale
• Talent that can distinguish between, and gain from, low-latency and high-throughput trade-offs.

Big Data

Why NoSQL?

Page 4: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Big Data Online & The Teamwork

Page 5: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Technology Requirements

Potential Timeline (figure): phases PLAN, EVALUATE, BUILD, INTEGRATE and LAUNCH, with quarter markers Q4 '15, Q1 '16, Q1 '16 and Q2 '16.

Next steps

1. Keep in mind the tight SLA (5 ms) and the timeline for a Q2 '16 launch

2. Evaluate Technologies – Redis, Mongo and Couchbase

3. Grade the Technical support from the Partners during the evaluation

4. Choose the winning Technology Partner and Negotiate the Software agreement

5. Build, Integrate, Deploy and Run

Key Value Store
• Key to retrieve data, no complex queries
• NoSQL document: complex data objects with no normalization

Ever Growing Data
• Current use case is a little over a TB, but plan for other use cases
• Scale for multiple terabytes of data expected in the future

High Performance & Availability
• System uptime and replication for fault tolerance
• DR capabilities

Others
• Application development friendly
• Integration with Hadoop, Spark and Elasticsearch

Page 6: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

The Winner is … Couchbase

• In-memory and disk key-value store (ForestDB)
• Distributed document database
• Automatic replication
• Integrated caching
• Primary and secondary indexes
• Spatial querying
• LDAP integration and admin auditing
• Master-master and master-slave replication
• Memcached protocol and RESTful HTTP API
• N1QL: SQL-like query language
• Multi-dimensional scaling
• Cross data center replication (XDCR) filtering

Page 7: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

NuDB – Architecture

Page 8: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

NuDB Development

Storage Format
• 24-month trended credit data in JSON
• App-specific metadata
• Compression with Base64 encoding
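
The slide does not name the compression algorithm, so the following is only a minimal sketch of one way to store a JSON document as a compressed, Base64-encoded string. GZIP is an assumption, and the class name `CompressedPayloadCodec` and the sample fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper. GZIP is an assumed codec: the slides only say
// "compression with base64 encoding".
final class CompressedPayloadCodec {

    private CompressedPayloadCodec() { }

    // Compress the JSON text with GZIP, then Base64-encode the bytes so the
    // result can be stored as a plain string field.
    static String encode(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Reverse of encode(): Base64-decode, then decompress back to JSON text.
    static String decode(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample: 24 similar monthly entries, which compress well.
        StringBuilder json = new StringBuilder("{\"months\":[");
        for (int month = 1; month <= 24; month++) {
            if (month > 1) {
                json.append(',');
            }
            json.append("{\"month\":").append(month).append(",\"balance\":1000}");
        }
        json.append("]}");
        String encoded = encode(json.toString());
        System.out.println("original chars: " + json.length());
        System.out.println("encoded chars:  " + encoded.length());
        System.out.println("round trip ok:  " + decode(encoded).equals(json.toString()));
    }
}
```

Base64 adds roughly a third to the size of the compressed bytes, so the net saving depends on how repetitive the JSON is; trended monthly data with repeated field names is the favourable case.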

Interface
• JSON-based HTTP POST
• Retrieve, Update, Add and Delete operations
• Spring MVC to marshal request and response

App Server
• App server in Tomcat shields Couchbase as backend
• Simple drop installation
• DAO to decouple database transactions
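
As an illustration of the DAO idea, here is a minimal sketch in which the service layer depends only on an interface offering the four operations named on the slide. The names `RecordDao` and `InMemoryRecordDao` are hypothetical, and the in-memory map merely stands in for a Couchbase-backed implementation, which the slides do not show.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical DAO contract: callers depend on this interface only,
// never on a database client.
interface RecordDao {
    Optional<String> retrieve(String key);
    boolean add(String key, String json);     // false if the key already exists
    boolean update(String key, String json);  // false if the key is missing
    boolean delete(String key);               // false if the key is missing
}

// In-memory stand-in used for illustration; a database-backed class would
// implement the same interface without any change to callers.
final class InMemoryRecordDao implements RecordDao {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    @Override
    public Optional<String> retrieve(String key) {
        return Optional.ofNullable(store.get(key));
    }

    @Override
    public boolean add(String key, String json) {
        return store.putIfAbsent(key, json) == null;
    }

    @Override
    public boolean update(String key, String json) {
        return store.replace(key, json) != null;
    }

    @Override
    public boolean delete(String key) {
        return store.remove(key) != null;
    }

    public static void main(String[] args) {
        RecordDao dao = new InMemoryRecordDao();
        System.out.println("add:      " + dao.add("consumer-1", "{\"months\":[]}"));
        System.out.println("retrieve: " + dao.retrieve("consumer-1").orElse("<missing>"));
        System.out.println("delete:   " + dao.delete("consumer-1"));
    }
}
```

Keeping the contract this narrow matches a key-value access pattern: every operation takes a key, and nothing in the interface implies a query language.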

Data Ingestion
• Online live system: ingest data faster with little downtime
• RxJava, multi-threaded parallel loader
• Programmed in Java
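
The loader described above was built on RxJava, and its code is not part of the slides. To keep the example free of external dependencies, the sketch below swaps in a plain JDK `ExecutorService` to show the same general idea of splitting documents into batches and writing them on several threads. `ParallelLoader`, the `sink` callback (a stand-in for the database write) and the batch sizes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical batch loader using a fixed thread pool (not RxJava).
final class ParallelLoader {

    private ParallelLoader() { }

    // Splits the documents into batches and writes each batch on a worker
    // thread. "sink" stands in for the database write and must be
    // thread-safe. Returns the number of documents written.
    static int load(List<Map.Entry<String, String>> docs, int threads, int batchSize,
                    BiConsumer<String, String> sink) {
        if (threads < 1 || batchSize < 1) {
            throw new IllegalArgumentException("threads and batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                List<Map.Entry<String, String>> batch =
                        docs.subList(start, Math.min(start + batchSize, docs.size()));
                pending.add(pool.submit(() -> {
                    for (Map.Entry<String, String> doc : batch) {
                        sink.accept(doc.getKey(), doc.getValue());
                    }
                    return batch.size();
                }));
            }
            int loaded = 0;
            for (Future<Integer> result : pending) {
                loaded += result.get(); // waits for the batch and surfaces failures
            }
            return loaded;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add(Map.entry("doc-" + i, "{\"id\":" + i + "}"));
        }
        Map<String, String> store = new ConcurrentHashMap<>();
        int loaded = load(docs, 4, 500, store::put);
        System.out.println("loaded " + loaded + " documents, store holds " + store.size());
    }
}
```

A reactive library adds backpressure and retry operators on top of this pattern; the thread pool version only shows the batching and fan-out.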

Page 9: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

NuDB Deployment

Cluster: 8-node cluster with 2x replication, 100% of data cached in memory, RAID 10 mirroring

App Server: 2 Linux ETL servers as app servers with failover; load balancing with F5

Ingestion: Monthly import and export via Control-M scheduler while the cluster is live; no impact to production

Monitoring: System-generated transactions to monitor health; aggregated transaction times are monitored

Sampling: Regular extraction of transactions to UAT for verification and validation

DR: XDCR to handle cluster replication; no coding required
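
The monitoring approach (system-generated transactions whose aggregated time is watched) can be sketched roughly as follows. This is an illustration only: `ProbeStats`, its method names and the thresholds are hypothetical, and the slides do not describe how the real monitor is built.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical aggregator for synthetic ("system generated") transactions.
final class ProbeStats {
    private long count;
    private long totalMillis;
    private long maxMillis;

    // Runs one synthetic transaction, records its duration and returns it.
    long timeProbe(Runnable probe) {
        long start = System.nanoTime();
        probe.run();
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        record(elapsed);
        return elapsed;
    }

    // Adds one measured duration to the running aggregate.
    synchronized void record(long elapsedMillis) {
        count++;
        totalMillis += elapsedMillis;
        maxMillis = Math.max(maxMillis, elapsedMillis);
    }

    synchronized double averageMillis() {
        return count == 0 ? 0.0 : (double) totalMillis / count;
    }

    synchronized long maxMillis() {
        return maxMillis;
    }

    // Healthy means at least one probe ran and the average stays within the limit.
    synchronized boolean isHealthy(double averageLimitMillis) {
        return count > 0 && averageMillis() <= averageLimitMillis;
    }

    public static void main(String[] args) {
        ProbeStats stats = new ProbeStats();
        for (int i = 0; i < 5; i++) {
            // Stand-in for a real read or write against the service.
            stats.timeProbe(() -> Math.sqrt(42.0));
        }
        System.out.println("average ms: " + stats.averageMillis());
        System.out.println("healthy:    " + stats.isHealthy(100.0));
    }
}
```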

Page 10: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

NuDB – Lessons Learned

Data Compression
• Compression-friendly internal data format
• Compression saved 70% in document size
• Compression helps nullify the additional storage needed for replication
• Compression helps data import, an IO-bound operation, at the cost of a 50% increase in CPU clock time

RxJava
• Hadoop-based import tool was replaced by RxJava
• RxJava utilizes resources better
• 300 million documents (1 TB) in 40 minutes with 2 Java processes
• Exported 50 million transactions in 10 minutes with 1 Java process

View and Consoles
• Need to identify the latest updated transactions
• Initial design was to use Kafka asynchronously; switched to Couchbase views
• Operations team uses views to analyze data. No additional coding required
• Couchbase health via Console
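
The compression and throughput figures above can be checked with some quick arithmetic. The slides do not say whether "2x replication" means two stored copies in total or two replicas beyond the active copy, so the symbol k (total copies) below is an assumption and both readings are shown; S is the uncompressed data size.

```latex
% S = uncompressed data size, k = total number of stored copies (assumed)
\begin{align*}
\text{compressed copy} &= (1 - 0.70)\,S = 0.3\,S \\
k = 2:\quad 2 \times 0.3\,S &= 0.6\,S < S \\
k = 3:\quad 3 \times 0.3\,S &= 0.9\,S < S \\[6pt]
\text{import rate} &= \frac{300 \times 10^{6}\ \text{documents}}{40 \times 60\ \text{s}} = 125{,}000\ \text{documents/s} \\
\text{export rate} &= \frac{50 \times 10^{6}\ \text{transactions}}{10 \times 60\ \text{s}} \approx 83{,}333\ \text{transactions/s}
\end{align*}
```

Under either reading of the replication factor, the replicated compressed data stays below the size of a single uncompressed copy, which is consistent with the claim that compression offsets the storage cost of replication.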

Page 11: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Performance and Stress Testing

• 8 external servers with 2 threads per server

• 15 hours of continuous transactions

• Estimated 115 million transactions

• Average transaction time is 60ms

• Only failure observed was due to log filling disk after 15 hours

Stress Testing
• 2133 Ops/Sec in Debug mode
• 500K to 1.6 million Ops/Sec with the Couchbase Pillowfight load test tool
• System can support up to an estimated 250 million transactions/day

Performance Sample Stats
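
As a rough cross-check of the figures on this slide, and assuming the 2133 Ops/Sec rate and the 15-hour run describe the same test (the slide does not say so explicitly), the numbers relate as follows:

```latex
\begin{align*}
2133\ \tfrac{\text{ops}}{\text{s}} \times 15\ \text{h} \times 3600\ \tfrac{\text{s}}{\text{h}} &= 115{,}182{,}000 \approx 115\ \text{million transactions} \\
\frac{250 \times 10^{6}\ \text{transactions/day}}{86{,}400\ \text{s/day}} &\approx 2{,}894\ \text{transactions/s}
\end{align*}
```

The first line reproduces the estimated 115 million transactions; the second shows the sustained average rate that the 250 million transactions/day estimate implies.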

Page 12: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

IN THE NEWS

Page 13: Equifax: Connecting the dots with Couchbase – Couchbase Connect 2016

Questions?

