Data-As-A-Service to enable compliance reporting

Date post: 24-Jan-2015
Upload: analyticsweek
Description:
Big Data tools are clearly very powerful and flexible when dealing with unstructured information. However, they are equally applicable, especially when combined with columnar stores such as Parquet, to rapidly changing regulatory requirements that involve reporting on and analyzing data across multiple silos of structured information. This is an example of applying multiple big data tools to create a data-as-a-service that brings together a data hub and enables very high performance analytics and reporting, leveraging a combination of HDFS, Spark, Cassandra, Parquet, Talend and Jasper. In this talk, we will discuss the architecture, challenges and opportunities of designing a data-as-a-service that enables businesses to respond to changing regulatory and compliance requirements.

Speaker: Girish Juneja, Senior Vice President/CTO at Altisource

Girish Juneja is in charge of guiding Altisource's technology vision and will lead technology teams across Boston, Los Angeles, Seattle and other cities nationally, according to a release. Girish was formerly general manager of big data products and chief technology officer of data center software at California-based chip maker Intel Corp. (Nasdaq: INTC). He helped lead several acquisitions, including the acquisition of McAfee Inc. in 2011, according to a release. He was also the co-founder of technology company Sarvega Inc., acquired by Intel in 2005, and he holds a master's degree in computer science and an MBA in finance and strategy from the University of Chicago.
Transcript
Page 1: Data-As-A-Service to enable compliance reporting

© 2014 Altisource Labs. All Rights Reserved.

Data as a service to enable compliance reporting

Girish Juneja, CTO October 7, 2014

Page 2: Data-As-A-Service to enable compliance reporting

Altisource Overview

Chairman: William C. Erbey
CEO: William B. Shepro
Employees: ~8,000
NASDAQ: ASPS
Market Cap (Sept. 15, 2014): $2.2 Billion

Performance since August 2009 separation from Ocwen®
CAGR Share Price (through Sept. 15, 2014): 47%
CAGR Service Revenue (through Sept. 30, 2013): 39%

– Separated from Ocwen in August 2009
– Created and separated RESI and AAMC in December 2012
– Strong free cash flow
– Strong growth prospects in very large markets

Page 3: Data-As-A-Service to enable compliance reporting

Altisource Vision

Vision: To be the premier real estate and mortgage marketplace offering both content and distribution to the marketplace participants

Mission: To offer homeowners, buyers, sellers, agents, mortgage originators and servicers trusted and efficient marketplaces to conduct real estate and mortgage transactions, and improve outcomes for market participants

Real Estate Marketplace
– Home Sales
– Home Rentals
– Home Maintenance

Mortgage Marketplace
– Mortgage Originations
– Mortgage Servicing

Page 4: Data-As-A-Service to enable compliance reporting

State of the Business: Servicing

COMPLIANCE
– Velocity of new and changing rules
– Magnitude of financial exposure
– Existing technology limits compliance capabilities

COMPLEXITY
– Meeting borrower/customer expectations
– Elevated scrutiny of borrower interactions
– Proliferation of servicer products
– Reporting requirements

CHANGE
– Lack of end-to-end visibility
– Rigid and inflexible systems
– Volume and nature of data interoperability between data silos

Resulting in:
– Increased risk
– Increased costs
– Increased penalties and fines
– Decreased customer satisfaction

Page 5: Data-As-A-Service to enable compliance reporting

Future of Servicing

For servicers’ businesses to grow, a modern servicing platform must be:
– Flexible and Adaptable: to easily and cost effectively respond to evolving market and business dynamics
– Scalable and Automated: to enable cost effective business growth
– Interoperable: to seamlessly interface with third party apps and other software platforms
– Compliance Centric: to meet ever-changing regulatory mandates
– Analytical: to drive continuous improvement and manage risk

Page 6: Data-As-A-Service to enable compliance reporting

Common Foundational Layer

– Customer Experience
– Menus & Navigation, API Management, Caching, DMZ Gateway, Service Registry
– Multi-tenant Compliance & Security Framework: Identity Mgmt, Single Sign-on, Authorization & RBAC, Entitlements, Encryption, MFA Authentication, Access Governance, User Profile
– Workflows, Business & Compliance Rules, Messaging, Integrations: Rules Mgmt, Workflow Mgmt, Messaging, Notification & Subscription, Search, 3rd Party Integrations
– Compliant Data Management (Transactional, Reporting & Analytics): Metadata Management, Master Data Management, Reporting, Auditing, Data Archival, Warehousing, Data as a Service
– Multi-tenant Operational Framework: Provisioning, Monitoring/Alerting, Backup/Restore, Configuration & Customization, Elastic Performance, Availability/DR, Metering
– Multi-tenant Cloud Provider Independent Rapid Deployment PaaS: Cloud Abstraction layer, Deployment & Test Automation, App Provisioning, isolation, multi-tenancy & Life-cycle
– High performance Scala based framework

Page 7: Data-As-A-Service to enable compliance reporting

Environment Overview

Financial Industry faces:
– Increasing regulatory requirements
– Increasing customer compliance requirements
– Regulatory & customers’ changing requirements need correlating data across sources
– Risk Analysis requires correlating internal and non-conforming structured external data sets
– Existing data stores unable to respond rapidly

Organizations need solutions that:
– Enable financial institutions to address changing regulations
– Enable automated compliance monitoring processes and systems
– Improve internal controls by retaining data lineage to unmodified source datasets
– Provide actionable & timely information from data
– Enable on-demand reporting on massive datasets based on schema defined with the reporting request

Page 8: Data-As-A-Service to enable compliance reporting

The Traditional Enterprise Data Warehouse approach

– Maintain data integrity by storing the organizational data with regulatory information in the respective data dimensions.
– Data modeling and the design of facts and dimensions are critical to the success of the compliance data warehouse.
– Regulatory sufficiency needs to be maintained in the regulation dimension of the compliance data warehouse.
– From the data warehouse, data is transmitted into a regulatory data mart.
– Transformations of base data elements and regulation rules are maintained in the metadata.

Page 9: Data-As-A-Service to enable compliance reporting

Metadata, ETL & Core Model

Metadata
– Reflects the business and business processes
– In EDW, all functionality is metadata driven
– Data definitions
  • Source and Core model from technical perspective
  • business perspective
– Transformations & Aggregations
  • Transformations to derive cleansed Core data from source data
  • Aggregations and de-normalizations to create Access model data

Loading Service
– Reflects the business and business processes
– In EDW, all functionality is metadata driven
– Data definitions
  • Source and Core model from technical perspective
  • business perspective
– Transformations & Aggregations
  • Transformations to derive cleansed Core data from source data
  • Aggregations and de-normalizations to create Access model data

Access Model
– Data in the Access model is directly traceable to the Core model
  • De-normalized
  • Aggregated
  • Designed for query and access performance.
  • End user requirements
  • Access restrictions and controls
– As a design decision, access model objects can either be physical structures, or structures materialized on access.

Core Model
– Control
  • Only approved, tested, and validated processes can update data in the Core model.
– Data Model
  • Highly Normalized
  • No redundancy
  • Subject areas, loans, investors..
  • Not optimized for apps
  • target data formats post-cleansing
  • It is optimized for efficiency and correctness.
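
To make "metadata driven" a little more concrete, here is a minimal sketch in plain Python. It assumes the source-to-Core cleansing rules are kept as a metadata table and that the Access model is a simple aggregate over Core rows. The field names (`loan_no`, `upb`, `investor`) and the helpers `to_core` and `to_access` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical metadata: each Core column names its source field and a
# cleansing function. Changing a rule means editing this table, not the loader.
CORE_METADATA = {
    "loan_number": ("loan_no", lambda v: v.strip().upper()),
    "investor": ("investor", lambda v: v.strip().title()),
    "unpaid_balance": ("upb", lambda v: round(float(v.replace(",", "")), 2)),
}


def to_core(source_rows, metadata=CORE_METADATA):
    """Derive cleansed Core rows from raw source rows using the metadata."""
    return [
        {col: clean(row[src]) for col, (src, clean) in metadata.items()}
        for row in source_rows
    ]


def to_access(core_rows, group_by, measure):
    """Aggregate Core rows into a de-normalized Access-model summary."""
    totals = defaultdict(float)
    for row in core_rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


# Illustrative source rows with the kind of noise cleansing is meant to remove.
source = [
    {"loan_no": " a-100 ", "investor": "acme trust", "upb": "150,000.50"},
    {"loan_no": "a-101", "investor": "ACME TRUST ", "upb": "99,999.50"},
    {"loan_no": "b-200", "investor": "beta fund", "upb": "10000"},
]
core = to_core(source)
access = to_access(core, group_by="investor", measure="unpaid_balance")
print(access)
```

Because every Access value is computed from Core rows, it stays traceable to the Core model, which is the property the slide calls out.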

Page 10: Data-As-A-Service to enable compliance reporting

The Change/Update Process

Each change involves the following steps:
– Update the extraction module
– Update the staging module
– Update the transformation module
– Update the metadata
– Update the data repository

This process is tedious, involved, and brittle.

Page 11: Data-As-A-Service to enable compliance reporting

Enhancing with Big Data Technologies

We were driven to adopt big data technology for many reasons:
– Demand to analyze new data sources in an ever shorter timeframe
– Growth in data complexity
– Variety of data types
– Volume of data and inability to move it around due to time constraints
– Velocity of data generation, internal and external
– Veracity of data from multiple sources
– Growth in analytical complexity
– Increasing availability of cost-effective computing and data storage

The biggest reason for us was the frequent change of requirements driven by changing business and regulatory needs. Spark is a more flexible platform.

Page 12: Data-As-A-Service to enable compliance reporting

Data as a Service

Page 13: Data-As-A-Service to enable compliance reporting

Data Mobilization View for Data Lake

Page 14: Data-As-A-Service to enable compliance reporting

Request Details:
– Service Name: Borrower Data
– Context: Current/Cleansed & Conformed/History
– Request Filter: <Borrower Name>, <Date Range>

Response Details:

Borrower Data Service
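
A request with this shape could be modelled roughly as follows. This is only a sketch under stated assumptions: the slide gives the service name, the three contexts and the two filters, while the class `DataServiceRequest`, the function `serve`, the record fields and the in-memory `STORE` are all hypothetical stand-ins for the real service and data lake.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Context(Enum):
    """The three contexts named on the slide."""
    CURRENT = "Current"
    CLEANSED_CONFORMED = "Cleansed & Conformed"
    HISTORY = "History"


@dataclass(frozen=True)
class DataServiceRequest:
    service_name: str
    context: Context
    borrower_name: str  # request filter: <Borrower Name>
    start: date         # request filter: <Date Range>, inclusive
    end: date


# Hypothetical in-memory stand-in for the data lake, keyed by (service, context).
STORE = {
    ("Borrower Data", Context.CURRENT): [
        {"borrower": "Jane Doe", "as_of": date(2014, 9, 1), "status": "current"},
        {"borrower": "John Roe", "as_of": date(2014, 9, 1), "status": "current"},
    ],
    ("Borrower Data", Context.HISTORY): [
        {"borrower": "Jane Doe", "as_of": date(2013, 6, 1), "status": "late"},
        {"borrower": "Jane Doe", "as_of": date(2014, 3, 1), "status": "current"},
    ],
}


def serve(request):
    """Return the records matching the request's name and date-range filters."""
    rows = STORE.get((request.service_name, request.context), [])
    return [
        r for r in rows
        if r["borrower"] == request.borrower_name
        and request.start <= r["as_of"] <= request.end
    ]


req = DataServiceRequest("Borrower Data", Context.HISTORY, "Jane Doe",
                         date(2014, 1, 1), date(2014, 12, 31))
result = serve(req)
print(result)
```

Keeping the context as part of the request lets one service name answer for current, cleansed and historical views without separate endpoints.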

Page 15: Data-As-A-Service to enable compliance reporting

Borrower Data Service

Page 16: Data-As-A-Service to enable compliance reporting

Request Details:
– Service Name: Mortgage Data
– Context: Current/Cleansed & Conformed/History
– Request Filter: <Loan Number>, <Date Range>

Response Details:

Mortgage Data Service

Page 17: Data-As-A-Service to enable compliance reporting

Mortgage Data Service

Page 18: Data-As-A-Service to enable compliance reporting

Event Details:
– Event Name: Loan Default
– Context: Current

Loan Default Event

Page 19: Data-As-A-Service to enable compliance reporting

Loan Default Event

Page 20: Data-As-A-Service to enable compliance reporting

Data Lake vs. Data Warehouse

Feature          Data Lake                     Data Warehouse
Data Volume      Extremely large (Petabytes)   Large (Terabytes)
Access Methods   NoSQL                         SQL
Schema           Schema on read                Schema on write
Scalability      Scales horizontally           Scales vertically
Hardware         Commodity hardware            Specialized hardware/appliances
Data Structure   Structured and unstructured   Structured
Data             Raw                           Cleansed/Aggregated
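
The "schema on read" row is the one that matters most for changing report requirements. As a small sketch in plain Python (not Spark or Hive), the raw data below is stored once, unparsed, and each report supplies its own schema at read time. The sample records, the column names and the helper `read_with_schema` are made up for illustration.

```python
import csv
import io

# Raw data is stored exactly as received; nothing is parsed at load time.
RAW = (
    "L-1,2014-09-01,1200.50,CA\n"
    "L-2,2014-09-03,980.00,TX\n"
    "L-3,2014-09-04,1500.25,CA\n"
)


def read_with_schema(raw_text, schema):
    """Apply a schema while reading.

    `schema` has one entry per raw field: either None (ignore the field)
    or a (column_name, converter) pair.
    """
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {}
        for column, value in zip(schema, fields):
            if column is not None:
                name, convert = column
                row[name] = convert(value)
        yield row


# Two reports define two different schemas over the same untouched raw data.
payment_schema = [("loan", str), None, ("amount", float), None]
state_schema = [("loan", str), None, None, ("state", str)]

total = sum(r["amount"] for r in read_with_schema(RAW, payment_schema))
ca_loans = [r["loan"] for r in read_with_schema(RAW, state_schema)
            if r["state"] == "CA"]
print(total, ca_loans)
```

With schema on write, adding the second report would have meant changing the load process. Here only the schema passed by the caller changes.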

Page 21: Data-As-A-Service to enable compliance reporting

Data Lake Technology Stack

– BI/ETL tools, Services/Application Portals, Interactive Analytics Portals
– ODBC/JDBC, In-house API, 3rd Party Drivers, In-house Drivers
– Spark Streaming, Hive/HQL, Spark SQL, GraphX, MLlib/SparkR
– RDD Instances/Schemas
– Spark (DAG construct and execute engine)
– YARN
– Data Storage: Cassandra/HDFS/Parquet
– External Data Stores

Page 22: Data-As-A-Service to enable compliance reporting

Benefits of Apache Spark based Data Lake

– Load data as it is stored in the source system; no transformation needed
– Build structure on it: apply Hive external tables on this raw data
– Data sets built with our business logic
– The intermediate and final results are saved back to the data stores
– Working data sets saved as Parquet files
– Distinction between data view and update view
– When the data file changes in Hadoop or Cassandra, we only have to update the Hive schema or SchemaRDDs; then we are done.

Page 23: Data-As-A-Service to enable compliance reporting

Data Storage Access Layer

– Abstract the details of data access through contexts/drivers:
  • Hive/HQL
  • Spark SQL
  • Cassandra driver for Spark
– Unify the data into RDD interfaces:
  • SchemaRDD
  • HadoopRDD
  • CassandraRDD
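
The idea of hiding storage details behind one interface can be sketched without Spark at all. The example below is plain Python and does not use the Spark API: `RowSource`, `DelimitedTextSource` and `KeyValueSource` are hypothetical classes that stand in for the role the RDD interfaces play, with a file-like store and a keyed store behind them.

```python
from abc import ABC, abstractmethod


class RowSource(ABC):
    """Common interface every storage driver exposes to callers."""

    @abstractmethod
    def rows(self):
        """Yield records as dicts, whatever the backing store is."""


class DelimitedTextSource(RowSource):
    """Stand-in for file-based storage such as text files on HDFS."""

    def __init__(self, text, columns):
        self.text, self.columns = text, columns

    def rows(self):
        for line in self.text.splitlines():
            yield dict(zip(self.columns, line.split("|")))


class KeyValueSource(RowSource):
    """Stand-in for a keyed store such as a Cassandra table."""

    def __init__(self, table, key_name):
        self.table, self.key_name = table, key_name

    def rows(self):
        for key, value in self.table.items():
            yield {self.key_name: key, **value}


def count_by(source, column):
    """Works against any RowSource; the caller never sees storage details."""
    counts = {}
    for row in source.rows():
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


files = DelimitedTextSource("L-1|CA\nL-2|TX\nL-3|CA", ["loan", "state"])
table = KeyValueSource({"L-4": {"state": "TX"}, "L-5": {"state": "TX"}}, "loan")
print(count_by(files, "state"), count_by(table, "state"))
```

The reporting logic in `count_by` is written once. Adding a new store means adding a new driver class, not touching the reports.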

Page 24: Data-As-A-Service to enable compliance reporting

Code Samples - Apply Hive Schema to Raw Data

Sources: RDBMS’s, Excel Files, Documents, External Sources

1. Pour data into HDFS
2. Create Hive Schema
3. Use HQL inside Spark SQL
4. Save result in Parquet format

Cluster Details: 16 VM’s, 128 GB Memory, 126 GB Disk
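
The code from this slide did not survive in the transcript. As a stand-in, the flow it names (pour raw data in, create a schema over it, query in SQL, save the result) is sketched below using only Python's `sqlite3` and `csv` modules. This is not the authors' code and it uses none of HDFS, Hive, Spark SQL or Parquet: an in-memory SQLite table plays the raw store, a view plays the Hive schema, and CSV text plays the Parquet output. All table and column names are hypothetical.

```python
import csv
import io
import sqlite3

raw_lines = ["L-1,CA,1200.50", "L-2,TX,980.00", "L-3,CA,1500.25"]

conn = sqlite3.connect(":memory:")

# Step 1: "pour" the raw data in untouched, every field kept as text.
conn.execute("CREATE TABLE raw_payments (loan TEXT, state TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_payments VALUES (?, ?, ?)",
    [line.split(",") for line in raw_lines],
)

# Step 2: lay a typed schema over the raw data without copying or changing it.
conn.execute(
    "CREATE VIEW payments AS "
    "SELECT loan, state, CAST(amount AS REAL) AS amount FROM raw_payments"
)

# Step 3: run the reporting query in SQL against that schema.
report = conn.execute(
    "SELECT state, COUNT(*), SUM(amount) FROM payments "
    "GROUP BY state ORDER BY state"
).fetchall()

# Step 4: save the result set (CSV text here, where the slide uses Parquet).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["state", "loans", "total_amount"])
writer.writerows(report)
conn.close()
print(out.getvalue())
```

The raw table is never altered after loading. If a report needs a different shape, only the view in step 2 and the query in step 3 change.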

Page 25: Data-As-A-Service to enable compliance reporting

The Spark Cluster

– Apps, Services, Tools …
– Spark Driver
– Workers …
– Data …
– Storage …

Page 26: Data-As-A-Service to enable compliance reporting

Performance observations

10 18 Rows

Observed times: 4.5 hrs, 48 minutes, 1 min

– Engineered Solutions: Cores 128, Memory 2048 GB, Disk 12 TB
– In-memory Databases: Cores 160, Memory 2048 GB, Disk 12 TB
– Spark Cluster VM’s: Cores 128, Memory 2048 GB, Disk 12 TB

Page 27: Data-As-A-Service to enable compliance reporting

Challenges

– Open source Apache Spark, while very promising, has to mature
– Spark production deployment is complicated
– Security of data is not enterprise class, needs additional layers
– Tools ecosystem is still developing
– BI Tools still in development

But..
– Done right has a lot of business value
– We are hiring engineers!

Page 28: Data-As-A-Service to enable compliance reporting

Q & A
