Date posted: 24-Jan-2015
Category: Business
Uploaded by: analyticsweek
© 2014 Altisource Labs. All Rights Reserved.
Data as a service to enable compliance reporting
Girish Juneja, CTO
October 7, 2014
Altisource Overview
Chairman: William C. Erbey
CEO: William B. Shepro
Employees: ~8,000
NASDAQ: ASPS
Market Cap (Sept. 15, 2014): $2.2 Billion
Performance since August 2009 separation from Ocwen®
CAGR Share Price (through Sept. 15, 2014): 47%
CAGR Service Revenue (through Sept. 30, 2013): 39%
– Separated from Ocwen in August 2009
– Created and separated RESI and AAMC in December 2012
– Strong free cash flow
– Strong growth prospects in very large markets
Altisource Vision
Vision: To be the premier real estate and mortgage marketplace offering both content and distribution to the marketplace participants
Mission: To offer homeowners, buyers, sellers, agents, mortgage originators and servicers trusted and efficient marketplaces to conduct real estate and mortgage transactions, and improve outcomes for market participants
Real Estate Marketplace
– Home Sales
– Home Rentals
– Home Maintenance
Mortgage Marketplace
– Mortgage Originations
– Mortgage Servicing
State of the Business: Servicing
Increased Risk
– Increased costs
– Increased penalties and fines
– Decreased customer satisfaction
COMPLIANCE
– Velocity of new and changing rules
– Magnitude of financial exposure
– Existing technology limits compliance capabilities
COMPLEXITY
– Meeting borrower/customer expectations
– Elevated scrutiny of borrower interactions
– Proliferation of servicer products
– Reporting requirements
CHANGE
– Lack of end-to-end visibility
– Rigid and inflexible systems
– Volume and nature of data interoperability between data silos
Future of Servicing
For servicers’ businesses to grow, a modern servicing platform must be:
– Flexible and Adaptable: to easily and cost-effectively respond to evolving market and business dynamics
– Scalable and Automated: to enable cost-effective business growth
– Interoperable: to seamlessly interface with third-party apps and other software platforms
– Compliance Centric: to meet ever-changing regulatory mandates
– Analytical: to drive continuous improvement and manage risk
Common Foundational Layer
Identity Mgmt
Single Sign-on
Authorization & RBAC
Entitlements
Encryption
MFA Authentication
Access Governance
User Profile
Rules Mgmt
Workflow Mgmt
Messaging
Notification & Subscription
Search
3rd Party Integrations
Metadata Management
Master Data Management
Reporting
Auditing
Data Archival
Warehousing
Data as a Service
Provisioning
Monitoring/Alerting
Backup/Restore
Configuration & Customization
Elastic Performance
Availability/DR
Metering
Cloud Abstraction layer
Deployment & Test Automation
App Provisioning, isolation, multi-tenancy & Life-cycle
Menus & Navigation
API Management
Caching
DMZ Gateway
Service Registry
High-performance Scala-based framework
Multi-tenant Compliance & Security Framework
Workflows, Business & Compliance Rules, Messaging, Integrations
Compliant Data Management: Transactional, Reporting & Analytics
Multi-tenant Operational Framework
Multi-tenant Cloud Provider Independent Rapid Deployment PaaS
Customer Experience
Environment Overview
Financial Industry faces:
– Increasing regulatory requirements
– Increasing customer compliance requirements
– Regulatory & customers’ changing requirements need correlating data across sources
– Risk Analysis requires correlating internal and non-conforming structured external data sets
– Existing data stores unable to respond rapidly
Organizations need solutions that:
– Enable financial institutions to address changing regulations
– Enable automated compliance monitoring processes and systems
– Improve internal controls by retaining data lineage to unmodified source datasets
– Provide actionable & timely information from data
– Enable on-demand reporting on massive datasets based on schema defined with the reporting request
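As a rough illustration of the last two needs (lineage back to unmodified source data, and a schema that arrives with the reporting request), here is a minimal Python sketch. The file name, column names, and rows are invented for the example, and `run_report` is a hypothetical helper, not part of any Altisource system.

```python
import csv
import io

# Hypothetical raw extract, kept exactly as received (no cleansing on load).
RAW_SOURCE = "servicing_extract.csv"  # invented source name
RAW_TEXT = """loan_number,borrower,upb,status
LN-001,A. Smith,150000.50,current
LN-002,B. Jones,98000,default
LN-003,C. Lee,not_available,current
"""

def run_report(raw_text, source, schema, predicate=lambda row: True):
    """Apply a request-supplied schema to raw rows at read time.

    schema maps an output field to (raw column, converter). Each result row
    carries a lineage pointer (source name and line number) back to the
    unmodified record. Rows that fail conversion are returned as rejects.
    """
    rows, rejects = [], []
    reader = csv.DictReader(io.StringIO(raw_text))
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        lineage = f"{source}:{line_no}"
        try:
            row = {out: conv(raw[col]) for out, (col, conv) in schema.items()}
        except (KeyError, ValueError):
            rejects.append(lineage)
            continue
        if predicate(row):
            rows.append({**row, "_lineage": lineage})
    return rows, rejects

# The schema is part of the reporting request, not fixed when data is loaded.
request_schema = {
    "loan": ("loan_number", str),
    "unpaid_balance": ("upb", float),
    "status": ("status", str),
}
report, rejected = run_report(
    RAW_TEXT, RAW_SOURCE, request_schema,
    predicate=lambda r: r["status"] == "default",
)
print(report)
print(rejected)
```

Because the raw text is never rewritten, a different request can bring a different schema over the same records, and every reported row can be traced to its source line.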
The Traditional Enterprise Data Warehouse approach
– Maintain data integrity by storing the organizational data with regulatory information in the respective data dimensions.
– Data modeling and the design of facts and dimensions are critical to the success of a compliance data warehouse.
– Regulatory sufficiency needs to be maintained in the regulation dimension of the compliance data warehouse.
– From the data warehouse, data is transmitted into a regulatory data mart.
– Transformations of base data elements and regulation rules are maintained in the metadata.
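To make the facts-and-dimensions idea concrete, the sketch below builds a tiny star schema in SQLite with a regulation dimension and exposes a regulatory data mart as a view. All table names, columns, the example rule, and the rows are assumptions made up for illustration; the source does not describe the actual model.

```python
import sqlite3

# Hypothetical star schema: one fact table plus loan and regulation dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_loan (
    loan_id INTEGER PRIMARY KEY, loan_number TEXT, state TEXT);
CREATE TABLE dim_regulation (
    regulation_id INTEGER PRIMARY KEY, name TEXT, max_response_days INTEGER);
CREATE TABLE fact_borrower_contact (
    loan_id INTEGER REFERENCES dim_loan(loan_id),
    regulation_id INTEGER REFERENCES dim_regulation(regulation_id),
    response_days INTEGER);
-- The regulatory data mart is modeled here as a view over the warehouse.
CREATE VIEW mart_compliance AS
SELECT l.loan_number,
       r.name AS regulation,
       f.response_days,
       r.max_response_days,
       f.response_days <= r.max_response_days AS compliant
FROM fact_borrower_contact f
JOIN dim_loan l ON l.loan_id = f.loan_id
JOIN dim_regulation r ON r.regulation_id = f.regulation_id;
""")
conn.executemany(
    "INSERT INTO dim_loan VALUES (?, ?, ?)",
    [(1, "LN-001", "GA"), (2, "LN-002", "FL")],
)
conn.execute("INSERT INTO dim_regulation VALUES (1, 'Example response rule', 5)")
conn.executemany(
    "INSERT INTO fact_borrower_contact VALUES (?, ?, ?)",
    [(1, 1, 3), (2, 1, 9)],
)

# Loans whose recorded response time exceeds the limit held in the dimension.
violations = conn.execute(
    "SELECT loan_number FROM mart_compliance WHERE compliant = 0 ORDER BY loan_number"
).fetchall()
print(violations)
```

Keeping the limit in the regulation dimension means a rule change is a data update there, while the mart definition stays the same.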
Metadata, ETL & Core Model
Metadata
– Reflects the business and business processes
– In EDW, all functionality is metadata driven
– Data definitions
  • Source and Core model from technical perspective
  • business perspective
– Transformations & Aggregations
  • Transformations to derive cleansed Core data from source data
  • Aggregations and de-normalizations to create Access model data
Loading Service
– Reflects the business and business processes
– In EDW, all functionality is metadata driven
– Data definitions
  • Source and Core model from technical perspective
  • business perspective
– Transformations & Aggregations
  • Transformations to derive cleansed Core data from source data
  • Aggregations and de-normalizations to create Access model data
Access Model
– Data in the Access model is directly traceable to the Core model
  • De-normalized
  • Aggregated
  • Designed for query and access performance.
  • End user requirements
  • Access restrictions and controls
– As a design decision, access model objects can either be physical structures, or structures materialized on access.
Core Model
– Control
  • Only approved, tested, and validated processes can update data in the Core model.
– Data Model
  • Highly Normalized
  • No redundancy
  • Subject areas, loans, investors..
  • Not optimized for apps
  • target data formats post-cleansing
  • It is optimized for efficiency and correctness.
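A metadata-driven load can be sketched in a few lines of Python: cleansing rules live in a metadata table, Core rows are derived from source rows by applying those rules, and an Access-model summary is an aggregation over Core. The field names, rules, and sample rows below are hypothetical and only illustrate the pattern.

```python
from collections import defaultdict

# Hypothetical metadata: each Core field names its source column and a cleansing
# transformation. Changing a rule means editing this table, not the loader code.
CORE_METADATA = {
    "loan_number": {"source": "LoanNo", "transform": str.strip},
    "investor": {"source": "Investor", "transform": lambda v: v.strip().upper()},
    "upb": {"source": "UPB", "transform": lambda v: float(v.replace(",", ""))},
}

def load_core(source_rows, metadata):
    """Derive cleansed Core rows from source rows, driven only by metadata."""
    return [
        {field: rule["transform"](row[rule["source"]]) for field, rule in metadata.items()}
        for row in source_rows
    ]

def build_access_model(core_rows, group_by, measure):
    """Aggregate Core rows into a de-normalized Access-model summary."""
    totals = defaultdict(float)
    for row in core_rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)

source_rows = [  # invented sample data
    {"LoanNo": " LN-001 ", "Investor": "fnma", "UPB": "150,000.00"},
    {"LoanNo": "LN-002", "Investor": " FNMA", "UPB": "50,000"},
    {"LoanNo": "LN-003", "Investor": "fhlmc", "UPB": "75,500.25"},
]
core = load_core(source_rows, CORE_METADATA)
access = build_access_model(core, group_by="investor", measure="upb")
print(access)
```

Here the Access-model totals are computed on demand from Core, which corresponds to the "materialized on access" option; writing `access` to a table would be the physical-structure option.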
The Change/Update Process
Each change involves the following steps:
– Update the extraction module
– Update the staging module
– Update the transformation module
– Update the metadata
– Update the data repository
This process is tedious, involved, and brittle.
Enhancing with Big Data Technologies
We were driven to adopt big data technology for many reasons:
– Demand to analyze new data sources in an ever shorter timeframe
– Growth in data complexity
– Variety of data types
– Volume of data and inability to move it around due to time constraints
– Velocity of data generation, internal and external
– Veracity of data from multiple sources
– Growth in analytical complexity
– Increasing availability of cost-effective computing and data storage
The biggest reason for us was the frequent change in requirements driven by business and regulatory changes. Spark is a more flexible platform.
Data as a Service
Data Mobilization View for Data Lake
Borrower Data Service
Request Details:
– Service Name: Borrower Data
– Context: Current/Cleansed & Conformed/History
– Request Filter: <Borrower Name>, <Date Range>
Response Details:
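As a rough sketch of the request side only, the Python below models a Borrower Data request with a service name, a context, and a name and date-range filter. It assumes the three contexts are alternatives a caller picks from; the record layout, the `handle` function, and the sample data are invented, since the source does not show the response format.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum

class Context(Enum):
    CURRENT = "Current"
    CLEANSED_CONFORMED = "Cleansed & Conformed"
    HISTORY = "History"

@dataclass(frozen=True)
class BorrowerDataRequest:
    borrower_name: str
    start: date
    end: date
    context: Context = Context.CURRENT
    service_name: str = "Borrower Data"

# Invented in-memory records standing in for the data lake; a real service
# would query whichever store backs each context.
RECORDS = {
    Context.CURRENT: [
        {"borrower_name": "A. Smith", "as_of": date(2014, 9, 1), "loan_number": "LN-001"},
        {"borrower_name": "B. Jones", "as_of": date(2014, 9, 1), "loan_number": "LN-002"},
    ],
    Context.HISTORY: [
        {"borrower_name": "A. Smith", "as_of": date(2013, 5, 1), "loan_number": "LN-001"},
        {"borrower_name": "A. Smith", "as_of": date(2014, 2, 1), "loan_number": "LN-001"},
    ],
}

def handle(request):
    """Return records in the requested context that match the filter."""
    candidates = RECORDS.get(request.context, [])
    return [
        r for r in candidates
        if r["borrower_name"] == request.borrower_name
        and request.start <= r["as_of"] <= request.end
    ]

history = handle(
    BorrowerDataRequest("A. Smith", date(2014, 1, 1), date(2014, 12, 31), Context.HISTORY)
)
print(history)
```

The same request shape would fit a loan-keyed service by swapping the name filter for a loan number.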
Mortgage Data Service
Request Details:
– Service Name: Mortgage Data
– Context: Current/Cleansed & Conformed/History
– Request Filter: <Loan Number>, <Date Range>
Response Details:
Loan Default Event
Event Details:
– Event Name: Loan Default
– Context: Current
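One way to picture an event such as Loan Default is a small publish/subscribe loop: a scan over current loan data publishes the event, and interested consumers receive it. The sketch below is in-process Python with a hypothetical `EventBus`; the 90-days-past-due trigger is purely an assumption for the example, because the source does not define what raises the event.

```python
from collections import defaultdict

class EventBus:
    """Minimal in-process publish/subscribe registry."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, payload):
        for callback in self._subscribers[event_name]:
            callback(payload)

# Assumption for illustration only: a loan counts as defaulted once it is
# 90 or more days past due. The source does not define the trigger.
DEFAULT_THRESHOLD_DAYS = 90

def scan_for_defaults(loans, bus):
    """Publish a "Loan Default" event for each loan at or over the threshold."""
    for loan in loans:
        if loan["days_past_due"] >= DEFAULT_THRESHOLD_DAYS:
            bus.publish(
                "Loan Default",
                {"loan_number": loan["loan_number"], "context": "Current"},
            )

received = []
bus = EventBus()
bus.subscribe("Loan Default", received.append)
scan_for_defaults(
    [
        {"loan_number": "LN-001", "days_past_due": 12},   # invented sample loans
        {"loan_number": "LN-002", "days_past_due": 120},
    ],
    bus,
)
print(received)
```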
Data Lake vs. Data Warehouse
Feature | Data Lake | Data Warehouse
Data Volume | Extremely large (Petabytes) | Large (Terabytes)
Access Methods | NoSQL | SQL
Schema | Schema on read | Schema on write
Scalability | Scales horizontally | Scales vertically
Hardware | Commodity hardware | Specialized hardware/appliances
Data Structure | Structured and unstructured | Structured
Data | Raw | Cleansed/Aggregated
Data Lake Technology Stack
– BI/ETL tools, Services/Application Portals, Interactive Analytics Portals
– ODBC/JDBC, In-house API, 3rd Party Drivers, In-house Drivers
– Spark Streaming, Hive/HQL, Spark SQL, GraphX, MLlib/SparkR
– RDD Instances/Schemas
– Spark (DAG construct and execute engine)
– YARN
– Data Storage: Cassandra/HDFS/Parquet
– External Data Stores
Benefits of Apache Spark based Data Lake
– Load data as it is stored in the source system; no transformation needed
– Build structure on it: apply Hive external tables on this raw data
– Data sets built with our business logic
– The intermediate and final results are saved back to data storage
– Working data sets saved as Parquet files
– Distinction between data view and update view
– When the data file changes in Hadoop or Cassandra, we only have to update the Hive schema or SchemaRDDs; then we are done.
Data Storage Access Layer
– Abstract the details of data access through contexts/drivers.
  • Hive/HQL
  • Spark SQL
  • Cassandra driver for Spark
– Unify the data into RDD interfaces.
  • SchemaRDD
  • HadoopRDD
  • CassandraRDD
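The pattern here (per-store drivers hidden behind one uniform row interface) can be shown without Spark. The Python sketch below is a stand-in, not the Spark API: `CsvDriver` and `JsonLinesDriver` are hypothetical classes playing the role that the Hive, Spark SQL, and Cassandra drivers play in the stack, and the `Driver` protocol plays the role of the common RDD interface.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator, List, Protocol

Row = Dict[str, str]

class Driver(Protocol):
    """What every storage driver must provide: an iterator of uniform rows."""

    def rows(self) -> Iterator[Row]: ...

class CsvDriver:
    """Stand-in driver for delimited text."""

    def __init__(self, text: str):
        self._text = text

    def rows(self) -> Iterator[Row]:
        yield from csv.DictReader(io.StringIO(self._text))

class JsonLinesDriver:
    """Stand-in driver for one-JSON-object-per-line data."""

    def __init__(self, text: str):
        self._text = text

    def rows(self) -> Iterator[Row]:
        for line in self._text.splitlines():
            if line.strip():
                yield {k: str(v) for k, v in json.loads(line).items()}

def loan_numbers(drivers: Iterable[Driver]) -> List[str]:
    """Consumer code sees one interface, whatever the backing store is."""
    return sorted(row["loan_number"] for d in drivers for row in d.rows())

sources = [  # invented sample data
    CsvDriver("loan_number,status\nLN-002,default\n"),
    JsonLinesDriver('{"loan_number": "LN-001", "status": "current"}\n'),
]
print(loan_numbers(sources))
```

Adding a new store then means writing one more driver; code written against the shared interface does not change.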
Code Samples - Apply Hive Schema to Raw Data
– Pour data into HDFS
– Create Hive Schema
– Use HQL inside Spark SQL
– Save result in Parquet format
RDBMS’s, Excel Files, Documents, External Sources
Cluster Details: 16 VM’s, 128 GB Memory, 126 GB Disk
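The original code samples did not survive extraction, so the following is a reconstruction of the general idea rather than the presenter's code. It is written against the current PySpark `SparkSession` API, which postdates this talk; the table name, columns, and HDFS paths are hypothetical. The function takes an already-built session, so the block itself has no PySpark import.

```python
RAW_LOCATION = "/data/raw/loans"           # hypothetical HDFS directory of CSV files
PARQUET_OUTPUT = "/data/working/defaults"  # hypothetical output path

# Hive external table: a schema laid over the raw files, which stay untouched.
CREATE_SCHEMA_HQL = f"""
CREATE EXTERNAL TABLE IF NOT EXISTS raw_loans (
    loan_number STRING,
    borrower_name STRING,
    status STRING,
    unpaid_balance DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '{RAW_LOCATION}'
"""

# Business logic expressed as HQL and executed through Spark SQL.
DEFAULTS_HQL = """
SELECT loan_number, borrower_name, unpaid_balance
FROM raw_loans
WHERE status = 'default'
"""

def apply_hive_schema(spark, output_path=PARQUET_OUTPUT):
    """Create the schema, query it, and save the working set as Parquet.

    `spark` is expected to be a pyspark.sql.SparkSession built with
    enableHiveSupport(). Copying the files into HDFS happens outside Spark.
    """
    spark.sql(CREATE_SCHEMA_HQL)
    result = spark.sql(DEFAULTS_HQL)
    result.write.mode("overwrite").parquet(output_path)
    return result
```

With PySpark installed and a Hive metastore available, a session could be created with `SparkSession.builder.appName("hive-schema").enableHiveSupport().getOrCreate()` and passed to `apply_hive_schema`. If the raw file layout changes, only the DDL string needs to change.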
The Spark Cluster
– Apps, Services, Tools
– Spark Driver
– Workers
– Data
– Storage
Performance observations 10 18 Rows
Times: 4.5 hrs, 48 minutes, 1 min
System | Cores | Memory | Disk
Engineered Solutions | 128 | 2048 GB | 12 TB
In-memory Databases | 160 | 2048 GB | 12 TB
Spark Cluster (VM’s) | 128 | 2048 GB | 12 TB
Challenges
– Open source Apache Spark, while very promising, has to mature
– Spark production deployment is complicated
– Security of data is not enterprise class; it needs additional layers
– Tools ecosystem is still developing
– BI tools still in development
But..
– Done right, it has a lot of business value
– We are hiring engineers!
Q & A