A Public Cloud Based SOA Workflow for Machine Learning Based Recommendation Algorithms
Presented By Srinivasan Thanukrishnan, Founder & CEO
Ram G Athreya, Research Intern
Glosys Technology Solutions Pvt. Ltd.
Chennai, India
5th IEEE International Conference on Cloud & Service Computing – SC2 2015
Outline
• Challenges
• Motivation
• Proposed System
• Design Components
• Experimental Results
• Conclusion
• Future Work
Challenges
• Existing workflows for web based (SOA) applications elaborate only on certain aspects such as
– Cloud Computing
– Backend
– Frontend
– Development & Testing
– Machine Learning
• How do all these work together at a big picture level?
Motivation
• Our aim is to combine these vast and disparate fields and provide a cohesive framework for building such applications
• We propose a multi layered architecture that will simulate the structure of a cloud environment in terms of frontend and backend, and examine how it can leverage Machine Learning
• For completeness we created an actual Retail Application which integrates the above technologies
Proposed System
• It comprises three major modules which are
– Product Information System (PIS)
– Analytics Based Inventory Management (ABIM)
– Transaction Based Analytics (TBA)
• To build a framework that will simulate the architectural setup of an e-commerce site
• To examine how it can improve its sales by employing intelligence
• To derive a general workflow on how such systems can be built end-to-end starting from the user interface up to the machine learning algorithm that powers it in the backend
Design Components
• Cloud Architecture
• Back End Application Stack
• User Interface Design
• Development Environment
• Load Testing
• Machine Learning Based Recommendation Algorithms
Cloud Architecture
• Core components of the Cloud Architecture are
– Content Delivery Network (CDN)
– Load Balancer
– Server Instances
– Storage Services
Content Delivery Network (CDN)
• It is a large distributed network of servers across geographies
• It serves assets such as images, css, js
• The CDN caches the responses to requests
• Thus the load on the origin server is reduced
Load Balancers
• They optimize resource use, maximize throughput and minimize response time
• They employ round-robin or least recently used algorithms to route internet traffic
• They provide the ability to auto-scale
• During a traffic spike, auto-scaling automatically increases the number of application server instances
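The round-robin routing mentioned above can be illustrated with a minimal sketch. The class name and the instance names are hypothetical and chosen only for illustration; a production load balancer would also handle health checks and auto-scaling, which are left out here.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical instance names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)
```

Five consecutive requests are spread evenly over the three instances, wrapping around to the first instance after the last one has been used.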
Server Instances
• They generate responses with the help of backing services
• They contain only application code, which is version controlled
• Creating a new instance is as simple as checking out the latest version of the codebase and deploying it within an instance
• They are commodity servers which can be scaled on demand
Storage Services
• Storage Services
– Database
– Middleware (Cache)
– Static Assets (AWS S3)
Storage Services
• To protect the database and ensure its availability, a Master-Slave setup is required
• All database write operations happen at the master and are replicated to the slaves, while the read operations are carried out on the slave instances
• If the master fails, one of the slave nodes becomes the new master
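The read/write split and the failover described above can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class and method names are hypothetical, the slaves are called replicas in the code, and replication is immediate, whereas a real database usually replicates asynchronously.

```python
import itertools


class ReplicatedDatabase:
    """Toy in-memory model of one master and several read replicas."""

    def __init__(self, replica_count):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # Writes go to the master and are then copied to every replica.
        self.master[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are spread across the replicas and never hit the master.
        replica = self.replicas[next(self._next_replica)]
        return replica.get(key)

    def fail_over(self):
        # When the master fails, the first replica is promoted to master.
        # Assumes at least one replica remains afterwards to serve reads.
        self.master = self.replicas.pop(0)
        self._next_replica = itertools.cycle(range(len(self.replicas)))


db = ReplicatedDatabase(replica_count=2)
db.write("product:1", "smartphone")
value_before = db.read("product:1")
db.fail_over()
value_after = db.read("product:1")
```

Because every write was replicated before the failover, the promoted node already holds the data and reads continue to succeed.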
Storage Services
• The cache lies between the application servers and the database
• It has in-memory (RAM) storage
• This speeds up requests, since fewer queries hit the database
• AWS S3 was used to store static assets such as images, css & js
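The effect of placing a cache between the application servers and the database can be sketched as follows. This is a minimal read-through example that uses plain dictionaries as stand-ins for the cache and the database; the class name and the key are hypothetical, and expiry and invalidation are not modeled.

```python
class CachedStore:
    """Looks in the in-memory cache first and falls back to the database."""

    def __init__(self, database):
        self.database = database  # stand-in for the RDBMS
        self.cache = {}           # stand-in for the in-memory cache
        self.database_hits = 0

    def get(self, key):
        if key in self.cache:
            # Cache hit: the database is not queried at all.
            return self.cache[key]
        # Cache miss: query the database once and remember the result.
        self.database_hits += 1
        value = self.database[key]
        self.cache[key] = value
        return value


store = CachedStore({"product:1": "smartphone"})
first = store.get("product:1")
second = store.get("product:1")
print(store.database_hits)
```

The second lookup is answered from memory, so only one query reaches the database.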
Back End Application Stack
• Model-View-Controller (MVC)
– Promotes the principle of ‘separation of concerns’
– The Model is responsible for managing the data required by the application
– The View is responsible for presentation of data triggered by a Controller action
– Template systems are used to embed dynamic data within the HTML structure of the View
– The Controller is responsible for responding to user requests
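The division of responsibilities in MVC can be sketched in a few lines. This is an illustrative example only: the class names and product data are hypothetical, and the standard library `string.Template` stands in for a full template system.

```python
from string import Template


class ProductModel:
    """Model: manages the data required by the application."""

    def __init__(self):
        self._products = {1: "Smartphone", 2: "Earphones"}  # hypothetical data

    def find(self, product_id):
        return self._products.get(product_id)


class ProductView:
    """View: embeds dynamic data within an HTML template."""

    template = Template("<h1>$name</h1>")

    def render(self, name):
        return self.template.substitute(name=name)


class ProductController:
    """Controller: responds to a request by coordinating model and view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def show(self, product_id):
        name = self.model.find(product_id)
        if name is None:
            return "<h1>Not found</h1>"
        return self.view.render(name)


controller = ProductController(ProductModel(), ProductView())
page = controller.show(1)
missing = controller.show(99)
```

Each class can be changed independently, for example by swapping the template, which is the point of the separation of concerns.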
User Interface Design
• Three technologies come into play in this regard
– HTML
– CSS
– JS
HTML
• Basically a set of tags within which content is placed
• Starts with <html> tag
• Has two major sections which are <head> and <body>
• <head> contains metadata
• <body> contains all the content
CSS
• It controls the presentation of the content through rules that are defined on HTML selectors
• Additionally, LESS, a CSS pre-processor, was used
• LESS provides additional features such as variables, functions and mixins
• This makes CSS more maintainable, themable and extensible
JS
• For dependency management, Bower is used, which is a package manager for browser development
• Require.js is a library for asynchronously loading JavaScript dependencies within a web page
• jQuery is used for DOM navigation, event handling and AJAX calls to the server
Responsive Web Design
• Designing for a large variety of devices with varying screen sizes and resolutions is difficult
• To support multiple devices, a web design methodology called responsive web design (RWD) was used to provide an optimal viewing experience across a wide range of devices
• RWD achieves this capability with the help of CSS3 Media Queries, which are a W3C Recommendation
Development Environment
• Ideally, the development environment must be similar to the production environment
• Vagrant is a Free and Open Source Software (FOSS) tool for creating and configuring virtual development environments
• This setup ensures that environment related bugs are kept to a minimum
Development Environment
• Any Software Project would involve multiple developers working together.
• This brings about the need for a version control system (VCS), which makes tracking changes easy
• The Git Distributed Version Control System (DVCS) was used to commit and track code changes and was hosted in a GitHub repository
Load Testing
• Developers typically measure a Web application’s quality of service in terms of response time, throughput, and availability
• Load testing measures an application’s QoS performance based on actual customer behavior
• When customers access the site, a script recorder uses their requests to create interaction scripts
• A load generator then replays the scripts, possibly modified by test parameters, against the website
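The replay step above can be sketched as a small load generator. This is a simplified illustration: the recorded script is a hypothetical list of request paths, `fake_site` is a stand-in for the real web application, and requests are replayed sequentially rather than concurrently as a tool such as a real load generator would do.

```python
import time


def replay(script, handler, repetitions=1):
    """Replay a recorded script against a handler and collect QoS figures."""
    latencies = []
    failures = 0
    started = time.perf_counter()
    for _ in range(repetitions):
        for request in script:
            request_started = time.perf_counter()
            try:
                handler(request)
            except Exception:
                # A failed request counts against availability.
                failures += 1
            latencies.append(time.perf_counter() - request_started)
    elapsed = time.perf_counter() - started
    total = len(latencies)
    return {
        "requests": total,
        "mean_response_time": sum(latencies) / total if total else 0.0,
        "throughput": total / elapsed if elapsed > 0 else 0.0,
        "availability": (total - failures) / total if total else 0.0,
    }


def fake_site(request):
    # Stand-in for the web application; one path is made to fail on purpose.
    if request == "/broken":
        raise RuntimeError("simulated server error")
    return "200 OK"


# Hypothetical recorded interaction script.
recorded_script = ["/", "/products/1", "/cart", "/broken"]
report = replay(recorded_script, fake_site, repetitions=5)
```

The report contains the three quality-of-service measures named above: response time, throughput (requests per second) and availability (the fraction of requests that succeeded).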
Machine Learning Based Recommendation Algorithms
• To illustrate the intelligence portion of the system, Apriori and sequential pattern based machine learning algorithms were employed
• Both algorithms take the transaction data of user purchases as input based on which each algorithm individually makes predictions on what the user might buy next
• Although both algorithms try to find frequently occurring patterns in the dataset, they employ different methodologies and hence come up with slightly different results
• The algorithms were implemented using the R programming language
Apriori Based Algorithm
• The Apriori algorithm takes the historical transaction data of users (stored in the database) so that it can identify frequently occurring itemsets that can then be formulated into association rules.
• For example, a rule might state that a user who buys a smartphone is also likely to buy earphones, that is {smartphone} => {earphone}
• Such a rule can be found by the algorithm if there are enough transactions to support it
• These rules ultimately become insights on what the user might do next and can be given as product recommendations within the application
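Whether there are "enough transactions to support" a rule is usually judged with two standard measures, support and confidence. The sketch below computes both for the {smartphone} => {earphone} rule on a small, invented purchase history; the data and function names are assumptions made for illustration, and the presented system itself was implemented in R.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    matching = sum(1 for t in transactions if itemset <= set(t))
    return matching / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical purchase history, for illustration only.
transactions = [
    {"smartphone", "earphone"},
    {"smartphone", "earphone", "case"},
    {"smartphone"},
    {"laptop"},
]
rule_support = support(transactions, {"smartphone", "earphone"})
rule_confidence = confidence(transactions, {"smartphone"}, {"earphone"})
```

Here the rule appears in two of the four transactions (support 0.5), and two of the three smartphone buyers also bought earphones (confidence of about 0.67).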
Sequential Pattern Mining Based Algorithm
• The sequential pattern mining algorithm also attempts to mine relevant patterns from available data, but it additionally takes the order of the pattern into account
• The algorithm tries to find patterns based on the order in which transactions take place
• There are many variations of the sequential pattern mining algorithm; the one used by the program is called SPADE (Sequential PAttern Discovery using Equivalence classes)
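To show what taking the order into account means, the sketch below counts ordered pairs of items across customer purchase sequences. It is a deliberately naive illustration and not an implementation of SPADE; the function name and the data are hypothetical.

```python
from collections import Counter


def frequent_ordered_pairs(sequences, min_support):
    """Find pairs (a, b) where b is bought in a later transaction than a."""
    counts = Counter()
    for sequence in sequences:
        seen_pairs = set()
        for i, earlier in enumerate(sequence):
            for later in sequence[i + 1:]:
                for a in earlier:
                    for b in later:
                        seen_pairs.add((a, b))
        # Each customer sequence supports a given pattern at most once.
        counts.update(seen_pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical data: one list per customer, one set per transaction,
# with transactions listed in the order in which they took place.
sequences = [
    [{"smartphone"}, {"earphone"}, {"case"}],
    [{"smartphone"}, {"case"}, {"earphone"}],
    [{"earphone"}, {"smartphone"}],
]
patterns = frequent_ordered_pairs(sequences, min_support=2)
```

All three customers bought both a smartphone and earphones, but only two bought the smartphone first, so the pattern (smartphone, earphone) has support 2 while the reverse order has support 1 and is dropped.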
Cloud Based Development Environment
Technology | Software/Tool
CDN | CloudFront
Load Balancer | HAProxy
Server Instances | Ubuntu 14.04
Distributed Cache | Redis
RDBMS | MySQL
Assets Storage | AWS S3
Orchestration & Provisioning | Chef
Backend Development Environment
Technology | Software/Tool
Server Language | Node.js
Server Package Management | NPM
MVC Framework | Express.js
Template Engine | Jade
ORM | Node-ORM
Redis Library | node_redis
Authentication | Passport.js
Front End Development Environment
Technology | Software/Tool
Content | HTML
Presentation | CSS
Interactivity | JS
RWD Framework | Bootstrap
CSS Pre-processor | LESS
Frontend Framework | jQuery
Frontend Package Management | Bower
Dynamic Script Injection | Require.js
Other Tools
Technology | Software/Tool
Machine Learning Tool | R
Load Testing | Apache JMeter
Development Virtualization | Vagrant
VCS | Git
VCS Hosting | GitHub
Experimental Results
Conclusion
• We presented a workflow for creating online applications deployed in a cloud environment
• We looked at its cloud architecture in detail and at how different cloud appliances such as virtual machines and load balancers interact with each other
• We also focused on how such an application is built from the ground up, including its backend architecture, user interface built using the responsive web design technique as well as its development workflow
Conclusion
• For completeness, we examined how such a cloud application should be tested to ensure its reliability at scale
• Finally, we explored how such a system could leverage the vast amounts of data it collects and employed Apriori and sequential pattern based machine learning algorithms to generate insights about its users
• Using these insights the application can better assist its customers by providing relevant and timely recommendations based on their behavior
Future Work
• In future work, we plan to explore the performance of the algorithms used in such a cloud application, the interoperability between two or more algorithms, and the use of a more distributed architecture, such as Hadoop, for the machine learning setup
Thank You
The Apriori Algorithm: Example

Database D
TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D to obtain C1
itemset | sup.
{1} | 2
{2} | 3
{3} | 3
{4} | 1
{5} | 3

L1
itemset | sup.
{1} | 2
{2} | 3
{3} | 3
{5} | 3

C2 (generated from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D to obtain the supports of C2
itemset | sup
{1 2} | 1
{1 3} | 2
{1 5} | 1
{2 3} | 2
{2 5} | 3
{3 5} | 2

L2
itemset | sup
{1 3} | 2
{2 3} | 2
{2 5} | 3
{3 5} | 2

C3 (generated from L2)
itemset
{2 3 5}

Scan D to obtain L3
itemset | sup
{2 3 5} | 2
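The level-wise procedure in this example can be reproduced with a short sketch. It assumes a minimum support count of 2, which is consistent with the itemsets dropped between each C and L table above; the function name is hypothetical, and this is an illustrative Python version rather than the R implementation used in the presented system.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan of the database: count the transactions containing
        # each candidate itemset.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}  # C1
    frequent = {}
    k = 1
    while candidates:
        counts = count(candidates)
        # Lk: candidates that meet the minimum support count.
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets
        # are themselves frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


database = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(database, min_count=2)
```

With database D as input, the sketch yields the four itemsets of L1, the four of L2 and the single itemset {2 3 5} of L3 with support 2. The prune step is what discards {1 2 3} and {1 3 5} before the third scan, since {1 2} and {1 5} are not frequent.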