Scaling Blackboard Learn™ for High Performance and Availability
Stephen Feldman
Sr. Director Performance, Security and Architecture
Quick Bio
• Blackboard since 2003
• Performance Engineering from the start
• Platform Architecture in 2005
• Security Engineering in 2010
“Love my job…love my team. If you email me, I will respond.”
@seven_seconds
http://goo.gl/Z4Rq5
A Quick History Lesson of Bb…
• First release was 6.0.11, launched within a few weeks of arriving.
• Technology shift from Perl to Java through Release 5 and Release 8.
– Blackboard was the largest PerlEx ISV in the world in 2003.
• Customers were having issues with optimizing Java, Oracle and SQL Server.
• First benchmark was at Sun in 2004, called the Tunathon.
– Learned that Blackboard Learn could scale, and could scale to high levels with a little TLC.
As We Started Growing and Scaling
• In late 2004, we started building the Ref Arch as a model for customers
• Proved it out in benchmarks, as well as our own hosting facilities.
• We needed other players to come in and work with us to help us learn and validate a solution
• Key to our success: aggressive port from Perl to Java, earliest adoption of technologies: Solaris 10, Oracle 10g, RHEL 4 and 5, SQL Server 2005 and Java 5/6
– Willingness to adopt virtualization very early on
– Willingness to open our technology stack for affordable solutions such as NFS and CIFS
Where We Are Today
• We have multiple customers supporting nearly 1 million users and dozens with well over 250k live production users.
• Our benchmarks have been successful supporting over 1 million users with greater than 100k simultaneous sessions with sub-3s response times.
• The majority of our customers have benefitted from the Reference Architecture and have completely transformed their deployment to support the adoption and growth of the product.
In The Beginning: RefArch I
Focus of RefArch I
• Distribution of application and database
– Need for load-balancing the application server
– Early JVM clustering
• Fiber Storage and High-Speed Disks
– Low-cost option to use JBODs
• Basic operational monitoring
– Hardware, Network and Storage
– Database
• Keep it simple and you will succeed
A Few Years Later Came RefArch II
[Diagram: Blackboard Reference Architecture]
Diagram labels: User Experience; Application Layer; Database Layer; Enterprise Storage (Optimization, Backup, Recovery, Growth); Federated Applications; Enterprise Search; Other WSI ...; Analysis; SNMP; Management; Monitoring; Integration; Event-Driven Mgmt.; Advanced Reporting; Behavior Model Studies; Campaign Mgmt.; SIS & Back office; B2 Partners; Campus Systems; Publishers; Directory Svcs.; SSO; Portals; SMS & Mobile; Virtualization; Clustering; Load Balancing; Beyond Services; Blackboard; Institution; Access; Security; Identify
…Then Marketing Got their Hands on It
Focus of RefArch II
Infrastructure
• Virtualization
• Blade Computing
• NFS/CIFS/iSCSI Storage
• Mobile Access
• Identity Management
Monitoring Services
• User Experience Monitoring
• Enterprise Infrastructure Monitoring
• Database Trending
• JVM (JMX Monitoring)
• Synthetic Monitoring
Optimization
• 64-bit Computing
• Compression/Caching
• Image Optimization
• JVM Optimization
• Database Wait Event Tuning
What are we modeling today and in the future…
Large Connected Communities
• 100’s to 1000’s of Concurrent Requests
Heavy Adoption of Advanced Tools
• Emphasis on Mobility and Synchronous Computing
Extended/Frequent Time in System
• Ubiquitous Access
Richer Content and User Experience
• Instantaneous and Immediate Expectations
Reference Architecture III
Unified Approach Working Together
RefArch1 RefArch2 RefArch3
No Longer Center, but Parallel…
Introducing RefArch III
Identity & Access Management
Logging and Monitoring Cloud Services Secure Performance Immunity
Analytics
Web Optimization Mobility Virtualization & Provisioning
Data Management
Now Comes RefArch III
Accessibility
• Ubiquitous Access and Mobility
• Cloud Service Management
• SAAS Application Integration
• Cloud-Based Benchmark/Testing
• Web Optimization/Acceleration
Confidence
• Advanced System Provisioning
• Enterprise Monitoring Lifecycle Management
• Enterprise Logging
• Institutional Analytics
• Secure Management/Infrastructure
Defining SLAs
• Performance: The amount of useful work accomplished by a computer system compared to the time and resources used.
• Scalability: The ability of a distributed system to expand by accommodating greater levels of load while maintaining similar levels of performance.
• Availability: The capability to service a functional request without issue under conditions of desired performance and workload scalability.
Defining SLAs
• Define Metrics: Goal Setting
• Identify Method of Gathering: Isolate Tools and Processes
• Implement Instrumentation: Begin Measuring
• Align to KPI/ROI: Share with Stakeholders
• Recommend Changes: Show Business Value
• Reset Expectations: New Initiatives
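The cycle above begins with a metric and a goal. As a minimal sketch of that first step, assume a hypothetical SLA of a 3-second 95th-percentile response time. The 3 s figure echoes the sub-3s benchmark result quoted earlier in this deck; the percentile choice, the function names and the sample data are illustrative assumptions, not something the deck prescribes.

```python
import math


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # nearest-rank method
    return ordered[max(rank, 1) - 1]


def meets_sla(response_times_s, target_s=3.0, pct=95):
    """True if the pct-th percentile response time is within the target."""
    return percentile(response_times_s, pct) <= target_s


# Hypothetical measurements in seconds, for illustration only.
measured = [0.8, 1.1, 1.4, 0.9, 2.7, 1.2, 1.0, 3.4, 1.3, 1.1]
print(percentile(measured, 95), meets_sla(measured))
```

A percentile is used here rather than an average so that a handful of slow requests cannot hide behind many fast ones; which percentile to report is a decision to make with stakeholders.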
What is Performance?
• Performance is quantifiable and measurable
• Performance is also perception
• Mostly recognized from a cognitive perspective
– Instantaneous
– Immediate
– Continuous
– Captive
Response Time Latency Performance
What is Scalability?
What is Availability?
• High-availability offerings mask the effects of a system failure in order to minimize the impact on access to, and functional use of, a system for a community of users.
• Simple Definition:
– Percentage of time the system is in its operational state.
• You will often hear the concept of 3x9’s, 4x9’s or even 5x9’s
– Planned versus Unplanned
• Availability = (Total Units of Time – Downtime) / Total Units of Time
– 8760 hours in a year
– Downtime = 10 hours
– Availability = (8760 – 10)/8760 ≈ 99.89%
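The availability formula above can be written as a small function. This is a sketch only; the function and constant names are illustrative, and the 8760-hour year is the figure used on the slide.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours, as on the slide


def availability_pct(downtime_hours, total_hours=HOURS_PER_YEAR):
    """Availability = (total time - downtime) / total time, as a percentage."""
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime must be between 0 and total_hours")
    return (total_hours - downtime_hours) / total_hours * 100


# The slide's example: 10 hours of downtime in one year.
print(f"{availability_pct(10):.2f}%")
```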
Quick View into Availability Statistics
Availability Percentage Model    Unexpected Downtime per Year
90%                              36.5 days
95%                              18.25 days
98%                              7.30 days
99%                              3.65 days
99.5%                            1.83 days
99.8%                            17.52 hours
99.9%                            8.76 hours
99.95%                           4.38 hours
99.99%                           52.6 minutes
99.999%                          5.26 minutes
99.9999%                         31.5 seconds
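The figures in this table follow from inverting the availability formula: downtime per year is the unavailable fraction multiplied by the length of the year. A minimal sketch, assuming a 365-day year (the function name is illustrative):

```python
from datetime import timedelta


def downtime_per_year(availability_percent, days_per_year=365):
    """Return the downtime allowed per year at a given availability level."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100 - availability_percent) / 100
    return timedelta(days=days_per_year * unavailable_fraction)


for level in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year(level).total_seconds() / 3600
    print(f"{level}% -> {hours:.2f} hours per year")
```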
Automated Provisioning
• Simple routine of provisioning systems
• Master processes and reduce human error
• Balance workloads
• Quick recovery
• Emphasis on efficient computing
Complete Monitoring and Logging Solutions
Performance
• User Experience Monitoring
• Application Lifecycle Management
• Database Wait Event Monitoring
Scalability
• Infrastructure Resource Monitoring
• JMX Monitoring
• Database Trending
• Log Management
Availability
• Infrastructure Trending
• Remote Synthetic Monitoring
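Synthetic monitoring, listed above under Availability, runs a scripted check on a schedule and records whether it succeeded and how long it took. The sketch below is an assumption-laden illustration: the probes are hypothetical stand-ins for a real request (for example, loading a login page), and the 3-second threshold and all names are invented for the example.

```python
import time


def run_check(probe, threshold_s=3.0):
    """Run one synthetic check; return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        probe()
        ok = True
    except Exception:
        ok = False
    elapsed = time.perf_counter() - start
    # A response slower than the threshold counts as a failed check.
    return ok and elapsed <= threshold_s, elapsed


def summarize(results):
    """Return (percent of checks that succeeded, slowest elapsed time)."""
    if not results:
        raise ValueError("results must not be empty")
    passed = sum(1 for ok, _ in results if ok)
    return passed / len(results) * 100, max(elapsed for _, elapsed in results)


# Hypothetical probes standing in for real requests to the application.
def healthy_probe():
    time.sleep(0.01)


def failing_probe():
    raise ConnectionError("simulated outage")


probes = (healthy_probe, healthy_probe, failing_probe, healthy_probe)
results = [run_check(p) for p in probes]
print(summarize(results))
```

Running such checks from a remote location, as the slide suggests, measures what users outside the data center actually experience.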
Application Lifecycle Management
• True application insight and visibility
• Business processing mapping to transaction SLAs
• Multi-layer correlation
• Transaction workflow mapping
Web Optimization Services
• Typical Optimization Services
– Compression
– Domain Sharding
– Minification
– Consolidation
– Inlining
– Asynchronous JavaScript
– Response Prediction
– Browser Caching
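To give a feel for the first item, compression, the sketch below gzips a repetitive HTML payload using Python's standard library and reports the size reduction. The page content is hypothetical; real savings depend on the content being served, and in production compression is normally handled by the web server or an optimization layer rather than by application code like this.

```python
import gzip

# Hypothetical, repetitive HTML standing in for a real page response.
page = (
    "<html><body>"
    + "<div class='item'>Course announcement</div>" * 200
    + "</body></html>"
).encode("utf-8")

compressed = gzip.compress(page)
ratio = len(compressed) / len(page)

# Compression is lossless: decompressing returns the original bytes.
assert gzip.decompress(compressed) == page
print(f"{len(page)} bytes -> {len(compressed)} bytes ({ratio:.1%} of original)")
```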
Present and Future of Caches
• Caches are used throughout Blackboard Learn to manage the life and reuse of data.
– Leveraging ehCache presently in Release 9.1
• Caches can and should be controlled via the cache-settings.properties file
– Insight into the caches can be achieved in the Admin Console and other JMX tools.
• Next generation of caches: pluggable caches (use your own) and distributed caches
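"Managing the life and reuse of data" can be illustrated with a toy time-to-live cache. This is a generic sketch, not ehCache and not how Blackboard Learn implements its caches; the class, the 60-second lifetime, and the course loader are all hypothetical.

```python
import time


class TTLCache:
    """A small in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader() on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: reuse the stored data
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value


# Usage with a fake clock so expiry can be shown without waiting.
fake_now = [0.0]
loads = []


def load_course():
    loads.append(1)  # count how often the expensive load runs
    return {"id": "course-101"}


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get("course-101", load_course)  # miss: loads the data
cache.get("course-101", load_course)  # hit: reuses the cached value
fake_now[0] = 61.0
cache.get("course-101", load_course)  # expired: loads again
print(len(loads))
```

The lifetime is the tuning knob: a longer one means more reuse and less load on the database, at the cost of serving older data.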
Steve Feldman
@seven_seconds
Please provide feedback for this session by [email protected].
Scaling Blackboard Learn™ for High Performance and Delivery