Revolutionizing enterprise web development
Prepare to Scale
Intro
• Performance is critical for any web application, and Drupal is no different.
• D6 performance out of the box is okay, but needs caching to shine.
• D7 is actually somewhat slower out of the box (perhaps 20% slower), but offers more, and easier, flexibility for scaling down the road.
• We’ll walk through building a site, from one server to a multi-tier infrastructure, with an eye to the future and to the common steps for improving performance over time.
• Performance: How fast pages are returned to a user.
• Scalability: How well a site can handle many users.
Basic Infrastructure
Single Server
• Database & Application on the same server
• Start optimizing what you have
• Web Server
• Drupal
• PHP
• Database
• Optimizations you make for the first server will be applicable for future servers
• Strategy: Optimize what you have, then divert traffic through caching and specialization
Web Server
• Apache
• Standard, but bloated
• Long track record; you know things will work.
• Nginx
• Lighter
• Faster
• Some edge cases can make it unsuitable as the web server.
Drupal
In one word: Pressflow
• Support for database replication
• Support for Squid/Varnish
• MySQL optimizations
• PHP5 optimizations
• http://fourkitchens.com/pressflow-makes-drupal-scale/downloads
• Currently relevant ONLY for D6; most of the above has been incorporated into D7.
DB
MyISAM
• Storage engine for MySQL, a relational database
• Default storage engine for Drupal 6 and earlier
• Good for selects
• Suits mostly read-only websites
• Poor mixed read/write performance, particularly on large websites, because writes lock the entire table
DB, Cont.
InnoDB is your friend in most scenarios
• Storage engine for MySQL, a relational database
• Row-level locking (vs. MyISAM’s table-level locking)
• Better performance under mixed read/write load
• Somewhat slower for purely read-only workloads
• Default storage engine for Drupal 7+
• Best bet at the moment for allowing your site to scale
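A toy model of why lock granularity matters, assuming each single-row write takes one time slot and conflicting writes must run one after another. It ignores real InnoDB details (MVCC reads, gap locks, deadlocks), and `rounds_needed` is a hypothetical helper, not a MySQL API.

```python
from collections import Counter


def rounds_needed(row_ids: list[int], granularity: str) -> int:
    """Minimum number of sequential rounds to finish a batch of concurrent
    single-row writes, assuming each write takes exactly one round.

    granularity="table": every write locks the whole table (MyISAM-style),
    so the writes run strictly one after another.
    granularity="row": a write locks only its own row (InnoDB-style), so
    writes to different rows can share a round.
    """
    if not row_ids:
        return 0
    if granularity == "table":
        return len(row_ids)
    if granularity == "row":
        # Only writes that target the same row have to wait for each other.
        return max(Counter(row_ids).values())
    raise ValueError(f"unknown granularity: {granularity!r}")


# Eight sessions writing at once; two of them hit row 1.
batch = [1, 2, 3, 4, 5, 6, 7, 1]
print("table-level:", rounds_needed(batch, "table"))  # 8
print("row-level:  ", rounds_needed(batch, "row"))    # 2
```

The gap widens as traffic grows: under table-level locking every extra writer adds a round, while under row-level locking only contention on the same rows does.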
DB, Cont.
On the horizon, or ‘other RDBMSs of note’
• Drizzle
• Ground-up rewrite of MySQL
• Slightly poorer performance than MySQL InnoDB at low volumes but far better scalability
• No production release
• MariaDB
• ‘Drop-in’ replacement for MySQL
• Uses XtraDB instead of InnoDB
• Superior performance to MySQL
DB, Cont.
Other DBs of Note (NoSQL)
• MongoDB
• Document-oriented DB
• Used by Examiner.com
• A D7 module is available
• Cassandra
• Column-oriented DB
• Originally built at Facebook for Inbox search
• Eventual consistency
PHP
Opcode Caching
• Sort of like having a compiled version of your application
• Skips PHP’s parse-and-compile step on every request
• Stores the compiled PHP bytecode in shared memory, ready for execution
• Result: Smaller PHP memory footprint (read: more users with less hardware) and faster execution of code
• Virtually a necessity for any large-scale/high-volume Drupal deployment
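To make the opcode-caching idea concrete, here is a Python analogy: Python’s built-in `compile()` plays the role of the PHP compiler, and an in-memory dict plays the role of the accelerator’s shared-memory store. `BytecodeCache` is a hypothetical class, not how APC or any other PHP accelerator is implemented; it keys entries on a content hash, whereas real opcode caches typically check the file on disk instead.

```python
import hashlib
from types import CodeType


class BytecodeCache:
    """Keeps compiled bytecode in memory so repeated requests skip the
    parse/compile step and go straight to execution (analogy only)."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], CodeType] = {}
        self.compile_count = 0

    def get_code(self, filename: str, source: str) -> CodeType:
        # Key on the file name plus a digest of its contents, so an edited
        # file is recompiled instead of serving stale bytecode.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        key = (filename, digest)
        code = self._entries.get(key)
        if code is None:
            code = compile(source, filename, "exec")  # the expensive step
            self.compile_count += 1
            self._entries[key] = code
        return code

    def run(self, filename: str, source: str) -> dict:
        namespace: dict = {}
        exec(self.get_code(filename, source), namespace)
        return namespace


cache = BytecodeCache()
script = "result = sum(range(10))"
for _ in range(3):  # three "requests" for the same script
    print(cache.run("page.py", script)["result"])  # 45 each time
print("compiled", cache.compile_count, "time(s)")  # compiled 1 time(s)
```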
PHP, Cont.
Opcode caching
• eAccelerator
• Maintained only off and on
• Only works with thread-safe PHP
• In my experience, has led to some strange crashes, white screens of death (WSOD), etc.
• Xcache
• Reasonable performance improvement, though it tends to benchmark slowest of the three
• Actively maintained
• Stable, but still prone to cache corruption, WSOD, etc.
PHP, Cont.
Opcode caching, cont.
• APC
• Current opcode cache of choice
• Most actively updated
• Most stable of the three
• Usually the winner in performance benchmarks
• Maintained by core PHP developers (including Rasmus Lerdorf)
Static Caching
Static Caching Modules
• Creates and stores rendered HTML versions of pages
• Serves them instead of building each page on request
• Depending on the implementation, can avoid loading any part of your application
• Acts as a layer between the user and the actual execution of your program
• Relieves the DB, since it is no longer involved in serving cached pages
• Reduces, or even eliminates, PHP execution
Static Caching, Cont.
Static Caching Modules, Cont.
• Boost Module
• Static file caching
• Good for Anonymous traffic only
• Great Performance for small sites
• Ideal for shared hosts
• AuthCache Module
• Static file caching
• Attempts to handle logged-in traffic
• Plays nice with and/or can utilize multiple caching engines
• Can be a bit of a pain for user-specific content, since you have to write a particular case for each user-specific area
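A minimal sketch of the static-caching flow, assuming a Boost-style policy where only anonymous requests are cached and `render` stands in for a full Drupal bootstrap plus DB queries. `StaticPageCache` is hypothetical; Boost itself writes HTML files to disk and has the web server serve them through rewrite rules, rather than running code like this.

```python
from typing import Callable


class StaticPageCache:
    """Serves stored HTML to anonymous visitors and only builds a page
    (in Drupal's case: PHP + DB) on a cache miss. Illustrative only."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render
        self._pages: dict[str, str] = {}
        self.render_count = 0

    def _build(self, path: str) -> str:
        self.render_count += 1
        return self._render(path)

    def get(self, path: str, logged_in: bool = False) -> str:
        if logged_in:
            # Boost-style: authenticated users always get a freshly built page.
            return self._build(path)
        if path not in self._pages:
            self._pages[path] = self._build(path)
        return self._pages[path]

    def invalidate(self, path: str) -> None:
        # Call when the underlying content changes so the next hit rebuilds it.
        self._pages.pop(path, None)


pages = StaticPageCache(lambda path: f"<html><body>{path}</body></html>")
pages.get("/about")                  # miss: page is built and stored
pages.get("/about")                  # hit: served from the cache
pages.get("/about", logged_in=True)  # bypasses the cache
print(pages.render_count)            # 2
```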
Static Caching, Cont.
Static Caching Modules, Cont.
• Shameless plug: Ajaxify Regions
• Aptly named... or not
• Actually pulls blocks, not regions, via AJAX
• Early release with plenty of work still to do; needs more real-world testing
• Automatically handles all user-specific block content based on block caching settings:
• BLOCK_NO_CACHE
• BLOCK_CACHE_PER_USER
• BLOCK_CACHE_PER_ROLE
• Concept: load via AJAX anything that can’t be cached for everyone
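The concept can be sketched as a partition of a page’s blocks by cache setting. This is an illustrative assumption, not Ajaxify Regions’ code: the constants below are string labels (Drupal defines them as integers), and the block names are made up.

```python
from dataclasses import dataclass

# Block cache modes named on the slide, plus two cacheable ones. These are
# string labels for illustration, not Drupal's actual integer constants.
BLOCK_NO_CACHE = "BLOCK_NO_CACHE"
BLOCK_CACHE_PER_USER = "BLOCK_CACHE_PER_USER"
BLOCK_CACHE_PER_ROLE = "BLOCK_CACHE_PER_ROLE"
BLOCK_CACHE_PER_PAGE = "BLOCK_CACHE_PER_PAGE"
BLOCK_CACHE_GLOBAL = "BLOCK_CACHE_GLOBAL"

# Anything whose output can differ between visitors is fetched after page load.
AJAX_MODES = {BLOCK_NO_CACHE, BLOCK_CACHE_PER_USER, BLOCK_CACHE_PER_ROLE}


@dataclass
class Block:
    name: str
    cache_mode: str


def split_blocks(blocks: list[Block]) -> tuple[list[str], list[str]]:
    """Return (blocks rendered into the shared cached page, blocks loaded via AJAX)."""
    inline: list[str] = []
    deferred: list[str] = []
    for block in blocks:
        target = deferred if block.cache_mode in AJAX_MODES else inline
        target.append(block.name)
    return inline, deferred


blocks = [
    Block("main_menu", BLOCK_CACHE_GLOBAL),
    Block("related_posts", BLOCK_CACHE_PER_PAGE),
    Block("user_login", BLOCK_CACHE_PER_USER),
    Block("admin_links", BLOCK_CACHE_PER_ROLE),
]
print(split_blocks(blocks))
# (['main_menu', 'related_posts'], ['user_login', 'admin_links'])
```

The full page can then be cached once for everyone, with placeholders for the deferred blocks filled in by separate requests.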
Object-level Caching
Object-level caching
• Provides a way to store fully-generated objects
• Can combine the results of many queries
• Think of all the queries run by a node_load() versus retrieving all that information in one request
• Stores the information in memory for fast access
• When MySQL can handle the load, performance is not significantly different from MySQL
• BUT it can handle a much higher load
• Protects the DB, the part of Drupal most likely to limit performance, from becoming overwhelmed
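A cache-aside sketch of object-level caching, assuming a plain dict stands in for APC or memcache and `CountingDB` stands in for MySQL. The table names echo Drupal 6’s schema but are only illustrative, and `build_node`/`load_node` are hypothetical helpers, not Drupal’s `node_load()`.

```python
class CountingDB:
    """Stand-in for MySQL that counts how many queries it answers."""

    def __init__(self, tables: dict[str, dict[int, object]]) -> None:
        self.tables = tables
        self.query_count = 0

    def query(self, table: str, nid: int) -> object:
        self.query_count += 1
        return self.tables[table][nid]


def build_node(db: CountingDB, nid: int) -> dict:
    # Several queries go into one fully built object (compare node_load()).
    node = {"nid": nid, "title": db.query("node", nid)}
    node["body"] = db.query("node_revisions", nid)
    node["terms"] = db.query("term_node", nid)
    return node


def load_node(cache: dict, db: CountingDB, nid: int) -> dict:
    """Cache-aside: try the object cache first, fall back to the DB on a miss."""
    key = f"node:{nid}"
    node = cache.get(key)
    if node is None:
        node = build_node(db, nid)
        cache[key] = node  # a real setup would store this in APC or memcache
    return node


db = CountingDB({
    "node": {1: "Hello"},
    "node_revisions": {1: "Body text"},
    "term_node": {1: ["drupal", "performance"]},
})
object_cache: dict = {}
for _ in range(100):  # 100 page views that all need node 1
    load_node(object_cache, db, 1)
print(db.query_count)  # 3: only the first view touched the DB
```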
Object-level Caching, Cont.
Object-level caching, Cont.
• APC
• Not a typo
• APC can handle object caching as well as op-code caching
• It’s fast: everything is stored in local memory
• It caches for only one server
• This means you can have synchronization issues between servers if you run more than one
• If that’s not an issue, it’s a quick and easy solution
• Ideal for single-server implementations or when synchronization isn’t an issue
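The synchronization problem can be shown with two simulated web servers, assuming one dict per server models APC’s local cache and a single shared dict models a networked cache such as memcache. All names here are hypothetical.

```python
class WebServer:
    """One web head. `cache` is either private to this server (like APC's
    local cache) or one store shared by all servers (like memcache)."""

    def __init__(self, cache: dict, db: dict) -> None:
        self.cache = cache
        self.db = db

    def read(self, key: str) -> str:
        if key not in self.cache:
            self.cache[key] = self.db[key]  # fill the cache on a miss
        return self.cache[key]

    def write(self, key: str, value: str) -> None:
        self.db[key] = value
        self.cache[key] = value  # updates only the cache this server can see


def read_after_update(shared_cache: bool) -> str:
    db = {"site_name": "Old name"}
    common: dict = {}
    server_a = WebServer(common if shared_cache else {}, db)
    server_b = WebServer(common if shared_cache else {}, db)
    server_b.read("site_name")               # B caches the old value
    server_a.write("site_name", "New name")  # A saves a change
    return server_b.read("site_name")        # what does B serve now?


print(read_after_update(shared_cache=False))  # Old name (stale on B)
print(read_after_update(shared_cache=True))   # New name
```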
Object-level Caching, Cont.
Object-level caching, Cont.
• Memcache
• Utilized by most high-profile sites
• Facebook, for instance, makes tremendous use of lots and lots of memcache servers
• Drupal.org uses it
• Provides an object cache that can be used by multiple servers
• Slower than APC on a single server, but keeps all servers in sync
• Multiple silos/buckets can be created so you can distribute information across multiple servers
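One reason multiple memcache servers stay consistent is that every client hashes a given key to the same server. The sketch below assumes a simple hash-modulo scheme and a made-up mapping of Drupal cache bins to server clusters; the addresses and bin layout are hypothetical, and real clients often use consistent hashing instead so that adding a server remaps fewer keys.

```python
import zlib

# Hypothetical layout: cache bins mapped to clusters of memcache servers.
CLUSTERS = {
    "default": ["10.0.0.11:11211", "10.0.0.12:11211"],
    "pages": ["10.0.0.21:11211"],
}
BIN_TO_CLUSTER = {"cache": "default", "cache_page": "pages"}


def server_for(bin_name: str, key: str) -> str:
    """Pick the memcache server that stores `key` from cache bin `bin_name`."""
    servers = CLUSTERS[BIN_TO_CLUSTER.get(bin_name, "default")]
    # Every web server computes the same hash, so they all agree on where a
    # key lives; that single location is what keeps the servers in sync.
    index = zlib.crc32(f"{bin_name}:{key}".encode("utf-8")) % len(servers)
    return servers[index]


for bin_name, key in [("cache", "variables"), ("cache", "theme_registry"),
                      ("cache_page", "/about"), ("cache_filter", "format:1")]:
    print(f"{bin_name}/{key} -> {server_for(bin_name, key)}")
```

Bins not listed in the mapping fall back to the default cluster, which keeps the configuration short while letting heavy bins such as page caches get dedicated servers.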
Advanced Infrastructure (example)
[Diagram: Load Balancer, Static-Caching, Application (Drupal), Database, Slave DB, Solr, Memcache, GlusterFS, Deployment]
Specialization
Specialized Servers/Services
• DB Server
• SOLR
• Memcache
• Static-caching
• CDN
• GlusterFS
Specialization
MySQL Server
• One of the fastest ways to improve performance is to separate your MySQL DB from your application
• This lets your application and your DB each make full use of independent hardware
• The change is basically transparent at the application layer: just a single change to settings.php
Specialization
Search
• Problem: Search is incredibly hard on the system
• Particularly with multiple search terms
• Drupal search works, but despite great efforts is still not as quick or useful as an outside solution
• Search is particularly hard on the DB, Drupal’s traditional bottleneck
• In other words, search makes a bad problem worse
Specialization
Search, Cont.
• Solution: Solr
• Communication layer between the website and the Lucene search index
• Offloads all of the complex processing to a dedicated search server
• More power for searches (search faster!)
• Doesn’t lock up your website DB
• Website can focus on what it does, search can focus on what it does
• Additional benefit: faceting (filtering), sorting
• Ability to search content based on specific criteria (content type, author, taxonomy terms) and sort based on criteria (title, date, author, content type)
• Hosted model (Acquia Search) or can be installed on server in your infrastructure
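A sketch of what a faceted, filtered, sorted search looks like at the HTTP level. `q`, `fq`, `facet`, `facet.field`, `sort`, and `wt` are standard Solr request parameters; the base URL, core name, and field names (`type`, `author`, `created`) are assumptions, since the real schema depends on how your Drupal Solr integration indexes content.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, text: str, filters: dict[str, str],
                    facet_fields: list[str], sort: str = "") -> str:
    """Build a Solr /select URL with filter queries, facets and sorting."""
    params: list[tuple[str, str]] = [("q", text), ("wt", "json")]
    # fq narrows the result set without affecting relevance scoring.
    params += [("fq", f"{field}:{value}") for field, value in filters.items()]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", field) for field in facet_fields]
    if sort:
        params.append(("sort", sort))
    return f"{base_url}/select?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983/solr/drupal",  # hypothetical Solr core
    "caching varnish",
    filters={"type": "article"},          # hypothetical field names
    facet_fields=["type", "author"],
    sort="created desc",
)
print(url)
```

All of this work happens on the Solr server; the Drupal DB is never asked to do full-text matching or facet counting.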
Specialization
Static Caching
• Static-caching on the same server as the website provides performance improvement
• Downside: there’s still a lot of wasted overhead. Apache carries everything it needs to run a website, not just to serve HTML, and PHP also has to load
• Static-caching elsewhere provides the opportunity to optimize the server for static-caching
• Side effect: your web server now has more memory free to handle requests that require php processing
• Core D6 can’t make use of external caching services, but Pressflow and D7 provide ways to leverage them.
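For an external cache to help, Drupal’s responses have to look cacheable. The function below is a rough, hypothetical approximation of a reverse proxy’s default decision (Varnish’s built-in rules behave broadly like this, but this is not its VCL): anonymous GETs with a positive lifetime are stored, while anything carrying cookies or credentials goes to the backend. It also hints at why it matters that Pressflow and D7 can avoid starting a session cookie for anonymous visitors.

```python
# Sketch only: simplified storage rules, not Varnish's actual VCL.

def parse_cache_control(header: str) -> dict[str, str]:
    """Turn 'public, max-age=300' into {'public': '', 'max-age': '300'}."""
    directives: dict[str, str] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    return directives


def proxy_may_store(method: str, request_headers: dict[str, str],
                    response_headers: dict[str, str]) -> bool:
    """Approximate 'can a shared reverse proxy store this response?' check."""
    req = {k.lower(): v for k, v in request_headers.items()}
    resp = {k.lower(): v for k, v in response_headers.items()}
    if method not in ("GET", "HEAD"):
        return False
    # Cookies or credentials usually mean per-user content, e.g. a logged-in session.
    if "cookie" in req or "authorization" in req:
        return False
    # A response that sets a cookie is specific to whoever triggered it.
    if "set-cookie" in resp:
        return False
    cc = parse_cache_control(resp.get("cache-control", ""))
    if any(d in cc for d in ("private", "no-cache", "no-store")):
        return False
    # Prefer the shared-cache lifetime, then max-age. This sketch treats a
    # missing lifetime as not storable; real proxies may apply a default TTL.
    ttl = cc.get("s-maxage") or cc.get("max-age") or "0"
    return ttl.isdigit() and int(ttl) > 0


public_page = {"Cache-Control": "public, max-age=300"}
print(proxy_may_store("GET", {}, public_page))                         # True
print(proxy_may_store("GET", {"Cookie": "SESSabc=123"}, public_page))  # False
print(proxy_may_store("POST", {}, public_page))                        # False
print(proxy_may_store("GET", {}, {"Cache-Control": "no-cache"}))       # False
```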
Specialization
Static Caching, Cont.
• Squid
• Free
• Not designed specifically for HTTP acceleration
• Difficult to set up and configure
• Improves performance, but less than its competitors
Specialization
Static Caching, Cont.
• Varnish
• Free (to download)
• Pressflow/D7 built to work w/ Varnish
• Varnish servers set up for Drupal and usable on Amazon EC2 (developed by Chapter Three) ($.34/hr + $.17/GB)
• Designed from the ground up for HTTP acceleration
• Can take time/expertise to get the performance you want
• Can create a significant performance improvement once configured correctly
• Most popular, with off-the-shelf/AWS implementations available
Specialization
Static Caching, Cont.
• AI-Cache
• Best performance of the bunch
• Simple configuration
• Provides additional features for caching
• Header recognition
• Session caching
• Drop-in solution
• Not free
• Amazon EC2 instance is available ($.68/hr + $20/GB)
Specialization
CDN
• Cache content that is static (outside of full pages)
• Images
• Video
• CSS
• JS
• Popular examples
• Akamai
• Limelight
• Amazon CloudFront
• Separate domains, more bandwidth, and geographically distributed servers all add up to faster loading
• Can be an expensive option
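Serving static files from a CDN often comes down to rewriting asset URLs in the page. This sketch assumes a hypothetical CDN hostname and root-relative asset paths; in practice a Drupal module or the theme layer would do the rewriting, and the extension list is only an example.

```python
import re

CDN_BASE = "https://cdn.example.com"  # hypothetical CDN hostname
STATIC_EXTENSIONS = ("png", "jpg", "jpeg", "gif", "css", "js", "mp4")

# Root-relative src/href values ending in a static extension;
# protocol-relative "//host/..." URLs are deliberately skipped.
_ASSET = re.compile(
    r'\b(src|href)="(/(?!/)[^"]*\.(?:%s))"' % "|".join(STATIC_EXTENSIONS),
    re.IGNORECASE,
)


def rewrite_to_cdn(html: str, cdn_base: str = CDN_BASE) -> str:
    """Point static assets at the CDN while leaving page links on the origin."""
    return _ASSET.sub(lambda m: f'{m.group(1)}="{cdn_base}{m.group(2)}"', html)


page = ('<link href="/misc/style.css" rel="stylesheet">'
        '<img src="/sites/default/files/logo.png">'
        '<a href="/node/1">Read more</a>')
print(rewrite_to_cdn(page))
```

Only the CSS and image URLs move to the CDN domain; the link to /node/1 still goes to the Drupal site, because pages themselves are handled by the static-caching layer.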
Summary
• Start small and make the easy optimizations:
• Pressflow/D7
• InnoDB (D7 by default)
• APC
• Add servers and services as necessary and based on individual traffic:
• MySQL
• SOLR
• Memcache
• Static Cache
• CDN
The End
• Questions?
Thank You
Bill O’Connor, CTO
d.o: csevb10
t: csevb10