Posted on 15-Jul-2015
High Performance Solr and JVM Tuning Strategies used for MapQuest’s Search Ahead
Darren Spehr – System Architect
MapQuest Going strong … since 1967 • Maps • Directions • Routing • Geocoding • Mobile • B2B
Every Adventure has a Beginning
Our mobile client needs an overhaul … Oh, and we need an auto-correct feature … well, auto-complete … actually, search ahead
… Top Secret Meeting Minutes
• How do we use auto-complete today?
• What are we searching over?
• How fast can a person type?
• What are we going to say in response?
• When do we have to launch this?
Characteristics
• Searches march from left to right
• Expect the first term to be highly relevant
• Term order and proximity are clues
• Spaces are now really important
• Expect mixed query types
• Abbreviations and misspellings are common
• People can type really fast (but generally fewer than 10 keystrokes per second)
• Users frequently want to browse
Requirements
Fast. Like, really fast.
140 milliseconds maximum response time
Methodology
Some optimizations can be planned
Others need to be discovered: test alternatives, and optimize the low-hanging fruit early
Finally: Take it to task
Multiple Types Possible
The Data:
• Categories
• Franchises
• Locations: neighborhoods to countries
• Points of Interest: airports, businesses, landmarks
• Addresses: individual and block (interpolated)
In all: over 10 billion unique documents
Architecture
(Diagram: the Mobile App client calls the API, deployed as API-East and API-West, which queries the Solr clusters: Targeted, Location, Business, Address 1 and Address 2.)
• Targeted: 4 VMs, 1 shard, 283,000 docs; frequent low-volume updates
• Location: 3 VMs, 1 shard, 4.3 million docs; frequent low-volume updates
• Business: 5 VMs, 1 shard, 13.4 million docs; heavy updates
• Address: 30 VMs, 10 shards, 100 million docs; no updates
• Interpolated Address: 30 VMs, 10 shards, 10 billion docs*; no updates
Special Cases
Business data
- Complex synonyms
- Stemming needs
- The memory factor
- Complex query patterns
Addresses
- So many!
- Nested structure
- Interpolated positions
- Updates an issue
Airports
- Airport codes
- International issues
Locations
- International issues
- Relevance
Move Analysis to the ETL
A typical job includes:
• Basic text processing / cleansing
• Stemming
• Synonyms and substitution
• Cloning
• Filtering
• Various permutations
• Regionalization
• Pre-calculating relevance
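To make "move analysis to the ETL" concrete, here is a minimal Java sketch of three of those steps (cleansing, synonym substitution, and one kind of permutation) run before indexing, so Solr only has to store and match the prepared strings. The class name, the tiny synonym table, and the use of token suffixes as the permutation scheme are all assumptions for illustration, not MapQuest's actual pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class EtlAnalysis {
    // Hypothetical synonym table; a real one would come from curated reference data.
    static final Map<String, String> SYNONYMS =
            Map.of("st", "street", "ave", "avenue", "intl", "international");

    // Basic cleansing: lower-case, turn punctuation into spaces, collapse whitespace.
    static String normalize(String raw) {
        String lower = raw.toLowerCase(Locale.ROOT);
        return lower.replaceAll("[^\\p{L}\\p{N} ]", " ").trim().replaceAll("\\s+", " ");
    }

    // Replace each token with its canonical synonym, if one exists.
    static List<String> substitute(String normalized) {
        List<String> tokens = new ArrayList<>();
        for (String t : normalized.split(" ")) {
            tokens.add(SYNONYMS.getOrDefault(t, t));
        }
        return tokens;
    }

    // One simple permutation scheme: every token suffix, so a left-to-right
    // prefix match can also start at a later word.
    static List<String> suffixes(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, tokens.size())));
        }
        return out;
    }

    // Full pipeline for one source value; the output is what would be indexed.
    static List<String> analyze(String raw) {
        String normalized = normalize(raw);
        return normalized.isEmpty() ? List.of() : suffixes(substitute(normalized));
    }

    public static void main(String[] args) {
        System.out.println(analyze("123 Main St.")); // [123 main street, main street, street]
    }
}
```

Doing this work once at ETL time means query time is reduced to exact and prefix matching on prepared strings.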
Custom Doc Routing
Address data won’t fit in memory or perform well unless it is split up … Both address collections are sharded, so each shard is around 6-8 GB on disk.
Initial, naïve balancing wasn’t nearly good enough, so routing became an optimization problem that accounts for:
- Size on disk
- Predicted query volumes
- FST load (entropy)
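The slides don't say how the routing optimization was solved. As one plausible stand-in, the sketch below blends the three listed factors into a single weighted cost and assigns routable units to shards with a greedy largest-first heuristic. The `Unit` record, the weights, and the heuristic itself are assumptions; a real solution might use a proper solver.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ShardRouter {
    // One routable block of address data (hypothetical unit, e.g. a region).
    record Unit(String key, double sizeGb, double predictedQps, double fstEntropy) {}

    // Blend the three factors into one cost; the weights are hypothetical knobs.
    static double cost(Unit u, double wSize, double wQps, double wFst) {
        return wSize * u.sizeGb() + wQps * u.predictedQps() + wFst * u.fstEntropy();
    }

    // Greedy "largest first" heuristic: place each unit on the currently lightest shard.
    static Map<String, Integer> assign(List<Unit> units, int shards,
                                       double wSize, double wQps, double wFst) {
        List<Unit> sorted = new ArrayList<>(units);
        sorted.sort(Comparator.comparingDouble((Unit u) -> cost(u, wSize, wQps, wFst)).reversed());
        double[] load = new double[shards];
        Map<String, Integer> routing = new HashMap<>();
        for (Unit u : sorted) {
            int lightest = 0;
            for (int s = 1; s < shards; s++) {
                if (load[s] < load[lightest]) lightest = s;
            }
            load[lightest] += cost(u, wSize, wQps, wFst);
            routing.put(u.key(), lightest);
        }
        return routing;
    }

    public static void main(String[] args) {
        List<Unit> units = List.of(new Unit("A", 10, 0, 0), new Unit("B", 6, 0, 0),
                new Unit("C", 5, 0, 0), new Unit("D", 1, 0, 0));
        System.out.println(assign(units, 2, 1.0, 0.0, 0.0));
    }
}
```

With only size weighted, the four units in `main` split into {A, D} and {B, C}, each carrying a cost of 11. Raising the QPS or entropy weights would spread hot or FST-heavy units apart even when their sizes are similar.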
Setting Up the Indexes
• Clean up schema.xml and solrconfig.xml
• Exact and fuzzy queries tested: string fields WIN! (Thank you, FST and prefix queries!)
• Geo-sensitivity made easy using Spatial4j (Thank you, David Smiley!)
• Optimization required
• No NRT functionality needed
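Why do string fields plus prefix queries win for search ahead? A prefix lookup over a sorted term dictionary is one seek followed by a short scan that stops at the first non-matching term. The sketch below uses a `TreeSet` as a simplified stand-in for that dictionary; Lucene's real term index is FST-based, and the sample terms are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class PrefixLookup {
    // Walk the sorted terms from the first one >= prefix and stop at the first
    // term that no longer shares the prefix (or when the limit is reached).
    static List<String> complete(NavigableSet<String> terms, String prefix, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (out.size() >= limit || !term.startsWith(prefix)) break;
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms =
                new TreeSet<>(List.of("main st", "maine ave", "mall rd", "market st"));
        System.out.println(complete(terms, "mai", 10)); // [main st, maine ave]
    }
}
```

In Solr the equivalent is a prefix query against the string field, for example `{!prefix f=name_exact}main st` (the field name is hypothetical), which also avoids having to escape the space.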
Query-Time Considerations
• Jetty
  - <New class="java.util.concurrent.ArrayBlockingQueue">
  - Limit the thread pool based on projected need
• Filters used judiciously
• Pull in a single field from the indexes for display
• Shard/route-aware clients used for addresses
• Estimate caching needs
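The Jetty setting above swaps in a bounded `ArrayBlockingQueue` for request handling. The same idea expressed in plain `java.util.concurrent` terms (not Jetty's own configuration API) is a fixed-size pool whose queue has a hard capacity, so overload is rejected quickly instead of queuing behind slow requests. The thread and queue sizes are placeholders to be derived from projected need.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedPool {
    // Fixed number of workers fed by a bounded ArrayBlockingQueue. When every worker
    // is busy and the queue is full, new work is rejected (AbortPolicy) instead of
    // piling up without limit.
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(2, 1);
        try {
            pool.execute(() -> System.out.println("handled on " + Thread.currentThread().getName()));
        } catch (RejectedExecutionException e) {
            System.out.println("shed load: " + e.getMessage());
        } finally {
            pool.shutdown();
        }
    }
}
```

A small queue trades a few rejected requests at peak for bounded latency on the requests that are accepted.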
The API Has to Be Fast Too
• Pool as many resources as makes sense
• A note on connection pools:
  - The DefaultHttpClient avoids key registry overhead
  - Ask for keep-alive support
  - Balance the pool according to use
• Thread-level caching used to avoid ClassLoader overhead
• Take out some insurance with TTLs
(Diagram: Executors, HttpClient, Solr Query)
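"Thread-level caching" and "insurance with TTLs" can be combined in a few lines: each thread keeps its own copy of an expensive object and rebuilds it once it exceeds a time-to-live. This is a minimal sketch that assumes the cached object is safe to reuse within one thread (a `StringBuilder` for assembling query strings, say). The `ThreadCache` class is hypothetical, and the slides don't say which objects were cached or what the TTLs applied to (connection TTLs are another likely reading).

```java
import java.util.function.Supplier;

final class ThreadCache<T> {
    // Cached value plus the time it was built, kept separately for each thread.
    private static final class Entry<V> {
        final V value;
        final long builtAtNanos;

        Entry(V value, long builtAtNanos) {
            this.value = value;
            this.builtAtNanos = builtAtNanos;
        }
    }

    private final ThreadLocal<Entry<T>> perThread = new ThreadLocal<>();
    private final Supplier<T> factory;
    private final long ttlNanos;

    ThreadCache(Supplier<T> factory, long ttlNanos) {
        this.factory = factory;
        this.ttlNanos = ttlNanos;
    }

    // Reuse this thread's copy; rebuild it once it is at least ttlNanos old.
    T get() {
        long now = System.nanoTime();
        Entry<T> e = perThread.get();
        if (e == null || now - e.builtAtNanos >= ttlNanos) {
            e = new Entry<>(factory.get(), now);
            perThread.set(e);
        }
        return e.value;
    }

    public static void main(String[] args) {
        ThreadCache<StringBuilder> buffers = new ThreadCache<>(StringBuilder::new, 60_000_000_000L);
        StringBuilder sb = buffers.get();
        sb.setLength(0); // callers must reset reused buffers themselves
        System.out.println(sb.append("q=").append("main st"));
    }
}
```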
Keep Queries Simple
• Federate a larger number of queries
• Break queries out by type and expectation
• Use custom search handlers to move the burden of “tough” queries to Solr
• Special cases:
  - Interpolated addresses
  - Business names
Collection            Query Count
Category              3
Franchise             3
Airport               1
Critical Address      1
Locations             4
Businesses            3
Addresses (both)      2 each
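Federating many small queries means firing them concurrently and keeping whatever returns within the response budget. The sketch below does that with `CompletableFuture` and one shared deadline. The `search` function, the collection names, and the choice to drop late collections rather than wait for them are assumptions, not the API described in the talk.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class Federator {
    // Fire one query per collection in parallel, then collect whatever finishes inside
    // a shared time budget. Slow or failed collections contribute an empty list
    // (their tasks are not cancelled and keep running in the pool).
    static Map<String, List<String>> federate(List<String> collections,
                                              Function<String, List<String>> search,
                                              ExecutorService pool, long budgetMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        Map<String, CompletableFuture<List<String>>> pending = new LinkedHashMap<>();
        for (String c : collections) {
            pending.put(c, CompletableFuture.supplyAsync(() -> search.apply(c), pool));
        }
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, CompletableFuture<List<String>>> e : pending.entrySet()) {
            long remaining = Math.max(0L, deadline - System.nanoTime());
            try {
                results.put(e.getKey(), e.getValue().get(remaining, TimeUnit.NANOSECONDS));
            } catch (TimeoutException | ExecutionException ex) {
                results.put(e.getKey(), List.of());
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                results.put(e.getKey(), List.of());
            }
        }
        return results;
    }
}
```

With a 140 ms budget, a collection that stalls costs one empty section of the response instead of the whole request.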
At this point the service is up and running … but the fun has only begun
Getting Ready to Test
Choose your tool set …
• Test suite (JMeter)
• Application monitoring (VisualVM)
• GC monitoring (VisualGC)
• On-host tools (top, pidstat)
• Runtime exposure (JMX, jsvc)
• Offline analysis (JMeter, GCHisto)
Set Boundary Conditions
Production query volume:
• What is the expected peak QPS?
• Estimate the 50th and 75th percentiles
Know what success looks like:
• What availability are you looking for?
• What about latency?
• Caching success?
Know what failure looks like:
• When do you consider a machine maxed out?
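Once a test run is recorded, the percentile targets reduce to a sort. Below is a minimal nearest-rank percentile sketch with a check against a latency budget such as the 140 ms requirement. Nearest rank is one of several common percentile definitions, and JMeter or other tools may interpolate differently; the sample latencies in `main` are made up.

```java
import java.util.Arrays;

final class LatencyPercentiles {
    // Nearest-rank percentile: sort, then take the value at rank ceil(p/100 * n).
    static double percentile(double[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    // True when the chosen percentile stays within the latency budget.
    static boolean withinBudget(double[] latenciesMs, double p, double budgetMs) {
        return percentile(latenciesMs, p) <= budgetMs;
    }

    public static void main(String[] args) {
        double[] sample = {12, 35, 48, 51, 60, 72, 88, 95, 130, 210};
        System.out.println(percentile(sample, 50) + " " + percentile(sample, 75)); // 60.0 95.0
    }
}
```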
So, Let’s Talk About …
Memory Settings
Max heap = anticipated index size in memory + delta for the new generation
Min heap = max heap, so HotSpot doesn't spend time growing or shrinking the heap at runtime:
-Xmx = -Xms
Sizing the new generation (-Xmn):
• Start with around 1/3 of your heap size
• Set the survivor space (-XX:SurvivorRatio=15)
Determine the eden space: SurvivorRatio is the ratio of eden to a single survivor space, so survivor = -Xmn / (15 + 2) and eden = -Xmn - 2 * survivor
Example: 7 GB index + 3 GB for the new generation:
-Xms10G -Xmx10G -Xmn3G -XX:SurvivorRatio=15 -XX:PermSize=64m -XX:MaxPermSize=64m
Survivor size: 3 GB / 17 ≈ 181 MB (each)
Eden space: 3 GB - 2 * 181 MB ≈ 2.65 GB
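HotSpot defines `-XX:SurvivorRatio=N` as the ratio of eden to a single survivor space, so `-Xmn` is split into N + 2 equal parts. A small helper makes the arithmetic for the 10 GB example explicit. The class and method names are illustrative, and real sizes can differ slightly because HotSpot aligns generation boundaries.

```java
final class HeapLayout {
    // -XX:SurvivorRatio=N: eden is N times ONE survivor space, and the young
    // generation holds eden plus two survivors, i.e. N + 2 equal parts.
    static long survivorMb(long xmnMb, int survivorRatio) {
        return xmnMb / (survivorRatio + 2); // integer MB, rounded down
    }

    static long edenMb(long xmnMb, int survivorRatio) {
        return xmnMb - 2 * survivorMb(xmnMb, survivorRatio);
    }

    // Whatever -Xmx leaves after the young generation is the old generation.
    static long oldGenMb(long xmxMb, long xmnMb) {
        return xmxMb - xmnMb;
    }

    public static void main(String[] args) {
        long xmx = 10 * 1024, xmn = 3 * 1024;
        int ratio = 15;
        System.out.printf("survivor=%d MB eden=%d MB old=%d MB%n",
                survivorMb(xmn, ratio), edenMb(xmn, ratio), oldGenMb(xmx, xmn));
        // survivor=180 MB eden=2712 MB old=7168 MB
    }
}
```

Integer division rounds each survivor space down to 180 MB, which is why eden comes out at 2712 MB here. The 7168 MB old generation is what has to hold the 7 GB index.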
Baseline JVM Settings: simple and verbose
-verbose:gc
-XX:HeapDumpPath=/logs/solr_heap.hprof
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/logs/gc.log
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
Other Settings We Use
-XX:TargetSurvivorRatio=70
-XX:MaxTenuringThreshold=5
-XX:PretenureSizeThreshold=64m
-XX:CMSFullGCsBeforeCompaction=1
-XX:CMSInitiatingOccupancyFraction=70
-XX:CMSTriggerPermRatio=80
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:+CMSScavengeBeforeRemark
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+CMSParallelRemarkEnabled
-XX:+ParallelRefProcEnabled
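With this many flags, it is easy for a start script to drop one. A minimal check using the standard `RuntimeMXBean` compares the arguments the JVM actually received against an expected list. The same list is exposed remotely over JMX as the `InputArguments` attribute of `java.lang:type=Runtime`. Matching is verbatim string comparison (an assumption that flags are always written the same way, so `-Xmx10g` would not match `-Xmx10G`). The CMS flags only apply to JVMs that still ship CMS, which was removed in JDK 14.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class JvmFlagCheck {
    // Flags from the expected list that do not appear verbatim in the JVM's arguments.
    static List<String> missing(List<String> jvmArgs, List<String> expected) {
        List<String> out = new ArrayList<>();
        for (String flag : expected) {
            if (!jvmArgs.contains(flag)) out.add(flag);
        }
        return out;
    }

    // Arguments this JVM was actually launched with, as reported by the runtime MXBean.
    static List<String> currentArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        List<String> expected = List.of("-Xms10G", "-Xmx10G", "-Xmn3G", "-XX:SurvivorRatio=15",
                "-XX:+UseConcMarkSweepGC", "-XX:CMSInitiatingOccupancyFraction=70",
                "-XX:+UseCMSInitiatingOccupancyOnly");
        System.out.println("missing: " + missing(currentArgs(), expected));
    }
}
```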
Establish Single Test Thread Settings
<ConstantThroughputTimer guiclass="TestBeanGUI" testclass="ConstantThroughputTimer" testname="Constant Throughput Timer" enabled="true">
  <intProp name="calcMode">0</intProp>
  <doubleProp>
    <name>throughput</name>
    <value>1600.0</value>
    <savedValue>0.0</savedValue>
  </doubleProp>
</ConstantThroughputTimer>
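JMeter's Constant Throughput Timer expresses its target in samples per minute, and `calcMode` 0 applies it to each thread individually ("this thread only"). So the 1600.0 above means roughly 26.7 requests per second per thread. A single thread can only sustain that if each sample completes in about 37.5 ms or less. The helper below spells out that arithmetic; the 500 QPS goal in `main` is a made-up example.

```java
final class ThroughputMath {
    // JMeter's Constant Throughput Timer target is expressed in samples per MINUTE.
    static double perSecond(double samplesPerMinute) {
        return samplesPerMinute / 60.0;
    }

    // Longest average sample time (ms) at which one thread can still hit the target.
    static double maxMeanSampleMs(double samplesPerMinute) {
        return 60_000.0 / samplesPerMinute;
    }

    // Threads needed for an overall QPS goal when every thread is paced at the same rate.
    static int threadsFor(double targetQps, double samplesPerMinutePerThread) {
        return (int) Math.ceil(targetQps / perSecond(samplesPerMinutePerThread));
    }

    public static void main(String[] args) {
        System.out.printf("%.2f qps/thread, <= %.1f ms per sample, %d threads for 500 qps%n",
                perSecond(1600), maxMeanSampleMs(1600), threadsFor(500, 1600));
        // 26.67 qps/thread, <= 37.5 ms per sample, 19 threads for 500 qps
    }
}
```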
Test Cycle
(Diagram: Monitor → Record → Evaluate, then ask “Have I met my exit conditions?” If not, add more threads and run the cycle again.)
Metrics watched: JVM, CPU, GC rates, threading, page faults, context switches, locks, swapping, network traffic, availability, throughput, latency, thread count
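The test cycle amounts to a step-load loop: raise the thread count, measure, and stop at the first step that breaks an exit condition. The sketch below abstracts the actual test run behind a function. The `Run` record, the latency and error-rate thresholds, and the simulated runner in `main` are all hypothetical.

```java
import java.util.function.IntFunction;

final class StepLoad {
    // Hypothetical summary of one test run at a fixed thread count.
    record Run(double latencyMs, double errorRate) {}

    // Add threads step by step; return the largest count that still met the exit
    // conditions (latency budget and error ceiling), or 0 if the first step failed.
    static int maxPassingThreads(IntFunction<Run> runAt, int start, int step, int max,
                                 double latencyBudgetMs, double maxErrorRate) {
        if (step <= 0) throw new IllegalArgumentException("step must be positive");
        int best = 0;
        for (int threads = start; threads <= max; threads += step) {
            Run r = runAt.apply(threads);
            if (r.latencyMs() > latencyBudgetMs || r.errorRate() > maxErrorRate) break;
            best = threads;
        }
        return best;
    }

    public static void main(String[] args) {
        // Simulated runner: latency grows 10 ms per thread (stand-in for a real JMeter run).
        System.out.println(maxPassingThreads(t -> new Run(10.0 * t, 0.0), 1, 1, 100, 140, 0.001)); // 14
    }
}
```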
Monitoring the JVM
Watch your application come to life!
Memory steady states:
• Old Generation: ⅓ to ¼ the size of your settings
• Permanent Generation: ½ its size
Tenure histogram sizes should drop off … this is your ideal level
(Chart: tenure size by tenuring age, ages 1 to 5)
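VisualVM and VisualGC read these numbers over JMX. The same data is available in-process through the standard `java.lang.management` beans, which is handy for logging steady-state old-generation occupancy during a long test. This is a minimal sketch. Pool names vary by collector (for example "CMS Old Gen" under CMS), so it reports every heap pool rather than assuming one name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JvmSnapshot {
    // One line per collector: how many collections so far and their total time.
    static List<String> gcLines() {
        List<String> out = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.add(gc.getName() + " count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return out;
    }

    // Used/max percentage for each heap pool that reports a maximum.
    static Map<String, Double> heapUsagePercent() {
        Map<String, Double> out = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) continue;
            MemoryUsage u = pool.getUsage();
            if (u != null && u.getMax() > 0) {
                out.put(pool.getName(), 100.0 * u.getUsed() / u.getMax());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        gcLines().forEach(System.out::println);
        heapUsagePercent().forEach((name, pct) -> System.out.printf("%s %.1f%%%n", name, pct));
    }
}
```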
Monitoring Solr Caches
The UI is a wealth of information!
Cache strategy:
• Size
• Type
Look at the hit and eviction statistics
Use “binary sizing” to walk the sizes up until there are diminishing returns
<filterCache class="solr.LRUCache" size="8384" initialSize="8384" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="8384" initialSize="8384" autowarmCount="0"/>
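One reading of "binary sizing" is to double the cache size on each trial and stop when the extra hit ratio is no longer worth it. The sketch below assumes that reading. `hitRatioAt` stands in for re-running the load test and reading the hit ratio from the Solr admin UI, and the saturating curve in `main` is synthetic.

```java
import java.util.function.IntToDoubleFunction;

final class CacheSizing {
    // Keep doubling the cache size while each doubling still buys at least minGain
    // in hit ratio; stop at the first doubling that shows diminishing returns.
    static int walkUp(IntToDoubleFunction hitRatioAt, int startSize, int maxSize, double minGain) {
        int size = startSize;
        double ratio = hitRatioAt.applyAsDouble(size);
        while ((long) size * 2 <= maxSize) {
            double next = hitRatioAt.applyAsDouble(size * 2);
            if (next - ratio < minGain) break;
            size *= 2;
            ratio = next;
        }
        return size;
    }

    public static void main(String[] args) {
        // Synthetic hit-ratio curve: grows with size, then saturates at 0.9.
        System.out.println(walkUp(s -> Math.min(0.9, s / 10_000.0), 1024, 65_536, 0.01)); // 16384
    }
}
```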
JVM Tuning Strategies
Smaller eden spaces result in:
• More frequent minor GCs
• A higher probability of premature promotion
• The best performance
Watch out for too much eager promotion and lengthening major GCs
Mitigate major GC stop-the-world (STW) pauses by:
• Keeping the old generation as small as possible
• Maybe even a little smaller
• Turning off swapping
• Considering explicit GC
Bookkeeping Demo: VisualVM, GCHisto, performance stats
Planning for the Future
What we used to do for predictive expansion:
1) Target max VM capacity
2) Matching QPS
3) Breakdown of traffic load
4) Scaling factor
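The four planning inputs can be combined into a simple per-collection formula: required VMs = ceil(peak QPS × traffic share × scaling factor ÷ max QPS per VM). The sketch assumes that multiplicative reading of the list. The collection names and numbers in `main` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CapacityPlan {
    // VMs per collection = ceil(peak QPS x traffic share x scaling factor / max QPS per VM).
    static Map<String, Integer> vmsNeeded(double peakQps, Map<String, Double> trafficShare,
                                          Map<String, Double> maxQpsPerVm, double scalingFactor) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : trafficShare.entrySet()) {
            double projected = peakQps * e.getValue() * scalingFactor;
            out.put(e.getKey(), (int) Math.ceil(projected / maxQpsPerVm.get(e.getKey())));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical inputs, for illustration only.
        Map<String, Integer> plan = vmsNeeded(1000, Map.of("business", 0.5, "address", 0.5),
                Map.of("business", 200.0, "address", 100.0), 1.5);
        System.out.println(plan.get("business") + " business VMs, " + plan.get("address") + " address VMs");
        // 4 business VMs, 8 address VMs
    }
}
```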
Conclusions: 7 Habits of Highly Effective Tuners
1. Know where you’re going
2. Know where you’re starting from
3. Test incrementally
4. Monitor with intent
5. Make small changes
6. Know when to stop
7. Plan ahead
Questions? Darren Spehr darren.spehr@mapquest.com
Resources
• VisualVM
• VisualGC
• GCHisto
• Java Performance – Hunt and John
• The Garbage Collection Handbook – Jones, Hosking and Moss
• Solr in Action – Grainger and Potter