Con5388 maier

Post on 07-Jul-2015

Java Application Design Practices to Avoid When Dealing with Sub-100 ms SLAs

Daryl Maier (IBM Canada Lab), Anil Kumar (Intel Corporation)

1st October, 2012

© 2012 IBM Corporation

Important Disclaimers

§ THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

§ WHILST EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS-IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESSED OR IMPLIED.

§ ALL PERFORMANCE DATA INCLUDED IN THIS PRESENTATION HAVE BEEN GATHERED IN A CONTROLLED ENVIRONMENT. YOUR OWN TEST RESULTS MAY VARY BASED ON HARDWARE, SOFTWARE, OR INFRASTRUCTURE DIFFERENCES.

§ ALL DATA INCLUDED IN THIS PRESENTATION ARE MEANT TO BE USED ONLY AS A GUIDE.

§ IN ADDITION, THE INFORMATION CONTAINED IN THIS PRESENTATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM, WITHOUT NOTICE.

§ IBM AND ITS AFFILIATED COMPANIES SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

§ NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
– CREATING ANY WARRANTY OR REPRESENTATION FROM IBM, ITS AFFILIATED COMPANIES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS

Introduction to the speakers

Daryl Maier

– 12 years' experience developing and deploying Java SDKs at IBM Canada Lab

– Recent work focus:
  • x86 Java just-in-time compiler development and performance
  • Java benchmarking

– Contact: maier@ca.ibm.com

Anil Kumar

– 10 years' experience in server Java performance, ensuring the best customer experience on all Intel Architecture-based platforms

– Contact: anil.kumar@intel.com

Credits

The contents of this presentation were jointly produced with Elena Sayapina (Java Performance, Intel).

Intel and IBM collaborate to ensure the best user experience across all Intel Architecture based platforms.

What this talk is about…

§ Learn what contributes to higher transactional response times within a Java application

§ How to measure response time

§ Java application design practices that lead to lower response times

§ How to tune the environment in which your application runs for better response time

§ How to determine if you can achieve an even better response time

§ Lots of practical examples

Service Level Agreements

§ SLA == Service Level Agreement
– A commitment to provide a service that meets a prescribed level of performance
– Can be informal or contractually obligated

[Figure: dimensions an SLA may cover: CPU, Availability, Storage, Concurrent Users, Response Time, ?]

Response time

§ Measure of time needed to complete a transaction in response to a request to do work

§ Lower response times generally have positive effects

§ Different perceptions of response time: user interface, real time event, service level commitments, …

§ Isn’t improving response time simply a matter of increasing throughput? Not necessarily…

How do you measure response time?

§ Be sure what you’re measuring is the response time you’re interested in

[Figure: request pipeline. Labels: Requests, Request Queue, Transaction Queue, Executor Thread Pool (Transaction A, Transaction B, Transaction C), Response Queue, Responses]

Measuring response time from request made to response received?

Measuring response time from transaction submitted to response received?

Measure time to complete the transaction?

How do you measure response time?

§ Make sure your timing measurement isn’t part of the response time!

§ Be aware of accuracy and precision of Java timing methods
– System.nanoTime()
– System.currentTimeMillis()
– …and don’t use too many timers!

§ Beware of clock skew in virtual environments
– May need to keep time on an external system
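The measurement points discussed on the preceding slides can be sketched in code. The example below is a minimal illustration, not the benchmark's own instrumentation: the class name ResponseTimeProbe and its Timing type are hypothetical. It takes only three System.nanoTime() readings per transaction and separates time spent waiting in the executor's queue from time spent doing the work, so it is clear which "response time" is being reported.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: names are illustrative, not from the presentation.
public class ResponseTimeProbe {

    /** Where the time went for one transaction, in nanoseconds. */
    public static final class Timing {
        public final long queueNanos;   // submitted to the pool -> picked up by a thread
        public final long serviceNanos; // picked up -> work finished
        public final long totalNanos;   // submitted -> work finished

        Timing(long submitted, long started, long finished) {
            queueNanos = started - submitted;
            serviceNanos = finished - started;
            totalNanos = finished - submitted;
        }
    }

    /** Runs one unit of work on the pool and records three timestamps. */
    public static Timing measure(ExecutorService pool, final Runnable work) {
        final long submitted = System.nanoTime();
        Future<Timing> result = pool.submit(() -> {
            long started = System.nanoTime();
            work.run();
            long finished = System.nanoTime();
            return new Timing(submitted, started, finished);
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Timing t = measure(pool, () -> {
                long sum = 0;
                for (int i = 0; i < 1_000_000; i++) sum += i;
                if (sum < 0) throw new IllegalStateException();
            });
            System.out.println("queue ns:   " + t.queueNanos);
            System.out.println("service ns: " + t.serviceNanos);
            System.out.println("total ns:   " + t.totalNanos);
        } finally {
            pool.shutdown();
        }
    }
}
```

System.nanoTime() is only meaningful for differences taken within one JVM, which is why the sketch never compares it with wall-clock time or with readings from another process.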

How do you measure response time?

Sample of transaction response times at an injection rate (IR) of 3000 ops/sec. Most long transactions fall above the 95th percentile.
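Because the long transactions sit in the tail, the median alone can hide them. The sketch below shows one common way to compute percentiles from recorded samples (the nearest-rank method); the class and method names are hypothetical, and the sample values in main are made up purely for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over recorded response times.
public class LatencyPercentiles {

    /**
     * Returns the smallest sample such that at least p percent of all
     * samples are less than or equal to it (nearest-rank definition).
     */
    public static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Illustrative data only: 98 fast transactions and 2 slow ones (milliseconds).
        long[] millis = new long[100];
        Arrays.fill(millis, 2L);
        millis[98] = 80L;
        millis[99] = 80L;
        System.out.println("median: " + percentile(millis, 50));
        System.out.println("p95:    " + percentile(millis, 95));
        System.out.println("p99:    " + percentile(millis, 99));
    }
}
```

With this made-up data the median and 95th percentile both report the fast case, and only the 99th percentile exposes the slow transactions.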

Influences on response time are not localized

[Figure: the stack layers: Application, Framework, Java VM, Operating System, Hardware]

You must design and tune the entire stack in order to achieve your response time targets

SPECjbb2012

§ Next generation Java business logic benchmark from SPEC

§ Business model is a supermarket supply chain: headquarters, supermarkets, suppliers

§ Scalable, self-injecting workload with multiple supported configurations

§ Customer relevant technologies: security, XML, JDK 7 features

§ Metrics: max-jOPs (throughput) and critical-jOPs (response time)

§ Will be used for case studies in this presentation

Application design influences response time

[Figure: the stack layers: Application, Framework, Java VM, Operating System, Hardware]

Application
• design for scalability
• eliminate serial bottlenecks
• use appropriate JCL packages
• avoid needless synchronization
• avoid excessive object allocations
• cache data locally
• use non-blocking I/O
• be careful with logging and tracing

Design for scalability

§ Scalability : the ability to increase throughput as more resources are applied

§ Prepare your application to run on modern multi-core architectures

§ Create more parallelism in your application and eliminate serial bottlenecks
– Change algorithms

§ Organize your application into parallel tasks
– Leverage TaskExecutor framework for high-level tasks
– Consider ForkJoin in Java 7 for fine-grained task decomposition
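As a rough illustration of fine-grained task decomposition with the Java 7 fork/join classes, the sketch below sums an array by splitting it recursively. The class name ParallelSum and the THRESHOLD value are assumptions chosen for the example; a real cutoff would need to be tuned for the workload.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example of divide-and-conquer with fork/join.
public class ParallelSum extends RecursiveTask<Long> {
    private static final long serialVersionUID = 1L;
    private static final int THRESHOLD = 10_000; // assumed cutoff, not a tuned value

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) >>> 1;
        ParallelSum left = new ParallelSum(data, from, mid);
        ParallelSum right = new ParallelSum(data, mid, to);
        left.fork();                       // run the left half asynchronously
        long rightSum = right.compute();   // compute the right half in this thread
        return left.join() + rightSum;
    }

    /** Sums the array using a pool sized to the available processors. */
    public static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new ParallelSum(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data));
    }
}
```

Creating a pool per call keeps the sketch self-contained; a long-running application would more likely share one pool.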

Use the java/util/concurrent package

§ java.util.concurrent (j/u/c) was introduced in Java 5, with additional features in Java 6 and 7

§ Contains building blocks for developing scalable applications
– Uses state-of-the-art concurrency algorithms, including non-blocking synchronization
– More variety in locking operations (Lock interface, multiple Conditions)
– Atomic variables (atomic math ops such as increment, test-and-set)
– Concurrent collections
– Coarse and fine-grained task management

§ Use j/u/c classes as base classes for new data structures

§ Optimized by modern JVMs
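A small sketch of two of these building blocks working together, a concurrent collection holding atomic variables, is shown below. The class HitCounter and the key "checkout" are hypothetical; the point is only that no synchronized block is needed to count events safely from several threads.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: per-key counters without explicit locking.
public class HitCounter {
    private final ConcurrentMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    /** Atomically increments the counter for the key and returns the new value. */
    public long record(String key) {
        AtomicLong counter = counts.get(key);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            // putIfAbsent returns the existing value if another thread won the race.
            counter = counts.putIfAbsent(key, fresh);
            if (counter == null) counter = fresh;
        }
        return counter.incrementAndGet();
    }

    public long get(String key) {
        AtomicLong counter = counts.get(key);
        return counter == null ? 0 : counter.get();
    }

    /** Demo driver: several threads increment the same key concurrently. */
    public static long hammer(int threads, final int perThread) {
        final HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) counter.record("checkout");
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get("checkout");
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 1000));
    }
}
```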

Avoid unnecessary Java synchronization

§ Synchronization is required for correctness, so it cannot always be avoided

§ Built-in Java synchronization is coarse grained and can inhibit scalability
– Useful when true mutual exclusion is the goal
– JVMs can help

§ Strongly consider using j/u/c for finer-grained locking
– Building blocks for scalable locking

§ Eliminate contended locks

§ Use volatile fields when appropriate
– No locking
– May be suitable for single writer, multiple-reader (e.g., time stamps)
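The single-writer, multiple-reader time stamp case might look like the sketch below. The class name CoarseClock is hypothetical, and the design only holds under the stated assumption that exactly one thread ever calls publish(); with several writers doing read-modify-write updates, an atomic variable or a lock would be needed instead.

```java
// Hypothetical example: one writer thread publishes a time stamp, many threads read it.
public class CoarseClock {
    // volatile makes each write visible to readers without any locking.
    private volatile long lastTickMillis;

    public CoarseClock(long initialMillis) {
        this.lastTickMillis = initialMillis;
    }

    /** Assumed to be called by a single writer thread only. */
    public void publish(long millis) {
        lastTickMillis = millis;
    }

    /** Safe to call from any number of reader threads. */
    public long now() {
        return lastTickMillis;
    }

    public static void main(String[] args) {
        CoarseClock clock = new CoarseClock(System.currentTimeMillis());
        clock.publish(System.currentTimeMillis());
        System.out.println(clock.now());
    }
}
```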

Avoid excessive object allocations

§ Understand the effect of object creation on the heap and the strain on garbage collection

§ Consider hoisting allocations from loops

§ Consider using weak/soft references when appropriate
– Useful for caches, object metadata, or easily rematerializable data

§ Be aware of immutable classes that implicitly return new objects
– e.g., BigDecimal, Integer
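Two of these points can be illustrated with a short sketch: hoisting an allocation out of a loop, and the hidden allocation behind an immutable class. The class and method names are hypothetical. A JIT compiler may remove some short-lived allocations on its own, so treat this as an illustration of the idea rather than a guaranteed saving.

```java
import java.math.BigDecimal;

// Hypothetical examples of allocation patterns inside loops.
public class AllocationExamples {

    /** Allocates a fresh StringBuilder on every iteration. */
    public static int lengthsNaive(String[] names) {
        int total = 0;
        for (String name : names) {
            StringBuilder sb = new StringBuilder();
            sb.append("item:").append(name);
            total += sb.length();
        }
        return total;
    }

    /** Same result, but one StringBuilder is hoisted out of the loop and reused. */
    public static int lengthsHoisted(String[] names) {
        StringBuilder sb = new StringBuilder();
        int total = 0;
        for (String name : names) {
            sb.setLength(0); // reset instead of reallocating
            sb.append("item:").append(name);
            total += sb.length();
        }
        return total;
    }

    /** BigDecimal is immutable: every add() returns a new object. */
    public static BigDecimal total(BigDecimal[] prices) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal price : prices) {
            sum = sum.add(price); // one new BigDecimal per iteration
        }
        return sum;
    }

    public static void main(String[] args) {
        String[] names = {"a", "bc"};
        System.out.println(lengthsNaive(names) + " " + lengthsHoisted(names));
        System.out.println(total(new BigDecimal[] {
            new BigDecimal("1.10"), new BigDecimal("2.20")}));
    }
}
```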

Case study: SPECjbb2012

§ Example of design choices around receipt storage in the benchmark

• Some impact on throughput

• No impact on median response time

• Significant impact on 99th-percentile response time

Case study: SPECjbb2012

§ Example of design choices where background tasks become heavier
– Increase in the background Data Mining (DM) task

• Some impact on throughput

• No impact on median response time

• Significant impact on 99th-percentile response time

Reduce data access latency

§ Often a problem in client/server systems

§ Cache data locally to avoid remote communication
– Particularly effective with data unlikely to change

§ Pitfall: there is a tradeoff between caching enough to reduce remote access latency and accumulating so much data that it strains garbage collection
– An example of where local benefits to throughput have broader negative effects

§ Use Java NIO (Java SE 1.4) and NIO2 (Java SE 7)
– Can leverage high performance features

§ Carefully consider non-blocking, unbounded data structures (e.g., ConcurrentLinkedQueue)
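One way to keep a local cache from growing until it strains the garbage collector is to bound it. The sketch below is a minimal least-recently-used cache built on LinkedHashMap; the class name BoundedCache and its size limit are assumptions for illustration. It uses plain synchronized methods for brevity, which is itself a coarse lock, so a heavily contended cache would call for a different design.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example: a size-bounded LRU cache.
public class BoundedCache<K, V> {
    private final Map<K, V> map;

    public BoundedCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    public synchronized V get(K key) {
        return map.get(key);
    }

    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    public synchronized int size() {
        return map.size();
    }

    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.get("a");      // "a" is now the most recently used entry
        cache.put("c", "3"); // evicts "b"
        System.out.println(cache.get("b") + " " + cache.size());
    }
}
```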

Case study: SPECjbb2012

Performance effects of caching supermarket data over not caching it

• Throughput reduces by half

• Minor impact on median response time

• Some impact on 99th-percentile response time

Application frameworks

[Figure: the stack layers: Application, Framework, Java VM, Operating System, Hardware]

Framework
• application containers (e.g., application servers, Eclipse)
• 3rd party packages (e.g., Apache Commons, Grizzly)
• understand thread management and local caching policies

Java virtual machine tuning

[Figure: the stack layers: Application, Framework, Java VM, Operating System, Hardware]

Java VM
• garbage collection
• heap tuning
• 64-bit addressing

Java virtual machine architecture

[Figure: layered diagram of the Java runtime environment (e.g., J9 R26) beneath the Java API (e.g., Java 6/Java 7).
Components shown: debuggers, profilers, Java application code, user natives, JVMTI, JSE6 classes, Harmony classes, GC / JIT / class library natives, Java Native Interface (JNI), core VM (interpreter, verifier, stack walker), trace and dump engines, port library (files, sockets, memory), thread library.
Operating systems: AIX, Linux, Windows, z/OS. Architectures: PPC-32, PPC-64, x86-32, x86-64, zArch-31, zArch-64.
Legend: User Code, Java Platform API, VM-aware, Core VM]

Garbage collection

§ Determine the best garbage collection policy to use for your application
– Often a response time vs. throughput tradeoff

§ Most GC policies involve a “stop-the-world” phase that works against response times
– “throughput” policies tend to incur longer pauses but fewer interruptions
– “concurrent” policies lower average pause times by completing some tasks concurrently
– “balanced” policies carve the heap into regions to improve parallelism and reduce pauses

§ Tune your heap parameters

§ Use -verbose:gc to correlate GC events with application events
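Alongside the GC log, an application can sample the JVM's own collector counters around a transaction or a measurement interval. The sketch below uses the standard java.lang.management API; the class name GcSnapshot is hypothetical, and the reported time is the collector's accumulated elapsed time as the JVM defines it, which is not necessarily identical to stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: snapshot of total GC activity across all collectors.
public class GcSnapshot {
    public final long collections;      // total number of collections so far
    public final long collectionMillis; // approximate accumulated collection time

    private GcSnapshot(long collections, long collectionMillis) {
        this.collections = collections;
        this.collectionMillis = collectionMillis;
    }

    /** Reads the counters of every garbage collector the JVM exposes. */
    public static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount(); // -1 when undefined for this collector
            long t = gc.getCollectionTime();  // -1 when undefined for this collector
            if (c > 0) count += c;
            if (t > 0) millis += t;
        }
        return new GcSnapshot(count, millis);
    }

    /** Activity that happened between an earlier snapshot and this one. */
    public GcSnapshot since(GcSnapshot earlier) {
        return new GcSnapshot(collections - earlier.collections,
                              collectionMillis - earlier.collectionMillis);
    }

    public static void main(String[] args) {
        GcSnapshot before = take();
        byte[][] garbage = new byte[1000][];
        for (int i = 0; i < garbage.length; i++) garbage[i] = new byte[64 * 1024];
        GcSnapshot delta = take().since(before);
        System.out.println("collections: " + delta.collections
            + ", time ms: " + delta.collectionMillis);
    }
}
```

If a slow transaction coincides with a non-zero delta, the GC log is the next place to look.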

Case study: SPECjbb2012

§ Example showing the effect of different GC policies and heap tunings

• Small throughput reduction from ConMarkSweep

• No impact on median response time

• ConMarkSweep 99th-percentile response time higher but consistent

64-bit addressing

§ Heap addressability beyond 32 bits (> 3.5 GB)
– Common for applications with large in-memory working set (e.g., databases, object caches)

§ 64-bit addressing is a less efficient representation than 32-bit
– Cache & TLB effects stress hardware

§ Solution: build a 64-bit JVM with near 32-bit efficiency
– Use 32-bit values (offsets) to represent object reference fields
– With scaling, between 4 GB and 32 GB can be addressed

§ Enable with -XX:+UseCompressedOops (HotSpot) or -Xcompressedrefs (J9)
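The 4 GB to 32 GB range follows from simple arithmetic: a 32-bit offset distinguishes 2^32 positions, and if every object starts on a 2^shift-byte boundary the offset can be scaled by that amount. The sketch below works through the numbers; the class name is hypothetical, and the shift values 0 to 3 (up to 8-byte alignment) are an assumption used to reproduce the range quoted on the slide.

```java
// Hypothetical worked example: heap reachable through a scaled 32-bit offset.
public class CompressedRefRange {

    /** Bytes addressable by a 32-bit offset that is scaled by 2^shift. */
    public static long addressableBytes(int shift) {
        return (1L << 32) << shift;
    }

    public static void main(String[] args) {
        final long gib = 1L << 30;
        for (int shift = 0; shift <= 3; shift++) {
            System.out.println("shift " + shift + " (alignment " + (1 << shift)
                + " bytes): " + (addressableBytes(shift) / gib) + " GiB");
        }
    }
}
```

With no scaling the limit is 4 GiB; with 8-byte alignment (shift 3) it is 32 GiB.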

Operating system tuning

[Figure: the stack layers: Application, Framework, Java VM, Operating System, Hardware]

Operating System
• large pages
• thread scheduling

Large data and code pages

§ OS paging architecture requires memory addresses to be mapped to more granular “pages” that are mapped to physical memory
– Translation Lookaside Buffers (TLBs) cache these mappings
– Using larger page sizes increases TLB effectiveness

§ Large pages must be enabled by the OS
– BUT require enough physical pages to be allocated together to be most effective

§ Modern JVMs place both heap and compiled code in large pages

§ Enable with -Xlp (J9) or -XX:+UseLargePages (HotSpot)

Case study: SPECjbb2012

§ Example showing the effect of large pages

• Increase throughput by ~13%

• No impact on median response time

• Helps in keeping 99th-percentile response time lower at higher load

Thread scheduling

§ Context switches

– Voluntary (e.g., a thread blocks while waiting for a lock)

– Involuntary (e.g., too many active threads)

§ Watch for thread migration

Hardware tuning

[Figure: the stack layers: Application, Framework, Java VM, Operating System, Hardware]

Hardware
• power management
• BIOS settings

Hardware tuning

§ Power management

§ Insufficient resources
– Physical memory, amount and latency
– I/O storage latency
  • RAID
  • SSDs
– Network I/O bandwidth

§ Tune your BIOS settings carefully
– Hyperthreading
– Prefetching
– Power management

Know your Intel® Xeon® Processor Family

Know your Intel® Xeon® Processor SKU:

Case study: SPECjbb2012

§ Example showing the effect of 8 cores vs. 4 cores
– Assumes the application leverages the parallelism of multiple cores

• Increases throughput by ~100%

• No impact on median response time

• 8 cores deliver much lower 99th-percentile response

Leveraging your hardware topology

§ Understand the underlying hardware topology to reduce latency and increase throughput

§ For NUMA, affinitize JVMs to core/memory subsets
– Improves NUMA performance
– Makes better use of the cache hierarchy of the underlying processors

• Increases throughput by ~12%

• No impact on median response time

• Much lower 99th-percentile response

Evaluating your response time

§ Even though you may be achieving an acceptable SLA, are there tell-tale signs that you could be achieving even better?
– Lack of multithreading in your application
– Lock contention
– Low CPU utilization
– Excessive time (>10%) being spent in OS kernel

§ Tooling to help diagnose response time issues
– IBM HealthCenter
  • What is my JVM doing? Is everything ok?
  • Why is my application running slowly? Why is it not scaling?
  • Am I using the right options?
– Garbage Collector and Memory Visualizer
  • Online analysis of heap usage, pause times, many others
– Memory Analyzer
  • Offline tool providing insight into Java heaps

Questions?

References

§ Get Products and Technologies
– IBM Java Runtimes and SDKs:
  • https://www.ibm.com/developerworks/java/jdk/
– IBM Monitoring and Diagnostic Tools for Java:
  • https://www.ibm.com/developerworks/java/jdk/tools/
– SPEC benchmarking
  • http://www.spec.org

§ Learn
– IBM Java InfoCenter:
  • http://publib.boulder.ibm.com/infocenter/javasdk/v6r0/index.jsp

§ Discuss
– IBM Java Runtimes and SDKs Forum:
  • http://www.ibm.com/developerworks/forums/forum.jspa?forumID=367&start=0

Copyright and Trademarks

© IBM Corporation 2012. All Rights Reserved.

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., and registered in many jurisdictions worldwide.

Other product and service names might be trademarks of IBM or other companies.

A current list of IBM trademarks is available on the Web – see the IBM “Copyright and trademark information” page at URL: www.ibm.com/legal/copytrade.shtml

SPECjbb2012 architecture

[Figure: Single Application Set and Multi-Application Set configurations, showing Ctrl, TxI, BE, and Group]

Controller (Ctrl)
– Controls and evaluates the runs

Transaction Injector (TxI)
– Issues “Requests” at a given rate
– Measures response time by sending probe requests

Backend SUT (BE)
– Some % of transactions go across BEs, exercising inter-JVM process communication

SPECjbb2012 architecture

[Figure: two groups (Group 1 with Backend 1, Group 2 with Backend 2); each backend contains HQ, SM 1, SM 2, SP 1, and SP 2]

SM: Supermarket, HQ: Headquarters, SP: Supplier

Be aware of the impact of logging and tracing

§ Tracing and logging events from your application can have hidden costs
– I/O latency
– Storage requirements
– Overhead of test guarding tracing code
– Impact on JIT compilation

§ Do try to correlate application tracing information with events in other system or JVM logs
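The "test guarding tracing code" cost mentioned above refers to checks like the one in the sketch below, written here with java.util.logging. The class name GuardedTracing and the describe() helper are hypothetical. The guard avoids building an expensive message when tracing is off, but the check itself still runs on every call, which is part of the hidden cost.

```java
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical example: guarding an expensive trace message.
public class GuardedTracing {
    private static final Logger LOG = Logger.getLogger(GuardedTracing.class.getName());
    private static int describeCalls = 0; // demo counter, not thread-safe

    public static Logger logger() {
        return LOG;
    }

    public static int describeCalls() {
        return describeCalls;
    }

    /** Stands in for an expensive message-building step. */
    private static String describe(int[] items) {
        describeCalls++;
        return Arrays.toString(items);
    }

    public static int process(int[] items) {
        // Only build the message when FINE tracing is actually enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("processing " + describe(items));
        }
        int total = 0;
        for (int item : items) total += item;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(process(new int[] {1, 2, 3}));
        System.out.println("messages built: " + describeCalls());
    }
}
```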