Extreme Java: Concurrency Performance with Dr Heinz Kabutz

Topics covered at JAVA-PERF-HK-03

Next up:

  • 31st March - 2nd April 2021, Online Course, delivered by Heinz Kabutz


This course could be your most productive learning experience ever! It is aimed at the busy Java professional who wants to quickly learn and apply new essentials on core Java topics. All topics have been thoroughly researched by Dr Heinz Kabutz, famous in over 150 countries for his Java Specialists' Newsletter.

Join globally renowned Java expert Dr Heinz Kabutz for this hands-on workshop for the busy Java professional who wants to quickly learn and apply new essentials on core Java topics.

During the course you will learn about threading, performance, compare-and-swap non-blocking constructs, garbage collectors and many other topics that you will be able to quickly apply in your own work. We will also cover all relevant constructs found in Java 8, such as StampedLock, LongAdder, parallel streams and many more. We also cover Java 9 VarHandles. The course always uses the latest version of OpenJDK.
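
As a taste of the Java 8 constructs named above, here is a minimal sketch that combines LongAdder with a parallel stream. The class and method names are made up for illustration and are not taken from the course material.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the course material.
class LongAdderSketch {
    // Counts the even numbers in [0, upperBound) using several threads at once.
    static long countEvens(int upperBound) {
        LongAdder evens = new LongAdder();
        IntStream.range(0, upperBound)
                 .parallel()                        // uses the common ForkJoinPool
                 .filter(i -> i % 2 == 0)
                 .forEach(i -> evens.increment());  // updates are spread over internal cells
        // forEach has returned, so no updates are in flight and sum() is exact.
        return evens.sum();
    }

    public static void main(String[] args) {
        System.out.println(countEvens(1_000_000));
    }
}
```

In production code `filter(...).count()` would be the simpler choice. The LongAdder is used here only to show how many threads can update one counter without contending on a single memory location.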

During the training, you will always get a chance to try out what you have learned in carefully thought out exercises. This will help you understand and quickly internalize what you have just learned.

Online Learning at Skills Matter

This course will be offered virtually over 3 full-day sessions.

Our virtual courses offer the same expert-led, hands-on experience we've offered since 2013 — only now we’re making it accessible from the comfort of your own home (office).

You'll join Heinz and participants from around the globe in an online classroom, where you'll use collaboration tools like Zoom and Slack to master new essentials of core Java topics.

Learn how to:

Students who have successfully completed this course can expect the following outcomes:

  1. Throughout the course, we use the latest Java syntax. The first outcome is thus an understanding of how the new Java constructs work.
  2. Students gain a good understanding of why threads are important and what the risks are. They learn how to share objects safely, including visibility concerns. They also master the safety techniques of thread confinement, stack confinement and object confinement. Through this, they learn how to design a thread-safe class.
  3. They know the difference between a synchronized and a concurrent collection and when to use which one. This is particularly important for writing high-performance code that scales well.
  4. They understand how a blocking queue can be used to build producer-consumer systems and what the various blocking queues are in Java.
  5. They know how Semaphore, CountDownLatch and Phaser work.
  6. Students learn how to use the thread pool executors to run tasks asynchronously. They also learn how to configure these, including how to cope with an unexpected number of tasks and how the various settings interact.
  7. They learn how to break up a large task into smaller tasks by choosing good task boundaries, resulting in tasks that are homogeneous and independent.
  8. They learn how to cleanly cancel tasks that have been started, by using interruptions and volatile boolean fields.
  9. Students learn how the Fork/Join pool works by comparing it to a normal single-threaded recursive algorithm. They also get an opportunity to refactor a piece of Fork/Join code to use parallel streams instead, in order to see how Java 8 can make coding a bit easier.
  10. Students know how to detect and solve liveness issues, such as deadlock, livelock and contention.
  11. They also know how to find and solve performance bottlenecks, especially in threaded code.
  12. They know how ReentrantLock, ReentrantReadWriteLock and the new Java 8 StampedLock work and how to use them to write efficient code using optimistic techniques.
  13. They know how to write their own synchronizers when needed, by creating state-dependent classes.
  14. Students understand what atomic classes are and know techniques to use them to build efficient non-blocking classes that offer better performance under contention.
  15. They also learn how VarHandles can improve performance.
  16. They understand the most common garbage collection algorithms (throughput, concurrent and G1) and how to tune each one to give the best performance.
  17. They know how to discover performance bottlenecks in an application and how to solve them. They also learn how profilers can be used to find bottlenecks and the role of microbenchmarks in confirming them.
  18. Throughout the course, a strong emphasis is placed on the practical application of learning. Each student needs to complete a set of exercises to demonstrate that they have understood the material.
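
To make the "optimistic techniques" of outcome 12 concrete, the following is a minimal sketch of an optimistic read with StampedLock. It follows the usage pattern from the StampedLock Javadoc. The class name OptimisticPoint is an assumption for illustration and is not taken from the course.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical example class, not part of the course material.
class OptimisticPoint {
    private final StampedLock lock = new StampedLock();
    private double x, y;

    // Writers take the exclusive write lock.
    void move(double dx, double dy) {
        long stamp = lock.writeLock();
        try {
            x += dx;
            y += dy;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Readers first try without blocking, then fall back to a real read lock.
    double distanceFromOrigin() {
        long stamp = lock.tryOptimisticRead();  // returns 0 if a writer holds the lock
        double currentX = x, currentY = y;      // copy the fields into locals
        if (!lock.validate(stamp)) {            // a write may have intervened: retry pessimistically
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return Math.hypot(currentX, currentY);
    }

    public static void main(String[] args) {
        OptimisticPoint p = new OptimisticPoint();
        p.move(3, 4);
        System.out.println(p.distanceFromOrigin());
    }
}
```

The important detail is that the fields are copied into local variables before `validate` is called, and only the validated copies are used afterwards.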

About the Author

Heinz Kabutz

Dr Heinz Kabutz has programmed significant portions of several large Java applications and has taught Java to thousands of professional programmers. He is a regular speaker at all the major Java conferences and is the mastermind behind The Java Specialists' Newsletter. Heinz was chosen as a Java Champion by Sun Microsystems, the inventors of Java, for his work in advancing Java.


Day 1

  1. Introduction
    • Welcome To The Course
      • How we deal with questions
      • Exercises with partial solutions
      • Certificate of Training
    • History of concurrency
      • New supercomputers
      • Moore's Law
      • Hardware impact of concurrency
    • Benefits of threads
      • Programming is easier
      • Better throughput
      • Simpler modeling
    • Risks of threads
      • Safety vs liveness
      • Safety hazards
      • Using basic synchronized
      • Caching of fields
      • Code reordering
      • Annotations for Concurrency
        • Class annotations
        • Field annotations
    • Threads are everywhere
      • Threads created by JVM
      • Threads created by frameworks
      • Timer
      • Servlets and JavaServer Pages
      • Remote Method Invocation (RMI)
    • Short Java 7 and 8 Primer
      • Underscores in integral literals
      • Generic type inference
      • Lambdas
      • Method References
      • Streams
      • Primitive Streams
  2. Thread Safety
    • Introduction to Thread Safety
      • Synchronization and shared data
      • Your program has latent defects
    • Atomicity
      • Byte code generated by simple count++
      • Demonstration of broken servlet
      • Compound actions
        • Check-then-act
        • Read-modify-write
    • Sharing Objects
      • Visibility
        • Synchronization and visibility
        • Reason why changes are not visible
        • Making fields visible with volatile
        • Volatile flushing
      • Thread confinement
        • Unshared objects are safe
        • Ad-hoc thread confinement
        • ThreadLocal
        • Stack confinement
      • Immutability
        • Immutable is always thread safe
        • Definition of immutable
        • Final fields
    • Designing a thread-safe class
      • Encapsulation
      • Primitive vs object fields
      • Thread-safe counter with invariant
      • Post-conditions
      • Pre-condition
      • Waiting for pre-condition to become true
  3. Building Blocks
    • Synchronized collections
      • Old Java 1.0 thread-safe containers
      • Synchronized wrapper classes
      • Locking with compound actions
    • Concurrent collections
      • Scalability
      • ConcurrentHashMap
        • New Java 8 methods
      • Additional atomic operations
      • Java 8 ConcurrentHashMap
      • CopyOnWriteCollections
    • Blocking queues and the producer-consumer pattern
      • How BlockingQueues work
      • Java implementations of BlockingQueue
        • ArrayBlockingQueue
          • Circular array lists
        • LinkedBlockingQueue
        • PriorityBlockingQueue
        • DelayQueue
        • SynchronousQueue
        • TransferQueue
      • Deques
        • ArrayDeque
        • LinkedBlockingDeque
        • ConcurrentLinkedDeque (Java 7)
        • Work stealing
      • Good defaults for collections
      • Synchronizers
        • CountDownLatch
        • FutureTask
        • Semaphore
        • CyclicBarrier
        • Phaser (Java 7)
  4. Task Execution
    • The Executor framework
      • Executor interface
      • Motivation for using Executor
      • Decoupling task submission from execution
      • Execution policies
        • Who will execute it?
        • In which order? (FIFO, LIFO, by priority)
        • Various sizing options for number of threads and queue length
      • Thread pool structure
      • Thread pool benefits
      • Memory leaks with ThreadLocal
      • Standard ExecutorService configurations
      • ThreadPoolExecutor
      • Executor lifecycle, state machine
      • shutdown() vs shutdownNow()
    • Finding exploitable parallelism
      • Breaking up a single client request
      • Sequential vs parallel
      • Callable and Future
      • Callable controlling lifecycle
      • Example showing page renderer with future
      • Limitations of parallelizing heterogeneous tasks
      • CompletionService
      • Time limited tasks
    • Using Parallel Streams (Java 8)
      • Transforming collections into streams
      • Limitations of using parallel streams for IO
      • Finding prime numbers in parallel
      • Filtering and mapping streams
      • Configuring underlying Fork/Join framework
  5. Cancellation
    • Cancellation
      • Reasons for wanting to cancel a task
      • Cooperative vs preemptive cancellation
      • Using flags to signal cancellation
      • Cancellation policies
    • Interruption
      • Origins of interruptions
      • How does interrupt work?
      • Policies in dealing with InterruptedException
      • Thread.interrupted() method
    • Responding to interruption
      • Letting the method throw the exception
      • Restoring the interrupt and exiting
      • Ignoring the interrupt status
      • Saving the interrupt for later
    • Non-interruptible blocking
    • Reactions of IO libraries to interrupts
      • Interrupting locks
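
The cancellation topics that close Day 1 can be illustrated with a small sketch of cooperative cancellation through interruption. It shows a consumer that blocks on a queue, exits when interrupted, and restores the interrupt status on the way out. The class name, the `demo` helper and the polling loop are assumptions made for this illustration only.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of the course material.
class CancellableConsumer implements Runnable {
    private final BlockingQueue<Integer> queue;
    private final AtomicInteger processed = new AtomicInteger();

    CancellableConsumer(BlockingQueue<Integer> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                queue.take();                 // blocks; throws InterruptedException on interrupt
                processed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            // Restore the status so code further up the stack can still see the interrupt.
            Thread.currentThread().interrupt();
        }
    }

    int processed() {
        return processed.get();
    }

    // Starts a consumer, feeds it three items, cancels it, and returns the processed count.
    static int demo() {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        CancellableConsumer consumer = new CancellableConsumer(queue);
        Thread worker = new Thread(consumer, "consumer");
        worker.start();
        try {
            for (int i = 0; i < 3; i++) {
                queue.put(i);
            }
            while (consumer.processed() < 3) {
                Thread.sleep(1);              // simple polling, good enough for a demo
            }
            worker.interrupt();               // request cancellation
            worker.join();                    // wait for the worker to finish cleanly
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo was interrupted", e);
        }
        return consumer.processed();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The worker never gets stopped forcibly. It notices the interrupt either at the loop check or inside `take()`, which is what "cooperative" means in this context.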

Day 2

  1. Thread Pools
    • Homogeneous, independent and thread-agnostic tasks
    • Sizing thread pools
      • Danger of hardcoding worker number
      • Problems when pool is too large or small
      • Formula for calculating how many threads to use
      • CPU-intensive vs IO-intensive task sizing
      • Examples of various pool sizes
      • Mixing different types of tasks
      • Determining the maximum allowed threads on your operating system
    • Configuring ThreadPoolExecutor
      • corePoolSize
      • maximumPoolSize
      • keepAliveTime
      • Using default* methods
      • Managing queued tasks
      • PriorityBlockingQueue
      • Saturation policies
        • Abort
        • Caller runs
        • Discard
        • Discard oldest
      • Thread factories
      • Customizing thread pool executor after construction
  2. Fork/Join
    • Basics
      • Breaking up work into chunks
      • ForkJoinPool and ForkJoinTask
      • Work-stealing in ForkJoinPool
      • ForkJoinTask state machine
      • RecursiveTask vs RecursiveAction
    • Example of a parallel recursive function
      • Parallel Fibonacci Calculator
      • Fork/Join vs. Compute
      • Parallel merge sort
      • Sorting in Java 8
    • Managing tasks
      • Canceling a task
      • Visibility guarantees with fork/join
    • Use cases of fork/join
      • Using new parallel streams in Java 8 to simplify code
  3. Avoiding Liveness Hazards
    • Deadlock
      • The drinking philosophers
      • Causing a deadlock amongst philosophers
      • Resolving deadlocks
      • Discovering deadlocks
      • Lock-ordering deadlocks
      • Defining a global ordering
      • Dynamic lock order deadlocks
      • Defining order on dynamic locks
      • Checking whether locks are held
      • Imposing a natural order
      • Deadlock between cooperating objects
      • Open calls and alien methods
        • Example in Vector
      • Resource deadlocks
    • Avoiding and Diagnosing
      • Avoiding multiple locks
      • Using open calls
      • Unit testing for lock ordering deadlocks
      • Adding a sleep to cause deadlocks
      • Verifying thread deadlocks
      • Deadlock analysis with thread dumps
    • Livelock
      • Causes
  4. Testing Concurrent Programs
    • Testing for correctness
      • Checking for data races
      • Automatic tooling
        • JChord
        • JavaRaceFinder
        • FindBugs
        • IntelliJ IDEA
        • False positives
        • Memory requirements of automatic tools
      • Testing through bulk updates
      • Server HotSpot interference
      • Testing pitfalls
      • Controlling HotSpot and JIT
      • Turning off optimizations
      • Randomizing bulk operations
      • Testing field visibility
      • Single updates, with time delays
      • Pros and cons of various approaches
      • Examples of testing broken code
      • Testing for deadlocks
    • Testing for performance
      • HotSpot tricks
        • Loop unrolling
        • Useless code elimination
        • Inlining of method calls
        • Lock eliding
        • Lock coarsening
        • Eliminating object creation
      • HotSpot interference in microbenchmarks
      • HotSpot method call threshold
      • HotSpot compile time
      • Getting the fastest most optimized code
      • Randomization
        • Ensuring HotSpot does not overoptimize
        • Math.random() vs ThreadLocalRandom
        • Cost of remainder calculation
      • Statistics
        • Average and variance
        • Value of the minimum
        • Excluding warmup results
        • Eliminating interference
        • Length of timings
        • Value of including standard deviation
      • Concurrent performance Testing
        • Difference between single and multi-threaded test
        • ArrayList vs CopyOnWriteArrayList iteration benchmark
        • Context switching cost interference
  5. Performance and Scalability
    • Thinking about performance
      • Effects of serial sections and locking
      • Performance vs scalability
      • How fast vs how much
      • Mistakes in traditional performance optimizations
      • 2-tier vs multi-tier
      • Evaluating performance tradeoffs
    • Amdahl's and Little's laws
      • Formula for Amdahl's Law
      • Utilization according to Amdahl
      • Maximum useful cores
      • Problems with Amdahl's law in practice
      • Formula for Little's Law
      • Applying Little's Law in practice
      • How threading relates to Little's Law
    • Costs introduced by threads
      • Context switching
      • Cache invalidation
      • Locking and unlocking
      • Memory barriers
      • Escape analysis and uncontended locks
      • Lock elision
    • Reducing lock contention
      • Exclusive locks
      • Safety first!
      • Narrowing lock scope
      • Using ConcurrentHashMap
      • Performance comparisons
      • Reducing lock granularity
      • Lock splitting
      • Using CopyOnWrite collections
      • Lock striping
        • In ConcurrentHashMap
        • In ConcurrentLinkedQueue
      • Avoiding "hot fields"
      • ReadWriteLock
      • Immutable objects
      • Atomic fields
      • How to monitor CPU utilization
      • Reasons why CPUs might not be loaded
      • How to find "hot locks"
      • HotSpot options for lock performance
  6. Explicit Locks
    • Lock and ReentrantLock
      • Memory visibility semantics
      • ReentrantLock implementation
      • Using the explicit lock
      • Using try-finally
      • tryLock and timed locks
      • Using try-lock to avoid deadlocks
      • Interruptible locking
    • Performance considerations
      • Java 5 vs Java 6 performance
      • Throughput on contended locks
      • Uncontended performance
      • Heavily contended locks
    • Synchronized vs ReentrantLock
      • Memory semantics
      • Ease of use
      • Prefer synchronized
    • Read-write locks
      • ReadWriteLock interface
      • Understanding system to avoid starvation
      • ReadWriteLock implementation options
      • Release preference
      • Reader barging
      • Reentrancy
      • Downgrading
      • Upgrading
    • StampedLock (Java 8)
      • Difference between StampedLock and ReentrantReadWriteLock
      • Pessimistic reading and writing
      • Optimistic reading
      • Conditional changes by upgrading read to write lock
      • Performance differences between StampedLock and ReentrantReadWriteLock
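
The two laws listed under "Performance and Scalability" can be written down in a few lines. The sketch below uses the standard textbook forms: Amdahl's Law as speedup = 1 / (F + (1 - F) / N), where F is the serial fraction and N the number of processors, and Little's Law as L = λ × W. The class and method names are hypothetical, and the course may present the formulas differently.

```java
// Hypothetical helper class; the names are illustrative and not from the course.
class ScalabilityLaws {

    // Amdahl's Law: the best possible speedup with a given serial fraction F on N processors.
    static double amdahlSpeedup(double serialFraction, int processors) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException("need 0 <= F <= 1 and N >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    // Little's Law: average number of requests in the system = arrival rate * time in system.
    static double littleConcurrency(double arrivalsPerSecond, double secondsInSystem) {
        return arrivalsPerSecond * secondsInSystem;
    }

    public static void main(String[] args) {
        // With 10% serial work the speedup can never exceed 1 / 0.10 = 10, however many cores.
        for (int n : new int[] {1, 2, 4, 8, 16, 1024}) {
            System.out.printf("%4d cores -> speedup %.2f%n", n, amdahlSpeedup(0.10, n));
        }
        // 200 requests per second that each take 50 ms keep about 10 requests in flight.
        System.out.println(littleConcurrency(200, 0.05));
    }
}
```

Read together, the two formulas give a first estimate of how many cores are still useful and how many threads a given load will keep busy.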

Day 3

  1. Building Custom Synchronizers
    • Managing state dependence
      • Single-threaded vs multi-threaded
      • Structure of blocking state-dependent actions
      • Example using bounded queues
      • Introducing condition queues
        • With intrinsic locks
    • Using condition queues
      • State-dependence
      • Condition predicate
      • Lock
      • Condition queue
      • Waking up too soon
      • Waiting for a specific timeout
      • Conditional waits
      • Missed signals
        • InterruptedException
      • notify() vs notifyAll()
      • Encapsulating condition queues
    • Explicit condition objects
      • Condition interface
      • Benefits of explicit condition queues
      • Timed conditions
  2. Atomic Variables and Nonblocking Synchronization
    • Disadvantages of locking
      • Elimination of uncontended intrinsic locks
      • Volatile vs locking performance
      • Priority inversion
    • Hardware support
      • Optimistic locking
      • Compare-and-Swap (CAS)
      • Compare-and-Set
      • Managing conflicts with CAS
      • Simulation of CAS
      • Nonblocking counter
      • CAS support in the JVM
      • Shared cache lines
      • Performance advantage of padding
      • Using @sun.misc.Contended (Java 8)
    • Atomic variable classes
      • Optimistic locking classes
      • Very fast when not too much contention
      • Types of atomic classes
      • How do atomics work?
      • Atomic array classes
      • LongAdder and LongAccumulator (Java 8)
      • Performance comparisons: Locks vs atomics
      • Cost of atomic spin loops
      • Nonblocking algorithms
      • Scalability problems with lock-based algorithms
      • Definition of nonblocking and lock-free
    • Nonblocking stack
      • Doing speculative work
      • Performance
  3. Java Memory
    • Garbage Collection
      • Generational Spaces
      • Difference between young and old GC
      • Stop-The-World events
      • Throughput Collector (Parallel)
        • How it works
        • Tuning the throughput collector
      • Concurrent Mark Sweep Collector
        • Various phases either STW or Concurrent
        • Types of serious STW failures
          • Concurrent Mode Failure
          • Promotion Failed
          • Perm/Meta GC
        • Tuning the CMS collector
      • G1 Collector
        • How it works
        • Tuning the G1 collector
      • Sizing the collector
        • Total Heap
        • New/Old
        • Eden/Survivor
        • Tenuring threshold
      • Working example in tuning a large Fibonacci number calculation
      • Measuring GC Activity
        • Flags for generating GC logs
        • Understanding GC information
      • References
        • Reference Objects
        • Object Reachability
        • Using References
          • SoftReference
          • WeakReference
          • PhantomReference
  4. Java Optimizations
    • Tuning Process
      • Optimization Techniques - Big Gains Quickly
      • Specifying the required performance
      • Optimization methodology
      • System Overview - "The Box"
      • Analyzing CPU bottlenecks
      • Microbenchmarking
        • Java Microbenchmark Harness (JMH)
    • JIT and HotSpot
      • Just-in-Time
      • HotSpot
      • Client
      • Server
      • Tiered Compilation
      • VM Switches
    • Typical Problem Areas
      • Object Creation
      • Array creation
      • Temporary objects
      • Lazy initialization
      • Strings
        • intern
        • StringBuilder vs StringBuffer vs String
        • char[] creation with modifying methods
        • += complexity
        • String appending performance
        • Parsing Strings
        • substring() in various versions of Java
        • String deduplication (Java 8)
      • Regular Expressions
      • Exception performance
      • Loops
        • Tuning loops
        • Extracting invariants
        • Method calls
        • Arrays and loops
          • Cache lines
        • Calls to JNI
      • Benchmarking
      • Other Areas
      • Final
  5. Conclusion
    • Tips on where to learn more
    • Thank you!
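
The "Nonblocking stack" topic from Day 3 is commonly illustrated with a Treiber-style stack built on compare-and-set. The following is a minimal sketch of that well-known technique. It is not the course's own code, and the class name is an assumption.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical example class: a Treiber-style lock-free stack.
class NonblockingStack<E> {

    // Immutable node; a fresh node is allocated for every push.
    private static final class Node<E> {
        final E item;
        final Node<E> next;

        Node(E item, Node<E> next) {
            this.item = item;
            this.next = next;
        }
    }

    private final AtomicReference<Node<E>> top = new AtomicReference<>();

    void push(E item) {
        Node<E> oldTop;
        Node<E> newTop;
        do {
            oldTop = top.get();
            newTop = new Node<>(item, oldTop);          // speculative work
        } while (!top.compareAndSet(oldTop, newTop));   // retry if another thread won the race
    }

    // Returns the most recently pushed item, or null if the stack is empty.
    E pop() {
        Node<E> oldTop;
        do {
            oldTop = top.get();
            if (oldTop == null) {
                return null;
            }
        } while (!top.compareAndSet(oldTop, oldTop.next));
        return oldTop.item;
    }

    public static void main(String[] args) {
        NonblockingStack<String> stack = new NonblockingStack<>();
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());
        System.out.println(stack.pop());
        System.out.println(stack.pop());
    }
}
```

No thread ever blocks here. A thread that loses the compare-and-set race simply rereads the top and tries again, which is the cost of atomic spin loops mentioned in the outline.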




This course is ideally suited to the professional Java programmer with at least two years' experience who wants to truly understand Java concurrency.

Previous Training: Extreme Java: Advanced Topics Course (Recommended). Required Experience: At least two years of professional Java programming.