Sunday, August 26, 2012

Java Concurrency in Practice - Summary - Part 2

NOTE: These summaries are NOT meant to replace the book.  I highly recommend buying your own copy of the book if you haven't already read it. 

Chapter 3 - Sharing Objects
  1. Synchronization (for example, by using synchronized blocks) is not just about atomic execution of code blocks; it also affects memory visibility, i.e., it ensures that a thread can see the changes made by another thread.  Without synchronization, the Java memory model does not guarantee that a value written by one thread will be seen by another thread in a timely manner, or even at all.  For example, if proper synchronization is not used, a thread X that relies on a control variable set by thread Y may NEVER see the updates that thread Y writes to it.  In that case, thread X may loop forever.
  2. Without synchronization, the compiler, the runtime, and the processor (especially on multi-processor systems) may reorder operations to improve performance, so a thread can see stale or partially updated values written by another thread.  If a thread first sets variable x to 1 and then y to 2, another thread may see y set to 2 while x still has its old value.  Such bugs are very hard to reproduce and debug.
  3. Always use synchronization whenever data is shared across threads.
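  4. Out-of-thin-air safety - A thread that reads a variable without synchronization may see a stale value, but it always sees a value that was actually written by some thread, not some random value pulled out of thin air.  The exception is 64-bit numeric variables (long and double) that are not declared volatile: they do not have out-of-thin-air safety, because the JVM is permitted to treat a 64-bit read or write as two separate 32-bit operations.  Shared mutable long and double variables should therefore be declared volatile or guarded by a lock.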
  5. Volatile variables provide a weaker form of synchronization.  
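    1. Volatile variables are treated specially by the compiler and runtime: they are not cached in registers or in caches hidden from other processors, and operations on them are not reordered with other memory operations.  So a read of a volatile variable always returns the most recent write by any thread.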
    2. When thread A writes to a volatile variable and subsequently thread B reads that same variable, the values of all variables visible to A prior to writing the volatile variable become visible to B after reading the volatile variable.
    3. Don't overuse volatile, and don't use it in tricky ways.  Synchronized blocks are still necessary when a compound operation such as count++ must be atomic.
    4. Volatile is commonly used for a control variable that determines when a thread should exit an infinite loop.
    5. Locking guarantees visibility and atomicity; volatile variables guarantee only visibility.
    6. Use volatile variables only when all the following conditions are satisfied:
      1. Writes to the variable do not depend on its current value, or only a single thread ever writes to the variable.
      2. The variable does not participate in invariants with other state variables.
      3. Locking is not required for any other reason while the variable is being accessed.
  6. For server applications, always specify the -server JVM command line argument even while developing and testing, since the JVM does more drastic optimizations in server mode. Some concurrency bugs arise only under these optimizations.
  7. Publishing an object means making it available to code outside of its current scope.
    1. This can be done by:
      1. storing a reference to it somewhere where other code can find it, say a public static field or in a publicly accessible HashMap.
      2. returning it from a non-private method.
      3. passing it to an alien method
        1. a method in other classes
        2. an overridable method in the same class
      4. publishing an inner class instance (this automatically exposes the enclosing instance)
    2. Sometimes we do not want to publish an object since that will break encapsulation.  An object that is published when it should not have been is said to have escaped.
    3. Do not allow the this reference to escape during construction.  This commonly happens when the constructor registers an inner class instance with an external event source or starts a thread.  Even if the registration is the last statement in the constructor, a reference to the object may escape before the object is fully constructed, and other threads can see the partially constructed object and react incorrectly.  Use a separate start() method to start a thread created in the constructor, or to register event listeners created in the constructor.  Alternatively, to do it in one step, make the constructor private and use a static newInstance() factory method that calls the constructor and then calls start() (or registers the listeners) before returning the newly created object.
    4. Calling an overridable instance method (one that is neither private nor final) from the constructor can also allow the this reference to escape before the object is fully constructed.
  8. Thread confinement, i.e., ensuring that data is accessed only from a single thread, is one of the simplest ways to achieve thread safety.
    1. Swing UI framework & JDBC connection objects use thread confinement extensively.
    2. Thread confinement options:
      1. Ad-hoc thread confinement - the programmer is entirely responsible for confining the object to a thread; no language features are used. Not recommended because it is fragile.
      2. Stack confinement - Object can be reached only through local variables
        1. Primitive types are always stack confined.
        2. Care should be taken that object references do not escape.
      3. ThreadLocal - provides get and set methods that maintain a separate copy of a value for each thread that uses it. Typically used as: new ThreadLocal<T>() { protected T initialValue() {...} }
  9. Immutability - Immutable objects are always thread-safe.
    1. Even if all fields of an object are final, it may still not be immutable as some of its final fields can refer to mutable objects.
    2. Final fields provide initialization safety, as they have special semantics under the Java Memory Model.  Make all fields of a class final unless they really need to be mutable.
    3. When a group of related data items must be updated and read atomically, consider creating an immutable holder class for them.  By replacing the whole holder (for example, through a volatile reference) instead of mutating individual fields, we may be able to avoid a synchronized block.
  10. Safe publication
    1. Simply storing a reference to an object into a public field is not safe, as it could lead to other threads seeing the object in a partially constructed state (due to reordering).
    2. Immutable objects can be published through any mechanism; no synchronization necessary.
    3. Others must be safely published, i.e., both the reference to the object and the object's state must be made visible to other threads at the same time.  A properly constructed object can be safely published by:
      1. Initializing an object reference from a static initializer.  This is often the easiest way; static initializers are executed by the JVM at class initialization time which has JVM-internal synchronization.
      2. Storing a reference to it in a volatile field or AtomicReference.
      3. Storing a reference to it into a final field of a properly constructed object
      4. Storing a reference to it into a field that is properly guarded by a lock, for example by adding it to a thread-safe collection such as Vector or a synchronizedList.
    4. Effectively immutable objects must be safely published.
      1. Objects that are not technically immutable, but whose state will not be modified after publication are called effectively immutable.  Safely published effectively immutable objects can be safely used by any thread without additional synchronization.  For example, the Date object is often used as an effectively immutable object although it is technically mutable.
    5. Mutable objects must be safely published, AND must be either thread-safe or guarded by a lock.
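The volatile stop-flag idiom mentioned above (a control variable that tells a thread to leave its loop) might look like the following minimal sketch. StoppableWorker and stopsWhenRequested are hypothetical names chosen for illustration. The key detail is that stopRequested is volatile; if it were a plain boolean, the Java memory model would not guarantee that the worker ever sees the write made by the stopping thread.

```java
// Hypothetical worker that spins until another thread asks it to stop.
// stopRequested is volatile, so the write in requestStop() is guaranteed
// to become visible to the loop in run(); with a plain boolean (and no
// locking) the loop might never observe it.
class StoppableWorker implements Runnable {
    private volatile boolean stopRequested = false;

    @Override
    public void run() {
        while (!stopRequested) {
            // busy-wait; each iteration re-reads the volatile flag
        }
    }

    void requestStop() {
        stopRequested = true; // volatile write, visible to the worker thread
    }

    // Starts a worker thread, lets it spin briefly, asks it to stop and
    // waits (with a timeout) for it to exit. Returns true if it exited.
    static boolean stopsWhenRequested(long spinMillis) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker, "stoppable-worker");
        thread.setDaemon(true); // never keep the JVM alive if something goes wrong
        thread.start();
        try {
            Thread.sleep(spinMillis);
            worker.requestStop();
            thread.join(5_000);
        } catch (InterruptedException e) {
            worker.requestStop();
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }
}
```

Calling StoppableWorker.stopsWhenRequested(50) should return true. Volatile is enough here only because the flag is written without depending on its current value and takes part in no invariant with other state.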
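The private-constructor-plus-factory approach for avoiding this-escape could be sketched as below, along the lines of the book's SafeListener example. EventSource is a hypothetical stand-in for a real event source, and listeners are modeled as Consumer<String> purely to keep the code self-contained.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event source; listeners may be invoked from any thread.
class EventSource {
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void registerListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    void fire(String event) {
        for (Consumer<String> listener : listeners) {
            listener.accept(event);
        }
    }
}

// The listener is created in the private constructor, but it is only
// registered (and thus published) after the constructor has returned.
final class SafeListener {
    private final List<String> received = new CopyOnWriteArrayList<>();
    private final Consumer<String> listener;

    private SafeListener() {
        // Refers to the field `received`, so the lambda captures `this`.
        listener = event -> received.add(event);
    }

    static SafeListener newInstance(EventSource source) {
        SafeListener safe = new SafeListener();  // fully constructed first
        source.registerListener(safe.listener);  // only now does `this` escape
        return safe;
    }

    List<String> received() {
        return received;
    }
}
```

Because registration happens in newInstance() after the constructor has finished, no event can reach a SafeListener whose fields are not yet initialized.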
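The ThreadLocal pattern above can be sketched with a per-thread SimpleDateFormat, a common use because SimpleDateFormat is not thread-safe. The class name PerThreadFormatter and the date pattern are assumptions for illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// SimpleDateFormat is not thread-safe, so rather than sharing one instance
// (and locking around it), each thread lazily gets its own copy.
final class PerThreadFormatter {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            new ThreadLocal<SimpleDateFormat>() {
                @Override
                protected SimpleDateFormat initialValue() {
                    // Called once per thread, on that thread's first get().
                    SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
                    f.setTimeZone(TimeZone.getTimeZone("UTC"));
                    return f;
                }
            };

    private PerThreadFormatter() {}

    static String format(Date date) {
        return FORMAT.get().format(date);
    }

    // Exposed only so the per-thread behavior can be observed.
    static SimpleDateFormat currentThreadFormatter() {
        return FORMAT.get();
    }
}
```

On Java 8 and later, ThreadLocal.withInitial(() -> ...) is a shorter way to supply the initial value.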
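The immutable-holder idea, combined with safe publication through a volatile field, can be sketched as below. It is modeled loosely on the book's caching-factorizer example; the class names and the trial-division helper are hypothetical and only meant for small inputs.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Immutable holder: the last input and its factors are stored together,
// so a reader can never see a number paired with another number's factors.
final class OneValueCache {
    private final BigInteger lastNumber;
    private final BigInteger[] lastFactors;

    OneValueCache(BigInteger number, BigInteger[] factors) {
        this.lastNumber = number;
        // Defensive copy: callers cannot mutate our internal array later.
        this.lastFactors = (factors == null) ? null : Arrays.copyOf(factors, factors.length);
    }

    // Returns a copy of the cached factors, or null on a cache miss.
    BigInteger[] getFactors(BigInteger number) {
        if (lastNumber == null || !lastNumber.equals(number)) {
            return null;
        }
        return Arrays.copyOf(lastFactors, lastFactors.length);
    }
}

// The holder is published through a volatile field, so every thread that
// reads `cache` sees a fully constructed OneValueCache without locking.
class CachingFactorizer {
    private volatile OneValueCache cache = new OneValueCache(null, null);

    BigInteger[] factor(BigInteger number) {
        BigInteger[] factors = cache.getFactors(number); // one volatile read
        if (factors == null) {
            factors = trialDivision(number);
            cache = new OneValueCache(number, factors);  // replace, never mutate
        }
        return factors;
    }

    // Plain trial division; adequate for the small inputs of an illustration.
    static BigInteger[] trialDivision(BigInteger n) {
        List<BigInteger> factors = new ArrayList<>();
        BigInteger d = BigInteger.valueOf(2);
        while (d.multiply(d).compareTo(n) <= 0) {
            while (n.mod(d).signum() == 0) {
                factors.add(d);
                n = n.divide(d);
            }
            d = d.add(BigInteger.ONE);
        }
        if (n.compareTo(BigInteger.ONE) > 0) {
            factors.add(n);
        }
        return factors.toArray(new BigInteger[0]);
    }
}
```

The check-then-act in factor() is a benign race: two threads may both miss and compute the same factors, but each holder they publish is internally consistent, so no lock is needed.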

Java Concurrency in Practice - Summary - Part 1

This is part 1 of my notes from reading Java Concurrency in Practice.


Chapter 1 - Introduction
  1. Writing correct concurrent programs is very hard.
  2. Threads are the easiest way to effectively use multi-processor systems, which are now ubiquitous.
  3. When writing multi-threaded programs, we must pay attention to the following:
    1. Safety - Nothing bad ever happens, i.e., program correctness is guaranteed irrespective of how thread execution is interleaved.
    2. Liveness - Something good eventually happens, e.g., no deadlock.
    3. Performance - Something good happens quickly enough, e.g., no excessive context switching.
  4. Many Java frameworks (GUI toolkits, RMI, Timers, etc.) internally use threads.  So your code must be thread-safe even if you do not explicitly use threads.

Chapter 2 - Thread Safety

  1. Writing concurrent programs is all about correctly managing access to shared, mutable state.  Threads are just the mechanism through which that state gets accessed concurrently.
    1. An object's state = any data that can affect its externally visible behavior.  
    2. An object's mutable state needs to be protected from uncontrolled concurrent access from multiple threads.
  2. A class is thread-safe if it continues to behave correctly when accessed from multiple threads, with no additional synchronization or coordination required of the calling code.  In the absence of formal specifications (e.g., invariants constraining an object's fields, postconditions defining the effect of operations on the object, etc.), we assume that the single-threaded behavior of a class is its correct behavior (after verification, of course!).
    1. It is much easier to design a class to be thread-safe than to retrofit thread-safety into it later.
    2. It is easier to make a class thread-safe if its state is private.  In other words, follow good OO practices.
    3. Thread-safe classes encapsulate any needed synchronization so that calling code need not provide its own.
  3. Stateless objects are always thread-safe.
  4. The most common race condition is associated with check-then-act sequences.  Lazy initialization of expensive objects is a common place where check-then-act is used.
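  5. Race condition != data race.  A race condition occurs when the correctness of a computation depends on the relative timing or interleaving of multiple threads.  A data race occurs when a shared variable is written by one thread and read or written by another without synchronization; the reading thread may see a stale or partially written value.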
  6. If all you need is a thread-safe counter, just use java.util.concurrent.atomic.AtomicLong.  If multiple pieces of state are involved, this is not sufficient - further synchronization is necessary.
  7. synchronized block - Java's built-in locking mechanism for enforcing atomicity
    1. synchronized block is associated with an object that serves as the lock, and a block of code to be guarded.  
    2. Every java object can act as a lock for a synchronized block.  These built-in locks are called intrinsic or monitor locks.
    3. Intrinsic locks are mutexes, i.e., at most one thread can hold a given lock at a time.
    4. Intrinsic locks are reentrant - a thread that already holds a lock can acquire it again without blocking.
  8. Each mutable variable that is read/written from multiple threads must be guarded by synchronization with the SAME lock object EVERY TIME it is read/written.
    1. Use @GuardedBy("lockobject") annotation on each mutable shared variable to document the locking strategy.
  9. For every invariant that involves more than one variable, all the variables involved in the invariant must be guarded by the same lock.
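For the simple thread-safe counter case, a minimal sketch using AtomicLong might look like this. HitCounter is a hypothetical name; incrementAndGet() and get() are standard AtomicLong methods.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical request counter. incrementAndGet() is a single atomic
// read-modify-write, so concurrent callers never lose updates the way
// a plain `count++` on a long field can.
class HitCounter {
    private final AtomicLong hits = new AtomicLong();

    // Records one hit and returns the new total.
    long recordHit() {
        return hits.incrementAndGet();
    }

    long getHits() {
        return hits.get();
    }
}
```

Calling recordHit() from several threads always yields the exact total, whereas a plain long field incremented with hits++ can lose updates, because ++ is a read-modify-write sequence rather than a single atomic step.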
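The rules about guarding shared variables with one lock, especially when several variables share an invariant, can be sketched with a class whose two fields must satisfy lower <= upper. The class is hypothetical, loosely modeled on the book's NumberRange example. Both fields are guarded by the intrinsic lock on this; the @GuardedBy annotations are written as comments because that annotation comes from the book's net.jcip.annotations package rather than the JDK.

```java
// Both bounds participate in the invariant lower <= upper, so every read
// and write of either field happens while holding the same lock (`this`).
final class NumberRange {
    // @GuardedBy("this")
    private int lower = 0;
    // @GuardedBy("this")
    private int upper = 0;

    synchronized void setLower(int value) {
        // Check-then-act on both fields, atomic because the lock is held.
        if (value > upper) {
            throw new IllegalArgumentException("lower " + value + " > upper " + upper);
        }
        lower = value;
    }

    synchronized void setUpper(int value) {
        if (value < lower) {
            throw new IllegalArgumentException("upper " + value + " < lower " + lower);
        }
        upper = value;
    }

    synchronized boolean isInRange(int i) {
        return lower <= i && i <= upper;
    }
}
```

If lower and upper were two independent AtomicIntegers instead, the checks in setLower() and setUpper() could interleave and leave the range in an invalid state, even though each individual update would be atomic.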