The Ex CS Grad Student: Java Concurrency in Practice - Summary

This is part 3 of my notes from reading Java Concurrency in Practice.

NOTE: These summaries are NOT meant to replace the book. I highly recommend buying your own copy of the book if you haven't already read it.

Chapter 4 - Composing objects

We often create thread-safe objects by composing together other thread-safe objects. In order to design a thread-safe class, we need to identify the object's state variables, establish the invariants that constrain them and the post conditions associated with its methods, and then establish a policy for managing concurrent access to them.

An object's state includes the state of other objects referenced from its state variables. For example, a LinkedList's state includes all the link node objects.

Synchronization policy - how an object uses immutability, thread-confinement and locking (how and which variables are guarded by which locks) to coordinate access to its state variables so that the invariants or post conditions are not violated.
Encapsulation enables us to determine that a class is thread-safe without having to examine the entire program, because all paths that use the data can be identified and made thread-safe.

Collection wrappers like synchronizedList make the underlying non-thread-safe ArrayList thread-safe by using encapsulation.

Java monitor pattern - object encapsulates all state and guards it with its own intrinsic lock.

Pro: Simplicity.
Con: External code can lock on the object's intrinsic lock and cause liveness problems that may require examining the whole program to debug. This problem is avoided if a separate private object is used as the lock.
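
A minimal sketch of the private-lock variant mentioned above. The class name PrivateLockCounter is made up for illustration; the point is only that the lock object is private, so external code cannot acquire it.

```java
// Hypothetical example class, not from the book.
final class PrivateLockCounter {
    // Private lock object: code outside this class cannot synchronize on it,
    // so it cannot interfere with this class's locking.
    private final Object lock = new Object();
    private long count; // guarded by lock

    long increment() {
        synchronized (lock) {
            return ++count;
        }
    }

    long get() {
        synchronized (lock) {
            return count;
        }
    }
}
```

With the plain monitor pattern the methods would simply be declared synchronized, which locks on the object itself and is therefore visible to callers.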

A composite object made of thread-safe components need not be thread-safe. This is usually the case when there are constraints on the state of the components.
A state variable can be published if it is thread-safe, does not participate in any invariants that constrain its value and has no prohibited state transitions for any of its operations.
Re-use existing Java thread-safe libraries whenever possible. To add functionality to an existing thread-safe class:

The best option is to modify the source code of the class if available.

Pro: Details about the synchronization policy are confined to one file, and are thus easier to maintain.

If modifying source code is not possible, the next best option is to extend the class, assuming it was designed for extension.

Con: Fragile. If base class changes synchronization policy (say which locks are used), then the extended class will silently fail.

Extension or source modification is not possible for collections wrapped in Collections.synchronized* wrappers, since the underlying wrapped class is unknown. The solution is to use client-side locking - guard client code with the lock specified by the object's synchronization policy. For Vector, the lock is the object itself.
Another option is Composition. In a wrapper object, maintain a private internal copy of the object (say List) whose functionality we wish to extend. Add the new functionality as a synchronized method of the wrapper. Add synchronized methods for existing functionality of the wrapped object that simply delegate to the underlying object.

Pro: Less fragile
Con: Minor overhead due to double locking.
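
The composition approach could look roughly like this. PutIfAbsentList is a hypothetical name, and only two of the delegating methods are shown to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper that adds put-if-absent to a List via composition.
final class PutIfAbsentList<E> {
    // Private copy that is never exposed, so every access goes through
    // this wrapper's intrinsic lock.
    private final List<E> list;

    PutIfAbsentList(List<E> initial) {
        this.list = new ArrayList<>(initial);
    }

    // New functionality: check-then-act made atomic by the wrapper's lock.
    synchronized boolean putIfAbsent(E element) {
        boolean absent = !list.contains(element);
        if (absent) {
            list.add(element);
        }
        return absent;
    }

    // Existing functionality simply delegates under the same lock.
    synchronized boolean contains(Object o) {
        return list.contains(o);
    }

    synchronized int size() {
        return list.size();
    }
}
```

Since the inner list here is a plain ArrayList, there is only one layer of locking. The double-locking overhead appears when the wrapped list is itself synchronized.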

Document a class's thread-safety guarantees for users of the class; document its synchronization policy for the class's maintainers.

Use @GuardedBy annotations to document the locks used to guard different state variables.
Since documentation for commonly used Java libraries is vague, we often have to guess whether a class is thread-safe or not. For example, since a JDBC DataSource represents a pool of reusable database connections to be shared across multiple threads, we can assume that it is thread-safe. However, individual JDBC Connection objects are intended to be used by a single thread, and are most likely not thread-safe.

Chapter 5: Building Blocks

Java offers a wide variety of thread-safe classes that can serve as building blocks for large concurrent programs.
Even with thread-safe classes, additional client-side locking may still be required for compound actions to behave reasonably.
Unreliable iteration and other compound actions are examples where additional synchronization is required. For example, getLast() and deleteLast() helpers built on a thread-safe List each perform a check-then-act sequence (read the size, then access the last index). If getLast() determines the index of the last element to be L and another thread's deleteLast() removes the element at L before getLast() accesses it, an ArrayIndexOutOfBoundsException (for a Vector) is thrown. Note that the data in the List is never corrupted; we just get unexpected behavior.
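
A sketch of how the getLast()/deleteLast() race can be closed with client-side locking, assuming the list is a Vector (whose lock is the Vector itself). VectorHelpers is a made-up holder class.

```java
import java.util.Vector;

// Hypothetical helper class for illustration.
final class VectorHelpers {
    // Holding the Vector's own lock makes size() + get() one atomic step.
    static <E> E getLast(Vector<E> v) {
        synchronized (v) {
            int lastIndex = v.size() - 1;
            return v.get(lastIndex);
        }
    }

    // Same lock, so deleteLast() cannot interleave with getLast().
    static <E> void deleteLast(Vector<E> v) {
        synchronized (v) {
            int lastIndex = v.size() - 1;
            v.remove(lastIndex);
        }
    }
}
```

Both helpers still throw on an empty Vector; the locking only removes the race between the size check and the access.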

Java iterators throw an unchecked ConcurrentModificationException if they detect that the underlying collection has changed during iteration. This detection is best-effort and not reliable, as the modification counters used to track changes are checked without synchronization.
Iterators can sometimes be hidden. For example, an iterator is used if a collection object is passed to a log statement which tries to get its string representation.
Unreliable iteration can be solved by client-side locking. When using the synchronized wrappers provided by Java, we can use the wrapper object itself as the lock for composite actions. Con: Decreases scalability.
Another option to avoid ConcurrentModificationException is to clone the collection as a local copy. The lock on the collection must be held while cloning. Con: Can be very expensive to clone large collections.
To avoid client-side locking during iteration, use the concurrent collections offered by Java. This improves scalability as multiple threads can now access the collections simultaneously without blocking.
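
The two locking-based options above, sketched for a list obtained from Collections.synchronizedList. SafeIteration and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; both methods expect a list returned by
// Collections.synchronizedList(...).
final class SafeIteration {
    // Option 1: hold the wrapper's lock for the whole iteration.
    // Other threads are blocked for as long as the loop runs.
    static int sumWhileLocked(List<Integer> syncList) {
        int sum = 0;
        synchronized (syncList) {
            for (int x : syncList) {
                sum += x;
            }
        }
        return sum;
    }

    // Option 2: copy under the lock, then iterate the private copy unlocked.
    static int sumOverCopy(List<Integer> syncList) {
        List<Integer> copy;
        synchronized (syncList) {
            copy = new ArrayList<>(syncList);
        }
        int sum = 0;
        for (int x : copy) {
            sum += x;
        }
        return sum;
    }
}
```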

ConcurrentHashMap

Hashtable and Collections.synchronizedMap use a single lock to synchronize all operations. (A plain HashMap is not synchronized at all.)
ConcurrentHashMap uses lock striping. It supports concurrent non-blocking access by an arbitrary number of readers and a limited number of concurrent writers.
Iterators do not throw ConcurrentModificationException.
size() and isEmpty() are approximate. These methods are not useful in concurrent environments anyway.
Cannot lock the entire map for synchronized access, needed in rare cases where multiple map entries need to be added atomically.
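Cannot use client-side locking to add new atomic operations. If these are needed, the ConcurrentMap interface (implemented by ConcurrentHashMap) already provides common compound operations such as putIfAbsent, remove-if-equal and replace-if-equal as atomic methods.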
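
As a rough illustration of building a compound update from ConcurrentMap's atomic methods instead of locking the map, here is a hypothetical hit counter (HitCounter and its method names are my own, not from the book).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example class.
final class HitCounter {
    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

    // Increment without any external lock: retry until either the first
    // insert wins or the replace-if-equal succeeds.
    void record(String page) {
        while (true) {
            Integer old = hits.putIfAbsent(page, 1);
            if (old == null) {
                return; // we inserted the first hit
            }
            if (hits.replace(page, old, old + 1)) {
                return; // nobody changed the value in between
            }
            // Another thread updated the entry first; try again.
        }
    }

    int get(String page) {
        return hits.getOrDefault(page, 0);
    }
}
```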

CopyOnWriteArrayList

Thread-safety derived from the immutability of underlying list. Mutability is provided by creating and republishing a new copy of the list on every change. Iterators point to the list at the time the iterator was created.
Copying large lists can be expensive. Hence it is mainly useful when iteration is far more common than modification, for example when using a list of event listeners.
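
A small sketch of the snapshot behaviour described above: an iterator created before a modification keeps seeing the old contents and never throws ConcurrentModificationException. SnapshotDemo is a made-up class name.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class.
final class SnapshotDemo {
    // Returns how many elements an iterator sees when the list is
    // modified after the iterator was created.
    static int countSeenByOldIterator() {
        List<String> listeners = new CopyOnWriteArrayList<>();
        listeners.add("a");
        listeners.add("b");

        Iterator<String> it = listeners.iterator(); // snapshot of [a, b]
        listeners.add("c"); // creates a new backing array; 'it' is unaffected

        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return seen; // 2, the size at the time the iterator was created
    }
}
```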

Some more concurrent collections: CopyOnWriteArraySet, ConcurrentLinkedQueue, ConcurrentSkipListMap (concurrent replacement for a synchronized SortedMap) and ConcurrentSkipListSet (concurrent replacement for a synchronized SortedSet).
Java offers multiple Queue implementations (esp. BlockingQueue) that can be very useful to implement producer-consumer designs. Queues can be blocking or non-blocking. For non-blocking queues, retrieval operations return null if the queue is empty.
BlockingQueue

blocking methods: take (blocks while the queue is empty), put (blocks while the queue is full, so only if the queue is bounded)
non-blocking methods: offer, poll
Use bounded blocking queues for reliable resource management. Otherwise, if consumers are slow, producers can keep adding to the queue till the JVM runs out of heap space. Do this early in design; it is hard to retrofit later.
We can use offer() to check if the item will be accepted by the queue. If the queue is full, the item can be discarded in application-specific ways (for example, simply drop it or save it to local disk for later usage).
BlockingQueue implementations - contain sufficient internal synchronization to safely publish objects from a producer thread to a consumer thread

LinkedBlockingQueue
ArrayBlockingQueue
PriorityBlockingQueue
SynchronousQueue - Not really a queue as it does not maintain storage space for elements. Just maintains a list of queued threads waiting to enqueue or dequeue an element. Directly transfers item from producer to consumer - more efficient. Direct handoff also informs producer that consumer has taken responsibility for the item. take() and put() will block if no thread is waiting to participate in the handoff. Mainly used when there are always enough consumer threads.
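
A minimal producer-consumer sketch with a bounded queue, assuming ArrayBlockingQueue and arbitrarily chosen capacities and item counts. BoundedQueueDemo is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class.
final class BoundedQueueDemo {
    // The producer puts 1..5 into a queue of capacity 2; put() blocks
    // whenever the consumer falls behind, which throttles the producer.
    static int produceAndConsume() throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2);
        int[] sum = new int[1];
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    sum[0] += queue.take(); // blocks while the queue is empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (int i = 1; i <= 5; i++) {
            queue.put(i); // blocks while the queue is full
        }
        consumer.join(); // join makes the consumer's writes to sum visible
        return sum[0];
    }

    // offer() never blocks: it reports whether the item was accepted.
    static boolean offerToFullQueue() {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1);
        queue.offer(1);
        return queue.offer(2); // queue is full, so the item is rejected
    }
}
```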

Deque and BlockingDeque - allows efficient insertion and removal from head and tail.

Enables Work Stealing designs - In producer-consumer design, there is a single queue that is shared across all threads. This causes lots of contention. In work stealing, each consumer has its own deque, from the head of which it consumes items. If its deque is empty, it can steal objects from the tail of some other consumer's deque. Most of the time, a consumer takes objects from the head of its own deque, thereby avoiding contention. Even when it steals, there is little contention as it steals from the tail rather than the head. Work stealing is well-suited for applications where producers are also consumers.
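
The core of the work-stealing rule fits in a few lines. This is only a sketch of the take-from-own-head, steal-from-other-tail idea with a single victim; a real scheduler would also need victim selection and termination logic. WorkStealingSketch and nextTask are made-up names.

```java
import java.util.concurrent.BlockingDeque;

// Hypothetical helper class.
final class WorkStealingSketch {
    // Prefer the head of our own deque; if it is empty, steal from the
    // tail of another worker's deque. Returns null if both are empty.
    static <T> T nextTask(BlockingDeque<T> own, BlockingDeque<T> victim) {
        T task = own.pollFirst();
        return task != null ? task : victim.pollLast();
    }
}
```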

Interruption

Thread.interrupt() interrupts a thread.
Interruption is a cooperative mechanism, i.e., one thread cannot force another to stop what it is doing. Thread.interrupt() merely requests a thread to stop at a convenient stopping point.
If a method is marked to throw an InterruptedException, it means that the method is blocking and that it will attempt to stop blocking early if interrupted.
No language specification about how to deal with interrupts. Most natural option is to cancel whatever the thread is currently doing. Blocking methods that are interruptible make it easy to cancel long-running tasks when necessary.
One common option to handle the InterruptedException is to propagate it to your caller. This can be done by not catching it at all, or by catching and rethrowing it after performing some local cleanup.
In cases where you cannot throw an InterruptedException (for example, inside Runnable.run()), you must catch the InterruptedException and restore the interrupted status by calling Thread.currentThread().interrupt(). This allows code higher up in the call stack to see that the thread was interrupted.
Never catch an InterruptedException and ignore it - except when extending Thread (and therefore controlling all code higher up in the call stack).
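
The two ways of handling InterruptedException described above, sketched with a BlockingQueue as the blocking call. InterruptHandling and its method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;

// Hypothetical example class.
final class InterruptHandling {
    // Option 1: propagate. The caller decides what interruption means.
    static String takeNext(BlockingQueue<String> queue) throws InterruptedException {
        return queue.take();
    }

    // Option 2: Runnable.run() cannot throw InterruptedException, so catch
    // it and restore the interrupted status before returning.
    static Runnable consumer(BlockingQueue<String> queue) {
        return () -> {
            try {
                while (true) {
                    queue.take(); // process items until interrupted
                }
            } catch (InterruptedException e) {
                // Catching the exception cleared the flag; set it again so
                // code higher up the call stack can see the interrupt.
                Thread.currentThread().interrupt();
            }
        };
    }
}
```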

Synchronizers - an object that coordinates the control flow of threads based on its state. BlockingQueue is one example of a synchronizer. Latch, FutureTask, Semaphores and Barriers are other examples of synchronizers.

Latch - a synchronizer that can block threads until it reaches its terminal state. A latch acts as a gate. Once open, it remains open forever.

Eg usage: Ensure that a computation cannot proceed until the resources needed by it have been initialized, Wait until all players in a multi-player game have finished their moves.
CountDownLatch - initialized with a positive integer. Threads call await(), which blocks till the counter becomes 0. Other threads call countDown(), which decreases the count.
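
A sketch of the common start-gate/end-gate use of CountDownLatch: workers wait on one latch until released, and the caller waits on a second latch until all of them finish. LatchDemo and runWorkers are made-up names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class.
final class LatchDemo {
    // Starts n workers, releases them together, and returns how many
    // completed their (trivial) unit of work.
    static int runWorkers(int n) throws InterruptedException {
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch endGate = new CountDownLatch(n);
        AtomicInteger done = new AtomicInteger();

        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                try {
                    startGate.await(); // wait until the gate opens
                    done.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    endGate.countDown(); // always signal completion
                }
            }).start();
        }

        startGate.countDown(); // open the gate for all workers at once
        endGate.await();       // block until every worker has counted down
        return done.get();
    }
}
```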

FutureTask - mainly used to represent long running or async computation (for example, by the Executor framework)

The computation is encapsulated in a Callable (result-bearing equivalent of Runnable).
FutureTask.get() returns the result immediately if the computation is done; otherwise it blocks until the task completes. If the task threw an exception or was cancelled, get() throws instead of returning a result. The result obtained from get() is safely published.
Once complete, it stays in the completed state forever.
Future.get() throws an ExecutionException wrapping whatever Callable.call() threw. Check the cause against all known exception types when calling get(); other exceptions are generally rethrown.
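
A sketch of running a Callable through a FutureTask and unwrapping the ExecutionException. FutureTaskDemo and runAndGet are hypothetical, and wrapping unexpected checked causes in IllegalStateException is just one possible policy.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical demo class.
final class FutureTaskDemo {
    // Runs the computation on a new thread and blocks for its result.
    static <V> V runAndGet(Callable<V> computation) throws InterruptedException {
        FutureTask<V> task = new FutureTask<>(computation);
        new Thread(task).start(); // FutureTask is also a Runnable
        try {
            return task.get(); // blocks until the computation completes
        } catch (ExecutionException e) {
            // Whatever call() threw arrives wrapped as the cause.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw new IllegalStateException("Unexpected checked exception", cause);
        }
    }
}
```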

Semaphores

Counting semaphores are used to control the number of threads that can simultaneously access a resource. A thread wishing to use the resource must acquire() a virtual permit and release() it when done. acquire() blocks if no permits are available.
A binary semaphore is a mutex with non-reentrant locking, unlike the intrinsic java object lock which is reentrant.
Can be used to turn any collection into a bounded blocking collection.
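
Turning a collection into a bounded blocking collection with a semaphore could look like this, along the lines of the book's bounded set example. The availablePermits() accessor is my own addition for inspection.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Semaphore;

// Sketch of a set whose add() blocks once 'bound' elements are present.
final class BoundedHashSet<T> {
    private final Set<T> set = Collections.synchronizedSet(new HashSet<T>());
    private final Semaphore permits;

    BoundedHashSet(int bound) {
        this.permits = new Semaphore(bound);
    }

    boolean add(T item) throws InterruptedException {
        permits.acquire(); // blocks while the set is full
        boolean added = false;
        try {
            added = set.add(item);
            return added;
        } finally {
            if (!added) {
                permits.release(); // duplicate or failure: give the permit back
            }
        }
    }

    boolean remove(Object item) {
        boolean removed = set.remove(item);
        if (removed) {
            permits.release(); // frees a slot for a blocked add()
        }
        return removed;
    }

    // Added for illustration only.
    int availablePermits() {
        return permits.availablePermits();
    }
}
```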

Barriers

Similar to latches, but all threads must come together at the barrier point at the same time in order to proceed.
CyclicBarrier allows a fixed number of threads to rendezvous repeatedly. Useful in parallel iterative algorithms.
If a thread blocked on await() is interrupted or an await() times out, the barrier is considered broken and all other threads waiting in await() get a BrokenBarrierException.
When barrier is successfully passed, await() returns with a unique arrival index per thread, which can be used for leader election amongst the threads.
Also supports barrier action - a Runnable to be executed when barrier is successfully passed but before threads are released.
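
A sketch that exercises the barrier action and the arrival index over several rounds. BarrierDemo is a made-up class, and treating arrival index 0 as the "leader" is just one convention.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class.
final class BarrierDemo {
    // Returns {number of barrier actions run, number of leaders elected}.
    static int[] run(int parties, int rounds) throws InterruptedException {
        AtomicInteger actions = new AtomicInteger();
        AtomicInteger leaders = new AtomicInteger();
        // The barrier action runs once per round, before threads are released.
        CyclicBarrier barrier = new CyclicBarrier(parties, actions::incrementAndGet);

        Thread[] workers = new Thread[parties];
        for (int i = 0; i < parties; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) {
                        int arrivalIndex = barrier.await();
                        if (arrivalIndex == 0) { // the last thread to arrive
                            leaders.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // Another party failed; abandon the remaining rounds.
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return new int[] { actions.get(), leaders.get() };
    }
}
```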

For building an efficient, scalable result cache, use a ConcurrentHashMap<A, Future<V>> together with putIfAbsent().
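
A condensed sketch of such a cache, in the spirit of the book's Memoizer. Using java.util.function.Function for the computation and wrapping failures in IllegalStateException are simplifications of mine.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

// Sketch of a result cache that computes each value at most once.
final class Memoizer<A, V> {
    private final ConcurrentMap<A, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<A, V> compute;

    Memoizer(Function<A, V> compute) {
        this.compute = compute;
    }

    V get(A arg) throws InterruptedException {
        Future<V> future = cache.get(arg);
        if (future == null) {
            FutureTask<V> task = new FutureTask<V>(() -> compute.apply(arg));
            // Atomic check-then-insert: only one thread's task is stored.
            future = cache.putIfAbsent(arg, task);
            if (future == null) {
                future = task;
                task.run(); // we won the race, so compute in this thread
            }
        }
        try {
            return future.get(); // other threads block here until it is ready
        } catch (ExecutionException e) {
            cache.remove(arg, future); // do not cache failed computations
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

Caching the Future rather than the value means a second thread asking for the same key waits for the in-progress computation instead of starting its own.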

Wednesday, September 19, 2012
