The Ex CS Grad Student: September 2012

Sunday, September 30, 2012

Java Concurrency in Practice - Summary - Part 9

This is part 9 of my notes from reading Java Concurrency in Practice.

NOTE: These summaries are NOT meant to replace the book. I highly recommend buying your own copy of the book if you haven't already read it.

Chapter 13 - Explicit Locks

Unlike intrinsic locking, the Lock interface offers unconditional, polled, timed, and interruptible lock acquisition.
Lock implementations provide the same memory visibility guarantees as intrinsic locking. They can vary in locking semantics, scheduling algorithms, ordering guarantees and performance.
ReentrantLock has the same locking and memory semantics as a synchronized block.
Why use explicit locks over intrinsic locks?

Unlike intrinsic locking, a thread waiting to acquire a ReentrantLock can be interrupted.
ReentrantLock also supports timed lock acquisition.
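With intrinsic locks, a deadlock is fatal - the only way to recover is to restart the application.
Intrinsic locks must be released in the same code block they are acquired in. This makes non-block-structured locking (for example, hand-over-hand locking) impossible.
In Java 5.0, ReentrantLock performed much better than intrinsic locking under contention; Java 6 largely closed the gap.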

Lock objects must be released in a finally block, so that the lock is released even if an exception is thrown.
lockInterruptibly() helps us build cancelable tasks.
tryLock() returns false if the lock cannot be acquired. Timed tryLock() is also responsive to interruption.
ReentrantLock offers two fairness options

Fair - threads acquire locks in order of requesting.
Non-fair (default) - thread can acquire lock if it is available at the time of the lock request, even if earlier threads are waiting. Non-fair locking is useful because it avoids the overhead of suspending/resuming a thread if the lock is available at time of the lock request.
Fairness is usually not needed, and has a very high performance penalty (multiple orders of magnitude).
Fair locks work best when they are held for a relatively long time or when the mean time between lock requests is large.
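
The acquisition styles above can be sketched as follows. This is a minimal illustration rather than code from the book; the TimedLockCounter class and its method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical counter, used only to illustrate the locking idioms.
final class TimedLockCounter {
    private final Lock lock = new ReentrantLock(); // non-fair by default
    private int count;

    // Unconditional acquisition; the finally block guarantees release.
    void increment() {
        lock.lock();
        try {
            count++;
        } finally {
            lock.unlock();
        }
    }

    // Timed acquisition: returns false if the lock is not obtained within
    // the timeout, and throws InterruptedException if interrupted while waiting.
    boolean tryIncrement(long timeout, TimeUnit unit) throws InterruptedException {
        if (!lock.tryLock(timeout, unit)) {
            return false;
        }
        try {
            count++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int get() {
        lock.lock();
        try {
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```

Passing true to the ReentrantLock constructor would request a fair lock instead.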

When to use intrinsic locks?

synchronized blocks have a more concise syntax. You can never forget to unlock a synchronized block.
Use ReentrantLock only when advanced features like timed, polled, interruptible lock acquisition, fairness or non-block structured locking are needed.
Harder to debug deadlock problems when using ReentrantLock because lock acquisition is not tied to a particular stack frame, and thus the stack dump is not very helpful.
synchronized is likely to have more performance improvements in the future (eg: lock coarsening) as it is part of the Java language spec.

Read-Write Lock - the protected resource can be accessed by multiple readers at the same time, or by a single writer, but not both.

offers readLock() and writeLock() methods which return a Lock object that must be acquired before doing the respective operations.
More complex implementation. Hence has lower performance except in read-heavy workloads.

Lock can only be released by thread that acquired it.
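
A read-write lock wrapped around a plain HashMap might look like the sketch below. ReadWriteMap is a hypothetical name, and only two operations are shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: many readers may hold the read lock at once,
// while the write lock is exclusive.
final class ReadWriteMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Lock readLock = rw.readLock();
    private final Lock writeLock = rw.writeLock();

    V put(K key, V value) {
        writeLock.lock();
        try {
            return map.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }

    V get(K key) {
        readLock.lock();
        try {
            return map.get(key);
        } finally {
            readLock.unlock();
        }
    }
}
```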

Chapter 14 - Building Custom Synchronizers

State-dependent classes - blocking operations can proceed only if state-precondition becomes true (for example, you cannot retrieve result of FutureTask if computation has not yet finished).
Try to use existing state-dependent classes whenever possible.
Condition queue - allows a group of threads (called wait set) to wait for a specific condition to become true.
Intrinsic condition queues - Any java object can act as a condition queue via the Object.wait(), notify() and notifyAll() functions.

Must hold intrinsic lock on an object before you can call wait(), notify() or notifyAll().
Calling Object.wait() atomically releases lock and suspends the current thread. It reacquires the lock upon waking up, just before returning from the wait() function call. wait() blocks till thread is awakened by a notification, a specified timeout expires or the thread is interrupted.
In order to use condition queues, we must first identify and document the pre-condition that makes an operation state-dependent. The state variables involved in the condition must be protected by the same lock object as the one we wait() on.
A single intrinsic condition queue can be used with more than one condition predicate. This means that when a thread is awakened by a notifyAll(), the condition it was waiting on need not be true. wait() can even return spuriously without any notify(). The condition can also become false again by the time wait() reacquires the lock after waking up. Hence, when waking up from wait(), the condition predicate must be tested again, and the thread must go back to waiting if it is still false. So always call wait() in a loop: synchronized (lockObj) { while (!conditionPredicate()) { lockObj.wait(); } /* object is in desired state now */ }

Notifications are not sticky - i.e. a thread won't know about notifications that occurred before it called wait().
In order to call notify() or notifyAll() on an object, you must hold the intrinsic lock on that object. Unlike wait(), the lock is not automatically released. The notifying thread should release the lock soon afterwards, as none of the woken threads can make progress without acquiring the lock.
Use notifyAll() instead of notify(). If multiple threads are waiting on the same condition queue for different condition predicates, calling notify() instead of notifyAll() can lead to missed signals, as only the wrong thread may be woken up.

However using notifyAll() can be very inefficient, as multiple threads are woken up and contend for the lock where only one of them can usually make progress.
notify() can be used only if

The same condition predicate is associated with the condition queue and each thread executes the same logic on returning from wait().
A notification on the condition queue enables at most one thread to proceed.

A bounded buffer implementation needs to call notify only when moving away from the empty or full states. Such conditional notifications are efficient, but make the code hard to get right. Hence, avoid them unless they are necessary as an optimization.
A state dependent class should either fully document its waiting/notification protocols to sub-classes or prevent sub-classes from participating in them at all.
Encapsulate condition queue objects in order to prevent external code from incorrectly calling wait() or notify() on them. This often implies the usage of a private lock object instead of using the main object itself.
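
The rules above (wait in a loop, hold the lock, prefer notifyAll(), keep the lock object private) can be put together in a small bounded buffer. This is a minimal sketch; WaitNotifyBuffer and its fields are hypothetical names.

```java
// Hypothetical bounded buffer built on an intrinsic condition queue.
final class WaitNotifyBuffer<T> {
    private final Object lock = new Object(); // private lock doubles as the condition queue
    private final Object[] items;
    private int head, tail, count;

    WaitNotifyBuffer(int capacity) {
        items = new Object[capacity];
    }

    void put(T item) throws InterruptedException {
        synchronized (lock) {
            while (count == items.length) { // predicate: buffer is not full
                lock.wait();
            }
            items[tail] = item;
            tail = (tail + 1) % items.length;
            count++;
            lock.notifyAll(); // two predicates share one queue, so wake everyone
        }
    }

    @SuppressWarnings("unchecked")
    T take() throws InterruptedException {
        synchronized (lock) {
            while (count == 0) { // predicate: buffer is not empty
                lock.wait();
            }
            T item = (T) items[head];
            items[head] = null;
            head = (head + 1) % items.length;
            count--;
            lock.notifyAll();
            return item;
        }
    }
}
```
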
Explicit Condition objects - Condition

Each intrinsic lock can have only one associated condition queue. Hence multiple threads may wait on same condition queue for different condition predicates.
A Condition is associated with a single Lock object. A Condition is created by calling Lock.newCondition(). You can create multiple Condition objects per Lock.
Equivalents of wait(), notify() and notifyAll() for Condition are await(), signal() and signalAll(). Since Condition is an Object, wait() and notify() are also available. Do not confuse them.
Explicit Condition objects make it easier to use signal() instead of signalAll().
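
With explicit Condition objects, each predicate can get its own queue, which is what makes signal() safe to use. A sketch of the same kind of bounded buffer, assuming hypothetical names (ConditionBuffer, notFull, notEmpty):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical bounded buffer with one Condition per predicate.
final class ConditionBuffer<T> {
    private final Lock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private final Queue<T> items = new ArrayDeque<>(); // does not accept null items
    private final int capacity;

    ConditionBuffer(int capacity) {
        this.capacity = capacity;
    }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await(); // await(), not Object.wait()
            }
            items.add(item);
            notEmpty.signal(); // only threads waiting for "not empty" are candidates
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            T item = items.remove();
            notFull.signal();
            return item;
        } finally {
            lock.unlock();
        }
    }
}
```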

Synchronizers

Both Semaphore and ReentrantLock are built on the AbstractQueuedSynchronizer (AQS) class (each delegates to a private inner subclass of AQS).
AQS is a framework for building locks and synchronizers.
When using AQS, there is only one point of contention.
Acquisition - state dependent operation that can block.
Release - allows some threads blocked in acquire to proceed. Non-blocking.

AQS manages a single integer of state for the synchronizer class. It can be accessed with getState(), setState() and compareAndSetState() methods. The integer can represent arbitrary semantics. For example, FutureTask uses it to represent the state (running, completed, canceled) of the task. Semaphore uses it to track the number of permits remaining.

Synchronizers track additional state variables themselves.
Synchronizers override tryAcquire, tryRelease, isHeldExclusively, tryAcquireShared and tryReleaseShared as needed. The acquire, release, etc. methods of AQS call the appropriate try methods.
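
A small AQS-based synchronizer, modeled on the book's one-shot latch, shows how the state integer and the try methods fit together. Here the state is assumed to mean 0 = closed and 1 = open; the isOpen() helper is an addition for illustration.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Binary latch: threads calling await() block until signal() is called once.
final class OneShotLatch {
    private final Sync sync = new Sync();

    void signal() {
        sync.releaseShared(0); // ends up calling tryReleaseShared
    }

    void await() throws InterruptedException {
        sync.acquireSharedInterruptibly(0); // ends up calling tryAcquireShared
    }

    boolean isOpen() {
        return sync.isOpen();
    }

    private static final class Sync extends AbstractQueuedSynchronizer {
        boolean isOpen() {
            return getState() == 1;
        }

        @Override
        protected int tryAcquireShared(int ignored) {
            // A non-negative result means success, a negative result means "block".
            return (getState() == 1) ? 1 : -1;
        }

        @Override
        protected boolean tryReleaseShared(int ignored) {
            setState(1); // the latch is now open
            return true; // other waiting threads may now be able to acquire
        }
    }
}
```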

Thursday, September 27, 2012

Java Concurrency in Practice - Summary - Part 8

This is part 8 of my notes from reading Java Concurrency in Practice.

Chapter 11 - Performance and Scalability

Avoid premature optimization - first make it right, then make it fast, if not fast enough already (as indicated by actual performance measurements)
Tuning for scalability is often different from tuning for performance, and the two goals are often contradictory.
Amdahl's Law : Speedup <= 1/( F + (1-F)/N) where F is the fraction of computation that must be executed serially, and N is the number of processors.
A shared work queue adds some (often overlooked) serial processing. Result handling is another form of serialization hidden inside otherwise seemingly 100% concurrent programs.
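
As a worked instance of Amdahl's Law, the bound can be computed directly. The class and method names below are hypothetical.

```java
// Upper bound on speedup from Amdahl's Law: 1 / (F + (1 - F) / N).
final class Amdahl {
    static double maxSpeedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    public static void main(String[] args) {
        // With 10% serial work, 10 processors give a bound of about 5.26,
        // and no number of processors can push the bound past 10.
        System.out.println(maxSpeedup(0.10, 10));
        System.out.println(maxSpeedup(0.10, 1_000_000));
    }
}
```
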
Costs of using threads

Context switches - managing shared data structures in the OS and JVM takes memory and CPU. A thread context switch can also cause a flurry of processor cache misses.
When a thread blocks on a lock, it is switched out by JVM before reaching its full scheduled CPU quantum, leading to more overhead.

Context switching costs 5000-10000 clock cycles (few microseconds). Use vmstat to find % of time program spent in the kernel. High % can indicate high context switching.
synchronized and volatile result in the use of special CPU instructions called memory barriers that involve flushing/invalidating CPU caches, stalling execution pipelines, flushing hardware write buffers, and inhibit compiler optimizations as operations cannot be reordered.
Performance of contended and uncontended synchronization are very different. synchronized is optimized for the uncontended scenario (20 to 250 clock cycles). volatile is always uncontended.
Modern JVMs can optimize away locking code that can be proven to never contend.
Modern JVMs perform escape analysis to identify thread-confined objects and avoid locking them.
Modern JVMs can do lock coarsening to merge multiple adjacent locks into a larger lock to avoid multiple lock/unlocks.
Synchronization by one thread affects performance of other threads due to traffic on the shared memory bus.
Uncontended synchronization can be handled entirely in JVM. Contended synchronization involves OS activity - OS needs to suspend the thread that loses the contention.
Blocking can be implemented by spin-waiting or by suspending the thread via the OS. Spin-waiting is preferred for short waits. The JVM decides what to use based on profiling past performance.
Reducing lock contention

reduce duration for which locks are held.
reduce frequency at which locks are requested. Reduce lock granularity by lock splitting (for moderately contended locks) and lock striping (for heavily contended locks).
replace exclusive locks with coordination mechanisms that permit greater concurrency.

Lock striping - ConcurrentHashMap uses 16 locks - bucket N is guarded by lock N % 16. Locking for exclusive access to entire collection is hard when lock striping is used.
Avoid hot fields like cached values - for eg: size is cached for a Map, in order to convert an O(n) operation to a O(1) operation. Use striped counters or atomic variables.
Alternatives to exclusive locks - concurrent collections, read-write locks, immutable objects, atomic variables.
Do not use object pools. Object allocation and GC were slow in earlier versions of Java. Now object allocation is faster than a C malloc - only 10 machine instructions. Object pools also introduce synchronization overheads
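
Lock striping can be sketched with a map split into 16 independently locked stripes. This is a simplified illustration, not ConcurrentHashMap's actual implementation; StripedMap and its members are hypothetical, and null keys are not supported.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical striped map: each stripe is a HashMap guarded by its own lock.
final class StripedMap<K, V> {
    private static final int N_STRIPES = 16;
    private final List<Map<K, V>> stripes = new ArrayList<>(N_STRIPES);

    StripedMap() {
        for (int i = 0; i < N_STRIPES; i++) {
            stripes.add(new HashMap<K, V>());
        }
    }

    private Map<K, V> stripeFor(Object key) {
        // floorMod keeps the index non-negative for negative hash codes.
        return stripes.get(Math.floorMod(key.hashCode(), N_STRIPES));
    }

    V get(K key) {
        Map<K, V> stripe = stripeFor(key);
        synchronized (stripe) {
            return stripe.get(key);
        }
    }

    V put(K key, V value) {
        Map<K, V> stripe = stripeFor(key);
        synchronized (stripe) {
            return stripe.put(key, value);
        }
    }

    // Locks one stripe at a time, so the result is only a snapshot estimate
    // while writers are active; exact results would need all 16 locks at once.
    int size() {
        int total = 0;
        for (Map<K, V> stripe : stripes) {
            synchronized (stripe) {
                total += stripe.size();
            }
        }
        return total;
    }
}
```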

Chapter 12 - Testing Concurrent Programs

Every test must wait till all the threads created by it terminate. It should then report any failures in tearDown().
Testing a blocking operation needs some way to unblock a thread that has blocked as expected. This is usually done by running the blocking operation in a new thread and interrupting it after waiting for some time. An InterruptedException is thrown if the operation blocked as expected.
Thread.getState() should not be used for concurrency control or testing. Useful only for debugging.
One approach to test producer-consumer programs is to check that everything that is put into a queue or buffer eventually comes out of it, and nothing else does.

For single producer-single consumer designs, use order sensitive checksum of elements that are added, and verify them when the element is removed. Do not use a synchronized shadow list to track the elements as that will introduce artificial serialization.
For multiple producer-consumer designs, use an order insensitive checksum that can be combined at the end of the test to verify that all enqueued elements have been dequeued.

Make sure that the checksums are not guessable by the compiler (e.g. consecutive integers), so that they are not precomputed. Use a simple random number generator like static int xorShift(int y) { y ^= (y << 6); y ^= (y >>> 21); y ^= (y << 7); return y; }
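
A checksum-based producer-consumer test along these lines, loosely modeled on the book's approach, could look like the sketch below. The PutTakeCheck name, the queue capacity of 10 and the use of ArrayBlockingQueue as the class under test are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class PutTakeCheck {
    static int xorShift(int y) {
        y ^= (y << 6);
        y ^= (y >>> 21);
        y ^= (y << 7);
        return y;
    }

    // Returns true if the order-insensitive checksums of produced and
    // consumed elements match.
    static boolean run(int pairs, final int trialsPerThread) throws Exception {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);
        final AtomicInteger putSum = new AtomicInteger();
        final AtomicInteger takeSum = new AtomicInteger();
        // All workers plus the test thread meet at the barrier twice: start and end.
        final CyclicBarrier barrier = new CyclicBarrier(2 * pairs + 1);
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < pairs; i++) {
                pool.execute(() -> { // producer
                    try {
                        int seed = (int) System.nanoTime() ^ Thread.currentThread().hashCode();
                        int sum = 0; // thread-local, to avoid contention on the shared sum
                        barrier.await();
                        for (int n = 0; n < trialsPerThread; n++) {
                            queue.put(seed);
                            sum += seed;
                            seed = xorShift(seed);
                        }
                        putSum.getAndAdd(sum);
                        barrier.await();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
                pool.execute(() -> { // consumer
                    try {
                        int sum = 0;
                        barrier.await();
                        for (int n = 0; n < trialsPerThread; n++) {
                            sum += queue.take();
                        }
                        takeSum.getAndAdd(sum);
                        barrier.await();
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                });
            }
            barrier.await(); // release all workers together
            barrier.await(); // wait for all workers to finish
        } finally {
            pool.shutdown();
        }
        return putSum.get() == takeSum.get();
    }
}
```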

Test on multi-processor machines with fewer processors than active threads.
Generate more thread interleavings by using Thread.yield() to encourage more context switches during operations that access shared state.
Always include some basic functionality testing when doing performance testing to make sure that you are not measuring performance of broken code.
Non-fair semaphores provide better throughput, while fair semaphores provide lower variance in responsiveness.
Avoiding performance testing pitfalls

Ensure that garbage collection does not run at all during your test (check this using the -verbose:gc flag) OR ensure that garbage collection runs a number of times during the test (need to run test for a long time).
Your tests should run only after all code has been compiled; no point measuring performance of interpreted byte code. Dynamic compilation takes CPU resources. Compiled code executes much faster.

Code may be decompiled/recompiled multiple times during execution - for eg: if some previous assumption made by JVM is invalidated, or to compile with better optimization flags based on recently gathered performance statistics.
Run program long enough (several minutes) so that compilation and interpreted execution represent a small fraction of the results and do not bias it.
Or have an unmeasured warm-up run before starting to collect performance statistics.
Run JVM with -XX:+PrintCompilation so that we know when dynamic compilation happens.

When running multiple unrelated computationally intensive tests in a single JVM, place explicit pauses between tests in order to give the JVM a chance to catch up with its background tasks. Don't do this when measuring multiple related activities, since omitting CPU required by background tasks gives unrealistic results.
In order to obtain realistic results, concurrent performance tests should approximate the thread-local computation done by a typical application. Otherwise, there will be unrealistic contention.
Make sure that compilers do not optimize away benchmarking code.

Trick to make sure that benchmarking calculation is not optimized away: if (fox.hashCode() == System.nanoTime()) System.out.print(" ");

Complementary Testing Approaches

Code Review
Static analysis tools: FindBugs has detectors for:

Inconsistent synchronization.
Invoking Thread.run (Thread.start() is what is usually invoked, not Thread.run())
Unreleased lock
Empty synchronized block
Double-checked locking
Starting a thread from a constructor
Notification errors
Condition wait errors: Object.wait() or Condition.await() should be called in a loop with the appropriate lock held after testing some state predicate.
Misuse of Lock and Condition
Sleeping or waiting while holding a lock.
Spin loops

Java Concurrency in Practice - Summary - Part 7

This is part 7 of my notes from reading Java Concurrency in Practice.

Chapter 9 - GUI Applications

Almost all GUI toolkits, including Swing, are implemented as a single-threaded subsystem. All GUI activity is confined to a single dedicated event dispatch thread. Attempts at multi-threaded GUIs suffered from deadlocks and race conditions. User actions manifest as events that bubble up from the GUI component to the application. Application initiated actions bubble down from the application code to the GUI components. Hence, GUI components are often accessed in opposite order, creating ripe conditions for deadlocks.
Tasks that execute in the event thread must complete quickly. Otherwise the UI will hang.
In Swing, GUI objects are kept consistent not by synchronization, but by thread confinement. They must NOT be accessed from any other thread.
A few Swing methods are thread-safe:

SwingUtilities.isEventDispatchThread
SwingUtilities.invokeLater - schedules a Runnable to be executed on the event thread.
SwingUtilities.invokeAndWait - callable only from a non-GUI thread. Schedules a Runnable to be executed on the GUI thread and waits for it to complete.
methods to enqueue a repaint or revalidate request on the event queue.
methods for adding/removing event listeners.

Short-running tasks can be run directly on the GUI thread. For long running tasks, use Executors.newCachedThreadPool().
Use Future, so that tasks can be easily cancelled. The task must be coded so that it is responsive to interruption.
SwingWorker class provides support for cancellation, progress indication, completion notification. So, we don't have to implement our own using FutureTask and Executor.
Data models must be thread-safe if they are to be accessed from both the GUI thread and other threads.
A program that has both a presentation-domain and an application domain data model is said to have a split-model design.

presentation data model is confined to event thread. Application domain data model is thread-safe and is shared between the application and GUI threads.
presentation model registers listeners with the application model so that it can be notified of updates. Presentation model can be updated from the application model by sending a snapshot of the current state or via incremental updates.

Chapter 10 - Avoiding Liveness Hazards

Unlike database systems, JVM does not do deadlock detection or recovery
A program will be free of lock-ordering deadlocks if all threads acquire the needed locks in a fixed global order.

The order of locks acquired by a thread may depend on external input. Hence static analysis alone is not sufficient to avoid lock-ordering deadlocks.
An alternative is to induce an ordering on locks by using System.identityHashCode. Order lock acquisition by the hash code of the lock object.

In the extremely unlikely scenario where the hash codes of two lock objects are equal, acquire a third "tie" lock before trying to acquire the original two locks. The tie lock can be a global lock. Since hash collisions are infrequent, the tie lock won't introduce a concurrency bottleneck.
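
The identity-hash ordering with a tie lock can be sketched as follows. The OrderedTransfer and Account classes are hypothetical stand-ins for whatever objects are being locked.

```java
// Hypothetical money transfer that always locks two accounts in a fixed order.
final class OrderedTransfer {
    private static final Object tieLock = new Object();

    static final class Account {
        private long balance; // guarded by the Account's intrinsic lock

        Account(long balance) {
            this.balance = balance;
        }

        synchronized long balance() {
            return balance;
        }
    }

    static void transfer(Account from, Account to, long amount) {
        int fromHash = System.identityHashCode(from);
        int toHash = System.identityHashCode(to);
        if (fromHash < toHash) {
            synchronized (from) {
                synchronized (to) {
                    move(from, to, amount);
                }
            }
        } else if (fromHash > toHash) {
            synchronized (to) {
                synchronized (from) {
                    move(from, to, amount);
                }
            }
        } else {
            // Rare hash tie: the global tie lock makes sure only one thread at
            // a time takes the two locks in an arbitrary order.
            synchronized (tieLock) {
                synchronized (from) {
                    synchronized (to) {
                        move(from, to, amount);
                    }
                }
            }
        }
    }

    // Caller must hold the locks on both accounts.
    private static void move(Account from, Account to, long amount) {
        if (from.balance < amount) {
            throw new IllegalArgumentException("insufficient funds");
        }
        from.balance -= amount;
        to.balance += amount;
    }
}
```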

If the lock objects (say bank Accounts) have a unique key, lock acquisition can be ordered by the key, and there is no need for the tie-lock.
Multiple locks may not always be acquired in the same method. Hence, it is not easy to detect lock-ordering deadlocks. Watch out for invocation of alien methods while holding a lock.

Calling a method with no locks held is called an open call. Liveness of a program can be more easily analyzed if all calls are open.
Use synchronized blocks within methods to guard shared state, instead of making the entire method synchronized.

In cases where loss of atomicity of the synchronized method is unacceptable, we need to construct application level protocols. For example, when shutting down a service, lock for just long enough to mark the service as shutting down, and wait for existing tasks to complete without holding the lock. Since the service is marked as shutting down, no new tasks will start.

In addition to deadlocking waiting for locks, threads can also deadlock waiting for resources like database connections.
If you must acquire multiple locks, lock ordering must be part of your design. Minimize number of locks needed. Document ordering policy.
Timed locks offered by the Lock interface are another option for detecting and recovering from deadlocks. The timed tryLock() method returns false if the timeout expires. It can fail even if no deadlock occurred, simply because the lock was held for a long time for some other reason.
JVM prints out deadlock information in thread dumps. To trigger a thread dump, send SIGQUIT (kill -3) to the JVM. Explicit Lock objects are not clearly shown in a thread dump.
Starvation - a thread is perpetually denied access to needed resources.

CPU cycle starvation can be caused by inappropriate use of thread priorities, or by executing infinite loops with locks held.
Avoid setting thread priorities as they are platform-dependent and can cause liveness issues. Set lower priorities only for truly background tasks, that can improve the responsiveness of foreground tasks.

Livelock - thread is not blocked, but cannot make progress because it keeps retrying an operation that will always fail. For example, a code bug is triggered when processing a particular input, and that input is re-queued for processing by over-eager error handling code. An unrecoverable error is mistakenly being treated as a recoverable one. A solution for some forms of livelock is to introduce randomness into the retry.

Saturday, September 22, 2012

Java Concurrency in Practice - Summary - Part 6

This is part 6 of my notes from reading Java Concurrency in Practice.

Chapter 8 - Applying Thread Pools

In the Executor framework, there is an implicit coupling between tasks and execution policies. Not all tasks are compatible with all execution policies.
If a task depends on the results of other tasks, then the execution policy must be carefully managed to avoid liveness problems. Deadlocks can happen if the thread pool is bounded, i.e. thread starvation deadlock.

Such a task will always deadlock if it runs on Executors.newSingleThreadExecutor() and waits for the result of another task submitted to the same executor.
Other resources like JDBC connections may also be a bottleneck.
Document any pool sizing or configuration constraints.

Tasks that rely on thread confinement for thread-safety will not work well with thread pools.
Responsiveness of time-sensitive tasks may be bad if we use a single thread executor or if we submit several long running tasks to a small thread pool. Use timed resource waits instead of unbounded waits.
Tasks that use ThreadLocal need care with the standard Executor implementations, as Executors may reuse or kill threads. Use ThreadLocal in pool threads only if the value's lifetime is bounded by that of a task. Do not use ThreadLocal to communicate values between tasks.
For compute-intensive tasks, an Ncpu-processor system achieves optimum utilization with a thread pool of Ncpu + 1 threads. For tasks that include I/O or other blocking operations, use a larger thread pool since not all threads will be schedulable at all times.
ThreadPoolExecutor is the base class of executors returned by Executors.newCachedThreadPool, newFixedThreadPool and newScheduledThreadPool. It is highly configurable.
We can specify the type of BlockingQueue that holds tasks awaiting execution.

unbounded LinkedBlockingQueue is the default for newFixedThreadPool and newSingleThreadExecutor.
Another option is to use a bounded LinkedBlockingQueue, ArrayBlockingQueue or PriorityBlockingQueue.
SynchronousQueue - not really a queue. It is a mechanism for managing handoffs between threads. To hand off a task, another thread must already be waiting to accept it. If no thread is waiting and the pool is below its maximum size, a new thread is created; otherwise the task is rejected. Handoff is more efficient as we don't have to place the Runnable in a queue. newCachedThreadPool uses a SynchronousQueue.

newCachedThreadPool is a good default choice for an Executor.
Saturation Policy for a ThreadPoolExecutor can be modified by calling setRejectedExecutionHandler().

abort - causes execute() to throw the unchecked RejectedExecutionException. Caller catches this exception and implements its own overflow handling. This is the default.
discard - silently discard the newly submitted task.
discard-oldest - discards the task that would otherwise be executed next and tries to resubmit the new task.
caller-runs - Tries to slow down the flow of new task submission by pushing some of the work to the caller. It executes the newly submitted task not in a pool thread, but in the thread that calls execute().
There is no predefined saturation policy to make execute() block when the work queue is full. However, this can be achieved using a Semaphore to bound the task injection rate.
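
The Semaphore approach can be sketched as below, modeled on the book's bounded executor. Rethrowing the RejectedExecutionException after releasing the permit is a choice made for this sketch.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

// Blocks the submitting thread once `bound` tasks are queued or running.
final class BoundedExecutor {
    private final Executor exec;
    private final Semaphore semaphore;

    BoundedExecutor(Executor exec, int bound) {
        this.exec = exec;
        this.semaphore = new Semaphore(bound);
    }

    void submitTask(final Runnable command) throws InterruptedException {
        semaphore.acquire(); // blocks while the bound is reached
        try {
            exec.execute(() -> {
                try {
                    command.run();
                } finally {
                    semaphore.release(); // free the permit when the task ends
                }
            });
        } catch (RejectedExecutionException e) {
            semaphore.release(); // the task never started, so give the permit back
            throw e;
        }
    }
}
```

The bound is typically set to the pool size plus the number of queued tasks you want to allow, and the underlying pool can then use an unbounded queue.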

Thread Factories - whenever a thread pool needs to create a thread, it calls ThreadFactory.newThread(). The default thread factory creates a new non-daemon thread with no special configuration. Use a custom thread factory to specify an UncaughtExceptionHandler for pool threads, to create instances of a custom Thread class that does debug logging, or to give pool threads more meaningful names.
Most ThreadPoolExecutor options can be changed after construction via setters. Executors.unconfigurableExecutorService wraps an existing ExecutorService to ensure that its configuration cannot be changed further. newSingleThreadExecutor() returns such a wrapped Executor rather than a raw ThreadPoolExecutor. This is because newSingleThreadExecutor is implemented as a thread pool with one thread, and no one should be able to increase the pool size.
ThreadPoolExecutor was designed for extension.

beforeExecute and afterExecute hooks are called in the thread that executes the task. Used for logging, timing, monitoring, statistics gathering. Use ThreadLocal to share values between beforeExecute and afterExecute.
afterExecute is not called if task completes with an Error (regular exception is okay)
If beforeExecute throws a RuntimeException, the task is not executed and afterExecute() is not called.
terminated hook is called after the thread pool has shutdown - all tasks have finished and all worker threads have shut down. Useful for releasing resources allocated by the Executor, notification, logging, finalize statistics gathering.
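
A pool that times its tasks through these hooks might look like the following sketch, in the spirit of the book's timing example. TimingThreadPool's constructor and the completedTimedTasks() accessor are assumptions for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class TimingThreadPool extends ThreadPoolExecutor {
    // beforeExecute and afterExecute run in the same worker thread as the
    // task, so a ThreadLocal can carry the start time between them.
    private final ThreadLocal<Long> startTime = new ThreadLocal<>();
    private final AtomicLong numTasks = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();

    TimingThreadPool(int threads) {
        super(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        startTime.set(System.nanoTime());
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        try {
            totalNanos.addAndGet(System.nanoTime() - startTime.get());
            numTasks.incrementAndGet();
        } finally {
            super.afterExecute(r, t);
        }
    }

    @Override
    protected void terminated() {
        try {
            System.out.println("Timed " + numTasks.get() + " tasks, total "
                    + totalNanos.get() + " ns");
        } finally {
            super.terminated();
        }
    }

    long completedTimedTasks() {
        return numTasks.get();
    }
}
```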

Parallelizing recursive algorithms

Sequential loops are suitable for parallelization when each iteration is independent of the others, and the work done in each iteration is significant enough to offset the cost of task creation.
Sequential loops within recursive algorithms can be parallelized. Easier if iteration does not need value of recursive iterations it invokes.
In order to wait for all results, create a new Executor, schedule the parallel tasks, call executor.shutdown() and then awaitTermination().
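
A sketch of this pattern for a tree traversal follows. The Node type and compute() are hypothetical placeholders; compute() stands in for work that is expensive enough to justify a task.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ParallelTree {
    static final class Node {
        final int value;
        final List<Node> children;

        Node(int value, Node... children) {
            this.value = value;
            this.children = Arrays.asList(children);
        }
    }

    // Placeholder for a significant, independent per-node computation.
    static int compute(int value) {
        return value * value;
    }

    // The traversal stays sequential; only the per-node work becomes a task.
    static void parallelRecursive(Executor exec, List<Node> nodes,
                                  Collection<Integer> results) {
        for (final Node n : nodes) {
            exec.execute(() -> results.add(compute(n.value)));
            parallelRecursive(exec, n.children, results);
        }
    }

    static Collection<Integer> computeAll(Node root) throws InterruptedException {
        ExecutorService exec = Executors.newCachedThreadPool();
        Collection<Integer> results = new ConcurrentLinkedQueue<>();
        parallelRecursive(exec, Collections.singletonList(root), results);
        exec.shutdown(); // no new tasks; already submitted ones still run
        exec.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
        return results;
    }
}
```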

Friday, September 21, 2012

Java Concurrency in Practice - Summary - Part 5

This is part 5 of my notes from reading Java Concurrency in Practice.

Chapter 7 - Cancellation & Shutdown

Java does not provide any mechanism for safely forcing a thread to stop what it is doing. Instead, we need to rely on interruption where a thread can request another thread to stop. The thread to be stopped may choose to ignore the request or can terminate after optionally performing cleanup operations.
Don't use the deprecated Thread.stop() and suspend() methods.
A task is cancelable if some external code can move it to a completed state before its normal completion. A task that supports cancellation must specify its cancellation policy:

how external code can request cancellation
responsiveness guarantees to cancellation requests
what happens on cancellation (such as cleanup operations)

One cooperative mechanism for terminating a task is to use a cancellation requested flag which the task periodically checks. If the flag is set by some external code, then the task terminates early. Remember to make the cancellation requested flag volatile. Otherwise changes made by the external code may never become visible to the task.

The thread will exit only when it checks the cancellation flag. Hence there is no guarantee on if and when the check will be made.
Can take a very long time to take effect (if at all) if the thread to be cancelled is stuck at a blocking operation.
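
The flag-based approach can be sketched as follows, modeled on the book's prime generator example. Because the loop body never blocks, the flag is checked regularly.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class PrimeGenerator implements Runnable {
    private final List<BigInteger> primes = new ArrayList<>(); // guarded by this
    private volatile boolean cancelled; // volatile so the worker sees the update

    @Override
    public void run() {
        BigInteger p = BigInteger.ONE;
        while (!cancelled) { // checked once per iteration
            p = p.nextProbablePrime();
            synchronized (this) {
                primes.add(p);
            }
        }
    }

    void cancel() {
        cancelled = true;
    }

    synchronized List<BigInteger> get() {
        return new ArrayList<>(primes); // defensive copy
    }
}
```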

Interruption is usually the most sensible way to implement cancellation.
Each thread has a boolean interrupted status, which is set to true when a thread is interrupted.

interrupt() interrupts the target thread
isInterrupted() returns the interrupted status of the target thread
interrupted() clears the interrupted status of the current thread and returns its previous value. This is the only way to clear the interrupted status of a thread. Poor choice of function name.

Blocking library calls try to detect when a thread has been interrupted and return early. They clear the interrupted status and throw an InterruptedException. There is no guarantee about how quickly a blocking method will detect interruption. In practice, it happens fairly quickly. When a thread that is not blocked is interrupted, its interrupted status is set. It is up to the activity being cancelled to poll the interrupted status and respond appropriately.

A task does not need to immediately stop on detecting the interrupted status. It can postpone acting on the interruption till a more opportune moment. This can prevent internal data structures from being corrupted when interrupted in the middle of some critical operations.

There is a distinction between how tasks and threads react to interruption. An interrupt on a worker thread in a thread pool can cancel the current task as well as shut down the worker thread. Hence, guest code that doesn't own a thread must preserve the interrupted status of the thread after acting on the interrupt, so that the owner can appropriately deal with it later.
A thread should be interrupted only by its owner. Because each thread has its own interruption policy, you should not interrupt a thread unless you know what interruption means to that thread.
Responding to interruption

Propagate the InterruptedException, OR
Restore the interrupted status by calling Thread.currentThread().interrupt(). This is the only feasible solution in cases like Runnable.run() which does not allow exceptions to be thrown.
Only code that implements a thread's interruption policy may swallow an interruption request. General purpose task and library code must never swallow interruption requests.
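
For a Runnable, restoring the interrupted status looks roughly like this. QueueDrainer is a hypothetical task that counts the items it takes from a queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class QueueDrainer implements Runnable {
    private final BlockingQueue<String> queue;
    private final AtomicInteger processed = new AtomicInteger();

    QueueDrainer(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.take(); // throws InterruptedException and clears the status
                processed.incrementAndGet();
            }
        } catch (InterruptedException e) {
            // run() cannot rethrow, so restore the status for the thread's owner.
            Thread.currentThread().interrupt();
        }
    }

    int processedCount() {
        return processed.get();
    }
}
```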

Activities that do not support cancellation but still call interruptible blocking methods must call them in a loop, retrying when interruption is detected. The interruption status should be saved locally and restored before returning. Restoring the interrupted status immediately can result in an infinite loop.
Cancellation via Future

Future.cancel(boolean mayInterruptIfRunning) - if mayInterruptIfRunning is true and the task is running in some thread, then that thread is interrupted. If false, cancel() only means "don't run this task if it hasn't started yet".
Standard Executor implementations implement a thread interruption policy that allows tasks to be canceled through interruption. Hence, it is ok to call Future.cancel(true), which interrupts the thread.
You should not interrupt a pool thread directly when attempting to cancel a task because you won't know what task is running at the time the interrupt is delivered. Cancel only through the task's Future.
When Future.get() throws an InterruptedException or TimeoutException and you know that the result is no longer required, cancel the task by calling Future.cancel().
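
Cancelling through the Future after a timeout can be sketched like this, along the lines of the book's timed-run example. Taking the ExecutorService as a parameter and returning a boolean are choices made for this sketch.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedRun {
    // Returns true if the task finished within the timeout, false otherwise.
    static boolean timedRun(ExecutorService exec, Runnable r, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        Future<?> task = exec.submit(r);
        try {
            task.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            return false; // the result is no longer wanted; cancelled below
        } finally {
            // Harmless if the task already completed; otherwise interrupts it.
            task.cancel(true);
        }
    }
}
```

The task itself still has to be responsive to interruption for the cancellation to take effect.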

Dealing with non-interruptible blocking

Synchronous socket I/O in java.io - read/write in InputStream and OutputStream are not responsive to interruption, but closing the underlying socket makes any threads blocked in read/write throw a SocketException.
Synchronous I/O in java.nio - Interrupting a thread waiting on an InterruptibleChannel causes it to throw ClosedByInterruptException and close the channel. Closing an InterruptibleChannel causes threads blocked on channel operations to throw AsynchronousCloseException.
Asynchronous IO with Selector - A thread blocked in Selector.select() returns prematurely if close() or wakeup() is called.
Lock acquisition - A thread waiting for an intrinsic lock cannot be interrupted. Explicit Lock class offers the lockInterruptibly method.
To perform nonstandard cancellation (like closing a socket), override newTaskFor() in ThreadPoolExecutor so that it returns a custom FutureTask. In the book's example, a CancellableTask interface extends Callable and adds a cancel() method plus a newTask() factory; the FutureTask returned by newTask() overrides cancel() to close the socket or perform any other nonstandard cancellation before delegating to super.cancel().

Stopping a thread-based service

A thread pool owns the worker threads, and should be responsible for stopping them. It should provide lifecycle methods that can be used by the application to shut down the pool, which in turn shuts down the worker threads.

ExecutorService provides shutdown() and shutdownNow(). shutdownNow() returns the list of tasks that had not started, so they can be logged or saved for future processing. The returned Runnable objects may not be the same as what was submitted - they may be wrapped.

shutdownNow() provides no way of knowing the state of tasks in progress at shutdown time, unless the tasks themselves do checkpointing. Another option is to override execute() of AbstractExecutorService and pass in a wrapper Runnable that records tasks cancelled at shutdown. There is a race condition that may cause a completed task to be marked as cancelled. So tasks must be idempotent.

Poison Pill - a special object placed on the work queue is another way to convince a producer-consumer service to shut down. Applicable only when the number of consumers and producers is known, and when the queue is unbounded.
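A minimal poison-pill sketch with one producer and one consumer on an unbounded queue is shown below. The sentinel POISON and the class name PoisonPillDemo are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class PoisonPillDemo {
    // Hypothetical sentinel; compared by identity, never by equals().
    static final String POISON = new String("POISON");

    static List<String> produceAndConsume(List<String> items) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(); // unbounded
        List<String> consumed = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    String item = queue.take();
                    if (item == POISON) {
                        return; // everything queued before the pill is done
                    }
                    consumed.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (String item : items) {
            queue.add(item);
        }
        queue.add(POISON); // the producer's last act

        try {
            consumer.join(); // join() makes the consumer's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return consumed;
    }
}
```

With several consumers, the producer would have to enqueue one pill per consumer, which is why the number of producers and consumers must be known.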
Handling abnormal thread termination

Leading cause of premature thread death is RuntimeException. If nothing special is done, the exception bubbles all the way up the stack and the thread is killed after printing a stack trace to the console (which no one may be watching).
If you are writing a worker thread class for a thread pool or executor service, be sure to catch Throwable and then notify the executor service of premature thread death.
Thread API provides an UncaughtExceptionHandler facility - when a thread exits due to an uncaught exception, the JVM reports this to an application provided UncaughtExceptionHandler. Use Thread.setUncaughtExceptionHandler() to set the handler for a specific thread, or the static Thread.setDefaultUncaughtExceptionHandler() to set a default handler for all threads.
In long-running applications, always use uncaught exception handlers for all threads that at least log the exception.
To set an UncaughtExceptionHandler for pool threads, provide a ThreadFactory to the ThreadPoolExecutor. Exceptions thrown from tasks make it to the UncaughtExceptionHandler only for tasks submitted with execute(). For those submitted with submit(), the exception is wrapped in an ExecutionException and rethrown by Future.get().
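The difference between execute() and submit() can be seen in a small sketch like the one below. HandlerDemo and its run() method are hypothetical names; the handler here only records the exception message, where a real one would at least log it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;

class HandlerDemo {
    // Returns { message seen by the handler, message seen via Future.get() }.
    static String[] run() {
        AtomicReference<String> seenByHandler = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);

        // Every pool thread gets the same uncaught exception handler.
        ThreadFactory factory = runnable -> {
            Thread t = new Thread(runnable);
            t.setUncaughtExceptionHandler((thread, ex) -> {
                seenByHandler.set(ex.getMessage());
                handled.countDown();
            });
            return t;
        };

        ExecutorService exec = Executors.newFixedThreadPool(1, factory);
        String seenByGet = null;
        try {
            Runnable failsInExecute = () -> { throw new IllegalStateException("from execute"); };
            exec.execute(failsInExecute); // kills the worker, reaches the handler
            handled.await(5, TimeUnit.SECONDS);

            Runnable failsInSubmit = () -> { throw new IllegalStateException("from submit"); };
            Future<?> future = exec.submit(failsInSubmit); // captured by the Future
            try {
                future.get(5, TimeUnit.SECONDS);
            } catch (ExecutionException e) {
                seenByGet = e.getCause().getMessage();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (TimeoutException e) {
            // leave seenByGet as null
        } finally {
            exec.shutdown();
        }
        return new String[] { seenByHandler.get(), seenByGet };
    }
}
```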

JVM Shutdown

orderly shutdown - when the last nondaemon thread exits, System.exit() is called, or a platform-specific signal such as Ctrl-C is received
abrupt shutdown - Runtime.halt(), or killing the JVM process (for example with SIGKILL)
In an orderly shutdown, the JVM starts all shutdown hooks registered with Runtime.addShutdownHook(). The order of shutdown hook execution is not guaranteed.
After all shutdown hooks have completed, the JVM may choose to run finalizers if runFinalizersOnExit is true.
JVM makes no attempt to stop or interrupt any application threads that are still running.
Shut-down hooks can run concurrently with other application threads. So, they must be thread-safe. Since JVM is shutting down, the application state may be messy. Hence the shut-down hooks must be coded extremely defensively.
Shut-down hooks should not use services that can be shutdown by the application or by other shutdown hooks. One option is to use a single shutdown hook per application that executes various shutdown operations in sequence.

Daemon threads - existence of a Daemon thread does not prevent JVM from shutting down. Internal JVM threads like GC thread are daemon threads. When JVM exits, finally blocks of any existing daemon threads are not run.

Should be used sparingly, for activities that can be safely abandoned at any time without any cleanup.
Do not use daemon threads for any tasks that perform I/O.
Generally used for housekeeping tasks like a background thread to remove expired cache entries.

Finalizers

GC treats objects that have a non-trivial finalize() method specially. finalize() is called after the collector finds the object unreachable, but before its memory is actually reclaimed.
Finalizers can run concurrently with application threads. Hence, they must be thread-safe.
No guarantee about if or when they will run.
High performance cost.
Usually, finally blocks and explicit close statements are sufficient to release resources, instead of using finalize().
Finalizers may be needed for objects that hold resources acquired by native methods.
Avoid finalizers

Wednesday, September 19, 2012

Java Concurrency in Practice - Summary - Part 4

This is part 4 of my notes from reading Java Concurrency in Practice.

Chapter 6 - Task Execution

Individual client requests are a natural task boundary choice for server applications.
Creating a new thread per task is usually NOT a good idea.

Overheads of thread creation and teardown can add up to be quite significant.
Active threads consume system resources like memory even when they are idle, and also increase CPU contention.
Creating too many threads can result in an OutOfMemoryError.

Primary abstraction for task execution in Java is the Executor framework, not Threads.

Executor interface has a single method: void execute(Runnable).

Executors are the easiest way to implement a producer-consumer design.
Decouples task submission from execution. Execution policy is separated from task submission.
Always use Executor instead of new Thread(runnable).start()
Different types of Thread Pools can be created using static factory methods of the Executors class:

newFixedThreadPool - creates new threads up to a maximum specified size. Tries to keep pool size constant by adding new threads if some die due to exceptions.
newCachedThreadPool - No bounds on number of threads. Number of threads increase/decrease based on load.
newSingleThreadExecutor - Used to process tasks sequentially in order imposed by task queue (FIFO, LIFO, priority order). Replaces thread if it dies unexpectedly.
newScheduledThreadPool - Fixed size thread pool that supports delayed and periodic task execution.

Use instead of Timer.

Timer creates only a single thread. If one task takes too long, it affects the timing accuracy of subsequent TimerTasks.
An unchecked exception thrown by a TimerTask terminates the Timer thread. The Timer thread is not resurrected, and the Timer is simply cancelled.
Scheduled thread pools do not have the above two limitations. However, Timers can schedule based on absolute time, while scheduled thread pools only support relative time.

Executor lifecycle has 3 states - running, shutting down, terminated.
ExecutorService interface (that extends Executor) offers methods like shutdown(), shutdownNow(), isShutdown(), isTerminated(), awaitTermination() to control Executor life cycle.

shutdown() - graceful shutdown. Allow all running and previously submitted tasks to complete. No new tasks are accepted.
shutdownNow() - abrupt shutdown. Attempts to cancel running tasks (typically by interrupting them) and does not start any queued tasks.
Tasks submitted to executor after shutdown are passed to a rejection handler, which may silently drop the task or throw a RejectedExecutionException.
awaitTermination() is usually called immediately after calling shutdown().
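A common way to combine these lifecycle methods is sketched below: try a graceful shutdown first, and fall back to shutdownNow() if the pool does not terminate in time. GracefulShutdown and shutdownAndAwait are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

class GracefulShutdown {
    // Returns true if the executor reached the terminated state.
    static boolean shutdownAndAwait(ExecutorService exec, long timeout, TimeUnit unit) {
        exec.shutdown(); // stop accepting new tasks, let queued ones finish
        try {
            if (exec.awaitTermination(timeout, unit)) {
                return true;
            }
            exec.shutdownNow(); // interrupt running tasks, skip queued ones
            return exec.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            exec.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```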

ExecutorService.submit(Callable) returns a Future. A Future represents the lifecycle of a task, and provides methods to monitor/control it.
CompletionService combines the functionality of an Executor and a BlockingQueue. Submit a bunch of Callables to the Executor, and then wait for the results to be available using take() and poll().

An ExecutorCompletionService is a wrapper around an Executor - new ExecutorCompletionService(executor).
Tasks are submitted to the completion service, and not directly to the Executor.
Multiple completion services can share an Executor.
Keep track of the number of tasks submitted in order to determine the number of times to call take().
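The points above can be put together in a small sketch: submit n Callables to a completion service, then call take() exactly n times and consume results in completion order. CompletionDemo and sumOfSquares are hypothetical names, and the pool size of 4 is arbitrary.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CompletionDemo {
    static int sumOfSquares(int n) {
        ExecutorService exec = Executors.newFixedThreadPool(4);
        CompletionService<Integer> completionService = new ExecutorCompletionService<>(exec);
        try {
            for (int i = 1; i <= n; i++) {
                final int x = i;
                completionService.submit(() -> x * x); // submit to the service, not to exec
            }
            int sum = 0;
            for (int i = 0; i < n; i++) {
                // take() blocks until some task has completed, whichever finishes first.
                sum += completionService.take().get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            exec.shutdown();
        }
    }
}
```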

Future.get() supports a version that throws a TimeoutException if the result is not available within the specified time delay.

If a TimeoutException happens, then call cancel on the Future to cancel the task. If the task is written to be cancelable, then it can be terminated to avoid consuming unnecessary resources.

ExecutorService.invokeAll - takes a collection of tasks and returns a collection of Futures. Timed version of invokeAll returns when all tasks have completed, the calling thread is interrupted or if the timeout expires. Use Future.isCancelled() to determine if a particular task completed or was interrupted/cancelled.

Java Concurrency in Practice - Summary - Part 3

This is part 3 of my notes from reading Java Concurrency in Practice.

Chapter 4 - Composing objects

We often create thread-safe objects by composing together other thread-safe objects. In order to design a thread-safe class, we need to identify the object's state variables, establish the invariants that constrain them and the postconditions associated with its methods, and then establish a policy for managing concurrent access to them.

An object's state includes the state of other objects referenced from its state variables. For eg: a LinkedList's state includes all the link node objects.

Synchronization policy - how an object uses immutability, thread-confinement, locking (how and which variables are guarded by which locks) to coordinate access to its state variable so that the invariants or post conditions are not violated.
Encapsulation enables us to determine that a class is thread-safe without having to examine the entire program, because all paths that use the data can be identified and made thread-safe.

Collection wrappers like synchronizedList make the underlying non-thread-safe ArrayList thread-safe by using encapsulation.

Java monitor pattern - object encapsulates all state and guards it with its own intrinsic lock.

Pro: Simplicity.
Con: External code can lock on the object's intrinsic lock and cause liveness problems that may require examining the whole program to debug. This problem is avoided if a separate private object is used as the lock.
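The private-lock variant mentioned in the Con above might look like the sketch below. PrivateLockCounter is a hypothetical class used only for illustration.

```java
class PrivateLockCounter {
    // The lock object is private, so outside code cannot acquire it
    // and interfere with this class's synchronization policy.
    private final Object lock = new Object();
    private long count; // guarded by lock

    long increment() {
        synchronized (lock) {
            return ++count;
        }
    }

    long get() {
        synchronized (lock) {
            return count;
        }
    }
}
```

A client that synchronizes on a PrivateLockCounter instance holds a different monitor from the one guarding count, so it cannot stall increment() or get().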

A composite object made of thread-safe components need not be thread-safe. This is usually the case when there are constraints on the state of the components.
A state variable can be published if it is thread-safe, does not participate in any invariants that constrain its value and has no prohibited state transitions for any of its operations.
Re-use existing Java thread-safe libraries whenever possible. To add functionality to an existing thread-safe class:

The best option is to modify the source code of the class if available.

Pro: details about synchronization policy are confined to one file, and are thus easier to maintain.

If modifying source code is not possible, the next best option is to extend the class, assuming it was designed for extension.

Con: Fragile. If base class changes synchronization policy (say which locks are used), then the extended class will silently fail.

Extension or source modification is not possible for collections wrapped in Collections.synchronized* wrappers, since the underlying wrapped class is unknown. The solution is to use client-side locking - guard client code with the lock specified by the object's synchronization policy. For Vector, the lock is the object itself.
Another option is Composition. In a wrapper object, maintain a private internal copy of the object (say List) whose functionality we wish to extend. Add the new functionality as a synchronized method of the wrapper. Add synchronized methods for existing functionality of the wrapped object that simply delegate to the underlying object.

Pro: Less fragile
Con: Minor overhead due to double locking.
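A stripped-down sketch of the composition approach follows. It adds a put-if-absent operation on top of any List and guards everything with the wrapper's own intrinsic lock. The name ImprovedList and the small set of delegated methods are illustrative; a full version would implement the List interface.

```java
import java.util.List;

class ImprovedList<T> {
    private final List<T> list; // accessed only through this wrapper

    ImprovedList(List<T> list) {
        this.list = list;
    }

    // New functionality: atomic because the check and the add happen
    // under the same (wrapper) lock.
    synchronized boolean putIfAbsent(T x) {
        boolean absent = !list.contains(x);
        if (absent) {
            list.add(x);
        }
        return absent;
    }

    // Existing functionality is delegated under the same lock.
    synchronized boolean contains(Object o) {
        return list.contains(o);
    }

    synchronized int size() {
        return list.size();
    }
}
```

This only holds as long as no other code keeps a reference to the underlying list and modifies it directly.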

Document a class's thread-safety guarantees for users of the class; Document its synchronization policy for the class's maintainers.

Use @GuardedBy annotations to document the locks used to guard different state variables.
Since documentation for commonly used Java libraries is vague, we often have to guess whether a class is thread-safe or not. For example, since a JDBC DataSource represents a pool of reusable database connections to be shared across multiple threads, we can assume that it is thread-safe. However, individual JDBC Connection objects are intended to be used by a single thread, and are most likely not thread-safe.

Chapter 5: Building Blocks

Java offers a wide variety of thread-safe classes that can serve as building blocks for large concurrent programs.
Thread-safe classes can still need additional client-side locking to give compound actions 'reasonable' behavior.
Compound actions, including iteration, are one example where additional synchronization is required. For example, getLast() and deleteLast() helpers on a thread-safe List such as Vector require additional synchronization: if getLast() determines the index of the last element to be L and that element is removed by deleteLast() before getLast() accesses it, an ArrayIndexOutOfBoundsException is thrown. Note that the data in the List is never corrupted; we just get unexpected behavior.

Java Iterators throw an unchecked ConcurrentModificationException if they detect that the underlying collection has changed during iteration. This detection is best-effort only, as the modification counters used to track changes are checked without synchronization.
Iterators can sometimes be hidden - For eg: an iterator is used if a collection object is passed to a log statement which tries to get its string representation.
Unreliable iteration can be solved by client-side locking. When using the synchronized wrappers provided by Java, we can use the wrapper object itself as the lock for composite actions. Con: Decreases scalability.
Another option to avoid concurrent modification exception is to clone the collection as a local copy. The lock on the collection must be held while cloning. Con: Can be very expensive to clone large collections.
To avoid client-side locking during iteration, use the Concurrent collections offered by Java. This improves scalability as multiple threads can now access the collections simultaneously without blocking
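The two locking-based options above can be sketched as follows for a list returned by Collections.synchronizedList, whose documentation asks callers to synchronize on the returned list while iterating. SafeIteration and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SafeIteration {
    // Client-side locking: hold the wrapper's lock for the whole iteration.
    static int sum(List<Integer> syncList) {
        synchronized (syncList) {
            int total = 0;
            for (int value : syncList) {
                total += value;
            }
            return total;
        }
    }

    // Cloning: hold the lock only while copying, then iterate the
    // private snapshot without any lock.
    static List<Integer> snapshot(List<Integer> syncList) {
        synchronized (syncList) {
            return new ArrayList<>(syncList);
        }
    }
}
```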

ConcurrentHashMap

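Hashtable and the Collections.synchronizedMap wrappers use a single lock to synchronize all their operations (a plain HashMap is not synchronized at all).
ConcurrentHashMap uses lock-striping. It supports concurrent non-blocking access by arbitrarily many readers, and by a limited number of concurrent writers.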
Iterators do not throw ConcurrentModificationException.
size() and isEmpty() are approximate. These methods are not useful in concurrent environments anyway.
Cannot lock the entire map for synchronized access, needed in rare cases where multiple map entries need to be added atomically.
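Cannot use client-side locking to add new atomic operations, since there is no single lock to hold. Common compound operations such as put-if-absent, remove-if-equal and replace-if-equal are instead provided as atomic methods of the ConcurrentMap interface, which ConcurrentHashMap implements. If you find yourself adding such operations to a synchronized Map, you most likely need a ConcurrentMap instead.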
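As a small illustration of building a compound action from ConcurrentMap's atomic methods rather than from locking, the sketch below increments a per-key counter with putIfAbsent() and the three-argument replace(). AtomicMapOps and incrementCounter are hypothetical names.

```java
import java.util.concurrent.ConcurrentMap;

class AtomicMapOps {
    // Atomically increments the counter for key and returns the new value.
    static int incrementCounter(ConcurrentMap<String, Integer> counts, String key) {
        while (true) {
            Integer old = counts.putIfAbsent(key, 1);
            if (old == null) {
                return 1; // we inserted the first value
            }
            // replace() succeeds only if the value is still 'old';
            // otherwise another thread got in first and we retry.
            if (counts.replace(key, old, old + 1)) {
                return old + 1;
            }
        }
    }
}
```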

CopyOnWriteArrayList

Thread-safety derived from the immutability of underlying list. Mutability is provided by creating and republishing a new copy of the list on every change. Iterators point to the list at the time the iterator was created.
Copying large lists can be expensive. Hence mainly useful when iteration is the more common operation rather than addition - for eg: when using a list of event listeners.

Some more concurrent collections - CopyOnWriteArraySet, ConcurrentLinkedQueue, ConcurrentSkipListMap - concurrent replacement for synchronized SortedMap, ConcurrentSkipListSet - concurrent replacement for synchronized SortedSets
Java offers multiple Queue implementations (esp. BlockingQueue) that can be very useful to implement producer-consumer designs. Queues can be blocking or non-blocking. For non-blocking queues, retrieval operations return null if the queue is empty.
BlockingQueue

blocking methods: take (blocks while the queue is empty), put (blocks while the queue is full, so only if the queue is bounded)
non-blocking methods: offer, poll
Use bounded blocking queues for reliable resource management. Otherwise, if consumers are slow, producers can keep adding to the queue till the JVM runs out of heap space. Do this early in design; hard to retrofit later.
We can use offer() to check if the item will be accepted by the queue. If the queue is full, the item can be discarded in application specific ways (for eg: simply drop it or save it to local disk for later usage)
BlockingQueue implementations - contain sufficient internal synchronization to safely publish objects from a producer thread to a consumer thread

LinkedBlockingQueue
ArrayBlockingQueue
PriorityBlockingQueue
SynchronousQueue - Not really a queue as it does not maintain storage space for elements. Just maintains a list of queued threads waiting to enqueue or dequeue an element. Directly transfers item from producer to consumer - more efficient. Direct handoff also informs producer that consumer has taken responsibility for the item. take() and put() will block if no thread is waiting to participate in the handoff. Mainly used when there are always enough consumer threads.
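The offer() idea from the BlockingQueue notes above can be sketched as below: a producer tries to enqueue without blocking and counts whatever the bounded queue rejects. BoundedSubmit and offerAll are hypothetical names, and simply counting dropped items stands in for an application-specific overflow policy.

```java
import java.util.concurrent.BlockingQueue;

class BoundedSubmit {
    // Offers the integers 0..n-1 and returns how many were rejected.
    static int offerAll(BlockingQueue<Integer> queue, int n) {
        int dropped = 0;
        for (int i = 0; i < n; i++) {
            // offer() returns false immediately when the queue is full,
            // where put() would block instead.
            if (!queue.offer(i)) {
                dropped++;
            }
        }
        return dropped;
    }
}
```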

Deque and BlockingDeque - allows efficient insertion and removal from head and tail.

Enables Work Stealing designs - In producer-consumer design, there is a single queue that is shared across all threads. This causes lots of contention. In work stealing, each consumer has its own deque, from the head of which it consumes items. If its deque is empty, it can steal objects from the tail of some other consumer's deque. Most of the time, a consumer takes objects from the head of its own deque, thereby avoiding contention. Even when it steals, there is little contention as it steals from the tail rather than the head. Work stealing is well-suited for applications where producers are also consumers.

Interruption

Thread.interrupt() interrupts a thread.
Interruption is a cooperative mechanism, i.e., One thread cannot force another to stop what it is doing. Thread.interrupt() merely requests a thread to stop at a convenient stopping point.
If a method is marked to throw an InterruptedException, it means that the method is blocking and that it will attempt to stop blocking early if interrupted.
No language specification about how to deal with interrupts. Most natural option is to cancel whatever the thread is currently doing. Blocking methods that are interruptible make it easy to cancel long-running tasks when necessary.
One common option to handle the InterruptedException is to propagate it to your caller. This can be done by not catching it at all, or by catching and rethrowing it after performing some local cleanup.
In cases where you cannot throw an InterruptedException (for eg: inside Runnable.run()), you must catch the InterruptedException and restore the interrupted status by calling Thread.currentThread().interrupt(). This allows code higher up in the call stack to see that the thread was interrupted.
Never catch an InterruptedException and ignore it - except when extending Thread (and therefore controlling all code higher up in the call stack).

Synchronizers - an object that coordinates the control flow of threads based on its state. BlockingQueue is one example of a synchronizer. Latch, FutureTask, Semaphores and Barriers are other examples of synchronizers.

Latch - a synchronizer that can block threads until it reaches its terminal state. A latch acts as a gate. Once open, it remains open forever.

Eg usage: Ensure that a computation cannot proceed until the resources needed by it have been initialized, Wait until all players in a multi-player game have finished their moves.
CountDownLatch - initialized with a positive integer. Threads call await(), which blocks till the counter becomes 0. Other threads call countDown(), which decreases the count.
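A typical use of two latches is a starting gate and an ending gate around a group of worker threads, as in the rough sketch below. LatchTimer and timeTasks are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

class LatchTimer {
    // Runs task once in each of nThreads threads, released together,
    // and returns the elapsed time in nanoseconds.
    static long timeTasks(int nThreads, Runnable task) {
        CountDownLatch startGate = new CountDownLatch(1);
        CountDownLatch endGate = new CountDownLatch(nThreads);

        for (int i = 0; i < nThreads; i++) {
            new Thread(() -> {
                try {
                    startGate.await(); // wait until every worker may start
                    task.run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    endGate.countDown(); // always report completion
                }
            }).start();
        }

        long start = System.nanoTime();
        startGate.countDown(); // open the gate for all workers at once
        try {
            endGate.await(); // wait for the last worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return System.nanoTime() - start;
    }
}
```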

FutureTask - mainly used to represent long running or async computation (for eg: by the Executor framework)

The computation is encapsulated in a Callable (result-bearing equivalent of Runnable).
FutureTask.get() returns the result immediately if the computation is done; otherwise it blocks until the task completes, fails or is cancelled. Result obtained from get() is safely published.
Once complete, it stays in completed state forever.
Future.get() throws an ExecutionException if Callable.call() throws an exception; the original exception is available through getCause(). Check for all known checked exceptions when handling it. Other exceptions are generally rethrown.

Semaphores

Counting semaphores are used to control the number of threads that can simultaneously access a resource. A thread wishing to use the resource must acquire() a virtual permit and release() it when done. acquire() blocks if no permits are available.
A binary semaphore is a mutex with non-reentrant locking, unlike the intrinsic java object lock which is reentrant.
Can be used to turn any collection into a bounded blocking collection.
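Turning a collection into a bounded one with a semaphore could look like the sketch below. The class name BoundedHashSet is illustrative, and tryAdd() is an extra non-blocking variant added here for convenience.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Semaphore;

class BoundedHashSet<T> {
    private final Set<T> set = Collections.synchronizedSet(new HashSet<T>());
    private final Semaphore permits; // one permit per free slot

    BoundedHashSet(int bound) {
        permits = new Semaphore(bound);
    }

    // Blocks while the set is full.
    boolean add(T o) throws InterruptedException {
        permits.acquire();
        return addOrRelease(o);
    }

    // Non-blocking variant: returns false if the set is full.
    boolean tryAdd(T o) {
        return permits.tryAcquire() && addOrRelease(o);
    }

    // Called with a permit held; gives it back if nothing was added.
    private boolean addOrRelease(T o) {
        boolean added = false;
        try {
            added = set.add(o);
            return added;
        } finally {
            if (!added) {
                permits.release();
            }
        }
    }

    boolean remove(Object o) {
        boolean removed = set.remove(o);
        if (removed) {
            permits.release(); // free the slot
        }
        return removed;
    }
}
```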

Barriers

Similar to latches, but all threads must come together at the barrier point at the same time in order to proceed.
CyclicBarrier allows a fixed number of threads to rendezvous repeatedly. Useful in parallel iterative algorithms.
If a thread blocked on await() is interrupted or an await() times out, the barrier is considered broken, and all other threads waiting in await() get a BrokenBarrierException.
When barrier is successfully passed, await() returns with a unique arrival index per thread, which can be used for leader election amongst the threads.
Also supports barrier action - a Runnable to be executed when barrier is successfully passed but before threads are released.
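The sketch below exercises these barrier features: a fixed set of threads meets repeatedly at a CyclicBarrier, a barrier action runs once per phase, and the thread that receives arrival index 0 acts as the leader for that phase. BarrierDemo and runPhases are hypothetical names.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class BarrierDemo {
    // Returns { barrier actions run, times some thread saw arrival index 0 }.
    static int[] runPhases(int nThreads, int phases) {
        AtomicInteger barrierActions = new AtomicInteger();
        AtomicInteger leaders = new AtomicInteger();
        // The barrier action runs once per phase, before the threads are released.
        CyclicBarrier barrier = new CyclicBarrier(nThreads, barrierActions::incrementAndGet);

        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int p = 0; p < phases; p++) {
                        int arrivalIndex = barrier.await();
                        if (arrivalIndex == 0) { // last thread to arrive
                            leaders.incrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // another party was interrupted or timed out; give up
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { barrierActions.get(), leaders.get() };
    }
}
```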

For building an efficient scalable result cache, use a ConcurrentHashMap whose values are Futures (for example, ConcurrentHashMap<A, Future<V>>) together with putIfAbsent().
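A minimal sketch of such a cache is shown below, in the spirit of the book's memoizer example but not copied from it. The names Memoizer and compute are assumptions, and java.util.function.Function (Java 8) stands in for whatever computation interface the application uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.function.Function;

class Memoizer<A, V> {
    private final ConcurrentMap<A, Future<V>> cache = new ConcurrentHashMap<>();
    private final Function<A, V> compute;

    Memoizer(Function<A, V> compute) {
        this.compute = compute;
    }

    V get(A arg) {
        while (true) {
            Future<V> future = cache.get(arg);
            if (future == null) {
                Callable<V> eval = () -> compute.apply(arg);
                FutureTask<V> task = new FutureTask<>(eval);
                // Only one thread wins the race to install the Future;
                // the others wait on the winner's result.
                future = cache.putIfAbsent(arg, task);
                if (future == null) {
                    future = task;
                    task.run(); // compute in the calling thread
                }
            }
            try {
                return future.get();
            } catch (CancellationException e) {
                cache.remove(arg, future); // do not cache a cancelled computation
            } catch (ExecutionException e) {
                cache.remove(arg, future); // allow a later retry
                throw new IllegalStateException(e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

Caching the Future rather than the value means a second caller asking for the same argument waits for the in-progress computation instead of starting a duplicate one.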