Thread Pooling in Java Applications
By: Vishal Goenka
Nov. 1, 2002 12:00 AM
There are several textbooks and Internet articles that dwell on the performance and scalability benefits of using a thread pool versus creating new threads in a multithreaded Java application.
While some of them overstate the benefits, most fail to emphasize some of the caveats of Java thread pooling. Due to space constraints, this article provides only a brief summary of the benefits and emphasizes the drawbacks. A list of references that covers the benefits in more detail is provided at the end.
What Is Thread Pooling?
Why Pool Threads?
To be clear, I'm not saying you don't need to manage the number of active threads in a system. After all, the benefits of multithreading do have diminishing returns once the number of threads contending for the available CPUs increases. If a server can process only about 1,000 simultaneous requests, it doesn't help to dispatch each incoming request as it's made. Often the requests must be queued and processed at a controlled rate to maintain the number of active requests below the server threshold. A common mistake, however, is to assume that dispatching queued requests automatically calls for the reuse of threads from a thread pool. Dispatching a request to a new thread and letting the thread die once the request is serviced achieves the same effect on managing the number of active threads in the system.
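As a rough illustration of that last point, the sketch below caps the number of active requests with a counting semaphore, yet every request still runs on a brand-new thread that dies when the request completes. The class BoundedThreadPerRequest and its methods are hypothetical names chosen for this example, and the use of java.util.concurrent.Semaphore and lambdas assumes a Java 8 or later runtime, neither of which existed when this article was written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical dispatcher: bounds concurrency without reusing threads.
class BoundedThreadPerRequest {
    private final Semaphore permits;
    private final AtomicInteger active = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedThreadPerRequest(int maxActive) {
        this.permits = new Semaphore(maxActive);
    }

    // Blocks the caller until a slot is free, then runs the request on a new thread.
    Thread dispatch(Runnable request) throws InterruptedException {
        permits.acquire();
        Thread t = new Thread(() -> {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max); // record the high-water mark
            try {
                request.run();
            } finally {
                active.decrementAndGet();
                permits.release(); // free the slot; the thread then dies
            }
        });
        try {
            t.start();
        } catch (RuntimeException | Error e) {
            permits.release(); // do not leak the slot if the thread cannot start
            throw e;
        }
        return t;
    }

    int peakActive() {
        return peak.get();
    }

    // Dispatches `requests` short tasks and reports the highest concurrency observed.
    static int demo(int maxActive, int requests) throws InterruptedException {
        BoundedThreadPerRequest dispatcher = new BoundedThreadPerRequest(maxActive);
        List<Thread> started = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            started.add(dispatcher.dispatch(() -> {
                try {
                    Thread.sleep(20); // stand-in for real request work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
        }
        for (Thread t : started) {
            t.join();
        }
        return dispatcher.peakActive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak active threads: " + demo(3, 8));
    }
}
```

Because each thread is discarded after one request, no per-thread state can leak from one request context into the next; the cost is one thread creation per request.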
Thread creation also has an overhead that can be higher in many cases than the overhead of managing a thread pool. While the argument still applies, the relative performance impact has changed significantly over the years. The newer JVM implementations are optimized for creating threads; most use a combination of user-level threads (known as green threads) and system-level threads (or OS threads) to make creating threads much less expensive than in earlier implementations.
The Dichotomy of Pooling Threads
I distinguish thread pooling in general from thread pooling in Java simply because many of the arguments that apply to thread pooling in Java do not apply to other programming environments. Perhaps a common source of misconception about the benefits of thread pooling in Java stems from our experiences in other environments where the cost-benefit equation tilts strongly in favor of thread pooling. In the following discussion, "thread pooling" implies "thread pooling in Java," unless stated otherwise.
Thread Pooling Breaks Usage of Thread-Local Variables
Think of a ThreadLocal variable as a hashmap that stores one value per thread by using the thread as a key into the hashmap; however, these values are "associated" with the thread in a stronger and more intrusive way. Each thread maintains a reference to a private version of a hashmap (implemented as a package accessible class, ThreadLocalMap) that contains all the thread-local variables associated with that thread. Each thread uses the declared ThreadLocal variable as the key into the hashmap to store one value per ThreadLocal variable. When a thread dies and is garbage collected, all thread-local values referenced by it are subject to garbage collection (unless they're referenced elsewhere).
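A minimal sketch of the per-thread association described above: two threads read the same ThreadLocal variable, but only the thread that set a value sees it. The class name ThreadLocalDemo and the USER_ID variable are hypothetical, and the generic and lambda syntax assumes a Java 8 or later runtime.

```java
// Hypothetical demo: one ThreadLocal, two threads, two independent values.
class ThreadLocalDemo {
    // Each thread that touches USER_ID gets its own slot for the value.
    private static final ThreadLocal<String> USER_ID = new ThreadLocal<>();

    // Returns what thread A and thread B each observed in USER_ID.
    static String[] run() throws InterruptedException {
        String[] seen = new String[2];
        Thread a = new Thread(() -> {
            USER_ID.set("alice");
            seen[0] = USER_ID.get(); // thread A sees its own value
        });
        Thread b = new Thread(() -> {
            seen[1] = USER_ID.get(); // thread B never set a value, so it sees null
        });
        a.start();
        a.join(); // join() makes the write to seen[0] visible to the caller
        b.start();
        b.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = run();
        System.out.println("thread A saw " + seen[0] + ", thread B saw " + seen[1]);
    }
}
```

Both threads die at the end of run(), so the value "alice" becomes eligible for garbage collection along with thread A.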
InheritableThreadLocal extends ThreadLocal to allow thread-local variables associated with a parent thread to be inherited by any new child thread created by the parent thread. This class is designed to replace the ThreadLocal in those cases where a per-thread attribute being maintained by the variable, such as UserId, TransactionId, etc., must be automatically transmitted to any child threads that are created. To achieve the inheritance, the Thread class maintains a separate private hashmap (ThreadLocalMap) for inheritable thread-local variables. The Thread constructor ensures that the inheritable thread-local variables of the executing thread (the parent thread) are copied onto itself (the child thread).
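The inheritance behavior can be sketched as follows: a parent thread sets one InheritableThreadLocal and one plain ThreadLocal, then creates a child thread that reads both. The class name InheritableDemo and the variables TXN_ID and PLAIN are hypothetical, and the syntax assumes a Java 8 or later runtime.

```java
// Hypothetical demo: a child thread inherits only the inheritable variable.
class InheritableDemo {
    private static final InheritableThreadLocal<String> TXN_ID = new InheritableThreadLocal<>();
    private static final ThreadLocal<String> PLAIN = new ThreadLocal<>();

    // Returns { value of TXN_ID seen by the child, value of PLAIN seen by the child }.
    static String[] run() throws InterruptedException {
        String[] seenByChild = new String[2];
        Thread parent = new Thread(() -> {
            TXN_ID.set("txn-42");
            PLAIN.set("parent-only");
            // The child is constructed by the parent, so it copies TXN_ID at construction.
            Thread child = new Thread(() -> {
                seenByChild[0] = TXN_ID.get();
                seenByChild[1] = PLAIN.get();
            });
            child.start();
            try {
                child.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        parent.start();
        parent.join();
        return seenByChild;
    }

    public static void main(String[] args) throws InterruptedException {
        String[] seen = run();
        System.out.println("child saw TXN_ID=" + seen[0] + ", PLAIN=" + seen[1]);
    }
}
```

The copy happens when the child Thread object is constructed, which is why a long-lived pooled thread keeps whatever inheritable values were in effect when the pool created it, not those of the request it later serves.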
Thus, each Thread object has explicit references to all the thread-local variables, which in turn are only accessible via the ThreadLocal or InheritableThreadLocal object. Like normal variables, private ThreadLocal or InheritableThreadLocal variables are only accessible to the declaring class and the threads associated with them. While it's possible to expose a method in the Thread class to "purge" all (inheritable) thread-local variables associated with the thread, it would require additional security checks to ensure that only privileged code can do so, the privilege being ascertained using the Java permission mechanism. Given the lack of such a construct even in the latest versions of the J2SE/J2EE APIs, there's no way for a thread-pool manager to purge or reset all the thread-local variables associated with a given thread when reusing the thread in a different request context without the explicit cooperation of all code that uses any thread-local variables.
Unless the declaring code "removes" a value assignment by explicitly setting the value to null, thread-local variables remain assigned and hence "associated" with the thread. As a result, any code that uses thread locals risks using stale/incorrect values of the variables that were created in an earlier request context when running in a pooled thread. Given that ThreadLocal and InheritableThreadLocal are standard J2SE/J2EE classes, they're quite likely being used in various pieces of library code, none of which is safe to be executed by a pooled thread without an explicit understanding of the usage details.
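The stale-value risk can be reproduced with a small sketch. It assumes a single-threaded pool from java.util.concurrent.Executors (a library added in Java 5, after this article was written) so that both requests are guaranteed to run on the same reused thread. The class StaleThreadLocalDemo and the USER_ID variable are hypothetical. The cleanup path uses ThreadLocal.remove(), also a Java 5 addition; on older releases the equivalent would be the set(null) approach mentioned above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo: a second request observes the first request's thread-local value.
class StaleThreadLocalDemo {
    private static final ThreadLocal<String> USER_ID = new ThreadLocal<>();

    // Runs two "requests" on one pooled thread and returns what the second one reads.
    static String secondRequestSees(boolean cleanUp) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Runnable firstRequest = () -> {
                USER_ID.set("alice");
                try {
                    // ... request 1 does its work here ...
                } finally {
                    if (cleanUp) {
                        USER_ID.remove(); // explicit cooperation by the declaring code
                    }
                }
            };
            // Request 2 never sets USER_ID; it only reads it.
            Callable<String> secondRequest = () -> USER_ID.get();

            pool.submit(firstRequest).get(); // wait for request 1 to finish
            return pool.submit(secondRequest).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without cleanup: " + secondRequestSees(false));
        System.out.println("with cleanup:    " + secondRequestSees(true));
    }
}
```

Without the cleanup, the second request reads "alice"; with it, the second request reads null. The pool itself has no way to perform this cleanup on behalf of the code that declared the variable.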
The only way to get around this is to avoid using a pooled thread to execute code whose implementation details you don't know and control. An application that uses a thread pool to dispatch requests made in different contexts is likely to hit "inconsistent" logic errors whenever servicing a request involves code that uses a thread-local variable.
Lack of a Standard Thread-Pooling Library
When using a new thread per request, the JVM's scheduler ensures that every runnable thread gets a fair share of the CPU, even if the share happens to be really small, as in the case where there are simply too many threads for the given execution environment. Using a size-bounded thread pool can cause queued requests to be starved. If one of the queued requests happens to be a producer (in a typical producer-consumer paradigm), it can lead to a deadlock if all the dispatched requests happen to be consumers waiting for the producer. Such application dependencies may necessitate knowledge of the application logic in the thread-pool dispatching decision, requiring some kind of priority dispatching construct. Priority-based dispatching opens up another can of worms, exemplified by the Mars Pathfinder "reset" problem caused by overlooking the classic priority-inversion problem.
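The producer-consumer hazard can be sketched with a deliberately tiny pool. The example assumes java.util.concurrent (added in Java 5, after this article was written); the class StarvedProducerDemo is hypothetical, and the 500 ms timeout is a stand-in for what would be an indefinite wait in a real deadlock, used here only so the demo terminates.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: a queued producer starves behind a consumer that waits for it.
class StarvedProducerDemo {
    // Returns the item the consumer received, or null if it gave up waiting.
    static String consumerGets(int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        BlockingQueue<String> channel = new LinkedBlockingQueue<>();
        try {
            // The consumer occupies a pool thread while waiting for the producer.
            Callable<String> consumer = () -> channel.poll(500, TimeUnit.MILLISECONDS);
            Runnable producer = () -> channel.add("item");

            Future<String> result = pool.submit(consumer);
            pool.submit(producer); // with poolSize 1 this waits in the pool's queue
            return result.get();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("pool of 1: consumer got " + consumerGets(1));
        System.out.println("pool of 2: consumer got " + consumerGets(2));
    }
}
```

With a pool of one thread the consumer times out and gets null, because the producer cannot start until the consumer releases the only thread. With two threads the producer runs concurrently and the consumer receives "item". A thread-per-request design would not exhibit the first outcome, since the producer would always get its own thread.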
Addressing all the design issues that a robust thread-pool library must handle is a nontrivial task. This happens to be one area of the system that can have systemic effects and bring your application to a grinding halt, unless tested for all potential race conditions and deadlocks, especially since the memory model in multiprocessor systems is often nonintuitive. This is no reflection on your abilities as a programmer, but rather a statement about the inherent complexity of the problem and the effort involved in getting a robust implementation.
Performance Benefit Myths of Thread Pooling
To Pool or Not to Pool
How critical is the performance of that portion of the application and would you make the same decision if it turned out that you needed over a month to write a robust thread-pool library? Is it acceptable to risk an application deadlock due to a less-than-robust thread pool implemented in a few days? Do you have the time to validate and perhaps quantify the savings achieved when using a pooled thread versus creating a new thread? Do you have the time to validate correct behavior under heavy load on a multiprocessor machine, particularly when the boundary conditions on pool size are exercised? If you're not sure about the implementation details of some code, such as usage of thread-local variables, will the pooled thread run it?
In my own experience, a quick and dirty thread-pool implementation for the job at hand often comes back to bite you. A small perceived performance gain is probably not worth the risks introduced by a less-than-robust thread-pool implementation. Not that these concerns don't apply to other design decisions, but thread pooling falls in the category in which the risks are much higher and the benefits are often much lower than perceived.