How JVMs Improve Application Performance
Enable optimizations
By: Tom Deneau
Mar. 16, 2009 06:45 PM
This article looks at a Java Virtual Machine (JVM) feature called Escape Analysis in some detail and how the JVM can use it to improve an application's performance. As you'll see, understanding what the JVM can do with escape analysis can help explain some otherwise non-intuitive performance results.

A Performance Puzzle

public class Generator {

Assume you use this generator class in a hot method of your application as follows:

public class HotClass1 {

As part of a code review, you decide that you don't really need to create a new generator object every time you call hotMethod(). Instead, you can just have a single generator as part of the HotClass. This should be more efficient because you don't need to allocate and initialize a new mygen object on every call. So you move the declaration of mygen out to class level:

public class HotClass2 {
private Generator mygen = new Generator(System.nanoTime());

To your surprise, you find that your application now runs more slowly. What's going on?

Looking at Profiles

When a hot method calls a target method and no timer samples are seen in the target method, the usual explanation is that the just-in-time (JIT) compiler has inlined the target method. This optimization expands the target method in the calling method (as if it were written inline) and is particularly useful in hot methods. Inlining eliminates the overhead of a call and return instruction, and often eliminates some register saving and shuffling. In addition, all the normal optimizations like constant folding that can be done in a method can now be applied across the inlined method boundary just because the target is inlined.

One downside of inlining for profiling purposes is that the JVM often hides the information about the inlined method from the profiling tool. The calling method gets more samples, but those samples all get attributed to the source line where the target method is invoked, rather than to the individual lines within the target method.
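The listings for the puzzle are cut short in this copy, so the following is only a hypothetical reconstruction, not the author's code. The class names, the mygen variable, getNext(), hotMethod(), and the System.nanoTime() seed come from the text; the body of getNext(), the long[] parameter, and the PuzzleDemo timing harness are assumptions added for illustration.

```java
// Hypothetical reconstruction of the puzzle. Only the names Generator,
// HotClass1, HotClass2, mygen, getNext(), and hotMethod() come from the text.

class Generator {
    private long state;

    Generator(long seed) {
        this.state = seed;
    }

    // Synchronized in both versions of the puzzle.
    // Assumed body: one step of a simple linear congruential generator.
    synchronized long getNext() {
        state = state * 6364136223846793005L + 1442695040888963407L;
        return state;
    }
}

class HotClass1 {
    // mygen is a local: no reference to it leaves hotMethod().
    void hotMethod(long[] a) {
        Generator mygen = new Generator(System.nanoTime());
        for (int i = 0; i < a.length; i++) {
            a[i] = mygen.getNext();
        }
    }
}

class HotClass2 {
    // mygen is a field: reachable from any thread that holds this object.
    private Generator mygen = new Generator(System.nanoTime());

    void hotMethod(long[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = mygen.getNext();
        }
    }
}

class PuzzleDemo {
    // Runs the task reps times and returns the elapsed nanoseconds.
    static long time(Runnable task, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            task.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] a = new long[1024];
        HotClass1 h1 = new HotClass1();
        HotClass2 h2 = new HotClass2();
        // Warm up so the JIT has a chance to compile both hot methods.
        time(() -> h1.hotMethod(a), 20_000);
        time(() -> h2.hotMethod(a), 20_000);
        System.out.println("HotClass1 ns: " + time(() -> h1.hotMethod(a), 20_000));
        System.out.println("HotClass2 ns: " + time(() -> h2.hotMethod(a), 20_000));
    }
}
```

A hand-rolled timing loop like this is only a rough indicator. Whether any gap appears, and in which direction, depends on the JVM and its settings.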
AMD is working with JVM vendors to help make this inlining information available to profiling tools. Note that some JVMs have ways to disable inlining but, in that case, you're not profiling the actual code that will be generated.

Let's use AMD CodeAnalyst to look at the generated code. The source line:

a[i] = mygen.getNext();

seems to have generated some extra code in the HotClass2 version. In particular, a lot of timer samples show up right after an instruction that looks something like:

lock cmpxchg [esi+4], ecx

and this instruction doesn't exist in the HotClass1 version.

Synchronization Locks in Our Performance Puzzle

Let's try to understand why the JVM needs to generate synchronization locks for HotClass2, but not for HotClass1. After all, the getNext() method itself didn't change. It is synchronized in both cases.

First, remember that the getNext() method was inlined into hotMethod() in both cases. Once it's inlined, the JIT compiler is free to use optimizations specific to this invocation. In particular, in HotClass1, the JIT compiler can see that mygen's scope is limited to hotMethod() and that a reference to mygen does not "escape," possibly to be accessed by some other thread. Thus, no other thread can possibly use this object, and the semantics of the getNext() method's synchronized keyword are guaranteed without the need to acquire locks. Note that this optimization would not be legal in a generic non-inlined getNext().

In HotClass2, on the other hand, mygen exists at class scope, which makes it accessible by any thread that accesses that same HotClass2 object. Declaring mygen as private makes no difference, since mygen is still accessible to any method in the class, like hotMethod().

It's easy for the JVM to determine that the scope of an object is local, but not always so easy to determine whether the object "escapes."
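As a rough illustration of the difference between a local object that stays local and one that escapes, consider the sketch below. The Counter and EscapeExamples classes and all of their members are hypothetical names invented for this example. Whether a given JVM actually removes the lock in the first case is up to that JVM, and the code behaves the same either way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class standing in for any object with a synchronized method.
class Counter {
    private int n;

    synchronized int increment() {
        return ++n;
    }
}

class EscapeExamples {
    private Counter field;
    private static final List<Counter> REGISTRY = new ArrayList<>();

    // Case 1: the reference never leaves this method. Once increment() is
    // inlined here, a JIT with escape analysis may skip the lock entirely.
    int noEscape() {
        Counter c = new Counter();
        c.increment();
        return c.increment();
    }

    // Case 2: the reference is copied to a field, so another thread holding
    // this EscapeExamples object could reach it. The lock must stay.
    int escapesViaField() {
        Counter c = new Counter();
        this.field = c;
        return c.increment();
    }

    // Case 3: the reference is passed to another method, and that method
    // lets it escape into a shared list. The lock must stay.
    int escapesViaCall() {
        Counter c = new Counter();
        publish(c);
        return c.increment();
    }

    private static void publish(Counter c) {
        synchronized (REGISTRY) {
            REGISTRY.add(c);
        }
    }
}
```

In the third case the JVM has to look inside publish() to learn what happens to the reference, which hints at why proving that an object does not escape is harder than checking its scope.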
The JVM must detect whether a reference to the object is copied to a class field, or if the reference is passed to some other method and that other method allows the reference to escape. This phase of analysis is called Escape Analysis, and JVMs are constantly trying to improve their Escape Analysis to detect more non-escaping cases. In fact, if you run this experiment on different JVMs, you may not see the performance discrepancy on a certain JVM because either it did not do escape analysis or its analysis did not detect that mygen did not escape in this case. Note that, by language semantics, if the JVM cannot prove that a reference to an object did not escape, it must be pessimistic and assume that it did escape, thus limiting optimizations like the one we saw in HotClass1.

In this particular test application, even in the HotClass2 case you were only accessing HotClass2 from one thread (in fact, the whole application only had one thread). Shouldn't the JVM have been able to detect that and eliminate the unneeded lock acquisitions? Ideally, yes; however, it turns out it's much harder for the JVM to prove that a particular HotClass2 object and its associated mygen object are not accessed by some other thread. And, even when it can detect this, it would have to be able to handle the case where a new thread is created later in time and that new thread tries to access the object. The JVM would have to recompile hotMethod() with the locking code back in.

Escape Analysis and Heap Allocations

Summary