How JVMs Improve Application Performance
Enable optimizations

This article looks at a Java Virtual Machine (JVM) feature called Escape Analysis in some detail and how the JVM can use it to improve an application's performance. As you'll see, understanding what the JVM can do with escape analysis can help explain some otherwise non-intuitive performance results.

A Performance Puzzle
Let's assume you have a utility class that generates values by keeping some internal state of the last value generated and using it to arrive at the next value. Since the class maintains some internal state, and since this is a utility class that you might use in multithreaded environments, you decide to synchronize the getNext() method. You might code it as:

public class Generator {
    private int lastgen;  // saved state

    Generator(int value) {
        lastgen = value;  // initialize lastgen using supplied value
    }

    // somefunction() is a placeholder for whatever computes the next value
    public synchronized int getNext() {
        int nextgen = somefunction(lastgen);
        lastgen = nextgen;
        return nextgen;
    }
}

Assume you use this generator class in a hot method of your application as follows:

public class HotClass1 {
    private static final int ASIZ = 10000;
    int[] a = new int[ASIZ];

    public int hotMethod() {
        // nanoTime() returns a long; narrow it to match Generator(int)
        Generator mygen = new Generator((int) System.nanoTime());
        // fill in array with generated values
        for (int i = 0; i < ASIZ; i++) {
            a[i] = mygen.getNext();
        }
        // use array of values to compute something
        // and return it
        int computedValue = 0;
        for (int i = 0; i < ASIZ - 1; i++) {
            computedValue += someFunc(a[i], a[i + 1]);
        }
        return computedValue;
    }
}

As part of a code review, you decide that you don't really need to create a new generator object every time you call hotMethod(). Instead, you can just have a single generator as part of the HotClass. This should be more efficient because you don't need to allocate and initialize a new mygen object on every call. So you move the declaration of mygen out to class level:

public class HotClass2 {
    // nanoTime() returns a long; narrow it to match Generator(int)
    private Generator mygen = new Generator((int) System.nanoTime());

    public int hotMethod() {
        // rest of hotMethod is unchanged
    }
}

To your surprise, you find that your application now runs more slowly.

What's going on?

Looking at Profiles
You decide to use a profiler like AMD's CodeAnalyst, which reveals that the amount of time you spend in hotMethod() has gone up. Drilling down to a more detailed level, you see that the time spent computing computedValue hasn't changed; the extra time is all in the first part of hotMethod(), where the array is filled. You also notice that no timer samples show up in the Generator.getNext() method, which is surprising because that's the only method called in that loop.

When a hot method calls a target method and no timer samples are seen in the target method, the usual explanation is that the just-in-time (JIT) compiler has inlined the target method. This optimization expands the target method in the calling method (as if it were written inline) and is particularly useful in hot methods. Inlining eliminates the overhead of a call and return instruction, and often eliminates some register saving and shuffling. In addition, all the normal optimizations like constant folding that can be done in a method can now be applied across the inlined method boundary just because the target is inlined.
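As a rough illustration of what inlining does, the sketch below writes the fill loop twice: once calling getNext(), and once with the body of getNext() expanded by hand into the loop. The names InliningSketch, SimpleGenerator, and step() are hypothetical stand-ins (the article's somefunction() is not specified), and a real JIT compiler performs this transformation on compiled code, not on source.

```java
// Hypothetical stand-in for the article's Generator; step() replaces the
// unspecified somefunction().
final class SimpleGenerator {
    int lastgen; // saved state (package-private so the hand-inlined loop can reach it)

    SimpleGenerator(int value) {
        lastgen = value;
    }

    static int step(int x) {
        return x * 31 + 7; // arbitrary illustrative function
    }

    synchronized int getNext() {
        int nextgen = step(lastgen);
        lastgen = nextgen;
        return nextgen;
    }
}

final class InliningSketch {
    // The loop as written in the source: one call per iteration.
    static int[] fillByCall(int seed, int size) {
        SimpleGenerator gen = new SimpleGenerator(seed);
        int[] a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = gen.getNext();
        }
        return a;
    }

    // Roughly what the loop looks like after inlining: the body of
    // getNext() is expanded in place, including its monitor enter/exit.
    static int[] fillInlined(int seed, int size) {
        SimpleGenerator gen = new SimpleGenerator(seed);
        int[] a = new int[size];
        for (int i = 0; i < size; i++) {
            synchronized (gen) {
                int nextgen = SimpleGenerator.step(gen.lastgen);
                gen.lastgen = nextgen;
                a[i] = nextgen;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        int[] byCall = fillByCall(42, 5);
        int[] inlined = fillInlined(42, 5);
        System.out.println(java.util.Arrays.equals(byCall, inlined)); // prints true
    }
}
```

Once the call is expanded like this, the compiler sees the object creation, the lock, and the loop together in one method, which is what makes the optimizations discussed later possible.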

One downside of inlining for profiling purposes is that the JVM often hides the information about the inlined method from the profiling tool. The calling method gets more samples, but those samples all get attributed to the source line where the target method is invoked, rather than to the individual lines within the target method. AMD is working with JVM vendors to help make this inlining information available to profiling tools. Note that some JVMs have ways to disable inlining but, in that case, you're not profiling the actual code that will be generated.

Let's use AMD CodeAnalyst to look at the generated code. The source line:

a[i] = mygen.getNext();

seems to have generated some extra code in the HotClass2 version. In particular, a lot of timer samples show up right after an instruction that looks something like:

lock cmpxchg   [esi+4], ecx

and this instruction doesn't exist in the HotClass1 version.
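For readers unfamiliar with the instruction: on x86, lock cmpxchg is an atomic compare-and-exchange, the kind of operation commonly used to acquire a lock. The sketch below shows the same idea at the Java level using AtomicInteger.compareAndSet(). The CasLockSketch class is hypothetical and is not meant to represent how any particular JVM implements its monitors.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified lock built on compare-and-set. Real JVM monitors
// are considerably more elaborate; this only illustrates the atomic step.
final class CasLockSketch {
    private static final int FREE = 0;
    private static final int HELD = 1;
    private final AtomicInteger state = new AtomicInteger(FREE);

    // Atomically change FREE -> HELD; fails if the lock is already held.
    boolean tryAcquire() {
        return state.compareAndSet(FREE, HELD);
    }

    void release() {
        state.set(FREE);
    }

    public static void main(String[] args) {
        CasLockSketch lock = new CasLockSketch();
        System.out.println(lock.tryAcquire()); // true: lock was free
        System.out.println(lock.tryAcquire()); // false: already held
        lock.release();
        System.out.println(lock.tryAcquire()); // true again after release
    }
}
```

Even when no other thread is competing, an atomic operation of this kind is typically more expensive than an ordinary store, which is consistent with timer samples clustering around it.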

Synchronization Locks in Our Performance Puzzle
It looks like the inlined getNext() code in HotClass2 is spending time acquiring locks that the HotClass1 version did not acquire. Note: if your JVM has a monitoring tool that lets you record how many locks are acquired, you can use it to confirm this difference in lock-acquire counts.

Let's try to understand why the JVM needs to generate synchronization locks for HotClass2, but not for HotClass1. After all, the getNext() method itself didn't change. It is synchronized in both cases.

First, remember that the getNext() method was inlined into hotMethod() in both cases. Once it's inlined, the JIT compiler is free to use optimizations specific to this invocation. In particular, in HotClass1, the JIT compiler can see that mygen's scope is limited to hotMethod() and a reference to mygen does not "escape," possibly to be accessed by some other thread. Thus, no other thread can possibly use this object. If no other thread can possibly use this object, the semantics of the getNext() method's synchronized keyword are guaranteed without the need to acquire locks. Note that this optimization would not be legal in a generic non-inlined getNext().
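To try the comparison on your own JVM, a minimal timing harness might look like the following. The names LocalVsFieldBench, SeqGenerator, and next() are hypothetical, the generator function is an arbitrary stand-in, and a hand-rolled loop like this is only a rough indicator: results depend heavily on the JVM, its version, and its flags.

```java
// Hypothetical stand-in for the article's synchronized generator.
final class SeqGenerator {
    private int lastgen;

    SeqGenerator(int value) {
        lastgen = value;
    }

    synchronized int next() {
        lastgen = lastgen * 1103515245 + 12345; // arbitrary illustrative step
        return lastgen;
    }
}

final class LocalVsFieldBench {
    private static final int ASIZ = 10000;
    private static final int ROUNDS = 2_000;
    private final int[] a = new int[ASIZ];
    private final SeqGenerator shared = new SeqGenerator(1); // class scope, as in HotClass2

    // Generator at method scope, as in HotClass1: the reference stays in this method.
    int withLocal() {
        SeqGenerator gen = new SeqGenerator(1);
        int sum = 0;
        for (int i = 0; i < ASIZ; i++) {
            a[i] = gen.next();
            sum += a[i];
        }
        return sum;
    }

    // Generator held in a field: reachable by any thread that can reach this object.
    int withField() {
        int sum = 0;
        for (int i = 0; i < ASIZ; i++) {
            a[i] = shared.next();
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        LocalVsFieldBench bench = new LocalVsFieldBench();
        int sink = 0;
        // Warm up so the JIT compiler has a chance to compile both methods.
        for (int i = 0; i < ROUNDS; i++) {
            sink += bench.withLocal() + bench.withField();
        }
        long t0 = System.nanoTime();
        for (int i = 0; i < ROUNDS; i++) sink += bench.withLocal();
        long t1 = System.nanoTime();
        for (int i = 0; i < ROUNDS; i++) sink += bench.withField();
        long t2 = System.nanoTime();
        System.out.println("local generator: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("field generator: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("(ignore) " + sink); // keeps the work from being optimized away
    }
}
```

On HotSpot, running the same harness with -XX:-DoEscapeAnalysis turns the analysis off, which is one way to check whether it accounts for any difference you observe.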

In HotClass2, on the other hand, mygen exists at class scope, which makes it reachable by any thread that accesses that same HotClass2 object. Declaring mygen as private makes no difference, since mygen is still accessible to any method in the class, such as hotMethod(), and those methods can be called from any thread.

It's easy for the JVM to determine that the scope of an object is local, but not always so easy to determine whether the object "escapes." The JVM must detect whether a reference to the object is copied to a class field, or if the reference is passed to some other method and that other method allows the reference to escape. This phase of analysis is called Escape Analysis, and JVMs are constantly trying to improve their Escape Analysis to detect more non-escaping cases. In fact, if you run this experiment on different JVMs, you may not see the performance discrepancy on a certain JVM because either it did not do escape analysis or its analysis did not detect that mygen did not escape in this case. Note that, by language semantics, if the JVM cannot prove that a reference to an object did not escape, it must be pessimistic and assume that it did escape, thus limiting optimizations like the one we saw in HotClass1.
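The cases described above can be sketched as follows. Counter and EscapeCases are hypothetical names chosen for illustration, and the comments describe how a reference leaves (or does not leave) the method as written in source; what a given JVM's analysis actually concludes may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example object; stands in for any small synchronized class.
final class Counter {
    private int n;

    synchronized int increment() {
        return ++n;
    }
}

final class EscapeCases {
    private static final List<Counter> REGISTRY = new ArrayList<>();
    private Counter field;

    // Does not escape: the reference is only ever used inside this method.
    int noEscape() {
        Counter c = new Counter();
        c.increment();
        return c.increment();
    }

    // Escapes: the reference is copied to a field of this object.
    int escapesViaField() {
        Counter c = new Counter();
        field = c;
        return c.increment();
    }

    // Escapes: the reference is passed to a method that stores it.
    int escapesViaCall() {
        Counter c = new Counter();
        register(c);
        return c.increment();
    }

    // Escapes, at least from this method's point of view: the reference
    // is handed back to the caller.
    Counter escapesViaReturn() {
        return new Counter();
    }

    private static void register(Counter c) {
        REGISTRY.add(c);
    }

    public static void main(String[] args) {
        EscapeCases e = new EscapeCases();
        System.out.println(e.noEscape());        // 2
        System.out.println(e.escapesViaField()); // 1
        System.out.println(e.escapesViaCall());  // 1
    }
}
```

Only in noEscape() could the lock taken by increment() be dropped without changing behavior, because no other thread can ever obtain a reference to that Counter.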

In this particular test application, even in the HotClass2 case you were only accessing HotClass2 from one thread (in fact, the whole application only had one thread). Shouldn't the JVM have been able to detect that and eliminate the unneeded lock acquisitions? Ideally, yes; however, it turns out to be much harder for the JVM to prove that a particular HotClass2 object and its associated mygen object are not accessed by some other thread. And even when it can detect this, it would have to handle the case where a new thread is created later and tries to access the object. The JVM would then have to recompile hotMethod() with the locking code back in.

Escape Analysis and Heap Allocations
Eliminating unneeded synchronization locking is a clear benefit of escape analysis, but escape analysis can lead to other optimizations. For example, if an object is at local scope and does not escape, it doesn't have to be allocated on the heap at all; it can simply be allocated on the thread's local stack. Everything allocated on the local stack is effectively deallocated when the stack is popped at method exit. If the object were allocated on the heap, it would have to be collected later by the garbage collector. In the hotMethod() example above, assuming your JVM detects that mygen does not escape, you should not see any heap usage changes during the execution of hotMethod().
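The following sketch shows, by hand, the kind of transformation this implies: a small non-escaping object whose fields could live in local variables instead of in a heap object. The Vec2 and StackAllocSketch names are hypothetical, and whether a given JVM actually performs this transformation for a given method is not something the source code can guarantee.

```java
// Hypothetical small value-like class used only for illustration.
final class Vec2 {
    final int x;
    final int y;

    Vec2(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int dot(Vec2 o) {
        return x * o.x + y * o.y;
    }
}

final class StackAllocSketch {
    // As written: two short-lived objects that never leave the method.
    static int dotWithObjects(int ax, int ay, int bx, int by) {
        Vec2 a = new Vec2(ax, ay);
        Vec2 b = new Vec2(bx, by);
        return a.dot(b);
    }

    // What an optimizing JVM may effectively execute once it has inlined
    // dot() and proved that a and b do not escape: only local values.
    static int dotWithLocals(int ax, int ay, int bx, int by) {
        return ax * bx + ay * by;
    }

    public static void main(String[] args) {
        System.out.println(dotWithObjects(1, 2, 3, 4)); // 11
        System.out.println(dotWithLocals(1, 2, 3, 4));  // 11
    }
}
```

Both methods compute the same result; the difference is that the second form leaves nothing behind for the garbage collector.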

Summary
We've shown how a JVM can use escape analysis to enable optimizations such as eliminating unnecessary synchronization and allocating objects on the stack rather than on the heap. In your own Java code, this is something to be aware of when you are deciding whether to declare a new object at method scope or at class scope. If the object doesn't have state that must be preserved outside the method, and if the object's constructor isn't too expensive to run, you should create the object at method scope. This gives the JVM the chance to detect that the object does not escape and to enable optimizations like those mentioned here.

About Tom Deneau
Tom Deneau is a member of the AMD technical staff.
