Seven Rules to Improve Your Application Performance Practices
Week 23 of our 2010 Application Performance Almanac

In this article I discuss the seven most important rules for improving your application performance practices. They are simple to follow and will help you improve the way you deal with application performance. Besides eventually improving the performance of your applications, they will help you avoid the classic blame game that normally starts when something goes wrong.

Rule 1 – Understand the Application
While this may sound trivial and the most obvious thing to do, reality paints a different picture. Most failing application performance management processes originate from people’s lack of understanding of their application. This does not mean that the people involved in the process lack the knowledge to solve the problem. Rather, they do not have enough data to understand what is really going on in the application itself. Their approach is essentially: “I will try to make sense of whatever information I can get.” They treat the lack of information as an immutable physical law, like gravity, instead of trying to get the information they really need. In my post on the Proactivity of Troubleshooting I described in detail which information is needed to understand application problems. At a minimum, you must be able to answer the following questions:

  • What has happened?
  • When did it happen?
  • Who is impacted?
  • What is the difference compared to before the problem?
  • Why did the problem happen?

If you cannot answer these questions based on the data you have, then you need to get the data to answer them. Otherwise your activity will be a kind of “performance voodoo” rather than a solid engineering practice. You already think you know where the problem is? Ok, then get the data that proves your guess.

Rule 2 – Measure What Matters
This is the logical consequence of Rule 1. You have to measure what is important for you and your management. Many people still think that the major goal of performance management is to resolve production problems. That would be equivalent to saying a CEO’s primary job is to save a company that is nearly bankrupt. Most of your work is to avoid problems in the first place.

Most people are pretty good at measuring technical aspects of their application, like the response time of web requests or the state of connection pools. They feel like Captain Kirk (well, I would prefer Captain Archer) if they have operations dashboards that show a lot of these fancy metrics. However, when they talk to their management they realize that this is not necessarily the information management needs. In Is There a Business Case for Application Performance I discussed this problem in detail.

The concept introduced in that post is Business Transaction Management (BTM). BTM is a specialized discipline within application performance management that focuses on communicating performance aspects at a business level. Typical questions answered by BTM are:

  • Which user was affected by a production problem?
  • What are the effects of increasing traffic for transaction X by 10 percent?
  • How do we have to change our infrastructure to serve our users better?
  • Why did user “Sam” have a problem accessing his account this morning?

So basically this means that you have to relate your low-level metrics to the context of the application. If you only have measures helping with highly technical problems, you will fail in these higher-level activities important to management.
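
To make the idea of relating low-level metrics to business context more concrete, here is a minimal sketch in Python. The record fields (transaction, user, duration_ms, failed), the function name and the sample data are assumptions chosen purely for illustration; they are not taken from any particular BTM product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    transaction: str    # hypothetical business transaction name, e.g. "login"
    user: str           # hypothetical user identifier
    duration_ms: float  # measured response time of this request
    failed: bool        # whether the request ended in an error


def summarize_by_transaction(records):
    """Group raw request measurements by business transaction."""
    summary = defaultdict(
        lambda: {"count": 0, "total_ms": 0.0, "affected_users": set()}
    )
    for r in records:
        entry = summary[r.transaction]
        entry["count"] += 1
        entry["total_ms"] += r.duration_ms
        if r.failed:
            entry["affected_users"].add(r.user)
    return {
        name: {
            "count": e["count"],
            "avg_ms": e["total_ms"] / e["count"],
            "affected_users": sorted(e["affected_users"]),
        }
        for name, e in summary.items()
    }


# Illustrative sample data only.
records = [
    RequestRecord("login", "sam", 4200.0, True),
    RequestRecord("login", "ann", 180.0, False),
    RequestRecord("checkout", "ann", 650.0, False),
]
report = summarize_by_transaction(records)
print(report["login"])
```

The same raw measurements that feed a technical dashboard can answer a question like "which user was affected?" once each one carries the transaction and user it belongs to.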

Rule 3 – Objectify Measurements
As all your activities are based on measurements, you had better get them right. The basic rule is that measurement results should be the same irrespective of who is measuring. In case you think this is a no-brainer, try the following: ask three different people in your organization how much host CPU is consumed by a certain transaction. Ideally, ask a developer, a tester and an operations guy. (If you actually run the experiment I would be very interested in the results; please post them in the comments below.)

So the important part here is to objectify the way you measure. This includes the measurement method as well as the tooling. You also have to define how to interpret the measurement. This becomes even more important if you have to work across teams. If you can’t agree on how to measure, how can you ever expect to compare results?
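
As one illustration of agreeing on a measurement method, the sketch below fixes a single definition of a response-time percentile (the nearest-rank method) so that every team computes the same number from the same samples. The function name and the sample values are assumptions for illustration only.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the data
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Illustrative response times in milliseconds.
response_times_ms = [120, 95, 310, 150, 101, 2400, 130, 99, 140, 115]
p50 = percentile_nearest_rank(response_times_ms, 50)
p90 = percentile_nearest_rank(response_times_ms, 90)
print(p50, p90)
```

With these samples the nearest-rank 50th percentile is 120 ms, while an interpolating median of the same data is 125 ms. That is exactly the kind of discrepancy an agreed definition avoids.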

Rule 4 – Define a Language
A prerequisite for talking with each other is talking in the same language. This means you have to establish a language that everybody, from management to IT super geeks, understands. No, I am not talking about English or whatever your native language is. Have you ever been at the doctor’s, listened to an explanation of what’s wrong with you, and been unable to figure out whether you are pregnant or growing a third leg? The important part about a common language is that both parties understand what the other one means.

In management we generally refer to this language as Key Performance Indicators (KPIs). KPIs provide a common means of communication across stakeholders. They do not provide the level of detail you need for your daily work, but they help in coordination and planning. If you now think you have done the job by defining your KPIs as CPU usage, memory consumption and network traffic, you have not yet got the point. That’s like a doctor reciting all your detailed blood values to you. Your KPI tells you whether you are healthy or not. You do not care about value x being 20 or 30. In fact, you probably have no idea whether a higher value is better or not.

Which values you actually choose depends on your application, but they have to cover things like quality of service or provisioning information. BTM is a central concept for collecting information at this level. Your more detailed measures are then used to decide how to move your KPIs in the direction you need. If you tell your boss that you cannot serve more than 300 concurrent users and he asks you to serve 1000, you have to figure out how to do that. This will then require you to look at memory consumption or CPU usage.
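
As a hedged illustration of a "healthy or not" style KPI, the sketch below computes an Apdex-style score, a technique not mentioned in the article itself but commonly used for this purpose. Requests at or below a threshold T count as satisfied, those up to 4T count half, and the rest count zero. The 0.5 second threshold and the sample values are assumptions.

```python
def apdex_score(response_times_s, threshold_s):
    """Return an Apdex-style score between 0.0 (worst) and 1.0 (best)."""
    if not response_times_s:
        raise ValueError("no samples")
    # Satisfied: at or below the threshold.
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    # Tolerating: above the threshold but no more than four times it.
    tolerating = sum(
        1 for t in response_times_s if threshold_s < t <= 4 * threshold_s
    )
    # Everything slower contributes nothing to the score.
    return (satisfied + tolerating / 2) / len(response_times_s)


# Illustrative response times in seconds; the threshold is an assumption.
samples = [0.2, 0.3, 0.4, 0.9, 1.5, 2.5, 0.1, 0.45]
score = apdex_score(samples, threshold_s=0.5)
print(score)
```

A single number like this can be discussed with management, while the detailed response times, CPU and memory figures remain the tools you use to move it.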

Rule 5 – Use a Map and a Compass
So what does this metaphor mean? Can you imagine navigating on the ocean solely with a compass? After a short time you will realize that knowing where north is does not really help if you do not know where you are in the first place. What does this mean for performance management? Performance management has to be a continuous activity; otherwise you get lost and cannot make the best use of your measurements. As pointed out in an earlier post (Top 10 Reports are not the final answer), many people think that top 10 reports are the ultimate answer to performance optimization. The thinking goes: if I simply optimize the ten slowest parts, everything will be fine. While this approach will improve performance, it can easily become the wrong way to go. What if what is shown in the top 10 report is not the reason the application is slower?

So you have to create your map first to know which direction to head in. In performance management this means continuously monitoring your performance. This enables you to understand trends and to see whether you are on the right track or have to move in another direction. It goes beyond just measuring trends in response times. You also have to know why you are moving in the wrong direction (see Rule 1).

When I was doing my open sea sailing certificate, I had to deal with exactly the same issue in navigation. First you need to find out whether you are in the right place or not. If you failed to navigate to the right coordinates, you have to find out why you ended up where you did and how to avoid this in the future by taking into account factors like tides or windward drift.
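
The continuous monitoring described in this rule can be sketched as a simple comparison of recent measurements against a baseline. This is only a minimal illustration: the window sizes, the 10 percent tolerance, the function name and the sample history are all assumptions, and a real setup would also need the diagnostic data from Rule 1 to explain a drift once it is detected.

```python
from statistics import mean


def detect_drift(daily_avg_ms, baseline_days=7, recent_days=3, tolerance=0.10):
    """Compare the most recent days against an initial baseline window.

    Returns (relative_change, is_drifting).
    """
    if len(daily_avg_ms) < baseline_days + recent_days:
        raise ValueError("not enough history for the chosen windows")
    baseline = mean(daily_avg_ms[:baseline_days])
    recent = mean(daily_avg_ms[-recent_days:])
    change = (recent - baseline) / baseline
    return change, change > tolerance


# Illustrative daily average response times in milliseconds.
history = [200, 205, 198, 202, 201, 199, 195, 210, 230, 250]
change, drifting = detect_drift(history)
print(f"{change:.0%} change, drifting: {drifting}")
```

The baseline plays the role of the map (where you were), and the recent window plays the role of the compass reading (where you are heading).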

Rule 6 – Do the Simple Things First
This is generally good advice. However, most people do not follow it. As in most cases in life, the 80:20 rule also applies to performance management. You get 80 percent of the success by investing 20 percent of the effort. However, it seems to be a law of nature that people are much more attracted to chasing the other 20 percent of success. Let me give you some examples.

People try to implement complex high-end caching systems before following simple performance best practices. While these caching technologies are great from a performance and scalability point of view, they require massive effort, whereas an improved web caching strategy requires nearly zero implementation effort.
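
To show how small an improved web caching strategy can be, here is a minimal sketch that picks a Cache-Control response header by file type. The extension list, the one-year lifetime and the function name are assumptions, and the long lifetime presumes that static files get a new name whenever their content changes.

```python
import os

# Hypothetical set of static asset types that are safe to cache for a long time.
CACHEABLE_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".woff2"}
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_headers(path):
    """Return the caching header to send for the given request path."""
    extension = os.path.splitext(path)[1].lower()
    if extension in CACHEABLE_EXTENSIONS:
        # Browsers and proxies may reuse the response without asking again.
        return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}"}
    # Dynamic pages must be revalidated with the server before reuse.
    return {"Cache-Control": "no-cache"}


print(cache_headers("/static/app.js"))
print(cache_headers("/account"))
```

In many web servers the same policy can be expressed as a few lines of configuration rather than application code.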

Another example is people who decide to start with performance management and try to get everything fully automated, from their CI environment through testing to production. While this is a great goal and perfectly follows the Continuous APM idea, the endeavor is doomed to failure. You have to start with regular manual performance analysis and then automate your processes step by step, because knowing what to measure is a prerequisite for automation (see Rules 2 and 3).

Rule 7 – Every Ship Needs a Captain
A point many people also miss when implementing performance management is defining responsibility. The general rule is that if responsibility is not clearly defined, the result will be chaos. Either many people feel responsible, leading to a complete mess, or nobody feels responsible, leading to … well, nothing. The vital step to success is to define who is responsible for performance in your company. This does not mean that he has all the expertise to solve every problem. Most likely he will not, and he will need the help of other people in the organization. His job is to ensure that all necessary steps are taken and the right people get involved at the right time.

Conclusion
Performance Management should be part of every company’s software processes. However, failing to follow some important rules will lead to frustration, waste of resources and finally failure. I’ve described the top rules for being successful which solve the major problems I have seen out in the wild. I do not claim this list to be complete, but it goes a long way. If you have important rules to add, feel free to post them as a comment.

About Alois Reitbauer
Alois Reitbauer works as a Technology Strategist for dynaTrace Software, where he leads the Methods and Technology team. As part of the R&D team he influences the dynaTrace product strategy and works closely with key customers in implementing performance management solutions for the entire lifecycle. Alois has 10 years of experience as an architect and developer in the Java and .NET space. He is a frequent speaker at technology conferences on performance and architecture related topics and regularly publishes articles and blog posts on blog.dynatrace.com.
