Building the Next-Generation Datacenter - A Detailed Guide / Stage 2
Infrastructure Optimization

IT organizations today are experiencing pressure to not only adopt new and emerging technologies like virtualization, but also reduce costs and do more with fewer resources (thus reducing CapEx) - all while delivering assurance of capacity and performance to the business.

In the first part of this article, we provided a brief overview of the CA Technologies virtualization maturity lifecycle and focused on the server consolidation stage. Server consolidation promotes efficient use of available compute resources and reduces the total number of physical servers in the data center. However, organizations that have successfully consolidated their server environment and are progressing on their virtualization journey often find it difficult to virtualize tier 1 workloads and to run their hosts at higher capacity, because they lack the confidence to move critical applications onto the virtual environment or to utilize servers to capacity.

In this second part of the article, we focus on building and maintaining a mature and optimized infrastructure that is essential for IT organizations to virtualize tier 1 workloads and achieve increased capacity utilization on the virtual hosts - thus helping them reap the true CapEx savings promised by virtualization.

Gain Visibility and Control
Organizations face significant challenges in trying to achieve the visibility and control necessary to optimize their virtual infrastructure. These include:

  • Providing performance and Service Level Agreement (SLA) assurance to the business.
  • Deploying and maintaining capacity on an automated basis.
  • Securing access to the virtual environment and facilitating compliance.
  • Providing business continuity in the event of a failure.

The following are the tasks and capabilities required to optimize the infrastructure and gain visibility and control into the availability and performance of the virtual environment.

Project Plan
The following is a high-level plan for an infrastructure optimization project. The timelines and tasks mentioned in Table 1 present a broad outline for a tier 1 infrastructure optimization project that targets setting up an optimized infrastructure and adding approximately 10 critical production workloads to about 40 virtual server hosts (with existing workloads) - thus resulting in 80-90% capacity utilization on those servers. The three- to four-person implementation team suggested for the project is expected to be proficient in project management, virtualization design and deployment, and systems management.

Table 1: Infrastructure Optimization project plan

A successful infrastructure optimization project necessitates a structured approach that should consist of the following high-level tasks. For each of these tasks we will discuss the key objectives and possible challenges, articulate a successful outcome, and more.

Performance and Fault Monitoring
Prior to moving critical workloads onto the virtual environment, IT operations teams need to ensure that they have clear visibility and control into the availability and performance of the virtual environment. To foster this visibility and control, application / systems consultants should use performance management tools to:

  • Discover the virtual environment and create an aggregate view of the virtual infrastructure. This discovery should be dynamic rather than static - i.e., the aggregate view should automatically reflect changes in the virtual environment that result from actions such as vMotion. In addition, this discovery should cover not only the virtual environment, but also the components surrounding the virtual network.
  • Set up event correlation. In a production environment where hundreds of events may be generated every second by the various components, event correlation is essential for cutting through the noise and narrowing down the root cause of active or potential problems.
  • Enable real-time performance monitoring and historical trending. The performance monitoring should go beyond basic metrics like CPU / memory consumption and provide insight into traffic responsiveness across hosts. Trending capabilities are also essential for tracking historical performance.

Capabilities like the ones mentioned above provide IT administrators and business/application owners the confidence to move critical production applications into the virtual environment.
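
As a rough illustration of the event correlation step described above, the sketch below groups events that fall into the same time window and attributes each group to the most upstream alerting component in a dependency map. The component names, the DEPENDS_ON map, and the 30-second window are all hypothetical, chosen only for illustration; commercial performance management tools rely on far richer topology data and rules.

```python
from collections import defaultdict

# Hypothetical dependency map: each component points to what it depends on.
DEPENDS_ON = {
    "vm-app01": "host-a",
    "vm-db01": "host-a",
    "host-a": "switch-1",
    "host-b": "switch-1",
    "switch-1": None,
}

def upstream_chain(component):
    """Return the component followed by everything it depends on."""
    chain = []
    while component is not None:
        chain.append(component)
        component = DEPENDS_ON.get(component)
    return chain

def correlate(events, window_seconds=30):
    """Group events into fixed time windows and pick probable root causes.

    events: list of (timestamp_seconds, component) tuples.
    Returns a list of (root_cause, [affected components]) tuples.
    """
    groups = defaultdict(set)
    for ts, component in events:
        groups[ts // window_seconds].add(component)

    results = []
    for bucket in sorted(groups):
        alerting = groups[bucket]
        # A component is a root-cause candidate if nothing upstream of it is alerting.
        roots = [c for c in alerting
                 if not any(up in alerting for up in upstream_chain(c)[1:])]
        for root in sorted(roots):
            affected = sorted(c for c in alerting if root in upstream_chain(c))
            results.append((root, affected))
    return results
```

Fixed windows are the simplest possible grouping: two related events that straddle a window boundary would be reported separately, which is one reason real tools tend to use sliding windows or topology-driven rules instead.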

Continuous Capacity Management
Critical applications depend on multiple components in the virtual environment. Given the dynamic nature of the virtual environment and the high volume of workloads processed by virtual servers, it's almost impossible for administrators to create and manage capacity plans on a project-by-project basis. Therefore, managing critical workloads requires automating the manual steps of capacity management, thus enabling continuous capacity management. A continuous capacity management environment should:

  • Collect and correlate data from multiple data sources, update dashboards with the current state of utilization across virtual and physical infrastructure, and publish reports on the efficiency of resource utilization for each application / business service.
  • Highlight opportunities for optimization, solve resource constraints, update baselines in predictive models, and utilize the predictive model to produce interactive illustrations of future conditions.
  • Integrate with provisioning solutions for intelligent automation, and eco-governance solutions to help maintain compliance with environmental mandates.

The level of continuous capacity management described above, along with comprehensive analytic and simulation modeling capabilities, will allow the IT administrator to effectively manage the capacity of critical applications / services on an ongoing basis.
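
To make the idea of a predictive model more concrete, here is a minimal sketch that fits a straight-line trend to utilization samples and estimates how many days remain before a host crosses a utilization threshold. The function names, the 90% threshold, and the linear model itself are assumptions made for illustration; real capacity management products use considerably more sophisticated analytic and simulation models.

```python
def fit_trend(samples):
    """Least-squares line through (day, utilization_percent) samples.

    Returns (slope, intercept).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def days_until(samples, threshold=90.0):
    """Estimate days from the last sample until the trend reaches the threshold.

    Returns 0.0 if the trend is already at or above the threshold,
    and None when utilization is flat or falling.
    """
    slope, intercept = fit_trend(samples)
    last_day = max(x for x, _ in samples)
    if slope * last_day + intercept >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - intercept) / slope - last_day
```

Run on a schedule against fresh monitoring data, a check like this turns capacity planning from a one-off project into a continuous signal that can feed dashboards or provisioning workflows.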

Change and Configuration Management (CCM)
Pre- and post-migration configuration discovery and testing are essential to enable successful server consolidation. However, IT organizations that support tier 1 workloads cannot afford to perform these activities on a one-time project basis. Optimized infrastructures need continuous CCM not only for the workloads, but also for the infrastructure. In a highly dynamic environment, erroneous virtual infrastructure configuration can have drastic effects on VM performance. Comprehensive CCM involves:

  • Providing ongoing configuration compliance with system hardening guidelines from the Center for Internet Security (CIS), hypervisor vendors, etc.
  • Tracking virtual machines, infrastructure components, applications, and the dependencies between them on a continuous basis.
  • Monitoring virtual infrastructure configuration and its association with workload performance.

Implementing comprehensive CCM for the virtual environment will not only help avoid configuration drift and its impact on workload performance, but also facilitate compliance with vendor license agreements and regulatory mandates such as the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act (HIPAA), and the Sarbanes-Oxley Act (SOX).
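
Configuration drift detection can be sketched as a comparison between an approved baseline and the configuration currently observed on a host. The example below is only an illustration under simple assumptions: configurations are flat dictionaries, and the setting names in the usage note are hypothetical rather than taken from any hypervisor or hardening guide.

```python
def find_drift(baseline, current):
    """Compare a current configuration against an approved baseline.

    Both arguments are flat dicts of setting name -> value.
    Returns a dict of setting -> (expected, actual) for every deviation;
    a setting missing on either side is reported with None on that side.
    """
    drift = {}
    for key in sorted(set(baseline) | set(current)):
        expected = baseline.get(key)
        actual = current.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift
```

For instance, comparing a baseline of {"ssh_enabled": False} with a current state of {"ssh_enabled": True} would report ssh_enabled as drifted. Running such a comparison continuously, rather than once per migration, is what distinguishes ongoing CCM from a one-time audit.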

Workload Migration
The workload migration process is easily the most complex component of an organization's virtualization endeavor. The migration process refers to the "copying" of an entire server / application stack. IT organizations face many challenges during workload migration - most end up migrating only 80-85% of target workloads successfully, and even then with considerable problems. Some of the challenges include:

  • Migration in a multi-hypervisor environment, and possible V2P and V2V scenarios.
  • The flexibility of working with either full snapshots or performing granular data migration.
  • In-depth migration support for critical applications such as AD, Exchange, and SharePoint.
  • Application / system downtime during the migration process.

Free migration tools are available from some hypervisor vendors, but they have notable limitations: they can require several hours of system shutdown for the conversion, may limit the amount of data supported, or may require running tests on storage in advance to uncover and address bad storage blocks. Backup, High Availability (HA), and IP-based replication tools are a good option for workload migration, as they not only help overcome or mitigate the challenges listed above, but can also provide comprehensive BCDR (Business Continuity and Disaster Recovery) capabilities.

From a process standpoint, ensure that the migrations are performed per a pre-defined schedule and include acceptance testing and sign-off steps to complete the process. Ensure that contingency plans are in place, and factor in a modest amount of troubleshooting time so that minor issues can be worked out in real time and the migration of a workload completed in its scheduled window, rather than rescheduling downtime later.

Privileged-User Management and System Hardening
Privileged users have far more leverage in the virtual environment because they have access to most virtual machines running on a host - hence tight control of privileged-user entitlements is essential. This task should ensure that:

  • Access to critical system passwords is only available to authorized users and programs.
  • Passwords are stored in a secure password vault and not shared among users or hard coded in program scripts.
  • Privileged user actions are audited and the audit logs are stored in a tamper-proof location.
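
One common way to make an audit log tamper-evident is to chain entries together with cryptographic hashes, so that altering an earlier record invalidates every later one. The sketch below illustrates that general technique using Python's standard hashlib and json modules; the record fields and function names are hypothetical, and it is not a description of how any particular privileged-user management product stores its logs.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append_entry(log, user, action):
    """Append an audit record whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"user": user, "action": action}
    log.append({"record": record, "prev": prev_hash,
                "hash": _digest(prev_hash, record)})

def verify(log):
    """Return True if the chain is intact.

    Detects entries that were altered, reordered, or removed from
    anywhere except the end of the log.
    """
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain like this makes tampering detectable rather than impossible: truncating the end of the log or rewriting the whole chain would go unnoticed unless the latest hash is also kept somewhere the privileged user cannot modify, which is why the storage location still matters.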

In addition to privileged-user management, which protects against internal threats, IT organizations need to ensure that their servers are secure from malicious external threats. This includes installing antivirus / anti-malware software and making sure that the systems conform to the comprehensive system hardening guidelines provided by the hypervisor vendors.

Business Continuity and Disaster Recovery (BCDR)
BCDR has long been an essential requirement for critical applications and services. This includes backup, high availability and disaster recovery capabilities. However, server virtualization has changed the way modern IT organizations view BCDR. Instead of the traditional methods of installing and maintaining backup agents on each virtual machine, IT organizations should utilize tools that integrate with snapshot and off-host backup capabilities provided by most hypervisor vendors - thus enabling backups without disrupting operations on the VM and offloading workload from production servers to proxy ones. Activities within this task should ensure that:

  • Machines are backed up according to a pre-defined schedule, and granular restores using push button failback are possible.
  • Critical applications and systems are highly available, and use automated V2V or V2P failover for individual systems / clusters.
  • Non-disruptive recovery testing capabilities are available to administrators.

The one-week timeline scheduled for this task assumes the existence of comprehensive BCDR plans for the physical workloads, which then only need to be translated into the virtual environment.
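
As a small illustration of checking that machines are backed up according to a pre-defined schedule, the sketch below flags any machine whose last successful backup is older than an allowed maximum age. The data structure, machine names, and 24-hour limit are assumptions for the example; in practice this information would come from the backup tool's own reporting.

```python
from datetime import datetime, timedelta

def overdue_backups(last_backup, max_age, now):
    """Return the names of machines whose latest backup is older than max_age.

    last_backup: dict of machine name -> datetime of the last successful
                 backup, or None if the machine has never been backed up.
    max_age:     timedelta giving the longest acceptable gap between backups.
    now:         datetime to measure against (passed in to keep this testable).
    """
    overdue = []
    for name, finished in sorted(last_backup.items()):
        if finished is None or now - finished > max_age:
            overdue.append(name)
    return overdue
```

Called with max_age=timedelta(hours=24), the function returns every machine that has missed its daily backup, including machines that were provisioned but never added to a backup schedule.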

Production Testing and Final Deliverables
The breadth and depth of post-migration testing will vary according to the importance of the migrated workload; less critical workloads might require only basic acceptance tests, while critical ones might necessitate comprehensive QA tests. In addition, this task should include follow-up on any changes that the migration teams should have applied to a VM but were unable to perform due to timing or the need for additional change management approval. All such post-migration recommendations should be noted, as appropriate, within the post-test delivery document(s).

This final stage of the implementation process should include the delivery of documentation on the conversion and migration workflow and procedures for all workloads. Doing so will remove dependency on acquired tribal knowledge and allow staffing resources to be relatively interchangeable. These artifacts and related best practices documents will also allow the continuation of the migration process for additional workloads in an autonomous fashion in the future if desired.

Conclusion
In Part 2, we focused on building and maintaining a mature and optimized infrastructure that is essential for IT organizations to virtualize tier 1 workloads and achieve increased capacity utilization on the virtual hosts. In Part 3, we will focus on tackling problems such as "VM sprawl" (the problem of uncontrolled workloads), increased provisioning and configuration errors, and the lack of a detailed audit trail - all of which significantly increase the risk of service downtime.

About Birendra Gosai
Birendra Gosai has a master's degree in Computer Science and over ten years of experience in the enterprise software industry. He has worked extensively on data warehousing, network & systems management, and security management technologies. He currently works in the virtualization management business at CA Technologies. You can view his blogs at: http://community.ca.com/members/Birendra-Gosai.aspx, or follow him on Twitter @BirendraGosai.
