SOA News Desk
BCP Lessons Learned and New Ideas for IT Infrastructure Continuity
Learn How to Justify the Creation of Disaster Recovery Facilities
Feb. 11, 2009 12:45 PM
Businesses in the southeastern United States have been hit hard by hurricanes in the last few years, and 2008 was no exception. As a project manager and CBCP for over 1600 disaster recovery deployments, I can share real examples of how entire data centers were failed over to the DR operations center in preparation for hurricanes, while others (due to poor planning) did not have the same success. Those that were successful had organized the RTOs of their communication servers, which helped them prioritize recovery efforts, and used creative testing procedures that did not disrupt normal business activity. The first priority of a BCP is to ensure the safety of employees, but being able to communicate with the people who are needed is also an important step in successfully executing a BCP. Because of this preparedness, many businesses I have heard from were able to proactively let their employees evacuate and still provide them remote access for business operations from almost anywhere. I will review a few examples of architecture, solutions and best practices for exercising controls in those events, and discuss what future technology may be used to better justify the creation of disaster recovery facilities.
10 Professional Practices for BCP
There are ten professional practices for business continuity planning. All are equally important, and if followed appropriately they will give you a solid foundation to build upon. For the purpose of this article I will summarize the professional practices; for more information, visit the Disaster Recovery Institute International (www.drii.org). DRII is an excellent resource for BCP and is a consortium of business continuity professionals dedicated to setting industry standards and sharing knowledge around the practice of business continuity management.
The first step in building a BCP is Program Initiation and Management. This step is designed to establish executive approval, support and justification for a resiliency program. Start by building a dedicated team that is committed to supporting the BCP initiative, and select team members who can effectively manage the roles and responsibilities for their portion of the plan. Cost justification is often a hurdle in establishing the need for disaster recovery facilities, so one tip is to use your current assets, such as other offices or co-location facilities. You can also work with the IT department to tie the IT management budget into the BCP, so that you are providing not just continuity in the event of a disaster but also high availability for day-to-day operational maintenance.
The next two steps determine the risk your organization faces from natural or environmental disasters (risk evaluation) and the business impact should one of those events occur (business impact analysis, or BIA). The results feed the following step, the business continuity strategy you design and implement to meet your defined recovery point objective (RPO) and recovery time objective (RTO). Once those objectives and controls are defined, you will need to integrate emergency response and operations to define the process by which a disaster is declared and what prompts the initiation of the BCP.
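To make the RTO and RPO objectives concrete, here is a minimal sketch of how they might be recorded and used to order recovery. The system names and numbers are invented for illustration and do not come from any deployment described in this article; the `System` class and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    rto_hours: float    # recovery time objective: maximum tolerable downtime
    rpo_minutes: float  # recovery point objective: maximum tolerable data loss


def recovery_order(systems):
    """Return systems sorted so the shortest RTO is recovered first.

    Ties on RTO are broken by the tighter (smaller) RPO.
    """
    return sorted(systems, key=lambda s: (s.rto_hours, s.rpo_minutes))


def meets_rpo(system, replication_lag_minutes):
    """True if the current replication lag is within the system's RPO."""
    return replication_lag_minutes <= system.rpo_minutes


# Illustrative values only (assumptions, not measured data).
systems = [
    System("file server", rto_hours=24, rpo_minutes=240),
    System("email", rto_hours=2, rpo_minutes=15),
    System("VPN gateway", rto_hours=1, rpo_minutes=60),
    System("billing database", rto_hours=4, rpo_minutes=5),
]

ORDER = [s.name for s in recovery_order(systems)]
print(ORDER)
```

In this sketch the communication systems (VPN gateway, email) sort to the front because they carry the shortest RTOs, and `meets_rpo` gives a simple check of whether replication is keeping up with the stated objective.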
These previous steps are what allow you to design and implement a comprehensive strategy that meets your company’s objectives. I have seen companies try to shortcut these steps and skip straight to implementing a solution, only to find out that their infrastructure doesn’t have enough power, bandwidth, resources or executive approval to support the controls implemented. So the lesson learned is: don’t take shortcuts and jump into something you have never done before. Following the previous steps will let you proceed with confidence and will likely prevent challenges you would otherwise face during the deployment and execution of your plan.
The next three steps are designing and implementing the BCP, generating awareness and training your organization on what to do in the event of a disaster, and then exercising those plans regularly. It is typically recommended that BCP exercises be tied to your change control process, which means the plan should be reviewed any time there is a change within the organization that may affect it. (That can be anything from a software update on a business-critical server to a BCP team member leaving the company.) Depending on the situation, exercises could take place as frequently as once a month, or at the very least two to three times per year, so that there is consistent awareness of the plan and procedures.
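The rule above, exercise after any relevant change and otherwise on a fixed schedule, can be sketched in a few lines. This is only an illustration: the 120-day interval is an assumption standing in for "two to three times per year", and the `exercise_due` function and its inputs are hypothetical rather than part of any change control product.

```python
from datetime import date, timedelta

# Assumed maximum gap between exercises (about three exercises per year).
MAX_INTERVAL = timedelta(days=120)


def exercise_due(last_exercise, today, changes_since_last):
    """Decide whether a BCP exercise (or plan review) is due.

    An exercise is due if any change that may affect the plan has been
    logged since the last exercise, or if the maximum interval has passed.
    """
    if changes_since_last:
        return True
    return today - last_exercise >= MAX_INTERVAL


# A logged change triggers a review even though the last exercise was recent.
print(exercise_due(date(2009, 1, 5), date(2009, 2, 11),
                   ["software update on mail server"]))
# No changes and only 37 days elapsed: nothing is due yet.
print(exercise_due(date(2009, 1, 5), date(2009, 2, 11), []))
```

In practice the list of changes would come from whatever change control records your organization already keeps.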
The last two practices, crisis communication and coordinating with external agencies, are really the culmination of the previous practices and will ultimately determine the success or failure of your plan. In the event of a disaster, communication is critical to coordinating with emergency responders and your own business continuity team to make sure evacuations and safety procedures are implemented effectively.
When Planning and Exercising Are Done Right
Planning is your best friend when it comes to rolling out controls for a business continuity solution. From executive buy-in through budget, infrastructure, process, procedures, testing and ultimately execution, you can’t plan enough. And when it’s done right, deployments go smoothly. However, there is more than one way to go about this. As the saying goes, “Don’t eat the elephant all in one bite.” Breaking down your overall rollout plan into smaller projects will help you better manage details as well as prioritize the order of the overall deployment. Here are some quotes from companies who did it right and were glad they did after Hurricane Ike made landfall:
- “All is OK and thanks. Our files were mirrored to our Austin facility with no loss of data or applications. Winds tore a 30'x30' hole in the building roof. The water damage was bad. The computer servers were spared but a lot of workstations were soaked. Houston operations were running in Austin just before the hurricane hit and the transfer was seamless.”
- “Thanks, our company is doing just fine. With our replicated data to one of our other locations, we were up and seeing patients once the patients could get to us. We appreciate your concern, and your overall support of our organization. On behalf of our organization, we want to say thank you!”
- “Yes we did make it out alive; we activated our business contingency plan, and relocated to Dallas. Luckily our solution allowed us to failover and business continued.”
Exercising the business continuity plan on a regular basis helped these companies not only be prepared but also be assured that they were ready for anything. And with the adoption of new technologies for IT infrastructure, those plans are even easier to exercise while minimizing the impact on production operations. In previous years, testing business continuity plans for the data center usually required shutting down the entire production facility and running through the restoration process. With the adoption of real-time replication software, co-location facilities and virtualization, testing can be accomplished with minimal impact on the production environment. If you have a dedicated disaster recovery facility with hot standby servers, you can simply segment the two networks from each other and bring the recovery site online. However, you have to be very careful to make sure the two sites are not talking to each other via domains or Active Directory services.
How Dynamic Infrastructure Is Being Used to Facilitate BCP Exercises
Dynamic Infrastructure is defined by some as ‘the ability to rapidly move and provision workloads with security and inherent protection’. It may be a new idea to you, but it is being adopted within the IT community with great success. Dynamic Infrastructure not only simplifies disaster recovery procedures for data center managers, but also lets them use those same controls for day-to-day operations to keep business operations available all the time - not just during disasters. With virtualization technologies saving costs on hardware, power and cooling, data center management budgets can be combined with BCP budgets to maximize infrastructure availability. These technologies also assist BCP exercises by simulating recovery servers and sites without bringing down production servers. Some solutions, like VMware® Site Recovery Manager, have this feature but also have some inherent issues. For instance, in the event of a real disaster the virtual solution doesn’t have any failback capability. Typically, once that process has been started there is no turning back without a complete restoration, which could take days depending on the number of systems or the volume of data that needs to be restored. Dynamic Infrastructure provides the functionality that others are missing and allows rapid failback for smaller or “little d” disasters, which are more likely to impact a business-critical system.
The Next Generation of BCP
With future technology delivering Dynamic Infrastructure, cloud computing and mobile communication devices, learning how they can protect IT infrastructure for business continuity planning has never been more important. Many management services already offer remote or mobile access for initiating some of these data center management functions. Imagine if you could initiate a failover of a server from your iPhone or BlackBerry®. The reality is that it isn’t very far off. It’s possible that many business-critical services could be run via cloud computing so that they are available anywhere they are needed - even if there is a disaster at the production facility.
However, this raises a question: who is protecting the cloud, and what is their business continuity plan?