Importance of Having DR Procedures
BlackBerry outage highlights poor state of disaster recovery and backup procedures
By: Ranko Mosic
Oct. 13, 2011 02:15 PM
The recent BlackBerry outage is a nightmare for both RIM and its users. The nature and mass popularity of the BlackBerry service make this particular outage highly visible. There are many other in-house outages we never hear about; companies prefer to keep silent about them if they can. The only insight we get into the poor state of DR procedures comes from public cases like this one, or from limited personal experience. In my Oracle database infrastructure consulting career I have seen several serious production outages and taken part in a few data recovery efforts. Backups and DR procedures are simply not as high on the IT priority list as development, production support and new projects. They are frequently considered a chore: mechanical, uninteresting work. The Oracle database works fine most of the time; it is a very reliable and robust product. But when disaster strikes, whether through human error or an external cause such as hardware failure, recovery is often difficult.
In my experience, a fairly large percentage of Oracle database backups in the average enterprise fail every day and nobody even notices. Recovering a failed Oracle production database is not a simple task, and most database administrators cannot do it properly. There are many possible variations and cases in an Oracle database recovery. A smart, experienced, collected and well-trained DBA will perhaps be able to recover the database if all the elements are properly aligned and available. Loss of data, incomplete recovery or even an ad hoc rebuild of the production environment is a very real possibility. Why is the current state of backups and DR so poor?
Hardware and software are inherently unstable. Switches fail, SANs and disks fail, software is buggy, systems are complex and staff lack skills; together these sometimes create the perfect storm that brings production systems down. Backups are poorly designed and executed, and many companies still back up to tape. DR facilities and procedures are supposed to provide protection against production system failures and human mistakes, yet DR sites are often half-ready and out of sync with the production environments they are supposed to shadow. Staff are not specialized enough and not well trained either. Many companies still do not have a dedicated training environment where DBAs can test various recovery scenarios, apply patches, test upgrades, learn new features and so on.
How to improve?
Start with better designed, executed and monitored backups and DR procedures. Back up to disk rather than tape. Test backups and restores, and hire skilled staff. It is better to have a small team of highly skilled DBAs than a large team you cannot rely on. If you have no internal resources, use a professional specialized service to design and manage backups and DR for you. Run regular DR drills in which various scenarios are tested. Set up training environments where DBAs can practice different scenarios: applying patches, upgrades and restores. Be aware that black swans, rare negative events, have a huge impact that is inversely proportional to their frequency, and prepare for them.
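As a rough illustration of what "monitored backups" could mean in practice, here is a minimal Python sketch that flags databases with no recent successful backup. It is not tied to Oracle or to any particular backup tool: the `BackupRun` record, the `find_unprotected` function and the 24-hour threshold are all hypothetical, and a real check would read the run history from whatever catalog or log your backup software keeps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRun:
    """One backup attempt (hypothetical record, not a real tool's format)."""
    database: str
    finished_at: datetime
    succeeded: bool


def find_unprotected(runs, now, max_age=timedelta(hours=24)):
    """Return {database: reason} for databases lacking a recent good backup."""
    last_ok = {}   # database -> time of its most recent successful backup
    seen = set()   # every database that appears in the run history
    for run in runs:
        seen.add(run.database)
        if run.succeeded:
            prev = last_ok.get(run.database)
            if prev is None or run.finished_at > prev:
                last_ok[run.database] = run.finished_at

    problems = {}
    for db in sorted(seen):
        ok_time = last_ok.get(db)
        if ok_time is None:
            problems[db] = "no successful backup on record"
        elif now - ok_time > max_age:
            problems[db] = f"last successful backup is {now - ok_time} old"
    return problems


if __name__ == "__main__":
    # Assumed sample history: database names and times are made up.
    now = datetime(2011, 10, 13, 12, 0)
    history = [
        BackupRun("sales", datetime(2011, 10, 13, 2, 0), True),
        BackupRun("hr", datetime(2011, 10, 10, 2, 0), True),
        BackupRun("hr", datetime(2011, 10, 13, 2, 0), False),
        BackupRun("crm", datetime(2011, 10, 13, 3, 0), False),
    ]
    for db, reason in find_unprotected(history, now).items():
        print(f"ALERT {db}: {reason}")
```

The point of the sketch is that the check looks for the absence of a recent success rather than the presence of a failure message, so a backup job that silently stops running is caught as well. It says nothing about whether a backup can actually be restored; that still requires the restore tests and DR drills described above.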