A well-tuned, always-on IT infrastructure is no longer a nice-to-have; today’s businesses require one just to survive. But any organization can fall prey to disaster – be it from natural causes like fire, flood or hurricane, man-made problems like a denial-of-service or malware attack, or an accident such as a mass file deletion.
You might think that most businesses would be prepared – especially given the increasing number of high-profile cyber-attacks, major weather events, and other business disruptions that have made the news lately. But you’d be mistaken.
In fact, though many organizations have some kind of business continuity plan, only 27 percent of companies surveyed by the Disaster Recovery Preparedness Council scored a passing grade when it comes to disaster readiness. That means nearly three out of four businesses are not ready.
And because the costs of system downtime can be extreme, the effectiveness of your response can determine whether your organization keeps going, meets regulatory requirements, or fails completely. Consider that more than a third of businesses reported that they had experienced at least one outage of a critical business application or a loss of vital data files in the previous year – and that nearly 20 percent of those who’d experienced such a disruption reported losing at least $50,000, with some reporting losses of more than $5 million.
Ensuring ongoing operations when a disaster or outage occurs is where business continuity management comes into play. Traditional business continuity management involves three separate phases:
- Conducting a business impact analysis (BIA): Essentially, a BIA lays the foundation for the organization’s business continuity and disaster recovery plans. It identifies the critical business functions that must be restored first when an event occurs and sets out the requirements that recovery must meet, such as recovery point objectives (RPOs) and recovery time objectives (RTOs). An RPO is the maximum amount of data loss, measured in time, that the business can tolerate; an RTO is the maximum time a function can stay down.
- Creating a business continuity plan (BCP): The next step is to create a BCP. A BCP describes the actions needed to keep critical business processes running after a disruption. It also spells out strategies the organization will use to maintain operations by identifying all the resources necessary to support each business function – processes, equipment, software, connectivity, and so on – as well as how the organization’s employees will continue to do their jobs.
- Creating a disaster recovery plan (DRP): The DRP lists the actions that must be taken to recover resources after a disaster, restoring conditions to full functionality and normal business operations. The process of creating a DRP involves threat analysis and impact scenarios, to fully understand the types of disasters that may strike the organization, and recovery requirements documentation.
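To make the BIA step more concrete, here is a minimal Python sketch of how its output might be recorded and used to decide what to restore first. The `BusinessFunction` class, the `recovery_order` helper, and the sample functions and targets are all hypothetical and serve only as illustration; a real BIA would also capture dependencies, owners, and financial impact.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BusinessFunction:
    name: str
    rto: timedelta  # recovery time objective: maximum tolerable downtime
    rpo: timedelta  # recovery point objective: maximum tolerable data-loss window


def recovery_order(functions):
    """Return the functions sorted so the tightest RTO (then RPO) comes first."""
    return sorted(functions, key=lambda f: (f.rto, f.rpo))


# Hypothetical BIA entries; names and targets are assumptions for illustration.
bia = [
    BusinessFunction("payroll", rto=timedelta(hours=24), rpo=timedelta(hours=4)),
    BusinessFunction("order processing", rto=timedelta(minutes=15), rpo=timedelta(minutes=5)),
    BusinessFunction("email", rto=timedelta(hours=4), rpo=timedelta(hours=1)),
]

for function in recovery_order(bia):
    print(f"{function.name}: RTO {function.rto}, RPO {function.rpo}")
```

Sorting by RTO is only one possible prioritization rule; an organization might equally weight revenue impact or regulatory exposure.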
The goal of a good business continuity plan is to ensure that data access, employee productivity, customers, partners, and the organization’s bottom line are not materially affected by the disaster. The plan must contain instructions for every possible type of disaster an organization might encounter. At a minimum, the plan should include:
- A definition of the response team, including roles, responsibilities, and accountabilities of each member
- Strategies for avoiding downtime, such as continuous data backups and high availability
- The location of alternate facilities for business recovery and short- and long-term operations
- Detailed procedures for the response team, including periodic testing and fire drills
- Information for employees, such as where they will work (from home, for example), where to find a backup copy of the employee phonebook, where to get work assignments (a separate portal, for example), and how to access critical files during the disruption
Innovating the process
It’s also important to think beyond the generic steps of a business recovery plan and look for innovative ways to improve processes. For example, a key feature of most business continuity plans is the use of hot sites, warm sites, and cold sites.
A hot site essentially replicates the organization’s IT infrastructure, applications, and data. A cold site is typically an empty data center with power, heating, and cooling. A warm site is something in between, usually with connectivity and perhaps servers and data storage.
The advantage of a hot site is that it can take over operations immediately, especially if you run a continuous backup solution or mirror your operations. The problem, as you can imagine, is that hot sites are expensive. However, disaster-recovery-as-a-service (DRaaS) changes that. DRaaS replicates your infrastructure in the cloud, usually at a more acceptable cost – especially if your RTO is minutes rather than hours.
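The link between RTO and site type can be sketched in a few lines of Python. The cutoffs below (`HOT_MAX_RTO`, `WARM_MAX_RTO`) and the `suggest_site` function are assumptions made up for this example, since no numeric thresholds are given here; treat it as a starting point for discussion, not a sizing rule.

```python
from datetime import timedelta

# Assumed cutoffs for illustration only; real thresholds depend on the business.
HOT_MAX_RTO = timedelta(hours=1)
WARM_MAX_RTO = timedelta(hours=24)


def suggest_site(rto):
    """Suggest a recovery site tier ("hot", "warm", or "cold") from an RTO."""
    if rto <= HOT_MAX_RTO:
        return "hot"   # must take over almost immediately
    if rto <= WARM_MAX_RTO:
        return "warm"  # connectivity and some hardware already in place
    return "cold"      # empty facility is acceptable; rebuild takes time


print(suggest_site(timedelta(minutes=10)))
print(suggest_site(timedelta(hours=8)))
print(suggest_site(timedelta(days=3)))
```

In practice cost would be a second input: a DRaaS offering may make the "hot" tier affordable for functions that would otherwise be assigned a warm site.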
Test, test, and keep testing
Before releasing a BCP or DRP company-wide, test it thoroughly to ensure it works as expected and that it meets all critical business requirements detailed in the BIA. You’ll need to perform several different tests, such as checklists, structured walk-throughs, simulations, and full interruptions.
Where the first three reveal shortcomings in a safe manner, the full interruption test involves pulling the plug on a major production system to make sure failover works. A full interruption test can be disruptive – so much so, in fact, that it takes guts to run one. But it provides valuable insight into the problems you would face during an actual disaster.
Finally, BC/DR plans must be reviewed periodically – at least annually – and updated to reflect the organization’s current situation. In addition, any major change to the organization or its IT infrastructure should trigger a review.
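One simple way to judge whether a test "meets the requirements detailed in the BIA" is to compare what the test measured against the RTO and RPO. The sketch below assumes you record how long recovery took and how much data (in time) was lost; the `meets_objectives` function and the sample numbers are hypothetical.

```python
from datetime import timedelta


def meets_objectives(measured_recovery, measured_data_loss, rto, rpo):
    """Return True only if both measurements are within their targets."""
    return measured_recovery <= rto and measured_data_loss <= rpo


# Hypothetical targets taken from a BIA.
rto = timedelta(hours=1)
rpo = timedelta(minutes=15)

# Hypothetical results from two test runs.
print(meets_objectives(timedelta(minutes=42), timedelta(minutes=10), rto, rpo))
print(meets_objectives(timedelta(minutes=90), timedelta(minutes=10), rto, rpo))
```

A failed check is a useful outcome in itself: it shows either that the plan needs work or that the targets in the BIA were unrealistic.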
For a deeper conversation about business continuity strategies, get in touch.