What is a Disaster Recovery Policy? | Microbyte

Recovering from a disaster requires proper planning. As the cliché goes, ‘Failing to plan is planning to fail’, a truism that applies equally to incident response in a business environment.

When critical business operations are under attack, having a disaster recovery policy is essential to establish the correct response. As a high-level strategic document, the policy informs key stakeholders, managers, and employees about planned responses to incidents, such as natural disasters, cybercrime, fire at the premises, internal sabotage, and more.

Disaster recovery is a top-down process. Once the strategic approach is clarified, other company documents are assembled. These include a disaster recovery plan, which outlines the operational steps in response to an incident. Then, the business continuity plan follows.

A business continuity plan lays out the initial recovery procedures to minimise potential business impact and reduce recovery times. It instructs employees how to manage various issues to get operational again, while other personnel create permanent fixes.

So, let us get started.

The Components of a Disaster Recovery Policy

What is a disaster recovery policy? Let us first define the necessary components of the policy:

Purpose and Scope

The Purpose: The disaster recovery policy’s purpose establishes why it has been created and what it intends to cover.

For a smaller business, the purpose may be limited to confirming valid approaches to business resilience as they relate to the business and the industry it operates within.

The Scope: The scope determines how far the boundaries of a disaster recovery policy extend. A detailed review of the disaster recovery plan draft confirms what assets and infrastructure the organisation aims to recover.

The policy clarifies the procedures and applicable rules. The above plan will differ depending on the type of disaster – malware intrusion, on-site fire, sustained power outage, etc.

Roles and Responsibilities

It is not enough to fully appreciate both the purpose and scope of a disaster recovery policy. Who is assigned to respond to a critical event and what their responsibilities will be are two more pieces to the disaster recovery puzzle.

The Roles: The assigned roles clearly state who is responsible for different aspects following a disaster. In broad terms, each person is given their expected initial actions and post-event follow-up actions. This also includes both policy enforcement matters and any related action steps.

In certain scenarios, an individual may be responsible for a specific area, while in other cases, it could be a small disaster recovery team or a single person working with their direct report. These roles extend beyond IT staff and also encompass management-level and other operations staff.

Note: Anticipation of potential staffing issues is required. This includes an unexpected illness, planned and unplanned leave, and periods where one staff member has left the company and a replacement is still being sought.

For effective disaster recovery, all relevant parties must be clear about assigned roles. This avoids the ‘headless chicken’ effect of people running around, not knowing who should be doing what. Everyone must know their roles, without exception. Also, selected personnel must have sufficient training and knowledge to complete their tasks. Only merit-based assignments apply here.
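
As a rough illustration of the staffing point above, the sketch below checks whether every role still has someone available when named people are absent. The role names, the people, and the `RoleAssignment` structure are all hypothetical; a real policy would hold this information in whatever format the business already uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RoleAssignment:
    """One disaster recovery role with a primary owner and ordered deputies."""
    role: str
    primary: str
    deputies: List[str] = field(default_factory=list)


def responder_for(assignment: RoleAssignment, unavailable: Set[str]) -> Optional[str]:
    """Return the first available person for the role, or None if nobody is free."""
    for person in [assignment.primary, *assignment.deputies]:
        if person not in unavailable:
            return person
    return None


# Hypothetical assignments: the second role has no deputy named.
assignments = [
    RoleAssignment("Incident lead", "Priya", ["Tom"]),
    RoleAssignment("Backup restoration", "Sam"),
]

# Hypothetical absences, e.g. illness or leave.
unavailable = {"Priya", "Sam"}

# Roles with nobody left to cover them need a deputy adding to the policy.
uncovered = [a.role for a in assignments if responder_for(a, unavailable) is None]
```

Here the incident lead role falls to the deputy, while backup restoration is left uncovered, which is the kind of gap the policy should close before an incident occurs.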

The Responsibilities: Actions to take during a disaster and post-disaster fall under responsibilities. Such responsibilities require specificity regarding business infrastructure or other potentially affected areas.

Emergency procedures cover different sections of the business infrastructure e.g. if part of a company network is disrupted rather than the entire network. Assigned personnel must have the necessary training to manage all major issues relating to their areas of responsibility, not only minor aspects.

All company employees must receive sufficient disaster recovery training, advice, and support. This way, they will have confidence in managing their respective areas. It also prevents employees from bombarding IT staff, managers, or supervisors with questions when these personnel need to focus on problem-solving.

Definitions of Potential Events: Various potential events can occur. Below are a few of them. Because of their distinctions, these would be specified in recovery documents, typically expanded upon individually.

Here are a few examples:

On-Site Fire – A localised fire puts personnel at risk. Evacuation policies help ensure that everyone gets safely out of the building. However, IT hardware and other essential equipment are put at risk. On-site backups may be damaged or permanently lost. Servers go offline. Data centres become compromised, etc.

Responses to a fire differ from those to another crisis, such as a failing cloud server. The severity of the fire is elevated because of the risk to on-site personnel and business infrastructure.

Cloud Data Disaster – A failure of cloud data storage may compromise data stores, website operations, Software-as-a-Service facilities, and other functionality.

A cloud disaster recovery response may include temporarily working around the problem or taking some customer-facing services offline. Remote problem resolution is one possibility. Cloud data disaster severity depends on what is initially lost, its disruption to ongoing operations, and whether data is recoverable from backups. 

Network Intrusion – An attack on the company’s network is managed with extreme urgency. The potential for significant damage is high.

Responses to a network intrusion include securing network endpoints and removing the intruder. Cyberattack monitoring tools can respond to some threats automatically, while other actions are performed manually. The severity is typically high and depends on the type and depth of network penetration.

Difference Between Disaster Recovery Policy, Plans, and Business Continuity

For larger companies or substantial operations, there should be a disaster recovery policy, a disaster recovery plan, and a business continuity plan. Some small to medium-sized enterprises (SMEs) may produce a centralised document that includes facets of each.

A disaster recovery policy represents a serious, long-term commitment. To be blunt, it is not like a company’s ‘Mission Statement’ that is written, published, and instantly forgotten. Instead, it is closer to a living document that sets the framework. It details the high-level strategic approach that the company takes to its disaster recovery and best practices.

This is not to be confused with a disaster recovery plan, which sets out the step-by-step actions to take, depending on the unfolding disaster and its aftermath. People often conflate a recovery policy with a plan, but they are distinct.

Considerable integration exists between disaster recovery policy and plans, and business continuity plans. They never stand in isolation. They fit together neatly as follows:

  • The disaster recovery policy states how the business expects to manage any disaster, who will do what, and why.
  • The disaster recovery plan gets into the weeds about what actions to take, based on the disaster type. It details the precise action steps necessary to get the business fully operational again.
  • The business continuity plan conveys the company’s response to various disaster scenarios. These include using alternative resources or spinning up other services to keep the business minimally operational until the recovery plan is completed.

Talk with our team at Microbyte to get help creating or updating your disaster recovery policy.

Steps to Create a Disaster Recovery Policy

Here are the broad steps to creating a disaster recovery policy.

Risk Assessment and Business Impact

Carry out a fact-based analysis of all business functions essential to the operation. Do this in conjunction with a review of the company’s asset inventory, and discuss it with the operations manager to get their input. Cast a wide net when identifying who within the organisation, and potentially outside it, has some bearing on critical operations.

Determine which types of disasters impact each of the business functions. Expect some overlap where one business function is potentially compromised in several disaster scenarios. This is completely normal. The resulting mapping determines whether each function should be included in the disaster recovery plan.
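
The mapping described above can be sketched in a few lines. The disaster types and business functions below are invented examples, not a recommended list; the point is only to show how overlaps surface once the mapping is inverted.

```python
from collections import defaultdict

# Hypothetical mapping: disaster type -> business functions it would disrupt.
impact_by_disaster = {
    "on-site fire": {"order processing", "payroll", "customer support"},
    "network intrusion": {"order processing", "email"},
    "cloud storage failure": {"order processing", "customer support"},
}


def scenarios_per_function(impacts):
    """Invert the mapping: business function -> sorted list of disasters affecting it."""
    result = defaultdict(list)
    for disaster, functions in impacts.items():
        for function in functions:
            result[function].append(disaster)
    return {function: sorted(disasters) for function, disasters in result.items()}


exposure = scenarios_per_function(impact_by_disaster)

# Functions compromised in more than one scenario are the expected overlaps.
overlaps = sorted(f for f, disasters in exposure.items() if len(disasters) > 1)
```

In this example, order processing and customer support appear under several scenarios, so they are strong candidates for inclusion in the disaster recovery plan.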

Recovery Objectives Definitions

Two recovery metrics are relevant here:

Recovery Point Objective (RPO) – This specifies the maximum amount of data, measured in time, that the business can afford to lose. It marks how far back the most recent usable backup may be, so it drives the frequency of data backups: an RPO of four hours, for example, calls for backups at least every four hours.

Recovery Time Objective (RTO) – This objective states the maximum allowable downtime from a disaster during which the company can still recover successfully. It affects potential financial losses, customer satisfaction ratings, and brand longevity.
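
To make the two metrics concrete, here is a minimal sketch that checks an incident against both objectives. It assumes the common reading of RPO as the tolerable data-loss window and RTO as the tolerable downtime; the timestamps and the four-hour targets are hypothetical figures chosen for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Return how much recent data is lost if the last backup is restored."""
    return incident - last_backup


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True when the data lost since the last backup stays within the RPO."""
    return data_loss_window(last_backup, incident) <= rpo


def meets_rto(incident: datetime, restored: datetime, rto: timedelta) -> bool:
    """True when the downtime from incident to restoration stays within the RTO."""
    return restored - incident <= rto


# Hypothetical timeline: nightly backup at 02:00, incident at 14:30, service back at 17:00.
last_backup = datetime(2024, 3, 1, 2, 0)
incident = datetime(2024, 3, 1, 14, 30)
restored = datetime(2024, 3, 1, 17, 0)

rpo_ok = meets_rpo(last_backup, incident, rpo=timedelta(hours=4))
rto_ok = meets_rto(incident, restored, rto=timedelta(hours=4))
```

With these figures the service is back within the RTO (2.5 hours of downtime), but 12.5 hours of data is lost against a four-hour RPO, which suggests that nightly backups are too infrequent for that target.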

Drafting the Policy

The disaster recovery policy must include the following:

  • Recovery process steps to take (based on different disaster scenarios).
  • Assigned roles and task responsibilities.
  • All emergency contact details.
  • Planned monitoring and improvements to the policy document.
  • Document version control.
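
One lightweight way to keep a draft honest is to check its headings against the list above. The sketch below assumes the draft's section headings are available as plain strings; the heading wording and the `missing_sections` helper are illustrative, not a prescribed template.

```python
# Section names paraphrased from the list above; the exact wording is an assumption.
REQUIRED_SECTIONS = (
    "recovery process steps",
    "roles and responsibilities",
    "emergency contacts",
    "monitoring and improvements",
    "version control",
)


def missing_sections(draft_headings):
    """Return the required sections that do not appear among the draft's headings."""
    present = {heading.strip().lower() for heading in draft_headings}
    return [section for section in REQUIRED_SECTIONS if section not in present]


# Hypothetical draft that has not yet covered contacts or ongoing monitoring.
draft = ["Recovery Process Steps", "Roles and Responsibilities", "Version Control"]
gaps = missing_sections(draft)
```

Running the check on each new version of the document gives a quick signal that a required section has been dropped or renamed.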

Need help drafting your company’s disaster recovery policy? Our team at Microbyte is ready to assist you.

Testing and Maintaining Your Policy

As stated above, the policy requires regular reviews and updates. Updates adjust the company’s posture to account for new threats, different cyberattack vectors, personnel changes, and other relevant matters.

Dynamic changes are necessary to avoid a situation where a disaster occurs, and only then is the policy reviewed and found to be sorely lacking. At this point, it is too late.

To determine whether the existing policy and plans are robust enough, it is important to test. Tests should include several of the following:

  • Service Interruption Tests: What happens after a service interruption? Where is the policy lacking?
  • Checklist Tests: Run through checklists to confirm that they are correct and complete.
  • Parallel Tests: Bring the recovery systems up alongside production and confirm that they can carry the workload without interrupting live operations.
  • Simulation Tests: Put the policy through its paces as if it were a ‘real world’ scenario. Is it effective, or does the policy require a major overhaul? If so, make the required changes and re-run the simulation.

Case Studies and Real-World Applications

To bring disaster recovery documentation into a real-world scenario, we provide two disaster recovery examples:

Airline System Crash (Cyberattack): An airline system crash affects passenger check-in for flights and new bookings. The disaster recovery policy confirmed how the airline approached system crashes because of a cyberattack. Employees knew what was expected and who was responsible for the resolution. A section on managing customer expectations advised them on what to say.  

The disaster recovery plan advised how the team should respond to a system crash, determine the cause, resolve it, and restart the system. Because a cyberattack would not resolve quickly, employees were advised accordingly. The business continuity plan included a section on system crashes and instructed staff to avoid quick fixes like rebooting the system. Instead, the plan advised how to manually check in passengers for flights using paper-based forms.

Port Authority Fire: A fire at a port authority – a major transportation transit point – is a huge safety issue. The disaster recovery policy prioritised safety over other concerns. This included following an evacuation plan and letting first responders extinguish the fire.

The disaster recovery plan confirmed the steps that Port Authority employees were to take in the event of a fire. These included contacting the fire safety officer, beginning a port evacuation, and notifying the emergency services. Subsequently, the port required a thorough safety assessment before reopening.

With safety issues addressed, system recovery began. This included assessing any fire damage to IT systems, equipment, and infrastructure. The continuity plan involved operating at a lower capacity within some parts of the port, while safety concerns and structural repairs were ongoing elsewhere.

Conclusion

By establishing a disaster recovery policy, plan, and continuity plan, companies can prevent potential panic, provide strategic oversight, and detail plans to manage any potential failures. Research confirms that many businesses do not survive post-disaster due to a lack of or inadequate planning.

Business disaster recovery plans involve hard work, yet they provide a safer environment for people on-site. Disaster strikes unexpectedly, and panic in a crisis is dangerous.

Unsure where to start? Discuss your disaster plans with our talented team at Microbyte. 

Contact us today. 
