Updated: 18 January, 2024
4 May, 2023
What is a disaster recovery plan?
A disaster recovery plan is your plan for when the dirt hits the fan.
Security pundits like to say things such as “It is not a question of whether your systems will be compromised but rather a question of when.” Sure, a strong security posture decreases the risk, but being ready for the worst-case scenario is an integral part of any mature security posture.
The worst thing that could happen is not “getting hacked”, it is “getting hacked and not knowing what to do”. So, long before the clouds gather on the horizon, we think through a broad range of potentially disastrous scenarios. The result of this exercise is a practical playbook on what to do when disaster strikes. This playbook is called the disaster recovery plan.
- Use the template to make sure you don’t forget anything in the DRP
- Manage multiple DRPs and company guidelines in SAMMY
- The DRP is about recovery; the BCP covers temporary continuity measures. They may overlap, and that’s OK.
What should be included in my disaster recovery plan?
Your disaster recovery plan needs to identify all your assets (hardware, software and data), roles (responsibilities) and procedures (actions) for different scenarios. A categorization of criticality of assets helps to define the priorities in the response actions. On top of that we must prepare for a wide range of incidents that have varying impacts on infrastructure and data. How do we make sure we don’t forget anything?
The Computer Security Resource Center of NIST (National Institute of Standards and Technology) provides extensive guidance in Special Publication 800-34. Based on this guidance we have created a SAMMY module to manage your disaster recovery plans. This module starts with a checklist of everything that is needed in your disaster recovery plan.
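To make the criticality categorization mentioned above a little more concrete, here is a minimal Python sketch of an asset inventory whose criticality drives the order of the response. The three-level scale, the asset names and the `recovery_order` helper are all hypothetical illustrations; they do not come from NIST or from SAMMY.

```python
from dataclasses import dataclass
from enum import IntEnum


class Criticality(IntEnum):
    """Hypothetical three-level scale; a lower value means recover first."""
    CRITICAL = 1
    IMPORTANT = 2
    DEFERRABLE = 3


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str          # "hardware", "software" or "data"
    owner_role: str    # role responsible for recovering the asset
    criticality: Criticality


def recovery_order(assets):
    """Return assets sorted by criticality, then by name for a stable order."""
    return sorted(assets, key=lambda a: (a.criticality, a.name))


# Made-up inventory, purely for illustration.
inventory = [
    Asset("wiki", "software", "IT support", Criticality.DEFERRABLE),
    Asset("customer-db", "data", "DBA on call", Criticality.CRITICAL),
    Asset("mail-server", "hardware", "Infrastructure lead", Criticality.IMPORTANT),
]

for asset in recovery_order(inventory):
    print(f"{asset.criticality.name:<10} {asset.name:<12} -> {asset.owner_role}")
```

Keeping the responsible role next to each asset means the same list can double as a first draft of who does what.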
Is there a template I can use for my disaster recovery plan?
You can use SAMMY’s disaster recovery plan module to guide you through the important components of the disaster recovery plan. This guidance is based on the NIST guidelines and is organized in 6 parts:
1. Supporting Information
Make sure the DRP is accessible to all stakeholders. The plan should identify the maximum tolerable downtime, recovery time objectives and recovery point objectives. It should also clearly state why the plan was developed and what its objective is.
List all essential assumptions, along with the situations to which the plan does not apply.
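The three time-based figures are easier to reason about once they are written down next to each other. The sketch below is one possible sanity check, assuming two simple rules: the recovery time objective (RTO) should not exceed the maximum tolerable downtime (MTD), and backups should run at least as often as the recovery point objective (RPO) requires. The class, the function name and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    mtd: timedelta  # maximum tolerable downtime
    rto: timedelta  # recovery time objective
    rpo: timedelta  # recovery point objective (acceptable data loss window)


def find_conflicts(objectives, backup_interval):
    """Return a list of human-readable problems; empty if the numbers agree."""
    problems = []
    if objectives.rto > objectives.mtd:
        problems.append("RTO exceeds the maximum tolerable downtime")
    if backup_interval > objectives.rpo:
        problems.append("backups are less frequent than the RPO allows")
    return problems


# Made-up figures for a hypothetical billing system.
billing = RecoveryObjectives(
    mtd=timedelta(hours=8), rto=timedelta(hours=4), rpo=timedelta(hours=1)
)
print(find_conflicts(billing, backup_interval=timedelta(hours=6)))
```

In this example the recovery time fits within the tolerable downtime, but a six-hourly backup cannot satisfy a one-hour recovery point objective, so the check reports one conflict.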
2. Concept of Operation
Provide a general description of the system architecture and functionality. Indicate the operating environment, physical location, general location of users, partnerships with external organizations and systems, and any other technical information relevant for recovery purposes.
Provide an overview of the “Three Phases”:
- Activation Phase
- Recovery Phase
- Validation Phase
Include the overall structure of the organization, the teams and the hierarchical relationships. Then describe each team or role and its responsibilities in executing, supporting and validating system recovery. Include responsibilities around coordinating across teams.
Establish the criteria for activating the plan, including the role(s) that may activate it. Provide a notification procedure for both during and outside business hours. Include procedures for the event that specific people cannot be reached.
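The "specific people cannot be reached" case is often handled with a call tree that lists fallback contacts per role. The following is a minimal sketch of that idea, not a prescribed procedure: the roles, the contact names and the `is_reachable` callback (a stand-in for whatever paging or phone system you use) are assumptions made for illustration.

```python
# Hypothetical call tree: each role lists its contacts in the order to try them.
CALL_TREE = {
    "plan activation": ["ciso", "deputy-ciso", "cto"],
    "outage assessment": ["infra-lead", "senior-sre"],
}


def notify(call_tree, is_reachable):
    """Return the first reachable contact per role, or None if nobody answers."""
    return {
        role: next((person for person in contacts if is_reachable(person)), None)
        for role, contacts in call_tree.items()
    }


# Stand-in for a real paging or phone system: pretend these people do not answer.
unavailable = {"ciso", "infra-lead", "senior-sre"}
reached = notify(CALL_TREE, lambda person: person not in unavailable)
print(reached)
```

A role that maps to `None` is exactly the gap the plan needs a written procedure for, for example escalating to the next level of management.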
Next comes outage assessment: define the outage assessment team and the criteria for assessing the disruption.
Make sure the sequence of recovery events respects the maximum tolerable downtime formulated in the business impact analysis (BIA). Include instructions for coordination and escalation activities. Have procedures in place to relocate equipment, data and records to an alternate site if needed.
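One way to check a recovery sequence against the downtime limit is to write down the dependencies between systems and the estimated recovery time of each. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to derive an order in which dependencies come first, and then compares the total time with the limit. The systems, the durations and the assumption that recovery happens strictly one system after another are hypothetical simplifications.

```python
from datetime import timedelta
from graphlib import TopologicalSorter

# Hypothetical systems: name -> (estimated recovery time, systems it depends on)
SYSTEMS = {
    "network": (timedelta(hours=1), set()),
    "database": (timedelta(hours=2), {"network"}),
    "web-shop": (timedelta(hours=1), {"database", "network"}),
}


def plan_sequence(systems):
    """Return a recovery order in which every dependency precedes its dependents."""
    graph = {name: deps for name, (_, deps) in systems.items()}
    return list(TopologicalSorter(graph).static_order())


def fits_within(systems, max_downtime):
    """Pessimistic check: assumes systems are recovered one after another."""
    total = sum((duration for duration, _ in systems.values()), timedelta())
    return total <= max_downtime


print(plan_sequence(SYSTEMS))
print(fits_within(SYSTEMS, max_downtime=timedelta(hours=6)))
```

If teams can recover independent systems in parallel, the sequential total overestimates the outage, so treat this check as a conservative upper bound.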
Provide a clear step-by-step guide with checklists for the recovery procedure. Make sure to identify the team/role for each task.
Make sure the procedure and thresholds for escalation are clear. Again, assign every task to a specific team/role for clear ownership.
Provide relevant procedures for each of the following 3 stages of testing. Assign all activities to specific teams or roles.
- Testing and validating that the data has recovered completely at the permanent location.
- Testing that the different functionalities of the system have fully recovered.
- Testing that the security systems and controls are operating correctly.
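The three stages above can be sketched as a small ordered checklist runner. This is only an illustration under stated assumptions: the team names and the placeholder checks are made up, and stopping at the first failed stage is a design choice of this sketch rather than something the guidance prescribes.

```python
def run_validation(stages):
    """Run stages in order and stop at the first failure.

    Each stage is a (name, responsible_team, check) tuple, where check is a
    zero-argument callable that returns True on success.
    """
    results = []
    for name, team, check in stages:
        passed = bool(check())
        results.append((name, team, passed))
        if not passed:
            break
    return results


# Hypothetical checks; real ones would compare record counts, run smoke tests
# and verify that logging, access control and monitoring are active again.
STAGES = [
    ("data restored completely", "database team", lambda: True),
    ("functionality recovered", "application team", lambda: False),
    ("security controls operating", "security team", lambda: True),
]

for name, team, passed in run_validation(STAGES):
    print(f"{'PASS' if passed else 'FAIL'}  {name} ({team})")
```

Recording the responsible team with each result keeps ownership visible in the test report.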
Define the procedure to formally declare the recovery effort complete, including instructions on how to notify appropriate stakeholders. Provide a checklist for dismantling any temporary infrastructure and readying the system for another event.
Include a plan with a checklist to restart backup procedures.
Lastly, provide the procedure for documenting the event. Which information has to be provided and collected by each DRP team member? Define procedures and responsibilities for the development, collection, approval and maintenance of event documentation.
You should include the following appendices:
- Personnel Contact List: The personnel contact list ensures efficient communication and coordination during a disaster. It provides easy access to the individuals responsible for executing the various tasks outlined in the DRP.
- Vendor Contact List: The vendor contact list helps facilitate communication and collaboration with external entities during the recovery process.
- Detailed Recovery Procedures: This appendix provides in-depth recovery procedures for specific systems or processes outlined in the DRP. Detailed recovery protocols provide consistency and accuracy in recovery job execution, decreasing delay and the effect of a disaster.
- Alternate Processing Plan: In the event of a disaster, organizations may need to establish alternate processing methods or locations to ensure business continuity.
- Validation Test Plan: This appendix provides a comprehensive test plan for validating the effectiveness and functionality of the DRP. It ensures that the DRP is regularly tested and updated so that it stays relevant.
- Alternate Storage Locations: Alternate storage locations ensure the availability and integrity of critical data during a disaster.
- Diagrams: These diagrams of the DRP help stakeholders gain a clear understanding of the organization’s infrastructure and data flows.
- Inventory: For the successful recovery of systems and operations, a complete list of important assets is necessary. The inventory appendix provides this list, including hardware, software, and infrastructure components.
- Interconnections Table: The interconnections table aids in identifying essential paths, prioritizing recovery activities, and assuring a thorough recovery strategy.
- Test and Maintenance Associated Documentation: This appendix provides a historical record of previous test exercises. It enables companies to draw lessons from the past, monitor progress, and pinpoint areas for improvement.
- Business Impact Analysis Document: A disaster’s potential effects on crucial business processes and resources are evaluated in the business impact analysis (BIA) document.
How do I know if my disaster recovery plan is good enough?
SAMMY uses a weighted scoring system based on the relative priority, risk profile and appetite factors of the organization or the product. This scoring system allows you to easily benchmark and set a “good enough” threshold for all business units and teams.
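To illustrate the general idea of weighted scoring against a threshold, here is a minimal Python sketch. It is not SAMMY's actual algorithm: the section names, weights, completeness values and the 0.7 threshold are all invented for the example.

```python
def weighted_score(completeness, weights):
    """Weighted average of per-section completeness values between 0.0 and 1.0.

    Sections that have a weight but no completeness value count as 0.0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    achieved = sum(completeness.get(section, 0.0) * w for section, w in weights.items())
    return achieved / total_weight


# Hypothetical weights reflecting how much each part matters to this organization.
weights = {"supporting information": 1, "activation": 3, "recovery": 4, "validation": 2}
completeness = {"supporting information": 1.0, "activation": 0.5, "recovery": 0.75}

score = weighted_score(completeness, weights)
GOOD_ENOUGH = 0.7  # made-up threshold
print(f"score={score:.2f} good_enough={score >= GOOD_ENOUGH}")
```

Here the missing validation section pulls the score down to 0.55, below the example threshold, which is the kind of signal a "good enough" benchmark is meant to give.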
How do I manage multiple disaster recovery plans within an organization?
Depending on how you scope your disaster recovery plans you may end up with several plans in the same organization. SAMMY allows you to easily manage multiple plans in parallel with OWASP SAMM management. The tool also makes ownership and responsibilities across individuals clear and transparent.
Disaster Recovery Plan Training.
The teams that execute the different components of the plan have to be familiar with it. To make that happen, we organize disaster recovery training. These training sessions resemble war games and improve the execution of the plan. They are also used to test the plans and to further refine them based on the practical experience of the test runs.
What is the difference between a disaster recovery plan (DRP) and a business continuity plan (BCP)?
Your business continuity plan is a set of often temporary measures that minimize the impact on the business while information systems are being recovered. You can think of the BCP as the spare wheel you carry in your car, while the DRP is getting the car back to its original state in the garage. In smaller organizations the DRP and BCP may be parts of the same document. When dealing with critical infrastructure there will also be a continuity of operations plan (COOP). The COOP is similar to the BCP but specific to the most critical operations.
Wait, what? How many plans are there?
Well, I’m glad you asked! There are up to 8 plans recommended by NIST, but most organizations don’t need all of them. The number of plans you have will depend on the scale of your business, the nature of your activities, your security maturity and your process maturity.
The total picture looks like this:
In the diagram, the COOP and BCP are red and labeled “process focussed”. Their objective is to allow business processes to continue. These are often temporary measures.
The disaster recovery plan (DRP) is blue and labeled “system focused”. These processes are about getting the information systems back up and running.
The other system focused plans are:
- Critical Infrastructure Protection (CIP) Plan: A specific plan for protecting critical infrastructure.
- Information System Contingency Plan (ISCP): A more granular plan dealing with a specific information system. Your general DRP may refer to several ISCPs based on what is affected in the incident.
- Cyber Incident Response Plan (CIRP): This is your fight plan for dealing with active attacks.
Then there are two green plans labeled “People Focussed”. These are about protecting and communicating with the people inside and outside your organization.
- Crisis Communication Plan: Covers both internal and external communication.
- Occupant Emergency Plan (OEP): Covers the physical safety of anyone affected by an incident.
SAMMY makes things easy.
We believe in a simple and safe digital future. In the face of complexity, structure is your friend. And that is what SAMMY brings, be it for your Disaster Recovery Plan Management or your OWASP SAMM management.
And SAMMY is free to use. You can try it out here: