Internal use · Facilitator may duplicate slides or print this page

Tabletop disaster recovery exercise

A structured, discussion-based walkthrough of how your team would respond when critical systems and backups are under pressure. No live changes—only clarity, gaps, and priorities.

Suggested duration: 90–120 minutes · Format: Facilitator-led, in a room or over video · Prep: RTO/RPO sheet, system inventory, contact tree

Purpose and outcomes

By the end of the session, the team should agree on:

  • Who decides what during a major incident (declaring disaster, freeze vs continue, comms).
  • What “recovery” means for your top applications (restore scope, order, acceptable data loss).
  • How backups fit the story: local appliance, offsite replication, and proof that restores are viable (tests, screenshots, virtualization drills); a minimal verification sketch follows this list.
  • Obvious gaps—missing runbooks, single points of failure, unclear vendor or legal steps—and one or two owners for follow-up.
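
One way to make the "proof that restores are viable" point concrete before the session is to check when each critical system was last restored for real. The sketch below assumes a hypothetical restore_proof_log.csv with system, last_verified_restore, and method columns and a 90-day threshold; it is illustrative only and not tied to any particular backup product.

```python
# restore_proof_check.py -- illustrative only; the file name, columns, and
# threshold are hypothetical and should match however you record drills.
from __future__ import annotations

import csv
from datetime import datetime, timedelta

MAX_PROOF_AGE = timedelta(days=90)  # assumption: restores are re-proved at least quarterly

def overdue_restore_proofs(path: str, today: datetime | None = None) -> list[dict]:
    """Return systems whose last verified restore (VM boot test, file-level
    sample, or documented drill) is missing or older than MAX_PROOF_AGE."""
    today = today or datetime.now()
    flagged = []
    with open(path, newline="") as fh:
        # expected columns: system, last_verified_restore (ISO date), method
        for row in csv.DictReader(fh):
            raw = (row.get("last_verified_restore") or "").strip()
            if not raw:
                row["reason"] = "no recorded restore test"
                flagged.append(row)
                continue
            age = today - datetime.fromisoformat(raw)
            if age > MAX_PROOF_AGE:
                row["reason"] = f"last proof {age.days} days ago"
                flagged.append(row)
    return flagged

if __name__ == "__main__":
    for item in overdue_restore_proofs("restore_proof_log.csv"):
        print(f"{item['system']}: {item['reason']} ({item.get('method') or 'n/a'})")
```
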
Facilitator tip: Timebox each phase. The goal is to expose assumptions, not to produce a perfect plan in one sitting. Park deep debates; capture items in the decision log.

Participants and suggested roles

Adapt titles to your org. One person should be facilitator (keeps time, reads injects) and one scribe (decision log, follow-ups).

Incident leadership: Incident commander / IT manager
Sets priorities, approves major tradeoffs (e.g., restore vs rebuild), coordinates with leadership.

Technical recovery: Systems & backup owners
Explain realistic restore paths, dependencies, and restore order and timing; flag technical blockers early.

Business: Application or business owners
Define minimum viable service, acceptable downtime, and customer or regulatory messaging needs.

Communications: Internal comms / HR / PR
Employee notifications, customer or partner updates, templates, and approval chains.

Optional: Legal, finance, vendor TAM
Ransom or data-breach angles, insurance, contractual SLAs, and vendor escalation paths.

Ground rules

  • No fault today. Gaps are expected; the exercise exists to find them safely.
  • Clarify, then decide. Ask “what’s true?” before “what do we do?”
  • State assumptions aloud. Write them down—many incidents fail on unstated guesses.
  • Use real names and systems where you can; hypothetical phrasing often hides confusion.

Baseline scenario

Facilitator reads aloud, then pauses for questions of fact (not solutioning yet).

Thursday, 06:12 local time. Monitoring shows several core services slow or unavailable. File shares return errors. Some endpoints report encrypted files with a ransom note. At approximately the same time, your primary site loses utility power; the generator is tested weekly but has never carried a full load during a storm.

Your backup stack includes agents on protected systems, a local backup appliance, and offsite cloud replication for disaster recovery. Not every team member has used the restore tools hands-on.

Starter prompts for the room

  • What is the first fact you need before declaring a security incident vs a pure outage?
  • Which systems are truly islanded? If one is compromised, what else is already coupled to it?
  • Which workloads would you restore first if you could only pick three in the first four hours?

Suggested discussion timeline

Adjust minutes to your schedule. Facilitator announces phase shifts.

Phase | Focus | Prompt
0–10 min | Orientation | Objectives, roles, scenario read-through.
10–30 min | Detection & triage | How do we know it's real? Who is contacted first? When do we involve legal or executives?
30–50 min | Containment & priorities | Isolate or preserve evidence? Freeze backups? What stays up for revenue or safety?
50–75 min | Recovery strategy | Restore from local vs cloud, order of operations, RTO/RPO checks (see the sketch below), test restore vs production cutover.
75–95 min | Injects | Facilitator introduces 1–2 curveballs from the inject list.
95–120 min | Wrap-up | Decision log review, top three follow-ups, owners, next rehearsal date.
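
For the "RTO/RPO checks" prompt in the recovery-strategy phase, a quick way to keep the conversation honest is to compare the room's restore-time estimates against the targets on the RTO/RPO sheet. The sketch below does only that; the workloads and numbers are placeholders, not real measurements.

```python
# rto_rpo_check.py -- a rough sketch for the recovery-strategy phase; the
# workload list and values are hypothetical examples to replace with your own.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    rto_hours: float            # target: maximum acceptable downtime
    rpo_hours: float            # target: maximum acceptable data loss
    est_restore_hours: float    # the team's honest estimate for the chosen restore path
    last_backup_age_hours: float

def check(workloads: list[Workload]) -> None:
    """Print OK/GAP per workload by comparing estimates against RTO/RPO targets."""
    for w in workloads:
        rto_ok = w.est_restore_hours <= w.rto_hours
        rpo_ok = w.last_backup_age_hours <= w.rpo_hours
        status = "OK" if rto_ok and rpo_ok else "GAP"
        print(f"{status:3} {w.name}: restore ~{w.est_restore_hours}h vs RTO {w.rto_hours}h, "
              f"data age {w.last_backup_age_hours}h vs RPO {w.rpo_hours}h")

if __name__ == "__main__":
    check([
        Workload("ERP database", rto_hours=4, rpo_hours=1, est_restore_hours=6, last_backup_age_hours=0.5),
        Workload("File shares", rto_hours=8, rpo_hours=4, est_restore_hours=3, last_backup_age_hours=2),
    ])
```
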

Optional injects (facilitator)

Introduce one at a time after the team has a working plan. Ask: “What changes now?”

Inject A: Key person unreachable
The primary backup admin is on a flight; the secondary hasn’t logged in for six months. Where is the break-glass documentation?

Inject B: Uncertain backup integrity
Logs suggest the affected server may have replicated corrupted or malicious state into the latest backup chain. How do you choose a restore point?
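
If the room wants to reason about this inject concretely, here is one simplified framing with invented timestamps: take the newest restore point that predates the suspected compromise, and treat everything later as needing scanning before use. Real incidents rarely offer a single clean compromise time, so treat this as a discussion aid rather than a procedure.

```python
# pick_restore_point.py -- illustrative logic only; the timestamps and the idea
# of a single "suspected compromise time" are simplifications for discussion.
from __future__ import annotations

from datetime import datetime

def choose_restore_point(points: list[datetime],
                         suspected_compromise: datetime) -> tuple[datetime | None, list[datetime]]:
    """Return (newest point strictly before the suspected compromise,
    points at or after it that would need malware scanning before use)."""
    clean = [p for p in points if p < suspected_compromise]
    suspect = sorted(p for p in points if p >= suspected_compromise)
    return (max(clean) if clean else None, suspect)

if __name__ == "__main__":
    # Hypothetical backup chain with restore points every six hours.
    points = [datetime(2024, 5, 2, h) for h in (0, 6, 12, 18)]
    best, needs_scan = choose_restore_point(points, datetime(2024, 5, 2, 14, 30))
    print("Candidate restore point:", best)
    print("Points needing scanning before use:", needs_scan)
```
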

Inject C: Regional cloud degradation
Offsite replication is delayed; cloud restores are slower than planned. The local appliance is intact. How do you communicate revised expectations?

Inject D: Regulatory clock
Legal says you must notify a regulator within 36 hours if personal data was exposed. What evidence do you need before that notification goes out?

Deep-dive discussion questions

  • Do we have a written order of restoration for critical apps, including dependencies (AD/DNS, databases, app tiers)? (A small ordering sketch follows this list.)
  • When did we last prove a restore—not assume backups ran—e.g., VM boot test, file-level sample, or documented drill?
  • How do we decide local appliance first vs cloud DR when both exist? What breaks that rule?
  • Who can authorize payment or engagement with incident response firms, counsel, or ransom advisers?
  • What is our communications cascade if email or chat is part of the blast radius?
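
For the written-order-of-restoration question above, a dependency map plus a standard-library topological sort gives a defensible starting sequence: dependencies come back before the systems that need them. The systems and dependencies below are placeholders to adapt to your own inventory.

```python
# restore_order.py -- a sketch of turning a dependency map into a restore order.
# The dependency graph is hypothetical; replace it with your own system inventory.
from graphlib import TopologicalSorter

# Each key depends on the systems listed after it (those must be restored first).
DEPENDS_ON = {
    "AD/DNS": [],
    "Database cluster": ["AD/DNS"],
    "App tier": ["Database cluster", "AD/DNS"],
    "File shares": ["AD/DNS"],
    "Web front end": ["App tier"],
}

def restore_order(depends_on: dict) -> list:
    """Return a restore sequence in which every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())

if __name__ == "__main__":
    for i, system in enumerate(restore_order(DEPENDS_ON), start=1):
        print(f"{i}. {system}")
```
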

Decision log (scribe)

Copy to a shared doc during the session. Short bullets beat paragraphs.

Time | Decision or assumption | Owner / follow-up

Open items after today
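
If the scribe prefers something scriptable to a shared doc, a tiny helper like the sketch below can append timestamped rows matching the columns above. The file name and example entries are hypothetical; a shared document works just as well.

```python
# decision_log.py -- optional scribe helper; the file name and fields mirror
# the Time / Decision or assumption / Owner columns above and are placeholders.
import csv
from datetime import datetime

LOG_PATH = "decision_log.csv"

def log_decision(decision: str, owner: str, follow_up: str = "") -> None:
    """Append one timestamped row: when, what was decided or assumed, who owns it."""
    with open(LOG_PATH, "a", newline="") as fh:
        csv.writer(fh).writerow([datetime.now().isoformat(timespec="minutes"),
                                 decision, owner, follow_up])

if __name__ == "__main__":
    # Hypothetical example entries.
    log_decision("Declared a security incident, not just an outage", "Incident commander")
    log_decision("Freeze backup jobs on the affected segment", "Backup owner", "confirm with vendor TAM")
```
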

After-action and hygiene

  • Within one week: distribute summarized decisions, owners, and due dates.
  • Schedule a technical drill if the tabletop surfaced unknown restore paths.
  • Update the DR plan and contact tree; version the document.
  • Optionally run this tabletop again in 6–12 months or after major architecture changes.