3.8 Explain the importance of disaster recovery.
📘 CompTIA Server+ (SK0-005)
1. Tabletops (Tabletop Testing)
What it is
A tabletop test is a discussion-based exercise in which IT staff and stakeholders walk through a disaster scenario step by step without performing any recovery actions.
Key points
- No real system changes are made
- Conducted in a meeting or workshop
- Participants review the DR plan and roles
- Simulates decision-making during a disaster
Purpose
- Identify gaps in the DR plan
- Confirm roles and responsibilities
- Improve communication during incidents
- Validate procedures and documentation
Example (IT context)
- Simulate a database server failure
- Teams discuss:
  - Who declares the disaster?
  - Which backup will be used?
  - How will services be restored?
Exam tips
- Lowest-risk test type: discussion only, no hands-on technical work
- Focuses on planning and coordination
- No actual recovery is performed
2. Live Failover
What it is
A live failover is a real-time switch from the primary system to a backup system with minimal or no downtime.
Key points
- Fully automated or semi-automated
- Uses replicated systems (e.g., active-active or active-passive setups)
- Minimal interruption to users
- Requires strong infrastructure and monitoring
Purpose
- Ensure high availability
- Validate real failover capability
- Test production-level systems under real conditions
Example (IT context)
- A primary web server fails
- Traffic is automatically redirected to a secondary server
- Users continue accessing the application without noticing disruption
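The redirect decision in this example can be sketched in a few lines. This is a minimal illustration of active-passive selection, not how any particular load balancer works: the `Server` class, the `pick_active` function, the server names, and the addresses (taken from the reserved documentation range) are all hypothetical, and the health check is a stand-in for a real probe.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Server:
    name: str
    address: str


def pick_active(servers: Sequence[Server],
                is_healthy: Callable[[Server], bool]) -> Server:
    """Return the first healthy server in priority order (primary first)."""
    for server in servers:
        if is_healthy(server):
            return server
    raise RuntimeError("no healthy server available")


# Hypothetical servers for illustration only.
primary = Server("web-primary", "192.0.2.10")
secondary = Server("web-secondary", "192.0.2.11")

# Simulate the primary failing its health check.
down = {"web-primary"}
active = pick_active([primary, secondary], lambda s: s.name not in down)
print(active.name)  # web-secondary
```

Listing the primary first means traffic returns to it as soon as it passes the health check again. Real deployments often add a delay before failing back to avoid flapping.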
Exam tips
- Little or no downtime
- Uses real systems and real traffic
- Tests actual disaster recovery effectiveness
3. Simulated Failover
What it is
A simulated failover tests DR procedures by mimicking a failure in a controlled way, without fully disrupting production systems.
Key points
- Systems are tested in a controlled way
- Failover is initiated but may not affect real users
- Often uses isolated environments or partial traffic routing
- Safer than live failover
Purpose
- Validate DR processes without major impact
- Train IT staff in recovery procedures
- Test failover configurations
Example (IT context)
- Simulate failure of a production database
- Route test traffic to a backup system
- Verify that data replication and application behavior work correctly
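One way to approach the replication check in the last step is to compare a digest of the same table on both systems. The sketch below assumes in-memory SQLite databases standing in for the production database and its backup. The `orders` table, the `table_digest` and `make_db` helpers, and the sample rows are hypothetical.

```python
import hashlib
import sqlite3


def table_digest(conn: sqlite3.Connection, table: str, key: str) -> str:
    """Hash every row of `table`, ordered by `key`, into one SHA-256 digest."""
    digest = hashlib.sha256()
    # Table and column names are trusted constants in this sketch, not user input.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY {key}"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()


def make_db(rows):
    """Build a throwaway database that stands in for a real server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


primary = make_db([(1, "disk"), (2, "fan")])
replica = make_db([(1, "disk"), (2, "fan")])
stale_replica = make_db([(1, "disk")])  # missed the second write

in_sync = table_digest(primary, "orders", "id") == table_digest(replica, "orders", "id")
stale = table_digest(primary, "orders", "id") == table_digest(stale_replica, "orders", "id")
print(in_sync, stale)  # True False
```

A matching digest shows only that the data was identical when it was read. Replication lag under live write load needs separate monitoring.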
Exam tips
- Partial or controlled disruption
- Safer than live failover
- Helps validate systems before real failover is needed
4. Production vs Non-Production Testing
Production Environment Testing
What it is
Testing conducted directly on live systems used by real users.
Key points
- High risk if something goes wrong
- Must be carefully planned and approved
- Usually done during low-usage periods
- Requires strong rollback plans
Purpose
- Test real-world performance and recovery
- Ensure systems behave correctly under actual workloads
Example (IT context)
- Testing failover of a live database cluster used by customers
Exam tips
- High impact if failure occurs
- Provides realistic testing results
Non-Production Environment Testing
What it is
Testing performed in separate environments that do not affect users.
Key points
- Includes staging, testing, or lab environments
- Safe to perform aggressive testing
- No impact on production users
Purpose
- Validate configurations
- Test recovery steps safely
- Experiment without risk
Example (IT context)
- Restoring backups in a test environment to verify data integrity
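A common way to verify a restore is to compare the restored file's checksum with one recorded at backup time. The following is a minimal sketch under that assumption. The file names are hypothetical and `shutil.copy` stands in for whatever restore step the backup tool actually performs.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.db"
    source.write_bytes(b"customer records")
    recorded = sha256_of(source)  # checksum stored at backup time

    restored = Path(tmp) / "restored.db"
    shutil.copy(source, restored)  # stands in for the real restore step
    restore_ok = sha256_of(restored) == recorded

    restored.write_bytes(b"customer recorbs")  # simulate a corrupted restore
    corruption_detected = sha256_of(restored) != recorded

print(restore_ok, corruption_detected)  # True True
```

A checksum match confirms the bytes are intact. It does not confirm that the application can open and use the data, so a full restore test also starts the service against the restored copy.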
Exam tips
- No risk to live systems
- Used for validation before production deployment
Comparison Summary
| Testing Type | Risk Level | Impact on Users | Purpose |
|---|---|---|---|
| Tabletop | None | None | Plan validation and discussion |
| Live Failover | High | Minimal/None if successful | Real failover validation |
| Simulated Failover | Low | Minimal | Controlled testing of DR |
| Production Testing | High | Possible impact | Real-world validation |
| Non-Production | None | None | Safe testing and validation |
Key Exam Takeaways
- Disaster recovery testing verifies that the DR plan works before it is needed.
- Tabletop = discussion only, no system impact.
- Live failover = real-time switching with minimal downtime.
- Simulated failover = controlled testing without major impact.
- Production testing = real environment, high risk but realistic.
- Non-production testing = safe environment, no user impact.
