3.4 Resilience & recovery
📘CompTIA Security+ (SY0-701)
Introduction
In cybersecurity, resilience and recovery mean keeping systems, data, and operations running when something goes wrong, such as a cyberattack, system failure, or disaster.
Quick recovery depends on two practices: regular testing and capacity planning.
- Testing verifies that recovery and continuity procedures actually work.
- Capacity planning ensures that there are enough people, resources, and infrastructure to handle normal and emergency situations.
1. Testing Recovery and Resilience Plans
Testing helps identify weaknesses in an organization’s recovery strategies before a real incident occurs. There are several common types of tests used in IT environments.
1.1 Tabletop Exercise
Definition:
A tabletop exercise is a discussion-based test where key personnel meet to talk through an incident or disaster scenario step-by-step.
Purpose:
- To review the organization’s incident response, disaster recovery (DR), and business continuity (BC) plans.
- To identify gaps in communication, responsibilities, or procedures.
- To make sure everyone knows their roles during an emergency.
How it works in IT:
- The security or IT management team presents a scenario (for example, a ransomware attack or data center outage).
- Participants discuss what actions they would take, what resources they would need, and how they would restore operations.
- No systems are actually shut down; it is purely a planning and discussion exercise.
Exam Tip:
Tabletop = discussion-based, no real systems affected.
1.2 Simulation Testing
Definition:
A simulation test mimics a real disaster or cyberattack, but in a controlled environment.
Purpose:
- To evaluate the technical procedures in disaster recovery and business continuity plans.
- To test how well staff respond under simulated stress conditions.
How it works in IT:
- Security teams might simulate a DDoS attack to test incident response and firewall performance.
- A data restore simulation could be performed to ensure backups are functional and recoverable.
- These tests are more hands-on than tabletop exercises and may involve live systems or test environments.
Exam Tip:
Simulation = practical testing under controlled conditions (not full production failover).
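The data restore simulation mentioned above can be sketched in a few lines. This is a minimal illustration, not a real backup product's workflow: the file names, the `run_restore_drill` helper, and the use of a plain file copy as the "restore" step are all assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches_source(source: Path, backup: Path, restore_dir: Path) -> bool:
    """Restore the backup into restore_dir and compare it with the source."""
    restored = restore_dir / backup.name
    shutil.copy2(backup, restored)  # stand-in for a real restore tool (assumption)
    return sha256_of(restored) == sha256_of(source)


def run_restore_drill(payload: bytes, corrupt_backup: bool = False) -> bool:
    """Build a throwaway source and backup, then test the restore (hypothetical drill)."""
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)
        source = root / "records.db"
        backup = root / "records.bak"
        restore_dir = root / "restored"
        restore_dir.mkdir()
        source.write_bytes(payload)
        # Optionally damage the backup to confirm the check catches it.
        backup.write_bytes(b"damaged" if corrupt_backup else payload)
        return restore_matches_source(source, backup, restore_dir)
```

Running the drill in a temporary directory reflects the "controlled conditions" idea: the check exercises the restore path without touching production data.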
1.3 Failover Testing
Definition:
A failover test checks whether systems can switch, automatically or manually, to a backup system or redundant site when the primary system fails.
Purpose:
- To verify high availability (HA) and redundancy mechanisms.
- To confirm that backup systems are ready and that data replication works correctly.
How it works in IT:
- A company might have two data centers: a primary and a backup.
- During a failover test, IT administrators intentionally shut down services in the primary data center to verify that the backup data center takes over correctly.
- After the test, services are switched back to the primary data center (failback).
Exam Tip:
Failover = testing backup or redundant systems for continuity.
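The failover and failback sequence described above can be modeled as a small sketch. The `Site` and `FailoverController` classes and the `run_failover_test` function are hypothetical names for illustration; a real environment would rely on load balancers, DNS, or cluster software rather than an in-memory flag.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    healthy: bool = True


class FailoverController:
    """Routes traffic to the primary site unless it is unhealthy."""

    def __init__(self, primary: Site, backup: Site) -> None:
        self.primary = primary
        self.backup = backup
        self.active = primary

    def evaluate(self) -> Site:
        """Choose the active site from current health status."""
        if self.primary.healthy:
            self.active = self.primary  # stay on primary, or fail back to it
        elif self.backup.healthy:
            self.active = self.backup   # failover
        else:
            raise RuntimeError("no healthy site available")
        return self.active


def run_failover_test(controller: FailoverController) -> list[str]:
    """Take the primary down, record who is active, then restore and fail back."""
    active_sites = []
    controller.primary.healthy = False            # intentional shutdown
    active_sites.append(controller.evaluate().name)
    controller.primary.healthy = True             # primary restored
    active_sites.append(controller.evaluate().name)
    return active_sites
```

A passing test shows the backup site active after the shutdown and the primary site active again after failback.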
1.4 Parallel Processing Test
Definition:
A parallel processing test runs critical systems simultaneously at both the main and backup sites to ensure that the backup site can handle live workloads.
Purpose:
- To validate that backup systems can run production workloads correctly.
- To ensure no data is lost or corrupted during synchronization.
How it works in IT:
- Both the main and backup systems are running.
- The backup system receives the same data in real time (replication).
- IT compares the results and performance of both systems.
Exam Tip:
Parallel = backup systems process data alongside the primary systems without full cutover.
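The comparison step in a parallel processing test can be sketched as follows. The `process_order` workload and the `compare_parallel_run` helper are assumptions for illustration; the point is only that both systems receive identical input and their outputs are compared.

```python
from typing import Callable


def process_order(order: dict) -> dict:
    """Hypothetical workload: compute the total for one order."""
    return {"id": order["id"], "total": round(order["qty"] * order["price"], 2)}


def compare_parallel_run(
    inputs: list[dict],
    primary_fn: Callable[[dict], dict],
    backup_fn: Callable[[dict], dict],
) -> list[tuple[dict, dict, dict]]:
    """Feed the same inputs to both systems and collect any differing results."""
    mismatches = []
    for item in inputs:
        primary_result = primary_fn(item)
        backup_result = backup_fn(item)
        if primary_result != backup_result:
            mismatches.append((item, primary_result, backup_result))
    return mismatches
```

An empty mismatch list suggests the backup system produces the same results as the primary for the tested inputs; it does not by itself prove the backup can sustain full production load.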
Summary of Testing Types
| Testing Type | Nature | Involves Real Systems? | Purpose |
|---|---|---|---|
| Tabletop | Discussion | No | Review roles and procedures |
| Simulation | Controlled practice | Sometimes | Test response in a realistic way |
| Failover | Technical test | Yes | Verify backup systems and redundancy |
| Parallel Processing | Technical + live comparison | Yes | Confirm the backup site can handle production workloads without a full cutover |
2. Capacity Planning
Definition
Capacity planning is the process of making sure that an organization has enough resources (people, technology, and infrastructure) to meet current and future demands, even during disruptions or growth.
Its goal is to maintain performance, availability, and resilience under both normal and emergency conditions.
2.1 People (Human Capacity Planning)
Focus: Ensuring the organization has enough skilled personnel to maintain and restore operations.
Key Considerations:
- Staffing Levels: Making sure there are backup employees or cross-trained staff who can step in if key personnel are unavailable.
- Skill Development: Regular training on disaster recovery, cybersecurity, and incident response.
- Shift Coverage: Having teams available for 24/7 operations if needed.
- Communication Plans: Clear hierarchy and reporting lines during a crisis.
Why It Matters:
Without enough trained staff, recovery efforts can be delayed or performed incorrectly.
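One way to check staffing levels and cross-training is to look for recovery tasks that too few people can perform. The sketch below is illustrative only: the task names, the skills mapping, and the default minimum of two trained people per task are assumptions, not values from any standard.

```python
def understaffed_tasks(
    required_tasks: list[str],
    skills: dict[str, set[str]],
    minimum: int = 2,
) -> list[str]:
    """Return the required tasks that fewer than `minimum` people are trained on."""
    gaps = []
    for task in required_tasks:
        trained = [person for person, known in skills.items() if task in known]
        if len(trained) < minimum:
            gaps.append(task)
    return gaps


# Hypothetical team and tasks, for illustration only.
team_skills = {
    "alice": {"restore_database", "firewall_rules"},
    "bob": {"restore_database"},
    "carol": {"firewall_rules", "dns_failover"},
}
recovery_tasks = ["restore_database", "firewall_rules", "dns_failover"]
gaps = understaffed_tasks(recovery_tasks, team_skills)
```

In this example only one person knows `dns_failover`, so it appears as a gap and is a candidate for cross-training.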
2.2 Technology (Technical Capacity Planning)
Focus: Ensuring systems, applications, and hardware can handle normal and peak loads, including failover scenarios.
Key Considerations:
- System Resources: CPU, memory, and storage capacity must meet demand.
- Scalability: Systems should handle increased workloads (for example, more users connecting during a failover).
- Redundancy: Having backup systems or virtual machines ready to activate when needed.
- Cloud and Virtualization: Using scalable cloud resources for flexibility during high demand.
- Backup and Recovery Tools: Ensuring backup software and recovery tools are capable of restoring data quickly.
Why It Matters:
Systems that cannot handle the load during recovery or peak times can cause downtime or data loss.
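A simple capacity check asks whether the remaining sites can carry peak load if one site fails. The sketch below assumes capacity and load are measured in the same unit (for example, requests per second) and uses a hypothetical 20% headroom; both the unit and the headroom figure are assumptions chosen for illustration.

```python
def surviving_capacity_ok(
    site_capacities: dict[str, float],
    peak_load: float,
    failed_site: str,
    headroom: float = 0.2,
) -> bool:
    """Return True if the remaining sites cover peak load plus headroom."""
    remaining = sum(
        capacity
        for name, capacity in site_capacities.items()
        if name != failed_site
    )
    required = peak_load * (1 + headroom)
    return remaining >= required
```

For example, a backup site sized at 600 units cannot absorb a 700-unit peak after the primary fails, while one sized at 900 units can.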
2.3 Infrastructure (Physical and Network Capacity Planning)
Focus: Ensuring the physical and network environment can support operations under normal and disaster conditions.
Key Considerations:
- Data Center Capacity: Enough power, cooling, and physical space for servers and backup hardware.
- Network Bandwidth: Adequate bandwidth to handle replication and recovery traffic, especially for remote backups or cloud recovery.
- Redundant Links: Having multiple internet or network connections to avoid a single point of failure.
- Geographic Redundancy: Placing backup infrastructure in a different physical location so that a single regional disaster cannot affect both sites.
Why It Matters:
Without reliable infrastructure, even the best recovery plan cannot function.
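Network bandwidth planning often comes down to estimating how long replication or recovery traffic will take on a given link. The sketch below is a rough estimate under stated assumptions: decimal units (1 GB = 8,000 megabits) and a hypothetical `efficiency` factor of 0.8 to account for protocol overhead and competing traffic.

```python
def transfer_hours(data_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move data_gb gigabytes over a link_mbps link."""
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    data_megabits = data_gb * 8 * 1000          # decimal gigabytes to megabits
    seconds = data_megabits / (link_mbps * efficiency)
    return seconds / 3600
```

Under these assumptions, restoring 360 GB over a 1,000 Mbps link at 80% efficiency takes about one hour, which can then be compared against the organization's recovery time target.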
Summary of Capacity Planning
| Category | Purpose | Key Focus Areas |
|---|---|---|
| People | Maintain human resource readiness | Training, cross-training, staffing levels, communication |
| Technology | Ensure system performance and availability | Scalability, redundancy, backup tools |
| Infrastructure | Provide physical and network resilience | Data center capacity, power, network redundancy |
Final Summary
| Concept | Purpose |
|---|---|
| Testing | To verify that disaster recovery and business continuity plans work as intended. |
| Tabletop | Discussion-based review of procedures. |
| Simulation | Practical test of systems in controlled conditions. |
| Failover | Testing backup or redundant systems to ensure they activate properly. |
| Parallel Processing | Running primary and backup systems together to ensure readiness. |
| Capacity Planning | Making sure people, technology, and infrastructure can handle normal and emergency operations. |
Key Takeaway for Exam
CompTIA expects you to understand how each testing method works, what capacity planning includes, and how they all contribute to organizational resilience and business continuity.
