Selecting an appropriate DR strategy to meet business requirements

Task Statement 2.2: Design highly available and/or fault-tolerant architectures.

📘 AWS Certified Solutions Architect – Associate (SAA-C03)


1. What is Disaster Recovery (DR)?

Disaster Recovery (DR) is the process of restoring applications, data, and infrastructure after a failure such as:

  • Region failure
  • Data center outage
  • Application crash
  • Data corruption

The goal is to minimize downtime and data loss.


2. Key Concepts You MUST Know for the Exam

2.1 RTO (Recovery Time Objective)

  • Definition: Maximum acceptable time to restore a system after failure
  • Example: System must be back within 10 minutes

👉 Lower RTO = faster recovery = higher cost


2.2 RPO (Recovery Point Objective)

  • Definition: Maximum acceptable data loss measured in time
  • Example: Losing 5 minutes of data is acceptable

👉 Lower RPO = less data loss = higher cost


2.3 Relationship Between RTO, RPO, and Cost

Requirement  | Impact
Low RTO      | Requires faster failover → expensive
Low RPO      | Requires continuous replication → expensive
High RTO/RPO | Slower recovery → cheaper

👉 Exam Tip: Always match DR strategy with business requirements (RTO + RPO)
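
As a rough illustration of how the two targets turn into concrete checks, the sketch below assumes periodic backups (so the worst-case data loss equals the backup interval) and a recovery made of sequential steps. The function names and sample durations are hypothetical and are not AWS APIs.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """With periodic backups, a failure just before the next backup
    loses up to one full interval of data."""
    return backup_interval <= rpo


def meets_rto(recovery_steps: dict, rto: timedelta) -> bool:
    """Approximate recovery time as the sum of sequential steps."""
    total = sum(recovery_steps.values(), timedelta())
    return total <= rto


# Hypothetical example: nightly backups against a 5-minute RPO.
nightly_ok = meets_rpo(timedelta(hours=24), timedelta(minutes=5))

# Hypothetical recovery plan (70 minutes in total) against a 1-hour RTO.
plan = {
    "detect": timedelta(minutes=5),
    "provision": timedelta(minutes=20),
    "restore": timedelta(minutes=45),
}
plan_ok = meets_rto(plan, timedelta(hours=1))
```

Both checks fail in this example, which is the kind of result that points toward replication (for RPO) or infrastructure that is already running (for RTO).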


3. AWS Disaster Recovery Strategies (Important)

AWS defines 4 main DR strategies. You must know all of them clearly.


3.1 Backup and Restore (Lowest Cost)

How it works:

  • Data is backed up regularly
  • Infrastructure is recreated after failure

AWS Services Used:

  • Amazon S3
  • Amazon S3 Glacier
  • AWS Backup
  • EBS Snapshots
  • RDS Snapshots

Characteristics:

  • RTO: High (hours to days)
  • RPO: High (data written since the last backup is lost)
  • Cost: Very low

When to Use:

  • Non-critical applications
  • Systems that can tolerate downtime

Key Idea:

👉 Nothing runs in the recovery environment until a disaster happens


3.2 Pilot Light

How it works:

  • Core components (such as the database) are always running and kept in sync
  • The rest of the infrastructure is provisioned during a disaster

AWS Services Used:

  • Amazon RDS / DynamoDB (replicated)
  • Amazon EC2 (minimal footprint; application servers are off or not yet launched)
  • AMI templates
  • CloudFormation

Characteristics:

  • RTO: Medium (minutes to hours)
  • RPO: Low (data is replicated)
  • Cost: Low to medium

When to Use:

  • Important applications
  • Need faster recovery than backup

Key Idea:

👉 Only critical components stay active


3.3 Warm Standby

How it works:

  • Full system is running but at reduced capacity
  • Scales up during disaster

AWS Services Used:

  • EC2 Auto Scaling
  • RDS cross-Region read replicas (Multi-AZ alone only covers failures within one Region)
  • Elastic Load Balancer
  • Route 53

Characteristics:

  • RTO: Low (minutes)
  • RPO: Low
  • Cost: Medium to high

When to Use:

  • Business-critical applications
  • Need quick recovery

Key Idea:

👉 System is always running, just scaled down


3.4 Multi-Site (Active-Active) (Highest Cost)

How it works:

  • Full system runs in multiple Regions simultaneously
  • Traffic is shared between them

AWS Services Used:

  • Route 53 (latency-based routing with health checks)
  • DynamoDB Global Tables
  • S3 Cross-Region Replication
  • Aurora Global Database

Characteristics:

  • RTO: Near zero
  • RPO: Near zero
  • Cost: Very high

When to Use:

  • Mission-critical systems
  • No downtime allowed

Key Idea:

👉 Both environments are always active


4. Comparison Table (VERY IMPORTANT FOR EXAM)

Strategy         | RTO      | RPO      | Cost        | Complexity
Backup & Restore | High     | High     | Low         | Low
Pilot Light      | Medium   | Low      | Low-Medium  | Medium
Warm Standby     | Low      | Low      | Medium-High | Medium
Multi-Site       | Very Low | Very Low | Very High   | High

👉 Exam Trick:
If the question mentions:

  • “Cheapest” → Backup & Restore
  • “Fast recovery, low cost” → Pilot Light
  • “Quick failover” → Warm Standby
  • “Zero downtime” → Multi-Site

5. Choosing the Right DR Strategy (Exam Logic)

To select the correct DR strategy, follow this thinking process:


Step 1: Check RTO requirement

  • Seconds/minutes → Multi-Site or Warm Standby
  • Hours → Backup & Restore or Pilot Light

Step 2: Check RPO requirement

  • Near zero data loss → Continuous replication needed
  • Some data loss acceptable → Backup-based solutions

Step 3: Check Budget

  • Low budget → Backup & Restore
  • Medium → Pilot Light / Warm Standby
  • High → Multi-Site

Step 4: Check Application Criticality

  • Non-critical → Backup & Restore
  • Important → Pilot Light
  • Business-critical → Warm Standby
  • Mission-critical → Multi-Site
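
The selection logic above can be sketched as "pick the cheapest strategy that still meets both targets". The achievable RTO/RPO figures in this sketch are illustrative assumptions chosen only to make the ordering concrete; they are not AWS guarantees, and `choose_dr_strategy` is a hypothetical helper.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    name: str
    rto_minutes: float  # assumed achievable recovery time
    rpo_minutes: float  # assumed achievable data-loss window


# Ordered cheapest first. All numbers are illustrative assumptions.
STRATEGIES = [
    Strategy("Backup & Restore", 24 * 60, 24 * 60),
    Strategy("Pilot Light", 60, 5),
    Strategy("Warm Standby", 10, 5),
    Strategy("Multi-Site", 1, 1),
]


def choose_dr_strategy(required_rto_minutes: float,
                       required_rpo_minutes: float) -> str:
    """Return the cheapest strategy whose assumed RTO and RPO
    both fit within the required targets."""
    for strategy in STRATEGIES:
        if (strategy.rto_minutes <= required_rto_minutes
                and strategy.rpo_minutes <= required_rpo_minutes):
            return strategy.name
    raise ValueError("No listed strategy meets these targets")
```

With these assumed figures, `choose_dr_strategy(120, 15)` skips Backup & Restore (too slow) and returns "Pilot Light", the cheapest option that satisfies both targets.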

6. AWS Services Used in DR (Exam Focus)

Data Replication

  • S3 Cross-Region Replication (CRR)
  • DynamoDB Global Tables
  • Aurora Global Database
  • RDS Read Replicas

Backup Services

  • AWS Backup
  • EBS Snapshots
  • S3 Glacier

Traffic Routing & Failover

  • Route 53
    • Failover routing
    • Health checks
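
To make failover routing more concrete, the sketch below builds a pair of PRIMARY/SECONDARY record sets in the shape that the Route 53 `ChangeResourceRecordSets` API expects. The function name, domain, IP addresses, and health check ID are hypothetical placeholders; only the dictionary is built here, and nothing is sent to AWS.

```python
def failover_change_batch(record_name: str, primary_ip: str,
                          secondary_ip: str, health_check_id: str) -> dict:
    """Build a Route 53 change batch with one PRIMARY and one
    SECONDARY failover record for the same name."""

    def record(role: str, ip: str, extra: dict) -> dict:
        rrset = {
            "Name": record_name,
            "Type": "A",
            "SetIdentifier": f"{role.lower()}-region",
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": 60,         # short TTL so clients pick up the change quickly
            "ResourceRecords": [{"Value": ip}],
        }
        rrset.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Failover routing sketch",
        "Changes": [
            # The primary record is tied to a health check; when the check
            # fails, Route 53 answers with the secondary record instead.
            record("PRIMARY", primary_ip, {"HealthCheckId": health_check_id}),
            record("SECONDARY", secondary_ip, {}),
        ],
    }


# Hypothetical values (documentation-only IP ranges).
batch = failover_change_batch(
    "app.example.com.", "192.0.2.10", "198.51.100.10", "hc-placeholder-id"
)
```

Assuming boto3 is installed and credentials are configured, a batch like this could be passed as `ChangeBatch` to `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`.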

Compute Recovery

  • EC2 AMIs
  • Auto Scaling Groups
  • CloudFormation (infrastructure automation)

7. Important Exam Scenarios

Scenario 1:

  • “Restoring the system after several hours is acceptable”
    👉 Answer: Backup & Restore

Scenario 2:

  • “Keep database running, start app during failure”
    👉 Answer: Pilot Light

Scenario 3:

  • “System must recover within minutes”
    👉 Answer: Warm Standby

Scenario 4:

  • “No downtime allowed, global users”
    👉 Answer: Multi-Site

8. Best Practices (Exam Must-Know)

  • Always define RTO and RPO first
  • Automate recovery using:
    • CloudFormation
    • Auto Scaling
  • Use Multi-AZ for high availability (on its own it is not a DR strategy)
  • Use multiple Regions for disaster recovery
  • Regularly test DR strategy
  • Encrypt backups and replicate securely

9. Common Mistakes (Exam Traps)

❌ Confusing High Availability vs Disaster Recovery

  • HA = within the same Region (Multi-AZ)
  • DR = across Regions

❌ Choosing expensive solution unnecessarily

  • Always match the stated requirement, not the maximum possible performance

❌ Ignoring RPO/RTO

  • Most questions are based on these

10. Final Summary

  • DR ensures systems recover after failure
  • 4 key strategies:
    1. Backup & Restore (cheapest, slowest)
    2. Pilot Light (partial running)
    3. Warm Standby (scaled-down full system)
    4. Multi-Site (fully active, fastest)

👉 Golden Rule for Exam:

The correct DR strategy is the one that meets RTO, RPO, and cost requirements — not the most advanced one
