Task Statement 2.2: Design highly available and/or fault-tolerant architectures.
📘AWS Certified Solutions Architect – Associate (SAA-C03)
1. Key Concepts (Must Understand First)
1. Durability
- Durability = data is not lost
- Measured as a percentage (e.g., 99.999999999% durability)
- Achieved by:
- Replication
- Backups
- Versioning
👉 Example (IT context): A database automatically stores copies of data in multiple storage systems so that even if one fails, data is still safe.
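To make the "eleven nines" figure concrete, here is a minimal sketch that turns a durability level into an expected yearly loss count. It assumes the durability figure is an annual, per-object probability and that objects are lost independently; the function name `expected_annual_losses` is illustrative only.

```python
from fractions import Fraction


def expected_annual_losses(object_count: int, nines: int) -> Fraction:
    """Expected number of objects lost per year at a durability of `nines` nines.

    Assumption: durability is an annual, per-object probability and losses
    are independent (a simplification for illustration).
    """
    annual_loss_probability = Fraction(1, 10 ** nines)  # 11 nines -> 1e-11
    return object_count * annual_loss_probability


# 10 million objects stored at 99.999999999% (eleven nines) durability
losses = expected_annual_losses(10_000_000, nines=11)
years_per_lost_object = 1 / losses
print(f"Expected losses per year: {float(losses)}")
print(f"Roughly one object lost every {int(years_per_lost_object):,} years")
```

Under these assumptions, storing 10 million objects means losing about one object every 10,000 years on average.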
2. Availability
- Availability = data can be accessed when needed
- Measured as uptime (e.g., 99.99% availability)
- Achieved by:
- Multi-AZ deployments
- Replication
- Failover systems
👉 Example: A web application can still read/write data even if one data center fails.
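An availability percentage is easier to reason about as a downtime budget. The sketch below converts an uptime target into the maximum downtime per year; it assumes a 365-day year, and the function name is made up for this example.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_hours: float = 365 * 24) -> float:
    """Maximum downtime (in minutes) that still meets the availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_hours * 60


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

For example, 99.99% leaves a budget of roughly 52.6 minutes per year, which is why each extra "nine" is significantly harder to achieve.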
3. Durability vs Availability (Exam Tip)
| Feature | Durability | Availability |
|---|---|---|
| Goal | Prevent data loss | Ensure access to data |
| Focus | Storage & backups | Access & uptime |
| Example | S3 replication | Multi-AZ database |
2. AWS Services for Data Durability & Availability
2.1 Amazon S3 (Highly Important)
Key Features:
- 11 9’s durability (99.999999999%)
- Data automatically stored across at least three Availability Zones (except the One Zone storage classes)
Strategies:
1. Versioning
- Keeps multiple versions of an object
- Protects against:
- Accidental deletion
- Overwrites
2. Cross-Region Replication (CRR)
- Copies data to another AWS Region
- Used for:
- Disaster recovery
- Global availability
3. Same-Region Replication (SRR)
- Replicates within same region
- Useful for compliance or isolation
4. Lifecycle Policies
- Automatically move data to cheaper storage:
- S3 Glacier
- S3 Glacier Deep Archive
5. Object Lock
- Prevents objects from being deleted or overwritten during a retention period (WORM – Write Once Read Many); requires versioning
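The way versioning protects against deletes and overwrites can be illustrated with a toy in-memory model. This is only a sketch of the idea, not the S3 API: the class `VersionedBucket` and its methods are hypothetical, and a delete marker is modelled as `None`.

```python
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy in-memory model of a versioned bucket (not the S3 API)."""

    def __init__(self) -> None:
        # key -> list of versions, oldest first; None stands for a delete marker
        self._versions: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, data: bytes) -> None:
        # An overwrite appends a new version instead of replacing the old one
        self._versions.setdefault(key, []).append(data)

    def delete(self, key: str) -> None:
        # A delete only adds a delete marker; earlier versions are kept
        self._versions.setdefault(key, []).append(None)

    def get(self, key: str) -> Optional[bytes]:
        # Returns the current version, or None if the key is missing or deleted
        versions = self._versions.get(key, [])
        return versions[-1] if versions else None

    def restore_previous(self, key: str) -> None:
        # Removing the newest entry (e.g. the delete marker) exposes the prior version
        if self._versions.get(key):
            self._versions[key].pop()


bucket = VersionedBucket()
bucket.put("report.csv", b"v1")
bucket.put("report.csv", b"v2")        # overwrite: v1 is kept as an older version
bucket.delete("report.csv")            # accidental delete: only a marker is added
deleted_view = bucket.get("report.csv")
bucket.restore_previous("report.csv")  # remove the marker to undo the delete
restored_view = bucket.get("report.csv")
print(deleted_view, restored_view)
```

The key point for the exam: with versioning enabled, a delete hides the object rather than destroying it, so the previous version can be recovered.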
2.2 Amazon EBS (Elastic Block Store)
Key Features:
- Data automatically replicated within a single AZ
Strategies:
1. EBS Snapshots
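- Point-in-time backup stored in Amazon S3 (managed by AWS; not visible in your own buckets)
- Incremental (first snapshot is full; later ones save only changed blocks)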
2. Cross-Region Snapshot Copy
- Copy snapshots to another region
3. Encryption
- Protects data at rest
2.3 Amazon RDS (Relational Database Service)
High Availability:
- Multi-AZ deployment
- Primary DB + standby replica
- Automatic failover
Durability Strategies:
1. Automated Backups
- Daily backups + transaction logs
- Point-in-time recovery
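2. Read Replicas
- Asynchronous copies that offload read traffic (mainly a performance feature)
- Can be manually promoted to a standalone database if the primary fails (no automatic failover)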
3. Snapshots
- Manual backups
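Point-in-time recovery combines a base backup with the transaction logs recorded after it. The following is a conceptual sketch only, not how RDS is implemented internally; the `Snapshot` type, the integer timestamps, and the function name are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Snapshot:
    taken_at: int            # timestamp in seconds (hypothetical unit)
    state: Dict[str, str]    # full copy of the data at that time


# A transaction-log entry: (timestamp, key, new_value)
LogEntry = Tuple[int, str, str]


def restore_to_point_in_time(snapshots: List[Snapshot],
                             log: List[LogEntry],
                             target: int) -> Dict[str, str]:
    """Load the latest snapshot at or before `target`, then replay logs up to `target`."""
    candidates = [s for s in snapshots if s.taken_at <= target]
    if not candidates:
        raise ValueError("no snapshot exists at or before the target time")
    base = max(candidates, key=lambda s: s.taken_at)
    state = dict(base.state)  # copy so the snapshot itself is not modified
    for timestamp, key, value in sorted(log):
        if base.taken_at < timestamp <= target:
            state[key] = value
    return state


snapshots = [Snapshot(0, {"balance": "100"}), Snapshot(100, {"balance": "150"})]
log = [(50, "balance", "150"), (120, "balance", "175"), (140, "balance", "0")]

# Recover to just before the bad write at t=140
recovered = restore_to_point_in_time(snapshots, log, target=130)
print(recovered)
```

This is why automated backups need both pieces: the daily backup provides the starting point, and the transaction logs fill in everything up to the chosen moment.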
2.4 Amazon DynamoDB
Built-in Features:
- Automatically replicated across multiple AZs
Strategies:
1. On-Demand Backup
- Full table backup
2. Point-in-Time Recovery (PITR)
- Restore to any second in last 35 days
3. Global Tables
- Multi-region replication
- Active-active setup
2.5 Amazon EFS (Elastic File System)
- Multi-AZ file storage
- Highly available and durable by default
Backup:
- Use AWS Backup (EFS lifecycle policies only move files to cheaper storage classes; they are not backups)
2.6 AWS Backup (Very Important Service)
What it does:
- Centralized backup management
Supports:
- EBS
- RDS
- DynamoDB
- EFS
- FSx
Features:
- Automated backup scheduling
- Retention policies
- Cross-region backup
- Backup vaults
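The features above (schedule, retention, cross-region copy, vaults) map onto a backup plan definition. The sketch below only builds the plan as a Python dictionary; the key names follow the structure that, as far as I know, boto3's `create_backup_plan` accepts for its `BackupPlan` argument, so verify them against the current documentation before use. The plan name, vault name, and ARN are placeholders.

```python
from typing import Any, Dict


def build_backup_plan(plan_name: str, vault_name: str,
                      retention_days: int, dr_vault_arn: str) -> Dict[str, Any]:
    """Build a daily backup plan with a cross-region copy (nothing is sent to AWS)."""
    return {
        "BackupPlanName": plan_name,
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,            # where backups are stored
                "ScheduleExpression": "cron(0 5 ? * * *)",      # every day at 05:00 UTC
                "Lifecycle": {"DeleteAfterDays": retention_days},  # retention policy
                "CopyActions": [
                    {
                        # Copy each recovery point to a vault in another region
                        "DestinationBackupVaultArn": dr_vault_arn,
                        "Lifecycle": {"DeleteAfterDays": retention_days},
                    }
                ],
            }
        ],
    }


plan = build_backup_plan(
    plan_name="critical-data",   # placeholder name
    vault_name="primary-vault",  # placeholder vault
    retention_days=35,
    dr_vault_arn="arn:aws:backup:us-west-2:111122223333:backup-vault:dr-vault",  # placeholder
)
print(plan["Rules"][0]["ScheduleExpression"])
```

Resources are attached to a plan separately (for example by tag), which is what allows one plan to cover EBS, RDS, DynamoDB, EFS, and FSx together.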
3. Backup Strategies (Core Exam Section)
3.1 Types of Backups
1. Full Backup
- Entire dataset copied
- Slow but complete
2. Incremental Backup
- Only changes since last backup
- Faster and more storage-efficient
👉 Used by:
- EBS snapshots
- RDS backups
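The difference between full and incremental backups can be sketched with a toy block-level model. This is an illustration of the concept rather than the actual EBS snapshot mechanism: the `Blocks` type and both functions are hypothetical, and deleted blocks are not handled.

```python
from typing import Dict, List

Blocks = Dict[int, bytes]  # block index -> block contents


def incremental_snapshot(current: Blocks, previous: Blocks) -> Blocks:
    """Return only the blocks that are new or changed since the previous state."""
    return {i: data for i, data in current.items() if previous.get(i) != data}


def restore(snapshot_chain: List[Blocks]) -> Blocks:
    """Rebuild the volume by layering snapshots from oldest to newest."""
    volume: Blocks = {}
    for snapshot in snapshot_chain:
        volume.update(snapshot)
    return volume


volume_v1 = {0: b"boot", 1: b"data-a", 2: b"data-b"}
snap1 = incremental_snapshot(volume_v1, {})          # first snapshot copies everything

volume_v2 = dict(volume_v1)
volume_v2[1] = b"data-a2"                            # one block changes
snap2 = incremental_snapshot(volume_v2, volume_v1)   # only the changed block is saved

print(len(snap1), len(snap2))
print(restore([snap1, snap2]) == volume_v2)
```

The second snapshot stores one block instead of three, yet the full volume can still be rebuilt from the chain.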
3.2 Backup Frequency
Depends on:
- Business requirements
- Data change rate
Key concept:
- RPO (Recovery Point Objective)
→ How much data loss is acceptable
- RTO (Recovery Time Objective)
→ How quickly system must recover
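RPO and RTO translate directly into checks on a backup design. As a minimal sketch, assume the worst-case data loss equals the interval between backups (a failure just before the next backup) and that the restore duration is known from testing; the function name is illustrative.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta, restore_duration: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """Check a backup schedule against RPO and RTO targets."""
    # Worst case: the failure happens just before the next backup would run
    worst_case_data_loss = backup_interval
    return worst_case_data_loss <= rpo and restore_duration <= rto


# Daily backups with a 2-hour restore, against a 4-hour RPO and 3-hour RTO
daily_ok = meets_objectives(timedelta(hours=24), timedelta(hours=2),
                            rpo=timedelta(hours=4), rto=timedelta(hours=3))

# Hourly backups with the same restore time
hourly_ok = meets_objectives(timedelta(hours=1), timedelta(hours=2),
                             rpo=timedelta(hours=4), rto=timedelta(hours=3))
print(daily_ok, hourly_ok)
```

Daily backups fail the 4-hour RPO even though the restore is fast enough; backup frequency drives RPO, while restore speed drives RTO.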
3.3 Backup Storage Options
- Amazon S3 (most common)
- S3 Glacier (long-term, low-cost storage)
4. Disaster Recovery (DR) Strategies
4.1 Backup and Restore
- Data backed up and restored when needed
- Cheapest but slow recovery
4.2 Pilot Light
- Minimal infrastructure running
- Scale up during failure
4.3 Warm Standby
- Scaled-down fully functional system
- Faster recovery
4.4 Multi-Site (Active-Active)
- Fully running in multiple regions
- Highest availability, most expensive
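Exam questions on DR usually ask for the cheapest strategy that still meets a recovery target. The sketch below encodes that selection rule. The strategies are listed from cheapest to most expensive as in the notes above, but the recovery-time values are placeholder estimates for illustration, not AWS-published figures.

```python
from datetime import timedelta
from typing import List, Optional, Tuple

# (strategy name, estimated recovery time), ordered from cheapest to most expensive.
# The durations are placeholder estimates for illustration only.
STRATEGIES: List[Tuple[str, timedelta]] = [
    ("Backup and Restore", timedelta(hours=12)),
    ("Pilot Light", timedelta(hours=1)),
    ("Warm Standby", timedelta(minutes=10)),
    ("Multi-Site (Active-Active)", timedelta(0)),
]


def cheapest_strategy(required_rto: timedelta) -> Optional[str]:
    """Return the first (cheapest) strategy whose estimated recovery time fits the RTO."""
    for name, estimated_rto in STRATEGIES:
        if estimated_rto <= required_rto:
            return name
    return None


for rto in (timedelta(hours=24), timedelta(minutes=30), timedelta(minutes=1)):
    print(rto, "->", cheapest_strategy(rto))
```

The pattern to remember: as the required RTO shrinks, the answer moves down the list and the cost goes up.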
5. Data Replication Strategies
Types:
1. Synchronous Replication
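- Data written to multiple locations before the write is acknowledged (e.g., RDS Multi-AZ standby instance)
- High consistency
2. Asynchronous Replication
- Data copied after the write is acknowledged (e.g., S3 CRR, RDS read replicas)
- Faster writes, but the copy can lag slightly behind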
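The trade-off between the two replication types shows up at failover time. The toy model below is a sketch, not a real replication protocol: the class `ReplicatedLog` and its methods are hypothetical, and "shipping" is triggered manually to stand in for replication lag.

```python
from collections import deque
from typing import Deque, List


class ReplicatedLog:
    """Toy primary/replica pair that replicates synchronously or asynchronously."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: List[str] = []
        self.replica: List[str] = []
        self._pending: Deque[str] = deque()  # writes not yet shipped to the replica

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            self.replica.append(record)    # acknowledged only after both copies exist
        else:
            self._pending.append(record)   # acknowledged immediately, shipped later

    def ship_pending(self) -> None:
        # Simulates the replica catching up after some delay
        while self._pending:
            self.replica.append(self._pending.popleft())

    def records_lost_on_failover(self) -> int:
        # Anything the replica has not received is lost if the primary fails now
        return len(self.primary) - len(self.replica)


sync_log = ReplicatedLog(synchronous=True)
async_log = ReplicatedLog(synchronous=False)
for record in ("order-1", "order-2", "order-3"):
    sync_log.write(record)
    async_log.write(record)

sync_loss = sync_log.records_lost_on_failover()
async_loss = async_log.records_lost_on_failover()
print(sync_loss, async_loss)
```

Synchronous replication gives an RPO of zero at the cost of write latency; asynchronous replication keeps writes fast but can lose whatever has not been shipped yet.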
6. Security for Data Durability
Encryption
- At rest (S3, EBS, RDS)
- In transit (TLS)
Access Control
- IAM policies
- Bucket policies
Data Integrity
- Checksums
- Versioning
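Checksums detect silent corruption by comparing a digest recorded at backup time with one computed after restore. A minimal sketch using SHA-256 from Python's standard library follows; the function names and sample data are illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(restored: bytes, expected_digest: str) -> bool:
    """Compare the restored data's digest with the one recorded at backup time."""
    return sha256_hex(restored) == expected_digest


original = b"customer-records-2024"
recorded_digest = sha256_hex(original)     # stored alongside the backup

good_restore = original
corrupted_restore = b"customer-records-2O24"  # a single character differs

print(is_intact(good_restore, recorded_digest))
print(is_intact(corrupted_restore, recorded_digest))
```

Even a one-character change produces a different digest, so a mismatch is a reliable signal that the restored copy should not be trusted.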
7. Monitoring and Automation
Monitoring:
- Amazon CloudWatch metrics and alarms, used to track:
- Backup failures
- Storage metrics
Automation:
- AWS Backup policies
- Lifecycle rules
- Lambda for custom backup logic
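A lifecycle rule is one of the simplest automations to define. The sketch below builds a rule that archives objects to S3 Glacier and then to Glacier Deep Archive. It only constructs a dictionary; the keys follow the structure that, to my knowledge, boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`, so check the current documentation before relying on it. The rule ID, prefix, and day counts are placeholders.

```python
from typing import Any, Dict


def build_archive_lifecycle(prefix: str, glacier_after_days: int,
                            deep_archive_after_days: int) -> Dict[str, Any]:
    """Build a lifecycle configuration that tiers objects to colder storage over time."""
    if deep_archive_after_days <= glacier_after_days:
        raise ValueError("Deep Archive transition must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": "archive-old-objects",        # placeholder rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},       # apply only to keys under this prefix
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


config = build_archive_lifecycle("logs/", glacier_after_days=90,
                                 deep_archive_after_days=365)
for transition in config["Rules"][0]["Transitions"]:
    print(transition["Days"], transition["StorageClass"])
```

Once a rule like this is attached to a bucket, S3 applies the transitions automatically, with no scheduled job or Lambda function required.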
8. Best Practices (Exam-Focused)
- Enable S3 versioning for critical data
- Use Multi-AZ for databases (RDS)
- Take regular automated backups
- Store backups in multiple regions
- Use encryption everywhere
- Use lifecycle policies to reduce cost
- Test backup restoration regularly
- Use AWS Backup for centralized control
9. Common Exam Scenarios
Scenario 1:
Requirement: Prevent accidental deletion of files
✅ Solution: Enable S3 Versioning
Scenario 2:
Requirement: Recover database to specific point in time
✅ Solution: RDS Point-in-Time Recovery
Scenario 3:
Requirement: Global disaster recovery
✅ Solution: Cross-Region Replication / Global Tables
Scenario 4:
Requirement: Low-cost long-term backup
✅ Solution: S3 Glacier / Glacier Deep Archive
Scenario 5:
Requirement: Automatic backup for multiple services
✅ Solution: AWS Backup
10. Final Summary (Must Remember)
To pass the exam, remember:
- Durability = no data loss
- Availability = data accessible
- Use:
- S3 for high durability
- RDS Multi-AZ for availability
- Backups + replication for protection
- Understand:
- Versioning
- Snapshots
- Replication
- RPO & RTO
- Know disaster recovery strategies
