Task Statement 2.2: Design highly available and/or fault-tolerant architectures.
📘 AWS Certified Solutions Architect – Associate (SAA-C03)
1. What is AWS Global Infrastructure?
AWS Global Infrastructure is the worldwide network of data centers that AWS uses to run its cloud services.
It is designed to provide:
- High availability (systems stay up)
- Fault tolerance (systems keep working even if something fails)
- Low latency (fast response time)
- Scalability (handle growth easily)
2. Core Components of AWS Global Infrastructure
You must clearly understand these 3 main components:
2.1 AWS Regions
What is a Region?
An AWS Region is a separate geographic area where AWS clusters data centers into multiple Availability Zones.
Examples:
- us-east-1 (N. Virginia)
- eu-west-1 (Ireland)
- ap-south-1 (Mumbai)
Key Points for Exam:
- Each region is completely isolated from others
- You choose a region when deploying resources
- Data does NOT move between regions unless you configure it (for example, with replication)
- Each region has multiple Availability Zones (a minimum of three)
Why Regions Matter:
- Used for disaster recovery
- Helps meet compliance requirements
- Reduces latency for users near that region
2.2 Availability Zones (AZs)
What is an Availability Zone?
An Availability Zone (AZ) consists of one or more discrete data centers within a region.
Each AZ:
- Has independent power, cooling, and networking
- Is physically separated from other AZs
- Is connected to the other AZs in the region by a high-bandwidth, low-latency private network
Example:
Region → ap-south-1
AZs → ap-south-1a, ap-south-1b, ap-south-1c
Key Points for Exam:
- AZs are designed for fault isolation
- If one AZ fails, others still work
- Used to build highly available systems
Important Design Principle:
👉 Always deploy applications across multiple AZs
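As a minimal sketch of this principle, the Python below spreads a desired number of instances round-robin across a region's AZs. The function name `spread_across_azs` is hypothetical and is not an AWS API; the AZ names reuse the ap-south-1 example above.

```python
from collections import Counter
from itertools import cycle, islice


def spread_across_azs(instance_count: int, azs: list[str]) -> dict[str, int]:
    """Assign instances to AZs round-robin and return the count per AZ."""
    if not azs:
        raise ValueError("at least one AZ is required")
    # cycle() repeats the AZ list; islice() takes one AZ per instance.
    placement = Counter(islice(cycle(azs), instance_count))
    # Include AZs that received no instances so the result is complete.
    return {az: placement.get(az, 0) for az in azs}


azs = ["ap-south-1a", "ap-south-1b", "ap-south-1c"]
print(spread_across_azs(5, azs))
```

With an even spread, losing any single AZ removes only a fraction of the capacity instead of all of it.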
2.3 Edge Locations
What are Edge Locations?
Edge locations are AWS points of presence, separate from Regions and AZs, used for content delivery and caching.
Used by services like:
- Amazon CloudFront
- AWS Global Accelerator
Key Points:
- Located close to end users
- Cache content (CloudFront) or move traffic onto the AWS global network (Global Accelerator)
- Reduce latency and improve performance
3. Amazon Route 53 (DNS Service)
What is Route 53?
Amazon Route 53 is a DNS (Domain Name System) service.
It translates:
example.com → IP address
Key Features for Exam:
3.1 Highly Available DNS
- Route 53 is globally distributed
- Keeps answering DNS queries even if individual DNS servers or locations fail
3.2 Routing Policies (VERY IMPORTANT)
You must know these (Route 53 also offers geoproximity and IP-based routing):
1. Simple Routing
- Routes to one resource
- No health checks, so no automatic failover
2. Weighted Routing
- Split traffic between multiple resources
- Example: 70% → Server A, 30% → Server B
3. Latency-Based Routing
- Sends users to the AWS Region that gives them the lowest latency
4. Failover Routing
- Primary + Secondary setup
- If primary fails → traffic goes to secondary
👉 Used for disaster recovery
5. Geolocation Routing
- Routes based on user location
6. Multivalue Answer Routing
- Returns up to eight healthy records, chosen at random
- Improves availability, but does not replace a load balancer
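To make weighted routing concrete, here is a small simulation of the 70/30 split from the example above. It only models the selection rule (each record is chosen with probability weight / sum of weights) and does not call Route 53; `pick_weighted` and the server names are hypothetical.

```python
import random


def pick_weighted(records: dict[str, int], rng: random.Random) -> str:
    """Pick one record with probability weight / sum(weights)."""
    names = list(records)
    weights = [records[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the run is repeatable
records = {"server-a": 70, "server-b": 30}
sample = [pick_weighted(records, rng) for _ in range(10_000)]
share_a = sample.count("server-a") / len(sample)
print(f"server-a received {share_a:.1%} of simulated queries")
```

The split is statistical: over many DNS queries the shares approach 70% and 30%, but any single user may land on either server.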
3.3 Health Checks
Route 53 can:
- Monitor endpoints
- Detect failures
- Automatically route traffic away from unhealthy endpoints
👉 Critical for fault tolerance
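The interaction between health checks and failover routing can be sketched as follows. This is a simplified model, not Route 53's actual implementation: the `HealthCheck` class, the `resolve` function, the threshold of 3, and the documentation-range IP addresses are all assumptions for illustration, and one successful probe is enough to mark the endpoint healthy again.

```python
from dataclasses import dataclass


@dataclass
class HealthCheck:
    """Marks an endpoint unhealthy after N consecutive failed probes."""
    failure_threshold: int = 3  # assumed value for illustration
    consecutive_failures: int = 0

    def record(self, probe_ok: bool) -> None:
        # A successful probe resets the counter; a failure increments it.
        self.consecutive_failures = 0 if probe_ok else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold


def resolve(primary: str, secondary: str, check: HealthCheck) -> str:
    """Failover routing: answer with the primary while it is healthy."""
    return primary if check.healthy else secondary


check = HealthCheck()
for probe_ok in [True, False, False, False]:
    check.record(probe_ok)
    print(probe_ok, "->", resolve("192.0.2.10", "198.51.100.20", check))
```

Requiring several consecutive failures avoids failing over on a single dropped probe, at the cost of a slightly slower reaction to a real outage.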
4. How AWS Global Infrastructure Enables High Availability
High availability means:
👉 The system stays available with minimal downtime and recovers quickly when a component fails
4.1 Multi-AZ Architecture
Design:
- Deploy application in 2 or more AZs
- Use load balancing
Example (IT Environment):
- Web servers in AZ-A and AZ-B
- Load balancer distributes traffic
- If AZ-A fails → AZ-B handles traffic
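A rough way to see why a second AZ helps: if the application is up whenever at least one AZ is up, and AZ failures are independent, the combined availability is 1 - (1 - a)^n. The sketch below computes this; the 99% per-AZ figure is purely illustrative and is not an AWS-published number, and the independence assumption is optimistic.

```python
def parallel_availability(single_az: float, az_count: int) -> float:
    """Availability when the system is up as long as any one AZ is up."""
    if not 0.0 <= single_az <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("at least one AZ is required")
    # Probability that every AZ is down at once is (1 - a) ** n.
    return 1.0 - (1.0 - single_az) ** az_count


for n in (1, 2, 3):
    print(n, "AZ(s):", f"{parallel_availability(0.99, n):.6f}")
```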
4.2 Elastic Load Balancing (ELB)
- Distributes traffic across multiple AZs
- Stops sending traffic to targets that fail health checks
👉 Works closely with AZ design
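The behavior described above can be sketched with a toy round-robin balancer that skips unhealthy targets. `RoundRobinBalancer` and the target names are hypothetical; a real load balancer runs its own health checks and uses more sophisticated routing algorithms.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: rotates through targets and skips unhealthy ones."""

    def __init__(self, targets: dict[str, bool]) -> None:
        # Maps target name -> healthy flag.
        self.targets = dict(targets)
        self._order = cycle(list(self.targets))

    def mark(self, target: str, healthy: bool) -> None:
        self.targets[target] = healthy

    def route(self) -> str:
        # Try each target at most once per request.
        for _ in range(len(self.targets)):
            candidate = next(self._order)
            if self.targets[candidate]:
                return candidate
        raise RuntimeError("no healthy targets")


lb = RoundRobinBalancer({"web-az-a": True, "web-az-b": True})
print([lb.route() for _ in range(4)])  # alternates between both AZs
lb.mark("web-az-a", False)             # simulate AZ-A failing
print([lb.route() for _ in range(4)])  # only web-az-b is returned
```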
4.3 Auto Scaling
- Automatically adds or removes EC2 instances based on demand
- Replaces unhealthy instances to maintain capacity during failures or load increases
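One way to picture how capacity is maintained is a reconcile step that compares desired capacity with the healthy instances it currently has. This is a simplified sketch, not the Auto Scaling service's algorithm; `reconcile` and the instance names are hypothetical, and scale-in of surplus healthy instances is left out.

```python
def reconcile(desired: int, instances: dict[str, bool]) -> tuple[list[str], int]:
    """Return (unhealthy instances to terminate, number of instances to launch)."""
    unhealthy = [name for name, ok in instances.items() if not ok]
    healthy_count = len(instances) - len(unhealthy)
    # Launch enough replacements to get back to the desired capacity.
    to_launch = max(desired - healthy_count, 0)
    return unhealthy, to_launch


fleet = {"i-a1": True, "i-a2": False, "i-b1": True, "i-b2": False}
terminate, launch = reconcile(4, fleet)
print("terminate:", terminate, "launch:", launch)
```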
5. How AWS Global Infrastructure Enables Fault Tolerance
Fault tolerance means:
👉 The system keeps working without interruption when a component fails, usually because redundant components are already running
5.1 Multi-Region Architecture
- Deploy application in multiple regions
- Use Route 53 for routing
Example:
- Primary region: ap-south-1
- Secondary region: us-east-1
If the primary fails its health check → Route 53 returns the secondary region's endpoint
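A primary/secondary pair like the one above is typically expressed as two failover records with the same name. The sketch below only builds a Python dictionary in the shape I understand the Route 53 ChangeResourceRecordSets request to take; it makes no AWS call, and the field layout should be checked against the API reference. The domain name, IP addresses, health check ID, and the helper `failover_record` are all placeholders.

```python
from typing import Optional


def failover_record(name: str, ip: str, role: str,
                    health_check_id: Optional[str] = None) -> dict:
    """Build one failover record change; role is "PRIMARY" or "SECONDARY"."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-endpoint",
        "Failover": role,
        "TTL": 60,
        "ResourceRecords": [{"Value": ip}],
    }
    # The primary record needs a health check so failover can trigger.
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


change_batch = {
    "Changes": [
        failover_record("app.example.com.", "192.0.2.10", "PRIMARY",
                        health_check_id="hypothetical-health-check-id"),
        failover_record("app.example.com.", "198.51.100.20", "SECONDARY"),
    ]
}
print(change_batch)
```

A short TTL matters here: clients cache the DNS answer, so a long TTL delays the switch to the secondary region.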
5.2 Data Replication
Services that support replication:
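- Amazon S3 (Cross-Region Replication)
- Amazon RDS (Multi-AZ for a synchronous standby in another AZ; read replicas for asynchronous copies, including cross-region)
- Amazon DynamoDB (global tables for multi-region replication)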
6. Disaster Recovery Strategies (Exam Important)
Know these 4 types:
6.1 Backup and Restore
- Back up data (e.g., to Amazon S3)
- Restore when failure happens
👉 Cheapest but slowest
6.2 Pilot Light
- Minimal core of the environment (e.g., a replicated database) always running
- Start and scale up the rest during a failure
6.3 Warm Standby
- A scaled-down but fully functional copy of the system is always running
- Scales up quickly when needed
6.4 Multi-Site (Active-Active)
- Full system in multiple regions
- Traffic distributed across all
👉 Most expensive but highest availability
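The trade-off between the four strategies can be sketched as picking the cheapest option that recovers fast enough. The ranks below are ordinal only (1 = cheapest and slowest, 4 = most expensive and fastest), not measured costs or recovery times; `STRATEGIES` and `cheapest_meeting` are hypothetical names.

```python
# (name, cost rank, recovery-speed rank); 1 = cheapest / slowest.
STRATEGIES = [
    ("Backup and Restore", 1, 1),
    ("Pilot Light", 2, 2),
    ("Warm Standby", 3, 3),
    ("Multi-Site (Active-Active)", 4, 4),
]


def cheapest_meeting(min_speed_rank: int) -> str:
    """Return the cheapest strategy whose recovery speed is at least the given rank."""
    candidates = [s for s in STRATEGIES if s[2] >= min_speed_rank]
    if not candidates:
        raise ValueError("no strategy recovers that fast")
    return min(candidates, key=lambda s: s[1])[0]


print(cheapest_meeting(1))  # no speed requirement
print(cheapest_meeting(3))  # needs a fast recovery
```

In exam questions, the recovery-time requirement and the budget in the scenario usually point to exactly one of the four.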
7. Exam Tips (VERY IMPORTANT)
1. AZ vs Region
- AZ = high availability
- Region = disaster recovery
2. Always Use:
- Multiple AZs for production workloads
- Load balancer + Auto Scaling
3. Route 53 Questions:
- Failover → disaster recovery
- Latency → performance optimization
- Weighted → traffic splitting
4. Data Safety:
- Use replication for fault tolerance
- Use backups for recovery
5. Isolation Concept:
- Regions are isolated
- AZs are isolated but connected
8. Quick Summary
| Component | Purpose |
|---|---|
| Region | Geographic isolation |
| Availability Zone | Fault isolation within region |
| Edge Location | Fast content delivery |
| Route 53 | DNS + traffic routing |
| Multi-AZ | High availability |
| Multi-Region | Disaster recovery |
Final Understanding
To pass the exam, remember:
- High availability → Multiple AZs
- Fault tolerance → Multi-region + replication
- Traffic routing → Route 53
- Performance → Edge locations
- Design goal → No single point of failure
