Task Statement 2.2: Design highly available and/or fault-tolerant architectures.
📘AWS Certified Solutions Architect – Associate (SAA-C03)
1. What is a Distributed System?
A distributed system is an architecture where:
- Components run on multiple machines (nodes)
- These nodes communicate over a network
- Workload is shared across multiple resources
In AWS, this usually means:
- Multiple EC2 instances
- Multiple Availability Zones (AZs)
- Multiple Regions
- Managed services like Amazon SQS, Amazon DynamoDB, Amazon S3
2. Why Distributed Design Patterns are Important
Distributed patterns help achieve:
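1. High Availability
The system stays accessible with minimal downtime; if one component fails, redundant components (for example, in another AZ) take over quickly.
2. Fault Tolerance
The system keeps operating without interruption when components fail, and failures are isolated so they do not affect the entire system.
3. Scalability
The system handles increasing traffic by adding more resources (horizontal scaling).
4. Resilience
The system recovers quickly from failures and returns to normal operation.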
3. Key Distributed Design Patterns (Exam-Focused)
3.1 Load Balancing Pattern
A load balancer distributes incoming traffic across multiple servers.
AWS Services:
- Elastic Load Balancing (ELB), the umbrella service, with these load balancer types:
  - Application Load Balancer (ALB)
  - Network Load Balancer (NLB)
How it works:
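- Clients send traffic to the load balancer
- The load balancer runs health checks and routes requests only to healthy targets (for example, EC2 instances in several AZs)
- If an instance fails its health check, traffic is redirected to the remaining healthy targets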
Exam Tip:
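- Use ALB for HTTP/HTTPS (layer 7), including path-based and host-based routing
- Use NLB for TCP/UDP/TLS traffic (layer 4) that needs very high throughput, ultra-low latency, or static IP addresses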
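The "route only to healthy targets" behaviour above can be sketched in a few lines. This is a toy, hypothetical round-robin balancer, not how ALB or NLB are implemented internally; the instance names and the `mark()` method (standing in for a health check result) are assumptions for illustration.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips targets marked unhealthy.

    Hypothetical illustration only; a real ALB/NLB runs its own health checks.
    """

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = {t: True for t in self.targets}
        self._next = 0

    def mark(self, target, healthy):
        # Stand-in for the result of a health check against this target.
        self.healthy[target] = healthy

    def route(self):
        # Try each target at most once, starting after the last one used.
        for _ in range(len(self.targets)):
            target = self.targets[self._next]
            self._next = (self._next + 1) % len(self.targets)
            if self.healthy[target]:
                return target
        raise RuntimeError("no healthy targets")


lb = RoundRobinBalancer(["i-a", "i-b", "i-c"])
print([lb.route() for _ in range(3)])  # ['i-a', 'i-b', 'i-c']
lb.mark("i-b", False)                  # i-b fails its health check
print([lb.route() for _ in range(4)])  # ['i-a', 'i-c', 'i-a', 'i-c']
```

Once `i-b` is marked unhealthy, requests simply flow to the remaining targets, which is the redirection described in "How it works".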
3.2 Stateless Architecture Pattern
A system is stateless when:
- No session data is stored on the server
- Each request is independent
AWS Implementation:
- Store session data in:
  - Amazon DynamoDB
  - Amazon ElastiCache (Redis/Memcached)
- Use multiple EC2 instances behind a load balancer
Why important?
- Any instance can handle any request
- Easy to scale horizontally
Exam Tip:
- Stateless systems = easier to scale and recover
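A minimal sketch of why externalised sessions make "any instance can handle any request" true. A plain dict stands in for DynamoDB or ElastiCache so the example runs anywhere; the `WebServer` class and its methods are hypothetical.

```python
import uuid

# Stand-in for an external session store such as DynamoDB or ElastiCache.
session_store = {}


class WebServer:
    """Hypothetical stateless web server: it keeps no session data itself."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared, external store

    def login(self, user):
        session_id = str(uuid.uuid4())
        self.store[session_id] = {"user": user, "cart": []}
        return session_id

    def add_to_cart(self, session_id, item):
        session = self.store[session_id]  # any server can load the session
        session["cart"] = session["cart"] + [item]
        self.store[session_id] = session  # write back, as with a real store
        return session["cart"]


a = WebServer("instance-a", session_store)
b = WebServer("instance-b", session_store)
sid = a.login("alice")              # first request lands on instance A
print(b.add_to_cart(sid, "book"))   # next request lands on B: ['book']
```

If instance A is terminated or scaled in, no session is lost, because nothing lived on A.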
3.3 Caching Pattern
Caching stores frequently accessed data to reduce latency and backend load.
AWS Services:
- Amazon CloudFront (CDN)
- Amazon ElastiCache
- API Gateway caching
Benefits:
- Faster response time
- Reduced load on databases
Exam Tip:
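- Use caching for read-heavy workloads
- Know the common strategies: lazy loading (cache-aside) fills the cache on a miss, and write-through updates the cache every time the database is written; add a TTL to limit stale data
- Write-back (write-behind) caching also exists, but it risks losing writes if the cache fails before data reaches the database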
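Lazy loading with a TTL and write-through can be contrasted in one small class. This is a sketch under simple assumptions: a dict plays the database, and the clock is injectable so TTL expiry can be shown without sleeping. The class and method names are hypothetical.

```python
import time


class CacheAside:
    """Lazy loading (cache-aside) with a TTL, plus a write-through update path."""

    def __init__(self, db, ttl_seconds, clock=time.monotonic):
        self.db = db                # dict standing in for the database
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}             # key -> (value, expires_at)
        self.db_reads = 0

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[1] > self.clock():
            return entry[0]         # cache hit
        self.db_reads += 1          # miss or expired: read the database
        value = self.db[key]
        self.cache[key] = (value, self.clock() + self.ttl)
        return value

    def put_write_through(self, key, value):
        self.db[key] = value        # write the database first...
        self.cache[key] = (value, self.clock() + self.ttl)  # ...then the cache


now = [0.0]
c = CacheAside({"p1": "Widget"}, ttl_seconds=60, clock=lambda: now[0])
c.get("p1")
c.get("p1")
print(c.db_reads)  # 1 (second read was a cache hit)
now[0] = 61        # TTL expired
c.get("p1")
print(c.db_reads)  # 2
```

Lazy loading only caches data that is actually read; write-through keeps the cache fresh at the cost of an extra write on every update.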
3.4 Decoupling Pattern
Decoupling separates components so they don’t directly depend on each other.
AWS Services:
- Amazon SQS (Simple Queue Service)
- Amazon SNS (Simple Notification Service)
- Amazon EventBridge
How it works:
- Producer sends a message to the queue
- Consumer polls the queue, processes the message at its own pace, then deletes it
Benefits:
- Producers and consumers can scale and fail independently
- If a consumer fails, messages wait in the queue instead of being lost
Exam Tip:
- Use SQS for asynchronous processing
- Use SNS for fan-out (one-to-many messaging), commonly SNS → multiple SQS queues
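The SNS → SQS fan-out idea can be simulated in memory. These `Queue` and `Topic` classes are hypothetical stand-ins, not the AWS APIs; one simplification to note is that `receive()` here removes the message immediately, whereas real SQS hides it for a visibility timeout until the consumer explicitly deletes it.

```python
from collections import deque


class Queue:
    """Toy SQS-like queue: messages wait until a consumer receives them."""

    def __init__(self):
        self.messages = deque()

    def send(self, body):
        self.messages.append(body)

    def receive(self):
        return self.messages.popleft() if self.messages else None


class Topic:
    """Toy SNS-like topic: each published message is copied to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, queue):
        self.subscribers.append(queue)

    def publish(self, body):
        for q in self.subscribers:
            q.send(body)


orders = Topic()
billing_q, shipping_q = Queue(), Queue()
orders.subscribe(billing_q)
orders.subscribe(shipping_q)

orders.publish({"order_id": 1})
orders.publish({"order_id": 2})

# The shipping consumer is "down"; billing keeps working independently.
print(billing_q.receive(), billing_q.receive())  # {'order_id': 1} {'order_id': 2}
print(len(shipping_q.messages))                  # 2 messages waiting
```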
3.5 Microservices Architecture Pattern
Application is broken into small, independent services.
Each service:
- Has a specific function
- Can be deployed independently
AWS Services:
- AWS Lambda
- Amazon ECS / EKS
- API Gateway
Benefits:
- Independent scaling
- Fault isolation
- Faster development
3.6 Event-Driven Architecture Pattern
System reacts to events instead of direct requests.
AWS Services:
- Amazon EventBridge
- SNS
- Lambda (event triggers)
How it works:
- Event occurs (e.g., file upload)
- Event triggers downstream services
Exam Tip:
- Use when:
  - Systems need loose coupling
  - Real-time processing is required
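EventBridge routes events to targets through rules with event patterns. The sketch below is a simplified, hypothetical in-process bus: each pattern field lists allowed values and nested dicts match nested fields, but it supports only exact matching (no prefix, numeric, or exists filters, and no array-valued event fields). The S3 `source`/`detail-type` values follow the shape of S3 events delivered to EventBridge; the thumbnail handler is invented.

```python
def matches(pattern, event):
    """Simplified EventBridge-style matching (exact values only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


class EventBus:
    """Hypothetical event bus: rules route matching events to handlers."""

    def __init__(self):
        self.rules = []  # (pattern, handler)

    def add_rule(self, pattern, handler):
        self.rules.append((pattern, handler))

    def put_event(self, event):
        for pattern, handler in self.rules:
            if matches(pattern, event):
                handler(event)


processed = []
bus = EventBus()
bus.add_rule(
    {"source": ["aws.s3"], "detail-type": ["Object Created"]},
    lambda e: processed.append(("make-thumbnail", e["detail"]["object"]["key"])),
)
bus.put_event({"source": "aws.s3", "detail-type": "Object Created",
               "detail": {"object": {"key": "cat.jpg"}}})
bus.put_event({"source": "aws.s3", "detail-type": "Object Deleted",
               "detail": {"object": {"key": "dog.jpg"}}})
print(processed)  # [('make-thumbnail', 'cat.jpg')]
```

The producer (S3 here) never knows which services react to its events, which is the loose coupling the exam tip refers to.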
3.7 Failover Pattern
Automatic switching to a backup system when the primary system fails.
AWS Services:
- Amazon Route 53 (DNS failover)
- Multi-AZ deployments
- Multi-Region architectures
Types:
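- Active-Passive: a standby takes over only when the primary fails
- Active-Active: all sites serve traffic at the same time; if one fails, the others absorb its load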
Exam Tip:
- Use Route 53 health checks with failover routing for DNS-level failover
- Use Multi-AZ RDS for automatic failover
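The two failover types can be expressed as selection rules over health-check results. This is a hypothetical sketch, loosely modelled on Route 53 failover routing (the dict keys mirror its PRIMARY/SECONDARY record roles); the endpoint names are invented, and the sketch does not model how Route 53 answers when every record is unhealthy.

```python
def resolve(records, is_healthy):
    """Active-passive: answer with the primary while its health check passes,
    otherwise with the secondary."""
    primary, secondary = records["PRIMARY"], records["SECONDARY"]
    if is_healthy(primary):
        return primary
    return secondary


def active_active(endpoints, is_healthy):
    """Active-active: every healthy endpoint keeps serving traffic."""
    return [e for e in endpoints if is_healthy(e)]


records = {"PRIMARY": "alb-us-east-1.example.com",
           "SECONDARY": "alb-us-west-2.example.com"}
down = {"alb-us-east-1.example.com"}   # simulated failed health check


def healthy(name):
    return name not in down


print(resolve(records, healthy))  # alb-us-west-2.example.com
```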
3.8 Replication Pattern
Data is copied across multiple locations.
Types:
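- Synchronous replication: a write is acknowledged only after every copy is updated (for example, the RDS Multi-AZ standby)
- Asynchronous replication: copies are updated after the write is acknowledged, so they can lag (for example, RDS read replicas, DynamoDB Global Tables, S3 CRR)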
AWS Services:
- Amazon RDS Multi-AZ
- DynamoDB Global Tables
- S3 Cross-Region Replication (CRR)
Benefits:
- Data durability
- High availability
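The difference between the synchronous and asynchronous types above shows up as replica lag. This sketch is a hypothetical primary/replica pair held in dicts; in asynchronous mode, the backlog is shipped only when `replicate()` is called, which stands in for background replication.

```python
class ReplicatedStore:
    """Primary plus one replica, in synchronous or asynchronous mode."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.replica = {}
        self.backlog = []

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            self.replica[key] = value        # both copies updated before the ack
        else:
            self.backlog.append((key, value))  # shipped to the replica later
        return "ack"

    def replicate(self):
        # Drain the replication backlog (asynchronous mode only).
        for key, value in self.backlog:
            self.replica[key] = value
        self.backlog.clear()

    def read_replica(self, key):
        return self.replica.get(key)


sync, async_ = ReplicatedStore(True), ReplicatedStore(False)
sync.write("x", 1)
async_.write("x", 1)
print(sync.read_replica("x"), async_.read_replica("x"))  # 1 None (replica lag)
async_.replicate()
print(async_.read_replica("x"))                          # 1
```

Synchronous replication trades write latency for zero lag; asynchronous replication keeps writes fast but lets replicas briefly serve stale data.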
3.9 Sharding (Partitioning) Pattern
Data is split by a shard (partition) key into smaller parts and distributed across multiple databases or partitions.
Example (AWS):
- DynamoDB automatically partitions data based on the partition key
- With RDS, sharding is manual: the application routes each query to the correct database instance
Benefits:
- Improved performance
- Scalable storage
Exam Tip:
- Use when a single database can no longer handle the data size or write throughput
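For the manual RDS sharding case, the application needs a routing function. A minimal sketch, assuming simple hash-modulo routing; the shard names are hypothetical. A stable hash (MD5) is used because Python's built-in `hash()` on strings is randomised per process.

```python
import hashlib

SHARDS = ["orders-db-0", "orders-db-1", "orders-db-2", "orders-db-3"]  # hypothetical


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the shard key with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


print(shard_for("customer-42"))  # always the same shard for this key
```

One design caveat: with modulo routing, changing the number of shards remaps most keys; consistent hashing is a common way to reduce that data movement.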
3.10 Bulkhead Pattern
Isolates parts of a system to prevent total failure.
How it works:
- Resources are divided into groups
- Failure in one group does not affect others
AWS Example:
- Separate EC2 Auto Scaling groups per service
- Separate queues for different workloads
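Inside a single application, the same idea can be applied by giving each workload its own capped pool. This is a hypothetical sketch using one semaphore per workload; the workload names and limits are invented.

```python
import threading


class Bulkhead:
    """Caps concurrent work per workload so one overloaded workload cannot
    consume every worker."""

    def __init__(self, limits):
        self.pools = {name: threading.BoundedSemaphore(n) for name, n in limits.items()}

    def try_acquire(self, workload):
        # Non-blocking: reject instead of queueing when the pool is full.
        return self.pools[workload].acquire(blocking=False)

    def release(self, workload):
        self.pools[workload].release()


bh = Bulkhead({"reports": 2, "checkout": 2})
print([bh.try_acquire("reports") for _ in range(3)])  # [True, True, False]
print(bh.try_acquire("checkout"))                      # True: checkout unaffected
```

A flood of report requests exhausts only the "reports" pool, while checkout keeps its own capacity.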
3.11 Circuit Breaker Pattern
Prevents a system from repeatedly trying a failing operation.
How it works:
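- Detect repeated failures (for example, several consecutive errors or timeouts)
- Open the circuit: fail fast instead of calling the failing dependency
- After a timeout, allow a trial request; if it succeeds, close the circuit and resume normal traffic, otherwise keep it open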
AWS Context:
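- Implemented in application logic or a library
- Complements SDK retries with exponential backoff: retries handle brief transient errors, while the circuit breaker stops retry storms against a dependency that is down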
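The closed → open → half-open cycle described above can be sketched as a small state machine. This is a minimal, single-threaded sketch (not thread-safe); the thresholds and the injectable clock are assumptions for illustration.

```python
import time


class CircuitOpenError(Exception):
    pass


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures; after
    `reset_timeout` seconds one trial call is allowed (HALF_OPEN); success
    closes the circuit, failure re-opens it."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("failing fast; dependency marked unhealthy")
            self.state = "HALF_OPEN"          # allow one trial request
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = self.clock()
            raise
        self.state = "CLOSED"                 # success resets the breaker
        self.failures = 0
        return result


now = [0.0]
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10, clock=lambda: now[0])


def flaky():
    raise ConnectionError("downstream unavailable")


for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print(breaker.state)                              # OPEN
now[0] = 11                                       # reset timeout elapsed
print(breaker.call(lambda: "ok"), breaker.state)  # ok CLOSED
```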
4. Common AWS Distributed Architectures
4.1 Multi-AZ Architecture
- Deploy resources across multiple AZs
- Example:
  - ALB + EC2 Auto Scaling group spanning the AZs + RDS Multi-AZ
4.2 Multi-Region Architecture
- Deploy across multiple Regions
- Used for:
  - Disaster recovery
  - Global applications
4.3 Serverless Distributed Architecture
- Uses:
  - AWS Lambda
  - API Gateway
  - DynamoDB
- Fully managed, auto-scaling
5. Key Exam Concepts to Remember
1. Loose Coupling
- Services should not depend directly on each other
- Use SQS/SNS/EventBridge
2. Idempotency
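- Performing the same operation more than once has the same effect as performing it once (for example, a retried payment request charges the customer only once)
- Important for retries and at-least-once delivery (for example, SQS standard queues) in distributed systems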
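A common way to get idempotency is a client-supplied idempotency key. The sketch below is hypothetical: the `PaymentService` class and key format are invented, and the key store is a dict, whereas a production system would use durable storage (for example, a DynamoDB table written with a conditional put).

```python
class PaymentService:
    """Hypothetical payment handler made idempotent with an idempotency key:
    a retried request returns the stored result instead of charging again."""

    def __init__(self):
        self.results = {}   # idempotency_key -> result
        self.charges = []

    def charge(self, idempotency_key, account, amount):
        if idempotency_key in self.results:
            return self.results[idempotency_key]  # duplicate: no second charge
        self.charges.append((account, amount))
        result = {"status": "charged", "charge_no": len(self.charges)}
        self.results[idempotency_key] = result
        return result


svc = PaymentService()
first = svc.charge("req-123", "acct-1", 50)
retry = svc.charge("req-123", "acct-1", 50)  # e.g. the client timed out and retried
print(first == retry, len(svc.charges))      # True 1
```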
3. Retry Mechanisms
- Systems must handle transient failures
- Use exponential backoff with jitter
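A minimal retry helper using exponential backoff with "full jitter": before retry n, it sleeps a random time in [0, min(cap, base × 2ⁿ)]. The defaults and the set of retryable exceptions are assumptions; `sleep` is injectable so the example runs instantly.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=5, base=0.1, cap=5.0,
                       retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Retry `fn` on transient errors with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise                   # out of attempts: surface the error
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)


calls = {"n": 0}


def sometimes_fails():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "done"


slept = []
print(retry_with_backoff(sometimes_fails, sleep=slept.append))  # done
print(len(slept))  # 2 waits before the third, successful attempt
```

The random jitter spreads retries from many clients over time, so they do not all hit a recovering service at the same instant.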
4. Consistency Models
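- Strong consistency: a read always returns the most recent successful write
- Eventual consistency: updates propagate over time, so a read may briefly return stale data
Example:
- DynamoDB reads are eventually consistent by default; strongly consistent reads can be requested per read (ConsistentRead set to true) on tables, but not on global secondary indexes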
6. When to Use Each Pattern (Exam Tips)
| Requirement | Pattern / AWS service |
|---|---|
| High availability | Multi-AZ, Failover |
| Scalability | Load balancing, Stateless |
| Asynchronous processing | SQS |
| Event-based system | EventBridge, SNS |
| Data replication | Multi-AZ, CRR |
| Fast read performance | Caching |
| Microservices | Lambda, ECS |
7. Important AWS Services for Distributed Systems
- Amazon EC2 – compute instances
- Amazon SQS – message queues
- Amazon SNS – pub/sub messaging
- Amazon DynamoDB – NoSQL database
- Amazon RDS – relational database
- Amazon S3 – object storage
- Amazon CloudFront – content delivery
- Elastic Load Balancing – traffic distribution
- AWS Lambda – serverless compute
8. Exam Strategy Tips
- Identify keywords in questions:
  - “scalable” → stateless, load balancing
  - “decoupled” → SQS/SNS
  - “failover” → Route 53, Multi-AZ
  - “high throughput” → DynamoDB, sharding
- Look for failure scenarios (instance, AZ, or Region failure) and pick the design that survives them
- Choose services that:
  - Are managed
  - Support auto-scaling
  - Provide redundancy
9. Final Summary
Distributed design patterns help you:
- Build highly available systems
- Handle failures gracefully
- Scale horizontally
- Improve performance and resilience
In AWS, most distributed patterns are implemented using:
- Managed services (SQS, SNS, DynamoDB, Lambda)
- Multi-AZ and Multi-Region architectures
- Load balancing and caching
