Task Statement 2.2: Design highly available and/or fault-tolerant architectures.

📘 AWS Certified Solutions Architect – Associate (SAA-C03)

1. What is a Distributed System?

A distributed system is an architecture where:

Components run on multiple machines (nodes)
These nodes communicate over a network
Workload is shared across multiple resources

In AWS, this usually means:

Multiple EC2 instances
Multiple Availability Zones (AZs)
Multiple Regions
Managed services like Amazon SQS, Amazon DynamoDB, Amazon S3

2. Why Distributed Design Patterns are Important

Distributed patterns help achieve:

1. High Availability

System stays accessible with minimal downtime when a component fails (a brief interruption during failover is acceptable).

2. Fault Tolerance

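System keeps operating without interruption when a component fails, because redundant components take over and failures are isolated from the rest of the system.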

3. Scalability

System can handle increasing traffic by adding more resources.

4. Resilience

System recovers quickly from failures.

3. Key Distributed Design Patterns (Exam-Focused)

3.1 Load Balancing Pattern

A load balancer distributes incoming traffic across multiple servers.

AWS Services:

Elastic Load Balancing (ELB), the service family that includes:
- Application Load Balancer (ALB)
- Network Load Balancer (NLB)

How it works:

Traffic comes in
Load balancer sends it to healthy instances
If an instance fails, traffic is redirected
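
The steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how ELB is implemented: the class name, the instance IDs, and the plain round-robin rotation are all hypothetical and chosen only to show traffic skipping an unhealthy target.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy load balancer: rotates through targets, skipping unhealthy ones."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.healthy = set(self.targets)   # updated by (simulated) health checks
        self._ring = cycle(self.targets)

    def mark_unhealthy(self, target):
        self.healthy.discard(target)

    def mark_healthy(self, target):
        if target in self.targets:
            self.healthy.add(target)

    def route(self):
        """Return the next healthy target, or raise if none remain."""
        for _ in range(len(self.targets)):
            target = next(self._ring)
            if target in self.healthy:
                return target
        raise RuntimeError("no healthy targets")


lb = RoundRobinBalancer(["i-a", "i-b", "i-c"])   # hypothetical instance IDs
first_round = [lb.route() for _ in range(3)]
lb.mark_unhealthy("i-b")                         # simulated failed health check
after_failure = [lb.route() for _ in range(4)]
```

After the simulated failure, requests alternate between the two remaining targets and "i-b" receives no traffic until it is marked healthy again.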

Exam Tip:

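Use ALB for HTTP/HTTPS traffic (layer 7), including path-based and host-based routing
Use NLB for TCP/UDP/TLS traffic (layer 4) when you need very low latency, very high throughput, or static IP addresses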

3.2 Stateless Architecture Pattern

A system is stateless when:

No session data is stored on the server
Each request is independent

AWS Implementation:

Store session data in:
- Amazon DynamoDB
- Amazon ElastiCache (Redis/Memcached)
Use multiple EC2 instances behind a load balancer

Why important?

Any instance can handle any request
Easy to scale horizontally
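
A minimal sketch of why externalized session state lets any instance serve any request. The dictionary below is only a stand-in for an external store such as DynamoDB or ElastiCache, and the class and method names are hypothetical.

```python
import uuid

# Stand-in for an external session store such as DynamoDB or ElastiCache.
SESSION_STORE = {}


class AppServer:
    """A stateless server: it keeps no session data on the instance itself."""

    def __init__(self, name):
        self.name = name

    def login(self, user):
        session_id = str(uuid.uuid4())
        SESSION_STORE[session_id] = {"user": user, "cart": []}
        return session_id

    def add_to_cart(self, session_id, item):
        session = SESSION_STORE[session_id]   # state is fetched per request
        session["cart"].append(item)
        return f"{self.name} handled cart update for {session['user']}"

    def view_cart(self, session_id):
        return list(SESSION_STORE[session_id]["cart"])


server_a, server_b = AppServer("server-a"), AppServer("server-b")
sid = server_a.login("alice")
message = server_b.add_to_cart(sid, "book")   # a different instance serves this
cart_seen_by_a = server_a.view_cart(sid)
```

Because neither server holds the cart in its own memory, either one can be terminated or replaced without losing the session.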

Exam Tip:

Stateless systems = easier to scale and recover

3.3 Caching Pattern

Caching stores frequently accessed data to reduce latency and backend load.

AWS Services:

Amazon CloudFront (CDN)
Amazon ElastiCache
API Gateway caching

Benefits:

Faster response time
Reduced load on databases

Exam Tip:

Use cache for read-heavy workloads
Common strategies: lazy loading (cache-aside) and write-through, usually combined with a TTL to limit stale data
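
One way to picture caching for a read-heavy workload is the cache-aside (lazy loading) approach with a TTL, sketched below. This is an illustrative in-memory model, not ElastiCache client code: `TTLCache`, `slow_db_lookup`, and `get_with_cache` are hypothetical names, and the "database" is just a function that counts its calls.

```python
import time


class TTLCache:
    """Minimal in-memory cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}              # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]      # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


db_reads = 0


def slow_db_lookup(key):
    """Hypothetical backend query; counts calls so the saving is visible."""
    global db_reads
    db_reads += 1
    return f"row-for-{key}"


def get_with_cache(cache, key):
    """Lazy loading: check the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = slow_db_lookup(key)
        cache.set(key, value)
    return value


cache = TTLCache(ttl_seconds=60)
results = [get_with_cache(cache, "user:1") for _ in range(5)]
```

Five reads of the same key reach the backend only once; the other four are served from the cache until the entry expires.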

3.4 Decoupling Pattern

Decoupling separates components so they don’t directly depend on each other.

AWS Services:

Amazon SQS (Simple Queue Service)
Amazon SNS (Simple Notification Service)
Amazon EventBridge

How it works:

Producer sends message to queue
Consumer processes message later
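
The producer/consumer flow can be sketched with Python's standard `queue` module standing in for an SQS queue. This is an assumption-laden illustration (no AWS calls are made, and the function names and order IDs are hypothetical); it only shows that the producer finishes without needing the consumer to be running.

```python
import queue

orders = queue.Queue()          # stand-in for an SQS queue


def producer(order_ids):
    """Enqueue work and return immediately; no consumer needs to be running."""
    for order_id in order_ids:
        orders.put({"order_id": order_id})


def consumer():
    """Drain whatever is currently queued and return the processed IDs."""
    processed = []
    while True:
        try:
            message = orders.get_nowait()
        except queue.Empty:
            break
        processed.append(message["order_id"])
        orders.task_done()
    return processed


producer([101, 102, 103])       # the consumer is "offline" at this point
backlog = orders.qsize()        # messages wait in the queue
processed = consumer()          # the consumer comes online later
```

The queue absorbs the backlog while the consumer is unavailable, which is the property that keeps a consumer outage from failing the producer.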

Benefits:

Systems are independent
Failures do not affect the entire system

Exam Tip:

Use SQS for asynchronous processing
Use SNS for fan-out (one-to-many messaging)

3.5 Microservices Architecture Pattern

Application is broken into small, independent services.

Each service:

Has a specific function
Can be deployed independently

AWS Services:

AWS Lambda
Amazon ECS / EKS
API Gateway

Benefits:

Independent scaling
Fault isolation
Faster development

3.6 Event-Driven Architecture Pattern

System reacts to events instead of direct requests.

AWS Services:

Amazon EventBridge
SNS
Lambda (event triggers)

How it works:

Event occurs (e.g., file upload)
Event triggers downstream services
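
A minimal sketch of the publish/subscribe idea behind this pattern. The `EventBus` class, the event type strings, and the two handlers are hypothetical and only loosely mirror what EventBridge rules or SNS subscriptions do; the point is that the publisher does not know which services react.

```python
from collections import defaultdict


class EventBus:
    """Toy event router: maps an event type to the handlers subscribed to it."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, detail):
        """Deliver the event to every subscriber and collect their results."""
        return [handler(detail) for handler in self._handlers[event_type]]


def make_thumbnail(detail):
    return f"thumbnail created for {detail['key']}"


def index_metadata(detail):
    return f"metadata indexed for {detail['key']}"


bus = EventBus()
bus.subscribe("file.uploaded", make_thumbnail)
bus.subscribe("file.uploaded", index_metadata)

outcomes = bus.publish("file.uploaded", {"key": "photos/cat.jpg"})
unhandled = bus.publish("file.deleted", {"key": "photos/cat.jpg"})
```

A new downstream service can be added with one more `subscribe` call, without changing the code that publishes the event.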

Exam Tip:

Use when:
- Systems need loose coupling
- Near-real-time reaction to changes is required

3.7 Failover Pattern

Automatic switching to a backup system when the primary system fails.

AWS Services:

Amazon Route 53 (DNS failover)
Multi-AZ deployments
Multi-Region architectures

Types:

Active-Passive
Active-Active
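
Active-passive failover can be sketched as "return the first healthy endpoint in priority order". The endpoint names and the `health` dictionary are hypothetical; in practice the health status would come from something like Route 53 health checks rather than being set by hand.

```python
def resolve(endpoints, health):
    """Return the first healthy endpoint in priority order (active-passive)."""
    for endpoint in endpoints:
        if health.get(endpoint, False):
            return endpoint
    raise RuntimeError("all endpoints are unhealthy")


# Hypothetical endpoint names; the primary is listed first.
ENDPOINTS = ["primary.example.com", "standby.example.com"]
health = {"primary.example.com": True, "standby.example.com": True}

before = resolve(ENDPOINTS, health)
health["primary.example.com"] = False      # health check starts failing
during = resolve(ENDPOINTS, health)
health["primary.example.com"] = True       # primary recovers
after = resolve(ENDPOINTS, health)
```

In an active-active setup both endpoints would receive traffic while healthy, and a failure would simply remove one of them from rotation.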

Exam Tip:

Use Route 53 health checks
Use Multi-AZ RDS for automatic failover

3.8 Replication Pattern

Data is copied across multiple locations.

Types:

Synchronous replication
Asynchronous replication

AWS Services:

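Amazon RDS Multi-AZ (synchronous replication to a standby in another AZ)
DynamoDB Global Tables (multi-Region, typically asynchronous)
S3 Cross-Region Replication (CRR) (asynchronous)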

Benefits:

Data durability
High availability

3.9 Sharding (Partitioning) Pattern

Data is split into smaller parts and distributed across multiple databases.

Example (AWS):

DynamoDB automatically partitions data
RDS can use manual sharding
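
For the manual case, a common approach is to hash a key and use the result to pick a shard. The sketch below assumes simple modulo hashing over in-memory dictionaries; `shard_for` and `ShardedStore` are hypothetical names, and this is not a description of how DynamoDB partitions data internally.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash (not Python's hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Routes each key to one of several independent 'databases' (dicts here)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=4)
for i in range(1000):
    store.put(f"customer-{i}", {"id": i})

sizes = [len(shard) for shard in store.shards]   # roughly even spread
```

One design caveat: with modulo hashing, changing the shard count remaps most keys, which is why resharding usually needs a migration plan or a scheme such as consistent hashing.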

Benefits:

Improved performance
Scalable storage

Exam Tip:

Use when a single database can no longer handle the data size or write throughput

3.10 Bulkhead Pattern

Isolates parts of a system to prevent total failure.

How it works:

Resources are divided into groups
Failure in one group does not affect others
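
The idea can be illustrated with one concurrency limit per workload, so a stuck workload can only exhaust its own slots. This is a sketch under assumptions: the `Bulkhead` class, the workload names, and the limits are hypothetical, and a semaphore is just one of several ways to build the compartments.

```python
import threading


class Bulkhead:
    """Caps concurrent calls for one workload so it cannot exhaust shared capacity."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self):
        """Take a slot without blocking; False means this compartment is full."""
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


reports = Bulkhead("reports", max_concurrent=2)
checkout = Bulkhead("checkout", max_concurrent=2)

# Simulate the reports workload hanging: it takes every slot it has.
reports_slots = [reports.try_acquire() for _ in range(3)]

# Checkout has its own slots, so it is unaffected.
checkout_ok = checkout.try_acquire()
checkout.release()
```

The third reports request is rejected, while checkout still gets a slot; separate Auto Scaling groups or queues apply the same isolation at the infrastructure level.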

AWS Example:

Separate EC2 Auto Scaling groups per service
Separate queues for different workloads

3.11 Circuit Breaker Pattern

Prevents a system from repeatedly trying a failing operation.

How it works:

Detect failure
Stop sending requests temporarily
Retry after some time
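
The three steps above can be sketched as a small state machine. This is a simplified illustration, not a production library: the class name, thresholds, and the injected fake clock are hypothetical, and real implementations usually add richer half-open handling and metrics.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None        # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call skipped")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        self.failures = 0
        self.opened_at = None                   # success closes the circuit
        return result


now = [0.0]                          # fake clock so the example needs no sleeping
breaker = CircuitBreaker(failure_threshold=2, reset_timeout=10.0, clock=lambda: now[0])
backend_calls = {"count": 0}


def flaky():
    backend_calls["count"] += 1
    raise ConnectionError("backend down")


outcomes = []
for _ in range(4):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("rejected")
    except ConnectionError:
        outcomes.append("failed")

now[0] = 11.0                        # the reset timeout has passed
recovered = breaker.call(lambda: "ok")
```

After two failures the breaker rejects further calls without touching the backend, then allows a trial call once the timeout has elapsed.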

AWS Context:

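Usually implemented in application logic
Complements SDK retries: retries handle transient errors, while a circuit breaker stops calls to a dependency that keeps failing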

4. Common AWS Distributed Architectures

4.1 Multi-AZ Architecture

Deploy resources across multiple AZs
Example:
- ALB + EC2 + RDS Multi-AZ

4.2 Multi-Region Architecture

Deploy across multiple Regions
Used for:
- Disaster recovery
- Global applications

4.3 Serverless Distributed Architecture

Uses:
- AWS Lambda
- API Gateway
- DynamoDB
Fully managed, auto-scaling

5. Key Exam Concepts to Remember

1. Loose Coupling

Services should not depend directly on each other
Use SQS/SNS/EventBridge

2. Idempotency

Same operation can be repeated and gives the same result as running it once (no duplicate side effects)
Important for retries in distributed systems
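
A minimal sketch of one common way to make an operation idempotent: the client sends an idempotency key, and the service replays the stored response for a repeated key. The `PaymentService` class, the key format, and the amounts are hypothetical.

```python
class PaymentService:
    """Records each request's result under a client-supplied idempotency key."""

    def __init__(self):
        self.balance = 100
        self._results = {}        # idempotency key -> stored response

    def charge(self, idempotency_key, amount):
        if idempotency_key in self._results:
            return self._results[idempotency_key]   # replay: no second charge
        self.balance -= amount
        response = {"charged": amount, "balance": self.balance}
        self._results[idempotency_key] = response
        return response


service = PaymentService()
first = service.charge("req-123", 30)
retry = service.charge("req-123", 30)     # e.g. the client timed out and retried
other = service.charge("req-456", 30)     # a genuinely new request
```

The retried request returns the original response and the balance is reduced only once per distinct key.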

3. Retry Mechanisms

Systems must handle transient failures
Use exponential backoff
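
Exponential backoff is often combined with random jitter so that many clients do not retry at the same instant. The sketch below is an assumption-based illustration (the function names, base delay, and cap are hypothetical, and the AWS SDKs ship their own retry logic); it retries on any exception for brevity, whereas real code should retry only errors known to be transient.

```python
import random
import time


def backoff_delays(base=0.1, cap=5.0, attempts=5):
    """Upper bound for each retry's wait: base * 2**n, limited to cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def retry(func, attempts=5, base=0.1, cap=5.0, sleep=time.sleep, rng=random.random):
    """Call func until it succeeds, waiting a random time up to the bound
    between tries (jitter). Re-raises the last error if every try fails."""
    delays = backoff_delays(base, cap, attempts)
    for n in range(attempts):
        try:
            return func()
        except Exception:
            if n == attempts - 1:
                raise
            sleep(rng() * delays[n])


state = {"calls": 0}


def flaky():
    """Hypothetical operation that fails twice, then succeeds."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("transient failure")
    return "success"


slept = []
result = retry(flaky, sleep=slept.append)    # record waits instead of sleeping
```

Passing `sleep=slept.append` records the chosen waits instead of pausing, which keeps the example fast; omit it to sleep for real.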

4. Consistency Models

Strong consistency (immediate consistency)
Eventual consistency (data updates propagate over time)

Example:

DynamoDB supports both: reads are eventually consistent by default, and strongly consistent reads can be requested per read operation

6. When to Use Each Pattern (Exam Tips)

Requirement	Best Pattern
High availability	Multi-AZ, Failover
Scalability	Load balancing, Stateless
Asynchronous processing	SQS
Event-based system	EventBridge, SNS
Data replication	Multi-AZ, CRR
Fast read performance	Caching
Microservices	Lambda, ECS

7. Important AWS Services for Distributed Systems

Amazon EC2 – compute instances
Amazon SQS – message queues
Amazon SNS – pub/sub messaging
Amazon DynamoDB – NoSQL database
Amazon RDS – relational database
Amazon S3 – object storage
Amazon CloudFront – content delivery
Elastic Load Balancing – traffic distribution
AWS Lambda – serverless compute

8. Exam Strategy Tips

Identify keywords in questions:
- “scalable” → stateless, load balancing
- “decoupled” → SQS/SNS
- “failover” → Route 53, Multi-AZ
- “high throughput” → DynamoDB, sharding
Look for failure scenarios
Choose services that:
- Are managed
- Support auto-scaling
- Provide redundancy

9. Final Summary

Distributed design patterns help you:

Build highly available systems
Handle failures gracefully
Scale horizontally
Improve performance and resilience

In AWS, most distributed patterns are implemented using:

Managed services (SQS, SNS, DynamoDB, Lambda)
Multi-AZ and Multi-Region architectures
Load balancing and caching