Task Statement 4.2: Design cost-optimized compute solutions.

📘 AWS Certified Solutions Architect – Associate (SAA-C03)

1. What is Availability?

Availability means how much time your system is up and running without failure.

It is usually expressed as a percentage:

Availability	Downtime per year
99%	~3.65 days
99.9%	~8.8 hours
99.99%	~52 minutes
99.999%	~5 minutes

👉 Higher availability = less downtime but higher cost
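
The downtime figures in the table follow from simple arithmetic: allowed downtime = hours in a year × (1 − availability). A minimal sketch, assuming a 365-day year; the function name `downtime_hours_per_year` is illustrative and not part of any AWS tool:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in hours) implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Print the downtime budget for each target in the table above
for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why each step up in availability tends to cost noticeably more.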

2. Key Exam Concept

👉 Not all workloads need the same availability

You must:

Match availability to business importance
Avoid over-engineering (wasting money)
Avoid under-design (risking failures)
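
One way to reason about over-engineering versus under-design is to add the cost of expected downtime to the infrastructure cost. The sketch below is only an illustration: the option names, prices, availability figures, and hourly downtime costs are all hypothetical and are not AWS pricing or SLA values.

```python
HOURS_PER_YEAR = 365 * 24


def yearly_cost(infra_cost, availability_percent, downtime_cost_per_hour):
    """Infrastructure cost plus the business cost of expected downtime."""
    downtime_hours = HOURS_PER_YEAR * (1 - availability_percent / 100)
    return infra_cost + downtime_hours * downtime_cost_per_hour


# Hypothetical options: (name, yearly infrastructure cost in USD, availability %)
OPTIONS = [
    ("single-az", 10_000, 99.0),
    ("multi-az", 20_000, 99.99),
]


def cheapest(options, downtime_cost_per_hour):
    """Return the name of the option with the lowest total yearly cost."""
    best = min(options, key=lambda o: yearly_cost(o[1], o[2], downtime_cost_per_hour))
    return best[0]


# A developer sandbox: downtime is cheap, so the simpler design wins
print(cheapest(OPTIONS, downtime_cost_per_hour=10))
# A payment system: downtime is expensive, so the extra spend is justified
print(cheapest(OPTIONS, downtime_cost_per_hour=5_000))
```

With these made-up numbers, the low-impact workload is cheapest on the single-AZ option, while the business-critical one is cheapest on the multi-AZ option despite its higher infrastructure bill.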

3. Workload Classification

For exam purposes, workloads are commonly grouped into the following categories:

A. Production Workloads (High Availability Required)

What are they?

Systems that are:

Live (used by real users)
Business-critical
Require continuous uptime

Examples (IT context):

Web applications used by customers
Backend APIs for mobile apps
Payment processing systems
Authentication services

Required Availability:

👉 High (99.9% to 99.99% or higher)

Design Requirements

1. Multi-AZ Deployment

Use multiple Availability Zones (AZs)
If one AZ fails → system still works

2. Load Balancing

Use:
- Application Load Balancer (ALB)
- Network Load Balancer (NLB)

3. Auto Scaling

Automatically replace failed instances
Handle traffic spikes

4. Fault Tolerance

No single point of failure

5. Data Replication

Use:
- Multi-AZ RDS
- DynamoDB (multi-AZ by default)
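
The Multi-AZ and "no single point of failure" requirements above can be checked with a small capacity test: does the system still have enough instances if any one AZ disappears? This is a simplified sketch, not an AWS API; the function name and the AZ names in the example are illustrative assumptions.

```python
from collections import Counter


def survives_single_az_failure(instance_azs, min_healthy):
    """Return True if at least min_healthy instances remain after losing any one AZ.

    instance_azs: one AZ name per running instance, e.g. ["us-east-1a", "us-east-1b"].
    """
    per_az = Counter(instance_azs)
    if not per_az:
        return min_healthy <= 0
    total = len(instance_azs)
    # Losing an AZ removes every instance placed in it
    return all(total - count >= min_healthy for count in per_az.values())


# All four instances in one AZ: that AZ is a single point of failure
print(survives_single_az_failure(["us-east-1a"] * 4, min_healthy=2))
# Two instances in each of two AZs: either AZ can fail
print(survives_single_az_failure(
    ["us-east-1a", "us-east-1a", "us-east-1b", "us-east-1b"], min_healthy=2))
```

The same instance count passes or fails depending only on how it is spread across AZs, which is the point of a Multi-AZ deployment behind a load balancer.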

Cost Consideration

High cost (more resources)
Justified because downtime = business loss

B. Non-Production Workloads (Lower Availability Required)

What are they?

Systems used for:

Development
Testing
Staging

Examples (IT context):

QA testing environments
Developer sandboxes
Pre-production staging servers

Required Availability:

👉 Low to Medium (90%–99%)

Design Requirements

1. Single AZ Deployment

Cheaper than Multi-AZ
Acceptable downtime

2. Manual Recovery Allowed

No need for automatic failover

3. Limited Scaling

No need for full auto scaling

4. Smaller Instance Sizes

Reduce cost

Cost Consideration

Must be low cost
Downtime is acceptable

C. Batch / Background Workloads

What are they?

Processes that:

Run in the background
Are not user-facing
Can be delayed or retried

Examples:

Data processing jobs
Log analysis
Report generation

Required Availability:

👉 Flexible (can tolerate interruptions)

Design Approach

1. Use Spot Instances

Up to 90% cheaper than On-Demand pricing
Can be interrupted by AWS with a two-minute warning

2. Retry Mechanisms

Jobs should restart automatically

3. Queue-Based Systems

Use:
- Amazon SQS
- AWS Batch
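
The retry and queue ideas above fit together as follows: a worker pulls jobs from a queue, and a job whose worker is interrupted goes back on the queue instead of being lost. The sketch below simulates this in memory with the standard library only; it does not call Amazon SQS or AWS Batch, and the names `Interrupted`, `run_jobs`, and `flaky_process` are hypothetical.

```python
from collections import deque


class Interrupted(Exception):
    """Stands in for a Spot interruption hitting the worker mid-job."""


def run_jobs(jobs, process, max_attempts=3):
    """Process jobs from a queue, re-queueing any job whose worker is interrupted."""
    queue = deque((job, 1) for job in jobs)  # (job, attempt number)
    done, failed = [], []
    while queue:
        job, attempt = queue.popleft()
        try:
            done.append(process(job))
        except Interrupted:
            if attempt < max_attempts:
                queue.append((job, attempt + 1))  # retry later
            else:
                failed.append(job)  # a real system might use a dead-letter queue
    return done, failed


# Demo: "report-2" is interrupted on its first attempt only
_interrupted_once = set()


def flaky_process(job):
    if job == "report-2" and job not in _interrupted_once:
        _interrupted_once.add(job)
        raise Interrupted(job)
    return f"{job}: ok"


done, failed = run_jobs(["report-1", "report-2", "report-3"], flaky_process)
print(done)
print(failed)
```

Because the interrupted job is retried after the others, results can complete out of order, so jobs designed this way should be safe to run more than once.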

Cost Optimization

Maximum savings possible
Availability is not strict

D. Critical vs Non-Critical Components (Inside Same System)

Even within one application:

Component	Availability
Authentication API	High
Logging service	Medium
Analytics dashboard	Low

👉 Design each component separately

4. Availability vs Cost Trade-off (Very Important)

Design Choice	Availability	Cost
Single AZ	Low	Low
Multi-AZ	High	Medium
Multi-Region	Very High	Very High

Exam Tip:

👉 Do NOT choose Multi-Region unless explicitly required

5. AWS Services for Availability Design

High Availability Services

Amazon EC2 with Auto Scaling
Elastic Load Balancing (ELB)
Amazon RDS (Multi-AZ)
Amazon DynamoDB
Amazon S3 (S3 Standard is designed for 99.99% availability and 99.999999999% durability)

Cost-Optimized Options

EC2 in single AZ
Spot Instances
AWS Lambda (for intermittent workloads)
Amazon ECS / Fargate (scale when needed)

6. Decision Framework (Exam Ready)

When you see a question:

Step 1: Identify Workload Type

Production → High availability
Dev/Test → Low availability
Batch → Flexible

Step 2: Check Requirements

Is downtime acceptable?
Is it user-facing?
Does it need real-time response?

Step 3: Choose Architecture

Requirement	Solution
High availability	Multi-AZ + Auto Scaling
Cost optimization	Single AZ or Spot
Extreme availability	Multi-Region
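
The three steps above can be condensed into a small lookup. This is a study aid that mirrors the table, not an official AWS decision tool; the function name `choose_architecture` and its parameters are assumptions made for illustration.

```python
def choose_architecture(workload: str, must_survive_region_failure: bool = False) -> str:
    """Map a workload type to the architecture suggested in these notes."""
    # An explicit region-failure requirement overrides everything else
    if must_survive_region_failure:
        return "Multi-Region"
    choices = {
        "production": "Multi-AZ + Auto Scaling",
        "dev/test": "Single AZ",
        "batch": "Spot Instances + queue with retries",
    }
    key = workload.strip().lower()
    if key not in choices:
        raise ValueError(f"unknown workload type: {workload!r}")
    return choices[key]


print(choose_architecture("Production"))
print(choose_architecture("Dev/Test"))
print(choose_architecture("Batch"))
print(choose_architecture("Production", must_survive_region_failure=True))
```

Multi-Region is returned only when the region-failure flag is set, matching the exam tip to avoid it unless the question explicitly requires it.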

7. Common Exam Scenarios

Scenario 1:

“Customer-facing application must always be available”

✅ Use:

Multi-AZ
Load balancer
Auto Scaling

Scenario 2:

“Development environment with minimal cost”

✅ Use:

Single AZ
Small EC2
No auto scaling

Scenario 3:

“Batch jobs can be interrupted”

✅ Use:

Spot Instances
Queue-based processing

Scenario 4:

“System must survive entire region failure”

✅ Use:

Multi-Region architecture

8. Key Exam Takeaways

✔ Availability must match workload importance
✔ Production = High availability (Multi-AZ)
✔ Non-production = Low cost (Single AZ)
✔ Batch = Flexible (Spot + retry)
✔ Higher availability = Higher cost
✔ Avoid over-architecting

9. Quick Memory Summary

Production → Multi-AZ + Auto Scaling
Dev/Test → Single AZ
Batch → Spot Instances
Extreme requirement → Multi-Region