Determining the required availability for different classes of workloads (for example, production workloads, non-production workloads)

Task Statement 4.2: Design cost-optimized compute solutions.

📘AWS Certified Solutions Architect – Associate (SAA-C03)


1. What is Availability?

Availability is the percentage of time your system is up and able to serve requests.

It is usually expressed as a percentage:

Availability | Downtime per year
-------------|------------------
99%          | ~3.65 days
99.9%        | ~8.76 hours
99.99%       | ~52.6 minutes
99.999%      | ~5.26 minutes

👉 Higher availability = less downtime but higher cost
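The downtime figures follow directly from the percentage: allowed downtime = (1 - availability) × 8,760 hours per year. A minimal Python sketch, assuming a 365-day year (leap years ignored); the function name is illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a 365-day year


def downtime_per_year_hours(availability_pct: float) -> float:
    """Maximum downtime per year, in hours, allowed by an availability target (in %)."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_per_year_hours(target)
    # Show hours and minutes so the small budgets at four and five nines are readable
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Each extra "nine" cuts the yearly downtime budget by a factor of ten: 87.6 hours (3.65 days) at 99%, 8.76 hours at 99.9%, about 52.6 minutes at 99.99%, and about 5.3 minutes at 99.999%.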


2. Key Exam Concept

👉 Not all workloads need the same availability

You must:

  • Match availability to business importance
  • Avoid over-engineering (wasting money)
  • Avoid under-design (risking failures)

3. Workload Classification

For availability planning, workloads fall into the following categories:


A. Production Workloads (High Availability Required)

What are they?

Systems that are:

  • Live (used by real users)
  • Business-critical
  • Expected to be up continuously

Examples (IT context):

  • Web applications used by customers
  • Backend APIs for mobile apps
  • Payment processing systems
  • Authentication services

Required Availability:

👉 High (99.9% to 99.99% or higher)


Design Requirements

1. Multi-AZ Deployment

  • Use multiple Availability Zones (AZs)
  • If one AZ fails → system still works

2. Load Balancing

  • Use:
    • Application Load Balancer (ALB)
    • Network Load Balancer (NLB)

3. Auto Scaling

  • Automatically replace failed instances
  • Handle traffic spikes

4. Fault Tolerance

  • No single point of failure

5. Data Replication

  • Use:
    • Multi-AZ RDS
    • DynamoDB (multi-AZ by default)
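Why Multi-AZ raises availability: the tier is down only if every AZ copy is down at the same time. A minimal sketch of that math, assuming AZ failures are independent and that one surviving AZ can carry the full load (real failures can be correlated and failover is not instant, so treat the result as an upper bound). The 99% per-AZ figure in the example is hypothetical:

```python
def redundant_availability(single_az_availability: float, az_count: int) -> float:
    """Availability of a tier that stays up while at least one AZ copy is up.

    Assumes independent AZ failures and that one AZ can serve all traffic.
    """
    if not 0 <= single_az_availability <= 1:
        raise ValueError("availability must be a fraction between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    all_down = (1 - single_az_availability) ** az_count  # every copy down at once
    return 1 - all_down


# Hypothetical per-AZ availability of 99%, for illustration only
for azs in (1, 2, 3):
    print(f"{azs} AZ(s): {redundant_availability(0.99, azs):.6%}")
```

With two AZs the paper figure is 1 - 0.01² = 99.99%, which is why removing the single point of failure matters more than buying a bigger instance.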

Cost Consideration

  • High cost (more resources)
  • Justified because downtime = business loss
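The production design requirements (Multi-AZ, load balancing, Auto Scaling that replaces failed instances) map onto a single Auto Scaling group. A minimal sketch that only builds the keyword arguments for boto3's `create_auto_scaling_group` call; the helper name, group name, launch template, subnet IDs, target group ARN, and the default sizes and grace period are hypothetical assumptions, and it assumes the caller passes one subnet per AZ:

```python
from typing import Any


def production_asg_request(
    name: str,
    launch_template_name: str,
    subnet_ids: list[str],
    target_group_arns: list[str],
    min_size: int = 2,  # assumption: at least two instances so one can fail
    max_size: int = 6,  # assumption: headroom for traffic spikes
) -> dict[str, Any]:
    """Build kwargs for boto3 autoscaling.create_auto_scaling_group (hypothetical helper)."""
    if len(set(subnet_ids)) < 2:
        raise ValueError("production needs subnets in at least two AZs")
    if min_size < 2:
        raise ValueError("keep at least two instances so a single failure is survivable")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateName": launch_template_name, "Version": "$Latest"},
        "MinSize": min_size,
        "MaxSize": max_size,
        "DesiredCapacity": min_size,
        "VPCZoneIdentifier": ",".join(subnet_ids),  # spread instances across AZs
        "TargetGroupARNs": target_group_arns,  # register instances behind the ALB/NLB
        "HealthCheckType": "ELB",  # replace instances that fail load balancer checks
        "HealthCheckGracePeriod": 300,  # assumed warm-up time in seconds
    }


request = production_asg_request(
    "web-prod", "web-template", ["subnet-aaa", "subnet-bbb"], ["hypothetical-target-group-arn"]
)
print(request["VPCZoneIdentifier"], request["MinSize"])
```

With credentials configured, the dictionary could be passed as `boto3.client("autoscaling").create_auto_scaling_group(**request)`. Building the request separately keeps the "at least two AZs" rule testable without touching AWS.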

B. Non-Production Workloads (Lower Availability Required)

What are they?

Systems used for:

  • Development
  • Testing
  • Staging

Examples (IT context):

  • QA testing environments
  • Developer sandboxes
  • Pre-production staging servers

Required Availability:

👉 Low to Medium (90%–99%)


Design Requirements

1. Single AZ Deployment

  • Cheaper than Multi-AZ
  • Some downtime is acceptable

2. Manual Recovery Allowed

  • No need for automatic failover

3. Limited Scaling

  • No need for full auto scaling

4. Smaller Instance Sizes

  • Reduce cost

Cost Consideration

  • Must be low cost
  • Downtime is acceptable
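The savings come from two levers in the list above: fewer instances (one AZ instead of several) and smaller instance sizes. A back-of-the-envelope sketch, assuming simple per-hour billing with no discounts; the hourly rates and the "2 AZs × 2 instances" production layout are hypothetical placeholders, NOT real AWS prices:

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12


def monthly_compute_cost(az_count: int, instances_per_az: int, hourly_rate: float,
                         hours: float = HOURS_PER_MONTH) -> float:
    """Instance-hours multiplied by a per-hour rate (simple per-hour billing model)."""
    return az_count * instances_per_az * hourly_rate * hours


# Hypothetical rates in arbitrary currency units, for illustration only
LARGE_RATE = 0.20  # larger production instance size
SMALL_RATE = 0.05  # smaller dev/test instance size

production = monthly_compute_cost(az_count=2, instances_per_az=2, hourly_rate=LARGE_RATE)
non_production = monthly_compute_cost(az_count=1, instances_per_az=1, hourly_rate=SMALL_RATE)
print(f"production: {production:.2f}, non-production: {non_production:.2f}")
```

Under these made-up numbers the dev environment costs 1/16 of production (4× fewer instances and a 4× cheaper size), which is the trade the section describes: accept downtime, pay far less.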

C. Batch / Background Workloads

What are they?

Processes that:

  • Run in the background
  • Are not user-facing
  • Can be delayed or retried

Examples:

  • Data processing jobs
  • Log analysis
  • Report generation

Required Availability:

👉 Flexible (can tolerate interruptions)


Design Approach

1. Use Spot Instances

  • Up to 90% cheaper than On-Demand pricing
  • Can be interrupted by AWS with a two-minute warning

2. Retry Mechanisms

  • Jobs should restart automatically

3. Queue-Based Systems

  • Use:
    • Amazon SQS
    • AWS Batch
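Spot interruptions are survivable when a job that does not finish goes back on the queue. In Amazon SQS this happens through the visibility timeout (a message that is not deleted becomes visible again) plus a redrive policy that moves repeatedly failing messages to a dead-letter queue. The in-memory sketch below only models that pattern with Python's `deque`; it does not call SQS, and `process_jobs`, `flaky_run`, and the job names are hypothetical:

```python
from collections import deque
from typing import Callable, Iterable


def process_jobs(jobs: Iterable[str], run_job: Callable[[str], bool],
                 max_attempts: int = 3) -> tuple[list[str], list[str]]:
    """Drain a job queue, re-queuing jobs whose run was interrupted.

    run_job returns True on success, False if the worker was interrupted
    (for example, a Spot Instance reclaimed mid-job). Jobs that fail
    max_attempts times are set aside, like a dead-letter queue.
    """
    queue = deque((job, 1) for job in jobs)
    done: list[str] = []
    dead_letter: list[str] = []
    while queue:
        job, attempt = queue.popleft()
        if run_job(job):
            done.append(job)
        elif attempt < max_attempts:
            queue.append((job, attempt + 1))  # retry later instead of losing the job
        else:
            dead_letter.append(job)
    return done, dead_letter


# Simulate one interruption: "report-b" fails on its first attempt only
attempts: dict[str, int] = {}


def flaky_run(job: str) -> bool:
    attempts[job] = attempts.get(job, 0) + 1
    return not (job == "report-b" and attempts[job] == 1)


done, dead = process_jobs(["report-a", "report-b", "report-c"], flaky_run)
print(done, dead)
```

The interrupted job is simply retried after the others, so the batch still completes; only the finish time moves, which is why batch work tolerates Spot.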

Cost Optimization

  • Maximum savings possible
  • Availability is not strict

D. Critical vs Non-Critical Components (Inside Same System)

Even within one application:

Component           | Availability
--------------------|-------------
Authentication API  | High
Logging service     | Medium
Analytics dashboard | Low

👉 Design each component separately
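Designing components separately works because a request only depends on the components on its own path. When a request needs several components in sequence, its availability is roughly the product of theirs (assuming independent failures). A minimal sketch; the function name and availability figures are hypothetical:

```python
from math import prod
from typing import Iterable


def serial_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a request that needs every listed component to be up.

    Assumes independent failures, so the result is the product of the inputs.
    """
    return prod(component_availabilities)


# Hypothetical figures: load balancer, authentication API
login_path = [0.9999, 0.9995]
# What login would look like if it also waited synchronously on the dashboard
with_dashboard = login_path + [0.99]

print(f"login path:           {serial_availability(login_path):.4%}")
print(f"login + dashboard:    {serial_availability(with_dashboard):.4%}")
```

Keeping the low-availability analytics dashboard off the login path means its downtime never lowers login availability, so only the components on the critical path need the expensive Multi-AZ design.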


4. Availability vs Cost Trade-off (Very Important)

Design Choice | Availability | Cost
--------------|--------------|----------
Single AZ     | Low          | Low
Multi-AZ      | High         | Medium
Multi-Region  | Very High    | Very High

Exam Tip:

👉 Do NOT choose Multi-Region unless explicitly required


5. AWS Services for Availability Design


High Availability Services

  • Amazon EC2 with Auto Scaling
  • Elastic Load Balancing (ELB)
  • Amazon RDS (Multi-AZ)
  • Amazon DynamoDB
  • Amazon S3 (designed for 99.999999999% durability; S3 Standard is designed for 99.99% availability)

Cost-Optimized Options

  • EC2 in a single AZ (lower availability)
  • Spot Instances (interruptible)
  • AWS Lambda (pay per use; suits intermittent workloads)
  • Amazon ECS / Fargate (scale when needed)

6. Decision Framework (Exam Ready)

When you see a question:


Step 1: Identify Workload Type

  • Production → High availability
  • Dev/Test → Low availability
  • Batch → Flexible

Step 2: Check Requirements

  • Is downtime acceptable?
  • Is it user-facing?
  • Does it need real-time response?

Step 3: Choose Architecture

Requirement          | Solution
---------------------|-----------------------
High availability    | Multi-AZ + Auto Scaling
Cost optimization    | Single AZ or Spot
Extreme availability | Multi-Region
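The three steps can be written as a small lookup. This is a study-aid sketch that encodes this page's rules of thumb, not an official AWS decision tree; the function name `choose_architecture`, its parameters, and the workload labels are hypothetical:

```python
def choose_architecture(
    workload: str,
    downtime_acceptable: bool,
    user_facing: bool,
    must_survive_region_failure: bool = False,
) -> list[str]:
    """Map workload type (Step 1) and requirements (Step 2) to a design (Step 3)."""
    if workload not in {"production", "dev/test", "batch"}:
        raise ValueError(f"unknown workload type: {workload}")
    # Only go Multi-Region when surviving a Region failure is explicitly required
    if must_survive_region_failure:
        return ["Multi-Region architecture"]
    # Production, user-facing, or downtime-intolerant -> high availability
    if workload == "production" or user_facing or not downtime_acceptable:
        return ["Multi-AZ", "Load balancer", "Auto Scaling"]
    # Interruptible background work -> cheapest compute plus retries
    if workload == "batch":
        return ["Spot Instances", "Queue-based processing", "Automatic retries"]
    # Remaining case: dev/test where downtime is fine
    return ["Single AZ", "Small EC2 instances", "No Auto Scaling"]


print(choose_architecture("dev/test", downtime_acceptable=True, user_facing=False))
```

Putting the Multi-Region check behind an explicit flag mirrors the exam tip: it is never the default answer, only the answer when the question names Region-level failure.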

7. Common Exam Scenarios


Scenario 1:

“Customer-facing application must always be available”

✅ Use:

  • Multi-AZ
  • Load balancer
  • Auto Scaling

Scenario 2:

“Development environment with minimal cost”

✅ Use:

  • Single AZ
  • Small EC2
  • No auto scaling

Scenario 3:

“Batch jobs can be interrupted”

✅ Use:

  • Spot Instances
  • Queue-based processing

Scenario 4:

“System must survive entire region failure”

✅ Use:

  • Multi-Region architecture

8. Key Exam Takeaways

✔ Availability must match workload importance
✔ Production = High availability (Multi-AZ)
✔ Non-production = Low cost (Single AZ)
✔ Batch = Flexible (Spot + retry)
✔ Higher availability = Higher cost
✔ Avoid over-architecting


9. Quick Memory Summary

  • Production → Multi-AZ + Auto Scaling
  • Dev/Test → Single AZ
  • Batch → Spot Instances
  • Extreme requirement → Multi-Region