Task Statement 4.2: Design cost-optimized compute solutions.
📘 AWS Certified Solutions Architect – Associate (SAA-C03)
1. What is Availability?
Availability is the proportion of time your system is up and able to serve requests.
It is usually expressed as a percentage:
| Availability | Downtime per year |
|---|---|
| 99% | ~3.65 days |
| 99.9% | ~8.76 hours |
| 99.99% | ~52 minutes |
| 99.999% | ~5 minutes |
👉 Higher availability = less downtime but higher cost
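The downtime figures in the table come from simple arithmetic: the unavailable fraction multiplied by the hours in a year. A minimal sketch, assuming a 365-day year (the function name is illustrative, not an AWS API):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, allowed by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


# Print the downtime budget for each target listed in the table.
for target in (99.0, 99.9, 99.99, 99.999):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is why every step up tends to cost noticeably more.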
2. Key Exam Concept
👉 Not all workloads need the same availability
You must:
- Match availability to business importance
- Avoid over-engineering (wasting money)
- Avoid under-design (risking failures)
3. Workload Classification
For the exam, think of workloads in the following categories:
A. Production Workloads (High Availability Required)
What are they?
Systems that are:
- Live (used by real users)
- Business-critical
- Require continuous uptime
Examples (IT context):
- Web applications used by customers
- Backend APIs for mobile apps
- Payment processing systems
- Authentication services
Required Availability:
👉 High (99.9% to 99.99% or higher)
Design Requirements
1. Multi-AZ Deployment
- Use multiple Availability Zones (AZs)
- If one AZ fails → system still works
2. Load Balancing
- Use:
  - Application Load Balancer (ALB)
  - Network Load Balancer (NLB)
3. Auto Scaling
- Automatically replace failed instances
- Handle traffic spikes
4. Fault Tolerance
- No single point of failure
5. Data Replication
- Use:
  - Amazon RDS Multi-AZ deployments
  - Amazon DynamoDB (replicated across multiple AZs in a Region by default)
Cost Consideration
- High cost (more resources)
- Justified because downtime = business loss
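To see why Multi-AZ raises availability, it can help to run the redundancy arithmetic. The sketch below assumes each AZ fails independently and that the system stays up as long as at least one AZ is healthy. Both are idealized assumptions, and the 99% per-AZ figure is hypothetical, not an AWS SLA value.

```python
def combined_availability(single_az_availability: float, az_count: int) -> float:
    """Availability of az_count redundant copies, assuming independent failures.

    The system is down only when every copy is down at the same time,
    so the combined availability is 1 - (1 - a) ** n.
    """
    if not 0 <= single_az_availability <= 1:
        raise ValueError("single_az_availability must be between 0 and 1")
    if az_count < 1:
        raise ValueError("az_count must be at least 1")
    return 1 - (1 - single_az_availability) ** az_count


# Hypothetical per-AZ availability of 99%, deployed in 1, 2, and 3 AZs.
for count in (1, 2, 3):
    print(f"{count} AZ(s): {combined_availability(0.99, count):.6f}")
```

Real failures are not fully independent, so treat the result as an upper bound rather than a guarantee.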
B. Non-Production Workloads (Lower Availability Required)
What are they?
Systems used for:
- Development
- Testing
- Staging
Examples (IT context):
- QA testing environments
- Developer sandboxes
- Pre-production staging servers
Required Availability:
👉 Low to Medium (90%–99%)
Design Requirements
1. Single AZ Deployment
- Cheaper than Multi-AZ
- Acceptable downtime
2. Manual Recovery Allowed
- No need for automatic failover
3. Limited Scaling
- No need for full auto scaling
4. Smaller Instance Sizes
- Reduce cost
Cost Consideration
- Must be low cost
- Downtime is acceptable
C. Batch / Background Workloads
What are they?
Processes that:
- Run in the background
- Are not user-facing
- Can be delayed or retried
Examples:
- Data processing jobs
- Log analysis
- Report generation
Required Availability:
👉 Flexible (can tolerate interruptions)
Design Approach
1. Use Spot Instances
- Very cheap (up to 90% off On-Demand prices)
- Can be interrupted by AWS with a two-minute warning
2. Retry Mechanisms
- Jobs should restart automatically
3. Queue-Based Systems
- Use:
  - Amazon SQS
  - AWS Batch
Cost Optimization
- Maximum savings possible
- Availability is not strict
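The queue-plus-retry idea can be sketched without any AWS services. The example below uses an in-memory queue as a stand-in for Amazon SQS and a random failure as a stand-in for a Spot interruption. All names (`Interrupted`, `run_job`, `process_queue`, `MAX_ATTEMPTS`) are hypothetical and exist only for illustration.

```python
import random
from collections import deque

MAX_ATTEMPTS = 5  # hypothetical retry limit for this sketch


class Interrupted(Exception):
    """Stands in for a Spot interruption hitting a worker mid-job."""


def run_job(job_id: str, rng: random.Random, interruption_rate: float) -> str:
    """Pretend to run a job; fail with the given probability."""
    if rng.random() < interruption_rate:
        raise Interrupted(job_id)
    return f"{job_id}: done"


def process_queue(job_ids, interruption_rate=0.3, seed=0):
    """Run every job, putting interrupted jobs back on the queue for a retry."""
    rng = random.Random(seed)
    pending = deque((job_id, 1) for job_id in job_ids)  # (job, attempt number)
    completed, failed = [], []
    while pending:
        job_id, attempt = pending.popleft()
        try:
            completed.append(run_job(job_id, rng, interruption_rate))
        except Interrupted:
            if attempt < MAX_ATTEMPTS:
                pending.append((job_id, attempt + 1))  # retry later
            else:
                failed.append(job_id)  # give up after MAX_ATTEMPTS tries
    return completed, failed


completed, failed = process_queue(["report-1", "report-2", "report-3"])
print("completed:", completed)
print("failed:", failed)
```

Because an interrupted job simply returns to the queue, losing a worker delays the work but does not lose it, which is what makes interruptible capacity acceptable for this workload class.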
D. Critical vs Non-Critical Components (Inside Same System)
Even within one application:
| Component | Availability |
|---|---|
| Authentication API | High |
| Logging service | Medium |
| Analytics dashboard | Low |
👉 Design each component separately
4. Availability vs Cost Trade-off (Very Important)
| Design Choice | Availability | Cost |
|---|---|---|
| Single AZ | Low | Low |
| Multi-AZ | High | Medium |
| Multi-Region | Very High | Very High |
Exam Tip:
👉 Do NOT choose Multi-Region unless explicitly required
5. AWS Services for Availability Design
High Availability Services
- Amazon EC2 with Auto Scaling
- Elastic Load Balancing (ELB)
- Amazon RDS (Multi-AZ)
- Amazon DynamoDB
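- Amazon S3 (S3 Standard is designed for 99.999999999% durability and 99.99% availability; durability and availability are separate measures)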
Cost-Optimized (Often Lower Availability)
- EC2 in single AZ
- Spot Instances
- AWS Lambda (for intermittent workloads)
- Amazon ECS / Fargate (scale when needed)
6. Decision Framework (Exam Ready)
When you see a question:
Step 1: Identify Workload Type
- Production → High availability
- Dev/Test → Low availability
- Batch → Flexible
Step 2: Check Requirements
- Is downtime acceptable?
- Is it user-facing?
- Does it need real-time response?
Step 3: Choose Architecture
| Requirement | Solution |
|---|---|
| High availability | Multi-AZ + Auto Scaling |
| Cost optimization | Single AZ or Spot |
| Extreme availability | Multi-Region |
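The three steps above can be condensed into a small lookup, shown here as a study aid rather than an official AWS decision tool. The function name, the workload labels, and the `must_survive_region_failure` flag are assumptions made for this sketch.

```python
def recommend_architecture(
    workload_type: str, must_survive_region_failure: bool = False
) -> str:
    """Map a workload type to the architecture suggested in these notes."""
    # An explicit Region-failure requirement overrides everything else.
    if must_survive_region_failure:
        return "Multi-Region"

    recommendations = {
        "production": "Multi-AZ + Auto Scaling",
        "dev/test": "Single AZ",
        "batch": "Spot Instances + retry",
    }
    key = workload_type.strip().lower()
    if key not in recommendations:
        raise ValueError(f"unknown workload type: {workload_type!r}")
    return recommendations[key]


print(recommend_architecture("production"))
print(recommend_architecture("Dev/Test"))
print(recommend_architecture("batch"))
print(recommend_architecture("production", must_survive_region_failure=True))
```

Multi-Region is returned only when the requirement is stated explicitly, mirroring the exam tip about not choosing it by default.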
7. Common Exam Scenarios
Scenario 1:
“Customer-facing application must always be available”
✅ Use:
- Multi-AZ
- Load balancer
- Auto Scaling
Scenario 2:
“Development environment with minimal cost”
✅ Use:
- Single AZ
- Small EC2
- No auto scaling
Scenario 3:
“Batch jobs can be interrupted”
✅ Use:
- Spot Instances
- Queue-based processing
Scenario 4:
“System must survive entire region failure”
✅ Use:
- Multi-Region architecture
8. Key Exam Takeaways
✔ Availability must match workload importance
✔ Production = High availability (Multi-AZ)
✔ Non-production = Low cost (Single AZ)
✔ Batch = Flexible (Spot + retry)
✔ Higher availability = Higher cost
✔ Avoid over-architecting
9. Quick Memory Summary
- Production → Multi-AZ + Auto Scaling
- Dev/Test → Single AZ
- Batch → Spot Instances
- Extreme requirement → Multi-Region
