Task Statement 4.4: Design cost-optimized network architectures.
📘 AWS Certified Solutions Architect – Associate (SAA-C03)
1. What is Throttling?
Throttling means limiting the number of requests a system accepts within a specific time period.
In AWS architectures, throttling is used to:
- Protect backend services from overload
- Prevent unexpected traffic spikes from increasing cost
- Ensure fair usage between users or applications
- Maintain predictable performance
When limits are exceeded, systems typically return:
- HTTP 429 (Too Many Requests) error
- Or queue/delay the request instead of rejecting it
2. Why Throttling is Important for Cost Optimization
Without throttling:
- Applications may over-consume expensive services
- Sudden traffic spikes may increase scaling costs
- Downstream services (databases, APIs) may fail under load
- Retry storms can multiply costs
With proper throttling:
- You avoid unnecessary scaling
- You control API usage costs
- You smooth traffic into predictable workloads
- You reduce wasted compute and database capacity
3. Common AWS Services That Support Throttling
3.1 Amazon API Gateway
API Gateway provides built-in throttling using:
- Steady-state rate limit (requests per second)
- Burst limit (temporary spikes)
You can apply throttling at:
- API level
- Stage level
- Method level
- Per API key (usage plans)
Exam focus:
- Use API Gateway throttling to protect backend services like Lambda or EC2
- Return 429 when limits are exceeded
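The steady-state rate plus burst limit behaves like a token bucket. The sketch below is a conceptual model of that behavior in plain Python (not the API Gateway implementation itself): tokens refill at the steady-state rate, the bucket size is the burst limit, and an empty bucket maps to an HTTP 429.

```python
import time

class TokenBucket:
    """Conceptual model of API Gateway's limits: tokens refill at the
    steady-state rate; the bucket size is the burst limit."""
    def __init__(self, rate, burst):
        self.rate = rate          # steady-state requests per second
        self.capacity = burst     # burst limit
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at the burst size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True           # forward to the backend
        return False              # reject with HTTP 429

bucket = TokenBucket(rate=10, burst=5)
results = [bucket.allow() for _ in range(8)]   # 8 back-to-back calls
```

With a burst limit of 5, the first five back-to-back calls pass and the remaining three are throttled, since almost no refill time elapses between them.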
3.2 AWS WAF Rate-Based Rules
AWS WAF protects web applications by limiting request rates from:
- IP addresses
- User agents
- Request patterns
Use case:
- Blocking or throttling abusive traffic (e.g., excessive login attempts or scraping)
Exam focus:
- WAF is used at the edge (CloudFront / ALB) level
- Best for security-driven throttling
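A rate-based rule is expressed as a rule object inside a web ACL. The dictionary below shows the shape of a WAFv2 `RateBasedStatement` as it would be passed to `CreateWebACL`/`UpdateWebACL`; the rule name, limit, and metric name are illustrative placeholders.

```python
# Shape of a WAFv2 rate-based rule; names and numbers are illustrative.
rate_rule = {
    "Name": "limit-per-ip",
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 1000,              # max requests per 5-minute window
            "AggregateKeyType": "IP",   # count requests per source IP
        }
    },
    "Action": {"Block": {}},            # or {"Count": {}} to observe first
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "limit-per-ip",
    },
}
```

Note that the `Limit` applies per 5-minute rolling window, per aggregation key (here, per source IP), which is why rate-based rules suit blocking scrapers and brute-force login attempts.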
3.3 Amazon CloudFront Throttling (Edge Control)
CloudFront helps reduce origin load by:
- Caching responses at edge locations
- Reducing repeated requests to origin servers
- Combining with WAF for rate limiting
Exam focus:
- Use CloudFront when traffic is global and repetitive
- Reduces cost by avoiding origin compute/database calls
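The cost effect of edge caching is easy to see in a toy model. The sketch below simulates a TTL cache in front of an origin (it is not CloudFront itself): 100 viewer requests for the same path translate into a single origin call, which is the request load and cost you avoid paying for.

```python
import time

origin_calls = 0   # how often the origin (e.g. ALB/EC2) is actually hit

def origin_fetch(path):
    # Stand-in for an origin request behind the CDN
    global origin_calls
    origin_calls += 1
    return f"body-for-{path}"

cache = {}    # edge cache: path -> (body, expires_at)
TTL = 60.0    # seconds, analogous to Cache-Control: max-age=60

def edge_get(path):
    now = time.monotonic()
    entry = cache.get(path)
    if entry is not None and entry[1] > now:
        return entry[0]                    # cache hit: zero origin load
    body = origin_fetch(path)              # cache miss: one origin call
    cache[path] = (body, now + TTL)
    return body

for _ in range(100):
    edge_get("/index.html")                # 100 viewer requests
```

After the loop, `origin_calls` is 1: every request after the first is served from cache for the duration of the TTL.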
3.4 Application Load Balancer (ALB)
ALB does not directly throttle requests, but it helps with:
- Distributing traffic evenly across targets
- Preventing single instance overload
- Integrating with AWS WAF for rate limits
Exam focus:
- ALB = traffic distribution layer, not strict throttling layer
3.5 Amazon SQS (Buffering Instead of Throttling)
Instead of rejecting requests, SQS provides decoupling and buffering:
- Incoming requests are stored in a queue
- Workers process messages at a controlled rate
Why this is important:
- Prevents backend overload
- Smooths traffic spikes
- Reduces need for over-provisioning
Exam keyword:
- “Buffer instead of throttle” → choose SQS
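The buffering pattern can be sketched with Python's standard library, using an in-process queue as a stand-in for SQS: a burst of 50 requests is absorbed by the queue, and a single worker drains it one message at a time, so the backend never sees the spike and nothing is rejected.

```python
import queue
import threading

buffer = queue.Queue()   # stand-in for the SQS queue
processed = []           # work completed by the backend

def worker():
    # Drains one message at a time: the backend sees a smooth,
    # bounded flow even when producers send a burst.
    while True:
        msg = buffer.get()
        if msg is None:          # sentinel to stop the worker
            break
        processed.append(msg)    # stand-in for the real backend work

t = threading.Thread(target=worker)
t.start()
for i in range(50):              # traffic spike: 50 requests at once
    buffer.put(i)
buffer.put(None)
t.join()
```

All 50 messages are eventually processed in order; the trade-off versus hard throttling is latency, not data loss.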
3.6 AWS Lambda Concurrency Limits
Lambda supports throttling using:
- Reserved concurrency (hard limit)
- Account-level concurrency limit
When the limit is exceeded:
- Requests are throttled (the invocation returns a throttling error)
- Asynchronous invocations are retried automatically and can be routed to a DLQ (Dead-Letter Queue)
Exam focus:
- Use reserved concurrency to protect downstream services (databases, APIs)
3.7 Amazon DynamoDB Throttling
DynamoDB throttles when capacity is exceeded:
- Provisioned throughput limits (RCU/WCU)
- On-demand scaling reduces but does not eliminate throttling
Exam focus:
- Use auto scaling or on-demand to reduce throttling risk
- Throttling appears as “ProvisionedThroughputExceededException”
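Avoiding throttling starts with the capacity math: 1 WCU covers one write per second of an item up to 1 KB, and 1 RCU covers one strongly consistent read per second of up to 4 KB (eventually consistent reads cost half), with item sizes rounded up. A minimal sizing sketch:

```python
import math

# 1 WCU = one write/sec of up to 1 KB; 1 RCU = one strongly consistent
# read/sec of up to 4 KB (eventually consistent costs half).
# Item sizes round up to the next capacity unit.
def wcu_per_write(item_kb):
    return math.ceil(item_kb / 1.0)

def rcu_per_read(item_kb, consistent=True):
    units = math.ceil(item_kb / 4.0)
    return units if consistent else math.ceil(units / 2)

# 100 writes/sec of 2.5 KB items: provision at least 300 WCUs, or expect
# ProvisionedThroughputExceededException under sustained load.
needed_wcu = 100 * wcu_per_write(2.5)
```

This also shows the cost angle: a 2.5 KB item rounds up to 3 WCUs per write, so shrinking items below a unit boundary directly reduces provisioned capacity.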
3.8 Amazon Kinesis Shard-Based Throttling
Kinesis controls throughput using shards:
- Each shard has read/write limits
- If exceeded → throttling occurs
Exam focus:
- Increase shard count to handle higher throughput
- Use for streaming workloads with controlled ingestion rate
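Shard sizing follows directly from the per-shard write limits (up to 1 MB/sec or 1,000 records/sec, whichever is hit first). A small helper, assuming those published limits:

```python
import math

# Each Kinesis Data Streams shard accepts up to 1 MB/sec or
# 1,000 records/sec of writes, whichever limit is reached first.
def shards_needed(write_mb_per_sec, records_per_sec):
    return max(
        math.ceil(write_mb_per_sec / 1.0),     # bytes-based limit
        math.ceil(records_per_sec / 1000.0),   # record-count limit
        1,                                     # a stream needs >= 1 shard
    )

# 5 MB/s of small records arriving at 12,000 records/s:
n = shards_needed(5, 12_000)   # record rate dominates here
```

Note that many small records can exhaust the record-count limit long before the byte limit, so shard count must be sized against both dimensions.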
4. Types of Throttling Strategies
4.1 Hard Throttling (Reject Requests)
- Requests above limit are rejected immediately
- Returns 429 error
- Used in API Gateway, WAF, Lambda
✔ Pros:
- Protects backend instantly
✖ Cons:
- Requests are lost unless retried
4.2 Soft Throttling (Queue-Based)
- Requests are accepted but delayed
- Uses buffering systems like SQS
✔ Pros:
- No data loss
- Smooth processing
✖ Cons:
- Increased latency
4.3 Adaptive Throttling
- System dynamically adjusts limits based on load
- Used in autoscaling architectures
✔ Pros:
- Efficient resource use
✖ Cons:
- More complex to configure
4.4 Client-Side Throttling
- Clients control request rate
- Uses retry logic with exponential backoff
Common pattern:
- Retry after delay
- Increase delay after each failure
Exam focus:
- Helps prevent retry storms
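The pattern above is usually implemented as exponential backoff with jitter: the delay ceiling doubles after each failure, and the actual sleep is a random fraction of it so that clients don't retry in lockstep. A minimal sketch (computing the delays rather than sleeping, for clarity):

```python
import random

def backoff_delays(max_retries=5, base=0.1, cap=5.0, rng=random.random):
    # Exponential backoff with "full jitter": double the ceiling each
    # attempt, then pick a random point below it.
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(ceiling * rng())
    return delays

# With jitter pinned to 1.0 the pure exponential schedule is visible:
delays = backoff_delays(rng=lambda: 1.0)   # [0.1, 0.2, 0.4, 0.8, 1.6]
```

The jitter is what prevents retry storms: without it, every throttled client retries at exactly the same moments and recreates the original spike.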
5. Key Design Patterns for the Exam
5.1 Rate Limiting at the Edge
Use:
- CloudFront + WAF
✔ Best for:
- Web applications exposed to the internet
- Reducing attack traffic early
5.2 API-Level Throttling
Use:
- API Gateway usage plans
✔ Best for:
- SaaS APIs
- Partner integrations
- Multi-tenant systems
5.3 Queue-Based Load Leveling
Use:
- SQS + Lambda/EC2 workers
✔ Best for:
- High burst workloads
- Background processing
5.4 Backend Protection Throttling
Use:
- Lambda reserved concurrency
- DynamoDB capacity limits
✔ Best for:
- Protecting databases and compute services
6. Cost Optimization Angle (Very Important for Exam)
Throttling directly reduces cost by:
- Preventing over-scaling of compute resources
- Avoiding unnecessary database capacity increases
- Reducing failed requests and retries
- Smoothing traffic to avoid peak provisioning
7. Common Exam Scenarios
Scenario 1:
“Sudden traffic spikes are causing backend failures”
✔ Answer:
- Use SQS buffering or API Gateway throttling
Scenario 2:
“Prevent abusive users from overwhelming an API”
✔ Answer:
- Use AWS WAF rate-based rules
Scenario 3:
“Backend database is overloaded by API requests”
✔ Answer:
- Use API Gateway throttling or Lambda concurrency limits
Scenario 4:
“Need global caching and reduced origin load”
✔ Answer:
- Use CloudFront edge caching + WAF
8. Quick Exam Tips
- API Gateway = request throttling
- WAF = security-based rate limiting
- SQS = buffering (preferred over rejecting)
- Lambda = concurrency control
- DynamoDB = capacity throttling
- Kinesis = shard throughput limits
- CloudFront = reduces request load via caching
9. Summary
A good throttling strategy in AWS is about:
- Controlling request flow
- Protecting backend systems
- Reducing unnecessary scaling costs
- Choosing between rejecting (throttle) or buffering (queue)
For the exam, always match the solution to:
- Where throttling happens (edge, API, backend)
- Whether requests should be rejected or queued
- Cost vs performance trade-offs
