Task Statement 4.4: Design cost-optimized network architectures.
📘 AWS Certified Solutions Architect – Associate (SAA-C03)
1. What is Throttling?
Throttling means limiting the number of requests a system accepts within a specific time period.
In AWS architectures, throttling is used to:
- Protect backend services from overload
- Prevent unexpected traffic spikes from increasing cost
- Ensure fair usage between users or applications
- Maintain predictable performance
When limits are exceeded, systems typically return:
- HTTP 429 (Too Many Requests) error
- Or queue/delay the request instead of rejecting it
2. Why Throttling is Important for Cost Optimization
Without throttling:
- Applications may over-consume expensive services
- Sudden traffic spikes may increase scaling costs
- Downstream services (databases, APIs) may fail under load
- Retry storms can multiply costs
With proper throttling:
- You avoid unnecessary scaling
- You control API usage costs
- You smooth traffic into predictable workloads
- You reduce wasted compute and database capacity
3. Common AWS Services That Support Throttling
3.1 Amazon API Gateway
API Gateway provides built-in throttling using:
- Steady-state rate limit (requests per second)
- Burst limit (temporary spikes)
You can apply throttling at:
- API level
- Stage level
- Method level
- Per API key (usage plans)
Exam focus:
- Use API Gateway throttling to protect backend services like Lambda or EC2
- Return 429 when limits are exceeded
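The steady-state rate plus burst limit behaves like a token bucket. The sketch below is a conceptual model of that behavior in plain Python (not the API Gateway implementation itself): tokens refill at the steady-state rate, the bucket size is the burst limit, and an empty bucket maps to an HTTP 429.

```python
import time

class TokenBucket:
    """Conceptual model of API Gateway's limits: tokens refill at the
    steady-state rate; the bucket size is the burst limit."""
    def __init__(self, rate, burst):
        self.rate = rate          # steady-state requests per second
        self.capacity = burst     # burst limit
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at the burst size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True           # forward to the backend
        return False              # reject with HTTP 429

bucket = TokenBucket(rate=10, burst=5)
results = [bucket.allow() for _ in range(8)]   # 8 back-to-back calls
```

With a burst limit of 5, the first five back-to-back calls pass and the remaining three are throttled, since almost no refill time elapses between them.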
3.2 AWS WAF Rate-Based Rules
AWS WAF protects web applications by limiting request rates from:
- IP addresses
- User agents
- Request patterns
Use case:
- Blocking or throttling abusive traffic (e.g., excessive login attempts or scraping)
Exam focus:
- WAF is used at the edge (CloudFront / ALB) level
- Best for security-driven throttling
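A rate-based rule is expressed as a rule object inside a web ACL. The dictionary below shows the shape of a WAFv2 `RateBasedStatement` as it would be passed to `CreateWebACL`/`UpdateWebACL`; the rule name, limit, and metric name are illustrative placeholders.

```python
# Shape of a WAFv2 rate-based rule; names and numbers are illustrative.
rate_rule = {
    "Name": "limit-per-ip",
    "Priority": 1,
    "Statement": {
        "RateBasedStatement": {
            "Limit": 1000,              # max requests per 5-minute window
            "AggregateKeyType": "IP",   # count requests per source IP
        }
    },
    "Action": {"Block": {}},            # or {"Count": {}} to observe first
    "VisibilityConfig": {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "limit-per-ip",
    },
}
```

Note that the `Limit` applies per 5-minute rolling window, per aggregation key (here, per source IP), which is why rate-based rules suit blocking scrapers and brute-force login attempts.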
3.3 Amazon CloudFront Throttling (Edge Control)
CloudFront helps reduce origin load by:
- Caching responses at edge locations
- Reducing repeated requests to origin servers
- Combining with WAF for rate limiting
Exam focus:
- Use CloudFront when traffic is global and repetitive
- Reduces cost by avoiding origin compute/database calls
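The cost effect of edge caching is easy to see in a toy model. The sketch below simulates a TTL cache in front of an origin (it is not CloudFront itself): 100 viewer requests for the same path translate into a single origin call, which is the request load and cost you avoid paying for.

```python
import time

origin_calls = 0   # how often the origin (e.g. ALB/EC2) is actually hit

def origin_fetch(path):
    # Stand-in for an origin request behind the CDN
    global origin_calls
    origin_calls += 1
    return f"body-for-{path}"

cache = {}    # edge cache: path -> (body, expires_at)
TTL = 60.0    # seconds, analogous to Cache-Control: max-age=60

def edge_get(path):
    now = time.monotonic()
    entry = cache.get(path)
    if entry is not None and entry[1] > now:
        return entry[0]                    # cache hit: zero origin load
    body = origin_fetch(path)              # cache miss: one origin call
    cache[path] = (body, now + TTL)
    return body

for _ in range(100):
    edge_get("/index.html")                # 100 viewer requests
```

After the loop, `origin_calls` is 1: every request after the first is served from cache for the duration of the TTL.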
3.4 Application Load Balancer (ALB)
ALB does not directly throttle requests, but it helps with:
- Distributing traffic evenly across targets
- Preventing single instance overload
- Integrating with AWS WAF for rate limits
Exam focus:
- ALB = traffic distribution layer, not strict throttling layer
3.5 Amazon SQS (Buffering Instead of Throttling)
Instead of rejecting requests, SQS provides decoupling and buffering:
- Incoming requests are stored in a queue
- Workers process messages at a controlled rate
Why this is important:
- Prevents backend overload
- Smooths traffic spikes
- Reduces need for over-provisioning
Exam keyword:
- “Buffer instead of throttle” → choose SQS
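The buffering pattern can be sketched with Python's standard library, using an in-process queue as a stand-in for SQS: a burst of 50 requests is absorbed by the queue, and a single worker drains it one message at a time, so the backend never sees the spike and nothing is rejected.

```python
import queue
import threading

buffer = queue.Queue()   # stand-in for the SQS queue
processed = []           # work completed by the backend

def worker():
    # Drains one message at a time: the backend sees a smooth,
    # bounded flow even when producers send a burst.
    while True:
        msg = buffer.get()
        if msg is None:          # sentinel to stop the worker
            break
        processed.append(msg)    # stand-in for the real backend work

t = threading.Thread(target=worker)
t.start()
for i in range(50):              # traffic spike: 50 requests at once
    buffer.put(i)
buffer.put(None)
t.join()
```

All 50 messages are eventually processed in order; the trade-off versus hard throttling is latency, not data loss.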
3.6 AWS Lambda Concurrency Limits
Lambda supports throttling using:
- Reserved concurrency (hard limit)
- Account-level concurrency limit
When the limit is exceeded:
- Requests are throttled (the invocation returns a throttling error)
- Asynchronous invocations are retried automatically and can be routed to a DLQ (Dead-Letter Queue)
Exam focus:
- Use reserved concurrency to protect downstream services (databases, APIs)
3.7 Amazon DynamoDB Throttling
DynamoDB throttles when capacity is exceeded:
- Provisioned throughput limits (RCU/WCU)
- On-demand scaling reduces but does not eliminate throttling
Exam focus:
- Use auto scaling or on-demand to reduce throttling risk
- Throttling appears as “ProvisionedThroughputExceededException”
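Avoiding throttling starts with the capacity math: 1 WCU covers one write per second of an item up to 1 KB, and 1 RCU covers one strongly consistent read per second of up to 4 KB (eventually consistent reads cost half), with item sizes rounded up. A minimal sizing sketch:

```python
import math

# 1 WCU = one write/sec of up to 1 KB; 1 RCU = one strongly consistent
# read/sec of up to 4 KB (eventually consistent costs half).
# Item sizes round up to the next capacity unit.
def wcu_per_write(item_kb):
    return math.ceil(item_kb / 1.0)

def rcu_per_read(item_kb, consistent=True):
    units = math.ceil(item_kb / 4.0)
    return units if consistent else math.ceil(units / 2)

# 100 writes/sec of 2.5 KB items: provision at least 300 WCUs, or expect
# ProvisionedThroughputExceededException under sustained load.
needed_wcu = 100 * wcu_per_write(2.5)
```

This also shows the cost angle: a 2.5 KB item rounds up to 3 WCUs per write, so shrinking items below a unit boundary directly reduces provisioned capacity.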
3.8 Amazon Kinesis Shard-Based Throttling
Kinesis controls throughput using shards:
- Each shard has read/write limits
- If exceeded → throttling occurs
Exam focus:
- Increase shard count to handle higher throughput
- Use for streaming workloads with controlled ingestion rate
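Shard sizing follows directly from the per-shard write limits (up to 1 MB/sec or 1,000 records/sec, whichever is hit first). A small helper, assuming those published limits:

```python
import math

# Each Kinesis Data Streams shard accepts up to 1 MB/sec or
# 1,000 records/sec of writes, whichever limit is reached first.
def shards_needed(write_mb_per_sec, records_per_sec):
    return max(
        math.ceil(write_mb_per_sec / 1.0),     # bytes-based limit
        math.ceil(records_per_sec / 1000.0),   # record-count limit
        1,                                     # a stream needs >= 1 shard
    )

# 5 MB/s of small records arriving at 12,000 records/s:
n = shards_needed(5, 12_000)   # record rate dominates here
```

Note that many small records can exhaust the record-count limit long before the byte limit, so shard count must be sized against both dimensions.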
4. Types of Throttling Strategies
4.1 Hard Throttling (Reject Requests)
- Requests above limit are rejected immediately
- Returns 429 error
- Used in API Gateway, WAF, Lambda
✔ Pros:
- Protects backend instantly
✖ Cons:
- Requests are lost unless retried
4.2 Soft Throttling (Queue-Based)
- Requests are accepted but delayed
- Uses buffering systems like SQS
✔ Pros:
- No data loss
- Smooth processing
✖ Cons:
- Increased latency
4.3 Adaptive Throttling
- System dynamically adjusts limits based on load
- Used in autoscaling architectures
✔ Pros:
- Efficient resource use
✖ Cons:
- More complex to configure
4.4 Client-Side Throttling
- Clients control request rate
- Uses retry logic with exponential backoff
Common pattern:
- Retry after delay
- Increase delay after each failure
Exam focus:
- Helps prevent retry storms
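The pattern above is usually implemented as exponential backoff with jitter: the delay ceiling doubles after each failure, and the actual sleep is a random fraction of it so that clients don't retry in lockstep. A minimal sketch (computing the delays rather than sleeping, for clarity):

```python
import random

def backoff_delays(max_retries=5, base=0.1, cap=5.0, rng=random.random):
    # Exponential backoff with "full jitter": double the ceiling each
    # attempt, then pick a random point below it.
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(ceiling * rng())
    return delays

# With jitter pinned to 1.0 the pure exponential schedule is visible:
delays = backoff_delays(rng=lambda: 1.0)   # [0.1, 0.2, 0.4, 0.8, 1.6]
```

The jitter is what prevents retry storms: without it, every throttled client retries at exactly the same moments and recreates the original spike.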
5. Key Design Patterns for the Exam
5.1 Rate Limiting at the Edge
Use:
- CloudFront + WAF
✔ Best for:
- Web applications exposed to the internet
- Reducing attack traffic early
5.2 API-Level Throttling
Use:
- API Gateway usage plans
✔ Best for:
- SaaS APIs
- Partner integrations
- Multi-tenant systems
5.3 Queue-Based Load Leveling
Use:
- SQS + Lambda/EC2 workers
✔ Best for:
- High burst workloads
- Background processing
5.4 Backend Protection Throttling
Use:
- Lambda reserved concurrency
- DynamoDB capacity limits
✔ Best for:
- Protecting databases and compute services
6. Cost Optimization Angle (Very Important for Exam)
Throttling directly reduces cost by:
- Preventing over-scaling of compute resources
- Avoiding unnecessary database capacity increases
- Reducing failed requests and retries
- Smoothing traffic to avoid peak provisioning
7. Common Exam Scenarios
Scenario 1:
“Sudden traffic spikes are causing backend failures”
✔ Answer:
- Use SQS buffering or API Gateway throttling
Scenario 2:
“Prevent abusive users from overwhelming an API”
✔ Answer:
- Use AWS WAF rate-based rules
Scenario 3:
“Backend database is overloaded by API requests”
✔ Answer:
- Use API Gateway throttling or Lambda concurrency limits
Scenario 4:
“Need global caching and reduced origin load”
✔ Answer:
- Use CloudFront edge caching + WAF
8. Quick Exam Tips
- API Gateway = request throttling
- WAF = security-based rate limiting
- SQS = buffering (preferred over rejecting)
- Lambda = concurrency control
- DynamoDB = capacity throttling
- Kinesis = shard throughput limits
- CloudFront = reduces request load via caching
9. Summary
A good throttling strategy in AWS is about:
- Controlling request flow
- Protecting backend systems
- Reducing unnecessary scaling costs
- Choosing between rejecting (throttle) or buffering (queue)
For the exam, always match the solution to:
- Where throttling happens (edge, API, backend)
- Whether requests should be rejected or queued
- Cost vs performance trade-offs
