Task Statement 1.3: Design solutions that integrate load balancing to meet high
availability, scalability, and security requirements.
📘 AWS Certified Advanced Networking – Specialty
For the AWS Certified Advanced Networking – Specialty exam, you must understand how load balancers scale, what limits affect them, and how to design architectures that meet high availability, scalability, and security requirements.
This section focuses specifically on what affects the scaling of load balancers in AWS, how AWS handles scaling automatically, and what you must design correctly to avoid bottlenecks.
We will cover:
- What scaling means for load balancers
- Types of AWS load balancers
- Key scaling factors (very important for the exam)
- Limits and quotas
- Traffic patterns and design considerations
- Cross-zone load balancing
- Connection scaling
- Security and scaling impact
- Exam-focused design tips
1. What Does “Scaling” Mean for Load Balancers?
Scaling means the ability to handle:
- More users
- More connections
- More traffic (bandwidth)
- More requests per second
In AWS, scaling is mostly automatic for managed load balancers. But you must design your architecture correctly to avoid hitting limits.
2. AWS Load Balancer Types (Know the Differences)
AWS provides load balancers through the Elastic Load Balancing (ELB) service.
The main types are:
1. Application Load Balancer (ALB)
- Layer 7 (HTTP/HTTPS)
- Smart routing based on:
- URL path
- Host header
- HTTP headers
- Used for web applications and APIs
2. Network Load Balancer (NLB)
- Layer 4 (TCP/UDP/TLS)
- Very high performance
- Static IP support
- Used for high throughput and low latency workloads
3. Gateway Load Balancer (GWLB)
- Used to deploy and scale security appliances
- Works at Layer 3 and 4
- Used for firewall fleets, IDS/IPS
Each type has different scaling characteristics.
3. Key Scaling Factors for Load Balancers (VERY IMPORTANT)
These are the main factors that determine how a load balancer scales:
3.1 New Connections Per Second
This measures how many new client connections are created every second.
Example:
- A login-heavy application
- IoT devices frequently reconnecting
High new connection rates can stress:
- TLS negotiation
- Backend connections
Exam Tip:
NLB handles very high new connection rates better than ALB in extreme throughput scenarios.
3.2 Active Connections
This measures how many connections are open at the same time.
Example:
- Long-lived WebSocket connections
- Streaming applications
- Large file transfers
If you have:
- Many concurrent users
- Persistent connections
Your load balancer must support large numbers of simultaneous open sessions.
3.3 Requests Per Second (RPS)
Especially important for ALB.
Example:
- REST APIs
- Microservices
- High-traffic websites
ALB scales automatically, but you must ensure:
- Target groups can scale
- Auto Scaling Groups are configured correctly
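One common way to keep the backend aligned with ALB request volume is an EC2 Auto Scaling target-tracking policy keyed to requests per target. The sketch below builds such a policy as a plain dict; the resource label is a hypothetical placeholder, and the boto3 call it would feed into is shown only in a comment.

```python
# Sketch: a target-tracking scaling configuration that scales the backend on
# ALB requests per target. The ResourceLabel below is a made-up placeholder
# in the documented "app/<lb>/<id>/targetgroup/<tg>/<id>" format.
policy = {
    "TargetValue": 1000.0,  # desired requests per target
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ALBRequestCountPerTarget",
        "ResourceLabel": "app/my-alb/1234567890abcdef/targetgroup/my-tg/abcdef1234567890",
    },
}
# With boto3 this dict would be passed as TargetTrackingConfiguration to
# autoscaling.put_scaling_policy(..., PolicyType="TargetTrackingScaling").
```

With this in place, the Auto Scaling Group adds or removes instances as the per-target request rate drifts from the target value, so the ALB's own scaling is matched by backend capacity.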
3.4 Throughput (Bandwidth)
Measured in:
- Mbps or Gbps
Example:
- Video streaming
- Large downloads
- Backup services
NLB supports extremely high throughput and can scale to handle millions of requests per second.
If throughput is your main requirement → NLB is often preferred.
3.5 TLS/SSL Handshake Rate
TLS termination consumes CPU resources.
High handshake rates occur when:
- Clients reconnect often
- Sessions are not reused
- Keep-alive is not enabled
For heavy HTTPS workloads:
- Ensure TLS session reuse
- Consider NLB with TLS offloading
- Ensure backend capacity matches
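The impact of connection reuse on the handshake rate can be modeled with simple arithmetic: each new connection costs one full TLS handshake, while requests on a kept-alive connection do not. The numbers below are illustrative, not measured values.

```python
# Rough model: how connection reuse cuts the TLS handshake rate.
# One full handshake per new connection; reused connections cost none.
def handshake_rate(requests_per_sec: float, requests_per_connection: float) -> float:
    """New TLS handshakes per second, assuming one handshake per connection."""
    return requests_per_sec / requests_per_connection

no_keepalive = handshake_rate(10_000, 1)      # every request opens a connection
with_keepalive = handshake_rate(10_000, 100)  # 100 requests per kept-alive connection
# 10,000 handshakes/s without keep-alive vs 100 handshakes/s with it
```

A 100x drop in handshake rate for the same request volume is why session reuse and keep-alive matter so much for heavy HTTPS workloads.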
3.6 Rule Evaluations (ALB-Specific)
ALB evaluates routing rules.
Example:
- Path-based routing
- Host-based routing
- Header-based routing
More rules = more processing.
Large numbers of rules may affect:
- Performance
- Cost (LCUs)
3.7 Load Balancer Capacity Units (LCUs)
ALB and NLB scale based on LCUs.
LCUs consider:
- New connections
- Active connections
- Processed bytes
- Rule evaluations (ALB only)
The highest usage dimension determines how many LCUs are consumed.
Exam Insight:
If traffic suddenly increases in one dimension (e.g., TLS handshakes), cost and scaling behavior are affected.
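The max-over-dimensions behavior is easy to see in a small calculation. The per-LCU allowances below are the published ALB values at the time of writing (verify against current AWS pricing before relying on them).

```python
# Sketch of the ALB LCU model: usage in each dimension is divided by that
# dimension's per-LCU allowance, and the MAXIMUM ratio is what you pay for.
# Allowances are the published ALB figures at the time of writing.
ALB_LCU_ALLOWANCES = {
    "new_connections_per_sec": 25,
    "active_connections_per_min": 3000,
    "processed_gb_per_hour": 1,
    "rule_evaluations_per_sec": 1000,  # first 10 rules are evaluated free
}

def alb_lcus(usage: dict) -> float:
    """LCUs consumed = highest usage dimension relative to its allowance."""
    return max(usage[dim] / allowance for dim, allowance in ALB_LCU_ALLOWANCES.items())

# A TLS-heavy workload: new connections dominate even at modest bandwidth.
usage = {
    "new_connections_per_sec": 500,       # 500 / 25    = 20 LCUs
    "active_connections_per_min": 30_000, # 30000 / 3000 = 10 LCUs
    "processed_gb_per_hour": 5,           # 5 / 1        = 5 LCUs
    "rule_evaluations_per_sec": 2000,     # 2000 / 1000  = 2 LCUs
}
# alb_lcus(usage) -> 20.0, driven entirely by the new-connection dimension
```

Here a spike in new connections (e.g., from frequent TLS reconnects) quadruples the bill relative to the bandwidth dimension, which is exactly the exam insight above.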
4. How AWS Handles Scaling
AWS load balancers scale automatically.
But scaling is not instant.
You must consider:
- Traffic spikes
- Sudden flash traffic
- DDoS-like patterns
For predictable spikes:
- Request pre-warming through AWS Support (less common now, since managed load balancers generally scale on their own, but it may still apply in extreme cases)
For unpredictable spikes:
- Use auto scaling backend targets
- Use multiple Availability Zones
5. Availability Zone (AZ) Scaling
Load balancers are deployed across multiple AZs.
Best practice:
- Enable at least two AZs
- Register targets in each AZ
Why?
If one AZ fails:
- Traffic shifts to healthy AZs
If you only register targets in one AZ:
- You create a single point of failure
6. Cross-Zone Load Balancing
When enabled:
Traffic is distributed evenly across all targets in all AZs.
When disabled:
Each load balancer node routes only to targets in its own AZ.
For even distribution and better scaling, cross-zone load balancing is often recommended.
Know the defaults: cross-zone load balancing is enabled by default on ALB, but disabled by default on NLB and GWLB, and enabling it on NLB can incur inter-AZ data transfer charges.
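The effect of the setting is clearest with an uneven fleet. The toy model below assumes two load balancer nodes (one per AZ) each receiving half the traffic, with 2 targets in one AZ and 8 in the other; the numbers are illustrative.

```python
# Per-target load with cross-zone load balancing off vs. on, for an uneven
# fleet: 2 targets in AZ-A and 8 targets in AZ-B.
targets = {"az-a": 2, "az-b": 8}
total_traffic = 100.0  # arbitrary units, split evenly across the two LB nodes

# Disabled: each AZ's node keeps its 50% share inside its own AZ.
per_target_off = {az: (total_traffic / len(targets)) / n for az, n in targets.items()}
# az-a targets get 25.0 each; az-b targets get 6.25 each -> 4x imbalance

# Enabled: traffic is spread across all 10 targets regardless of AZ.
per_target_on = total_traffic / sum(targets.values())  # 10.0 each
```

With cross-zone disabled, the two AZ-A targets carry four times the load of their AZ-B peers; enabling it evens out per-target load across the fleet.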
7. Backend Scaling Dependency (Very Important Concept)
Load balancer scaling alone is useless if:
- EC2 instances cannot scale
- Containers cannot scale
- Databases cannot handle load
Load balancer scaling must be aligned with:
- Auto Scaling Groups
- ECS services
- EKS deployments
- Lambda concurrency limits
Scaling is an end-to-end design decision.
8. Connection Draining / Deregistration Delay
When scaling down, in-flight connections must be allowed to finish gracefully.
ALB/NLB support:
- Deregistration delay
- Graceful connection draining
This prevents:
- Broken user sessions
- Application errors
Important in auto-scaling environments.
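The mechanic can be sketched as a toy simulation: once a target is deregistered, the load balancer stops sending it new connections but lets in-flight ones run for up to the configured delay before forcing them closed. The default ALB deregistration delay is 300 seconds.

```python
# Toy model of deregistration delay: in-flight connections that finish within
# the delay complete gracefully; longer-lived ones are cut off.
def drain(in_flight_remaining_secs, delay=300):
    """Split in-flight connections into gracefully finished vs. cut off."""
    finished = [t for t in in_flight_remaining_secs if t <= delay]
    cut_off = [t for t in in_flight_remaining_secs if t > delay]
    return finished, cut_off

# With the 300-second default, a 600-second file transfer would be
# interrupted during scale-in unless the delay is raised.
finished, cut = drain([30, 120, 600], delay=300)
```

The design point: size the deregistration delay against your longest expected connection lifetime, not just the default.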
9. Security and Scaling Interaction
Security controls affect scaling:
1. AWS WAF
If attached to ALB:
- High inspection load
- May affect throughput
2. Security Groups
Must allow:
- Load balancer to targets
- Clients to load balancer
3. Gateway Load Balancer
Used when scaling security appliances like:
- Firewalls
- Deep packet inspection systems
GWLB ensures:
- Transparent scaling of security appliances
- High availability of inspection systems
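The security-group pairing from point 2 can be expressed as a sketch. The rules below are plain dicts with illustrative names (`sg-alb` is a placeholder, not a real resource): clients reach the ALB on 443, and only the ALB's security group may reach the targets on their application port.

```python
# Sketch of the paired security-group rules the section describes.
# Names and ports are illustrative.
alb_sg_ingress = {"protocol": "tcp", "port": 443, "source": "0.0.0.0/0"}
target_sg_ingress = {"protocol": "tcp", "port": 8080, "source": "sg-alb"}  # the ALB's SG, not a CIDR

# Referencing the ALB's security group (rather than an IP range) keeps the
# target rule valid as the load balancer scales its nodes and IPs change.
```

This is also why security-group references matter for scaling: hard-coding load balancer node IPs in target rules breaks silently when the load balancer scales out.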
10. Scaling Differences Between ALB and NLB
| Feature | ALB | NLB |
|---|---|---|
| Layer | 7 | 4 |
| Throughput | High | Extremely High |
| Latency | Low | Ultra-low |
| Static IP | No | Yes |
| Best for | Web apps | High-performance TCP/UDP |
Exam scenario examples:
- High RPS API → ALB
- Gaming backend → NLB
- Firewall fleet → GWLB
11. Sudden Traffic Spikes
For exam scenarios involving:
- Marketing campaign
- Product launch
- Large-scale event
You must consider:
- Pre-scaling backend
- Multiple AZs
- Monitoring via CloudWatch
- Avoiding single-AZ design
12. Common Design Mistakes (Exam Traps)
❌ Only one Availability Zone
❌ Backend not auto-scaling
❌ No health checks configured
❌ Security group blocking traffic
❌ Using ALB when ultra-low latency TCP is required
❌ Ignoring TLS handshake scaling
13. Monitoring Scaling Metrics
Key metrics to monitor:
- ActiveConnectionCount
- NewConnectionCount
- ProcessedBytes
- HTTPCode_Target_5XX_Count
- TargetResponseTime
These help detect:
- Saturation
- Bottlenecks
- Backend failures
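A minimal detection sketch over the metrics listed above: flag any dimension whose latest sample crosses a chosen threshold. The thresholds here are illustrative; in practice you would tune them to your workload and wire them into CloudWatch alarms.

```python
# Simple saturation check over the CloudWatch metrics listed above.
def saturated_dimensions(samples: dict, thresholds: dict) -> list:
    """Return metric names whose latest value crosses its threshold."""
    return [m for m, v in samples.items() if v > thresholds.get(m, float("inf"))]

samples = {
    "ActiveConnectionCount": 95_000,
    "TargetResponseTime": 2.4,          # seconds
    "HTTPCode_Target_5XX_Count": 12,
}
thresholds = {
    "ActiveConnectionCount": 80_000,    # illustrative values only
    "TargetResponseTime": 1.0,
    "HTTPCode_Target_5XX_Count": 50,
}
# saturated_dimensions(samples, thresholds)
# -> ["ActiveConnectionCount", "TargetResponseTime"]
```

Two flagged dimensions with low 5XX counts suggests backend saturation (slow but still succeeding) rather than outright backend failure, which changes the remediation.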
14. Full Exam-Ready Summary
To pass this section, you must understand:
✔ Scaling dimensions:
- New connections per second
- Active connections
- Throughput
- Requests per second
- TLS handshakes
- Rule evaluations
✔ LCU model
✔ Multi-AZ deployment
✔ Cross-zone load balancing
✔ Backend scaling alignment
✔ Security impact on performance
✔ Differences between ALB, NLB, and GWLB
✔ Monitoring and bottleneck identification
Final Concept to Remember
Load balancer scaling is not just about handling more traffic.
It is about:
- Maintaining availability
- Preventing bottlenecks
- Protecting security layers
- Ensuring backend systems scale properly
- Designing for failure
For the AWS Advanced Networking Specialty exam, always think:
“If traffic doubles suddenly, will every layer of this architecture handle it?”
If the answer is not clearly yes, the design is incomplete.
