Scaling factors for load balancers

Task Statement 1.3: Design solutions that integrate load balancing to meet high
availability, scalability, and security requirements.

📘 AWS Certified Advanced Networking – Specialty


For the AWS Certified Advanced Networking – Specialty exam, you must understand how load balancers scale, what limits affect them, and how to design architectures that meet high availability, scalability, and security requirements.

This section focuses specifically on what affects the scaling of load balancers in AWS, how AWS handles scaling automatically, and what you must design correctly to avoid bottlenecks.

We will cover:

  1. What scaling means for load balancers
  2. AWS load balancer types
  3. Key scaling factors (very important for the exam)
  4. How AWS handles scaling
  5. Availability Zone (AZ) scaling
  6. Cross-zone load balancing
  7. Backend scaling dependency
  8. Connection draining / deregistration delay
  9. Security and scaling interaction
  10. Scaling differences between ALB and NLB
  11. Sudden traffic spikes
  12. Common design mistakes (exam traps)
  13. Monitoring scaling metrics
  14. Exam-ready summary

1. What Does “Scaling” Mean for Load Balancers?

Scaling means the ability to handle:

  • More users
  • More connections
  • More traffic (bandwidth)
  • More requests per second

In AWS, scaling is mostly automatic for managed load balancers. But you must design your architecture correctly to avoid hitting limits.


2. AWS Load Balancer Types (Know the Differences)

AWS provides managed load balancers through the Elastic Load Balancing (ELB) service.

The main types are:

1. Application Load Balancer (ALB)

  • Layer 7 (HTTP/HTTPS)
  • Smart routing based on:
    • URL path
    • Host header
    • HTTP headers
  • Used for web applications and APIs

2. Network Load Balancer (NLB)

  • Layer 4 (TCP/UDP/TLS)
  • Very high performance
  • Static IP support
  • Used for high throughput and low latency workloads

3. Gateway Load Balancer (GWLB)

  • Used to deploy and scale security appliances
  • Works at Layer 3 and 4
  • Used for firewall fleets, IDS/IPS

Each type has different scaling characteristics.


3. Key Scaling Factors for Load Balancers (VERY IMPORTANT)

These are the main factors that determine how a load balancer scales:


3.1 New Connections Per Second

This measures how many new client connections are created every second.

Example:

  • A login-heavy application
  • IoT devices frequently reconnecting

High new connection rates can stress:

  • TLS negotiation
  • Backend connections

Exam Tip:

NLB handles very high new-connection rates better than ALB, making it the stronger choice in extreme connection-rate scenarios.


3.2 Active Connections

This measures how many connections are open at the same time.

Example:

  • Long-lived WebSocket connections
  • Streaming applications
  • Large file transfers

If you have:

  • Many concurrent users
  • Persistent connections

Your load balancer must support large numbers of simultaneous open sessions.


3.3 Requests Per Second (RPS)

Especially important for ALB.

Example:

  • REST APIs
  • Microservices
  • High-traffic websites

ALB scales automatically, but you must ensure:

  • Target groups can scale
  • Auto Scaling Groups are configured correctly
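The "configure the backend correctly" part usually means a target-tracking scaling policy keyed to requests per target. The sketch below builds the parameters for such a policy in the boto3 `put_scaling_policy` shape; the Auto Scaling group name, resource label, and target value are hypothetical examples.

```python
# Sketch: target-tracking policy that keeps ALB requests-per-target near a
# chosen value, so the backend scales with RPS. Names/ARN parts are
# hypothetical placeholders.

def alb_request_tracking_policy(asg_name: str, resource_label: str,
                                target_rps_per_target: float) -> dict:
    """Build put_scaling_policy parameters (boto3 'autoscaling' client style)
    tracking the ALBRequestCountPerTarget predefined metric."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": "keep-rps-per-target-steady",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ALBRequestCountPerTarget",
                # Format: app/<lb-name>/<lb-id>/targetgroup/<tg-name>/<tg-id>
                "ResourceLabel": resource_label,
            },
            "TargetValue": target_rps_per_target,
        },
    }

policy = alb_request_tracking_policy(
    "web-asg", "app/my-alb/abc123/targetgroup/web-tg/def456", 1000.0)
```

With this in place, the ASG adds or removes instances to hold each target near 1,000 requests per second, keeping backend capacity aligned with what the ALB is forwarding.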

3.4 Throughput (Bandwidth)

Measured in:

  • Mbps or Gbps

Example:

  • Video streaming
  • Large downloads
  • Backup services

NLB supports extremely high throughput and is capable of handling millions of requests per second at ultra-low latency.

If throughput is your main requirement → NLB is often preferred.


3.5 TLS/SSL Handshake Rate

TLS termination consumes CPU resources.

High handshake rates occur when:

  • Clients reconnect often
  • Sessions are not reused
  • Keep-alive is not enabled

For heavy HTTPS workloads:

  • Ensure TLS session reuse
  • Consider NLB with TLS offloading
  • Ensure backend capacity matches
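Why session reuse matters is easiest to see with back-of-the-envelope arithmetic. The sketch below (hypothetical numbers) estimates the rate of full TLS handshakes for a reconnect-heavy fleet, with and without session reuse.

```python
def tls_handshakes_per_second(clients: int, reconnects_per_min: float,
                              session_reuse_rate: float) -> float:
    """Estimate full TLS handshakes/sec. Reused sessions skip the
    CPU-expensive full handshake, so only the (1 - reuse) fraction counts."""
    connections_per_sec = clients * reconnects_per_min / 60.0
    return connections_per_sec * (1.0 - session_reuse_rate)

# 50,000 IoT devices, each reconnecting once per minute:
no_reuse = tls_handshakes_per_second(50_000, 1, 0.0)    # ≈ 833 full handshakes/s
with_reuse = tls_handshakes_per_second(50_000, 1, 0.9)  # ≈ 83 full handshakes/s
```

A 90% session-reuse rate cuts the full-handshake load by an order of magnitude, which directly reduces the new-connection LCU dimension discussed below.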

3.6 Rule Evaluations (ALB-Specific)

ALB evaluates routing rules.

Example:

  • Path-based routing
  • Host-based routing
  • Header-based routing

More rules = more processing.

Large numbers of rules may affect:

  • Performance
  • Cost (LCUs)

3.7 Load Balancer Capacity Units (LCUs)

ALB and NLB scale based on LCUs.

LCUs consider:

  • New connections
  • Active connections
  • Processed bytes
  • Rule evaluations (ALB only)

The highest usage dimension determines how many LCUs are consumed.

Exam Insight:

If traffic suddenly increases in one dimension (e.g., TLS handshakes), cost and scaling behavior are affected.


4. How AWS Handles Scaling

AWS load balancers scale automatically.

But scaling is not instant.

You must consider:

  • Traffic spikes
  • Sudden flash traffic
  • DDoS-like patterns

For predictable spikes:

  • Pre-warm (less common now but may apply in extreme cases)

For unpredictable spikes:

  • Use auto scaling backend targets
  • Use multiple Availability Zones

5. Availability Zone (AZ) Scaling

Load balancers are deployed across multiple AZs.

Best practice:

  • Enable at least two AZs
  • Register targets in each AZ

Why?

If one AZ fails:

  • Traffic shifts to healthy AZs

If you only register targets in one AZ:

  • You create a single point of failure

6. Cross-Zone Load Balancing

When enabled:

Traffic is distributed evenly across all targets in all AZs.

When disabled:

Each load balancer node routes only to targets in its own AZ.

(Cross-zone load balancing is enabled by default for ALB and disabled by default for NLB and GWLB.)

For:

  • Even distribution
  • Better scaling

Cross-zone load balancing is often recommended.
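The effect on distribution is easiest to see with an unbalanced fleet. The sketch below (hypothetical AZ names and target counts) computes each target's share of total traffic, assuming DNS sends an equal slice of traffic to each AZ's load balancer node.

```python
def per_target_share(targets_per_az: dict, cross_zone: bool) -> dict:
    """Fraction of total traffic each target in an AZ receives, assuming
    equal traffic to each AZ's load balancer node."""
    if cross_zone:
        # Every target across all AZs gets an equal share.
        total = sum(targets_per_az.values())
        return {az: 1.0 / total for az in targets_per_az}
    # Each node keeps its slice inside its own AZ.
    az_share = 1.0 / len(targets_per_az)
    return {az: az_share / n for az, n in targets_per_az.items()}

fleet = {"az-a": 2, "az-b": 8}
print(per_target_share(fleet, cross_zone=False))  # az-a targets: 0.25 each, az-b: 0.0625 each
print(per_target_share(fleet, cross_zone=True))   # every target: 0.1
```

Without cross-zone balancing, the two targets in az-a each absorb four times the load of the az-b targets; enabling it evens every target out at 10%.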


7. Backend Scaling Dependency (Very Important Concept)

Load balancer scaling alone is useless if:

  • EC2 instances cannot scale
  • Containers cannot scale
  • Databases cannot handle load

Load balancer scaling must be aligned with:

  • Auto Scaling Groups
  • ECS services
  • EKS deployments
  • Lambda concurrency limits

Scaling is an end-to-end design decision.


8. Connection Draining / Deregistration Delay

When scaling down:

In-flight connections must be allowed to finish properly.

ALB/NLB support:

  • Deregistration delay
  • Graceful connection draining

This prevents:

  • Broken user sessions
  • Application errors

Important in auto-scaling environments.
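Deregistration delay is a target-group attribute. The sketch below builds the parameters in the boto3 `elbv2 modify_target_group_attributes` shape; the ARN is a hypothetical placeholder.

```python
# Sketch: the target-group attribute controlling deregistration delay,
# shaped for elbv2 modify_target_group_attributes (boto3 style).
# The ARN below is a hypothetical placeholder.

def deregistration_delay_params(target_group_arn: str, seconds: int) -> dict:
    """Build the attribute payload; the delay must be 0-3600 seconds."""
    if not 0 <= seconds <= 3600:
        raise ValueError("deregistration delay must be 0-3600 seconds")
    return {
        "TargetGroupArn": target_group_arn,
        "Attributes": [
            {"Key": "deregistration_delay.timeout_seconds",
             "Value": str(seconds)},
        ],
    }

params = deregistration_delay_params(
    "arn:aws:elasticloadbalancing:eu-west-1:111122223333:"
    "targetgroup/web-tg/abc123", 120)
```

A shorter delay speeds up scale-in; a longer one protects long-lived connections. Tune it to your longest expected request or session duration.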


9. Security and Scaling Interaction

Security controls affect scaling:

1. AWS WAF

If attached to ALB:

  • High inspection load
  • May affect throughput

2. Security Groups

Must allow:

  • Load balancer to targets
  • Clients to load balancer

3. Gateway Load Balancer

Used when scaling security appliances like:

  • Firewalls
  • Deep packet inspection systems

GWLB ensures:

  • Transparent scaling of security appliances
  • High availability of inspection systems

10. Scaling Differences Between ALB and NLB

| Feature    | ALB      | NLB                      |
| ---------- | -------- | ------------------------ |
| Layer      | 7        | 4                        |
| Throughput | High     | Extremely high           |
| Latency    | Low      | Ultra-low                |
| Static IP  | No       | Yes                      |
| Best for   | Web apps | High-performance TCP/UDP |

Exam scenario examples:

  • High RPS API → ALB
  • Gaming backend → NLB
  • Firewall fleet → GWLB

11. Sudden Traffic Spikes

For exam scenarios involving:

  • Marketing campaign
  • Product launch
  • Large-scale event

You must consider:

  • Pre-scaling backend
  • Multiple AZs
  • Monitoring via CloudWatch
  • Avoiding single-AZ design

12. Common Design Mistakes (Exam Traps)

❌ Only one Availability Zone
❌ Backend not auto-scaling
❌ No health checks configured
❌ Security group blocking traffic
❌ Using ALB when ultra-low latency TCP is required
❌ Ignoring TLS handshake scaling


13. Monitoring Scaling Metrics

Key metrics to monitor:

  • ActiveConnectionCount
  • NewConnectionCount
  • ProcessedBytes
  • HTTPCode_Target_5XX_Count
  • TargetResponseTime

These help detect:

  • Saturation
  • Bottlenecks
  • Backend failures
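In practice these metrics feed alarms or a simple evaluation loop. The sketch below applies hypothetical thresholds to the metrics listed above; tune the numbers to your own baseline rather than treating them as recommendations.

```python
# Sketch: flag saturation/bottleneck signals from CloudWatch-style metric
# values. Thresholds are hypothetical examples, not AWS recommendations.

def detect_bottlenecks(metrics: dict) -> list:
    findings = []
    if metrics.get("HTTPCode_Target_5XX_Count", 0) > 100:
        findings.append("backend errors: targets returning 5XX")
    if metrics.get("TargetResponseTime", 0.0) > 1.0:        # seconds
        findings.append("backend saturation: slow target responses")
    if metrics.get("NewConnectionCount", 0) > 500_000:      # per period
        findings.append("connection churn: check TLS session reuse")
    return findings

sample = {"HTTPCode_Target_5XX_Count": 250,
          "TargetResponseTime": 2.3,
          "NewConnectionCount": 10_000}
print(detect_bottlenecks(sample))  # two findings: 5XX errors + slow responses
```

A check like this makes the distinction explicit: 5XX counts and response times point at the backend, while connection counts point at the load balancer's scaling dimensions.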

14. Full Exam-Ready Summary

To pass this section, you must understand:

✔ Scaling dimensions:

  • New connections per second
  • Active connections
  • Throughput
  • Requests per second
  • TLS handshakes
  • Rule evaluations

✔ LCU model

✔ Multi-AZ deployment

✔ Cross-zone load balancing

✔ Backend scaling alignment

✔ Security impact on performance

✔ Differences between ALB, NLB, and GWLB

✔ Monitoring and bottleneck identification


Final Concept to Remember

Load balancer scaling is not just about handling more traffic.

It is about:

  • Maintaining availability
  • Preventing bottlenecks
  • Protecting security layers
  • Ensuring backend systems scale properly
  • Designing for failure

For the AWS Advanced Networking Specialty exam, always think:

“If traffic doubles suddenly, will every layer of this architecture handle it?”

If the answer is not clearly yes, the design is incomplete.
