High-availability features in Route 53 (for example, DNS load balancing using health checks with latency and weighted record sets)

Task Statement 3.3: Optimize AWS networks for performance, reliability, and cost-effectiveness.

📘AWS Certified Advanced Networking – Specialty


1. What is High Availability in Route 53?

High availability in DNS means:

  • If one application endpoint fails, users are automatically routed to a healthy endpoint.
  • DNS decisions are made based on health, location, and performance rules.
  • The goal is to keep applications accessible even during failures or performance issues.

In AWS, this is achieved using:

  • Health Checks
  • DNS Failover Routing
  • Latency-Based Routing
  • Weighted Routing
  • Multi-region deployment patterns

2. Core Building Block: Route 53 Health Checks

What is a Health Check?

A Route 53 Health Check continuously monitors the health of:

  • Web servers (HTTP/HTTPS endpoints)
  • TCP services (port-based checks)
  • Other resources, indirectly (through other health checks or CloudWatch alarms)

How it works:

  • Route 53 sends periodic requests to an endpoint.
  • If the endpoint fails a configured number of consecutive checks (the failure threshold) → it is marked UNHEALTHY.
  • If it then passes the same number of consecutive checks → it is marked HEALTHY again.
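
The threshold behaviour above can be sketched as a small state machine. This is a toy model, not Route 53's implementation: the class name `HealthTracker`, its fields, and the default threshold of 3 are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Toy model of threshold-based health evaluation (hypothetical helper)."""

    failure_threshold: int = 3  # consecutive results needed to change state
    healthy: bool = True        # current reported state
    _streak: int = 0            # consecutive results that disagree with the state

    def observe(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so the streak resets.
            self._streak = 0
        else:
            self._streak += 1
            if self._streak >= self.failure_threshold:
                # Enough consecutive contrary results: flip the state.
                self.healthy = check_passed
                self._streak = 0
        return self.healthy


tracker = HealthTracker(failure_threshold=3)
results = [False, False, True, False, False, False, True, True, True]
states = [tracker.observe(r) for r in results]
print(states)
```

Note that the two early failures do not change the state, because a passing check interrupts the streak before the threshold is reached.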

Types of Health Checks

1. Endpoint Health Checks

Checks a specific IP or domain:

  • HTTP / HTTPS response codes (e.g., 200 OK)
  • TCP connection success

2. Calculated Health Checks

Combines multiple health checks:

  • Example logic: “Healthy only if at least 2 out of 3 servers are healthy”
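
The "at least 2 out of 3" rule can be written as a one-line check. The function name `calculated_health` and its parameters are hypothetical and only illustrate the counting logic.

```python
from typing import Sequence


def calculated_health(child_states: Sequence[bool], health_threshold: int) -> bool:
    """Return True when at least `health_threshold` child checks are healthy."""
    healthy_children = sum(1 for state in child_states if state)
    return healthy_children >= health_threshold


# "Healthy only if at least 2 out of 3 servers are healthy"
two_of_three_up = calculated_health([True, True, False], health_threshold=2)
one_of_three_up = calculated_health([True, False, False], health_threshold=2)
print(two_of_three_up, one_of_three_up)  # prints: True False
```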

3. CloudWatch Alarm-Based Health Checks

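  • Uses CloudWatch metrics (CPU, latency, errors) through a CloudWatch alarm
  • If the alarm goes into the ALARM state → the health check is reported as unhealthy
  • Useful for resources that Route 53 health checkers cannot reach directly, such as endpoints in private subnets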
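
A minimal sketch of how an alarm state could map to a health status. The function `alarm_to_health` is hypothetical. The three alarm states are the standard CloudWatch ones, and the handling of missing data mirrors the `Healthy`, `Unhealthy`, and `LastKnownStatus` options that, as far as I know, Route 53 offers for this case.

```python
def alarm_to_health(
    alarm_state: str,
    insufficient_data_status: str = "LastKnownStatus",
    last_known_healthy: bool = True,
) -> bool:
    """Map a CloudWatch alarm state to healthy (True) or unhealthy (False)."""
    if alarm_state == "OK":
        return True
    if alarm_state == "ALARM":
        return False
    if alarm_state == "INSUFFICIENT_DATA":
        # No usable metric data: fall back to the configured behaviour.
        if insufficient_data_status == "Healthy":
            return True
        if insufficient_data_status == "Unhealthy":
            return False
        return last_known_healthy
    raise ValueError(f"unknown alarm state: {alarm_state}")


print(alarm_to_health("ALARM"))
print(alarm_to_health("INSUFFICIENT_DATA", insufficient_data_status="Unhealthy"))
```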

3. DNS Failover Routing (High Availability Feature)

What it does:

Automatically routes traffic to a backup endpoint when the primary fails.

Two modes:

A. Primary–Secondary (Active–Passive)

  • Primary endpoint serves traffic normally
  • Secondary endpoint is used only if primary fails

Example:

  • Primary: application in us-east-1
  • Secondary: application in eu-west-1

If primary fails → DNS switches to secondary.
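
The active-passive decision can be sketched as follows. The function `resolve_failover` and the endpoint names are hypothetical. Returning the primary when both endpoints are unhealthy is an assumption of this sketch, chosen so that a DNS answer is always produced.

```python
from typing import Dict


def resolve_failover(primary: str, secondary: str, health: Dict[str, bool]) -> str:
    """Pick the endpoint to return for an active-passive record pair."""
    if health.get(primary, False):
        return primary    # normal operation
    if health.get(secondary, False):
        return secondary  # primary is down, standby takes over
    return primary        # assumption: answer with the primary if nothing is healthy


normal = resolve_failover(
    "app.us-east-1", "app.eu-west-1",
    {"app.us-east-1": True, "app.eu-west-1": True},
)
failed_over = resolve_failover(
    "app.us-east-1", "app.eu-west-1",
    {"app.us-east-1": False, "app.eu-west-1": True},
)
print(normal, failed_over)
```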


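B. Multi-Primary (Active–Active with health checks)

  • Multiple endpoints serve traffic at the same time
  • Only healthy endpoints receive traffic
  • Configured with a routing policy other than failover (for example weighted, latency-based, or multivalue answer), with a health check attached to each record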

4. Latency-Based Routing (Performance Optimization)

What it does:

Routes users to the AWS region with the lowest network latency.

How it works:

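  • Amazon Route 53 uses latency measurements, collected over time, between networks on the internet and AWS regions.
  • Each DNS query is answered with the record for the region that has the lowest measured latency from the query's source (usually the user's DNS resolver), which is not always the geographically closest region.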

Example architecture:

  • Application deployed in:
    • Region A (Asia)
    • Region B (Europe)
    • Region C (US)

In the typical case, Route 53:

  • Sends Asian users to Region A
  • Sends European users to Region B
  • Sends US users to Region C
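
The selection rule amounts to "pick the region with the smallest latency for this user". The sketch below uses a made-up latency table. The numbers, user labels, and the `pick_region` helper are all hypothetical.

```python
from typing import Dict


def pick_region(latency_ms: Dict[str, float]) -> str:
    """Return the region with the lowest latency for one user."""
    return min(latency_ms, key=lambda region: latency_ms[region])


# Hypothetical latencies in milliseconds, one row per user location.
latency_table = {
    "user-in-asia":   {"region-a": 35,  "region-b": 240, "region-c": 150},
    "user-in-europe": {"region-a": 230, "region-b": 25,  "region-c": 95},
    "user-in-us":     {"region-a": 160, "region-b": 90,  "region-c": 20},
}

choices = {user: pick_region(row) for user, row in latency_table.items()}
print(choices)
```

Because the choice depends on latency rather than geography, changing a single number in the table can send a user to a more distant region.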

Key point for exam:

  • Latency-based routing improves performance, not just availability.

5. Weighted Routing (Traffic Distribution + Testing)

What it does:

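Splits traffic between multiple endpoints in proportion to the weight assigned to each record. Each record's share of traffic is its weight divided by the sum of all weights.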

Example configuration:

  • Server A → weight 70 (70% of traffic)
  • Server B → weight 30 (30% of traffic)
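
As a rough illustration, the two records above could be described with dictionaries like these. The layout follows the resource record set structure of the Route 53 `ChangeResourceRecordSets` API as I understand it. No API call is made here, and the domain name, identifiers, and IP addresses (from the 192.0.2.0/24 documentation range) are hypothetical.

```python
def weighted_record(name: str, set_id: str, weight: int, ip: str, ttl: int = 60) -> dict:
    """Build a dictionary describing one weighted A record (illustrative only)."""
    return {
        "Name": name,
        "Type": "A",
        "SetIdentifier": set_id,  # distinguishes records sharing a name and type
        "Weight": weight,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }


records = [
    weighted_record("app.example.com.", "server-a", 70, "192.0.2.10"),
    weighted_record("app.example.com.", "server-b", 30, "192.0.2.20"),
]

# Traffic share of each record = its weight / sum of all weights.
total_weight = sum(r["Weight"] for r in records)
shares = {r["SetIdentifier"]: r["Weight"] / total_weight for r in records}
print(shares)
```

Weights are relative, so 7 and 3 would give the same split as 70 and 30.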

Use cases in AWS networking:

  • Gradual application deployments
  • A/B testing of application versions
  • Capacity-based traffic distribution

How health checks work with weighted routing:

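  • If one endpoint's health check becomes unhealthy → its record is removed automatically from the weighted pool.
  • Remaining healthy endpoints receive all traffic, split according to their relative weights.
  • If every endpoint is unhealthy, Route 53 treats them all as healthy so that queries still get an answer.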
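
The effect of a health check on the weighted pool can be sketched as below. The function `effective_shares` is hypothetical and assumes at least one record in the pool has a non-zero weight.

```python
from typing import Dict


def effective_shares(weights: Dict[str, int], health: Dict[str, bool]) -> Dict[str, float]:
    """Return each endpoint's traffic share after removing unhealthy endpoints."""
    pool = {name: w for name, w in weights.items() if health.get(name, False)}
    if not pool:
        # Nothing is healthy: fall back to the full pool so an answer is still returned.
        pool = dict(weights)
    total = sum(pool.values())  # assumed to be greater than zero
    return {name: w / total for name, w in pool.items()}


weights = {"server-a": 70, "server-b": 30}
all_up = effective_shares(weights, {"server-a": True, "server-b": True})
a_down = effective_shares(weights, {"server-a": False, "server-b": True})
print(all_up)
print(a_down)
```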

6. Combining Health Checks + Routing Policies (VERY IMPORTANT FOR EXAM)

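In real AWS architectures, these features are combined. This is usually done by chaining records: an alias record under one routing policy points to records that use another policy, with Evaluate Target Health enabled so that health status propagates up the chain.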

Example architecture:

  • Two application regions:
    • Region 1 (primary)
    • Region 2 (secondary)

Setup:

  • Health checks monitor both regions
  • Latency-based routing selects fastest region
  • Failover routing ensures backup region is used if failure occurs
  • Weighted routing controls traffic distribution between versions

What happens during failure?

  1. Health check detects Region 1 is unhealthy
  2. Route 53 stops sending traffic to Region 1
  3. Traffic automatically shifts to Region 2
  4. No manual intervention required
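
The failure sequence above can be simulated in a few lines. This is a simplified model that combines a latency preference with a health filter. The `route` function, region names, and latency figures are assumptions for illustration.

```python
from typing import Dict


def route(latency_ms: Dict[str, float], health: Dict[str, bool]) -> str:
    """Choose the lowest-latency region among those that are healthy."""
    healthy = [region for region in latency_ms if health.get(region, False)]
    # If no region is healthy, consider all of them so a result is still returned.
    candidates = healthy or list(latency_ms)
    return min(candidates, key=lambda region: latency_ms[region])


latency_ms = {"region-1": 40, "region-2": 95}  # hypothetical values

before_failure = route(latency_ms, {"region-1": True, "region-2": True})
after_failure = route(latency_ms, {"region-1": False, "region-2": True})
print(before_failure, after_failure)
```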

7. TTL (Time to Live) – Important Exam Concept

TTL controls how long DNS resolvers and clients may cache a record before they query Route 53 again.

Key points:

  • Low TTL → faster failover, more DNS queries
  • High TTL → slower failover, less DNS traffic
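
A back-of-the-envelope estimate shows why TTL matters. The sketch assumes failover time is roughly the detection time (check interval × failure threshold) plus the record TTL. It ignores propagation delays and resolvers that do not honour TTLs, and the function name and the example values are hypothetical.

```python
def worst_case_failover_seconds(
    request_interval_s: int, failure_threshold: int, ttl_s: int
) -> int:
    """Rough upper bound: time to detect the failure plus time for caches to expire."""
    detection_s = request_interval_s * failure_threshold
    return detection_s + ttl_s


relaxed = worst_case_failover_seconds(request_interval_s=30, failure_threshold=3, ttl_s=300)
tuned = worst_case_failover_seconds(request_interval_s=10, failure_threshold=3, ttl_s=60)
print(relaxed, tuned)  # prints: 390 90
```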

Exam tip:

For high availability systems:

  • Use lower TTL values to improve failover speed

8. Reliability Patterns Using Route 53

Pattern 1: Active-Passive Failover

  • Primary region handles traffic
  • Secondary region is standby
  • Used for disaster recovery

Pattern 2: Active-Active Multi-Region

  • Multiple regions serve traffic simultaneously
  • Health checks ensure only healthy endpoints are used

Pattern 3: Latency + Failover Hybrid

  • Users routed based on latency
  • Failover ensures redundancy

9. Key Limitations (Exam Trick Questions)

Route 53 is powerful, but:

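  • It works at the DNS layer: it answers queries and never sees or forwards application packets
  • It has no awareness of user sessions, so it cannot provide session stickiness
  • DNS caching may delay failover until cached records expire (see TTL)
  • Health checks are interval-based (every 30 seconds by default, or every 10 seconds with fast checks), so failure detection is not instantaneous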

10. Exam Summary (Must Remember)

High Availability in Amazon Route 53 includes:

✔ Health Checks (endpoint, calculated, CloudWatch-based)
✔ DNS Failover Routing (primary/secondary)
✔ Latency-Based Routing (performance optimization)
✔ Weighted Routing (traffic splitting)
✔ Multi-region active-active architectures
✔ TTL tuning for faster failover


11. Simple Mental Model (For Exam Recall)

Think of Route 53 as a global traffic controller that:

  • Checks if systems are alive (health checks)
  • Sends users to fastest region (latency routing)
  • Splits traffic intelligently (weighted routing)
  • Switches systems during failures (failover routing)