Task Statement 3.5: Determine high-performing data ingestion and transformation solutions.
📘AWS Certified Solutions Architect – Associate (SAA-C03)
This section covers two important areas:
- Data analytics and visualization services
- Selecting the appropriate load balancing strategy
Both are commonly tested in scenario-based questions on the exam.
🟦 PART 1: Data Analytics & Visualization Services
🔹 What is Data Analytics in AWS?
Data analytics means:
- Collecting data
- Processing it
- Querying it
- Visualizing insights
AWS provides serverless and scalable services so you don’t need to manage infrastructure.
🔹 Key Services You MUST Know
1️⃣ Amazon Athena
📌 What is it?
Amazon Athena is a serverless query service used to analyze data stored in Amazon S3 using SQL.
📌 Key Features
- No servers to manage
- Uses standard SQL
- Works directly on S3 data
- Pay per query (based on data scanned)
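To make the pay-per-scan model concrete, here is a minimal cost sketch. The price of 5 USD per TB scanned, the 10 MB per-query minimum, and the binary TB definition are assumptions for illustration; check the current Athena pricing for your Region. The function name `estimate_query_cost` and the data sizes are hypothetical.

```python
# Assumed pricing for illustration only: 5 USD per TB scanned,
# with an assumed 10 MB minimum charge per query.
PRICE_PER_TB_USD = 5.0
MIN_BYTES_PER_QUERY = 10 * 1024**2
BYTES_PER_TB = 1024**4


def estimate_query_cost(bytes_scanned: int) -> float:
    """Return the estimated cost in USD of a single query."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


# Hypothetical comparison: the same query scanning 1 TB of CSV versus
# a columnar copy where only 50 GB of the needed columns are read.
csv_cost = estimate_query_cost(1 * 1024**4)
columnar_cost = estimate_query_cost(50 * 1024**3)
print(f"CSV scan:      ${csv_cost:.2f}")
print(f"Columnar scan: ${columnar_cost:.2f}")
```

Because the charge follows bytes scanned, columnar formats such as Parquet, compression, and partitioning are the usual ways to lower Athena cost.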
📌 How it Works
- Data is stored in S3 (CSV, JSON, Parquet, etc.)
- Define schema (table structure), typically in the AWS Glue Data Catalog
- Run SQL queries
- Get results, usually within seconds (Athena also writes them to an S3 results location)
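The steps above can be sketched in Python. This is a minimal sketch, assuming `client` behaves like `boto3.client("athena")`; the helper name `run_athena_query` and all database, table, and bucket names are hypothetical. The client is passed in so the function can be exercised without AWS credentials.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query, wait for it to finish, and return the result rows.

    `client` is assumed to expose the Athena API calls used below.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Athena runs queries asynchronously, so poll until a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {query_id} ended in state {state}")

    results = client.get_query_results(QueryExecutionId=query_id)
    # Each row is a list of cells; for SELECT queries the first row
    # typically holds the column names.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]


# Possible usage with boto3 installed and credentials configured
# (hypothetical names):
# import boto3
# rows = run_athena_query(
#     boto3.client("athena"),
#     "SELECT status, COUNT(*) FROM access_logs GROUP BY status",
#     "logs_db",
#     "s3://example-athena-results/",
# )
```

Only the first page of results is read here; larger result sets would need pagination.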
📌 Best Use Cases (Exam Focus)
- Query logs stored in S3
- Ad-hoc analysis (quick queries)
- Analyze structured/semi-structured data
- Analysis without setting up or loading a database
📌 Exam Tips
- If the question says “query data in S3 using SQL” → Athena
- If it asks for SQL analytics with no infrastructure to manage → Athena
- If it asks for low-cost, pay-per-query analytics → Athena
2️⃣ AWS Lake Formation
📌 What is it?
AWS Lake Formation helps you build, secure, and manage a data lake.
📌 Key Features
- Centralized data lake management
- Fine-grained access control
- Data catalog integration
- Works with S3, Athena, Redshift, QuickSight
📌 What Problem It Solves
Without Lake Formation:
- Hard to manage permissions
- Data spread across multiple services
With Lake Formation:
- Central control of data access
- Easier governance and security
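As an illustration of fine-grained access control, the sketch below builds a column-level grant shaped for the Lake Formation `GrantPermissions` API. The helper name `build_column_grant`, the role ARN, and the database, table, and column names are all hypothetical.

```python
def build_column_grant(principal_arn, database, table, columns):
    """Build keyword arguments for a column-level SELECT grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical analyst role that may read three columns of one table.
request = build_column_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",
    "sales_db",
    "orders",
    ["order_id", "order_date", "total"],
)
print(request)

# With boto3 installed and credentials configured, this could be sent as:
# boto3.client("lakeformation").grant_permissions(**request)
```

The point of the sketch is that access is granted once, centrally, at table or column level, and services such as Athena then enforce it when they query the data.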
📌 Best Use Cases (Exam Focus)
- Building a secure data lake
- Managing access to large datasets
- Centralized governance
📌 Exam Tips
- If the question mentions “data lake + security + permissions” → Lake Formation
- If multiple services need centrally controlled access to the same data → Lake Formation
3️⃣ Amazon QuickSight
📌 What is it?
Amazon QuickSight is a business intelligence (BI) tool used to create dashboards and visualizations.
📌 Key Features
- Interactive dashboards
- Graphs, charts, reports
- Serverless and scalable
- Can connect to Athena, S3, Redshift, RDS
📌 What It Does
- Converts data into visual insights
- Used by analysts and business users
📌 Best Use Cases (Exam Focus)
- Dashboard creation
- Data visualization
- Business reporting
📌 Exam Tips
- If the question says “create dashboards / visualize data” → QuickSight
- If business users need reports → QuickSight
🔄 How These Services Work Together
Typical AWS analytics flow:
- Data stored in S3
- Managed and secured by Lake Formation
- Queried using Athena
- Visualized using QuickSight
🟦 PART 2: Selecting the Appropriate Load Balancing Strategy
Load balancing is critical for high performance and scalability.
🔹 What is Load Balancing?
Load balancing distributes incoming traffic across multiple resources (like EC2 instances) to:
- Improve performance
- Increase availability
- Avoid overload
🔹 AWS Load Balancer Types (VERY IMPORTANT)
1️⃣ Application Load Balancer (ALB)
📌 Works at:
- Layer 7 (HTTP/HTTPS)
📌 Key Features
- Path-based routing (/api, /images)
- Host-based routing (different domains)
- Supports microservices and containers
- Integrates with ECS, EKS, Lambda
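Path-based and host-based routing can be pictured as an ordered rule list. The following is a simplified local simulation, not the ALB implementation; the rules, host names, and target group names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical listener rules, evaluated in priority order (lowest first).
RULES = [
    {"priority": 10, "host": "api.example.com", "path": "*", "target_group": "api-tg"},
    {"priority": 20, "host": "*", "path": "/api/*", "target_group": "api-tg"},
    {"priority": 30, "host": "*", "path": "/images/*", "target_group": "images-tg"},
]
DEFAULT_TARGET_GROUP = "web-tg"  # used when no rule matches


def route(host: str, path: str) -> str:
    """Return the target group of the first rule matching host and path."""
    for rule in sorted(RULES, key=lambda r: r["priority"]):
        if fnmatchcase(host, rule["host"]) and fnmatchcase(path, rule["path"]):
            return rule["target_group"]
    return DEFAULT_TARGET_GROUP


print(route("www.example.com", "/api/users"))        # api-tg
print(route("www.example.com", "/images/logo.png"))  # images-tg
print(route("api.example.com", "/anything"))         # api-tg
print(route("www.example.com", "/index.html"))       # web-tg
```

This is why one ALB can front several microservices: each rule forwards to a different target group.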
📌 Best Use Cases
- Web applications
- REST APIs
- Microservices architecture
📌 Exam Tips
- If the question mentions:
- HTTP/HTTPS routing → ALB
- Path-based routing → ALB
- Microservices → ALB
2️⃣ Network Load Balancer (NLB)
📌 Works at:
- Layer 4 (TCP/UDP/TLS)
📌 Key Features
- Ultra-high performance
- Low latency
- Static IP support (one static IP per Availability Zone, optionally an Elastic IP)
- Handles millions of requests per second
📌 Best Use Cases
- Real-time systems
- Gaming backends
- Financial systems
- TCP/UDP workloads
📌 Exam Tips
- If the question mentions:
- High performance / low latency → NLB
- TCP/UDP traffic → NLB
- Static IP needed → NLB
3️⃣ Gateway Load Balancer (GWLB)
📌 Purpose:
- Deploy and scale network security appliances
📌 Key Features
- Works with firewalls, intrusion detection systems
- Transparent traffic inspection (operates at Layer 3 and forwards traffic to appliances using the GENEVE protocol)
📌 Best Use Cases
- Security layer insertion
- Traffic inspection
📌 Exam Tips
- If the question mentions:
- firewalls / inspection → GWLB
🔹 Load Balancing Strategies
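1️⃣ Round Robin (Default for ALB)
- Requests are sent to each target in turn
- Simple and effective when requests have similar cost
- NLB does not use round robin; it selects targets with a flow hash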
2️⃣ Least Outstanding Requests
- Sends each request to the target with the fewest in-flight requests (an ALB option)
- Useful when request durations vary widely
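The difference between round robin and least outstanding requests can be seen in a small local simulation. This is a sketch of the two selection ideas only, not the load balancer's actual implementation; the instance IDs and in-flight counts are hypothetical.

```python
from itertools import cycle

targets = ["i-a", "i-b", "i-c"]  # hypothetical instance IDs

# Round robin: take targets in turn, ignoring how busy each one is.
rr = cycle(targets)
rr_choices = [next(rr) for _ in range(6)]


def pick_least_outstanding(in_flight: dict) -> str:
    """Return the target that currently has the fewest in-flight requests."""
    return min(in_flight, key=in_flight.get)


# Least outstanding requests: start from an uneven load and route 4 requests.
in_flight = {"i-a": 4, "i-b": 1, "i-c": 3}
lor_choices = []
for _ in range(4):
    choice = pick_least_outstanding(in_flight)
    in_flight[choice] += 1  # the new request is now in flight on that target
    lor_choices.append(choice)

print(rr_choices)   # ['i-a', 'i-b', 'i-c', 'i-a', 'i-b', 'i-c']
print(lor_choices)  # ['i-b', 'i-b', 'i-b', 'i-c']
print(in_flight)    # {'i-a': 4, 'i-b': 4, 'i-c': 4}
```

Round robin keeps sending to the busy target, while least outstanding requests steers new work to idle targets until the load evens out.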
3️⃣ Sticky Sessions
- Same user → same backend instance
- ALB uses cookies; NLB uses the client's source IP
📌 Exam Tip:
- If session persistence is required → Sticky sessions
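Cookie-based stickiness can be sketched as follows. This is a deliberately simplified model: it keeps a lookup table from cookie to target, whereas a real load balancer encodes the routing information in the cookie itself and also handles expiry and unhealthy targets. The class name `StickyBalancer` and the instance IDs are hypothetical.

```python
import uuid


class StickyBalancer:
    """Toy load balancer that pins each session cookie to one target."""

    def __init__(self, targets):
        self.targets = list(targets)
        self.next_index = 0
        self.sessions = {}  # cookie value -> target

    def handle(self, cookie=None):
        """Return (target, cookie) for one request."""
        if cookie in self.sessions:
            # Known session: reuse the same backend.
            return self.sessions[cookie], cookie
        # New session: pick the next target in turn and issue a cookie.
        target = self.targets[self.next_index % len(self.targets)]
        self.next_index += 1
        cookie = uuid.uuid4().hex
        self.sessions[cookie] = target
        return target, cookie


lb = StickyBalancer(["i-a", "i-b"])
first_target, session_cookie = lb.handle()        # first visit, no cookie yet
repeat_target, _ = lb.handle(session_cookie)      # later request, same cookie
print(first_target, repeat_target)                # i-a i-a
```

The trade-off is that stickiness can unbalance load, so storing session state externally (for example in a shared cache) is often preferred when the application allows it.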
4️⃣ Health Checks
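The load balancer sends periodic health checks to its registered targets:
- Healthy → receives traffic
- Unhealthy → stops receiving new traffic until it passes checks again
📌 Exam Tip:
- Health checks support high availability by routing traffic only to healthy targets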
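Health state usually changes only after several consecutive check results, which avoids flapping on a single failure. The sketch below models that with a healthy threshold and an unhealthy threshold; the class name `TargetHealth` and the threshold values of 3 and 2 are hypothetical choices for illustration.

```python
class TargetHealth:
    """Track one target's health from consecutive check results."""

    def __init__(self, healthy_threshold=3, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self.streak = 0  # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return the resulting health state."""
        if check_passed == self.healthy:
            self.streak = 0
        else:
            self.streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self.streak >= limit:
                self.healthy = not self.healthy
                self.streak = 0
        return self.healthy


target = TargetHealth()
checks = [True, False, False, True, True, True]
states = [target.record(result) for result in checks]
print(states)  # [True, True, False, False, False, True]
```

One failed check does not remove the target; two in a row do, and it then needs three passes in a row to receive traffic again.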
🔹 Cross-Zone Load Balancing
📌 What it does:
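- Lets each load balancer node distribute traffic evenly across registered targets in all enabled Availability Zones
📌 Exam Tip:
- Improves utilization and availability when Availability Zones have uneven target counts
- Enabled by default on ALB; disabled by default on NLB and GWLB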
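A short calculation shows why cross-zone load balancing matters when zones hold different numbers of targets. The sketch assumes client traffic is split evenly across the load balancer nodes in each Availability Zone; the zone names and target counts are hypothetical.

```python
def traffic_share_per_target(targets_per_az: dict, cross_zone: bool) -> dict:
    """Return the fraction of total traffic received by ONE target in each AZ."""
    active = {az: count for az, count in targets_per_az.items() if count > 0}
    if cross_zone:
        # Every node can reach every target, so all targets share equally.
        total_targets = sum(active.values())
        return {az: 1 / total_targets for az in active}
    # Without cross-zone, each AZ's share is split only among its own targets.
    az_share = 1 / len(active)
    return {az: az_share / count for az, count in active.items()}


layout = {"az-a": 2, "az-b": 8}  # hypothetical: 2 targets in one AZ, 8 in the other
print(traffic_share_per_target(layout, cross_zone=False))
# {'az-a': 0.25, 'az-b': 0.0625}
print(traffic_share_per_target(layout, cross_zone=True))
# {'az-a': 0.1, 'az-b': 0.1}
```

Without cross-zone, each target in the small zone handles four times the traffic of a target in the large zone; with it, every target receives 10%.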
🔹 Integration with Auto Scaling
Load balancers work with Auto Scaling to:
- Automatically add/remove instances (the Auto Scaling group registers new instances with the load balancer's target group)
- Maintain performance during traffic changes
📌 Exam Tip:
- If the question mentions:
- dynamic scaling + load balancing → ALB/NLB + Auto Scaling
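The idea of scaling on a per-target load metric can be sketched with a simplified proportional model. This is an illustration of the concept, not the exact algorithm Auto Scaling uses; the function name `desired_capacity` and every number below are hypothetical.

```python
import math


def desired_capacity(current_instances, requests_per_target, target_per_instance,
                     min_size, max_size):
    """Simplified model: scale in proportion to metric / target, then clamp."""
    raw = math.ceil(current_instances * requests_per_target / target_per_instance)
    return max(min_size, min(max_size, raw))


# Hypothetical group that aims for 60 requests per instance.
print(desired_capacity(4, 90, 60, min_size=2, max_size=10))  # traffic spike: 6
print(desired_capacity(6, 20, 60, min_size=2, max_size=10))  # traffic drop: 2
```

The load balancer supplies the per-target metric and spreads traffic over whatever instances the group currently contains.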
🟩 Key Decision Table (VERY IMPORTANT)
| Requirement | Service / Feature |
|---|---|
| Query data in S3 using SQL | Athena |
| Build secure data lake | Lake Formation |
| Create dashboards | QuickSight |
| HTTP/HTTPS routing | ALB |
| High performance TCP/UDP | NLB |
| Security appliance routing | GWLB |
| Session persistence | Sticky Sessions |
🟨 Final Exam Tips
- Athena = SQL on S3
- Lake Formation = Data lake security
- QuickSight = Visualization
- ALB = Smart HTTP routing
- NLB = Speed + TCP/UDP
- GWLB = Security layer
✅ What You Should Be Able to Do for the Exam
After studying this, you should be able to:
✔ Identify the correct analytics service
✔ Choose the right visualization tool
✔ Understand data lake architecture
✔ Select the correct load balancer
✔ Match use cases with services
✔ Answer scenario-based questions confidently
