Streaming data services with appropriate use cases (for example, Amazon Kinesis)

Task Statement 3.5: Determine high-performing data ingestion and transformation solutions.

📘 AWS Certified Solutions Architect – Associate (SAA-C03)


1. Introduction

In AWS, when designing data ingestion and transformation systems, you must always consider:

  • How much data (size) is being processed
  • How fast (speed) the data is generated and needs to be processed

These two factors directly affect:

  • Performance
  • Cost
  • Scalability
  • Choice of AWS services (especially streaming services like Amazon Kinesis)

2. Understanding Data Size

What is Data Size?

Data size refers to the volume of data being ingested or processed.

Common Categories

  • Small-scale data
    • MBs to GBs
    • Example: Application logs from a few servers
  • Medium-scale data
    • GBs to TBs
    • Example: Logs from multiple services or databases
  • Large-scale data
    • TBs to PBs
    • Example: Enterprise-wide data platforms, analytics pipelines

Why Data Size Matters

  • Determines storage choice (S3, EBS, etc.)
  • Impacts processing services (Lambda vs EMR vs Glue)
  • Affects network throughput
  • Influences cost

3. Understanding Data Speed

What is Data Speed?

Data speed refers to how fast data is generated and processed.

Types of Speed

1. Batch Processing (Low Speed)

  • Data collected over time and processed later
  • Example: Daily reports from database exports

2. Near Real-Time Processing

  • Small delay (seconds to minutes)
  • Example: Monitoring dashboards

3. Real-Time Streaming (High Speed)

  • Data processed continuously as it arrives (sub-second latency)
  • Example: Real-time log processing, metrics pipelines

Why Speed Matters

  • Determines latency requirements
  • Influences architecture design
  • Decides streaming vs batch services

4. Matching Size and Speed to AWS Services

Requirement            | Best Approach
Low size + low speed   | Batch processing (S3 + Lambda)
High size + low speed  | Batch analytics (S3 + EMR/Glue)
Low size + high speed  | Streaming (Kinesis, Lambda)
High size + high speed | High-throughput streaming (Kinesis, MSK)
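
The matching logic in this table can be sketched as a small decision function. This is only an illustration: the notes give no numeric thresholds, so the two cut-off constants and the `Workload` type below are assumptions chosen for the example, not AWS guidance.

```python
from dataclasses import dataclass

# Illustrative cut-offs only; the notes give no numeric thresholds.
LARGE_GB_PER_DAY = 1000.0      # assumed boundary between "low" and "high" size
STREAMING_MAX_DELAY_S = 60.0   # assumed boundary between "high" and "low" speed


@dataclass
class Workload:
    gb_per_day: float          # how much data arrives per day
    max_delay_seconds: float   # how stale a result is allowed to be


def suggest_approach(w: Workload) -> str:
    """Map a workload onto one of the four size/speed combinations."""
    high_size = w.gb_per_day >= LARGE_GB_PER_DAY
    high_speed = w.max_delay_seconds <= STREAMING_MAX_DELAY_S
    if high_speed:
        return ("High-throughput streaming (Kinesis, MSK)" if high_size
                else "Streaming (Kinesis, Lambda)")
    return ("Batch analytics (S3 + EMR/Glue)" if high_size
            else "Batch processing (S3 + Lambda)")
```

For example, `suggest_approach(Workload(gb_per_day=5, max_delay_seconds=86400))` falls in the low size + low speed row.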

5. Streaming Data Services in AWS

What is Streaming Data?

Streaming data is:

  • Continuous
  • Unbounded
  • Generated in real-time

Instead of waiting, data is processed immediately as it arrives.


6. Amazon Kinesis Overview

Amazon Kinesis is a fully managed service used to:

  • Collect
  • Process
  • Analyze real-time streaming data

7. Core Kinesis Services

1. Kinesis Data Streams (KDS)

Purpose

  • Real-time ingestion of streaming data

Key Features

  • Low latency (milliseconds)
  • Scalable using shards
  • Durable storage (retention of 24 hours by default, extendable up to 365 days)

Important Concepts

  • Shard
    • Unit of capacity
    • Each shard supports:
      • 1 MB/sec write (or 1,000 records/sec)
      • 2 MB/sec read (shared by all consumers unless enhanced fan-out is used)
  • Producer
    • Sends data into stream
  • Consumer
    • Reads data from stream
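
A minimal producer sketch, assuming the AWS SDK for Python (boto3), whose Kinesis client exposes `put_record` with `StreamName`, `Data`, and `PartitionKey` arguments. The stream name, the event fields, and the `FakeKinesisClient` stand-in are hypothetical and exist only so the example runs without AWS access.

```python
import json


def send_event(client, stream_name: str, event: dict, key_field: str = "device_id") -> dict:
    """Serialize one event and write it to a Kinesis data stream.

    `client` is expected to behave like boto3.client("kinesis").
    The value of `key_field` (an assumed field name) becomes the partition key.
    """
    return client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event[key_field]),
    )


class FakeKinesisClient:
    """Stand-in for the real client so the sketch runs locally (hypothetical)."""

    def __init__(self):
        self.records = []

    def put_record(self, StreamName, Data, PartitionKey):
        self.records.append((StreamName, Data, PartitionKey))
        return {"ShardId": "shardId-000000000000",
                "SequenceNumber": str(len(self.records))}


fake = FakeKinesisClient()
send_event(fake, "demo-stream", {"device_id": "sensor-1", "temp": 21.5})
```

In a real producer the fake would be replaced by `boto3.client("kinesis")`; a consumer would then read the same bytes back from the shard that owns the partition key.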

2. Kinesis Data Firehose (now named Amazon Data Firehose)

Purpose

  • Load streaming data directly into storage services

Key Features

  • Fully managed (no shard management)
  • Automatic scaling
  • Delivers data to:
    • Amazon S3
    • Amazon Redshift
    • Amazon OpenSearch Service

Use Case

  • When you want simple ingestion with no management, and near real-time (buffered) delivery is acceptable
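
Firehose delivers in batches: it buffers incoming records until a size or time threshold is reached, which is why it is usually described as near real-time. The toy model below illustrates that size-or-time rule only. It is not how Firehose is implemented, and the class name, thresholds, and `deliver` callback are assumptions made for the example.

```python
import time


class BufferedDelivery:
    """Toy model of size-or-time buffering; not the Firehose implementation."""

    def __init__(self, deliver, max_bytes=1024, max_age_seconds=60.0,
                 clock=time.monotonic):
        self.deliver = deliver                  # callable that receives one batch
        self.max_bytes = max_bytes              # assumed size threshold
        self.max_age_seconds = max_age_seconds  # assumed time threshold
        self.clock = clock
        self._batch = []
        self._size = 0
        self._started = None

    def put(self, record: bytes) -> None:
        if self._started is None:
            self._started = self.clock()        # age is measured from the first record
        self._batch.append(record)
        self._size += len(record)
        self._flush_if_due()

    def tick(self) -> None:
        """Call periodically so the time threshold fires without new records."""
        self._flush_if_due()

    def _flush_if_due(self) -> None:
        if not self._batch:
            return
        too_big = self._size >= self.max_bytes
        too_old = self.clock() - self._started >= self.max_age_seconds
        if too_big or too_old:
            self.deliver(self._batch)
            self._batch, self._size, self._started = [], 0, None
```

Larger thresholds mean fewer, bigger objects at the destination but a longer wait before data becomes visible, which is the trade-off to keep in mind when a question contrasts Firehose with Kinesis Data Streams.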

3. Kinesis Data Analytics (now named Amazon Managed Service for Apache Flink)

Purpose

  • Process streaming data using SQL or Apache Flink

Key Features

  • Real-time transformations
  • Filtering, aggregation, enrichment
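
To make "filtering" and "aggregation" concrete, here is a plain-Python sketch of a tumbling-window average. It works on a finite list, whereas a real stream processor such as Apache Flink handles unbounded input and late-arriving data; the field names `ts`, `key`, and `value` are hypothetical.

```python
from collections import defaultdict


def tumbling_window_avg(events, window_seconds=60):
    """Filter, then average `value` per (window start, key).

    `events` is an iterable of dicts with assumed fields
    `ts` (epoch seconds), `key`, and `value`.
    """
    sums = defaultdict(lambda: [0.0, 0])   # (window, key) -> [total, count]
    for e in events:
        if e["value"] is None:             # filtering: drop incomplete readings
            continue
        # Align the timestamp to the start of its fixed-size window.
        window_start = int(e["ts"] // window_seconds) * window_seconds
        bucket = sums[(window_start, e["key"])]
        bucket[0] += e["value"]
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}
```

Each event belongs to exactly one window, so windows never overlap; that is what distinguishes a tumbling window from a sliding one.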

4. Kinesis Video Streams

Purpose

  • Ingest, store, and process streaming video (for example, from cameras) for playback and analytics

8. When to Use Amazon Kinesis

Use Kinesis when:

  • Data arrives continuously
  • Low latency is required
  • You need real-time analytics
  • Data must be processed immediately

9. Choosing the Right Kinesis Service

Requirement                   | Service
Full control over streaming   | Kinesis Data Streams
No management, simple delivery | Kinesis Data Firehose
Real-time analytics           | Kinesis Data Analytics
Video streaming               | Kinesis Video Streams

10. Performance and Scaling in Kinesis

Scaling with Shards (Kinesis Data Streams)

  • Increase shards → increase throughput
  • More shards = more parallel processing
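
Extra shards only add parallelism if records actually spread across them. Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit value and routes the record to the shard whose hash key range contains it. The sketch below imitates that routing under one simplifying assumption: the shards split the hash space into equal, contiguous ranges. The function names are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range contains the key.

    Assumes equal, contiguous hash key ranges per shard.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def distribution(keys, shard_count: int) -> Counter:
    """Count how many of the given partition keys land on each shard."""
    return Counter(shard_for_key(k, shard_count) for k in keys)
```

Calling `distribution(["same-key"] * 100, 4)` puts all 100 records on one shard, while many distinct keys spread out. A low-cardinality partition key therefore creates a hot shard that adding shards cannot fix.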

Throughput Example

  • 10 shards:
    • 10 MB/sec write
    • 20 MB/sec read

Important Exam Point

  • Shard limits are critical for performance questions
  • Know:
    • 1 MB/sec write per shard
    • 2 MB/sec read per shard
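
These two limits are enough to estimate a shard count for a performance question. The sketch below takes the larger of the write-driven and read-driven requirements; the function name is hypothetical, and it deliberately ignores every factor other than the two per-shard limits listed above.

```python
import math

WRITE_MB_PER_SHARD = 1.0   # per-shard write limit from the notes
READ_MB_PER_SHARD = 2.0    # per-shard read limit from the notes


def shards_needed(write_mb_per_sec: float, read_mb_per_sec: float) -> int:
    """Smallest shard count that satisfies both the write and read limits."""
    for_writes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD)
    for_reads = math.ceil(read_mb_per_sec / READ_MB_PER_SHARD)
    return max(1, for_writes, for_reads)
```

For instance, a stream that ingests 2 MB/sec but is read at a combined 12 MB/sec is read-bound: `shards_needed(2, 12)` gives 6, not 2.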

11. Comparing Kinesis with Other Services

Service    | Type      | Use Case
Kinesis    | Streaming | Real-time ingestion
Amazon SQS | Queue     | Message buffering
Amazon SNS | Pub/Sub   | Notifications
Amazon MSK | Kafka     | Advanced streaming

12. Architecture Considerations

When designing a solution, consider:

1. Throughput

  • How much data per second?

2. Latency

  • Real-time vs batch?

3. Scalability

  • Will data volume increase?

4. Durability

  • Need data retention?

5. Cost

  • More shards = higher cost

13. Common Architecture Patterns

Pattern 1: Real-Time Processing

  • Producer → Kinesis Data Streams → Lambda → Database
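
The Lambda step of this pattern might look like the sketch below. When Lambda reads from a Kinesis data stream it passes a batch under `event["Records"]`, with each payload base64-encoded in `record["kinesis"]["data"]`. The assumption that payloads are JSON, and the `save` callback standing in for the database write, are hypothetical choices for this example.

```python
import base64
import json


def handler(event, context, save=print):
    """Decode each Kinesis record in a Lambda batch and pass it to `save`.

    `save` is a hypothetical stand-in for the database write; Lambda itself
    calls the handler with only `event` and `context`.
    """
    processed = 0
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        item = json.loads(payload)             # assumes producers send JSON
        item["partition_key"] = record["kinesis"]["partitionKey"]
        save(item)
        processed += 1
    return {"processed": processed}
```

Keeping the write behind a callback makes the handler easy to exercise locally with a hand-built event before wiring it to a real table.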

Pattern 2: Streaming to Storage

  • Producer → Kinesis Data Firehose → S3

Pattern 3: Real-Time Analytics

  • Producer → Kinesis Data Streams → Kinesis Data Analytics → Dashboard

14. Key Exam Tips

Must Remember

  • Streaming = real-time processing
  • Kinesis = main AWS streaming service
  • Shards control throughput
  • Firehose = easiest option (no shard management)

Service Selection Logic

  • Need control → Kinesis Data Streams
  • Need simplicity → Firehose
  • Need analytics → Data Analytics

Typical Exam Questions

You may be asked:

  • Which service handles real-time ingestion?
  • How to scale streaming throughput?
  • Difference between Streams vs Firehose
  • When to use batch vs streaming

15. Summary

  • Data size determines how much data is processed
  • Data speed determines how fast it must be processed
  • Streaming is used for real-time, continuous data
  • Amazon Kinesis is the key AWS streaming solution
  • Choose services based on:
    • Throughput
    • Latency
    • Scalability
    • Management overhead