Task Statement 3.5: Determine high-performing data ingestion and transformation solutions.
📘 AWS Certified Solutions Architect – Associate (SAA-C03)
1. What Are Data Transfer Services?
Data transfer services are AWS services used to move data:
- From on-premises → AWS
- From AWS → on-premises
- Between AWS services
- Between AWS Regions
These services are designed to:
- Handle large data volumes
- Provide secure transfer
- Ensure high performance
- Reduce manual effort
2. Types of Data Transfer in AWS
You must understand how data is transferred, because exam questions often depend on this.
A. Online Transfer (Over Network)
- Uses internet or private connections
- Continuous or scheduled transfer
Examples:
- AWS DataSync
- AWS Storage Gateway
B. Offline Transfer (Physical Devices)
- Used when the network is unavailable or too slow for the data volume
- AWS ships hardware to move data
Examples:
- AWS Snowball (important but separate topic)
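Whether a network counts as "slow" depends on how much data has to move. A rough back-of-the-envelope estimate can make the online vs. offline decision concrete. The sketch below is illustrative only: the function name `transfer_days`, the 80% link utilization default, and the decimal units (1 TB = 10^12 bytes) are assumptions, not AWS figures.

```python
def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Estimate the days needed to send data_tb terabytes over a link_mbps link.

    Assumptions (illustrative, not AWS figures): decimal units
    (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s) and a constant
    fraction of the link available to the transfer.
    """
    if data_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("data_tb must be >= 0, link_mbps > 0, 0 < utilization <= 1")
    total_bits = data_tb * 10**12 * 8
    effective_bits_per_second = link_mbps * 10**6 * utilization
    seconds = total_bits / effective_bits_per_second
    return seconds / 86_400  # seconds per day
```

Under these assumptions, `transfer_days(10, 1000)` comes to a little over one day, while `transfer_days(100, 100)` comes to roughly 116 days, which is the kind of result that points toward an offline transfer.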
C. Hybrid Transfer
- On-premises systems integrated with AWS
- Data moves automatically
Examples:
- Storage Gateway
- DataSync
3. AWS DataSync (VERY IMPORTANT)
What is AWS DataSync?
AWS DataSync is a service used to:
- Transfer large amounts of data quickly
- Move data between:
  - On-premises storage ↔ AWS
  - AWS storage ↔ AWS storage
Key Features
- Fully managed service
- Automates data transfer
- Built-in:
  - Encryption in transit
  - Validation (data integrity check)
- High-speed transfer (AWS cites up to 10x faster than open-source tools)
Supported Sources/Destinations
- On-premises:
  - NFS (Network File System)
  - SMB (Windows file share)
- AWS:
  - Amazon S3
  - Amazon EFS
  - Amazon FSx
How It Works (Simple Flow)
- Deploy a DataSync agent on-premises (not needed for transfers between AWS storage services)
- Activate the agent to connect it to your AWS account
- Create task (source → destination)
- Run transfer (manual or scheduled)
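The flow above can be sketched in Python. This is a minimal sketch, not a definitive implementation: it assumes the agent is already deployed and activated, and that `datasync` behaves like `boto3.client("datasync")`. The helper name `run_nfs_to_s3_transfer`, the task name, and every ARN, host name, and path are hypothetical values supplied by the caller. The call names follow the boto3 DataSync client as I understand it, so check them against the current API reference before relying on them.

```python
from typing import Any, Optional


def run_nfs_to_s3_transfer(
    datasync: Any,
    agent_arn: str,
    nfs_host: str,
    nfs_path: str,
    bucket_arn: str,
    bucket_role_arn: str,
    schedule: Optional[str] = None,
) -> str:
    """Create source and destination locations and a task, then start one run.

    `datasync` is assumed to behave like boto3.client("datasync").
    Returns the ARN of the started task execution.
    """
    # Source: the on-premises NFS share, read through the activated agent.
    source = datasync.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination: an S3 bucket, accessed through an IAM role.
    destination = datasync.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": bucket_role_arn},
    )
    # Task: ties the source location to the destination location.
    task_args = {
        "SourceLocationArn": source["LocationArn"],
        "DestinationLocationArn": destination["LocationArn"],
        "Name": "nfs-to-s3-example",  # hypothetical task name
    }
    if schedule is not None:
        # Optional: let DataSync run the task on a schedule as well.
        task_args["Schedule"] = {"ScheduleExpression": schedule}
    task = datasync.create_task(**task_args)
    # Run: start a manual execution right away.
    execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
    return execution["TaskExecutionArn"]
```

Passing the client in as a parameter keeps the sketch free of credentials and makes it easy to try against a stub before pointing it at a real account.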
When to Use AWS DataSync (Exam Focus)
Use DataSync when:
- Need to transfer large datasets quickly
- Want automated and scheduled transfers
- Need data validation and integrity
- Migrating file systems to AWS
- Syncing data regularly between environments
When NOT to Use DataSync
- Small, simple uploads → use CLI or SDK
- Block storage migration → DataSync copies files and objects, not block volumes; use other tools
- Real-time streaming → use Kinesis instead
Exam Tips for DataSync
If the question says:
- “Fast transfer”
- “Automated data sync”
- “File-based data migration”
- “On-prem to S3/EFS”
👉 The answer is usually DataSync
4. AWS Storage Gateway (VERY IMPORTANT)
What is AWS Storage Gateway?
AWS Storage Gateway is a hybrid storage service that:
- Connects on-premises applications to AWS storage
- Provides local access + cloud storage
Key Idea
Applications think they are using local storage, but data is actually:
- Stored in AWS
- Or backed up to AWS
Types of Storage Gateway
1. File Gateway
What it does:
- Provides file-based access (NFS/SMB)
- Stores files in Amazon S3
Use Case:
- Applications need file storage
- Want to store files in S3 transparently
Exam Keywords:
- File share
- NFS/SMB
- S3 backend
2. Volume Gateway
Provides block storage (iSCSI volumes that appear to applications as local disks).
Two Modes:
A. Cached Mode
- Frequently used data → stored locally
- Full dataset → stored in AWS
👉 Use when:
- Limited local storage
- Need low latency for active data
B. Stored Mode
- Full dataset stored on-premises
- Asynchronous backup copies in AWS (point-in-time snapshots)
👉 Use when:
- Need local access to full dataset
- Want cloud backup
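The difference between the two modes is easiest to see in a toy model of cached mode. The sketch below is purely illustrative: the class `CachedVolume`, the block-level dictionaries, the immediate upload on write, and the least-recently-used eviction policy are all assumptions made for the example, not documented gateway behavior. Stored mode would correspond to a local store large enough to hold every block, with AWS holding only backups.

```python
from collections import OrderedDict


class CachedVolume:
    """Toy model of cached mode: full dataset in AWS, hot blocks kept locally."""

    def __init__(self, cache_blocks: int) -> None:
        if cache_blocks < 1:
            raise ValueError("cache_blocks must be at least 1")
        self.cache_blocks = cache_blocks
        self.remote = {}            # stands in for the full dataset in AWS
        self.local = OrderedDict()  # limited local cache, oldest entry first
        self.remote_reads = 0       # counts cache misses served from AWS

    def write(self, block: int, data: bytes) -> None:
        # Toy simplification: the upload to AWS happens immediately.
        self.remote[block] = data
        self._cache(block, data)

    def read(self, block: int) -> bytes:
        if block in self.local:
            self.local.move_to_end(block)  # mark as most recently used
            return self.local[block]
        self.remote_reads += 1             # miss: fetch from AWS
        data = self.remote[block]
        self._cache(block, data)
        return data

    def _cache(self, block: int, data: bytes) -> None:
        self.local[block] = data
        self.local.move_to_end(block)
        while len(self.local) > self.cache_blocks:
            self.local.popitem(last=False)  # evict the least recently used block
```

With a two-block cache, writing three blocks and then reading the first one triggers a remote read, while rereading it immediately afterwards is served locally. That is the trade-off in the list above: less local storage, at the cost of higher latency for data that is not currently cached.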
3. Tape Gateway
What it does:
- Replaces physical backup tapes with a virtual tape library (VTL)
- Stores backups in AWS
Use Case:
- Existing backup systems using tapes
- Want to move to cloud backup
Summary Table
| Gateway Type | Storage Type | Backend |
|---|---|---|
| File Gateway | File | S3 |
| Volume Gateway | Block | S3 (backups as EBS snapshots) |
| Tape Gateway | Virtual tape | S3 / S3 Glacier |
When to Use Storage Gateway (Exam Focus)
Use Storage Gateway when:
- Need hybrid architecture
- Applications require low latency local access
- Want to extend on-prem storage to AWS
- Need backup to cloud
- Want to replace tape backup systems
When NOT to Use Storage Gateway
- One-time migration → use DataSync
- No on-prem system → use direct AWS services
- Real-time streaming → use Kinesis
Exam Tips for Storage Gateway
If the question says:
- “Hybrid storage”
- “On-premises application using AWS storage”
- “Low latency local access”
- “Backup to AWS”
👉 The answer is usually Storage Gateway
5. DataSync vs Storage Gateway (VERY IMPORTANT COMPARISON)
| Feature | DataSync | Storage Gateway |
|---|---|---|
| Purpose | Data transfer | Hybrid storage |
| Type | Transfer service | Storage service |
| Usage | Migration / sync | Continuous access |
| Speed | High-speed transfer | Local speed for cached data, network-bound otherwise |
| Access | No direct app access | Apps access storage |
| Automation | Yes (tasks) | Continuous integration |
Key Difference (Exam Trick)
- DataSync → moves data
- Storage Gateway → provides storage access
6. Other Related Data Transfer Services (Quick Overview)
Even though the focus is on DataSync and Storage Gateway, you should also know these:
AWS Transfer Family
- Managed SFTP, FTPS, and FTP endpoints backed by Amazon S3 or Amazon EFS
- Used for file transfer with external systems
Amazon S3 Transfer Acceleration
- Speeds up uploads to S3 using edge locations
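Clients opt in to Transfer Acceleration by sending requests to a dedicated accelerate endpoint instead of the regular bucket endpoint. A minimal sketch of how that endpoint is formed, assuming acceleration is already enabled on the bucket; the helper name `accelerate_endpoint` and the bucket name in the usage note are hypothetical.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Return the Transfer Acceleration endpoint URL for a bucket.

    Bucket names used with Transfer Acceleration must be DNS-compliant
    and must not contain periods, so dotted names are rejected here.
    """
    if not bucket or "." in bucket:
        raise ValueError("bucket name must be non-empty and must not contain periods")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"
```

For example, `accelerate_endpoint("example-bucket")` yields `https://example-bucket.s3-accelerate.amazonaws.com`. In practice an SDK option usually selects this endpoint for you, so building the URL by hand is mainly useful for understanding what changes.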
AWS Snow Family
- Physical devices for offline transfer
- Used when:
  - Very large data
  - Slow network
7. Common Exam Scenarios
Scenario 1
- Large dataset
- Fast migration needed
- On-prem → S3
👉 Answer: DataSync
Scenario 2
- Application running on-prem
- Needs access to cloud storage
- Low latency required
👉 Answer: Storage Gateway (File/Volume)
Scenario 3
- Replace tape backups
👉 Answer: Tape Gateway
Scenario 4
- Continuous sync between on-prem and AWS
👉 Answer: DataSync or Storage Gateway (depending on access need)
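The scenarios above, together with the "when NOT to use" lists earlier, can be condensed into a small decision helper. This is a study aid under stated assumptions, not an official decision tree: the function name `suggest_service`, its flags, and the order in which the rules are checked are all hypothetical simplifications of these notes.

```python
def suggest_service(
    *,
    real_time_streaming: bool = False,
    replaces_tape_backup: bool = False,
    app_needs_ongoing_access: bool = False,
    needs_block_storage: bool = False,
    network_too_slow: bool = False,
) -> str:
    """Map exam-style scenario keywords to the service these notes point to."""
    if real_time_streaming:
        return "Kinesis"            # neither DataSync nor Storage Gateway
    if replaces_tape_backup:
        return "Tape Gateway"       # Scenario 3
    if app_needs_ongoing_access:
        # Scenario 2: an on-premises application keeps using the storage.
        return "Volume Gateway" if needs_block_storage else "File Gateway"
    if network_too_slow:
        return "Snow Family"        # offline transfer on physical devices
    return "DataSync"               # Scenario 1: move or sync the data
```

Scenario 4 shows up here as the `app_needs_ongoing_access` flag: continuous sync with no application access falls through to DataSync, while the same sync with an application reading and writing the data maps to a gateway.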
8. Final Exam Tips (Must Remember)
- DataSync = Data movement
- Storage Gateway = Hybrid storage
- File Gateway = File → S3
- Volume Gateway = Block storage
- Tape Gateway = Backup replacement
9. Quick Memory Trick
- Sync = Move data → DataSync
- Gateway = Access storage → Storage Gateway
