Task Statement 3.5: Determine high-performing data ingestion and transformation solutions.
📘 AWS Certified Solutions Architect – Associate (SAA-C03)
1. What is Data Transfer?
Data transfer means moving data from one place to another, such as:
- On-premises → AWS
- AWS → On-premises
- Between AWS services
- Between AWS Regions
- Between Availability Zones
2. Key Factors for Designing Data Transfer Solutions
In the exam, ALWAYS think about these:
1. Data Size
- Small (MBs–GBs)
- Medium (GBs–TBs)
- Large (TBs–PBs)
2. Transfer Speed Requirements
- Real-time (milliseconds)
- Near real-time (seconds)
- Batch (minutes/hours/days)
3. Frequency
- One-time migration
- Continuous transfer
- Scheduled transfers
4. Network Connectivity
- Public internet
- Private connection
- Hybrid connectivity
5. Cost Optimization
- Data transfer charges
- Service costs
- Network costs
6. Security
- Encryption
- Private connectivity
- Access control
3. Types of Data Transfer in AWS
A. Online Data Transfer (Over Network)
Used when data is transferred through a network connection.
Services:
1. Amazon S3 Transfer (Basic Upload/Download)
- Upload using:
- AWS CLI
- SDKs
- Console
- Suitable for:
- Small to medium data transfers
2. Multipart Upload (Amazon S3)
- Splits large files into parts
- Uploads parts in parallel
Benefits:
- Faster uploads
- Reliable (retry failed parts)
Use when:
- Large files (AWS recommends multipart upload for objects over 100 MB; a single PUT is limited to 5 GB)
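To make the "split into parts" idea concrete, here is a minimal sketch of how a file could be divided into parts before upload. It only plans the byte ranges and does not call S3. The function name plan_parts and the 8 MiB default part size are assumptions for illustration; the 5 MiB minimum part size and the 10,000-part limit are S3 multipart limits.

```python
import math

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum part size (5 MiB), except for the last part
MAX_PARTS = 10_000               # S3 maximum number of parts per multipart upload


def plan_parts(file_size: int, part_size: int = 8 * 1024 * 1024) -> list[tuple[int, int, int]]:
    """Return (part_number, offset, length) tuples that cover the whole file."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    # Grow the part size if the file would otherwise need more than MAX_PARTS parts.
    part_size = max(part_size, MIN_PART_SIZE, math.ceil(file_size / MAX_PARTS))
    parts = []
    offset = 0
    part_number = 1  # S3 part numbers start at 1
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


# Hypothetical example: a 100 MiB file with the assumed 8 MiB part size.
parts = plan_parts(100 * 1024 * 1024)
print(len(parts), parts[-1])
```

Each planned part could then be uploaded in parallel and retried on its own if it fails, which is where the speed and reliability benefits come from.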
3. Amazon S3 Transfer Acceleration
- Uses AWS edge locations to speed up uploads
Key idea:
- Data enters AWS at the nearest edge location, then travels over the AWS global network instead of the public internet for most of the route
Use when:
- Users are far from the S3 bucket region
- Need faster global uploads
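Clients use Transfer Acceleration by sending requests to the bucket's accelerate endpoint instead of the regular Regional endpoint. The sketch below only builds that endpoint URL. The helper name accelerate_endpoint and the bucket name are hypothetical, and acceleration must already be enabled on the bucket for the endpoint to work.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Build the S3 Transfer Acceleration endpoint URL for a bucket."""
    # Buckets used with Transfer Acceleration must not contain periods in their names.
    if "." in bucket:
        raise ValueError("bucket names with periods cannot use Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


# "example-uploads" is a placeholder bucket name.
print(accelerate_endpoint("example-uploads"))
```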
4. AWS DataSync
- Automated data transfer service
Features:
- Fast and efficient
- Handles:
- Scheduling
- Encryption
- Data validation
Use when:
- Large-scale transfers (TBs)
- Ongoing data movement
- On-premises ↔ AWS
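The "data validation" feature means checking that what arrived matches what was sent. The sketch below illustrates that general idea with SHA-256 checksums over in-memory files. It is not how DataSync is implemented internally; the function names and sample data are assumptions for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(source: dict[str, bytes], destination: dict[str, bytes]) -> list[str]:
    """Return names of files that are missing or differ at the destination."""
    mismatched = []
    for name, content in source.items():
        copied = destination.get(name)
        if copied is None or digest(copied) != digest(content):
            mismatched.append(name)
    return sorted(mismatched)


# Hypothetical data: b.csv was truncated during the copy.
source = {"a.csv": b"id,value\n1,10\n", "b.csv": b"id,value\n2,20\n"}
destination = {"a.csv": b"id,value\n1,10\n", "b.csv": b"id,value\n2,2"}
print(find_mismatches(source, destination))
```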
5. AWS Storage Gateway
- Hybrid storage service
Types:
- File Gateway
- Volume Gateway
- Tape Gateway
Use when:
- On-premises apps need access to AWS storage
B. Offline Data Transfer
Used when:
- Network is too slow
- Data is extremely large
1. AWS Snow Family
AWS Snowcone
- Small device
- Edge computing + transfer
AWS Snowball
- Larger device
- High-capacity data transfer
AWS Snowmobile
- Massive data transfer (up to 100 PB per unit, for exabyte-scale migrations)
Process:
- AWS sends device
- Load data locally
- Ship back to AWS
- AWS uploads to S3
Use when:
- Very large datasets
- Limited or slow internet
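A quick way to judge whether the network is "too slow" is to estimate how long the transfer would take online. The sketch below does that arithmetic. The 80% link utilization default and the example figures are assumptions, not AWS guidance.

```python
def transfer_days(size_tb: float, bandwidth_mbps: float, utilization: float = 0.8) -> float:
    """Estimate days needed to move size_tb terabytes over a link of bandwidth_mbps."""
    bits = size_tb * 1e12 * 8                          # decimal terabytes to bits
    effective_bps = bandwidth_mbps * 1e6 * utilization  # usable bits per second
    seconds = bits / effective_bps
    return seconds / 86_400                            # seconds per day


# Hypothetical scenario: 100 TB at different link speeds.
for mbps in (100, 1_000, 10_000):
    print(f"{mbps} Mbps: {transfer_days(100, mbps):.1f} days")
```

Under these assumptions, 100 TB over a 100 Mbps link takes roughly 116 days, which is the kind of result that points to a Snow device instead of an online transfer.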
C. Hybrid Connectivity (Private Data Transfer)
Used for secure and consistent connectivity.
1. AWS Site-to-Site VPN
- Encrypted connection over internet
Features:
- Quick setup
- Low cost
Limitations:
- Uses public internet
- Variable performance
2. AWS Client VPN
- Individual users connect securely
3. AWS Direct Connect
- Dedicated private connection between on-premises and AWS
Benefits:
- Consistent performance
- Lower latency
- Private path that bypasses the public internet (traffic is not encrypted by default)
Use when:
- Large, frequent transfers
- Enterprise workloads
4. Direct Connect + VPN (Hybrid)
- VPN used as a lower-cost backup for Direct Connect, or run over Direct Connect to add IPsec encryption
4. Data Transfer Between AWS Services
1. Within Same Region
- Usually free or low cost (for example, S3 to EC2 in the same Region is free)
- High speed
2. Between Availability Zones
- Charges apply
- Used for high availability
3. Between Regions
- Higher cost
- Used for:
- Disaster recovery
- Global applications
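The cost ordering above (same Region, then cross-AZ, then cross-Region) can be turned into a rough estimate. In the sketch below, every rate is an illustrative placeholder and not actual AWS pricing; check the current pricing pages before relying on any number. The names ASSUMED_RATES and transfer_cost are hypothetical.

```python
# Illustrative placeholder rates in USD per GB. These are NOT actual AWS prices.
ASSUMED_RATES = {
    "same_az": 0.00,
    "cross_az": 0.01,
    "cross_region": 0.02,
}


def transfer_cost(gb: float, path: str) -> float:
    """Estimate the transfer cost in USD for a given path using the assumed rates."""
    if path not in ASSUMED_RATES:
        raise ValueError(f"unknown path: {path}")
    return gb * ASSUMED_RATES[path]


# Hypothetical workload: 1,000 GB moved along each path.
for path in ASSUMED_RATES:
    print(f"{path}: ${transfer_cost(1_000, path):.2f}")
```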
5. Data Transfer Cost Optimization
Important Exam Points:
- Inbound data (into AWS) → Usually FREE
- Outbound data (from AWS to the internet) → Charged per GB
Cost Optimization Techniques:
- Use same Region services
- Minimize cross-AZ traffic
- Use CloudFront caching
- Use S3 Transfer Acceleration only when needed
- Use Direct Connect for large continuous data
- Compress data before transfer
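Compression reduces the number of bytes that are billed and sent. The sketch below measures the saving for a sample payload using gzip from the Python standard library. The log-style payload is made up for illustration, and real savings depend heavily on the data (already-compressed media barely shrinks).

```python
import gzip


def compression_savings(data: bytes) -> tuple[int, int, float]:
    """Return (original_size, compressed_size, fraction_saved) for gzip compression."""
    compressed = gzip.compress(data)
    saved = 1 - len(compressed) / len(data)
    return len(data), len(compressed), saved


# Hypothetical, highly repetitive log data; real data will compress less well.
payload = b"timestamp=2024-01-01T00:00:00Z level=INFO msg=heartbeat\n" * 10_000
original, compressed, saved = compression_savings(payload)
print(f"{original} bytes -> {compressed} bytes ({saved:.0%} saved)")
```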
6. Security Best Practices
1. Encryption in Transit
- Use HTTPS (TLS)
- VPN tunnels
- Direct Connect + MACsec (on supported dedicated connections)
2. Encryption at Rest
- S3 encryption
- EBS encryption
3. Access Control
- IAM roles and policies
- Bucket policies
4. Private Connectivity
- Prefer:
- VPC endpoints
- Direct Connect
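Encryption in transit and bucket policies can be combined: a common pattern is a bucket policy that denies any request not sent over HTTPS, using the aws:SecureTransport condition key. The sketch below builds such a policy document as JSON. The bucket name and the function name tls_only_policy are placeholders.

```python
import json


def tls_only_policy(bucket: str) -> dict:
    """Build a bucket policy that denies all S3 requests made without TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-bucket" is a placeholder bucket name.
print(json.dumps(tls_only_policy("example-bucket"), indent=2))
```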
7. Choosing the Right Solution (Exam Decision Guide)
Scenario-Based Thinking (VERY IMPORTANT)
1. Small, occasional transfers
→ Use:
- S3 upload
- AWS CLI
2. Large files over internet
→ Use:
- Multipart upload
- S3 Transfer Acceleration
3. Continuous data transfer (on-premises ↔ AWS)
→ Use:
- AWS DataSync
- Storage Gateway
4. Massive data (TBs–PBs), slow network
→ Use:
- Snowball / Snowmobile
5. Secure, consistent, private connection
→ Use:
- Direct Connect
6. Quick secure connection
→ Use:
- VPN
7. Hybrid architecture
→ Use:
- Storage Gateway
- DataSync
- Direct Connect
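The scenarios above can be read as a small set of rules. The sketch below encodes a simplified version of them. The function name recommend, the 7-day cutoff for "too slow", and the 100 MB multipart threshold applied here are assumptions for illustration, not official AWS decision criteria.

```python
def recommend(
    size_tb: float,
    bandwidth_mbps: float,
    continuous: bool = False,
    needs_private_link: bool = False,
    max_days: float = 7.0,  # assumed cutoff for an acceptable online transfer
) -> list[str]:
    """Return a list of suggested services for a transfer scenario."""
    suggestions = []
    if needs_private_link:
        suggestions.append("Direct Connect")
    # 1 TB = 8e6 megabits, so this is the online transfer time in days.
    days = size_tb * 8e6 / bandwidth_mbps / 86_400
    if continuous:
        suggestions.append("DataSync")
    elif days > max_days:
        suggestions.append("Snow Family")
    elif size_tb * 1_000 > 0.1:  # larger than 100 MB
        suggestions.append("S3 multipart upload")
    else:
        suggestions.append("S3 upload")
    return suggestions


# Hypothetical scenarios.
print(recommend(0.00001, 100))                # 10 MB, occasional
print(recommend(500, 100))                    # 500 TB over a slow link
print(recommend(5, 1_000, continuous=True))   # ongoing on-premises sync
```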
8. Common Exam Traps
❌ Using Snowball for small data
❌ Using VPN for high-performance needs
❌ Ignoring data transfer costs
❌ Choosing public internet transfer for sensitive data when private connectivity is required
❌ Not using multipart upload for large files
9. Summary (Quick Revision)
- Small data → S3 upload
- Large data (online) → Multipart / Transfer Acceleration
- Huge data (offline) → Snow Family
- Continuous transfer → DataSync
- Hybrid storage → Storage Gateway
- Secure private link → Direct Connect
- Quick setup → VPN
