Task Statement 4.3: Design cost-optimized database solutions.
📘 AWS Certified Solutions Architect – Associate (SAA-C03)
🔹 What is a Data Retention Policy?
A data retention policy defines:
- How long data is stored
- Where it is stored
- When it should be deleted, archived, or moved to cheaper storage
It is used to control storage cost, meet compliance requirements, and manage data lifecycle efficiently.
🎯 Why Data Retention Policies Matter (Exam Focus)
You need to understand three key goals:
1. Cost Optimization
- Storing all data forever is expensive
- Older data is accessed less frequently → should be moved to cheaper storage
2. Compliance & Regulations
- Some data must be kept for a specific time period
- Some data must be deleted after a certain time
3. Performance Optimization
- Active systems should only store frequently accessed (hot) data
- Old data should not slow down databases
🔹 Data Lifecycle Concept (Very Important)
Data typically moves through stages:
| Stage | Description | Storage Type |
|---|---|---|
| Hot Data | Frequently accessed | Fast, expensive storage |
| Warm Data | Occasionally accessed | Medium-cost storage |
| Cold Data | Rarely accessed | Cheap storage |
| Archived Data | Almost never accessed | Very cheap storage |
👉 AWS uses this lifecycle model heavily in cost optimization questions.
🔹 AWS Services for Data Retention
1. Amazon S3 Lifecycle Policies
🔹 What it does:
Automatically moves or deletes objects based on rules.
🔹 Key Actions:
- Transition to cheaper storage:
- S3 Standard → S3 Standard-IA
- S3 Standard-IA → S3 Glacier
- S3 Glacier → S3 Glacier Deep Archive
- Expire (delete) objects after time
🔹 Example (IT scenario):
- Logs stored in S3:
- After 30 days → move to S3 IA
- After 90 days → move to Glacier
- After 1 year → delete
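The log scenario above can be sketched as a lifecycle configuration in the shape boto3's `put_bucket_lifecycle_configuration` expects. The bucket name and `logs/` prefix are hypothetical:

```python
# S3 lifecycle rules matching the log example: 30 days -> Standard-IA,
# 90 days -> Glacier, delete after 1 year.
lifecycle_config = {
    "Rules": [
        {
            "ID": "log-retention",
            "Filter": {"Prefix": "logs/"},  # apply only to objects under logs/
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},  # delete after 1 year
        }
    ]
}

# To apply it (requires AWS credentials):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-log-bucket", LifecycleConfiguration=lifecycle_config
# )
```

Once applied, S3 evaluates the rule automatically; no scheduled job is needed.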
🔹 Key Exam Points:
- Fully automated
- Rules are set per bucket and can be scoped to a prefix or object tags
- Reduces storage cost significantly
2. Amazon RDS Automated Backups & Snapshots
🔹 Features:
- Automated backups (retention: 1–35 days)
- Manual snapshots (kept until deleted)
🔹 Retention Strategy:
- Short-term recovery → automated backups
- Long-term retention → manual snapshots
🔹 Exam Tips:
- Automated backups expire automatically
- Snapshots must be deleted manually
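A minimal sketch of setting the automated backup retention window on an RDS instance (valid range 1–35 days; 0 disables automated backups). The instance identifier is hypothetical; manual snapshots are a separate mechanism and never expire on their own:

```python
# Parameters for rds.modify_db_instance: keep automated backups 14 days.
rds_params = {
    "DBInstanceIdentifier": "orders-db",  # hypothetical instance name
    "BackupRetentionPeriod": 14,          # must be between 1 and 35 days
    "ApplyImmediately": True,
}

# To apply (requires AWS credentials):
# import boto3
# boto3.client("rds").modify_db_instance(**rds_params)
```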
3. Amazon DynamoDB TTL (Time To Live)
🔹 What it does:
- Automatically deletes items after a specified timestamp
🔹 Use Case:
- Temporary or session-based data
🔹 Exam Tips:
- No manual deletion needed
- Expired items are removed by a background process (typically within 48 hours)
- Helps reduce storage cost automatically
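For TTL, the item just needs a numeric attribute holding a Unix epoch timestamp; DynamoDB deletes the item after that time passes. Table, key, and attribute names below are hypothetical, and the TTL attribute must be enabled on the table first:

```python
import time

# Session item that DynamoDB will delete ~24 hours after it is written.
SESSION_LIFETIME_SECONDS = 24 * 60 * 60

expires_at = int(time.time()) + SESSION_LIFETIME_SECONDS

item = {
    "session_id": {"S": "abc123"},         # partition key (hypothetical)
    "user_id": {"S": "user-42"},
    "expires_at": {"N": str(expires_at)},  # TTL attribute: epoch seconds
}

# To write it (requires AWS credentials, and TTL enabled on "expires_at"):
# import boto3
# boto3.client("dynamodb").put_item(TableName="sessions", Item=item)
```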
4. Amazon EBS Snapshots Lifecycle
🔹 Used with:
- EC2 volumes
🔹 Retention Control:
- Use Amazon Data Lifecycle Manager (DLM) to:
- Schedule snapshots
- Automatically delete old snapshots
🔹 Exam Tips:
- Prevents accumulation of unused snapshots
- Important for cost control
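A DLM policy is defined as a `PolicyDetails` document that targets volumes by tag and pairs a creation schedule with a retain rule. The tag values and schedule below are hypothetical:

```python
# DLM policy body: snapshot tagged EBS volumes daily, keep only the last 7.
policy_details = {
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "Backup", "Value": "daily"}],  # which volumes to snapshot
    "Schedules": [
        {
            "Name": "daily-snapshots",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7},  # older snapshots are deleted automatically
        }
    ],
}

# To create it (requires AWS credentials and an IAM role for DLM):
# import boto3
# boto3.client("dlm").create_lifecycle_policy(
#     ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
#     Description="Daily EBS snapshots, keep 7",
#     State="ENABLED",
#     PolicyDetails=policy_details,
# )
```

The `RetainRule` is what enforces retention: once the 8th snapshot is created, the oldest is deleted.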
5. AWS Backup
🔹 Centralized backup service
🔹 Features:
- Define backup plans
- Set retention periods
- Automate deletion
🔹 Exam Tips:
- Works across multiple services (RDS, EBS, DynamoDB, etc.)
- Good for organization-wide retention policies
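A backup plan rule combines a schedule with a retention lifecycle. Vault name and cron expression below are hypothetical; note that AWS requires `DeleteAfterDays` to be at least 90 days greater than `MoveToColdStorageAfterDays`:

```python
# AWS Backup plan: daily backups, cold storage after 30 days, delete after 120.
backup_plan = {
    "BackupPlanName": "org-retention-plan",
    "Rules": [
        {
            "RuleName": "daily-with-retention",
            "TargetBackupVaultName": "Default",
            "ScheduleExpression": "cron(0 5 * * ? *)",  # every day at 05:00 UTC
            "Lifecycle": {
                "MoveToColdStorageAfterDays": 30,
                "DeleteAfterDays": 120,  # must be >= cold storage day + 90
            },
        }
    ],
}

# To create it (requires AWS credentials):
# import boto3
# boto3.client("backup").create_backup_plan(BackupPlan=backup_plan)
```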
6. Amazon S3 Object Lock (Compliance Feature)
🔹 What it does:
- Prevents object versions from being deleted or overwritten for a fixed retention period
🔹 Modes:
- Governance mode (users with special permissions can override)
- Compliance mode (strict: no one can override, not even the root user)
🔹 Exam Tips:
- Used for regulatory requirements
- Data cannot be deleted before retention period ends
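Object Lock retention is applied per object version as a mode plus a retain-until date. The bucket and key below are hypothetical, and the bucket must have been created with Object Lock enabled:

```python
from datetime import datetime, timedelta, timezone

# Compliance-mode retention: the object version cannot be deleted or
# overwritten by anyone (including root) until the given date.
retain_until = datetime.now(timezone.utc) + timedelta(days=7 * 365)  # keep ~7 years

retention = {
    "Mode": "COMPLIANCE",  # strict: cannot be shortened or removed once set
    "RetainUntilDate": retain_until,
}

# To apply it to an object (requires AWS credentials):
# import boto3
# boto3.client("s3").put_object_retention(
#     Bucket="audit-records", Key="2024/report.pdf", Retention=retention
# )
```

Governance mode uses the same request shape with `"Mode": "GOVERNANCE"`, which permitted users can bypass.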
🔹 Data Retention Strategies (Exam Scenarios)
1. Time-Based Retention
- Keep data for a fixed period (e.g., 90 days)
- Then delete or archive
2. Lifecycle-Based Retention
- Move data across storage tiers over time
3. Event-Based Retention
- Keep data until a condition is met
4. Legal Hold / Compliance Retention
- Prevent deletion for compliance reasons
🔹 Cost Optimization Techniques
✅ Move old data to cheaper storage
- S3 Glacier / Deep Archive
✅ Delete unnecessary data
- Use lifecycle expiration rules
✅ Automate retention
- Avoid manual management
✅ Avoid over-retention
- Do not keep data longer than needed
🔹 Common Exam Scenarios
🧠 Scenario 1:
Large amount of log data, rarely accessed after 30 days
✅ Solution: S3 Lifecycle → Glacier
🧠 Scenario 2:
Temporary data in NoSQL database
✅ Solution: DynamoDB TTL
🧠 Scenario 3:
Need backups but want automatic deletion
✅ Solution: AWS Backup or DLM
🧠 Scenario 4:
Data must not be deleted for compliance
✅ Solution: S3 Object Lock
🧠 Scenario 5:
Long-term database backup retention
✅ Solution: RDS manual snapshots
🔹 Key Differences to Remember
| Feature | Auto Delete | Manual Delete | Use Case |
|---|---|---|---|
| S3 Lifecycle | ✅ | ❌ | Object storage |
| DynamoDB TTL | ✅ | ❌ | Temporary data |
| RDS Backups | ✅ | ❌ | Short-term recovery |
| RDS Snapshots | ❌ | ✅ | Long-term backup |
| AWS Backup | ✅ | ❌ | Centralized backup |
| S3 Object Lock | ❌ (protected) | ❌ | Compliance |
🔹 Best Practices (Very Important)
- Use automation wherever possible
- Choose storage class based on access frequency
- Regularly review retention policies
- Combine backup + lifecycle policies
- Avoid keeping unused data
🔹 Exam Tips (Must Remember)
- Lifecycle policies = cost optimization
- Glacier = cheap storage, slow retrieval (minutes to hours; Deep Archive up to ~12 hours)
- TTL = automatic deletion
- Snapshots ≠ automatic deletion
- Object Lock = compliance, cannot delete
✅ Final Summary
A data retention policy in AWS is used to:
- Control how long data is stored
- Move data to cheaper storage over time
- Automatically delete unnecessary data
- Meet compliance and regulatory requirements
👉 The key to passing the exam:
- Understand when to store, move, archive, or delete data
- Know which AWS service handles each case
- Always think in terms of cost optimization + automation
