Data retention policies

Task Statement 4.3: Design cost-optimized database solutions.

📘 AWS Certified Solutions Architect – Associate (SAA-C03)


🔹 What is a Data Retention Policy?

A data retention policy defines:

  • How long data is stored
  • Where it is stored
  • When it should be deleted, archived, or moved to cheaper storage

It is used to control storage cost, meet compliance requirements, and manage data lifecycle efficiently.


🎯 Why Data Retention Policies Matter (Exam Focus)

You need to understand three key goals:

1. Cost Optimization

  • Storing all data forever is expensive
  • Older data is accessed less frequently → should be moved to cheaper storage

2. Compliance & Regulations

  • Some data must be kept for a specific time period
  • Some data must be deleted after a certain time

3. Performance Optimization

  • Active systems should only store frequently accessed (hot) data
  • Old data should not slow down databases

🔹 Data Lifecycle Concept (Very Important)

Data typically moves through stages:

| Stage | Description | Storage Type |
|---|---|---|
| Hot Data | Frequently accessed | Fast, expensive storage |
| Warm Data | Occasionally accessed | Medium-cost storage |
| Cold Data | Rarely accessed | Cheap storage |
| Archived Data | Almost never accessed | Very cheap storage |

👉 AWS uses this lifecycle model heavily in cost optimization questions.


🔹 AWS Services for Data Retention

1. Amazon S3 Lifecycle Policies

🔹 What it does:

Automatically moves or deletes objects based on rules.

🔹 Key Actions:

  • Transition to cheaper storage:
    • S3 Standard → S3 Standard-IA
    • S3 Standard-IA → S3 Glacier
    • S3 Glacier → S3 Glacier Deep Archive
  • Expire (delete) objects after time

🔹 Example (IT scenario):

  • Logs stored in S3:
    • After 30 days → move to S3 Standard-IA
    • After 90 days → move to Glacier
    • After 1 year → delete

🔹 Key Exam Points:

  • Fully automated
  • Works at bucket or object level
  • Reduces storage cost significantly
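The log scenario above can be sketched as a lifecycle rule set. This is a minimal sketch in the request shape accepted by boto3's `s3.put_bucket_lifecycle_configuration`; the bucket name and `logs/` prefix are hypothetical.

```python
# Lifecycle rules for the log scenario: 30 days -> Standard-IA,
# 90 days -> Glacier, 1 year -> delete.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "log-retention",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # hypothetical: apply only to log objects
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # cheaper, infrequent access
                {"Days": 90, "StorageClass": "GLACIER"},      # archival storage
            ],
            "Expiration": {"Days": 365},  # delete objects after 1 year
        }
    ]
}

# With real credentials this would be applied roughly as:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-log-bucket", LifecycleConfiguration=lifecycle_configuration
# )
```

Once the rules are attached to the bucket, S3 evaluates them automatically; no per-object action is needed.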

2. Amazon RDS Automated Backups & Snapshots

🔹 Features:

  • Automated backups (retention: 1–35 days)
  • Manual snapshots (kept until deleted)

🔹 Retention Strategy:

  • Short-term recovery → automated backups
  • Long-term retention → manual snapshots

🔹 Exam Tips:

  • Automated backups expire automatically
  • Snapshots must be deleted manually
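The two retention mechanisms map to two different API calls. A minimal sketch, assuming a hypothetical instance named `orders-db`, using the parameter shapes of boto3's `rds.modify_db_instance` and `rds.create_db_snapshot`:

```python
# Short-term recovery: automated backups, kept 1-35 days, then expire on their own.
retention_settings = {
    "DBInstanceIdentifier": "orders-db",  # hypothetical instance name
    "BackupRetentionPeriod": 14,          # must be within the 1-35 day window
    "ApplyImmediately": True,
}

# Long-term retention: a manual snapshot, kept until someone deletes it.
snapshot_request = {
    "DBInstanceIdentifier": "orders-db",
    "DBSnapshotIdentifier": "orders-db-year-end",  # hypothetical snapshot name
}

# With real credentials:
# rds = boto3.client("rds")
# rds.modify_db_instance(**retention_settings)
# rds.create_db_snapshot(**snapshot_request)
```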

3. Amazon DynamoDB TTL (Time To Live)

🔹 What it does:

  • Automatically deletes items after a specified timestamp

🔹 Use Case:

  • Temporary or session-based data

🔹 Exam Tips:

  • No manual deletion needed
  • Helps reduce storage cost automatically
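TTL works by storing a Unix epoch timestamp (in seconds) in a Number attribute; items whose timestamp is in the past become eligible for automatic deletion. A sketch for session data, where the attribute name `expires_at` and the item fields are hypothetical:

```python
import time

SESSION_LIFETIME_SECONDS = 24 * 60 * 60  # keep session data for 1 day

def session_item(session_id: str, user: str) -> dict:
    """Build a session item whose 'expires_at' attribute drives TTL deletion."""
    return {
        "session_id": session_id,
        "user": user,
        # TTL attribute: epoch seconds at which DynamoDB may delete the item
        "expires_at": int(time.time()) + SESSION_LIFETIME_SECONDS,
    }

item = session_item("abc123", "alice")
# With boto3, the item would be written via table.put_item(Item=item)
# after enabling TTL on the 'expires_at' attribute for the table.
```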

4. Amazon EBS Snapshots Lifecycle

🔹 Used with:

  • EC2 volumes

🔹 Retention Control:

  • Use Amazon Data Lifecycle Manager (DLM) to:
    • Schedule snapshots
    • Automatically delete old snapshots

🔹 Exam Tips:

  • Prevents accumulation of unused snapshots
  • Important for cost control
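A DLM policy pairs a snapshot schedule with a retain rule that deletes old snapshots automatically. A sketch following the `PolicyDetails` shape used by boto3's `dlm.create_lifecycle_policy`; the tag key/value are hypothetical:

```python
# Snapshot tagged volumes daily and keep only the last 7 snapshots.
policy_details = {
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "Backup", "Value": "Daily"}],  # hypothetical tag
    "Schedules": [
        {
            "Name": "daily-snapshots",
            "CreateRule": {
                "Interval": 24,
                "IntervalUnit": "HOURS",
                "Times": ["03:00"],  # snapshot window start (UTC)
            },
            # Snapshots beyond the most recent 7 are deleted automatically
            "RetainRule": {"Count": 7},
        }
    ],
}
```

The `RetainRule` is what prevents snapshot accumulation: each new snapshot pushes the oldest one out.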

5. AWS Backup

🔹 Centralized backup service

🔹 Features:

  • Define backup plans
  • Set retention periods
  • Automate deletion

🔹 Exam Tips:

  • Works across multiple services (RDS, EBS, DynamoDB, etc.)
  • Good for organization-wide retention policies
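A backup plan expresses retention directly in its rules: recovery points move to cold storage, then get deleted, on a schedule. A sketch in the shape of boto3's `backup.create_backup_plan`; plan, rule, and vault names are hypothetical:

```python
backup_plan = {
    "BackupPlanName": "org-retention-plan",  # hypothetical plan name
    "Rules": [
        {
            "RuleName": "daily-with-retention",
            "TargetBackupVaultName": "Default",
            "ScheduleExpression": "cron(0 5 * * ? *)",  # daily at 05:00 UTC
            "Lifecycle": {
                "MoveToColdStorageAfterDays": 30,  # cheaper storage tier
                "DeleteAfterDays": 365,            # automated deletion
            },
        }
    ],
}
# With real credentials:
# backup = boto3.client("backup")
# backup.create_backup_plan(BackupPlan=backup_plan)
```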

6. Amazon S3 Object Lock (Compliance Feature)

🔹 What it does:

  • Prevents object deletion for a fixed time

🔹 Modes:

  • Governance mode – users with special permissions can still override the lock
  • Compliance mode (strict) – no user, not even the root account, can delete the object before retention ends

🔹 Exam Tips:

  • Used for regulatory requirements
  • Data cannot be deleted before retention period ends
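A retention setting names the mode and the date until which deletion is blocked. A sketch in the shape of boto3's `s3.put_object_retention` (the bucket and key are hypothetical, and the 7-year period is just an example):

```python
from datetime import datetime, timedelta, timezone

# COMPLIANCE mode: the object version cannot be deleted or overwritten
# by any user until RetainUntilDate passes.
retention = {
    "Mode": "COMPLIANCE",
    "RetainUntilDate": datetime.now(timezone.utc) + timedelta(days=7 * 365),
}

# With real credentials:
# s3 = boto3.client("s3")
# s3.put_object_retention(Bucket="audit-bucket", Key="report.pdf",
#                         Retention=retention)
```

Note that Object Lock requires versioning and must be enabled when the bucket is created.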

🔹 Data Retention Strategies (Exam Scenarios)

1. Time-Based Retention

  • Keep data for a fixed period (e.g., 90 days)
  • Then delete or archive

2. Lifecycle-Based Retention

  • Move data across storage tiers over time

3. Event-Based Retention

  • Keep data until a condition is met

4. Legal Hold / Compliance Retention

  • Prevent deletion for compliance reasons

🔹 Cost Optimization Techniques

✅ Move old data to cheaper storage

  • S3 Glacier / Deep Archive

✅ Delete unnecessary data

  • Use lifecycle expiration rules

✅ Automate retention

  • Avoid manual management

✅ Avoid over-retention

  • Do not keep data longer than needed

🔹 Common Exam Scenarios

🧠 Scenario 1:

Large amount of log data, rarely accessed after 30 days
✅ Solution: S3 Lifecycle → Glacier


🧠 Scenario 2:

Temporary data in NoSQL database
✅ Solution: DynamoDB TTL


🧠 Scenario 3:

Need backups but want automatic deletion
✅ Solution: AWS Backup or DLM


🧠 Scenario 4:

Data must not be deleted for compliance
✅ Solution: S3 Object Lock


🧠 Scenario 5:

Long-term database backup retention
✅ Solution: RDS manual snapshots


🔹 Key Differences to Remember

| Feature | Auto Delete | Manual Delete | Use Case |
|---|---|---|---|
| S3 Lifecycle | ✅ | ✅ | Object storage |
| DynamoDB TTL | ✅ | ✅ | Temporary data |
| RDS Backups | ✅ | ❌ | Short-term recovery |
| RDS Snapshots | ❌ | ✅ | Long-term backup |
| AWS Backup | ✅ | ✅ | Centralized backup |
| S3 Object Lock | ❌ (protected) | ❌ (protected) | Compliance |

🔹 Best Practices (Very Important)

  • Use automation wherever possible
  • Choose storage class based on access frequency
  • Regularly review retention policies
  • Combine backup + lifecycle policies
  • Avoid keeping unused data

🔹 Exam Tips (Must Remember)

  • Lifecycle policies = cost optimization
  • Glacier = cheap, slow access
  • TTL = automatic deletion
  • Snapshots ≠ automatic deletion
  • Object Lock = compliance, cannot delete

✅ Final Summary

A data retention policy in AWS is used to:

  • Control how long data is stored
  • Move data to cheaper storage over time
  • Automatically delete unnecessary data
  • Meet compliance and regulatory requirements

👉 The key to passing the exam:

  • Understand when to store, move, archive, or delete data
  • Know which AWS service handles each case
  • Always think in terms of cost optimization + automation