Data retention policies

Task Statement 4.3: Design cost-optimized database solutions.

📘 AWS Certified Solutions Architect – Associate (SAA-C03)


🔹 What is a Data Retention Policy?

A data retention policy defines:

  • How long data is stored
  • Where it is stored
  • When it should be deleted, archived, or moved to cheaper storage

It is used to control storage cost, meet compliance requirements, and manage data lifecycle efficiently.


🎯 Why Data Retention Policies Matter (Exam Focus)

You need to understand three key goals:

1. Cost Optimization

  • Storing all data forever is expensive
  • Older data is accessed less frequently → should be moved to cheaper storage

2. Compliance & Regulations

  • Some data must be kept for a specific time period
  • Some data must be deleted after a certain time

3. Performance Optimization

  • Active systems should only store frequently accessed (hot) data
  • Old data should not slow down databases

🔹 Data Lifecycle Concept (Very Important)

Data typically moves through stages:

| Stage | Description | Storage Type |
|---|---|---|
| Hot Data | Frequently accessed | Fast, expensive storage |
| Warm Data | Occasionally accessed | Medium-cost storage |
| Cold Data | Rarely accessed | Cheap storage |
| Archived Data | Almost never accessed | Very cheap storage |

👉 AWS uses this lifecycle model heavily in cost optimization questions.


🔹 AWS Services for Data Retention

1. Amazon S3 Lifecycle Policies

🔹 What it does:

Automatically moves or deletes objects based on rules.

🔹 Key Actions:

  • Transition to cheaper storage:
    • S3 Standard → S3 Standard-IA
    • S3 Standard-IA → S3 Glacier
    • S3 Glacier → S3 Glacier Deep Archive
  • Expire (delete) objects after time

🔹 Example (IT scenario):

  • Logs stored in S3:
    • After 30 days → move to S3 Standard-IA
    • After 90 days → move to Glacier
    • After 1 year → delete

🔹 Key Exam Points:

  • Fully automated
  • Works at bucket or object level
  • Reduces storage cost significantly
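The log scenario above can be sketched as a lifecycle rule set. This is a minimal sketch in the request shape accepted by boto3's `s3.put_bucket_lifecycle_configuration`; the bucket name and `logs/` prefix are hypothetical.

```python
# Lifecycle rules for the log scenario: 30 days -> Standard-IA,
# 90 days -> Glacier, 1 year -> delete.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "log-retention",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},  # hypothetical: apply only to log objects
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # cheaper, infrequent access
                {"Days": 90, "StorageClass": "GLACIER"},      # archival storage
            ],
            "Expiration": {"Days": 365},  # delete objects after 1 year
        }
    ]
}

# With real credentials this would be applied roughly as:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-log-bucket", LifecycleConfiguration=lifecycle_configuration
# )
```

Once the rules are attached to the bucket, S3 evaluates them automatically; no per-object action is needed.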

2. Amazon RDS Automated Backups & Snapshots

🔹 Features:

  • Automated backups (retention: 1–35 days)
  • Manual snapshots (kept until deleted)

🔹 Retention Strategy:

  • Short-term recovery → automated backups
  • Long-term retention → manual snapshots

🔹 Exam Tips:

  • Automated backups expire automatically
  • Snapshots must be deleted manually
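The two retention mechanisms map to two different API calls. A minimal sketch, assuming a hypothetical instance named `orders-db`, using the parameter shapes of boto3's `rds.modify_db_instance` and `rds.create_db_snapshot`:

```python
# Short-term recovery: automated backups, kept 1-35 days, then expire on their own.
retention_settings = {
    "DBInstanceIdentifier": "orders-db",  # hypothetical instance name
    "BackupRetentionPeriod": 14,          # must be within the 1-35 day window
    "ApplyImmediately": True,
}

# Long-term retention: a manual snapshot, kept until someone deletes it.
snapshot_request = {
    "DBInstanceIdentifier": "orders-db",
    "DBSnapshotIdentifier": "orders-db-year-end",  # hypothetical snapshot name
}

# With real credentials:
# rds = boto3.client("rds")
# rds.modify_db_instance(**retention_settings)
# rds.create_db_snapshot(**snapshot_request)
```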

3. Amazon DynamoDB TTL (Time To Live)

🔹 What it does:

  • Automatically deletes items after a specified timestamp

🔹 Use Case:

  • Temporary or session-based data

🔹 Exam Tips:

  • No manual deletion needed
  • Helps reduce storage cost automatically
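TTL works by storing a Unix epoch timestamp (in seconds) in a Number attribute; items whose timestamp is in the past become eligible for automatic deletion. A sketch for session data, where the attribute name `expires_at` and the item fields are hypothetical:

```python
import time

SESSION_LIFETIME_SECONDS = 24 * 60 * 60  # keep session data for 1 day

def session_item(session_id: str, user: str) -> dict:
    """Build a session item whose 'expires_at' attribute drives TTL deletion."""
    return {
        "session_id": session_id,
        "user": user,
        # TTL attribute: epoch seconds at which DynamoDB may delete the item
        "expires_at": int(time.time()) + SESSION_LIFETIME_SECONDS,
    }

item = session_item("abc123", "alice")
# With boto3, the item would be written via table.put_item(Item=item)
# after enabling TTL on the 'expires_at' attribute for the table.
```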

4. Amazon EBS Snapshots Lifecycle

🔹 Used with:

  • EC2 volumes

🔹 Retention Control:

  • Use Amazon Data Lifecycle Manager (DLM) to:
    • Schedule snapshots
    • Automatically delete old snapshots

🔹 Exam Tips:

  • Prevents accumulation of unused snapshots
  • Important for cost control
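A DLM policy pairs a snapshot schedule with a retain rule that deletes old snapshots automatically. A sketch following the `PolicyDetails` shape used by boto3's `dlm.create_lifecycle_policy`; the tag key/value are hypothetical:

```python
# Snapshot tagged volumes daily and keep only the last 7 snapshots.
policy_details = {
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "Backup", "Value": "Daily"}],  # hypothetical tag
    "Schedules": [
        {
            "Name": "daily-snapshots",
            "CreateRule": {
                "Interval": 24,
                "IntervalUnit": "HOURS",
                "Times": ["03:00"],  # snapshot window start (UTC)
            },
            # Snapshots beyond the most recent 7 are deleted automatically
            "RetainRule": {"Count": 7},
        }
    ],
}
```

The `RetainRule` is what prevents snapshot accumulation: each new snapshot pushes the oldest one out.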

5. AWS Backup

🔹 Centralized backup service

🔹 Features:

  • Define backup plans
  • Set retention periods
  • Automate deletion

🔹 Exam Tips:

  • Works across multiple services (RDS, EBS, DynamoDB, etc.)
  • Good for organization-wide retention policies
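A backup plan expresses retention directly in its rules: recovery points move to cold storage, then get deleted, on a schedule. A sketch in the shape of boto3's `backup.create_backup_plan`; plan, rule, and vault names are hypothetical:

```python
backup_plan = {
    "BackupPlanName": "org-retention-plan",  # hypothetical plan name
    "Rules": [
        {
            "RuleName": "daily-with-retention",
            "TargetBackupVaultName": "Default",
            "ScheduleExpression": "cron(0 5 * * ? *)",  # daily at 05:00 UTC
            "Lifecycle": {
                "MoveToColdStorageAfterDays": 30,  # cheaper storage tier
                "DeleteAfterDays": 365,            # automated deletion
            },
        }
    ],
}
# With real credentials:
# backup = boto3.client("backup")
# backup.create_backup_plan(BackupPlan=backup_plan)
```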

6. Amazon S3 Object Lock (Compliance Feature)

🔹 What it does:

  • Prevents object deletion for a fixed time

🔹 Modes:

  • Governance mode – users with special permissions can still override the lock
  • Compliance mode (strict) – no user, not even the root account, can delete the object before retention ends

🔹 Exam Tips:

  • Used for regulatory requirements
  • Data cannot be deleted before retention period ends
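A retention setting names the mode and the date until which deletion is blocked. A sketch in the shape of boto3's `s3.put_object_retention` (the bucket and key are hypothetical, and the 7-year period is just an example):

```python
from datetime import datetime, timedelta, timezone

# COMPLIANCE mode: the object version cannot be deleted or overwritten
# by any user until RetainUntilDate passes.
retention = {
    "Mode": "COMPLIANCE",
    "RetainUntilDate": datetime.now(timezone.utc) + timedelta(days=7 * 365),
}

# With real credentials:
# s3 = boto3.client("s3")
# s3.put_object_retention(Bucket="audit-bucket", Key="report.pdf",
#                         Retention=retention)
```

Note that Object Lock requires versioning and must be enabled when the bucket is created.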

🔹 Data Retention Strategies (Exam Scenarios)

1. Time-Based Retention

  • Keep data for a fixed period (e.g., 90 days)
  • Then delete or archive

2. Lifecycle-Based Retention

  • Move data across storage tiers over time

3. Event-Based Retention

  • Keep data until a condition is met

4. Legal Hold / Compliance Retention

  • Prevent deletion for compliance reasons

🔹 Cost Optimization Techniques

✅ Move old data to cheaper storage

  • S3 Glacier / Deep Archive

✅ Delete unnecessary data

  • Use lifecycle expiration rules

✅ Automate retention

  • Avoid manual management

✅ Avoid over-retention

  • Do not keep data longer than needed

🔹 Common Exam Scenarios

🧠 Scenario 1:

Large amount of log data, rarely accessed after 30 days
✅ Solution: S3 Lifecycle → Glacier


🧠 Scenario 2:

Temporary data in NoSQL database
✅ Solution: DynamoDB TTL


🧠 Scenario 3:

Need backups but want automatic deletion
✅ Solution: AWS Backup or DLM


🧠 Scenario 4:

Data must not be deleted for compliance
✅ Solution: S3 Object Lock


🧠 Scenario 5:

Long-term database backup retention
✅ Solution: RDS manual snapshots


🔹 Key Differences to Remember

| Feature | Auto Delete | Manual Delete | Use Case |
|---|---|---|---|
| S3 Lifecycle | ✅ | ✅ | Object storage |
| DynamoDB TTL | ✅ | ✅ | Temporary data |
| RDS Backups | ✅ | ❌ | Short-term recovery |
| RDS Snapshots | ❌ | ✅ | Long-term backup |
| AWS Backup | ✅ | ✅ | Centralized backup |
| S3 Object Lock | ❌ (protected) | ❌ (protected) | Compliance |

🔹 Best Practices (Very Important)

  • Use automation wherever possible
  • Choose storage class based on access frequency
  • Regularly review retention policies
  • Combine backup + lifecycle policies
  • Avoid keeping unused data

🔹 Exam Tips (Must Remember)

  • Lifecycle policies = cost optimization
  • Glacier = cheap, slow access
  • TTL = automatic deletion
  • Snapshots ≠ automatic deletion
  • Object Lock = compliance, cannot delete

✅ Final Summary

A data retention policy in AWS is used to:

  • Control how long data is stored
  • Move data to cheaper storage over time
  • Automatically delete unnecessary data
  • Meet compliance and regulatory requirements

👉 The key to passing the exam:

  • Understand when to store, move, archive, or delete data
  • Know which AWS service handles each case
  • Always think in terms of cost optimization + automation