Data retention and classification

Task Statement 1.3: Determine appropriate data security controls.

AWS Certified Solutions Architect – Associate (SAA-C03)


1. Introduction to Data Security Controls

In AWS, data security controls ensure that data is:

  • Protected from unauthorized access
  • Stored securely
  • Retained only as long as necessary
  • Deleted properly when no longer needed

For the SAA-C03 exam, you must understand:

  • How to classify data
  • How to apply retention policies
  • Which AWS services help manage classification and retention
  • How lifecycle policies and backups work
  • Compliance and governance considerations

2. What is Data Classification?

Definition

Data classification is the process of categorizing data based on:

  • Sensitivity
  • Confidentiality
  • Regulatory requirements
  • Business impact

It helps determine:

  • Who can access the data
  • How the data should be encrypted
  • How long it should be stored
  • Where it can be stored

2.1 Common Data Classification Levels

Although AWS does not force specific labels, organizations commonly use:

  1. Public Data
    • Safe for public access
    • Example: Public website content
  2. Internal Data
    • For internal use only
    • Example: Internal documentation
  3. Confidential Data
    • Sensitive business information
    • Example: Financial reports
  4. Highly Confidential / Restricted
    • Critical data with strict compliance requirements
    • Example: Personally identifiable information (PII), health data

For the exam:
You should understand that classification determines security controls and retention requirements.
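
A minimal sketch of how classification can drive controls. The policy table below is hypothetical: the level names follow the list above, but the encryption labels and retention periods are illustrative assumptions, not AWS defaults.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


@dataclass(frozen=True)
class Controls:
    encryption: str       # illustrative label, not an AWS API value
    public_access: bool   # whether public read access is acceptable
    retention_days: int   # hypothetical retention period


# Hypothetical policy table; real values come from your organization's policy.
POLICY = {
    Classification.PUBLIC: Controls("SSE-S3", True, 365),
    Classification.INTERNAL: Controls("SSE-S3", False, 365 * 3),
    Classification.CONFIDENTIAL: Controls("SSE-KMS", False, 365 * 7),
    Classification.RESTRICTED: Controls("SSE-KMS with strict key policy", False, 365 * 7),
}


def controls_for(level: Classification) -> Controls:
    """Look up the controls that a classification level requires."""
    return POLICY[level]
```

The point of the lookup is that the classification is decided once and every later control (encryption, access, retention) is derived from it.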


3. Why Data Classification is Important in AWS

In AWS, data can be stored in:

  • Amazon S3
  • Amazon EBS
  • Amazon RDS
  • Amazon DynamoDB
  • Amazon S3 Glacier

Different types of data require:

  • Encryption at rest
  • Encryption in transit
  • Access restrictions (IAM policies)
  • Logging and monitoring
  • Backup and retention rules

4. AWS Services Used for Data Classification

4.1 Amazon Macie

Purpose: Automatically discovers and classifies sensitive data in Amazon S3.

It can detect:

  • Personally identifiable information (PII)
  • Financial data
  • Credentials

Important for exam:

  • Macie analyzes data stored in Amazon S3 only
  • Helps identify sensitive data stored improperly
  • Supports compliance efforts

4.2 Resource Tagging

AWS allows you to tag resources with:

  • Data classification labels
  • Environment labels (prod, dev)
  • Compliance requirements

Example tags:

  • DataType=Confidential
  • Retention=7Years

Tags help:

  • Apply lifecycle policies
  • Apply IAM restrictions
  • Automate governance
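
As a rough illustration, the example tags above can be turned into the request shapes that S3 tagging and lifecycle filters use. The helper names are hypothetical, and the dictionaries are only built here, not sent to AWS. Note that a lifecycle filter matches object tags, not bucket tags.

```python
def build_tag_set(tags: dict) -> dict:
    """Convert a plain dict into the {"TagSet": [...]} shape used by S3 tagging calls."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in sorted(tags.items())]}


def tag_filter(key: str, value: str) -> dict:
    """Lifecycle rule filter that matches only objects carrying this tag."""
    return {"Tag": {"Key": key, "Value": value}}


tagging = build_tag_set({"DataType": "Confidential", "Retention": "7Years"})
confidential_only = tag_filter("DataType", "Confidential")
```

With boto3, `tagging` would typically be passed as the `Tagging` argument of `put_object_tagging` or `put_bucket_tagging`, and `confidential_only` as the `Filter` of a lifecycle rule.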

5. What is Data Retention?

Definition

Data retention is the practice of keeping data for a specific period of time based on:

  • Business requirements
  • Legal requirements
  • Compliance policies

After the retention period:

  • Data may be archived
  • Data may be deleted

6. Data Retention in AWS

Retention is controlled using:

  • Lifecycle policies
  • Backup policies
  • Versioning
  • Object Lock
  • Archival storage classes

7. S3 Lifecycle Policies (Very Important for Exam)

7.1 Amazon S3 Lifecycle Rules

Lifecycle rules allow you to:

  • Transition objects to cheaper storage
  • Expire (delete) objects automatically
  • Manage previous versions

Example transitions:

  • S3 Standard → S3 Standard-IA
  • S3 Standard → S3 Glacier Flexible Retrieval
  • S3 Glacier Flexible Retrieval → S3 Glacier Deep Archive

This helps with:

  • Cost optimization
  • Retention compliance
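
A sketch of a lifecycle configuration that transitions and then expires objects. The function name, prefix, and day counts are assumptions for illustration. The minimum-day checks reflect S3's transition constraints as commonly documented, so verify them against current AWS documentation before relying on them.

```python
def archive_rule(prefix: str, ia_days: int, glacier_days: int, expire_days: int) -> dict:
    """Build one lifecycle rule: Standard -> Standard-IA -> Glacier -> delete."""
    if ia_days < 30:
        raise ValueError("objects must be at least 30 days old before moving to Standard-IA")
    if glacier_days < ia_days + 30:
        raise ValueError("allow at least 30 days in Standard-IA before archiving")
    if expire_days <= glacier_days:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": f"archive-{prefix.strip('/')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": expire_days},  # delete after the retention period
    }


# Hypothetical policy: keep logs for about 7 years (2555 days), archiving as they age.
lifecycle_configuration = {"Rules": [archive_rule("logs/", 30, 90, 2555)]}
```

With boto3, this dictionary would be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`.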

8. S3 Object Lock (Compliance Feature)

S3 Object Lock provides:

  • Write Once Read Many (WORM) protection
  • Protection against deletion
  • Protection against modification

Two modes:

  1. Governance Mode
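    • Users with the s3:BypassGovernanceRetention permission can override or remove the lock
  2. Compliance Mode
    • Cannot be overridden or shortened by any user
    • Even the root user cannot delete a protected object version until the retention period expires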

Important for exam:
Use Object Lock when data must not be deleted before a retention period.
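
The difference between the two modes can be modeled in a few lines. This is a simplified sketch of the decision logic, not S3's implementation; the function name is hypothetical and the 7-year default retention is an assumed policy value.

```python
from datetime import datetime, timezone


def delete_allowed(mode: str, retain_until: datetime, now: datetime,
                   can_bypass_governance: bool = False) -> bool:
    """Return True if a locked object version could be permanently deleted."""
    if now >= retain_until:
        return True                    # retention period has expired
    if mode == "GOVERNANCE":
        return can_bypass_governance   # needs s3:BypassGovernanceRetention
    if mode == "COMPLIANCE":
        return False                   # nobody, including root, before expiry
    raise ValueError(f"unknown Object Lock mode: {mode}")


# Default-retention settings in the shape S3 uses for a bucket's
# Object Lock configuration (7 years is a hypothetical policy value).
object_lock_configuration = {
    "ObjectLockEnabled": "Enabled",
    "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Years": 7}},
}
```

With boto3, `object_lock_configuration` would be passed to `put_object_lock_configuration` on a bucket that has Object Lock and versioning enabled.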


9. Backup and Retention

9.1 AWS Backup

Centralized backup service for many AWS resources, including:

  • EBS
  • RDS
  • DynamoDB
  • EFS

Features:

  • Define backup plans
  • Set retention periods
  • Automatic deletion after retention
  • Cross-Region backup copies

Exam Tip:
If a question asks about centralized retention management → AWS Backup.
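
A sketch of a backup plan with a retention period, in the shape AWS Backup uses for plan definitions. The plan name, rule name, vault name, schedule, and day counts are hypothetical. The 90-day check models the rule that backups moved to cold storage must stay there for at least 90 days; not every resource type supports cold storage.

```python
def backup_rule(name: str, vault: str, cold_after_days: int, delete_after_days: int) -> dict:
    """Build one AWS Backup rule with a cold-storage transition and a retention period."""
    # Backups moved to cold storage must stay there at least 90 days.
    if delete_after_days < cold_after_days + 90:
        raise ValueError("DeleteAfterDays must be >= MoveToColdStorageAfterDays + 90")
    return {
        "RuleName": name,
        "TargetBackupVaultName": vault,
        "ScheduleExpression": "cron(0 5 * * ? *)",  # daily at 05:00 UTC
        "Lifecycle": {
            "MoveToColdStorageAfterDays": cold_after_days,
            "DeleteAfterDays": delete_after_days,   # automatic deletion after retention
        },
    }


backup_plan = {
    "BackupPlanName": "confidential-data-plan",     # hypothetical name
    "Rules": [backup_rule("daily-7-years", "Default", 30, 2555)],
}
```

With boto3, `backup_plan` would be passed as `BackupPlan` to the AWS Backup `create_backup_plan` call.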


10. Versioning and Retention

10.1 S3 Versioning

When enabled:

  • Keeps multiple versions of an object
  • Protects against accidental deletion
  • Works with lifecycle rules

Versioning + Lifecycle:
You can:

  • Expire old versions
  • Retain current versions longer
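
A sketch of a lifecycle rule that expires only noncurrent (old) versions while leaving the current version untouched. The function name, rule ID, and day counts are assumptions for illustration.

```python
from typing import Optional


def version_cleanup_rule(noncurrent_days: int, keep_newest: Optional[int] = None) -> dict:
    """Expire versions once they have been noncurrent for `noncurrent_days` days."""
    expiration = {"NoncurrentDays": noncurrent_days}
    if keep_newest is not None:
        # Always keep this many of the most recent noncurrent versions.
        expiration["NewerNoncurrentVersions"] = keep_newest
    return {
        "ID": "expire-old-versions",
        "Filter": {"Prefix": ""},   # empty prefix applies the rule to every object
        "Status": "Enabled",
        "NoncurrentVersionExpiration": expiration,
    }


rule = version_cleanup_rule(90, keep_newest=3)
```

Because the rule has no `Expiration` element, current versions are kept indefinitely; only superseded versions age out.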

11. Database Retention

11.1 Amazon RDS

RDS provides:

  • Automated backups
  • Configurable retention period of 1–35 days (setting it to 0 disables automated backups)
  • Manual snapshots (kept until deleted)

11.2 Amazon DynamoDB

Provides:

  • On-demand backups
  • Point-in-time recovery (PITR), which restores a table to any point in the recovery window (up to 35 days)

Retention decisions must match classification level.
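
A sketch of the request parameters behind these database settings. The helper names and resource identifiers are hypothetical; the functions only build keyword arguments and do not call AWS.

```python
def rds_backup_settings(instance_id: str, retention_days: int) -> dict:
    """Keyword arguments for an RDS ModifyDBInstance call that sets backup retention."""
    if not 0 <= retention_days <= 35:
        raise ValueError("RDS automated backup retention must be 0-35 days")
    return {
        "DBInstanceIdentifier": instance_id,
        "BackupRetentionPeriod": retention_days,  # 0 disables automated backups
        "ApplyImmediately": True,
    }


def dynamodb_pitr_settings(table_name: str) -> dict:
    """Keyword arguments for a DynamoDB UpdateContinuousBackups call that enables PITR."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


rds_args = rds_backup_settings("orders-db", 35)   # hypothetical instance name
pitr_args = dynamodb_pitr_settings("Orders")      # hypothetical table name
```

With boto3, these would be unpacked into `rds.modify_db_instance(**rds_args)` and `dynamodb.update_continuous_backups(**pitr_args)`. Retention beyond 35 days needs manual snapshots or AWS Backup.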


12. Encryption and Classification Relationship

The higher the classification, the stronger the encryption and key-management controls should be.

Encryption options:

  • Server-side encryption with S3-managed keys (SSE-S3)
  • Server-side encryption with AWS KMS keys (SSE-KMS)
  • Client-side encryption

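SSE-KMS uses keys managed in AWS Key Management Service (AWS KMS), which adds:

  • Key policies that control who can use each key
  • CloudTrail logging of key usage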

Important for exam:
Highly confidential data → Use KMS with strict key policies.
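
A sketch of choosing a bucket's default encryption from the options above. The function name is hypothetical, and the key ARN is a placeholder in AWS's documentation style, not a real key.

```python
from typing import Optional


def default_encryption(kms_key_arn: Optional[str] = None) -> dict:
    """Build a default-encryption configuration: SSE-KMS if a key is given, else SSE-S3."""
    if kms_key_arn is None:
        rule = {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    else:
        rule = {
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_arn,
            },
            "BucketKeyEnabled": True,  # reduces the number of KMS requests
        }
    return {"Rules": [rule]}


# Placeholder ARN for illustration only.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
restricted_bucket_encryption = default_encryption(KEY_ARN)
public_bucket_encryption = default_encryption()
```

With boto3, either dictionary would be passed as `ServerSideEncryptionConfiguration` to `put_bucket_encryption`. The strict key policy itself lives on the KMS key, not in this configuration.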


13. Compliance Considerations

Some data must be retained for legal reasons.

Examples of regulations and standards:

  • GDPR
  • HIPAA
  • PCI DSS

AWS provides:

  • Audit logs using AWS CloudTrail
  • Configuration tracking via AWS Config

Retention policies must align with compliance rules.


14. Data Deletion and Secure Disposal

After the retention period ends, data can be removed in several ways:

  • Automatic deletion (S3 lifecycle)
  • Manual deletion
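  • Cryptographic erasure by deleting the KMS key

Key concept:
Deleting a KMS key makes all data encrypted under it permanently unreadable. Deletion is irreversible and happens only after a mandatory waiting period of 7–30 days.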
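
KMS does not delete a key immediately: deletion is scheduled with a waiting period of 7 to 30 days, during which it can still be cancelled. A small sketch of that timing, with a hypothetical function name:

```python
from datetime import datetime, timedelta, timezone


def deletion_date(scheduled_at: datetime, pending_window_days: int = 30) -> datetime:
    """Return when a KMS key scheduled for deletion would actually be deleted."""
    if not 7 <= pending_window_days <= 30:
        raise ValueError("the KMS pending window must be between 7 and 30 days")
    return scheduled_at + timedelta(days=pending_window_days)


scheduled = datetime(2024, 1, 1, tzinfo=timezone.utc)
earliest = deletion_date(scheduled, pending_window_days=7)
```

With boto3, the matching call is `kms.schedule_key_deletion(KeyId=..., PendingWindowInDays=...)`. Until the window ends, the data is unusable but still recoverable by cancelling the deletion.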


15. Exam Scenarios You Must Understand

You should be able to answer questions like:

  1. Which service discovers sensitive data in S3?
    → Amazon Macie
  2. How to enforce retention so objects cannot be deleted?
    → S3 Object Lock (Compliance Mode)
  3. How to reduce storage cost while retaining data?
    → Lifecycle transitions to Glacier
  4. How to centrally manage backup retention?
    → AWS Backup
  5. How to track configuration compliance?
    → AWS Config
  6. How to audit API actions for data access?
    → CloudTrail

16. Important Design Principles for the Exam

Always remember:

  1. Classify data first
  2. Apply least privilege access
  3. Encrypt sensitive data
  4. Define retention policy
  5. Automate lifecycle management
  6. Monitor and audit access
  7. Delete data securely after retention

17. Quick Comparison Table

Requirement                AWS Solution
Discover sensitive data    Amazon Macie
Long-term archive          S3 Glacier
Prevent deletion           S3 Object Lock
Backup management          AWS Backup
Encryption keys            AWS KMS
API auditing               CloudTrail
Config compliance          AWS Config

18. Final Summary for SAA-C03

To pass this section, you must clearly understand:

  • What data classification means
  • How classification affects encryption and access
  • How to implement retention policies
  • Lifecycle rules and transitions
  • S3 Object Lock modes
  • Backup retention settings
  • Compliance enforcement tools

If you remember this structure:

Classify → Protect → Retain → Archive → Delete → Audit

you will be able to answer most exam questions related to data retention and classification.
