Data lifecycles

Task Statement 4.1: Design cost-optimized storage solutions.

📘 AWS Certified Solutions Architect – Associate (SAA-C03)


1. What is a Data Lifecycle?

A data lifecycle is the process of managing data from the moment it is created to the point it is deleted. In AWS, managing this lifecycle matters because keeping all data in high-performance storage for the long term gets expensive. AWS provides tools to automate the lifecycle, so data moves to cheaper storage as it ages and is deleted when it is no longer needed.

Key stages of a data lifecycle:

  1. Creation/Generation – Data is first created, for example, logs from servers, application backups, or uploaded files.
  2. Active Use – Data is frequently accessed and may need fast storage.
  3. Infrequent Access – Data is accessed less often but still needs to be available.
  4. Archival – Data is rarely accessed but must be kept for regulatory or business reasons.
  5. Deletion – Data is removed permanently when it is no longer required.

2. AWS Storage Services and Data Lifecycle

AWS provides different storage options for different lifecycle stages. You need to know the right storage for each stage to optimize cost.

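| Lifecycle Stage   | AWS Service & Tier                          | Description |
|-------------------|---------------------------------------------|-------------|
| Active            | Amazon S3 Standard / EBS SSD                | Data that applications use regularly (e.g., database files, active logs). |
| Infrequent Access | S3 Standard-IA, S3 One Zone-IA              | Data accessed less often (e.g., monthly reports, old project files). Lower storage price than Standard with the same millisecond access, but with a per-GB retrieval fee and a 30-day minimum storage duration. One Zone-IA keeps data in a single Availability Zone. |
| Archive           | S3 Glacier / S3 Glacier Deep Archive        | Data rarely accessed (e.g., compliance backups, historical logs). Very low cost, but retrieval takes minutes to hours. |
| Deletion          | S3 Lifecycle Rules / Data Retention Policies | Automatically delete data when it is no longer needed. Helps save costs. |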
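
To see why matching the storage class to the lifecycle stage matters, the arithmetic can be sketched as below. This is a minimal illustration only: the per-GB prices are hypothetical round numbers chosen for easy arithmetic, not real AWS prices, and the function name `monthly_storage_cost` is an assumption made for this example. The sketch also ignores retrieval fees, request charges, and minimum storage durations.

```python
# HYPOTHETICAL prices in USD per GB-month, chosen only to make the arithmetic
# easy to follow. They are NOT real AWS prices; check the current S3 pricing
# page for your Region before drawing any conclusions.
ASSUMED_PRICE_PER_GB_MONTH = {
    "STANDARD": 0.020,
    "STANDARD_IA": 0.010,
    "GLACIER": 0.004,
    "DEEP_ARCHIVE": 0.001,
}


def monthly_storage_cost(gb_by_class, price_per_gb):
    """Return the monthly storage cost for a {storage_class: gigabytes} mapping."""
    return sum(gb * price_per_gb[storage_class]
               for storage_class, gb in gb_by_class.items())


if __name__ == "__main__":
    # The same 1,000 GB of data, stored two different ways.
    everything_in_standard = {"STANDARD": 1000}
    tiered_by_age = {"STANDARD": 100, "STANDARD_IA": 300, "GLACIER": 600}

    for label, layout in [("all Standard", everything_in_standard),
                          ("tiered", tiered_by_age)]:
        cost = monthly_storage_cost(layout, ASSUMED_PRICE_PER_GB_MONTH)
        print(f"{label}: ${cost:.2f} per month")
```

With real prices the exact figures will differ, but the pattern is the same: the more data sits in colder tiers, the lower the storage bill, as long as retrieval stays rare.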

3. Lifecycle Policies in AWS

AWS lets you automate the movement of data between storage classes using lifecycle policies. This is key for cost optimization.

Example: Using S3 Lifecycle Policies

  • Scenario: You have a company storing log files in S3.
  • Policy Setup:
    1. 0–30 days: Keep logs in S3 Standard for fast access.
    2. 31–180 days: Move logs to S3 Standard-IA to reduce cost.
    3. 181–730 days: Move logs to S3 Glacier for archival.
    4. After 730 days (2 years): Automatically delete logs that are no longer needed.
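
The policy above could be expressed as an S3 lifecycle configuration roughly as follows. This is a sketch under assumptions: the `logs/` prefix, the rule ID, the bucket name, and the helper function names are all hypothetical. The dictionary shape follows the boto3 `put_bucket_lifecycle_configuration` call, and `boto3` is imported only inside the function that talks to AWS, so the configuration can be built and inspected without credentials.

```python
def build_log_lifecycle_configuration(prefix="logs/"):
    """Build a lifecycle configuration for log objects under `prefix`.

    The prefix and rule ID are assumptions made for this example.
    """
    return {
        "Rules": [
            {
                "ID": "log-tiering-and-expiry",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # After 30 days in S3 Standard, move to Standard-IA.
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    # After 180 days, move to S3 Glacier for archival.
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                # After 2 years (730 days), delete the object.
                "Expiration": {"Days": 730},
            }
        ]
    }


def apply_lifecycle_configuration(bucket_name, configuration):
    """Attach the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=configuration,
    )


if __name__ == "__main__":
    config = build_log_lifecycle_configuration()
    print(config)
    # apply_lifecycle_configuration("example-log-bucket", config)  # hypothetical bucket
```

Note that `Days` counts from object creation, so each transition threshold is the object's age, not the time spent in the previous class.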

Benefits:

  • Saves money by moving data to cheaper storage automatically.
  • Reduces manual management.
  • Helps meet compliance rules.

4. EBS Snapshots and Lifecycle

EBS volumes (used for EC2 instances) also follow a lifecycle:

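  1. Active Volume: Your EC2 instance reads/writes data on the volume.
  2. Snapshot for Backup: Take a point-in-time snapshot of the volume for protection. AWS stores snapshots in Amazon S3 on your behalf; they do not appear in your own buckets.
  3. Snapshot Lifecycle: You can use Amazon Data Lifecycle Manager (DLM) snapshot lifecycle policies to create snapshots on a schedule and automatically delete old snapshots after a certain period.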

Example:

  • Keep daily snapshots for 7 days.
  • Keep weekly snapshots for 4 weeks.
  • Automatically delete any snapshot older than 30 days.
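
A retention scheme like the one above might be written as an Amazon Data Lifecycle Manager policy along these lines. Treat it as a sketch under assumptions: the `Backup=true` tag, the schedule names, the 03:00 UTC start time, the Sunday cron expression, and the helper function names are all hypothetical. Retention here is count-based (keep the last 7 daily and last 4 weekly snapshots), which approximates the 30-day cutoff rather than enforcing it exactly.

```python
def build_snapshot_policy_details(tag_key="Backup", tag_value="true"):
    """Build DLM policy details for EBS volumes carrying the given tag.

    The tag, schedule names, and times are assumptions made for this example.
    """
    return {
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": tag_key, "Value": tag_value}],
        "Schedules": [
            {
                # One snapshot every 24 hours; keep the 7 most recent.
                "Name": "daily-keep-7",
                "CreateRule": {
                    "Interval": 24,
                    "IntervalUnit": "HOURS",
                    "Times": ["03:00"],
                },
                "RetainRule": {"Count": 7},
            },
            {
                # One snapshot every Sunday; keep the 4 most recent.
                "Name": "weekly-keep-4",
                "CreateRule": {"CronExpression": "cron(0 3 ? * SUN *)"},
                "RetainRule": {"Count": 4},
            },
        ],
    }


def create_snapshot_policy(execution_role_arn, policy_details):
    """Create the policy in DLM (requires boto3, credentials, and an IAM role)."""
    import boto3  # imported lazily so the builder above works without boto3

    dlm = boto3.client("dlm")
    return dlm.create_lifecycle_policy(
        ExecutionRoleArn=execution_role_arn,
        Description="Daily and weekly EBS snapshots with automatic cleanup",
        State="ENABLED",
        PolicyDetails=policy_details,
    )


if __name__ == "__main__":
    print(build_snapshot_policy_details())
```

Targeting volumes by tag means new volumes join the policy as soon as they are tagged, with no change to the policy itself.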

5. FSx and EFS Lifecycle Management

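  • Amazon EFS (Elastic File System) supports lifecycle management: files that have not been accessed for a configured period (for example, 30 days) are moved automatically to the lower-cost EFS Infrequent Access (EFS IA) storage class.
  • Amazon FSx controls cost differently depending on the file system type. For example, FSx for Windows File Server offers HDD storage as a cheaper alternative to SSD, and FSx for Lustre can be linked to an S3 bucket so that long-term data stays in S3.
  • These options help reduce costs for large file systems with little or no manual intervention.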
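
For EFS, moving idle files to Infrequent Access is a file system setting rather than a per-file action. A minimal sketch with boto3 follows, assuming a 30-day threshold; the helper function names and the file system ID are hypothetical. The second policy, which moves a file back to primary storage on its first access, is optional and included only for illustration.

```python
def build_efs_lifecycle_policies():
    """Build EFS lifecycle policies: to IA after 30 idle days, back on first access.

    Each list entry holds exactly one transition setting.
    """
    return [
        # Move files not accessed for 30 days to EFS Infrequent Access.
        {"TransitionToIA": "AFTER_30_DAYS"},
        # Optional: move a file back to primary storage when it is accessed again.
        {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
    ]


def apply_efs_lifecycle(file_system_id, policies):
    """Apply the policies to a file system (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    efs = boto3.client("efs")
    efs.put_lifecycle_configuration(
        FileSystemId=file_system_id,
        LifecyclePolicies=policies,
    )


if __name__ == "__main__":
    print(build_efs_lifecycle_policies())
    # apply_efs_lifecycle("fs-0123456789abcdef0", build_efs_lifecycle_policies())  # hypothetical ID
```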

6. Key Exam Points

For the exam, you need to remember:

  1. Lifecycle Definition: Managing data from creation to deletion.
  2. S3 Lifecycle Policies: Automate moving objects between storage classes or deleting them.
  3. Storage Classes for Cost Optimization:
    • Standard → frequently accessed
    • Standard-IA / One Zone-IA → infrequent access
    • Glacier / Deep Archive → archive/rare access
  4. EBS Snapshot Lifecycle Policies (Amazon Data Lifecycle Manager): Automate creation and deletion of snapshots.
  5. EFS Infrequent Access Lifecycle: Move files automatically to reduce costs.
  6. Benefits of Lifecycle Management:
    • Cost optimization
    • Compliance
    • Automated management

7. Quick IT Examples for Exam Recall

  • Logs stored in S3 Standard, older logs moved to Glacier, and deleted after 2 years.
  • EC2 database backups taken as EBS snapshots, old snapshots deleted automatically.
  • Company file shares on EFS, inactive files moved to EFS Infrequent Access.

Summary:
Data lifecycle in AWS is about automating the storage and deletion of data across different services (S3, EBS, EFS, FSx) to reduce costs while keeping data accessible or compliant. For the exam, focus on knowing the storage classes, lifecycle policies, and examples of cost optimization.
