AWS Backup
Deep Dive

How Vaults, Plans, Jobs, Copy Jobs, and Restore Jobs are orchestrated — with diagrams and real-world examples.

00 / Before You Start

New to AWS Backup? Start Here

AWS Backup can feel intimidating at first. Here's the mental model that makes everything click:

Think of it like a Bank Vault

A Vault is your secure safe-deposit box. Your backups (called recovery points) sit inside it. You can have multiple vaults — one for prod, one for DR (disaster recovery), one for compliance.

Think of Plans like a Calendar Subscription

A Backup Plan is like a recurring calendar event: "every night at 2 AM, back up everything tagged Backup=true and keep it for 30 days." You write the plan once; AWS does the work automatically.

Recovery Points are Snapshots in Time

Each time a backup runs, it creates a Recovery Point — a frozen copy of your resource at that exact moment. You can restore from any of these points later. Think of them like iPhone backups: you can roll back to yesterday's or last week's.

ARN = Amazon Resource Name (a unique ID)

You'll see ARNs everywhere in AWS. They're just unique identifiers for any resource, like arn:aws:rds:us-east-1:123456789012:db:my-database (the 12-digit number is the account ID). When the docs say "Recovery Point ARN", they mean the unique ID of a specific backup.
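
Because ARNs follow a fixed colon-separated layout (arn, partition, service, region, account ID, resource), they are easy to take apart in code. The sketch below is only an illustration: `parse_arn` and the `Arn` tuple are names invented for this example, and the account ID is a placeholder.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split an ARN into its colon-separated parts.

    The resource part may itself contain colons (e.g. "db:my-database"),
    so we split at most five times and keep the remainder intact.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


# Placeholder ARN for illustration only.
example = parse_arn("arn:aws:rds:us-east-1:123456789012:db:my-database")
print(example.service, example.region, example.resource)
```

Some ARNs (IAM roles, for instance) leave the region field empty; the function returns an empty string for it in that case.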

Region = Geographic Area of AWS Data Centers

AWS groups its data centers into Regions around the world (us-east-1 = N. Virginia, eu-west-1 = Ireland, etc.). Each Region contains several isolated Availability Zones. "Cross-region copy" means sending a backup to a different geography so that if an entire region fails, you still have your data elsewhere.

IAM Role = Permission Pass for AWS Backup

AWS Backup needs permission to access your databases, EC2 instances, etc. An IAM Role grants those permissions. A default role named AWSBackupDefaultServiceRole (the console creates it the first time you need it) works for most cases, so it is a reasonable place to start.

The 30-second summary: You tag your AWS resources (databases, servers, file systems) with Backup=true. You create a Backup Plan that says "back these up nightly". AWS Backup runs automatically, stores snapshots in a Vault, optionally copies them to another region for safety, and you can restore any snapshot to a brand-new resource whenever needed.
01 / Core Concepts

The 5 Building Blocks

AWS Backup is a fully managed service that centralizes and automates data protection across AWS services. Before diving into flows, understand each primitive.

Backup Vault

An encrypted container that stores recovery points (backups). Each vault has an access policy and an optional vault lock (WORM). You can have multiple vaults per region/account.

Backup Plan

A policy document that defines when to back up (schedule), how long to retain (lifecycle), and where to send copies (copy rules). Assigned to resources via tags or ARNs.

Backup Job

The actual execution: takes a snapshot or continuous backup of a resource and stores a recovery point in the target vault. Triggered by a plan rule or manually.

Copy Job

Copies an existing recovery point from one vault to another — either in the same region, a different region, or even a different AWS account. Useful for DR and compliance.

Restore Job

Recreates a resource from a recovery point stored in a vault. You specify the target configuration; AWS Backup handles provisioning the new resource.

02 / Architecture

How Everything Connects

The diagram below shows the top-level relationships between AWS Backup components and the protected resources.

graph TD
  subgraph ACCOUNT["AWS Account (us-east-1)"]
    direction TB
    PLAN["Backup Plan ───────────── Schedule: cron(0 2 * * ? *) Retention: 30 days warm / 365 delete Copy rule to DR vault"]
    SEL["Resource Selection ───────────── Tag: Backup=true or specific ARNs"]
    subgraph RESOURCES["Protected Resources"]
      EC2["EC2 Instance"]
      RDS["RDS Database"]
      EFS["EFS File System"]
      DDB["DynamoDB Table"]
    end
    subgraph PRIMARY_VAULT["Primary Vault (us-east-1)"]
      RP1["Recovery Point 1 2024-01-15 02:00"]
      RP2["Recovery Point 2 2024-01-16 02:00"]
      RP3["Recovery Point 3 2024-01-17 02:00"]
    end
  end
  subgraph DR_ACCOUNT["DR Account / DR Region (eu-west-1)"]
    DR_VAULT["DR Vault Cross-region copy"]
    RP_DR["Recovery Points copied from primary"]
  end
  PLAN --> SEL
  SEL --> RESOURCES
  RESOURCES -->|"Backup Job"| PRIMARY_VAULT
  PRIMARY_VAULT -->|"Copy Job"| DR_VAULT
  DR_VAULT --> RP_DR
  style ACCOUNT fill:#111827,stroke:#1e2d45,color:#e2e8f0
  style DR_ACCOUNT fill:#0f1e35,stroke:#1e2d45,color:#e2e8f0
  style RESOURCES fill:#0a0e1a,stroke:#1e2d45,color:#e2e8f0
  style PRIMARY_VAULT fill:#1a1a2e,stroke:#3b82f6,color:#e2e8f0
  style DR_VAULT fill:#1a1a2e,stroke:#8b5cf6,color:#e2e8f0
  style PLAN fill:#1e2d45,stroke:#3b82f6,color:#93c5fd
  style SEL fill:#1e2d45,stroke:#f59e0b,color:#fbbf24

A single Backup Plan can protect hundreds of resources simultaneously — AWS Backup runs one Backup Job per resource per rule execution.
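
To make the "one job per resource per rule" fan-out concrete, here is a small sketch. It assumes a simplified model in which each resource is a dict with an ARN and a tag map; `select_resources`, `plan_jobs`, and the ARNs are hypothetical, and in practice the service performs this matching itself.

```python
from itertools import product


def select_resources(resources, key="Backup", value="true"):
    """Keep resources whose tag `key` equals `value` (exact string match)."""
    return [r for r in resources if r["tags"].get(key) == value]


def plan_jobs(resources, rule_names):
    """Return one (rule, resource ARN) pair per selected resource per rule."""
    selected = select_resources(resources)
    return [(rule, r["arn"]) for rule, r in product(rule_names, selected)]


# Hypothetical inventory for illustration.
inventory = [
    {"arn": "arn:aws:rds:us-east-1:123456789012:db:prod-mysql", "tags": {"Backup": "true"}},
    {"arn": "arn:aws:ec2:us-east-1:123456789012:instance/i-0abc", "tags": {"Backup": "true"}},
    {"arn": "arn:aws:ec2:us-east-1:123456789012:instance/i-0def", "tags": {"Env": "dev"}},
]

for rule, arn in plan_jobs(inventory, ["DailyBackup", "WeeklyBackup"]):
    print(rule, arn)
```

Two tagged resources and two rules yield four jobs; the untagged instance is never selected.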
03 / Backup Plans

Anatomy of a Backup Plan

A Backup Plan contains one or more rules, and is associated with resources via selections. Here's a real-world example plan JSON:

// Example: Production Backup Plan
{
  "BackupPlanName": "prod-daily-backup-plan",
  "Rules": [
    {
      "RuleName":             "DailyToUsEast1",
      "TargetBackupVaultName": "prod-primary-vault",
      "ScheduleExpression":   "cron(0 2 * * ? *)",  // 2 AM UTC daily
      "StartWindowMinutes":   60,
      "CompletionWindowMinutes": 180,
      "Lifecycle": {
        "MoveToColdStorageAfterDays": 30,
        "DeleteAfterDays": 365
      },
      "CopyActions": [         // triggers a Copy Job after backup
        {
          "DestinationBackupVaultArn": "arn:aws:backup:eu-west-1:DR_ACCOUNT_ID:backup-vault:dr-vault",
          "Lifecycle": {
            "DeleteAfterDays": 90
          }
        }
      ]
    },
    {
      "RuleName":             "WeeklyToUsEast1",
      "TargetBackupVaultName": "prod-primary-vault",
      "ScheduleExpression":   "cron(0 3 ? * SUN *)",  // Sunday 3 AM
      "Lifecycle": { "DeleteAfterDays": 1825 } // 5 years
    }
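  ],

  // What gets backed up by this plan. Shown inline for readability: in the API,
  // a selection is created separately (create-backup-selection) and attached to the plan.
  "Selections": [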
    {
      "SelectionName": "all-tagged-resources",
      "IamRoleArn": "arn:aws:iam::ACCOUNT:role/service-role/AWSBackupDefaultServiceRole",
      "ListOfTags": [
        { "ConditionType": "STRINGEQUALS",
          "ConditionKey":  "Backup",
          "ConditionValue":"true" }
      ]
    }
  ]
}
  

KEY FIELDS EXPLAINED

Field | Purpose | Example
--- | --- | ---
ScheduleExpression | Cron expression for when jobs fire | cron(0 2 * * ? *) = 2 AM UTC daily
StartWindowMinutes | Window during which job must start, or it becomes EXPIRED (min 60 min, default 8 hrs) | 60 = job must start within 1 hour
CompletionWindowMinutes | Time from scheduled start by which job must complete, or it is cancelled (default 7 days) | 180 = 3 hours max runtime
MoveToColdStorageAfterDays | Auto-transition to cold storage (cheaper) | 30 = after 30 days → cold tier
DeleteAfterDays | Auto-delete the recovery point | 365 = deleted after 1 year
CopyActions | Cross-region/account copy after backup completes | Copy to EU DR vault

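
The annotated JSON above uses comments and bundles the selection with the plan, so it cannot be submitted as-is. As a rough sketch, the same structure can be assembled in code as two separate documents and serialized to strict JSON. The helper `make_rule` is invented for this example, the account ID is a placeholder, and only the 60-minute start-window limit from the table is checked.

```python
import json


def make_rule(name, vault, schedule, lifecycle, start_window_minutes=None):
    """Build one backup rule as a plain dict.

    Enforces the one limit stated in the table above: a start window,
    when given, must be at least 60 minutes.
    """
    if start_window_minutes is not None and start_window_minutes < 60:
        raise ValueError("StartWindowMinutes must be at least 60")
    rule = {
        "RuleName": name,
        "TargetBackupVaultName": vault,
        "ScheduleExpression": schedule,
        "Lifecycle": lifecycle,
    }
    if start_window_minutes is not None:
        rule["StartWindowMinutes"] = start_window_minutes
    return rule


# Plan document: a name plus its rules.
plan = {
    "BackupPlanName": "prod-daily-backup-plan",
    "Rules": [
        make_rule("DailyToUsEast1", "prod-primary-vault", "cron(0 2 * * ? *)",
                  {"MoveToColdStorageAfterDays": 30, "DeleteAfterDays": 365},
                  start_window_minutes=60),
        make_rule("WeeklyToUsEast1", "prod-primary-vault", "cron(0 3 ? * SUN *)",
                  {"DeleteAfterDays": 1825}),
    ],
}

# Selection document: kept separate from the plan (placeholder account ID).
selection = {
    "SelectionName": "all-tagged-resources",
    "IamRoleArn": "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole",
    "ListOfTags": [{"ConditionType": "STRINGEQUALS",
                    "ConditionKey": "Backup", "ConditionValue": "true"}],
}

print(json.dumps(plan, indent=2))
print(json.dumps(selection, indent=2))
```

Generating the documents from code rather than hand-editing JSON makes it harder to ship a stray comment or a too-short start window.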
04 / Backup Jobs

Backup Job Lifecycle

When a plan rule fires, AWS Backup creates a Job for each matching resource. Each job goes through these states:

stateDiagram-v2
  [*] --> CREATED : Schedule triggers
  CREATED --> PENDING : Resource being prepared
  PENDING --> RUNNING : Snapshot started
  RUNNING --> COMPLETED : Backup written to vault
  RUNNING --> PARTIAL : Completed with partial results
  RUNNING --> FAILED : Error occurred
  RUNNING --> ABORTING : Cancellation requested
  ABORTING --> ABORTED : Cancelled
  CREATED --> EXPIRED : Not started within StartWindow
  COMPLETED --> [*]
  PARTIAL --> [*]
  FAILED --> [*]
  ABORTED --> [*]
  EXPIRED --> [*]
  note right of COMPLETED
    Recovery Point now in Vault. Copy Job triggers if CopyActions defined.
  end note

STEP-BY-STEP FLOW

1

Schedule Fires

The rule's schedule fires at the cron time. AWS Backup runs this schedule itself (the expression uses the same cron syntax as EventBridge), then evaluates which resources match the plan's selection criteria (tags or ARNs).

2

Job Created → PENDING

One Backup Job per resource is created. The job enters PENDING while AWS Backup coordinates with the service (e.g. creates an EBS snapshot or RDS snapshot).

3

Job RUNNING — data transfer

The actual backup data is written to the vault. For EBS this is a snapshot. For EFS/DynamoDB it uses AWS Backup's native transfer. Progress is trackable via DescribeBackupJob (aws backup describe-backup-job).

4

Recovery Point Created

On COMPLETED, a recovery point ARN is generated and stored in the vault. Metadata (creation time, resource type, encryption) is attached.

5

Copy Job Triggered (if configured)

If the rule has CopyActions, a Copy Job is automatically spawned to replicate the recovery point to the target vault.
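
The state diagram and steps above can be captured as a small transition table, which is handy when validating job histories in tooling or tests. This is a sketch that mirrors the diagram in this guide only; `TRANSITIONS`, `is_terminal`, and `valid_history` are names invented here, not part of any AWS SDK.

```python
# Allowed transitions, copied from the state diagram in this guide.
TRANSITIONS = {
    "CREATED": {"PENDING", "EXPIRED"},
    "PENDING": {"RUNNING"},
    "RUNNING": {"COMPLETED", "PARTIAL", "FAILED", "ABORTING"},
    "ABORTING": {"ABORTED"},
}

# States with no outgoing transition.
TERMINAL = {"COMPLETED", "PARTIAL", "FAILED", "ABORTED", "EXPIRED"}


def is_terminal(state: str) -> bool:
    """True once a job can no longer change state."""
    return state in TERMINAL


def valid_history(states) -> bool:
    """True if every consecutive pair of states is an allowed transition."""
    return all(b in TRANSITIONS.get(a, set()) for a, b in zip(states, states[1:]))


print(valid_history(["CREATED", "PENDING", "RUNNING", "COMPLETED"]))
print(valid_history(["CREATED", "RUNNING"]))
print(is_terminal("EXPIRED"))
```

A monitoring script could use `is_terminal` to decide when to stop polling a job.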

05 / Copy Jobs

Copy Job — Cross-Region & Cross-Account

Copy Jobs replicate recovery points between vaults. They are the backbone of multi-region DR strategies and compliance isolation.

flowchart LR
  subgraph SOURCE["Source: us-east-1, Account A"]
    VAULT_SRC["prod-primary-vault Recovery Point: arn:...rp/abc123"]
  end
  subgraph DEST1["Same-Region Vault"]
    VAULT_SAME["prod-compliance-vault Copied Recovery Point locked / WORM"]
  end
  subgraph DEST2["DR Region: eu-west-1, Account B"]
    VAULT_DR["dr-vault Copied Recovery Point 90-day retention"]
  end
  VAULT_SRC -->|"Copy Job 1: same-region"| VAULT_SAME
  VAULT_SRC -->|"Copy Job 2: cross-region / cross-account"| VAULT_DR
  style SOURCE fill:#111827,stroke:#3b82f6
  style DEST1 fill:#111827,stroke:#10b981
  style DEST2 fill:#111827,stroke:#8b5cf6

COPY JOB REQUIREMENTS

Scenario | Requirement | Notes
--- | --- | ---
Same-region copy | Source & dest vault in same region | Good for compliance vault isolation
Cross-region copy | IAM role with cross-region permissions | Both regions must be enabled in account
Cross-account copy | Dest vault must add source account to access policy | Both accounts must belong to the same AWS Organizations organization, with cross-account backup enabled
Cross-account + cross-region | Both org policies and vault access policies updated | Recommended for DR isolation

// Destination vault access policy (allows Account A to copy in)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect":    "Allow",
    "Principal": { "AWS": "arn:aws:iam::SOURCE_ACCOUNT_ID:root" },
    "Action": [
      "backup:CopyIntoBackupVault"
    ],
    "Resource":  "*"
  }]
}
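
If you manage several source accounts, it can help to generate this policy rather than hand-edit it. The following is a minimal sketch under that assumption; `copy_into_vault_policy` is a hypothetical helper and the account ID is a placeholder.

```python
import json


def copy_into_vault_policy(source_account_id: str) -> dict:
    """Build a vault access policy letting one account copy into the vault."""
    if not (source_account_id.isascii() and source_account_id.isdigit()
            and len(source_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{source_account_id}:root"},
            "Action": ["backup:CopyIntoBackupVault"],
            "Resource": "*",
        }],
    }


# Placeholder account ID for illustration.
print(json.dumps(copy_into_vault_policy("123456789012"), indent=2))
```

The resulting JSON string is what you would attach to the destination vault as its access policy.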
  
06 / Restore Jobs

Restore Job — Recovering Resources

A Restore Job recreates an AWS resource from a recovery point stored in a vault. Think of it like pulling a saved game — AWS rebuilds the resource from that snapshot. For most services, it creates a brand-new resource and never touches the original. (Exceptions: S3 can restore to an existing bucket; EFS supports item-level restore into an existing file system.)

Do I need to spin up a new database?

Yes, for RDS. AWS Backup does not restore an RDS database in place. When you restore one, you get:
• A new DB instance with a brand-new hostname (e.g. prod-db-restored.abc123.us-east-1.rds.amazonaws.com)
• A new ARN and resource ID
• Your original database is left completely untouched

After the restore completes, you must manually redirect your app to the new database — by updating your connection string, Secrets Manager secret, or environment variables. This is by design: AWS protects you from accidentally overwriting a healthy database.

flowchart TD
  A["Operator / Automation"] --> B["1. List Recovery Points aws backup list-recovery-points-by-backup-vault"]
  B --> C["2. Choose a Recovery Point ARN e.g. from last night 02:30 UTC"]
  C --> D["3. Get required restore metadata aws backup get-recovery-point-restore-metadata"]
  D --> E["4. Start Restore Job aws backup start-restore-job"]
  E --> F{Job Status}
  F -->|"RUNNING 15-30 min"| G["AWS provisions new RDS instance..."]
  G --> F
  F -->|"COMPLETED"| H["New DB available new-prod-db.xyz.rds.amazonaws.com"]
  F -->|"FAILED"| I["Check CloudWatch Logs + DescribeRestoreJob"]
  H --> J["5. Validate data run queries / smoke tests"]
  J --> K{OK?}
  K -->|Yes| L["6. Cut over traffic Update Secrets Manager or Route 53 CNAME"]
  K -->|No| M["Try earlier recovery point"]
  L --> N["7. Delete old instance or keep for rollback"]
  style A fill:#1e2d45,stroke:#3b82f6,color:#93c5fd
  style H fill:#1a2e1a,stroke:#10b981,color:#6ee7b7
  style I fill:#2e1a1a,stroke:#ef4444,color:#fca5a5
  style L fill:#1a2e1a,stroke:#10b981,color:#6ee7b7
  style N fill:#1e1a2e,stroke:#8b5cf6,color:#c4b5fd

Service | What's restored | New resource? | Cutover needed?
--- | --- | --- | ---
RDS | New DB instance from snapshot | New endpoint + ARN | Yes: update connection string
EC2 (EBS) | New EC2 instance from AMI created from snapshot | New Instance ID + Volume IDs | Yes: update target groups / DNS
EFS | New EFS file system | New FS ID + DNS | Yes: remount or update mount target
DynamoDB | New table (point-in-time) | New table name | Yes: update app table reference
S3 | Objects to same or different bucket | Same or new bucket | Only if new bucket name
Aurora | New Aurora cluster | New cluster ARN + endpoint | Yes: update connection string

CODE SAMPLES

The walkthrough below shows the restore commands for RDS.

What happens when you run this?
AWS creates a completely new RDS instance with a new hostname like prod-mysql-restored-20240117.abc.us-east-1.rds.amazonaws.com.
Your original database keeps running untouched. Your app still talks to the old one until you change the connection string.
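Prerequisites: AWS CLI installed and configured (aws configure). The calling identity needs permission for the backup and rds actions used below (backup:* and rds:* is the broad version; scope it down for production), plus iam:PassRole on the role passed to the restore job.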
# ─────────────────────────────────────────────────────────────────────
# STEP 1 — Find available recovery points (backups) in your vault
#   This lists all RDS backups. Look at "Created" to find the one
#   from the date/time you want to restore from.
# ─────────────────────────────────────────────────────────────────────
aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name "prod-primary-vault" \
  --by-resource-type "RDS" \
  --query 'RecoveryPoints[*].{ARN:RecoveryPointArn,Created:CreationDate,Status:Status}' \
  --output table

# Example output:
# -----------------------------------------------------------------------
# |            ListRecoveryPointsByBackupVault                          |
# +------------------------------+-----------+---------------------------+
# | ARN                          | Created   | Status                    |
# +------------------------------+-----------+---------------------------+
# | arn:aws:rds:...:awsbackup-.. | 2024-01-17| COMPLETED                 |
# | arn:aws:rds:...:awsbackup-.. | 2024-01-16| COMPLETED                 |
# +------------------------------+-----------+---------------------------+
#   ↑ Copy the ARN of the backup you want to restore from

# ─────────────────────────────────────────────────────────────────────
# STEP 2 — Ask AWS what parameters are needed to restore this backup.
#   AWS Backup returns a JSON object with all the config of the
#   original DB (instance class, engine, subnet group, etc.)
#   You'll use this in Step 3 — just change the DB name.
# ─────────────────────────────────────────────────────────────────────
aws backup get-recovery-point-restore-metadata \
  --backup-vault-name "prod-primary-vault" \
  --recovery-point-arn "arn:aws:rds:us-east-1:123456789012:snapshot:awsbackup-2024-01-17-02-30" \
  --query 'RestoreMetadata'

# The RestoreMetadata map in the response looks something like:
# {
#   "DBInstanceIdentifier": "prod-mysql",      ← original DB name
#   "DBInstanceClass":      "db.t3.medium",    ← instance size
#   "Engine":               "mysql",           ← database engine
#   "MultiAZ":              "false",           ← high-availability setting
#   "DBSubnetGroupName":    "prod-subnet-group",
#   "VpcSecurityGroupIds":  "sg-0abc123"
# }
#   ↑ Copy this output. You'll paste it into Step 3, changing only
#     DBInstanceIdentifier to a new unique name.

# ─────────────────────────────────────────────────────────────────────
# STEP 3 — Start the Restore Job.
#   IMPORTANT: Change "DBInstanceIdentifier" to a NEW name.
#   If you use the same name as the original, it will FAIL because
#   a DB with that name already exists.
# ─────────────────────────────────────────────────────────────────────
aws backup start-restore-job \
  --recovery-point-arn "arn:aws:rds:us-east-1:123456789012:snapshot:awsbackup-2024-01-17-02-30" \
  --iam-role-arn "arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole" \
  --resource-type "RDS" \
  --metadata '{
    "DBInstanceIdentifier": "prod-mysql-restored-20240117",
    "DBInstanceClass":      "db.t3.medium",
    "Engine":               "mysql",
    "MultiAZ":              "false",
    "DBSubnetGroupName":    "prod-subnet-group",
    "VpcSecurityGroupIds":  "sg-0abc123"
  }'

# Returns: { "RestoreJobId": "ABCDEF123456" }
#   ↑ Save this ID — you need it to check progress in Step 4

# ─────────────────────────────────────────────────────────────────────
# STEP 4 — Monitor restore progress (takes 15-30 min for most DBs)
#   Run this every few minutes. Status goes:
#   PENDING → RUNNING → COMPLETED (or FAILED)
# ─────────────────────────────────────────────────────────────────────
aws backup describe-restore-job \
  --restore-job-id "ABCDEF123456"

# When COMPLETED, you'll see:
# { "Status": "COMPLETED", "CreatedResourceArn": "arn:aws:rds:...:db:prod-mysql-restored-20240117" }

# ─────────────────────────────────────────────────────────────────────
# STEP 5 — Get the hostname of the new database
#   This is the address your app needs to connect to.
# ─────────────────────────────────────────────────────────────────────
aws rds describe-db-instances \
  --db-instance-identifier "prod-mysql-restored-20240117" \
  --query 'DBInstances[0].Endpoint.Address'

# Output: "prod-mysql-restored-20240117.abc123.us-east-1.rds.amazonaws.com"
#   ↑ This is your new database hostname

# ─────────────────────────────────────────────────────────────────────
# STEP 6 — Update Secrets Manager so your app picks up the new host
#   (If you store DB credentials in Secrets Manager — recommended)
#   After this, restart your app containers/servers to reconnect.
# ─────────────────────────────────────────────────────────────────────
aws secretsmanager update-secret \
  --secret-id "prod/db/connection" \
  --secret-string '{"host":"prod-mysql-restored-20240117.abc123.us-east-1.rds.amazonaws.com","port":3306,"username":"admin","password":"your-password"}'
After step 6: Restart your application servers so they re-read the secret and connect to the new database. The old database is still running — you can delete it once you've confirmed everything works.
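
Steps 3 and 4 are the parts most worth scripting: reusing the original metadata with only the identifier changed, and polling until the job reaches a final state. The sketch below assumes the caller supplies a `describe` function (for example, a thin wrapper around `aws backup describe-restore-job`); `build_restore_metadata` and `wait_for_restore` are hypothetical helpers, and the demo uses a fake `describe` so it runs without AWS access.

```python
import time


def build_restore_metadata(original: dict, new_identifier: str) -> dict:
    """Copy the original metadata, changing only DBInstanceIdentifier."""
    if new_identifier == original.get("DBInstanceIdentifier"):
        raise ValueError("restore target must use a new DBInstanceIdentifier")
    restored = dict(original)
    restored["DBInstanceIdentifier"] = new_identifier
    return restored


def wait_for_restore(describe, job_id, poll_seconds=60, max_polls=60,
                     sleep=time.sleep):
    """Call describe(job_id) until the job reaches a final status."""
    for _ in range(max_polls):
        job = describe(job_id)
        if job["Status"] in ("COMPLETED", "FAILED", "ABORTED"):
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"restore job {job_id} still running after {max_polls} polls")


# Demo with canned responses instead of real API calls.
original = {"DBInstanceIdentifier": "prod-mysql", "DBInstanceClass": "db.t3.medium"}
print(build_restore_metadata(original, "prod-mysql-restored-20240117"))

canned = iter(["PENDING", "RUNNING", "COMPLETED"])
result = wait_for_restore(lambda job_id: {"Status": next(canned)},
                          "ABCDEF123456", sleep=lambda seconds: None)
print(result["Status"])
```

Injecting `describe` and `sleep` keeps the polling logic testable; in real use you would leave `sleep` at its default.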
07 / Coordination

End-to-End Coordination

Here is how all components interact in a complete backup + DR + restore scenario, from schedule fire to successful restore.

sequenceDiagram
  participant SCHED as Backup Plan Schedule
  participant BACKUP as AWS Backup Service
  participant RESOURCE as Resource (RDS)
  participant VAULT_P as Primary Vault
  participant VAULT_DR as DR Vault (eu-west-1)
  participant OPS as Operator
  Note over SCHED,BACKUP: Backup Plan triggers at cron(0 2 * * ? *)
  SCHED->>BACKUP: Rule fires, create Backup Jobs
  BACKUP->>RESOURCE: Request snapshot / backup
  RESOURCE-->>VAULT_P: Stream backup data
  BACKUP-->>BACKUP: Backup Job: RUNNING
  RESOURCE-->>BACKUP: Snapshot complete
  BACKUP->>VAULT_P: Store Recovery Point (RP-001)
  BACKUP-->>BACKUP: Backup Job: COMPLETED
  Note over BACKUP,VAULT_DR: CopyAction defined in rule, auto-spawn Copy Job
  BACKUP->>VAULT_P: Read RP-001
  BACKUP->>VAULT_DR: Write copy of RP-001 (cross-region)
  BACKUP-->>BACKUP: Copy Job: COMPLETED
  Note over OPS,VAULT_DR: Incident: Production DB corrupted
  OPS->>VAULT_DR: Browse recovery points
  VAULT_DR-->>OPS: List: [RP-001, RP-002, RP-003...]
  OPS->>BACKUP: Start Restore Job from RP-001
  BACKUP->>VAULT_DR: Retrieve recovery point data
  BACKUP->>RESOURCE: Provision new RDS instance
  BACKUP-->>OPS: Restore Job: COMPLETED (new-rds-arn)
  OPS->>OPS: Validate, then update DNS / app config

08 / Real-World Example

Complete Example: 3-Tier Web App

A production web app with EC2, RDS, and EFS — backed up daily with cross-region DR copies. Here's the full setup and what happens each night:

flowchart TB
  subgraph APP["Production App Stack (us-east-1)"]
    EC2["EC2 Auto Scaling Tag: Backup=true"]
    RDS["RDS MySQL Tag: Backup=true"]
    EFS["EFS Tag: Backup=true"]
  end
  subgraph PLAN_BOX["prod-backup-plan"]
    RULE1["Rule: DailyBackup cron 02:00 UTC Retention: 30 days CopyTo eu-west-1"]
    RULE2["Rule: WeeklyBackup cron SUN 03:00 Retention: 5 years No copy"]
  end
  subgraph JOBS["Nightly Jobs (02:00 UTC)"]
    JOB1["Backup Job EC2 (EBS snapshot)"]
    JOB2["Backup Job RDS snapshot"]
    JOB3["Backup Job EFS backup"]
  end
  subgraph VAULT1["Primary Vault (us-east-1)"]
    RP_EC2["RP: EC2 daily"]
    RP_RDS["RP: RDS daily"]
    RP_EFS["RP: EFS daily"]
  end
  subgraph COPY_JOBS["Copy Jobs (auto-triggered)"]
    CJ1["Copy: EC2 RP to eu-west-1"]
    CJ2["Copy: RDS RP to eu-west-1"]
    CJ3["Copy: EFS RP to eu-west-1"]
  end
  subgraph VAULT_DR["DR Vault (eu-west-1)"]
    DR_EC2["Copy: EC2 RP"]
    DR_RDS["Copy: RDS RP"]
    DR_EFS["Copy: EFS RP"]
  end
  PLAN_BOX --> APP
  APP --> JOBS
  JOB1 --> RP_EC2
  JOB2 --> RP_RDS
  JOB3 --> RP_EFS
  RP_EC2 --> CJ1
  RP_RDS --> CJ2
  RP_EFS --> CJ3
  CJ1 --> DR_EC2
  CJ2 --> DR_RDS
  CJ3 --> DR_EFS
  style APP fill:#111827,stroke:#3b82f6
  style PLAN_BOX fill:#111827,stroke:#f59e0b
  style JOBS fill:#111827,stroke:#10b981
  style VAULT1 fill:#111827,stroke:#3b82f6
  style COPY_JOBS fill:#111827,stroke:#8b5cf6
  style VAULT_DR fill:#111827,stroke:#ef4444

NIGHTLY TIMELINE

02:00 UTC — Schedule fires CREATED

The DailyBackup rule's schedule fires. AWS Backup evaluates all resources tagged Backup=true and finds EC2, RDS, and EFS. It creates 3 Backup Jobs.

02:01 — Jobs start running RUNNING

Each job begins. RDS creates a native snapshot; EBS snapshot taken for EC2 volumes; EFS backup streamed to vault. These run in parallel.

02:30 — Backup jobs complete COMPLETED

3 recovery points now stored in prod-primary-vault in us-east-1. Retention lifecycle: warm for 30 days, then auto-deleted.

02:31 — Copy Jobs auto-spawn COPY RUNNING

Because CopyActions is defined, 3 Copy Jobs are automatically created. They stream the recovery points to dr-vault in eu-west-1 under the DR account.

03:15 — Copy Jobs complete COMPLETED

All 3 recovery points are now replicated to EU. DR account has 90-day retention. Primary backups remain independent in us-east-1.

09:00 (next day) — Incident: RDS corruption detected

Ops team decides to restore RDS from the 02:30 recovery point in the DR vault in eu-west-1.

09:05 — Restore Job started RESTORE RUNNING

Restore Job provisions a new RDS instance in eu-west-1 from the copied recovery point. Parameters: new DB identifier, same instance class, target VPC.

09:25 — Restore complete COMPLETED

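New RDS instance available at new endpoint. Ops validates data integrity, then updates application config / Route 53 to point to new DB. Data loss window: ~7 hours (everything written after the 02:00 backup). Recovery time: ~20 minutes for the restore job itself, plus validation and cutover.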
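
The recovery figures for this timeline are simple timestamp arithmetic. The sketch below works them out from the times in the timeline, assuming the data-loss window is measured from the start of the last backup to detection of the incident, and recovery time from detection to the restored instance being available. `rpo_rto` and `at` are names invented for this example, and the calendar date is arbitrary.

```python
from datetime import datetime


def rpo_rto(last_backup_start, incident_detected, service_restored):
    """Return (data-loss window, recovery time) as timedeltas."""
    data_loss = incident_detected - last_backup_start
    recovery = service_restored - incident_detected
    return data_loss, recovery


def at(hhmm: str) -> datetime:
    """Helper: a time of day on an arbitrary example date."""
    return datetime.strptime("2024-01-17 " + hhmm, "%Y-%m-%d %H:%M")


# Times taken from the nightly timeline above.
data_loss, recovery = rpo_rto(at("02:00"), at("09:00"), at("09:25"))
print("data-loss window:", data_loss)
print("recovery time:", recovery)
```

Measured from detection, recovery takes slightly longer than the restore job alone, and validation plus cutover would add to it further.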

09 / Supported Services

What Can AWS Backup Protect?

AWS Backup supports a wide range of services. Not all features are available for every service — check the matrix below.

Category | Service | Continuous / PITR | Cold Storage | Cross-Region Copy | Cross-Account Copy
--- | --- | --- | --- | --- | ---
Compute | Amazon EC2 (incl. VSS-enabled Windows) | No | Yes | Yes | Yes
Block Storage | Amazon EBS | No | Yes | Yes | Yes
File Storage | Amazon EFS | No | Yes | Yes | Yes
File Storage | Amazon FSx (ONTAP: no cross-region/account) | No | No | Yes | Yes
Object Storage | Amazon S3 | Yes | No | Yes | Yes
Relational DB | Amazon RDS (all engines) | Yes | No | Yes | Yes
Relational DB | Amazon Aurora | Yes | No | Yes | Yes
NoSQL DB | Amazon DynamoDB (advanced features required) | Yes | Yes | Yes | Yes
Document DB | Amazon DocumentDB | No | No | Yes | Yes
Graph DB | Amazon Neptune | No | No | Yes | Yes
Data Warehouse | Amazon Redshift | No | No | No | No
Time Series | Amazon Timestream | No | Yes | Yes | Yes
Containers | Amazon EKS | No | No | Yes | Yes
Hybrid | AWS Storage Gateway (Volume) | No | No | Yes | Yes
Hybrid | VMware VMs (via Backup Gateway) | No | Yes | Yes | Yes
SAP | SAP HANA on EC2 | Yes | Yes | Yes | Yes
IaC | AWS CloudFormation (stacks) | No | No | Yes | Yes

Continuous Backup / PITR lets you restore to any point in time (to the second) within the retention window, which can be at most 35 days. Supported for RDS, Aurora, DynamoDB, S3, and SAP HANA.
10 / Security

Vault Lock, Encryption & Access Control

AWS Backup provides multiple layers of protection for your recovery points — from encryption and access policies to immutable WORM locks and legal holds.

ENCRYPTION

KMS Encryption

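Every vault is tied to an AWS KMS key. Recovery points are encrypted at rest and in transit. You can use the AWS managed key or your own customer managed key for full control. For resource types that AWS Backup does not fully manage (RDS and EBS, for example), the backup inherits the encryption of the source resource rather than the vault key.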

ACCESS POLICY

Vault Access Policies

Resource-based policies on each vault control who can create, copy, or delete recovery points. Use these to restrict cross-account access or prevent unauthorized deletion.

CLOUDTRAIL

Audit Trail

All AWS Backup API calls are logged to CloudTrail. Every backup, copy, restore, and deletion is recorded with who did it, when, and from where — critical for compliance audits.

VAULT LOCK (WORM PROTECTION)

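Vault Lock enforces a Write-Once, Read-Many (WORM) model on a vault. Once a vault is locked in compliance mode and the cooling-off period has passed, recovery points cannot be deleted before their retention period expires, not even by the root account. In governance mode, users with sufficient IAM permissions can still remove the lock.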

flowchart LR
  subgraph GOV["Governance Mode"]
    G1["Lock applied"]
    G2["Can be removed by privileged IAM users"]
    G3["Good for testing before compliance"]
    G1 --> G2 --> G3
  end
  subgraph COMP["Compliance Mode"]
    C1["Lock applied"]
    C2["Minimum 72-hour cooling-off period"]
    C3["After cooling-off: IMMUTABLE forever"]
    C4["Cannot be removed by anyone, incl. AWS"]
    C1 --> C2 --> C3 --> C4
  end
  style GOV fill:#111827,stroke:#f59e0b
  style COMP fill:#111827,stroke:#ef4444

Feature | Governance Mode | Compliance Mode
--- | --- | ---
Removable? | Yes, by users with sufficient IAM permissions | No, immutable after minimum 72-hour cooling-off period
Delete recovery points early? | No (unless lock is removed first) | No, never, by anyone
Change retention? | Only to extend (not shorten) | Only to extend (not shorten)
Regulatory compliance | Partial, good for internal policies | SEC 17a-4, CFTC, FINRA compliant
Use case | Test before committing to compliance | Production compliance vaults

LEGAL HOLD

A Legal Hold preserves specific recovery points from deletion regardless of their lifecycle or retention policies. Unlike Vault Lock (which protects all backups in a vault), Legal Hold is applied to individual recovery points. Use it when regulatory or legal proceedings require you to preserve specific backups indefinitely until the hold is released.

LOGICALLY AIR-GAPPED VAULTS

Logically air-gapped vaults provide enhanced ransomware resilience. They are isolated from source accounts, come locked in compliance mode, and can be paired with multi-party approval for critical recovery operations. Recovery points inside cannot be manually deleted. These vaults can be shared across accounts via AWS Resource Access Manager (RAM).

# ── Enable Vault Lock in Governance mode ──────────────────────────────
aws backup put-backup-vault-lock-configuration \
  --backup-vault-name "prod-compliance-vault" \
  --min-retention-days 30 \
  --max-retention-days 365

# ── Switch to Compliance mode (IRREVERSIBLE after 72h) ────────────────
aws backup put-backup-vault-lock-configuration \
  --backup-vault-name "prod-compliance-vault" \
  --min-retention-days 30 \
  --max-retention-days 365 \
  --changeable-for-days 3

# After 3 days, this lock becomes permanent and cannot be removed.
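
Two pieces of arithmetic follow from these commands: when the compliance lock stops being changeable, and whether a given retention period fits inside the lock's bounds. The sketch below illustrates both using the values from the commands above. `immutable_from` and `retention_within_lock` are hypothetical helpers, and the timestamp is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone


def immutable_from(applied_at: datetime, changeable_for_days: int) -> datetime:
    """Return the moment a compliance-mode lock can no longer be changed."""
    if changeable_for_days < 3:
        raise ValueError("the cooling-off period must be at least 3 days")
    return applied_at + timedelta(days=changeable_for_days)


def retention_within_lock(retention_days: int, min_days: int, max_days: int) -> bool:
    """True if a recovery point's retention fits the vault lock's bounds."""
    return min_days <= retention_days <= max_days


# Arbitrary example timestamp; bounds match the commands above.
applied = datetime(2024, 1, 17, 12, 0, tzinfo=timezone.utc)
print(immutable_from(applied, 3).isoformat())
print(retention_within_lock(90, 30, 365))
print(retention_within_lock(7, 30, 365))
```

A check like `retention_within_lock` is useful before pointing a plan rule at a locked vault, since a rule whose retention falls outside the bounds will not be able to store recovery points there.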
    
11 / Lifecycle

Lifecycle Management — Warm & Cold Storage

AWS Backup can automatically transition recovery points between storage tiers to optimize costs. Older backups that you rarely access can move to cold storage at a fraction of the price.

flowchart LR
  A["Backup Job completes"] --> B["Recovery Point created in WARM storage"]
  B -->|"After N days (MoveToColdStorageAfterDays)"| C["Recovery Point moved to COLD storage"]
  C -->|"After N days (DeleteAfterDays)"| D["Recovery Point DELETED"]
  style A fill:#1e2d45,stroke:#3b82f6,color:#93c5fd
  style B fill:#1a2235,stroke:#f59e0b,color:#fbbf24
  style C fill:#1a2235,stroke:#3b82f6,color:#60a5fa
  style D fill:#1a2235,stroke:#ef4444,color:#f87171

Tier | Cost | Retrieval | Minimum Duration
--- | --- | --- | ---
Warm Storage | Standard pricing per GB/month | Immediate, restore anytime | None
Cold Storage | Up to ~80% cheaper than warm | Slower retrieval, higher restore cost | 90 days minimum (charged even if deleted earlier)

Cold storage minimum: Recovery points in cold storage are charged for a minimum of 90 days, so AWS Backup requires DeleteAfterDays to be at least 90 days greater than MoveToColdStorageAfterDays. If you manually delete a recovery point from cold storage sooner, you still pay for the full 90 days. Plan your lifecycle rules accordingly.

SERVICES SUPPORTING COLD STORAGE

Not all services support cold storage transitions. Currently supported: EC2 (EBS), EFS, DynamoDB, Timestream, SAP HANA, and VMware VMs. Services like RDS, Aurora, S3, and Storage Gateway do not support cold storage — their backups remain in warm storage for the entire retention period.

// Example lifecycle in a backup rule
"Lifecycle": {
  "MoveToColdStorageAfterDays": 30,   // warm for 30 days
  "DeleteAfterDays":             365   // deleted after 1 year
}
// Cold storage duration: 365 - 30 = 335 days in cold
// Total retention: 365 days (30 warm + 335 cold)
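
The same arithmetic can be wrapped in a small check that also enforces the 90-day cold storage minimum described above. This is an illustrative sketch; `lifecycle_breakdown` is a name made up for this example.

```python
def lifecycle_breakdown(delete_after_days, cold_after_days=None):
    """Split total retention into warm and cold days.

    DeleteAfterDays counts from creation, so cold days are the difference
    between the two settings. Raises if cold storage would last < 90 days.
    """
    if cold_after_days is None:
        return {"warm_days": delete_after_days, "cold_days": 0}
    cold_days = delete_after_days - cold_after_days
    if cold_days < 90:
        raise ValueError(
            "DeleteAfterDays must be at least 90 days after the cold transition"
        )
    return {"warm_days": cold_after_days, "cold_days": cold_days}


print(lifecycle_breakdown(365, cold_after_days=30))
print(lifecycle_breakdown(30))
```

Calling `lifecycle_breakdown(100, cold_after_days=30)` raises, because the recovery point would spend only 70 days in cold storage.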
    
12 / Monitoring

Monitoring & Alerts

Backups are only useful if they actually succeed. AWS Backup integrates with EventBridge, CloudWatch, and SNS so you always know when something goes wrong.

EVENTBRIDGE

Amazon EventBridge

Real-time event stream for all backup state changes — job started, completed, failed, copy jobs, restore jobs. Route events to Lambda, SNS, SQS, or any EventBridge target.

CLOUDWATCH

CloudWatch Metrics & Alarms

AWS Backup emits metrics every 5 minutes. Set CloudWatch Alarms on backup job failure counts, restore durations, or recovery point creation rates to get alerted proactively.

SNS

SNS Notifications

Configure per-vault SNS notifications for backup job completed, failed, or expired events. Delivers alerts directly to email, Slack (via Lambda), PagerDuty, or any SNS subscriber.

EVENTBRIDGE RULE — ALERT ON BACKUP FAILURE

The most common monitoring setup: an EventBridge rule that triggers an SNS notification whenever a backup job fails.

# ── Create SNS topic for backup alerts ────────────────────────────────
aws sns create-topic --name "backup-failure-alerts"

aws sns subscribe \
  --topic-arn "arn:aws:sns:us-east-1:123456789012:backup-failure-alerts" \
  --protocol "email" \
  --notification-endpoint "ops-team@company.com"

# ── Create EventBridge rule to catch backup job failures ──────────────
aws events put-rule \
  --name "backup-job-failed" \
  --event-pattern '{
    "source": ["aws.backup"],
    "detail-type": ["Backup Job State Change"],
    "detail": {
      "state": ["FAILED", "ABORTED", "EXPIRED"]
    }
  }'

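# ── Connect the rule to the SNS topic ─────────────────────────────────
# The topic's access policy must allow events.amazonaws.com to call
# sns:Publish, otherwise EventBridge cannot deliver to it.
aws events put-targets \
  --rule "backup-job-failed" \
  --targets '[{
    "Id": "sns-target",
    "Arn": "arn:aws:sns:us-east-1:123456789012:backup-failure-alerts"
  }]'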
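
To see which events this pattern would catch, it can help to test it locally before deploying. The matcher below is a deliberately simplified sketch: it handles only the two constructs used in the pattern above (lists of exact values and nested objects), not the full EventBridge pattern language. `matches` is a hypothetical helper, and the sample events are trimmed illustrations rather than complete payloads.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Check an event against a pattern of exact-value lists and nested dicts."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: recurse into the matching sub-document.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must be one of the listed values.
            return False
    return True


pattern = {
    "source": ["aws.backup"],
    "detail-type": ["Backup Job State Change"],
    "detail": {"state": ["FAILED", "ABORTED", "EXPIRED"]},
}

failed = {"source": "aws.backup", "detail-type": "Backup Job State Change",
          "detail": {"state": "FAILED"}}
completed = {"source": "aws.backup", "detail-type": "Backup Job State Change",
             "detail": {"state": "COMPLETED"}}

print(matches(pattern, failed))
print(matches(pattern, completed))
```

Only the failed job matches, so successful backups would not trigger an alert.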
    

BACKUP AUDIT MANAGER

For compliance-heavy environments, Backup Audit Manager continuously monitors your backup activity against a set of controls. You define a framework (e.g., "all resources must have a backup plan", "recovery points must be encrypted", "backups must run at least daily") and AWS generates daily compliance reports to S3. Integrates with AWS Audit Manager for SOC, HIPAA, and PCI audits.

RESTORE TESTING

Backups you never test are backups you can't trust. AWS Backup Restore Testing lets you create automated restore testing plans that periodically restore recovery points and clean up afterward. You can add your own validation step, for example a Lambda function triggered by the restore job's completion event that runs a connectivity check or data integrity query and reports the outcome with PutRestoreValidationResult. This ensures your backups are actually restorable and tracks restore duration for SLA reporting.

AWS BACKUP ARCHITECTURE GUIDE · Generated with Claude · aws.amazon.com/backup