How Vaults, Plans, Jobs, Copy Jobs, and Restore Jobs are orchestrated — with diagrams and real-world examples.
AWS Backup can feel intimidating at first. Here's the mental model that makes everything click:
A Vault is your secure safe-deposit box. Your backups (called recovery points) sit inside it. You can have multiple vaults — one for prod, one for DR (disaster recovery), one for compliance.
A Backup Plan is like a recurring calendar event: "every night at 2 AM, back up everything tagged Backup=true and keep it for 30 days." You write the plan once; AWS does the work automatically.
Each time a backup runs, it creates a Recovery Point — a frozen copy of your resource at that exact moment. You can restore from any of these points later. Think of them like iPhone backups: you can roll back to yesterday's or last week's.
You'll see ARNs (Amazon Resource Names) everywhere in AWS. They're just unique identifiers for any resource, like arn:aws:rds:us-east-1:123456789:db:my-database. When the docs say "Recovery Point ARN", they mean the unique ID of a specific backup snapshot.
AWS has data centers worldwide (us-east-1 = N. Virginia, eu-west-1 = Ireland, etc.). "Cross-region copy" means sending a backup to a different geography so that if an entire region fails, you still have your data elsewhere.
AWS Backup needs permission to access your databases, EC2 instances, etc. An IAM Role grants those permissions. AWS provides a default one called AWSBackupDefaultServiceRole that works for most cases — just use that to start.
Putting it together: you tag your resources with Backup=true. You create a Backup Plan that says "back these up nightly". AWS Backup runs automatically, stores snapshots in a Vault, optionally copies them to another region for safety, and you can restore any snapshot to a brand-new resource whenever needed.
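The first two steps of that loop take just two CLI calls. A minimal sketch, assuming a placeholder instance ID (i-0abc123) and an illustrative vault name:

```shell
# Create a vault to hold recovery points (name is illustrative)
aws backup create-backup-vault \
  --backup-vault-name "prod-primary-vault"

# Tag a resource so a plan's tag-based selection will pick it up
# (i-0abc123 is a placeholder instance ID)
aws ec2 create-tags \
  --resources "i-0abc123" \
  --tags Key=Backup,Value=true
```

From here, a Backup Plan with a tag-based selection does the rest on schedule.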
AWS Backup is a fully managed service that centralizes and automates data protection across AWS services. Before diving into flows, understand each primitive.
An encrypted container that stores recovery points (backups). Each vault has an access policy and an optional vault lock (WORM). You can have multiple vaults per region/account.
A policy document that defines when to back up (schedule), how long to retain (lifecycle), and where to send copies (copy rules). Assigned to resources via tags or ARNs.
The actual execution: takes a snapshot or continuous backup of a resource and stores a recovery point in the target vault. Triggered by a plan rule or manually.
Copies an existing recovery point from one vault to another — either in the same region, a different region, or even a different AWS account. Useful for DR and compliance.
Recreates a resource from a recovery point stored in a vault. You specify the target configuration; AWS Backup handles provisioning the new resource.
The diagram below shows the top-level relationships between AWS Backup components and the protected resources.
A Backup Plan contains one or more rules, and is associated with resources via selections. Here's a real-world example plan JSON:
```json
// Example: Production Backup Plan
{
  "BackupPlanName": "prod-daily-backup-plan",
  "Rules": [
    {
      "RuleName": "DailyToUsEast1",
      "TargetBackupVaultName": "prod-primary-vault",
      "ScheduleExpression": "cron(0 2 * * ? *)",   // 2 AM UTC daily
      "StartWindowMinutes": 60,
      "CompletionWindowMinutes": 180,
      "Lifecycle": {
        "MoveToColdStorageAfterDays": 30,
        "DeleteAfterDays": 365
      },
      "CopyActions": [   // triggers a Copy Job after backup
        {
          "DestinationBackupVaultArn": "arn:aws:backup:eu-west-1:DR_ACCOUNT_ID:backup-vault:dr-vault",
          "Lifecycle": { "DeleteAfterDays": 90 }
        }
      ]
    },
    {
      "RuleName": "WeeklyToUsEast1",
      "TargetBackupVaultName": "prod-primary-vault",
      "ScheduleExpression": "cron(0 3 ? * SUN *)",  // Sunday 3 AM
      "Lifecycle": { "DeleteAfterDays": 1825 }      // 5 years
    }
  ],
  "Selections": [   // what gets backed up by this plan
    {
      "SelectionName": "all-tagged-resources",
      "IamRoleArn": "arn:aws:iam::ACCOUNT:role/service-role/AWSBackupDefaultServiceRole",
      "ListOfTags": [
        { "ConditionType": "STRINGEQUALS", "ConditionKey": "Backup", "ConditionValue": "true" }
      ]
    }
  ]
}
```
| Field | Purpose | Example |
|---|---|---|
| ScheduleExpression | Cron expression for when jobs fire | cron(0 2 * * ? *) = 2 AM UTC daily |
| StartWindowMinutes | Window during which job must start, or it becomes EXPIRED (min 60 min, default 8 hrs) | 60 = job must start within 1 hour |
| CompletionWindowMinutes | Time from scheduled start by which job must complete, or it is cancelled (default 7 days) | 180 = 3 hours max runtime |
| MoveToColdStorageAfterDays | Auto-transition to cold storage (cheaper) | 30 = after 30 days → cold tier |
| DeleteAfterDays | Auto-delete the recovery point | 365 = deleted after 1 year |
| CopyActions | Cross-region/account copy after backup completes | Copy to EU DR vault |
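Registering a plan like the example takes two separate calls, because selections are attached after the plan exists. A sketch, with illustrative file names (the selection JSON must be split out of the combined example document):

```shell
# Create the plan from a JSON file containing BackupPlanName + Rules
aws backup create-backup-plan \
  --backup-plan file://prod-daily-backup-plan.json

# Attach a selection (which resources get backed up) to the
# BackupPlanId returned by the previous call
aws backup create-backup-selection \
  --backup-plan-id "PLAN_ID_FROM_PREVIOUS_CALL" \
  --backup-selection file://all-tagged-resources.json
```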
When a plan rule fires, AWS Backup creates a Job for each matching resource. Each job goes through these states:
EventBridge rule triggers at the cron time. AWS Backup evaluates which resources match the plan's selection criteria (tags or ARNs).
One Backup Job per resource is created. The job enters PENDING while AWS Backup coordinates with the service (e.g. creates an EBS snapshot or RDS snapshot).
The actual backup data is written to the vault. For EBS this is a snapshot. For EFS/DynamoDB it uses AWS Backup's native transfer. Progress is trackable via the DescribeBackupJob API.
On COMPLETED, a recovery point ARN is generated and stored in the vault. Metadata (creation time, resource type, encryption) is attached.
If the rule has CopyActions, a Copy Job is automatically spawned to replicate the recovery point to the target vault.
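These job states can be watched from the CLI. A small sketch, assuming a job ID from your own account:

```shell
# List failed backup jobs (any resource type)
aws backup list-backup-jobs \
  --by-state FAILED \
  --query 'BackupJobs[*].{Id:BackupJobId,Resource:ResourceArn,State:State}' \
  --output table

# Drill into a single job for percent done and any error message
aws backup describe-backup-job \
  --backup-job-id "YOUR_BACKUP_JOB_ID"
```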
Copy Jobs replicate recovery points between vaults. They are the backbone of multi-region DR strategies and compliance isolation.
| Scenario | Requirement | Notes |
|---|---|---|
| Same-region copy | Source & dest vault in same region | Good for compliance vault isolation |
| Cross-region copy | IAM role with cross-region permissions | Both regions must be enabled in account |
| Cross-account copy | Dest vault must add source account to access policy | Use AWS Organizations for easier setup |
| Cross-account + cross-region | Both org policies and vault access policies updated | Recommended for DR isolation |
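Besides automatic CopyActions, a copy can also be started on demand. A sketch with placeholder ARNs and vault names:

```shell
# Copy an existing recovery point into the DR vault on demand
aws backup start-copy-job \
  --recovery-point-arn "RECOVERY_POINT_ARN" \
  --source-backup-vault-name "prod-primary-vault" \
  --destination-backup-vault-arn "arn:aws:backup:eu-west-1:DR_ACCOUNT_ID:backup-vault:dr-vault" \
  --iam-role-arn "arn:aws:iam::SOURCE_ACCOUNT_ID:role/service-role/AWSBackupDefaultServiceRole" \
  --lifecycle DeleteAfterDays=90
```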
```json
// Destination vault access policy (allows Account A to copy in)
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::SOURCE_ACCOUNT_ID:root" },
    "Action": [ "backup:CopyIntoBackupVault" ],
    "Resource": "*"
  }]
}
```
A Restore Job recreates an AWS resource from a recovery point stored in a vault. Think of it like pulling a saved game — AWS rebuilds the resource from that snapshot. For most services, it creates a brand-new resource and never touches the original. (Exceptions: S3 can restore to an existing bucket; EFS supports item-level restore into an existing file system.)
| Service | What's restored | New resource? | Cutover needed? |
|---|---|---|---|
| RDS / Aurora | New DB instance / cluster from snapshot | New endpoint + ARN | Yes — update connection string |
| EC2 (EBS) | New EC2 instance from AMI created from snapshot | New Instance ID + Volume IDs | Update target groups / DNS |
| EFS | New EFS file system | New FS ID + DNS | Yes — remount or update mount target |
| DynamoDB | New table (point-in-time) | New table name | Yes — update app table reference |
| S3 | Objects to same or different bucket | Same or new bucket | Only if new bucket name |
Prerequisites for the walkthrough below: the AWS CLI installed and configured (aws configure), and your IAM user must have backup:* and rds:* permissions.
```shell
# ─────────────────────────────────────────────────────────────────────
# STEP 1 — Find available recovery points (backups) in your vault.
#          This lists all RDS backups. Look at "Created" to find the
#          one from the date/time you want to restore from.
# ─────────────────────────────────────────────────────────────────────
aws backup list-recovery-points-by-backup-vault \
  --backup-vault-name "prod-primary-vault" \
  --by-resource-type "RDS" \
  --query 'RecoveryPoints[*].{ARN:RecoveryPointArn,Created:CreationDate,Status:Status}' \
  --output table

# Example output:
# +------------------------------+------------+------------+
# | ARN                          | Created    | Status     |
# +------------------------------+------------+------------+
# | arn:aws:rds:...:awsbackup-.. | 2024-01-17 | COMPLETED  |
# | arn:aws:rds:...:awsbackup-.. | 2024-01-16 | COMPLETED  |
# +------------------------------+------------+------------+
# ↑ Copy the ARN of the backup you want to restore from

# ─────────────────────────────────────────────────────────────────────
# STEP 2 — Ask AWS what parameters are needed to restore this backup.
#          AWS Backup returns a JSON object with all the config of the
#          original DB (instance class, engine, subnet group, etc.).
#          You'll use this in Step 3 — just change the DB name.
# ─────────────────────────────────────────────────────────────────────
aws backup get-recovery-point-restore-metadata \
  --backup-vault-name "prod-primary-vault" \
  --recovery-point-arn "arn:aws:rds:us-east-1:123456789:snapshot:awsbackup-2024-01-17-02-30"

# Returns something like:
# {
#   "DBInstanceIdentifier": "prod-mysql",       ← original DB name
#   "DBInstanceClass": "db.t3.medium",          ← instance size
#   "Engine": "mysql",                          ← database engine
#   "MultiAZ": "false",                         ← high-availability setting
#   "DBSubnetGroupName": "prod-subnet-group",
#   "VpcSecurityGroupIds": "sg-0abc123"
# }
# ↑ Copy this output. You'll paste it into Step 3, changing only
#   DBInstanceIdentifier to a new unique name.

# ─────────────────────────────────────────────────────────────────────
# STEP 3 — Start the Restore Job.
#          IMPORTANT: Change "DBInstanceIdentifier" to a NEW name.
#          If you use the same name as the original, it will FAIL
#          because a DB with that name already exists.
# ─────────────────────────────────────────────────────────────────────
aws backup start-restore-job \
  --recovery-point-arn "arn:aws:rds:us-east-1:123456789:snapshot:awsbackup-2024-01-17-02-30" \
  --iam-role-arn "arn:aws:iam::123456789:role/service-role/AWSBackupDefaultServiceRole" \
  --resource-type "RDS" \
  --metadata '{
    "DBInstanceIdentifier": "prod-mysql-restored-20240117",
    "DBInstanceClass": "db.t3.medium",
    "Engine": "mysql",
    "MultiAZ": "false",
    "DBSubnetGroupName": "prod-subnet-group",
    "VpcSecurityGroupIds": "sg-0abc123"
  }'

# Returns: { "RestoreJobId": "ABCDEF123456" }
# ↑ Save this ID — you need it to check progress in Step 4

# ─────────────────────────────────────────────────────────────────────
# STEP 4 — Monitor restore progress (takes 15-30 min for most DBs).
#          Run this every few minutes. Status goes:
#          PENDING → RUNNING → COMPLETED (or FAILED)
# ─────────────────────────────────────────────────────────────────────
aws backup describe-restore-job \
  --restore-job-id "ABCDEF123456"

# When COMPLETED, you'll see:
# { "Status": "COMPLETED",
#   "CreatedResourceArn": "arn:aws:rds:...:db:prod-mysql-restored-20240117" }

# ─────────────────────────────────────────────────────────────────────
# STEP 5 — Get the hostname of the new database.
#          This is the address your app needs to connect to.
# ─────────────────────────────────────────────────────────────────────
aws rds describe-db-instances \
  --db-instance-identifier "prod-mysql-restored-20240117" \
  --query 'DBInstances[0].Endpoint.Address'

# Output: "prod-mysql-restored-20240117.abc123.us-east-1.rds.amazonaws.com"
# ↑ This is your new database hostname

# ─────────────────────────────────────────────────────────────────────
# STEP 6 — Update Secrets Manager so your app picks up the new host
#          (if you store DB credentials in Secrets Manager — recommended).
#          After this, restart your app containers/servers to reconnect.
# ─────────────────────────────────────────────────────────────────────
aws secretsmanager update-secret \
  --secret-id "prod/db/connection" \
  --secret-string '{"host":"prod-mysql-restored-20240117.abc123.us-east-1.rds.amazonaws.com","port":3306,"username":"admin","password":"your-password"}'
```
Here is how all components interact in a complete backup + DR + restore scenario, from schedule fire to successful restore.
A production web app with EC2, RDS, and EFS — backed up daily with cross-region DR copies. Here's the full setup and what happens each night:
EventBridge triggers the DailyBackup rule. AWS Backup evaluates all resources tagged Backup=true — finds EC2, RDS, EFS. Creates 3 Backup Jobs.
Each job begins. RDS creates a native snapshot; EBS snapshot taken for EC2 volumes; EFS backup streamed to vault. These run in parallel.
3 recovery points now stored in prod-primary-vault in us-east-1.
Retention lifecycle: warm for 30 days, then transitioned to cold storage, and auto-deleted after 365 days.
Because CopyActions is defined, 3 Copy Jobs are automatically created. They stream the recovery points to dr-vault in eu-west-1 under the DR account.
All 3 recovery points are now replicated to EU. DR account has 90-day retention. Primary backups remain independent in us-east-1.
Ops team decides to restore RDS from the 02:30 recovery point in the DR vault in eu-west-1.
Restore Job provisions a new RDS instance in eu-west-1 from the copied recovery point. Parameters: new DB identifier, same instance class, target VPC.
New RDS instance available at new endpoint. Ops validates data integrity, then updates application config / Route 53 to point to new DB. RPO: ~7 hours. RTO: ~20 minutes.
AWS Backup supports a wide range of services. Not all features are available for every service — check the matrix below.
| Category | Service | Continuous / PITR | Cold Storage | Cross-Region Copy | Cross-Account Copy |
|---|---|---|---|---|---|
| Compute | Amazon EC2 (incl. VSS-enabled Windows) | No | Yes | Yes | Yes |
| Block Storage | Amazon EBS | No | Yes | Yes | Yes |
| File Storage | Amazon EFS | No | Yes | Yes | Yes |
| File Storage | Amazon FSx (ONTAP: no cross-region/account) | No | No | Yes | Yes |
| Object Storage | Amazon S3 | Yes | No | Yes | Yes |
| Relational DB | Amazon RDS (all engines) | Yes | No | Yes | Yes |
| Relational DB | Amazon Aurora | Yes | No | Yes | Yes |
| NoSQL DB | Amazon DynamoDB (advanced features required) | Yes | Yes | Yes | Yes |
| Document DB | Amazon DocumentDB | No | No | Yes | Yes |
| Graph DB | Amazon Neptune | No | No | Yes | Yes |
| Data Warehouse | Amazon Redshift | No | No | No | No |
| Time Series | Amazon Timestream | No | Yes | Yes | Yes |
| Containers | Amazon EKS | No | No | Yes | Yes |
| Hybrid | AWS Storage Gateway (Volume) | No | No | Yes | Yes |
| Hybrid | VMware VMs (via Backup Gateway) | No | Yes | Yes | Yes |
| SAP | SAP HANA on EC2 | Yes | Yes | Yes | Yes |
| IaC | AWS CloudFormation (stacks) | No | No | Yes | Yes |
AWS Backup provides multiple layers of protection for your recovery points — from encryption and access policies to immutable WORM locks and legal holds.
Every vault is tied to an AWS KMS key. Recovery points are encrypted at rest and in transit. You can use the AWS-managed key or your own customer-managed key (CMK) for full control.
Resource-based policies on each vault control who can create, copy, or delete recovery points. Use these to restrict cross-account access or prevent unauthorized deletion.
All AWS Backup API calls are logged to CloudTrail. Every backup, copy, restore, and deletion is recorded with who did it, when, and from where — critical for compliance audits.
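For example, recent AWS Backup API activity can be pulled from CloudTrail's 90-day event history with a query like this:

```shell
# List recent AWS Backup API calls recorded by CloudTrail
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventSource,AttributeValue=backup.amazonaws.com \
  --max-results 20 \
  --query 'Events[*].{Time:EventTime,Who:Username,What:EventName}' \
  --output table
```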
Vault Lock enforces a Write-Once, Read-Many (WORM) model on a vault. Once locked, recovery points cannot be deleted before their retention period expires — not even by the root account.
| Feature | Governance Mode | Compliance Mode |
|---|---|---|
| Removable? | Yes, by users with sufficient IAM permissions | No — immutable after minimum 72-hour cooling-off period |
| Delete recovery points early? | No (unless lock is removed first) | No — never, by anyone |
| Change retention? | Only to extend (not shorten) | Only to extend (not shorten) |
| Regulatory compliance | Partial — good for internal policies | SEC 17a-4, CFTC, FINRA compliant |
| Use case | Test before committing to compliance | Production compliance vaults |
A Legal Hold preserves specific recovery points from deletion regardless of their lifecycle or retention policies. Unlike Vault Lock (which protects all backups in a vault), Legal Hold is applied to individual recovery points. Use it when regulatory or legal proceedings require you to preserve specific backups indefinitely until the hold is released.
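A hold can be created from the CLI. A sketch, assuming the selection JSON shape shown below (verify with aws backup create-legal-hold help before use):

```shell
# Place a legal hold on recovery points created in a date range
# (selection shape is an assumption — confirm against the CLI help)
aws backup create-legal-hold \
  --title "litigation-2024-001" \
  --description "Preserve January backups pending legal review" \
  --recovery-point-selection '{
    "VaultNames": ["prod-primary-vault"],
    "DateRange": { "FromDate": "2024-01-01T00:00:00Z", "ToDate": "2024-01-31T23:59:59Z" }
  }'
```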
Air-gapped vaults provide enhanced ransomware resilience. They are isolated from source accounts, automatically locked in compliance mode, and require multi-party approval for critical recovery operations. Recovery points inside cannot be manually deleted. These vaults can be shared across accounts via AWS Resource Access Manager (RAM).
```shell
# ── Enable Vault Lock in Governance mode ──────────────────────────────
aws backup put-backup-vault-lock-configuration \
  --backup-vault-name "prod-compliance-vault" \
  --min-retention-days 30 \
  --max-retention-days 365

# ── Switch to Compliance mode (IRREVERSIBLE after 72h) ────────────────
aws backup put-backup-vault-lock-configuration \
  --backup-vault-name "prod-compliance-vault" \
  --min-retention-days 30 \
  --max-retention-days 365 \
  --changeable-for-days 3

# After 3 days, this lock becomes permanent and cannot be removed.
```
AWS Backup can automatically transition recovery points between storage tiers to optimize costs. Older backups that you rarely access can move to cold storage at a fraction of the price.
| Tier | Cost | Retrieval | Minimum Duration |
|---|---|---|---|
| Warm Storage | Standard pricing per GB/month | Immediate — restore anytime | None |
| Cold Storage | Up to ~80% cheaper than warm | Slower retrieval, higher restore cost | 90 days minimum (charged even if deleted earlier) |
Note: if DeleteAfterDays is set to less than 90 days after the cold transition, you still pay for the full 90 days. Plan your lifecycle rules accordingly.
Not all services support cold storage transitions. Currently supported: EC2 (EBS), EFS, DynamoDB, Timestream, SAP HANA, and VMware VMs. Services like RDS, Aurora, S3, and Storage Gateway do not support cold storage — their backups remain in warm storage for the entire retention period.
```json
// Example lifecycle in a backup rule
"Lifecycle": {
  "MoveToColdStorageAfterDays": 30,   // warm for 30 days
  "DeleteAfterDays": 365              // deleted after 1 year
}
// Cold storage duration: 365 - 30 = 335 days in cold
// Total retention: 365 days (30 warm + 335 cold)
```
Backups are only useful if they actually succeed. AWS Backup integrates with EventBridge, CloudWatch, and SNS so you always know when something goes wrong.
Real-time event stream for all backup state changes — job started, completed, failed, copy jobs, restore jobs. Route events to Lambda, SNS, SQS, or any EventBridge target.
AWS Backup emits metrics every 5 minutes. Set CloudWatch Alarms on backup job failure counts, restore durations, or recovery point creation rates to get alerted proactively.
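A minimal alarm sketch on the failure-count metric in the AWS/Backup namespace (the SNS topic ARN is a placeholder):

```shell
# Alarm when any backup job fails within a 24-hour window
aws cloudwatch put-metric-alarm \
  --alarm-name "backup-jobs-failed" \
  --namespace "AWS/Backup" \
  --metric-name "NumberOfBackupJobsFailed" \
  --statistic Sum \
  --period 86400 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions "arn:aws:sns:us-east-1:123456789:backup-failure-alerts"
```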
Configure per-vault SNS notifications for backup job completed, failed, or expired events. Delivers alerts directly to email, Slack (via Lambda), PagerDuty, or any SNS subscriber.
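Wiring one vault's events to an SNS topic looks roughly like this (topic ARN is a placeholder; event names follow the PutBackupVaultNotifications API):

```shell
# Send this vault's job events to an SNS topic
aws backup put-backup-vault-notifications \
  --backup-vault-name "prod-primary-vault" \
  --sns-topic-arn "arn:aws:sns:us-east-1:123456789:backup-failure-alerts" \
  --backup-vault-events BACKUP_JOB_COMPLETED BACKUP_JOB_FAILED COPY_JOB_FAILED
```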
The most common monitoring setup: an EventBridge rule that triggers an SNS notification whenever a backup job fails.
# ── Create SNS topic for backup alerts ──────────────────────────────── aws sns create-topic --name "backup-failure-alerts" aws sns subscribe \ --topic-arn "arn:aws:sns:us-east-1:123456789:backup-failure-alerts" \ --protocol "email" \ --notification-endpoint "ops-team@company.com" # ── Create EventBridge rule to catch backup job failures ────────────── aws events put-rule \ --name "backup-job-failed" \ --event-pattern '{ "source": ["aws.backup"], "detail-type": ["Backup Job State Change"], "detail": { "state": ["FAILED", "ABORTED", "EXPIRED"] } }' # ── Connect the rule to the SNS topic ───────────────────────────────── aws events put-targets \ --rule "backup-job-failed" \ --targets '[{ "Id": "sns-target", "Arn": "arn:aws:sns:us-east-1:123456789:backup-failure-alerts" }]'
For compliance-heavy environments, Backup Audit Manager continuously monitors your backup activity against a set of controls. You define a framework (e.g., "all resources must have a backup plan", "recovery points must be encrypted", "backups must run at least daily") and AWS generates daily compliance reports to S3. Integrates with AWS Audit Manager for SOC, HIPAA, and PCI audits.
Backups you never test are backups you can't trust. AWS Backup Restore Testing lets you create automated restore testing plans that periodically restore recovery points, run a validation Lambda function (e.g., connectivity check, data integrity query), and clean up afterward. This ensures your backups are actually restorable and tracks restore duration for SLA reporting.
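A hedged sketch of creating such a plan (the JSON shape is an assumption; verify with aws backup create-restore-testing-plan help before use):

```shell
# Weekly restore test: pick the latest recovery point and restore it
# (plan JSON shape assumed — confirm against the CLI help)
aws backup create-restore-testing-plan \
  --restore-testing-plan '{
    "RestoreTestingPlanName": "weekly-rds-restore-test",
    "ScheduleExpression": "cron(0 4 ? * MON *)",
    "RecoveryPointSelection": {
      "Algorithm": "LATEST_WITHIN_WINDOW",
      "IncludeVaults": ["*"],
      "RecoveryPointTypes": ["SNAPSHOT"]
    }
  }'
```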