AWS VPC CNI: Prefix Delegation Explained

What is the VPC CNI?

The AWS VPC CNI (Container Network Interface) assigns real VPC IP addresses to your Kubernetes pods. Unlike overlay-network CNIs, every pod gets an actual subnet IP — making pods first-class VPC citizens.

Every pod consumes a real subnet IP — your subnet size directly limits your total pod count.

What Actually Determines Your Subnet IP Usage

There is no single bottleneck. Four factors combine to determine whether your subnet can handle prefix delegation:

Instance type

Sets the ceiling — how many ENI slots exist, and therefore how many /28 blocks the node could hold. A t3.medium has 15 slots; an m5.4xlarge has 232. But this is just the hardware limit, not what gets allocated.

Actual pod count

Determines how many /28 blocks get actually allocated. The VPC CNI allocates on demand: ceil(pods / 16) + WARM_PREFIX_TARGET. A node with 30 pods uses 3 blocks, not 15.

Subnet size

Determines how many /28 blocks are available. A /24 has only 15 usable blocks. A /22 has 63. A /20 has 255. Smaller subnets hit the wall faster.

Nodes per subnet

Multiplies the demand. Each node in a subnet consumes its own /28 blocks. 5 nodes × 3 blocks each = 15 blocks — that's an entire /24 subnet. Spreading nodes across AZs helps.

The formula

blocks_needed = nodes_per_subnet × (ceil(actual_pods / 16) + WARM_PREFIX_TARGET)

fits? = blocks_needed ≤ usable /28 blocks in subnet

Example A: 2 nodes × 30 pods × /24 → 2 × (2+1) = 6 blocks ≤ 15 ✓

Example B: 2 nodes × 110 pods × /24 → 2 × (7+1) = 16 blocks > 15 ✗

Example C: 5 nodes × 30 pods × /22 → 5 × (2+1) = 15 blocks ≤ 63 ✓
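
The three examples above can be reproduced with a few lines of shell arithmetic. This is a minimal sketch, not part of the VPC CNI: the function names are made up for illustration, and usable_blocks applies this page's planning estimate (all aligned /28 blocks minus one).

```shell
# Usable /28 blocks in a subnet, using this page's planning estimate.
usable_blocks() {
    # $1 = subnet prefix length, e.g. 24 for a /24 (must be 28 or less)
    echo $(( (1 << (28 - $1)) - 1 ))
}

# blocks_needed = nodes_per_subnet * (ceil(pods / 16) + WARM_PREFIX_TARGET)
blocks_needed() {
    # $1 = nodes per subnet, $2 = pods per node, $3 = WARM_PREFIX_TARGET
    echo $(( $1 * ( ($2 + 15) / 16 + $3 ) ))
}

# Prints "fits" or "does not fit" for one scenario.
fits() {
    # $1 = nodes, $2 = pods per node, $3 = warm target, $4 = subnet prefix length
    if [ "$(blocks_needed "$1" "$2" "$3")" -le "$(usable_blocks "$4")" ]; then
        echo "fits"
    else
        echo "does not fit"
    fi
}

echo "A: $(blocks_needed 2 30 1) blocks, $(fits 2 30 1 24)"    # 6 blocks, fits
echo "B: $(blocks_needed 2 110 1) blocks, $(fits 2 110 1 24)"  # 16 blocks, does not fit
echo "C: $(blocks_needed 5 30 1) blocks, $(fits 5 30 1 22)"    # 15 blocks, fits
```

The ceiling is computed as (pods + 15) / 16 because shell arithmetic only does integer division.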

Concrete Example: 3 × t3.medium on /24

t3.medium has 3 ENIs with 6 IPs each (5 usable slots per ENI after reserving the primary). A /24 subnet has 256 IPs total, 251 usable, and 15 allocatable /28 blocks. With 3 AZs, each subnet holds 1 node. EKS default: max-pods=110, WARM_PREFIX_TARGET=1. Prefixes are dynamic: a node can start with fewer /28s, then grow toward ceil(pods/16)+warm (8 at 110 pods).

	Default Mode	Prefix Delegation
Max pods per node	17	110 (capped by EKS)
Total pods (3 nodes)	51	330
Subnet IPs consumed per node	18	131 (8 × 16 + 3 ENI IPs)
Subnet utilization (per AZ)	7% (18 / 251)	52% (131 / 251)
/28 blocks used (per subnet)	N/A	8 / 15 (53%)
Room for another node?	Yes, 12+ more easily	Not at full load: 7 blocks left, need 8

The /24 squeeze with prefix delegation

At full load (110 pods), one t3.medium needs 8 /28 blocks (128 IPs) from a /24 that has only 15 blocks. That is 53% of the subnet's blocks for a single node, so a second fully loaded node won't fit. Default mode uses just 18 IPs (7%).

The key insight: In default mode, 1 pod = 1 subnet IP (efficient, predictable). With prefix delegation, the CNI allocates /28 blocks (16 IPs each) — even a block serving 1 pod reserves all 16 IPs. In this t3.medium /24 example, you get ~6.5× more pod capacity but reserve ~7.3× more subnet IPs at max load. On small subnets (/24 or smaller), this trade-off can exhaust your address space fast.
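
The per-node figures in the table can be rechecked with integer arithmetic. This sketch takes the table's own assumptions as given (3 ENIs with 6 IPs each, 8 prefixes plus 3 ENI primary IPs, 251 usable addresses in a /24); the function names are illustrative only.

```shell
# Subnet IPs one node holds in default mode when every ENI is full.
default_mode_ips() {
    # $1 = ENIs, $2 = IPs per ENI
    echo $(( $1 * $2 ))
}

# Subnet IPs one node holds with prefix delegation.
prefix_mode_ips() {
    # $1 = /28 prefixes held, $2 = ENI primary IPs counted by the table
    echo $(( $1 * 16 + $2 ))
}

# Integer percentage, rounded down.
percent_of() {
    # $1 = IPs used, $2 = usable IPs in the subnet
    echo $(( $1 * 100 / $2 ))
}

d=$(default_mode_ips 3 6)
p=$(prefix_mode_ips 8 3)
echo "default: $d IPs, $(percent_of "$d" 251)% of a /24"   # 18 IPs, 7%
echo "prefix:  $p IPs, $(percent_of "$p" 251)% of a /24"   # 131 IPs, 52%
```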

Use Prefix Delegation When

• You need >17 pods per node (or more than your instance's default limit)

• Your subnets are /22 or larger (63+ usable /28 blocks)

• You run dense workloads — many small containers per node

• You can use a secondary CIDR (e.g. 100.64.0.0/16) for pod IPs

Don't Use Prefix Delegation When

• Your subnets are /24 or smaller: a /24 has only about 15 /28 blocks, which two fully loaded nodes can exhaust

• You run fewer than 20 pods per node — default mode is sufficient

• You value subnet IP efficiency over pod density

• You have many nodes sharing small subnets — fragmentation will bite

What is an Elastic Network Interface (ENI)?

An Elastic Network Interface (ENI) is a virtual network card that you attach to an EC2 instance. Think of it as a physical NIC (Network Interface Card) in a traditional server — but virtualized and managed by AWS. Each ENI gets its own private IP address, MAC address, security group memberships, and can optionally have a public IP or Elastic IP.

Every EC2 instance launches with at least one ENI (the primary ENI, eth0). You can attach additional ENIs up to the instance type's limit. Each ENI can also hold multiple secondary private IP addresses beyond its primary IP — this is the mechanism that makes Kubernetes pod networking possible on AWS.

Why does AWS use ENIs?

ENIs provide VPC-native networking. Each IP address on an ENI is a real, routable VPC IP. This means pods get first-class VPC citizenship — no overlay networks, no NAT, no encapsulation overhead. Pods can communicate directly with RDS, ElastiCache, and other VPC resources using native VPC routing.

How does it work?

The VPC CNI plugin (aws-node DaemonSet) runs on every node. It pre-allocates ENIs and secondary IPs from the subnet. When a pod is scheduled, the CNI assigns one of these pre-allocated IPs to the pod's network namespace using veth pairs and Linux routing rules.

Why is it useful?

Performance & simplicity. VPC-native IPs mean no packet encapsulation overhead (unlike flannel/Calico VXLAN). Security groups apply directly. VPC Flow Logs capture pod traffic. AWS load balancers can target pods directly via IP mode. Network policies map cleanly to VPC constructs.

ENI Lifecycle on a Kubernetes Node

1. Node boots

The primary ENI (eth0) is attached at launch. The VPC CNI plugin starts and allocates secondary IPs on this ENI for pods.

2. Warm pool fills

The CNI maintains a "warm pool" of available IPs. When the pool runs low, it attaches a new ENI or requests more secondary IPs/prefixes from the EC2 API.

3. Pod scheduled

kubelet calls the CNI plugin. It picks an IP from the warm pool, creates a veth pair, and configures routing so the pod can send/receive traffic using that VPC IP.

4. Pod terminates

The IP is returned to the warm pool for reuse. If enough IPs are free, the CNI may detach an ENI and release its IPs back to the subnet (controlled by WARM_ENI_TARGET).

Key limitation: The number of ENIs and IPs per ENI is fixed by the EC2 instance type. A t3.medium supports 3 ENIs with 6 IPs each, while an m5.xlarge supports 4 ENIs with 15 IPs each. This is a hard limit set by the EC2 hypervisor and cannot be changed. Prefix delegation works within these same slot limits but assigns /28 blocks (16 IPs) per slot instead of individual IPs.

How ENIs Work on EC2

Each EC2 instance has a limit on ENIs and secondary IPs per ENI. These limits determine max pod capacity and vary by instance type.

Diagram: a t3.medium node with three ENIs (ENI 0 / eth0 carrying the node IP, ENI 1 / eth1, ENI 2 / eth2), shown in both modes.

Default: Individual IPs

Each ENI: 1 primary IP + 5 secondary IPs for pods

3 ENIs × 5 = 15 pod IPs, plus 2 host-network pods that need no ENI IP

Total: 17 pods max

Prefix Delegation: /28 Blocks

Each ENI: 1 primary IP + 5 prefix slots

Each slot = 1 × /28 = 16 IPs

Total: 15 slots × 16 = 240 IPs, enough for 110+ pods

Instance Type Comparison

Instance	Max ENIs	IPs/ENI	Default Pods	Prefix Pods	/28s at full load
t3.small	3	4	11	110	8
t3.medium	3	6	17	110	8
m5.large	3	10	29	110	8
m6g.large	3	10	29	110	8
m5.xlarge	4	15	58	110	8
m5.2xlarge	4	15	58	110	8
c5.4xlarge	8	30	234	110	8

Key insight: With prefix delegation, this page assumes max-pods=110 on every instance type, the value EKS recommends for instances with fewer than 30 vCPUs. The VPC CNI only allocates ceil(pods/16) + WARM_PREFIX_TARGET /28 blocks, not all ENI slots. So at a full 110-pod load plus 1 warm prefix, every instance needs just 8 /28 blocks (128 IPs). Larger instances give you headroom to raise --max-pods up to 250.
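
The Default Pods column follows the usual EKS rule of thumb, ENIs × (IPs per ENI − 1) + 2. The sketch below recomputes it for a few rows of the table. The function names are illustrative, and the "+ 2" is commonly explained as the aws-node and kube-proxy pods, which use host networking and need no ENI IP.

```shell
# Default-mode pod limit: each ENI gives up its primary IP, plus 2 host-network pods.
default_max_pods() {
    # $1 = max ENIs, $2 = IPv4 addresses per ENI
    echo $(( $1 * ($2 - 1) + 2 ))
}

# Raw address capacity if every slot held a /28 prefix (before any max-pods cap).
prefix_slot_ips() {
    # $1 = max ENIs, $2 = IPv4 addresses per ENI
    echo $(( $1 * ($2 - 1) * 16 ))
}

# /28 blocks held at a given pod count, per this page's formula.
prefixes_at_load() {
    # $1 = pods, $2 = WARM_PREFIX_TARGET
    echo $(( ($1 + 15) / 16 + $2 ))
}

for spec in "t3.small 3 4" "t3.medium 3 6" "m5.large 3 10" "m5.xlarge 4 15" "c5.4xlarge 8 30"; do
    set -- $spec   # split "name enis ips" into $1 $2 $3
    echo "$1: default=$(default_max_pods "$2" "$3") slot_ips=$(prefix_slot_ips "$2" "$3") prefixes_at_110=$(prefixes_at_load 110 1)"
done
```

For the t3.medium row this prints default=17, slot_ips=240 and prefixes_at_110=8, matching the table.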

What is a /28 Block?

In CIDR notation, a /28 prefix means the first 28 bits of the IP address are fixed, leaving 4 bits for host addresses. That gives exactly 2⁴ = 16 IP addresses.

When prefix delegation is enabled, the VPC CNI plugin asks the EC2 API to assign entire /28 blocks to each ENI slot instead of individual secondary IPs. This is done via the AssignPrivateIpAddresses API with the Ipv4PrefixCount parameter. AWS requires these blocks to be contiguous and naturally aligned — the starting IP must fall on a 16-IP boundary.

Why /28 specifically?

AWS chose /28 as the prefix size because it balances granularity with efficiency. A /28 (16 IPs) is small enough to avoid massive waste on low-utilization nodes, but large enough to significantly boost pod density per ENI slot. Each slot that previously held 1 IP now holds 16.

Alignment requirement

A /28 block must start at an IP whose last octet is divisible by 16 (0, 16, 32, 48...). AWS cannot carve a /28 starting at an arbitrary IP. If free IPs exist but aren't aligned into a contiguous block of 16, allocation fails — this is the root cause of fragmentation.

Natural /28 boundaries in 10.0.1.0/24:

10.0.1.0–15    10.0.1.16–31    10.0.1.32–47    10.0.1.48–63
10.0.1.64–79    10.0.1.80–95    10.0.1.96–111    10.0.1.112–127
10.0.1.128–143    10.0.1.144–159    10.0.1.160–175    10.0.1.176–191
10.0.1.192–207    10.0.1.208–223    10.0.1.224–239    10.0.1.240–255

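16 blocks × 16 IPs = 256. AWS reserves 5 IPs in the subnet (.0, .1, .2, .3, and .255). Counted strictly, those fall in both the first and the last block, which leaves 14 blocks entirely free of reserved addresses; this page uses a slightly optimistic planning estimate of ~15 allocatable /28 blocks per /24.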
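
The boundaries of a /24 can also be generated rather than memorized. A small sketch, assuming the subnet is a /24 and using an illustrative function name:

```shell
# Print every naturally aligned /28 in a /24, given its first three octets.
list_slash28_blocks() {
    # $1 = network part of the /24, e.g. 10.0.1
    start=0
    while [ "$start" -lt 256 ]; do
        echo "$1.$start/28 ($1.$start to $1.$((start + 15)))"
        start=$((start + 16))
    done
}

list_slash28_blocks 10.0.1

# Number of aligned /28 blocks in a subnet of prefix length p is 2^(28 - p).
echo "aligned /28 blocks in a /24: $(( 1 << (28 - 24) ))"
```

The same shift gives 64 aligned blocks for a /22 and 256 for a /20, before subtracting any block that overlaps reserved addresses.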

AWS reserved IPs in every subnet

AWS reserves the first 4 and last 1 IP in every VPC subnet, regardless of size:

10.0.1.0 — Network address
10.0.1.1 — VPC router
10.0.1.2 — DNS server
10.0.1.3 — Reserved for future use
10.0.1.255 — Broadcast (not supported in VPC, but reserved)

Figure: 10.0.1.0/24 layout, one cell per IP. Legend: AWS Reserved; Block A (Node 1, ENI 1); Block B (Node 1, ENI 2); Block C (Node 2); Free.

/28 Alignment Math

A valid /28 start IP must have its last octet divisible by 16. This is called natural alignment — the block boundaries are fixed by the IP address space, not chosen by the user.

Formula: IP_last_octet mod 16 = 0 → valid /28 start
10.0.1.0 → 0 mod 16 = 0 ✓ valid
10.0.1.16 → 16 mod 16 = 0 ✓ valid
10.0.1.32 → 32 mod 16 = 0 ✓ valid
10.0.1.7 → 7 mod 16 = 7 ✗ NOT a valid /28 start
10.0.1.20 → 20 mod 16 = 4 ✗ NOT a valid /28 start

This means you cannot "shift" a /28 block to use arbitrary free IPs. If IPs 10.0.1.5–10.0.1.20 are free, that's 16 IPs — but they span two alignment boundaries (block 0 and block 1), so AWS cannot allocate a /28 from them. This constraint is what makes fragmentation so dangerous.
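
The alignment rule is easy to test mechanically. The sketch below checks single octets and then the 10.0.1.5–10.0.1.20 case from the paragraph above; it only models the arithmetic inside one /24, and the function names are illustrative.

```shell
# Succeeds (exit status 0) if a last octet is a valid /28 start.
is_slash28_start() {
    [ $(( $1 % 16 )) -eq 0 ]
}

# Given a free range of last octets [first, last] inside one /24, print the
# start of the first aligned /28 that fits entirely inside it, or "none".
first_aligned_block() {
    # Round $1 up to the next multiple of 16.
    start=$(( ($1 + 15) / 16 * 16 ))
    if [ $(( start + 15 )) -le "$2" ]; then
        echo "$start"
    else
        echo "none"
    fi
}

for octet in 0 16 7 20; do
    if is_slash28_start "$octet"; then
        echo "$octet: valid"
    else
        echo "$octet: not valid"
    fi
done

first_aligned_block 5 20    # 16 free IPs, but prints "none"
first_aligned_block 5 31    # prints 16
```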

How Prefix Delegation Differs from Default Mode

	Default mode	Prefix delegation
Each ENI slot holds	1 secondary IP	1 × /28 prefix (16 IPs)
Subnet IPs per slot	1	16
IP allocation granularity	Individual IPs	16-IP aligned blocks
Wasted IPs (1 pod on slot)	0	15
Fragmentation risk	Low	High
Pod density	Low (limited by ENI count)	High (16x more per slot)

AWS Documentation References

Increase the amount of available IP addresses for your Amazon EC2 nodes

Official EKS guide on enabling prefix delegation, configuration, and subnet sizing recommendations.

docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html

Amazon VPC CNI plugin for Kubernetes

Plugin configuration including ENABLE_PREFIX_DELEGATION, WARM_PREFIX_TARGET, WARM_IP_TARGET, and MINIMUM_IP_TARGET environment variables.

docs.aws.amazon.com/eks/latest/userguide/managing-vpc-cni.html

Elastic network interfaces (ENI) limits

Full table of ENI counts and IPv4 addresses per ENI for every EC2 instance type. Used to calculate max pod capacity.

docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI

AssignPrivateIpAddresses API

The EC2 API call the CNI uses to assign /28 prefixes. Documents the Ipv4PrefixCount parameter and alignment constraints.

docs.aws.amazon.com/AWSEC2/latest/APIReference/API_AssignPrivateIpAddresses.html

VPC subnet sizing

Details on AWS-reserved IPs in each subnet and CIDR block sizing considerations for VPCs.

docs.aws.amazon.com/vpc/latest/userguide/configure-subnets.html#subnet-sizing

What is Subnet Fragmentation?

Fragmentation happens when a subnet has enough total free IPs but they are scattered across non-contiguous blocks, making it impossible for AWS to allocate new /28 prefixes. This is similar to disk fragmentation — free space exists, but not in usable chunks.

Why it happens

When nodes join and leave a cluster over time, they allocate and release /28 blocks at different positions. Released blocks may not be adjacent to other free blocks, creating gaps too small for a new /28.

Why it's dangerous

New nodes fail to start because the CNI cannot allocate prefixes. Monitoring tools show "free IPs available" but pod scheduling fails with IP exhaustion errors, making the issue confusing to diagnose.

How to prevent it

Use larger subnets (/22 or bigger) so there are many more /28 boundaries available. The more blocks a subnet has, the less likely all aligned boundaries are consumed.

Example (simplified demo): A /24 subnet has 251 usable IPs and 15 allocatable /28 blocks. If 5 nodes each take 3 blocks, all 15 blocks are used (240 IPs). A 6th node requesting 3 blocks will fail even though 11 individual IPs (251 − 240) still count as "free."
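
The gap between "free IPs" and "free /28 blocks" can be made concrete with a small script. This is a sketch over a single /24: it takes the last octets that are already in use and reports both counts. The function name and the sample addresses are hypothetical, chosen so that exactly one address is taken in every otherwise empty block.

```shell
# Count free IPs and fully free aligned /28 blocks in a /24.
# Arguments: the last octets already in use (the caller includes AWS-reserved ones).
subnet_report() {
    used=" $* "
    free_ips=0
    free_blocks=0
    block=0
    while [ "$block" -lt 16 ]; do
        block_free=16
        ip=$(( block * 16 ))
        while [ "$ip" -lt $(( block * 16 + 16 )) ]; do
            case "$used" in
                *" $ip "*) block_free=$(( block_free - 1 )) ;;
            esac
            ip=$(( ip + 1 ))
        done
        free_ips=$(( free_ips + block_free ))
        if [ "$block_free" -eq 16 ]; then
            free_blocks=$(( free_blocks + 1 ))
        fi
        block=$(( block + 1 ))
    done
    echo "free_ips=$free_ips free_blocks=$free_blocks"
}

# Reserved addresses plus one stray IP in each remaining block (hypothetical).
subnet_report 0 1 2 3 255 20 40 52 70 85 100 120 130 150 165 180 200 210 230
# prints: free_ips=237 free_blocks=0
```

With 237 addresses free and zero free blocks, a subnet-level IP count looks healthy while every prefix request would still fail.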

Interactive Demo

The demo simulates a 10.0.1.0/24 subnet (256 IPs, 15 usable /28 blocks). For visualization, each node claims a fixed 3 × /28 blocks when it joins, so the subnet fills up after five nodes. Real clusters allocate prefixes dynamically based on pod demand.

Step-by-Step: What Happens When a Node Joins

kubelet starts on new EC2 node

The aws-node daemonset (VPC CNI) initializes. It reads the instance metadata to determine the ENI capacity and prefix delegation settings.

CNI requests /28 prefix assignments from the EC2 API

For each ENI, the CNI calls AssignPrivateIpAddresses with Ipv4PrefixCount=N. AWS searches the subnet for N aligned, contiguous 16-IP blocks. If it can't find them, the call fails.

AWS reserves all 16 IPs in each /28 block

Even if the node runs zero pods, all IPs in the allocated /28 blocks are marked as "in use" at the VPC level. No other ENI (on any instance) can use these IPs. This is the root cause of IP exhaustion with prefix delegation.

Pods schedule and get IPs from the warm pool

When a pod is scheduled, the CNI assigns it one of the pre-allocated IPs from the /28 prefix. This is fast (no AWS API call per pod). The WARM_PREFIX_TARGET setting controls how many extra prefixes to keep ready.

Fragmentation — new nodes fail to join

Over time, nodes join and leave. Released /28 blocks leave gaps. Even if the subnet has 50+ free IPs, if no aligned block of 16 contiguous IPs exists, the EC2 API returns InsufficientCidrBlocks. New nodes cannot start, and pending pods stay in ContainerCreating state.

How to Spot Fragmentation in Production

These are the common symptoms and commands to diagnose IP exhaustion caused by /28 fragmentation:

Symptom: Pods stuck in ContainerCreating

kubectl describe pod shows: failed to assign an IP address to container

kubectl get pods --field-selector=status.phase!=Running -A

Symptom: aws-node logs show allocation failure

The CNI daemonset logs will contain errors about prefix allocation:

kubectl logs -n kube-system -l k8s-app=aws-node --tail=50 | grep -i "failed\|error\|prefix"

Diagnosis: Check subnet available IPs

If AvailableIpAddressCount looks healthy but nodes can't allocate, it's fragmentation — free IPs exist but not in aligned /28 blocks:

aws ec2 describe-subnets --subnet-ids <id> \
--query 'Subnets[].[SubnetId,AvailableIpAddressCount,CidrBlock]'

Diagnosis: View prefix assignments on a node

See which /28 prefixes are currently assigned to a node's ENIs:

aws ec2 describe-network-interfaces \
--filters Name=attachment.instance-id,Values=<instance-id> \
--query 'NetworkInterfaces[].[Ipv4Prefixes,PrivateIpAddresses]'


Solutions & Mitigations

Expand your subnets (recommended)

Move to /22 or larger. A /22 contains 64 aligned /28 blocks (about 63 usable). Create new subnets and migrate node groups via rolling updates.

Use a secondary CIDR block

Attach a secondary CIDR (e.g. 100.64.0.0/16) to your VPC. Create subnets in this range purely for pod IPs, keeping your primary subnet free.

Disable prefix delegation

If you don't need 100+ pods/node, revert to individual IP assignment. Requires draining and re-adding nodes — not a hot-swap.

Use smaller instance types

Smaller instances typically run fewer pods, so each node requests fewer /28 blocks. Each block consumed by one node isn't available for others.

Tune WARM_PREFIX_TARGET

Lower the WARM_PREFIX_TARGET env variable on the CNI daemonset to cut wasted pre-allocated blocks on lightly-used nodes.

Useful kubectl and AWS CLI Commands

# Check if prefix delegation is enabled

kubectl describe daemonset aws-node -n kube-system | grep ENABLE_PREFIX_DELEGATION

# View available IPs in your subnet

aws ec2 describe-subnets --subnet-ids <subnet-id> --query 'Subnets[].AvailableIpAddressCount'

# Disable prefix delegation (drain and recycle nodes after!)

kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=false

# Check ENI assignments on a node

aws ec2 describe-network-interfaces --filters Name=attachment.instance-id,Values=<instance-id>