If you want to automatically delete old container images from your Elastic Container Registry, the solution is a quite simple ECR Lifecycle Rule that deletes images e.g. 7 days after they have been pushed to the registry.
If, however, you always want to keep the image tagged production, possibly because that is a floating tag always associated with the image currently deployed to production, the situation is suddenly not so simple any more. ECR does not provide a keep action in its lifecycle rules, only "expire". The logic here is that you're always expressing when to delete an image. If you never want to delete an image with a given tag, you'll have to express that as deleting it under a condition that will never occur. Unfortunately, there is no clear way to express the intent of "keep this image" via the lifecycle rules.
Sample lifecycle rule
Attention
This rule needs to be adapted to your setup and needs to be tested before it's activated. Activating lifecycle rules can instantly delete images from registries.
We assume the following for this example:
- Every image carries a tag such as history-20231225 that records when it was built.
- One image is also tagged production, since that's what's deployed.
- We want to delete images older than 10 days, except if they're deployed to production.
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep image tagged production",
      "selection": {
        "tagPrefixList": ["production"],
        "tagStatus": "tagged",
        "countType": "imageCountMoreThan",
        "countNumber": 9999
      },
      "action": {
        "type": "expire"
      }
    },
    {
      "rulePriority": 10,
      "description": "Expire tagged images older than 10 days",
      "selection": {
        "tagPrefixList": ["history"],
        "tagStatus": "tagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 10
      },
      "action": {
        "type": "expire"
      }
    }
  ]
}
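If several floating tags need the same protection, the policy can be generated instead of written by hand. The following is a minimal Python sketch, not an official AWS tool: the helper names keep_rule, expire_after_days_rule and build_policy are made up for this illustration, and the threshold of 9999 is simply carried over from the sample above. It only produces the policy text; it does not talk to AWS.

```python
import json


def keep_rule(tag_prefix, priority, unreachable_count=9999):
    """Build an "expire" rule that in practice keeps images with tag_prefix.

    The rule only expires images once more than `unreachable_count` images
    match the prefix, which is assumed never to happen for a floating tag.
    """
    return {
        "rulePriority": priority,
        "description": f"Keep image tagged {tag_prefix}",
        "selection": {
            "tagStatus": "tagged",
            "tagPrefixList": [tag_prefix],
            "countType": "imageCountMoreThan",
            "countNumber": unreachable_count,
        },
        "action": {"type": "expire"},
    }


def expire_after_days_rule(tag_prefix, priority, days):
    """Build a rule that expires images with tag_prefix pushed more than `days` days ago."""
    return {
        "rulePriority": priority,
        "description": f"Expire tagged images older than {days} days",
        "selection": {
            "tagStatus": "tagged",
            "tagPrefixList": [tag_prefix],
            "countType": "sinceImagePushed",
            "countUnit": "days",
            "countNumber": days,
        },
        "action": {"type": "expire"},
    }


def build_policy(keep_prefixes, history_prefix="history", days=10):
    """Keep rules get the lowest rulePriority numbers so they are applied first."""
    rules = [
        keep_rule(prefix, priority)
        for priority, prefix in enumerate(keep_prefixes, start=1)
    ]
    rules.append(expire_after_days_rule(history_prefix, len(rules) + 1, days))
    return {"rules": rules}


policy_text = json.dumps(build_policy(["production"]), indent=2)
print(policy_text)
```

The generated text has the same shape as the hand-written policy, except that the expiry rule gets priority 2 instead of 10. As with the hand-written version, test it before activating it.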
Careful when testing
Testing these rules is a little confusing because of the wording in the AWS console. It's a good idea to test rules like these before applying them to production. When you enter these rules on the screen behind the Edit Test Rules button in the AWS console and test them, the results section has this headline:
Image matches for test lifecycle rules
However, what's listed here are the actions that would be taken, i.e. the production and staging images should not appear here since no actions are to be taken for images with these tags (staging needs its own keep rule analogous to the production one; the sample above only covers production). They did, however, match rules. It's just that no action is taken for them and, thus, they're not listed here.
So what you want to check for when testing is that the images tagged production and staging do not appear in the list.
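That check can also be scripted. The sketch below is a plain Python helper with the hypothetical name protected_images_in_preview. It assumes the preview results are available as a list of dicts that each carry an imageTags list, which is the shape boto3's ECR get_lifecycle_policy_preview call returns under previewResults. The sample data is invented for illustration.

```python
def protected_images_in_preview(preview_results, protected_tags):
    """Return the preview entries that carry at least one protected tag.

    An empty result means no protected image would be expired.
    """
    protected = set(protected_tags)
    return [
        entry
        for entry in preview_results
        if protected.intersection(entry.get("imageTags", []))
    ]


# Invented preview output: only an old history image is listed for expiry.
sample_preview = [
    {"imageTags": ["history-20231201"], "action": {"type": "EXPIRE"}},
]

offenders = protected_images_in_preview(sample_preview, ["production", "staging"])
if offenders:
    print("Protected images would be expired:", offenders)
else:
    print("No protected image is listed for expiry.")
```

The helper compares full tag names, not prefixes, so it checks exactly the floating tags you pass in.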
Why that works
An image that matches the tagging requirements of a rule cannot be expired by a rule with a lower priority.
-- AWS ECR docs on Lifecycle policy evaluation rules
This means that once a rule with a higher priority (a lower rulePriority number) matches an image's tags, a lower-priority rule can no longer expire it. In this case, the first rule matches the production tag and only expires images with this tag if more than 9999 images tagged with production are present. Keep in mind this is a prefix match, i.e. if you're using tags that start with production, this rule will start deleting images once 10000 matching images are present.
The later rule with time-based expiry will no longer affect the image tagged production.
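To make the interplay of the two rules concrete, here is a deliberately simplified Python model of the evaluation. It is not AWS's implementation: it only covers tagged images, prefix matching, and the two count types used above, and the names Image, matches and simulate as well as the rule dict layout are invented for this sketch. A rule "claims" every image whose tags it matches, and claimed images are invisible to rules with a higher rulePriority number.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Image:
    digest: str
    tags: tuple
    pushed: date


def matches(rule, image):
    # Every prefix of the rule must match at least one tag of the image.
    return all(
        any(tag.startswith(prefix) for tag in image.tags)
        for prefix in rule["prefixes"]
    )


def simulate(rules, images, today):
    """Return {digest: rulePriority} for the images the model would expire."""
    expired = {}
    claimed = set()
    for rule in sorted(rules, key=lambda r: r["priority"]):
        candidates = [
            img for img in images
            if img.digest not in claimed and matches(rule, img)
        ]
        # Matching alone shields an image from all later rules.
        claimed.update(img.digest for img in candidates)
        if rule["count_type"] == "imageCountMoreThan":
            newest_first = sorted(candidates, key=lambda i: i.pushed, reverse=True)
            to_expire = newest_first[rule["count_number"]:]
        else:  # "sinceImagePushed", counted in days
            cutoff = today - timedelta(days=rule["count_number"])
            to_expire = [img for img in candidates if img.pushed < cutoff]
        for img in to_expire:
            expired[img.digest] = rule["priority"]
    return expired


today = date(2024, 1, 10)
images = [
    Image("sha-a", ("history-20231225", "production"), date(2023, 12, 25)),
    Image("sha-b", ("history-20231220",), date(2023, 12, 20)),
    Image("sha-c", ("history-20240108",), date(2024, 1, 8)),
]
keep_production = {"priority": 1, "prefixes": ["production"],
                   "count_type": "imageCountMoreThan", "count_number": 9999}
expire_old = {"priority": 10, "prefixes": ["history"],
              "count_type": "sinceImagePushed", "count_number": 10}

with_keep = simulate([keep_production, expire_old], images, today)
without_keep = simulate([expire_old], images, today)
print(with_keep)     # {'sha-b': 10}
print(without_keep)  # {'sha-a': 10, 'sha-b': 10}
```

In this model the production image is 16 days old and would be expired by the time-based rule alone. With the keep rule in front, it is claimed by priority 1, never exceeds the count of 9999, and so survives.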
Not the most obvious way to design expiry rules.