How to protect container images with the production tag from ECR lifecycle rules

If you want to automatically delete old container images from your Elastic Container Registry, a fairly simple ECR lifecycle rule is enough: one that expires images, e.g., 7 days after they have been pushed to the registry.

If, however, you want to always keep the image tagged production, possibly because that is a floating tag always pointing to the image currently deployed to production, the situation is no longer so simple. ECR lifecycle rules do not provide a "keep" action, only "expire". The logic is that a rule always expresses when to delete an image. If you never want to delete an image with a given tag, you have to express that as deleting it under a condition that will never occur. Unfortunately, there is no direct way to express the intent "keep this image" in a lifecycle rule.

Sample lifecycle rule

Attention

This rule needs to be adapted to your setup and needs to be tested before it's activated. Activating lifecycle rules can instantly delete images from registries.

We assume the following for this example:

  • All images are tagged with their build date in the form history-YYYYMMDD, e.g. history-20231225.
  • One image is also tagged production, since that is what is deployed.
  • We want to delete images older than 10 days, except if they are deployed to production.
{
    "rules": [
      {
        "rulePriority": 1,
        "description": "Keep image tagged production",
        "selection": {
          "tagPrefixList": ["production"],
          "tagStatus": "tagged",
          "countType": "imageCountMoreThan",
          "countNumber": 9999
        },
        "action": {
          "type": "expire"
        }
      },
      {
        "rulePriority": 10,
        "description": "Expire tagged images older than 10 days",
        "selection": {
          "tagPrefixList": [ "history" ],
          "tagStatus": "tagged",
          "countType": "sinceImagePushed",
          "countUnit": "days",
          "countNumber": 10
        },
        "action": {
          "type": "expire"
        }
      }
    ]
}

Careful when testing

It's a good idea to test rules like these before applying them to production, but the wording in the AWS console makes the test results a little confusing. When you enter the rules on the screen behind the Edit Test Rules button in the AWS console and run the test, the results section has this headline:

Image matches for test lifecycle rules

However, what is listed there are the actions that would be taken, not the rule matches. The production image should not appear in the list: it did match a rule, but since that rule takes no action on it, it is not listed. The same applies to any other tag you protect with a rule of its own, e.g. staging.

So what you want to check for when testing is that the images with protected tags (production in this example) do not appear in the list.
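The same check can be scripted instead of read off the console. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured; `protected_hits` and `fetch_preview` are hypothetical helper names, not part of any AWS tooling. It uses the ECR lifecycle policy preview API, which evaluates a policy against a repository without applying it.

```python
import time

# Tags that must never show up in the preview results (assumption: exact tag names).
PROTECTED_TAGS = {"production"}


def protected_hits(preview_results, protected_tags=PROTECTED_TAGS):
    """Return the preview entries whose image carries a protected tag.

    Every entry in the preview results is an image that would be expired,
    so any hit returned here means the policy is not safe to apply.
    """
    return [
        entry for entry in preview_results
        if set(protected_tags) & set(entry.get("imageTags", []))
    ]


def fetch_preview(repository_name, policy_text, poll_seconds=5):
    """Start a lifecycle policy preview and collect all of its results."""
    import boto3  # imported lazily so protected_hits() works without AWS access

    ecr = boto3.client("ecr")
    ecr.start_lifecycle_policy_preview(
        repositoryName=repository_name,
        lifecyclePolicyText=policy_text,
    )

    # Poll until the preview has left the IN_PROGRESS state.
    while True:
        page = ecr.get_lifecycle_policy_preview(repositoryName=repository_name)
        if page["status"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    if page["status"] != "COMPLETE":
        raise RuntimeError(f"preview ended with status {page['status']}")

    # Follow pagination tokens so large repositories are fully covered.
    results = list(page.get("previewResults", []))
    while "nextToken" in page:
        page = ecr.get_lifecycle_policy_preview(
            repositoryName=repository_name,
            nextToken=page["nextToken"],
        )
        results.extend(page.get("previewResults", []))
    return results
```

A possible usage, with a placeholder repository name and the policy stored in a local file: `hits = protected_hits(fetch_preview("my-repo", open("policy.json").read()))`. An empty list means no image with a protected tag would be expired.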

Why that works

An image that matches the tagging requirements of a rule cannot be expired by a rule with a lower priority.

-- AWS ECR docs on Lifecycle policy evaluation rules

This means that once an image matches the tag selection of a rule, a rule with a lower priority (i.e. a higher rulePriority number) can no longer expire it. In this case, the first rule selects the production tag and only expires images with this tag if more than 9999 of them are present. Keep in mind that this is a prefix match: if you also use other tags that start with production, the rule counts those images as well and starts expiring images once 10000 of them are present.

The later rule with time-based expiry will therefore no longer affect the image tagged production.
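To make the priority behaviour concrete, here is a simplified model of the evaluation in Python. It is an illustration under stated assumptions, not AWS's implementation: it only handles tagPrefixList with the imageCountMoreThan and sinceImagePushed (in days) count types, and the function names and sample images are made up for this sketch.

```python
from datetime import datetime, timedelta, timezone

POLICY = {"rules": [
    {"rulePriority": 1,
     "selection": {"tagPrefixList": ["production"], "tagStatus": "tagged",
                   "countType": "imageCountMoreThan", "countNumber": 9999},
     "action": {"type": "expire"}},
    {"rulePriority": 10,
     "selection": {"tagPrefixList": ["history"], "tagStatus": "tagged",
                   "countType": "sinceImagePushed", "countUnit": "days",
                   "countNumber": 10},
     "action": {"type": "expire"}},
]}


def matches(rule, tags):
    """True if every prefix in the rule matches at least one of the image's tags."""
    prefixes = rule["selection"]["tagPrefixList"]
    return all(any(tag.startswith(p) for tag in tags) for p in prefixes)


def simulate(policy, images, now):
    """Return the indices of the images this simplified model would expire."""
    claimed = set()  # images already matched by a higher-priority rule
    expired = set()
    for rule in sorted(policy["rules"], key=lambda r: r["rulePriority"]):
        sel = rule["selection"]
        matching = [i for i, img in enumerate(images) if matches(rule, img["tags"])]
        if sel["countType"] == "imageCountMoreThan":
            # Keep the newest countNumber images, expire the rest.
            newest_first = sorted(matching, key=lambda i: images[i]["pushed_at"],
                                  reverse=True)
            candidates = newest_first[sel["countNumber"]:]
        else:  # sinceImagePushed with countUnit "days"
            cutoff = now - timedelta(days=sel["countNumber"])
            candidates = [i for i in matching if images[i]["pushed_at"] < cutoff]
        # A rule may not expire images that a higher-priority rule already matched.
        expired.update(i for i in candidates if i not in claimed)
        claimed.update(matching)
    return expired


NOW = datetime(2024, 1, 1, tzinfo=timezone.utc)
IMAGES = [
    {"tags": ["history-20231201", "production"], "pushed_at": NOW - timedelta(days=31)},
    {"tags": ["history-20231207"], "pushed_at": NOW - timedelta(days=25)},
    {"tags": ["history-20231230"], "pushed_at": NOW - timedelta(days=2)},
]

print(simulate(POLICY, IMAGES, NOW))  # only the old, untagged-for-production image
```

In this model the 31-day-old production image survives because rule 1 matched it first, even though rule 10 would otherwise select it. Dropping rule 1 from the policy makes the same image expire along with the 25-day-old one.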

Not the most obvious way to design expiry rules.

Emma Heinle
Last edit
Emma Heinle
License
Source code in this card is licensed under the MIT License.