feat: add ECS and AgentCore container build/package/deploy support by singledigit · Pull Request #9013 · aws/aws-sam-cli

singledigit · 2026-05-15T07:18:52Z

Which issue(s) does this change fix?

Why is this change necessary?

SAM CLI provides an excellent developer experience for Lambda Image functions (sam build && sam deploy), but users deploying containerized workloads to ECS (Fargate) or Bedrock AgentCore must manage their Docker build/push/deploy pipeline separately — even when these resources live in the same CloudFormation template. This creates a fragmented workflow requiring external tooling for an identical operation: build image → push to ECR → deploy.

How does it address the issue?

Extends the existing Lambda Image build pipeline to recognize AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources with a Metadata block containing Dockerfile and DockerContext. No new commands — sam build, sam package, sam deploy, and sam sync gain awareness of these resource types.

Template example:

Resources:
  MyAgent:
    Type: AWS::BedrockAgentCore::Runtime
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./agent
      Architecture: arm64
    Properties:
      AgentRuntimeArtifact:
        ContainerConfiguration:
          ContainerUri: placeholder

  MyTask:
    Type: AWS::ECS::TaskDefinition
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./app
      ContainerName: web
    Properties:
      ContainerDefinitions:
        - Name: web
          Image: placeholder

Key implementation details:

Reuses _build_lambda_image() — same Docker build logic, buildkit support included
--resolve-image-repos auto-creates ECR repos via companion stack
ContainerName metadata targets specific containers in multi-container TaskDefinitions
Architecture metadata sets --platform (e.g., arm64 for AgentCore)
ARTIFACT_TYPE = ZIP to pass the PackageType filter (these resources don't have PackageType)
No SAM Transform changes needed — uses native CloudFormation resource types

Design document: designs/container_image_builds_ecs_agentcore.md

What side effects does this change have?

sam build logs "Found N container service resource(s) to build" when applicable resources are present. No behavior change for templates without these resources.
--resolve-image-repos creates ECR repos for ECS/AgentCore in addition to Lambda Image functions.
_update_built_resource adds an optional metadata parameter (backward compatible, defaults to None).

Mandatory Checklist

PRs will only be reviewed after checklist is complete

Review the generative AI contribution guidelines
Add input/output type hints to new functions/methods
Write design document if needed (Do I need to write a design document?)
Write/update unit tests
Write/update integration tests
Write/update functional tests if needed
make pr passes
make update-reproducible-reqs if dependencies were changed
Write documentation

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

aws-sam-cli-bot

Code Review Results

Reviewed: 9ffa5ed..052400d
Files: 19
Comments: 3

aws-sam-cli-bot · 2026-05-15T07:22:41Z

+                        if container_def.get("Name") == target_name:
+                            container_def["Image"] = path
+                            break
+                else:


[BUG] The else on line 429 binds to the if target_name: on line 424, not to the for loop on line 425. As a result, when ContainerName is set in Metadata but no container in ContainerDefinitions matches that name (e.g. typo, renamed container), no Image field is updated on any container — silently. The image is built and the ECR repo is created, but the post-build template the user inspects/deploys still has the placeholder. The fallback to the first container only fires when ContainerName is unset.

Either match the for/else semantics so a missing target falls back to the first container, or — better — surface a clear error so the user knows their ContainerName doesn't resolve. Suggested fix:

if target_name: for container_def in container_defs: if container_def.get("Name") == target_name: container_def["Image"] = path break else: raise ValueError( f"Metadata.ContainerName '{target_name}' does not match any " f"container in ContainerDefinitions" ) else: container_defs[0]["Image"] = path

The new tests in test_container_build_integration.py only cover the matching and no-name cases, so this regression isn't caught.

aws-sam-cli-bot · 2026-05-15T07:22:41Z

+            for i, cd in enumerate(container_defs):
+                if cd.get("Name") == target_name:
+                    return i
+        return 0


[BUG] _get_target_index returns 0 (the first container) when Metadata.ContainerName is set but doesn't match any container in ContainerDefinitions. This means sam package/sam deploy will silently rewrite the Image of the wrong container — typically a sidecar like public.ecr.aws/envoy:latest in your own integration test template — replacing it with the freshly pushed ECR URI. The user's intended container keeps its placeholder.

Combined with the matching bug in app_builder._update_built_resource, a typo in ContainerName can cause a silent, incorrect deployment rather than a build/package failure.

Treat the no-match case explicitly. For example:

def gettarget_index(self, container_defs): metadata = getattr(self, "resource_metadata", None) or {} target_name = metadata.get("ContainerName") if target_name: for i, cd in enumerate(container_defs): if cd.get("Name") == target_name: return i raise exceptions.ExportFailedError( resource_id=..., # plumb through property_name=self.PROPERTY_NAME, property_value=target_name, ex=ValueError( f"Metadata.ContainerName '{target_name}' does not match any " f"container in ContainerDefinitions" ), ) return 0

aws-sam-cli-bot · 2026-05-15T07:22:41Z

+
+            if family:
+                # Paginate through all clusters
+                cluster_paginator = ecs_client.get_paginator("list_clusters")


[PERFORMANCE] _force_ecs_deployment paginates every cluster in the account and every service in every cluster, then calls describe_services in batches of 10 to find services whose taskDefinition family matches. In accounts with many ECS clusters/services this fan-out is O(clusters × services / 10) describe_services calls on every sam sync iteration — slow, rate-limit-prone, and requires ecs:ListClusters/ecs:ListServices across the entire account.

Since the user already supplied the task definition (and SAM owns the CloudFormation stack), the cluster/service references are typically discoverable from the template itself (e.g., sibling AWS::ECS::Service resources referencing this task definition via !Ref) or from the deployed stack resources. Consider scoping the search to services declared in or deployed by the current stack rather than scanning the whole account.

aws-sam-cli-bot

Code Review Results

Reviewed: 9ffa5ed..825128f
Files: 19
Comments: 3

aws-sam-cli-bot · 2026-05-15T09:26:40Z

+                            "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions",
+                            target_name,
+                        )
+                        container_defs[0]["Image"] = path


[BUG] When Metadata.ContainerName is set but does not match any container in ContainerDefinitions (typo, renamed container, copy/paste from another task definition), the for…else fall-through warns and then silently writes the built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar (e.g. public.ecr.aws/envoy:latest, as in the integration test fixture in tests/integration/testdata/buildcmd/container_image/template.yaml), the sidecar's image reference is silently replaced with the freshly built ECR URI while the user's intended container retains placeholder. The resulting post-build template deploys a broken task definition that the user only discovers at deploy/runtime.

A LOG.warning is not sufficient — users routinely miss warnings and the failure mode is data corruption of the template. Either fail fast, or skip the update entirely so the user is forced to address the mismatch:

if resource_type == AWS_ECS_TASK_DEFINITION: container_defs = resource_properties.get("ContainerDefinitions", []) if not container_defs: return target_name = metadata.get("ContainerName") if metadata else None if target_name: for container_def in container_defs: if container_def.get("Name") == target_name: container_def["Image"] = path return raise DockerBuildFailed( f"Metadata.ContainerName '{target_name}' does not match any container " f"in ContainerDefinitions. Update Metadata.ContainerName or add a matching " f"container." ) if len(container_defs) > 1: raise DockerBuildFailed( "ContainerDefinitions has multiple containers but Metadata.ContainerName " "is not set. Specify Metadata.ContainerName to disambiguate." ) container_defs[0]["Image"] = path

This was raised in the previous review and has not been addressed or dismissed — re-raising.

aws-sam-cli-bot · 2026-05-15T09:26:40Z

+                "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0",
+                target_name,
+            )
+        return 0


[BUG] _get_target_index has the exact same silent-fallback problem as app_builder._update_built_resource: when Metadata.ContainerName is set but no container matches, it logs a warning and returns 0. sam package / sam deploy then rewrite the Image of ContainerDefinitions[0] — which is typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder/non-ECR value and the deploy proceeds with a broken task definition.

Two reasons to fix this regardless of whether app_builder already validates: (1) the package code path is exercised independently of build (e.g. sam package on an already-built template, or templates produced outside SAM), and (2) the delete path (line 748) uses the same _get_target_index, so sam delete can attempt to delete the wrong ECR artifact.

Raise an error rather than silently defaulting:

def gettarget_index(self, container_defs): metadata = getattr(self, "resource_metadata", None) or {} target_name = metadata.get("ContainerName") if target_name: for i, cd in enumerate(container_defs): if cd.get("Name") == target_name: return i raise exceptions.ExportFailedError( resource_id=getattr(self, "_resource_id", ""), property_name=self.PROPERTY_NAME, property_value=target_name, ex=ValueError( f"Metadata.ContainerName '{target_name}' does not match any " f"container in ContainerDefinitions" ), ) if len(container_defs) > 1: raise exceptions.ExportFailedError( resource_id=getattr(self, "_resource_id", ""), property_name=self.PROPERTY_NAME, property_value=None, ex=ValueError( "ContainerDefinitions has multiple containers but " "Metadata.ContainerName is not set" ), ) return 0

This was raised in the previous review and has not been addressed or dismissed — re-raising.

aws-sam-cli-bot · 2026-05-15T09:26:40Z

+
+            if family:
+                # Paginate through all clusters
+                cluster_paginator = ecs_client.get_paginator("list_clusters")


[PERFORMANCE] _force_ecs_deployment performs an unbounded scan of the entire account on every sam sync iteration: it paginates list_clusters, then list_services for every cluster, then batches describe_services in chunks of 10 — only to filter results client-side by taskDefinition family. In an account with N clusters and M services per cluster this is O(N + N·M/10) API calls per task-definition sync, repeated on every file-watch iteration of sam sync --watch. This is slow, easy to throttle (describe_services has aggressive rate limits in shared accounts), and forces the deployer's IAM principal to hold ecs:ListClusters and ecs:ListServices across the whole account — privileges that are typically far broader than what's needed to deploy a single task definition.

The user already provides the stack name (self._deploy_context.stack_name). The supported design is to discover services in the same CloudFormation stack rather than scanning the account: enumerate AWS::ECS::Service resources from the stack (you already load self._stacks), resolve their physical IDs from self._physical_id_mapping, and call update_service directly — or describe StackResources to find services pointing at this task definition family. This is O(services-in-stack) and uses only the IAM permissions the user already needs to deploy the stack.

If a wider scan is required for some edge case, gate it behind an opt-in flag rather than making it the default path on every sync.

This was raised in the previous review and has not been addressed or dismissed — re-raising.

aws-sam-cli-bot

Code Review Results

Reviewed: 9ffa5ed..03fd9e6
Files: 19
Comments: 2

aws-sam-cli-bot · 2026-05-15T09:44:51Z

+                            "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions",
+                            target_name,
+                        )
+                        container_defs[0]["Image"] = path


[BUG] When Metadata.ContainerName is set but no container in ContainerDefinitions has that Name (typo, renamed container, copy/paste from another task definition), the for…else falls through, logs a warning, and silently writes the freshly built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar, this overwrites the sidecar's image with the user's app image — exactly the layout used by the integration test fixture added in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml):

ContainerDefinitions: - Name: sidecar Image: public.ecr.aws/envoy:latest - Name: web # if user mistypes ContainerName: 'wbe', envoy gets overwritten Image: placeholder

The post-build template the user inspects/deploys still has the placeholder on the intended container, while the unrelated sidecar entry now points at their freshly pushed ECR image — a hard-to-diagnose deploy-time failure. A warning in the build log is easy to miss in CI output.

This was raised on the previous review and not addressed. Fail loudly when the explicit ContainerName doesn't match — raise or skip the update — rather than guessing index 0:

if target_name: for container_def in container_defs: if container_def.get("Name") == target_name: container_def["Image"] = path break else: raise InvalidResourceException( resource_name, f"Metadata.ContainerName '{target_name}' does not match any " f"container in ContainerDefinitions", ) else: container_defs[0]["Image"] = path

aws-sam-cli-bot · 2026-05-15T09:44:51Z

+                "Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0",
+                target_name,
+            )
+        return 0


[BUG] ECSTaskDefinitionImageResource._get_target_index has the same silent-fallback problem: when Metadata.ContainerName is set but doesn't match any Name in ContainerDefinitions, it logs a warning and returns 0. sam package / sam deploy then rewrite ContainerDefinitions[0]["Image"] — typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder, and the sidecar starts running the wrong image at deploy time.

This is the package-side counterpart of the build-side bug above and compounds it: even if the build path is fixed, package will still corrupt the template here. It was raised on the previous review and not addressed.

Either return a sentinel and let export skip with a hard error, or raise directly. Returning index 0 on a typo is the most surprising possible behavior for a multi-container resource:

def gettarget_index(self, container_defs): metadata = getattr(self, "resource_metadata", None) or {} target_name = metadata.get("ContainerName") if target_name: for i, cd in enumerate(container_defs): if cd.get("Name") == target_name: return i raise exceptions.ExportFailedError( resource_id=..., property_name=self.PROPERTY_NAME, property_value=target_name, ex=ValueError( f"Metadata.ContainerName '{target_name}' does not match " f"any container in ContainerDefinitions" ), ) return 0

aws-sam-cli-bot

Code Review Results

Reviewed: 9ffa5ed..e0e5b2a
Files: 19
Comments: 1

aws-sam-cli-bot · 2026-05-15T09:48:31Z

+        container_defs = resource_dict.get("ContainerDefinitions", [])
+        if not container_defs:
+            return
+        target_idx = self._get_target_index(container_defs)


[BUG] ECSTaskDefinitionImageResource.delete() calls _get_target_index(container_defs), which reads self.resource_metadata via getattr(self, "resource_metadata", None) or {}. However, resource_metadata is only set on the exporter instance during the export flow (artifact_exporter.py:495) — the delete flow at artifact_exporter.py:522 instantiates the exporter and calls exporter.delete(...) without ever assigning resource_metadata.

As a result, during sam delete, _get_target_index always falls through to return 0, even when Metadata.ContainerName is set. In the multi-container TaskDefinition layout used by the integration test fixture in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml) — where index 0 is a sidecar (public.ecr.aws/envoy:latest) and the user's container is web at index 1 — sam delete will inspect the sidecar's Image instead of the user's container. With a public image at index 0, the deletion silently no-ops (the user's pushed ECR image is leaked), and if index 0 happened to be a private ECR image, the wrong artifact would be deleted.

This is the same class of silent-fallback bug the export path was just fixed for. Either:

set exporter.resource_metadata = resource.get("Metadata") on the delete path in artifact_exporter.py before calling exporter.delete(...), mirroring the export path; or

have the delete code path resolve ContainerName from the resource_dict directly rather than relying on instance state set elsewhere.

# samcli/lib/package/artifact_exporter.py — delete() loop exporter = exporter_class(self.uploaders, None) exporter.resource_metadata = resource.get("Metadata") # add this exporter.delete(resource_id, resource_dict)

Extends SAM CLI to build, package, and deploy container images for AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources using the same Metadata convention as Lambda Image functions. Fixes aws#8933

singledigit requested a review from a team as a code owner May 15, 2026 07:18

github-actions Bot added area/build sam build command pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels May 15, 2026

aws-sam-cli-bot reviewed May 15, 2026

View reviewed changes

singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 052400d to 825128f Compare May 15, 2026 09:23

aws-sam-cli-bot reviewed May 15, 2026

View reviewed changes

singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 825128f to 03fd9e6 Compare May 15, 2026 09:42

aws-sam-cli-bot reviewed May 15, 2026

View reviewed changes

singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 03fd9e6 to e0e5b2a Compare May 15, 2026 09:45

aws-sam-cli-bot reviewed May 15, 2026

View reviewed changes

feat: add ECS and AgentCore container build/package/deploy support

a29380d

Extends SAM CLI to build, package, and deploy container images for AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources using the same Metadata convention as Lambda Image functions. Fixes aws#8933

singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from e0e5b2a to a29380d Compare May 15, 2026 10:39

singledigit requested a review from aws-sam-cli-bot May 15, 2026 10:42

Conversation

singledigit commented May 15, 2026

Which issue(s) does this change fix?

Why is this change necessary?

How does it address the issue?

What side effects does this change have?

Mandatory Checklist

Uh oh!

aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

Code Review Results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

Code Review Results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

Code Review Results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

Code Review Results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants