Skip to content

feat: add ECS and AgentCore container build/package/deploy support#9013

Open
singledigit wants to merge 1 commit into
aws:developfrom
singledigit:feat/ecs-agentcore-container-builds-v2
Open

feat: add ECS and AgentCore container build/package/deploy support#9013
singledigit wants to merge 1 commit into
aws:developfrom
singledigit:feat/ecs-agentcore-container-builds-v2

Conversation

@singledigit
Copy link
Copy Markdown

Which issue(s) does this change fix?

Fixes #8933

Why is this change necessary?

SAM CLI provides an excellent developer experience for Lambda Image functions (sam build && sam deploy), but users deploying containerized workloads to ECS (Fargate) or Bedrock AgentCore must manage their Docker build/push/deploy pipeline separately — even when these resources live in the same CloudFormation template. This creates a fragmented workflow requiring external tooling for an identical operation: build image → push to ECR → deploy.

How does it address the issue?

Extends the existing Lambda Image build pipeline to recognize AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources with a Metadata block containing Dockerfile and DockerContext. No new commands — sam build, sam package, sam deploy, and sam sync gain awareness of these resource types.

Template example:

Resources:
  MyAgent:
    Type: AWS::BedrockAgentCore::Runtime
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./agent
      Architecture: arm64
    Properties:
      AgentRuntimeArtifact:
        ContainerConfiguration:
          ContainerUri: placeholder

  MyTask:
    Type: AWS::ECS::TaskDefinition
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./app
      ContainerName: web
    Properties:
      ContainerDefinitions:
        - Name: web
          Image: placeholder

Key implementation details:

  • Reuses _build_lambda_image() — same Docker build logic, buildkit support included
  • --resolve-image-repos auto-creates ECR repos via companion stack
  • ContainerName metadata targets specific containers in multi-container TaskDefinitions
  • Architecture metadata sets --platform (e.g., arm64 for AgentCore)
  • ARTIFACT_TYPE = ZIP to pass the PackageType filter (these resources don't have PackageType)
  • No SAM Transform changes needed — uses native CloudFormation resource types

Design document: designs/container_image_builds_ecs_agentcore.md

What side effects does this change have?

  • sam build logs "Found N container service resource(s) to build" when applicable resources are present. No behavior change for templates without these resources.
  • --resolve-image-repos creates ECR repos for ECS/AgentCore in addition to Lambda Image functions.
  • _update_built_resource adds an optional metadata parameter (backward compatible, defaults to None).

Mandatory Checklist

PRs will only be reviewed after checklist is complete

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@singledigit singledigit requested a review from a team as a code owner May 15, 2026 07:18
@github-actions github-actions Bot added area/build sam build command pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels May 15, 2026
Copy link
Copy Markdown
Collaborator

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..052400d
Files: 19
Comments: 3

if container_def.get("Name") == target_name:
container_def["Image"] = path
break
else:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] The else on line 429 binds to the if target_name: on line 424, not to the for loop on line 425. As a result, when ContainerName is set in Metadata but no container in ContainerDefinitions matches that name (e.g. typo, renamed container), no Image field is updated on any container — silently. The image is built and the ECR repo is created, but the post-build template the user inspects/deploys still has the placeholder. The fallback to the first container only fires when ContainerName is unset.

Either match the for/else semantics so a missing target falls back to the first container, or — better — surface a clear error so the user knows their ContainerName doesn't resolve. Suggested fix:

if target_name:
   for container_def in container_defs:
       if container_def.get("Name") == target_name:
           container_def["Image"] = path
           break
   else:
       raise ValueError(
           f"Metadata.ContainerName '{target_name}' does not match any "
           f"container in ContainerDefinitions"
       )
else:
   container_defs[0]["Image"] = path

The new tests in test_container_build_integration.py only cover the matching and no-name cases, so this regression isn't caught.

for i, cd in enumerate(container_defs):
if cd.get("Name") == target_name:
return i
return 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] _get_target_index returns 0 (the first container) when Metadata.ContainerName is set but doesn't match any container in ContainerDefinitions. This means sam package/sam deploy will silently rewrite the Image of the wrong container — typically a sidecar like public.ecr.aws/envoy:latest in your own integration test template — replacing it with the freshly pushed ECR URI. The user's intended container keeps its placeholder.

Combined with the matching bug in app_builder._update_built_resource, a typo in ContainerName can cause a silent, incorrect deployment rather than a build/package failure.

Treat the no-match case explicitly. For example:

def gettarget_index(self, container_defs):
   metadata = getattr(self, "resource_metadata", None) or {}
   target_name = metadata.get("ContainerName")
   if target_name:
       for i, cd in enumerate(container_defs):
           if cd.get("Name") == target_name:
               return i
       raise exceptions.ExportFailedError(
           resource_id=...,  # plumb through
           property_name=self.PROPERTY_NAME,
           property_value=target_name,
           ex=ValueError(
               f"Metadata.ContainerName '{target_name}' does not match any "
               f"container in ContainerDefinitions"
           ),
       )
   return 0


if family:
# Paginate through all clusters
cluster_paginator = ecs_client.get_paginator("list_clusters")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PERFORMANCE] _force_ecs_deployment paginates every cluster in the account and every service in every cluster, then calls describe_services in batches of 10 to find services whose taskDefinition family matches. In accounts with many ECS clusters/services this fan-out is O(clusters × services / 10) describe_services calls on every sam sync iteration — slow, rate-limit-prone, and requires ecs:ListClusters/ecs:ListServices across the entire account.

Since the user already supplied the task definition (and SAM owns the CloudFormation stack), the cluster/service references are typically discoverable from the template itself (e.g., sibling AWS::ECS::Service resources referencing this task definition via !Ref) or from the deployed stack resources. Consider scoping the search to services declared in or deployed by the current stack rather than scanning the whole account.

@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 052400d to 825128f Compare May 15, 2026 09:23
Copy link
Copy Markdown
Collaborator

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..825128f
Files: 19
Comments: 3

Comment thread samcli/lib/build/app_builder.py Outdated
"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions",
target_name,
)
container_defs[0]["Image"] = path
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is set but does not match any container in ContainerDefinitions (typo, renamed container, copy/paste from another task definition), the for…else fall-through warns and then silently writes the built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar (e.g. public.ecr.aws/envoy:latest, as in the integration test fixture in tests/integration/testdata/buildcmd/container_image/template.yaml), the sidecar's image reference is silently replaced with the freshly built ECR URI while the user's intended container retains placeholder. The resulting post-build template deploys a broken task definition that the user only discovers at deploy/runtime.

A LOG.warning is not sufficient — users routinely miss warnings and the failure mode is data corruption of the template. Either fail fast, or skip the update entirely so the user is forced to address the mismatch:

if resource_type == AWS_ECS_TASK_DEFINITION:
   container_defs = resource_properties.get("ContainerDefinitions", [])
   if not container_defs:
       return
   target_name = metadata.get("ContainerName") if metadata else None
   if target_name:
       for container_def in container_defs:
           if container_def.get("Name") == target_name:
               container_def["Image"] = path
               return
       raise DockerBuildFailed(
           f"Metadata.ContainerName '{target_name}' does not match any container "
           f"in ContainerDefinitions. Update Metadata.ContainerName or add a matching "
           f"container."
       )
   if len(container_defs) > 1:
       raise DockerBuildFailed(
           "ContainerDefinitions has multiple containers but Metadata.ContainerName "
           "is not set. Specify Metadata.ContainerName to disambiguate."
       )
   container_defs[0]["Image"] = path

This was raised in the previous review and has not been addressed or dismissed — re-raising.

"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0",
target_name,
)
return 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] _get_target_index has the exact same silent-fallback problem as app_builder._update_built_resource: when Metadata.ContainerName is set but no container matches, it logs a warning and returns 0. sam package / sam deploy then rewrite the Image of ContainerDefinitions[0] — which is typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder/non-ECR value and the deploy proceeds with a broken task definition.

Two reasons to fix this regardless of whether app_builder already validates: (1) the package code path is exercised independently of build (e.g. sam package on an already-built template, or templates produced outside SAM), and (2) the delete path (line 748) uses the same _get_target_index, so sam delete can attempt to delete the wrong ECR artifact.

Raise an error rather than silently defaulting:

def gettarget_index(self, container_defs):
   metadata = getattr(self, "resource_metadata", None) or {}
   target_name = metadata.get("ContainerName")
   if target_name:
       for i, cd in enumerate(container_defs):
           if cd.get("Name") == target_name:
               return i
       raise exceptions.ExportFailedError(
           resource_id=getattr(self, "_resource_id", ""),
           property_name=self.PROPERTY_NAME,
           property_value=target_name,
           ex=ValueError(
               f"Metadata.ContainerName '{target_name}' does not match any "
               f"container in ContainerDefinitions"
           ),
       )
   if len(container_defs) > 1:
       raise exceptions.ExportFailedError(
           resource_id=getattr(self, "_resource_id", ""),
           property_name=self.PROPERTY_NAME,
           property_value=None,
           ex=ValueError(
               "ContainerDefinitions has multiple containers but "
               "Metadata.ContainerName is not set"
           ),
       )
   return 0

This was raised in the previous review and has not been addressed or dismissed — re-raising.


if family:
# Paginate through all clusters
cluster_paginator = ecs_client.get_paginator("list_clusters")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[PERFORMANCE] _force_ecs_deployment performs an unbounded scan of the entire account on every sam sync iteration: it paginates list_clusters, then list_services for every cluster, then batches describe_services in chunks of 10 — only to filter results client-side by taskDefinition family. In an account with N clusters and M services per cluster this is O(N + N·M/10) API calls per task-definition sync, repeated on every file-watch iteration of sam sync --watch. This is slow, easy to throttle (describe_services has aggressive rate limits in shared accounts), and forces the deployer's IAM principal to hold ecs:ListClusters and ecs:ListServices across the whole account — privileges that are typically far broader than what's needed to deploy a single task definition.

The user already provides the stack name (self._deploy_context.stack_name). The supported design is to discover services in the same CloudFormation stack rather than scanning the account: enumerate AWS::ECS::Service resources from the stack (you already load self._stacks), resolve their physical IDs from self._physical_id_mapping, and call update_service directly — or describe StackResources to find services pointing at this task definition family. This is O(services-in-stack) and uses only the IAM permissions the user already needs to deploy the stack.

If a wider scan is required for some edge case, gate it behind an opt-in flag rather than making it the default path on every sync.

This was raised in the previous review and has not been addressed or dismissed — re-raising.

@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 825128f to 03fd9e6 Compare May 15, 2026 09:42
Copy link
Copy Markdown
Collaborator

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..03fd9e6
Files: 19
Comments: 2

Comment thread samcli/lib/build/app_builder.py Outdated
"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions",
target_name,
)
container_defs[0]["Image"] = path
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] When Metadata.ContainerName is set but no container in ContainerDefinitions has that Name (typo, renamed container, copy/paste from another task definition), the for…else falls through, logs a warning, and silently writes the freshly built image to container_defs[0]["Image"]. In a multi-container task definition where index 0 is a sidecar, this overwrites the sidecar's image with the user's app image — exactly the layout used by the integration test fixture added in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml):

ContainerDefinitions:
  - Name: sidecar
   Image: public.ecr.aws/envoy:latest
  - Name: web              # if user mistypes ContainerName: 'wbe', envoy gets overwritten
   Image: placeholder

The post-build template the user inspects/deploys still has the placeholder on the intended container, while the unrelated sidecar entry now points at their freshly pushed ECR image — a hard-to-diagnose deploy-time failure. A warning in the build log is easy to miss in CI output.

This was raised on the previous review and not addressed. Fail loudly when the explicit ContainerName doesn't match — raise or skip the update — rather than guessing index 0:

if target_name:
   for container_def in container_defs:
       if container_def.get("Name") == target_name:
           container_def["Image"] = path
           break
   else:
       raise InvalidResourceException(
           resource_name,
           f"Metadata.ContainerName '{target_name}' does not match any "
           f"container in ContainerDefinitions",
       )
else:
   container_defs[0]["Image"] = path

"Metadata.ContainerName '%s' does not match any container in ContainerDefinitions, defaulting to index 0",
target_name,
)
return 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] ECSTaskDefinitionImageResource._get_target_index has the same silent-fallback problem: when Metadata.ContainerName is set but doesn't match any Name in ContainerDefinitions, it logs a warning and returns 0. sam package / sam deploy then rewrite ContainerDefinitions[0]["Image"] — typically a sidecar in multi-container task definitions — replacing it with the pushed ECR URI. The user's intended container keeps its placeholder, and the sidecar starts running the wrong image at deploy time.

This is the package-side counterpart of the build-side bug above and compounds it: even if the build path is fixed, package will still corrupt the template here. It was raised on the previous review and not addressed.

Either return a sentinel and let export skip with a hard error, or raise directly. Returning index 0 on a typo is the most surprising possible behavior for a multi-container resource:

def gettarget_index(self, container_defs):
   metadata = getattr(self, "resource_metadata", None) or {}
   target_name = metadata.get("ContainerName")
   if target_name:
       for i, cd in enumerate(container_defs):
           if cd.get("Name") == target_name:
               return i
       raise exceptions.ExportFailedError(
           resource_id=...,
           property_name=self.PROPERTY_NAME,
           property_value=target_name,
           ex=ValueError(
               f"Metadata.ContainerName '{target_name}' does not match "
               f"any container in ContainerDefinitions"
           ),
       )
   return 0

@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from 03fd9e6 to e0e5b2a Compare May 15, 2026 09:45
Copy link
Copy Markdown
Collaborator

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: 9ffa5ed..e0e5b2a
Files: 19
Comments: 1

container_defs = resource_dict.get("ContainerDefinitions", [])
if not container_defs:
return
target_idx = self._get_target_index(container_defs)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[BUG] ECSTaskDefinitionImageResource.delete() calls _get_target_index(container_defs), which reads self.resource_metadata via getattr(self, "resource_metadata", None) or {}. However, resource_metadata is only set on the exporter instance during the export flow (artifact_exporter.py:495) — the delete flow at artifact_exporter.py:522 instantiates the exporter and calls exporter.delete(...) without ever assigning resource_metadata.

As a result, during sam delete, _get_target_index always falls through to return 0, even when Metadata.ContainerName is set. In the multi-container TaskDefinition layout used by the integration test fixture in this PR (tests/integration/testdata/buildcmd/container_image/template.yaml) — where index 0 is a sidecar (public.ecr.aws/envoy:latest) and the user's container is web at index 1 — sam delete will inspect the sidecar's Image instead of the user's container. With a public image at index 0, the deletion silently no-ops (the user's pushed ECR image is leaked), and if index 0 happened to be a private ECR image, the wrong artifact would be deleted.

This is the same class of silent-fallback bug the export path was just fixed for. Either:

  • set exporter.resource_metadata = resource.get("Metadata") on the delete path in artifact_exporter.py before calling exporter.delete(...), mirroring the export path; or
  • have the delete code path resolve ContainerName from the resource_dict directly rather than relying on instance state set elsewhere.
# samcli/lib/package/artifact_exporter.py — delete() loop
exporter = exporter_class(self.uploaders, None)
exporter.resource_metadata = resource.get("Metadata")  # add this
exporter.delete(resource_id, resource_dict)

Extends SAM CLI to build, package, and deploy container images for
AWS::ECS::TaskDefinition and AWS::BedrockAgentCore::Runtime resources
using the same Metadata convention as Lambda Image functions.

Fixes aws#8933
@singledigit singledigit force-pushed the feat/ecs-agentcore-container-builds-v2 branch from e0e5b2a to a29380d Compare May 15, 2026 10:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/build sam build command pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: Extend SAM container management to Bedrock AgentCore, ECS, and Fargate

2 participants