
feat(providers): add Dockerfile/Containerfile provider for image analysis#569

Open
a-oren wants to merge 3 commits into
guacsec:mainfrom
a-oren:worktree-TC-4937

Conversation

@a-oren

@a-oren a-oren commented Jun 29, 2026

Contributor

Summary

  • Add oci_dockerfile.js provider that recognizes Dockerfile and Containerfile manifests for component and stack analysis
  • Provider parses FROM lines to extract the base image reference, reuses generateImageSBOM to produce a CycloneDX SBOM
  • Multi-stage Dockerfiles use the final stage's FROM image
  • Unit tests for isSupported, validateLockFile, readLicenseFromManifest, packageManagerName, and FROM line parsing
  • Fix: strip all leading --flag tokens from FROM lines, not just the first (TC-4977)
  • Fix: detect and reject ARG-substituted FROM targets with a clear error (TC-4978)
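The FROM rules summarized above are: the last FROM wins, every leading --flag token is skipped, and ARG-substituted targets are rejected. Below is a minimal sketch of those rules. It is not the code in this PR. The function name `parseFromImageSketch` and the error messages are hypothetical, and the sketch scans physical lines only, as the PR's parser does.

```javascript
// Hypothetical sketch of the FROM-line rules listed above; not the PR's code.
// It scans physical lines only, so a FROM split with a trailing "\" is missed.
function parseFromImageSketch(manifestContent) {
	let lastFrom = null
	for (const line of manifestContent.split(/\r?\n/)) {
		const trimmed = line.trim()
		// The FROM keyword is case-insensitive and must be followed by whitespace
		if (!/^FROM\s+/i.test(trimmed)) {
			continue
		}
		const tokens = trimmed.replace(/^FROM\s+/i, '').split(/\s+/)
		// Skip every leading flag token, e.g. --platform=linux/amd64
		let i = 0
		while (i < tokens.length && tokens[i].startsWith('--')) {
			i++
		}
		// The first non-flag token is the image; a trailing "AS name" is ignored.
		// Later FROM lines overwrite earlier ones, so the final stage wins.
		lastFrom = tokens[i] || null
	}
	if (!lastFrom) {
		throw new Error('No FROM instruction with an image reference found')
	}
	// Fail fast on $NAME / ${NAME} targets instead of passing them downstream
	if (lastFrom.includes('$')) {
		throw new Error(`FROM target "${lastFrom}" uses ARG substitution, which is not supported`)
	}
	return lastFrom
}
```

Take a Dockerfile with `FROM node:20-alpine AS deps` followed by `FROM --platform=linux/amd64 nginx:1.27 AS web`. For that input the sketch returns `nginx:1.27`. One case it does not cover: Dockerfile allows a final FROM to name an earlier stage (for example `FROM deps`). In that case the returned value is a stage alias, not a pullable image.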

Implements TC-4937

Test plan

  • isSupported returns true for Dockerfile and Containerfile, false for others
  • FROM line parsing extracts correct image reference from single-stage Dockerfile
  • FROM line parsing uses last FROM in multi-stage Dockerfile
  • FROM line parsing handles multiple flags before image reference
  • FROM line parsing rejects ARG-substituted FROM targets
  • validateLockFile always returns true
  • readLicenseFromManifest returns null
  • All 20 unit tests pass
  • All existing tests unaffected
  • ESLint passes with 0 errors

🤖 Generated with Claude Code

…ysis

Add a new provider that recognizes Dockerfile and Containerfile manifests
for component and stack analysis. The provider parses FROM lines to extract
the base image reference, then reuses generateImageSBOM to produce a
CycloneDX SBOM. Multi-stage Dockerfiles use the final stage's FROM image.

Implements TC-4937

Assisted-by: Claude Code
@sourcery-ai

sourcery-ai Bot commented Jun 29, 2026


Reviewer's Guide

Adds a new OCI Dockerfile/Containerfile provider that parses FROM lines to derive the base image, generates an image SBOM via existing OCI utilities, wires it into the provider registry, and introduces unit tests around support detection, parsing, and behavior.

Sequence diagram for Dockerfile provider image SBOM generation

sequenceDiagram
  actor Client
  participant dockerfileProvider
  participant fs
  participant parseFromImage
  participant parseImageRef
  participant generateImageSBOM

  Client->>dockerfileProvider: provideComponent(manifest, opts)
  dockerfileProvider->>dockerfileProvider: getImageSBOM(manifest, opts)
  dockerfileProvider->>fs: readFileSync(manifest, utf-8)
  fs-->>dockerfileProvider: manifestContent
  dockerfileProvider->>parseFromImage: parseFromImage(manifestContent)
  parseFromImage-->>dockerfileProvider: image
  dockerfileProvider->>parseImageRef: parseImageRef(image, opts)
  parseImageRef-->>dockerfileProvider: imageRef
  dockerfileProvider->>generateImageSBOM: generateImageSBOM(imageRef, opts)
  generateImageSBOM-->>dockerfileProvider: sbom
  dockerfileProvider-->>Client: {ecosystem: oci, content, contentType}
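Below is a sketch of the call sequence in the diagram. The file read, `parseFromImage`, `parseImageRef` and `generateImageSBOM` are passed in as parameters, so the flow can run against stubs without pulling an image. Several details are assumptions, not the provider's actual code:

- the injection shape and the `makeDockerfileProvider` name;
- synchronous collaborators (the diagram suggests this, but it is not confirmed);
- the JSON serialization of the SBOM;
- the `contentType` value.

```javascript
// Hypothetical wiring of the sequence above. Dependencies are injected here;
// the real provider presumably imports them directly (assumption).
function makeDockerfileProvider({ readFile, parseFromImage, parseImageRef, generateImageSBOM }) {
	function getImageSBOM(manifest, opts = {}) {
		const manifestContent = readFile(manifest, 'utf-8')
		const image = parseFromImage(manifestContent)
		const imageRef = parseImageRef(image, opts)
		const sbom = generateImageSBOM(imageRef, opts)
		return {
			ecosystem: 'oci',
			// Assumption: the SBOM object is serialized to a JSON string
			content: JSON.stringify(sbom),
			// Assumption: the CycloneDX JSON media type
			contentType: 'application/vnd.cyclonedx+json',
		}
	}
	return {
		// Component and stack analysis both delegate to getImageSBOM
		provideComponent: (manifest, opts) => getImageSBOM(manifest, opts),
		provideStack: (manifest, opts) => getImageSBOM(manifest, opts),
	}
}
```

Injecting the collaborators is only for illustration. The point is that both entry points share one path: read, parse FROM, parse the reference, generate the SBOM.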

File-Level Changes

Introduce Dockerfile/Containerfile provider that generates CycloneDX SBOMs from base images defined in Dockerfile manifests.
Files: src/providers/oci_dockerfile.js
  • Implement isSupported to recognize Dockerfile and Containerfile manifest names
  • Implement parseFromImage to extract the final stage base image from FROM lines, including handling optional flags and AS aliases
  • Implement getImageSBOM to read manifest content, parse the image reference, and call generateImageSBOM, wrapping the result in the OCI ecosystem format
  • Implement provideComponent and provideStack to delegate SBOM generation to getImageSBOM
  • Implement validateLockFile to always return true and readLicenseFromManifest to always return null
  • Expose the provider with packageManagerName of oci

Register the new Dockerfile provider in the global provider list so it participates in analysis flows.
Files: src/provider.js
  • Add dockerfileProvider to availableProviders array alongside existing language and ecosystem providers

Add unit test coverage for the new Dockerfile provider's support matrix and FROM parsing behavior.
Files: test/providers/oci_dockerfile.test.js
  • Add tests for isSupported handling of Dockerfile and Containerfile names
  • Add tests for always-true validateLockFile and null readLicenseFromManifest
  • Add tests for packageManagerName returning oci
  • Add tests verifying parseFromImage correctly extracts single-stage and final multi-stage FROM image references

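Below is a sketch of how the small provider methods listed above fit together, and of how a provider list might select a provider by manifest name. `availableProviders` comes from the description. The following are hypothetical, not the repository's lookup code:

- the `matchProvider` helper;
- the stub Maven provider;
- treating `packageManagerName` as a function rather than a property.

```javascript
// Hypothetical sketch of the provider's small methods as described above.
// Whether packageManagerName is a function or a property is an assumption.
const dockerfileProvider = {
	// Exact file names only, as in this PR
	isSupported: (manifestName) => manifestName === 'Dockerfile' || manifestName === 'Containerfile',
	// Always true, per the PR description
	validateLockFile: () => true,
	// Always null, per the PR description
	readLicenseFromManifest: () => null,
	packageManagerName: () => 'oci',
}

// Hypothetical stand-in for an existing language provider
const stubMavenProvider = {
	isSupported: (manifestName) => manifestName === 'pom.xml',
	packageManagerName: () => 'maven',
}

// Registration: the new provider is appended to the provider list
const availableProviders = [stubMavenProvider, dockerfileProvider]

// Hypothetical lookup: the first provider whose isSupported accepts the name
function matchProvider(manifestName) {
	const provider = availableProviders.find((p) => p.isSupported(manifestName))
	if (!provider) {
		throw new Error(`${manifestName} is not supported`)
	}
	return provider
}
```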
@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 2 issues, and left some high level feedback:

  • The parseFromImage implementation assumes a relatively simple FROM syntax and only strips a single leading --flag; consider making this parsing more robust (e.g., handling multiple flags, comments, and more complex whitespace) or delegating to a Dockerfile parser to avoid mis-extracting the base image.
  • In getImageSBOM, synchronous fs.readFileSync is used; if other providers are asynchronous or this runs in a performance-sensitive path, consider aligning with the existing I/O pattern to avoid blocking the event loop on large Dockerfiles.
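On the first point (comments and more complex whitespace), two Dockerfile rules matter. A trailing backslash, the default escape character, continues an instruction on the next line. Lines beginning with `#` are comments and are removed before instructions are processed. A per-line scan therefore misses a FROM split across lines.

Below is a minimal sketch of joining logical lines first. It assumes the default `\` escape character (no `# escape=` parser directive) and ignores heredocs. `toLogicalLines` is a hypothetical helper.

```javascript
// Hypothetical helper: join backslash-continued lines into logical
// instructions and drop comments, assuming the default "\" escape character
// (no "# escape=`" directive) and no heredocs.
function toLogicalLines(manifestContent) {
	const logical = []
	let current = ''
	for (const raw of manifestContent.split(/\r?\n/)) {
		const line = raw.trim()
		// Blank lines and comment lines are skipped, even inside a continuation
		if (line === '' || line.startsWith('#')) {
			continue
		}
		if (line.endsWith('\\')) {
			// Drop the trailing backslash and keep accumulating
			current += line.slice(0, -1) + ' '
			continue
		}
		logical.push((current + line).trim())
		current = ''
	}
	// A continuation left open at end of file still yields an instruction
	if (current.trim() !== '') {
		logical.push(current.trim())
	}
	return logical
}
```

Scanning the output of `toLogicalLines` instead of raw lines lets the existing token loop see a FROM split across lines. Splitting on `/\s+/` then absorbs the extra whitespace left at the join.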
## Individual Comments

### Comment 1
<location path="src/providers/oci_dockerfile.js" line_range="39-44" />
<code_context>
+ * @returns {string} the image reference from the last FROM line
+ * @throws {Error} when no FROM line is found in the Dockerfile
+ */
+export function parseFromImage(manifestContent) {
+	const lines = manifestContent.split(/\r?\n/)
+	let lastFrom = null
+	for (const line of lines) {
+		const trimmed = line.trim()
+		if (/^FROM\s+/i.test(trimmed)) {
+			// Extract image ref: FROM [--platform=...] image [AS name]
+			const withoutFrom = trimmed.replace(/^FROM\s+/i, '')
</code_context>
<issue_to_address>
**issue (bug_risk):** Dockerfiles that use ARG substitution in FROM lines may not be handled correctly.

This implementation only works when the FROM image is a literal value. In cases like `ARG BASE_IMAGE` followed by `FROM ${BASE_IMAGE}`, `parseFromImage` will return `${BASE_IMAGE}`, which will not be a valid reference for `parseImageRef`/`generateImageSBOM`. If these patterns should be supported, you’ll need to resolve ARGs (including defaults/env), or detect non-literal FROM targets and fail fast with a clear error.
</issue_to_address>
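Comment 1 offers two options: resolve ARGs, or fail fast. The PR later chose to fail fast (TC-4978). Below is a minimal sketch of the first option, kept deliberately narrow. It relies on the Dockerfile rule that FROM can use only ARGs declared before the first FROM. It also assumes:

- simple `$NAME` / `${NAME}` references;
- one unquoted `ARG NAME[=default]` per line;
- build-time overrides passed as a plain object.

Modifiers such as `${NAME:-word}` are rejected, not evaluated. `resolveFromArgs` is hypothetical.

```javascript
// Hypothetical sketch: substitute global ARG values into a FROM target.
// Only ARGs declared before the first FROM are visible to FROM lines.
function resolveFromArgs(manifestContent, fromTarget, buildArgs = {}) {
	const globals = new Map()
	for (const line of manifestContent.split(/\r?\n/)) {
		const trimmed = line.trim()
		// The global ARG scope ends at the first FROM
		if (/^FROM\s+/i.test(trimmed)) {
			break
		}
		// Simple form only: ARG NAME or ARG NAME=value (one name per line, no quotes)
		const match = trimmed.match(/^ARG\s+([A-Za-z_][A-Za-z0-9_]*)(?:=(\S*))?/i)
		if (match) {
			globals.set(match[1], match[2])
		}
	}
	const resolved = fromTarget.replace(
		/\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)/g,
		(_whole, braced, bare) => {
			const name = braced || bare
			// A build-time override wins over the declared default
			const value = Object.prototype.hasOwnProperty.call(buildArgs, name) ? buildArgs[name] : globals.get(name)
			// This sketch treats an ARG with no value as an error
			if (value === undefined) {
				throw new Error(`ARG ${name} used in FROM "${fromTarget}" has no value`)
			}
			return value
		},
	)
	// Anything left, such as ${NAME:-default}, is unsupported syntax here
	if (resolved.includes('$')) {
		throw new Error(`Unsupported variable syntax in FROM "${fromTarget}"`)
	}
	return resolved
}
```

The global-scope rule keeps the lookup small, since only the lines above the first FROM need to be read. Whether that is worth the extra surface, compared with the fail-fast check, remains a judgment call.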

### Comment 2
<location path="src/providers/oci_dockerfile.js" line_range="44-52" />
<code_context>
+		if (/^FROM\s+/i.test(trimmed)) {
+			// Extract image ref: FROM [--platform=...] image [AS name]
+			const withoutFrom = trimmed.replace(/^FROM\s+/i, '')
+			// Skip optional --platform flag
+			const withoutFlags = withoutFrom.replace(/^--\S+\s+/, '')
+			// Take only the image part (before AS alias)
+			const parts = withoutFlags.split(/\s+/)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Flag stripping on FROM lines only removes a single leading flag, which may miss valid Dockerfile syntax.

The Dockerfile syntax allows multiple flags before the image (e.g. `FROM --platform=$BUILDPLATFORM --some-flag image AS name`), but the current regex only removes a single leading `--...` token. With additional flags, `parts[0]` could still be a flag instead of the image. Consider stripping all leading `--...` tokens (e.g. in a loop) or splitting and filtering out `--*` tokens before selecting the image reference.

```suggestion
		if (/^FROM\s+/i.test(trimmed)) {
			// Extract image ref: FROM [--platform=...] [--flags...] image [AS name]
			const withoutFrom = trimmed.replace(/^FROM\s+/i, '')
			// Split into tokens and drop all leading flag tokens (starting with "--")
			const tokens = withoutFrom.split(/\s+/).filter(Boolean)
			let imageIndex = 0
			while (imageIndex < tokens.length && tokens[imageIndex].startsWith('--')) {
				imageIndex++
			}
			if (imageIndex < tokens.length) {
				// The first non-flag token is the image; ignore any following AS alias
				lastFrom = tokens[imageIndex]
			}
		}
```
</issue_to_address>

@codecov-commenter

codecov-commenter commented Jun 29, 2026


Codecov Report

❌ Patch coverage is 84.54545% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.66%. Comparing base (ab5949f) to head (dd053df).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/providers/oci_dockerfile.js | 84.11% | 17 Missing ⚠️ |
Additional details and impacted files


@@            Coverage Diff             @@
##             main     #569      +/-   ##
==========================================
- Coverage   90.75%   90.66%   -0.09%     
==========================================
  Files          36       37       +1     
  Lines        7766     7875     +109     
  Branches     1353     1367      +14     
==========================================
+ Hits         7048     7140      +92     
- Misses        718      735      +17     
| Flag | Coverage Δ |
|---|---|
| unit-tests | 90.66% <84.54%> (-0.09%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/provider.js | 100.00% <100.00%> (ø) |
| src/providers/oci_dockerfile.js | 84.11% <84.11%> (ø) |

@a-oren

a-oren commented Jun 29, 2026

Contributor Author

[sdlc-workflow/verify-pr] Re: @sourcery-ai[bot] review —

Suggestion 1 (parseFromImage robustness): Classified as code change request (upgraded from suggestion) — covered by inline comment sub-tasks TC-4977 and TC-4978.

Suggestion 2 (synchronous fs.readFileSync): Classified as suggestion — the project consistently uses fs.readFileSync across all providers (25+ occurrences in src/providers/). This is an established project convention; no sub-task created.

@a-oren

a-oren commented Jun 29, 2026

Contributor Author

Verification Report for TC-4937 (commit 905c336)

| Check | Result | Details |
|---|---|---|
| Review Feedback | WARN | 2 code change requests (upgraded from suggestions); sub-tasks TC-4977, TC-4978 created |
| Root-Cause Investigation | DONE | Implement-task skill gap: insufficient parser edge case analysis; root-cause task TC-4979 created |
| Scope Containment | WARN | File renamed from spec (oci_dockerfile.js vs dockerfile.js per naming convention); test file added beyond spec |
| Diff Size | PASS | 202 additions, 1 deletion across 3 files; proportionate for new provider + tests |
| Commit Traceability | PASS | Single commit references TC-4937 in body |
| Sensitive Patterns | PASS | No secrets or credentials detected |
| CI Status | PASS | All 5 CI checks pass (Node 22, Node 24, Sourcery, PR title, commit messages) |
| Acceptance Criteria | PASS | All 6 acceptance criteria satisfied |
| Test Quality | PASS | Repetitive Test Detection: PASS, Test Documentation: PASS, Eval Quality: N/A |
| Test Change Classification | ADDITIVE | All test files are new additions (96 lines) |
| Verification Commands | PASS | npm test and npm run lint pass on both Node 22 and Node 24 |

Overall: WARN

Two reviewer suggestions from sourcery-ai[bot] were upgraded to code change requests based on project conventions (CONVENTIONS.md §Error Handling):

  1. TC-4977: parseFromImage only strips a single --flag token; it should handle multiple flags via a loop
  2. TC-4978: parseFromImage should detect and reject ${VARIABLE} ARG substitution in FROM targets

Both are scoped improvements to the FROM line parser. All acceptance criteria pass and CI is green.


This comment was AI-generated by sdlc-workflow/verify-pr v0.11.0.

a-oren and others added 2 commits June 29, 2026 12:40
The parseFromImage function previously only stripped a single --flag
token using a regex. Dockerfile syntax allows multiple flags before
the image reference. Replace the single regex with a loop that skips
all leading --flag tokens.

TC-4977

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Assisted-by: Claude Code

When a Dockerfile uses ARG substitution in FROM lines (e.g.
FROM ${BASE_IMAGE}), parseFromImage now throws a clear error
instead of passing the unresolved variable to downstream functions.

TC-4978

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Assisted-by: Claude Code
@a-oren

a-oren commented Jun 29, 2026

Contributor Author

Verification Report for TC-4937 (commit dd053df)

| Check | Result | Details |
|---|---|---|
| Review Feedback | PASS | 2 prior code change requests (TC-4977, TC-4978) fully addressed; fix commits included in PR |
| Root-Cause Investigation | N/A | No new sub-tasks created; TC-4979 created in prior run |
| Scope Containment | PASS | All task-specified functionality present; file rename (dockerfile.js → oci_dockerfile.js) follows OCI naming convention |
| Diff Size | PASS | 217 additions, 1 deletion across 3 files; proportionate for new provider + tests |
| Commit Traceability | PASS | All 3 commits reference Jira IDs (TC-4937, TC-4977, TC-4978) |
| Sensitive Patterns | PASS | No secrets or credentials detected in added lines |
| CI Status | PASS | All 5 CI checks pass (Node 22, Node 24, Sourcery, PR title, commit messages) |
| Acceptance Criteria | PASS | All 6 acceptance criteria satisfied |
| Test Quality | PASS | Repetitive Test Detection: PASS, Test Documentation: PASS, Eval Quality: N/A |
| Test Change Classification | ADDITIVE | All test files are new additions (108 lines, 20 tests) |
| Verification Commands | PASS | npm test and npm run lint pass; 0 lint errors |

Overall: PASS

All review feedback from the prior verification has been addressed:

  • TC-4977 (strip all leading flags) — fixed in commit e3c9903
  • TC-4978 (detect ARG substitution) — fixed in commit dd053df

Both sub-tasks are Done. No new issues found.


This comment was AI-generated by sdlc-workflow/verify-pr v0.11.0.

@a-oren a-oren requested review from Strum355 and ruromero June 29, 2026 10:06
 * @returns {boolean} true if the manifest is a Dockerfile or Containerfile
 */
function isSupported(manifestName) {
	return manifestName === 'Dockerfile' || manifestName === 'Containerfile'
}

Member


I've seen a lot of cases where people will have multiple Dockerfiles but with different suffixes (see https://sourcegraph.com/search?q=context:global+f:/Dockerfile%5C..*&patternType=keyword&case=yes&sm=0), so it would be good to support those as well imo

Collaborator


I agree we should support suffixes
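Below is a minimal sketch of a broader `isSupported` that follows the suggestion to accept suffixed names. Which forms to accept is an assumption. The sketch accepts the exact names, a dot suffix (`Dockerfile.dev`) and a dot prefix (`api.Dockerfile`). It excludes names ending in `.dockerignore`, because Docker supports per-Dockerfile ignore files named `<Dockerfile-name>.dockerignore`. The names `DOCKERFILE_NAME` and `isSupportedWithSuffixes` are hypothetical.

```javascript
// Hypothetical broader matcher. Accepted forms (an assumption):
//   Dockerfile, Containerfile, Dockerfile.dev, Containerfile.arm64, api.Dockerfile
const DOCKERFILE_NAME = /^(?:[^/\\]+\.)?(?:Dockerfile|Containerfile)(?:\.[^/\\]+)?$/

function isSupportedWithSuffixes(manifestName) {
	// Per-Dockerfile ignore files (e.g. Dockerfile.dockerignore) are not Dockerfiles
	if (manifestName.endsWith('.dockerignore')) {
		return false
	}
	// Matching stays case-sensitive, like the current exact-name check
	return DOCKERFILE_NAME.test(manifestName)
}
```

A pattern this loose also accepts names such as `Dockerfile.md`. Whether to keep a deny list for those is left open.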

Comment on lines +39 to +53
export function parseFromImage(manifestContent) {
	const lines = manifestContent.split(/\r?\n/)
	let lastFrom = null
	for (const line of lines) {
		const trimmed = line.trim()
		if (/^FROM\s+/i.test(trimmed)) {
			// Extract image ref: FROM [--flag=val ...] image [AS name]
			const tokens = trimmed.replace(/^FROM\s+/i, '').split(/\s+/)
			// Skip all leading --flag tokens (e.g. --platform=linux/amd64)
			let i = 0
			while (i < tokens.length && tokens[i].startsWith('--')) {
				i++
			}
			lastFrom = tokens[i] || null
		}

Member


It would be nice to use a proper parser. We already use tree-sitter in some places in this repo, and there's a Dockerfile/Containerfile parser for it here: https://github.com/wharflab/tree-sitter-containerfile

Our lack of proper parsers in the Java client is already a bit problematic, so if we can continue the trend of using parsers, at least in the JavaScript client, that'd be great.

Copy link
Copy Markdown
Collaborator


@a-oren make sure the use of parsers is preferred in the conventions file, especially where a parser such as tree-sitter is already used elsewhere in the repo.
