feat(providers): add Dockerfile/Containerfile provider for image analysis#569
feat(providers): add Dockerfile/Containerfile provider for image analysis#569a-oren wants to merge 3 commits into
Conversation
…ysis Add a new provider that recognizes Dockerfile and Containerfile manifests for component and stack analysis. The provider parses FROM lines to extract the base image reference, then reuses generateImageSBOM to produce a CycloneDX SBOM. Multi-stage Dockerfiles use the final stage's FROM image. Implements TC-4937 Assisted-by: Claude Code
Reviewer's GuideAdds a new OCI Dockerfile/Containerfile provider that parses FROM lines to derive the base image, generates an image SBOM via existing OCI utilities, wires it into the provider registry, and introduces unit tests around support detection, parsing, and behavior. Sequence diagram for Dockerfile provider image SBOM generationsequenceDiagram
actor Client
participant dockerfileProvider
participant fs
participant parseFromImage
participant parseImageRef
participant generateImageSBOM
Client->>dockerfileProvider: provideComponent(manifest, opts)
dockerfileProvider->>dockerfileProvider: getImageSBOM(manifest, opts)
dockerfileProvider->>fs: readFileSync(manifest, utf-8)
fs-->>dockerfileProvider: manifestContent
dockerfileProvider->>parseFromImage: parseFromImage(manifestContent)
parseFromImage-->>dockerfileProvider: image
dockerfileProvider->>parseImageRef: parseImageRef(image, opts)
parseImageRef-->>dockerfileProvider: imageRef
dockerfileProvider->>generateImageSBOM: generateImageSBOM(imageRef, opts)
generateImageSBOM-->>dockerfileProvider: sbom
dockerfileProvider-->>Client: {ecosystem: oci, content, contentType}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The
parseFromImageimplementation assumes a relatively simpleFROMsyntax and only strips a single leading--flag; consider making this parsing more robust (e.g., handling multiple flags, comments, and more complex whitespace) or delegating to a Dockerfile parser to avoid mis-extracting the base image. - In
getImageSBOM, synchronousfs.readFileSyncis used; if other providers are asynchronous or this runs in a performance-sensitive path, consider aligning with the existing I/O pattern to avoid blocking the event loop on large Dockerfiles.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `parseFromImage` implementation assumes a relatively simple `FROM` syntax and only strips a single leading `--flag`; consider making this parsing more robust (e.g., handling multiple flags, comments, and more complex whitespace) or delegating to a Dockerfile parser to avoid mis-extracting the base image.
- In `getImageSBOM`, synchronous `fs.readFileSync` is used; if other providers are asynchronous or this runs in a performance-sensitive path, consider aligning with the existing I/O pattern to avoid blocking the event loop on large Dockerfiles.
## Individual Comments
### Comment 1
<location path="src/providers/oci_dockerfile.js" line_range="39-44" />
<code_context>
+ * @returns {string} the image reference from the last FROM line
+ * @throws {Error} when no FROM line is found in the Dockerfile
+ */
+export function parseFromImage(manifestContent) {
+ const lines = manifestContent.split(/\r?\n/)
+ let lastFrom = null
+ for (const line of lines) {
+ const trimmed = line.trim()
+ if (/^FROM\s+/i.test(trimmed)) {
+ // Extract image ref: FROM [--platform=...] image [AS name]
+ const withoutFrom = trimmed.replace(/^FROM\s+/i, '')
</code_context>
<issue_to_address>
**issue (bug_risk):** Dockerfiles that use ARG substitution in FROM lines may not be handled correctly.
This implementation only works when the FROM image is a literal value. In cases like `ARG BASE_IMAGE` followed by `FROM ${BASE_IMAGE}`, `parseFromImage` will return `${BASE_IMAGE}`, which will not be a valid reference for `parseImageRef`/`generateImageSBOM`. If these patterns should be supported, you’ll need to resolve ARGs (including defaults/env), or detect non-literal FROM targets and fail fast with a clear error.
</issue_to_address>
### Comment 2
<location path="src/providers/oci_dockerfile.js" line_range="44-52" />
<code_context>
+ if (/^FROM\s+/i.test(trimmed)) {
+ // Extract image ref: FROM [--platform=...] image [AS name]
+ const withoutFrom = trimmed.replace(/^FROM\s+/i, '')
+ // Skip optional --platform flag
+ const withoutFlags = withoutFrom.replace(/^--\S+\s+/, '')
+ // Take only the image part (before AS alias)
+ const parts = withoutFlags.split(/\s+/)
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Flag stripping on FROM lines only removes a single leading flag, which may miss valid Dockerfile syntax.
The Dockerfile syntax allows multiple flags before the image (e.g. `FROM --platform=$BUILDPLATFORM --some-flag image AS name`), but the current regex only removes a single leading `--...` token. With additional flags, `parts[0]` could still be a flag instead of the image. Consider stripping all leading `--...` tokens (e.g. in a loop) or splitting and filtering out `--*` tokens before selecting the image reference.
```suggestion
if (/^FROM\s+/i.test(trimmed)) {
// Extract image ref: FROM [--platform=...] [--flags...] image [AS name]
const withoutFrom = trimmed.replace(/^FROM\s+/i, '')
// Split into tokens and drop all leading flag tokens (starting with "--")
const tokens = withoutFrom.split(/\s+/).filter(Boolean)
let imageIndex = 0
while (imageIndex < tokens.length && tokens[imageIndex].startsWith('--')) {
imageIndex++
}
if (imageIndex < tokens.length) {
// The first non-flag token is the image; ignore any following AS alias
lastFrom = tokens[imageIndex]
}
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #569 +/- ##
==========================================
- Coverage 90.75% 90.66% -0.09%
==========================================
Files 36 37 +1
Lines 7766 7875 +109
Branches 1353 1367 +14
==========================================
+ Hits 7048 7140 +92
- Misses 718 735 +17
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
[sdlc-workflow/verify-pr] Re: @sourcery-ai[bot] review — Suggestion 1 (parseFromImage robustness): Classified as code change request (upgraded from suggestion) — covered by inline comment sub-tasks TC-4977 and TC-4978. Suggestion 2 (synchronous fs.readFileSync): Classified as suggestion — the project consistently uses |
Verification Report for TC-4937 (commit 905c336)
Overall: WARNTwo reviewer suggestions from sourcery-ai[bot] were upgraded to code change requests based on project conventions (CONVENTIONS.md §Error Handling):
Both are scoped improvements to the FROM line parser. All acceptance criteria pass and CI is green. This comment was AI-generated by sdlc-workflow/verify-pr v0.11.0. |
The parseFromImage function previously only stripped a single --flag token using a regex. Dockerfile syntax allows multiple flags before the image reference. Replace the single regex with a loop that skips all leading --flag tokens. TC-4977 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Assisted-by: Claude Code
When a Dockerfile uses ARG substitution in FROM lines (e.g.
FROM ${BASE_IMAGE}), parseFromImage now throws a clear error
instead of passing the unresolved variable to downstream functions.
TC-4978
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Assisted-by: Claude Code
Verification Report for TC-4937 (commit dd053df)
Overall: PASSAll review feedback from the prior verification has been addressed:
Both sub-tasks are Done. No new issues found. This comment was AI-generated by sdlc-workflow/verify-pr v0.11.0. |
| * @returns {boolean} true if the manifest is a Dockerfile or Containerfile | ||
| */ | ||
| function isSupported(manifestName) { | ||
| return manifestName === 'Dockerfile' || manifestName === 'Containerfile' |
There was a problem hiding this comment.
I've seen a lot of cases where people will have multiple Dockerfiles but with different suffixes (see https://sourcegraph.com/search?q=context:global+f:/Dockerfile%5C..*&patternType=keyword&case=yes&sm=0), so it would be good to support those as well imo
There was a problem hiding this comment.
I agree we should support suffixes
| export function parseFromImage(manifestContent) { | ||
| const lines = manifestContent.split(/\r?\n/) | ||
| let lastFrom = null | ||
| for (const line of lines) { | ||
| const trimmed = line.trim() | ||
| if (/^FROM\s+/i.test(trimmed)) { | ||
| // Extract image ref: FROM [--flag=val ...] image [AS name] | ||
| const tokens = trimmed.replace(/^FROM\s+/i, '').split(/\s+/) | ||
| // Skip all leading --flag tokens (e.g. --platform=linux/amd64) | ||
| let i = 0 | ||
| while (i < tokens.length && tokens[i].startsWith('--')) { | ||
| i++ | ||
| } | ||
| lastFrom = tokens[i] || null | ||
| } |
There was a problem hiding this comment.
It would be nice to use a proper parser, we already use tree-sitter in some places in this repo and theres a dockerfile/containerfile parser for it here: https://github.com/wharflab/tree-sitter-containerfile
Our lack of using proper parsers in the java client is already a bit problematic, so if we can continue the trend of using parsers at least in the javascript client, thatd be great
There was a problem hiding this comment.
@a-oren make sure the use of parsers is preferred in the conventions file, specially if it is already used in other places like tree-sitter
Summary
oci_dockerfile.jsprovider that recognizesDockerfileandContainerfilemanifests for component and stack analysisgenerateImageSBOMto produce a CycloneDX SBOMisSupported,validateLockFile,readLicenseFromManifest,packageManagerName, and FROM line parsing--flagtokens from FROM lines, not just the first (TC-4977)Implements TC-4937
Test plan
isSupportedreturns true forDockerfileandContainerfile, false for othersvalidateLockFilealways returns truereadLicenseFromManifestreturns null🤖 Generated with Claude Code