Skip to content

Make artifact path matching match the official runner#126

Merged
sjoerdsimons merged 1 commit into
mainfrom
wip/refi64/artifact-paths
Jun 23, 2026
Merged

Make artifact path matching match the official runner#126
sjoerdsimons merged 1 commit into
mainfrom
wip/refi64/artifact-paths

Conversation

@refi64

@refi64 refi64 commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Older versions of gitlab-runner did not filter uploaded artifacts at all, regardless of the value of artifacts the user gave. In commit eccfdbc, artifact filtering was added, by treating artifacts entries as glob patterns and matching them against the artifact paths the runner instances proposed. However, gitlab-runner itself instead uses doublestar to traverse the current tree for artifacts, which has a few notable implications:

  • Paths are normalized first with filepath.Clean & filepath.ToSlash:

    https://github.com/bmatcuk/doublestar/blob/a9ad9e0ef4d6b7e4443090e9a7201d847a881711/utils.go#L74-L80

    So file and ./file// are both the same pattern and will match file. (We don't care too much about ToSlash for now, since that should only have an effect for the Windows path separator).

  • If a pattern matches a directory, the full subtree is added to the archive.

  • Wildcards * and ? retain the usual filesystem-level semantics of not crossing path components.

These behaviors do not line up with the current glob matches, which will only do an exact match on the full artifact path with the default glob options (require_literal_separator: false). The result is:

  • All globs are effectively recursive globs, so a/b/c will be matched by any of a/*, *b*, a?b?c.
  • You need to add an explicit trailing wildcard /* to directories, otherwise none of their contents get included.
    • e.g. dir or dir/ will not match dir/file, as it will be looking for full paths named dir or dir/ (including the trailing slash, due to the lack of normalization).

To fix this, both the declared artifact path and UploadableFile's path (for consistency) are normalized, and then the patterns are matched on successive ancestors of the current path, while telling glob that it needs to give the / special treatment.

Technically this leaves some other gitlab-runner behavior on the table, such as support for brace expansion, or the aforementioned Windows separator support, or handling absolute paths:

https://gitlab.com/gitlab-org/gitlab-runner/-/blob/fc6a8943d99928a9d293186be03674c333fd09fa/commands/helpers/file_archiver.go#L191

But that's all leaning more towards fringe functionality that we probably don't care about here (at least not for now).

@sjoerdsimons

Copy link
Copy Markdown
Collaborator

Is it broken or do you mean it didn't match gitlab's runner behaviour? If the former can you point to the commit that introduced the brokeness. In the latter case apart from the bits you documented is this change the behaviour of the runner in other ways that might impact users?

@refi64 refi64 force-pushed the wip/refi64/artifact-paths branch from 1bfd08f to 7013664 Compare June 18, 2026 14:42
@refi64

refi64 commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator Author

I updated the commit message / PR body. I guess strictly speaking, it's not "broken" in that before we weren't doing any filtering at all, but I'd argue it's still "broken" in that we have a bunch of consumers of the runner (e.g. ci-package-builder) that actually give "correct" values for artifacts: anyway, except the syntax this crate was now using is very different and arguably nonsensical in context (* matching slashes?).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes artifact path filtering to better match GitLab Runner’s semantics by normalizing paths and matching patterns against ancestor directories with path-separator-aware globbing.

Changes:

  • Add artifact_path_matches() to normalize artifact patterns and match them against successive ancestors of each candidate file path using glob options that prevent wildcards from crossing path components.
  • Normalize UploadableFile paths before both filtering and uploading, aligning behavior with filepath.Clean-style normalization.
  • Update artifact-related integration tests to cover normalization (./test2), directory patterns (dir), and trailing slash handling (test/), and add a small unit test suite for the matcher.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.

File Description
gitlab-runner/src/run.rs Implements normalized, ancestor-aware artifact path matching and applies normalization to uploaded artifact file paths.
gitlab-runner/tests/artifacts.rs Expands tests to validate directory matching and normalization behavior in upload/download flows.
gitlab-runner/Cargo.toml Adds the normalize-path dependency used for lexical path normalization.
Cargo.lock Locks the new dependency.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread gitlab-runner/src/run.rs
@sjoerdsimons

Copy link
Copy Markdown
Collaborator

For reference the referred to commit is since the gitlab runner v0.0.8 crate release . So this has been in the lava gitlab runner since release v0.3.0, which is widely deployed and used. I didn't see complaints there but the artifact that is uses is somewaht minimal as well

The OBS gitlab runner never had a release so any kind of artifact filtering will be new for it. So wherever you say we weren't doing this or that it's specific to that runner ;)

@sjoerdsimons

Copy link
Copy Markdown
Collaborator

@copilot Please consider the changes this PR introduces and add a comment with an example table where there would be functional differences from a users point of view (iotw. where the wildcards now have a different meaning)

Copilot AI commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@copilot Please consider the changes this PR introduces and add a comment with an example table where there would be functional differences from a users point of view (iotw. where the wildcards now h...

Relative to the filtering added in v0.0.8, the user-visible differences are mainly the cases where */? previously ended up consuming / inside a later path component.

artifacts:paths entry candidate file before this PR with this PR
*.xml reports/junit.xml matched no match
dir/*.txt dir/sub/result.txt matched no match
a?b a/b matched no match
foo*bar foo/x/bar matched no match

There are also two non-wildcard behavior changes worth calling out:

  • ./test// now matches test because both sides are normalized first.
  • A pattern that matches a directory, e.g. dir, includes the whole subtree under it, so dir/test still matches.

@sjoerdsimons

Copy link
Copy Markdown
Collaborator

@copilot Please consider the changes this PR introduces and add a comment with an example table where there would be functional differences from a users point of view (iotw. where the wildcards now h...

Relative to the filtering added in v0.0.8, the user-visible differences are mainly the cases where */? previously ended up consuming / inside a later path component.

artifacts:paths entry candidate file before this PR with this PR
*.xml reports/junit.xml matched no match
dir/*.txt dir/sub/result.txt matched no match
a?b a/b matched no match
foo*bar foo/x/bar matched no match
There are also two non-wildcard behavior changes worth calling out:

  • ./test// now matches test because both sides are normalized first.
  • A pattern that matches a directory, e.g. dir, includes the whole subtree under it, so dir/test still matches.

@copilot please further extend this table with items that are unchanged and work in your non-wildcard behaviours as well

Copilot AI commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

@copilot Please consider the changes this PR introduces and add a comment with an example table where there would be functional differences from a users point of view (iotw. where the wildcards n...

artifacts:paths entry candidate file before this PR with this PR note
*.xml reports/junit.xml matched no match * no longer crosses /
dir/*.txt dir/sub/result.txt matched no match * no longer crosses /
a?b a/b matched no match ? no longer crosses /
foo*bar foo/x/bar matched no match * no longer crosses /
*.xml junit.xml matched matched unchanged wildcard case
dir/*.txt dir/result.txt matched matched unchanged wildcard case
a/**/c a/b/c matched matched unchanged doublestar-style case
test test matched matched unchanged literal case
./test// test no match matched normalized before matching
dir dir/test no match matched directory match now includes subtree

@sjoerdsimons

Copy link
Copy Markdown
Collaborator

So specifically lava-gitlab-runner only generates file named <lava job id>_{junit.xml,log.yaml} so its behaviour will be the same regardless. Meaning there is no practical impact for its users it's mainly the obs runner due it being out of date.

@sjoerdsimons

Copy link
Copy Markdown
Collaborator

For obs-gitlab-runner; it seems at least in the Collabora usage we primarily use the artifact path matching as part of the job generated by the generate-monitor command. I haven't seen a manual job yet, so that shouldn't cause regressions for at least our deployment.

Comment thread gitlab-runner/src/run.rs
Comment thread gitlab-runner/src/run.rs Outdated
Comment thread gitlab-runner/src/run.rs Outdated
Comment thread gitlab-runner/src/run.rs Outdated
@sjoerdsimons

Copy link
Copy Markdown
Collaborator

Now that i understand what's going on can you update your commit message a bit to say e.g. "Match gitlab-runner behaviour" or somesuch. As nothing was really broken it just didn't match.

Would also be nice to explain the impact a la the comparison table copilot create as that's a lot more clear (though your explanation as complete).

Also if you have a listing, don't add extra newlines, that makes it render in odd ways :)

@refi64 refi64 force-pushed the wip/refi64/artifact-paths branch from 7013664 to 292af12 Compare June 22, 2026 21:02
@refi64 refi64 changed the title Fix broken artifact path matching Make artifact path matching match the official runner Jun 22, 2026
@refi64 refi64 force-pushed the wip/refi64/artifact-paths branch from 292af12 to 93ddead Compare June 22, 2026 21:03
@refi64

refi64 commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator Author

Also if you have a listing, don't add extra newlines, that makes it render in odd ways :)

(I'm not really sure what this means?)

Older versions of this crate did not filter uploaded artifacts at
all, regardless of the value of `artifacts` the user gave. In commit
eccfdbc, artifact filtering was added,
by treating `artifacts` entries as glob patterns and matching them
against the artifact paths the runner instances proposed. However,
gitlab-runner itself instead uses doublestar to traverse the current
tree for artifacts, which has a few notable implications:

- Paths are normalized first with filepath.Clean & filepath.ToSlash:
    https://github.com/bmatcuk/doublestar/blob/a9ad9e0ef4d6b7e4443090e9a7201d847a881711/utils.go#L74-L80
  So `file` and `./file//` are both the same pattern and will match
  `file`. (We don't care too much about ToSlash for now, since that
  should only have an effect for the Windows path separator).
- If a pattern matches a directory, the full subtree is added to the
  archive.
- Wildcards `*` and `?` retain the usual filesystem-level semantics of
  not crossing path components.

These behaviors do not line up with the current glob matches, which will
only do an exact match on the full artifact path with the default glob
options (`require_literal_separator: false`). The result is:

- All globs are effectively recursive globs, so `a/b/c` will be matched
  by any of `a/*`, `*b*`, `a?b?c`.
- You need to add an explicit trailing wildcard `/*` to directories,
  otherwise none of their contents get included.
  - e.g. `dir` or `dir/` will not match `dir/file`, as it will be
    looking for full paths named `dir` or `dir/` (including the trailing
    slash, due to the lack of normalization).

To fix this, both the declared artifact path and UploadableFile's path
(for consistency) are normalized, and then the patterns are matched on
successive ancestors of the current path, while telling `glob` that it
needs to give the `/` special treatment.

Technically this leaves some other behavior on the table, such as
support for brace expansion, or the aforementioned Windows separator
support, or handling absolute paths:

https://gitlab.com/gitlab-org/gitlab-runner/-/blob/fc6a8943d99928a9d293186be03674c333fd09fa/commands/helpers/file_archiver.go#L191

But that's all leaning more towards fringe functionality that we
probably don't care about here (at least not for now).
@sjoerdsimons sjoerdsimons force-pushed the wip/refi64/artifact-paths branch from 93ddead to e629737 Compare June 23, 2026 06:58
@sjoerdsimons sjoerdsimons added this pull request to the merge queue Jun 23, 2026
Merged via the queue into main with commit 0d59702 Jun 23, 2026
21 checks passed
refi64 added a commit to collabora/obs-gitlab-runner that referenced this pull request Jun 23, 2026
This contains fixes for logs being blank in some GitLab versions:

collabora/gitlab-runner-rs#125

https://docs.gitlab.com/releases/patches/patch-release-gitlab-19-0-2-released/

as well as improved support for artifact paths that now matches GitLab's
usual semantics:

collabora/gitlab-runner-rs#126

Fixes #119.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants