Skip to content

sync: add --concurrency flag#5197

Open
igrekun wants to merge 4 commits intomainfrom
sync-max-concurrent-requests
Open

sync: add --concurrency flag#5197
igrekun wants to merge 4 commits intomainfrom
sync-max-concurrent-requests

Conversation

@igrekun
Copy link
Copy Markdown
Contributor

@igrekun igrekun commented May 6, 2026

Changes

Expose the previously-hardcoded request concurrency cap as a --concurrency flag on databricks sync and databricks bundle sync. Defaults to the existing limit (20) so behavior is unchanged when the flag is omitted; values < 1 are rejected.

The flag name matches databricks fs cp --concurrency (#4132).

Why

When uploading multiple files ~5mb size, high concurrency can trigger stream timeouts.

Tests

Expose the previously-hardcoded request concurrency cap as a flag on the
`databricks sync` command. Defaults to the existing limit (20) so behavior
is unchanged when the flag is omitted; values < 1 are rejected.

Co-authored-by: Isaac
@igrekun igrekun requested a review from pietern May 6, 2026 17:37
@igrekun igrekun temporarily deployed to test-trigger-is May 6, 2026 17:38 — with GitHub Actions Inactive
@igrekun igrekun temporarily deployed to test-trigger-is May 6, 2026 17:38 — with GitHub Actions Inactive
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

Approval status: pending

/acceptance/bundle/ - needs approval

Files: acceptance/bundle/help/bundle-sync/output.txt
Suggested: @pietern
Also eligible: @denik, @andrewnester, @janniklasrose, @shreyas-goenka, @anton-107, @lennartkats-db

/cmd/bundle/ - needs approval

Files: cmd/bundle/sync.go
Suggested: @pietern
Also eligible: @denik, @andrewnester, @janniklasrose, @shreyas-goenka, @anton-107, @lennartkats-db

/cmd/sync/ - needs approval

Files: cmd/sync/sync.go, cmd/sync/sync_test.go
Suggested: @simonfaltum
Also eligible: @mihaimitrea-db, @renaudhartert-db, @hectorcast-db, @parthban-db, @tanmay-db, @Divyansh-db, @tejaskochar-db, @chrisst, @rauchy

/libs/sync/ - needs approval

Files: libs/sync/sync.go, libs/sync/watchdog.go
Suggested: @simonfaltum
Also eligible: @mihaimitrea-db, @renaudhartert-db, @hectorcast-db, @parthban-db, @tanmay-db, @Divyansh-db, @tejaskochar-db, @chrisst, @rauchy

General files (require maintainer)

Files: NEXT_CHANGELOG.md, acceptance/cmd/sync-without-args/output.txt
Based on git history:

  • @pietern -- recent work in libs/sync/, ./, cmd/sync/

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

Copy link
Copy Markdown
Contributor

@pietern pietern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Small suggestion on naming the flag and vars.

Comment thread cmd/sync/sync.go Outdated
cmd.Flags().StringVar(&f.excludeFrom, "exclude-from", "", "file containing patterns to exclude from sync (one pattern per line)")
cmd.Flags().StringVar(&f.includeFrom, "include-from", "", "file containing patterns to include to sync (one pattern per line)")
cmd.Flags().BoolVar(&f.dryRun, "dry-run", false, "simulate sync execution without making actual changes")
cmd.Flags().IntVar(&f.maxConcurrentRequests, "max-concurrent-requests", sync.MaxRequestsInFlight, "maximum number of concurrent requests to the workspace")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is precedent for configuring concurrency in fs cp in #4132.

Please mirror the flag name, handling, etc, for consistency.

Comment thread libs/sync/sync.go Outdated

DryRun bool

MaxConcurrentRequests int
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be Concurrency.

@pietern
Copy link
Copy Markdown
Contributor

pietern commented May 7, 2026

@renaudhartert-db @simonfaltum FYI

Match the name of `fs cp --concurrency` (#4132) for consistency, and
expose the same flag on `databricks bundle sync` so both sync entry
points are tunable.

Co-authored-by: Isaac
@pietern pietern temporarily deployed to test-trigger-is May 7, 2026 09:00 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is May 7, 2026 09:00 — with GitHub Actions Inactive
@pietern pietern changed the title sync: add --max-concurrent-requests flag sync: add --concurrency flag May 7, 2026
@pietern pietern requested a review from simonfaltum May 7, 2026 09:03
@pietern
Copy link
Copy Markdown
Contributor

pietern commented May 7, 2026

@igrekun I applied the suggestions to the PR, please take a look and confirm this matches what you need.

Also includes the same flag to bundle sync to keep it consistent.

@pietern pietern temporarily deployed to test-trigger-is May 7, 2026 09:16 — with GitHub Actions Inactive
@pietern pietern temporarily deployed to test-trigger-is May 7, 2026 09:16 — with GitHub Actions Inactive
@simonfaltum
Copy link
Copy Markdown
Member

My concern is the validation order for databricks sync.

Right now PreRunE runs root.MustWorkspaceClient before RunE validates --concurrency, so invalid flag values can get masked by auth/config errors. I verified this with:

go run . --profile __missing_profile_for_review__ sync --concurrency 0 /tmp /Workspace/tmp

This returns the missing-profile error instead of --concurrency must be at least 1.

I think we should move f.validate() into PreRunE, before mustWorkspaceClient, similar to how fs cp validates its concurrency flag before constructing the workspace client. That makes the CLI reject invalid user input before doing auth setup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants