feat(coordinator): limit parallelism during chunk assignment #1797
Conversation
📝 Walkthrough

This PR bumps the version tag from v4.7.12 to v4.7.13 in the version file and introduces a global throttle mechanism with a maximum parallelism of 2 for debug_executionWitness processing in the prover task coordinator, using semaphores to limit concurrent universal-witness operations across two code paths.

Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 1
🧹 Nitpick comments (2)
coordinator/internal/logic/provertask/chunk_prover_task.go (2)
27-32: Consider making universal parallelism configurable.

Hard-coding `2` at package scope makes tuning production behavior require code changes. A config-backed value (with a sane default) would be easier to operate under varying prover capacity and API latency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@coordinator/internal/logic/provertask/chunk_prover_task.go` around lines 27 - 32, The package currently hard-codes applyUniversalMaxParallelism = 2 and constructs witnessSemaphore at package init; make this configurable by reading a config/env/flag value at startup and initializing applyUniversalMaxParallelism and witnessSemaphore there instead of at package scope. Specifically: add a config key (or env var/flag) for universal max parallelism with a sane default, validate it (ensure >=1), and create witnessSemaphore = make(chan struct{}, max) during application initialization (e.g., in an init hook or the coordinator startup/new constructor) so the semaphore capacity reflects the configured value; update any code that references applyUniversalMaxParallelism to use the configured variable.
212-215: Release the semaphore immediately after `applyUniversal` completes.

Line 215 defers release until `Assign` returns, so post-processing (DB insert/metrics/logging) still occupies a slot. Scope the acquire/release around `cp.applyUniversal` only to keep throughput predictable.

💡 Proposed refactor

```diff
 	if getTaskParameter.Universal {
 		var metadata []byte
-
-		select {
-		case witnessSemaphore <- struct{}{}:
-			// Released when Assign returns (defer).
-			defer func() { <-witnessSemaphore }()
-		case <-ctx.Done():
-			log.Warn("context canceled waiting for witness semaphore", "task_id", chunkTask.Hash, "err", ctx.Err())
-			cp.recoverActiveAttempts(ctx, chunkTask)
-			return nil, ctx.Err()
-		}
-
-		taskMsg, metadata, err = cp.applyUniversal(taskMsg)
+		taskMsg, metadata, err = func() (*coordinatorType.GetTaskSchema, []byte, error) {
+			select {
+			case witnessSemaphore <- struct{}{}:
+				defer func() { <-witnessSemaphore }()
+			case <-ctx.Done():
+				log.Warn("context canceled waiting for witness semaphore", "task_id", chunkTask.Hash, "err", ctx.Err())
+				if taskCtx.hasAssignedTask == nil {
+					cp.recoverActiveAttempts(ctx, chunkTask)
+				}
+				return nil, nil, ctx.Err()
+			}
+			return cp.applyUniversal(taskMsg)
+		}()
 		if err != nil {
 			cp.recoverActiveAttempts(ctx, chunkTask)
 			log.Error("Generate universal prover task failure", "task_id", chunkTask.Hash, "type", "chunk", "err", err)
 			return nil, ErrCoordinatorInternalFailure
 		}
 		proverTask.Metadata = metadata
 	}
```

Also applies to: 222-223
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@coordinator/internal/logic/provertask/chunk_prover_task.go` around lines 212 - 215, The semaphore acquire around the witness work currently holds the slot until Assign returns because the release is deferred; change the scope so witnessSemaphore is acquired before calling cp.applyUniversal and released immediately after cp.applyUniversal completes (before any DB inserts/metrics/logging or before calling Assign) instead of deferring until Assign returns; update both places that acquire witnessSemaphore (the block around cp.applyUniversal and the similar block later) so the defer is removed and a direct <-witnessSemaphore is executed right after cp.applyUniversal returns to limit semaphore occupancy to only the applyUniversal duration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@coordinator/internal/logic/provertask/chunk_prover_task.go`:
- Around line 217-219: The early return on context cancellation is
unconditionally calling cp.recoverActiveAttempts, which must only run when this
request performed the increment (the increment happens in the
taskCtx.hasAssignedTask == nil path). Update the context-cancel branch in
chunk_prover_task.go to check taskCtx.hasAssignedTask == nil (or another flag
that indicates this request incremented attempts) before calling
cp.recoverActiveAttempts(ctx, chunkTask) and before returning; leave the return
of ctx.Err() unchanged when no increment occurred. Ensure the guard uses the
same condition used where attempts are incremented so accounting stays
consistent.
---
Nitpick comments:
In `@coordinator/internal/logic/provertask/chunk_prover_task.go`:
- Around line 27-32: The package currently hard-codes
applyUniversalMaxParallelism = 2 and constructs witnessSemaphore at package
init; make this configurable by reading a config/env/flag value at startup and
initializing applyUniversalMaxParallelism and witnessSemaphore there instead of
at package scope. Specifically: add a config key (or env var/flag) for universal
max parallelism with a sane default, validate it (ensure >=1), and create
witnessSemaphore = make(chan struct{}, max) during application initialization
(e.g., in an init hook or the coordinator startup/new constructor) so the
semaphore capacity reflects the configured value; update any code that
references applyUniversalMaxParallelism to use the configured variable.
- Around line 212-215: The semaphore acquire around the witness work currently
holds the slot until Assign returns because the release is deferred; change the
scope so witnessSemaphore is acquired before calling cp.applyUniversal and
released immediately after cp.applyUniversal completes (before any DB
inserts/metrics/logging or before calling Assign) instead of deferring until
Assign returns; update both places that acquire witnessSemaphore (the block
around cp.applyUniversal and the similar block later) so the defer is removed
and a direct <-witnessSemaphore is executed right after cp.applyUniversal
returns to limit semaphore occupancy to only the applyUniversal duration.
```go
log.Warn("context canceled waiting for witness semaphore", "task_id", chunkTask.Hash, "err", ctx.Err())
cp.recoverActiveAttempts(ctx, chunkTask)
return nil, ctx.Err()
```
Guard active-attempt recovery to only run when this request incremented attempts.
Line 218 decrements active attempts unconditionally, but increments only happen in the `taskCtx.hasAssignedTask == nil` path. This can corrupt attempt accounting for already-assigned tasks.
💡 Proposed fix

```diff
 	case <-ctx.Done():
 		log.Warn("context canceled waiting for witness semaphore", "task_id", chunkTask.Hash, "err", ctx.Err())
-		cp.recoverActiveAttempts(ctx, chunkTask)
+		if taskCtx.hasAssignedTask == nil {
+			cp.recoverActiveAttempts(ctx, chunkTask)
+		}
 		return nil, ctx.Err()
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@coordinator/internal/logic/provertask/chunk_prover_task.go` around lines 217
- 219, The early return on context cancellation is unconditionally calling
cp.recoverActiveAttempts, which must only run when this request performed the
increment (the increment happens in the taskCtx.hasAssignedTask == nil path).
Update the context-cancel branch in chunk_prover_task.go to check
taskCtx.hasAssignedTask == nil (or another flag that indicates this request
incremented attempts) before calling cp.recoverActiveAttempts(ctx, chunkTask)
and before returning; leave the return of ctx.Err() unchanged when no increment
occurred. Ensure the guard uses the same condition used where attempts are
incremented so accounting stays consistent.
This seems to be a pre-existing issue on other code paths as well. Might this be a problem in practice? @georgehao
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
Codecov Report

❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff            @@
##           develop    #1797    +/-  ##
========================================
  Coverage    36.41%   36.42%
========================================
  Files          248      248
  Lines        21321    21330      +9
========================================
+ Hits          7765     7770      +5
- Misses       12733    12737      +4
  Partials       823      823
```

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Purpose or design rationale of this PR

`applyUniversal` calls `debug_executionWitness`, which degrades significantly under concurrent load. When multiple chunk proof assignments arrive simultaneously, the unthrottled parallel calls cause severe slowdowns.

This PR adds a semaphore (capacity 2) that limits concurrent `applyUniversal` invocations. Requests that arrive while both slots are occupied block until a slot opens or the request context is cancelled, preventing goroutine accumulation.

PR title
Your PR title must follow conventional commits (as we are doing squash merge for each PR), so it must start with one of the following types:
Deployment tag versioning
Has `tag` in `common/version.go` been updated, or have you added the `bump-version` label to this PR?
Does this PR have the `breaking-change` label?

Summary by CodeRabbit
Bug Fixes
Chores