@majiru majiru commented Jan 2, 2026

Ease issues when large numbers of pollers all time out together.

What changed?

Add jitter to the buffer used for poller context timeouts.

Why?

We were running into issues when large numbers of pollers all timed out together. The jitter should spread that load out a bit more.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

@majiru majiru requested review from a team as code owners January 2, 2026 23:09
// returned to the handler before a context timeout error is generated.
ctx, cancel := contextutil.WithDeadlineBuffer(ctx, pm.LongPollExpirationInterval(), returnEmptyTaskTimeBudget)
// We also want to jitter the timeout a bit to ease stampeding herd issues when the pollers all time out
// together. However, take care to only do this for non forwarded tasks.

Do you mean "non-forwarded polls"? Also, is the condition backwards? If pollMetadata.forwardedFrom == "", that's a poll from a client, so it should jitter; otherwise it shouldn't?

Also, I want to go over the logic again and test it. Most/all SDKs actually set the poll gRPC timeout to 70s, and LongPollExpirationInterval is 60s, so I think a "10s" buffer is a no-op, based on the current logic in WithDeadlineBuffer. I think we need to subtract it from LongPollExpirationInterval. See the comments in Slack.
