Skip to content

CBG-5205 test speed up: avoid sleep on db start#8094

Open
torcolvin wants to merge 4 commits intomainfrom
CBG-5205
Open

CBG-5205 test speed up: avoid sleep on db start#8094
torcolvin wants to merge 4 commits intomainfrom
CBG-5205

Conversation

@torcolvin
Copy link
Collaborator

CBG-5205 test speed up: avoid sleep on db start

When starting a database that has a non zero _sync:seq, the database will sleep 1.5 to wait for released sequences.
In a test environment, this occurs often when there is a single RestTester running an a database configuration
occurs.

Fixes a bug where if users or roles are specified on database configuration, they would allocate a sequence
before the initial sequence was computed, so this would always result in waiting.

Set DisableSequenceWaitOnDbRestart anywhere there was a single RestTester. Keep this value for ISGR tests that
actually use multiple RestTesters.

Remove LongRunningTest for any test that runs faster than 50ms. This affects 37 tests.

I think it might be worthwhile to drop the comments around the usage of DisableSequenceWaitOnDbRestart?

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Integration Tests

When starting a database that has a non zero _sync:seq, the database will sleep 1.5 to wait for released sequences.
In a test environment, this occurs often when there is a single RestTester running an a database configuration
occurs.

Fixes a bug where if users or roles are specified on database configuration, they would allocate a sequence
before the initial sequence was computed, so this would always result in waiting.

Set DisableSequenceWaitOnDbRestart anywhere there was a single RestTester. Keep this value for ISGR tests that
actually use multiple RestTesters.

Remove LongRunningTest for any test that runs faster than 50ms
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR speeds up integration tests by eliminating unnecessary 1.5-second sleeps when restarting a database. The sleeps were originally designed for multi-node deployments to ensure allocated sequences from other nodes had been assigned or released before starting the change cache. In single-node tests, this wait serves no purpose.

A secondary, related bug fix moves the _sync:seq capture before the principal installation step, so that sequences allocated by InstallPrincipals are not incorrectly counted as "pre-existing" sequences requiring a wait.

Changes:

  • Fix in db/database.go: Move lastSequence read before InstallPrincipals so that sequences allocated during principal creation don't erroneously trigger the release wait
  • Test optimization: Add db.DisableSequenceWaitOnDbRestart(t) to 30+ single-node tests, and remove base.LongRunningTest(t) from the 37 tests that now run faster

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
db/database.go Bug fix: reads _sync:seq before creating config principals so principal-allocated sequences don't inflate the initial sequence value
rest/persistent_config_test.go Adds DisableSequenceWaitOnDbRestart; accidentally removes SkipImportTestsIfNotEnabled from TestImportFilterEndpoint
rest/audit_test.go Moves DisableSequenceWaitOnDbRestart into the shared createAuditLoggingRestTester helper, removes redundant direct call in TestDatabaseAuditChanges
rest/bootstrap_test.go Adds DisableSequenceWaitOnDbRestart while keeping LongRunningTest
rest/adminapitest/admin_api_test.go Removes LongRunningTest from several tests; adds DisableSequenceWaitOnDbRestart to two tests
rest/upgradetest/upgrade_registry_test.go Removes LongRunningTest; adds DisableSequenceWaitOnDbRestart to upgrade tests
rest/diagnostic_doc_api_test.go Adds DisableSequenceWaitOnDbRestart to 4 single-node tests
rest/diagnostic_api_test.go Removes LongRunningTest; adds DisableSequenceWaitOnDbRestart
rest/replicatortest/replicator_test.go Adds DisableSequenceWaitOnDbRestart to TestDBReplicationStatsTeardown; removes LongRunningTest from TestBlipSyncNonUpgradableConnection
rest/importtest/collections_import_test.go Adds DisableSequenceWaitOnDbRestart to 3 multi-collection import tests
rest/legacy_rev_test.go Adds DisableSequenceWaitOnDbRestart to TestResyncLegacyRev
rest/api_collections_test.go Removes LongRunningTest; adds DisableSequenceWaitOnDbRestart to 2 tests
rest/user_api_test.go Removes LongRunningTest from 2 tests; adds DisableSequenceWaitOnDbRestart to 2 others
rest/cors_test.go, rest/sync_fn_test.go, rest/oidc_api_test.go, rest/server_context_test.go Remove LongRunningTest; add DisableSequenceWaitOnDbRestart
rest/access_test.go, rest/api_test.go, rest/bytes_read_public_api_test.go, rest/blip_api_attachment_test.go, rest/config_test.go Remove LongRunningTest only
rest/functionsapitest/user_functions_queries_test.go Removes LongRunningTest only

Comment on lines 341 to +345
func TestImportFilterEndpoint(t *testing.T) {
base.LongRunningTest(t)

base.SkipImportTestsIfNotEnabled(t) // import tests don't work without xattrs
// speed up test by not sleeping for _sync:seq when database reloads
// this sleep is used for multiple Sync Gateway nodes starting up simultaneously, but this test is only running on a
// single node.
db.DisableSequenceWaitOnDbRestart(t)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The base.SkipImportTestsIfNotEnabled(t) call (which skips tests when xattrs are not enabled) was accidentally removed from TestImportFilterEndpoint. This test exercises document import functionality (it writes documents directly to the data store and expects them to be auto-imported), which requires xattrs to be enabled. Without this guard, the test will fail in environments where xattrs are not enabled.

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Comment on lines +2933 to +2935
// speed up test by not sleeping for _sync:seq when database reloads
// this sleep is used for multiple Sync Gateway nodes starting up simultaneously, but this test is only
// using a single node
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments throughout are definitely redundant as you mentioned.

This function name does a reasonable job at communicating the intention already, and the godoc comment has a good explanation of why. Maybe you can just improve the godoc comment if you feel like it needs it?

// speed up test by not sleeping for _sync:seq when database reloads
// this sleep is used for multiple Sync Gateway nodes starting up simultaneously, but this test is only using a
// single node
db.DisableSequenceWaitOnDbRestart(t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since single node is the "normal" perhaps making this the default behaviour for RestTester would be beneficial rather than introducing this into a large number of tests?

There is already an under-utilised framework in place for "multi-node" RestTesters (RestTesterCluster) which may be worth thinking about rolling out more widely if there are tests doing a weird DIY multi-node thing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could bypass this wait time without losing anything important in our tests. The places that set up multiple nodes all call addActiveRT. My understanding of the wait is that it helps if multiple nodes are starting up at exactly the same time.

If we wanted to test that behavior, we should probably do it in a way that is more robust and a faster polling window than incidental 1.5 seconds.

Comment on lines +2248 to +2256
// Get current value of _sync:seq
initialSequence, seqErr := db.sequences.lastSequence(ctx)
if seqErr != nil {
return seqErr
}
initialSequenceTime := time.Now()

base.InfofCtx(ctx, base.KeyCRUD, "Database has _sync:seq value on startup of %d", initialSequence)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am finding it difficult to reason about what reordering this code means in practice. Sequences reserved but not allocated during user creation below will no longer block the change cache from starting up even on a single node?

Is that intentional? I think you'd hit this situation if you had a bunch of users to create that forced sequence batching but didn't use a full batch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My read on this was that installing the user creation was entirely accidental to be this early and just a result of how StartOnlineProcess was created. I think I could move the code to InstallPrincipals to after the change cache starts up. Moving it in the middle caused this data race 7e9da5a which is how I ended up the sequence allocation to the middle.

I agree this is a weird way to do it, and I'm not fond of it. If we have DisableSequenceWaitOnDbRestart running all the times, I don't think we have to close this gap.

One thing that makes the principal creation odd is that they are only modified if they don't already exist (except for guest user) as per

err = base.HTTPErrorf(http.StatusConflict, "Already exists")
so they rarely would ever actually be created on multiple nodes simultaneously, since this only occurs once on the initial creation of the database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants