Conversation
When starting a database that has a non zero _sync:seq, the database will sleep 1.5 to wait for released sequences. In a test environment, this occurs often when there is a single RestTester running an a database configuration occurs. Fixes a bug where if users or roles are specified on database configuration, they would allocate a sequence before the initial sequence was computed, so this would always result in waiting. Set DisableSequenceWaitOnDbRestart anywhere there was a single RestTester. Keep this value for ISGR tests that actually use multiple RestTesters. Remove LongRunningTest for any test that runs faster than 50ms
There was a problem hiding this comment.
Pull request overview
This PR speeds up integration tests by eliminating unnecessary 1.5-second sleeps when restarting a database. The sleeps were originally designed for multi-node deployments to ensure allocated sequences from other nodes had been assigned or released before starting the change cache. In single-node tests, this wait serves no purpose.
A secondary, related bug fix moves the _sync:seq capture before the principal installation step, so that sequences allocated by InstallPrincipals are not incorrectly counted as "pre-existing" sequences requiring a wait.
Changes:
- Fix in
db/database.go: MovelastSequenceread beforeInstallPrincipalsso that sequences allocated during principal creation don't erroneously trigger the release wait - Test optimization: Add
db.DisableSequenceWaitOnDbRestart(t)to 30+ single-node tests, and removebase.LongRunningTest(t)from the 37 tests that now run faster
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
db/database.go |
Bug fix: reads _sync:seq before creating config principals so principal-allocated sequences don't inflate the initial sequence value |
rest/persistent_config_test.go |
Adds DisableSequenceWaitOnDbRestart; accidentally removes SkipImportTestsIfNotEnabled from TestImportFilterEndpoint |
rest/audit_test.go |
Moves DisableSequenceWaitOnDbRestart into the shared createAuditLoggingRestTester helper, removes redundant direct call in TestDatabaseAuditChanges |
rest/bootstrap_test.go |
Adds DisableSequenceWaitOnDbRestart while keeping LongRunningTest |
rest/adminapitest/admin_api_test.go |
Removes LongRunningTest from several tests; adds DisableSequenceWaitOnDbRestart to two tests |
rest/upgradetest/upgrade_registry_test.go |
Removes LongRunningTest; adds DisableSequenceWaitOnDbRestart to upgrade tests |
rest/diagnostic_doc_api_test.go |
Adds DisableSequenceWaitOnDbRestart to 4 single-node tests |
rest/diagnostic_api_test.go |
Removes LongRunningTest; adds DisableSequenceWaitOnDbRestart |
rest/replicatortest/replicator_test.go |
Adds DisableSequenceWaitOnDbRestart to TestDBReplicationStatsTeardown; removes LongRunningTest from TestBlipSyncNonUpgradableConnection |
rest/importtest/collections_import_test.go |
Adds DisableSequenceWaitOnDbRestart to 3 multi-collection import tests |
rest/legacy_rev_test.go |
Adds DisableSequenceWaitOnDbRestart to TestResyncLegacyRev |
rest/api_collections_test.go |
Removes LongRunningTest; adds DisableSequenceWaitOnDbRestart to 2 tests |
rest/user_api_test.go |
Removes LongRunningTest from 2 tests; adds DisableSequenceWaitOnDbRestart to 2 others |
rest/cors_test.go, rest/sync_fn_test.go, rest/oidc_api_test.go, rest/server_context_test.go |
Remove LongRunningTest; add DisableSequenceWaitOnDbRestart |
rest/access_test.go, rest/api_test.go, rest/bytes_read_public_api_test.go, rest/blip_api_attachment_test.go, rest/config_test.go |
Remove LongRunningTest only |
rest/functionsapitest/user_functions_queries_test.go |
Removes LongRunningTest only |
| func TestImportFilterEndpoint(t *testing.T) { | ||
| base.LongRunningTest(t) | ||
|
|
||
| base.SkipImportTestsIfNotEnabled(t) // import tests don't work without xattrs | ||
| // speed up test by not sleeping for _sync:seq when database reloads | ||
| // this sleep is used for multiple Sync Gateway nodes starting up simultaneously, but this test is only running on a | ||
| // single node. | ||
| db.DisableSequenceWaitOnDbRestart(t) |
There was a problem hiding this comment.
The base.SkipImportTestsIfNotEnabled(t) call (which skips tests when xattrs are not enabled) was accidentally removed from TestImportFilterEndpoint. This test exercises document import functionality (it writes documents directly to the data store and expects them to be auto-imported), which requires xattrs to be enabled. Without this guard, the test will fail in environments where xattrs are not enabled.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
| // speed up test by not sleeping for _sync:seq when database reloads | ||
| // this sleep is used for multiple Sync Gateway nodes starting up simultaneously, but this test is only | ||
| // using a single node |
There was a problem hiding this comment.
These comments throughout are definitely redundant as you mentioned.
This function name does a reasonable job at communicating the intention already, and the godoc comment has a good explanation of why. Maybe you can just improve the godoc comment if you feel like it needs it?
| // speed up test by not sleeping for _sync:seq when database reloads | ||
| // this sleep is used for multiple Sync Gateway nodes starting up simultaneously, but this test is only using a | ||
| // single node | ||
| db.DisableSequenceWaitOnDbRestart(t) |
There was a problem hiding this comment.
Since single node is the "normal" perhaps making this the default behaviour for RestTester would be beneficial rather than introducing this into a large number of tests?
There is already an under-utilised framework in place for "multi-node" RestTesters (RestTesterCluster) which may be worth thinking about rolling out more widely if there are tests doing a weird DIY multi-node thing?
There was a problem hiding this comment.
I think we could bypass this wait time without losing anything important in our tests. The places that set up multiple nodes all call addActiveRT. My understanding of the wait is that it helps if multiple nodes are starting up at exactly the same time.
If we wanted to test that behavior, we should probably do it in a way that is more robust and a faster polling window than incidental 1.5 seconds.
| // Get current value of _sync:seq | ||
| initialSequence, seqErr := db.sequences.lastSequence(ctx) | ||
| if seqErr != nil { | ||
| return seqErr | ||
| } | ||
| initialSequenceTime := time.Now() | ||
|
|
||
| base.InfofCtx(ctx, base.KeyCRUD, "Database has _sync:seq value on startup of %d", initialSequence) | ||
|
|
There was a problem hiding this comment.
I am finding it difficult to reason about what reordering this code means in practice. Sequences reserved but not allocated during user creation below will no longer block the change cache from starting up even on a single node?
Is that intentional? I think you'd hit this situation if you had a bunch of users to create that forced sequence batching but didn't use a full batch.
There was a problem hiding this comment.
My read on this was that installing the user creation was entirely accidental to be this early and just a result of how StartOnlineProcess was created. I think I could move the code to InstallPrincipals to after the change cache starts up. Moving it in the middle caused this data race 7e9da5a which is how I ended up the sequence allocation to the middle.
I agree this is a weird way to do it, and I'm not fond of it. If we have DisableSequenceWaitOnDbRestart running all the times, I don't think we have to close this gap.
One thing that makes the principal creation odd is that they are only modified if they don't already exist (except for guest user) as per
Line 92 in 39636bd
CBG-5205 test speed up: avoid sleep on db start
When starting a database that has a non zero _sync:seq, the database will sleep 1.5 to wait for released sequences.
In a test environment, this occurs often when there is a single RestTester running an a database configuration
occurs.
Fixes a bug where if users or roles are specified on database configuration, they would allocate a sequence
before the initial sequence was computed, so this would always result in waiting.
Set DisableSequenceWaitOnDbRestart anywhere there was a single RestTester. Keep this value for ISGR tests that
actually use multiple RestTesters.
Remove LongRunningTest for any test that runs faster than 50ms. This affects 37 tests.
I think it might be worthwhile to drop the comments around the usage of DisableSequenceWaitOnDbRestart?
Pre-review checklist
fmt.Print,log.Print, ...)base.UD(docID),base.MD(dbName))docs/apiIntegration Tests
GSI=true,xattrs=truehttps://jenkins.sgwdev.com/job/SyncGatewayIntegration/322/