CBG-4651: change _online/_offline endpoints to work for both persistent and non persistent config#8093
Conversation
There was a problem hiding this comment.
Pull request overview
Aligns the admin /{db}/_online and /{db}/_offline endpoints so they behave consistently for both persistent and non-persistent config modes by driving the state transition via config changes + database reload.
Changes:
- Reworked admin
_online/_offlinehandlers to update config and reload the DB for both persistent and non-persistent modes; removed older helper methods that implemented online/offline transitions elsewhere. - Updated RestTester helpers and several tests to use
_online/_offlineuniformly. - Added/updated tests to verify disconnect behavior (BLIP) and config stability across offline/online transitions.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
rest/admin_api.go |
Implements new _online/_offline behavior by mutating config and reloading DBs in both modes. |
rest/utilities_testing_resttester.go |
Simplifies TakeDbOffline/TakeDbOnline helpers to always call _offline/_online. |
rest/server_context.go |
Removes ServerContext.TakeDbOnline helper now superseded by handler logic. |
db/database.go |
Removes DatabaseContext/Database.TakeDbOffline implementation now superseded by reload-based flow. |
rest/server_context_test.go |
Updates tests to bring DB online via endpoint and adjusts startup-failure test setup. |
rest/blip_api_crud_test.go |
Adds BLIP test asserting clients disconnect when DB is taken offline. |
rest/adminapitest/admin_api_test.go |
Updates expectations and adds coverage around offline/online endpoint behavior across config modes. |
torcolvin
left a comment
There was a problem hiding this comment.
The API docs will need to be updated as noted below and possibly also the public facing document.
Redocly previews |
rest/admin_api.go
Outdated
| h.server._databasesLock.Lock() | ||
| defer h.server._databasesLock.Unlock() |
There was a problem hiding this comment.
I think you actually want to lock this at the top of this function before checking if atomic.CompareAndSwapUint32(&h.db.State, db.DBOffline, db.DBStarting). That way if this function is called while a config update you will not lose the config update if it "wins".
If this is the correct behavior that's OK but I want to think about it and document why this is OK. This is one of the reasons that deduplicating these functions would be good.
There was a problem hiding this comment.
I don't think this needs to move? This lock I'm sure if for the databases maps on server context.
rest/admin_api.go
Outdated
|
|
||
| _ = base.JSONUnmarshal(body, &input) | ||
|
|
||
| contextNoCancel := base.NewNonCancelCtx() |
There was a problem hiding this comment.
I think this drops the db from the context.
I think we should keep db on the context so we don't have to worry about logging the database name everywhere. You could also drop all the context like above but then you want to make sure all logging will have database name.
IMO, I'd be fine with keeping all the context information including the correlation ID so we know what triggered this database update, vs bootstrap polling. This might be confusing because the request will be done, but some of the context will live on. I think this would depart from the way other code works so I'm fine just preserving the db name. If you did go with this approach ctx.WithoutCancel will drop the cancellations from any context.
There was a problem hiding this comment.
Good catch, there is a few areas of code using this so i've filed this https://jira.issues.couchbase.com/browse/CBG-5212 to address this.
rest/admin_api.go
Outdated
| if err := bucketDbConfig.validateConfigUpdate(contextNoCancel.Ctx, oldConfig, false); err != nil { | ||
| return nil, base.NewHTTPError(http.StatusBadRequest, err.Error()) | ||
| } | ||
|
|
||
| bucketDbConfig.Version, err = GenerateDatabaseConfigVersionID(ctx.Ctx, bucketDbConfig.Version, &bucketDbConfig.DbConfig) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| bucketDbConfig.SGVersion = base.ProductVersion.String() | ||
| updatedDbConfig = bucketDbConfig | ||
| return bucketDbConfig, nil | ||
| }) | ||
| if err != nil { | ||
| base.WarnfCtx(ctx.Ctx, "database reload failed during db online, Err: %v", err) | ||
| return |
There was a problem hiding this comment.
I wonder if this code should be its own function. This already has drift between other places that update config:
- handlePutCollectionConfigSync
- handleDeleteCollectionConfigSync
- handleDeleteCollectionConfigImportFilter
- handlePutCollectionConfigImportFilter
There seems like there are three steps:
Several steps:
- In
UpdateConfig:
i. Use the bucketconfig to update with one new value. This is going to unique to every function
ii. validate config update and produce a new bucketDbConfig (this should match for all functions) - After the config has been updated, call setup and reload the configuration
- If steps 1 and 2 succeed, then audit with the appropriate logging.
There was a problem hiding this comment.
I have refactored non persistent config to reduce deuplciated code. I have also moved the code after UpdateConfig into shared function called updateConfigAndReloadDatabase. I feel like the update callback itself has some subtle differences between uses in other functions, If we want to consolidate this code I think this should be a different ticket given that seems a bit out of scope for this as it think it needs some thought on how to best to do this.
| }) | ||
| } | ||
|
|
||
| func TestBlipDisconnectOnDbOffline(t *testing.T) { |
There was a problem hiding this comment.
nit:
I don't know I need these tests, I think they were useful as a proof of concept. I'm fine to leave them too.
I'm wary of too many tests that don't provide utility. If I did keep them I'd probably do something like TestNonBlockingContinuousConnectionsDbOffline.
I think mostly these prove that I don't think AccessLock is doing what it was originally doing. I'd also be inclined to remove AccessLock entirely. I think that it existed at a time when validateAndWriteHeaders didn't exist yet and AccessLock actually blocked when the handlers were running.
There was a problem hiding this comment.
I feel like this is good to catch regressions if offline doesn't close blip connections.
| @@ -466,26 +466,15 @@ func (rt *RestTester) PersistDbConfigToBucket(dbConfig DbConfig, bucketName stri | |||
|
|
|||
| // TakeDbOffline takes the database offline. | |||
There was a problem hiding this comment.
| // TakeDbOffline takes the database offline. | |
| // TakeDbOffline takes the database offline. This request is asynchronous. |
I'd be somewhat tempted to make this code synchronous, we actually rely on this being synchronous for each test that relies on resync.
sync_gateway/rest/adminapitest/resync_test.go
Line 198 in 39636bd
Are there any cases where we are trying to test its asynchronous behavior? If so, consider if this is more or fewer than rely on synchronous behavior.
There was a problem hiding this comment.
Do you mean online?
…nt config and non-persistent config
bbrks
left a comment
There was a problem hiding this comment.
A few things - mostly around context usage
commit 725b533 Author: Tor Colvin <tor.colvin@couchbase.com> Date: Thu Mar 19 10:56:25 2026 -0400 CBG-5192 add some log fixes for RTE errors (#8129) Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> commit 99040a0 Author: Tor Colvin <tor.colvin@couchbase.com> Date: Wed Mar 18 13:52:07 2026 -0400 Create sharded dcp options struct (#8112) commit f78dcc5 Author: Tor Colvin <tor.colvin@couchbase.com> Date: Tue Mar 17 17:10:49 2026 -0400 Add factory folder to gitignore (#8114) commit 90a39e6 Author: Tor Colvin <tor.colvin@couchbase.com> Date: Mon Mar 16 10:46:50 2026 -0400 Setup 4.0.4/3.3.4 builds (#8106) commit 50f8ed8 Author: Ben Brooks <ben.brooks@couchbase.com> Date: Mon Mar 16 14:16:43 2026 +0000 CBG-5203: Silently handle PING BLIP messages for cbl-js (#8101) Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> commit ca1a6a9 Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon Mar 16 09:15:29 2026 -0400 Bump golang.org/x/net from 0.51.0 to 0.52.0 (#8105) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> commit e2e5a9c Author: Ben Brooks <ben.brooks@couchbase.com> Date: Fri Mar 13 13:46:01 2026 +0000 Test: Add a timeout in ...Delta...BypassRevCache...WhenInFlightRevChanged (#8102) commit c7da947 Author: Gregory Newman-Smith <109068393+gregns1@users.noreply.github.com> Date: Fri Mar 13 13:45:49 2026 +0000 CBG-4651: change _online/_offline endpoints to work for both persistent and non persistent config (#8093) Merge branch 'main' of github.com:couchbase/sync_gateway into CBG-5153
CBG-4651
Pre-review checklist
fmt.Print,log.Print, ...)base.UD(docID),base.MD(dbName))docs/apiDependencies (if applicable)
Integration Tests
GSI=true,xattrs=truehttps://jenkins.sgwdev.com/job/SyncGatewayIntegration/318/