Skip to content

CBG-4651: change _online/_offline endpoints to work for both persistent and non persistent config#8093

Merged
bbrks merged 8 commits intomainfrom
CBG-4651
Mar 13, 2026
Merged

CBG-4651: change _online/_offline endpoints to work for both persistent and non persistent config#8093
bbrks merged 8 commits intomainfrom
CBG-4651

Conversation

@gregns1
Copy link
Copy Markdown
Contributor

@gregns1 gregns1 commented Mar 4, 2026

CBG-4651

  • Change for online/offline endpoints to be aligned between persistent and non persistent config
  • Each endpoint will alter the db config to set offline true/false. Persistent config will persist this change to be picked up by other nodes, non persistent config will not
  • Both modes will disconnect clients as the db is reloaded for each mode

Pre-review checklist

  • Removed debug logging (fmt.Print, log.Print, ...)
  • Logging sensitive data? Make sure it's tagged (e.g. base.UD(docID), base.MD(dbName))
  • Updated relevant information in the API specifications (such as endpoint descriptions, schemas, ...) in docs/api

Dependencies (if applicable)

  • Link upstream PRs
  • Update Go module dependencies when merged

Integration Tests

Copilot AI review requested due to automatic review settings March 4, 2026 17:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Aligns the admin /{db}/_online and /{db}/_offline endpoints so they behave consistently for both persistent and non-persistent config modes by driving the state transition via config changes + database reload.

Changes:

  • Reworked admin _online/_offline handlers to update config and reload the DB for both persistent and non-persistent modes; removed older helper methods that implemented online/offline transitions elsewhere.
  • Updated RestTester helpers and several tests to use _online/_offline uniformly.
  • Added/updated tests to verify disconnect behavior (BLIP) and config stability across offline/online transitions.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
rest/admin_api.go Implements new _online/_offline behavior by mutating config and reloading DBs in both modes.
rest/utilities_testing_resttester.go Simplifies TakeDbOffline/TakeDbOnline helpers to always call _offline/_online.
rest/server_context.go Removes ServerContext.TakeDbOnline helper now superseded by handler logic.
db/database.go Removes DatabaseContext/Database.TakeDbOffline implementation now superseded by reload-based flow.
rest/server_context_test.go Updates tests to bring DB online via endpoint and adjusts startup-failure test setup.
rest/blip_api_crud_test.go Adds BLIP test asserting clients disconnect when DB is taken offline.
rest/adminapitest/admin_api_test.go Updates expectations and adds coverage around offline/online endpoint behavior across config modes.

@gregns1 gregns1 self-assigned this Mar 4, 2026
Copy link
Copy Markdown
Collaborator

@torcolvin torcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API docs will need to be updated as noted below and possibly also the public facing document.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 5, 2026

Redocly previews

@gregns1 gregns1 assigned bbrks and unassigned gregns1 Mar 5, 2026
@gregns1 gregns1 requested a review from bbrks March 5, 2026 11:34
@bbrks bbrks assigned gregns1 and unassigned bbrks Mar 5, 2026
@gregns1 gregns1 assigned torcolvin and unassigned gregns1 Mar 9, 2026
Comment on lines +312 to +313
h.server._databasesLock.Lock()
defer h.server._databasesLock.Unlock()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you actually want to lock this at the top of this function before checking if atomic.CompareAndSwapUint32(&h.db.State, db.DBOffline, db.DBStarting). That way if this function is called while a config update you will not lose the config update if it "wins".

If this is the correct behavior that's OK but I want to think about it and document why this is OK. This is one of the reasons that deduplicating these functions would be good.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to move? This lock I'm sure if for the databases maps on server context.


_ = base.JSONUnmarshal(body, &input)

contextNoCancel := base.NewNonCancelCtx()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this drops the db from the context.

I think we should keep db on the context so we don't have to worry about logging the database name everywhere. You could also drop all the context like above but then you want to make sure all logging will have database name.

IMO, I'd be fine with keeping all the context information including the correlation ID so we know what triggered this database update, vs bootstrap polling. This might be confusing because the request will be done, but some of the context will live on. I think this would depart from the way other code works so I'm fine just preserving the db name. If you did go with this approach ctx.WithoutCancel will drop the cancellations from any context.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, there is a few areas of code using this so i've filed this https://jira.issues.couchbase.com/browse/CBG-5212 to address this.

Comment on lines +286 to +301
if err := bucketDbConfig.validateConfigUpdate(contextNoCancel.Ctx, oldConfig, false); err != nil {
return nil, base.NewHTTPError(http.StatusBadRequest, err.Error())
}

bucketDbConfig.Version, err = GenerateDatabaseConfigVersionID(ctx.Ctx, bucketDbConfig.Version, &bucketDbConfig.DbConfig)
if err != nil {
return nil, err
}

bucketDbConfig.SGVersion = base.ProductVersion.String()
updatedDbConfig = bucketDbConfig
return bucketDbConfig, nil
})
if err != nil {
base.WarnfCtx(ctx.Ctx, "database reload failed during db online, Err: %v", err)
return
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this code should be its own function. This already has drift between other places that update config:

  • handlePutCollectionConfigSync
  • handleDeleteCollectionConfigSync
  • handleDeleteCollectionConfigImportFilter
  • handlePutCollectionConfigImportFilter

There seems like there are three steps:

Several steps:

  1. In UpdateConfig:
    i. Use the bucketconfig to update with one new value. This is going to unique to every function
    ii. validate config update and produce a new bucketDbConfig (this should match for all functions)
  2. After the config has been updated, call setup and reload the configuration
  3. If steps 1 and 2 succeed, then audit with the appropriate logging.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored non persistent config to reduce deuplciated code. I have also moved the code after UpdateConfig into shared function called updateConfigAndReloadDatabase. I feel like the update callback itself has some subtle differences between uses in other functions, If we want to consolidate this code I think this should be a different ticket given that seems a bit out of scope for this as it think it needs some thought on how to best to do this.

})
}

func TestBlipDisconnectOnDbOffline(t *testing.T) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

I don't know I need these tests, I think they were useful as a proof of concept. I'm fine to leave them too.

I'm wary of too many tests that don't provide utility. If I did keep them I'd probably do something like TestNonBlockingContinuousConnectionsDbOffline.

I think mostly these prove that I don't think AccessLock is doing what it was originally doing. I'd also be inclined to remove AccessLock entirely. I think that it existed at a time when validateAndWriteHeaders didn't exist yet and AccessLock actually blocked when the handlers were running.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is good to catch regressions if offline doesn't close blip connections.

@@ -466,26 +466,15 @@ func (rt *RestTester) PersistDbConfigToBucket(dbConfig DbConfig, bucketName stri

// TakeDbOffline takes the database offline.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// TakeDbOffline takes the database offline.
// TakeDbOffline takes the database offline. This request is asynchronous.

I'd be somewhat tempted to make this code synchronous, we actually rely on this being synchronous for each test that relies on resync.

Are there any cases where we are trying to test its asynchronous behavior? If so, consider if this is more or fewer than rely on synchronous behavior.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean online?

@torcolvin torcolvin assigned gregns1 and unassigned torcolvin Mar 9, 2026
Copy link
Copy Markdown
Member

@bbrks bbrks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things - mostly around context usage

@bbrks bbrks assigned gregns1 and unassigned bbrks Mar 12, 2026
@gregns1 gregns1 assigned bbrks and unassigned gregns1 Mar 13, 2026
@bbrks bbrks merged commit c7da947 into main Mar 13, 2026
53 checks passed
@bbrks bbrks deleted the CBG-4651 branch March 13, 2026 13:45
RIT3shSapata added a commit that referenced this pull request Mar 20, 2026
commit 725b533
Author: Tor Colvin <tor.colvin@couchbase.com>
Date:   Thu Mar 19 10:56:25 2026 -0400

    CBG-5192 add some log fixes for RTE errors (#8129)

    Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

commit 99040a0
Author: Tor Colvin <tor.colvin@couchbase.com>
Date:   Wed Mar 18 13:52:07 2026 -0400

    Create sharded dcp options struct (#8112)

commit f78dcc5
Author: Tor Colvin <tor.colvin@couchbase.com>
Date:   Tue Mar 17 17:10:49 2026 -0400

    Add factory folder to gitignore (#8114)

commit 90a39e6
Author: Tor Colvin <tor.colvin@couchbase.com>
Date:   Mon Mar 16 10:46:50 2026 -0400

    Setup 4.0.4/3.3.4 builds (#8106)

commit 50f8ed8
Author: Ben Brooks <ben.brooks@couchbase.com>
Date:   Mon Mar 16 14:16:43 2026 +0000

    CBG-5203: Silently handle PING BLIP messages for cbl-js (#8101)

    Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit ca1a6a9
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Mar 16 09:15:29 2026 -0400

    Bump golang.org/x/net from 0.51.0 to 0.52.0 (#8105)

    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit e2e5a9c
Author: Ben Brooks <ben.brooks@couchbase.com>
Date:   Fri Mar 13 13:46:01 2026 +0000

    Test: Add a timeout in ...Delta...BypassRevCache...WhenInFlightRevChanged (#8102)

commit c7da947
Author: Gregory Newman-Smith <109068393+gregns1@users.noreply.github.com>
Date:   Fri Mar 13 13:45:49 2026 +0000

    CBG-4651: change _online/_offline endpoints to work for both persistent and non persistent config (#8093)
Merge branch 'main' of github.com:couchbase/sync_gateway into CBG-5153
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants