docs(lakebase-autoscale): canonical psycopg_pool + OAuthConnection pattern #488

Open

dgokeeffe wants to merge 6 commits into databricks-solutions:main from dgokeeffe:feat/lakebase-canonical-pattern-clean

Conversation

@dgokeeffe

Summary

Restructures the databricks-lakebase-autoscale skill to lead with the canonical connection pattern from the official Databricks Apps + Lakebase tutorial, and adds an explicit framing of how the Python ecosystem fits together.

What changed

connection-patterns.md — reordered and expanded:

  • Pattern 1 (new, canonical): psycopg_pool.ConnectionPool + OAuthConnection subclass + max_lifetime=2700. Matches the official tutorial, the external app SDK guide, and databricks-ai-bridge. Zero background threads — rotation happens transparently via pool recycling.
  • Pattern 2 (demoted): the previous SQLAlchemy do_connect + asyncio.Task refresh pattern is now marked "alternative for apps already using SQLAlchemy async", with a note that it adds unnecessary operational complexity for the common case.
  • Patterns 3–4: direct psycopg.connect (scripts only) and static URL (local dev only) — unchanged in spirit, trimmed.
  • Added FastAPI variant (open=False + explicit lifespan).

SKILL.md — new up-front overview section:

  • Explicit "There is no separate Lakebase SDK for Python" framing — readers repeatedly ask this.
  • Cross-language table (Python / Node-TS / Java-Go) showing which SDK and DB driver to use.
  • Mention of @databricks/lakebase as the Node/TS convenience wrapper (Autoscaling-only).
  • "What NOT to do" list — most importantly flagging that WorkspaceClient().config.token is workspace-scoped and will fail at Postgres login. Must use generate_database_credential() for a Lakebase-scoped token.

Why

  • The old connection-patterns.md led with a SQLAlchemy + background-refresh loop, which works but is not what the official tutorial or reference implementations use.
  • The config.token vs generate_database_credential() distinction was buried; it's the #1 cause of "my connection works locally but fails in prod" bugs.
  • max_lifetime=2700 vs the 3600 default was implicit; the new doc explains why the default creates a race condition.
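The race in the last bullet is plain arithmetic: the pool's default lifetime and the token TTL are both one hour, leaving zero safety margin. A quick check:

```python
TOKEN_TTL_S = 3600                # Lakebase OAuth tokens expire after 1 hour
DEFAULT_MAX_LIFETIME_S = 3600     # psycopg_pool's default max_lifetime
RECOMMENDED_MAX_LIFETIME_S = 2700

# Default: a connection can hit its recycling age exactly when its token
# expires, so an in-flight query may be carrying an already-expired token.
assert TOKEN_TTL_S - DEFAULT_MAX_LIFETIME_S == 0

# 2700 recycles connections 900 s (15 min) before the token expires.
buffer_s = TOKEN_TTL_S - RECOMMENDED_MAX_LIFETIME_S
assert buffer_s == 15 * 60
```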

Test plan

  • Read through both files end-to-end for accuracy
  • Verified the canonical pattern against the official tutorial URL
  • Verified max_lifetime=2700 rationale (15-min buffer before 1-hour expiry)
  • Cross-checked cross-language table with @databricks/lakebase README

This pull request and its description were written by Isaac.

Collaborator

@cankoklu-db left a comment


Sorry, have to fix this.

@cankoklu-db
Collaborator

Correcting my earlier comment — I cited /aws/en/oltp/instances/authentication (which is labeled Lakebase Provisioned in its breadcrumb) when this skill is for Lakebase Autoscaling specifically. Narrower set of observations that hold up against the Autoscaling tutorial, the external apps Autoscaling guide, and the Apps Lakebase resource doc:

1. Auto-injected env vars: 6, not 5
The Apps Lakebase resource doc explicitly lists PGAPPNAME, PGDATABASE, PGHOST, PGPORT, PGSSLMODE, PGUSER for the first database resource. The PR's app.yaml comment omits PGAPPNAME. The doc also notes only the first database resource gets auto-injected — multi-Lakebase apps need valueFrom: {resource: <key>, key: ...} for resource #2+.

2. @databricks/lakebase Autoscaling-only scope should be in the table cell, not just the prose
The README states explicitly: "NOT compatible with the Databricks Lakebase Provisioned". Source code calls /api/2.0/postgres/credentials (Autoscaling endpoint-resource API). Readers scanning the cross-language table for Provisioned guidance miss the caveat above the table. Either add a Scope column or change the cell to @databricks/lakebase (Autoscaling only).

3. Sharpen the open=True vs open=False rationale
The primary driver for FastAPI's open=False isn't just "fail-fast on startup" — open=True is deprecated for AsyncConnectionPool and becomes an error in psycopg 4.0 (still the sync default). databricks-ai-bridge follows the same sync-True / async-False split. One sentence in the doc would make the asymmetry principled rather than stylistic.

4. One-line comment on the psycopg3 pin

psycopg[binary,pool]>=3.1.0  # psycopg3 required — psycopg2.pool has no connection_class hook for the OAuthConnection pattern

The Autoscaling tutorial uses psycopg3 throughout. Calling out why helps readers with psycopg2 muscle memory who'd otherwise try to swap drivers.

5. Pattern 2 pool_recycle — optional alignment, not a fix
databricks-ai-bridge uses 2700 (PR #316) with an explicit 45-min-before-60-min comment. Pattern 2's 3600 isn't wrong — the Autoscaling tutorial itself doesn't set max_lifetime at all (relies on psycopg_pool's defaults), which undercuts Pattern 1's "Always use 2700" strength. Up to you whether to align Pattern 2 with databricks-ai-bridge for internal consistency with Pattern 1, or leave both as-is and soften Pattern 1's "Always use 2700" to "prefer 2700".

Retracted

  • My earlier claim that Pattern 1's "minute 59 / minute 60 will fail" sentence is "provably false" — that was based on the Provisioned auth doc, which is not the right authority for an Autoscaling skill. The Autoscaling tutorial doesn't explicitly describe post-expiry connection behavior, so there's no public Autoscaling source that contradicts the PR's framing. No change needed there.
  • My earlier framing that the env-var list needed "fixing" was a tone error — it's a minor completeness gap, not a factual error in how the pattern works.

Apologies for the noise in the first pass.

@dgokeeffe
Author

All feedback from the April 23 review has been addressed in dc23fb5:

  • PGAPPNAME — updated to list all 6 auto-injected env vars (PGAPPNAME, PGHOST, PGPORT, PGDATABASE, PGUSER, PGSSLMODE) and added the note that only the first database resource gets auto-injected
  • Cross-language table — the @databricks/lakebase cell now explicitly says (Autoscaling only)
  • open=False rationale — expanded to cover both points: fail-fast via pool.open(wait=True) and the open=True deprecation in AsyncConnectionPool (errors in psycopg 4.0)
  • psycopg3 pin comment — added inline comment explaining why psycopg2 won't work (no connection_class hook for OAuthConnection)
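The six auto-injected variables can be consumed roughly like this. A sketch only: the fallback values (5432, require) are illustrative defaults I'm assuming, not something the Apps docs promise.

```python
import os

# The six variables Databricks Apps auto-injects for the first database
# resource, per the review above. ENDPOINT_NAME is NOT auto-injected and
# must be set manually in app.yaml.
AUTO_INJECTED = ("PGAPPNAME", "PGHOST", "PGPORT",
                 "PGDATABASE", "PGUSER", "PGSSLMODE")

pg = {var: os.environ.get(var, "") for var in AUTO_INJECTED}
endpoint_name = os.environ.get("ENDPOINT_NAME", "")  # manual, see app.yaml

# libpq-style conninfo built from the injected values; the fallbacks are
# illustrative, not documented defaults.
conninfo = (
    f"host={pg['PGHOST']} port={pg['PGPORT'] or 5432} "
    f"dbname={pg['PGDATABASE']} user={pg['PGUSER']} "
    f"sslmode={pg['PGSSLMODE'] or 'require'}"
)
```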

Point 5 (pool_recycle alignment) was a soft suggestion and I left Pattern 2's 3600 as-is — the tutorial itself doesn't set max_lifetime, so aligning to databricks-ai-bridge's 2700 would be the more opinionated call. Happy to change if you feel strongly.

Thanks for the thorough review — the PGAPPNAME miss in particular was a real gap.

@dgokeeffe
Author

@cankoklu-db OK to merge in now?

David O'Keeffe added 3 commits May 8, 2026 12:14
…nection pattern

Restructure connection-patterns.md to match the official Databricks tutorial
and databricks-ai-bridge reference implementation:

- Pattern 1 (canonical, new): psycopg_pool.ConnectionPool + OAuthConnection
  subclass + max_lifetime=2700. Zero background threads, rotation via pool
  recycling. This is what docs.databricks.com's Lakebase Apps tutorial uses.
- Pattern 2: SQLAlchemy do_connect event (was previously presented as the
  production pattern — now demoted to "alternative for apps already using
  SQLAlchemy async", with an explicit note it adds unnecessary complexity).
- Pattern 3: Direct psycopg.connect for scripts/notebooks.
- Pattern 4: Static URL for local dev.

New explicit warnings:
- config.token / oauth_token().access_token is WORKSPACE-scoped and will fail
  at Postgres login. Must use w.postgres.generate_database_credential().
- max_lifetime=3600 (the default) creates a race condition; use 2700 so the
  pool recycles 15 min before the 1-hour token expiry.
- ENDPOINT_NAME env var must be set manually — Databricks auto-injects
  PGHOST/PGPORT/PGDATABASE/PGUSER/PGSSLMODE but NOT the endpoint path.

Canonical sources cited:
- docs.databricks.com/aws/en/oltp/projects/tutorial-databricks-apps-autoscaling
- docs.databricks.com/aws/en/oltp/projects/external-apps-connect
- github.com/databricks/databricks-ai-bridge (src/databricks_ai_bridge/lakebase.py)

Co-authored-by: Isaac
…oss-language table

The existing overview jumped straight into features. Readers arriving from
"how do I use Lakebase from Python?" needed two things made explicit:

1. There is no separate Lakebase SDK for Python. You use databricks-sdk
   only for minting OAuth credentials; a standard Postgres driver does the
   actual queries. (This was implicit in the connection patterns doc but
   not called out up-front.)
2. Node/TypeScript has a convenience wrapper: @databricks/lakebase
   (re-exported by @databricks/appkit). Autoscaling-only, not Provisioned.
   Worth mentioning so JS/TS readers know it exists.

Also added a cross-language summary table and an explicit "What NOT to do"
list — most importantly flagging that WorkspaceClient().config.token is
workspace-scoped and will be rejected at Postgres login. This is a trap
several of us have fallen into.

Co-authored-by: Isaac
- Fix PGAPPNAME omission: 6 env vars auto-injected, not 5; note multi-resource caveat
- Add psycopg3 pin comment explaining why psycopg2 won't work (no connection_class hook)
- Strengthen open=False rationale: deprecated for AsyncConnectionPool, errors in psycopg 4.0
- Clarify @databricks/lakebase scope in cross-language table (Autoscaling only)

Co-authored-by: Isaac
@dgokeeffe force-pushed the feat/lakebase-canonical-pattern-clean branch from dc23fb5 to 3795f6b on May 8, 2026 02:30
Collaborator

@QuentinAmbard left a comment


I think it's great, but I suspect this could be condensed; Claude doesn't need a full Python example, for instance. Maybe try a single pass with a prompt like:
Densify this skill without losing information; densify global knowledge/things you'd already know into instructions/guidance rather than showing exactly what to do.

import asyncio
import uuid
from contextlib import asynccontextmanager
from fastapi import FastAPI
Collaborator


I think we can remove a lot here: just mention open=False and, if we want, a short pseudo-code example, no?

David O'Keeffe added 3 commits May 9, 2026 14:57
Address Can Köklü's two soft suggestions from internal Slack feedback:

- Soften "Always use 2700" to "Prefer 2700" — note that the official
  tutorial doesn't set max_lifetime and databricks-ai-bridge uses 2700,
  so 2700 is a defensive convention rather than a spec requirement.
- Retitle Pattern 2 to "SQLAlchemy do_connect Event + Background
  Refresh Loop (Alternative)" so the demotion clearly targets the
  homegrown asyncio.Task refresh loop, not do_connect itself.
  do_connect is the official Databricks SQLAlchemy auth hook.
- Add a callout in Pattern 2 distinguishing the official do_connect
  event from the community asyncio.Task variant, and a one-line
  alternative path for SQLAlchemy users who don't want a background loop.

No structural changes — densification pass to follow.

Co-authored-by: Isaac
…gfood)

Ran the same densification prompt Quentin used on the mlflow skill against
this skill via gpt-5.5 in logfood. Restructure: 6 files / 1,439 lines →
4 files / 769 lines (47% reduction).

Structural changes:
- SKILL.md trimmed to dense overview + cross-language framing + resource
  model + non-obvious facts to preserve. Trigger description retained
  verbatim (the "Use when..." phrasing required by the skill convention).
- connection-patterns.md → connections.md. Drops the full Flask/FastAPI
  app implementations and the LakebaseAutoscaleConnectionManager class;
  keeps the canonical OAuthConnection skeleton, the do_connect hook,
  Databricks Apps env-var gotchas, DNS workaround, retry/timeout notes.
- projects.md + branches.md + computes.md → operations.md. Drops
  generic SDK CRUD examples; keeps API names, FieldMask paths,
  TTL/protected/default/parent-child constraints, CU/RAM/connection-limit
  table, scale-to-zero defaults, project limits, MCP tool intent.
- reverse-etl.md compressed; keeps namespace split (w.database, not
  w.postgres), CDF requirement, type mapping, limits, and the deletion
  sequence.

Hard constraints preserved through the densification:
- Canonical Pattern 1 (psycopg_pool + OAuthConnection + max_lifetime=2700).
- The "config.token is workspace-scoped and FAILS at Postgres login —
  use generate_database_credential() instead" warning.
- Cross-language Python/Node-TS/Java-Go table.
- "There is no separate Lakebase SDK" framing.
- "Prefer 2700" softening (no "Always use 2700") — defensive convention,
  not a spec requirement.
- do_connect is the official Databricks SQLAlchemy auth hook
  (databricks-ai-bridge uses it); only the homegrown asyncio.Task refresh
  loop is demoted as a community variant.

Co-authored-by: Isaac
Two factual errors that pre-dated this PR (originally in computes.md,
preserved through the densification pass):

- Autoscale spread: was "max - min <= 8", correct is "max - min <= 16"
- Fixed-size always-on compute floor: was 36 CU, correct is 40 CU
- Updated "Valid / Invalid" examples to match the <= 16 spread rule
  (4-20, 8-16, 16-32; invalid 0.5-32 has spread 31.5)
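The corrected constraints can be captured in a tiny validator. A sketch built from the numbers in this commit, not an official API; the function name is invented, and only the spread rule is checked.

```python
MAX_SPREAD_CU = 16   # autoscale: max - min <= 16 (corrected from 8)
FIXED_MIN_CU = 40    # fixed-size always-on floor (corrected from 36)


def valid_autoscale_range(min_cu: float, max_cu: float) -> bool:
    """True if an autoscale range satisfies the max - min <= 16 spread rule."""
    return min_cu <= max_cu and (max_cu - min_cu) <= MAX_SPREAD_CU


# Examples from the commit message:
assert valid_autoscale_range(4, 20)        # spread 16, valid
assert valid_autoscale_range(8, 16)        # spread 8, valid
assert valid_autoscale_range(16, 32)       # spread 16, valid
assert not valid_autoscale_range(0.5, 32)  # spread 31.5, invalid
```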

Source: Lakebase Autoscaling tutorial / Dustin's official Genie Code
Lakebase skill draft, confirmed by user.

Co-authored-by: Isaac