docs(lakebase-autoscale): canonical psycopg_pool + OAuthConnection pattern #488

Open

dgokeeffe wants to merge 6 commits into databricks-solutions:main from dgokeeffe:feat/lakebase-canonical-pattern-clean

Conversation

@dgokeeffe

Summary

Restructures the databricks-lakebase-autoscale skill to lead with the canonical connection pattern from the official Databricks Apps + Lakebase tutorial, and adds an explicit framing of how the Python ecosystem fits together.

What changed

connection-patterns.md — reordered and expanded:

  • Pattern 1 (new, canonical): psycopg_pool.ConnectionPool + OAuthConnection subclass + max_lifetime=2700. Matches the official tutorial, the external app SDK guide, and databricks-ai-bridge. Zero background threads — rotation happens transparently via pool recycling.
  • Pattern 2 (demoted): the previous SQLAlchemy do_connect + asyncio.Task refresh pattern is now marked "alternative for apps already using SQLAlchemy async", with a note that it adds unnecessary operational complexity for the common case.
  • Patterns 3–4: direct psycopg.connect (scripts only) and static URL (local dev only) — unchanged in spirit, trimmed.
  • Added FastAPI variant (open=False + explicit lifespan).

SKILL.md — new up-front overview section:

  • Explicit "There is no separate Lakebase SDK for Python" framing — readers repeatedly ask this.
  • Cross-language table (Python / Node-TS / Java-Go) showing which SDK and DB driver to use.
  • Mention of @databricks/lakebase as the Node/TS convenience wrapper (Autoscaling-only).
  • "What NOT to do" list — most importantly flagging that WorkspaceClient().config.token is workspace-scoped and will fail at Postgres login. Must use generate_database_credential() for a Lakebase-scoped token.

Why

  • The old connection-patterns.md led with a SQLAlchemy + background-refresh loop, which works but is not what the official tutorial or reference implementations use.
  • The config.token vs generate_database_credential() distinction was buried; it's the #1 cause of "my connection works locally but fails in prod" bugs.
  • max_lifetime=2700 vs the 3600 default was implicit; the new doc explains why the default creates a race condition.
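The race in the last bullet is plain arithmetic: the pool's default lifetime and the token TTL are both one hour, leaving zero safety margin. A quick check:

```python
TOKEN_TTL_S = 3600                # Lakebase OAuth tokens expire after 1 hour
DEFAULT_MAX_LIFETIME_S = 3600     # psycopg_pool's default max_lifetime
RECOMMENDED_MAX_LIFETIME_S = 2700

# Default: a connection can hit its recycling age exactly when its token
# expires, so an in-flight query may be carrying an already-expired token.
assert TOKEN_TTL_S - DEFAULT_MAX_LIFETIME_S == 0

# 2700 recycles connections 900 s (15 min) before the token expires.
buffer_s = TOKEN_TTL_S - RECOMMENDED_MAX_LIFETIME_S
assert buffer_s == 15 * 60
```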

Test plan

  • Read through both files end-to-end for accuracy
  • Verified the canonical pattern against the official tutorial URL
  • Verified max_lifetime=2700 rationale (15-min buffer before 1-hour expiry)
  • Cross-checked cross-language table with @databricks/lakebase README

This pull request and its description were written by Isaac.

Collaborator

@cankoklu-db left a comment


Sorry, have to fix this.

@cankoklu-db
Collaborator

Correcting my earlier comment — I cited /aws/en/oltp/instances/authentication (which is labeled Lakebase Provisioned in its breadcrumb) when this skill is for Lakebase Autoscaling specifically. Narrower set of observations that hold up against the Autoscaling tutorial, the external apps Autoscaling guide, and the Apps Lakebase resource doc:

1. Auto-injected env vars: 6, not 5
The Apps Lakebase resource doc explicitly lists PGAPPNAME, PGDATABASE, PGHOST, PGPORT, PGSSLMODE, PGUSER for the first database resource. The PR's app.yaml comment omits PGAPPNAME. The doc also notes only the first database resource gets auto-injected — multi-Lakebase apps need valueFrom: {resource: <key>, key: ...} for resource #2+.

2. @databricks/lakebase Autoscaling-only scope should be in the table cell, not just the prose
The README states explicitly: "NOT compatible with the Databricks Lakebase Provisioned". Source code calls /api/2.0/postgres/credentials (Autoscaling endpoint-resource API). Readers scanning the cross-language table for Provisioned guidance miss the caveat above the table. Either add a Scope column or change the cell to @databricks/lakebase (Autoscaling only).

3. Sharpen the open=True vs open=False rationale
The primary driver for FastAPI's open=False isn't just "fail-fast on startup" — open=True is deprecated for AsyncConnectionPool and becomes an error in psycopg 4.0 (still the sync default). databricks-ai-bridge follows the same sync-True / async-False split. One sentence in the doc would make the asymmetry principled rather than stylistic.

4. One-line comment on the psycopg3 pin

psycopg[binary,pool]>=3.1.0  # psycopg3 required — psycopg2.pool has no connection_class hook for the OAuthConnection pattern

The Autoscaling tutorial uses psycopg3 throughout. Calling out why helps readers with psycopg2 muscle memory who'd otherwise try to swap drivers.

5. Pattern 2 pool_recycle — optional alignment, not a fix
databricks-ai-bridge uses 2700 (PR #316) with an explicit 45-min-before-60-min comment. Pattern 2's 3600 isn't wrong — the Autoscaling tutorial itself doesn't set max_lifetime at all (relies on psycopg_pool's defaults), which undercuts Pattern 1's "Always use 2700" strength. Up to you whether to align Pattern 2 with databricks-ai-bridge for internal consistency with Pattern 1, or leave both as-is and soften Pattern 1's "Always use 2700" to "prefer 2700".

Retracted

  • My earlier claim that Pattern 1's "minute 59 / minute 60 will fail" sentence is "provably false" — that was based on the Provisioned auth doc, which is not the right authority for an Autoscaling skill. The Autoscaling tutorial doesn't explicitly describe post-expiry connection behavior, so there's no public Autoscaling source that contradicts the PR's framing. No change needed there.
  • My earlier framing that the env-var list needed "fixing" was a tone error — it's a minor completeness gap, not a factual error in how the pattern works.

Apologies for the noise in the first pass.

@dgokeeffe
Author

All feedback from the April 23 review has been addressed in dc23fb5:

  • PGAPPNAME — updated to list all 6 auto-injected env vars (PGAPPNAME, PGHOST, PGPORT, PGDATABASE, PGUSER, PGSSLMODE) and added the note that only the first database resource gets auto-injected
  • Cross-language table — the @databricks/lakebase cell now explicitly says (Autoscaling only)
  • open=False rationale — expanded to cover both points: fail-fast via pool.open(wait=True) and the open=True deprecation in AsyncConnectionPool (errors in psycopg 4.0)
  • psycopg3 pin comment — added inline comment explaining why psycopg2 won't work (no connection_class hook for OAuthConnection)
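The six auto-injected variables can be consumed roughly like this. A sketch only: the fallback values (5432, require) are illustrative defaults I'm assuming, not something the Apps docs promise.

```python
import os

# The six variables Databricks Apps auto-injects for the first database
# resource, per the review above. ENDPOINT_NAME is NOT auto-injected and
# must be set manually in app.yaml.
AUTO_INJECTED = ("PGAPPNAME", "PGHOST", "PGPORT",
                 "PGDATABASE", "PGUSER", "PGSSLMODE")

pg = {var: os.environ.get(var, "") for var in AUTO_INJECTED}
endpoint_name = os.environ.get("ENDPOINT_NAME", "")  # manual, see app.yaml

# libpq-style conninfo built from the injected values; the fallbacks are
# illustrative, not documented defaults.
conninfo = (
    f"host={pg['PGHOST']} port={pg['PGPORT'] or 5432} "
    f"dbname={pg['PGDATABASE']} user={pg['PGUSER']} "
    f"sslmode={pg['PGSSLMODE'] or 'require'}"
)
```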

Point 5 (pool_recycle alignment) was a soft suggestion and I left Pattern 2's 3600 as-is — the tutorial itself doesn't set max_lifetime, so aligning to databricks-ai-bridge's 2700 would be the more opinionated call. Happy to change if you feel strongly.

Thanks for the thorough review — the PGAPPNAME miss in particular was a real gap.

@dgokeeffe
Author

@cankoklu-db OK to merge in now?

David O'Keeffe added 3 commits May 8, 2026 12:14
…nection pattern

Restructure connection-patterns.md to match the official Databricks tutorial
and databricks-ai-bridge reference implementation:

- Pattern 1 (canonical, new): psycopg_pool.ConnectionPool + OAuthConnection
  subclass + max_lifetime=2700. Zero background threads, rotation via pool
  recycling. This is what docs.databricks.com's Lakebase Apps tutorial uses.
- Pattern 2: SQLAlchemy do_connect event (was previously presented as the
  production pattern — now demoted to "alternative for apps already using
  SQLAlchemy async", with an explicit note it adds unnecessary complexity).
- Pattern 3: Direct psycopg.connect for scripts/notebooks.
- Pattern 4: Static URL for local dev.

New explicit warnings:
- config.token / oauth_token().access_token is WORKSPACE-scoped and will fail
  at Postgres login. Must use w.postgres.generate_database_credential().
- max_lifetime=3600 (the default) creates a race condition; use 2700 so the
  pool recycles 15 min before the 1-hour token expiry.
- ENDPOINT_NAME env var must be set manually — Databricks auto-injects
  PGHOST/PGPORT/PGDATABASE/PGUSER/PGSSLMODE but NOT the endpoint path.

Canonical sources cited:
- docs.databricks.com/aws/en/oltp/projects/tutorial-databricks-apps-autoscaling
- docs.databricks.com/aws/en/oltp/projects/external-apps-connect
- github.com/databricks/databricks-ai-bridge (src/databricks_ai_bridge/lakebase.py)

Co-authored-by: Isaac
…oss-language table

The existing overview jumped straight into features. Readers arriving from
"how do I use Lakebase from Python?" needed two things made explicit:

1. There is no separate Lakebase SDK for Python. You use databricks-sdk
   only for minting OAuth credentials; a standard Postgres driver does the
   actual queries. (This was implicit in the connection patterns doc but
   not called out up-front.)
2. Node/TypeScript has a convenience wrapper: @databricks/lakebase
   (re-exported by @databricks/appkit). Autoscaling-only, not Provisioned.
   Worth mentioning so JS/TS readers know it exists.

Also added a cross-language summary table and an explicit "What NOT to do"
list — most importantly flagging that WorkspaceClient().config.token is
workspace-scoped and will be rejected at Postgres login. This is a trap
several of us have fallen into.

Co-authored-by: Isaac
- Fix PGAPPNAME omission: 6 env vars auto-injected, not 5; note multi-resource caveat
- Add psycopg3 pin comment explaining why psycopg2 won't work (no connection_class hook)
- Strengthen open=False rationale: deprecated for AsyncConnectionPool, errors in psycopg 4.0
- Clarify @databricks/lakebase scope in cross-language table (Autoscaling only)

Co-authored-by: Isaac
@dgokeeffe force-pushed the feat/lakebase-canonical-pattern-clean branch from dc23fb5 to 3795f6b on May 8, 2026 02:30
Collaborator

@QuentinAmbard left a comment


I think it's great, but I suspect this could be condensed; Claude doesn't need a full Python example, for instance. Maybe try a single pass with a prompt like:
Densify this skill without losing information; densify global knowledge/things you'd already know into instructions/guidance rather than showing exactly what to do.

import asyncio
import uuid
from contextlib import asynccontextmanager
from fastapi import FastAPI
Collaborator


I think we can remove a lot here: just mention open=False and, if we want, a short pseudo-code example, no?

David O'Keeffe added 3 commits May 9, 2026 14:57
Address Can Köklü's two soft suggestions from internal Slack feedback:

- Soften "Always use 2700" to "Prefer 2700" — note that the official
  tutorial doesn't set max_lifetime and databricks-ai-bridge uses 2700,
  so 2700 is a defensive convention rather than a spec requirement.
- Retitle Pattern 2 to "SQLAlchemy do_connect Event + Background
  Refresh Loop (Alternative)" so the demotion clearly targets the
  homegrown asyncio.Task refresh loop, not do_connect itself.
  do_connect is the official Databricks SQLAlchemy auth hook.
- Add a callout in Pattern 2 distinguishing the official do_connect
  event from the community asyncio.Task variant, and a one-line
  alternative path for SQLAlchemy users who don't want a background loop.

No structural changes — densification pass to follow.

Co-authored-by: Isaac
…gfood)

Ran the same densification prompt Quentin used on the mlflow skill against
this skill via gpt-5.5 in logfood. Restructure: 6 files / 1,439 lines →
4 files / 769 lines (47% reduction).

Structural changes:
- SKILL.md trimmed to dense overview + cross-language framing + resource
  model + non-obvious facts to preserve. Trigger description retained
  verbatim (the "Use when..." phrasing required by the skill convention).
- connection-patterns.md → connections.md. Drops the full Flask/FastAPI
  app implementations and the LakebaseAutoscaleConnectionManager class;
  keeps the canonical OAuthConnection skeleton, the do_connect hook,
  Databricks Apps env-var gotchas, DNS workaround, retry/timeout notes.
- projects.md + branches.md + computes.md → operations.md. Drops
  generic SDK CRUD examples; keeps API names, FieldMask paths,
  TTL/protected/default/parent-child constraints, CU/RAM/connection-limit
  table, scale-to-zero defaults, project limits, MCP tool intent.
- reverse-etl.md compressed; keeps namespace split (w.database, not
  w.postgres), CDF requirement, type mapping, limits, and the deletion
  sequence.

Hard constraints preserved through the densification:
- Canonical Pattern 1 (psycopg_pool + OAuthConnection + max_lifetime=2700).
- The "config.token is workspace-scoped and FAILS at Postgres login —
  use generate_database_credential() instead" warning.
- Cross-language Python/Node-TS/Java-Go table.
- "There is no separate Lakebase SDK" framing.
- "Prefer 2700" softening (no "Always use 2700") — defensive convention,
  not a spec requirement.
- do_connect is the official Databricks SQLAlchemy auth hook
  (databricks-ai-bridge uses it); only the homegrown asyncio.Task refresh
  loop is demoted as a community variant.

Co-authored-by: Isaac
Two factual errors that pre-dated this PR (originally in computes.md,
preserved through the densification pass):

- Autoscale spread: was "max - min <= 8", correct is "max - min <= 16"
- Fixed-size always-on compute floor: was 36 CU, correct is 40 CU
- Updated "Valid / Invalid" examples to match the <= 16 spread rule
  (4-20, 8-16, 16-32; invalid 0.5-32 has spread 31.5)
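The corrected constraints can be captured in a tiny validator. A sketch built from the numbers in this commit, not an official API; the function name is invented, and only the spread rule is checked.

```python
MAX_SPREAD_CU = 16   # autoscale: max - min <= 16 (corrected from 8)
FIXED_MIN_CU = 40    # fixed-size always-on floor (corrected from 36)


def valid_autoscale_range(min_cu: float, max_cu: float) -> bool:
    """True if an autoscale range satisfies the max - min <= 16 spread rule."""
    return min_cu <= max_cu and (max_cu - min_cu) <= MAX_SPREAD_CU


# Examples from the commit message:
assert valid_autoscale_range(4, 20)        # spread 16, valid
assert valid_autoscale_range(8, 16)        # spread 8, valid
assert valid_autoscale_range(16, 32)       # spread 16, valid
assert not valid_autoscale_range(0.5, 32)  # spread 31.5, invalid
```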

Source: Lakebase Autoscaling tutorial / Dustin's official Genie Code
Lakebase skill draft, confirmed by user.

Co-authored-by: Isaac