Add database schema, migrations, and connection pooling #2

Open

Sauhard74 wants to merge 2 commits into feat/ci-pipeline from feat/database-schema

Collaborator

Sauhard74 commented Feb 21, 2026

Summary

- Complete pgvector integration: add vector(3072) embedding columns to pull_requests, issues, and vision_chunks tables
- Add HNSW indexes on all embedding vectors for approximate nearest neighbor search
- Add connection pooling via postgres-js (max 10 connections, 20s idle timeout)
- Add programmatic migration runner (tsx src/migrate.ts) replacing drizzle-kit migrate
- Add drizzle-kit config for schema generation and Drizzle Studio
- Add partial index on pull_requests(repo_id, staleness_stage) WHERE state = 'open'
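One caveat on the HNSW item: pgvector's HNSW (and IVFFlat) indexes on the plain `vector` type accept at most 2,000 dimensions, so indexing the `vector(3072)` columns directly is expected to fail. pgvector 0.7.0 and later can instead index a half-precision cast of the column (`halfvec`, up to 4,000 dimensions) through an expression index. The helper below is a hypothetical sketch (the `hnswCosineIndexSql` name and parameters are not part of the PR) that picks the right form for a given dimension count:

```typescript
// Hypothetical helper: builds an HNSW cosine-distance index statement for a
// pgvector column, switching to a halfvec expression index above 2,000 dims.
const MAX_VECTOR_INDEX_DIMS = 2000; // HNSW limit for the `vector` type
const MAX_HALFVEC_INDEX_DIMS = 4000; // HNSW limit for `halfvec` (pgvector >= 0.7.0)

export function hnswCosineIndexSql(
  indexName: string,
  table: string,
  column: string,
  dims: number,
): string {
  if (!Number.isInteger(dims) || dims <= 0) {
    throw new Error(`dims must be a positive integer, got ${dims}`);
  }
  if (dims <= MAX_VECTOR_INDEX_DIMS) {
    // Full-precision vectors of this size can be indexed directly.
    return `CREATE INDEX "${indexName}" ON "${table}" USING hnsw ("${column}" vector_cosine_ops);`;
  }
  if (dims <= MAX_HALFVEC_INDEX_DIMS) {
    // Index a half-precision cast; queries must apply the same cast to use it.
    return `CREATE INDEX "${indexName}" ON "${table}" USING hnsw (("${column}"::halfvec(${dims})) halfvec_cosine_ops);`;
  }
  throw new Error(`${dims} dimensions exceed the HNSW limit for halfvec`);
}
```

For the 3072-dimension columns, `hnswCosineIndexSql("idx_pr_embedding", "pull_requests", "embedding", 3072)` returns `CREATE INDEX "idx_pr_embedding" ON "pull_requests" USING hnsw (("embedding"::halfvec(3072)) halfvec_cosine_ops);`. Nearest-neighbor queries then need the matching cast, e.g. `ORDER BY "embedding"::halfvec(3072) <=> $1::halfvec(3072)`, for the planner to use the index.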

Schema (7 tables)

Table	Purpose
`repositories`	GitHub repo metadata, vision doc, per-repo config
`pull_requests`	PR data + embedding + quality/vision/abandon scores
`issues`	Issue data + embedding
`clusters`	Duplicate groups (PR or issue type)
`cluster_members`	Links items to clusters with rank + similarity
`vision_chunks`	Chunked vision document with embeddings
`actions_log`	Audit trail for all automated actions
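The HNSW indexes use `vector_cosine_ops`, i.e. pgvector's cosine distance operator `<=>`, which equals 1 minus cosine similarity. If `cluster_members.similarity` is meant to hold cosine similarity (an assumption; the PR does not say how it is computed), a stored score relates to the index's distance as in this minimal sketch, which could be handy when checking scores in tests:

```typescript
// Cosine distance in the sense of pgvector's `<=>`: 1 - cos(a, b).
// Returns NaN when either vector has zero magnitude.
export function cosineDistance(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const cos = dot / (Math.sqrt(normA) * Math.sqrt(normB));
  // Clamp to [-1, 1] to absorb floating-point error.
  return 1 - Math.min(1, Math.max(-1, cos));
}

// Hypothetical: the similarity value, assuming cluster_members stores cosine similarity.
export function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  return 1 - cosineDistance(a, b);
}
```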

TRD Phase 1 Deliverable

Database schema and migrations (Drizzle ORM, pgvector extension)

Test plan

- Run pnpm db:generate and verify migration SQL matches schema
- Run pnpm db:migrate against a local pgvector-enabled Postgres
- Verify HNSW indexes are created on embedding columns
- Confirm connection pooling works with concurrent queries
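For the last item, one way to confirm the pool is shared is to fire more concurrent queries than `max` and count the distinct server backends via `pg_backend_pid()`: with `max: 10`, the count should be greater than 1 and at most 10. This sketch takes the query function as a parameter (a hypothetical seam; with postgres-js it could be `(text) => sql.unsafe(text)`) so it can also run against a stub:

```typescript
// Runs `concurrency` queries at once and returns how many distinct Postgres
// backends served them. pg_sleep keeps each connection busy, so the pool has
// to open several connections instead of reusing a single one.
export type QueryFn = (text: string) => Promise<ReadonlyArray<{ pid: number }>>;

export async function countDistinctBackends(query: QueryFn, concurrency: number): Promise<number> {
  const results = await Promise.all(
    Array.from({ length: concurrency }, () =>
      query("SELECT pg_backend_pid() AS pid, pg_sleep(0.1)"),
    ),
  );
  return new Set(results.map((rows) => rows[0].pid)).size;
}

// Stub standing in for a pool of 3 connections (round-robin backend PIDs).
let next = 0;
const stubQuery: QueryFn = async () => [{ pid: 1000 + (next++ % 3) }];

countDistinctBackends(stubQuery, 20).then((n) => console.log(n)); // 3
```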

iyad-f requested changes


packages/db/drizzle/0000_smart_fixer.sql

@@ -0,0 +1,113 @@
+              -- Enable pgvector extension
+              CREATE EXTENSION IF NOT EXISTS vector;
+              --> statement-breakpoint

iyad-f Feb 21, 2026

Remove those AI-based comments.
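If the hand-written `--` comments are removed mechanically, note that drizzle's `--> statement-breakpoint` markers also start with `--` and must survive, because the migrator splits the file into statements on them. A minimal, hypothetical helper that drops whole-line SQL comments but keeps the breakpoints (it does not handle `/* */` blocks or comment-like text inside string literals):

```typescript
const BREAKPOINT = "--> statement-breakpoint";

// Drops whole-line `--` comments while keeping drizzle's statement breakpoints.
// Inline breakpoints (`...;--> statement-breakpoint`) sit on statement lines
// that do not start with `--`, so they are kept as well.
export function stripSqlComments(sql: string): string {
  return sql
    .split("\n")
    .filter((line) => {
      const trimmed = line.trim();
      return trimmed.startsWith(BREAKPOINT) || !trimmed.startsWith("--");
    })
    .join("\n");
}

const input = [
  "-- Enable pgvector extension",
  "CREATE EXTENSION IF NOT EXISTS vector;",
  "--> statement-breakpoint",
  'CREATE TABLE "actions_log" ("id" UUID PRIMARY KEY);',
].join("\n");

console.log(stripSqlComments(input));
```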

packages/db/drizzle/0000_smart_fixer.sql

+              -- Enable pgvector extension
+              CREATE EXTENSION IF NOT EXISTS vector;
+              --> statement-breakpoint
+              CREATE TABLE "actions_log" (

iyad-f Feb 21, 2026

Probably name this action_log?

iyad-f Feb 21, 2026

This should be CREATE TABLE IF NOT EXISTS; the same goes for every other such block.
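Since the migration file is generated, one way to apply this consistently is a post-processing pass over the SQL. The sketch below is hypothetical (not a drizzle-kit option) and only rewrites `CREATE TABLE`, `CREATE INDEX`, and `CREATE UNIQUE INDEX`. PostgreSQL has no `IF NOT EXISTS` form for `CREATE TYPE`, so any enum definitions would need a different guard, and `IF NOT EXISTS` only checks the name: an existing object with a different definition is silently kept.

```typescript
// Matches CREATE TABLE / CREATE [UNIQUE] INDEX not already followed by
// IF NOT EXISTS. CREATE INDEX CONCURRENTLY is left alone, because there the
// guard has to come after CONCURRENTLY.
const CREATE_WITHOUT_GUARD = /\bCREATE (TABLE|(?:UNIQUE )?INDEX) (?!IF NOT EXISTS\b|CONCURRENTLY\b)/g;

export function addIfNotExists(sql: string): string {
  return sql.replace(CREATE_WITHOUT_GUARD, "CREATE $1 IF NOT EXISTS ");
}
```

Running it twice gives the same result as running it once, so it can be applied safely after every `pnpm db:generate`.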

packages/db/drizzle/0000_smart_fixer.sql

+              CREATE EXTENSION IF NOT EXISTS vector;
+              --> statement-breakpoint
+              CREATE TABLE "actions_log" (
+              	"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,

iyad-f Feb 21, 2026

Are we aligned on using UUIDs? If so, why not UUIDv7?
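For reference, `gen_random_uuid()` produces random version-4 UUIDs. UUIDv7 (RFC 9562) puts a 48-bit millisecond Unix timestamp in the leading bytes, so new keys are roughly time-ordered and B-tree inserts stay near the right edge of the index. PostgreSQL 18 adds a built-in `uuidv7()` function; on older servers the ID could be generated application-side. A minimal sketch (the `uuidv7` helper is hypothetical and omits RFC 9562's optional sub-millisecond monotonic counter):

```typescript
import { randomBytes } from "node:crypto";

// Builds a UUIDv7 string: 48-bit big-endian Unix ms timestamp, version 7,
// RFC variant bits, and random bits everywhere else.
export function uuidv7(nowMs: number = Date.now()): string {
  if (!Number.isInteger(nowMs) || nowMs < 0 || nowMs >= 2 ** 48) {
    throw new RangeError("timestamp must fit in 48 bits");
  }
  const bytes = randomBytes(16);
  // Bytes 0-5: timestamp. Division is exact because nowMs < 2^53.
  for (let i = 0; i < 6; i++) {
    bytes[i] = Math.floor(nowMs / 2 ** (8 * (5 - i))) % 256;
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // high nibble = version 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // top two bits 10 = RFC variant
  const hex = bytes.toString("hex");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}
```

Because the timestamp comes first, IDs from a later millisecond sort after earlier ones, both as strings and as PostgreSQL `uuid` values; IDs within the same millisecond are in random order.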

iyad-f Feb 21, 2026

I would also like the type to be uppercase to match everything else, so uuid -> UUID.

packages/db/drizzle/0000_smart_fixer.sql

Comment on lines +100 to +113

+              CREATE INDEX "idx_actions_repo_target" ON "actions_log" USING btree ("repo_id","target_type","target_number");--> statement-breakpoint
+              CREATE UNIQUE INDEX "idx_cluster_member_unique" ON "cluster_members" USING btree ("cluster_id","item_id");--> statement-breakpoint
+              CREATE INDEX "idx_cluster_members_cluster" ON "cluster_members" USING btree ("cluster_id");--> statement-breakpoint
+              CREATE INDEX "idx_cluster_members_item" ON "cluster_members" USING btree ("item_type","item_id");--> statement-breakpoint
+              CREATE INDEX "idx_clusters_repo_status" ON "clusters" USING btree ("repo_id","status");--> statement-breakpoint
+              CREATE UNIQUE INDEX "idx_issue_repo_number" ON "issues" USING btree ("repo_id","github_number");--> statement-breakpoint
+              CREATE INDEX "idx_issues_repo" ON "issues" USING btree ("repo_id");--> statement-breakpoint
+              CREATE UNIQUE INDEX "idx_pr_repo_number" ON "pull_requests" USING btree ("repo_id","github_number");--> statement-breakpoint
+              CREATE INDEX "idx_pr_repo_state" ON "pull_requests" USING btree ("repo_id","state");--> statement-breakpoint
+              CREATE INDEX "idx_vision_chunks_repo" ON "vision_chunks" USING btree ("repo_id");--> statement-breakpoint
+              CREATE INDEX "idx_pr_embedding" ON "pull_requests" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
+              CREATE INDEX "idx_issue_embedding" ON "issues" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
+              CREATE INDEX "idx_vision_embedding" ON "vision_chunks" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
+              CREATE INDEX "idx_pr_staleness" ON "pull_requests" ("repo_id", "staleness_stage") WHERE "state" = 'open';

                
\ No newline at end of file

iyad-f Feb 21, 2026

Put each table's indexes directly beneath that table's definition.

iyad-f Feb 21, 2026

No need to write USING btree; that's the default.

packages/db/src/connection.ts

+              export function getDb(): Database {
+                if (db) return db;
+                const url = process.env["DATABASE_URL"];

iyad-f Feb 21, 2026

DB credentials should not come from env; instead, they should come from Docker secrets:
https://docs.docker.com/compose/how-tos/use-secrets/
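Docker Compose mounts each secret as a file under `/run/secrets/<name>` inside the container. A common convention (used, for example, by the official `postgres` image's `POSTGRES_PASSWORD_FILE`) is to pass the secret's path rather than its value. A sketch assuming a hypothetical `database_url` secret and a hypothetical `DATABASE_URL_FILE` override; the file reader is injectable so the function can be tested without a container:

```typescript
import { readFileSync } from "node:fs";

// Default mount point for a Compose secret named `database_url` (hypothetical name).
const DEFAULT_SECRET_PATH = "/run/secrets/database_url";

type Env = Record<string, string | undefined>;

// Reads the connection URL from a Docker secret file instead of DATABASE_URL.
// A missing file surfaces as readFileSync's ENOENT error.
export function readDatabaseUrl(
  env: Env,
  readFile: (path: string) => string = (path) => readFileSync(path, "utf8"),
): string {
  const path = env["DATABASE_URL_FILE"] ?? DEFAULT_SECRET_PATH;
  // Secret files often end with a newline; strip surrounding whitespace.
  const url = readFile(path).trim();
  if (url === "") {
    throw new Error(`Database URL secret at ${path} is empty`);
  }
  return url;
}
```

In the app this would be called as `readDatabaseUrl(process.env)`.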

packages/db/src/connection.ts

Comment on lines +19 to +21

+                  max: 10,
+                  idle_timeout: 20,
+                  connect_timeout: 10,

iyad-f Feb 21, 2026

All of this should come through config (env).
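A sketch of moving these pool settings into env-backed config while keeping the PR's current values (10, 20, 10) as defaults. The variable names `DB_POOL_MAX`, `DB_IDLE_TIMEOUT_SECONDS`, and `DB_CONNECT_TIMEOUT_SECONDS` are hypothetical; the returned keys match postgres-js's `max`, `idle_timeout`, and `connect_timeout` options (both timeouts in seconds), so the object could be passed straight to `postgres(url, ...)`.

```typescript
export interface PoolConfig {
  max: number;
  idle_timeout: number; // seconds
  connect_timeout: number; // seconds
}

type Env = Record<string, string | undefined>;

const DEFAULTS: PoolConfig = { max: 10, idle_timeout: 20, connect_timeout: 10 };

// Reads a positive integer from env, falling back when unset or blank.
function positiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

export function loadPoolConfig(env: Env): PoolConfig {
  return {
    max: positiveInt(env, "DB_POOL_MAX", DEFAULTS.max),
    idle_timeout: positiveInt(env, "DB_IDLE_TIMEOUT_SECONDS", DEFAULTS.idle_timeout),
    connect_timeout: positiveInt(env, "DB_CONNECT_TIMEOUT_SECONDS", DEFAULTS.connect_timeout),
  };
}
```

Failing fast on malformed values keeps a typo in deployment config from silently falling back to the defaults.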

packages/db/src/migrate.ts

iyad-f Feb 21, 2026

The getDb function, or anything else that obtains the DB connection, should be defined in one place and used everywhere.
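One possible shape: `connection.ts` stays the only module that constructs the client and exposes `getDb()`/`closeDb()`, and `migrate.ts` imports them instead of opening its own connection. The generic helper below is a hypothetical sketch; in this PR's setup, `create` would presumably wrap a `postgres(url, options)` client with `drizzle()` from `drizzle-orm/postgres-js`, and `destroy` would call the postgres-js client's `end()`.

```typescript
// Lazily creates one shared instance and tears it down on close.
export function createSingleton<T>(create: () => T, destroy: (value: T) => Promise<void>) {
  let instance: T | undefined;
  return {
    get(): T {
      if (instance === undefined) instance = create();
      return instance;
    },
    async close(): Promise<void> {
      if (instance === undefined) return;
      const current = instance;
      instance = undefined; // a later get() builds a fresh instance
      await destroy(current);
    },
  };
}

// Hypothetical wiring in connection.ts:
// const shared = createSingleton(() => postgres(url, options), (sql) => sql.end());
// export const getDb = shared.get;
// export const closeDb = shared.close;
```

Because `get` and `close` close over `instance` rather than using `this`, they can be exported as standalone functions.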

Sauhard74 added 2 commits

February 22, 2026 05:51


Add database schema, migrations, and connection pooling

9c5e685

Complete Drizzle ORM setup with pgvector integration:
- Enable pgvector extension and add vector(3072) embedding columns
  to pull_requests, issues, and vision_chunks tables
- Add HNSW indexes on all embedding vectors for ANN search
- Add connection pooling with postgres-js (max 10, idle timeout 20s)
- Add programmatic migration runner (tsx src/migrate.ts)
- Add drizzle-kit config for schema generation and studio
- Add partial index on pull_requests for open PR staleness queries
- Update db package exports to include connection utilities
- Fix core package.json exports (add default entry)


Add tests for core and db packages with 85% coverage thresholds

e3d22c6

- core: tests for normalizeRankingWeights and validateStalenessConfig
- core: tests for constants values and weight invariants
- db: tests for connection singleton, env validation, close behavior
- Add 85% coverage thresholds to core and db vitest configs
- Exclude migrate.ts from db coverage (auto-executing script)

Sauhard74 force-pushed the feat/database-schema branch from 8012504 to e3d22c6

February 22, 2026 00:22

