Add database schema, migrations, and connection pooling #2

Open
Sauhard74 wants to merge 2 commits into feat/ci-pipeline from feat/database-schema

Conversation

@Sauhard74
Collaborator

Summary

  • Complete pgvector integration: add vector(3072) embedding columns to pull_requests, issues, and vision_chunks tables
  • Add HNSW indexes on all embedding vectors for approximate nearest neighbor search
  • Add connection pooling via postgres-js (max 10 connections, 20s idle timeout)
  • Add programmatic migration runner (tsx src/migrate.ts) replacing drizzle-kit migrate
  • Add drizzle-kit config for schema generation and Drizzle Studio
  • Add partial index on pull_requests(repo_id, staleness_stage) WHERE state = 'open'

Schema (7 tables)

Table            Purpose
repositories     GitHub repo metadata, vision doc, per-repo config
pull_requests    PR data + embedding + quality/vision/abandon scores
issues           Issue data + embedding
clusters         Duplicate groups (PR or issue type)
cluster_members  Links items to clusters with rank + similarity
vision_chunks    Chunked vision document with embeddings
actions_log      Audit trail for all automated actions

TRD Phase 1 Deliverable

Database schema and migrations (Drizzle ORM, pgvector extension)

Test plan

  • Run pnpm db:generate and verify migration SQL matches schema
  • Run pnpm db:migrate against a local pgvector-enabled Postgres
  • Verify HNSW indexes are created on embedding columns
  • Confirm connection pooling works with concurrent queries

@@ -0,0 +1,113 @@
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
--> statement-breakpoint

Remove these AI-generated comments.

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
--> statement-breakpoint
CREATE TABLE "actions_log" (

Should this probably be named action_log?

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
--> statement-breakpoint
CREATE TABLE "actions_log" (

This should be CREATE TABLE IF NOT EXISTS; the same applies to every other CREATE TABLE block.

CREATE EXTENSION IF NOT EXISTS vector;
--> statement-breakpoint
CREATE TABLE "actions_log" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,

Are we aligned on using UUIDs? If so, why not UUIDv7?
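For context on the UUIDv7 suggestion: UUIDv7 (RFC 9562) puts a 48-bit Unix millisecond timestamp in the leading bits, so IDs sort roughly by creation time, while the version 4 IDs from gen_random_uuid() are fully random. A minimal sketch of an application-side generator follows. The function name `uuidv7` is hypothetical, and the PR does not say whether IDs would be generated in the application or by a database default.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: builds a UUIDv7 string following the RFC 9562 layout.
export function uuidv7(nowMs: number = Date.now()): string {
  // Start from 16 random bytes, then overwrite the fixed fields.
  const bytes = randomBytes(16);

  // Bytes 0..5: 48-bit big-endian Unix timestamp in milliseconds.
  bytes.writeUIntBE(nowMs, 0, 6);

  // High nibble of byte 6: version field, set to 7.
  bytes.writeUInt8((bytes.readUInt8(6) & 0x0f) | 0x70, 6);

  // Top two bits of byte 8: variant field, set to binary 10.
  bytes.writeUInt8((bytes.readUInt8(8) & 0x3f) | 0x80, 8);

  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Because the timestamp leads, IDs created in different milliseconds compare in creation order as plain strings. Ordering within the same millisecond is random in this sketch.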

CREATE EXTENSION IF NOT EXISTS vector;
--> statement-breakpoint
CREATE TABLE "actions_log" (
"id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,

I would also like the type name to be uppercase to match everything else, so uuid -> UUID.

Comment on lines +100 to +113
CREATE INDEX "idx_actions_repo_target" ON "actions_log" USING btree ("repo_id","target_type","target_number");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_cluster_member_unique" ON "cluster_members" USING btree ("cluster_id","item_id");--> statement-breakpoint
CREATE INDEX "idx_cluster_members_cluster" ON "cluster_members" USING btree ("cluster_id");--> statement-breakpoint
CREATE INDEX "idx_cluster_members_item" ON "cluster_members" USING btree ("item_type","item_id");--> statement-breakpoint
CREATE INDEX "idx_clusters_repo_status" ON "clusters" USING btree ("repo_id","status");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_issue_repo_number" ON "issues" USING btree ("repo_id","github_number");--> statement-breakpoint
CREATE INDEX "idx_issues_repo" ON "issues" USING btree ("repo_id");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_pr_repo_number" ON "pull_requests" USING btree ("repo_id","github_number");--> statement-breakpoint
CREATE INDEX "idx_pr_repo_state" ON "pull_requests" USING btree ("repo_id","state");--> statement-breakpoint
CREATE INDEX "idx_vision_chunks_repo" ON "vision_chunks" USING btree ("repo_id");--> statement-breakpoint
CREATE INDEX "idx_pr_embedding" ON "pull_requests" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_issue_embedding" ON "issues" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_vision_embedding" ON "vision_chunks" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_pr_staleness" ON "pull_requests" ("repo_id", "staleness_stage") WHERE "state" = 'open';
\ No newline at end of file

Put each table's indexes directly beneath its table definition.

Comment on lines +100 to +113
CREATE INDEX "idx_actions_repo_target" ON "actions_log" USING btree ("repo_id","target_type","target_number");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_cluster_member_unique" ON "cluster_members" USING btree ("cluster_id","item_id");--> statement-breakpoint
CREATE INDEX "idx_cluster_members_cluster" ON "cluster_members" USING btree ("cluster_id");--> statement-breakpoint
CREATE INDEX "idx_cluster_members_item" ON "cluster_members" USING btree ("item_type","item_id");--> statement-breakpoint
CREATE INDEX "idx_clusters_repo_status" ON "clusters" USING btree ("repo_id","status");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_issue_repo_number" ON "issues" USING btree ("repo_id","github_number");--> statement-breakpoint
CREATE INDEX "idx_issues_repo" ON "issues" USING btree ("repo_id");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_pr_repo_number" ON "pull_requests" USING btree ("repo_id","github_number");--> statement-breakpoint
CREATE INDEX "idx_pr_repo_state" ON "pull_requests" USING btree ("repo_id","state");--> statement-breakpoint
CREATE INDEX "idx_vision_chunks_repo" ON "vision_chunks" USING btree ("repo_id");--> statement-breakpoint
CREATE INDEX "idx_pr_embedding" ON "pull_requests" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_issue_embedding" ON "issues" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_vision_embedding" ON "vision_chunks" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_pr_staleness" ON "pull_requests" ("repo_id", "staleness_stage") WHERE "state" = 'open';
\ No newline at end of file

No need to write USING btree; that's the default.

export function getDb(): Database {
if (db) return db;

const url = process.env["DATABASE_URL"];

DB credentials should not come from env; they should come from Docker secrets instead:
https://docs.docker.com/compose/how-tos/use-secrets/
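A rough sketch of what reading the connection string from a secret file could look like. Docker mounts secrets as files under /run/secrets/; the secret name `database_url`, the `DATABASE_URL_FILE` override variable, and the function name `resolveDatabaseUrl` are all assumptions for illustration, not part of this PR.

```typescript
import { readFileSync } from "node:fs";

type Env = Record<string, string | undefined>;

// Assumed secret name; Docker mounts each secret at /run/secrets/<name>.
const DEFAULT_SECRET_PATH = "/run/secrets/database_url";

// Hypothetical helper: resolves the connection string from a secret file.
// The env only carries the *path* to the secret, never the credentials.
export function resolveDatabaseUrl(
  env: Env = process.env,
  readFile: (path: string) => string = (path) => readFileSync(path, "utf8"),
): string {
  const secretPath = env["DATABASE_URL_FILE"] ?? DEFAULT_SECRET_PATH;

  let contents: string;
  try {
    contents = readFile(secretPath);
  } catch {
    throw new Error(`Could not read database secret at ${secretPath}`);
  }

  // Secret files often end with a trailing newline.
  const url = contents.trim();
  if (url === "") {
    throw new Error(`Database secret at ${secretPath} is empty`);
  }
  return url;
}
```

The `readFile` parameter is only there so tests can substitute a fake reader instead of touching the filesystem.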

Comment on lines +19 to +21
max: 10,
idle_timeout: 20,
connect_timeout: 10,

All of these values should come through config (env).
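One possible shape for this, as a sketch only: the option names `max`, `idle_timeout`, and `connect_timeout` and their defaults (10, 20, 10) are taken from the diff, while the env variable names `DB_POOL_MAX`, `DB_IDLE_TIMEOUT_SECONDS`, `DB_CONNECT_TIMEOUT_SECONDS` and the helper names are assumptions.

```typescript
type Env = Record<string, string | undefined>;

// Mirrors the three pool options currently hard-coded in the diff.
export interface PoolConfig {
  max: number;
  idle_timeout: number;
  connect_timeout: number;
}

// Reads a positive integer from env, falling back when unset or blank.
function readPositiveInt(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;

  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Hypothetical loader; the env variable names are illustrative.
export function loadPoolConfig(env: Env = process.env): PoolConfig {
  return {
    max: readPositiveInt(env, "DB_POOL_MAX", 10),
    idle_timeout: readPositiveInt(env, "DB_IDLE_TIMEOUT_SECONDS", 20),
    connect_timeout: readPositiveInt(env, "DB_CONNECT_TIMEOUT_SECONDS", 10),
  };
}
```

The returned object could then be spread into the existing options passed to the postgres-js client, so a malformed value fails at startup rather than silently falling back.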


The getDb function, or anything else that obtains the DB connection, should be defined in one place and used everywhere.
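A small sketch of how a single definition might be enforced: one module owns a lazily created instance and every caller imports the accessor from it. `createSingleton` is a hypothetical helper, not something in this PR; the existing getDb already caches its instance, so this only illustrates the "one place" part.

```typescript
// Hypothetical helper: wraps a factory so the value is created at most once.
export function createSingleton<T>(factory: () => T): {
  get: () => T;
  reset: () => void;
} {
  let instance: T | undefined;
  let created = false;

  return {
    // Creates the instance on first call, then returns the cached one.
    get(): T {
      if (!created) {
        instance = factory();
        created = true;
      }
      return instance as T;
    },
    // Drops the cached instance, e.g. between tests or after closing.
    reset(): void {
      instance = undefined;
      created = false;
    },
  };
}
```

The db package could export one such accessor built around the existing connection setup, and other packages would import only that export instead of constructing their own clients.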

Complete Drizzle ORM setup with pgvector integration:
- Enable pgvector extension and add vector(3072) embedding columns
  to pull_requests, issues, and vision_chunks tables
- Add HNSW indexes on all embedding vectors for ANN search
- Add connection pooling with postgres-js (max 10, idle timeout 20s)
- Add programmatic migration runner (tsx src/migrate.ts)
- Add drizzle-kit config for schema generation and studio
- Add partial index on pull_requests for open PR staleness queries
- Update db package exports to include connection utilities
- Fix core package.json exports (add default entry)
- core: tests for normalizeRankingWeights and validateStalenessConfig
- core: tests for constants values and weight invariants
- db: tests for connection singleton, env validation, close behavior
- Add 85% coverage thresholds to core and db vitest configs
- Exclude migrate.ts from db coverage (auto-executing script)
@Sauhard74 force-pushed the feat/database-schema branch from 8012504 to e3d22c6 on February 22, 2026 00:22