Add database schema, migrations, and connection pooling #2

Sauhard74 wants to merge 2 commits into feat/ci-pipeline from
@@ -0,0 +1,113 @@

```sql
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
--> statement-breakpoint
CREATE TABLE "actions_log" (
```
Should be `CREATE TABLE IF NOT EXISTS`; same for every other such block.
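Since drizzle-kit emits these statements without the guard, one option is a small pass over each generated file. A minimal sketch, assuming a hypothetical `addIfNotExists` helper that is not part of this PR; it covers the `CREATE TABLE` and `CREATE [UNIQUE] INDEX` forms used in this migration and deliberately leaves `CONCURRENTLY` indexes alone, because their guard must come after that keyword:

```typescript
// Hypothetical post-processing pass for drizzle-kit output: adds IF NOT EXISTS
// to CREATE TABLE and CREATE [UNIQUE] INDEX statements. Statements that are
// already guarded (like CREATE EXTENSION IF NOT EXISTS) are left alone, so
// running it twice gives the same result.
function addIfNotExists(sql: string): string {
  return sql
    .replace(/\bCREATE TABLE (?!IF NOT EXISTS\b)/g, "CREATE TABLE IF NOT EXISTS ")
    // $1 keeps the optional "UNIQUE " (it expands to "" when absent);
    // CONCURRENTLY indexes are skipped rather than rewritten incorrectly.
    .replace(
      /\bCREATE (UNIQUE )?INDEX (?!IF NOT EXISTS\b|CONCURRENTLY\b)/g,
      "CREATE $1INDEX IF NOT EXISTS ",
    );
}

const generated = [
  "CREATE EXTENSION IF NOT EXISTS vector;",
  "--> statement-breakpoint",
  'CREATE TABLE "actions_log" (',
  '  "id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL',
  ");",
  "--> statement-breakpoint",
  'CREATE UNIQUE INDEX "idx_issue_repo_number" ON "issues" ("repo_id","github_number");',
].join("\n");

const guarded = addIfNotExists(generated);
console.log(guarded);
```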
```sql
CREATE EXTENSION IF NOT EXISTS vector;
--> statement-breakpoint
CREATE TABLE "actions_log" (
  "id" uuid PRIMARY KEY DEFAULT gen_random_uuid() NOT NULL,
```
Are we aligned on using UUIDs? If so, why not UUIDv7?
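UUIDv7 puts a millisecond Unix timestamp in the high bits, so new keys are roughly time-ordered and B-tree inserts stay append-friendly, unlike the random v4 values `gen_random_uuid()` produces. A minimal sketch of the RFC 9562 layout, assuming IDs are generated in application code (PostgreSQL 18 adds a built-in `uuidv7()`, which would avoid this on newer servers); `uuidv7` here is a hypothetical helper, not an export of this repo:

```typescript
import { randomBytes } from "node:crypto";

// Minimal UUIDv7 generator following the RFC 9562 layout:
// 48-bit Unix timestamp (ms) | version 7 | 12 random bits | variant 0b10 | 62 random bits.
function uuidv7(nowMs: number = Date.now()): string {
  const bytes = randomBytes(16);
  // Write the 48-bit timestamp big-endian into bytes 0..5. Division is used
  // instead of bit shifts because JS bitwise ops truncate to 32 bits.
  let ts = nowMs;
  for (let i = 5; i >= 0; i--) {
    bytes[i] = ts % 256;
    ts = Math.floor(ts / 256);
  }
  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7 in the high nibble
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC 9562 variant bits (10xx)
  const hex = bytes.toString("hex");
  return `${hex.slice(0, 8)}-${hex.slice(8, 12)}-${hex.slice(12, 16)}-${hex.slice(16, 20)}-${hex.slice(20)}`;
}

const earlier = uuidv7(1_700_000_000_000);
const later = uuidv7(1_700_000_000_001);
console.log(earlier < later); // true: the timestamp prefix makes string order follow time
```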
I would also like the type to be upper case to match everything else, so `uuid` -> `UUID`.
```sql
CREATE INDEX "idx_actions_repo_target" ON "actions_log" USING btree ("repo_id","target_type","target_number");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_cluster_member_unique" ON "cluster_members" USING btree ("cluster_id","item_id");--> statement-breakpoint
CREATE INDEX "idx_cluster_members_cluster" ON "cluster_members" USING btree ("cluster_id");--> statement-breakpoint
CREATE INDEX "idx_cluster_members_item" ON "cluster_members" USING btree ("item_type","item_id");--> statement-breakpoint
CREATE INDEX "idx_clusters_repo_status" ON "clusters" USING btree ("repo_id","status");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_issue_repo_number" ON "issues" USING btree ("repo_id","github_number");--> statement-breakpoint
CREATE INDEX "idx_issues_repo" ON "issues" USING btree ("repo_id");--> statement-breakpoint
CREATE UNIQUE INDEX "idx_pr_repo_number" ON "pull_requests" USING btree ("repo_id","github_number");--> statement-breakpoint
CREATE INDEX "idx_pr_repo_state" ON "pull_requests" USING btree ("repo_id","state");--> statement-breakpoint
CREATE INDEX "idx_vision_chunks_repo" ON "vision_chunks" USING btree ("repo_id");--> statement-breakpoint
CREATE INDEX "idx_pr_embedding" ON "pull_requests" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_issue_embedding" ON "issues" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_vision_embedding" ON "vision_chunks" USING hnsw ("embedding" vector_cosine_ops);--> statement-breakpoint
CREATE INDEX "idx_pr_staleness" ON "pull_requests" ("repo_id", "staleness_stage") WHERE "state" = 'open';
```

No newline at end of file
Put each table's indexes directly beneath that table's definition.
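This migration lists every index after all of the tables, so following this suggestion means regrouping statements. A minimal sketch of that regrouping, assuming a hypothetical `groupIndexesUnderTables` helper that splits on drizzle-kit's `--> statement-breakpoint` marker. Moving each index up to just after its `CREATE TABLE` is safe for the indexes in this file, since each depends only on its own table (and, for HNSW, on the `vector` extension, which is created first):

```typescript
const BREAKPOINT = "--> statement-breakpoint";

// Hypothetical helper: moves each CREATE [UNIQUE] INDEX directly beneath the
// CREATE TABLE of the table it indexes. All other statements (extensions,
// tables, ALTER TABLE ... FOREIGN KEY, etc.) keep their original order.
function groupIndexesUnderTables(sql: string): string {
  const statements = sql
    .split(BREAKPOINT)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const indexesByTable = new Map<string, string[]>();
  const others: string[] = [];
  for (const stmt of statements) {
    // Table name from: CREATE [UNIQUE] INDEX [IF NOT EXISTS] "name" ON "table"
    const idx = /^CREATE (?:UNIQUE )?INDEX\b[^;]*?\bON "([^"]+)"/m.exec(stmt);
    if (idx) {
      const list = indexesByTable.get(idx[1]) ?? [];
      list.push(stmt);
      indexesByTable.set(idx[1], list);
    } else {
      others.push(stmt);
    }
  }

  const ordered: string[] = [];
  for (const stmt of others) {
    ordered.push(stmt);
    const table = /^CREATE TABLE (?:IF NOT EXISTS )?"([^"]+)"/m.exec(stmt);
    if (table) {
      ordered.push(...(indexesByTable.get(table[1]) ?? []));
      indexesByTable.delete(table[1]);
    }
  }
  // Indexes on tables not created in this file go last, in original order.
  for (const rest of indexesByTable.values()) ordered.push(...rest);
  return ordered.join(`\n${BREAKPOINT}\n`);
}

const migration = [
  'CREATE TABLE "clusters" ("id" uuid PRIMARY KEY);',
  'CREATE TABLE "issues" ("id" uuid PRIMARY KEY);',
  'CREATE INDEX "idx_clusters_repo_status" ON "clusters" ("repo_id","status");',
  'CREATE INDEX "idx_issues_repo" ON "issues" ("repo_id");',
].join(BREAKPOINT);

console.log(groupIndexesUnderTables(migration));
```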
No need to write `USING btree`; that's the default.
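`USING btree` is redundant because B-tree is PostgreSQL's default index method, so dropping it leaves the index unchanged. A minimal sketch of a hypothetical `stripDefaultBtree` pass over the generated SQL, assuming the `USING btree (` spacing seen in this file; the HNSW indexes keep their `USING hnsw` clause:

```typescript
// Hypothetical cleanup pass over drizzle-kit output. B-tree is PostgreSQL's
// default index method, so "USING btree" can be dropped without changing the
// index; other methods such as "USING hnsw" are kept.
function stripDefaultBtree(sql: string): string {
  return sql.replace(/ USING btree(?= \()/g, "");
}

const before = 'CREATE INDEX "idx_issues_repo" ON "issues" USING btree ("repo_id");';
console.log(stripDefaultBtree(before));
// CREATE INDEX "idx_issues_repo" ON "issues" ("repo_id");
```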
```typescript
export function getDb(): Database {
  if (db) return db;

  const url = process.env["DATABASE_URL"];
```
DB credentials should not come from env; they should come from Docker secrets instead:
https://docs.docker.com/compose/how-tos/use-secrets/
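With Compose, each secret granted to a service is mounted as a file at `/run/secrets/<secret_name>` inside the container, so the connection code reads a file instead of `process.env`. A minimal sketch, assuming a hypothetical `readSecret` helper and an illustrative secret name `db_password` (neither is in this PR); the demo writes the secret to a temp directory so it runs outside a container. The returned value could then be passed as the `password` option to postgres-js.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: reads a Docker secret from the file Compose mounts for it.
function readSecret(name: string, secretsDir: string = "/run/secrets"): string {
  const path = join(secretsDir, name);
  try {
    // Secret files created with `echo` or an editor usually end in a newline.
    return readFileSync(path, "utf8").replace(/\r?\n$/, "");
  } catch (err) {
    throw new Error(`Docker secret "${name}" is not readable at ${path}: ${(err as Error).message}`);
  }
}

// Demo: simulate the mount with a temp directory instead of /run/secrets.
const demoDir = mkdtempSync(join(tmpdir(), "secrets-"));
writeFileSync(join(demoDir, "db_password"), "s3cr3t\n");
const password = readSecret("db_password", demoDir);
console.log(password.length); // 6
```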
```typescript
  max: 10,
  idle_timeout: 20,
  connect_timeout: 10,
```
All of this should come through config (env).
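One way to do this is to read the pool settings once at startup, keeping today's numbers as defaults and failing fast on bad input. A minimal sketch, assuming hypothetical `DB_POOL_MAX`, `DB_IDLE_TIMEOUT`, and `DB_CONNECT_TIMEOUT` variables; the returned object uses the postgres-js option names (`max`, `idle_timeout`, `connect_timeout`, timeouts in seconds) so it could be spread into the `postgres()` options. This sketch rejects 0; relax that if a zero value should be allowed.

```typescript
// Pool settings using postgres-js option names; timeouts are in seconds.
interface PoolConfig {
  max: number;
  idle_timeout: number;
  connect_timeout: number;
}

type Env = Record<string, string | undefined>;

// Reads a positive integer from env, falling back to a default when unset.
function readPositiveInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Hypothetical variable names; the defaults are the values currently hardcoded.
function loadPoolConfig(env: Env = process.env): PoolConfig {
  return {
    max: readPositiveInt(env, "DB_POOL_MAX", 10),
    idle_timeout: readPositiveInt(env, "DB_IDLE_TIMEOUT", 20),
    connect_timeout: readPositiveInt(env, "DB_CONNECT_TIMEOUT", 10),
  };
}

console.log(loadPoolConfig({ DB_POOL_MAX: "25" }));
```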
The `getDb` function, or anything else that obtains the DB connection, should be defined in one place and used everywhere.
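One way to honor this is a single module that owns the lazily created connection and exports the accessor plus a close function; every other module imports from it. A minimal, dependency-free sketch of that pattern, where `createSingleton` and the `Closable` interface are hypothetical and the factory stands in for the real postgres-js/Drizzle setup (a real factory would wrap the postgres-js client's `end()` as `close`):

```typescript
// Hypothetical sketch: one module owns the lazily created connection and
// exports the accessor and the close function; nothing else constructs a client.
interface Closable {
  close(): Promise<void>;
}

function createSingleton<T extends Closable>(factory: () => T) {
  let instance: T | undefined;
  return {
    get(): T {
      // Build on first use, then hand back the same instance every time.
      if (instance === undefined) instance = factory();
      return instance;
    },
    async close(): Promise<void> {
      if (instance === undefined) return;
      const current = instance;
      instance = undefined; // the next get() creates a fresh instance
      await current.close();
    },
  };
}

// In a real db package the factory would wrap postgres-js + Drizzle; a fake
// client that counts constructions keeps this runnable on its own.
let created = 0;
const connection = createSingleton(() => {
  created += 1;
  return { id: created, close: async () => {} };
});

const getDb = () => connection.get();
const closeDb = () => connection.close();
```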
Complete Drizzle ORM setup with pgvector integration:

- Enable pgvector extension and add vector(3072) embedding columns to pull_requests, issues, and vision_chunks tables
- Add HNSW indexes on all embedding vectors for ANN search
- Add connection pooling with postgres-js (max 10, idle timeout 20s)
- Add programmatic migration runner (tsx src/migrate.ts)
- Add drizzle-kit config for schema generation and studio
- Add partial index on pull_requests for open PR staleness queries
- Update db package exports to include connection utilities
- Fix core package.json exports (add default entry)
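The HNSW indexes are built with `vector_cosine_ops`, pgvector's operator class for the `<=>` cosine-distance operator, so ANN queries only use them when they `ORDER BY embedding <=> <query vector>`. For reference, a sketch of the distance being ranked; `cosineDistance` is an illustrative helper, not part of this PR:

```typescript
// Cosine distance as used by pgvector's <=> operator: 1 - cos(angle between a and b).
// Ranges from 0 (same direction) to 2 (opposite directions).
function cosineDistance(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // The angle is undefined for a zero vector; this sketch rejects it.
  if (normA === 0 || normB === 0) throw new Error("cosine distance is undefined for a zero vector");
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [0, 1])); // 1 (orthogonal)
console.log(cosineDistance([1, 2], [2, 4])); // ~0: magnitude is ignored, only direction counts
```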
- core: tests for normalizeRankingWeights and validateStalenessConfig
- core: tests for constants values and weight invariants
- db: tests for connection singleton, env validation, close behavior
- Add 85% coverage thresholds to core and db vitest configs
- Exclude migrate.ts from db coverage (auto-executing script)
8012504 to e3d22c6
Summary

- Enable pgvector extension and add `vector(3072)` embedding columns to `pull_requests`, `issues`, and `vision_chunks` tables
- Add connection pooling with `postgres-js` (max 10 connections, 20s idle timeout)
- Add programmatic migration runner (`tsx src/migrate.ts`) replacing `drizzle-kit migrate`
- Add `drizzle-kit` config for schema generation and Drizzle Studio
- Add partial index on `pull_requests(repo_id, staleness_stage) WHERE state = 'open'`

Schema (7 tables)

`repositories`, `pull_requests`, `issues`, `clusters`, `cluster_members`, `vision_chunks`, `actions_log`

TRD Phase 1 Deliverable

- Database schema and migrations (Drizzle ORM, pgvector extension)

Test plan

- Run `pnpm db:generate` and verify the migration SQL matches the schema
- Run `pnpm db:migrate` against a local pgvector-enabled Postgres