Skip to content

fix(db): add max_lifetime to prevent zombie connections causing CONNECT_TIMEOUT#1007

Closed
NieiR wants to merge 4 commits intoding113:devfrom
NieiR:fix/db-zombie-connections
Closed

fix(db): add max_lifetime to prevent zombie connections causing CONNECT_TIMEOUT#1007
NieiR wants to merge 4 commits intoding113:devfrom
NieiR:fix/db-zombie-connections

Conversation

@NieiR
Copy link
Copy Markdown
Contributor

@NieiR NieiR commented Apr 10, 2026

问题描述

在高并发或长时间运行的 Docker 环境中,postgres.js 连接池中的连接可能被网络层(Docker bridge/防火墙)静默断开,但应用层仍认为连接有效。当复用这些"僵尸连接"时,会导致 write CONNECT_TIMEOUT 错误,表现为:

  • SSR 页面(如 /zh-CN/login)TTFB 高达 20 秒
  • 日志中出现大量 write CONNECT_TIMEOUT postgres:5432 错误
  • API 端点(如 /api/actions/health)正常,仅 SSR 受影响

根因分析

postgres.js 默认 max_lifetime = 0(连接永不重建)。Docker 网络对空闲 TCP 连接有隐形超时(通常 10-15 分钟),当连接被静默断开后,应用层 socket 仍显示"已连接"。查询时 write() 到已死 socket,触发 10 秒 connect_timeout 后失败。

修复方案

添加 max_lifetime: 30 * 60(30分钟),强制连接定期重建,避免超过 Docker 网络隐形超时。

const client = postgres(connectionString, {
  max: env.DB_POOL_MAX ?? defaultMax,
  idle_timeout: env.DB_POOL_IDLE_TIMEOUT ?? 20,
  connect_timeout: env.DB_POOL_CONNECT_TIMEOUT ?? 10,
  max_lifetime: 30 * 60,  // 30分钟后强制重建
  prepare: false,          // 禁用 prepared statements 避免复用问题
});

验证

  • 部署后 20 分钟,CONNECT_TIMEOUT 错误:0
  • SSR TTFB:~70ms(正常)
  • 连接池状态:3 个 idle 连接,正常重建

检查清单

  • 目标分支为 dev
  • 运行 bun run lint 通过
  • 运行 bun run typecheck 通过
  • Docker Build Test 通过(等待 CI)
  • 与 main 无直接冲突

Greptile Summary

This PR fixes zombie connections in Docker environments by adding max_lifetime: 1800 to the postgres.js pool config, ensuring connections are recycled every 30 minutes before Docker's network layer silently kills them. It also adds prepare: false to disable prepared statements. Tests are updated to assert the new config values.

Confidence Score: 5/5

Safe to merge — the core fix is correct and well-tested; previously raised P2 concerns have been discussed in prior threads

No new P0 or P1 findings. The max_lifetime fix directly addresses the described zombie-connection root cause, tests cover all three pool config scenarios, and the .dockerignore change is harmless. Prior thread concerns are P2 level and have already been surfaced to the author.

No files require special attention

Important Files Changed

Filename Overview
src/drizzle/db.ts Adds max_lifetime: 1800 and prepare: false to the postgres.js client config; both values are hardcoded (unlike other pool options which are env-overridable)
tests/unit/drizzle/db-pool-config.test.ts Tests updated to assert max_lifetime: 1800 and prepare: false in all three test scenarios; coverage is adequate for the new config
.dockerignore Adds .worktrees to prevent local git worktree directories from being included in Docker build context

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Incoming Query] --> B{Pool has idle connection?}
    B -- Yes --> C{Connection age > max_lifetime 1800s?}
    C -- No --> D[Reuse connection]
    C -- Yes --> E[Gracefully close & create new connection]
    E --> D
    B -- No --> F{Pool size < max?}
    F -- Yes --> G[Create new connection]
    G --> D
    F -- No --> H[Wait in queue]
    H --> D
    D --> I{Connection idle > idle_timeout 20s?}
    I -- Yes --> J[Close connection, remove from pool]
    I -- No --> K[Return to pool]
Loading

Reviews (2): Last reviewed commit: "chore: format code (fix-db-zombie-connec..." | Re-trigger Greptile

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Walkthrough

在 Docker 构建忽略列表中新增 .worktrees,并在 Postgres 客户端配置中加入 max_lifetime: 30 * 60prepare: false,同时更新相应单元测试以匹配这些新增配置项。

Changes

Cohort / File(s) Summary
Docker 忽略配置
./.dockerignore
.worktrees 添加到 Docker 构建上下文的忽略列表。
数据库连接与测试
src/drizzle/db.ts, tests/unit/drizzle/db-pool-config.test.ts
向 Postgres 连接配置添加 max_lifetime: 30 * 60prepare: false;相应单元测试增加对这两个字段的断言。

Estimated code review effort

🎯 3 (中等复杂度) | ⏱️ ~20 分钟

Possibly related PRs

  • fix: lack of env while startup #872 — 涉及数据库连接池配置(为相同环境变量提供默认值),与本次为 Postgres 客户端添加固定选项的改动在配置表面相关。
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed 标题清晰准确地总结了主要变更:添加 max_lifetime 参数来防止僵尸连接导致的 CONNECT_TIMEOUT 错误。
Description check ✅ Passed PR 描述详细解释了问题、根因和修复方案,与变更集直接相关。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Add max_lifetime: 30 minutes to force connection recreation,
preventing 'write CONNECT_TIMEOUT' errors caused by silently
disconnected TCP sockets in the connection pool.

Also disable prepared statements to avoid pool reuse issues.

Fixes #ISSUE
@NieiR NieiR force-pushed the fix/db-zombie-connections branch from 84f6887 to e0c0806 Compare April 10, 2026 04:56
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds .worktrees to the .dockerignore file and updates the database connection pool configuration in src/drizzle/db.ts to prevent zombie connections by setting a maximum lifetime and disabling prepared statements. Feedback suggests making these new parameters configurable through environment variables to ensure consistency with existing settings and to prevent potential logic conflicts with the idle timeout schema.

@github-actions github-actions bot added bug Something isn't working area:core size/XS Extra Small PR (< 50 lines) labels Apr 10, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Summary

This PR addresses a legitimate production issue with zombie database connections. The fix is targeted but introduces minor inconsistencies with project conventions.

PR Size: XS

  • Lines changed: 9
  • Files changed: 2

Issues Found

Category Critical High Medium Low
Logic/Bugs 0 0 0 0
Security 0 0 0 0
Error Handling 0 0 0 0
Types 0 0 0 0
Comments/Docs 0 0 0 0
Tests 0 0 1 0
Standards 0 0 1 0

High Priority Issues (Should Fix)

1. [TEST-INCOMPLETE] New connection pool options not verified (Confidence: 85)

File: tests/unit/drizzle/db-pool-config.test.ts

The new max_lifetime and prepare options are not asserted in existing tests. Update test assertions:

expect(postgresMock).toHaveBeenCalledWith(
  process.env.DSN,
  expect.objectContaining({
    max: 20,
    idle_timeout: 20,
    connect_timeout: 10,
    max_lifetime: 1800,  // Add
    prepare: false,       // Add
  })
);

2. [STANDARD-VIOLATION] Hardcoded value breaks env pattern (Confidence: 90)

File: src/drizzle/db.ts:31, src/lib/config/env.schema.ts

All other pool settings support env overrides. The hardcoded max_lifetime: 30 * 60 is inconsistent. Consider adding DB_POOL_MAX_LIFETIME env variable:

// env.schema.ts
DB_POOL_MAX_LIFETIME: optionalNumber(z.number().int().min(60).max(7200)),

// db.ts
max_lifetime: env.DB_POOL_MAX_LIFETIME ?? 1800,

Review Coverage

  • Logic and correctness - Fix addresses zombie connections appropriately
  • Security (OWASP Top 10) - No security issues identified
  • Error handling - No changes to error handling patterns
  • Type safety - No new types introduced
  • Documentation accuracy - Comments explain the fix well
  • Test coverage - Gap in test assertions for new options
  • Code clarity - Clear intent and commenting

Automated review by Claude AI

Add max_lifetime: 30 minutes to force connection recreation,
preventing 'write CONNECT_TIMEOUT' errors caused by silently
disconnected TCP sockets in the connection pool.

Also disable prepared statements to avoid pool reuse issues.

Fixes #ISSUE
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/drizzle/db.ts`:
- Line 30: The hardcoded prepare: false in the DB connection config should be
controlled by an env var; change the connection option (the prepare setting used
when creating the Postgres.js/Drizzle client) to read from an environment
variable like DB_PREPARE_STATEMENTS (default true), so prepared statements
remain enabled unless DB_PREPARE_STATEMENTS is explicitly set to "false" for
special PgBouncer/transaction-mode cases; update the inline comment around the
prepare key to document that disabling prepared statements is a
PgBouncer-specific workaround (e.g., for old PgBouncer transaction mode) and not
the default best practice.
- Around line 25-29: The patch fixes a problem where the Postgres client options
hard-code max_lifetime to 30 minutes which defeats the library's intended
randomized connection TTL; update the postgres(...) options in the client
initialization (the const client creation) to stop forcing max_lifetime to a
fixed value—either remove the max_lifetime entry to preserve the library's
default randomized TTL behavior or make it configurable via an environment
variable (e.g. env.DB_MAX_LIFETIME) with clear documentation and a sensible
default that only overrides the library if explicitly set; ensure references to
max_lifetime in the options are updated and document the change in comments
where the client is created.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9ff80ca2-c858-4e55-a3a6-c23950cc1201

📥 Commits

Reviewing files that changed from the base of the PR and between b5eb7d5 and 7a17583.

📒 Files selected for processing (2)
  • .dockerignore
  • src/drizzle/db.ts

NieiR

This comment was marked as resolved.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 10, 2026

Tip:

Greploop — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.

Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.

NieiR

This comment was marked as resolved.

@NieiR NieiR closed this Apr 10, 2026
@github-project-automation github-project-automation bot moved this from Backlog to Done in Claude Code Hub Roadmap Apr 10, 2026
@NieiR NieiR deleted the fix/db-zombie-connections branch April 11, 2026 00:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:core bug Something isn't working size/XS Extra Small PR (< 50 lines)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant