Skip to content

filter: migrate to ZSTD_compressStream2 (fix flush-promotion stall, worker spin, 131072 truncation)#52

Open
hsw wants to merge 2 commits into
tokers:masterfrom
hsw:pr/compressstream2
Open

filter: migrate to ZSTD_compressStream2 (fix flush-promotion stall, worker spin, 131072 truncation)#52
hsw wants to merge 2 commits into
tokers:masterfrom
hsw:pr/compressstream2

Conversation

@hsw

@hsw hsw commented May 31, 2026

Copy link
Copy Markdown

filter: migrate to ZSTD_compressStream2 (fix flush-promotion stall, worker spin, 131072 truncation)

Summary

Migrate the compression filter from the legacy three-call streaming API
(ZSTD_compressStream / ZSTD_flushStream / ZSTD_endStream driven by an
action state machine) to libzstd's unified
ZSTD_compressStream2(cctx, &out, &in, op), modelled on ngx_brotli's
per-call-op pattern.

This fixes three long-standing bugs at the root:

Root cause

The body filter only promotes COMPRESS -> FLUSH when libzstd reports pending
output:

if (rc > 0) {
    if (ctx->action == NGX_HTTP_ZSTD_FILTER_COMPRESS) {
        ctx->action = NGX_HTTP_ZSTD_FILTER_FLUSH;
    }
    ...

Two failure modes fall out of this:

  1. Stall / no flush — when a small flush'd chunk is swallowed by libzstd
    without producing output (rc == 0), no ZSTD_flushStream() is ever
    issued. The bytes sit in libzstd's internal buffer and the response stalls
    until the upstream connection closes. Reported on chunked / SSE / Upgrade
    responses.

  2. Infinite loop (bugfix: fix zstd module infinite loop when upstream return content-length abnormal #23) — when add_data hands the encoder a zero-length
    input buffer (NULL chain on a not-yet-finalized request) and
    ZSTD_compressStream returns 0, the loop makes no progress and never
    breaks — the worker spins at 100% CPU.

The 131072 truncation (#25) is the same family: the per-call bookkeeping around
the input buffer drops the remainder of an oversized buffer instead of draining
it.

The fix

Replace the action state machine with the per-call-op pattern from
ngx_brotli:

  1. Per-call op. Sticky ctx->last / ctx->flush flags are set in
    add_data from the input buf's last_buf / flush, and the
    ZSTD_EndDirective is derived per call (last -> ZSTD_e_end,
    flush -> ZSTD_e_flush, else ZSTD_e_continue). A finish op is complete
    when libzstd returns rc == 0. When a flush/end boundary carries no pending
    bytes, a separate zero-size sync buf is emitted so the boundary is visible
    to the write filter (a recycled temporary out_buf would be dropped by
    ngx_buf_special).

  2. Zero-progress guard. In ZSTD_e_continue mode the encoder must make
    forward progress; if a call consumes no input and emits no output the
    request is aborted instead of spinning the loop. This closes bugfix: fix zstd module infinite loop when upstream return content-length abnormal #23.

  3. NULL+0 UB guard. in_buf->pos / out_buf->last are advanced only when
    the delta is non-zero — a flush-only call can leave buffer_in.src NULL,
    and NULL + 0 is undefined behaviour (UBSan-reported) even though the
    result is unused.

  4. Modernized CCtx setup. ZSTD_CCtx_reset(session_only) followed by
    ZSTD_CCtx_refCDict (dict case) or
    ZSTD_CCtx_setParameter(ZSTD_c_compressionLevel), dropping the legacy
    ZSTD_initCStream / ZSTD_initCStream_usingCDict path and the >= 1.5.0
    version gate around the deprecated usingCDict call.

Split into two commits — a mechanical ZSTD_compressStream2 swap, then the
state-machine retirement — so each step is reviewable and bisectable on its own.

Compatibility note

ZSTD_compressStream2 has been the stable streaming API since libzstd 1.4.0
(Apr 2019), so this raises the minimum to 1.4.0 via a compile-time #error.
Every currently supported distro ships >= 1.4.4 (Ubuntu 20.04 is 1.4.4,
22.04 is 1.4.8). The static .zst module is unaffected — it does not include
zstd.h.

Testing

Verified against nginx.org-mainline builds on Ubuntu 22.04 / 24.04 / 26.04
(libzstd 1.4.8 / 1.5.5 / 1.5.7):

  • Worker spin / no-hang (bugfix: fix zstd module infinite loop when upstream return content-length abnormal #23): fails on the pre-migration baseline (worker
    burns ~2000 ms CPU over a 2 s idle window after a short read); passes after
    the change.
  • Proxy flush-promotion (chunked / chunked-off-tiny / SSE / Upgrade, HTTP/1.1
    and HTTP/2):
    delivered within the TTFB budget after the change.
  • 131072 truncation (endless loop  #25): the >128 KB roundtrip cases
    (131073 / 200000, and the unbuffered-proxy 131071 / 131073 / 200000) fail on
    the baseline and pass after the change.
  • Full compress / roundtrip / dict / HTTP/2-multiplex / WebSocket suite green.
  • ASan + UBSan (-DNGX_DEBUG_PALLOC=1): no diagnostics. Five-tool SAST sweep
    (scan-build / clang-tidy / cppcheck / gcc-fanalyzer / flawfinder): no new
    findings versus the unmodified baseline.

This has been running in production in a downstream fork; the fork carries a
broader pytest + Docker regression matrix (multi-distro, ASan/UBSan, Valgrind,
SAST) that exercises the change end to end.

Relationship to existing PRs

hsw and others added 2 commits May 31, 2026 13:28
Replace the three legacy streaming primitives (ZSTD_compressStream /
ZSTD_flushStream / ZSTD_endStream) with the unified
ZSTD_compressStream2(cctx, &out, &in, op) call. The action state machine
still selects the op for this step (FLUSH -> ZSTD_e_flush, END ->
ZSTD_e_end, default -> ZSTD_e_continue), so behaviour is unchanged; this
is a mechanical swap that isolates the API migration from the
state-machine retirement in the following commit.

ZSTD_compressStream2 is the unified streaming API available since libzstd
1.4.0 and is the path libzstd's own documentation steers new code toward.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d 1.4.0

Builds on the previous ZSTD_compressStream2 swap to fix the flush-promotion
stall reported in issue tokers#23 (mklooss, Stensel8, lowkeypriority).

Root cause: the old action state machine only promoted COMPRESS -> FLUSH
when libzstd reported pending output (rc > 0). For a small flush'd chunk
that libzstd swallowed without emitting output (rc == 0), no flush op was
ever issued -- the bytes stayed in libzstd's internal buffer and the worker
stalled until the upstream connection closed. Symptoms: chunked / SSE /
Upgrade responses delivered late, and a worker spinning at 100% CPU after a
short read.

Changes:

1. Per-call-op pattern (modelled on ngx_brotli). Replace the
   COMPRESS/FLUSH/END enum + redo flag with sticky ctx->last / ctx->flush
   flags set in add_data from the input buf's last_buf / flush, and derive
   the ZSTD_EndDirective per call (last -> ZSTD_e_end, flush ->
   ZSTD_e_flush, else ZSTD_e_continue). A finish op is complete when
   libzstd returns rc == 0. When a flush/end boundary has no pending bytes
   we now emit a separate zero-size sync buf so the boundary is visible to
   the write filter (a recycled temporary out_buf would be dropped by
   ngx_buf_special).

2. Zero-progress guard. In ZSTD_e_continue mode the encoder must make
   forward progress; if a call consumes no input and emits no output we
   abort instead of spinning the body-filter loop. This closes the
   worker-spin / infinite-loop class.

3. NULL+0 UB guard. Only advance in_buf->pos / out_buf->last when the
   delta is non-zero -- a flush-only call can leave buffer_in.src NULL, and
   NULL + 0 is undefined behaviour (UBSan-reported) even though the result
   is unused.

4. Modernize the CCtx setup. Use ZSTD_CCtx_reset(session_only) followed by
   ZSTD_CCtx_refCDict (dict case) or
   ZSTD_CCtx_setParameter(ZSTD_c_compressionLevel) unconditionally, dropping
   the legacy ZSTD_initCStream / ZSTD_initCStream_usingCDict path and the
   >= 1.5.0 version gate around the deprecated usingCDict call.

5. Require libzstd >= 1.4.0 (compile-time #error). ZSTD_compressStream2 has
   been the stable streaming API since 1.4.0 (Apr 2019); every currently
   supported distro ships >= 1.4.4. The static module is unaffected (it does
   not include zstd.h).

Verified against an nginx.org-mainline build on Ubuntu 24.04 (libzstd
1.5.x): the worker-spin and proxy flush-promotion regressions that fail on
the pre-migration baseline now pass, and the full compress / roundtrip /
truncation / dict / http2 / websocket suite is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 31, 2026 11:28

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates the NGINX zstd body filter to use modern libzstd streaming APIs and fixes flush/end propagation when zstd produces no output for small inputs.

Changes:

  • Require libzstd >= 1.4.0 and migrate streaming compression to ZSTD_compressStream2().
  • Replace the previous action/redo state machine with sticky flush/last flags and unified post-compress handling (including signal-only buffers).
  • Modernize cstream initialization using ZSTD_CCtx_reset() + ZSTD_CCtx_refCDict() / ZSTD_CCtx_setParameter().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread filter/ngx_http_zstd_filter_module.c
Comment thread filter/ngx_http_zstd_filter_module.c
Comment on lines +14 to +16
#if ZSTD_VERSION_NUMBER < 10400
#error "libzstd 1.4.0 or later required for ZSTD_compressStream2"
#endif
@eilandert

Copy link
Copy Markdown
Contributor

hi @hsw

I would like to invite you to https://github.com/eilandert/zstd-nginx-module to test the module and fix it more up. i fixed 36 bugs and did multiple optimizations, new directives and a load of new testing, (and fixed these bugs). I feel the only real thing i am missing is a second review and production testing before i can submit back.

I have emailed the original author but he doesn't respond.

@hsw

hsw commented Jun 2, 2026

Copy link
Copy Markdown
Author

Thanks, I already have my own fork with bug fixes and better memory management (auto window).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

endless loop

3 participants