filter: migrate to ZSTD_compressStream2 (fix flush-promotion stall, worker spin, 131072 truncation)#52
filter: migrate to ZSTD_compressStream2 (fix flush-promotion stall, worker spin, 131072 truncation)#52hsw wants to merge 2 commits into
Conversation
Replace the three legacy streaming primitives (ZSTD_compressStream / ZSTD_flushStream / ZSTD_endStream) with the unified ZSTD_compressStream2(cctx, &out, &in, op) call. The action state machine still selects the op for this step (FLUSH -> ZSTD_e_flush, END -> ZSTD_e_end, default -> ZSTD_e_continue), so behaviour is unchanged; this is a mechanical swap that isolates the API migration from the state-machine retirement in the following commit. ZSTD_compressStream2 is the unified streaming API available since libzstd 1.4.0 and is the path libzstd's own documentation steers new code toward. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d 1.4.0 Builds on the previous ZSTD_compressStream2 swap to fix the flush-promotion stall reported in issue tokers#23 (mklooss, Stensel8, lowkeypriority). Root cause: the old action state machine only promoted COMPRESS -> FLUSH when libzstd reported pending output (rc > 0). For a small flush'd chunk that libzstd swallowed without emitting output (rc == 0), no flush op was ever issued -- the bytes stayed in libzstd's internal buffer and the worker stalled until the upstream connection closed. Symptoms: chunked / SSE / Upgrade responses delivered late, and a worker spinning at 100% CPU after a short read. Changes: 1. Per-call-op pattern (modelled on ngx_brotli). Replace the COMPRESS/FLUSH/END enum + redo flag with sticky ctx->last / ctx->flush flags set in add_data from the input buf's last_buf / flush, and derive the ZSTD_EndDirective per call (last -> ZSTD_e_end, flush -> ZSTD_e_flush, else ZSTD_e_continue). A finish op is complete when libzstd returns rc == 0. When a flush/end boundary has no pending bytes we now emit a separate zero-size sync buf so the boundary is visible to the write filter (a recycled temporary out_buf would be dropped by ngx_buf_special). 2. Zero-progress guard. In ZSTD_e_continue mode the encoder must make forward progress; if a call consumes no input and emits no output we abort instead of spinning the body-filter loop. This closes the worker-spin / infinite-loop class. 3. NULL+0 UB guard. Only advance in_buf->pos / out_buf->last when the delta is non-zero -- a flush-only call can leave buffer_in.src NULL, and NULL + 0 is undefined behaviour (UBSan-reported) even though the result is unused. 4. Modernize the CCtx setup. Use ZSTD_CCtx_reset(session_only) followed by ZSTD_CCtx_refCDict (dict case) or ZSTD_CCtx_setParameter(ZSTD_c_compressionLevel) unconditionally, dropping the legacy ZSTD_initCStream / ZSTD_initCStream_usingCDict path and the >= 1.5.0 version gate around the deprecated usingCDict call. 5. Require libzstd >= 1.4.0 (compile-time #error). ZSTD_compressStream2 has been the stable streaming API since 1.4.0 (Apr 2019); every currently supported distro ships >= 1.4.4. The static module is unaffected (it does not include zstd.h). Verified against an nginx.org-mainline build on Ubuntu 24.04 (libzstd 1.5.x): the worker-spin and proxy flush-promotion regressions that fail on the pre-migration baseline now pass, and the full compress / roundtrip / truncation / dict / http2 / websocket suite is green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the NGINX zstd body filter to use modern libzstd streaming APIs and fixes flush/end propagation when zstd produces no output for small inputs.
Changes:
- Require libzstd >= 1.4.0 and migrate streaming compression to
ZSTD_compressStream2(). - Replace the previous action/redo state machine with sticky
flush/lastflags and unified post-compress handling (including signal-only buffers). - Modernize cstream initialization using
ZSTD_CCtx_reset()+ZSTD_CCtx_refCDict()/ZSTD_CCtx_setParameter().
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #if ZSTD_VERSION_NUMBER < 10400 | ||
| #error "libzstd 1.4.0 or later required for ZSTD_compressStream2" | ||
| #endif |
|
hi @hsw I would like to invite you to https://github.com/eilandert/zstd-nginx-module to test the module and fix it more up. i fixed 36 bugs and did multiple optimizations, new directives and a load of new testing, (and fixed these bugs). I feel the only real thing i am missing is a second review and production testing before i can submit back. I have emailed the original author but he doesn't respond. |
|
Thanks, I already have my own fork with bug fixes and better memory management (auto window). |
filter: migrate to ZSTD_compressStream2 (fix flush-promotion stall, worker spin, 131072 truncation)
Summary
Migrate the compression filter from the legacy three-call streaming API
(
ZSTD_compressStream/ZSTD_flushStream/ZSTD_endStreamdriven by anactionstate machine) to libzstd's unifiedZSTD_compressStream2(cctx, &out, &in, op), modelled onngx_brotli'sper-call-op pattern.
This fixes three long-standing bugs at the root:
abnormal Content-Length.
than
ZSTD_CStreamInSize()(the bug filter: fix silent truncation at 131072 bytes for inputs >ZSTD_CStreamInSize #49 also targets).Upgradeflush-promotion stall (small flush'dchunks delivered late, only when the upstream connection closes).
Root cause
The body filter only promotes
COMPRESS -> FLUSHwhen libzstd reports pendingoutput:
Two failure modes fall out of this:
Stall / no flush — when a small flush'd chunk is swallowed by libzstd
without producing output (
rc == 0), noZSTD_flushStream()is everissued. The bytes sit in libzstd's internal buffer and the response stalls
until the upstream connection closes. Reported on chunked / SSE /
Upgraderesponses.
Infinite loop (bugfix: fix zstd module infinite loop when upstream return content-length abnormal #23) — when
add_datahands the encoder a zero-lengthinput buffer (NULL chain on a not-yet-finalized request) and
ZSTD_compressStreamreturns0, the loop makes no progress and neverbreaks — the worker spins at 100% CPU.
The 131072 truncation (#25) is the same family: the per-call bookkeeping around
the input buffer drops the remainder of an oversized buffer instead of draining
it.
The fix
Replace the action state machine with the per-call-op pattern from
ngx_brotli:Per-call op. Sticky
ctx->last/ctx->flushflags are set inadd_datafrom the input buf'slast_buf/flush, and theZSTD_EndDirectiveis derived per call (last -> ZSTD_e_end,flush -> ZSTD_e_flush, elseZSTD_e_continue). A finish op is completewhen libzstd returns
rc == 0. When a flush/end boundary carries no pendingbytes, a separate zero-size
syncbuf is emitted so the boundary is visibleto the write filter (a recycled temporary out_buf would be dropped by
ngx_buf_special).Zero-progress guard. In
ZSTD_e_continuemode the encoder must makeforward progress; if a call consumes no input and emits no output the
request is aborted instead of spinning the loop. This closes bugfix: fix zstd module infinite loop when upstream return content-length abnormal #23.
NULL+0 UB guard.
in_buf->pos/out_buf->lastare advanced only whenthe delta is non-zero — a flush-only call can leave
buffer_in.srcNULL,and
NULL + 0is undefined behaviour (UBSan-reported) even though theresult is unused.
Modernized CCtx setup.
ZSTD_CCtx_reset(session_only)followed byZSTD_CCtx_refCDict(dict case) orZSTD_CCtx_setParameter(ZSTD_c_compressionLevel), dropping the legacyZSTD_initCStream/ZSTD_initCStream_usingCDictpath and the>= 1.5.0version gate around the deprecated
usingCDictcall.Split into two commits — a mechanical
ZSTD_compressStream2swap, then thestate-machine retirement — so each step is reviewable and bisectable on its own.
Compatibility note
ZSTD_compressStream2has been the stable streaming API since libzstd 1.4.0(Apr 2019), so this raises the minimum to 1.4.0 via a compile-time
#error.Every currently supported distro ships >= 1.4.4 (Ubuntu 20.04 is 1.4.4,
22.04 is 1.4.8). The static
.zstmodule is unaffected — it does not includezstd.h.Testing
Verified against nginx.org-mainline builds on Ubuntu 22.04 / 24.04 / 26.04
(libzstd 1.4.8 / 1.5.5 / 1.5.7):
burns ~2000 ms CPU over a 2 s idle window after a short read); passes after
the change.
and HTTP/2): delivered within the TTFB budget after the change.
(131073 / 200000, and the unbuffered-proxy 131071 / 131073 / 200000) fail on
the baseline and pass after the change.
-DNGX_DEBUG_PALLOC=1): no diagnostics. Five-tool SAST sweep(scan-build / clang-tidy / cppcheck / gcc-fanalyzer / flawfinder): no new
findings versus the unmodified baseline.
This has been running in production in a downstream fork; the fork carries a
broader pytest + Docker regression matrix (multi-distro, ASan/UBSan, Valgrind,
SAST) that exercises the change end to end.
Relationship to existing PRs
special-casing the abnormal-Content-Length input.
narrower drain-loop patch on the legacy API.