Eugene Lazutkin edited this page Jun 7, 2026 · 23 revisions

stream-json processes huge inputs, so a tiny per-item cost compounds: at one microsecond per operation, a billion operations cost ~16.7 minutes; at one millisecond, ~11.6 days. This page explains how to keep a pipeline on the fast path. The measured comparisons are in Benchmarks; the model behind the advice is in Concepts.
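
The arithmetic behind those figures can be checked in a few lines. This is only a worked calculation; the helper name is illustrative and not part of any library:

```js
// Total wall time for a fixed per-item cost, with the cost given in microseconds.
const SECONDS_PER_MINUTE = 60;
const SECONDS_PER_DAY = 24 * 60 * 60;

const totalSeconds = (items, microsecondsPerItem) => (items * microsecondsPerItem) / 1e6;

const billion = 1e9;
const atOneMicrosecond = totalSeconds(billion, 1); // 1,000 s
const atOneMillisecond = totalSeconds(billion, 1000); // 1,000,000 s

console.log((atOneMicrosecond / SECONDS_PER_MINUTE).toFixed(1), 'minutes'); // 16.7 minutes
console.log((atOneMillisecond / SECONDS_PER_DAY).toFixed(1), 'days'); // 11.6 days
```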

Measure, don't guess

Performance here is empirical. Two rules before you trust any number:

  • Benchmark your real shape. A sync, filter-heavy pipeline and an async, lookup-heavy one behave nothing alike; the same change can be a big win on one and a wash on the other. Measure the workload you actually run.
  • Synthetic numbers overstate real gains. An isolated micro-benchmark strips out the parsing, allocation, and I/O that share wall-time in a real pipeline. Treat any figure as directional and measure end to end before relying on it.
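
As a starting point, a minimal timing helper might look like the sketch below. The `timePerItem` helper, the sample data, and the `work` function are all hypothetical placeholders. Timing a single function like this is exactly the kind of isolated micro-benchmark the second rule warns about, so for a real decision wrap the whole pipeline run instead:

```js
// Times a per-item function over a whole batch and reports the per-item average.
// `work` and `items` are placeholders for your own stage and data.
function timePerItem(work, items, rounds = 5) {
  let best = Infinity;
  for (let round = 0; round < rounds; ++round) {
    const start = process.hrtime.bigint();
    for (const item of items) work(item);
    const elapsedNs = Number(process.hrtime.bigint() - start);
    best = Math.min(best, elapsedNs); // keep the least-disturbed round
  }
  return best / items.length; // nanoseconds per item
}

// hypothetical sample data shaped like streamer output: {key, value}
const items = Array.from({length: 100_000}, (_, key) => ({key, value: {price: key}}));
const nsPerItem = timePerItem(data => data.value.price * 1.08, items);
console.log(`${nsPerItem.toFixed(1)} ns per item`);
```

Taking the best of several rounds reduces noise from garbage collection and JIT warm-up; it does not make the number representative of a full pipeline.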

Keep the pipeline short

Every stage is a call, and every stream boundary is more than a call. Combine small filters and transforms into one function where you can:

import chain from 'stream-chain';

// fine-grained, but more stages than it needs
chain([
  source,
  data => (data.key % 2 ? data : chain.none), // filter
  data => (data.value.important ? data : chain.none), // filter
  data => data.value.price, // transform
  price => price * taxRate // transform
]);

// one function, same result
chain([source, data => (data.key % 2 && data.value.important ? data.value.price * taxRate : chain.none)]);

Return chain.none to drop an item from a function stage — the one drop signal that works everywhere (null only drops at a stream boundary; see stream-chain).

Reserve stream boundaries for stages that emit a varying number of items — that is where built-in backpressure earns its cost. Between plain per-item transforms, a function call is far cheaper than a stream.

Filter early, and cheaply

Less traffic downstream means a faster pipeline. Order filters so the cheap, high-rejection ones run first:

// important objects are rare; valid() is expensive

// worse: every item pays for valid() first
chain([source, data => (valid(data) ? data : chain.none), data => (data.value.important ? data : chain.none)]);

// better: the cheap, selective test rejects most items before valid() runs
chain([source, data => (data.value.important && valid(data) ? data : chain.none)]);
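
To see why the order matters, the sketch below counts how often the expensive check runs under each ordering. It uses plain arrays as a stand-in for the pipeline so it runs without the library; `valid`, the data shape, and the 1-in-100 ratio are assumptions made for illustration:

```js
// Plain-array stand-in for the two pipelines above: count how often the
// expensive check runs. `valid` and the data shape are hypothetical.
let validCalls = 0;
const valid = data => {
  ++validCalls; // stands in for an expensive validation
  return typeof data.value.price === 'number';
};

// 1,000 items, 1 in 100 marked important
const items = Array.from({length: 1000}, (_, key) => ({
  key,
  value: {important: key % 100 === 0, price: key}
}));

validCalls = 0;
const expensiveFirst = items.filter(data => valid(data)).filter(data => data.value.important);
const callsExpensiveFirst = validCalls; // every item pays for valid()

validCalls = 0;
const cheapFirst = items.filter(data => data.value.important && valid(data));
const callsCheapFirst = validCalls; // only the important items reach valid()

console.log(callsExpensiveFirst, callsCheapFirst, expensiveFirst.length === cheapFirst.length);
// 1000 10 true
```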

When you only need the first match, set {once: true} on a filter so it stops processing the stream after it matches — common with a string filter doing a direct path match.

Tune what the parser emits

Parser emits each scalar two ways by default — as streamed chunks and as a packed value — so any downstream component finds what it needs. If you consume only one form, turn the other off to cut traffic:

import {parser} from 'stream-json';

parser(); // startString, stringChunk..., endString, stringValue
parser({packValues: false}); // startString, stringChunk..., endString   (no packed value)
parser({streamValues: false}); // stringValue                            (no chunks)

Know what downstream needs first: filters need packed key values; Stringer needs chunks unless you switch it to packed values (below). When you do not need token events, use parser() directly instead of the main module, which decorates the parser with emit():

import makeParser, {parser} from 'stream-json';

makeParser(); // parser + emit() decoration
parser(); // just the parser, no event machinery

Assemble only what you need

Every streamer takes an objectFilter — a predicate run during assembly, so a rejected object is dropped before it is fully built. It pays off only when the deciding field arrives early; two caveats:

  • If the field you test usually comes last, the whole object is assembled before you can decide — no saving over filtering afterward.
  • The predicate runs on every update during assembly; if it is expensive, filtering after assembly can be cheaper.

import chain from 'stream-chain';
import {streamArray} from 'stream-json/streamers/stream-array.js';

// reject during assembly (when `important` appears early)
chain([
  source,
  streamArray({
    objectFilter: asm => (asm.current?.important === undefined ? undefined : asm.current.important)
  })
]);

// reject after assembly (when the decision needs the whole object)
chain([source, streamArray(), data => (data.value.important ? data : chain.none)]);
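
The first caveat can be illustrated with a toy simulation. This is not the library's Assembler: it adds keys one at a time and runs a predicate after each, assuming (as the example above suggests) that `undefined` means "undecided" and `false` means "reject". The function names are hypothetical, and an object left undecided at the end is simply treated as rejected here:

```js
// Toy simulation of filtering during assembly; this is not the library's
// Assembler. Keys are added one at a time and the predicate runs after each.
const objectFilter = partial => partial.important; // undefined until the key exists

function assembleWithFilter(entries, filter) {
  const partial = {};
  let keysBuilt = 0;
  let decision; // undefined = undecided
  for (const [key, value] of entries) {
    partial[key] = value;
    ++keysBuilt;
    if (decision === undefined) {
      decision = filter(partial);
      if (decision === false) return {accepted: false, keysBuilt}; // dropped early
    }
  }
  return {accepted: Boolean(decision), keysBuilt};
}

const early = [['important', false], ['a', 1], ['b', 2], ['c', 3]];
const late = [['a', 1], ['b', 2], ['c', 3], ['important', false]];

console.log(assembleWithFilter(early, objectFilter)); // { accepted: false, keysBuilt: 1 }
console.log(assembleWithFilter(late, objectFilter)); // { accepted: false, keysBuilt: 4 }
```

With the deciding field first, the object is dropped after one key; with it last, all four keys are built before the rejection, which is no cheaper than filtering afterward.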

When you drive the Assembler yourself, you can skip its consume() wrapper and dispatch tokens directly: data => asm[data.name]?.(data.value).
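
The dispatch pattern itself is plain JavaScript: look up a method named after the token and call it if it exists. The sketch below uses a hypothetical `DepthTracker` class in place of the Assembler, with hand-written tokens, purely to show how the optional call skips tokens that have no handler:

```js
// Toy token handler standing in for an assembler-like object: each method is
// named after the token it handles. The class and its methods are hypothetical.
class DepthTracker {
  depth = 0;
  maxDepth = 0;
  startObject() { this.maxDepth = Math.max(this.maxDepth, ++this.depth); }
  endObject() { --this.depth; }
  startArray() { this.maxDepth = Math.max(this.maxDepth, ++this.depth); }
  endArray() { --this.depth; }
}

const tracker = new DepthTracker();
// optional call: tokens without a matching method are skipped
const dispatch = data => tracker[data.name]?.(data.value);

// hand-written tokens for {"a": [1]}
const tokens = [
  {name: 'startObject'},
  {name: 'keyValue', value: 'a'},
  {name: 'startArray'},
  {name: 'numberValue', value: '1'},
  {name: 'endArray'},
  {name: 'endObject'}
];
tokens.forEach(dispatch);
console.log(tracker.depth, tracker.maxDepth); // 0 2
```

The lookup only works safely when token names cannot collide with non-method properties of the target object.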

Read and write files in one pass

For file input or output, the Node-only file edges — parseFile and stringerToFile — fold the file handles into the pipeline as its only I/O edges and fuse everything between into a single executor, with no per-item stream boundary. That is measurably faster than wiring createReadStream / createWriteStream with on('data') — see Benchmarks. They are backpressured end to end, so memory stays flat even with a file edge at both ends (see Concepts).

Pick the cheapest stage type

chain() accepts several kinds of stage. In rough order of per-item cost, cheapest first:

  1. plain functions
  2. async functions
  3. generator functions
  4. async generator functions
  5. Node streams
  6. Web streams

Prefer the leanest a stage actually needs. A plain function handles zero or one output (return the value, or chain.none to drop) and even bounded fan-out — return chain.many([...]), which is typically faster than a generator. Reach for a generator only when the fan-out is large or unbounded, where its laziness keeps memory flat (chain.many holds its outputs in an array). Use a stream only at a real I/O or backpressure edge. The ranking shifts between runtimes and versions, so if a stage is hot, measure the alternatives rather than trust the list.
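
The trade-off between eager and lazy fan-out can be sketched without the library. In the example below a plain array stands in for what you would wrap with chain.many inside a chain, and all function names and data shapes are hypothetical:

```js
// Bounded fan-out: build the outputs eagerly and return them together.
// (Inside a chain this array would be wrapped with chain.many; a plain array
// is used here so the sketch runs without the library.)
const splitPair = order => [order.buyer, order.seller];

// Large or unbounded fan-out: a generator produces one output at a time,
// so only the current item is held in memory.
let yielded = 0;
function* expandRange({from, to}) {
  for (let id = from; id < to; ++id) {
    ++yielded; // count how many outputs were actually produced
    yield id;
  }
}

console.log(splitPair({buyer: 'ann', seller: 'bob'})); // [ 'ann', 'bob' ]

// Only the items actually pulled are ever produced.
const firstThree = [];
for (const id of expandRange({from: 0, to: 1e9})) {
  firstThree.push(id);
  if (firstThree.length === 3) break;
}
console.log(firstThree, yielded); // [ 0, 1, 2 ] 3
```

An array of a billion outputs would have to exist in full before the first one moved downstream; the generator never builds it.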

Component notes

  • Replace / Ignore. Replace substitutes a matched value with tokens you supply, and can generate keys streamed (startKey / stringChunk / endKey) or packed (replace({streamKeys: false}) → keyValue). Ignore is Replace with an empty replacement. Keep the key/value style consistent across the pipeline.

  • Stringer. It serializes from value chunks by default, so it breaks if an upstream parser({streamValues: false}) removed them. Switch it to packed values with useValues (or useKeyValues / useStringValues / useNumberValues) to match — keep the whole pipeline on one value style.

  • Emitter / emit(). Convenience helpers; on a hot path, read tokens directly instead of attaching events:

    // instead of emit(pipeline) + pipeline.on('startObject', ...)
    pipeline.on('data', data => data.name === 'startObject' && count());

  • withParser(). The functional form drops into chain() with no extra boundary; withParserAsStream() wraps it in a Duplex for .pipe() and adds one. Prefer the functional form inside a chain.

JSONL

For line-delimited JSON, a dedicated splitter that JSON.parses each line is much faster than running the full tokenizer with parser({jsonStreaming: true}) + StreamValues. That component lives in stream-chain; stream-json's JSONL is a deprecated re-export of it.
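
The split-and-parse approach is simple enough to sketch in a few lines. The generator below is an illustration of the technique, not the stream-chain component; `parseJsonLines` is a hypothetical name, and it assumes the chunks are already decoded strings:

```js
// Minimal JSONL reader: buffers text across chunk boundaries, splits on
// newlines, and JSON.parses each complete line. A sketch of the approach,
// not the stream-chain component itself.
function* parseJsonLines(chunks) {
  let buffer = '';
  for (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim()) yield JSON.parse(line); // skip blank lines
    }
  }
  if (buffer.trim()) yield JSON.parse(buffer); // final line without a trailing newline
}

// chunk boundaries fall mid-line, as they would when reading a file
const chunks = ['{"id":1}\n{"i', 'd":2}\n', '\n{"id":3}'];
const records = [...parseJsonLines(chunks)];
console.log(records); // [ { id: 1 }, { id: 2 }, { id: 3 } ]
```

Each line goes straight to the engine's native JSON.parse, which is why this path avoids the per-token cost of the full tokenizer.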

Where to go next
