fix(internal_logs source): prevent silent drops and improve throughput #25218

Open
thomasqueirozb wants to merge 5 commits into master from internal-logs-drop

Conversation


thomasqueirozb (Contributor) commented Apr 17, 2026

Summary

Fixes issue #24220: the internal_logs source silently dropped events under high load.

  • Decouples broadcast consumption from downstream sending. A dedicated drain task pulls from the trace broadcast into a bounded intermediate queue; the main task batches from the queue and calls send_batch. This keeps the broadcast receiver drained while the sink is backpressured, and amortizes per-event overhead downstream.
  • Surfaces any remaining drops via the standard ComponentEventsDropped / component_discarded_events_total{intentional="false"} metric, replacing the previous silent BroadcastStreamRecvError::Lagged swallow.
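The decoupling described above can be modeled with std threads and channels. This is an illustrative sketch, not Vector's actual code: `drain_and_batch`, the channel types, and the counters are stand-ins for the tokio broadcast, the bounded intermediate queue, and `send_batch`.

```rust
use std::sync::mpsc;
use std::thread;

// Sketch of the drain + bounded queue + batching design. The drain task
// consumes the "broadcast" continuously; the main task batches from the
// bounded queue, amortizing per-event overhead downstream.
fn drain_and_batch(events: Vec<u32>, queue_cap: usize, max_batch: usize) -> (usize, u64) {
    // Bounded intermediate queue between the drain task and the batcher.
    let (queue_tx, queue_rx) = mpsc::sync_channel::<u32>(queue_cap);

    // Drain task: keeps the broadcast side consumed so it never lags; when
    // the bounded queue is full, the event is dropped and counted instead
    // of blocking the broadcast consumer.
    let drain = thread::spawn(move || {
        let mut dropped: u64 = 0;
        for event in events {
            if queue_tx.try_send(event).is_err() {
                dropped += 1; // real code would emit ComponentEventsDropped here
            }
        }
        dropped
    });

    // Main task: pull one event, then opportunistically fill a batch.
    let mut sent = 0usize;
    let mut batch = Vec::with_capacity(max_batch);
    while let Ok(event) = queue_rx.recv() {
        batch.push(event);
        while batch.len() < max_batch {
            match queue_rx.try_recv() {
                Ok(e) => batch.push(e),
                Err(_) => break,
            }
        }
        sent += batch.len(); // stand-in for send_batch(batch).await
        batch.clear();
    }

    let dropped = drain.join().expect("drain task panicked");
    (sent, dropped)
}

fn main() {
    let (sent, dropped) = drain_and_batch((0..1_000).collect(), 256, 64);
    // Every event is either delivered or counted as dropped; none vanish.
    assert_eq!(sent as u64 + dropped, 1_000);
    println!("sent={} dropped={}", sent, dropped);
}
```

The key property is that backpressure on the send side now translates into counted drops at the bounded queue rather than unbounded lag on the broadcast receiver.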

Benchmark

Both internal_logs and internal_metrics sources feed into their own sinks. The prometheus_exporter is scraped at the end of each 20s run to read component_received_events_total and component_discarded_events_total{intentional="false"} for the internal_logs source.

On master the drops are silently filtered. For the "master" rows below, master's into_stream() was temporarily patched to increment the same drop metric (without any tracing call, to avoid a feedback loop) so the numbers are comparable. That patch is not part of this PR.
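For context, tokio's broadcast receiver reports lag as an error variant carrying the number of skipped events. The patch's counting logic can be modeled with a stand-in enum (not Vector's or tokio-stream's actual types):

```rust
// Stand-in for tokio-stream's BroadcastStreamRecvError: the Lagged
// variant carries how many events the receiver skipped.
enum Recv {
    Event(u32),
    Lagged(u64),
}

// Tally received vs. dropped events. Stock master filters Lagged out
// entirely; the benchmark patch instead adds its count to a drop metric.
fn tally(stream: &[Recv]) -> (u64, u64) {
    let mut received: u64 = 0;
    let mut dropped: u64 = 0;
    for item in stream {
        match item {
            Recv::Event(_) => received += 1,
            Recv::Lagged(n) => dropped += *n, // count instead of swallowing
        }
    }
    (received, dropped)
}

fn main() {
    let stream = [Recv::Event(1), Recv::Lagged(5), Recv::Event(2), Recv::Lagged(2)];
    let (received, dropped) = tally(&stream);
    assert_eq!((received, dropped), (2, 7));
    println!("received={} dropped={}", received, dropped);
}
```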

Minimal config (from the issue: console sink)

api:
  enabled: true

sources:
  internal_logs:
    type: internal_logs
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 1

sinks:
  show_internal_logs:
    type: console
    inputs:
      - internal_logs
    encoding:
      codec: json
  prom:
    type: prometheus_exporter
    inputs:
      - internal_metrics
    address: 127.0.0.1:9598

Blackhole sink (isolates the source path from stdout/JSON costs)

api:
  enabled: true

sources:
  internal_logs:
    type: internal_logs
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 1

sinks:
  null_sink:
    type: blackhole
    inputs:
      - internal_logs
  prom:
    type: prometheus_exporter
    inputs:
      - internal_metrics
    address: 127.0.0.1:9598

Design comparison, console sink, VECTOR_LOG=trace, 20s

Isolates the source-path change. Buffer size is the broadcast capacity in src/trace.rs.

Design                      Broadcast buffer   Drops
Single-task loop (master)   99                 876,567
Single-task loop            10,000             848,510
Drain + batching            99                 353,217
Drain + batching            10,000             333,105

Buffer size made almost no difference (~6%) once the drain + batching path was in place, so the original 99 is retained.

Sink comparison, VECTOR_LOG=trace, 20s

Compares master (patched) vs this branch across the console sink (from the issue) and a blackhole sink (which isolates the source).

Keep in mind that on unpatched master the drop count reads 0 even though logs are being silently dropped.

Version            Sink        Received    Dropped    Total       Drop %
master (patched)   console     129,468     776,664    906,132     85.7%
master (patched)   blackhole   147,279     883,563    1,030,842   85.7%
this branch        console     395,443     353,870    749,313     47.2%
this branch        blackhole   1,524,445   0          1,524,445   0%

Interpretation:

  • On master the source itself is the bottleneck: coupling single-event send_event calls with broadcast consumption caps delivered throughput at ~7.4k events/sec (blackhole row) and drops ~86% of events even when the sink is free. The BroadcastStreamRecvError::Lagged path is silently filtered on stock master, so those drops aren't visible anywhere.
  • With the drain + batching design, the source can deliver ~76k events/sec (~10x higher delivered throughput, ~1.5x higher combined throughput) when the sink doesn't backpressure. On the console sink it still drops under trace because stdout + JSON encoding caps at ~20k events/sec, but those drops are now surfaced in metrics.
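The branch-side rates quoted above follow directly from the 20s totals in the sink-comparison table; a quick arithmetic check (no Vector code involved):

```rust
// Derive the approximate rates quoted in the interpretation from the
// 20-second totals in the sink-comparison table.
fn events_per_sec(total: u64, secs: u64) -> u64 {
    total / secs
}

fn main() {
    let blackhole = events_per_sec(1_524_445, 20); // ~76k events/sec delivered
    let console = events_per_sec(395_443, 20);     // ~20k events/sec, stdout-bound
    let speedup = 1_524_445.0 / 147_279.0;         // ~10x vs. patched master
    println!("{} {} {:.1}", blackhole, console, speedup);
}
```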

Under VECTOR_LOG=debug (normal load), both configs show zero drops.

How did you test this PR?

  • cargo nextest run --no-default-features --features sources-internal_logs --lib sources::internal_logs:: (all existing tests pass)
  • cargo vdev check events
  • make check-clippy
  • make check-fmt
  • Ran both configs at VECTOR_LOG=debug and VECTOR_LOG=trace, comparing component_received_events_total and component_discarded_events_total between master and this branch.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

github-actions (bot) added the domain: sources label Apr 17, 2026
thomasqueirozb added the source: internal_logs label Apr 17, 2026
thomasqueirozb changed the title from "fix(internal_logs source): surface broadcast lag and widen buffer to prevent silent drops" to "fix(internal_logs source): prevent silent drops and improve throughput" Apr 17, 2026
thomasqueirozb marked this pull request as ready for review April 17, 2026 22:21
thomasqueirozb requested a review from a team as a code owner April 17, 2026 22:21

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d93dad90d5

Comment thread: src/sources/internal_logs.rs

Labels

domain: sources Anything related to the Vector's sources source: internal_logs Anything `internal_logs` source related

Development

Successfully merging this pull request may close these issues.

internal_logs source silently drops logs under high load

1 participant