fix(internal_logs source): prevent silent drops and improve throughput #25218

Open
thomasqueirozb wants to merge 5 commits into master from internal-logs-drop

Conversation


thomasqueirozb (Contributor) commented Apr 17, 2026

Summary

Fixes issue #24220: the internal_logs source silently dropped events under high load.

  • Decouples broadcast consumption from downstream sending. A dedicated drain task pulls from the trace broadcast into a bounded intermediate queue; the main task batches from the queue and calls send_batch. This keeps the broadcast receiver drained while the sink is backpressured, and amortizes per-event overhead downstream.
  • Surfaces any remaining drops via the standard ComponentEventsDropped / component_discarded_events_total{intentional="false"} metric, replacing the previous silent BroadcastStreamRecvError::Lagged swallow.
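The decoupling described above can be modeled with std threads and channels. This is an illustrative sketch, not Vector's actual code: `drain_and_batch`, the channel types, and the counters are stand-ins for the tokio broadcast, the bounded intermediate queue, and `send_batch`.

```rust
use std::sync::mpsc;
use std::thread;

// Sketch of the drain + bounded queue + batching design. The drain task
// consumes the "broadcast" continuously; the main task batches from the
// bounded queue, amortizing per-event overhead downstream.
fn drain_and_batch(events: Vec<u32>, queue_cap: usize, max_batch: usize) -> (usize, u64) {
    // Bounded intermediate queue between the drain task and the batcher.
    let (queue_tx, queue_rx) = mpsc::sync_channel::<u32>(queue_cap);

    // Drain task: keeps the broadcast side consumed so it never lags; when
    // the bounded queue is full, the event is dropped and counted instead
    // of blocking the broadcast consumer.
    let drain = thread::spawn(move || {
        let mut dropped: u64 = 0;
        for event in events {
            if queue_tx.try_send(event).is_err() {
                dropped += 1; // real code would emit ComponentEventsDropped here
            }
        }
        dropped
    });

    // Main task: pull one event, then opportunistically fill a batch.
    let mut sent = 0usize;
    let mut batch = Vec::with_capacity(max_batch);
    while let Ok(event) = queue_rx.recv() {
        batch.push(event);
        while batch.len() < max_batch {
            match queue_rx.try_recv() {
                Ok(e) => batch.push(e),
                Err(_) => break,
            }
        }
        sent += batch.len(); // stand-in for send_batch(batch).await
        batch.clear();
    }

    let dropped = drain.join().expect("drain task panicked");
    (sent, dropped)
}

fn main() {
    let (sent, dropped) = drain_and_batch((0..1_000).collect(), 256, 64);
    // Every event is either delivered or counted as dropped; none vanish.
    assert_eq!(sent as u64 + dropped, 1_000);
    println!("sent={} dropped={}", sent, dropped);
}
```

The key property is that backpressure on the send side now translates into counted drops at the bounded queue rather than unbounded lag on the broadcast receiver.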

Benchmark

Both internal_logs and internal_metrics sources feed into their own sinks. The prometheus_exporter is scraped at the end of each 20s run to read component_received_events_total and component_discarded_events_total{intentional="false"} for the internal_logs source.

On master the drops are silently filtered. For the "master" rows below, master's into_stream() was temporarily patched to increment the same drop metric (without any tracing call, to avoid a feedback loop) so the numbers are comparable. That patch is not part of this PR.
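For context, tokio's broadcast receiver reports lag as an error variant carrying the number of skipped events. The patch's counting logic can be modeled with a stand-in enum (not Vector's or tokio-stream's actual types):

```rust
// Stand-in for tokio-stream's BroadcastStreamRecvError: the Lagged
// variant carries how many events the receiver skipped.
enum Recv {
    Event(u32),
    Lagged(u64),
}

// Tally received vs. dropped events. Stock master filters Lagged out
// entirely; the benchmark patch instead adds its count to a drop metric.
fn tally(stream: &[Recv]) -> (u64, u64) {
    let mut received: u64 = 0;
    let mut dropped: u64 = 0;
    for item in stream {
        match item {
            Recv::Event(_) => received += 1,
            Recv::Lagged(n) => dropped += *n, // count instead of swallowing
        }
    }
    (received, dropped)
}

fn main() {
    let stream = [Recv::Event(1), Recv::Lagged(5), Recv::Event(2), Recv::Lagged(2)];
    let (received, dropped) = tally(&stream);
    assert_eq!((received, dropped), (2, 7));
    println!("received={} dropped={}", received, dropped);
}
```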

Minimal config (from the issue: console sink)

api:
  enabled: true

sources:
  internal_logs:
    type: internal_logs
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 1

sinks:
  show_internal_logs:
    type: console
    inputs:
      - internal_logs
    encoding:
      codec: json
  prom:
    type: prometheus_exporter
    inputs:
      - internal_metrics
    address: 127.0.0.1:9598

Blackhole sink (isolates the source path from stdout/JSON costs)

api:
  enabled: true

sources:
  internal_logs:
    type: internal_logs
  internal_metrics:
    type: internal_metrics
    scrape_interval_secs: 1

sinks:
  null_sink:
    type: blackhole
    inputs:
      - internal_logs
  prom:
    type: prometheus_exporter
    inputs:
      - internal_metrics
    address: 127.0.0.1:9598

Design comparison, console sink, VECTOR_LOG=trace, 20s

Isolates the source-path change. Buffer size is the broadcast capacity in src/trace.rs.

Design                      Broadcast buffer   Drops
Single-task loop (master)   99                 876,567
Single-task loop            10,000             848,510
Drain + batching            99                 353,217
Drain + batching            10,000             333,105

Buffer size made almost no difference (~6%) once the drain + batching path was in place, so the original 99 is retained.

Sink comparison, VECTOR_LOG=trace, 20s

Compares master (patched) vs this branch across the console sink (from the issue) and a blackhole sink (which isolates the source).

Keep in mind that on unpatched master the drop count reads 0 even though logs are being silently dropped.

Version            Sink        Received    Dropped    Total       Drop %
master (patched)   console     129,468     776,664    906,132     85.7%
master (patched)   blackhole   147,279     883,563    1,030,842   85.7%
this branch        console     395,443     353,870    749,313     47.2%
this branch        blackhole   1,524,445   0          1,524,445   0%

Interpretation:

  • On master the source itself is the bottleneck: coupling single-event send_event calls with broadcast consumption caps delivered throughput at ~7.4k events/sec (blackhole row) and drops ~86% of events even when the sink is free. The BroadcastStreamRecvError::Lagged path is silently filtered on stock master, so those drops aren't visible anywhere.
  • With the drain + batching design, the source can deliver ~76k events/sec (~10x higher delivered throughput, ~1.5x higher combined throughput) when the sink doesn't backpressure. On the console sink it still drops under trace because stdout + JSON encoding caps at ~20k events/sec, but those drops are now surfaced in metrics.
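The branch-side rates quoted above follow directly from the 20s totals in the sink-comparison table; a quick arithmetic check (no Vector code involved):

```rust
// Derive the approximate rates quoted in the interpretation from the
// 20-second totals in the sink-comparison table.
fn events_per_sec(total: u64, secs: u64) -> u64 {
    total / secs
}

fn main() {
    let blackhole = events_per_sec(1_524_445, 20); // ~76k events/sec delivered
    let console = events_per_sec(395_443, 20);     // ~20k events/sec, stdout-bound
    let speedup = 1_524_445.0 / 147_279.0;         // ~10x vs. patched master
    println!("{} {} {:.1}", blackhole, console, speedup);
}
```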

Under VECTOR_LOG=debug (normal load), both configs show zero drops.

How did you test this PR?

  • cargo nextest run --no-default-features --features sources-internal_logs --lib sources::internal_logs:: (all existing tests pass)
  • cargo vdev check events
  • make check-clippy
  • make check-fmt
  • Ran both configs at VECTOR_LOG=debug and VECTOR_LOG=trace, comparing component_received_events_total and component_discarded_events_total between master and this branch.

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

github-actions (bot) added the domain: sources label Apr 17, 2026
thomasqueirozb added the source: internal_logs label Apr 17, 2026
thomasqueirozb changed the title from "fix(internal_logs source): surface broadcast lag and widen buffer to prevent silent drops" to "fix(internal_logs source): prevent silent drops and improve throughput" Apr 17, 2026
thomasqueirozb marked this pull request as ready for review April 17, 2026 22:21
thomasqueirozb requested a review from a team as a code owner April 17, 2026 22:21

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d93dad90d5

Comment thread: src/sources/internal_logs.rs

Labels

domain: sources Anything related to the Vector's sources source: internal_logs Anything `internal_logs` source related

Development

Successfully merging this pull request may close these issues.

internal_logs source silently drops logs under high load

1 participant