62 changes: 62 additions & 0 deletions changelogs/4.0.0.md
@@ -0,0 +1,62 @@
---
version: "4.0.0"
date: "2026-03-28"
type: "major"
title: "Major Release with checkv3 Protocol, Built-in Fasttext, Multi-Flag Fuzzy Hashes, and Ring Hash Consistent Hashing"
---

## Breaking Changes

- **Bayes per-user resharding**: Jump Hash has been replaced with Ring Hash (Ketama) for consistent upstream hashing, so per-user Bayes data on sharded Redis deployments will end up on the wrong shards after the upgrade. Run `rspamadm statistics_dump migrate` before upgrading. Single-server deployments are unaffected. ([#5914](https://github.com/rspamd/rspamd/pull/5914), [4ea7504](https://github.com/rspamd/rspamd/commit/4ea750466))
- **Content URLs included by default**: `include_content_urls` now defaults to `true`; URLs extracted from PDF and computed parts are returned by `task:get_urls()` by default, which may trigger new symbol hits on messages with PDF attachments. Restore old behavior with `include_content_urls = false` in `local.d/options.inc`. ([#5853](https://github.com/rspamd/rspamd/pull/5853))
- **SSL worker option removed**: The `ssl = true` worker option has been removed; SSL is now auto-detected from bind socket flags. Remove `ssl = true` from worker configs and use the `ssl` suffix on bind lines instead. ([#5884](https://github.com/rspamd/rspamd/pull/5884))
- **Proxy load balancing default changed**: Token bucket load balancing is now enabled by default for proxy upstreams, replacing simple round-robin. Remove the `token_bucket` key from proxy upstream config to restore round-robin behavior. ([#5874](https://github.com/rspamd/rspamd/pull/5874))
- **SenderScore RBLs disabled by default**: `senderscore_reputation` is disabled by default as it requires a MyValidity account and was returning blocked results for all unregistered IPs. Users with registered accounts must explicitly re-enable the rule. ([#5907](https://github.com/rspamd/rspamd/pull/5907))
- **DKIM unknown key handling per RFC**: Unknown and broken DKIM keys are now handled strictly per RFC, which may change DKIM results for messages with malformed keys. ([e9e6bac](https://github.com/rspamd/rspamd/commit/e9e6bac43))
- **Suspicious TLDs now map-based**: The hardcoded suspicious TLD list has been replaced with `conf/maps.d/suspicious_tlds.inc`. Customize by creating `local.d/maps.d/suspicious_tlds.inc` (override) or `local.d/maps.d/suspicious_tlds.inc.local` (extend). ([614e68c](https://github.com/rspamd/rspamd/commit/614e68c8b))
- **Neural module autolearn option renames**: Autolearn options in the neural module have been renamed to match RBL module naming conventions. Review custom neural configurations for use of old option names. ([#5835](https://github.com/rspamd/rspamd/pull/5835))
- **libfasttext external dependency removed**: The external libfasttext C++ library has been replaced with a built-in mmap-based shim. The `ENABLE_FASTTEXT` cmake option is removed (always enabled). Packagers must remove the libfasttext build dependency. ([#5897](https://github.com/rspamd/rspamd/pull/5897))

## Added

- **Jinja2 configuration templates**: Configuration files are now preprocessed by the [Lupa Jinja2-compatible template engine](/configuration/templates) before UCL parsing. Environment variables prefixed with `RSPAMD_` are exposed as the `env` table in templates; modified delimiters (`{= =}` for expressions, `{% %}` for control structures) avoid conflicts with UCL syntax. New validation filters (`mandatory`, `require_int`, `require_number`, `require_bool`, `require_duration`, `require_json`, `fromjson`) abort startup with a clear error on invalid input, enabling container-ready configuration validation without shell entrypoint scripts. ([#5938](https://github.com/rspamd/rspamd/pull/5938), [#5941](https://github.com/rspamd/rspamd/pull/5941))
- **checkv3 multipart protocol**: New `/checkv3` endpoint using `multipart/form-data` requests and `multipart/mixed` responses; metadata sent as structured JSON/msgpack instead of HTTP headers, per-part zstd compression, optional body part for rewritten messages, and zero-copy piecewise writev for responses. Use `rspamc --protocol-v3` or `rspamc --msgpack` to activate. ([#5880](https://github.com/rspamd/rspamd/pull/5880))
- **Pluggable Hyperscan cache backend**: Hyperscan compilation and caching moved to an async Lua backend with Redis-based shared database support across workers and hosts. Async compilation prevents blocking the main event loop; self-healing cache auto-detects stale blobs and triggers recompile; small databases compiled in-memory without file caching. ([#5813](https://github.com/rspamd/rspamd/pull/5813), [#5952](https://github.com/rspamd/rspamd/pull/5952))
- **Multi-flag fuzzy hashes**: A single fuzzy hash can now carry up to 8 flags simultaneously, allowing multiple rules to match the same digest with independent flag/value pairs. Redis update path rewritten in Lua with EVALSHA and NOSCRIPT recovery. Backward-compatible epoch 12 wire protocol with highest-value flag promoted to the primary slot. Fuzzy hashes now stored in Redis history. ([#5894](https://github.com/rspamd/rspamd/pull/5894), [#5860](https://github.com/rspamd/rspamd/pull/5860))
- **HTML fuzzy phishing detection**: Dual-mode fuzzy matching — template matching and domain-sensitive matching. New `FUZZY_HTML_PHISHING` symbol fires when an HTML template matches but domains differ, detecting reused phishing templates with swapped links. ([173058061](https://github.com/rspamd/rspamd/commit/173058061))
- **Built-in Fasttext shim**: External C++ libfasttext replaced with a zero-dependency mmap-based reader providing shared memory across workers via `MAP_SHARED`, eliminating per-worker heap copies and saving approximately 500MB–7GB RAM. No more C++ exception ABI issues. Existing `.bin`/`.ftz` models continue to work unchanged. Fasttext wired through maps infrastructure for hot-reloading. ([#5897](https://github.com/rspamd/rspamd/pull/5897), [#5909](https://github.com/rspamd/rspamd/pull/5909))
- **Neural network and LLM embedding improvements**: External pretrained neural model support; LLM embedding providers with multi-model support, mean+max pooling, and SIF word weighting; multi-layer funnel architecture; language-based model and URL selection; expression-based autolearn for neural LLM providers; GPT module with configurable consensus thresholds, `context_augment` hook, and mempool variable storage. ([#5924](https://github.com/rspamd/rspamd/pull/5924), [#5903](https://github.com/rspamd/rspamd/pull/5903), [#5897](https://github.com/rspamd/rspamd/pull/5897), [#5835](https://github.com/rspamd/rspamd/pull/5835))
- **HTTPS server support**: Workers can now serve HTTPS natively with SSL auto-detected from bind socket configuration, enabling secure WebUI and API without a reverse proxy. ([#5884](https://github.com/rspamd/rspamd/pull/5884), [d04b367](https://github.com/rspamd/rspamd/commit/d04b367db))
- **Ring Hash (Ketama) consistent hashing**: Proper consistent hashing with virtual nodes ensures only ~1/n keys redistribute when an upstream fails, and keys return to their original upstream on recovery. ([4ea7504](https://github.com/rspamd/rspamd/commit/4ea750466))
- **Token bucket proxy load balancing**: New load balancing algorithm for proxy upstreams with configurable `max_tokens`, `scale`, and `base_cost` parameters for better burst traffic handling. ([#5874](https://github.com/rspamd/rspamd/pull/5874))
- **Multiclass Bayes support**: Classifiers now support arbitrary classes beyond binary spam/ham. WebUI learning interface updated for multi-class workflows. `/stat` and `/bayes/classifiers` endpoints extended with classifier metadata. `rspamadm statistics_dump` supports multi-class dump and restore. ([#5900](https://github.com/rspamd/rspamd/pull/5900), [#5893](https://github.com/rspamd/rspamd/pull/5893), [#5914](https://github.com/rspamd/rspamd/pull/5914))
- **Structured metadata exporter**: New structured formatter for the metadata exporter module with zstd compression option and detected MIME types for attachments. ([#5890](https://github.com/rspamd/rspamd/pull/5890))
- **UUID v7 per task**: Native UUID v7 generation per scanning task synced with the `Log-Tag` header and ClickHouse UUID v7 column support. ([#5890](https://github.com/rspamd/rspamd/pull/5890))
- **ARC trusted_authserv_id**: Reuse upstream authentication results via trusted `Authentication-Results` headers from known authentication servers. ([506ef44](https://github.com/rspamd/rspamd/commit/506ef44b8))
- **Legacy protocol milter headers**: Milter `add_headers` and `remove_headers` exposed in the RSPAMC/SPAMC text protocol with extended symbol info including descriptions and options, enabling Exim to access milter headers via `$spam_report`. ([#5948](https://github.com/rspamd/rspamd/pull/5948))
- **rspamadm new subcommands**: `rspamadm autolearnstats` for autolearn statistics analysis; `rspamadm logstats` and `rspamadm mapstats` as rewrites of legacy Perl scripts; `rspamadm statistics_dump migrate` for Bayes shard migration. ([#5946](https://github.com/rspamd/rspamd/pull/5946), [#5885](https://github.com/rspamd/rspamd/pull/5885), [#5914](https://github.com/rspamd/rspamd/pull/5914))
- **HTTP content negotiation**: Framework for content negotiation on API endpoints; `/stat` endpoint supports zstd-compressed responses. ([#5832](https://github.com/rspamd/rspamd/pull/5832))
- **PDF improvements**: ASCII85 decode support, ligature substitution fix, object padding evasion defeat, and small objects no longer counted toward processing limits. ([73a37be](https://github.com/rspamd/rspamd/commit/73a37be63), [eb1acde](https://github.com/rspamd/rspamd/commit/eb1acde80), [2b91e5e](https://github.com/rspamd/rspamd/commit/2b91e5ef5), [1f02010](https://github.com/rspamd/rspamd/commit/1f020105e))
- **Reply-To validity checks**: New header checks for `Reply-To` address validity. ([e95533f](https://github.com/rspamd/rspamd/commit/e95533f1f))
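
The redistribution behaviour described for Ring Hash above can be illustrated with a small sketch. This is not Rspamd's implementation: the MD5-based point function, the 160 virtual nodes per upstream, the upstream names, and the helpers `build_ring` and `lookup` are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def _point(label: str) -> int:
    # Hypothetical point function: first 8 bytes of MD5 as an integer.
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:8], "big")


def build_ring(upstreams, vnodes=160):
    """Return (sorted points, owners) with `vnodes` points per upstream."""
    pairs = sorted((_point(f"{u}#{i}"), u) for u in upstreams for i in range(vnodes))
    return [p for p, _ in pairs], [u for _, u in pairs]


def lookup(ring, key):
    """Pick the first virtual node clockwise from the key's point."""
    points, owners = ring
    idx = bisect.bisect_right(points, _point(key)) % len(points)
    return owners[idx]


upstreams = ["redis-a", "redis-b", "redis-c", "redis-d"]  # made-up names
keys = [f"user{i}@example.com" for i in range(2000)]

full = build_ring(upstreams)
degraded = build_ring([u for u in upstreams if u != "redis-c"])  # redis-c fails
recovered = build_ring(upstreams)                                # redis-c returns

before = {k: lookup(full, k) for k in keys}
during = {k: lookup(degraded, k) for k in keys}
after = {k: lookup(recovered, k) for k in keys}

# Only keys that lived on the failed upstream change owner.
moved = [k for k in keys if before[k] != during[k]]
print(f"moved {len(moved) / len(keys):.1%} of keys while redis-c was down")
print("all keys back on original upstream:", after == before)
```

In this sketch the keys that move while `redis-c` is down are exactly the keys it owned, and rebuilding the ring with the same upstream set restores every original assignment.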

## Fixed

- **Fuzzy UDP use-after-free** (critical): Fixed use-after-free on ev_io watcher in fuzzy UDP sessions. ([4557166](https://github.com/rspamd/rspamd/commit/455716621))
- **Fuzzy TCP CPU busy-loop**: Fixed CPU spin in fuzzy TCP client under certain error conditions. ([06dba44](https://github.com/rspamd/rspamd/commit/06dba4495))
- **SPF address family flag inheritance**: Correct propagation of address family flags in SPF resolution. ([2a8643e](https://github.com/rspamd/rspamd/commit/2a8643e5e))
- **DKIM RSA signing memory leak**: Fixed memory leak in RSA path of DKIM signing. ([9608160](https://github.com/rspamd/rspamd/commit/9608160b1))
- **RHEL/CentOS 10 SHA-1 DKIM policy bypass**: Fixed crypto-policy bypass for SHA-1 DKIM signatures on RHEL/CentOS 10. ([7a38a8e](https://github.com/rspamd/rspamd/commit/7a38a8e33))
- **Ratelimit compatibility with old records**: Fixed backward compatibility with legacy ratelimit bucket records. ([#5842](https://github.com/rspamd/rspamd/pull/5842))
- **Weighted round-robin not respecting weights**: Fixed upstream selection ignoring configured weights. ([f563e25](https://github.com/rspamd/rspamd/commit/f563e25a0))
- **SVG misdetection**: Fixed incorrect HTML detection for messages with embedded SVG content. ([170c4c5](https://github.com/rspamd/rspamd/commit/170c4c5d6))
- **Hyperscan use-after-free on config reload**: Multiple use-after-free issues in Hyperscan cache handling during live configuration reload resolved. ([#5813](https://github.com/rspamd/rspamd/pull/5813))
- **Jemalloc tuning**: Jemalloc tuned for Rspamd's single-threaded multi-process architecture, reducing memory overhead. ([#5949](https://github.com/rspamd/rspamd/pull/5949))

## Improved

- **Consistent hash distribution**: Ring Hash with virtual nodes provides true minimal disruption on upstream failure and guarantees key return to original upstream on recovery, replacing the previous Jump Hash algorithm.
- **Hyperscan async compilation**: Compilation no longer blocks the main event loop; self-healing blob detection ensures cache correctness after Hyperscan version changes.
- **Fasttext memory efficiency**: Built-in shim shares model data across all worker processes via shared memory, eliminating 500MB–7GB of duplicate heap allocations typical in multi-worker deployments.
- **Fuzzy hash expressiveness**: Multi-flag support allows a single stored digest to satisfy multiple independent rule checks simultaneously without duplication in storage.

Rspamd 4.0 is a landmark release delivering foundational infrastructure improvements alongside major new capabilities. The new `/checkv3` multipart protocol modernizes the scanning API with structured metadata, per-part compression, and zero-copy response paths. The built-in Fasttext shim eliminates a heavyweight C++ dependency and shares a single memory-mapped model across workers instead of keeping one heap copy per worker. Multi-flag fuzzy hashes unlock more expressive detection rules, and HTML fuzzy phishing detection brings template-aware link-swap detection to the fuzzy engine. The move to Ring Hash consistent hashing corrects shard distribution behavior for Redis-backed deployments. This release is recommended for all users; those running sharded Redis Bayes backends **must** run `rspamadm statistics_dump migrate` before upgrading.
4 changes: 2 additions & 2 deletions docs/developers/protocol.md
@@ -47,8 +47,8 @@ keypair {
}
```

Regrettably, the HTTPCrypt protocol hasn't gained widespread adoption among popular libraries. Nonetheless, you can effectively utilize it with the `rspamc` client and various internal clients, including Rspamd's proxy, which can serve as an encryption bridge for conducting spam scans via Rspamd.
Moreover, you have the option to employ Nginx for SSL termination on behalf of Rspamd. While Rspamd's client-side components (e.g., proxy or `rspamc`) offer native support for SSL encryption, it's important to note that SSL support on the server side is not currently available.
Regrettably, the HTTPCrypt protocol hasn't gained widespread adoption among popular libraries. Nonetheless, you can effectively utilize it with the `rspamc` client and various internal clients, including Rspamd's proxy, which can serve as an encryption bridge for conducting spam scans via Rspamd.
Starting from Rspamd 4.0, workers can also serve HTTPS natively — see [HTTPS support](/workers/#https-support) for configuration details. For earlier versions, or when advanced TLS features (OCSP stapling, client certificates) are needed, nginx can be used for SSL termination in front of Rspamd.

### HTTP request

2 changes: 1 addition & 1 deletion docs/modules/milter_headers.md
@@ -6,7 +6,7 @@ title: Milter headers module
# Milter headers module


The `milter headers` module (formerly known as `rmilter headers`) has been added in Rspamd 1.5 to provide a relatively simple way to configure adding/removing of headers via Rmilter (the alternative being to use the [API](/lua/rspamd_task#me7351)). Despite its name, it is not tied to the `milter` protocol and also works with supported mailservers that use the HTTP interface such as Haraka and OpenSMTPD.
The `milter headers` module (formerly known as `rmilter headers`) has been added in Rspamd 1.5 to provide a relatively simple way to configure adding/removing of headers via Rmilter (the alternative being to use the [API](/lua/rspamd_task#me7351)). Despite its name, it is not tied to the `milter` protocol and also works with supported mailservers that use the HTTP interface such as Haraka and OpenSMTPD, as well as with Exim via the RSPAMC protocol (since Rspamd 4.0, header operations are serialised into `$spam_report` — see [MTA integration](/tutorials/integration#milter-headers-in-exim-rspamd-40)).



78 changes: 67 additions & 11 deletions docs/tutorials/integration.md
@@ -192,6 +192,7 @@ acl_check_spam:
# $spam_score is the message score (we are unlikely to need it)
# $spam_score_int is spam score multiplied by 10
# $spam_report lists symbols matched & protocol messages
# (Rspamd 4.0+: also contains X-Milter-Add/Del/Symbol lines)
# $spam_bar is a visual indicator of spam/ham level

# use greylisting available in rspamd v1.3+
@@ -253,29 +254,84 @@ The `X-Symbol` format is: `NAME(SCORE); DESCRIPTION [OPT1, OPT2, ...]`

These new lines are backward compatible — existing `Symbol:` lines remain unchanged.
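
As a rough illustration of the `X-Symbol` value format quoted above, the following sketch splits one value into its parts. It assumes the description and the bracketed option list may each be absent, and that the score is a plain decimal number; the type `XSymbol`, the helper `parse_x_symbol`, and the sample description text are hypothetical and not part of Rspamd.

```python
import re
from typing import List, NamedTuple, Optional


class XSymbol(NamedTuple):
    name: str
    score: float
    description: str
    options: List[str]


# NAME(SCORE); DESCRIPTION [OPT1, OPT2, ...]
_X_SYMBOL = re.compile(
    r"^(?P<name>[^(\s]+)\((?P<score>-?\d+(?:\.\d+)?)\);"
    r"\s*(?P<desc>.*?)"
    r"(?:\s*\[(?P<opts>[^\]]*)\])?\s*$"
)


def parse_x_symbol(value: str) -> Optional[XSymbol]:
    """Parse the value part of an X-Symbol line; return None if it does not fit."""
    m = _X_SYMBOL.match(value.strip())
    if m is None:
        return None
    opts = m.group("opts")
    options = [o.strip() for o in opts.split(",") if o.strip()] if opts else []
    return XSymbol(m.group("name"), float(m.group("score")), m.group("desc"), options)


# Made-up sample value, for illustration only.
print(parse_x_symbol("R_SPF_ALLOW(-0.20); sample description [opt1, opt2]"))
```

The function expects only the value after the `X-Symbol` prefix. A description that itself ends in a bracketed fragment would be misread as options by this sketch.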

#### Applying milter headers in Exim
#### Complete ACL with milter header support

To extract and apply milter-added headers in your Exim ACL, parse the `X-Milter-Add` and `X-Milter-Del` lines from `$spam_report`:
The following is a full Exim configuration snippet that scans the message once, extracts milter header operations, and applies both custom module headers and standard spam headers:

```sh
warn
spam = nobody:true
set acl_m_report = ${sg{$spam_report}{\\v\\s+}{\\n}}
# Global section
spamd_address = 127.0.0.1 11333 variant=rspamd
acl_smtp_data = acl_check_spam

begin acl

acl_check_spam:
# do not scan messages submitted from our own hosts
# +relay_from_hosts is assumed to be a list of hosts in configuration
accept hosts = +relay_from_hosts

# skip scanning for authenticated users (if desired?)
accept authenticated = *

# Add milter headers: filter X-Milter-Add lines, strip prefix + optional [N]
# scan the message with rspamd (sets $spam_action, $spam_score,
# $spam_score_int, $spam_report, $spam_bar)
warn spam = nobody:true

# Parse milter header operations from $spam_report (Rspamd 4.0+)
# Normalise vertical whitespace, then extract X-Milter-Add / X-Milter-Del lines
warn
set acl_m_report = ${sg{$spam_report}{\\v\\s+}{\\n}}
set acl_m_milter_add = ${sg{\
${sg{$acl_m_report}{(?m)^(?!X-Milter-Add: ).*(\\n|$)}{}}}\
{(?m)^X-Milter-Add: ([^\\[:\\n]+)(?:\\[\\d+\\])?: }{$1: }}
add_header = $acl_m_milter_add

# Remove milter headers: filter X-Milter-Del lines, strip prefix + optional [N]
set acl_m_milter_del = ${sg{\
${sg{$acl_m_report}{(?m)^(?!X-Milter-Del: ).*(\\n|$)}{}}}\
{(?m)^X-Milter-Del: ([^\\[\\n]+).*}{$1}}
remove_header = $acl_m_milter_del

# use greylisting available in rspamd v1.3+
defer message = Please try again later
condition = ${if eq{$spam_action}{soft reject}}

deny message = Message discarded as high-probability spam
condition = ${if eq{$spam_action}{reject}}

# Remove foreign headers
warn remove_header = x-spam-bar : x-spam-score : x-spam-report : x-spam-status

# Apply milter header additions from Rspamd modules (e.g. milter_headers, ARC)
warn
condition = ${if def:acl_m_milter_add}
add_header = $acl_m_milter_add

# Apply milter header removals from Rspamd modules
warn
condition = ${if def:acl_m_milter_del}
remove_header = $acl_m_milter_del

# add spam-score and spam-report header when "add header" action is recommended
warn
condition = ${if eq{$spam_action}{add header}}
add_header = X-Spam-Score: $spam_score ($spam_bar)
add_header = X-Spam-Report: $spam_report

# add x-spam-status header if message is not ham
# do not match when $spam_action is empty (e.g. when rspamd is not running)
warn
! condition = ${if match{$spam_action}{^no action\$|^greylist\$|^\$}}
add_header = X-Spam-Status: Yes

# add x-spam-bar header if score is positive
warn
condition = ${if >{$spam_score_int}{0}}
add_header = X-Spam-Bar: $spam_bar

accept
```

This effectively gives Exim the same header-manipulation capabilities that were previously exclusive to milter-based integrations (Postfix, Sendmail).
Key points:
- The message is scanned **once** by `warn spam = nobody:true`. All subsequent blocks read from `$spam_report` and `$spam_action` without rescanning.
- `acl_m_milter_add` / `acl_m_milter_del` are only applied when non-empty (the `${if def:...}` guard prevents adding a blank header line).
- This gives Exim the same header-manipulation capabilities previously exclusive to milter-based integrations (Postfix, Sendmail).
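
For testing the extraction logic outside Exim, the two `${sg ...}` expressions in the ACL can be mirrored in a few lines of Python. This is only a sketch: the line shapes (`X-Milter-Add: NAME[N]: VALUE` and `X-Milter-Del: NAME[N]`, with `[N]` optional) are inferred from the regular expressions in the ACL, and the helper `split_milter_ops` and the sample report are invented for illustration.

```python
import re

# Same patterns as the ACL: header name, optional [N] index, then the value.
_ADD = re.compile(r"^X-Milter-Add: ([^\[:\n]+)(?:\[\d+\])?: (.*)$")
_DEL = re.compile(r"^X-Milter-Del: ([^\[\n]+)")


def split_milter_ops(report: str):
    """Return (headers_to_add, headers_to_remove) found in a report string."""
    add, remove = [], []
    for line in report.splitlines():
        m = _ADD.match(line)
        if m:
            add.append((m.group(1), m.group(2)))
            continue
        m = _DEL.match(line)
        if m:
            remove.append(m.group(1).strip())
    return add, remove


# Invented sample report; real reports contain more lines.
sample = "\n".join([
    "Action: add header",
    "Symbol: R_SPF_ALLOW(-0.20)",
    "X-Milter-Add: X-Spamd-Result[1]: default: False [6.50 / 15.00]",
    "X-Milter-Add: X-Spam-Level: ******",
    "X-Milter-Del: X-Spam-Flag[0]",
    "X-Milter-Del: X-Spam",
])

add, remove = split_milter_ops(sample)
print(add)
print(remove)
```

Unlike the ACL, the sketch does not normalise vertical whitespace first, so folded continuation lines would need to be joined before calling it.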

For further information please refer to the [Exim specification](https://www.exim.org/exim-html-current/doc/html/spec_html/), especially the [chapter about content scanning](https://www.exim.org/exim-html-current/doc/html/spec_html/ch-content_scanning_at_acl_time.html).
