diff --git a/changelogs/4.0.0.md b/changelogs/4.0.0.md new file mode 100644 index 000000000..8038ac27e --- /dev/null +++ b/changelogs/4.0.0.md @@ -0,0 +1,62 @@ +--- +version: "4.0.0" +date: "2026-03-28" +type: "major" +title: "Major Release with checkv3 Protocol, Built-in Fasttext, Multi-Flag Fuzzy Hashes, and Ring Hash Consistent Hashing" +--- + +## Breaking Changes + +- **Bayes per-user resharding**: Jump Hash replaced with Ring Hash (Ketama) for consistent upstream hashing; per-user Bayes data on sharded Redis deployments will be on wrong shards after upgrade. Run `rspamadm statistics_dump migrate` before upgrading. Single-server deployments are unaffected. ([#5914](https://github.com/rspamd/rspamd/pull/5914), [4ea7504](https://github.com/rspamd/rspamd/commit/4ea750466)) +- **Content URLs included by default**: `include_content_urls` now defaults to `true`; URLs extracted from PDF and computed parts are returned by `task:get_urls()` by default, which may trigger new symbol hits on messages with PDF attachments. Restore old behavior with `include_content_urls = false` in `local.d/options.inc`. ([#5853](https://github.com/rspamd/rspamd/pull/5853)) +- **SSL worker option removed**: The `ssl = true` worker option has been removed; SSL is now auto-detected from bind socket flags. Remove `ssl = true` from worker configs and use the `ssl` suffix on bind lines instead. ([#5884](https://github.com/rspamd/rspamd/pull/5884)) +- **Proxy load balancing default changed**: Token bucket load balancing is now enabled by default for proxy upstreams, replacing simple round-robin. Remove the `token_bucket` key from proxy upstream config to restore round-robin behavior. ([#5874](https://github.com/rspamd/rspamd/pull/5874)) +- **SenderScore RBLs disabled by default**: `senderscore_reputation` is disabled by default as it requires a MyValidity account and was returning blocked results for all unregistered IPs. Users with registered accounts must explicitly re-enable the rule. 
([#5907](https://github.com/rspamd/rspamd/pull/5907)) +- **DKIM unknown key handling per RFC**: Unknown and broken DKIM keys are now handled strictly per RFC, which may change DKIM results for messages with malformed keys. ([e9e6bac](https://github.com/rspamd/rspamd/commit/e9e6bac43)) +- **Suspicious TLDs now map-based**: The hardcoded suspicious TLD list has been replaced with `conf/maps.d/suspicious_tlds.inc`. Customize by creating `local.d/maps.d/suspicious_tlds.inc` (override) or `local.d/maps.d/suspicious_tlds.inc.local` (extend). ([614e68c](https://github.com/rspamd/rspamd/commit/614e68c8b)) +- **Neural module autolearn option renames**: Autolearn options in the neural module have been renamed to match RBL module naming conventions. Review custom neural configurations for use of old option names. ([#5835](https://github.com/rspamd/rspamd/pull/5835)) +- **libfasttext external dependency removed**: The external libfasttext C++ library has been replaced with a built-in mmap-based shim. The `ENABLE_FASTTEXT` cmake option is removed (always enabled). Packagers must remove the libfasttext build dependency. ([#5897](https://github.com/rspamd/rspamd/pull/5897)) + +## Added + +- **Jinja2 configuration templates**: Configuration files are now preprocessed by the [Lupa Jinja2-compatible template engine](/configuration/templates) before UCL parsing. Environment variables prefixed with `RSPAMD_` are exposed as the `env` table in templates; modified delimiters (`{= =}` for expressions, `{% %}` for control structures) avoid conflicts with UCL syntax. New validation filters (`mandatory`, `require_int`, `require_number`, `require_bool`, `require_duration`, `require_json`, `fromjson`) abort startup with a clear error on invalid input, enabling container-ready configuration validation without shell entrypoint scripts. 
([#5938](https://github.com/rspamd/rspamd/pull/5938), [#5941](https://github.com/rspamd/rspamd/pull/5941)) +- **checkv3 multipart protocol**: New `/checkv3` endpoint using `multipart/form-data` requests and `multipart/mixed` responses; metadata sent as structured JSON/msgpack instead of HTTP headers, per-part zstd compression, optional body part for rewritten messages, and zero-copy piecewise writev for responses. Use `rspamc --protocol-v3` or `rspamc --msgpack` to activate. ([#5880](https://github.com/rspamd/rspamd/pull/5880)) +- **Pluggable Hyperscan cache backend**: Hyperscan compilation and caching moved to an async Lua backend with Redis-based shared database support across workers and hosts. Async compilation prevents blocking the main event loop; self-healing cache auto-detects stale blobs and triggers recompile; small databases compiled in-memory without file caching. ([#5813](https://github.com/rspamd/rspamd/pull/5813), [#5952](https://github.com/rspamd/rspamd/pull/5952)) +- **Multi-flag fuzzy hashes**: A single fuzzy hash can now carry up to 8 flags simultaneously, allowing multiple rules to match the same digest with independent flag/value pairs. Redis update path rewritten in Lua with EVALSHA and NOSCRIPT recovery. Backward-compatible epoch 12 wire protocol with highest-value flag promoted to the primary slot. Fuzzy hashes now stored in Redis history. ([#5894](https://github.com/rspamd/rspamd/pull/5894), [#5860](https://github.com/rspamd/rspamd/pull/5860)) +- **HTML fuzzy phishing detection**: Dual-mode fuzzy matching — template matching and domain-sensitive matching. New `FUZZY_HTML_PHISHING` symbol fires when an HTML template matches but domains differ, detecting reused phishing templates with swapped links. 
([173058061](https://github.com/rspamd/rspamd/commit/173058061)) +- **Built-in Fasttext shim**: External C++ libfasttext replaced with a zero-dependency mmap-based reader providing shared memory across workers via `MAP_SHARED`, eliminating per-worker heap copies and saving approximately 500MB–7GB RAM. No more C++ exception ABI issues. Existing `.bin`/`.ftz` models continue to work unchanged. Fasttext wired through maps infrastructure for hot-reloading. ([#5897](https://github.com/rspamd/rspamd/pull/5897), [#5909](https://github.com/rspamd/rspamd/pull/5909)) +- **Neural network and LLM embedding improvements**: External pretrained neural model support; LLM embedding providers with multi-model support, mean+max pooling, and SIF word weighting; multi-layer funnel architecture; language-based model and URL selection; expression-based autolearn for neural LLM providers; GPT module with configurable consensus thresholds, `context_augment` hook, and mempool variable storage. ([#5924](https://github.com/rspamd/rspamd/pull/5924), [#5903](https://github.com/rspamd/rspamd/pull/5903), [#5897](https://github.com/rspamd/rspamd/pull/5897), [#5835](https://github.com/rspamd/rspamd/pull/5835)) +- **HTTPS server support**: Workers can now serve HTTPS natively with SSL auto-detected from bind socket configuration, enabling secure WebUI and API without a reverse proxy. ([#5884](https://github.com/rspamd/rspamd/pull/5884), [d04b367](https://github.com/rspamd/rspamd/commit/d04b367db)) +- **Ring Hash (Ketama) consistent hashing**: Proper consistent hashing with virtual nodes ensures only ~1/n keys redistribute when an upstream fails, and keys return to their original upstream on recovery. ([4ea7504](https://github.com/rspamd/rspamd/commit/4ea750466)) +- **Token bucket proxy load balancing**: New load balancing algorithm for proxy upstreams with configurable `max_tokens`, `scale`, and `base_cost` parameters for better burst traffic handling. 
([#5874](https://github.com/rspamd/rspamd/pull/5874)) +- **Multiclass Bayes support**: Classifiers now support arbitrary classes beyond binary spam/ham. WebUI learning interface updated for multi-class workflows. `/stat` and `/bayes/classifiers` endpoints extended with classifier metadata. `rspamadm statistics_dump` supports multi-class dump and restore. ([#5900](https://github.com/rspamd/rspamd/pull/5900), [#5893](https://github.com/rspamd/rspamd/pull/5893), [#5914](https://github.com/rspamd/rspamd/pull/5914)) +- **Structured metadata exporter**: New structured formatter for the metadata exporter module with zstd compression option and detected MIME types for attachments. ([#5890](https://github.com/rspamd/rspamd/pull/5890)) +- **UUID v7 per task**: Native UUID v7 generation per scanning task synced with the `Log-Tag` header and ClickHouse UUID v7 column support. ([#5890](https://github.com/rspamd/rspamd/pull/5890)) +- **ARC trusted_authserv_id**: Reuse upstream authentication results via trusted `Authentication-Results` headers from known authentication servers. ([506ef44](https://github.com/rspamd/rspamd/commit/506ef44b8)) +- **Legacy protocol milter headers**: Milter `add_headers` and `remove_headers` exposed in the RSPAMC/SPAMC text protocol with extended symbol info including descriptions and options, enabling Exim to access milter headers via `$spam_report`. ([#5948](https://github.com/rspamd/rspamd/pull/5948)) +- **rspamadm new subcommands**: `rspamadm autolearnstats` for autolearn statistics analysis; `rspamadm logstats` and `rspamadm mapstats` as rewrites of legacy Perl scripts; `rspamadm statistics_dump migrate` for Bayes shard migration. ([#5946](https://github.com/rspamd/rspamd/pull/5946), [#5885](https://github.com/rspamd/rspamd/pull/5885), [#5914](https://github.com/rspamd/rspamd/pull/5914)) +- **HTTP content negotiation**: Framework for content negotiation on API endpoints; `/stat` endpoint supports zstd-compressed responses. 
([#5832](https://github.com/rspamd/rspamd/pull/5832)) +- **PDF improvements**: ASCII85 decode support, ligature substitution fix, object padding evasion defeat, and small objects no longer counted toward processing limits. ([73a37be](https://github.com/rspamd/rspamd/commit/73a37be63), [eb1acde](https://github.com/rspamd/rspamd/commit/eb1acde80), [2b91e5e](https://github.com/rspamd/rspamd/commit/2b91e5ef5), [1f02010](https://github.com/rspamd/rspamd/commit/1f020105e)) +- **Reply-To validity checks**: New header checks for `Reply-To` address validity. ([e95533f](https://github.com/rspamd/rspamd/commit/e95533f1f)) + +## Fixed + +- **Fuzzy UDP use-after-free** (critical): Fixed use-after-free on ev_io watcher in fuzzy UDP sessions. ([4557166](https://github.com/rspamd/rspamd/commit/455716621)) +- **Fuzzy TCP CPU busy-loop**: Fixed CPU spin in fuzzy TCP client under certain error conditions. ([06dba44](https://github.com/rspamd/rspamd/commit/06dba4495)) +- **SPF address family flag inheritance**: Correct propagation of address family flags in SPF resolution. ([2a8643e](https://github.com/rspamd/rspamd/commit/2a8643e5e)) +- **DKIM RSA signing memory leak**: Fixed memory leak in RSA path of DKIM signing. ([9608160](https://github.com/rspamd/rspamd/commit/9608160b1)) +- **RHEL/CentOS 10 SHA-1 DKIM policy bypass**: Fixed crypto-policy bypass for SHA-1 DKIM signatures on RHEL/CentOS 10. ([7a38a8e](https://github.com/rspamd/rspamd/commit/7a38a8e33)) +- **Ratelimit compatibility with old records**: Fixed backward compatibility with legacy ratelimit bucket records. ([#5842](https://github.com/rspamd/rspamd/pull/5842)) +- **Weighted round-robin not respecting weights**: Fixed upstream selection ignoring configured weights. ([f563e25](https://github.com/rspamd/rspamd/commit/f563e25a0)) +- **SVG misdetection**: Fixed incorrect HTML detection for messages with embedded SVG content. 
([170c4c5](https://github.com/rspamd/rspamd/commit/170c4c5d6)) +- **Hyperscan use-after-free on config reload**: Multiple use-after-free issues in Hyperscan cache handling during live configuration reload resolved. ([#5813](https://github.com/rspamd/rspamd/pull/5813)) +- **Jemalloc tuning**: Jemalloc tuned for Rspamd's single-threaded multi-process architecture, reducing memory overhead. ([#5949](https://github.com/rspamd/rspamd/pull/5949)) + +## Improved + +- **Consistent hash distribution**: Ring Hash with virtual nodes provides true minimal disruption on upstream failure and guarantees key return to original upstream on recovery, replacing the previous Jump Hash algorithm. +- **Hyperscan async compilation**: Compilation no longer blocks the main event loop; self-healing blob detection ensures cache correctness after Hyperscan version changes. +- **Fasttext memory efficiency**: Built-in shim shares model data across all worker processes via shared memory, eliminating 500MB–7GB of duplicate heap allocations typical in multi-worker deployments. +- **Fuzzy hash expressiveness**: Multi-flag support allows a single stored digest to satisfy multiple independent rule checks simultaneously without duplication in storage. + +Rspamd 4.0 is a landmark release delivering foundational infrastructure improvements alongside major new capabilities. The new `/checkv3` multipart protocol modernizes the scanning API with structured metadata, per-part compression, and zero-copy response paths. The built-in Fasttext shim eliminates a heavyweight C++ dependency while dramatically reducing per-worker memory usage. Multi-flag fuzzy hashes unlock more expressive detection rules, and HTML fuzzy phishing detection brings template-aware link-swap detection to the fuzzy engine. The move to Ring Hash consistent hashing corrects shard distribution behavior for Redis-backed deployments — users with sharded Bayes **must** run the migration tool before upgrading. 
This release is recommended for all users; users running sharded Redis Bayes backends should follow the migration procedure before upgrading. diff --git a/docs/developers/protocol.md b/docs/developers/protocol.md index 8d32f4716..304b31510 100644 --- a/docs/developers/protocol.md +++ b/docs/developers/protocol.md @@ -47,8 +47,8 @@ keypair { } ``` -Regrettably, the HTTPCrypt protocol hasn't gained widespread adoption among popular libraries. Nonetheless, you can effectively utilize it with the `rspamc` client and various internal clients, including Rspamd's proxy, which can serve as an encryption bridge for conducting spam scans via Rspamd. -Moreover, you have the option to employ Nginx for SSL termination on behalf of Rspamd. While Rspamd's client-side components (e.g., proxy or `rspamc`) offer native support for SSL encryption, it's important to note that SSL support on the server side is not currently available. +Regrettably, the HTTPCrypt protocol hasn't gained widespread adoption among popular libraries. Nonetheless, you can effectively utilize it with the `rspamc` client and various internal clients, including Rspamd's proxy, which can serve as an encryption bridge for conducting spam scans via Rspamd. +Starting from Rspamd 4.0, workers can also serve HTTPS natively — see [HTTPS support](/workers/#https-support) for configuration details. For earlier versions, or when advanced TLS features (OCSP stapling, client certificates) are needed, nginx can be used for SSL termination in front of Rspamd. 
### HTTP request diff --git a/docs/modules/milter_headers.md b/docs/modules/milter_headers.md index bef5a566d..722358066 100644 --- a/docs/modules/milter_headers.md +++ b/docs/modules/milter_headers.md @@ -6,7 +6,7 @@ title: Milter headers module # Milter headers module -The `milter headers` module (formerly known as `rmilter headers`) has been added in Rspamd 1.5 to provide a relatively simple way to configure adding/removing of headers via Rmilter (the alternative being to use the [API](/lua/rspamd_task#me7351)). Despite its name, it is not tied to the `milter` protocol and also works with supported mailservers that use the HTTP interface such as Haraka and OpenSMTPD. +The `milter headers` module (formerly known as `rmilter headers`) was added in Rspamd 1.5 to provide a relatively simple way to configure adding and removing headers via Rmilter (the alternative being to use the [API](/lua/rspamd_task#me7351)). Despite its name, it is not tied to the `milter` protocol and also works with supported mailservers that use the HTTP interface, such as Haraka and OpenSMTPD. Since Rspamd 4.0 it also works with Exim via the RSPAMC protocol: header operations are serialised into `$spam_report` (see [MTA integration](/tutorials/integration#milter-headers-in-exim-rspamd-40)).
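As a rough illustration of the `$spam_report` serialisation described above, the following Python sketch extracts header operations from a report. The line shapes (`X-Milter-Add: Name[N]: value` and `X-Milter-Del: Name[N]`) are inferred from the Exim regular expressions in this change and should be treated as assumptions; the sample report text and the `parse_milter_ops` helper are hypothetical.

```python
import re

# Line shapes inferred from the Exim regexes in this change (an assumption):
#   X-Milter-Add: Header-Name[N]: value   (the [N] index is optional)
#   X-Milter-Del: Header-Name[N]
ADD_RE = re.compile(r"^X-Milter-Add: ([^\[:\n]+)(?:\[\d+\])?: (.*)$")
DEL_RE = re.compile(r"^X-Milter-Del: ([^\[\n]+)")


def parse_milter_ops(report):
    """Split a report into (headers_to_add, headers_to_remove)."""
    to_add, to_remove = [], []
    for line in report.splitlines():
        line = line.strip()
        m = ADD_RE.match(line)
        if m:
            # Keep the header name and its value, dropping the optional index.
            to_add.append((m.group(1), m.group(2)))
            continue
        m = DEL_RE.match(line)
        if m:
            to_remove.append(m.group(1).strip())
    return to_add, to_remove


# Hypothetical report text, for illustration only.
SAMPLE_REPORT = """\
Action: add header
Symbol: R_SPF_ALLOW(-0.20)
X-Milter-Add: X-Spamd-Result[1]: default: False [4.20 / 15.00]
X-Milter-Add: X-Spam-Level: ****
X-Milter-Del: X-Spam-Flag[0]
"""

added, removed = parse_milter_ops(SAMPLE_REPORT)
```

Lines that are neither additions nor removals (such as `Symbol:` lines) are ignored, which mirrors the filtering step the Exim ACL performs before applying `add_header` and `remove_header`.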
diff --git a/docs/tutorials/integration.md b/docs/tutorials/integration.md index b755d8eb4..ac9798e51 100644 --- a/docs/tutorials/integration.md +++ b/docs/tutorials/integration.md @@ -192,6 +192,7 @@ acl_check_spam: # $spam_score is the message score (we unlikely need it) # $spam_score_int is spam score multiplied by 10 # $spam_report lists symbols matched & protocol messages + # (Rspamd 4.0+: also contains X-Milter-Add/Del/Symbol lines) # $spam_bar is a visual indicator of spam/ham level # use greylisting available in rspamd v1.3+ @@ -253,29 +254,84 @@ The `X-Symbol` format is: `NAME(SCORE); DESCRIPTION [OPT1, OPT2, ...]` These new lines are backward compatible — existing `Symbol:` lines remain unchanged. -#### Applying milter headers in Exim +#### Complete ACL with milter header support -To extract and apply milter-added headers in your Exim ACL, parse the `X-Milter-Add` and `X-Milter-Del` lines from `$spam_report`: +The following is a full Exim configuration snippet that scans the message once, extracts milter header operations, and applies both custom module headers and standard spam headers: ```sh - warn - spam = nobody:true - set acl_m_report = ${sg{$spam_report}{\\v\\s+}{\\n}} +# Global section +spamd_address = 127.0.0.1 11333 variant=rspamd +acl_smtp_data = acl_check_spam + +begin acl + +acl_check_spam: + # do not scan messages submitted from our own hosts + # +relay_from_hosts is assumed to be a list of hosts in configuration + accept hosts = +relay_from_hosts + + # skip scanning for authenticated users (if desired?) 
+ accept authenticated = * - # Add milter headers: filter X-Milter-Add lines, strip prefix + optional [N] + # scan the message with rspamd (sets $spam_action, $spam_score, + # $spam_score_int, $spam_report, $spam_bar) + warn spam = nobody:true + + # Parse milter header operations from $spam_report (Rspamd 4.0+) + # Normalise vertical whitespace, then extract X-Milter-Add / X-Milter-Del lines + warn + set acl_m_report = ${sg{$spam_report}{\\v\\s+}{\\n}} set acl_m_milter_add = ${sg{\ ${sg{$acl_m_report}{(?m)^(?!X-Milter-Add: ).*(\\n|$)}{}}}\ {(?m)^X-Milter-Add: ([^\\[:\\n]+)(?:\\[\\d+\\])?: }{$1: }} - add_header = $acl_m_milter_add - - # Remove milter headers: filter X-Milter-Del lines, strip prefix + optional [N] set acl_m_milter_del = ${sg{\ ${sg{$acl_m_report}{(?m)^(?!X-Milter-Del: ).*(\\n|$)}{}}}\ {(?m)^X-Milter-Del: ([^\\[\\n]+).*}{$1}} - remove_header = $acl_m_milter_del + + # use greylisting available in rspamd v1.3+ + defer message = Please try again later + condition = ${if eq{$spam_action}{soft reject}} + + deny message = Message discarded as high-probability spam + condition = ${if eq{$spam_action}{reject}} + + # Remove foreign headers + warn remove_header = x-spam-bar : x-spam-score : x-spam-report : x-spam-status + + # Apply milter header additions from Rspamd modules (e.g. milter_headers, ARC) + warn + condition = ${if def:acl_m_milter_add} + add_header = $acl_m_milter_add + + # Apply milter header removals from Rspamd modules + warn + condition = ${if def:acl_m_milter_del} + remove_header = $acl_m_milter_del + + # add spam-score and spam-report header when "add header" action is recommended + warn + condition = ${if eq{$spam_action}{add header}} + add_header = X-Spam-Score: $spam_score ($spam_bar) + add_header = X-Spam-Report: $spam_report + + # add x-spam-status header if message is not ham + # do not match when $spam_action is empty (e.g. when rspamd is not running) + warn + ! 
condition = ${if match{$spam_action}{^no action\$|^greylist\$|^\$}} + add_header = X-Spam-Status: Yes + + # add x-spam-bar header if score is positive + warn + condition = ${if >{$spam_score_int}{0}} + add_header = X-Spam-Bar: $spam_bar + + accept ``` -This effectively gives Exim the same header-manipulation capabilities that were previously exclusive to milter-based integrations (Postfix, Sendmail). +Key points: +- The message is scanned **once** by `warn spam = nobody:true`. All subsequent blocks read from `$spam_report` and `$spam_action` without rescanning. +- `acl_m_milter_add` / `acl_m_milter_del` are only applied when non-empty (the `${if def:...}` guard prevents adding a blank header line). +- This gives Exim the same header-manipulation capabilities previously exclusive to milter-based integrations (Postfix, Sendmail). For further information please refer to the [Exim specification](https://www.exim.org/exim-html-current/doc/html/spec_html/), especially the [chapter about content scanning](https://www.exim.org/exim-html-current/doc/html/spec_html/ch-content_scanning_at_acl_time.html). diff --git a/docs/tutorials/migration.md b/docs/tutorials/migration.md index 66ea08e54..0cc2703d4 100644 --- a/docs/tutorials/migration.md +++ b/docs/tutorials/migration.md @@ -32,6 +32,116 @@ Discover a reliable step-by-step process for upgrading your Rspamd cluster while 10. Repeat the entire process starting from `step 1` for future updates. This approach ensures a smooth and controlled upgrade process that minimizes potential downtime and issues in your production environment. +## Migration to Rspamd 4.0.0 + +### 1. Bayes Per-User Resharding (Required for sharded Redis deployments) + +Rspamd 4.0 replaces Jump Hash with Ring Hash (Ketama) for consistent upstream selection in Redis-sharded Bayes deployments ([4ea7504](https://github.com/rspamd/rspamd/commit/4ea750466)). After upgrade, per-user Bayes keys will be looked up on different shards than where they were written. 
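To see why the change of algorithm moves per-user keys, the sketch below compares a Jump Hash lookup with a Ketama-style ring lookup over the same three shards. It is a minimal illustration, not Rspamd's implementation: the MD5-based `h64` helper, the 160 virtual nodes per server, and the `redis1` to `redis3` names are all assumptions made for the example.

```python
import bisect
import hashlib


def h64(data: str) -> int:
    """Stable 64-bit hash; MD5 is an assumption chosen for reproducibility."""
    return int.from_bytes(hashlib.md5(data.encode()).digest()[:8], "big")


def jump_hash(key: int, num_buckets: int) -> int:
    """Jump consistent hash: maps a 64-bit key to a bucket index."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


class Ring:
    """Ketama-style ring: each server owns many points on a hash circle."""

    def __init__(self, servers, vnodes=160):
        self.points = sorted(
            (h64(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self.hashes = [p[0] for p in self.points]

    def lookup(self, key: str) -> str:
        # First point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self.hashes, h64(key)) % len(self.points)
        return self.points[idx][1]


servers = ["redis1", "redis2", "redis3"]
users = [f"user{i}@example.com" for i in range(1000)]

# Shard chosen under the old scheme versus the new one.
old = {u: servers[jump_hash(h64(u), len(servers))] for u in users}
ring = Ring(servers)
new = {u: ring.lookup(u) for u in users}
moved = sum(1 for u in users if old[u] != new[u])

# Ring property: dropping one server only remaps keys that lived on it.
ring_without_3 = Ring(servers[:2])
stable = all(
    ring_without_3.lookup(u) == new[u] for u in users if new[u] != "redis3"
)
```

`moved` counts the users whose shard differs between the two schemes, which corresponds to the data the migration step has to relocate. `stable` checks the ring property: removing `redis3` remaps only the keys that were stored on it.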
+ +**Who is affected:** Only users with multiple `write_servers` configured for Bayes Redis backends. Single-server deployments are not affected. + +**Migration procedure:** + +1. Back up all Redis Bayes databases before proceeding. +2. While still running the old version, dump the statistics: +```bash +rspamadm statistics_dump dump -o /path/to/bayes-backup.bin +``` +3. Upgrade Rspamd to 4.0. +4. Run the migration tool to redistribute keys to the correct shards under Ring Hash: +```bash +rspamadm statistics_dump migrate +``` +5. Verify with `rspamc stat` that token counts look reasonable. + +If you skip the migration, existing Bayes data will not be lost — it will simply be on the wrong shard and accuracy will degrade until messages are re-learned naturally. ([36325c5](https://github.com/rspamd/rspamd/commit/36325c5c5)) + +### 2. Content URLs Included by Default + +`include_content_urls` now defaults to `true`, meaning `task:get_urls()` returns URLs extracted from PDF and other computed parts ([840e74d](https://github.com/rspamd/rspamd/commit/840e74db4)). This may trigger new RBL or URL reputation hits on messages with PDF attachments. + +To restore the previous behavior, add to `local.d/options.inc`: + +~~~hcl +include_content_urls = false; +~~~ + +### 3. SSL Worker Option Removed + +The `ssl = true` option in worker configuration blocks has been removed ([4674408](https://github.com/rspamd/rspamd/commit/4674408f6)). SSL is now auto-detected from bind socket flags. + +**Before:** +~~~hcl +worker "controller" { + bind_socket = "localhost:11334"; + ssl = true; + ssl_cert = "/path/to/cert.pem"; + ssl_key = "/path/to/key.pem"; +} +~~~ + +**After:** +~~~hcl +worker "controller" { + bind_socket = "localhost:11334 ssl"; + ssl_cert = "/path/to/cert.pem"; + ssl_key = "/path/to/key.pem"; +} +~~~ + +Remove `ssl = true` from all worker sections and append the `ssl` suffix to the relevant `bind_socket` lines. 
`rspamadm configtest` will flag any remaining `ssl = true` occurrences. + +### 4. Proxy Load Balancing Default Changed + +Token bucket load balancing is now the default algorithm for proxy upstreams ([728f19f](https://github.com/rspamd/rspamd/commit/728f19f20)), replacing simple round-robin. The change is generally transparent but alters request distribution under burst conditions. + +To restore round-robin, remove the `token_bucket` block from your proxy upstream configuration in `local.d/worker-proxy.inc`: + +~~~hcl +upstream "scan" { + # remove token_bucket { ... } block if present + hosts = "backend1:11333,backend2:11333"; +} +~~~ + +### 5. SenderScore RBLs Disabled + +`senderscore_reputation` is disabled by default because it requires a MyValidity account registration and was returning blocked results for all unregistered IPs ([ce71021](https://github.com/rspamd/rspamd/commit/ce71021ae)). + +Users with a registered MyValidity account who wish to keep using SenderScore should explicitly re-enable it in `local.d/reputation.conf`: + +~~~hcl +senderscore_reputation { + enabled = true; +} +~~~ + +### 6. DKIM Unknown Key Handling + +Unknown and broken DKIM keys are now handled strictly per RFC ([e9e6bac](https://github.com/rspamd/rspamd/commit/e9e6bac43)). Messages with malformed DKIM keys may receive different DKIM result symbols than before. No configuration change is required; review DKIM scores if you notice unexpected changes in classification. + +### 7. Suspicious TLDs Now Map-Based + +The hardcoded suspicious TLD list has been replaced with a map file at `conf/maps.d/suspicious_tlds.inc` ([614e68c](https://github.com/rspamd/rspamd/commit/614e68c8b)). + +- To **override** the list entirely, create `local.d/maps.d/suspicious_tlds.inc` with your own entries. +- To **extend** the default list, create `local.d/maps.d/suspicious_tlds.inc.local` and add extra TLDs there. 
+ +Any TLDs previously maintained via hardcoded patches to the source or custom rules should be migrated to the map file. + +### 8. Neural Module Autolearn Option Renames + +Autolearn-related options in the neural module have been renamed to align with RBL module naming conventions ([71dac51](https://github.com/rspamd/rspamd/commit/71dac5167)). + +If you have custom neural configuration in `local.d/neural.conf` or `override.d/neural.conf`, review the [neural module documentation](/modules/neural) for the updated option names and update accordingly. Run `rspamadm configtest` to surface any unknown options. + +### 9. libfasttext Dependency Removed (Packagers) + +The external libfasttext C++ shared library is no longer required or used ([d96ee36](https://github.com/rspamd/rspamd/commit/d96ee3610)). The `ENABLE_FASTTEXT` cmake option has been removed — Fasttext support is always compiled in via the built-in shim. + +- **Packagers**: Remove libfasttext from build dependencies and runtime dependencies. +- **Users**: No action required. Existing `.bin` and `.ftz` model files continue to work without modification. + ## Migration to Rspamd 3.13.0 ### Multi-class Bayes diff --git a/docs/workers/rspamd_proxy.md b/docs/workers/rspamd_proxy.md index 65e9d8aac..f4f5c6293 100644 --- a/docs/workers/rspamd_proxy.md +++ b/docs/workers/rspamd_proxy.md @@ -55,6 +55,7 @@ The `hosts` option for the `upstream` and `mirror` can specify IP addresses or U | `ssl` | false | Use SSL/TLS for connection to upstream | | `keepalive` | false | Use HTTP keepalive (also accepted as `keep_alive`) | | `extra_headers` | - | Additional headers to send | +| `token_bucket` | enabled (4.0+) | Token bucket load balancing sub-block; see [Token bucket load balancing](#token-bucket-load-balancing) | For a full list of options, please refer to `rspamadm confighelp workers.rspamd_proxy`. 
@@ -136,6 +137,36 @@ upstream "scan" { } ~~~ +## Token bucket load balancing + +Starting from Rspamd 4.0, the proxy uses **token bucket** load balancing for upstream selection by default, replacing the previous round-robin algorithm. Token bucket distributes requests proportionally to available capacity and handles burst traffic more gracefully than round-robin. + +Each upstream maintains a bucket of tokens. Tokens are replenished at a configurable rate. Each forwarded request consumes tokens proportional to its cost. When a bucket is empty, the upstream is temporarily deprioritised. + +The token bucket behaviour is controlled per-upstream via the `token_bucket` sub-block: + +~~~hcl +# local.d/worker-proxy.inc +upstream "scan" { + default = yes; + hosts = "host1:11333,host2:11333"; + + token_bucket { + max_tokens = 100; # bucket capacity (default: 100) + scale = 1.0; # replenishment rate multiplier (default: 1.0) + base_cost = 1.0; # tokens consumed per request (default: 1.0) + } +} +~~~ + +| Option | Default | Description | +|--------|---------|-------------| +| `max_tokens` | 100 | Maximum bucket capacity | +| `scale` | 1.0 | Token replenishment rate multiplier relative to request rate | +| `base_cost` | 1.0 | Base token cost per request | + +To restore the pre-4.0 round-robin behaviour, remove the `token_bucket` block from the upstream configuration entirely. + ## Mirroring