diff --git a/README.md b/README.md
index d153163..25184be 100644
--- a/README.md
+++ b/README.md
@@ -95,7 +95,6 @@ For RL the trainer must see the exact token ids the sampler saw. The standard al
 - **Boolean round-trip.** Engine emits `false`; client parses to Python `bool(False)`; `apply_chat_template` re-renders via `str(False)` → `"False"`. Capital F. Reproducible on Qwen3.5-35B-A3B + mini-swe-agent-plus at ~50% break rate per rollout.
 - **BPE retokenization drift.** The same substring tokenizes differently depending on neighbouring bytes. `json` + `p` + `enderer` (3 tokens) vs `jsonp` + `enderer` (2 tokens) when whitespace shifts by one character. Every subsequent token is shifted from there on.
 - **Tool-call XML drift.** The engine emits a no-arg call with a stylistic empty `</parameter>`; the Jinja re-render of the reconstructed dict drops it. Extension property broken at every such call.
-- **Thinking stripped from non-latest assistants.** Some templates strip `<think>…</think>` blocks from prior assistant turns when re-rendering. The recorded stream has the thinking; the next prompt does not.
 - **Max-seq-len truncation zeroing the anchor.** Client-side `max_seq_len` enforcement zeros `completion_ids` when `prompt_len > max_seq_len`. The bridge anchor is empty, falling back to full re-render — triggering every mode above.
 - **Scaffold-level history rewriting.** Some agent scaffolds (e.g. opencode's `experimental_repairToolCall`) rewrite tool calls before sending them back as history. The next turn's prompt contains a tool call the model never emitted. *A renderer cannot fix this — the drift happens before rendering.*
 
@@ -122,9 +121,9 @@ from renderers import (
 )
 
 # Auto-resolve renderer from the tokenizer's model name. Carries the
-# shared preserve_* flags; template kwargs require an explicit choice.
+# shared thinking_retention flag; template kwargs require an explicit choice.
 renderer = create_renderer(tokenizer)
-renderer = create_renderer(tokenizer, AutoRendererConfig(preserve_all_thinking=True))
+renderer = create_renderer(tokenizer, AutoRendererConfig(thinking_retention="all"))
 
 # Explicit choice — the typed config exposes exactly the fields that
 # renderer's chat template honours.
@@ -142,12 +141,13 @@ renderer = create_renderer(
 
 Discriminated union: every per-renderer config is a variant of `RendererConfig`, dispatched on the `name` field. Bogus combinations (e.g. `add_vision_id` under `name="qwen3"`) error at construction with a `pydantic.ValidationError`. Downstream pydantic configs (prime-rl orchestrator, verifiers `ClientConfig`) hold a single field typed as `RendererConfig` and inherit the same strict-per-variant validation.
 
-Two shared behaviour flags live on every variant via `_BaseRendererConfig`:
+One shared behaviour flag lives on every variant via `_BaseRendererConfig`: `thinking_retention`, an ascending scale (`"template"` < `"tool_cycle"` < `"all"`) whose floor is the chat template's own decision.
 
-- `preserve_all_thinking=True` — every past assistant's `reasoning_content` is kept, even when the chat template would drop it.
-- `preserve_thinking_between_tool_calls=True` — reasoning is kept on assistants in the in-flight tool cycle (post-last-user A-T-…-A block when it contains a tool response). A new user turn closes the block and drops its thinking.
+- `thinking_retention="template"` (default) — defer entirely to the chat template.
+- `thinking_retention="tool_cycle"` — additionally keep reasoning on assistants in the in-flight tool cycle (post-last-user A-T-…-A block when it contains a tool response). A new user turn closes the block and drops its thinking.
+- `thinking_retention="all"` — additionally keep every past assistant's `reasoning_content`, even when the chat template would drop it.
 
-These OR-compose with template-level toggles (e.g. GLM-5 `clear_thinking`, Nemotron-3 `truncate_history_thinking`): either flag saying "keep" wins. preserve_* can only ever *extend* retention — never override a template kwarg into a "drop" decision. The canonical use case is **compaction**: injecting a `user` turn like *"summarize the work so far"* puts every prior assistant in a past cycle, and `preserve_all_thinking=True` keeps reasoning visible end-to-end.
+`thinking_retention` is honoured end-to-end — both `render()` and `bridge_to_next_turn` consult it. So a multi-turn rollout reproduces the chat template's history handling **faithfully by default**: when a new user turn arrives, a past block's reasoning is dropped exactly as `apply_chat_template` would, and the bridge declines that boundary (letting the caller re-render) rather than carrying the stale `<think>` forward. The override can only ever *extend* retention above the template floor — it never forces a drop the template would keep. GLM-5 `clear_thinking` / Nemotron-3 `truncate_history_thinking` are byte-equivalent template kwargs (`False` ≡ `"all"`); setting one of them *and* a contradictory `thinking_retention` raises at config-load rather than silently resolving. The canonical override use case is **compaction**: injecting a `user` turn like *"summarize the work so far"* puts every prior assistant in a past cycle, and `thinking_retention="all"` keeps reasoning visible end-to-end.
 
 ## `DefaultRenderer`
 
@@ -156,7 +156,7 @@ Fallback for unsupported models. Wraps `apply_chat_template` and accepts `tool_p
 ## Roadmap
 
 - **VLM support.** `ContentPart` is text-only today; `Qwen3VLRenderer` ships only because Qwen3-VL's text-only chat template differs from Qwen3's. Plan: add `ImagePart` / `VideoPart`, multimodal bridges, validate against a Qwen3-VL RL run.
-- **Patched chat templates.** Some shipped templates re-tokenize history, normalize JSON, or auto-strip thinking — each breaks the extension property. Plan: a `use_patched` opt-in per renderer that renders the same surface form while avoiding known-bad patterns.
+- **Patched chat templates.** Some shipped templates re-tokenize history or normalize JSON in ways that break token identity. Plan: a `use_patched` opt-in per renderer that renders the same surface form while avoiding known-bad patterns. (Auto-stripping thinking from past turns is *not* one of these — that's intended template behaviour the renderer reproduces; use `thinking_retention` to override it.)
 
 ## Testing
 
diff --git a/docs/renderer-config.md b/docs/renderer-config.md
index 3d5fa2a..93cced3 100644
--- a/docs/renderer-config.md
+++ b/docs/renderer-config.md
@@ -50,10 +50,10 @@ Configs are frozen. To override a field, construct a new instance or call
 
 ```python
 r = create_renderer(tokenizer)                                 # AutoRendererConfig() is the default
-r = create_renderer(tokenizer, AutoRendererConfig(preserve_all_thinking=True))
+r = create_renderer(tokenizer, AutoRendererConfig(thinking_retention="all"))
 ```
 
-`AutoRendererConfig` carries only the shared `preserve_*` flags. Template
+`AutoRendererConfig` carries only the shared `thinking_retention` flag. Template
 kwargs depend on the renderer, so overriding them requires naming the
 renderer explicitly:
 
@@ -67,33 +67,43 @@ falling back for a VLM would produce token streams the trainer can't
 reconstruct. Text-only fine-tunes without a registered renderer fall back to
 `DefaultRenderer` and log the choice at INFO.
 
-## `preserve_*` flags
-
-Every variant carries two renderer-agnostic flags on `_BaseRendererConfig`:
-
-- `preserve_all_thinking: bool = False` — re-emit `reasoning_content` on
-  every past assistant turn, even when the chat template would drop it.
-- `preserve_thinking_between_tool_calls: bool = False` — re-emit
-  `reasoning_content` only inside the in-flight tool cycle (the contiguous
-  A-T-…-A block after the most recent `user` message, when it contains at
-  least one `tool` response). A new user turn closes the block and drops
-  its thinking.
-
-These OR-compose with template-level toggles. GLM-5's `clear_thinking` and
-Nemotron-3's `truncate_history_thinking` already gate past thinking; the
-`preserve_*` flags add to that:
-
-| `clear_thinking` | `preserve_all_thinking` | past thinking? |
-|------------------|-------------------------|----------------|
-| `True` (default — drop) | `False` (default) | dropped |
-| `True`           | `True`                  | kept           |
-| `False` (keep)   | `False`                 | kept           |
-| `False`          | `True`                  | kept           |
-
-`preserve_*` can only extend retention, never force a drop. The canonical
-use case is **compaction**: injecting a `user` turn like *"summarize the work
-so far"* puts every prior assistant in a past cycle, and
-`preserve_all_thinking=True` keeps reasoning visible end-to-end.
+## `thinking_retention`
+
+Every variant carries one renderer-agnostic flag on `_BaseRendererConfig`,
+an ascending scale whose floor is the chat template's own decision:
+
+- `thinking_retention: Literal["template", "tool_cycle", "all"] = "template"`
+  - `"template"` (default) — defer entirely to the chat template.
+  - `"tool_cycle"` — additionally re-emit `reasoning_content` inside the
+    in-flight tool cycle (the contiguous A-T-…-A block after the most
+    recent `user` message, when it contains at least one `tool` response).
+    A new user turn closes the block and drops its thinking.
+  - `"all"` — additionally re-emit `reasoning_content` on every past
+    assistant turn, even when the chat template would drop it.
+
+The levels are nested: `"all"` ⊇ `"tool_cycle"` ⊇ `"template"`, and the
+level is honoured end-to-end — `render()` and `bridge_to_next_turn` both
+consult it, so multi-turn rollouts reproduce the template's history handling
+faithfully by default. GLM-5's `clear_thinking` and Nemotron-3's
+`truncate_history_thinking` are byte-equivalent template kwargs (`False` ≡
+`"all"`) gating the same past thinking; `thinking_retention` composes with
+them as:
+
+| `clear_thinking` | `thinking_retention` | past thinking? |
+|------------------|----------------------|----------------|
+| `True` (default — drop) | `"template"` (default) | dropped |
+| `True`           | `"all"`              | kept           |
+| `False` (keep)   | `"template"`         | kept           |
+| `False`          | `"all"`              | kept           |
+
+`thinking_retention` can only extend retention, never force a drop — the
+template is the floor. Because the kwarg and `thinking_retention` name the
+same thing, explicitly setting a keep-history kwarg to `False` *and* a
+non-`"all"` `thinking_retention` is contradictory and raises at config-load
+(set `thinking_retention="all"` instead). The canonical use case is **compaction**: injecting
+a `user` turn like *"summarize the work so far"* puts every prior assistant
+in a past cycle, and `thinking_retention="all"` keeps reasoning visible
+end-to-end.
 
 ## `DefaultRendererConfig` accepts arbitrary Jinja kwargs
 
@@ -139,7 +149,7 @@ In TOML / YAML, the discriminator routes deserialization:
 name = "qwen3.5"
 enable_thinking = false
 add_vision_id = true
-preserve_all_thinking = true
+thinking_retention = "all"
 ```
 
 Pydantic dispatches on `name = "qwen3.5"` to `Qwen35RendererConfig`. Bogus
diff --git a/renderers/base.py b/renderers/base.py
index ed5bc7e..7d5da7c 100644
--- a/renderers/base.py
+++ b/renderers/base.py
@@ -20,7 +20,11 @@
 )
 
 if TYPE_CHECKING:
-    from renderers.configs import AutoRendererConfig, RendererConfig
+    from renderers.configs import (
+        AutoRendererConfig,
+        RendererConfig,
+        ThinkingRetention,
+    )
 
 logger = logging.getLogger("renderers.base")
 
@@ -666,14 +670,13 @@ def render(
         """Render messages to token IDs with per-token message attribution.
 
         Behaviour around historical ``reasoning_content`` is owned by the
-        renderer instance — the ``preserve_all_thinking`` and
-        ``preserve_thinking_between_tool_calls`` flags are constructor
-        kwargs, not call-site kwargs. To render with a different
-        configuration, build a different renderer (or different pool).
-        Defaults preserve byte-identity with each model's chat template;
-        flipping a flag at construction restores ``reasoning_content``
-        the template would otherwise drop. See
-        ``should_preserve_past_thinking`` for the per-message
+        renderer instance — the ``thinking_retention`` level is a
+        constructor kwarg, not a call-site kwarg. To render with a
+        different configuration, build a different renderer (or different
+        pool). The ``"template"`` default preserves byte-identity with
+        each model's chat template; raising the level at construction
+        restores ``reasoning_content`` the template would otherwise drop.
+        See ``should_preserve_past_thinking`` for the per-message
         classification.
         """
         ...
@@ -769,7 +772,13 @@ def bridge_to_next_turn(
         Return ``None`` whenever the renderer can't prove that contract
         holds — the caller falls back to a full re-render. In particular,
         bridges refuse assistant messages in ``new_messages`` (those would
-        re-tokenize model-sampled content). Hand-coded renderers know their
+        re-tokenize model-sampled content), and a renderer whose template
+        drops a past block's thinking once a new user query arrives refuses
+        to span that boundary while carrying ``<think>`` verbatim would keep
+        it (it falls back to a re-render, which honours ``thinking_retention``
+        and drops the stale thinking — keeping the bridge faithful to the
+        template). ``thinking_retention="all"`` keeps thinking on every path,
+        so no decline is needed there. Hand-coded renderers know their
         canonical close and synthesise it on truncated priors;
         DefaultRenderer always returns ``None`` because the template's
         close is unknown.
@@ -1527,10 +1536,7 @@ def _resolve_auto(tokenizer, auto: AutoRendererConfig) -> Renderer:
     model_name = getattr(tokenizer, "name_or_path", "")
     renderer_name = MODEL_RENDERER_MAP.get(model_name)
 
-    preserve_carry = {
-        "preserve_all_thinking": auto.preserve_all_thinking,
-        "preserve_thinking_between_tool_calls": auto.preserve_thinking_between_tool_calls,
-    }
+    preserve_carry = {"thinking_retention": auto.thinking_retention}
 
     if renderer_name is not None:
         cfg_cls = _config_class_for(renderer_name)
@@ -1555,11 +1561,12 @@ def _resolve_auto(tokenizer, auto: AutoRendererConfig) -> Renderer:
     # Text-only fall back to default (apply_chat_template). For fine-tunes
     # with customized chat templates this is the *correct* choice, so we
     # don't warn. Note the pick at INFO and advertise the parser knobs.
-    if auto.preserve_all_thinking or auto.preserve_thinking_between_tool_calls:
+    if auto.thinking_retention != "template":
         raise NotImplementedError(
             "Auto-resolved DefaultRenderer can't selectively re-emit "
             "dropped reasoning_content. Pass an explicit typed renderer "
-            "config (model-specific) if you need preserve_*_thinking."
+            "config (model-specific) if you need thinking_retention != "
+            "'template'."
         )
     logger.info(
         "No model-specific renderer matched %r. Using DefaultRenderer "
@@ -1912,12 +1919,92 @@ def reject_assistant_in_extension(new_messages: list[Message]) -> bool:
     return any(m.get("role") == "assistant" for m in new_messages)
 
 
+def introduces_user_query(new_messages: list[Message]) -> bool:
+    """Return True if ``new_messages`` opens a new user-query turn.
+
+    A query boundary is a ``user`` message whose content isn't a
+    ``<tool_response>...</tool_response>`` wrapper — tool responses
+    (``role="tool"``, or folded into a wrapped user turn) continue the
+    in-flight cycle rather than starting a new query. This mirrors the
+    boundary chat templates use to decide when a past assistant block's
+    thinking becomes "older" and is dropped, so a bridge can tell whether
+    spanning ``new_messages`` would cross that boundary (cf.
+    ``should_preserve_past_thinking`` and per-renderer ``_last_query_index``).
+    """
+    for m in new_messages:
+        if m.get("role") != "user":
+            continue
+        content = m.get("content")
+        if (
+            isinstance(content, str)
+            and content.startswith("<tool_response>")
+            and content.endswith("</tool_response>")
+        ):
+            continue
+        return True
+    return False
+
+
+def _contains_subsequence(haystack: list[int], needle: list[int]) -> bool:
+    """True if ``needle`` appears as a contiguous run inside ``haystack``."""
+    n = len(needle)
+    if n == 0:
+        return False
+    if n == 1:
+        return needle[0] in haystack
+    first = needle[0]
+    for i in range(len(haystack) - n + 1):
+        if haystack[i] == first and haystack[i : i + n] == needle:
+            return True
+    return False
+
+
+def reject_thinking_strip_in_extension(
+    previous_prompt_ids: list[int],
+    previous_completion_ids: list[int],
+    new_messages: list[Message],
+    *,
+    thinking_retention: ThinkingRetention,
+    thinking_marker_ids: list[int],
+    enable_thinking: bool = True,
+) -> bool:
+    """Return True if a bridge must refuse to span ``new_messages``.
+
+    A renderer whose template drops a past block's thinking once a new user
+    query arrives can't span that boundary while keeping prior tokens
+    verbatim — the kept thinking would diverge from a faithful re-render
+    (which honours ``thinking_retention`` and drops the stale thinking).
+    Thinking renderers OR this into their bridge's reject check.
+
+    ``thinking_marker_ids`` is a token subsequence whose presence in the
+    prior tokens signals a sampled thinking block — a single-token close
+    like ``[</think>]`` (Qwen3, GLM, Nemotron), a multi-token close
+    (``Kimi-K2.5``), or a harmony analysis-channel header
+    (``[<|channel|>, …"analysis"…, <|message|>]`` for gpt-oss).
+    ``thinking_retention="all"`` keeps thinking on every path, so it never
+    declines.
+
+    Returns ``False`` (safe to bridge) when: ``thinking_retention="all"``;
+    thinking is disabled (no sampled thinking to strip — also avoids the
+    empty ``<think></think>`` generation-prompt scaffold a disabled model
+    leaves in ``previous_prompt_ids``); ``new_messages`` stays in the
+    in-flight cycle (no new user query); or the prior tokens carry no
+    thinking marker to strip.
+    """
+    if thinking_retention == "all" or not enable_thinking:
+        return False
+    if not introduces_user_query(new_messages):
+        return False
+    return _contains_subsequence(
+        previous_completion_ids, thinking_marker_ids
+    ) or _contains_subsequence(previous_prompt_ids, thinking_marker_ids)
+
+
 def should_preserve_past_thinking(
     messages: list[Message],
     msg_idx: int,
     *,
-    preserve_all_thinking: bool,
-    preserve_thinking_between_tool_calls: bool,
+    thinking_retention: ThinkingRetention,
 ) -> bool:
     """Should ``messages[msg_idx]``'s ``reasoning_content`` be emitted as
     thinking even when the chat template would drop it?
@@ -1925,25 +2012,25 @@ def should_preserve_past_thinking(
     Returns ``True`` only as an override above the template default. Each
     renderer ORs this into its own "render thinking?" condition; a result
     of ``False`` means "follow the template" (drop or keep as the template
-    decides), not "force-drop".
-
-    Override rules:
-
-    - ``preserve_all_thinking`` — every past-asst's thinking is kept.
-    - ``preserve_thinking_between_tool_calls`` — keeps thinking only
-      inside the *current* tool cycle: the contiguous A-T-...-A block
-      after the most recent ``user`` message, and only if that block
-      contains at least one ``tool`` response. As soon as a new
-      ``user`` turn arrives, the previous block becomes "older" and
-      its thinking is dropped (template default), matching how most
-      chat templates already handle multi-turn contexts. Use
-      ``preserve_all_thinking`` if you need thinking on older blocks
-      to survive the user-turn boundary too.
+    decides), not "force-drop". ``thinking_retention`` selects how far the
+    override reaches:
+
+    - ``"template"`` — no override; defer to the template (always ``False``).
+    - ``"all"`` — every past-asst's thinking is kept (always ``True``).
+    - ``"tool_cycle"`` — keeps thinking only inside the *current* tool
+      cycle: the contiguous A-T-...-A block after the most recent ``user``
+      message, and only if that block contains at least one ``tool``
+      response. As soon as a new ``user`` turn arrives, the previous block
+      becomes "older" and its thinking is dropped (template default),
+      matching how most chat templates handle multi-turn contexts. Use
+      ``"all"`` if you need thinking on older blocks to survive the
+      user-turn boundary too.
     """
-    if preserve_all_thinking:
+    if thinking_retention == "all":
         return True
-    if not preserve_thinking_between_tool_calls:
+    if thinking_retention == "template":
         return False
+    # thinking_retention == "tool_cycle"
     # Most recent user message (or -1 if none).
     last_user = -1
     for j in range(len(messages) - 1, -1, -1):
diff --git a/renderers/configs.py b/renderers/configs.py
index d500f8e..b87cbe1 100644
--- a/renderers/configs.py
+++ b/renderers/configs.py
@@ -4,14 +4,15 @@
 Each renderer accepts its own typed config; bad combinations (e.g.
 ``add_vision_id`` under ``name="qwen3"``) fail at config-load time with a
 pydantic ``ValidationError`` rather than at runtime via an allowlist
-check. The shared ``preserve_*`` flags live on ``BaseRendererConfig``
-and OR-compose with template-level toggles (e.g. GLM-5
-``clear_thinking``) inside each renderer — they extend retention, never
-override the template into a drop.
+check. The shared ``thinking_retention`` flag lives on
+``BaseRendererConfig`` and OR-composes with template-level toggles (e.g.
+GLM-5 ``clear_thinking``) inside each renderer — it extends retention,
+never overrides the template into a drop.
 
 ``AutoRendererConfig`` is a placeholder variant: ``create_renderer``
 resolves it via ``MODEL_RENDERER_MAP`` and constructs the matching
-typed config with the auto config's ``preserve_*`` fields carried over.
+typed config with the auto config's ``thinking_retention`` field carried
+over.
 
 ``DefaultRendererConfig`` uses ``extra="allow"`` to accept arbitrary
 Jinja kwargs as ``model_extra`` — ``DefaultRenderer`` doesn't know which
@@ -22,10 +23,45 @@
 
 from typing import Annotated, ClassVar, Literal, Union
 
-from pydantic import ConfigDict, Field
+from pydantic import ConfigDict, Field, model_validator
 from pydantic_config import BaseConfig
 
 
+def _reject_thinking_retention_conflict(config: BaseConfig, kwarg_name: str) -> None:
+    """Raise if an explicit template thinking-kwarg contradicts an explicit
+    ``thinking_retention``.
+
+    ``clear_thinking`` (GLM) / ``truncate_history_thinking`` (Nemotron) are
+    byte-equivalent to ``thinking_retention``: setting one ``False`` keeps
+    all past thinking, i.e. ``thinking_retention="all"``. They're the same
+    knob, so a user who explicitly sets both to disagreeing values almost
+    certainly has a bug — surface it rather than silently resolving. Only
+    fires when BOTH are explicit; a defaulted value is not an intent (and
+    this keeps the per-kwarg parity matrix, which sets one field at a time,
+    working).
+    """
+    fields_set = config.__pydantic_fields_set__
+    if (
+        kwarg_name in fields_set
+        and "thinking_retention" in fields_set
+        and getattr(config, kwarg_name) is False
+        and config.thinking_retention != "all"
+    ):
+        raise ValueError(
+            f"{kwarg_name}=False keeps all past thinking (the same as "
+            f"thinking_retention='all'), which conflicts with the explicit "
+            f"thinking_retention={config.thinking_retention!r}. They are the "
+            f"same knob — set thinking_retention='all' (or drop {kwarg_name})."
+        )
+
+
+ThinkingRetention = Literal["template", "tool_cycle", "all"]
+"""How far past-assistant ``reasoning_content`` is retained, as an override
+*above* the chat-template floor. ``"template"`` defers to the template;
+``"tool_cycle"`` and ``"all"`` progressively extend retention and never force a
+drop. See :attr:`BaseRendererConfig.thinking_retention`."""
+
+
 class BaseRendererConfig(BaseConfig):
     """Shared fields and config for every renderer config variant.
 
@@ -35,28 +71,32 @@ class BaseRendererConfig(BaseConfig):
     this class adds ``frozen=True`` so configs are hashable value
     objects.
 
-    ``preserve_all_thinking`` and ``preserve_thinking_between_tool_calls``
-    are renderer-internal behaviour flags — they don't map to any Jinja
-    chat-template kwarg. They OR-compose with template-level toggles on
-    renderers that expose one (GLM-5 ``clear_thinking``, Nemotron-3
-    ``truncate_history_thinking``): either flag saying "keep this
-    thinking" wins. preserve_* can only ever extend retention; setting
-    ``preserve_all_thinking=True`` always keeps past thinking, regardless
-    of the template kwarg. See ``renderers.base.should_preserve_past_thinking``.
+    ``thinking_retention`` is a renderer-internal behaviour flag — it
+    doesn't map to any Jinja chat-template kwarg. It OR-composes with
+    template-level toggles on renderers that expose one (GLM-5
+    ``clear_thinking``, Nemotron-3 ``truncate_history_thinking``):
+    whichever side says "keep this thinking" wins. The chat template is
+    the floor — ``thinking_retention`` can only ever *extend* retention,
+    never override the template into a drop. See
+    ``renderers.base.should_preserve_past_thinking``.
     """
 
     model_config = ConfigDict(frozen=True)
 
-    preserve_all_thinking: bool = False
-    """Restore ``reasoning_content`` on every past assistant turn, even
-    when the chat template would drop it. Strict superset of
-    ``preserve_thinking_between_tool_calls``."""
+    thinking_retention: ThinkingRetention = "template"
+    """How much past-assistant ``reasoning_content`` to re-emit, layered as
+    an override *above* the chat template's own decision. The template is
+    the floor — each level only ever *extends* retention, never forces a
+    drop:
+
+    - ``"template"`` (default) — defer entirely to the chat template.
+    - ``"tool_cycle"`` — additionally keep thinking inside the in-flight
+      tool cycle: the contiguous A-T-...-A block after the most recent
+      ``user`` turn, when it contains at least one ``tool`` response. A new
+      user turn closes the block.
+    - ``"all"`` — additionally keep thinking on every past assistant turn.
 
-    preserve_thinking_between_tool_calls: bool = False
-    """Restore ``reasoning_content`` only inside the in-flight tool cycle:
-    the contiguous A-T-...-A block after the most recent ``user`` turn,
-    and only if it contains at least one ``tool`` response. A new user
-    turn closes the block and drops its thinking (template default)."""
+    Ascending and nested: ``"all"`` ⊇ ``"tool_cycle"`` ⊇ ``"template"``."""
 
     # Fields that are renderer-internal — not forwarded to (or mirrored
     # by) ``apply_chat_template``. Override in subclasses that hold
@@ -87,8 +127,9 @@ def template_field_names(cls) -> frozenset[str]:
 
 class AutoRendererConfig(BaseRendererConfig):
     """Resolve the renderer from ``tokenizer.name_or_path`` at construction
-    time via ``MODEL_RENDERER_MAP``. Carries only the shared ``preserve_*``
-    fields; template kwargs require an explicit renderer choice so that
+    time via ``MODEL_RENDERER_MAP``. Carries only the shared
+    ``thinking_retention`` field; template kwargs require an explicit
+    renderer choice so that
     template-dependent behaviour stays visible at the call site."""
 
     name: Literal["auto"] = "auto"
@@ -119,6 +160,26 @@ class DefaultRendererConfig(BaseRendererConfig):
     # template. Jinja kwargs live in ``model_extra`` (extra="allow").
     _internal_fields = frozenset({"tool_parser", "reasoning_parser"})
 
+    @model_validator(mode="after")
+    def _reject_legacy_preserve_flags(self):
+        # ``extra="allow"`` would otherwise swallow the removed ``preserve_*``
+        # bools into ``model_extra`` and forward them to apply_chat_template,
+        # silently dropping the user's intent (DefaultRenderer can't
+        # selectively re-emit reasoning_content). Reject them like every other
+        # config's ``extra="forbid"`` does, pointing at the replacement.
+        legacy = {
+            "preserve_all_thinking",
+            "preserve_thinking_between_tool_calls",
+        } & set(self.model_extra or {})
+        if legacy:
+            raise ValueError(
+                f"{sorted(legacy)} were replaced by thinking_retention. "
+                "DefaultRenderer falls back to apply_chat_template and can't "
+                "selectively re-emit reasoning_content — use thinking_retention "
+                "on a model-specific renderer."
+            )
+        return self
+
 
 class Qwen3RendererConfig(BaseRendererConfig):
     """Qwen3 (text-only) renderer config."""
@@ -199,8 +260,13 @@ class GLM5RendererConfig(BaseRendererConfig):
     """When ``False``, the renderer keeps ``<think>{reasoning}</think>``
     on past-cycle assistant turns instead of dropping them. Mirrors the
     chat template's ``clear_thinking`` toggle. OR-composes with
-    ``preserve_all_thinking`` / ``preserve_thinking_between_tool_calls``
-    — see :class:`BaseRendererConfig` for the contract."""
+    ``thinking_retention`` — see :class:`BaseRendererConfig` for the
+    contract."""
+
+    @model_validator(mode="after")
+    def _check_thinking_retention(self):
+        _reject_thinking_retention_conflict(self, "clear_thinking")
+        return self
 
 
 class GLM51RendererConfig(BaseRendererConfig):
@@ -215,6 +281,11 @@ class GLM51RendererConfig(BaseRendererConfig):
     clear_thinking: bool = True
     """See :class:`GLM5RendererConfig.clear_thinking`."""
 
+    @model_validator(mode="after")
+    def _check_thinking_retention(self):
+        _reject_thinking_retention_conflict(self, "clear_thinking")
+        return self
+
 
 class GLM45RendererConfig(BaseRendererConfig):
     """GLM-4.5 Air renderer config."""
@@ -321,10 +392,11 @@ class LagunaXS2RendererConfig(BaseRendererConfig):
 class Llama3RendererConfig(BaseRendererConfig):
     """Llama-3.x Instruct renderer config.
 
-    Llama-3 ships no reasoning channel, so the base ``preserve_*_thinking``
-    flags don't apply: ``Llama3Renderer`` raises ``NotImplementedError``
-    if either is set (matching ``DefaultRenderer``'s contract for the
-    same case). Both fields below mirror real ``apply_chat_template``
+    Llama-3 ships no reasoning channel, so the base ``thinking_retention``
+    flag is a no-op: there's never any past-assistant thinking to retain
+    or drop, so any level leaves the token stream unchanged (same contract
+    as Kimi-K2 / Qwen3-VL). Both fields below mirror real
+    ``apply_chat_template``
     kwargs.
     """
 
@@ -373,8 +445,13 @@ class Nemotron3RendererConfig(BaseRendererConfig):
     """When ``False``, keep ``<think>{reasoning}</think>`` on past-cycle
     assistant turns instead of dropping them. Mirrors the chat
     template's ``truncate_history_thinking`` toggle. OR-composes with
-    ``preserve_all_thinking`` / ``preserve_thinking_between_tool_calls``
-    — see :class:`BaseRendererConfig` for the contract."""
+    ``thinking_retention`` — see :class:`BaseRendererConfig` for the
+    contract."""
+
+    @model_validator(mode="after")
+    def _check_thinking_retention(self):
+        _reject_thinking_retention_conflict(self, "truncate_history_thinking")
+        return self
 
     low_effort: bool = False
     """When ``True``, append ``\\n\\n{reasoning effort: low}`` to the last user
@@ -404,6 +481,11 @@ class Nemotron3UltraRendererConfig(BaseRendererConfig):
     truncate_history_thinking: bool = True
     """See :class:`Nemotron3RendererConfig.truncate_history_thinking`."""
 
+    @model_validator(mode="after")
+    def _check_thinking_retention(self):
+        _reject_thinking_retention_conflict(self, "truncate_history_thinking")
+        return self
+
     medium_effort: bool = False
     """When ``True``, append ``\\n\\n{reasoning effort: efficient}`` to the last
     user message. Mirrors the Ultra chat template's ``medium_effort`` kwarg."""
@@ -426,8 +508,8 @@ class DeepSeekR1RendererConfig(BaseRendererConfig):
     R1 always reasons — its chat template unconditionally prefills
     ``<think>\\n`` at the generation prompt and strips ``</think>`` from
     historical assistant turns. There is therefore no ``enable_thinking``
-    knob (thinking is not optional), and ``preserve_*`` flags are no-ops
-    (history reasoning is always dropped); both stored for protocol
+    knob (thinking is not optional), and ``thinking_retention`` is a no-op
+    (history reasoning is always dropped); stored for protocol
     uniformity. Applies to full ``deepseek-ai/DeepSeek-R1`` / ``-R1-0528``
     — NOT the R1-Distill-Qwen/Llama models, which use those base
     tokenizers and route to the Qwen3 / Llama-3 renderers.
@@ -473,7 +555,7 @@ class DeepSeekR1RendererConfig(BaseRendererConfig):
 # Map discriminator → config class. Used by ``create_renderer`` when
 # resolving ``AutoRendererConfig`` against ``MODEL_RENDERER_MAP``: the
 # resolved renderer name picks the corresponding typed config, and the
-# auto config's ``preserve_*`` fields are carried over.
+# auto config's ``thinking_retention`` field is carried over.
 _CONFIG_BY_NAME: dict[str, type[BaseRendererConfig]] = {
     "auto": AutoRendererConfig,
     "default": DefaultRendererConfig,
@@ -543,5 +625,6 @@ def config_from_name(name: str) -> BaseRendererConfig | None:
     "Qwen3RendererConfig",
     "Qwen3VLRendererConfig",
     "RendererConfig",
+    "ThinkingRetention",
     "config_from_name",
 ]
diff --git a/renderers/deepseek_v3.py b/renderers/deepseek_v3.py
index 5f4840a..3cb82d3 100644
--- a/renderers/deepseek_v3.py
+++ b/renderers/deepseek_v3.py
@@ -48,7 +48,7 @@ class DeepSeekV3Renderer:
     assistant content is emitted verbatim. The reasoning variant
     (``<think>``-prefilled prompt, history reasoning stripped) lives in
     :class:`renderers.deepseek_r1.DeepSeekR1Renderer`, which subclasses
-    this one. ``preserve_*`` flags are no-ops here (no reasoning channel),
+    this one. ``thinking_retention`` is a no-op here (no reasoning channel),
     stored for protocol uniformity.
     """
 
diff --git a/renderers/default.py b/renderers/default.py
index a662097..2e9c7d2 100644
--- a/renderers/default.py
+++ b/renderers/default.py
@@ -95,11 +95,12 @@ def __init__(
         config: DefaultRendererConfig | None = None,
     ):
         cfg = config or DefaultRendererConfig()
-        if cfg.preserve_all_thinking or cfg.preserve_thinking_between_tool_calls:
+        if cfg.thinking_retention != "template":
             raise NotImplementedError(
                 "DefaultRenderer falls back to apply_chat_template and can't "
                 "selectively re-emit dropped reasoning_content. Configure a "
-                "model-specific renderer if you need preserve_*_thinking."
+                "model-specific renderer if you need thinking_retention != "
+                "'template'."
             )
         self._tokenizer = tokenizer
         self.config = cfg
diff --git a/renderers/glm45.py b/renderers/glm45.py
index 7af9259..7ca4d1c 100644
--- a/renderers/glm45.py
+++ b/renderers/glm45.py
@@ -23,6 +23,7 @@
     attribute_text_segments,
     extract_message_tool_names,
     reject_assistant_in_extension,
+    reject_thinking_strip_in_extension,
     should_preserve_past_thinking,
 )
 from renderers.configs import GLM45RendererConfig
@@ -228,8 +229,7 @@ def emit_text_segments(
                 preserve_thinking = should_preserve_past_thinking(
                     messages,
                     i,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
                 self._render_assistant(
                     msg,
@@ -293,7 +293,7 @@ def parse_response(
             token_ids,
             stop_ids={self._endoftext, self._user, self._observation},
             think_id=self._think,
-            think_end_id=self._think_end,
+            thinking_marker_ids=[self._think_end],
             tool_call_id=self._tool_call_tok,
             tool_call_end_id=self._tool_call_end_tok,
             arg_key_id=self._arg_key,
@@ -321,6 +321,19 @@ def bridge_to_next_turn(
         ):
             return None
 
+        # Faithfulness across a user-query boundary: the template drops a past
+        # block's thinking once a new user turn arrives (see
+        # reject_thinking_strip_in_extension).
+        if reject_thinking_strip_in_extension(
+            previous_prompt_ids,
+            previous_completion_ids,
+            new_messages,
+            thinking_retention=self.config.thinking_retention,
+            thinking_marker_ids=[self._think_end],
+            enable_thinking=self.config.enable_thinking,
+        ):
+            return None
+
         # Same next-turn-marker scheme as GLM-5, but role markers are
         # followed by a literal ``\n`` in the prompt text.
         previous_ids = list(previous_prompt_ids) + list(previous_completion_ids)
diff --git a/renderers/glm5.py b/renderers/glm5.py
index 924d754..ad215e5 100644
--- a/renderers/glm5.py
+++ b/renderers/glm5.py
@@ -24,6 +24,7 @@
     attribute_text_segments,
     extract_message_tool_names,
     reject_assistant_in_extension,
+    reject_thinking_strip_in_extension,
     should_preserve_past_thinking,
 )
 from renderers.configs import GLM5RendererConfig, GLM51RendererConfig
@@ -241,8 +242,7 @@ def emit_text_segments(
                 preserve_thinking = should_preserve_past_thinking(
                     messages,
                     i,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
                 self._render_assistant(
                     msg,
@@ -309,7 +309,7 @@ def parse_response(
             token_ids,
             stop_ids={self._endoftext, self._user, self._observation},
             think_id=self._think,
-            think_end_id=self._think_end,
+            thinking_marker_ids=[self._think_end],
             tool_call_id=self._tool_call_tok,
             tool_call_end_id=self._tool_call_end_tok,
             arg_key_id=self._arg_key,
@@ -337,6 +337,25 @@ def bridge_to_next_turn(
         ):
             return None
 
+        # Faithfulness across a user-query boundary: the template drops a past
+        # block's thinking once a new user turn arrives (see
+        # reject_thinking_strip_in_extension). ``clear_thinking=False`` keeps
+        # all past thinking (≡ thinking_retention="all"), so the bridge stays
+        # faithful carrying it verbatim — fold it into the effective level so
+        # we don't over-decline and re-tokenize sampled thinking.
+        retention = (
+            "all" if not self.config.clear_thinking else self.config.thinking_retention
+        )
+        if reject_thinking_strip_in_extension(
+            previous_prompt_ids,
+            previous_completion_ids,
+            new_messages,
+            thinking_retention=retention,
+            thinking_marker_ids=[self._think_end],
+            enable_thinking=self.config.enable_thinking,
+        ):
+            return None
+
         # GLM has no per-turn close token. An assistant turn ends when the
         # next turn's role marker appears, OR the model emits <|endoftext|>.
         # vLLM includes these in ``stop_token_ids`` so a clean stop leaves
@@ -501,7 +520,7 @@ def _render_assistant(
         # ``preserve_thinking`` is the override output of
         # ``should_preserve_past_thinking`` — it adds historical assistants
         # back when the renderer was constructed with
-        # ``preserve_all_thinking=True``. ``clear_thinking=False`` mirrors
+        # ``thinking_retention="all"``. ``clear_thinking=False`` mirrors
         # the template's per-call ``clear_thinking is defined and not
         # clear_thinking`` gate: a chat_template_kwarg surface for the
         # same behaviour, gated explicitly by the caller per render.
diff --git a/renderers/gpt_oss.py b/renderers/gpt_oss.py
index 2a9c5ca..9dbc6b7 100644
--- a/renderers/gpt_oss.py
+++ b/renderers/gpt_oss.py
@@ -58,6 +58,7 @@
     ToolSpec,
     extract_message_tool_names,
     reject_assistant_in_extension,
+    reject_thinking_strip_in_extension,
     should_preserve_past_thinking,
     trim_to_turn_close,
 )
@@ -154,6 +155,14 @@ def __init__(
         self._channel = self._token_id("<|channel|>")
         self._message = self._token_id("<|message|>")
         self._constrain = self._token_id("<|constrain|>")
+        # Harmony marks reasoning as an analysis channel:
+        # ``<|channel|>analysis<|message|>``. Its presence in prior tokens
+        # signals a sampled thinking block the template strips from history.
+        self._analysis_marker_ids = [
+            self._channel,
+            *self._encode("analysis"),
+            self._message,
+        ]
 
     # ── token utilities ──────────────────────────────────────────────────────
 
@@ -425,8 +434,7 @@ def emit_harmony_message(
                 should_preserve_past_thinking(
                     messages,
                     i,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
             )
             for hm in self._to_harmony_messages(
@@ -524,6 +532,20 @@ def bridge_to_next_turn(
         ):
             return None
 
+        # Faithfulness across a user-query boundary: harmony strips the
+        # analysis (reasoning) channel from history once a new user turn
+        # arrives, so the bridge can't carry it verbatim there (see
+        # reject_thinking_strip_in_extension). gpt-oss always reasons.
+        if reject_thinking_strip_in_extension(
+            previous_prompt_ids,
+            previous_completion_ids,
+            new_messages,
+            thinking_retention=self.config.thinking_retention,
+            thinking_marker_ids=self._analysis_marker_ids,
+            enable_thinking=True,
+        ):
+            return None
+
         previous_ids = trim_to_turn_close(
             previous_prompt_ids,
             previous_completion_ids,
diff --git a/renderers/kimi_k2.py b/renderers/kimi_k2.py
index e99dfa7..8cdc548 100644
--- a/renderers/kimi_k2.py
+++ b/renderers/kimi_k2.py
@@ -38,9 +38,9 @@ class KimiK2Renderer:
 
     Kimi K2's chat template doesn't read any thinking-related variable —
     ``content`` renders verbatim with no reasoning branch. The
-    ``enable_thinking`` / ``preserve_*`` fields on the config are stored
-    for protocol uniformity with the rest of the renderer family but
-    have no effect on the byte-level output.
+    ``enable_thinking`` / ``thinking_retention`` fields on the config are
+    stored for protocol uniformity with the rest of the renderer family
+    but have no effect on the byte-level output.
     """
 
     def __init__(
diff --git a/renderers/kimi_k25.py b/renderers/kimi_k25.py
index bca4464..e90a9ba 100644
--- a/renderers/kimi_k25.py
+++ b/renderers/kimi_k25.py
@@ -38,6 +38,7 @@
     ToolSpec,
     extract_message_tool_names,
     reject_assistant_in_extension,
+    reject_thinking_strip_in_extension,
     should_preserve_past_thinking,
     trim_to_turn_close,
 )
@@ -888,8 +889,7 @@ def emit_image(
                 preserve_thinking = should_preserve_past_thinking(
                     messages,
                     i,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
                 self._render_assistant_body(
                     msg,
@@ -1036,6 +1036,20 @@ def bridge_to_next_turn(
         ):
             return None
 
+        # Faithfulness across a user-query boundary: the template drops a past
+        # block's thinking once a new user turn arrives. ``</think>`` is
+        # multi-token here, so pass the full close subsequence (see
+        # reject_thinking_strip_in_extension).
+        if reject_thinking_strip_in_extension(
+            previous_prompt_ids,
+            previous_completion_ids,
+            new_messages,
+            thinking_retention=self.config.thinking_retention,
+            thinking_marker_ids=self._think_close_ids,
+            enable_thinking=self.config.thinking,
+        ):
+            return None
+
         close_ids: set[int] = {self._im_end}
         if self._endoftext is not None:
             close_ids.add(self._endoftext)
diff --git a/renderers/laguna_xs2.py b/renderers/laguna_xs2.py
index bd6b64f..71953b9 100644
--- a/renderers/laguna_xs2.py
+++ b/renderers/laguna_xs2.py
@@ -17,10 +17,8 @@
   a ``### Tools`` header with an ``<available_tools>`` listing and prose
   format instructions that vary on ``enable_thinking``).
 - Reasoning is rendered for every assistant message — no last-user-index
-  gating. ``preserve_all_thinking`` and
-  ``preserve_thinking_between_tool_calls`` are accepted for protocol
-  uniformity but are effectively no-ops since past reasoning is preserved
-  by default.
+  gating. ``thinking_retention`` is accepted for protocol uniformity but
+  is effectively a no-op since past reasoning is preserved by default.
 """
 
 from __future__ import annotations
diff --git a/renderers/llama_3.py b/renderers/llama_3.py
index d15792a..5dd2798 100644
--- a/renderers/llama_3.py
+++ b/renderers/llama_3.py
@@ -9,7 +9,7 @@
 Notable differences from the Qwen / GLM family renderers:
 
 * No ``<think>`` / reasoning channel — Llama-3 doesn't ship a
-  reasoning-content concept, so ``preserve_*_thinking`` flags don't
+  reasoning-content concept, so the ``thinking_retention`` flag doesn't
   apply.
 * ``<|begin_of_text|>`` (BOS) is emitted at the very start of every
   render. The chat template never omits it.
@@ -94,10 +94,10 @@ def __init__(
         tokenizer: PreTrainedTokenizer,
         config: Llama3RendererConfig | None = None,
     ):
-        # ``preserve_*_thinking`` are accepted but no-ops: Llama-3 ships no
+        # ``thinking_retention`` is accepted but a no-op: Llama-3 ships no
         # reasoning_content channel, so there's never any past-assistant
-        # thinking to retain or drop. The flags are stored on ``self.config``
-        # for cross-renderer uniformity but never change the token stream —
+        # thinking to retain or drop. The level is stored on ``self.config``
+        # for cross-renderer uniformity but never changes the token stream —
         # the same contract as Kimi-K2 / Qwen3-VL (see the never-preserves
         # renderers in tests/test_preserve_thinking.py).
         self._tokenizer = tokenizer
diff --git a/renderers/minimax_m2.py b/renderers/minimax_m2.py
index f990274..9c92353 100644
--- a/renderers/minimax_m2.py
+++ b/renderers/minimax_m2.py
@@ -240,8 +240,7 @@ def emit_token_overlap_body(
                 preserve_thinking = should_preserve_past_thinking(
                     messages,
                     orig_idx,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
                 self._render_assistant(
                     msg,
diff --git a/renderers/nemotron3.py b/renderers/nemotron3.py
index c29129c..7f98c87 100644
--- a/renderers/nemotron3.py
+++ b/renderers/nemotron3.py
@@ -27,6 +27,7 @@
     attribute_text_segments,
     extract_message_tool_names,
     reject_assistant_in_extension,
+    reject_thinking_strip_in_extension,
     should_preserve_past_thinking,
     trim_to_turn_close,
 )
@@ -424,14 +425,13 @@ def emit_text_segments(
             elif role == "assistant":
                 # Template: ``include_content = not (truncate_history_thinking
                 # and loop.index0 < last_user_idx)``. The renderer-internal
-                # preserve_* overrides only ever *extend* retention, so OR them
-                # in (a preserved turn keeps its thinking even when the
-                # template default would drop it).
+                # ``thinking_retention`` override only ever *extends*
+                # retention, so OR it in (a preserved turn keeps its thinking
+                # even when the template default would drop it).
                 preserve_thinking = msg_orig_idx >= 0 and should_preserve_past_thinking(
                     original_messages,
                     msg_orig_idx,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
                 include_content = (
                     not self.config.truncate_history_thinking
@@ -510,7 +510,7 @@ def parse_response(
             token_ids,
             stop_ids=stop_ids,
             think_id=self._think,
-            think_end_id=self._think_end,
+            thinking_marker_ids=[self._think_end],
             tool_call_id=self._tool_call,
             tool_call_end_id=self._tool_call_end,
             tools=tools,
@@ -542,6 +542,28 @@ def bridge_to_next_turn(
         ):
             return None
 
+        # Faithfulness across a user-query boundary: the template drops a past
+        # block's thinking once a new user turn arrives (see
+        # reject_thinking_strip_in_extension). ``truncate_history_thinking=
+        # False`` keeps all past thinking (≡ thinking_retention="all"), so the
+        # bridge stays faithful carrying it verbatim — fold it into the
+        # effective level so we don't over-decline and re-tokenize sampled
+        # thinking.
+        retention = (
+            "all"
+            if not self.config.truncate_history_thinking
+            else self.config.thinking_retention
+        )
+        if reject_thinking_strip_in_extension(
+            previous_prompt_ids,
+            previous_completion_ids,
+            new_messages,
+            thinking_retention=retention,
+            thinking_marker_ids=[self._think_end],
+            enable_thinking=self.config.enable_thinking,
+        ):
+            return None
+
         close_ids: set[int] = {self._im_end}
         if self._endoftext is not None:
             close_ids.add(self._endoftext)
@@ -704,8 +726,9 @@ def _assistant_body(
         """Assemble the assistant body string exactly as the chat template.
 
         ``include_content`` is the template's ``not (truncate_history_thinking
-        and loop.index0 < last_user_idx)`` (already OR-ed with the preserve_*
-        overrides by the caller): ``True`` keeps the full think+content block,
+        and loop.index0 < last_user_idx)`` (already OR-ed with the
+        ``thinking_retention`` override by the caller): ``True`` keeps the
+        full think+content block,
         ``False`` collapses historical thinking to an empty ``<think></think>``.
         """
         ultra = self._ultra
diff --git a/renderers/qwen3.py b/renderers/qwen3.py
index 9253b3d..19141a5 100644
--- a/renderers/qwen3.py
+++ b/renderers/qwen3.py
@@ -21,6 +21,7 @@
     attribute_text_segments,
     extract_message_tool_names,
     reject_assistant_in_extension,
+    reject_thinking_strip_in_extension,
     should_preserve_past_thinking,
     trim_to_turn_close,
 )
@@ -209,8 +210,7 @@ def emit_text_segments(
                 preserve_thinking = should_preserve_past_thinking(
                     messages,
                     i,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
                 self._render_assistant(
                     msg,
@@ -298,6 +298,19 @@ def bridge_to_next_turn(
         ):
             return None
 
+        # Faithfulness across a user-query boundary: Qwen3's template drops a
+        # past block's thinking once a new user turn arrives, so the bridge
+        # can't carry it verbatim there (see reject_thinking_strip_in_extension).
+        if reject_thinking_strip_in_extension(
+            previous_prompt_ids,
+            previous_completion_ids,
+            new_messages,
+            thinking_retention=self.config.thinking_retention,
+            thinking_marker_ids=[self._think_end],
+            enable_thinking=self.config.enable_thinking,
+        ):
+            return None
+
         previous_ids = trim_to_turn_close(
             previous_prompt_ids,
             previous_completion_ids,
diff --git a/renderers/qwen35.py b/renderers/qwen35.py
index cdb8ee1..5878487 100644
--- a/renderers/qwen35.py
+++ b/renderers/qwen35.py
@@ -29,6 +29,7 @@
     attribute_text_segments,
     extract_message_tool_names,
     reject_assistant_in_extension,
+    reject_thinking_strip_in_extension,
     should_preserve_past_thinking,
     trim_to_turn_close,
 )
@@ -511,8 +512,7 @@ def flush_buf() -> None:
                 preserve_thinking = should_preserve_past_thinking(
                     messages,
                     i,
-                    preserve_all_thinking=self.config.preserve_all_thinking,
-                    preserve_thinking_between_tool_calls=self.config.preserve_thinking_between_tool_calls,
+                    thinking_retention=self.config.thinking_retention,
                 )
                 self._render_assistant(
                     msg,
@@ -594,7 +594,7 @@ def parse_response(
             token_ids,
             stop_ids={self._im_end, self._endoftext},
             think_id=self._think,
-            think_end_id=self._think_end,
+            thinking_marker_ids=[self._think_end],
             tool_call_id=self._tool_call,
             tool_call_end_id=self._tool_call_end,
             tools=tools,
@@ -619,6 +619,19 @@ def bridge_to_next_turn(
         ):
             return None
 
+        # Faithfulness across a user-query boundary: the template drops a past
+        # block's thinking once a new user turn arrives (see
+        # reject_thinking_strip_in_extension).
+        if reject_thinking_strip_in_extension(
+            previous_prompt_ids,
+            previous_completion_ids,
+            new_messages,
+            thinking_retention=self.config.thinking_retention,
+            thinking_marker_ids=[self._think_end],
+            enable_thinking=self.config.enable_thinking,
+        ):
+            return None
+
         previous_ids = trim_to_turn_close(
             previous_prompt_ids,
             previous_completion_ids,
diff --git a/renderers/qwen36.py b/renderers/qwen36.py
index 6adf867..938efd8 100644
--- a/renderers/qwen36.py
+++ b/renderers/qwen36.py
@@ -10,8 +10,7 @@
 Historical-thinking retention follows Qwen3.5's default (drop past
 ``<think>`` blocks). The upstream template carries a ``preserve_thinking``
 Jinja toggle for the opposite polarity; on the renderer side that intent
-maps to the renderer-agnostic ``preserve_all_thinking`` /
-``preserve_thinking_between_tool_calls`` flags on
+maps to the renderer-agnostic ``thinking_retention`` flag on
 :class:`renderers.Qwen36RendererConfig`.
 
 Everything else — tool system prompt, tool-call XML structure, thinking
diff --git a/renderers/qwen3_vl.py b/renderers/qwen3_vl.py
index 7b82d7e..f223ff9 100644
--- a/renderers/qwen3_vl.py
+++ b/renderers/qwen3_vl.py
@@ -302,9 +302,9 @@ class Qwen3VLRenderer:
             keyed off ``tokenizer.name_or_path`` the first time a
             multimodal part is seen.
 
-    ``preserve_all_thinking`` / ``preserve_thinking_between_tool_calls``
-    on the config are no-ops here — the chat template drops past
-    ``<think>`` blocks unconditionally. Stored for Protocol parity.
+    ``thinking_retention`` on the config is a no-op here — the chat
+    template drops past ``<think>`` blocks unconditionally. Stored for
+    Protocol parity.
     """
 
     def __init__(
diff --git a/tests/test_bridge.py b/tests/test_bridge.py
index 8b7bf77..714092c 100644
--- a/tests/test_bridge.py
+++ b/tests/test_bridge.py
@@ -67,7 +67,25 @@ def br_renderer(br_model, br_renderer_name):
     return _load(br_model, br_renderer_name)[1]
 
 
-def _simulate_prior_turn(renderer):
+@pytest.fixture
+def br_renderer_all(br_model, br_renderer_name):
+    """Renderer forced to ``thinking_retention="all"``.
+
+    The verbatim-extension mechanic tests below cross a user-query
+    boundary. For thinking models the template drops a past block's
+    thinking there, so the faithful bridge declines (covered by
+    ``test_bridge_declines_across_user_query_when_template_drops_thinking``).
+    ``"all"`` keeps thinking on every path, isolating the pure extension
+    mechanic from the retention policy across all renderers.
+    """
+    from renderers import create_renderer
+
+    tok, base = _load(br_model, br_renderer_name)
+    cfg = base.config.model_copy(update={"thinking_retention": "all"})
+    return create_renderer(tok, cfg)
+
+
+def _simulate_prior_turn(renderer, assistant=None):
     """Build a (prev_prompt, prev_completion) pair that a real rollout
     would produce for a one-turn prior with a clean stop.
 
@@ -76,13 +94,15 @@ def _simulate_prior_turn(renderer):
     gen_prompt, and take the diff as prev_completion. We then trim
     prev_completion to the last close token so it matches what vLLM
     actually hands back (vLLM stops at the close token and excludes the
-    trailing template scaffolding).
+    trailing template scaffolding). Pass ``assistant`` to override the
+    default no-thinking turn (e.g. one carrying ``reasoning_content``).
     """
     prior = [
         {"role": "system", "content": "You are helpful."},
         {"role": "user", "content": "Hi."},
     ]
-    assistant = [{"role": "assistant", "content": "Hello!"}]
+    if assistant is None:
+        assistant = [{"role": "assistant", "content": "Hello!"}]
 
     prev_prompt = renderer.render_ids(prior, add_generation_prompt=True)
     full_with_assistant = renderer.render_ids(
@@ -105,11 +125,11 @@ def _simulate_prior_turn(renderer):
     return prev_prompt, prev_completion
 
 
-def test_bridge_extends_prev_verbatim_on_clean_stop(br_renderer, br_model):
-    prev_prompt, prev_completion = _simulate_prior_turn(br_renderer)
+def test_bridge_extends_prev_verbatim_on_clean_stop(br_renderer_all, br_model):
+    prev_prompt, prev_completion = _simulate_prior_turn(br_renderer_all)
     new_messages = [{"role": "user", "content": "What's 2+2?"}]
 
-    bridged = br_renderer.bridge_to_next_turn(
+    bridged = br_renderer_all.bridge_to_next_turn(
         prev_prompt, prev_completion, new_messages
     )
     assert bridged is not None, f"{br_model}: bridge returned None on clean stop"
@@ -148,8 +168,8 @@ def test_bridge_rejects_empty_prev_or_new(br_renderer):
     assert br_renderer.bridge_to_next_turn(prev_prompt, prev_completion, []) is None
 
 
-def test_bridge_synthesises_close_on_truncation(br_renderer, br_model):
-    prev_prompt, prev_completion = _simulate_prior_turn(br_renderer)
+def test_bridge_synthesises_close_on_truncation(br_renderer_all, br_model):
+    prev_prompt, prev_completion = _simulate_prior_turn(br_renderer_all)
     # Drop the final close token to simulate a max_tokens truncation.
     prev_completion_trunc = prev_completion[:-1] if prev_completion else prev_completion
     if len(prev_completion_trunc) == 0:
@@ -157,7 +177,7 @@ def test_bridge_synthesises_close_on_truncation(br_renderer, br_model):
             f"{br_model}: simulated prior had no completion tokens — can't truncate"
         )
 
-    bridged = br_renderer.bridge_to_next_turn(
+    bridged = br_renderer_all.bridge_to_next_turn(
         prev_prompt,
         prev_completion_trunc,
         [{"role": "user", "content": "What's 2+2?"}],
@@ -176,12 +196,12 @@ def test_bridge_synthesises_close_on_truncation(br_renderer, br_model):
 
 
 def test_bridge_extension_includes_new_message_text(
-    br_renderer, br_tokenizer, br_model
+    br_renderer_all, br_tokenizer, br_model
 ):
-    prev_prompt, prev_completion = _simulate_prior_turn(br_renderer)
+    prev_prompt, prev_completion = _simulate_prior_turn(br_renderer_all)
     new_messages = [{"role": "user", "content": "HELLO_SENTINEL_XYZ"}]
 
-    bridged = br_renderer.bridge_to_next_turn(
+    bridged = br_renderer_all.bridge_to_next_turn(
         prev_prompt, prev_completion, new_messages
     )
     assert bridged is not None
@@ -190,3 +210,131 @@ def test_bridge_extension_includes_new_message_text(
     assert "HELLO_SENTINEL_XYZ" in decoded, (
         f"{br_model}: new-message content missing from extension; got {decoded!r}"
     )
+
+
+def test_bridge_declines_across_user_query_when_template_drops_thinking():
+    """Qwen3's template drops a past block's thinking once a new user turn
+    arrives. The bridge carries prior tokens verbatim, so spanning that
+    boundary would keep thinking the template strips. The faithful bridge
+    must DECLINE there (the caller then re-renders, byte-faithfully) — but
+    only when it actually matters:
+
+      - new user query + prior thinking + retention!="all" -> decline,
+        and the fallback re-render equals ``apply_chat_template``.
+      - thinking_retention="all" keeps thinking on every path -> extend.
+      - a tool response (in-flight cycle, no new query) keeps thinking in
+        the template too -> extend.
+      - no prior thinking -> nothing to strip -> extend.
+    """
+    from renderers import create_renderer
+    from renderers.base import load_tokenizer
+    from renderers.configs import Qwen3RendererConfig
+
+    tok = load_tokenizer("Qwen/Qwen3-8B")
+    im_end = tok.convert_tokens_to_ids("<|im_end|>")
+
+    u1 = {"role": "user", "content": "What is 2+2?"}
+    u2 = {"role": "user", "content": "Multiply that by 3."}
+    tool = {"role": "tool", "content": "ok"}
+    think = "<think>\n2 plus 2 is 4.\n</think>\n\n4"
+
+    def prior(r, asst_text):
+        p = r.render_ids([u1], add_generation_prompt=True)
+        completion = tok.encode(asst_text, add_special_tokens=False) + [im_end]
+        return p, completion
+
+    # new user query + prior thinking + default retention -> decline
+    r = create_renderer(tok, Qwen3RendererConfig())
+    p, comp = prior(r, think)
+    assert r.bridge_to_next_turn(p, comp, [u2]) is None
+    # ...and the caller's faithful re-render matches the chat template
+    hist = [u1, {"role": "assistant", "content": think}, u2]
+    rendered = tok.decode(r.render_ids(hist, add_generation_prompt=True))
+    assert rendered == tok.apply_chat_template(
+        hist, tokenize=False, add_generation_prompt=True
+    )
+
+    # thinking_retention="all" keeps thinking everywhere -> extend
+    r_all = create_renderer(tok, Qwen3RendererConfig(thinking_retention="all"))
+    p, comp = prior(r_all, think)
+    assert r_all.bridge_to_next_turn(p, comp, [u2]) is not None
+
+    # tool response continues the in-flight cycle (no new query) -> extend
+    p, comp = prior(r, think)
+    assert r.bridge_to_next_turn(p, comp, [tool]) is not None
+
+    # no prior thinking -> nothing to strip -> extend even across a user turn
+    p, comp = prior(r, "4")
+    assert r.bridge_to_next_turn(p, comp, [u2]) is not None
+
+
+# Renderers wired with the bridge faithfulness guard. Detection markers
+# vary: single ``</think>`` token (Qwen3, GLM, Nemotron), multi-token
+# ``</think>`` (Kimi-K2.5), or harmony analysis-channel header (gpt-oss).
+# minimax has no bridge; non-thinking models (llama, deepseek-v3) are out
+# of scope.
+_GUARDED_THINKING_MODELS = {
+    "Qwen/Qwen3-8B",
+    "Qwen/Qwen3.5-9B",
+    "Qwen/Qwen3.6-35B-A3B",
+    "zai-org/GLM-5",
+    "zai-org/GLM-5.1",
+    "THUDM/GLM-4.5-Air",
+    "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
+    "moonshotai/Kimi-K2.5",
+    "openai/gpt-oss-20b",
+}
+
+
+def test_bridge_declines_across_user_turn_when_thinking_present(br_renderer, br_model):
+    """Across every guarded thinking renderer: a prior turn carrying
+    ``reasoning_content`` makes the bridge decline a new *user* query (the
+    template would strip that thinking) but still extend a *tool* response
+    (in-flight cycle keeps it). Non-guarded models are skipped."""
+    if br_model not in _GUARDED_THINKING_MODELS:
+        pytest.skip(f"{br_model}: no bridge thinking-guard")
+
+    asst = [
+        {"role": "assistant", "reasoning_content": "Let me think.", "content": "Hello!"}
+    ]
+    prev_prompt, prev_completion = _simulate_prior_turn(br_renderer, asst)
+
+    declined = br_renderer.bridge_to_next_turn(
+        prev_prompt, prev_completion, [{"role": "user", "content": "next"}]
+    )
+    assert declined is None, f"{br_model}: expected faithful decline across a user turn"
+
+    extended = br_renderer.bridge_to_next_turn(
+        prev_prompt, prev_completion, [{"role": "tool", "content": "result"}]
+    )
+    assert extended is not None, f"{br_model}: should still bridge within a tool cycle"
+
+
+def test_bridge_keeps_thinking_when_history_kwarg_disables_truncation():
+    """GLM ``clear_thinking=False`` / Nemotron ``truncate_history_thinking=
+    False`` keep all past thinking (the template doesn't strip it), so the
+    bridge must NOT decline across a user turn — declining would re-render and
+    re-tokenize model-sampled thinking bytes."""
+    from renderers import create_renderer
+    from renderers.base import load_tokenizer
+    from renderers.configs import GLM5RendererConfig, Nemotron3RendererConfig
+
+    asst = [
+        {"role": "assistant", "reasoning_content": "Let me think.", "content": "Hi"}
+    ]
+    cases = [
+        ("zai-org/GLM-5", GLM5RendererConfig(clear_thinking=False)),
+        (
+            "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
+            Nemotron3RendererConfig(truncate_history_thinking=False),
+        ),
+    ]
+    for model, cfg in cases:
+        r = create_renderer(load_tokenizer(model), cfg)
+        prev_prompt, prev_completion = _simulate_prior_turn(r, asst)
+        bridged = r.bridge_to_next_turn(
+            prev_prompt, prev_completion, [{"role": "user", "content": "next"}]
+        )
+        assert bridged is not None, (
+            f"{model}: bridge must keep verbatim when the template keeps all thinking"
+        )
diff --git a/tests/test_llama_3.py b/tests/test_llama_3.py
index c6e4c75..8c16232 100644
--- a/tests/test_llama_3.py
+++ b/tests/test_llama_3.py
@@ -80,8 +80,8 @@ def test_default_date_matches_chat_template_strftime_fallback(llama_pair):
 
 
 def test_preserve_thinking_flags_are_noops(llama_pair):
-    """Llama-3 has no reasoning channel, so the ``preserve_*_thinking``
-    flags are accepted but never change the token stream — the same
+    """Llama-3 has no reasoning channel, so any ``thinking_retention``
+    level is accepted but never changes the token stream — the same
     never-preserves contract as Kimi-K2 / Qwen3-VL. (Cross-renderer
     coverage lives in tests/test_preserve_thinking.py.)"""
     _, _, tok, _ = llama_pair
@@ -94,10 +94,12 @@ def test_preserve_thinking_flags_are_noops(llama_pair):
         },
     ]
     base = Llama3Renderer(tok).render_ids(msgs)
-    for flag in ("preserve_all_thinking", "preserve_thinking_between_tool_calls"):
-        r = Llama3Renderer(tok, Llama3RendererConfig(**{flag: True}))
-        assert r.config.__getattribute__(flag) is True
-        assert r.render_ids(msgs) == base, f"{flag} must be a no-op for Llama-3"
+    for level in ("tool_cycle", "all"):
+        r = Llama3Renderer(tok, Llama3RendererConfig(thinking_retention=level))
+        assert r.config.thinking_retention == level
+        assert r.render_ids(msgs) == base, (
+            f"thinking_retention={level!r} must be a no-op for Llama-3"
+        )
 
 
 # ---------------------------------------------------------------------------
diff --git a/tests/test_preserve_thinking.py b/tests/test_preserve_thinking.py
index daa4836..3d3a372 100644
--- a/tests/test_preserve_thinking.py
+++ b/tests/test_preserve_thinking.py
@@ -1,16 +1,17 @@
-"""Smoke coverage for the ``preserve_*_thinking`` override flags.
+"""Smoke coverage for the ``thinking_retention`` override flag.
 
-Flags live on the typed renderer config (e.g.
-``Qwen3RendererConfig(preserve_all_thinking=True)``) and are stored on
-the renderer as ``self.config.preserve_*``. Each test that wants a
-non-default flag builds a fresh renderer for that configuration via
+The flag lives on the typed renderer config (e.g.
+``Qwen3RendererConfig(thinking_retention="all")``) and is stored on the
+renderer as ``self.config.thinking_retention``. Each test that wants a
+non-default level builds a fresh renderer for that configuration via
 ``_make`` below.
 
 Two invariants per renderer:
 
-1. Default render (no flags) is byte-identical to the existing
-   ``apply_chat_template`` parity baseline — covered exhaustively elsewhere.
-2. Setting either flag never *removes* tokens compared to the default and,
+1. Default render (``thinking_retention="template"``) is byte-identical to
+   the existing ``apply_chat_template`` parity baseline — covered
+   exhaustively elsewhere.
+2. Raising the level never *removes* tokens compared to the default and,
    for renderers whose template would drop past-asst thinking, actually
    adds tokens for a conversation containing past-asst ``reasoning_content``.
 
@@ -31,7 +32,7 @@
 
 
 def _make(tokenizer, renderer_name, **flags):
-    """Build a fresh renderer with the given preserve_*_thinking flags
+    """Build a fresh renderer with the given thinking_retention level
     bound at construction. Reuses the cached tokenizer fixture."""
     if renderer_name == "auto":
         renderer_name = MODEL_RENDERER_MAP.get(
@@ -51,8 +52,8 @@ def _make(tokenizer, renderer_name, **flags):
     "Qwen/Qwen3-VL-8B-Instruct",
     "Qwen/Qwen3-VL-30B-A3B-Instruct",
     "poolside/Laguna-XS.2",
-    # Llama-3 has no reasoning channel at all — preserve flags can't add
-    # or drop anything, so they're pure no-ops.
+    # Llama-3 has no reasoning channel at all — thinking_retention can't
+    # add or drop anything, so it's a pure no-op.
     "meta-llama/Llama-3.2-1B-Instruct",
     "meta-llama/Llama-3.2-3B-Instruct",
     "unsloth/Llama-3.2-1B-Instruct",
@@ -80,7 +81,7 @@ def _make(tokenizer, renderer_name, **flags):
 
 
 def test_should_preserve_past_thinking_classification():
-    # CURRENT-block-only behaviour. between_tool_calls preserves thinking
+    # CURRENT-block-only behaviour. "tool_cycle" preserves thinking
     # ONLY for asst messages that sit AFTER the last user turn AND are in
     # a segment that contains a tool. Anything before the last user turn
     # falls back to template default (typically dropped).
@@ -100,44 +101,38 @@ def test_should_preserve_past_thinking_classification():
     assert should_preserve_past_thinking(
         live_cycle,
         1,
-        preserve_all_thinking=False,
-        preserve_thinking_between_tool_calls=True,
+        thinking_retention="tool_cycle",
     )
     assert should_preserve_past_thinking(
         live_cycle,
         3,
-        preserve_all_thinking=False,
-        preserve_thinking_between_tool_calls=True,
+        thinking_retention="tool_cycle",
     )
 
     # Same shape with a NEW user appended → now the prior tool block is
-    # "older" and between_tool_calls must drop its thinking (template
-    # default). Only preserve_all_thinking would keep them.
+    # "older" and "tool_cycle" must drop its thinking (template default).
+    # Only thinking_retention="all" would keep them.
     closed_cycle = live_cycle + [{"role": "user", "content": "next"}]
     assert not should_preserve_past_thinking(
         closed_cycle,
         1,
-        preserve_all_thinking=False,
-        preserve_thinking_between_tool_calls=True,
+        thinking_retention="tool_cycle",
     )
     assert not should_preserve_past_thinking(
         closed_cycle,
         3,
-        preserve_all_thinking=False,
-        preserve_thinking_between_tool_calls=True,
+        thinking_retention="tool_cycle",
     )
-    # preserve_all_thinking still keeps them.
+    # thinking_retention="all" still keeps them.
     assert should_preserve_past_thinking(
         closed_cycle,
         1,
-        preserve_all_thinking=True,
-        preserve_thinking_between_tool_calls=False,
+        thinking_retention="all",
     )
     assert should_preserve_past_thinking(
         closed_cycle,
         3,
-        preserve_all_thinking=True,
-        preserve_thinking_between_tool_calls=False,
+        thinking_retention="all",
     )
 
     # Current segment without a tool → not a tool cycle → not preserved.
@@ -148,37 +143,35 @@ def test_should_preserve_past_thinking_classification():
     assert not should_preserve_past_thinking(
         no_tool_yet,
         1,
-        preserve_all_thinking=False,
-        preserve_thinking_between_tool_calls=True,
+        thinking_retention="tool_cycle",
     )
 
-    # Both flags False → always False.
+    # thinking_retention="template" → always False.
     assert not should_preserve_past_thinking(
         live_cycle,
         1,
-        preserve_all_thinking=False,
-        preserve_thinking_between_tool_calls=False,
+        thinking_retention="template",
     )
 
 
-def test_preserve_flags_default_unchanged(
+def test_thinking_retention_template_unchanged(
     model_name, tokenizer, renderer_name, renderer
 ):
-    # A renderer constructed with both flags explicitly ``False`` must
-    # produce byte-identical output to one constructed with the defaults.
+    # A renderer constructed with thinking_retention="template" explicitly
+    # must produce byte-identical output to one constructed with the defaults.
     bare = renderer.render_ids(CONVERSATION)
     explicit_off = _make(
         tokenizer,
         renderer_name,
-        preserve_all_thinking=False,
-        preserve_thinking_between_tool_calls=False,
+        thinking_retention="template",
     ).render_ids(CONVERSATION)
     assert bare == explicit_off, (
-        f"{model_name}: explicit flags=False must equal bare default render"
+        f"{model_name}: explicit thinking_retention='template' must equal "
+        f"bare default render"
     )
 
 
-def test_preserve_all_thinking_grows_or_no_op(
+def test_thinking_retention_all_grows_or_no_op(
     model_name, tokenizer, renderer_name, renderer
 ):
     from renderers.default import DefaultRenderer
@@ -186,37 +179,37 @@ def test_preserve_all_thinking_grows_or_no_op(
     if isinstance(renderer, DefaultRenderer):
         pytest.skip("DefaultRenderer raises on these flags — covered separately")
     default = renderer.render_ids(CONVERSATION)
-    preserved = _make(tokenizer, renderer_name, preserve_all_thinking=True).render_ids(
+    preserved = _make(tokenizer, renderer_name, thinking_retention="all").render_ids(
         CONVERSATION
     )
 
     if model_name in NO_OP_MODELS:
         assert preserved == default, (
-            f"{model_name} is a no-op renderer; preserve_all_thinking must "
+            f"{model_name} is a no-op renderer; thinking_retention='all' must "
             f"not change output (got {len(default)} → {len(preserved)})"
         )
     else:
         assert len(preserved) > len(default), (
-            f"{model_name}: preserve_all_thinking should add tokens for a "
+            f"{model_name}: thinking_retention='all' should add tokens for a "
             f"conversation with past-asst reasoning_content "
             f"(default={len(default)}, preserved={len(preserved)})"
         )
 
 
-def test_preserve_between_tool_calls_strict_subset(
+def test_thinking_retention_tool_cycle_strict_subset(
     model_name, tokenizer, renderer_name, renderer
 ):
-    """``preserve_thinking_between_tool_calls`` is strictly weaker than
-    ``preserve_all_thinking``: token count satisfies default <= between <= all."""
+    """``thinking_retention="tool_cycle"`` is strictly weaker than
+    ``"all"``: token count satisfies default <= tool_cycle <= all."""
     from renderers.default import DefaultRenderer
 
     if isinstance(renderer, DefaultRenderer):
         pytest.skip("DefaultRenderer raises on these flags — covered separately")
     default = renderer.render_ids(CONVERSATION)
     between = _make(
-        tokenizer, renderer_name, preserve_thinking_between_tool_calls=True
+        tokenizer, renderer_name, thinking_retention="tool_cycle"
     ).render_ids(CONVERSATION)
-    all_ = _make(tokenizer, renderer_name, preserve_all_thinking=True).render_ids(
+    all_ = _make(tokenizer, renderer_name, thinking_retention="all").render_ids(
         CONVERSATION
     )
     assert len(default) <= len(between) <= len(all_), (
@@ -244,28 +237,28 @@ def test_preserve_between_tool_calls_strict_subset(
 ]
 
 
-def test_preserve_btc_on_live_cycle_matches_all(
+def test_thinking_retention_tool_cycle_matches_all_on_live_cycle(
     model_name, tokenizer, renderer_name, renderer
 ):
     """In a live tool cycle (no trailing user), every past-asst sits in
-    the current tool-bearing segment. ``preserve_thinking_between_tool_calls``
+    the current tool-bearing segment. ``thinking_retention="tool_cycle"``
     should preserve all of their thinking — same set of asst messages as
-    ``preserve_all_thinking``, so the resulting token sequences must be
+    ``"all"``, so the resulting token sequences must be
     identical (independent of which template-default condition each
     renderer uses internally)."""
     from renderers.default import DefaultRenderer
 
     if isinstance(renderer, DefaultRenderer):
         pytest.skip("DefaultRenderer raises on these flags — covered separately")
-    btc = _make(
-        tokenizer, renderer_name, preserve_thinking_between_tool_calls=True
-    ).render_ids(LIVE_TOOL_CYCLE)
-    all_ = _make(tokenizer, renderer_name, preserve_all_thinking=True).render_ids(
+    btc = _make(tokenizer, renderer_name, thinking_retention="tool_cycle").render_ids(
+        LIVE_TOOL_CYCLE
+    )
+    all_ = _make(tokenizer, renderer_name, thinking_retention="all").render_ids(
         LIVE_TOOL_CYCLE
     )
     assert btc == all_, (
-        f"{model_name}: in a live tool cycle btc must match preserve_all "
-        f"(got len(btc)={len(btc)}, len(all)={len(all_)})"
+        f"{model_name}: in a live tool cycle tool_cycle must match all "
+        f"(got len(tool_cycle)={len(btc)}, len(all)={len(all_)})"
     )
 
 
@@ -325,17 +318,17 @@ def test_preserve_btc_on_live_cycle_matches_all(
     "Qwen/Qwen3-VL-8B-Instruct",
     "Qwen/Qwen3-VL-30B-A3B-Instruct",
     # Llama-3 ships no <think> rendering path, so reasoning_content never
-    # surfaces in the output regardless of the preserve flags.
+    # surfaces in the output regardless of thinking_retention.
     "meta-llama/Llama-3.2-1B-Instruct",
     "meta-llama/Llama-3.2-3B-Instruct",
     "unsloth/Llama-3.2-1B-Instruct",
 }
 
 
-def test_preserve_all_thinking_emits_every_asst_reasoning(
+def test_thinking_retention_all_emits_every_asst_reasoning(
     model_name, tokenizer, renderer_name, renderer
 ):
-    """``preserve_all_thinking=True`` must surface every past-asst's
+    """``thinking_retention="all"`` must surface every past-asst's
     ``reasoning_content`` in the decoded output — for renderers that
     have any pathway to render reasoning at all."""
     from renderers.default import DefaultRenderer
@@ -343,7 +336,7 @@ def test_preserve_all_thinking_emits_every_asst_reasoning(
     if isinstance(renderer, DefaultRenderer):
         pytest.skip("DefaultRenderer raises on these flags — covered separately")
 
-    ids = _make(tokenizer, renderer_name, preserve_all_thinking=True).render_ids(
+    ids = _make(tokenizer, renderer_name, thinking_retention="all").render_ids(
         TWO_BLOCK_CONV, tools=TWO_BLOCK_TOOLS
     )
     text = tokenizer.decode(ids)
@@ -352,38 +345,38 @@ def test_preserve_all_thinking_emits_every_asst_reasoning(
         for sentinel in ALL_SENTINELS:
             assert sentinel not in text, (
                 f"{model_name}: never-preserves renderer leaked {sentinel} "
-                f"under preserve_all_thinking"
+                f"under thinking_retention='all'"
             )
     else:
         for sentinel in ALL_SENTINELS:
             assert sentinel in text, (
-                f"{model_name}: preserve_all_thinking did not emit {sentinel} "
+                f"{model_name}: thinking_retention='all' did not emit {sentinel} "
                 f"in decoded output"
             )
 
 
-def test_preserve_btc_emits_current_block_reasoning(
+def test_thinking_retention_tool_cycle_emits_current_block_reasoning(
     model_name, tokenizer, renderer_name, renderer
 ):
-    """``preserve_thinking_between_tool_calls=True`` must surface the
-    current (post-last-user) tool block's reasoning. Older blocks fall
-    back to template default, which varies per renderer — no universal
-    assertion there."""
+    """``thinking_retention="tool_cycle"`` must surface the current
+    (post-last-user) tool block's reasoning. Older blocks fall back to
+    template default, which varies per renderer — no universal assertion
+    there."""
     from renderers.default import DefaultRenderer
 
     if isinstance(renderer, DefaultRenderer):
         pytest.skip("DefaultRenderer raises on these flags — covered separately")
 
-    ids = _make(
-        tokenizer, renderer_name, preserve_thinking_between_tool_calls=True
-    ).render_ids(TWO_BLOCK_CONV, tools=TWO_BLOCK_TOOLS)
+    ids = _make(tokenizer, renderer_name, thinking_retention="tool_cycle").render_ids(
+        TWO_BLOCK_CONV, tools=TWO_BLOCK_TOOLS
+    )
     text = tokenizer.decode(ids)
 
     if model_name in NEVER_PRESERVES_MODELS:
         for sentinel in ALL_SENTINELS:
             assert sentinel not in text, (
                 f"{model_name}: never-preserves renderer leaked {sentinel} "
-                f"under preserve_thinking_between_tool_calls"
+                f"under thinking_retention='tool_cycle'"
             )
     else:
         for sentinel in CURRENT_BLOCK_SENTINELS:
@@ -393,23 +386,23 @@ def test_preserve_btc_emits_current_block_reasoning(
             )
 
 
-def test_default_renderer_raises_on_flags():
+def test_default_renderer_raises_above_template():
     """``DefaultRenderer`` falls back to apply_chat_template with no
-    selective re-emit pathway, so constructing one with either flag set
-    must raise — fail fast, before any render is attempted."""
+    selective re-emit pathway, so constructing one with thinking_retention
+    above ``"template"`` must raise — fail fast, before any render."""
     from renderers import DefaultRendererConfig
     from renderers.base import load_tokenizer
 
     tok = load_tokenizer("Qwen/Qwen2.5-0.5B-Instruct")
-    # No flags → constructs cleanly.
+    # Default "template" → constructs cleanly.
     create_renderer(tok, DefaultRendererConfig())
-    # Either flag set → raises at construction.
+    # Any level above "template" → raises at construction.
     with pytest.raises(NotImplementedError):
-        create_renderer(tok, DefaultRendererConfig(preserve_all_thinking=True))
+        create_renderer(tok, DefaultRendererConfig(thinking_retention="all"))
     with pytest.raises(NotImplementedError):
         create_renderer(
             tok,
-            DefaultRendererConfig(preserve_thinking_between_tool_calls=True),
+            DefaultRendererConfig(thinking_retention="tool_cycle"),
         )
 
 
@@ -425,21 +418,16 @@ def test_create_renderer_records_flag_state(model_name, renderer_name, tokenizer
     from renderers.default import DefaultRenderer
 
     bare = _make(tokenizer, renderer_name)
-    assert bare.config.preserve_all_thinking is False
-    assert bare.config.preserve_thinking_between_tool_calls is False
+    assert bare.config.thinking_retention == "template"
 
     if not isinstance(bare, DefaultRenderer):
-        # DefaultRenderer raises at construction with either flag set —
+        # DefaultRenderer raises at construction above "template" —
         # covered by ``test_default_renderer_raises_on_flags``.
-        all_on = _make(tokenizer, renderer_name, preserve_all_thinking=True)
-        assert all_on.config.preserve_all_thinking is True
-        assert all_on.config.preserve_thinking_between_tool_calls is False
+        all_on = _make(tokenizer, renderer_name, thinking_retention="all")
+        assert all_on.config.thinking_retention == "all"
 
-        btc_on = _make(
-            tokenizer, renderer_name, preserve_thinking_between_tool_calls=True
-        )
-        assert btc_on.config.preserve_all_thinking is False
-        assert btc_on.config.preserve_thinking_between_tool_calls is True
+        btc_on = _make(tokenizer, renderer_name, thinking_retention="tool_cycle")
+        assert btc_on.config.thinking_retention == "tool_cycle"
 
 
 # ---------------------------------------------------------------------------
@@ -467,8 +455,8 @@ def test_glm5_config_accepts_clear_thinking():
 def test_qwen36_config_rejects_unknown_field():
     """``preserve_thinking`` on Qwen3.6 was a chat-template-kwarg
     pass-through in an earlier revision. It is superseded by the
-    renderer-agnostic ``preserve_all_thinking`` override and must not
-    appear on the typed config — its default-False semantics are now
+    renderer-agnostic ``thinking_retention`` override and must not
+    appear on the typed config — its default semantics are now
     inherited from Qwen3.5's render gate. ``extra="forbid"`` on the
     pydantic model enforces this at construction."""
     from renderers import Qwen36RendererConfig
diff --git a/tests/test_renderer_config.py b/tests/test_renderer_config.py
index 4bc0a31..0fe951c 100644
--- a/tests/test_renderer_config.py
+++ b/tests/test_renderer_config.py
@@ -85,7 +85,7 @@ def __init__(self, tokenizer, config):
 def test_create_renderer_auto_resolves_via_model_map(monkeypatch):
     """``AutoRendererConfig`` (or ``config=None``) routes through
     ``MODEL_RENDERER_MAP`` to pick the matching renderer + typed config,
-    carrying the shared ``preserve_*`` flags over from the auto config."""
+    carrying the shared ``thinking_retention`` field over from the auto config."""
 
     class _FakeQwen35:
         def __init__(self, tokenizer, config):
@@ -97,13 +97,13 @@ def __init__(self, tokenizer, config):
 
     renderer = create_renderer(
         SimpleNamespace(name_or_path="fake/qwen35"),
-        AutoRendererConfig(preserve_all_thinking=True),
+        AutoRendererConfig(thinking_retention="all"),
     )
 
     assert isinstance(renderer.config, Qwen35RendererConfig)
-    assert renderer.config.preserve_all_thinking is True
+    assert renderer.config.thinking_retention == "all"
     # Template-level kwargs stay at their per-renderer defaults — auto
-    # carries only the preserve_* flags.
+    # carries only the thinking_retention flag.
     assert renderer.config.add_vision_id is False
 
 
@@ -114,3 +114,42 @@ def test_create_renderer_default_argument_is_auto():
     renderer = create_renderer(tok)
     # Falls through to DefaultRenderer when no match and no vision config.
     assert renderer.__class__.__name__ == "DefaultRenderer"
+
+
+def test_thinking_retention_conflict_raises():
+    """``clear_thinking`` / ``truncate_history_thinking`` are byte-equivalent
+    to ``thinking_retention`` (``False`` ≡ ``"all"``). Explicitly setting both
+    to disagreeing values raises; a consistent pair, or a single field (the
+    parity matrix sets one at a time), is fine."""
+    from renderers import Nemotron3RendererConfig
+
+    # explicit conflict: keep-all kwarg vs a non-"all" retention
+    with pytest.raises(ValidationError, match="same knob"):
+        GLM5RendererConfig(clear_thinking=False, thinking_retention="template")
+    with pytest.raises(ValidationError, match="same knob"):
+        GLM5RendererConfig(clear_thinking=False, thinking_retention="tool_cycle")
+    with pytest.raises(ValidationError, match="same knob"):
+        Nemotron3RendererConfig(
+            truncate_history_thinking=False, thinking_retention="template"
+        )
+
+    # consistent (both keep all), single-field, and the normal override
+    # (template-default drop + retention="all") are all accepted.
+    GLM5RendererConfig(clear_thinking=False, thinking_retention="all")
+    GLM5RendererConfig(clear_thinking=False)
+    GLM5RendererConfig(thinking_retention="tool_cycle")
+    GLM5RendererConfig(clear_thinking=True, thinking_retention="all")
+
+
+def test_default_renderer_config_rejects_legacy_preserve_flags():
+    """``DefaultRendererConfig`` is ``extra="allow"``, so the removed
+    ``preserve_*`` bools would otherwise slip into ``model_extra`` and be
+    forwarded to ``apply_chat_template`` silently. A validator rejects them
+    with a migration message; genuine Jinja kwargs still pass through."""
+    with pytest.raises(ValidationError, match="thinking_retention"):
+        DefaultRendererConfig(preserve_all_thinking=True)
+    with pytest.raises(ValidationError, match="thinking_retention"):
+        DefaultRendererConfig(preserve_thinking_between_tool_calls=True)
+
+    cfg = DefaultRendererConfig(some_jinja_kwarg=True)
+    assert cfg.model_extra["some_jinja_kwarg"] is True