
Add locally-executed DuckDuckGo search tool #67

Open
reformedot wants to merge 6 commits into main from add-duckduckgo-search-tool

Conversation


reformedot commented Jun 5, 2026

What

Adds a locally-executed DuckDuckGo search tool to the async agent engine, ported from the Python search action (a browser_use Controller action that fetched lite.duckduckgo.com/lite/ over HTTP and parsed the results).

Per the request, only the search logic is carried over — the unrelated request_human_control action and the Controller/DB/session scaffolding are dropped.

How it differs from the existing web_search

| | web_search (existing) | search (this PR) |
|---|---|---|
| Execution | Hosted: the provider runs it server-side | Local: the client does the HTTP GET and HTML parse |
| Needs a capable provider | Yes | No (works against any provider) |

They're complementary, not duplicates.

Implementation

  • tools/handlers/search.rs — new handler following the same trait stack (Approvable + Sandboxable + ToolRuntime) and doc/format conventions as the sibling tools (tool_search, update_plan, web_search).
  • Network seam for testability — the HTTP fetch lives behind a SearchBackend trait (real reqwest impl + fake in tests), mirroring the browser / python / mcp backend-injection pattern. The parsing/formatting logic is unit-tested against fixture HTML with no network.
  • No new dependencies — the repo deliberately carries no HTML-parser crate (browser DOM comes from CDP, never string-parsed), so the few fields are extracted with targeted regex over the fixed DuckDuckGo Lite markup (a.result-link, td.result-snippet), plus a small hand-rolled percent-decoder and HTML-entity decoder. Faithful to the original BeautifulSoup logic (redirect unwrapping, dedup, "more info"/duckduckgo.com filtering, snippet association, challenge/HTTP-error classification).
  • Registered as search in both default_registry and the production dispatcher (build_tool_dispatcher_with_cwd_and_goal_store) so the live model can actually call it. Read-only ⇒ parallel_safe = true. A dispatcher membership test guards against the tool silently dropping out of the production tool set.
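To make the network seam concrete, here is a minimal sketch of the backend-injection idea. It is not the code in this PR: the real `SearchBackend` is presumably async and backed by reqwest, while this version is synchronous and std-only so it stays self-contained. `RawResponse`, `FakeBackend`, `count_result_links`, and the `SearchError` variants shown are hypothetical names chosen for illustration.

```rust
/// Hypothetical error type; the PR's real `SearchError` may have other variants.
#[derive(Debug, PartialEq)]
pub enum SearchError {
    Challenge,
    Http(u16),
}

/// Raw response handed from a backend to the parsing layer (illustrative).
pub struct RawResponse {
    pub status: u16,
    pub body: String,
}

/// The seam: handler logic depends on this trait, never on an HTTP client.
pub trait SearchBackend {
    fn fetch(&self, query: &str) -> Result<RawResponse, SearchError>;
}

/// Test double that returns canned HTML without touching the network.
pub struct FakeBackend {
    pub status: u16,
    pub body: &'static str,
}

impl SearchBackend for FakeBackend {
    fn fetch(&self, _query: &str) -> Result<RawResponse, SearchError> {
        Ok(RawResponse {
            status: self.status,
            body: self.body.to_string(),
        })
    }
}

/// Stand-in for the handler: it works against any backend, real or fake.
/// Counting the `result-link` marker is a placeholder for real parsing.
pub fn count_result_links(
    backend: &dyn SearchBackend,
    query: &str,
) -> Result<usize, SearchError> {
    let resp = backend.fetch(query)?;
    if resp.status >= 400 {
        return Err(SearchError::Http(resp.status));
    }
    Ok(resp.body.matches("result-link").count())
}
```

A test can then build a `FakeBackend` from fixture HTML and assert on the handler's output, which is what keeps the suite deterministic and offline.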

Tests

New search_tests.rs (deterministic, no network) covers: URL unwrapping (redirect/protocol-relative/ads/unsafe-scheme/+/%XX decoding), HTML parsing (dedup, skip-"more info", skip-duckduckgo.com, no-snippet, inline-markup stripping, whitespace collapsing, entity decoding), response classification (200/202/anomaly/4xx/5xx + the 399/400 boundary), output formatting, and full run/orchestrator/registry/dispatcher wiring. Registry/dispatcher tests updated for the new tool.
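For readers unfamiliar with the URL-unwrapping cases listed above, the following is a rough sketch of how a hand-rolled percent-decoder and redirect unwrapper could look. It is an illustration under assumptions, not the PR's implementation: the function names are hypothetical, the `uddg` query parameter and the `duckduckgo.com/l/` redirect path are assumed from how DuckDuckGo redirect links commonly appear, and upgrading protocol-relative URLs to `https:` is a guess at the intended behavior.

```rust
/// Map one ASCII hex digit to its numeric value.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode `%XX` escapes and `+` (as space) in a query-string value.
/// Malformed escapes are passed through unchanged.
pub fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' if i + 2 < bytes.len() => {
                match (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                    (Some(hi), Some(lo)) => {
                        out.push(hi * 16 + lo);
                        i += 3;
                    }
                    _ => {
                        out.push(b'%');
                        i += 1;
                    }
                }
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Unwrap a (possibly protocol-relative) redirect link to its destination.
/// Returns `None` for redirects without a target and for non-HTTP(S) schemes.
pub fn unwrap_redirect(href: &str) -> Option<String> {
    let absolute = match href.strip_prefix("//") {
        Some(rest) => format!("https://{rest}"),
        None => href.to_string(),
    };
    // Outer Option: is this a redirect link? Inner Option: does it carry `uddg`?
    let redirect_target = absolute
        .split_once('?')
        .filter(|(base, _)| base.contains("duckduckgo.com/l/"))
        .map(|(_, query)| {
            query
                .split('&')
                .find_map(|pair| pair.strip_prefix("uddg="))
                .map(percent_decode)
        });
    let target = match redirect_target {
        Some(Some(decoded)) => decoded,
        Some(None) => return None,
        None => absolute,
    };
    if target.starts_with("http://") || target.starts_with("https://") {
        Some(target)
    } else {
        None
    }
}
```

Rejecting anything that is not `http://` or `https://` after decoding is what covers the unsafe-scheme case, since the check runs on the final destination rather than on the wrapper URL.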

Verification

  • cargo fmt --check
  • cargo clippy — no new warnings from this change ✓
  • cargo test — all new + existing tests pass. (Two pre-existing shell_tests PTY tests fail identically on a clean main; unrelated to this change.)
  • uv run pytest — 34 passed, 1 skipped ✓

Note

This PR was developed with the help of a multi-agent adversarial review of the diff; its main finding — that the production dispatcher is built manually and does not go through default_registry — is addressed here (the tool is registered in both, with a CI guard).

🤖 Generated with Claude Code


Summary by cubic

Adds a locally executed DuckDuckGo search tool that fetches lite.duckduckgo.com/lite/, parses results client-side, and returns compact text results; complements hosted web_search so search works without provider support.

  • New Features

    • New search tool registered in default_registry and the production dispatcher; runs serial (not parallel-safe) to avoid DuckDuckGo Lite rate-limit blocks.
    • Output formatting trims titles to 30 chars and descriptions to 125 (ellipsis included); URLs remain intact. Tool definition guidance steers models to use this instead of navigating search engines, noting it needs no browser session and is token-efficient.
    • SearchBackend seam with HttpSearchBackend; parsing uses targeted regex plus small percent/entity decoders; deterministic tests for URL unwrapping, HTML parsing, response classification, formatting, and registry/dispatcher wiring; includes an ignored live DuckDuckGo smoke test.
  • Dependencies

    • No new packages; uses reqwest.

Written for commit af4111c. Summary will update on new commits.


Port the Python `search` action (DuckDuckGo Lite HTTP search) into the
async agent engine as a new locally-dispatched `search` tool. Only the
search logic is carried over — the `request_human_control` action and the
Controller/DB/session scaffolding are dropped per "keep the logic only".

Unlike the existing hosted `web_search` (provider-executed, no local I/O),
this tool performs a real HTTP GET against `lite.duckduckgo.com/lite/` and
parses the result HTML itself, so it works against any provider.

Implementation notes:
- New handler `tools/handlers/search.rs` follows the same trait stack
  (Approvable + Sandboxable + ToolRuntime) as the sibling tools, with the
  HTTP fetch behind a `SearchBackend` seam (real reqwest impl + fake for
  tests), mirroring the browser/python/mcp backend-injection pattern.
- No new dependencies: the repo deliberately avoids HTML-parser deps
  (browser DOM comes from CDP), so parsing uses targeted `regex` over the
  fixed DuckDuckGo Lite markup plus a small hand-rolled percent-decoder and
  entity decoder. Faithful to the original BeautifulSoup logic.
- Registered as `search` in both `default_registry` and the production
  dispatcher (`build_tool_dispatcher_with_cwd_and_goal_store`) so the live
  model can actually call it; parallel-safe (read-only).
- Tests are fully deterministic (fixture HTML + fake backend, no network):
  parsing, URL unwrapping, entity/whitespace handling, response
  classification, formatting, and orchestrator/registry/dispatcher wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cubic-dev-ai (bot) left a comment


2 issues found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/browser-use-agent/src/tools/handlers/search.rs">

<violation number="1" location="crates/browser-use-agent/src/tools/handlers/search.rs:202">
P1: Challenge detection is overly broad: any page containing the word "anomaly" is treated as CAPTCHA, causing false failures on valid searches.</violation>
</file>

<file name="crates/browser-use-agent/src/tools/registry.rs">

<violation number="1" location="crates/browser-use-agent/src/tools/registry.rs:1159">
P3: Broken intra-doc link: `[`web_search`](definitions::web_search)` in `search()`'s doc comment references `definitions::web_search` from within the same `definitions` module, resolving to a non-existent path. Should be `[`web_search`]` (same module) or use the full crate path.</violation>
</file>


/// (status 202 or an "anomaly" body) first, then any `>= 400` status as an
/// error, otherwise success.
pub fn classify_response(status: u16, body: &str) -> Result<(), SearchError> {
if status == 202 || body.to_ascii_lowercase().contains("anomaly") {

cubic-dev-ai (bot), Jun 5, 2026


P1: Challenge detection is overly broad: any page containing the word "anomaly" is treated as CAPTCHA, causing false failures on valid searches.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/browser-use-agent/src/tools/handlers/search.rs, line 202:

<comment>Challenge detection is overly broad: any page containing the word "anomaly" is treated as CAPTCHA, causing false failures on valid searches.</comment>

<file context>
@@ -0,0 +1,736 @@
+/// (status 202 or an "anomaly" body) first, then any `>= 400` status as an
+/// error, otherwise success.
+pub fn classify_response(status: u16, body: &str) -> Result<(), SearchError> {
+    if status == 202 || body.to_ascii_lowercase().contains("anomaly") {
+        return Err(SearchError::Challenge);
+    }
</file context>
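One possible way to narrow the check flagged above, sketched under assumptions rather than proposed as the fix: treat an "anomaly" body as a challenge only when the page carries no result links, so a legitimate results page that merely mentions the word still succeeds. The `result-link` marker comes from the PR description; the `SearchError` variants shown here are hypothetical, and whether real challenge pages ever contain `result-link` has not been verified.

```rust
/// Hypothetical error type mirroring the names used in the diff.
#[derive(Debug, PartialEq)]
pub enum SearchError {
    Challenge,
    Http(u16),
}

/// Classify a response: challenge first, then HTTP errors, otherwise success.
pub fn classify_response(status: u16, body: &str) -> Result<(), SearchError> {
    let lower = body.to_ascii_lowercase();
    // Assumption: genuine result pages always contain at least one
    // `result-link` anchor, and challenge pages contain none.
    let has_results = lower.contains("result-link");
    let mentions_anomaly = lower.contains("anomaly");

    if status == 202 || (mentions_anomaly && !has_results) {
        return Err(SearchError::Challenge);
    }
    if status >= 400 {
        return Err(SearchError::Http(status));
    }
    Ok(())
}
```

The trade-off is that a challenge page which happened to include the marker would slip through, so matching on challenge-specific markup (if a stable marker exists) may be the more robust choice.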

}

/// `search`: a LOCALLY-executed DuckDuckGo (Lite) web search. Unlike the
/// hosted [`web_search`](definitions::web_search), the client performs the

cubic-dev-ai (bot), Jun 5, 2026


P3: Broken intra-doc link: [web_search](definitions::web_search) in search()'s doc comment references definitions::web_search from within the same definitions module, resolving to a non-existent path. Should be [web_search] (same module) or use the full crate path.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/browser-use-agent/src/tools/registry.rs, line 1159:

<comment>Broken intra-doc link: `[`web_search`](definitions::web_search)` in `search()`'s doc comment references `definitions::web_search` from within the same `definitions` module, resolving to a non-existent path. Should be `[`web_search`]` (same module) or use the full crate path.</comment>

<file context>
@@ -1155,6 +1155,34 @@ to the single frame that proves the task succeeded."
     }
 
+    /// `search`: a LOCALLY-executed DuckDuckGo (Lite) web search. Unlike the
+    /// hosted [`web_search`](definitions::web_search), the client performs the
+    /// HTTP request itself and returns the parsed results as text. Ported from
+    /// the Python `search` action's description.
</file context>
Suggested change
- /// hosted [`web_search`](definitions::web_search), the client performs the
+ /// hosted [`web_search`], the client performs the

reformedot and others added 5 commits June 4, 2026 18:15
A network-dependent end-to-end check against the real DuckDuckGo Lite
endpoint via the default HttpSearchBackend. Ignored by default (so CI and
`cargo test` stay deterministic and offline); run manually with:

  cargo test -p browser-use-agent --lib -- --ignored --nocapture search_live_smoke

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iciency

The formatted model-facing output now trims each result's title to 15 chars
and description to 100 chars (ellipsis counted within the cap, on a Unicode
char boundary); destination URLs are kept intact so they stay usable.
Truncation is applied at the display layer (`format_results`), so
`SearchResult` still carries full data for any other consumer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tune the formatted-output truncation limits: titles 15 -> 30 chars,
descriptions 100 -> 125 chars (ellipsis still counted within the cap).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
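The truncation rule described in the two commits above (cap counted in characters, ellipsis included in the cap, cut on a Unicode char boundary) could be sketched roughly as follows. This is an illustration, not the code in `format_results`: the function and constant names are hypothetical, and the use of a single `…` character as the ellipsis is an assumption, since the commits do not say which form is used.

```rust
/// Limits taken from the later commit message (titles 30, descriptions 125).
pub const TITLE_MAX_CHARS: usize = 30;
pub const DESCRIPTION_MAX_CHARS: usize = 125;

/// Truncate `text` to at most `max_chars` characters, counting the
/// trailing ellipsis toward the cap. Iterating over `chars()` means the
/// cut can never land inside a multi-byte UTF-8 sequence.
pub fn truncate_chars(text: &str, max_chars: usize) -> String {
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    if max_chars == 0 {
        return String::new();
    }
    // Reserve one character of the budget for the ellipsis.
    let mut out: String = text.chars().take(max_chars - 1).collect();
    out.push('…');
    out
}
```

Applying this only when formatting, for example `truncate_chars(&result.title, TITLE_MAX_CHARS)`, matches the stated intent that the underlying `SearchResult` keeps its full data.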
@gregpr07 gregpr07 changed the title Add locally-executed DuckDuckGo search tool DO NOT MERGE: Add locally-executed DuckDuckGo search tool Jun 5, 2026
gregpr07 (Member) commented Jun 5, 2026

DO NOT MERGE. This does not work well enough in practice and should not be merged in its current form.

@gregpr07 gregpr07 changed the title DO NOT MERGE: Add locally-executed DuckDuckGo search tool Add locally-executed DuckDuckGo search tool Jun 5, 2026