Add locally-executed DuckDuckGo search tool #67
Open
reformedot wants to merge 6 commits into
Port the Python `search` action (DuckDuckGo Lite HTTP search) into the async agent engine as a new locally-dispatched `search` tool. Only the search logic is carried over: the `request_human_control` action and the Controller/DB/session scaffolding are dropped per "keep the logic only".

Unlike the existing hosted `web_search` (provider-executed, no local I/O), this tool performs a real HTTP GET against `lite.duckduckgo.com/lite/` and parses the result HTML itself, so it works against any provider.

Implementation notes:
- New handler `tools/handlers/search.rs` follows the same trait stack (Approvable + Sandboxable + ToolRuntime) as the sibling tools, with the HTTP fetch behind a `SearchBackend` seam (real reqwest impl + fake for tests), mirroring the browser/python/mcp backend-injection pattern.
- No new dependencies: the repo deliberately avoids HTML-parser deps (browser DOM comes from CDP), so parsing uses targeted `regex` over the fixed DuckDuckGo Lite markup plus a small hand-rolled percent-decoder and entity decoder. Faithful to the original BeautifulSoup logic.
- Registered as `search` in both `default_registry` and the production dispatcher (`build_tool_dispatcher_with_cwd_and_goal_store`) so the live model can actually call it; parallel-safe (read-only).
- Tests are fully deterministic (fixture HTML + fake backend, no network): parsing, URL unwrapping, entity/whitespace handling, response classification, formatting, and orchestrator/registry/dispatcher wiring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
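To illustrate the `SearchBackend` seam described in the implementation notes, here is a minimal sketch. It is not the PR's code: the trait is shown synchronous for brevity (the real engine is async), and `RawResponse`, `FetchError`, `FakeBackend`, and `count_result_links` are hypothetical names introduced only for this example.

```rust
/// Transport-level failure (hypothetical, trimmed down for the sketch).
#[derive(Debug, PartialEq)]
pub enum FetchError {
    Transport(String),
}

/// Raw HTTP outcome handed to the parser: status code plus body text.
#[derive(Debug, Clone, PartialEq)]
pub struct RawResponse {
    pub status: u16,
    pub body: String,
}

/// The seam: handler logic only sees this trait, never the HTTP client.
pub trait SearchBackend {
    fn fetch(&self, query: &str) -> Result<RawResponse, FetchError>;
}

/// Test double that returns canned HTML, so tests never touch the network.
pub struct FakeBackend {
    pub canned: RawResponse,
}

impl SearchBackend for FakeBackend {
    fn fetch(&self, _query: &str) -> Result<RawResponse, FetchError> {
        Ok(self.canned.clone())
    }
}

/// Handler-side logic written against the trait. Counting `result-link`
/// occurrences is a stand-in for the real regex-based parsing.
pub fn count_result_links(
    backend: &dyn SearchBackend,
    query: &str,
) -> Result<usize, FetchError> {
    let response = backend.fetch(query)?;
    Ok(response.body.matches("result-link").count())
}
```

A production implementation of the trait would wrap the HTTP client, while tests construct `FakeBackend` with fixture HTML; the handler code is the same in both cases.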
2 issues found across 6 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="crates/browser-use-agent/src/tools/handlers/search.rs">
<violation number="1" location="crates/browser-use-agent/src/tools/handlers/search.rs:202">
P1: Challenge detection is overly broad: any page containing the word "anomaly" is treated as CAPTCHA, causing false failures on valid searches.</violation>
</file>
<file name="crates/browser-use-agent/src/tools/registry.rs">
<violation number="1" location="crates/browser-use-agent/src/tools/registry.rs:1159">
P3: Broken intra-doc link: `[`web_search`](definitions::web_search)` in `search()`'s doc comment references `definitions::web_search` from within the same `definitions` module, resolving to a non-existent path. Should be `[`web_search`]` (same module) or use the full crate path.</violation>
</file>
/// (status 202 or an "anomaly" body) first, then any `>= 400` status as an
/// error, otherwise success.
pub fn classify_response(status: u16, body: &str) -> Result<(), SearchError> {
    if status == 202 || body.to_ascii_lowercase().contains("anomaly") {
P1: Challenge detection is overly broad: any page containing the word "anomaly" is treated as CAPTCHA, causing false failures on valid searches.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/browser-use-agent/src/tools/handlers/search.rs, line 202:
<comment>Challenge detection is overly broad: any page containing the word "anomaly" is treated as CAPTCHA, causing false failures on valid searches.</comment>
<file context>
@@ -0,0 +1,736 @@
+/// (status 202 or an "anomaly" body) first, then any `>= 400` status as an
+/// error, otherwise success.
+pub fn classify_response(status: u16, body: &str) -> Result<(), SearchError> {
+ if status == 202 || body.to_ascii_lowercase().contains("anomaly") {
+ return Err(SearchError::Challenge);
+ }
</file context>
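One possible way to narrow the check, sketched under stated assumptions: treat the body as a challenge only when it carries a challenge-specific marker and contains no result links. The marker strings in `CHALLENGE_MARKERS` and the `Http` variant are assumptions made for this example, not taken from the PR; the markers should be verified against a captured challenge page before use.

```rust
#[derive(Debug, PartialEq)]
pub enum SearchError {
    Challenge,
    /// Hypothetical variant carrying the failing status code.
    Http(u16),
}

/// Assumed challenge-only markers (hypothetical; confirm against real markup).
const CHALLENGE_MARKERS: [&str; 2] = ["anomaly-modal", "anomaly.js"];

pub fn classify_response(status: u16, body: &str) -> Result<(), SearchError> {
    let lower = body.to_ascii_lowercase();
    // A page that already contains result links is a normal results page,
    // even if a title or snippet happens to mention "anomaly".
    let has_results = lower.contains("result-link");
    let has_marker = CHALLENGE_MARKERS.iter().any(|m| lower.contains(*m));

    if status == 202 || (has_marker && !has_results) {
        return Err(SearchError::Challenge);
    }
    if status >= 400 {
        return Err(SearchError::Http(status));
    }
    Ok(())
}
```

With this shape, a valid search for "anomaly detection" succeeds, while a 202 response or a marker-only page is still classified as a challenge.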
}

/// `search`: a LOCALLY-executed DuckDuckGo (Lite) web search. Unlike the
/// hosted [`web_search`](definitions::web_search), the client performs the
P3: Broken intra-doc link: `[`web_search`](definitions::web_search)` in `search()`'s doc comment references `definitions::web_search` from within the same `definitions` module, resolving to a non-existent path. Should be `[`web_search`]` (same module) or use the full crate path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At crates/browser-use-agent/src/tools/registry.rs, line 1159:
<comment>Broken intra-doc link: `[`web_search`](definitions::web_search)` in `search()`'s doc comment references `definitions::web_search` from within the same `definitions` module, resolving to a non-existent path. Should be `[`web_search`]` (same module) or use the full crate path.</comment>
<file context>
@@ -1155,6 +1155,34 @@ to the single frame that proves the task succeeded."
}
+ /// `search`: a LOCALLY-executed DuckDuckGo (Lite) web search. Unlike the
+ /// hosted [`web_search`](definitions::web_search), the client performs the
+ /// HTTP request itself and returns the parsed results as text. Ported from
+ /// the Python `search` action's description.
</file context>
Suggested change
- /// hosted [`web_search`](definitions::web_search), the client performs the
+ /// hosted [`web_search`], the client performs the
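For context on why the suggestion works: rustdoc resolves intra-doc link paths relative to the module that contains the documented item. The sketch below uses a stand-in `definitions` module with placeholder function bodies (not the repo's actual definitions) to show link forms that resolve from inside the module.

```rust
pub mod definitions {
    /// Stand-in for the hosted tool definition.
    pub fn web_search() -> &'static str {
        "web_search"
    }

    /// Unlike the hosted [`web_search`], this tool runs locally.
    ///
    /// Forms that resolve from inside this module:
    /// - [`web_search`] (same-module shorthand)
    /// - [`self::web_search`] (explicit relative path)
    /// - [`crate::definitions::web_search`] (full crate path, assuming the
    ///   module sits at the crate root as it does in this sketch)
    ///
    /// A link target written as `definitions::web_search` would instead be
    /// looked up relative to this module, where no `definitions` item exists.
    pub fn search() -> &'static str {
        "search"
    }
}
```

Running `cargo doc` with the `rustdoc::broken_intra_doc_links` lint enabled is one way to catch this class of error.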
A network-dependent end-to-end check against the real DuckDuckGo Lite endpoint via the default HttpSearchBackend. Ignored by default (so CI and `cargo test` stay deterministic and offline); run manually with:

    cargo test -p browser-use-agent --lib -- --ignored --nocapture search_live_smoke

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…iciency

The formatted model-facing output now trims each result's title to 15 chars and description to 100 chars (ellipsis counted within the cap, on a Unicode char boundary); destination URLs are kept intact so they stay usable. Truncation is applied at the display layer (`format_results`), so `SearchResult` still carries full data for any other consumer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Tune the formatted-output truncation limits: titles 15 -> 30 chars, descriptions 100 -> 125 chars (ellipsis still counted within the cap). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
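The truncation rule described in these commits (a cap measured in characters, with the ellipsis counted inside the cap and cuts made on Unicode char boundaries) can be sketched as follows. The function name `truncate_chars` and the choice of a single `…` character as the ellipsis are assumptions for illustration; the PR's `format_results` may differ.

```rust
/// Truncate `text` to at most `max_chars` Unicode scalar values.
/// When truncation happens, the trailing ellipsis counts toward the cap.
pub fn truncate_chars(text: &str, max_chars: usize) -> String {
    // Count chars, not bytes, so multi-byte characters are never split.
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    if max_chars == 0 {
        return String::new();
    }
    // Reserve one slot for the ellipsis (assumed to be a single char here).
    let mut out: String = text.chars().take(max_chars - 1).collect();
    out.push('…');
    out
}
```

Under the limits stated above, titles would be passed through with a cap of 30 and descriptions with a cap of 125, while URLs bypass the function entirely.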
Member
DO NOT MERGE. This does not work well enough in practice and should not be merged in its current form.
What
Adds a locally-executed DuckDuckGo `search` tool to the async agent engine, ported from the Python `search` action (a `browser_use` Controller action that fetched `lite.duckduckgo.com/lite/` over HTTP and parsed the results).

Per the request, only the search logic is carried over; the unrelated `request_human_control` action and the Controller/DB/session scaffolding are dropped.

How it differs from the existing `web_search`
- `web_search` (existing): hosted and provider-executed, with no local I/O.
- `search` (this PR): locally executed; it performs the HTTP GET against `lite.duckduckgo.com/lite/` and parses the result HTML itself, so it works against any provider.

They're complementary, not duplicates.

Implementation
- `tools/handlers/search.rs`: new handler following the same trait stack (`Approvable` + `Sandboxable` + `ToolRuntime`) and doc/format conventions as the sibling tools (`tool_search`, `update_plan`, `web_search`).
- The HTTP fetch sits behind a `SearchBackend` trait (real `reqwest` impl + fake in tests), mirroring the `browser`/`python`/`mcp` backend-injection pattern. The parsing/formatting logic is unit-tested against fixture HTML with no network.
- No new dependencies: parsing uses targeted `regex` over the fixed DuckDuckGo Lite markup (`a.result-link`, `td.result-snippet`), plus a small hand-rolled percent-decoder and HTML-entity decoder. Faithful to the original BeautifulSoup logic (redirect unwrapping, dedup, "more info"/`duckduckgo.com` filtering, snippet association, challenge/HTTP-error classification).
- Registered as `search` in both `default_registry` and the production dispatcher (`build_tool_dispatcher_with_cwd_and_goal_store`) so the live model can actually call it. Read-only ⇒ `parallel_safe = true`. A dispatcher membership test guards against the tool silently dropping out of the production tool set.

Tests
New `search_tests.rs` (deterministic, no network) covers: URL unwrapping (redirect/protocol-relative/ads/unsafe-scheme/`+`/`%XX` decoding), HTML parsing (dedup, skip-"more info", skip-`duckduckgo.com`, no-snippet, inline-markup stripping, whitespace collapsing, entity decoding), response classification (200/202/anomaly/4xx/5xx + the 399/400 boundary), output formatting, and full `run`/orchestrator/registry/dispatcher wiring. Registry/dispatcher tests updated for the new tool.

Verification
- `cargo fmt --check` ✓
- `cargo clippy`: no new warnings from this change ✓
- `cargo test`: all new + existing tests pass. (Two pre-existing `shell_tests` PTY tests fail identically on a clean `main`; unrelated to this change.)
- `uv run pytest`: 34 passed, 1 skipped ✓

Note
This PR was developed with the help of a multi-agent adversarial review of the diff. Its main finding, that the production dispatcher is built manually and does not go through `default_registry`, is addressed here (the tool is registered in both, with a CI guard).

🤖 Generated with Claude Code
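As an illustration of the URL unwrapping and hand-rolled percent-decoding mentioned under Implementation and Tests, here is a minimal sketch. It assumes DuckDuckGo Lite result links carry the destination in a `uddg` query parameter (for example `//duckduckgo.com/l/?uddg=...`) and that HTML entities in the `href` have already been decoded; the function names `percent_decode` and `unwrap_redirect` are hypothetical, and the PR's handling of ads and other edge cases is not reproduced.

```rust
/// Map an ASCII hex digit to its value.
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Decode `%XX` escapes and `+` (as space) in a query-string value.
/// Malformed escapes are passed through unchanged.
pub fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(if bytes[i] == b'+' { b' ' } else { bytes[i] });
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Resolve a result `href` to its destination, or `None` for unsafe schemes.
pub fn unwrap_redirect(href: &str) -> Option<String> {
    // Protocol-relative links get an explicit scheme first.
    let absolute = if href.starts_with("//") {
        format!("https:{href}")
    } else {
        href.to_string()
    };
    // Prefer the decoded `uddg` value when the link is a redirect wrapper.
    let target = match absolute.split_once('?') {
        Some((_, query)) => query
            .split('&')
            .find_map(|pair| pair.strip_prefix("uddg="))
            .map(percent_decode)
            .unwrap_or_else(|| absolute.clone()),
        None => absolute.clone(),
    };
    // Only http(s) destinations are considered safe to return.
    if target.starts_with("http://") || target.starts_with("https://") {
        Some(target)
    } else {
        None
    }
}
```

Decoding happens in a single pass, so an escaped `%2B` in the destination becomes a literal `+` rather than a space.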
Summary by cubic
Adds a locally executed DuckDuckGo `search` tool that fetches `lite.duckduckgo.com/lite/`, parses results client-side, and returns compact text results; complements hosted `web_search` so search works without provider support.

New Features
- `search` tool registered in `default_registry` and the production dispatcher; runs serial (not parallel-safe) to avoid DuckDuckGo Lite rate-limit blocks.
- `SearchBackend` seam with `HttpSearchBackend`; parsing uses targeted regex plus small percent/entity decoders; deterministic tests for URL unwrapping, HTML parsing, response classification, formatting, and registry/dispatcher wiring; includes an ignored live DuckDuckGo smoke test.

Dependencies
- `reqwest`.

Written for commit af4111c. Summary will update on new commits.