[New Feature]: Add BrowserOS MCP support for better agent-controlled browsing #342

@Akalanka1337

Target Component

External Integrations (LLM/Search APIs)

Enhancement Description

PentAGI currently has a browser tool that uses the external scraper service to fetch:

  • Markdown content
  • HTML content
  • Links
  • Screenshots

This is useful for basic extraction, but it is still mostly a request/response scraper flow. For modern web apps, the agent may need a more interactive browser: click buttons, fill forms, handle login pages, wait for JavaScript, inspect dynamic UI, and extract data after navigation.

Can we add optional BrowserOS MCP integration so PentAGI agents can control a real browser through MCP?

Current behavior

From reviewing the source code and docs, the current browser tool appears to work as follows:

  • lives in backend/pkg/tools/browser.go
  • uses configured scraper URLs
  • calls scraper endpoints like /markdown, /html, /links, and /screenshot
  • captures screenshots in parallel with content extraction
  • supports public/private scraper routing
  • returns browser errors as text instead of hard-failing the agent chain

This works for simple pages, but it is limited when the task needs real browser interaction. 🥹

Proposed feature

Add BrowserOS as an optional MCP browser backend. 💯

When enabled, PentAGI agents should be able to use BrowserOS MCP tools for actions like:

  • open a URL
  • click elements
  • type into inputs
  • submit forms
  • wait for page changes
  • extract selected page content
  • capture screenshots
  • inspect visible UI state
  • navigate multi-step websites

Why BrowserOS?

BrowserOS is an open-source AI browser with a built-in MCP server. It allows external AI clients to control the browser using MCP.

This could make PentAGI browsing more reliable on interactive, JavaScript-driven pages than the scraper-only approach.

Suggested config

BROWSER_BACKEND=scraper
# options: scraper, browseros_mcp, hybrid

BROWSEROS_MCP_ENABLED=true
BROWSEROS_MCP_TRANSPORT=http
BROWSEROS_MCP_URL=http://host.docker.internal:PORT
BROWSEROS_MCP_REQUIRE_APPROVAL=false
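
A minimal sketch of how these variables could be loaded and validated. The `BrowserConfig` struct and `loadBrowserConfig` function are hypothetical names, and the rule that non-scraper backends need `BROWSEROS_MCP_ENABLED=true` plus a valid URL is my assumption about sensible behavior, not something PentAGI does today.

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strconv"
)

// BrowserConfig mirrors the suggested variables; the struct is hypothetical.
type BrowserConfig struct {
	Backend         string // "scraper", "browseros_mcp", or "hybrid"
	MCPEnabled      bool
	MCPTransport    string
	MCPURL          string
	RequireApproval bool
}

// loadBrowserConfig reads the proposed variables through getenv and
// validates them. Unset values fall back to scraper-only defaults.
func loadBrowserConfig(getenv func(string) string) (BrowserConfig, error) {
	cfg := BrowserConfig{Backend: "scraper", MCPTransport: "http"}
	if v := getenv("BROWSER_BACKEND"); v != "" {
		cfg.Backend = v
	}
	switch cfg.Backend {
	case "scraper", "browseros_mcp", "hybrid":
	default:
		return cfg, fmt.Errorf("unknown BROWSER_BACKEND %q", cfg.Backend)
	}
	var err error
	if v := getenv("BROWSEROS_MCP_ENABLED"); v != "" {
		if cfg.MCPEnabled, err = strconv.ParseBool(v); err != nil {
			return cfg, fmt.Errorf("BROWSEROS_MCP_ENABLED: %w", err)
		}
	}
	if v := getenv("BROWSEROS_MCP_REQUIRE_APPROVAL"); v != "" {
		if cfg.RequireApproval, err = strconv.ParseBool(v); err != nil {
			return cfg, fmt.Errorf("BROWSEROS_MCP_REQUIRE_APPROVAL: %w", err)
		}
	}
	if v := getenv("BROWSEROS_MCP_TRANSPORT"); v != "" {
		cfg.MCPTransport = v
	}
	cfg.MCPURL = getenv("BROWSEROS_MCP_URL")
	if cfg.Backend != "scraper" {
		// Assumed rule: MCP-backed modes need the feature switched on
		// and a parseable endpoint URL.
		if !cfg.MCPEnabled {
			return cfg, fmt.Errorf("backend %q requires BROWSEROS_MCP_ENABLED=true", cfg.Backend)
		}
		if u, perr := url.Parse(cfg.MCPURL); perr != nil || u.Host == "" {
			return cfg, fmt.Errorf("BROWSEROS_MCP_URL %q is not a valid URL", cfg.MCPURL)
		}
	}
	return cfg, nil
}

func main() {
	cfg, err := loadBrowserConfig(os.Getenv)
	fmt.Println(cfg, err)
}
```

Passing `getenv` as a parameter keeps the loader testable without touching the process environment.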

Suggested modes

1. Scraper mode

Current behavior.

Use existing scraper endpoints:

  • /markdown
  • /html
  • /links
  • /screenshot

2. BrowserOS MCP mode 🥇

Use BrowserOS MCP as the main browser tool.

The agent can perform interactive browsing actions instead of only extracting static content.

3. Hybrid mode

Use the current scraper first for simple extraction.

If the scraper fails or the task needs interaction, fall back to BrowserOS MCP.

Example (Just my idea):

browser markdown extraction failed
→ try BrowserOS MCP
→ open URL
→ wait for page load
→ extract visible text
→ capture screenshot
→ return result to agent

Example agent flow

Current flow:

Agent asks browser tool for URL markdown
PentAGI calls scraper /markdown
PentAGI returns extracted text

New BrowserOS MCP flow:

Agent asks browser tool to inspect target
PentAGI sends MCP call to BrowserOS
BrowserOS opens the page
Agent clicks / types / navigates if needed
BrowserOS returns page text, DOM info, or screenshot
PentAGI stores the result in the flow
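
For the "PentAGI sends MCP call to BrowserOS" step, MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. A small sketch of building one follows. The tool name `navigate` and its `url` argument are hypothetical; the real names would have to come from BrowserOS's `tools/list` response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a JSON-RPC 2.0 request, the envelope MCP uses.
type rpcRequest struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// callParams is the parameter object of the MCP "tools/call" method.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments,omitempty"`
}

// newToolCall builds a "tools/call" request body for the given tool.
func newToolCall(id int, tool string, args map[string]any) ([]byte, error) {
	return json.Marshal(rpcRequest{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params:  callParams{Name: tool, Arguments: args},
	})
}

func main() {
	// "navigate" and "url" are placeholders, not confirmed BrowserOS names.
	body, err := newToolCall(1, "navigate", map[string]any{"url": "http://example.test"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```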

Possible implementation idea

Add a browser backend interface:

// BrowserBackend abstracts over the scraper and BrowserOS MCP backends.
type BrowserBackend interface {
    // Static extraction, matching what the scraper offers today.
    Markdown(ctx context.Context, url string) (string, string, error)
    HTML(ctx context.Context, url string) (string, string, error)
    Links(ctx context.Context, url string) (string, string, error)

    // Interactive actions, new with BrowserOS MCP.
    Navigate(ctx context.Context, url string) error
    Click(ctx context.Context, selector string) error
    Type(ctx context.Context, selector, text string) error
    Screenshot(ctx context.Context) (string, error)
    Extract(ctx context.Context) (string, error)
}

Then keep the existing scraper implementation as one backend and add BrowserOS MCP as another backend.

Benefits 👍🏻

  • Better support for JavaScript-heavy websites
  • Better handling of login pages and forms
  • More natural browser automation for agents
  • Less need for brittle HTML scraping
  • Easier screenshots and visual inspection
  • Reuses the growing MCP ecosystem
  • Fits well with the existing MCP client proposal

Safety considerations

Because PentAGI is a pentesting tool, BrowserOS MCP should be controlled safely:

  • disabled by default
  • configurable per deployment
  • log every browser action
  • optionally require approval for form submission or credential use
  • keep browser sessions isolated per flow
  • avoid leaking cookies/session data between flows
  • allow admins to restrict BrowserOS to approved targets only (this one should be mandatory)
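
A rough sketch of the target restriction and action logging from the list above. `targetAllowed` and `guardNavigate` are hypothetical helpers, and matching by exact host or subdomain is one assumed policy; a real deployment might also want CIDR ranges or port rules.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
	"strings"
)

// targetAllowed reports whether rawURL is an http(s) URL whose host equals
// an approved host or is a subdomain of one.
func targetAllowed(rawURL string, approved []string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return false
	}
	host := strings.ToLower(u.Hostname())
	if host == "" {
		return false
	}
	for _, a := range approved {
		a = strings.ToLower(a)
		if host == a || strings.HasSuffix(host, "."+a) {
			return true
		}
	}
	return false
}

// guardNavigate logs the attempted action for the given flow and rejects
// targets that are not on the approved list.
func guardNavigate(flowID, rawURL string, approved []string) error {
	if !targetAllowed(rawURL, approved) {
		log.Printf("flow=%s action=navigate url=%q decision=deny", flowID, rawURL)
		return fmt.Errorf("target %q is not on the approved list", rawURL)
	}
	log.Printf("flow=%s action=navigate url=%q decision=allow", flowID, rawURL)
	return nil
}

func main() {
	approved := []string{"example.test"}
	fmt.Println(guardNavigate("flow-1", "https://app.example.test/login", approved))
	fmt.Println(guardNavigate("flow-1", "https://example.test.evil.io/", approved))
}
```

The suffix check includes the leading dot so that a host like `example.test.evil.io` or `evilexample.test` does not match `example.test`.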

Acceptance criteria

  • PentAGI can connect to BrowserOS MCP as an optional browser backend.
  • Agents can use BrowserOS for interactive browser actions.
  • Existing scraper behavior still works.
  • Hybrid fallback from scraper to BrowserOS is supported.
  • Screenshots and extracted content are stored in the normal PentAGI flow logs.
  • Browser actions are visible in logs/observability.

Verification

  • I have checked that this enhancement hasn't been already proposed
  • This enhancement aligns with PentAGI's goal of autonomous penetration testing
  • I have considered the security implications of this enhancement
  • I have provided clear use cases and benefits
