Target Component
External Integrations (LLM/Search APIs)
Enhancement Description
PentAGI currently has a browser tool that uses the external scraper service to fetch:
- Markdown content
- HTML content
- Links
- Screenshots
This is useful for basic extraction, but it is still mostly a request/response scraper flow. For modern web apps, the agent may need a more interactive browser: click buttons, fill forms, handle login pages, wait for JavaScript, inspect dynamic UI, and extract data after navigation.
Can we add optional BrowserOS MCP integration so PentAGI agents can control a real browser through MCP?
Current behavior
From reviewing the source code and docs, it looks like the current browser tool:
- lives in backend/pkg/tools/browser.go
- uses the configured scraper URLs
- calls scraper endpoints such as /markdown, /html, /links, and /screenshot
- captures screenshots in parallel with content extraction
- supports public/private scraper routing
- returns browser errors as text instead of hard-failing the agent chain
This works for simple pages, but it is limited when the task needs real browser interaction. 🥹
Proposed feature
Add BrowserOS as an optional MCP browser backend. 💯
When enabled, PentAGI agents should be able to use BrowserOS MCP tools for actions like:
- open a URL
- click elements
- type into inputs
- submit forms
- wait for page changes
- extract selected page content
- capture screenshots
- inspect visible UI state
- navigate multi-step websites
Why BrowserOS?
BrowserOS is an open-source AI browser with a built-in MCP server. It allows external AI clients to control the browser using MCP.
This could make PentAGI browsing more capable and more reliable on dynamic sites than the current scraper-only approach.
Suggested config
```bash
BROWSER_BACKEND=scraper
# options: scraper, browseros_mcp, hybrid
BROWSEROS_MCP_ENABLED=true
BROWSEROS_MCP_TRANSPORT=http
BROWSEROS_MCP_URL=http://host.docker.internal:PORT
BROWSEROS_MCP_REQUIRE_APPROVAL=false
```
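A minimal sketch of how these settings could be loaded and validated, assuming the variable names suggested above. The BrowserConfig type and LoadBrowserConfig function are hypothetical. Taking getenv as a parameter keeps it testable; in a deployment you would pass os.Getenv. The defaults keep today's scraper behavior, and the MCP-backed modes are rejected unless BrowserOS is explicitly enabled with a parseable URL (so the PORT placeholder above must be replaced with a real port).

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// BrowserConfig holds the proposed settings (hypothetical type; env var names from the issue).
type BrowserConfig struct {
	Backend         string // "scraper", "browseros_mcp" or "hybrid"
	MCPEnabled      bool
	MCPTransport    string
	MCPURL          string
	RequireApproval bool
}

// LoadBrowserConfig reads settings through getenv and rejects unusable combinations.
// Unset values fall back to the current behavior: scraper only, MCP disabled.
func LoadBrowserConfig(getenv func(string) string) (BrowserConfig, error) {
	cfg := BrowserConfig{Backend: "scraper", MCPTransport: "http"}
	if v := getenv("BROWSER_BACKEND"); v != "" {
		cfg.Backend = v
	}
	switch cfg.Backend {
	case "scraper", "browseros_mcp", "hybrid":
	default:
		return cfg, fmt.Errorf("unknown BROWSER_BACKEND %q", cfg.Backend)
	}

	var err error
	if v := getenv("BROWSEROS_MCP_ENABLED"); v != "" {
		if cfg.MCPEnabled, err = strconv.ParseBool(v); err != nil {
			return cfg, fmt.Errorf("BROWSEROS_MCP_ENABLED: %w", err)
		}
	}
	if v := getenv("BROWSEROS_MCP_REQUIRE_APPROVAL"); v != "" {
		if cfg.RequireApproval, err = strconv.ParseBool(v); err != nil {
			return cfg, fmt.Errorf("BROWSEROS_MCP_REQUIRE_APPROVAL: %w", err)
		}
	}
	if v := getenv("BROWSEROS_MCP_TRANSPORT"); v != "" {
		cfg.MCPTransport = v
	}
	cfg.MCPURL = getenv("BROWSEROS_MCP_URL")

	// Modes that talk to BrowserOS need the integration switched on and a usable URL.
	if cfg.Backend != "scraper" {
		if !cfg.MCPEnabled {
			return cfg, fmt.Errorf("BROWSER_BACKEND=%s requires BROWSEROS_MCP_ENABLED=true", cfg.Backend)
		}
		u, err := url.Parse(cfg.MCPURL)
		if err != nil || u.Scheme == "" || u.Host == "" {
			return cfg, fmt.Errorf("invalid BROWSEROS_MCP_URL %q", cfg.MCPURL)
		}
	}
	return cfg, nil
}
```

Failing fast at startup like this means a misconfigured hybrid or MCP deployment is caught before any agent flow tries to browse.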
Suggested modes
1. Scraper mode
Current behavior.
Use existing scraper endpoints:
- /markdown
- /html
- /links
- /screenshot
2. BrowserOS MCP mode 🥇
Use BrowserOS MCP as the main browser tool.
The agent can perform interactive browsing actions instead of only extracting static content.
3. Hybrid mode
Use the current scraper first for simple extraction.
If the scraper fails or the task needs interaction, fall back to BrowserOS MCP.
Example (just my idea):
browser markdown extraction failed
→ try BrowserOS MCP
→ open URL
→ wait for page load
→ extract visible text
→ capture screenshot
→ return result to agent
Example agent flow
Current flow:
Agent asks browser tool for URL markdown
PentAGI calls scraper /markdown
PentAGI returns extracted text
New BrowserOS MCP flow:
Agent asks browser tool to inspect target
PentAGI sends MCP call to BrowserOS
BrowserOS opens the page
Agent clicks / types / navigates if needed
BrowserOS returns page text, DOM info, or screenshot
PentAGI stores the result in the flow
Possible implementation idea
Add a browser backend interface:
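```go
package tools

import "context"

// BrowserBackend abstracts the browser tool's data source (scraper or BrowserOS MCP).
type BrowserBackend interface {
	// One-shot fetches mirroring the scraper endpoints; the two strings are
	// assumed to be (content, screenshot), as the scraper captures both.
	Markdown(ctx context.Context, url string) (string, string, error)
	HTML(ctx context.Context, url string) (string, string, error)
	Links(ctx context.Context, url string) (string, string, error)
	// Interactive actions operate on the current page; a scraper-only backend
	// would return an error for these.
	Navigate(ctx context.Context, url string) error
	Click(ctx context.Context, selector string) error
	Type(ctx context.Context, selector, text string) error
	Screenshot(ctx context.Context) (string, error)
	Extract(ctx context.Context) (string, error)
}
```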
Then keep the existing scraper implementation as one backend and add BrowserOS MCP as another backend.
Benefits 👍🏻
- Better support for JavaScript-heavy websites
- Better handling of login pages and forms
- More natural browser automation for agents
- Less need for brittle HTML scraping
- Easier screenshots and visual inspection
- Reuses the growing MCP ecosystem
- Fits well with the existing MCP client proposal
Safety considerations
Because PentAGI is a pentesting tool, BrowserOS MCP should be controlled safely:
- disabled by default
- configurable per deployment
- log every browser action
- optionally require approval for form submission or credential use
- keep browser sessions isolated per flow
- avoid leaking cookies/session data between flows
- allow admins to restrict BrowserOS to approved targets only (required, not optional)
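A sketch of how the target restriction, optional approval, and action logging could be enforced in one place before each BrowserOS call. Guard, Action and the Approve/Log callbacks are hypothetical; the allowlist matches exact hosts, or any subdomain when the entry starts with a dot. Only form submission is gated on approval here; credential use could be gated the same way. In PentAGI the Log callback would presumably write to the flow logs.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Action is a hypothetical label for a browser action requested by an agent.
type Action string

const (
	ActionNavigate Action = "navigate"
	ActionClick    Action = "click"
	ActionType     Action = "type"
	ActionSubmit   Action = "submit"
)

// Guard is checked before every BrowserOS MCP call made on behalf of a flow.
type Guard struct {
	AllowedHosts    []string // "target.lab" matches exactly; ".corp.test" matches subdomains
	RequireApproval bool     // e.g. from BROWSEROS_MCP_REQUIRE_APPROVAL
	Approve         func(action Action, pageURL string) bool
	Log             func(msg string)
}

func (g Guard) logf(format string, args ...any) {
	if g.Log != nil {
		g.Log(fmt.Sprintf(format, args...))
	}
}

func (g Guard) hostAllowed(host string) bool {
	host = strings.ToLower(host)
	for _, allowed := range g.AllowedHosts {
		allowed = strings.ToLower(allowed)
		if host == allowed || (strings.HasPrefix(allowed, ".") && strings.HasSuffix(host, allowed)) {
			return true
		}
	}
	return false
}

// Check allows or denies an action on the page at pageURL and logs the decision either way.
func (g Guard) Check(action Action, pageURL string) error {
	u, err := url.Parse(pageURL)
	if err != nil || u.Hostname() == "" {
		g.logf("deny %s %q: unparsable URL", action, pageURL)
		return fmt.Errorf("invalid target URL %q", pageURL)
	}
	if !g.hostAllowed(u.Hostname()) {
		g.logf("deny %s %s: host not approved", action, pageURL)
		return fmt.Errorf("host %q is not an approved target", u.Hostname())
	}
	if g.RequireApproval && action == ActionSubmit {
		if g.Approve == nil || !g.Approve(action, pageURL) {
			g.logf("deny %s %s: approval not granted", action, pageURL)
			return fmt.Errorf("%s on %s requires approval", action, pageURL)
		}
	}
	g.logf("allow %s %s", action, pageURL)
	return nil
}
```

Checking the page URL (not just the initial target) before every action also catches redirects or in-page navigation to hosts outside the approved scope.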
Acceptance criteria
- PentAGI can connect to BrowserOS MCP as an optional browser backend.
- Agents can use BrowserOS for interactive browser actions.
- Existing scraper behavior still works.
- Hybrid fallback from scraper to BrowserOS is supported.
- Screenshots and extracted content are stored in the normal PentAGI flow logs.
- Browser actions are visible in logs/observability.