309 changes: 290 additions & 19 deletions src/content/open-source/usage.mdx
@@ -7,7 +7,7 @@ description: You can dump an URL manually or start a CDP server.

Use `./lightpanda help` for all options.

## Dump an URL
## Fetch a webpage

```sh copy
./lightpanda fetch --obey-robots --dump html https://demo-browser.lightpanda.io/campfire-commerce/
@@ -38,17 +38,98 @@ INFO http : request complete . . . . . . . . . . . . . . . . [+234ms]

### Options

The fetch command accepts options:
* `--dump html` Dumps document to stdout in HTML. You can use also `--dump markdown` to get a Markdown version.
* `--with-base` Add a `<base>` tag in dump
* `--log-level` change the log level, default is `info`. `--log-level debug`.
* `--http-proxy` The HTTP proxy to use for all HTTP requests. A username:password can be included for basic authentication. `--http-proxy http://user:password@127.0.0.1:3000`.
* `--http-timeout` The maximum time, in milliseconds, the transfer is allowed to complete. 0 means it never times out. Defaults to `10000`.
* `--obey-robots` Fetches and obeys the robots.txt (if available) of the web pages we make requests towards.
### `fetch` command options

```bash
--dump Dumps document to stdout.
Argument must be 'html', 'markdown', 'semantic_tree', or 'semantic_tree_text'.
Defaults to no dump.

--strip-mode Comma-separated list of tag groups to remove from
the dump, e.g. --strip-mode js,css
- "js" script and link[as=script, rel=preload]
- "ui" includes img, picture, video, css and svg
- "css" includes style and link[rel=stylesheet]
- "full" includes js, ui and css

--with-base Add a <base> tag in dump. Defaults to false.

--with-frames Includes the contents of iframes. Defaults to false.

--wait-ms Wait time in milliseconds.
Defaults to 5000.

--wait-until Wait until the specified event.
Supported events: load, domcontentloaded, networkidle, done.
Defaults to 'done'.

--insecure-disable-tls-host-verification
Disables host verification on all HTTP requests. This is an
advanced option which should only be set if you understand
and accept the risk of disabling host verification.

--obey-robots
Fetches and obeys the robots.txt (if available) of the web pages
we make requests towards.
Defaults to false.

--http-proxy The HTTP proxy to use for all HTTP requests.
A username:password can be included for basic authentication.
Defaults to none.

--proxy-bearer-token
The <token> to send for bearer authentication with the proxy
Proxy-Authorization: Bearer <token>

--http-max-concurrent
The maximum number of concurrent HTTP requests.
Defaults to 10.

--http-max-host-open
The maximum number of open connections to a given host:port.
Defaults to 4.

--http-connect-timeout
The time, in milliseconds, for establishing an HTTP connection
before timing out. 0 means it never times out.
Defaults to 0.

--http-timeout
The maximum time, in milliseconds, the transfer is allowed
to complete. 0 means it never times out.
Defaults to 10000.

--http-max-response-size
Limits the acceptable response size for any request
(e.g. XHR, fetch, script loading, ...).
Defaults to no limit.

--log-level The log level: debug, info, warn, error or fatal.
Defaults to warn.

--log-format The log format: pretty or logfmt.
Defaults to logfmt.

--log-filter-scopes
Filter out too verbose logs per scope:
http, unknown_prop, event, ...

--user-agent-suffix
Suffix to append to the Lightpanda/X.Y User-Agent

--web-bot-auth-key-file
Path to the Ed25519 private key PEM file.

--web-bot-auth-keyid
The JWK thumbprint of your public key.

--web-bot-auth-domain
Your domain e.g. yourdomain.com
```
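As a rough illustration of how these options combine, the sketch below assembles a single `fetch` invocation from flags listed above and runs it only if the binary is present. The `LIGHTPANDA`, `URL`, `OUT`, and `FETCH_CMD` variable names, the output file name, and the specific flag values are assumptions chosen for the example, not recommended settings.

```sh
#!/bin/sh
# Sketch: combine several documented fetch flags into one invocation.
# LIGHTPANDA, URL, OUT and FETCH_CMD are illustrative names, not part of the CLI.
LIGHTPANDA="${LIGHTPANDA:-./lightpanda}"
URL="https://demo-browser.lightpanda.io/campfire-commerce/"
OUT="page.html"

# Build the argument list once so it can be both printed and executed.
set -- fetch \
  --obey-robots \
  --dump html \
  --strip-mode js,css \
  --wait-until networkidle \
  --http-timeout 5000 \
  "$URL"

FETCH_CMD="$LIGHTPANDA $*"
echo "$FETCH_CMD"

# Run only when the binary is actually present and executable.
if [ -x "$LIGHTPANDA" ]; then
  "$LIGHTPANDA" "$@" > "$OUT"
fi
```

Printing the command before running it makes it easy to check the flags, and the dump goes to stdout, so redirecting it to a file keeps it separate from the logs.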

See also [how to configure proxy](/open-source/guides/configure-a-proxy).

## Start a CDP server
## CDP server

To control Lightpanda with [Chrome DevTools Protocol](https://chromedevtools.github.io/devtools-protocol/) (CDP) clients like [Playwright](https://playwright.dev/) or [Puppeteer](https://pptr.dev/), you
need to start the browser as a CDP server.
@@ -60,22 +141,100 @@ need to start the browser as a CDP server.
INFO app : server running . . . . . . . . . . . . . . . . . [+0ms]
address = 127.0.0.1:9222
```
### Options
### `serve` command options

```bash
--host Host of the CDP server
Defaults to "127.0.0.1"

--port Port of the CDP server
Defaults to 9222

--advertise-host
The host to advertise, e.g. in the /json/version response.
Useful, for example, when --host is 0.0.0.0.
Defaults to --host value

--timeout Inactivity timeout in seconds before disconnecting clients
Defaults to 10 (seconds). Limited to 604800 (1 week).

--cdp-max-connections
Maximum number of simultaneous CDP connections.
Defaults to 16.

--cdp-max-pending-connections
Maximum pending connections in the accept queue.
Defaults to 128.

--insecure-disable-tls-host-verification
Disables host verification on all HTTP requests. This is an
advanced option which should only be set if you understand
and accept the risk of disabling host verification.

--obey-robots
Fetches and obeys the robots.txt (if available) of the web pages
we make requests towards.
Defaults to false.

--http-proxy The HTTP proxy to use for all HTTP requests.
A username:password can be included for basic authentication.
Defaults to none.

--proxy-bearer-token
The <token> to send for bearer authentication with the proxy
Proxy-Authorization: Bearer <token>

--http-max-concurrent
The maximum number of concurrent HTTP requests.
Defaults to 10.

--http-max-host-open
The maximum number of open connections to a given host:port.
Defaults to 4.

--http-connect-timeout
The time, in milliseconds, for establishing an HTTP connection
before timing out. 0 means it never times out.
Defaults to 0.

--http-timeout
The maximum time, in milliseconds, the transfer is allowed
to complete. 0 means it never times out.
Defaults to 10000.

--http-max-response-size
Limits the acceptable response size for any request
(e.g. XHR, fetch, script loading, ...).
Defaults to no limit.

--log-level The log level: debug, info, warn, error or fatal.
Defaults to warn.

The fetch command accepts options:
* `--host` Host of the CDP server, default `127.0.0.1`.
* `--port` Port of the CDP server, default `9222`.
* `--timeout` Inactivity timeout in seconds before disconnecting clients. Default `10` seconds.
* `--log-level` change the log level, default is `info`. `--log-level debug`.
* `--http-proxy` The HTTP proxy to use for all HTTP requests. A username:password can be included for basic authentication. `--http-proxy http://user:password@127.0.0.1:3000`.
* `--http-timeout` The maximum time, in milliseconds, the transfer is allowed to complete. 0 means it never times out. Defaults to `10000`.
* `--obey-robots` Fetches and obeys the robots.txt (if available) of the web pages we make requests towards.
--log-format The log format: pretty or logfmt.
Defaults to logfmt.

--log-filter-scopes
Filter out too verbose logs per scope:
http, unknown_prop, event, ...

--user-agent-suffix
Suffix to append to the Lightpanda/X.Y User-Agent

--web-bot-auth-key-file
Path to the Ed25519 private key PEM file.

--web-bot-auth-keyid
The JWK thumbprint of your public key.

--web-bot-auth-domain
Your domain e.g. yourdomain.com
```
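One way these options might fit together when the server must be reachable from other machines is to bind to all interfaces with `--host` and set `--advertise-host` to the name clients should use. The sketch below only prints the command and the WebSocket endpoint a client would presumably connect to. `browser.internal` is a hypothetical hostname, the timeout value is arbitrary, and the `ws://host:port` endpoint form is an assumption rather than something stated in this section.

```sh
#!/bin/sh
# Sketch: derive a serve command and a client endpoint from the same settings.
# All variable names and values here are illustrative assumptions.
LIGHTPANDA="${LIGHTPANDA:-./lightpanda}"
HOST="0.0.0.0"                     # listen on all interfaces
ADVERTISE_HOST="browser.internal"  # hypothetical name clients resolve
PORT=9222                          # documented default port
TIMEOUT=60                         # seconds of inactivity before disconnect

set -- serve \
  --host "$HOST" \
  --advertise-host "$ADVERTISE_HOST" \
  --port "$PORT" \
  --timeout "$TIMEOUT"

SERVE_CMD="$LIGHTPANDA $*"
# Assumed endpoint form for CDP clients such as Puppeteer or Playwright.
WS_ENDPOINT="ws://$ADVERTISE_HOST:$PORT"

echo "start with: $SERVE_CMD"
echo "connect to: $WS_ENDPOINT"
```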

See also [how to configure proxy](/open-source/guides/configure-a-proxy).

### Connect with Puppeteer

Once the CDP server started, you can run a [Puppeteer](https://pptr.dev/)
Once the CDP server has started, you can run a [Puppeteer](https://pptr.dev/)
script by configuring the `browserWSEndpoint`.

```js copy
@@ -169,3 +328,115 @@ func main() {
log.Println("Got title of:", title)
}
```

## MCP server

Start an MCP (Model Context Protocol) server over stdio:

```sh copy
./lightpanda mcp
```
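Because the server speaks over stdio, a client writes JSON-RPC 2.0 messages to the process's standard input, one per line, and reads responses from its standard output. The sketch below prints a minimal message sequence following the general MCP specification: an `initialize` request, the `notifications/initialized` notification, and a `tools/call` request for the `markdown` tool. The protocol version string, the client name, and the `url` argument name are assumptions; the server's `tools/list` response is the place to check the actual input schema.

```sh
#!/bin/sh
# Sketch: newline-delimited JSON-RPC messages for an MCP stdio session.
mcp_messages() {
  # 1. Handshake request; protocolVersion and clientInfo are assumed values.
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"example-client","version":"0.0.1"}}}'
  # 2. Notification (no id) telling the server the client is ready.
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  # 3. Tool call; the "url" argument name is an assumption.
  printf '%s\n' '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"markdown","arguments":{"url":"https://demo-browser.lightpanda.io/campfire-commerce/"}}}'
}

MESSAGE_COUNT=$(mcp_messages | wc -l | tr -d ' ')
echo "prepared $MESSAGE_COUNT messages"

# Smoke test only: send everything at once if the binary is available.
LIGHTPANDA="${LIGHTPANDA:-./lightpanda}"
if [ -x "$LIGHTPANDA" ]; then
  mcp_messages | "$LIGHTPANDA" mcp
fi
```

A real client keeps stdin open and waits for each response before sending the next request. Piping all messages at once is only a quick check, and the server may stop when stdin closes.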

### Tools

| Name | Description |
|-|-|
| goto | Navigate to a specified URL and load the page in memory so it can be reused later for info extraction |
| markdown | Get the page content in markdown format. If a url is provided, it navigates to that url first. |
| links | Extract all links in the opened page. If a url is provided, it navigates to that url first. |
| evaluate | Evaluate JavaScript in the current page context. If a url is provided, it navigates to that url first. |
| semantic_tree | Get the page content as a simplified semantic DOM tree for AI reasoning. If a url is provided, it navigates to that url first. |
| interactiveElements | Extract interactive elements from the opened page. If a url is provided, it navigates to that url first. |
| structuredData | Extract structured data (like JSON-LD, OpenGraph, etc) from the opened page. If a url is provided, it navigates to that url first. |
| detectForms | Detect all forms on the page and return their structure including fields, types, and required status. If a url is provided, it navigates to that url first. |
| click | Click on an interactive element. Returns the current page URL and title after the click. |
| fill | Fill text into an input element. Returns the filled value and current page URL and title. |
| scroll | Scroll the page or a specific element. Returns the scroll position and current page URL and title. |
| waitForSelector | Wait for an element matching a CSS selector to appear in the page. Returns the backend node ID of the matched element. |

### Options

```bash
--insecure-disable-tls-host-verification
Disables host verification on all HTTP requests. This is an
advanced option which should only be set if you understand
and accept the risk of disabling host verification.

--obey-robots
Fetches and obeys the robots.txt (if available) of the web pages
we make requests towards.
Defaults to false.

--http-proxy The HTTP proxy to use for all HTTP requests.
A username:password can be included for basic authentication.
Defaults to none.

--proxy-bearer-token
The <token> to send for bearer authentication with the proxy
Proxy-Authorization: Bearer <token>

--http-max-concurrent
The maximum number of concurrent HTTP requests.
Defaults to 10.

--http-max-host-open
The maximum number of open connections to a given host:port.
Defaults to 4.

--http-connect-timeout
The time, in milliseconds, for establishing an HTTP connection
before timing out. 0 means it never times out.
Defaults to 0.

--http-timeout
The maximum time, in milliseconds, the transfer is allowed
to complete. 0 means it never times out.
Defaults to 10000.

--http-max-response-size
Limits the acceptable response size for any request
(e.g. XHR, fetch, script loading, ...).
Defaults to no limit.

--log-level The log level: debug, info, warn, error or fatal.
Defaults to warn.

--log-format The log format: pretty or logfmt.
Defaults to logfmt.

--log-filter-scopes
Filter out too verbose logs per scope:
http, unknown_prop, event, ...

--user-agent-suffix
Suffix to append to the Lightpanda/X.Y User-Agent

--web-bot-auth-key-file
Path to the Ed25519 private key PEM file.

--web-bot-auth-keyid
The JWK thumbprint of your public key.

--web-bot-auth-domain
Your domain e.g. yourdomain.com
```

### Claude Desktop / Cursor / Windsurf

Add to your MCP host configuration:

- **Claude Desktop:** Settings > Developer > Edit Config
- **Cursor:** `.cursor/mcp.json` in your project
- **Windsurf:** Cascade MCP settings

```json copy
{
  "mcpServers": {
    "lightpanda": {
      "command": "/path/to/lightpanda",
      "args": ["mcp"]
    }
  }
}
```
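Options from the `mcp` Options list can presumably be passed through the `args` array. The sketch below writes such a configuration to a file. The `mcp.json` file name, the `CONFIG_FILE` variable, and the choice of `--obey-robots` and `--log-level error` are illustrative assumptions, and `/path/to/lightpanda` remains a placeholder to replace with the real binary path.

```sh
#!/bin/sh
# Sketch: generate an MCP host configuration that passes extra options.
# CONFIG_FILE is a hypothetical location; this overwrites any existing file there.
CONFIG_FILE="${CONFIG_FILE:-./mcp.json}"

# Quoted heredoc delimiter: the JSON is written verbatim, without expansion.
cat > "$CONFIG_FILE" <<'EOF'
{
  "mcpServers": {
    "lightpanda": {
      "command": "/path/to/lightpanda",
      "args": ["mcp", "--obey-robots", "--log-level", "error"]
    }
  }
}
EOF

echo "wrote $CONFIG_FILE"
```

Each flag and its value are separate array elements, since the host passes `args` to the process as individual arguments rather than through a shell.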