diff --git a/src/content/open-source/usage.mdx b/src/content/open-source/usage.mdx
index 237dcde..04359fc 100644
--- a/src/content/open-source/usage.mdx
+++ b/src/content/open-source/usage.mdx
@@ -7,7 +7,7 @@ description: You can dump an URL manually or start a CDP server.
 
 Use `./lightpanda help` for all options.
 
-## Dump an URL
+## Fetch a webpage
 
 ```sh copy
 ./lightpanda fetch --obey-robots --dump html https://demo-browser.lightpanda.io/campfire-commerce/
@@ -38,17 +38,98 @@ INFO http : request complete . . . . . . . . . . . . . . . . [+234ms]
 
 ### Options
 
-The fetch command accepts options:
-* `--dump html` Dumps document to stdout in HTML. You can use also `--dump markdown` to get a Markdown version.
-* `--with-base` Add a `<base>` tag in dump
-* `--log-level` change the log level, default is `info`. `--log-level debug`.
-* `--http-proxy` The HTTP proxy to use for all HTTP requests. A username:password can be included for basic authentication. `--http-proxy http://user:password@127.0.0.1:3000`.
-* `--http-timeout` The maximum time, in milliseconds, the transfer is allowedto complete. 0 means it never times out. Defaults to `10000`.
-* `--obey-robots` Fetches and obeys the robots.txt (if available) of the web pages we make requests towards.
+### `fetch` command options
+
+```bash
+--dump          Dumps document to stdout.
+                Argument must be 'html', 'markdown', 'semantic_tree', or 'semantic_tree_text'.
+                Defaults to no dump.
+
+--strip-mode    Comma separated list of tag groups to remove from
+                the dump. e.g. --strip-mode js,css
+                  - "js" script and link[as=script, rel=preload]
+                  - "ui" includes img, picture, video, css and svg
+                  - "css" includes style and link[rel=stylesheet]
+                  - "full" includes js, ui and css
+
+--with-base     Add a <base> tag in dump. Defaults to false.
+
+--with-frames   Includes the contents of iframes. Defaults to false.
+
+--wait-ms       Wait time in milliseconds.
+                Defaults to 5000.
+
+--wait-until    Wait until the specified event.
+                Supported events: load, domcontentloaded, networkidle, done.
+                Defaults to 'done'.
+
+--insecure-disable-tls-host-verification
+                Disables host verification on all HTTP requests. This is an
+                advanced option which should only be set if you understand
+                and accept the risk of disabling host verification.
+
+--obey-robots
+                Fetches and obeys the robots.txt (if available) of the web pages
+                we make requests towards.
+                Defaults to false.
+
+--http-proxy    The HTTP proxy to use for all HTTP requests.
+                A username:password can be included for basic authentication.
+                Defaults to none.
+
+--proxy-bearer-token
+                The <token> to send for bearer authentication with the proxy
+                Proxy-Authorization: Bearer <token>
+
+--http-max-concurrent
+                The maximum number of concurrent HTTP requests.
+                Defaults to 10.
+
+--http-max-host-open
+                The maximum number of open connections to a given host:port.
+                Defaults to 4.
+
+--http-connect-timeout
+                The time, in milliseconds, for establishing an HTTP connection
+                before timing out. 0 means it never times out.
+                Defaults to 0.
+
+--http-timeout
+                The maximum time, in milliseconds, the transfer is allowed
+                to complete. 0 means it never times out.
+                Defaults to 10000.
+
+--http-max-response-size
+                Limits the acceptable response size for any request
+                (e.g. XHR, fetch, script loading, ...).
+                Defaults to no limit.
+
+--log-level     The log level: debug, info, warn, error or fatal.
+                Defaults to warn.
+
+--log-format    The log format: pretty or logfmt.
+                Defaults to logfmt.
+
+--log-filter-scopes
+                Filter out too verbose logs per scope:
+                http, unknown_prop, event, ...
+
+--user-agent-suffix
+                Suffix to append to the Lightpanda/X.Y User-Agent
+
+--web-bot-auth-key-file
+                Path to the Ed25519 private key PEM file.
+
+--web-bot-auth-keyid
+                The JWK thumbprint of your public key.
+
+--web-bot-auth-domain
+                Your domain e.g. yourdomain.com
+```
 
 See also [how to configure proxy](/open-source/guides/configure-a-proxy).
 
-## Start a CDP server
+## CDP server
 
 To control Lightpanda with [Chrome Devtool Protocol](https://chromedevtools.github.io/devtools-protocol/) (CDP) clients like [Playwright](https://playwright.dev/) or [Puppeteer](https://pptr.dev/), you
 need to start the browser as a CDP server.
@@ -60,22 +141,100 @@ need to start the browser as a CDP server.
 INFO app : server running . . . . . . . . . . . . . . . . . [+0ms]
       address = 127.0.0.1:9222
 ```
-### Options
+### `serve` command options
+
+```bash
+--host          Host of the CDP server
+                Defaults to "127.0.0.1"
+
+--port          Port of the CDP server
+                Defaults to 9222
+
+--advertise-host
+                The host to advertise, e.g. in the /json/version response.
+                Useful, for example, when --host is 0.0.0.0.
+                Defaults to --host value
+
+--timeout       Inactivity timeout in seconds before disconnecting clients
+                Defaults to 10 (seconds). Limited to 604800 (1 week).
+
+--cdp-max-connections
+                Maximum number of simultaneous CDP connections.
+                Defaults to 16.
+
+--cdp-max-pending-connections
+                Maximum pending connections in the accept queue.
+                Defaults to 128.
+
+--insecure-disable-tls-host-verification
+                Disables host verification on all HTTP requests. This is an
+                advanced option which should only be set if you understand
+                and accept the risk of disabling host verification.
+
+--obey-robots
+                Fetches and obeys the robots.txt (if available) of the web pages
+                we make requests towards.
+                Defaults to false.
+
+--http-proxy    The HTTP proxy to use for all HTTP requests.
+                A username:password can be included for basic authentication.
+                Defaults to none.
+
+--proxy-bearer-token
+                The <token> to send for bearer authentication with the proxy
+                Proxy-Authorization: Bearer <token>
+
+--http-max-concurrent
+                The maximum number of concurrent HTTP requests.
+                Defaults to 10.
+
+--http-max-host-open
+                The maximum number of open connections to a given host:port.
+                Defaults to 4.
+
+--http-connect-timeout
+                The time, in milliseconds, for establishing an HTTP connection
+                before timing out. 0 means it never times out.
+                Defaults to 0.
+
+--http-timeout
+                The maximum time, in milliseconds, the transfer is allowed
+                to complete. 0 means it never times out.
+                Defaults to 10000.
+
+--http-max-response-size
+                Limits the acceptable response size for any request
+                (e.g. XHR, fetch, script loading, ...).
+                Defaults to no limit.
+
+--log-level     The log level: debug, info, warn, error or fatal.
+                Defaults to warn.
 
-The fetch command accepts options:
-* `--host` Host of the CDP server, default `127.0.0.1`.
-* `--port` Port of the CDP server, default `9222`.
-* `--timeout` Inactivity timeout in seconds before disconnecting clients. Default `10` seconds.
-* `--log-level` change the log level, default is `info`. `--log-level debug`.
-* `--http-proxy` The HTTP proxy to use for all HTTP requests. A username:password can be included for basic authentication. `--http-proxy http://user:password@127.0.0.1:3000`.
-* `--http-timeout` The maximum time, in milliseconds, the transfer is allowedto complete. 0 means it never times out. Defaults to `10000`.
-* `--obey-robots` Fetches and obeys the robots.txt (if available) of the web pages we make requests towards.
+--log-format    The log format: pretty or logfmt.
+                Defaults to logfmt.
+
+--log-filter-scopes
+                Filter out too verbose logs per scope:
+                http, unknown_prop, event, ...
+
+--user-agent-suffix
+                Suffix to append to the Lightpanda/X.Y User-Agent
+
+--web-bot-auth-key-file
+                Path to the Ed25519 private key PEM file.
+
+--web-bot-auth-keyid
+                The JWK thumbprint of your public key.
+
+--web-bot-auth-domain
+                Your domain e.g. yourdomain.com
+```
 
 See also [how to configure proxy](/open-source/guides/configure-a-proxy).
 
 ### Connect with Puppeteer
 
-Once the CDP server started, you can run a [Puppeteer](https://pptr.dev/)
+Once the CDP server has started, you can run a [Puppeteer](https://pptr.dev/)
 script by configuring the `browserWSEndpoint`.
 
 ```js copy
@@ -169,3 +328,115 @@ func main() {
 	log.Println("Got title of:", title)
 }
 ```
+
+## MCP server
+
+Starts an MCP (Model Context Protocol) server over stdio.
+
+
+```sh copy
+./lightpanda mcp
+```
+
+### Tools
+
+| Name | Description |
+|-|-|
+| goto | Navigate to a specified URL and load the page in memory so it can be reused later for info extraction |
+| markdown | Get the page content in markdown format. If a url is provided, it navigates to that url first. |
+| links | Extract all links in the opened page. If a url is provided, it navigates to that url first. |
+| evaluate | Evaluate JavaScript in the current page context. If a url is provided, it navigates to that url first. |
+| semantic_tree | Get the page content as a simplified semantic DOM tree for AI reasoning. If a url is provided, it navigates to that url first. |
+| interactiveElements | Extract interactive elements from the opened page. If a url is provided, it navigates to that url first. |
+| structuredData | Extract structured data (like JSON-LD, OpenGraph, etc) from the opened page. If a url is provided, it navigates to that url first. |
+| detectForms | Detect all forms on the page and return their structure including fields, types, and required status. If a url is provided, it navigates to that url first. |
+| click | Click on an interactive element. Returns the current page URL and title after the click. |
+| fill | Fill text into an input element. Returns the filled value and current page URL and title. |
+| scroll | Scroll the page or a specific element. Returns the scroll position and current page URL and title. |
+| waitForSelector | Wait for an element matching a CSS selector to appear in the page. Returns the backend node ID of the matched element. |
+
+### Options
+
+```bash
+--insecure-disable-tls-host-verification
+                Disables host verification on all HTTP requests. This is an
+                advanced option which should only be set if you understand
+                and accept the risk of disabling host verification.
+
+--obey-robots
+                Fetches and obeys the robots.txt (if available) of the web pages
+                we make requests towards.
+                Defaults to false.
+
+--http-proxy    The HTTP proxy to use for all HTTP requests.
+                A username:password can be included for basic authentication.
+                Defaults to none.
+
+--proxy-bearer-token
+                The <token> to send for bearer authentication with the proxy
+                Proxy-Authorization: Bearer <token>
+
+--http-max-concurrent
+                The maximum number of concurrent HTTP requests.
+                Defaults to 10.
+
+--http-max-host-open
+                The maximum number of open connections to a given host:port.
+                Defaults to 4.
+
+--http-connect-timeout
+                The time, in milliseconds, for establishing an HTTP connection
+                before timing out. 0 means it never times out.
+                Defaults to 0.
+
+--http-timeout
+                The maximum time, in milliseconds, the transfer is allowed
+                to complete. 0 means it never times out.
+                Defaults to 10000.
+
+--http-max-response-size
+                Limits the acceptable response size for any request
+                (e.g. XHR, fetch, script loading, ...).
+                Defaults to no limit.
+
+--log-level     The log level: debug, info, warn, error or fatal.
+                Defaults to warn.
+
+--log-format    The log format: pretty or logfmt.
+                Defaults to logfmt.
+
+--log-filter-scopes
+                Filter out too verbose logs per scope:
+                http, unknown_prop, event, ...
+
+--user-agent-suffix
+                Suffix to append to the Lightpanda/X.Y User-Agent
+
+--web-bot-auth-key-file
+                Path to the Ed25519 private key PEM file.
+
+--web-bot-auth-keyid
+                The JWK thumbprint of your public key.
+
+--web-bot-auth-domain
+                Your domain e.g. yourdomain.com
+```
+
+### Claude Desktop / Cursor / Windsurf
+
+Add to your MCP host configuration:
+
+- **Claude Desktop:** Settings > Developer > Edit Config
+- **Cursor:** `.cursor/mcp.json` in your project
+- **Windsurf:** Cascade MCP settings
+
+```json copy
+{
+  "mcpServers": {
+    "lightpanda": {
+      "command": "/path/to/lightpanda",
+      "args": ["mcp"]
+    }
+  }
+}
+```
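
Reviewer note: the `fetch` options documented in this diff can also be assembled programmatically. The sketch below is illustrative only and is not part of the page being changed. `buildFetchArgs` and its option names (`obeyRobots`, `dump`, `stripMode`, `waitUntil`, `httpTimeoutMs`) are hypothetical; only the flags and their accepted values come from the help text above. It assumes each flag value is passed as a separate argument, as in the documented `--dump html` command.

```javascript
// Hypothetical helper, not part of Lightpanda: builds the argument list for
// `lightpanda fetch` using only flags listed in the help text of this diff.
const DUMP_FORMATS = ['html', 'markdown', 'semantic_tree', 'semantic_tree_text'];
const WAIT_EVENTS = ['load', 'domcontentloaded', 'networkidle', 'done'];

function buildFetchArgs(url, opts = {}) {
  const args = ['fetch'];

  // Boolean flag: present or absent, no value.
  if (opts.obeyRobots) args.push('--obey-robots');

  if (opts.dump !== undefined) {
    if (!DUMP_FORMATS.includes(opts.dump)) {
      throw new Error(`unsupported --dump value: ${opts.dump}`);
    }
    args.push('--dump', opts.dump);
  }

  // --strip-mode takes a comma separated list, e.g. js,css
  if (opts.stripMode !== undefined) {
    args.push('--strip-mode', opts.stripMode.join(','));
  }

  if (opts.waitUntil !== undefined) {
    if (!WAIT_EVENTS.includes(opts.waitUntil)) {
      throw new Error(`unsupported --wait-until value: ${opts.waitUntil}`);
    }
    args.push('--wait-until', opts.waitUntil);
  }

  // Assumption: the timeout is passed as a separate token, in milliseconds.
  if (opts.httpTimeoutMs !== undefined) {
    args.push('--http-timeout', String(opts.httpTimeoutMs));
  }

  // The URL goes last, as in the documented example command.
  args.push(url);
  return args;
}

const args = buildFetchArgs(
  'https://demo-browser.lightpanda.io/campfire-commerce/',
  { obeyRobots: true, dump: 'html' }
);
console.log(['./lightpanda', ...args].join(' '));
// prints "./lightpanda fetch --obey-robots --dump html https://demo-browser.lightpanda.io/campfire-commerce/"
```

The resulting array can be handed to Node's `child_process.execFile` together with the path to the binary, which avoids shell quoting issues with URLs.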