-
Notifications
You must be signed in to change notification settings - Fork 55
feat: enhance sandbox capability negotiation #158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
commit: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR enhances the MCP Apps sandbox capability negotiation by adding support for additional CSP directives and permissions policies. The changes enable apps to request nested iframe support, custom base URI configuration, and browser permissions (camera, microphone, geolocation) through resource metadata.
Key changes:
- Added
frameDomainsandbaseUriDomainsfields to CSP configuration for controlling nested iframes and base URI directives - Introduced
permissionsmetadata for requesting camera, microphone, and geolocation access via Permission Policy - Extended host capability negotiation to include CSP support indicators (
frameDomainsandbaseUriDomains)
Reviewed changes
Copilot reviewed 6 out of 9 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/types.ts | Added exports for new McpUiResourcePermissions type and schema |
| src/spec.types.ts | Defined new interfaces for permissions and extended CSP configuration with frame/baseUri domains |
| src/generated/schema.ts | Added Zod schemas for permissions and extended CSP schemas with new domain fields |
| src/generated/schema.test.ts | Updated type inference tests to include new permissions schema |
| src/generated/schema.json | Generated JSON schemas for new permissions and CSP extensions |
| specification/draft/apps.mdx | Documented new CSP fields and permissions with examples and security guidance |
| examples/simple-host/sandbox.html | Added new sandbox proxy HTML implementation (missing CSP/permissions handling) |
| examples/basic-host/src/sandbox.ts | Implemented CSP meta tag building and iframe allow attribute for permissions |
| examples/basic-host/src/implementation.ts | Extended resource data extraction to include new CSP and permissions metadata |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
ochafik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Ido!
| baseUriDomains?: string[]; | ||
| }; | ||
| permissions?: { | ||
| camera?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's make these objects (as in capabilities) for future extensions?
e.g. what if one day there's fine-grained vers. coarse geolocation, or front vs. back camera permission, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like CSP, Permissions are coupled to the browser spec (Permissions Policy). I don't think we should diverge at this point. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 40+ Permission Policies you can stuff in an iFrame. Consider keeping permissions very simple and flexible, just have it be a string[].
Benefits:
- Client can set all permission policies with a single string. Flexible for future extensions.
- Enforcing this on the client side is simple. We just stuff the string into an iFrame's
allow. This is safe because invalid strings are silently ignored. - Still easily parseable on the server side. Server developer only has to do
permissions.contains("camera");.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively create some interface and have permission be an array of that interface if type safety and enforcement is of high importance.
interface Permissions {
camera: "camera",
....
}
permissions?: Permissions[]
ochafik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should tell app what permissions were granted (and potentially which domains were allowed)
|
Is there an iFrame in the |
|
I can see this being useful if the MCP server needs to know what the browser capabilities / permissions are. However, I'm pretty sure you can also fetch the browser capability within the widget too using https://developer.mozilla.org/en-US/docs/Web/API/FeaturePolicy/allowedFeatures I perhaps React widgets should be using this instead as a source of truth. |
I have no clue why midi is on there. That's for music instruments haha |
Thanks @matteo8p ! Permissions were originally excluded from negotiation because they could be detected directly. It was considered that the negotiation assumes less about the app runtime and the developer's awareness of the browser APIs. We should explore this further with the community to come to a decision. |
| if (event.data && event.data.method === 'ui/notifications/sandbox-resource-ready') { | ||
| const { html, sandbox, permissions } = event.data.params || {}; | ||
| // Note: csp is not extracted here - CSP is set via HTTP response headers in serve.ts | ||
| if (typeof sandbox === 'string') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need to set the sandbox attribute in the host itself (and let them be inherited transparently here), cf. comments from @domfarolino during today's meeting.
(or else, the child, being same origin, will be able to change its own sandbox attributes by poking at its parent dom ; or would be able to create other iframes w/o these limitations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same way that CSP settings are defined in the headers serving this page, and not injected by this page)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think whatever features are enabled inside of sandbox.html will be inherited by the same-origin iframe inner. So I think what really needs to happen is the explicit capability delegation to the iframe whose source is sandbox.html, and we should get delegation to the same-origin app contents for free. This is important because any changes made to sandbox and permissions policies here, after inner has been inserted into the DOM, will not apply to content (the html string, in this case) that has been synchronously written into the frame after the attributes change. For those attributes to take effect, the inner iframe will have to re-navigate to apply the attribute changes.
| if (event.data && event.data.method === 'ui/notifications/sandbox-resource-ready') { | ||
| const { html, sandbox, permissions } = event.data.params || {}; | ||
| // Note: csp is not extracted here - CSP is set via HTTP response headers in serve.ts | ||
| if (typeof sandbox === 'string') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think whatever features are enabled inside of sandbox.html will be inherited by the same-origin iframe inner. So I think what really needs to happen is the explicit capability delegation to the iframe whose source is sandbox.html, and we should get delegation to the same-origin app contents for free. This is important because any changes made to sandbox and permissions policies here, after inner has been inserted into the DOM, will not apply to content (the html string, in this case) that has been synchronously written into the frame after the attributes change. For those attributes to take effect, the inner iframe will have to re-navigate to apply the attribute changes.
| try { | ||
| if (inner.contentDocument) { | ||
| inner.contentDocument.open(); | ||
| inner.contentDocument.write(html); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that as mentioned above, I don't think any changes made to inner's allow or sandbox attributes will take effect after inner has been inserted to the DOM. You'll need to re-navigate the iframe for those attributes to stick.
Probably what you want to do is hold of on appending inner to the DOM until after you parse and set the intended attribute values. Or, given the comment further above, maybe we don't need to send the attribute values all the way down here, since inner should inherit most things from sandbox.html, since it's a same-origin iframe.
Implements early support for proposed CSP extensions to the MCP Apps spec: - frameDomains: Controls frame-src directive for nested iframes - baseUriDomains: Controls base-uri directive This anticipates changes proposed in: modelcontextprotocol/ext-apps#158 Note: The upstream spec PR is not yet merged, but we're implementing early to validate the approach.
|
Getting ahead of this and goose and implementing PTAL if interested. |

Proposal to tackle #58 -
frameDomainsandbaseUriDomainsoverrides to Resource Metadata'sui/cspattributepermissionsattribute(ui/permissions) to the Resource Metadata. Currently, it supportscamera,microphone, andgeolocation. There are many other permissions, but I think we should only add fields as they're needed. @alexi-openai could you please share what OpenAI allows?cspto Host<>App capability negotiation (since it's non-trivial for the app to detect at runtime)If it's acceptable, I'll add an E2E test before merging.
Thoughts?