-
Notifications
You must be signed in to change notification settings - Fork 2
feat: add socket auditor for forwarding logs to coder agent #124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fac8048 to
a2ea4f9
Compare
Add SocketAuditor that sends audit logs to the Coder workspace agent via a Unix socket. This enables boundary audit events to be forwarded to coderd for centralized logging. Implementation notes: - Batching: 10 logs or 5 seconds, whichever comes first - Wire format: tag & length prefixed protobuf. proto imported from AgentAPI to simplify boundary -> agent -> coderd forwarding to start. - CLI and config flag to disable sending of audit logs to workspace agent as an escape hatch
a2ea4f9 to
2365931
Compare
dannykopping
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotted two fairly significant shortcomings which I think need to be addressed before this can land.
I'm not going to block merge because this is not my project, but I highly recommend these be addressed before proceeding.
dannykopping
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I don't think we should be silently falling back to a default socket value.
audit/socket_auditor.go
Outdated
| socketPath := os.Getenv(EnvAuditSocketPath) | ||
| if socketPath == "" { | ||
| socketPath = defaultAuditSocketPath | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest rather failing loud and proud here.
The env var has a default value, so it'd have to be explicitly set to an empty value for this to occur.
This implementation will silently work around a config issue. If you must go this route, please log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that I didn't define this environment variable the same way boundary does for all other env vars. The socket path now has an explicit default, a corresponding CLI flag, and empty socket path handling that bails out.
audit/socket_auditor.go
Outdated
| clearBatch() | ||
| } else { | ||
| // Network error: close connection but keep batch for a future retry. | ||
| s.logger.Warn("failed to flush audit logs; will retry", "error", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Retry when?
Please ensure warning logs have as much diagnostic relevance as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This made me realize this case has the same problem where the pending batch could be stuck if a new log doesn't arrive to kick off the retry. I added a timer reset here and added more log context.
cli/cli.go
Outdated
| { | ||
| Flag: "log-proxy-socket-path", | ||
| Description: "Path to the socket where the boundary log proxy server listens for audit logs.", | ||
| Default: "/tmp/boundary-audit.sock", // Important: this is the same default path used by the workspace agent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry to keep banging on about this but I think this is very important.
Having a default will cause problems in the future. This is a required param because the client and server need to agree on the path.
If a path is not provided and disable-audit-logs is false, boundary should fail loud and proud.
PS disable-audit-logs doesn't have an env var option, which is inconsistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on the fence about this. I see how this could cause a problem in the future if a default gets changed in one place, but that should be unlikely. Having a sane default in both places makes for a better user experience because there's no need to configure the socket path. This will work out of the box for most users, while more unique deployments can still set the env var in terraform.
That said, I think defining an exported constant/func in the proxy server package for the default socket path and using that value as the default socket path in both coder and boundary would be an improvement in terms of path agreement. It's still possible they don't agree, but it seems like a good tradeoff to avoid requiring configuration.
Add agent forwarding of boundary audit logs from workspaces to coderd via agent API, and re-emission of boundary logs to coderd stderr. This change adds a server to the workspace agent that always listens on a unix socket for boundary to connect and send audit logs. coderd log format example: ``` [API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=.. ``` Corresponding boundary PR: coder/boundary#124 RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba #21280
| // agent is on a new enough version to prevent boundary application log spam from | ||
| // trying to connect to the agent. This assumes the agent will run and start the | ||
| // log proxy server before boundary runs. | ||
| _, err := os.Stat(logProxySocketPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@evgeniy-scherbina had a good comment about the independent versioning of boundary and coder. To avoid any boundary application log spam, I added a check for agent socket existence for boundary to determine if the agent is on a version new enough to proxy logs.
Add SocketAuditor that sends audit logs to the Coder workspace agent via a Unix socket. This enables boundary audit events to be forwarded to coderd for centralized logging.
Features:
RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba?pvs=23
Corresponding PR in coder/coder coder/coder#21345
coder/coder#21280