Skip to content

Conversation

@zedkipp
Copy link
Contributor

@zedkipp zedkipp commented Dec 19, 2025

Add SocketAuditor that sends audit logs to the Coder workspace agent via a Unix socket. This enables boundary audit events to be forwarded to coderd for centralized logging.

Features:

  • Batching: 10 logs or 5 seconds, whichever comes first
  • Wire format: length-prefixed protobuf (proto imported from AgentAPI) to make boundary -> agent -> coderd simple to start

RFC: https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba?pvs=23
Corresponding PR in coder/coder coder/coder#21345
coder/coder#21280

@zedkipp zedkipp force-pushed the zedkipp/socket-auditor branch 3 times, most recently from fac8048 to a2ea4f9 Compare December 19, 2025 21:37
@zedkipp zedkipp marked this pull request as ready for review December 23, 2025 21:39
Add SocketAuditor that sends audit logs to the Coder workspace agent
via a Unix socket. This enables boundary audit events to be forwarded
to coderd for centralized logging.

Implementation notes:
- Batching: 10 logs or 5 seconds, whichever comes first
- Wire format: tag & length prefixed protobuf. proto imported from AgentAPI to
  simplify boundary -> agent -> coderd forwarding to start.
- CLI and config flag to disable sending of audit logs to workspace agent
  as an escape hatch
@zedkipp zedkipp force-pushed the zedkipp/socket-auditor branch from a2ea4f9 to 2365931 Compare December 24, 2025 00:13
Copy link

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted two fairly significant shortcomings which I think need to be addressed before this can land.

I'm not going to block merge because this is not my project, but I highly recommend these be addressed before proceeding.

@zedkipp zedkipp requested a review from dannykopping December 30, 2025 01:32
Copy link

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I don't think we should be silently falling back to a default socket value.

Comment on lines 54 to 57
socketPath := os.Getenv(EnvAuditSocketPath)
if socketPath == "" {
socketPath = defaultAuditSocketPath
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest rather failing loud and proud here.
The env var has a default value, so it'd have to be explicitly set to an empty value for this to occur.
This implementation will silently work around a config issue. If you must go this route, please log.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that I didn't define this environment variable the same way boundary does for all other env vars. The socket path now has an explicit default, a corresponding CLI flag, and empty socket path handling that bails out.

clearBatch()
} else {
// Network error: close connection but keep batch for a future retry.
s.logger.Warn("failed to flush audit logs; will retry", "error", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retry when?
Please ensure warning logs have as much diagnostic relevance as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This made me realize this case has the same problem where the pending batch could be stuck if a new log doesn't arrive to kick off the retry. I added a timer reset here and added more log context.

@zedkipp zedkipp requested a review from dannykopping December 31, 2025 00:02
cli/cli.go Outdated
{
Flag: "log-proxy-socket-path",
Description: "Path to the socket where the boundary log proxy server listens for audit logs.",
Default: "/tmp/boundary-audit.sock", // Important: this is the same default path used by the workspace agent.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry to keep banging on about this but I think this is very important.
Having a default will cause problems in the future. This is a required param because the client and server need to agree on the path.

If a path is not provided and disable-audit-logs is false, boundary should fail loud and proud.

PS disable-audit-logs doesn't have an env var option, which is inconsistent.

Copy link
Contributor Author

@zedkipp zedkipp Dec 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence about this. I see how this could cause a problem in the future if a default gets changed in one place, but that should be unlikely. Having a sane default in both places makes for a better user experience because there's no need to configure the socket path. This will work out of the box for most users, while more unique deployments can still set the env var in terraform.

That said, I think defining an exported constant/func in the proxy server package for the default socket path and using that value as the default socket path in both coder and boundary would be an improvement in terms of path agreement. It's still possible they don't agree, but it seems like a good tradeoff to avoid requiring configuration.

zedkipp added a commit to coder/coder that referenced this pull request Dec 31, 2025
Add agent forwarding of boundary audit logs from workspaces to coderd
via agent API, and re-emission of boundary logs to coderd stderr. This
change adds a server to the workspace agent that always listens on a
unix socket for boundary to connect and send audit logs.

coderd log format example:
```
[API] 2025-12-23 18:31:46.755 [info] coderd.agentrpc: boundary_request owner=.. workspace_name=.. agent_name=.. decision=.. workspace_id=.. http_method=.. http_url=.. event_time=.. request_id=..
```

Corresponding boundary PR: coder/boundary#124
RFC:
https://www.notion.so/coderhq/Agent-Boundary-Logs-2afd579be59280f29629fc9823ac41ba
#21280
// agent is on a new enough version to prevent boundary application log spam from
// trying to connect to the agent. This assumes the agent will run and start the
// log proxy server before boundary runs.
_, err := os.Stat(logProxySocketPath)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@evgeniy-scherbina had a good comment about the independent versioning of boundary and coder. To avoid any boundary application log spam, I added a check for agent socket existence for boundary to determine if the agent is on a version new enough to proxy logs.

@zedkipp zedkipp merged commit 960680f into main Jan 5, 2026
5 checks passed
@zedkipp zedkipp deleted the zedkipp/socket-auditor branch January 5, 2026 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants