feat: emit envd logs through orchestrator logger#2309
feat: emit envd logs through orchestrator logger#2309joe-lombrozo-s-bot[bot] wants to merge 9 commits intomainfrom
Conversation
Use the existing envd -> hyperloop /logs pipeline and enrich incoming log records with sandbox/team identity before emitting them through the orchestrator logger. Changes: - accept envd log payloads even when instanceID is omitted - always overwrite sandbox-owned fields (instanceID, teamID, source) - log envd payloads locally through the orchestrator logger with level mapping - continue forwarding to the external collector when configured
Convert several remaining raw stderr prints in envd to use the shared zerolog logger so they flow through the existing envd -> hyperloop -> orchestrator log pipeline. This covers: - MMDS polling messages - envd startup/run-dir file errors - cgroup manager setup/teardown errors - process I/O read errors
Stop rejecting envd log payloads based on the embedded instanceID. We already know the source sandbox from the connection, so just overwrite sandbox-owned fields. If the payload included a mismatched instanceID, preserve it under invalid-instance-id for debugging.
packages/envd/internal/host/mmds.go
Outdated
| select { | ||
| case <-ctx.Done(): | ||
| fmt.Fprintf(os.Stderr, "context cancelled while waiting for mmds opts") | ||
| logger.Debug().Msg("context cancelled while waiting for mmds opts") |
There was a problem hiding this comment.
This should be logger.Error
packages/envd/internal/host/mmds.go
Outdated
| token, err := getMMDSToken(ctx, httpClient) | ||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "error getting mmds token: %v\n", err) | ||
| logger.Debug().Err(err).Msg("error getting mmds token") |
There was a problem hiding this comment.
This should be logger.Error
packages/envd/internal/host/mmds.go
Outdated
| mmdsOpts, err := getMMDSOpts(ctx, httpClient, token) | ||
| if err != nil { | ||
| fmt.Fprintf(os.Stderr, "error getting mmds opts: %v\n", err) | ||
| logger.Debug().Err(err).Msg("error getting mmds opts") |
There was a problem hiding this comment.
This should be logger.Error
There was a problem hiding this comment.
Code review is billed via overage credits. To resume reviews, an organization admin can raise the monthly limit at claude.ai/admin-settings/claude-code.
Once credits are available, reopen this pull request to trigger a review.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4171c2d5a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| level, _ := payload["level"].(string) | ||
| zapLevel, _ := zapcore.ParseLevel(level) // defaults to info if invalid | ||
| h.logger. | ||
| WithOptions(zap.AddStacktrace(zapcore.FatalLevel)). // stack trace only points to hyperloop, useless | ||
| Log(ctx, zapLevel, message, fields...) |
There was a problem hiding this comment.
Clamp untrusted log level before calling logger.Log
The request payload controls payload["level"], and this value is passed directly to zapcore.ParseLevel and then h.logger.Log(...). If a sandbox sends {"level":"fatal"} (or panic), zap will execute fatal/panic behavior for that log level; in the fatal case this terminates the orchestrator process, so any sandbox can trigger a control-plane DoS via POST /logs. Please sanitize/mask incoming levels (e.g., cap at error/warn/info/debug) before invoking Log.
Useful? React with 👍 / 👎.
This uses the existing envd -> hyperloop pipeline rather than introducing a new transport.
What it does
overwrites sandbox-owned fields to prevent spoofing:
Small compatibility tweak
If is omitted from the envd payload, the handler now accepts it and fills it in from the resolved sandbox identity. If it is present, it is still validated before being overwritten.
This gives us immediately-readable envd logs outside the sandbox while reusing the plumbing that already exists in MMDS/envd/hyperloop.