feat(serve): add live pprof HTTP server to serve api command#3201
Conversation
Adds a hidden --pprof-addr flag (and CAGENT_PPROF_ADDR env var) to `serve api` that starts a dedicated pprof HTTP server at /debug/pprof/. Activated via env var in harbor-watch compose for memory leak profiling. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟡 NEEDS ATTENTION
Two LIKELY medium-severity concerns in the new pprof server implementation. One drafter hypothesis (custom mux missing heap/goroutine routes) was dismissed by verification — pprof.Index correctly serves named runtime profiles via URL path parsing, so the five explicit registrations are sufficient and correct.
| } | ||
|
|
||
| if pprofAddr := cmp.Or(f.pprofAddr, os.Getenv("CAGENT_PPROF_ADDR")); pprofAddr != "" { | ||
| if err := profiling.StartPprofServer(ctx, pprofAddr); err != nil { |
There was a problem hiding this comment.
[MEDIUM] pprof bind failure aborts the entire serve api command
If the address set via --pprof-addr / CAGENT_PPROF_ADDR is unavailable (e.g. port already in use, address already bound by another process, or a typo), StartPprofServer returns an error that propagates here — preventing the main API listener from ever starting.
This is a disproportionate failure mode for a debug-only facility: a misconfigured CAGENT_PPROF_ADDR environment variable in a production deployment would silently take the entire agent down. Consider logging the error and continuing rather than returning it fatally, or at minimum documenting that the env-var can cause a hard service outage:
if err := profiling.StartPprofServer(ctx, pprofAddr); err != nil {
// OPTION A – degrade gracefully
slog.WarnContext(ctx, "pprof server failed to start; continuing without profiling", "error", err)
// OPTION B – keep fatal, but make the impact explicit in the flag help text
}| srv := &http.Server{Handler: mux, ReadHeaderTimeout: 10 * time.Second} | ||
|
|
||
| slog.InfoContext(ctx, "pprof server listening", "addr", ln.Addr().String()) | ||
| if tcpAddr, ok := ln.Addr().(*net.TCPAddr); ok && !tcpAddr.IP.IsLoopback() { |
There was a problem hiding this comment.
[MEDIUM] Non-loopback pprof binding warns but does not refuse — unauthenticated memory access over the network
When a non-loopback address (e.g. 0.0.0.0:6060) is used, the warning is logged and the server starts anyway. Unlike the main API listener — which requires bearer-token authentication — pprof requires no authentication and exposes:
/debug/pprof/heap— full heap dump (may contain in-memory secrets, tokens, TLS keys)/debug/pprof/cmdline— process command-line arguments (may contain passwords)/debug/pprof/goroutine— goroutine stacks (code paths, internal state)
The existing warnIfNotLoopback for the main API listener is a reasonable pattern because that listener is authenticated. For an unauthenticated endpoint, a warn-and-proceed policy means any actor who can set CAGENT_PPROF_ADDR=0.0.0.0:6060 in the environment (e.g. via a Docker Compose override or a K8s config map) can read process memory from the network with no further credentials.
Consider either rejecting non-loopback addresses outright, or making the security implication explicit in user-facing documentation beyond the hidden flag help text.
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🟢 APPROVE
The pprof server implementation is well-structured: synchronous bind (fatal on failure), dedicated http.ServeMux (no DefaultServeMux pollution in practice), loopback warning matching existing project patterns, ReadHeaderTimeout for slow-loris protection, and correct context.WithoutCancel usage for shutdown. One minor logging inconsistency found (see inline comment).
Reviewed: cmd/root/api.go, pkg/profiling/pprof_server.go
|
|
||
| go func() { | ||
| if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) { | ||
| slog.Warn("pprof server error", "error", err) |
There was a problem hiding this comment.
[LOW] slog.Warn used without context — loses trace IDs and structured context values
The serve goroutine calls slog.Warn(...) (context-free variant) while every other log statement in this file uses a context-aware variant (slog.InfoContext, slog.WarnContext). The goroutine closes over ctx, so the context is available and should be passed.
// Current (loses ctx values):
slog.Warn("pprof server error", "error", err)
// Should be:
slog.WarnContext(ctx, "pprof server error", "error", err)This loses any trace IDs or structured context fields attached to ctx at the time of the error, making it harder to correlate this log entry with the request/session that triggered the failure.
Summary
--pprof-addrflag (andCAGENT_PPROF_ADDRenv var) toserve apithat starts a Go pprof HTTP server at/debug/pprof/--cpuprofile,--memprofile)Warnlog matching the existingwarnIfNotLoopbackpattern for the API listenerDesign details
http.ServeMux(notDefaultServeMux) — avoids polluting any other listener with pprof routesReadHeaderTimeout: 10sguards slow-loris;WriteTimeoutintentionally omitted — profile/trace captures legitimately run tens of secondscontext.WithoutCancel(ctx)+ 5s timeout, so the grace window isn't pre-expired when the parent context is already cancelledunix://,npipe://,fd://not supported; documented in flag help and doc comment)(&net.ListenConfig{}).Listen(ctx, "tcp", addr)(noctx),context.WithoutCancel(gosec G118)Motivation
Harbor-watch (
serve apilong-lived deployment) has suspected memory leaks (per-LLM-call HTTP transport churn, session goroutine lifecycle gaps). This is the prerequisite for empirical investigation — capturing heap/goroutine profiles from a running instance.Test plan
go test ./cmd/root/... ./pkg/profiling/...passesgolangci-lint run ./cmd/root/... ./pkg/profiling/...— 0 issuesserve api --pprof-addr 127.0.0.1:6060 ./agentsand confirmcurl http://localhost:6060/debug/pprof/returns the pprof index--pprof-addrdoes not appear indocker-agent serve api --helpmake run-harbor-watch-api-debug, thengo tool pprof http://localhost:6060/debug/pprof/heap🤖 Generated with Claude Code