-
Notifications
You must be signed in to change notification settings - Fork 395
feat(serve): add live pprof HTTP server to serve api command #3201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| package profiling | ||
|
|
||
| import ( | ||
| "context" | ||
| "errors" | ||
| "fmt" | ||
| "log/slog" | ||
| "net" | ||
| "net/http" | ||
| "net/http/pprof" | ||
| "time" | ||
| ) | ||
|
|
||
| // StartPprofServer starts an HTTP server exposing Go runtime profiling endpoints | ||
| // at /debug/pprof/ on the given addr. It binds the listener synchronously and | ||
| // returns an error if the address is unavailable. The server runs in a background | ||
| // goroutine and shuts down when ctx is cancelled. | ||
| // addr must be a TCP host:port address (e.g. "127.0.0.1:6060"); unix://, npipe://, | ||
| // and fd:// schemes are not supported. Prefer a loopback address over a bare port | ||
| // (":6060") — the latter binds all interfaces, exposing process memory and arguments | ||
| // to the network. | ||
| func StartPprofServer(ctx context.Context, addr string) error { | ||
| mux := http.NewServeMux() | ||
| mux.HandleFunc("/debug/pprof/", pprof.Index) | ||
| mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline) | ||
| mux.HandleFunc("/debug/pprof/profile", pprof.Profile) | ||
| mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol) | ||
| mux.HandleFunc("/debug/pprof/trace", pprof.Trace) | ||
|
|
||
| ln, err := (&net.ListenConfig{}).Listen(ctx, "tcp", addr) | ||
| if err != nil { | ||
| return fmt.Errorf("pprof: listen on %s: %w", addr, err) | ||
| } | ||
|
|
||
| // ReadHeaderTimeout guards against slow-loris connections on the debug port. | ||
| // WriteTimeout is intentionally omitted: profile/trace captures legitimately | ||
| // run for tens of seconds and would be truncated by a short write deadline. | ||
| srv := &http.Server{Handler: mux, ReadHeaderTimeout: 10 * time.Second} | ||
|
|
||
| slog.InfoContext(ctx, "pprof server listening", "addr", ln.Addr().String()) | ||
| if tcpAddr, ok := ln.Addr().(*net.TCPAddr); ok && !tcpAddr.IP.IsLoopback() { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [MEDIUM] Non-loopback pprof binding warns but does not refuse — unauthenticated memory access over the network When a non-loopback address (e.g.
The existing Consider either rejecting non-loopback addresses outright, or making the security implication explicit in user-facing documentation beyond the hidden flag help text. |
||
| slog.WarnContext(ctx, "pprof server is listening on a non-loopback address — "+ | ||
| "/debug/pprof/cmdline and heap profiles are network-reachable without authentication", | ||
| "addr", tcpAddr.String()) | ||
| } | ||
|
|
||
| go func() { | ||
| if err := srv.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) { | ||
| slog.Warn("pprof server error", "error", err) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. [LOW] The serve goroutine calls // Current (loses ctx values):
slog.Warn("pprof server error", "error", err)
// Should be:
slog.WarnContext(ctx, "pprof server error", "error", err)This loses any trace IDs or structured context fields attached to |
||
| } | ||
| }() | ||
|
|
||
| go func() { | ||
| <-ctx.Done() | ||
| // 5s grace: favors prompt process exit over draining in-flight profile | ||
| // captures. CPU/trace profiles run up to 30s by default; callers should | ||
| // cancel their requests before the process exits rather than relying on | ||
| // this timeout to drain them. | ||
| // context.WithoutCancel preserves ctx values (e.g. trace IDs) without | ||
| // inheriting the cancellation, so the shutdown timeout is not pre-expired. | ||
| shutdownCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 5*time.Second) | ||
| defer cancel() | ||
| if err := srv.Shutdown(shutdownCtx); err != nil { | ||
| slog.WarnContext(shutdownCtx, "pprof server shutdown error", "error", err) | ||
| } | ||
| }() | ||
|
|
||
| return nil | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[MEDIUM] pprof bind failure aborts the entire
serve apicommandIf the address set via
--pprof-addr/CAGENT_PPROF_ADDRis unavailable (e.g. port already in use, address already bound by another process, or a typo),StartPprofServerreturns an error that propagates here — preventing the main API listener from ever starting.This is a disproportionate failure mode for a debug-only facility: a misconfigured
CAGENT_PPROF_ADDRenvironment variable in a production deployment would silently take the entire agent down. Consider logging the error and continuing rather than returning it fatally, or at minimum documenting that the env-var can cause a hard service outage: