
[Bug]: Logging guide: wrong namespace, missing container guidance, outdated shipper examples #625

@danbarr

Description

Summary

The Kubernetes logging guide has several inaccuracies and gaps that make it difficult to set up log collection in practice. This was discovered while setting up log collection for ToolHive audit logs.

The vMCP audit logging guide is more accurate (it uses the correct toolhive-system namespace and has a proper configuration reference table), but shares issues 2, 3, and 5 below. Issue 4 does not apply to vMCP as those containers emit pure JSON with no plain-text startup line.

Issues found

1. Wrong default namespace in examples (operator docs only)

The guide's filtering examples (Fluentd, Filebeat, Splunk) all appear to assume the namespace is toolhive, but the default namespace used by the ToolHive operator is toolhive-system. Any log pipeline that filters by namespace path (e.g. /var/log/pods/toolhive_* in a filelog glob) will silently collect nothing.

Suggested fix: Update all namespace references in examples to toolhive-system, or note that the namespace may vary and show how to verify it (kubectl get pods -n toolhive-system).

2. No guidance on which container to target (both docs)

ToolHive pods have different container names depending on the component:

  • MCP proxy Deployment pods: container named toolhive — emits structured JSON audit logs
  • vMCP pods: container named vmcp — emits structured JSON audit logs
  • StatefulSet pods (-0 suffix, older generation): container named mcp — raw MCP server, emits plain text

The guides don't mention container names at all. Without targeting the right container (e.g. /var/log/containers/*_toolhive-system_toolhive-*.log), collectors pick up MCP server stdout (plain text, not JSON) and JSON parsing fails silently.
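As an illustration, a tail input scoped to just the proxy container could look like this. This is a sketch assuming the standard kubelet symlink layout under /var/log/containers (`<pod>_<namespace>_<container>-<id>.log`) and Fluent Bit's built-in `cri` parser; tag and refresh values are illustrative:

```ini
# Sketch: scrape only the toolhive proxy container in toolhive-system.
[INPUT]
    Name              tail
    Tag               kube.toolhive.*
    Path              /var/log/containers/*_toolhive-system_toolhive-*.log
    Parser            cri
    Refresh_Interval  10
```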

3. Outdated log shipper examples (both docs)

The guides cover Fluentd, Filebeat, and Splunk. These are not the tools most Kubernetes users reach for today. More relevant examples would be:

  • Grafana Alloy — Grafana's current recommended log collector (Promtail is deprecated in favour of Alloy), and the natural companion to Loki, which is the default dashboard backend for ToolHive's own observability stack.
  • FluentBit — the most widely deployed lightweight log scraper in Kubernetes, with first-class JSON and CRI format support.

Working FluentBit config requires more pipeline stages than the existing examples suggest: a CRI parser on the tail input, a Kubernetes filter for metadata enrichment, a second JSON parser filter to promote the log body fields (including level) to the top level of the record, and a service label for Grafana Explore Logs grouping. The docs should show a complete, working example rather than a partial snippet.
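To make the shape of that pipeline concrete, here is a rough sketch of the stages described above. It has not been verified against a live cluster; the Loki host, the `service_name` label value, and the tag routing are illustrative assumptions:

```ini
[SERVICE]
    Parsers_File parsers.conf

# Stage 1: tail with the built-in CRI parser.
[INPUT]
    Name    tail
    Tag     kube.*
    Path    /var/log/containers/*_toolhive-system_toolhive-*.log
    Parser  cri

# Stage 2: Kubernetes metadata enrichment.
[FILTER]
    Name       kubernetes
    Match      kube.*
    Merge_Log  Off

# Stage 3: second JSON parse to promote body fields (including level)
# to the top level of the record.
[FILTER]
    Name          parser
    Match         kube.*
    Key_Name      log
    Parser        json
    Reserve_Data  On

# Stage 4: service label for Grafana Explore Logs grouping
# (label name/value are assumptions for this sketch).
[FILTER]
    Name   modify
    Match  kube.*
    Set    service_name toolhive-audit

[OUTPUT]
    Name    loki
    Match   kube.*
    Host    loki.monitoring.svc
    Port    3100
    Labels  service_name=toolhive-audit
```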

The OTel Collector (filelog receiver) is an option when teams are already running it as an observability hub, but its file scraping support has rough edges and it is not commonly used as a primary log scraper. If included, it should be framed as a secondary "if you're already running OTel" option.
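For teams already running the Collector, the filelog receiver equivalent might look roughly like this (a sketch, not a definitive configuration; the `on_error: send` setting is one way to let non-JSON lines pass through unparsed rather than fail the pipeline):

```yaml
receivers:
  filelog:
    include:
      - /var/log/containers/*_toolhive-system_toolhive-*.log
    operators:
      # Parses CRI/containerd and docker runtime log formats.
      - type: container
      # Promote the JSON body fields; pass non-JSON lines through unparsed.
      - type: json_parser
        parse_from: body
        on_error: send
```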

4. Undocumented plain-text startup line breaks JSON parsers (operator docs only)

Every toolhive proxy container emits one plain-text (non-JSON) log line on startup:

Workload started successfully. Press Ctrl+C to stop.

This line is not JSON, so any log pipeline configured to parse JSON will encounter a parse error on it. The guide doesn't mention this, so users setting up log collection will hit an unexpected error on every pod start/restart with no clear explanation.

Suggested fix: Add a note such as: "Note: each proxy pod emits a single plain-text startup line (Workload started successfully. Press Ctrl+C to stop.) before structured JSON logging begins. Log pipelines should be configured to tolerate or drop non-JSON lines."
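For pipelines that prefer to drop the line rather than tolerate it, one option is a grep filter ahead of JSON parsing. A sketch, assuming Fluent Bit and that the raw message lands in the `log` key after CRI parsing:

```ini
# Drop the single plain-text startup line before JSON parsing.
[FILTER]
    Name     grep
    Match    kube.*
    Exclude  log ^Workload started successfully
```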

The root fix for this is tracked in stacklok/toolhive#4295 — once that lands, the startup line will be structured JSON and this caveat can be removed from the docs.

5. vMCP audit events use a non-standard log level (both docs)

vMCP audit events are emitted at "level":"INFO+2" — a Go log/slog artifact rather than a standard named level. This means audit events show as unknown level in Loki's detected_level field and in Grafana's Explore Logs view, and cannot be filtered by level in any standard log pipeline.

The docs do not mention this. Users who set up level-based filtering or alerting on audit events will find they are silently excluded from level-filtered views.

Suggested fix: Add a note that audit events currently appear at level INFO+2 and that level-based filters should account for this. The root fix is tracked in stacklok/toolhive#4296, which proposes logging audit events at a standard level.
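Until the root fix lands, one workaround is to remap the level in the collector. A sketch using Fluent Bit's modify filter, assuming the JSON parse stage has already promoted `level` to the top level of the record:

```ini
# Remap the non-standard slog level so level-based filters still match.
[FILTER]
    Name       modify
    Match      kube.*
    Condition  Key_value_equals level INFO+2
    Set        level info
```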

Impact

These gaps mean a user following the guide is likely to end up with a log pipeline that either silently collects nothing (wrong namespace/container), errors on startup (plain-text line), or loses audit events from level-filtered views (INFO+2 level). The structured logging feature is not usable from the docs alone without significant trial and error.

Metadata


    Labels

    bug: Something isn't working
    documentation: Improvements or additions to documentation
