
Commit b23740b

docs(self-hosting): NodeLocal DNS and ClickHouse task events (#3568)
## Summary

- Recommend deploying NodeLocal DNS and lowering `ndots` to `1` in the Kubernetes self-hosting guide.
- Recommend storing task events in ClickHouse (`EVENT_REPOSITORY_DEFAULT_STORE=clickhouse_v2`) in both the Docker and Kubernetes guides, plus a new row in the webapp env var reference.
1 parent 759214e commit b23740b

3 files changed

Lines changed: 39 additions & 1 deletion

docs/self-hosting/docker.mdx

Lines changed: 16 additions & 1 deletion
@@ -189,6 +189,7 @@ docker compose up -d
189189
To create additional worker groups beyond the bootstrap group, use the admin API endpoint. This requires admin privileges.
190190
191191
**Making a user admin:**
192+
192193
- **New users**: Set `ADMIN_EMAILS` environment variable (regex pattern) before user creation.
193194
- **Existing users**: Set `admin = true` in the `user` table in your database.
194195
@@ -341,6 +342,18 @@ By default, the images will point at the latest versioned release via the `lates
341342
TRIGGER_IMAGE_TAG=v4.0.0
342343
```
343344
345+
## Task events
346+
347+
By default, task events (timeline, logs, spans) are stored in PostgreSQL. For production deployments we recommend storing them in ClickHouse instead, since it scales to much higher volumes and avoids unbounded growth of the `TaskEvent` table.
348+
349+
To enable this, set the following for the webapp in your `.env`:
350+
351+
```bash
352+
EVENT_REPOSITORY_DEFAULT_STORE=clickhouse_v2
353+
```
354+
355+
This only affects new runs; existing runs continue to read from wherever their events were originally stored.
356+
344357
## Troubleshooting
345358
346359
- **Deployment fails at the push step.** The machine running `deploy` needs registry access. See the [registry setup](#registry-setup) section for more details.
@@ -359,7 +372,9 @@ TRIGGER_IMAGE_TAG=v4.0.0
359372
- **ClickHouse migrations say "no migrations to run" but schema is missing.** The goose migration tracker is out of sync. Exec into the webapp container, set the GOOSE env vars (from webapp startup logs), and run `goose reset && goose up`.
360373
361374
<Warning>
362-
**Data Loss Warning:** The `goose reset` command is destructive and will drop the entire schema. Make sure to backup your data and confirm you are running this in a non-production environment before executing this command.
375+
**Data Loss Warning:** The `goose reset` command is destructive and will drop the entire schema.
376+
Make sure to back up your data and confirm you are running this in a non-production environment
377+
before executing this command.
363378
</Warning>
364379
365380
## CLI usage

docs/self-hosting/env/webapp.mdx

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -123,6 +123,8 @@ mode: "wide"
123123
| `TRIGGER_OTEL_ATTRIBUTE_PER_LINK_COUNT_LIMIT` | No | 10 | OTel attribute per link count limit. |
124124
| `TRIGGER_OTEL_ATTRIBUTE_PER_EVENT_COUNT_LIMIT` | No | 10 | OTel attribute per event count limit. |
125125
| `SERVER_OTEL_SPAN_ATTRIBUTE_VALUE_LENGTH_LIMIT` | No | 8192 | OTel span attribute value length limit. |
126+
| **Task events** | | | |
127+
| `EVENT_REPOSITORY_DEFAULT_STORE` | No | postgres | Where to store task events. Set to `clickhouse_v2` to store in ClickHouse (recommended for production). |
126128
| **Realtime** | | | |
127129
| `REALTIME_STREAM_MAX_LENGTH` | No | 1000 | Realtime stream max length. |
128130
| `REALTIME_STREAM_TTL` | No | 86400 (1d) | Realtime stream TTL (s). |

docs/self-hosting/kubernetes.mdx

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -354,6 +354,27 @@ webapp:
354354
- Compatible with secret management tools (External Secrets Operator, etc.)
355355
- Follows Kubernetes security best practices
356356

357+
## DNS performance
358+
359+
For production clusters we recommend deploying [NodeLocal DNSCache](https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/). DNS queries, especially to managed Postgres or Redis endpoints, can be very slow under Kubernetes' default resolver, and a node-local cache typically gives a large improvement in latency and throughput across the cluster.
360+
361+
The default `ndots: 5` setting also means that any hostname with fewer than five dots (the case for most external database hosts) is first tried against every cluster search domain before being resolved as an absolute name. Lowering `ndots` to `1` on the webapp and supervisor pods avoids those extra round-trips.
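
As a rough illustration of the `ndots` recommendation, the fragment below shows the standard Kubernetes pod-level `dnsConfig` field with `ndots` set to `1`. This is a minimal sketch of the pod spec only; how (or whether) the Helm chart exposes this setting for the webapp and supervisor is not stated here, so the placement is an assumption to adapt to your own values or manifests.

```yaml
# Pod spec fragment (in a Deployment this sits under spec.template.spec).
# Assumption: you can set pod-level DNS options for the webapp and
# supervisor pods; the chart key for doing so is not shown in this guide.
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1" # Kubernetes expects the option value as a string
```

With this applied, `/etc/resolv.conf` inside the pod should contain `options ndots:1`, which is a quick way to confirm the setting took effect.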
362+
363+
## Task events
364+
365+
By default, task events (timeline, logs, spans) are stored in PostgreSQL. For production deployments we recommend storing them in ClickHouse instead, since it scales to much higher volumes and avoids unbounded growth of the `TaskEvent` table.
366+
367+
ClickHouse is already deployed by the chart, so no extra services are required. To enable, set `EVENT_REPOSITORY_DEFAULT_STORE` on the webapp via `extraEnvVars`:
368+
369+
```yaml
370+
webapp:
371+
  extraEnvVars:
372+
    - name: EVENT_REPOSITORY_DEFAULT_STORE
373+
      value: "clickhouse_v2"
374+
```
375+
376+
This only affects new runs; existing runs continue to read from wherever their events were originally stored.
377+
357378
## Worker token
358379

359380
When using the default bootstrap configuration, worker creation and authentication is handled automatically. The webapp generates a worker token and makes it available to the supervisor via a shared volume.
