Skip to content

bench | timeseries: JSON API, Grafana datasource plugin, ci-bench-timeseries profile#6562

Draft
Russoul wants to merge 20 commits into
masterfrom
russoul/timeseries-align-http-api-with-prometheus
Draft

bench | timeseries: JSON API, Grafana datasource plugin, ci-bench-timeseries profile#6562
Russoul wants to merge 20 commits into
masterfrom
russoul/timeseries-align-http-api-with-prometheus

Conversation

@Russoul
Copy link
Copy Markdown
Contributor

@Russoul Russoul commented May 6, 2026

Summary

Query language improvements

  • epoch parser bug fixedepoch now correctly produces Timestamp 0 (Unix epoch); previously it parsed identically to now, making epoch + Nms evaluate to ~year 2082 instead of the intended absolute timestamp
  • Metric-name awareness in the elaborator — unknown variable names now produce Undefined name: <x> errors at elaboration time instead of confusing downstream type-mismatch errors; St carries availableMetrics :: Set MetricIdentifier populated from the store
  • "Did you mean?" suggestions in Undefined-name errors — the elaborator computes Levenshtein distance (via edit-distance) to known metric names and local bindings, appending the closest match to the error message
  • Local name shadowinglet x = 1 in let x = 2 in x now correctly returns 2; previously the elaborator rejected re-use of any name
  • RangeVector/Scalar arithmetic+, -, *, / between a range vector and a scalar are now supported in both elaboration and interpretation
  • Unit and List typesUnit, Nil, and Cons constructors added to Value and the semantic Expr; metrics expression returns the metric list as a Cons-list of Text values
  • sum_over_time elab bug fixedSurface.SumOverTime was incorrectly elaborated to Semantic.AvgOverTime (copy-paste error)
  • Noncanonical arithmetic rules — four new bidirectional rules let the elaborator infer types in previously-stuck expressions:
    • Duration + Duration : ? → result is Duration (e.g. 1s + 2s now elaborates standalone)
    • ? - Duration : ? → lhs must be Timestamp (e.g. \x -> x - 1s infers x : Timestamp)
    • ? + ? : Duration → both args must be Duration (e.g. \x -> \y -> m[now; now : x + y] infers both lambda args)
    • ? - ? : Timestamp → lhs must be Timestamp, rhs must be Duration (e.g. \x -> \y -> m[x - y; x] infers arg types from the range start position)

Test suite

  • Comprehensive interpretation tests — 250 tests covering scalar arithmetic, boolean logic, timestamp/duration arithmetic, instant-vector operations, range-vector aggregations (avg_over_time, sum_over_time, rate, increase, quantile_over_time), label filtering, metrics query, shadowing, and error cases
  • Parser test suite — full coverage of the surface expression parser
  • Elaboration test suite — ~115 cases covering all elaborator paths: literals, duration literals, arithmetic (well-typed and ill-typed), comparisons, boolean operators, let/lambda, pairs/projections, to_scalar, abs/round, the metrics builtin, range vectors, instant-vector operations, and metric-name resolution (including "Did you mean?" suggestions)

Grafana datasource plugin (bench/grafana-datasource/)

  • $__from / $__to support — the plugin pre-processes queries before template substitution, rewriting $__from and $__to to epoch + Nms expressions the query language understands; dashboard panels now use [$__from; $__to] ranges driven by the Grafana time picker
  • Prometheus-compatible query APIPOST /timeseries/query now accepts { "query": "...", "time": <unix_s> } and returns { "status": "success", "data": <Value> } mirroring Prometheus conventions; errors return { "status": "error", "errorType": "...", "error": "..." }
  • Error handling — server-side errors are surfaced as Grafana DataQueryError with the first line as the panel message and the full elaboration/interpretation trace in the inspector
  • Provisioned RTView dashboard — 25 panels covering Resources, Blockchain, Leadership, and Transactions sections, all pre-wired to the datasource
  • Metric name sanitisation. and - in EKG metric names are replaced with _ so names match the cardano_node_metrics_* convention used in queries

ci-bench-timeseries workbench profile

  • Genesis funds fixed — the original funds_balance (10M ADA) was insufficient for the tx-generator's fund-splitting phase; a new fundsTimeseries vocabulary sets 22.5B ADA; tx_count reduced to 108,000 (2-hour run at 15 tps) so the required split fits comfortably within available funds
  • Profile definition — 2-node loopback cluster, Conway era, tracer.timeseries = true, tracer.rtview = true, no shutdown condition

Russoul added 20 commits May 7, 2026 00:16
…eseries profile

**cardano-tracer / cardano-timeseries-io**
- Add `Cardano.Timeseries.JSON` with `ToJSON` instances for `Value`, `Instant`,
  `Timeseries` and `SeriesIdentifier` (orphan module, imported for side-effects)
- Switch `POST /timeseries/query` response from plain text to `application/json`

**cardano-profile**
- Add `timeseries :: Bool` field to `Tracer`
- Add `tracerTimeseries` primitive
- Add `ci-bench-timeseries` profile: 2-node local cluster with timeseries endpoint
  enabled, no shutdown condition, generator runs for 100 000 epochs (effectively
  indefinite) — intended for interactive exploration via Grafana
- Regenerate `all-profiles-coay.json` and `wb_profiles.mk`; update all test fixtures

**Nix**
- `cardano-tracer-service-workbench.nix`: add `timeseries` option → `hasTimeseries`
- `cardano-tracer-service.nix`: add `timeseriesEnable/Host/Port` NixOS options
- `tracer.nix`: wire `profile.tracer.timeseries` → `{epHost, epPort = 3400}`
- `supervisor.sh`: replace hanging `netstat -pltn` with `lsof -nP -iTCP:9001
  -sTCP:LISTEN` for macOS compatibility

**bench/grafana-datasource** (new)
- TypeScript Grafana datasource plugin (`iog-cardanotimeseries-datasource`)
- Sends `POST /timeseries/query` via Grafana server-side proxy (avoids CORS)
- Converts `Value` tagged-union JSON to Grafana `DataFrame[]`
- `docker-compose.yaml` for local development; datasource auto-provisioned at
  `http://host.docker.internal:3400`; Colima-compatible (`extra_hosts`)
- Parse/eval errors from the server are now surfaced as DataQueryError:
  the banner shows the first line (summary); the full multi-line message
  (source location + caret) is in data.message for the inspector
- Requests go via Grafana's server-side proxy (instanceSettings.url) to
  avoid CORS and host-resolution issues in the browser
- Add Unit value: renders as an empty frame (no data to display)
- POST /timeseries/query now accepts JSON body {"query": "...", "time": <optional Unix seconds>}
- Success response wrapped in {"status":"success","data":...} envelope
- Error responses use {"status":"error","errorType":"...","error":"..."} format
- errorType mirrors Prometheus: "parse", "bad_data", "execution"
- Plugin updated to send application/json body and unwrap response envelope
- Remove ConfigEditor.tsx (Grafana's built-in URL field suffices)
Introduces Nil and Cons constructors to both the expression language and
the value domain. The primary motivation is to give the built-in Metrics
expression a well-typed return value: it now returns a proper linked list
of Text values (metric names) rather than a right-nested Pair tuple
terminated by Unit, making it possible to assign it the type List Text in
the elaborator.

- Interp: Metrics now builds a Nil/Cons spine instead of a Pair/Unit tuple
- JSON: ToJSON instances for Nil and Cons (tag/head/tail encoding)
- Show: show Nil = "[]", show (Cons h t) = "[h|t]"
- Resolve: structural pass-throughs for Nil and Cons
- Grafana plugin: Nil/Cons variants in the TypeScript Value union;
  Cons case walks the spine into a string table frame, Text items
  render their actual string value
…ew in ci-bench-timeseries

- Add provisioned Grafana dashboard mimicking RTView's four sections:
  Resources, Blockchain, Leadership, Transactions (26 panels, row-based
  collapsible sections, byte units for memory/mempool panels)
- Pin datasource uid in provisioning so dashboard references are stable
- Mount dashboards directory in docker-compose
- Enable RTView (hasRTView, port 3300) in ci-bench-timeseries profile
  alongside the existing timeseries endpoint
… inconsistencies

Implements pointwise arithmetic (add, sub, mul, div) between RangeVector Scalar
and Scalar, following the existing InstantVector/Scalar pattern. This enables
queries like 'metric[now - 1h; now] / 1048576' in Grafana panels.

Fixes ten inconsistencies found in typing.txt: missing List type in grammar,
filter_by_label argument order, A->B type formation rule conclusion, := vs ≔
notation, 'Syntax hugar' typo, map variable name mismatch, RangeVector missing
type parameter in quantile_over_time, four missing instant_vector_scalar relation
rules, mul_instant_vector_scalar missing arguments, and stale metrics return type
(Text -> List Text).

Adds a precondition note to 'interp' that the input expression must be well-typed.
…mar docs

Add a cardano-timeseries-test suite (tasty/tasty-hunit) mirroring the
cardano-recon-framework layout, with three suites:
- Elab.Expr.Parser.Suite: comprehensive parser coverage across all constructs,
  operator precedence checks, and wrong-arity error cases (152 tests total)
- Elab.Suite: well-typed / ill-typed elaboration checks
- Interp.Suite: end-to-end execution via API.execute against an empty store

Fix parser bug: the `let` RHS was parsed as `exprOr` instead of `exprUniverse`,
preventing unparenthesised lambdas and nested lets from appearing as let-RHS.
The `in` keyword acts as a natural terminator so no ambiguity arises.

Update grammar documentation:
- Elab/Expr.hs inline grammar: `let x = t{> universe}` -> `t{≥ universe}`
- docs/elab.txt: fix `<int>min` -> `<int>m`; add missing `round t`,
  `earliest x`, `latest x`
… elab bug

The test suite covers scalar/bool arithmetic and comparisons, duration and
timestamp literals, type conversions (to_scalar, abs, round), pairs, let/lambda,
InstantVector lookup/aggregation/filter/map/label-filter/unless/join,
InstantVector-Scalar arithmetic and relations, RangeVector aggregations
(avg_over_time, sum_over_time, rate, increase, quantile_over_time), and
metrics.

Writing the tests surfaced a bug in Elab.hs where Surface.SumOverTime was
elaborated to Semantic.AvgOverTime instead of Semantic.SumOverTime, causing
sum_over_time queries to silently return the mean.  Fixed.
Drop the checkFresh guard so duplicate names are no longer rejected.
Change variable lookup to search the context right-to-left (Seq.reverse)
so that the innermost binding wins, as scoping requires.
St now carries availableMetrics :: Set MetricIdentifier, populated at
the call site via metrics store (or Set.empty in metric-free contexts
like the elab test suite).

The fallback Variable case — previously an unconditional metric assumption
— now checks membership in availableMetrics and throws "Undefined name: <v>"
when the name appears neither in the local context nor in the store,
replacing the confusing downstream type-mismatch error that occurred before.
…tamp 0

The `epoch` keyword was mapped to `Now` in the parser, making it
identical to `now` (i.e. current server time).  It now produces
`Epoch`, which the interpreter evaluates to `Timestamp 0` (Unix
epoch), so expressions like `epoch + 1778499300385ms` yield the
intended absolute timestamp rather than ~year 2082.
The ci-bench-timeseries profile has ~900M transactions (100 000 epochs ×
~9k txs/epoch) but inherited fundsDefault (10 000 ADA) from base, which
covers only a handful of transactions before the generator exits with
"insufficient funds".

Add fundsTimeseries (25 000 000 ADA = 25 × 10^15 lovelace) to Vocabulary
and a baseTimeseries variant of base that uses it. Switch ciTimeseries02Value
to baseTimeseries so the ci-bench-timeseries profile has enough genesis funds
to sustain a long-running workload.
100 000 epochs needed ~23B ADA in the generator wallet (split grows
exponentially with tx_count via unfoldSplitSequence) but genesis could
only supply 22.5B, causing an immediate "insufficient funds" crash.

12 epochs at 15 tps / timescaleCompressed = ~108 000 transactions = 2 hours.
The required split is now ~2.3M ADA, well within the 22.5B available.
When a variable is not found in either the local context or the metric
store, the elaborator now appends a "Did you mean: ..." hint by ranking
all candidate names (locals + metrics) by Levenshtein distance and
showing the closest ones (up to 5) within threshold max(1, len/3).

Uses the `edit-distance` library (the canonical Haskell implementation,
also used by GHC for its own "did you mean" diagnostics).
Replace hardcoded [now - 1h; now] in all 25 dashboard panels with
[$__from; $__to] so the Grafana time picker controls the window.

Pre-process $__from/$__to in the datasource plugin before getTemplateSrv()
sees them — Grafana treats these as built-in variables and ignores
scopedVars overrides — converting them to epoch + Nms expressions that
the query language interprets as absolute timestamps.
Replaces the 7-test skeleton with a comprehensive suite covering all
elaborator code paths: literals, duration literals, arithmetic
(well-typed and ill-typed), comparisons, boolean operators, let/lambda,
pairs/projections, to_scalar, abs/round, metrics builtin, range
vectors, instant-vector operations, and metric-name resolution (known
metric, undefined name, Levenshtein suggestions).

Notes two elaborator bidirectionality constraints discovered via
test failures:
- `Duration + Duration` requires the result type to be driven by
  a containing expression (tested via `now + (1s + 2s)`).
- `\x -> x + 1` leaves `x`'s type unconstrained; replaced with
  `\x -> now + x` which forces `x : Duration` via the Timestamp+?
  noncanonical rule.
When both operands are known to be Duration but the result type is
still a hole, force the hole to Duration and re-queue as the canonical
Duration + Duration : Duration problem. This allows standalone
expressions like `1s + 2s` to elaborate without needing an outer
context to drive the result type.
All three rules fire only when the known type information uniquely
determines the outcome:

  ? - Duration : ?  ->  Timestamp - Duration : Timestamp
    Only Timestamp - Duration exists with Duration on the Sub rhs.
    Enables: \x -> x - 1s  (infer x : Timestamp)

  ? + ? : Duration  ->  Duration + Duration : Duration
    Only Duration + Duration produces Duration.
    Enables: \x -> \y -> m [now; now : x + y]  (infer both : Duration)

  ? - ? : Timestamp  ->  Timestamp - Duration : Timestamp
    Only Timestamp - Duration produces Timestamp via Sub.
    Enables: \x -> \y -> m [x - y; x]  (infer x : Timestamp, y : Duration)

Ordering: A (? - Duration) before C (? - ? : Timestamp) so the more
specific rhsTy=Duration match takes priority. B (? + ? : Duration)
after the existing Duration + Duration : ? rule so the known-both-sides
case is tried first.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant