
[bot] Merge master/cd91b312 into rel/dev#1590

Merged
yenkins-admin merged 10 commits into rel/dev from snapshot-master-cd91b312-to-rel/dev
May 7, 2026

Conversation

@yenkins-admin
Contributor

🚀 Automated PR to perform merge from master into rel/dev with changes up to cd91b31 (created by https://github.com/gooddata/gooddata-python-sdk/actions/runs/25497696893).

jaceksan and others added 10 commits May 7, 2026 14:11
Regenerated `gooddata-api-client` from
`https://staging.dev-latest.stg11.panther.intgdc.com` to pick up the new
HLL-related backend surfaces:

- `DeclarativeSetting.type = HLL_TYPE` (org setting `hyperLogLogType`,
  values `Native` / `Presto`)
- `DeclarativeDataset.type = AUXILIARY` with associated constraint docs
- `CreatePipeTableRequest.column_expressions` (enables native HLL via
  `hll_hash()` at PIPE INSERT)
- `GET /api/v1/ailake/object-storages` →
  `AILakeDatabasesApi.list_ai_lake_object_storages`

`Makefile` and a new `scripts/postprocess_api_client.py` harden the
`_api-client-generate` target against two reproducible bugs in
`openapi-generator-cli:v6.6.0` that surfaced on this regen:

1. `recursiveGetDiscriminator` infinite recursion (StackOverflowError) on
   the new `DashboardCompoundConditionItem` ↔ children `oneOf`/`allOf`
   cycle. Worked around with an inline `jq` step that drops the redundant
   `allOf: [{$ref: parent}]` from each child (parent has no own
   properties, so this is semantically a no-op).
2. Generator mangling regex patterns of the form `^[^\x00]*$` — sometimes
   dropping the NUL (leaving the invalid `^[^]*$`), sometimes embedding a
   literal NUL byte that makes the Python source un-importable. The new
   helper script handles both shapes.
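A minimal sketch of what such a postprocess repair could look like. The helper name and the exact replacement strategy are assumptions, not the contents of the real `scripts/postprocess_api_client.py`; only the two mangled shapes themselves come from the description above:

```python
# Sketch of a postprocess step repairing the generator's two mangled
# forms of the spec pattern ^[^\x00]*$ (hypothetical helper, not the
# real scripts/postprocess_api_client.py).

def fix_nul_regex_mangling(source: str) -> str:
    # Shape 2: a literal NUL byte was embedded where the escape belonged,
    # which makes the generated Python source un-importable.
    source = source.replace("^[^\x00]*$", "^[^\\x00]*$")
    # Shape 1: the NUL was dropped entirely, leaving the invalid
    # empty character class ^[^]*$.
    source = source.replace("^[^]*$", "^[^\\x00]*$")
    return source
```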

Also replaced the previous `sed '/\u0000/d'` step (which was a no-op — sed
BRE/ERE doesn't interpret `\uNNNN`) with `tr -d '\000'` so any literal NUL
bytes that jq decoded from `\u0000` escapes in the source spec are
actually stripped.

Generation override:
`make api-client BASE_URL=https://staging.dev-latest.stg11.panther.intgdc.com`

JIRA: CQ-2320
risk: low
11.35.0a2 carries the WASM round-trip support for aggregate-aware LDMs:
AUXILIARY datasets, pre-aggregation `aggregated_facts`, synthesized
dim datasets backed by `sql:`, and `APPROXIMATE_COUNT` aggregations
whose `assigned_to` resolves to an attribute target (HLL synopses).

risk: low
Adopt the polymorphic source-reference type from gdc-nas CQ-2147
("enable agg fact based on attribute", commit `c5601cce`), which
replaced `FactIdentifier` with `SourceReferenceIdentifier` so that
`aggregatedFacts[].sourceFactReference.reference` can target either a
fact or an attribute. The attribute path is required for HLL
APPROXIMATE_COUNT, whose count target is an attribute on an AUX dataset.
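For illustration, the two wire shapes this enables. Only the key path `aggregatedFacts[].sourceFactReference.reference` and the lowercase `type` values come from this change; the ids and the sibling `aggregation` key are made-up placeholders:

```python
# Hypothetical aggregatedFacts entries; ids and the "aggregation" key
# are illustrative, the reference key path and type values are real.

sum_of_fact = {
    "id": "revenue_sum",  # illustrative
    "sourceFactReference": {
        "reference": {"id": "fact.orders.revenue", "type": "fact"},
    },
    "aggregation": "SUM",
}

approx_count_of_attribute = {
    "id": "visitors_hll",  # illustrative
    "sourceFactReference": {
        # HLL path: the reference targets an attribute on an AUX dataset.
        "reference": {"id": "attr.visitors.id", "type": "attribute"},
    },
    "aggregation": "APPROXIMATE_COUNT",
}
```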

Changes:
- `catalog/identifier.py`: `CatalogFactIdentifier.client_class` now
  returns `SourceReferenceIdentifier`. SDK class name kept for
  back-compat.
- `catalog/workspace/.../dataset.py`:
  `CatalogDeclarativeSourceFactReference.client_class` now returns
  `DeclarativeSourceReference`. SDK class name and the wrapping
  field name (`source_fact_reference`) kept for back-compat — the
  backend itself kept the JSON key `sourceFactReference`.

Knock-on visible to consumers: the new identifier's `type` enum uses
lowercase string values (`"fact"`, `"attribute"`) instead of the
previous uppercase. Not surfaced in release notes — agg-aware datasets
were not exercised by Python-SDK consumers prior to this delivery.

Verified: `pytest packages/gooddata-sdk/tests/` (excluding cassette-
backed `tests/catalog/store/`) → 413 passed, 2 skipped.

JIRA: CQ-2320
risk: low
Adds first-class support for AUXILIARY datasets — synthetic datasets
that pre-aggregation tables target as HLL/aggregate reference attributes
(the keystone of the aggregate-aware design). The api-client schema
distinguishes NORMAL vs AUXILIARY only by the `type` discriminator, so
the SDK does the same: a single `CatalogDeclarativeDataset` carries an
optional `type: Literal["NORMAL", "AUXILIARY"]` field, and AUX-only
omissions (no `dataSourceTableId`, no `sql`, no `aggregatedFacts`,
no `precedence`, no physical `source_column` on attributes/facts/labels)
are encoded by relaxing those fields to optional. The platform
validator enforces the type-specific exclusions; we don't duplicate
that here.
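The `type`-shadowing gotcha mentioned below can be reproduced with a stdlib-only sketch (class name illustrative, not the real SDK model):

```python
import builtins
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class Dataset:
    id: str
    # This field binds the name `type` inside the class body (to the
    # default, None), so a bare `type[...]` annotation there would
    # subscript None. Hence the qualified builtins.type below.
    type: Optional[Literal["NORMAL", "AUXILIARY"]] = None

    @staticmethod
    def client_class() -> builtins.type["Dataset"]:
        return Dataset
```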

Knock-on changes folded in:

- `client_class()` uses `builtins.type[DeclarativeDataset]`, mirroring
  `setting.py`/`identifier.py`. The new `type` field on the dataclass
  shadows the builtin `type` inside the class body, which trips ty.
- `gooddata_dbt/dbt/metrics.py` guards `source_column.lower()` calls so
  AUX entries (no physical column) are skipped instead of crashing.

A future TODO documents Option 3 — typed factory constructors
(`CatalogDeclarativeDataset.normal(...)` / `.auxiliary(...)`) for safer
construction without splitting into two classes.

risk: low
Wrap the AI Lake long-running-operation surface with a typed
`CatalogAILakeService` exposing the three methods consumers need to
trigger and wait on `ANALYZE TABLE` after registering aggregate-aware
LDM shapes:

- `analyze_statistics(instance_id, table_names=None, operation_id=None)`
  → `str`: posts the request and returns the operation ID. The caller
  can pre-supply a UUID; otherwise the service generates one and seeds
  it as the request `operation-id` header so the polling handle is
  known up front (the endpoint returns `Unit` body + the id in the
  response header).
- `get_operation(operation_id)` → `CatalogAILakeOperation`: typed
  handle with `id`, `kind`, `status` (Literal "pending"/"succeeded"/
  "failed" — these are the discriminator values of the OpenAPI
  `Operation` oneOf), and optional `result`/`error` payloads. Convenience
  predicates `is_terminal`, `is_succeeded`, `is_failed`.
- `wait_for_operation(operation_id, timeout_s=300, poll_s=2)`: blocks
  until terminal, raises `CatalogAILakeOperationError` on `failed` and
  `TimeoutError` when the deadline elapses.

Wiring:
- `GoodDataApiClient.ai_lake_api` property now exposes `apis.AILakeApi`
  for callers who need raw access to surfaces this service does not
  yet wrap (database provisioning, pipe-table CRUD, service commands —
  follow-up tickets).
- `GoodDataSdk.catalog_ai_lake` property surfaces the new service.
- Public re-exports: `CatalogAILakeOperation`, `CatalogAILakeOperationError`,
  `CatalogAILakeService`.
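A stdlib-only sketch of the wait semantics described above, with the service swapped for a plain status callback. Everything except the status strings and the timeout/poll parameters is illustrative:

```python
import time
from typing import Callable

class OperationError(RuntimeError):
    """Stand-in for CatalogAILakeOperationError in this sketch."""

def wait_for_operation(
    get_status: Callable[[str], str],
    operation_id: str,
    timeout_s: float = 300,
    poll_s: float = 2,
) -> str:
    # Poll until terminal, mirroring the contract: raise on "failed",
    # TimeoutError once the deadline elapses, return on "succeeded".
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(operation_id)
        if status == "succeeded":
            return status
        if status == "failed":
            raise OperationError(f"operation {operation_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} still {status}")
        time.sleep(poll_s)
```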

Tests (8 tests, all unit-mocked — no live stack needed):
- analyze_statistics seeds caller-supplied UUID, generates one when
  omitted, normalizes empty `table_names`
- get_operation handles success and failure shapes
- wait_for_operation polls until succeeded, raises on failed,
  raises TimeoutError when never terminal

Verified: full SDK unit suite — 425 passed, 2 skipped.

JIRA: CQ-2320
risk: low
Add a regression-guard for `CatalogWorkspaceContentService.put_declarative_ldm`
covering aggregate-aware LDM shapes — extends the round-trip
verification to the next link in the chain (the call to
`_layout_api.set_logical_model`).

The historical risk is silent field loss between the SDK class and the
api-client model: an `attrs` default collapsing an explicit `[]` into
"missing", or a missing field in `attribute_map` after a regen.
The three tests pin down what must reach the wire:

- AUXILIARY dataset round-trips with its synthetic identity attribute
  and references to dim datasets, never gaining `precedence`/`sql`/
  `dataSourceTableId` on the way out.
- Pre-aggregation dataset keeps `precedence`, `dataSourceTableId`, and
  both flavors of `aggregatedFacts.sourceFactReference` — SUM-of-fact
  AND APPROXIMATE_COUNT-of-attribute (the HLL load-bearing shape).
- Synthesized dim keeps its `sql.statement` and the attribute's
  `sourceColumn` mapping.
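The default-collapsing pitfall behind these tests can be demonstrated with a stdlib stand-in (names illustrative, not the real `attrs` models):

```python
from dataclasses import asdict, dataclass, field

@dataclass
class Dataset:
    id: str
    aggregated_facts: list = field(default_factory=list)

def to_api_dict(ds: Dataset) -> dict:
    # A naive "drop falsy values" serializer silently loses an explicit
    # empty list on the way to the wire -- the regression pinned above.
    return {k: v for k, v in asdict(ds).items() if v}
```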

The fixture is shared with the prior round-trip test, so the two stay
aligned on what an agg-aware shape looks like.

No SDK code change needed — `put_declarative_ldm` already handles these
shapes correctly thanks to the recent dataset-class additions; this
test ensures it stays that way.

JIRA: CQ-2320
risk: low
YAML ↔ declarative round-trip tests for the three new aggregate-aware
shapes the WASM convertor handles end-to-end on 11.35.0a2:

- AUXILIARY datasets (no physical mapping; synthetic identity attrs).
- NORMAL pre-aggregation datasets with `aggregated_facts` — both the
  vanilla SUM-of-fact path and APPROXIMATE_COUNT-of-attribute (HLL
  synopses targeting the AUX identity attribute, requires
  `reference.type == "attribute"` per gdc-nas CQ-2147).
- NORMAL synthesized dim datasets backed by a `sql:` block.

These guard the SDK side of the AAC convertor pipeline; the heavy
lifting lives in `gooddata-code-convertors`.

risk: low
Add typed `set_hll_type` / `get_hll_type` methods on
`CatalogOrganizationService` for the new `hyperLogLogType` org setting
introduced by gdc-nas (`HLL_TYPE("hyperLogLogType",
SettingConfiguration.HyperLogLogType)`).

The setting controls which HLL function family calcique uses when
generating SQL over HLL synopses:

- `"Native"` (default) emits StarRocks-native `HLL_*` functions —
  works when the platform (or PIPE pipeline) builds synopses
  StarRocks-side.
- `"Presto"` emits Presto-compatible HLL functions, required when
  synopses arrive from an upstream Presto pipeline (the binary layout
  and hash family differ between the two). Requires the StarRocks
  deployment to carry the Presto HLL UDFs.

Surface:
- `HLLType` Literal alias (`"Native" | "Presto"`) re-exported from
  `gooddata_sdk` for consumer annotations.
- `HLL_TYPE_SETTING_ID` (`"hyperLogLogType"`) and
  `HLL_TYPE_SETTING_TYPE` (`"HLL_TYPE"`) constants for callers that
  prefer to drive the generic `CatalogOrganizationSetting.init(...)`
  path directly.
- `service.set_hll_type(value)` is idempotent — tries update first,
  falls back to create when the setting doesn't exist yet.
- `service.get_hll_type()` returns `HLLType | None`; defensively
  returns `None` for unrecognized stored values.
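A minimal sketch of the defensive read; the helper name is illustrative, while the constants, alias, and values come from this commit:

```python
from typing import Literal, Optional

HLLType = Literal["Native", "Presto"]
HLL_TYPE_SETTING_ID = "hyperLogLogType"
HLL_TYPE_SETTING_TYPE = "HLL_TYPE"

def parse_hll_type(stored: Optional[str]) -> Optional[HLLType]:
    # Mirrors get_hll_type(): unrecognized stored values map to None
    # instead of leaking through the Literal annotation.
    if stored in ("Native", "Presto"):
        return stored  # type: ignore[return-value]
    return None
```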

Tests: 7 unit tests with mocked `entities_api` — no live stack needed.
Cover create-on-missing, update-on-existing, both `"Native"` and
`"Presto"` reads, absent-setting case, and the public `HLLType` alias.

Verified: full SDK unit suite — 438 passed, 2 skipped, 1 xfailed.

JIRA: CQ-2320
risk: low
Public docs for the new surfaces this branch introduces:

- `data/ai-lake/`: new section with `_index.md` and method pages for
  `analyze_statistics`, `get_operation`, and `wait_for_operation`.
  Documents the long-running-operation contract callers see when they
  drive AI Lake actions from the SDK (today: ANALYZE TABLE for
  pre-aggregation tables; the surface will grow as more actions are
  wrapped in typed helpers).
- `administration/organization/set_hll_type.md` and
  `get_hll_type.md`, with the organization `_index.md` sidebar updated
  to link them.
- AFM compute-model smoke tests for `APPROXIMATE_COUNT` aggregation in
  `SimpleMetric`.

risk: low
feat(gooddata-sdk): deliver HLL / aggregate-aware LDM surfaces
@yenkins-admin yenkins-admin merged commit 65eb238 into rel/dev May 7, 2026
1 of 2 checks passed
@yenkins-admin yenkins-admin deleted the snapshot-master-cd91b312-to-rel/dev branch May 7, 2026 13:07
@codecov

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.95%. Comparing base (efc6b2a) to head (cd91b31).
⚠️ Report is 503 commits behind head on rel/dev.

Additional details and impacted files
@@             Coverage Diff             @@
##           rel/dev    #1590      +/-   ##
===========================================
+ Coverage    78.83%   78.95%   +0.12%     
===========================================
  Files          230      231       +1     
  Lines        15486    15573      +87     
===========================================
+ Hits         12208    12296      +88     
+ Misses        3278     3277       -1     

