
[bot] Merge master/cd91b312 into rel/dev#1590

Merged
yenkins-admin merged 10 commits into rel/dev from snapshot-master-cd91b312-to-rel/dev
May 7, 2026

Conversation

@yenkins-admin
Contributor

🚀 Automated PR to perform merge from master into rel/dev with changes up to cd91b31 (created by https://github.com/gooddata/gooddata-python-sdk/actions/runs/25497696893).

jaceksan and others added 10 commits May 7, 2026 14:11
Regenerated `gooddata-api-client` from
`https://staging.dev-latest.stg11.panther.intgdc.com` to pick up the new
HLL-related backend surfaces:

- `DeclarativeSetting.type = HLL_TYPE` (org setting `hyperLogLogType`,
  values `Native` / `Presto`)
- `DeclarativeDataset.type = AUXILIARY` with associated constraint docs
- `CreatePipeTableRequest.column_expressions` (enables native HLL via
  `hll_hash()` at PIPE INSERT)
- `GET /api/v1/ailake/object-storages` →
  `AILakeDatabasesApi.list_ai_lake_object_storages`

`Makefile` and a new `scripts/postprocess_api_client.py` harden the
`_api-client-generate` target against two reproducible bugs in
`openapi-generator-cli:v6.6.0` that surfaced on this regen:

1. `recursiveGetDiscriminator` infinite recursion (StackOverflowError) on
   the new `DashboardCompoundConditionItem` ↔ children `oneOf`/`allOf`
   cycle. Worked around with an inline `jq` step that drops the redundant
   `allOf: [{$ref: parent}]` from each child (parent has no own
   properties, so this is semantically a no-op).
2. Generator mangling regex patterns of the form `^[^\x00]*$` — sometimes
   dropping the NUL (leaving the invalid `^[^]*$`), sometimes embedding a
   literal NUL byte that makes the Python source un-importable. The new
   helper script handles both shapes.
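A minimal sketch of what such a postprocess repair could look like. The helper name and the exact replacement strategy are assumptions, not the contents of the real `scripts/postprocess_api_client.py`; only the two mangled shapes themselves come from the description above:

```python
# Sketch of a postprocess step repairing the generator's two mangled
# forms of the spec pattern ^[^\x00]*$ (hypothetical helper, not the
# real scripts/postprocess_api_client.py).

def fix_nul_regex_mangling(source: str) -> str:
    # Shape 2: a literal NUL byte was embedded where the escape belonged,
    # which makes the generated Python source un-importable.
    source = source.replace("^[^\x00]*$", "^[^\\x00]*$")
    # Shape 1: the NUL was dropped entirely, leaving the invalid
    # empty character class ^[^]*$.
    source = source.replace("^[^]*$", "^[^\\x00]*$")
    return source
```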

Also replaced the previous `sed '/\u0000/d'` step (which was a no-op — sed
BRE/ERE doesn't interpret `\uNNNN`) with `tr -d '\000'` so any literal NUL
bytes that jq decoded from `\u0000` escapes in the source spec are
actually stripped.

Generation override:
`make api-client BASE_URL=https://staging.dev-latest.stg11.panther.intgdc.com`

JIRA: CQ-2320
risk: low
11.35.0a2 carries the WASM round-trip support for aggregate-aware LDMs:
AUXILIARY datasets, pre-aggregation `aggregated_facts`, synthesized
dim datasets backed by `sql:`, and `APPROXIMATE_COUNT` aggregations
whose `assigned_to` resolves to an attribute target (HLL synopses).

risk: low
Adopt the polymorphic source-reference type from gdc-nas CQ-2147
("enable agg fact based on attribute", commit `c5601cce`), which
replaced `FactIdentifier` with `SourceReferenceIdentifier` so that
`aggregatedFacts[].sourceFactReference.reference` can target either a
fact or an attribute. The attribute path is required for HLL
APPROXIMATE_COUNT, whose count target is an attribute on an AUX dataset.
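For illustration, the two wire shapes this enables. Only the key path `aggregatedFacts[].sourceFactReference.reference` and the lowercase `type` values come from this change; the ids and the sibling `aggregation` key are made-up placeholders:

```python
# Hypothetical aggregatedFacts entries; ids and the "aggregation" key
# are illustrative, the reference key path and type values are real.

sum_of_fact = {
    "id": "revenue_sum",  # illustrative
    "sourceFactReference": {
        "reference": {"id": "fact.orders.revenue", "type": "fact"},
    },
    "aggregation": "SUM",
}

approx_count_of_attribute = {
    "id": "visitors_hll",  # illustrative
    "sourceFactReference": {
        # HLL path: the reference targets an attribute on an AUX dataset.
        "reference": {"id": "attr.visitors.id", "type": "attribute"},
    },
    "aggregation": "APPROXIMATE_COUNT",
}
```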

Changes:
- `catalog/identifier.py`: `CatalogFactIdentifier.client_class` now
  returns `SourceReferenceIdentifier`. SDK class name kept for
  back-compat.
- `catalog/workspace/.../dataset.py`:
  `CatalogDeclarativeSourceFactReference.client_class` now returns
  `DeclarativeSourceReference`. SDK class name and the wrapping
  field name (`source_fact_reference`) kept for back-compat — the
  backend itself kept the JSON key `sourceFactReference`.

Knock-on visible to consumers: the new identifier's `type` enum uses
lowercase string values (`"fact"`, `"attribute"`) instead of the
previous uppercase. Not surfaced in release notes — agg-aware datasets
were not exercised by Python-SDK consumers prior to this delivery.

Verified: `pytest packages/gooddata-sdk/tests/` (excluding cassette-
backed `tests/catalog/store/`) → 413 passed, 2 skipped.

JIRA: CQ-2320
risk: low
Adds first-class support for AUXILIARY datasets — synthetic datasets
that pre-aggregation tables target as HLL/aggregate reference attributes
(the keystone of the aggregate-aware design). The api-client schema
distinguishes NORMAL vs AUXILIARY only by the `type` discriminator, so
the SDK does the same: a single `CatalogDeclarativeDataset` carries an
optional `type: Literal["NORMAL", "AUXILIARY"]` field, and AUX-only
omissions (no `dataSourceTableId`, no `sql`, no `aggregatedFacts`,
no `precedence`, no physical `source_column` on attributes/facts/labels)
are encoded by relaxing those fields to optional. The platform
validator enforces the type-specific exclusions; we don't duplicate
that here.
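The `type`-shadowing gotcha mentioned below can be reproduced with a stdlib-only sketch (class name illustrative, not the real SDK model):

```python
import builtins
from dataclasses import dataclass
from typing import Literal, Optional

@dataclass
class Dataset:
    id: str
    # This field binds the name `type` inside the class body (to the
    # default, None), so a bare `type[...]` annotation there would
    # subscript None. Hence the qualified builtins.type below.
    type: Optional[Literal["NORMAL", "AUXILIARY"]] = None

    @staticmethod
    def client_class() -> builtins.type["Dataset"]:
        return Dataset
```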

Knock-on changes folded in:

- `client_class()` uses `builtins.type[DeclarativeDataset]`, mirroring
  `setting.py`/`identifier.py`. The new `type` field on the dataclass
  shadows the builtin `type` inside the class body, which trips ty.
- `gooddata_dbt/dbt/metrics.py` guards `source_column.lower()` calls so
  AUX entries (no physical column) are skipped instead of crashing.

A future TODO documents Option 3 — typed factory constructors
(`CatalogDeclarativeDataset.normal(...)` / `.auxiliary(...)`) for safer
construction without splitting into two classes.

risk: low
Wrap the AI Lake long-running-operation surface with a typed
`CatalogAILakeService` exposing the three methods consumers need to
trigger and wait on `ANALYZE TABLE` after registering aggregate-aware
LDM shapes:

- `analyze_statistics(instance_id, table_names=None, operation_id=None)`
  → `str`: posts the request and returns the operation ID. The caller
  can pre-supply a UUID; otherwise the service generates one and seeds
  it as the request `operation-id` header so the polling handle is
  known up front (the endpoint returns `Unit` body + the id in the
  response header).
- `get_operation(operation_id)` → `CatalogAILakeOperation`: typed
  handle with `id`, `kind`, `status` (Literal "pending"/"succeeded"/
  "failed" — these are the discriminator values of the OpenAPI
  `Operation` oneOf), and optional `result`/`error` payloads. Convenience
  predicates `is_terminal`, `is_succeeded`, `is_failed`.
- `wait_for_operation(operation_id, timeout_s=300, poll_s=2)`: blocks
  until terminal, raises `CatalogAILakeOperationError` on `failed` and
  `TimeoutError` when the deadline elapses.

Wiring:
- `GoodDataApiClient.ai_lake_api` property now exposes `apis.AILakeApi`
  for callers who need raw access to surfaces this service does not
  yet wrap (database provisioning, pipe-table CRUD, service commands —
  follow-up tickets).
- `GoodDataSdk.catalog_ai_lake` property surfaces the new service.
- Public re-exports: `CatalogAILakeOperation`, `CatalogAILakeOperationError`,
  `CatalogAILakeService`.
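A stdlib-only sketch of the wait semantics described above, with the service swapped for a plain status callback. Everything except the status strings and the timeout/poll parameters is illustrative:

```python
import time
from typing import Callable

class OperationError(RuntimeError):
    """Stand-in for CatalogAILakeOperationError in this sketch."""

def wait_for_operation(
    get_status: Callable[[str], str],
    operation_id: str,
    timeout_s: float = 300,
    poll_s: float = 2,
) -> str:
    # Poll until terminal, mirroring the contract: raise on "failed",
    # TimeoutError once the deadline elapses, return on "succeeded".
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(operation_id)
        if status == "succeeded":
            return status
        if status == "failed":
            raise OperationError(f"operation {operation_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {operation_id} still {status}")
        time.sleep(poll_s)
```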

Tests (8 tests, all unit-mocked — no live stack needed):
- analyze_statistics seeds caller-supplied UUID, generates one when
  omitted, normalizes empty `table_names`
- get_operation handles success and failure shapes
- wait_for_operation polls until succeeded, raises on failed,
  raises TimeoutError when never terminal

Verified: full SDK unit suite — 425 passed, 2 skipped.

JIRA: CQ-2320
risk: low
Add a regression-guard for `CatalogWorkspaceContentService.put_declarative_ldm`
covering aggregate-aware LDM shapes — extends the round-trip
verification to the next link in the chain (the call to
`_layout_api.set_logical_model`).

The historical risk is silent field loss between the SDK class and the
api-client model: an `attrs` default collapsing an explicit `[]` into
"missing", or a missing field in `attribute_map` after a regen.
The three tests pin down what must reach the wire:

- AUXILIARY dataset round-trips with its synthetic identity attribute
  and references to dim datasets, never gaining `precedence`/`sql`/
  `dataSourceTableId` on the way out.
- Pre-aggregation dataset keeps `precedence`, `dataSourceTableId`, and
  both flavors of `aggregatedFacts.sourceFactReference` — SUM-of-fact
  AND APPROXIMATE_COUNT-of-attribute (the HLL load-bearing shape).
- Synthesized dim keeps its `sql.statement` and the attribute's
  `sourceColumn` mapping.
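The default-collapsing pitfall behind these tests can be demonstrated with a stdlib stand-in (names illustrative, not the real `attrs` models):

```python
from dataclasses import asdict, dataclass, field

@dataclass
class Dataset:
    id: str
    aggregated_facts: list = field(default_factory=list)

def to_api_dict(ds: Dataset) -> dict:
    # A naive "drop falsy values" serializer silently loses an explicit
    # empty list on the way to the wire -- the regression pinned above.
    return {k: v for k, v in asdict(ds).items() if v}
```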

The fixture is shared with the prior round-trip test, so the two stay
aligned on what an agg-aware shape looks like.

No SDK code change needed — `put_declarative_ldm` already handles these
shapes correctly thanks to the recent dataset-class additions; this
test ensures it stays that way.

JIRA: CQ-2320
risk: low
YAML ↔ declarative round-trip tests for the three new aggregate-aware
shapes the WASM convertor handles end-to-end on 11.35.0a2:

- AUXILIARY datasets (no physical mapping; synthetic identity attrs).
- NORMAL pre-aggregation datasets with `aggregated_facts` — both the
  vanilla SUM-of-fact path and APPROXIMATE_COUNT-of-attribute (HLL
  synopses targeting the AUX identity attribute, requires
  `reference.type == "attribute"` per gdc-nas CQ-2147).
- NORMAL synthesized dim datasets backed by a `sql:` block.

These guard the SDK side of the AAC convertor pipeline; the heavy
lifting lives in `gooddata-code-convertors`.

risk: low
Add typed `set_hll_type` / `get_hll_type` methods on
`CatalogOrganizationService` for the new `hyperLogLogType` org setting
introduced by gdc-nas (`HLL_TYPE("hyperLogLogType",
SettingConfiguration.HyperLogLogType)`).

The setting controls which HLL function family calcique uses when
generating SQL over HLL synopses:

- `"Native"` (default) emits StarRocks-native `HLL_*` functions —
  works when the platform (or PIPE pipeline) builds synopses
  StarRocks-side.
- `"Presto"` emits Presto-compatible HLL functions, required when
  synopses arrive from an upstream Presto pipeline (the binary layout
  and hash family differ between the two). Requires the StarRocks
  deployment to carry the Presto HLL UDFs.

Surface:
- `HLLType` Literal alias (`"Native" | "Presto"`) re-exported from
  `gooddata_sdk` for consumer annotations.
- `HLL_TYPE_SETTING_ID` (`"hyperLogLogType"`) and
  `HLL_TYPE_SETTING_TYPE` (`"HLL_TYPE"`) constants for callers that
  prefer to drive the generic `CatalogOrganizationSetting.init(...)`
  path directly.
- `service.set_hll_type(value)` is idempotent — tries update first,
  falls back to create when the setting doesn't exist yet.
- `service.get_hll_type()` returns `HLLType | None`; defensively
  returns `None` for unrecognized stored values.
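A minimal sketch of the defensive read; the helper name is illustrative, while the constants, alias, and values come from this commit:

```python
from typing import Literal, Optional

HLLType = Literal["Native", "Presto"]
HLL_TYPE_SETTING_ID = "hyperLogLogType"
HLL_TYPE_SETTING_TYPE = "HLL_TYPE"

def parse_hll_type(stored: Optional[str]) -> Optional[HLLType]:
    # Mirrors get_hll_type(): unrecognized stored values map to None
    # instead of leaking through the Literal annotation.
    if stored in ("Native", "Presto"):
        return stored  # type: ignore[return-value]
    return None
```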

Tests: 7 unit tests with mocked `entities_api` — no live stack needed.
Cover create-on-missing, update-on-existing, both `"Native"` and
`"Presto"` reads, absent-setting case, and the public `HLLType` alias.

Verified: full SDK unit suite — 438 passed, 2 skipped, 1 xfailed.

JIRA: CQ-2320
risk: low
Public docs for the new surfaces this branch introduces:

- `data/ai-lake/`: new section with `_index.md` and method pages for
  `analyze_statistics`, `get_operation`, and `wait_for_operation`.
  Documents the long-running-operation contract callers see when they
  drive AI Lake actions from the SDK (today: ANALYZE TABLE for
  pre-aggregation tables; the surface will grow as more actions are
  wrapped in typed helpers).
- `administration/organization/set_hll_type.md` and
  `get_hll_type.md`, with the organization `_index.md` sidebar updated
  to link them.
- AFM compute-model smoke tests for `APPROXIMATE_COUNT` aggregation in
  `SimpleMetric`.

risk: low
feat(gooddata-sdk): deliver HLL / aggregate-aware LDM surfaces
@yenkins-admin yenkins-admin merged commit 65eb238 into rel/dev May 7, 2026
1 of 2 checks passed
@yenkins-admin yenkins-admin deleted the snapshot-master-cd91b312-to-rel/dev branch May 7, 2026 13:07
@codecov

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.95%. Comparing base (efc6b2a) to head (cd91b31).
⚠️ Report is 503 commits behind head on rel/dev.

Additional details and impacted files
@@             Coverage Diff             @@
##           rel/dev    #1590      +/-   ##
===========================================
+ Coverage    78.83%   78.95%   +0.12%     
===========================================
  Files          230      231       +1     
  Lines        15486    15573      +87     
===========================================
+ Hits         12208    12296      +88     
+ Misses        3278     3277       -1     

