21 changes: 15 additions & 6 deletions Makefile
@@ -61,13 +61,22 @@ endef
.PHONY: _api-client-generate
_api-client-generate:
	rm -f schemas/gooddata-api-client.json
-	cat schemas/gooddata-*.json | jq -S -s 'reduce .[] as $$item ({}; . * $$item) + { tags : ( reduce .[].tags as $$item (null; . + $$item) | unique_by(.name) ) }' | sed '/\u0000/d' > "schemas/gooddata-api-client.json"
+	# Merge per-domain specs and strip literal NUL bytes that jq decoded from
+	# escapes in the source (the previous `sed '/.../d'` pattern was a no-op:
+	# sed BRE/ERE doesn't interpret \uNNNN, so it never matched anything).
+	cat schemas/gooddata-*.json | jq -S -s 'reduce .[] as $$item ({}; . * $$item) + { tags : ( reduce .[].tags as $$item (null; . + $$item) | unique_by(.name) ) }' | tr -d '\000' > "schemas/gooddata-api-client.json"
+	# Break the DashboardCompoundConditionItem ↔ children oneOf/allOf cycle that
+	# crashes openapi-generator-cli v6.6.0 with StackOverflowError in
+	# recursiveGetDiscriminator (its walker has no visited-set). Parent has no
+	# own properties, so dropping the redundant `allOf: [{$$ref: parent}]` from
+	# each child is semantically a no-op.
+	jq '(.components.schemas.DashboardCompoundComparisonCondition.allOf) |= map(select(.["$$ref"] != "#/components/schemas/DashboardCompoundConditionItem")) | (.components.schemas.DashboardCompoundRangeCondition.allOf) |= map(select(.["$$ref"] != "#/components/schemas/DashboardCompoundConditionItem"))' schemas/gooddata-api-client.json > schemas/gooddata-api-client.json.tmp && mv schemas/gooddata-api-client.json.tmp schemas/gooddata-api-client.json
	$(call generate_client,api)
-	# OpenAPI Generator drops the \x00 literal from regex patterns like ^[^\x00]*$,
-	# producing the invalid Python regex ^[^]*$. Restore the null-byte escape.
-	find gooddata-api-client/gooddata_api_client -name '*.py' -exec \
-		sed -i.bak 's/\^\[\^\]\*\$$/^[^\\x00]*$$/g' {} + && \
-	find gooddata-api-client/gooddata_api_client -name '*.py.bak' -delete
+	# Repair regex patterns of the form ^[^\x00]*$ that openapi-generator mangles
+	# in two ways: sometimes it drops the NUL (leaving the invalid `^[^]*$`),
+	# sometimes it embeds a literal NUL byte (producing a Python source that
+	# fails to import with SyntaxError). The helper handles both shapes.
+	./scripts/postprocess_api_client.py gooddata-api-client/gooddata_api_client

.PHONY: api-client
api-client: download _api-client-generate
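
The helper `scripts/postprocess_api_client.py` is not part of the lines shown here. As a rough illustration of the repair the Makefile comment describes, a minimal sketch could look like the following. The function name `repair_nul_patterns` and the plain string replacement are assumptions made for this example, not the helper's actual contents.

```python
# Hypothetical sketch; NOT the contents of scripts/postprocess_api_client.py.
import re

NUL = chr(0)

BROKEN_DROPPED = "^[^]*$"             # generator dropped the NUL: invalid regex
BROKEN_LITERAL = "^[^" + NUL + "]*$"  # generator embedded a raw NUL byte
FIXED = "^[^\\x00]*$"                 # escape spelled out as backslash, x, 0, 0


def repair_nul_patterns(source: str) -> str:
    """Rewrite both mangled shapes of the pattern to the escaped form."""
    return source.replace(BROKEN_DROPPED, FIXED).replace(BROKEN_LITERAL, FIXED)


# Both shapes converge on the same repaired source line.
for broken in (BROKEN_DROPPED, BROKEN_LITERAL):
    repaired = repair_nul_patterns("pattern = r'" + broken + "'")
    assert NUL not in repaired

# The repaired pattern compiles and rejects strings that contain a NUL byte.
compiled = re.compile(FIXED)
assert compiled.match("plain text") is not None
assert compiled.match("bad" + NUL + "text") is None
```

A real helper would also need to walk the target directory and rewrite each `*.py` file in place; that part is omitted here.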
@@ -16,6 +16,8 @@ See [Manage Organizations](https://www.gooddata.ai/docs/cloud/manage-deployment/
* [delete_jwk](./delete_jwk/)
* [get_jwk](./get_jwk/)
* [list_jwks](./list_jwks/)
+* [set_hll_type](./set_hll_type/)
+* [get_hll_type](./get_hll_type/)

## Example

@@ -0,0 +1,33 @@
---
title: "get_hll_type"
linkTitle: "get_hll_type"
weight: 31
no_list: true
superheading: "catalog_organization."
api_ref: "CatalogOrganizationService.get_hll_type"
---



``get_hll_type() -> HLLType | None``

Reads the organization-level `hyperLogLogType` setting. Returns the
configured value (`"Native"` or `"Presto"`), or `None` when the setting
is unset or carries an unrecognized value.

See [`set_hll_type`](../set_hll_type/) for the meaning of each value and
when to choose `"Native"` versus `"Presto"`.

{{% parameters-block title="Returns"%}}
{{< parameter p_name="value" p_type="HLLType | None" >}}
`"Native"`, `"Presto"`, or `None` if unset.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
current = sdk.catalog_organization.get_hll_type()
if current is None:
    sdk.catalog_organization.set_hll_type("Native")
```
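
The "unset or unrecognized value maps to `None`" rule can be pictured with a small standalone sketch. It assumes `HLLType` is a `Literal` of the two documented strings; `parse_hll_type` is a hypothetical helper written for this illustration and is not claimed to exist in the SDK.

```python
from typing import Literal, Optional, get_args

# Assumed shape of the alias, based on the two documented values.
HLLType = Literal["Native", "Presto"]


def parse_hll_type(raw: object) -> Optional[HLLType]:
    """Return raw if it is one of the recognized values, otherwise None."""
    if isinstance(raw, str) and raw in get_args(HLLType):
        return raw  # type: ignore[return-value]
    return None


print(parse_hll_type("Presto"))  # Presto
print(parse_hll_type("presto"))  # None (the comparison is case-sensitive)
print(parse_hll_type(None))      # None (setting is unset)
```

Callers that treat `None` as "fall back to the default" do not need to distinguish the unset case from the unrecognized one.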
@@ -0,0 +1,47 @@
---
title: "set_hll_type"
linkTitle: "set_hll_type"
weight: 30
no_list: true
superheading: "catalog_organization."
api_ref: "CatalogOrganizationService.set_hll_type"
---



``set_hll_type(value: HLLType)``

Sets the organization-level `hyperLogLogType` setting that controls which
HyperLogLog function family the platform uses when generating SQL over HLL
synopses.

The call is idempotent: it updates the existing setting or creates it if
absent.

| value | when to use |
| -- | -- |
| `"Native"` | StarRocks-native HLL functions. The default. Use when synopses are produced by the platform itself or by a StarRocks-native pipeline. |
| `"Presto"` | Presto-compatible HLL functions. Use when synopses arrive from an upstream Presto pipeline — the binary layout and hash family of Presto HLL synopses differ from StarRocks-native. Requires the StarRocks deployment to carry the Presto HLL UDFs. |

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="value" p_type="HLLType" >}}
Either `"Native"` or `"Presto"` (a `Literal` type re-exported from
`gooddata_sdk` as `HLLType`).
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns" None="yes"%}}
{{% /parameters-block %}}

## Example

```python
from gooddata_sdk import GoodDataSdk

sdk = GoodDataSdk.create(host="https://demo.gooddata.com", token="<token>")

# Customer ingests pre-aggregated tables whose HLL columns were produced by
# Presto. Switch the org so calcique emits Presto-compatible HLL SQL.
sdk.catalog_organization.set_hll_type("Presto")
```
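
To make the create-or-update behaviour concrete, here is a minimal in-memory stand-in. `FakeOrgSettings` is hypothetical and only models the idempotency described above; it does not talk to the platform and does not reflect how the SDK stores the setting.

```python
from typing import Dict, Optional

SETTING_ID = "hyperLogLogType"  # setting name taken from the description above


class FakeOrgSettings:
    """In-memory stand-in for organization settings (illustration only)."""

    def __init__(self) -> None:
        self._settings: Dict[str, str] = {}

    def set_hll_type(self, value: str) -> None:
        # One assignment covers both "create if absent" and "update if present".
        self._settings[SETTING_ID] = value

    def get_hll_type(self) -> Optional[str]:
        return self._settings.get(SETTING_ID)


org = FakeOrgSettings()
assert org.get_hll_type() is None  # nothing stored yet
org.set_hll_type("Presto")         # first call creates the setting
org.set_hll_type("Presto")         # repeating it leaves the same state
assert org.get_hll_type() == "Presto"
org.set_hll_type("Native")         # a different value overwrites in place
assert org.get_hll_type() == "Native"
```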
43 changes: 43 additions & 0 deletions docs/content/en/latest/data/ai-lake/_index.md
@@ -0,0 +1,43 @@
---
title: "AI Lake"
linkTitle: "AI Lake"
weight: 20
no_list: true
---

Drive AI Lake long-running operations from the SDK. Today the surface
covers the actions needed by aggregate-aware logical data models — most
notably the `ANALYZE TABLE` refresh that pre-aggregation workflows rely
on so the cost-based optimizer picks up new statistics.

The AI Lake API uses long-running operations: an action returns
immediately with an `operation_id`, and the client polls until the
operation reaches a terminal status (`succeeded` or `failed`).

### Action Methods

* [analyze_statistics](./analyze_statistics/)

### Operation Methods

* [get_operation](./get_operation/)
* [wait_for_operation](./wait_for_operation/)

## Example

```python
from gooddata_sdk import GoodDataSdk

sdk = GoodDataSdk.create(host="https://demo.gooddata.com", token="<token>")

# Refresh CBO statistics for a single table after a bulk load.
operation_id = sdk.catalog_ai_lake.analyze_statistics(
    instance_id="warehouse-prod",
    table_names=["agg_orders_country_daily"],
)

# Block until the operation finishes; raises CatalogAILakeOperationError
# on failure and TimeoutError if it doesn't finish in time.
op = sdk.catalog_ai_lake.wait_for_operation(operation_id, timeout_s=600.0)
assert op.is_succeeded
```
54 changes: 54 additions & 0 deletions docs/content/en/latest/data/ai-lake/analyze_statistics.md
@@ -0,0 +1,54 @@
---
title: "analyze_statistics"
linkTitle: "analyze_statistics"
weight: 10
no_list: true
superheading: "catalog_ai_lake."
api_ref: "CatalogAILakeService.analyze_statistics"
---



``analyze_statistics(instance_id: str, table_names: list[str] | None = None, operation_id: str | None = None) -> str``

Triggers `ANALYZE TABLE` over an AI Lake database instance so the
cost-based optimizer (CBO) picks up fresh statistics. Required after
schema or bulk-data changes — most importantly after registering a
pre-aggregation table whose dimension attributes the platform will later
resolve via filter pushdown.

The call returns immediately with an operation ID; the actual analyze
runs asynchronously. Use [`get_operation`](../get_operation/) or
[`wait_for_operation`](../wait_for_operation/) to poll for completion.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="instance_id" p_type="str" >}}
Database instance name (preferred) or UUID.
{{< /parameter >}}
{{< parameter p_name="table_names" p_type="list[str] | None" >}}
Tables to analyze. If `None` or empty, every table in the instance is
analyzed. Defaults to `None`.
{{< /parameter >}}
{{< parameter p_name="operation_id" p_type="str | None" >}}
Optional client-supplied operation identifier. If omitted, a fresh UUID
is generated. When supplied, pass the same value to subsequent
`get_operation` / `wait_for_operation` calls to poll this run.
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}
{{< parameter p_name="operation_id" p_type="str" >}}
The operation ID (UUID string) the platform will track this run under.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
operation_id = sdk.catalog_ai_lake.analyze_statistics(
    instance_id="warehouse-prod",
    table_names=["agg_orders_country_daily", "agg_orders_country_monthly"],
)
sdk.catalog_ai_lake.wait_for_operation(operation_id)
```
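
The optional `operation_id` parameter follows a common pattern: use the caller's identifier when one is given, otherwise mint a new one. The sketch below illustrates that pattern on its own. `resolve_operation_id` is a hypothetical helper, and the choice of a random (version 4) UUID is an assumption; the text above only says that a fresh UUID is generated.

```python
import uuid
from typing import Optional


def resolve_operation_id(operation_id: Optional[str] = None) -> str:
    """Return the caller's ID, or a fresh random UUID string when none is given."""
    if operation_id is not None:
        return operation_id
    return str(uuid.uuid4())  # assumption: random UUID, rendered as a string


# A caller-chosen ID is passed through unchanged.
assert resolve_operation_id("my-run-001") == "my-run-001"

# Without an argument, each call yields a new, well-formed UUID string.
generated = resolve_operation_id()
assert str(uuid.UUID(generated)) == generated
assert resolve_operation_id() != generated
```

Supplying your own ID can be useful when the identifier has to be recorded before the call is made, so that polling can resume after a client restart.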
43 changes: 43 additions & 0 deletions docs/content/en/latest/data/ai-lake/get_operation.md
@@ -0,0 +1,43 @@
---
title: "get_operation"
linkTitle: "get_operation"
weight: 20
no_list: true
superheading: "catalog_ai_lake."
api_ref: "CatalogAILakeService.get_operation"
---



``get_operation(operation_id: str) -> CatalogAILakeOperation``

Fetches the current state of a long-running AI Lake operation.

The returned `CatalogAILakeOperation` carries `id`, `kind`, `status`
(`"pending"`, `"succeeded"`, or `"failed"`), an optional `result` dict
on success, and an optional `error` dict on failure. Use the
`is_terminal` / `is_succeeded` / `is_failed` properties to branch on
status.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="operation_id" p_type="str" >}}
The operation ID returned by the action that started the operation
(e.g. `analyze_statistics`).
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}
{{< parameter p_name="operation" p_type="CatalogAILakeOperation" >}}
Snapshot of the operation's current status and payload.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
operation_id = sdk.catalog_ai_lake.analyze_statistics(instance_id="warehouse-prod")
op = sdk.catalog_ai_lake.get_operation(operation_id)
if op.is_terminal:
    print(op.status, op.result or op.error)
```
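
For readers who want a mental model of the returned object, the sketch below mirrors the fields and status properties described above. `OperationSnapshot` is a hypothetical stand-in, not the SDK's `CatalogAILakeOperation` class, and the `kind` value used in the usage lines is a placeholder.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class OperationSnapshot:
    """Illustrative stand-in; field names follow the description above."""

    id: str
    kind: str
    status: str  # "pending", "succeeded", or "failed"
    result: Optional[Dict[str, Any]] = None
    error: Optional[Dict[str, Any]] = None

    @property
    def is_succeeded(self) -> bool:
        return self.status == "succeeded"

    @property
    def is_failed(self) -> bool:
        return self.status == "failed"

    @property
    def is_terminal(self) -> bool:
        # Terminal means the status will not change any more.
        return self.is_succeeded or self.is_failed


pending = OperationSnapshot(id="op-1", kind="example-kind", status="pending")
failed = OperationSnapshot(
    id="op-2", kind="example-kind", status="failed", error={"detail": "boom"}
)
assert not pending.is_terminal
assert failed.is_terminal and failed.is_failed and not failed.is_succeeded
```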
64 changes: 64 additions & 0 deletions docs/content/en/latest/data/ai-lake/wait_for_operation.md
@@ -0,0 +1,64 @@
---
title: "wait_for_operation"
linkTitle: "wait_for_operation"
weight: 30
no_list: true
superheading: "catalog_ai_lake."
api_ref: "CatalogAILakeService.wait_for_operation"
---



``wait_for_operation(operation_id: str, timeout_s: float = 300.0, poll_s: float = 2.0) -> CatalogAILakeOperation``

Blocks until an AI Lake operation reaches a terminal status, polling
every `poll_s` seconds. Returns the final `CatalogAILakeOperation` on
success, raises `CatalogAILakeOperationError` if the operation finishes
in `failed` state, and raises `TimeoutError` if the operation does not
finish within `timeout_s`.

{{% parameters-block title="Parameters"%}}

{{< parameter p_name="operation_id" p_type="str" >}}
The operation ID returned by the action that started the operation.
{{< /parameter >}}
{{< parameter p_name="timeout_s" p_type="float" >}}
Maximum time to wait, in seconds. Defaults to `300.0`.
{{< /parameter >}}
{{< parameter p_name="poll_s" p_type="float" >}}
Sleep between polls, in seconds. Defaults to `2.0`.
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Returns"%}}
{{< parameter p_name="operation" p_type="CatalogAILakeOperation" >}}
The terminal-state operation; `op.is_succeeded` is guaranteed `True`.
{{< /parameter >}}
{{% /parameters-block %}}

{{% parameters-block title="Raises"%}}
{{< parameter p_type="CatalogAILakeOperationError" >}}
Operation finished with `status="failed"`. The exception carries the
full operation snapshot on its `operation` attribute.
{{< /parameter >}}
{{< parameter p_type="TimeoutError" >}}
Operation did not reach a terminal state within `timeout_s`.
{{< /parameter >}}
{{% /parameters-block %}}

## Example

```python
from gooddata_sdk import CatalogAILakeOperationError

operation_id = sdk.catalog_ai_lake.analyze_statistics(
    instance_id="warehouse-prod",
    table_names=["agg_orders_country_daily"],
)
try:
    op = sdk.catalog_ai_lake.wait_for_operation(operation_id, timeout_s=600.0)
except CatalogAILakeOperationError as exc:
    print(f"analyze failed: {exc.operation.error}")
except TimeoutError:
    print("analyze still pending; resume polling later with get_operation()")
```
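
The blocking behaviour documented above amounts to a poll-until-terminal loop with a deadline. The following is a minimal sketch of such a loop under stated assumptions: `wait_for`, `OperationFailed`, and the dict-shaped operation are hypothetical names chosen for the illustration, and the SDK's actual implementation may differ.

```python
import time
from typing import Any, Callable, Dict

Operation = Dict[str, Any]


class OperationFailed(Exception):
    """Stand-in for CatalogAILakeOperationError; keeps the snapshot on .operation."""

    def __init__(self, operation: Operation) -> None:
        super().__init__(f"operation failed: {operation!r}")
        self.operation = operation


def wait_for(
    fetch: Callable[[], Operation], timeout_s: float = 300.0, poll_s: float = 2.0
) -> Operation:
    """Poll fetch() until the status is terminal; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        op = fetch()
        if op["status"] == "succeeded":
            return op
        if op["status"] == "failed":
            raise OperationFailed(op)
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"operation still pending after {timeout_s}s")
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))


# Usage with a scripted sequence of states instead of a real API call.
states = iter([{"status": "pending"}, {"status": "pending"}, {"status": "succeeded"}])
final = wait_for(lambda: next(states), timeout_s=5.0, poll_s=0.01)
assert final["status"] == "succeeded"
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted while the loop is waiting.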