[Feature] Add Apache Datasketches HLL sketches aggregate function #63142

@nooneuse

Search before asking

  • I have searched the issues and found no similar issues.

Description

PR: see #63143

An aggregate function is needed to process user data that contains Datasketches HLL sketches. In many data aggregation scenarios, users pre-aggregate detailed data in Hive using the sketches provided by Apache Datasketches, and then analyze the resulting sketches in various OLAP engines. Compared with the HLL union aggregate functions that these engines offer natively, Datasketches HLL sketches differ in two key ways: first, the use cases differ; second, the sketches can be used seamlessly across engines, for example in ES, Doris, and ClickHouse at the same time. Such requirements are common in production environments.

Use case

CREATE TABLE test.test_table (
    id   INT,
    data STRING
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
    "replication_num" = "1"
);

INSERT INTO test.test_table (id, data) VALUES (1, FROM_BASE64_BINARY('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK'));

INSERT INTO test.test_table (id, data) VALUES (2, FROM_BASE64_BINARY('AwEHCAUIAAkKAAAAgbxdBoYv+Q3L18IEwekXBdIWcwc0omEOdYFmB/wtQgp7ZeYIK/L7Bg=='));

INSERT INTO test.test_table (id, data) VALUES (3, FROM_BASE64_BINARY('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA=='));

select ds_cardinality(data) from test.test_table;

Doris should then return 20.
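To make the sample payloads less opaque, here is a minimal Python sketch that decodes the three Base64 strings and reads their headers. It is not the proposed Doris implementation and does not use the Datasketches library. It assumes the serialized layout used by the Datasketches HLL family as I understand it: byte 2 is the family ID (7 for HLL), byte 3 is lgK, byte 5 holds flags (0x08 meaning compact), byte 6 is the entry count in LIST mode, the low two bits of byte 7 are the mode, and in SET mode a little-endian count sits at offset 8. The helper name `read_coupons` is hypothetical. Only the sparse LIST and SET modes are handled; dense HLL arrays are out of scope here.

```python
import base64
import struct

# The three payloads from the INSERT statements in this issue.
PAYLOADS = [
    "AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK",
    "AwEHCAUIAAkKAAAAgbxdBoYv+Q3L18IEwekXBdIWcwc0omEOdYFmB/wtQgp7ZeYIK/L7Bg==",
    "AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA==",
]

# Assumed encoding of the low two bits of header byte 7.
MODES = {0: "LIST", 1: "SET", 2: "HLL"}


def read_coupons(b64_text):
    """Return (mode, lg_k, set of raw 32-bit entries) for a sparse sketch."""
    raw = base64.b64decode(b64_text)
    family, lg_k, flags = raw[2], raw[3], raw[5]
    if family != 7:
        raise ValueError("not an HLL-family sketch (assumed family ID 7)")
    if not flags & 0x08:
        raise ValueError("only the compact form is handled in this sketch")
    mode = MODES.get(raw[7] & 0x03)
    if mode == "LIST":
        count, offset = raw[6], 8          # count stored in a header byte
    elif mode == "SET":
        count = struct.unpack_from("<I", raw, 8)[0]
        offset = 12                        # count stored as a 4-byte int
    else:
        raise ValueError("dense HLL mode is not handled in this sketch")
    entries = struct.unpack_from(f"<{count}I", raw, offset)
    return mode, lg_k, set(entries)


if __name__ == "__main__":
    union = set()
    for payload in PAYLOADS:
        mode, lg_k, entries = read_coupons(payload)
        print(mode, "lgK =", lg_k, "entries =", len(entries))
        union |= entries
    # Distinct entries across all rows; this is a raw count, not the
    # library's estimator, which interpolates from this count.
    print("distinct entries in union:", len(union))
```

Under these assumptions the three rows hold 7, 10 and 10 entries, and the union holds 20 distinct entries, which is consistent with the expected result of 20. In sparse modes the library's estimate should track this count closely, but the aggregate function itself would need to merge sketches through the Datasketches union logic rather than by counting entries.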

Related issues

#26416, #56246, etc.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Labels

kind/feature: Categorizes issue or PR as related to a new feature.
