Search before asking
Description
PR: see #63143
An aggregate function is needed to process user data that contains DataSketches HLL sketches. In many data aggregation scenarios, users pre-aggregate detailed data in Hive using the sketches provided by Apache DataSketches, and then analyze the resulting sketches in various OLAP engines. Compared with the HLL union aggregate functions that these engines offer natively, DataSketches HLL sketches differ in two key ways: first, the use cases differ; second, the sketches can be used across different engines, for example in ES, Doris, and ClickHouse at the same time. Such requirements are common in production environments.
Use case
CREATE TABLE test.test_table (
id INT,
data STRING
)
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_num" = "1"
);
INSERT INTO test.test_table (id, data) VALUES (1, FROM_BASE64_BINARY('AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK'));
INSERT INTO test.test_table (id, data) VALUES (2, FROM_BASE64_BINARY('AwEHCAUIAAkKAAAAgbxdBoYv+Q3L18IEwekXBdIWcwc0omEOdYFmB/wtQgp7ZeYIK/L7Bg=='));
INSERT INTO test.test_table (id, data) VALUES (3, FROM_BASE64_BINARY('AwEHCAUIAAkKAAAAIjvrBcS1nwfGGWoEyHokBO8t9wc1qTEENkcJB7hWqQxZf9QNnuSbGA=='));
SELECT ds_cardinality(data) FROM test.test_table;
Doris should then return 20 as the result.
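To sanity-check that the sample payloads are DataSketches HLL sketches, the header of the first one can be decoded outside Doris. The sketch below is only illustrative: the byte offsets (preamble ints at 0, serial version at 1, family ID at 2, lgK at 3, mode in the low two bits of byte 7) are my understanding of the Apache DataSketches HLL preamble layout and should be treated as an assumption, and `describe_hll_header` is a hypothetical helper, not part of Doris or DataSketches.

```python
import base64

# Payload copied from the first INSERT statement in the use case.
SAMPLE = "AgEHCAMIBwjL18IEK/L7BoYv+Q11gWYHgbxdBntl5gj8LUIK"

# Assumed encoding of the low two bits of the mode byte.
MODES = {0: "LIST", 1: "SET", 2: "HLL"}


def describe_hll_header(b64_text):
    """Decode a base64 payload and read the assumed HLL preamble fields."""
    raw = base64.b64decode(b64_text)
    return {
        "preamble_ints": raw[0],   # assumed offset 0
        "serial_version": raw[1],  # assumed offset 1
        "family_id": raw[2],       # assumed offset 2 (7 is taken to mean HLL)
        "lg_k": raw[3],            # assumed offset 3
        "mode": MODES.get(raw[7] & 0x3, "UNKNOWN"),  # assumed offset 7
        "size_bytes": len(raw),
    }


header = describe_hll_header(SAMPLE)
print(header)
```

Under these assumptions, the first payload reads as a small list-mode HLL sketch with lgK = 8, which is consistent with the sketches having been produced by Apache DataSketches rather than by an engine-native HLL type.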
Related issues
#26416,
#56246,
etc.
Are you willing to submit PR?
Code of Conduct