Description
Search before asking
- I searched the issues and found no similar issues.
Linkis Component
linkis-engineconn-plugins
What happened
Linkis currently lacks native support for StarRocks, a next-generation high-performance OLAP engine. StarRocks reports outperforming Doris and ClickHouse in the SSB and TPC-H benchmarks, and it is being adopted rapidly by major internet companies (Tencent, ByteDance, Ctrip, Meituan, Xiaomi) and financial institutions (China UnionPay, CMB, Taikang Insurance) for its performance, high concurrency, and native lakehouse capabilities.
Market Demand:
- Extreme Query Performance: reported to be 10-30% faster than Doris in SSB benchmarks
- Real-time UPSERT: native, high-performance UPSERT on Primary Key tables, filling the gap left by ClickHouse's limited update support
- MPP Architecture: high concurrency, with hundreds of concurrent queries per cluster
- Smart Materialized Views: the optimizer automatically rewrites queries to use the best-matching materialized view
- Native Lakehouse: deep integration with Hive, Iceberg, Hudi, and Delta Lake for unified data access
- Rapid Growth: 8k+ GitHub stars, with many enterprises migrating from Doris to StarRocks
Strategic Value:
While Linkis already supports Doris, StarRocks has evolved significantly and offers complementary value:
- StarRocks: a next-generation real-time data warehouse for users who need maximum performance
- Doris: an established real-time OLAP engine for users who prioritize maturity and stability
- Strategy: support both engines and let users choose based on their scenario
What you expected to happen
Linkis should provide a StarRocks engine plugin with the following capabilities:
- SQL Query Support:
  - MySQL protocol compatibility (StarRocks is MySQL-compatible)
  - Standard SQL syntax support
  - Support for all table models (Duplicate, Aggregate, Unique, Primary Key)
  - Materialized view queries with automatic rewriting
- Data Operations:
  - INSERT for batch data loading
  - UPSERT for real-time updates (Primary Key model)
  - DELETE for data deletion
  - Stream Load and Broker Load support
  - Complex JOIN and aggregation queries
- Lakehouse Integration:
  - Query external tables (Hive, Iceberg, Hudi, Delta Lake)
  - External catalog support
  - Unified SQL interface for data lake and warehouse
  - Federated queries across multiple data sources
- Performance Optimization:
  - Connection pooling and reuse
  - Query result streaming to avoid OOM
  - Tablet-level parallel execution
  - Automatic query optimization
  - Resource usage monitoring
- Integration with Linkis:
  - Unified task submission interface
  - Resource management integration
  - Permission control integration
  - Metadata catalog integration
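The lakehouse items above build on StarRocks external catalogs, which are created and queried with ordinary SQL over the MySQL protocol. A minimal sketch, assuming an Iceberg catalog whose metadata is stored in a Hive Metastore; the catalog name `iceberg_catalog` and database `events_db` match the use case example, while the metastore address is a hypothetical placeholder:

```sql
-- Register an Iceberg catalog backed by a Hive Metastore.
-- The thrift URI below is a placeholder (assumption); use your metastore address.
CREATE EXTERNAL CATALOG iceberg_catalog
PROPERTIES (
    "type" = "iceberg",
    "iceberg.catalog.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Verify the catalog and list the databases it exposes
SHOW CATALOGS;
SHOW DATABASES FROM iceberg_catalog;

-- Optionally make the external database the session default
USE iceberg_catalog.events_db;
```

Once the catalog exists, fully qualified names such as `iceberg_catalog.events_db.user_events` resolve from any session (subject to the user's privileges), so federated queries need no extra client-side setup.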
How to reproduce
Current situation:
- Users need to manually set up StarRocks MySQL connections
- No dedicated engine plugin for StarRocks operations
- Cannot leverage Linkis's unified task submission and resource management
- Limited support for StarRocks-specific features (lakehouse, materialized views)
Use case example:
```sql
-- Real-time analytics with UPSERT (Primary Key model)
-- StarRocks handles real-time updates efficiently, unlike ClickHouse
CREATE TABLE user_profiles (
    user_id BIGINT NOT NULL,
    user_name STRING,
    total_orders INT,
    last_order_time DATETIME
) PRIMARY KEY (user_id)
DISTRIBUTED BY HASH(user_id);

-- Upsert: on a Primary Key table a plain INSERT replaces any existing row
-- with the same key and inserts new keys. StarRocks does not support
-- MySQL's ON DUPLICATE KEY UPDATE clause.
INSERT INTO user_profiles VALUES
    (1001, 'Alice', 150, '2024-12-20 10:30:00'),
    (1002, 'Bob', 200, '2024-12-20 11:00:00');

-- Query lakehouse data (Iceberg table) through an external catalog
SELECT
    date_trunc('day', event_time) AS day,
    event_type,
    COUNT(*) AS event_count
FROM iceberg_catalog.events_db.user_events
WHERE event_time >= CURRENT_DATE - INTERVAL 7 DAY
GROUP BY day, event_type
ORDER BY day DESC, event_count DESC;

-- Federated query joining a native StarRocks table with data-lake data
SELECT
    s.user_id,
    s.user_name,
    COUNT(e.event_id) AS event_count
FROM user_profiles s
JOIN iceberg_catalog.events_db.user_events e ON s.user_id = e.user_id
WHERE e.event_time >= CURRENT_DATE - INTERVAL 1 DAY
GROUP BY s.user_id, s.user_name;

-- Materialized view automatic rewriting: if a suitable materialized view
-- exists, StarRocks transparently rewrites this query to read from it
SELECT region, SUM(sales) FROM sales_table GROUP BY region;
```
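The materialized-view case only triggers a rewrite if a matching view exists. A minimal sketch, assuming `sales_table` has `region` and `sales` columns as the query implies; the view name `sales_by_region_mv` is hypothetical:

```sql
-- Asynchronous materialized view that pre-aggregates sales per region.
-- REFRESH ASYNC with no schedule refreshes the view when base-table data changes.
CREATE MATERIALIZED VIEW sales_by_region_mv
DISTRIBUTED BY HASH(region)
REFRESH ASYNC
AS
SELECT region, SUM(sales) AS total_sales
FROM sales_table
GROUP BY region;

-- If the optimizer rewrites the query, the plan scans sales_by_region_mv
-- instead of sales_table.
EXPLAIN SELECT region, SUM(sales) FROM sales_table GROUP BY region;
```

Rewriting with asynchronous materialized views is governed by the session variable `enable_materialized_view_rewrite`, which is enabled by default in recent StarRocks releases.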