KalamDB is designed for speed, efficiency, and minimal resource use. We aim to store and process data with the smallest possible footprint, reducing CPU, memory, storage, and token costs while improving performance.
Faster operations. Lower infrastructure expenses. Zero waste.
KalamDB is a SQL-first, real-time database that scales to millions of concurrent users through a revolutionary table-per-user architecture. Built in Rust with Apache Arrow and DataFusion, it combines the familiarity of SQL with the performance needed for modern chat applications and AI assistants.
Get KalamDB running in seconds:
curl -sSL https://raw.githubusercontent.com/jamals86/KalamDB/main/docker/backend/docker-compose.yml | docker-compose -f - up -d
- ⚡ Sub‑millisecond writes using RocksDB hot tier
- 📡 Live SQL subscriptions over WebSockets
- 🧍♂️➡️🧍♀️ Per‑user isolation — each user gets their own table & storage
- 💾 Cold tier (Parquet) optimized for analytics and long‑term storage
- 🌍 Multiple storage backends: Local, S3, Azure, GCS
Traditional Database (Shared Table):
┌─────────────────────────────────┐
│ messages (shared)               │
│ userId │ conversationId │ ...   │
│ ───────┼────────────────┼────── │
│ user1  │ conv_A         │ ...   │
│ user2  │ conv_B         │ ...   │
│ user1  │ conv_C         │ ...   │
│ user3  │ conv_D         │ ...   │
│      ...millions of rows...     │
└─────────────────────────────────┘
❌ Complex triggers on entire table
❌ Inefficient filtering for real-time
❌ Scaling bottlenecks at millions of users
KalamDB (Table-Per-User):
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│user1.messages│ │user2.messages│ │user3.messages│
│ convId │ ... │ │ convId │ ... │ │ convId │ ... │
│────────┼─────│ │────────┼─────│ │────────┼─────│
│ conv_A │ ... │ │ conv_B │ ... │ │ conv_D │ ... │
│ conv_C │ ... │ │  ...         │ │  ...         │
└──────────────┘ └──────────────┘ └──────────────┘
✅ Simple per-user subscriptions
✅ Scales to millions of concurrent users
✅ Storage isolation for privacy, compliance, security & cost
KalamDB stores data in a simple, inspectable layout. Each table folder contains a small manifest.json alongside the data files.
data/
├── rocksdb/ # Hot storage (RocksDB column families)
│ ├── system_* # System tables
│ └── user_* / shared_* # Hot buffers per table
└── storage/ # Cold storage (Parquet segments)
├── user/{user_id}/{table}/
│ ├── manifest.json # Schema + segment index
│ └── batch-<index>.parquet # Flushed segments
└── shared/{table}/
├── manifest.json
└── batch-<index>.parquet
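For orientation, here is a minimal sketch of what manifest.json might hold, written as a TypeScript shape. The field names are assumptions inferred from the layout above (a schema plus a segment index), not the exact on-disk format:
// Hypothetical shape of manifest.json; field names are illustrative only.
interface SegmentEntry {
  file: string;   // e.g. "batch-00042.parquet"
  rows: number;   // rows flushed into this segment
}

interface TableManifest {
  table: string;                                                   // e.g. "chat.messages"
  schema: { name: string; dataType: string; nullable: boolean }[]; // column definitions
  segments: SegmentEntry[];                                        // ordered segment index
}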
High-level crate graph today:
        +----------------+
        |  kalamdb-api   |  HTTP + WebSocket server
        +--------+-------+
                 |
                 v
        +----------------+
        |  kalamdb-core  |  SQL handlers, jobs, tables
        +--------+-------+
                 |
        +--------+---------+
        v                  v
+---------------+  +-------------------+
| kalamdb-store |  | kalamdb-filestore |  (Parquet + manifests)
+-------+-------+  +---------+---------+
        |                    |
        v                    v
RocksDB column families   Filesystem / object storage
kalamdb-core orchestrates everything and never talks to RocksDB or the filesystem directly; it goes through kalamdb-store (key/value hot path) and kalamdb-filestore (Parquet + manifest.json and batch indexes).
- SQL engine with full DDL/DML support
- Three table types: USER, SHARED, STREAM
- Per-user tables with hot (RocksDB) + cold (Parquet) storage
- Real-time subscriptions over WebSocket
- Unified schema system with 16 data types (incl. EMBEDDING)
- Role-based access control and authentication
- kalam CLI tool
- Indexes for both cold/hot storages
- Backup/restore and system catalog tables
- SDK for TypeScript using WASM
- Performance tuning and metrics
- Stronger WebSocket auth and rate limiting
- Cleanup and simplification of docs and examples
- Support for more storage backends (Azure, GCS, S3-compatible) using ObjectStore
- Admin UI and dashboard
- Run workflows on data changes (triggers)
- File storage and BLOB support
- High-availability and replication using Raft
- Richer search (full-text, vector embeddings as DataType)
- Query caching and more indexes
- Connectors to external data sources (Flink, Kafka, etc)
- Transactions and constraints
# Clone the repository
git clone https://github.com/jamals86/KalamDB.git
cd KalamDB/backend
# Run the server (uses config.toml or defaults)
cargo run --release --bin kalamdb-server
See Quick Start Guide for detailed setup instructions.
Below is a minimal but realistic end-to-end example for a chat app with AI. It uses:
- One user table for conversations
- One user table for messages
- One stream table for ephemeral typing/thinking/cancel events
We assume:
- The server is running at http://localhost:8080
- You're running on localhost (automatically connects as the root user)
- You have the CLI built and available as kalam (see docs/CLI.md)
# Start the interactive CLI (connects as root on localhost by default)
kalam
Now inside the kalam> prompt:
-- Create namespace and tables
CREATE NAMESPACE IF NOT EXISTS chat;
CREATE TABLE chat.conversations (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
title TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'USER', FLUSH_POLICY = 'rows:1000');
CREATE TABLE chat.messages (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
conversation_id BIGINT NOT NULL,
role_id TEXT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'USER', FLUSH_POLICY = 'rows:1000');
CREATE TABLE chat.typing_events (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
conversation_id BIGINT NOT NULL,
user_id TEXT NOT NULL,
event_type TEXT NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'STREAM', TTL_SECONDS = 30);
-- Create a new conversation and get its id
INSERT INTO chat.conversations (id, title) VALUES (1, 'Chat with AI About KalamDB');
-- The conversation id is 1 – insert user + AI messages
INSERT INTO chat.messages (conversation_id, role_id, content) VALUES
(1, 'user', 'Hello, AI!'),
(1, 'assistant', 'Hi! How can I help you today?');
-- Query the conversation history
SELECT id, role_id, content, created_at
FROM chat.messages
WHERE conversation_id = 1
ORDER BY created_at ASC;
-- User starts typing
INSERT INTO chat.typing_events (conversation_id, user_id, event_type)
VALUES (1, 'user_123', 'typing');
-- AI starts thinking
INSERT INTO chat.typing_events (conversation_id, user_id, event_type)
VALUES (1, 'ai_model', 'thinking');
-- AI cancels / stops
INSERT INTO chat.typing_events (conversation_id, user_id, event_type)
VALUES (1, 'ai_model', 'cancelled');
-- Subscribe to live typing events
\subscribe SELECT * FROM chat.typing_events WHERE conversation_id = 1 OPTIONS (last_rows=50);
-- You can also subscribe to live messages
\subscribe SELECT * FROM chat.messages WHERE conversation_id = 1 OPTIONS (last_rows=20);
Note: Press Ctrl+C to stop the subscription and return to the prompt.
The recommended way to subscribe to real-time updates is using the official TypeScript SDK:
import { createClient } from '@kalamdb/client';
// Connect to KalamDB
const client = createClient({
url: 'http://localhost:8080',
username: 'admin',
password: 'admin'
});
await client.connect();
// Subscribe to messages for a specific conversation with options
const unsubMessages = await client.subscribeWithSql(
'SELECT * FROM chat.messages WHERE conversation_id = 1 ORDER BY created_at DESC',
(event) => {
if (event.type === 'change') {
console.log('New message:', event.rows);
}
},
{ batch_size: 50 } // Load initial data in batches of 50
);
// Subscribe to typing events (simple table subscription)
const unsubTyping = await client.subscribe('chat.typing_events', (event) => {
if (event.type === 'change') {
console.log('Typing event:', event.change_type, event.rows);
}
});
// Check active subscriptions
console.log(`Active subscriptions: ${client.getSubscriptionCount()}`);
// Later: cleanup
await unsubMessages();
await unsubTyping();
await client.disconnect();
Note: You can also connect directly via WebSocket at ws://localhost:8080/v1/ws for custom implementations. See SDK Documentation for the full API reference and API Documentation for raw WebSocket protocol details.
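For a custom integration, a bare WebSocket connection could start out like the sketch below. The endpoint is the one documented above, but the subscribe payload is hypothetical; consult the API Documentation for the actual message schema.
// Raw WebSocket sketch. The frame sent on open is a made-up placeholder,
// not the real KalamDB protocol -- see the API docs for the exact schema.
const ws = new WebSocket('ws://localhost:8080/v1/ws');

ws.addEventListener('open', () => {
  ws.send(JSON.stringify({
    action: 'subscribe', // hypothetical field names
    sql: 'SELECT * FROM chat.typing_events WHERE conversation_id = 1',
  }));
});

ws.addEventListener('message', (frame) => {
  console.log('Server frame:', frame.data);
});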
📖 Complete SQL Reference: See SQL Syntax Documentation for the full command reference with all options.
Challenge: Traditional databases struggle with millions of concurrent users each needing real-time message updates.
KalamDB Solution:
- Per-user table isolation means 1 million users = 1 million independent WebSocket subscriptions
- No global table locks or complex WHERE filtering
- Sub-millisecond writes to RocksDB hot tier
- Automatic message history archival to Parquet cold tier
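Because every user gets their own chat.messages table, a client simply subscribes to its own data with no user-id filtering. A minimal sketch using the SDK calls shown above (credentials are placeholders):
import { createClient } from '@kalamdb/client';

// Each authenticated user sees only their own chat.messages table, so the
// subscription needs no WHERE userId = ... filter. Credentials are placeholders.
const client = createClient({
  url: 'http://localhost:8080',
  username: 'alice',
  password: 'alice-password'
});
await client.connect();

// Isolation comes from the table-per-user layout, not query filtering.
const unsubscribe = await client.subscribe('chat.messages', (event) => {
  if (event.type === 'change') {
    console.log('New message:', event.rows);
  }
});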
Challenge: Multiple users editing shared documents with real-time synchronization and conflict resolution.
KalamDB Solution:
- Shared tables for document content
- Stream tables for ephemeral cursor positions and presence
- Live query subscriptions for real-time collaboration
- User tables for per-user edit history
Example:
-- Shared document storage
CREATE TABLE docs.content (
doc_id TEXT PRIMARY KEY,
version INT,
content TEXT,
author TEXT DEFAULT CURRENT_USER(),
updated_at TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'SHARED', FLUSH_POLICY = 'interval:60');
-- Ephemeral presence tracking
CREATE TABLE docs.presence (
doc_id TEXT PRIMARY KEY,
user_id TEXT,
cursor_position INT,
last_seen TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'STREAM', TTL_SECONDS = 5); -- Auto-evict after 5 seconds
-- Subscribe to document changes
SUBSCRIBE TO docs.content
WHERE doc_id = 'project-proposal'
OPTIONS (last_rows=1);
Result: Google Docs-style real-time collaboration with sub-second latency.
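On the client side, the same presence feed can be consumed through the SDK's subscribeWithSql call shown earlier; a sketch, assuming the tables above and placeholder connection values:
import { createClient } from '@kalamdb/client';

// Sketch: live cursor presence for one document. Connection values are placeholders.
const client = createClient({ url: 'http://localhost:8080', username: 'admin', password: 'admin' });
await client.connect();

const unsubPresence = await client.subscribeWithSql(
  "SELECT * FROM docs.presence WHERE doc_id = 'project-proposal'",
  (event) => {
    if (event.type === 'change') {
      // Stale cursors vanish on their own: TTL_SECONDS = 5 evicts old rows.
      console.log('Cursor update:', event.rows);
    }
  }
);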
Challenge: Ingest millions of sensor readings per second with time-series analytics and real-time alerts.
KalamDB Solution:
- Stream tables for ephemeral sensor data with TTL eviction
- Shared tables for aggregated metrics and alerts
- Live subscriptions for anomaly detection
- Automatic cold tier archival for historical analysis
Example:
-- Ephemeral sensor readings (10-second retention)
CREATE TABLE iot.sensor_data (
sensor_id TEXT PRIMARY KEY,
temperature DOUBLE,
humidity DOUBLE,
timestamp TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'STREAM', TTL_SECONDS = 10);
-- Aggregated metrics (persisted)
CREATE TABLE iot.metrics (
sensor_id TEXT PRIMARY KEY,
avg_temp DOUBLE,
max_temp DOUBLE,
min_temp DOUBLE,
hour TIMESTAMP
) WITH (TYPE = 'SHARED', FLUSH_POLICY = 'interval:3600'); -- Flush every hour
-- Real-time alert subscription
SUBSCRIBE TO iot.sensor_data
WHERE temperature > 80.0 OR humidity > 95.0;
Result: Prometheus-style monitoring with SQL queries and real-time alerting.
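The same alert feed can be consumed from the SDK; a sketch assuming the thresholds above and placeholder credentials:
import { createClient } from '@kalamdb/client';

// Sketch: client-side anomaly alerts driven by the subscription above.
const client = createClient({ url: 'http://localhost:8080', username: 'admin', password: 'admin' });
await client.connect();

await client.subscribeWithSql(
  'SELECT * FROM iot.sensor_data WHERE temperature > 80.0 OR humidity > 95.0',
  (event) => {
    if (event.type === 'change') {
      for (const row of event.rows) {
        console.warn('Sensor alert:', row); // e.g. page an on-call channel here
      }
    }
  }
);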
Challenge: Provide complete data export and deletion for user privacy regulations.
KalamDB Solution:
- Per-user storage enables trivial data export (copy directory)
- Soft delete with configurable grace period
- Physical data isolation prevents cross-user leakage
- Audit trails with CURRENT_USER() tracking
Example:
-- Create user with audit trail
CREATE TABLE app.user_data (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
data_type TEXT,
content TEXT,
created_by TEXT DEFAULT CURRENT_USER(),
created_at TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'USER', FLUSH_POLICY = 'interval:300');
-- Export user data (simple file copy)
-- cp -r /var/lib/kalamdb/user/alice123/ /exports/alice-gdpr-export/
-- Delete user (soft delete with 30-day grace period)
DROP USER 'alice';
-- Hard delete after grace period (automatic cleanup)
-- Scheduled job removes /var/lib/kalamdb/user/alice123/
Result: GDPR-compliant data management with minimal engineering effort.
Challenge: Isolate customer data while sharing infrastructure efficiently.
KalamDB Solution:
- User tables provide tenant-level isolation
- Shared tables for cross-tenant analytics
- Per-tenant backup and restore
- Role-based access control (RBAC)
Example:
-- Tenant-isolated data
CREATE TABLE saas.customer_data (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
customer_id TEXT DEFAULT CURRENT_USER(),
entity_type TEXT,
entity_data TEXT,
created_at TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'USER', FLUSH_POLICY = 'rows:10000,interval:600');
-- Cross-tenant analytics (aggregated)
CREATE TABLE saas.analytics (
id BIGINT PRIMARY KEY DEFAULT SNOWFLAKE_ID(),
metric_name TEXT,
metric_value DOUBLE,
tenant_id TEXT,
timestamp TIMESTAMP DEFAULT NOW()
) WITH (TYPE = 'SHARED', FLUSH_POLICY = 'interval:3600');
-- Create service account per tenant
CREATE USER 'tenant_acme' WITH PASSWORD 'SecureKey123!' ROLE 'service';
Result: Salesforce-style multi-tenancy with SQL simplicity.
| Component | Technology | Version | Purpose |
|---|---|---|---|
| Language | Rust | 1.90+ | Performance, safety, concurrency |
| Storage (Hot) | RocksDB | 0.24 | Fast buffered writes (<1ms latency) |
| Storage (Cold) | Apache Parquet | 52.0 | Compressed columnar format for analytics |
| Query Engine | Apache DataFusion | 40.0 | SQL execution across hot+cold storage |
| In-Memory | Apache Arrow | 52.0 | Zero-copy data structures |
| API Server | Actix-web | 4.4 | REST endpoints + WebSocket subscriptions |
| Authentication | bcrypt + JWT | - | Password hashing + token-based auth |
| Real-time | WebSocket | - | Live message notifications |
| Deployment | Docker | - | Production-ready containerization |
| Client SDK | Rust → WASM | coming soon | TypeScript/JavaScript bindings |
- Quick Start Guide - Get up and running in 10 minutes
- SQL Reference – SQL syntax and examples
- API Reference – HTTP & WebSocket API overview
- CLI Guide – using the kalam command-line client
- SDK (TypeScript/WASM) – browser/Node.js client (under development)
The official TypeScript SDK provides a type-safe wrapper around KalamDB with real-time subscriptions:
import { createClient } from '@kalamdb/client';
const client = createClient({
url: 'http://localhost:8080',
username: 'admin',
password: 'admin'
});
await client.connect();
// Query data
const result = await client.query('SELECT * FROM chat.messages LIMIT 10');
// Subscribe to changes (Firebase/Supabase style)
const unsubscribe = await client.subscribe('chat.messages', (event) => {
console.log('Change:', event);
});
// Subscription management
console.log(`Active: ${client.getSubscriptionCount()}`);
await unsubscribe();
await client.disconnect();
Features:
- ✅ Built on Rust → WASM for performance
- ✅ Real-time subscriptions with Firebase/Supabase-style API
- ✅ Subscription tracking (getSubscriptionCount(), unsubscribeAll())
- ✅ Works in browsers and Node.js
- ✅ Full TypeScript type definitions
See SDK Documentation for the complete API reference.
KalamDB began while I was building several AI applications and realizing there was no single database that could provide:
- Per-user storage isolation (true physical separation, not just WHERE filters)
- GDPR-ready design where storing user data is isolated per user, and deleting a user simply removes their entire storage directory
- Real-time subscriptions without needing Redis, Kafka, or WebSocket proxies
- Fast message/event storage with <1ms latency
- Event listening for AI typing/thinking/cancel states
- Both hot (RocksDB) and cold (Parquet) storage built-in
- All inside one database, without chaining multiple services together
Traditionally, you need a stack to achieve this:
- a relational database for storage
- Redis/Kafka for real-time events
- a backend API to glue things together
- a WebSocket reverse proxy/server
- constant fine-tuning, caching, scaling, sharding, backpressure handling…
That setup becomes complex, expensive, and fragile.
I wanted something simple, fast, and built for AI-era workloads — without needing to maintain a cluster of different technologies.
So KalamDB was created as a lightweight, unified database that:
- isolates data per user for scale & privacy,
- pushes live changes instantly,
- is GDPR-friendly by design,
- avoids unnecessary complexity,
- and scales to millions with minimal tuning.
Apache 2.0 License
Kalam (كلام) means "speech" or "conversation" in Arabic — fitting for a database designed specifically for storing and streaming human conversations and AI interactions.
Built with ❤️ in Rust for real-time conversations at scale.