Just a project plan I generated using Claude. I wanted something fun to build hands-on, without leaning on AI for the implementation :) Feel free to try it yourself. Just for fun, no pressure.
A hands-on learning project: Build a real-time event analytics platform from scratch using Kubernetes on Azure VMs, Java microservices, and modern data engineering tools.
Why this project exists: To learn by doing — no managed services, no abstractions hiding complexity. Just you, VMs, and infrastructure built from the ground up.
## Table of Contents

- Overview
- Architecture
- Technology Stack
- Project Phases
- Repository Structure
- Getting Started
- Cost Estimation
- Learning Resources
- Progress Tracking
- License
## Overview

StreamForge is a gaming analytics platform that tracks player events in real-time. The complete data flow:
- Event Generators (Java) simulate players: logins, matches, purchases, achievements
- Event Ingestion API (Java + Spring Boot) receives events via REST/gRPC
- Apache Kafka streams events to multiple consumers
- Apache Flink (Java) processes streams: aggregations, pattern detection, anomaly alerts
- ClickHouse stores time-series analytics data
- PostgreSQL stores user profiles and reference data
- Redis caches hot data and manages leaderboards
- React Dashboard displays real-time metrics, charts, leaderboards
- Alerting pushes notifications when interesting patterns emerge
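
For a concrete picture of what travels through this pipeline, here is a hedged sketch of a single event, serialized the way the ingestion API receives it. The local record mirrors the GameEvent defined in Phase 3; all values are illustrative:

```java
// One event as it travels the pipeline, serialized with Jackson.
// GameEvent mirrors the record defined in Phase 3; values are illustrative.
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import java.time.Instant;
import java.util.Map;

public class EventShapeDemo {
    record GameEvent(String eventId, String playerId, String eventType,
                     Map<String, Object> payload, Instant timestamp, String sessionId) { }

    public static void main(String[] args) throws Exception {
        var event = new GameEvent(
                "c1a5e8d2-9b1f-4c7a-8d2e-5f0a1b2c3d4e",
                "player-42",
                "PURCHASE",
                Map.of("item", "starter-pack", "amount", 4.99),
                Instant.now(),
                "session-42");
        var mapper = new ObjectMapper().registerModule(new JavaTimeModule());
        // Prints the JSON document the ingestion API receives
        System.out.println(mapper.writeValueAsString(event));
    }
}
```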
Skills covered:

| Domain | Skills |
|---|---|
| Infrastructure | VM provisioning, networking, Linux administration |
| Kubernetes | Manual cluster setup, operators, StatefulSets, RBAC |
| Java | Spring Boot, WebFlux, Flink, Gradle, testing |
| Data Engineering | Kafka, stream processing, time-series databases |
| DevOps | Terraform, Pulumi, GitHub Actions, GitOps |
| Observability | Prometheus, Grafana, Loki, OpenTelemetry, Tempo |
| Security | Network policies, secrets management, policy enforcement |

## Architecture
┌─────────────────────────────────────────────────────────────────────────────┐
│ Azure Cloud │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Virtual Network (10.0.0.0/16) │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │ │
│ │ │ Control Plane │ │ Worker Node 1 │ │ Worker Node 2 │ │ │
│ │ │ (2 vCPU/4GB) │ │ (4 vCPU/8GB) │ │ (4 vCPU/8GB) │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ • etcd │ │ • Kafka │ │ • Java Services │ │ │
│ │ │ • API Server │ │ • Flink │ │ • PostgreSQL │ │ │
│ │ │ • Scheduler │ │ • ClickHouse │ │ • Redis │ │ │
│ │ │ • Controller │ │ │ │ • React Frontend │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Observability Stack │ │ │
│ │ │ Prometheus → Grafana ← Loki ← OpenTelemetry ← Tempo │ │ │
│ │ └──────────────────────────────────────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Event │ │ Ingestion │ │ Kafka │ │ Flink │
│ Generators │────▶│ API │────▶│ Cluster │────▶│ Processing │
│ (Java) │ │ (Spring Boot)│ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘ └──────┬───────┘
│
┌────────────────────────────────────────────────┘
│
▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ React │◀────│ Query │◀────│ ClickHouse │◀────│ Aggregated │
│ Dashboard │ │ Service │ │ │ │ Data │
│ │ │ │ └──────────────┘ └──────────────┘
└──────────────┘ └──────┬───────┘
│
┌─────┴─────┐
▼ ▼
┌──────────┐ ┌──────────┐
│ PostgreSQL│ │ Redis │
│ (Profiles)│ │ (Cache) │
└──────────┘ └──────────┘
┌─────────────────────────────────────────────────────────────┐
│ Grafana (Visualization) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Prometheus │ │ Loki │ │ Tempo │ │
│ │ (Metrics) │ │ (Logs) │ │ (Distributed Traces)│ │
│ └──────┬──────┘ └──────┬──────┘ └──────────┬──────────┘ │
└─────────┼────────────────┼─────────────────────┼─────────────┘
│ │ │
┌───────┴────────────────┴─────────────────────┴───────┐
│ OpenTelemetry Collector │
└───────────────────────────┬───────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Java Apps│ │ Kafka │ │ Postgres │
│ (OTel) │ │ Exporter │ │ Exporter │
└──────────┘ └──────────┘ └──────────┘
## Technology Stack

Infrastructure and platform:

| Technology | Purpose | Why |
|---|---|---|
| Terraform | Azure infrastructure provisioning | Industry standard, declarative |
| Pulumi | Additional infrastructure (ACR, KeyVault) | Learn both approaches |
| Kubernetes | Container orchestration | Manual setup for deep learning |
| Cilium | CNI & Network Policies | eBPF-based, future of K8s networking |
| containerd | Container runtime | Lightweight, K8s native |
Data platform:

| Technology | Purpose | Why |
|---|---|---|
| Apache Kafka | Event streaming | Industry standard for streaming |
| Apache Flink | Stream processing | Powerful, Java-native |
| ClickHouse | Analytics database | Fast columnar storage for time-series |
| PostgreSQL | Relational database | ACID compliance for user data |
| Redis | Caching & leaderboards | In-memory performance |
Application:

| Technology | Purpose | Why |
|---|---|---|
| Java 21 | Primary language | Enterprise standard, learning goal |
| Spring Boot 3 | REST API framework | Production-ready, comprehensive |
| Spring WebFlux | Reactive programming | Non-blocking I/O |
| jOOQ | Database queries | Type-safe SQL |
| Gradle (Kotlin DSL) | Build tool | Modern, flexible |
| React + TypeScript | Frontend | Industry standard |
| TanStack Query | Data fetching | Caching, real-time updates |
Observability:

| Technology | Purpose | Why |
|---|---|---|
| Prometheus | Metrics collection | K8s native monitoring |
| Grafana | Visualization | Unified dashboards |
| Loki | Log aggregation | Prometheus-like for logs |
| Tempo | Distributed tracing | Integrated with Grafana |
| OpenTelemetry | Instrumentation | Vendor-neutral standard |
Security:

| Technology | Purpose | Why |
|---|---|---|
| Cilium Network Policies | Network segmentation | Fine-grained control |
| Falco | Runtime security | Threat detection |
| Kyverno | Policy enforcement | K8s-native policies |
| cert-manager | TLS certificates | Automated certificate management |
| Sealed Secrets | Secret management | GitOps-friendly secrets |
CI/CD and testing:

| Technology | Purpose | Why |
|---|---|---|
| GitHub Actions | CI/CD pipelines | Native integration |
| Argo CD | GitOps deployment | Declarative, auditable |
| Trivy | Security scanning | Container vulnerability scanning |
| k6 | Load testing | Modern performance testing |
## Project Phases

### Phase 1

Goal: Kubernetes cluster running on Azure VMs, fully configured by hand.
Provisioned with Terraform:
- Resource Group
- Virtual Network with subnets (control-plane, workers, bastion)
- 3 Ubuntu 22.04 VMs
- Network Security Groups
- Azure Bastion (or SSH jump host for cost savings)
- Managed disks for persistent storage
# Example: VM configuration
resource "azurerm_linux_virtual_machine" "control_plane" {
name = "streamforge-cp"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
size = "Standard_B2s"
admin_username = "azureuser"

# azurerm requires either an SSH key or a password; this assumes a local key pair exists
admin_ssh_key {
username = "azureuser"
public_key = file("~/.ssh/id_rsa.pub")
}
network_interface_ids = [
azurerm_network_interface.control_plane.id,
]
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts"
version = "latest"
}
}

Provisioned with Pulumi:
- Azure Container Registry
- Key Vault for secrets
- DNS Zone
Follow these steps to bootstrap the cluster (the first three on every VM, the rest where noted):
- Prepare all nodes:
# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# Load required kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Set sysctl parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

- Install containerd:
sudo apt-get update
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd

- Install kubeadm, kubelet, kubectl:
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

- Initialize control plane (on control plane node only):
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint="<CONTROL_PLANE_IP>"
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

- Install Cilium CNI:
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
cilium install

- Join worker nodes:
# Run the join command from kubeadm init output on each worker
sudo kubeadm join <CONTROL_PLANE_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>

Deliverables:
- Terraform code for Azure infrastructure
- Pulumi code for ACR, KeyVault, DNS
- Ansible playbooks for K8s bootstrap (optional)
- Working 3-node Kubernetes cluster
- kubectl access configured locally
- Cilium CNI functioning
### Phase 2

Goal: Deploy all databases and streaming infrastructure on Kubernetes.
# azure-disk-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: azure-disk-standard
provisioner: disk.csi.azure.com
parameters:
skuName: Standard_LRS
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

# Install operator
kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.23/releases/cnpg-1.23.0.yaml

# postgresql-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
name: streamforge-db
namespace: database
spec:
instances: 2
storage:
size: 20Gi
storageClass: azure-disk-standard
postgresql:
parameters:
max_connections: "200"
shared_buffers: "256MB"
bootstrap:
initdb:
database: streamforge
owner: streamforge

# Install via Helm
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis-cluster \
--namespace database \
--set persistence.storageClass=azure-disk-standard \
--set cluster.nodes=6 \
--set cluster.replicas=1

# Install operator
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml

# clickhouse-cluster.yaml
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
name: streamforge-analytics
namespace: database
spec:
configuration:
clusters:
- name: analytics
layout:
shardsCount: 1
replicasCount: 2
users:
streamforge/password: "secure-password"
streamforge/networks/ip: "::/0"
defaults:
templates:
dataVolumeClaimTemplate: data-volume
templates:
volumeClaimTemplates:
- name: data-volume
spec:
accessModes: [ReadWriteOnce]
resources:
requests:
storage: 50Gi
storageClassName: azure-disk-standard

# Install Strimzi
kubectl create namespace kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka

# kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: streamforge
namespace: kafka
spec:
kafka:
version: 3.7.0
replicas: 3
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: tls
port: 9093
type: internal
tls: true
config:
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
default.replication.factor: 3
min.insync.replicas: 2
storage:
type: persistent-claim
size: 20Gi
class: azure-disk-standard
zookeeper: # note: recent Strimzi releases are KRaft-only; pin an operator version that still supports ZooKeeper
replicas: 3
storage:
type: persistent-claim
size: 10Gi
class: azure-disk-standard
entityOperator:
topicOperator: {}
userOperator: {}

Deliverables:
- StorageClass configured for Azure disks
- PostgreSQL cluster running (2 instances)
- Redis cluster running (6 nodes)
- ClickHouse cluster running (2 replicas)
- Kafka cluster running (3 brokers)
- All operators understood and documented
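
With the data platform up, a quick way to exercise the Kafka deliverable end to end is a plain Java producer against the Strimzi bootstrap service. A hedged sketch: the service name comes from the cluster manifest above, it assumes the game-events topic exists, and it must run from a pod inside the cluster for the name to resolve:

```java
// Hedged smoke test for the Kafka deliverable: sends one record through the
// bootstrap service created by the Strimzi manifest above and blocks for the ack.
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class KafkaSmokeTest {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "streamforge-kafka-bootstrap.kafka:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (var producer = new KafkaProducer<String, String>(props)) {
            var metadata = producer
                    .send(new ProducerRecord<>("game-events", "smoke", "hello"))
                    .get(); // block until the brokers acknowledge the write
            System.out.printf("Wrote to %s-%d@%d%n",
                    metadata.topic(), metadata.partition(), metadata.offset());
        }
    }
}
```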
### Phase 3

Goal: Build three Java microservices (event ingestion, stream processing, query), plus an event generator, from scratch.
// settings.gradle.kts
rootProject.name = "streamforge"
include(
"common",
"event-ingestion",
"stream-processor",
"query-service",
"event-generator"
)

// build.gradle.kts (root)
plugins {
java
id("org.springframework.boot") version "3.3.0" apply false
id("io.spring.dependency-management") version "1.1.5" apply false
id("com.google.cloud.tools.jib") version "3.4.2" apply false
}
subprojects {
apply(plugin = "java")
java {
toolchain {
languageVersion = JavaLanguageVersion.of(21)
}
}
repositories {
mavenCentral()
}
tasks.withType<Test> {
useJUnitPlatform()
}
}

// GameEvent.java
public record GameEvent(
String eventId,
String playerId,
String eventType,
Map<String, Object> payload,
Instant timestamp,
String sessionId
) {
public enum EventType {
LOGIN, LOGOUT, MATCH_START, MATCH_END,
PURCHASE, ACHIEVEMENT, LEVEL_UP
}
}
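
As written, the record carries no validation constraints even though the controller below receives it with @Valid. A hedged variant, assuming jakarta.validation is on the classpath (the annotations are illustrative, not part of the original design):

```java
// Hypothetical GameEvent variant with Bean Validation constraints, so that
// @Valid in the controller below actually rejects malformed input.
import jakarta.validation.constraints.NotBlank;
import java.time.Instant;
import java.util.Map;

public record GameEvent(
        String eventId,             // assigned server-side during enrichment
        @NotBlank String playerId,
        @NotBlank String eventType, // should match an EventType name
        Map<String, Object> payload,
        Instant timestamp,          // assigned server-side during enrichment
        @NotBlank String sessionId
) { }
```

// EventController.java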
@RestController
@RequestMapping("/api/v1/events")
@RequiredArgsConstructor
@Slf4j
public class EventController {
private final KafkaTemplate<String, GameEvent> kafkaTemplate;
private final MeterRegistry meterRegistry;
@PostMapping
public Mono<ResponseEntity<EventResponse>> ingestEvent(
@Valid @RequestBody GameEvent event) {
return Mono.fromCallable(() -> {
var enrichedEvent = enrichEvent(event);
kafkaTemplate.send("game-events",
enrichedEvent.playerId(),
enrichedEvent);
meterRegistry.counter("events.ingested",
"type", event.eventType()).increment();
log.debug("Ingested event: {}", enrichedEvent.eventId());
return ResponseEntity.accepted()
.body(new EventResponse(enrichedEvent.eventId(), "accepted"));
}).subscribeOn(Schedulers.boundedElastic());
}
@PostMapping("/batch")
public Mono<ResponseEntity<BatchResponse>> ingestBatch(
@Valid @RequestBody List<GameEvent> events) {
return Flux.fromIterable(events)
.flatMap(this::processEvent) // mirrors the single-event path (enrich + publish); helper not shown
.collectList()
.map(results -> ResponseEntity.accepted()
.body(new BatchResponse(results.size(), "accepted")));
}
// The server assigns eventId and timestamp; client-supplied values are overwritten
private GameEvent enrichEvent(GameEvent event) {
return new GameEvent(
UUID.randomUUID().toString(),
event.playerId(),
event.eventType(),
event.payload(),
Instant.now(),
event.sessionId()
);
}
}

// KafkaConfig.java
@Configuration
public class KafkaConfig {
@Bean
public ProducerFactory<String, GameEvent> producerFactory(
KafkaProperties properties) {
Map<String, Object> config = new HashMap<>(properties.buildProducerProperties());
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
config.put(ProducerConfig.ACKS_CONFIG, "all");
config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
return new DefaultKafkaProducerFactory<>(config);
}
@Bean
public KafkaTemplate<String, GameEvent> kafkaTemplate(
ProducerFactory<String, GameEvent> producerFactory) {
return new KafkaTemplate<>(producerFactory);
}
}

// PlayerSessionAnalyzer.java
public class PlayerSessionAnalyzer {
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60000);
env.setParallelism(2);
KafkaSource<GameEvent> source = KafkaSource.<GameEvent>builder()
.setBootstrapServers("streamforge-kafka-bootstrap.kafka:9092")
.setTopics("game-events")
.setGroupId("session-analyzer")
.setStartingOffsets(OffsetsInitializer.earliest())
.setValueOnlyDeserializer(new GameEventDeserializer())
.build();
DataStream<GameEvent> events = env.fromSource(
source,
WatermarkStrategy.<GameEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
.withTimestampAssigner((event, timestamp) ->
event.timestamp().toEpochMilli()),
"Kafka Source"
);
// Session analysis - group events into player sessions
DataStream<PlayerSession> sessions = events
.keyBy(GameEvent::playerId)
.window(EventTimeSessionWindows.withGap(Time.minutes(30)))
.aggregate(new SessionAggregator(), new SessionWindowFunction());
// Real-time metrics - events per minute by type
DataStream<EventMetrics> metrics = events
.keyBy(GameEvent::eventType)
.window(TumblingEventTimeWindows.of(Time.minutes(1)))
.aggregate(new MetricsAggregator());
// Anomaly detection - too many purchases in short time
Pattern<GameEvent, ?> suspiciousPattern = Pattern.<GameEvent>begin("start")
.where(new SimpleCondition<>() {
@Override
public boolean filter(GameEvent event) {
return "PURCHASE".equals(event.eventType());
}
})
.timesOrMore(10)
.within(Time.minutes(5));
DataStream<Alert> alerts = CEP.pattern(
events.keyBy(GameEvent::playerId),
suspiciousPattern
).process(new SuspiciousActivityDetector());
// Sinks
sessions.addSink(new ClickHouseSink<>("player_sessions"));
metrics.addSink(new ClickHouseSink<>("event_metrics"));
alerts.addSink(new AlertSink());
env.execute("StreamForge Analytics");
}
}
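
The job above references a GameEventDeserializer that is never defined. A minimal sketch, assuming events are JSON-encoded and Jackson (with the JavaTimeModule for the Instant field) is on the classpath:

```java
// Sketch of the value-only deserializer used by the KafkaSource above.
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import java.io.IOException;

public class GameEventDeserializer implements DeserializationSchema<GameEvent> {
    private transient ObjectMapper mapper;

    @Override
    public GameEvent deserialize(byte[] message) throws IOException {
        if (mapper == null) { // lazy init: ObjectMapper is not serializable
            mapper = new ObjectMapper().registerModule(new JavaTimeModule());
        }
        return mapper.readValue(message, GameEvent.class);
    }

    @Override
    public boolean isEndOfStream(GameEvent nextElement) {
        return false; // the game-events topic is unbounded
    }

    @Override
    public TypeInformation<GameEvent> getProducedType() {
        return TypeInformation.of(GameEvent.class);
    }
}
```

// SessionAggregator.java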
public class SessionAggregator implements
AggregateFunction<GameEvent, SessionAccumulator, SessionAccumulator> {
@Override
public SessionAccumulator createAccumulator() {
return new SessionAccumulator();
}
@Override
public SessionAccumulator add(GameEvent event, SessionAccumulator acc) {
if (acc.sessionStart == null) {
acc.sessionStart = event.timestamp();
acc.playerId = event.playerId();
}
acc.sessionEnd = event.timestamp();
acc.eventCount++;
acc.eventTypes.merge(event.eventType(), 1, Integer::sum);
if ("PURCHASE".equals(event.eventType())) {
acc.totalSpent += extractAmount(event.payload());
}
return acc;
}
@Override
public SessionAccumulator getResult(SessionAccumulator acc) {
return acc;
}
@Override
public SessionAccumulator merge(SessionAccumulator a, SessionAccumulator b) {
a.eventCount += b.eventCount;
b.eventTypes.forEach((k, v) -> a.eventTypes.merge(k, v, Integer::sum));
a.totalSpent += b.totalSpent;
if (a.playerId == null) {
a.playerId = b.playerId;
}
// Keep the earliest start and the latest end across merged panes
if (a.sessionStart == null ||
(b.sessionStart != null && b.sessionStart.isBefore(a.sessionStart))) {
a.sessionStart = b.sessionStart;
}
if (b.sessionEnd.isAfter(a.sessionEnd)) {
a.sessionEnd = b.sessionEnd;
}
return a;
}
}
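
ClickHouseSink is also custom. One hedged way to implement it is ClickHouse's HTTP interface with the JSONEachRow format; the service hostname below is an assumption, and a production version would batch rows and retry failures:

```java
// Sketch of a generic ClickHouse sink: one JSONEachRow insert per record.
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class ClickHouseSink<T> extends RichSinkFunction<T> {
    private final String table;
    private transient ObjectMapper mapper;
    private transient HttpClient client;

    public ClickHouseSink(String table) {
        this.table = table;
    }

    @Override
    public void open(Configuration parameters) {
        mapper = new ObjectMapper().registerModule(new JavaTimeModule());
        client = HttpClient.newHttpClient();
    }

    @Override
    public void invoke(T value, Context context) throws Exception {
        String query = "INSERT INTO " + table + " FORMAT JSONEachRow";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://clickhouse.database:8123/?query=" // hostname assumed
                        + URLEncoder.encode(query, StandardCharsets.UTF_8)))
                .POST(HttpRequest.BodyPublishers.ofString(mapper.writeValueAsString(value)))
                .build();
        client.send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```

// AnalyticsController.java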
@RestController
@RequestMapping("/api/v1/analytics")
@RequiredArgsConstructor
public class AnalyticsController {
private final AnalyticsService analyticsService;
private final CacheManager cacheManager;
@GetMapping("/dashboard")
public Mono<DashboardData> getDashboard() {
return analyticsService.getDashboardData();
}
@GetMapping("/players/{playerId}/sessions")
public Flux<PlayerSession> getPlayerSessions(
@PathVariable String playerId,
@RequestParam(defaultValue = "7") int days) {
return analyticsService.getPlayerSessions(playerId, days);
}
@GetMapping("/leaderboard")
@Cacheable(value = "leaderboard", key = "#metric + '-' + #limit")
public Mono<List<LeaderboardEntry>> getLeaderboard(
@RequestParam(defaultValue = "score") String metric,
@RequestParam(defaultValue = "100") int limit) {
return analyticsService.getLeaderboard(metric, limit);
}
@GetMapping(value = "/stream/events", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<ServerSentEvent<EventMetrics>> streamMetrics() {
return analyticsService.getMetricsStream()
.map(metrics -> ServerSentEvent.<EventMetrics>builder()
.data(metrics)
.build());
}
}

// AnalyticsRepository.java (ClickHouse queries with jOOQ)
@Repository
@RequiredArgsConstructor
public class AnalyticsRepository {
private final DSLContext clickhouse;
public List<EventMetrics> getMetricsForPeriod(Instant start, Instant end) {
return clickhouse
.select(
field("event_type"),
count().as("event_count"),
countDistinct(field("player_id")).as("unique_players")
)
.from(table("event_metrics"))
.where(field("timestamp").between(start, end))
.groupBy(field("event_type"))
.fetchInto(EventMetrics.class);
}
public List<LeaderboardEntry> getTopPlayers(String metric, int limit) {
return clickhouse
.select(
field("player_id"),
sum(field(metric)).as("value")
)
.from(table("player_sessions"))
.where(field("session_date").greaterThan(LocalDate.now().minusDays(7)))
.groupBy(field("player_id"))
.orderBy(field("value").desc())
.limit(limit)
.fetchInto(LeaderboardEntry.class);
}
}
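
The stack table earlier lists Redis for leaderboards, while the controller above only reads through a cache. A hedged sketch of a sorted-set-backed leaderboard with Spring Data Redis, assuming a StringRedisTemplate bean is configured and that LeaderboardEntry has a (playerId, value) constructor:

```java
// Sketch of a Redis sorted-set leaderboard (ZINCRBY / ZREVRANGE semantics).
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.ZSetOperations;
import org.springframework.stereotype.Service;
import java.util.List;
import java.util.Set;

@Service
public class LeaderboardService {
    private final StringRedisTemplate redis;

    public LeaderboardService(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public void recordScore(String metric, String playerId, double delta) {
        // ZINCRBY keeps a running total per player
        redis.opsForZSet().incrementScore("leaderboard:" + metric, playerId, delta);
    }

    public List<LeaderboardEntry> top(String metric, int limit) {
        Set<ZSetOperations.TypedTuple<String>> tuples = redis.opsForZSet()
                .reverseRangeWithScores("leaderboard:" + metric, 0, limit - 1L);
        return tuples == null ? List.of() : tuples.stream()
                .map(t -> new LeaderboardEntry(t.getValue(), t.getScore()))
                .toList();
    }
}
```

// Integration test with Testcontainers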
@SpringBootTest
@Testcontainers
class EventIngestionIntegrationTest {
@Container
static KafkaContainer kafka = new KafkaContainer(
DockerImageName.parse("confluentinc/cp-kafka:7.6.0"));
@DynamicPropertySource
static void kafkaProperties(DynamicPropertyRegistry registry) {
registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
}
@Autowired
private WebTestClient webTestClient;
@Test
void shouldIngestEvent() {
var event = new GameEvent(
null, "player-1", "LOGIN",
Map.of("platform", "mobile"), null, "session-1"
);
webTestClient.post()
.uri("/api/v1/events")
.bodyValue(event)
.exchange()
.expectStatus().isAccepted()
.expectBody()
.jsonPath("$.eventId").isNotEmpty()
.jsonPath("$.status").isEqualTo("accepted");
}
}

Deliverables:
- Gradle multi-module project setup
- Event Ingestion Service with Kafka producer
- Stream Processor with Flink jobs
- Query Service with ClickHouse integration
- Event Generator for testing (see the sketch after this list)
- Unit and integration tests
- Docker images via Jib
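
The event-generator module declared in the Gradle settings has no code shown above. A hedged sketch of the simulator loop; the in-cluster service URL and the event mix are assumptions, not part of the original design:

```java
// Hedged sketch of the event-generator module: simulates players by POSTing
// random events to the ingestion API.
import com.fasterxml.jackson.databind.ObjectMapper;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import java.util.Random;

public class EventGenerator {
    private static final String[] EVENT_TYPES =
            {"LOGIN", "MATCH_START", "MATCH_END", "PURCHASE", "ACHIEVEMENT", "LEVEL_UP", "LOGOUT"};

    public static void main(String[] args) throws Exception {
        var client = HttpClient.newHttpClient();
        var mapper = new ObjectMapper();
        var random = new Random();
        while (true) {
            String playerId = "player-" + random.nextInt(1000);
            Map<String, Object> event = Map.of(
                    "playerId", playerId,
                    "eventType", EVENT_TYPES[random.nextInt(EVENT_TYPES.length)],
                    "payload", Map.of("platform", random.nextBoolean() ? "mobile" : "pc"),
                    "sessionId", "session-" + playerId);
            var request = HttpRequest.newBuilder()
                    .uri(URI.create("http://event-ingestion.streamforge:8080/api/v1/events"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(mapper.writeValueAsString(event)))
                    .build();
            client.send(request, HttpResponse.BodyHandlers.discarding());
            Thread.sleep(50); // roughly 20 events/sec per generator instance
        }
    }
}
```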
### Phase 4

Goal: Full observability with metrics, logs, and traces.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--values observability/prometheus-values.yaml

# prometheus-values.yaml
grafana:
adminPassword: "your-secure-password"
persistence:
enabled: true
size: 10Gi
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'custom'
folder: 'StreamForge'
type: file
options:
path: /var/lib/grafana/dashboards/custom
prometheus:
prometheusSpec:
retention: 15d
storageSpec:
volumeClaimTemplate:
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
alertmanager:
config:
route:
receiver: 'slack'
receivers:
- name: 'slack'
slack_configs:
- channel: '#alerts'
api_url: 'https://hooks.slack.com/services/xxx'

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
--namespace monitoring \
--set promtail.enabled=true \
--set loki.persistence.enabled=true \
--set loki.persistence.size=20Gi

helm install tempo grafana/tempo \
--namespace monitoring \
--set persistence.enabled=true \
--set persistence.size=10Gi

# otel-collector.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
namespace: monitoring
data:
config.yaml: |
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
processors:
batch:
timeout: 10s
memory_limiter:
check_interval: 1s
limit_mib: 512
exporters:
prometheus:
endpoint: "0.0.0.0:8889"
loki:
endpoint: "http://loki:3100/loki/api/v1/push"
otlp/tempo:
endpoint: "tempo:4317"
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [otlp/tempo]
metrics:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [prometheus]
logs:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [loki]

// build.gradle.kts - add OpenTelemetry
dependencies {
implementation("io.opentelemetry:opentelemetry-api")
implementation("io.opentelemetry.instrumentation:opentelemetry-instrumentation-annotations")
runtimeOnly("io.opentelemetry.javaagent:opentelemetry-javaagent:2.4.0")
}

// Custom instrumentation
@Service
@RequiredArgsConstructor
public class EventProcessingService {
private final Tracer tracer;
@WithSpan("process-game-event")
public void processEvent(
@SpanAttribute("player.id") String playerId,
GameEvent event) {
Span.current().setAttribute("event.type", event.eventType());
// Processing logic...
}
}

# Kubernetes deployment with OTel agent
spec:
containers:
- name: event-ingestion
image: streamforge/event-ingestion:latest
env:
- name: JAVA_TOOL_OPTIONS
value: "-javaagent:/otel/opentelemetry-javaagent.jar"
- name: OTEL_SERVICE_NAME
value: "event-ingestion"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector:4317"
- name: OTEL_METRICS_EXPORTER
value: "otlp"
- name: OTEL_LOGS_EXPORTER
value: "otlp"- Application Overview: Request rate, error rate, latency (RED metrics)
- Kafka Dashboard: Consumer lag, partition distribution, throughput
- JVM Metrics: Heap usage, GC pauses, thread count
- Database Connections: Pool utilization, query latency
- Business Metrics: Events/sec by type, active players, revenue
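
The business-metrics dashboard needs the Java services to publish custom meters. A minimal Micrometer sketch; the meter names are illustrative, and Spring Boot auto-configures the MeterRegistry:

```java
// Minimal Micrometer sketch for the business metrics listed above.
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;
import java.util.concurrent.atomic.AtomicInteger;

@Component
public class BusinessMetrics {
    private final AtomicInteger activePlayers = new AtomicInteger();
    private final MeterRegistry registry;

    public BusinessMetrics(MeterRegistry registry) {
        this.registry = registry;
        // Gauge reads the current value on each Prometheus scrape
        Gauge.builder("streamforge.players.active", activePlayers, AtomicInteger::get)
                .register(registry);
    }

    public void onEvent(String eventType) {
        // One counter per event type -> "events/sec by type" in Grafana via rate()
        registry.counter("streamforge.events.total", "type", eventType).increment();
    }

    public void onPurchase(double amount) {
        // Revenue as a monotonically increasing counter
        registry.counter("streamforge.revenue.eur").increment(amount);
    }

    public void playerConnected()    { activePlayers.incrementAndGet(); }
    public void playerDisconnected() { activePlayers.decrementAndGet(); }
}
```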
Deliverables:
- Prometheus + Grafana deployed
- Loki for log aggregation
- Tempo for distributed tracing
- OpenTelemetry Collector configured
- Java apps instrumented with OTel
- 5+ custom Grafana dashboards
- Alert rules configured
### Phase 5

Goal: React dashboard with real-time updates.
npm create vite@latest frontend -- --template react-ts
cd frontend
npm install @tanstack/react-query recharts lucide-react
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p

// src/App.tsx
import { QueryClient, QueryClientProvider } from '@tanstack/react-query';
import { Dashboard } from './components/Dashboard';
const queryClient = new QueryClient({
defaultOptions: {
queries: {
refetchInterval: 5000,
staleTime: 2000,
},
},
});
export function App() {
return (
<QueryClientProvider client={queryClient}>
<div className="min-h-screen bg-gray-900 text-white">
<Dashboard />
</div>
</QueryClientProvider>
);
}

// src/components/Dashboard.tsx
import { useQuery } from '@tanstack/react-query';
import { MetricCard } from './MetricCard';
import { EventChart } from './EventChart';
import { Leaderboard } from './Leaderboard';
import { useEventStream } from '../hooks/useEventStream';
import { fetchDashboardData } from '../api/analytics';
import { LoadingSpinner } from './LoadingSpinner'; // used below; path assumed
export function Dashboard() {
const { data, isLoading } = useQuery({
queryKey: ['dashboard'],
queryFn: fetchDashboardData,
});
const liveEventRate = useEventStream();
if (isLoading) {
return <LoadingSpinner />;
}
return (
<div className="p-6">
<header className="mb-8">
<h1 className="text-3xl font-bold">StreamForge Analytics</h1>
<p className="text-gray-400">Real-time gaming analytics dashboard</p>
</header>
<div className="grid grid-cols-4 gap-4 mb-8">
<MetricCard
title="Active Players"
value={data?.activePlayers}
trend={data?.playersTrend}
icon="users"
/>
<MetricCard
title="Events/sec"
value={liveEventRate}
icon="activity"
live
/>
<MetricCard
title="Matches Today"
value={data?.matchesToday}
trend={data?.matchesTrend}
icon="gamepad"
/>
<MetricCard
title="Revenue (24h)"
value={data?.revenue24h}
format="currency"
trend={data?.revenueTrend}
icon="dollar-sign"
/>
</div>
<div className="grid grid-cols-3 gap-6">
<div className="col-span-2">
<EventChart data={data?.eventHistory} />
</div>
<div>
<Leaderboard players={data?.topPlayers} />
</div>
</div>
</div>
);
}

// src/hooks/useEventStream.ts
import { useState, useEffect } from 'react';
export function useEventStream() {
  const [eventRate, setEventRate] = useState(0);
  useEffect(() => {
    let eventSource: EventSource | null = null;
    let retryTimer: ReturnType<typeof setTimeout> | undefined;
    const connect = () => {
      eventSource = new EventSource('/api/v1/analytics/stream/events');
      eventSource.onmessage = (event) => {
        const data = JSON.parse(event.data);
        setEventRate(data.eventsPerSecond);
      };
      eventSource.onerror = () => {
        // Close the broken stream and reconnect after a delay
        eventSource?.close();
        retryTimer = setTimeout(connect, 5000);
      };
    };
    connect();
    return () => {
      if (retryTimer) clearTimeout(retryTimer);
      eventSource?.close();
    };
  }, []);
  return eventRate;
}

// src/components/EventChart.tsx
import {
AreaChart, Area, XAxis, YAxis, Tooltip,
ResponsiveContainer, CartesianGrid
} from 'recharts';
interface EventChartProps {
data: Array<{
timestamp: string;
logins: number;
matches: number;
purchases: number;
}>;
}
export function EventChart({ data }: EventChartProps) {
return (
<div className="bg-gray-800 rounded-lg p-4">
<h2 className="text-xl font-semibold mb-4">Event Activity</h2>
<ResponsiveContainer width="100%" height={300}>
<AreaChart data={data}>
<CartesianGrid strokeDasharray="3 3" stroke="#374151" />
<XAxis
dataKey="timestamp"
stroke="#9CA3AF"
tickFormatter={(value) => new Date(value).toLocaleTimeString()}
/>
<YAxis stroke="#9CA3AF" />
<Tooltip
contentStyle={{
backgroundColor: '#1F2937',
border: 'none',
borderRadius: '8px'
}}
/>
<Area
type="monotone"
dataKey="logins"
stackId="1"
stroke="#3B82F6"
fill="#3B82F6"
fillOpacity={0.6}
/>
<Area
type="monotone"
dataKey="matches"
stackId="1"
stroke="#10B981"
fill="#10B981"
fillOpacity={0.6}
/>
<Area
type="monotone"
dataKey="purchases"
stackId="1"
stroke="#F59E0B"
fill="#F59E0B"
fillOpacity={0.6}
/>
</AreaChart>
</ResponsiveContainer>
</div>
);
}

Deliverables:
- React + TypeScript + Vite setup
- TanStack Query for data fetching
- Real-time updates via SSE
- Responsive dashboard layout
- Interactive charts with Recharts
- Leaderboard component
- Docker build for frontend
### Phase 6

Goal: Comprehensive CI/CD pipeline with security scanning.
# .github/workflows/ci.yaml
name: CI
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
env:
REGISTRY: streamforgeacr.azurecr.io
JAVA_VERSION: '21'
NODE_VERSION: '20'
jobs:
changes:
runs-on: ubuntu-latest
outputs:
java: ${{ steps.filter.outputs.java }}
frontend: ${{ steps.filter.outputs.frontend }}
infra: ${{ steps.filter.outputs.infra }}
steps:
- uses: actions/checkout@v4
- uses: dorny/paths-filter@v3
id: filter
with:
filters: |
java:
- 'services/**'
- 'build.gradle.kts'
frontend:
- 'frontend/**'
infra:
- 'infrastructure/**'
- 'kubernetes/**'
build-java:
needs: changes
if: needs.changes.outputs.java == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up JDK
uses: actions/setup-java@v4
with:
distribution: 'temurin'
java-version: ${{ env.JAVA_VERSION }}
cache: 'gradle'
- name: Build and Test
run: ./gradlew build test
- name: Upload test results
uses: actions/upload-artifact@v4
if: always()
with:
name: test-results
path: '**/build/test-results/test/*.xml'
- name: Build Docker images
if: github.event_name == 'push'
run: |
./gradlew jib \
-Djib.to.image=${{ env.REGISTRY }}/event-ingestion:${{ github.sha }} \
-Djib.to.auth.username=${{ secrets.ACR_USERNAME }} \
-Djib.to.auth.password=${{ secrets.ACR_PASSWORD }}
build-frontend:
needs: changes
if: needs.changes.outputs.frontend == 'true'
runs-on: ubuntu-latest
defaults:
run:
working-directory: frontend
steps:
- uses: actions/checkout@v4
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
cache-dependency-path: frontend/package-lock.json
- name: Install dependencies
run: npm ci
- name: Lint
run: npm run lint
- name: Type check
run: npm run typecheck
- name: Test
run: npm run test -- --coverage
- name: Build
run: npm run build
- name: Build and push Docker image
if: github.event_name == 'push'
uses: docker/build-push-action@v5
with:
context: frontend
push: true
tags: ${{ env.REGISTRY }}/frontend:${{ github.sha }}
validate-infrastructure:
needs: changes
if: needs.changes.outputs.infra == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
- name: Terraform Format Check
run: terraform fmt -check -recursive
working-directory: infrastructure/terraform
- name: Terraform Validate
run: |
terraform init -backend=false
terraform validate
working-directory: infrastructure/terraform/environments/dev
- name: Setup Pulumi
uses: pulumi/actions@v5
- name: Pulumi Preview
run: pulumi preview --stack dev
working-directory: infrastructure/pulumi
env:
PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}
security-scan:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
scan-type: 'fs'
scan-ref: '.'
severity: 'CRITICAL,HIGH'
exit-code: '1'
- name: Run TruffleHog (secrets scanning)
uses: trufflesecurity/trufflehog@main
with:
extra_args: --only-verified
- name: OWASP Dependency Check
run: ./gradlew dependencyCheckAnalyze # requires the org.owasp.dependencycheck Gradle plugin
- name: Upload OWASP report
uses: actions/upload-artifact@v4
with:
name: dependency-check-report
path: build/reports/dependency-check-report.html
kubernetes-lint:
needs: changes
if: needs.changes.outputs.infra == 'true'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install kubeconform
run: |
wget https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz
tar xf kubeconform-linux-amd64.tar.gz
sudo mv kubeconform /usr/local/bin/
- name: Validate Kubernetes manifests
run: |
kubeconform -strict -summary kubernetes/
- name: Run kube-linter
uses: stackrox/kube-linter-action@v1
with:
directory: kubernetes/

# .github/workflows/deploy.yaml
name: Deploy
on:
push:
branches: [main]
workflow_dispatch:
inputs:
environment:
description: 'Environment to deploy to'
required: true
default: 'staging'
type: choice
options:
- staging
- production
jobs:
deploy:
runs-on: ubuntu-latest
environment: ${{ github.event.inputs.environment || 'staging' }}
steps:
- uses: actions/checkout@v4
- name: Azure Login
uses: azure/login@v2
with:
creds: ${{ secrets.AZURE_CREDENTIALS }}
- name: Set up kubeconfig
uses: azure/k8s-set-context@v4
with:
method: kubeconfig
kubeconfig: ${{ secrets.KUBE_CONFIG }}
- name: Deploy to Kubernetes
run: |
# Update image tags
cd kubernetes/overlays/${{ github.event.inputs.environment || 'staging' }}
kustomize edit set image \
event-ingestion=${{ env.REGISTRY }}/event-ingestion:${{ github.sha }} \
query-service=${{ env.REGISTRY }}/query-service:${{ github.sha }} \
frontend=${{ env.REGISTRY }}/frontend:${{ github.sha }}
# Apply changes
kubectl apply -k .
# Wait for rollout
kubectl rollout status deployment/event-ingestion -n streamforge
kubectl rollout status deployment/query-service -n streamforge
kubectl rollout status deployment/frontend -n streamforge
- name: Run smoke tests
run: |
INGRESS_IP=$(kubectl get ingress -n streamforge -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')
curl -f http://${INGRESS_IP}/api/v1/health || exit 1
- name: Notify on failure
if: failure()
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "Deployment failed for ${{ github.repository }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": ":x: Deployment to ${{ github.event.inputs.environment || 'staging' }} failed"
}
}
]
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

# .github/workflows/load-test.yaml
name: Load Test
on:
workflow_dispatch:
inputs:
duration:
description: 'Test duration'
required: true
default: '5m'
vus:
description: 'Virtual users'
required: true
default: '50'
jobs:
load-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install k6
run: |
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update
sudo apt-get install k6
- name: Run load test
run: |
k6 run \
--duration ${{ github.event.inputs.duration }} \
--vus ${{ github.event.inputs.vus }} \
--out json=results.json \
tests/load/event-ingestion.js
- name: Upload results
uses: actions/upload-artifact@v4
with:
name: load-test-results
path: results.json

Deliverables:
- CI pipeline with path-based triggers
- Java build with Gradle and Jib
- Frontend build with Node
- Infrastructure validation (Terraform + Pulumi)
- Security scanning (Trivy, TruffleHog, OWASP)
- Kubernetes manifest validation
- Deployment pipeline with rollout status
- Load testing workflow with k6
### Phase 7

Goal: Production-grade security posture.
# network-policies/api-service.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: api-service-policy
namespace: streamforge
spec:
endpointSelector:
matchLabels:
app: event-ingestion
ingress:
- fromEndpoints:
- matchLabels:
app: ingress-nginx
toPorts:
- ports:
- port: "8080"
protocol: TCP
egress:
- toEndpoints:
- matchLabels:
app.kubernetes.io/name: kafka
toPorts:
- ports:
- port: "9092"
- toEndpoints:
- matchLabels:
io.kubernetes.pod.namespace: kube-system
k8s-app: kube-dns
toPorts:
- ports:
- port: "53"
protocol: UDP

# network-policies/database.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: postgres-policy
namespace: database
spec:
endpointSelector:
matchLabels:
cnpg.io/cluster: streamforge-db
ingress:
- fromEndpoints:
- matchLabels:
app: query-service
- matchLabels:
app: stream-processor
toPorts:
- ports:
- port: "5432"
egress:
- toEndpoints:
- matchLabels:
cnpg.io/cluster: streamforge-db

# policies/require-labels.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-labels
spec:
validationFailureAction: Enforce
rules:
- name: require-team-label
match:
any:
- resources:
kinds:
- Deployment
- StatefulSet
validate:
message: "Label 'team' is required"
pattern:
metadata:
labels:
team: "?*"# policies/disallow-privileged.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: disallow-privileged-containers
spec:
validationFailureAction: Enforce
rules:
- name: deny-privileged
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Privileged containers are not allowed"
pattern:
spec:
containers:
- securityContext:
privileged: "false"# policies/require-resource-limits.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-resource-limits
spec:
validationFailureAction: Enforce
rules:
- name: require-limits
match:
any:
- resources:
kinds:
- Pod
validate:
message: "CPU and memory limits are required"
pattern:
spec:
containers:
- resources:
limits:
memory: "?*"
cpu: "?*"# falco/custom-rules.yaml
customRules:
rules-streamforge.yaml: |-
- rule: Unexpected outbound connection from Java app
desc: Detect outbound connections from Java apps to unexpected destinations
condition: >
outbound and
container.image.repository contains "streamforge" and
not (fd.sip in (kafka_ips, postgres_ips, clickhouse_ips, redis_ips))
output: >
Unexpected outbound connection from StreamForge app
(command=%proc.cmdline connection=%fd.name container=%container.name)
priority: WARNING
- rule: Database credential access
desc: Detect access to database credential files
condition: >
open_read and
fd.name contains "database" and
fd.name contains "password"
output: >
Database credential file accessed
(user=%user.name command=%proc.cmdline file=%fd.name)
priority: CRITICAL

# Install Sealed Secrets controller
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm install sealed-secrets sealed-secrets/sealed-secrets \
--namespace kube-system

# Create a sealed secret
kubectl create secret generic db-credentials \
--from-literal=username=streamforge \
--from-literal=password=super-secret \
--dry-run=client -o yaml | \
kubeseal --format yaml > sealed-db-credentials.yaml

# cert-manager/cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: admin@example.com
privateKeySecretRef:
name: letsencrypt-prod-key
solvers:
- http01:
ingress:
class: nginx

# ingress with automatic TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: streamforge-ingress
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
ingressClassName: nginx
tls:
- hosts:
- streamforge.example.com
secretName: streamforge-tls
rules:
- host: streamforge.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-gateway
port:
number: 80

# namespace with restricted security
apiVersion: v1
kind: Namespace
metadata:
name: streamforge
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted

Deliverables:
- Cilium network policies for all services
- Kyverno policies enforced
- Falco runtime monitoring
- Sealed Secrets for GitOps
- cert-manager with Let's Encrypt
- Pod Security Standards enforced
- RBAC properly configured
- Security documentation
### Phase 8

Goal: Hands-on experience with cutting-edge tools.
| Tool | Category | Integration |
|---|---|---|
| Crossplane | IaC | Manage Azure resources via K8s CRDs |
| Argo CD | GitOps | Declarative deployments |
| Backstage | Developer Portal | Service catalog |
| OpenCost | FinOps | Cost monitoring per service |
| Kubeshark | Debugging | API traffic analysis |
| Karpenter | Autoscaling | Smart node provisioning |
# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: streamforge
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/your-org/streamforge
targetRevision: HEAD
path: kubernetes/overlays/production
destination:
server: https://kubernetes.default.svc
namespace: streamforge
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true

# crossplane/azure-provider.yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
name: provider-azure
spec:
package: xpkg.upbound.io/upbound/provider-azure:v0.42.0

# crossplane/storage-account.yaml
apiVersion: storage.azure.upbound.io/v1beta1
kind: Account
metadata:
name: streamforge-backup
spec:
forProvider:
resourceGroupName: streamforge-rg
location: westeurope
accountTier: Standard
accountReplicationType: LRS
providerConfigRef:
name: azure-provider

helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
--namespace opencost \
--create-namespace \
--set prometheus.enabled=false \
--set prometheus.external.url=http://prometheus-server.monitoring:9090

Deliverables:
- Argo CD for GitOps deployments
- Crossplane managing some Azure resources
- OpenCost dashboard in Grafana
- Kubeshark for debugging
- Documentation on each tool
## Repository Structure

streamforge/
├── .github/
│ └── workflows/
│ ├── ci.yaml
│ ├── deploy.yaml
│ └── load-test.yaml
├── infrastructure/
│ ├── terraform/
│ │ ├── modules/
│ │ │ ├── network/
│ │ │ ├── compute/
│ │ │ └── security/
│ │ └── environments/
│ │ ├── dev/
│ │ ├── staging/
│ │ └── prod/
│ ├── pulumi/
│ │ └── index.ts
│ └── ansible/
│ ├── playbooks/
│ └── inventory/
├── kubernetes/
│ ├── base/
│ │ ├── event-ingestion/
│ │ ├── stream-processor/
│ │ ├── query-service/
│ │ └── frontend/
│ ├── overlays/
│ │ ├── dev/
│ │ ├── staging/
│ │ └── production/
│ └── helm-values/
│ ├── kafka-values.yaml
│ ├── postgres-values.yaml
│ └── monitoring-values.yaml
├── services/
│ ├── common/
│ │ └── src/main/java/
│ ├── event-ingestion/
│ │ ├── src/
│ │ └── build.gradle.kts
│ ├── stream-processor/
│ │ ├── src/
│ │ └── build.gradle.kts
│ ├── query-service/
│ │ ├── src/
│ │ └── build.gradle.kts
│ └── event-generator/
│ └── src/
├── frontend/
│ ├── src/
│ │ ├── components/
│ │ ├── hooks/
│ │ ├── api/
│ │ └── App.tsx
│ ├── package.json
│ └── Dockerfile
├── observability/
│ ├── dashboards/
│ │ ├── application.json
│ │ ├── kafka.json
│ │ └── jvm.json
│ ├── alerts/
│ │ └── rules.yaml
│ └── otel-collector-config.yaml
├── security/
│ ├── network-policies/
│ ├── kyverno-policies/
│ └── falco-rules/
├── tests/
│ ├── load/
│ │ └── event-ingestion.js
│ └── e2e/
├── docs/
│ ├── architecture/
│ ├── runbooks/
│ └── decisions/
├── scripts/
│ ├── setup-cluster.sh
│ ├── deploy-databases.sh
│ └── cleanup.sh
├── build.gradle.kts
├── settings.gradle.kts
└── README.md
## Getting Started

Prerequisites:

- Azure subscription
- Azure CLI installed
- Terraform >= 1.5
- Pulumi >= 3.0
- kubectl >= 1.30
- Helm >= 3.14
- Java 21 (Temurin/Adoptium)
- Node.js 20 LTS
- Docker
# 1. Clone the repository
git clone https://github.com/your-org/streamforge.git
cd streamforge
# 2. Login to Azure
az login
az account set --subscription "Your Subscription"
# 3. Deploy infrastructure
cd infrastructure/terraform/environments/dev
terraform init
terraform plan -out=tfplan
terraform apply tfplan
# 4. Bootstrap Kubernetes (see Phase 1 instructions)
# 5. Deploy platform services
./scripts/deploy-databases.sh
# 6. Build and deploy applications
./gradlew build jib
kubectl apply -k kubernetes/overlays/dev
# 7. Access the dashboard
kubectl port-forward svc/frontend 3000:80 -n streamforge
open http://localhost:3000

## Cost Estimation

| Resource | Specification | Cost (EUR) |
|---|---|---|
| Control Plane VM | Standard_B2s (2 vCPU, 4GB) | ~€35 |
| Worker VMs (2x) | Standard_D2s_v5 Spot | ~€40 |
| Managed Disks | 100GB Standard SSD | ~€15 |
| Container Registry | Basic tier | ~€5 |
| Networking/Egress | ~50GB/month | ~€5 |
| Total | | ~€100/month |
Cost-saving tips:
- Use spot instances for worker nodes (often ~70% savings)
- Deallocate VMs when not in use:
az vm deallocate -g streamforge-rg -n <vm-name>
- Use Azure Bastion only when needed, or use an SSH jump host instead
- Set up auto-shutdown schedules for dev environment
- Monitor costs with OpenCost
## Learning Resources

Java:
- Effective Java (3rd Edition) - Joshua Bloch
- Baeldung - Spring Boot tutorials
- Java Brains - YouTube tutorials
- Spring Boot Documentation
- Kubernetes Up & Running - O'Reilly
- Kubernetes the Hard Way - Kelsey Hightower
- CKA Course - Mumshad Mannambeth
- Kubernetes Documentation
Data engineering:
- Designing Data-Intensive Applications - Martin Kleppmann
- Kafka: The Definitive Guide
- Flink Documentation
- Confluent Developer
## Progress Tracking

Use this checklist to track your progress:
Phase 1 (infrastructure):
- [ ] Terraform code complete
- [ ] Pulumi code complete
- [ ] VMs provisioned
- [ ] Kubernetes cluster bootstrapped
- [ ] Cilium CNI installed
- [ ] kubectl access working

Phase 2 (data platform):
- [ ] PostgreSQL running
- [ ] Redis running
- [ ] ClickHouse running
- [ ] Kafka running
- [ ] All persistence configured

Phase 3 (Java services):
- [ ] Project structure created
- [ ] Event Ingestion service
- [ ] Stream Processor service
- [ ] Query service
- [ ] Tests passing
- [ ] Docker images built

Phase 4 (observability):
- [ ] Prometheus deployed
- [ ] Grafana deployed
- [ ] Loki deployed
- [ ] Tempo deployed
- [ ] OTel Collector configured
- [ ] Dashboards created
- [ ] Alerts configured

Phase 5 (frontend):
- [ ] React app scaffolded
- [ ] Dashboard components
- [ ] Real-time updates
- [ ] Charts working
- [ ] Docker build

Phase 6 (CI/CD):
- [ ] CI pipeline
- [ ] Security scanning
- [ ] Deploy pipeline
- [ ] Load testing

Phase 7 (security):
- [ ] Network policies
- [ ] Kyverno policies
- [ ] Falco rules
- [ ] TLS configured
- [ ] Secrets management

Phase 8 (advanced tooling):
- [ ] Argo CD
- [ ] Crossplane
- [ ] OpenCost
- [ ] Documentation complete
This is a personal learning project, but suggestions and improvements are welcome!
## License

MIT License - feel free to use this as a template for your own learning journey.
Happy Learning! 🚀