
StreamForge

Just a project plan I generated using Claude. I wanted something fun and hands-on to build without any AI help :) Feel free to do it yourself. Just for fun, no pressure.

A hands-on learning project: Build a real-time event analytics platform from scratch using Kubernetes on Azure VMs, Java microservices, and modern data engineering tools.

Why this project exists: To learn by doing — no managed services, no abstractions hiding complexity. Just you, VMs, and infrastructure built from the ground up.


Table of Contents

  • Overview
  • What You'll Learn
  • Architecture
  • Technology Stack
  • Project Phases
  • Repository Structure
  • Getting Started
  • Cost Estimation
  • Learning Resources
  • Progress Tracking
  • Contributing
  • License

Overview

StreamForge is a gaming analytics platform that tracks player events in real time. The complete data flow:

  1. Event Generators (Java) simulate players: logins, matches, purchases, achievements
  2. Event Ingestion API (Java + Spring Boot) receives events via REST/gRPC (sample request below)
  3. Apache Kafka streams events to multiple consumers
  4. Apache Flink (Java) processes streams: aggregations, pattern detection, anomaly alerts
  5. ClickHouse stores time-series analytics data
  6. PostgreSQL stores user profiles and reference data
  7. Redis caches hot data and manages leaderboards
  8. React Dashboard displays real-time metrics, charts, leaderboards
  9. Alerting pushes notifications when interesting patterns emerge
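
As a concrete example, one event posted to the ingestion API in step 2 might look like this (the port is an assumption; field names match the GameEvent record defined in Phase 3):

curl -X POST http://localhost:8080/api/v1/events \
  -H 'Content-Type: application/json' \
  -d '{
        "playerId": "player-42",
        "eventType": "PURCHASE",
        "payload": {"itemId": "sword-01", "amount": 4.99},
        "sessionId": "session-7f3a"
      }'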

What You'll Learn

| Domain | Skills |
| --- | --- |
| Infrastructure | VM provisioning, networking, Linux administration |
| Kubernetes | Manual cluster setup, operators, StatefulSets, RBAC |
| Java | Spring Boot, WebFlux, Flink, Gradle, testing |
| Data Engineering | Kafka, stream processing, time-series databases |
| DevOps | Terraform, Pulumi, GitHub Actions, GitOps |
| Observability | Prometheus, Grafana, Loki, OpenTelemetry, Tempo |
| Security | Network policies, secrets management, policy enforcement |

Architecture

High-Level Overview

┌─────────────────────────────────────────────────────────────────────────────┐
│                              Azure Cloud                                     │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │                    Virtual Network (10.0.0.0/16)                       │ │
│  │                                                                        │ │
│  │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────┐  │ │
│  │  │  Control Plane   │  │   Worker Node 1  │  │    Worker Node 2     │  │ │
│  │  │   (2 vCPU/4GB)   │  │   (4 vCPU/8GB)   │  │    (4 vCPU/8GB)      │  │ │
│  │  │                  │  │                  │  │                      │  │ │
│  │  │  • etcd          │  │  • Kafka         │  │  • Java Services     │  │ │
│  │  │  • API Server    │  │  • Flink         │  │  • PostgreSQL        │  │ │
│  │  │  • Scheduler     │  │  • ClickHouse    │  │  • Redis             │  │ │
│  │  │  • Controller    │  │                  │  │  • React Frontend    │  │ │
│  │  └──────────────────┘  └──────────────────┘  └──────────────────────┘  │ │
│  │                                                                        │ │
│  │  ┌──────────────────────────────────────────────────────────────────┐  │ │
│  │  │                    Observability Stack                           │  │ │
│  │  │    Prometheus → Grafana ← Loki ← OpenTelemetry ← Tempo           │  │ │
│  │  └──────────────────────────────────────────────────────────────────┘  │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘

Data Flow Architecture

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    Event     │     │   Ingestion  │     │    Kafka     │     │    Flink     │
│  Generators  │────▶│     API      │────▶│   Cluster    │────▶│  Processing  │
│   (Java)     │     │ (Spring Boot)│     │              │     │              │
└──────────────┘     └──────────────┘     └──────────────┘     └──────┬───────┘
                                                                      │
                     ┌────────────────────────────────────────────────┘
                     │
                     ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│    React     │◀────│    Query     │◀────│  ClickHouse  │◀────│  Aggregated  │
│  Dashboard   │     │   Service    │     │              │     │    Data      │
│              │     │              │     └──────────────┘     └──────────────┘
└──────────────┘     └──────┬───────┘
                           │
                     ┌─────┴─────┐
                     ▼           ▼
              ┌──────────┐ ┌──────────┐
              │PostgreSQL│ │  Redis   │
              │(Profiles)│ │ (Cache)  │
              └──────────┘ └──────────┘

Observability Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Grafana (Visualization)                   │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  Prometheus │  │    Loki     │  │       Tempo         │  │
│  │  (Metrics)  │  │   (Logs)    │  │ (Distributed Traces)│  │
│  └──────┬──────┘  └──────┬──────┘  └──────────┬──────────┘  │
└─────────┼────────────────┼─────────────────────┼─────────────┘
          │                │                     │
  ┌───────┴────────────────┴─────────────────────┴───────┐
  │              OpenTelemetry Collector                  │
  └───────────────────────────┬───────────────────────────┘
                              │
          ┌───────────────────┼───────────────────┐
          ▼                   ▼                   ▼
    ┌──────────┐       ┌──────────┐        ┌──────────┐
    │ Java Apps│       │  Kafka   │        │ Postgres │
    │  (OTel)  │       │ Exporter │        │ Exporter │
    └──────────┘       └──────────┘        └──────────┘

Technology Stack

Infrastructure & Orchestration

| Technology | Purpose | Why |
| --- | --- | --- |
| Terraform | Azure infrastructure provisioning | Industry standard, declarative |
| Pulumi | Additional infrastructure (ACR, KeyVault) | Learn both approaches |
| Kubernetes | Container orchestration | Manual setup for deep learning |
| Cilium | CNI & Network Policies | eBPF-based, future of K8s networking |
| containerd | Container runtime | Lightweight, K8s native |

Data Layer

| Technology | Purpose | Why |
| --- | --- | --- |
| Apache Kafka | Event streaming | Industry standard for streaming |
| Apache Flink | Stream processing | Powerful, Java-native |
| ClickHouse | Analytics database | Fast columnar storage for time-series |
| PostgreSQL | Relational database | ACID compliance for user data |
| Redis | Caching & leaderboards | In-memory performance |

Application Layer

| Technology | Purpose | Why |
| --- | --- | --- |
| Java 21 | Primary language | Enterprise standard, learning goal |
| Spring Boot 3 | REST API framework | Production-ready, comprehensive |
| Spring WebFlux | Reactive programming | Non-blocking I/O |
| jOOQ | Database queries | Type-safe SQL |
| Gradle (Kotlin DSL) | Build tool | Modern, flexible |
| React + TypeScript | Frontend | Industry standard |
| TanStack Query | Data fetching | Caching, real-time updates |

Observability

| Technology | Purpose | Why |
| --- | --- | --- |
| Prometheus | Metrics collection | K8s native monitoring |
| Grafana | Visualization | Unified dashboards |
| Loki | Log aggregation | Prometheus-like for logs |
| Tempo | Distributed tracing | Integrated with Grafana |
| OpenTelemetry | Instrumentation | Vendor-neutral standard |

Security

| Technology | Purpose | Why |
| --- | --- | --- |
| Cilium Network Policies | Network segmentation | Fine-grained control |
| Falco | Runtime security | Threat detection |
| Kyverno | Policy enforcement | K8s-native policies |
| cert-manager | TLS certificates | Automated certificate management |
| Sealed Secrets | Secret management | GitOps-friendly secrets |

CI/CD

| Technology | Purpose | Why |
| --- | --- | --- |
| GitHub Actions | CI/CD pipelines | Native integration |
| Argo CD | GitOps deployment | Declarative, auditable |
| Trivy | Security scanning | Container vulnerability scanning |
| k6 | Load testing | Modern performance testing |

Project Phases

Phase 1: Infrastructure Foundation (Weeks 1-2)

Goal: Kubernetes cluster running on Azure VMs, fully configured by hand.

Terraform Provisions

  • Resource Group
  • Virtual Network with subnets (control-plane, workers, bastion)
  • 3 Ubuntu 22.04 VMs
  • Network Security Groups
  • Azure Bastion (or SSH jump host for cost savings)
  • Managed disks for persistent storage
# Example: VM configuration
resource "azurerm_linux_virtual_machine" "control_plane" {
  name                = "streamforge-cp"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  size                = "Standard_B2s"
  admin_username      = "azureuser"

  # azurerm requires either an SSH key or a password for Linux VMs;
  # key-based auth is assumed here (point this at your own public key)
  admin_ssh_key {
    username   = "azureuser"
    public_key = file("~/.ssh/id_rsa.pub")
  }

  network_interface_ids = [
    azurerm_network_interface.control_plane.id,
  ]

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }
}

Pulumi Provisions

  • Azure Container Registry
  • Key Vault for secrets
  • DNS Zone

Manual Kubernetes Setup

Run steps 1-3 on every VM, steps 4-5 on the control plane node only, and step 6 on each worker:

  1. Prepare all nodes:
# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load required kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Set sysctl parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

sudo sysctl --system
  2. Install containerd:
sudo apt-get update
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
  3. Install kubeadm, kubelet, kubectl:
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list

sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
  4. Initialize the control plane (on the control plane node only):
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --control-plane-endpoint="<CONTROL_PLANE_IP>"

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
  5. Install Cilium CNI:
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
cilium install
  6. Join worker nodes:
# Run the join command from kubeadm init output on each worker
sudo kubeadm join <CONTROL_PLANE_IP>:6443 --token <TOKEN> --discovery-token-ca-cert-hash sha256:<HASH>
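
Once all workers have joined, verify the cluster and CNI from the control plane node:

kubectl get nodes -o wide        # all three nodes should report Ready
cilium status --wait             # CNI health summary
cilium connectivity test         # optional end-to-end network check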

Deliverables

  • Terraform code for Azure infrastructure
  • Pulumi code for ACR, KeyVault, DNS
  • Ansible playbooks for K8s bootstrap (optional)
  • Working 3-node Kubernetes cluster
  • kubectl access configured locally
  • Cilium CNI functioning

Phase 2: Core Platform Services (Weeks 3-4)

Goal: Deploy all databases and streaming infrastructure on Kubernetes.

Storage Configuration

# azure-disk-storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-disk-standard
provisioner: disk.csi.azure.com
parameters:
  skuName: Standard_LRS
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

PostgreSQL (CloudNativePG Operator)

# Install operator
kubectl apply --server-side -f \
  https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.23/releases/cnpg-1.23.0.yaml
# postgresql-cluster.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: streamforge-db
  namespace: database
spec:
  instances: 2
  storage:
    size: 20Gi
    storageClass: azure-disk-standard
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "256MB"
  bootstrap:
    initdb:
      database: streamforge
      owner: streamforge
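
After applying the manifest, check cluster health (the status subcommand assumes the optional cnpg kubectl plugin is installed):

kubectl apply -f postgresql-cluster.yaml
kubectl get cluster -n database                  # CNPG Cluster resource status
kubectl cnpg status streamforge-db -n database   # requires the cnpg kubectl plugin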

Redis Cluster

# Install via Helm
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis-cluster \
  --namespace database \
  --set persistence.storageClass=azure-disk-standard \
  --set cluster.nodes=6 \
  --set cluster.replicas=1

ClickHouse (Altinity Operator)

# Install operator
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml
# clickhouse-cluster.yaml
apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: streamforge-analytics
  namespace: database
spec:
  configuration:
    clusters:
      - name: analytics
        layout:
          shardsCount: 1
          replicasCount: 2
    users:
      streamforge/password: "secure-password"
      streamforge/networks/ip: "::/0"
  defaults:
    templates:
      dataVolumeClaimTemplate: data-volume
  templates:
    volumeClaimTemplates:
      - name: data-volume
        spec:
          accessModes: [ReadWriteOnce]
          resources:
            requests:
              storage: 50Gi
          storageClassName: azure-disk-standard

Kafka (Strimzi Operator)

# Install Strimzi
kubectl create namespace kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
# kafka-cluster.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: streamforge
  namespace: kafka
spec:
  kafka:
    version: 3.7.0
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: persistent-claim
      size: 20Gi
      class: azure-disk-standard
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: azure-disk-standard
  entityOperator:
    topicOperator: {}
    userOperator: {}
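
The Java services later in this README produce to a game-events topic. With the Topic Operator enabled above, it can be declared as a KafkaTopic resource (the partition count below is illustrative):

# game-events-topic.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: game-events
  namespace: kafka
  labels:
    strimzi.io/cluster: streamforge
spec:
  partitions: 6
  replicas: 3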

Deliverables

  • StorageClass configured for Azure disks
  • PostgreSQL cluster running (2 instances)
  • Redis cluster running (6 nodes)
  • ClickHouse cluster running (2 replicas)
  • Kafka cluster running (3 brokers)
  • All operators understood and documented

Phase 3: Java Application Development (Weeks 5-7)

Goal: Build three Java microservices from scratch.

Project Setup (Gradle Multi-Module)

// settings.gradle.kts
rootProject.name = "streamforge"

include(
    "common",
    "event-ingestion",
    "stream-processor",
    "query-service",
    "event-generator"
)
// build.gradle.kts (root)
plugins {
    java
    id("org.springframework.boot") version "3.3.0" apply false
    id("io.spring.dependency-management") version "1.1.5" apply false
    id("com.google.cloud.tools.jib") version "3.4.2" apply false
}

subprojects {
    apply(plugin = "java")
    
    java {
        toolchain {
            languageVersion = JavaLanguageVersion.of(21)
        }
    }
    
    repositories {
        mavenCentral()
    }
    
    tasks.withType<Test> {
        useJUnitPlatform()
    }
}

Service 1: Event Ingestion (Spring Boot + WebFlux)

// GameEvent.java
public record GameEvent(
    String eventId,
    String playerId,
    String eventType,   // carried as a String on the wire; expected values are the EventType names
    Map<String, Object> payload,
    Instant timestamp,
    String sessionId
) {
    public enum EventType {
        LOGIN, LOGOUT, MATCH_START, MATCH_END, 
        PURCHASE, ACHIEVEMENT, LEVEL_UP
    }
}
// EventController.java
@RestController
@RequestMapping("/api/v1/events")
@RequiredArgsConstructor
@Slf4j
public class EventController {
    
    private final KafkaTemplate<String, GameEvent> kafkaTemplate;
    private final MeterRegistry meterRegistry;
    
    @PostMapping
    public Mono<ResponseEntity<EventResponse>> ingestEvent(
            @Valid @RequestBody GameEvent event) {
        
        return Mono.fromCallable(() -> {
            var enrichedEvent = enrichEvent(event);
            
            kafkaTemplate.send("game-events", 
                enrichedEvent.playerId(), 
                enrichedEvent);
            
            meterRegistry.counter("events.ingested", 
                "type", event.eventType()).increment();
            
            log.debug("Ingested event: {}", enrichedEvent.eventId());
            
            return ResponseEntity.accepted()
                .body(new EventResponse(enrichedEvent.eventId(), "accepted"));
        }).subscribeOn(Schedulers.boundedElastic());
    }
    
    @PostMapping("/batch")
    public Mono<ResponseEntity<BatchResponse>> ingestBatch(
            @Valid @RequestBody List<GameEvent> events) {
        
        return Flux.fromIterable(events)
            .flatMap(this::processEvent)
            .collectList()
            .map(results -> ResponseEntity.accepted()
                .body(new BatchResponse(results.size(), "accepted")));
    }
    
    // Enrich and publish a single event, emitting its generated ID
    // (used by the batch endpoint above).
    private Mono<String> processEvent(GameEvent event) {
        var enriched = enrichEvent(event);
        kafkaTemplate.send("game-events", enriched.playerId(), enriched);
        return Mono.just(enriched.eventId());
    }
    
    private GameEvent enrichEvent(GameEvent event) {
        return new GameEvent(
            UUID.randomUUID().toString(),
            event.playerId(),
            event.eventType(),
            event.payload(),
            Instant.now(),
            event.sessionId()
        );
    }
}
// KafkaConfig.java
@Configuration
public class KafkaConfig {
    
    @Bean
    public ProducerFactory<String, GameEvent> producerFactory(
            KafkaProperties properties) {
        
        Map<String, Object> config = new HashMap<>(properties.buildProducerProperties());
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, "all");
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        
        return new DefaultKafkaProducerFactory<>(config);
    }
    
    @Bean
    public KafkaTemplate<String, GameEvent> kafkaTemplate(
            ProducerFactory<String, GameEvent> producerFactory) {
        return new KafkaTemplate<>(producerFactory);
    }
}
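
A minimal application.yaml pointing the producer at the in-cluster Kafka (the bootstrap address matches the Strimzi cluster from Phase 2; adjust if yours differs):

# application.yaml (event-ingestion)
spring:
  kafka:
    bootstrap-servers: streamforge-kafka-bootstrap.kafka:9092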

Service 2: Stream Processor (Apache Flink)

// PlayerSessionAnalyzer.java
public class PlayerSessionAnalyzer {
    
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = 
            StreamExecutionEnvironment.getExecutionEnvironment();
        
        env.enableCheckpointing(60000);
        env.setParallelism(2);
        
        KafkaSource<GameEvent> source = KafkaSource.<GameEvent>builder()
            .setBootstrapServers("streamforge-kafka-bootstrap.kafka:9092")
            .setTopics("game-events")
            .setGroupId("session-analyzer")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new GameEventDeserializer())
            .build();
        
        DataStream<GameEvent> events = env.fromSource(
            source, 
            WatermarkStrategy.<GameEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withTimestampAssigner((event, timestamp) -> 
                    event.timestamp().toEpochMilli()),
            "Kafka Source"
        );
        
        // Session analysis - group events into player sessions
        DataStream<PlayerSession> sessions = events
            .keyBy(GameEvent::playerId)
            .window(EventTimeSessionWindows.withGap(Time.minutes(30)))
            .aggregate(new SessionAggregator(), new SessionWindowFunction());
        
        // Real-time metrics - events per minute by type
        DataStream<EventMetrics> metrics = events
            .keyBy(GameEvent::eventType)
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .aggregate(new MetricsAggregator());
        
        // Anomaly detection - too many purchases in short time
        Pattern<GameEvent, ?> suspiciousPattern = Pattern.<GameEvent>begin("start")
            .where(new SimpleCondition<>() {
                @Override
                public boolean filter(GameEvent event) {
                    return "PURCHASE".equals(event.eventType());
                }
            })
            .timesOrMore(10)
            .within(Time.minutes(5));
        
        DataStream<Alert> alerts = CEP.pattern(
            events.keyBy(GameEvent::playerId), 
            suspiciousPattern
        ).process(new SuspiciousActivityDetector());
        
        // Sinks
        sessions.addSink(new ClickHouseSink<>("player_sessions"));
        metrics.addSink(new ClickHouseSink<>("event_metrics"));
        alerts.addSink(new AlertSink());
        
        env.execute("StreamForge Analytics");
    }
}
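
The GameEventDeserializer referenced above is not spelled out in this plan; a minimal Jackson-based sketch, assuming the events are the JSON produced by the ingestion service, could look like this:

// GameEventDeserializer.java (illustrative sketch)
import java.io.IOException;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.datatype.jsr310.JavaTimeModule;

public class GameEventDeserializer implements DeserializationSchema<GameEvent> {

    // JavaTimeModule is needed so Jackson can read the Instant timestamp
    private static final ObjectMapper MAPPER =
        new ObjectMapper().registerModule(new JavaTimeModule());

    @Override
    public GameEvent deserialize(byte[] message) throws IOException {
        return MAPPER.readValue(message, GameEvent.class);
    }

    @Override
    public boolean isEndOfStream(GameEvent nextElement) {
        return false; // Kafka topics are unbounded
    }

    @Override
    public TypeInformation<GameEvent> getProducedType() {
        return TypeInformation.of(GameEvent.class);
    }
}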
// SessionAggregator.java
public class SessionAggregator implements 
    AggregateFunction<GameEvent, SessionAccumulator, SessionAccumulator> {
    
    @Override
    public SessionAccumulator createAccumulator() {
        return new SessionAccumulator();
    }
    
    @Override
    public SessionAccumulator add(GameEvent event, SessionAccumulator acc) {
        if (acc.sessionStart == null) {
            acc.sessionStart = event.timestamp();
            acc.playerId = event.playerId();
        }
        acc.sessionEnd = event.timestamp();
        acc.eventCount++;
        acc.eventTypes.merge(event.eventType(), 1, Integer::sum);
        
        if ("PURCHASE".equals(event.eventType())) {
            acc.totalSpent += extractAmount(event.payload());
        }
        
        return acc;
    }
    
    @Override
    public SessionAccumulator getResult(SessionAccumulator acc) {
        return acc;
    }
    
    @Override
    public SessionAccumulator merge(SessionAccumulator a, SessionAccumulator b) {
        a.eventCount += b.eventCount;
        b.eventTypes.forEach((k, v) -> a.eventTypes.merge(k, v, Integer::sum));
        a.totalSpent += b.totalSpent;
        if (b.sessionEnd.isAfter(a.sessionEnd)) {
            a.sessionEnd = b.sessionEnd;
        }
        return a;
    }
}
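
The SessionAccumulator it fills is a plain mutable holder; a shape consistent with the fields used above (assumed, since the plan does not show it):

// SessionAccumulator.java (assumed shape, inferred from SessionAggregator)
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

public class SessionAccumulator {
    public String playerId;
    public Instant sessionStart;
    public Instant sessionEnd;
    public long eventCount;
    public double totalSpent;
    public Map<String, Integer> eventTypes = new HashMap<>();
}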

Service 3: Query Service (Spring Boot + jOOQ)

// AnalyticsController.java
@RestController
@RequestMapping("/api/v1/analytics")
@RequiredArgsConstructor
public class AnalyticsController {
    
    private final AnalyticsService analyticsService;
    private final CacheManager cacheManager;
    
    @GetMapping("/dashboard")
    public Mono<DashboardData> getDashboard() {
        return analyticsService.getDashboardData();
    }
    
    @GetMapping("/players/{playerId}/sessions")
    public Flux<PlayerSession> getPlayerSessions(
            @PathVariable String playerId,
            @RequestParam(defaultValue = "7") int days) {
        return analyticsService.getPlayerSessions(playerId, days);
    }
    
    @GetMapping("/leaderboard")
    @Cacheable(value = "leaderboard", key = "#metric + '-' + #limit")
    public Mono<List<LeaderboardEntry>> getLeaderboard(
            @RequestParam(defaultValue = "score") String metric,
            @RequestParam(defaultValue = "100") int limit) {
        return analyticsService.getLeaderboard(metric, limit);
    }
    
    @GetMapping(value = "/stream/events", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<EventMetrics>> streamMetrics() {
        return analyticsService.getMetricsStream()
            .map(metrics -> ServerSentEvent.<EventMetrics>builder()
                .data(metrics)
                .build());
    }
}
// AnalyticsRepository.java (ClickHouse queries with jOOQ)
@Repository
@RequiredArgsConstructor
public class AnalyticsRepository {
    
    private final DSLContext clickhouse;
    
    public List<EventMetrics> getMetricsForPeriod(Instant start, Instant end) {
        return clickhouse
            .select(
                field("event_type"),
                count().as("event_count"),
                countDistinct(field("player_id")).as("unique_players")
            )
            .from(table("event_metrics"))
            .where(field("timestamp").between(start, end))
            .groupBy(field("event_type"))
            .fetchInto(EventMetrics.class);
    }
    
    public List<LeaderboardEntry> getTopPlayers(String metric, int limit) {
        return clickhouse
            .select(
                field("player_id"),
                sum(field(metric)).as("value")
            )
            .from(table("player_sessions"))
            .where(field("session_date").greaterThan(LocalDate.now().minusDays(7)))
            .groupBy(field("player_id"))
            .orderBy(field("value").desc())
            .limit(limit)
            .fetchInto(LeaderboardEntry.class);
    }
}
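
The overview also lists Redis for leaderboards, which none of the snippets above touch. A sorted-set sketch with Spring Data Redis might look like this (class and key names are assumptions):

// LeaderboardService.java (illustrative Redis ZSET sketch; names are assumptions)
import java.util.Set;

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.ZSetOperations;
import org.springframework.stereotype.Service;

@Service
public class LeaderboardService {

    private static final String KEY = "leaderboard:score";

    private final StringRedisTemplate redis;

    public LeaderboardService(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // ZINCRBY: bump a player's score; the set stays ordered by score
    public void recordScore(String playerId, double delta) {
        redis.opsForZSet().incrementScore(KEY, playerId, delta);
    }

    // ZREVRANGE WITHSCORES: highest scores first
    public Set<ZSetOperations.TypedTuple<String>> top(int limit) {
        return redis.opsForZSet().reverseRangeWithScores(KEY, 0, limit - 1);
    }
}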

Testing Setup

// Integration test with Testcontainers
@SpringBootTest
@Testcontainers
class EventIngestionIntegrationTest {
    
    @Container
    static KafkaContainer kafka = new KafkaContainer(
        DockerImageName.parse("confluentinc/cp-kafka:7.6.0"));
    
    @DynamicPropertySource
    static void kafkaProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.kafka.bootstrap-servers", kafka::getBootstrapServers);
    }
    
    @Autowired
    private WebTestClient webTestClient;
    
    @Test
    void shouldIngestEvent() {
        var event = new GameEvent(
            null, "player-1", "LOGIN", 
            Map.of("platform", "mobile"), null, "session-1"
        );
        
        webTestClient.post()
            .uri("/api/v1/events")
            .bodyValue(event)
            .exchange()
            .expectStatus().isAccepted()
            .expectBody()
            .jsonPath("$.eventId").isNotEmpty()
            .jsonPath("$.status").isEqualTo("accepted");
    }
}

Deliverables

  • Gradle multi-module project setup
  • Event Ingestion Service with Kafka producer
  • Stream Processor with Flink jobs
  • Query Service with ClickHouse integration
  • Event Generator for testing
  • Unit and integration tests
  • Docker images via Jib

Phase 4: Observability Stack (Week 8)

Goal: Full observability with metrics, logs, and traces.

Install kube-prometheus-stack

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --values observability/prometheus-values.yaml
# prometheus-values.yaml
grafana:
  adminPassword: "your-secure-password"
  persistence:
    enabled: true
    size: 10Gi
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'custom'
          folder: 'StreamForge'
          type: file
          options:
            path: /var/lib/grafana/dashboards/custom

prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi

alertmanager:
  config:
    route:
      receiver: 'slack'
    receivers:
      - name: 'slack'
        slack_configs:
          - channel: '#alerts'
            api_url: 'https://hooks.slack.com/services/xxx'

Install Loki for Logs

helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=20Gi

Install Tempo for Traces

helm install tempo grafana/tempo \
  --namespace monitoring \
  --set persistence.enabled=true \
  --set persistence.size=10Gi

OpenTelemetry Collector

# otel-collector.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    
    processors:
      batch:
        timeout: 10s
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
    
    exporters:
      prometheus:
        endpoint: "0.0.0.0:8889"
      loki:
        endpoint: "http://loki:3100/loki/api/v1/push"
      otlp/tempo:
        endpoint: "tempo:4317"
        tls:
          insecure: true
    
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/tempo]
        metrics:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [prometheus]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [loki]

Java Application Instrumentation

// build.gradle.kts - add OpenTelemetry
dependencies {
    implementation("io.opentelemetry:opentelemetry-api")
    implementation("io.opentelemetry.instrumentation:opentelemetry-instrumentation-annotations")
    runtimeOnly("io.opentelemetry.javaagent:opentelemetry-javaagent:2.4.0")
}
// Custom instrumentation
@Service
@RequiredArgsConstructor
public class EventProcessingService {
    
    private final Tracer tracer;
    
    @WithSpan("process-game-event")
    public void processEvent(
            @SpanAttribute("player.id") String playerId,
            GameEvent event) {
        
        Span.current().setAttribute("event.type", event.eventType());
        
        // Processing logic...
    }
}
# Kubernetes deployment with OTel agent
spec:
  containers:
    - name: event-ingestion
      image: streamforge/event-ingestion:latest
      env:
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/otel/opentelemetry-javaagent.jar"
        - name: OTEL_SERVICE_NAME
          value: "event-ingestion"
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
        - name: OTEL_METRICS_EXPORTER
          value: "otlp"
        - name: OTEL_LOGS_EXPORTER
          value: "otlp"

Grafana Dashboards to Build

  1. Application Overview: Request rate, error rate, latency (RED metrics; example queries below)
  2. Kafka Dashboard: Consumer lag, partition distribution, throughput
  3. JVM Metrics: Heap usage, GC pauses, thread count
  4. Database Connections: Pool utilization, query latency
  5. Business Metrics: Events/sec by type, active players, revenue
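
For the RED panels in dashboard 1, starting queries might look like these (metric names assume Micrometer's default Prometheus output; the histogram query additionally assumes percentile histograms are enabled):

# Request rate per service
sum by (job) (rate(http_server_requests_seconds_count[5m]))

# Error rate: share of 5xx responses
sum(rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
  / sum(rate(http_server_requests_seconds_count[5m]))

# p95 latency
histogram_quantile(0.95,
  sum by (le) (rate(http_server_requests_seconds_bucket[5m])))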

Deliverables

  • Prometheus + Grafana deployed
  • Loki for log aggregation
  • Tempo for distributed tracing
  • OpenTelemetry Collector configured
  • Java apps instrumented with OTel
  • 5+ custom Grafana dashboards
  • Alert rules configured

Phase 5: Frontend Dashboard (Week 9)

Goal: React dashboard with real-time updates.

Project Setup

npm create vite@latest frontend -- --template react-ts
cd frontend
npm install @tanstack/react-query recharts lucide-react
npm install -D tailwindcss postcss autoprefixer
npx tailwindcss init -p

Main Dashboard Component

// src/App.tsx
import { QueryClient, QueryClientProvider } from '@tanstack/react-query';
import { Dashboard } from './components/Dashboard';

const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      refetchInterval: 5000,
      staleTime: 2000,
    },
  },
});

export function App() {
  return (
    <QueryClientProvider client={queryClient}>
      <div className="min-h-screen bg-gray-900 text-white">
        <Dashboard />
      </div>
    </QueryClientProvider>
  );
}
// src/components/Dashboard.tsx
import { useQuery } from '@tanstack/react-query';
import { MetricCard } from './MetricCard';
import { EventChart } from './EventChart';
import { Leaderboard } from './Leaderboard';
import { LoadingSpinner } from './LoadingSpinner';
import { useEventStream } from '../hooks/useEventStream';
import { fetchDashboardData } from '../api/analytics';

export function Dashboard() {
  const { data, isLoading } = useQuery({
    queryKey: ['dashboard'],
    queryFn: fetchDashboardData,
  });
  
  const liveEventRate = useEventStream();
  
  if (isLoading) {
    return <LoadingSpinner />;
  }
  
  return (
    <div className="p-6">
      <header className="mb-8">
        <h1 className="text-3xl font-bold">StreamForge Analytics</h1>
        <p className="text-gray-400">Real-time gaming analytics dashboard</p>
      </header>
      
      <div className="grid grid-cols-4 gap-4 mb-8">
        <MetricCard 
          title="Active Players" 
          value={data?.activePlayers} 
          trend={data?.playersTrend}
          icon="users"
        />
        <MetricCard 
          title="Events/sec" 
          value={liveEventRate} 
          icon="activity"
          live
        />
        <MetricCard 
          title="Matches Today" 
          value={data?.matchesToday} 
          trend={data?.matchesTrend}
          icon="gamepad"
        />
        <MetricCard 
          title="Revenue (24h)" 
          value={data?.revenue24h} 
          format="currency"
          trend={data?.revenueTrend}
          icon="dollar-sign"
        />
      </div>
      
      <div className="grid grid-cols-3 gap-6">
        <div className="col-span-2">
          <EventChart data={data?.eventHistory} />
        </div>
        <div>
          <Leaderboard players={data?.topPlayers} />
        </div>
      </div>
    </div>
  );
}

Real-time Event Stream Hook

// src/hooks/useEventStream.ts
import { useState, useEffect } from 'react';

export function useEventStream() {
  const [eventRate, setEventRate] = useState(0);
  
  useEffect(() => {
    let eventSource: EventSource;
    let reconnectTimer: ReturnType<typeof setTimeout>;
    
    const connect = () => {
      eventSource = new EventSource('/api/v1/analytics/stream/events');
      
      eventSource.onmessage = (event) => {
        const data = JSON.parse(event.data);
        setEventRate(data.eventsPerSecond);
      };
      
      eventSource.onerror = () => {
        // Close the broken connection and retry after a delay
        eventSource.close();
        reconnectTimer = setTimeout(connect, 5000);
      };
    };
    
    connect();
    
    return () => {
      clearTimeout(reconnectTimer);
      eventSource.close();
    };
  }, []);
  
  return eventRate;
}

Event Chart Component

// src/components/EventChart.tsx
import { 
  AreaChart, Area, XAxis, YAxis, Tooltip, 
  ResponsiveContainer, CartesianGrid 
} from 'recharts';

interface EventChartProps {
  data: Array<{
    timestamp: string;
    logins: number;
    matches: number;
    purchases: number;
  }>;
}

export function EventChart({ data }: EventChartProps) {
  return (
    <div className="bg-gray-800 rounded-lg p-4">
      <h2 className="text-xl font-semibold mb-4">Event Activity</h2>
      <ResponsiveContainer width="100%" height={300}>
        <AreaChart data={data}>
          <CartesianGrid strokeDasharray="3 3" stroke="#374151" />
          <XAxis 
            dataKey="timestamp" 
            stroke="#9CA3AF"
            tickFormatter={(value) => new Date(value).toLocaleTimeString()}
          />
          <YAxis stroke="#9CA3AF" />
          <Tooltip 
            contentStyle={{ 
              backgroundColor: '#1F2937', 
              border: 'none',
              borderRadius: '8px'
            }}
          />
          <Area 
            type="monotone" 
            dataKey="logins" 
            stackId="1"
            stroke="#3B82F6" 
            fill="#3B82F6" 
            fillOpacity={0.6}
          />
          <Area 
            type="monotone" 
            dataKey="matches" 
            stackId="1"
            stroke="#10B981" 
            fill="#10B981" 
            fillOpacity={0.6}
          />
          <Area 
            type="monotone" 
            dataKey="purchases" 
            stackId="1"
            stroke="#F59E0B" 
            fill="#F59E0B" 
            fillOpacity={0.6}
          />
        </AreaChart>
      </ResponsiveContainer>
    </div>
  );
}

Deliverables

  • React + TypeScript + Vite setup
  • TanStack Query for data fetching
  • Real-time updates via SSE
  • Responsive dashboard layout
  • Interactive charts with Recharts
  • Leaderboard component
  • Docker build for frontend

Phase 6: CI/CD with GitHub Actions (Week 10)

Goal: Comprehensive CI/CD pipeline with security scanning.

Main CI Pipeline

# .github/workflows/ci.yaml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: streamforgeacr.azurecr.io
  JAVA_VERSION: '21'
  NODE_VERSION: '20'

jobs:
  changes:
    runs-on: ubuntu-latest
    outputs:
      java: ${{ steps.filter.outputs.java }}
      frontend: ${{ steps.filter.outputs.frontend }}
      infra: ${{ steps.filter.outputs.infra }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            java:
              - 'services/**'
              - 'build.gradle.kts'
            frontend:
              - 'frontend/**'
            infra:
              - 'infrastructure/**'
              - 'kubernetes/**'

  build-java:
    needs: changes
    if: needs.changes.outputs.java == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up JDK
        uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: ${{ env.JAVA_VERSION }}
          cache: 'gradle'
      
      - name: Build and Test
        run: ./gradlew build test
      
      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: '**/build/test-results/test/*.xml'
      
      - name: Build Docker images
        if: github.event_name == 'push'
        run: |
          ./gradlew jib \
            -Djib.to.image=${{ env.REGISTRY }}/event-ingestion:${{ github.sha }} \
            -Djib.to.auth.username=${{ secrets.ACR_USERNAME }} \
            -Djib.to.auth.password=${{ secrets.ACR_PASSWORD }}

  build-frontend:
    needs: changes
    if: needs.changes.outputs.frontend == 'true'
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: frontend
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Node
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
          cache-dependency-path: frontend/package-lock.json
      
      - name: Install dependencies
        run: npm ci
      
      - name: Lint
        run: npm run lint
      
      - name: Type check
        run: npm run typecheck
      
      - name: Test
        run: npm run test -- --coverage
      
      - name: Build
        run: npm run build
      
      - name: Build and push Docker image
        if: github.event_name == 'push'
        uses: docker/build-push-action@v5
        with:
          context: frontend
          push: true
          tags: ${{ env.REGISTRY }}/frontend:${{ github.sha }}

  validate-infrastructure:
    needs: changes
    if: needs.changes.outputs.infra == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
      
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
        working-directory: infrastructure/terraform
      
      - name: Terraform Validate
        run: |
          terraform init -backend=false
          terraform validate
        working-directory: infrastructure/terraform/environments/dev
      
      - name: Setup Pulumi
        uses: pulumi/actions@v5
      
      - name: Pulumi Preview
        run: pulumi preview --stack dev
        working-directory: infrastructure/pulumi
        env:
          PULUMI_ACCESS_TOKEN: ${{ secrets.PULUMI_ACCESS_TOKEN }}

  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'
      
      - name: Run TruffleHog (secrets scanning)
        uses: trufflesecurity/trufflehog@main
        with:
          extra_args: --only-verified
      
      - name: OWASP Dependency Check
        run: ./gradlew dependencyCheckAnalyze
      
      - name: Upload OWASP report
        uses: actions/upload-artifact@v4
        with:
          name: dependency-check-report
          path: build/reports/dependency-check-report.html

  kubernetes-lint:
    needs: changes
    if: needs.changes.outputs.infra == 'true'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install kubeconform
        run: |
          wget https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz
          tar xf kubeconform-linux-amd64.tar.gz
          sudo mv kubeconform /usr/local/bin/
      
      - name: Validate Kubernetes manifests
        run: |
          kubeconform -strict -summary kubernetes/
      
      - name: Run kube-linter
        uses: stackrox/kube-linter-action@v1
        with:
          directory: kubernetes/

Deploy Pipeline

# .github/workflows/deploy.yaml
name: Deploy

on:
  push:
    branches: [main]
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy to'
        required: true
        default: 'staging'
        type: choice
        options:
          - staging
          - production

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment || 'staging' }}
    steps:
      - uses: actions/checkout@v4
      
      - name: Azure Login
        uses: azure/login@v2
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      
      - name: Set up kubeconfig
        uses: azure/k8s-set-context@v4
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBE_CONFIG }}
      
      - name: Deploy to Kubernetes
        run: |
          # Update image tags
          cd kubernetes/overlays/${{ github.event.inputs.environment || 'staging' }}
          kustomize edit set image \
            event-ingestion=${{ env.REGISTRY }}/event-ingestion:${{ github.sha }} \
            query-service=${{ env.REGISTRY }}/query-service:${{ github.sha }} \
            frontend=${{ env.REGISTRY }}/frontend:${{ github.sha }}
          
          # Apply changes
          kubectl apply -k .
          
          # Wait for rollout
          kubectl rollout status deployment/event-ingestion -n streamforge
          kubectl rollout status deployment/query-service -n streamforge
          kubectl rollout status deployment/frontend -n streamforge
      
      - name: Run smoke tests
        run: |
          INGRESS_IP=$(kubectl get ingress -n streamforge -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')
          curl -f http://${INGRESS_IP}/api/v1/health || exit 1
      
      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Deployment failed for ${{ github.repository }}",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": ":x: Deployment to ${{ github.event.inputs.environment || 'staging' }} failed"
                  }
                }
              ]
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}

Load Testing Workflow

# .github/workflows/load-test.yaml
name: Load Test

on:
  workflow_dispatch:
    inputs:
      duration:
        description: 'Test duration'
        required: true
        default: '5m'
      vus:
        description: 'Virtual users'
        required: true
        default: '50'

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install k6
        run: |
          sudo gpg -k
          sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
          echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
          sudo apt-get update
          sudo apt-get install k6
      
      - name: Run load test
        run: |
          k6 run \
            --duration ${{ github.event.inputs.duration }} \
            --vus ${{ github.event.inputs.vus }} \
            --out json=results.json \
            tests/load/event-ingestion.js
      
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: load-test-results
          path: results.json
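
The referenced tests/load/event-ingestion.js is not included in this plan; a minimal sketch, with the base URL and payload shape assumed from the ingestion API above, might be:

// tests/load/event-ingestion.js (illustrative sketch)
import http from 'k6/http';
import { check, sleep } from 'k6';

const BASE_URL = __ENV.BASE_URL || 'http://localhost:8080';

export default function () {
  const event = JSON.stringify({
    playerId: `player-${__VU}`,
    eventType: 'LOGIN',
    payload: { platform: 'mobile' },
    sessionId: `session-${__VU}-${__ITER}`,
  });

  const res = http.post(`${BASE_URL}/api/v1/events`, event, {
    headers: { 'Content-Type': 'application/json' },
  });

  check(res, { 'status is 202': (r) => r.status === 202 });
  sleep(1);
}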

Deliverables

  • CI pipeline with path-based triggers
  • Java build with Gradle and Jib
  • Frontend build with Node
  • Infrastructure validation (Terraform + Pulumi)
  • Security scanning (Trivy, TruffleHog, OWASP)
  • Kubernetes manifest validation
  • Deployment pipeline with rollout status
  • Load testing workflow with k6

Phase 7: Security Hardening (Week 11)

Goal: Production-grade security posture.

Network Policies with Cilium

# network-policies/api-service.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-service-policy
  namespace: streamforge
spec:
  endpointSelector:
    matchLabels:
      app: event-ingestion
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: ingress-nginx
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
  egress:
    - toEndpoints:
        - matchLabels:
            app.kubernetes.io/name: kafka
      toPorts:
        - ports:
            - port: "9092"
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
# network-policies/database.yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: postgres-policy
  namespace: database
spec:
  endpointSelector:
    matchLabels:
      cnpg.io/cluster: streamforge-db
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: query-service
        - matchLabels:
            app: stream-processor
      toPorts:
        - ports:
            - port: "5432"
  egress:
    - toEndpoints:
        - matchLabels:
            cnpg.io/cluster: streamforge-db

Kyverno Policies

# policies/require-labels.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-team-label
      match:
        any:
          - resources:
              kinds:
                - Deployment
                - StatefulSet
      validate:
        message: "Label 'team' is required"
        pattern:
          metadata:
            labels:
              team: "?*"
# policies/disallow-privileged.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: Enforce
  rules:
    - name: deny-privileged
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Privileged containers are not allowed"
        pattern:
          spec:
            containers:
              - securityContext:
                  privileged: "false"
# policies/require-resource-limits.yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory limits are required"
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"

Falco Rules

# falco/custom-rules.yaml
customRules:
  rules-streamforge.yaml: |-
    - rule: Unexpected outbound connection from Java app
      desc: Detect outbound connections from Java apps to unexpected destinations
      condition: >
        outbound and
        container.image.repository contains "streamforge" and
        not (fd.sip in (kafka_ips, postgres_ips, clickhouse_ips, redis_ips))
      output: >
        Unexpected outbound connection from StreamForge app
        (command=%proc.cmdline connection=%fd.name container=%container.name)
      priority: WARNING
      
    - rule: Database credential access
      desc: Detect access to database credential files
      condition: >
        open_read and
        fd.name contains "database" and
        fd.name contains "password"
      output: >
        Database credential file accessed
        (user=%user.name command=%proc.cmdline file=%fd.name)
      priority: CRITICAL

Secret Management with Sealed Secrets

# Install Sealed Secrets controller
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm install sealed-secrets sealed-secrets/sealed-secrets \
  --namespace kube-system
# Create a sealed secret
kubectl create secret generic db-credentials \
  --from-literal=username=streamforge \
  --from-literal=password=super-secret \
  --dry-run=client -o yaml | \
  kubeseal --format yaml > sealed-db-credentials.yaml

cert-manager for TLS

# cert-manager/cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com  # placeholder; use your own address
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - http01:
          ingress:
            class: nginx
# ingress with automatic TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: streamforge-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - streamforge.example.com
      secretName: streamforge-tls
  rules:
    - host: streamforge.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 80

Pod Security Standards

# namespace with restricted security
apiVersion: v1
kind: Namespace
metadata:
  name: streamforge
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Deliverables

  • Cilium network policies for all services
  • Kyverno policies enforced
  • Falco runtime monitoring
  • Sealed Secrets for GitOps
  • cert-manager with Let's Encrypt
  • Pod Security Standards enforced
  • RBAC properly configured
  • Security documentation

Phase 8: Emerging Tools to Explore (Week 12)

Goal: Hands-on experience with cutting-edge tools.

Tools to Integrate

| Tool | Category | Integration |
| --- | --- | --- |
| Crossplane | IaC | Manage Azure resources via K8s CRDs |
| Argo CD | GitOps | Declarative deployments |
| Backstage | Developer Portal | Service catalog |
| OpenCost | FinOps | Cost monitoring per service |
| Kubeshark | Debugging | API traffic analysis |
| Karpenter | Autoscaling | Smart node provisioning |

Argo CD Setup

# argocd/application.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: streamforge
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/streamforge
    targetRevision: HEAD
    path: kubernetes/overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: streamforge
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
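
To bootstrap Argo CD itself before applying this Application, the standard upstream install manifest works:

kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
kubectl apply -f argocd/application.yaml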

Crossplane for Azure

# crossplane/azure-provider.yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-azure
spec:
  package: xpkg.upbound.io/upbound/provider-azure:v0.42.0
# crossplane/storage-account.yaml
apiVersion: storage.azure.upbound.io/v1beta1
kind: Account
metadata:
  name: streamforge-backup
spec:
  forProvider:
    resourceGroupName: streamforge-rg
    location: westeurope
    accountTier: Standard
    accountReplicationType: LRS
  providerConfigRef:
    name: azure-provider

OpenCost Integration

helm install opencost opencost/opencost \
  --namespace opencost \
  --create-namespace \
  --set prometheus.enabled=false \
  --set prometheus.external.url=http://prometheus-server.monitoring:9090

Deliverables

  • Argo CD for GitOps deployments
  • Crossplane managing some Azure resources
  • OpenCost dashboard in Grafana
  • Kubeshark for debugging
  • Documentation on each tool

Repository Structure

streamforge/
├── .github/
│   └── workflows/
│       ├── ci.yaml
│       ├── deploy.yaml
│       └── load-test.yaml
├── infrastructure/
│   ├── terraform/
│   │   ├── modules/
│   │   │   ├── network/
│   │   │   ├── compute/
│   │   │   └── security/
│   │   └── environments/
│   │       ├── dev/
│   │       ├── staging/
│   │       └── prod/
│   ├── pulumi/
│   │   └── index.ts
│   └── ansible/
│       ├── playbooks/
│       └── inventory/
├── kubernetes/
│   ├── base/
│   │   ├── event-ingestion/
│   │   ├── stream-processor/
│   │   ├── query-service/
│   │   └── frontend/
│   ├── overlays/
│   │   ├── dev/
│   │   ├── staging/
│   │   └── production/
│   └── helm-values/
│       ├── kafka-values.yaml
│       ├── postgres-values.yaml
│       └── monitoring-values.yaml
├── services/
│   ├── common/
│   │   └── src/main/java/
│   ├── event-ingestion/
│   │   ├── src/
│   │   └── build.gradle.kts
│   ├── stream-processor/
│   │   ├── src/
│   │   └── build.gradle.kts
│   ├── query-service/
│   │   ├── src/
│   │   └── build.gradle.kts
│   └── event-generator/
│       └── src/
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   ├── hooks/
│   │   ├── api/
│   │   └── App.tsx
│   ├── package.json
│   └── Dockerfile
├── observability/
│   ├── dashboards/
│   │   ├── application.json
│   │   ├── kafka.json
│   │   └── jvm.json
│   ├── alerts/
│   │   └── rules.yaml
│   └── otel-collector-config.yaml
├── security/
│   ├── network-policies/
│   ├── kyverno-policies/
│   └── falco-rules/
├── tests/
│   ├── load/
│   │   └── event-ingestion.js
│   └── e2e/
├── docs/
│   ├── architecture/
│   ├── runbooks/
│   └── decisions/
├── scripts/
│   ├── setup-cluster.sh
│   ├── deploy-databases.sh
│   └── cleanup.sh
├── build.gradle.kts
├── settings.gradle.kts
└── README.md

Getting Started

Prerequisites

  • Azure subscription
  • Azure CLI installed
  • Terraform >= 1.5
  • Pulumi >= 3.0
  • kubectl >= 1.30
  • Helm >= 3.14
  • Java 21 (Temurin/Adoptium)
  • Node.js 20 LTS
  • Docker

Quick Start

# 1. Clone the repository
git clone https://github.com/your-org/streamforge.git
cd streamforge

# 2. Login to Azure
az login
az account set --subscription "Your Subscription"

# 3. Deploy infrastructure
cd infrastructure/terraform/environments/dev
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# 4. Bootstrap Kubernetes (see Phase 1 instructions)

# 5. Deploy platform services
./scripts/deploy-databases.sh

# 6. Build and deploy applications
./gradlew build jib
kubectl apply -k kubernetes/overlays/dev

# 7. Access the dashboard
kubectl port-forward svc/frontend 3000:80 -n streamforge
open http://localhost:3000

Cost Estimation

Monthly Costs (Development Environment)

| Resource | Specification | Cost (EUR) |
| --- | --- | --- |
| Control Plane VM | Standard_B2s (2 vCPU, 4 GB) | ~€35 |
| Worker VMs (2x) | Standard_D2s_v5 Spot | ~€40 |
| Managed Disks | 100 GB Standard SSD | ~€15 |
| Container Registry | Basic tier | ~€5 |
| Networking/Egress | ~50 GB/month | ~€5 |
| **Total** | | **~€100/month** |

Cost Optimization Tips

  1. Use spot instances for worker nodes (70% savings)
  2. Deallocate VMs when not in use: az vm deallocate -g streamforge-rg -n <vm-name>
  3. Use Azure Bastion only when needed, or use SSH jump host instead
  4. Set up auto-shutdown schedules for the dev environment (see the example below)
  5. Monitor costs with OpenCost
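
For tip 4, Azure can schedule shutdowns per VM (the VM names here are illustrative):

# Auto-shutdown the worker VMs at 20:00 UTC every day
az vm auto-shutdown -g streamforge-rg -n streamforge-worker-1 --time 2000
az vm auto-shutdown -g streamforge-rg -n streamforge-worker-2 --time 2000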

Learning Resources

Java

Kubernetes

Stream Processing

DevOps & Infrastructure


Progress Tracking

Use this checklist to track your progress:

Phase 1: Infrastructure ⬜

  • Terraform code complete
  • Pulumi code complete
  • VMs provisioned
  • Kubernetes cluster bootstrapped
  • Cilium CNI installed
  • kubectl access working

Phase 2: Platform Services ⬜

  • PostgreSQL running
  • Redis running
  • ClickHouse running
  • Kafka running
  • All persistence configured

Phase 3: Java Development ⬜

  • Project structure created
  • Event Ingestion service
  • Stream Processor service
  • Query service
  • Tests passing
  • Docker images built

Phase 4: Observability ⬜

  • Prometheus deployed
  • Grafana deployed
  • Loki deployed
  • Tempo deployed
  • OTel Collector configured
  • Dashboards created
  • Alerts configured

Phase 5: Frontend ⬜

  • React app scaffolded
  • Dashboard components
  • Real-time updates
  • Charts working
  • Docker build

Phase 6: CI/CD ⬜

  • CI pipeline
  • Security scanning
  • Deploy pipeline
  • Load testing

Phase 7: Security ⬜

  • Network policies
  • Kyverno policies
  • Falco rules
  • TLS configured
  • Secrets management

Phase 8: Advanced ⬜

  • Argo CD
  • Crossplane
  • OpenCost
  • Documentation complete

Contributing

This is a personal learning project, but suggestions and improvements are welcome!


License

MIT License - feel free to use this as a template for your own learning journey.


Happy Learning! 🚀
