
Production MLOps pipeline for Paris bike traffic prediction. Airflow orchestration, MLflow tracking (Cloud SQL), FastAPI deployment. Features: automated ingestion, drift detection, champion/challenger models, Prometheus+Grafana monitoring, Discord alerts. 15 Docker services locally.


arthurcornelio88/bike-count-prediction-app


🚲 Bike Traffic Prediction - MLOps Production Pipeline

Badges: CI Tests · codecov · Code style: ruff · security: bandit · type: mypy

Version 2.0.0-rc.1 - MLOps pipeline (Local Development Ready) for predicting hourly bike traffic in Paris. Features automated data ingestion from Paris Open Data API, intelligent drift detection with sliding window training, champion/challenger model system with double evaluation, real-time monitoring via Prometheus + Grafana, and Discord alerting for critical events. Orchestrated with Airflow, tracked with MLflow (Cloud SQL backend), and deployed with FastAPI. All infrastructure runs locally via Docker Compose with 15 services. Production Kubernetes deployment under construction.


🎥 Video demo

Watch a 4‑minute demo of the complete MLOps pipeline on Vimeo

🚀 Quick Start

Local Development (All Services)

Environment configuration (local vs prod)

  • Local (recommended for development)
    1. Copy the example file and edit values:
      cp .env.example .env
      # then edit .env with your editor (do NOT commit)
    2. Example file: .env.example
# Start full MLOps stack (MLflow, Airflow, FastAPI, Monitoring)
./scripts/start-all.sh --with-monitoring

# Access services
open http://localhost:5000   # MLflow tracking
open http://localhost:8081   # Airflow (admin / see .env)
open http://localhost:8000   # FastAPI API docs
open http://localhost:3000   # Grafana (admin / see .env)
open http://localhost:9090   # Prometheus

Trigger DAGs

# DAG 1: Ingest data from Paris Open Data API
docker exec airflow-webserver airflow dags trigger fetch_comptage_daily

# DAG 2: Generate predictions
docker exec airflow-webserver airflow dags trigger daily_prediction

# DAG 3: Monitor & train (with force flag)
docker exec airflow-webserver airflow dags trigger monitor_and_fine_tune \
  --conf '{"force_fine_tune": true, "test_mode": false}'
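The same triggers can also go through Airflow's stable REST API instead of docker exec. This is a sketch: the webserver port and the conf payload come from the commands above, but authentication (e.g. basic auth with the credentials from .env) is omitted.

```python
import json
from urllib import request

def dag_run_request(dag_id: str, conf: dict) -> request.Request:
    """Build (but do not send) the POST request for one DAG run."""
    body = json.dumps({"conf": conf}).encode()
    return request.Request(
        f"http://localhost:8081/api/v1/dags/{dag_id}/dagRuns",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = dag_run_request("monitor_and_fine_tune",
                      {"force_fine_tune": True, "test_mode": False})
print(req.full_url)
```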

🎯 Features

MLOps Core

  • ✅ Champion/Challenger System - Explicit model promotion with double evaluation
  • ✅ Sliding Window Training - Learns from fresh data (660K baseline + 1.6K current)
  • ✅ Drift Detection - Evidently-based monitoring with hybrid retraining strategy
  • ✅ Real-time Monitoring - Prometheus metrics + 4 Grafana dashboards
  • ✅ Discord Alerting - Critical events, training failures, champion promotions
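The Discord side of the alerting can be sketched with the standard library: webhooks accept a JSON body with a `content` field, and alerting stays optional when `DISCORD_WEBHOOK_URL` is unset. The event names below are illustrative assumptions, not the repository's actual ones.

```python
import json
import os
from urllib import request

def build_alert(event: str, detail: str) -> bytes:
    """JSON body for a Discord webhook (the "content" field is the message)."""
    return json.dumps({"content": f"🚨 {event}: {detail}"}).encode()

def send_alert(event: str, detail: str) -> None:
    url = os.environ.get("DISCORD_WEBHOOK_URL")
    if not url:  # alerting is optional — do nothing when unset
        return
    req = request.Request(url, data=build_alert(event, detail),
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)

payload = build_alert("champion_promoted", "rf run_id=abc123")
print(payload.decode())
```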

Data Pipeline

  • ✅ Automated Ingestion - Daily fetch from Paris Open Data API → BigQuery
  • ✅ Prediction Pipeline - Daily ML predictions on last 7 days
  • ✅ Audit Logs - All training runs, drift metrics, deployment decisions tracked
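The ingestion request can be sketched as a URL builder for the Open Data records API. The dataset id, field names, and query parameters here are assumptions for illustration; the DAG code holds the real ones.

```python
from urllib.parse import urlencode

# Base of the Paris Open Data explore API (v2.1 records endpoint).
BASE = "https://opendata.paris.fr/api/explore/v2.1/catalog/datasets"

def build_records_url(dataset: str, where: str, limit: int = 100) -> str:
    """Build a records query URL for one Open Data dataset."""
    return f"{BASE}/{dataset}/records?{urlencode({'where': where, 'limit': limit})}"

# Hypothetical dataset id and filter field:
url = build_records_url("comptage-velo-donnees-compteurs",
                        where="date >= '2025-10-01'")
print(url)
```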

Model Registry

  • ✅ MLflow Tracking - Cloud SQL PostgreSQL backend + GCS artifacts
  • ✅ Custom Registry - summary.json for fast model loading
  • ✅ Priority Loading - Champion models loaded first regardless of metrics
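The priority-loading rule can be sketched as a small selection function over a hypothetical summary.json structure (the field names below are assumptions, not the repository's actual schema):

```python
import json

# Hypothetical summary.json content: one entry per trained model.
summary = json.loads("""
[
  {"run_id": "a1", "is_champion": false, "r2": 0.81},
  {"run_id": "b2", "is_champion": true,  "r2": 0.78},
  {"run_id": "c3", "is_champion": false, "r2": 0.74}
]
""")

def select_model(entries: list) -> dict:
    """Champion first regardless of metrics; otherwise best R2."""
    champions = [e for e in entries if e.get("is_champion")]
    return champions[0] if champions else max(entries, key=lambda e: e["r2"])

print(select_model(summary)["run_id"])  # b2: the champion wins despite a lower R2
```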

Quality Assurance

  • βœ… 68% Code Coverage - 47 tests across 4 suites
  • βœ… CI/CD Pipeline - GitHub Actions with Codecov integration
  • βœ… Pre-commit Hooks - Ruff, MyPy, Bandit, YAML validation

📊 Architecture Overview

3-Layer MLOps Stack

┌─────────────────────────────────────────────────────────┐
│  LAYER 3: MLOps (Training & Monitoring)                 │
│  • DAG 3: Monitor & Train (weekly)                      │
│  • Sliding window training (660K + 1.6K samples)        │
│  • Double evaluation (test_baseline + test_current)     │
│  • Champion promotion + Discord alerts                  │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  LAYER 2: DataOps (Ingestion & Predictions)             │
│  • DAG 1: Daily data ingestion → BigQuery               │
│  • DAG 2: Daily predictions (last 7 days)               │
│  • 3 BigQuery datasets (raw, predictions, audit)        │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  LAYER 1: InfraOps (Services & Storage)                 │
│  • 15 Docker services (MLflow, Airflow, FastAPI)        │
│  • GCP: BigQuery, Cloud SQL, GCS                        │
│  • Monitoring: Prometheus, Grafana, airflow-exporter    │
└─────────────────────────────────────────────────────────┘

Docker Services (15 containers)

Core Stack:

  • mlflow (port 5000) - Tracking server with Cloud SQL backend
  • regmodel-backend (port 8000) - FastAPI with 5 endpoints
  • airflow-webserver (port 8081) - DAG management UI
  • airflow-scheduler - Task scheduling
  • airflow-worker - Celery task execution
  • cloud-sql-proxy - Secure Cloud SQL connection

Monitoring Stack (--profile monitoring):

  • prometheus (port 9090) - Metrics collection
  • pushgateway (port 9091) - For testing metrics collection
  • grafana (port 3000) - 4 dashboards (overview, performance, drift, training)
  • airflow-exporter (port 9101) - Custom MLOps metrics

Supporting Services:

  • postgres-airflow - Airflow metadata DB
  • redis-airflow - Celery broker
  • flower (port 5555) - Celery monitoring

Architecture diagram: a high-resolution version is available in the repository for a zoomed view.


πŸ“ Project Structure

├── backend/regmodel/app/       # FastAPI backend
│   ├── fastapi_app.py          # API endpoints (/train, /predict, /promote_champion)
│   ├── train.py                # Training logic (sliding window)
│   ├── model_registry_summary.py  # Custom registry (summary.json)
│   └── middleware/             # Prometheus metrics middleware
├── dags/                       # Airflow DAGs
│   ├── dag_daily_fetch_data.py      # Data ingestion (daily @ 02:00)
│   ├── dag_daily_prediction.py      # Predictions (daily @ 04:00)
│   ├── dag_monitor_and_train.py     # Monitor & train (weekly @ Sunday)
│   └── utils/discord_alerts.py      # Discord webhook integration
├── monitoring/                 # Monitoring configuration
│   ├── grafana/provisioning/   # 4 dashboards + alerting rules
│   ├── prometheus.yml          # Scrape config (3 targets)
│   └── custom_exporters/       # airflow_exporter.py (MLOps metrics)
├── scripts/                    # Utility scripts
│   ├── start-all.sh            # Start all services (with/without monitoring)
│   ├── restart-airflow.sh      # Restart Airflow services
│   └── reset-airflow-password.sh  # Reset Airflow password
├── data/                       # Training data (DVC tracked)
│   ├── train_baseline.csv      # 660K samples (69.7%)
│   └── test_baseline.csv       # 181K samples (30.3%)
├── docs/                       # Documentation (20+ files)
│   ├── MLOPS_ROADMAP.md        # Complete project roadmap
│   ├── training_strategy.md    # Sliding window + drift management
│   ├── ARCHITECTURE_DIAGRAM_GUIDE.md  # Excalidraw guide (3 layers)
│   ├── DEMO_SCRIPT.md          # 2-minute video demo script
│   └── monitoring/             # Monitoring docs (4 files)
├── tests/                      # Test suite (47 tests, 68% coverage)
│   ├── test_pipelines.py       # RF, NN pipeline validation
│   ├── test_preprocessing.py   # Transformer logic
│   ├── test_api_regmodel.py    # FastAPI endpoints
│   └── test_model_registry.py  # Registry logic
├── docker-compose.yaml         # 15 services (6GB memory for training)
└── .github/workflows/ci.yml    # CI/CD pipeline

🔧 Setup & Installation

Prerequisites

  • Docker & Docker Compose
  • GCP credentials (service account JSON)
  • Python 3.11+ (for local development)

1. Environment Configuration

Create .env file at project root (see docs/secrets.md for production setup):

# ========================================
# Environment & GCP
# ========================================
ENV=DEV                                    # DEV or PROD
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_APPLICATION_CREDENTIALS=gcp.json

# ========================================
# BigQuery Configuration
# ========================================
BQ_PROJECT=your-project-id
BQ_RAW_DATASET=bike_traffic_raw          # Raw data from API
BQ_PREDICT_DATASET=bike_traffic_predictions  # Model predictions
BQ_LOCATION=europe-west1

# ========================================
# Google Cloud Storage
# ========================================
GCS_BUCKET=your-bucket-name              # MLflow artifacts + model registry

# ========================================
# API Configuration
# ========================================
API_URL_DEV=http://regmodel-api:8000     # Internal Docker network
API_KEY_SECRET=dev-key-unsafe            # Change for production!

# ========================================
# Model Performance Thresholds (v2.0.0)
# ========================================
R2_CRITICAL=0.45      # Below this β†’ immediate retraining
R2_WARNING=0.55       # Below this + drift β†’ proactive retraining
RMSE_THRESHOLD=90.0   # Above this β†’ immediate retraining

# Note: If you change these thresholds, also update:
#   - monitoring/grafana/provisioning/dashboards/overview.json (lines 177, 181)
#   - monitoring/grafana/provisioning/dashboards/model_performance.json
#   - monitoring/grafana/provisioning/dashboards/training_deployment.json

# ========================================
# Discord Alerting (Optional)
# ========================================
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...

# ========================================
# Grafana
# ========================================
GF_SECURITY_ADMIN_PASSWORD=your-strong-password

# ========================================
# MLflow Backend (Cloud SQL)
# ========================================
MLFLOW_TRACKING_URI=http://mlflow:5000
MLFLOW_DB_USER=mlflow_user
MLFLOW_DB_PASSWORD=your-db-password
MLFLOW_DB_NAME=mlflow
MLFLOW_INSTANCE_CONNECTION=project-id:region:instance-name

# ========================================
# Airflow Configuration
# ========================================
_AIRFLOW_WWW_USER_USERNAME=admin
_AIRFLOW_WWW_USER_PASSWORD=admin
AIRFLOW_UID=50000  # Match host user for volume permissions
AIRFLOW_GID=50000

Security Notes:

  • ⚠️ .env contains secrets - NEVER commit to Git (already in .gitignore)
  • πŸ” For production: Use GCP Secret Manager (see docs/secrets.md)

2. Clone & Install

git clone https://github.com/arthurcornelio88/ds_traffic_cyclist1.git
cd ds_traffic_cyclist1

# Install dependencies (local dev)
uv venv
uv sync
source .venv/bin/activate

3. Configure GCP Credentials

# Place service account JSON in project root
# File: mlflow-trainer.json (for training + model upload)

Required GCP Services:

  • BigQuery (3 datasets: raw, predictions, audit)
  • Cloud SQL PostgreSQL (MLflow metadata)
  • GCS bucket: gs://df_traffic_cyclist1/

See docs/secrets.md for detailed setup.

4. Start Services

# Option 1: Core services only (MLflow, Airflow, FastAPI)
./scripts/start-all.sh

# Option 2: With monitoring (Prometheus + Grafana)
./scripts/start-all.sh --with-monitoring

# Check logs
docker compose logs -f regmodel-backend
docker compose logs -f airflow-scheduler

🧪 Development

Run Tests

# All tests with coverage
uv run pytest tests/ -v --cov

# Specific test suite
uv run pytest tests/test_api_regmodel.py -v

# Generate HTML coverage report
uv run pytest tests/ --cov --cov-report=html
open htmlcov/index.html

Pre-commit Hooks

# Install hooks
uv run pre-commit install

# Run manually
uv run pre-commit run --all-files

Train Champion Model Locally

# Quick test (1K samples)
python backend/regmodel/app/train.py \
  --model-type rf \
  --data-source baseline \
  --model-test \
  --env dev

# Full production training (660K samples)
python backend/regmodel/app/train.py \
  --model-type rf \
  --data-source baseline \
  --env dev

📡 API Endpoints

FastAPI (port 8000)

Base URL: http://localhost:8000

| Endpoint | Method | Purpose |
|---|---|---|
| /train | POST | Train model with sliding window |
| /predict | POST | Generate predictions (returns champion metadata) |
| /evaluate | POST | Evaluate champion on test_baseline |
| /drift | POST | Detect data drift (Evidently) |
| /promote_champion | POST | Promote model to champion status |
| /metrics | GET | Prometheus metrics (scraped every 15s) |
| /docs | GET | Interactive API documentation |

Example: Train via API

curl -X POST "http://localhost:8000/train" \
  -H "Content-Type: application/json" \
  -d '{
    "model_type": "rf",
    "data_source": "baseline",
    "test_mode": false,
    "env": "dev"
  }'
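A prediction call can be sketched the same way from Python. The request body fields and the auth header name below are assumptions, not the API's documented schema — check http://localhost:8000/docs for the real one.

```python
import json
from urllib import request

body = json.dumps({"horizon_days": 7}).encode()  # payload fields assumed
req = request.Request(
    "http://localhost:8000/predict",
    data=body,
    headers={"Content-Type": "application/json",
             "X-API-Key": "dev-key-unsafe"},  # header name assumed, key from .env
    method="POST",
)
print(req.full_url)
# Uncomment to send once the stack is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp))  # response includes champion metadata
```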

📊 Monitoring & Dashboards

Grafana Dashboards (4 total)

Access: http://localhost:3000 (admin / see .env)

  1. MLOps - Overview

    • Drift status (50% detected)
    • Champion R² (0.78)
    • API request rate & error rate
    • Services health
  2. MLOps - Model Performance

    • R² trends (champion vs challenger)
    • RMSE: 32.5
    • API latency percentiles (P50/P95/P99)
  3. MLOps - Drift Monitoring

    • Drift evolution over time
    • Drifted features count
    • R² vs drift correlation
  4. MLOps - Training & Deployment

    • Training success rate (100%)
    • Deployment decisions (deploy/skip/reject)
    • Model improvement delta

Prometheus Metrics (15+ custom)

Key Metrics:

  • bike_model_r2_champion_current - Champion R² on recent data
  • bike_drift_detected - Binary drift flag (0/1)
  • bike_training_runs_total - Training runs counter
  • bike_model_deployments_total - Deployment decisions
  • fastapi_requests_total - API request rate
  • fastapi_request_duration_seconds - API latency
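The /metrics endpoint serves these in the Prometheus text exposition format. A minimal parser for simple, unlabeled gauge/counter lines (labeled series would need more work) can be sketched as:

```python
def parse_metrics(text: str) -> dict:
    """Map metric name -> value for unlabeled exposition-format lines."""
    values = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        values[name] = float(value)
    return values

sample = """\
# HELP bike_drift_detected Binary drift flag
bike_drift_detected 1
bike_model_r2_champion_current 0.78
"""
print(parse_metrics(sample))
```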

See docs/monitoring/03_metrics_reference.md for full catalog.


🎬 Demo Video (2 minutes)

See docs/DEMO_SCRIPT.md for video presentation guide:

  1. Infrastructure startup (0:00-0:15) - Start all services
  2. Data pipeline (0:15-0:45) - DAG 1 & 2, Discord alerts, BigQuery
  3. MLOps pipeline (0:45-1:30) - DAG 3 training, champion promotion
  4. Grafana dashboards (1:30-2:00) - 4 dashboards overview

📚 Documentation

Documentation lives in the docs/ folder, grouped into core documentation, infrastructure, setup guides, monitoring, and presentation materials.


🚧 Version History

Version 2.0.0 (Current - November 2025)

Status: Local development ready, production Kubernetes deployment under construction

Major Features:

  • ✅ Production MLOps pipeline (Airflow, MLflow, FastAPI)
  • ✅ Champion/Challenger system with double evaluation
  • ✅ Sliding window training (660K + 1.6K samples)
  • ✅ Real-time monitoring (Prometheus + Grafana)
  • ✅ Discord alerting
  • 🚧 Kubernetes deployment (under construction)
  • 🚧 Production GCP deployment (under construction)

Version 1.0.0 (Legacy)

Features:

  • Streamlit frontend for manual predictions
  • Basic MLflow tracking (local only)
  • Single model registry (summary.json)
  • No automated orchestration

Note: V1 frontend (Streamlit) is deprecated in V2. Focus shifted to automated MLOps pipeline.


🔑 Key Technical Decisions

Data Strategy

  • Unified baseline: 905K records from current_api_data.csv (2024-09-01 β†’ 2025-10-10)
  • Temporal split: 660K train (69.7%) + 181K test (30.3%)
  • DVC tracking: Data versioned with GCS remote storage
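The temporal split above can be sketched in a few lines: no shuffling, everything timestamped before the cutoff trains, the rest tests. The dates and cutoff below are illustrative, not the repository's real ones.

```python
from datetime import date

def temporal_split(rows, cutoff):
    """Rows strictly before the cutoff train; the rest test (no shuffling)."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test

# Ten monthly samples, split at an assumed cutoff for a ~70/30 ratio.
rows = [(date(2025, m, 1), f"sample-{m}") for m in range(1, 11)]
train, test = temporal_split(rows, date(2025, 8, 1))
print(len(train), len(test))  # 7 3 — mirrors the 69.7/30.3 split used here
```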

Drift Management

  • Hybrid strategy: Proactive (preventive) + Reactive (corrective) triggers
  • Thresholds: R² < 0.65 (critical), drift ≥ 50% (proactive)
  • Decision matrix: 5 priority levels (force, reactive, proactive, wait, all good)
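The five priority levels could be ordered like this (a sketch using the thresholds above; the exact rules live in dag_monitor_and_train.py):

```python
def retrain_decision(r2: float, drift_share: float, force: bool = False) -> str:
    """Decision matrix: checks run in priority order, first match wins."""
    if force:
        return "force"        # manual override via DAG conf
    if r2 < 0.65:
        return "reactive"     # critical performance drop -> corrective retrain
    if drift_share >= 0.50:
        return "proactive"    # heavy drift -> preventive retrain
    if drift_share > 0:
        return "wait"         # some drift, performance still acceptable
    return "all_good"

print(retrain_decision(r2=0.78, drift_share=0.50))  # proactive
```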

Training Strategy

  • Sliding window: Concatenate train_baseline (660K) + train_current (1.6K)
  • Double evaluation: test_baseline (regression check) + test_current (improvement check)
  • Deployment logic: REJECT (R² < 0.60) / SKIP (no improvement) / DEPLOY (R² gain > 0.02)
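The deployment gate maps directly to a small function over the challenger's and champion's R² (a sketch of the rule stated above, not the repository's code):

```python
def deployment_decision(challenger_r2: float, champion_r2: float) -> str:
    """REJECT / SKIP / DEPLOY per the thresholds above."""
    if challenger_r2 < 0.60:
        return "REJECT"   # fails the regression check on test_baseline
    if challenger_r2 - champion_r2 > 0.02:
        return "DEPLOY"   # meaningful improvement over the champion
    return "SKIP"         # not worse, but no meaningful gain

print(deployment_decision(0.81, 0.78))  # DEPLOY
```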

πŸ› Troubleshooting

Airflow password issues

./scripts/reset-airflow-password.sh
# Default: admin / admin

Container memory issues

# Check memory usage
docker stats

# Restart with clean slate
docker compose down -v
./scripts/start-all.sh --with-monitoring

MLflow connection issues

See docs/mlflow_cloudsql.md for Cloud SQL troubleshooting.

Training failures

Check Discord alerts or Airflow logs:

docker compose logs -f airflow-scheduler

👥 Contributors

Built with ❀️ by:


📄 License

This project is part of a DataScientest MLOps training program.


Last Updated: November 2025 · Version: 2.0.0 · Status: Local ready, Kubernetes deployment under construction
