diff --git a/.gitignore b/.gitignore
Lines changed: 10 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 141 additions & 0 deletions
diff --git a/PRD.md b/PRD.md
Lines changed: 230 additions & 0 deletions
@@ -235,3 +235,13 @@ notebooks/
 .ipynb_checkpoints/
 .venv/
 .uv/
+
+# Allow tracked demo notebooks (outputs must remain stripped)
+!demos/credit_risk/notebooks/
+!demos/fraud_detection/notebooks/
+!demos/sales_forecasting/notebooks/
+!demos/dna_similarity/notebooks/
+!demos/credit_risk/notebooks/01_Credit_Risk_Complete_Demo.ipynb
+!demos/fraud_detection/notebooks/01_Fraud_Detection_Complete_Demo.ipynb
+!demos/sales_forecasting/notebooks/01_Sales_Forecasting_Complete_Demo.ipynb
+!demos/dna_similarity/notebooks/01_DNA_Similarity_Complete_Demo.ipynb
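The comment in this hunk requires tracked notebooks to stay free of outputs. A minimal sketch of how that could be checked or enforced before a commit, assuming the standard nbformat v4 JSON layout in which code cells carry `outputs` and `execution_count` keys. The helper name `strip_outputs` is hypothetical and not part of this PR.

```python
import json


def strip_outputs(notebook):
    """Clear outputs and execution counts from every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


# A tiny in-memory notebook standing in for one of the demo .ipynb files
raw = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(1)"],
            "execution_count": 7,
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
        },
    ],
})

cleaned = strip_outputs(json.loads(raw))
```

In practice the same function would be applied to the JSON loaded from each tracked notebook and the result written back before `git add`.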
@@ -0,0 +1,141 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This repository demonstrates **IntegratedML's Custom Models feature**, which allows Python ML models to be executed directly within InterSystems IRIS SQL commands. This capability enables:
+
+- Custom Python preprocessing and model training code within SQL `CREATE MODEL` statements
+- Model validation using `VALIDATE MODEL` syntax
+- Real-time predictions using `SELECT ... PREDICT()` without data movement
+- Integration of any scikit-learn compatible model into database workflows
+
+**Key Innovation**: Data scientists can now bring their custom Python models directly into SQL, eliminating the need for data export/import cycles and enabling real-time ML on live data.
+
+See [PRD.md](PRD.md) for the complete product vision and feature documentation.
+
+## Common Development Commands
+
+### Environment Setup
+```bash
+# Install dependencies with uv (recommended) or pip
+make install
+
+# Start IRIS database (required for demos)
+make start
+
+# Complete setup (dependencies + IRIS)
+make setup
+```
+
+### Development Workflow
+```bash
+# Run all tests
+make test
+# or directly with pytest
+pytest demos/*/tests/ -v --tb=short
+
+# Format code
+make format
+# or directly
+black .
+
+# Run linting
+make lint
+# or directly
+flake8 . --max-line-length=88 --extend-ignore=E203,W503
+mypy shared/ --ignore-missing-imports
+
+# Open notebooks in VS Code
+make notebooks
+```
+
+### Running Demos
+```bash
+# Individual demos
+make demo-credit
+make demo-fraud
+make demo-sales
+make demo-dna
+
+# All demos
+make demos
+
+# Or run directly
+python run_credit_risk_demo.py
+python run_fraud_detection_demo.py
+python run_sales_forecasting_demo.py
+python run_dna_similarity_demo.py
+```
+
+### Database Management
+```bash
+# Check status
+make status
+
+# View logs
+make logs
+
+# Initialize database with sample data
+make init-db
+
+# Clean up (removes containers and volumes)
+make clean
+```
+
+## High-Level Architecture
+
+### Model Integration Pattern
+All ML models follow a standardized integration pattern based on scikit-learn compatibility:
+
+1. **Base Model Inheritance**: All models inherit from `IntegratedMLBaseModel` (shared/models/base.py:20), which provides:
+   - IntegratedML parameter serialization/deserialization
+   - Model state persistence
+   - Input validation and preprocessing hooks
+   - Consistent fit/predict interfaces
+
+2. **Model Types**: Three specialized base classes extend the core pattern:
+   - `ClassificationModel` (shared/models/classification.py) - Binary/multi-class classification
+   - `RegressionModel` (shared/models/regression.py) - Continuous value prediction
+   - `EnsembleModel` (shared/models/ensemble.py) - Multi-model voting/averaging
+
+3. **Demo Model Structure**: Each demo implements a custom model:
+   - `CustomCreditRiskClassifier` (demos/credit_risk/models/credit_risk_classifier.py:22) - Feature engineering for financial data
+   - `EnsembleFraudDetector` (demos/fraud_detection/models/ensemble_fraud_detector.py) - Multiple sub-models with weighted voting
+   - `HybridForecastingModel` (demos/sales_forecasting/models/hybrid_forecasting_model.py) - Prophet + LightGBM combination
+   - `DNASimilarityAnalyzer` (demos/dna_similarity/models/dna_classifier.py) - Sequence analysis algorithms
+
+### Key Architectural Patterns
+
+1. **Feature Engineering Pipeline**: Models implement custom preprocessing in `_engineer_features()` methods, allowing domain-specific transformations before training/prediction.
+
+2. **Model State Management**: Models use `_get_model_state()` and `_set_model_state()` for serialization, enabling persistence across database sessions.
+
+3. **Ensemble Architecture**: The fraud detection demo shows how to combine multiple models (neural, rule-based, behavioral) with weighted voting and confidence thresholds.
+
+4. **Database Integration**: Models are designed to work within IRIS database constraints:
+   - Parameters passed via JSON from SQL
+   - Models execute in-database for data locality
+   - Results returned directly to SQL queries
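Patterns 1 and 2 above can be sketched as follows. This is an illustration only: `TinyThresholdModel`, its parameters, and the state dictionary keys are hypothetical, and the class does not inherit from the real `IntegratedMLBaseModel`. Only the method names `_engineer_features`, `_get_model_state`, and `_set_model_state` are taken from the text.

```python
import pickle


class TinyThresholdModel:
    """Hypothetical stand-in for a model built on the shared base class."""

    def __init__(self, enable_ratio=True):
        self.enable_ratio = enable_ratio
        self.threshold_ = None

    def _engineer_features(self, rows):
        # Domain-specific transformation: debt-to-income ratio per row
        if not self.enable_ratio:
            return [row["debt"] for row in rows]
        return [row["debt"] / row["income"] for row in rows]

    def fit(self, rows, labels):
        feats = self._engineer_features(rows)
        pos = [f for f, y in zip(feats, labels) if y == 1]
        neg = [f for f, y in zip(feats, labels) if y == 0]
        # Midpoint between the two class means is the learned boundary
        self.threshold_ = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, rows):
        return [int(f > self.threshold_) for f in self._engineer_features(rows)]

    def _get_model_state(self):
        # Plain dict of everything needed to rebuild the fitted model
        return {"enable_ratio": self.enable_ratio, "threshold": self.threshold_}

    def _set_model_state(self, state):
        self.enable_ratio = state["enable_ratio"]
        self.threshold_ = state["threshold"]


train = [{"debt": 10, "income": 100}, {"debt": 80, "income": 100}]
model = TinyThresholdModel().fit(train, [0, 1])

# Serialize the state, then restore it into a fresh instance
blob = pickle.dumps(model._get_model_state())
restored = TinyThresholdModel()
restored._set_model_state(pickle.loads(blob))
```

Keeping the state a plain dictionary means the stored bytes do not depend on the class being importable under the same module path when the model is reloaded.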
+
+### Database Connection
+The project uses InterSystems IRIS with IntegratedML. Connection details are configured via environment variables (see .env.example). The default setup uses:
+- Host: localhost
+- Port: 1972
+- Namespace: USER
+- Default credentials in docker-compose.yml
+
+### Testing Strategy
+Tests are organized by demo with shared utilities:
+- Unit tests for individual components
+- Integration tests with IRIS database
+- Performance benchmarks for latency requirements
+- Test data generators for reproducible scenarios
+
+Run specific test suites:
+```bash
+pytest demos/credit_risk/tests/
+pytest demos/fraud_detection/tests/
+pytest demos/sales_forecasting/tests/
+```
@@ -0,0 +1,230 @@
+# IntegratedML Custom Models Feature - Product Requirements Document
+
+## Executive Summary
+
+IntegratedML now supports **custom Python model integration**, enabling data scientists and developers to bring their own machine learning models directly into InterSystems IRIS SQL workflows. Custom Python preprocessing, feature engineering, and model training code can be executed within SQL commands such as `CREATE MODEL` and `SELECT ... PREDICT()`.
+
+## Core Value Proposition
+
+### Before IntegratedML Custom Models
+```python
+# Traditional approach - data movement required
+data = fetch_from_database()
+processed_data = custom_preprocessing(data)
+model = train_custom_model(processed_data)
+predictions = model.predict(new_data)
+write_to_database(predictions)
+```
+
+### With IntegratedML Custom Models
+```sql
+-- Everything happens in-database!
+CREATE MODEL FraudDetectionModel
+PREDICTING (is_fraud)
+FROM Transactions
+USING "demos.fraud_detection.models.EnsembleFraudDetector";
+
+-- Real-time predictions without data movement
+SELECT transaction_id, amount,
+       PREDICT(FraudDetectionModel) as fraud_risk
+FROM LiveTransactions
+WHERE amount > 1000;
+```
+
+## Key Features
+
+### 1. Seamless SQL Integration
+- Use familiar SQL syntax to train and deploy custom ML models
+- No need to export data or manage separate ML infrastructure
+- Models execute where the data lives
+
+### 2. Python Flexibility
+- Bring any scikit-learn compatible model
+- Custom preprocessing and feature engineering
+- Support for ensemble methods and complex architectures
+- Integration with popular libraries (TensorFlow, LightGBM, Prophet)
+
+### 3. Production-Ready
+- Models persist in the database
+- Automatic versioning and lifecycle management
+- Built-in security and access controls
+- Scalable in-database execution
+
+## How It Works
+
+### Step 1: Define Your Custom Model
+```python
+from sklearn.linear_model import LogisticRegression
+from shared.models.base import IntegratedMLBaseModel
+class CustomCreditRiskClassifier(IntegratedMLBaseModel):
+    def fit(self, X, y):
+        # Custom feature engineering
+        X_engineered = self._engineer_features(X)
+        # Train your model
+        self.model = LogisticRegression()
+        self.model.fit(X_engineered, y)
+        return self
+
+    def predict(self, X):
+        X_engineered = self._engineer_features(X)
+        return self.model.predict(X_engineered)
+```
+
+### Step 2: Train Using SQL
+```sql
+CREATE MODEL CreditRiskModel
+PREDICTING (default_risk)
+FROM CreditApplications
+USING "demos.credit_risk.models.CustomCreditRiskClassifier"
+WITH (enable_debt_ratio=true, decision_threshold=0.7);
+```
+
+### Step 3: Validate Model Performance
+```sql
+VALIDATE MODEL CreditRiskModel
+FROM TestApplications;
+```
+
+### Step 4: Make Predictions
+```sql
+SELECT
+    customer_id,
+    credit_amount,
+    PREDICT(CreditRiskModel) as risk_prediction,
+    PREDICT(CreditRiskModel PROBABILITY) as risk_probability
+FROM NewApplications;
+```
+
+## Demo Showcase
+
+This repository demonstrates four real-world use cases:
+
+### 1. Credit Risk Assessment
+**Problem**: Banks need to assess credit risk while keeping sensitive financial data secure.
+
+**Solution**: Custom feature engineering (debt-to-income ratios, stability scores) executed in-database.
+
+**Key Features**:
+- Domain-specific financial calculations
+- Risk scoring with explanations
+- Compliance-friendly in-database processing
+
+### 2. Real-time Fraud Detection
+**Problem**: Payment processors need sub-100ms fraud detection without data movement latency.
+
+**Solution**: Ensemble model combining neural networks, rules, and behavioral analysis.
+
+**Key Features**:
+- Multiple model voting strategies
+- Real-time feature calculation
+- Configurable confidence thresholds
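A minimal sketch of weighted voting with a configurable threshold, as one way the ensemble described above could combine its sub-models. The function name `weighted_vote`, the weights, and the example scores are all hypothetical. The PR's `EnsembleFraudDetector` may combine scores differently.

```python
def weighted_vote(scores, weights, threshold=0.5):
    """Combine per-model fraud scores in [0, 1] into one score and a decision.

    scores  -- mapping of sub-model name to its fraud score
    weights -- mapping of sub-model name to its voting weight
    """
    total_weight = sum(weights[name] for name in scores)
    combined = sum(scores[name] * weights[name] for name in scores) / total_weight
    return combined, combined >= threshold


# Illustrative weights for the three sub-model families named in the text
weights = {"neural": 0.5, "rules": 0.3, "behavioral": 0.2}
scores = {"neural": 0.9, "rules": 0.4, "behavioral": 0.7}

combined, is_fraud = weighted_vote(scores, weights, threshold=0.6)
```

Dividing by the sum of the weights actually used keeps the combined score in [0, 1] even when a sub-model is missing from `scores`.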
+
+### 3. Sales Forecasting
+**Problem**: Retailers need accurate forecasts combining time series and ML approaches.
+
+**Solution**: Hybrid model integrating Prophet (trend modeling) with LightGBM (pattern learning).
+
+**Key Features**:
+- Third-party library integration
+- Confidence interval generation
+- Seasonal decomposition
+
+### 4. DNA Sequence Similarity
+**Problem**: Genomics researchers need specialized sequence analysis algorithms.
+
+**Solution**: Custom similarity metrics (Levenshtein distance, k-mer analysis) in SQL.
+
+**Key Features**:
+- Bioinformatics algorithms
+- Specialized distance calculations
+- Optimized sequence processing
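The two metrics named in the solution can be sketched in plain Python. Levenshtein distance is the standard dynamic-programming edit distance. For k-mer analysis, comparing k-mer sets with Jaccard similarity is an assumption made here for illustration, and the PR's `DNASimilarityAnalyzer` may use a different measure. Both function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def kmer_jaccard(a, b, k=3):
    """Jaccard similarity between the sets of length-k substrings of a and b."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = kmers_a | kmers_b
    if not union:
        return 1.0  # both sequences shorter than k: treat as identical
    return len(kmers_a & kmers_b) / len(union)


distance = levenshtein("GATTACA", "GATTTCA")
similarity = kmer_jaccard("ACGTA", "ACGTT", k=3)
```

Edit distance is sensitive to position, while k-mer overlap ignores ordering, so the two measures complement each other on rearranged sequences.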
+
+## Technical Architecture
+
+### Model Lifecycle
+1. **Development**: Create Python model inheriting from `IntegratedMLBaseModel`
+2. **Registration**: Model path specified in SQL `USING` clause
+3. **Training**: SQL `CREATE MODEL` triggers the Python `fit()` method
+4. **Validation**: SQL `VALIDATE MODEL` evaluates performance metrics
+5. **Persistence**: Trained model stored in IRIS
+6. **Inference**: SQL `PREDICT()` calls the Python `predict()` method
+
+### Integration Points
+- **Parameter Passing**: SQL `WITH` clause → Python `__init__` parameters
+- **Data Transfer**: IRIS tables → Pandas DataFrames
+- **Model Storage**: Python pickle → IRIS model repository
+- **Error Handling**: Python exceptions → SQL error messages
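A minimal sketch of the parameter-passing and error-handling points above, assuming the `WITH` clause reaches Python as a JSON object (CLAUDE.md states that parameters are passed via JSON from SQL). `ThresholdClassifier` and `build_model` are hypothetical names. The parameter names mirror the `WITH` clause in Step 2.

```python
import json


class ThresholdClassifier:
    """Hypothetical model whose constructor accepts the WITH-clause options."""

    def __init__(self, enable_debt_ratio=True, decision_threshold=0.5):
        self.enable_debt_ratio = enable_debt_ratio
        self.decision_threshold = decision_threshold

    def predict(self, probabilities):
        # Label 1 when the probability reaches the configured threshold
        return [int(p >= self.decision_threshold) for p in probabilities]


def build_model(model_cls, params_json):
    """Turn a JSON parameter string into constructor keyword arguments."""
    params = json.loads(params_json)
    try:
        return model_cls(**params)
    except TypeError as exc:
        # Re-raise with a message that a SQL layer could surface to the caller
        raise ValueError(f"invalid model parameter: {exc}") from exc


model = build_model(
    ThresholdClassifier,
    '{"enable_debt_ratio": true, "decision_threshold": 0.7}',
)
labels = model.predict([0.6, 0.9])
```

Validating parameters at construction time means a misspelled option fails when the model is created rather than later during prediction.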
+
+## Benefits for Different Personas
+
+### For Data Scientists
+- Use familiar Python ML libraries
+- No infrastructure management
+- Focus on model development, not deployment
+
+### For SQL Developers
+- Access ML capabilities through SQL
+- No Python knowledge required for predictions
+- Consistent SQL-based workflows
+
+### For IT/Operations
+- Reduced infrastructure complexity
+- Unified security model
+- Simplified model governance
+
+### For Business Users
+- Faster time-to-insight
+- Real-time predictions on live data
+- Lower total cost of ownership
+
+## Getting Started
+
+### Quick Demo (5 minutes)
+```bash
+# Clone and setup
+git clone <repo>
+cd integratedml-flexible-model-integration
+make setup
+
+# Run a demo
+make demo-credit
+```
+
+### Try Your Own Model (15 minutes)
+1. Create a model class inheriting from `IntegratedMLBaseModel`
+2. Implement `fit()` and `predict()` methods
+3. Use SQL to train: `CREATE MODEL ... USING "your.model.path"`
+4. Validate performance: `VALIDATE MODEL YourModel FROM TestData`
+5. Make predictions: `SELECT PREDICT(YourModel) FROM YourTable`
+
+## Success Metrics
+
+- **Performance**: In-database execution eliminates data movement latency
+- **Flexibility**: Support for any Python ML approach
+- **Adoption**: Simple SQL interface for complex ML
+- **Governance**: Centralized model management
+
+## Roadmap
+
+### Current Release
+- Scikit-learn compatible models
+- Basic parameter passing
+- Model persistence
+- SQL prediction functions
+
+### Future Enhancements
+- Model versioning and A/B testing
+- Automated retraining triggers
+- Model explainability APIs
+- Distributed training support
+
+## Call to Action
+
+1. **Explore the Demos**: See how custom models solve real problems
+2. **Build Your Own**: Create models for your specific use cases
+3. **Share Feedback**: Help shape the future of IntegratedML
+4. **Join the Community**: Share your models and learn from others
+
+---
+
+*IntegratedML Custom Models Feature - Where SQL Meets Machine Learning*