Centralized feature store built on DVC for ML feature versioning, validation, and sharing.
- Version Control - Track feature changes with DVC
- Schema Validation - Validate features against Pydantic schemas
- Multi-Backend Storage - S3, GCS, Azure, or local storage
- Feature Registry - Central catalog for feature discovery
- CI/CD Integration - Automated validation in GitHub Actions
- MLflow Integration - Automatic feature lineage tracking
- Kubeflow Pipelines - Production ML pipeline components
- CLI Interface - Easy-to-use command line tools
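
The version-control feature rests on the way DVC identifies a tracked file: by a hash of its contents (MD5 by default), so an edited feature file gets a new version identifier. The following is a minimal sketch of that idea using only the Python standard library. The helpers `file_md5` and `has_changed` are hypothetical names used for illustration; they are not part of this project or of DVC's API.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        # Read fixed-size chunks so large feature files never load fully into memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path: Path, recorded_md5: str) -> bool:
    """Return True if the file's current contents differ from a recorded hash."""
    return file_md5(path) != recorded_md5
```

Comparing a stored hash with a freshly computed one is enough to tell whether a feature file needs a new version; the actual bookkeeping in this project is presumably delegated to DVC itself.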

```bash
# Basic installation
pip install -e .

# With MLflow support
pip install -e ".[all]"

# With Kubeflow support
pip install -e ".[kubeflow]"
```

```bash
# Initialize feature store
feature-store init --remote-url s3://my-bucket/features

# Add a feature
feature-store feature add customer demographics data.parquet --schema schema.yaml

# List features
feature-store feature list

# Validate
feature-store feature validate customer/demographics

# Push to remote
feature-store push
```

```python
from pathlib import Path

from src.feature_store import FeatureStoreMLflow

fs = FeatureStoreMLflow(
    registry_path=Path("features/registry.yaml"),
    experiment_name="my-experiment",
)

with fs.start_run(run_name="training-v1"):
    # Build the training set by joining the listed features on customer_id
    df = fs.create_training_dataset(
        feature_names=["customer/demographics", "transaction/aggregates"],
        join_keys=["customer_id"],
    )

    # train_model is a placeholder for your own training function
    model = train_model(df)

    # Log the model together with the features it was trained on
    fs.log_model_with_features(
        model=model,
        artifact_path="model",
        feature_names=["customer/demographics", "transaction/aggregates"],
        registered_model_name="my-model",
    )
```

```python
from kfp import compiler

from src.feature_store.pipelines import create_training_pipeline

# Compile the training pipeline to a YAML package
compiler.Compiler().compile(
    pipeline_func=create_training_pipeline,
    package_path="training_pipeline.yaml",
)
```

| Document | Description |
|---|---|
| User Guide | General usage guide |
| MLflow Integration | MLflow setup and usage |
| Kubeflow Integration | Kubeflow pipeline guide |
| Contributing | Development setup |
| Architecture | Design decisions |

```text
dvc-feature-store/
├── src/feature_store/           # Core library
│   ├── models.py                # Pydantic models
│   ├── registry.py              # Feature registry
│   ├── validator.py             # Schema validation
│   ├── versioning.py            # DVC operations
│   ├── storage.py               # Remote storage
│   ├── mlflow_integration.py    # MLflow integration
│   └── pipelines/               # Kubeflow components
├── features/                    # Feature storage
├── examples/                    # Usage examples
│   ├── ml_training/             # MLflow examples
│   └── kubeflow/                # Pipeline examples
├── tests/                       # Test suite
└── docs/                        # Documentation
```

License: MIT