Next-Generation LLM-Powered Auto Feature Engineering Framework
FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations, turning raw data into ML-ready features in seconds.
| Configuration | Improved | Avg Improvement | Best Improvement |
|---|---|---|---|
| Tabular Engine | 31 (49%) | +7.52% | +144% (triple_interaction) |
Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge
| Framework | Datasets | Improved | Avg Improvement |
|---|---|---|---|
| FLAML | 10 | 9 (90%) | +1.85% |
| AutoGluon | 10 | 9 (90%) | +1.55% |
| Metric | FeatCopilot | autofeat | featuretools |
|---|---|---|---|
| Win Rate | 80% | 40% | 0% |
| Avg Improvement | +1.89% | +1.46% | -2.71% |
| Coverage | 100% | 50% | 100% |
| Composite Score | 0.606 | 0.351 | 0.397 |
- +144% improvement on triple_interaction_regression (tabular only)
- +104% on xor_regression, +70% on pairwise_product_regression
- #1 FE tool: beats autofeat and featuretools across 10 datasets
- 90% AutoML improvement rate across FLAML and AutoGluon
- Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
- LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
- Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
- Scikit-learn Compatible: Drop-in replacement for sklearn transformers
- Interpretable: Every feature comes with human-readable explanations
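
To make the "redundancy elimination" idea concrete, here is a minimal, dependency-free sketch of a correlation filter. It is not FeatCopilot's implementation: the helper names `pearson` and `drop_redundant`, the toy columns, and the greedy keep-first strategy are all assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.95):
    """Greedy filter (hypothetical helper): keep a column only if its absolute
    correlation with every already-kept column is at most `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy data, invented for this example.
columns = {
    "income": [30.0, 45.0, 60.0, 80.0, 120.0],
    "income_x2": [60.0, 90.0, 120.0, 160.0, 240.0],  # exact multiple of income
    "tenure": [12.0, 3.0, 40.0, 7.0, 25.0],
}
print(drop_redundant(columns))  # ['income', 'tenure']
```

The greedy order means the first column in a correlated group survives; a real selector would more likely rank columns by importance before filtering.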
# Basic installation
pip install featcopilot
# With LLM capabilities
pip install featcopilot[llm]
# Full installation
pip install featcopilot[full]

from featcopilot import AutoFeatureEngineer
# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)
X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")

from featcopilot import AutoFeatureEngineer
# LLM-powered semantic features
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)
X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds
# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")

The tabular engine generates polynomial features, interaction terms, and mathematical transformations.
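
As a rough illustration of what polynomial features, interaction terms, and mathematical transformations look like, here is a dependency-free sketch that expands a single row. It does not reflect how FeatCopilot builds features internally: the function `expand_tabular_features`, the feature naming scheme, and the use of `log1p` (so that zero values stay defined) are assumptions made for this example.

```python
import math
from itertools import combinations


def expand_tabular_features(row):
    """Expand one row (column name -> number) with degree-2 terms and simple transforms.

    Hypothetical helper for illustration only.
    """
    features = dict(row)
    # Squares of each column: the pure degree-2 polynomial terms.
    for name, value in row.items():
        features[f"{name}^2"] = value ** 2
    # Pairwise products: the interaction terms.
    for a, b in combinations(row, 2):
        features[f"{a}*{b}"] = row[a] * row[b]
    # Monotone transforms, applied only where they are defined.
    for name, value in row.items():
        if value >= 0:
            features[f"sqrt({name})"] = math.sqrt(value)
            features[f"log1p({name})"] = math.log1p(value)
    return features


# Toy row, invented for this example.
row = {"age": 40.0, "income": 50000.0, "tenure": 24.0}
expanded = expand_tabular_features(row)
print(f"Features: {len(row)} -> {len(expanded)}")  # Features: 3 -> 15
```

Even three input columns grow to fifteen candidates here, which is why generation of this kind is usually paired with a selection step.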
from featcopilot.engines import TabularEngine
engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)

The time series engine extracts statistical, frequency, and temporal features from time series data.
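
For readers unfamiliar with these statistics, the sketch below computes a few of them (mean, standard deviation, skewness, and lagged autocorrelation) for a single series using only the standard library. It is an illustration under stated assumptions, not the engine's code: `summarize_series` and its output keys are hypothetical, population (not sample) moments are used, and frequency-domain features such as FFT coefficients are left out.

```python
import statistics


def summarize_series(values, lag=1):
    """Return simple summary features for one series (hypothetical helper)."""
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / n
    # Skewness: third standardized moment (0.0 for a constant series).
    skew = (sum(c ** 3 for c in centered) / n) / std ** 3 if std > 0 else 0.0
    # Autocorrelation: covariance with the lag-shifted series, divided by the variance.
    if variance > 0 and lag < n:
        shifted = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        autocorr = shifted / (n * variance)
    else:
        autocorr = 0.0
    return {"mean": mean, "std": std, "skew": skew, f"autocorr_lag{lag}": autocorr}


# Toy series, invented for this example.
summary = summarize_series([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Each series collapses to one row of such statistics, which is how a variable-length signal becomes fixed-width input for a tabular model.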
from featcopilot.engines import TimeSeriesEngine
engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)

The semantic (LLM) engine uses the GitHub Copilot SDK (default) or LiteLLM (100+ providers) for intelligent feature generation.
from featcopilot.llm import SemanticEngine
# Default: GitHub Copilot SDK
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True
)
# Alternative: LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20
)

from featcopilot.selection import FeatureSelector
selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)
X_selected = selector.fit_transform(X, y)

| Feature | FeatCopilot | Featuretools | TSFresh | AutoFeat | OpenFE | CAAFE |
|---|---|---|---|---|---|---|
| Tabular Features | β | β | β | β | β | β |
| Time Series | β | β | β | β | β | β |
| Relational | β | β | β | β | β | β |
| LLM-Powered | β | β | β | β | β | β |
| Semantic Understanding | β | β | β | β | β | |
| Code Generation | β | β | β | β | β | |
| Sklearn Compatible | β | β | β | β | β | β |
| Interpretable | β | β | β |
Full Documentation: https://thinkall.github.io/featcopilot/
- Python 3.9+
- NumPy, Pandas, Scikit-learn
- GitHub Copilot SDK (default) or LiteLLM (for 100+ LLM providers)
MIT License
