thinkall/featcopilot

FeatCopilot 🚀

Next-Generation LLM-Powered Auto Feature Engineering Framework

FeatCopilot automatically generates, selects, and explains predictive features using semantic understanding. It analyzes column meanings, applies domain-aware transformations, and provides human-readable explanations, turning raw data into ML-ready features in seconds.

📊 Benchmark Highlights

Simple Models Benchmark (63 Datasets)

| Configuration | Improved | Avg Improvement | Best Improvement |
|---|---|---|---|
| Tabular Engine | 31 (49%) | +7.52% | +144% (triple_interaction) |

Models: RandomForest (n_estimators=200, max_depth=20), LogisticRegression/Ridge
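
The README does not state how the "Improved" and "Avg Improvement" columns are computed. One plausible reading, shown here purely as an assumption, is the relative change of the model score with engineered features over the score on the raw features. The function names and the scores below are hypothetical and are not the benchmark's data.

```python
from statistics import mean


def relative_improvement(baseline: float, engineered: float) -> float:
    """Percent change of the engineered score relative to the baseline score."""
    if baseline == 0:
        raise ValueError("baseline score must be non-zero")
    return (engineered - baseline) / abs(baseline) * 100.0


def summarize(scores: dict) -> dict:
    """scores maps dataset name -> (baseline_score, engineered_score)."""
    gains = {name: relative_improvement(b, e) for name, (b, e) in scores.items()}
    improved = sum(1 for g in gains.values() if g > 0)
    best = max(gains, key=gains.get)
    return {
        "improved": improved,
        "improved_pct": improved / len(gains) * 100.0,
        "avg_improvement": mean(list(gains.values())),
        "best": (best, gains[best]),
    }


# Made-up scores for illustration only.
example = {
    "dataset_a": (0.80, 0.84),
    "dataset_b": (0.50, 0.75),
    "dataset_c": (0.90, 0.90),
    "dataset_d": (0.60, 0.57),
}
summary = summarize(example)
print(summary)
```

Under this definition a dataset counts as "improved" only when the gain is strictly positive, so ties do not inflate the improvement rate.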

AutoML Benchmark (FLAML + AutoGluon, 120s budget)

| Framework | Datasets | Improved | Avg Improvement |
|---|---|---|---|
| FLAML | 10 | 9 (90%) | +1.85% |
| AutoGluon | 10 | 9 (90%) | +1.55% |

FE Tools Comparison (FeatCopilot vs autofeat vs featuretools)

| Metric | FeatCopilot | autofeat | featuretools |
|---|---|---|---|
| Win Rate | 80% 🏆 | 40% | 0% |
| Avg Improvement | +1.89% 🏆 | +1.46% | -2.71% |
| Coverage | 100% 🏆 | 50% | 100% |
| Composite Score | 0.606 🥇 | 0.351 🥉 | 0.397 🥈 |

Key Results

  • 🔥 +144% improvement on triple_interaction_regression (tabular only)
  • 📈 +104% on xor_regression, +70% on pairwise_product_regression
  • 🏆 #1 FE tool: highest win rate (80%) and composite score (0.606) against autofeat and featuretools across 10 datasets
  • 🚀 90% AutoML improvement rate across FLAML and AutoGluon

View Full Benchmark Results

Key Features

  • 🔧 Multi-Engine Architecture: Tabular, time series, relational, and text feature engines
  • 🤖 LLM-Powered Intelligence: Semantic feature discovery, domain-aware generation, and code synthesis
  • 📊 Intelligent Selection: Statistical testing, importance ranking, and redundancy elimination
  • 🔌 Scikit-learn Compatible: Drop-in replacement for sklearn transformers
  • 📝 Interpretable: Every feature comes with human-readable explanations

Installation

# Basic installation
pip install featcopilot

# With LLM capabilities (quotes keep shells such as zsh from expanding the brackets)
pip install "featcopilot[llm]"

# Full installation
pip install "featcopilot[full]"

Quick Start

Fast Mode (Tabular Only)

from featcopilot import AutoFeatureEngineer

# X: input features (e.g. a pandas DataFrame); y: target values. Load your own data first.

# Sub-second feature engineering
engineer = AutoFeatureEngineer(
    engines=['tabular'],
    max_features=50
)

X_transformed = engineer.fit_transform(X, y)  # <1 second
print(f"Features: {X.shape[1]} -> {X_transformed.shape[1]}")

LLM Mode (With LiteLLM)

from featcopilot import AutoFeatureEngineer

# LLM-powered semantic features
engineer = AutoFeatureEngineer(
    engines=['tabular', 'llm'],
    max_features=50
)

X_transformed = engineer.fit_transform(
    X, y,
    column_descriptions={
        'age': 'Customer age in years',
        'income': 'Annual household income in USD',
        'tenure': 'Months as customer',
    },
    task_description="Predict customer churn"
)  # 30-60 seconds

# Get LLM-generated explanations
for feature, explanation in engineer.explain_features().items():
    print(f"{feature}: {explanation}")

Engines

Tabular Engine

Generates polynomial features, interaction terms, and mathematical transformations.

from featcopilot.engines import TabularEngine

engine = TabularEngine(
    polynomial_degree=2,
    interaction_only=False,
    include_transforms=['log', 'sqrt', 'square']
)
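
To make the three kinds of output concrete, here is a minimal standard-library sketch of degree-2 interaction terms plus per-column transforms. It is not FeatCopilot's implementation: `expand_features`, the output naming scheme, and the use of `log1p` for the `log` transform are all assumptions made for illustration.

```python
import math
from itertools import combinations


def expand_features(columns, transforms=("log", "sqrt", "square"), interaction_only=False):
    """columns: dict mapping column name -> list of floats (equal lengths)."""
    out = {name: list(vals) for name, vals in columns.items()}

    # Pairwise interaction terms: a * b for every pair of distinct columns.
    for a, b in combinations(columns, 2):
        out[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]

    if interaction_only:
        return out

    funcs = {
        "log": math.log1p,  # assumption: log(1 + x) so that zeros are tolerated
        "sqrt": math.sqrt,
        "square": lambda v: v * v,  # the pure degree-2 power of a single column
    }
    for name, vals in columns.items():
        for t in transforms:
            if t in ("log", "sqrt") and min(vals) < 0:
                continue  # skip transforms that are undefined for this column
            out[f"{t}({name})"] = [funcs[t](v) for v in vals]
    return out


cols = {"age": [20.0, 40.0], "income": [1000.0, 3000.0]}
features = expand_features(cols)
print(sorted(features))
```

With two input columns this yields the two originals, one interaction term, and three transforms per column. Candidate counts grow quadratically with the number of columns, which is why a cap such as `max_features` matters.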

Time Series Engine

Extracts statistical, frequency, and temporal features from time series data.

from featcopilot.engines import TimeSeriesEngine

engine = TimeSeriesEngine(
    features=['mean', 'std', 'skew', 'autocorr', 'fft_coefficients']
)
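
As a rough illustration of what the feature names above could stand for, the sketch below computes them for a single series using only the standard library. The definitions (population standard deviation, lag-1 autocorrelation, magnitudes of the first three DFT coefficients) and every function name are assumptions; the engine's actual definitions may differ.

```python
import cmath
import math
from statistics import fmean, pstdev


def autocorr(series, lag=1):
    """Lag-k autocorrelation; returns 0.0 for constant or too-short series."""
    n = len(series)
    mu = fmean(series)
    var = sum((x - mu) ** 2 for x in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[i] - mu) * (series[i + lag] - mu) for i in range(n - lag))
    return cov / var


def skewness(series):
    """Population skewness: mean of cubed standardized values."""
    mu = fmean(series)
    sd = pstdev(series)
    if sd == 0:
        return 0.0
    return fmean([((x - mu) / sd) ** 3 for x in series])


def dft_magnitudes(series, k=3):
    """Magnitudes of the first k DFT coefficients (naive O(n*k) transform)."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * f * t / n) for t, x in enumerate(series)))
        for f in range(k)
    ]


def extract_features(series):
    feats = {
        "mean": fmean(series),
        "std": pstdev(series),
        "skew": skewness(series),
        "autocorr_lag1": autocorr(series, 1),
    }
    for f, mag in enumerate(dft_magnitudes(series)):
        feats[f"fft_abs_{f}"] = mag
    return feats


series = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 2.0]
features = extract_features(series)
print(features)
```

The naive DFT is adequate for short series; for long ones an FFT routine such as `numpy.fft.rfft` would be the usual choice.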

LLM Engine

Uses the GitHub Copilot SDK (default) or LiteLLM (100+ providers) to generate features with an LLM.

from featcopilot.llm import SemanticEngine

# Default: GitHub Copilot SDK
engine = SemanticEngine(
    model='gpt-5.2',
    max_suggestions=20,
    validate_features=True
)

# Alternative: LiteLLM backend
engine = SemanticEngine(
    model='gpt-4o',
    backend='litellm',
    max_suggestions=20
)
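
The README does not describe what `validate_features=True` checks. One way LLM-suggested feature expressions could be screened before they are evaluated is a syntax-tree allowlist, sketched below with Python's `ast` module. The function name, the allowlist, and the idea that suggestions arrive as arithmetic expression strings are all assumptions, not FeatCopilot's documented behavior.

```python
import ast

# Only plain arithmetic over known columns and numeric constants is accepted.
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub,
)


def validate_expression(expr: str, known_columns: set) -> bool:
    """Return True if expr is pure arithmetic over known column names."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False  # e.g. function calls, attribute access, subscripts
        if isinstance(node, ast.Name) and node.id not in known_columns:
            return False  # reference to a column that does not exist
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            return False  # strings, bytes, None, and similar constants
    return True


columns = {"age", "income", "tenure"}
print(validate_expression("income / (age + 1)", columns))
print(validate_expression("__import__('os').system('ls')", columns))
```

An allowlist rejects anything not explicitly permitted, which is safer than trying to blocklist dangerous constructs in generated code.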

Feature Selection

from featcopilot.selection import FeatureSelector

selector = FeatureSelector(
    methods=['mutual_info', 'importance', 'correlation'],
    max_features=30,
    correlation_threshold=0.95
)

X_selected = selector.fit_transform(X, y)
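
The `correlation_threshold` and `max_features` parameters suggest a rank-then-prune strategy. The sketch below shows one such strategy under stated assumptions: features are ranked by absolute Pearson correlation with the target (a stand-in for the `mutual_info` and `importance` scores, which are not reproduced here), and a feature is dropped when it correlates above the threshold with one already kept. `select_features` and `pearson` are hypothetical helpers, not FeatCopilot functions.

```python
import math


def pearson(a, b):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def select_features(columns, target, max_features=30, correlation_threshold=0.95):
    # Rank candidates by relevance to the target, most relevant first.
    ranked = sorted(
        columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True
    )
    kept = []
    for name in ranked:
        if len(kept) >= max_features:
            break
        # Drop a candidate that is nearly collinear with a feature already kept.
        redundant = any(
            abs(pearson(columns[name], columns[k])) > correlation_threshold for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


columns = {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_scaled": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with x
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
    "constant": [7.0] * 5,
}
target = [1.1, 1.9, 3.2, 3.9, 5.1]
selected = select_features(columns, target)
print(selected)
```

Only one of `x` and `x_scaled` survives, since they carry the same information. Ranking before pruning ensures that the more relevant member of a redundant pair is the one retained.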

Comparison with Existing Libraries

| Feature | FeatCopilot | Featuretools | TSFresh | AutoFeat | OpenFE | CAAFE |
|---|---|---|---|---|---|---|
| Tabular Features | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ |
| Time Series | ✅ | ❌ | ✅ | ❌ | ❌ | ❌ |
| Relational | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| LLM-Powered | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Semantic Understanding | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Code Generation | ✅ | ❌ | ❌ | ❌ | ❌ | ⚠️ |
| Sklearn Compatible | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Interpretable | ✅ | ⚠️ | ⚠️ | ⚠️ | ❌ | ✅ |

Documentation

📖 Full Documentation: https://thinkall.github.io/featcopilot/

Requirements

  • Python 3.9+
  • NumPy, Pandas, Scikit-learn
  • GitHub Copilot SDK (default) or LiteLLM (for 100+ LLM providers)

License

MIT License