# Feature Forge
Modular experimentation platform for LLM-based multi-agent automated feature engineering.
Feature Forge is a production-ready refactoring of the MALMAS (Memory-Augmented LLM-based Multi-Agent System) research codebase into a modular, experiment-first Python package. It treats every method as a first-class, independently runnable, composable experiment unit.
## Key Features
- 6 Specialized Agents: Unary, Cross-Compositional, Aggregation, Temporal, Local Transform, Local Pattern
- 3-Tier Memory: Procedural, Feedback, and Conceptual memory with LLM summarization
- Dynamic Router: Data-driven, performance-driven, hybrid, and LLM-based agent selection
- Enforced LLM Caching: DiskCache with SHA-256 keys prevents accidental API costs
- Sandboxed Execution: AST-validated code execution for LLM-generated features
- Experiment Matrix: Cartesian product of datasets × methods × seeds × models × rounds
- Baselines: OpenFE [2], CAAFE [3], LLM-FE [4], MALMAS (structured JSON)
- Observability: structlog + Langfuse + OpenTelemetry
- Tracking: WandB (default) + MLflow (optional)
- Sklearn Compatible: `MALMASFeatureEngineer` inherits `BaseEstimator` + `TransformerMixin`
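The enforced caching above can be sketched as follows. This is a minimal in-memory illustration, assuming the cache is keyed on a SHA-256 digest of the model name, prompt, and sampling parameters; the actual `DiskCache` wiring in Feature Forge may differ:

```python
import hashlib
import json


def cache_key(model: str, prompt: str, **params) -> str:
    """Deterministic SHA-256 key over the full request payload.

    Sorting keys makes the digest independent of argument order,
    so identical requests always map to the same cache entry.
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Tiny in-memory stand-in for the on-disk cache
_cache: dict[str, str] = {}


def cached_completion(model: str, prompt: str, call_llm, **params) -> str:
    key = cache_key(model, prompt, **params)
    if key not in _cache:  # only pay for uncached requests
        _cache[key] = call_llm(model, prompt, **params)
    return _cache[key]
```

Because the key covers every request field, any change to the prompt, model, or parameters produces a cache miss, while exact repeats never trigger a second API call.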
## Installation

```bash
# Clone the repository
git clone https://github.com/minghao51/feature_forge.git
cd feature_forge

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e ".[base,docs,opinion]"
```
## Quick Start

### Sklearn API
```python
from feature_forge.api import MALMASFeatureEngineer

fe = MALMASFeatureEngineer()
fe.fit(X_train, y_train)
X_test_enhanced = fe.transform(X_test)

# Use in a sklearn Pipeline
from sklearn.pipeline import Pipeline
from xgboost import XGBClassifier

pipeline = Pipeline([
    ("fe", MALMASFeatureEngineer()),
    ("clf", XGBClassifier()),
])
pipeline.fit(X_train, y_train)
```
### Experiment Matrix
```python
from feature_forge.experiment import ExperimentMatrix, ExperimentRunner, Reporter

matrix = (
    ExperimentMatrix()
    .datasets(["titanic", "house-prices"])
    .methods({"malmas": ["full"], "openfe": ["openfe"]})
    .seeds([0, 1, 2])
    .models(["xgboost", "lightgbm"])
    .rounds([1, 2, 4])
)

runner = ExperimentRunner()
# run_experiment is your callable that executes a single configuration
results = runner.run(matrix.generate(), run_experiment)

reporter = Reporter(results)
print(reporter.to_markdown())
```
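The matrix expands to the cartesian product of all axes, so the example configuration yields 2 datasets × 2 methods × 3 seeds × 2 models × 3 rounds = 72 runs. A minimal sketch of that expansion (the dict field names here are illustrative; the real `generate()` output may be structured differently):

```python
from itertools import product

datasets = ["titanic", "house-prices"]
methods = ["malmas", "openfe"]
seeds = [0, 1, 2]
models = ["xgboost", "lightgbm"]
rounds = [1, 2, 4]

# Every combination of the five axes becomes one experiment run
runs = [
    {"dataset": d, "method": m, "seed": s, "model": mo, "rounds": r}
    for d, m, s, mo, r in product(datasets, methods, seeds, models, rounds)
]
print(len(runs))  # 72 experiment configurations
```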
### Custom Agent
```python
from feature_forge.agents import BaseFeatureAgent

class DomainAgent(BaseFeatureAgent):
    prompt_filename = "domain.txt"
    agent_name = "domain"
```
Register in your pyproject.toml:
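A common registration mechanism is a Python entry point. The group name `feature_forge.agents` below is an assumption for illustration; check the agent registry documentation for the exact group your version expects:

```toml
# Hypothetical entry-point group; adjust to match the agent registry
[project.entry-points."feature_forge.agents"]
domain = "my_package.agents:DomainAgent"
```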
## Configuration
Configuration priority (highest to lowest):
1. Constructor arguments
2. Environment variables (FF_* prefix)
3. .env file (dotenvx encrypted)
4. YAML files (config/settings.yaml)
```bash
export FF_TASK=classification
export FF_LLM__MODEL=deepseek-chat
export FF_LLM__API_KEY=sk-...
export FF_TRACKER__PROJECT=my-project
```
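The double underscore in variables such as `FF_LLM__MODEL` indicates a nested-key convention (`llm.model`). A minimal sketch of that parsing, assuming an `FF_` prefix and `__` as the nesting delimiter (the package's actual loader may rely on a settings library instead):

```python
def load_ff_settings(environ: dict[str, str]) -> dict:
    """Collect FF_* variables into a nested settings dict.

    FF_LLM__MODEL=deepseek-chat  ->  {"llm": {"model": "deepseek-chat"}}
    """
    settings: dict = {}
    for name, value in environ.items():
        if not name.startswith("FF_"):
            continue  # ignore unrelated environment variables
        path = name[3:].lower().split("__")  # strip prefix, split nesting
        node = settings
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return settings
```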
## Architecture
```
Experiment Layer    →  ExperimentMatrix, ExperimentRunner, Tracker, Reporter
Pipeline Layer      →  MALMASFeatureEngineer, CorePipeline, IterativePipeline
Agent Layer         →  6 Agents + Router + Registry
Memory Layer        →  Procedural, Feedback, Conceptual
LLM Layer           →  LLMClient, DiskCache, LangfuseWrapper
Evaluation Layer    →  Metrics, CV, ModelFactory, Sandbox
Data Layer          →  KaggleFetcher, DatasetRegistry
Observability Layer →  structlog, Langfuse, OpenTelemetry
```
## Development

```bash
# Run all tests
uv run pytest

# With coverage report
uv run pytest --cov=feature_forge --cov-report=html

# Linting
uv run ruff check src tests

# Type checking
uv run mypy src

# Pre-commit hooks
pre-commit install
pre-commit run --all-files
```
## Documentation
- Methods & References — Full documentation of all methods, pipelines, and academic sources
- Implementation Plan
- API Reference
- Migration Guide
- Quick Start
- MALMAS Technical Roadmap
## References
[1] MALMAS — "Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data", MINE-USTC. arXiv:2604.20261, ACL ARR 2026. GitHub
[2] OpenFE — "OpenFE: Automated Feature Generation with Expert-level Performance", Zhang et al., ICML 2023. arXiv:2211.12507. GitHub
[3] CAAFE — "LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering", Hollmann, Müller, Hutter, NeurIPS 2023. arXiv:2305.03403. GitHub
[4] LLM-FE — "LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers", Abhyankar, Shojaee, Reddy, arXiv:2503.14434, 2025. GitHub
## License
MIT