
Static Embeddings

Related docs: index.md | quickstart.md | models.md | architecture.md

Overview

Static embeddings replace on-the-fly neural encoding with pre-computed lookup tables, which makes matching roughly 10-100x faster (17-39x in the benchmarks below) at a small cost in accuracy.

How It Works

Dynamic Embeddings (Traditional)

Input Text → Neural Network → Embedding Vector
   └──────────── ~50-500ms ────────────┘

Static Embeddings

Input Text → Token Lookup → Pre-computed Vector
   └────────── ~1-5ms ──────────┘

Static embeddings skip the neural network forward pass by storing pre-computed vectors for each token in the vocabulary.
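
The lookup step can be illustrated with a toy example. This is a minimal sketch, not the library's implementation: the vocabulary, the 3-d vectors, the whitespace tokenizer, the zero vector for unknown tokens, and the use of mean pooling to combine token vectors are all assumptions made for illustration.

```python
# Toy illustration only: the vocabulary and vectors below are made up.
TOKEN_VECTORS = {
    "fast": [1.0, 0.0, 0.0],
    "entity": [0.0, 1.0, 0.0],
    "matching": [0.0, 0.0, 1.0],
}
UNKNOWN = [0.0, 0.0, 0.0]  # assumed handling for out-of-vocabulary tokens


def embed(text: str) -> list[float]:
    """Embed text by looking up each token's vector and averaging them."""
    tokens = text.lower().split()  # whitespace tokenizer, a simplification
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOKEN_VECTORS.get(tok, UNKNOWN) for tok in tokens]
    dim = len(UNKNOWN)
    # Mean pooling: average each dimension across all token vectors.
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


vector = embed("fast entity matching")
```

No model weights are evaluated at query time: the cost is one dictionary lookup per token plus an average, which is why latency drops from a forward pass to a few table reads.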

Supported Static Backends

1. model2vec (minishlab potion models)

Fast, lightweight models distilled from larger sentence transformers.

Models:
- potion-8m - 8M parameters, ultra-fast (default retrieval model)
- potion-32m - 32M parameters, better quality

Characteristics:
- Best for: English general-purpose retrieval
- Backend: StaticModel.from_pretrained()
- Dimension: 256-384 (configurable)

Usage:

from novelentitymatcher import Matcher

matcher = Matcher(entities=entities, model="potion-8m")
matcher.fit()
result = matcher.match("query")  # ~1-5ms per query

2. StaticEmbedding (RikkaBotan MRL models)

Matryoshka Representation Learning (MRL) models with configurable dimensionality.

Models:
- mrl-en - English-only with MRL support
- mrl-multi - Multilingual with MRL support

Characteristics:
- Best for: Multilingual retrieval or dimension-constrained scenarios
- Backend: SentenceTransformer with StaticEmbedding module
- Dimension: Variable (truncate at runtime for efficiency)

Usage with MRL dimension reduction:

# Use full dimension
matcher = Matcher(entities=entities, model="mrl-en")

# Use reduced dimension (faster, less memory)
matcher = Matcher(
    entities=entities,
    model="mrl-en",
    embedding_dim=256  # Truncate to 256 dimensions
)

When to Use Static vs Dynamic

| Use Case | Recommended | Why |
| --- | --- | --- |
| High-throughput retrieval | Static (potion-8m) | Roughly 40x faster than minilm, sufficient accuracy |
| Multilingual retrieval | Static (mrl-multi) | Fast multilingual support |
| Training with few-shot data | Dynamic (mpnet) | SetFit requires a trainable backbone |
| Context-heavy queries | Dynamic (bge-base) | Better contextual understanding |
| Resource-constrained | Static (potion-8m) | Lower CPU/memory usage |

Performance Comparison

Benchmark results (queries per second):

| Model | Type | Throughput | Speedup vs minilm |
| --- | --- | --- | --- |
| potion-8m | Static | ~4000 QPS | 39x faster |
| potion-32m | Static | ~3500 QPS | 34x faster |
| mrl-en | Static | ~1800 QPS | 17x faster |
| minilm | Dynamic | ~100 QPS | baseline |

Results from benchmark.md - actual performance varies by hardware.
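
To check throughput on your own hardware, a small timing harness is enough. The sketch below is an assumption about how one might measure queries per second, not the procedure used in benchmark.md; `measure_qps` is a hypothetical helper, and `str.lower` stands in for a real match function so the example runs on its own.

```python
import time
from typing import Callable, Sequence


def measure_qps(
    match_fn: Callable[[str], object],
    queries: Sequence[str],
    repeats: int = 3,
) -> float:
    """Return the best observed queries-per-second over several timed passes."""
    for query in queries:  # warm-up pass so one-time setup cost is excluded
        match_fn(query)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for query in queries:
            match_fn(query)
        best = min(best, time.perf_counter() - start)
    return len(queries) / best if best > 0 else float("inf")


queries = ["acme corp", "globex", "initech"] * 100
qps = measure_qps(str.lower, queries)  # str.lower stands in for matcher.match
```

With a fitted matcher, passing `matcher.match` in place of `str.lower` gives a number comparable to the table above, subject to the same hardware caveat.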

Dimension Reduction with MRL

MRL (Matryoshka Representation Learning) models allow runtime dimensionality reduction:

# Full dimension (768d)
matcher_full = Matcher(entities=entities, model="mrl-en")

# Reduced dimension (256d) - faster, less memory
matcher_reduced = Matcher(
    entities=entities,
    model="mrl-en",
    embedding_dim=256
)

# Both work; 256d vectors need about 3x fewer operations per similarity
# computation than 768d vectors, usually with a small accuracy loss

Benefits:
- Faster similarity computation
- Lower memory footprint
- Tunable speed/accuracy tradeoff

How to choose dimension:
- Start with the model's default (usually best)
- Reduce if memory/speed constrained
- Test accuracy impact on your data
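
Conceptually, MRL dimension reduction keeps the leading components of each vector. The sketch below shows that idea on a made-up 4-d vector; `truncate_embedding` is a hypothetical helper, and re-normalizing after truncation is an assumption (common when vectors are compared by cosine similarity), not something this document states about the library.

```python
import math


def truncate_embedding(vector: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and the vector length")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    # Leave an all-zero prefix untouched to avoid dividing by zero.
    return [x / norm for x in head] if norm > 0 else head


full = [0.6, 0.0, 0.8, 0.0]  # made-up 4-d unit vector
reduced = truncate_embedding(full, 2)
```

Because MRL training concentrates the most useful information in the leading dimensions, the truncated prefix remains a usable embedding, which is what makes the speed/accuracy tradeoff tunable.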

Auto-Detection

The library automatically detects static embedding models:

# These all use static embeddings automatically
Matcher(entities=entities, model="potion-8m")                 # model2vec
Matcher(entities=entities, model="mrl-en")                    # StaticEmbedding
Matcher(entities=entities, model="minishlab/potion-base-8M")  # Full name

Detection logic:
1. Try model2vec.StaticModel.from_pretrained()
2. Fall back to SentenceTransformer with StaticEmbedding
3. Use dynamic SentenceTransformer if neither works
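
The three-step detection order above is a try-in-order fallback chain. The sketch below shows that pattern with stand-in loaders so it runs without downloading any model; `load_first_available` and the `fake_*` loaders are hypothetical names, and the real library's checks may differ.

```python
from typing import Callable, Sequence


def load_first_available(
    model_name: str,
    loaders: Sequence[tuple[str, Callable[[str], object]]],
) -> tuple[str, object]:
    """Try each (backend_name, loader) in order; return the first that loads."""
    errors = []
    for backend, loader in loaders:
        try:
            return backend, loader(model_name)
        except Exception as exc:  # a loader signals "not my model" by raising
            errors.append(f"{backend}: {exc}")
    raise RuntimeError(f"Could not load {model_name!r}: " + "; ".join(errors))


# Stand-in loaders (assumptions) that mimic each backend accepting or rejecting a name.
def fake_model2vec(name: str) -> object:
    if "potion" not in name:
        raise ValueError("not a model2vec model")
    return {"backend": "model2vec", "name": name}


def fake_static_embedding(name: str) -> object:
    if "mrl" not in name:
        raise ValueError("no StaticEmbedding module")
    return {"backend": "static_embedding", "name": name}


def fake_dynamic(name: str) -> object:
    return {"backend": "dynamic", "name": name}


LOADERS = [
    ("model2vec", fake_model2vec),
    ("static_embedding", fake_static_embedding),
    ("dynamic", fake_dynamic),
]

backend, _ = load_first_available("mrl-en", LOADERS)
```

Collecting the per-backend errors, rather than discarding them, keeps the final failure message useful when none of the loaders succeeds.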

Fallback for Training

Static models don't support SetFit training. When training is requested with a static model, the library falls back to mpnet:

# Static model requested for training
matcher = Matcher(entities=entities, model="potion-8m")
matcher.fit(training_data, mode="full")  # ⚠️ Fallback to mpnet

# Training happens with mpnet (trainable), not potion-8m

Why: Static embeddings lack the trainable parameters required for SetFit.

See: models.md for training-compatible model options.
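
The fallback rule can be pictured as a small name-resolution step before training. This is an illustrative sketch only: `resolve_training_model` and `STATIC_ALIASES` are hypothetical names, the alias set is taken from the models listed in this document, and emitting a warning is an assumption about how the swap might be surfaced.

```python
import warnings

# Static model aliases mentioned in this document.
STATIC_ALIASES = frozenset({"potion-8m", "potion-32m", "mrl-en", "mrl-multi"})


def resolve_training_model(requested: str, fallback: str = "mpnet") -> str:
    """Return the model to train with, swapping static models for a trainable one."""
    if requested in STATIC_ALIASES:
        warnings.warn(
            f"{requested!r} is static and cannot be trained; using {fallback!r} instead."
        )
        return fallback
    return requested


training_model = resolve_training_model("mpnet")
```

Requesting a trainable model explicitly avoids the swap entirely and makes it obvious which backbone was actually trained.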

Technical Details

model2vec Backend

from model2vec import StaticModel

model = StaticModel.from_pretrained("minishlab/potion-base-8M")
embeddings = model.encode(["text1", "text2"])
# Returns pre-computed vectors via lookup

StaticEmbedding Backend

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en",
    trust_remote_code=True  # Required for custom StaticEmbedding module
)
embeddings = model.encode(["text1", "text2"])
# Returns static lookup results

Troubleshooting

"Failed to load static embedding model"

Cause: The package that provides the model's backend is not installed.

Solution:

# For model2vec models
uv pip install model2vec

# For RikkaBotan MRL models
uv pip install sentence-transformers

"MPS fallback" warning on Apple Silicon

Cause: RikkaBotan MRL models use operations not supported by MPS.

Solution: Already handled - library sets PYTORCH_ENABLE_MPS_FALLBACK=1 automatically.

Model loading errors

Cause: A static model was requested for training, which static models do not support.

Solution: Specify a training-compatible model:

# Wrong: potion-8m is static, so training falls back to mpnet
matcher = Matcher(entities=entities, model="potion-8m")
matcher.fit(training_data, mode="full")

# Right: request a training-compatible model explicitly
matcher = Matcher(entities=entities, model="mpnet")
matcher.fit(training_data, mode="full")

Next Steps