Benchmarks

novelentitymatcher.benchmarks.cli

CLI for HuggingFace benchmarks.

novelentitymatcher.benchmarks.loader

Async dataset loader for HuggingFace benchmarks with parquet caching.

Classes

DatasetLoader(cache_dir=None, cache_config=None)

Source code in src/novelentitymatcher/benchmarks/loader.py
def __init__(
    self,
    cache_dir: Path | None = None,
    cache_config: CacheConfig | None = None,
):
    self.cache_dir = cache_dir or DEFAULT_CACHE_DIR
    self.cache_config = cache_config or CacheConfig()
    self.cache_dir.mkdir(parents=True, exist_ok=True)
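
A minimal construction sketch based on the signature above; the loader's fetch/load methods are not documented on this page, so only instantiation is shown:

from pathlib import Path

from novelentitymatcher.benchmarks.loader import DatasetLoader

# Cache Parquet files under a project-local directory instead of DEFAULT_CACHE_DIR.
# The constructor creates the directory if it does not exist.
loader = DatasetLoader(cache_dir=Path(".cache/hf_benchmarks"))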

novelentitymatcher.benchmarks.runner

Benchmark runner orchestrator for HuggingFace benchmarks.

Classes

BenchmarkRunner(output_dir=None, cache_dir=None)

Source code in src/novelentitymatcher/benchmarks/runner.py
def __init__(
    self,
    output_dir: Path | None = None,
    cache_dir: Path | None = None,
):
    self.output_dir = output_dir or Path("data/hf_benchmarks")
    self.output_dir.mkdir(parents=True, exist_ok=True)
    self.loader = DatasetLoader(cache_dir=cache_dir)

    self.er_evaluator = EntityResolutionEvaluator()
    self.clf_evaluator = ClassificationEvaluator()
    self.novelty_evaluator = NoveltyEvaluator()
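
A construction sketch mirroring the signature above; the directory paths are illustrative:

from pathlib import Path

from novelentitymatcher.benchmarks.runner import BenchmarkRunner

# output_dir defaults to data/hf_benchmarks and is created on init;
# cache_dir is forwarded to the internal DatasetLoader.
runner = BenchmarkRunner(
    output_dir=Path("results/hf_benchmarks"),
    cache_dir=Path(".cache/hf_benchmarks"),
)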

novelentitymatcher.benchmarks.shared

Shared utilities for benchmark scripts.

Consolidates duplicated code from:

- benchmark_bert.py / benchmark_bert_models.py (generate_synthetic_data, benchmark_training, benchmark_inference)
- benchmark_full_pipeline.py / benchmark_novelty_strategies.py / benchmark_novelty_full.py (compute_ood_metrics, SplitData, OOD splitting; sketched below)
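
A standalone sketch of the OOD-splitting idea (the real SplitData and compute_ood_metrics signatures are not shown on this page; ood_split here is hypothetical): a subset of label classes is held out as "novel", so detectors are scored on classes never seen during fitting.

import numpy as np

def ood_split(X, y, novel_labels):
    # Rows whose label is in novel_labels form the out-of-distribution set.
    y = np.asarray(y)
    ood_mask = np.isin(y, list(novel_labels))
    return (X[~ood_mask], y[~ood_mask]), (X[ood_mask], y[ood_mask])

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 5, size=100)
# Classes 3 and 4 are treated as unseen novel entities.
(in_X, in_y), (ood_X, ood_y) = ood_split(X, y, novel_labels={3, 4})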

novelentitymatcher.benchmarks.registry

Dataset registry for HuggingFace benchmarks.
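
The registry contents are not listed on this page. Purely as an illustration of the pattern, an entry might map a short dataset name to its HuggingFace path plus the columns a benchmark needs (field names here are hypothetical, not the module's actual schema):

# Hypothetical shape of a registry entry; adapt to the real schema.
DATASETS = {
    "ag_news": {
        "hf_path": "ag_news",
        "text_column": "text",
        "label_column": "label",
    },
}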

novelentitymatcher.benchmarks.base

Base evaluator abstract class for benchmarks.

Classes

BaseEvaluator(name)

Bases: ABC, Generic[T]

Source code in src/novelentitymatcher/benchmarks/base.py
def __init__(self, name: str):
    self.name = name

Functions

evaluate(data, **kwargs) abstractmethod

Evaluate on the given data.

Source code in src/novelentitymatcher/benchmarks/base.py
@abstractmethod
def evaluate(
    self,
    data: T,
    **kwargs,
) -> EvaluationResult:
    """Evaluate on the given data."""
    raise NotImplementedError

get_default_metrics() abstractmethod

Return list of default metric names this evaluator computes.

Source code in src/novelentitymatcher/benchmarks/base.py
@abstractmethod
def get_default_metrics(self) -> list[str]:
    """Return list of default metric names this evaluator computes."""
    raise NotImplementedError
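
A minimal concrete subclass sketch. It assumes EvaluationResult is importable from the same module and accepts name/metrics keyword arguments; its real constructor is not shown on this page:

from novelentitymatcher.benchmarks.base import BaseEvaluator, EvaluationResult

class NonEmptyEvaluator(BaseEvaluator[list[str]]):
    """Toy evaluator: scores the fraction of non-empty strings."""

    def __init__(self) -> None:
        super().__init__(name="non_empty")

    def evaluate(self, data: list[str], **kwargs) -> EvaluationResult:
        score = sum(1 for item in data if item) / max(len(data), 1)
        # Assumed constructor -- adapt to the real EvaluationResult fields.
        return EvaluationResult(name=self.name, metrics={"non_empty_fraction": score})

    def get_default_metrics(self) -> list[str]:
        return ["non_empty_fraction"]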

novelentitymatcher.benchmarks.novelty_bench

Merged novelty detection benchmark with depth levels.

Consolidates:

- benchmark_full_pipeline.py (Phase 2)
- benchmark_novelty_strategies.py
- benchmark_novelty_full.py

Depth levels:

- quick: KNN, Mahalanobis, LOF, OneClassSVM, IsolationForest (sketched below)
- standard: quick + Pattern, SetFit Centroid, weighted/voting/adaptive ensembles
- full: standard + hyperparameter tuning + SignalCombiner + meta-learner + adaptive weights
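
The quick tier maps onto standard outlier detectors. A sketch of that tier using scikit-learn stand-ins (this illustrates the strategy names, not the module's actual wiring; KNN-distance and Mahalanobis scores follow the same fit/score pattern):

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))            # in-distribution embeddings
X_test = rng.normal(2.0, 1.0, size=(20, 16))    # shifted, mostly novel

detectors = {
    "LOF": LocalOutlierFactor(novelty=True),
    "OneClassSVM": OneClassSVM(gamma="scale"),
    "IsolationForest": IsolationForest(random_state=0),
}
for name, det in detectors.items():
    det.fit(X_train)
    # score_samples is higher for in-distribution points; negate for novelty.
    novelty = -det.score_samples(X_test)
    print(name, round(float(novelty.mean()), 3))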

novelentitymatcher.benchmarks.infra_bench

Infrastructure benchmarks: ANN backends and reranker models.

Benchmarks:

- ANN backends (hnswlib vs faiss vs exact): build time, query latency, recall@k (sketched below)
- Reranker models (bge-m3 vs bge-large vs ms-marco): accuracy, latency

Usage

novelentitymatcher-bench bench-ann --sizes 1000 10000 100000
novelentitymatcher-bench bench-reranker --queries 100
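
recall@k compares the ids an ANN backend returns against exact nearest neighbors; a standalone sketch of the metric (not the module's implementation):

import numpy as np

def recall_at_k(approx_ids: np.ndarray, exact_ids: np.ndarray) -> float:
    # Fraction of exact top-k neighbors the ANN index also returned, averaged over queries.
    hits = [len(set(a) & set(e)) / len(e) for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits))

approx = np.array([[1, 2, 9], [4, 5, 6]])   # ANN results for two queries, k=3
exact = np.array([[1, 2, 3], [4, 5, 6]])    # brute-force ground truth
print(recall_at_k(approx, exact))           # 0.833...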

novelentitymatcher.benchmarks.classifier_bench

Merged classifier benchmark: BERT vs SetFit comparison and multi-model sweep.

Consolidates:

- benchmark_bert.py (head-to-head BERT vs SetFit)
- benchmark_bert_models.py (multi-model BERT sweep)

Modes:

- compare: BERT vs SetFit head-to-head (harness sketched below)
- sweep-models: benchmark multiple BERT-family classifiers
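
The compare mode reduces to timing fit/predict for two models on the same split. A hypothetical harness with scikit-learn classifiers standing in for the BERT and SetFit models:

import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def compare(models, X_train, y_train, X_test, y_test):
    results = {}
    for name, model in models.items():
        t0 = time.perf_counter()
        model.fit(X_train, y_train)
        train_s = time.perf_counter() - t0
        t0 = time.perf_counter()
        preds = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, preds),
            "train_s": train_s,
            "infer_s": time.perf_counter() - t0,
        }
    return results

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
print(compare({"a": LogisticRegression(max_iter=1000), "b": GaussianNB()},
              X_tr, y_tr, X_te, y_te))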

novelentitymatcher.benchmarks.async_bench

Async/sync performance benchmark for matcher APIs.

Consolidated from: benchmark_async.py

Benchmarks sync vs async matcher APIs across zero-shot, head-only, and full modes, measuring construct time, fit time, cold-query latency, steady-state match latency, QPS, and end-to-end wall time.
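
A self-contained sketch of the sync-vs-async measurement pattern, with a toy coroutine in place of the real matcher APIs:

import asyncio
import time

def match_sync(query: str) -> str:
    time.sleep(0.01)           # stand-in for a blocking match call
    return query.upper()

async def match_async(query: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for an awaitable match call
    return query.upper()

queries = [f"q{i}" for i in range(50)]

t0 = time.perf_counter()
for q in queries:
    match_sync(q)
sync_s = time.perf_counter() - t0

async def run_all() -> None:
    await asyncio.gather(*(match_async(q) for q in queries))

t0 = time.perf_counter()
asyncio.run(run_all())
async_s = time.perf_counter() - t0

print(f"sync: {len(queries) / sync_s:.0f} QPS, async: {len(queries) / async_s:.0f} QPS")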

novelentitymatcher.benchmarks.weight_optimizer

Bayesian optimization of ensemble weights using Optuna.

Searches for optimal strategy weights and thresholds that maximize AUROC on validation data. Compares weighted/voting/meta_learner combination methods.

Usage

novelentitymatcher-bench bench-weights --trials 200 --dataset ag_news
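
The search follows the standard Optuna pattern. A minimal sketch assuming two per-example strategy scores and binary novelty labels (all names and the synthetic data are illustrative):

import numpy as np
import optuna
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                 # 1 = novel
score_a = 0.6 * y_true + rng.normal(0, 0.5, 500)      # two imperfect strategy scores
score_b = 0.3 * y_true + rng.normal(0, 0.5, 500)

def objective(trial: optuna.Trial) -> float:
    w = trial.suggest_float("w_a", 0.0, 1.0)
    combined = w * score_a + (1 - w) * score_b
    return roc_auc_score(y_true, combined)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)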

novelentitymatcher.benchmarks.visualization

Visualization utilities for benchmark results.

Consolidates:

- render_benchmark_report.py (JSON -> markdown tables; sketched below)
- visualize_benchmarks.py (JSON -> PNG charts)
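
The JSON-to-markdown direction is a small transform; a hypothetical sketch (the actual benchmark result schema is not shown here, so the fields and values are illustrative):

import json

rows = json.loads('[{"model": "bge-m3", "accuracy": 0.91}, {"model": "bge-large", "accuracy": 0.89}]')

headers = list(rows[0])
lines = ["| " + " | ".join(headers) + " |",
         "| " + " | ".join("---" for _ in headers) + " |"]
for row in rows:
    lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
print("\n".join(lines))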