# Benchmarks
## novelentitymatcher.benchmarks.loader
## novelentitymatcher.benchmarks.runner
Benchmark runner orchestrator for HuggingFace benchmarks.
### Classes

#### `BenchmarkRunner(output_dir=None, cache_dir=None)`
Source code in src/novelentitymatcher/benchmarks/runner.py
## novelentitymatcher.benchmarks.shared
Shared utilities for benchmark scripts.
Consolidates duplicated code from:

- benchmark_bert.py / benchmark_bert_models.py (generate_synthetic_data, benchmark_training, benchmark_inference)
- benchmark_full_pipeline.py / benchmark_novelty_strategies.py / benchmark_novelty_full.py (compute_ood_metrics, SplitData, OOD splitting)
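The OOD splitting consolidated here can be sketched as: hold out some label classes entirely during training so they appear only at evaluation time as "novel". The helper below is a hypothetical illustration of that idea, not the actual `SplitData` API; the function name, tuple layout, and 20% in-distribution holdout rate are assumptions.

```python
import random

def ood_split(examples, labels, novel_classes, seed=0):
    """Split data so that classes in `novel_classes` are never seen in
    training and appear at test time flagged as out-of-distribution.

    Hypothetical sketch of the OOD splitting idea; names and the 20%
    in-distribution holdout rate are illustrative assumptions.
    """
    rng = random.Random(seed)
    train, test = [], []
    for x, y in zip(examples, labels):
        if y in novel_classes:
            # Novel classes go entirely to the test split, marked OOD.
            test.append((x, y, True))
        elif rng.random() < 0.2:
            # Some in-distribution samples are held out for evaluation.
            test.append((x, y, False))
        else:
            train.append((x, y))
    return train, test
```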
## novelentitymatcher.benchmarks.registry
Dataset registry for HuggingFace benchmarks.
## novelentitymatcher.benchmarks.base
## novelentitymatcher.benchmarks.novelty_bench
Merged novelty detection benchmark with depth levels.
Consolidates:

- benchmark_full_pipeline.py (Phase 2)
- benchmark_novelty_strategies.py
- benchmark_novelty_full.py
Depth levels:
- quick: KNN, Mahalanobis, LOF, OneClassSVM, IsolationForest
- standard: quick + Pattern, SetFit Centroid, weighted/voting/adaptive ensembles
- full: standard + hyperparameter tuning + SignalCombiner + meta-learner + adaptive weights
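To make the strategy tiers concrete, here is a minimal, dependency-free sketch of the simplest "quick" strategy, KNN novelty scoring: a query is scored by its mean distance to its k nearest training points, with larger distances suggesting a novel entity. The real strategies operate on learned embedding vectors with tuned k; this function and its signature are illustrative assumptions.

```python
import math

def knn_novelty_score(train_vecs, query_vec, k=3):
    """Mean distance from the query to its k nearest training vectors.

    Larger scores indicate the query is farther from everything seen in
    training, i.e. more likely novel. Hypothetical sketch of the KNN
    strategy from the 'quick' tier; not the package's actual API.
    """
    dists = sorted(math.dist(v, query_vec) for v in train_vecs)
    return sum(dists[:k]) / k
```

A threshold on this score (calibrated on held-out in-distribution data) turns it into a novel/known decision.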
## novelentitymatcher.benchmarks.infra_bench
Infrastructure benchmarks: ANN backends and reranker models.
Benchmarks:

- ANN backends (hnswlib vs faiss vs exact): build time, query latency, recall@k
- Reranker models (bge-m3 vs bge-large vs ms-marco): accuracy, latency
Usage:

```
novelentitymatcher-bench bench-ann --sizes 1000 10000 100000
novelentitymatcher-bench bench-reranker --queries 100
```
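The recall@k metric reported by the ANN benchmark measures what fraction of the exact top-k neighbors the approximate index recovers. A minimal sketch (the function name and list-of-ids interface are assumptions, not the module's API):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbor ids that the approximate
    (hnswlib/faiss) search also returned in its top k.

    Illustrative sketch of the recall@k metric; the real benchmark
    computes exact neighbors by brute-force search as ground truth.
    """
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k
```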
## novelentitymatcher.benchmarks.classifier_bench
Merged classifier benchmark: BERT vs SetFit comparison and multi-model sweep.
Consolidates:

- benchmark_bert.py (head-to-head BERT vs SetFit)
- benchmark_bert_models.py (multi-model BERT sweep)
Modes:
- compare: BERT vs SetFit head-to-head
- sweep-models: benchmark multiple BERT-family classifiers
## novelentitymatcher.benchmarks.async_bench
Async/sync performance benchmark for matcher APIs.
Consolidated from: benchmark_async.py
Benchmarks sync vs async matcher APIs across zero-shot, head-only, and full modes, measuring construct time, fit time, cold-query latency, steady-state match latency, QPS, and end-to-end wall time.
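The QPS measurement for sync vs async APIs can be sketched as below. The matcher itself is not shown; `fn` and `afn` stand in for sync and async match calls, and the concurrency limit of 8 is an illustrative assumption rather than the benchmark's actual configuration.

```python
import asyncio
import time

def measure_qps(fn, queries):
    """Wall-clock queries-per-second for a synchronous match function."""
    start = time.perf_counter()
    for q in queries:
        fn(q)
    return len(queries) / (time.perf_counter() - start)

async def measure_qps_async(afn, queries, concurrency=8):
    """QPS for an async matcher, issuing up to `concurrency` queries
    concurrently via a semaphore (mirrors how async APIs gain over
    sync ones when queries overlap I/O or model inference)."""
    sem = asyncio.Semaphore(concurrency)

    async def one(q):
        async with sem:
            await afn(q)

    start = time.perf_counter()
    await asyncio.gather(*(one(q) for q in queries))
    return len(queries) / (time.perf_counter() - start)
```

Cold-query latency would be measured the same way but timed on the very first call after `fit`, before any caches are warm.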
## novelentitymatcher.benchmarks.weight_optimizer
Bayesian optimization of ensemble weights using Optuna.
Searches for optimal strategy weights and thresholds that maximize AUROC on validation data. Compares weighted/voting/meta_learner combination methods.
Usage:

```
novelentitymatcher-bench bench-weights --trials 200 --dataset ag_news
```
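The objective being optimized can be sketched without Optuna: combine per-strategy novelty scores with a weight vector and keep the weights that maximize validation AUROC. To stay dependency-free, the sketch below uses uniform random search where the real module uses Optuna's Bayesian sampler; all names here are illustrative assumptions.

```python
import random

def auroc(scores, labels):
    """AUROC via the Mann-Whitney formulation (labels: 1 = novel).

    Counts how often a novel example outscores a known one, with ties
    counted as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def search_weights(strategy_scores, labels, trials=200, seed=0):
    """Search weights for a weighted-sum ensemble of strategy scores,
    keeping the vector with the best validation AUROC.

    Sketch of what bench-weights does; random search stands in for
    Optuna's Bayesian optimization.
    """
    rng = random.Random(seed)
    n_queries = len(labels)
    best_w, best_auc = None, -1.0
    for _ in range(trials):
        w = [rng.random() for _ in strategy_scores]
        combined = [
            sum(wi * s[i] for wi, s in zip(w, strategy_scores))
            for i in range(n_queries)
        ]
        auc = auroc(combined, labels)
        if auc > best_auc:
            best_w, best_auc = w, auc
    return best_w, best_auc
```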
## novelentitymatcher.benchmarks.visualization
Visualization utilities for benchmark results.
Consolidates:

- render_benchmark_report.py (JSON -> markdown tables)
- visualize_benchmarks.py (JSON -> PNG charts)
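The JSON-to-markdown step amounts to turning a list of flat result records into a markdown table. A minimal sketch (the function name and the flat list-of-dicts input shape are assumptions; the real renderer also handles nested result files and the chart output):

```python
def results_to_markdown(results):
    """Render a list of flat benchmark-result dicts as a markdown table,
    taking column order from the first record.

    Hypothetical sketch of the JSON -> markdown rendering consolidated
    from render_benchmark_report.py.
    """
    if not results:
        return ""
    cols = list(results[0])
    lines = [
        "| " + " | ".join(cols) + " |",
        "| " + " | ".join("---" for _ in cols) + " |",
    ]
    for row in results:
        lines.append("| " + " | ".join(str(row[c]) for c in cols) + " |")
    return "\n".join(lines)
```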