Novelty Strategies¶
novelentitymatcher.novelty.strategies.base
¶
Base protocol for novelty detection strategies.
All strategies must implement this protocol to be compatible with the NoveltyDetector.
Classes¶
NoveltyStrategy
¶
Bases: ABC
Base protocol for all novelty detection strategies.
Each strategy is responsible for:
1. Initializing with reference embeddings and labels
2. Detecting novel samples from a batch of inputs
3. Providing per-sample metrics for signal combination
4. Specifying its weight for signal fusion
Attributes¶
config_schema
abstractmethod
property
¶
Return the config dataclass type for this strategy.
This is used for validation and defaults.
Functions¶
initialize(reference_embeddings, reference_labels, config)
abstractmethod
¶
Initialize strategy with reference data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_embeddings | ndarray | Embeddings of known samples | required |
| reference_labels | list[str] | Class labels for known samples | required |
| config | Any | Strategy-specific configuration object | required |
Source code in src/novelentitymatcher/novelty/strategies/base.py
detect(texts, embeddings, predicted_classes, confidences, **kwargs)
abstractmethod
¶
Detect novel samples.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | Input texts | required |
| embeddings | ndarray | Text embeddings | required |
| predicted_classes | list[str] | Predicted class for each sample | required |
| confidences | ndarray | Prediction confidence scores | required |
| **kwargs | | Additional strategy-specific parameters | {} |
Returns:
| Type | Description |
|---|---|
| tuple[set[int], dict[int, dict[str, Any]]] | (flags, metrics) - flagged indices and per-sample metrics |
Source code in src/novelentitymatcher/novelty/strategies/base.py
get_weight()
abstractmethod
¶
Return weight for signal combination.
This weight determines how much this strategy contributes to the final novelty score.
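To make the role of the weight concrete, here is a minimal sketch of one possible fusion rule: a normalized weighted vote over the flag sets returned by each strategy. The function `fuse_signals` and the strategy names are hypothetical; the rule the NoveltyDetector actually applies is not shown in this reference and may differ.

```python
import numpy as np


def fuse_signals(strategy_flags, weights, n_samples):
    """Weighted vote over per-strategy flag sets (hypothetical fusion rule)."""
    scores = np.zeros(n_samples)
    total = sum(weights.values())
    for name, flags in strategy_flags.items():
        for idx in flags:
            # Each strategy adds its weight to every sample it flagged.
            scores[idx] += weights[name]
    # Normalize so a sample flagged by every strategy scores 1.0.
    return scores / total if total > 0 else scores


# Illustrative flag sets and weights, as if returned by detect() and get_weight().
flags = {"knn": {0, 2}, "confidence": {2}}
weights = {"knn": 0.6, "confidence": 0.4}
novelty_scores = fuse_signals(flags, weights, n_samples=3)
```

Under this rule a strategy with a larger weight moves the final score further when it flags a sample.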
get_config()
¶
Get the current configuration for this strategy.
Override this if your strategy stores its config differently.
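The interface above can be illustrated with a toy subclass. This is a sketch under stated assumptions: the `NoveltyStrategy` class defined here is a local stand-in that only mirrors the documented member names so the block runs without the package, and `NormConfig` and `NormStrategy` are hypothetical. In real use the base class would be imported from novelentitymatcher.novelty.strategies.base.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

import numpy as np


class NoveltyStrategy(ABC):
    """Local stand-in mirroring the documented interface (not the real class)."""

    @property
    @abstractmethod
    def config_schema(self) -> type: ...

    @abstractmethod
    def initialize(self, reference_embeddings, reference_labels, config): ...

    @abstractmethod
    def detect(self, texts, embeddings, predicted_classes, confidences, **kwargs): ...

    @abstractmethod
    def get_weight(self) -> float: ...


@dataclass
class NormConfig:  # hypothetical config, for illustration only
    max_norm: float = 2.0
    weight: float = 0.5


class NormStrategy(NoveltyStrategy):
    """Toy strategy: flags embeddings whose L2 norm exceeds max_norm."""

    @property
    def config_schema(self) -> type:
        return NormConfig

    def initialize(self, reference_embeddings, reference_labels, config):
        # This toy strategy needs no reference statistics, only its config.
        self._config = config

    def detect(self, texts, embeddings, predicted_classes, confidences, **kwargs):
        norms = np.linalg.norm(embeddings, axis=1)
        flags = {i for i, n in enumerate(norms) if n > self._config.max_norm}
        metrics = {i: {"norm": float(n)} for i, n in enumerate(norms)}
        return flags, metrics

    def get_weight(self) -> float:
        return self._config.weight


strategy = NormStrategy()
strategy.initialize(np.zeros((1, 2)), ["a"], NormConfig())
flags, metrics = strategy.detect(
    ["x", "y"], np.array([[1.0, 0.0], [3.0, 4.0]]), ["a", "a"], np.array([0.9, 0.8])
)
```

The return value follows the documented shape: a set of flagged indices plus a dict of per-sample metrics keyed by index.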
novelentitymatcher.novelty.strategies.knn_distance
¶
kNN distance-based novelty detection strategy.
Flags samples based on their distance to k-nearest neighbors in the reference set.
Classes¶
KNNDistanceStrategy()
¶
Bases: NoveltyStrategy
kNN distance strategy for novelty detection.
Flags samples as novel if their average distance to k-nearest neighbors in the reference set exceeds a threshold.
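The rule described above can be sketched in plain NumPy. This is an illustration, not the library's implementation: it assumes Euclidean distance, and the helper `knn_mean_distance` and the threshold value are hypothetical.

```python
import numpy as np


def knn_mean_distance(reference, queries, k=3):
    """Mean Euclidean distance from each query to its k nearest reference points."""
    # Pairwise distances, shape (n_queries, n_reference).
    diffs = queries[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # Keep the k smallest distances in each row and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


reference = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
queries = np.array([[0.5, 0.5], [10.0, 10.0]])
mean_dist = knn_mean_distance(reference, queries, k=2)

threshold = 2.0  # hypothetical value; the real threshold comes from KNNConfig
flags = {i for i, d in enumerate(mean_dist) if d > threshold}
```

The first query sits inside the reference set and stays unflagged; the second is far from every reference point and is flagged.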
Source code in src/novelentitymatcher/novelty/strategies/knn_distance.py
Attributes¶
config_schema
property
¶
Return KNNConfig as the config schema.
Functions¶
initialize(reference_embeddings, reference_labels, config)
¶
Initialize the kNN strategy with reference data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_embeddings | ndarray | Embeddings of known samples | required |
| reference_labels | list[str] | Labels of known samples | required |
| config | KNNConfig | KNNConfig with k, thresholds, and metric | required |
Source code in src/novelentitymatcher/novelty/strategies/knn_distance.py
detect(texts, embeddings, predicted_classes, confidences, **kwargs)
¶
Detect novel samples using kNN distance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | Input texts | required |
| embeddings | ndarray | Text embeddings | required |
| predicted_classes | list[str] | Predicted classes | required |
| confidences | ndarray | Prediction confidences | required |
| **kwargs | | Additional parameters | {} |
Returns:
| Type | Description |
|---|---|
| tuple[set[int], dict[int, dict[str, Any]]] | (flags, metrics) - Flagged indices and per-sample metrics |
Source code in src/novelentitymatcher/novelty/strategies/knn_distance.py
get_weight()
¶
get_config()
¶
Get the current configuration for this strategy.
Override this if your strategy stores its config differently.
novelentitymatcher.novelty.strategies.clustering
¶
Clustering-based novelty detection strategy.
Flags samples that form small, isolated clusters or don't fit well into any existing cluster.
Classes¶
ClusteringStrategy()
¶
Bases: NoveltyStrategy
Clustering-based strategy for novelty detection.
Uses HDBSCAN to cluster samples and identifies novel samples as those that are in small or low-cohesion clusters.
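The flagging step can be sketched independently of the clusterer. The block below assumes cluster labels are already available (HDBSCAN marks noise points with -1) and uses a hand-written label array in their place. The helper `flag_from_clusters`, its parameters, and the use of mean distance to the centroid as a cohesion proxy are all assumptions made for illustration.

```python
import numpy as np


def flag_from_clusters(embeddings, labels, min_size=3, max_spread=1.0):
    """Flag samples in noise, small, or loosely packed clusters (illustrative rule)."""
    flags = set()
    for cluster in np.unique(labels):
        members = np.flatnonzero(labels == cluster)
        # Noise points and undersized clusters are flagged outright.
        if cluster == -1 or len(members) < min_size:
            flags.update(members.tolist())
            continue
        # Cohesion proxy: mean distance of members to their centroid.
        centroid = embeddings[members].mean(axis=0)
        spread = np.linalg.norm(embeddings[members] - centroid, axis=1).mean()
        if spread > max_spread:
            flags.update(members.tolist())
    return flags


embeddings = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [9.0, 0.0]]
)
labels = np.array([0, 0, 0, 1, 1, -1])  # stand-in for clusterer output
flags = flag_from_clusters(embeddings, labels)
```

Here cluster 0 is large and tight enough to be kept, cluster 1 is too small, and the last point is noise.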
Source code in src/novelentitymatcher/novelty/strategies/clustering.py
Attributes¶
config_schema
property
¶
Return ClusteringConfig as the config schema.
Functions¶
initialize(reference_embeddings, reference_labels, config)
¶
Initialize the clustering strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_embeddings | ndarray | Embeddings of known samples | required |
| reference_labels | list[str] | Labels of known samples | required |
| config | ClusteringConfig | ClusteringConfig with thresholds | required |
Source code in src/novelentitymatcher/novelty/strategies/clustering.py
detect(texts, embeddings, predicted_classes, confidences, **kwargs)
¶
Detect novel samples using clustering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | Input texts | required |
| embeddings | ndarray | Text embeddings | required |
| predicted_classes | list[str] | Predicted classes | required |
| confidences | ndarray | Prediction confidences | required |
| **kwargs | | Additional parameters | {} |
Returns:
| Type | Description |
|---|---|
| tuple[set[int], dict[int, dict[str, Any]]] | (flags, metrics) - Flagged indices and per-sample metrics |
Source code in src/novelentitymatcher/novelty/strategies/clustering.py
get_weight()
¶
get_config()
¶
Get the current configuration for this strategy.
Override this if your strategy stores its config differently.
novelentitymatcher.novelty.strategies.pattern
¶
Pattern-based novelty detection strategy wrapper.
Wraps PatternScorer to implement NoveltyStrategy protocol.
novelentitymatcher.novelty.strategies.oneclass
¶
One-Class SVM novelty detection strategy wrapper.
Wraps OneClassSVMDetector to implement NoveltyStrategy protocol.
novelentitymatcher.novelty.strategies.setfit
¶
SetFit contrastive novelty detection strategy wrapper.
Wraps SetFitDetector to implement NoveltyStrategy protocol.
novelentitymatcher.novelty.strategies.setfit_centroid
¶
SetFit centroid distance novelty detection strategy.
Computes minimum cosine distance from each query to known class centroids in the SetFit fine-tuned embedding space. Produces continuous novelty scores.
This is the recommended strategy when SetFit full training is used for Phase 1, as contrastive learning creates tight, well-separated class clusters.
Classes¶
SetFitCentroidStrategy()
¶
Bases: NoveltyStrategy
Centroid distance strategy using SetFit fine-tuned embeddings.
For each known class, computes a centroid in the SetFit embedding space. Novelty score = minimum cosine distance from query to any centroid.
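The score described above can be sketched in NumPy. The reference and query arrays below are toy values standing in for SetFit embeddings, and the helpers `build_centroids` and `min_cosine_distance` are hypothetical names, not part of the library.

```python
import numpy as np


def build_centroids(embeddings, labels):
    """Return sorted class names and one mean embedding per class."""
    classes = sorted(set(labels))
    label_arr = np.array(labels)
    centroids = np.stack([embeddings[label_arr == c].mean(axis=0) for c in classes])
    return classes, centroids


def min_cosine_distance(queries, centroids):
    """Novelty score: smallest cosine distance from each query to any centroid."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    # Cosine distance is 1 minus cosine similarity; take the closest centroid.
    return (1.0 - q @ c.T).min(axis=1)


reference = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = ["a", "a", "b", "b"]
classes, centroids = build_centroids(reference, labels)
scores = min_cosine_distance(np.array([[1.0, 0.05], [-1.0, -1.0]]), centroids)
```

The first query points almost exactly along the centroid of class "a" and scores near zero; the second points away from both centroids and scores well above 1.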
Source code in src/novelentitymatcher/novelty/strategies/setfit_centroid.py
Functions¶
initialize(reference_embeddings, reference_labels, config)
¶
Initialize centroids from reference embeddings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_embeddings | ndarray | Embeddings of known samples (already from SetFit model) | required |
| reference_labels | list[str] | Class labels for known samples | required |
| config | SetFitCentroidConfig | SetFitCentroidConfig with threshold | required |
Source code in src/novelentitymatcher/novelty/strategies/setfit_centroid.py
detect(texts, embeddings, predicted_classes, confidences, **kwargs)
¶
Detect novel samples using centroid distance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | Input texts (unused, embeddings are pre-computed) | required |
| embeddings | ndarray | Query embeddings | required |
| predicted_classes | list[str] | Predicted class for each sample | required |
| confidences | ndarray | Prediction confidence scores | required |
Returns:
| Type | Description |
|---|---|
| tuple[set[int], dict[int, dict[str, Any]]] | (flags, metrics) - flagged indices and per-sample metrics |
Source code in src/novelentitymatcher/novelty/strategies/setfit_centroid.py
get_weight()
¶
get_config()
¶
Get the current configuration for this strategy.
Override this if your strategy stores its config differently.
novelentitymatcher.novelty.strategies.prototypical
¶
Prototypical network novelty detection strategy wrapper.
Wraps PrototypicalDetector to implement NoveltyStrategy protocol.
novelentitymatcher.novelty.strategies.mahalanobis
¶
Mahalanobis distance-based novelty detection strategy.
Flags samples based on their Mahalanobis distance to the class-conditional distribution of their predicted class. Supports optional conformal calibration for statistically grounded p-value based novelty routing.
Classes¶
MahalanobisDistanceStrategy()
¶
Bases: NoveltyStrategy
Mahalanobis distance strategy for novelty detection.
Computes the Mahalanobis distance from each sample to the class-conditional distribution (mean + shared covariance) of its predicted class. Samples whose distance exceeds a configurable threshold are flagged as novel.
When calibration_mode="conformal", raw distances are wrapped with conformal p-values for statistically grounded routing. This is backward-compatible: calibration_mode="none" produces identical results to the original threshold-only behavior.
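The threshold-only path can be sketched as follows: per-class means, one pooled covariance with a ridge term, then the distance of each sample to the mean of its predicted class. The helpers `fit_pooled` and `mahalanobis`, the regularization value, and the threshold are hypothetical and chosen for illustration; they are not read from the library source.

```python
import numpy as np


def fit_pooled(embeddings, labels, reg=1e-3):
    """Per-class means plus the inverse of a shared, regularized covariance."""
    label_arr = np.array(labels)
    means = {c: embeddings[label_arr == c].mean(axis=0) for c in set(labels)}
    # Center every sample on its own class mean, then pool across classes.
    centered = embeddings - np.stack([means[c] for c in labels])
    cov = centered.T @ centered / len(embeddings)
    cov += reg * np.eye(embeddings.shape[1])  # ridge term for numerical stability
    return means, np.linalg.inv(cov)


def mahalanobis(embeddings, predicted_classes, means, precision):
    """Distance of each sample to the mean of its predicted class."""
    diffs = embeddings - np.stack([means[c] for c in predicted_classes])
    return np.sqrt(np.einsum("ij,jk,ik->i", diffs, precision, diffs))


reference = np.array(
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
)
labels = ["a"] * 4 + ["b"] * 4
means, precision = fit_pooled(reference, labels)

queries = np.array([[0.5, 0.5], [3.0, 3.0]])
dists = mahalanobis(queries, ["a", "a"], means, precision)
threshold = 3.0  # hypothetical; the real value comes from MahalanobisConfig
flags = {i for i, d in enumerate(dists) if d > threshold}
```

The first query coincides with the mean of class "a" and has distance 0; the second lies between the two classes and exceeds the threshold.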
Source code in src/novelentitymatcher/novelty/strategies/mahalanobis.py
Attributes¶
config_schema
property
¶
Return MahalanobisConfig as the config schema.
Functions¶
initialize(reference_embeddings, reference_labels, config)
¶
Initialize the Mahalanobis strategy with reference data.
Computes per-class mean vectors and a shared (pooled) covariance matrix with regularization for numerical stability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_embeddings | ndarray | Embeddings of known samples (n_samples, dim) | required |
| reference_labels | list[str] | Class labels for known samples | required |
| config | MahalanobisConfig | MahalanobisConfig with threshold, regularization, etc. | required |
Source code in src/novelentitymatcher/novelty/strategies/mahalanobis.py
detect(texts, embeddings, predicted_classes, confidences, **kwargs)
¶
Detect novel samples using Mahalanobis distance.
When calibration_mode="conformal", flagging uses p-values instead of raw distance thresholds. A sample is flagged if p_value < calibration_alpha.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | Input texts | required |
| embeddings | ndarray | Text embeddings | required |
| predicted_classes | list[str] | Predicted classes | required |
| confidences | ndarray | Prediction confidences | required |
| **kwargs | | Additional parameters | {} |
Returns:
| Type | Description |
|---|---|
| tuple[set[int], dict[int, dict[str, Any]]] | (flags, metrics) - Flagged indices and per-sample metrics |
Source code in src/novelentitymatcher/novelty/strategies/mahalanobis.py
get_weight()
¶
get_config()
¶
Get the current configuration for this strategy.
Override this if your strategy stores its config differently.
novelentitymatcher.novelty.strategies.uncertainty
¶
Uncertainty-based novelty detection strategy.
Flags samples based on prediction uncertainty using margin and entropy.
Classes¶
UncertaintyStrategy()
¶
Bases: NoveltyStrategy
Uncertainty-based strategy for novelty detection.
Flags samples as novel if their prediction uncertainty exceeds configured thresholds (margin or entropy).
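Margin and entropy can both be computed from a full probability distribution per sample. The sketch below assumes such a matrix is available (the detect() documentation mentions an optional 'all_probs' keyword). The helper name, the threshold values, and the comparison directions (low margin or high entropy means uncertain) are assumptions.

```python
import numpy as np


def uncertainty_metrics(all_probs):
    """Return (margin, entropy) per row of a class-probability matrix."""
    sorted_probs = np.sort(all_probs, axis=1)
    # Margin: gap between the two most probable classes.
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]
    # Shannon entropy in nats; clip to avoid log(0).
    clipped = np.clip(all_probs, 1e-12, 1.0)
    entropy = -(clipped * np.log(clipped)).sum(axis=1)
    return margin, entropy


all_probs = np.array([[0.9, 0.05, 0.05], [0.4, 0.35, 0.25]])
margin, entropy = uncertainty_metrics(all_probs)

margin_threshold, entropy_threshold = 0.2, 1.0  # hypothetical values
flags = {
    i
    for i in range(len(all_probs))
    if margin[i] < margin_threshold or entropy[i] > entropy_threshold
}
```

The first row is a confident prediction and passes both checks; the second has a narrow margin and high entropy, so it is flagged.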
Source code in src/novelentitymatcher/novelty/strategies/uncertainty.py
Attributes¶
config_schema
property
¶
Return UncertaintyConfig as the config schema.
Functions¶
initialize(reference_embeddings, reference_labels, config)
¶
Initialize the uncertainty strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_embeddings | ndarray | Embeddings of known samples (not used) | required |
| reference_labels | list[str] | Labels of known samples (not used) | required |
| config | UncertaintyConfig | UncertaintyConfig with thresholds | required |
Source code in src/novelentitymatcher/novelty/strategies/uncertainty.py
detect(texts, embeddings, predicted_classes, confidences, **kwargs)
¶
Detect novel samples using uncertainty metrics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | Input texts | required |
| embeddings | ndarray | Text embeddings (not used) | required |
| predicted_classes | list[str] | Predicted classes (not used) | required |
| confidences | ndarray | Prediction confidence scores | required |
| **kwargs | | Additional parameters, may include 'all_probs' for full distribution | {} |
Returns:
| Type | Description |
|---|---|
| tuple[set[int], dict[int, dict[str, Any]]] | (flags, metrics) - Flagged indices and per-sample metrics |
Source code in src/novelentitymatcher/novelty/strategies/uncertainty.py
get_weight()
¶
get_config()
¶
Get the current configuration for this strategy.
Override this if your strategy stores its config differently.
novelentitymatcher.novelty.strategies.confidence
¶
Confidence threshold-based novelty detection strategy.
Flags samples with prediction confidence below a threshold as novel.
Classes¶
ConfidenceStrategy()
¶
Bases: NoveltyStrategy
Confidence threshold strategy for novelty detection.
Flags samples as novel if their prediction confidence falls below a configured threshold.
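As a minimal sketch of this rule, the function below flags every index whose confidence is below the threshold and returns the documented (flags, metrics) shape. The function name, the "confidence" metric key, and the threshold value are hypothetical.

```python
import numpy as np


def detect_low_confidence(confidences, threshold=0.5):
    """Flag indices whose confidence is strictly below the threshold."""
    flags = {int(i) for i in np.flatnonzero(confidences < threshold)}
    # Metrics are reported for every sample, flagged or not.
    metrics = {i: {"confidence": float(c)} for i, c in enumerate(confidences)}
    return flags, metrics


flags, metrics = detect_low_confidence(np.array([0.92, 0.31, 0.55]), threshold=0.5)
```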
Source code in src/novelentitymatcher/novelty/strategies/confidence.py
Attributes¶
config_schema
property
¶
Return ConfidenceConfig as the config schema.
Functions¶
initialize(reference_embeddings, reference_labels, config)
¶
Initialize the confidence strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| reference_embeddings | ndarray | Embeddings of known samples (not used) | required |
| reference_labels | list[str] | Labels of known samples (not used) | required |
| config | ConfidenceConfig | ConfidenceConfig with threshold parameter | required |
Source code in src/novelentitymatcher/novelty/strategies/confidence.py
detect(texts, embeddings, predicted_classes, confidences, **kwargs)
¶
Detect novel samples using confidence threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| texts | list[str] | Input texts | required |
| embeddings | ndarray | Text embeddings (not used) | required |
| predicted_classes | list[str] | Predicted classes (not used) | required |
| confidences | ndarray | Prediction confidence scores | required |
| **kwargs | | Additional parameters | {} |
Returns:
| Type | Description |
|---|---|
| tuple[set[int], dict[int, dict[str, Any]]] | (flags, metrics) - Flagged indices and per-sample metrics |
Source code in src/novelentitymatcher/novelty/strategies/confidence.py
get_weight()
¶
get_config()
¶
Get the current configuration for this strategy.
Override this if your strategy stores its config differently.
novelentitymatcher.novelty.strategies.self_knowledge
¶
Self-knowledge detection strategy wrapper.
Wraps SelfKnowledgeDetector to implement NoveltyStrategy protocol.
Classes¶
SelfKnowledgeStrategy()
¶
Bases: NoveltyStrategy
Self-knowledge strategy for novelty detection.
Uses a sparse autoencoder to learn representations of known samples and flags high reconstruction error as novel.
Source code in src/novelentitymatcher/novelty/strategies/self_knowledge.py
novelentitymatcher.novelty.strategies.conformal
¶
Conformal prediction-based calibration for OOD detection strategies.
Wraps raw strategy scores with statistically grounded p-values, enabling rigorous routing of out-of-distribution inputs.
Classes¶
ConformalCalibrator(alpha=0.1, method='split')
¶
Calibrate raw OOD scores into conformal p-values.
Supports two methods:
- "split": Holds out a fraction of reference data for calibration.
- "mondrian": Uses class-conditional (Mondrian) conformal calibration with per-class nonconformity distributions.

Usage::

    cal = ConformalCalibrator(alpha=0.1, method="split")
    cal.calibrate(raw_scores, labels)
    pvals = cal.predict_pvalues(test_scores)
Source code in src/novelentitymatcher/novelty/strategies/conformal.py
Attributes¶
calibration_metadata
property
¶
Return calibration metadata for reproducibility.
Functions¶
calibrate(scores, labels)
¶
Compute nonconformity scores from calibration data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scores | ndarray | Raw OOD scores for calibration samples, shape (n_samples,). Higher scores indicate more anomalous / novel. | required |
| labels | ndarray | Class labels for calibration samples, shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| ConformalCalibrator | Self for fluent chaining. |
Source code in src/novelentitymatcher/novelty/strategies/conformal.py
predict_pvalues(scores)
¶
Convert raw OOD scores to calibrated p-values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scores | ndarray | Raw scores for test samples, shape (n_samples,). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | p-values, shape (n_samples,). Lower p-value = more likely OOD. |
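For intuition, the standard split-conformal p-value is the fraction of calibration scores at least as large as the test score, with a +1 correction in numerator and denominator. The sketch below shows that formula in NumPy. It is a generic illustration under that assumption, not the body of predict_pvalues, and the function name and alpha value are chosen for the example.

```python
import numpy as np


def conformal_pvalues(calibration_scores, test_scores):
    """p = (1 + #{calibration scores >= test score}) / (n + 1)."""
    cal = np.sort(np.asarray(calibration_scores))
    n = len(cal)
    # searchsorted(side="left") counts calibration scores strictly below each test score.
    n_greater_equal = n - np.searchsorted(cal, test_scores, side="left")
    return (n_greater_equal + 1) / (n + 1)


cal_scores = np.arange(1, 20) / 20  # 19 calibration scores: 0.05, 0.10, ..., 0.95
pvals = conformal_pvalues(cal_scores, np.array([0.01, 0.99]))

alpha = 0.1  # illustrative significance level
flags = {i for i, p in enumerate(pvals) if p < alpha}
```

A test score below every calibration score gets p = 1.0, while one above all 19 gets p = 1/20 = 0.05 and is flagged at alpha = 0.1.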
Source code in src/novelentitymatcher/novelty/strategies/conformal.py
predict_pvalues_for_class(scores, predicted_classes)
¶
Compute class-conditional p-values when predicted classes are known.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scores | ndarray | Raw OOD scores for test samples. | required |
| predicted_classes | list[str] | Predicted class for each sample. | required |
Returns:
| Type | Description |
|---|---|
| ndarray | p-values, shape (n_samples,). |
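A class-conditional (Mondrian) variant keeps a separate calibration set per class and compares each test score only against the calibration scores of its predicted class. The sketch below is a generic illustration of that idea with hypothetical helper names. How the library handles a predicted class that is absent from the calibration data is not documented here, so this sketch does not cover that case.

```python
import numpy as np


def fit_mondrian(cal_scores, cal_labels):
    """Group sorted calibration scores by class label."""
    labels = np.asarray(cal_labels)
    return {str(c): np.sort(cal_scores[labels == c]) for c in np.unique(labels)}


def class_conditional_pvalues(per_class, scores, predicted_classes):
    """Conformal p-value of each score against its predicted class only."""
    pvals = np.empty(len(scores))
    for i, (score, cls) in enumerate(zip(scores, predicted_classes)):
        cal = per_class[cls]
        n_greater_equal = len(cal) - np.searchsorted(cal, score, side="left")
        pvals[i] = (n_greater_equal + 1) / (len(cal) + 1)
    return pvals


cal_scores = np.array([0.1, 0.2, 0.3, 1.0, 2.0, 3.0])
cal_labels = ["a", "a", "a", "b", "b", "b"]
per_class = fit_mondrian(cal_scores, cal_labels)

# The same raw score is unusual for class "a" but ordinary for class "b".
pvals = class_conditional_pvalues(per_class, np.array([0.5, 0.5]), ["a", "b"])
```

This shows why per-class calibration matters when classes have different score scales: a single pooled calibration set would assign both samples the same p-value.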