Novelty Schemas¶
novelentitymatcher.novelty.schemas.models
¶
Canonical Pydantic models for novelty detection and discovery.
Classes¶
NovelSampleMetadata
¶
Bases: BaseModel
Metadata for a single sample flagged as novel.
NovelSampleReport
¶
Bases: BaseModel
Novel samples found during a detection run.
ClusterEvidence
¶
Bases: BaseModel
Compact statistical evidence extracted for a cluster.
DiscoveryCluster
¶
Bases: BaseModel
Community of likely novel samples discovered in a batch.
ClassProposal
¶
Bases: BaseModel
A proposed class for a cluster of novel samples.
DiscoveredAttribute
¶
Bases: BaseModel
A discovered attribute/field for a proposed class.
NovelClassAnalysis
¶
Bases: BaseModel
Class proposals generated from a novelty discovery run.
ProposalReviewRecord
¶
Bases: BaseModel
Lifecycle-aware review record for a proposed class.
NovelClassDiscoveryReport
¶
Bases: BaseModel
End-to-end report for novelty detection and optional proposal generation.
PromotionResult(review_record, entities_added=list(), index_updated=False, retrain_required=False)
dataclass
¶
novelentitymatcher.novelty.schemas.reports
¶
Report dataclasses for novelty detection.
This module re-exports the main report classes for convenience.
Classes¶
DetectionReport(novelty_report, strategies_used, runtime_seconds, timestamp, additional_info=dict())
dataclass
¶
Report from a complete detection run.
Contains the NovelSampleReport plus additional metadata about the detection run (timing, strategy performance, etc.).
Attributes¶
novelty_report
instance-attribute
¶
The core novelty detection report.
strategies_used
instance-attribute
¶
List of strategies that were used.
runtime_seconds
instance-attribute
¶
Time taken for detection in seconds.
timestamp
instance-attribute
¶
ISO timestamp of when detection was run.
additional_info = field(default_factory=dict)
class-attribute
instance-attribute
¶
Any additional information to include in the report.
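As a rough illustration of how these fields fit together, the sketch below builds a stand-in dataclass that mirrors the documented signature and fills in the timing fields around a detection call. The field types, the `timed_detection` helper, the `run` callable, and the strategy names are all assumptions made for this example; the real `DetectionReport` may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from time import perf_counter
from typing import Any


@dataclass
class DetectionReportSketch:
    """Stand-in mirroring the documented DetectionReport signature (types assumed)."""

    novelty_report: Any
    strategies_used: list[str]
    runtime_seconds: float
    timestamp: str
    # default_factory gives every instance its own dict, as in the documented default.
    additional_info: dict[str, Any] = field(default_factory=dict)


def timed_detection(run, strategies):
    """Call a detection function and wrap its result with timing metadata."""
    start = perf_counter()
    novelty_report = run()  # hypothetical callable returning a novelty report
    elapsed = perf_counter() - start
    return DetectionReportSketch(
        novelty_report=novelty_report,
        strategies_used=list(strategies),
        runtime_seconds=elapsed,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


# Hypothetical strategy names and a dummy detection callable.
report = timed_detection(lambda: {"num_novel": 0}, ["knn", "confidence"])
```

Because `additional_info` uses `default_factory`, two reports never share the same dictionary, so extra keys can be attached to one report without affecting another.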
EvaluationReport(auroc, auprc, detection_rate_at_1, detection_rate_at_5, detection_rate_at_10, precision, recall, f1, optimal_threshold, confusion_matrix=None, per_class_metrics=None, num_samples=0, num_novel=0, timestamp='')
dataclass
¶
Report from evaluating novelty detection.
Contains metrics from evaluating on a labeled dataset.
Attributes¶
auroc
instance-attribute
¶
Area under ROC curve.
auprc
instance-attribute
¶
Area under Precision-Recall curve.
detection_rate_at_1
instance-attribute
¶
Detection rate at 1% false positive rate.
detection_rate_at_5
instance-attribute
¶
Detection rate at 5% false positive rate.
detection_rate_at_10
instance-attribute
¶
Detection rate at 10% false positive rate.
precision
instance-attribute
¶
Precision at optimal threshold.
recall
instance-attribute
¶
Recall at optimal threshold.
f1
instance-attribute
¶
F1 score at optimal threshold.
optimal_threshold
instance-attribute
¶
Threshold that maximizes F1 score.
confusion_matrix = None
class-attribute
instance-attribute
¶
Confusion matrix at optimal threshold.
per_class_metrics = None
class-attribute
instance-attribute
¶
Per-class metrics if available.
num_samples = 0
class-attribute
instance-attribute
¶
Total number of samples evaluated.
num_novel = 0
class-attribute
instance-attribute
¶
Number of samples that are actually novel according to the ground-truth labels.
timestamp = ''
class-attribute
instance-attribute
¶
ISO timestamp of when evaluation was run.
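The `precision`, `recall`, `f1`, and `optimal_threshold` fields are all tied to the threshold that maximizes F1. One way such a threshold can be found is sketched below, assuming that higher scores mean "more novel" and that labels use 1 for novel and 0 for known. The function names are hypothetical and this is not necessarily how the library computes the value.

```python
def f1_at_threshold(scores, labels, threshold):
    """Precision, recall, and F1 when scores >= threshold are predicted novel."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def best_f1_threshold(scores, labels):
    """Try each observed score as a candidate threshold and keep the best F1."""
    best = (0.0, 0.0, 0.0, None)  # (f1, precision, recall, threshold)
    for candidate in sorted(set(scores)):
        precision, recall, f1 = f1_at_threshold(scores, labels, candidate)
        if f1 > best[0]:
            best = (f1, precision, recall, candidate)
    return best


scores = [0.1, 0.4, 0.35, 0.8, 0.9]
labels = [0, 0, 1, 1, 1]  # assumed encoding: 1 = novel, 0 = known
f1, precision, recall, threshold = best_f1_threshold(scores, labels)
```

On this toy data the scan settles on a threshold of 0.35, which catches all three novel samples at the cost of one false positive.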
NovelSampleReport(novel_indices, novel_scores, num_novel, num_total, novel_ratio, sample_metadata, strategy_flags, config_used)
dataclass
¶
Comprehensive report from novelty detection.
Contains all results from running novelty detection on a batch of samples.
Attributes¶
novel_indices
instance-attribute
¶
Indices of samples flagged as novel.
novel_scores
instance-attribute
¶
Novelty scores for all flagged samples.
num_novel
instance-attribute
¶
Number of samples flagged as novel.
num_total
instance-attribute
¶
Total number of samples processed.
novel_ratio
instance-attribute
¶
Ratio of novel samples (num_novel / num_total).
sample_metadata
instance-attribute
¶
Per-sample metadata including text, class, confidence, metrics.
strategy_flags
instance-attribute
¶
Strategy-level statistics (num_flagged, flagged_indices).
config_used
instance-attribute
¶
Configuration used for detection.
Functions¶
get_novel_samples()
¶
Get metadata for only the novel samples.
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of metadata dicts for novel samples |
Source code in src/novelentitymatcher/novelty/schemas/results.py
get_strategy_novel_count(strategy_id)
¶
Get number of samples flagged by a specific strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy_id | str | Strategy identifier | required |

Returns:
| Type | Description |
|---|---|
| int | Number of samples flagged by the strategy |
Source code in src/novelentitymatcher/novelty/schemas/results.py
get_sample_at_index(index)
¶
Get metadata for a specific sample by index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index | int | Sample index | required |

Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | Metadata dict if index is valid, None otherwise |
Source code in src/novelentitymatcher/novelty/schemas/results.py
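To make the three helper methods concrete, here is a minimal stand-in that follows the documented signatures and return types. The internal layout is assumed for illustration only: `sample_metadata` is treated as one dict per input sample, `strategy_flags` as a dict keyed by strategy id with a `num_flagged` entry, and the `"knn"` strategy id is hypothetical. The actual implementation in `results.py` may behave differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class NovelSampleReportSketch:
    """Stand-in for the documented report; field types and layouts are assumed."""

    novel_indices: list[int]
    novel_scores: list[float]
    num_novel: int
    num_total: int
    novel_ratio: float
    sample_metadata: list[dict[str, Any]]  # assumed: one dict per input sample
    strategy_flags: dict[str, dict[str, Any]]  # assumed: keyed by strategy id
    config_used: dict[str, Any]

    def get_novel_samples(self) -> list[dict[str, Any]]:
        # Assumed behavior: look up the metadata at each flagged index.
        return [self.sample_metadata[i] for i in self.novel_indices]

    def get_strategy_novel_count(self, strategy_id: str) -> int:
        # Assumed behavior: an unknown strategy id counts as zero flags.
        return self.strategy_flags.get(strategy_id, {}).get("num_flagged", 0)

    def get_sample_at_index(self, index: int) -> dict[str, Any] | None:
        # Documented contract: return None when the index is not valid.
        if 0 <= index < len(self.sample_metadata):
            return self.sample_metadata[index]
        return None


report = NovelSampleReportSketch(
    novel_indices=[1],
    novel_scores=[0.92],
    num_novel=1,
    num_total=2,
    novel_ratio=1 / 2,  # num_novel / num_total
    sample_metadata=[{"text": "known item"}, {"text": "unseen item"}],
    strategy_flags={"knn": {"num_flagged": 1, "flagged_indices": [1]}},
    config_used={},
)
novel = report.get_novel_samples()
```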
novelentitymatcher.novelty.schemas.results
¶
Result dataclasses for novelty detection.
Contains data structures for detection results, metrics, and reports.
Classes¶
StrategyMetrics(strategy_id, flags, metrics)
dataclass
¶
Metrics from a single strategy.
Contains the flags and per-sample metrics produced by a strategy.
SampleMetrics(index, text, predicted_class, confidence, is_novel, novelty_score, strategy_flags, raw_metrics)
dataclass
¶
Aggregated metrics for a single sample.
Contains metrics from all strategies for a specific sample.
Attributes¶
index
instance-attribute
¶
Sample index in the input batch.
text
instance-attribute
¶
The input text.
predicted_class
instance-attribute
¶
Predicted class for this sample.
confidence
instance-attribute
¶
Prediction confidence score.
is_novel
instance-attribute
¶
Whether this sample was flagged as novel.
novelty_score
instance-attribute
¶
Final combined novelty score.
strategy_flags
instance-attribute
¶
Which strategies flagged this sample.
raw_metrics
instance-attribute
¶
Raw metrics from each strategy.
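The relationship between `StrategyMetrics` and `SampleMetrics` can be pictured as a per-sample aggregation over all strategies. The sketch below is a guess at that shape, not the library's code: the field types, the `aggregate_sample` helper, the strategy ids, and in particular the scoring rule (fraction of strategies that flagged the sample) are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class StrategyMetricsSketch:
    strategy_id: str
    flags: set[int]  # assumed: indices this strategy flagged
    metrics: dict[int, dict[str, float]]  # assumed: per-sample metrics by index


@dataclass
class SampleMetricsSketch:
    index: int
    text: str
    predicted_class: str
    confidence: float
    is_novel: bool
    novelty_score: float
    strategy_flags: list[str]
    raw_metrics: dict[str, dict[str, float]]


def aggregate_sample(index, text, predicted_class, confidence, strategies):
    """Collect every strategy's view of one sample into a single record."""
    flagged_by = [s.strategy_id for s in strategies if index in s.flags]
    raw = {s.strategy_id: s.metrics.get(index, {}) for s in strategies}
    # Assumed combination rule: fraction of strategies that flagged the sample.
    score = len(flagged_by) / len(strategies) if strategies else 0.0
    return SampleMetricsSketch(
        index=index,
        text=text,
        predicted_class=predicted_class,
        confidence=confidence,
        is_novel=bool(flagged_by),
        novelty_score=score,
        strategy_flags=flagged_by,
        raw_metrics=raw,
    )


# Hypothetical strategies and metric names.
strategies = [
    StrategyMetricsSketch("knn", {0, 2}, {0: {"distance": 0.7}}),
    StrategyMetricsSketch("confidence", {2}, {0: {"margin": 0.1}}),
]
sample = aggregate_sample(0, "unseen item", "widget", 0.41, strategies)
```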
NovelSampleReport(novel_indices, novel_scores, num_novel, num_total, novel_ratio, sample_metadata, strategy_flags, config_used)
dataclass
¶
Comprehensive report from novelty detection.
Contains all results from running novelty detection on a batch of samples.
Attributes¶
novel_indices
instance-attribute
¶
Indices of samples flagged as novel.
novel_scores
instance-attribute
¶
Novelty scores for all flagged samples.
num_novel
instance-attribute
¶
Number of samples flagged as novel.
num_total
instance-attribute
¶
Total number of samples processed.
novel_ratio
instance-attribute
¶
Ratio of novel samples (num_novel / num_total).
sample_metadata
instance-attribute
¶
Per-sample metadata including text, class, confidence, metrics.
strategy_flags
instance-attribute
¶
Strategy-level statistics (num_flagged, flagged_indices).
config_used
instance-attribute
¶
Configuration used for detection.
Functions¶
get_novel_samples()
¶
Get metadata for only the novel samples.
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List of metadata dicts for novel samples |
Source code in src/novelentitymatcher/novelty/schemas/results.py
get_strategy_novel_count(strategy_id)
¶
Get number of samples flagged by a specific strategy.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy_id | str | Strategy identifier | required |

Returns:
| Type | Description |
|---|---|
| int | Number of samples flagged by the strategy |
Source code in src/novelentitymatcher/novelty/schemas/results.py
get_sample_at_index(index)
¶
Get metadata for a specific sample by index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index | int | Sample index | required |

Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | Metadata dict if index is valid, None otherwise |
Source code in src/novelentitymatcher/novelty/schemas/results.py
DetectionReport(novelty_report, strategies_used, runtime_seconds, timestamp, additional_info=dict())
dataclass
¶
Report from a complete detection run.
Contains the NovelSampleReport plus additional metadata about the detection run (timing, strategy performance, etc.).
Attributes¶
novelty_report
instance-attribute
¶
The core novelty detection report.
strategies_used
instance-attribute
¶
List of strategies that were used.
runtime_seconds
instance-attribute
¶
Time taken for detection in seconds.
timestamp
instance-attribute
¶
ISO timestamp of when detection was run.
additional_info = field(default_factory=dict)
class-attribute
instance-attribute
¶
Any additional information to include in the report.
EvaluationReport(auroc, auprc, detection_rate_at_1, detection_rate_at_5, detection_rate_at_10, precision, recall, f1, optimal_threshold, confusion_matrix=None, per_class_metrics=None, num_samples=0, num_novel=0, timestamp='')
dataclass
¶
Report from evaluating novelty detection.
Contains metrics from evaluating on a labeled dataset.
Attributes¶
auroc
instance-attribute
¶
Area under ROC curve.
auprc
instance-attribute
¶
Area under Precision-Recall curve.
detection_rate_at_1
instance-attribute
¶
Detection rate at 1% false positive rate.
detection_rate_at_5
instance-attribute
¶
Detection rate at 5% false positive rate.
detection_rate_at_10
instance-attribute
¶
Detection rate at 10% false positive rate.
precision
instance-attribute
¶
Precision at optimal threshold.
recall
instance-attribute
¶
Recall at optimal threshold.
f1
instance-attribute
¶
F1 score at optimal threshold.
optimal_threshold
instance-attribute
¶
Threshold that maximizes F1 score.
confusion_matrix = None
class-attribute
instance-attribute
¶
Confusion matrix at optimal threshold.
per_class_metrics = None
class-attribute
instance-attribute
¶
Per-class metrics if available.
num_samples = 0
class-attribute
instance-attribute
¶
Total number of samples evaluated.
num_novel = 0
class-attribute
instance-attribute
¶
Number of samples that are actually novel according to the ground-truth labels.
timestamp = ''
class-attribute
instance-attribute
¶
ISO timestamp of when evaluation was run.
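The `detection_rate_at_1`, `detection_rate_at_5`, and `detection_rate_at_10` fields report the detection rate (true positive rate) at a fixed false positive rate. A minimal sketch of that metric follows, assuming higher scores mean "more novel" and labels use 1 for novel and 0 for known. The function name and the toy data are made up for illustration, and the library may use a different interpolation or tie-breaking rule.

```python
def detection_rate_at_fpr(scores, labels, max_fpr):
    """Highest true positive rate achievable while keeping FPR <= max_fpr."""
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    if negatives == 0 or positives == 0:
        return 0.0
    best_tpr = 0.0
    # Each observed score is a candidate threshold: score >= threshold => novel.
    for threshold in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


known_scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
novel_scores = [0.45, 0.55, 0.70, 0.90]
scores = known_scores + novel_scores
labels = [0] * len(known_scores) + [1] * len(novel_scores)

rate_at_10 = detection_rate_at_fpr(scores, labels, 0.10)
rate_at_20 = detection_rate_at_fpr(scores, labels, 0.20)
```

With only ten known samples, a 10% budget allows a single false positive, so the lowest-scoring novel sample (0.45) stays undetected until the budget is relaxed to 20%.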