Novelty Schemas

novelentitymatcher.novelty.schemas.models

Canonical Pydantic models for novelty detection and discovery.

Classes

NovelSampleMetadata

Bases: BaseModel

Metadata for a single sample flagged as novel.

NovelSampleReport

Bases: BaseModel

Novel samples found during a detection run.

ClusterEvidence

Bases: BaseModel

Compact statistical evidence extracted for a cluster.

DiscoveryCluster

Bases: BaseModel

Community of likely novel samples discovered in a batch.

ClassProposal

Bases: BaseModel

A proposed class for a cluster of novel samples.

DiscoveredAttribute

Bases: BaseModel

A discovered attribute/field for a proposed class.

NovelClassAnalysis

Bases: BaseModel

Class proposals generated from a novelty discovery run.

ProposalReviewRecord

Bases: BaseModel

Lifecycle-aware review record for a proposed class.

NovelClassDiscoveryReport

Bases: BaseModel

End-to-end report for novelty detection and optional proposal generation.

PromotionResult(review_record, entities_added=list(), index_updated=False, retrain_required=False) dataclass

Captures what happened during a promotion.

Attributes
state property

Backward-compatible alias for review_record.state.

promoted_at property

Backward-compatible alias for review_record.promoted_at.
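
The two aliases above can be pictured as read-only properties that forward to the nested review record. The following is a minimal sketch, not the library's implementation: `ReviewRecordStub` and `PromotionResultSketch` are hypothetical stand-ins, and the field types are assumptions because this page does not list them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewRecordStub:
    """Hypothetical stand-in for ProposalReviewRecord; the real fields may differ."""
    state: str
    promoted_at: Optional[str] = None


@dataclass
class PromotionResultSketch:
    """Illustrative mirror of the PromotionResult signature shown above."""
    review_record: ReviewRecordStub
    entities_added: list = field(default_factory=list)
    index_updated: bool = False
    retrain_required: bool = False

    @property
    def state(self) -> str:
        # Alias so older callers can keep reading result.state directly.
        return self.review_record.state

    @property
    def promoted_at(self) -> Optional[str]:
        # Alias that forwards to review_record.promoted_at.
        return self.review_record.promoted_at


result = PromotionResultSketch(
    ReviewRecordStub(state="promoted", promoted_at="2024-01-01T00:00:00")
)
print(result.state, result.promoted_at)
```

Because the aliases are properties without setters, assigning to `result.state` in this sketch raises `AttributeError`; updates would go through `review_record`.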

novelentitymatcher.novelty.schemas.reports

Report dataclasses for novelty detection.

This module re-exports the main report classes defined in novelentitymatcher.novelty.schemas.results for convenience.

Classes

DetectionReport(novelty_report, strategies_used, runtime_seconds, timestamp, additional_info=dict()) dataclass

Report from a complete detection run.

Contains the NovelSampleReport plus additional metadata about the detection run (timing, strategy performance, etc.).

Attributes
novelty_report instance-attribute

The core novelty detection report.

strategies_used instance-attribute

List of strategies that were used.

runtime_seconds instance-attribute

Time taken for detection in seconds.

timestamp instance-attribute

ISO timestamp of when detection was run.

additional_info = field(default_factory=dict) class-attribute instance-attribute

Any additional information to include in the report.
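
The `additional_info = field(default_factory=dict)` default means each report gets its own empty dict instead of sharing one mutable default. A minimal sketch of that behaviour, assuming field types that this page does not state; `DetectionReportSketch` and the strategy name `"knn_distance"` are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class DetectionReportSketch:
    """Illustrative mirror of the DetectionReport signature; types are assumed."""
    novelty_report: Any
    strategies_used: list[str]
    runtime_seconds: float
    timestamp: str
    additional_info: dict[str, Any] = field(default_factory=dict)


first = DetectionReportSketch(
    novelty_report=None,               # placeholder for a NovelSampleReport
    strategies_used=["knn_distance"],  # hypothetical strategy id
    runtime_seconds=0.42,
    timestamp=datetime.now(timezone.utc).isoformat(),
)
second = DetectionReportSketch(None, [], 0.0, datetime.now(timezone.utc).isoformat())

# Mutating one report's extras does not leak into the other.
first.additional_info["note"] = "only on first"
print(second.additional_info)
```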

EvaluationReport(auroc, auprc, detection_rate_at_1, detection_rate_at_5, detection_rate_at_10, precision, recall, f1, optimal_threshold, confusion_matrix=None, per_class_metrics=None, num_samples=0, num_novel=0, timestamp='') dataclass

Report from evaluating novelty detection.

Contains metrics from evaluating on a labeled dataset.

Attributes
auroc instance-attribute

Area under ROC curve.

auprc instance-attribute

Area under Precision-Recall curve.

detection_rate_at_1 instance-attribute

Detection rate at 1% false positive rate.

detection_rate_at_5 instance-attribute

Detection rate at 5% false positive rate.

detection_rate_at_10 instance-attribute

Detection rate at 10% false positive rate.

precision instance-attribute

Precision at optimal threshold.

recall instance-attribute

Recall at optimal threshold.

f1 instance-attribute

F1 score at optimal threshold.

optimal_threshold instance-attribute

Threshold that maximizes F1 score.

confusion_matrix = None class-attribute instance-attribute

Confusion matrix at optimal threshold.

per_class_metrics = None class-attribute instance-attribute

Per-class metrics if available.

num_samples = 0 class-attribute instance-attribute

Total number of samples evaluated.

num_novel = 0 class-attribute instance-attribute

Number of samples that are actually novel according to the ground-truth labels.

timestamp = '' class-attribute instance-attribute

ISO timestamp of when evaluation was run.
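
To make the threshold-based metrics concrete, here is a small pure-Python sketch of how an F1-maximizing threshold and a detection rate at a fixed false positive rate could be computed from scores and labels. It is an illustration under stated assumptions, not the library's evaluation code: the function names are hypothetical, a sample is treated as novel when its score is greater than or equal to the threshold, and only observed scores are tried as thresholds.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when samples with score >= threshold are predicted novel."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores, labels):
    """Try every observed score as a threshold; return (threshold, f1)."""
    candidates = sorted(set(scores))
    return max(
        ((t, f1_at_threshold(scores, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
    )


def detection_rate_at_fpr(scores, labels, max_fpr):
    """Highest recall among thresholds whose false positive rate <= max_fpr."""
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    if not negatives or not positives:
        return 0.0
    best = 0.0
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        if fp / negatives <= max_fpr:
            best = max(best, tp / positives)
    return best


# Toy data: higher score means "more novel"; True marks a truly novel sample.
scores = [0.1, 0.2, 0.35, 0.4, 0.8, 0.9]
labels = [False, False, True, False, True, True]
threshold, f1 = best_f1_threshold(scores, labels)
print(threshold, round(f1, 3))
```

On this toy data the best threshold is 0.35, which catches all three novel samples at the cost of one false positive.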

NovelSampleReport(novel_indices, novel_scores, num_novel, num_total, novel_ratio, sample_metadata, strategy_flags, config_used) dataclass

Comprehensive report from novelty detection.

Contains all results from running novelty detection on a batch of samples.

Attributes
novel_indices instance-attribute

Indices of samples flagged as novel.

novel_scores instance-attribute

Novelty scores for all flagged samples.

num_novel instance-attribute

Number of samples flagged as novel.

num_total instance-attribute

Total number of samples processed.

novel_ratio instance-attribute

Fraction of processed samples flagged as novel (num_novel / num_total).

sample_metadata instance-attribute

Per-sample metadata including text, class, confidence, metrics.

strategy_flags instance-attribute

Strategy-level statistics (num_flagged, flagged_indices).

config_used instance-attribute

Configuration used for detection.

Functions
get_novel_samples()

Get metadata for only the novel samples.

Returns:

Type                  Description
list[dict[str, Any]]  List of metadata dicts for novel samples

Source code in src/novelentitymatcher/novelty/schemas/results.py
def get_novel_samples(self) -> list[dict[str, Any]]:
    """
    Get metadata for only the novel samples.

    Returns:
        List of metadata dicts for novel samples
    """
    return [m for m in self.sample_metadata if m["is_novel"]]
get_strategy_novel_count(strategy_id)

Get number of samples flagged by a specific strategy.

Parameters:

Name         Type  Description          Default
strategy_id  str   Strategy identifier  required

Returns:

Type  Description
int   Number of samples flagged by the strategy

Source code in src/novelentitymatcher/novelty/schemas/results.py
def get_strategy_novel_count(self, strategy_id: str) -> int:
    """
    Get number of samples flagged by a specific strategy.

    Args:
        strategy_id: Strategy identifier

    Returns:
        Number of samples flagged by the strategy
    """
    return self.strategy_flags.get(strategy_id, {}).get("num_flagged", 0)
get_sample_at_index(index)

Get metadata for a specific sample by index.

Parameters:

Name   Type  Description   Default
index  int   Sample index  required

Returns:

Type                   Description
dict[str, Any] | None  Metadata dict if index is valid, None otherwise

Source code in src/novelentitymatcher/novelty/schemas/results.py
def get_sample_at_index(self, index: int) -> dict[str, Any] | None:
    """
    Get metadata for a specific sample by index.

    Args:
        index: Sample index

    Returns:
        Metadata dict if index is valid, None otherwise
    """
    if 0 <= index < len(self.sample_metadata):
        return self.sample_metadata[index]
    return None
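
The three helpers above only read `sample_metadata` and `strategy_flags`, so their behaviour can be tried on a small stand-in. The sketch below copies the method bodies shown above into a hypothetical `NovelSampleReportStub`; the sample texts and the strategy id `"knn_distance"` are made up for illustration, and `Optional[...]` is used in place of `| None` only so the snippet also runs on Python 3.9.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class NovelSampleReportStub:
    """Stand-in holding only the two fields the helper methods read."""
    sample_metadata: list[dict[str, Any]]
    strategy_flags: dict[str, dict[str, Any]]

    def get_novel_samples(self) -> list[dict[str, Any]]:
        # Raises KeyError if a metadata dict has no "is_novel" key.
        return [m for m in self.sample_metadata if m["is_novel"]]

    def get_strategy_novel_count(self, strategy_id: str) -> int:
        # Unknown strategies fall back to 0 instead of raising.
        return self.strategy_flags.get(strategy_id, {}).get("num_flagged", 0)

    def get_sample_at_index(self, index: int) -> Optional[dict[str, Any]]:
        # Positional lookup; negative indices are rejected rather than wrapped.
        if 0 <= index < len(self.sample_metadata):
            return self.sample_metadata[index]
        return None


report = NovelSampleReportStub(
    sample_metadata=[
        {"index": 0, "text": "acme corp", "is_novel": False},
        {"index": 1, "text": "zorblax ltd", "is_novel": True},
    ],
    strategy_flags={"knn_distance": {"num_flagged": 1, "flagged_indices": [1]}},
)

print([m["text"] for m in report.get_novel_samples()])
print(report.get_strategy_novel_count("knn_distance"))
print(report.get_strategy_novel_count("not_configured"))
print(report.get_sample_at_index(-1))
```

Note that `get_sample_at_index` looks up by position in `sample_metadata`, so `-1` returns `None` rather than the last sample, unlike plain list indexing.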

novelentitymatcher.novelty.schemas.results

Result dataclasses for novelty detection.

Contains data structures for detection results, metrics, and reports.

Classes

StrategyMetrics(strategy_id, flags, metrics) dataclass

Metrics from a single strategy.

Contains the flags and per-sample metrics produced by a strategy.

Attributes
strategy_id instance-attribute

Identifier for the strategy.

flags instance-attribute

Indices flagged as novel by this strategy.

metrics instance-attribute

Per-sample metrics from this strategy.

SampleMetrics(index, text, predicted_class, confidence, is_novel, novelty_score, strategy_flags, raw_metrics) dataclass

Aggregated metrics for a single sample.

Contains metrics from all strategies for a specific sample.

Attributes
index instance-attribute

Sample index in the input batch.

text instance-attribute

The input text.

predicted_class instance-attribute

Predicted class for this sample.

confidence instance-attribute

Prediction confidence score.

is_novel instance-attribute

Whether this sample was flagged as novel.

novelty_score instance-attribute

Final combined novelty score.

strategy_flags instance-attribute

Which strategies flagged this sample.

raw_metrics instance-attribute

Raw metrics from each strategy.
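
One way to picture how per-strategy results relate to the per-sample view is to fold a list of strategy results into a summary keyed by strategy id. This is a sketch under assumptions, not the library's aggregation logic: `StrategyMetricsSketch`, `summarize_flags`, and `strategies_flagging` are hypothetical names, `flags` is assumed to be a set of sample indices, and the strategy ids are invented.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StrategyMetricsSketch:
    """Illustrative mirror of StrategyMetrics; field types are assumed."""
    strategy_id: str
    flags: set[int]                     # assumed: indices flagged as novel
    metrics: dict[int, dict[str, Any]]  # assumed: per-sample metrics by index


def summarize_flags(results):
    """Build {strategy_id: {"num_flagged", "flagged_indices"}} per strategy."""
    return {
        r.strategy_id: {
            "num_flagged": len(r.flags),
            "flagged_indices": sorted(r.flags),
        }
        for r in results
    }


def strategies_flagging(index, results):
    """List the ids of the strategies that flagged a given sample index."""
    return [r.strategy_id for r in results if index in r.flags]


results = [
    StrategyMetricsSketch("confidence", {1, 3}, {}),  # hypothetical ids
    StrategyMetricsSketch("knn_distance", {3}, {}),
]
summary = summarize_flags(results)
print(summary["confidence"])
print(strategies_flagging(3, results))
```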

NovelSampleReport(novel_indices, novel_scores, num_novel, num_total, novel_ratio, sample_metadata, strategy_flags, config_used) dataclass

Comprehensive report from novelty detection.

Contains all results from running novelty detection on a batch of samples.

Attributes
novel_indices instance-attribute

Indices of samples flagged as novel.

novel_scores instance-attribute

Novelty scores for all flagged samples.

num_novel instance-attribute

Number of samples flagged as novel.

num_total instance-attribute

Total number of samples processed.

novel_ratio instance-attribute

Fraction of processed samples flagged as novel (num_novel / num_total).

sample_metadata instance-attribute

Per-sample metadata including text, class, confidence, metrics.

strategy_flags instance-attribute

Strategy-level statistics (num_flagged, flagged_indices).

config_used instance-attribute

Configuration used for detection.
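
The count fields are related by simple arithmetic, which a short helper can make explicit. This is a sketch only: `summarize_counts` is a hypothetical name, and returning 0.0 for an empty batch is an assumption, since this page does not say how the library handles `num_total == 0`.

```python
def summarize_counts(novel_indices, num_total):
    """Derive num_novel and novel_ratio from the flagged indices."""
    num_novel = len(novel_indices)
    # Assumed guard: avoid ZeroDivisionError on an empty batch.
    novel_ratio = num_novel / num_total if num_total else 0.0
    return {
        "num_novel": num_novel,
        "num_total": num_total,
        "novel_ratio": novel_ratio,
    }


print(summarize_counts([1, 4], 8))
```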

Functions
get_novel_samples()

Get metadata for only the novel samples.

Returns:

Type                  Description
list[dict[str, Any]]  List of metadata dicts for novel samples

Source code in src/novelentitymatcher/novelty/schemas/results.py
def get_novel_samples(self) -> list[dict[str, Any]]:
    """
    Get metadata for only the novel samples.

    Returns:
        List of metadata dicts for novel samples
    """
    return [m for m in self.sample_metadata if m["is_novel"]]
get_strategy_novel_count(strategy_id)

Get number of samples flagged by a specific strategy.

Parameters:

Name         Type  Description          Default
strategy_id  str   Strategy identifier  required

Returns:

Type  Description
int   Number of samples flagged by the strategy

Source code in src/novelentitymatcher/novelty/schemas/results.py
def get_strategy_novel_count(self, strategy_id: str) -> int:
    """
    Get number of samples flagged by a specific strategy.

    Args:
        strategy_id: Strategy identifier

    Returns:
        Number of samples flagged by the strategy
    """
    return self.strategy_flags.get(strategy_id, {}).get("num_flagged", 0)
get_sample_at_index(index)

Get metadata for a specific sample by index.

Parameters:

Name   Type  Description   Default
index  int   Sample index  required

Returns:

Type                   Description
dict[str, Any] | None  Metadata dict if index is valid, None otherwise

Source code in src/novelentitymatcher/novelty/schemas/results.py
def get_sample_at_index(self, index: int) -> dict[str, Any] | None:
    """
    Get metadata for a specific sample by index.

    Args:
        index: Sample index

    Returns:
        Metadata dict if index is valid, None otherwise
    """
    if 0 <= index < len(self.sample_metadata):
        return self.sample_metadata[index]
    return None

DetectionReport(novelty_report, strategies_used, runtime_seconds, timestamp, additional_info=dict()) dataclass

Report from a complete detection run.

Contains the NovelSampleReport plus additional metadata about the detection run (timing, strategy performance, etc.).

Attributes
novelty_report instance-attribute

The core novelty detection report.

strategies_used instance-attribute

List of strategies that were used.

runtime_seconds instance-attribute

Time taken for detection in seconds.

timestamp instance-attribute

ISO timestamp of when detection was run.

additional_info = field(default_factory=dict) class-attribute instance-attribute

Any additional information to include in the report.

EvaluationReport(auroc, auprc, detection_rate_at_1, detection_rate_at_5, detection_rate_at_10, precision, recall, f1, optimal_threshold, confusion_matrix=None, per_class_metrics=None, num_samples=0, num_novel=0, timestamp='') dataclass

Report from evaluating novelty detection.

Contains metrics from evaluating on a labeled dataset.

Attributes
auroc instance-attribute

Area under ROC curve.

auprc instance-attribute

Area under Precision-Recall curve.

detection_rate_at_1 instance-attribute

Detection rate at 1% false positive rate.

detection_rate_at_5 instance-attribute

Detection rate at 5% false positive rate.

detection_rate_at_10 instance-attribute

Detection rate at 10% false positive rate.

precision instance-attribute

Precision at optimal threshold.

recall instance-attribute

Recall at optimal threshold.

f1 instance-attribute

F1 score at optimal threshold.

optimal_threshold instance-attribute

Threshold that maximizes F1 score.

confusion_matrix = None class-attribute instance-attribute

Confusion matrix at optimal threshold.

per_class_metrics = None class-attribute instance-attribute

Per-class metrics if available.

num_samples = 0 class-attribute instance-attribute

Total number of samples evaluated.

num_novel = 0 class-attribute instance-attribute

Number of samples that are actually novel according to the ground-truth labels.

timestamp = '' class-attribute instance-attribute

ISO timestamp of when evaluation was run.