Novelty Storage
novelentitymatcher.novelty.storage.index
Approximate Nearest Neighbor (ANN) index wrapper for efficient similarity search.
Supports HNSWlib and FAISS backends for O(log n) similarity search.
Classes
ANNBackend
Supported ANN backends.
ANNIndex(dim, backend=ANNBackend.HNSWLIB, max_elements=100000, ef_construction=200, M=16)
Wrapper for Approximate Nearest Neighbor indexing.
Provides efficient O(log n) similarity search using HNSWlib or FAISS.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dim` | `int` | Dimensionality of embeddings | required |
| `backend` | `str` | ANN backend to use ('hnswlib' or 'faiss') | `HNSWLIB` |
| `max_elements` | `int` | Maximum number of elements to index | `100000` |
| `ef_construction` | `int` | HNSW ef_construction parameter (higher = better quality) | `200` |
| `M` | `int` | HNSW M parameter (higher = better quality, more memory) | `16` |
Source code in src/novelentitymatcher/novelty/storage/index.py
Attributes
n_elements property
Get number of elements in the index.
labels property
Return the labels stored alongside indexed vectors.
Functions
add_vectors(vectors, labels=None)
Add vectors to the index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vectors` | `ndarray` | Array of shape (n_vectors, dim) | required |
| `labels` | `list[str] \| None` | Optional labels for the vectors | `None` |
Source code in src/novelentitymatcher/novelty/storage/index.py
knn_query(query, k=5)
Find k-nearest neighbors for query vector(s).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `ndarray` | Query vector or vectors of shape (n_queries, dim) | required |
| `k` | `int` | Number of neighbors to return | `5` |
Returns:
| Type | Description |
|---|---|
| `tuple[ndarray, ndarray]` | Tuple of (distances, indices) |
Source code in src/novelentitymatcher/novelty/storage/index.py
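Because an ANN index returns approximate neighbors, an exact brute-force scan is a handy reference for checking the `(distances, indices)` contract described above. The following is a minimal pure-Python sketch, not the library's implementation: the cosine metric, the `brute_force_knn` helper, and the idea that returned indices are positions into the label list are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_knn(queries, indexed, k=5):
    """Exact k-nearest-neighbor search by scanning every indexed vector.

    Returns (distances, indices): one row per query, each row holding up to
    k entries ordered from nearest to farthest.
    """
    k = min(k, len(indexed))
    all_distances, all_indices = [], []
    for q in queries:
        scored = sorted((cosine_distance(q, v), i) for i, v in enumerate(indexed))
        all_distances.append([d for d, _ in scored[:k]])
        all_indices.append([i for _, i in scored[:k]])
    return all_distances, all_indices


# Hypothetical data: three 2-d vectors with one label each.
indexed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = ["east", "north", "north_east"]

distances, indices = brute_force_knn([[1.0, 0.1]], indexed, k=2)

# Assumption: indices are positions into the label list.
neighbor_labels = [[labels[i] for i in row] for row in indices]
print(indices, neighbor_labels)
```

Comparing these exact neighbors with the approximate ones from an ANN backend on a sample of queries gives a simple recall estimate.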
get_distance_matrix(queries, targets=None)
Get the distance matrix between queries and the indexed vectors, or `targets` if provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `queries` | `ndarray` | Query vectors of shape (n_queries, dim) | required |
| `targets` | `ndarray \| None` | Optional target vectors (if None, use all indexed vectors) | `None` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Distance matrix of shape (n_queries, n_targets) |
Source code in src/novelentitymatcher/novelty/storage/index.py
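To make the `(n_queries, n_targets)` output shape concrete, here is a small stand-in that builds a full pairwise distance matrix in pure Python. It is only a sketch: the Euclidean metric and the `distance_matrix` helper are assumptions, since the metric used by the index is not stated here.

```python
import math


def distance_matrix(queries, targets):
    """Return a nested list with one row per query and one column per target."""
    return [[math.dist(q, t) for t in targets] for q in queries]


# Hypothetical data: 2 queries against 3 targets gives a 2 x 3 matrix.
queries = [[0.0, 0.0], [3.0, 4.0]]
targets = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

matrix = distance_matrix(queries, targets)
shape = (len(matrix), len(matrix[0]))
print(shape)
```

Entry `matrix[i][j]` holds the distance from query `i` to target `j`, which is the layout the Returns table describes.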
save(path)
Save index to disk.
Source code in src/novelentitymatcher/novelty/storage/index.py
load(path)
Load index from disk.
Source code in src/novelentitymatcher/novelty/storage/index.py
clear()
Clear all elements from the index.
Source code in src/novelentitymatcher/novelty/storage/index.py
Functions
novelentitymatcher.novelty.storage.persistence
File-based storage for novel class discovery results.
Provides utilities for saving and loading proposals in YAML or JSON format.
Classes
Functions
save_proposals(report, output_dir='./proposals', format='yaml')
Save novel class discovery report to file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `report` | `NovelClassDiscoveryReport` | Discovery report to save | required |
| `output_dir` | `str \| Path` | Directory to save proposals in | `'./proposals'` |
| `format` | `str` | Output format ('yaml' or 'json') | `'yaml'` |
Returns:
| Type | Description |
|---|---|
| `str` | Path to saved file |
Source code in src/novelentitymatcher/novelty/storage/persistence.py
load_proposals(path)
Load novel class discovery report from file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str \| Path` | Path to proposal file | required |
Returns:
| Type | Description |
|---|---|
| `NovelClassDiscoveryReport` | The loaded discovery report |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If file doesn't exist |
| `ValueError` | If file format is invalid |
Source code in src/novelentitymatcher/novelty/storage/persistence.py
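The save/load pair above amounts to a serialization round trip with two documented failure modes. The sketch below mimics that behavior with the standard library only, using JSON (one of the two documented formats). `ToyReport`, `save_report`, and `load_report` are hypothetical stand-ins, and the report fields are invented for illustration; the real `NovelClassDiscoveryReport` schema is not shown here.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ToyReport:
    """Hypothetical stand-in for a discovery report."""
    discovery_id: str
    proposals: list = field(default_factory=list)


def save_report(report, output_dir):
    """Write the report as JSON and return the path of the saved file."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{report.discovery_id}.json"
    path.write_text(json.dumps(asdict(report), indent=2), encoding="utf-8")
    return str(path)


def load_report(path):
    """Read a report back, mirroring the documented error behavior."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"No proposal file at {path}")
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        return ToyReport(**data)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or unexpected fields both count as an invalid format.
        raise ValueError(f"Invalid proposal file: {path}") from exc


with tempfile.TemporaryDirectory() as tmp:
    original = ToyReport("run-001", [{"name": "new_class"}])
    saved_path = save_report(original, tmp)
    restored = load_report(saved_path)

    # A file that is not valid JSON should surface as ValueError.
    broken = Path(tmp) / "broken.json"
    broken.write_text("{not json", encoding="utf-8")
    try:
        load_report(broken)
        invalid_raised = False
    except ValueError:
        invalid_raised = True
```

Chaining the original exception with `from exc` keeps the parser's message available when debugging a rejected file.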
list_proposals(output_dir='./proposals', sort='newest')
List all discovery reports in output directory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `output_dir` | `str \| Path` | Directory containing proposals | `'./proposals'` |
| `sort` | `str` | Sort order ('newest', 'oldest', 'name') | `'newest'` |
Returns:
| Type | Description |
|---|---|
| `list[dict[str, Any]]` | List of proposal metadata dicts |
Source code in src/novelentitymatcher/novelty/storage/persistence.py
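The three sort orders can be illustrated with a directory listing keyed on file name and modification time. This is a sketch under assumptions: the `list_files` helper and the metadata keys (`name`, `path`, `modified`) are hypothetical, and the actual function may read timestamps from the report contents rather than from the file system.

```python
import os
import tempfile
from pathlib import Path


def list_files(output_dir, sort="newest"):
    """Return one metadata dict per file, ordered by the requested sort key."""
    entries = [
        {"name": p.name, "path": str(p), "modified": p.stat().st_mtime}
        for p in Path(output_dir).iterdir()
        if p.is_file()
    ]
    if sort == "name":
        return sorted(entries, key=lambda e: e["name"])
    if sort in ("newest", "oldest"):
        return sorted(entries, key=lambda e: e["modified"], reverse=(sort == "newest"))
    raise ValueError(f"Unknown sort order: {sort!r}")


with tempfile.TemporaryDirectory() as tmp:
    # Create two files with fixed, distinct modification times.
    for offset, name in enumerate(["b.yaml", "a.yaml"]):
        path = Path(tmp) / name
        path.write_text("proposals: []\n", encoding="utf-8")
        os.utime(path, (1_000_000 + offset, 1_000_000 + offset))

    newest_first = [e["name"] for e in list_files(tmp, "newest")]
    oldest_first = [e["name"] for e in list_files(tmp, "oldest")]
    by_name = [e["name"] for e in list_files(tmp, "name")]
```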
export_summary(report, output_path, format='markdown')
Export a human-readable summary of the discovery report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `report` | `NovelClassDiscoveryReport` | Discovery report to export | required |
| `output_path` | `str \| Path` | Path to save summary | required |
| `format` | `str` | Output format ('markdown' or 'text') | `'markdown'` |
Source code in src/novelentitymatcher/novelty/storage/persistence.py
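As a rough picture of what a human-readable export in the two documented formats could look like, the sketch below renders a summary from a plain dict. Everything about the layout is an assumption: the `render_summary` helper, the dict keys, and the heading and bullet styles are invented for illustration and do not reflect the library's actual output.

```python
def render_summary(report, fmt="markdown"):
    """Render a short summary string in 'markdown' or 'text' form."""
    if fmt not in ("markdown", "text"):
        raise ValueError(f"Unsupported format: {fmt!r}")
    title = f"Discovery report {report['discovery_id']}"
    heading = f"# {title}" if fmt == "markdown" else title
    bullet = "- " if fmt == "markdown" else "* "
    lines = [heading, ""]
    for proposal in report["proposals"]:
        lines.append(f"{bullet}{proposal['name']} ({proposal['n_samples']} samples)")
    return "\n".join(lines) + "\n"


# Hypothetical report contents.
report = {
    "discovery_id": "run-001",
    "proposals": [{"name": "new_class", "n_samples": 12}],
}

markdown_summary = render_summary(report, "markdown")
text_summary = render_summary(report, "text")
print(markdown_summary)
```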
novelentitymatcher.novelty.storage.review
Lifecycle-aware review and promotion storage for discovery proposals.
Classes
ProposalReviewManager(storage_path='./proposals/review_records.json')
Persist and update proposal review records for human-in-the-loop (HITL) workflows.
Source code in src/novelentitymatcher/novelty/storage/review.py
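The idea of persisting review records to a single JSON file can be sketched as follows. This is not the `ProposalReviewManager` implementation: the `ToyReviewStore` class, its method names, the record fields, and the state names (`pending`, `approved`) are all hypothetical and chosen only to show the write-through pattern.

```python
import json
import tempfile
from pathlib import Path


class ToyReviewStore:
    """Hypothetical JSON-backed store that rewrites the file on every change."""

    def __init__(self, storage_path):
        self.storage_path = Path(storage_path)
        self.records = {}
        if self.storage_path.exists():
            self.records = json.loads(self.storage_path.read_text(encoding="utf-8"))

    def _flush(self):
        # Persist the full record set so a new process sees the latest state.
        self.storage_path.parent.mkdir(parents=True, exist_ok=True)
        self.storage_path.write_text(json.dumps(self.records, indent=2), encoding="utf-8")

    def add(self, review_id, proposal_name):
        self.records[review_id] = {"proposal": proposal_name, "state": "pending"}
        self._flush()

    def set_state(self, review_id, state):
        if review_id not in self.records:
            raise KeyError(review_id)
        self.records[review_id]["state"] = state
        self._flush()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "review_records.json"
    store = ToyReviewStore(path)
    store.add("rev-1", "new_class")
    store.set_state("rev-1", "approved")

    # A fresh instance reads the persisted state back from disk.
    reloaded = ToyReviewStore(path)
    state_after_reload = reloaded.records["rev-1"]["state"]
```

Writing the whole file on each update is simple and adequate for small record sets; a larger or concurrent workflow would likely need locking or a database instead.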
Functions
promote_with_index_update(review_id, matcher)
Promote and automatically update the matcher's entity index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `review_id` | `str` | The review record to promote. | required |
| `matcher` | `Any` | A NovelEntityMatcher or similar object with | required |
Returns:
| Type | Description |
|---|---|
| `PromotionResult` | PromotionResult with full details of the promotion. |