Quick Start: End-to-End Workflow

Complete lifecycle — conformal regression, classification, DistributionPrediction API, persistence, and validation strategies
Published

May 11, 2026


Every model in uncertainty_flow returns a DistributionPrediction — not a point estimate. This notebook walks through the complete lifecycle:

  1. Choosing the right validation strategy
  2. Conformal regression (tabular)
  3. The full DistributionPrediction API
  4. Parametric distribution fitting
  5. Comprehensive evaluation
  6. Model persistence (save / load)
  7. Conformal classification

Setup

Code
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier

from uncertainty_flow import (
    ConformalRegressor,
    ConformalClassifier,
    coverage_score,
    winkler_score,
    pinball_loss,
)
from uncertainty_flow.utils import select_validation_plan

Validation Strategy

Choosing the right train/test split matters. select_validation_plan() inspects your data shape and task type, then recommends a split strategy.

Code
df = pl.read_parquet("../data/concrete.parquet")
print(f"Shape: {df.shape}")
df.head(3)
Shape: (1030, 9)
shape: (3, 9)
┌────────┬───────┬─────┬───────┬──────────────┬───────────┬─────────┬─────┬──────────┐
│ cement ┆ slag  ┆ ash ┆ water ┆ superplastic ┆ coarseagg ┆ fineagg ┆ age ┆ strength │
│ ---    ┆ ---   ┆ --- ┆ ---   ┆ ---          ┆ ---       ┆ ---     ┆ --- ┆ ---      │
│ f64    ┆ f64   ┆ f64 ┆ f64   ┆ f64          ┆ f64       ┆ f64     ┆ i64 ┆ f64      │
╞════════╪═══════╪═════╪═══════╪══════════════╪═══════════╪═════════╪═════╪══════════╡
│ 540.0  ┆ 0.0   ┆ 0.0 ┆ 162.0 ┆ 2.5          ┆ 1040.0    ┆ 676.0   ┆ 28  ┆ 79.99    │
│ 540.0  ┆ 0.0   ┆ 0.0 ┆ 162.0 ┆ 2.5          ┆ 1055.0    ┆ 676.0   ┆ 28  ┆ 61.89    │
│ 332.5  ┆ 142.5 ┆ 0.0 ┆ 228.0 ┆ 0.0          ┆ 932.0     ┆ 594.0   ┆ 270 ┆ 40.27    │
└────────┴───────┴─────┴───────┴──────────────┴───────────┴─────────┴─────┴──────────┘
Code
plan = select_validation_plan(df, task_type="tabular")
print(plan.metadata.strategy_name)
random_holdout

The plan provides ready-to-use splits:

Code
train_df, test_df = plan.outer_split
print(f"Train: {len(train_df)}  Test: {len(test_df)}")
Train: 824  Test: 206

For small datasets (<250 rows), the plan automatically recommends cross-validation instead of holdout.
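As an illustration only (the real decision logic lives inside select_validation_plan and may weigh more than row count), a toy version of such a rule might look like:

```python
def choose_strategy(n_rows: int, threshold: int = 250) -> str:
    """Toy decision rule: cross-validation below the row threshold,
    a random holdout split otherwise."""
    return "cross_validation" if n_rows < threshold else "random_holdout"

print(choose_strategy(100))   # cross_validation
print(choose_strategy(1030))  # random_holdout
```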

Conformal Regression

Wrap any scikit-learn regressor with distribution-free coverage guarantees:

Code
model = ConformalRegressor(
    base_model=GradientBoostingRegressor(random_state=42),
    coverage_target=0.9,
)
model.fit(train_df, target="strength")
<uncertainty_flow.wrappers.conformal.ConformalRegressor at 0x1171a42f0>
Code
pred = model.predict(test_df)
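
ConformalRegressor's exact calibration procedure isn't shown in this notebook, but the textbook version of the idea, split conformal prediction, can be sketched with plain scikit-learn and numpy: hold out a calibration set and widen point predictions by an empirical quantile of the residuals.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, noise=10.0, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

reg = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Conformity scores: absolute residuals on the held-out calibration set.
scores = np.abs(y_cal - reg.predict(X_cal))
n = len(scores)

# Finite-sample-corrected quantile for a 90% target coverage.
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n, method="higher")

# Symmetric interval around the point estimate.
point = reg.predict(X_cal)
lower, upper = point - q, point + q
```

The quantile correction (n + 1 in place of n) is what gives the finite-sample marginal coverage guarantee; the library may use a different conformity score, but the calibration step follows this pattern.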

DistributionPrediction API

Every predict call returns a DistributionPrediction. Here’s the full surface:

Intervals & Quantiles

Code
intervals = pred.interval(confidence=0.9)
print(intervals.head())
shape: (5, 2)
┌───────────┬───────────┐
│ lower     ┆ upper     │
│ ---       ┆ ---       │
│ f64       ┆ f64       │
╞═══════════╪═══════════╡
│ 22.110626 ┆ 44.408737 │
│ 32.811675 ┆ 55.109786 │
│ 18.645393 ┆ 40.943503 │
│ 29.667123 ┆ 51.965233 │
│ 45.17357  ┆ 67.471681 │
└───────────┴───────────┘
Code
quantiles = pred.quantile([0.1, 0.5, 0.9])
print(quantiles.head())
shape: (5, 3)
┌───────────┬───────────┬───────────┐
│ q_0.100   ┆ q_0.500   ┆ q_0.900   │
│ ---       ┆ ---       ┆ ---       │
│ f64       ┆ f64       ┆ f64       │
╞═══════════╪═══════════╪═══════════╡
│ 23.734416 ┆ 28.761976 ┆ 36.783354 │
│ 34.435465 ┆ 39.463025 ┆ 47.484402 │
│ 20.269182 ┆ 25.296742 ┆ 33.31812  │
│ 31.290912 ┆ 36.318472 ┆ 44.33985  │
│ 46.79736  ┆ 51.82492  ┆ 59.846298 │
└───────────┴───────────┴───────────┘
Code
pred.median().head()
shape: (10,)
Series: 'median' [f64]
[
	28.761976
	39.463025
	25.296742
	36.318472
	51.82492
	57.929891
	48.490284
	52.308695
	46.745018
	51.696277
]

Sampling

Draw synthetic samples from the predicted distribution:


Code
samples = pred.sample(n=500, random_state=42)
print(samples.shape)
samples.head()
(103000, 2)
shape: (5, 2)
┌───────────┬───────────┐
│ sample_id ┆ strength  │
│ ---       ┆ ---       │
│ i64       ┆ f64       │
╞═══════════╪═══════════╡
│ 0         ┆ 32.02114  │
│ 0         ┆ 28.191501 │
│ 0         ┆ 35.005991 │
│ 0         ┆ 30.654094 │
│ 0         ┆ 23.54532  │
└───────────┴───────────┘
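
One common way such sampling can be implemented (an assumption about the internals, not the library's documented method) is inverse-transform sampling: interpolate the predicted quantile function and evaluate it at uniform draws.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical quantile predictions for a single test row
# (levels and values invented for illustration).
levels = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
values = np.array([22.1, 26.0, 28.8, 32.4, 36.8])

# Inverse-transform sampling: draw u uniformly over the known quantile
# range and evaluate the piecewise-linear quantile function at u.
u = rng.uniform(levels[0], levels[-1], size=500)
samples = np.interp(u, levels, values)
```

Restricting u to the known quantile range avoids extrapolating beyond the fitted tails; a real implementation would also need a rule for the tails outside the outermost quantiles.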

Summary

One-row overview of the entire prediction distribution:

Code
pred.summary()
/Users/minghao/Desktop/personal/uncertainty_flow/.venv/lib/python3.13/site-packages/IPython/core/interactiveshell.py:3748: UserWarning: Nearest quantile levels differ significantly from requested: 0.75→0.70. Consider using more quantile levels for accurate summary statistics.
  exec(code_obj, self.user_global_ns, self.user_ns)
shape: (1, 7)
┌────────────┬───────────┬───────────────┬───────────────┬───────────┬────────────┬───────────────────┐
│ target     ┆ median    ┆ mean_width_90 ┆ mean_width_50 ┆ aleatoric ┆ epistemic  ┆ total_uncertainty │
│ ---        ┆ ---       ┆ ---           ┆ ---           ┆ ---       ┆ ---        ┆ ---               │
│ str        ┆ f64       ┆ f64           ┆ f64           ┆ f64       ┆ f64        ┆ f64               │
╞════════════╪═══════════╪═══════════════╪═══════════════╪═══════════╪════════════╪═══════════════════╡
│ "strength" ┆ 45.814013 ┆ 22.298111     ┆ 4.885196      ┆ 22.298111 ┆ 1.2377e-29 ┆ 22.298111         │
└────────────┴───────────┴───────────────┴───────────────┴───────────┴────────────┴───────────────────┘
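
The aleatoric/epistemic columns follow the standard law-of-total-variance decomposition. How uncertainty_flow computes them internally isn't shown here, but with an ensemble of predictors the split is typically done like this (values below are invented for illustration):

```python
import numpy as np

# Hypothetical ensemble output: each member predicts a mean and a
# variance for two rows.
member_means = np.array([[28.5, 40.1],
                         [29.0, 39.7],
                         [28.7, 40.3]])
member_vars = np.array([[4.0, 6.0],
                        [4.2, 5.8],
                        [3.9, 6.1]])

aleatoric = member_vars.mean(axis=0)   # average within-member variance
epistemic = member_means.var(axis=0)   # disagreement between members
total = aleatoric + epistemic          # law of total variance
```

An epistemic value of ~1e-29, as in the summary above, simply means a single deterministic base model: there is no disagreement term, so all uncertainty is attributed to the aleatoric component.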

Parametric Distribution Fitting

Fit a parametric distribution (normal, Student-t, lognormal, gamma) to the quantile predictions:

Code
dist = pred.fit_distribution(family="auto")
print(dist)
print(f"  Mean: {dist.mean:.2f}  Variance: {dist.variance:.2f}")
ParametricDistribution(family='lognormal', s=0.76, loc=38.86, scale=6.16)
  Mean: 47.09  Variance: 52.86
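
The selection mechanics behind family="auto" aren't documented in this notebook. One plausible approach (an assumption, not the library's code) is quantile matching: choose parameters so the candidate family's quantile function reproduces the predicted quantiles, then keep the family with the smallest mismatch. A sketch for a single normal fit with scipy:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Hypothetical quantile predictions for one row (invented values).
levels = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
q_pred = np.array([23.7, 26.3, 28.8, 32.0, 36.8])

def quantile_mismatch(params):
    """Squared gap between the normal quantile function and the
    predicted quantiles."""
    loc, scale = params
    if scale <= 0:
        return np.inf
    return np.sum((stats.norm.ppf(levels, loc=loc, scale=scale) - q_pred) ** 2)

# Start from the median and a rough spread guess.
res = minimize(quantile_mismatch, x0=[q_pred[2], 5.0], method="Nelder-Mead")
loc, scale = res.x
```

Repeating this per family (Student-t, lognormal, gamma) and comparing the residual mismatch gives an "auto" selection rule; the lognormal chosen above suggests the predicted quantiles are right-skewed.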
Code
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

x = np.linspace(dist.mean - 4 * np.sqrt(dist.variance),
                dist.mean + 4 * np.sqrt(dist.variance), 200)

axes[0].plot(x, dist.pdf(x))
axes[0].set_title(f"PDF ({dist.family})")
axes[0].set_xlabel("Concrete strength")

axes[1].plot(x, dist.cdf(x))
axes[1].set_title(f"CDF ({dist.family})")
axes[1].set_xlabel("Concrete strength")

plt.tight_layout()
plt.show()

Comprehensive Evaluation

Code
y_true = test_df["strength"]
intv = pred.interval(0.9)

results = {
    "coverage_90": coverage_score(y_true, intv["lower"], intv["upper"]),
    "winkler_90": winkler_score(y_true, intv["lower"], intv["upper"], 0.9),
    "pinball_50": pinball_loss(y_true, pred.median(), 0.5),
    "crps": pred.crps(y_true),
    "log_score": pred.log_score(y_true),
}

for name, value in results.items():
    print(f"  {name:20s}: {value:.4f}")
  coverage_90         : 0.2621
  winkler_90          : 187.6153
  pinball_50          : 7.2306
  crps                : 11.7422
  log_score           : -3.6070
/Users/minghao/Desktop/personal/uncertainty_flow/uncertainty_flow/core/distribution.py:1258: UserWarning: 149 log-density values are non-finite. Check that y_true values are in the support of the fitted distribution.
  return _log_score(
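
The library's metric implementations aren't reproduced here, but these names follow standard definitions, which a few lines of numpy make concrete:

```python
import numpy as np

def coverage(y, lo, hi):
    """Fraction of observations falling inside [lo, hi]."""
    return np.mean((y >= lo) & (y <= hi))

def winkler(y, lo, hi, confidence):
    """Winkler interval score: interval width plus a miss penalty
    scaled by 2/alpha (lower is better)."""
    alpha = 1.0 - confidence
    width = hi - lo
    penalty = (2.0 / alpha) * (np.maximum(lo - y, 0) + np.maximum(y - hi, 0))
    return np.mean(width + penalty)

def pinball(y, q_hat, tau):
    """Pinball (quantile) loss at level tau (lower is better)."""
    diff = y - q_hat
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))
```

Note how Winkler punishes misses in proportion to how far the observation falls outside the interval, which is why the low empirical coverage above (0.26 against a 0.9 target) comes with a large Winkler score.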
Code
fig, ax = plt.subplots(figsize=(8, 4))
names = list(results.keys())
values = list(results.values())
colors = ["#4C78A8" if v != max(values) else "#E45756" for v in values]
ax.barh(names, values, color=colors)
ax.set_xlabel("Score")
ax.set_title("Evaluation Metrics (lower is better for all except coverage)")
ax.invert_yaxis()
plt.tight_layout()
plt.show()

Visualization

Code
pred.plot()
plt.show()

Persistence

Save and reload models with the .uf archive format:

Code
import tempfile
import os

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "concrete_model.uf")
    model.save(path)
    print(f"Saved to {path}")

    loaded = ConformalRegressor.load(path)
    pred_loaded = loaded.predict(test_df)

    assert np.allclose(
        pred._quantiles, pred_loaded._quantiles
    ), "Predictions differ after reload!"
    print("Verification passed: predictions identical after save/load round-trip")
Saved to /var/folders/zv/p_57kc9j1fb9xtj06cw1qb1c0000gn/T/tmpdi13zxuh/concrete_model.uf
Verification passed: predictions identical after save/load round-trip

Conformal Classification

The library also supports conformal prediction for classification tasks via ConformalClassifier, which produces PredictionSet objects with guaranteed marginal coverage.
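
As with regression, ConformalClassifier's internals aren't shown, but the classic score-based construction can be sketched with plain scikit-learn: calibrate a threshold on 1 minus the true-class probability, then include every class that clears it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_classes=4, n_informative=8,
                           random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fit, y_fit)

# Conformity score: 1 minus the probability assigned to the true class.
cal_probs = clf.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]
n = len(scores)

# Finite-sample-corrected quantile for a 90% target coverage.
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n, method="higher")

# A prediction set keeps every class whose score stays below the threshold.
pred_sets = [np.flatnonzero(1.0 - p <= q) for p in clf.predict_proba(X_cal[:5])]
```

Harder examples produce larger sets, which is the behaviour the set-size histogram below makes visible: set size is itself an uncertainty signal.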

Code
wine = pl.read_parquet("../data/wine_quality.parquet")
print(f"Shape: {wine.shape}")
print(f"Classes: {wine['quality'].unique().sort().to_list()}")
wine.head(3)
Shape: (1599, 12)
Classes: [3, 4, 5, 6, 7, 8]
shape: (3, 12)
┌───────────────┬──────────────────┬─────────────┬────────────────┬───────────┬─────────────────────┬──────────────────────┬─────────┬──────┬───────────┬─────────┬─────────┐
│ fixed acidity ┆ volatile acidity ┆ citric acid ┆ residual sugar ┆ chlorides ┆ free sulfur dioxide ┆ total sulfur dioxide ┆ density ┆ pH   ┆ sulphates ┆ alcohol ┆ quality │
│ ---           ┆ ---              ┆ ---         ┆ ---            ┆ ---       ┆ ---                 ┆ ---                  ┆ ---     ┆ ---  ┆ ---       ┆ ---     ┆ ---     │
│ f64           ┆ f64              ┆ f64         ┆ f64            ┆ f64       ┆ f64                 ┆ f64                  ┆ f64     ┆ f64  ┆ f64       ┆ f64     ┆ i64     │
╞═══════════════╪══════════════════╪═════════════╪════════════════╪═══════════╪═════════════════════╪══════════════════════╪═════════╪══════╪═══════════╪═════════╪═════════╡
│ 7.4           ┆ 0.7              ┆ 0.0         ┆ 1.9            ┆ 0.076     ┆ 11.0                ┆ 34.0                 ┆ 0.9978  ┆ 3.51 ┆ 0.56      ┆ 9.4     ┆ 5       │
│ 7.8           ┆ 0.88             ┆ 0.0         ┆ 2.6            ┆ 0.098     ┆ 25.0                ┆ 67.0                 ┆ 0.9968  ┆ 3.2  ┆ 0.68      ┆ 9.8     ┆ 5       │
│ 7.8           ┆ 0.76             ┆ 0.04        ┆ 2.3            ┆ 0.092     ┆ 15.0                ┆ 54.0                 ┆ 0.997   ┆ 3.26 ┆ 0.65      ┆ 9.8     ┆ 5       │
└───────────────┴──────────────────┴─────────────┴────────────────┴───────────┴─────────────────────┴──────────────────────┴─────────┴──────┴───────────┴─────────┴─────────┘
Code
plan_cls = select_validation_plan(wine, task_type="tabular")
train_w, test_w = plan_cls.outer_split

clf = ConformalClassifier(
    base_model=RandomForestClassifier(random_state=42, n_estimators=100),
    coverage_target=0.9,
)
clf.fit(train_w, target="quality")
pred_set = clf.predict(test_w)
Code
print(pred_set)
print(f"\nCoverage target: {pred_set.coverage}")
print(f"Average set size: {pred_set.size:.2f} (out of {len(clf._class_names_)} classes)")
PredictionSet(n_samples=319, n_classes=6, coverage=0.90, avg_size=3.12)

Coverage target: 0.9
Average set size: 3.12 (out of 6 classes)
Code
print("Sample prediction sets (first 5):")
for i in range(5):
    classes = pred_set.set(i)
    print(f"  Sample {i}: {classes}")
Sample prediction sets (first 5):
  Sample 0: [np.int64(6), np.int64(5), np.int64(4)]
  Sample 1: [np.int64(6), np.int64(5), np.int64(4)]
  Sample 2: [np.int64(6), np.int64(5), np.int64(7), np.int64(4)]
  Sample 3: [np.int64(6), np.int64(5)]
  Sample 4: [np.int64(6), np.int64(7), np.int64(5)]
Code
pred_set.summary()
shape: (1, 4)
┌─────────────────┬──────────────┬───────────┬───────────┐
│ coverage_target ┆ avg_set_size ┆ n_samples ┆ n_classes │
│ ---             ┆ ---          ┆ ---       ┆ ---       │
│ f64             ┆ f64          ┆ i64       ┆ i64       │
╞═════════════════╪══════════════╪═══════════╪═══════════╡
│ 0.9             ┆ 3.122257     ┆ 319       ┆ 6         │
└─────────────────┴──────────────┴───────────┴───────────┘
Code
sizes = pred_set.size_by_sample()
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(sizes, bins=range(1, max(sizes) + 2), align="left", rwidth=0.8, color="#4C78A8")
ax.set_xlabel("Prediction set size (number of classes)")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Prediction Set Sizes")
plt.tight_layout()
plt.show()

Key Takeaways

Concept                  API
Auto validation split    select_validation_plan(df, task_type="tabular")
Conformal regression     ConformalRegressor(base_model).fit(df, target).predict(df)
Prediction intervals     pred.interval(0.9)
Quantiles                pred.quantile([0.1, 0.5, 0.9])
Samples                  pred.sample(n=500)
Summary                  pred.summary()
Parametric fit           pred.fit_distribution(family="auto")
All metrics              pred.crps(y), pred.log_score(y), coverage_score(), winkler_score()
Persistence              model.save(path) / ModelClass.load(path)
Classification           ConformalClassifier(base_model).predict(df) → PredictionSet