Quick Start: End-to-End Workflow

Complete lifecycle — conformal regression, classification, DistributionPrediction API, persistence, and validation strategies
Published

May 11, 2026


Every model in uncertainty_flow returns a DistributionPrediction — not a point estimate. This notebook walks through the complete lifecycle:

  1. Choosing the right validation strategy
  2. Conformal regression (tabular)
  3. The full DistributionPrediction API
  4. Parametric distribution fitting
  5. Comprehensive evaluation
  6. Model persistence (save / load)
  7. Conformal classification

Setup

Code
import polars as pl
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor, RandomForestClassifier

from uncertainty_flow import (
    ConformalRegressor,
    ConformalClassifier,
    coverage_score,
    winkler_score,
    pinball_loss,
)
from uncertainty_flow.utils import select_validation_plan

Validation Strategy

Choosing the right train/test split matters. select_validation_plan() inspects your data shape and task type, then recommends a split strategy.

Code
df = pl.read_parquet("../data/concrete.parquet")
print(f"Shape: {df.shape}")
df.head(3)
Shape: (1030, 9)
shape: (3, 9)
┌────────┬───────┬─────┬───────┬──────────────┬───────────┬─────────┬─────┬──────────┐
│ cement ┆ slag  ┆ ash ┆ water ┆ superplastic ┆ coarseagg ┆ fineagg ┆ age ┆ strength │
│ ---    ┆ ---   ┆ --- ┆ ---   ┆ ---          ┆ ---       ┆ ---     ┆ --- ┆ ---      │
│ f64    ┆ f64   ┆ f64 ┆ f64   ┆ f64          ┆ f64       ┆ f64     ┆ i64 ┆ f64      │
╞════════╪═══════╪═════╪═══════╪══════════════╪═══════════╪═════════╪═════╪══════════╡
│ 540.0  ┆ 0.0   ┆ 0.0 ┆ 162.0 ┆ 2.5          ┆ 1040.0    ┆ 676.0   ┆ 28  ┆ 79.99    │
│ 540.0  ┆ 0.0   ┆ 0.0 ┆ 162.0 ┆ 2.5          ┆ 1055.0    ┆ 676.0   ┆ 28  ┆ 61.89    │
│ 332.5  ┆ 142.5 ┆ 0.0 ┆ 228.0 ┆ 0.0          ┆ 932.0     ┆ 594.0   ┆ 270 ┆ 40.27    │
└────────┴───────┴─────┴───────┴──────────────┴───────────┴─────────┴─────┴──────────┘
Code
plan = select_validation_plan(df, task_type="tabular")
print(plan.metadata.strategy_name)
random_holdout

The plan provides ready-to-use splits:

Code
train_df, test_df = plan.outer_split
print(f"Train: {len(train_df)}  Test: {len(test_df)}")
Train: 824  Test: 206

For small datasets (<250 rows), the plan automatically recommends cross-validation instead of holdout.
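As an illustration only (the real decision logic lives inside select_validation_plan and may weigh more than row count), a toy version of such a rule might look like:

```python
def choose_strategy(n_rows: int, threshold: int = 250) -> str:
    """Toy decision rule: cross-validation below the row threshold,
    a random holdout split otherwise."""
    return "cross_validation" if n_rows < threshold else "random_holdout"

print(choose_strategy(100))   # cross_validation
print(choose_strategy(1030))  # random_holdout
```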

Conformal Regression

Wrap any scikit-learn regressor with distribution-free coverage guarantees:

Code
model = ConformalRegressor(
    base_model=GradientBoostingRegressor(random_state=42),
    coverage_target=0.9,
)
model.fit(train_df, target="strength")
<uncertainty_flow.wrappers.conformal.ConformalRegressor at 0x1171a42f0>
Code
pred = model.predict(test_df)
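
ConformalRegressor's exact calibration procedure isn't shown in this notebook, but the textbook version of the idea, split conformal prediction, can be sketched with plain scikit-learn and numpy: hold out a calibration set and widen point predictions by an empirical quantile of the residuals.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, noise=10.0, random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

reg = GradientBoostingRegressor(random_state=0).fit(X_fit, y_fit)

# Conformity scores: absolute residuals on the held-out calibration set.
scores = np.abs(y_cal - reg.predict(X_cal))
n = len(scores)

# Finite-sample-corrected quantile for a 90% target coverage.
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n, method="higher")

# Symmetric interval around the point estimate.
point = reg.predict(X_cal)
lower, upper = point - q, point + q
```

The quantile correction (n + 1 in place of n) is what gives the finite-sample marginal coverage guarantee; the library may use a different conformity score, but the calibration step follows this pattern.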

DistributionPrediction API

Every predict call returns a DistributionPrediction. Here’s the full surface:

Intervals & Quantiles

Code
intervals = pred.interval(confidence=0.9)
print(intervals.head())
shape: (5, 2)
┌───────────┬───────────┐
│ lower     ┆ upper     │
│ ---       ┆ ---       │
│ f64       ┆ f64       │
╞═══════════╪═══════════╡
│ 22.110626 ┆ 44.408737 │
│ 32.811675 ┆ 55.109786 │
│ 18.645393 ┆ 40.943503 │
│ 29.667123 ┆ 51.965233 │
│ 45.17357  ┆ 67.471681 │
└───────────┴───────────┘
Code
quantiles = pred.quantile([0.1, 0.5, 0.9])
print(quantiles.head())
shape: (5, 3)
┌───────────┬───────────┬───────────┐
│ q_0.100   ┆ q_0.500   ┆ q_0.900   │
│ ---       ┆ ---       ┆ ---       │
│ f64       ┆ f64       ┆ f64       │
╞═══════════╪═══════════╪═══════════╡
│ 23.734416 ┆ 28.761976 ┆ 36.783354 │
│ 34.435465 ┆ 39.463025 ┆ 47.484402 │
│ 20.269182 ┆ 25.296742 ┆ 33.31812  │
│ 31.290912 ┆ 36.318472 ┆ 44.33985  │
│ 46.79736  ┆ 51.82492  ┆ 59.846298 │
└───────────┴───────────┴───────────┘
Code
pred.median().head()
shape: (10,)
Series: 'median' [f64]
[
	28.761976
	39.463025
	25.296742
	36.318472
	51.82492
	57.929891
	48.490284
	52.308695
	46.745018
	51.696277
]

Sampling

Draw synthetic samples from the predicted distribution:


Code
samples = pred.sample(n=500, random_state=42)
print(samples.shape)
samples.head()
(103000, 2)
shape: (5, 2)
┌───────────┬───────────┐
│ sample_id ┆ strength  │
│ ---       ┆ ---       │
│ i64       ┆ f64       │
╞═══════════╪═══════════╡
│ 0         ┆ 32.02114  │
│ 0         ┆ 28.191501 │
│ 0         ┆ 35.005991 │
│ 0         ┆ 30.654094 │
│ 0         ┆ 23.54532  │
└───────────┴───────────┘
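
One common way such sampling can be implemented (an assumption about the internals, not the library's documented method) is inverse-transform sampling: interpolate the predicted quantile function and evaluate it at uniform draws.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical quantile predictions for a single test row
# (levels and values invented for illustration).
levels = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
values = np.array([22.1, 26.0, 28.8, 32.4, 36.8])

# Inverse-transform sampling: draw u uniformly over the known quantile
# range and evaluate the piecewise-linear quantile function at u.
u = rng.uniform(levels[0], levels[-1], size=500)
samples = np.interp(u, levels, values)
```

Restricting u to the known quantile range avoids extrapolating beyond the fitted tails; a real implementation would also need a rule for the tails outside the outermost quantiles.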

Summary

One-row overview of the entire prediction distribution:

Code
pred.summary()
/Users/minghao/Desktop/personal/uncertainty_flow/.venv/lib/python3.13/site-packages/IPython/core/interactiveshell.py:3748: UserWarning: Nearest quantile levels differ significantly from requested: 0.75→0.70. Consider using more quantile levels for accurate summary statistics.
  exec(code_obj, self.user_global_ns, self.user_ns)
shape: (1, 7)
┌────────────┬───────────┬───────────────┬───────────────┬───────────┬────────────┬───────────────────┐
│ target     ┆ median    ┆ mean_width_90 ┆ mean_width_50 ┆ aleatoric ┆ epistemic  ┆ total_uncertainty │
│ ---        ┆ ---       ┆ ---           ┆ ---           ┆ ---       ┆ ---        ┆ ---               │
│ str        ┆ f64       ┆ f64           ┆ f64           ┆ f64       ┆ f64        ┆ f64               │
╞════════════╪═══════════╪═══════════════╪═══════════════╪═══════════╪════════════╪═══════════════════╡
│ "strength" ┆ 45.814013 ┆ 22.298111     ┆ 4.885196      ┆ 22.298111 ┆ 1.2377e-29 ┆ 22.298111         │
└────────────┴───────────┴───────────────┴───────────────┴───────────┴────────────┴───────────────────┘
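
The aleatoric/epistemic columns follow the standard law-of-total-variance decomposition. How uncertainty_flow computes them internally isn't shown here, but with an ensemble of predictors the split is typically done like this (values below are invented for illustration):

```python
import numpy as np

# Hypothetical ensemble output: each member predicts a mean and a
# variance for two rows.
member_means = np.array([[28.5, 40.1],
                         [29.0, 39.7],
                         [28.7, 40.3]])
member_vars = np.array([[4.0, 6.0],
                        [4.2, 5.8],
                        [3.9, 6.1]])

aleatoric = member_vars.mean(axis=0)   # average within-member variance
epistemic = member_means.var(axis=0)   # disagreement between members
total = aleatoric + epistemic          # law of total variance
```

An epistemic value of ~1e-29, as in the summary above, simply means a single deterministic base model: there is no disagreement term, so all uncertainty is attributed to the aleatoric component.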

Parametric Distribution Fitting

Fit a parametric distribution (normal, Student-t, lognormal, gamma) to the quantile predictions:

Code
dist = pred.fit_distribution(family="auto")
print(dist)
print(f"  Mean: {dist.mean:.2f}  Variance: {dist.variance:.2f}")
ParametricDistribution(family='lognormal', s=0.76, loc=38.86, scale=6.16)
  Mean: 47.09  Variance: 52.86
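
The selection mechanics behind family="auto" aren't documented in this notebook. One plausible approach (an assumption, not the library's code) is quantile matching: choose parameters so the candidate family's quantile function reproduces the predicted quantiles, then keep the family with the smallest mismatch. A sketch for a single normal fit with scipy:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

# Hypothetical quantile predictions for one row (invented values).
levels = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
q_pred = np.array([23.7, 26.3, 28.8, 32.0, 36.8])

def quantile_mismatch(params):
    """Squared gap between the normal quantile function and the
    predicted quantiles."""
    loc, scale = params
    if scale <= 0:
        return np.inf
    return np.sum((stats.norm.ppf(levels, loc=loc, scale=scale) - q_pred) ** 2)

# Start from the median and a rough spread guess.
res = minimize(quantile_mismatch, x0=[q_pred[2], 5.0], method="Nelder-Mead")
loc, scale = res.x
```

Repeating this per family (Student-t, lognormal, gamma) and comparing the residual mismatch gives an "auto" selection rule; the lognormal chosen above suggests the predicted quantiles are right-skewed.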
Code
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

x = np.linspace(dist.mean - 4 * np.sqrt(dist.variance),
                dist.mean + 4 * np.sqrt(dist.variance), 200)

axes[0].plot(x, dist.pdf(x))
axes[0].set_title(f"PDF ({dist.family})")
axes[0].set_xlabel("Concrete strength")

axes[1].plot(x, dist.cdf(x))
axes[1].set_title(f"CDF ({dist.family})")
axes[1].set_xlabel("Concrete strength")

plt.tight_layout()
plt.show()

Comprehensive Evaluation

Code
y_true = test_df["strength"]
intv = pred.interval(0.9)

results = {
    "coverage_90": coverage_score(y_true, intv["lower"], intv["upper"]),
    "winkler_90": winkler_score(y_true, intv["lower"], intv["upper"], 0.9),
    "pinball_50": pinball_loss(y_true, pred.median(), 0.5),
    "crps": pred.crps(y_true),
    "log_score": pred.log_score(y_true),
}

for name, value in results.items():
    print(f"  {name:20s}: {value:.4f}")
  coverage_90         : 0.2621
  winkler_90          : 187.6153
  pinball_50          : 7.2306
  crps                : 11.7422
  log_score           : -3.6070
/Users/minghao/Desktop/personal/uncertainty_flow/uncertainty_flow/core/distribution.py:1258: UserWarning: 149 log-density values are non-finite. Check that y_true values are in the support of the fitted distribution.
  return _log_score(
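
The library's metric implementations aren't reproduced here, but these names follow standard definitions, which a few lines of numpy make concrete:

```python
import numpy as np

def coverage(y, lo, hi):
    """Fraction of observations falling inside [lo, hi]."""
    return np.mean((y >= lo) & (y <= hi))

def winkler(y, lo, hi, confidence):
    """Winkler interval score: interval width plus a miss penalty
    scaled by 2/alpha (lower is better)."""
    alpha = 1.0 - confidence
    width = hi - lo
    penalty = (2.0 / alpha) * (np.maximum(lo - y, 0) + np.maximum(y - hi, 0))
    return np.mean(width + penalty)

def pinball(y, q_hat, tau):
    """Pinball (quantile) loss at level tau (lower is better)."""
    diff = y - q_hat
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))
```

Note how Winkler punishes misses in proportion to how far the observation falls outside the interval, which is why the low empirical coverage above (0.26 against a 0.9 target) comes with a large Winkler score.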
Code
fig, ax = plt.subplots(figsize=(8, 4))
names = list(results.keys())
values = list(results.values())
colors = ["#4C78A8" if v != max(values) else "#E45756" for v in values]
ax.barh(names, values, color=colors)
ax.set_xlabel("Score")
ax.set_title("Evaluation Metrics (lower is better for all except coverage)")
ax.invert_yaxis()
plt.tight_layout()
plt.show()

Visualization

Code
pred.plot()
plt.show()

Persistence

Save and reload models with the .uf archive format:

Code
import tempfile
import os

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "concrete_model.uf")
    model.save(path)
    print(f"Saved to {path}")

    loaded = ConformalRegressor.load(path)
    pred_loaded = loaded.predict(test_df)

    assert np.allclose(
        pred._quantiles, pred_loaded._quantiles
    ), "Predictions differ after reload!"
    print("Verification passed: predictions identical after save/load round-trip")
Saved to /var/folders/zv/p_57kc9j1fb9xtj06cw1qb1c0000gn/T/tmpdi13zxuh/concrete_model.uf
Verification passed: predictions identical after save/load round-trip

Conformal Classification

The library also supports conformal prediction for classification tasks via ConformalClassifier, which produces PredictionSet objects with guaranteed marginal coverage.
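
As with regression, ConformalClassifier's internals aren't shown, but the classic score-based construction can be sketched with plain scikit-learn: calibrate a threshold on 1 minus the true-class probability, then include every class that clears it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_classes=4, n_informative=8,
                           random_state=0)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.3,
                                              random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fit, y_fit)

# Conformity score: 1 minus the probability assigned to the true class.
cal_probs = clf.predict_proba(X_cal)
scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]
n = len(scores)

# Finite-sample-corrected quantile for a 90% target coverage.
q = np.quantile(scores, np.ceil(0.9 * (n + 1)) / n, method="higher")

# A prediction set keeps every class whose score stays below the threshold.
pred_sets = [np.flatnonzero(1.0 - p <= q) for p in clf.predict_proba(X_cal[:5])]
```

Harder examples produce larger sets, which is the behaviour the set-size histogram below makes visible: set size is itself an uncertainty signal.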

Code
wine = pl.read_parquet("../data/wine_quality.parquet")
print(f"Shape: {wine.shape}")
print(f"Classes: {wine['quality'].unique().sort().to_list()}")
wine.head(3)
Shape: (1599, 12)
Classes: [3, 4, 5, 6, 7, 8]
shape: (3, 12)
┌───────────────┬──────────────────┬─────────────┬────────────────┬───────────┬─────────────────────┬──────────────────────┬─────────┬──────┬───────────┬─────────┬─────────┐
│ fixed acidity ┆ volatile acidity ┆ citric acid ┆ residual sugar ┆ chlorides ┆ free sulfur dioxide ┆ total sulfur dioxide ┆ density ┆ pH   ┆ sulphates ┆ alcohol ┆ quality │
│ ---           ┆ ---              ┆ ---         ┆ ---            ┆ ---       ┆ ---                 ┆ ---                  ┆ ---     ┆ ---  ┆ ---       ┆ ---     ┆ ---     │
│ f64           ┆ f64              ┆ f64         ┆ f64            ┆ f64       ┆ f64                 ┆ f64                  ┆ f64     ┆ f64  ┆ f64       ┆ f64     ┆ i64     │
╞═══════════════╪══════════════════╪═════════════╪════════════════╪═══════════╪═════════════════════╪══════════════════════╪═════════╪══════╪═══════════╪═════════╪═════════╡
│ 7.4           ┆ 0.7              ┆ 0.0         ┆ 1.9            ┆ 0.076     ┆ 11.0                ┆ 34.0                 ┆ 0.9978  ┆ 3.51 ┆ 0.56      ┆ 9.4     ┆ 5       │
│ 7.8           ┆ 0.88             ┆ 0.0         ┆ 2.6            ┆ 0.098     ┆ 25.0                ┆ 67.0                 ┆ 0.9968  ┆ 3.2  ┆ 0.68      ┆ 9.8     ┆ 5       │
│ 7.8           ┆ 0.76             ┆ 0.04        ┆ 2.3            ┆ 0.092     ┆ 15.0                ┆ 54.0                 ┆ 0.997   ┆ 3.26 ┆ 0.65      ┆ 9.8     ┆ 5       │
└───────────────┴──────────────────┴─────────────┴────────────────┴───────────┴─────────────────────┴──────────────────────┴─────────┴──────┴───────────┴─────────┴─────────┘
Code
plan_cls = select_validation_plan(wine, task_type="tabular")
train_w, test_w = plan_cls.outer_split

clf = ConformalClassifier(
    base_model=RandomForestClassifier(random_state=42, n_estimators=100),
    coverage_target=0.9,
)
clf.fit(train_w, target="quality")
pred_set = clf.predict(test_w)
Code
print(pred_set)
print(f"\nCoverage target: {pred_set.coverage}")
print(f"Average set size: {pred_set.size:.2f} (out of {len(clf._class_names_)} classes)")
PredictionSet(n_samples=319, n_classes=6, coverage=0.90, avg_size=3.12)

Coverage target: 0.9
Average set size: 3.12 (out of 6 classes)
Code
print("Sample prediction sets (first 5):")
for i in range(5):
    classes = pred_set.set(i)
    print(f"  Sample {i}: {classes}")
Sample prediction sets (first 5):
  Sample 0: [np.int64(6), np.int64(5), np.int64(4)]
  Sample 1: [np.int64(6), np.int64(5), np.int64(4)]
  Sample 2: [np.int64(6), np.int64(5), np.int64(7), np.int64(4)]
  Sample 3: [np.int64(6), np.int64(5)]
  Sample 4: [np.int64(6), np.int64(7), np.int64(5)]
Code
pred_set.summary()
shape: (1, 4)
┌─────────────────┬──────────────┬───────────┬───────────┐
│ coverage_target ┆ avg_set_size ┆ n_samples ┆ n_classes │
│ ---             ┆ ---          ┆ ---       ┆ ---       │
│ f64             ┆ f64          ┆ i64       ┆ i64       │
╞═════════════════╪══════════════╪═══════════╪═══════════╡
│ 0.9             ┆ 3.122257     ┆ 319       ┆ 6         │
└─────────────────┴──────────────┴───────────┴───────────┘
Code
sizes = pred_set.size_by_sample()
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(sizes, bins=range(1, max(sizes) + 2), align="left", rwidth=0.8, color="#4C78A8")
ax.set_xlabel("Prediction set size (number of classes)")
ax.set_ylabel("Frequency")
ax.set_title("Distribution of Prediction Set Sizes")
plt.tight_layout()
plt.show()

Key Takeaways

Concept                  API
Auto validation split    select_validation_plan(df, task_type="tabular")
Conformal regression     ConformalRegressor(base_model).fit(df, target).predict(df)
Prediction intervals     pred.interval(0.9)
Quantiles                pred.quantile([0.1, 0.5, 0.9])
Samples                  pred.sample(n=500)
Summary                  pred.summary()
Parametric fit           pred.fit_distribution(family="auto")
All metrics              pred.crps(y), pred.log_score(y), coverage_score(), winkler_score()
Persistence              model.save(path) / ModelClass.load(path)
Classification           ConformalClassifier(base_model).predict(df) → PredictionSet