Hits & Misses
Pythia's public track record. Every pick graded out-of-sample — the model trained on other seasons and scored the one it had never seen.
Updated June 2026 · leave-one-season-out · breakout and regression kept separate
Breakout calls
Young forwards (age ≤ 26) flagged before they become top-six scorers.
Top-5 picks break out at 30% vs the 18% base rate.
10 seasons (2009–2023) · 245 eligible · 43 actual breakouts · out-of-sample
Highest-confidence hits
Honest misses: top-flagged, didn't pan out
Ones we under-rated
Breakouts Pythia didn't see coming — the model's honest blind spots (real breakouts it scored low). Shown for credibility, not as calls we made.
Regression calls
Established forwards (any age) with room to fall, flagged before a sustained decline.
Top-5 picks decline at 16% vs the 16% base rate.
A harder, noisier call: the top-5 picks land near the base rate, and the edge shows only across the wider top decile (1.2× lift).
10 seasons (2009–2023) · 1,868 eligible · 301 actual declines · out-of-sample
Highest-confidence hits
Honest misses: top-flagged, didn't pan out
How this works, and why we show the misses
Out-of-sample only. For every prediction, the model was trained on all other seasons and scored on the held-out season it had never seen (leave-one-season-out). No full-data model is used.
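As a rough illustration of the leave-one-season-out idea, here is a minimal Python sketch of how the folds could be built. The row layout, the field names (`season`, `player`, `broke_out`), and the helper `leave_one_season_out` are hypothetical and invented for this example. They are not Pythia's actual pipeline, and the toy rows are not real data.

```python
from collections import defaultdict


def leave_one_season_out(rows):
    """Yield (held_out_season, train_rows, test_rows), one fold per season.

    Each fold trains on every season except the held-out one, so the
    held-out season is never seen during training.
    """
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)

    for season in sorted(by_season):
        test_rows = by_season[season]
        train_rows = [
            r
            for other, season_rows in by_season.items()
            if other != season
            for r in season_rows
        ]
        yield season, train_rows, test_rows


# Hypothetical toy rows, for illustration only.
rows = [
    {"season": 2021, "player": "A", "broke_out": 1},
    {"season": 2021, "player": "B", "broke_out": 0},
    {"season": 2022, "player": "C", "broke_out": 0},
    {"season": 2023, "player": "D", "broke_out": 1},
]

folds = list(leave_one_season_out(rows))
for season, train_rows, test_rows in folds:
    # The training set must never contain the held-out season.
    assert all(r["season"] != season for r in train_rows)
    print(season, len(train_rows), len(test_rows))
```

In a setup like this, a model would be fit on `train_rows` and scored only on `test_rows`, so each season's grades come from a model that never saw that season.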
Every pick is shown — hits and misses. At these odds, misses are the majority, and that is expected: the breakout board's ~1.7× lift on a ~18% base rate still means most flagged players don't hit. A model that never missed would just be repeating the box score, not predicting ahead of it.
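The arithmetic behind those figures can be checked with a short Python sketch. It uses only the counts and rates printed on this page. The function names `base_rate` and `lift` are illustrative, and lift is assumed to mean top-pick hit rate divided by base rate.

```python
def base_rate(positives, eligible):
    """Share of eligible players who actually hit the outcome."""
    return positives / eligible


def lift(hit_rate, base):
    """How many times more often flagged players hit than the pool average."""
    return hit_rate / base


# Breakout board: 43 actual breakouts among 245 eligible, top-5 hit rate 30%.
breakout_base = base_rate(43, 245)
breakout_lift = lift(0.30, breakout_base)
breakout_miss_share = 1 - 0.30  # share of top-5 picks that do not break out

# Regression board: 301 actual declines among 1,868 eligible, top-5 hit rate 16%.
regression_base = base_rate(301, 1868)
regression_lift = lift(0.16, regression_base)

print(f"breakout base rate {breakout_base:.1%}, lift {breakout_lift:.2f}x")
print(f"breakout top-5 miss share {breakout_miss_share:.0%}")
print(f"regression base rate {regression_base:.1%}, lift {regression_lift:.2f}x")
```

With these inputs the breakout lift comes out at roughly 1.7× and the regression top-5 lift at roughly 1.0×. Even at 1.7×, about 70% of top-5 breakout picks miss.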
Breakout and regression are graded on separate labels, pools, and samples — they are never blended into one accuracy number. Data: MoneyPuck.com.