METHODOLOGY
EWA attributes each possession's win-probability swing across the ten players on the court via ridge regression. The pregame projection layer aggregates those ratings into team-level forecasts. Below: rolling-origin backtest, calibration, sensitivity sweeps, lineage, and code links. A plain-English version is on /about.
Roster-aware EWA improves over team-only on the pooled point estimate: Brier −3.58% (CIs exclude zero in 4/4 folds), log-loss −2.66% (4/4), margin RMSE −1.59% (3/4). The accuracy gain is +3.40 pp pooled and positive in every fold, but per-fold n = 400-440 underpowers the individual accuracy CIs. Market odds average 67.7% accuracy across folds — reported as a benchmark, not a target.
The 2024-25 fold (n_train = 5,822, n_test = 401) is shown as a representative slice. All five models are fit on the same train games and scored on the same test games. EWA uses the roster-aware aggregate (each team's most recent 30 train games). The three other folds (2021-22, 2022-23, 2023-24) show the same shape — see the rolling-origin table below for per-fold deltas. Lower is better for Brier and margin RMSE; higher is better for accuracy. Bracketed numbers are 95% bootstrap CIs.
| Model | Brier | Accuracy | Margin RMSE |
|---|---|---|---|
| Naive (50/50) | 0.2500 [0.250, 0.250] | 50% (expected) | 15.75 [14.7, 16.7] |
| Home court only | 0.2456 [0.241, 0.251] | 56.9% [51.9, 61.6] | 15.58 [14.5, 16.5] |
| Team identity (no players) | 0.2451 [0.240, 0.251] | 58.1% [53.4, 62.8] | 15.56 [14.5, 16.6] |
| EWA (roster-aware) | 0.2365 [0.228, 0.244] | 59.4% [54.4, 64.3] | 15.31 [14.2, 16.3] |
| Market (Vegas, de-vigged) | 0.2011 [0.184, 0.218] | 67.3% [62.6, 72.1] | N/A |
Market is included as benchmark/context. The accuracy gap (~8 pp pooled across folds) reflects information EWA does not use — line movement, sharp action, real-time injuries. We don't try to close it on this page.
Across the 4 folds, here is how often EWA's improvement over team-only is statistically distinguishable from zero. The CIs are paired-bootstrap CIs computed within each individual fold (1,000 resamples, n ≈ 400-440 per fold).
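The paired-bootstrap delta reduces to a few lines. This is a minimal illustration using Brier as the metric; the function and argument names are ours, not the harness's:

```python
import numpy as np

def paired_bootstrap_delta(p_base, p_model, y, n_boot=1000, seed=0):
    """Mean Brier improvement of p_model over p_base, with a 95% CI.

    Games are resampled with replacement; each game keeps both of its
    predictions, so every resample compares the two models on the same
    games (the "paired" part). Toy sketch, not the production harness.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    base_sq = (np.asarray(p_base, dtype=float) - y) ** 2   # per-game Brier terms
    model_sq = (np.asarray(p_model, dtype=float) - y) ** 2
    per_game_delta = base_sq - model_sq                    # > 0: model beats base
    n = len(y)
    deltas = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                   # resample games, pairs intact
        deltas[b] = per_game_delta[idx].mean()
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return per_game_delta.mean(), (lo, hi)
```

A delta whose (lo, hi) interval sits entirely above zero is what the ✓ marks in the table denote.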
Each row is an independent chronological fold: train strictly on games from prior seasons, test on one season's odds-matched games. The pattern holds across all four cutoffs — Brier and log-loss CI-exclude zero in 4/4 folds, margin RMSE in 3/4. Same direction, same approximate magnitude, every time.
| Test | n_train | n_test | EWA acc | Mkt acc | Δ Brier | Δ Log-loss | Δ RMSE |
|---|---|---|---|---|---|---|---|
| 2021-22 | 2,136 | 417 | 59.2% | 69.8% | +3.95% ✓ | +2.88% ✓ | +1.82% ✓ |
| 2022-23 | 3,366 | 404 | 60.9% | 64.4% | +3.11% ✓ | +2.34% ✓ | +1.23% ✗ |
| 2023-24 | 4,593 | 438 | 57.1% | 69.2% | +3.73% ✓ | +2.79% ✓ | +1.71% ✓ |
| 2024-25 | 5,822 | 401 | 59.4% | 67.3% | +3.51% ✓ | +2.63% ✓ | +1.61% ✓ |
✓ marks deltas whose 95% CI excludes zero within that fold. The five-model comparison above uses the most recent fold (2024-25); the other three cutoffs show the same shape. The roster-aware improvement is not a single-cutoff artifact.
We use a roster-aware recent-usage aggregate, defaulting to each team's last 30 games. Sensitivity checks across 15 / 30 / 45 / 60 games show the EWA signal is strongest in recent windows and fades as older roster usage is included — consistent with roster drift over time. The default of 30 was set as a disciplined mid-window value, not because it dominates any single metric.
| N | EWA Brier | Δ Brier | Δ Log-loss | Δ Margin RMSE |
|---|---|---|---|---|
| 15 | 0.2438 | +2.81% ✓ | +2.11% ✓ | +1.24% ✓ |
| 30 (default) | 0.2449 | +2.39% ✓ | +1.79% ✓ | +0.97% ✓ |
| 45 | 0.2473 | +1.44% ✗ | +1.08% ✗ | +0.62% ✓ |
| 60 | 0.2478 | +1.24% ✗ | +0.93% ✗ | +0.55% ✓ |
✓ marks deltas whose 95% CI excludes zero. The story is robust across recent windows: for 15 ≤ N ≤ 30, all three metrics (Brier, log-loss, margin RMSE) are statistically distinguishable from zero. At N = 45 and 60 the aggregate grows stale and only margin RMSE remains significant. We publish at the default window rather than the best-on-test window.
When EWA says a team has a 65% chance to win, do they actually win about 65% of the time? Each dot below is a probability bin from the held-out games — predicted on the x-axis, actual win rate on the y-axis. Perfect calibration is the dashed diagonal. Dot size shows games per bin.
Central bins are the populated ones in this fold (n = 93, 181, 128, 22). Calibration drifts a little at the high end on this 438-game test set — fewer games per bin means more sampling noise. We treat calibration as a property to monitor across runs, not a single number.
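The binning behind the reliability plot is simple enough to sketch. The bin edges below are illustrative, not the site's actual binning:

```python
import numpy as np

def calibration_bins(p_pred, y, edges=(0.0, 0.35, 0.5, 0.65, 0.8, 1.0)):
    """Bucket predictions and compare mean prediction to actual win rate.

    Returns (mean_pred, win_rate, n) per populated bin; a calibrated
    model has mean_pred close to win_rate in every bin, which is the
    dashed diagonal in the plot.
    """
    p_pred = np.asarray(p_pred, dtype=float)
    y = np.asarray(y, dtype=float)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p_pred >= lo) & (p_pred < hi)
        if mask.any():  # skip empty bins rather than divide by zero
            rows.append((p_pred[mask].mean(), y[mask].mean(), int(mask.sum())))
    return rows
```

Sparse high-probability bins are exactly where the per-bin win rate gets noisy, which is why we monitor calibration across runs instead of quoting one number.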
The simplest impact stat is raw plus-minus — point differential while a player is on the court. It looks honest and breaks immediately. In recent seasons, players like Payton Pritchard and Luke Kornet have posted higher raw on-court plus-minus than Stephen Curry, Giannis Antetokounmpo, and Luka Dončić. Not because they generate more impact — because they happen to share the floor with stars on winning teams.
Ridge regression with player-level controls is what fixes this. EWA splits credit in a way that controls for teammates and opponents, so a strong rotation player on a great team doesn't inherit his teammates' impact. That's the attribution layer. Shrinkage then ensures small-sample players don't ride a hot streak to the top of the rankings.
Nikola Jokić's rate over the last three seasons is +8.16 EWA per 100 possessions. Decomposed by role, 84% of that comes from assisting — not scoring, not rebounding. His best pair with Jamal Murray adds +1.4 EWA together; strong, but they underperform what you'd expect from stacking their individual numbers. That's the kind of read no box score or single-number metric gives you.
A sequence model trained on play-by-play estimates win probability after every event. The change in win probability across each possession (WPA) is the unit of credit.
A regularized regression splits each possession’s WPA across the ten players on court while controlling for teammates, opponents, and home court. This is the regularized adjusted plus-minus tradition (Sill 2010), with role-aware interactions added on top.
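A minimal sketch of that attribution step, assuming a plain offense/defense design matrix. The production model in unified_scores.py also carries home-court and role-aware interaction columns, which are omitted here:

```python
import numpy as np

def ridge_attribution(stints, wpa, n_players, alpha=5000.0):
    """RAPM-style split of per-possession WPA across players on court.

    `stints` is a list of (offense_ids, defense_ids) per possession;
    `wpa` is that possession's win-probability swing from the offense's
    perspective. Offense players get +1, defenders -1, and the ridge
    penalty (alpha) keeps collinear lineups from producing wild
    coefficients.
    """
    X = np.zeros((len(stints), n_players))
    for i, (off, deff) in enumerate(stints):
        X[i, list(off)] = 1.0
        X[i, list(deff)] = -1.0
    y = np.asarray(wpa, dtype=float)
    # Closed-form ridge: (X'X + alpha * I)^-1 X'y
    A = X.T @ X + alpha * np.eye(n_players)
    return np.linalg.solve(A, X.T @ y)
```

Lineups overlap heavily (teammates appear in the same rows almost every possession), so the unpenalized least-squares solution is ill-conditioned; the ridge penalty is what makes the per-player split stable.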
Players with few possessions get pulled toward the population mean by both a count-based shrinkage (count / (count + k)) and an Empirical Bayes step. This is what keeps a 100-possession rookie from showing up next to Jokić on the leaderboard.
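The count-based part of that shrinkage is one line; k below is an illustrative pseudo-count, not the production value:

```python
def shrink_to_mean(raw_rating, possessions, pop_mean=0.0, k=1500):
    """Count-based shrinkage: a fraction count / (count + k) of the raw
    rating survives; the rest is pulled toward the population mean.
    With few possessions the weight is near 0 (rating collapses to the
    mean); with many it approaches 1 (rating stands on its own).
    """
    w = possessions / (possessions + k)
    return w * raw_rating + (1 - w) * pop_mean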
For game prediction, per-team EWA aggregates use each team's most recent 30 train games — not a static average across the whole training period. This keeps the predictor honest about mid-season trades and roster turnover.
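A toy version of that rolling aggregate (hypothetical helper name; the real aggregation lives in unified_scores.py):

```python
from collections import defaultdict, deque

def rolling_team_ewa(games, window=30):
    """Per-team EWA aggregate over each team's most recent `window`
    train games. `games` is an iterable of (team, ewa_total) pairs
    already sorted chronologically; the bounded deque drops each
    team's oldest games as new ones arrive.
    """
    recent = defaultdict(lambda: deque(maxlen=window))
    for team, ewa in games:
        recent[team].append(ewa)
    return {team: sum(q) / len(q) for team, q in recent.items()}
```

Because the deque is per-team and bounded, a midseason trade shows up in the aggregate within at most `window` games rather than being diluted across a full-season average.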
EWA isn't a new technique. It's an honest reassembly of established methods with a transparent validation harness on top.
- Regularized adjusted plus-minus via ridge regression: the base technique behind EWA's attribution layer.
- Possession-level win-probability swings as a credit signal: EWA inherits this framing rather than the raw point-differential one.
- Statistical / Box Plus-Minus: where role and box-stat information enter as priors. EWA's role-aware interactions are in this tradition.
- EPM and DARKO, the two strongest public predictive metrics: EWA borrows their commitment to chronological holdout testing and roster-aware aggregation.
Reading these openly is the price of asking you to trust the rest. Every limitation below is on the roadmap and labeled in our internal validation reports.
The validation code is open and runnable. The numbers above came from scripts/validate_pregame_prediction.py with --recent-games-per-team 30 on a chronological holdout. The window-sensitivity sweep ran via scripts/sweep_recent_games_window.sh. The attribution math lives in unified_scores.py.
- Validation harness: scripts/validate_pregame_prediction.py
- Engine + ridge attribution: unified_scores.py
- Robustness sweep: ridge alpha (2,500 / 5,000 / 7,500 / 10,000) and bootstrap seeds across the 4 rolling-origin folds. Demonstrates the result is not a single-hyperparameter or single-seed artifact.
- Replace per-team possession averages with per-player rolling minute estimates. Closes part of the gap to EPM/DARKO's richer minute models.
- Counterfactual calculator: "if Player X is out, EWA moves N points." The most direct expression of EWA's player-level attribution and the natural foundation for a paid analytics tier.
- Daily-refreshed pregame projections that incorporate the day's active rosters and inactives. Today's harness uses recent training data; the live layer uses recent live data.
- Retrain the win-probability model with a strict cutoff before each test window so the WPA labels themselves are leakage-free. The current harness uses the production WP model and discloses that limitation; this closes it.
Plus/minus measures point differential while you're on court. EWA measures how much each possession changed win probability — weighting high-leverage moments more — and then splits credit fairly via ridge regression. Plus/minus conflates your impact with your teammates'.
EWA captures context. A star on a dominant team faces fewer high-leverage possessions because the game state is already stable. The public scores also apply shrinkage, so lower-volume players get pulled toward the middle.
Score artifacts refresh on a daily cadence; the underlying win-probability model is retrained on a slower review cycle. The footer shows the most recent promoted run currently being served.
Market is a fifth column in our validation table — we have multi-season de-vigged moneylines for 1,954 NBA games matched cleanly to game IDs. Across all 4 rolling-origin folds, market accuracy averages ~67.7%; roster-aware EWA averages ~59.2%. The ~8 pp gap is real and reflects information markets have that we don't (sharp action, line movement, real-time injuries). We report it as a benchmark, not a target.
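For context, the basic multiplicative de-vig on a two-way moneyline looks like this (a sketch of the standard normalization; the production pipeline may use a different de-vig method):

```python
def devig_two_way(home_ml, away_ml):
    """Strip the bookmaker's margin from a two-way moneyline.

    Converts American odds to implied probabilities, then normalizes
    them to sum to 1. The raw implied probabilities sum to more than 1;
    that overround is the vig.
    """
    def implied(ml):
        # American odds to implied win probability
        return 100 / (ml + 100) if ml > 0 else -ml / (-ml + 100)
    p_home, p_away = implied(home_ml), implied(away_ml)
    total = p_home + p_away
    return p_home / total, p_away / total
```

A pick-em line of -110 / -110 de-vigs to 50/50, since the bookmaker's margin is split evenly.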
Those are the four most recent NBA seasons where we have both play-by-play data and de-vigged pregame moneylines, and where each fold has a strictly older training set available. The pattern (Brier and log-loss CIs excluding zero, margin RMSE excluding zero in 3/4) holds across every fold tested.
Yes — that's what /predictions is. Every game we predict, you can see what the model said and (after the game) whether it called the winner. Across the four published rolling-origin folds, EWA accuracy averages 59.2%; the de-vigged Vegas market averages 67.7%. EWA beats team-only baselines but doesn't approach the market — Vegas has information we don't (sharp action, line movement, real-time injuries). The page tracks the model's live record so you can see exactly how it's doing.