How the model works

A plain-English explanation of how every prediction you see is produced.

The pipeline at a glance

Each prediction goes through three stages. The first two produce probabilities; the third nudges a small subset of fixtures based on a league-cultural signal. Item 4 below is not a further stage: it is a retrain of the stage-2 stacker with extra features.

  1. v0.1 — Dixon-Coles base. A statistical model that estimates each team's attack strength and defence weakness from historical scorelines, weighted toward recent matches.
  2. v0.2 — XGBoost stacker. A gradient-boosted model that takes v0.1's output as a starting point, then adjusts it using ~77 context features for that specific fixture. Two cohort-specific stackers:
    • top cohort — Premier League, Bundesliga, Serie A, La Liga, Ligue 1
    • english cohort — Championship, League One, League Two
    Splitting prevents the noisier lower English tiers from dragging down the top European leagues.
  3. v0.3 — draw post-processor. Only fires on coin-flip fixtures (where two outcomes are within ~8% of each other) in the defensive/parity leagues — La Liga, Serie A, Ligue 1, League One. Uses recent form, blown-lead rate, and league position to decide whether the fixture is genuinely draw-prone.
  4. v0.4 — cross-season residual features. Same architecture as v0.3, with the stacker retrained on 10 extra features that capture rolling 10-match attack and defensive xG residuals (over- or under-finishing relative to xG, including hot-keeper signals). The window backfills from the prior season when the current-season match count is below 10, so early-season cold starts use last season's tail. Captures the Bournemouth/Chelsea finishing-quality phenomenon that Elo can't see on its own.
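
To make stages 1 and 3 concrete, here is a minimal sketch of how Dixon-Coles goal rates can be turned into home/draw/away probabilities, followed by a coin-flip check. It is an illustration, not our production code. The names outcome_probs and is_coin_flip, the rho value, and the goal cap are assumptions made for the example. We also assume "within ~8%" means the two most likely outcomes are within 0.08 of each other. Fitting the attack and defence ratings (with recency weighting) is not shown.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def dc_tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles low-score correction (x = home goals, y = away goals)."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def outcome_probs(lam: float, mu: float, rho: float = -0.1, max_goals: int = 10) -> dict:
    """Sum the corrected scoreline grid into home/draw/away probabilities.

    lam and mu are the expected home and away goals; rho and max_goals
    are illustrative defaults, not fitted values.
    """
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = dc_tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    total = home + draw + away  # renormalise after truncating the grid
    return {"home": home / total, "draw": draw / total, "away": away / total}


def is_coin_flip(probs: dict, margin: float = 0.08) -> bool:
    """True when the two most likely outcomes are within `margin` (assumed reading)."""
    top, second = sorted(probs.values(), reverse=True)[:2]
    return top - second <= margin
```

In this sketch the post-processor would only be consulted when is_coin_flip returns True and the fixture is in one of the targeted leagues.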

Important features (plain English)

XGBoost ranks features by how much they reduce error. Here are the ones that drive the most lift, translated into football terms.

Elo rating differential
The classic chess-style rating, adapted for football: how much stronger one team is than the other right now. Updates after every match. Includes home advantage (~80 Elo points). Top driver across leagues.
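
For readers who want to see the arithmetic, a minimal sketch of an Elo differential with home advantage follows. The 80-point home advantage comes from the description above. The 400-point scale and the K-factor of 20 are the conventional Elo defaults and are assumed here; they are not our tuned values. The function names are made up for the example.

```python
def expected_home_score(home_elo: float, away_elo: float, home_advantage: float = 80.0) -> float:
    """Expected score (1 = win, 0.5 = draw, 0 = loss) for the home side."""
    diff = home_elo + home_advantage - away_elo
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


def update_elo(home_elo: float, away_elo: float, home_score: float,
               k: float = 20.0, home_advantage: float = 80.0) -> tuple:
    """Return both ratings after a match; k = 20 is an assumed K-factor."""
    expected = expected_home_score(home_elo, away_elo, home_advantage)
    delta = k * (home_score - expected)
    # Zero-sum update: whatever the home side gains, the away side loses.
    return home_elo + delta, away_elo - delta
```

With two equally rated teams, the home side's expected score in this sketch is about 0.61, so a home win moves the ratings less than an away win does.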
Expected goals (xG) — last 5 / 10 matches
The quality of chances each team has been creating and conceding lately. Better than goals because it strips out luck — a team can lose 3-0 having generated more xG than the opponent.
Form points (last 5 / 10)
Total points won recently (3 for a win, 1 for a draw, 0 for a loss). A standard measure; the rolling windows of 5 and 10 give the model both short- and medium-term pictures.
Form trajectory (5 vs 10)
The direction of travel — is this team accelerating up or sliding down? Computed as last 5 minus matches 6-10. Captures Bournemouth-style mid-season turnarounds.
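
The form points and trajectory calculations above can be sketched in a few lines. The input format (a list of "W"/"D"/"L" strings, most recent first) and the function names are assumptions for the example.

```python
POINTS = {"W": 3, "D": 1, "L": 0}


def form_points(results: list, window: int) -> int:
    """Points from the most recent `window` matches (results are newest first)."""
    return sum(POINTS[r] for r in results[:window])


def form_trajectory(results: list) -> int:
    """Last 5 minus matches 6-10: positive means the team is improving."""
    last_five = form_points(results, 5)
    previous_five = sum(POINTS[r] for r in results[5:10])
    return last_five - previous_five
```

For example, a team with four wins and a draw in its last five, after one win and one draw in the five before that, has a trajectory of 13 − 4 = 9.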
Key player absences
How many of the team's regular starters are unavailable, weighted by how recently they were starting. Availability is snapshotted as of kickoff, so we don't peek at injury news that came out afterwards.
Manager tenure
How long the current head coach has been in charge. Brand-new coaches dampen the model's confidence in form features, because results earned under the previous coach say less about the current team.
Lineup strength
When the actual starting XI is available pre-match, we score it against each player's season-long performance. Strong XI vs weak XI shifts the probability accordingly.
League position context
Top-6, mid-table, or bottom-6. Used by the draw post-processor in La Liga, Serie A and Ligue 1 (PD/SA/FL1), where bottom-6 teams that have been blowing leads are statistically more draw-prone.
Bottle coefficient
A team's tendency to blow late leads. Computed from goal-by-goal timing data across recent matches. Currently used only by the post-processor: we bench-tested it inside the v0.2 stacker, where it didn't add value.
Attack residual (last 10, cross-season), added in v0.4
Σ (goals scored − xG) averaged over the team's last 10 matches, including carry-over from prior season when current count is below 10. Positive means a team has been over-finishing recently (Bournemouth-type clinical streaks); negative means under-finishing (Chelsea-type wasteful spells). Catches what Elo and goal-totals miss.
Defensive residual (last 10, cross-season), added in v0.4
Σ (xG against − goals conceded), same window. Positive means the team has been conceding fewer goals than xG suggests (good defending or hot keeper); negative means leakier than xG implies. Spots persistent goalkeeper form and last-ditch-defending streaks.
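
A minimal sketch of the two residuals with prior-season backfill follows. The MatchStat record, the oldest-first ordering, and the function names are assumptions for the example. Returning zero when there is no history at all is also an assumption.

```python
from typing import NamedTuple, Sequence


class MatchStat(NamedTuple):
    goals_for: int
    xg_for: float
    goals_against: int
    xg_against: float


def window_with_backfill(current: Sequence, prior: Sequence, size: int = 10) -> list:
    """Most recent `size` matches, topped up from the prior season's tail.

    Both sequences are ordered oldest first.
    """
    recent = list(current[-size:])
    missing = size - len(recent)
    if missing > 0 and prior:
        recent = list(prior[-missing:]) + recent
    return recent


def residuals(current: Sequence, prior: Sequence, size: int = 10) -> tuple:
    """Return (attack residual, defensive residual) averaged over the window."""
    window = window_with_backfill(current, prior, size)
    if not window:
        return 0.0, 0.0  # assumed neutral default when no matches exist
    n = len(window)
    attack = sum(m.goals_for - m.xg_for for m in window) / n
    defence = sum(m.xg_against - m.goals_against for m in window) / n
    return attack, defence
```

Four matches into a new season, this window would hold those four matches plus the last six from the previous season.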
Bookmaker odds (when available)
Used for the "spicy pick" — finding longer-priced outcomes the model rates higher than the market. Not used by the main outcome prediction.
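
As a rough illustration of the spicy pick idea, the sketch below converts decimal odds into market probabilities and looks for a longer-priced outcome the model rates higher. The thresholds min_odds and min_edge, the proportional removal of the bookmaker margin, and the function names are all assumptions for the example; they are not our actual selection rules.

```python
from typing import Optional


def implied_probs(decimal_odds: dict) -> dict:
    """Market probabilities from decimal odds, with the margin removed proportionally."""
    raw = {outcome: 1.0 / price for outcome, price in decimal_odds.items()}
    total = sum(raw.values())
    return {outcome: p / total for outcome, p in raw.items()}


def spicy_pick(model_probs: dict, decimal_odds: dict,
               min_odds: float = 3.0, min_edge: float = 0.03) -> Optional[str]:
    """Longer-priced outcome with the biggest model-over-market edge, if any."""
    market = implied_probs(decimal_odds)
    candidates = []
    for outcome, price in decimal_odds.items():
        edge = model_probs[outcome] - market[outcome]
        # Only consider long prices where the model is meaningfully above the market.
        if price >= min_odds and edge >= min_edge:
            candidates.append((edge, outcome))
    if not candidates:
        return None
    return max(candidates)[1]
```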

How we measure ourselves

What we deliberately don't do

Model changelog & transparency

Each fixture in the history page carries the model variant that produced its pick (look for the (i) hover popup). Once a fixture kicks off, its prediction is locked — future model upgrades never retroactively change historical picks.

v0.6 — current production (May 2026)

What changed: Same v0.5 hybrid architecture (XGBoost + LogisticRegression blend) plus 10 nothing-to-play-for binary flags per fixture and a continuous season_phase axis. The flags fire when a team is mathematically locked into title-won, relegated, auto-promoted, playoff-locked-no-auto, or safe-no-climb — encoding end-of-season motivation that bookmakers often misprice.
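
One way to picture these features is the sketch below. It assumes the 10 flags are the five locked states listed above, encoded once for the home team and once for the away team, and that season_phase is the fraction of the season already played. Both readings are assumptions for the example, as are the names. Deciding whether a team is mathematically locked is not shown.

```python
from typing import Optional

STATES = (
    "title_won",
    "relegated",
    "auto_promoted",
    "playoff_locked_no_auto",
    "safe_no_climb",
)


def motivation_features(home_state: Optional[str], away_state: Optional[str],
                        matchweek: int, total_matchweeks: int) -> dict:
    """Ten binary flags (5 states x 2 teams) plus a continuous season_phase."""
    features = {}
    for side, state in (("home", home_state), ("away", away_state)):
        for name in STATES:
            # A team with something still to play for has state None: all flags 0.
            features[f"{side}_{name}"] = int(state == name)
    features["season_phase"] = matchweek / total_matchweeks
    return features
```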

v0.5 — hybrid stacker (May 6 2026 → May 7 2026)

What changed: Replaced the single XGBoost stacker with an XGBoost + LogisticRegression hybrid blended by live league-relative dominance (ppm_z). LR extrapolates linearly past XGB's response-curve plateau, which rescues the model on extreme fixtures (Bayern-class dominance) where trees compress predictions.
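
The blending idea can be sketched as follows. This is a guess at the general shape, not the production formula: we assume ppm_z is a z-score of points per match within the league, and that the LR weight rises smoothly with the size of the dominance gap. The sigmoid weight, its midpoint and steepness, and the function names are invented for the example.

```python
import math
import statistics


def ppm_z(team_ppm: float, league_ppms: list) -> float:
    """Assumed definition: z-score of a team's points per match within its league."""
    mean = statistics.fmean(league_ppms)
    std = statistics.pstdev(league_ppms)
    return (team_ppm - mean) / std if std > 0 else 0.0


def lr_weight(dominance_z: float, midpoint: float = 1.5, steepness: float = 2.0) -> float:
    """Weight given to LogisticRegression; near 0 for even fixtures, near 1 for mismatches."""
    return 1.0 / (1.0 + math.exp(-steepness * (abs(dominance_z) - midpoint)))


def blend(p_xgb: dict, p_lr: dict, dominance_z: float) -> dict:
    """Convex mix of the two models' outcome probabilities."""
    w = lr_weight(dominance_z)
    mixed = {k: (1.0 - w) * p_xgb[k] + w * p_lr[k] for k in p_xgb}
    total = sum(mixed.values())
    return {k: v / total for k, v in mixed.items()}
```

In this sketch an evenly matched fixture is driven almost entirely by XGBoost, while a fixture with a very large dominance gap leans on the linear model.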

v0.4 — cross-season residual stacker (May 2026)

What changed: Same two-stage model as v0.3 (DC base + XGB stacker), followed by the draw post-processor rule. Stacker retrained with 10 extra cross-season residual features: rolling 10-match attack residual (goals − xG) and defensive residual (xGA − goals against), per team, with prior-season backfill when the current-season count is below 10.

v0.3 — production May 2026 (briefly)

What changed: Cohort-split XGBoost stackers (top vs english) replaced a single 8-league stacker, plus a league-targeted draw post-processor.

v0.2 — single-cohort stacker (deprecated same-day)

What changed: First stacker release. A single shared XGBoost model on top of v0.1, trained across all 8 leagues.

v0.1 — Dixon-Coles base (Feb 2026 → May 2026)

What it did: Per-league Dixon-Coles fit on attack/defence ratings, blended with a cross-league XGBoost on rolling features (60/40), then wrapped with a multinomial logistic calibration on prior backtest pairs.

What we deliberately tried and rejected

What's next

Two-stage stack inspired by Dixon-Coles (1997) plus modern gradient boosting.