Leon Górecki Aerospace / Mechanical Engineering
Machine Learning · Formula 1

F1 Race Predictor

What it does

Prediction pipeline

1. Data ingestion: Qualifying results (OpenF1), historical race outcomes 2014–2024 (FastF1), circuit data, and live weather (Open-Meteo).
2. Feature engineering: Driver form metrics, team performance trends, grid position advantage, and weather-adjusted pace estimates.
3. Ensemble inference: Three XGBoost binary classifiers (one per podium position) and a PyTorch MLP with driver and team embeddings, stacked via logistic regression.
4. Probability calibration: Isotonic regression calibrates the stacked outputs so that the probabilities for each podium position are reliable.
5. Dashboard delivery: Results served via a Streamlit web app with race-by-race probability breakdowns.
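
The calibration step names isotonic regression. The idea can be sketched with the pool-adjacent-violators algorithm in plain Python. This is an illustrative sketch, not the project's code (a library implementation is the more likely choice in practice); `fit_isotonic`, `calibrate`, and the toy inputs in the usage note are hypothetical.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to outcome rates."""
    pairs = sorted(zip(scores, labels))
    # Each block stores [sum of labels, number of points, largest score in block].
    blocks = []
    for score, label in pairs:
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [block[2] for block in blocks]
    values = [block[0] / block[1] for block in blocks]
    return uppers, values


def calibrate(model, score):
    """Return the fitted value of the block the score falls into (clamped at the ends)."""
    uppers, values = model
    index = bisect_left(uppers, score)
    return values[min(index, len(values) - 1)]
```

For example, `model = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])` pools the out-of-order middle points into one block, and `calibrate(model, 0.3)` then returns that block's observed rate. The sketch maps scores to a step function and does not interpolate between blocks; it would normally be fitted on held-out predictions rather than on the training set.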

Model architecture

XGBoost layer: 3 binary classifiers (P1, P2, P3) trained independently on engineered features
PyTorch layer: MLP with learnable driver and constructor embeddings that capture latent team/driver identity
Stacking: Logistic regression meta-learner combines XGBoost and PyTorch outputs
Calibration: Isotonic regression post-processing for well-calibrated probability estimates
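
To make the stacking layer concrete, here is a minimal sketch of a logistic regression meta-learner fitted on base-model probabilities, written in plain Python with batch gradient descent. It assumes one XGBoost probability and one MLP probability per driver for a single podium position; the function names, hyperparameters, and any data passed in are hypothetical and not taken from the project.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights on rows of [p_xgboost, p_mlp]."""
    n_features = len(base_probs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(base_probs, labels):
            # Gradient of the log loss with respect to the linear score.
            error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - y
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stacked_probability(weights, bias, row):
    """Combine one driver's base-model probabilities into a single probability."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)
```

In a stacking setup the meta-learner is usually trained on out-of-fold predictions from the base models, so that it does not simply learn to trust whichever base model overfits the training data most.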

Data sources

  • FastF1: Historical race results, lap times and telemetry (2014–2024 seasons)
  • OpenF1: Live qualifying data for upcoming races
  • Open-Meteo: Circuit weather conditions (temperature, rain probability, wind speed)
  • PRAW + RoBERTa (optional): Reddit sentiment analysis on driver and team news pre-race
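
As an illustration of the weather source, the sketch below builds an Open-Meteo forecast request URL for a circuit's coordinates without sending it. The endpoint and the hourly variable names (`temperature_2m`, `precipitation_probability`, `wind_speed_10m`) reflect my understanding of the public forecast API and should be checked against its documentation; `build_weather_url` is a hypothetical helper, not the project's code.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Open-Meteo forecast API.
OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"


def build_weather_url(latitude, longitude):
    """Build a forecast URL requesting temperature, rain probability and wind speed."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        # Assumed hourly variable names; keep commas unescaped for readability.
        "hourly": "temperature_2m,precipitation_probability,wind_speed_10m",
    }
    return f"{OPEN_METEO_FORECAST}?{urlencode(params, safe=',')}"
```

Calling `build_weather_url(45.62, 9.28)` (approximate coordinates for Monza) yields a URL that can be fetched with any HTTP client; the response is JSON with one array per requested hourly variable.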

Evaluation

  • Brier score
  • Log loss
  • ROC-AUC
  • Winner accuracy
  • Podium overlap
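
Several of these metrics are simple enough to sketch in plain Python. Brier score and log loss follow their standard definitions. The page does not define winner accuracy or podium overlap, so the versions below are assumptions: the share of races where the predicted winner actually won, and the share of the actual podium that appears in the predicted top three, regardless of order.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def winner_accuracy(predicted_winners, actual_winners):
    """Assumed definition: fraction of races with a correctly predicted winner."""
    hits = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return hits / len(actual_winners)


def podium_overlap(predicted_podium, actual_podium):
    """Assumed definition: order-independent share of the podium predicted correctly."""
    return len(set(predicted_podium) & set(actual_podium)) / 3
```

With placeholder driver labels, `podium_overlap(["A", "B", "C"], ["B", "A", "D"])` counts two shared drivers out of three. Lower is better for Brier score and log loss; higher is better for the other two.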