Machine Learning · Formula 1

F1 Race Predictor

What it does

Prediction pipeline

Data ingestion — Qualifying results (OpenF1), historical race outcomes 2014–2024 (FastF1), circuit data, and live weather (Open-Meteo).

Feature engineering — Driver form metrics, team performance trends, grid position advantage, and weather-adjusted pace estimates.

Ensemble inference — Three XGBoost binary classifiers (one per podium position) combined with a PyTorch MLP with driver and team embeddings, stacked via logistic regression.

Probability calibration — Isotonic regression calibration for reliable probability outputs across all podium positions.

Dashboard delivery — Results served via a Streamlit web app with race-by-race probability breakdowns.
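As an illustration of the feature engineering step above, here is a minimal sketch of one possible driver form metric: the mean finishing position over a driver's previous races. The function name, the window size, and the toy data are assumptions made for this example, not the project's actual features. Only races before the current one feed the value, so the feature cannot leak the outcome it is meant to predict.

```python
from collections import defaultdict, deque


def rolling_driver_form(results, window=3):
    """Mean finishing position over each driver's previous `window` races.

    `results` is an iterable of (race_index, driver, position) tuples.
    Returns {(race_index, driver): form}, with None when there is no history.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    form = {}
    for race_index, driver, position in sorted(results):
        past = history[driver]
        # Compute the feature BEFORE adding the current result (no leakage).
        form[(race_index, driver)] = sum(past) / len(past) if past else None
        past.append(position)
    return form


# Hypothetical toy data: three races, two drivers.
results = [
    (1, "DRIVER_A", 1), (1, "DRIVER_B", 4),
    (2, "DRIVER_A", 3), (2, "DRIVER_B", 2),
    (3, "DRIVER_A", 2), (3, "DRIVER_B", 6),
]
form = rolling_driver_form(results, window=2)
```

With a window of 2, the race 3 value for DRIVER_A averages the finishes from races 1 and 2 only.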

Model architecture

XGBoost layer

3 binary classifiers (P1, P2, P3) trained independently on engineered features
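One way to set up three independent binary classifiers is to derive one 0/1 target per podium slot from the finishing order. The sketch below assumes each target means "finished exactly in that position"; the source does not say whether the labels are exact-position or cumulative (for example "finished in the top 3"), and the helper name is hypothetical.

```python
def podium_targets(finishing_positions):
    """Build one binary label dict per podium slot.

    `finishing_positions` maps driver -> finishing position for one race.
    Assumption: the P1 label is 1 only for the winner, P2 only for second,
    P3 only for third.
    """
    return {
        f"P{slot}": {
            driver: int(position == slot)
            for driver, position in finishing_positions.items()
        }
        for slot in (1, 2, 3)
    }


# Hypothetical race result.
race = {"DRIVER_A": 2, "DRIVER_B": 1, "DRIVER_C": 3, "DRIVER_D": 7}
targets = podium_targets(race)
```

Each of the three label sets would then train its own classifier on the same engineered features.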

PyTorch layer

MLP with learnable driver and constructor embeddings — captures latent team/driver identity

Stacking

Logistic regression meta-learner combines XGBoost and PyTorch outputs
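The stacking idea can be sketched in plain Python: a logistic regression takes the two base-model probabilities as its only inputs and learns how to weight them. This version fits with batch gradient descent so that it needs no dependencies. The project presumably uses a library implementation; the function names, learning rate, and toy numbers here are all assumptions. In practice the meta-learner is usually fitted on held-out or out-of-fold base predictions so that it does not reward overfit base models.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on base-model probabilities (batch gradient descent)."""
    n_features = len(base_probs[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(labels)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(base_probs, labels):
            # Gradient of the log loss with respect to the linear score.
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_meta(weights, bias, x):
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical held-out predictions: (xgboost_prob, mlp_prob) per driver.
base_probs = [(0.9, 0.8), (0.7, 0.9), (0.8, 0.6), (0.2, 0.3), (0.1, 0.2), (0.3, 0.1)]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_learner(base_probs, labels)
high = predict_meta(weights, bias, (0.85, 0.8))
low = predict_meta(weights, bias, (0.15, 0.2))
```

When both base models agree on a high probability the stacked output is high, and when both are low it is low; the learned weights decide how disagreements are resolved.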

Calibration

Isotonic regression post-processing for well-calibrated probability estimates
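Isotonic calibration fits a non-decreasing mapping from raw model scores to observed outcome frequencies. The sketch below implements the standard pool-adjacent-violators procedure in plain Python and applies it as a step function. It is an illustration under simplifying assumptions (no tied scores, no interpolation between blocks), not the project's code, which would more likely rely on a library implementation.

```python
def fit_isotonic(scores, labels):
    """Pool adjacent violators; returns [(upper_score, calibrated_value), ...]."""
    blocks = []  # each block: [sum of labels, count, largest score in block]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([float(label), 1, score])
        # Merge backwards while the previous block's mean exceeds the current one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def calibrate(model, score):
    """Step-function lookup: value of the first block whose upper bound covers the score."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]  # scores above the training range take the last value


# Hypothetical raw scores and outcomes from a validation set.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [0, 1, 0, 0, 1, 1]

model = fit_isotonic(scores, labels)
calibrated = calibrate(model, 0.25)
```

In this toy data the scores 0.2, 0.3, and 0.4 carry one positive outcome between them, so they are pooled into a single block with a calibrated value of one third.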

Data sources

FastF1: Historical race results, lap times, and telemetry (2014–2024 seasons)
OpenF1: Live qualifying data for upcoming races
Open-Meteo: Circuit weather conditions (temperature, rain probability, wind speed)
PRAW + RoBERTa: Optional Reddit sentiment analysis on pre-race driver and team news
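To illustrate the weather source, the sketch below builds a request URL for the Open-Meteo forecast endpoint covering the three listed quantities. The hourly variable names are believed to match the public Open-Meteo API but should be checked against its current documentation. The helper name and the approximate Monza coordinates are assumptions for this example, and no network request is made.

```python
from urllib.parse import urlencode

OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"


def build_weather_url(latitude, longitude):
    """Return a forecast URL for temperature, rain probability, and wind speed."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        # Assumed variable names; verify against the Open-Meteo docs.
        "hourly": "temperature_2m,precipitation_probability,wind_speed_10m",
    }
    # Keep commas unescaped so the variable list stays readable.
    return f"{OPEN_METEO_FORECAST}?{urlencode(params, safe=',')}"


url = build_weather_url(45.62, 9.28)  # approximate coordinates for Monza
```

The resulting URL can be fetched with any HTTP client, and the JSON response joined to the race features by circuit and session time.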

Evaluation

Brier score, log loss, ROC-AUC, winner accuracy, podium overlap
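Several of these metrics are simple enough to sketch directly. Brier score and log loss follow their standard definitions. The source does not define "winner accuracy" or "podium overlap", so the versions here are assumptions: whether the driver with the highest win probability actually won, and the share of the predicted top three who finished on the podium. ROC-AUC is omitted for brevity.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def winner_correct(win_probs, actual_winner):
    """Assumed definition: did the highest-probability driver win?"""
    return max(win_probs, key=win_probs.get) == actual_winner


def podium_overlap(podium_probs, actual_podium):
    """Assumed definition: fraction of the predicted top three on the real podium."""
    predicted = sorted(podium_probs, key=podium_probs.get, reverse=True)[:3]
    return len(set(predicted) & set(actual_podium)) / 3


# Hypothetical single-race example.
win_probs = {"DRIVER_A": 0.5, "DRIVER_B": 0.3, "DRIVER_C": 0.15, "DRIVER_D": 0.05}
won = {"DRIVER_A": 1, "DRIVER_B": 0, "DRIVER_C": 0, "DRIVER_D": 0}
drivers = list(win_probs)

brier = brier_score([win_probs[d] for d in drivers], [won[d] for d in drivers])
loss = log_loss([win_probs[d] for d in drivers], [won[d] for d in drivers])
hit = winner_correct(win_probs, "DRIVER_A")

podium_probs = {"DRIVER_A": 0.9, "DRIVER_B": 0.7, "DRIVER_C": 0.4, "DRIVER_D": 0.6}
overlap = podium_overlap(podium_probs, ["DRIVER_A", "DRIVER_B", "DRIVER_C"])
```

Winner accuracy over a season would be the mean of `winner_correct` across races, and podium overlap would be averaged the same way.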