What it does
Prediction pipeline
1. Data ingestion: Qualifying results (OpenF1), historical race outcomes 2014–2024 (FastF1), circuit data, and live weather (Open-Meteo).
2. Feature engineering: Driver form metrics, team performance trends, grid position advantage, and weather-adjusted pace estimates.
3. Ensemble inference: Three XGBoost binary classifiers (one per podium position) combined with a PyTorch MLP with driver and team embeddings, stacked via logistic regression.
4. Probability calibration: Isotonic regression calibration for reliable probability outputs across all podium positions.
5. Dashboard delivery: Results served via a Streamlit web app with race-by-race probability breakdowns.
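The stacking in step 3 can be illustrated with a minimal, standard-library-only sketch. It assumes each base model has already produced an out-of-fold win probability per driver. The data, the function names (`fit_logistic`, `predict`), and the gradient-descent fitting are illustrative assumptions, not the project's actual code, which presumably uses a library implementation of logistic regression.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss with respect to the linear score
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Stacked probability for one row of base-model outputs."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up out-of-fold P1 probabilities: [xgboost_p1, mlp_p1] per driver-race row
base_preds = [
    [0.70, 0.60], [0.20, 0.30], [0.10, 0.05], [0.55, 0.65],
    [0.15, 0.25], [0.80, 0.75], [0.30, 0.20], [0.05, 0.10],
]
won = [1, 0, 0, 1, 0, 1, 0, 0]

weights, bias = fit_logistic(base_preds, won)
strong = predict(weights, bias, [0.75, 0.70])  # both base models confident
weak = predict(weights, bias, [0.10, 0.15])    # both base models doubtful
```

Fitting the meta-learner on out-of-fold predictions, rather than on predictions for rows the base models were trained on, is the usual way to keep the stack from overfitting.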
Model architecture
- XGBoost layer: 3 binary classifiers (P1, P2, P3) trained independently on engineered features
- PyTorch layer: MLP with learnable driver and constructor embeddings that capture latent team and driver identity
- Stacking: Logistic regression meta-learner combines XGBoost and PyTorch outputs
- Calibration: Isotonic regression post-processing for well-calibrated probability estimates
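The calibration step can be sketched with the pool-adjacent-violators algorithm, which is the standard way to fit an isotonic (non-decreasing) mapping from raw scores to probabilities. This is a simplified standard-library illustration with made-up data; `fit_isotonic` and `calibrate` are hypothetical names. It returns a step function and does not pool tied scores, whereas library implementations typically do both that and interpolate between fitted points.

```python
from bisect import bisect_left

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit.

    Returns (thresholds, values): a non-decreasing step function where
    values[i] applies to scores up to and including thresholds[i].
    """
    blocks = []  # each block: [sum_of_labels, count, max_score_in_block]
    for s, y in sorted(zip(scores, labels)):
        blocks.append([float(y), 1, s])
        # Merge backwards while the previous block's mean exceeds this one's
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = hi
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values

def calibrate(score, thresholds, values):
    """Map a raw score to its calibrated probability (clamped at the ends)."""
    idx = bisect_left(thresholds, score)
    return values[min(idx, len(values) - 1)]

# Made-up stacked scores and observed outcomes (1 = finished in that position)
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
outcomes = [0, 0, 1, 0, 0, 1, 1, 1]

thresholds, values = fit_isotonic(raw_scores, outcomes)
calibrated = calibrate(0.45, thresholds, values)
```

In this toy data the scores 0.3, 0.4, and 0.5 are pooled into one block because their outcomes (1, 0, 0) violate monotonicity, so any score in that range maps to the block's mean outcome.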
Data sources
- FastF1: Historical race results, lap times, and telemetry (2014–2024 seasons)
- OpenF1: Live qualifying data for upcoming races
- Open-Meteo: Circuit weather conditions (temperature, rain probability, wind speed)
- PRAW + RoBERTa (optional): Reddit sentiment analysis on driver and team news before the race
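As an illustration of the weather source, the sketch below builds a request URL for the Open-Meteo forecast endpoint covering the three variables listed above. The hourly variable names reflect the Open-Meteo documentation as best understood here and should be verified before use. The helper `build_weather_url` and the coordinates (roughly Monza) are assumptions for the example, and no request is sent.

```python
from urllib.parse import urlencode

OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"

def build_weather_url(latitude, longitude):
    """Build a forecast URL for hourly temperature, rain probability, and wind speed."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        # Assumed variable names; check against the Open-Meteo docs
        "hourly": "temperature_2m,precipitation_probability,wind_speed_10m",
        "timezone": "auto",
    }
    return f"{OPEN_METEO_FORECAST}?{urlencode(params)}"

# Approximate circuit coordinates, for illustration only
url = build_weather_url(45.62, 9.28)
```

The resulting URL can be fetched with any HTTP client, and the JSON response then reduced to the hours around the race start.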
Evaluation
- Brier score
- Log loss
- ROC-AUC
- Winner accuracy
- Podium overlap
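The metrics above can be sketched in plain Python. Brier score, log loss, and ROC-AUC follow their standard definitions. The source does not define winner accuracy or podium overlap, so the versions here are assumptions: winner accuracy is taken as the share of races where the predicted winner matches the actual one, and podium overlap as the fraction of the three predicted podium drivers who actually finished on the podium, regardless of order.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)

def roc_auc(probs, outcomes):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def winner_accuracy(predicted_winners, actual_winners):
    """Assumed definition: share of races with the winner predicted correctly."""
    hits = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return hits / len(actual_winners)

def podium_overlap(predicted_podium, actual_podium):
    """Assumed definition: order-insensitive overlap of the two top-3 sets."""
    return len(set(predicted_podium) & set(actual_podium)) / 3

# Made-up win probabilities for four drivers in one race; driver "A" won
win_probs = [0.5, 0.25, 0.25, 0.0]
won = [1, 0, 0, 0]

brier = brier_score(win_probs, won)
loss = log_loss(win_probs, won)
auc = roc_auc(win_probs, won)
accuracy = winner_accuracy(["A", "B"], ["A", "C"])
overlap = podium_overlap(["A", "B", "C"], ["B", "A", "D"])
```

Brier score and log loss reward calibrated probabilities, while ROC-AUC, winner accuracy, and podium overlap only check ranking or top picks, so the two groups can disagree about which model is better.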