Leon Górecki · Aerospace / Mechanical Engineering

BEng Thesis: PPO vs Robust MPC on a Simulated Roomba

Controller benchmark in simulation: trajectory tracking, obstacle avoidance, and robustness under noise and disturbances.

BEng thesis — Warsaw University of Technology, Nov 2025. Supervisor: Prof. Marcin Żugaj, DSc, Eng. Source: Chapters 5–6.


What this thesis is about

A simulated ground robot navigates cluttered indoor environments, following a reference path to a goal while avoiding obstacles. The thesis asks one question: when satellite (GNSS) positioning is denied mid-run, which controller holds up better — Tube Robust MPC, with formal safety guarantees, or PPO, a neural network trained through reinforcement learning? Both were benchmarked head-to-head across the same 500 environments under nominal and GNSS-denied conditions.

Controllers at a glance

RMPC
Tube Robust MPC
  • Predicts future states using an explicit dynamics model
  • Tube formulation bounds the effect of noise — hard constraints guaranteed (see the sketch below)
  • Conservative by design; higher online compute cost
PPO
Proximal Policy Optimisation (PPO)
  • Policy learned through simulated trial-and-error — no explicit model
  • Single neural-network forward pass; 63× faster online than RMPC
  • No formal stability or obstacle-avoidance guarantees
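
To ground the tube mechanism above, here is a minimal sketch of the standard constraint-tightening step on a scalar toy system (illustrative numbers; the thesis applies the idea to the full robot model). A stabilising feedback that contracts the tracking error by a factor rho per step, under a disturbance bounded by w_max, confines the error to a tube of radius s = w_max / (1 − rho); the nominal optimiser then plans against constraints tightened by s:

```python
# Minimal sketch of tube-based constraint tightening (illustrative only,
# not the thesis code). Scalar error dynamics e+ = rho*e + w with |w| <= w_max
# stay inside a tube of radius s = w_max / (1 - rho); planning the nominal
# trajectory against bounds tightened by s guarantees the true state never
# violates the original constraint.
import numpy as np

rho, w_max = 0.6, 0.05            # closed-loop contraction, disturbance bound
s = w_max / (1.0 - rho)           # tube radius (worst-case persistent error)
x_max = 1.0                       # original state constraint |x| <= 1.0
x_max_tight = x_max - s           # bound the nominal plan must satisfy
print(f"tube radius s = {s:.3f} -> nominal bound tightened to {x_max_tight:.3f}")

# Monte-Carlo check: the disturbed error never leaves the tube.
rng = np.random.default_rng(1)
e = 0.0
for _ in range(1000):
    e = rho * e + rng.uniform(-w_max, w_max)
    assert abs(e) <= s
print("1000-step disturbed rollout stayed inside the tube")
```

This conservatism is exactly the trade-off noted above: the tighter bound sacrifices some nominal performance in exchange for a guarantee that holds for every admissible disturbance.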

In action — GNSS-denied run, environment 044

Same environment, same conditions. Watch how each controller handles the denial zone.

[Videos: RMPC run · PPO run]

Experiment at a glance

  • 500 environments evaluated
  • 2 controllers compared
  • 2 operating conditions
  • 0.0% RMPC collision rate
Safety highlight

RMPC recorded zero collisions in both nominal and GNSS-denied conditions. PPO collided in 3.1% of nominal runs and 4.7% under GNSS denial — a direct result of operating without hard obstacle-avoidance constraints.

Results Visualised

Success Rate

[Bar chart: success rate. Nominal: RMPC 89.8%, PPO 93.6%. GNSS-denied: RMPC 85.2%, PPO 91.3%.]

Cross-Track Error — lower is better

[Bar chart: mean XTE. Nominal: RMPC 0.266 m, PPO 0.731 m. GNSS-denied: RMPC 0.325 m, PPO 0.852 m.]

Raw Results (Thesis Table, N = 500)

Metric          RMPC (nominal)   PPO (nominal)   RMPC (denial)   PPO (denial)
Success [%]     89.8             93.6            85.2            91.3
Collision [%]   0.0              3.1             0.0             4.7
Timeout [%]     10.2             3.3             14.8            4.0
XTE_mean [m]    0.2664           0.7312          0.3245          0.8515
XTE_in [m]      n/a              n/a             0.3209          0.2099
XTE_out [m]     n/a              n/a             0.3276          0.8667

GNSS Denial: Performance Impact

  • RMPC · tracking degradation (nominal → denied): +21.8% (0.266 → 0.325 m mean XTE)
  • RMPC · denial-zone consistency (inside vs. outside): −0.95% (XTE_in 0.321 m ≈ XTE_out 0.328 m)
  • PPO · tracking degradation (nominal → denied): +16.5% (0.731 → 0.852 m mean XTE)
  • PPO · denial-zone consistency (inside vs. outside): −75.8% (XTE_in 0.210 m vs. XTE_out 0.867 m, with the error spiking after denial ends)

P_deg-nom compares tracking error under denial vs. nominal. P_deg-InOut isolates whether degradation occurs inside the denial zone or outside it. RMPC's −0.95% means it degrades uniformly throughout. PPO's −75.8% reveals it tracks well inside the denial zone but accumulates large error once outside — suggesting position drift that compounds after denial ends.
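
Written out (a plausible formalisation; the thesis may normalise slightly differently), both metrics are relative changes in mean cross-track error:

$$
P_{\text{deg-nom}} = \frac{\overline{\mathrm{XTE}}_{\text{denied}} - \overline{\mathrm{XTE}}_{\text{nom}}}{\overline{\mathrm{XTE}}_{\text{nom}}} \times 100\%,
\qquad
P_{\text{deg-InOut}} = \frac{\mathrm{XTE}_{\text{in}} - \mathrm{XTE}_{\text{out}}}{\mathrm{XTE}_{\text{out}}} \times 100\%.
$$

For PPO under denial this gives (0.2099 − 0.8667)/0.8667 ≈ −75.8%: the error outside the zone is roughly four times the error inside it.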

Study Setup

  • Controllers: Tube RMPC and PPO.
  • Evaluation dataset size: N = 500 environments.
  • Same worlds used for both controllers and both denial configurations.
  • Conditions: nominal and GNSS-denied operation.

Key Metric Definition

Tracking quality is measured with Cross-Track Error (XTE), defined as the Euclidean distance from robot position to the nearest reference waypoint with monotonic index progression along the path.
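
A minimal sketch of that definition (function and variable names are illustrative, not taken from the thesis code):

```python
import numpy as np

def cross_track_error(position, waypoints, last_idx):
    """Euclidean distance to the nearest reference waypoint, searched only
    forward of the previously matched index so the match progresses
    monotonically along the path (illustrative implementation)."""
    candidates = waypoints[last_idx:]                 # enforce monotonic index progression
    dists = np.linalg.norm(candidates - position, axis=1)
    nearest = int(np.argmin(dists))
    return dists[nearest], last_idx + nearest         # XTE and the new matched index

# Example: a straight 5 m reference path with 0.1 m waypoint spacing,
# robot standing 0.30 m off the path near x = 1.0.
path = np.stack([np.linspace(0.0, 5.0, 51), np.zeros(51)], axis=1)
xte, idx = cross_track_error(np.array([1.02, 0.30]), path, last_idx=0)
print(f"XTE = {xte:.3f} m at waypoint {idx}")         # XTE = 0.301 m at waypoint 10
```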

Cost of Deployment

Rollout speed measures how fast each controller issues actions during operation. PPO's advantage (63× faster) comes from a single neural-network forward pass, versus RMPC solving an online optimisation problem at each timestep; a toy timing sketch follows the table.

Metric                            Tube RMPC   PPO
Implementation time [man-hours]   241         20
Training/tuning duration [h]      22          50
Rollout speed [actions/s]         20.2        1270
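
The shape of that gap can be reproduced with a toy micro-benchmark (a sketch, not the thesis code: the network sizes, horizon, and dynamics below are invented, and it assumes NumPy and SciPy are available):

```python
# Toy micro-benchmark of the online-cost gap (illustrative only).
import time
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# "PPO-style" controller: a small tanh MLP, observation -> action,
# executed as a single forward pass.
W1 = rng.normal(size=(64, 32))
W2 = rng.normal(size=(32, 32))
W3 = rng.normal(size=(32, 2))

def policy(obs):
    return np.tanh(np.tanh(obs @ W1) @ W2) @ W3

# "MPC-style" controller: minimise a quadratic tracking cost over a
# 20-step input sequence for a toy integrator, once per control step.
H = 20

def mpc_cost(u, x0):
    x, cost = x0, 0.0
    for k in range(H):
        uk = u[2 * k: 2 * k + 2]
        x = x + 0.1 * uk              # toy integrator dynamics
        cost += x @ x + 0.01 * uk @ uk
    return cost

obs = rng.normal(size=64)
x0 = np.array([1.0, -0.5])

t0 = time.perf_counter()
for _ in range(100):
    policy(obs)                        # 100 forward passes
t1 = time.perf_counter()
minimize(mpc_cost, np.zeros(2 * H), args=(x0,))   # one optimisation step
t2 = time.perf_counter()

print(f"policy: {(t1 - t0) / 100 * 1e6:.0f} us/action, "
      f"MPC step: {(t2 - t1) * 1e3:.1f} ms")
```

On typical hardware the forward pass lands in the microsecond range while even this tiny solver needs milliseconds per step, the same order-of-magnitude gap as the 20.2 vs. 1270 actions/s reported above.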

Verdict

Safety & Predictability

RMPC is the clear winner where hard constraints matter. Zero collisions across 1 000 runs, predictable degradation under denial, and formal guarantees on obstacle avoidance make it the right choice for safety-critical applications — at the cost of slower online execution and a more complex implementation.

Flexibility & Online Speed

PPO matched or exceeded RMPC on raw success rate and runs 63× faster online. Its policy adapts naturally without an explicit model — but the 3–5% collision rate and lack of formal guarantees disqualify it from safety-critical use as-is.

Takeaway

No universal winner. RMPC is the right choice when safety is non-negotiable; PPO when throughput and adaptability matter more than constraint guarantees. The real value of this thesis is the direct, controlled comparison under identical conditions — a practical baseline for future controller design decisions.