Experiments

Benchmark

NAVSIM Benchmark [5]

Non-Reactive AV Simulation

NAVSIM simulates the ego vehicle’s future motion using kinematic equations, without modeling reactions from surrounding agents. This makes the evaluation stable, deterministic, and efficient while still reflecting realistic vehicle dynamics.
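As a rough illustration of what a non-reactive, kinematics-only roll-out looks like, the sketch below advances an ego state with a kinematic bicycle model. This is a minimal sketch under stated assumptions, not NAVSIM's implementation: the choice of a bicycle model, the names `State`, `step`, and `rollout`, and the `wheelbase` and `dt` values are all hypothetical and chosen only for illustration. No other agent appears in the update, which is what makes the roll-out deterministic.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    x: float    # position along x [m]
    y: float    # position along y [m]
    yaw: float  # heading [rad]
    v: float    # forward speed [m/s]


def step(state, accel, steer, wheelbase=3.0, dt=0.1):
    """Advance one step with a kinematic bicycle model (assumed model).

    wheelbase and dt are illustrative values, not NAVSIM settings.
    """
    x = state.x + state.v * math.cos(state.yaw) * dt
    y = state.y + state.v * math.sin(state.yaw) * dt
    yaw = state.yaw + state.v / wheelbase * math.tan(steer) * dt
    v = max(0.0, state.v + accel * dt)  # no reversing in this sketch
    return State(x, y, yaw, v)


def rollout(initial, controls, **kwargs):
    """Apply a sequence of (accel, steer) commands; other agents are ignored."""
    states = [initial]
    for accel, steer in controls:
        states.append(step(states[-1], accel, steer, **kwargs))
    return states


# Usage: one second of straight driving at constant speed.
trajectory = rollout(State(0.0, 0.0, 0.0, 5.0), [(0.0, 0.0)] * 10)
```

Because the update depends only on the ego state and its own commands, repeating the roll-out with the same inputs yields the same trajectory.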


nuPlan-Based Data

NAVSIM scenarios are built from the nuPlan real-world driving dataset, which provides rich information about surrounding traffic participants, road topology, and environmental context. This gives driving policies diverse and realistic inputs to be evaluated on.


Scenario Filtering

To focus on meaningful decision-making, NAVSIM filters out uninformative or trivial scenes, such as fully static environments or cases where all vehicles move steadily in straight lines. Only challenging, interaction-heavy scenarios are kept for evaluation.
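The filtering idea can be sketched as a simple predicate over the agent tracks in a scene. This is only an illustrative sketch: NAVSIM's actual filtering criteria are not reproduced here, and the function name `is_trivial` and every threshold (`min_travel`, `max_heading_dev`, `max_step_spread`) are hypothetical values picked for the example.

```python
import math


def is_trivial(agent_tracks, min_travel=1.0, max_heading_dev=0.05,
               max_step_spread=0.5):
    """Return True if every track is static or steady and straight.

    agent_tracks: list of tracks, each a list of (x, y) positions [m]
    sampled at a fixed rate. All thresholds are hypothetical.
    """
    def static_or_steady(track):
        steps = [(x1 - x0, y1 - y0)
                 for (x0, y0), (x1, y1) in zip(track, track[1:])]
        dists = [math.hypot(dx, dy) for dx, dy in steps]
        if sum(dists) < min_travel:
            return True  # effectively static
        headings = [math.atan2(dy, dx)
                    for (dx, dy), d in zip(steps, dists) if d > 1e-6]
        ref = headings[0]
        # Largest wrapped heading deviation from the first moving step [rad].
        dev = max(abs(math.atan2(math.sin(h - ref), math.cos(h - ref)))
                  for h in headings)
        # Spread of per-step travel, a proxy for speed variation [m/step].
        spread = max(dists) - min(dists)
        return dev <= max_heading_dev and spread <= max_step_spread

    return all(static_or_steady(track) for track in agent_tracks)


# Usage: keep only scenes that contain some non-trivial motion.
parked = [(0.0, 0.0)] * 10
cruising = [(float(i), 0.0) for i in range(10)]
turning = [(10 * math.cos(0.2 * i), 10 * math.sin(0.2 * i)) for i in range(10)]
scenes = {"static": [parked], "straight": [parked, cruising],
          "turn": [cruising, turning]}
kept = [name for name, tracks in scenes.items() if not is_trivial(tracks)]
```

In this toy example only the scene containing the turning vehicle survives the filter.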

Metric

\text{PDMS} = \underbrace{\left( \prod_{m \in \{\text{NC}, \text{DAC}\}} \text{score}_m \right)}_{\text{penalties}} \times \underbrace{\left( \frac{\sum_{w \in \{\text{EP}, \text{TTC}, \text{C}\}} \text{weight}_w \times \text{score}_w}{\sum_{w \in \{\text{EP}, \text{TTC}, \text{C}\}} \text{weight}_w} \right)}_{\text{weighted average}}

PDMS (PDM Score)

The PDM Score (PDMS) is the aggregate metric NAVSIM uses to measure overall driving performance across safety, comfort, and efficiency.

It combines two components:

  1. Penalties: A multiplicative term built from No at-fault Collisions (NC) and Drivable Area Compliance (DAC). Because the two scores are multiplied, an at-fault collision or off-road excursion pulls the entire score toward zero, regardless of the other components.
  2. Weighted Average: A weighted combination of Ego Progress (EP), Time-To-Collision (TTC), and Comfort (C) that reflects driving quality.

With each sub-score in [0, 1], the formula yields a value in [0, 1]. It is reported as a percentage from 0 to 100, with higher values indicating safer and better-quality driving.
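The PDMS formula can be written out as a short function. This is a minimal sketch, assuming every sub-score is already normalized to [0, 1]. The function name `pdm_score` is hypothetical, and the default weights (EP = 5, TTC = 5, C = 2) are an assumption about NAVSIM's defaults that should be checked against the benchmark configuration before use.

```python
def pdm_score(nc, dac, ep, ttc, c, weights=None):
    """Compute PDMS on a 0 to 100 scale from sub-scores in [0, 1].

    The default weights are assumed, not taken from this document.
    """
    if weights is None:
        weights = {"EP": 5.0, "TTC": 5.0, "C": 2.0}  # assumed defaults
    # Multiplicative penalties: a zero in NC or DAC zeroes the score.
    penalties = nc * dac
    # Weighted average of the driving-quality terms.
    scores = {"EP": ep, "TTC": ttc, "C": c}
    weighted_avg = (sum(weights[k] * scores[k] for k in scores)
                    / sum(weights[k] for k in scores))
    return 100.0 * penalties * weighted_avg


# Usage: a perfect drive, an off-road drive, and a slow but safe drive.
perfect = pdm_score(nc=1.0, dac=1.0, ep=1.0, ttc=1.0, c=1.0)
off_road = pdm_score(nc=1.0, dac=0.0, ep=1.0, ttc=1.0, c=1.0)
slow = pdm_score(nc=1.0, dac=1.0, ep=0.5, ttc=1.0, c=1.0)
```

The off-road case evaluates to zero even though every quality term is perfect, while halving Ego Progress alone only lowers the weighted average.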

Quantitative


Our method improves consistently across the key metrics, including NC, DAC, TTC, and the overall PDMS. The gains are nonetheless modest. Analyzing the dataset, we found that most human driving trajectories in NAVSIM already score close to the top of the PDMS range, so the reward signal is saturated and leaves little headroom. As a result, even substantial policy improvements translate into small numerical gains on this benchmark.

Qualitative

Our method demonstrates clear behavioral improvements across different real-world driving scenarios:

  • Smooth right-turn maneuver (left): The controlled-noise policy produces a noticeably smoother and more stable right-turn trajectory than the baseline, reducing unnecessary oscillations and aligning more closely with human-like driving.
  • Avoiding a potential collision (middle): In challenging urban traffic, our approach steers the vehicle away from an impending conflict with another agent. By selecting safer latent-noise modes, the model produces a more conservative trajectory that avoids the collision while staying within the drivable area.
  • Bias inherited from imperfect human driving (right): Some trajectories still reflect biases present in the training data. For example, when human demonstrations slightly cross the lane boundary, the model may reproduce this behavior. This highlights the limitation of relying solely on high-reward but imperfect demonstrations in NAVSIM.