Reinforcement Learning for Noise Steering in Diffusion-Based Driving Models

Student: Junhong Zhou

Advised by Prof. Katia Sycara and Dr. Yaqi Xie


Drivers in the real world exhibit a wide spectrum of behaviors—from calm, conservative cruising to sudden lane changes and aggressive acceleration. These differences are not purely individual; they are strongly shaped by local driving cultures, traffic density, and city-specific norms. For example, driving in many Chinese cities often involves denser traffic and more frequent merges, while driving in typical American cities is generally more regulated and spacious.

Such variations create distinct trajectory preferences, meaning that a model trained on one region or style may behave poorly when deployed in another. Additionally, visual domain shifts (different cities, lighting, road geometry, or camera viewpoints) further amplify this mismatch.

Traditional diffusion-based driving models assume a single “universal” driving behavior. However, this assumption breaks down when the model must adapt to a new environment—often requiring expensive full-model retraining or access to large quantities of region-specific data.

Our goal is to develop a diffusion-based driving policy that can flexibly adjust its behavior—smooth, assertive, cautious, or aggressive—based on the scenario, without modifying the diffusion model itself. By learning to steer the latent noise instead of retraining the entire model, we aim to build a driving agent capable of style-adaptive, environment-aware behavior while maintaining robustness and sample efficiency.
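To make this concrete, here is a minimal sketch of the idea. All class and method names (e.g., `NoisePolicy`, `denoise_step`) are illustrative assumptions rather than our actual codebase: a small learned policy replaces the standard Gaussian draw of the initial latent, while the pretrained diffusion model stays frozen.

```python
import torch
import torch.nn as nn

class NoisePolicy(nn.Module):
    """Maps an observation embedding to an initial latent noise vector."""
    def __init__(self, obs_dim: int, noise_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, noise_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

@torch.no_grad()  # rollout only: neither network is updated here
def steered_plan(diffusion_model, noise_policy, obs, num_steps: int = 10):
    # Instead of sampling x_K ~ N(0, I), start denoising from steered noise.
    x = noise_policy(obs)
    for k in reversed(range(num_steps)):
        # `denoise_step` is an assumed interface of the frozen diffusion model.
        x = diffusion_model.denoise_step(x, obs, k)
    return x  # the denoised action / trajectory plan
```

Because only the noise policy is trainable, adapting to a new city or driving style means learning a few MLP layers rather than fine-tuning the full diffusion model.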

Key words of our project


Our project introduces a reinforcement-learning–based noise-steering approach that controls the behavior of diffusion driving models without modifying diffusion weights, enabling adaptive, style-aware, and efficient autonomous driving.

Autonomous Driving

Building reliable, adaptive driving systems that can handle diverse road conditions and human behaviors.

Diffusion Policy

A generative policy framework that produces smooth, multi-mode action sequences through iterative denoising.

Reinforcement Learning

Learning to make better decisions over time by optimizing rewards from driving performance and safety.

Noise Steering

Controlling the latent noise of diffusion models to shape driving behaviors without modifying model weights.

DSRL / Latent Actor-Critic

An efficient RL method operating directly in latent space, enabling stable noise selection and fine-grained control.

Driving Style Adaptation

Adapting the driving policy to calm, aggressive, or city-specific styles without retraining the full diffusion model.

Key insights of our project



We shift reinforcement learning from the action space to the latent noise space, enabling efficient, stable, and lightweight fine-tuning without modifying the pretrained diffusion-based driving model weights.

Reinforcement Learning can help

  • No deeply nested backpropagation (see the sketch below)
  • Lower computational cost
  • Better performance
DSRL [1]
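A minimal sketch of why this avoids nested backpropagation, assuming a hypothetical `diffusion_model.sample(obs, init_noise=...)` interface: the diffusion sampler runs under `torch.no_grad()`, so the gradient never flows through the K denoising steps—only through the small noise policy.

```python
import torch
from torch.distributions import Normal

def reinforce_step(noise_policy, diffusion_model, optimizer, obs, reward_fn):
    mean = noise_policy(obs)                    # (batch, noise_dim)
    dist = Normal(mean, torch.ones_like(mean))  # unit std kept fixed for brevity
    noise = dist.sample()                       # sampled initial latent noise

    with torch.no_grad():                       # no gradients through the sampler
        plan = diffusion_model.sample(obs, init_noise=noise)  # assumed API
        reward = reward_fn(obs, plan)           # e.g., comfort / safety / progress

    # REINFORCE: the gradient touches only the noise policy's log-probability.
    log_prob = dist.log_prob(noise).sum(dim=-1)
    loss = -(reward * log_prob).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

REINFORCE is used here only as the simplest possible illustration; the DSRL-style latent actor-critic sketched in the next section is the more sample-efficient variant.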

From Action Space to Noise Space

  • No nested backpropagation through the diffusion chain
  • No modification to the diffusion weights
  • Only a lightweight latent-space policy is trained (sketched below)
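Below is a sketch of a DSRL-style latent actor-critic update in the spirit of [1], where the initial noise w plays the role of the RL action. The network and function names are our own illustrative assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class LatentCritic(nn.Module):
    """Q(s, w): value of steering the frozen diffusion policy with noise w."""
    def __init__(self, obs_dim: int, noise_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + noise_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, noise):
        return self.net(torch.cat([obs, noise], dim=-1))

def latent_ac_step(actor, critic, target_critic, opt_actor, opt_critic,
                   batch, gamma: float = 0.99):
    obs, noise, reward, next_obs, done = batch

    # Critic update: one-step TD target, with the "action" being latent noise.
    with torch.no_grad():
        next_q = target_critic(next_obs, actor(next_obs)).squeeze(-1)
        target = reward + gamma * (1.0 - done) * next_q
    q = critic(obs, noise).squeeze(-1)
    critic_loss = (q - target).pow(2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Actor update: pick the noise the critic scores highest
    # (deterministic-policy-gradient style, as in DDPG/TD3).
    actor_loss = -critic(obs, actor(obs)).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```

Because both networks operate on compact latent vectors rather than images or full trajectories, each update is cheap, which is what makes this fine-tuning loop lightweight.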


“Driving the world

toward Physical AI.”

Junhong Zhou

MSCV ’25 @ CMU RI