All of the following video demonstrations were run on the NoCrash benchmark (dense traffic) in Town 2. We primarily compare the performance of the SAC fine-tuned agent to the DAgger-based agent. Both agents were trained on Town 1.
Yellow lights: Deciding to stop or pass
Our initial RL expert agent was trained on an eight-dimensional feature vector, only one of whose features describes the traffic light's state. This feature gives the distance to a red light when one is close enough; if the light is yellow or green, or too far away, the feature is set to 1. Thus, we'd expect our final agent to know nothing about yellow lights. However, in video 1, at around 0:40, the car comes to a complete stop for a yellow light.
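To make the encoding concrete, here is a minimal sketch of that traffic-light feature, following the description above; the function name, signature, and distance cutoff are hypothetical, not our exact implementation.

```python
def traffic_light_feature(light_state: str, distance: float,
                          max_dist: float = 30.0) -> float:
    """Return normalized distance to a red light, or 1.0 otherwise.

    Hypothetical sketch: only red lights within max_dist are visible
    to the agent; yellow and green lights look identical to "no light".
    """
    if light_state == "red" and distance <= max_dist:
        return distance / max_dist  # 0 = at the light, 1 = at max range
    return 1.0  # yellow, green, or too far: feature saturates at 1
```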
In contrast, in video 2, around 0:25, we see the opposite: the car speeds through the yellow light to take a left turn. This is very interesting learned behavior exhibited by the SAC fine-tuned model. Since the RL agent seeks to maximize its reward while avoiding penalties for red-light violations, it learns this behavior. If we had only done behavior cloning, we'd expect our agent to never stop for yellow lights, since it would only notice red lights, which were part of the engineered state space.
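For illustration, below is a hedged sketch of a reward with this structure. The terms and weights are assumptions, not the exact reward we used, but they show why yellow lights are unconstrained: only red-light violations are penalized, so both stopping and passing at a yellow can be reward-optimal depending on context.

```python
def reward(speed: float, ran_red_light: bool, collided: bool) -> float:
    """Hypothetical reward: progress term plus event penalties."""
    r = speed  # progress term: encourages the agent to keep moving
    if ran_red_light:
        r -= 50.0  # penalty for *red* lights only; yellow is unpenalized,
                   # so stopping vs. passing at yellow are both viable
    if collided:
        r -= 100.0  # collisions are penalized more heavily
    return r
```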
Improving over the purely behavior cloned policy
The SAC fine-tuned agent outperforms the DAgger agent by 24 points on the NoCrash dense benchmark. This difference also shows qualitatively: the behavior-cloned agent often crashes into vehicles and other objects. Video 3 demonstrates this; the car initially recognizes another car and stops for it, but then slowly edges forward until it crashes into it. The fine-tuned agent drives much more smoothly and crashes less often.
Failure Case: Representation Error
We noticed that our behavior-cloned agent sometimes stops for no apparent reason; we observed this in 1 out of 25 test scenarios. We suspect the car stops because it perceives red lights or other cars where there are none. Even so, it eventually manages to complete the episode.
Since we currently freeze the conv layers during SAC fine-tuning, the fine-tuned agent exhibits the same behavior. It, too, eventually completes its episode.
This is not a major concern for now, as it can probably be fixed by unfreezing the conv layers and fine-tuning further.
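As a sketch of what that fix involves, assuming a PyTorch policy whose image encoder lives in a `conv` submodule (the attribute name is hypothetical):

```python
import torch.nn as nn

def set_conv_trainable(policy: nn.Module, trainable: bool) -> None:
    """Toggle gradient flow through the policy's conv encoder."""
    for p in policy.conv.parameters():
        p.requires_grad = trainable

# Current setup: conv layers frozen during SAC fine-tuning.
#   set_conv_trainable(policy, False)
# Proposed fix for the representation error: unfreeze, then fine-tune further.
#   set_conv_trainable(policy, True)
```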
All 25 test episodes: YouTube Playlist
Finally, below is a playlist of all 25 episodes from the best seed of the NoCrash dense benchmark for the SAC fine-tuned agent. There is only one failure (episode 24), which is caused by gridlock.