In the task of street scene segmentation, obtaining annotations for high-resolution images has become a bottleneck. For a 1000×1000 image, a million pixels must be annotated in the segmentation mask, which is extremely labor-intensive. In this capstone project, we exploit synthetic data in place of real data for this task. Synthetic data can be generated automatically from a game engine and comes with precise annotations. However, there is a domain gap between synthetic and real data, which prevents models from generalizing well. Therefore, our goal is to make models generalize better from synthetic data to real data. We formulate two problem settings toward this goal.
The first problem setting is unsupervised domain adaptation. In this setting, we have access to labeled synthetic data and unlabeled real data, and may exploit the unlabeled real data to help the model generalize. We propose a strategic curriculum for this setting. Our approach generates pseudo-labels for the unlabeled data according to a curriculum during training, so that the network learns domain information from the unlabeled data. Our approach reaches the state of the art on the GTA->Cityscapes domain adaptation task.
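To make the pseudo-labeling idea concrete, the sketch below shows one common realization: confidence-thresholded pseudo-labels whose threshold is relaxed over training. This is a minimal illustration, not the project's exact procedure; the function names, the linear schedule, and all hyperparameter values are hypothetical.

```python
import torch

IGNORE_INDEX = 255  # Cityscapes-style convention: losses skip this label


def generate_pseudo_labels(model, images, confidence_threshold):
    """Assign a class to each pixel only where the model is confident.

    Pixels whose max softmax probability falls below the threshold are
    marked IGNORE_INDEX, so the segmentation loss ignores them.
    """
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(images), dim=1)  # (N, C, H, W)
        confidence, labels = probs.max(dim=1)        # both (N, H, W)
        labels[confidence < confidence_threshold] = IGNORE_INDEX
    return labels


def threshold_schedule(epoch, start=0.9, end=0.5, total_epochs=50):
    """Hypothetical curriculum: begin strict (only very confident pixels
    are kept), then gradually admit more target-domain pixels."""
    frac = min(epoch / total_epochs, 1.0)
    return start + (end - start) * frac
```

During training, such pseudo-labels for the real images would be mixed with the ground-truth labels of the synthetic images, e.g. by computing a standard cross-entropy loss with `ignore_index=IGNORE_INDEX` on both batches.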
The second problem setting is domain robust training. In this setting, we have access only to labeled synthetic data and know nothing about the real data. We propose semantic input masking as a data augmentation technique for this problem. Our approach forces the network to become invariant to color and texture during training by randomly masking out parts of the input images. With this intuitive technique, we also obtain good results on various datasets.
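The sketch below illustrates the masking idea with a simple random patch mask: hiding random regions of color and texture pushes the network to rely on shape and context instead. This is an assumed simplification of semantic input masking; the grid-based masking strategy, `mask_ratio`, and `patch_size` are illustrative choices, not the project's actual settings.

```python
import torch


def random_patch_mask(image, mask_ratio=0.3, patch_size=32, fill=0.0):
    """Randomly zero out square patches of a (C, H, W) image tensor.

    Each patch in a patch_size x patch_size grid is dropped
    independently with probability mask_ratio.
    """
    _, h, w = image.shape
    masked = image.clone()
    # Bernoulli-sample which grid cells to mask.
    drop = torch.rand(h // patch_size, w // patch_size) < mask_ratio
    for i in range(drop.shape[0]):
        for j in range(drop.shape[1]):
            if drop[i, j]:
                masked[:, i * patch_size:(i + 1) * patch_size,
                       j * patch_size:(j + 1) * patch_size] = fill
    return masked
```

Applied as an on-the-fly augmentation to the synthetic training images (with the segmentation labels left untouched), this kind of masking removes local appearance cues while preserving scene layout, which is what domain robust training needs the network to learn from.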