TPSeNCE: Towards Artifact-Free Realistic Rain Generation

Outline

  1. Abstract
  2. Challenges
  3. Problems
  4. Solutions
  5. Workflows
  6. Formulas
  7. Experiments
  8. Future Works
  9. Resources
  10. References
  11. Teams

Student: Shen Zheng. Advisor: Srinivasa Narasimhan

Abstract

Rain generation algorithms have the potential to improve the generalization of deraining methods and to enable better scene understanding in rainy conditions. In practice, however, they produce artifacts and distortions and struggle to control the amount of generated rain because they lack proper constraints. To address these issues, we propose an unpaired image-to-image translation framework for generating realistic rainy images. We first introduce a Triangular Probability Similarity (TPS) loss to guide the generated images toward the clear and rainy images in the discriminator manifold, thereby minimizing artifacts and distortions during rain generation. Unlike conventional contrastive approaches, which indiscriminately push negative samples away from the anchors, our Semantic Noise Contrastive Estimation (SeNCE) strategy reassesses the pushing force of each negative sample based on the semantic similarity between the clear and rainy images and the feature similarity between the anchor and the negative, thereby controlling the amount of generated rain. Extensive experiments on real-world rainy images demonstrate that the proposed method generates realistic rainy images with minimal artifacts, which benefits image deraining and object detection in the rain.

Challenges

Rain is a common adverse weather condition that can significantly impair the quality of images and videos. Rain streaks, especially during heavy rain, can obscure background details and textures. Raindrops can form a layer of water on glass or windshields, distorting and blurring the appearance of objects. Wet roads become highly reflective, producing mirror-like reflections of objects. Additionally, rain mist scatters ambient light, reducing the visibility of distant objects.

Problems

Current rain generation methods often produce artifacts and distortions because they lack proper constraints. Controlling the amount of generated rain is also challenging: too much rain occludes the background and causes feature loss, while too little rain results in an unrealistic-looking image.

Solutions

  • We present an unpaired image-to-image translation framework for generating realistic rainy images
  • We introduce a Triangular Probability Similarity (TPS) loss to minimize the artifacts and distortions during rain generation.
  • We propose a Semantic Noise Contrastive Estimation (SeNCE) strategy to optimize the amounts of generated rain.
  • Extensive experiments demonstrate the effectiveness of the proposed modules qualitatively and quantitatively.

Workflows

We observe that generated rainy images with fewer artifacts and distortions lie closer to the line segment connecting the clear and rainy images in the discriminator manifold. Based on this observation, we propose a Triangular Probability Similarity (TPS) loss that guides the generated rainy image toward this line segment, minimizing artifacts and distortions. We then revisit the contrastive learning strategy of CUT [4] and find that the amount of generated rain can be controlled by regulating the pushing force of contrastive learning. To fully account for the clear, rainy, and generated images, we propose a Semantic Noise Contrastive Estimation (SeNCE) strategy, which reweights the pushing force of each negative pair based on the feature similarity between the negative and the anchor, and on the mean Pixel Accuracy (mPA) between the semantic segmentation maps of the clear and rainy images.
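
To make the TPS observation concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: it assumes the discriminator returns per-pixel probability maps for the clear image X, the rainy image Y, and the generated image Z, and it penalizes the slack in the triangle inequality so that D(Z) is pulled onto the segment between D(X) and D(Y).

    import torch

    def tps_loss(d_clear, d_rainy, d_gen):
        # L1 distances between the discriminator probability maps.
        xz = (d_clear - d_gen).abs().mean()
        yz = (d_rainy - d_gen).abs().mean()
        xy = (d_clear - d_rainy).abs().mean()
        # Triangle inequality: xz + yz >= xy, with equality only when D(Z)
        # lies on the segment between D(X) and D(Y); minimize the gap.
        return xz + yz - xy

    if __name__ == "__main__":
        # Toy usage with random probability maps of shape (batch, 1, H, W).
        x, y, z = (torch.rand(2, 1, 30, 30) for _ in range(3))
        print(tps_loss(x, y, z).item())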

Formulas

Suppose X is the clear image, Y is the rainy image, and Z is the generated rainy image.

1. Triangular Probability Similarity (TPS)
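
The TPS equation itself appeared as a figure on the original page. A hedged reconstruction, consistent with the intuition that the generated image should lie on the line segment connecting the clear and rainy images in the discriminator manifold (i.e., the triangle inequality should hold with equality), is:

L_TPS = ||D(X) - D(Z)||_1 + ||D(Y) - D(Z)||_1 - ||D(X) - D(Y)||_1,

where D denotes the discriminator's probability output. Minimizing L_TPS penalizes any slack in the triangle inequality, pulling D(Z) onto the segment between D(X) and D(Y).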

2. Semantic Noise Contrastive Estimation (SeNCE)

Here, w_{ij} = softmax(F(i, j)/beta) reweights the pushing force of negative pair (i, j): F(i, j) is the feature similarity between the anchor and the negative, and beta is a temperature modulated by the mPA between the semantic segmentation maps of the clear and rainy images.
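
The full SeNCE objective was also shown as a figure on the original page. A sketch, under the assumption that it follows the patch-wise InfoNCE form of CUT [4] with the negatives reweighted by w_{ij}, is:

L_SeNCE = -log [ exp(s(f_i, f_i^+)/tau) / ( exp(s(f_i, f_i^+)/tau) + sum_j w_{ij} exp(s(f_i, f_j^-)/tau) ) ],

where f_i is an anchor feature from the generated image Z, f_i^+ is the corresponding positive feature from the clear image X, f_j^- are negative features from other patches, s(., .) is cosine similarity, and tau is a temperature. The exact normalization of the weighted negatives is our assumption rather than the paper's notation.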

Experiments: Setups

  1. Datasets: BDD100K, INIT
  2. Baselines: UNIT, MUNIT, CUT, QS-Attn, MoNCE
  3. Evaluation Metrics: FID, KID, MMD, ED, mAP, User Study

Experiments: Histogram Analysis

IoU histograms for the model pretrained on clear images and the model fine-tuned on our generated rainy images

Experiments: Quantitative Comparisons

1. Rain Generation

2. Image Deraining

3. Object Detection

Experiments: Qualitative Comparisons (Images)

1. Rain Generation (top), Image Deraining (middle), and Object Detection (bottom)

2. Object Detection on BDD (red bboxes denote the failure cases).

3. Object Detection on our video frames (red bboxes denote the failure cases).

Experiments: Qualitative Comparison (Videos)

  1. Rain Generation Video
  2. Object Detection Video
  3. Another Rain Generation Video
  4. Another Object Detection Video

Future Works

  1. Nighttime heavy rain
  2. Haze and snow generation
  3. Physics-based methods
  4. Diffusion models

Resources

Spring 2023: Presentation Slides

References

[1] Xueqi Hu, Xinyue Zhou, Qiusheng Huang, Zhengyi Shi, Li Sun, and Qingli Li. QS-Attn: Query-selected attention for contrastive learning in I2I translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18291–18300, 2022.
[2] Xun Huang, Ming-Yu Liu, Serge Belongie, and Jan Kautz. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), pages 172–189, 2018.
[3] Ming-Yu Liu, Thomas Breuel, and Jan Kautz. Unsupervised image-to-image translation networks. Advances in Neural Information Processing Systems, 30, 2017.
[4] Taesung Park, Alexei A. Efros, Richard Zhang, and Jun-Yan Zhu. Contrastive learning for unpaired image-to-image translation. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX, pages 319–345. Springer, 2020.
[5] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
[6] Zhiqiang Shen, Mingyang Huang, Jianping Shi, Xiangyang Xue, and Thomas S. Huang. Towards instance-level image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3683–3692, 2019.
[7] Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. BDD100K: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2636–2645, 2020.
[8] Fangneng Zhan, Jiahui Zhang, Yingchen Yu, Rongliang Wu, and Shijian Lu. Modulated contrast for versatile image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18280–18290, 2022.

Teams

Shen Zheng is an MSCV student at CMU. He is currently working with Dr. Srinivasa Narasimhan on autonomous driving in bad weather and sub-optimal illumination, using image-to-image translation and image restoration. Shen completed his bachelor's degree in mathematical sciences at Wenzhou-Kean University. His research interests span efficient network training, image restoration and enhancement, adversarial learning, and unsupervised domain adaptation.

Dr. Srinivasa Narasimhan is a computer science and robotics researcher and professor at Carnegie Mellon University. He is known for his contributions in the fields of computer vision, robotics, and graphics. His research focuses on the physics of computer vision and computer graphics. His projects highlight three main aspects – the mathematical modeling of the interactions of light with materials and the atmosphere; the design of novel cameras and programmable lighting; and the development of algorithms for rendering and interpreting scene appearance. His research is motivated by applications in a wide range of fields including robotics, intelligent transportation, digital entertainment, remote sensing, underwater imaging and medical imaging.
