Introduction

Motivation

Endoscopic video is a core tool in minimally invasive medical procedures, enabling physicians to visualize internal organs in real time. However, it often suffers from visual artifacts such as specular highlights, blur, bubbles, and instrument shadows, which can obscure important anatomical features. These artifacts not only hinder clinical interpretation but also complicate downstream tasks, including computer-aided diagnosis, robotic navigation, and automated video analysis.

These artifacts can degrade diagnosis, training, and surgical planning, especially in high-stakes settings such as bronchoscopy and urology. For instance, J&J’s MONARCH™ platforms rely heavily on high-quality visual data to enable precise robotic navigation and support various computer vision pipelines.

To address this, our project focuses on enhancing endoscopic video quality using generative AI techniques. We explore state-of-the-art models for artifact detection, segmentation, and removal, aiming to restore image fidelity while preserving anatomical integrity.

Technical Challenges

Improving video quality in real time and at high resolution introduces several challenges:

  • Artifact diversity – Variability in lighting, anatomy, and motion causes artifacts to appear differently across cases.
  • Lack of clean ground truth – Collecting paired clean/corrupted endoscopic data is difficult.
  • Balancing clarity with fidelity – Removing artifacts without damaging anatomical integrity is non-trivial.

Artifact Types

Our dataset and model target the following common artifact types:

  • Specular highlights – Bright light reflections that mask surface detail
  • Blur – Loss of sharpness caused by motion or focus errors during camera manipulation
  • Fragments – Tissue debris or tools partially occluding the view
  • Under-exposure – Dark regions that obscure anatomy
  • Bubbles – Introduced during fluid irrigation in endoscopy
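
To make two of these artifact types concrete, the sketch below flags specular highlights and under-exposed regions in a toy grayscale frame using plain intensity thresholds. This is only an illustration of how such artifacts appear in pixel data, not the project's detection method; the threshold values and the helper names `threshold_mask` and `artifact_fraction` are assumptions introduced for this example.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, pixel values in 0..255

# Hypothetical thresholds, chosen for illustration only.
SPECULAR_THRESHOLD = 240      # at or above this: treated as a highlight
UNDEREXPOSED_THRESHOLD = 30   # at or below this: treated as under-exposed


def threshold_mask(frame: Frame, low: int, high: int) -> List[List[bool]]:
    """Mark pixels whose intensity lies in the closed range [low, high]."""
    return [[low <= px <= high for px in row] for row in frame]


def artifact_fraction(mask: List[List[bool]]) -> float:
    """Fraction of pixels flagged in a mask (0.0 for an empty frame)."""
    total = sum(len(row) for row in mask)
    flagged = sum(sum(row) for row in mask)
    return flagged / total if total else 0.0


# Toy 3x4 frame: bright values top-left, dark values bottom-right.
frame = [
    [250, 245, 120, 110],
    [255, 130, 125, 20],
    [128, 122, 15, 10],
]

specular = threshold_mask(frame, SPECULAR_THRESHOLD, 255)
underexposed = threshold_mask(frame, 0, UNDEREXPOSED_THRESHOLD)

print(f"specular fraction:     {artifact_fraction(specular):.2f}")
print(f"underexposed fraction: {artifact_fraction(underexposed):.2f}")
```

Fixed thresholds like these are brittle under the lighting and anatomy variability described earlier, which is part of why learned detection and segmentation models are attractive for this task.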

Project Goals

  • Develop a high-fidelity model for artifact removal in endoscopic images.
  • Achieve real-time processing performance for clinical applicability.
  • Reconstruct artifact-free frames into smooth, temporally consistent video.
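
One rough way to reason about the temporal-consistency goal is to measure how much consecutive output frames differ. The sketch below computes a mean absolute frame-to-frame difference as a crude flicker proxy. It is an assumption made for illustration, not a metric the project is stated to use, and the function names `mean_abs_diff` and `temporal_flicker` are hypothetical.

```python
from typing import List, Sequence

Frame = List[List[float]]  # grayscale frame as rows of pixel intensities


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def temporal_flicker(frames: Sequence[Frame]) -> float:
    """Average frame-to-frame difference over a clip; lower means smoother."""
    if len(frames) < 2:
        return 0.0
    pairs = list(zip(frames, frames[1:]))
    return sum(mean_abs_diff(a, b) for a, b in pairs) / len(pairs)


# Toy clips of 2x2 frames: one constant, one that brightens and darkens.
steady = [[[100, 100], [100, 100]] for _ in range(3)]
flickering = [
    [[100, 100], [100, 100]],
    [[120, 120], [120, 120]],
    [[100, 100], [100, 100]],
]

print(temporal_flicker(steady))
print(temporal_flicker(flickering))
```

Because genuine camera motion also changes pixel values between frames, a proxy like this is only meaningful when comparing restored output against the input clip or against another method on the same footage.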