Datasets
The REFUGE (Retinal Fundus Glaucoma Challenge) [1] dataset is a publicly available benchmark designed for the development and evaluation of automated methods for optic disc and optic cup segmentation, which are key to glaucoma screening.
It contains 1,200 color fundus images collected from both healthy individuals and glaucoma patients under diverse imaging conditions. Each image is accompanied by pixel-level expert annotations of the optic disc and cup regions. The dataset is split evenly into training, validation, and test sets of 400 images each.
In this study, we primarily evaluated our methods on the REFUGE validation set.
Additionally, we aim to evaluate on nine more datasets from the MedSegBench [2] benchmark: KvasirMSBench, Isic2018MSBench, PolypGenMSBench, UltrasoundNerveMSBench, BusiMSBench, USforKidneyMSBench, Isic2016MSBench, Promise12MSBench, and UWSkinCancerMSBench. These datasets span different imaging modalities, which lets us assess the generalizability of our method. Evaluating on these datasets will show how robust our method is across different medical imaging domains, and the results will guide further improvements.
Segmentation Masks
Below is an example of the ground-truth label for the optic disc in the REFUGE validation set. We preprocess the ground-truth label from a white background with a grey foreground to a black background with a white foreground so that it matches our mask prediction outputs.

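A minimal sketch of this preprocessing step is shown below, assuming the optic disc is stored at a single grey level in the annotation image; the grey value 128 and the function name are our own assumptions and may need adjusting for your copy of the labels.

```python
import numpy as np
from PIL import Image

def preprocess_refuge_disc_mask(path, disc_value=128):
    """Convert a REFUGE annotation (white background, grey optic disc)
    into a binary mask with a black background and white foreground.

    `disc_value` is the grey level assumed to mark the optic disc;
    adjust it if the annotations use a different encoding."""
    label = np.array(Image.open(path).convert("L"))
    binary = np.where(label == disc_value, 255, 0).astype(np.uint8)
    return binary
```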
Here, we display some of the masks predicted by SAM for comparison.

We measure the performance of the segmentation method with the Dice score, also known as the Dice Similarity Coefficient. The image below shows how we visualize the overlap between the prediction mask and the processed ground truth to make the comparison easier to interpret.

Dice Similarity Coefficient
For performance measurement, we use the Dice Similarity Coefficient. To make the comparison visual, we show the ground truth, the mask prediction, and their overlap in three different colors: in the image below, yellow marks the overlap between the ground truth and the prediction, green marks the ground truth only, and red marks the mask prediction only.

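A small sketch of how such an overlay can be built, assuming both masks are binary NumPy arrays with 0 as background (the function name is ours):

```python
import numpy as np

def overlay_masks(gt, pred):
    """Build an RGB overlay from two binary masks (0 = background).

    Yellow = pixels in both ground truth and prediction (overlap)
    Green  = ground truth only
    Red    = prediction only
    Black  = background"""
    gt = gt > 0
    pred = pred > 0
    h, w = gt.shape
    overlay = np.zeros((h, w, 3), dtype=np.uint8)
    overlay[gt & pred] = [255, 255, 0]   # overlap -> yellow
    overlay[gt & ~pred] = [0, 255, 0]    # ground truth only -> green
    overlay[~gt & pred] = [255, 0, 0]    # prediction only -> red
    return overlay
```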
Below is the formula for the Dice Similarity Coefficient, where X is the prediction mask and Y is the ground truth: Dice(X, Y) = 2|X ∩ Y| / (|X| + |Y|). The maximum score is 1 for a perfect overlap, and the minimum is 0 when the masks do not overlap at all.
Dice Similarity Coefficient [3]

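A minimal sketch of how this score can be computed on binary masks (the function name and the small epsilon guard against division by zero are our additions):

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks.

    Implements Dice(X, Y) = 2|X intersect Y| / (|X| + |Y|), with X the
    predicted mask and Y the processed ground truth. Returns 1.0 for a
    perfect overlap and 0.0 when the masks do not overlap at all."""
    pred = pred > 0
    gt = gt > 0
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection) / (pred.sum() + gt.sum() + eps)
```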
Results

The image above displays sample results for three configurations: (1) SAM HQ, segmentation with bounding-box prompts only; (2) SAM HQ Points, segmentation with bounding-box and point prompts; and (3) SAM HQ Points Contrast, segmentation with bounding-box prompts, point prompts, and contrast enhancement applied. The last configuration, box prompts + point prompts + contrast enhancement, gives the best results; a rough sketch of this setup follows below.
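The sketch below illustrates the three prompting configurations under a few assumptions: it uses CLAHE as one possible contrast-enhancement choice, assumes SAM-HQ exposes the same SamPredictor interface as the original segment-anything package (the import path may differ depending on how SAM-HQ is installed), and the checkpoint path, file name, box, and point coordinates are placeholders.

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor  # SAM-HQ offers a compatible API

def enhance_contrast(image_bgr):
    """CLAHE on the luminance channel: one possible contrast-enhancement step."""
    lab = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)

# Placeholder checkpoint and prompts -- replace with real values.
sam = sam_model_registry["vit_h"](checkpoint="sam_hq_vit_h.pth")
predictor = SamPredictor(sam)

image = cv2.imread("refuge_val_0001.jpg")       # hypothetical file name
box = np.array([620, 480, 900, 760])            # optic-disc bounding box (x0, y0, x1, y1)
point = np.array([[760, 620]])                  # a foreground point inside the disc
label = np.array([1])                           # 1 = foreground point

# 1. Box prompt only
predictor.set_image(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
mask_box, _, _ = predictor.predict(box=box, multimask_output=False)

# 2. Box + point prompts
mask_box_pt, _, _ = predictor.predict(box=box, point_coords=point,
                                      point_labels=label, multimask_output=False)

# 3. Box + point prompts on the contrast-enhanced image
predictor.set_image(cv2.cvtColor(enhance_contrast(image), cv2.COLOR_BGR2RGB))
mask_full, _, _ = predictor.predict(box=box, point_coords=point,
                                    point_labels=label, multimask_output=False)
```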
These results are on the REFUGE validation set. We still need to evaluate on the REFUGE test set, as well as on the other datasets listed above, to analyze how well our method generalizes.
Citations
[1] Orlando, J. I., Fu, H., Breda, J. B., Van Keer, K., Bathula, D. R., Zheng, Y., … & Trucco, E. (2020). REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs. Medical Image Analysis, 59, 101570. https://doi.org/10.1016/j.media.2019.101570
[2] Kuş, Z., & Aydin, M. (2024). MedSegBench: A comprehensive benchmark for medical image segmentation in diverse data modalities. Scientific Data, 11, 1283. https://doi.org/10.1038/s41597-024-04159-2
[3] Swerdlow, M., Guler, Ö., Yaakov, R., & Armstrong, D. G. (2023). Simultaneous segmentation and classification of pressure injury image data using Mask-R-CNN. Computational and Mathematical Methods in Medicine, 2023, Article ID 3858997. https://doi.org/10.1155/2023/3858997
