The table below compares accuracy on the balanced test set across the different methods.


Visualizations

Ablation Studies

The ablation study examines how the choice of $k$ (the target subspace dimension) affects performance. The results are presented in Figure 2, where each column corresponds to a dataset: CelebA-G&B-0%, CelebA-H&E-0%, and Dogs&Cats-Fur-0%. The blue line shows performance on class-balanced samples, while the red line shows performance on bias-conflicting samples. The yellow bar in each subplot represents the distribution of singular values of the Pairwise Difference Matrix.
The figure shows that the choice of $k$ significantly impacts debiasing performance. For bias-conflicting samples, the success of debiasing, measured by bias-conflicting accuracy, gradually declines as $k$ increases. This occurs because a higher $k$ retains more nuisance attributes, which NMS then fails to marginalize. For class-balanced samples, however, performance does not always decrease as $k$ increases. In CelebA-G&B, accuracy steadily declines with increasing $k$, whereas in CelebA-H&E it gradually improves; in Dogs&Cats-Fur, accuracy initially rises before dropping. This suggests that while increasing $k$ can introduce more bias, the added bias may in some cases improve overall accuracy by enhancing performance on bias-aligned samples, depending on the dataset’s characteristics.
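To make the two metrics discussed above concrete, the following minimal Python sketch computes bias-conflicting accuracy and a group-balanced accuracy on a toy set of predictions. It is an illustration only, not the evaluation code behind Figure 2. The function names are hypothetical, the label and the bias attribute are assumed to be binary and coded so that bias == label marks a bias-aligned sample, and "class-balanced" is assumed here to mean the unweighted mean of accuracies over (label, bias) groups.

```python
from collections import defaultdict


def plain_accuracy(labels, preds):
    """Fraction of all samples predicted correctly."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def bias_conflicting_accuracy(labels, biases, preds):
    """Accuracy restricted to samples whose bias attribute disagrees with the label.

    Assumption: bias == label means bias-aligned, bias != label means bias-conflicting.
    """
    pairs = [(y, p) for y, b, p in zip(labels, biases, preds) if y != b]
    if not pairs:
        raise ValueError("no bias-conflicting samples in the input")
    return sum(y == p for y, p in pairs) / len(pairs)


def balanced_accuracy(labels, biases, preds):
    """Unweighted mean of per-group accuracies over (label, bias) groups.

    Assumption: this is one common reading of accuracy on a balanced test set.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, b, p in zip(labels, biases, preds):
        total[(y, b)] += 1
        correct[(y, b)] += int(y == p)
    per_group = [correct[g] / total[g] for g in total]
    return sum(per_group) / len(per_group)


# Toy data: a classifier that simply predicts the bias attribute.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
biases = [0, 0, 0, 1, 1, 1, 1, 0]
preds = list(biases)

print("plain:", plain_accuracy(labels, preds))
print("balanced:", balanced_accuracy(labels, biases, preds))
print("bias-conflicting:", bias_conflicting_accuracy(labels, biases, preds))
```

In this toy case the bias-following classifier reaches a plain accuracy of 0.75 and a balanced accuracy of 0.5, while its bias-conflicting accuracy is 0.0. This mirrors the observation above that gains on bias-aligned samples can raise an aggregate score even as bias-conflicting accuracy falls.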
