Method

Figure 1: Method Overview

Our method begins by defining the target attribute through two sets of diverse prompts that describe the attribute in varying contexts. We encode these prompts with the CLIP text encoder to obtain the corresponding text embeddings, as illustrated in Figure 2.
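
The prompt construction and encoding step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The context templates, the attribute phrases, and the function names `build_prompt_set` and `stand_in_text_encoder` are all hypothetical. Purely for illustration, the two sets are assumed to differ only in the attribute phrase. The stand-in encoder mimics only the output shape and L2 normalization of CLIP text features, so the sketch runs offline.

```python
import hashlib

import numpy as np

# Hypothetical context templates: the actual prompts are not given in the text.
CONTEXTS = [
    "a photo of {} on a beach",
    "a photo of {} in a forest",
    "a close-up photo of {} indoors",
]


def build_prompt_set(attribute_phrase, contexts=CONTEXTS):
    """Describe one attribute phrase in several different contexts."""
    return [template.format(attribute_phrase) for template in contexts]


def stand_in_text_encoder(prompts, dim=16):
    """Hypothetical stand-in for the CLIP text encoder.

    Maps each prompt to a deterministic pseudo-random vector (seeded by a hash
    of the text) and L2-normalizes it. The vectors carry no semantics; they only
    let the rest of the pipeline run without downloading a model.
    """
    rows = []
    for prompt in prompts:
        seed = int.from_bytes(hashlib.sha256(prompt.encode("utf-8")).digest()[:8], "little")
        rows.append(np.random.default_rng(seed).normal(size=dim))
    emb = np.stack(rows)
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


# Usage with two hypothetical prompt sets.
set_a = build_prompt_set("a person with blond hair")
set_b = build_prompt_set("a person with dark hair")
emb_a = stand_in_text_encoder(set_a)  # shape (3, 16)
emb_b = stand_in_text_encoder(set_b)  # shape (3, 16)
```

In practice, replace the stand-in with an actual CLIP text encoder, for example `model.encode_text(clip.tokenize(prompts))` from OpenAI's `clip` package.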

Figure 2: Discovering Target and Nuisance subspaces

Using the discovered subspaces, we decompose each image's embedding into target and nuisance variables, as shown in Figure 3.
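
A minimal sketch of this decomposition follows. It assumes that the nuisance (bias) subspace is the orthogonal complement of the target subspace, and that the target subspace is given as a matrix with orthonormal columns. The function name `decompose` is hypothetical.

```python
import numpy as np


def decompose(x, target_basis):
    """Split embeddings into target and nuisance components.

    x: (n, d) embeddings, one row per image.
    target_basis: (d, k) matrix with orthonormal columns spanning the target subspace.
    Assumes the nuisance subspace is the orthogonal complement of the target subspace.
    """
    target = x @ target_basis @ target_basis.T  # orthogonal projection onto the target subspace
    nuisance = x - target  # the remainder lies in the orthogonal complement
    return target, nuisance


# Usage: a 4-d toy space whose target subspace is spanned by the first two axes.
basis = np.eye(4)[:, :2]
x = np.array([[1.0, 2.0, 3.0, 4.0]])
target, nuisance = decompose(x, basis)  # target = [[1, 2, 0, 0]], nuisance = [[0, 0, 3, 4]]
```

The two subspaces are orthogonal complements. As a result, the two components are orthogonal to each other and sum back to the original embedding, so the split itself discards no information.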

Figure 3: Debiasing using discovered Target and Nuisance Spaces

To construct an unbiased embedding set from an image dataset, we sample embeddings uniformly for each class. This sampling marginalizes out the nuisance attributes. Because the selection of target attributes is decoupled from the selection of nuisance attributes, the context information is sampled independently of the target variable. We then construct a matrix of pairwise embedding differences and apply Singular Value Decomposition (SVD) to it. The top-k singular vectors span a subspace useful for classification, and its orthogonal complement forms the bias subspace.
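
The sampling, difference-matrix, and SVD steps can be sketched with NumPy. The text leaves several details unspecified, and this sketch fills them with assumptions:

- Each class contributes an equal-size sample drawn without replacement.
- Differences are taken only between embeddings of different classes.
- The top-k right singular vectors of the difference matrix form the classification basis.

The helper names are hypothetical.

```python
import numpy as np


def sample_per_class(embeddings, labels, n_per_class, rng):
    """Draw the same number of embeddings uniformly at random (without replacement) from each class."""
    picked = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        picked.append(rng.choice(members, size=n_per_class, replace=False))
    idx = np.concatenate(picked)
    return embeddings[idx], labels[idx]


def cross_class_differences(x, y):
    """Rows are x_i - x_j for every pair i < j whose labels differ (assumed pairing rule)."""
    i, j = np.triu_indices(len(x), k=1)
    keep = y[i] != y[j]
    return x[i[keep]] - x[j[keep]]


def split_subspaces(diffs, k):
    """SVD of the (m, d) difference matrix.

    The top-k right singular vectors span the classification subspace; the
    remaining d - k vectors span its orthogonal complement (the bias subspace).
    """
    _, _, vt = np.linalg.svd(diffs, full_matrices=True)  # vt: (d, d), rows sorted by singular value
    return vt[:k].T, vt[k:].T


# Usage on random toy data: 3 classes, 20 embeddings each, 8 dimensions.
rng = np.random.default_rng(0)
emb = rng.normal(size=(60, 8))
lab = np.repeat(np.arange(3), 20)
xs, ys = sample_per_class(emb, lab, n_per_class=5, rng=rng)
diffs = cross_class_differences(xs, ys)  # 75 cross-class pairs out of 105
target_basis, bias_basis = split_subspaces(diffs, k=2)  # shapes (8, 2) and (8, 6)
```

Restricting the differences to cross-class pairs is one way to emphasize class-discriminative directions. The text does not say which pairs are used, so this choice should be checked against the actual method.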