In this section, we present the debiasing results of our proposed method, which demonstrate its effectiveness.
Debiasing a single attribute
We first show the debiasing results on single attributes such as gender and eyeglasses, visualizing both the change in attribute distribution and qualitative examples.

From Figure 1, we can clearly observe that the gender distribution of images sampled from the Stable Diffusion model changes from 60% : 40% to 50% : 50%. As for the qualitative results, the perceived gender of the image in the red frame changes from man to woman.
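
For reference, distribution statistics like these can be estimated by sampling a batch of images and classifying the perceived attribute of each one. The sketch below assumes a CLIP zero-shot classifier as the attribute predictor; the checkpoints and label prompts are illustrative rather than the exact setup used in our experiments:

```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

# Sample a batch of face images from Stable Diffusion.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
images = pipe("face photo of a person", num_images_per_prompt=16).images

# Zero-shot classify the perceived gender of each sample with CLIP.
# The label prompts below are illustrative.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["a face photo of a man", "a face photo of a woman"]
inputs = processor(text=labels, images=images,
                   return_tensors="pt", padding=True)
probs = clip(**inputs).logits_per_image.softmax(dim=-1)  # shape (16, 2)

woman_ratio = (probs.argmax(dim=-1) == 1).float().mean().item()
print(f"perceived-woman ratio: {woman_ratio:.2f}")
```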

To further demonstrate the validity of our method, we conduct distribution-control experiments on the training dataset to check whether the distribution of sampled images changes correspondingly. From Figure 2, we can observe that when we set the gender distribution of the training data to different ratios, the predicted gender ratio of the sampled images roughly matches each training ratio. This indicates that, for the debiased attribute, the distribution of sampled images roughly follows that of the training dataset.
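
Building a training subset at a prescribed attribute ratio is straightforward given per-image attribute labels. A minimal sketch, assuming binary labels; the helper name and labeling convention are ours for illustration:

```python
import numpy as np

def subsample_with_ratio(labels, target_ratio, size, seed=0):
    """Subsample `size` indices so a `target_ratio` fraction has label 1."""
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)   # e.g. 1 = woman (illustrative)
    neg = np.flatnonzero(labels == 0)   # e.g. 0 = man
    n_pos = int(round(size * target_ratio))
    idx = np.concatenate([
        rng.choice(pos, n_pos, replace=False),
        rng.choice(neg, size - n_pos, replace=False),
    ])
    rng.shuffle(idx)
    return idx

# e.g. build a 70% : 30% training subset for one distribution-control run
labels = np.random.default_rng(1).integers(0, 2, 10_000)  # stand-in labels
train_idx = subsample_with_ratio(labels, target_ratio=0.7, size=2_000)
print((labels[train_idx] == 1).mean())  # ~0.70
```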

We show our eyeglasses debiasing results in Figure 3, which plots the distribution before and after debiasing. Although the result is not perfectly balanced (50% each), debiasing largely alleviates the imbalance. In the qualitative results, noticeably more of the sampled faces wear eyeglasses.
Debiasing multiple attributes
As mentioned in the Method section, our first method does not work well for debiasing multiple attributes simultaneously. We therefore propose a second method for debiasing multiple attributes and display its results below:


From Figure 4, we can clearly observe that after debiasing, the distribution of generated images across the 4 subclasses is roughly uniform. From Figure 5, we can observe that by using the 4 debiased text embeddings to sample images from Stable Diffusion, we obtain nearly accurate results for each subclass. To better understand how our method works, we use t-SNE to visualize the image and text features.
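
As an illustration of the sampling step, Stable Diffusion pipelines accept precomputed prompt embeddings, so debiased text embeddings can be fed in directly. A minimal sketch using the diffusers `prompt_embeds` argument; the checkpoint and the embedding file name are hypothetical placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical file holding the 4 debiased text embeddings, one per
# subclass; each tensor has shape (77, 768) to match the SD v1 text
# encoder's output for the prompt "face photo of a person".
debiased = torch.load("debiased_text_embeddings.pt")  # dict: name -> tensor

samples = {}
for name, emb in debiased.items():
    # Feed the debiased embedding directly, bypassing the prompt string.
    out = pipe(prompt_embeds=emb.unsqueeze(0).to("cuda", torch.float16),
               num_images_per_prompt=8)
    samples[name] = out.images
```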

We visualize t-SNE projections of both the image and text embeddings in Figure 6. In the left panel, the sampled data lie very close to the training data of the same subclass, which indicates the effectiveness of our method. In the right panel, the debiased text embedding for each subclass is ‘pushed away’ from the original embedding (“face photo of a person”) and moves toward the corresponding training data.
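
A visualization of this kind can be reproduced by projecting the image features and text embeddings jointly with t-SNE. A minimal sketch with stand-in random arrays; in practice they would hold the actual image features and pooled text embeddings:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in features; replace with real pooled embeddings of matching width.
rng = np.random.default_rng(0)
img_feats = rng.normal(size=(400, 512))  # training + sampled image features
txt_feats = rng.normal(size=(5, 512))    # original prompt + 4 debiased ones

# Project everything into 2-D jointly so distances are comparable.
feats = np.concatenate([img_feats, txt_feats], axis=0)
xy = TSNE(n_components=2, perplexity=30, init="pca",
          random_state=0).fit_transform(feats)

img_xy, txt_xy = xy[: len(img_feats)], xy[len(img_feats):]
plt.scatter(img_xy[:, 0], img_xy[:, 1], s=5, alpha=0.5, label="image features")
plt.scatter(txt_xy[:, 0], txt_xy[:, 1], marker="*", s=150, label="text embeddings")
plt.legend()
plt.show()
```
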
In conclusion, the results above demonstrate the effectiveness of our method in debiasing both single and multiple attributes.