I recall a somewhat similar incident when I was showing an in-law of mine how Stable Diffusion worked a while back. She’s of Indian descent, and she asked Stable Diffusion to generate a picture of an Indian woman. All of the women it generated wore bindis and other “traditional” Indian cultural garb, and she was initially kind of annoyed by that. But I explained that this was because most of the photos of women in the training set that were explicitly tagged as Indian were dressed that way, whereas the rest of the Indian women in the training set probably weren’t explicitly tagged. They were just women.
It was kind of interesting trying to figure out which option was more biased. Realizing that there was an understandable reason behind the results helped ease her annoyance.
Yes, but they trained on easily accessible data in large amounts, which actually says that stock photo websites are the biased ones there.
No model can be trained on an equal amount of diverse data for everyone, and it’s not supposed to be anyway. I bet it was hardly, if at all, trained on Mongolian goat herders, but you could hardly say it’s biased against them, just that there wasn’t a large, easily accessible supply of pictures of them.
The bias isn’t in the software, it is in the data. The stock photos of professional women that were fed in were white.
That doesn’t say anything about the AI, but rather the community that created those biases.
That’s my point. The AI isn’t an independent subject to be criticized, it is a cultural mirror.