Pittsburgh Researchers Use AI To Study Bias in Bollywood, Hollywood Films

90.5 WESA | By Bill O'Driscoll

Published March 8, 2021 at 6:04 AM EST

It doesn’t take a genius to perceive that mainstream movies have a history of showing bias against women, or regarding standards of beauty. But it might take artificial intelligence to put numbers to those suspicions.

Carnegie Mellon University researchers have developed AI that analyzed the subtitles of some 1,400 movies: the 100 top-grossing films from both Hollywood and Bollywood from each of the past seven decades. The new research confirms that while there’s been progress, there’s still plenty of gender and social bias on the screen. And it points the way on how to study, with unprecedented speed, the way social issues play out in films and TV from around the world.

“Gender Bias, Social Bias, and Representation: 70 Years of B(H)ollywood” is the title of the paper, co-authored by Kunal Khadilkar, Ashiqur R. KhudaBukhsh, and Tom Mitchell. The researchers – Khadilkar in particular considers himself a big Bollywood fan – knew sexism was a problem in movies, and wanted to quantify the problem. They acquired the subtitles to the 1,400 films, all in English, from online sources.

Some of the findings are particular to India and its $2.1 billion film industry centered in Mumbai. For example, the AI calculated that from 1950 through 1999, 74% of the babies born in Bollywood films were boys. However, since then the figure has dropped to 55%.

“So we achieved somewhat gender parity in newer movies,” said KhudaBukhsh, a project scientist with CMU’s Language Technologies Institute who, like Khadilkar, was born in India.

Another finding likewise reflects what the authors said was broad social change in India: the decline in dialogue approving of dowries. The practice of brides’ families paying dowries was considered socially acceptable for a while after it was outlawed, in 1961, but has since fallen from favor. The trend as reflected on screen was discerned by having AI learn which words were most closely associated with “dowry.” In films through 1969, it was words like “money” and “jewelry”; in the past two decades, it’s been terms indicating noncompliance, like “guts,” “divorce” and “refused.”

In looking at other social issues, researchers compared Bollywood and Hollywood films. To quantify gender bias, they calculated the percentage of all gendered pronouns that were “he” and “him.” In the ’50s, about 65% of the gendered pronouns in Hollywood hits were male, as were nearly 60% of those in Bollywood hits. Today, pronouns in subtitles for films from both industries run about 55% male. For comparison’s sake, the authors ran the same test on Google Books texts from the same time period. In the ’50s, those books favored male pronouns over female by an astonishing 3-1 ratio, but by the past decade, the split had become 50-50, they found.
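The pronoun count described above is simple enough to sketch in a few lines of Python. This is only an illustration, not the researchers' code: the function name `male_pronoun_share`, the female pronoun list and the sample sentence are all assumptions made for the example, since the article names only “he” and “him.”

```python
import re

# Assumed pronoun lists: the article names only "he" and "him";
# the female list here is a guess made for illustration.
MALE = {"he", "him"}
FEMALE = {"she", "her"}

def male_pronoun_share(text):
    """Return the fraction of gendered pronouns that are male, or None if there are none."""
    tokens = re.findall(r"[a-z']+", text.lower())
    male = sum(token in MALE for token in tokens)
    female = sum(token in FEMALE for token in tokens)
    total = male + female
    return male / total if total else None

# Made-up line standing in for a film's subtitle text.
sample = "He told her that she should call him. He left."
share = male_pronoun_share(sample)
print(share)
```

Run over every subtitle file from a given decade, the same ratio would give the kind of per-decade percentages quoted above.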

The association of particular skin tones with female beauty was another issue. “Colorism” is closely associated with racism in the U.S., and KhudaBukhsh said there is increasing backlash in India over cosmetics promising to lighten skin.

To look at skin tone, researchers used what’s called a “cloze” test, where AI mines data to predict a word left out of a given sentence – in this case, “A beautiful woman should have [blank] skin.” Where a standard language model would predict “soft,” researchers said, AI trained with both Bollywood and Hollywood subtitles most often turned up “fair.”
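The idea of a cloze test can be shown with a much cruder tool than the AI used in the study. The sketch below simply counts which word most often appears right before a target word in a tiny, made-up corpus; it is a stand-in for a trained language model, and the function name `fill_blank` and the three sample lines are assumptions made for illustration.

```python
import re
from collections import Counter

def fill_blank(corpus, next_word):
    """Rank the words seen immediately before `next_word`, most frequent first."""
    pattern = re.compile(rf"(\w+)\s+{re.escape(next_word)}\b")
    counts = Counter()
    for line in corpus:
        counts.update(pattern.findall(line.lower()))
    return counts.most_common()

# Made-up lines standing in for subtitle text; not data from the study.
toy_corpus = [
    "She has fair skin and dark hair.",
    "Everyone praised her for having fair skin.",
    "The lotion promised soft skin.",
]

ranking = fill_blank(toy_corpus, "skin")
print(ranking)
```

A model trained on different text would rank the candidates differently, which is the contrast the researchers drew between a standard model and one trained on film subtitles.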

“So there is an association of beauty with lighter skin color,” said KhudaBukhsh.

The result was true across eras, though the preference is more pronounced in Bollywood films than in Hollywood, researchers said.

The study also explored gender bias through what’s known as a Word Embedding Association Test (WEAT). The AI calculated how often male-associated words (“he,” “man,” “male”) as opposed to female-associated words correlated with given occupations, from “philosopher” and “boss” to “nurse” and “socialite.” Hollywood proved notably less biased in assigning jobs typed by gender, and in both film industries, the bias declined over time.
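At the heart of a WEAT is a per-word score: how much closer a word sits to one set of reference words than to another, once every word has been turned into a list of numbers called an embedding. The sketch below computes only that score, not the full test statistic, and it uses hand-picked two-dimensional toy vectors rather than embeddings learned from subtitles. The vectors, the word lists and the function names are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, male_vecs, female_vecs):
    """Mean similarity to the male words minus mean similarity to the female words."""
    male = sum(cosine(word_vec, m) for m in male_vecs) / len(male_vecs)
    female = sum(cosine(word_vec, f) for f in female_vecs) / len(female_vecs)
    return male - female

# Hand-picked toy vectors; real embeddings are learned from text and have many more dimensions.
toy = {
    "he": (1.0, 0.0), "man": (0.9, 0.1),
    "she": (0.0, 1.0), "woman": (0.1, 0.9),
    "boss": (0.8, 0.2), "nurse": (0.2, 0.8),
}
male_vecs = [toy["he"], toy["man"]]
female_vecs = [toy["she"], toy["woman"]]

scores = {job: association(toy[job], male_vecs, female_vecs) for job in ("boss", "nurse")}
print(scores)
```

A positive score means the occupation leans toward the male words and a negative score toward the female words; a score near zero would indicate no lean in either direction.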

Additionally, the team ran the WEAT analysis again, this time comparing both film industries to a set of 150 Oscar nominees for Best Foreign-Language Film. The “world movies” exhibited about 40% less gender bias in terms of occupation than either Bollywood or Hollywood films.

“The films which are particularly acclaimed, and end up being nominated at the Academy Awards, they end up being less biased [than] the ones which are generating millions of dollars of revenue and end up being blockbuster movies,” said Khadilkar.

This type of analysis has its limits, researchers acknowledged: It considers only subtitles, which mostly reflect spoken dialogue and song lyrics, and don’t account for the way biases might be expressed by a film’s visuals or soundtrack, for instance. Many film fans are familiar with informal ways of gauging a film's regard for women, like the Bechdel Test, which evaluates how female characters relate to one another.

But researchers say the AI’s ability to sift through the subtitles of hundreds of films quickly is a boon to the field.

“You can now assign numbers to the amount of biases … hence helping quantify the different film industries all over the world,” said Khadilkar.

The research paper was presented in February at a virtual conference of the Association for the Advancement of Artificial Intelligence.