Pittsburgh Researchers Use AI To Study Bias in Bollywood, Hollywood Films

It doesn’t take a genius to perceive that mainstream movies have a history of bias against women and of promoting narrow standards of beauty. But it might take artificial intelligence to put numbers to those suspicions.

Carnegie Mellon University researchers have developed AI that analyzed the subtitles of some 1,500 movies, including the 100 top-grossing films from both Hollywood and Bollywood from each of the past seven decades. The new research confirms that while there’s been progress, there’s still plenty of gender and social bias on the screen. And it points the way on how to study, with unprecedented speed, the way social issues play out in films and TV from around the world.

“Gender Bias, Social Bias, and Representation: 70 Years of B(H)ollywood” is the title of the paper, co-authored by Kunal Khadilkar, Ashiqur R. KhudaBukhsh, and Tom Mitchell. The researchers – Khadilkar in particular considers himself a big Bollywood fan – knew sexism was a problem in movies, and wanted to quantify it. They acquired English-language subtitles for the 1,400 Bollywood and Hollywood films from online sources.

Some of the findings are particular to India and its $2.1 billion film industry centered in Mumbai. For example, the AI calculated that from 1950 through 1999, 74% of the babies born in Bollywood films were boys. However, since then the figure has dropped to 55%.

“So we achieved somewhat gender parity in newer movies,” said KhudaBukhsh, a project scientist with CMU’s Language Technologies Institute and who, like Khadilkar, was born in India.

Another finding likewise reflects what the authors said was broad social change in India: the decline in dialogue approving of dowries. The practice of brides’ families paying dowries was considered socially acceptable for a while after it was outlawed, in 1961, but has since fallen from favor. The trend as reflected on screen was discerned by having AI learn which words were most closely associated with “dowry.” In films through 1969, it was words like “money” and “jewelry”; in the past two decades, it’s been terms indicating noncompliance, like “guts,” “divorce” and “refused.”

In looking at other social issues, researchers compared Bollywood and Hollywood films. To quantify gender bias, they calculated the percentage of all gendered pronouns that were “he” and “him.” In the ’50s, about 65% of the gendered pronouns in Hollywood hits were male, as were nearly 60% of those in Bollywood hits. Today, pronouns in subtitles for films from both industries run about 55% male. For comparison’s sake, the authors ran the same test on Google Books texts from the same time period. In the ’50s, those books favored male pronouns over female by an astonishing 3-1 ratio, but by the past decade, the split had become 50-50, they found.
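
The pronoun measure is simple enough to sketch in a few lines of Python. This is an illustration only, not the researchers' code: the function name, the sample sentence, and the choice of "she" and "her" as the female counterparts are assumptions, since the article names only "he" and "him."

```python
import re

# "he" and "him" come from the article; the female set is an assumption.
MALE = {"he", "him"}
FEMALE = {"she", "her"}

def male_pronoun_share(text: str) -> float:
    """Return the fraction of gendered pronouns in `text` that are male."""
    tokens = re.findall(r"[a-z']+", text.lower())
    male = sum(1 for t in tokens if t in MALE)
    female = sum(1 for t in tokens if t in FEMALE)
    total = male + female
    if total == 0:
        raise ValueError("no gendered pronouns found")
    return male / total

# Made-up sample line, not taken from any film.
sample = "He said she would call him. She told her friend."
share = male_pronoun_share(sample)
print(f"{share:.0%} of gendered pronouns are male")
```

Run over a decade's worth of subtitle files instead of one sentence, the same ratio would give figures comparable to the percentages quoted above.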

The association of particular skin tones with female beauty was another issue. “Colorism” is closely associated with racism in the U.S., and KhudaBukhsh said there is increasing backlash in India over cosmetics promising to lighten skin.

To look at skin tone, researchers used what’s called a “cloze” test, where AI mines data to predict a word left out of a given sentence – in this case, “A beautiful woman should have [blank] skin.” Where a general-purpose language model would predict “soft,” researchers said, AI trained with both Bollywood and Hollywood scripts most often turned up “fair.”

“So there is an association of beauty with lighter skin color,” said KhudaBukhsh.

The result was true across eras, though the preference is more pronounced in Bollywood films than in Hollywood, researchers said.
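
To give a feel for the fill-in-the-blank idea, here is a deliberately crude stand-in in Python. It does not use a language model at all; it only counts which words occupy the blank in matching lines. The function name and the three sample lines are invented for illustration and do not come from the study.

```python
import re
from collections import Counter

def fill_blank(template: str, lines: list[str]) -> Counter:
    """Count the words that appear where [blank] sits in `template`."""
    left, right = template.lower().split("[blank]")
    pattern = re.compile(
        re.escape(left.strip()) + r"\s+(\w+)\s+" + re.escape(right.strip())
    )
    counts = Counter()
    for line in lines:
        counts.update(pattern.findall(line.lower()))
    return counts

# Made-up lines standing in for subtitle text.
toy_lines = [
    "A beautiful woman should have fair skin.",
    "She should have soft skin and bright eyes.",
    "Everyone says a bride should have fair skin.",
]
counts = fill_blank("should have [blank] skin", toy_lines)
print(counts.most_common(1))
```

A trained language model goes further than counting: it can rank candidate words even for sentences it has never seen verbatim, which is what makes the comparison between a general-purpose model and one trained on film scripts meaningful.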

The study also explored gender bias through what’s known as a Word Embedding Association Test (WEAT). The AI calculated how strongly male-associated words (“he,” “man,” “male”), as opposed to female-associated words, correlated with given occupations, from “philosopher” and “boss” to “nurse” and “socialite.” Hollywood proved notably less biased in assigning jobs typed by gender, and in both film industries, the bias declined over time.
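
The core of this kind of test is a per-word association score: how much closer an occupation sits to male words than to female words in the model's vector space. The Python sketch below shows that score only, under stated assumptions. The two-dimensional vectors are toy values made up for illustration (real embeddings are learned from text and have hundreds of dimensions), and the full WEAT adds an effect-size calculation on top that is omitted here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def gender_association(word_vec, male_vecs, female_vecs):
    """Mean similarity to male words minus mean similarity to female words."""
    male_sim = sum(cosine(word_vec, m) for m in male_vecs) / len(male_vecs)
    female_sim = sum(cosine(word_vec, f) for f in female_vecs) / len(female_vecs)
    return male_sim - female_sim

# Toy embeddings, invented for illustration; not learned from any subtitles.
TOY_VECTORS = {
    "he": (1.0, 0.0),
    "man": (0.9, 0.1),
    "she": (0.0, 1.0),
    "woman": (0.1, 0.9),
    "boss": (0.8, 0.2),
    "nurse": (0.2, 0.8),
}

male = [TOY_VECTORS[w] for w in ("he", "man")]
female = [TOY_VECTORS[w] for w in ("she", "woman")]
scores = {
    job: gender_association(TOY_VECTORS[job], male, female)
    for job in ("boss", "nurse")
}
for job, score in scores.items():
    print(f"{job}: {score:+.2f}")
```

A positive score means the occupation leans male in the vector space and a negative score means it leans female; scores near zero would indicate little gender typing for that job.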

Additionally, the team ran the WEAT analysis again, this time comparing both film industries to a set of 150 Oscar nominees for Best Foreign-Language Film. The “world movies” exhibited about 40% less gender bias in terms of occupation than either Bollywood or Hollywood films.

“The films which are particularly acclaimed, and end up being nominated at the Academy Awards, they end up being less biased [than] the ones which are generating millions of dollars of revenue and end up being blockbuster movies,” said Khadilkar.

This type of analysis has its limits, researchers acknowledged: It considers only subtitles, which mostly reflect spoken dialogue and song lyrics, and don’t account for the way biases might be expressed by a film’s visuals or soundtrack, for instance. Many film fans are familiar with informal ways of gauging a film's regard for women, like the Bechdel Test, which evaluates how female characters relate to one another.

But researchers say the AI’s ability to sift through the subtitles of hundreds of films quickly is a boon to the field.

“You can now assign numbers to the amount of biases … hence helping quantify the different film industries all over the world,” said Khadilkar.

The research paper was presented in February at a virtual conference of the Association for the Advancement of Artificial Intelligence.

Bill is a long-time Pittsburgh-based journalist specializing in the arts and the environment. Previous to working at WESA, he spent 21 years at the weekly Pittsburgh City Paper, the last 14 as Arts & Entertainment editor. He is a graduate of Northwestern University's Medill School of Journalism and in 30-plus years as a journalist has freelanced for publications including In Pittsburgh, The Nation, E: The Environmental Magazine, American Theatre, and the Pittsburgh Post-Gazette. Bill has earned numerous Golden Quill awards from the Press Club of Western Pennsylvania. He lives in the neighborhood of Manchester, and he once milked a goat.