CMU-Developed Algorithm Detects Fake Twitter Accounts, Yelp Reviews
If you want to gain a couple thousand Twitter followers overnight, it’s not hard.
There are hundreds of websites promising more Twitter followers, LinkedIn connections, Facebook likes and even fake reviews for a product on Amazon or a business on Yelp.
The accounts behind these services, whether run by bots or real people, are called fraudsters, and social networks and other sites play a constant game of catch-up trying to identify and disable them.
“If a person has a lot of followers, this person can … demand more money from advertisers or look more important in a political campaign,” said Christos Faloutsos, computer science professor at Carnegie Mellon University. “It pays off for an unscrupulous person to buy followers.”
Faloutsos advised a team of students who were honored with the Best Paper Award at the Association for Computing Machinery’s Conference on Knowledge Discovery and Data Mining this summer.
The team – Ph.D. students Bryan Hooi, Hyun Ah Song, Neil Shah and Kijung Shin and recent Ph.D. graduate Alex Beutel – developed an algorithm called FRAUDAR that can see through fraudsters' efforts to “camouflage” themselves by making the accounts appear normal.
Essentially, the algorithm looks for large groups of users who all follow the same, much smaller group of accounts.
“It’s near impossible to have 1,000 people follow exactly the same hundred targets,” Faloutsos said. “You can have a million people following (a few) famous people, like President Obama, Justin Bieber, Lady Gaga, but that’s about it. This is a telltale sign of fraud.”
He said the same idea works for reviews on sites such as Amazon or Yelp. It’s highly unlikely that the same 100 users are all reviewing the same 20 products.
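The idea of many accounts all pointing at the same small set of targets can be illustrated with a simplified sketch. The Python below is not the FRAUDAR code. It is a plain greedy "peeling" heuristic that repeatedly drops the least-connected account and remembers the densest block seen along the way. The function name `densest_block`, the account names and the density score (links per account) are assumptions made for illustration, and the sketch does not attempt to handle camouflage.

```python
from collections import defaultdict


def densest_block(edges):
    """Return (followers, targets, density) for the densest block found by greedy peeling.

    edges: iterable of (follower, target) pairs.
    density: links per account inside the block (an assumed, simplified score).
    """
    # Tag each side so a follower and a target with the same name stay distinct.
    adj = defaultdict(set)
    for follower, target in edges:
        adj[("u", follower)].add(("t", target))
        adj[("t", target)].add(("u", follower))

    alive = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density = n_edges / len(alive) if alive else 0.0
    removed = []
    best_cut = 0  # number of removals that produced the best block so far

    while len(alive) > 1:
        # Peel the account with the fewest remaining links (repr breaks ties).
        node = min(alive, key=lambda n: (len(adj[n]), repr(n)))
        n_edges -= len(adj[node])
        for nbr in adj[node]:
            adj[nbr].discard(node)
        adj[node] = set()
        alive.remove(node)
        removed.append(node)

        density = n_edges / len(alive)
        if density > best_density:
            best_density, best_cut = density, len(removed)

    block = set(adj) - set(removed[:best_cut])
    followers = sorted(name for side, name in block if side == "u")
    targets = sorted(name for side, name in block if side == "t")
    return followers, targets, best_density


# Hypothetical data: 10 bought accounts all follow the same 5 customers,
# while 6 ordinary users each follow one celebrity and one friend.
edges = [(f"fake{i}", f"boosted{j}") for i in range(10) for j in range(5)]
for i in range(6):
    edges += [(f"user{i}", "celebrity"), (f"user{i}", f"friend{i}")]

followers, targets, density = densest_block(edges)
print(followers, targets, round(density, 2))
```

In this toy data the ordinary users are peeled away first because each has only a link or two, leaving the tightly connected group of bought accounts and their customers. A production system would need faster data structures than this sketch uses, along with a scoring scheme that holds up when fraudsters also follow popular accounts to blend in.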
“It causes a lot of problems for platforms like Amazon,” said Hooi. “If there are a lot of fake reviews on the site, people will not trust the reviews.”
Faloutsos said he became interested in catching fraudsters because social media is so prevalent in modern life and this technology has the potential to substantially impact the industry. Plus, he said, “it’s a nice mathematical problem that requires elegant tools and solutions.”
The algorithm is available as open-source code for any developer to modify or improve.