The MyChoice project

MyChoice is a research project that aims to model, detect, and isolate outliers (also known as fakes) in online recommendation systems and in online social networks. The final outcome of the project will be a set of new techniques to mitigate the spread of fake, unreliable, or inaccurate information over the Web (e.g., fake accounts on social networks or anomalous ratings on e-advice and e-commerce websites).

Focusing first on real online recommendation systems, MyChoice intends to tackle the (malicious) bias that may influence a high percentage of users. Secondly, the project addresses fake accounts on social networks and provides automatic fake detection techniques. On Twitter, for example, "fake followers" are accounts created specifically to inflate the number of followers of a target account, so that the target appears more trustworthy and influential, stands out from the crowd, and attracts other genuine followers.

Another phenomenon the project deals with is fake reviews, which are so widespread that they have captured the attention of both academia and the mass media. Fake reviews can influence users' opinions, either promoting or damaging a particular target, so there is a strong incentive for opinion spamming. Defining efficient methodologies and tools to mitigate the proliferation of fake reviews is one of the issues that MyChoice is addressing.

The project started its activity in 2012 by monitoring some of the most popular websites providing online advice for hotels, such as TripAdvisor, and online services for e-booking, such as Booking. A crawler was used to collect several million reviews relating to thousands of different hotels all around the world. Starting from the state of the art in the field, the researchers involved in the project quantified the robustness of the rating aggregators used in such systems against the malicious injection of fake reviews. The current experimental outcomes, for example, enrich past results attesting that a simple arithmetic mean of the ratings given by hotel guests (the usual way to provide aggregated information to users) is not the most robust aggregator, since it can be severely affected by even a small number of outliers. Experiments have been carried out considering different kinds of attack, such as batch injections, hotel-chain injections, and local competitor injections. To improve the robustness of the ranking, the project is defining new aggregators that more effectively counter the activity of malicious reviewers.
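
The sensitivity of the arithmetic mean to a few outliers can be illustrated with a minimal sketch. The ratings below are made-up values on a 1-5 scale, the helper `inject_fakes` is a hypothetical name, and the median is used only as a textbook example of a more robust aggregator; it is not claimed to be one of the aggregators defined by the project.

```python
from statistics import mean, median

def inject_fakes(ratings, fake_rating, count):
    """Return a new list with `count` copies of `fake_rating` appended,
    mimicking a simple batch injection of fake reviews."""
    return list(ratings) + [fake_rating] * count

# Hypothetical genuine ratings for one hotel (1-5 scale).
genuine = [4, 4, 5, 4, 3, 4, 5, 4, 4, 4]

# Assumed attack: three fake 1-star reviews injected in a batch.
attacked = inject_fakes(genuine, fake_rating=1, count=3)

# How far each aggregator moves under the attack.
mean_shift = mean(genuine) - mean(attacked)
median_shift = median(genuine) - median(attacked)

print(f"mean:   {mean(genuine):.2f} -> {mean(attacked):.2f}")
print(f"median: {median(genuine)} -> {median(attacked)}")
```

With these toy values, three fake ratings out of thirteen pull the mean down by roughly 0.7 stars, while the median does not move.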

To enhance the comprehension of the fake phenomenon, the project is also looking at other instances of the concept of "fake". In particular, a research effort is focusing on the proliferation of fake Twitter followers, which has also aroused a great deal of interest in the mainstream media, such as the New York Times and the Financial Times. We have created a baseline dataset, a collection of both truly genuine (human) and truly fake accounts. In December 2012, MyChoice launched a Twitter campaign called "The Fake Project", with the creation of the Twitter account @TheFakeProject, whose profile claims "Follow me only if you are NOT a fake". To obtain the status of "certified human", each account that adhered to the initiative underwent further checks to attest its credibility. The "certified fake" set was collected by purchasing fake accounts, which are easily accessible to the general public.

The baseline dataset was used to train a set of machine-learning classifiers built over a series of rules and features characterising Twitter accounts. Our main result in this area is a novel "Class A" classifier that is general enough to avoid overfitting and that uses the least costly features (in terms of crawling costs), while correctly classifying more than 95% of the accounts in the training set.
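
As a rough illustration of what classifying accounts from cheap profile-level features can look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Account` fields, the thresholds in `looks_fake`, and the four toy accounts are hypothetical. It uses hand-written rules rather than a trained model, so it is not the "Class A" classifier and its output says nothing about the project's results.

```python
from dataclasses import dataclass

@dataclass
class Account:
    followers: int
    friends: int    # number of accounts this one follows
    tweets: int
    has_bio: bool
    is_fake: bool   # ground-truth label, as in a labelled baseline dataset

def looks_fake(acc: Account) -> bool:
    """Hypothetical rule set: count simple warning signs that only need
    profile data, and flag the account when at least two are present."""
    signs = 0
    if acc.friends > 50 * max(acc.followers, 1):  # follows far more than it is followed
        signs += 1
    if acc.tweets < 5:                            # almost no activity
        signs += 1
    if not acc.has_bio:                           # empty profile description
        signs += 1
    return signs >= 2

def accuracy(accounts) -> float:
    """Fraction of accounts whose predicted label matches the ground truth."""
    correct = sum(looks_fake(a) == a.is_fake for a in accounts)
    return correct / len(accounts)

# Toy labelled set (made-up values, not taken from the project's dataset).
baseline = [
    Account(followers=350, friends=200, tweets=1200, has_bio=True, is_fake=False),
    Account(followers=40, friends=90, tweets=3, has_bio=True, is_fake=False),
    Account(followers=2, friends=900, tweets=0, has_bio=False, is_fake=True),
    Account(followers=0, friends=1500, tweets=1, has_bio=True, is_fake=True),
]

print(f"accuracy on toy data: {accuracy(baseline):.2f}")
```

The rules read only fields available from a profile lookup, which mirrors the idea of preferring features with low crawling cost; features that require downloading an account's timeline or follower graph would be more expensive to collect.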

This work is the basis for a more thorough comprehension of the fake phenomenon, which can lead to a formal model that helps discriminate between an anomalous (possibly fake) account and a standard (possibly legitimate) one. Using this formalization as a reference model, the definition of fakes could be exported to different contexts, including online reviews and reviewers.


Acknowledgements

MyChoice is a regional project funded by the Tuscany region under the "Programma operativo regionale Competitività e Occupazione (Por Cro)", within the European Union Social Fund framework 2007-2013, by the Institute for Informatics and Telematics of the Italian National Research Council (IIT-CNR), and by the start-up company Bay31. It is a two-year project, which started in November 2012.

People

Marinella Petrocchi, Angelo Spognardi, Alessandro Colantonio, Roberto Di Pietro

Publications acknowledging MyChoice

A Criticism to Society (as seen by Twitter analytics), Stefano Cresci, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi. In The First International Workshop on Big Data Analytics for Security (DASec), June 2014, Madrid, Spain.


A Lot of Slots -- Outliers confinement in review-based systems, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi. In Proceedings of the 15th International Conference on Web Information Systems Engineering (WISE 2014), Part I, LNCS 8786, Springer, October 2014, Thessaloniki, Greece.


Discriminating Between the Wheat and the Chaff in Online Recommendation Systems, Roberto Di Pietro, Marinella Petrocchi, Angelo Spognardi, Maurizio Tesconi. In ERCIM News 96, Special Issue on Linked Open Data, January 2014.


Improved Automatic Maturity Assessment of Wikipedia Medical Articles, Emanuel Marzini, Angelo Spognardi, Ilaria Matteucci, Paolo Mori, Marinella Petrocchi, Riccardo Conti. In Proceedings of the 13th International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 2014), October 2014, Amantea, Italy.


Maturity assessment of Wikipedia medical articles, Riccardo Conti, Emanuel Marzini, Ilaria Matteucci, Paolo Mori, Marinella Petrocchi, Angelo Spognardi. In Proceedings of the 27th International Symposium on Computer-Based Medical Systems (CBMS 2014), May 2014, New York, USA.


A Fake Follower Story: improving fake accounts detection on Twitter, S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi. IIT Technical Report IIT TR-03/2014. Currently under revision.

