MIB Datasets

This page describes the datasets of Twitter accounts we have analysed within MIB.

Note on the datasets: Because of several constraints (e.g., users' privacy and data protection), we do not post the datasets openly. They are available on request to researchers, for research purposes only; please contact us to obtain access.

Two terms of use apply:

  • The appropriate paper must be cited in any research product whose findings are based on these datasets;
  • The datasets must not be redistributed.


First dataset

Description: genuine and spambot Twitter accounts, annotated by CrowdFlower contributors, as described in our paper:

The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race, S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi. WWW '17 Proceedings of the 26th International Conference on World Wide Web Companion, 963-972, 2017

@inproceedings{Cresci:2017:PSS:3041021.3055135,
 author = {Cresci, Stefano and Di Pietro, Roberto and Petrocchi, Marinella and Spognardi, Angelo and Tesconi, Maurizio},
 title = {The Paradigm-Shift of Social Spambots: Evidence, Theories, and Tools for the Arms Race},
 booktitle = {Proceedings of the 26th International Conference on World Wide Web Companion},
 series = {WWW '17 Companion},
 year = {2017},
 isbn = {978-1-4503-4914-7},
 location = {Perth, Australia},
 pages = {963--972},
 numpages = {10},
 url = {https://doi.org/10.1145/3041021.3055135},
 doi = {10.1145/3041021.3055135},
 acmid = {3055135},
 publisher = {International World Wide Web Conferences Steering Committee}
} 

To give a flavour of the collected information, we provide a sample of 100 genuine accounts together with the labels assigned by CrowdFlower contributors. The sample can be downloaded from these links: sample SQL version - sample CSV version

The full dataset, available on request and only for research purposes, has the following composition:

group name | description | accounts | tweets | year
genuine accounts | verified accounts that are human-operated | 3,474 | 8,377,522 | 2011
social spambots #1 | retweeters of an Italian political candidate | 991 | 1,610,176 | 2012
social spambots #2 | spammers of paid apps for mobile devices | 3,457 | 428,542 | 2014
social spambots #3 | spammers of products on sale at Amazon.com | 464 | 1,418,626 | 2011
traditional spambots #1 | training set of spammers used by C. Yang, R. Harkreader, and G. Gu | 1,000 | 145,094 | 2009
traditional spambots #2 | spammers of scam URLs | 100 | 74,957 | 2014
traditional spambots #3 | automated accounts spamming job offers | 433 | 5,794,931 | 2013
traditional spambots #4 | another group of automated accounts spamming job offers | 1,128 | 133,311 | 2009
fake followers | simple accounts that inflate the number of followers of another account | 3,351 | 196,027 | 2012
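
As a rough illustration of how the composition above might be used when planning an experiment, the following minimal Python sketch encodes the groups as records and derives the overall totals and the average number of tweets per account in each group. The figures are taken from the composition above; the names GROUPS and summarize are illustrative only and are not part of the dataset release.

```python
# Figures taken from the composition of the first dataset.
# GROUPS and summarize are illustrative names, not part of the release.
GROUPS = [
    # (group name, accounts, tweets, year)
    ("genuine accounts", 3474, 8377522, 2011),
    ("social spambots #1", 991, 1610176, 2012),
    ("social spambots #2", 3457, 428542, 2014),
    ("social spambots #3", 464, 1418626, 2011),
    ("traditional spambots #1", 1000, 145094, 2009),
    ("traditional spambots #2", 100, 74957, 2014),
    ("traditional spambots #3", 433, 5794931, 2013),
    ("traditional spambots #4", 1128, 133311, 2009),
    ("fake followers", 3351, 196027, 2012),
]


def summarize(groups):
    """Return total accounts, total tweets, and average tweets per account by group."""
    total_accounts = sum(accounts for _, accounts, _, _ in groups)
    total_tweets = sum(tweets for _, _, tweets, _ in groups)
    per_group = {name: tweets / accounts for name, accounts, tweets, _ in groups}
    return total_accounts, total_tweets, per_group


if __name__ == "__main__":
    accounts, tweets, per_group = summarize(GROUPS)
    print(f"{accounts} accounts, {tweets} tweets in total")
    for name, ratio in per_group.items():
        print(f"{name}: {ratio:.1f} tweets per account")
```

The per-group averages differ by orders of magnitude, so a study that samples tweets uniformly would be dominated by a few groups; sampling per account or per group may be preferable.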

SQL version - CSV version


Second dataset

Description: the accounts are split into the following datasets, as described in our paper "Fame for sale: efficient detection of fake Twitter followers":

  • TFP (the fake project): 100% humans
  • E13 (elections 2013): 100% humans
  • INT (intertwitter): 100% fake followers
  • FSF (fastfollowerz): 100% fake followers
  • TWT (twittertechnology): 100% fake followers
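
Because each dataset above is homogeneous (entirely human or entirely fake followers), a ground-truth label can be derived from the dataset name alone. The following is a minimal sketch of that idea, assuming the accounts have already been loaded as (account_id, dataset) pairs. The input shape and the names SUBSET_LABELS and label_accounts are hypothetical and do not reflect the actual layout of the SQL or CSV files.

```python
# Labels follow the composition listed above: each dataset is 100% one class.
# The input shape and all names here are assumptions made for illustration.
SUBSET_LABELS = {
    "TFP": "human",  # the fake project
    "E13": "human",  # elections 2013
    "INT": "fake",   # intertwitter
    "FSF": "fake",   # fastfollowerz
    "TWT": "fake",   # twittertechnology
}


def label_accounts(accounts):
    """Attach a class label to each (account_id, dataset) pair."""
    labeled = []
    for account_id, subset in accounts:
        key = subset.upper()
        if key not in SUBSET_LABELS:
            raise ValueError(f"unknown dataset: {subset!r}")
        labeled.append((account_id, key, SUBSET_LABELS[key]))
    return labeled


if __name__ == "__main__":
    for row in label_accounts([(1, "TFP"), (2, "int")]):
        print(row)
```

Rejecting unknown dataset names, rather than assigning a default label, avoids silently mislabeling accounts if a file from another source is mixed in.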

SQL version - CSV version

Fame for sale: efficient detection of fake Twitter followers, S. Cresci, R. Di Pietro, M. Petrocchi, A. Spognardi, M. Tesconi. arXiv:1509.04098 09/2015. Elsevier Decision Support Systems, Volume 80, December 2015, Pages 56–71

@article{fameforsale2015,
 author = {Cresci, Stefano and {Di Pietro}, Roberto and Petrocchi, Marinella and Spognardi, Angelo and Tesconi, Maurizio},
 title = {Fame for sale: efficient detection of fake Twitter followers},
 journal = {Decision Support Systems},
 publisher = {Elsevier},
 volume = {80},
 month = {December},
 issn = {0167-9236},
 doi = {10.1016/j.dss.2015.09.003},
 year = {2015},
 pages = {56--71}
}