Identifying language groups within multilingual cybercriminal forums

Victor Benjamin, Hsinchun Chen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

5 Scopus citations


Online cybercriminal communities exist in various geopolitical regions, including America, China, Russia, and more. Some multilingual forums exist where cybercriminals of differing geopolitical origin interact and exchange hacking knowledge and cybercriminal assets. Researchers can study such forums to better understand the global cybercriminal supply chain and cybercrime trends. However, little work has focused on identifying members of different language groups and geopolitical origin within such forums. One challenge is the necessity of a technique that scales across multiple languages. We are motivated to explore computational techniques that support automated and scalable categorization of cybercriminal forum participants into varying language groups. In particular, we make use of Paragraph Vectors, a state-of-The-Art neural network language model to generate fixed-length vector representations (i.e., document embeddings) of messages posted by forum participants. Results indicate Paragraph Vectors outperforms traditional n-gram frequency approaches for generating document embeddings that are useful for clustering cybercriminals into language groups.

Original languageEnglish (US)
Title of host publicationIEEE International Conference on Intelligence and Security Informatics: Cybersecurity and Big Data, ISI 2016
PublisherInstitute of Electrical and Electronics Engineers Inc.
Number of pages3
ISBN (Electronic)9781509038657
StatePublished - Nov 15 2016
Event14th IEEE International Conference on Intelligence and Security Informatics, ISI 2015 - Tucson, United States
Duration: Sep 28 2016Sep 30 2016


Other14th IEEE International Conference on Intelligence and Security Informatics, ISI 2015
Country/TerritoryUnited States


  • Cybecrminal community
  • Cybersecurity
  • Language modeling
  • Multilingual
  • Neural network

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
  • Computer Networks and Communications
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality


Dive into the research topics of 'Identifying language groups within multilingual cybercriminal forums'. Together they form a unique fingerprint.

Cite this