On detecting online radicalization using natural language processing

This paper proposes a new approach to radicalization detection using natural language processing techniques. Although detecting radicalization from language cues alone is nontrivial and open to debate, advances in computational linguistics, together with the availability of large corpora that enable the application of machine learning techniques, open new horizons in the field. The paper advocates a two-stage detection approach. In the first stage, a radicalization score is obtained by analyzing mainly inherent characteristics of negative sentiment. In the second stage, a machine learning approach based on a hybrid KNN-SVM classifier is employed with a variety of features, including 1-, 2- and 3-grams, personality traits, emotions, and other linguistic and network-related features. The approach is validated using both Twitter and Tumblr datasets.
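
To make the two-stage idea concrete, the following is a minimal sketch of how a first-stage negative-sentiment score and the second-stage n-gram features might be computed. It is an illustration under stated assumptions, not the method evaluated in the paper: the lexicon `NEGATIVE_WORDS`, the function names, the choice of word-level (rather than character-level) n-grams, and the scoring rule (share of tokens found in the lexicon) are all hypothetical. The hybrid KNN-SVM classifier and the personality, emotion and network features are not shown.

```python
from collections import Counter

# Hypothetical toy lexicon; the scoring resources used in the paper are not given here.
NEGATIVE_WORDS = {"hate", "destroy", "enemy", "kill"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def negative_score(text):
    """Stage 1 stand-in: fraction of tokens found in the negative lexicon."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(tok in NEGATIVE_WORDS for tok in tokens) / len(tokens)


def ngram_counts(text, max_n=3):
    """Stage 2 features: counts of word 1-, 2- and 3-grams (word level is assumed)."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def feature_vector(text):
    """Combine the stage 1 score with the n-gram counts in a single mapping."""
    features = dict(ngram_counts(text))
    features["__negative_score__"] = negative_score(text)
    return features
```

With this toy lexicon, `negative_score("We hate the enemy!")` returns 0.5, since two of the four tokens are in the lexicon. In a fuller pipeline, the mapping returned by `feature_vector` would be vectorized and passed to whatever classifier is chosen for the second stage.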