Investigating data archives to establish a correlation between hate speech and hate crimes

Monday 08/21/2023

The rise of technology has had a major impact on the upbringing of both Generation Z and Generation Alpha, as access to the internet has been seamlessly integrated into their lives since birth. Older generations are adapting to a technological lifestyle as well, using it for everything from learning new recipes to connecting with loved ones. But as society continues to integrate modern technology into daily life, new risks arise: the internet also gives cybercriminals a space to operate freely and anonymously.

Willis Shaw, now at the College of Criminology and Criminal Justice at FSU, started his research at the Terrorism Research Center at the University of Arkansas. He has spent years studying the power of the internet, specifically its influence on mass mobilization and online radicalization. His undergraduate work shed light on imageboard platforms like 4chan, where users can share hate speech, suspicious comments, and politically extremist ideas with little consequence, even when those posts pose a threat to others. Building on that work, his graduate research now focuses on finding a temporal correlation between online hate speech and hate crimes.

To study this, Shaw and his team use natural language processing and time series analysis to identify changes in speech patterns before, during, and after major world events (in this case, COVID-19 and its impact on Asian Americans). The team examined comment threads on 4chan, an imageboard site where users can anonymously start discussions on any topic. Much as businesses scan customer reviews for specific keywords and attributes, Shaw wanted to see whether he could train language-processing models to flag slurs, racist comments, and other related hate speech on the site.
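For readers curious about what a before-and-after comparison of flagged speech might look like, here is a minimal Python sketch. It is not the team's actual pipeline: it assumes a hypothetical placeholder word list (`FLAG_TERMS`) in place of trained language models, comments that are already tagged with a posting date, and an illustrative event date. The names `is_flagged`, `daily_flag_rates`, and `before_after` are invented for this example.

```python
from datetime import date
from statistics import mean

# Hypothetical placeholder lexicon. The real project trains NLP models rather
# than relying only on a fixed word list; these are not actual terms.
FLAG_TERMS = {"term_a", "term_b"}


def is_flagged(text: str) -> bool:
    """Return True if any lexicon term appears as a whole word in the comment."""
    tokens = {tok.strip(".,!?\"'").lower() for tok in text.split()}
    return not tokens.isdisjoint(FLAG_TERMS)


def daily_flag_rates(comments):
    """comments: iterable of (date, text). Returns {date: share of comments flagged}."""
    totals, flagged = {}, {}
    for day, text in comments:
        totals[day] = totals.get(day, 0) + 1
        if is_flagged(text):
            flagged[day] = flagged.get(day, 0) + 1
    return {day: flagged.get(day, 0) / n for day, n in totals.items()}


def before_after(rates, event_day):
    """Mean daily flag rate before the event vs. on or after it (None if a window is empty)."""
    before = [r for d, r in rates.items() if d < event_day]
    after = [r for d, r in rates.items() if d >= event_day]
    return (mean(before) if before else None, mean(after) if after else None)


if __name__ == "__main__":
    # Made-up sample comments, for illustration only.
    sample = [
        (date(2020, 1, 10), "nothing here"),
        (date(2020, 1, 10), "term_a appears"),
        (date(2020, 3, 20), "term_a again"),
        (date(2020, 3, 20), "term_b too"),
    ]
    rates = daily_flag_rates(sample)
    print(before_after(rates, date(2020, 3, 11)))  # (0.5, 1.0)
```

Comparing two averages is the simplest possible version of the idea; an actual time series analysis would also need to account for trends, seasonality, and changes in overall posting volume.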

While AI training is a big part of the project, the other important part is bringing the team together to discuss which comments should be considered violent and which are sarcastic, nuances that a computer might not pick up on at first glance. With the right AI training and enough data to feed it, however, the risk of false negatives (hateful posts the model fails to flag) should decrease.

Given the immense number of individual posts to sort through, Shaw says the Research Computing Center (RCC) was the only campus resource that could handle opening 86GB of data archives from an Excel spreadsheet. Further testing showed that the High Performance Computing (HPC) cluster can process 10,000 comments in approximately two seconds, which significantly sped up the team's progress.

Shaw emphasizes his interest in bridging the gap between established criminological studies and modern methodological approaches so that future students can learn from the work. “When we as a nation have the resources and opportunities to move us further as a culture, we should take that opportunity.”