Understanding live chat streams and detecting spam messages with a case study on VTubers
Abstract
Live chat for YouTube live streams provides a bridge for communication between YouTubers and their audiences. This feature enables YouTubers to adjust their content in real time according to audience feedback, which helps make their live streams successful. In turn, audiences have the opportunity to participate directly in live streams and exchange ideas with others. However, to the best of my knowledge, no prior study has examined the structure of live chat, analyzed the metadata of its users, or addressed spam detection in this setting. In this thesis, I design a live chat analyzer to reveal the structure of live chat and propose a multi-language spam detection method that tolerates incomplete and inaccurate labels. Specifically, I propose the overlapped sliding-window standard deviation (OSWSD) to visually reveal the structural difference between toxic and healthy live chats, analyze metadata about audiences, and cluster similar chat messages to identify spammers. Evaluation on a real-life virtual YouTuber (i.e., VTuber) dataset and its corresponding crawled user dataset demonstrates the practicality of my solution. For example, my spam detection method achieved 99.25% accuracy, 96.5% precision, 99.25% recall, and an F1-score of 99.25% in identifying spam messages. It also achieved 98.25% accuracy, 98.5% precision, 98.25% recall, and an F1-score of 98.25% in detecting spamming bots.
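The abstract does not define OSWSD in detail, so the following is only a minimal sketch of the general idea of a standard deviation computed over overlapping sliding windows. It assumes the input is a list of per-interval chat message counts and that windows overlap whenever the step is smaller than the window size; the function name `overlapped_sliding_std`, its parameters, and the sample data are all hypothetical and may differ from the thesis.

```python
from statistics import pstdev


def overlapped_sliding_std(values, window, step):
    """Return the population standard deviation of each sliding window.

    Assumption: `values` holds message counts per fixed time interval.
    Windows overlap when `step` is smaller than `window`.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        pstdev(values[start:start + window])
        for start in range(0, len(values) - window + 1, step)
    ]


# Hypothetical message counts per interval, with one burst of activity.
counts = [1, 1, 1, 1, 9, 1, 1, 1]
result = overlapped_sliding_std(counts, window=4, step=2)
print(result)
```

In this sketch, windows that contain the burst yield a larger standard deviation than the quiet window before it, which is the kind of contrast a plot of such values could make visible.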