Distilling knowledge through student-teacher model and BERT for sentiment analysis

dc.contributor.author: Dong, Ximing
dc.contributor.examiningcommittee: Shaowei Wang
dc.contributor.examiningcommittee: Saumen Mandal
dc.contributor.supervisor: Thulasiraman, Parimala
dc.date.accessioned: 2022-12-05T14:52:14Z
dc.date.available: 2022-12-05T14:52:14Z
dc.date.copyright: 2022-12-04
dc.date.issued: 2022-12-04
dc.date.submitted: 2022-12-05T01:42:20Z
dc.degree.discipline: Computer Science
dc.degree.level: Master of Science (M.Sc.)
dc.description.abstract: Bi-directional Encoder Representations from Transformers (BERT) is the state-of-the-art deep learning model for pre-training natural language processing (NLP) tasks such as sentiment analysis. The BERT model dynamically generates word representations according to context and semantics using its bi-directional encoding and attention mechanism. Although the model improves precision on NLP tasks, it is compute-intensive and time-consuming to deploy on mobile or smaller platforms. In this thesis, to address this issue, we use knowledge distillation (KD), a "teacher-student" training technique, to compress the model. We use the BERT model as the "teacher" model to transfer knowledge to student models: "first-generation" convolutional neural networks and long short-term memory with an attention mechanism (LSTM-atten). We conduct various experiments on sentiment analysis benchmark data sets and show that the "student models" trained through knowledge distillation achieve up to a 70% improvement in accuracy, precision, recall, and F1-score compared to the same models trained without KD. We also investigate the convergence rate of the student models and compare the results to existing models in the literature. Finally, we show that compared to the full-size BERT model, our RNN-series models are 50 times smaller in size and retain approximately 96% of its performance on benchmark data sets.
dc.description.note: February 2023
dc.identifier.uri: http://hdl.handle.net/1993/36990
dc.language.iso: eng
dc.rights: open access
dc.subject: Natural Language Processing
dc.title: Distilling knowledge through student-teacher model and BERT for sentiment analysis
dc.type: master thesis
local.subject.manitoba: no
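The abstract describes knowledge distillation as a "teacher-student" training technique in which the student learns from the teacher's output distribution as well as the ground-truth labels. A minimal sketch of the standard distillation loss (temperature-softened cross-entropy against the teacher plus cross-entropy against the hard label, following the common KD formulation) is shown below; this is an illustration, not the thesis's implementation, and the function names and the temperature and weighting defaults are assumptions for the example.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T produces a softer distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, hard_label, T=2.0, alpha=0.5):
    """Weighted sum of a soft loss (cross-entropy between the student's and
    teacher's temperature-softened distributions) and a hard loss
    (cross-entropy against the ground-truth label)."""
    p_teacher = softmax(teacher_logits, T)
    p_student_soft = softmax(student_logits, T)
    soft_loss = -np.sum(p_teacher * np.log(p_student_soft))
    hard_loss = -np.log(softmax(student_logits)[hard_label])
    # The T**2 factor rescales soft-loss gradients so the two terms stay
    # comparable as the temperature changes.
    return alpha * (T ** 2) * soft_loss + (1 - alpha) * hard_loss
```

For example, a student whose logits agree with the teacher and the true label incurs a lower loss than one that contradicts both, so minimizing this objective pulls the small model toward the teacher's behavior.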
Files
Original bundle: XimingDongThesis.pdf, 1.63 MB, Adobe Portable Document Format. Description: Thesis
License bundle: license.txt, 2.2 KB. Item-specific license agreed to upon submission