Distilling knowledge through student-teacher model and BERT for sentiment analysis

Date
2022-12-04
Authors
Dong, Ximing
Abstract

Bidirectional Encoder Representations from Transformers (BERT) is a state-of-the-art pre-trained deep learning model for natural language processing (NLP) tasks such as sentiment analysis. The BERT model dynamically generates word representations according to context and semantics using its bidirectional architecture and attention mechanism. Although the model improves precision on NLP tasks, it is compute-intensive and time-consuming to deploy on mobile or smaller platforms. In this thesis, to address this issue, we use knowledge distillation (KD), a "teacher-student" training technique, to compress the model. We use the BERT model as the "teacher" to transfer knowledge to student models: "first-generation" convolutional neural networks and long short-term memory networks with an attention mechanism (LSTM-atten). We conduct various experiments on sentiment analysis benchmark data sets and show that the student models trained through knowledge distillation achieve a 70% improvement in accuracy, precision, recall, and F1-score compared to the same models trained without KD. We also investigate the convergence rate of the student models and compare the results to existing models in the literature. Finally, we show that compared to the full-size BERT model, our RNN-series models are 50 times smaller in size and retain approximately 96% of its performance on benchmark data sets.
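The "teacher-student" distillation objective mentioned above is typically a weighted combination of a soft-target term (matching the teacher's temperature-softened output distribution) and a hard-target term (ordinary cross-entropy with the gold label). The following is a minimal NumPy sketch of that combined loss; the function names, the temperature `T=2.0`, and the weighting `alpha=0.5` are illustrative assumptions, not the exact settings used in the thesis.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; a higher T produces a softer
    # distribution that exposes more of the teacher's "dark knowledge".
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, true_label,
                      T=2.0, alpha=0.5):
    # Soft-target term: cross-entropy between the teacher's and the
    # student's temperature-T distributions, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    soft = -np.sum(p_teacher * np.log(p_student + 1e-12)) * T * T
    # Hard-target term: standard cross-entropy with the gold label.
    p_hard = softmax(student_logits)
    hard = -np.log(p_hard[true_label] + 1e-12)
    return alpha * soft + (1.0 - alpha) * hard
```

A student whose logits track the teacher's (and the gold label) incurs a lower loss than one that disagrees, which is what drives the knowledge transfer during training.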

Keywords
Natural Language Processing