Application of polyscale methods for speaker verification

Date
2018-05-22
Authors
Sedigh, Sina
Abstract
Voice is a characteristic of the human body that is unique to an individual and can therefore be used in remote-access applications to verify a person's identity. However, robust feature extraction is required, and the aim of this research is the establishment of security via the speaker's voice. All the experiments in this thesis are based on a dataset recorded in an anechoic chamber, available at the Applied Electromagnetic Laboratory at the University of Manitoba. The dataset consists of utterances recorded by 24 volunteers raised in the Province of Manitoba, Canada. To provide a repeatable set of test words covering all of the phonemes, the Edinburgh Machine Readable Phonetic Alphabet [KiGr08], consisting of 44 words, was used. The utterances were recorded at a sampling frequency of 44.1 kilo-samples per second (kSps). The recording sessions took place between 10 AM and 3 PM, from March 27, 2017, until September 27, 2017. This thesis presents a study of text-independent speaker verification with the aim of experimentally evaluating features and embedding fractal algorithms in the front-end processing of the speaker verification system. A voice activity detection algorithm based on the variance fractal dimension was used to separate the non-speech segments of the signal. A fusion of multiple features, namely the linear prediction cepstral coefficients, Mel-frequency cepstral coefficients, Higuchi fractal dimension, variance fractal dimension, zero-crossing rate, and turns count, was used to form the feature vectors. In addition, an experimental sensitivity analysis was conducted to test the effect of each feature on the classification accuracy of a support vector machine. The features were extracted using multiple voice activity detection algorithms. The best across-the-divide recognition accuracy of 91.60% was obtained by fusing all the features extracted with the voice activity detection algorithm based on the variance fractal dimension. This shows that fusing features and embedding fractal methods in the front-end processing of text-independent speaker verification increases classification accuracy.
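
As a rough illustration of the fractal front end described in the abstract, the sketch below (Python, using only NumPy) estimates the variance fractal dimension of a signal frame and uses it as a simple voice activity gate. The lag set, the frame length of 1024 samples, and the threshold of 1.7 are illustrative assumptions rather than the values used in the thesis; the estimate follows the standard variance-dimension relation D = 2 - H, where the Hurst exponent H is taken from the slope of the log-variance of amplitude increments against the log of the time lag.

import numpy as np

def variance_fractal_dimension(frame, lags=(1, 2, 4, 8, 16)):
    # Estimate H from the slope of log Var[x(t + dt) - x(t)] versus log dt;
    # for a 1-D time series the variance dimension is D = 2 - H.
    log_lag, log_var = [], []
    for dt in lags:
        increments = frame[dt:] - frame[:-dt]
        v = np.var(increments)
        if v > 0.0:
            log_lag.append(np.log(dt))
            log_var.append(np.log(v))
    if len(log_lag) < 2:
        return 2.0  # degenerate (e.g. silent) frame: treat as noise-like
    slope, _ = np.polyfit(log_lag, log_var, 1)  # slope = 2H
    hurst = slope / 2.0
    return 2.0 - hurst

def vfd_voice_activity_detection(signal, frame_len=1024, threshold=1.7):
    # Keep frames whose estimated dimension falls below the (assumed)
    # speech threshold; noise-like frames sit closer to D = 2.
    kept = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if variance_fractal_dimension(frame) < threshold:
            kept.append(frame)
    return np.concatenate(kept) if kept else np.empty(0)

A practical front end would additionally use overlapping frames, smooth the frame-level decisions, and tune the threshold on held-out recordings before passing the retained speech to feature extraction and the support vector machine.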
Keywords
Speaker verification, Polyscale methods, Multifractal methods, Voice activity detection