Fractal modelling of residual in linear predictive coding of speech

Vera, Epiphany
Linear predictive modelling of speech forms the basis of most low bit-rate speech coders, including Linear Predictive Coder 10 (LPC-10), Code Excited Linear Prediction (CELP), Vector Sum Excited Linear Prediction (VSELP), and Special Mobile Group (GSM) 06.10. The main difference between these codecs lies in how they code the excitation signal. LPC-10 uses a very compact representation of the excitation. It produces bit rates of around 2 to 4 kilobits per second (kbps), but the speech quality is very poor and sounds machine-like. CELP and VSELP use a less compact representation that is computationally more intensive. The resulting bit rates are as low as 4 kbps, with better speech quality. GSM uses an even less compact representation, with a bit rate above 10 kbps, but the reproduction quality is very good. In this thesis, a method for modelling speech excitations using fractal interpolation is developed. The fractal interpolation techniques are based on iterated function systems (IFS). Two IFS models are used: (i) the self-affine model and (ii) the piecewise self-affine model. The first model is found to be inefficient for representing excitations, while the second is found to represent them better. Consequently, a 6 kbps speech coder was implemented using the piecewise self-affine fractal model. The coder has a signal-to-noise ratio of 10.9 dB, and an informal subjective assessment found its perceptual quality to be comparable to that of the 13 kbps GSM coder.
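As an illustration of the self-affine model mentioned above, the following is a minimal sketch of the textbook fractal interpolation construction built on an IFS. It is not the coder developed in the thesis. Each affine map w_i(x, y) = (a_i x + e_i, c_i x + d_i y + f_i) carries the whole curve onto the segment between two neighbouring interpolation points. The function names, the sample points, and the vertical scaling factors d_i are hypothetical values chosen for the example, and each |d_i| is assumed to be below 1 so that the iteration contracts.

```python
def affine_maps(points, d):
    """Coefficients (a, e, c, f, d_i) of each map w_i of a self-affine IFS.

    points: interpolation points (x, y) with strictly increasing x.
    d: one vertical scaling factor per segment, each assumed |d_i| < 1.
    """
    (x0, y0), (xN, yN) = points[0], points[-1]
    span = xN - x0
    maps = []
    for (xp, yp), (xi, yi), di in zip(points[:-1], points[1:], d):
        # Constraints: w_i maps (x0, y0) to (xp, yp) and (xN, yN) to (xi, yi).
        a = (xi - xp) / span
        e = (xN * xp - x0 * xi) / span
        c = (yi - yp - di * (yN - y0)) / span
        f = (xN * yp - x0 * yi - di * (xN * y0 - x0 * yN)) / span
        maps.append((a, e, c, f, di))
    return maps


def fractal_interpolate(points, d, depth):
    """Apply the IFS `depth` times, starting from the interpolation points."""
    maps = affine_maps(points, d)
    curve = list(points)
    for _ in range(depth):
        refined = []
        for a, e, c, f, di in maps:
            image = [(a * x + e, c * x + di * y + f) for x, y in curve]
            # Neighbouring images share an endpoint; keep it only once.
            refined.extend(image[1:] if refined else image)
        curve = refined
    return curve


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    pts = [(0.0, 0.0), (1.0, 0.8), (2.0, -0.3), (3.0, 0.1)]
    scales = [0.3, -0.4, 0.2]
    for x, y in fractal_interpolate(pts, scales, depth=2):
        print(f"{x:.4f} {y:.4f}")
```

In this construction only the interpolation points and the scaling factors need to be stored, which is what makes an IFS description compact. Setting every d_i to zero reduces the curve to ordinary piecewise-linear interpolation.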