Abstract:
Speech signals are more complex than other communication media such as text or images. Different kinds of noise (e.g., additive noise, channel noise, babble noise) interfere with speech signals and drastically degrade speech quality. Speech enhancement is a daunting task because a denoising method must cope with several kinds of noise at once. Analog noise-elimination models have been studied over the years for this purpose, and researchers have also explored machine learning techniques (e.g., artificial neural networks) to enhance speech signals. In this study, a speech enhancement system based on a Convolutional Denoising Autoencoder (CDAE) is investigated. A convolutional neural network (CNN) is a kind of deep neural network suited to 2D structured input (e.g., images), and a CDAE is a CNN-based Denoising Autoencoder. The CDAE takes advantage of the 2D structure of the features extracted from speech signals and also considers the local temporal relationships among those features. In the proposed system, the CDAE is trained with features from the noisy speech signal as input and clean speech features as the desired output. The proposed CDAE-based method was tested on a benchmark dataset, the Speech Commands dataset, and attained 80% similarity between the denoised speech and the actual clean speech. The proposed system achieved a perceptual evaluation of speech quality (PESQ) score of 2.43, outperforming other related existing methods.
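The training setup described in the abstract (noisy features as input, clean features as the desired output) can be illustrated with a minimal sketch of how one such pair might be built. This is not the thesis's pipeline: the sine-wave stand-in for speech, the Gaussian additive noise, the 5 dB SNR, the 16 kHz sample rate, the frame sizes, and the helper names `power`, `mix_at_snr`, and `frame` are all assumptions made for illustration, and raw framed samples stand in for whatever features the thesis actually extracts.

```python
import math
import random

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + scaled noise has the requested SNR (dB)."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]

def frame(signal, frame_len, hop):
    """Cut a 1D signal into overlapping frames: a 2D (frames x samples) grid."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

rng = random.Random(0)   # fixed seed so the sketch is reproducible
sample_rate = 16000      # assumed sample rate, one second of audio

# Stand-in "clean speech": a 440 Hz tone (assumption, not real speech).
clean = [math.sin(2 * math.pi * 440 * t / sample_rate)
         for t in range(sample_rate)]
# Additive Gaussian noise, mixed in at an assumed 5 dB SNR.
noise = [rng.gauss(0.0, 1.0) for _ in clean]
noisy = mix_at_snr(clean, noise, snr_db=5.0)

# One training pair: 2D noisy grid as input, 2D clean grid as target.
model_input = frame(noisy, frame_len=400, hop=160)
target = frame(clean, frame_len=400, hop=160)

# Sanity check: recover the SNR actually present in the mixture.
residual = [n - c for n, c in zip(noisy, clean)]
achieved_snr_db = 10 * math.log10(power(clean) / power(residual))
print(len(model_input), len(model_input[0]), round(achieved_snr_db, 2))
```

A denoising autoencoder would then be fitted to map `model_input` to `target`; the 2D layout of the frames is what lets convolutional layers exploit local temporal structure.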
Description:
This thesis is submitted to the Department of Computer Science and Engineering, Khulna University of Engineering & Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, May 2019.
Cataloged from PDF Version of Thesis.
Includes bibliographical references (pages 31-35).