ABSTRACT

Voice activity detection (VAD), which identifies speech and non-speech segments in an audio signal, is needed by many speech applications and is challenging in noisy environments. In this paper, we propose a Gated Recurrent Unit (GRU) based VAD for low signal-to-noise ratio (SNR) environments. It uses Mel-frequency cepstral coefficients (MFCCs) augmented with delta and delta-delta features to overcome the shortcomings of traditional VAD models. We compare the proposed method with traditional methods using speech signals corrupted by 10 types of noise at low SNRs. Experimental results show that the proposed GRU-based method outperforms the traditional methods in every noise condition considered, which indicates that a GRU-based network improves speech detection performance.

Keywords: voice activity detection, deep neural network, recurrent neural network, gated recurrent unit
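The abstract augments MFCCs with delta and delta-delta features but does not state how they are computed. The sketch below is one common approach, offered under stated assumptions. It uses the regression-based delta formula with a window of N = 2 frames and repeats the first and last frames at the edges. The delta-delta features are deltas of the deltas. The MFCC frames are assumed to be already computed, and the GRU itself is not shown. The function names `deltas` and `augment` are hypothetical, and the window size and padding may differ from what the authors used.

```python
from typing import List

Frame = List[float]  # one frame of cepstral coefficients


def deltas(frames: List[Frame], N: int = 2) -> List[Frame]:
    """Regression-based delta coefficients (assumed formula, window +/-N):

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)

    Indices outside [0, T-1] are clamped, i.e. edge frames are repeated.
    """
    T = len(frames)
    if T == 0:
        return []
    denom = 2 * sum(n * n for n in range(1, N + 1))
    dim = len(frames[0])
    out: List[Frame] = []
    for t in range(T):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][k]  # clamp at the last frame
                prv = frames[max(t - n, 0)][k]      # clamp at the first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def augment(mfcc: List[Frame], N: int = 2) -> List[Frame]:
    """Concatenate static, delta, and delta-delta features for each frame."""
    d1 = deltas(mfcc, N)   # first-order (delta) features
    d2 = deltas(d1, N)     # second-order (delta-delta) features
    return [s + a + b for s, a, b in zip(mfcc, d1, d2)]


if __name__ == "__main__":
    # Toy input: 7 frames of 2 coefficients that rise linearly over time.
    toy = [[float(i), 2.0 * i] for i in range(7)]
    for row in augment(toy):
        print([round(v, 3) for v in row])
```

With D static coefficients per frame, each augmented frame has 3D values. In this sketch, a sequence of such frames would form the per-time-step input to a recurrent model such as a GRU. On a linear ramp, the interior deltas equal the slope and the interior delta-deltas are zero. The clamped edges bias the first and last few frames, which is a known side effect of edge padding.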