Improving Holy Qur'an recitation system using Hybrid Deep Neural Network-Hidden Markov Model approach

Mustafa Abdallah; Mubarak Al-Marri; Sherif Abdou; Hazem Raafat; Mohsen Rashwan; Mohamed El-Gamal

Abstract

Teaching Holy Qur'an recitation rules and Arabic pronunciation to non-native speakers is a challenging task. Automatic Speech Recognition (ASR) based on machine learning techniques has proved very promising for this purpose. In this paper, we report a large number of experiments aimed at significantly improving the accuracy of an ASR system, using a hybrid Deep Neural Network-Hidden Markov Model (DNN-HMM) approach. We compare the recognition performance of the proposed approach with that of a traditional baseline HMM approach. The proposed approach is superior in terms of phone error rate (PER), and the experimental results show a significant improvement in recognition performance. Moreover, the recognition of recitation rules such as Vibration, Assimilation, and Turning is also improved. The proposed approach is tested using an N-gram language model and a lattice network.
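For readers unfamiliar with the metric, PER is conventionally computed as the edit distance (substitutions, deletions, and insertions) between the recognized phone sequence and the reference phone sequence, divided by the reference length. The following is a minimal sketch of that standard definition, not the scoring tool used in the paper; the function names and the phone symbols in the usage note are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of symbols)."""
    # prev[j] holds the distance between the first i-1 reference phones
    # and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # aligning i reference phones to an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """PER = (substitutions + deletions + insertions) / number of reference phones."""
    if not ref:
        raise ValueError("reference phone sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

As a usage example with made-up phone labels, `phone_error_rate(["b", "i", "s", "m"], ["b", "i", "m"])` counts one deletion against four reference phones. Because insertions are counted, PER can exceed 1.0 when the hypothesis is much longer than the reference.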
