Audio Fingerprinting to Achieve Greater Accuracy and Maximum Speed with Multi-Model CNN-RNN-LSTM in Speaker Identification

Authors

  • Rajani Kumari Inapagolla, Research Scholar, Department of Electronics, GITAM University, Vizag, India
  • K. Kalyan Babu, Assistant Professor, Department of Electronics and Communication Engineering, GITAM University, Vizag, India

DOI:

https://doi.org/10.22399/ijcesen.1138

Keywords:

RAVDESS, CNN, RNN, LSTM, Audio Fingerprinting, Speaker Identification

Abstract

Speaker identification is the process of matching speech data against database records. The main objective of this paper is to measure the accuracy and speed of comparing a training set drawn from the RAVDESS database with a test signal, using Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models with Long Short-Term Memory (LSTM), combined with an audio fingerprinting technique. Speech is the most fundamental form of human communication, and language is the primary means of exchange among humans. Pitch and tone changes, an essential component of social interaction, are grouped together while accounting for a wide range of issues. The audio fingerprint of the voice is produced after the background noise has been eliminated. The RAVDESS dataset is processed with a multilayer perceptron, audio fingerprinting, and CNN and RNN with LSTM, and the results are contrasted in terms of speed and accuracy. The system ultimately reports gender determination in relation to words per second, and accuracy is observed as a function of the number of epochs. The results show that every classifier performs faster and with higher accuracy on the dataset.
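To make the idea of an audio fingerprint concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' method, which the abstract does not specify. It assumes a simple spectral-peak scheme: each short frame is reduced to its dominant frequency bin, and the fingerprint is the set of consecutive peak pairs. The function names, frame sizes, synthetic tones, and the Jaccard similarity measure are all illustrative assumptions, and the noise-removal step and the neural classifiers are omitted.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def fingerprint(signal, frame_len=256, hop=128):
    """Toy fingerprint (assumed scheme): pairs of consecutive peak bins."""
    peaks = spectrogram(signal, frame_len, hop).argmax(axis=1)
    return set(zip(peaks[:-1].tolist(), peaks[1:].tolist()))

def similarity(fp_a, fp_b):
    """Jaccard similarity between two fingerprint sets."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Synthetic stand-ins for two "speakers": tones that switch frequency halfway.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
half = len(t) // 2
voice_a = np.concatenate([np.sin(2 * np.pi * 200 * t[:half]),
                          np.sin(2 * np.pi * 400 * t[half:])])
voice_b = np.concatenate([np.sin(2 * np.pi * 900 * t[:half]),
                          np.sin(2 * np.pi * 1300 * t[half:])])

# A lightly corrupted copy of speaker A, standing in for a test recording.
rng = np.random.default_rng(0)
noisy_a = voice_a + 0.01 * rng.standard_normal(len(voice_a))

fp_a, fp_b, fp_noisy = fingerprint(voice_a), fingerprint(voice_b), fingerprint(noisy_a)
same_speaker_score = similarity(fp_a, fp_noisy)
other_speaker_score = similarity(fp_a, fp_b)
print(same_speaker_score, other_speaker_score)
```

In this toy setting the noisy copy of the first signal scores higher against its own fingerprint than the second signal does, which is the matching behaviour a fingerprint database lookup relies on. Real speech would need a richer fingerprint, and the paper pairs fingerprinting with CNN and RNN-LSTM classifiers rather than a set-overlap score.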

Published

2025-02-20

How to Cite

Rajani Kumari Inapagolla, & K. Kalyan Babu. (2025). Audio Fingerprinting to Achieve Greater Accuracy and Maximum Speed with Multi-Model CNN-RNN-LSTM in Speaker Identification. International Journal of Computational and Experimental Science and Engineering, 11(1). https://doi.org/10.22399/ijcesen.1138

Section

Research Article