Real time speech emotion recognition for Bio-medical monitoring using hybrid deep learning architecture
DOI:
https://doi.org/10.22399/ijcesen.5205
Keywords:
Speech Emotion Recognition, biomedical monitoring, deep learning, CNN, LSTM
Abstract
Speech carries emotional information that is significant for the biomedical monitoring and treatment of patients. Real-time speech emotion recognition (SER) is now a vital medical tool, allowing simultaneous evaluation of a patient's psychological state. This research proposes a deep learning (DL) model that integrates convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) to improve the accuracy of emotion recognition by extracting both spectral and temporal features from speech signals. Most conventional SER models suffer from low accuracy, poor robustness to noise, and weak temporal modeling, which limits their usability in biomedical settings. The proposed CNN-LSTM model addresses these challenges by combining the feature extraction ability of the CNN with the sequence modeling ability of the LSTM to provide robust, real-time classification of emotional states. The model was trained and tested on the RAVDESS dataset, with data augmentation and preprocessing applied to improve generalization. Experimental results show that the model achieves 92.8% accuracy, 91.6% precision, 92.1% recall, and an F1-score of 91.8%, outperforming existing solutions. These results indicate that the model is well suited to deployment in healthcare platforms for real-time biomedical monitoring, stress detection, and adaptive interventions.
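To make the four reported metrics concrete, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a multi-class emotion classifier. It assumes macro-averaging over classes, which the abstract does not specify; the function name `macro_metrics` and the toy label lists are hypothetical and are not taken from the paper or its data.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: every class is weighted equally (macro-averaging).
    The paper's actual averaging scheme is not stated in the abstract.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()

    # Count per-class true positives, false positives, and false negatives.
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels, for illustration only (not results from the paper).
y_true = ["calm", "calm", "happy", "happy", "sad", "sad"]
y_pred = ["calm", "happy", "happy", "happy", "sad", "calm"]
scores = macro_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

Under macro-averaging, the F1-score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.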
License
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.