A Hybrid Deep Learning Approach for Efficient Cross-Language Detection

Authors

  • Ponugoti Kalpana, Assistant Professor, Department of Computer Science and Engineering, AVN Institute of Engineering and Technology, Hyderabad, Telangana, 501510, India. https://orcid.org/0000-0002-4014-8566
  • Shaik Abdul Nabi
  • Panjagari Kavitha
  • K. Naresh
  • Maddala Vijayalakshmi
  • P. Vinayasree

DOI:

https://doi.org/10.22399/ijcesen.808

Keywords:

Cross-language detection, Hybrid deep learning, LSTM, Perplexity, Multilingual text processing, Language identification

Abstract

Cross-language detection is a challenging task that involves identifying the language of a given text across multiple languages, often in noisy or mixed-language environments. It also involves identifying and classifying text across different languages for applications such as multilingual sentiment analysis, language translation, and cross-border content moderation. Traditional approaches often rely on rule-based systems or monolingual models, which lack scalability and adaptability to diverse linguistic structures. In this study, we propose a hybrid deep learning model combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks to improve the accuracy and robustness of language detection. LSTM and GRU networks are known for capturing long-term dependencies and mitigating the vanishing gradient problem, and they are integrated here to leverage their complementary strengths. The model is evaluated using two metrics. The first is the BLEU score, a widely used n-gram overlap metric for judging generated text against reference text. The second is perplexity, which measures how well the model predicts a sequence of words (lower is better). Our experimental results show that the hybrid model outperforms traditional approaches, achieving high BLEU scores and low perplexity across diverse multilingual datasets. The approach improves language detection accuracy while reducing computational complexity. This makes it suitable for real-time multilingual text processing and enables efficient cross-language detection in real-world multilingual environments.
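The two evaluation metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' evaluation code. For perplexity, it assumes the model exposes the probability it assigned to each actual next token. For BLEU, it uses standard unsmoothed sentence-level BLEU-4 with a single reference and a brevity penalty. This is an assumed variant, because the abstract does not state the n-gram order, smoothing, or number of references. The function names `perplexity`, `ngram_counts`, and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """PP = exp(-(1/N) * sum_i log p(w_i | w_<i)).

    Assumption: token_probs holds the probability the model assigned to
    each actual next token of the evaluated sequence.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    # Average negative log-likelihood per token, then exponentiate.
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against one reference (assumed variant)."""
    if not candidate or not reference:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # any zero n-gram precision makes unsmoothed BLEU zero
        log_precision_sum += math.log(clipped / total)
    c, r = len(candidate), len(reference)
    # Penalize candidates that are not longer than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(log_precision_sum / max_n)
```

Consider a model that assigns probability 0.25 to every observed token. It has a perplexity of 4, up to floating-point rounding. A candidate identical to its reference scores a BLEU of 1.0. Unsmoothed BLEU drops to zero whenever any n-gram order has no match, so short texts are usually scored with a smoothed variant or at corpus level.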

Published

2024-12-30

How to Cite

Ponugoti Kalpana, Shaik Abdul Nabi, Panjagari Kavitha, K. Naresh, Maddala Vijayalakshmi, & P. Vinayasree. (2024). A Hybrid Deep Learning Approach for Efficient Cross-Language Detection. International Journal of Computational and Experimental Science and Engineering, 10(4). https://doi.org/10.22399/ijcesen.808

Section

Research Article