A Hybrid Deep Learning Approach for Efficient Cross-Language Detection
DOI:
https://doi.org/10.22399/ijcesen.808
Keywords:
Cross-language detection, Hybrid deep learning, LSTM, Perplexity, Multilingual text processing, Language identification
Abstract
Cross-language detection is the task of identifying the language of a given text among multiple candidate languages, often in noisy or mixed-language environments. It also covers identifying and classifying text across different languages for applications such as multilingual sentiment analysis, language translation, and cross-border content moderation. Traditional approaches often rely on rule-based systems or monolingual models, which lack scalability and adaptability to diverse linguistic structures. In this study, we propose a hybrid deep learning model that combines Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks to improve the accuracy and robustness of language detection. LSTM and GRU networks, both known for capturing long-term dependencies and mitigating the vanishing gradient problem, are integrated to leverage their complementary strengths. The model is evaluated using BLEU scores, a widely used metric of linguistic quality, and perplexity, which measures how well the model predicts a sequence of words. Our experimental results show that the hybrid model outperforms traditional approaches, achieving high BLEU scores and low perplexity across diverse multilingual datasets. The approach not only improves language detection accuracy but also reduces computational complexity, making it suitable for real-time multilingual text processing. The proposed model therefore shows promise for efficient cross-language detection in real-world multilingual environments.
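As a rough illustration of the perplexity metric mentioned in the abstract, the sketch below computes perplexity as the exponential of the average negative log-probability that a model assigns to each observed token. The function name `perplexity` and the probability lists are hypothetical and chosen only for illustration; they are not taken from the paper's experiments or implementation.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model
    assigned to each token that actually occurred."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (natural log).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Hypothetical per-token probabilities, for illustration only.
confident_model = [0.5, 0.5, 0.5, 0.5]
uncertain_model = [0.1, 0.1, 0.1, 0.1]

print(perplexity(confident_model))
print(perplexity(uncertain_model))
```

A model that assigns higher probability to the observed tokens yields a lower perplexity, which is why the abstract treats low perplexity as the desirable outcome.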
License
Copyright (c) 2024 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.