Arabic text classification using graphs and deep learning
DOI: https://doi.org/10.22399/ijcesen.4402

Keywords: Arabic NLP, Text classification, Graph Neural Networks, AraBERT, Deep learning

Abstract
This paper proposes a novel approach to Arabic text classification that integrates Graph Convolutional Networks (GCNs) with AraBERT embeddings. Unlike traditional sequence-based methods, our framework constructs document-level graphs in which words are represented as nodes and edges encode semantic and co-occurrence relations. AraBERT provides rich contextual embeddings for each node, enabling the GCN to capture both local and global dependencies. Experiments on the SANAD–Khaleej dataset (45,500 news articles across seven balanced categories) show that our model achieves 97.25% accuracy, 97.26% macro-F1, and 97.27% recall, significantly outperforming baseline models such as CNNs (95.89% accuracy) and LSTMs (95.23% accuracy). The results confirm the effectiveness of combining graph-based architectures with pre-trained language models for morphologically rich languages such as Arabic, and demonstrate the approach's scalability for large-scale text processing.
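The pipeline the abstract describes — word nodes, co-occurrence edges, contextual node features, GCN propagation — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the sliding-window edge construction, the window size, and the random features standing in for AraBERT embeddings are all assumptions; only the GCN propagation rule (symmetrically normalized adjacency with self-loops) is standard.

```python
import numpy as np

def build_cooccurrence_graph(tokens, window=2):
    """Build a symmetric word co-occurrence adjacency matrix.

    Nodes are the unique tokens; an edge links two words appearing
    within `window` positions of each other. This is one common
    construction; the paper's exact edge weighting is an assumption.
    """
    vocab = sorted(set(tokens))
    idx = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            u, v = idx[w], idx[tokens[j]]
            if u != v:
                A[u, v] += 1.0
                A[v, u] += 1.0
    return vocab, A

def gcn_layer(A, X, W):
    """One GCN step (Kipf & Welling): H = ReLU(D^-1/2 (A+I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)       # ReLU activation

# Toy document: the token list stands in for a tokenized news article,
# and X stands in for per-node AraBERT embeddings (random, 4-dim here).
tokens = ["sport", "match", "goal", "match", "team", "goal"]
vocab, A = build_cooccurrence_graph(tokens, window=2)
rng = np.random.default_rng(0)
X = rng.normal(size=(len(vocab), 4))
W = rng.normal(size=(4, 3))
H = gcn_layer(A, X, W)       # one row of hidden features per word node
```

In the full model, `H` would be pooled over nodes into a document vector and passed to a softmax classifier over the seven SANAD–Khaleej categories.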
License
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.