Finetuning XLM-Roberta Pretrained Models For Question Answering In Hindi

Authors

  • Nirja D Shah
  • Jyoti Pareek

DOI:

https://doi.org/10.22399/ijcesen.2838

Keywords:

Hindi NLP, Reading Comprehension, RoBERTa, Question Answering, NLP Finetuning

Abstract

This paper explores the development of a Hindi question-answering (QA) system based on XLM-RoBERTa, fine-tuned on the Hindi subset of the chaii dataset. It aims to narrow the performance gap between low-resource languages such as Hindi and high-resource counterparts such as English in the QA domain. We validate the model through systematic experimentation over a range of hyperparameters. The results show that smaller learning rates, in particular 0.00002 (2e-5), substantially improve performance even at a batch size of 8, yielding an average BERTScore of 88.11. Higher learning rates, by contrast, consistently degraded model performance. Batch size also affected performance, but far less than learning rate: smaller batch sizes did not significantly hurt results when the learning rate was low. Additional metrics confirm the advantage of small learning rates, with 21.07% of cases achieving a BLEU score above 80 and 37.72% of cases achieving a ROUGE-1 F1 above 80. These findings underscore the importance of careful fine-tuning for QA in low-resource languages. The paper contributes to understanding how QA systems can be optimized for Hindi and provides a benchmark for future research in this area.
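The abstract describes the fine-tuning setup only at a high level. The sketch below illustrates how an XLM-RoBERTa checkpoint could be fine-tuned for extractive QA on the Hindi subset of chaii with the best-performing hyperparameters reported above (learning rate 2e-5, batch size 8), using the Hugging Face Transformers API. The checkpoint name, preprocessing parameters, epoch count, and dataset wiring are assumptions for illustration, not details taken from the paper.

```python
# Minimal fine-tuning sketch (assumed setup, not the authors' exact pipeline).
from transformers import (
    AutoTokenizer,
    AutoModelForQuestionAnswering,
    TrainingArguments,
    Trainer,
)

MODEL_NAME = "xlm-roberta-base"  # assumed checkpoint; the paper fine-tunes an XLM-RoBERTa model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForQuestionAnswering.from_pretrained(MODEL_NAME)

def preprocess(example):
    # Tokenize question/context pairs with a sliding window so that long Hindi
    # contexts are split into overlapping chunks (standard extractive-QA setup);
    # mapping gold answers to start/end token positions is omitted here.
    return tokenizer(
        example["question"],
        example["context"],
        truncation="only_second",
        max_length=384,
        stride=128,
        padding="max_length",
    )

# Best hyperparameters reported in the abstract: learning rate 2e-5, batch size 8.
training_args = TrainingArguments(
    output_dir="xlmr-hindi-qa",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,  # epoch count is an assumption; it is not stated in the abstract
    weight_decay=0.01,
)

# train_dataset / eval_dataset would be the tokenized Hindi subset of chaii.
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```

The reported metrics could then be computed over the predicted answer strings. The snippet below shows one plausible scoring recipe with the bert_score and sacrebleu packages (ROUGE-1 F1 can be computed analogously with the rouge_score package, although its default tokenizer keeps only ASCII tokens and would need a whitespace tokenizer for Devanagari text); it is not necessarily the evaluation code used by the authors.

```python
# Hedged evaluation sketch: BERTScore and BLEU over predicted vs. gold answers.
from bert_score import score as bert_score
import sacrebleu

predictions = ["नई दिल्ली"]  # illustrative model answer
references = ["नई दिल्ली"]   # illustrative gold answer

# BERTScore falls back to a multilingual BERT backbone for Hindi.
P, R, F1 = bert_score(predictions, references, lang="hi")
print("BERTScore F1:", 100 * F1.mean().item())

# Corpus-level BLEU (sacrebleu's default tokenization handles Unicode text).
print("BLEU:", sacrebleu.corpus_bleu(predictions, [references]).score)
```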

Published

2025-06-26

How to Cite

Nirja D Shah, & Jyoti Pareek. (2025). Finetuning XLM-Roberta Pretrained Models For Question Answering In Hindi. International Journal of Computational and Experimental Science and Engineering, 11(3). https://doi.org/10.22399/ijcesen.2838

Section

Research Article