Reinforcement Learning for Grapheme-to-Phoneme Conversion in Kannada Speech Synthesis

Authors

  • Ganga Gudi
  • Mallamma V Reddy
  • Hanumanthappa M

DOI:

https://doi.org/10.22399/ijcesen.4684

Keywords:

Grapheme-to-Phoneme Mapping, Reinforcement Learning, Kannada Speech Processing, Text-to-Speech, Low-Resource Languages, Phonological Modeling

Abstract

Grapheme-to-phoneme (G2P) mapping plays a vital role in the development of text-to-speech systems, particularly for languages such as Kannada that combine complex morphology with limited computational resources. Existing G2P techniques based on handcrafted rules or supervised machine learning depend heavily on linguistic knowledge or large volumes of labeled data, which makes them difficult to scale to low-resource languages. To address these challenges, this work introduces a reinforcement learning–driven approach for Kannada grapheme-to-phoneme conversion. The task is modeled as a stepwise decision process in which an agent incrementally predicts the phoneme sequence for written text, learning a policy guided by a reward function that reflects pronunciation correctness and phonological coherence. Because it learns through interaction rather than direct supervision, the proposed framework adapts effectively to novel word forms and pronunciation variations. Experimental evaluation on a Kannada text dataset shows that the reinforcement learning model produces more accurate phoneme sequences and lower error rates than conventional rule-based and statistical G2P methods. These findings demonstrate the potential of reinforcement learning as a flexible and data-efficient solution for building robust G2P systems in low-resource Indian languages, ultimately enhancing the clarity and naturalness of synthesized Kannada speech.
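The stepwise decision process described in the abstract can be illustrated with a small sketch. This is not the authors' model; it is a minimal tabular policy-gradient (REINFORCE-style) example written under several assumptions: a three-word toy lexicon in which every Kannada letter maps to exactly one output unit, a per-step reward of 1 for a correct unit and 0 otherwise (standing in for the paper's pronunciation and coherence reward, which the abstract does not specify), and a fixed baseline of 0.5. The names LEXICON, PHONES, train, and decode are hypothetical.

```python
import math
import random

# Toy lexicon (illustrative assumption, not the paper's dataset):
# each letter carries its inherent vowel, so one letter -> one output unit.
LEXICON = {
    "ಕಮಲ": ["ka", "ma", "la"],
    "ನಗರ": ["na", "ga", "ra"],
    "ಮರ": ["ma", "ra"],
}
# Action space: every output unit seen in the toy lexicon.
PHONES = sorted({p for ref in LEXICON.values() for p in ref})


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=3000, lr=0.5, baseline=0.5, seed=0):
    """Learn a table of logits theta[grapheme][action] by policy gradient."""
    rng = random.Random(seed)
    graphemes = sorted({g for word in LEXICON for g in word})
    theta = {g: [0.0] * len(PHONES) for g in graphemes}
    words = sorted(LEXICON)
    for _ in range(episodes):
        word = rng.choice(words)  # one episode = one word
        for g, target in zip(word, LEXICON[word]):
            probs = softmax(theta[g])
            # The agent samples an action from its current policy.
            a = rng.choices(range(len(PHONES)), weights=probs)[0]
            # Assumed reward: 1 for the correct unit, 0 otherwise.
            reward = 1.0 if PHONES[a] == target else 0.0
            advantage = reward - baseline
            # Gradient of log softmax: indicator(i == a) - probs[i].
            for i in range(len(PHONES)):
                grad = (1.0 if i == a else 0.0) - probs[i]
                theta[g][i] += lr * advantage * grad
    return theta


def decode(theta, word):
    """Greedy decoding: pick the highest-scoring unit for each grapheme."""
    return [
        PHONES[max(range(len(PHONES)), key=theta[g].__getitem__)]
        for g in word
    ]
```

Calling train() and then decode(theta, word) on a lexicon entry returns the greedy output sequence for that word. A realistic system would condition the policy on context rather than on a single grapheme, and would score whole sequences, for example with an edit-distance-based reward.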

Published

2026-01-07

How to Cite

Gudi, G., Mallamma V Reddy, & Hanumanthappa M. (2026). Reinforcement Learning for Grapheme-to-Phoneme Conversion in Kannada Speech Synthesis. International Journal of Computational and Experimental Science and Engineering, 12(1). https://doi.org/10.22399/ijcesen.4684

Section

Research Article