Dynamic Task Weighting Mechanism for a Task-Aware Approach to Mitigating Catastrophic Forgetting
DOI: https://doi.org/10.22399/ijcesen.985

Keywords: Catastrophic Forgetting, Continual Learning, Task similarity, Neural networks, Text-based datasets

Abstract
Catastrophic forgetting remains a major challenge in sequential learning, particularly for Natural Language Processing (NLP) models, which tend to lose knowledge encoded in previous tasks when learning new ones. To address this, we present a Dynamic Task Weighting Mechanism that forms part of the Adaptive Knowledge Consolidation (AKC) framework. In contrast to static regularization approaches such as Elastic Weight Consolidation (EWC) and Synaptic Intelligence (SI), our method dynamically adjusts knowledge retention according to task similarity and task-specific performance. The proposed mechanism computes task embeddings with pre-trained models such as BERT and quantifies their relatedness using cosine similarity. The resulting similarity score is merged with normalized task-specific performance metrics (accuracy and F1 score) to form an importance score. By prioritizing important tasks and minimizing interference from unrelated ones, the model balances adaptability to new learning against retention of previously learned knowledge. Extensive experiments on standard NLP benchmarks such as GLUE, AG News, and SQuAD show that the proposed mechanism substantially mitigates forgetting and improves accuracy. Compared with baseline methods (EWC, SI, and GEM), the model achieves the highest average accuracy (86.7%) and the lowest forgetting (6.2%).
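The importance-score computation described above can be sketched as follows. This is a minimal illustration, not the paper's exact formula: the task embeddings would in practice come from a pre-trained encoder such as BERT (here plain vectors stand in), and the equal-weight blend `alpha` and the simple average of accuracy and F1 are assumptions, since the abstract does not specify how the similarity and performance terms are combined.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two task-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def importance_score(new_task_emb, old_task_emb, accuracy, f1, alpha=0.5):
    """Blend task similarity with normalized performance into an importance
    score. alpha=0.5 and the equal averaging of accuracy and F1 are
    illustrative assumptions."""
    sim = cosine_similarity(new_task_emb, old_task_emb)
    perf = 0.5 * (accuracy + f1)  # both metrics assumed already in [0, 1]
    return alpha * sim + (1.0 - alpha) * perf
```

A high score (similar task, strong performance on the old task) would signal that retention should be prioritized; a low score would allow more plasticity for the new task.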
References
J. Kirkpatrick et al., "Overcoming Catastrophic Forgetting in Neural Networks," Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017. https://doi.org/10.1073/pnas.1611835114
F. Zenke, B. Poole, and S. Ganguli, "Continual Learning through Synaptic Intelligence," in Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 2017, pp. 3987–3995. https://doi.org/10.48550/arXiv.1703.04200
H. Shin et al., "Continual Learning with Deep Generative Replay," in Advances in Neural Information Processing Systems (NeurIPS), vol. 30, 2017, pp. 2994–3003. https://doi.org/10.48550/arXiv.1705.08690
D. Rolnick et al., "Experience Replay for Continual Learning," in Advances in Neural Information Processing Systems (NeurIPS), vol. 32, 2019, pp. 350–360. https://doi.org/10.48550/arXiv.1811.11682
He et al., "Robust Multi-Task Learning with Excess Risks," arXiv preprint, 2024. https://doi.org/10.48550/arXiv.2402.02009
S. J. Pan and Q. Yang, "A Survey on Transfer Learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, Oct. 2010. https://doi.org/10.1109/TKDE.2009.191
F. Zhuang et al., "A Comprehensive Survey on Transfer Learning," Proceedings of the IEEE, vol. 109, no. 1, pp. 43–76, Jan. 2020. https://doi.org/10.1109/JPROC.2020.3004555
Sun et al., "Revisiting Unsupervised Domain Adaptation for Robust NLP Models," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 7427–7442. https://doi.org/10.18653/v1/2020.emnlp-main.497
Kirchdorfer et al., "Analytical Uncertainty-Based Loss Weighting in Multi-Task Learning," arXiv preprint, 2024. https://doi.org/10.48550/arXiv.2408.07985
Tang et al., "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts," arXiv preprint, 2024. https://doi.org/10.48550/arXiv.2402.00433
Zhou, O. Wu, and M. Li, "Investigating the Sample Weighting Mechanism Using an Interpretable Weighting Framework," IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 9, pp. 2041–2055, Sept. 2024. https://doi.org/10.1109/TKDE.2023.3316168
G. I. Parisi et al., "Continual Lifelong Learning with Neural Networks: A Review," Neural Networks, vol. 113, pp. 54–71, May 2019. https://doi.org/10.1016/j.neunet.2019.01.012
J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Minneapolis, MN, USA, 2019, pp. 4171–4186. https://doi.org/10.48550/arXiv.1810.04805
Ramasesh et al., "Effect of Model and Pretraining Scale on Catastrophic Forgetting in Neural Networks," arXiv preprint, 2022.
Nagalapuram, J., & S. Samundeeswari. (2024). Genetic-Based Neural Network for Enhanced Soil Texture Analysis: Integrating Soil Sensor Data for Optimized Agricultural Management. International Journal of Computational and Experimental Science and Engineering, 10(4). https://doi.org/10.22399/ijcesen.572
Bandla Raghuramaiah, & Suresh Chittineni. (2025). BCDNet: An Enhanced Convolutional Neural Network in Breast Cancer Detection Using Mammogram Images. International Journal of Computational and Experimental Science and Engineering, 11(1). https://doi.org/10.22399/ijcesen.811
S. D. Govardhan, Pushpavalli, R., Tatiraju V. Rajani Kanth, & Ponmurugan Panneer Selvam. (2024). Advanced Computational Intelligence Techniques for Real-Time Decision-Making in Autonomous Systems. International Journal of Computational and Experimental Science and Engineering, 10(4). https://doi.org/10.22399/ijcesen.591
R. Logesh Babu, K. Tamilselvan, N. Purandhar, Tatiraju V. Rajani Kanth, R. Prathipa, & Ponmurugan Panneer Selvam. (2025). Adaptive Computational Intelligence Algorithms for Efficient Resource Management in Smart Systems. International Journal of Computational and Experimental Science and Engineering, 11(1). https://doi.org/10.22399/ijcesen.836
Machireddy, C., & Chella, S. (2024). Reconfigurable Acceleration of Neural Networks: A Comprehensive Study of FPGA-based Systems. International Journal of Computational and Experimental Science and Engineering, 10(4). https://doi.org/10.22399/ijcesen.559
P. Padma, & G. Siva Nageswara Rao. (2024). CBDC-Net: Recurrent Bidirectional LSTM Neural Networks Based Cyberbullying Detection with Synonym-Level N-Gram and TSR-SCSO Features. International Journal of Computational and Experimental Science and Engineering, 10(4). https://doi.org/10.22399/ijcesen.623
Robert, N. R., A. Cecil Donald, & K. Suresh. (2025). Artificial Intelligence Technique Based Effective Disaster Recovery Framework to Provide Longer Time Connectivity in Mobile Ad-hoc Networks. International Journal of Computational and Experimental Science and Engineering, 11(1). https://doi.org/10.22399/ijcesen.713
S. Krishnaveni, Devi, R. R., Ramar, S., & S. S. Rajasekar. (2025). Novel Architecture for EEG Emotion Classification Using Neurofuzzy Spike Net. International Journal of Computational and Experimental Science and Engineering, 11(1). https://doi.org/10.22399/ijcesen.829
S. Amuthan, & N. C. Senthil Kumar. (2025). Emerging Trends in Deep Learning for Early Alzheimer's Disease Diagnosis and Classification: A Comprehensive Review. International Journal of Computational and Experimental Science and Engineering, 11(1). https://doi.org/10.22399/ijcesen.739
Pathapati, S., N. J. Nalini, & Mahesh Gadiraju. (2024). Comparative Evaluation of EEG Signals for Mild Cognitive Impairment Using Scalograms and Spectrograms with Deep Learning Models. International Journal of Computational and Experimental Science and Engineering, 10(4). https://doi.org/10.22399/ijcesen.534
Copyright (c) 2024 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.