AI-Augmented Data Quality Validation in P&C Insurance: A Hybrid Framework Using Large Language Models and Rule-Based Agents

Authors

  • Shreekant Malviya, Research Scholar
  • Vrushali Parate

DOI:

https://doi.org/10.22399/ijcesen.3613

Keywords:

Data Quality, Large Language Models (LLMs), Agentic AI framework, Property & Casualty Insurance

Abstract

The growing complexity and volume of data in Property & Casualty (P&C) insurance have intensified the need for robust, scalable, and interpretable data quality validation methods. Conventional rule-based validation systems are transparent, but they struggle to adapt to evolving data and regulatory requirements. This paper addresses these challenges with a hybrid methodology built on Agentic AI, combining the precision of deterministic rule logic with the inferential capability of large language models (LLMs). The architecture consists of four modular agents (ProfilerAgent, LLMRuleAgent, RuleAgent, and SummaryAgent), each assigned a distinct role in the data quality pipeline, which improves transparency, reusability, and scalability. Using a LLaMA model hosted locally through Ollama, the system generates schema-aware YAML rules, validates structured datasets, and produces natural-language summaries of data quality issues. In an experimental evaluation on a real-world auto insurance claims dataset from Kaggle, the framework identified schema mismatches, format problems, and semantic discrepancies without requiring manually authored rules. The results indicate that the agentic architecture increases flexibility in resource-limited, compliance-focused settings. The work contributes to the ongoing debate on automated data governance in insurance by combining explainable AI with operational reliability, offering a practical option for businesses that must balance regulatory requirements against digital transformation initiatives. Although evaluated in the P&C context, the modular design allows straightforward adaptation to other domains, such as retail and healthcare, where similar data quality challenges exist.
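The four-agent pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the rule dictionary layout, the sample records, and the column names are all hypothetical, and the LLM step is replaced by a stub that returns hard-coded rules where the described system would prompt a locally hosted model and parse its YAML output.

```python
import re
from collections import Counter

# Hypothetical sample records; column names and values are illustrative only.
RECORDS = [
    {"policy_number": "P-1001", "incident_date": "2015-01-25", "total_claim_amount": "71610"},
    {"policy_number": "P-1002", "incident_date": "25/01/2015", "total_claim_amount": "5070"},
    {"policy_number": "", "incident_date": "2015-02-22", "total_claim_amount": "-340"},
]


def profile(records):
    """ProfilerAgent stand-in: count non-empty values per column."""
    counts = Counter()
    for row in records:
        for col, val in row.items():
            if val != "":
                counts[col] += 1
    return {col: {"non_empty": counts[col], "total": len(records)} for col in records[0]}


def propose_rules(column_profile):
    """LLMRuleAgent stand-in (assumption): returns fixed rules instead of
    calling an LLM. The profile argument is what a real prompt would include."""
    return [
        {"column": "policy_number", "check": "not_empty"},
        {"column": "incident_date", "check": "regex", "pattern": r"\d{4}-\d{2}-\d{2}"},
        {"column": "total_claim_amount", "check": "min", "value": 0},
    ]


def apply_rules(records, rules):
    """RuleAgent stand-in: run each deterministic check on each row and
    return (row_index, column, check) for every failure."""
    violations = []
    for i, row in enumerate(records):
        for rule in rules:
            value = row.get(rule["column"], "")
            if rule["check"] == "not_empty":
                ok = value != ""
            elif rule["check"] == "regex":
                ok = re.fullmatch(rule["pattern"], value) is not None
            elif rule["check"] == "min":
                try:
                    ok = float(value) >= rule["value"]
                except ValueError:
                    ok = False  # non-numeric values fail a numeric check
            else:
                ok = True  # unknown check types are skipped in this sketch
            if not ok:
                violations.append((i, rule["column"], rule["check"]))
    return violations


def summarize(violations):
    """SummaryAgent stand-in: plain-text counts per column, no LLM involved."""
    per_column = Counter(col for _, col, _ in violations)
    return [f"{col}: {n} violation(s)" for col, n in sorted(per_column.items())]


column_profile = profile(RECORDS)
rules = propose_rules(column_profile)
violations = apply_rules(RECORDS, rules)
summary = summarize(violations)
print("\n".join(summary))
```

Keeping rule generation separate from rule execution means that, in a design like this, any LLM-proposed rule is still applied by deterministic code, so each reported violation can be traced back to one explicit rule.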

Published

2025-07-31

How to Cite

Malviya, S., & Parate, V. (2025). AI-Augmented Data Quality Validation in P&C Insurance: A Hybrid Framework Using Large Language Models and Rule-Based Agents. International Journal of Computational and Experimental Science and Engineering, 11(3). https://doi.org/10.22399/ijcesen.3613

Section

Research Article