Semantic–Lexical Fusion: Improving Retrieval Accuracy for AI-Driven Knowledge Systems

Authors

  • Karthik Chakravarthy Cheekuri

DOI:

https://doi.org/10.22399/ijcesen.4267

Keywords:

hybrid retrieval, semantic search, lexical matching, retrieval-augmented generation, information retrieval systems

Abstract

Large language models increasingly rely on external knowledge retrieval to generate accurate, context-aware responses. Dense vector representations enable semantic similarity matching but struggle with exact term identification, structured metadata constraints, and the domain-specific identifiers common in enterprise environments. The proposed framework addresses these limitations by integrating dense embedding-based retrieval with token-level inverted indexing. The architecture employs dual indexing structures (a vector index and an inverted index), a unified query decomposition layer, and a weighted scoring mechanism that combines cosine similarity with BM25 relevance signals. Field-level filtering and access control mechanisms enforce organizational constraints while preserving semantic generalization.
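The weighted scoring mechanism can be illustrated with a minimal Python sketch. It assumes that α weights the semantic (cosine) term and (1 − α) the lexical (BM25) term, that both score sets are min-max normalized over the candidate set before fusion, and that field filters and access-control groups act as a hard pre-filter. None of these details is specified above. The `Doc` type, the toy token lists and three-dimensional vectors, and the BM25 parameters k1 = 1.2 and b = 0.75 are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    tokens: list[str]        # hypothetical tokenizer output (inverted-index side)
    vector: list[float]      # hypothetical embedding-model output (vector-index side)
    fields: dict[str, str] = field(default_factory=dict)
    allowed_groups: set[str] = field(default_factory=set)


def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Okapi BM25 with corpus statistics (df, avgdl) taken from `docs`."""
    if not docs:
        return {}
    n = len(docs)
    avgdl = sum(len(d.tokens) for d in docs) / n
    df = Counter(t for d in docs for t in set(d.tokens))
    scores = {}
    for d in docs:
        tf = Counter(d.tokens)
        norm = k1 * (1 - b + b * len(d.tokens) / avgdl)
        score = 0.0
        for t in query_tokens:
            if tf[t]:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                score += idf * tf[t] * (k1 + 1) / (tf[t] + norm)
        scores[d.doc_id] = score
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = math.sqrt(sum(a * a for a in u)), math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def min_max(scores):
    """Rescale scores to [0, 1]; a constant score set carries no signal and maps to 0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_search(query_tokens, query_vector, docs, user_groups, filters=None,
                  alpha=0.6, top_k=3):
    """Fuse normalized scores: alpha * semantic + (1 - alpha) * lexical."""
    filters = filters or {}
    # Field-level filters and access control act as hard constraints before scoring.
    candidates = [
        d for d in docs
        if d.allowed_groups & user_groups
        and all(d.fields.get(k) == v for k, v in filters.items())
    ]
    bm25 = bm25_scores(query_tokens, docs)  # IDF and avgdl from the whole index
    lexical = min_max({d.doc_id: bm25[d.doc_id] for d in candidates})
    semantic = min_max({d.doc_id: cosine(query_vector, d.vector) for d in candidates})
    fused = {i: alpha * semantic[i] + (1 - alpha) * lexical[i] for i in semantic}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy index: d4 is semantically closest to the query but has the wrong SKU.
docs = [
    Doc("d1", ["reset", "password", "sku-4471", "router"], [0.8, 0.2, 0.1],
        {"product": "router"}, {"support"}),
    Doc("d2", ["router", "login", "troubleshooting", "guide"], [0.7, 0.3, 0.2],
        {"product": "router"}, {"support"}),
    Doc("d3", ["quarterly", "sales", "report"], [0.0, 0.1, 0.9],
        {"product": "finance"}, {"finance"}),
    Doc("d4", ["reset", "password", "sku-9001", "switch"], [0.9, 0.1, 0.0],
        {"product": "switch"}, {"support"}),
]
query_tokens, query_vector = ["reset", "password", "sku-4471"], [0.9, 0.1, 0.0]
print(hybrid_search(query_tokens, query_vector, docs, {"support"}, alpha=0.6))
print(hybrid_search(query_tokens, query_vector, docs, {"support"}, alpha=1.0))
print(hybrid_search(query_tokens, query_vector, docs, {"support"}, {"product": "router"}))
```

On this toy data a semantic-only ranking (α = 1.0) puts the wrong-SKU document d4 first because its vector matches the query exactly, while α = 0.6 lifts d1, whose exact `sku-4471` token carries the lexical signal. The finance document d3 never appears because the user's groups do not grant access to it.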

Performance characterization across production-scale deployments demonstrates that hybrid architectures achieve 60–150 ms query latency with a memory footprint of 1.0–2.0 GB per million documents, while pure semantic approaches require 768 MB–1.5 GB and pure lexical systems 200–500 MB. Index construction analysis reveals that vector encoding takes 15–50 hours for million-document collections, compared with minutes for inverted indices, although batch processing strategies mitigate the operational impact. A sensitivity analysis across weighting parameters identifies the optimal semantic–lexical balance at α = 0.6, which achieves a peak F1 score of 0.847 and yields 15–30% accuracy improvements over single-paradigm methods.
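The sensitivity analysis over the weighting parameter can be sketched as a grid search. This minimal version assumes that α weights the semantic score, that per-document scores are already normalized to [0, 1], and that F1 is computed at a fixed cutoff k and averaged over queries; the abstract does not state these choices. The evaluation set, `rank`, `f1_at_k`, and `sweep_alpha` are hypothetical.

```python
# Hypothetical labeled evaluation set: for each query, pre-normalized
# (semantic, lexical) scores per candidate document, plus the relevant ids.
EVAL_SET = [
    ({"a": (0.9, 0.1), "b": (0.8, 0.2), "c": (0.1, 0.9), "d": (0.2, 0.3)}, {"a", "b"}),
    ({"a": (0.9, 0.0), "b": (0.5, 1.0), "c": (0.4, 0.9), "d": (0.8, 0.1)}, {"b", "c"}),
    ({"a": (0.7, 0.7), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (0.6, 0.6)}, {"a", "d"}),
]


def rank(scores, alpha):
    """Order documents by the fused score alpha * semantic + (1 - alpha) * lexical."""
    return sorted(scores, key=lambda d: alpha * scores[d][0] + (1 - alpha) * scores[d][1],
                  reverse=True)


def f1_at_k(ranked, relevant, k):
    """Harmonic mean of precision@k and recall@k for one query."""
    retrieved = set(ranked[:k])
    tp = len(retrieved & relevant)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(retrieved), tp / len(relevant)
    return 2 * precision * recall / (precision + recall)


def sweep_alpha(eval_set, k=2, steps=11):
    """Mean F1@k for evenly spaced alpha in [0, 1] (0.0, 0.1, ..., 1.0 when steps=11)."""
    curve = []
    for i in range(steps):
        alpha = i / (steps - 1)
        mean_f1 = sum(f1_at_k(rank(s, alpha), rel, k) for s, rel in eval_set) / len(eval_set)
        curve.append((alpha, mean_f1))
    return curve


curve = sweep_alpha(EVAL_SET)
best_alpha, best_f1 = max(curve, key=lambda point: point[1])
for alpha, f1 in curve:
    print(f"alpha={alpha:.1f}  mean F1@2={f1:.3f}")
print(f"best alpha={best_alpha:.1f}  mean F1@2={best_f1:.3f}")
```

On these made-up scores both endpoints (pure lexical at α = 0.0, pure semantic at α = 1.0) reach a mean F1@2 of 0.5, while some intermediate weights do better. That only illustrates the shape of such a curve; it says nothing about the paper's reported F1 of 0.847 at α = 0.6.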

Evaluation across open-domain corpora (MS MARCO, Natural Questions) and enterprise document collections demonstrates improved precision and recall, particularly for queries containing product identifiers, user entities, and structured filters. The system supports retrieval-augmented generation pipelines, conversational interfaces, and knowledge-grounded chatbots, where both conceptual relevance and deterministic matching are essential. The results indicate that fusing complementary retrieval paradigms provides robust performance across diverse query types while maintaining interpretability and production-grade reliability for AI-driven applications. A cost analysis for enterprise deployments reveals a 50–80% infrastructure premium for hybrid systems compared with single-approach implementations, justified by reduced hallucinations, improved user satisfaction, and fewer support escalations in operational AI applications.
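The query types named above (product identifiers, user entities, structured filters) are what the unified query decomposition layer has to route to the right index. A minimal sketch follows, assuming identifiers look like `SKU-4471`, user entities are written as `@handle`, and filters use a `field:value` syntax. These patterns, `DecomposedQuery`, and `decompose` are hypothetical and are not the paper's query grammar.

```python
import re
from dataclasses import dataclass, field

# Hypothetical surface patterns; a production system would define its own grammar.
FILTER = re.compile(r"\b(\w+):(\S+)")          # field:value constraints, e.g. status:open
IDENTIFIER = re.compile(r"\b[A-Z]{2,}-\d+\b")  # product identifiers, e.g. SKU-4471
USER_MENTION = re.compile(r"@(\w+)")           # user entities, e.g. @jdoe


@dataclass
class DecomposedQuery:
    semantic_text: str                                     # embedded for the vector index
    must_match: list[str] = field(default_factory=list)    # exact terms for the inverted index
    filters: dict[str, str] = field(default_factory=dict)  # field-level constraints


def decompose(query: str) -> DecomposedQuery:
    """Split a raw query into semantic text, exact-match terms, and metadata filters."""
    filters = {name.lower(): value for name, value in FILTER.findall(query)}
    remainder = FILTER.sub(" ", query)  # filters constrain results; they are not embedded
    must_match = IDENTIFIER.findall(remainder) + USER_MENTION.findall(remainder)
    # Identifiers and mentions also stay in the semantic text, so both indexes see them.
    semantic_text = " ".join(remainder.split())
    return DecomposedQuery(semantic_text, must_match, filters)


print(decompose("open tickets from @jdoe about SKU-4471 status:open"))
```

Keeping identifiers in the semantic text as well as in `must_match` is a design choice in this sketch: the embedding still sees the surrounding context, while the inverted index guarantees the exact-term match that dense vectors tend to miss.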

Published

2025-11-10

How to Cite

Karthik Chakravarthy Cheekuri. (2025). Semantic–Lexical Fusion: Improving Retrieval Accuracy for AI-Driven Knowledge Systems. International Journal of Computational and Experimental Science and Engineering, 11(4). https://doi.org/10.22399/ijcesen.4267

Issue

Vol. 11 No. 4 (2025)

Section

Research Article