Real-Time AI-Driven Anomaly Detection in TV Ad Impression Logs Using Streaming Big Data Pipelines

Authors

  • Viharika Bhimanapati
  • Naveen K Chandu

DOI:

https://doi.org/10.22399/ijcesen.4367

Keywords:

Real-Time Anomaly Detection, Streaming Big Data, TV Ad Impressions, Machine Learning, Apache Kafka

Abstract

As television advertising ecosystems grow in size and complexity, real-time monitoring and analysis of ad impression logs has become increasingly important. Traditional batch-based and static anomaly detection systems are poorly suited to high-velocity, high-volume streaming television data. In this paper, the authors present an in-depth discussion of AI-powered anomaly detection algorithms that can be deployed within streaming big data pipelines to ensure the accuracy, integrity, and compliance of TV ad impressions. By integrating advanced machine learning and deep learning algorithms, including autoencoders, LSTMs, and ensemble methods, into scalable streaming platforms such as Apache Kafka and Apache Flink, real-time insights can be obtained with minimal latency. The paper also examines the types of anomalies commonly observed in ad impression logs, model deployment strategies, and system performance measured by precision, recall, latency, and throughput. It concludes with a discussion of open issues, such as data heterogeneity, concept drift, and explainability, and an outlook on future advances in federated learning, edge AI, and hybrid detection models. This convergent approach illustrates the transformative potential of combining AI and streaming data systems to improve both the operational and financial performance of the TV advertising sector.
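The abstract names streaming detection and evaluation by precision and recall without showing the mechanics. The following is a minimal sketch, assuming per-minute impression counts for a single ad slot have already been aggregated from the stream (for example by a windowed Kafka/Flink job, not shown here). It uses a rolling z-score, a simple statistical baseline swapped in for the paper's autoencoder, LSTM, and ensemble models, to flag anomalous intervals one event at a time, and then scores the flags against ground-truth labels. The class and function names, the window of 12, the threshold of 3, and the synthetic counts are all hypothetical.

```python
import math
from collections import deque


class RollingZScoreDetector:
    """Hypothetical streaming detector for per-interval impression counts.

    A count is flagged when it lies more than `threshold` standard deviations
    from the mean of the last `window` non-anomalous counts. This is a simple
    statistical baseline, not the paper's autoencoder/LSTM/ensemble models.
    """

    def __init__(self, window: int = 12, threshold: float = 3.0) -> None:
        self.window = window
        self.threshold = threshold
        self._baseline = deque(maxlen=window)  # recent "normal" counts only

    def update(self, count: float) -> bool:
        """Consume one count from the stream; return True if it is anomalous."""
        if len(self._baseline) < self.window:
            # Warm-up: not enough history yet, so accept the value as normal.
            self._baseline.append(count)
            return False
        mean = sum(self._baseline) / self.window
        std = math.sqrt(sum((x - mean) ** 2 for x in self._baseline) / self.window)
        if std > 0:
            is_anomaly = abs(count - mean) / std > self.threshold
        else:
            # Perfectly flat history: any change at all is unusual.
            is_anomaly = count != mean
        if not is_anomaly:
            # Keep flagged values out of the baseline so a spike cannot
            # inflate the variance and hide a later anomaly.
            self._baseline.append(count)
        return is_anomaly


def precision_recall(predicted, actual):
    """Precision and recall of boolean anomaly flags against ground-truth labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Synthetic per-minute impression counts for one ad slot (illustrative only):
# 12 warm-up minutes, then normal traffic with an injected spike (400), a
# logging outage (0) and a subtle 4% over-count (104).
counts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100,
          100, 400, 101, 0, 99, 104]
labels = [False] * 12 + [False, True, False, True, False, True]

detector = RollingZScoreDetector(window=12, threshold=3.0)
flags = [detector.update(c) for c in counts]
precision, recall = precision_recall(flags, labels)
print(flags[12:])          # [False, True, False, True, False, False]
print(precision, recall)   # 1.0 0.6666666666666666
```

On this toy stream the spike and the outage are caught, while the small over-count stays under the threshold, so precision is 1.0 and recall is 2/3. Flagged intervals are deliberately kept out of the baseline so that one spike does not inflate the standard deviation and mask a later drop; the trade-off is that a genuine, sustained level shift would keep firing until the baseline is reset or the model is retrained, which is the concept-drift problem the abstract lists among open issues.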

Published

2025-03-30

How to Cite

Viharika Bhimanapati, & Naveen K Chandu. (2025). Real-Time AI-Driven Anomaly Detection in TV Ad Impression Logs Using Streaming Big Data Pipelines. International Journal of Computational and Experimental Science and Engineering, 11(4). https://doi.org/10.22399/ijcesen.4367

Section

Research Article