Real-Time AI-Driven Anomaly Detection in TV Ad Impression Logs Using Streaming Big Data Pipelines
DOI:
https://doi.org/10.22399/ijcesen.4367
Keywords:
Real-Time Anomaly Detection, Streaming Big Data, TV Ad Impressions, Machine Learning, Apache Kafka
Abstract
As television advertising ecosystems have grown in size and complexity, real-time monitoring and analysis of ad impression logs has become increasingly important. Traditional batch-based and fixed-point anomaly detection systems are poorly suited to high-velocity, high-volume streaming television data. This paper presents an in-depth discussion of AI-powered anomaly detection algorithms that can be deployed in streaming big data pipelines to safeguard the accuracy, integrity, and compliance of TV ad impressions. By integrating machine learning and deep learning methods, including autoencoders, LSTMs, and ensemble algorithms, into scalable frameworks such as Apache Kafka and Flink, real-time insights can be obtained with minimal latency. The paper also examines the types of anomalies commonly observed in ad impression logs, model deployment strategies, and system performance as measured by precision, recall, latency, and throughput. It concludes with a discussion of open challenges, such as data heterogeneity, concept drift, and explainability, and an outlook on future advances in federated learning, edge AI, and hybrid detection models. This integrated approach illustrates the transformative potential of AI and streaming data systems for improving both the operational and financial performance of the TV advertising sector.
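To make the evaluation measures named in the abstract concrete, the following is a minimal sketch, not the paper's method. The paper discusses autoencoders, LSTMs, and ensembles; this sketch swaps in a simple rolling z-score baseline over per-interval impression counts and then computes precision and recall against hand-labeled anomalies. The function names, window size, threshold, and sample counts are all hypothetical and chosen only for illustration.

```python
from collections import deque
from math import sqrt


def rolling_zscore_flags(counts, window=5, threshold=3.0):
    """Flag each count whose z-score against the previous `window` counts
    exceeds `threshold`. A stand-in baseline, not the paper's detector."""
    history = deque(maxlen=window)
    flags = []
    for value in counts:
        if len(history) == window:
            mean = sum(history) / window
            std = sqrt(sum((x - mean) ** 2 for x in history) / window)
            # A perfectly flat window has zero spread; never flag in that case.
            flags.append(std > 0 and abs(value - mean) / std > threshold)
        else:
            flags.append(False)  # not enough history yet
        history.append(value)
    return flags


def precision_recall(flags, labels):
    """Precision and recall of boolean `flags` against boolean `labels`."""
    tp = sum(f and l for f, l in zip(flags, labels))
    fp = sum(f and not l for f, l in zip(flags, labels))
    fn = sum(l and not f for f, l in zip(flags, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical per-minute impression counts with one obvious spike.
counts = [100, 102, 98, 101, 99, 500, 100, 101]
# Hypothetical ground truth: the spike plus one subtle anomaly at the end.
labels = [False, False, False, False, False, True, False, True]

flags = rolling_zscore_flags(counts)
precision, recall = precision_recall(flags, labels)
```

On this toy input the baseline catches the spike but misses the subtle anomaly, so precision is high while recall is not. That gap is the kind of limitation that motivates the learned detectors the abstract describes.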
License
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.