Observability: Anomaly Detection at Scale with Prometheus
DOI:
https://doi.org/10.22399/ijcesen.4576
Keywords:
Observability, Anomaly Detection, Prometheus, Time-Series Metrics, Cloud-Native Systems, Monitoring and Alerting
Abstract
Observability has become a key requirement for managing modern cloud-native systems, which combine distributed architectures with large numbers of heterogeneous metrics. Traditional threshold-based monitoring is often insufficient for detecting subtle or evolving system problems at scale. This hypothetical study investigates anomaly detection within a Prometheus-based observability framework, emphasizing the application of statistical, seasonality-aware, and machine-learning-inspired methods to large volumes of time-series metrics. A simulated microservices environment is used to evaluate detection accuracy, timeliness, alert quality, and scalability under different workloads and system conditions. The results show that anomaly detection driven by dynamic baselines is substantially better than static thresholds at detecting faults early and reducing alert noise. Advanced analytical techniques yield higher detection accuracy; Prometheus-native methods, however, offer a more scalable and operationally efficient alternative. The study shows how Prometheus can support proactive observability and help keep complex distributed systems running reliably.
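The contrast the abstract draws between static thresholds and dynamic baselines can be illustrated with a minimal sketch. It is not the study's method, which the abstract does not specify: the rolling z-score, the function names, the window size, the limits, and the sample latency values are all assumptions chosen for illustration. In Prometheus itself, a comparable baseline could plausibly be built from the PromQL functions avg_over_time and stddev_over_time.

```python
from statistics import mean, stdev


def static_threshold_alerts(samples, limit):
    """Return indices of samples that exceed a fixed limit."""
    return [i for i, value in enumerate(samples) if value > limit]


def rolling_zscore_alerts(samples, window=5, z_limit=3.0):
    """Return indices of samples that deviate from the trailing-window
    mean by more than z_limit sample standard deviations.

    Hypothetical helper: window and z_limit are illustrative defaults.
    """
    alerts = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]  # baseline excludes the current sample
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat baseline: z-score is undefined, so skip
        if abs(samples[i] - mu) / sigma > z_limit:
            alerts.append(i)
    return alerts


# Made-up request latencies in milliseconds with one spike at index 6.
latency_ms = [100, 102, 98, 101, 99, 100, 160, 101, 99, 100]

static_hits = static_threshold_alerts(latency_ms, limit=200)
dynamic_hits = rolling_zscore_alerts(latency_ms)
print("static threshold alerts:", static_hits)
print("dynamic baseline alerts:", dynamic_hits)
```

In this toy series the spike stays below the fixed limit of 200 ms, so the static rule stays silent, while the baseline learned from recent samples flags it. The same spike also inflates the trailing window for the next few samples, which is one reason a production baseline would likely need a longer window or a robust statistic.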
License
Copyright (c) 2025 International Journal of Computational and Experimental Science and Engineering

This work is licensed under a Creative Commons Attribution 4.0 International License.