Cascading Timeouts: A Simple Strategy for Building Resilient Systems

Madhavi Latha Bhairavabhatla

doi:10.22399/ijcesen.3959

Keywords:

Cascading timeouts, distributed systems resilience, fault tolerance, microservices architecture, system performance optimization

Abstract

Distributed systems are susceptible to cascading failures when timeout settings are not coordinated across architectural layers. The cascading timeout approach addresses this by assigning timeout values that increase from internal services outward to gateway components and the ingress layer. Because internal components fail fast while gateway components retain enough time to return a response to the user, the layers act as natural circuit breakers and help prevent system-wide outages. The approach treats timeout configuration as a deliberate architectural decision that improves resource utilization, resilience, thread pool efficiency, and user experience. The strategy must be calibrated against empirically observed latency patterns and integrated with retry logic across database layers, platform services, load balancers, and ingress points. Systems that use cascading timeouts achieve higher throughput, faster recovery from failures, easier operational monitoring, and more predictable resource utilization under varying traffic loads.
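
The layering rule described in the abstract can be illustrated with a small configuration check. The following is a minimal sketch, not the author's implementation: the `Layer` type, the `validate_cascade` helper, the layer names, and every timeout and retry value are hypothetical and chosen only for illustration. It assumes each layer's timeout is the total time that layer waits before giving up, and that a layer with `retries` retries may call the layer below it up to `retries + 1` times in sequence.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Layer:
    name: str
    timeout_s: float  # total time this layer waits before giving up (hypothetical values)
    retries: int = 0  # retries this layer issues against the layer directly below it


def validate_cascade(layers: List[Layer]) -> List[str]:
    """Check a list of layers ordered innermost first.

    Each outer layer must wait longer than the worst case of the layer
    beneath it, including the outer layer's own retries. Returns a list
    of human-readable violations; an empty list means the cascade holds.
    """
    problems = []
    for callee, caller in zip(layers, layers[1:]):
        # Worst case: every attempt against the callee runs to its full timeout.
        worst_case_s = callee.timeout_s * (caller.retries + 1)
        if caller.timeout_s <= worst_case_s:
            problems.append(
                f"{caller.name} ({caller.timeout_s}s) must exceed "
                f"{worst_case_s}s spent waiting on {callee.name}"
            )
    return problems


# Illustrative values only; real values should come from measured latency.
GOOD = [
    Layer("database", 1.0),
    Layer("platform-service", 2.5, retries=1),
    Layer("load-balancer", 3.0),
    Layer("ingress", 4.0),
]

# Same cascade, but the service timeout no longer covers one retry of the database call.
BAD = [
    Layer("database", 1.0),
    Layer("platform-service", 1.5, retries=1),
    Layer("load-balancer", 3.0),
    Layer("ingress", 4.0),
]

if __name__ == "__main__":
    print(validate_cascade(GOOD))
    print(validate_cascade(BAD))
```

A check of this kind could run in a deployment pipeline so that a timeout change in one layer is rejected when it breaks the ordering with its neighbors. The strict inequality is a design choice in this sketch: it leaves the outer layer some time to return a response after the inner call has timed out.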
