Maximizing Service Reliability
- Load balancing and service health
- Rate limits
- Back-pressure via headers
- Validating reliability and fault tolerance
  - Load testing (see *The Art of Scalability*)
    - Load testing as part of the delivery pipeline
    - Exploratory load testing to identify limits and test assumptions
  - Chaos testing
    - Chaos Toolkit
      1. Define a measurable steady state of normal system operation.
      2. Hypothesize that this steady state will hold in both the control group and the experimental group, i.e. that the system is resilient to the failure introduced.
      3. Introduce variables that reflect real-world failure events, for example removing servers, severing network connections, or introducing higher latency.
      4. Attempt to disprove the hypothesis defined in step 2.
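The four chaos-testing steps can be sketched as a small, self-contained Python simulation. Everything in it is a hypothetical illustration: `ToyCluster`, its latency model, the 200 ms timeout, the 99% steady-state threshold and the 1-point tolerance are assumptions, and none of it is the Chaos Toolkit API. It only shows the shape of the loop: measure steady state on a control group, inject a fault into an experimental group, and try to disprove the hypothesis by comparing the two.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyCluster:
    """Hypothetical load-balanced service; a simulation, not a real API."""
    servers: int = 4
    added_latency_ms: float = 0.0
    timeout_ms: float = 200.0  # assumed client timeout

    def request(self, rng: random.Random) -> bool:
        """Simulate one request; True if it finishes before the timeout."""
        if self.servers <= 0:
            return False
        # Toy model: fewer servers -> more load per server -> higher latency.
        latency = rng.gauss(60.0, 15.0) * (4 / self.servers) + self.added_latency_ms
        return latency < self.timeout_ms


def success_rate(cluster: ToyCluster, n: int, seed: int) -> float:
    """Steady-state metric: fraction of n simulated requests that succeed."""
    rng = random.Random(seed)
    return sum(cluster.request(rng) for _ in range(n)) / n


def run_experiment(
    inject: Callable[[ToyCluster], None],
    slo: float = 0.99,        # assumed steady-state threshold
    tolerance: float = 0.01,  # assumed acceptable deviation from control
    n: int = 10_000,
    seed: int = 42,
) -> dict:
    control, experimental = ToyCluster(), ToyCluster()

    # 1. Steady state: the control group must meet the threshold before we start.
    control_rate = success_rate(control, n, seed)
    if control_rate < slo:
        raise RuntimeError("system is not in steady state; aborting experiment")

    # 2. Hypothesis: the experimental group stays within `tolerance` of control.
    # 3. Introduce a real-world failure into the experimental group only.
    inject(experimental)
    experimental_rate = success_rate(experimental, n, seed + 1)

    # 4. Try to disprove the hypothesis by comparing the two groups.
    disproved = experimental_rate < control_rate - tolerance
    return {
        "control": control_rate,
        "experimental": experimental_rate,
        "hypothesis_holds": not disproved,
    }


def remove_server(cluster: ToyCluster) -> None:
    """Fault: take one server out of rotation."""
    cluster.servers -= 1


def add_latency(cluster: ToyCluster) -> None:
    """Fault: add 150 ms of latency to every request."""
    cluster.added_latency_ms += 150.0


if __name__ == "__main__":
    for fault in (remove_server, add_latency):
        print(fault.__name__, run_experiment(fault))
```

In this toy model, removing one of four servers leaves the hypothesis standing, while adding 150 ms of latency pushes roughly a quarter of requests past the assumed timeout and disproves it. In a real experiment, `success_rate` would be replaced by a probe against the system's actual metrics.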