Crashing Pods: How to compensate for such an outage?
Kubernetes offers many features that keep pod downtime low, and graceful shutdowns and zero-downtime deployments are definitely achievable with it. However, these mechanisms only cover orderly transitions of containers or pods. Despite all the precautions Kubernetes takes, a crashing service can still cause HTTP 5xx responses. Fully compensating for services in such an error state requires additional measures.
This session shows why the classic approach of using a resilience framework cannot fully solve this kind of problem. To this end, it examines the pod lifecycle and analyzes the workflow Kubernetes follows to replace faulty pods. One possible solution strategy is client-side load balancing: the service mesh Istio is used to demonstrate what it takes to achieve full compensation with this strategy.
DevOpsCon London