Resilience engineering is one of those things we all want to do, but never quite find the time to practice. There are always fires to put out and new infrastructure to deploy.
This talk will cover the fundamentals required for applying resilience engineering principles in your day-to-day work. It will cover how to identify the system dependencies that affect component-level and system-level service quality, and then go over some of the mitigation strategies that you can employ to prevent and alleviate failures in your own environment, starting from the simplest and cheapest and then increasing in terms of time, cost, and knowledge requirements.
It will cover:
Nisan Haramati is a software engineer who works on distributed data systems and their verification at Wallaroo Labs.
Having built and supported a variety of large-scale distributed systems, he is
...