Our systems don’t have to be perfect to be operational: planes, networks, and elite athletes all function at extremely high levels even though they are not operating at 100%.
As an industry, we have moved the locus of control from hardware to operating system, to virtual machine, to container, to orchestration, and now we are approaching serverless. None of that has reduced the amount of work that must happen; it has only made it possible to reuse and conceptually compress the work of others. Because the work inside our tools is becoming less visible, we also have less control over how those tools behave. We end up assuming that the promises that have held so far will continue to hold, but that is not in our control.
How do we handle this level of uncertainty? By adding error budgets, layered access, and other accommodations for failure, and by designing our systems for function over form or purity.
The audience will leave with concrete ideas for adding resiliency to their systems by learning to trust their underlying tools while mitigating their reliance on those tools performing perfectly.