Tuesday, 1 May 2018
site reliability
link - nice summary of Google's site reliability book - "Postmortems should be blameless and focus on process and technology, not people. Assume the people involved in an incident are intelligent, are well intentioned... It follows that we can’t “fix” the people, but must instead fix their environment: e.g., improving system design to avoid entire classes of problems"