piannaf’s Twitter Archive—№ 1,613

1/ I read the Stella Report a while back. Finally getting around to sharing my highlights. If you haven't read it yet, do read the whole thing: snafucatchers.github.io/
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
2/ Without the continuous effort of engineers to keep them running they would stop working -- many in days, most in weeks, all within a year.
Permalink 2019 Aug 2 Mood -1 🙁

…in reply to @piannaf
3/ These platforms remain alive and functioning because workers are able to detect anomalies, diagnose their sources, remediate their effect, and repair their flaws and do so ceaselessly
Permalink 2019 Aug 2 Mood -1 🙁

…in reply to @piannaf
4/ The process of contrasting and shifting perspectives revealed what is otherwise hidden about resilient performances and what is essential to build and sustain the ability to be resilient in the face of surprise in the future
Permalink 2019 Aug 2 Mood +2 🙂

…in reply to @piannaf
5/ Experts are typically much better at solving problems than at describing accurately how problems are solved
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
6/ The software and hardware (collectively, the technical artifacts) running below the line cannot be seen or controlled directly. Instead, every interaction crossing the line is mediated by a representation.
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
7/ When a technical system surprises us, it is most often because our mental models of that system are flawed
Permalink 2019 Aug 2 Mood -3 🙁

…in reply to @piannaf
8/ Package maintenance routines for the system repository, Chef recipes and the Chef system, and the mistaken belief that installing a single server could not have system-wide side effects interacted to produce the anomaly.
Permalink 2019 Aug 2 Mood -2 🙁

…in reply to @piannaf
9/ The irony that the system was able to 'limp along' on a handful of servers that continued to run because they were not 'properly' configured was not lost on the operators.
Permalink 2019 Aug 2 Mood +2 🙂

…in reply to @piannaf
10/ The fact that experts can be surprised in this way is evidence of systemic complexity and also of operational variety.
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
11/ People are surprised when they find out that their own mental model of The System (in the Figure 1 or Figure 2 sense) doesn't match the behavior of the system
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
12/ The participants were engaged in a particularly complicated form of search: exploring the external world based on their internal representations of that world, available affordances, and multiple, interacting goals
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
13/ Experts demonstrated their ability to use their incomplete, fragmented models of the system as starting points for exploration and to quickly revise and expand their models during the anomaly response in order to understand the anomaly and develop and assess solutions
Permalink 2019 Aug 2 Mood +4 🙂

…in reply to @piannaf
14/ Although automation and monitoring provide convenient and efficient ways of doing things and keeping track of nominal performance,
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
15/ when things are broken or confusing or when decisive actions are taken, tools that provide tight interaction with the operating system are commonly used
Permalink 2019 Aug 2 Mood -2 🙁

…in reply to @piannaf
16/ This coordination effort is among the most interesting and potentially important aspects of the anomaly response
Permalink 2019 Aug 2 Mood +4 🙂

…in reply to @piannaf
17/ The postmortem discussions revealed that organizations seek ways to avoid burdening their technical staff with demands for updates and projections, especially in the early stages of anomaly response
Permalink 2019 Aug 2 Mood -4 🙁

…in reply to @piannaf
18/ One rationale for improving the quality of postmortems is to obtain better insight into the way that escalating consequences increase the pressure on IT staff and how to better inform their approach to these difficult situations.
Permalink 2019 Aug 2 Mood +7 🙂

…in reply to @piannaf
19/ Under 'normal' operating conditions many goals can be active simultaneously and the workers need to do little to maintain a balance between competing or mutually exclusive goals
Permalink 2019 Aug 2 Mood +3 🙂

…in reply to @piannaf
20/ sacrifice decisions are readily criticized afterwards and, this is ironically the case, especially when they are successful
Permalink 2019 Aug 2 Mood +1 🙂

…in reply to @piannaf
21/ postmortems for events that produce large economic losses or engage regulatory bodies are more scripted, sometimes to the point of being little more than staged events at which carefully vetted statements are made and discussion of certain topics is deliberately avoided
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
22/ Anomalies are unambiguous but highly encoded messages about how systems really work. Postmortems represent an attempt to decode the messages and share them
Permalink 2019 Aug 2 Mood +1 🙂

…in reply to @piannaf
23/ Collectively, our skill isn’t in having a good model of how the system works, our skill is in being able to update our model efficiently and appropriately
Permalink 2019 Aug 2 Mood +5 🙂

…in reply to @piannaf
24/ There's the related, but different ‘how-did-this-ever-work?!’ experience that is even more troubling upon discovery.
Permalink 2019 Aug 2 Mood -2 🙁

…in reply to @piannaf
25/ You make a change to restore function to a system but are unable to construct a mental model that would have ever allowed the system to work correctly before you fixed it -- in direct opposition to the observation that the system did appear to be functioning previously.
Permalink 2019 Aug 2 Mood -1 🙁

…in reply to @piannaf
26/ They can also lead to deeper insights into the technical, organizational, economic, and even political factors that promote those conditions
Permalink 2019 Aug 2 Mood +1 🙂

…in reply to @piannaf
27/ The presence and nature of postmortems serves as a signal about the health and focus of the organization and technical artifacts themselves
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
28/ The presence of skilled facilitators -- most often people with technical chops who have devoted time and effort to learn how to manage these meetings and have practiced doing so -- certainly contributes to success
Permalink 2019 Aug 2 Mood +5 🙂

…in reply to @piannaf
29/ Although apparently technically focused, postmortems are inherently social events
Permalink 2019 Aug 2 Mood +2 🙂

…in reply to @piannaf
30/ critical but non-judgmental review of events can produce useful insights
Permalink 2019 Aug 2 Mood +2 🙂

…in reply to @piannaf
31/ Organizations often assert that their reviews are "blameless" although in many instances they are, in fact, sanctionless. As a practical matter, it is difficult to forego sanctions entirely.
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
32/ A "no blame" approach to managing incidents and accidents is predicated on the idea that the knowledge obtained from open, rapid, and thorough examination of these events is worth more than the gain from castigating individuals
Permalink 2019 Aug 2 Mood -1 🙁

…in reply to @piannaf
33/ The dilemma facing those already involved is whether they should stay focused on the anomaly in order to maximize their chances of quick diagnosis and repair or devote some of their effort to bringing others up to speed so that they can participate in that work.
Permalink 2019 Aug 2 Mood +3 🙂

…in reply to @piannaf
34/ steps have been taken, lines of inquiry pursued, diagnostics and workarounds attempted. Coupled to an anomaly that is itself cascading, the activities of initial responders create a new situation that has its own history.
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
35/ The incoming expert usually needs to review that history
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
36/ It is far easier to imagine how automation could be useful than it is to produce working automation that functions as a genuine "team player" in anomaly response
Permalink 2019 Aug 2 Mood +2 🙂

…in reply to @piannaf
37/ Deciding on a risky or expensive course of action, coping with the emotional nature of severe anomalies, and gauging fatigue may be more reliable, efficient, or nuanced with such meetings.
Permalink 2019 Aug 2 Mood -4 🙁

…in reply to @piannaf
38/ Business critical software presents a unique opportunity for innovative visualizations that improve resilient performance.
Permalink 2019 Aug 2 Mood +6 🙂

…in reply to @piannaf
39/ The interventions that responders make are experiments that test their mental models of the anomaly sources and the surrounding system
Permalink 2019 Aug 2 Mood 0

…in reply to @piannaf
40/ What is not clear is how to manage the risks posed by strange loop dependencies in business-critical software
Permalink 2019 Aug 2 Mood -4 🙁

…in reply to @piannaf
41/ object-oriented programming method created an opportunity to build systems quickly, to deploy them, and from their use to discover new abstractions that could then be incorporated into the software
Permalink 2019 Aug 2 Mood +2 🙂

…in reply to @piannaf
42/ Refactoring is not itself productive because it does not change the software's external behavior. Thus refactoring "pays back" technical debt but does not produce immediate value for users
Permalink 2019 Aug 2 Mood -2 🙁

…in reply to @piannaf
43/ Accepting too much technical debt in order to bring product features to the customer may doom the long-term viability of the product by making it impossible to revise in the future.
Permalink 2019 Aug 2 Mood -3 🙁

…in reply to @piannaf
44/ In contrast, concentrating exclusively on keeping the software spotlessly clean may cause the enterprise to miss opportunities for improving the current product and make it less competitive.
Permalink 2019 Aug 2 Mood +6 🙂

…in reply to @piannaf
45/ The organization has little idea of how much technical debt it 'carries' in its code and paying tech debt is notoriously difficult to make visible to those setting business level priorities.
Permalink 2019 Aug 2 Mood -5 🙁

…in reply to @piannaf
46/ There is no specific countermeasure that can be used against dark debt because it is invisible until an anomaly reveals its presence.
Permalink 2019 Aug 2 Mood -3 🙁

…in reply to @piannaf
47/ Critics of the notion of dark debt will argue that it is preventable by design, code review, thorough testing, etc. But these and many other preventative methods have already been used to create those systems where dark debt has created outages
Permalink 2019 Aug 2 Mood -8 🙁

…in reply to @piannaf
48/ "Why are things done the way they are?" is seldom asked during internal analysis but was quite common during the workshop
Permalink 2019 Aug 2 Mood 0