What Real Housewives taught me about postmortems

Paige Cruz (Chronosphere) shares why postmortems are never truly objective and how to make them useful anyway.

Paige Cruz, Principal Developer Advocate, Chronosphere
The transcript below has been generated using AI and may not fully match the audio.
Hello, SEV0. You all are my people. It is so good to be here nerding out about incidents with you. As I was developing my slides, I realized I don't really love the term postmortems. I submitted with postmortems, but let's start as we wanna continue: we're gonna be talking about retros, that meeting where we discuss what went wrong and why.
I'm Paige Cruz, also known as PagerDuty around all of the various places, so yes, I know, my name doesn't work with incident.io.
If you don't read Catchpoint's state of SRE survey every year, I highly recommend it. This year or last year, they had one question that stood out to me, which was: in which incident areas could your organization improve? This was a multi-select. They had options for cross-incident analysis, creating action items, completing action items, writing reports, facilitating blameless post-incident retros, and emphasizing learning versus fixing. Of all those options, how would you answer that for your org? What area would you like to see your org improve on for incidents?
If you said emphasizing learning versus fixing, bingo. You were with 47% of people, which was by far the largest area identified as an opportunity for improvement. This really resonated with me. Maybe it's because I come from a family of teachers, but I really think about the power of learning on the job. There's just nothing a college course could teach you about real-world production like participating in an incident.
In some companies, and we don't have to name names, it can feel like reliability theater. It feels like going through the motions. We're here to write a report. We're here to create action items. We're here to complete the action items, or identify ones that we could complete in a short enough time that it makes our incident metrics look good and shows that everything's on track. Rinse and repeat.
I think that the retro is the perfect place for us as individual contributors to emphasize learning over fixing. What I like about this is that retros are already a part of your process. I hope. I hope a retro is a part of your process when an incident happens. So the good news: you don't need to ask for permission. You don't need a big process overhaul. It's all about harnessing the power of facilitation to focus on learning.
So how are we gonna do this? How are we gonna learn how to emphasize learning? We are gonna examine the Real Housewives reunion episode at the end of every season: what we can do and what we cannot do. There are lessons and there are anti-lessons here.
And I get it. Right now it probably seems like our world (clouds, clusters, containers) and the world of Real Housewives (glitzy, dramatic, fabulous vacations) are miles apart. Real Housewives is yelling, table-flipping, betrayal, and a very weird amount of federal crimes and indictments. The data housewives have is gossip and hearsay and totally doctored screenshots. Their evidence and data is suspect at best.
Our world, by contrast: calm, coordinated, rational, objective. We have hard evidence, right? Look at this. We've got Slack threads. We've got Zoom transcripts, dashboards. I work at an observability vendor; I know we have so much data at our fingertips, too much sometimes. We're building a timeline. We're identifying that singular root cause, that pesky root cause, and we brainstorm the action items. We don't manufacture storylines. It's not fake. Our incident retros discuss the objective truth.
If you think about it, what really is the difference between an end-of-season reunion and an incident retro? I contend both are workplace meetings among colleagues, facilitated by somebody ostensibly neutral, to discuss a workplace incident.
We look at the data, we solicit firsthand accounts, and we discuss impacts and remedies. Really, the difference is maybe what we're showing up wearing. I've yet to see an incident retro as fabulous as the reunions.
So what if I told you our incident retrospectives are just as produced as the Real Housewives reunion, and the only difference is what we're producing for, and that's whatever our organization's values and priorities are? Andy and Bravo TV are incentivized for drama. They're producing for dramatic moments and soundbites that'll capture viewers' attention. Our job as retro facilitators is to produce for learning.
The choice is yours. You're gonna have the retro anyway, right? What is the opportunity cost here? What are you gonna be producing: reliability theater or resilience?
And I don't wanna bury the lede here. The way you do this, of course, is through the power of facilitation: the vibe that you set when you enter the room, the questions you ask and how you ask them, and the way that you actively moderate. Facilitation is an active role, not a passive role.
For those of you who are not Bravoholics, although I'm sure there are a few in the crowd, here's the example we'll be looking at. The TL;DR for the Hawaii incident: it's a classic she-said, she-said situation. These two had a conversation about filming schedules over spring break. Importantly, not on camera. So we have no way to validate. We have no transcripts, we have no audio, we don't know what exactly was said. And we have two competing accounts.
In one version, the brunette asked the blonde, why would anyone be interested in filming you alone in Hawaii without your celebrity husband? There's a lot of editorializing going on in that account, but we'll set that aside. In the second version, the brunette says, I was innocently asking questions about your spring break. I'd learned you were going to Hawaii.
I thought that was interesting, and I thought it was interesting they were filming just you. Because again, Bravo wants drama, and there's not a lot of drama on a family vacation, depending on the family.
So we have these two accounts. Both of them left the conversation upset, and they roped in everybody else in the cast. There were lots of subsequent fights, and the point is, it was the genesis of a season-long feud. So when we get to the retro, when we get to the reunion, we can't roll the tape and see what was said. The conversation needs to be not about litigating the exact truth of what was said in that moment, but about comparing and contrasting these competing accounts and figuring out how we can move forward.
Again, if you don't watch Bravo, meet Andy Cohen. He is the reunion host. Yes, these are real photos of him while filming reunions. Not quite the model of neutral, objective facilitation. But when the cast sits down at a reunion, he, as the executive producer and host, has a lot of power. He's there to bring everyone together, replay that timeline, and invite different perspectives. He controls who gets to talk and when. And he also gets to bring folks to a resolution when he feels like it's ready.
So when I talk about the vibe, the first thing that we can do as a facilitator is set the vibe, set the tone of the room. Before a reunion begins, there is tension. Andy has said on the record what his blueprint of a successful reunion is: there were memorable lines, there was humor, there was big drama, there was reconciliation, there was everything. It was outstanding.
From that little sentence, we can tell Andy's not there to make people whole right away. He doesn't want the apologies to be flowing in the first five minutes. He wants to draw it out. He wants to stoke the flames of conflict there.
And then he will be happy to have the reconciliation. He has this storyline in his head of how this conversation is gonna go and what makes it valuable to his organization.
Bad vibes from the get-go. Both the blonde and the brunette, Kyle and Camille of Real Housewives of Beverly Hills, know they're in the hot seat to retro the Hawaii incident. They know that the way they act in this conversation will determine whether or not they get to keep their jobs and get their contracts renewed. Quite a lot of pressure for a retro conversation. It's basically the absence of psychological safety. So from the jump, the vibe in the room is tense. Everyone is primed to be on the defensive. It's designed to amplify misunderstanding, so it's the opposite of what we want.
What if instead people came to your retros excited, curious, open, ready to share the mistakes they made, what they saw, the actions they took, and the actions they second-guess, with open minds, freely talking about the pressures they were under, how they felt, and how they responded?
I'd say that during the course of an incident, I have been all of these women at once. I have been like, oh my God, those services are still talking to each other? What do you mean? I have been Tamra saying, yeah, I told you guys about this risk a couple of months ago, and look, here we are. I wish you had listened. I've been Denise, super happy to see a mentee take their first on-call shift and jump into incident response with enthusiasm, with confidence, and with competency.
If we were to take Andy's example and focus on the blame, focus on the defense, and lean into the stories and reputations that teams sometimes develop in orgs, we would not leave space to share all of the emotions: the good, the bad, and the ugly.
Vibe check: the vibe does not have to be stuffy. It does not have to be sterile. It does not have to be super formal.
At SREcon this year, I think, Katie Wilde shared a fantastic practice that I would love to see more people do. Her tradition at Snyk is to kick off incident retros by playing a song she specially selected that reflects the incident in some way. She does not reveal the song until people are in the room at the retro. This created buzz. This is kind of like promotion, clickbait. This is getting people talking and theorizing: What could the song be? What would I pick? What did happen in this incident? I wanna go read that report.
If the song thing doesn't resonate with your engineering culture, how could you find a way to make people similarly excited to attend your retros? And how do you think the conversation would flow if they came in with that excitement, that open-mindedness, and that curiosity?
Although if you do want to do the songs, the Housewives multiverse has plenty you can choose from. "It's Expensive to Be Me": perfect for a cloud billing incident. "Insecure": a security incident. And "Don't Be Tardy for the Party": I would love to see someone have an incident named after that.
So that's the important thing: before we even start litigating and going over what happened, what mindset are you helping people bring to this conversation?
Another thing that Andy does really well is he starts with the highlight reel. They don't just dive into the discussion. He rolls a supercut montage of the key moments throughout the season before diving into whatever his agenda is for that reunion. This puts everybody on the same page, and it helps folks say, actually, you're missing something from this timeline, or, I actually didn't think that was a big deal, why is this on there?
I used to be against the idea of walking through the timeline at the beginning, and now I've totally come around. I used to think: just pre-read the notes. What are you doing? Come prepared to this meeting.
So we can get into discussing the action items. That's why we're here, right? No. Now I see a lot of the value in walking through at least the overview. And from what we've seen today with the postmortem demo, you can get the short version, the medium version, you can get as much detail as you want. It is not as laborious today to construct that information.
This is, I think, the last thing that Andy does well, and the rest will be anti-examples. For this one example, he sought multiple perspectives, because again, this conversation wasn't on camera. We only have two accounts. So instead of asking what is the right version, who was right, what exactly was said, he phrased things as: What did you hear? What did you hear her say? What do you think you said?
He gave both women the floor to share their recollection of events. That was great. He didn't default to either as the blessed version of the truth. We learned a lot more from each lady's recollection: how they felt, their existing stresses and insecurities. It was a much richer conversation than if he had narrowed it to what exactly was said and insisted we all be in agreement on that. Sometimes we can be obsessed with the root cause, and we narrow our options and our discussion possibilities.
Me as Andy: what could we do? What are ways that we could bring this to our practice? Asking questions like, how much of this was new information for people in the room? And taking turns to say: What was your role in the incident? What did you feel? What did you experience?
The more we understand the incident as it unfolded from each other's perspectives, the more space we have for empathy, to recognize each other's expertise and experience, and to really open ourselves up to not focusing on the fixing, which is what we were talking about earlier. How do we emphasize learning? The learning comes from sharing contrasting versions of events. What did you notice? What did you hear?
How did you know to look at that dashboard? I've never seen it in my life. Things like that.
Okay. Now we flip to the anti-example, with questions that trap. All the good work he did earlier was totally obliterated by the blame-focused questions he went into afterwards. Questions like: Why do you always see yourself as the victim? Are you overly sensitive about the way people perceive you because your husband is so famous? What options does Camille have? That's a yes/no question. That's framing and blaming and shaming. There are not a lot of places for Camille to go but on the defense, which is exactly what he wanted, and is why he designed the questions that way.
So what do we do instead? I've got a sampling of questions that reveal. This is just a sprinkling of my personal favorites. I really encourage you to look at, and maybe develop, a question bank that you can draw on as you're facilitating.
One of my favorites is: what surprised you? It makes people stop and reflect. When you ask, what happened? What went wrong? When did you get alerted? people launch right into the story that they've been building up in their minds. But when you ask what surprised you, it almost interrupts that. And you go, oh, I didn't think you were gonna ask me that. Okay. I didn't realize that there was this proxy only in production and not in staging, and I learned that the hard way, but now I know.
What actions do you feel good about? Thanks to the negativity bias, our brains love to hold on to the bad, the negative, what went wrong, and the blame and shame that we can put on ourselves internally. So deliberately making space to ask what we feel good about, or what we're proud of, kind of subverts that bias. It reveals our strengths. As Molly and some other speakers mentioned earlier: what could we do more of?
How is this a great example of coordination that other teams could learn from? It doesn't always have to be, here's the long list of things we're doing badly, here are all the PRs to fix the action items, and everything was bad. No. Most of the time, in a lot of orgs I've worked at, as I've done the incident retros, people are really happy about the collaboration, the camaraderie. There's some trauma bonding that can happen on some of these long incidents. It's really important to recognize that the way folks acted during the incident can affect how things unfold, and we should celebrate the wins.
This next one is a play on what surprised you: how did the system respond differently than expected? That framing is not blaming. It's not, why did you push? Why couldn't you run the tests? Why didn't you think about this harder? It's not, yeah, somebody looked at the PR, but it wasn't the right person; why didn't you request it from somebody else? The framing of the question assumes that you were doing your job to the best of your ability, with all the knowledge and skills that you had. You were not trying to take the system down. From your mental model, how did you expect this to happen, and then what actually happened?
I don't think I have time for the story. You can talk to me afterwards about ways that I've used that for incidents that I have caused by my own hand.
And finally, I think something that never seems to come up in the technical write-ups is the organizational and systemic factors that contribute to an incident.
I've yet to see an incident write-up that says, we laid off all the institutional knowledge, and so of course it was gonna take a lot longer. With this question, given that you've cultivated an aura of psychological safety, or perhaps you're doing one-on-one incident debriefs, people can be very honest about the ways that the organization put them in a double bind and exacerbated some of the incident impacts. And again, these are just a few of the incredible open, curious questions that you can ask to elicit learnings versus just focusing on the fix.
So what did we learn today? Facilitation is kind of an awesome power. Yes, Andy uses it for bad, but we can use it for good. The tone you set, how you balance the airtime of the folks who are speaking, and when you pivot are gonna produce the story that the org remembers about the incident.
So, a quick recap. Do a vibe check. Start by replaying the timeline and making sure we're aligning on an aggregate narrative of the sequence as we understand it. Seek multiple perspectives on the same incident. And ask questions that reveal.
People always ask, how do we measure this? What are you really learning from incidents? We learn how the system works, we learn how the system fails, we learn how teams coordinate. All of those things help us respond better to the next incident. They also improve our understanding as we work day to day to build and operate these systems.
So what does this mean for you? The Real Housewives of Reliability has yet to hit the airwaves. Thank God it only exists in my mind. Our retros really shouldn't reflect how reality TV reunions go, but if we harness the power of facilitation, we can really provide some great learnings for ourselves and for the org.
I wanna leave you with some resources, things that have been pretty influential in my understanding of how to approach incident response. The Debriefing Facilitation Guide from Etsy.
If facilitation and running meetings is new to you and a muscle you need to develop, read this guide. Full stop. It covers everything from before the meeting to after the meeting. Great set of questions.
At SREcon, Em Ruppe gave this talk: "What is incident severity, but a lie agreed upon?" Fantastic talk. It covers a little bit of what I've talked about, but from the incident severity angle: when we're declaring an incident, what does this really mean? Spoiler alert: talking to people is good.
And finally, I was so stoked to see Lauren here today. "Your understanding of reality is wrong" blew my mind. If you were not on board when I said we have multiple versions and multiple stories in an incident, and that the incident as I experience it may not be the same as my colleagues experience it, check out Lauren's talk. Very fantastic. He's basically answering: why do bad incidents happen to good engineers? The age-old question.
So if you'd like to continue the conversation, get more resources, or talk to me about your favorite incident, you can find me in all of these places. And with that: "to err is human" is pretty 1711. I think we can update that quote. So, in the words of the Countess: even Louis Vuitton makes mistakes.

San Francisco 2025 Sessions