incident.io Product Showcase

Pete Hamilton, CTO, Lawrence Jones, AI Engineer, and Ed Dean, AI Product Manager run through our new AI SRE product. AI SRE shortens time to resolution by automating investigation, root cause analysis, and a fix, all before you’ve even opened your laptop.

  • Pete Hamilton, Co-founder and CTO, incident.io
  • Lawrence Jones, Principal Engineer, incident.io
  • Ed Dean, Product Manager, AI, incident.io
The transcript below has been generated using AI and may not fully match the audio.
Pete: I am Pete. It's great to meet you all. Fantastic to be at SEV0 on home turf. I was joking at the start of the week that I was really looking forward to doing this presentation not incredibly sleep deprived and jet lagged. I'm not jet lagged; I'm incredibly sleep deprived, as I'm sure many of you are after Monday as well. But I'm here to present some work we've been doing in the product team. I'm really excited to show you what we've been up to alongside Ed and Lawrence. And to contextualize it: I get to stand up here, but these are some of the folks that have been behind everything you're about to see, and I wanna put 'em up there. A lot of them are in the room, so afterwards, whether it goes brilliantly or not brilliantly, they're the people you need to go find.

Before we dig into AI SRE, which I know is what everyone wants to see, I wanna take us on a little bit of a rewind, go down memory lane briefly. I wanna go back to the early days of incident.io. I spent a ton of my career on call, much like many of you, same as Stephen and Chris when we met. And so it was really natural that where we started was the problems we knew: coordination and communication. That eventually grew into a company, and our first product, which we now call Response.

Since we started, we've helped almost 200,000 people solve almost half a million incidents, and I think that's pretty cool. We were feeling pretty pleased with ourselves, but being the fantastic customers that you are, a couple of years in, that wasn't enough. We'd look at the integrations and say: status pages, great; we have Jira, great. But we were getting a lot of pull for: why don't you just do these things? It's great that you have good integrations and you're stitching my tools together, but why aren't you guys owning this end to end? And our reaction was, honestly, you're right. We built out a team. We built status pages.
Since then, nearly three quarters of a billion people have viewed those for some of the most well-known brands in the world. And we've had, I think this number is even a little bit stale, probably after Monday, about 35,000 people sharing with their customers through our platform. So not just managing incidents internally, but externally too. Very cool.

The biggest feature request we've ever had in company history, and a great one to tick off, was our on-call product. I've used many providers, I didn't love many of them, and I think the same was true for a lot of our customers. The biggest request we got was: please, can you also own paging? It just feels like such a natural fit, and we always wanted to do this, but for obvious reasons this is one you really wanna get right. And so in March 2024, after a lot of thought, we launched it. Within the first month, half of our customers had switched over to incident.io, and we've now hit about 72% adoption. So 72% of the people using our platform use our paging product directly. And of all the paging that happens through the platform, bear in mind this is across integrations with all paging providers, including our own, 90% of those pages go through our own product. And I think that's really cool, just a year and a half on.

Along the way, we built out some super strong foundations as well. It's not all shiny products; a lot of it is really important, comprehensive, foundational features. The one I wanted to call out here, and why this slide is up, is Catalog. The idea was, we had a lot of people saying: my incidents are technical, but they're not just technical. We've heard from people earlier today talking about how incidents are about marketing, sales, legal. And so we wanted to bring all of that context into the platform as well. It's great having automations.
It's great having workflows, but if you can't plug those into the other areas of your business, you're hampered in an incident scenario, right? And so we pulled in customer support context from Zendesk and others. We pulled in CRM data from folks like Salesforce and Attio. We brought in your team data, your service data, everything that you need to know about in order to mount a good incident response. We brought that into the platform using the power of Catalog. And the reason that I'm calling that out is that in the world of AI, context is king. It's one of the things I'm gonna reflect back on later: I think that put us in an incredible position over the last six months to really take advantage of the fact that we can see everything that you need in an incident, and we can now apply AI as one of these foundations to that context as well.

There are two reasons that we did all of this, and I promise I'm coming to the demo. One is we wanna help you run a great process. We just wanna make you run incidents better. And that's where we started, and honestly, I think we've done a really good job. I think the team can be really proud of everything we've achieved. We've had loads of amazing feedback. This is the product we started with, but it's not the product we want to end with.

The second thing we always wanted to do is help you fix issues faster. The way we start doing that is helping you run a better response; implicitly, that helps you run a faster incident response. If you can do things more quickly, if you can cut out the manual steps, if you can automate all of the things. We heard from the Picnic team, they built their own version of this; incident.io came from Chris building his own version of this. The thing that we were always a bit bugged by is that it felt like there should be some intelligent, computer-sciencey, machine-learning stuff that we could do here. And that was on the backlog for a very long time.
And then suddenly there's an explosion of LLMs and foundation models, and we go: oh, this is a new piece of technology that perfectly plugs that gap and puts us in a fantastic position to go and actually take a real shot at this. That's what we've been doing for the last six to nine months, and it's what we're about to show you today. What's really cool is, I think what you're about to see is what it looks like when you do both of those things really well. What does it look like when you don't just have point solutions, Claude Code over here and about six other providers over there? What does it look like when you put it all in one end-to-end, AI-native incident management platform? So let's go take a look. I think you've heard me talk enough. Let's go break some things in production.

One thing I'm gonna call out before I do this: everything we're gonna show you today is completely live. We debated: should we do a prerecorded one, lower the risk? As you heard Stephen repeat probably an obnoxious number of times, Pete is going to do a 45-minute live demo, so I couldn't back out now even if I wanted to. And for us, the proof is in the pudding. You don't want to see another prerecorded video, another perfect simulated environment. We're gonna run this in production. I'm gonna literally cause an outage now, at least a little one. Don't worry if you're a customer, it's very isolated. And we're gonna watch as AI SRE helps solve that right alongside me.

So before I do this, I'm gonna just connect my phone. I forgot to do this before, and Chris had to shout at me from the sidelines. And then, in my terminal, which you can't see, I'm gonna fire some dodgy payloads at our production environment. And in a moment my phone, hopefully, is gonna ring.
And then we'll kick off. It's gonna take a minute to hit the relevant threshold and then trigger through all of the plumbing. It'd be a weird world where I failed to break something, as opposed to successfully breaking something, in my career. There you go. Oh god. Oh God. That is, actually, the one from the Chainsmokers that Stephen mentioned earlier. When we launched On-call, they sent us a few samples, and we liked 'em so much we put 'em in the product.

So I'm gonna acknowledge the page. You don't really need to see all of this; it's responding to and acking a page, you've all seen it a million times before. What I want to do is get straight into the Slack channel and figure out what's going on. And the first thing you'll notice is that the moment the alert came in and I was getting paged, the incident channel had already been spun up. A lot of you will have seen that if you've used the product. And AI SRE is already hard at work investigating. It's cranking through all of the stuff that I typically would be. It's looking at the error and the alert it's actually received. It's looking at recent changes and pull requests, going through all of our GitHub repos. It's jumping into our dashboards and our telemetry; we obviously have quite a sophisticated setup there, so there's a lot of information, but it's going through all of it at pace. And it's checking to see if there's anything relevant that anyone's mentioned in one of the, I think, 2,000 Slack channels we have at incident.io. And much like I would be, it's also going: have I ever seen this before? I'm sure this can't be the first time that an incident like this has happened.

And to call it out: I've been doing this a long time, very recently been running a lot of incidents, and I think generally I'm pretty decent at it. But on my best day, fresh in the morning, new brain, coffee in hand, all of the tools at my disposal, none of the logins forgotten, I can't be in 10 tabs at once, right?
I can't go and analyze a thousand data points. I can't register 20 things at the same time. I'm not superhuman. But AI SRE can do all of those things, and it's doing them right here, on the screen, live in front of you. And I think that's really cool. It's using everything it's learning here to come up with an initial hypothesis on what's going on.

I'm gonna jump into desktop, just 'cause it's a better place to view this, and that's probably what I would do at this point anyway. So let's jump over to Slack. We've directed this to its own channel, just so you don't see lots of other spam, and I'm gonna jump straight into the incident channel itself. You can see it's still hard at work. And while it investigates, one thing I wanna call out is, oh, okay, it's even too fast. Great problem to have. I'll jump back to this in just a moment. What's really cool here is that it's still investigating, but it's already got a hypothesis of what's going on. And as Lawrence is gonna show in a minute, just the technology required to even get to this point is super impressive, and I'm so proud that the team have managed to make this work the way it does. So I'd like to pause and give them a very justified round of applause.

Now, while we wait for it to finish its investigation, let's go take a quick look at the thread on the right-hand side. You can see one of the things it's found as it's gone through is that it actually thinks it could fix this through code. And we'll come back to this in just a second. But before I do, I just wanna go through, on the right-hand side, the thread of all the things that it's finding. You can see a more verbose version of the little bullets that you're seeing up on the top left there, and you can see it's leaving a breadcrumb trail.
I think trust is one of the things that is really important with any AI system. And so we wanted to make sure that we weren't just saying "trust me, it's this", the kind of obnoxious senior engineer approach of "just do it", and instead allow you to validate as a human. Ed's gonna show you what this looks like in the dashboard a little bit later.

It's figured out that there's a bug in a recently merged PR, by me, a well-meaning CTO trying to make performance improvements. And it's naturally recommending what we should do: we should fix it. Recommendations are cool, but actual support is better. What if we actually helped you fix the problem? Hence this lovely little message where it said: hey, I think I know what's going on, and I actually think I know what we need to do about it. That plan looks pretty good, and it's offered to fix it. Obviously this is a live demo, so let's kick that off and put it to work.

While I wait for that to chug away in the background and put together a code fix, I'm gonna do a few other things that I would typically do in an incident. I'm just gonna close this down so you can all see better, and I'm actually gonna talk directly to the incident.io platform. Given that it's figured out what the bug is, let's ask it to explain it. And rather than going into laser technical detail, let's do the other end of the spectrum and say: can you explain the bug like I'm five? Hopefully it has enough understanding to do a good job of that. And while it thinks about that, I'm gonna say: can you give me a better idea of the impact this has had on traffic in the past 15 minutes? I hate typing on stage. Did I spell it all right? Epic. If I let it think about that and go back to the bug, let's see what it's come up with. "Imagine you have a box of toy cars", and that will end up, okay, an innovative way to describe an off-by-one error. And if I wanted to, I could continue the conversation. This is not a one-shot thing. Okay.
Now do it like I'm three. I'll give it a second to think about that. And you can see I'm talking to it just like I would with a colleague. Obviously this is a little bit contrived, although I probably do ask my engineers to explain things like I'm three more often than I'd be willing to admit. That's actually pretty accurate: "the computer tried to grab a toy that wasn't there, got confused and said uh-oh". I think "uh-oh" is a great way to describe a major outage.

If I now go take a quick look at telemetry, I can see here that it's taken a look at the last 15 minutes or so, and it's flagged a few different things. Before I get into that, one thing to call out: I haven't told it exactly what I want. I just said, go have a look and tell me what's going on. The team haven't deterministically coded what that even means. It's gone: okay, here are all the dashboards we have, here's what I think they're responsible for, what they do. That probably means this panel is relevant, and probably this time window, because Pete said the last 15 minutes. What graphs might he want? What things would be good to call out?

And you can see it's calling out an increased error rate and a window of impact. If I go down here, we can see, this is me on my laptop just firing bad payloads at production, and we can see the error rate's dramatically increased. It's in UTC, which is why the time zones are offset. Which I think is also pretty cool. And I guess the key here is that I could keep going and poke at various other systems. I can do that proactively by just conversationally chatting with incident.io. But it can also do a lot of this proactively, and that's some of the stuff that we're working on now. It's cool that I can ask and get an answer. It'd be even cooler if it proactively went: I think this would be a relevant thing for you to go take a look at right now.
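To make the "go have a look at telemetry" behaviour concrete, here is a minimal sketch of mapping a natural-language request onto dashboard panels and a time window. Everything in it is an assumption for illustration, not incident.io's implementation: the `Panel` shape and names are invented, and the keyword overlap stands in for the relevance judgment an LLM would actually make from panel metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Panel:
    dashboard: str
    title: str
    description: str  # what the panel's owners say it measures

def select_panels(panels, request_keywords, window_minutes):
    """Pick panels whose title/description mention the request, plus a window.

    A real system would have an LLM score relevance from the panel
    metadata; simple keyword overlap stands in for that here.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(minutes=window_minutes)
    relevant = [
        p for p in panels
        if any(k in (p.title + " " + p.description).lower()
               for k in request_keywords)
    ]
    return relevant, (start, end)

# Hypothetical dashboard inventory.
panels = [
    Panel("api", "HTTP 5xx rate", "errors returned to callers"),
    Panel("api", "p95 latency", "request latency distribution"),
    Panel("billing", "invoices issued", "invoices created per hour"),
]
# "what's the impact on traffic in the past 15 minutes?"
chosen, (start, end) = select_panels(panels, ["error", "traffic"], 15)
```

The interesting design point is that the system, not the responder, decides which panels and which window answer the question; the responder only states intent.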
And so that's some of the stuff that's gonna be coming down the pipe. I imagine the PR at this point is probably imminent; it's chugging away in the background. But one thing I wanna call out is, you could say at this point: Pete, this is great, but couldn't I have done the same thing by just taking the error, lobbing it into Claude Code or Codex or something like that, and basically getting the same thing? On the surface, yes. I think the difference here is that the agent running behind the scenes is not just focused on code. It's contextualizing all of the other things that it found. If I go back up here, there's a ton of information in this thread, and it's gonna know things like: what's the current state of telemetry? How urgent is this? Should I do a quick fix or a comprehensive fix? What's some of the business impact that we might see right now? How is this affecting support queues? Things like that. It's got access to all of this context, and it's using that, as well as knowledge of the bug, to put together the fix. And that just gives it the ability to be way, way more tailored and specific in how it addresses the issue.

Examples here would be: rather than just doing a generic code fix, it could reference a script that Martha ran last week. How would Claude necessarily know about the incident we had last week? We know that information. It could say: I'm gonna make this change, and you'd probably get Lisa to review it, because last time she was the one that got involved too late, and if she'd been involved earlier, we would've had a much better outcome. And it can do other things as well.
Maybe it flags previous incidents and says: hey, not only is this gonna be the thing that fixes it, but importantly, if you don't get this out really rapidly, last time you saw it go wrong it was really bad. And so you probably wanna get this through your hotfix pipeline versus your normal build and deployment pipeline. Which I think is really cool.

Perfect timing: it's come in with the pull request itself. You don't have to trust me; we're gonna go take a look and see what it's come up with. It's explained what it's done: it's fixed a buggy special case. And if I go look at the diff itself, you can see it's taken out some dodgy CTO-trying-to-be-too-clever code written by me, and it's changed a couple of the tests and added some new ones as well. So not only has it fixed the problem, it's actually provably done that. At this point it's probably a better engineer than I've been at certain points in my career, where I would've just YOLO'd in the fix and hoped that it worked. It's actually taken the time to prove it as well, which I think is pretty cool. And again, getting to this point, having a generic code agent that can take all of that context, put together a fix, and then bring that through to the rest of the investigation as well, I think is super, super impressive. Team, you've done a fantastic job. And I'll pause, 'cause I think getting from "oh God, something's gone wrong in production" to a GitHub pull request, without me having to type a single line of code, is pretty epic.

Cool. So let's go back to Slack and do a few other things that I would typically be doing in a situation like this. I'm very busy doing a live demo, so I'm gonna pull in some extra people to do that. But let's just double-check a few things first. What are the upstream dependent features here? I think I've probably broken their stuff. So let's ask that and give it a second to think.
What's interesting here is: how does it know that? I've just asked it what depends on this; I haven't told it exactly how to figure that out. And so what it's gonna do is use all its knowledge of the code base, the code that you just saw. It's gonna figure out how that code relates to all the other code in the system, all the other services. It's gonna go look at our catalog, where all of our teams are stored, our ownership is stored, our services are stored, tons of other information. And it's gonna figure out: is there anything I've likely broken as a result of this? And it's said: smart correlation relies on this, here's the downstream impact, here's who's affected. So it's saying you're safe. The Alert Insights team is my team for the purposes of this incident, so I haven't caused any broader blast radius.

But I still need a pair of hands, so I'm gonna pull in Rory in this case. But imagine I don't know that I need Rory. Imagine all I know is that I need someone from one of my peer teams. So I'm gonna say: incident.io, can you page the Applied AI team? I know it's them; I dunno who's on call; I'm super stressed. Let's just get AI SRE to go figure it out for me. And here again, it's gonna take all the context, this time not just code. It's gonna go and figure out, for that team: do they have an on-call schedule? They have many on-call schedules. Who's on call right now? How do they want us to get hold of them? It's gonna figure all of that out, and then it's gonna do it for me. Didn't touch the keyboard. It's told me Rory's on his way, and at some point, I guess in the audience, or maybe in the breakout room, Rory's phone is going off right now. And in a moment, there we go, he's acked. And at this point AI SRE is letting me know that he's on his way. How many times have you paged someone in some random other service that you haven't looked into in a while?
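The resolution chain Pete describes, team name to on-call schedule to current responder to page, can be sketched in miniature. The `SCHEDULES` structure and every name in it are hypothetical; incident.io's actual catalog and escalation model is far richer, and a real lookup would consult schedules by time, not a static dict.

```python
# Hypothetical catalog data: team -> on-call schedules -> current responder.
SCHEDULES = {
    "applied-ai": [{"schedule": "applied-ai-primary", "oncall": "rory"}],
    "alert-insights": [{"schedule": "insights-primary", "oncall": "pete"}],
}

def page_team(team: str) -> str:
    """Resolve a team name to whoever is on call right now and page them.

    A real platform walks its catalog (teams, ownership, contact
    preferences) and fires an escalation; a dict lookup stands in here.
    """
    schedules = SCHEDULES.get(team)
    if not schedules:
        raise LookupError(f"no on-call schedule for team {team!r}")
    primary = schedules[0]  # treat the first schedule as the primary rotation
    return f"paged {primary['oncall']} via {primary['schedule']}"
```

The point of the demo moment is that the responder never performs this lookup; saying "page the Applied AI team" is enough.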
You have to figure out who it is, and then you don't even know if they're gonna turn up. I actually didn't get any review for this; I got it in 'cause we were prepping for the demo. And I think just even that interaction, right? Just having someone on hand to go and do all of the things that take up headspace, when all you wanna do is get things fixed and back online. This is the stuff that we really want to help you do much better at.

With Rory handling the pull request, I'm gonna pause for a moment and reflect on three things that we've just seen. The first is: rather than getting paged with a cryptic error message, wondering what's going on, where the plan is all on me, it's just me, I've been paged, all my other team members are maybe asleep at this point, and I have to check 20 different things to figure out what on earth is going on, AI SRE just did it all for me, in parallel, way quicker than I could have. And I was able to focus on what I wanted to do next, with that taken care of and in hand. Second, rather than going off to figure out what I need from third-party systems, like telemetry and things like that, AI SRE just did all of that for me, right? I just asked it: what do I need? How are these upstreams affected? Get me someone from the AI team. And it just took care of it. And finally, and I don't want to over-labour this too much, but I think it's really cool: it literally identified the bug. It's a real bug in production that I'm breaking right now. It told me what the problem is and drafted a pull request that is currently being rolled out to production by Rory, and I didn't have to touch a single thing. It's ready to merge, and it even had tests, better than me, right?
And this, I think, is the power of an end-to-end incident management platform. This is what happens when you take all of these different processes and tools and put them behind one really smooth, clear interface, with great foundations, great context, and well-implemented AI.

And I guess the call-out here is: at this point, the last thing I would be doing is a quick interim wrap-up. The blast impact is mitigated, let's say, and I should tell everyone what's going on. You can see here it popped in a minute ago: incident.io has gone, a lot of things have happened; Pete, you should probably be giving an update. And not only has it given me a little nudge, it's told me: if you're happy, this is my best guess at all the things I think have happened so far. And it's referenced, again, all the context, right? It's not just said "here's what I can see in Slack". It's gone: here's the pull request in GitHub; there's no downstream impact or customer-facing issues that have been confirmed yet. It's referenced the telemetry to figure out the impact: it's seen 24,000 panics in production. It's using all of this; I haven't had to go anywhere to figure this out or consolidate it into an update. And if I like it, I can literally just share it. So let's share that, and it now gets shared with everyone. If you're using internal subscriptions, this would've gone out and messaged Stephen; his incident.io app is probably currently showing him this. And likewise, anyone in the channel knows what I know. And again, the key here is not that doing this is part of a good response; the key is that I didn't have to do it, right? It's good, and I didn't have to get involved too heavily there. If I wanted to, I could edit it, but in this case I choose not to.

At this point, one thing to call out before I wrap up the first part of the demo, 'cause we've got three: you've seen this all in Slack.
That's where I've focused; it's the easiest way to demonstrate the capabilities of the system, but it's not restricted to Slack. Everything you've seen, I could also have done in the dashboard, and the dashboard has even richer features and context and all those things. Ed's gonna show you that in a second, which I'm really excited for. But before we do that, I wanna bring Lawrence on stage, one of the engineers who's been working on this very heavily for the past 12 or so months, and I'm gonna get him to talk you through what's going on behind the scenes. I think it can be very easy to look at this and go: how hard could that be to wire together? If I just told Claude to build that, how long would that take? I think the answer is a long time. This stuff is really difficult, and I think any of you that have worked with AI deeply know the arc from "oh my God, this is magic" to "oh my goodness, this is so incredibly difficult". Lawrence has been through the peak and the dip, and he's gonna share with you what's happened under the hood to make everything you just saw possible.

Lawrence: Hi, I'm Lawrence, and I lead AI at incident.io. It's funny Pete's talking about the peaks and the dips; it's a question which one I'm currently in. But anyway, I'm here to talk to you about a bit more of what goes on behind the scenes of the investigation system that you just saw. I think Pete's given you a really good idea of what it is, or what it feels like, as an incident responder to be working with AI SRE inside an incident response channel. But I want to give you a sense of why it's taken us, honestly, about 18 months to get to this place. So I wanna start by saying: building a generic incident response solver is actually a really difficult undertaking, and in order to do it, we've had to build a ton of internal tooling to help us look inside the system and understand what's going on.
So, to give you a sense of everything that goes into what you've just seen, I've recorded one of Pete's demo rehearsals from when he was preparing for this talk, and I've done it from within an internal-only admin dashboard that we use as engineers to debug this system. What I'm gonna do is play back the incident that you've just seen, but this time you'll be able to see, through our internal tooling, what's going on under the hood.

So what you're seeing here is, as I said, an internal, admin-only dashboard that shows you a bit of what's going on underneath the hood of the investigation system. The first thing we do is emulate what a human responder would do, which in this case, when they've been paged for an alert coming from a tool like Sentry, is go to Sentry and pull down the details of the alert. And then they'll go: what system is this? How is this broken? What is actually the nature of the problem here? Now, we have a ton of connections to a bunch of alert sources. So what the investigation is doing over here on the right is reaching out to our Sentry integration and doing exactly what I just said a human would do: pulling down that data and then performing some preliminary analysis, so it has a sense of what's actually happening. And that's the analysis it produces up there on the left.

At this point we've got an analyzed alert, but honestly, this isn't the exciting part, right? A human can do this pretty easily. It's actually the next step where this starts getting pretty exciting, because it's at this point that we start doing things that a human responder just can't do in an incident. While they can look at a Sentry exception pretty quickly, it's at this point that we start searching through thousands of GitHub pull requests.
That's actually pulling them down, looking at the code in those pull requests, and trying to figure out what's gone on. And we're also gonna go through any of the Slack messages, oh, bit of a video issue, it's also gonna go through any of the Slack messages from any of the Slack channels that you've connected into AI SRE. So it'll be looking through all of the messages inside your Slack workspace to figure out: are these GitHub pull requests relevant? Are these Slack messages relevant? Will they help us figure out what's going on inside this incident?

Like I've said before, an exhaustive search like this is just not humanly possible, especially if you're in a time-pressured incident situation where every second counts. With AI it becomes possible, but even then it's still not easy. So what you've got on the right is this trace view of everything that's going on inside the investigation, and what you've got on the left is a trace of each one of the different checks over on the right. You can see that we're not just hitting our index and returning you the results; we're actually fanning out to tens of parallel workers in each one of those checks. Each one of those workers is then looking at the indexes that we maintain continuously, which contain all of your organizational data. It's finding a long list of interesting resources, be they GitHub pull requests or Slack messages, that we think might be relevant. And then we're engaging frontier models to answer genuinely tough questions, things a human might ask if they were to load up this resource, such as: do I actually think that this GitHub pull request might have caused this incident? And if I do, I'm gonna return that as a candidate resource that I think is relevant to you in the incident.
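The fan-out Lawrence describes, many workers each asking a model "could this resource explain the incident?", can be sketched with `asyncio`. This is illustrative only: the candidate data is invented, and the word-overlap check stands in for the frontier-model call, with the semaphore playing the role of the worker cap.

```python
import asyncio

async def judge_relevance(resource: dict, incident_summary: str) -> bool:
    """Stand-in for a frontier-model call answering one tough question:
    could this resource have caused, or help explain, this incident?"""
    await asyncio.sleep(0)  # a real call is a network round trip
    words = set(resource["text"].lower().split())
    return bool(words & set(incident_summary.lower().split()))

async def fan_out(resources, incident_summary, limit=10):
    """Judge every candidate concurrently, capped at `limit` workers."""
    sem = asyncio.Semaphore(limit)

    async def check(r):
        async with sem:
            return r if await judge_relevance(r, incident_summary) else None

    results = await asyncio.gather(*(check(r) for r in resources))
    return [r for r in results if r is not None]

# Hypothetical candidates pulled from the continuously maintained indexes.
candidates = [
    {"id": "pr-812", "text": "refactor alert pagination offset handling"},
    {"id": "pr-813", "text": "update marketing site copy"},
    {"id": "msg-1", "text": "seeing panics in alert pagination since deploy"},
]
relevant = asyncio.run(fan_out(candidates, "panics in alert pagination"))
```

`asyncio.gather` preserves input order, so surviving candidates come back in the order they were indexed, while the semaphore keeps concurrency bounded regardless of how many thousands of candidates the index returns.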
Done by a human, this honestly might take days. But with all of the investment we've put into our search infrastructure, into our prompt tuning, and all of the background indexing that we do, we're actually able to find the relevant Slack message that we think is interesting for this incident, and even the pull request that actually caused this incident, in just 20 seconds. So much faster than you could do as an actual human.

But it's not just external systems; I've spoken about GitHub and Slack and all that stuff. It's actually incident data too, and I think everyone in this room is very aware of how valuable your historic incident data is when it comes to solving your incidents. So it's not just that we're searching for these GitHub pull requests; we're actually looking through all of your historic incidents throughout the entirety of our platform. And we're not just searching for incidents on "does the metadata match, do these have similar custom fields". We're building up an understanding of the dimensions of this incident: what is the type of error we're seeing here? What symptoms do we see in this incident? Even what time of day has this incident happened at? And then we're going through all of our records to find incidents that actually make sense and look like they might be relevant to this one. We then take all of those incidents and all of the data inside them: the Slack messages that you've put in your incident channels, any of the GitHub pull requests that you may have attached to those incidents, even the follow-ups or the post-mortems that have been authored against those incidents.
And we're taking all of those, passing them back through into our system, and compiling what we call an ephemeral runbook, which says: based on what we can see that you've done before, this is what we think you should probably do to try and resolve the incident that you have at hand right now. I wanna make a point here, because as Pete was saying, incidents are not just about the technical or code aspects of things. One of the most important things, and often the things that are forgotten, are process changes. In an incident like this — like a data breach — you need to be contacting your DPO, or it might be that you need to engage with a regulatory body when some threshold is met. Now we can go through all of your historic incidents and we can find these processes, we can find these policies, even if they're implicit and they've never been properly encoded anywhere. If you have a post-mortem where there was a thing that you forgot to do, we'll be loading that into our context too, and when we come around to this incident, we'll be suggesting that you don't make the same mistake that you made last time. So now we've found a bunch of useful data, and it's time to get a report to our responders. This is when we move to what we call the first assessment stage of the investigation. It's at this point that you can see on the right that our scheduler splits, and we've actually sent our long-lived agents — which are code and telemetry — off in the background so that they can continue their investigation while we generate a preliminary report to get to your responders. Now, we've put a bunch of effort into making sure that we can get you a preliminary report really quickly. We think it's really important that as soon as we have any idea of what's going on, we can get that feedback to you immediately. We've been in these incidents before; we know how important it is.
The difference between getting you something in the first couple of minutes or 10 minutes later can actually make a really substantial difference to your response, and that's why we've invested so much in it. It's at this point that we're able to produce the report that we actually send into the Slack channel that you saw in the first iteration of the demo — and it means that you get an investigation report about two minutes after you've first been paged. Now, we actually have the data on this, so I can tell you that's far before most on-callers actually get to their laptops after having been paged. While we've been building our first report, the telemetry and code agents have gone off and continued the investigation on their own separate threads, trying to find more interesting things that can help us in our response. With telemetry, we're going through all of our Grafana dashboards and trying to build a sense of how big the impact of this incident is, what other systems are impacted, maybe what logs are interesting to us as a responder. And with our code agent, we're actually pulling our code repo, going through the code base, and trying to confirm our understanding of what the error actually is. Now, technically, I wanna stress that this is a multi-agent system — I'm not sure how many people are familiar with them, but multi-agent systems are really difficult, and this has probably been one of the most technically hard challenges involved in building a system like this. How do we run agents like this over a long period of time? An incident can span anywhere up to a few days. And how do we have those agents communicate back what they find, as they find it, so that it comes back to the main agent and we can get it to your responders as soon as we know anything that might be of interest to you?
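That "agents report findings as they find them" shape can be sketched with a shared queue: background agents push each finding the moment they have it, and a main agent relays them without waiting for anyone to finish. This is a minimal sketch under invented assumptions — the agent names, findings, and PR number are all hypothetical, and the sleeps stand in for slow dashboard queries and repo reads.

```python
import asyncio

async def telemetry_agent(findings: asyncio.Queue) -> None:
    # Long-lived background agent: pushes each finding immediately,
    # instead of waiting until its whole investigation is done.
    for finding in ["error rate spiked at 06:02", "checkout latency elevated"]:
        await asyncio.sleep(0.01)  # stands in for a slow dashboard/log query
        await findings.put(("telemetry", finding))
    await findings.put(("telemetry", None))  # done marker

async def code_agent(findings: asyncio.Queue) -> None:
    await asyncio.sleep(0.02)  # stands in for cloning and reading the repo
    await findings.put(("code", "PR #4132 changed the query planner hints"))
    await findings.put(("code", None))

async def main_agent() -> list[str]:
    findings: asyncio.Queue = asyncio.Queue()
    workers = [asyncio.create_task(telemetry_agent(findings)),
               asyncio.create_task(code_agent(findings))]
    report, done = [], 0
    while done < len(workers):
        source, finding = await findings.get()
        if finding is None:
            done += 1  # that agent has finished its thread of work
        else:
            report.append(f"[{source}] {finding}")  # relay immediately
    await asyncio.gather(*workers)
    return report

report = asyncio.run(main_agent())
```

In a real incident the workers would run for hours or days rather than milliseconds, but the communication shape is the same.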
It's around now that the code agent has looked through the code base and concluded that it's pretty sure it understands what the problem is — and it's actually gone, "I think I can fix this." So it schedules a code fix and starts planning, looking at your code base, working out what it might do to fix the error that was associated with the incident in the first place. Eventually, when it comes up with a fix, it sends a message just like that to the incident Slack channel, and your responders can then just say, "I'd like to get this in, please" — and hopefully that's the end of the incident for them. At this point, we're just two minutes and 40 seconds after we were first paged, so that's a pretty quick turnaround. Now we've come to the final iteration of the investigation, and we're updating our report to state everything that we've found — and we've actually found a lot in this investigation. In this report, we're gonna reference Lisa's cautionary message about performance improvements like this causing similar problems in the past. We're also going to reference the causing PR, because obviously the code change that Pete merged initially is going to be relevant to us as responders. We've also looked at the Grafana dashboards to try and understand what the impact is, so we're giving you error rates and the interval of the impact. And then we've obviously told you: hey, we have a code fix ready, and you should probably consider merging that. So we've taken everything that a human responder would do when responding to an incident like this, and we've encoded it into AI SRE.
But the thing that's really important here is that, unlike a human responder, AI SRE is able to do a ton of this all in parallel, and it's able to exhaustively search all of the data that we've loaded from your organization — which is something a human can never do — and it's been able to get you a report of everything it's done, and even a fix for the incident, all within about three minutes and 30 seconds of the initial page coming in. In total, this represents about an hour of wall-clock time across all of the work that we've been doing. Okay, so that's how it works. For anyone in the room who's been working with AI — and I imagine that's probably quite a lot of you at this point — you'll know that getting something working is just the start of the story. By far the most difficult work that we've been doing over the last couple of years has been figuring out how we can build a system like this and be confident that when we release it, it is actually working: that when it's in customer accounts, it's doing the right thing, and that when it does the wrong thing, we actually find out and can do something about it. That's why we've had to create an evaluation system for the AI SRE product. We actually have a tool now that produces scorecards for each of our investigations, and those scorecards look something like this. This is a scorecard for one of our investigations, and you can see that it's grading every aspect of the investigation — everything from cost all the way through to the different aspects of the finders that it uses. This allows us to tell how well the investigation has gone without us ever actually looking at the incident data that the investigation has been working on. Now, given the nature of the data in our system, that's really attractive to us.
We have no interest in looking at our customers' incidents, and by building the evaluation system as we have, we can avoid having to go into customer data to figure out what's going wrong. It's these scorecards that tell us how we're performing, and it's these scorecards that allow us to make changes to the system in a way that doesn't cause regressions. It allows us to build a system that's actually going to work for all of our customers, instead of overfitting on any single customer or just a handful. And it's also one of the tools that allows us to pick up on improvements in the AI technology space. For example, whenever there's a new frontier model, we often adopt it within a couple of days of release. That's because we're able to take this system, upgrade it to the new model, run a big backtest against a bunch of past incidents from all of our customer accounts, and see on this scorecard whether or not things have improved or got worse — and if they've got worse, how. So that's the look behind the scenes, and I think it shows you just how much has gone into building a system like this. We've had to solve so many very hard technical challenges to get here. Honestly, this is a multi-agent system — and it's not just a multi-agent system, it's a system dealing with highly ambiguous problems. It's a system that deals with large data sets, and it has some real-time constraints, where we want to give you the information as fast as we possibly can. What I'm trying to say is: it's basically as hard as it gets when it comes down to one of these AI challenges.
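A scorecard-plus-backtest loop like the one Lawrence describes could be sketched as follows. The individual graders here (causing-PR rank, report latency, cost) and their thresholds are invented for illustration — the real scorecards presumably grade many more dimensions — but the shape is the same: score each investigation automatically, then compare mean scores before and after a change such as a model upgrade.

```python
from statistics import mean

def scorecard(run: dict) -> dict:
    # Hypothetical graders: each scores one aspect of an investigation
    # from 0 to 1, without a human ever reading the incident data itself.
    checks = {
        "found_causing_pr": 1.0 if run["causing_pr_rank"] <= 3 else 0.0,
        "report_latency":   1.0 if run["report_seconds"] <= 180 else 0.5,
        "cost":             1.0 if run["cost_usd"] <= 2.0 else 0.0,
    }
    checks["overall"] = mean(checks.values())
    return checks

def backtest(runs_before: list[dict], runs_after: list[dict]) -> float:
    # Compare mean overall score before and after a change (e.g. a new
    # frontier model); positive means the change is an improvement.
    before = mean(scorecard(r)["overall"] for r in runs_before)
    after = mean(scorecard(r)["overall"] for r in runs_after)
    return after - before

good = {"causing_pr_rank": 1, "report_seconds": 120, "cost_usd": 1.0}
bad = {"causing_pr_rank": 10, "report_seconds": 400, "cost_usd": 3.0}
```

Because the graders only look at derived metrics, the same backtest can run across every customer account without anyone opening the underlying incident data.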
But the good thing about this being as hard as it has been to build is that it's taught us a bunch about how to build AI systems like this, and it's also produced a ton of tools that we can now use to reimagine the entirety of the rest of the product that we offer to you. As an example, all of the work that we've done on long-lived agents inside of investigations has been rolled into our chatbot. It's why, when you're using @incident in the chatbot inside of our incident channels, those chatbot experiences can be much more intelligent than you might see from other products — with multi-step plans and exploring data at a very large scale, in ways that you may not have seen before. So I'm extremely proud of what we've built here, and I'm even more excited about what we can do when we take all of the lessons that we've learned from building AI SRE and apply them to the rest of the product. And I'd like to invite Ed up here, who's gonna walk you through the first steps that we've made in those directions.

Ed: Thanks so much, Lawrence. Hey everyone, I'm Ed, the product lead for AI here at incident.io. So the incident that Pete showed you was a classic example of an alert going off, being paged, and then jumping straight into investigating. But as you're probably all far too familiar with, oftentimes you're actually jumping into an incident midway through, and then you're scrambling to figure out what's going on and if there's any way that you can help. So in this next section of the demo, I'm gonna show you exactly what that looks like now, with a couple of examples, in incident.io. To do that, we're gonna start off on my team page in incident.io. You can see there are two incidents here on my team page; I can see everything that's relevant for my team, like incidents, alerts, and who's on call right now. We're gonna go ahead and take a look at Pete's incident first, and once I'm here, I'm gonna head straight over to the investigation.
Here I can see the same investigation and the same work that AI SRE was doing to help Pete investigate — but in this context, it's going to be really useful for helping me get up to speed with what's going on during the incident. Okay, so the first thing to call out is that AI SRE is actually still working on the incident. That's because AI SRE is an ambient AI agent, which means it's constantly working behind the scenes to help you however it can. As new information emerges — like messages in the channel, or alerts, or changes in your telemetry — AI SRE will use that information to continue the investigation and keep its findings as up to date as possible. I can see the report from AI SRE, so that covers what's going on and what caused it, and from here there are tons of ways that I can dig into more detail. For example, if I click on these next steps, I can see a more thorough explanation of what I might wanna consider doing and why, as well as jumping out to the source materials that the recommendation is based on. As I come down the page, I can see findings from AI SRE. These are all of the relevant pieces of context that AI SRE found whilst it was investigating the incident, from which it then builds its hypothesis. From here, I can click in and see as much detail as I need to, as well as — again — going out to the source materials so I can validate the findings themselves. Coming back to the top of the page, the final thing I wanna show you is the reasoning trace from AI SRE. Here I can see the latest summary at the top, but as I scroll down, I can see every step that AI SRE has taken so far to investigate the incident: things like querying telemetry, finding relevant repositories that it can go and analyze, even searching for messages across Slack and in past incidents as well. Okay, so now let's say in this incident it actually looks like Pete and Rory have got everything covered.
So from here I'm gonna head back to my team space, and into that other incident that we saw at the beginning. Cool. As I arrive here, the first thing I can see is that Liz and Tom are actually already on a call for this incident. Calls are obviously super important during incident response, but usually one of two things happens during a call: either nobody's taking notes, so anything that's been said on the call ends up easily forgotten, or somebody is taking notes, but that's pulling tons of time away from when they could be investigating and resolving the issue. So we've solved that with Scribe, which is another AI agent — a real-time note taker that transcribes the call and can share notes, observations, and actions, all in real time. You can see that here with the current topic that Liz and Tom are discussing, and these are also shared in Slack as well. I'm sure you all know that feeling of being on a crowded call where people are jumping in at different times with different context — it can be pretty confusing, and really distracting if you're on that call trying to coordinate the response. But at the same time, as someone coming in midway through, I really want to know if I can help in this situation. Now, with incident.io, I can just ask, without even distracting Liz and Tom. So let's say: based on the recent discussion on the call, is there something I can help with? This is all powered by the same underlying technology that Lawrence just showed you, and it has incredibly rich context from the call, from Slack, and from everything AI SRE has found. Okay, so it actually sounds like in this situation Liz and Tom could do with some support in terms of communication and coordination. So I think the right next step is gonna be jumping into the call and having a quick chat with Liz and Tom. Hey Liz, hey Tom — you are currently live on stage at SEV0.
I'm obviously a little bit busy right now, but how can I help?

Tom: For the purpose of today's demo, that's incredibly convenient.

Ed: Yeah, it's almost like we planned this, Tom.

Tom: We're a little bit busy here as well. If you handle sending comms to the affected customers for us, and update the incident while Liz and I fix things — the incident is looking a bit stale at the moment.

Ed: Yeah, for sure. Before I actually do reach out to customers, though, do we have a timeline on a fix that I could share with them?

Liz: Yeah, for sure. So Tom just approved my PR. I'm about to go ahead and merge it, so it should be in prod in 10 minutes.

Ed: Okay, sounds great. I'll leave you to it then.

Liz: Awesome. Thank you, Ed.

Ed: Thanks Liz, thanks Tom. Bye. Awesome. So as I head back to this page, the first thing you'll see is that Scribe has actually already noted down the conversation that we were just having. Obviously we're on Pete's laptop right now, so it says Pete has been assigned responsibility for comms — that's perfect, that's all covered. And as I come down the page and look at the incident timeline, I can see that Scribe has been pulling out those key moments as well. Obviously, this is gonna be really useful for me when it comes to writing my postmortem later, because those key moments have all been pulled out. Okay, so from now let's actually get into helping Liz and Tom with those things that we mentioned. I can't quite remember what they all were, so let's see if incident.io does. Let's say: create actions for all the points we discussed on the call and assign them to the right people. So first off, I could see in the notes that Liz and Tom were discussing some actions that we should keep track of, but I wasn't actually around on the call to hear them. And it sounded like I could also help with communicating with customers.
So here again, incident.io can pull out information from the call itself and create actions based on what we discussed, with all of the relevant details — things like the assignees on those actions. I think there are some other things we can improve on this incident as well, like this name, which is not particularly helpful. So let's say: can you suggest a better name for this incident? And here incident.io can draft names. It can do updates, set custom fields — essentially everything you might be used to in Response. Perfect, that looks much better. So if I go ahead and accept that, you can see it's able to set fields and change things on the incident directly. Okay, so my action item was to reach out to customers. Before I go ahead and do that, we should assess who is actually impacted. So: which customers has this incident affected? Again, here it can draw context from anywhere in the incident, including the call, or Slack, or anything AI SRE has found so far. And whilst it's thinking away, there are some other bits of information I might wanna pull up about those customers as well. So let's say: search catalog to find the revenue impact for each of those. In this case, this is something that's actually not in the context of the incident whatsoever, but it's able to go and look up those customers in catalog and then find any associated attributes — things like the revenue associated with those customers. Okay, perfect. It looks like we've got nearly 4 million in revenue across those five customers. One final thing I'd like to know from catalog is: who are the relevant CSMs? This is obviously gonna help me, because I can then give a point of contact to those customers as I reach out, in case they have any other questions. Again, this is from catalog, and in this case this information is synced into my catalog through the Salesforce integration, where that context lives. Okay, so I now have lots of useful information in this thread.
I just wanna make sure that this is captured in a structured way too. So I'll say: can you update the affected customers field, please? And as I do that, I'll switch right over to properties, and we should see those custom fields be set in a moment. Obviously it's important for me to track this information in a structured way, so that when it comes to analytics and things like that downstream, it's all here. Awesome — so we can see those five customers marked against the incident. Now, the other thing that's really cool here is we can see the derived custom fields that have been set based on those affected customers as well: the total revenue impact, and the number of customers too. Alright, at this point it looks like we're in a good spot, so I'm gonna hand back over to Pete, who's gonna show you what we do next, once this incident is all closed out and we're ready to review and learn from it.

Pete: Moving on to the final thing that we wanna show you today. Internally we call it magic postmortems — product name TBC — but this is how I feel about it. Before I get into that, just a little bit of context that I think is relevant. The thing that matters here is not "can we write you a postmortem and be done". I think Brian gave a great example earlier: writing these things is incredibly cathartic, but also incredibly good for learning, and we don't wanna deprive anyone of that. Equally, the goal is not "I don't wanna deal with my customers, I don't wanna share context with them, I'm gonna let the AI do all of the work" — that's actually something that I want to do, right? As a leader of my company, as a founder of my business.
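Those derived custom fields Ed mentions could work something like this: values computed from the affected-customers field plus catalog attributes, rather than entered by hand, so they stay correct whenever the underlying field changes. The catalog shape and field names below are hypothetical, just to show the idea.

```python
def derive_fields(incident: dict, catalog: dict) -> dict:
    # Derived custom fields: recomputed from the affected-customers
    # field plus catalog attributes, never set manually.
    customers = incident["affected_customers"]
    return {
        "customer_count": len(customers),
        "total_revenue": sum(catalog[c]["revenue"] for c in customers),
    }

# Illustrative catalog entries (in practice synced from e.g. Salesforce).
catalog = {
    "Acme":   {"revenue": 1_200_000, "csm": "Sam"},
    "Globex": {"revenue": 800_000,   "csm": "Priya"},
}
incident = {"affected_customers": ["Acme", "Globex"]}
fields = derive_fields(incident, catalog)
```

Because the totals are pure functions of the customer list, updating the affected-customers field is enough to keep the analytics downstream consistent.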
But going through all of the tools, going through all the Slack channels, going through all of the support requests, trying to pull all of that together over hours and hours of work into a cohesive narrative — that, I wouldn't mind if someone else took a first stab at. And I think that's where we've started from, principles-wise, for postmortems. What we've managed to build — what the team have put together — genuinely changes the game for me when it comes to how I'm gonna think about and write my postmortems. I'm really excited about it, probably more so than almost anything we've built this year — AI SRE pending, 'cause that's not technically released to everyone yet. But I wanna show you some of what it can do and why I find it so exciting. If whoever's in control of AV could switch back, let's go to an incident that I've plucked out — and let's go, instead of the investigation tab, to the postmortem tab. You'll see here we have a big, very tempting button saying "generate postmortem", promising the world. Let's see in a minute if we can deliver, in glorious — we call this colour Alarmalade, which I think is quite a good brand colour pun. Anyway, I'm gonna click this, and what's gonna happen is the underlying AI technology that Lawrence talked to you about, that powers AI SRE, is gonna get reused to instead go: well, if we can figure out what's going on when everything's on fire, we should be able to do a really good job of figuring out what's going on when it's already happened. And let's see if it can do exactly that. So I'll set it to work and we'll see what it does. And by the way, when this works, I expect rapturous applause from everyone in the audience. With the Americans there was a lot of weeping — I'm not expecting that, but I'll take claps. Oh. So what I love is that I know for a fact in the breakout room we put all the team there, 'cause we wanted to save room for you guys. I imagine they're having a great time right now. So good job, team.
So let's take a look at this. This is not a final postmortem, nor am I gonna pitch that it is, but there are a few things that I wanna call attention to. The first is: yes, it's a document, it's text, yada yada. Let's go look at some of the really cool stuff that's just happened, because this is not just text. If I hover over some of these things, you can see we've managed to enrich this with context from the incident.io platform, right? Your Google Doc isn't going to know about all your users, what roles they played, what teams they're in. We know — and if we get it wrong, we can go adjust that information, and it gets represented live here. Similarly, timestamps. When I was trying to publish the postmortem on our blog, not in this system, it was PDTs and UTCs and BSTs everywhere, and graphs not lining up — such a pain. In here, we know exactly when things happened in your incident, and we can just show them to you in a way that works: it says UTC, I get UTC+1, 'cause that's what matters to me. If Tom was looking at it, he'd be seeing it in PDT, right? These little annoyances, these little friction points — we can build a much, much better purpose-built and tailored experience for writing postmortems through things exactly like this. Same with timestamps and durations: change a timestamp and then have to go everywhere in the document to redo the maths? I don't wanna be doing that when I'm writing a postmortem at one o'clock in the morning when I'm already tired. I can go here, and I can look, and it says: oh, this is the incident duration, and it's made up of two timestamps in the platform — reported and resolved. And these are all features that already exist. If you're a customer, you might already use these things: you have custom durations, you have custom timestamps, and you can have all of that embedded natively in the platform. And like I said a moment ago, if I wanted to, I could go over here and edit those.
Maybe I go and say: rather than starting at six o'clock, I realized it actually started at seven. We have a lot of timestamps. It'll update live, and the duration updates live, and this is how I want to be editing my postmortem: I wanna focus on what actually happened, not changing text to make sure that it reflects that. Other things: obviously, if you mention pull requests, we've done some really nice little UI tweaks there to mention those. If you've used Notion, it's probably quite a familiar UX. And we have other things we can add, so maybe I could just put in: what's the related incident to this one? 5126. So if I wanted to say 126, I can tag incidents as well as people. And again, that's much nicer than a link in a document — you don't even have to go look at it. I can just load it up, and it says it's a minor incident that was closed in production and lasted about a minute and 43 seconds. That's the kind of detail I appreciate. As you've just seen, it's also a totally editable document. I won't labour this too much — you've all seen documents before — but I could say "Hey, SEV0". We have all the things that you'd expect: I can do some nice formatting, strikethrough — I won't do that — say hi to you guys. But I also have access to other features as well. One of those is, off to the right-hand side, the same sort of interface that Ed was showing you just a few moments ago. And what's cool about this is I have everything Ed had, but now I have all the context on both the content and the structure of my postmortem document. So there are even more cool things we can do with that.
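The timestamp behaviour Pete demos — store each instant once, render it in whichever zone the viewer cares about, and derive durations rather than writing them into the text — can be sketched in a few lines. The times and zones below are invented for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Store every incident timestamp once, in UTC.
reported = datetime(2025, 9, 15, 6, 0, tzinfo=timezone.utc)
resolved = datetime(2025, 9, 15, 7, 45, tzinfo=timezone.utc)

def render(ts: datetime, viewer_tz: str) -> str:
    # Render the same instant in the viewer's own zone (BST, PDT, ...).
    return ts.astimezone(ZoneInfo(viewer_tz)).strftime("%H:%M %Z")

# The duration is derived from the two timestamps, so editing either
# one updates the maths everywhere the duration is shown.
duration = resolved - reported
```

This is why changing "started at six" to "started at seven" can ripple through a document for free: nothing downstream stores the duration, it's always recomputed.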
I'm not gonna bore you by asking generic questions on the document, but some of the things that I've found useful are: can you review this for structure and tone, compare it to all the other postmortems we've written, and say if there's anything you think should be added that might be missing? Are there any follow-ups, from all the context that you hold, that are not already listed in this incident? Things like that can be really useful. But I can also ask really specific things. Ed was showing some things around customers earlier, and we've actually seen these drafted into the document naturally. But if I wanted to, I could highlight those — and we've added the ability to just ask AI. I could say: what plans, let's say, are these customers on? That's not context I have in this document, and I think arguably we could even make these tags in the future, right? But for now, let's let the platform answer that for me. It's gonna go and figure out: these look like customers, I think they're in the catalog, and I think we have context on their plans in the catalog. And it's gonna come back and give me a response — I'm gonna give it a second to do that. Cool. So I can see the majority of these are enterprise customers. That tells me something very different than what those names tell me on the surface. It's a slightly contrived example, but as I'm reading this as an exec, I'm like: okay, I vaguely know who those customers are, but I dunno what plans they're on, I dunno how much revenue they're paying us, I dunno if they have any custom support SLAs. I can probe and prod. I haven't had to speak to the team, I haven't had to add comments and bring them in and say "hey, can you clarify this?" — the platform already has all that context. I mentioned follow-ups. I'm not gonna repeat what Ed did and generate a load of actions, although I could do that if I wanted to. If I scroll down — yeah, there we go.
So it's actually taken a few of the follow-ups the team have already tagged — sorry, I skimmed past it; let's not read a postmortem in a live demo — and it hasn't just rendered them as text, it's represented them as structure. It's actually capable of rendering what we're calling rich blocks: follow-ups, timelines, things like that. So it's not just a text-rendering thing — it will draft up "here are the follow-ups you should take, here are all the specific points and key things on the timeline", and if I want to, I can go and edit those. If I fix this in Linear, it's now live-synced in. How many times have you written a document that says "we're gonna do 10 things", and you come and read the document six months later and go: did we do the 10 things? Where are the 10 things? Who has the 10 things? We don't have that problem — they're all here. If someone actioned this right now, it would say that's been done, and I can jump straight into the issue if I want to. On the timeline: maybe I don't want to view it like this, I want it like this, because I know everything that happened. Maybe Steven comes in and says, "can I see the detailed version of this?" The editor here has all the context and all the ability to mutate this document at will. If I wanted to, I could edit by typing — I dunno why I'd do that — but off to the right I can change things here if I want to as well. Really cool. And there's tons more I think we can do here, both on these blocks, the way we render the text, and also the way we allow you to edit and mutate it. All we're trying to do — we're not trying to
change or replace the people writing the postmortem. We're trying to get them from where they are to where they know they want to be, without all the cruft and the admin and the kind of operational typing-away-at-your-computer bullshit that sits in the middle. Final things that I wanna show you. One call-out: so far, this has just been me, but postmortems are an inherently collaborative thing — although I have written my fair share of lonely postmortems in my life, mostly for incidents that I have caused. One of the things that you quite often wanna do is pull someone in. So let's say I've got a question that AI SRE — sorry, the platform — can't answer. Let's do: "response focused on code-level diagnosis of escalation policy save logic". What the F is escalation policy save logic, you might be asking yourself. I'm gonna tag in, for example, Liz, and say: what is this? I'm not sure. I actually think in this case AI SRE and the incident platform were probably able to figure it out, because they have access to our code base, but for now, I'm gonna pull in Liz, and obviously she'll get a notification. You've all seen how comments work: she can come in, she can reply to that. But more than just going back and forth on the document — why do I have to chat to Liz through that? Why can't Liz just come into the document and fix it herself? So one of the really cool things that we've done is take everything you've just seen and make it real-time and multiplayer. I can bring my entire team into this document. For now it'll just be Liz — you see her working away — and I think that's really cool. And if I wanted to, I can invite — I dunno if any of the rest of the team are watching in the backstage room and happen to be on their laptops, but if you want to come jump in, you're very welcome to. Yeah. Okay, they're all watching. That's great. And I think that is, for me, the cherry on the cake. I dunno why I did that.
This is a really bad idea. Okay, cool. So at this point, while I let the team have their fun and mess around and celebrate the fact that the demo worked, I just wanna call out a few things. The first, on a more serious note, is that postmortems like this are such a rich source of context, knowledge, and learning. We've already heard one great talk on how important they are. We published our writeup from Monday's outage, and honestly, we learned a ton — we've already done like 15 things to the platform to make it more stable, some in ways that are really obvious and others that are much more nuanced. And we've read through everyone else's as well, and there are a few things where we're like: oh, that didn't happen to us, but we got lucky. These things are really valuable, and so we wanna see more of them written. And I think the fact that they're so expensive and so complex and difficult to draft — you're hopping between different platforms and tools — stops us doing that. Why don't we have every company writing a postmortem for every incident? Why is that not happening? I think the barrier to entry is one of the main things, and that's what we're trying to tackle here. And then the final thing I wanna call out — it probably goes without saying, but I'm gonna say it anyway — is that everything your team learns, everything you write in these, everything you jot down, all the intricate learnings that maybe didn't surface in the incident and come out in the debrief instead: everything your team learned from that, everything they know, we now know too. The incident.io platform has this context, so it's not just guessing at what it thinks happened. It's going: cool, if the team literally approved this writeup and did all the edits, this must be the canonical version of the truth.
Now we can compare that to what we think was happening in the incident, and it can learn from that, right? And you've completed what we all know is a really important cycle: thing goes boom, fix thing, learn from thing going boom, make sure it never goes boom again. We've managed to bring that cycle into our platform, and this was the final piece of the puzzle that we've been waiting to solve for a very long time.

If we flip back to slides, just a couple of final pieces from me. And actually, I'm gonna let the team in the back room, if you'll oblige me, have a quick round of applause, 'cause I think that was really cool. Thank you.

What's next is then the question. So what are we doing with everything we've just seen? Is this going into production tomorrow? What's the plan? I've already had several customers come and ask me about it. So the first thing is, you don't have to just trust me on everything you've just seen. There are three demo stations out there. They have a similar-ish process where they can break a test environment that I think Chris has put together, and you can literally go and poke and prod this yourself. Go ask the dumb questions, go and ask all the difficult things, go and ask the team how it works under the hood. We'd actively welcome that. It's a real thing; we're not trying to hide it.

The second is, from an AI SRE perspective, this is something we're working super hard on. It's already in the hands of many customers in very early access, and we're getting tons and tons of feedback, but we are really excited to get it into more. So if you're really interested, please do come talk to us. We are trying to keep the initial group very small, because we're gonna go for high-density, high-volume feedback from a small number of people versus trying to navigate 200 customers in an early access group.

The good news, though, is that postmortems is basically ready to go. So if you're a customer, in the next few weeks that's gonna be all yours.
I'm super, super excited to hear what you make of it. There's already a few customers in the audience. We love all the feedback that you've been giving us. We really appreciate it. Please give us more. There are very few opportunities where you have the entire engineering team for a product you use in one place. I've already had six feature requests, so feel free to go nobble the team on that front.

And I guess the last thing that I wanted to leave you guys with is just a couple of reflections. After showing you the best of what we can do, I thought I'd wind back the clock and show you what it looked like when it was me, Steven, and Chris around the kitchen table. So this is about four years ago. The product had some really basic capabilities: it had a Slack bot, you could have an incident, it could have four states that I had picked, and I think a slightly janky status page integration. And this is, I think, like week three of working on this thing. I was very proud of it. I think it looks laughably basic after what we've just shown you.

And if I fast forward that to today, I think it's night and day, right? We have an end-to-end, AI-native incident management platform. We've got AI SRE, we've got postmortems, we've got catalog, we've got status pages, we've got a paging provider. We've got all these incredible tools: insights, which we haven't even talked about once today, workflows, automation, chat. We are building the single place that you need to turn to when things go wrong. And I think we're doing a good job, but we have a long way to go. We are super excited to go ship all the stuff we know we can do next. We wanna get it into as many hands as possible, as fast as possible, and that means all of you. And we're really excited to go and do exactly that.

So hopefully you enjoyed what you've seen here today, a little update from the incident.io product team. I'm super proud of everything the team have done.
I'm super excited to get it in the hands of you guys and hear what you make of it. And that's everything I've got. So thank you very much for listening, and I'm gonna hand over to our next speaker.

London 2025 Sessions