How Data Meets Visualization — SolarWinds TechPod 065

When we think of alerts, we don’t necessarily think of them as things of beauty, until now. Join THWACK® Community Manager Ben Keen and Technical Content Manager Kevin M. Sparenberg in an energetic discussion on taking your data to the next level. Ben and Kevin discuss the importance of making your data appealing to improve the user experience, help make sense of your data, and benefit your end users with a cleaner and prettier experience.
Ben Keen

Host

Ben's love for computers started back in the 8-bit days and grew from there. After joining his current employer, he found himself working as the…
Kevin M. Sparenberg

Guest | Technical Content Manager, Community, SolarWinds

Kevin's first computer was the family TI-99/4A. He's learned computing the best way possible: by fixing his own broken machines. He was a SolarWinds customer…

Episode Transcript

Ben: Hello, and welcome to another episode of SolarWinds TechPod, where we get together and talk about all things tech. My name is Ben Keen, and I am the THWACK community manager and your host for today’s episode. Today’s guest is no stranger to the TechPod audience, Kevin. Say hi, Kevin.

Kevin: Hi, Kevin. Oh, that’s not how it’s supposed to go.

Ben: And our topic today is one that Kevin and I have discussed a few times over the years, and I think it will serve our audience well. Today’s topic is, how can you take your alerts, dashboards, and reports and move them to the next level? A discussion that is often overlooked, making things look good.

Kevin: Okay. Okay. Hold on. Hold on, Ben. Before we go any further, I think I may need a gut check here. So, we’re talking about alerts, dashboards, reports, basically visualization, right?

Ben: Correct.

Kevin: Okay. But this is a podcast, yes?

Ben: That is true.

Kevin: Okay. I guess the real question is how… All right. Let me go over. How are we going to express the importance of making things pretty and visually impressive when this is an audio-only media format?

Ben: Okay, Kevin, that’s a fair question, and I’ll give you that, but this discussion is more about the why of it, not so much the how of it, right? So, I think we can get away with just this audio resource. There are also a bunch of resources available in the show notes that we’ll provide to people they can look at, and I think it is important that we make it clear as to why people should take the time to do these things.

Kevin: Okay. Good enough. So, please continue, and my apologies for interrupting your wonderful introduction.

Ben: It’s okay, Kevin. I’m used to it. Honestly, I was pretty much done with the intro anyway. So, I assume that most of our listeners already know who you are, Kevin, and what you bring to the table, but I was hoping you could give us a quick refresher to any new listeners.

Kevin: Are you trying to distract me by getting me to talk about myself? Because I will certainly do that. For anyone who’s unaware, my name is Kevin Sparenberg. I’m the technical content manager for THWACK, the SolarWinds online community. I have been working with SolarWinds products for more years than I’m comfortable saying. I came from being a customer, I worked for the product management team, I’ve worked on our public demo site, and now I work exclusively with the community, including going out and doing our SolarWinds User Groups, hosting THWACK Tuesday Tips, and a bunch of little tech videos, and being a guest on wonderful podcasts such as this.

Ben: Absolutely. Well, Kevin, I appreciate you taking the time to join me for today’s TechPod, and let’s get into the meat of the conversation, shall we? What’s the first thing you really want to ask yourself when deciding if it’s time to make things pretty, right? When is the time to look at that aging report or at that alert and say, “You know what? This can use a facelift.”

Kevin: Okay, I’m going to say this probably has less to do with the report and the formatting, and for everyone listening, I’m using the word report, but I mean like reports or dashboards or alerts. You don’t necessarily need to care so much about what it looks like. You need to get in the mindset, or even better, talk to the people that are going to be consuming this because this is very much a consumer-driven thing. When we talk about data in the monitoring space, that is really not so much a consumer thing. It is really a reaching out and pulling information from all over the organization.

Kevin: But then, when that gets summarized and rolled up and you build your alerts, your reports, your dashboards, whatever that happens to be, whatever you’re looking at in this very specific scenario, the real question is, who’s consuming that? Is it going to the technician who can go and fix this thing? Is it going to the CEO, the CTO, the CIO? Is it going to just the director of infrastructure? Is it going to just the storage admins?

Kevin: That’s more important, at least as far as I’m concerned, to make sure you get the right data in there first, because there’s nothing more worthless than a 45-page report that you then have to take, stick into Excel or Google Sheets or whatever, and then summarize it some more, and then build in your trend lines, and then do this other thing, and do that other thing, and then put it in a pot and mix it around, put it over a fire at 350 degrees Fahrenheit for four hours before you get out the data you want. That’s the real important thing.

Kevin: So, when I think about pretty, I don’t always necessarily think about colors. That’s different in other areas, but when I think about things like reports and dashboards, it’s really about boiling it down for the receiving audience. That’s the very, very first thing that I ask myself, and thankfully, I can normally go through and talk to the people if I work in an IT organization. If I work with the people on THWACK, I can ask very specifically, “What is the target audience for this thing?”

Kevin: And unfortunately, it’s something a lot of people overlook. It’s not a failing on their part. It’s something that they just don’t always think about. They are so consumed with the data they’re getting in and the monitoring information. And for me, one of my… Is it a failure? It was more of a tripping hazard for me, was thinking my CIO wanted to see everything I can see. I mean, he might, or she might, but the chances are probably not, and until I actually have that in my mindset, I’m not going to build a report, an alert, or a dashboard that’s useful for anybody.

Ben: Yeah. And I think you brought up a good point, Kevin. I just want to reiterate it for the audience. We’re using the word pretty, right? And that word can take on a bunch of different meanings, but I think in our use case, what we’re talking about today when we talk about pretty, it’s not just putting makeup on a pig, or whatever that saying is. But to have a pretty report or a pretty alert or a pretty dashboard goes beyond just, “Oh, is the color scheme right? Am I using the right CSS?” But also, more importantly, to make it pretty, you got to be giving the information that the consumer of the information needs, right? What’s the point of including a list of every single application and its components and how it’s working if that report is going to the CIO or something like that?

Kevin: Worse, going to the network engineer who doesn’t give two craps about what the applications are doing. They’re just caring that the bits move.

Ben: Well, it’s always the network though, isn’t it?

Kevin: Don’t get me started on this, Ben. You know it’s one of my trigger words. But in that vein, where do we start? So, let’s say you’re working with the SolarWinds platform, and you have Orion… excuse me, you have alerts, you have reports, and you have dashboards. You’ve got those three, and they’re mostly the three things we talk about here. Which one, for you, if you are net new and you are not given a specific assignment by any one team, which of those do you personally want to make pretty first?

Ben: Well, honestly, I think in my experience working with the product, it was about making the alert pretty, right? Because if you think about it, Kevin, what is most end users’ interaction with the product, right? I think most of us would say it’s the alerts, whether it’s consumed by an email or an alert system or a ticketing system or something like that. I think most typical end-users reach SolarWinds through the alert. Now, if you’re a monitoring engineer, that answer might be different. You might be saying dashboards. Or if you’re a director, you might be saying reports. But I think as the monitoring engineer, you got to look at the broader audience, right? So, for me, it was alerts. What about you, Kevin?

Kevin: I went a slightly different way in my past life. In my past life, I was the monitoring engineer for a global law firm. I started off with the network, so I built a custom dashboard that only showed the networking stuff so I could “show value” to the networking team, to my peers. And realistically, I wouldn’t do that again. I would go, and in a similar vein, I would concentrate on alerts, because alerts… Number one, if an alert goes to somebody, they got to be able to take some type of action on it, because for me, an alert is something that a computer can’t do and a human has to do. Something is down, they got to go and push a button, or this thing is broken, so they got to go and call a vendor, those kinds of things. It’s not something, “Oh, well, the system can automatically delete some temp files, or can run this script, or can do this.”

Kevin: So, for me, the alerts need to give them just the right amount of information, and if I make it pretty, then people are more apt to read it. There is nothing worse, in my personal opinion, than having a plain text email about a thing being down and not giving any details on it. Like, if I get something that says, “Core router three is down,” and I don’t know what core router three is or how important it is to the infrastructure, or I don’t even know its IP address or its location, what am I doing with it? It’s useless.

Ben: Yeah. And I think when you have that type of alert, that’s when people create that rule to automatically send their alerts to a different folder outside their inbox.

Kevin: Or right into the recycle bin. Yep.

Ben: And then, when there is a bigger issue, the first thing they’re going to come back with and say is, “Oh, I didn’t get an alert.” And then, they’re going to look at the tool, and then you got to show them that the alert went out, and then like, “Oh, yeah, I set that rule up.” And for me, that’s really where I wanted to start with making things pretty. Yeah, it’s nice to be able to introduce like some CSS and stuff like that through the HTML side and add colors and this, that, the other. But the first thing I did was take inventory of my alerts and say, “What’s the story I’m telling?” Right? Because to your point, if you’re sending an alert out, that means somebody somewhere is waking up at three o’clock in the morning, some ungodly hour their time, to do something, so if I’m going to wake you up, I better give you something worth waking up to.

Kevin: Yeah. Because I don’t want the call at 3:00 AM saying, “I got this alert. What’s it mean?”

Ben: And I think once you get your alerts pretty, right, once you know that your alerts are firing… And part of making pretty, too, is also verifying your thresholds, right? Like, you want to make sure that you’re not sending off an alert for a high CPU, but your threshold is 70%, something ridiculous. Making it pretty is also making it effective. I think that’s another key point, especially when we’re speaking about alerts.

Ben: But I would also like to move… Since we kind of nailed down the alerts, right, we got them looking… we checked our thresholds, we made sure the information is looking good. The information is not only looking good, but it’s also accurate, right? Because that’s also important. And then, we added some CSS and HTML. And again, for our listeners, if you’re curious about how this stuff works, there will be some notes at the bottom with this release, as well as some great information on THWACK where you can go ahead and get a good head start. And also, you can actually download some of this stuff and just import it to your system. And that brings me to the next discussion, dashboards. Kevin, you know me. I love modern dashboards.

Kevin: Well, as you should.

Ben: I think everybody should. Absolutely. Since their introduction and their release, I think, we’ve seen SolarWinds really turn the curve, so to say, right? Rather than just being informational and almost bulletin-board-esque, we’re starting to see real good information coming out in a real, good, timely manner. So, let’s talk about modern dashboards, how they differ a little bit from the classic dashboards. Let’s talk about the prettiness. Again, not just talk about the information, but also, what are some of the advantages of the aesthetics of the modern dashboard and how our listeners can use that for their benefit across organizations. So, Kevin, what are your thoughts when it comes to how the modern dashboard feel and user experience is helping our customers provide a better experience for their users?

Kevin: Well, I actually had a three-hour call yesterday with one of our pre-sales engineers, a guy I’ve known for, I don’t know, eight, nine years now, a guy named Will. Will is one of my favorite people to talk to because he has virtually no filter, which makes him fantastic to talk to, but not always great on a podcast if you understand what I’m saying. But he was talking about how we released this at 2020.2, I think, was the first version, so that’s two years ago, roughly. And then, to use a vulgar term, we haven’t really advertised it. We haven’t really pushed. We haven’t marketed it, and we’ve let people kind of discover it on their own.

Kevin: And the problem with that is it’s not that it’s buried. It’s that it’s not the first thing you go for. We don’t force you to use these, and that’s, that’s good and bad, because if you’re not ready for that learning curve, great. That’s fine. Get the rest of your stuff in there, get your discoveries, get your information, get everything monitoring, and then you can always investigate this. The problem is, Ben, you know this as a monitoring engineer, I know this from my previous roles, you don’t always have that time to go back. So, carving out yourself a little bit of time to say, “You know what? Maybe I should look into this. It’s totally worthwhile.”

Kevin: And for me, modern dashboards really are set apart because of their flexibility. And I don’t mean what things you can put on there. That’s fantastic. I don’t mean the types of data that you can put on there. That’s fantastic. I’m talking about the flexibility of the way you can report things in, regardless of the ecosystem it’s coming from. If it comes from storage, it can go in there. If it’s a volume on a node, it can come in there. If it’s interface statistics, it can come in there. If it’s general KPI information, just simple up/down, or alerts, the number of active alerts that are in a critical versus just a warning status, or the ones that are serious and the ones that are just informational… Which, sidebar, informational alerts should not be alerts, but I digress.

Ben: No, they can serve their purpose. Maybe not as an alert, but…

Kevin: Yes, yes. There is a purpose, but they should also not be triggered all the time. Anyway, modern dashboards has this flexibility that you can build it for a specific scenario, and for me, the very first one, I think… the very, very first one I built when I got the new bits and got my micro-training session from the product managers was, I said, “Hmm, what would I have found useful?” And my mind immediately went to network. And then I said, “No, no, no, Kevin, you know better than that. What’s the first thing the people who would consume this would find useful?” And I think that’s when I went back and said, “Let’s take a step back and let’s…” To your discussion, let’s talk about alerts.

Kevin: Alerts are important, and I kind of wanted to get an idea of… In my environment, which was a lab, so it was of course horrendous, but what kind of alerts are triggered? How frequently are they being triggered? Because alert fatigue is something that’s exhausting. I mean, that’s why it has the word fatigue in it, but it’s also one of the biggest things that causes people to shuffle those emails or those team notifications into the “I don’t care to ever look at this again” folder. So I wanted to be able to get a holistic overview of how all of your alerts are working in your system. Now, what’s the oldest alert you have running there? Is that available? Sure. You can get that in the Manage Alerts or in the Active Alerts screen. You can sort by the active time. Not a big deal. But maybe I want to know that, but I want to know some type of aggregate information, like what type of alert is kept open longest. In other words, which team is not closing the deal.
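The aggregate Kevin describes here, which type of alert stays open longest, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the tuple layout, and the function name `average_open_time` are hypothetical, and nothing below is a SolarWinds API or schema.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical sample data: (alert name, triggered at, cleared at or None if still active).
ALERTS = [
    ("Node down", datetime(2024, 1, 1, 3, 0), datetime(2024, 1, 1, 3, 30)),
    ("Node down", datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 10, 10)),
    ("Volume nearly full", datetime(2024, 1, 1, 0, 0), None),
    ("Volume nearly full", datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 12, 0)),
    ("High CPU", datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 5)),
]


def average_open_time(alerts, now):
    """Return (alert name, average open duration) pairs, longest first."""
    totals = defaultdict(lambda: [timedelta(0), 0])
    for name, triggered, cleared in alerts:
        # An alert that has not cleared yet counts as open until "now".
        end = cleared if cleared is not None else now
        totals[name][0] += end - triggered
        totals[name][1] += 1
    averages = {name: total / count for name, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = average_open_time(ALERTS, now=datetime(2024, 1, 3, 0, 0))
for name, duration in ranking:
    print(f"{name}: {duration}")
```

The alert type at the top of the ranking is the one worth a conversation with the receiving team about what the alert is missing.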

Kevin: And the follow-up to that is, “Oh, why is this team not closing it?” That could lead to further discussions about, “Hey, they’re not closing it because they’re not getting the right information in the alert,” and it all ties back together. So for me, modern dashboards offers that little flexibility that you can build it for the audience. And then beyond that, let’s say I do build this alert, right? Build this alert dashboard. I think you, myself, and… was it Jake? Kind of collaborated on building one that we have up on THWACK that people can just download and import.

Kevin: We collaborated on that one, and then I said, “You know what’s really nice about this one, is that if I’ve got someone else…” Let’s say I have a junior admin, or someone specifically from the networking team, right? And they want an alerts dashboard, but just for the networking stuff, they can essentially clone that entire dashboard and make edits, or they can steal the entire widgets off that board and then just put whatever filter in so it just shows their networking stuff. And for me, that’s the real kind of beauty of it, because I can build something almost like a template. It’s not really a template. I don’t want people to get the wrong understanding that it is a template, but basically something that’s really generic, but still shows value, and someone else says, “And that’s nice, but can you make it like this?” And then I can be like, “No. I mean, yes, I am capable of doing that, but how about I turn that around? Let me show you that you can do it.”

Kevin: That’s the flexibility I see with modern dashboards, because when we contrast that with the classic dashboards, or what in the very old parlance were views, those can only be edited by a very, very small subset of people working with the SolarWinds Orion platform. And because of that, you, me, the other admins out there, we were the ones responsible for building that stuff. Is it horrible? Nah, it’s not always horrible, but it’s such an iterative process, because every time we make a change, we got to go out and grab somebody, be like, “Is this what you wanted?” And they’re like, “Eh.” And then you’re like, “All right, let me try again. Is this what you wanted?” “Eh.” “Is this what you wanted?” And now, I can be like, “You know what, Joe Admin, Joe Network Engineer? Here, you can build your own, and you can even steal these ones that I built that I think are going to work pretty well for you and just bring them in.”

Kevin: And that’s just the feature of modern dashboards. That’s not even talking about the way you can tweak things like the KPI widgets or bring in your own time-series information from your PerfStacks or working with the… What is it? I think we just call it the proportional charts, but everyone uses donuts because donuts are superior to pie. And then, of course, generic table resources, which are great. And it’s one of those things that when people get into it, they feel like it’s a horrific learning curve, and I get that. Anything new is always a little bit scary. However, it doesn’t stay that way for very long.

Ben: No. And I can personally attest to that for the listeners. When I first learned about modern dashboards and I upgraded my system to 2020.2, the reason I upgraded was because of modern dashboards. And one of the things going out within the publication was, “Oh, your end users can build their own dashboards, da, da, da, da, da.” And I’m like, “Oh, this is a great time save,” but I didn’t take the time to necessarily learn how that was done. And I think that what came of that was talking to people on THWACK, our discussions, you and I, and just figuring out that I could do this.

Ben: That’s the one thing I would tell people too, when it comes to making things pretty, don’t think you can’t do it. We do have a great content exchange on THWACK. You can download dashboards that were made by myself, Kevin, whoever. And also, if you find something within those dashboards, you can leave a comment saying, “Hey, did you ever think about trying this or changing this?” and we can make another version of it, because again, the other beautiful thing, I think, about modern dashboards is that nothing’s set in stone, right? It’s not one of those things that you hit save and you can no longer edit it.

Ben: But yeah, I think for our listeners, once you get into the modern dashboards, to Kevin’s point, you can make several different sort of widgets where it gives very broad information. Like, you’re talking the up/down status of every single node in your environment, and then, if a team member comes to you, whether it’s networking, security, infrastructure, whatever, insert their department here, you can show them that widget and then show them how to edit it to get it down to where they want it. And again, that goes back to making it pretty, because nobody wants to see a table that is 75 pages long and to find that one particular node that you’re concerned about, you got to page, page, page, page, page.

Kevin: Yeah. And that’s some of the things we’ve been cognizant of. Of course, like all things new in software, modern dashboards in the very first version, like the one you were talking about when you were still a customer, when you were still working outside, and when you were still working with your teams, you wanted to empower these people to make their own. But we knew when it went out the door that it wasn’t what we would consider perfect. In fact, nothing is ever going to be perfect in the software world. You’re going to strive towards that, and you do that through increments, and we’ve had some small updates to modern dashboards. And I say small, but I mean, from the outside, they appear very minor, but from functionality, they are incredibly deep. The ability to define search fields, the ability to put in refreshes, the ability to link it off to other resources.

Kevin: Realistically, these are very, very small things when you look at the visibility of it from kind of the top level. However, it increases the functionality multifold. It gets them so much beyond. Just the search thing, if we go back to the… and I know I’m beating a dead horse here, but if you go back to like a list of all the active alerts, which like I said, you can get elsewhere, but maybe I’ll just put it in a modern dashboard, to have a quick search in there to find just the name of this one thing, maybe you did get a flood of alerts because you didn’t set up your groups right or something like that. And it’s an accident. Can happen. But you’re looking for that one thing, you can just go in there and type it and it’ll sort right to it, so that gets away from the 75 pages of pagination.
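The quick search Kevin mentions boils down to filtering a long list by a typed term instead of paging through it. Here is a minimal Python sketch of that idea, assuming a plain list of rows; the sample rows, their keys, and the function name `search_alerts` are hypothetical and are not how the modern dashboard widget is implemented.

```python
# Hypothetical sample rows; the keys are illustrative, not a SolarWinds schema.
ACTIVE_ALERTS = [
    {"alert": "Node down", "node": "core-router-3", "severity": "critical"},
    {"alert": "High CPU", "node": "app-server-12", "severity": "warning"},
    {"alert": "Volume nearly full", "node": "file-server-2", "severity": "warning"},
]


def search_alerts(alerts, term):
    """Return the rows where any field contains the term, ignoring case."""
    needle = term.strip().lower()
    if not needle:
        # An empty search box shows everything.
        return list(alerts)
    return [
        row for row in alerts
        if any(needle in str(value).lower() for value in row.values())
    ]


matches = search_alerts(ACTIVE_ALERTS, "Router")
for row in matches:
    print(row["node"], "-", row["alert"])
```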

Kevin: Other technical things are, if you’ve used the classic dashboards, again, used to be called views. If you used classic dashboards, they were great. They did the thing, and then you scrolled down, and you’re like, “All right, cool. So this is right there.” And then the page refreshed, and you were back at the top, and it made you want to cry because you’d have to scroll back down. And then you’d be going off to another screen to fix the thing, and then you glance back over for more information, and it would’ve refreshed, and it’s back at the top. And these are the things… Some of that goes into planning. Like, “Can I plan this for whatever screen resolution a majority of the people are going to be working with?” Or if it’s going to be on like an ultra HD screen somewhere, then you can really stretch the res out.

Kevin: The fact that the page doesn’t refresh, the widgets refresh within the page, it sounds really minor. And I know this is not going to be like, “New and improved, whiz-bang” kind of thing, but what it is is it’s a quality of life kind of thing. It’s like, this is a benefit for you as a generalist working with these kinds of things. Like, “Does this make my life better?” Yes. Yes. This thing right here, this makes your life better. And I think that’s kind of what we’ve tried to do with modern dashboards as an overarching theme. It’s like, we’re not telling you can’t get this data in this way. We’re not telling you you can’t do this in a classic dashboard. We’re not telling you you can’t summarize this into one of the little SWQL-based rollups that were very popular in Enterprise Operation Console. We’re not telling you you can’t run your own custom SWQL query and get the stuff dumped out to a micro table or build a report. We’re not telling you you can’t do that.

Kevin: In fact, that, I think, is one of the most powerful things about modern dashboards, because modern dashboards are 100% read-only. There is zero danger in letting people play with them. The only danger is them misinterpreting the data, and that’s where we can step in as the owners. As the owners of the monitoring platform, we can explain to them a little bit, like, “Maybe you don’t understand what this field means. You think you know what it means, and it’s got a decent definition, but it doesn’t really mean that in our specific environment.” Because that’s the one thing about a platform-based solution, is that it’s not necessarily… going to be identical for every single person, for every single environment, for every single organization. What it is going to be is it’s going to be very personalized after you get things set up and installed. And that, for me, is kind of the benefit to the modern dashboards.

Kevin: And some of that stuff’s been inherited across. We’ve taken some of the improvements there and rolled those improvements in other parts of the platform, but modern dashboards just does it so… chef’s kiss that I am like… The number of people that see it when we go out on the road shows or when we go to SWUG or when we go to the industry events, like the Cisco, the Microsoft, and the VMworld, and all that. We go to those industry events and we show them modern dashboards for the first time, and they say, “I didn’t know I could do that.” And I ask them, “Well, which version are you running?” They crack open their laptop, and they VPN in, and they show me, and I scroll to the bottom and be like, “You already can do this. Have you never seen the dropdown in the menu for this? Did you not see the notifications? Did you not see the stuff on THWACK?”

Kevin: And realistically, most of the time, they may have gotten these notifications, but they don’t “see” them. And I’m putting “see” in quotes, which no one can see because again, podcast. They don’t see them because they’re busy with their day-to-day lives. They’re busy with keeping things ship-shape and making sure everything flows in the right direction and making sure that the ship is running true. And I get it. I was in it. You and I both lived that life. But missing things like that, I think it’s important we reiterate them here so that people, even if you don’t have the time, and anyone who’s listening who hasn’t worked with modern dashboards, I want to know why, if at all possible.

Kevin: But anyone who hasn’t, I hope this discussion encourages them to at least try to build one, whether you import one, and actually, that’s probably what I’d recommend. I would recommend you import one from THWACK and just give it a try, just to see, and if you don’t really feel comfortable with that, then maybe just try to build one, even if it’s just basically the same data you’re using somewhere else that you’ve already built in a classic dashboard, just to understand the flow and the flexibility of what you can do with the colors and putting things as a percent scale and all of these things that used to be relegated to portions of the platform that were only for alerts and reports are now all being able to be made visible in the same way.

Ben: I think you brought up an interesting point. So, again, talking about my personal journey within modern dashboards and making things pretty, what I did originally was like, we had this dashboard, and we called it the executive dashboard, and it was really meant for the senior leaders within the organization, but it was just so much information. It was just complete information overload. And what I did to show the value of modern dashboards was I took that information, and I broke it down. I think it ended up being three or four different dashboards, just because there was so… I mean, I think when I first counted, there were almost 70-some widgets on this dash. It was incredibly huge.

Ben: But to your point, I started there. I took something that existed that I was like, “Okay, I can make this better. I can make…” Again, using the word pretty, because we’re using that throughout this episode. “I can make this pretty. I can make this where it’s not just widget, widget, widget, widget, widget, widget. It’s useful, useful, useful, useful.” Because you can load a dashboard with countless widgets, right? I don’t even think… there’s not even a cap, right? I’ve never hit a cap on widgets.

Kevin: I’m sure there is some type of cap, but I’m also sure it’s astronomically high.

Ben: Yeah, me too. But again, as we’re talking about making things pretty, there’s a balance line I think we have to… And let’s talk about this a little bit for a few minutes, Kevin. There’s a balance between good information and too much information, right? Like, if I make an alert, and in my thought process, when I was a monitoring engineer, I wanted alerts, I wanted somebody to pick it up, see it at three o’clock in the morning, say, “Okay, this server at this location is having this issue. Move.” How do you balance what information you want to give versus what information you have? Because let’s face it, the database is chock-full of different metrics, right? So, in your mind, how do you craft that decision of, “Okay, this is the information I want to give, even though I have all of this information available.”

Kevin: Well, that depends, again, on the audience. And for alerts, at least for the first version of them, I almost want to give them too much information. And I know that sounds horrible, and it is a gross oversimplification because there are ideal things. But for an alert, because I want someone to be able to look at that single alert and be able to take an informed action, I need to give them as much of the detail around that as possible. Now, does that mean that, hey, I literally give it the full obituary of this thing and everything? “On June 2nd, it did this, and on April 7th, it did that.” No, probably not. I probably only need to keep it relatively short.

Kevin: But there are key things that we’ve talked about countless times at THWACKcamp episodes, at SWUGs, SolarWinds User Groups. We’ve talked about it in the live casts. We’ve talked about it at SolarWinds Lab. We’ve talked all the time about what makes a good alert, and there is the absolute min bar. There’s a minimum bar in there, and that’s like, name, IP… whatever the device is, and we’re going to go real generic here, so we’re just going to say server, not application, not a port, not a unique… Name, IP address, any descriptive information you have, where… operating system is occasionally helpful, what it’s connected to, and what its primary role is in an organization. What’s that? Five, six pieces of information, basically?

Ben: Mm-hmm.

Kevin: And of course, the alert, what triggered the thing. Duh. I mean, that’s kind of implied, right? So, let’s round up, we’re talking 10 different things I want to include with this, and that’s plenty. Is it overkill? It might be. It legitimately might be for a node down. Maybe I don’t need to know about what specific role it is. I just need to know this is a production sev-one device, and therefore, if it’s down, I got to get it back up and participating. And that’s enough. Maybe that’s enough for that alert.

Kevin: But I always start by going a little too verbose in alerts and then speaking with the teams that receive them and say, “Is there anything else you need, and is there anything in this alert that is extraneous? Basically, am I giving you garbage you automatically know, because… I don’t know, by this IP address that’s on 10.140, if it’s on that network, I automatically know it’s the DMZ, so you don’t need to tell me it’s in the DMZ.” If those are the kind of things that you’re getting back from the people who are receiving these, then take that in stride and work with them on that. Some of that ends up being an iterative process.
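
As a rough illustration of the "minimum bar" and the iterative trimming described above, here is a minimal Python sketch. It is not tied to any SolarWinds API; the `AlertContext` class, the `format_alert` function, the field names, and the sample values are all hypothetical. The idea is to start verbose and let the receiving team name the fields they consider extraneous.

```python
from dataclasses import asdict, dataclass


@dataclass
class AlertContext:
    """Hypothetical container for the fields listed in the conversation."""
    name: str
    ip_address: str
    description: str
    operating_system: str
    connected_to: str
    primary_role: str
    trigger: str  # what fired the alert


# Display order and labels are assumptions; the trigger goes first so the
# reader sees what happened before the supporting detail.
LABELS = {
    "trigger": "Trigger",
    "name": "Node",
    "ip_address": "IP address",
    "description": "Description",
    "operating_system": "Operating system",
    "connected_to": "Connected to",
    "primary_role": "Primary role",
}


def format_alert(ctx, omit=frozenset()):
    """Render one line per field, skipping fields the team asked to drop."""
    values = asdict(ctx)
    return "\n".join(
        f"{label}: {values[key]}"
        for key, label in LABELS.items()
        if key not in omit
    )


if __name__ == "__main__":
    alert = AlertContext(
        name="web01",
        ip_address="10.140.0.5",
        description="Front-end web server",
        operating_system="Linux",
        connected_to="core-sw-01",
        primary_role="Production storefront",
        trigger="Node down",
    )
    # First version: everything. Later version: the team said the
    # connection detail is noise for this alert, so it is omitted.
    print(format_alert(alert))
    print()
    print(format_alert(alert, omit={"connected_to"}))
```

Keeping the omitted fields as a parameter, rather than deleting them from the data, means a cloned alert for a different team can make a different choice without touching the original.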

Kevin: Now, if you want to talk the complete opposite side of the spectrum, like dashboards, then I want as little of that kind of stuff as possible. Legitimately, sometimes all I want is a count, a count of things that are one way or another. If we go in the simple example, up, down, warning, critical, unknown. If we talk the kind of classic five statuses, if that’s all I need to get over… not the best way to phrase this, but to get a health score of my environment off of those five specific metrics, then maybe that’s all I run with on a dashboard. And then, that will encourage people to go further. If they happen to be a network engineer or a monitoring engineer or a sysadmin or a DBA, they’ll want to go further in that for investigation.

Kevin: But their manager, maybe their manager only cares about kind of that general health score or that pie chart or whatever that kind of gives them overall breakdown. And the C-suite probably cares even less. They just kind of want to understand, this is the way… the overall health of our infrastructure is, whether that happens to be on-prem, some type of hybrid, or in the cloud. That’s maybe all they care about because they’re beholden really only to the president and CEO, kind of stuff, and if they need more information, they got whole teams under them to get that information for them, so they can ask that directly.
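
The status counts described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not how any particular dashboard computes its numbers: the five status names come from the conversation, while `summarize`, `percent_up`, and the choice of "percent of nodes up" as a health score are hypothetical.

```python
from collections import Counter

# The "classic five" statuses named in the conversation.
STATUSES = ("up", "down", "warning", "critical", "unknown")


def summarize(node_statuses):
    """Count nodes per status; anything unrecognized is counted as unknown."""
    counts = Counter(
        status if status in STATUSES else "unknown"
        for status in node_statuses
    )
    return {status: counts.get(status, 0) for status in STATUSES}


def percent_up(summary):
    """One possible health score: share of nodes reporting up (an assumption)."""
    total = sum(summary.values())
    if total == 0:
        return 0.0
    return round(100 * summary["up"] / total, 1)


if __name__ == "__main__":
    observed = ["up", "up", "down", "warning", "up"]
    summary = summarize(observed)
    print(summary)
    print(f"{percent_up(summary)}% of nodes are up")
```

A single number like this suits the manager or C-suite view; the per-status breakdown is what an engineer would click through to.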

Ben: Yeah. That’s one thing that I really loved about when I started making things pretty, right? The abilities to show big, bright colors. When I think about if I was that C-suite person, whether it’s the CEO or CTO or insert chief-level officer here, all I care about is seeing a big green dot. Green means good. Green means money. Green means everything is happy. I wouldn’t necessarily care that the one domain controller’s running slow. Can people log in? Can we do our job? Can we make money? Yes? Cool. But again, to your point, as you go down the decision tree, you care more and more, right? You start at 30,000 foot, and you come all the way down to your ground level, right? To me, you, where something goes wrong, we’re the ones fixing it.

Ben: One of the last things I want to talk about has always been kind of a personal thing for me, but it also falls into the purview of the conversation. When you make things pretty, right, you have all the right information. You’re providing good context around what’s happening. They’re appealing to the eye because again, that is also part of making it pretty. What’s going to happen is you’re going to notice an uptick of attention, right? Because now people have stopped that rule sending everything to the trash can because they’re seeing a value-add by this alert, but then they start questioning the validity.

Kevin: Of course.

Ben: “Well, that alert, ba, da, da, da.” And the one phrase that just kind of irks me is “false alert.”

Kevin: Oh, you mean false positive.

Ben: Yes. Because to me, there’s no such thing. What it means is even though it may not be a task for… Say a router goes down, right? Or an alert shoots that says, “Hey, this router’s down,” or, “This server’s running at a high CPU.” That’s a better example, high CPU. And they say, “Well, that’s how it always runs. That’s a SQL server. It always runs… Chews everything up, blah, blah, blah, blah.” So, that’s telling the database team it’s a false positive.

Kevin: Yeah. But it’s not, because the data says it’s violating these thresholds you have in place.

Ben: Correct. And I think that’s what a lot of people when you start making these things pretty, you got to go back and you start… You as the monitoring engineer, right, it’s not a false positive, because there’s an action for you. You need to start that conversation and say, “Well, here’s the thresholds that we’ve exceeded.” Right? If you’re telling me how this is normal, then what’s abnormal?

Kevin: Yeah. And some of that goes into being able to use baselines. It’s one of the things… I’ve said it before, and I don’t know how recently I’ve said it. I’ve come up with this theory that regardless of your NMS, and obviously we’d prefer you to use the SolarWinds Orion platform or the SolarWinds platform solutions, but regardless of your NMS, I think what you really should be doing is just kind of collecting data for like a week or two, preferably two weeks so you can understand some kind of upsy-downsies and get some information, and don’t actually trigger at least notification alerts, like things that go out and email people or send something through a Slack channel or create a ticket. Don’t do that.

Kevin: Everything seems to come back with me to the same stories. Same thing I did when I worked retail. This is going way, way, way back, and I was brought into the new position, and I was working at a new store. For six weeks, I did nothing. I did basically the minimum. I came in as a manager. I worked the bare minimum, whatever I was supposed to do to do the thing, and all I did was observe. Figure out what the organization’s baselines are. In that case, figure out what the store’s baselines, how long the checkout lines are, how efficient the people were. Did we have a lot of people who were taking a little extra time on their breaks?

Kevin: It sounds minuscule, but being able to kind of step back and observe that week or two in a monitoring system, because obviously in retail, things move slower than collecting millions of metrics every hour, there is somebody who can go and look at that information after you’ve collected some and determine what is normal. And of course, the rule change, if you happen to work… Yeah. Ben’s chuckling, because Ben knows… he worked for a retail company in the IT organization, and he knows that the last month of the year was the crazy time, and that meant if you were recording stuff in December, and you worked in online retail sales, then you know December’s not really a good baseline for the other 11 months of the year. I mean, can you use it as the worst? Like, the top end of that curve? Yeah, of course, you can. But understanding a baseline on this is really helpful.

Kevin: And to your discussion, talking about if we had the SQL admins in there, and we’re like, “Hey, this, the CPU’s running at 90% all the time. Is that threshold right?” And they’ll be like, “Well, no, it’s not. It’s always going to run hot. Normally it runs around 83, 84, but then occasionally it’ll spike and it’ll stay up there for a half-hour or an hour, depending on if it’s doing its backups, or if it’s doing its routine maintenance, or if it’s doing these other things.” And you’re like, “OK, cool.” And for me, that’s where that little checkbox that says “Use baselines” is amazing. And that ties into a larger alert discussion where people are always complaining, and it’s not their fault. I don’t want to say they were misled, but they weren’t specifically told the alerts that any NMS, ours included, that ship out with the product are meant to be templates. They’re meant to be examples of what the platform can do. They are not meant to necessarily go into production.
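
To make the baseline idea above concrete, here is a minimal Python sketch that derives thresholds from collected samples. It does not describe how the "Use baselines" checkbox works inside any product; the `baseline_thresholds` function, the mean-plus-standard-deviations rule, and the sigma multipliers are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def baseline_thresholds(samples, warn_sigmas=2.0, crit_sigmas=3.0, ceiling=100.0):
    """Derive warning/critical thresholds from observed samples.

    Uses mean + N standard deviations, capped at `ceiling` (100 for a
    percentage metric). The multipliers are illustrative defaults.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compute a baseline")
    avg = mean(samples)
    spread = stdev(samples)
    return {
        "mean": avg,
        "warning": min(avg + warn_sigmas * spread, ceiling),
        "critical": min(avg + crit_sigmas * spread, ceiling),
    }


if __name__ == "__main__":
    # Hypothetical CPU readings for a server that "always runs hot".
    cpu_samples = [82, 83, 84, 83, 85, 84, 83, 82]
    thresholds = baseline_thresholds(cpu_samples)
    print(thresholds)
    # A static 80% threshold sits below the observed mean, so it would
    # fire constantly; the derived thresholds sit above normal behavior.
    print("static 80% always firing:", 80 < thresholds["mean"])
```

As the retail example in the conversation suggests, the sample window matters: a baseline built only from December traffic describes the top of the curve, not the other 11 months.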

Ben: Correct. Yeah.

Kevin: Which is why, years ago, and I’m talking it’s probably been seven, eight years with the SolarWinds platform, we’ve got that duplicate and edit button. So, literally, you can take this entire thing, duplicate it, edit it, turn the original one off, and then… Phrase that one of the Head Geeks used all the time was, “Salt to taste,” which is maybe this particular type of alert doesn’t apply to all nodes. It only applies to, I don’t know, Cisco nodes, and it doesn’t even apply to all Cisco things. It only applies to Cisco routers.

Kevin: And similarly, naming your alerts something logical and actually giving them a description. The number one thing that IT people always seem to forget, I am 100% culpable in this, is putting in a good description. Because I’m like, “I’ll remember what this was for.” And it works great if I go back to it tomorrow or if I go back to it next week. If I go back to that thing in 90 days, six months, a year, I honestly go to myself and say, “What was past Kevin thinking when he did this?”

Kevin: So, filling out things like that, although it doesn’t necessarily go to the pretty, it can. That’s the one thing people kind of omit a little bit in this when they think about the alerts, is that you can include the alert description in there. Even if it’s all the way at the bottom of the email, so it’s not really seen, and people don’t skip by… what phrase do we use? It’s below the fold. An old newspaper analogy, but it was below the fold, but that still means it’s in that email.

Kevin: So, when people bring up that email and say, “This one’s garbage,” you can read the description literally in the email that was sent and say, “Oh, well, yeah, I’m sorry, this one is garbage for you and for your scenario, but this is the exact name of the alert, and this is the exact description of the alert. What I’m going to do is I’m going to clone a copy of this alert, change these couple settings, and I’ll change the description. I’m going to put you on that one and take you off the old one. How’s that?” And that’s how you could have that iterative journey with those people.

Ben: Yeah. And it’s almost like when you start doing coding or even some long SolarWinds Query Language queries, it’s always good to leave some comments as to, “Oh, this mean…” Like, if you put where, “Status equals zero,” well, you and I know zero is offline, but if you bring in a junior or someone, you leave the organization and someone takes over your position, they may not know that zero means off. They may think that zero is good and one is bad. So, again, that makes things pretty. It may not be customer-facing pretty, but trust me, as the admin, you’re going to love the fact that someone took the extra couple seconds to put a comment saying, “Zero equals down, one equals up, 14 is unmanaged,” you know?
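
The point about documenting what a bare status number means can be sketched in Python with named constants. The numeric codes below are placeholders taken from the spoken example only; they are not verified product values, and real codes should be checked against the platform's own documentation. `NodeStatus`, `is_down`, and `describe` are hypothetical names.

```python
from enum import IntEnum


class NodeStatus(IntEnum):
    """Named status codes so readers never have to guess what 0 or 1 means.

    Placeholder values from the conversation's example, NOT verified
    against any product's real status table.
    """
    DOWN = 0
    UP = 1


def is_down(status_code):
    """Reads as intent ("is it down?") instead of a bare 'status == 0'."""
    return status_code == NodeStatus.DOWN


def describe(status_code):
    """Translate a numeric code into a readable label for reports or logs."""
    try:
        return NodeStatus(status_code).name.lower()
    except ValueError:
        return f"unrecognized status code {status_code}"


if __name__ == "__main__":
    for code in (0, 1, 99):
        print(code, "->", describe(code))
```

Whether the explanation lives in a named constant or in a comment next to the query, the goal is the same: the next admin should not have to guess which number is good and which is bad.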

Kevin: Yeah. Whatever it happens to be.

Ben: Whatever it is. And I know we’ve been talking a lot about alerts and dashboards, but for our listeners, I just want you to know that a lot of this also comes down to reports. When you make your reports, you want to make sure that everything we talked about. Is the data accurate? Are you giving the right information? Is a…

Kevin: Are you summarizing it in the right way for the intended audience? And that’s the problem, and with reports, is that it’s easy to understand alerts. Alerts are for the technician. It’s easy to understand dashboards. Dashboards are for the C-suite or management. Reports fall in kind of a gray area that is not in between. It’s all-encompassing. And honestly, I am garbage at determining what reports are good for what teams and the only way I ever got any kind of forward motion with it was to literally drag them into a room and say, “Let’s whiteboard out exactly what you want to see on here because if I take a guess, it’s probably going to be wrong.”

Ben: Or it’ll be the Kevin Report.

Kevin: It will be the Kevin Report, which will be 100% accurate based on the data I have and completely un-informational for you. And see, but once I have that kind of thing, it’d be like, okay, let’s say you want a report that has a chart here, and this here, and a table with these filters, and a table over here with these filters, and you just want these things over there. All right, cool. That’s what you want, and you want it 900 bytes wide or pixels wide so you can print it out and it doesn’t look like garbage. Okay, cool. Number one, why are you printing? That’s a different discussion. But let’s say you want that. And then you get to have that follow-up conversation and be like, “Did you want me to schedule this for you?” And they’re like, “What?” I was like, “I can literally make this a weekly report so that every Sunday at midnight, you have the previous week’s blank information.” And then, “Oh, did that not work? Did you want it monthly? I can have it every month. Did you want one of these in your mailbox every morning? I can do that, too.”

Kevin: That’s the thing people seem to overlook, is some of the… I’m hesitant to use the word automation, but it’s… probably report delivery is a better way to phrase it. People don’t have to go into the system to see these reports. People can just get them waiting for them when they start their days. I think I still get some from Database Performance Analyzer once a week about what is outside the norm from the demo lab where I used to work because you know what? It works fine, and it’s incredibly helpful, actually, to have that information.

Ben: Yeah, absolutely. The one thing I’m hoping that people take away from listening to this podcast is that you understand now why it’s so important not just to make things functional, but to make things look good, right? Yeah, we get that ticket to make this alert, and we want to get it out as fast as possible, because we know there’s 900 other requests coming behind it. But trust me, if you’re new to the monitoring space, or if you’ve been in the monitoring space for years, and you haven’t quite made the switch to this type of thought process, understand that you’re going to be saving yourself so much time down the road, because you’re not going to have the email conversation back and forth of, “Oh, change this. Change that. Change this.”

Ben: To your point, Kevin, sometimes it’s worth saying, “Hey, look, pause. Let’s get into a meeting space, whether it’s virtual or in person, boot up a whiteboard. Let’s get this baby drawn out.” Because typically, that 30-minute meeting will save you hours, countless hours, of back and forth. Kevin, I really want to start wrapping things up, because we could talk at length for days upon this subject, I think.

Kevin: I think we actually have before, so I think this is probably a good time to wrap, yeah.

Ben: Yeah. So, what I will tell people, just to wrap this up, if this is something you are interested in, or if you’ve tried it and it hasn’t quite worked out the way you wanted it to, or if you’re just getting started, be sure to log in to thwack.solarwinds.com, come into our discussions, ask us questions. You can find myself. I am the_ben_keen. Kevin, you want to plug your THWACK ID?

Kevin: Sure. I’m KMSigma, K-M-S-I-G-M-A, on pretty much all the things social and work-related. If you go to THWACK, check out the Content Exchange. Even if you don’t download anything from the Content Exchange, it’s a great place to see ideas. It’s a great place to just look and be like, “Someone else did this thing. That’s a really clever way of doing that. Maybe I could do something like that.” And then, if you have that, you can be like, “Hey, I have a request. How do I do this thing?” And that’s when you can jump over to the Report Lab or the Alert Lab and ask the questions in there. There are tons of people there willing to help. Ben’s there every single day. I’m there every single day. Our THWACK MVPs are in there probably almost more than we are, but everyone there is in the same process. They’re trying to do better with the solutions they already have because if there’s one resource you can never get back, it’s time.

Ben: Absolutely.

Kevin: You can always ask your company for more money. You can always ask for more bodies and resources for it, but you can never get back the time you’ve lost, and let’s be 100% transparent here, your company is not paying you for your time. They’re paying you for a slice of your life, so what you’ve technically done is lost a little bit of your life in this process.

Ben: Absolutely. And again, to the listener, come to THWACK if you have your questions. There’s over 185,000 modern professionals. We’re actually getting pretty close to hitting the 200,000 mark here, hopefully by the end of next year. It’d be great. But your questions will be answered quickly. No question is too small. No question is too large, so come to us with your question. The community is great. It’s all about interacting. It’s not about passing judgment or anything like that. It’s about helping people achieve their greatness, because… Great example, Kevin helped me make my first modern dashboard.

Kevin: You’re welcome.

Ben: And then, I took what he… Very much. Thank you. And then, I took that information, and I started developing my own, and now I’m helping other people build theirs. So, it’s all about helping others, and I think that’s the best thing about my job, right? I love this community manager’s job because I help other people. It’s so much fun. I get to interact with so many great people across the globe. Kevin, I’ll include you in that. Yeah, you know. But thanks again for taking the time to listen to SolarWinds TechPod. If you have any questions, by all means, drop us a message, and we’ll see you next time.

Kevin: Thanks again.