IT Monitoring: Managing Up — SolarWinds TechPod 042

Hang around with IT folks for anything longer than your typical stand-up meeting and you’re likely to hear some variation of “Managers! Am I right?” Snark aside, as folks who design, deploy, and maintain monitoring solutions, we’ve got a choice: we can continue to dismiss and diminish people who don’t know about our discipline, or we can take the time to educate and inform them. It’s in this spirit that we are building spaces, from THWACK forums to conference talks to this podcast, geared to answering the technical questions that managers (those folks who both lead and represent the boots-on-the-ground monitoring engineers) might have.
Holger Mundt

Guest | Technical Architect — I.Tresor - A Loop1 Company

With almost 17 years of professional IT experience, Holger started his journey as an intern at T-Systems, a subsidiary of Deutsche Telekom, while studying Electrical…
Leon Adato

Host | Head Geek

Leon Adato is a Head Geek™ and technical evangelist at SolarWinds, and is a Cisco® Certified Network Associate (CCNA), MCSE and SolarWinds Certified Professional (he…

Episode Transcript

[Intro Music]

Announcer: This episode of TechPod is brought to you by THWACK.com, the SolarWinds community for IT Pros – that’s where you’ll find the Monitoring for Managers forum, where we look at monitoring from a manager’s point of view. Join the conversation at thwack.com/m4m.

Leon: Hang around with IT folks for anything longer than your typical standup meeting, and you’re likely to hear some variation of, “managers, am I right?” And yet, it’s our managers more than almost any other part of our organization who can make our work a pain or a paradise. Not only do organizations expect our managers to effectively direct our efforts, business leaders look to them to provide both insight and justification that enables us to keep working on the right things in the right way, with the right budget. As folks who design, deploy, and maintain monitoring solutions, we’ve got a choice. We can continue to diminish and dismiss people who don’t know about our discipline, or we can take the time to educate and inform. And that’s what today’s TechPod episode is about, providing the education and context for managers, so they can effectively represent and lead us. With me today to help with that is Holger Mundt, a solutions provider who occupies a unique place in the IT ecosystem, one which gives him a special vantage point on this issue. Thank you so much for joining me today, Holger.

Holger: Thank you, Leon. It’s a pleasure to be here and a great honor.

Leon: Before we dive into our topic, I want to take a minute to do a little bit of shameless self-promotion as I like to call it. If people wanted to connect with you or find out more about you, tell us a little bit about yourself and where they can find you on the Interwebs.

Holger: Sure, so my name’s Holger, as you obviously have seen in the show notes, and I work for a SolarWinds monitoring consultancy. As some people might know, we recently joined the Loop1 family with our little business here in Germany, and we’ve been doing monitoring and management projects for, I think, 10 years now. So if you want to connect with me, you can find me on Twitter and on LinkedIn, and for the German guys who are listening, you can find me on XING.

Leon: Okay. And what we’re going to do is we’re going to have those links in the show notes. So if you’re starting to scribble down any of the links that we talk about in this podcast, stop it, put your hands back on the wheel, pay attention to the road. We will have notes for you. You’ll be able to pick that up. Okay, just a quick safety, safety first as always. To jump into this topic for real, I want to start with a couple of, sort of common themes that you see or hear from managers over the years. What are some just common threads that come up?

Holger: So from my past experience doing different projects with not only the IT guys, but also the managers, the managers tend to somehow want to have a high-level overview of very detailed information. So does it sound like an oxymoron? Yes, it is. So you sometimes have to give them an idea of what you can put in high-level information, and what is detailed information. You can’t have both. So that’s something where we usually try to get everyone at the table and then discuss. That’s, I think, also one of the main points that we should take out of the discussion today: get to the same table, discuss, and discuss with respect for each other. Don’t say, “He doesn’t understand it anyway.” Maybe he doesn’t, but it’s your job to explain it to him or her. And then we will usually find common ground and say, “Okay, I got what you’re saying now.” And I can now rephrase my request for you and tell you, “Okay, this is what I want.” It might have sounded different in my initial request, so you couldn’t fully understand what I meant, but I know now how to phrase it for you.

Holger: So for example, I had one of my clients, they wanted to have a dashboard with a lot of information on it. And I said, “This information is too overwhelming for being on a dashboard. Let’s make it a report that you can find in your Excel sheet. Let’s make it something where you can just have it emailed to your desk. Not for everyone sitting right in front with a bunch of information on it.” So the misconception is that the managers have a desire to see everything. And they usually want to see everything also in detail, but that’s something where you cannot work in a visual or something.

Holger: That’s where you get your big Excel spreadsheets with a lot of numbers and everything in it. So yeah, bring the information together in as few numbers as possible on a dashboard. So have 20 information boxes max on one page. And that’s something where we usually work with the IT engineers that actually have to do it, and the managers, to say, “This is what we want to see.”

Leon: Right, I always love it. It’s like, “Tell me everything, I only have five minutes.” I’m not sure I can do that. So something I want to underscore is, first of all, these kinds of conversations are absolutely going to happen to IT professionals over the course of their career. It’s just the nature of the journey from an initial idea to the way it ultimately looks. And it’s okay to start off in what feels like the wrong place. You know, “There’s no way I can possibly show you all that.” Don’t get frustrated. Don’t get angry at the other person for not immediately and intuitively understanding how the technology works or the information goes together. I think, Holger, what you’re saying is, “Ask clarifying questions that shine a light on some of the difficulties without saying, ‘I can’t do that.'”

Leon: Okay, if we did that, it would look something like this. Continue to frame it. One of the things you said early on just now is that the context is really what we’re here to provide a manager, to help them be able to shape their idea. In many cases, they’re working with a few different business pressures or business goals. And so they’re trying to satisfy as many of those at one time with a single effort. And sometimes that’s possible and sometimes it’s not. And again, you’re trying to get that context also. Oh, what you really care about is this thing, even though you asked for something completely different initially, because you thought those words were meaningful and they weren’t, it’s a process. It’s a journey along the way. So that’s really good information. And I think that that’s helpful also for managers who wonder, “Why, when I present this idea to my engineers, why am I not getting back what I want?”

Leon: Is to remember again, from the manager side, it’s a journey and that you’re presenting your needs, your wants, your conceptions of how it’s supposed to be. And then how it’s going to look. As we were preparing, you had, Holger, you had another sort of broad category of conversation you have with managers, which is almost the opposite of that “show me everything, but I only have one screen on which to show it.” Can you talk a little bit about that?

Holger: Yeah, correct. That’s trying to condense all the information into like one number. Show me a green button if everything works, okay. That could be possible, if the definition of everything works okay would be crystal clear. So, and in some organizations, there are sometimes things that don’t work as expected, and it’s not something bad. It’s just things that happen. So if you pull everything into one number or one color, that color probably would always be yellow warning. So, hmm. Well, something’s always going on in the infrastructure, and depending on the size of the business, if you have just two to three servers and one or two switches, that could be possible that a green button should work for that company. But we usually deal with larger enterprises or larger companies that have more servers. And like I said, something’s going on all the time. If nothing would go on all the time, we wouldn’t need any engineers to fix stuff. That’s also one thing. So yeah.

Holger: There’s that, and yeah, like you said, circling back to the conversation, there’s just one thing that I wanted to add. Always keep in mind that we do sometimes speak different languages. And the good thing for me as a consultant is, I have to be the interpreter for both sides. So I know about the technical stuff. And I can explain to the technical guys, “Hey, this is probably what the manager wants,” if I just get something thrown at me and they say, “Oh, my manager wants that again.”

Holger: And I would say, “Yeah, if you read between the lines, it’s not something that’s not doable. It’s just phrased in a different language.” It’s like, if I would talk German to you, you would say, “What are you talking about?” And if you would talk in some different language, like Spanish, I don’t know Spanish. If you would talk Spanish to me, I would say, “What are you talking about?” So we found the common ground of English as a language, because we both know English.

Leon: I think that that is, first of all, for engineers who are looking for perhaps a way to pivot their career, being that translator, that Rosetta Stone for the business is an incredibly powerful place to be. If you are comfortable doing that, being able to translate technical requirements into business specifications and vice versa. Saying, “This is where the business wants to go. And what that means technically is this other thing.” And we’ve talked about that here on TechPod and on THWACKcamp and other places, a number of times, how powerful that role can be.

Leon: But if you are a manager and you are finding that you have this disconnect with your engineering staff, what you’re looking for is to either find someone who can be the Rosetta Stone for you, or better yet, work with the team to build a vocabulary where there’s an understanding: “Hey, folks, when I say increase revenue, because one of our business goals is increasing revenue, I recognize that monitoring probably isn’t going to increase revenue. So what I want you to think about is this other thing.” Or when I say reduce cost, or whatever it is, so that as a manager you are communicating the business drivers and the business goals and working with the team. Like you said, sit down at the table and build this translation vocabulary of, remember the business is interested in these things. So when I come to you with this type of request, what does that look like technically? How does that work technically?

Leon: And that’s a wonderful point to get us over to another conversation, which is about automation. You had commented, again as we were preparing for this conversation, that monitoring is nothing more and nothing less than the simple, regular collection of data and metrics from a set of devices. That’s all monitoring is. Monitoring is not a beep or a clicky or a blinky. It’s none of those things. It is simply the ongoing collection of data. But when you start doing that, it often leads to a desire to automate. And what are some of your experiences when it comes to managers who want to start to automate things? Where are some of the roadblocks with that?

Holger: Yeah, correct. You introduced the topic very nicely. So when we go in as a monitoring project to some companies, it usually turns out not to be a monitoring project, it turns out to be a business process project. So we sit down with the engineers, and we tell them, like you already said, monitoring is just the collection of data. Then you put some alerting on it, then you put some reporting on it. But what you really want is automation. So, you don’t want to put in everything manually. You don’t want to have a node down alert for a system that sometimes goes down, but it really doesn’t bother you. So this is where we go in, sit down and say, “Hey, okay, let’s work with some values or some tags that we can put on devices that identify them as maybe something that’s not so important or maybe something that belongs to that group. So when that device has some issues, don’t send an alert to the centralized help desk or whatever, send it to that group.”
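The tag-based routing described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Device` class, the tag keys `importance` and `owner_group`, and the queue name `central-helpdesk` are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical fallback queue; the real name depends on your help desk setup.
DEFAULT_QUEUE = "central-helpdesk"


@dataclass
class Device:
    name: str
    # Custom properties (tags) attached to the device; the keys are illustrative.
    tags: dict = field(default_factory=dict)


def route_alert(device: Device, alert: str) -> Optional[str]:
    """Return the queue that should receive the alert, or None to suppress it."""
    # A device tagged as unimportant may go down without notifying anyone.
    if alert == "node_down" and device.tags.get("importance") == "low":
        return None
    # A device that belongs to a group alerts that group, not the central desk.
    return device.tags.get("owner_group", DEFAULT_QUEUE)
```

The point of the sketch is that the routing decision lives in the tags, so adding a device means tagging it rather than writing a new alert rule by hand.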

Holger: And this is where we also sometimes even circle way, way, way back and give guidelines on a naming schema. So when we come in, every node is named differently: one is named like comic books, the other ones are named like stars, and so on. So there are a few creative ways how you can name your devices. So, and so that’s what we find. So I don’t know if you’ve heard of the comic Asterix® and Obelix, it’s by a French?

Leon: Mm-hmm (affirmative), yes.

Holger: Yeah. So there was one server that was Trubadix and the big file server was Obelix. And so yeah, they were creative, but it didn’t help in automating any stuff. So we got back and said, “Okay, so if you want to grow and if you want to have a lot of servers, you need to have a standardized way.”

Holger: And they had different branches and different subsidiaries at different locations. So we brought in UN/LOCODE. So, it’s a naming schema for where your devices are located. And they said, “Oh, we didn’t know that exists. We were always thinking, ‘How should we name that location?'” And by going with UN/LOCODE, you could pull a standardized database and say, “Okay, this device starts with those letters. Look it up. Oh, that must be there.” So we were already able to automate where that device is actually sitting. And then from there we were going on, and the next letters were: is it an application? Is it a whatever device? This is a network device, and so on. You can basically have a lot of different naming conventions, but you need to agree to one. Because if you have 20 naming conventions for your 20 subsidiaries, you’re not able to automate again. And this is something where we give the additional help to our customers.
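A minimal sketch of how such a naming convention lets a script answer "where is this device and what is it?" The name layout `<UN/LOCODE>-<type>-<number>`, the three-letter type codes, and the two-entry lookup table are assumptions made for illustration; the conversation only says that the name starts with the location code and continues with letters for the device type.

```python
import re

# Tiny illustrative excerpt; a real lookup would load the published UN/LOCODE list.
LOCODES = {"DEMUC": "Munich, Germany", "USNYC": "New York, United States"}

# Hypothetical device-type codes; each organization agrees on its own set.
DEVICE_TYPES = {"NET": "network device", "APP": "application server"}

# Assumed layout: 5-character UN/LOCODE, 3-letter type code, 2-digit number.
NAME_PATTERN = re.compile(r"^([A-Z]{2}[A-Z0-9]{3})-([A-Z]{3})-(\d{2})$")


def parse_device_name(name: str) -> dict:
    """Split a device name into location, type, and running number."""
    match = NAME_PATTERN.match(name.upper())
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    locode, type_code, index = match.groups()
    return {
        "location": LOCODES.get(locode, "unknown location"),
        "type": DEVICE_TYPES.get(type_code, "unknown type"),
        "index": int(index),
    }
```

A name such as `Obelix` is rejected with a `ValueError`, which is exactly why creative names get in the way of automation: there is nothing in them for a script to parse.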

Leon: Right, so Bob Lewis, who is a long time IT pundit here in America famously said that, “There are no technology projects. There are only business initiatives that have a technology component to them.” And the reason why I bring that up, that’s exactly what you said. But I want to mention to our listeners that if you’re an IT pro, you need to understand that, that there is never a project that is a technology project. We never do technology for the sake of technology. It is always in service to a business goal or a business need. And if you don’t know what that business need is, that’s a problem for you, because you’re going to end up implementing the wrong tech or the right tech in the wrong way, because you don’t know why you’re doing it.

Leon: The naming convention that you described actually is a really good opportunity for you and me to talk through the way that the technical side of the table and the manager side of the table might negotiate this. And what I mean by that is, you come in as a manager and you say that, “Our server names are all wrong. We need to go to a standard naming convention.” As a technician, as a less experienced technician, I might shoot back at you and say, “Well, that means we have to rename every single one of our servers. This is at least a six month project. And it’s going to completely mess up every single one of our applications, because they all use server name.” So as a manager, where does that leave you?

Holger: That’s where you then go and say, “Okay, we need to set up probably a new project due for our next server refresh that we need anyway, because we want to have support for the machines and so on. But we will then set the standard now that we are going to put over the next decade, whatever, how long it’s going to last, and then slowly rename everything on the go.” I know there’s some servers that have been running for 10 plus years, somewhere out there. But again, this is something that you probably shouldn’t do. And this is also where you put your company at risk and where you say, “Hmm, I have this running for 10 years. It’s been running for 10 years all right, and we’d never had any larger issues, but that large issue is doomed to come. So if you don’t refresh your machines, at one point, they will fail.”

Holger: Also, new machines will fail, but it’s more likely that those old machines will fail. And this is also something where the IT guys then can get to the managers, “Hey, like you said, we don’t want to have new servers for the sake of having new shiny things in our racks. We want to make sure that we can serve the business need, because if that server breaks and we need to restore it, it’s going to put whatever, four hours, eight hours, whatever recovery point and recovery time objective you set. It’s going to be off for that amount of time.”

Holger: How much does that amount of time cost? Well, as the IT guys we probably don’t know, but dear manager, this is something where you can tell us how much downtime is causing the business to lose money. And then we can say, “Okay, the service costs $20,000 when we do a refresh. The value of a six-hour downtime is $80,000.” It’s an easy question, or an easy decision to make. You can say, “Okay, it’s worth a lot more.” And this is where the IT guys then start to understand the managers, that this is the other way round, where, “Okay, if we are down for that amount of time, the company is losing money.”
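The comparison above is simple arithmetic, and writing it down makes the conversation between engineers and managers concrete. The sketch below uses the figures from the conversation ($20,000 for the refresh, $80,000 for a six-hour outage); the function names are illustrative, and the hourly cost is something only the business side can supply.

```python
def downtime_cost(hours_down: float, cost_per_hour: float) -> float:
    """Money the business loses while the service is unavailable."""
    return hours_down * cost_per_hour


def refresh_pays_off(refresh_cost: float, hours_down: float, cost_per_hour: float) -> bool:
    """True when one expected outage costs more than the refresh itself."""
    return downtime_cost(hours_down, cost_per_hour) > refresh_cost


# Figures from the conversation: a six-hour outage valued at $80,000
# against a $20,000 refresh.
hourly_loss = 80_000 / 6
worth_it = refresh_pays_off(20_000, 6, hourly_loss)
```

This ignores how likely the outage is; a fuller model would weight the loss by the probability of failure, which the conversation does not quantify.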

Leon: I also want to interject that I specifically said less experienced IT folks may respond immediately and aggressively with, “It’s going to be this really big project, whatever.” And as a manager you gave the normal appropriate manager response, “Okay, so we’re going to have to do this over the long-term, we’re going to have to.” But here’s the flaw in that. The IT person wasn’t listening carefully to the goal, which is, we need to standardize our names. That doesn’t mean we have to change our names, because we have this magical thing called DNS. And you can have names for your servers, an experienced IT person listening to the need of the manager, which is, “For monitoring or for whatever it is, we need a standard way to refer to these servers.” An experienced IT person will say, “Not a problem. We’ll set up another name record for our machines, and we’ll start referring to them with this new name. And once we have stopped referring by the old name, we’ll phase that name record out. It doesn’t require any change. The server can be known by multiple names and it doesn’t harm anybody. And we could do that this afternoon, boss.”
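One way to "set up another name record" is a DNS alias (CNAME) that points the new standardized name at the existing host. The sketch below only generates the record lines in standard zone-file syntax; the zone `example.com` and both host names are hypothetical, and how the records get loaded depends on the DNS server in use.

```python
def cname_records(aliases: dict[str, str], zone: str) -> list[str]:
    """Build zone-file CNAME lines mapping each new name to its old host name."""
    return [
        f"{new_name}.{zone}. IN CNAME {old_name}.{zone}."
        for new_name, old_name in sorted(aliases.items())
    ]


# Hypothetical example: the file server keeps its old name, and the
# standardized name is added as an alias.
records = cname_records({"demuc-app-01": "obelix"}, "example.com")
```

Nothing on the server itself changes; once nothing refers to the old name any more, the alias can become the primary record and the old name can be retired.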

Leon: And all of a sudden what started off as a six-month dragging out painful, horrible, who wants to be part of this kind of project suddenly becomes an afternoon and it’s a win for the manager, it’s a win for the team. The other thing that you just mentioned a second ago was the value of the automation, the value of the project. Again, I’m encouraging managers to continue to communicate the value to the business of what’s going on, and then ask, “How can we prove that we are bringing value with this automation effort? Can we build in some sort of report or metric? So that, for example, this problem that we have costs us $500 every time it occurs. Because it takes three hours of our staff time and we have to replace this part,” and whatever it is.

Leon: I’m just, I’m coming up with this idea that a particular problem costs $500 every time. But with monitoring we are able to reduce the cost from $500 to $50. So what I want to know—I’m now the manager—what I want to know is, every single time this automation occurs so that I can go back month after month and say, “By the way, we didn’t spend $5,372 this month because my team was able to automate it. And that is savings that we will see in perpetuity, forever.” And that’s something that if the manager is able to communicate, the engineers can provide and the manager can then present that to leadership and say, “Look at the value that this one thing did for you. Wouldn’t you like us to do more of this?” And that’s where budget comes from.

Holger: Correct. And this comes back to your counter for every alert, where you have on the main page, “This alert fired 20 times.” Each time, it was like we saved 450 bucks. So you can easily calculate the value of that alert. And also, not only the value of preventing stuff, also the value of having consistent outcomes, because an automation always gives the same result. So there’s no, “How do I write that? I’m going to write that down this way. I’m going to spell that with one L. I’m going to spell that with double L.” That’s where mistakes happen. Then you can basically just say, “Okay, the automation task will do it for me.” And this is where you have consistent outcomes when you automate stuff. And this is what I also keep trying to tell the monitoring engineers.
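The per-alert counter can be turned into a savings figure with very little code. This is a minimal sketch using the numbers from the conversation ($500 per manual fix, $50 once automated, 20 firings); the alert name and data layout are assumptions for illustration.

```python
def savings_report(fire_counts: dict, costs: dict) -> dict:
    """Return money saved per alert: firings x (manual cost - automated cost)."""
    report = {}
    for alert, count in fire_counts.items():
        manual_cost, automated_cost = costs[alert]
        report[alert] = count * (manual_cost - automated_cost)
    return report


# The alert fired 20 times; each automated run saves 500 - 50 = 450 dollars.
report = savings_report({"disk_full": 20}, {"disk_full": (500, 50)})
# report == {"disk_full": 9000}
```

A monthly version of this report is the kind of number a manager can take to leadership.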

Holger: Automation is not putting away your job, it’s making your job more precise, more consistent, and you will be the owner of the automation. You are the one in charge that this automation will be made better, will be probably put to somewhere else. And this is where you can also bring in your own personal value to the company, with your experience of how to write that automation or how to program that automation so to say.

Leon: Right, to the IT pros who are listening and worrying that they’re going to automate themselves out of a job, I will say, it’s not that nobody has ever automated themselves out of a job, but I will promise you, nobody has ever automated themselves out of a job that they actually wanted to do anyway. Nobody ever said, “I want to wake up at two o’clock in the morning and clear the temp directory every time there’s a disk full alert. That’s my favorite. I want to do that all the time.” Nobody wants to do that. And adding automation says that when you get a disk full alert, the first thing to try is to clear the temp directory. And then if it doesn’t work, then you want to get a human involved. That is, nobody’s going to lose their job over it. Or if they do, they’re going to lose the job that they hated doing anyway.
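The "try the obvious fix first, then involve a human" pattern can be sketched as follows. This is an illustration only: the 90% threshold, the function names, and the `notify` callback are assumptions, and because the cleanup deletes files, it should only ever be pointed at a directory whose contents are safe to lose.

```python
import shutil
from pathlib import Path


def clear_directory(path: Path) -> int:
    """Delete everything inside path (not path itself); return items removed."""
    removed = 0
    for entry in path.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


def handle_disk_full(temp_dir: Path, threshold: float = 0.90, notify=print) -> str:
    """First response to a disk-full alert: clear temp files, then re-check."""
    clear_directory(temp_dir)
    usage = shutil.disk_usage(temp_dir)
    if usage.used / usage.total >= threshold:
        # The automated fix was not enough, so a person takes over from here.
        notify(f"Disk still above {threshold:.0%} after clearing {temp_dir}; escalating")
        return "escalated"
    return "resolved"
```

The human is only woken up when the script reports `escalated`, which keeps the engineer in control of the cases that actually need judgment.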

Leon: The other thing is that I find some IT professionals are leery of automation, because they think of, the best way I can put it is, The Sorcerer’s Apprentice. For those people of a certain nature who remember the movie Fantasia, there’s a sequence set to the music of The Sorcerer’s Apprentice, where the Mickey Mouse character gets the broom to carry the water buckets. And all of a sudden it gets out of control and the entire place floods, and it’s horrible. And I think some IT pros have that vision in the back of their mind that automation run amok is going to destroy everything. And my answer is, “Don’t automate that part.” You are still in control. You need to have good processes, good controls, good testing, all of those things. Don’t automate something that you aren’t comfortable automating.

Leon: You will begin to have a vocabulary of what works easily and what doesn’t, and you can have incredible improvements along the way. The other thing for managers who want to encourage their staff to think about automating is, when you have a monitor collecting statistics and an alert saying, “Hey, something isn’t the way I want it,” ask the team, “What will you do then?” When the team says, “We’re going to monitor, we’re going to get an alert, whether it’s a ticket or an email or whatever it is,” you say, “Okay, you get that alert. What are you going to do?” “Oh, then I just type these three commands in and it fixes it.”

Leon: Well, why don’t you script that, so that you don’t have to wake up and type those three commands? And I’m not saying that is an answer, that is a conversation. And again, as a manager it’s all about continuing to have that conversation, continuing to walk that journey toward more robust solutions.

Holger: One thing to add to that is you said, “Why don’t you automate that?” You can even start smaller than that. So I have a client who put a link to a knowledge base article after each alert. So they have their alerts widget on their start page. And if an alert comes up, a new hire or a junior engineer sees, “Okay, here’s a KB link.” Click on that KB and it tells you what this alert is and how to probably resolve it.
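Attaching a knowledge base link to each alert is about the smallest non-destructive automation there is. A minimal sketch, assuming a hypothetical KB at `kb.example.com` and made-up article IDs:

```python
# Hypothetical knowledge base URL and article IDs, for illustration only.
KB_BASE_URL = "https://kb.example.com/articles"
KB_ARTICLES = {
    "disk_full": "KB-1001",
    "node_down": "KB-1002",
}


def alert_message(alert_name: str, node: str) -> str:
    """Format an alert line and append the matching KB link when one exists."""
    message = f"[{alert_name}] on {node}"
    article = KB_ARTICLES.get(alert_name)
    if article is not None:
        message += f" | How to resolve: {KB_BASE_URL}/{article}"
    return message
```

Alerts without an article still go out unchanged, so the mapping can be filled in gradually as the team writes things down.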

Leon: That’s really nice. And yeah, starting small, starting with something simple. And also non-destructive. I know I talked about automation with the clear the temp queue, and I can see how that could go horribly wrong. But putting a knowledge-base article or adding just a little bit more information is a wonderful starting point to your automation journey. Okay, so I want to hit what I call the lightning round. Do you have any final thoughts? Anything else you want to share from your time sort of straddling these two worlds of management and also technical in the monitoring space?

Holger: There’s one thing where I was talking to a manager and he was talking about the term data governance. You need data governance. And from my background as an archiving and backup consultant, I was thinking, “Hmm, we are now talking about monitoring. What do you mean by data governance in that monitoring or business process project?” And so he was telling me, “Well, you need to put a consistent naming scheme over all applications. You need to have custom properties. If you have them, they need to be named the same in every tool that we use, it needs to be a consistent governed set of data.” And I was like, “Okay, so we need to differentiate the term data governance a little bit.” So on one hand there’s data governance for data at rest. That’s the term that I knew from my background in my archiving days.

Holger: There’s also data governance for data in transit, where you would need to know where’s your data flowing. Where’s it going? Where’s it come from? Where can I store it when it flows somewhere? And then there’s data governance for, in that part, business process relevant data. So if you are tagging something, or if you’re looking for common ways to name it, you need to govern that data. So don’t let one team decide what the custom properties on one instance of a tool are, bring them together. And that’s probably not only a management discussion that could also all the way go up to the leadership. So they say, “I do have all those nice teams who are doing their job very efficiently, very great.” And they also need to work together when they have their own set of tools that when the information comes all the way up to the leadership, it’s named all the same in all different teams. So that’s where that phrase data governance then also made sense in that business process part, not only in the data at rest part that I was familiar with.

Leon: Right. And again, that illustrates the, that word, you keep saying that word, it doesn’t mean what you think. It means that we have terms based on our experience. And those terms are the same words, but they often refer to different things. And as a manager taking a minute to say, “When I say this thing, what I mean is this.”

Leon: As an IT professional hold back on the urge to correct. The way that you know a word, whether that’s log file, which I know people define in at least four different ways, honestly, logging. So if your manager says a word and it doesn’t seem to fit, offer your definition, hear their definition, and then start to work on again that common vocabulary so that you can move forward. It just, it underscores that need for dialogue, for mutual respect that you said at the very beginning. And the idea that this is going to be a journey. This isn’t going to be a single point in time. Holger, I really appreciate you taking time out of your day to share some of these ideas. One more time for everyone who’s listening, where can people find you if they want to see more of what you’re working on?

Holger: They can find me on the Twitter, on LinkedIn and also for the German guys on XING, if you want to connect. And also, if you want, shoot me a message over LinkedIn. And I usually take my time during the week to answer questions, requests, and all stuff, feel free to do that.

Leon: I will also point out that Holger is one of the SolarWinds THWACK MVPs. So you can also find him on thwack.com, where he shares his experience and knowledge generously all the time. All right, Holger, thank you again for coming on to TechPod today.

Holger: Thank you very much for having me here.

[Outro Music]