Exploring APM “Cloud Confessions” With Adam Bertram — SolarWinds TechPod 024

The SolarWinds® Cloud Confessions 2020 survey explores how extensively technology professionals are using APM tools, whether on-premises or for SaaS-based application management, and how they monitor these environments. This year’s survey shows most tech pros are using APM tools in a troubleshooting capacity and not as part of proactive APM strategies to achieve optimization. Head Geek Leon Adato and guest Adam “the Automator” Bertram discuss what’s holding companies, teams, and individual contributors back from integrating their APM solutions into their environments.

This episode is brought to you by the SolarWinds APM Suite. Simplify your full-stack monitoring for hybrid, Azure, AWS, and other cloud-native IT environments and gain valuable performance insights across user experience, applications, services, and infrastructure.

Adam Bertram

Guest

Adam “the Automator” Bertram is a Microsoft MVP, content writer, and trainer who blogs at adamtheautomator.com and is the author of PowerShell for SysAdmins.
Leon Adato

Host

Leon Adato is a Head Geek™ and technical evangelist at SolarWinds, and is a Cisco® Certified Network Associate (CCNA), MCSE, and SolarWinds Certified Professional.

Episode Transcript

Leon: The SolarWinds Cloud Confessions 2020 survey explores how extensively technology professionals are using APM tools, whether on-premises or for SaaS-based application management, and how they monitor these environments. It also showcases areas where tech pros feel confident in their APM tools and strategies, as well as the challenges they face and avenues for building confidence. This year’s survey shows that most tech pros are using APM tools in a troubleshooting capacity and not as part of proactive APM strategies to achieve optimization. It’s easy to say APM solutions must be more than reactive troubleshooting tools, but it’s far more challenging to determine what’s holding companies, teams, and individual contributors back from achieving it. I’m Leon Adato, and with me today to discuss potential solutions and explain the opportunities of fully leveraging APM across the entire application stack is Adam Bertram, known to his friends and internet groupies as Adam the Automator. Welcome to TechPod.

Adam: Hello. Thanks for having me.

Leon: Adam, we like to start off TechPod with a little bit of shameless self-promotion. This is where you get to tell folks where they can find you on the interwebs, or projects that you’re working on, or things that they can keep their eye out for. So, tell us a little bit about yourself.

Adam: Yeah. So, as you said, I’m Adam Bertram. I’m a Microsoft MVP, online business pro, I guess you could say, content writer, and trainer, and I do a lot of technical writing, especially on the interwebs. You can find me mostly at adamtheautomator.com, which is my blog. I also recently released my first published book, PowerShell for SysAdmins. You can find that at powershellforsysadmins.com.

Leon: Nice. I’m going to have to pick that up. I know that a bunch of the members of our audience are really into PowerShell and I think that that’s going to be really interesting to them. Just to round things out, I’m Leon Adato. I’m a Head Geek™ for SolarWinds. Yes, Head Geek is actually my job title. You can find me on the Twitters, as the youngins like to say, @Leonadato. You can also find me on THWACK.com, that’s the SolarWinds user forum, about 150,000 folks all hanging out talking about whatever tickles your fancy. You can find me there @adatoLE, that’s my last name and first two initials. And if you’re scribbling any of that down, stop it. Just sit back, and relax, and listen to this because we’re going to have links and references on the episode page. As I mentioned in the intro, a lot of this conversation is inspired by the 2020 Cloud Confessions report.

Leon: You can find that at solarwinds.com/cloudconfessions, and you might want to check that out because there’s a lot of statistics, and graphs, and charts showing the details behind some of the talking points that we’re going to have. We’re going to hit a really wide range of topics that are mostly focused around APM, from cloud and DevOps to hybrid IT, into the nature of IT work that we do. So I think, Adam, what I’d like to do is spend a little bit of time sort of defining our terms. I think as IT professionals, we like to define our terms before we go off and code things or build things or whatever. So, I think the first thing is let’s talk about cloud generally, and whatever the opposite is. My fellow Head Geek Thomas LaRock likes to say “earthed,” which I think is funny, but it’s hard to say clearly enough that people know what you’re saying. Cloud and the other place, on-prem or whatever. It can mean a lot of things. So what are some of the common types of environments that you encounter in the workplace as it relates to cloud?

Adam: Yup. So as you said, there’s the cloud and then there’s on-prem, or on-premises. Don’t ever say “on-premise” or you’ll probably be shot online. But essentially, I think there are three environments. You have the old style, the monolithic approach, where everything is bundled into one executable, it’s all one binary, and everything is managed that way. Then you have the second type, SOA or service-oriented architecture, where things are distributed more. There are different categories of services, but some services still interact with one another to some degree. And then finally, to cap off that evolution of software architecture, nowadays you have lots of microservices and DevOps, where everything is completely separated. Everything is its own entity. At that point, everything is its own little monolith, if you want to say that, but they’re really tiny monoliths, and you have to manage the dependencies and how they all work together.

Leon: Right. And I think it’s important for people who are saying, “Wait, wait, wait, microservices? I don’t have microservices.” We’re not talking about environments the way traditional IT pros might think of them. We’re going to talk about that too, hybrid IT and things like that, but here we’re talking about application architectures, ways of approaching the building of an application. It starts off sounding familiar: monolithic, traditional, on-premises. “Oh, okay, I know where I am.” And suddenly you’re getting into SOA and microservices, and, “Wait a minute, how do I build that?” It’s slightly different, slightly off. So on that point, for people listening who maybe don’t consider programming the main part of their job, who might do some scripting or whatever, what do these distinctions mean to you as a generalist tech practitioner? How do you approach them in your daily work?

Adam: I definitely think you’re right that a lot of those terms are traditionally software development terms, but in this day and age, many IT pros need at least a basic understanding of them. Sure, the software developer is going to know a lot more about the ins and outs of those terms. But as an IT professional, you are on the Ops side of DevOps, so you need to know how to manage those services at a high level. And to the point of this talk today, you need to know how to monitor those services and understand them at a high level, so that if one of your monitors goes off in the middle of the night and you get paged, you can call the developer on standby and say, “Hey, this piece of the code is doing this. How do we get this resolved?”

Leon: Yeah, that’s exactly it. Even if you’re not going to build these kinds of environments, or even work in them, and again, microservices are the most extreme in terms of distance from traditional ops work, you still have to understand them and see how they fit into your architecture. But I do want to take a minute and talk about the hardware environments we’re discussing. On-prem is where it starts, but what’s the progression from there on up to the sky?

Adam: So if you’re talking about going from on-prem to cloud: traditionally, companies that have maybe a monolithic structure and certain line-of-business applications will first do the lift-and-shift model, and as an IT pro, that’s where you would most likely get involved. Lift and shift is essentially taking the VMs you’re running in Hyper-V or VMware on-prem and throwing those into the cloud. Maybe you’ll have some reconfiguration in there, changing how the networking works and figuring out what compute size you need for the cloud VM, but essentially that’s the easiest model to use. You’re taking the one package that you have and putting it into the cloud.

Adam: That’s the easiest way. Then you also have the model where the software developers themselves re-architect the application to live natively in the cloud. If you want to get buzzwordy, you go cloud native, where the applications are natively aware of all of the PaaS services. The IaaS, Infrastructure as a Service, VMs don’t even really come into the mix when you talk about cloud native, because everything has an API and everything uses the Platform as a Service options here and there. Then we get into buzzwords like serverless and things like that. So essentially, there are two ways to get into the cloud. But what I’ve found is that companies that don’t have a lot of cloud experience, or that have very complicated applications, whether monolithic or SOA, tend to say, “Well, we want to go to the cloud and get the advantages, all the cost savings and the scalability. But we don’t really want to go through re-architecting the entire application.” That may come later down the line.

Leon: Got it. Okay. That’s a really good mapping. Thank you. So I guess the next question, now that we’ve defined what the environments are and how they map together, is why has the industry moved from this centralized model to where we are today? It’s worth taking a minute to look at the history and ask, “Why are we here?” Because there are people who would say that lift and shift is also known as the “bring a wheelbarrow full of cash” model: you take your existing environment and move it straight up, but it’s not optimized for the cloud, so it just starts chewing through workloads and resources, and some applications are more successful than others that way. So why are we going this way?

Adam: Yeah, I think you have two camps there. You have the companies that maybe don’t have the necessary talent or skill to really understand it. You don’t have cloud engineers; you have system administrators who have been focused on-prem their entire careers, and cloud is the new thing. But the CIO has been hearing all about cloud computing, so everything has to go to the cloud. That’s the point where they tend to use the lift-and-shift model rather than actually re-architecting the application, because being in the cloud and being on-prem are significantly different. Some people may disagree and say, “Well, a VM is a VM.”

Adam: Well, not exactly. But I think for the most part, the reason people are going to the cloud is that the CIO sees scalability and the CFO sees cost savings, because you only pay for what you use. But like you said, with a lift-and-shift model, if they just take the VMs they’re running 24/7 and throw them up in the cloud, and they don’t plan for, manage, and monitor those costs, it can definitely leave a black eye for a lot of organizations.

Leon: Right. Okay. So when we talk about defining our terms, one topic jumped out at me from the survey itself: there’s a lot of confusion around what APM is, when and how it should be used, and which kind. There were quotes, everything from, “The confusion around which tools are ideal for specific IT environments is consistent,” to, “Lack of awareness of what APM solutions are currently offered and confusion over which currently offered APM solutions are best for their needs.” And finally, another quote that jumped out at me: “Knowing which combination of APM elements is right for your environment.” These were all direct comments on the data from the survey. And it takes me to one of the main points of the report, which is APM tools and uses. So there are nuances that we’re going to gloss over here.

Leon: I mean, we only have a limited amount of daylight to do this. There’s outside-in versus inside-out monitoring, and there are server-side technologies like logging and metrics and so on. I want to spend a moment on the basic explanations and use cases for synthetic transactions versus real user monitoring. I’ll take either side of this, Adam. Which one do you want to explain? And I’ll explain the other one.

Adam: I can take the synthetic user transactions.

Leon: Great. Go for it.

Adam: Well, truth be told, I’m not a complete expert. Whenever I think about user transaction monitoring, I go back to my days as a System Center Operations Manager admin, so I’m going to relate this term to SCOM, which I’m sure a lot of IT professionals will be familiar with. In SCOM you had management packs, and inside the management packs there were various ways to monitor. There’s your typical up-down: your ping monitoring, your port monitoring, that sort of thing. And then there was always that synthetic user monitoring piece inside of SCOM. I got to looking at it, and at first I was like, “Wow, this is really cool,” because at the time I was used to more of the up-down: is there a ping, is a port open, is a specific service running, that kind of configuration.

Adam: But when I dug into the synthetic user monitoring, what I found was that we had to set up actual agents on each of the machines. I’m not sure if this is different in other environments, but with System Center Operations Manager we had to set up agents so it actually mimicked real behavior. If you’re familiar with Selenium, it was like that for a web app: it mimicked real-user behavior, clicking on a menu, going to this item. That’s where the “synthetic user” part comes in. It’s a synthetic, made-up user that goes through and interacts with the application, and then monitors things like the response time for the website: how long did this element take to come up on the page? Did this element render correctly? So that’s my experience with synthetic user monitoring, which was great at the time because it got really granular in the various metrics we could monitor.
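A minimal sketch of that kind of check in PowerShell might look like the following; the URL and the “Sign in” string are hypothetical placeholders, not anything SCOM-specific:

    # Minimal synthetic check: fetch a page, time it, and verify an
    # expected element is present. URL and 'Sign in' are hypothetical.
    $url = 'https://example.com/login'

    $timer = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        $response = Invoke-WebRequest -Uri $url -UseBasicParsing
    } catch {
        $response = $null   # request failed entirely
    }
    $timer.Stop()

    if ($response -and $response.StatusCode -eq 200 -and $response.Content -match 'Sign in') {
        "Page OK in $($timer.ElapsedMilliseconds) ms"
    } else {
        "Page degraded or missing the expected element"
    }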

Leon: Right. Yeah, that’s exactly it. And for those people looking for the SolarWinds Rosetta Stone here, we’re talking about either Web Performance Monitor or Pingdom. Those are two tools that do a synthetic user; we have other tools that will also do it. And my best use case for that, in case people are still having trouble wrapping their heads around it, is that one of our former Head Geeks, Destiny Bertucci, who is now the product manager for one of our other products, got front-row tickets to Aerosmith using synthetic transactions. She set up a synthetic transaction that checked the ticket sales system, which said, “Not on sale yet,” and she wanted an alert as soon as it stopped saying, “Not on sale yet.” So the transaction would go to the ticket system webpage, click over to the Aerosmith page, and look for the absence of the sentence, “Not on sale yet.”

Leon: And as soon as that sentence disappeared, her phone went off, she jumped on the site, bought tickets, and got them right away. So you’re mimicking a user. Think about the most annoying, obnoxious, hit-F5-refresh kind of experience you can imagine; that’s what synthetic user transactions can do for you. Now, on the other side, real user monitoring, or RUM, looks at actual users. There are, again, agents injecting code into the stream, into either the user’s browser or different points within the transaction flow, and from time to time they report back on how long it’s taken to get to the next step, or what they’re doing, or what have you. And that tells you, user by user. Sometimes it’s sampled and sometimes you’re getting every single user’s experience, but you’re seeing the real experience of users as they move through the system.

Leon: You’re seeing that a database query, while it runs quickly on the database, still isn’t returning data fast enough to avoid slowing down a particular experience. That’s a big deal. It’s not the same as synthetic, and we’re not saying better or worse, because at two o’clock in the morning, if nobody’s on the system, you won’t know; you’ll have to wait for users to jump on the system to find out. Meanwhile, a synthetic user transaction only happens from a specific set of locations. You may not know that users in Tupelo, Mississippi, or San Francisco, or Cleveland, Ohio are having a bad experience, because you’re only monitoring from Paris, and New York, and Los Angeles, for example. So there are pros and cons to each of them. The reason I wanted to go through that is that I want to talk about your feelings on which use case fits which tool best. If somebody came to you and said, “Well, I’m just not sure which to apply,” how would you guide them?
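For the curious, Destiny’s trick reduces to a loop like this, sketched in PowerShell with a hypothetical ticket-page URL:

    # Poll a hypothetical ticket page and break out the moment the
    # 'Not on sale yet' sentence disappears -- a scripted F5 refresh.
    $url = 'https://tickets.example.com/aerosmith'

    while ($true) {
        $page = Invoke-WebRequest -Uri $url -UseBasicParsing
        if ($page.Content -notmatch 'Not on sale yet') {
            Write-Host 'On sale now -- go buy tickets!'
            break
        }
        Start-Sleep -Seconds 60   # check once a minute, politely
    }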

Adam: Well, I think it depends on how granular you need to get. Coming back to your example, another way to think about a synthetic user transaction is web scraping. You take a website and you can scrape all the stuff off of it. I’ve done this in the past where a program or service that you love and need to automate doesn’t have an API, so you go down into the bowels of web scraping and create Selenium scripts, or PowerShell scripts, or Python modules, where it’s a pain to figure out all the elements on the page and everything like that. That works if you’re not looking for very granular information; your ticket example is a good one.

Adam: You can set up a monitor to watch for a specific string on a webpage, or you can monitor for when a new element appears. Maybe it’s a new application where you want to be pinged whenever a new entry is made, and that entry shows up as an image on the webpage; you can monitor on that. If you want to get more granular, I guess more “devvy,” you can inject code and have the monitoring service really understand what’s going on under the hood: understand the HTML, and even back to the storage and the database, to map out, when this user clicks on this button, how long does it take the webpage to paint? How long does it take for the user to get a response? How long does it take to get to the database? How long does the database query take?

Adam: So think about an application from a really granular perspective. There’s that website, I can’t remember what it was, that showed everything it takes to actually bring up Google.com: the DNS query, all the web elements, the database, all of the network, all of that has to come into play. That’s where real user monitoring gets more granular: it lets you understand each hop along the way, its performance, and where the bottlenecks are.
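A crude way to see two of those hops from PowerShell, assuming a Windows machine with the DnsClient module and a hypothetical host name, is to time them separately:

    # Time two hops by hand: DNS resolution and the HTTP fetch.
    # 'www.example.com' is a hypothetical target.
    $target = 'www.example.com'

    $dns  = Measure-Command { Resolve-DnsName -Name $target }
    $http = Measure-Command { Invoke-WebRequest -Uri "https://$target" -UseBasicParsing }

    "DNS lookup : {0:N0} ms" -f $dns.TotalMilliseconds
    "HTTP fetch : {0:N0} ms" -f $http.TotalMilliseconds

Real RUM and tracing tools break the transaction down far further, but the idea of measuring each hop is the same.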

Leon: Yes. And that’s the aspect of tracing, right?

Adam: Yes.

Leon: Where you’re actually tracing the entire transaction from cradle to grave and watching every step along the way. Something you said as we were talking through this topic before we started was that the buzzwords are often what get in the way: an engineer doesn’t know “synthetic transaction,” they just want the tool that acts like a user even when no one is there. The buzzwords get in the way. You had gone through how APM tooling maps onto those different application architectures, the different kinds or stages, and I wondered if you could do that for the folks who are listening.

Adam: Okay. So if you’re talking about how to monitor the three different environments, the monolith, SOA, and microservices, and you strip away a lot of the software development-esque terms, essentially what you’re doing is moving on from the old school. We used to just set up a ping monitor: “Well, if an ICMP packet got from point A to point B and came back, everything is great.”

Leon: Oh, those days were so simple. I miss them so much.

Adam: Yeah, you just set up your ping monitor. If the server pings, then everything is great. Well, nowadays, you have to get more granular. Just because a server pings and returns a decent response time doesn’t necessarily mean the—

Leon: Squat! It doesn’t necessarily mean squat.

Adam: Exactly. Even if it gets a TCP response—

Leon: It doesn’t.

Adam: Even if the port is listening. “Okay, well, I’m going to go a little further and make sure port 80 is listening.” “Okay, that’s great. Well, is the webpage coming up?” And we’ll go even a little bit farther than that. The whole monitoring ecosystem has evolved toward more granularity, to where nothing is just up or down. At my previous position, I was one of those guys who always asked, “Well, what does it mean to be up, and what does it mean to be down?” And some of the guys looked at me like I had a third eyeball, because if you’re not in that frame of mind, you still see a green or a red. Now it’s not just a green or a red. It’s a green and an orange.

Leon: And a blue, and a purple, and a light shade of mauve.

Adam: Yeah. You have green for healthy, yellow or orange for maybe degraded service, and then red for completely down: “Oh my God, we have to fix this thing. No users can get in.” It’s a gradual scale now, and I think that’s what we need to understand as IT professionals: nothing is really just up or down anymore. And that’s where we come back to real user monitoring and synthetic user transaction monitoring, because that mimics what we truly, truly care about: how the user is using the services and what their experience is like. We’re trying to measure the user experience rather than what we see on the back end. We see, “Oh, it’s responding on TCP port 80.” Well, the user doesn’t care at all about that. The user just cares that the webpage comes up, it’s responsive, and it does what it was supposed to do.
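That gradual scale can be sketched as a layered check, here in PowerShell against a hypothetical host, where each layer that passes upgrades the status:

    # Layered health check: ping, then port, then page, mapped to a
    # green/orange/red status. 'web01.example.com' is hypothetical.
    $server = 'web01.example.com'
    $status = 'red'

    if (Test-Connection -ComputerName $server -Count 1 -Quiet) {
        $status = 'orange'   # it pings, which by itself means very little
        if ((Test-NetConnection -ComputerName $server -Port 80).TcpTestSucceeded) {
            try {
                $page = Invoke-WebRequest -Uri "http://$server" -UseBasicParsing
                if ($page.StatusCode -eq 200) { $status = 'green' }
            } catch { }   # page failed to load; stay orange
        }
    }
    "Status for $server : $status"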

Leon: Right. At SolarWinds, we like to remind people that slow is the new down, and it doesn’t matter if all the servers are up. What was it Charity Majors was famous for saying? “Five nines doesn’t matter if your users aren’t happy.”

Adam: Right, because uptime alone doesn’t tell you whether they’re happy. We need to measure user experience. Now, that’s really hard. It’s kind of a squishy, pie-in-the-sky thing, but I think that’s what we’re really trying to get at nowadays.

Leon: Well, putting on my grizzled old ops veteran hat, I’m going to say it’s both-and, not either-or. You do need to know how the bandwidth and the WAN connection are doing. And to your point, it’s not just red and green. It’s, “What’s the status?” “It’s paisley.” “Okay, fine. Whatever that means.” You need to know how your components are doing, but component status doesn’t necessarily relate directly back to the user experience. That’s another thing. As I said before, the components could all be perfectly happy while the user experience is still bad, and you know what? You’re still on the hook for it. You, as the IT person, still have to deal with that.

Adam: Yeah, definitely. I really think everything goes back to the end goal of the user experience. The more traditional ops questions, “Is the WAN looking good? Do we have enough storage?”, should be labeled as more proactive things. If you see users clicking along on the webpage and everything looks great, but you don’t see the storm coming where you’re going to run out of disk space, you definitely need that. To me, those are more proactive monitors, versus user experience, which is more real-time, right-now monitoring.

Leon: I will remind listeners that FCAPS, fault, configuration, accounting, performance, and security, is still a perfectly good framework to look at and use, and that fault is just one of the letters. Capacity, what you’re talking about, “Do I have enough space?”, and performance, “How is it working?”, are both wonderful reasons to also monitor. I’ll also remind everyone about baselining, meaning I’m watching it all the time so I know what normal looks like and can tell you when it’s not normal. In one case, 80% may be so abnormal that you need to look at it; in other cases, 80% is a big yawn. But you won’t know until you know the normal average run state for a particular system, device, number of customer connections, sales per hour, whatever number you’re monitoring. We’re not just talking about CPU. If you don’t have the normal, if you don’t baseline it, you’ll never know what the variance away from that baseline is.

Adam: Yeah. It used to be really, really hard to establish a baseline. Nowadays, with AI and machine learning, we can use those tools and technologies to help us define what’s normal and what’s not.
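A baseline can start as simply as a mean and standard deviation over historical samples. A minimal PowerShell sketch with made-up readings:

    # Build a crude baseline from historical samples and flag a reading
    # that drifts more than two standard deviations from normal.
    $samples = 68, 70, 71, 69, 72, 70, 73   # e.g., hourly CPU % readings

    $mean     = ($samples | Measure-Object -Average).Average
    $variance = ($samples | ForEach-Object { [math]::Pow($_ - $mean, 2) } |
                 Measure-Object -Average).Average
    $stdDev   = [math]::Sqrt($variance)

    $current = 90   # hypothetical latest reading
    if ([math]::Abs($current - $mean) -gt (2 * $stdDev)) {
        "Abnormal: $current is far from the baseline of $([math]::Round($mean, 1))"
    }

The ML-driven baselining Adam mentions is far more sophisticated, but the underlying question is the same: how far is now from normal?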

Leon: So something that jumped out at me is that the top three most commonly deployed tools to support APM, and this is straight off the survey, were database monitoring, application monitoring, and infrastructure monitoring. And those three things do not scream APM to me. What that says to me is that there are tools you use to help monitor the stuff that APM is monitoring, or things that fill in the gaps, or something like that. I wanted to get your take on that.

Adam: Yeah. I think at some point you have to think about all these different layers of monitoring and realize, “Well, who is watching the watcher?” I take this back to unit testing and integration testing and all those software development testing terms, because it’s a common thing here. In a DevOps environment, you can have tests to verify that your virtual machine was deployed successfully: does it ping, is it on, is it responding on the port, is it serving the right webpage? You can write tests for all of those that run as soon as a new piece of infrastructure is deployed. I’ve gotten a lot of questions about this because I actually wrote a book called The Pester Book, about a PowerShell testing framework. One of the big questions was—

Leon: Wait, wait, did you say Pester, as in bother me?

Adam: Pester, yeah. Pester is the PowerShell testing framework. It’s built in PowerShell and it allows you to do unit testing with PowerShell code, whether a module or a script, just like you would with C# or Java or anything like that. Because I wrote that book, I get a lot of questions about testing: “Well, why do I need testing?” Same with configuration management. If we go to something like Puppet, or Chef, or Desired State Configuration (DSC), you have tools that are supposed to provide you a platform where you say, “I want this server to look like this.” It changes some registry keys, installs some software, puts some files wherever they need to be. That’s one layer, and then I get the question, “Well, why do we need to test that if your configuration management tool is already doing it?”
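A test of the kind Adam describes, re-verifying what the configuration management tool claims it did, might look like this tiny Pester sketch (v5 syntax), with “web01” as a hypothetical host:

    # Minimal Pester sketch: double-check a freshly deployed server.
    Describe 'web01 post-deployment checks' {
        It 'responds to ping' {
            Test-Connection -ComputerName 'web01' -Count 1 -Quiet | Should -Be $true
        }
        It 'serves the expected page on port 80' {
            $page = Invoke-WebRequest -Uri 'http://web01' -UseBasicParsing
            $page.StatusCode | Should -Be 200
        }
    }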

Adam: It does raise a good point: if you ask, “Why do we need to test this?” then where do you place your trust? It all comes down to trust. Are you going to trust that your configuration management tool is doing it? If not, you write tests for it. Whether you’re talking about configuration drift or anything like that, you write tests, you create a monitor to monitor it. And then at some point you say, “Okay, now I’m going to monitor the monitor.”

Adam: At some point you have to realize, “Well, I have to end up somewhere eventually. I have to trust that this is the golden parent tool of all of my monitoring, and if this tool says that my users are having a hard time, or my WAN link is down, or something like that, I just have to trust it.” A lot of us IT pros can be a little OCD sometimes: “Well, I’m not quite sure. I don’t really trust it.” But at some point it all comes down to trust, and you just have to trust your monitoring and testing tools.

Leon: So as IT professionals, we do live in a heterogeneous reality; there isn’t a single tool we use monolithically across everything. We use a variety of tools and a variety of solutions. So yes, I have to trust that the data I’m getting back from this test or this system is true, and I also have my sanity check, that backup system that verifies that the things I’m seeing from the first one are also true. Whether that’s a technique or a tool depends on the context. But at the end of the day, you do have to believe the information you’re getting; otherwise, you’re going to be chasing shadows the entire time. And we’ve seen that, right? We’ve seen where somebody comes in and says, “This code is actually incredibly inefficient.” “What are you talking about? I wrote my heart out. This is the best code I’ve ever written.” And then they bring in a new tool and show that it’s taking minutes to execute.

Leon: And on top of that, it’s not just that it’s not running well, but where is it falling down? What isn’t working? Again, that takes us back to synthetic transactions, real user monitoring, tracing, etc. I think by this point folks listening may think, “All right. All right, I get it. APM is good. I need some of this stuff in my life.” But there is a gotcha, and I’m just going to read from the report: “In terms of code analysis, APM solutions provide great value in identifying problems in code that report when a transaction or an application is slow or failing. However, these applications are not always built by your company, but are instead purchased, which is an important delineation. Knowing where the problem is in the code is only valuable if it’s your code.” So Adam, you’ve seen things. You’ve lived a life, as I understand it.

Adam: And barely enough hair left to show for it, too.

Leon: Exactly. So I was wondering how this resonated with you.

Adam: Yeah, I can go back to many, many stories, but the one that’s top of mind is from a previous position where I was brought in as a DevOps or automation engineer. Being the PowerShell guy, I tend to do everything in PowerShell, and unfortunately, so did this company. Listeners may realize that PowerShell isn’t necessarily a “development language,” but there are lots of tools that coincide or work well with PowerShell. PowerShell is known as the glue that fits everything together; it fills in the functionality gaps of lots of different services and software. Well, the company I was brought into had decided, “We’re going to build a complete automation orchestration engine in PowerShell, from the ground up.”

Leon: Like you do.

Adam: Yeah, exactly. At the time I thought, this is awesome, because I love writing PowerShell, and now I’m essentially a PowerShell developer; I get to do it all the time. But to give you a sense of what this tool did: it was essentially an environment orchestration engine. We were brought in to automate the provisioning of test environments, whether on-prem or in the cloud. So, for example, a developer puts in a request saying, “I need a test environment for XYZ application,” and it was our job to automate the provisioning of that. That might consist of two or three VMs clustered together, with the database, the web front end, all the networking, everything, so that with one click, “Here, I want this application,” you have everything brought up.

Adam: And we wrote all of that orchestration, all of the code it took to do that, in PowerShell, because you can; you can do pretty much anything with PowerShell. But what we found was that we were facing challenges that took so much longer, because there were configuration management tools out there already. Whenever we would create a new virtual machine, for example, we could create it once, but then the developer would say, “Okay, I want it again,” and just run it again. And then it would error out, because it wasn’t idempotent. The PowerShell scripts we wrote didn’t natively understand that if the VM is already out there, you shouldn’t try to create it again.

Adam: So unfortunately, we had to write a bunch of “if the VM exists, then don’t do it” logic. I had to write all this stuff, and it took months and months and months. What we found was that configuration management tools did a much better job, because we wouldn’t have had to write all that functionality in. We could have used the existing modules and packages in those tools, added in the PowerShell scripts we needed, and leveraged all of that ahead of time. It became a nightmare managing it all and trying to reinvent the wheel. A few months before I left that position, we finally started looking into configuration management tools, and it solved so many problems, because we were able to stand on the shoulders of giants and realize, “Wow, they’ve already solved this problem. I don’t need to do that.”
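The guard Adam describes amounts to a check like this, sketched with the Hyper-V PowerShell module and a hypothetical VM name; configuration management tools give you this idempotence for free:

    # Hand-written idempotence guard: only create the VM if it's absent.
    $vmName = 'TEST-APP-01'   # hypothetical VM name

    if (-not (Get-VM -Name $vmName -ErrorAction SilentlyContinue)) {
        New-VM -Name $vmName -MemoryStartupBytes 4GB -Generation 2
    } else {
        Write-Host "$vmName already exists; nothing to do."
    }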

Adam: But the other side of the coin is where we get into that not-invented-here syndrome: “Oh, I don’t trust this software because I didn’t write it myself.” What we found was that there are a lot of smart people on the other end developing these tools. They’ve already done all these things, and we just needed to realize, “We need to trust that they’ve done their due diligence.” And 99.9% of the time, we had a much better time once we did.

Leon: Yeah. I think that NIH, not invented here, is only superseded by the much worse impulse, NIHBM: not invented here by me. Both, like you said, speak to a lack of trust, and also an unwillingness to recognize that even a tool that’s 80% of what you need, let’s just go with the usual 80/20 split, saves you a significant amount of time you don’t have to spend building it. And I know that at some point in the experience you just described, somebody said, “Yeah, but we don’t have to pay a thing for it. It’s free. We’re building it ourselves. We’re blazing our own trail.” It’s only free if your time is worthless. And in any organization, I’m willing to bet there’s something with a higher value to the business than building a tool that already exists in large part.

Adam:  Yeah. Don’t get me started on those meetings where there’s 20 people in there and it only needs to be one or two.

Leon:    Oh yeah, there’s all that.

Adam: People just don’t realize; it’s a common thing. You don’t really put a value on time, because you’re not specifically paying for anything. But yeah, I agree with you completely.

Leon: Yeah. So in that vein of not getting you started, I do want to pivot to how to avoid both the NIH problem and the problem of dealing with these code bases. The survey says one way to get around it is to develop an APM plan, and it lists out a few bullet points: inventory, assess, evaluate. I wanted to talk through those with you to get your take on them, starting with inventory, or discovery, as some people call it. Anything we should be thinking about there as we work our way toward a comprehensive APM strategy?

Adam: Yeah, I definitely think you have to go into discovery mode. You have to first know what’s out there: what servers you have, what applications are installed on those servers. Discovery and inventory is by far the most important step to me, because if you don’t know what you have, you have no foundation. You need a tool that will let you figure out exactly what you have. Once you have the inventory in place, then comes the assessment phase, where you act on knowing what you have.

Adam: Taking it down to a real example: all right, my discovery tool recognized that I have a server in a closet somewhere, or maybe under somebody’s desk, that nobody knew about. Great, I’ve discovered it. I’ve done my inventory. Assess: “Oh no. That server is running a critical line-of-business HR application.” Not that I’m talking from experience in any way. No, I would never do that. But that server is running a critical line-of-business production HR application. That’s your assessment phase. At that point you really need to evaluate the severity of it. If it’s a production application that HR is using, whether it’s under somebody’s desk or not, you need to evaluate: is it a good option to take that out from under the desk and put it in the data center, maybe, or take it to the cloud? Then you go into the testing and trial phase: “Okay, I’ve done my inventory. I know there’s a desktop computer under somebody’s desk in HR.”

Adam: Assess: “Is this important or not?” Yes, it’s running a production application. Evaluate: “Does it make sense to take this application and move it somewhere else?” Of course it does, if it’s under somebody’s desk. Then you move on to testing and trialing it, where you bring in parts of other teams, bring in the software developers who may have built the application, maybe get the vendor involved, and re-architect it that way, maybe bring it to the cloud, for example. And once it’s in the cloud, you move into more of a measuring and monitoring phase: “Okay, I’ve built this up exactly how I want it, and at this point I need to maintain that state.” Configuration management tools can maintain that state, keep it in the exact state you want, and then monitoring tools can make sure that not only the state and configuration but also the performance stays at the expected level.
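A discovery pass can start as small as a sweep like this PowerShell sketch, with hypothetical host names standing in for a real inventory source:

    # Bare-bones discovery sweep: ask each host (hypothetical names)
    # for its OS and last boot time as raw material for the assess phase.
    $servers = 'web01', 'db01', 'hr-under-desk-pc'

    foreach ($server in $servers) {
        try {
            $os = Get-CimInstance -ClassName Win32_OperatingSystem -ComputerName $server -ErrorAction Stop
            [pscustomobject]@{
                Name     = $server
                OS       = $os.Caption
                LastBoot = $os.LastBootUpTime
            }
        } catch {
            Write-Warning "Could not reach $server"
        }
    }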

Leon:   Nice. That was a great rundown. Thank you so much. There is a whole lot more for us to talk about. This is really just scratching the surface of the survey, but I think we’ve given everyone who’s listening enough to think about for one sitting. So before we wrap up, I just want to give you a chance to tell everyone again, where can people find out more about you?

Adam: Sure. Adamtheautomator.com, or you can also get me on Twitter. I’m probably there way too much, @ADbertram.

Leon:   As we all are. And if you want to read this survey and dig into the details and the data, you can find that at solarwinds.com/cloudconfessions. Adam, thank you so much for taking time today.

Adam:  Yup, thanks for having me.