SolarWinds Lab Episode 58: Master Your Virtualization Universe With Discovery, Remediation, and Optimization

In this episode of SolarWinds Lab, Head Geek™ Kong Yang and Chris Paap, Product Manager of Virtualization Manager, show you how to master your virtualization universe with Virtualization Manager and three core skills – discovery, remediation, and optimization.

Episode Transcript


Hi, I’m Chris Paap, product manager for Virtualization Manager. Kong, thank you for having me.

Hey, welcome Chris. Howdy folks, I’m Kong Yang, Head Geek. And on today’s episode, I’m super excited! We get to talk about how to master your virtual universe. Chris, why is that important?

One of the key things we know, from my own time in IT and from IT in general, is that there’s always pressure to do more with less, and there are time constraints, right? Having a foundation of what is going on, a firm operational grasp of your environment, especially your virtualization environment, is key to getting ahead of those issues.

Exactly, you hit on two key points. IT organizations: the two things that we always lack, time and money. Time and money. With virtual environments, you have VM sprawl, resource sprawl, shadow IT, rogue IT, those marketing-easy terms, and that little thing called suboptimal performance. It’s what triggers all those tickets.

Right, and one of the key things about suboptimal performance is that it’s not very apparent when you’re just looking at your normal metrics. There’s causation down the line, and there are dependencies, that result in application performance issues.

Absolutely, so on today’s episode, we are going to talk about why that suboptimal performance, that suboptimal user experience, occurs. And it starts at the application, right? The application lifecycle, it’s changing. How is it changing, Chris?

That lifecycle has progressed rapidly. Used to be, it took days to get a VM out, and now that we’ve hit the cloud, as well as containers, you have things being deployed— VM instances being deployed in the cloud that are living for weeks and days, possibly. Or you even have microservices that are spun up in milliseconds and living for seconds.

Yep, and to your point, in the chart that we’re showing right here, the orange bar shows the application time to live, and we have buckets for virtualization, cloud, and containers and microservices. So, virtualization, as you mentioned, takes many days to spin up, to provision and so forth, but those VMs tend to live months to years. Cloud instances spin up in minutes; they tend to live for a few hours to a few weeks. With containers and microservices, you’re talking milli... microseconds in terms of their life cycle. This chart also shows why IT pros need to pay attention to their core skill sets, and to what we’re going to talk about in this episode, because you’ve got to follow the money. Each of the person icons in the bubbles represents a hundred job openings on dice.com, and the keywords searched were virtualization, cloud, and containers and microservices. If you look at it, there are 2,000-odd jobs for virtualization, almost 12,000 jobs for cloud, and then a thousand jobs for containers and microservices.

And I think one thing that we have to point out as well is the fact that virtualization has pretty much stayed static, based on the metrics that you stated, while cloud has grown tremendously and so have containers. There needs to be a basis of knowledge in each of those areas. For those who have been around in IT for a while and remember, when virtualization came on the scene it was, okay, we have to know something about virtualization and possibly put it into our environment. Today, if you don’t know how virtualization works, that’s a major pillar of knowledge that you’re lacking in your IT group.

What that tells me is, there’s multiple solution stacks to solve a single problem in there. To me, that tells me that IT is changing, and we, as IT professionals, have to change with that.

Correct, IT has been historically a system of record, and that’s transitioning to a system of continuous delivery. Or DevOps, as some people call it.

I love how you mention DevOps, because there’s a healthy debate about how one defines DevOps. Is it a culture? Is it a trend? Here at SolarWinds, we have what we call the IT Trends Report that we release every year. What it’s shown is that whether you’re embracing the DevOps culture, or whatever you want to call continuous integration and continuous delivery, there are four key tenets in play. People are moving to the cloud. People are seeing cost efficiencies from the cloud. IT professionals know that they have to build skill sets with cloud technologies as well as business skill sets. And there’s an increasing lack of visibility because of the complex environments in there. And you’ve seen that in our product portfolio, as the product leader for Virtualization Manager.

Correct, our inclusion of Amazon Web Services is a direct result of that. People are asking for it. It’s not a matter of if they’re going to put it in, it’s how much they’re going to put in the cloud and participate in it.

That usually transitions into a question for the IT organization and IT leadership. And you, being one of those leaders in your previous career life, know it usually boils down to three things. Is it a skill, hill, or will problem? By that, I mean: is it an aptitude thing? Do you have the right personnel in those roles? Is it an altitude problem? How much tech debt does your organization have, and can your personnel overcome that technical debt? Is it a will problem? Are your personnel willing to learn if they don’t have that skill set?

And this is a–those three things: aptitude, altitude, and attitude, are just the continuing evolution of IT. That is something that is a daily challenge and a monthly challenge. It’s not just a goal, it’s just a continuous— I wouldn’t call it a problem, but it’s a continuous evolution of growing your environment.

It certainly goes to the ABCs of IT pros, and that is you’ve got to always be a learner.

Absolutely.

Fundamentally, the CIO’s SLA, what we call the service-level agreement, that they have with their C leadership, it stays the same. And it involves— Funny, it also aligns with SLA because it’s around security, it’s around running lean, and being agile. Security: try not to get breached. Lean: do more with less; be efficient and effective with the solutions that you put forth. Agile: not the process, but being able to pivot off of anything technology-wise, personnel-wise, that’s advantageous for your business organization.

I think that’s where the DART framework comes into play, in terms of achieving that SLA, especially from a day-to-day perspective, getting control of your issues and putting out fires, to actually getting ahead and doing projects and achieving that SLA.

Exactly, so what is the one thing that we as IT professionals can control? That is having the core foundational skill sets. You’ve heard Leon talk about monitoring as a discipline. Well, the DART skills framework forms monitoring as a discipline. Specifically, for mastering the virtual universe, Chris and I are going to focus and show you how to put into practice discovery, remediation, and optimization so that you can free up your time and resources to do more with less and meet your CIO’s SLA. Chris, we’re going to cover discovery in this section, and how to put it into practice with a tool like Virtualization Manager.

Right, one of the tenets of Virtualization Manager that filters through all of SolarWinds products is out-of-the-box ease of use, and we’ll show you what you get. We don’t want you having to go back into your environment every time you add a VM or move a VM. We want that to automatically be pulled into your environment. This way you’re getting proper alerts and optimization recommendations.

With discovery, it gives you visibility into your virtual environment, whether it’s VMware vSphere, Microsoft Hyper-V, or AWS.

Correct, and the same thing applies with AWS. Once you add your AWS credentials, and you pull that information in— once the new instance is spun up or spun down, VMAN will represent the correct environment.

Okay, so in terms of requirements, what we’re looking at right now is the virtualization assets, the summaries, VMware vSphere and Hyper-V. For vSphere, you need your vCenter credentials.

Right, so just to do typical monitoring, all you need are read-only credentials to pull that data in.

Okay.

But with VMAN, we also provide the ability to do recommendations and take action, such as a vMotion or, with Hyper-V, a storage vMotion or a migration, or to reboot and turn on a VM. For those, you need the appropriate permissions and delegated roles assigned. You do not need full admin access to your vCenter or Hyper-V host, but you need the correct permissions to allow you to execute those actions.

Okay, so you need the proper credentials to take management actions that Virtualization Manager allows you to do.

Correct, that’s absolutely correct.

From here, you can see your virtual data center, your host; your virtual machines obviously, clusters and whatnot. What I like is with discovery, you get a quick view into your virtual asset summary. So, if you have an environment that changes quite often at different schedules and whatnot, with Virtualization Manager discovery you don’t have to worry about that. You don’t have to go in and check in on hosts on a host-per-host basis.

Right, we want to free up that time, so that you’re actually able to focus on projects and other key events, not just on maintaining your inventory, which can be time consuming by itself. What’s key to discovery in this tree view is that once you go into a VM, for instance, and you see that relationship, you get a virtual machine details view of everything, so you can aggregate and visualize your data, and then take action or not.

Exactly, it’s the basis for monitoring as a discipline because if you see over here, the resource utilization, you’re starting to get trend data, and you can go a day out, you can granularly get it down to an hour out. It is truly a core foundational skill set that one can leverage to show you what’s going on in your environment.

Absolutely, so this way your attention is focused on the things in your environment that need it, and you’re not wasting time trying to find out what you should be working on.

Discovery, it’s the start. It gives you a baseline, gives you visibility, and aggregates and visualizes the performance data that you need to basically master your virtual universe. But here’s an area that we’re going to talk about that we as IT professionals get paid for. As my friend Thomas LaRock, fellow Head Geek, SQLRockstar on Twitter, says, we get paid to fix things and fix things fast.

Absolutely. Any IT person worth their salt will tell you that when it comes down to fixing an issue, not only do you have to do discovery to find out what you’re looking at, you also have to ask: what are the next steps? The key part of that is recommendations in VMAN. It’s an evolution of everything in our products that you see, where it’s not enough to say you have a problem, but what are the next steps? What do you do with that problem? What’s the context of that? And how do you fix it? That’s where recommendations come into play for Virtualization Manager.

All right, let’s go into remediation with the recommendations engine from Virtualization Manager. So Chris, we’re on the virtualization summary page, and by default, the recommendations engine is right up there.

Surprise, surprise, it’s on the homepage. Our recommendations come in two flavors. Once you go into all recommendations, you’ll see that we have two types. One is our active recommendations; that is our remediation portion. Think of it as after the fact. If you have a problem that’s already occurred in your environment, we’re going to provide steps to fix that, and we call those active recommendations. The reason we break those out is that active recommendations usually show up within 30 minutes to an hour. They’re issues that are already occurring in your environment. It could be as simple as a VM that doesn’t have enough memory.

So you mentioned time frame in that; let’s step back a little bit. With respect to the recommendations engine, how long does the engine need before it starts providing recommendations to the end-user?

For active recommendations, in as little as 30 minutes it should provide you recommendations, if you have an issue and it picks it up. Predictive recommendations, which are our second type, are there to prevent issues from happening. Based on historical data, we can discern the difference between an anomaly and what truly may happen, based on the workload of your VM. It’s not just the fact that you’ve been consuming x amount of resources; we base it on the workload that VM is running. So say a VM or host gets put in the environment and it only does work once a week. It’s a payroll server or a report server, and it does nothing throughout the week, but its whole purpose is that Friday report, or reports, that it runs. We’re going to know that Friday is really when its true workload is. So, we’re going to give you recommendations based on whether it has the resources to handle that workload. And predictive recommendations take seven days’ worth of data. We don’t want to give a predictive recommendation if we don’t have a full picture. So at minimum, to have predictive recommendations, you need seven days’ worth of pulled data.
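
The timing rules Chris describes can be sketched in a few lines of Python. This is an illustrative model only, not VMAN's actual logic: the function name `available_recommendation_types` and both constants are hypothetical, and the thresholds (about 30 minutes for active, seven days for predictive) are simply the figures quoted in the conversation.

```python
from datetime import timedelta
from typing import List

# Thresholds quoted in the episode. The names are hypothetical, not VMAN settings.
ACTIVE_MIN_DATA = timedelta(minutes=30)
PREDICTIVE_MIN_DATA = timedelta(days=7)


def available_recommendation_types(collected: timedelta) -> List[str]:
    """Return the recommendation types that could appear after `collected` of data."""
    types = []
    if collected >= ACTIVE_MIN_DATA:
        types.append("active")      # issues already occurring in the environment
    if collected >= PREDICTIVE_MIN_DATA:
        types.append("predictive")  # needs a full week to capture weekly workloads
    return types
```

Under these assumptions, a freshly added environment would show nothing for the first half hour, only active recommendations for the rest of the first week, and both types after that.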

Gotcha. So to summarize for our audience out there, active recommendations are for issues happening now, and you should take action to remediate those. Predictive recommendations work over time, and as your environment changes, the engine changes with it and adjusts those thresholds.

Right, and it’s key to know that the more data you have, the better and more accurate the recommendations get. With ours, we predict out to about a week right now. That’s something that we’re working on as a product; keep your eye out on what we’re working on, because you will see improvements to that. The reason it’s a week for now is that we can stay accurate over that span, so if you have a week’s worth of data or even more, we can actually determine that you’re going to have this issue in a week if you don’t change something now.

Perfect, and what Chris was referencing was the Virtualization Manager page on thwack.com, “what are we working on,” a blog post that you keep up to date.

It covers things that we are currently researching or actually implementing, and it’s broken out into areas such as ease of use or strategies that we’re implementing. It’s very easy to follow. Updates are usually made after a release; we go back through and update it. It’s good to bookmark that so you get constant updates on any changes to it.

Perfect. You’ve mentioned two types of recommendations, active and predictive. What type of strategies form the basis of the recommendation engine?

To take a step back, the idea of recommendations in general is to get your environment to a much better level of health. We don’t want it continually making changes, so it’s a strategic long-term initiative to get it in a state to where your environment can accommodate all the needs. With that said, we have four basic strategies that make up those predictive recommendations. Those are storage capacity, host performance, workload balancing, and VM sizing.

Perfect. With respect to these four strategies, you’re talking recommendations, it goes into remediation, it goes into the third topic that we’re going to talk about, optimization, all those are coupled together. Chris, can you walk us through managing policies? Because the recommendation engine is based on this notion of policies.

Right, in the VMAN world, we have manage policies, which you can go into and add an exclusion. A perfect example in a real-world scenario is, you have an environment that you have set up, but maybe it’s a development environment. It’s pre-production, which is a common term that we use a lot, where it just isn’t ready to go live. You don’t want alerts on it. Or it could be quite the opposite: it’s something that you decommissioned and just haven’t removed yet. You don’t want that to become noise in your environment. That just becomes clutter, and you start ignoring the stuff that you really want to see. So you can go in, click on a VM, and decide that you don’t want to see it anymore. You can create a policy so that VM has nothing to do with recommendations. An addition to what we’ve been doing is that you can exclude based on the management action. Say you want recommendations, but it’s a tier-one application and you cannot remove any memory. Even if you’ve over-provisioned, which is to the chagrin of a lot of virtual administrators, application engineers will come back and say, hey, I need x amount of memory, I need x amount of CPU cycles. So you give them that, and they’re only using five percent of it. Because of the priority of that application, you can’t always go back and resize those in a real-life scenario. Instead of having that noise, you can put in a policy that says, “I don’t want to see recommendations based on memory sizing or CPU sizing, but I want to see them on migration.”
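
The two kinds of exclusion Chris describes (hide everything for a VM, or hide only certain management actions) can be modeled roughly as follows. This is a sketch under stated assumptions, not VMAN's data model or API: the classes `Recommendation` and `ExclusionPolicy`, the function `apply_policies`, and the action labels are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Recommendation:
    vm: str
    action: str  # illustrative labels: "memory_sizing", "cpu_sizing", "migration"


@dataclass
class ExclusionPolicy:
    vm: str
    # An empty set means "exclude every recommendation for this VM".
    excluded_actions: Set[str] = field(default_factory=set)

    def hides(self, rec: Recommendation) -> bool:
        """True if this policy filters out the given recommendation."""
        if rec.vm != self.vm:
            return False
        return not self.excluded_actions or rec.action in self.excluded_actions


def apply_policies(recs: List[Recommendation],
                   policies: List[ExclusionPolicy]) -> List[Recommendation]:
    """Keep only the recommendations that no policy hides."""
    return [r for r in recs if not any(p.hides(r) for p in policies)]
```

In this model, a decommissioned or pre-production VM would get a policy with no listed actions, while the tier-one application from the example would get a policy listing only the memory and CPU sizing actions, so migration recommendations still come through.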

Perfect, because we understand that each and every one of your environments is unique, and each will produce a unique baseline. What the recommendation engine is trying to get to is what your good state eventually is. With these exclusion policies, basically you can filter out the noise and get to the signal that you really want. You can also impact and influence the policies based on your knowledge, your expertise, of what your virtual environment’s behavior is.

If you want to go in and make a change in your environment, and if you have change controls in place, and you want to make sure that you’re making the right move, you can use Virtualization Manager recommendations to emphasize that and confirm that by going into those recommendations, and you can see the data that supports that.

Basically, what you’re saying is, folks, you can leverage VMAN’s recommendations engine to establish that baseline, that baseline of good. And continue to use that, because your environment’s not going to stay static. It changes over time.

Absolutely, and the other thing is, one size does not fit all. Just because we make a recommendation does not mean it’s important to your environment. As much as we want to show you what you may not know, that you had a problem, we also want that to fit your environment and what’s important to you. Everybody’s environment is different. They have their own business cases that affect that, and that’s where policies come into play, and that’s where you can actually mute some of that.

So Chris, how many default policies, default recommendations, does Virtualization Manager’s recommendation engine come with?

Out of the box, we’re not going to put in any exclusions or policies, because we want you to see what is in your environment, and then have you manually go in and exclude things. We have the four strategies, and you’re going to get all the recommendations on those four strategies. Usually what happens out of the box is, you’ll get all the active, or remediation, type of recommendations first. Then a week later, depending on the health of your environment, which is very dynamic, you may be getting more of those predictive recommendations. One thing that we’ve seen in customers’ environments is that once they act on those predictive recommendations, it doesn’t always mean that they won’t have any more recommendations. To the point you made earlier, your environment’s always changing. There are always different needs in your environment. Once you do that first round of recommendations, you actually may have a lot more recommendations that come down the pipe as you put more load on your environment.

Excellent. Just to recap, the up-and-coming policies, strategies, and so forth, will always be talked about on the Virtualization Manager product blog, what are we working on, correct?

Right, if you go out there today, you’ll actually see a section that deals with that. We have several new strategies, several new implementations that are coming down the pipe, and you’ll see where we’re going in our road map. It’s essentially our road map on THWACK.

Perfect. And if you want to affect those policies, those recommendations, vote them up.

Provide use cases.

As features.

Yeah, exactly, the best use cases or the best stories that I get back from customers, are when they define a problem they’re trying to solve and they just can’t solve it with our product, or with another product, and what they’d like to see. And when you provide that story, it’s much easier for us to implement that in a product.

Recommendations get you to ‘good,’ and establish that good baseline. What that speaks to is optimization, and it’s an exercise that we as IT professionals strive for every day, or struggle with every day. How do you stay at good? What is good? How do you maintain good?

Without the right tools, it’s a mythical nirvana. You’re just not going to get there. There are too many other things that are going to pop up. The little steps that get you to that good, better, best state aren’t just one action. It’s a series of actions that make that up. If you aren’t prioritizing your time and identifying what the issues are in your environment, and then re-prioritizing your time to get ahead of those, you’re not knocking those out to get to that better state. That’s what VMAN does for you; it helps you get to that better state over time.

Exactly, because if you’re optimizing for everything, essentially you’re optimizing for nothing.

Absolutely. You’re trying to fix everything; you’re going to fix nothing.

So let’s fix one thing, by leveraging the recommendations engine, to optimize your virtual environment.

If you’re going into a recommendation, the way you can execute it is, you select the recommendation itself, and then scroll up and you can apply the selected recommendation.

Okay.

But in this case, most admins are going to want to go in there and validate the data. We’re not just asking you to trust VMAN; we want you to go in there, because it is your environment, and you should understand why we’re making this recommendation. When you go in here, it actually gives you the problem, provides the recommendation, and lists the steps to perform. But when you click on the statistics tab, one of the key areas here is visualizing the data. Here is before the recommendation, and that ‘now’ line delineates what’s going to happen after you execute the recommendation, if you do it now. And you have a current value and a predicted value, in this case a current value of storage and a predicted value of storage, showing how that’ll change. Then, based on this, you can execute it at that point in time, or you can schedule it for a later point in time if you have a change control process or a change control window that you need to execute in.

Excellent. The recommendations engine can be leveraged to continue to optimize your environment. First pass through, get to the baseline, and continue to establish that baseline. It gets back into that train of thought of trust but verify, and allows you to leverage your expertise before you enact that policy, correct?

Right, and over time, this plays two roles. Once you verify and you understand what VMAN is doing (and we try to provide those steps as easy breadcrumbs to follow, showing where the logic’s coming from and why it’s going to benefit your environment), then you can start doing a little bit of automation if you want. You can schedule a bunch of recommendations, or you can select a bunch that you consider low-hanging fruit to execute at once. If you feel safe doing that in your environment, you can do those all at once, or you can schedule them for a later time and stagger them. Then you can start building recommendations into your business processes, into your IT processes, as a company, so it fits your environment.
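
Staggering a batch of recommendations across a change window, as Chris suggests, amounts to spacing their start times. The snippet below is a minimal sketch of that idea and says nothing about how VMAN schedules actions internally; `stagger_schedule`, its parameters, and the example recommendation names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def stagger_schedule(names: List[str],
                     window_start: datetime,
                     gap: timedelta) -> List[Tuple[str, datetime]]:
    """Pair each recommendation with a start time, spaced `gap` apart."""
    return [(name, window_start + i * gap) for i, name in enumerate(names)]


# Example: three low-risk recommendations, 15 minutes apart, from 22:00.
plan = stagger_schedule(
    ["resize-vm-a", "migrate-vm-b", "move-storage-c"],
    datetime(2024, 1, 6, 22, 0),
    timedelta(minutes=15),
)
```

Passing a zero `gap` corresponds to running everything at once; a larger gap leaves time to confirm each change before the next one starts.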

Thank you Chris, for enlightening us and showing us three core foundational skills put into practice via Virtualization Manager. Now you can leverage discovery to get a baseline visibility into your virtualization environments, then move to remediation via the recommendations engine and the default that comes with it, both in terms of the types and the strategies that are available by default, and lastly, leveraging that same recommendations engine, you can step into the optimization skill set, which allows you to maintain good in your virtual environment.

You can download the eBook, which contains the DART framework, which is everything we talked about earlier, discovery and troubleshooting, remediation, and all those tenets are what VMAN is based on. Again, there’s not one single magic bullet that’ll get you to that great state, but if you follow these steps over time, you should get there.

We’ll include links to the eBooks in our helpful resource section. From SolarWinds Lab, I’m Kong Yang.

And I’m Chris Paap, thanks for watching.