In this episode, Senior Manager of Product Strategy Chris O’Brien is joined by Head Geeks™ Patrick Hubbard and Leon Adato to explore better tools you can use for sophisticated monitoring challenges, like Cisco Nexus devices and automated mapping. The solutions are techniques that offer both depth of functionality and scalability. You won’t want to miss this!
You know, I think the thing that we like most about this gig is that nobody ever tells us that we’re done, that a tool is good enough.
Right, like Ping for uptime. It’s not good enough. You need to have either SNMP or WMI response time also as a minimum.
Yeah, but, isn’t it natural to revisit existing tools to identify areas that can be improved, or ways to leverage new technology in creative ways?
Well, right, okay. So, you mean things like taking NetPath and totally transforming that base technology, and reinventing Traceroute, and then giving it away to the world as a free tool to make it a better place, and to see joy and sunshine and puppies, right? That was what you meant?
Exaggeration is almost like breathing for you, isn’t it?
Ah, I just, I like to tell facts in an epic kind of way. Alright, okay, so let’s take another example. Network Insight. Now we did Network Insight with F5 back in 2016?
Right, and then a year later, 2017, ASA, Network Insight for ASAs was added, and now there’s deep management for Nexus. And NPM is going to try to Network Insight all the things by the time we’re done.
Perhaps, but I thought we were pretty clear Network Insight is bigger than just monitoring for F5 and ASA.
It is, and that’s why feedback is so important from the community, and it’s really kind of how we confirm what is a real need versus what we just think is cool. And that reminds me that, if you have feedback for us live during the show, just go ahead and put it over here in the chat box that you see. And if you don’t see that chat box, it’s because you’re not with us live. So, swing by our homepage. That’s lab.solarwinds.com. You can check it out for reminders for upcoming live episodes and you can check our catalog of previous shows. So, what do you think, Leon? Should we show them some new ways to solve challenges?
I’m up for it.
Okay, no exaggeration. This is going to change your life!
I think you mean, uh, from my personal perspective as Leon Adato, Network Insight for Cisco is going to change your life.
Right, yes. In my personal opinion, Network Insight for Cisco Nexus is going to change your life!
Yeah, personal opinion. That’s how you Geeks get your shows around legal, huh?
[Leon And Patrick] Yeah. It’s one of the ways. Mostly we just try to be really nice about it. Yeah, so we have a lot to cover today, including how to use the new Nexus monitoring tool in NPM, and some really cool enabling technology that’s also been added to NCM.
And don’t forget, we’ve got the new Maps and also a couple of other goodies.
Yeah, so what do you think, Chris? You want to help change Leon’s life? I think the first big question, really, is, what is the difference for Cisco Nexus? Why is it a challenge and why is managing it a little bit different?
Well there’s a couple things, right? But the first thing is, Nexus have a distributed architecture for their hardware. So, you’ve got this great big chassis switch. You put in some line cards and then you can also have something called a Fabric Extender, a FEX. So, this is sort of like a line card that you take out of the chassis and you put remote, right? And you connect it up to the chassis. Now that is, effectively, a remote line card. So, with that architecture, you can have hundreds, thousands of ports on a single Nexus.
And you’d, of course, want to have a bunch of ports for security and configuration.
That’s right, and so, in today’s Orion or yesterday’s Orion, that did not look so great. So, we would, you’d go to node details, you would see a list of your interfaces. You would see all the interface names–
And another list of interfaces and another list of them and [imitates explosion]
They each had slightly different information on each resource. So, you would tend to need multiples and it’s just a lot of text.
One of a hundred in the resource.
Yeah, there’s paging, all of that stuff. So, we worked on that, and we knew that going in. That was one of the first things we worked on. We created this new view called Interfaces. So, we’ll click into that, and one of the cool things is we did this for Nexus, but we also backported it to other nodes. So, you can get this on any node now, but it’s particularly useful on Nexus. So, this uses our list view, which is a component of our UI framework. The UI framework is important because it creates a consistent user interaction model across a whole bunch of different tools.
So NetPath and a bunch of the other really graphically intensive UIs are all using this setup.
And you know when you’re looking at it because usually the screen comes up first with the little Rubik’s Cube thing, and then that’s how you know you’re looking at that interface framework.
One, it’s got UI in the URL. Instead of saying Orion, it’s /UI.
Yep. So here we see that Interfaces subview. So, we’ve gone to node details for Nexus. We’re at the Interfaces subview, and we see our list view here. Now, you have a lot of interfaces on Nexus and we can control this in all the ways you would expect to control a large list. So, we’ve got page size, sorting, complex filtering, so I could look at, for example, only my up interfaces that had a critical utilization. You can pop those off as you see fit. So, all of this stacks together. You can also search, right? So, one of the common things on a Nexus is to search for a FEX or a slot number. So, you can type 1/ and you’ll get all of your 1/ interfaces. We also highlight anything that’s over the custom thresholds that you set, whether those are static, manually-defined thresholds or the dynamic thresholds. We’ll call that out over on the right. So, clicking into any one of these will bring you to interface details, which is the most detailed view we have about that interface, all of the information that Orion has about that interface. This view is not rocket science. It’s just a nice, polished, sort of sophisticated way of dealing with large lists when you have a thousand or 2,000 interfaces on a Nexus.
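As an illustration only, the stacked filtering and searching described here can be sketched in a few lines of Python. This is not how Orion implements its list view; the `Interface` class, the `filter_interfaces` function, and the sample interface names and utilization values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    status: str          # "up" or "down"
    utilization: float   # percent of capacity in use


def filter_interfaces(interfaces, name_contains="", status=None, min_utilization=None):
    """Apply every supplied filter together, then sort by utilization, busiest first."""
    result = []
    for itf in interfaces:
        if name_contains and name_contains not in itf.name:
            continue
        if status is not None and itf.status != status:
            continue
        if min_utilization is not None and itf.utilization < min_utilization:
            continue
        result.append(itf)
    return sorted(result, key=lambda itf: itf.utilization, reverse=True)


# Hypothetical sample data, including one FEX-style interface name.
sample = [
    Interface("Ethernet1/1", "up", 95.0),
    Interface("Ethernet1/2", "down", 0.0),
    Interface("Ethernet101/1/1", "up", 12.5),
    Interface("Ethernet2/1", "up", 97.0),
]

# Only "up" interfaces at or above an assumed critical threshold of 90 percent.
busy_up = filter_interfaces(sample, status="up", min_utilization=90.0)

# A plain substring search for "1/", as in the demo.
slot_one = filter_interfaces(sample, name_contains="1/")
```

A substring search is deliberately loose: searching for "1/" also returns the FEX interface Ethernet101/1/1, which is usually what you want when hunting by slot or FEX number.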
Okay, but, maybe it is revolutionary, in that, or, at least, re-evolutionary,
in that when we started, we mean us as admins, 24 ports, 48 maybe, that was getting to be a pretty good size switch, and so the idea of how many of those elements, graphically, would fit into an interface was scaled based on what we actually saw.
And dealing with the configs and it wasn’t having the ports. It was the care and feeding of the ports.
Right, and so now to say 2,000 interfaces, all configurable, where you need to search. Is it by those with errors, or some of the other data values on these? That’s something that just didn’t exist. It wasn’t part of our experience as admins. So, it is not just evolutionary. It is really, really helpful, especially in this context.
Yeah, I think so. So, we knew this problem going into it. We knew we would have to solve it and so forth. So, the other thing that changes is a Nexus has, if you ask yourself, what is the purpose of a Nexus in my data center? A big part of it is providing super reliable access-layer connectivity. Like an order of magnitude better access-layer connectivity in terms of reliability than other devices. This is part of why you’re paying 200, 300, $500,000 for a single Nexus, and then pairs are more expensive. So, we thought about this as a sort of defining characteristic of a Nexus. And the way that Cisco solves that is virtual port channel.
Yep, so this is their technology to make it so that, basically, you can uplink a single endpoint to multiple switches rather than just one switch. So, you’re removing one of the last single points of failure in your data center. So, super popular technology from Cisco. If a customer buys a Nexus, they will very often buy a vPC. So, we knew that was part of the deal, and we wanted to provide great monitoring for vPCs. Now, when we talked to end users and administrators of Nexus about how they were dealing with those and what their challenge areas were, they told us that the real problem is tracing down all of the components to a vPC. So, let’s take a look, sort of similar to other network insights, on the left-hand side, you have subviews for all of the technology areas that that appliance is providing to the network.
Decentralizing configuration’s always a good thing.
Or when [mumbles].
So, I’m going to click into the vPC view here and we’re going to start with what is a healthy vPC. What does that look like in the tool here?
I’m guessing green.
Yes, that is true. We know what green means. Green’s very important. So, if we hover over this vPC 46 here, all of these are found and enumerated for you. So, you go and add a node, and you add your CLI credentials, just like Network Insight for ASA, and we do all this for you. So, you’re not looking at a lot of work to get this.
So that’s automated discovery that’s a part of Network Insight for Nexus.
Yes, that’s right. So, this vPC 46 is healthy. And we can see that vPC 46, which is the logical connection from both Nexus, that contains within it, port channel 28. Port channel 28 is the logical connection from Nexus-1. And then that port channel contains three member interfaces. These are the real interfaces where you plug something into. Now, on the other side of that pair, you’ve got Nexus-2 and it has its own port channel 28 with its own member interfaces here. So, if you stack all of that together, there’s nine components that you have to inspect and understand to know how this vPC 46 is doing. It’s a lot of components. You have to map out how all of these are connected.
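The count of nine can be checked with a small sketch: one vPC, plus one port channel per switch, plus each port channel's member interfaces. The data layout and the member names below are hypothetical placeholders, not the names shown in the demo.

```python
def vpc_component_count(port_channels):
    """Count the vPC itself, each per-switch port channel, and every member interface.

    port_channels maps a switch name to the member interfaces of its port channel.
    """
    count = 1  # the logical vPC
    for members in port_channels.values():
        count += 1             # the port channel on this switch
        count += len(members)  # its physical member interfaces
    return count


# vPC 46 as described: port channel 28 on each Nexus, three members per side.
vpc_46 = {
    "Nexus-1": ["member-1", "member-2", "member-3"],
    "Nexus-2": ["member-1", "member-2", "member-3"],
}

total = vpc_component_count(vpc_46)  # 1 + (1 + 3) + (1 + 3)
```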
Or identify them as configuration, which could be anywhere along either of those devices.
How expert administrators of Nexus were approaching this is, it would just take them a long time to do show vPC, show run interface PO31, show interface PO31, show run interface ETH 17, and continue to do like a dozen commands and then you
On two separate, physically separate devices.
Then you log in to a second device and do it again. So, they could do it, these experts could do it, it just took a long time. So here we want to do that for you. That’s what, that’s the information set you enter into when you get to this page.
So, speaking from the accidental network engineer side of things, just to explain this, just to go over this one little bit, what you have is two physically separate devices, two Nexuses, and then you have line cards, like you said. And you’ve got three ports in each one of them that are channel bonded. They represent a single circuit, but there are three different plugs over here and over here. So those are now on two separate things and then you, logically, make those two things appear to be a single connection also for redundancy and security and high availability and performance and all of those things. So now you have to manage two separate devices, three plugs on each device, two virtual connections connected together.
It’s nine different components in this example, each with their own status, each with their own health and performance, each with their own configuration. It’s a lot to deal with. So, experts were sort of trudging through it. It just took a while and they sort of save it in their head or on Notepad. But the engineers who are newer to Nexus, this is really painful. They often don’t know how this works. Just don’t know how it works. So, the scary thing here is, some of the environments that we started looking into with this with beta, we found that the company had put forward hundreds of thousands, or millions of dollars to build a redundant data center. And then they put in all of these man hours to make sure it was configured properly. And then everyone went home, stuff started going down, but thanks to all of that redundancy, the vPCs were green, the services were green. So now you’ve lost all of that redundancy that you paid so much for and you’re one step away from a catastrophe, but your service is up right now, right?
So, that’s something we wanted to shed light on with this feature.
Well, and it parallels that same issue with stack switches where maybe you have power or data ring redundancy issues, so you’re not as redundant as you thought you were. Or, ASA, VPN, configuration on both sides of the tunnel. So, again, it’s that same sort of visualization of those non-obvious configuration issues. And I mean, how many times did we have customers with the beta where they end up, basically, just starting with this view and then going into status to start with what’s down or what’s in a warning condition–
because a lot of times it’s just a misconfiguration and they can quickly identify it out of 2,000 ports.
Yeah, that’s right. So, when you’re chasing that next nine in reliability, it’s super important to be very critical and methodical in how you’re managing your redundancy. So, we looked at what a healthy vPC looks like, right? Next, let’s, obviously, look at what an unhealthy vPC looks like. So, we’ll take a look at this vPC 30 up here, where we’ve got some sort of warning. We’ve already found in NPM here that PO30 on Nexus-1 is fine. That’s all green. Clearly, the problem is over here with my PO30 on Nexus-2. If I hover over that, I’ve got a couple of things going on here. I’ve got two down interfaces, and one with some sort of warning. What’s going on with that? High transmit percent utilization. That kind of makes sense, right?
It’s not going across anything else.
Yeah, half my capacity or 2/3 of my capacity is down here, so my remaining capacity is oversaturated. So that’s not surprising. The real problem is these down interfaces. Looks like this one may not be in monitoring. So, I can add that right here if I want to, but we’ve got two down interfaces. Those are really the root of the problem. And when I think about why an interface could be down, it’s really two sources, right? An egregious misconfiguration or some physical problem. The cable’s cut, there’s no cable plugged in.
That never happens.
Sometimes that happens. So, let’s click into one of these down interfaces and we’ll see what’s going on here. One of the other changes we made with Network Insight for Nexus is the configuration for an interface is pulled into your interface details view. So, we see that right here. It’s really nice to have the configuration in the same view where you have the performance and the status information.
This alone, I mean, the amount of joyful screaming in my house when I saw it [screaming], this is amazing. Forget about all the rest of the Nexus. This is awesome!
Yeah, it’s really simple, but this is how it should work, right? Because, as a network engineer, when you’re troubleshooting something, one of the easiest ways to baseline what’s going on is look at the config. Because you could apply hundreds of different commands that would change the behavior of this interface, right? And so, how am I thinking about what this interface is going to do in my sort of problem area could be hundreds of things. So, go check the config. Here we see on this interface Ethernet 116. We’ve got a description, which is like a comment, and we’ve got four lines that matter here. The ones that jump out at me are we’ve got an ipv6 access list out. And we’ve got two VLANs being allowed through this trunk and that’s it. So, nothing here is crazy. In fact, this looks like most of my other configurations for my interfaces, so I’ve got that warm and fuzzy feeling that there’s nothing crazy going on here, right? I feel good about that. So, it looks like this is a problem, a physical problem of some type. You can see here we’ve got some demo data problem where this is up. But, in any case, in this scenario it’s a physical problem of some type.
So that’s the configuration for the interface. But, obviously, we care about the larger configuration environment as well.
Yeah, so let’s head back to the vPC view. Now, one of the challenges with vPC is something they call inconsistency. This is a Cisco term. They’re basically saying that the way you’ve configured your ports or your port channels in the vPC, or the way those are behaving based on what they selected during negotiation, is not compatible with each other. It’s inconsistent. A good example of that would be one of the physical ports allows two VLANs and the other physical port allows three. And you’ve told the Nexus to treat them like they’re the same thing. So, I would send traffic for that third VLAN down the one that allows two VLANs, and the switch would drop that traffic, right?
So, big problem. We do detect that here. We’re not seeing that exact problem occurring in our demo space. However, we have the feature that helps you understand what’s going on and that’s that View Config button. So, with NPM, if the problem is occurring, we’ll tell you there’s an inconsistency. And then you can always, whether there’s a problem or not, click into this View Config. And what View Config will do for you is it will, basically, do the step that the experts were doing manually. It will take the config from the port channel on each side, and all of the member interfaces on each side, and put it side by side. This is an excellent and super easy way to spot differences. Immediately, I can see there’s a description on this one and the description is slightly different than the others. That’s fine. There is an allowed VLAN mismatch here, so that’s super interesting here. All of these are active. Everything else looks pretty good. The port channels look fine too. So, right off the bat, I know this is the thing that’s concerning ’cause it’s different from the others.
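To make the side-by-side comparison concrete, here is a minimal sketch of spotting an allowed-VLAN mismatch across member interfaces. It is not how View Config works internally. It assumes each interface's config is available as a list of text lines and that the allowed VLANs appear on a single `switchport trunk allowed vlan` line; the interface labels, VLAN numbers, and function names are hypothetical.

```python
from collections import Counter

PREFIX = "switchport trunk allowed vlan "


def allowed_vlans(config_lines):
    """Collect VLAN IDs from 'switchport trunk allowed vlan' lines (lists and ranges)."""
    vlans = set()
    for line in config_lines:
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        for part in line[len(PREFIX):].split(","):
            if "-" in part:
                low, high = part.split("-")
                vlans.update(range(int(low), int(high) + 1))
            else:
                vlans.add(int(part))
    return frozenset(vlans)


def find_mismatches(configs):
    """Return interfaces whose allowed VLANs differ from the most common set.

    The result maps each odd interface to the VLAN IDs that differ from the baseline.
    """
    per_interface = {label: allowed_vlans(lines) for label, lines in configs.items()}
    baseline, _ = Counter(per_interface.values()).most_common(1)[0]
    return {
        label: sorted(vlans ^ baseline)
        for label, vlans in per_interface.items()
        if vlans != baseline
    }


# Hypothetical member-interface configs from both sides of a vPC.
configs = {
    "Nexus-1 Eth1/16": ["switchport mode trunk", "switchport trunk allowed vlan 10,20"],
    "Nexus-1 Eth1/17": ["switchport mode trunk", "switchport trunk allowed vlan 10,20"],
    "Nexus-2 Eth1/17": ["switchport mode trunk", "switchport trunk allowed vlan 10,20,30"],
}

mismatches = find_mismatches(configs)
```

Treating the most common VLAN set as the baseline mirrors the reasoning in the demo: the interface that differs from the others is the one worth a closer look.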
Now, I don’t want to accuse you of cheating. And I do want to call out that this is the beginning of a dark theme.
But um, this feels an awful lot like something that you inherited from ASA. From the configs for ASA and sort of a beginning to analyze configurations as objects, not just as text.
Yeah, I think in the ASA, that was our first big step in interacting with devices via CLI and sort of slightly automating some of the troubleshooting steps that, particularly, experts were doing. And then providing that to you like, right when you showed intent to investigate that thing rather than you go and do a manual step.
Intent to investigate. I like that. I think I’m going to use that in an email next time I get a ticket open and someone’s asking, have you got the CIOs, like, are we on that? It’s like, we have
[Leon and Patrick] intent to investigate.
It’s not solved. I don’t plan to solve it, but I have an intent to investigate it. How’s that for non-committal?
What if you have increased urgency to intend to investigate?
So, speaking of intent to investigate, and the whole concept of digging into things, one of the other pieces of the Nexus that I think is very attractive to organizations is the security facet of it. That you’re able to bring security closer in to the port-level experience.
Or at least just attach it to the thing that should be secured.
Right, and Network Insight for Nexus helps address that as well.
Okay, and that really bore out in the user research. So, one of the big findings we had was, Nexus administrators were pushing security closer and closer to the endpoint, closer to servers. So, this materializes as more port access lists, PACLs or VLAN access lists, VACLs. So, we just recently worked on access lists with ASA, right? We did a ton of work with that with the new rule browser and optimization, stuff like that. We, basically, took that functionality, extended it to the Nexus, and added on top of it. So, two key adds are, we do MAC address lists now. We also do non-contiguous subnet masks to the extent that you use them. So, if we look over on the left-hand side again with that sort of subview concept here, we’ve got an access list subview. We’ll drill into that. The first thing I’d like to highlight about the access list view is, we version control each access list. So, again, like the list view, this is not rocket science, right?
But you keep saying that.
Yeah, but this is how it should be. So, if you think about a simple question like, how has my access from this VLAN into my organization changed over time? Before, that would have been really hard to answer. What would you do? You’d look through your change controls and parse out commands. Or you’d go and look at the version control for the whole config, and this device has thousands of ports, so there’s a lot of noise. By version controlling each access list it becomes a very direct question. You just look at the access list you’re interested in and you can see all the history of what changed and when. So, all of your access lists come in here. Again, no extra configuration. Now, there’s something going on with this acl-01-ipv6. We’ve got some warnings over here. So, I’m going to drill into that. And we’ll go into the rule browser view. So, this is the deeper analysis on a single access list. You can see we have our filtering and our sorting options as you would expect: line numbers, which is how we process, and hit count, maybe a way to zone in on the rules that aren’t being used anymore. We’ve got search, which is a great way to find object groups and all that sort of stuff. We’ve still got the object group enumeration. So, you click on an object group, we tell you all the members. All of that sort of stuff that you saw with the ASA. There is one thing I’d like to highlight because I think it bears using. This works for ASA as well, but it’s super effective. So, basically, we’re looking at a simple access list here and we’ve got some sort of problem already. Not good. If I hover over this, you see it’s warning about a fully-shadowed rule. So, I’m going to click to get some more detail here.
And, just to point out, that that language is consistent with the language that was used in the Network Insight for ASA. We’re not inventing new language, whether it’s a duplicated, or a shadowed, or whatever terms we used, still work here so folks who are used to the ASA world and now are starting to manage Nexuses.
And it’s also an industry standard term. So, if you’re talking about a firewall where, so where you would have a lot of policies, that is exactly the same thing that you have going on here.
Yeah, so it’s kind of like a two-by-two square where you have two pairs of terms. The first pair is fully versus partially, right? So, if you think about access lists, the way a Nexus processes an access list, as you guys know, is it will take a packet and compare it to each line in your access list, one after another. Once it finds a match, it takes the action specified. Permit or deny. So that’s how an access list works. Fully means that, basically, you have some line specifying this traffic and then, below it, you have a subset of that traffic that you’re trying to act upon. And because the first rule already took action on that traffic, the second rule doesn’t matter. It won’t ever have any effect. So, this is a chance that maybe there’s some sort of misconfiguration or optimization opportunity. So, that’s a full shadow. A partial shadow is when only some of the traffic overlaps. The other piece that’s important is whether this is an overlap or a shadow. The word, overlap, basically means that the rules are trying to do the same thing. So, both rules are permitting [mumbles], you just can’t permit the traffic twice. It doesn’t matter. Maybe you remove the second rule to make it a little bit easier to maintain and think through. But the really dangerous one is the shadow one, right? That’s where the rules are trying to take a different action.
So, close on value.
Yes, so clearly, like one’s trying to permit, the other one’s trying to deny, but the traffic isn’t getting to the deny. So, this is a contradiction in how the device is behaving versus what the administrator appeared to intend to configure it to do.
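The two-by-two idea (fully or partially, overlap or shadow) can be sketched with Python's standard `ipaddress` module. This is a simplified illustration, not the rule browser's actual analysis: it assumes each rule is just an action plus a source and destination CIDR network, ignores ports and protocols, and does not handle non-contiguous masks. The `classify` function and the sample rules are hypothetical.

```python
from ipaddress import ip_network


def classify(earlier, later):
    """Classify how an earlier rule affects a later one.

    Each rule is a tuple of (action, source_cidr, destination_cidr).
    Returns None when the rules never match the same traffic, otherwise a
    pair: ("fully" or "partially", "overlap" or "shadow").
    """
    e_action, e_src, e_dst = earlier
    l_action, l_src, l_dst = later
    e_src, e_dst, l_src, l_dst = (ip_network(n) for n in (e_src, e_dst, l_src, l_dst))

    # No common traffic: the earlier rule cannot interfere with the later one.
    if not (e_src.overlaps(l_src) and e_dst.overlaps(l_dst)):
        return None

    # "Fully" means everything the later rule matches was already matched above it.
    covered = l_src.subnet_of(e_src) and l_dst.subnet_of(e_dst)
    extent = "fully" if covered else "partially"

    # Same action is an overlap; a different action is the dangerous shadow case.
    kind = "overlap" if e_action == l_action else "shadow"
    return extent, kind


permit_wide = ("permit", "10.0.0.0/8", "192.168.1.0/24")
deny_narrow = ("deny", "10.1.0.0/16", "192.168.1.0/24")
permit_narrow = ("permit", "10.1.0.0/16", "192.168.1.0/24")
unrelated = ("deny", "172.16.0.0/12", "192.168.1.0/24")

full_shadow = classify(permit_wide, deny_narrow)
full_overlap = classify(permit_wide, permit_narrow)
partial_shadow = classify(permit_narrow, ("deny", "10.0.0.0/8", "192.168.1.0/24"))
no_conflict = classify(permit_wide, unrelated)
```

In the first case the deny can never take effect, because the wider permit above it already matched all of that traffic, which is exactly the contradiction between behavior and intent described here.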
Right, and whenever you have a serious fail with security, it’s always that, sort of, occluded intent. Where you think you know how it’s going to work. The code example is like unreachable code, in the previous example of, it’s basically blocked or it’s denied by the first rule and that second rule will never happen. But, often in security, it’s, we had something right, well, maybe we even got it right the first time, and then we’ve gone back over time and adjusted it or created some particular ACL for an edge case, and that’s where we end up in that situation where that intent is not clear.
Yeah, these access lists grow organically over time and they get really complicated. So, we’re seeing a really simple example with two lines here. But in production environments, you usually have dozens to hundreds of lines, and so this sort of optimization gets really complicated. There’s a lot of math. If you think about allowing an object group with 20 subnets in it to an object group with another 20 subnets in it, it’s 20 times 20. For that single line, you have to calculate 20 times 20 in all the subnet masks. So, I think that math is something for computers to do and for people to understand. We understand what’s going on, but computers, please do that math for me. I’ve done this manually myself a number of times with access lists with a hundred lines, and you stare into the access list until your eyes bleed,
Into the abyss.
and you’re crying
You stare into the security abyss.
It just takes hours and hours, and there’s a lot of risk involved because humans, not that great at math. So, having it here in a way where it just pops up, I’ve expressed no special configuration to this and I get the full details. So, this is really satisfying for me.
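The 20 times 20 arithmetic is easy to see in a short sketch. The subnets below are made-up placeholders; the point is only how quickly the number of source and destination pairs grows for a single line.

```python
from itertools import product

# Two hypothetical object groups with 20 subnets each.
sources = [f"10.0.{i}.0/24" for i in range(20)]
destinations = [f"10.1.{i}.0/24" for i in range(20)]

# One access list line from one group to the other expands to every pairing.
pairs = list(product(sources, destinations))
pairs_per_line = len(pairs)  # 20 * 20

# Checking two such lines against each other, pair by pair, multiplies again.
comparisons_between_two_lines = pairs_per_line * pairs_per_line
```

That is 400 pairs for one line, and 160,000 pairwise checks to compare just two such lines by brute force, which is why this is math for computers rather than people.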
I know that I’m really excited about talking about the scalability improvements in what we’ve been doing because I’ve been writing a lot about it lately. In fact, very soon I’ve got a couple of white papers and an eBook coming out that I’ve spent a lot of time sort of crafting. So, obviously, I’m excited to see it come out. And the THWACK Mission this month is all about scalability. So, those are all the reasons why I’m excited about it, but somehow, I imagine we have some other, more altruistic reasons, less about my ego, perhaps.
Yeah. We don’t actually make all our product decisions based on Leon’s ego, which is good for our customers.
So, one of the things that we found is many of our customers are growing. This is not super surprising, but people are growing in their career and they’re taking us into larger and larger environments. Oftentimes, we will also have an instance here or there, sort of a departmental installation of Orion, some Orion tools. And then, over time, that installation grows and then you get others in different areas of like Fortune 500s we’re in, and then, all of a sudden, it’s like, why are we paying for Orion six times? And we start looking at consolidation, right? So that consolidation and growing with the customer who’s growing in their career, we want to make sure that our tools continue to provide a great experience for them.
For those situations, we’ve always offered the Enterprise Operations Console. You can take multiple installations and then combine them, right? But I feel like most customers don’t want that. They want a single installation.
Yeah, Enterprise Operations Console combines all of the monitoring visibility, but it does not combine the configuration. So, a lot of these customers would prefer a single instance of Orion in their environment.
Right. Now, one of the things that you mentioned to me earlier was that the way that we approached this was kind of two pronged. One was the architectural side, but also there was a talking-to-the-customer side, which is always nice.
Yeah, so going into each one of those. We knew, right off the bat, that the element count was a big problem. We were at a hundred thousand elements. Now, with this release, we’re at 400,000 elements that we support. So, four-fold what we could do before. That really was driven by our engineering teams and our architecture team so, looking through our code to find all of the bottlenecks, the areas where, in production environments, it does not have the capability to go past a hundred thousand elements. And then solving those. We also built some tooling and instrumentation on our side, so that we can test at a large scale in a consistent fashion, repeatably, and all that sort of stuff, so that we can make sure going forward, as we release new software and so forth, it still scales well. So that’s sort of the engineering side.
Okay, but, going back to my point earlier about I’m not sure if that’s not revolutionary. I think we have all been involved with software for a long time. I think you have all had many different types of software, not just your monitoring and management tools. And I think the biggest one, in terms of scalability and performance and just making things work better over time, things tend to get more complicated over time, not less, so to be able to actually improve performance, especially four X, is huge. And it is hard. And the amount of work, I mean, how long, this is a couple of years’ worth of work, right?
All bundled up. You always are involved in the releases where you sort of sneak things out. Like, hey, NetPath. So, same thing. Hey, four X improvement in performance. But this is based on a ton of feedback from you, and very specific performance feedback about where you would expect to see things work better. And not just be able to scale to much, much larger environments, but to be more efficient on the hardware that you already have.
And the customer engagement. So, you talked about the architecture. Right?
But the customer conversation piece is also that. That we wanted to solve, we wanted to find out what the real problems were on the ground, and part of the problems weren’t even technically oriented. We just came out with an optimization guide that we worked really, really hard to put together because people were constantly saying, “Love the tools, I just, I don’t know how big to make it.” I don’t know what to provision.
Or the cloud install guide. Right?
So, not just increases in overall scale, but increases in the scale of complexity.
Right, so, and we’ll have links to those things in the show notes, so don’t worry about like Googling it right now or anything like that. But I think we’ve danced around it, we’ve sort of hinted at it, but what are the scalability improvements that we’re going to see? Let’s start off with the old. Okay, so, the way things were before, I’m not saying the bad, old days, but, in the days of yore, we had a hundred thousand elements. We had 20–
A hundred thousand elements per instance.
Per instance, right.
So, for customers that were already many, many times more than that they would use EOC.
Twenty additional pollers, which a lotta folks didn’t know ’cause they didn’t get to that point. Let’s see here, we’ve got, oh yeah. You could install SolarWinds in the cloud, but it wasn’t really of the cloud. It was cloud present, but not necessarily cloud happy.
Well, or it just required you to, maybe, know a little bit more about a cloud environment and tuning for that environment. It worked exactly the same way and we have a lot of customers who’ve actually migrated Orion, the Orion Platform, along with the rest of their apps, sort of depending on where they are in that migration to cloud. But a defined, this is the approved way to use this in that environment, that wasn’t documented.
Right, so let’s take it backward. So now, the improvements for cloud, like what have we, what’s the new version doing to address that?
So, the first big thing is for Amazon, for AWS customers, we now support Amazon RDS as your primary database server.
I knew that’d make you happy.
So that’s your primary database server now, which is great. So, you’re officially supported. Good to go on Amazon RDS.
Okay, and we also can install pollers on Amazon EC2 and also as Azure VMs. So that’s the cloud piece.
It’s funny, lots of people are migrating to the cloud and, particularly, to Amazon and Azure. And when we inspected, we found that, actually, we had several hundred customers already running in Amazon, which was a little reminder to us that we’re a little behind. We need to go and certify some of these and have the official support.
Right, so that’s one thing. Talking about the pollers. Again, working my way backward up. Pollers, we now support a hundred additional pollers–
Yep, not 20, but a hundred.
And for elements, this is the big number that I love saying, 400,000 elements in a single instance. So, four times improvement.
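As a quick check on the numbers just quoted, the short Python sketch below works out the improvement factors. The limits come straight from the conversation; the dictionary and function names are made up for illustration.

```python
# Per-instance limits as quoted in the conversation.
OLD_LIMITS = {"elements": 100_000, "additional_pollers": 20}
NEW_LIMITS = {"elements": 400_000, "additional_pollers": 100}


def scale_factors(old, new):
    """Return how many times larger each new limit is than the old one."""
    return {name: new[name] / old[name] for name in old}


factors = scale_factors(OLD_LIMITS, NEW_LIMITS)
print(factors)  # {'elements': 4.0, 'additional_pollers': 5.0}
```

So the element limit is four times the old one, and the additional-poller limit is five times the old one.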
Well, and just from the cloud perspective alone, the increase in the number of pollers, I think is huge. Because that ends up being not so much about an expansion of overall numbers of things that are monitored, ’cause that’s more about transition, but there’s a lot of dislocation and a lot of breakup where you’ve actually got dedicated cloud environments that are, maybe, differentiated by features. Maybe you’ve got some in AWS and some in Azure and some in Google, and so, it’s now not so much that you’ve doubled the number of things that you’re monitoring, you’ve doubled or tripled the number of environments for that monitoring, where you want to have low-latency disconnect-tolerant monitoring where a poller would really be appropriate.
Right, and that doesn’t even get into the things that we’ve had for a while, which are also scalability features. Again, we’ve mentioned EOC. There’s the stackable pollers. There’s–
Additional web server is a big one.
Additional web server is a big one, and people don’t realize the impact it has. Agents, a lot of people don’t think about agents as being a scalability piece, but it has store and forward. It’s a way to extend the environment in a particular way.
Are you, are you
Is this the part where you get to ask me if I’m somewhat excited that there’s an agent for ARM?
I was going to, I was just getting there. I was going to say about the agent for AIX, ’cause I’m excited about that one, but, yes, agent for ARM, Raspberry Pi.
Yeah, but it’s really shocking. When we rolled that out we thought, “Hey, this is nice” because people are going to want to be able to use a Pi as a monitoring agent, sort of a physical monitoring agent or environment monitoring agent inside of a data center. We had no idea how quickly that was actually going to be adopted and how many of you are actually doing that.
Right. Last piece I want to point out is sort of a caveat installer thing. The word deprecated. The word deprecated does not mean not supported. The word deprecated means that it will, at a future date, not be supported, but it is supported right now. And the reason why I mention that is because Windows 2012, SQL 2012, are being deprecated in this version. We know that upgrading, especially at the OS and SQL level, is challenging for a lot of our customers.
It takes time to approve. It takes time to get into the environment. All those things. So, we’re telling you now, well ahead of time, this is the last version that’s going to support those versions. So, be aware, know that.
Except I think, I think they’ve figured out the secret. And it’s funny ’cause I had someone from the THWACK community come up at, what was it, it was re:Invent in November, and actually said, “Yeah, I figured out when SolarWinds transitions on to the next version of SQL Server.” And I said, well, that’s really interesting ’cause I’m not entirely sure myself. And he said, “Well, every time there’s a big increase in capability in SQL Server, it’s 18 months after that.” That it’s deprecated 18 months after, and then support usually is a couple years after that. But yeah, so, in this case, 2016 is offering a whole lot of capability that’s going to be really great for everyone in the future, and so this is the beginning of getting everyone to move toward that.
It’s a tough decision because there’s a lot of people still using 2012, but, if we preclude ourselves from requiring 2016, then we really deliver old technology. We’re just always building on old technology. So, this release deprecates 2012. One other thing that’s coming along with this release is NTA 4.4, which actually requires 2016, or you can stay on NTA 4.2.3, 4.3. With 4.4, you get a new database engine on the back end. So, we moved that to MS SQL using columnstore index storage. So, a technology that’s specific to 2016. So, particularly in scale environments, NTA queries go faster. And what that means to you is NTA webpages load faster.
So, it’s basically built-in FastBit that would have been in external flow storage that’s just included in your–
So that covers scalability really well. And I think that we’ve covered enough for today.
No, no, no, no, you are not going to take us out until we talk about Maps. I cannot imagine how I would manage an environment without using Maps and we’ve added a new type of map in this release, and Chris is here, and we have to talk about that. Because we now have, what, four different types of maps that you can use in different situations, and I want to just briefly cover how they’re different and what you would use them for.
Okay, then, here we go, Maps.
So, people love Maps. So, let’s start with the new one. You’ll notice on many entity pages, so an entity is things like a node, an interface, a server. All sorts of things–
that owns another thing, or, at least, is related to another thing.
Sure, yeah. On many entity pages, if you look on the left, you’ll see a new subview, we like our subviews in this release, called Map. So, I’m going to click into this. And the first thing to note here is you’ve got nothing to configure. [laughs] So, one of the things that we found in research was, lots of people make maps and then lots of people let them go out of date. And so, they become less and less useful. You’d look at a map when you’re troubleshooting and you’re like, “I know this piece is wrong. Ah, I’ll just ignore the whole map ’cause who knows what else is wrong?”
Well, but I don’t really think that’s, that’s part of it, but also, like with NetPath, the whole point of it is the network is changing so quickly you could not possibly manually configure it or set thresholds or anything else.
Yeah, that’s the root cause.
Yeah, so automapping, in that case, is absolutely necessary and so, we got so much feedback from users that said, well, can you just extend that same capability to relationship mapping, that that’s a really big part of what drove this.
Yeah, particularly in scale environments that mapping automatically is important. So, what we see here is we’ve got a map subview for this lab-transit-switch2. We can see there’s all sorts of different things going on here. This left connection is a dependency. So, we’ll map those out. We also do our topology connections. Basically, all of the data that we’ve been building up about relationships for years in network topology, in AppStack, which has a bunch of different types of relationships, all of that shows up here. So, when you click on Maps, you have the node you came in on, or the entity you came in on, centered, and then everything that’s connected to it directly attached, and we get some little bubbles that tell us more. Of course, we can click down to inspect and get more details about any of these things. It’s all linked to the deep data we have in the details pages. So, you can click for the node details or interface details or volume or whatever you’re on.
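To make the idea concrete, here is a minimal Python sketch of that kind of entity-centered map: start from one entity and return everything directly attached to it, tagged with the relationship type. This is an illustration only, not how the Orion Platform implements it. Apart from lab-transit-switch2, the entity names, the relationship list, and the functions build_adjacency and entity_map are all hypothetical.

```python
from collections import defaultdict

# Hypothetical relationship records: (entity_a, entity_b, relationship_type).
relationships = [
    ("lab-transit-switch2", "lab-core-router1", "topology"),
    ("lab-transit-switch2", "lab-app-server1", "dependency"),
    ("lab-core-router1", "lab-edge-fw1", "topology"),
]


def build_adjacency(records):
    """Index undirected relationships by entity for fast neighbor lookup."""
    adjacency = defaultdict(set)
    for a, b, kind in records:
        adjacency[a].add((b, kind))
        adjacency[b].add((a, kind))
    return adjacency


def entity_map(adjacency, center):
    """Return the centered entity plus everything directly attached to it."""
    return {"center": center, "neighbors": sorted(adjacency.get(center, set()))}


adjacency = build_adjacency(relationships)
switch_map = entity_map(adjacency, "lab-transit-switch2")
print(switch_map)
```

Only one-hop neighbors appear, so lab-edge-fw1 (two hops away in this made-up data) stays off the map for lab-transit-switch2. Because the map is derived from the relationship data each time, there is nothing to configure and nothing to go out of date.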
And so, in this case, also these pop-up performance metrics. These are going to be based on the type of relationship, right?
Yeah, absolutely. Like, you want different information depending on the type of relationship. Here we’ve got traffic utilization on a network topology relationship. So that’s one example of that. The other example is when you look at a group. Now it becomes super interesting, right, because everything in that group is something that the administrator has told us is related. So, we can put that all on one screen here. And, you can see some of our fancy zoom changes here and some of our re-layout options. We can see there’s sort of a tree structure here. There’s quite a bit going on. And this contains everything in the group and what’s directly connected to that.
So, whether that’s a manually-created group, or a custom query-based group, or whatever it is, you can now map and display those relationships ’cause you know they have something to do with each other.
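The group case can be sketched the same way in Python: take the group’s members, then add whatever is directly connected to any of them. Again, this is a rough sketch under assumptions, not the product’s implementation; the group membership, the entity names, and the group_map function are hypothetical.

```python
# Hypothetical relationship records: (entity_a, entity_b, relationship_type).
relationships = [
    ("lab-app-server1", "lab-db-server1", "dependency"),
    ("lab-app-server1", "lab-transit-switch2", "topology"),
    ("lab-transit-switch2", "lab-core-router1", "topology"),
]


def group_map(records, members):
    """Return the group's members, their direct neighbors, and the edges between them."""
    members = set(members)
    nodes = set(members)
    edges = []
    for a, b, kind in records:
        # Keep any relationship that touches at least one group member.
        if a in members or b in members:
            nodes.update((a, b))
            edges.append((a, b, kind))
    return nodes, edges


nodes, edges = group_map(relationships, {"lab-app-server1", "lab-db-server1"})
print(sorted(nodes))
print(edges)
```

The members could come from a manually-created group or from a query-based one; the mapping step does not care, which is why it keeps working as a dynamic group changes.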
There’s a ton of detail and data available through inspection here. But that’s a quick look.
Okay, so just to kind of wrap this up again. It’s not exactly a debate between having more in-depth views or scaling. But they really are connected, right?
So, you could say that, hey, our being able to visualize 2,000 interfaces in a Nexus is actually a deeper view into Nexus, but it allows you to scale your use of that system. In this case, especially if you have dynamic groups, for example, those are the ones that you’re never going to have time to create a map for. Having a map that’s automatically mapping the objects and the relationships in that group allows you to really scale out your use of dynamic groups. And an increase in performance makes the queries that actually connect those groups, and that do the discoveries for automated mapping, for PerfStack and AppStack and all the other areas where we’re seeing automatic connection between objects, perform even better. So, I’m not entirely sure that there’s really a debate between depth of capability for specific things like Network Insight for Nexus and a large increase in performance that makes it easier to scale out to a larger environment or, at least, makes a more complex environment that you already have run better on the same environment.
Right, it’s the combination of, it’s not an either/or, it’s a both/and, right? You’re getting the depth as part of the process for building out the scalability.
Yeah, it took us, what, 20 years, but I think we’re getting there finally. [all laughing]
We are getting there. Now there’s one more case of the right tool for the job that I want to talk about before we close out. And that’s finding the right tool for the job of learning how to get the job done.
Ah, so training.
Exactly. It’s another example of SolarWinds providing new choices to solve common challenges. Now, if you take a look at the SolarWinds Success Center, you’re going to notice that, at the top, there’s a link that says “Training.” That’s going to take you to a list of e-Learning and instructor-led sessions.
Yeah, and I think the easiest way to get there is just Google SolarWinds Success, it’ll take you right to this page. But what I really hear you saying is that, regardless of what the challenge is, right, it’s hardware, software, meatware, that we’re always working on ways to make sure that there’s a way to solve that, and that can include products or training.
Uh, wait, wait. Meatware?
Yeah, meatware. You know, like wetware, the human-chair interface.
Okay, yeah, right. So, regardless of what biologically-based system you used to watch us, I want to thank you for joining us. For SolarWinds Lab, I’m Leon Adato.
I’m Chris O’Brien.
And I’m Patrick Hubbard. Thanks for watching.