
Episode Transcript
Whoa! So, you're telling me this not only shows me that my Azure SQL Database instance is down, but that the problem really is the network. Oh, I'm so loving this. Hold on, hold on, wait a minute. I'm going to tweet this. Get a photo. Hold on. You're missing the NCM magic. Because you guys are all up in the NetPath. You caught the fat finger that actually broke the link. You can remediate that with NCM. Yeah, but I mean, we see the outage, but what do you mean by a bad config? You guys do have NCM and NTA installed, right? No, I have DPA and NPM. And I installed NPM and NTA but I haven't gotten NCM yet. Oh my goodness, guys. Okay, NetPath is cool; I'll give you that. But without NPM's Super Friends like NTA and NCM, you're missing the whole awesomeness of the NetPath integration. Let me show you what I'm talking about. However, you need to get NTA and NCM installed like yesterday. Okay, so let me get this straight. I'm going to be able to monitor my databases, my earth databases and my cloud databases, and I'll know when the Network is giving me some grief and I'll know when some Network monkey has screwed up a config so I can blame and mock. I am so in. You realize I almost had him buying the whole it's never the Network thing, right? You realize that I'm one of those Network monkeys, right? Oh. Now focus. We've got a lot to cover here. Let's introduce the new superhero modules. Let's first say hi. Actually, first let's watch this. [Dramatic music] Only brave individuals woven in courage and forged in the fires of Geekdom may seek the grail. It has become a symbol. A symbol of hope, courage and Network unity. Many will dream. Only the chosen few will earn the right to don the lab coat. For they are no longer mere mortals, but SolarWinds Head Geeks. [Laser effect] There's a new Head Geek? Oh yes, surprise, surprise. Surprise. This is the worst kept surprise in SolarWinds history. Absolutely. Welcome to the team. Our new Head Geek. Destiny Bertucci.
[Applause] It is so overdue and, Leon, you've been teasing. You've been using the coat with her in promos for a while and we are so glad to finally be able to do that. You've been a huge part of the THWACK community since the very beginning and it's great to have you on the team. I just really appreciate you guys allowing me to come on here. I've been waiting so long to be a Head Geek so I'm so excited. Yeah, well, now maybe you'll get some travel budget. We can get you out to even more shows. All right, so you guys have a lot to cover. We'll get out of here. Tom, kick it off. All right. Hi, I'm Thomas LaRock. I'm Leon Adato. And I'm Head Geek, Destiny Bertucci, and welcome to SolarWinds Lab. We know you're going to have lots of questions so put them in the live chat window to the right. If you miss the live event, sign up for the next one and check out our backlog of shows. Everything you need is on our homepage. Lab.solarwinds.com. Okay, let's give them what they asked for. How to use NCM and NTA with NetPath. Okay, so we're going to start with NetPath and show you the integration with NCM. So you can actually go into the dashboards, Network, NetPath Services. From here, I'm going to look at the Salesforce because I know there's a router configuration error and so that's where we're going to start. So one thing great about NetPath is that it stores the information, obviously, that you have. So we know back here there is an issue. We'd already remediated it. So I can go here. And see the configuration change that actually pops up? [Laughs] So this is actually tied into the NCM. So the goodness that I always say is wrapped into the integration is that we're on the backside with NetFlow information and NCM information. So, by here, by actually seeing the broken link we know that there was a configuration change that correlated with the interface as well as these two devices.
Right, so just to be clear, if we didn't have NCM and NPM integrated on this system, then we would see that obviously there was an outage. We would see that the link was down. But we wouldn't get this little pop up that tells us why that's going on. Okay, so I imagine that you can do something about it now. Right, so if you click on it, it'll actually show you the comparison of where the config change is. So then, it will actually show you right here. It's the old traffic shaper gag. Exactly. So if you accidentally leave out or add too many zeros, this can cause a problem and obviously, you'll have a routing issue. Leon. [Leon and Destiny laugh] Right. So if your database was on one of the sides and everybody is calling in, this is a time when you can say, "Leon." Okay, I see where this is going. So with NCM being here available with the NetPath, you're able to see configuration changes, you're able to see the correlation between the interfaces and then you can actually drill in like you've seen and compare them. So automatically out of the box, this is already helping you visually to troubleshoot your Network errors, so you can keep this guy happy as well as yourself. Right. So what's next? All right, so we can also get into binary configs, because why is that important when we're talking about NetPath or you're talking about F5s? Well, and I actually want to differentiate because I've had a number of people ask me, oh, binary configs, that means you can back up the actual binaries that are the IOS images. Right? That's not what we're talking about. We're actually talking about the configuration files themselves. That happen to be in a binary format. That is correct. So we can actually go into the configuration management and show you how they do it just the same way. Like a lot of people kind of wonder, okay, is there a different process; is there something else that's going to go around there? They're just like the regular templates that we've always had.
You just assign it. It'll know it’s a Cisco. It'll automatically back up how you already have it set up. So it's not anything that we haven't seen before. But we can go into the F5 for example. And I'm going to actually download the config. Then you can actually go into the transfer status. That page makes me happy. Okay, so that's complete, we're going to go ahead and look at it. So basically, it's the same thing that you recognize from a normal config file. Now this is important because a lot of the time when they download the actual binary configs you can do that offsite, like to FTP, SFTP, TFTP, things of that nature. Have them as backups; roll them back out. This is not any different than the other config backups. It just means that we are taking NCM to the next level and able to do these configs that are in a binary format that previously you haven't been able to get because they were in text only. Exactly. Now when you talk about F5s themselves, when we go into the load balancing portion of it, a lot of people, when we gather information from NCM, we integrate it with the NPM and then you have the load balancing information that you have here. So it's in the stack layers. So you can drill into the devices. So if it's an F5 device that you're managing you can pull the configs off of it. Now I have to say this looks a lot like the Appstack screen. Which is, you know, wonderful. People love looking at that especially as they buy more modules and they see more data coming in there. So yeah, what are we looking at here? Okay, so it's the same, like Appstack. When you drill into these, it'll keep you tied into here. So if I go into here to the virtual servers, it'll actually track the services, the pools, the members that go into it once you drill into these. So this one here, this Virtual_adfs_14697, if you click on that does it gray out the ones that aren't involved? So it'll actually bring it to its own.
So now it'll actually show you within the balancing environment, it'll actually show you. Is it connected to the GTM, the LTM? And which pools and members are actually associated. So like with here it's having a problem because it's saying actually that the pool members are down. It sees the one that is down and this one is in an unknown state. But we know that it's up from NPM. So what's important about this, as Chris O'Brien has said many times, is that we're monitoring from an F5 version, how an F5 sees itself. So that's key because an F5 will only balance off of what it thinks and what it knows. We're actually monitoring these pool members so if you look here, we know this is an up node but F5 doesn't think so. So that's how you can troubleshoot it and then go in and make sure that its members are backed up and how you have everything set up with your F5s. Nice. So that's how the F5s actually go in. So we've taken it to where you can go into them. Drill into them. Same concept as you have seen with the Appstack. We keep you together with everything as you drill down. We'll actually show you how F5 thinks. We also show you how NPM knows and then you have the configurations with the NCM. So when you combine these together, you actually see the bigger picture instead of just part of it. So, something that's also great with the configs is that compliance STIGs came out. So we can actually look at these and there's like 134 STIGs that are actually available. What is great for people who are not federal, or somebody who asks, "Well, what is a STIG? Why do I need these things?" Well, a lot of people are just starting off creating their security policies. A lot of times you're kind of curious: maybe my policy is not up to par or, you know, am I really, you know, monitoring what I need to, are things actually set up correctly with my security? You can use these as a benchmark. You can use these as a template. Because you can actually go into these and change them.
You can copy them. Take out some of the rules. Create new policies and then roll those policies up into your own report. So when we add all these in from the DISA STIGs, it's important to always kind of at least acknowledge that they're there. Because it gives you a foundation that you can actually stand on to create your own policies within your corporation. Right, again these are just the government standards for what they feel individual device types should have configured or turned off or what have you. So you know, as a best practice or as a launching point for your own internal policies. Because we've gone over this in the past, we had almost a year ago now a whole Lab episode on NCM where we talked about taking these templates and then modifying them and then adding things for your corporate banner or whatever legal requires you to say or what have you. So this is just, we've just expanded that list of reports to include even more standards. And something that we always harp on is the pyramid. So I mean you have the actual rules, and then the rules go into the policies, and then the policies go into the reports. So here, say you're wanting to create a new report, you can check different policies and add them to this report. Very simply. And then all you have to do is submit that and run the report and you're able to see all your information that goes there. And we'll do that from here. Okay, so I'm just going to go to this one and actually view the report so you can kind of get the idea of what all those policies and rules in the background are actually doing. So when you look here you actually get the critical, the warning, the information from the reports, and the rules which are in violation. And the cool thing is that you can actually click on them and you can actually execute the remediation script on that node itself or on all the nodes in violation.
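The pyramid described here (rules roll up into policies, policies roll up into reports, and a report lists violations by severity) can be sketched in a few lines of Python. This is only an illustration of the idea, not how NCM implements it: the `Rule`, `Policy`, and `Report` classes, the regex patterns, and the sample configs are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    severity: str     # "critical", "warning", or "info"
    pattern: str      # regex searched for in the config text
    must_match: bool  # True: pattern must be present; False: it must be absent

    def violated_by(self, config: str) -> bool:
        found = re.search(self.pattern, config, re.MULTILINE) is not None
        return found != self.must_match


@dataclass
class Policy:
    name: str
    rules: list


@dataclass
class Report:
    name: str
    policies: list

    def run(self, configs: dict) -> list:
        """Return (node, policy, rule, severity) for every violation found."""
        violations = []
        for node, config in configs.items():
            for policy in self.policies:
                for rule in policy.rules:
                    if rule.violated_by(config):
                        violations.append(
                            (node, policy.name, rule.name, rule.severity)
                        )
        return violations


# Hypothetical rules and device configs, for illustration only.
banner = Rule("Login banner present", "warning", r"^banner login", must_match=True)
no_telnet = Rule("Telnet disabled", "critical", r"transport input .*telnet", must_match=False)
report = Report("Baseline", [Policy("Access", [banner, no_telnet])])

configs = {
    "rtr-a": "banner login ^CAuthorized use only^C\nline vty 0 4\n transport input ssh\n",
    "rtr-b": "line vty 0 4\n transport input telnet ssh\n",
}

violations = report.run(configs)
for violation in violations:
    print(violation)
```

Copying a standard policy and taking out rules, as described above, amounts to building a new `Policy` from a subset of existing `Rule` objects and adding it to your own `Report`.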
Okay, and I say this every time we're demoing this, do not try this out of the gate. Do not, you know, it's one thing to say yeah, I'm pretty sure this is going to fix it, and another thing to say please change every single device in my environment, all 50, 100, 5000 devices that have this problem. You really, you know. It's wonderful because you can roll those out but just know what you're doing. It's really just one change that affects 50,000 devices. Right. And everything that cascades down below. Yeah, it's not a big deal. But I just made one change. Right. Yeah, and it's usually that one change that starts the firestorm, right? And then you may have already fixed it but it's still having the ripple effect and affecting you even the next day or weeks or months later to know that you're the one that did it. If you're still there. That's a resume generating event. There we go. So at the beginning we talked about the integration between NetPath and NetFlow and we haven't had a chance to dig into that but there's some really neat stuff that we've added in this latest version of NTA. First of all, NetFlow does show up here. It's not obvious. But if you look, for example, at this device, this interface, you can see down at the bottom here, that I'm getting one percent traffic between these two IP addresses. And over here on the side, I can actually go to that interface and see the NetFlow data that's there. That's just integration and again we're not here to really focus on, you know, Superman, the NetPath piece. We're going to talk about all of the other supporting characters. So we want to talk about some of the stuff that's in NetFlow that's a little bit new. Now one of the pieces is NBAR2. NBAR2 is next generation NBAR and it supports things like IP version 6 and version 6 transitioning technologies. That's included in the NetFlow data. There's also QoS tagging and things as well. And QoS tagging and IP grouping. Okay.
That's grouping that you set up. So, to take a look at what that looks like, IP version 6 stuff you can't really show, it's in there because we're getting IP version 6 traffic but there's nothing there, but grouping is kind of sweet because what you look at here at these IP address groups, is these groups, operations, sales, Virgin Media. So these are things that we set up as a company to say that traffic with these characteristics, this destination, these ports, whatever, these URLs, they are grouped together and so we categorize that so we can get a summary of the traffic or conversations that are happening there. You see another one here; this setup is with the NOC and East side, West side, so on and so forth. So you've got all that. So that's another piece that's really a lot of fun. Now the other really neat feature that's in this version of NetFlow, of NTA, is the inclusion of wireless data in NetFlow. What I mean by that is here I am on NetFlow transmitters and I can see that, you know, THWACK, and so on and so forth. Oh, this is a Wi-Fi. Wi-Fi local, what's that? Well, we are getting information off of this Fast Ethernet interface, okay, so that's sFlow, you can see that. There's information coming off of ProCurve, that's also sFlow. Off of a Steelhead, that's NetFlow. Off of a Juniper, that's J-Flow. We also have it coming off of a VMware virtual switch, so we're getting, you know, that kind of NetFlow data as well. We're getting it from all of those and we can see the NetFlow data that is specifically wireless. So again, conversations. Who's using up the bandwidth, where it's going on a, you know, minute by minute, hourly basis. So that's a real neat enhancement of those features that you can see. And something also with our THWACK users, you guys have been asking to be able to see the wireless information for a while and now they actually pull it all together and you'll be able to see it from NetPath and everywhere else. You know, to drill into it.
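The IP address grouping idea (traffic with certain characteristics is bucketed under a name so you can get a per-group summary) can be illustrated with a minimal Python sketch. The group names echo the ones on screen, but the address ranges, the flow records, and the choice to attribute bytes to the source address are all assumptions made for the example; this is not NTA's data model.

```python
import ipaddress
from collections import defaultdict

# Hypothetical group definitions: name -> list of networks.
GROUPS = {
    "Operations": [ipaddress.ip_network("10.1.0.0/16")],
    "Sales": [ipaddress.ip_network("10.2.0.0/16")],
}


def group_for(ip: str) -> str:
    """Return the first group whose networks contain the address."""
    addr = ipaddress.ip_address(ip)
    for name, networks in GROUPS.items():
        if any(addr in net for net in networks):
            return name
    return "Ungrouped"


def summarize(flows):
    """Sum bytes per group, attributing each flow to its source address."""
    totals = defaultdict(int)
    for src, _dst, nbytes in flows:
        totals[group_for(src)] += nbytes
    return dict(totals)


# Made-up flow records: (source IP, destination IP, bytes).
flows = [
    ("10.1.4.20", "10.2.0.9", 1200),
    ("10.2.3.3", "198.51.100.7", 300),
    ("192.168.1.5", "10.1.0.1", 50),
    ("10.1.9.9", "10.1.0.1", 800),
]

summary = summarize(flows)
print(summary)
```

A real grouping would also consider ports or destinations, as mentioned above; matching on source network alone keeps the sketch short.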
That's going to make a big difference. Right, and yeah, so it's all there in whichever screen you're on, and the last thing I want to show you is something that, Tom, I think you'll appreciate, which is, and this is pretty standard but I think a lot of people overlook it, the application view. We've broken it down. We can see that 21% of our data here is SQL Server and if I scroll down here, I open that out and I can see that this device on this interface and this device on this interface are transmitting that much SQL Server data. You know, however that's going. We're going to dig into that in a second. Is that interesting at all or does that help? I've been waiting for this. Okay, you've been waiting very patiently by the way. All right, so if we dig into, let's say, this circuit, here we can see the hosts that are having these SQL conversations and how much has been transferred. So, you know, it goes back to something that we've been talking about a lot, which is the intersection between Network and database. And I know we have a lot of fun, you know, teasing each other about, you know, it's not the Network, it is the Network, it's not the database and whatever. But, I really think that with everything that's happening around us in the IT world, knowing that overlap, having that integration point is crucial and I think that, you know, you've talked in the past about having experiences that highlight that. Yes, so the great thing for me here is to be able to understand. So first of all, I'm always a person that's kind of said blame the Network. Because if nothing else at least it buys me time, right? And of course, if the Network guys get a chance to say, yeah that's right, you should have bought the right equipment to begin with, then everybody eventually wins. But the reality is that there's a lot of opportunities where you're really trying to troubleshoot something and you simply don't know. Everything on my end looks really good.
And everything on the storage guy, everything looks good everywhere. And what you find is that it's a router somewhere, or it's a network card in somebody's desktop. And you'll be able to track this down saying, “Why is the traffic pattern different for this particular node you're looking at?” Like we have an example even back at an event in 2013. We were trying to do some demos, connect up to Azure, and the conference center told us flat out there would be no trouble with any of this. What ended up happening though is there was one router that had a firewall. And wouldn't you know, our traffic was actually going through that router. So other people, they were fine. They could get to it and they were saying it must be our stuff. It wasn't our stuff. Our stuff was running; it was just that one router. I don't know how long it took them to figure it out. Well I do, it took them about three hours. To track down everything, they came back and said yes, it's on our end, we've found it. While your demo was going. Well, our day had started. We hadn't got to the point where we needed it; we were getting there. But no, it took them a while to figure it out. NetFlow, NetPath, all this stuff, well within five minutes, 10 minutes. Right, that's, I mean, that's the point. And this is the integration between all of it. So the databases, you're not able to get to the databases, you notice something going on there. NetPath would have shown that there was a breakdown between your machine and there, and then NCM would have shown that a configuration change occurred at a particular point. So, you would have gotten all of that. Something else that you could have done in that situation though is that they could have looked at the other routers and just compared the configs. What they probably ended up doing. It just took them hours to get to that. Yeah. The point where they're like, oh, let's just compare the configs. NCM would have done that, like I said, in just a few minutes.
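Comparing the configs of two routers, as described here, is at heart a text diff. The following Python sketch uses the standard library's `difflib` to show the idea; the device names and config lines are invented (the extra zero nods to the traffic shaper example earlier in the episode), and this is not a description of how NCM performs its comparison.

```python
import difflib


def compare_configs(name_a, config_a, name_b, config_b):
    """Return unified-diff lines describing how config_b differs from config_a."""
    return list(
        difflib.unified_diff(
            config_a.splitlines(),
            config_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )


# Hypothetical configs for two routers that should be identical.
router_a = """interface GigabitEthernet0/1
 description uplink
 traffic-shape rate 1000000
 no shutdown"""

router_b = """interface GigabitEthernet0/1
 description uplink
 traffic-shape rate 10000000
 no shutdown"""

diff = compare_configs("router-a", router_a, "router-b", router_b)
print("\n".join(diff))

# Keep only the changed lines, skipping the ---/+++ file headers.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
```

The `changed` list isolates the one line that differs between the two devices, which is the kind of answer the venue's network team spent hours getting to.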
They would have been able to have that information. Right. That's something good about NCM that some of the other people do not know though is that you can compare configs that don't have to be from the same device. You can do it from others. Right, so you can see either from a baseline or just a generic baseline or you know, I have device A, which is my master, why is it not, you know, why is my other device not looking like that. And this is why now I say instead of keep calm and blame the network; I usually say just keep calm and monitor that. Monitor the network, very good. Because if you do that it just puts you at ease because this is a thing where I just need to spend five minutes ruling the network out as a possible bottleneck. Just like I do when I talk about virtualization. I say, he needs to look at the hypervisor layers. I just need to spend five minutes. It's a problem with the guest or the host. I look at those layers; I rule them out, now I can focus on the query itself. Right. Which is what I talk about a lot, which is increasing the speed to MTTI. MTTI? Mean Time To Innocence. Oh that's, I'm stealing that. Yeah, see, by all means. So I think a lot of our viewers have experienced that also. It really comes down to, can I reliably and confidently check off, it's not this and I know it. So that I can drill into what it really is because that fix is probably going to take some time. So there is one other piece that I want to show which is a little bit ‘NetPathy.’ But it's something we've been able to do since we've had agents available which is, most of our monitoring tends to be from a center point out to an external target, so from our polling engine or from DPA or whatever it is, how is that doing, how is that doing. So an outage is simply a description of from this machine I can't get to that, whether it's a firewall change or whether it's a router or what have you. 
However, if you put an agent on machine A and have it monitor machine B, and have it then report back to the polling engine, all of a sudden you've got a whole different world view. Now, you love WPM and so you talk about doing that all the time, having all these little devices out there running the same tests and you can get triangulation. Well, the same thing with NetPath. I can install an agent, a NetPath agent on one Azure device and have, let's say my application server, and have it monitor the path to my remote database on another Azure device and another Azure data center and then I can see the path in between. So I want to take a look at that just to see what that might look like. So I'll take a look. And as you can see here, you know, we just have the path. There's nothing exciting about it right now, I mean you know, everything is pretty up and running. But you're able now to know not just can my systems in my data center see if the database is up but, hey, you know, the application is running slow, oh, the connectivity between the application server and the database is having an issue also. So that can be really critical and really insightful for people to know the real health about their environment. You know, from all the different connected points. And to your point with WPM, like I use it all the time is that that's my unbiased user. That's what I always like to call it. I like to set up things from the outside so that I can actually look at it and be like, okay, they're not going to be pointing at anything, you know directly. This is just actual data that proves yes or no if the application is accessible. Is it not? Is there anything on this? Is this how it should be? This is a standard box. So that's usually how I monitor. Exactly. So this is great. I love the idea of being able to take all of these tools and learn to monitor everything on my database environment including the network. 
And in addition to all of this goodness, I can also use these tools to test my cloud infrastructure of choice in order to make sure that this would be a safe move for me. Right. And you can also use it to verify the network inside and out and when there is an issue use the integration for remediation and compliance and reporting. So much "yes" right there, guys. Just make sure that you update, not only NPM to the latest version but also NCM and NTA. Right. And the new installer's going to help me figure out which order to do those upgrades in right? You bet. Well, my job is done here. For the moment. Let's wrap this up. I'm Thomas LaRock. I'm Leon Adato. And I'm Head Geek, Destiny Bertucci. Thank you for watching SolarWinds Lab today as we show you why friends really are better together.