
SolarWinds Lab Episode 45: Hacking the Internet with NetPath™

May 15, 2018 | Video

NetPath™ is a revolutionary and powerful new approach to visually discover and monitor networks. This episode of SolarWinds Lab dives into advanced network troubleshooting using SolarWinds® NetPath, and also covers internal change detection, external routing latency, time travel, NetFlow integration, and more. Join Chris O’Brien, Product Manager of Network Products, and Head Geek™ Patrick Hubbard as they uncover the mysteries of misconfigured configs on link performance for internal networks, detecting ISP routing issues, and untangling service contention inside remote SaaS provider’s networks. Learn to wield the newest weapon against service issues that drive users crazy, and hammer out problems regardless of where they occur, in your network or anywhere outside.


Episode Transcript

Hello, and welcome back to a special NetPath episode of SolarWinds Lab. I'm Patrick Hubbard. And I'm Chris O'Brien. And I feel a little bit like we're being kept after school. No, no, no, no, no. I'm just trying to convince myself that they're trying to get upgraded to NPM 12 as soon as possible. They're just really excited. And not that we didn't just slack off last episode? Okay, we did kind of slack off. You guys are really excited about the new NetPath tool and lit up the chat client with a lot of questions. A whole lot of questions. A lot, a lot of questions. So we pushed our next topic to the next show, next to the next show, so that we can really deep dive just into NetPath this episode. And thanks so much for all your feedback, we're glad you're finding NetPath useful and hopefully we can answer the most common questions you've been asking over the last couple of weeks. And Chris, that is a great point. Please feel free to ask your questions live in the chat box over here to the right of the screen. Of course if you don't see that screen, it means that you're not watching us live, so swing by our homepage, which is lab.solarwinds.com and sign up for the next live show reminder. You know I'm signed up. [Laser zaps] So Chris, last time we talked about NetPath, it was in context with everything else in the NPM 12 release, right? So that's the new F5 load balancer support, the new interface, and the upgraded installer. But I think that segment was just a little bit short. Well, you said we had to do the entire NetPath demo in seven minutes. Well, I try to keep these things inside of 30 minutes or so. That never seems to work well. No, that never works really well. How about this? You saw all the follow-up questions, you are the product manager for NPM, and you have the conn. What would you talk about if you could talk about anything, just NetPath? I have the com? 
What I'd like to cover is what NetPath looks like and how you read it, and what does trouble look like in NetPath? Okay, that's great, but let's add to that how to configure paths and probes, and the secret details of how this thing actually works under the covers, because we had a lot of viewers asking about that. Packet crafting. Packet crafting, yeah. [Laser zaps] Okay, so just as a refresher from last time, what is NetPath and why did we build it? So, some of our customers, you guys, were telling us about how many of your services that were traditional services, hosted in your traditional data center and so forth, were moving out into the cloud. Exchange becomes Office 365, you've got Salesforce, you've got all these things, and you're responsible for these new services that now you have to go through the internet to reach. And you hear us talking about that, we'll talk about hybrid IT. That's really what we're talking about. Yeah, yeah, it's almost— Sometimes you don't realize it's happening, but you are being pulled into this bigger environment that includes more companies, more infrastructure than just your own. It's everything that you had to manage before, plus now this whole new... That's right. Set of technologies that are outside of your firewall. Right. All this new stuff is new, but you're still responsible as the IT guy, so what we're seeing is sort of a lack of visibility and we're hearing concerns from our customers about how they don't know how to troubleshoot or monitor, and that's not cool, man. Monitoring, if you own NPM, you need to have great network monitoring. So we started in the lab, building this tool, to see what we could do to start providing visibility and some problem isolation, and help people solve these problems. Right, and especially now that so many of those resources that move out into the cloud, you're— they become more and more opaque from a monitoring perspective, and certainly in terms of an administrative control. 
Yeah, absolutely. So being able to figure out what's going on is an even greater challenge, certainly than being able to use— well, certainly nobody's going to give you root, but give you pretty much unlimited access, administrative access, to everything in your environment. Yeah. As network engineers, we draw it as a big cloud. It's like our abstraction saying, "We don't know what's in here." "We don't care." "It doesn't matter to us." But it does matter now. So we need to be able to pop open the cloud and see what's going on. Okay. One of the things I like when you get into the interface here, right off the bat is, if you're not sure how to use NetPath, go up here and click on How to Use NetPath. And it'll actually explain to you pretty much what we just talked about. So maybe they could have done that instead, and saved some time. No. But seriously, so, yeah. It'll walk you through here. You can actually step by step through each of the components and come away with a pretty good sense of how this works. Mm-hmm. I think the view that most people start off with, and when you install it, by default, we'll actually set up a monitor, right? Yes, yes. We'll install what we call probe, the source of the path, in your main poller, and then we'll go start polling Google, just to give you some idea of what a path looks like. Well, walk us through that. So this is kind of, if you're exploring something out on the internet, this is typical of what you would see. Yeah. Here, our Austin office, which is our probe here in Austin, goes to Google. This is the probe on the left, the source of the path, that goes through a bunch of different nodes on our network, as well as through our ISP, through some other companies, and finally to Google, where they're providing the content. So at the end here, we have the destination server, and you can see that complete path intent. Oh, that's great. 
And I think this view, you've got a certain amount of detail turned on, and these buttons over here on the side will let you turn this on and off, right? So this one is at maximum detail view. Otherwise, you will just see a little bit less. So when we have been talking about hop-by-hop latency analysis, I don't see hops there. I see logical organizations of clusters of hops. So the internet is a little bit complicated, so we summarize some of that for you. So if we jump into this network here, Time Warner Cable, that's our ISP. If you click on that network, it will break it out into the hops that comprise that network, and then we're starting to see this hop-by-hop performance, right? From this node to this node, your traffic going through that is imposed two milliseconds of delay. And we can get some more details if we hover over that, about how much packet loss there is, how much transit likelihood. So very cool. One of the things that we've discovered, and is definitely new for internet paths is that most of these paths are multi-paths. And in fact, thanks to academia, we know that over 80% of internet paths are multi-paths. So you can see that here, as well. Our ISP has two connections from this AG 33 router over to their next layer. And each of those connections only take a portion of the traffic. So this link takes 47%, the other link takes the remaining amount. And you can see that continue. We can continue through. So there's another network owned by Time Warner Cable, and finally Google's network. Whew! So there's a little bit of merger and acquisition between the two Time Warner Network. There's Time Warner Cable and Time Warner Telecom. You can actually see that surface in what we're seeing in their networks. Then we get into Google, and I would never expect Google to have a complex internal network. Google's a little complicated. They have quite a few nodes. They are very geographically dispersed. 
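The transit likelihood percentages described above can be pictured as the share of probes that crossed each link. The following is a minimal Python sketch under that assumption; it is not how NetPath itself computes the figure, and the node names and the `transit_likelihood` helper are hypothetical.

```python
from collections import Counter


def transit_likelihood(observed_paths):
    """Return {(from_node, to_node): fraction of probes that crossed that link}."""
    link_counts = Counter()
    for path in observed_paths:
        # Count each link at most once per probe.
        for link in set(zip(path, path[1:])):
            link_counts[link] += 1
    total = len(observed_paths)
    return {link: count / total for link, count in link_counts.items()}


# Hypothetical sample: 100 probes leave the same aggregation router;
# 47 take one next hop and 53 take the other.
paths = (
    [["probe", "agg", "core-a", "dest"]] * 47
    + [["probe", "agg", "core-b", "dest"]] * 53
)
likelihood = transit_likelihood(paths)
print(likelihood[("agg", "core-a")])  # 0.47
print(likelihood[("agg", "core-b")])  # 0.53
```

Every probe crosses the shared first link, so its likelihood is 1.0, while the two parallel links split the remainder, which matches the "this link takes 47%, the other link takes the remaining amount" reading in the episode.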
The important thing here is, when you place the probe at the source, it's going to get service like other users at that location. So when we place a probe at the Austin office, we're seeing the content servers that Google is using to provide content to this Austin office. And that's what we're trying to provide visibility into, right? And if you click, particularly with dispersed— geographically dispersed services like Google, bigger services, you can use this history bar at the bottom and select a value, and you'll actually get the network at that time. Right. And we talk about being able to do time travel and look at what these routes were back in the past, but how many times do you guys have a trouble ticket. Oh, no, not a trouble ticket. The worst example is, an executive comes by and says, "Hey, how's it going in IT?" "Oh, it's good." "Yeah, it seems to be going pretty good upstairs, "but man, last Thursday the internet was just so slow." That's right. Well, being able to then ask approximately what time on Thursday, and then go back and look, and a lot of time identify an issue, to then reply and say, "Hey, yeah, by the way, Google was having a problem." Or, "Our ISP was having a problem." And being able to describe it then in very technical terms that person cannot possibly understand and thereby making them feel like you've got it all under control... That's very important. That's very important. So one of the things we see with Google, as we move through time, that first graph had a single end point. Those end points change over time, because it is so geographically dispersed. So we can see then, moving around a little bit, and sometimes changing. Meanwhile, we always get the path that matches the user experience. Okay, so we've talked about paths and probes just a little bit, so let's take a second and just untangle that. So a path, if I'm thinking about like, ping. A path is just me to my destination for that ping. That's right. 
For any network traffic, the path is the route through your network or any other network from the source to that destination service. And in this case, we're actually telling you how it's getting there. Before, I would have no idea. Yes. That's the magic of NetPath. Right. And then the probe is virtual you, right? Right. So typically, if you have a small network, you might put your probe in your NOC. And in fact, the probe that's installed by default when you install NPM or you do your upgrade will be on your primary poller. Yes. So that's how everyone thinks about it, but that's not the way to continue thinking about it. The idea is that the probe is a representative of your user. So if you start thinking about it like that, you start thinking, I'm going to place probes where I have users. So for SolarWinds, we have users in Austin, we have users in Lehi and Brno, and Singapore and Australia. All these places, that's where I want to place probes, not my main poller. And you also don't need a server instance for that. You can install it on pretty much any halfway-modern Windows desktop. On a Windows client, we will push the agent. It uses the agent architecture we already have for SAM and QoE; this is just a new piece of technology, what we call a plug-in. So the same Windows client will work. That's awesome. Well, let's talk about how that gets done. All right. So to configure a path, you're going to use the NetPath Services dialog. That's in the dashboard. And of course, you get a new dashboard, right? So we're going to come over here to NetPath Services, and that's going to bring us to this page. Here are the ones we've already defined, and this is on a different server now that we're going to get into in just a little bit. But it's pretty straightforward. And if you think about this, we're going to start with Create New Service. So, we break down the pieces of this. The service is the destination, right? Yep.
Google, email, file shares, any destination. And the port. And the port, that's right. The TCP port. The TCP port. And that's really important, because you could very easily have completely different routes for completely different services based on the applications being delivered. Different routes and, even more common, different QoS behaviors. So we want to capture the behavior that you're seeing for your traffic. And that's really the big difference with Traceroute, right? Traceroute is usually using ping. Whatever route ping would discover. But you will have completely different potential routes for different application services. So this way, you're going to see exactly what your users on that application are going to generate. It's one of the key differences, that's right. Okay. So the service is the end point. And the probe is going to be where we send it from. Yep, that's always your source. So then, the path is going to be the probe plus the service. Mm-hmm. Okay. So to create a new one here, like if we wanted to say, oh, I don't know, demo.solarwinds.com on Port 80. And this one doesn't need it. The alias would be the common name for it. That's right, some nickname. Okay. And then the interval at which I'm going to probe it, and it's not polling, because this is different. There's a lot of different things going on, and we'll get into that in just a little bit. Right. But in essence, it's different, right? Because polling suggests you have some administrative access. That's not what we have. So we're using probing here. Okay. So the next thing we'll say is we're going to assign that to a probe. If you wanted to create a new probe, you'd click on Create New Probe, and you'd basically give it the credentials it would need to go in and install it. And it's using the agent framework, right? That's right. So host name, creds, we will go deploy that probe for you.
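The relationship described above (a service is a destination host plus a TCP port, and a path is a probe plus a service) can be sketched as a small data model. This is only an illustration in Python: the class and field names are assumptions, not NetPath's actual schema, and the 10-minute interval is just an example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """The destination: a host name or IP address plus a TCP port."""
    host: str
    tcp_port: int
    alias: str = ""  # optional nickname shown instead of the host


@dataclass(frozen=True)
class Probe:
    """The source of the path, placed where the users are."""
    name: str


@dataclass(frozen=True)
class Path:
    """A path is simply a probe plus a service, probed on an interval."""
    probe: Probe
    service: Service
    interval_minutes: int = 10  # example value, not a documented default

    def label(self):
        name = self.service.alias or self.service.host
        return f"{self.probe.name} -> {name}:{self.service.tcp_port}"


demo = Service(host="demo.solarwinds.com", tcp_port=80)
path = Path(probe=Probe("Austin office"), service=demo)
print(path.label())  # Austin office -> demo.solarwinds.com:80
```

Keeping the port inside the service, rather than on the path, reflects the point made in the episode: the same host on a different TCP port is a different service and may take a different route.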
And then just like any of the other agents, it'll keep it up to date for you, and in fact, I think it's actually a plug-in into the existing framework. That's right. Okay. So if you have an agent already running somewhere, it's just going to install a little shim of code, it's not actually going to set up a whole new agent? Yeah. And if it's already set up for that, you really don't need any extra credentials. It's just going to use the credentials you have, right? Mm-hmm. Okay. So, if you're going to create a new one— and in this case, we're not going to create a new one. We'll use one we already have. It'll tell me right here, in the probe we're connected to, how many of the possible paths that we've assigned so that you can at least load balance a little bit. If you have tons and tons and tons of paths in one location, you might want to break that up a little bit. And then we just say, create. And we can see that it's been created: demo.solarwinds.com. And it's telling me that the first poll has not yet completed. It'll take a little while, and it may be up to this case, the first probing interval is 10 minutes, so maybe about 10 minutes. That's right. What else do we need to configure, here? That's it. That's actually all you have to do. So we all know and love our network atlas where we can spend hours making our maps look exactly how we want. But here, you give us the destination, you assign a probe, you're done. So the diagrams that we were looking at before, we're going to look at again in a minute, are all being generated automatically as a result of the route detection. Just that effort we did there. Okay, so we saw how to set up a path. And before that, we saw how to explore the internet beyond your firewall. So what does it look like in NetPath, and how do you do troubleshooting when there is actually a problem? Here we have a path to Salesforce. Salesforce is really important at SolarWinds. 
So, at the left hand side, we see my computer going to Salesforce. Salesforce, in this case, has two... Load balancers. Yep, two content servers providing that content. We can see the front of the path. We have a section of the network that is our network, right? These are the routers, these are our R2, our R3, what we're used to. And I was going to make a point there. You'll notice a couple of these routes are thicker in the diagram, and that tells you that more traffic is actually flowing over that link. Yep, that's transit likelihood. So 71% versus 30%. Most of the traffic is going over that top link. Okay. Through our network, we end up... actually, it looks like here we have a node that is internal, but it's unmonitored, so we may want to add that, right? And finally that connects to our ISP, and out through this backbone provider, TeliaSonera, and over to Salesforce. So these are all the companies that must have their network working properly for Salesforce to work for my users. So let's say a user comes up... User comes up and says, "Hey, at 3 p.m., I was having a problem." We'll bump out of full screen, here. And we can see across the bottom, we've got our chart here that shows our end-to-end performance. But the unique thing about the chart is that you can actually select the interval where you're seeing a problem. So we bump up to 300 milliseconds here. Conveniently coded red so you know there was a problem. And we'll talk about what the definition of a problem is in just a minute. So we select that value, and it will pull us to the network path at that time, including its performance. So here we can see a couple places where there's red. Okay, hold on. Right off the bat, I'm red at my end, because something in this overall path is being affected, right? Yes. But right off the bat, I can see that there's another red link here that looks like it's inside my firewall, so just in this first view, do I suspect there's a problem inside?
Looks like there's a problem in your network. So let's break this down, right? We select this. It looks like the problem is affecting both of our end points. And it's an average latency of 280 there. We see it highlighted in red. And we can actually expand issues here and see that the latency value is over the critical threshold. So we use human language there to explain what's going on. And another thing here. A lot of times people will ask, you know, how we get all the detailed information in here. DNS is a wonderful thing, and it can tell you a lot of things. Like, one of the things that it's also giving here is the contact information for that particular node. That's right. That comes from one of the internet databases, and you can get even more information if you drill down and start inspecting the DNS, if you're careful in how you parse through it. So here, I can see these are aggregation, AG, aggregation routers. Likely this XE2-2-1 is the interface name. So I can actually get down to the interface name sometimes when people are using that as part of their DNS standard. So pulling back out, hiding some of that detail. We don't need that now, because it's all green. We saw the problem is affecting both end points. The only other spot that's red is right here in our network. So let's zoom into that. So, looks like between our R3 and R5, there's 240 milliseconds of latency. Most of our traffic is going over that link. Actually, this is ... Comparison is extremely important in troubleshooting, so let's click back and verify. Yeah, that was 15 milliseconds before. So you can actually toggle back and forth. Yep. 15 milliseconds before, now it's 234. It's definitely a problem, definitely a new behavior. NetPath has identified that the problem is right here. And after NetPath discovers where your transit impact is, which is really important,
where your transit impact is, the thing that's affecting your users, then it'll start to bubble up information from NPM, NTA, and NCM about what may be causing that transit impact. Here we see a config change occurred around the time that this transit impact happened, on the device that appears to be involved in the transit impact, right? So we've got configuration changes that happened between 3:43 and 4:16 p.m. Looks like the problem started around 3:50 p.m., ran through 4:10, 4:20 or so. So we know there was a config change. What was the config change? We'll just click, and it'll tell us what changed in your configuration. And here it’s using NCM to get that. Absolutely. So the diff view that you're familiar with. Looks like this line was changed. Well, if you're messing around with traffic shaping, that is going to affect it. That's right. And if we're good troubleshooters here, we think Ethernet 1.0 has a traffic shaper applied. And there's one less zero here, so the traffic shaper is more constricting on that Ethernet 1.0. And we will jump back over to the graph here. We can actually see, this is ETH-1.0 that this traffic was coming in on. Oh, and we know that because that's a monitored node. Right, you're monitoring this interface, in fact. Which means another way to solve this problem is, instead of removing the traffic shaper, you can take a look at the traffic that is being sent through that shaper. So here we can see our NTA information coming across. Traffic shapers are egress on this interface, so we'll click our egress here. We can see there's a very small amount of traffic being sent through this interface, so likely we're shaping at a low percentage of that interface's available bandwidth. Right, so we can start seeing this is the largest conversation. It's only 3%. And if we were to click into this, we can actually take a look at the NTA data, the deeper flow analysis for all of that traffic. Okay.
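The "one less zero" change spotted in the diff view can be illustrated with a small Python sketch that compares two configuration snapshots. The config lines below are made up for illustration (the episode does not show the actual device syntax), and this uses Python's standard `difflib`, not NCM itself.

```python
import difflib

# Hypothetical config snapshots; the real device syntax is not shown in the episode.
before = ["interface Ethernet 1.0", " shape-rate-bps 10000000"]
after = ["interface Ethernet 1.0", " shape-rate-bps 1000000"]


def changed_lines(old, new):
    """Return only the removed (-) and added (+) lines between two configs."""
    diff = difflib.unified_diff(old, new, lineterm="", n=0)
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


for line in changed_lines(before, after):
    print(line)
# - shape-rate-bps 10000000
# + shape-rate-bps 1000000
```

A single dropped digit cuts the shaping rate to a tenth, which is easy to miss by eye but obvious once the two versions are lined up next to the time the latency started.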
The interesting thing about this is that if we were looking at this from a regular monitoring perspective, there is nothing wrong with any of these devices. The CPU utilization, traffic, interface—all fine. We have this misconfiguration of the traffic shaper behind the scenes that's causing a problem, and it's causing an application problem because the transit likelihood of this link is driving a bunch of stuff over a slow-responding link, where we've added a bunch of latency. But the network is behaving as it's been programmed. So from the perspective of, if we forgot about that shaping rule that we had applied, there is nothing wrong. But the application is having a problem. Sometimes. Sometimes. Like, one out of a number of packets, it's... Right, right. So we see the transit likelihood of 67% on this link, means some of the customers are getting to the link and it's going fine. Or when it's coming back and they potentially get routed to a different path because they got a new speed conversation, then they may get a link that's working, they may get a link that's not working. And so one of the things that's really interesting about NetPath, is when you have these really difficult to diagnose intermittent problems, or problems that affect some people but not other people, particularly when they happen in the path. This starts to allow you to pull that apart and understand each individual component, and which component is causing that bad behavior. Right. And you wouldn't have gotten— in the past, you would not have gotten alerts to let you know that users for that application would potentially be having a problem some percentage of the time. In this case, now, you can alert them. That's right. And we had talked about that before once, about sort of bad latency, the difference between good latency and bad latency, assuming that it is possible to have good latency. 
But in this case, this is bad latency specifically for that link, as opposed to something in the overall path. It's a really interesting challenge that we had when we were trying to figure out how to make an intelligent presentation of whether each link was healthy or unhealthy, and which node was healthy or unhealthy. The challenge is, in any of these paths, particularly those that go over the internet, you will have portions of the path that take very little time, like between your own equipment and from inside one room in a building to another room inside that same building, between wiring closets. Meanwhile, you will also have single hops that are going over an internet backbone, that may be going over hundreds or even a thousand miles, or undersea cables, whatever that might be. And so the latency for that wiring closet to wiring closet link needs to be something like two milliseconds, less than two milliseconds. Right. Meanwhile, the latency for the longer hauling, 30 milliseconds may be entirely fine within the United States. More may be entirely fine for undersea cable. So the challenge is, when you color a link red, it can't just be because the latency is over 100 milliseconds. What we want to do, is we want for our LAN local link, if it goes from five milliseconds to 30 milliseconds, that should be red, right? Or at least warning. Meanwhile, if the long-distance link under the ocean, or what have you, is at 100 milliseconds, might be entirely fine. Right. So there's a big difference in the performance of each one of these links, and we really need to be able to figure out the difference between those so we can color red or green. All right. So let's say we've got that example where there's an undersea link, or any other link that's outside of your firewall connecting to a remote application. 
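The idea above, judging each link against its own normal behavior instead of one fixed limit such as 100 milliseconds, can be sketched as follows. This is a simplified Python illustration, not NetPath's actual algorithm; the `link_status` function and its ratio and floor values are assumptions chosen only to make the example concrete.

```python
def link_status(baseline_ms, current_ms, warn_ratio=2.0, crit_ratio=4.0, floor_ms=5.0):
    """Classify a link by how far its latency has drifted from its own baseline.

    All thresholds here are illustrative assumptions.
    """
    increase = current_ms - baseline_ms
    if increase < floor_ms:
        # Ignore tiny absolute changes so a 1 ms link going to 3 ms is not flagged.
        return "green"
    ratio = current_ms / max(baseline_ms, 0.1)  # guard against a zero baseline
    if ratio >= crit_ratio:
        return "red"
    if ratio >= warn_ratio:
        return "yellow"
    return "green"


print(link_status(5, 30))     # LAN link jumping from 5 ms to 30 ms -> red
print(link_status(100, 100))  # undersea link steady at 100 ms -> green
print(link_status(15, 234))   # the R3-to-R5 link from the demo -> red
```

With this kind of rule, a long-haul hop that always sits at 100 milliseconds stays green, while a wiring-closet link that climbs to 30 milliseconds turns red even though its absolute latency is far lower.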
In this case, we saw something that was caused by a config change that we made in our network, which is great, because then we as the administrator can log in and fix that. Right? Answer, closed ticket, solved. Done. If it's in the ISP's network, what do you do? Yeah, so let's zoom out here, and if we take this same scenario as an example, if we zoom out and take a look at Salesforce. Okay, this is Salesforce's network. Let's go ahead and expand that and see the nodes that are part of that network. Here we're still, of course, seeing that hop-by-hop performance. We can see how much of our traffic is using each of these links. So it looks like of these four links, most of our traffic is going over that top link. Now, this is performing fine now, but let's pretend it's not performing fine. Okay. So, it's multi-homed, one link out of four links. One link, right. So some of our users sometimes get bad service. That's the challenge, troubleshooting-wise. So if we actually mouse over the nodes on each side, we can start pulling out the information we need. This is owned by Salesforce, and if we click, we can see the prefix they're routing on, who's originating that, as well as contact information. So here's their phone number and their email address. And we can then have a specific contact. We know Salesforce owns this node. I'm going to call them on that phone number, and I'm going to tell them something precise and actionable. So an example of that would be, Hey, you have your edge router 13.108.50 connecting to your aggregation router 204.14.237.131, and around about 3:50 p.m. to 4:20 p.m., we were seeing bad behavior where 60% of our traffic was going through that link, and we were experiencing 100 milliseconds of delay, or 20% packet loss, or whatever it may be. Okay, so, on the ISP, two things happen. First of all, when we've done this, and certainly a lot of the customers who've already upgraded are beginning to experience this.
We saw a lot of this during the beta period. ISPs would say, first, "That is extremely helpful." "Hold on one second." And they could get to a troubleshooting step much, much more quickly for their internal network than they otherwise would. Because otherwise you're going to say, "I have this percentage thing," with some anecdotal information about when it happened. So they seem to, first of all, be grateful, because you can get them to resolution quickly, and it's something that doesn't just affect us. It affects any of the users who are using that service. So they're usually happy to receive that. They want their network to work well. It's just a fact of the matter that customers calling in and saying, "Hey, sometimes our internet is slow and we think it's you." Not super helpful. Right. Which brings us to the second point. And we have had, occasionally, people— actually, we had a customer I was talking to not too long ago, that they were named contact in one of the databases, one of the publicly available databases that we're pulling this information from. So for Time Warner, they went and actually removed their cell number from that. So that brings us to the second point, which is, they often times ask, how do we get this information? Because there's a part of it that seems, "Wow, this is some kind of amazing spoofing that's going on," or we're looking inside of firewalls. But this is all publicly available information. So how do we technically do that? This is not anything like Traceroute. So what are we doing? There's really two pieces that we're putting together here. One is, there are internet databases that have contact information for networks. This is designed for network engineers by network engineers to help us manage networks. Right? Right, which we should. Basically, public internet networks. Because we want people to help us make sure that our network's performing well. Right. 
And the other piece of it is this technology to figure out what the path is, who's involved, and what specific performance you're seeing between hops. So it's really only the combination of those two that allows you to reach the intelligent conclusion of, I have a problem in Salesforce's network at this spot specifically. And now I have the phone number to tell them about it. Okay, so I imagine this is some sort of custom-crafted packet, where we are messing with the information in the packet, the port, and the TTL information. You imagine right. I didn't exactly imagine that. It's our information. Cheated a little bit. But one of the questions we get really commonly is, "How do you get this information?" "Where is this information coming from?" And particularly, "Is this just visual Traceroute?" The way that Traceroute works is by incrementing TTLs and sending ICMP or UDP traffic across a path to try and get responses. You may notice that oftentimes Traceroute is blocked. Oftentimes Traceroute gives you information that's a little bit unintuitive, like hop number three has high packet loss, while hop number four has no packet loss. What does that mean? So there are two things that we're doing differently in NetPath. The first of which is, we are crafting our own packets, TCP packets, that look like the application traffic. That's to solve the first problem, when the firewall's blocking our traffic. Right, so an application-specific, application management firewall looks at the packet headers and everything else and says, "I know what this is, this is an HTTP request." It looks like your application. So we're allowed through when the application is allowed through, which again is the story of the view that you see in NetPath mimicking the behavior that your customers are getting in their browser. Well, and it's also behaving. We're not doing anything that the normal application would not do. Yes, we're very careful about that.
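The TTL-incrementing technique mentioned above can be sketched without touching a real network. In the Python simulation below, each hop decrements the TTL and the hop where it reaches zero answers, which is the behavior Traceroute-style tools rely on. This is only a model: a real probe would send actual packets (for NetPath, TCP packets aimed at the service port, per the discussion above), and the hop names and the `send_probe` and `discover` helpers are hypothetical.

```python
def send_probe(path, ttl):
    """Simulate one probe: each hop decrements TTL; the hop where it hits zero replies."""
    for hop in path:
        ttl -= 1
        if ttl == 0:
            return hop
    return None  # TTL was larger than the path length, so no hop expired it


def discover(path, max_ttl=30):
    """Send probes with TTL 1, 2, 3, ... and record which hop answers each one."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder = send_probe(path, ttl)
        if responder is None:
            break
        hops.append((ttl, responder))
        if responder == path[-1]:
            break  # reached the destination service
    return hops


# Hypothetical path from a probe to a destination service.
simulated_path = ["r1", "r2", "isp-edge", "service"]
print(discover(simulated_path))
# [(1, 'r1'), (2, 'r2'), (3, 'isp-edge'), (4, 'service')]
```

Repeating this many times and timing each reply is what turns a list of hops into per-link latency and loss figures; the difference described in the episode is that the probes look like the application's own TCP traffic, so they are allowed through wherever the application is.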
So, for example, if you have a firewall midstream here that is blocking traffic for this specific application, we will see that here as a block. And that's what you want to see. You'll see the path dies right there. The path dies at that firewall. And as a firewall owner, especially when you are looking at segmented networks for security inside your own environment, that's really nice, to be able to walk down the hall and say, "You're blocking my traffic." And fix it quickly. It's really, really handy to do that. So the second piece of the equation there is that there's a lot of data that we come up with, particularly for internet paths. Some of these paths have 50, 100, 150 nodes, and there's performance data for each one of the links between them. So we do a lot of analysis on that data to find out where the problem is and expose that in a simple way. So this is an example of, we've got data for all of these different links, but we found this specific link is the problem, and the problem is that that link has high latency. Okay, so, what we were looking at there, the errors that were introduced— the latency errors that were introduced into that path so we can see it. To experiment with that, you really would not want to break your own network. Yeah, so I totally went down to IT and was like, "What part of the network will you allow me to break?" And they were like, "None of it." I am so surprised to hear that. Yeah. We just want to add a bunch of latency to Salesforce. It's going to be fine. Yeah, we'll see what happens. Yeah. We're a monitoring company. Okay, so when you showed me what you were doing here, I want to give you a chance to take two minutes and really geek out on this, because this is really, really cool. So, show me how you created this network. Yeah, sure. So I used—let me log into the box here. Okay, so this box would be your Mac. Right, this is my MacBook. So, everything's running on this box. 
We're using GNS3, which is an amazing network emulation tool. Yep, we've got a lab episode on it. You guys definitely want to check it out. So, we see, we've got our Windows 2012 server that's running Orion. We've got a GNS3 VM. This is Parallels. Yes, Parallels is the hypervisor there, running the Windows VM. We've got the GNS3 VM running through VirtualBox as the hypervisor. That's what they prefer, so that's how they packaged it. Meanwhile, we've got the GNS3 GUI running local to this box, and we've got a local Dynamips instance as well. So if we walk through this topology, just as you saw in the demo, this is where our probe is. That's where Orion server is, and also our probe. In this case, that is connected to a set of routers. I've got a WAN emulator where I can inject latency and packet loss, to make sure we're detecting it accurately. That connects through over to this internet cloud. So let me break down the crazy amount of virtualization in this. So this is all running on Apple OSX. Within Apple OSX, there's the Parallels hypervisor, where I'm running a Windows 2012 VM, that runs Orion. That's host. That connects through a virtual interface to my local Dynamips instance. Dynamips then, thankfully, routes that over, I don't know how, to the GNS3 VM, running on VirtualBox, which is a different hypervisor. That'd be the third level of virtualization. Yeah. That GNS3 VM is where R2, R3, R4, R5 and so on, are running. That routes back over to my local Dynamips instance here. That connects to a virtual interface on OSX. OSX uses Packet Filter, which is built into the OSX operating system. That's where we do the firewalling and also NAT back to my Wi-Fi interface on that device, and that's how it connects over to the internet. So the first portion of the path you see, that is the local network as provided by the GNS3. Tack onto that, the rest of the internet, and you get that end-to-end visibility.
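The role of the WAN emulator in this lab, injecting a known amount of latency and packet loss and then checking that the monitoring side detects it, can be sketched in a few lines. This is a simulation under stated assumptions only: `emulate_wan` and `measure` are hypothetical names, and nothing here reflects how the emulator in the demo or NetPath itself is built.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple


def emulate_wan(base_rtts_ms: List[float], added_latency_ms: float,
                loss_rate: float, seed: int = 0) -> List[Optional[float]]:
    """Add a fixed delay to every probe and drop a random fraction of them."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    observed: List[Optional[float]] = []
    for rtt in base_rtts_ms:
        if rng.random() < loss_rate:
            observed.append(None)  # probe lost on the emulated WAN link
        else:
            observed.append(rtt + added_latency_ms)
    return observed


def measure(samples: List[Optional[float]]) -> Tuple[Optional[float], float]:
    """Return (mean latency of answered probes, fraction of probes lost)."""
    received = [s for s in samples if s is not None]
    loss = 1 - len(received) / len(samples)
    latency = mean(received) if received else None
    return latency, loss


# Inject 80 ms and 10% loss on top of a 2 ms baseline, then check what is measured.
samples = emulate_wan([2.0] * 1000, added_latency_ms=80.0, loss_rate=0.10, seed=42)
latency, loss = measure(samples)
print(f"measured latency {latency:.1f} ms, loss {loss:.1%}")
```

Because the injected values are known in advance, any gap between them and the measured values points to a detection problem rather than a network problem, which is the purpose of putting an emulator in the path.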
So what's cool here is I can now take my laptop to anywhere there is Wi-Fi and demonstrate not only a local network, but also how it connects out from their internet all the way to the destination service. That's really handy. Especially if you were, I don't know, a consultant, or an NSP, or something else, and wanted to be able to walk on-site, pre-configured, and talk to them about their ISP performance. That's right. Their ISP performance and the level of visibility they would get for their local network if it was managed in Orion. That's awesome, and thanks for going through that. I know you've spent a lot of time on this. I don't think you've ever gotten to talk about it before, and this is a work of art. It's cool stuff like, I can't believe that we're in a day and age where I can have my laptop and be walking around to meetings, and have like, two VMs, two hypervisors, Dynamips, all of this stuff going on--virtualize a dozen routers connected off to the internet and the Wi-Fi, NAT out, RDP, and HTTP services. And the dev group is using my box while I'm walking to meetings. How crazy is that? It's amazing. On one laptop. And will you help me justify maybe getting a Mac? We can work on that. [Laser zaps] Okay, so let's come back over here for one second. And I do want to talk about one other thing, and I'm looking at the Google path here again. The one that really is a shocker is Yahoo. I would not expect that to be more complex, but it is. It is. And we picked Google here, because we're not worried about hurting Google. Yeah. They're not going to get their feelings hurt. And it also is a better way to start, because I think a lot of us use Google as a reference, so you'll see a lot of nodes that you recognize. You want to check internet, you go to Google. Maybe that's not how it should be, but that's how it is. So here's a question for you, then. Where do we go from here?
And I know we're not going to reveal any kind of current roadmap or anything, but what are some of the things that you're thinking about? You were talking to me yesterday, something about ISP metrics and a couple of other things. So what sorts of things are you just asking questions about? What could we do with this data? So, this is a whole different style of data and degree of data, particularly about the internet. And so we're starting to think about, what are some other ways we can use that? So here we're identifying what part of the network is yours, what part of the network is your ISP, what part is the destination, and then the transit that connects your ISP to the destination network, right? So we can break down the performance of each of those chunks and where problems occur in each of those chunks. So then you start thinking, well, what if I looked at my entire network, and started to think about, if I'm using ISP one, what does my latency look like versus ISP two, across all of my dispersed locations? Or, which ISP causes me the most problems? Or interesting questions like, how often is the problem in my network versus my ISP, or versus the destination? Or versus someone else? So this is really interesting data. NetPath shows you a single path, and it answers some of the questions, but now that we have that deeper level of data, we're starting to come up with a lot of other interesting questions. Right, so you would encourage everyone who's watching to come and visit the NPM forum and talk about the sorts of ideas they have around what they would like to see, and where they would like us to go with NetPath. What would you do with this data? Awesome. [Laser zaps] Well Chris, thanks so much for coming on the show. It's always a pleasure to have you, and especially thanks for being here to walk us through the details of NetPath. You bet. And thanks to all of you out there who asked so many great NetPath questions during the NPM 12 show.
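The questions raised above (how does ISP one compare with ISP two, and how often is the problem in my network versus the ISP or the destination) come down to simple aggregations once each path is split into segments. The sketch below is hypothetical throughout: the record layout, the site and ISP names, and the functions `latency_by_isp` and `problem_share` are invented for illustration and do not describe an existing or planned NetPath feature.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical per-path records: latency split by segment, plus where the problem
# (if any) was located for that measurement.
records = [
    {"site": "austin", "isp": "ISP one", "problem": None,
     "latency_ms": {"internal": 2, "isp": 12, "transit": 20, "destination": 5}},
    {"site": "cork", "isp": "ISP two", "problem": "isp",
     "latency_ms": {"internal": 3, "isp": 30, "transit": 18, "destination": 6}},
    {"site": "austin", "isp": "ISP one", "problem": "destination",
     "latency_ms": {"internal": 2, "isp": 14, "transit": 21, "destination": 40}},
    {"site": "cork", "isp": "ISP two", "problem": "isp",
     "latency_ms": {"internal": 3, "isp": 34, "transit": 19, "destination": 6}},
]


def latency_by_isp(rows: List[dict]) -> Dict[str, float]:
    """Average latency spent inside each ISP's segment, across all sites."""
    per_isp = defaultdict(list)
    for row in rows:
        per_isp[row["isp"]].append(row["latency_ms"]["isp"])
    return {isp: mean(values) for isp, values in per_isp.items()}


def problem_share(rows: List[dict]) -> Dict[str, float]:
    """Fraction of detected problems located in each segment of the path."""
    counts = Counter(row["problem"] for row in rows if row["problem"] is not None)
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()} if total else {}


print(latency_by_isp(records))
print(problem_share(records))
```

The same grouping could be done by site, by destination service, or over time; the point is only that segment-level data makes these comparisons straightforward.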
Hopefully we got them all answered today. Yes. And of course, if you have more questions, you can always hit me up, or you can ask Chris directly in the NPM product forum on THWACK. And be sure to get upgraded soon. We added pre-flight checks to the new installer to make it easier than ever before. Yeah, and there's more goodies to come, and you're going to want to be on version 12 as soon as you can. That's not very subtle. Nothing about SolarWinds Lab is subtle. That is true. Okay, so I think we got it covered this time. Please keep your feedback coming to our homepage, which is, of course, lab.solarwinds.com. And thanks again for all of you who have already upgraded. Your comments on NPM 12, and especially NetPath, have been amazing, and I think the dev team more than anyone is really enjoying reading them. I think dev is the only team more isolated than IT. Yes, but they get to take longer lunches. True. And with that, thanks as always for watching SolarWinds Lab. I'm Chris O'Brien. And I'm Patrick Hubbard. See you next time.
