
How to Solve Real World Application Problems With APM – SolarWinds Lab Episode #83

January 23, 2020



Based on one of the most popular SWUG (SolarWinds User Group) sessions of 2019, Jim Hansen, SolarWinds VP of application management products, shows you how to combine user experience monitoring with custom metrics, distributed tracing, log analytics, and log management to provide unparalleled visibility into your custom applications. Jim will demonstrate, step by step, how Pingdom®, AppOptics, and Loggly® integrate with one another to help you pinpoint performance issues and keep your end users happy. This episode will be a true live event, and you’ll have an opportunity to ask Jim questions and hear his answers in real time.


Episode Transcript

Hey everybody, welcome back to SolarWinds Lab.

Yeah, and Leon, it is so great to have you here. I think you were probably our most common live co-presenter, but we're all here right now. Tom is on the mega cast right now, so he's upstairs, and we also have Sascha Giese with us in person. Usually you're, at least more recently, what, joining over Teams from Cork?

Yeah, kind of transmitted via magic, right?

Yes, the Teams magic, right?

Yeah.

Yup. And of course all the way down there on the right-hand side Jim Hansen. How are you doing?

Hey everybody.

Jim of course, you all know him, he’s been on three or four episodes now of SolarWinds Lab.

Yeah, thereabouts.

Yeah, so what are you going to be talking about?

Today we’re going to talk about application performance monitoring.

Yeah, and this show’s a little different.

Right, it’s, as you know if you’ve been watching THWACKcamp and a few other things, we’ve been doing a lot of different formats, different presentation styles, different locations and things, and that’s because you’ve been asking us to change it up a little bit. So this time the whole episode is live. Like normally our intros are live but this whole thing is going to be live, live. So, you’re ready for that?

Live?

Live, like actually live, live demos.

Like live, live?

Live, live demos.

Everything.

Yeah.

OK, let’s do it.

Surprise, OK.

All right.

OK, so yeah, we’re really excited about that.

So it’s actually the first time that we found a victim for this.

I think the word you’re looking for is volunteer.

Ironman.

Whatever.

Yeah, exactly.

OK.

OK, so we’re ready to go, you ready?

Yeah, I think we’re ready to go, let’s do it.

All right, so let’s go ahead, let’s kick off this episode, here we come. We’re still here.

Hey guys, I think you need to get off now.

Oh, it’s live.

Oh yeah.

Live, you said that I think right?

Yeah we did, OK fine.

OK, have fun.

Go ahead, go, go. All right, oh hey, don’t forget get on chat.

I know, I know.

All right, fine.

OK, all right, great.

I turned off your mics.

All right, well hello everybody and welcome to SolarWinds Lab, live. I’m your host for this episode, Jim Hansen, not Hensen, Hansen, and I am going to be your host for today and we’re going to use this session to talk about application performance monitoring and how you can troubleshoot your custom applications and find issues and resolve those issues and get those applications up and running as quickly as you possibly can.

Now, I’ve got a couple of examples as we go through the session here that I’m going to show you and I’m going to also walk through all the different products that are used as part of the SolarWinds APM suite that’ll help you get to both a root cause and an understanding of what the actual issues are. Now, in addition to that, I want to remind you that as we go through this session, if you’re not already on the chat, please get on chat. I’ve got a couple guys over here in the corner, they’re my PMs that are going to be helping to answer some questions as we go and if it’s a special question they’re going to give it to me at the end and I’ll just answer it for you live, OK, all right. So let’s go ahead and get going.

Now, just to set the stage a little bit in terms of what we're talking about here, we know that every single organization across the planet has some kind of application. It doesn't matter if you're a retail chain, a financial organization, or a healthcare organization, you have applications. And these applications are effectively the lifeblood of your organization. They keep you functioning, they give your customers and your partners, or in some cases perhaps even your internal employees, visibility into what's going on and access to the services they need to do their jobs effectively. And for those organizations that deliver a service to customers, it's the way that you make money, right? So it's super important in a lot of cases that these applications stay functional, stay operational, and stay performant, and ultimately we want to make sure we have visibility into what's happening with those applications. And again, that's really what we're going to show you here today. Now, these applications could be pretty simple, right? They could be something like Office 365, or they could be something more complex which includes microservices. If you have made that huge journey to the cloud, or started to move to the cloud into more of a microservice type of architecture, you'll have some of these applications that need to be monitored, right? They're a little bit more complex, though they're intentionally designed to provide higher degrees of scalability and so forth. Now, as far as this environment that we're going to be looking at here today, imagine for just a minute that you were an administrator of a hotel booking site.

OK, and that's going to be what we use to show you some of the different examples here in a little bit as we go through the actual demonstration. So just imagine that you're this administrator, and, oh, it turns out I actually have one of those hotel booking sites right here on the screen. Now, this is just a little bit of a fictitious site that we put together to give you an idea as to how you would interact with this, right? You have your users, they come to the site, they interact with it, if I can figure out how to use this PC; as you can see, I use a Mac. So, let's see, you might come in here and say, for example, "Gosh, I want to find all of the hotels in London," right? So I'm going to do a search on that site, and inevitably you're interacting with it as an end user. This is a good example because I think it illustrates that when you have people engaging with either your application or the service you're delivering, it's extremely important that the thing remains functional and up and running. Sometimes you're going to get in here and it's just not going to work properly, and as an IT administrator you want to understand why, right. And this is actually where the challenge comes into play. When you're managing applications, first off, you as an IT team, a monitoring team, or an IT pro type of organization, you're responsible for keeping that thing up and running. You're responsible for keeping it performant and ultimately making sure that the service is actually available. You're being asked to reduce the mean time to resolution, from the point in time when an issue takes place down to when you can actually resolve that issue, i.e., the MTTR. You're also expected to work in some of these more complex types of environments; we already talked about microservices.
You might also have other factors to consider. You might have part of your application sitting in on-premises infrastructure, part of your infrastructure sitting inside the cloud, or something that crosses those two and is ultimately a hybrid type of environment. Regardless of your situation, you need a set of tools that give you the visibility to effectively get at what's going on when an issue takes place, from the end user and the experience they have, all the way down into the application, the database, the systems, and ultimately the network itself, because we know that all of those different elements play a factor in how that application functions.

If something breaks in any one of those components of the stack, things don't work, right, and so that becomes really important. Now, the first thing that I want to talk to you about is, again, the APM Suite. I promise you this is the only slide I'm going to show you during this entire presentation, but I do want to at least set the foundation. I want you to understand the various components that help you get that visibility. And the first is Pingdom. Pingdom is an application we make available which effectively gives you what the market refers to as web app performance monitoring, and it includes everything from very basic uptime monitoring. You want to know whether or not that website, or the application, is even there. The APIs that you're serving as part of that application or service, are those available? Are they there? You then might also want to use something like page speed monitoring, which allows you to understand not only is it there, but is it performing well? Is the time it's taking for that page to load what we're expecting to see, or is something getting in the way of the end user being able to actually access that and get to the service quickly, right? And again, think about that hotel booking example. If you're trying to get into the system and you click on that little, you know, go button in order to do the booking and it just spins and spins and spins, well, that doesn't really do you any good, right? You want those users to be able to get through that booking experience as quickly as they possibly can, and that's where that Pingdom page speed capability comes into play.

We also have within Pingdom, synthetic monitoring and this is kind of a fancy way of saying let me monitor transactions that take place. So imagine for example when you get into that hotel site, the entire transaction of being able to select through all of those different elements to get to a place where you click on book this room and you get to closure, that is the entire transaction, but there’s lots of transactions that your application or your service may actually be doing, so you want to be able to have visibility into that. And then lastly, there’s a component within Pingdom called RUM or real user monitoring, and real user monitoring is basically what are my real users actually doing within this site. It’s not synthetic in any kind of way, it’s not me running a check on the transaction, it’s just showing what users are doing and how they’re actually interacting with the site either in real time or even historically, OK. The second part of this puzzle is this notion of just general application performance monitoring. You can think of this in the context of really two different things and this is where things like traces and metrics come into play. And this is where you look at things like well, what is actually happening within the application. So instead of looking at things from the outside in, being able to see whether that application or service is functional, now I can actually look at it on the actual system itself. So think of that as more of the inside out view of the actual application itself, OK. Now that’s where SolarWinds AppOptics comes into play. And we include two primary capabilities there. One of which is code profiling and the second is basically infrastructure monitoring.
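The synthetic monitoring idea described above — scripting the whole user transaction, search through booking, and failing the check at the first broken step — can be sketched like this. The step names and stand-in step functions below are made up for illustration; a real check would drive a browser or call the site's actual endpoints.

```python
import time

# Illustrative sketch of a synthetic transaction check: run named steps in
# order (search -> select -> book) and fail the whole transaction at the
# first broken step, reporting which step failed and how long it all took.

def run_transaction(steps):
    """Run (name, step_fn) pairs in order; return (passed, failed_step, seconds)."""
    start = time.perf_counter()
    for name, step in steps:
        if not step():
            return False, name, time.perf_counter() - start
    return True, None, time.perf_counter() - start

steps = [
    ("search hotels", lambda: True),
    ("select room",   lambda: True),
    ("book room",     lambda: False),  # simulate the booking step breaking
]

passed, failed_step, elapsed = run_transaction(steps)
print(passed, failed_step)  # -> False book room
```

Knowing *which* step failed is what makes the synthetic check more useful than a bare uptime ping: "the booking step broke" points you somewhere specific.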

So let me go collect all of the little data points about my infrastructure and my systems and so forth, so that I can understand what's happening. If I see a queue that starts to build up, and we know that when the queue builds up it causes a performance issue, then we can actually proactively go identify those issues and resolve them. And again, I'll show you some examples here in a couple minutes. And then the last piece of course is log monitoring, and we know that within the infrastructure, from the very top of the stack all the way down to the bottom, including the network, logs get generated by all of these different devices and components. And when you look at the log data from all of those little devices and components, it can be pretty complex, especially in a distributed architecture, to try to figure out what's actually going on. So, having all of this capability together, more or less in a single place you can get visibility into, is actually pretty valuable. So, those are the three components that we're going to talk about. As I mentioned, I promised there'd only be one slide, and you're probably thinking, wait a minute, you just showed three. It's actually one, it's a build, but anyway. Let me go ahead and switch back over now, and I want to take you to the first of those pieces, which is basically Pingdom, OK.

Now, when you get into Pingdom, as we talked about, there are those four major capabilities: uptime, transactions, page speed, and RUM. I'm actually going to start right now with uptime. Now traditionally, you're an IT person, you've gone in here, you've set up a bunch of stuff, and eventually you're looking at your phone, you're sitting in a meeting, and you're like, oh, there's an alert. Right, I need to answer that alert, and that alert's going to be something related to one of these kinds of checks that you have within the application. So those checks, like a page speed or an uptime check, are going to identify whether or not there's a problem, and then you're going to get notified on, you know, your favorite device to let you know, hey, there's a problem, and then you can launch into the application and take a look at it. Well, I'm not on my phone, I'm on my computer, so I can't really show you that, but what I do want to do is show you a couple of examples here.

So, when you look at this uptime set of checks that we've got defined in here, there are a couple things that I want to point out. Number one, and this is something that's come up in some of the conversations I've had most recently with several customers who deliver services either to partners or to the vendors within their supply chain: one of the things we find is that when most people think about "hey, is my service up and running," they only really think about the main location of their site. So you might go to, you know, Solarsuites.info, which is my hotel booking site, and that might be the only thing that you look at, but the reality is, there are actually a ton of services and pages within that that you really do want to know are functioning. If your API is down, for example, you know that your partners and other folks that are trying to leverage that API aren't going to have the ability to interact with your service, and ultimately that potentially costs you money, right? So let me go ahead and drill into this example here. This is an API check that we've set up, which is just an uptime check, and the reason, as I've mentioned, that this is important is because that API might be down. What the uptime check is doing is, we've got just a little over 100 probes which sit all around the world, and those probes are basically constantly connecting to the resource that you've asked for, in this case the API, to ask it, hey, are you there? I want to know whether you're up and running. If you're not up and running, we then have a small algorithm within the product that evaluates, well, is it really down, or was it just unreachable for maybe 10 or 15 seconds? Is it a network glitch somewhere between the user and the actual site itself?
And what that allows us to do is, from multiple regions throughout the world, we can assess each one of those resources and evaluate whether or not that resource is available and online. And again, availability, as we know, is a really important aspect.
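Pingdom's actual confirmation algorithm isn't spelled out here, but the idea described — don't call it an outage until probes in more than one region agree, so a single local network glitch doesn't page anyone — can be sketched like this. The region names, the fake check, and the two-probe quorum are illustrative assumptions.

```python
# Sketch of a multi-probe "is it really down?" check, in the spirit of the
# uptime monitoring described above. The quorum rule and regions are
# illustrative, not the product's actual algorithm.

def probe_results(check_fn, regions):
    """Run the check from every region; return {region: is_up}."""
    return {region: check_fn(region) for region in regions}

def confirmed_down(results, quorum=2):
    """Treat the target as down only if at least `quorum` regions failed;
    a single failure is more likely a local network glitch than an outage."""
    failures = [region for region, up in results.items() if not up]
    return len(failures) >= quorum

# Simulated probes: the API answers from Europe but not from the Americas.
def fake_check(region):
    reachable = {"eu-west": True, "us-east": False, "us-west": False}
    return reachable[region]

results = probe_results(fake_check, ["eu-west", "us-east", "us-west"])
print(confirmed_down(results))  # two regions failed -> True, a real outage
```

If only one probe had failed, `confirmed_down` would return `False` and the check would retry rather than alert — which is the "is it really down or just a 10-second blip" evaluation mentioned above.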

Now, in this case you can see, and we've set this up purposely of course, the API is basically down, right. Now, of course it's down all the time as you can see, but that's actually really important to understand. And based on this we might now want to go drill in and see why that API is down. Now, before we go off and do that, and again, we purposely turned off the API just so it would do that, there's another example here that I want to show you. In my hotel service here in North America, I've actually set up a series of checks that are checking to make sure that within the North America region, both on the west and on the east, we have the ability to determine whether or not this hotel site is actually available for our users. Now, you can see when you get in here that there are really just a couple of small glitches, one back on January 9th, and we had one on the 14th. What's today? Today's like, I don't know what today is, 15th, yeah, thank you, the 15th. So today's the 15th, so we've had a couple small outages just in the last 24 hours, and of course there are also some page speed-related response times that we're getting here as well. Now, what's interesting about this is that if I wanted to go in and drill into, say, one of these errors here, what we can see is the actual root cause, and this is actually one of the really cool things about doing this kind of outside-in analysis. We can evaluate that website based off of the information that we're getting back from the website itself, or from the API call, or whatever it is that we're actually doing the assessment against, and that gives us a little bit of information about what's actually going on underneath the hood. In this case we can see that this particular error, this downtime, was actually caused by an HTTP 503, right, which we all know just means the service is unavailable.
Now, it doesn't necessarily mean that the site is not there; it just means that for some reason it isn't able to process that request and respond to it, right. And so this gives us at least some insight into where we may want to go a little bit deeper and take a look.
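That distinction — a 503 means the server answered but couldn't serve the request, which is a different failure from a connection that never got through at all — is worth making mechanical. A minimal sketch of how an uptime check might classify what it gets back (the category names are an illustrative simplification, not the product's taxonomy):

```python
# Sketch of interpreting an uptime check's HTTP result. HTTP 503
# ("Service Unavailable") means the server is there but can't process the
# request right now, which points at overload or a restarting backend
# rather than at DNS or network problems.

def classify(status):
    if status is None:
        return "unreachable"   # no response at all: DNS, routing, or host down
    if 200 <= status < 400:
        return "up"
    if status == 503:
        return "overloaded or restarting"  # server present but refusing work
    return "error"             # other 4xx/5xx responses

print(classify(200))   # -> up
print(classify(503))   # -> overloaded or restarting
print(classify(None))  # -> unreachable
```

The "overloaded or restarting" bucket is exactly the hint that sends you from the outside-in view into the application itself, as the demo does next.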

Now, in a minute or so what I'll do is I'll go through and show you, from here, once we know that, oh, it looks like, because this is my hotel site, this might have to do with my booking service, right. So I'm going to go drill into that here in just a second. But before I do that I actually want to show you one more thing, and that is the real user monitoring insights, because when we're doing troubleshooting of the actual applications, understanding what the users currently in the site are experiencing is pretty important and pretty valuable, right. In this case I've already set one up for the hotel booking website. If you click on the details here, there's a whole bunch of stuff in here, and you guys can go play around with it later if you want to. I just want to walk you through and show you, like, how do you use this thing to understand what's going on. But when we drill into the details, there are really two pieces that we see. Number one, experience, meaning what are the users actually doing within the website: where are they coming from, how many sessions do we actually have, what kind of page views are we getting, the countries they're coming from, the platforms that they're on, and their browsers. And when we're hosting a service or an application that's reliant on connection to the service, this gives us a lot of visitor insight into what's actually happening, so that we can understand where our users are coming from, and it helps demographically for us to understand how to deliver this service more effectively.

Now, the piece that I wanted to actually drill into here is the performance tab, and the reason is because when we look at the performance, first off, what we're looking at is what's going on performance-wise right now. But you can see, for example within the load time, that there are two little bars here, and I think you can see it on the screen: there's a little gray bar, which is here, that is basically showing the previous 24 hours as compared to the current 24 hours, right. And that's really important, because you want to notice anything that is highly deviant. If we see, for example, load times spiking somewhere, and you can see as we look at roughly around 6:30 this morning, it looks like we actually did have a little bit of a load time spike on our hotel site, which, you know, could be a problem that we want to go investigate. Now, the way that we would investigate this is actually really important, because this is where doing application performance monitoring, again, can be pretty tricky, right. When we're looking at the data from the outside in, we don't necessarily know what's happening under the hood, what's happening within the application. Pingdom gives us a good view of what's happening from an end-user experience perspective, but now we actually want to look underneath and see what's going on there.
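The "gray bar" comparison above boils down to lining up the current 24-hour window against the previous one and flagging hours that deviate sharply. A minimal sketch of that idea, with made-up data and an arbitrary 2x threshold (not what Pingdom actually uses):

```python
# Sketch of the compare-today-against-yesterday idea behind the RUM load
# time chart: flag any hour whose load time exceeds the same hour in the
# previous 24-hour window by more than `factor`. Data and threshold are
# illustrative.

def spiking_hours(previous_24h, current_24h, factor=2.0):
    """Return hour indexes where the current load time is more than
    `factor` times the previous day's value for that same hour."""
    return [
        hour
        for hour, (prev, cur) in enumerate(zip(previous_24h, current_24h))
        if prev > 0 and cur / prev > factor
    ]

# Median page load times in ms, one sample per hour (made-up numbers).
yesterday = [800] * 24
today = [820] * 24
today[6] = 2600  # the roughly-6:30-a.m. spike seen in the demo

print(spiking_hours(yesterday, today))  # -> [6]
```

Comparing like-for-like hours matters because load time has a daily rhythm; a value that's normal at peak traffic can be a red flag at 6:30 in the morning.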

Now, to do that, we're going to click on this little tab here, or little button, which takes us effectively over to AppOptics, and as we talked about just a few minutes ago, AppOptics is that metrics and code profiling component which allows us to get deep visibility into both the infrastructure and the actual code itself: what it's doing and all of the calls that are being made across the entire application stack. Now, in this case we can look, at a very high level, at all the different services that we've got available here, right, so we see Web Tier, Transaction Service, and so forth, and you'll notice under here that, it looks like earlier this morning, there was a whole bunch of activity in here which was, you know, taking a lot longer than it normally would to process. And so what we might want to do is go drill into that. If you recall from what we were just talking about, when you see something like a 503, a 503 could mean that the service is just being overloaded for some reason, and it's a good indicator of why, when you get that alert, you want to go look at it, drill into it, and dig into the application itself to understand what's happening, so then you can actually get to the root cause of the issue itself. Now, in this case what I'm going to do is one side track here, and then I'll come back to this screen in just a moment. I want to show you the infrastructure metrics, because this usually comes into play pretty quickly.
And let's see, I think we've got, it's the demo environment dashboard, and the reason is because when we collect metrics, first off, the infrastructure and the stack generate a whole bunch of data, and if you don't have visibility into it, alongside the rest of your application data, it becomes really difficult to troubleshoot and understand what's going on.

So, once you've collected this data you can build a simple dashboard like this. There's a bunch of stuff that we provide out of the box if you care about it, but if not, go take the metrics that matter to you. In this case, for example, within our demo environment we've actually created something like a demo host CPU, host memory, the disk percentage, and the load averages, and this gives us now, across the different systems that we've got within the demo environment, or within this, sorry, hotel application, a much better understanding of where things are performing and where they're not, and you can see even here at the infrastructure level that things weren't performing all that great earlier this morning, right. And so again, an indicator that there could be a problem. We can also see everything from, you know, the information that's coming in and out of the actual demo environment, and you can see that that spiked. So really, I mean, even after just a couple minutes I can already tell that in our demo environment we probably had something going on which was causing traffic to increase at a level which then had an adverse effect on the system, right, or on the service. But we're going to dig into this a little bit further.

Now, let me go back to the page where we were just a minute ago, and now what I want to do is drill into the actual web tier itself. As you're looking at what's being displayed here, what we're showing you is all of the trace-level code profiling that's being done on the application itself. And this code profiling is really important, because it gives me a service-by-service view of exactly where all of the performance bottlenecks are, and you can see things like a spike here, and a spike here of course, and it looks like in this case most of it's happening within the application, as opposed to an underlying booking service, remote call, or even the authorization service. I can imagine in some cases you go to a website and you try to log in and it just kind of keeps going and spins, right. When that happens, it could be the authentication service that's having an issue, and again, as an IT pro you want to go figure that out pretty quickly. Now, in this case we're going to want to drill into this in a little bit more detail, right. And to do that, we're going to go ahead and click in here, and from here I'm going to click on Trace Requests. What this is, is that every time an actual trace happens, which is the code profiling component, we keep track of every single call that's being made within the application, so we can understand what's happening from, again, the very bottom of the stack to the very top. When I click on this guy, it now gives me a little bit better visibility into a whole bunch of additional information about, in this case, the host controller, right.

Now, from here we’re going to drill into just this little section right here and this is now a subset of those traces, which now help me understand OK, well, I know that this thing is having at least in this case about a five second response time which in the context of this kind of application if you’re waiting five seconds people start to get a little bit annoyed, right. And so in the context of a hotel booking site you want this thing to respond probably a little bit faster than that. So what I’m going to do in this case is I’m going to actually click on that trace and this takes me into the trace view and this trace view is really important and you might look at this and say oh my goodness this is a whole lot of information. But fundamentally what this is showing you is that every time the actual calls get made, we can actually see how much time across each of these layers, Java, Spring, you know, we can see, what’s the purple there, MongoDB and so forth that all of these calls are being made. Now, at the same time you scroll down a little bit and we can actually see more detail about the actual queries that are being made in this case as well. And you can see for example here that there’s a couple kind of strange things happening, right. For every trace that’s being run there’s basically a query that’s being executed 762 times, this is not 762 times in the day, this is not 762 times over an hour, this is in this one trace. Basically what’s happening is this database call is being made 762 times, right. Now, again, this is just an example. This is based off of a real-world example that we’ve seen in the past and basically when you see this kind of thing you can already pretty much tell it’s the database, right. And as our Head Geeks know it’s always the database. 
I'm just kidding, it's not always the database, but in this case when we drill down into this particular example, it's really just a database issue that's causing the actual problem. And actually, I should take that back, it's not really a database issue in this case; it's more of an application which is going into some kind of loop, which is creating this massive number of database calls, which is probably pounding on the database, and a result of the database getting pounded on is that things just start to slow down, because it's getting pummeled by this particular application service, right. And so ultimately, you know, as you saw here, this just gives us a ton of visibility, all the way down to, in this case, the database layer, to understand this is the actual call that's being made. And of course, the next step in this process would be to go have a conversation with the app guys, figure out why this call is being made so many times, and fix that. And by fixing that we can reduce the load time and ultimately increase the overall performance of the environment itself.
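The 762-queries-in-one-trace symptom described above is the classic N+1 query pattern: a loop issuing one database call per item instead of one batched call. A minimal sketch of both shapes, using a dict as a stand-in for the database (the room data and counts are illustrative):

```python
# The fix the app team would make: replace a per-item query loop with one
# batched query. A dict stands in for the database; the counter shows how
# many round trips each version costs.

FAKE_DB = {room_id: {"id": room_id, "rate": 100 + room_id} for room_id in range(762)}
QUERY_COUNT = 0

def fetch_room(room_id):
    global QUERY_COUNT
    QUERY_COUNT += 1   # each call is one round trip to the database
    return FAKE_DB[room_id]

def fetch_rooms(room_ids):
    global QUERY_COUNT
    QUERY_COUNT += 1   # one batched round trip, e.g. "WHERE id IN (...)"
    return [FAKE_DB[r] for r in room_ids]

# N+1 version: one query per room, 762 round trips for one page render.
QUERY_COUNT = 0
rooms = [fetch_room(r) for r in range(762)]
print(QUERY_COUNT)  # -> 762

# Batched version: same data, one round trip.
QUERY_COUNT = 0
rooms = fetch_rooms(list(range(762)))
print(QUERY_COUNT)  # -> 1
```

Either way the page renders the same rooms; the difference is 762 round trips of latency and load on the database versus one, which is exactly why the trace view's per-query call count is such a useful signal.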

Now, if you recall from the very beginning, I mentioned that there's also another capability in here that's super important. We've talked about the end-user experience, we've talked about the traces, we've talked about the metrics that we're collecting off of the systems themselves; the one thing that we haven't talked about is logs, right. Because again, logs are being generated by pretty much everything. Now, in this case, one of the cool things that we've done, and this is another layer of integration like the one you saw between Pingdom and AppOptics, is to take you from looking at a set of traces, or even a particular trace, and narrow that down to the very specific set of log messages being generated around the same time that that trace was generated. The way that we do this is that we actually throw an agent onto the system, and that agent has the ability to instrument the log data that we're collecting and bringing into Loggly, and it then also includes, as you can see here, this trace ID. So we know that in that particular instance where that set of database calls was being made, we now have more visibility into the actual log data that's sitting underneath the hood, and we can start to explore this to figure out whether there was something else happening on that system which may have been causing that loop we saw. Now, as you start to dig through this log data, of course, you can see here it's not a lot, right. But imagine, with all of the terabytes and terabytes of log data, or hundreds of gigabytes, or whatever it is that you're collecting from all of these applications and systems and infrastructure, being able to narrow it down to just that basic set of log data is pretty powerful.
Of course, we can leverage this to go search around and look for other types of things as well, but fundamentally we now know that for this particular issue, related to, as you can see here, the web tier and so forth, I can drill down to just that basic set of log data to figure out what's going on.
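How the AppOptics agent actually injects the trace ID is product-specific, but the pattern it relies on is generic: stamp every log record with the active trace ID so the log store can later be filtered down to one trace. A sketch of that pattern with Python's standard `logging` module (the trace ID value and logger name are made up):

```python
import io
import logging

# Generic sketch of trace-ID log correlation: a logging filter adds the
# active trace ID to every record, the formatter writes it into each line,
# and "search the logs by trace ID" reduces to filtering on that field.

CURRENT_TRACE_ID = "trace-00042"  # in real code this comes from trace context

class TraceIdFilter(logging.Filter):
    def filter(self, record):
        record.trace_id = CURRENT_TRACE_ID  # stamp every record
        return True

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(trace_id)s %(levelname)s %(message)s"))

logger = logging.getLogger("webtier")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.addFilter(TraceIdFilter())

logger.info("querying room rates")
logger.warning("slow query: 762 calls in one request")

# Filtering the collected lines by trace ID recovers just this trace's logs.
lines = buffer.getvalue().splitlines()
matching = [line for line in lines if line.startswith("trace-00042")]
print(len(matching))  # -> 2
```

In a real distributed setup the trace ID changes per request and the filter reads it from the tracing context, but the correlation step at the end, narrowing terabytes of logs to the handful of lines from one trace, works the same way.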

Now, there are a number of other tools within Loggly you can use to search for various kinds of data, but being able to go from the very top of the stack, from the outside in, looking at the end user, down to the actual application itself, all the way into the system data and the log data, is incredibly powerful, and it’s something we made possible through this application performance monitoring kind of activity. Again, this was just a simple example of how we take you through that entire process, but I think it’s extremely valuable and powerful for you whether you’re responsible for monitoring engineering or you’re a network engineer, a systems engineer, a database engineer; whatever you do in the IT profession, you’re ultimately responsible for managing applications, and this is something that’s actually pretty helpful. And I think I’ve shown you at least a couple of examples of how you can leverage these tools to do that. Now, I want to pause for a minute, because we said we were going to try to take some questions toward the end. There’s a question here about AppOptics and Pingdom, and here it is: AppOptics and Pingdom both monitor web applications, but how are they different? At the most fundamental level, it’s really important to understand that when you’re looking at things from the end-user experience perspective, you want to look at them as though you were that user out in the world, right. So if I’m, for example, connecting to my hotel booking site and trying to book a room, I want to determine: can I get to the site? Can I actually execute a transaction on that site so I can buy something?
I mean, think about going out to your favorite, I don’t know if I’m allowed to say the A word, the vendor you go buy things from, and if you went in there and tried to check out and it didn’t work, you’d be like, uh, OK. If you’re on a hotel booking site or something of that nature and it doesn’t work, you’re going to go someplace else and try something different, right. So that’s the basic difference between AppOptics and Pingdom. Pingdom is all about the outside in: it helps you understand that end-user experience. AppOptics is about the inside out: understanding what the application is doing and where within that application stack, from the actual code down to the database, the server, and the systems, the bottleneck is and where the problems may actually exist, OK.
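To give a feel for what that “inside out,” code-level visibility means, here’s a tiny sketch of the kind of per-call timing an APM agent gathers automatically once its library is dropped into an application. Everything here, the decorator, the span buffer, the function names, is a hypothetical illustration, not the AppOptics API.

```python
import time
from functools import wraps

# Timing spans an APM agent might buffer and ship to its backend.
SPANS = []


def traced(fn):
    """Hypothetical stand-in for the automatic instrumentation an APM
    library applies to application and database calls."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Record (function name, elapsed seconds), like a trace span.
            SPANS.append((fn.__name__, time.perf_counter() - start))
    return wrapper


@traced
def query_available_rooms():
    time.sleep(0.01)  # simulate a slow database call
    return ["room-101", "room-204"]


if __name__ == "__main__":
    query_available_rooms()
    for name, elapsed in SPANS:
        print(f"{name}: {elapsed * 1000:.1f} ms")
```

In a real agent this wrapping happens for you across web frameworks, database drivers, and so on, which is why the dev effort stays low; the spans are what show up as the trace waterfall in the UI.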

Those are the differences between the two. Oh goodness, the difference between Pingdom and WPM? So, some of you may know that within SolarWinds we have another product called WPM, Web Performance Monitor, which is tied to the Orion Platform. You can think of it as more of an inside-the-firewall kind of tool: WPM is designed to help you look at and understand things from within your firewall. Remember I mentioned that Pingdom has these probes? The probes sit all across the globe and are designed to give you that outside-in view, whereas WPM sits on the inside of the firewall and can give you visibility into the applications within it; a probe out in the world has a much harder time seeing applications from within your infrastructure. So that’s the primary difference. The other major difference comes down to the real user monitoring functionality, which Pingdom makes available and WPM does not. But both products are actually super complementary to what you’d be trying to do in terms of understanding the health and performance of your applications.

How much dev work is needed to get code visibility shown in AppOptics? This is kind of an interesting one. There’s a general misunderstanding that application performance monitoring, and specifically code profiling, is super hard to do, that you’ve got to go in and do all this massive instrumentation of your application. Generally speaking, that’s not true. With AppOptics you take what we call an agent, but it’s effectively just a library: you drop that library into your code, and we do the rest.
It actually simplifies the process of getting code-level visibility into things. If you drop that library into the application, we basically get full visibility into the various places within that application where calls are being made, so it’s very easy to do. How do I know if a critical transaction step, such as a login, a search, or a shopping cart feature, is failing, and why? This is where you’d leverage something like Pingdom synthetics; we talked about that transaction monitoring capability.

Transaction monitoring is explicitly designed to go through the various steps of a form and determine whether or not the user can get through it. Again, it’s called synthetics because it’s not a real user going through it; it’s an automated process that tests every step along the way. But when you get in there, it gives you the ability to understand what’s going on at the individual step level within that transaction, and in fact, let me see if I can show you that real quick, hopefully. We’re back in Pingdom here. Let’s see, if I go into transactions and we go back over to US East, we look at this guy, we’re going to drill into the actual transaction itself, and it’s going to show me basically all the steps. This one’s pretty boring because it looks like it might have just one step in it. Let me choose a different one, hold on here just a second. All right, transactions, all right. You guys are seeing that, right? Yeah, OK, cool. Let’s see, so let’s choose the GoldenEye booking. Let’s go with US West. All right, if we look at this guy here, this one’s going to show us a number of different steps within that transaction which are either doing well or not. And as you start to look at this, you can see how much time the synthetic process is taking to go through each one of these steps as well, right.
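As a rough sketch of what a synthetic transaction check does under the hood: each step is run in order, every step is timed, and the run stops at the first failure, since later steps depend on earlier ones. The step names and the failure below are invented for illustration; this is not how Pingdom itself is implemented.

```python
import time


def run_transaction(steps):
    """Run (name, callable) steps in order, timing each one.
    Stop at the first failure, since later steps in a transaction
    depend on earlier ones."""
    results = []
    for name, step in steps:
        start = time.perf_counter()
        try:
            step()
            status = "ok"
        except Exception as exc:
            status = f"failed: {exc}"
        results.append((name, status, time.perf_counter() - start))
        if status != "ok":
            break
    return results


def broken_checkout():
    # Simulate the kind of error shown in the demo: a path that
    # the check was configured to follow no longer exists.
    raise RuntimeError("path '/checkout' does not exist")


if __name__ == "__main__":
    steps = [
        ("load booking page", lambda: None),
        ("search for a room", lambda: None),
        ("check out", broken_checkout),
    ]
    for name, status, elapsed in run_transaction(steps):
        print(f"{name}: {status} ({elapsed * 1000:.2f} ms)")
```

The per-step status and timing in `results` corresponds to the step-level view in the transaction UI: you can see exactly which step broke and how long each earlier step took.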

So in this case, for example, if we look here we can see there was an error. We click on this to understand the root cause, and in this case it looks like there’s yet another issue within the application: the path that was defined to go through doesn’t actually exist. Now again, we can drill into this and troubleshoot it in a similar way to what we showed just a couple of minutes ago. But again, that’s just an example. Ah, here’s a good one. We’re moving to containers and we’re moving to Kubernetes; how do these products work in those kinds of environments? Such a great question. How many folks are moving from more traditional on-prem infrastructure for their applications to Kubernetes, and maybe to Docker and things like that? The reality is a lot of people are, right. It makes sense in terms of being able to manage and scale elastically, and it makes those applications easier to manage, but it also makes them incredibly more complex. With the AppOptics product, we have visibility all the way down into the actual containers themselves, and we can provide that same level of visibility we’ve talked about through the application into those containers. So if, for example, you’ve got a container that’s not behaving properly, through the metrics we’re collecting about that container you’d be able to see, hey, looks like this container is misbehaving; maybe it needs to be rebooted, or killed and a new container started, right.

Alternatively, you can also see it at the application level: the code profiling can get into the application code running within that container, and now we get visibility into what’s happening at that application tier as well. So that would be done through AppOptics. What configuration or coding is necessary to get the integration between AppOptics and Loggly running? Such a fantastic question. It’s actually super simple. When you install AppOptics, remember we talked about the code profiling component: you instrument your application by just dropping in the library and letting it do all the work for you. On the log side we also have something we call a snap agent. The snap agent is more like a traditional agent; you install it onto the system. If you’re running in containers, or again in Kubernetes, you can put it into the build process so that every time the container starts up, that little agent is there. Basically, that allows the code library to talk directly to the snap agent and provide the ID correlation between the log data it’s collecting off that container and the application traces being run within that environment. So it’s super simple: you take the little library, you put it in, you drop the snap agent in, and voila, you’re good to go. Are there any more questions that you guys are seeing? Oh, how licensing works, where’s that guy? OK, so licensing for these products is pretty simple. If you’re an existing SolarWinds customer, you’re probably familiar, of course, with the perpetual kind of model we have within most of our Orion-based products.

Within the application performance monitoring side of things, these are all subscriptions. So if, for example, you’re only interested in doing real user monitoring, you can buy just that set of functionality for a set price, and of course you can take a look at our website if you’re interested; you’ll see all the prices out there. Alternatively, let’s say you want AppOptics. AppOptics has its own subscription as well, which is largely based on the number of hosts, or potentially the number of containers, you want to run it on, depending on what your environment looks like. So we have a subscription price based on the number of hosts you’re going to run this within. And then lastly, on the Loggly side, Loggly is priced a little differently: it’s based on data volume. So if you have, say, 100 gigabytes of data that you’re collecting on a daily basis, we would charge you one thing; if you had a terabyte, we would obviously charge you a bit more. It just depends, but it’s all based on the data volume you’re bringing in. So each product is licensed individually right now. The APM Suite itself, i.e., all three of these products, is designed, as we talked about, so they all work together, but ultimately you can still decide which pieces you care about most. If you want to use them all, you can get a subscription to all of them and leverage them all, or you can do just one or the other, depending on what you care about.

All right, all right, what else do we have? [background chatter] No, we can do that, yeah, sure. Yeah, so it looks like that might be all of the questions we have right now; nothing else coming in, OK. All right, if that’s it, then I guess we’ll go ahead and wrap it up and give you some time back in your day. If you want to ask any more questions, or if you want a little more detail on anything we talked about, I’d be happy to have a conversation with you. You can reach out to me at Jim, that’s J I M, not G Y M, dot Hansen, and that is H A N S E N, @ solarwinds.com. I’d be happy to answer your questions, and if I can’t answer them, I’ll find one of the folks on my team, either one of the PMs or the PMMs, and we’ll certainly try to get you whatever answers you need. Other than that, hey, listen, I want to say thank you for joining us today. I appreciate your time, I hope you got something out of this, and I look forward to seeing you next time. Thank you.