You may think of yourself as only a network administrator, but if you’re making production changes outside of scheduled maintenance windows, you’re DevOps. If you’re managing cloud and DevOps processes, putting best practices in place will allow you more flexibility and reduce the risk that you’ll have to stay late at the office. In this episode, the Head Geeks introduce DevOps advantages, using cloud services from Pingdom® and Librato® as examples. Patrick Hubbard talks with Cloud Manager, Gerardo Dada, about the future of SolarWinds cloud products. The Head Geeks also demo the new Pingdom mobile client and demo automated monitoring of Minecraft servers in the cloud.
Hello, and welcome to a special episode of SolarWinds Lab. I’m Patrick Hubbard.
I’m Kong Yang.
And I’m Thomas LaRock, and— wait, what do you mean, special episode?
Well, today we’re talking about cloud and managing cloud applications.
Yeah, we’ve already done that. I did a demo of DPA right here, running in Azure.
And we did a how-to using Orion agents, monitoring cloud applications, and even monitoring cloud network traffic.
That’s true, with pros, but that’s not quite what I’m talking about. We’ve been talking about it from the perspective of hybrid cloud, where you have some resources that are maybe in Azure or AWS, but the majority of your systems are actually still in a traditional data center.
Right, that’s the same thing.
It’s not the same thing. Today, we’re talking about cloud-only administration. As in, your primary job is managing enterprise systems delivered from AWS or Azure alone. Think of it as admins who are okay with never configuring a router or dealing with Exchange updates.
Ah, you mean the “real” cloud.
Maybe. I think a better way to think of it is to think about what stereotypical cloud admins look like.
Oh, oh, how about this? So, the 20-something, super sharp, never touched a router or installed Oracle RAC and knows how to pronounce Hadoop correctly.
Yeah, but you said it correctly. Okay, that’s Eric, first of all. But I see where you’re going. I mean, he does sort of fit the first stereotype.
Or how about this? Career IT, great with numbers. Moving to the cloud to manage costs.
Okay, well that’s actually Brad, but there have been a number of admins who have moved to cloud in order to manage costs. No, no, here’s what a cloud owner looks like.
You really are an egomaniac.
You’ve been at SolarWinds a long time, and you’ve worked in data centers for years, but what makes you think you’re a cloud admin?
Ah, this. [Groaning]
Oh no, the Raspberry Pi again?
Yeah, we just did a whole show on Linux. But no, you’re right; we did, and thanks, by the way, everyone for the great Linux ideas on our home page, which is lab.solarwinds.com.
And if you like the show, please be sure to sign up for reminders and check past episodes like the Linux show.
Absolutely, but what makes me a cloud owner is that although I’m an IT guy, I really am a programmer. And I live for DevOps, so while I will configure a router manually if I have to, I would much rather build automation that does it at the push of a button, so that I can do one or a million transactions with the same effort.
So what you’re really saying is the qualifier for cloud is geeking out with automation. Too bad Gerardo isn’t here. He’d love this.
Too bad I’m not here for what?
Ah, that’s creepy! It’s like the Nube’s Cube Rob Hawk episodes.
Nah, that’s Gerardo Dada, welcome to SolarWinds Lab.
Glad to be here.
Okay, so tell you what. I can already tell you’re going to do all the cloud-only products, so Kong and I, we have to get ready for Cisco Live and VMworld, so we can take off and we’ll come back at the end of the show. But promise me one thing?
You will show DPA in the Amazon Marketplace.
You got it.
Yeah, that’s up first.
It’ll mostly be cloudy while we’re gone.
Thanks for coming. We’re about to get really cloud geeky. For those of you who haven’t met Gerardo, he’s the VP of SolarWinds cloud products and also a geek. You were certainly a business guy, you were focused on delivering software for customers, but you’re a geek.
Yeah, many years before that I started programming at a computer with a couple kilobytes of memory and a storage system that was cassette tape, basically. Used dBase, assembly language, Pascal, you name it. And also installed a bunch of Novell NetWare networks.
Right, but you come from a programming background and an application delivery background, not so much packets on a network.
That is correct. It’s all about software.
It is all about software. And that’s really—the thing here is that, automation is that theme where software is the platform. We say that things are software configurable, but in this case, cloud really is sitting entirely on software.
Absolutely. So, if you have a passion for software and for using software to control hardware, so that everything can be monitored, everything can be automated— automation is a key point as you mentioned, then you really have a cloud mindset. So being a cloud person is more about how you think about your job and how you think about the processes and automating things from development to operations, you know, DevOps, the new buzzword. It’s more about that and automation and monitoring, and making everything like a continuous stream of work, than the fact that you’re building out a data center as your own company or a co-location facility or a cloud company.
A lot of that, I think, is that when you’re junior in your career, you almost end up sometimes as a software apologist. The platform limits what you’re able to do, and you think of it as a black box, and you’re focused on “this is an Oracle instance” or “this is a piece of network gear that I have to configure to get stuff done.” Or almost like it’s this evil box of goo that you have to deal with. And when you finally make that jump and say, I want to participate in that, I want to get involved in actually configuring software and actually driving how that automation works, then the whole world opens up, and the problems that were driving you crazy and bedeviling you before, you automate out and they disappear.
Exactly, and everything’s becoming software. You’ve heard of software-defined networks, software-defined hardware, infrastructure as a service, software-defined storage, even. Nothing in the cloud can function at scale without automation. It’s all about using the software to scale out.
Exactly. Now, the other thing about cloud is open source. That’s a big part of this.
Absolutely, and open source is interesting because it’s easy to get your hands on. That’s a big part of the cloud: everybody can play with it, there’s no barriers to entry. So you can start using software, you can create a server in a few minutes, right?
Right. The other thing about it is, it’s kind of like for enterprise admins, you might start with Nagios as a free solution to do monitoring, and the same way in AWS, you’ll actually start with Graphite, right?
And that’s because it’s free and it’s accessible and you want to play with it. It’s fun to take Nagios and Graphite and Collectd, and all these technologies, and meld them together any way you want and you can play with it like Lego. Open source software can be great even at scale, even in some very large-scale commercial deployments, but it can also become very expensive and very hard to maintain. We’ve seen that, oftentimes, the task goes from “let’s use monitoring because we want to get insights” to “let’s keep on playing with the monitoring technologies.” And you spend more time connecting that data and playing with the monitoring tools, to the point where they become a task in and of itself. And some large companies have dedicated teams that just focus on monitoring tools, and that’s not what you want.
And there’s also a difference in motivation. You use the word “play,” and it really is. You’re playing with it, you’re enjoying it, you’re having a great time setting it up. But once it becomes just another tool that you need to get your job done, after a while the play aspect kind of diminishes, and now it’s a thing, and it’s just a lot easier to find something that just works.
And the idea of playing with the cloud is about results, it’s about being dynamic and getting things done faster. And you cannot get things done faster if you have to spend hours connecting more things together. You want a tool that you just turn on, get your data, and get ready to get insight, so you can spend more time optimizing your infrastructure and optimizing your application and delivering results to the business, rather than having to figure out how everything plugs together on the back end.
Right, an extreme version of that might be if you wanted to monitor CloudWatch directly, build your own queries, and pull stuff down. I was talking to a guy the other day who actually pulls that down, he builds Excel spreadsheets and then crunches on that, and that’s just wrong. You should not be doing that.
Exactly, so the first thing that companies do when they get serious about the cloud is they upgrade their Collectd, Graphite, Grafana setups because, like with Nagios, they get to a point where they hit a wall. They say, well, I can spend more hours doing this. But your time as a professional is valuable. It sounds free, but if you’re going to spend a full day setting up your monitoring system and a couple hours a week doing that, then it’s not free anymore. It’s your own personal time. And for the company, it’s costing hundreds of dollars per hour. If you are an AWS monitoring guru, you don’t want to spend time instrumenting and configuring and doing all those things. You just want to turn it on and have it work.
Right, and that’s really the point, too. You want to just turn it on. You don’t want to install anything; you just want to hit the button. You want it to be built as you use it and have it just instantly available.
And it needs to be flexible, so it needs to be able to collect data from whatever thing you’re playing with and whatever the business requires you to monitor.
Are you saying I play with stuff? All right, kind of. Okay, this is the part where we get to really geek out here. We’re going to show three different technologies here that I don’t think any of you guys have ever seen as an example of how this works. The first thing is, we’re going to show DPA.
Database Performance Analyzer.
Database Performance Analyzer, but in the Amazon Marketplace. As in, click a button and run it against RDS. We’re going to show Librato and we’re going to show Pingdom.
Fantastic, so why don’t we start with DPA for 60,000 milliseconds and then focus on AWS for hosted performance and monitors?
We did tell Tom we were going to do that first, and 60,000 milliseconds, you are a geek.
See, there you go.
All right, let’s take a look at this.
Let’s show DPA in the Marketplace.
We’re going to show it in the Marketplace. And this is one thing I’m really excited about here, because look, here we are in the AWS Marketplace. We are not on solarwinds.com, we are out here working in our cloud environment, and I’m going to come up here to the top, and what am I going to type?
SolarWinds. There you go.
Look at that, Database Performance Analyzer. I’m going to click that, and here I’ve got a regular instance to actually stand this up as an EC2 instance. The nice thing, it’s going to tell me everything that it’s going to be using, it’s going to tell me what my fees are going to be. And the other thing that’s nice here is, you’re going to get a free trial with this, so the only thing you’re really going to do is click continue. And that’s going to bring me to my launch page. And if I want, of course, I tend to click manual launch because I like to have a lot of control over here. Tell it which AMI ID I want to use and where I want to put that, specify security groups, the rest of it, but it’s really easy to just use one click.
Absolutely, and the nice thing is that you can get DPA up and running in a few minutes, and you can monitor EC2 databases or RDS databases, because DPA is an agentless system. And you can monitor SQL Server or Oracle, or even DB2 or Sybase databases.
Absolutely. There’s a ton of videos on how to use Database Performance Analyzer. We have talked about it a million times. If you want to see those, we’ll put the links down below and you can check those out. But rather than dive into that, I just wanted you guys to see this so that if you are using RDS, you’re a heavy user of Oracle, or SQL Server, or Sybase, or MySQL, and you want to have really deep, down-to-the-query-level analytics for how that’s performing in cloud, that’s available and there’s nothing to install here.
Here’s the important thing in the cloud or even on-premises: about 80% of all performance problems come from the database. When you think about it, most applications are built to store data and extract data. Have you ever been traveling, you go to a rental car agent and he says, “I’m sorry, my computer is slow”? Or you’re on the phone with the —
Yeah, it’s the database.
Everybody says, “I’m sorry, my computer is slow.” It’s either waiting for the network or the database. So the database being the heart of the application, that’s where you’re going to find all the performance problems, and that’s where you’re going to find all the performance goodness as well. So that’s why DPA is so important, especially in the cloud, where everything’s changing so quickly.
So then, the other thing you’re monitoring a lot in cloud is just availability of web applications.
Absolutely. At the end of the day, infrastructure and applications are only useful to the extent they provide a useful website to people. Pingdom actually is known for uptime, but you want to know four things about your application. Is it working? Which is uptime. Second, how fast is it performing? Third, what is the experience of users that are using the application itself? And four, can it actually do what it’s supposed to do? Meaning, can it complete transactions?
And last, you don’t want to install anything and you want it to work.
Of course, you want to monitor it from the outside in so that if your systems go down, you need to have a system that is actually up and monitoring from different parts in the world.
Exactly, and up from everywhere is sort of a very different thing from up from a traditional NOC.
Right, and the web is flat, so you want to know what is the performance for users all over the world, not only the ones that are in the same country where you are.
Yeah, all right. So that then brings us to Pingdom. I logged in at pingdom.com; I’ve set a few things up here. And I’m not saying that I’ve hidden anything exactly, but you guys don’t need to know everything that I’m monitoring, especially things like IP addresses. I can see right off the bat where my major errors are. I can tell whether I’ve got someone on coverage or not for managing alerts. If I take a look at monitoring, there are just basic things like uptime, for example, and I’m monitoring a couple of different things here. I’ve got a couple of regular HTTP monitors, I’ve got a couple of sites up, and I’m making sure that the SE demo site’s working. I’m also doing a TCP monitor here to make sure that my Orion HTTPS ports are working the way I expect, sort of independently of the application so that I can debug something a little bit easier. And you don’t want to know what I’m doing here, testing my weather API.
Why don’t we take a look at transaction monitoring? Transaction monitoring allows you to look at multiple steps on websites. It could be a search function, could be a checkout function, could be really anything, even a login. The most basic functions on your website, you want to make sure those things actually work. So you have a very basic scripting language where you can define multiple steps and what you expect to happen in those multiple steps. So you can verify everything on the website is working properly.
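As an illustration of the multi-step idea described here, the following is a minimal sketch in Python, not Pingdom's scripting language. It assumes a transaction is an ordered list of steps, each with a URL and a piece of text expected in the response. The `Step` type, the `run_transaction` function, and the example URLs are all hypothetical names chosen for this sketch, and the page fetcher is passed in so the logic can be tried without a network.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str    # human-readable label, e.g. "login"
    url: str     # page requested for this step
    expect: str  # text that must appear in the response body


def run_transaction(
    steps: List[Step], fetch: Callable[[str], str]
) -> Tuple[bool, List[Dict]]:
    """Run steps in order and stop at the first one that fails.

    `fetch` is any callable that takes a URL and returns the page body.
    A step fails if fetch raises or the expected text is missing.
    """
    results = []
    for step in steps:
        started = time.perf_counter()
        try:
            ok = step.expect in fetch(step.url)
        except Exception:
            ok = False
        elapsed_ms = (time.perf_counter() - started) * 1000
        results.append({"step": step.name, "ok": ok, "ms": elapsed_ms})
        if not ok:
            return False, results
    return True, results


# Usage with canned pages (hypothetical URLs) instead of real HTTP calls.
pages = {
    "https://example.test/login": "Welcome back",
    "https://example.test/cart": "Your cart is empty",
}
steps = [
    Step("login", "https://example.test/login", "Welcome"),
    Step("checkout", "https://example.test/cart", "Checkout"),
]
ok, results = run_transaction(steps, pages.__getitem__)
```

Recording the elapsed time per step is what makes it possible to chart not just whether the transaction completes, but how long it takes over time.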
Yeah, and it’s nice, because it’s sort of in between WPM, which uses a recorder approach where you actually record what you want to test, and something like a full web performance scripting library. This gives you really tight control over what elements you’re testing and making sure that it’s lean and tight and you have full knowledge of every step you expect to execute. But at the same time, you’re not writing scripts. The nice thing about it is when you take a look at your data…
There you go. You can see your transaction. Not only can transactions be completed, but what is the time to complete those transactions over time.
Yeah, really, really cool. And the other thing here, you’ll look right down here, because this is a cloud-based tool, you are not installing something. You’re only paying for what you need, you turn it on, and then you’re actually doing it by the number of checks and a couple of other elements, but it’s really easy to use. Now, the thing that I really like is the mobile app. I’m actually going to run over here to camera three and I’m going to show that to them right quick. Hang on one second.
I know I’m kind of surprising you here, Andy, thanks for focusing on that. All right, so, here we go. I’m going to come in here to my app; I’m going to take a look at my checks. These are the same ones that we were looking at before. And I just love how responsive this thing is. And we come in here and I’ll take a look at this guy. This is actually one that’s going out to a server of mine that I’m not going to explain just right now. I can actually see, in addition to my overall uptime chart, I can see the availability, how it’s behaving itself. Same response time that we were looking at before. And then I get sort of my detailed check info. I’m going to blur that out just a little bit. The cool thing here when we look at outages is, how many times do you have to try to figure out what’s causing an outage? Well, it’s smart enough to actually give you a little detail here about what it’s doing to try to figure out what the root cause is. If I drill in here and look at this particular outage, I can see here, it’s getting this— here’s one that happened. This is on the 6th, 3:16 in the afternoon. It says, I think this target might be down. So then it hammers on it from a couple different sites and says, yeah, this thing is actually really down-down. Then I can come in here and it’s going to take a look at the root cause, and what it’s trying to do now is figure out what’s actually causing that problem. It’s going to go do IP resolution, take a look at the trace route, then it’s going to go do a content pull and it’s going to keep banging on it until it can figure out. And you’ll see here: Netherlands, Amsterdam, Toronto. It’s using a whole bunch of different sites to try to figure out what’s going on because it might be just a partial reachability issue for that target. And then here it comes back up and it’s telling me the first place that it saw it was in New York. And the other thing that they do that’s nice is right here. It tells me how long it was out. 
So that first question, you get an email, “Hey, how long were we down?” We were only down for 10 minutes. So while you’re looking at that, it’s just really, really handy. That part of the app is just really great and it’s right there in your pocket. And then, of course, you get alerts and all the other good stuff.
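The two ideas in this part of the demo, confirming an outage from more than one location and then reporting how long it lasted, can be sketched roughly as follows. This is an illustrative Python sketch and not how Pingdom implements it; the function names, the two-probe threshold, and the sample timestamps are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


def confirmed_down(probe_results: Dict[str, bool], min_failures: int = 2) -> bool:
    """Treat the target as down only if at least `min_failures` probes fail.

    `probe_results` maps a probe location to True (reachable) or False.
    """
    failures = [loc for loc, up in probe_results.items() if not up]
    return len(failures) >= min_failures


def outage_windows(
    samples: List[Tuple[datetime, bool]]
) -> List[Tuple[datetime, datetime]]:
    """Convert time-ordered (timestamp, is_up) samples into outage windows.

    Each window is (first failed sample, first sample where it is up again).
    An outage that is still ongoing at the last sample is not reported.
    """
    windows = []
    down_since: Optional[datetime] = None
    for ts, is_up in samples:
        if not is_up and down_since is None:
            down_since = ts
        elif is_up and down_since is not None:
            windows.append((down_since, ts))
            down_since = None
    return windows


# Usage: the date below is a placeholder, only the 3:16 p.m. start time
# and the 10-minute duration echo the demo.
start = datetime(2000, 1, 6, 15, 16)
samples = [
    (start, False),
    (start + timedelta(minutes=5), False),
    (start + timedelta(minutes=10), True),
]
windows = outage_windows(samples)
duration = windows[0][1] - windows[0][0]
```

Requiring more than one failing probe is what separates a partial reachability problem from a target that is really down.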
The main goal of the mobile application is that you want to be the first to know when something goes down. And you want to have enough information so you can actually get to start fixing it, so hopefully you’ll fix it before anybody else notices.
Yeah, absolutely. All right, so the next thing we’re going to take a look at is RUM reports.
Today when you’re looking at uptime, you can see the uptime as it’s perceived by different probes around the world. You can decide if you want to monitor from Seattle, Toronto, different parts in the world. And those are from these Pingdom probes. But you can also see RUM, which stands for real user monitoring, which means you can see exactly what is the performance and page load times for individual users that are hitting your website or your web application or cloud application from mobile devices, tablets, PCs, in any part of the world. So you can see if your users, let’s say, in Africa, are having a performance issue because of latency or what have you. Or if users from mobile devices are having longer page load times than you want them to have.
Exactly, especially when you’re doing dynamic follow-the-sun provisioning for services. You want to make sure that those response times map to where your customers actually are. Because I’m willing to take a longer ping time at 3 o’clock in the morning in the U.S. if I’ve moved a bunch of my instances over into Europe or into my Asian servers since they’re closer to customers and I’m not doing 95th percentile capacity provisioning in every geo all the time.
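To make the real user monitoring idea concrete, here is a small Python sketch of how page load measurements reported by browsers might be grouped by region and device type. The record fields (`region`, `device`, `load_ms`) and the function names are assumptions for illustration, not Pingdom's data model, and the percentile uses the simple nearest-rank method.

```python
import math
from collections import defaultdict
from statistics import median


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(beacons):
    """Group page load times by (region, device) and summarize each group."""
    groups = defaultdict(list)
    for beacon in beacons:
        groups[(beacon["region"], beacon["device"])].append(beacon["load_ms"])
    return {
        key: {
            "count": len(times),
            "median_ms": median(times),
            "p95_ms": percentile(times, 95),
        }
        for key, times in groups.items()
    }


# Usage with made-up measurements.
beacons = [
    {"region": "Africa", "device": "mobile", "load_ms": 1000},
    {"region": "Africa", "device": "mobile", "load_ms": 2000},
    {"region": "Africa", "device": "mobile", "load_ms": 3000},
    {"region": "Africa", "device": "mobile", "load_ms": 4000},
    {"region": "Europe", "device": "desktop", "load_ms": 800},
]
summary = summarize(beacons)
```

Looking at a high percentile next to the median helps show whether a region is slow for everyone or only for a tail of users.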
Exactly. But even further, if you want to know what is slowing that website, you can actually see what is slowing down those page load times by looking at what I call the Waterfall Report. It’s actually free. You can go to the free tools section on the Pingdom website and you can see exactly the page load time on each of the elements, from CSS files to graphics, etc. You can find out if you need to zip up some of your elements on your website or you want to use a CDN for your images to load faster in different parts of the world.
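The kind of reasoning a waterfall report supports can be sketched in a few lines of Python. This is a hypothetical post-processing step over per-element timings, not the Pingdom tool itself; the field names and the thresholds (500 ms, 100,000 bytes) are arbitrary assumptions.

```python
TEXT_TYPES = {"css", "js", "html"}


def waterfall_findings(elements, slow_ms=500, big_bytes=100_000):
    """Return (url, notes) for elements worth a look, slowest first.

    Each element is a dict with url, type, bytes, ms, and compressed keys.
    """
    findings = []
    for el in sorted(elements, key=lambda e: e["ms"], reverse=True):
        notes = []
        if el["ms"] >= slow_ms:
            notes.append("slow")
        if (
            el["type"] in TEXT_TYPES
            and not el["compressed"]
            and el["bytes"] >= big_bytes
        ):
            notes.append("compress")
        if el["type"] == "image" and el["ms"] >= slow_ms:
            notes.append("consider a CDN")
        if notes:
            findings.append((el["url"], notes))
    return findings


# Usage with made-up page elements.
elements = [
    {"url": "/app.css", "type": "css", "bytes": 200_000, "ms": 300, "compressed": False},
    {"url": "/hero.png", "type": "image", "bytes": 900_000, "ms": 1200, "compressed": True},
    {"url": "/small.js", "type": "js", "bytes": 1_000, "ms": 20, "compressed": False},
]
findings = waterfall_findings(elements)
```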
Right, how many times does that happen with CloudFront, where you’re sitting there trying to decide, do I want to actually host all of this content directly out of my web app or do I want to go ahead and start using CloudFront to actually project some of that and have some tighter level of your CDN control?
And that’s the key thing about monitoring. You need to know what’s actually going to impact your performance, not only guess.
Exactly, which, and definitely check this out. I was sitting here looking at this going, wow, Amazon really on their front page has got an awful lot of stuff, it’s all probably half of it tracking too. But that actually does bring up a great point, which is how do you, in the CloudFront example, how do you manage and monitor those discrete services, which are actually part of your fundamental cloud delivery platform? And for that —
We have Librato.
Librato, so let’s take a look at that.
Librato is a very unique tool; it’s what we call a real-time monitoring platform. It takes data from everywhere. You can take data from on-premises servers, from CloudWatch, even from Pingdom real-user monitoring. And then you can transform it, time slice it, correlate different metrics together, and then create beautiful visualizations that you can use to find exactly the things that you want to know about your operations. And then it has an alerting system that connects to anything you might be using, like email, PagerDuty, VictorOps, whatever it is.
And again, that’s running in the cloud, there’s nothing to install.
And it’s really great at supporting all of your open source frameworks, as well as your packaged applications.
Commercial. That’s correct.
Before I go in and show you this page, just so you can see what this looks like in terms of the AWS setup, one of the things that’s really cool about it is it’s monitoring a couple of different things. You’re using CloudWatch as the back end for metrics collection for AWS. So you basically are just going to give it a set of access keys and it’s going to start monitoring.
And you don’t have to use CloudWatch. That’s the easiest solution, but you could actually install your own agent. Part of the philosophy of Librato is that we don’t have agents that prescribe how to monitor or what to monitor. You can use any agent you want. You can instrument your code. You can instrument business processes so that they send data to Librato.
Right, an example of that— well, let me ask you this then. When you look at Heroku, for example, you are basically monitoring anything that gets kicked out to standard out.
But there’s an agent option there too, right?
There’s an easy way, if you’re in Heroku, it’s just one of the plug-ins. Just turn it on and you start getting your dashboards in Librato.
Exactly, and Collectd’s a lot the same way. You’re using the write_http plugin.
Correct, and you can define exactly what you want Collectd to collect and to send to Librato. And once you have your metrics in Librato, you can do anything you want with them.
Exactly, and then for the .NET folks, and you know sooner or later I am going to have to mention .NET just for legacy.
If you’re on AppHarbor, you know you’re going to be able to pull those metrics as well.
If you are using a PaaS like AppHarbor, we have a plug-in that gets the data from AppHarbor, but you can also instrument your code manually, whether using Java, Python, .NET, or whatever it is you’re using, and send the data you want to Librato.
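Manual instrumentation of the kind mentioned here might look roughly like the Python sketch below. It assumes Librato's legacy REST metrics endpoint, which accepts a JSON body of `gauges` and `counters` over HTTP basic auth; the URL and payload shape should be checked against current documentation before use. The metric name `app.logins.active` and the source `web-01` are hypothetical, and nothing is sent unless `send` is called with real credentials.

```python
import base64
import json
import time
import urllib.request

# Assumed endpoint of the legacy metrics API; verify before relying on it.
LIBRATO_URL = "https://metrics-api.librato.com/v1/metrics"


def build_payload(source, gauges=None, counters=None, measure_time=None):
    """Build a JSON-ready body from {metric_name: value} dicts."""
    ts = int(measure_time if measure_time is not None else time.time())

    def rows(metrics):
        return [
            {"name": name, "value": value, "source": source, "measure_time": ts}
            for name, value in sorted((metrics or {}).items())
        ]

    return {"gauges": rows(gauges), "counters": rows(counters)}


def send(payload, user, token, url=LIBRATO_URL):
    """POST the payload with HTTP basic auth and return the status code."""
    auth = base64.b64encode(f"{user}:{token}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {auth}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Usage: build a body for one hypothetical gauge (nothing is sent here).
payload = build_payload("web-01", gauges={"app.logins.active": 42}, measure_time=1000)
```

Keeping payload construction separate from the network call makes the instrumentation easy to test and easy to point at a different backend.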
Right, and I’m going to show you an example of that when we get down here to the end of it, because that’s really the point, is that so many times when you are managing cloud applications, those apps— the differentiator for the service that you’re providing is something special to that application itself. You’re not just building a pile of open source platform products and then putting something on top. Your business is usually providing something unique and your business logic and the other things that you’ve done in the custom code for that application are a critical part of the service that you deliver. If you’re Airbnb or any of the other companies that are a true cloud-hosted platform, a lot of the things that they’ve done, they need to be able to instrument. So having those hooks to be able to plug in to your custom applications and then provided in that dashboard is a big part of it.
Absolutely, and the other big part is that in the cloud, you might have hundreds of servers, thousands of servers, or thousands of things that are contributing to performance. And you don’t want to have CPU memory and all those basic health metrics, 50 different metrics for each one of those hundred servers, and then you have a dashboard with millions of things happening, tons of alerts hitting your mobile application all the time. You want to specify exactly what are the things that are really important for you. You might want to say, I want to look at end-user performance page load times from three different regions and then I want to look at how many users do I have logged in to my application at one given point, and I want to look at what is the throughput to my database servers, for example, and some other key elements, and then combine all those things in a mashup that really shows you what’s happening inside your application.
Groovy. So many different platforms use a pretty good amount of Ruby, and between managing gems and versions and a couple of other things, it can be a little tricky to make sure that all the elements that you need are set up correctly to serve your applications. A lot of folks are actually using Bundler for that, so being able to pull those metrics and build a dashboard out of it is kind of handy, right? This is actually—we’re looking at the Librato dashboard here. This one’s a custom dashboard for this environment. I can actually see which versions of Bundler I’m using, the version collectors. I can actually see my Ruby versions, I can actually see all the commands that are actually being executed, how many times they’re being executed. I can do things like overall performance. I can do by load breakouts. Again, something really specific like my memory utilization over my cache hit utilization. All of these are available in this dashboard.
How long did it take you to set up this dashboard?
I didn’t set this dashboard up actually, but I actually know how long this took to set up. One of the things, definitely let us know in the chat, and on the lab.solarwinds.com page, is if you want us to go in and show you how to do this. This was actually set up by the Librato team. I know they didn’t spend all that much time setting it up. So if you guys are interested in seeing how we actually set this up, let us know. We’re going to try to keep this show tight, but this is a great question. I will show you what it looks like on a dashboard that I set up. This dashboard is using the AWS monitoring for CloudWatch. So the first thing I need to do (and this is really kind of a how-to before I show it to you) is a quick setup: I created this read-only Librato monitoring account. And then I gave it read-only access permissions to all the things that I want it to see. Now, I am not going to be able to show you some of the things like DynamoDB monitoring, which is unfortunate, I had that working and it broke. But you can see here all of the different services that were available to monitor. Everything from CloudWatch, EC2, RDS. I can actually see my Simple Queue Service queues. I can look at my ElastiCache. If I’m using MapReduce, just about anything that you can monitor with CloudWatch is available here.
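A read-only monitoring account like the one described here is typically backed by an IAM policy that allows only read actions. The Python sketch below builds such a policy document and rejects anything that is not a Get, List, or Describe action. The specific action list is an assumption chosen for illustration; the permissions Librato actually requires are not stated in the episode and should be taken from its setup documentation.

```python
import json

READ_ONLY_PREFIXES = ("Get", "List", "Describe")

# Assumed example actions; the real list depends on which services you monitor.
ASSUMED_ACTIONS = [
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics",
    "ec2:DescribeInstances",
]


def read_only_policy(actions):
    """Return an IAM policy document, refusing any non-read action."""
    for action in actions:
        _service, _, verb = action.partition(":")
        if not verb.startswith(READ_ONLY_PREFIXES):
            raise ValueError(f"not a read-only action: {action}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": sorted(actions), "Resource": "*"}
        ],
    }


# Usage: produce JSON that could be pasted into the IAM console.
policy = read_only_policy(ASSUMED_ACTIONS)
policy_json = json.dumps(policy, indent=2)
```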
Anything that is a time series data, basically, you can feed into Librato. For example, there’s an open source plugin, because it’s an open platform, somebody built an instrumentation code to send data from Nest thermostats into Librato.
Yes, I saw that actually. I was playing with that and I was thinking, I need to go ahead and pull that in, and then I discovered that I firewalled off my Nest at my house so I couldn’t pull it in, but here’s what this looks like once I went to my dashboard. Again, I did this just as a test to see how long it would actually take it to set up. And I spent about five minutes setting this up. Here is my Head Geek TL2 space. Dashboards contain multiple spaces, so that you can kind of slice it and dice it the way you want. And I’ve got a lot of the visibility for–this is for the SolarWinds deep dive training platforms. So if you want to come in and take formal training on our Enterprise products, we give you a system, a lab machine, and you can come in, it’s pre-configured, it’s ready to go.
Tell me what each one of those things are doing.
Yeah, absolutely. The trick with that environment is I’ve got about 100 VMs running in a secure undisclosed location that we don’t ever change. Those are my pets, and then my cattle are a thousand to 2,000 instances that I stand up during the week and tear down. So classical monitoring for that would really be a pain. I don’t want to have to set up monitoring for each one of those machines’ variable IPs, so here they just sort of come and go and they roll in and out of these reports. For example, I can look at all of my Elastic Block Store write performance, whether it’s the number of writes, the ops for those, or the read breakout, or my write breakout, and I can look at my net traffic in and out, and in each case, I’ve got a select here of a couple of the students that we can actually see, and I can see how much of the time that they’re actually consuming in my total compute. I can go in and do things like estimated cost. Again, anything that I could normally get out of CloudWatch is just available to me as a metric here.
So you’re running your Minecraft server in the SolarWinds AWS training sandbox?
I did not say that. This is an integration test. So, let me show you how this works. I am running my Minecraft server in the cloud, but I promise I’m not actually running it inside of my AWS Sandbox Instance that SolarWinds is paying for. [Clears his throat] But here I can actually see the number of new blocks created and the world in which they’re being created, because I got kids on there and some other people and I want to make sure that it’s behaving itself. Let me show you how I set this up, and this was the part that I spent probably half the time doing. Adding the rest of these gauges is done exactly the same way that I’m about to show you here for this one, and I’m using this as an abstract version because if I actually have Chef metrics or something else in here, everyone’s going to focus on that. So I sort of picked this as a neutral Switzerland of applications that you wouldn’t expect. I spent more time building the plugin for Spigot for my Minecraft server: I spent about 10 minutes building that, which was just doing some basic polling and then sending out REST queries, and then five minutes doing what I’m about to show you here. This is actually showing the number of blocks created by world. So that is actually what we call a counter. And there are sort of two types of data: counters and gauges. A gauge is some value at a point in time, and counters are accumulating numbers over time. And it’ll sort of calculate what the base number is and you can do projections and a couple other things. That one’s based on a counter. The other one that’s kind of cool is, how many players are online? For that, I’m going to come up here and say create new chart, what do you think, stacked or line?
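The difference between the two data types can be shown with a short Python sketch. A gauge sample can be charted as is, while a counter is a running total that is usually converted to a rate of change between samples. The function below is a generic illustration of that conversion with hypothetical names, not Librato's internal calculation; it assumes samples are in time order with strictly increasing timestamps.

```python
def counter_to_rates(samples):
    """Convert cumulative counter samples into per-second rates.

    `samples` is a time-ordered list of (unix_seconds, cumulative_count).
    Returns a list of (unix_seconds, rate) for each sample after the first.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        delta = c1 - c0
        if delta < 0:
            # The counter went backwards, e.g. the server restarted.
            # Assume it restarted from zero and count only the new total.
            delta = c1
        rates.append((t1, delta / (t1 - t0)))
    return rates


# Usage: blocks created, sampled once a minute, with a restart before t=120.
blocks_created = [(0, 100), (60, 160), (120, 10)]
rates = counter_to_rates(blocks_created)
```

A players-online value, by contrast, is a gauge: each sample already means something on its own, so it is averaged or charted directly.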
Okay, stacked chart. And then here’s my metrics. Now, we talked about, before, all of the things that we were monitoring before, remember when in the…
All the stuff you’re sending from CloudWatch is here.
All the things for CloudWatch right here. I’ve got AWS, billing, estimated charges. In all of these cases, if you have something that you want to take a look at, you can actually click on it and it’ll give you the metrics for it, and then you select what you want to add. I actually don’t want to put that on there, I want…
And you can also do composite metrics as well.
You can do composite metrics as well, exactly. I’m going to come back over here and I’m going to just search for mc, oh, look at that, Minecraft players online. That’s cool, so I’m going to click on that box. I’m going to say add metrics to chart. I’m going to change the chart type to, oh, stacked, it’s already checked, I’m good. I can change it right there, and I say save. Now I can see this, that’s my total number of players. If I want to do a breakout on that, I’m just going to go down here, edit chart, and then down here, I can change my display. The first thing would be how I want to do it. I’m going to change it to breakout. And I’ll just leave it as average, because that is a polling interval, so I really want the average inside that polling interval. And then that’s going to give me my stacked chart. And again, I get my metrics and it’s going to give me the world name, which is actually being sent as a part of the update. There’s basically a source, and it’s doing the rollup of sub-elements inside of one chart automatically for me. I didn’t have to go send four different polls. When I sent it, I just gave it the world and then that one metric name, which is mc.playeronline, which I created, and then it’s automatically pulling that together based on the time stamps. But what gets really cool is if you want to add something else. For example, what if instead of metrics, I want to add events? In the case of deployments of new operating systems or database maintenance or something else, where you’re really expecting the charts to bounce around and your performance to bounce around, correlating those events to the metrics that you’re looking at in your cloud environment can be really tricky. In this case, I’m using the same API and I created two new events, which are Minecraft PlayerDeath and Spawn. I’m going to add those two to the chart and say save.
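[Editor's note] The breakout with average summarization described above can be pictured with a small sketch. This is not how Librato implements it; it is a plain Python illustration, assuming samples arrive as (timestamp, source, value) tuples and a hypothetical 60-second polling interval. The world names and numbers are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def breakout_average(samples, interval=60):
    """Average one metric per source within fixed time intervals.

    samples: iterable of (timestamp_seconds, source, value) tuples.
    Returns a dict keyed by (interval_start, source).
    """
    buckets = defaultdict(list)
    for timestamp, source, value in samples:
        # Align each sample to the start of its polling interval.
        interval_start = timestamp - (timestamp % interval)
        buckets[(interval_start, source)].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Illustrative samples for a single metric, broken out by world (the source).
samples = [
    (0, "world", 2),
    (30, "world", 4),
    (10, "world_nether", 1),
    (60, "world", 3),
]
result = breakout_average(samples)
for (interval_start, source), average in result.items():
    print(interval_start, source, average)
```

Each source becomes its own series in the stacked chart, and all of them share one metric name, which is why a single submission per world is enough.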
So you can see exactly when those events happen.
I can see when those events happen. Now, in addition to being able to see how many players are online, each one of these is telling me, here, bioDEgradeR, he spawned. Here’s one where [laughing] GnuGotMe died. I can actually look at those events here, and normally, you expect to see server events, or maybe application events…
You can check for when some code is checked in or is being deployed, and you can see it here. And if you don’t have an event metric defined, you can also do annotations on the charts in real time, and then share the charts with other people on your team so they know why performance is changing at a given point in time.
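[Editor's note] Correlating events with a metric amounts to asking what the metric looked like when each event happened. The sketch below is a generic Python illustration of that idea, not the product's charting logic. The player names come from the demo; the timestamps and player counts are invented for the example.

```python
from bisect import bisect_right


def value_at(samples, timestamp):
    """Return the latest sample value at or before timestamp, or None.

    samples must be a list of (timestamp, value) tuples sorted by timestamp.
    """
    times = [t for t, _ in samples]
    index = bisect_right(times, timestamp)
    return samples[index - 1][1] if index else None


# Illustrative "players online" gauge samples: (seconds, value).
players_online = [(0, 2), (60, 3), (120, 2)]

# Illustrative events: (seconds, event type, player).
events = [
    (45, "spawn", "bioDEgradeR"),
    (130, "death", "GnuGotMe"),
]

# Attach the metric value that was current when each event occurred.
correlated = [
    (timestamp, kind, player, value_at(players_online, timestamp))
    for timestamp, kind, player in events
]
for row in correlated:
    print(row)
```

Overlaying deployment or maintenance events on a chart applies the same lookup visually: the event marks the time, and the surrounding samples show the effect.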
That’s right, and here’s how that was done in about 15 minutes. The Librato API is amazingly well documented and it’s super-duper easy to use. You basically go pull your API access key. You can give it one of three permissions, either full permission, update permission, or read permission, so that you can integrate it. And the other nice thing is you can pull these things out as plug-ins and route the gauges in just about any application you want. You can even publish whole dashboards publicly, if you want to. But in this case, I’m using the metrics API. As mentioned before, gauges and counters are the two types. And then I just went down here to my POST action. It described what I needed to do. And it even gives me examples for a couple of different platforms of how to do that, and then it just gives me my JSON format and the elements that I need, and it even gives me examples of what that looks like when it gets sent. And there are a couple of different ways of using sources or arrays, or a couple of different ways of rolling that up.
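[Editor's note] A submission like the one described might look roughly like the sketch below. It is a minimal illustration, not the plugin from the demo. The endpoint URL, the JSON field names (gauges, counters, name, value, source), and the basic-auth scheme are assumptions recalled from the legacy source-based metrics API and should be checked against the current documentation. The counter name mc.blocks.created and the helper names are hypothetical; only mc.playeronline comes from the demo. The request is built but not sent.

```python
import base64
import json
import urllib.request

# Assumed endpoint; verify against the API documentation before use.
METRICS_URL = "https://metrics-api.librato.com/v1/metrics"


def build_payload(players_by_world, blocks_by_world):
    """Build one JSON body holding a gauge and a counter entry per world."""
    return {
        "gauges": [
            {"name": "mc.playeronline", "value": count, "source": world}
            for world, count in sorted(players_by_world.items())
        ],
        "counters": [
            # Hypothetical counter name for cumulative blocks created.
            {"name": "mc.blocks.created", "value": total, "source": world}
            for world, total in sorted(blocks_by_world.items())
        ],
    }


def build_request(user, token, payload):
    """Prepare, but do not send, an authenticated POST request."""
    credentials = base64.b64encode(f"{user}:{token}".encode()).decode()
    return urllib.request.Request(
        METRICS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


payload = build_payload({"world": 3, "world_nether": 1}, {"world": 1500})
request = build_request("user@example.com", "example-token", payload)
print(json.dumps(payload, indent=2))
# To actually send it: urllib.request.urlopen(request)
```

Sending the world as the source is what lets one metric name cover every world, so no separate metric per world is needed.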
Thanks for the tour back there. I never knew where we kept all those hamsters. [Laughing] Thanks for showing us DPA inside the Amazon Marketplace.
And remember, if you are running RDS, you will want your monitoring to be running right beside it. Gerardo, thanks for coming by today.
Thanks for having me on the show. I’m really excited about where we’re headed with the cloud. Be sure to stay tuned for more.
You bet. It’s great to have you visit. And Patrick, I think that now I accept that your head is really in the cloud and not just in Minecraft. [Laughing] I mean, seriously, your student training program is so heavily automated and you couldn’t operate it cost effectively if it wasn’t.
No, it would be impossible. I run it on a budget. And that’s really the thing: if DevOps and automation is your passion, then you really might have a great career in ensuring the reliable delivery of services from AWS or Azure or other platforms.
Absolutely. Hey, you should remind them to sign up.
Oooh, definitely. Today’s topic came from a specific viewer request, so we’re really interested to know what you thought about it. It’s a little bit different for us and your feedback’s really important. Do you want to see more about cloud and open source? Do you want to see a deep dive, maybe, on RDS setup or AWS monitoring? Make sure you swing by our home page, which is lab.solarwinds.com. You can sign up for reminders for upcoming episodes, chat live with us, and let us know what you want to see on upcoming shows. I think we’re about done. Are we ready to terminate this instance?
Only if it’s automated.
Touché, SQLrockstar, touché. All right, I’m Patrick Hubbard.
I’m Kong Yang.
I’m Thomas LaRock.
And I’m Gerardo Dada. Thanks for watching SolarWinds Lab.