Home > SolarWinds Lab Episode 35: MySQL Database Monitoring With DPA

SolarWinds Lab Episode 35: MySQL Database Monitoring With DPA

The new version of SolarWinds® Database Performance Analyzer is hitting the streets, and there's a new member of the family - MySQL! Join Head Geeks Leon Adato and Thomas LaRock, along with guest Senior Technical Product Marketing Manager Janis Griffin as they show off all the shiny new features in this release, including a deep dive into the ways DPA can make you look like a MySQL expert, even if you aren't.

Back to Video Archive

Episode Transcript

Rough crowd out there, Janis? You have no idea. I've seen caffeinated toddlers with fewer questions. What's up? And where are we keeping the caffeinated toddler? We're hosting a group of SQL experts and giving them a sneak peek at the new version of DPA. Don't get me wrong now, it's going well, everyone is excited about what they're seeing. Well as long as you're here now maybe you could give us a walk through of some of the cooler features. You know I'm a huge fan of everything Linux. In fact, everyone out there is probably interested in seeing it too. Which reminds me, you can ask us questions and we'll try to answer them during the live chat. And if you don't see a live chat window over here that's because you're not watching us live. And you can do that by going to lab.solarwinds.com, registering so that you can get notifications on new or upcoming episodes, leaving us questions or even feedback. Oh, no rest for the wicked I suppose. Dude, you have been hanging out with Patrick way too much. Tell me about it. I tell you what Janis, you catch your breath and I'll get things started here with a look at what's new in DPA. Oh Tom, I take back everything I ever said about you. [LAUGH] No you don't. [SOUND] So we have the new version of DPA right in front of us. I'm just looking at a SQL server instance. Kind of looks all the time. But there's two things on this page that I'm going to point out to you. These are features that customers have been requesting for years. And I know this because as a former customer, there are things I've wanted in the product for a while. The first is the annotation. So the idea that you can simply make a note at a point in time to say something like, dropped index or maintenance window, right. So I can come in here and I can just sort of say, hey, look, let's see, for July 28th. I'll just leave it at midnight and I'll just say I did this, my manager told me to. It was okay. Perfect, right? Created by me, and there you go. And what happens now is on this screen very quickly you're going to see the little annotation. So what's brilliant about this is now I can share the information across teams. So if I'm responsible for having done something or I know somebody else has done something if there is always been the spike and somebody say oh geez what happen there, I can send the note to future me to remind me or anybody else what has really happened there. So, honestly much requested feature over time it, we finally got into this version. Now let's say that of course we had a big spike or we just happen to know that there's some particular query of ours that we'd like excluded from this chart, wouldn't that be nice. It's one of those things that has come up from time to time. Hey, I had this really huge spike and it's throwing off because there'd be this one bar that would be bigger than any of the others. Can we just not display that one? Until this was released, the answer was no you're stuck with it. Now, we have exclusions. So, I can come in here and I can give it a name. All the regular stuff. Now we have this little check box up there. I can exclude this statement from these trend charts. I click okay. The screen refreshes. That is now gone. Now, the natural problem with this is how do you know what you're looking at? Like there's something missing here. Where did that information go? But it's just down here. It just says this little thing comes up that says, oh, look, by the way, you're looking at the chart that has one particular statement that's been excluded. So, now you know, so you can toggle it back and forth. Now, my bars came down a little bit because I got rid of that, and I can go through and I could exclude more and more I want. I can make more annotations again. It's a knowledge and sharing and transfer between a lot of different teams. But it also lets you focus in on what's important at a particular, in a situation. It's not so much what I affectionately call painting the roses green where they're like oh, that's really ugly and it makes us look bad so let's exclude that [LAUGH]. I'm sure that some people will try to use it and it's nice to know that you can see what's being excluded, but also when you're trying to troubleshoot a particular problem and you want to get all the other stuff out of the way you can just focus in on those elements so I like that a lot. It's very cool. Another feature that we've had for a while, I'm the math geek. Indeed you are. I appreciate the idea of having some sort of baselines that are tied to, say, a percentile. And what you can see here is I can look at the block sessions and I can see where am I, what's my average, am I between the 10th and 90th percentile? So we have the concept of baselines into our resource metrics now so maybe this might be a better one to show right quick. So you can see here, the average is the baseline from 5 to 6 AM, for just that 1 hour, the average is 8. The average between 4 to 5 AM is closer to 9. This gives me an understanding so if something happens; I can get a better idea of is this normal? Is this a problem? What actions should I take to possibly solve that? So the idea of a baseline is much better than just having the old graph, like if I hide the baseline and you just see the raw data itself, is that normal or not? I have no idea, what is normal, what's the range of normal for this particular segment I'm looking at, which is a one-hour graph. And that helps thematically for us to fit into some of the other tools like software application monitor for SAM, which also has the baselining now built in where you can see. And also NPM, where you can see where you can use the collected data to build a trend. And like you said, now you know are we out of normal or is this like normal? Exactly, exactly. Okay, this is very cool stuff. But I think we all know that the main event is MySQL. Now one question I had, and I actually think that a lot of people out there have it to is, why did we wait until now? Why didn't we include MySQL support before now? What changed? My SQL is open source, and it's free. And not only that, DPA depends on waits states, and most of the databases we monitor actually have been instrumented and show what resources are being used or being weighted on. And we grab that information, and actually give great clues on how to tune, you know, and actually increase a performance of databases. And it wasn't until just recently that the later releases of my sequel came out that basically--they've instrumented themselves as well. And so now we get that good information from MySQL, and we could report that back to quickly find things in the database to fix. And as Janis just mentioned about MySQL being open-source. So for many years, customers would come to us and say, “Hey, can DPA monitor for MySQL?” And the question I would ask is, “Well, how much did that MySQL instance cost you?” Cost them nothing. So how much did you spend for a monitoring tool, it was roughly the same budget. So it took a while, but when those customers started getting into enterprise level support through Oracle or some contracts, that's when it became a little more viable for us to put some money and invest into MySQL monitoring as well. In fact, I have a son-in-law that works for Amazon. And he told me the other day that he's just so amazed at how much money he sees people throwing at MySQL databases today, just to keep them up and running and performing like they're supposed to. Got it. Okay, well that clears it up. So now I'm really excited to see this. So can I get the nickel tour? Oh sure thing. [SOUND] Excellent. Okay so you know those are great features in DPA but the best thing in this release is MySQL support. And so, what we're looking at here is a MySQL instance and we're actually seeing like 30 days of detail. Now, what's really great about MySQL is it's very well instrumented now with lots of thread states and weight instruments and actually operations that you can actually quickly see. So, I'm looking at 30 days of detail here. And you can see on this one day I had like 22 days of wait. That's every active session, every second working in this database. You know, 524 hours and I'm only looking at the top 15 SQL statements. That's a long day. Isn't that? [LAUGH] Yeah. Each color here is a unique SQL statement. So I can highlight over each one of these. But what's great about this is right form the very get-go I can see, well, what are my threads doing? What are these weight states all about? And so, I can quickly look here across the whole month here and see that my top wait state is sending data. So a lot of sitting. But that means network, right? Blame the network. Well it's not- It's never the network. [LAUGH] Well we can get a clue just by clicking on sending data. And this is going to tell me what the sending data means. And if I've got a lot of time spent on it I can actually walk through these steps and actually get some good information on how to reduce set time spent. So, it's really a nice help here if you're not familiar with MySQL but it's a good reminder if you're really savvy about MySQL as well. Not only that, I can say well, I want to choose a day, and let's just come down here, I want to click on this day, when I do that I actually get 24-hour period of what that day feels like and you can see this annotation here. This is what Tom has shown in the last one. Where actually I had to take it down because I actually upgraded the release so you can actually make those neat notes and all that. But I was looking at my wait states, and I can say I'm sending data, what about the instruments? What's been instrumented? And I could see well, makes sense. MySQL handler, which means that it's actually going to INNODB, and sending data to and from. You can kind of see that, that's my top wait, then I can go into operations. What are my operations and what's my most one? Well of course, it's fetching. It's sending data fetching, that type of thing. You just get all this good useful information. Now I can click in to any one of these and that will take me down into that hour, so that's this dark bar here, as you can see. And this is just like all the other databases that we've done, the navigation is the same with all the other database types. But I can say, what is doing all that fetching and so now I'm on this one fetch statement. I've got all MySQLs here that are actually primarily doing fetching so I can click in there and get further information on that. So just really granular information. Now that I'm down here I see this one is actually doing a lot of fetching. I can actually say, well, okay, what program called this? What database does it come from? It's a cube that I can turn it many different ways to get an idea of where is this coming from and exactly pinpoint what the problem is. So, now I'm looking at programs and so you know I can actually look at the many different ways and slice and dice this data in any way I want to. One thing that's really kind of neat if I come back up here to the top, real quickly with just one key click I can actually deep dive into any SQL statement I want to. And so, what's fun about that is I can get some information to figure out if this SQL is behaving normal or not. Here I can see, well it hasn't been running in the last 30 days. It actually started on the 27th here. And it’s actually been creeping up in fact you can see the executions along with the total time spent in the database. This day is not normal of 69 hours in the database. What's my through put? 44 executions? Maybe I should work on that a little bit. This is one that I have to tune and actually see why it's actually spending that much time on 44 executions. Scroll down. Because you get the average? This is my favorite screen in all DPAs, the historical-- And it's just great, because you can get the rows affected, the rows or sent. You can actually see how many rows were examined. Look at that. Almost a billion rows, is that true? With the b. Okay, oh my gosh. Let's see if I can highlight over that and see if that works. I hope this database is worth it. My gosh! [LAUGH] Look at the temp tables created in this one query. This is very inefficient. Maybe we should look at rewriting. I mean temp tables? Come on. 160? During that. How many executions? Do we see how many executions? Well that day was busy. 80. 160 temp tables for 80 executions. And then you get the row sorted, so you get all this good information. And the average wait time-- Yep. And here it is, this great baseline here. I love this baseline, you know this one is just been running only during this time, so we're going to looking but it's actually going over and giving you the average and so you can actually see that this day was not normal. So this gives you a clue that yea I need to work on this. And not know by how much also. If I was a little bit out of whack or a little. The other thing is that just having poked out DPA for a little bit is that this is bringing the MySQL monitoring in line with what we were already doing with Microsoft SQL and Oracle and the rest of the stable. So, it's not necessarily that this is world altering for MySQL particularly it's just that now is in line with everything else we've been able to tell for so long. Correct, correct. I mean if you have other types of database, say Oracle, SQL Server. And as a busy DBA, a lot of times you don't have all that all in your brain. It's nice to have it here so that you can just quickly move back and forth, and see the performance, and get good ideas on how to fix it. Yeah, the consistency of you. I'm not a MySQL DBA. But DPA let's pretend to be one. [LAUGH] There you go. It really does. [SOUND] That's great, now we've worked really hard to make sure that real DBA's appreciate all the functionality that's inside of DPA but, do you know who I also think is also going to appreciate this? Oh, your kid's orthodontist. Your rainy day bacon fund? So, you're both right but, that's not important right now. What I really mean is the regular admin so the server admins were subtle with MySQL boxes that they never asked for. You know what? I can see that, in fact, that's a lot of what we talked about last winter at THWACKcamp with the accidental administrator session. That's right. So what do you think? Can we show an example? A situation that would be a challenge for a non-DBA to analyze but for which DPA makes stupid simple. You know what? I have just the example. Okay, so before we dive into this. You'd mentioned locking and blocking. And I want to just say, for those of us that are accidental or non-DBAs, what is locking and blocking, and why would I care? Locking and blocking, you have to keep the data consistent. So you have to actually grab a lock, and for consistent reading for everyone using that same data. You can't be changing it underneath other people, and be suddenly surprised by, for sure. So, you have to grab a lock, update the data and then you give it back and they see a consistent read. The problem becomes when there's synchronization issues or maybe slowdowns, maybe bad performance where data grows at such a pace that all of a sudden locking takes longer than it's should. And so, then people wait and so you grab a lock, you're actually blocking the people trying to read it. Ah! Okay. In MySQL depending on the engine and what you're doing a lot of times in a database system it might be row level locking but in MySQL what you find by default sometimes it's table locking. Oh, that's not good. So, it's many rows being locked at once. Yeah, okay so I can see why that would be a problem. So how does DPA make it easier for me to see that and know that there's an issue. Yeah what's really great about is that you can come in here you don't really need to know how lock and blocking work you can come here and see, as we were looking earlier this one day was 524 hours, but what is that? And we can click into that day and what we can see is well, here around seven o'clock things started really happening and we got a lot of wait all day long, we can click into that day, or that hour and so now I'm in the seven to eight hour and notice my top one. It's on ‘On’ a system lock. It has a system lock. So, it's grabbed a lock and is holding onto it. It's actually held onto it for quite some time. You've got 6 hours out of the 8 hours and 55 minutes on that system lock, and then it starts updating. Now we can actually come up here and say, well, okay, if that's blocking, who are waiting? What was it running while it was blocking? So you can come into the blockers tab. Okay so we go into this blockers tab and what you can see here gives you a great a view of okay here's a blocker. He almost caused about half an hour of wait. And we can expand him. And this is a process ID. But, we can expand it and see he actually blocks these guys and they and they in turn block more. So, you can see how long they were blocking others, and how long they waited. See the whole chain. Yeah. And this is the actual query that it was doing that was creating that locked. Yeah, now we can look at the details of the blocker. The blocker doesn't know he's actually blocking, he's just merely running along trying to do his processing. But we can see that he was actually trying to do this update, we highlight over that. And then, basically, he's doing a select for update. So, that's the reason why he had the lock and what was this one? And this was just a select that he would probably done between the time while he was holding those locks. And so, we see what he is doing and then- That's a long list. I know, that's a lot of blocking locking. But you can see here what these guys were actually trying to run while they were waiting. And they were all trying to do this select for update. So, for an accidental DBA for one thing you could maybe even go out and kill that one process. Process not person. Yeah, yeah. No. [LAUGH] Then we have another person but-- What else this tells you is that the code, the way they're trying to use the engine inside MySQL may not be appropriate for the workload that they're doing. So, this is one of those cases where you look at it and say okay I see what you're trying to do and the resources that you're trying to do that with clearly don't work. So, we maybe think about a different architecture, or maybe think about you guys not running all these statements at the same time. Right. And I think there's a mindset difference between the accidental DBA and the real DBA. Accidental DBAs tend to think in more of systems. If you're a network guy or a server guy who becomes, accidentally takes ownership of a database, you're thinking, how do I find out the problem and fix the problem now? And that's not really the mindset. I think the mindset is that, ah, I see what happened yesterday and all the impact it had and how do I architect or design this to not happen again in the future? But it's not about, oh my gosh, we've used up all the bandwidth. What can I take away now to relieve that pressure immediately, or disk space, or what have you? It's not the immediacy as much as the data analytics, to be able to go back and just keep things from happening in the future. Right. [SOUND] And so Leon, was that enough to satisfy your curiosity? Oh definitely. And hopefully, everyone who's watching got their questions answered as well. Great. So Janis, before you head back to the MySQL mosh pit over there, do you want to do the honors and close us out? Sure, I'd love to. For SolarWinds Lab, I'm Janis Griffin. I'm Thomas LaRock. And I'm Leon Adato, thanks for watching.