Presenter: This episode of TechPod is brought to you by SolarWinds, database performance management solutions, designed to help manage database platforms running on-premises, in the cloud, or in hybrid environments. Visit SolarWinds.com/solutions/database-solutions to learn more.
Thomas: The DBA role is complex. Pitfalls exist everywhere you turn, and it can be difficult to survive and thrive. On this episode of TechPod, we are going to help those accidental DBAs listening to become a DBA survivor. I’m your host, Thomas LaRock, SolarWinds Head Geek. Joining me today on this episode of TechPod is my fellow Head Geek, Kevin Kline and Denny Cherry. Say hello gentlemen.
Denny: Hey everyone, nice to see you.
Kevin: Hey Tom.
Thomas: So Denny, tell us a little bit about yourself.
Kevin: Hey, thanks Tom. So I’m Denny Cherry. I’m the president and CEO and principal consultant for Denny Cherry and Associates Consultant, based out of Oceanside, California. I’ve been working with databases for 25-ish years at this point. But we’re coming mixed up Azure for about six months less than it’s been around. And been in IT kind of forever, through all that. I’ve been consulting for 10, 11 years now. And still enjoy working on all of these various problems that people bring up and throw at me and couldn’t get them resolved and get them moving on and get performance for those systems working ugly.
Thomas: Yeah. Denny, you have what my friend Rob and I talk about, it’s that data gene. You share that, we all share that together, and you certainly have it in 25 years. I won’t call you old. We’ll just say experienced. Yes. Very experienced.
Kevin: Thank you. I appreciate that.
Thomas: So we’re going to talk about a handful of things today. How to help that accidental DBA becoming a DBA survivor. We’re going to talk a little bit about the fundamentals, help somebody understand how database engines work in general. We’ll talk about how to best configure monitoring alerting. And then I think to probably the most important piece towards the end, talking about what happens as your skills grow. Right? And as you develop as a DBA, that transformation that kind of takes place. And we’ll finish out with talking a little bit about if you should be a generalist versus a specialist, I think each of us has our own viewpoint on that. So let’s dive in and talk about the fundamentals. Where I usually start is I like to tell people, if you can’t recover, you can’t keep your job. And I think we’re going to throw it to you first, Kevin. Let’s talk about the recovery plan, things like RTO and RPO, to me these are the most important things any accidental DBA really needs to know first.
Kevin: Well, in my early career, when I first switched from being a database developer to being a database administrator, one of the really surprising things for me, it was an epiphany of sorts. I had thought that the big, expensive multi-hundred thousand hardware system that we had bought was the asset. Right? That’s the thing I needed to guard and protect as an admin. And we had built… Again, liked Denny, this is way in the early days of even using Windows systems for big database applications. And we knew it was going to crash and we didn’t really even have any of the technologies for high availability. We used an old technique called DNS switching. Right? Where you’d have another server ready and you just change the DNS pointer as if it ever went down. But we knew we were going to have a crash at some point, and it’s just the nature of high-end systems.
And eventually it did happen early in that job as a DBA. And I was certain that the CIO was going to call and give me a good chewing out for letting that happen. And that did kind of happen. But what happened was a real shock for me, because the CIO did not say, “Hey, what are you going to do about this very expensive server that crashed? How are you going to fix that?” What the CIO said is, “What are you going to do about our data? Where’s our data? How long does it going to take to get recovered?” And that’s when the light went on. Right? I realized at that point, it’s not about the hardware, it’s not about taking good backups, because there were other lessons learned along the same time about, you can take a lot of backups, but they can all be corrupted and unusable.
So, very soon after those incidents happen, we started a program, not only of doing our backups, but of also testing the recovery of those backups. And it was a really empowering sort of experience as well to say, okay, we’ve had a crash. Now the CIO is on the phone saying, “How long till you can get this back up and running?” And I was able to say, “Well, it’s going to be about 45 minutes. The database has grown a little bit since our last test and it took 40 minutes at that time, when we recovered it for the last time. So this time, I’m going to give it a little tiny bit of pad.” And sure enough, it was back up within that amount of time.
And I realized, okay, backups are important, but really we only do backups because what we really need is the ability to recover when there’s downtime. And those were kind of hard-won lessons. You go through some pain when you do that. But as you said Tom, this is job one. If you can’t do that, if you can’t back up your data and then also recover it, then don’t you dare call yourself, a database administrator.
Thomas: How they say, backups are worthless, restores are priceless. That’s the thing to understand. So, now when we talk RTO and RPO, Kevin you mentioned a DBA, but there’s also something I want people to understand if you’re that accidental DBA out there. And that is you have to really understand the role you do have. So, DBA means a lot of different things to a lot of different people. I think we touched upon that in our previous episode. And I’m going to ask Denny for some feedback on this one. How important is it to really know your role, like an administrator versus the database developer or a data engineer, or maybe you’re a data scientist and you just play with databases. I think it’s important to understand your role with regards to these fundamentals.
Denny: Yeah. It definitely is. You definitely have to know where you fit in the ecosystem and what you’re expected to be able to do. If your role is a database developer, then you hopefully will not be the person responsible for the backups and restores. You probably won’t be responsible for the production performance tuning of box. You might be, but hopefully you won’t be. If you’re a data scientist, your job isn’t to build the application. Your job is to analyze the data after the fact. So, it’s very different roles that you have to be able to fall into, and you have to definitely understand what those are and how those are going to play into the broader organization.
I want to touch back on RTO and RPO, and any of those roles that you just mentioned, a database admin, database developer, data scientist, these are all basically IT functions. These are not the people that should be setting the RPO and RTO. RPO and RTO should be set by the business because they’re the ones ultimately responsible for the application and for the business and for selling the widgets and servicing the customers. And they’re the ones that are going to be on the hook when things go wrong.
Thomas: Denny that reminds me too, of a related lesson that I had back in those early days, which was, know who you need to satisfy. Right? So, I always saw before I went into the role of DBA, that the person who is most important in my world is my boss. Right? As you’re saying with RPO and RTO. It didn’t take long for me to find out that the real person I have to impress as the internal customer that we were servicing. So there’d be different teams from say the audit division, different teams from the tax division or things like that, and those are the people that I have to keep happy. And they in turn will tell my boss whether I’m doing a decent job of that or not. And that was kind of a important lesson learned as well.
Denny: They’ll definitely tell your boss whether or not you’re keeping them happy or not. And more importantly, they’ll tell your boss that you’re not keeping them happy. They won’t usually tell them that you are keeping them happy, because they only tend to only tell bad information. But yeah. It’s very important to identify who the customers are of your system. And as somebody working in IT, your customer is not the company’s customer. Your customer is the business because that’s what drives the company.
Thomas: So we’ve covered some of the fundamentals. We talked recovery, understanding what your role really is and knowing where you fit or who you serve, who your customer is maybe at the moment or in general, but there’s more to these fundamentals and we could spend days on this topic, but I’m going to ask you, Denny, can you give an idea for the accidental DBA out there right now? They’re looking at this database server, whatever they’re working with and it’s a black box. Query goes in, data comes out. Can you help them understand kind of how in general a database engine works?
Denny: Yeah. I mean, the fundamentals of it is, it’s kind of like Excel with how you visualize it and how you think about it. So data goes into rows. Those are stored in tables. And the way you can most easily think about a table is it looks kind of like a tab in an Excel sheet, or an Excel workbook rather. And it’s just the find with X number of columns, as opposed to 50,000 columns or whatever Excel has these days. And you can put an infinite number of rows in it as opposed to the million and a quarter or so that you can put an Excel. But visually that’s kind of what it’s going to look like, as in Excel sheet. And that’s how data’s going to go in and get stored as it’s going to go in as the next row in the Excel sheet, or it’s going to go somewhere in the middle, we’re going to move down a bunch of rows and we’re going to slap that row in the middle of the Excel sheet.
And that’s basically how the database is going to work at a real basic level that should be easy for everybody that works with computers to kind of grasp. And then on top of that, we can do index tuning, performance tuning to make the code, bring back to the data faster, and we can change the way the database stores the information, and stores the data. We can choose the order that they stores the data in. And so that’s how we get the data out faster because we’re putting it, even those that stored necessarily in alphabet order, we can create another index on top of it that going to be stored in a different order. So it’s going to reorganize the data and resort the data, so we can create that data faster.
Thomas: So Kevin, same question to you. Anything you want to add or subtract from Denny’s explanation of how databases work? How do you explain that to an accidental DBA?
Kevin: Well, I would just add on a bit more to that. For example, you’ve got a spreadsheet that’s very useful. That’s a good way to store data. Now what you have to take into account is the fact that that spreadsheet is going to be used simultaneously by five, 10, 20 people, 200 people, 2000 people. And so, if you start to type on one of the rows in your spreadsheet, the database will engage all of these internal behaviors and architectural components to make sure that Tom doesn’t jump right into my record in the spreadsheet and start writing his own stuff on top of that. And at the same time, if power goes out on the spreadsheet and I’m only halfway done with the row that I’m typing in, then the database will make sure that it’s doesn’t count at all, because there’s not enough information for that row to be valuable at this point.
So, it has all these mechanisms to make sure that the transactions, the records that we write into those Excel spreadsheets so to speak, play well with others. And then those records, if they’re incomplete, if they’re broken, the system makes sure that those don’t go into the production environment. Conversely if someone starts to do something and can’t finish it, it’ll roll it back. So it’s all about multi-user concurrency and scalability as well as the other points that Denny made about where it’s located, how it performs and those kinds of things.
Thomas: So, you both hit upon a lot of the high-level functionality of a database in general. But the one thing I would tell you to add is, or I want to add, is that how this ties back to recovery, because Denny highlighted, hey look, you should think of it as it gets stored as a concept of a table, which is a logical concept because it’s really just bits on disc anyway. But data gets into a database, and Kevin, you talked a little bit about how the data has to come out for a multi-concurrency and how that kind of works. And that’s kind of reinforcing the point of why RTO and RPO matters so much. Because imagine if I got in the middle of your transaction and the power goes out, what do you do now? Where can you put me back? Where’s the last known good database backup, and how far back is that? And if your answer is, I have a backup from three days ago and your accounting team has to input transactions for three days’ worth before they can continue, these are bad things. So, you have to get that the idea down of how this all kind of relates to each other. On top of that, it’s easy to feel alone because quite often, at least in my experience, as a SQL server DBA, you kind of operate alone. You’re asked to do a lot, and you’re kind of working as your own little team. Maybe you have one other person, but that’s really not enough. So, it’s important to that there’s help available, usually through professional networks. And Kevin, I’m just going to have you talk a little bit, how to help that accidental DBA find out where they can find help.
Kevin: Well, I would start with mentioning there’s a great hashtag in Twitter. So, if you’re a Twitter user, just append to your question, #SQLhealth. And there’s a very large active community there that will answer brief questions, the file or the fixed size of a tweet, but they’ll answer those questions very quickly. So, if you have longer form questions you’d like to ask, you could go to locations like dba.stackexchange.com, a fantastic website. And then there’s also, if you like Slack, there’s also the sqlcommunity.slack.com, where you can have long and detailed conversations with people who are again, skilled and experienced DBAs, as well as developers and things like that, if you need help honing a SQL query. And then we do have things like our professional associations and local user groups, which you can find out about online. So, there’s quite a lot of resources available for us as DBAs.
Thomas: Denny, anything you want to add?
Denny: Yes. I’d say on top of that, there’s a bunch of Facebook groups that are dedicated to the SQL server, Azure professional development. They even joined just kind of search on Facebook for SQL, and you’ll find a ton of groups available that are open. There’s also some stuff on LinkedIn and LinkedIn Learning that are built around classes, or virtual classes that you can take and they have communities around them, plus the people that have taken them and the instructors that are teaching them that you can join, you can ask questions and learn more about SQL server.
Thomas: So, I want to come back to this understanding how the database engine works, because I didn’t hear a part of an answer that I was rather expecting from both of you, experienced, I’d notice I didn’t call you old, but I said experienced DBAs that you are. There’s this concept of, I’ll just call it a weight event. It’s more about understanding the idea of what happens when the connection is made. And if you’re writing a query, to you just enter a query, you hit return and back comes data. But technically and talking specifically SQL server, there are different terms use. For example, I think the exact specific one, it’s called a request. Right? So we have a way to see the requests that are being made inside of a server. And those requests, when they make a request, there’s actually one of three states that that request can be in. And this request can be its commonly, we’ll call it a batch but you can send what is a store procedure, which is a collection of batch statements. Right?
But it all comes down to being a specific request that gets broken down into this one little piece of work site. Anyway, you get three states, running, runnable, suspended. And this is how I like to explain it to an accidental DBA. So, I want them to understand that all that activity, that multi-concurrency that you talked about Kevin, all these users trying to hit this database at once, that tiny little engine has to figure out how to handle each of those requests as they’re coming in. So, I’d like to have the accidental DBA understand what’s happening with all these requests, these multi-concurrency that Kevin, you talked about. That tiny little database engine has to figure out and sort through and say, all right, you’re next. Now you. Now, I need something else before I can do that again. And it does this like say, we’ll call it context switching. It really does this. It’s almost the most, I want to say extreme form of ADD you could imagine. Right? It just moves back and forth.
So, the states are running, runnable, suspended. Running is what you think it is. It’s actively returning data for you. Runnable means it’s ready to go. It’s simply waiting for an available CPU, processor. Scheduler is the word that we really use. Right? It’s just simply waiting for its time. The other state is suspended, which means it needs an extra resource. I might need to do another read from disk to get the data I need. I have what I need or I’m waiting on something else, a network wait, or I’m waiting on a memory grant. Things of that nature, these are all what we call wait events. And it’s really when that request is in this suspended state. So, there’s running, runnable, suspended, and there’s a cycle that goes over and over again.
And that’s how I usually try to get an accidental DBA to just understand the workload inside of there. So Denny, talk a little bit more about the resource constraints. So ones that maybe you see the most of, less of, you have a lot of experience in Azure SQL database as well. Tell us the things that you’re seeing when this engine is doing its business.
Denny: Yeah. So, the biggest constraint that I see when people write queries is, besides horrible queries, is disc. Disc is the slowest component in the IT world. Always has been, pretty much always will be. Just because discs are measured in milliseconds and things like CPU’s are measured in nanoseconds or faster. So, it’s because that disc is typically and always going to always be slow, that and until we have the world of SSDs and NVMe drives, disc meant something was physically moving. So, that will always be slower than electrons. No doubt about it. But these days we have extremely fast hard drives. We have NVMe. We have SSD. Even in the cloud, we’ve got things in Azure. We’ve got ultra disk, and we’ve got local SSD that are extremely fast that can really crank the throughput through them.
So, we will see a lot of other things become our top bottleneck. And we’re always going to have a top bottleneck. It doesn’t matter what the system is doing or how powerful the system is. Something will always be the slowest component and that component is going to be what we’re seeing as showing up as a weight in SQL server. So, we’re going to see weight through the memory, CPU disk, network, or locking, or some sort of locking is going on internally whereas what we mean by locking is that, if Kevin has a row, and you talked about this earlier. If Kevin’s got a row and he’s making changes to it, I can’t make changes to that row until Kevin’s done, because I don’t want to overwrite what Kevin’s written halfway through Kevin’s done writing it. Otherwise, we’re just going to get a jumble of information written to the hard drive.
So, I have to wait till Kevin’s done. And then once Kevin’s done, then I can overwrite what Kevin wrote. If that’s what I’m making changes to, but I have to know that Kevin’s done before I actually could wipe out his information.
Thomas: Let’s talk a little bit about execution plans then. So, let’s say we’ve gotten through the concept of how the engine’s really working, but especially as an accidental DBA myself, I remember just being told, hey, go look at the execution plan, as if that was something I should have just known what to do and how to do. Kevin, how does somebody learn to read an execution plan? What’s a way that we can help somebody in accidental DBA understand an execution plan?
Kevin: Well, the younger generation, you’re used to playing video games, and one of the things that you’ve experienced as a video game player without a doubt is what they call the pathing algorithm. Right? So you’re a little character in the role-playing game, wants to get from the room where you’re fighting the orcs over to the room where you’re fighting the wizard, and you have to move around doors and you have to go over the little bridge. And so when you’re clicking or using your console, it’s moving you from here to there. And it can choose a lot of different paths to get from the room with the Orcs to the room with the wizards. That’s the pathing algorithm. Well, with relational databases, they’re very similar in concept, different terms and things like that are used but the idea is basically the same.
You don’t actually tell it where each footstep will go. What you do is you tell it, here’s where I sit and this is what I want. It’s a declarative language rather than a procedural language that we use for these databases. And so the database then takes what you have requested of it and it builds a pathing algorithm, except in this case what we call it is at Oracle for example, in some of the other platforms, like MySQL call it an unexplained plan on the SQL server and Sybase SAP side, we call it an execution plan. And so this tells us the operations that the database chose to execute in order to do what you need. So, you’ve requested to know about all of the salespeople within the Southwest region who met quota, but didn’t have any very large enterprise sales. So I want to know about all the sales people last quarter, Southeast region, who had sales that met quota but individual sales never exceeded a $100,000.
Well, that’s a kind of complex nuanced request for information. So then your database engine, whatever it may be, not just SQL server but all of the relational database engines would say, oh, okay. So you’re asking about sales information. You’re asking about salespeople. You’re asking about territory information. So, I need to build a join across these three tables, based on the salesperson ID, because that’s the relationship that is built between the sales figures, the salespeople and the territories. And then I need to put some conditions on it, somewhere clauses that say where the max sales size doesn’t exceed a certain amount and so forth. And the database decides how to implement each of those for fastest response. So, in one case, it might choose one set of indexes based on the salesperson ID. But then in another case, it might choose a different set of indexes, say in the sales table, based around the date of the sale and the name of the customer and the quantity amount for each of the sales.
So, think of it that way. Things have to get done yet you’re not telling the relational database precisely how to do it. You’re saying, “Here’s what I want. Now you go out and do that for me.” So, it turns out a lot of the times that when we say we want this particular set of information, we do it either very poorly, or we do it against a database that has not been configured very well. And so it performs horribly. And as a result, you need to go back and take a look at that execution plan and see, oh, okay. I see what happened. It did a full table scan of the sales details table, which is 30 million records long. Okay. I need to make sure it doesn’t do that. I need to make sure it uses a index to go super-fast.
So, that’s the kind of thing that they don’t tell you about when you get signed up as the accidental DBA. Somebody, like Tom mentioned happened to him. We’ll walk in and say, “I need this to be faster. Have you looked at the execution plan?” And you say, “What is this execution plan you speak of, oh, wise master?” And so you start that process of learning.
Thomas: Yeah. So Kevin, I’m going to ask you a pointed question.
Kevin: Oh.
Thomas: What’s the best way for somebody to learn how the read execution plan?
Kevin: Gosh, that is a really hard question.
Thomas: I’ll make it easy for you. The answer is to use Plan Explorer.
Kevin: Thank you. That’s excellent. Yes. Okay folks. So we make a free tool and we like to brag that is so free, we don’t even ask for your email address. And Plan Explorer does things like tell you there’s this really expensive full table scan, and it uses bright colors and highlights things. So, great point Tom. You were going to say something else though, I think.
Thomas: I’m going to move us along. Let’s talk a little bit about configuring, monitoring, and alerting. I tried to break this down and help people especially accidental DBA, understand the difference between health and performance. And my friend, Rob Mandeville explained it to me a few years back. He goes, so I am healthy enough to run a four-minute mile. Health-wise, it is possible. I could never actually perform at that level. So when you start breaking it down and you start to think about difference between say the health of a server and the performance of a server, and you start to think in grouping of those, health versus performance metrics, I was going to ask Denny, what are some of the health and performance metrics he relies on the most, especially considering he does work not just with the earth SQL server but with the cloud version.
Denny: Yes. So, things that I typically want to look at when I’m looking at a server or CPU obviously, so I want to know how much CPU a box is actually using a disc IO, both how much we’re pushing through to the hard drive. So, if the throughput we’re getting in the number of IO we’re pushing, but also the response time that we’re giving back to the time per reading and time per rate, which we can get from Windows. I also want to look at things like page site expectancy, not because it’s going to tell me where a problem is but it’s going to tell me if there’s a memory problem. And the answer is if the page life expectancy is low, then I just need to throw money at a problem and buy more RAM, because that’ll increase that, or I need to do performance tuning.
So it’ll tell me if something’s wrong or if nothing’s wrong. If it’s consistently high, then less chance that things are wrong. If it’s extremely low, something’s definitely wrong. And those are kind of the big counters that I’ll start with as I’m looking at a server and look at the performance of the server to see how it’s actually handling it’s day-to-day worth load.
Thomas: And something else worth mentioning here is, a lot of times when we talk about allow the applications that historically we’re used to seeing, there are standalone systems. They’re built for standalone systems. Maybe they are replicating the system for an HADR thing, but in terms of load balancing, I don’t think there’s as many. However, there is a growing class, a very large class that’s been there the whole time but growing over the past 15 years, and now there’s what we call cloud native. And for those types of monitoring, you’re going to look at things a little slightly different than if it was a server that was sitting in the closet right behind you. And that’s where you’re going to talk about those four golden signals, your concurrency, your errors, your latency, your throughput.
Those would be the first layer of defense, so to speak, the metrics that you’ll say, “These are the necessary ones for me to start with.” But in our world say of SQL server and relational databases where we live a lot of times, we were looking at well, what’s the server metric? What’s the health of the server? And what’s the actual performance, what’s happening? Where are things getting blocked? And now this leads naturally, this monitoring also leads to alerts. As I try to tell people, an alert requires action. Right? If you send yourself an email and you read it and you just ignore it, don’t interrupt your day unless there’s something that you need to do and act upon. Everything else is just information for later. I want to ask Kevin how you feel about that.
Kevin: For a long time, in the early days of my career as a DBA, I measured how well things were going with my SQL server database by how often the phone rang, and a user called to complain. And somewhere down the line, I realized that’s a hell of a dumb way to run your organization. The system has all of these metrics in it, that Denny just described. And you can keep an eye on those and you can use your alerting system to notice when things are going in the wrong direction and give you that alert so that you can actually fix things before end-users start to complain to your boss about how bad this is. And it was a major revelation. And so, there are some good ways to do monitoring, but there are also some bad ways.
For example, a bad way… I’m sorry, alerting. So a bad way to do alerting for example, is to say to the system that does manage alerts, tell me about everything. So for example, there’s a feature built in the SQL server called SQL Agent alert. And you can use it to actually track perfMon counters. And so if you said tell me about every perfMon counter related to SQL server, that’s maybe even thousands, if not high hundreds, of different items you can take a look at. And what we have happened then is, all of your alerts turn into noise that you ignore. And that’s kind of one of the classic mistakes that people make. So Tom, you had said, hey, an alert needs to be actionable and I couldn’t agree more. That’s why it’s actually better to have just a handful of alerts that really mean a lot.
We are out of memory. Our CPU is pegged. These backups failed to work or they didn’t happen. So, I need you to make sure that you have a good usable backup. Those kinds of alerts are extremely important to have in place. Performance is just unacceptable to your end-users because you know you have to take action when that alert comes in. And so once you have that kind of a tribal experience institutional knowledge, either in your team or individually, then you can start to add those less critical sorts of errors to your alerting system so that you can know, oh, okay, this is about my health, not my performance. Right? And so from there, you can take action again, but it’s less priority.
It’s like, okay, this alert is telling me I need to start planning for our next round of storage. We’re going to have to our storage footprint. So, there’s a couple of ways to tackle it, but when you’re setting up your alerting, make sure that you start with just the things that are movingly critical. They get you up out of your chair. And from there, you can gradually expand and add additional things to alert on. And also to Denny’s point earlier too, is that some alerts that you might set have easy fast resolutions, some alerts that you might set point to a really long-term kind of low intensity problem, but there’s something that you need to fix eventually. So, alerts also have a very useful function and that they help you kind of trend the behavior of your systems over time.
On top of that, Denny tossed out a lot of things that you, as a new and accidental DBA might not… What does page life expectancy mean? What does a buffer cache hit ratio mean? Well, the good news for all of you listening is that our community of SQL server DBAs is Tom, Denny, myself, we all share that knowledge. So all of us are bloggers. You’ll see various different experts in our industry who keep track of all of these things and help you know, what are the most important and alerts you really need to take into account? Help explain what all those kind of cryptic messages mean or what all those cryptic weight statistics, what those identify as a problem in your system? So you’re not alone in this. You feel alone a lot of the time. I know I did, but today it is much easier to go from inexperienced, ignorant, and fearful to fairly experienced, fairly confident pretty quickly.
Thomas: So monitoring, building up your own monitoring, rolling your own. Denny, talk a little bit about why you might build your own versus buying something. How do you make that decision?
Denny: The biggest reason I see people build their own monitoring platform basically falls to one of two reasons. One, your boss won’t pay for anything. Or two, you’re in an extremely huge environment where the modern tools just don’t want to scale. And this is a Facebook-size, a MySpace-size environment. When you’re dealing with tens of thousands of servers that need monitoring, pretty much nothing is going to scale to that point. You’re going to end up rolling your own. It’s also going to want to look at very specific things across that environment. And you’re going to need to build your own. But that’s a 1% solution. That’s a 1% problem or .1% problem, almost nobody’s in that situation.
The main reason I see people rolling their own monitoring is the boss won’t pay for a tool. That’s where people get bogged down into, all right, I have to build something or I’ve got to use the native monitoring SQL agent. When you’re looking at a few servers that need to start, you start having them to worry about, you started to worry about monitoring, that’s when it’s going to make the most sense to look towards a product, to actually do your monitoring.
Kevin: As the resident old man on our panel discussion here, I actually started at a time when there weren’t any tools to buy to do monitoring. And Oracle on these HPUX kind of workstations back in the mid 80s, I remember being so excited when something called pop CPU came out, and it was basically kind of like a text-driven version of task manager. Right? Where you could see all the different processes on the server and how much CPU and memory and things like that, each of those were consuming. And so for those of you who work in an environment where there is budget to do things with tools, boy, oh boy, don’t try to roll your own. It’s a hard business to keep up with your own monitoring. And so today there are great tools.
And of course, I am very biased about that. But speaking as Kevin from the 1990s and the early 2000s when I was a DBA, again, there were a couple tools around at that time but not many. We still went ahead and bought those because as an organization, not just from me in the database team, but for our windows server and CIS admin teams and the networking teams, because just like with a carpenter or an electrician or a stonemason tools, tools facilitate faster productivity. Right? So, a carpenter could do everything with a hand tool, but what if you got the power tool, then you get the same quality of work done a 100 times faster. And so value your time as a human being and as an IT pro, and make sure you get ahold of the right tools to make your job as productive as possible.
Thomas: So, I want to make sure I mentioned, I want to call out how Denny mentioned MySpace. And it’s not that just that he’s showing how experienced he is, but he really has, because he was the DBA at MySpace. So he’s just bringing back a lot of memories. And his being a DBA at MySpace got me out of jury duty once, but that’s a story for a different podcast.
Kevin: That sounds awesome.
Thomas: So look, we’ve covered some of the fundamentals. Right? We’ve taken people to get them, taste of what it’s like to be a DBA or what the role is, how an engine would work, where to go for help if you need it, things to consider for monitoring alerting. So let’s just assume that this accidental DBA is now, we’ll say they’ve evolved to be at least the junior DBA. They have some experience. They know what they’re doing. When they show up and the phones ring, they don’t always panic. So, Let’s talk a little bit about how as those skills grow, so do your responsibilities. Because this a thing I know nobody talked to me about when I raised my hand and said, sure, I’ll be a DBA. Nobody said, yeah, that’s great. You think that’s great now, you just wait.
Now we did this SolarWinds Query Report. A survey went out to some Tech Pros, what’s it called, SolarWinds Query Report 2021: Database Priorities and Pitfalls, if you want to look that up. And I think we should have a link in the show notes. A third of the respondents talked about how they manage more than 300 databases. And I looked at that and I said, “Is that all, only 300? That seems low to me.” But in reality, on day zero as a DBA, if somebody said you’re going to be in charge of 300 databases, I would’ve been like, well, that’s nuts. That’s not right. And on top of that, seven out of 10 said that half of the databases in their care are considered to be critical. So 150 out of the 300 databases, and they’re all critical. Everything’s critical.
Well, if everything’s critical, then nothing’s critical. Right? It’s this weird environment you slowly wade into overtime. And part of the problem that in my experience are a lot of third-party vendor apps that just simply run on SQL server. So, I’m talking your SharePoint, your BizTalk, your AppDynamics. I’m talking anything the finance team decides to buy, and it runs SQL server as a back end. So you guys are developing it, but as a DBA, you’re responsible for this. And each of those apps has its own peculiarities as far as what it needs for configurations.
So as your skills grow, you end up taking on more and more, or should I say more and more just gets tossed your way. And it’s not necessarily bad you’ve automated a few things. So, I thought maybe Kevin, you could talk a little bit about how you’re going to add value to the business by being part of these new app dev projects as they come up? Because you don’t want to be blindsided and told, “Hey, by the way, we just bought this thing.” You want to be part of the conversation before it comes in the door. Isn’t that right?
Kevin: Oh, absolutely. And that’s a big part of what the kind of mentorship that I think we’ll see a lot of the senior DBAs try to convey to, to juniors and accidentals who are just getting into the business. Is that, it begins with recognizing the true value of the data and how important is to the business. But then as you grow in your technical skills and are able to take on more responsibility for say different apps and things like that, you’re also able to recognize more opportunities that can help improve the business and add value to the business. So, it’s at times like that, that you’re able to start to get to know the different application owners. Again going back to that early concept we talked about if your boss, yeah, they sign your paychecks and approve your vacation and so forth.
But it’s those customers, internal customers I should say, who own those various applications that are either bought or built, where building a relationship with them is going to provide an opportunity for you to become a key part of not just the IT team, but of the business decision-making teams that decide, oh, okay, what are we going to, how are we going to, or why are we going to build this particular new system? And so you can by showing your expertise and your quality of care for that asset, for that data, you’ll find that you can get to be the person that is invited to those design or those build kinds of discussions. And because you know the business owners, you are able to share advice and recommendations with them. You’re also able to point out important things that even they may not know. For example, one of the things that was really helpful for me back in the day was, I built some histograms that just kept track of the batch requests and transaction requests per second, that SQL server monitors, over time.
And I was actually able to tell business owners things they didn’t know about their own apps. Like what time is it actually most busy? A lot of times the business owners will say, “Well, it’s going to be Monday morning when everybody comes back in and signs on to the application, that’s going to be our most busy time. Right?” And predictably in some cases, it would be Friday before everybody leaves and they have outstanding transactions that are open. And when you’re able to share things about the business to the business stakeholders, they really want to know your opinion more. Oh, when should we do our patching then? Because we’ve always had a tradition that we do it every two weeks when the network team does it.
Well, it turns out that maybe you shouldn’t do that on Saturday because everybody actually logs in for 10 minutes on Saturday afternoon to get the Monday report due for management on Monday morning. So maybe you should schedule that for Friday evening rather than Saturday during the day for your patching. When you begin to reveal things like that, about the business users applications, they begin to want you to be there any time they’re making important decisions. And so that’s really cool because you can then kind of steer and influence those decisions and be a respected member of the team.
Thomas: So, let’s bring it home here. Let’s talk a little bit about, especially for somebody starting out in their career, would it be helpful to specialize or generalize? And we’re going to go to you Denny. So, something that struck me also in the query report, that survey that we did, about half the respondents were talking about how performance or query tuning was one of the top three skills followed by security management and data engineering. As you and I know, these are three very different things. But you and I could talk about, Hey, should I specialize in Synapse versus Azure SQL database? Or maybe I should be looking at Redshift and things of that nature. So for somebody just starting out, how would you help them understand the difference or the necessity for specialization versus generalization, and what path would you point them down?
Denny: So, the conversation between generalization and specialization, you can go either way obviously, and whichever way you end up going, it’s going to be either in here to find work long-term or harder to find work long-term. But there’s going to be a higher reward financially if you go one route. So, if you go with the specialization route and you get really good at SQL Server performance tuning or SQL Server databases, the amount of money that you can demand go as you move through your career is higher. But it can be harder to find jobs because not every company is willing to pay a large amount for a SQL Server performance tuning expert.
On the flip side of that, if you go with generalization and you just kind of know everything in SQL server, it’s relatively easy to find a job, because most companies need a DBA on staff. But because you don’t have that specialist knowledge, you can’t demand the higher salaries. You’re going to be in that lower salary bracket. So, there’s the trade-off there is it’s a little more comfortable to be a generalist because it’s a little easier to find a job, it’s easier to find work, but the higher pay, the higher title, those are all going to come with specialization and diving into a specific field.
Thomas: So I tried to tell people that these days, I would say, you should pick a cloud platform, a data platform to generalize in as much as possible, like Azure, or even AWS. And be a generalist in the cloud platform itself. But you’re going to want to specialize in one to two things, but only if you truly enjoy them. If you’re going to choose to be a specialist, just because you think you’re going to get paid, I’m pretty certain you’re going to be miserable in whatever it is you’re doing. You have to really enjoy what you’re doing to the point that if you didn’t get paid for it, you’d probably still do it just out of habit. Those are the things you want to specialize in, and because then if you get paid for on top of it, then paid well. It’s more satisfying.
But yeah. I would not, and I’ve had people say this, should I specialize in SQL server? Or how much do you enjoy it? Well, I just know they get paid. Okay. We can sometimes, but that doesn’t mean you should specialize it. I think the description is that your career should be T-shaped. So the top of the T gives you the breadth of everything that you want. And the stem of the T is the depth. You have to be able to take that deep dive into one or two things, but don’t start with the deep dive. You really should generalize and explore as much as possible in everything there is before you make that decision, I want to dive a little deeper in this area right now. So, let’s talk about for you Kevin, would it be a good idea to specialize in something that maybe you’re seeing is perhaps being automated away?
Kevin: Yeah. That’s a really important question. Okay. So what we’re dealing with in our world now is a lot different than it was say 20 years ago. In 20 years ago, Denny, myself, Tom, we, not only could but we did know everything there was to know about SQL Server. Because there was really only SQL Server and one or two other features that were significant. There was a very easy data integration tool called, what was it, a distributed transformation services or something like that. DTS. It was simple. There was a little reporting feature. It was simple. There was a little ELAP feature. You could be brain dead and still know it. As time has gone on, each of these different additions are massive. And so even when we say, oh, you know what? I’ve seen you work. You could be a great SQL Server journalist if you wanted to.
Even that implies a certain degree of specialization because the world of SQL server, the Microsoft data platform, and this is true for other database platforms as well, is massive now because there’s moving data around, there’s transforming data, there’s reporting and visualizing off of that data. So, there’s just so much more out there. That’s the first thing. The second thing is that our world has also grown in terms of the kinds of bells and whistles that they add on to these different data platforms and the different features so that they can be automated. And there are certain aspects of that, that if you make that the bedrock of your career, you’ll find that there’s a robot doing that, either right now or in the next few years.
An example of that is backup and recovery, especially in the cloud, is kind of being a commoditized automated sort of feature set inside of GCP or AWS or Azure. Right? They come with database backups and recovery built in. It may not be exactly the kind of backup you want, but there will be backups built into those. So, the things that you want to keep an eye out for, that can’t be automated away at least anytime soon, usually focus around designing things and improving things. Right? So that would be something like, how do we design this database? How do we choose the indexes, the data types we use for the different columns and so forth? How do we design the queries that hit those? Those are all builder functions. And those things don’t get automated away very quickly or easily. But routine, do this every Wednesday, every Friday and every Sunday, those kind of, oh, rote tasks that I need to do on the regular that are a big part of DBA jobs, those things will be going away.
So, if your business card says “Joe Smith, DBA Backup and Recovery Specialist”, well, I’d encourage you to start learning some new tech. Right? Because backup and recovery is one of those things that will be kind of automated. Flip side, if it said Joe Smith, Josephine Smith, and it said a database designer and architect, well that will never be replaced by a robot, at least not anytime soon. So yeah. Give that some consideration. And my hope is that you like that aspect of building and designing and architecting, because those set of skills in whatever form they take, whether that’s the queries or the transactions or the design itself, those won’t be going away.
Thomas: Kevin, you say design won’t go away, and NHibernate would like to have a word with you.
Kevin: And if you’ve ever done troubleshooting and performance tuning on a system built with NHibernate…
Thomas: Yeah. So that’s a different podcast altogether.
Kevin: Yes.
Thomas: This is the part of the central theme of the whole idea of that disappearing DBA is that there are tasks such as query tuning that will disappear. They are already disappearing. The engines are getting better. Oracle’s autonomous database handles a lot of stuff as well. The makers of these commercial platforms are actively looking to reduce those tasks you have to do that are distracting you from doing the real work, which as you said, Kevin is design and architecture. Those are not likely to be replaced by robots anytime soon. You’re probably going to want a human that’s part of the discussion, but a lot of the other things are going to go away. And that’s when we say the DBA is disappearing, people aren’t going away, it’s the title. It’s the job role. It’s changing and it’s evolving.
Thomas: And you see it with every release of SQL Server. And every time Joe Sack puts in a new feature that does automatic index, whatever he’s doing, magic and fairy dust he has, it is. It’s a slow process, but I believe we’re going to get there. And that’s a good thing, because it’s going to let the machines do the stuff that they should be good at. And the humans do the stuff that we’re going to be good at.
So I want to thank you both. I’ll start with you, Kevin. Kevin, thank you for taking the time to join us on TechPod here today.
Kevin: My pleasure. Always fun.
Thomas: And Denny, my old, experienced friend, thank you as well.
Denny: No problem, happy to be here.
Thomas: A good DBA is hard to find, and this is because a good DBA evolves into three to four full-time jobs. On the next episode, in the disappearing DBA series, we’re going to talk about how the secret is automation. And automation is the reason that DBA is disappearing. If you enjoy SolarWinds TechPod, we’d love to have you follow, rate, and review the podcast. Thank you for listening, and until next time, I’m Thomas LaRock.