
What Databases Taught Me About Scaling Observability

January 10, 2023


I recently attended a virtual event and heard the speaker comment, “Relational databases don’t scale.” To my ears, this is about as silly a statement as saying, “No one can eat 26 hot dogs in 12 minutes” right before Kobayashi shows up and eats 50. In my experience, relational databases scale when they’re placed in the hands of someone who knows what they’re doing. Just imagine if Kobayashi were your data architect!

At first the comment from the speaker bothered me, but then I smiled, recalling the many discussions with colleagues through the years about applications, performance, and scalability. The speaker wasn’t ignorant about scalability; they were trying (and failing) to make a point about architectural choices for scalable systems. These debates aren’t new; they’ve been happening for decades.

Companies are always trying to design and architect reliable and scalable systems. These days we call such systems “cloud native.” It’s the same thing as calling something “scalable,” but it looks and sounds better, like when we changed the name from “build process” to “DevOps.” It’s like this: 30 years ago, if I ordered a three-inch-thick, foot-long piece of bacon, I would’ve sounded insane. Now, I can order the pork belly appetizer, and everyone just nods as if my heart isn’t going to explode at some point.

Achieving scalability isn’t as easy as ordering a few appetizers. There are more than a handful of barriers, including things like data architecture and network bandwidth. It’s crazy for me to look back at the design decisions I’ve witnessed through the years that focused on solving one problem while completely ignoring something basic, like whether the network will have enough bandwidth. It’s like ordering soup for the table and the server only gives you one spoon to share.

Why Traditional Scalability Approaches Don’t Always Work

Many engineers have tried to solve scalability problems with a simple solution: throwing hardware at them. This stems from a traditional, legacy ideology stating scalable systems must solve for the following:

The ability to handle an increase (or decrease) in data loads
The ability to handle an increase (or decrease) in computational power (due to user and data loads)
The ability to add (or remove) resources as necessary

These three principles are why people and companies believe bigger and faster hardware solves scalability. The trouble with this approach is bad code will bring the best hardware to its knees. Ask anyone who’s tried “lift and shift” migrations to the cloud only to migrate back when the experience goes poorly.

Common Barriers to Scalability

But even having the best code and the most powerful hardware isn’t enough to build a scalable system. I commonly see two main barriers to achieving scalability at organizations:

First, there’s the fragmented tool ecosystem. This ecosystem is composed of all the different tools in use by the various IT and business silos. This disparate set of tools often collects the same data but reports it differently to the end users. This leads to confusion when teams try to communicate and collaborate during root cause triage. For example, a database engineer sees spikes in disk throughput and notifies the server team. But the server team sees no spike because their monitoring tool aggregates the same data and reports an average in five-minute intervals, smoothing out any spikes. Both tools are correct from each team’s point of view. And neither helps the two teams agree on a root cause and find a resolution.
This leads to the second barrier, the ability to quickly identify and resolve issues as they arise. Or, even better, to fix problems before they become an issue noticeable to the end users and customers.
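To make the averaging problem concrete, here is a minimal sketch in Python. The numbers are invented for illustration (a hypothetical per-second disk metric with a ten-second spike), and `window_averages` is a helper name I made up, not part of any monitoring product. It shows how one tool can report a dramatic peak while another, looking at the same data, reports nothing unusual.

```python
from statistics import mean


def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical data: one reading per second for five minutes (300 samples).
# The baseline is a steady 5.0, with a ten-second spike to 500.0.
samples = [5.0] * 300
for second in range(120, 130):
    samples[second] = 500.0

# What the database engineer's per-second tool shows.
per_second_peak = max(samples)

# What the server team's tool shows after rolling up to one five-minute average.
five_minute_avg = window_averages(samples, 300)[0]

print(f"per-second peak:     {per_second_peak}")
print(f"five-minute average: {five_minute_avg}")
```

With these made-up values, the peak is 100 times the baseline, yet the five-minute average lands at 21.5. Both numbers are accurate, but only one of them tells you something went wrong.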

Though existing cloud service providers are making efforts to remove these two barriers, they remain behind existing third-party enterprise monitoring platforms. But there’s still a gap beyond these two main barriers. Cloud-native—er, scalable—systems demand a different type of monitoring. Traditional metrics found with legacy monitoring tools operating within IT silos aren’t useful for organizations seeking truly scalable architecture.

The Importance of Scaling Observability

This need for deeper, actionable visibility across environments has given rise to observability solutions that provide the full-stack view necessary for systems located on-premises, in the cloud, and in hybrid cloud environments. But the real barrier is that observability is only discussed or viewed as a technical solution. The truth is observability is much, much more than that. Let me explain.

In the simplest sense, observability is the evolution of traditional enterprise infrastructure monitoring. In day-to-day practice, observability means you can infer the health of a system by observing its outputs (such as metrics, alerts, logs, or traces) in a centralized location.

Observability is also the natural shift from being reactive to becoming proactive. With a primary focus on end-user experience (as opposed to uncorrelated legacy metrics), a true observability solution can help provide meaningful resolutions to issues as they arise. With machine learning algorithms for anomaly detection, engineers can use observability insights to isolate and fix issues affecting the user experience, often before the end users notice a problem.

But everything we’ve discussed so far is a technical solution to technical issues. The real, and hidden, strength of observability solutions is they can allow for “observability at scale,” giving valuable business-critical information across departments, including those outside of traditional IT silos. Observability at scale can give a business the necessary insights into its enterprise and an edge in the marketplace over its competitors.

In my experience, having one solution also means you must be willing to allow people outside of your team (or silo) to see the status of systems you administer. There’s a natural tendency for a team to resist allowing people with less domain experience to see the status of their systems. No one wants to spend all day fielding phone calls asking about benign events. In time, however, trust develops between teams. Having an observability platform with smart alerting based on anomaly detection is an absolute must to help build this trust between teams.
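For readers who want a feel for what alerting based on anomaly detection means, here is a deliberately simple sketch in Python. It is an assumption on my part, not how any particular platform works: real products use far more sophisticated machine learning models, while this uses a plain rolling z-score. The function name `find_anomalies`, the window size, the threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=20, threshold=3.0):
    """Return indices of values that sit more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat baseline: flag any change at all.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical response times (ms): a small repeating wobble around 100-104,
# with one outlier injected at index 45.
values = [100 + (i % 5) for i in range(60)]
values[45] = 180

flagged = find_anomalies(values)
print(flagged)
```

The point of comparing each reading against its own recent history, rather than a fixed threshold, is that the normal wobble never triggers an alert. That is what keeps the phone from ringing about benign events.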

How SolarWinds Can Help With Scalability

The ability for all employees to see and understand a service outage in one area and how it will impact quarterly returns isn’t some mythical ideal decades away from becoming a reality. In fact, it already exists. But don’t just take my word for it; read how SolarWinds Observability SaaS (formerly known as SolarWinds Observability) solutions did the following:

In technology, we typically must wait for our monitoring tools and processes to keep pace with the technical innovations built by developers. SolarWinds Observability SaaS (formerly known as SolarWinds Observability) and SolarWinds Observability Self-Hosted (formerly known as Hybrid Cloud Observability) are built to support enterprise scale to allow organizations to not only keep pace but save money while doing so.

Is Your Observability Solution Scalable?

Every business decision comes down to three things: cost, benefit, and risk. As companies drive toward more cloud-native solutions, the need for full-stack observability is clear. Just consider the higher cost, lower benefit, and greater risk of staying with traditional monitoring. Traditional monitoring also comes with less collaboration, continued poor communication between silos, and longer mean time to resolution, resulting in poor customer experiences.

At the end of the day, every company has the same goal: happy customers. Building a scalable system is one step, but it’s not the only one. You also need an observability solution to give you full-stack insights. It should also provide insights all teams can understand, even those outside of traditional IT roles. This benefit alone outweighs the cost and reduces the risk. Once you have scalable architecture and an observability solution across business units, you will have achieved observability at scale.

You can learn more about how SolarWinds Observability SaaS (formerly known as SolarWinds Observability), our software as a service (SaaS) observability offering, and SolarWinds Observability Self-Hosted (formerly known as Hybrid Cloud Observability), our observability solution designed for on-premises, hybrid cloud, and multi-cloud environments, are built to provide effortless scalability to meet your needs today and in the future. You can also read more about what observability is and how it can help transform your organization in this free eBook.

Thomas LaRock

Thomas LaRock is a Head Geek™ at SolarWinds and a Microsoft® Certified Master, Microsoft Data Platform MVP, VMware® vExpert, and former Microsoft Certified Trainer. He has over…