We create applications in an age of simple, powerful, flexible databases that do magic for us. There’s a large variety of modern databases that supply just what’s needed for lots of use cases, so we can pick the right tool for the job. We’ve never had it better, right? So why is “it’s the database again” still a sufficient explanation for a lot of outages and performance problems?
The reality is that as we’ve made huge strides in data management, we’ve both simplified and complicated our lives. One important way this has unfolded is precisely because of the diversity of choice we have. The phrase “choose the right tool for the job” is another way of saying “introduce diversity and complexity into the persistence tier.”
The World Has Changed
If you draw your application’s high-level architecture diagram on the whiteboard, what does it look like? Chances are you’ll draw some variation on a classic three-tier architecture.
The top tier is the app’s presentation logic, which might include things like a PHP website, an AngularJS in-browser app, or a mobile app. The second tier is the business logic, where policies and rules are encapsulated. In a modern app this is likely to be expressed as a set of APIs and/or internal services.
And at the bottom we have the database. It used to be that you chose one database. “We use Oracle.” (Or SQL Server, or MySQL, or MongoDB, or PostgreSQL.) But something’s wrong! This diagram is badly out of date. A crude sketch of a modern application will look completely different:
A few things have changed. First, each tier is a distributed system. Everything is distributed these days. This is a big deal and I’ll return to this later. But that’s not all that has happened.
In the persistence tier, we have not only distributed our persistence needs (clustering or sharding, as the case may be), but we also have multiple technologies and paradigms. In this diagram, we have MySQL (relational), Kafka (messaging/queueing), Cassandra (wide-column), Redis (key-value), and Elasticsearch (document-oriented, search, has-no-official-logo). Each of these provides a different data model and different functionality. Choose the best tool for the job, right?
Welcome to the world of polyglot persistence. Search Google for nearly any modern application’s technology stack, and you’ll find polyglot persistence. A few examples:
- Stack Exchange: SQL Server, Redis, Cassandra, Elasticsearch…
- Shopify: MySQL, Kafka, Redis, Memcached…
- Uber: MongoDB, Redis, MySQL, PostgreSQL, Kafka, Elasticsearch…
Modern apps usually have 3-5 solutions in their data tier, and often more. It’s easy to overlook Hadoop, RabbitMQ, and the like, but those count too.
Polyglot persistence is the current reality. It’s not an emerging trend. It’s here now, and it’s here to stay. It’s also very different from the world we lived in 5 to 10 years ago, and many of our tools haven’t changed to reflect that.
With the introduction of diverse, distributed data tiers comes great power and flexibility, but also some significant challenges. Foremost among these is understanding how the app really behaves. Some of the biggest problems that arise with diversity and scale include:
- Monitoring. What systems do we have, and are they healthy? Traditional monitoring tools fall short of the unified, high-level view needed to understand the persistence tier’s status overall. They largely function server-at-a-time and have no notion of servers working together as a unit.
- Performance Management. Unlike monitoring, performance management is concerned with the work the servers and services are doing. There’s currently no solution that shows developers and operations staff alike a coherent, high-level view of the requests (queries, commands, and the like) that the servers and services are fielding. Not on a cluster of a single type of database, and certainly not on multiple clusters of different kinds of databases – yet that is exactly what is needed to see, measure, understand, and improve the application’s database workload overall.
- Engineering Bottlenecks. Databases are complex, difficult technologies to use in many cases, and this complexity is often irreducible. With developers outnumbering DBAs and operations staff by 10:1 or more in many organizations, the entire engineering team can quickly get bottlenecked by developers who need help from operations to understand and improve their interactions with the vitally important and performance-sensitive data tier. As a result, Ops and DBAs are increasingly reactive and interrupt-driven, causing cascading problems for the engineering organization.
These challenges, especially the final point, are serious — sometimes existential. They impact organizations’ profitability, speed to market, and even relevance.
The ratio of servers to engineers continues to grow and accelerate, and the data-to-human gap is growing non-linearly while budgets and headcounts remain mostly flat. We need tools that support smart humans and make them much more productive in a variety of roles and scenarios.
Our goal in building a Database Performance Management platform has been, and remains:
- To provide a coherent, logical, high-level view of the entire database and persistence tier, from a work-being-performed stance, from both the application’s and the server/service’s point of view. Anything less leaves critical blind spots about performance, availability, and quality of service.
- To make production behavior accessible, visible, and understandable to developers and operators alike. Developers need to be able to self-service. This brings many more smart brains to bear on the hardest problems in modern applications, and relieves critical bottlenecks in organizations and business processes.
- To enable fluid zoom-out and drill-down from high-level to the lowest level of details. To be able to troubleshoot and diagnose quickly, users must be able to see and isolate problems swiftly, without getting buried in simplistic, unhelpful views such as walls of charts.
- To provide more than monitoring, more than measuring, more than visualizing. We need analytics. Smart, product- and tool-specific analytics. Generic, least-common-denominator approaches, such as one-size-fits-all receptors for plugins and the mostly meaningless metrics they emit, are not solving anyone’s specific technology problems.
- To do all of this in a single presentation across a diversity of technologies and paradigms. To accomplish this, there must be an underlying, logical principle that unifies the entire system and supports a disciplined thought process for users. Again, we believe work getting done is the best way to think about and improve performance.
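As a rough illustration of the “work getting done” principle, here is a minimal Python sketch that puts request samples from different data stores into one technology-neutral record and ranks request classes by the total time they consume. Everything in it is an assumption made for illustration: the `RequestSample` fields, the `rank_by_total_time` function, and the sample data are hypothetical and do not describe how any particular product or agent works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSample:
    """One observed request in a technology-neutral shape (hypothetical)."""
    technology: str      # e.g. "mysql", "redis"
    request_class: str   # normalized query or command text
    latency_ms: float


def rank_by_total_time(samples):
    """Group samples by (technology, request class) and sort by total time.

    Returns a list of (technology, request_class, count, total_ms) tuples,
    largest total first.
    """
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, total_ms]
    for s in samples:
        entry = totals[(s.technology, s.request_class)]
        entry[0] += 1
        entry[1] += s.latency_ms
    ranked = [
        (tech, cls, count, total)
        for (tech, cls), (count, total) in totals.items()
    ]
    ranked.sort(key=lambda row: row[3], reverse=True)
    return ranked


# Made-up sample data, for illustration only.
SAMPLES = [
    RequestSample("mysql", "SELECT * FROM orders WHERE id = ?", 4.0),
    RequestSample("mysql", "SELECT * FROM orders WHERE id = ?", 6.0),
    RequestSample("redis", "GET session:?", 0.5),
    RequestSample("redis", "GET session:?", 0.5),
    RequestSample("elasticsearch", "search products", 30.0),
]

if __name__ == "__main__":
    for tech, cls, count, total_ms in rank_by_total_time(SAMPLES):
        print(f"{tech:15} {cls:40} n={count:<3} total={total_ms:.1f} ms")
```

Ranking by total time, rather than by per-request latency alone, is one way to put very different technologies on a single scale: a cheap request that runs constantly and an expensive request that runs rarely can be compared by how much work each represents.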
Where We Are, Where We’re Headed
Building what’s really needed in the marketplace, instead of yet another square wheel, is ambitious. It’s a hard design problem, it’s a hard engineering problem, it’s a hard scaling and platform problem. But that’s exactly why this is worth doing: if it were easy, someone would have done it already.
It also requires attention to detail; the specifics of each technology must be dealt with at a deep level. We can’t just slap together a plugin for each new technology. That wouldn’t provide the deep level of insight and value that’s needed. Worse, it would cause problems instead of solving them. As a MySQL performance consultant, I solved several outages by disabling abusive monitoring plugins that were written in ignorance of how MySQL actually works.
Our MySQL agents took months of engineering to complete. The platform to run and support them took much more, and is ongoing. We are now building on this platform and our progress is accelerating. There are three important numbers in computer science: zero, one, and many. With the release of our PostgreSQL agents and platform improvements, we’re past “one” and growing our “many.” The PostgreSQL agents took dramatically less time and effort to create and mature, and future agents will take less still.
We have several more technologies in internal alpha and private beta with customers, and we’re hearing tremendous excitement and eagerness from customers about where we’re heading with the product. You can look forward to rapid releases of new supported platforms in the future. To set your expectations, though, it won’t be two a week. Again, we’re not just building plugins and graph templates.
Today’s distributed, diverse persistence tiers have created a whole new world of possibility, but also new problems to overcome. Whoever solves the operational and organizational challenges of polyglot persistence — and we think we’re well on the way — will make the world a better place for many people and companies.
This is what gets me out of bed every morning, impatient to get into the office and hustle on the day’s work. I couldn’t be more excited about our future. We’ve come so far, and accomplished so many amazing things in the past two years. Much more remains. I can’t wait.
In closing, I just want to give credit to our team, who impress me every day, and to thank our customers, investors, and the many advisors and mentors who continue to support, educate, and encourage us.