Databases 101: Choosing the Right Database
In my last post, we talked about the different types of databases available on the market for you to choose from. With over 300 options, it is hard to know which one is the right database. Today, we are going to help you with choosing the right database for your requirements.
We will start with a discussion of the CAP theorem, and then discuss details of key-value, document, and graph databases.
Let’s get started.
Choosing the Right Database
This is a natural question that many will ask: Which database do I want? Time will be spent trying to determine which database is the right database. The truth is that your application systems are complex, and when broken down, you will find that different pieces are best served by different database engines. Netflix is a great example of this, as they use a handful of database engines to process the billions of data points they collect and analyze daily.
So, set aside the idea that you are going to find the one true database that is better than everything else. Start thinking about how you can find the right database for different pieces of your complex systems.
Now, let’s look at how you can make an informed decision as to the database engine that will work best for your needs.
CAP stands for Consistency, Availability, and Partition tolerance. The theorem states that you cannot have all three, as there are natural tradeoffs between the items. Similar to “fast, cheap, and easy, pick two,” people say about CAP: “consistency, availability, or partition tolerance, choose two.”
Consistency means that any read request will return the most recent write. But as we have discussed earlier, consistency need not be immediate. Our requirements could be satisfied with a state of eventual consistency. This graph helps explain the different consistency levels.
Availability means that a non-responding node must respond in a reasonable amount of time. This is different than partition tolerance, which states the system will continue to operate despite network or node failures.
When you put these three items together, you should start to understand why you can only ever prioritize two of them at a time. For example, if your system needs to be available and partition tolerant, then you must be willing to accept some latency in your consistency requirements.
Here’s what the CAP theorem looks like:
Let’s look at each side the triangle above (labeled CA, AP, and CP). Traditional relational databases are a natural fit for the CA side. That is, they can feature strong consistency and be highly available, but often at the expense of partition tolerance.
Non-relational database engines are meant to satisfy AP and CP requirements. This is why we have so many flavors of NoSQL: key-value, document, and graph, to name a few. We have two sides of the triangle supported by NoSQL database engines. That means we have a large variety of availability and partitioning requirements along with specific query and workload requirements. It is easy to understand why we have so many NoSQL solutions available.
As a very general classification, you could think of things this way: relational databases are optimized for writes, and NoSQL databases are optimized for reads. That’s not an absolute for each engine, just a generalization to understand why the engines were originally built.
Key-value databases work best for scenarios where you have a simple schema, you are doing many read/writes and few updates, you need something that can scale across many nodes and perform well, and you don’t have complex queries that require joins. Key-value databases include Redis, DynamoDB, and Cosmos DB. An example of a company using key-value to achieve high performance is Snapchat, who moved their Snapchat Stories feature to DynamoDB.
Document databases work best for scenarios where you need a flexible schema, your data is in XML or JSON format, you have a balanced read to write ratio, require high read performance, and want to use indexes for performance improvements. Examples of document database are MongoDB, DynamoDB, and Couchbase.
[Wait, didn’t we have Dynamo as a key-value database? Yes, we did. Dynamo is one of those multi-model databases I mentioned at the start. So, you can find the same NoSQL database touted as a solution for more than one scenario. This is why there is confusion in the market over time, which is one of the reasons I am writing this series.]
Graph databases are perfect for scenarios where you need to create and navigate between nodes, or points of data, and return details about the relationship between those nodes. Examples of graph databases are Neo4j, Cosmos DB, and Amazon Neptune. In one use case, Walmart uses Neo4j to power their recommendation engine.
In the last post in this series, we will discuss the additional factors you need to consider when choosing a database.
The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.