Database

Databases 101: Choosing the Right Database

Databases 101: Choosing the Right Database

In my last post, we talked about the different types of databases available on the market for you to choose from. With over 300 options, it is hard to know which one is the right database. Today, we are going to help you with choosing the right database for your requirements.

We will start with a discussion of the CAP theorem, and then discuss details of key-value, document, and graph databases.

Let’s get started.

Choosing the Right Database

This is a natural question that many will ask: Which database do I want? Time will be spent trying to determine which database is the right database. The truth is that your application systems are complex, and when broken down, you will find that different pieces are best served by different database engines. Netflix is a great example of this, as they use a handful of database engines to process the billions of data points they collect and analyze daily.

So, set aside the idea that you are going to find the one true database that is better than everything else. Start thinking about how you can find the right database for different pieces of your complex systems.

Now, let’s look at how you can make an informed decision as to the database engine that will work best for your needs.

CAP Theorem

CAP stands for Consistency, Availability, and Partition tolerance. The theorem states that you cannot have all three, as there are natural tradeoffs between the items. Similar to “fast, cheap, and easy, pick two,” people say about CAP: “consistency, availability, or partition tolerance, choose two.”

Consistency means that any read request will return the most recent write. But as we have discussed earlier, consistency need not be immediate. Our requirements could be satisfied with a state of eventual consistency. This graph helps explain the different consistency levels.

From: https://docs.microsoft.com/en-us/azure/cosmos-db/consistency-levels

Availability means that a non-responding node must respond in a reasonable amount of time. This is different than partition tolerance, which states the system will continue to operate despite network or node failures.

When you put these three items together, you should start to understand why you can only ever prioritize two of them at a time. For example, if your system needs to be available and partition tolerant, then you must be willing to accept some latency in your consistency requirements.

Here’s what the CAP theorem looks like:

Let’s look at each side the triangle above (labeled CA, AP, and CP). Traditional relational databases are a natural fit for the CA side. That is, they can feature strong consistency and be highly available, but often at the expense of partition tolerance.

Non-relational database engines are meant to satisfy AP and CP requirements. This is why we have so many flavors of NoSQL: key-value, document, and graph, to name a few. We have two sides of the triangle supported by NoSQL database engines. That means we have a large variety of availability and partitioning requirements along with specific query and workload requirements. It is easy to understand why we have so many NoSQL solutions available.

As a very general classification, you could think of things this way: relational databases are optimized for writes, and NoSQL databases are optimized for reads. That’s not an absolute for each engine, just a generalization to understand why the engines were originally built.

Key-value

Key-value databases work best for scenarios where you have a simple schema, you are doing many read/writes and few updates, you need something that can scale across many nodes and perform well, and you don’t have complex queries that require joins. Key-value databases include Redis, DynamoDB, and Cosmos DB. An example of a company using key-value to achieve high performance is Snapchat, who moved their Snapchat Stories feature to DynamoDB.

Document

Document databases work best for scenarios where you need a flexible schema, your data is in XML or JSON format, you have a balanced read to write ratio, require high read performance, and want to use indexes for performance improvements. Examples of document database are MongoDB, DynamoDB, and Couchbase.

[Wait, didn’t we have Dynamo as a key-value database? Yes, we did. Dynamo is one of those multi-model databases I mentioned at the start. So, you can find the same NoSQL database touted as a solution for more than one scenario. This is why there is confusion in the market over time, which is one of the reasons I am writing this series.]

Graph

Graph databases are perfect for scenarios where you need to create and navigate between nodes, or points of data, and return details about the relationship between those nodes. Examples of graph databases are Neo4j, Cosmos DB, and Amazon Neptune. In one use case, Walmart uses Neo4j to power their recommendation engine.

In the last post in this series, we will discuss the additional factors you need to consider when choosing a database.

The SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates.  All other trademarks are the property of their respective owners.


Thomas LaRock is a Head Geek at SolarWinds and a Microsoft® Certified Master, Microsoft Data Platform MVP, VMware® vExpert, and former Microsoft Certified Trainer. He has over 20 years’ experience in the IT industry as a programmer, developer, analyst, and database administrator. LaRock has spent much of his career focused on data and database administration, which led to his being chosen as a Technical Evangelist for Confio Software in 2010, where his research and experience helped to create the initial versions of the software now known as SolarWinds® Database Performance Analyzer. LaRock has served on the board of directors for the Professional Association for SQL Server® (PASS), and is an avid blogger, author, and technical reviewer for numerous books about SQL Server management. He now focuses on working with customers to help resolve problems and answer questions regarding database performance tuning and virtualization for SQL Server, Oracle®, MySQL®, SAP®, and DB2®. He’s made it his mission to give IT and data professionals longer weekends.