With SolarWinds, IT pros are discovering a smarter way to manage apps, servers, databases, and hybrid clouds. The shift starts here.
The internet is littered with information, and misinformation, regarding data and databases.
A short time ago, a colleague asked me where to find a “Databases 101” guide for the non-technical professional. He wanted something to help make sense of data and databases, specifically relational versus NoSQL. But his internet research showed it was difficult for someone entering the field of data and databases to understand the bigger picture.
So, that’s what I decided to write for you. This is the start of my Databases 101 blog series for the non-database professional; something to help anyone understand why the word “database” is an overloaded term. We use the term “database” to describe a great many things. To show how far down the rabbit hole we have gone, all you need are three words: “Microsoft Access Database.”
[
Microsoft Access is not a database.
And Microsoft Access is a database. It’s Schrödinger’s database, really. It’s likely to be the database of choice for the quantum computers once they are online. But I digress.]
This series is organized into sections for you to digest:
Types of Databases
Choosing the Right Database
Additional Factors to Consider
First, we look at the different types of databases available on the market today. In the next post, we will discuss ways for you to choose the right database for your requirements. In the final post, I will list out some factors to consider when trying to decide on any specific database technology.
Also warrants mentioning: I use the terms database and database engine interchangeably in this series.
Let’s get started.
Types of Databases
To me, databases only have two classifications: relational databases, and everything else. I don’t care what anyone else tells you. Those are the two buckets you start with. Either your database engine is relational, or not.
I have found that the
DB-Engines website is a good resource to understand the myriad database options that exist. I’ve written before about this ranking and
why it is good, but not perfect.
“I’ve been following the DB-Engines ranking for a few years and would encourage you to do the same. The rankings are not an exact science. You can read for yourself how they are calculated. One thing to note here is the selection bias in how they collect their data. The rankings show a clear preference for systems that have a lot of engagement online. It doesn’t talk about revenue, the number of installations, or if the engagement online is negative or positive.”
I will use data from DB-Engines to break things down into more detail.
Relational databases
There are currently 138 relational databases (out of 343 total) listed at DB-Engines. Relational is by far the largest category.
The top four ranked databases at DB-Engines are relational engines, and six of the top ten overall. Yes, Microsoft Access is listed there, ninth as of July 2018. I’ll let that sink in for a moment.
Historically, what defined a database as relational (in a traditional sense) was the ability for database transactions to adhere to the
ACID properties. ACID stands for Atomicity, Consistency, Isolation, and Durability. You can read more about these concepts and other database transaction concepts
in this paper written by Jim Gray in 1981.
This is not to say that only relational databases support ACID, but you will find that the majority of differences between relational and non-relational databases are ACID versus BASE (which we will discuss next).
Relational databases leverage ACID for database transactions to maintain data quality and integrity. And, to some computer folk, this is a weakness in the
design of a relational engine. Favoring data quality and integrity comes at a cost. The cost is measured in performance, latency, storage, and throughput.
The truth is that relational databases scale up just fine when placed in the hands of people that (1) know what they are doing, (2) aren’t afraid to learn new things, and (3) don’t mind doing the work.
I said scale up, which is another way of saying “throw hardware at the problem.” Scaling out means you add more nodes to handle the workload. Scaling up versus scaling out is a traditional difference between relational and non-relational database engines. Often, when you want more performance from a relational engine, you will scale up. But for non-relational systems, it is easier to scale out. Scaling out is often cheaper and easier than scaling up.
Non-relational databases
In contrast to relational database engines, years ago, a handful of systems arose that marketed themselves as “No SQL,” an alternative to relational database engines. Over time, that stance softened, and instead of the anti-SQL stance, they became friendlier by labeling themselves as “Not Only SQL,” or NoSQL. This is the group I classify as non-relational. DB-Engines has 205 systems listed as non-relational, but with many subcategories.
The most popular NoSQL system types are
key-value stores (66),
document stores (46),
graph databases (31),
time-series databases (25), and
search engines (17). There are also 36 NoSQL databases listed as “multi-model,” meaning they are more than one non-relational engine. An example of a multi-model engine is Cosmos DB, as it supports document, graph, key-value, table, and column-family data models.
As stated earlier,
NoSQL solutions favor performance over data quality through the use of BASE. BASE stands for Basic Availability, Soft-state, and Eventual consistency. If I had to sum up the differences between relational and non-relational database engines, it would be this:
Relational engines require consistency at the end of each transaction, whereas non-relational engines only require that the database be consistent at some point in time (i.e., eventually).
This is not a hard line. You can find plenty of counter-examples where a relational engine allows for eventual consistency, or a NoSQL engine embraces ACID. Over the decades, software providers have been baking lots of functionality into their products.
This page for SQL Server at DB-Engines lists ‘secondary engines’ of document, graph, and key-value. That’s a lot of functionality inside a traditional relational engine.
I’m offering a way to help you understand the differences at a high level, so you can make an informed decision about what database engine would best suit your specific requirements.
In the next post, I will help you understand how to choose the right database for your requirements.