Home > What Is Cardinality in a Database?

What Is Cardinality in a Database?

itil alternatives
Databases have a lot of jargon, and cardinality is one of those words that experienced people tend to forget that they didn’t know once upon a time. But if you don’t know it—and it takes a while to really get comfortable with cardinality—it’s super confusing when the DBA just drops it into the middle of a sentence without slowing down! Fear not: I’ve got you, as they say. Cardinality means two things in databases. For our purposes, one matters a lot more than the other. Let’s do the simple one first, and then dig into the one that matters for query performance. The definition of cardinality that matters a lot for query performance is data cardinality. This is all about how many distinct values are in a column.The first meaning of cardinality is when you’re designing the database—what’s called data modeling. In this sense, cardinality means whether a relationship is one-to-one, many-to-one, or many-to-many. So you’re really talking about the relationship cardinality. Cardinality’s official, non-database dictionary definition is mathematical: the number of values in a set. When applied to databases, the meaning is a bit different: it’s the number of distinct values in a table column, relative to the number of rows in the table. Repeated values in the column don’t count. We usually don’t talk about cardinality as a number, though. It’s more common to simply talk about “high” and “low” cardinality. A lot of distinct values is high cardinality; a lot of repeated values is low cardinality. Picture a product description table in an e-commerce database: The ProductID column is going to be high-cardinality because it’s probably the primary key of that table, so it’s totally unique. If there’s a thousand rows in the table, there’ll be a thousand different ProductID values. The Category column will have a lot of repetition, and it’ll be low or medium cardinality: maybe 50 or 100 different Category values. Name is probably high cardinality, unless there’s more to this table than meets the eye (such as multiple rows for different product colors and other variations). Cardinality impacts performance a lot, because it influences the query execution plan. The planner will examine column statistics and use them to figure out how many values a query is likely to match, among other things. Depending on what it finds, it might use different query execution plans to try to get the best performance. But that’s a topic for a different blog post, because it takes a bit of work to explain. So now, next time someone drops “high cardinality” into a sentence without pausing, you know they really mean “a lot of different values.” And there you have it!
SolarWinds
We’re Geekbuilt.® Developed by network and systems engineers who know what it takes to manage today's dynamic IT environments, SolarWinds has a deep connection to…
Read more

Tweets

SolarWinds's Twitter avatar
SolarWinds
@solarwinds

Who has access to all the sensitive information in an organization? Find out why it’s a good idea to regularly audi… t.co/JOf0QofKjz

SolarWinds's Twitter avatar
SolarWinds
@solarwinds

In times of tedium, what strategies help us find our zen? Head Geek Chrystal Taylor shares how to use creativity to… t.co/hKsN796nNA

Julia Chatterley's Twitter avatar
Julia Chatterley
@jchatterleyCNN

"All we can do is continue to learn and constantly be vigil." @solarwinds CEO Sudhakar Ramakrishna discusses their… t.co/iyEFNumLS2

Retweeted by SolarWinds