Over the past two to three years, I’ve seen customers make a move towards both MySQL and PostgreSQL. There is no software licensing with either of these database systems, which can benefit companies, and developers love the community support for them. While they’re not as robust in terms of features as some of the commercial relational database management systems (RDBMSs) out there (like SQL Server), their user base is dramatically expanding and new features are being added, edging them closer and closer to parity.
It’s worth mentioning these databases are built to run on the Linux operating system, and sometimes administration for these engines can be somewhat challenging if you’re a Linux novice. Thankfully, cloud offerings such as Google’s Cloud SQL and AlloyDB, Aurora DB from AWS, and Azure SQL for PostgreSQL and MySQL automate most of the administrative work that would consume your time as a PostgreSQL or MySQL administrator.
For an aspiring database professional or a seasoned professional looking to expand their skill set, I highly recommend immersing yourself in one or both engines, as they’re going to be around for a long time. This article is a high-level explanation of the PostgreSQL vs. MySQL database systems as I think it would be useful to review them both to give you a better understanding of where to focus your efforts for new applications.
A Brief History of PostgreSQL and MySQL
PostgreSQL originally started out as a project by the US government, known as the
Postgres project, back in 1986. In 1994, a couple of students and the University of California, Berkley started adding a SQL language interpreter to the Postgres engine, launching what we know today as PostgreSQL.
MySQL started out as an open-source relational database project in Sweden. One of the original developers of MySQL had a daughter named “My,” which is where the My comes from in MySQL. This engine originally appeared in 1995, and the original company MySQL AB was purchased by Sun Microsystems, which was then acquired by Oracle in 2010. Once Oracle acquired MySQL, one of the original creators of MySQL forked the engine code and created the MariaDB relational database system.
What are the Main Differences Between PostgreSQL and MySQL?
PostgreSQL is known as an object-relational database management system (ORDBMS). This concept comes from the object-oriented programming paradigm where designs are constructed for classes and objects, rather than designed for function and logic.
There are two main features in PostgreSQL related to object-oriented design:
- Table inheritance
The idea here is to define a base table that includes a set of commonly used fields, and then you can create other tables to inherit the fields from this table. An example of where this is commonly used in PostgreSQL is for table partitioning, where a base table is defined that is range partitioned by a certain column in the table, and then additional partition tables are declared that inherit from the base table and specify a range of partitions values. Common use cases are for large tables or performing large data movements for ETL operations.
- Function overloading
This is the ability to have more than one function with the same name but different parameter lists. This is a very common feature in most programming languages, but not a feature most RDBMSs include.
MySQL is a traditional RDBMS. One interesting feature advantage it has over PostgreSQL is its support for a wide array of different storage engines. The InnoDB engine is the default storage engine and offers features most RDBMSs offer, such as ACID compliance (atomicity, consistency, isolation, durability), crash recovery, row-level locking, clustered indexes, and replication (and you can even replicate to a different storage engine, like MyISAM). However, there are
many other storage engines to take advantage of based on your data needs. One example is the Memory storage engine, which allows you to store data entirely in memory for fast retrieval. For the rest of this article, when I refer to MySQL, I’ll be discussing the InnoDB storage engine.
Backup and Recovery Capabilities
Both
MySQL and
PostgreSQL can perform database backups and restore those backups to a point in time, which is critical for any RDBMS. The process for performing backups and the subsequent restore process can be quite intensive.
Row Versioning Similarities
PostgreSQL implements Multi-version Concurrency Control (MVCC) by creating new versions of a row in the same table when an update occurs. This ensures writers do not block readers and readers do not block writers. However, the cost of this strategy is a possibility the table could experience bloat if the row versions are not garbage collected. A process called vacuum is responsible for the row cleanup. This process is generally executed in the background but may need to be performed manually as necessary. You can read more about what to know about
PostgreSQL concurrency control in a previous article I wrote.
MySQL also has the concept of MVCC to ensure write operations do not block read operations and vice versa. In MySQL, older row versions are kept in an Undo Tablespace. An Undo Tablespace is a collection of records with information on how to roll back a given change.
PostgreSQL vs. MySQL Development Environment Differences
Both MySQL and PostgreSQL have their own free Integrated Development Environments (IDE) tailored to their needs; MySQL has
MySQL Workbench, and PostgreSQL has
pgAdmin.
I am not a big fan of either product. Instead, I typically use
DBeaver Community Edition when I’m working on either MySQL or PostgreSQL. I really like how it allows you to organize your code and it works great for other data platforms as well.
PostgreSQL has a richer set of data types compared to
MySQL. For example, PostgreSQL has an array data type, which allows you to store a simple or multi-dimensional set of array values. You can access the values in the column array via regular array notation (numerical offset of the value in the array). PostgreSQL also has composite data types, which are essentially a type representing a row/record. For example, you can define a Sales composite data type containing the fields CustomerID, Price, and SaleDate. You can then define a column in a table of type Sales. That column would contain the fields CustomerID, Price, and SaleDate. It’s a way of storing complex data in a single column in a table.
The last of the advanced data types from
PostgreSQL I want to mention is the Network Address types, such as cidr, inet, and macaddr. Using these data types are preferable to storing network addresses as text because there are validation routines to ensure address correctness as well as some built-in functions and operators specific to networking.
Does PostgreSQL or MySQL Perform Better?
Database performance is always biased towards what the individual needs of the application. However, there are a few things to consider when choosing between PostgreSQL and MySQL.
PostgreSQL has much better query parallelism support than MySQL. Query parallelism was introduced in MySQL 8.0 but is only currently supported for clustered index scans without a WHERE clause. PostgreSQL on the other hand can use parallelism for sequential table scans, bitmap heap scans, and index-only scans. It also supports parallelism for joins, aggregate queries, and append operations.
Plan caching is a very important concept for relational databases as plan-reuse can help the overall performance of the system, but it is dependent on several variables.
MySQL does have the ability to cache query plans, but it’s very limited. Cached plans in MySQL cannot be reused across different sessions, so as soon as the session ends the plans in the plan cache are discarded. Cached plans are also only available for prepared statements or statements that are in a stored procedure, function, or trigger. Similarly, PostgreSQL also doesn’t have a shared query plan cache. Prepared statements in PostgreSQL can be cached, but the statements are evicted from the cache when the prepared statement is closed.
MySQL does have a very interesting feature called the ‘query cache’. This cache stores the text of a SELECT statement and the associated result dataset of that statement in cache. If the same statement comes in from a different session, the results can be sent without needing to parse and execute the statement again. This is great for those environments where data doesn’t change often, and the same query is executed often. However, this feature doesn’t work for partitioned tables and any data modifications to relevant entries in the cache invalidate the results.
One big drawback for
PostgreSQL is how expensive it can be to initiate new connections to the database engine. For those applications that have workloads with bursts of connections, the way PostgreSQL generates connections can really be a bottleneck. This is where it is critical to have a connection pool application, such as pgBouncer, ready to help with the workload. Establishing connections in MySQL doesn’t involve the same amount of overhead as PostgreSQL, so this is much less of a problem for MySQL applications.
One additional feature to keep in mind as it pertains to performance is the fact that
PostgreSQL supports materialized views. A materialized view is a view definition that is persisted in a table-like form. This can be a great performance enhancement for complex queries that are called often. MySQL does not have the materialized view feature.
Does PostgreSQL or MySQL Have Better Support?
Because
PostgreSQL is a fully open-source software project, getting support when you have an issue can sometimes be a challenge. However, there are several companies offering Professional Support Services, and they are listed on the PostgreSQL
website. Unfortunately, there isn’t a centralized support organization you can call when you have a problem, so you’ll need to pick one of the companies from the list that PostgreSQL.org provides or find a local consulting firm.
MySQL, on the other hand, is owned by Oracle, which means if you have purchased the Enterprise Edition of MySQL then you will have access to their technical support.
Which Database Engine Should You Choose?
This article provided a high-level overview of the differences between MySQL and PostgreSQL. As these are large software packages, it would be impossible to detail all the differences between the two in a single post. However, I did cover what I feel are the major differences between PostgreSQL and MySQL. I feel like PostgreSQL has a more robust engine with (slightly) better features, the main concern for me being that MySQL has very limited query parallelism support while PostgreSQL is much more advanced in that area.
Bottom line: While I do enjoy working with both database systems, I tend to tell clients to lean towards PostgreSQL for new development.
Crucial to using and maintaining any database is the ability to optimize its performance. SolarWinds database solutions are built to help you more easily manage database performance across database platforms wherever they’re located—from the cloud to on-premises and everywhere in between. You can learn more about SolarWinds
PostgreSQL performance tuning solutions here and
MySQL performance tuning here.