PostgreSQL 10 Features: Parallel Queries
This is the second story in the series “PostgreSQL 10 features”. If you missed the first story, “Hash indexes”, you can read it first.
We are approaching the final release of PostgreSQL 10, so now I will talk about “Parallel Queries”. Like the hash indexes in the previous post, parallel query is not an entirely new feature, but PostgreSQL 10 brings improved query parallelism.
PostgreSQL 9.6 introduced initial support for parallel execution of large queries, but it was not enabled by default and only covered parallel execution of sequential scans, joins and aggregates.
In PostgreSQL 10 we will have:
- Support parallel B-tree index scans.
- Support parallel bitmap heap scans.
- Allow merge joins to be performed in parallel.
- Allow non-correlated subqueries to be run in parallel.
- Increase parallel query usage in procedural language functions.
All of this translates into substantial performance improvements, especially in scalability on multi-CPU-socket servers.
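One way to see whether the planner picks one of these new parallel paths is to look at the EXPLAIN output. The following is a minimal sketch, not a benchmark: the table `parallel_demo`, its index and the row count are hypothetical names and values chosen only for illustration. Whether a parallel plan is chosen depends on your settings, table size and cost estimates.

```sql
-- Hypothetical demo table; the names and sizes are illustrative assumptions.
CREATE TABLE parallel_demo AS
SELECT g AS id, md5(g::text) AS payload
FROM generate_series(1, 5000000) AS g;

CREATE INDEX parallel_demo_id_idx ON parallel_demo (id);

-- Refresh planner statistics so the cost estimates are meaningful.
ANALYZE parallel_demo;

-- Inspect the plan. In PostgreSQL 10, a parallel plan shows a Gather
-- (or Gather Merge) node above nodes such as "Parallel Index Scan",
-- "Parallel Index Only Scan" or "Parallel Bitmap Heap Scan".
-- The planner may still prefer a non-parallel plan if it estimates
-- that one to be cheaper.
EXPLAIN
SELECT count(*)
FROM parallel_demo
WHERE id BETWEEN 1000 AND 4000000;
```

If you only see a plain "Index Scan" or "Seq Scan" without a Gather node, the planner decided against parallelism for this query.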
Yeah! Our beloved database gets better and better with each release. We need to thank Robert Haas and the other PostgreSQL hackers for their hard work. It has been a long-awaited feature, but we are finally there.
Some people coming from other RDBMSs might say: “WHAT!!! Commercial database X has supported parallel queries for many years.” Yes, it’s true, but those commercial databases invest heavily in development and their licenses cost lots of money. With PostgreSQL you get a pretty advanced feature set, comparable to and sometimes better than its commercial counterparts, at zero cost. So I invite you to make a donation to the project to support the hackers who make this possible.
I will not make this post long, since Robert Haas himself wrote a nice post about these improvements.
Remember to tune this properly for your environment. By default max_parallel_workers_per_gather is set to 2. On servers with many CPU cores or sockets you might want to increase this value, but be aware that parallel queries may consume substantially more resources than non-parallel queries.
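As a rough sketch of that tuning, the statements below check the current value and raise it. The value 4 is only an illustrative assumption, not a recommendation; pick a number that fits your core count and workload.

```sql
-- Check the current limits. In PostgreSQL 10, max_parallel_workers_per_gather
-- is bounded by max_parallel_workers, which in turn is bounded by
-- max_worker_processes.
SHOW max_parallel_workers_per_gather;
SHOW max_parallel_workers;
SHOW max_worker_processes;

-- Try a higher value for the current session only (4 is an example value).
SET max_parallel_workers_per_gather = 4;

-- To persist the change for the whole cluster, write it to
-- postgresql.auto.conf and reload the configuration.
ALTER SYSTEM SET max_parallel_workers_per_gather = 4;
SELECT pg_reload_conf();
```

Trying the setting per session first lets you compare plans and timings before changing the server-wide default.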