Episode Transcript
[00:00:00] Speaker A: In this episode of Scaling Postgres, we talk about the Postgres 15 Beta 1 release, an optimized query breakdown, sort performance, and prepared statements versus partitioning. I'm Kristen Jameson, and this is Scaling Postgres, episode 216.
Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "PostgreSQL 15 Beta 1 Released." This is from Postgresql.org. They've released Beta 1 of Postgres 15, and this article talks about the different feature highlights from 15. Some of the main features from a developer perspective are support for the SQL-standard MERGE command as an alternative to INSERT ... ON CONFLICT. They talk about the updated JSON support to match what is available in the SQL standard, as well as some more regular expression functions. They go over some different performance enhancements, including areas that use more parallelization, as well as some enhancements to sorting, and then some enhancements to backup, archiving and compression, which I'm particularly interested in seeing, in terms of being able to support more than just gzip. It looks like they've extended support for LZ4 and Zstandard through a number of different Postgres utilities, as well as enhancements to logical replication, some administrative areas and, of course, security. So you can definitely review this, as well as the draft release notes that are available here.
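As a rough illustration of the new command, a MERGE statement in Postgres 15 looks something like the sketch below, with made-up table and column names. Each source row either updates its matching target row or gets inserted, which covers the common upsert case that INSERT ... ON CONFLICT handles today.

    MERGE INTO inventory AS i
    USING incoming_stock AS s ON i.item_id = s.item_id
    WHEN MATCHED THEN
        -- an existing inventory row gets its quantity bumped
        UPDATE SET quantity = i.quantity + s.quantity
    WHEN NOT MATCHED THEN
        -- a brand new item gets inserted
        INSERT (item_id, quantity) VALUES (s.item_id, s.quantity);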
The next piece of content is "Postgres Query Optimization: LEFT JOIN vs UNION ALL." This is from Crunchydata.com, and they're talking about a query that a customer gave to them, saying hey, we're trying to make this more optimized. It's running against two tables that have millions of rows in them, and it was taking about two to three seconds to run. They were looking at this and trying to figure out how to optimize it. It's using a coalesce function so that, in the case where there are no counts in Table B, they get a zero, but they also wanted to order by the count, and with how this was structured they really couldn't use an index to do it, because that value is dynamically created, and they said even an expression index really wouldn't handle this. So they put on their thinking caps to come up with another way to do it. What they basically did is break it down into two different steps. First, handle the case where rows exist in Table B: do a straight join to get the rows where there are counts in Table B, and this is the example of that query here. Relatively simple, and it can use an index on the count column, so it can do the ordering by the count pretty efficiently. Then, for the case where there are no rows in Table B, do what they call an anti join. You still join Table A to Table B, but you only keep the rows where there is no match in Table B. Essentially, you're just getting the set in A that does not exist in B, and that by definition has a zero count, so you just use a static zero. Each of these queries is independently super fast. Then you can put them together using a UNION ALL, or you could even just query them separately, it doesn't really matter, but this gives it all to you in one query. With this, and being able to utilize the indexes that were on the tables, it basically executes in a little over a millisecond, so basically 2,000 to 3,000 times faster. This is a great example of rethinking how you could run a query so it can be processed much more efficiently. So if you want to learn more.
[00:03:46] Speaker B: About that, definitely check out this blog post.
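To make the shape of that rewrite a little more concrete, here is a minimal sketch of the join plus anti-join approach, using hypothetical table and column names rather than the ones from the post. Each branch is a simple, index-friendly query on its own, and UNION ALL just concatenates the two result sets without any extra de-duplication work.

    -- branch 1: rows of table_a that do have a match in table_b,
    -- where the count column (and its index) can be used directly
    SELECT a.id, b.count
    FROM table_a a
    JOIN table_b b ON b.a_id = a.id

    UNION ALL

    -- branch 2: the anti join -- rows of table_a with no match in table_b,
    -- which by definition have a count of zero
    SELECT a.id, 0 AS count
    FROM table_a a
    LEFT JOIN table_b b ON b.a_id = a.id
    WHERE b.a_id IS NULL;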
[00:03:49] Speaker A: The next piece of content is "Speeding up sort performance in Postgres 15." This is from Citusdata.com, and they're talking about some enhancements that have been made to sort performance. They cover four areas of improvement that made it into Postgres 15. The first is improvements to sorting a single column, so that means you're just selecting one column and you want to sort by it. They were able to do this more efficiently and get about 25% better performance in the example they did here, sorting 10,000 integer values. The second area is basically reducing memory consumption by using generation memory contexts. When sorting within memory, the way that Postgres normally does it, it always rounds the memory allocation up, and that actually uses more memory than necessary. But when adjusting it to conserve memory, it also resulted in a speed boost. And of course, with this type of enhancement, the benefit depends on the size of the data, so the difference between 14 and 15 basically depends on how big the rows are. You can see how there's basically a step function with Postgres 14, whereas with 15 the performance is more linear. The chart going down here varies the number of table columns that have been added, and that gave a little bit more of a speed boost. The third area is adding specialized sort routines for common data types. It looks like this is mostly with regard to integers, to make the comparisons more efficient, and this gave a little bit more of a speed boost as well between 14 and 15, not much, but a little bit better; they're saying between 4% and 6% better for in-memory sorts. And the last area is replacing the polyphase merge algorithm with a k-way merge. Now, this was specifically targeted at larger scale sorts that exceed the size of work_mem significantly, and you can see, with a given work_mem size, if it's low, performance definitely improved for Postgres 15 relative to 14, and they said it's something like 43% faster for large sorts. So another set of improvements, but they do say they're not done yet; there's some more future work on sorting performance in Postgres that they'll be working on. But if you want to learn more.
[00:06:15] Speaker B: About these areas, definitely check out this blog post.
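If you want to see which sort strategy your own queries end up using, one quick way to check, sketched here against a hypothetical table, is to look at the Sort Method line in EXPLAIN ANALYZE output while varying work_mem:

    -- lower work_mem to make it easier to see the sort spill to disk
    SET work_mem = '4MB';

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM measurements ORDER BY reading;

    -- in the plan output, "Sort Method: quicksort  Memory: ..." means the sort fit in work_mem,
    -- while "Sort Method: external merge  Disk: ..." means it spilled and used the merge path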
[00:06:19] Speaker A: The next piece of content is "Postgres: on performance of prepared statements with partitioning." This is from Amitlan.com, and he's talking about how you can send a PREPARE statement to prepare a query and then send a separate EXECUTE to actually execute it. This basically allows Postgres to do the parsing and planning up front, before executing the query. The intent is that if, say, you want to send ten queries that are essentially identical, where maybe only the parameters change, and you prepare the statement first, then it can operate much more efficiently. He shows an example here using pgbench with the simple protocol, so it's not using prepare and then execute, it's just running the commands, and the average query latency is 0.058 milliseconds, with a TPS in this example of 17,000. When you switch to the prepared protocol, so it prepares and then executes the statements that are sent to the database, the latency drops to 0.031 milliseconds, which is about half, and as you would expect, the TPS goes up to 32,000, so it essentially doubles your transactions per second. So it is advantageous to use this. But the issue comes with partitioned tables, because at prepare time the database doesn't know what parameter values to expect, so it can't narrow down which partitions need to be hit. Partition pruning is not something that can be done at planning time in that case; it has to be done at runtime, and that makes things a bit slower. Plus, there were some inefficiencies he mentions with regard to doing runtime pruning on partitioned tables, and basically, the more partitions you have, the higher the latency of the queries. Alternatively, you can use what they call a custom plan, and what that basically means is that it has to do the full parsing and planning for each execution, so it doesn't really take advantage of prepare and then execute; for partitioned tables, that's basically what happens today. But he's been working on a patch where you actually can benefit from prepare and execute on partitioned tables, by making the runtime pruning more efficient, I believe, and you can see here an appreciable improvement in performance from the patched version compared to not using prepare and execute. This isn't going to be in Postgres 15, but potentially in Postgres 16 we could see this type of improvement for partitioned tables when you use prepare and execute. This is actually the post that was reviewed in "Five minutes of Postgres" episode 18, "Partition pruning, prepared statements and generic vs. custom query plans." So if you want to learn more about this, definitely check out this episode from Pganalyze.com.
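For reference, the prepare-then-execute pattern being measured looks roughly like this in plain SQL; the statement name and table here are hypothetical, and plan_cache_mode is the existing knob that controls whether Postgres sticks with a cached generic plan or re-plans each execution. In pgbench, the equivalent switch is the protocol option (simple versus prepared).

    -- parse and plan once, with a parameter placeholder
    PREPARE fetch_order (int) AS
        SELECT * FROM orders WHERE order_id = $1;

    -- each execution only supplies the parameter value
    EXECUTE fetch_order(42);
    EXECUTE fetch_order(43);

    -- force a fresh (custom) plan for every EXECUTE; for a partitioned orders table
    -- this allows plan-time pruning using the actual parameter value, at the cost of re-planning
    SET plan_cache_mode = force_custom_plan;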
The next piece of content is "Solving PostgreSQL's planner issues due to statistics missing chunks." This is from Ongres.com. They're talking about an issue where they basically had corruption on a TOAST table, so they got an error that says "missing chunk number 0 for toast value ... in pg_toast_2619." Looking it up, that was actually the TOAST table for pg_statistic, so the TOAST table for the statistics catalog was corrupt. Presumably something they list here had happened: a disk failure, a power loss, some sort of hardware failure, or maybe fsync was turned off. But basically they had this issue; how did they deal with it? Their solution was: well, this table just stores statistics, so maybe we can delete all of the data in it and then run ANALYZE to repopulate the statistics. That's exactly what they did, and it ended up working. Now, normally you would not want to do this to a system table, but for this specific table, it seems like this would be an acceptable way to do it. They also gave an example of where you could potentially just delete the rows from the statistics table that you know are corrupt, if you happen to know which particular table they belong to, and then just analyze that table to add the statistics back. So this is an interesting technique for addressing this type of table corruption. But if you want to learn more.
[00:10:33] Speaker B: Definitely check out this blog post.
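The targeted version of that cleanup, sketched here with a hypothetical table name, would look something like the following. It requires superuser privileges, and it's only reasonable because pg_statistic contains nothing that can't be rebuilt from the table data itself.

    -- drop the (possibly corrupt) statistics rows for just the affected table
    DELETE FROM pg_statistic
    WHERE starelid = 'public.my_table'::regclass;

    -- regenerate the statistics for that table
    ANALYZE public.my_table;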
[00:10:35] Speaker A: The next piece of content is "Unfinished Business with Postgres." This is from Craigkerstiens.com, and this is a very interesting post. It's relatively long, but it's talking about his experience working at Heroku over many years. He's now with Crunchy Data, but he talks about the impact that Heroku Postgres had on different utilities, as well as different enhancements that were made to Postgres over the years, and a little bit about hosting Postgres. So if you want a bit of insight into some history with regard to Postgres, it's definitely an interesting blog post to check out.
The next piece of content is "Adventures in Collating Text in Linux, Docker and Postgres." This is from PTC, and he's talking about an issue he was running into where text was being sorted differently in different environments. He checked all sorts of different things here, but ultimately it came down to the collation that was being used by the Postgres instance in each environment. One was set to a different collation, and it was their Docker container at GitLab that they wanted to fix. So he went through the process of setting the collation exactly the way he wanted it in the Docker container they're using, and he goes through the process of getting that set appropriately so that the data now sorts consistently. So if you want to learn more.
[00:12:01] Speaker B: You can check out this blog post.
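A couple of quick checks along these lines can show whether two environments actually agree on collation, and creating the database with an explicit collation (assuming the locale is installed in the container image) takes the sort order out of the hands of the image's defaults. This is just a sketch; the database name and locale are placeholders.

    -- see what collation and character classification each database was created with
    SELECT datname, datcollate, datctype FROM pg_database;

    -- create a database with an explicit collation; template0 is required
    -- when the collation differs from the template database's
    CREATE DATABASE app
        WITH TEMPLATE template0
             LC_COLLATE 'en_US.UTF-8'
             LC_CTYPE   'en_US.UTF-8';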
[00:12:04] Speaker A: The next piece of content is "My experience with exporting Postgres RDS partitioned tables to S3." This is from geoped85.github.io, and he's using the export feature for AWS RDS databases to send data to S3. While it was working for his goal of exporting tables to Parquet files, when it came to partitioned tables it was exporting both the child tables and the parent tables, and you would want one or the other, not both. He basically had a struggle with it. So if you're using RDS exports to send data to S3, you may be interested in checking out some of the issues he encountered when trying to do that.
The next piece of content is "Instant Heatmap with pg_featureserv." This is from Crunchydata.com. Now, pg_featureserv is a web service that gives you access to Postgres, and by extension PostGIS spatial data, for rendering geographic information system (GIS) information over the web, and in this example they're using it to build a heat map. So if you want to learn how they get it set up and working.
[00:13:11] Speaker B: You can definitely check out this blog post.
[00:13:15] Speaker A: And the last piece of content: the PostgreSQL Person of the Week is Haki Benita. So if you're interested in learning more about Haki and his contributions to Postgres, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.