Episode Transcript
[00:00:00] Frequently when I'm doing a performance optimization for clients, I focus so much on the execution time and don't spend a whole lot of time on the planning stage, because usually when there is a problem, it's an execution problem. But the reality is that for some smaller queries, the planning time can take a lot longer than the execution time, particularly if you have a lot of data and are collecting more statistics to try to get more accurate plans, or you're using partitioned tables. In those cases, the planning time frequently exceeds the execution time for simple queries, so it does matter in the aggregate. So when there's a change that gives a boost to planning performance, that can be beneficial, particularly for larger databases. Also, be sure to stay till the end where, in my consulting corner, I'm going to talk a little bit about pg_partman and whether you should build partitions yourself. But I hope you, your family, friends and co-workers continue to do well. Our first piece of content is Speed up join planning up to 16 times faster. This is from thatguyfromdelhi.com, and he says that for over 20 years Postgres used a simple O(n²) loop to compare most common values during join estimation. He says this worked fine when the statistics target was left at its default of 100. But like I was saying in the intro, with larger data sizes and needing to collect more statistics to be able to produce accurate plans, people are cranking those defaults up. I think I may have set some as high as 4,000 or somewhere in that range. Not to say that's what you should do, but there's a case where I did do that. But this particular 20-year-old behavior could be changing in Postgres 19, because the problem is quadratic: when you're using, say, a nested loop to do this comparison and you have 100 most common values for a column, it basically has to do 10,000 comparisons, whereas with a statistics target of 10,000, which means 10,000 most common values, you have to do 100 million comparisons.
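As a quick aside on the knobs involved here, this is a minimal sketch, assuming a hypothetical orders table and customer_id join column, of how you can raise the statistics target for a single column and then check how many most common values actually get collected:

-- Raise the per-column statistics target (hypothetical table and column names)
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;

-- See how many most common values were gathered for that column
SELECT attname,
       array_length(most_common_vals::text::text[], 1) AS mcv_count
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'customer_id';

You can also raise default_statistics_target globally, but per-column settings like this keep the extra planning work limited to the columns that need it.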
[00:02:19] So this is the solution they came up with. They check the total number of most common values, and if it's greater than 200, basically 100 on each side, it builds a hash table. If it's less than 200, it goes ahead and does the nested loop, because that's just faster than taking the time to build a hash table. So they build the hash table for the smaller side and then probe it from the larger side, essentially. So it goes from O(n²) to roughly O(n) when using the hash table. And he went ahead and set up a benchmark here to compare the performance with different default statistics targets between Postgres 18 and 19. You can see on the bottom here that scenario C is where it's just the default statistics target of 100, and there's no real performance difference. Actually, Postgres 19 is slightly slower; I don't know if that's just statistical variation.
[00:03:13] But in scenario B, where he set the statistics target to 1,000, Postgres 19 is 40% faster than Postgres 18.
[00:03:21] And when he cranked it all the way up to a 10,000 statistics target, Postgres 19 was 16 times faster than Postgres 18, again with this patch applied to 19. That's an impressive speedup. Then he looked at the quadratic curve in action, plotting Postgres 18 versus 19 for different most common value counts. You can see that around 1,000 most common values, Postgres 18's planning time starts going quadratic, whereas Postgres 19 remains relatively stable. So this is definitely an awesome patch if it can make it into Postgres 19. And again, it's unfortunate we have to wait a year for some of these improvements I'm covering, but it's still great that people are working on this. So thanks to everyone involved.
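If you want to check whether planning time is dominating your own simple queries, EXPLAIN ANALYZE reports both numbers; this is just a generic sketch with placeholder table names:

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '1 day';
-- The output ends with "Planning Time: ... ms" and "Execution Time: ... ms",
-- which is the comparison being discussed here.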
[00:04:07] Next piece of content: What is better, a lookup table or an enum type? This is from cybertec-postgresql.com, and he also considers a third option, basically a string with a check constraint on it. So one solution is that you just create a text column and put a check constraint on it for all the different values that are possible. Now, the first disadvantage he mentions is that it actually makes the table larger, because you have potentially long state names or country names or whatever status you have stored in the table itself, which expands its size. The next issue is that if you actually change the name of a state, you have to update every row referencing that state, so it's really hard to rename a value since you have to touch every row where it exists. And indexes on a text string aren't as efficient as they would be on a numeric type like an integer. The other option he mentions is creating an enum. Here you create a type, give it a name, declare it as an enum, and list all your values, and then you reference your named enum when you create the column in your table. Now, the size of an enum value, he says, is 4 bytes, so that's definitely smaller than a longer text string. The indexes are marginally smaller, and it's super easy to rename a value of an enum: you just use an ALTER TYPE command, and it's easy to add states as well. The disadvantage of enums is that you can't drop individual values once they exist. The alternative he came up with is to rename the value to indicate it's been deleted, so you have some sort of deleted marker. The other option is to just use a lookup table. In this case you would use some type of integer sized based on how many values you have; he used a smallint here since there are not many states in his example, and you have a separate table that the row references, which holds the integer and, of course, the name. The advantage of this is that the main table stays small and the indexes should be relatively small. The disadvantage is that you have to do a join for every query where you want the actual state name, and sometimes this can result in increased query costs and also in inaccurate planning estimates. But he says one way to get around it is to query using the ID as opposed to the actual state name. I've actually done something similar to this in my own application, where for larger lookup tables like countries or sets of states and provinces, I cache the IDs in a constant on application startup, so every query can just use the constant that's in memory and I don't need to do a join every time. But if you want to learn more, definitely check out this blog post.
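To make the comparison concrete, here is a minimal sketch of the three approaches using made-up order status values rather than his states example:

-- Option 1: text column with a check constraint
CREATE TABLE orders_check (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status text NOT NULL CHECK (status IN ('pending', 'shipped', 'delivered'))
);

-- Option 2: enum type
CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'delivered');
CREATE TABLE orders_enum (
    id     bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status order_status NOT NULL
);
-- Renaming a value is easy; removing one is not supported
ALTER TYPE order_status RENAME VALUE 'shipped' TO 'in_transit';

-- Option 3: lookup table referenced by a small integer
CREATE TABLE statuses (
    id   smallint PRIMARY KEY,
    name text NOT NULL UNIQUE
);
CREATE TABLE orders_lookup (
    id        bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    status_id smallint NOT NULL REFERENCES statuses (id)
);

The lookup table version is the only one that needs a join to show the human-readable name, which is the trade-off he discusses.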
[00:06:51] Next piece of content: CNPG recipe 23, managing extensions with image volumes in CloudNativePG. This is from gabrielebartolini.it. So with the release of Postgres 18 and the addition of the extension_control_path configuration variable, as well as Kubernetes' image volume feature, you can now add extensions to your CloudNativePG instance by simply attaching a new image.
[00:07:20] And right now there are a pgvector image and a PostGIS image that have been released by the Postgres Extensions Containers project. So basically, if you want to install either of these extensions, you don't have to have a huge Postgres image. You can have a very bare-bones Postgres image and then add these additional images as extensions to give Postgres these extra capabilities.
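For a sense of what that looks like from inside the database once CloudNativePG has mounted the extension image, the SQL side is the usual command; the only new moving part is the Postgres 18 setting the operator configures for you:

-- New in Postgres 18; CloudNativePG points this at the mounted extension image
SHOW extension_control_path;

-- The extension itself is then installed the normal way
CREATE EXTENSION IF NOT EXISTS vector;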
[00:07:45] So this is super interesting. If you use CloudNativePG, maybe you want to check this out.
[00:07:50] Next piece of content: TDE is now available for PostgreSQL 18. This is from the Percona Community blog. TDE is transparent data encryption, and they were waiting until everything was ready, but it is finally ready now. Percona's patched distribution of Postgres 18.1 now supports TDE, as well as, of course, the asynchronous I/O that comes with Postgres 18, and here's a link to the extension. Again, you can't use this with community Postgres yet; basically you have to use the patched version of the Percona distribution. They've added some additional key management services with it as well, so if you're interested in that, you can check out this blog post.

Next piece of content: Using JSON: json vs. jsonb, pglz vs. lz4, key optimization, parsing speed. This is from depesz.com, and he went through an investigation of jsonb and different ways to use compression, how efficiently you can pack things, and how fast you can get access to that data.
[00:08:59] So this is quite a long post, but I'll just highlight some of the things he mentions. Firstly, pglz, which is Postgres's default compression and decompression algorithm, compresses things slightly better than LZ4, but it's not significant at all, maybe less than 5% in all of these cases, though it depends on the data. But in terms of speed, LZ4 is definitely faster; it seems to be about a third faster at compression compared to pglz.
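If you want to try this on your own data, column compression can be switched per column, assuming your Postgres build has LZ4 support; the table and column names here are placeholders:

-- Compression used for newly written TOASTed values by default
SHOW default_toast_compression;

-- Switch a specific jsonb column to lz4 (only affects values written afterwards)
ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;

-- Check which algorithm compressed a particular stored value
SELECT pg_column_compression(payload) FROM events LIMIT 5;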
[00:09:33] Then he looked at storing data with json versus jsonb, and looking at this, it looks like the columns might be labeled a little incorrectly up here, because the results look exactly identical. But looking down here, there is a pretty obvious size comparison: json does appear to be slightly smaller than jsonb. But again, where jsonb wins out is on its functionality, what you can do with it; you can apply indexes to it to be able to efficiently pull out data, so that definitely has an advantage. Then he looked at the speed of parsing different values, comparing text, json, and jsonb at different levels of compression, and basically text with LZ4 was the fastest. So if you're going to use json, it's going to be a little bit slower to write that data, and jsonb is even slower because it has to do processing into the binary format that Postgres uses. But pglz is much slower compared to LZ4 in all of the cases here. Then he took a look at the read times. They were a little bit closer, but text was still the fastest, followed by json, followed by jsonb. So then, given that text is faster, should you be storing text and casting it to and from jsonb? He cast text to json or jsonb, and that seemed to be a lot slower than just using the native json or jsonb data types.
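As a rough way to reproduce the size comparison yourself, something like this works, with a made-up table and document:

CREATE TABLE doc_compare (
    raw text,
    j   json,
    jb  jsonb
);

INSERT INTO doc_compare (raw, j, jb)
SELECT doc, doc::json, doc::jsonb
FROM (VALUES ('{"user": {"id": 1, "tags": ["a", "b"]}}')) AS v(doc);

-- Compare the stored size of the same document in each representation
SELECT pg_column_size(raw) AS text_size,
       pg_column_size(j)   AS json_size,
       pg_column_size(jb)  AS jsonb_size
FROM doc_compare;

-- The jsonb advantage: efficient element access (and GIN indexing)
SELECT jb -> 'user' ->> 'id' FROM doc_compare;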
[00:10:58] But if you want to learn more, definitely check out this blog post. Next piece of content: Understanding Snowflake sequences in a Postgres extension. This is from pgedge.com, and they've released a snowflake sequence generator as an extension on GitHub, because their product, pgEdge, is of course a multi-master system.
[00:11:19] Sequences are generated per instance, so when you're talking about a multi-master cluster, having each node generate its own sequences won't work. So basically they came up with a snowflake sequence generator. A snowflake sequence value is made up of a 41-bit unsigned value that represents a millisecond-precision timestamp with an epoch of 2023-01-01. Then you have a 12-bit counter, so you can generate around 4,000 IDs per millisecond, or about 4 million IDs per second, and then 10 bits that identify the cluster node it's coming from. This unique node ID is set as a configuration variable. So this gives you unique keys across your cluster of systems, and they talk a little bit about how they use it with pgEdge. So check this out if you're interested.
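To make that bit layout concrete, here is a hedged sketch of how such an ID could be assembled in plain SQL, following the widths described above; the exact bit ordering pgEdge uses may differ, and the node ID and counter values are hard-coded placeholders:

-- 41 bits of milliseconds since 2023-01-01, a 12-bit counter, and a 10-bit node ID
SELECT (
    (floor(extract(epoch FROM (now() - timestamp '2023-01-01')) * 1000)::bigint << 22)
    | (42::bigint << 10)  -- per-millisecond counter (placeholder value 42)
    | 7                   -- node ID (placeholder value 7)
) AS snowflake_id;

Because the timestamp occupies the high bits, these IDs sort roughly by creation time, which is part of the appeal over random identifiers.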
[00:12:11] Next piece of content: there's been a release of PgBouncer 1.25.1, with the title Fixing a bunch of bugs before Christmas. This is from postgresql.org. This release does address CVE-2025-12819, where it's possible for an unauthenticated attacker to execute arbitrary SQL during authentication by providing a malicious search_path parameter. But you have to have several different things enabled to be vulnerable: things like setting track_extra_parameters to a certain value, auth_user to a certain value, and auth_query. So basically all of these need to be set for it to be vulnerable. But they also included some other bug fixes as well, so check out the changelog if you're interested.
[00:12:57] And now it's time for my consulting corner, where I'm going to talk a little bit about partitioned tables. So probably the most popular tool for managing your partitions is pg_partman, and I've mentioned it on the show before. I'm not a huge fan of pg_partman because I think it's relatively easy to create and remove your own partitions with just a little bit of code. But as I've used pg_partman a little more, I've found a few other things that are limitations in my eyes. The first thing isn't necessarily a limitation of pg_partman itself, but it does like to create default partitions by default, so a lot of times when someone starts using pg_partman, it creates default partitions for everything. The problem with that is that you then cannot detach a partition concurrently, which is the safest way to remove a partition from a partition set. So generally I don't like using default partitions for that reason, and also because I think it just hides issues with your partitioning scheme; I would much rather receive an error if I don't have the proper partitions set up. The next issue I just discovered is that it doesn't do hash partitioning, at least from my interpretation of the docs, and I've been using hash partitioning since it was released in Postgres. The other interesting thing is that its list partitioning only uses numbers or integers, so you can't use a text string for list partitioning, which I find a bit odd because it makes looking at your partition scheme a little harder if the values are all numbers. I think I'm accurate in that assessment from when I was examining it. And lastly, the other thing I've noticed as I've worked with pg_partman across different projects is that its API, how you interact with it, has changed quite a bit across versions, sometimes requiring you to change how your partitions are set up; I think there was a big change from version 4 to 5, if I'm correct. So there's a lot of change that seems to be happening. Whereas with the Postgres API for creating, detaching, and deleting partitions, I wrote some code to set up my partitions way back in Postgres 10 after declarative partitioning was introduced, and I literally haven't changed that code since then. It continues to work to this day. Now, my intention is not to get you to avoid pg_partman; use whatever works best for you. But I wanted to mention that it's not rocket science to write a script that creates your partitions on a regular basis and then detaches or deletes those partitions on a regular basis, and then put whatever controls you need in place to be able to audit what you've written, as in the sketch below. But if you disagree with me, feel free to let me know in the comments.
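For what it's worth, here is the kind of minimal, hand-rolled partition maintenance I'm talking about; the table names, ranges, and retention choices are purely illustrative, and in practice you would run the create and detach statements from a scheduled job:

-- A range-partitioned parent with no default partition
CREATE TABLE events (
    id         bigint NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- Create upcoming partitions ahead of time
CREATE TABLE events_2025_01 PARTITION OF events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Retire an old partition (assuming events_2024_01 exists) with minimal locking,
-- then archive or drop it
ALTER TABLE events DETACH PARTITION events_2024_01 CONCURRENTLY;
DROP TABLE events_2024_01;

-- Hash partitioning and text-based list partitioning are both available natively
CREATE TABLE users (
    id     bigint NOT NULL,
    region text NOT NULL
) PARTITION BY HASH (id);
CREATE TABLE users_p0 PARTITION OF users FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE shipments (
    id      bigint NOT NULL,
    carrier text NOT NULL
) PARTITION BY LIST (carrier);
CREATE TABLE shipments_ups PARTITION OF shipments FOR VALUES IN ('ups');

Note that DETACH PARTITION ... CONCURRENTLY cannot be used when a default partition exists, which is exactly why I avoid default partitions.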
[00:15:53] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content mentioned, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show as well as a full transcript. Thanks, and I'll see you next week.