Postgres Sort Instability? | Scaling Postgres 322

Episode 322 June 30, 2024 00:19:48
Scaling Postgres

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss an incremental sort instability issue with the Postgres planner, whether we should use foreign keys, how the visibility map works and how to vacuum the template0 database.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/322-postgres-sort-instability/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/

 


Episode Transcript

[00:00:00] When I'm doing performance work, I'll frequently try to rewrite queries in different ways to make them more performant. But over the years, as the Postgres optimizer has gotten better, I've found that works less often, because the optimizer has gotten better at interpreting whatever SQL you send it and finding the optimal path to execute it. Now, you can still rewrite things like NOT IN queries as a left outer join where you check that the joined table's value is null, because the optimizer can't find optimizations like that, at least yet. But the first piece of content we cover this week shows some really non-optimal behavior. But I hope you, your friends, family and coworkers continue to do well.

Our first piece of content is "PostgreSQL sort estimation instability". This is from danolevo.substack.com, and specifically he's talking about incremental sort estimation instability. As he mentions, incremental sort was added in 2020, in Postgres version 13. It basically reuses sorting already done by an earlier node of the query plan so a later node doesn't have to sort from scratch; it can pass that sort order through, or build upon it, given its incremental nature. And he says, quote, "it adds to the optimizer one more strategy to bypass massive sort operations by utilizing sort orders derived from index scans or switching to merge joins to have at least partly pre-sorted orders before some grouping or ordering operation." So normally this is a great optimization. However, he shows some pretty weird behavior here.

First he creates a two-column table, one column X and one column Y, inserts the value 1 into column Y for every row and 1 million unique integers into column X, creates indexes on both columns, and then does a vacuum analyze. He also disables parallel query processing. Then he has a query that joins the table to itself, checking where X on table one equals Y on table two and where Y on table one equals X on table two, and groups by table one's X and then table one's Y. And he gets an incremental sort based on a sort key of Y, which is not great, because that value is 1 for every row. And if you'll notice, it's also flipping the group keys: he wanted to group by X, then by Y, but the optimizer chooses to group by Y and then by X. So then he says, okay, let's just flip the conditions around the equals sign: instead of table one X equals table two Y, he writes table two Y equals table one X, and the same for the other AND condition. You would think it would come up with the same plan. However, the plan changes, and it's no longer using an incremental sort; it's back to a full sort, basically. So he said, okay, let's disable the sort. He sets enable_sort to off, runs it again, and he gets the incremental sort back. Now the execution time is less than half of what it was previously. And why? Because now it is grouping by X, which makes more sense, and then by Y, so the pre-sorting happens on X, which gives you the greater performance. Ideally this is the plan the optimizer would always pick for this type of query; changing which side of the equals sign a column sits on in your where clause, or turning sorts off, shouldn't result in this much of a plan difference. And he did narrow it down to the group-by reordering causing the slower performance.
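To make the setup easier to follow, here is a rough sketch of the kind of schema and query being described. This is reconstructed from the discussion rather than copied from his post, so the table and column names and exact values are illustrative:

-- Column x gets 1 million unique integers, column y is always 1
CREATE TABLE t (x integer, y integer);
INSERT INTO t (x, y) SELECT gs, 1 FROM generate_series(1, 1000000) AS gs;
CREATE INDEX ON t (x);
CREATE INDEX ON t (y);
VACUUM ANALYZE t;

-- Disable parallel query so plan shapes are easier to compare
SET max_parallel_workers_per_gather = 0;

-- Self-join with a two-column GROUP BY
EXPLAIN (ANALYZE)
SELECT t1.x, t1.y
FROM t AS t1 JOIN t AS t2 ON t1.x = t2.y AND t1.y = t2.x
GROUP BY t1.x, t1.y;

-- Rewriting the join conditions as t2.y = t1.x AND t2.x = t1.y,
-- or running SET enable_sort = off, changes the chosen plan,
-- which is the instability being discussed.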
So the planner choosing to group by Y and then by X, when Y is 1 for every row, is not going to help with the sorting operations. Even though the optimizer continues to get better with each version of Postgres, there are still issues like this, where group-by reordering, or getting a different plan depending on whether sort and incremental sort are enabled or simply on which side of the equals sign a column sits in your where clause, can apparently still happen. But hopefully things like this will be addressed in future versions of Postgres.

[00:04:14] And if you're wanting to learn more about how to scale Postgres, I have posted my training, which is approximately an hour, called Postgres Performance Demystified. Now, this is similar to the training I did at the end of 2023, so if you participated in Postgres Performance Demystified at that time, I don't think you're going to get too much more out of it. I did make some tweaks, but not significant ones. But this is free training that you can participate in, where I share the framework I've used to achieve thousand-fold improvements. I'll have the link in the description below where you can sign up.

[00:04:50] Next piece of content: "Should we use foreign keys?" This is the next episode of Postgres FM, and Nikolai and Michael discuss whether foreign keys should be used or not. They are of the same opinion I am: generally you should use foreign keys to maintain your data consistency, because without them you have to rely on your application to handle that type of consistency. And Nikolai made the point that sometimes you have multiple different applications talking to the same database, maybe a Ruby app and a Python app, and maybe they handle things differently or are coded differently, and you could end up with inconsistent data that way. It's always best to put constraints like foreign key constraints in the database itself. He also mentioned that foreign keys offer features like cascading deletes, where a child row can be deleted automatically when its parent row is deleted. Of course, foreign keys also come with disadvantages. Mainly there is a performance penalty, or as they put it, a tax for using them. You also need to be cautious when adding them, because these are DDL operations that could impact performance or potentially cause downtime if not done right. So you should do it in two phases: first create the foreign key constraint in a NOT VALID state, so it starts being enforced for all new data going forward, and then validate that constraint afterwards (sketched below). And of course, when doing any DDL, make sure you're using a lock_timeout setting. They also talked about whether you should or should not index your foreign keys, and there are definitely varying viewpoints on that. You're going to want an index there if you ever delete rows in the referenced table, but of course there's also the performance impact of having that index maintained. They also mentioned a bigger performance problem Nikolai was actually investigating, with multixact (multi-transaction) locks that can occur when adding a lot of child rows while other updates are happening at the same time. But ultimately, to answer the question, should you use them? Generally they recommend yes, you should. But if you're already at a really large scale or a large number of transactions, you do need to be more cautious, because there are some performance issues you may have to deal with.
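For reference, the two-phase approach they describe looks roughly like this. This is a generic sketch with illustrative table and constraint names, not code from the episode:

-- Phase 1: add the constraint without validating existing rows.
-- It is enforced for new writes immediately and avoids a long-held lock.
SET lock_timeout = '2s';
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Phase 2: validate existing rows later; this takes a weaker lock
-- and can run while normal traffic continues.
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_id_fkey;

-- Optionally index the referencing column so deletes on the parent
-- table don't have to scan the whole child table.
CREATE INDEX CONCURRENTLY ON orders (customer_id);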
But if you want to learn more, definitely listen to the episode or watch the YouTube video down here.

[00:07:10] Next piece of content: "Making the PostgreSQL visibility map visible". This is from cybertec-postgresql.com, and they're talking about the visibility map, which stores two bits for each page, or data block, of a table. One bit is the all-frozen flag, which anti-wraparound autovacuum uses; the other is the all-visible flag. Now, the reason this matters is that visibility information, meaning which rows are visible to a given database session depending on the activity going on, is stored in the table data, the heap. Indexes contain no visibility information at all. So Postgres maintains this compact structure, the visibility map, and when a query could use an index-only scan, it still needs to know whether the data is visible. It can check the visibility map to very quickly confirm whether there are any rows it needs to verify in a particular page, and only for those pages does it go to the heap for the exact visibility information.

[00:08:17] This blog post basically shows how some of that process works. He created a table with an integer primary key and an index, inserted a million random rows into it, then vacuumed and analyzed it, which builds the visibility map. He then does an explain analyze of a query on that table by ID, where the ID is less than 1,000, and you can see he gets an index-only scan with zero heap fetches. So it never had to go to the heap to check visibility information. Why? Because the visibility map was up to date and everything was all-visible; there were no dead rows in any page that would require a check. The next thing he did was update the row with ID 7 to a different value, then ran the same query as before, and now you can see 186 heap fetches. That's the number of rows it had to retrieve from the heap in order to satisfy the index-only scan. To look at the actual data file, he inspected the tuple identifier, or ctid, of all the rows. The first value is the page, or block; the second value is the tuple within it. As you can see, tuple 7 is missing from the first block because it was updated, and its new version now lives in block 5405. But remember, there were 186 heap fetches. Basically, it had to re-check all the rows in the first page: there were 185 matching rows in page zero, and it also had to check the visibility of the updated ID in its new page, because it was still less than 1,000. So 185 plus 1 gives you the 186 heap fetches. He then looked at the visibility map using the pg_visibility extension, and you can see the all-visible bit is false for the first block and also false for the last block, 5405. Now, he ran the select again, and suddenly the heap fetches were one less, 185. So he wondered what was going on, and then he remembered the Postgres feature of killed index tuples. Quote: "if an index scan detects that a table row is no longer visible to any transaction, it marks the index entry LP_DEAD, so that the next index scan knows it can ignore the entry." So that saved one heap fetch. But then he vacuumed his table, expecting the heap fetches to go to zero when he ran the same query again. But it didn't.
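Before getting to why, here is a rough sketch of the kind of commands described so far. The table name and values are illustrative, reconstructed from the discussion rather than taken from the post:

CREATE TABLE vm_test (id integer PRIMARY KEY, val double precision);
INSERT INTO vm_test SELECT gs, random() FROM generate_series(1, 1000000) AS gs;
VACUUM (ANALYZE) vm_test;  -- sets all-visible bits in the visibility map

-- Index-only scan; reports "Heap Fetches: 0" while every page is all-visible
EXPLAIN (ANALYZE) SELECT id FROM vm_test WHERE id < 1000;

-- Update one row; the affected pages lose their all-visible bit
UPDATE vm_test SET val = 0 WHERE id = 7;

-- ctid shows the (block, tuple) location of each row version
SELECT ctid, id FROM vm_test WHERE id < 1000;

-- Inspect the visibility map directly
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT blkno, all_visible, all_frozen
FROM pg_visibility_map('vm_test')
WHERE NOT all_visible;

-- Re-running the EXPLAIN (ANALYZE) above now reports heap fetches,
-- because rows on pages that are not all-visible must be checked in the heap.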
So why didn't the heap fetches drop to zero? Well, this is another Postgres feature, one that tries to bypass unnecessary index vacuuming: index vacuuming is skipped when 2% or less of the table's pages have one or more LP_DEAD items. Since this only affected one page, it was far below 2%, so the index vacuuming was skipped. But he notes you can force it by using INDEX_CLEANUP ON with your vacuum command, and when he did that and ran the query again, he got zero heap fetches. So this blog post goes into a lot of detail about how the visibility map works; if you want to learn more about that, definitely check it out.

Next piece of content: "Understanding VACUUM, vacuuming template0 and transaction ID wraparound". This is from andreas.scherbaum.la, and this blog post says, basically, imagine a situation where you have a very active, large database that's hogging all the autovacuum workers for the cluster, and maybe your postgres database, along with the template0 and template1 databases, which are the templates for creating new databases, aren't getting vacuumed to prevent transaction ID wraparound. He explains what transaction ID wraparound is, so I'm not going to explain it right now, but basically there are mechanisms that run autovacuum to prevent transaction ID wraparound from happening, and there are a couple of different settings. There's autovacuum_freeze_max_age, which defaults to 200 million; that kicks off a very aggressive vacuum to try to prevent transaction ID wraparound. There's also the vacuum_failsafe_age parameter, which is set at 1.6 billion transactions; that's even more aggressive. And then there are warnings that start being emitted to the log once you approach the 2 billion limit of transaction IDs. In the scenario where your postgres, template0 and template1 databases start to approach that 2 billion limit, the way you resolve it is to do a VACUUM FREEZE for each database. So you can connect to postgres, do the VACUUM FREEZE, and that resets its transaction ID age. Then you can connect to template1 and do the VACUUM FREEZE there, and now you can see the age is zero for both of these. But what about template0? When you try to connect to template0, connections are denied; you can't connect to it. So what you need to do is alter the template0 database to allow connections. Only then can you connect to it and do a VACUUM FREEZE on it. After you do that, make sure you set allow connections back to false once you're done. And now you can see that the transaction ID age has been reset. So if you want to learn more about this, definitely check out this blog post.
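The sequence he walks through looks roughly like this, shown here as a sketch using psql; the exact commands in the post may differ:

-- Check transaction ID age per database
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

-- Freeze the databases you can connect to directly
\c postgres
VACUUM FREEZE;
\c template1
VACUUM FREEZE;

-- template0 normally refuses connections, so allow them temporarily
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;
\c template0
VACUUM FREEZE;

-- Switch away and lock template0 back down
\c postgres
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS false;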
[00:13:35] Next piece of content: "Custom enum type columns and ORDER BY". This is from bugfactory.io, and this is another reason why I choose not to use enums; I just like using text columns with constraints set on them. In this scenario they have a marital status enum type with several different values, and he wanted to query a table and sort by the marital status. The problem is that it doesn't sort alphabetically by the label text; it sorts by the enum's declaration order, because each label has an internal sort position, and that's what it sorts by rather than the text of the value. So he said, okay, that's fine, I can just cast it to text, and that should work. And it does work: it now sorts by the text value of the enum. But the problem is, if you look at an explain plan, you can see that once you cast it to text, the expression no longer matches the plain index that was created on the column, so that index isn't used. And he says, okay, we can get around that by creating a functional index on the cast expression. He tries doing that, but then it says "ERROR: functions in index expression must be marked IMMUTABLE". So that didn't work either. Basically, he had to create a dedicated function that returns the text of the status, flag it as immutable, and then he was able to create the functional index using this new function and finally order by the enum using this approach (a rough sketch of the workaround appears at the end of this segment). And that's why I say this is another reason why I tend not to use enums. But if you want to learn more, definitely check out this blog post.

[00:15:07] Next piece of content: "Post mortem: Postgres out of shared memory error". This is from baidad.dev, and he said one of their platforms started getting "out of shared memory" errors. Of course, if you google this, the main suggestion is that you probably need to increase max_locks_per_transaction. The problem with that, of course, is that it requires a restart of the database. He went ahead and did that, and it seemed to resolve the problem, but he wanted to investigate why this was the case. So he looks into transaction isolation and talks a little bit about repeatable read, but ultimately, when he looked at what was causing the issue in terms of locks, he was seeing a lot of SIReadLocks: they had a long-running analytical query that only performed reads, and this long query was acquiring a lot of these SIReadLocks. Now, he talked about repeatable read isolation up here, but it's with the serializable isolation level that there is the option to declare the transaction read-only; I don't see that optimization applying to the isolation level he was mentioning up here, only to serializable. But if you're running serializable and a transaction only reads, you would want to set that, because it should reduce the number of SIReadLocks being held, giving you better performance. [00:16:32] So if you want to learn more about that, you can check out this blog post.

Next piece of content: "Mentoring program for code contributors". This is from rhaas.blogspot.com, and he announced, via the Postgres mailing list, a call for applications to join a new mentoring program for people learning to contribute and commit to Postgres. The blog post explains some of the detail, along with the announcement he sent to the mailing list. Basically, he has found nine committers who have volunteered to mentor one person each, so there are up to nine slots available for people who want to contribute more to Postgres and be mentored by one of the existing committers, which I think is a great way to grow the ecosystem. Now, he said the downside is that, since there are only nine slots, you probably have to have already contributed to Postgres in some way to get one of them. But he did say he heard from a good number of people who don't participate regularly on pgsql-hackers at all but who wanted some mentoring, and this program may not be a good fit for them, given it only has nine slots; the open question is what might be possible for people who don't have existing experience. But if this is something that sounds interesting to you, definitely check out these two pieces of content.
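Going back to the enum ordering issue above, the workaround described there looks roughly like this. This is a generic sketch with illustrative names, not the code from the post:

CREATE TYPE marital_status AS ENUM ('single', 'married', 'divorced', 'widowed');
CREATE TABLE people (id bigint PRIMARY KEY, status marital_status);
CREATE INDEX ON people (status);

-- ORDER BY status follows the enum's declaration order, and
-- ORDER BY status::text sorts alphabetically but can't use the plain index.
-- An expression index directly on (status::text) is rejected because the
-- cast isn't treated as immutable, so wrap it in an IMMUTABLE function:
CREATE FUNCTION marital_status_text(s marital_status)
RETURNS text
LANGUAGE sql IMMUTABLE
AS $$ SELECT s::text $$;

CREATE INDEX ON people (marital_status_text(status));

-- Now the ORDER BY can sort by the text label and use the new index
SELECT * FROM people ORDER BY marital_status_text(status);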
[00:17:58] Next piece of content: "Crunchy Data PostgreSQL 16 Security Technical Implementation Guide released by DISA", which is the United States Defense Information Systems Agency. This is from crunchydata.com; they release these Security Technical Implementation Guides periodically, and this is the one for Postgres 16, so if you're interested in that, you can check out this blog post.

[00:18:22] The next two pieces of content are proof of concepts for the new extension ecosystem that David Wheeler is working on. The first is "PoC: Distributing Trunk Binaries via OCI", OCI being the Open Container Initiative image standard, and he goes into all the detail about this particular proof of concept. There's also "PoC: PGXN Binary Distribution Format", another proof of concept for a binary distribution format, and you can find links to both in the show notes. The last piece of content is "When I grow up I want to be a database administrator, said no one ever". This is from karenjex.blogspot.com, and it's a presentation she gave at DjangoCon Europe 2024, a combination of the transcript and slides from her talk about her journey as a DBA: what that has meant, what the responsibilities are today, and where they may be heading. So if you're interested in that, you're definitely welcome to check out this blog post.

[00:19:23] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links for all the content mentioned, as well as sign up to receive weekly notifications of each episode. You can also find an audio version of the show as well as a full transcript. Thanks, and I'll see you next week.

Other Episodes

Episode 112

May 04, 2020 00:16:44

Explain Analyze, Sequence Counts, Best Books, Partition Migration | Scaling Postgres 112

In this episode of Scaling Postgres, we discuss how to use explain analyze, the problem with counting sequences, the best books to learn Postgres...


Episode 129

August 30, 2020 00:15:58

28 Tips, Lucky 13, Autovacuum Tuning, Logical Pitfalls | Scaling Postgres 129

In this episode of Scaling Postgres, we discuss 28 tips & tricks, Postgres 13 is lucky, autovacuum tuning and logical replication pitfalls. To get...


Episode 166

May 23, 2021 00:16:31

Postgres 14 Beta 1, PG 14 Improvements, Best Primary Keys, Composite Keys | Scaling Postgres 166

In this episode of Scaling Postgres, we discuss the release of Postgres 14 Beta 1, different PG 14 improvements, choosing the best type of...
