Postgres 17 Commit-orama | Scaling Postgres 312

Episode 312 | April 21, 2024 | 00:12:57

Hosted by Creston Jamison

Show Notes

In this episode of Scaling Postgres, we cover potential Postgres 17 features such as the new EXPLAIN SERIALIZE option, verbose COPY error logging, and pg_buffercache_evict, as well as many others.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/312-postgres-17-commit-orama/

 


Episode Transcript

[00:00:00] Many years ago when I was in school and had a deadline for some work that needed to be done, a lot of times I would cram at the last minute. Well, the Postgres 17 commitfest just concluded, and there are a lot of patches that were committed, essentially, at the last minute. So we'll talk about all the new goodies we might be getting this fall, but I hope you, your friends, family and coworkers continue to do well.

Our first piece of content is "Waiting for PostgreSQL 17: Invent SERIALIZE option for EXPLAIN". Now, normally when I do query optimization, I run EXPLAIN ANALYZE, or I might do EXPLAIN (ANALYZE, BUFFERS), but here there's a new option called SERIALIZE. It tries to account for the serialization step that must happen in preparing all the data to send across the wire to a client. Now, it doesn't include the time required to actually send data to the client; it's just serializing it up to the point of sending it across the wire. And this does things such as detoasting compressed data, so it can be a significant component of preparing to send a lot of data to a client. And I have to agree with depesz here, this will be huge. I could see myself using this significantly in my performance work. He shows an example here where he's just doing an EXPLAIN ANALYZE of a table. I don't know what the schema structure is, presumably there are some toastable columns in here, but running this query as is, the timing is 750 milliseconds. Then he added the new SERIALIZE option, so EXPLAIN (ANALYZE, SERIALIZE), and you'll see an additional line called serialization showing how much time is required: 19 and a half seconds, so the total time is almost 20 seconds now, more than 20 times slower, to do the required serialization step for getting all the data from this table. So this is huge and will be a great tool to have for doing performance work. Thanks to everyone who had a hand in putting this together, and I found the discussion on this patch particularly interesting as well, if you want to check out the link here.

Next piece of content: "Waiting for PostgreSQL 17: Add new COPY option LOG_VERBOSITY". You'll see a trend in a lot of these posts. If you remember previous episodes of Scaling Postgres, there was a COPY option originally called, I think, SAVE_ERROR_TO or something similar to that. My interpretation was, oh, we can save errors that happen during a COPY to a separate file and potentially fix them and reload them, but no, that's not what the option did. It simply ignored errors. So they changed the option to ON_ERROR, and the choices are basically to stop the copy or ignore the errors. And I was disappointed, but now I'm excited again because they added a verbose option to it. This verbose option emits a notice message that shows the discarded row as well as the line number, the column name, and some reasoning as to why it failed. So I think, again, this is huge. If you look at the example down here, he tries copying in something and gets notices for the errors: "skipping row due to data type incompatibility" on line two for column y, along with what the value is. So this is awesome, because now that we at least have this, you could save the notices to a separate file, parse that file, identify the rows in the original file you need to load again, and just write a program to extract those and reload them. So it's a few more steps, but it essentially gets you the same result. So that's great.
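
To make those two options concrete, here is a minimal sketch of what the syntax looks like; the table name and file path are placeholder assumptions, not taken from the posts themselves:

    -- Account for the serialization step (detoasting, output conversion) when timing a query.
    -- "documents" is a hypothetical table name.
    EXPLAIN (ANALYZE, SERIALIZE) SELECT * FROM documents;

    -- Load a file, ignoring bad rows, and emit a NOTICE for each skipped row
    -- showing the line number, column name and offending value.
    COPY documents FROM '/tmp/documents.csv'
      WITH (FORMAT csv, ON_ERROR ignore, LOG_VERBOSITY verbose);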

[00:03:50] Next piece of content: "PostgreSQL 17: pg_buffercache_evict". This is from dbi-services.com, and this is a new feature being added to pg_buffercache that allows you to evict elements from the cache. You can see here he's using the function for a particular buffer ID for a given relfilenode. Now, you might be thinking this would be great for performance work, because I know when I'm doing performance work I generally like to test a non-cached and a cached state, but this doesn't do anything for the operating system cache. That will still be in place, because Postgres of course has its buffer cache, but it also relies on operating system caching as well. So even if you evict something from the buffer cache, it may still be present in memory in the operating system cache. But what's great about this is that it does let us evict things from the buffer cache.

In the next post, "Waiting for Postgres 17: Benchmarking with pg_buffercache_evict" from pganalyze.com, Lukas, as part of his five minutes of Postgres, basically used this tool to evict things from the buffer cache, but then he also evicted things from the operating system cache as well. He used the fincore utility to identify whether a given file is cached or not, and then used the dd command to basically clear it. And what he found was that if you clear both the buffer cache and the operating system cache for a particular file area, you get cold cache performance; basically it has to go to the disk. But if that file's data was either in the buffer cache or in the operating system cache, he basically got warm cache performance, because it's in memory somewhere. So if you want to test a cold cache, you would have to clear both of these. An easier way to do this, if you have a large enough data set, is to just use different sets of data that haven't been queried. This would have to be on a quiet system, but this could definitely have its use cases. So I definitely encourage you to check out his video or his blog post here.
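
As a rough illustration of how you might use it, here is a sketch that evicts every shared buffer belonging to one table; the table name is hypothetical, and this assumes the pg_buffercache extension at version 1.5 or later, run as a superuser:

    -- Evict all of a table's pages from shared_buffers ("documents" is a placeholder name).
    CREATE EXTENSION IF NOT EXISTS pg_buffercache;

    SELECT bufferid, pg_buffercache_evict(bufferid)
    FROM pg_buffercache
    WHERE relfilenode = pg_relation_filenode('documents'::regclass);

As discussed above, the pages may still sit in the operating system page cache, so on its own this does not give you a true cold-cache test.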

Next piece of content: "PostgreSQL 17 features I am most waiting for". This is from hdombrovskaya.wordpress.com, and she's talking about a presentation she saw, I guess it was at SCaLE, by Magnus Hagander. I didn't know some of the features she was talking about here, so I actually tried to look up his talk. I did find his website, and he does have a post of the presentation he will be giving this April, I think, called "A look at the elephant's trunk - PostgreSQL 17", so you can definitely check that out on his site. Some of the things she specifically says she's looking forward to are the event trigger on login, which could definitely have some use cases, and error handling in COPY, and I talked about how much I'm looking forward to that.

[00:06:50] Next is "pg change password", a protocol-level ability to change passwords. Then there's redundant NOT NULL removal. Now, I believe this refers to planning time: if a column is defined as NOT NULL, a query's check that it is not null is basically a no-op. Next is self-join removal, so if you're joining to a table multiple times and it's redundant to do so, it can remove those joins. And then lastly, the temporal primary key. And I was like, hmm, what's that? Looking it up, I realized they're talking about temporal databases essentially, or temporal tables, where you're able to store the past, the present and the future of a data set all in one table. So a given row may be represented multiple times, but is effective only in a given time range. Here's an example of that: you have a test table with an id that is an integer, then a column called valid that is a timestamp range, and then you define a primary key on the id and that valid timestamp range, but you define it WITHOUT OVERLAPS.

[00:08:03] What that means is that if you have an id of one in this table, it can occur multiple times as long as the timestamp range never overlaps for id one. That allows you to store the entire history of this id-one row in the table, and you query it based upon the timestamp range to determine what data is active in a given time range. So that's definitely interesting.

Next piece of content: "Waiting for PostgreSQL 17: Add support for MERGE ... WHEN NOT MATCHED BY SOURCE". This is from depesz.com, and this is an enhancement for MERGE, which basically lets you merge data together and define what happens during a merge: what do you want to do when something is matched or not matched, or under different conditions? You can insert, update or delete data. So in this example here, he has some base data and says, I want to apply these incremental changes, which could be negative or positive, to the data set, basically merge them together, and with this command you get results like this. Well, there's a new clause called WHEN NOT MATCHED BY SOURCE, and you can define an action for it. So in this case, he deleted a row from one of the data sets. Now, when he merges it in, if it doesn't find a matching row in the source, it goes ahead and deletes that row from the base data as part of this MERGE command. That could definitely be beneficial for some data loading use cases.
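
Here is a minimal sketch of those last two features, the temporal primary key and the new MERGE clause; the table and column names are hypothetical and only meant to mirror what the posts describe:

    -- Temporal primary key: an id may repeat as long as its valid ranges never overlap.
    -- The scalar part of the key may require the btree_gist extension, since the
    -- constraint is backed by a GiST index.
    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE test (
      id    int,
      valid tsrange,
      PRIMARY KEY (id, valid WITHOUT OVERLAPS)
    );

    -- MERGE with the new WHEN NOT MATCHED BY SOURCE clause: target rows that no longer
    -- have a matching row in the incoming changes are deleted.
    -- Hypothetical tables: base(id, amount) and changes(id, delta).
    MERGE INTO base AS b
    USING changes AS c ON b.id = c.id
    WHEN MATCHED THEN
      UPDATE SET amount = b.amount + c.delta
    WHEN NOT MATCHED THEN
      INSERT (id, amount) VALUES (c.id, c.delta)
    WHEN NOT MATCHED BY SOURCE THEN
      DELETE;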

Next piece of content: "PostgreSQL 17 commitfest conclusion". This is from peter.eisentraut.org. And you know how at the top of the show I was talking about cramming to get stuff done? You can look at the number of committed patches, and in March there were 134, so that's pretty crazy. And overall you can see that we did exceed the number of patches that were in 17, up to 450. So check this blog post out if you want to look at more statistics.

[00:10:09] Next piece of content: there was another episode of Postgres FM last week. This one was on logical replication common issues, and Nikolai and Michael were joined by Sai, who is the CEO of PeerDB. So they were talking all about logical replication, and he mentions common issues here. From what I recall, the most common issue people complain about is the fact that you can't do DDL, so you can't make schema changes as part of logical replication; you need to handle those manually. The second biggest concern was probably sequences not being carried over, so again, those have to be maintained separately, but they listed a number of other issues as well. Another point of consensus, I think, was that being on the newest version of Postgres is the best way to have fewer issues with logical replication, because they keep adding new features and improving logical replication as new versions are released. So definitely, being on a more recent version will make your life easier with logical replication. But you can listen to the episode here or check out the YouTube video down here.

[00:11:13] Next piece of content: "GQL 2024 is out". This is from peter.eisentraut.org, and this isn't necessarily related to Postgres, but you know Postgres uses SQL; well, there's now a standard for graph databases called GQL. So I don't know if some of this will eventually get into Postgres, or carry over into Postgres in the future, but you can check out this blog post if you want to learn more.

[00:11:39] And the last set of content is related to both Supabase and Neon, because both of them, on the same day, April 15, 2024, which ironically is also the US tax filing day, announced their general availability. So again, Neon (neon.tech) and Supabase are both generally available now. As a consequence, they posted a lot of blog posts, so I'm not addressing these separately, but I am including a link to the posts so you can check out all the different announcements they made specifically on the 15th. What I'm particularly interested in reading, though I haven't gotten the opportunity yet, is the "OrioleDB joins Supabase" post, and Neon did the same thing: they have many blog posts in both their engineering and community sections, so feel free to check those out if you're interested.

[00:12:31] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links for all the content mentioned, as well as sign up to receive weekly notifications of each episode. You can also find an audio podcast version there, as well as a full transcript. Thanks, and I'll see you next week.
