Postgres 18: The Deep Cuts | Scaling Postgres 386

Episode 386 | October 05, 2025 | 00:15:03

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss additional Postgres 18 features, some future features for Postgres, how to use update with limit and how recent Ubuntu OS patches cause a Postgres restart.

To get the show notes as well as get notified of new episodes, visit: 
https://www.scalingpostgres.com/episodes/386-postgres-18-the-deep-cuts/

Want to learn more about Postgres performance? Join my FREE training called Postgres Performance Demystified here: https://www.scalingpostgres.com/courses/postgres-performance-demystified/


Episode Transcript

[00:00:00] Well, we're going to talk more about Postgres 18 features today, but we're going to be focusing on the deep cuts. So we're not going to talk a lot about the headline features that were covered in my episode last week; this week we're going to be talking about some less discussed features. If you're not familiar with deep cuts, they basically refer to an artist's less popular songs, particularly with regard to music, but I'm calling these the deep cuts of Postgres 18. [00:00:31] Also, be sure to stay till the end, where I'm going to have my consulting corner and talk about some Postgres reboot or restarting issues. But I hope you, your friends, family and co-workers continue to do well. [00:00:44] Our first piece of content is "Pipelining in psql (PostgreSQL 18)". This is from postgresql.verite.pro, and we're talking about pipelining, where a client can send requests and not necessarily wait for the results of previous queries before it sends the next. So it can pipeline many requests in parallel. So it's a little bit like async I/O, but instead of the database server communicating with the disk, it's the application client sending multiple requests to the database server without waiting for the responses. [00:01:19] Now, pipelining has been available way back since version 7.4 of Postgres in 2003, where it was introduced in the extended query protocol. But it only became available in libpq, which is the library a lot of languages use to communicate with Postgres, as of Postgres 14. And since then, he says, some libpq-based drivers like psycopg3 have started to support it. But the significant thing is that psql now has a feature to use pipelining as of Postgres 18. Now this is interesting because this is a client capability, so you don't have to upgrade your server to 18; you can just upgrade the psql client and you can already start using pipeline commands with psql.
So they added the backslash commands \startpipeline and \endpipeline. So basically you can start a pipeline, send multiple, say, select queries, and then send an end pipeline. But if you need to force a synchronization at any point, you can send \syncpipeline and then \getresults to get the results back as of that point. [00:02:30] So he shows an example of this in action. This is a bash script, and he's sending in the start pipeline command, looping and sending individual insert-on-conflict statements, and at the end he's doing the end pipeline. So it's a relatively simple bash script. And he ran this with batch sizes of 100, 1,000, 5,000, 10,000, 50,000 and 100,000 rows. And he did it with and without pipelining to see how it would behave, so the script conditionally runs the start pipeline and end pipeline commands. And lastly, he checked across three different types of networks: he tested it locally on localhost, over a LAN, and over a WAN where the ping time was around 4 milliseconds. So if you look here on localhost, the chart with the really high queries per second, especially at the large batch sizes, is with pipelining, whereas with no pipelining it's pretty minimal. So the performance varies from 1.5 times faster for small batch sizes to 5x faster for large batch sizes of 100,000. And the improvement grows the worse the network performance gets. So looking at a LAN, this graph is just incredible how low the no-pipelining performance is. And when you enable pipelining, the small batch size gets 2.6 times faster and the large batch size is 42 times faster. So that's impressive, but not as impressive as the WAN, where at a 4 millisecond delay, the performance improvement is fivefold for small batch sizes and 71 times faster for large batch sizes. Now he says, yes, you could optimize this more by sending in a multi-row insert. So this is just an example of using pipelining.
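As a rough sketch of the flow he describes (assuming a Postgres 18 psql client; the table and columns here are hypothetical, not from his script):

```sql
-- Run inside psql 18; statements are queued in the pipeline rather than
-- waiting for each response before sending the next.
\startpipeline
INSERT INTO items (id, name) VALUES (1, 'a') ON CONFLICT (id) DO NOTHING;
INSERT INTO items (id, name) VALUES (2, 'b') ON CONFLICT (id) DO NOTHING;
\syncpipeline
\getresults
INSERT INTO items (id, name) VALUES (3, 'c') ON CONFLICT (id) DO NOTHING;
\endpipeline
```

The \syncpipeline and \getresults pair in the middle is optional; it just forces a synchronization point partway through, as described above.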
You may not necessarily send individual queries this way, but I was working on a backfill for a client and I wonder if I could have used pipelining in that scenario to get some better performance. [00:04:35] But if you want to learn more, definitely check out this blog post. [00:04:38] Next piece of content: "Exploring PostgreSQL 18: A Developer's Guide to New Features — The COPY command gets more user friendly". This is from databaserookies.wordpress.com, and as of Postgres 17 we got the ON_ERROR option, so you could log failed rows. But now Postgres 18 adds a REJECT_LIMIT, so you can define the number of errors that are allowed before the whole copy command gets canceled. So basically, if it has to reject, say, 5 or 10 or 100, whatever you determine, then it will cancel the whole copy process as opposed to letting it continue. [00:05:15] And you get a message like this: error, skipped more than REJECT_LIMIT rows due to data type incompatibility. And you can see the REJECT_LIMIT is set here when you're doing a copy, so that's a pretty cool addition. [00:05:30] Next piece of content: there was another episode of Postgres FM last week. This one was on Postgres 18. [00:05:36] Now they talked about all the major headline features of course, such as async I/O and skip scan, but they also mentioned the fact that checksums are now on by default, which is great. And they also mentioned that NOT NULL constraints can now be added in a NOT VALID state. And then they had a side discussion about AI and how there's more and more blog content or information being generated about Postgres that you unfortunately can't trust 100%. So they had a discussion about whether there's a way to deal with that. [00:06:07] But they talked about so many more topics. So if you want a more comprehensive overview of Postgres 18, you can definitely listen to their episode or watch the YouTube video down here.
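Going back to the COPY feature discussed above, a minimal sketch of the new option (the table name and file path are hypothetical):

```sql
-- Postgres 18: ignore rows that fail to parse, but abort the whole COPY
-- if more than 10 rows get rejected.
COPY my_table FROM '/tmp/data.csv'
WITH (FORMAT csv, ON_ERROR ignore, REJECT_LIMIT 10);
```

Note that REJECT_LIMIT only makes sense together with ON_ERROR ignore; with the default ON_ERROR stop, the first bad row already cancels the COPY.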
Next piece of content: "Going down the rabbit hole of Postgres 18 features". This is from xata.io, and this is a really long, comprehensive blog post, and I found a lot of additional deep cuts or features looking through it. Now of course they have the main headline features, but one we didn't discuss is the temporal DB improvements. [00:06:38] So that way you can create a primary key where a particular unique identifier, such as a room ID, can be duplicated, just not within a range of, say, a timestamp range that's on the table. So basically now this room can exist in multiple rows as long as it doesn't overlap with an existing time range. They also talked about how you can now create NOT VALID foreign key constraints on partitioned tables, so this is an addition I hadn't heard of. They talked about the introduction of a new protocol version, although this probably won't be used until all the different tooling can accommodate the new version. But there's just a slew of additions and improvements, and I would say this is probably the most comprehensive blog post of all the different features that have come in 18. So if you really want to understand everything that has changed, this is a great blog post to review. [00:07:31] Next piece of content: "Cumulative statistics in PostgreSQL 18". This is from databeam.io, and if you're looking for information about statistics changes and the monitoring subsystem in Postgres, this is the blog post you'd want to look at. It talked about enhancements to the pg_stat_io view, the fact that you can now export statistics and refresh statistics where they are missing, and different columns added to things like pg_stat_statements as well as pg_stat_subscription_stats. So there's a whole host of additional columns. If you want more information with regard to that, you can check this one out. [00:08:09] Next piece of content: "What's new in PostgreSQL 18? A developer's perspective".
This is from bytebase.com, and again, most of this we've heard before, but what I thought was interesting was pg_get_acl for permission inspection. So this could be great for resolving permission issues more easily. [00:08:33] Next piece of content: "PostgreSQL 18 and beyond: from AIO to direct I/O". This is from cybertec-postgresql.com, and this is where we're kind of looking into the future. [00:08:44] And they said it's great we got asynchronous I/O, but what about direct I/O? So this is basically bypassing the OS page caching layer and whatever limitations may exist in that interaction. And he said there is a feature you can enable on Postgres. It's a debug feature that uses direct I/O, but he says, please do not use this in production. It's not ready for production workloads, but it looks like something that someone has worked on as a demonstration. [00:09:17] So he set up a system and he ran a parallel sequential scan test across two SSDs, and he got up to 3.7 gigabytes per second from the two SSDs. And this is not with direct I/O; this is just stock Postgres 18, I believe. [00:09:35] And then he set debug_io_direct = data, restarted Postgres and ran the test again. And now he got 6 gigabytes per second, using less CPU as well. And he looked at perf between both runs, and in the one without direct I/O there was a lot more time spent in the kernel, as you can see here, whereas when using direct I/O the limitations are in Postgres itself. So this is pretty interesting and something that the future may hold eventually. But he does a final comparison here, and it's not quite twice as fast, I would say. But check this out if you want to learn more. Next piece of content: "Waiting for PostgreSQL 19: Add GROUP BY ALL". So this is a very simple improvement.
Where normally you would have a query such as this, where you have some sort of aggregate, like we're counting all the rows and grouping by two different columns, you would have to group by each of the columns listed here, whereas now you can just do GROUP BY ALL, and it grabs every column that's not being aggregated or related to a window function and places those columns in for you. So that's definitely a big convenience. Next piece of content: "How do you UPDATE ... LIMIT in PostgreSQL?". This is from cybertec-postgresql.com, and it seems naively easy: in a subquery, select one ID from what you want to update using a limit, and then update the table where the ID matches. But unfortunately, if you have concurrent operations going on, it is possible for this to hang. If you have a transaction that's trying to delete from that table as well, I believe it might be grabbing the same ID, but I wasn't clear if it was here. But if you actually allow the delete to go through, it actually reports updating zero rows. So because of this potential to hang, it's not a great way to do this. Another way he tried is using FOR UPDATE SKIP LOCKED, but then suddenly he got an update of 99 records because of where the limit was applied. So that's no good either. And the option that definitely worked is where he used a materialized CTE to select a single row, using FOR UPDATE SKIP LOCKED in it to grab that row, and then updated the table referencing that materialized CTE. So that definitely works. And you get even better performance if you do a FOR NO KEY UPDATE SKIP LOCKED, and this will avoid blocking concurrent insert statements on a table with a foreign key referencing the table. But check this out if you want to learn more. And the last piece of content: "How I learned to use pg_walinspect". This is from hdombrovskaya.wordpress.com. So she had an issue of a lot of WAL records being generated. So she said, all right, let's create the extension pg_walinspect.
She got a few of the log sequence numbers and got the WAL stats for them that are presented here, and 97% of the WAL was information about locks. And she found the table that had all of these locks and looked at the specific statements that were causing it. And basically, a slow update was due to a slow select query that was happening. So that was the cause of her WAL growth. And you can check that out if you want to learn more. [00:12:55] And now for my consulting corner. I've done a few upgrades for some clients, and what I've noticed recently is most of them are using Ubuntu, and they were using, say, 22.04 and we upgraded to 24.04, as well as upgraded to, I think it's usually Postgres 17 that we've been upgrading to. There may have been some others that upgraded to 16 or 15, I can't remember. But what I've noticed on Postgres 17 and 24.04, and I'm not quite sure which one it is, is that there's a new behavior that didn't exist in 22.04, or maybe earlier versions of Postgres, I'm not sure. But that new behavior is that Postgres now restarts if a dependent library gets updated. So if a libc library gets updated or an OpenSSL library gets updated, Postgres restarts. So it's not the actual Postgres binary that's being updated, but Postgres restarts because a library it is linked to or dependent upon is updated. And I can tell this because sometimes we have two different systems, one on 24.04 with Postgres 17, another on 22.04 with Postgres 14 or 15 or 16 or something, and it's the Postgres 17 on 24.04 that is restarting. So I'm just wondering, have you experienced this upgrading to this OS version or Postgres 17, and how have you dealt with it? One way it could be dealt with is turning off all automatic updates and moving to a manual update of the OS. I suppose that's what most Postgres providers do. For example, RDS: they say you need to do an OS update, and they indicate that in their console. So I think that's the route we're mostly going, but I wanted to know how you handle this.
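As a sketch of that manual-update route, one common way on Ubuntu is to turn off unattended upgrades via the APT periodic settings (check your distribution's documentation, since the exact file and mechanism can vary by setup):

```
# /etc/apt/apt.conf.d/20auto-upgrades
# Still refresh package lists daily, but don't auto-install upgrades,
# so library updates (libc, OpenSSL, etc.) happen in a planned window
# instead of restarting Postgres unexpectedly.
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "0";
```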
[00:14:44] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com, where you can find links to all the content discussed, as well as sign up to receive weekly notifications of each episode. There you can also find an audio version of the show, as well as a full transcript. Thanks, and I'll see you next week.
