New Postgres Releases, A Hairy Incident, Slow Down To Go Faster, Loadable Module Archiving | Scaling Postgres 202

Episode 202 February 14, 2022 00:12:49

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the new releases of Postgres, a hairy upgrade incident, why slowing down can make you go faster and using loadable modules for WAL archiving.

To get the show notes as well as get notified of new episodes, visit:

https://www.scalingpostgres.com/episodes/202-new-postgres-releases-hairy-incident-slow-down-to-go-faster-loadable-module-archiving/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about new Postgres releases, a hairy incident, slowing down to go faster, and loadable module archiving. I'm Creston Jamison and this is Scaling Postgres episode 202.

[00:00:24] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "PostgreSQL 14.2, 13.6, 12.10, 11.15 and 10.20 Released". These are mostly bug fixes and improvements; there are no security-related issues mentioned, so that's good. Although what I'm noticing, and it seems to be a trend, is that index-related items keep appearing on the list of bug fixes. The top two items: one is a low-probability scenario of index corruption, and the other is that when reindexing concurrently, there's a possibility for TOAST tables to be corrupted. So basically the recommendation is to go in and reindex the indexes in your database again. But if you want to learn more about the bug fixes and improvements, you can go ahead and check out this post. And this is from Postgresql.org.

The next piece of content, "A Hairy PostgreSQL Incident". This is from Ardentperf.com, and this is basically a story walking through a database issue that the author encountered. It was at the end of the day, and another part of his organization had done a PostgreSQL upgrade to version 11, and suddenly they started having performance problems. He mentions in the post that sometimes you do get regressions after an upgrade, but this was causing significant problems and they had to figure out what was going on. The blog post goes into detail about what they suspected was happening, what the queries looked like, and then walks through the process of actually discovering the issue and how to rectify it. What was convenient is that they had another database system on the previous version that was operating fine, so they were able to look at what a good query plan looked like and then, ultimately, what the problem was. You can tell they were looking at flame graphs and at what processes were running, but ultimately they discovered that a BitmapAnd operation the planner was using was causing essentially 100% CPU usage. Now, from what they could tell, it wasn't a statistics issue: the statistics were relatively up to date, and things had recently been vacuumed and analyzed. So they basically couldn't think of any other way to resolve it than to rewrite the query. What they used is a CTE that materialized a subquery and then used it in the main query, and that gave them the result they were looking for along with good performance. So this blog post is a great insight into a series of problem-solving steps to find an issue and come up with a solution. Now, it may not be super elegant, and he calls it a total hack, a beautiful hack, but the reality is that if the planner is not doing what you think it should do, the statistics are up to date, and it doesn't help to add extended statistics, this is probably the thing you have to do. But if you want to learn more about that, definitely check out this blog post. [00:03:29]
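As a rough illustration of that kind of workaround (the table and column names here are made up, not taken from the incident), a materialized CTE tells the planner to compute the subquery on its own and then join against the result:

-- On Postgres 12 and later the MATERIALIZED keyword forces the CTE to act
-- as an optimization fence and be computed once; on 11 and earlier a plain
-- CTE already behaves this way.
WITH recent_orders AS MATERIALIZED (
    SELECT order_id, customer_id
    FROM orders
    WHERE created_at > now() - interval '1 day'
)
SELECT c.name, r.order_id
FROM recent_orders r
JOIN customers c ON c.id = r.customer_id;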
The next post, "Slow Things Down to Make Them Go Faster", from FOSDEM 2022. This is from Vyruss.org, and it's actually a YouTube video of a presentation by Jimmy Angelakos. It's an updated version of a presentation that I think was given at another conference, probably in December or November, but it's such a good presentation that I definitely wanted to post it and make note of it again, because I believe it has some updates. What he means by slowing things down to make them go faster is that, counterintuitively, sometimes you need to put in constraints or barriers that will actually let you process more transactions or increase throughput. For example, maybe you don't want to be explicitly locking things, but instead use the serializable isolation level. Now, a perfectly designed locking solution may beat that, but a lot of times you're not going to come up with a perfect locking solution, and serializable isolation may well be faster. And if there are any failures, which will happen, you just redo the work again when you get a serialization error. So that is slowing down to ultimately give you greater transaction throughput. The next area he talks about is connection pooling. When you have a lot of connections to your database, counterintuitively, you may need to put PgBouncer in front of it. That slows down the number of connections that hit the database at any one time, because PgBouncer essentially queues them, but given the shared resources of the database, this actually lets the database get the work done faster. So you slow things down to make them go faster. The presentation is a great education on a lot of different concurrency-related matters with regard to Postgres, and I highly suggest you check it out. [00:05:14]

The next piece of content, "Waiting for PostgreSQL 15: Allow archiving via loadable modules". This is from Depesz.com, and I have to agree with what he's saying here: this is a huge improvement. Historically, if you wanted to archive WAL files, you would use an archive_command, and Postgres would run that command for every file that needed to be archived. You can run into big problems when you're generating a ton of WAL, like thousands or tens of thousands of WAL files in a short span of time, because a process has to be forked for each file. Whereas if archiving is done by a loadable module, presumably it's going to be able to operate much more efficiently. And there is an example module coming in Postgres 15 called basic_archive that you can use. But he predicts, and I agree with him, that Barman, pgBackRest and the other third-party backup and recovery tools will implement this new interface to gain more efficiency. Definitely a great update coming to Postgres 15. [00:06:18]
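As a minimal sketch of what wiring up the basic_archive module might look like on Postgres 15 (the directory path is just an example, and the same parameters could equally be set directly in postgresql.conf):

-- Point WAL archiving at the basic_archive module instead of an archive_command.
ALTER SYSTEM SET archive_mode = 'on';                -- requires a server restart
ALTER SYSTEM SET archive_library = 'basic_archive';
ALTER SYSTEM SET basic_archive.archive_directory = '/mnt/server/wal_archive';
SELECT pg_reload_conf();                             -- archive_library itself is reloadable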
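And going back to the serializable isolation point from the talk above, a minimal sketch of that pattern (the table is hypothetical): rather than taking explicit locks, run the transaction at the serializable level and have the application simply retry on a serialization failure.

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- If any statement (including the COMMIT) fails with SQLSTATE 40001,
-- "could not serialize access", the application re-runs the whole
-- transaction from BEGIN.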
The next piece of content, "Server-side LZ4 backup compression". This is from rhaas.blogspot.com, and he's talking about an enhancement coming in Postgres 15 that adds server-side LZ4 compression to pg_basebackup. He has an example where, without it, a backup took 16 minutes, whereas with it, it took 1 minute and 50 seconds, which is quite significant, and the size went from 1.5 GB down to 169 megabytes, so quite a high compression ratio. Now, he did say the data is very compressible because this is a synthetic database he generated by running pgbench. But he did another test case using the plain rather than the tar output format, and the result went from 15 minutes down to three minutes, so not as good, but still pretty significant. This is definitely a great enhancement. But the question I'm wondering about is that I've actually never used the built-in gzip compression for large clients; we've been using pigz, the parallel gzip implementation, to be able to use multiple processes on the server to do the compression. So we've always piped pg_basebackup to pigz for that parallel gzip functionality. Now, I know LZ4 is fast, but can it be as fast as gzip running across 16, 32 or 64 processors? I don't know, but it's definitely an interesting enhancement coming to Postgres.

The next piece of content, "Postgres constraints for newbies". This is from Crunchydata.com, and they're talking about how you can apply constraints to your database. In their example, they have a users table, a rooms table, and then a linking reservations table that connects users to rooms along with the actual reservation times. In terms of the different constraints, they mention you can set up foreign key constraints so that the data between the reservations table and the users and rooms tables stays consistent. They talk about unique constraints, so you can only have a unique room number, for example, in rooms. They talk about the importance of setting cascading behavior for certain operations: if you're going to delete a user, you may want that delete to cascade and remove any of their reservations. They talk about not-null constraints and how you would want to set those on the appropriate columns, as well as implementing check constraints so that you can put bounds on the values allowed in certain fields. And lastly, they cover exclusion constraints, which are great for excluding overlapping time ranges when you're doing room reservations, or even bounding boxes when you're working with geographic information systems. So they cover that topic here as well. If you want to learn more about constraints in Postgres, definitely check out this blog post. [00:09:02]

The next piece of content, "Five minutes of Postgres, episode 4: Writing your own custom Postgres aggregates with Rust or raw SQL". This is from Pganalyze.com, and he's talking about the ability in Postgres to create your own aggregates. Aggregates are things like sum, or the average of a value, or even things like aggregating JSON together. Well, you can create a function and then create your own aggregate from that function, and do it in PL/pgSQL. But he also mentions an extension framework called pgx that lets you easily build Postgres extensions written in Rust. So you write the aggregate in Rust, create your extension from it, and then install that extension in Postgres to get all of those aggregation features. He goes over different blog posts that cover how to do that, so if you want to learn more about that process, definitely check out this episode. [00:10:00]
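As a small sketch of what a hand-rolled aggregate looks like (a simple product aggregate, not one of the examples from the linked posts):

-- State transition function: multiply the running state by each input value.
CREATE FUNCTION product_step(state numeric, value numeric)
RETURNS numeric
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT state * value $$;

-- Wrap it in an aggregate with an initial state of 1.
CREATE AGGREGATE product (numeric) (
    SFUNC    = product_step,
    STYPE    = numeric,
    INITCOND = '1'
);

-- Usage: SELECT product(growth_factor) FROM yearly_growth;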
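And going back to the room-reservation scenario from the constraints post above, a rough sketch of an exclusion constraint that rejects overlapping bookings (the table and column names are assumed, not taken from the post):

-- btree_gist lets a GiST index combine equality on room_id with
-- range overlap on the reservation period.
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE users (user_id int PRIMARY KEY, name text NOT NULL);
CREATE TABLE rooms (room_id int PRIMARY KEY, room_number int UNIQUE NOT NULL);

CREATE TABLE reservations (
    reservation_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    room_id        int NOT NULL REFERENCES rooms (room_id) ON DELETE CASCADE,
    user_id        int NOT NULL REFERENCES users (user_id) ON DELETE CASCADE,
    during         tstzrange NOT NULL,
    -- No two reservations for the same room may have overlapping time ranges.
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);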
The next piece of content is actually a YouTube channel, the United States PostgreSQL Association. About a week ago they posted what looks like all of the talks from PGConf NYC 2021, so if you're interested in more video content, you can definitely check out this YouTube channel. [00:10:19]

The next piece of content, "Waiting for PostgreSQL 15: Add UNIQUE null treatment option". This is from Depesz.com. Normally, when you put a unique constraint on a column, you can insert as many nulls as you want. In the example here, he created a table and made the code name unique; by default, you can still insert as many null code names as you want. But there's a new enhancement coming in 15 where you can specify that nulls are not distinct, which means you can only ever insert one null value. If you try to insert more than one, it'll give you an error: duplicate key value violates unique constraint. So definitely a customization you could potentially add to your database if you'd like. [00:11:04]

The next piece of content, "Logical replication of all tables in schema in PostgreSQL 15". This is from postgresql.fastware.com, and they're talking about an enhancement to logical replication in Postgres 15 where you can now publish all the tables in a schema for logical replication purposes. Before, you could publish individual tables or even a whole database; now you can do it at the schema level, and this blog post walks through that enhancement. So if you want to learn more, you can check out this blog post. [00:11:39]

Next, there have been three blog posts about Pgpool-II: "What's new in Pgpool-II 4.3", followed by part two and part three. So if you're interested in learning more about Pgpool-II and the enhancements coming to it, definitely check out these blog posts. The next piece of content: the PostgreSQL person of the week is Bernd Helmle. If you're interested in learning more about Bernd and his contributions to Postgres, definitely check out this blog post. And the last piece of content: we did have another episode of the Rubber Duck Dev Show this past Wednesday. In this episode we discussed how we got started coding, so if you want to learn a little bit more about our history in terms of becoming developers, feel free to check out our show. [00:12:24]

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 113

May 11, 2020 00:13:06

arm64 with apt, Contributors, Backup Manifest, Now Functions | Scaling Postgres 113

In this episode of Scaling Postgres, we discuss arm64 package support for apt, annual Postgres contributors, backup manifests & verifications and different now functions....


Episode 278

August 20, 2023 00:15:58

Squeeze Your System, One Million Connections, Indexing LIKE, pgvector HNSW | Scaling Postgres 278

  In this episode of Scaling Postgres, we discuss how to squeeze the most out of your database, achieving one million connections to Postgres, how...


Episode 211

April 17, 2022 00:11:17

Multiranges, Missing Metrics, Newbie PostGIS, Conference Videos | Scaling Postgres 211

In this episode of Scaling Postgres, we discuss working with multiranges, missing Postgres metrics, PostGIS for newbies and videos from CitusCon. To get the...
