Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about an out of cycle release should you upgrade state of Postgres Survey and Automatic Indexing I'm Kristen Jameson and this is Scaling Postgres episode 219.
[00:00:23] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is PostgreSQL 14 out of cycle release coming June 16, 2022. So that's this coming Thursday. So this is to resolve the create index concurrently or reindex concurrently bug that was introduced when postgres 14 was released. They were trying to optimize something and unfortunately it resulted in a potential index corruption issue when you're using that concurrently option when manipulating indexes. And we covered this issue in last week's episode of Scaling Postgres, but this one says the patch for it will be released this Thursday. They also mentioned here you can use the PGAM check command with heapall indexed in order to see if you've had corruption, if you have run indexes concurrently or reindex concurrently. And they mentioned that if you do have corruptions, how you can get around it now is just do a straight create index or reindex without using the concurrently option. Of course that's going to lock up the data, so you have to weigh those cost benefits there. But definitely when the next version comes out, version 14.4, I would definitely upgrade to it as soon as you can.
[00:01:42] The Next piece of Content Important PostgreSQL 14 Update to avoid silent corruption of Indexes this is from Migops.com and this is basically reiterating the issue that exists for the concurrent usage of indexes. But what was interesting about this post is that at the bottom here, they show you the individual commands to run to do the am check. So create the extension am check and then this is the command to be able to check a pattern of tables or a pattern of individual indexes to make sure that they are not corrupted. So you can check this post out if you want to get that information.
[00:02:20] The next piece of content notes on updating to PostgreSQL 14.313 point 712.1111, point 16 and ten point 21. This is from JCats five and he's talking about an issue of whether you should potentially upgrade to some of these because there's been a couple of issues that have come from some of the patches. Now, the first one he mentions is the concurrently issue with indexes and postgres 14. Now I don't think there's any question about that when 14.4 comes out with this fix that you should immediately upgrade to it. But most of this post is focused on the CVE 2020 215 52. So this is a security issue that was patched in these most recent releases of postgres that are listed here. And it basically allows an unprivileged user to craft malicious SQL to escalate their privileges. Now, he said there's been another bug that's been discovered as a result of the CVE, so it impacts all of the patched versions. And it relates to, quote, creating an expression index using an operator class from a different schema. So it seems kind of specific, but there may be other instances where something like this causes issues, and particularly it's for a Gist index with a trigramops operator, and someone saw this from a PG dump, but you can replicate it with a few commands that are hyperlinked here. But it's just something to be aware of, that this is kind of another bug that's known to exist right now. And he asked the question, should you upgrade? Now? Personally, I think the security risk is great enough that I would go ahead and upgrade, at least for the database that I'm directly managing. And I say that mostly because I know that this Gist Trigram Ops is something that's infrequently used, if at all, and I'll go ahead and check to see in the schema if any of these types of indexes exist anyway. But it's just more information to be aware of as you plan to do your upgrades of various different systems. Because if you're going to be upgrading, of course, to 14.4, you're going to be getting this CVE patch no matter what, and you're going to be exposed to this potential bug here. And there's no patch for this bug at this time, at least according to this post. But if you want to learn more, definitely check out this article.
[00:04:36] The next piece of content, the 2022 State of PostgreSQL survey, is now open. This is from Timescale.com. They have a post announcing it, and you can just click on the link here and it looks like it's going to take you to a type form to fill out to take the survey. So I encourage everyone to go ahead and do that.
[00:04:54] Next Piece of Content an automatic indexing system for postgres. How we built the PG Analyze indexing engine. This is from Pganalyze.com. Now, this is actually a part of their product, but they're kind of opening up the curtain a bit to help you understand how they actually created this process.
[00:05:13] So they built it as a way to give you index suggestions for your database, essentially. And they call it AI assisted. So they're using some AI techniques to determine what index should be created. But it's not something that's done automatically, it's developer driven, meaning it's kind of giving these suggestions to you. At least that's my interpretation. And basically how it works is that it analyzes your database, it's connected to it, and it does query analysis. So it looks at things like PG Stat activity. Maybe it looks at PG stat statements. They didn't mention that, but maybe it does. And it collects all the different queries that are being run. And then from that it assesses, okay, what tables are essentially being scanned.
[00:05:58] And then the next phase, it looks at the indexes. So what indexes exist and what indexes could be potentially introduced or combined. And in that process it's doing essentially the AI part of it, the what if analysis. So what if we had this index? Or what if we combine these indexes into a multi column index? Now, how they're doing this is they actually rebuilt the postgres planner, I'm assuming for different versions in their software product itself. So basically they're using their software product does this what if analysis. So it's not impacting your production environment at all. Presumably the only impact would be to my understanding, is it collecting the metrics to be able to do the query analysis and to know what indexes are already in place on it. And then from that point it gives recommendations and it analyzes from a cost improvement analysis. So in other words, how much could an index improve performance, but also from a write overhead, because anytime you're going to add a new index, it's going to have to be maintained and therefore it's going to have an overhead for rights to that table. So apparently it does an analysis of that as well to give you proper recommendations. Now, this post goes over a lot of detail about how they design this, how it's set up, kind of how it's working right now, and I definitely found it pretty interesting. So definitely suggest checking that post out. A companion post to it, more product oriented was a balanced approach to automatic postgres indexing, the new version of the PG Analyze Index Advisor. And this shows a few more product images, for example. So they're showing you here on this dashboard where you could see essentially the index recommendations, their relative assessment of its impact, the right costs appear here. This may be a little bit hard to see, but it basically says, hey, here are these suggestions if you'd like to implement them. It's, again, my understanding from how this works. And they also show information with regard to the write overhead as well of certain of these recommendations. So if you're interested in that, definitely encourage you to check out these two blog posts.
[00:08:11] Next piece of content also from PG Analyze is five minutes of postgres episode 21 server side backup compression with LZ four and Z standard in postgres 15, and the removal of exclusive backup mode. So the first thing that he discusses is the removal of the exclusive backup mode for backups. And this is where you have to do a PG start backup and then backup the files manually and then do a PG stop backup. Now, most recently, I've used this for doing snapshots of the databases as a backup technique. I prefer to use PG based backup, but there's certain database systems where clients were interested in just relying on snapshots as opposed to being able to do a point in time recovery. And for that purpose, that's what I'm using some of these for. But in the next version of 15, the exclusive option is no longer available, which I wasn't using anyway. But what's interesting is they actually changed the name of the function. So some people's backup scripts will no longer work because it has gone from PG Start backup to PG backup Start. Now, they did this so people would be aware that essentially the parameters changed. But it basically means I'm going to have to go in and change certain backup scripts and you will as well if you are using this technique. The other part that he covers is the server side backup compression and the support for LZ Four and Z standard. Now, this has been covered in previous episodes of Scaling Postgres, and what I must say as I've been using it with different clients, zstandard has been an amazing improvement over Gzip. So even though the server side backup isn't available now, I'm using it for a way to compress terabytes of data much more quickly and using much fewer CPU resources. In some cases a quarter of the CPU resources for the same end result. So if you're using something like parallel Gzip or pigs on a Linux system, highly suggest checking out Zstandard to see if that will benefit your use case. But if you want to learn more, definitely check out this episode.
[00:10:14] The Next Piece of Content understand PG's MVCC Visibility basic Check Rules this is from Higo CA and this post is basically about how Postgres goes about determining if a row is visible and it goes through the process. So it talks about some of the information of the additional columns that are on each tuple or Essential on each row. And it goes through the process. At first it checks the hint bit and it goes through the different steps of that. Then it checks if the tuples x men equals to is equal to the global top transaction ID, and it goes through the process for that, as well as checking the current snapshot and finally checking the commit log. So if you want to have a better understanding of how Postgres checks visibility of a row for a given session, definitely check out this blog post. The Next Piece of Content update on the trademark actions against the PostgreSQL community this is from Postgresql.org. So this is an announcement that was actually made back in April, and it's talking about the trademark issues that have been mentioned in some previous episodes of Scaling Postgres. And I mentioned it here because very recently in June, the organization that they are discussing, located at PostgreSQL Fund, is talking about an update on the trademark actions and it's essentially a reply to that. So if you're interested in that type of information, definitely check out these blog posts.
[00:11:36] The next piece of content, the Postgres Girl Person of the week is hiro. Nobu Suzuki. If you're interested in learning more about Hiro Nobu and his contributions to Postgres, definitely check out this blog post and the Last Piece of Content we did have another episode of the Rubber Duck Dev show this past Wednesday evening. This one was on reviewing the 2022 Rails Community survey and it's the second part of that. So basically we finalize our review of that survey. So if you're interested in that type of developer content, definitely welcome you to check out our show.
[00:12:10] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.
[00:12:29] Our.