Index Corruption From ICU Collation Change | Scaling Postgres 288

Episode 288 October 29, 2023 00:19:02
Index Corruption From ICU Collation Change | Scaling Postgres 288
Scaling Postgres
Index Corruption From ICU Collation Change | Scaling Postgres 288

Oct 29 2023 | 00:19:02

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss index corruption from a ICU collation change, another caveat for PgBouncer prepared statements, ways to version data, and using Postgres as a cache.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/288-index-corruption-from-icu-collation-change/

 

View Full Transcript

Episode Transcript

[00:00:00] It. Do you know at the end of the movie Revenge of the Sith, where Obiwan defeated Anakin and he had said that he was supposed to destroy the sith, not join them? Reading this first blog post gives me memories of that, but I hope you, your friends, family and coworkers continue to do well. So our first piece of content is the collation versioning problem with ICU 73. This is from PostgreSQL Verite Pro, and the reason why I said this reminded me of Revenge of the Sith is because the ICU collations were supposed to deliver us from issues with Glib C changing when we upgraded operating systems. So this was supposed to be a way to prevent collation changes from breaking our indexes and potentially corrupting our data. But unfortunately, in ICU 73, there's a bug due to a, quote, uncommon move where they changed the root sort order without upgrading to a whole new Unicode version. So of course this can break sorting in Postgres and some warnings that Postgres has set up to detect this. They're counting on these version number changes to put warnings in the logs that, hey, something has changed. Like, for example, right after connecting, it does a warning database, whatever the database name, has a collation version mismatch. Or when you're using the collate clause, it says collation, and the collation you're using has a version mismatch. And basically, if you ignore these warnings, you can end up with data corruption. As he says, they are silent data corruption. Now he shows an example of what can happen. So for example, in ICU 72, when you do a sort in this specific way, you can see A and B is ahead of A and two. But when you sort it in ICU 73, a two is ahead versus AB, but the versions are exactly the same, 153 120 and 153 120. So if there's a sort change, these versions should change as well. So that way, you know, you need to essentially reindex your indexes to avoid any kind of data corruption. Now, we took a look at this on Fedora 38, created a database, put some words in, it did an upgrade, and it did go from ICU 72 to 73. And the issues were there was no warning about the version mismatch. When searching for certain specific existing values within the index, they were not found. When searching for the same specific values without the index, just looking at the heap, they were found. So they were there, but that wasn't reflected in the index due to the correlation mismatch. And when checking with the B tree index checker from Am check A, corruption is detected. So he shows all of that here. Now, the good news is that ICU 74, so the next version is at the RC stage or the release candidate stage, and it should be released in the next few days. So maybe people will never see this upgrade to ICU 73, although it could of course happen. But just know if you're intending to upgrade to ICU 73 or you've already done it, you need to go ahead and do a reindexing. And he also adds at the end here, if you're using bytewise sorts, so a C APOsix, C Utf Eight for your collation, you shouldn't have to be concerned about these upgrade difficulties. But he says he's going to release a blog post about this in the future. So definitely something to be aware of if you use ICU correlations with your postgres database. [00:03:32] Next piece of Content prepared statements in transaction mode for Pgbouncer this is from Crunchydata.com and this follows up on the topic we discussed in last week's Scaling Postgres episode with the release of Pgbouncer that supports prepared statements, specifically version 1.21, and he reiterates this release and its importance. Basically, Pgbouncer can now support prepared statements in transaction and statement pooling mode. And prepared statements basically means the query only has to be parsed once and the query only has to be planned once and basically it can be executed many times at that point. Now, because of this, a lot of the performance improvements you'll see are going to be in the reduction of the planning stage. For example, he did a test here where he joined a table to itself about 14 times, so making the planner do a lot of work to figure out how to run this query. You'll see, the planning time is 170, so milliseconds for the planning, even though the execution time was much shorter, it on average looks twelve milliseconds or so. But if he prepares this statement and then executes it multiple times, you could see during the first execution it's still around 170 milliseconds for the planning time, but the planning time thereafter is like two hundredths of a millisecond. So super fast, hardly any work needs to be done. So that's the major benefit of the prepared statements. Any discusses the importance of it working in transaction mode because that's how most people set up PG bouncer to be able to maintain more than one connection from your app to a single postgres server connection. Basically, it allows you to minimize the number of server connections you're using and how when you're running in transaction pooling mode, on the old version you would get errors like this error prepared statement does not exist, whereas after the upgrade to the new version of PG Bouncer, you get success and no such error. Now, he does mention a caveat we mentioned last week in that this only supports protocol level prepared statements, it doesn't support SQL level prepared statements. So what you saw above where he was doing a prepare that wouldn't work. So the library you're using needs to use protocol level statements. But the second caveat that was not mentioned last week is that potential issues with de allocation. So apparently the protocol supports destroying prepared statements, but the library that most people use to connect to postgres. Even if they're using Ruby or Python, usually it's still going through Lib PQ. This library or interface doesn't support preparing closed statements until postgres 17, which that's a year away and what he says is quote hence many drivers and programs will resort to doing a manual de allocate. So basically these libraries will have to be adjusted. And he mentioned he did do a patch to the Perl postgres library to offer an option to PG Skip deallocate, because without this, apparently he sometimes got a destroy failed error prepared statement. A reference to it does not exist. So definitely, even though PG Bouncer supports this, you need to make sure that whatever library you're using in whatever language is using protocol level prepared statements and it doesn't have this de allocate issue or it has a way to get around it. But if you want to learn more, definitely check out this blog post. [00:06:51] Related to that is PG Analyze's. Five minutes of postgres where they cover this topic as well. PG Bouncer 1.21 adds Prepared Statement support and transaction Mode next piece of content versioning data in postgres testing a Git like approach. This is from Specfly IO and he's talking about the desire to do data versioning and he kind of tried to take it from an approach of how Git works. Now, the first part of the post he's talking about and explaining how Git works to help you kind of understand what he's going for here. But I'll jump down to the meat of the blog post down here and what he wants to set up is say, a Post table. So blog posts and it's associated with a Blobs table. So this is where all the versions essentially reside. So if you're going to be updating a post, all the content will be added to this blob and then you simply change the Blob ID reference to that newer version of the Blob. And he did want to support being able to walk to see what the previous version is. So he is including a parent ID here. So that's how we want to do his versioning. So what that means when you create a row, you insert the content into the Blob and he's using JSON to store a lot of the data. For example, he just has the title here. Then once you know that Blob ID, then you can insert into the Post table and use that Blob ID to be able to identify which version to use. If you update a row, you essentially insert into that Blobs table the new version, and then you update the version that's referenced in the Post table using this command here. [00:08:22] If you revert a row, he actually has two different ways to do it. One, you just in the Post table point to the original version or the previous version. Another option is to actually copy the original Blob and then point to that. I don't know if I'm necessarily a fan of that, but that's possible to do in terms of deletion. He said you could do soft deletes using a deleted at column in the table or do a soft version delete where you're essentially doing the same thing in both tables. Or a hard delete where basically you insert a version stating that it's been deleted and then remove the reference in the post table. Oh, and then a full hard delete where you just delete both apparently. And he has a query on how you can list the versions. Then he actually wanted to talk about branching where you essentially queue up multiple blobs or multiple blog posts, have them associated with one branch that I guess you can commit all at one time into your posts. To me this seemed to push the git analogy a little too far. I would have probably kept it more simple and not have this table. But he did say the biggest drawback of this solution is dealing with schema drift. So as the schema changes over time, how are you going to be able to handle all of these different versions and migrating between them? And he said he dealt with it by having an array of ignored fields, which seems a little complicated to me as I was thinking about this, looking back at his original simplified schema with just a post and a blobs table. Personally I think what I would do is just have a post table and a post versions table, that way I wouldn't have to deal with the JSON in here. So I think that would probably help to minimize schema changes because you can just do standard migrations of this post versions table. I could also imagine a scenario where you can just do the versioning in one table and have different columns define which is the active version. But I thought this post was pretty interesting and you can check it out if you want to learn more. [00:10:21] Next piece of content. You don't need a dedicated cache service PostgreSQL as a cache. This is from Martin Hines Dev and basically he wants a cache that supports expiration. So basically data will expire in the cache eviction, so basically data can be evicted maybe if it reaches a certain size when the cache is full, support invalidation, so overwrite data when it changes. Something that's performant, it doesn't have persistence and a key value stored. Now how he's designing this cache is basically he's using an unlocked table, so he's avoiding writing to the wall. That should give you a lot more performance, but you still have to deal with vacuum. So if rows are going to be highly updated, you're still going to be fighting vacuum with this. So I don't know how well that would work in terms of expiration. He basically says you need to set up a delete process that gets removed after a particular retention period. So basically put that in a Cron job. He's using PG Cron extension here instead. He said you may be able to do it with triggers. I don't necessarily if I'd want to do that because that would just slow things down more. He didn't go too much into Eviction, but that would require tracking a last red timestamp and essentially being updated after every select, which now you're doing an update with every select, which is going to slow things down a little bit more. So I think the performance is where you're going to be hurting trying to do this. And he mentions there are also downsides, one of them being performance. So I thought this was an interesting thought experiment, but I don't think I'm going to be giving up things like memcache any soon for my application cache. But if you need cache like storage, unlocked tables are a good candidate for that. Just keep in mind they do get erased if your database crashes. It's not crash recovery proof because you're not using the wall. So just make sure you don't put anything in there that you don't want to lose. But check out this blog post if you want to learn more. [00:12:21] Next piece of content using Pgactive Active Active Replication extension for PostgreSQL on Amazon RDS for PostgreSQL. This is from AWS Amazon.com and this is an extension on RDS. But I looked for reference that it was open source but I couldn't find it. It says Pgactive is based on the open source BDR project, which is bi directional replication that I think Second Quadrant originally did that is now a part of EDB. But looking through this I don't see that it's open source, so it may only be available on RDS. So if you're looking for that you may want to check out their service. Or also the new Active Active Logical Replication that's available in postgres 16. I don't know how ready it is for primetime and production workloads yet, but that's a new feature that's here today in the most recent version of Postgres. [00:13:17] Next piece of content. There was another episode of Postgres FM last week. This one was on over indexing. So whereas most of the problem is not having the correct or the right indexes on, it can also be a problem if you're putting on too many indexes. Now the most obvious problem is you're going to be slowing down inserts and updates because all those indexes need to be kept up to date as data is added. But they also mentioned a few other problems that you can encounter. You could actually slow down selects because of increased planning time. Again, like we saw on the post earlier where someone did 14 joins and how the planning time got really long. Well, the same thing can happen with too many indexes that might be covering the same row, so it has to spend more time planning out what it needs to do. They also discussed the lock manager and how it has a I'll call it a magic limit of 16 and once you exceed 16 objects that need to be locked as part of a query in terms of a shared access lock, you start to slow things down. It's no longer, I think, a fast path query. So you can get to this number of 16 by having too many indexes on a table or you're joining multiple tables with a lot of indexes, or even having partition table where you're not pruning down to a single partition, but the more partitions you're touching and joining to other tables and having multiple indexes, you could potentially hit that 16 really quickly. Now they did say most of the time you're not going to see this because you actually have to have lock contention with the same query being run many, many times. And a lot of the frequent queries aren't going to be all these complex joins happening at the same time, but it's just something to be aware of and to again keep your indexes set at the appropriate mount for your workloads. But I found this to be a great episode. You can listen to it down here or you can check out their YouTube video here. [00:15:10] Next piece of content animize dump of your postgres data. This is from Tembo IO and they're talking about using the PG crypto extension along with the PostgreSQL, anomizer extension to anonymize your data that resides in postgres. Now, the scenario they're doing here is they're saying you have a postgres database with data you want to anonymize and he wanted to bring it down to a local system, anonymize it and then you can use it. At that point, personally, what I would do is I'd probably put a separate postgres instance in the production area, do this process he describes here, anonymize it on that production instance in the secure area and then download it to a local system. But talk to your security engineer about that. But once he gets the extension set up in the database, he just loads the data into local database he created, sets up the extension and defines the columns he wants to anonymize and then goes ahead and anonymizes the database. So if you're interested in that, you can check out this piece of content. [00:16:12] Next piece of content using psql variables to introspect your script this is from Fluco 1978 GitHub IO and you can use variables in psql and they have the format that looks something like this. You can do a GSET to set them and then you can reference them using this syntax here. So if that's of interest, you can check out this blog post. [00:16:34] Next piece of content human version sort in PostgreSQL revisited this is from Depeche.com in here he's talking about ICU correlations and he wants to be able to sort this sample data in a way a typical human might do it. So basically he used a syntax down here to set up what he called a human sort and then applied that collation as part of his query. And now it sorts the way one would expect. So definitely feel free to review this post if you want to understand 100% of what he's doing here to get this collation sort the way he wants it to work. Next piece of Content a quick glance at PG based backup compression. This is from Higo CA he's talking about in newer versions of Postgres, starting from 15, you can now add a compression option and specify whether you want to tap it on the server or the client. And it also can support more than just Gzip. Like. Personally, I really love using Zstandard. Now, what this basically does is determine which system has the encryption load. Is it the client or is it the server? So based upon how you want to do your backup, you can define either to have the postgres server database to it or whatever client you're taking the backup from. And they show an example of it working here. Now, I actually haven't tested this yet, but I wonder if it uses all the cores with an example like Zstandard. Because I know what I've typically done is taken PG based backups without using the compress option, and I pipe it into Zstandard or Pigs, which is parallel gzip to use all the cores of the machine. But if you want to learn more, you can check out this blog post. [00:18:09] The last piece of content. Quick overview of PostgreSQL's table access method. This is from Higo CA. This is just a quick summary of the table access method in Postgres. And this is the interface that's used to define additional ways of storing Postgres data. Essentially different storage engines. And it's what, as he says here, Orioli DB is using to develop their custom engine that's trying to do things like support update in place and a redo log. So if you're interested in that, you can check out this blog post. [00:18:41] I hope you enjoyed this episode. Be sure to check out scalingpostgres.com to get links for all the content discussed, as well as the audio, podcast and transcript of the show. Thanks and I will see you next week.

Other Episodes

Episode 34

October 15, 2018 00:11:46
Episode Cover

Inserts, Nearest Neighbor, Freezing, Comment | Scaling Postgres 34

In this episode of Scaling Postgres, we review articles covering all about inserts, how to handle nearest neighbor, freezing confirmation and Comment. To get...

Listen

Episode 225

July 24, 2022 00:10:21
Episode Cover

PSQL GEXEC, Delete Duplicates, Postgres Podcast, Puny to Powerful | Scaling Postgres 225

In this episode of Scaling Postgres, we discuss /gexec available in psql, how to delete duplicates using a window function, a new Postgres podcast...

Listen

Episode 281

September 10, 2023 00:17:12
Episode Cover

HNSW Indexes, Vacuuming Bloat, Watch Me Now, Connections | Scaling Postgres 281

  In this episode of Scaling Postgres, we discuss the benefits and disadvantages of HNSW indexes for working with vector data, configuring vacuum to reduce...

Listen