Indexing With tsearch, Tuples Not Yet Removable, Wicked Problems, Pattern Matching | Scaling Postgres 205

Episode 205 March 07, 2022 00:09:56
Indexing With tsearch, Tuples Not Yet Removable, Wicked Problems, Pattern Matching | Scaling Postgres 205
Scaling Postgres
Indexing With tsearch, Tuples Not Yet Removable, Wicked Problems, Pattern Matching | Scaling Postgres 205

Mar 07 2022 | 00:09:56

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the best ways to index with tsearch, what to check if tuples are not yet removable, wicked problems with Postgres and different pattern matching techniques.

To get the show notes as well as get notified of new episodes, visit:

https://www.scalingpostgres.com/episodes/205-indexing-with-tsearch-tuples-not-yet-removable-wicked-problems-pattern-matching/

View Full Transcript

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about indexing with tsearch tuples, not yet removable, wicked Problems and Pattern Matching. I'm Kristen Jameson, and this is scaling postgres episode 205. [00:00:17] You alright? I hope you, your friends, family and coworkers continue to do well. Our first piece of content is how to index data with T search. This is from Deps.com and he's talking about an issue that he was looking at on a slack discussion where they said should you build a TS vector in a field and then index that or just make an expression index that has the TS vector in it. So in general, he says he doesn't like having an additional TS vector column in the table because it doesn't really serve any purpose if you're using expression index and it's just taking up space. Although my personal opinion is I actually like the cleanliness of it when querying, but he actually has a solution to that. So we'll take a look at what he's talking about. So he created a post table and he created an expression index using gin and a TS vector where he's concatenating the title, headline, main body. And of course, in order to deal with search issues, you want to put a space between them and take into account if any of these happen to be null by using the coalesce function. Now the issue is that every time you do a query, you're going to have to include this in the where clause so it matches the expression in the index for the index to be used. But he has a technique that he suggests using. So he creates this index here that basically returns this value of the concatenation and you can create the index with that function. And then when you do a query, you can actually use it and it will use the index. So this is definitely cleaner to use. And he also says there's actually an obscure capability in postgres that's not necessarily well documented, but you can use some dot notation to be able to do it. So you can actually say your table name, dot the function name as long as you're passing in table as the parameter and that will still work. So that's definitely pretty readable. Now I don't know if I'll necessarily use this technique, but it's definitely an interesting one if you have a lot of text search needs. But go ahead and check out this post if you want to learn more. [00:02:36] The next piece of content. Five minutes of postgres episode seven auto vacuum dead tuples not yet removable, and the postgres X Men horizon. So in this episode of Five minutes of Postgres, he's talking about a friend that had an issue where Auto Vacuum was not cleaning up tuples and he referenced two postgres from 2018 that serve as a good reference for finding the issue. So the main problem is that something is holding on to an older transaction ID that won't get recycled, and therefore auto vacuum cannot clean up rows in, say, a particular table. And there's basically three main areas you want to check for that. One is long running transactions, because a long running transaction will prevent those transaction IDs from being recycled. So you definitely want to be cautious of long running transactions. The second area is replication slots. So if you have a replication slot but nothing is consuming that slot, the wall builds up. Now, a lot of times you see this as a disk space problem. So your log volume, for example, keeps expanding because nothing is essentially consuming the wall and allowing it to be recycled. But this also can cause transaction ID wraparound problems too. So you want to make sure that none of your replication slots are orphan. And the third one, which is more rare, is prepared transactions. And this typically happens when you're using two phase commits. And that's definitely a rarity that I've seen. But these are the issues that you want to be aware of if you are seeing that vacuum is not fully vacuuming in tables, or it's taking a long time to run because it's not clearing out dead tuples, there's other reasons auto vacuum can take a while to run. That's usually if you have a ton of data in indexes, but if it's not actually clearing out those dead tuples, that's something to look at. But if you want to learn more, definitely check out this episode. [00:04:28] The next piece of content is actually a presentation that was actually given in Postgres Build 2021, and you can get access to the videos online, but this happens to be the presentation that was done, and it's called Solving PostgreSQL Wicked Problems. And they're listing, say, ten different problems that postgres has. And their solution to try and alleviate these issues is a new extension that they've developed called Orioli, I believe. And with these changes in this extension development, what they've seen in their testing is a four to 50 fold improvement of different metrics from I ops improvements to transaction performance. And some of the things that they've done, they've listed here is moving the transaction IDs that need to be wrapped around from a 32 bit value to a 64 bit values. So that drastically reduced the risks of that happening inefficient replication that spreads corruption. I believe they're talking about block level replication, whereas this new extension does row level replication, relying on MVCC having the heap contain all versions of the rows they're moving to an undo log. So undo log should result in some performance improvements. The primary key index is a space hog. Basically, you could avoid a primary key index if you actually order the tables by the primary index. So you can avoid that and not doing block level compression of the data, whereas this implements a block level compression of the data to get some space savings. So everything in green has already been implemented in the initial release of this extension. That happened a short time ago. So this post was actually in February 19. The presentation I think was at the end of 2021, but this is available to try out and the other areas are things that they're working on, but if you want to learn more about that, you can check out this post as well as the presentation online. [00:06:27] Next Piece of Content how to Cheat at Wordle with PostgreSQL this is from Crunchydata.com, and this post does exactly what it says, loading up a bunch of different words into postgres and then trying to find word matches given a certain set of words. So it's definitely an experiment in pattern matching in postgres. So if you want to take a look at all the different techniques they use to try and do some of this word matching, you can definitely check out this post. [00:06:55] The Next Piece of Content migrat is a schema comparison tool for PostgreSQL, and this is from Databaseci.com. And frequently you have file comparison tools that show you the difference between one file and another. Well, this actually does it for postgres schemas, so you just run the command migra at a command prompt. It's a Python library, I believe, and you give it two PostgreSQL URLs. It connects to those databases and then tells you, okay, this is how you match the schemas up or what needs to change to match one schema to another. So it literally gives you the alter table commands to do that. So definitely an interesting tool. [00:07:34] And it says that they cover tables, partition tables, constraints, views, functions, indexes, so a lot of different objects and notes you may need to be aware of that are listed here as well. But definitely an interesting tool. I encourage you to check out if there's interest. [00:07:49] Next piece of content. Pgpool Two configuration parameters auto failback. And this is from Bping blogspot.com. And for connection pooling, I tend to use PG bouncer. I haven't actually used PG Pool two yet, but it has a lot of capabilities for handling a cluster of PostgreSQL servers. And one of those areas is Automatic failback of a postgres server. Like if it goes online and comes back, how do you fail back to it? And this post talks about that configuration and process, so if you're interested in that, you can check out this blog post. [00:08:20] The Next Piece of Content cloud Data Sources for the Postgres Operator for Kubernetes, this is from Crunchydata.com and they're talking about an enhancement to their Crunchy data postgres operator, where for building a new cluster of postgres systems, you can actually rely upon an existing backup that is stored in AWS, Google Cloud Storage, or even Azure. So if you're interested in doing that, you can definitely check out this blog post. [00:08:48] The next piece of content a snippet to acquire a lightweight lock. This is from Haigo CA, and if you're interested in learning more about the internals of how lightweight locks are set up in postgres, you can definitely check. Out this blog post next piece of content. The PostgreSQL Person of the week is John Nicole. And if you're interested in learning more about Johnny and his contributions to Postgres, definitely check out this blog post. [00:09:14] And last Piece of Content we did have another episode of The Reproductive Dev show this past Wednesday. This one was on reviewing Web application security risks as defined by OWASP. So if you're a developer and interested in reviewing those, you can definitely check out our show. [00:09:31] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.

Other Episodes

Episode 199

January 24, 2022 00:12:17
Episode Cover

Shaped Sample Data, Version Changes, Missed Bottlenecks, Indexes for Newbies | Scaling Postgres 199

In this episode of Scaling Postgres, we discuss how to generate shaped sample data, track changes between Postgres versions, identify missed bottlenecks and use...

Listen

Episode 44

December 31, 2018 00:08:15
Episode Cover

Postgres Rise, Row Count Estimates, Nulls, Linux Config | Scaling Postgres 44

In this episode of Scaling Postgres, we review articles covering the rise of Postgres, row count estimates, nulls and linux configuration. To get the...

Listen

Episode 225

July 24, 2022 00:10:21
Episode Cover

PSQL GEXEC, Delete Duplicates, Postgres Podcast, Puny to Powerful | Scaling Postgres 225

In this episode of Scaling Postgres, we discuss /gexec available in psql, how to delete duplicates using a window function, a new Postgres podcast...

Listen