Full Text Search, Query Optimization, Exception Blocks, Procedural Language

[00:00:00] In this episode of Scaling Postgres we talk about full text search, query optimization, exception blocks and procedural language. I'm creston. Jameson. And this is scaling postgres episode 100. [00:00:22] I hope you're having a great week. Well, welcome to episode 100. So after approximately two years of doing this, we are at the 100th episode. I didn't prepare anything special for because unfortunately I was quite busy this week. But I'm definitely here with the content that I was able to find this week interestingly. The YouTube channel is very close to 2000 subscribers so that's another milestone it looks like we will be hitting soon. But our first piece of content is a YouTube video. The state of full text search in PostgreSQL twelve. By Jimmy Angelacos And this is actually on the Peter Cooper YouTube channel and this does exactly what it says. It explains different ways of searching text, especially the full text search capabilities of postgres and how you can basically use it and the different features that are available within it. Now you also may find it beneficial. There is the slides here that I will provide as a separate link and it discusses the different contents that are part of the presentation in terms of operators, functions, dictionaries examples, indexing, type of indexes to use. Interestingly, had like a 4000 fold improvement adding a gen index to one of those examples he had talking about non natural text searching, collations other text types as well as maintenance vacuum becomes very important, particularly with different index types as well. Now, one thing I liked about the presentation is with the advent of postgres twelve is using a generated column. So using a generated column to actually build a TS vector of the data that you want to search on so it's automatically maintained and then you could just index that field for full text search purposes. But if you're interested in full text search, I highly encourage you to check out this presentation because it has a lot of great information. [00:02:22] The next post is Introduction to Profiling and Optimizing SQL Queries for Software Engineers. [00:02:29] And this is from the Scope blog on Medium. Now this is not meant to be Python or postgres specific, but those are the examples he's using. But he also speaks more generally about how these techniques can be used and it's mostly from a developer's perspective. So you want to speed up a slow query. How would you do that? And the first thing he discusses is how defined them. So you can look at the database slow query log, that's one method to do it. You could also use PG Stats statements in postgres to look for some of the different queries there. And he talks about other places you can look for it in certain middleware or application logs. And he has some Python or Django specific tools you can use, or even application performance management platforms to be able to track and identify slow queries. He shows how you can do it. So once you find it, how do you profile it? And he talks about Explain and Explain Analyze and the difference between them. Basically Explain Analyze actually runs the query and he has an example, pretty comprehensive query he's showing here, and then what an Explain Analyze output looks like. Now someone could be quite intimidated by it, but this query is pretty intimidating in and of itself. But he actually shows a visual representation and talks through how to look through the Explain Analyze output to be able to determine what's going on and what could be the slow component of it. And he uses the website Explain Depom to give you a graphical representation of your Explain plans. And then he discusses a particular thing to adjust or an index to add that would improve the performance and then even a way to test the performance with using a transaction and see what the differences are. So this is more on the basic level, depending on your level of database knowledge, but definitely good information for developers if they're wanting to find out how to optimize slow queries in their database. [00:04:37] The next post is the strange case of the exception block. And this is from Pgdba.org and he's talking about postgres functions and exception blocks where you begin something and then if something has an exception, you can define what exception will trigger it and then you're going to execute some other code instead when this code here errors out. So he's talking about exception blocks within it. And he had a very unusual thing happen where the Xids were being incremented. So he actually replicated this like setting up a unique constraint, and then said when you hit it, basically you don't do anything. But what he noticed is that when this is running, I believe the loop is 1000. It actually increments because he's looking at the frozen XID age here. It's actually incrementing the Xids even though absolutely nothing has happened, no data has been added, basically just burning Xids. And he says, quote, whether the exception is handled or not, the DML consumes an XID every time it's executed. And he also makes note of here this kind of behavior that I'm struggling to find documentation for is what caused the one and only emergency shutdown to prevent XID wraparound I had to deal with in my career. So this is definitely something to be aware of, that this can happen with exception blocks within functions in PostgreSQL. Because actually what seems to be happening is that these are subtransactions and transactions are essentially running with these exception blocks when that code is executed. And this is further seen in a second post discussing this plpgsql exception in Xids from Fluco 1978 GitHub IO. And he mentions here, quote I think Plpgpsql is using subtransactions or save points to handle exceptions. So whenever those exception blocks happen, they're going to use a transaction ID to handle it. Or not. And he actually does a different implementation where he has set up a function and he's actually doing an output and tracking what the XID does with the different behavior using TXID current if assigned TXID Current. And his assessment is it that exceptions are quite clearly implemented in Plpg SQL and possibly in other languages by means of subtransactions. So if you're using Postgres functions and using exception blocks, this is just something to be aware of because if you have a function running very fast and erroring out, you could be burning through your Xids for that particular table. So just something to be aware of. [00:07:25] The next post is creating a PostgreSQL procedural language, part One setup. So this is a setup to add a new procedural language to use with Postgres. So by default, as part of the standard distribution, there's plpgsql, pltcl, PL perl and PL python. Well, they actually wanted to add a PL Julia programming language, so the first step they want to do is actually create an extension for it. So they wanted to create a control file that defines the basic properties of the extension, a SQL file that creates the extension's objects, a C file for the extension itself, and a make file to build the extension. And he has the exact code used here for setting up PL Julia as a new procedural language, the SQL for the functions, the C code, as well as the make file. So these are basically the setup, the bones for getting this set up. And we'll have to consult part two in order to see how this moves forward. And this is from the second Quadrant.com blog. The next post is an overview of job scheduling tools for PostgreSQL. Now, last week I believe we talked about a PG timetable as one means of doing scheduling, and they're discussing three others listed here just using a general Cron tab in Linux, which is what I tend to do. There's also the agent called a PG agent, which I believe requires PG Admin. I believe for. So that's something to take into account. And the last thing they mentioned is the extension PG Cron, but this looks like it can only operate on things within the database itself. So store procedures, SQL statements and PostgreSQL commands. So I don't believe it can run jobs outside of Postgres, but these are three options for managing jobs for your PostgreSQL installation. And this is from the several nines.com blog. [00:09:28] The next post is Can PG Bouncer session survive everything we throw at it? So this is a fourth in a series of blog posts from Enterprisedb.com covering PG Bouncer and its connection and pooling capabilities, predominantly using the EDB Failover Manager, which is kind of like a virtual IP manager and has some other features. And this goes through in depth on how to set this up with the Failover Manager for PG Bouncer and doing different tests of failure to see what survives and what doesn't. And at the bottom here they have the general conclusions of what is possible. So if you want to learn more into handling failover of PG bouncers, definitely a blog post to check out. [00:10:17] The next post is how to migrate from trigger based partitioning to native in PostgreSQL. So this is going from the trigger based partitioning to basically the declarative partitioning that was set up in postgres ten improved in eleven, and then of course twelve. Now, it looks like a lot of these instructions are with regard to PG partman, so it doesn't look like it's using just only trigger base, but a lot of the instructions are if you've used Pgpartman and moving to it. And it looks like the basic process is creating new parent tables and then detaching the child partitions from the current primary, attaching it to the new declarative partition scheme and then doing some renames. But they go through the whole process here in this pretty long blog post. So if you are interested in doing that, definitely blog post to check out from Crunchydata.com. [00:11:13] Also from Crunchydata.com is a next post guard against transaction loss with PostgreSQL synchronous replication. Now, this does discuss synchronous replication, so there is some knowledge to be gained about what it is and how it works. Generally, when you set up replication, by default it is asynchronous, meaning that transactions are committed and saved on the primary database and then they're sent off to the Replicas or streamed to the Replicas. When you set up a synchronous replication, the write doesn't get fully acknowledged to the client until it's written to two synchronous systems, or it could be more. And they have a great graph here that displays this. So the client sends some data to the primary, but the primary, before acknowledging, sends the data to the synchronous Replica. The synchronous Replica acknowledges to the primary and only then does the primary database acknowledge success to the client. So you're always going to have data in synchrony. So the client does not get acknowledged until it's written essentially to both places. Now there's a performance hit for doing this, of course. And with the stock synchronous replication PostgreSQL, if your Replica goes down, essentially no writes happen on the primary because it can't do asynchronous write. So that's something to be aware of. Now, what this blog post discusses is actually their PostgreSQL operator 4.2. So this whole post is written from that perspective. So that's something to keep in mind if you want to look at this. It's not about setting up synchronous replication necessarily, it is doing it through the postgres operator, which does postgres set up in Kubernetes, but it doesn't just basically tell you how it is set up normally. So you can see here they're using the postgres operator commands, PGO, et cetera, to set things up. So if you want to learn a little bit more about synchronous replication, but especially if you're wanting to use their PostgreSQL operator for Kubernetes, then definitely it's the blog post to check out. [00:13:21] The next piece of content is a YouTube video and it is Oracle to Postgres Schema Migration Hustle. And this is from the Enterprise DB YouTube channel and they're covering considerations of migrating from Oracle to Postgres with regard to migrating schemas and all the different data and objects that need to be passed over. So if you have a need for that, definitely a webinar you'll probably want to check out. [00:13:48] The next post is how to use the KNN machine learning model with two UDA. PostgreSQL and Orange Part One Well, this post is about machine learning capabilities with something I've never heard of before. But if you're interested in machine learning using PostgreSQL and Orange, this is a blog post to check out from second quadrant. The next post is how to run a clustering algorithm within PostgreSQL. And this clustering algorithm is essentially, they say here the well known K means. So if you have a need to use that in PostgreSQL, you can check out this blog post. [00:14:26] And the final blog post is that a new NPG Twelve New Leader PID column in PG Stat Activity so there's a new Leader PID column in PG Stat Activity that it says tracks the PID of the group leader used with parallel queries. So if you have an interest in doing that, definitely a blog post to check out. [00:14:48] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingposgres.com where you can sign up to receive weekly notification occasions of each episode. Or you could subscribe via YouTube. Rightunes, thanks.

Show Notes

Episode Transcript

Other Episodes

Episode 276

BRIN & Correlation, Poor Partitioning, 10 Beginner Tips, Table & Index Usage | Scaling Postgres 276

Episode 119

Replication Complications, One Word Performance, Pgbouncer Users, Serializable Isolation | Scaling Postgres 119

Episode 321

Ottertune Is Dead! | Scaling Postgres 321

Full Text Search, Query Optimization, Exception Blocks, Procedural Language | Scaling Postgres 100