Postgres Releases = Reindexing | Scaling Postgres 290

Episode 290 November 12, 2023 00:14:22
Postgres Releases = Reindexing | Scaling Postgres 290
Scaling Postgres
Postgres Releases = Reindexing | Scaling Postgres 290

Nov 12 2023 | 00:14:22

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss my Postgres Performance Starter Kit, new releases of Postgres, removing JIT, and a lightweight message queue.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/290-postgres-releases-equals-reindexing/

 

View Full Transcript

Episode Transcript

[00:00:00] Hey, we got some new releases. Unfortunately you may have to do some reindexing, but before we get into that, I hope you, your friends, family and coworkers continue to do well. Now, before we get into this first topic, I actually wanted to make an announcement that I created a free mini course. I'm calling the PostgreSQL Performance Starter Kit. Now, it's called a starter kit because it's very introductory and as you can see here, it's basically an intro to SQL, an intro to PG Stats statements, and an intro to explain, and it's less than 30 minutes of content essentially. Now, this is not my big course that I'm planning for performance optimization, but this is for developers that don't have as much experience with these tools if they wanted to get up to speed before taking the course that I'm planning. So if you're a regular viewer of Scaling Postgres, I don't think you'll learn very much from this course, but feel free to share this with your developer friends in case they want to start learning more about PostgreSQL performance optimization. Now with regard to the course, I have a lot of information about that, but I'm going to put it at the end of this episode, so be sure to stay till the end to find out the details about the course. But to get started, the first piece of content is PostgresQL 16.115, point 514.1013, point 1312.17 and eleven point 22 are released. This fixes three security issues as well as 55 bugs, predominantly in Postgres 16. And the first thing they mentioned here is that this is the final release of PostgreSQL Eleven. So if you're still on Postgres eleven, you're going to want to upgrade as soon as possible, essentially. So the security issues are a memory disclosure in aggregate function calls that has a CVSS base score of 4.3, a buffer overrun from integer overflow in array modification that has a base score of 8.8. So the most severe and then the last one is role PG cancel back end can still signal certain super user processes, which has a score of 2.2. Because most of the postgres processes, this won't affect them from what they're saying here, but it may affect extensions that you're using, and they have a number of the bug fixes and improvements listed here, so feel free to look that over. But of course the most important part is the changes that will require reindexing. So first all your gist indexes need to be re indexed. Secondly, Btree indexes that use the interval data type, and thirdly, brin indexes that use date timestamp easy and timestamp data types and a Minmax multi ops class need to be re indexed. So be sure to check your database and Reindex those when you upgrade to these versions and they put a little reminder here, you can reindex them concurrently in version twelve. But make note, I think it's not until version 14 where you can actually Reindex concurrently partition tables as well. But check out this post if you want to learn more. Next piece of Content Django maybe Disable PostgreSQL's JIT to speed up Many joined queries this is from Adamj EU and he ran into an issue in his Django application where they had a crazy join here. Joining to I can't even tell how many tables, maybe it was ten, but basically the JIT compilation of the query was taking 99.7% of the runtime, which is insane. And with all of these joins, apparently there weren't that many rows. And he says down here, even though they had few results, this query took 3 seconds to JIT compile as he mentioned, but milliseconds to execute. But that's kind of crazy. And he said he's using PG Mustard and they recommended either reducing cost estimates or disabling the JIT compiler. So he opted to do that because it was easier. And what it looks like he did, he changed his database connection options to turn off the JIT compiler, and that individual test that he was running that took 3 seconds, now took milliseconds. So that definitely fixed the problem. But of course what I'm thinking when I look at this is you can turn JIT off in a particular session. So that's one option, or for particular user that connects to the database, you could turn it off for that purpose if you need to run this particular workload on it, or rethinking everything. The reason that it was doing all of these joins was a feature in Django doing multitable inheritance, and the query joins all these subclasses tables. So apparently there's all these subclasses having to do that. So I would question whether you want to use the application of the database in this fashion. I mean, if it works for this use case, fine. But seeing this is a bit of a warning flag to me, because what other potential performance problems could come into play down the line? But check out this blog post if you want to learn more. Next piece of content PGMQ Lightweight Message queue on Postgres with no background worker this is from Timbo IO, and we've discussed this in previous episodes of Scaling Postgres, this PGMQ extension, and they're saying it is a self regulated queue, so it has no background worker to check or update statuses. So basically it is a queue system that uses for update and skip locked to be able to lock a given entry so no one accesses it, as well as skip any that are already locked. So you're just moving down the line with each item in the queue. But what's new to me is this self regulating queue in that they're using a visibility timeout. So apparently once something is grabbed from the queue, it's set to be invisible from all other workers, essentially for a period of time. So the entry is always in the queue, it's just marked as invisible once it's picked up, and then that message is archived or deleted once the job is finished. If it fails for any reason, it just becomes visible again for presumably other workers to pick up. So with this implementation, they don't have to have a background worker checking the state of things. So I found this pretty interesting. And check out this blog post if you want to learn more about it. Or there's a link to the extension right here. Next piece of content Tuple shuffling postgres CTEs for moving and deleting table Data this is from Crunchydata.com. This blog post describes exactly what it's doing. So he's showing examples of using CTEs to move data around. Now the benefit of this is that it all happens within one transaction. So if you wanted to archive some data, for example, you could do this in a delete statement, and they're just deleting data that hasn't been modified in a year. You use the returning function to return all the columns from what was deleted, and then you insert that into the archive and you're just selecting all those columns from the deleted to put in there, assuming the schema is the same. And he says you can also filter the data that you're actually inserting into the archive by using a where clause as well. So here you can only archive the important data from what was deleted. And he shows an even more complicated example where he's deleting from multiple tables and inserting it using union all into the single archive table. But just keep in mind this all happens in a single transaction. So if anything should fail within any of this statement, it'll just do a rollback and no changes will have been made. So that's a big benefit of this. Next he goes into updates. So here he's updating some balances and returning the modified rows from this query as a part of the CTE. And with this data he actually inserts it into an awards table. So once a certain amount is triggered with a balance update, they earn certain awards. Then he looks at partitioning, where he's actually deleting from the source rows and inserting it, based upon the category into partition tables. Most of the partition tables I deal with have a lot of data, so I couldn't imagine using this in this use case, but this is just an example of how it works. So check out this blog post if you want to learn more. Next piece of Content Ruby on Rails Neighbor Gem for AI embeddings this is from CrunchyData.com and the primary author of the PGVector Extension, Andrew Kane, also writes a lot of Ruby Gems, which are libraries for the Ruby language, because I think that's what he does, I think at Instacart. And he's developed a new gem called Neighbor. Why? Because it helps do nearest neighbor searches. So if you're familiar with the Ruby ecosystem, you just set this up in your gem file to get the library, and it makes it really easy to do database migrations and rails with the PG vector extension. So here you can see you're defining a vector that you're calling embedding and what the size limit is, and it even understands the index type. So here it's creating a HNSW index with the particular vector twelve Ops Ops class, as well as giving you options to define embedding in your model and then being able to find the nearest neighbors using Euclidean or cosine. So if you use Ruby on Rails, this is a great library to add to your environment if you're using the PG vector extension. So check out this blog post if you want to learn more. Next piece of content There was another episode of Postgres FM last week. This one was on data model tradeoffs. So Nikolai and Michael were talking about different tradeoffs of how you structure your data. Should you put a lot of columns in one particular table? Should you move columns into a separate table based upon index usage, data usage, how it's being queried, how far should you normalize or denormalize, and really a lot of it is. It depends on your use case and where are the hotspots in your application. But I found this a great episode. You can go ahead and listen to it here or you can watch the YouTube video here. Next piece of content Introducing dynamic PostgreSQL how we are evolving the database to help you sleep at night. This is from timescale.com and pretty much this blog post can be summed up by this graph here and the statement buy the base, rent the peak. So what they're doing is offering a dynamic computational load with a particular range. So you purchase a base amount of performance and it will never drop below that. So so many CPUs, for example. But then if a load increases to a certain level, it will dynamically ingest and give you more compute resources up to a particular peak. And once the load is subside it will drop back down to your base level. And they did this because they saw a lot of customers that purchased their services to handle the peak load, which is typically what you do with databases unless you have this kind of dynamic capability. And this is something timescale has done. Now this, to my understanding is for their timescale cloud product. So I don't think it's inherent in their timescale open source solution. I think this is part of their timescale product. And what they state is that, quote, customers running production workloads will save ten to 20% using this solution when compared to AWS RDS for PostgreSQL and 50% to 70% when migrating from AWS Aurora serverless. So they definitely have a lot of information covering this new capability. I encourage you to check out because this could offer a novel way to help you reduce costs based upon your use case. So check out this blog post if you're interested. Next piece of content using multiple triggers on the same table in PostgreSQL. This is from cybertechnposgresql.com and this is talking about Trigger ordering of course on the same table. And the rule is, he says right here is that PostgreSQL first executes statement level before triggers, then row level before triggers, then row level after triggers, and finally statement level after triggers. And if you have multiple triggers of the same type, it just does it alphabetically. So he shows a very quick example here where he adds three different before insert triggers and two after triggers and applied them in reverse alphabetical order. So CBA and CB. But when you actually insert into the table, the triggers are fired first, the befores. So there's three of the befores, two of the afters, and it's always in alphabetical order, ABC BC not based upon how they were applied. Now he puts a note down here, quote the PostgreSQL execution order diverges from the order decreed by the SQL standard, so that's different. So PostgreSQL does it alphabetically. But if you want to learn more, definitely check out this blog post next piece of content SQL scoping is surprisingly subtle and semantic. This is from Buttondown email. So if you're looking for a brainstorming puzzle, I think this probably fits the bill. So he's proposing looking at a select statement such as this and say, okay, what should the result of this query be? And then what's this one and this one and this one? And he actually just has some text here that you have to wade through before you get to the answer. And then he goes more in depth about it. So I won't spoil it here, but if you want to learn a little bit more about how postgres handles some of these things, definitely check out this blog post next piece of content writing a storage engine for Postgres an inmemory table access Method this is from Notes eatonfill.com and this is a tour de force blog post. This is super long as you can tell, and he did a lot of work to actually create an end memory table access method. So if you want to go through his process and try to understand how he did that, and maybe you want to play around and create your own, definitely encourage you to check out this blog post now talking about my course that's coming up. First I wanted to say that I really kind of want to make this course special, so it's not going to be like a book or necessarily another online video of course you may have taken. Now, it will still be a video course, and the lessons will be similar to how they're done in the free mini course I have here. But there's going to be a live component to it, so I plan to release modules weekly and have QAs along with them. I might even have some live lessons. I haven't determined that yet, but don't worry, anything that's live will definitely be recorded and a part of it. My plan is to create as real as possible multi terabyte database with billions of rows for testing and training purposes. So I really want to see where the hangups are and ways to make certain queries faster. Now, the planned course start date is the end of January, so probably January 29, which is about ten weeks away, but I am looking for some early adopters to sign up for the course and provide feedback as I develop it. And if you want to be an early adopter, I will be offering a 50% off discount for the course and details about this offer will be available on Black Friday coming up, as well as Cyber Monday. For those of you outside of the US, that is 1124 and 1127. So about a couple of weeks away. But if you are interested in the course and you haven't signed up for my newsletter yet, please go ahead and do that because I'll probably be announcing some things there as well. [00:14:03] And with that, I hope you enjoyed this episode. Be sure to check out scalingPostgres.com for all the show notes as well as trans script and podcast version of the episode. Thanks and I will see you next week.

Other Episodes

Episode 135

October 11, 2020 00:19:42
Episode Cover

Connection Scalability, Benchmarking Performance, Community Impact, zheap Progress | Scaling Postgres 135

In this episode of Scaling Postgres, we discuss connection scalability, benchmarking Postgres performance, the community impact of an acquisition and the progress of zheap....

Listen

Episode 88

November 04, 2019 00:14:34
Episode Cover

Partitioning, Logical Replication Upgrade, Columnar Compression, HAProxy Connections | Scaling Postgres 88

In this episode of Scaling Postgres, we discuss partitioning, logical replication upgrades, columnar compression and HAProxy connections. To get the show notes as well...

Listen

Episode 304

February 25, 2024 00:14:09
Episode Cover

Overhead of pg_stat_statements | Scaling Postgres 304

In this episode of Scaling Postgres, we discuss new Postgres releases, performance comparisons between PgBouncer, PgCat and Supavisor, a new extension pg_analytics, and new...

Listen