Tidy Vacuum, Dropping Roles, Merge Command, PgBouncer Tutorial | Scaling Postgres 209

Episode 209 April 03, 2022 00:14:14

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss how to optimize vacuum, how to drop roles, the new MERGE command in PG15 and a PgBouncer tutorial.

To get the show notes as well as get notified of new episodes, visit:

https://www.scalingpostgres.com/episodes/209-tidy-vacuum-dropping-roles-merge-command-pgbouncer-tutorial/

Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about tidy vacuum, dropping roles, the MERGE command, and a PgBouncer tutorial. I'm Creston Jamison, and this is Scaling Postgres episode 209.

[00:00:22] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "PG Phriday: Tidying Up With Vacuum". This is from enterprisedb.com, and this is a very good post about how to configure vacuum, following up on the post we covered last week on Scaling Postgres about XID wraparound and how to deal with it. This one goes into the configuration of vacuum itself.

The first section, "Releasing the throttle", covers how to get vacuum to do more work, because by default it's configured to impact the system minimally so it can work on a variety of server or machine types. These are the parameters you can adjust to make vacuum run faster. The number one adjustment I tend to make is raising the cost limit, vacuum_cost_limit or autovacuum_vacuum_cost_limit, because once you increase those, more work can be done per vacuum run. The way vacuum works is that it has a set of tasks to do, and each task is assigned a cost: vacuum_cost_page_hit, vacuum_cost_page_miss and vacuum_cost_page_dirty. Once the accumulated cost reaches the limit, vacuum pauses for the configured delay, so it stops doing work for a time. You can also adjust the costs down, but I tend to increase the limit and minimize the delay so vacuum does more work in a given unit of time. They also note that older versions of Postgres had the delay set to 20 milliseconds, but as of Postgres 12 the default is 2 milliseconds, a much shorter delay, given today's faster storage systems.

Then he goes into the section "Expanding the workforce". By default you get three autovacuum workers to vacuum your tables, and I usually leave it at that, maybe pushing it to five, but not much more. It also depends on the customer's tables in their database: if they have one huge table and a bunch of smaller tables, then three may be sufficient, but if you have a lot of partitioned tables and none of them are really large, then more autovacuum workers make sense. The thing to keep in mind is that adding more workers doesn't by itself make things go faster, because all workers share the global cost limit and cost delay settings.

The next section, "Factoring in thresholds", covers what triggers vacuum to process a table. You can set the scale factors and vacuum thresholds to determine how often the tables in your database get vacuumed, and you can also set these per table, so you can fine-tune when a particular table gets vacuumed. The last section, "Ahead of the pack", is about defining at what point freezing should be done. By default this happens at 200 million transactions, but for a very active OLTP system he suggests raising it to, say, a billion, or at least something north of that 200 million. You definitely want to stay clear of the 2 billion wraparound limit, but you should be able to raise it to give yourself more headroom and not have to freeze as often. So this was a great post about how to configure vacuum, and if you're interested, I definitely suggest you check it out.
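As a companion to that discussion, here is a minimal sketch of the settings mentioned. The values are illustrative assumptions, not recommendations from the post, and the table name is hypothetical:

```sql
-- Raise the autovacuum cost budget so each worker does more work
-- between delay pauses (values are illustrative only).
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;   -- default -1 (falls back to vacuum_cost_limit, 200)
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';  -- the default since Postgres 12 (was 20ms)
SELECT pg_reload_conf();  -- the two settings above take effect on reload

-- More workers only help if the shared cost budget allows it;
-- this setting requires a server restart rather than a reload.
ALTER SYSTEM SET autovacuum_max_workers = 5;             -- default 3

-- Raise the freeze ceiling for a busy OLTP system (restart required);
-- keep it well below the 2 billion wraparound limit.
ALTER SYSTEM SET autovacuum_freeze_max_age = 1000000000; -- default 200 million

-- Per-table threshold tuning ("big_events" is a hypothetical table).
ALTER TABLE big_events SET (autovacuum_vacuum_scale_factor = 0.01,
                            autovacuum_vacuum_threshold = 1000);
```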
[00:03:45] Next piece of content: "How to DROP ROLE or DROP USER in PostgreSQL". This is from cybertec-postgresql.com. Basically, roles own objects in the database, and when you try to drop a role, Postgres won't drop it while objects depend on it. Now, with other parts of the system you can do a CASCADE, but why doesn't that work here? He mentions that roles are cluster-wide across all the databases in a Postgres instance, and, as he says, a SQL statement can only affect objects in the database to which you are connected, so a single DROP ROLE can't also drop that role's objects across multiple databases. He then talks about how to avoid trouble with DROP ROLE, and what he advocates is setting up group roles. These are separate roles that own all the objects in the databases, the schemas, the tables, whatever you're creating, while the user roles are simply members of those group roles. Essentially, the users of the system never really own anything, so they can easily be dropped if need be. He also covers two commands that make dropping or reassigning a role's objects easier (a short sketch follows at the end of this section). One is DROP OWNED BY, which drops all the objects owned by a particular role. This works because it operates per database, so you have to go into each database and run DROP OWNED BY for that role to drop its dependent objects there. You can also use REASSIGN OWNED BY to reassign a role's objects to, say, a group role, and he talks about the different ways you can do that. He does note that DROP OWNED BY leaves a few things behind: databases owned by the role, which can easily be addressed by transferring ownership or deleting the database if it's no longer needed, as well as tablespaces, which you should likewise be able to reassign or drop separately. Then he asks, "but what if I still get an error when I try to drop the role postgres?" I don't know when you would want to do that, but he shows a way to do it if you must. This was another great post and I highly suggest you check it out.

[00:06:03] Next piece of content: "Waiting for PostgreSQL 15 – Add support for MERGE SQL command". MERGE is a SQL-standard command being added to Postgres, and this is huge. You can think of it as INSERT ... ON CONFLICT on steroids: a single statement can evaluate a set of conditions and either delete, update, insert, or do nothing. In the example here, when a row matches and an additional condition holds, do an update; when a row matches otherwise, do a delete; when a row is not matched and an additional condition holds, do an insert; and when it's not matched otherwise, do nothing, kind of like an else. So it lets you define a modification to the database along with the conditions under which rows get inserted, deleted or updated, and he goes through additional examples. This is huge, and even he says a huge thanks to all involved. It looks like this is going to get into Postgres 15, and the separate docs on the MERGE command go into more detail and include additional examples.
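Going back to the role-dropping post, here is a minimal sketch of the group-role pattern and the two commands mentioned. The role, database, and password values are hypothetical, and REASSIGN OWNED BY / DROP OWNED BY must be run in every database where the role owns something:

```sql
-- Group-role pattern: a NOLOGIN role owns the objects; users are members.
CREATE ROLE app_owner NOLOGIN;
CREATE ROLE alice LOGIN PASSWORD 'changeme' IN ROLE app_owner;

-- Later, to remove alice cleanly, run in EACH database:
REASSIGN OWNED BY alice TO app_owner;  -- hand her objects to the group role
DROP OWNED BY alice;                   -- drop leftovers and revoke privileges

-- Databases (and tablespaces) owned by the role are handled separately:
ALTER DATABASE appdb OWNER TO app_owner;  -- "appdb" is hypothetical

-- Once no database has dependencies left:
DROP ROLE alice;
```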
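And here is a small MERGE sketch in the spirit of the examples in that post; the table and column names are hypothetical, and it assumes Postgres 15 or later. Note that the first matching WHEN clause wins, so the more specific conditions go first:

```sql
-- Conditional upsert: ignore zero-value transactions, otherwise update
-- an existing account, otherwise insert a new one.
MERGE INTO customer_account ca
USING recent_transactions t
      ON t.customer_id = ca.customer_id
WHEN MATCHED AND t.transaction_value = 0 THEN
    DO NOTHING
WHEN MATCHED THEN
    UPDATE SET balance = ca.balance + t.transaction_value
WHEN NOT MATCHED THEN
    INSERT (customer_id, balance)
    VALUES (t.customer_id, t.transaction_value);
```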
So if you're interested in that, definitely check out these pieces of content.

[00:07:24] The next piece of content is a YouTube video, and I was actually directed to it by one of the listeners of Scaling Postgres named Neil, who said, "hey, why don't you check out this YouTube channel?" It's the High Performance Programming YouTube channel, and on about a weekly basis, at least recently, they've been posting videos about Postgres. The video they released this week is a PgBouncer tutorial. They show graphically how it works, walk through the commands to set up PgBouncer, and even run a scaling test showing much higher performance with PgBouncer by avoiding the overhead of repeatedly creating and dropping connections when talking to Postgres directly (a minimal configuration sketch appears at the end of this section). So if you're interested in that, you can definitely check out this piece of content.

[00:08:13] The next set of posts are all from depesz.com, and they cover new features coming in PostgreSQL 15 around JSON, specifically a batch of functions supporting the SQL/JSON standard. "Waiting for PostgreSQL 15 – SQL/JSON constructors" covers constructing JSON with JSON, JSON_ARRAY, JSON_ARRAYAGG, JSON_OBJECT and JSON_OBJECTAGG. A lot of functions to do some of these things already exist, but these are the SQL-standard functions that have been added, and he walks through using them. The next post is on the IS JSON predicate, which lets you test IS JSON VALUE, IS JSON ARRAY, IS JSON OBJECT, IS JSON SCALAR, et cetera, and again he goes through some examples in the post. The third post is on SQL/JSON query functions, which let you query JSON with JSON_EXISTS, JSON_QUERY and JSON_VALUE, and he gives further examples of how to use these functions. So if you're interested in these new SQL-standard JSON capabilities coming to Postgres, definitely check out these blog posts (a few example one-liners appear at the end of this section).

The next piece of content: "Parallel Commit in Postgres Foreign Data Wrapper". This is from highgo.ca. This is another post about something coming in Postgres 15: the ability to do a parallel commit. If you're committing a transaction that touches multiple remote databases through the Postgres foreign data wrapper, a parallel commit option lets the commits to those servers proceed in parallel, and it looks like that gives you a bit of a speed boost. So if you're interested in that, you can definitely check out this blog post.

[00:10:04] The next piece of content: "How Postgres chooses which index to use for a query". This is from pganalyze.com, and it's an in-depth post that digs into the internals and a lot of source code to follow the path of how Postgres chooses which index to use for a query. So if you want to know how Postgres does it at a detailed level, based on the source code, this is definitely a blog post to check out.

[00:10:33] The next piece of content: "Queries in PostgreSQL: 3. Sequential scan". This is from postgrespro.com. This is another in-depth article about how sequential scans work within Postgres. Again, if you're looking for a more detailed look at how the internals of Postgres work, this is another excellent post to check out.
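For the PgBouncer tutorial mentioned at the top of this section, a minimal pgbouncer.ini sketch looks something like the following; the database name, auth file path, and pool sizes are assumptions for illustration, not values from the video:

```ini
; Minimal PgBouncer config sketch, assuming a local Postgres on 5432.
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction    ; reuse server connections per transaction
max_client_conn = 1000     ; many cheap client connections...
default_pool_size = 20     ; ...funneled into few real Postgres backends
```

Applications then connect to port 6432 instead of 5432, which is where the savings from not repeatedly creating Postgres connections come from.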
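The depesz posts above walk through the new SQL-standard JSON functions; here are a few hedged one-liners in their spirit, assuming a build with these patches applied:

```sql
-- Constructors
SELECT JSON_OBJECT('code' VALUE 'P123', 'title' VALUE 'Jaws');
SELECT JSON_ARRAY(1, 2, 'three');

-- Predicates
SELECT '{"a": 1}' IS JSON OBJECT;   -- true
SELECT '[1, 2, 3]' IS JSON ARRAY;   -- true
SELECT '42'        IS JSON SCALAR;  -- true

-- Query functions
SELECT JSON_EXISTS('{"a": 1}'::jsonb, '$.a');      -- true
SELECT JSON_VALUE('{"a": 1}'::jsonb, '$.a');       -- 1, as text
SELECT JSON_QUERY('{"a": [1, 2]}'::jsonb, '$.a');  -- [1, 2]
```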
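And for the postgres_fdw parallel commit feature, enabling it appears to be a per-server option; "remote1" here is a hypothetical foreign server name:

```sql
-- Requires Postgres 15+ on the local side; commits to multiple foreign
-- servers within one transaction can then proceed in parallel.
ALTER SERVER remote1 OPTIONS (ADD parallel_commit 'true');
```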
[00:10:55] The next piece of content: "Ultimate Guide to Citus Con: An Event for Postgres". This is from citusdata.com, and the post is about the upcoming Citus Con on April 12th, which of course covers their Citus product, but a lot of Postgres content as well. They go over the three keynotes, the multiple talks about Citus, and all of the different Postgres talks. So if you're interested in learning more about the content at this conference, you can definitely check out this blog post.

[00:11:24] The next piece of content: "Identify PostgreSQL Performance Bottlenecks With pg_stat_statements". They're talking about how this was recently added to their Timescale Cloud accounts, but it's just as applicable to plain Postgres. I always recommend clients enable and use pg_stat_statements, because it's the best way to find out which queries are consuming the most time in your database. They show some of the commands you can run to evaluate the different queries and the rows returned, how to identify long-running queries, and how to examine your cache hit ratio for different queries. Interestingly, they also discuss looking at queries with a high standard deviation. Looking at averages is one thing, but you might miss queries that, with some data, run incredibly long; in other words, there's a wide deviation between the average and the longest-running instance of that query. They have a technique where you divide the standard deviation of the execution time by the mean execution time to get a coefficient of variance, which is a way to identify queries that don't cluster around the mean but have a wide variance (a sketch of this query appears after the transcript). They also give an example with some slower queries. If you're interested in learning more, you can check out this blog post.

Next piece of content: "Installing Crunchy Postgres Operator version 5 on EKS". This is from bping.blogspot.com, and they're talking about installing the Crunchy Data Postgres operator version 5. This is for Kubernetes, and they install it in conjunction with Amazon's Elastic Kubernetes Service (EKS). So if you're interested in that install process, you can definitely check out this blog post.

[00:13:14] The next piece of content: the PostgreSQL Person of the Week is Claire Giordano. If you're interested in learning more about Claire and her contributions to Postgres, definitely check out this blog post.

And the last piece of content: we had another episode of the Rubber Duck Dev Show this past Wednesday. This one was on redundant infrastructure on the cheap: basically, how you can put together redundant servers inexpensively without necessarily relying on a platform-as-a-service to deliver your solution. If you're interested in that type of content, we welcome you to check out our show.

That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.
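As mentioned above, here is a sketch of the coefficient-of-variance idea from the pg_stat_statements post. The column names assume Postgres 13 or later, where they are mean_exec_time and stddev_exec_time (earlier versions use mean_time and stddev_time):

```sql
-- Queries whose runtime varies most relative to their average:
-- a high coefficient means the mean hides occasional very slow runs.
SELECT query,
       calls,
       mean_exec_time,
       stddev_exec_time,
       stddev_exec_time / NULLIF(mean_exec_time, 0) AS coeff_of_variance
FROM pg_stat_statements
ORDER BY coeff_of_variance DESC NULLS LAST
LIMIT 20;
```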

Other Episodes

Episode 311

April 14, 2024 00:18:52

1,400 Times Faster Max & Group By Performance | Scaling Postgres 311

In this episode of Scaling Postgres, we discuss a 1,400 times faster max and group by implementation, new quantization capabilities in pgvector, adaptive radix...

Episode 322

June 30, 2024 00:19:48

Postgres Sort Instability? | Scaling Postgres 322

In this episode of Scaling Postgres, we discuss an incremental sort instability issue with the Postgres planner, whether we should use foreign keys, how...

Episode 67

June 09, 2019 00:09:49

PGCon, Tips, PostGIS, Strings | Scaling Postgres 67

In this episode of Scaling Postgres, we discuss videos from PGCon, Postgres tips & tricks, PostGIS parallel performance and using strings. To get the...
