Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about 28 tips, lucky 13, autovacuum tuning and logical pitfalls. I'm Creston Jameson, and this is Scaling Postgres episode 129.
[00:00:22] All right, I hope you, your family and co-workers continue to do well. Our first piece of content is Postgres tips and tricks. This is from PG IO, and it's a set of about 28 different tips and tricks for Postgres, covering everything from SQL commands to various types of configuration. I'll just run through a few of them so you get a sense of it, but there are 28 of them, so I'm not going to cover everything. The first is that you can insert multiple rows in one statement by doing a multi-row INSERT, which is definitely better for performance because it lets you insert data much faster. Another is inserting a row and returning automatically assigned values, which is the RETURNING clause. Further down, you can see how you can use it with DELETE or UPDATE statements as well to return data from the changed rows. There's also how to auto-generate UUID primary keys, plus many more tips you can use when working with Postgres. So I highly encourage you to check out this blog post to see if there's a tip that could help you in your work.
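As a rough sketch of a couple of those tips (the table and column names here are just made up for illustration), a multi-row INSERT with RETURNING and a UUID primary key default could look like this:

    -- gen_random_uuid() needs the pgcrypto extension before Postgres 13; it's built in from 13 onward
    CREATE EXTENSION IF NOT EXISTS pgcrypto;

    CREATE TABLE users (
        id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        name text NOT NULL
    );

    -- insert multiple rows in one statement and get the generated ids back
    INSERT INTO users (name)
    VALUES ('alice'), ('bob'), ('carol')
    RETURNING id, name;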
[00:01:26] The next piece of content is why PostgreSQL 13 is a lucky release. This is from Crunchydata.com, and he's highlighting a few features that are coming with version 13 this fall. The first is "why are my indexes smaller?" Basically, with the deduplication in version 13, you can get a lot of space savings with B-tree indexes. He generates an index here and gives an example where Postgres 12 had a 28 megabyte index, whereas Postgres 13 had a 9.5 megabyte index. In addition, query times were 2x faster on Postgres 13, so definitely a benefit. Of course, once you upgrade to 13, you're going to need to reindex, probably concurrently, to get this deduplication and presumably some of these performance benefits. But as he mentions, your mileage may vary in terms of the performance gains. The next thing is speeding up vacuum or autovacuum, because now indexes can be vacuumed in parallel, which is a huge advantage because in my experience going through all of the indexes takes the longest amount of time when vacuum runs. The next is something I wasn't as familiar with: "sort less, sort quicker", which is the new incremental sort. It appears that when you're sorting by more than one column, it can use some efficiencies as long as there's an index on the first column you order by, giving you faster query execution. In the example here, in version 12 this particular query ran in 39 milliseconds, whereas in version 13 it ran in less than one millisecond, 0.82 milliseconds. So that's a huge performance change. Again, your mileage may vary, but if this is possible, that's definitely an interesting addition. Then of course he mentions some of the other enhancements: more CREATE STATISTICS options are possible, different partitioning benefits in terms of full support for logical replication and BEFORE triggers, hash aggregation used with aggregate functions and grouping sets, the libpq driver now supporting SCRAM authentication with channel binding, a datetime function in the JSONPath query language, gen_random_uuid() now usable without an extension (that's a good one), and the postgres_fdw foreign data wrapper now supporting certificate authentication. So a lot of big improvements coming in 13 that should be out sometime this fall, and if you're interested in learning more, you can check out this blog post.
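As a small sketch of a couple of those points (the index name is hypothetical), after upgrading to 13 you would rebuild an existing B-tree index so it can benefit from deduplication, and gen_random_uuid() now works with no extension installed:

    -- rebuild an existing index without blocking writes so it picks up B-tree deduplication
    REINDEX INDEX CONCURRENTLY orders_customer_id_idx;

    -- in Postgres 13 this works without installing pgcrypto or uuid-ossp
    SELECT gen_random_uuid();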
[00:03:53] Next piece of content is tuning PostgreSQL autovacuum. First they cover the tasks of autovacuum: clean up dead tuples left behind after UPDATE or DELETE operations; update the free space map that keeps track of free space in table blocks; update the visibility map that's required for index-only scans, which helps the system determine whether it actually needs to go look at the heap or can just stay on the index; freeze table rows so that the transaction ID counter can safely wrap around; and run ANALYZE so that your statistics are kept up to date. The first thing they mention is to make sure that nothing keeps autovacuum from reclaiming dead tuples, so basically make sure autovacuum is running. The main things that stop it are long-running transactions, so do what you can to avoid those. They mention two configuration changes you can make: idle_in_transaction_session_timeout, which you can set so that transactions sitting idle are canceled after a period of time, and the general statement_timeout, so any long-running statements are automatically canceled. The next step is tuning autovacuum to run faster. What I consider the primary way of doing that is autovacuum_vacuum_cost_limit, which you basically put somewhere up in the thousands to make sure more of the system is dedicated to doing vacuums, as long as it doesn't impact your query performance, of course. Then there's autovacuum_vacuum_cost_delay, which in newer versions defaults to two milliseconds, but you can of course adjust that; in older versions it was 20. The closer you get to zero, the faster autovacuum will vacuum tables. They mention a number of times in this post that you can set these not just at the cluster level but also at the table level, and here they show an example of a busy table where they change the vacuum cost delay to one millisecond. Now, I've heard opinions on both sides: one opinion is that you shouldn't really use table-level options because they make things more difficult to manage, and you should stick with the cluster-level options, whereas others say if you have particular tables that are exceptions, go ahead and change the parameters for those tables. So I'll leave it up to you to decide which you think is best. The next area is changing the workload so that fewer dead tuples are generated. Basically, can you make changes to your application so that you're updating or deleting rows less? For example, for deletions, could you partition and then just truncate or drop those partitions? Or if you're doing a bunch of updates, could you batch them, or alter your application so you're doing more inserts and fewer updates? They also mention that you can reduce the number of dead tuples by using HOT (heap-only tuple) updates: you set the fillfactor to something less than 100 and make sure the column you're updating is not indexed, so the update doesn't have to touch the index. In terms of tuning autovacuum for index-only scans, this again deals with the visibility map; basically he suggests setting autovacuum_vacuum_scale_factor relatively low so that the table gets vacuumed frequently.
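As a sketch of what those settings look like (the values and the busy_table name are just examples, not recommendations), the cluster-level and table-level changes could be expressed like this:

    -- cancel idle transactions and runaway statements so they don't block vacuum
    ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
    ALTER SYSTEM SET statement_timeout = '60s';

    -- let autovacuum do more work per cycle
    ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
    SELECT pg_reload_conf();

    -- per-table override for a particularly busy table, plus a lower fillfactor to leave room for HOT updates
    ALTER TABLE busy_table SET (autovacuum_vacuum_cost_delay = 1, fillfactor = 90);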
And then for inserts, in version 13 there's the new autovacuum_vacuum_insert_scale_factor, which helps a table get picked up for vacuuming more frequently so the visibility map is updated more often. This is for the case where you have predominantly append-only tables without a lot of updates and deletes. On lower versions, he says you can lower autovacuum_freeze_max_age so that, again, the table gets vacuumed more frequently. In terms of tuning autovacuum to avoid transaction wraparound problems, he mentions making sure that the anti-wraparound vacuum can freeze tuples in all tables, and again we're back to those very long transactions: you want to minimize them and follow the suggestions mentioned previously. To address it for updates, deletes and inserts, you just want to make sure autovacuum is running and there are no long-running transactions or things of that nature blocking autovacuum from doing its job. They also cover partitioning, and he says that if you are partitioning into a lot of tables, you probably want to consider having more autovacuum workers, because with more tables, more workers can get to them more frequently. Keep in mind, though, that the cost limit, which was the first thing mentioned here, is shared across all workers. So with three workers each one can do more work, but if you go to six, the limit is spread across all six, and if you go to nine, across all nine. The more workers you have, the less work each one can do, but if you're partitioning into a lot of tables that's okay, because there's less work to be done in each partition. And lastly, they cover tuning autoanalyze and some configuration changes you can make to have it analyze your tables more frequently. So overall, if you're looking to do some tuning of PostgreSQL's autovacuum, definitely check out this blog post from Cybertec-postgresql.com.
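For illustration (the events table name and values are hypothetical), those knobs could be set roughly like this:

    -- Postgres 13: have autovacuum visit an append-only table based on the number of inserts
    ALTER TABLE events SET (autovacuum_vacuum_insert_scale_factor = 0.01);

    -- more workers can help with many partitions, but the cost limit is shared across all of them;
    -- note this setting only takes effect after a server restart
    ALTER SYSTEM SET autovacuum_max_workers = 6;

    -- analyze more often so planner statistics stay current
    ALTER TABLE events SET (autovacuum_analyze_scale_factor = 0.02);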
[00:08:43] The next piece of content is Pitfalls and Quirks of Logical Replication in Postgres 12. This is from elephanttamer.net. Now, he's talking about logical replication, and while he doesn't dwell on the particular version, he covers some quirks you need to be aware of. The first thing he covers is disk space considerations, because if logical replication is failing for some reason, you could potentially run out of disk space on the primary, since logical replication uses replication slots. You can also run into this if you're using replication slots for physical, WAL-based replication, but with logical replication you pretty much have to use slots, so this disk usage risk exists. You need to have monitoring in place to handle that and to identify if you're accumulating too much WAL due to, say, an orphaned slot being present. The next thing they mention is that you basically need to have a replication slot in place before you create your subscription, otherwise the CREATE SUBSCRIPTION statement just hangs. So that slot needs to be present. Next, he says to be careful with a publication FOR ALL TABLES, because it doesn't really handle newly added tables well. In fact, he says, quote, it will cause the replication to stop as soon as you issue a CREATE TABLE statement. In addition, it will automatically include tables created by extensions, such as the spatial_ref_sys table he mentions for PostGIS, and also tables that don't have a primary key or replica identity, which poses a problem for logical replication. So you want to be very cautious, he says, with FOR ALL TABLES, and perhaps just publish individual tables. The next thing is to consider the tables you're replicating, because in the case of the spatial_ref_sys table, it's actually supposed to be independent per PostgreSQL instance, so you probably don't want to replicate that particular table. Next is to review your primary keys carefully and make sure each table you're replicating has a primary key or a replica identity. And the last recommendation is that after adding a new table, be sure to refresh your publication: for example, you do an ALTER PUBLICATION ... ADD TABLE on the primary server and then ALTER SUBSCRIPTION ... REFRESH PUBLICATION on your replica. So definitely some issues to be aware of if you are using logical replication, and if you want more details, check out this post.
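As a minimal sketch of those recommendations (publication, subscription and table names are all hypothetical, and the connection string is just an example), the flow looks roughly like this:

    -- on the publisher: publish specific tables rather than FOR ALL TABLES
    CREATE PUBLICATION my_pub FOR TABLE orders, customers;

    -- on the subscriber: create the subscription pointing at the publisher
    CREATE SUBSCRIPTION my_sub
        CONNECTION 'host=primary dbname=app user=replicator'
        PUBLICATION my_pub;

    -- after adding a new table, add it to the publication and refresh the subscription
    ALTER PUBLICATION my_pub ADD TABLE invoices;
    ALTER SUBSCRIPTION my_sub REFRESH PUBLICATION;

    -- a table without a primary key needs a replica identity to replicate updates and deletes
    ALTER TABLE audit_log REPLICA IDENTITY FULL;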
[00:11:03] The next piece of content is actually a YouTube video: the webinar Business Intelligence with Window Functions in PostgreSQL by Gianni Ciolli. This was originally given as a webinar in August 2019 and covers PostgreSQL 11, but I'm highlighting it because with anything involving window functions, I always like to remind myself of their features since I don't use them that frequently. So if you want to learn more about window functions, this YouTube video of the 2019 webinar was just posted.
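As a quick refresher sketch (the sales table and its columns are hypothetical), a typical window function query looks like this:

    -- running total of sales per region, ordered by month
    SELECT region,
           month,
           amount,
           sum(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
    FROM sales;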
[00:11:32] The next piece of content is authentication in Pgpool-II. This is from b-peng.blogspot.com. They're talking about how you set up authentication in Pgpool-II and how it works. Basically, a client first authenticates to Pgpool-II, and then Pgpool-II authenticates to the PostgreSQL server. They cover the different authentication methods that are supported, from trust and md5 to scram-sha-256, certificate, PAM and LDAP, and some of these are specific to particular Pgpool-II versions. Then it goes into some of the different settings and configuration you can make, and of course they suggest using either some sort of directory-based authentication or SCRAM for security reasons. So if you're interested in understanding how Pgpool-II handles authentication, definitely check out this blog post.
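This isn't from the post itself, and Pgpool-II has its own configuration files that the post covers, but as a sketch of the PostgreSQL side of using SCRAM (the role name and password are hypothetical):

    -- make sure new passwords are hashed with SCRAM rather than md5
    SET password_encryption = 'scram-sha-256';
    ALTER ROLE app_user WITH PASSWORD 'change-me';

    -- as a superuser, verify how the password is stored
    SELECT rolname, substring(rolpassword, 1, 14) FROM pg_authid WHERE rolname = 'app_user';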
[00:12:24] The next piece of content is introducing the Postgres Prometheus adapter. This is from Crunchydata.com. Prometheus is a systems and service monitoring system, so it collects metrics from the servers and services you want to measure. Now, it has a built-in database that it uses by default, but this blog post describes an adapter that allows you to send the data to Postgres instead. It shows the code for getting the adapter and how you can set it up and configure it so that Prometheus sends all of its data to Postgres, and it looks like it stores the data in partitioned tables. So if you use Prometheus and want to send its data to Postgres instead of its default storage, definitely check out this blog post from Crunchydata.com.
[00:13:11] The next piece of content is Who is spending WAL crazily. This is from Highgo.ca, and they're basically talking about trying to identify what is causing an increase in the amount or size of WAL records. This post essentially describes the pg_waldump tool, which helps you look at each of the different types of records that exist in a set of WAL files and tells you the record size and the percentage of each type. The blog post goes over how you can use the pg_waldump command, some different options they've set, and also describes some of the column output and the types of records that exist. Now, they don't cover everything, but they give a basic overview. So if you're interested in using pg_waldump to analyze your WAL files, definitely check out this blog post.
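For reference, and not taken from the post itself, a typical invocation that produces the per-record-type size and percentage breakdown looks something like this (the WAL segment name is just an example):

    pg_waldump --stats=record pg_wal/000000010000000000000010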
[00:14:00] The next piece of content is Advanced partition matching for partition-wise join. This is from 2ndQuadrant.com, and this is a feature coming in Postgres 13, where they made some enhancements to partition-wise joins. So if you want to learn more about the enhancements coming in version 13, definitely check out this blog post.
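As a quick sketch of the feature being enhanced (table names are hypothetical, and both tables are assumed to be partitioned on the join key), partition-wise join is controlled by a planner setting:

    -- off by default; lets the planner join matching partitions pair by pair
    SET enable_partitionwise_join = on;

    EXPLAIN
    SELECT *
    FROM orders o
    JOIN order_items i ON i.order_id = o.order_id;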
[00:14:22] Next piece of content is waiting for PostgreSQL 14: pg_stat_statements: track number of rows processed by some utility commands. Basically, for statements tracked in pg_stat_statements, it now retains how many rows were processed, so this is available in the output and you can see how many rows were returned by the statements being tracked. If you want to learn more about that, check out this blog post from Depesz.com.
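For example, a query along these lines shows the tracked statements that have processed the most rows (assuming the pg_stat_statements extension is installed):

    SELECT query, calls, rows
    FROM pg_stat_statements
    ORDER BY rows DESC
    LIMIT 10;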
[00:14:48] Next piece of content is preventing SQL injection attacks in Postgres. This is from Crunchydata.com. Now, this isn't necessarily Postgres-specific, though they do mention a few configuration changes you can make to help audit the statements coming through, like log_statement and log_min_error_statement. But a lot of the advice applies to any database system, and a lot of application frameworks already do these things for you. Still, it's a good post to give you an overview of SQL injection attacks, so if you're interested in learning more, definitely check out this blog post.
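As a simple illustration of the two ideas, logging statements server-side and passing user input as parameters rather than concatenating it into SQL text (the settings, table and values here are just examples):

    -- log data-modifying statements and the statement that caused an error, to help audit suspicious activity
    ALTER SYSTEM SET log_statement = 'mod';
    ALTER SYSTEM SET log_min_error_statement = 'error';
    SELECT pg_reload_conf();

    -- parameterized query instead of building the SQL string from user input
    PREPARE find_user (text) AS
        SELECT id, name FROM users WHERE name = $1;
    EXECUTE find_user('alice');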
[00:15:22] And the last piece of content is that the PostgreSQL person of the week is Renee Phillips. So if you're interested in learning more about Renee and her contributions to PostgreSQL, definitely check out this blog post. That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.