Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about monitoring progress, more SQL, replication slot failover, and Postgres contributors. I'm Creston Jameson, and this is Scaling Postgres, episode 198.
[00:00:24] All right, I hope you, your friends, family, and coworkers continue to do well. Our first piece of content is "Progress bar for Postgres queries – let's dive deeper." This is from postgres.ai, and he's talking about a previous post that he read called "Query Progress Bar." Now, this is something I considered putting on Scaling Postgres, but it was actually the quote he mentions here, "don't use this in prod," that made me decide not to showcase it last week. Plus, there was a ton of content because a couple of weeks had accumulated over the holiday break. But he reviewed it and did his own take on it, talking about how to monitor Postgres.

The first thing to consider is that if you're going to be using something like this against production, you need to split your updates, deletes, or inserts into batches, because you don't want any single statement to take too long and impact vacuum or locking. The gauge he likes to use is one second: whatever number of records you choose to change, keep each batch to about one second to hopefully avoid excessive locking, and also to help vacuum, although vacuum may still have difficulty keeping up depending on how quickly you run the batches. Sometimes when I'm doing mass updates on a very large database, terabytes in size, it's important to periodically run vacuums and analyzes during any sort of automated process. But anyway, he takes the examples from Brian's post that he mentions here and devises his own way of doing this type of progress bar in a batched fashion, and he shows an animated demonstration of how it works.

Now, going forward, I'm not sure I would use this exact technique, because historically I've either created Bash scripts to do it, so I have an existing template I use to make a lot of data changes in batches, or dropped down to the language the application is using, say Python or Ruby, and used that to drive the batching while leaning on SQL for the actual deleting and updating. Or, with more recent versions that support procedures, you could devise a procedure to do it, or even call a PL/pgSQL function to do certain things; there's a sketch of that approach below. But this is an interesting way to do it, essentially just using a CTE to give you a little mini progress bar.

He does mention something that could be useful for non-production usage, and that is pg_query_state, which is, I think, a lesser-known extension, but he says you can use it if you have your own Postgres instance, so that's a possibility. He also mentions monitoring DDL queries. This is super important when you're doing things like creating indexes concurrently, particularly on terabyte-sized tables: they can take a very long time, and being able to track their status can be important. Since version 12 there is the pg_stat_progress_create_index view, and he has what looks to be a very useful query to track that and give you its status. I'm less worried about running this one in production because it's just doing pure selects against system views, essentially that view and pg_stat_activity, to report the progress, and it just uses a watch command to update the status on a periodic basis. So this could be super useful.
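Going back to the batching idea for a moment, here is a minimal sketch of a batched delete written as a procedure. This is my own illustration, not the post's code, and the old_events table, created_at column, and retention interval are all hypothetical. Each iteration deletes a bounded chunk, commits so vacuum can reclaim space as you go, and reports progress with RAISE NOTICE:

    -- Requires Postgres 11+ (procedures can COMMIT between batches).
    CREATE OR REPLACE PROCEDURE delete_in_batches(batch_size int DEFAULT 5000)
    LANGUAGE plpgsql
    AS $$
    DECLARE
      deleted bigint;
      total   bigint := 0;
    BEGIN
      LOOP
        -- Delete one bounded chunk; selecting ctids keeps each pass cheap.
        DELETE FROM old_events
        WHERE ctid IN (
          SELECT ctid FROM old_events
          WHERE created_at < now() - interval '1 year'
          LIMIT batch_size
        );
        GET DIAGNOSTICS deleted = ROW_COUNT;
        total := total + deleted;
        RAISE NOTICE 'deleted % rows so far', total;
        EXIT WHEN deleted = 0;
        COMMIT;  -- release locks and let vacuum keep up between batches
      END LOOP;
    END;
    $$;

    -- Run from psql; tune batch_size until each batch takes about one second.
    CALL delete_in_batches(10000);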
And he mentions some other long-running commands you can monitor and links to the progress-reporting documentation, because they continue to add more capabilities for monitoring different jobs in Postgres. So I found this to be a super useful post, and if you're interested, you can go ahead and check it out.
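For the create index case specifically, a query along these lines (my own sketch, not necessarily the one from the post) shows how far a CREATE INDEX has progressed on Postgres 12 or later; in psql you can follow it with \watch:

    SELECT
      p.pid,
      p.phase,
      round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) AS pct_blocks,
      round(100.0 * p.tuples_done / nullif(p.tuples_total, 0), 1) AS pct_tuples,
      a.query
    FROM pg_stat_progress_create_index p
    JOIN pg_stat_activity a USING (pid);

    -- In psql, re-run it every 2 seconds:
    -- \watch 2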
[00:04:31] Related to that is the next piece of content, which is "Quick snapshot on Postgres active SQL from pg_stat_statements." They also have a companion post looking at a snapshot of pg_stat_activity. Basically, they use a CTE and some techniques to give you a 30-second snapshot of the active queries running against the system, and they do it for pg_stat_activity and pg_stat_statements in each of these two posts. Now, what's interesting is that version 14 offers a query ID, so I wonder if some of these queries could be made easier by using the query ID to link information between pg_stat_activity and pg_stat_statements. There have been some previous posts on the query ID that we showcased on a previous episode of Scaling Postgres, so I might prefer that technique to some of these examples here, but this is available if you want to learn more.
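As a rough illustration of that idea, and assuming Postgres 14 or later with the pg_stat_statements extension installed and compute_query_id enabled (it defaults to auto, which turns it on when pg_stat_statements is loaded), a join like this links a live session to its cumulative statistics:

    SELECT
      a.pid,
      a.state,
      now() - a.query_start AS running_for,
      s.calls,
      round(s.mean_exec_time::numeric, 2) AS mean_ms,
      a.query
    FROM pg_stat_activity a
    JOIN pg_stat_statements s
      ON  s.queryid = a.query_id
      AND s.dbid    = a.datid      -- stats are tracked per database
      AND s.userid  = a.usesysid   -- and per user
    WHERE a.state = 'active'
      AND a.pid <> pg_backend_pid();  -- skip this monitoring session itself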
[00:05:29] The next piece of content is "More SQL, Less Code With PostgreSQL." This is from pgdash.io, and they're showcasing different SQL you can use to minimize the amount of code you have to write in your app, or just to send less traffic back and forth between your app and your database.

The first feature they showcase is an upsert: basically, try to do an insert, and if the record already exists, update it instead. This is a capability that has existed since Postgres 9.5. So if you're looking to insert a row, or update it if it already exists, definitely use upserts, because it minimizes network traffic and you can do it in a single statement, rather than doing a select, checking whether the value is there, and then doing an insert while trying to avoid race conditions.

The next one is INSERT ... RETURNING. When you do an insert, you can ask it to return certain values. One of the big use cases is auto-generated IDs, such as from a serial or identity column: it returns the IDs that were inserted. This can be super useful, and again it minimizes round trips, because you send one insert statement but get information back about the inserted rows without a follow-up select.

The next one is ANY with a set. If you have a query where you're using an IN clause to check different values, there's another way to do it: use ANY and pass an array of values within the parentheses, and you can put as many values as you want in the array. Although they do say this is better for a limited set of values; if you're going to do this with a ton of values, you may want to use a different technique.

The next one is moving rows between tables. This is something you can do in one go using a CTE, by deleting rows from one table and inserting those deleted rows into a new table, and they show you a way of doing that here.

The next feature they mention is COALESCE, which lets you replace null values with some other value: it could be a blank for a string, a zero for an integer, or any other value. So it makes handling nulls easier, and it's a function that exists right in the database.

The next feature is CASE statements, which essentially give you if/then logic for the values the database returns, so you can choose to do this in SQL rather than your own code. Next is support for UNION, combining the result sets of different queries into one. Then there is SELECT DISTINCT ON, which can be convenient when you only want to return one row from a joined table, for example getting the most recent purchase for each customer. And the last thing they mention is using ordinal numbers in GROUP BY or ORDER BY clauses; especially when you have something like a CASE statement, using the number is much more convenient than repeating the exact CASE expression from the select list. A couple of these patterns together look something like the sketch below, and if you want to learn more about these techniques, definitely check out this blog post.
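To make a few of these concrete, here is a minimal sketch combining an upsert with RETURNING, ANY with an array, and a CTE that moves rows. The users and archived_users tables and their columns are hypothetical examples of mine, not the post's:

    -- Upsert with RETURNING: insert or update in one statement and get the
    -- generated id back without a follow-up select.
    -- Assumes a unique constraint on users(email).
    INSERT INTO users (email, name)
    VALUES ('ada@example.com', 'Ada')
    ON CONFLICT (email)
    DO UPDATE SET name = EXCLUDED.name
    RETURNING id;

    -- ANY with an array instead of a long IN list.
    SELECT * FROM users
    WHERE id = ANY(ARRAY[1, 2, 3]);

    -- Move rows between tables in one statement with a CTE.
    -- Assumes archived_users has the same columns as users.
    WITH moved AS (
      DELETE FROM users
      WHERE last_seen < now() - interval '2 years'
      RETURNING *
    )
    INSERT INTO archived_users
    SELECT * FROM moved;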
The next piece of content is "How Patroni Addresses the Problem of the Logical Replication Slot Failover in a PostgreSQL Cluster." Now, I chose this post because this tends to be a big problem with replication slots: a replication slot that exists on a primary, along with its state, does not get copied over to a replica when that replica is promoted. I think this is a pretty big feature gap for Postgres. It can be a problem where you have a failover event but forget to recreate the replication slot, or you don't have a process in place to recreate it and bring it to the right position. Now, ideally this feature would exist in Postgres itself, and I hope it will eventually get there, but in the meantime, the Patroni project, which does high availability, has devised a way to handle this replication slot failover. Basically, Patroni manages the replication slots for Postgres: when you say you want to create a replication slot, it creates it on the primary, but it also creates it at the same time on each of the replicas, and then it has a process that keeps each replica's slot at the correct location. So when the slot advances on the primary, Patroni advances it on the replicas, and before a failover it makes sure those are in the same state. So you can presumably seamlessly transition from a primary to a replica, that replica will have a slot in the proper location, and you essentially won't lose your replication slot. Definitely good to have, but I'd still prefer this to be a feature added to Postgres in the future. If you want to learn more, you can check out this post from Percona.com.
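If you want to eyeball whether a slot on a replica is keeping up with the primary, you can compare pg_replication_slots on both nodes. This is just a generic sketch for checking slot state, not Patroni's internal mechanism:

    -- Run on both the primary and each replica and compare positions.
    SELECT slot_name,
           slot_type,
           active,
           restart_lsn,
           confirmed_flush_lsn  -- logical slots only; NULL for physical slots
    FROM pg_replication_slots;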
The next piece of content is "Who Contributed to PostgreSQL Development in 2020 and 2021?" This is from Robert Haas at rhaas.blogspot.com, and he presents a list of all the contributors with lines changed, the adjusted percentage of those lines, and the number of commits across 2020 and 2021. So thank you to everyone who has contributed to Postgres and made it what it is today. If you want to look at some of the contributors, you can definitely check out this post.

[00:11:12] The next piece of content is "A Postgres Primer for Oracle DBAs." This is from Crunchydata.com, and it's a pretty simple post for those on Oracle looking to move to Postgres, covering the top-line things you should be thinking about. They compare the differences in nomenclature: tablespaces, users versus roles, the procedural language, how you do an export and an import, and how you do graphical administration, although there are multiple tools and pgAdmin is just one. They also cover important things to think about, such as how you handle backups and the different ways to do it, including specific tools like pgBackRest, as well as some considerations around security and user access. So if you are on Oracle and looking to migrate to Postgres, you may find interest in this post.

The next piece of content is "Why give a conference talk? Or why give a Postgres talk?" This is from Citusdata.com, and there is an upcoming event, Citus Con: An Event for Postgres. They're looking for new speakers, and they're advocating for why it would be important to give a talk, because it could prove beneficial for both you and the organization you support. They give some ideas for subject topics you may want to cover, and they highly encourage you, even if you're new to Postgres, to give a talk about something new you've learned and to submit a proposal. So if you're interested in potentially giving a talk and want to learn more about how to go about it, you can definitely check out this blog post.

The next piece of content is "PostGIS 3.2 New and Improved." This is from Crunchydata.com. They're talking about the new features in the just-released PostGIS 3.2, which, to my understanding, works best on Postgres 14. Specifically, they discuss improvements to the raster algorithms, raster cloud access, faster and better validity checking, and faster GiST index builds. If you're interested in learning more about that, you can check out this blog post.
[00:13:18] The next piece of content is "Creating a Standby Cluster With the Percona Distribution for PostgreSQL Operator." This is from Percona.com, and they're talking about using their operator, which I believe is a Kubernetes operator, to set up a standby disaster recovery site that is separate from your primary cluster. They walk through the process of setting that up using the operator, so if you're interested in that, you can check out this blog post.
[00:13:45] The next piece of content: the PostgreSQL Person of the Week is Emil Shkolnik. If you're interested in learning more about Emil and his contributions to Postgres, definitely check out this blog post.

And the last piece of content: we did have an episode of the Rubber Duck Dev Show this past Wednesday, where we talked about the JavaScript options that are available in Rails 7. So if you're interested in a long-form, podcast-discussion format, you can definitely check out this content. This Wednesday's upcoming show is actually going to be about the dark side of open source, so if that content interests you, we welcome you to join our show.
[00:14:21] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.