Episode Transcript
[00:00:00] You. In this episode of Scaling Postgres, we talk about psql, G Exec, delete, duplicates postgres, podcast and puny to powerful I'm Creston Jameson. And this is scaling postgres episode 225.
[00:00:20] One all right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is Gxec in psql PostgreSQL power user practice. This is from CyberTech postgresql.com. And they're talking about using G exec. So when you use psql, it has a number of different slash commands. One is G, which just repeats the command that was just executed. It runs it again, but G Exec actually takes the output from the previous command and actually runs it as statements within psql. So that's what this post explores. So the first exercise is if you're going to be wanting to craft commands to run within psql, you're going to need to craft those statements. So the first example here he's using is the double pipe operator, which is basically the concatenate two things together. So here he grabs a number of different tables just by selecting from PG tables. And then he writes a statement just as a literal, concatenates it to the table name that's being returned up here, and then concatenates the remainder of the command, including, of course, the semicolon that produces an output as you see here. Now he also talks about a safer way to use this other than that, but we'll continue with what he's doing at this point. And then you simply call G Exec on that a psql command and it will actually run the statements from the last command's output. So it runs this command and this command, and this command and this command. As you can see down here, all the grants were run. Now throughout this post, he's using the double pipe operator. But if you are unsure of the input you're going to be passing to this, you are vulnerable to an SQL injection attack. So you want to use some different functions in order to build out these commands. In this particular example, he could have used a quote Ident, because this is an Identifier he's passing in essentially. Or you could use format and it has a number of different params you can use. Like this is the Identifier param. There's also just being able to put a literal string for literal string replacement. And he uses the format example here where he's writing out the full query and then within it saying, okay, this is an Identifier that needs to be replaced and it's going to be replaced with the table name. So this should be familiar to you if you've used other interfaces that protect against SQL injection. Basically you have a replacement variable in here that gets replaced by what you want to pass in. So that's basically what Geexec does. Now he further describes some different exercises you can do. One example, if you actually need to run certain commands against the tables, you're identifying in this case, you can actually use a cross join to explicitly stated values here and here. He's able to say, I want to do an insert, update and delete for each table and then you can actually build that out. Again, you should use the format or quote Ident functions instead, but it actually says Grant Insert or Grant update or Grant Delete on the particular table of interest. Now, he also mentions here you could also add double quotes for different purposes if they're required. But again, you shouldn't necessarily use the concatenation operator. It's easier to use format. And actually when you look at the postgres documentation, I'll include this link as well. When you get down to the format function and they're talking about the SQL Identifier replacement, it actually double quotes it if necessary. So you don't even have to worry about explicitly doing double quotes within it if you use format because it'll do it automatically. But if you want to learn more about that, definitely check out this blog post.
[00:04:19] The next piece of content delete duplicate rows. This is from Sqlfordevs IO and this is a slightly different take on how you can delete some duplicate rows. Now he gives both a MySQL example and a PostgreSQL example, but he first talks about usually how people handle this is they do a group by and then they give the men ID to retain and delete everything other than that min ID maybe using some sub queries to do it. And I myself have done this frequently, but this is a pretty elegant way to do it, using a window function to do it. So basically he has a CTE where he is identifying all of the duplicates. He pulls back the ID, he wants to print out the row number as a part of the window function and he does it over these partitions. So every distinct first name, last name, email will get its own row number. And in the case where they're identical, they'll get however many duplicates exist and then he orders it by age. Now, if this was say the created at for that record, or if it's the age of the record, basically you would want to order it by which record you want to retain. So if you wanted to keep the oldest record, you probably wouldn't do it by descending. You would want to do it ascending by the created at date, for example. But that'll give you all the duplicates and show you how many duplicates there are. And then you can use this delayed statement. So delete from the table using duplicates, using the CTE as the reference point and saying where the ID equals the ID and the duplicates row number is greater than one. So you're basically going to keep the first one that you've identified by this ordering, so however you want to do that. So I found this to be a pretty great way to delete duplicates and if you want to learn more, you can check out this blog post.
[00:06:04] The Next piece of Content introducing a new postgres podcast. This is from Pgmuster.com and he's announcing the start of Postgres FM, which is also going to be posted on Postgres TV. I believe this YouTube channel. So this looks to be a weekly format, roughly 30 minutes for each episode. So if you're interested, I believe they have an episode or two up so you can go ahead and get started.
[00:06:31] The next piece of content is actually a YouTube video and it's RailsConf 2022. Punitive powerful PostgreSQL Rails Apps by Andrew Atkinson. Now, this was from a Ruby on Rails conference, but it all covers areas of postgres performance, particularly talking about doing safe migrations and how to handle connection pooling, how to do query optimization using things such as PGE stat statements and enabling that for your database, as well as index maintenance such as vacuum and following up with partitioning. Now this is more beginner level information he's covering, but it's great for developers and I thought this was a pretty good talk about those subjects. So if you're interested, you can check that out.
[00:07:17] The Next Piece of Content five minutes of postgres episode 27 waiting for postgres 16. Make subquery alias optional in from clause. This is from Pginalyze.com. Now this is something coming in postgres 16, which is actually about a year and three months away. But this particular feature many, many people will appreciate, as Luca shows the question about, hey, why doesn't this work? When you don't have an alias. When doing a select from a sub query shows up over 200,000 times on stack overflow. And I myself do this all the time, where I'm selecting from a sub query and I forget the alias and it gives me an error and I have to put the alias in. Well, basically it looks like in postgres 16 you won't have to do that anymore because it has been merged in as a patch. But if you want to learn more about that and what you can look forward to, definitely check out this piece of content.
[00:08:12] The next piece of content is another YouTube video. This one is also from RailsConf 2022 and it's a Rails performance guidebook from zero to 1 billion requests per day. Now, this wasn't all about databases. There is some general performance topics covered here, but there are a few database items covered. But this is from a performance engineer at Zendesk and basically they have billions and billions of tickets and terabytes and terabytes of data. So I always find it interesting getting insight into how these larger applications run so you may find it of interest as well.
[00:08:52] The Next Piece of Content PostgreSQL Performance Puzzle this is from Ardentperf.com and this is actually a puzzle that someone proposed. They set up a table and with it just a few columns and did a query and got dramatically different results for data that one would potentially expect to be similar. Now, I'm not going to give anything away because you can actually go through the puzzle, see if you can figure it out, and at the end of the post, they do give the explanation. So if you're interested in that type of performance puzzle, check out this blog post next piece of content. The PostgreSQL Person of the Week is Laura Ricci. If you're interested in learning more about Laura and her contributions to Postgres, definitely check out this blog post.
[00:09:38] And the last episode is we did have another episode of The Rubber Duck Dev show this past Wednesday evening. This one was with Andrew Mason and Colin Gilbert of Ruby Radar. So if you're interested in a developer discussion, we definitely welcome you to check out our show.
[00:09:56] That does it. For this episode of Scaling Postgres, you can get links to all the content mentioned in the show notes. Be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or itunes. Thanks.