Episode Transcript
[00:00:00] In this episode of Scaling Postgres, we talk about useful features, a monthly blog event, Postgres 15 anticipation, and generic plans. I'm Creston Jamison, and this is Scaling Postgres, episode 233.
[00:00:23] Alright, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "Six simple and useful PostgreSQL features that I wish I knew when I started". This is from it.badykov.com, and he's running through a set of, well, a little more than six features that he likes using in Postgres. The first one is identity columns. Historically, when you wanted an auto-incrementing integer, you would use serial, which creates a sequence that auto-increments, and that typically populates the ID column of a table, its primary key. But the SQL standard is actually identity, and you can create the column in a standardized way using GENERATED ALWAYS AS IDENTITY. It does the same thing in the background, creating a sequence, and it gives you an auto-incrementing integer. Now, this has some benefits compared to serial. Number one, it is part of the SQL standard. Number two, with serial you can insert values outside of what the sequence would generate, whereas if you try to do that with an identity column, it generates an error saying, basically, you can't insert your own IDs because the column is GENERATED ALWAYS. However, he does show an insert that lets you override what the system is doing and insert your own values. The other advantage he mentions is that it's easier to grant privileges for someone to insert into the table: with serial you also have to grant usage on the underlying sequence, whereas with identity you don't. The next ones he covers are COALESCE and NULLIF. COALESCE is a function that returns the first value that is not null, so in his sequence of values it returns the first non-null value, which is one. NULLIF returns null if the two values you pass in are equal; otherwise it returns the first value. So NULLIF(1, 1) returns null, and NULLIF(1, 0) returns one. Then he shows that you can combine NULLIF and COALESCE to return the first value that is not zero, with an example of doing that. The next area he covers is grouping. GROUP BY normally enables you to group your rows together in the output, but when you use GROUPING SETS, it groups each set of values separately. For example, he's using GROUPING SETS (1, 2), which refer to the columns above, country and product name, and you can see it produces groups for each set of values. ROLLUP is different in that it also gives you totals for each value you're grouping, so you can see the total for Argentina, the total for Germany, the total for the USA, as well as the total for everything. It's basically something you don't have to calculate in the application; you can do a rollup in SQL. The next one he covers is CUBE, which is like ROLLUP but gives you all the different permutations of ways of adding up the numbers; essentially, it includes the second group's totals as well. The next area he covers is common table expressions, or WITH clauses. Basically, this is a way to do subqueries while being much more readable than a subquery typically is.
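To make a few of these concrete, here's a rough sketch using a hypothetical sales table (my own illustration, not taken from the post):

    -- SQL-standard identity column instead of serial
    CREATE TABLE sales (
        id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        country text NOT NULL,
        product_name text NOT NULL,
        amount numeric NOT NULL
    );

    -- Supplying your own id fails unless you explicitly override it
    INSERT INTO sales (id, country, product_name, amount)
        VALUES (100, 'USA', 'Widget', 10);            -- ERROR
    INSERT INTO sales (id, country, product_name, amount)
        OVERRIDING SYSTEM VALUE
        VALUES (100, 'USA', 'Widget', 10);            -- allowed

    -- NULLIF turns a sentinel into NULL; COALESCE picks the first
    -- non-null value, so combined they return the first non-zero value
    SELECT COALESCE(NULLIF(0, 0), NULLIF(5, 0), 10);  -- returns 5

    -- GROUPING SETS aggregates per listed set of columns;
    -- ROLLUP also adds subtotals and a grand total
    SELECT country, product_name, sum(amount)
    FROM sales
    GROUP BY GROUPING SETS ((country), (product_name));

    SELECT country, product_name, sum(amount)
    FROM sales
    GROUP BY ROLLUP (country, product_name);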
Next he covers domains, which are separate objects you create that represent a defined column type, with their own nulls, defaults, and constraints. For example, the Postgres documentation uses postal codes as one: maybe you want certain constraints around what values are viable to go in a postal code. Once that domain is created, you can use it as a data type in any table within that particular database. Finally, he mentions the USING keyword: as long as your join columns are named identically between two tables, you can do a join with USING. And those are the things he wishes he knew when he was getting started with Postgres. Feel free to check it out if you're interested.
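As a quick sketch of those last two features (the postal code domain follows the example in the Postgres documentation; the orders and customers tables are hypothetical):

    -- A domain wraps a base type with reusable constraints
    CREATE DOMAIN us_postal_code AS text
        CHECK (VALUE ~ '^\d{5}$' OR VALUE ~ '^\d{5}-\d{4}$');

    CREATE TABLE addresses (
        id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        postal_code us_postal_code NOT NULL
    );

    -- USING joins on identically named columns and lists the
    -- shared column only once in the output
    SELECT *
    FROM orders
    JOIN customers USING (customer_id);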
[00:04:28] Next piece of content: "PGSQL Phriday: a monthly blog event for the PostgreSQL community". This is from softwareandbooz.com, and he's talking about starting a monthly event where the community writes blog posts on a particular topic. Basically, a volunteer who runs a blog is chosen as host, and they invite someone to participate and create a blog post. Now, he says invite, but I'm wondering if some of the blog owners are going to write the kickoff posts themselves. They post it, and then the community posts about that same topic, referencing the host's original post. So we'll have to see how this works, but he's already lined up a number of hosts to get started, and the first two proposed topics are "Why PostgreSQL?" and "Upgrade PostgreSQL". So I'll be keeping an eye on this to see what comes of it. The next piece of content is actually a YouTube video, "PostgreSQL 15 Anticipation", on the San Francisco Bay Area PostgreSQL Users Group channel. This is a presentation by Christophe Pettus. It's about 34 minutes in length, but it's a great overview of many of the different features coming in Postgres 15. If you haven't been keeping up with them as we get closer to the release date, I highly encourage you to check out this presentation.
[00:05:52] Next piece of content: "Explain from pg_stat_statements normalized queries: how to always get the generic plan in Postgres". This is from dev.to in the Yugabyte section, and he's talking about how, when you're working with pg_stat_statements, all of the queries are essentially normalized: the values that were passed in are converted into parameter placeholders, so you can't just take a query and run EXPLAIN on it, because you get an error like "there is no parameter $1". Now, you can PREPARE the query and then see what data types it expects by querying the pg_prepared_statements system view. But if you try to EXPLAIN EXECUTE that statement, it won't necessarily give you the generic plan. The key is that you actually need to have it generate a generic plan, and apparently you can run it multiple times to eventually get one to come out. But as of Postgres 12 there's a way to set plan_cache_mode to force_generic_plan. Now, there's a lot more detail in this particular post, and I encourage you to check out "Five minutes of Postgres episode 35: How to run EXPLAIN on normalized queries from pg_stat_statements using plan_cache_mode", where Lukas goes into much more detail about the previous post and adds in more information and insight into how this works.
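As a rough sketch of the technique with a made-up query (the table and parameter are placeholders, not from the post):

    -- Turn the normalized query into a prepared statement
    PREPARE stmt (int) AS
        SELECT * FROM orders WHERE customer_id = $1;

    -- Check which parameter types the statement expects
    SELECT name, parameter_types FROM pg_prepared_statements;

    -- Postgres 12+: skip the custom-plan attempts and go straight
    -- to the generic plan
    SET plan_cache_mode = force_generic_plan;
    EXPLAIN EXECUTE stmt(1);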
[00:07:11] Next piece of content: "What is an inner join in SQL? And what is an outer join?" This is from cybertec-postgresql.com, and they're covering the different types of joins. They talk about explicit versus implicit joins. With an implicit join you're not specifying a JOIN; you just list the tables, and the WHERE clause actually defines the join relationship. Whereas with explicit joins, you're literally saying JOIN ON the particular columns of interest. He says the plans for these are mostly the same, although he does mention that once you get up to more than eight or nine tables joined together, you start seeing some differences between them, but for the most part they're about the same. Then he covers outer joins: there are left outer joins, which show the values in A even when they don't exist in B, and right outer joins, which show the values in B even when they don't exist in A. He also talks about full joins, which show all the values while matching up the rows that do match. But then he mentions common mistakes and errors, where someone puts what should be a WHERE-clause filter within the ON section of a join. I haven't seen someone do this, but apparently someone has, and it gives you very unexpected results. So you definitely don't want to put a filter there; if you want something like that, you probably want an inner join or want the condition in the WHERE clause, because presumably what you're looking for is just that row. Then he closes out the post talking about semi-joins, which basically use WHERE an ID is IN, or NOT IN, the values from a subquery. So if you want to learn more, definitely check out this blog post.
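Here's a small illustration of those points, with hypothetical tables a and b (my own sketch, not from the post):

    -- Implicit join: the WHERE clause carries the join condition
    SELECT * FROM a, b WHERE a.id = b.id;

    -- Explicit join: usually produces the same plan
    SELECT * FROM a JOIN b ON a.id = b.id;

    -- The ON-versus-WHERE trap with outer joins: a filter in ON only
    -- restricts which b rows match, so every a row still comes back
    -- (with NULLs for b); a filter in WHERE actually removes rows
    SELECT * FROM a LEFT JOIN b ON a.id = b.id AND b.flag;
    SELECT * FROM a LEFT JOIN b ON a.id = b.id WHERE b.flag;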
[00:08:50] The next piece of content: "PostgreSQL: JSON versus JSONB". This is from proopensource.it, and they're talking about the difference between the JSON data type and the JSONB data type. Basically, JSON is great if you want to retain the full format and spacing, including duplicate keys, so everything is identical and you're just storing that payload in Postgres. JSONB is great when you're building your own JSON and you want it to be fast and indexable: because JSONB is a binary format, it removes duplicate keys and, as I said, enables efficient indexing using GIN or GiST indexes. This blog post covers a number of those issues, as well as other areas such as the different types of data that are allowed within JSON. So if you want to learn more about that, you can check out this blog post.
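A minimal sketch of the difference in practice (hypothetical events table, not from the post):

    CREATE TABLE events (
        raw json,    -- preserves spacing, key order, duplicate keys
        doc jsonb    -- binary form: deduplicated keys, indexable
    );

    -- A GIN index enables efficient containment queries on jsonb
    CREATE INDEX events_doc_idx ON events USING gin (doc);

    SELECT * FROM events WHERE doc @> '{"type": "click"}';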
[00:09:46] The next piece of content: "Working with snapshots in PostgreSQL". This is from percona.com, and specifically they're talking about exporting transaction snapshots. This is a capability that was developed so that when you do a pg_dump to the directory format, you can specify the number of jobs that work on it, so multiple processes export your data in parallel. It does this by exporting a transaction snapshot that the different processes can share. He shows an example of how this essentially works in Postgres, or at least a rough implementation of it (see the sketch after this section). So if you want to learn more about that, you can check out this blog post. Next piece of content: "Generate unlimited crypto using Postgres". This is from crunchydata.com. The title is a bit of a misnomer: it's basically about the pgcrypto extension and all the different ways you can do encryption-related activities within Postgres, from symmetric encryption, to public key encryption where you actually have a secret key and a public key, to working with OpenPGP, as well as generating and testing passwords and generating digests. Now, for a lot of this activity I tend to use my application framework. I usually don't like giving Postgres this responsibility or storing keys anywhere on the database; I usually keep them in the application, and the data gets encrypted and then put into Postgres. But Postgres definitely has the capability with the pgcrypto extension, and if you want to learn how to do that, definitely check out this blog post. Next piece of content: "Distributed Postgres goes full open source with Citus: why, what and how". This is from citusdata.com, and this post goes into a little more depth about why they decided to go fully open source with their Citus extension. Primarily, it was because their enterprise licensing got diminished when they introduced the hosted Citus service on Azure, and that service is now what funds the work, as opposed to enterprise licensing. So they released all the features open source, which is great, and if you want to learn more details about that, I encourage you to check out this blog post.
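To illustrate the snapshot export mentioned in the Percona post, here's a minimal sketch of the mechanism (the snapshot id is just an example value):

    -- Session 1: open a transaction and export its snapshot
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SELECT pg_export_snapshot();   -- returns e.g. '00000003-0000001B-1'

    -- Session 2: adopt that snapshot before running any query,
    -- so both sessions see exactly the same data
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-0000001B-1';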
[00:12:00] Next piece of content: "Configuring and managing VIP for Pgpool-II on AWS". This is from b-peng.blogspot.com, and by VIP they mean a virtual IP address. Essentially, the way you do this in AWS is with Elastic IP addresses, so they talk about how you can run Pgpool-II on AWS using this capability.
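A heavily simplified sketch of the general idea, assuming the AWS CLI is installed on the Pgpool-II host (all ids are placeholders; the post itself has the authoritative configuration):

    # pgpool.conf: instead of a subnet-level virtual IP, reattach an
    # Elastic IP during failover via the AWS CLI
    if_up_cmd = '/usr/local/bin/aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-0123456789abcdef0'
    if_down_cmd = '/usr/local/bin/aws ec2 disassociate-address --association-id eipassoc-0123456789abcdef0'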
[00:12:22] The next episode of Postgres FM has been published, and it's a query macroanalysis intro. Basically, this is looking at queries and their different runtimes from, probably, a 10,000-foot view, at the whole system level: not an individual query, but all the queries and how they're performing. So you can definitely listen to this episode, or click here to watch it on YouTube.
[00:12:47] Next piece of content: the PostgreSQL person of the week is Karen Jex. If you're interested in learning more about Karen and her contributions to Postgres, definitely check out this blog post. And the last piece of content: we did have another episode of the Rubber Duck Dev Show this past Wednesday evening. This one was on engineering managers versus product managers and how they can work together to achieve their organization's goals. So if you're interested in that type of long-form discussion, we definitely welcome you to check out our show.
[00:13:18] That does it for this episode of Scaling Postgres. You can get links to all the content mentioned in the show notes. Be sure to head over to scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.