PG15 Beta 4, Primary Key Options, Sequence Limits, Configuration | Scaling Postgres 232

Episode 232 | September 12, 2022 | 00:14:36


Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss the release of Postgres 15 Beta 4, different primary key options, sequence limits and how to examine your Postgres configuration.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/232-pg15-beta-4-primary-key-options-sequence-limits-configuration/


Episode Transcript

[00:00:00] In this episode of Scaling Postgres, we talk about Postgres 15 Beta 4, primary key options, sequence limits and configuration. I'm Creston Jamison, and this is Scaling Postgres episode 232.

[00:00:23] All right, I hope you, your friends, family and coworkers continue to do well. Our first piece of content is "PostgreSQL 15 Beta 4 Released". This is from PostgreSQL.org, and basically they've just released a new beta of Postgres 15. They do list a number of changes from Beta 3, including that the SQL/JSON features that were planned to go into 15 have been removed. This was mentioned last week in one of the posts, and there's a suggestion in one of this week's posts that it may have been related to a performance regression. But if you want to learn more, check out this blog post.

Next piece of content: "Choosing a Postgres Primary Key". This is from Supabase.com, and they're talking about how you should go about choosing a primary key for your tables. Usually the first thing people reach for is integers or bigints, and Postgres has a very convenient way to set these up with auto-incrementing values: use serial or bigserial as the data type when you create the table and set that column as the primary key, as you can see down here. What that does is create an integer primary key, but it also creates a sequence that supplies the auto-incrementing values. You can also create an auto-incrementing ID using the SQL-standard syntax, GENERATED BY DEFAULT AS IDENTITY, and a lot of people recommend using this going forward because it is the SQL standard, but serial and bigserial still work in Postgres (a small sketch of both options appears at the end of this section).

They also discuss potentially using random IDs for your primary key. The benefit is that someone has less information about your database: if you show user IDs as part of your URLs, for example, it's hard to guess what the next one is. And while you can use random IDs, they point out that it's more important to use securely random IDs. But whenever you start talking about random IDs, you lead into UUIDs. They focus on two different types. Version 1, presumably the original, is actually constructed from a date-time based on the Gregorian calendar and a MAC address. So, as you can tell, it's not fully random, because it includes a date-time and a MAC address, but it tries to give you what it says: a unique identifier. Version 4 UUIDs use pretty much all the bits for randomness, and this is generally the version you've probably been using for a while if you're using UUIDs. In Postgres version 13, I believe, there's a built-in way to generate them, and for prior versions you can add an extension to do it.

Now, apparently there's been an explosion of different UUID types relatively recently, and versions 6, 7 and 8 were published in 2021, so really recently. The reason is that fully random UUIDs can cause a lot of problems: they sort terribly, and because they're so random in nature, indexing them and writing them gets expensive, since you have to go all over the index to place the values. So a lot of organizations have developed new versions that I would call pseudorandom, primarily based on time.
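To make the two auto-incrementing options mentioned above concrete, here is a minimal sketch; the table and column names are just illustrative, not taken from the post:

    -- Option 1: serial/bigserial, the Postgres shorthand; behind the
    -- scenes it creates a sequence and wires it up as the column default
    CREATE TABLE orders (
        id bigserial PRIMARY KEY,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Option 2: the SQL-standard identity column, generally recommended
    -- for new tables
    CREATE TABLE customers (
        id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
        name text NOT NULL
    );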
So what the IETF did was establish these three additional versions: version 6, version 7 and version 8. Now, along with this post, I did some googling and found this additional reference, "Analyzing New Unique Identifier Formats (UUID 6, 7, and 8)". This is from blog.devgenius.io, and he gives a very good briefing of the differences. UUID 6 is basically a simple reordering of the bits within the UUID to allow it to be sorted as an opaque sequence of bytes. What they basically did was take UUID version 1 and put the year first, then the month, then the day, and then the time, so it's much easier to sort compared to UUID version 1s.

UUID 7 is a new time-based UUID: the first part of it is time-based and the latter part is random. An added benefit is that it's based on the Unix timestamp, which is much more familiar to a lot of people than the Gregorian calendar of the version 6 one. So basically UUID 7 is probably your go-to if you want to use something in a database and 100% randomness is not important. It's a pseudorandom identifier: the first part sorts by time, and the latter part is a random sequence of bits. UUID version 8 has a lot of flexibility in how you build it out.

Going back to the original post, you can see what they mention here: UUID 6 has 62 bits of Gregorian time and then some 48 bits of randomness; UUID 7 has 36 bits of big-endian Unix timestamp and then variable randomness up to 62 bits; and version 8 is more variable. So basically, if you can't find a use case for version 7 or version 6, then you should use version 8.

Now, to go to a different post that I think is also relevant if you want to learn more about this: this is a post from 2018 from 2ndQuadrant called "Sequential UUID Generators". They're talking about an extension that was developed called sequential UUIDs, which seems to do a similar thing to what the standards bodies have done with versions 6, 7 and 8: putting a sequential portion of the UUID first and then some randomness after it. What they found is that when you're using completely random UUIDs, your transactions per second fall off a lot depending on how much data you have and how it has to be written to disk, because there's a lot of random I/O involved. You can see how the transactions per second drop precipitously. In addition, you're going to write a lot more WAL, with a lot more full-page images being written to it, so you have a lot of disk activity when using random UUIDs. If they're more sequential, you won't create as many full-page images in the WAL. And here's another example with a medium-sized data set and then a large data set, and you can see how fully random UUIDs, and even ones where only a small portion is sequential, write a lot of full-page images to the WAL. So in Postgres, using fully random IDs can cause a lot of performance problems, and you can reference this post if you want to learn more about that.

But you're saying, okay, this sounds great, how would I get started using something like this? Well, there is a pg_idkit extension that includes the new version 6 and version 7 standards, as well as a host of other identifier formats developed primarily by other companies.
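As a quick illustration of the UUID options (this sketch is mine, not from the posts): gen_random_uuid() is built into Postgres 13 and later and produces fully random version 4 UUIDs, while the pg_idkit call below assumes the extension is installed and that idkit_uuidv7_generate() is its UUIDv7 function returning text, so treat that name and the cast as assumptions to check against the extension's docs.

    -- Fully random v4 UUIDs: built in since Postgres 13 (use the
    -- pgcrypto or uuid-ossp extension on older versions)
    CREATE TABLE users (
        id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        email text NOT NULL
    );

    -- Time-ordered v7 UUIDs via pg_idkit; the function name and its
    -- text return type are assumptions, not confirmed API
    CREATE EXTENSION IF NOT EXISTS pg_idkit;
    CREATE TABLE events (
        id uuid PRIMARY KEY DEFAULT idkit_uuidv7_generate()::uuid,
        payload jsonb
    );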
And you can just click here to get the extension to use it in Postgres. This post also follows up on the generation speed of the different UUIDs and how much space they take to store. Generally, the ones that take longer to generate are smaller in size, whereas those that are fast tend to be a little larger, but in my opinion there's not a significant difference.

So, in terms of what you should use: basically, serial or bigserial is probably the way to go. My personal opinion is that I would always go bigserial, because if your database is small, you can handle a bigserial; it's not going to take up a lot of room. And once you get to a super large database, you're going to run out of the normal integer size anyway, so you'd have to migrate to it. So from the start, I would advocate using bigint or bigserial. Now, in terms of UUIDs, a lot of people like to use them, and with version 7 being more sequential, I think it's potentially more appealing as a primary key. But what I tend to do is only use UUIDs if my unique identifiers are being generated outside of the database. If they're generated inside, I go ahead and stick with auto-incrementing integers. But this is a very comprehensive post about choosing a primary key, so if you want to learn more, definitely check out this one.

Next piece of content: "ERROR: nextval: reached maximum value of sequence". This is from Cybertec-postgresql.com, and I assume this is a post meant to deal with this error, where your sequence has reached its maximum value. [00:09:00] Basically, if you chose to use a serial type, or a standard four-byte integer type, it'll create a sequence of that size as well, and that can cause problems if you eventually run out, because now you can't insert any more records; there's nothing left in the sequence. Now, a lot of times you're going to hit this limit when you hit the 2 billion mark in your table. But if you do frequent deletes from that table, or have a lot of inserts that don't go through completely, you can actually have a much lower number of rows in the table and still run out of sequence numbers, because you're using up so many of them. So again, this is another reason to go ahead and use bigserial or bigint from the start and just not worry about it. This post does show you how to see the current value of your sequence using SELECT currval, how to set it to a different value, and how to simulate this particular error (a small sketch of this appears after this section). But if you want to learn more, you can check out this blog post.

Next piece of content: "PostgreSQL Configuration". This is from proopensource.it, and they're talking about all the different configuration parameters in Postgres; basically, there are 346 of them. I'm assuming that's from version 14. [00:10:14] If you want to look at all these parameters, all you have to do at a psql prompt is type SHOW ALL with a semicolon, and it gives you all the configuration settings along with a description of each one. If you want to look at a specific one, you can also specify that particular parameter: say SHOW shared_buffers, and it'll give you an output of what shared_buffers is currently set to.
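Going back to the sequence post for a moment, here is a minimal sketch of inspecting a sequence and simulating the error; the table and sequence names are illustrative:

    -- A serial column gets a four-byte integer sequence (max 2147483647)
    CREATE TABLE tickets (id serial PRIMARY KEY, note text);

    -- How close is each sequence to its ceiling?
    SELECT sequencename, last_value, max_value FROM pg_sequences;

    -- Current value of one sequence (only valid after nextval() has
    -- been called in the current session)
    SELECT currval('tickets_id_seq');

    -- Simulate the error by jumping the sequence to its maximum
    SELECT setval('tickets_id_seq', 2147483647);
    INSERT INTO tickets (note) VALUES ('one too many');
    -- ERROR:  nextval: reached maximum value of sequence "tickets_id_seq" (2147483647)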
Now, these parameters are also categorized into groups, and she has a query here against the pg_settings system view that breaks out all the different parameter settings by category (a version of this query is sketched after this section). Finally, the post closes by pointing out the website postgresqlco.nf for looking up information about each parameter. Of course, you can also look at the Postgres documentation, as well as two different configurators that help you get a start on setting these many parameters. But if you want to learn more about this, you can definitely check out this blog post.

[00:11:12] The next piece of content is "Parallel vacuuming in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL". This is from aws.amazon.com, and they're talking about parallel vacuum. Basically, vacuum is the process that cleans up dead rows left behind by an update or delete; those operations don't remove rows immediately, they just mark them for future cleanup, to reclaim that space for something else or to potentially shrink the table if the free space is at the end. This post goes over how vacuum works, and it also covers autovacuum, which handles this automatically based on different parameters that determine whether a table is ready to be vacuumed. But you can also do manual vacuums, and it's the manual vacuum that offers parallelism; you don't get that with autovacuum. The post goes over the parameters you can set within the database to configure how a parallel vacuum runs, and also the commands to do it, because certain parameter settings mean that even though you tell it to do a parallel vacuum, it won't actually run in parallel. For example, if the settings effectively require at least four indexes for parallelism and the table has fewer, then even though you ask for parallel, it still won't do it (a small sketch appears after this section). And I should say that even though this talks about RDS and Aurora, it's applicable to regular Postgres as well.

Now, if you want more detail on this, you can also check out "Five minutes of Postgres episode 34: Making the most of parallel vacuum, and why SQL/JSON was pushed out to Postgres 16". This is from pganalyze.com. The parallel vacuum post is what Lukas went over in this week's episode, so you can get more insight into the post from him. He also looks into SQL/JSON being removed from Postgres 15, hopefully to land in 16, and what he mentions from his investigation into the conversation is that it was actually a performance regression, so that's why it was pushed back. But if you want to learn more about that, definitely check out his episode.

Next piece of content: "How to plot an ASCII bar chart with SQL". This is from blog.jooq.org, and it's an interesting little take on using a lot of SQL code to generate ASCII charts. So if you want to see what can be done without a graphing library, definitely check out this blog post.

The next piece of content: "WAL and checkpoint tuning" was the latest episode of Postgres FM, so you can listen to that episode or click here to watch the YouTube video. [00:13:41]

Next piece of content: "PostgreSQL with Andrew Atkinson". This is from codewithjason.com, and this is another podcast with a guest talking about Postgres. Interestingly, one of the upcoming episodes of the Rubber Duck Dev Show will have Andrew on as a guest as well, so we'll get to ask him some questions there.
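Going back to the configuration and parallel vacuum discussion above, here is a minimal sketch of the kinds of commands involved; the category query is my own approximation of the one in the post, and the table name is illustrative:

    -- All parameters, or a single one, from psql
    SHOW ALL;
    SHOW shared_buffers;

    -- Break parameters out by category via the pg_settings system view
    SELECT category, name, setting, short_desc
    FROM pg_settings
    ORDER BY category, name;

    -- Manual parallel vacuum (Postgres 13+): up to 4 workers, capped by
    -- max_parallel_maintenance_workers; indexes only qualify for parallel
    -- processing if they are larger than min_parallel_index_scan_size
    VACUUM (PARALLEL 4, VERBOSE) tickets;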
[00:14:00] And the last piece of content: the PostgreSQL person of the week is Jeevan Ladhe. If you're interested in learning more about Jeevan and his contributions to Postgres, definitely check out this blog post.

That does it for this episode of Scaling Postgres. To get links to all the content mentioned in the show notes, be sure to head over to Scalingpostgres.com, where you can sign up to receive weekly notifications of each episode, or you can subscribe via YouTube or iTunes. Thanks.

Other Episodes

Episode 224

July 18, 2022 00:13:57

Column Performance, BRIN Win, Unique and Null, Parallel Distinct | Scaling Postgres 224

In this episode of Scaling Postgres, we discuss how too many columns in a table can affect performance, at what point BRIN indexes win...


Episode 216

May 22, 2022 00:13:50

PG 15 Beta 1 Release, Query Breakdown, Sort Performance, Prepared vs. Partitioned | Scaling Postgres 216

In this episode of Scaling Postgres, we discuss the Beta 1 release of Postgres 15, optimizing a query by breaking it down, improvements to...


Episode 308

March 24, 2024 00:18:15

Scale Through Sharding | Scaling Postgres 308

In this episode of Scaling Postgres, we discuss high availability's future, a custom sharding solution by Figma, sharding pg_vector queries, and PG17 logical replication...
