UUID vs Bigint Battle!!! | Scaling Postgres 302

Episode 302 February 11, 2024 00:13:23
UUID vs Bigint Battle!!! | Scaling Postgres 302
Scaling Postgres
UUID vs Bigint Battle!!! | Scaling Postgres 302

Feb 11 2024 | 00:13:23

/

Hosted By

Creston Jamison

Show Notes

In this episode of Scaling Postgres, we discuss when and how you should use UUIDs or not, how to optimize space with column order and arrays, whether it is better to tune parameters or queries and what devs should know about Postgres with Andrew Atkinson.

To get the show notes as well as get notified of new episodes, visit: 

https://www.scalingpostgres.com/episodes/302-uuid-vs-bigint-battle/

 

View Full Transcript

Episode Transcript

[00:00:00] Before we get started with the show this week, I did want to remind you that my postgres performance webinar is coming up the week that this episode is being released, and there will be another webinar next week as well. The webinar is postgres performance demystified. Learn a simple framework to achieve a thousandfold improvements. It's entirely free and I welcome everyone to attend. The link to it will be the first one in the episode for this week, but I hope you, your friends, family and coworkers continue to do well. But the actual first piece of content this week is UuID benchmark war. This is from ardentperf.com and there are a series of blog posts due to PGSQL Friday being about uuids and whether you should use them or not, and especially should you use them as primary keys, although it seems like a lot of the blog posts were focusing on should you use them for primary keys or not. But this was the most interesting post of the bunch because he ran some benchmarks. So he looked at using a text field to store uUIds using the native UUID data type using UUID version seven, which has a timestamp order in the first part of the UUID and compared those to big ants. And he did the benchmark by, I think, creating a table of 20 million rows and then inserted 1 million rows as fast as they could. And he says, quote, the results successfully demonstrate how each option is better than the previous until you get to begint. So I think this part should have been bolded as well, because he says, quote, the UUID seven performance is so good, it basically matched begint. So here are some of the results you could see. The time to insert 1 million rows for the text UUID was 410 seconds, for UUIDV four was 375, for UUIDV seven was 290, and for begint it was 290. So identical. So not only is there a speed difference, but there's also a size difference. So UUIDB four is 63% smaller than the text version, uUId seven is 7% smaller than the UIDB four version, and big ents is 25% smaller than the UUIDB seven version. So that's one disadvantage of the Uuids compared to big ents is that they're twice the size in terms of big ents being eight bytes, whereas the uuid is 16 bytes. So he shows the graph here of the performance. He compared the table size here and broke it down by index and primary key. And the records showed his benchmark stats showed a time graph performance which is transaction per second across seconds. He thinks this drop here is due to checkpointing. He shows the table size differences tracked over time. So definitely what to get out of this. Never use a text version of a UUID if you can definitely use the built in data types with postgres when you can. He shows how much more data reading goes on with the text in the UUID V four compared to the other two, as well as caching efficiency. Like it's really bad for the text and the v four uuids. So I thought this was fascinating and definitely encouraging for UuId v seven, so can't wait for that to land in postgres. Hopefully it will be in 17. [00:03:22] Next piece of content about UUids was uuID and postgres pgsql Friday 15 this is from blog rustprooflabs.com and his tLdR is quote I avoid uuids. The only place I've used uuids in production are the places where a third party system is involved, and this is exactly what I do. I definitely prefer big ents and I haven't used uuids for primary keys. It's usually just like if javascript generates some sort of unique identifier and gets passed in, that's when I would store the uuId. And he also discusses how he likes the developer experience with integers. Basically they're much easier to remember, I mean, not so much when they get so long, but still much easier to work with compared to something like this. So he definitely prefers using integers over uuids. [00:04:16] Next Post related is PGSqL Friday 15 Primary keys UUID, CuID or TSiD this is from mydbainobook.org. Basically her TLDR is please don't. But she says, well yeah, it depends, but really her opinion is you should really never use uuids for your primary key, and you should only be using integers. Now I haven't actually heard of CUID, but apparently it's been deprecated because it was supposed to be even more unique than a UUID, and a TSID is basically a timestored unique identifier. So that's basically what UUID B seven is going to be. But she definitely likes postgres, auto generated integers and not really uuids. But you can check out this blog post if you want to learn more. And the next UUID related post is postgres AI bot interview for PGSQL Friday 15 this is from cyberdecksgresql.com and here they use the new postgres AI bot. If you don't know what that is, check last week's scaling postgres episode where I cover it there to interview it, asking it the UUID question so basically what it is and how it should be used in their systems. I'm not going to really cover this, but you may find it interesting to take a glance at what AI thought next piece of content reducing table size with optimal column ordering and array based storage this is from pganalyze.com, and this is Lucas's five minutes of postgres, and it's actually his 100th episode, so congratulations. So that's great. Now in this episode he actually covers some older posts talking about a particular topic. Because the way that postgres handles data types, it pads smaller ones, so there's actually a way to order them to save space on disk. I've never really used that because with the applications that I tend to work with where I'm called in to consult on, they're typically changing fairly regularly. But if you have a table you know is going to be static and you know it's going to be a huge table, then it can be an advantage to optimize your column order in that table to make the table as small as possible. So he covers talking about that and also some of the advantages of array based storage. So that's another way to get more compressed data in the table as opposed to storing many rows of data storing in the single column in an array. And he demonstrated in his own application, PG analyze, where they saw a five x size improvements by moving some data into arrays. So that was pretty interesting. So if you want to learn more, definitely check out his piece of content. [00:06:59] Next piece of content tuning parameters versus tuning queries presentation and recording this is from hdombrovoskaya WordPress.com and she posted a presentation as well as the slides here, talking about the differences between optimizing parameters versus optimizing queries, and how a lot of people say, hey, I want to really tune my database. Whereas tuning your parameters of the database gets you, as she says here, maybe you get ten or 20% improvements, maybe 50% improvements if something's really misconfigured. But tuning queries can give you tenfold improvements, 100 fold improvements. So if you're looking to optimize performance, doing it at the query level, how the data is laid out, that will give you the most bang for the buck. But check out this presentation if you're interested. [00:07:49] Next piece of content what developers should know about postgres with Andrew Atkinson. This is from the rubberductdev show, and this is my interview with Andrew Atkinson. We were talking about his book high performance postgresQL for rails, and also of course, what developers should know about postgres. So this is a longer piece of content. It's an hour long podcast essentially, but if you're interested you can watch it on YouTube or your favorite podcasting provider. And he did include a discount link for his book, I think about 35% off. So I'll include that link in the show notes as well. [00:08:25] Next piece of content there was another episode of Postgres FM this week. This one was on modern SQL, and in it Michael interviewed Marcus Winnan, who runs usetheindexluke.com and modernsql.com and the author of SQL Performance explained. And basically he talks about modern SQL, which to him is basically the SQL post, the 92 standard. So the 92 standard is what so many people use. It's what most orms are using. But the versions that have come out after 92 cover JSON B and window functions and a lot of different non relational concepts. So that's essentially what they discussed. They discussed some of the new things coming to the SQL standard. And he also mentioned the number one thing to learn about modern SQL if you haven't learned it yet, is window functions because of the power that they offer. But if you want to learn more, you can definitely listen to the episode or watch it on YouTube down here. Next piece of content. Postgres again elected dbms of the year in 2023. But I'm worried. This is from blog Inurat info and he's talking about postgres being elected the database of the year and all the accolades that have come with postgres over the years and how it looks to be in a really strong position. But his fear, or the reason that he says he's worried, is not because of the state of postgres, but it's basically the state of the DBA and how basically dbas are going extinct. And if that happens, what are we going to be missing if organizations no longer have any dbas and what knowledge is not going to be available anymore? Because a lot of developers, they just bring up a database on their cloud hosting provider and they just get to coding. So I thought this was an interesting read and if you want to learn more you can definitely check it out. [00:10:20] Next piece of content is actually a gist that references what someone wrote before about postgres being enough like just use postgres for everything, and I definitely endorse that. I prefer using postgres for doing search, using postgres for queues when it makes sense, but I don't think I actually use it for everything. [00:10:39] Like the second post here is simplify move code into database functions, which to me that sets off alarm bells. As a developer, I would much rather have the business logic in my application code than in the database. There's very few cases where I would want some of that handled by the database. Then they talk about background and cron jobs. I tend to just use Cron on the instance itself, but if you're hosted, using one of these extensions is definitely something you could do. But you can check out this gist if you want to learn more next piece of content podcast highlights on benchmarking postgres performance from Path to Cituscon episode eleven this is from citrusdata.com. I haven't had a chance to listen to this yet, but you can definitely check out this episode if you want. Next piece of content benchmarking postgres Vector search approaches PGvector versus Lantern this is from Timbo IO and apparently there's another extension called Lantern that lets you work with llms or AI in postgres and they compared what the differences are in them because they are taking a slightly different tact. I think lanterns, I think they said it's based on USArch and what they found is that lantern does do faster index generation, but PG Vector was better with performance, so seems like some bit of trade offs going there. Then they have a second article talking about, quote the one with parallel indexes where they tested version six of PG vector, but the build time still exceeded lantern because you can do parallel index builds in lantern as well. But check this out if you want to learn more about that. [00:12:16] And the last piece of content is contemplating decentralized extension publishing. This is from just a theory, and this is talking about the extension network that David Wheeler is working on as a part of tempo IO. Whereas typically you have to register to be able to push some sort of package or library to some of these library ecosystems with go. It's decentralized. So basically he did an in depth study of how it actually works because he's contemplating potentially using this technique for creating the postgres extension network. But if you want to learn more, definitely check out this blog post. [00:12:54] I hope you enjoyed that episode. Be sure to check out scalingposgrows.com where you can get links to all the content discussed. Get the video and audio podcast version as well as a full transcript. While you're there, be sure to sign up for the email list so we can send you notifications when new episodes come out. Thanks, and I'll see you next week. [00:13:20] Our.

Other Episodes

Episode 43

December 17, 2018 00:09:49
Episode Cover

Logical Replication, Prepared Transactions, Ansible, Partitions | Scaling Postgres 43

In this episode of Scaling Postgres, we review articles covering logical replication, prepared transactions, Ansible Postgres deployment Ansible and tidy partitions. To get the...

Listen

Episode 177

August 09, 2021 00:16:28
Episode Cover

Planner Workarounds, Grouping Sets, Lateral Join, Not-So-Easy | Scaling Postgres 177

In this episode of Scaling Postgres, we discuss ways to work around the planner, working with grouping sets, using a lateral join and not-so-easy...

Listen

Episode 271

June 26, 2023 00:16:24
Episode Cover

30% Faster, UUID Downsides, Growing WAL, Processes vs. Threads | Scaling Postgres 271

  In this episode of Scaling Postgres, we discuss how to get 30% faster performance, the downsides of UUIDs, having too much WAL and whether...

Listen